500 million sounds very nice. The question is also how much logic is computed per data point, and that depends on the individual strategy.
I think iterating over simple arrays (in C or C++) could be the fastest method. How do you do it?
My optimization software is written in Java. It took me a number of years to refactor it in various ways to squeeze out all that processing speed. There is nothing particularly magic about it. Just standard software engineering practices:
-- efficient use of data structures (maps, queues, sets, lists)
-- eliminating the processing bottlenecks (with the use of a CPU profiler)
-- engaging all CPU cores to the full capacity, with good use of multi-threading
-- ensuring that there is no disk I/O (beyond the initial loading of the data set)
-- making it compute-bound as much as possible, relative to memory-bound
-- caching everything that can be cached
-- eliminating the unnecessary repetitions
-- identifying what's computationally expensive, and refactoring it
-- simulating GPU on CPU (think of running a chunk of tasks all at once, rather than one at a time)
-- minimizing the memory footprint, the scope, and the mutability of objects
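To make a few of these points concrete (primitive arrays held in memory, no disk I/O inside the loop, all cores busy), here is a minimal sketch. It is not my actual code: the class name ParallelSweep, the toy moving-average rule in backtest, and the single lookback parameter are all made up for illustration.

```java
import java.util.Random;
import java.util.stream.IntStream;

public class ParallelSweep {

    // Hypothetical toy strategy: hold a long position over the next bar
    // whenever the close is above its trailing average over `lookback` bars.
    // Returns the summed P&L in price units.
    static double backtest(double[] closes, int lookback) {
        double pnl = 0.0;
        double sum = 0.0; // rolling sum, so the window is never re-scanned
        for (int i = 0; i < closes.length; i++) {
            sum += closes[i];
            if (i >= lookback) {
                sum -= closes[i - lookback];
            }
            if (i >= lookback - 1 && i + 1 < closes.length) {
                double avg = sum / lookback;
                if (closes[i] > avg) {
                    pnl += closes[i + 1] - closes[i];
                }
            }
        }
        return pnl;
    }

    // One independent back test per parameter value, spread across CPU cores.
    // The shared array is only read, so no locking is needed.
    static double[] sweep(double[] closes, int minLookback, int maxLookback) {
        return IntStream.rangeClosed(minLookback, maxLookback)
                .parallel()
                .mapToDouble(lookback -> backtest(closes, lookback))
                .toArray();
    }

    public static void main(String[] args) {
        // Synthetic random-walk prices stand in for a data set loaded once.
        double[] closes = new double[1_000_000];
        Random rng = new Random(42);
        closes[0] = 100.0;
        for (int i = 1; i < closes.length; i++) {
            closes[i] = closes[i - 1] + rng.nextGaussian();
        }
        double[] results = sweep(closes, 1, 10);
        for (int i = 0; i < results.length; i++) {
            System.out.println("lookback=" + (i + 1) + " pnl=" + results[i]);
        }
    }
}
```

The parallel stream is just the shortest way to show the idea. A fixed thread pool with explicit task chunks gives more control over how the work is split.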
My data sets are huge (about 70 million bars per symbol), so even with that speed of 500 million bars per second, some optimizations run for hours.
In my typical trading strategy, there could be 5 parameters. Let's say we want to test the range of [1..10] for each parameter. This gives us 10^5 = 100K parameter combinations to back test. Each of these combinations has to be applied to the 70 million bars, so we have the total of:
100,000 * 70,000,000 = 7 trillion passes
With the speed of 500 million passes per second, it would take about 4 hours to complete the optimization. For some optimizations, I let them run for days.
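For anyone who wants to plug in their own numbers, the same estimate can be written as a small sketch. The class and method names here are made up for illustration; the inputs are the figures from this post.

```java
public class SweepCost {

    // Number of parameter combinations in a full grid:
    // valuesPerParam raised to the power of paramCount.
    static long combinations(int valuesPerParam, int paramCount) {
        long n = 1;
        for (int i = 0; i < paramCount; i++) {
            n = Math.multiplyExact(n, valuesPerParam); // throws on overflow
        }
        return n;
    }

    // Wall-clock hours for a brute-force sweep at a given throughput.
    static double hours(long combos, long barsPerSymbol, double barsPerSecond) {
        double totalPasses = (double) combos * barsPerSymbol;
        return totalPasses / barsPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        long combos = combinations(10, 5);               // 100,000
        double h = hours(combos, 70_000_000L, 500e6);    // 14,000 s
        System.out.printf("%d combinations, %.2f hours%n", combos, h);
    }
}
```

Adding a sixth parameter with the same range multiplies the run time by 10, which is where the brute-force approach stops being practical.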
So, there is a combinatorial explosion (i.e. "the curse of dimensionality") to fight, and the over-fitting effects to address. I have a number of techniques to deal with both. For the combinatorial explosion, it comes down to the use of "smart" optimization techniques (as opposed to the brute-force optimization). For over-fitting, it's about carefully choosing the cost functions (i.e. performance metrics), and performing cluster analysis of the optimization space (looking for broad, sustained regions of elevated performance).
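As a rough illustration of why a broad region beats an isolated peak, here is a much simplified stand-in for cluster analysis: score each cell of a two-parameter grid by the average of its neighbors and pick the best average instead of the best raw value. This is not my actual method. The class name RobustRegion, the square neighborhood, and the sample grid are all assumptions made for the sketch.

```java
public class RobustRegion {

    // Mean score of the cells within `radius` of (r, c), clipped at the
    // grid edges. Edge cells therefore average over fewer neighbors.
    static double neighborhoodMean(double[][] score, int r, int c, int radius) {
        double sum = 0.0;
        int count = 0;
        for (int i = Math.max(0, r - radius); i <= Math.min(score.length - 1, r + radius); i++) {
            for (int j = Math.max(0, c - radius); j <= Math.min(score[i].length - 1, c + radius); j++) {
                sum += score[i][j];
                count++;
            }
        }
        return sum / count;
    }

    // Returns {row, col} of the cell with the highest neighborhood mean.
    // Ties keep the first cell found in row-major order.
    static int[] bestSmoothedCell(double[][] score, int radius) {
        int[] best = {0, 0};
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int r = 0; r < score.length; r++) {
            for (int c = 0; c < score[r].length; c++) {
                double m = neighborhoodMean(score, r, c, radius);
                if (m > bestValue) {
                    bestValue = m;
                    best = new int[] {r, c};
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up performance grid: one lone spike at (0,0) and a
        // plateau of moderate scores in the lower-right 3x3 block.
        double[][] score = new double[5][5];
        score[0][0] = 10.0;
        for (int r = 2; r <= 4; r++) {
            for (int c = 2; c <= 4; c++) {
                score[r][c] = 3.0;
            }
        }
        int[] best = bestSmoothedCell(score, 1);
        System.out.println("robust pick: row " + best[0] + ", col " + best[1]);
    }
}
```

The raw maximum is the spike at (0,0), but its neighborhood mean is lower than that of the plateau, so the smoothed pick lands inside the plateau. A real optimization space has more than two dimensions and needs a proper clustering step, but the principle is the same.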