I use AmiBroker for intraday bar testing. Testing one system variation over 1 year of 5-minute bars for 800 symbols (about 15 million bars) takes about 15 seconds on an Intel i7. This is under 1 GB of data and is cached in memory.
For tick backtests, I was not happy with anything I could find, so I built my own backtest software, which runs on a 10-node server cluster.
It can test several thousand system variations over a data set of 1 billion ticks in 2-3 minutes. This is about 100 GB of data, and the I/O is spread across 30 disk drives in the cluster.
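To put the cluster figures in perspective, here is a rough bit of arithmetic. It is only a sketch: it assumes decimal gigabytes, data spread evenly over the disks, a run time of 150 seconds (the midpoint of "2-3 minutes"), and that the whole data set is read from disk once per run. None of those details are stated above.

```python
# Rough I/O arithmetic for the tick cluster figures quoted above.
# Assumptions (not from the post): 1 GB = 1e9 bytes, data is spread
# evenly over the disks, the run takes 150 s, and the full data set
# is read from disk exactly once per run.
TICKS = 1_000_000_000
DATA_BYTES = 100 * 10**9
DISKS = 30
RUN_SECONDS = 150

bytes_per_tick = DATA_BYTES / TICKS                # average storage per tick
gb_per_disk = DATA_BYTES / DISKS / 1e9             # share of data on each disk
aggregate_mb_s = DATA_BYTES / RUN_SECONDS / 1e6    # cluster-wide read rate
per_disk_mb_s = aggregate_mb_s / DISKS             # read rate each disk must sustain

print(f"bytes per tick:     {bytes_per_tick:.0f}")
print(f"data per disk:      {gb_per_disk:.2f} GB")
print(f"aggregate read:     {aggregate_mb_s:.0f} MB/s")
print(f"per-disk read rate: {per_disk_mb_s:.1f} MB/s")
```

Under these assumptions each disk only has to deliver a modest sequential read rate, which is why spreading the data over many drives helps.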
I have also dabbled with GPU programming. I was able to process the data set I use with AmiBroker in just 5 ms, or 3000 times faster than AmiBroker. But it needed a lot of dedicated C++ code.
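As a sanity check on the bar count and the speedup quoted above, the numbers can be worked through as follows. This is a back-of-envelope sketch that assumes 252 trading days per year and a 6.5-hour session (390 minutes, so 78 five-minute bars per day); those two values are assumptions, not figures from the post.

```python
# Back-of-envelope check of the bar count and GPU speedup quoted above.
# Assumptions (not from the post): 252 trading days per year and a
# 390-minute session, i.e. 78 five-minute bars per day.
TRADING_DAYS = 252
BARS_PER_DAY = 390 // 5
SYMBOLS = 800

total_bars = TRADING_DAYS * BARS_PER_DAY * SYMBOLS

CPU_MS = 15_000  # AmiBroker run time quoted in the post (15 s)
GPU_MS = 5       # GPU run time quoted in the post (5 ms)

speedup = CPU_MS / GPU_MS
cpu_bars_per_sec = total_bars / (CPU_MS / 1000)
gpu_bars_per_sec = total_bars / (GPU_MS / 1000)

print(f"total bars:     {total_bars:,}")
print(f"speedup:        {speedup:.0f}x")
print(f"CPU throughput: {cpu_bars_per_sec:,.0f} bars/s")
print(f"GPU throughput: {gpu_bars_per_sec:,.0f} bars/s")
```

With these assumptions the total comes out at roughly 15.7 million bars, in line with the "about 15 million" figure, and 15 s versus 5 ms is a factor of 3000.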