Using your brain helps a lot. With some thinking you can sort out most of the answer without asking.
* I do not think that CUDA-style systems (and that is what Nvidia does) are applicable to most algorithms. The problem is that the calculation of the indices may be accelerated, but the trading simulation itself is harder to speed up. I think the benefit may be minimal. The main problem here is that most algorithms are not computationally intensive in absolute terms.
* Network distribution may work, if one has a network to distribute over (and most traders do not). Seriously, the threshold where this makes sense is really high: first you can upgrade your main system to a dual Opteron (now 8, soon 12 cores) and take advantage of the processing power ALSO for other things. Getting a network (of multiple 4-core systems)... what would you use them for otherwise?
No system I know of supports either. I plan on writing my own framework, and I may support the second option. Not the first: first, because it makes little sense to support CUDA and another language for strategies, and one cannot rely on CUDA being there; and second, as I said, I think the benefit is not that great. Network: well, it happens that I HAVE a couple of computers lying around. But before I do that it must make sense, which I do not see (mostly because I do not really think parameter optimization is a good approach, and if I want to try it out occasionally, I can wait). It helps that I currently have an 8-core server.

But at least I can tell you that I have around 7 servers running all the time anyway with low resource usage, CPU-wise (they are pretty RAM loaded). In this case network-distributed backtesting may make sense.
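
To make the "use your cores first" point concrete, here is a minimal sketch of a parameter sweep spread over local CPU cores. It is not taken from any existing backtesting product: the moving-average crossover strategy and the names `sma`, `backtest` and `sweep` are hypothetical, chosen only for illustration. Each parameter combination is an independent job, which is why the same pattern could in principle be handed to machines on a network instead of local processes.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product


def sma(prices, n, i):
    """Simple moving average of the n prices ending at index i (inclusive)."""
    return sum(prices[i - n + 1 : i + 1]) / n


def backtest(job):
    """Toy crossover strategy (illustrative assumption, not a real system).

    Holds one unit during bar i if the fast SMA was above the slow SMA
    at bar i - 1. Returns (fast, slow, pnl).
    """
    prices, fast, slow = job
    pnl = 0.0
    for i in range(slow, len(prices)):
        # Signal is computed on the previous bar, so there is no look-ahead.
        if sma(prices, fast, i - 1) > sma(prices, slow, i - 1):
            pnl += prices[i] - prices[i - 1]
    return fast, slow, pnl


def sweep(prices, fasts, slows, workers=1):
    """Run every (fast, slow) pair with fast < slow; best PnL first."""
    jobs = [(prices, f, s) for f, s in product(fasts, slows) if f < s]
    if workers <= 1:
        # Serial fallback: same results, no process start-up cost.
        results = [backtest(job) for job in jobs]
    else:
        # Jobs share no state, so they can run in separate processes.
        with ProcessPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(backtest, jobs))
    return sorted(results, key=lambda r: r[2], reverse=True)
```

Calling `sweep(prices, [2, 3], [5, 10], workers=8)` would use up to 8 processes on one box; on platforms that spawn rather than fork, that call belongs under an `if __name__ == "__main__":` guard. Note that every job here carries its own copy of the price data, which is cheap locally but is exactly the cost that grows once the workers sit on other machines.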