I bit the bullet and got some of these cards inside a PC for development alongside a good Nvidia card. Here are my thoughts:
1. Call of Duty 2 runs amazingly - if you are interested in doing any development on these cards do not buy COD2, because you will not get any work done.
2. Development for these cards is not easy - the average developer will find it difficult to optimize for them. However, speed-ups of between 12% and 100%+ are possible for a range of algorithms without too much work.
3. Latency is not necessarily low. Loading the data onto the card, processing it and returning the results takes some time. For highly reactive systems, these cards are not particularly great.
4. The algorithms that benefit are those which can be split into a large number of independent sub-units, each with a high number of arithmetic operations per unit and little branching.
5. Double-precision floating-point performance sucks at the moment. If you need it, I would not bother yet.
6. The included development/profiling tools are OK, but imho need more work.
7. It is not possible to create a coherent 'link' between device and system memory with current architectures. All atomic operations MUST be performed on the device, and then all results transferred back from the device.
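To make point 4 a bit more concrete, here is a minimal sketch in plain C++ (not actual device code) of the kind of rewrite I mean. Everything in it is made up for illustration: heavy_math is a hypothetical stand-in for the real per-element arithmetic, and each loop iteration in process_all stands in for one independent sub-unit of work. The branch is replaced by a 0/1 mask so every element follows the same instruction path.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-element work: a small polynomial evaluated with
// Horner's rule. Several multiply-adds, no dependence on other elements.
inline float heavy_math(float x) {
    return ((0.5f * x + 1.0f) * x - 2.0f) * x + 3.0f;
}

// Branchy version: neighbouring elements may take different paths.
inline float branchy(float x) {
    if (x > 0.0f) return heavy_math(x);
    return 0.0f;
}

// Branch-free version: always do the arithmetic, then select the result
// with a mask that is 1.0f when x > 0 and 0.0f otherwise.
// Assumes heavy_math(x) is finite (0 * inf would give NaN).
inline float branchless(float x) {
    const float mask = static_cast<float>(x > 0.0f);
    return mask * heavy_math(x);
}

// Each iteration is independent of the others, so the loop body is the
// unit of work that could be handed to a separate thread on the card.
std::vector<float> process_all(const std::vector<float>& in) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = branchless(in[i]);
    }
    return out;
}

// Sanity check that the rewrite did not change the results.
void check_equivalent(const std::vector<float>& samples) {
    for (float x : samples) {
        assert(branchy(x) == branchless(x));
    }
}
```

Calling process_all on a vector gives the same values as applying branchy to each element. Note that compilers can often turn a branch this simple into a select on their own, so treat this as an illustration of the idea rather than a guaranteed win; the payoff comes with heavier, more divergent branches.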
Overall view: if a developer is in the top 10% of developers and understands the following:
How to optimize algorithms: decomposition, branch removal, etc.
How memory models work.
then you can probably expect some success on algorithms which involve LOTS of computation. Development will be much slower than with conventional programming; it is done in C++, which is inherently more costly.
If you have a high-speed trading system which has only a few calculations and you want to make it faster, don't even bother: these cards will make no difference whatsoever.
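A rough way to reason about this (and about point 3) is a back-of-the-envelope break-even estimate. The sketch below is only a simple model I am assuming for illustration, not a measurement: time on the card is taken as a fixed round-trip latency, plus transfer time, plus the CPU time divided by whatever speed-up the computation itself gets. All the parameter names are hypothetical and the numbers have to be measured on your own system.

```cpp
#include <cassert>

// Result of the estimate: total time via the card, and whether that
// beats simply doing the work on the CPU.
struct OffloadEstimate {
    double gpu_total_s;
    bool worthwhile;
};

// Simplified cost model (an assumption, not a benchmark):
//   gpu_total = fixed latency + bytes moved / bus bandwidth
//             + cpu time / speed-up of the computation itself
OffloadEstimate estimate_offload(double cpu_time_s,
                                 double kernel_speedup,
                                 double bytes_moved,
                                 double bus_bytes_per_s,
                                 double fixed_latency_s) {
    assert(kernel_speedup > 0.0 && bus_bytes_per_s > 0.0);
    const double transfer_s = bytes_moved / bus_bytes_per_s;
    const double gpu_total_s =
        fixed_latency_s + transfer_s + cpu_time_s / kernel_speedup;
    return {gpu_total_s, gpu_total_s < cpu_time_s};
}
```

With made-up numbers: a calculation that takes 1 microsecond on the CPU never pays back a 20 microsecond round trip, no matter how large the speed-up, whereas a 1 second job that moves 100 MB over a 1 GB/s bus and runs twice as fast on the card comes out at roughly 0.6 seconds.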