Quote from pfranz:
I skimmed this long thread quickly, so I don't know whether I'm repeating points someone else has already made.
When programming complex systems like today's computers, you never know how a particular way of programming will perform until you test it. And testing often shows that approaches that look cumbersome produce the fastest results.
This matters far more than the language you are using. So, although I believe that C# wastes machine resources, C# code with the right parts carefully optimized can outperform straightforward C++.
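Since the point is to measure rather than guess, here is a minimal timing harness in C. It is only a sketch: `copy_fn`, `measure_copy_mbps` and `libc_copy` are hypothetical names, and `clock()` is a coarse, portable timer, not a proper cycle counter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature shared by the copy routines being compared. */
typedef void (*copy_fn)(void *dst, const void *src, size_t n);

/* Time `reps` calls of `fn` copying `n` bytes and return throughput in MB/s.
   clock() measures processor time with coarse resolution, so use large
   buffers or many reps. Returns 0.0 if the run was too short to resolve. */
double measure_copy_mbps(copy_fn fn, void *dst, const void *src,
                         size_t n, int reps)
{
    assert(reps > 0);
    clock_t start = clock();
    for (int i = 0; i < reps; i++)
        fn(dst, src, n);
    clock_t end = clock();
    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return 0.0;
    return ((double)n * reps) / (1024.0 * 1024.0) / seconds;
}

/* Baseline candidate: the C library's memcpy wrapped to match copy_fn. */
void libc_copy(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}
```

Any other routine with the same signature can be passed in place of `libc_copy`, so different approaches are compared on the same buffers and machine.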
Let me give you an example. Some years ago I read a study from AMD where they tried to do a simple task, copying memory, as fast as possible. They were using only assembly.
They began with the standard REP MOVSD and compared the results with the memory bandwidth: the speed was much lower, so they kept experimenting.
Any x86 programmer knows that REP MOVSD sucks and is there for compatibility only. So you would think that using MOV, a few jump instructions and some loop unrolling, with instructions interleaved so the superscalar pipeline doesn't stall, would solve the problem.
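For comparison, here is a rough C analogue of those two hand-written approaches: a naive byte loop, and a word copy unrolled four times with the loads grouped ahead of the stores. This is an illustrative sketch with hypothetical names (`copy_bytes`, `copy_unrolled`), not code from the AMD study, and in C the compiler, not the programmer, decides the final instruction scheduling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Naive baseline: one byte per iteration. Buffers must not overlap. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
}

/* 64-bit word copy unrolled 4x. All four loads are issued before the four
   stores so a superscalar core can overlap them. Words go through memcpy
   into locals, which compilers lower to plain MOVs and which avoids
   alignment and aliasing problems. Buffers must not overlap. */
void copy_unrolled(void *dst, const void *src, size_t n)
{
    assert((dst != NULL && src != NULL) || n == 0);
    unsigned char *d = dst;
    const unsigned char *s = src;
    const size_t w = sizeof(uint64_t);
    while (n >= 4 * w) {
        uint64_t a, b, c, e;
        memcpy(&a, s, w);
        memcpy(&b, s + w, w);
        memcpy(&c, s + 2 * w, w);
        memcpy(&e, s + 3 * w, w);
        memcpy(d, &a, w);
        memcpy(d + w, &b, w);
        memcpy(d + 2 * w, &c, w);
        memcpy(d + 3 * w, &e, w);
        s += 4 * w;
        d += 4 * w;
        n -= 4 * w;
    }
    while (n--)            /* tail: fewer than 32 bytes remain */
        *d++ = *s++;
}
```

Whether the unrolled version actually wins depends on the compiler and the CPU, which is exactly why it has to be measured.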
That's what the study tried next, and it did improve the results, but they still remained far from the bandwidth limit.
To cut a long story short, they ended up examining the cache structure and adding a loop, before the actual copy, that read one word from each cache line IN REVERSE ORDER to fill up the cache, using some specific AMD instructions (MFENCE, I believe).
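The block-prefetch idea can be sketched in C as well. This is a simplified illustration, not the study's code: the 64-byte line size and 4 KB block size are assumptions, `copy_block_prefetch` is a hypothetical name, and it uses plain loads plus `memcpy` rather than the special AMD instructions mentioned in the study.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed values for illustration, not taken from the AMD study. */
#define CACHE_LINE 64u     /* bytes per cache line (assumption) */
#define BLOCK_SIZE 4096u   /* block small enough to stay in cache (assumption) */

/* Volatile sink so the compiler cannot discard the touch loads. */
static volatile unsigned char prefetch_sink;

/* For each block: first read one byte per CACHE_LINE stride of the source,
   walking from the end of the block back to the start, so the block is
   pulled into cache (every line of it, if src is line-aligned); then copy
   the block, which now mostly hits cache. Buffers must not overlap. */
void copy_block_prefetch(void *dst, const void *src, size_t n)
{
    assert((dst != NULL && src != NULL) || n == 0);
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n > 0) {
        size_t chunk = n < BLOCK_SIZE ? n : BLOCK_SIZE;
        /* Phase 1: touch the lines in reverse order, last line first. */
        for (size_t i = (chunk - 1) / CACHE_LINE + 1; i-- > 0; )
            prefetch_sink = s[i * CACHE_LINE];
        /* Phase 2: copy the block that was just brought into cache. */
        memcpy(d, s, chunk);
        s += chunk;
        d += chunk;
        n -= chunk;
    }
}
```

On its own this may or may not beat a plain `memcpy` on a given machine; the gains the study reported came from tuning against a specific cache structure, and that only shows up when you measure.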
This more complicated program came close to the full memory bandwidth.
Had someone written all that in C (or maybe even C#), it would have outperformed a straightforward ASM loop with MOV instructions, loop unrolling and instruction interleaving.
Yet I still use ASM in my Visual Basic 6 software and get speed improvements, which are quite useful for reducing order-submission latency and handling data for many symbols in fast markets, even on old hardware.