I think it depends on whether you want to develop a "home-owner" ATS (automated trading system) or a mission-critical "real-time" system...
My experience started in '73, before software and microprocessors were commonplace. For 10 years or so I was a hardware design engineer (analog and digital, as chips were born). From there came bit-slice state machines in hardware with ROMs, and soon after, "lo and behold", there were CPUs with the first rudimentary OSs, plus drivers for my hardware written in ASM. Later came higher-level languages like Fortran, Pascal, C and BLISS, and eventually C++, Java and now .NET.
The thing to consider is that there is a balance between hardware, software, the OS, and the CPU + MEMORY + I/O and GRAPHICS SUBSYSTEMS... we can ignore hardware acceleration for the time being.
...all of the higher-level languages simplify "coding", but they fall short on performance across the board when it comes to ATS design, because they ALL have a "home-owner" personality (that is, excluding ASM, C and C++). Here is why:
a) Java and .NET run on runtimes (the JVM and the CLR) that are themselves written in C++.
b) C++ is good if you know what you're doing and use it like C or ASM.
c) Under the hood, Java and .NET use the standard conventions: strings, COM objects, STL containers.
d) Node-based STL containers allocate each element individually, so even a 4-byte pointer (32-bit) carries a heap-allocation preamble and postamble; say 24 bytes for the sake of argument (it's often more, and we can discuss specifics, but let's not rat-hole yet).
e) Map and set look-ups are done with red-black (balanced) trees, and each item is allocated individually. So if you're allocating 500,000 instruments (by symbol), each associated with a pointer to an instrument object, you have N bytes (plus overhead) for the string and 4 bytes (plus overhead) for the pointer. Now add one-minute, five-second and one-second price time series on all instruments, plus full trade history on ALL those instruments, in STD containers. That's a lot of memory, and a hell of a lot more wasted on small-object allocation overhead!
f) Notoriously, STD map and set do a FIND by descending the tree with lower_bound-style less-than comparisons: one string compare per node visited, and every node sits in its own heap block, which is slow as hell.
g) Some genius may say "No, they use an unordered map"... well, they don't, but even if they did, they are STILL doing a string compare on a discretely allocated element.
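To make points d) and e) concrete, here is a minimal C++17 sketch that counts how many times a container asks its allocator for memory. `CountingAllocator`, `count_map_allocations` and `count_reserved_vector_allocations` are hypothetical names made up for this illustration. A node-based `std::map` goes to the heap once per element, while a `std::vector` that reserves its capacity up front makes a single request. Exact counts can vary by standard library (some debug builds allocate extra bookkeeping), and the heap's own per-block header is not measured here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical counter, incremented on every allocate() routed through CountingAllocator.
inline std::size_t g_allocate_calls = 0;

// Minimal allocator that forwards to std::allocator and counts requests.
template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}  // lets std::map rebind to its node type

    T* allocate(std::size_t n) {
        ++g_allocate_calls;
        return std::allocator<T>{}.allocate(n);
    }
    void deallocate(T* p, std::size_t n) noexcept { std::allocator<T>{}.deallocate(p, n); }
};

// Stateless allocators always compare equal.
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return false; }

// Node-based container: one heap request per inserted element.
std::size_t count_map_allocations(int n) {
    using Map = std::map<int, int, std::less<int>, CountingAllocator<std::pair<const int, int>>>;
    g_allocate_calls = 0;
    Map m;
    for (int i = 0; i < n; ++i) m.emplace(i, i);
    assert(m.size() == static_cast<std::size_t>(n));
    return g_allocate_calls;
}

// Contiguous container: a single request when capacity is reserved up front.
std::size_t count_reserved_vector_allocations(int n) {
    g_allocate_calls = 0;
    std::vector<int, CountingAllocator<int>> v;
    v.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) v.push_back(i);
    return g_allocate_calls;
}
```

The map's count grows linearly with the number of entries, and each of those requests also pays whatever header and footer the underlying heap adds, which is the small-object overhead described in d).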
What I do in C++ is the following (for example):
a) Create a custom in-place string class with two personalities: (1) a byte array and (2) an integer array (padded to a multiple of 4 bytes and aligned on the native boundary).
b) Use custom hash-based map and set classes, hashed in two dimensions, with nodes allocated in a flat, contiguous memory region (which can be resized dynamically).
c) Hash strings (in integer form) one integer at a time so the CPU can pipeline the work, and compare keys the same way.
With these techniques, which can only be done in C++, C or ASM, you get at least 4-5 ORDERS OF MAGNITUDE more performance than the same thing done in .NET, C# or Java.
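A rough sketch of points a) through c), under assumptions of my own: symbols fit in 16 bytes, the hash is a simple multiplicative mix over 32-bit words, and the table uses open addressing with linear probing inside one `std::vector`. `SymbolKey` and `FlatSymbolMap` are hypothetical names, and this is single-level hashing rather than the two-dimensional scheme described above, whose details aren't given. The bytes are copied into the integer words with `memcpy`, since reading one union member through another is type punning in C++.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical in-place symbol key: up to 16 bytes, zero-padded, no heap allocation.
struct SymbolKey {
    static constexpr std::size_t kWords = 4;
    std::array<std::uint32_t, kWords> words{};  // integer personality, naturally aligned

    static SymbolKey from(std::string_view s) {
        assert(s.size() <= kWords * sizeof(std::uint32_t));  // longer symbols need a bigger key
        SymbolKey k;
        if (!s.empty()) std::memcpy(k.words.data(), s.data(), s.size());  // byte personality
        return k;
    }

    // std::array == compares element by element, i.e. one 32-bit word at a time.
    bool operator==(const SymbolKey& o) const { return words == o.words; }

    // Assumed hash (not the author's): mix one 32-bit word per step.
    std::uint32_t hash() const {
        std::uint32_t h = 0x811C9DC5u;
        for (std::uint32_t w : words) {
            h ^= w;
            h *= 0x9E3779B1u;  // multiply spreads low bits upward
            h ^= h >> 16;      // fold high bits back down for the table mask
        }
        return h;
    }
};

// Hypothetical flat hash map: every slot lives in one contiguous std::vector
// (linear probing, load factor kept at or below 0.5, no deletions).
template <class V>
class FlatSymbolMap {
public:
    explicit FlatSymbolMap(std::size_t capacity = 16) : slots_(capacity) {
        assert(capacity != 0 && (capacity & (capacity - 1)) == 0);  // power of two
    }

    // Returns the value for k, inserting a default-constructed V if k is new.
    V& insert_or_get(const SymbolKey& k) {
        if ((size_ + 1) * 2 > slots_.size()) grow();
        Slot& s = probe(k);
        if (!s.used) { s.used = true; s.key = k; ++size_; }
        return s.value;
    }

    // Returns a pointer to the value for k, or nullptr if absent.
    V* find(const SymbolKey& k) {
        Slot& s = probe(k);
        return s.used ? &s.value : nullptr;
    }

    std::size_t size() const { return size_; }

private:
    struct Slot {
        SymbolKey key;
        V value{};
        bool used = false;
    };

    // Walks from the hashed slot until it finds k or an empty slot.
    Slot& probe(const SymbolKey& k) {
        const std::size_t mask = slots_.size() - 1;
        std::size_t i = k.hash() & mask;
        while (slots_[i].used && !(slots_[i].key == k)) i = (i + 1) & mask;
        return slots_[i];
    }

    // Doubles the contiguous region and re-inserts the live slots.
    void grow() {
        std::vector<Slot> old = std::move(slots_);
        slots_.assign(old.size() * 2, Slot{});
        size_ = 0;
        for (const Slot& s : old)
            if (s.used) { probe(s.key) = s; ++size_; }
    }

    std::vector<Slot> slots_;
    std::size_t size_ = 0;
};
```

The design choice here is that a lookup touches one contiguous array instead of chasing per-node heap pointers: keys live in place, hashing and comparison work on four 32-bit words, and growing is a single re-allocation of the whole region.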
Note:
One order of magnitude = 10x
Four orders of magnitude = 10,000x
Five orders of magnitude = 100,000x
If you want benchmarks, let me know, but that'll be for $'s; here it's just sharing and play.
