Java vs C++ or C#

Rodney King · Aug 6, 2010

Quote from byteme:

Why don't you ask them yourself?

Actually, I will. I talk with their salesguy now and again, and I've met a number of people at different levels in the company.

byteme · Aug 6, 2010

Quote from Rodney King:

Actually, I will. I talk with their salesguy now and again, and I've met a number of people at different levels in the company.

Congratulations.

gtor514 · Aug 6, 2010

Quote from januson:

Kind of funny to keep reading these occasionally.

This goes to both CPtrader and you! ->

I can guarantee you both that it will not be the language that creates the bottleneck. Java, C# or C++, same same...

What you really should be worrying about is the tick/trade/quote rate at which your provider can deliver his feed and how many messages your trading PC/server/portable can process.

What you read about, take for instance StreamBase or Esper, which announce up to 500,000 events per second: this is nothing compared to doing it yourself.
I will soon post my performance measurements of commercial CEP versus homebrewed CEP.

Feel free to inspect the comparison between c++, c# and java:
http://www.tommti-systems.de/go.htm...ain-Dateien/reviews/languages/benchmarks.html

Remember to do your homework before posting such content

You should actually do some programming before posting such crap.

This is taken directly from the link you provided...

"Cpp is the fastest, except the STL "hashmaps" test is a lot slower, maybe someone can take a look at the source code, I use STL <map>, because <hash_map> was slower.... The memory footprint of Cpp is also the lowest, which was expected."

Seems like you confirmed my point. But I would question the results of anybody who needs to have his code checked. Run any search and you will find benchmark results showing C++/STL to be faster, with a lower memory footprint.

I do agree with this statement though:

"What you really should be worrying about is the tick/ trade/ quote-rate at which your provider can deliver his feed and how many messages your trading- pc/ server/ portable can process."

Rodney King · Aug 6, 2010

Quote from byteme:

Congratulations.

Did you mean to say, "condolences"?

nbates · Aug 11, 2010

I think it depends on whether you want to develop a "home owner" ATS or a Mission Critical "real-time" system...

My experience started in '73, before software & CPUs existed, and for 10 years or so I was a hardware design engineer [analog & digital, as chips were born]. From there came bit-slice state machines in hardware with ROMs, then soon to follow, lo and behold, there were CPUs with the first rudimentary OSes and drivers for my hardware written in ASM. Later there were higher-level languages like Fortran, Pascal, C, BLISS and eventually C++, Java and now .NET.

The thing to consider is there is a balance between hardware, software, the OS and CPU + MEMORY + I/O & GRAPHICAL SUB-SYSTEMS...we can ignore Hardware Acceleration for the time-being.

...all of the higher-level languages simplify "coding" but they fail in all areas of performance when it comes to ATS design because they ALL have a "home-owner" personality...that is excluding ASM, C and C++...and the reason I say this is the following:

a) Java & .NET are written on top of C++
b) C++ is good if you know what you're doing and use it like C or ASM
c) Under the hood with Java & .NET they use the Standard Convention; "String, COM Object, STL Containers"
d) STL containers allocate each element individually, and for a 4-byte pointer (32-bit) there is a heap allocation preamble and postamble of data, say for argument's sake 24 bytes (often it's more and we can discuss specifics, but let's not rat-hole yet).
e) Map and Set look-up is done with Red-Black or balanced trees and each item is allocated "individually", so if you're allocating 500,000 instruments [by symbol] with a pointer to an associated instrument class object, you have 'N' bytes [plus overhead] for the string and 4 bytes [plus overhead] for the pointer. Then include one-minute, five-second and one-second price time-series on all instruments and full trade history on ALL those instruments in STL containers. That's a lot of memory and a hell of a lot more wasted due to small-object allocation overhead!
f) Notoriously, STL Map & Set do their FIND by walking the tree with LowerBound [a less-than compare at every node visited], which is slow as hell.
g) Some Genius may say "No, they use an Unsorted Map"...well they don't, but even if they did they are STILL doing a String Compare on a discretely allocated element.

What I do in C++ is the following (example):

a) Create a "custom" in-place String class object, which has two personalities: (1) a byte array, and (2) an integer array [padded to a multiple of 4 bytes and aligned on the native boundary].
b) Use Custom Map & Set, which are Hash based in 2x Dimensions with Nodes allocated in a Flat Contiguous Memory region [which may & can dynamically re-size].
c) String [in Integer format] Hashing is done by Integer to pipeline the CPU and Key Comparison is done similarly.

With these techniques, which can only be done in C++, C or ASM, there is at least 4-5 ORDERS OF MAGNITUDE better performance than the same thing done in .NET, C# or Java.
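For readers following along in Java, here is a rough sketch of the layout idea only, not the C++ code described above and not a performance claim. It assumes symbols are non-empty and at most 8 ASCII bytes, so each one packs into a single `long`; keys and values then sit in two flat primitive arrays with linear probing. The class name `SymbolTable` and all of its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Open-addressing hash table over flat primitive arrays (illustrative sketch).
class SymbolTable {
    private long[] keys;   // packed symbols; 0 marks an empty slot
    private int[] values;  // e.g. an index into an instrument array
    private int size;
    private int mask;

    // Capacity must be a power of two so that (hash & mask) is a valid index.
    SymbolTable(int capacityPow2) {
        keys = new long[capacityPow2];
        values = new int[capacityPow2];
        mask = capacityPow2 - 1;
    }

    // Packs up to 8 ASCII bytes into one long, so hashing and key
    // comparison become integer operations instead of string compares.
    static long pack(byte[] sym, int start, int end) {
        long k = 0;
        for (int i = start; i < end; i++) {
            k = (k << 8) | (sym[i] & 0xFF);
        }
        return k;
    }

    private static int hash(long k) {
        k *= 0x9E3779B97F4A7C15L; // multiplicative mixing constant
        return (int) (k >>> 32);
    }

    void put(long key, int value) {
        if ((size + 1) * 2 > keys.length) {
            resize(); // keep the table at most half full
        }
        int i = hash(key) & mask;
        while (keys[i] != 0 && keys[i] != key) {
            i = (i + 1) & mask; // linear probe to the next slot
        }
        if (keys[i] == 0) {
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    // Returns the stored value, or -1 if the key is absent.
    int get(long key) {
        int i = hash(key) & mask;
        while (keys[i] != 0) {
            if (keys[i] == key) {
                return values[i];
            }
            i = (i + 1) & mask;
        }
        return -1;
    }

    private void resize() {
        long[] oldKeys = keys;
        int[] oldValues = values;
        keys = new long[oldKeys.length * 2];
        values = new int[keys.length];
        mask = keys.length - 1;
        size = 0;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldKeys[i] != 0) {
                put(oldKeys[i], oldValues[i]);
            }
        }
    }

    public static void main(String[] args) {
        SymbolTable table = new SymbolTable(16);
        byte[] sym = "MSFT".getBytes(StandardCharsets.US_ASCII);
        table.put(pack(sym, 0, sym.length), 42);
        System.out.println(table.get(pack(sym, 0, sym.length)));
    }
}
```

Whether this gets anywhere near a hand-tuned C++ table is exactly the kind of thing that would need measuring; the sketch only shows that the per-node allocations and string compares can be avoided.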

Note:

One order of magnitude = 10x
Four orders of magnitude = 10,000x
Five orders of magnitude = 100,000x

If you want benchmarks let me know, but that'll be for $'s; here it's just sharing & play

Pippi436 · Aug 12, 2010

http://www.youtube.com/watch?v=gLDFQ_IhnDc

Pretty much settles C# vs. Java

dcraig · Aug 12, 2010

Quote from nbates:

With these techniques, which can only be done in C++, C or ASM, there is at least 4-5 ORDERS OF MAGNITUDE better performance than the same thing done in .NET, C# or Java.

I don't believe ya! It also depends on what the "thing" that you are doing is. A difference of 4-5 orders of magnitude is right off the deep end - well and truly. Some things are actually faster in Java than C++ - object creation, I believe, being one of them.

As an indication, with Java, I can easily deal with several hundred stocks in real time using IQFeed, which is supposed to send every tick, creating a new tick object for each tick and queuing them in standard Java thread-safe queues. This uses a trivial amount of CPU on a Q9550. Fast markets - no problem.
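A minimal sketch of that setup, for anyone who wants to try it: one new tick object per update, handed to a consumer thread through a standard thread-safe queue. The `Tick` fields, the symbol names and the tick count are made up for illustration; this is not IQFeed's API and says nothing about a real feed.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class TickQueueDemo {
    // Queues tickCount freshly allocated ticks to a consumer thread
    // and returns the total size the consumer saw.
    static long run(int tickCount) {
        BlockingQueue<Tick> queue = new LinkedBlockingQueue<>();
        long[] totalSize = new long[1];

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < tickCount; i++) {
                    totalSize[0] += queue.take().size; // blocks until a tick arrives
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        for (int i = 0; i < tickCount; i++) {
            // One new object per tick, spread over 300 made-up symbols.
            queue.add(new Tick("SYM" + (i % 300), 100.0 + (i % 50) * 0.01, 100));
        }

        try {
            consumer.join(); // after join, the consumer's writes are visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return totalSize[0];
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long total = run(1_000_000);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("total size " + total + " in " + millis + " ms");
    }
}

// Hypothetical immutable tick; the fields are illustrative, not a real feed format.
final class Tick {
    final String symbol;
    final double price;
    final int size;

    Tick(String symbol, double price, int size) {
        this.symbol = symbol;
        this.price = price;
        this.size = size;
    }
}
```

The elapsed time will depend on the machine and JVM, so treat the printed number as a rough feel for the cost of allocate-and-queue and nothing more.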

I really don't understand what you are talking about with Java collections. There is no COM involved and no C++ STL either, as far as I know. I think the implementation is pure Java.

Java is perfectly fast enough for most things including most ATS. Even garbage collection pauses, which have been held to be its Achilles heel for real time work, really are not very significant on modern CPUs.

walterjennings · Aug 12, 2010

Imo, you'll probably want to use C# for most of your coding (it's a treat to code in), then use something similar to CodeAnalyst to figure out where your bottlenecks are (if you need a performance boost). Then fix any bottlenecks by coding the most used / slowest functions into a DLL using C/C++/ASM etc. Same argument / solution from back in the ASM vs C days. It's really a question of how to get the quickest code which is the easiest to work with / fastest to refactor.

januson · Aug 12, 2010

Quote from nbates:

a) Java & .NET are written on top of C++

Please google this before writing; it is only partly true when it comes to C#.

Please read this regarding performance between C++ and C#: http://www.codeproject.com/KB/cs/CSharpVsCPP.aspx

dloyer · Aug 12, 2010

One performance trick that works in any language, C++, Java or C#, is to avoid object creation. It's not the cost of allocation or GC, it's the cost of accessing large memory arrays that don't fit in the CPU cache.

Modern CPUs are much faster than their memory. They stall for many, many cycles on each cache miss. Most of the CPU transistor budget is allocated to reducing this with large caches and lots of tricks.

GPUs solve the problem differently by having lots of threads ready to execute to hide memory access latency; they have little or no cache and spend the transistors on more cores.

So, for example, rather than using Strings and split to parse a message, use byte arrays that can be updated in place. You really want to avoid random memory access over large data structures, at least in the inner loops.
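As a sketch of what that can look like in Java: the fields are read straight out of a reusable byte buffer into a reusable holder, with no `String` or `split` on the hot path. The comma-separated `symbol,price,size` layout, the 4-decimal fixed-point price and the names `InPlaceParser` and `TickFields` are all assumptions for illustration, and the code expects well-formed, non-negative ASCII numbers.

```java
import java.nio.charset.StandardCharsets;

class InPlaceParser {
    // Returns the index of target in buf[from, end), or end if not found.
    static int indexOf(byte[] buf, int from, int end, byte target) {
        for (int i = from; i < end; i++) {
            if (buf[i] == target) {
                return i;
            }
        }
        return end;
    }

    // Parses a non-negative decimal integer from buf[start, end).
    static long parseLong(byte[] buf, int start, int end) {
        long value = 0;
        for (int i = start; i < end; i++) {
            value = value * 10 + (buf[i] - '0');
        }
        return value;
    }

    // Parses a non-negative decimal such as "261.85" into a long scaled by
    // 10^scale, truncating any digits beyond the requested scale.
    static long parseScaled(byte[] buf, int start, int end, int scale) {
        long value = 0;
        int decimals = -1; // stays -1 until the decimal point is seen
        for (int i = start; i < end; i++) {
            byte b = buf[i];
            if (b == '.') {
                decimals = 0;
                continue;
            }
            if (decimals >= 0) {
                if (decimals == scale) {
                    break; // ignore extra fractional digits
                }
                decimals++;
            }
            value = value * 10 + (b - '0');
        }
        for (int d = Math.max(decimals, 0); d < scale; d++) {
            value *= 10; // pad missing fractional digits with zeros
        }
        return value;
    }

    // Fills a caller-owned holder; nothing is allocated per message.
    static void parse(byte[] buf, int length, TickFields out) {
        int c1 = indexOf(buf, 0, length, (byte) ',');
        int c2 = indexOf(buf, c1 + 1, length, (byte) ',');
        out.symbolStart = 0;
        out.symbolEnd = c1;
        out.priceE4 = parseScaled(buf, c1 + 1, c2, 4);
        out.size = parseLong(buf, c2 + 1, length);
    }

    public static void main(String[] args) {
        byte[] msg = "AAPL,261.85,300".getBytes(StandardCharsets.US_ASCII);
        TickFields fields = new TickFields();
        parse(msg, msg.length, fields);
        System.out.println(fields.priceE4 + " " + fields.size); // prints "2618500 300"
    }
}

// Reusable holder; the symbol stays in the buffer as a byte range.
final class TickFields {
    int symbolStart;
    int symbolEnd;
    long priceE4; // price times 10,000
    long size;
}
```

In a real reader the same buffer and the same `TickFields` instance would be reused for every message, which is what keeps the inner loop working on memory that is already in cache.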

My back test platform is written in Java and can process > 7 million ticks per second. But it is also distributed across 8 servers.

For trading, the data rates are low enough, even for hundreds of symbols, that it does not matter much compared to the network latency unless you are colocated. Even then, you can model anything that happens "in process", without disk access, as taking close to zero time.