Just another trading platform - But this time different!

LeeD · Jan 20, 2011

Quote from Steven.Davis:

You make an important point LeeD about response time. Garbage collection causes major response time problems. Being per process, it halts all of your threads, and can take long enough to cause communication issues beyond latency.
Accessing a trading system class using virtual members or events works out about the same.

Another approach is used in Windows event processing. Windows generates hundreds of different events. If each application and process had a handler for each of the events, it would have the potential to bring even the most powerful PCs to their knees. Instead, each application checks whether the current event is among the dozen or so events it wants to handle and otherwise passes flow control on. To build on that, MFC (Microsoft's GUI framework for native C++) in Visual Studio has wizards that add handlers only for the events a user wants. It feels almost like implementing a virtual method but without the overhead.
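
A minimal sketch of that filtering idea, in portable C++ rather than the real Win32 API. The names `EventId`, `Event`, `Strategy` and `defaultHandler` are made up for illustration; real Windows code would use a window procedure and fall through to `DefWindowProc`.

```cpp
#include <cassert>

// Hypothetical event identifiers; a real system would define hundreds.
enum class EventId { Tick, OrderFilled, OrderRejected, Heartbeat, Resize, Paint };

struct Event {
    EventId id;
    double  value;
};

// Stand-in for the system's default processing: does nothing and
// reports the event as not handled by the application.
inline bool defaultHandler(const Event&) { return false; }

// The application looks only at the few events it cares about and
// passes everything else on, so an uninteresting event costs one branch.
struct Strategy {
    int    ticksSeen = 0;
    double lastFill  = 0.0;

    bool onEvent(const Event& e) {
        switch (e.id) {
            case EventId::Tick:
                ++ticksSeen;
                return true;
            case EventId::OrderFilled:
                lastFill = e.value;
                return true;
            default:
                return defaultHandler(e);  // not ours: pass control on
        }
    }
};
```

Adding support for another event means adding one `case`; every event the strategy ignores never reaches any strategy-specific handler.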
Another approach is used in C++ templates. When it is known at compile time exactly which class is involved, the compiler will optimise away any event handlers that do nothing, together with the code necessary to invoke them. However, this would require compiling the code that invokes event handlers together with the handlers. Perhaps this is what you are referring to when you say:

Quote from Steven.Davis:

One would have to include a largish chunk of platform code into the DLL to avoid the run-time costs of extensibility.
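
A rough sketch of the template approach, assuming a hypothetical `Engine` class standing in for the platform code. Because `Engine` is a template, it has to be compiled together with the strategy, which is exactly the cost mentioned in the quote above. All names here are illustrative.

```cpp
#include <cassert>

// Hypothetical platform code, templated on the strategy type so every
// handler call is resolved at compile time (no virtual dispatch).
template <typename Strategy>
class Engine {
public:
    explicit Engine(Strategy& s) : strategy_(s) {}

    void feed(double price) {
        strategy_.onTick(price);  // direct call, can be inlined
        strategy_.onBarClose();   // empty by default: typically compiled away
    }

private:
    Strategy& strategy_;
};

// Default handlers that do nothing. A strategy replaces only what it needs
// by declaring a member with the same name (name hiding, not virtual).
struct DefaultHandlers {
    void onTick(double) {}
    void onBarClose() {}
};

struct CountingStrategy : DefaultHandlers {
    int ticks = 0;
    void onTick(double) { ++ticks; }
};
```

Here `CountingStrategy` never defines `onBarClose`, so an optimising compiler is free to drop that call from `Engine<CountingStrategy>::feed` entirely.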

NetTecture · Jan 20, 2011

Quote from Steven.Davis:

You make an important point LeeD about response time. Garbage collection causes major response time problems. Being per process, it halts all of your threads, and can take long enough to cause communication issues beyond latency.

That is a very good argument.

Oh, correction. It WOULD be a very good argument IF it were true. Sadly for you, it is not. There is a server garbage collector that is standard for server apps, and can be enabled in the app config file for desktop programs, that runs WITHOUT halting the threads.

Let me point you to some enlightenment:

http://www.informit.com/guides/content.aspx?g=dotnet&seqNum=621

The main issue with GC interrupting a program is a stupid developer not knowing his platform. The server GC requires a little more memory, but it works very nicely without interrupting program flow.

In your case, you should possibly enable the concurrent workstation GC manually.

By the way, a multi-core machine counts as a multi-processor machine (it is able to put the GC on a separate core so it really runs in the background). Works like a charm here.

Not saying .NET is perfect for trading, but it is definitely good enough for anyone not trying to win the millisecond game.

mhtrader · Jan 20, 2011

Quote from ScoobyStoo:

If you want any respect (or response) from the guys in this forum that actually know what they are talking about, I suggest you do some basic research first.

The .NET JIT compiler produces bytecode that is specific to the machine's CPU architecture. In some instances it can make use of CPU optimisations which are very tricky to implement using C++. Thus in some instances .NET code will actually run faster than compiled C++.

ScoobyStoo,

Sorry, but your comments are incorrect.

.NET compilers output IL (you could call it bytecode if you are not very technical). Java compilers compile into bytecode too. Jitters compile bytecode into assembler. Jitters don't produce bytecode.

So right off the bat, everything you write in .NET or Java is always compiled twice: once when you compile it, and once when it is about to be executed (true, the second compilation is cached and not done again until you change the source code, but editing the source code of your strategy is what you do all day when you are working on it).

The following is not true, or at least it is not true in 99% of cases:

In some instances it can make use of CPU optimisations which are very tricky to implement using C++.

I have examined the source code of .NET since it came out in version 1.0. The source code project was called Rotor. If you have some time, do the following homework:

1. Write a for-loop in C# with nothing inside. Write the same for-loop in C++ with nothing inside.

2. Dis-assemble the C# into IL (bytecode, as you call it). The empty for-loop is there in the IL. Run the code and dis-assemble the running assembler generated by the jitter: the empty for-loop is still there!

3. Dis-assemble the C++ code. The for-loop is gone!
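
The C++ half of step 1 could look like the sketch below. `emptyLoop` is a made-up name, and whether the loop actually disappears depends on the compiler and flags, so building with optimisation (for example `g++ -O2 -S`) and reading the generated assembly is the way to check.

```cpp
#include <cassert>

// An empty counting loop. With optimisation enabled, mainstream C++
// compilers typically remove the loop and compute the result directly.
inline int emptyLoop(int n) {
    int i = 0;
    for (; i < n; ++i) {
        // intentionally empty
    }
    return i;
}
```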

And this is just 1 example. I can give you 20 more examples of optimizations a C++ compiler can do that cannot be done by any jitter.

You have other constructs in C#/.NET, like generics, that are "resolved" at run time almost "a la scripting language". On the other hand, C++ compilers resolve templates (from which generics were copied into .NET) at compile time, and the result runs at lightning speed (if you don't understand what I'm talking about, please do some research).

Thanks,
~mhtrader~

mhtrader · Jan 20, 2011

I can tell you that no .NET code will be faster than, for example, TS EL. EL is compiled into 32-bit assembler. Given that, no .NET code can compete there. I'm pretty sure ppl are more amazed by its speed than by yours. And some of these ppl are/were company owners trying to replicate TS EL speed.

Anyway fullautotrading, I don't have anything against your "programs" and you are welcome to defend them. But do it with facts and with something that can be measured and observed.

Thanks,
~mhtrader~

Quote from fullautotrading:

Right .NET is perfectly fine for the purpose.

Some of my "users" are still wondering how I can process such large amounts of data, like 100 or more years of millisecond tick data, without blowing up or taking much memory, and with such incredible speed in a fully graphical environment. Well, .NET is the answer (clearly you must also know what you are doing)!

Especially if you work with Windows, it is for sure currently the best way.

Tom

Steven.Davis · Jan 20, 2011

Quote from NetTecture:

... There is a server garbage collector that is standard for server apps and can be enabled by the app config file for desktop programs that runs WITHOUT halting the threads.... The server GC requires a little more memory, but it works very nicely without interrupting program flow. ...

My experience is that large object garbage collection is a huge stability problem. If you are caching heavily, then the peak activity of the day (market close) involves huge amounts of allocating and sometimes large object reallocating. We tried workstation concurrent and server garbage collection, but it didn't matter. The incoming data feed threads have to wait for the caching object to be reallocated. We were able to work around this, but if we weren't already committed to .Net, we would have switched to C++ at that point.

Steven.Davis · Jan 20, 2011

Quote from LeeD:

Another approach is used in Windows event processing. Windows generates hundreds of different events. ...

You were right. I wasn't really thinking about using Windows events. My experience is with using APIs with callbacks and reading multicasts.

My point was quite minor. The custom trading logic can either run in-process with the platform, which is fast, or run in a separate process. In that case the separate process, for efficiency's sake, should implement a strong supporting API for the custom trading logic (like data lookbacks and order validation). If not, then the execution of the custom trading logic will be delayed by the interprocess communication. Allowing custom trading logic in the same process (.Net CLR execution environment) is faster, but more appropriate on a client's computer than on a server which may be hosting other trading systems.

mhtrader · Jan 20, 2011

Hi Lee,

Thanks for the comments. Of course it is "for sale" : )

This is how all "server" software is created. The developer's PC acts as a server for the developer. Once software is ready, it's deployed to the server.

It is not clear to me what you said. But the point is that user strategies can run on the client computer and on the server. We are coming with a big library of strategies/programs that can be used as reference. It is impossible to do certain calculations on the client side. Basically, having 10K symbols ticking at the same time and running strategies on them (not just plotting, but doing real calculations) is almost undoable without falling behind or crashing. We have most of the conditions solved on the server side.

Every platform that accepts trading strategies compiled in a DLL does it. Starting with TradeStation that did it over 10 years ago...

Not true. TS doesn't produce DLLs. TS compiles and saves things in its own storage. I think they use Microsoft structured storage.

On another topic, I don't share your frowning upon .NET and Java. It's true that they are not the leading technologies in HFT. Because of processes happening internally in the virtual machine, it is nearly impossible to guarantee response time. But then Windows is handicapped too, because multiple processes have to share the processor, and so is Linux.

I have nothing against them. They are wonderful environments. But after my years of experience working on these things, trying .NET and profiling it... I can tell you .NET is not the way. There was a very famous company that rewrote their platform in .NET some years ago and had to revert it. I don't recall the name right now, but I can look it up for you.

Event-based approach was a part of QuantFactory and OpenQuant for over 5 years and is a part of current NinjaTrader.

That's a good thing. I'm not sure how they have it implemented. Do you think they simulate the whole thing during back-testing as if it were real-time? Because that is the main problem I see on the existing platforms right now. The back-testing "events" are somehow always different from the real-time events. And it defeats the purpose of most automated strategies that were deemed "successful" when you back-tested them.

If you are talking about simple things like aggregation (say, building 60-tick bars from tick data), specialised database products do aggregation in less time than it takes a layman's application to read the aggregated data from disk (and these databases cache things too). Further, the most time-consuming calculations are likely happening inside the user's trading strategy. Do you actually parse and optimise the user's code before it is compiled so that it can take advantage of caching? This is a very non-trivial task with a high risk of making the resulting code actually slower.
Also, every platform that allows optimising trading strategy parameters uses some sort of caching for data. If users choose to do some extra work, a number of platforms allow them to manually cache further intermediate calculations. To reiterate, this feature would only be new if it could parse user-written code and automatically decide what to cache based on the logic.
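
For reference, the kind of aggregation mentioned in the quote (building N-tick bars from raw ticks) might be sketched as follows. `Bar` and `buildTickBars` are hypothetical names, ticks are reduced to bare prices, and a trailing partial bar is simply dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Bar {
    double open, high, low, close;
};

// Groups consecutive ticks into bars of `ticksPerBar` ticks each.
// A trailing partial bar is dropped in this sketch.
inline std::vector<Bar> buildTickBars(const std::vector<double>& ticks,
                                      std::size_t ticksPerBar) {
    assert(ticksPerBar > 0);
    std::vector<Bar> bars;
    for (std::size_t start = 0; start + ticksPerBar <= ticks.size();
         start += ticksPerBar) {
        Bar bar{ticks[start], ticks[start], ticks[start], ticks[start]};
        for (std::size_t i = start + 1; i < start + ticksPerBar; ++i) {
            if (ticks[i] > bar.high) bar.high = ticks[i];
            if (ticks[i] < bar.low)  bar.low  = ticks[i];
        }
        bar.close = ticks[start + ticksPerBar - 1];
        bars.push_back(bar);
    }
    return bars;
}
```

Calling `buildTickBars(ticks, 60)` would give the 60-tick bars from the example in the quote.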

I'm not talking about data, but about user-written code. To give you a hint: results from user-written code that doesn't use a random function inside can be cached based on certain input variables (don't take "inputs" literally, but as the math concept).
About optimizing the user code BEFORE it is compiled... never heard of that. But yes, we do code analysis and data analysis of the code. Say you want this calculation: Calc(x,y,z,x'....), where x can be the symbol, y the interval, z the range and x' can be, for example, your own value. You can request the cached values if they are available. And you can make them cacheable, but you have to specify that you want to do it.
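
One way to picture caching Calc(x,y,z,x') by its inputs is plain memoization. This is only a sketch under assumptions: `CachedCalc` and `CalcKey` are invented names, the key layout (symbol, interval, range, user value) follows the description above, and nothing here reflects how the platform actually implements it.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical cache key: symbol, bar interval, lookback range, user value.
using CalcKey = std::tuple<std::string, int, int, double>;

// Wraps a deterministic calculation and remembers results per input tuple.
// Only valid if the wrapped function has no hidden state or randomness.
class CachedCalc {
public:
    using Fn = std::function<double(const std::string&, int, int, double)>;

    explicit CachedCalc(Fn fn) : fn_(std::move(fn)) {
        assert(static_cast<bool>(fn_));  // the calculation must be callable
    }

    double operator()(const std::string& symbol, int interval, int range,
                      double user) {
        CalcKey key{symbol, interval, range, user};
        auto it = cache_.find(key);
        if (it != cache_.end()) {
            ++hits_;
            return it->second;  // same inputs seen before: reuse the result
        }
        double result = fn_(symbol, interval, range, user);
        cache_.emplace(std::move(key), result);
        return result;
    }

    int hits() const { return hits_; }

private:
    Fn fn_;
    std::map<CalcKey, double> cache_;
    int hits_ = 0;
};
```

The "no random function inside" condition matters because the cache returns the stored value without running the calculation again.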

Distributed optimisation... This is something NeoTicker has been doing for 6 years or more.

I didn't know. Thanks for the info.

I don't mean to be negative. Creating a trading platform from scratch is an interesting and challenging endeavour. Good luck making a profitable enterprise too.

Thank you

OK, I don't know a platform that has all these features in one, but is it enough for a unique selling point?

That's the point.

LeeD · Jan 20, 2011

Quote from NetTecture:

Not saying .NET is perfect for trading, but it is definitely good enough for anyone not trying to win the millisecond game.

That's the reason I specifically mentioned HFT. Just as a benchmark of what is considered "low latency", from http://www.rithmic.com/home.html:

Rithmic's trade execution software delivers to you the low latency and high throughput performance formerly seen only by the very large trading houses and boutique hedge funds - Tick-to-Trade in less than 250µs.

To be fair, I don't think 0.25 milliseconds is where the OP is aiming... but it was worth mentioning.

mhtrader · Jan 20, 2011

... At run time, the JIT knows whether or not it can make use of SSE or 3DNow instructions. Your executable will be compiled specially for P4, Athlon or any future processor families. You deploy once, and the same code will improve along with the JIT and the user's machine.

These were Microsoft's selling points when they came out with .NET around 2001/2002... I used to believe it at the time... look... here you are mentioning the Pentium 4. I had my last P4 computer around 10 years ago... these are all pure selling points from Microsoft. They were really scared of Sun's Java. By the way, .NET started as a different project called COM3, written as the next version of COM... that never came out. You can only imagine the engineering disaster this created inside some parts of the .NET implementation, because they had to "adapt" old source code to their brand new technology. By the way, the instruction sets you mentioned were used mostly by multimedia apps and graphics apps like CAD apps. I have never heard of a trading strategy in need of these instructions, but it is possible.

* Optimizing away levels of indirection, since function and object location are available at run time.

This is mental gymnastics : )

* The JIT can perform optimizations across assemblies, providing many of the benefits you get when compiling a program with static libraries but maintaining the flexibility and small footprint of using dynamic ones.

This is more mental gymnastics. Optimized Dlls interact the same way as what they are referencing there.

* Aggressively inline functions that are called more often, since it is aware of control flow during run time. The optimizations can provide a substantial speed boost, and there is a lot of room for additional improvement in vNext.

It means they may be catching up with C++ compilers when it comes to "aggressively inlining functions". C++ compilers can even inline virtual functions based on the context. While they claim something like this in .NET, it is not 100% true.

It is also true that the JIT optimizer is correctly limited to shallow, high-return/low-cost strategies. This limitation comes from the fact that the user is already waiting so it is counterproductive to spend a few seconds to save a few milliseconds.

The reason is that the moment you need to run something... the jitter will be in the middle, and it is just another compiler that is going to compile the code that you already compiled!!

Dr Dobbs had an article showing different primitive benchmarks (we are talking about trading systems which do a lot of math and not much else.) From the first few figures, C++ on Linux kicked butt. C++ on Windows was usually 2nd, C# 1.1 3rd, C# 2.0 4th, and all Javas were far behind.

Great!

I am a fan of C#, but I have seen platforms before that were extensible using C++ for performance sake. More power to him.

I am a fan of C# too, and a fan of Anders Hejlsberg. And trust me, .NET (I mean the .NET source code) has served me as inspiration for a lot of my previous work.

Nobody can disprove any of these facts.

LeeD · Jan 20, 2011

Quote from mhtrader:

This is how all "server" software is created. The developer's PC acts as a server for the developer. Once software is ready, it's deployed to the server.

It is not clear to me what you said.

Let me give an example. Oracle database engine is not just a database but what they call "application server", that is you can develop and run complex code that will run inside the server and act as a part of database queries.

The way to develop such code is to install a full-blown Oracle server on a developer's PC. (It's a developer's version, which is free and only differs from the "real" server in the number of users that can simultaneously connect to it and other restrictions like that.) The software is then developed and tested on the developer's PC and then installed on an actual server.

10 years ago servers were a different platform, ran a different operating system and making software developed on a Desktop work on a server was quite an effort. These days most servers are the same PCs that run more powerful (but otherwise compatible) hardware and have an operating system that is indistinguishable from that on desktops for most practical purposes.

So, making sure something works both on a desktop and on a server is usually a trivial task.

Quote from mhtrader:

But the point is that user strategies can run on the client computer and on the server.

The fact that you mention it as a big deal implies, unlike Oracle, you are not giving desktop users the same platform code that runs on the server... which likely means users will have to struggle with things that work slightly differently.

Quote from mhtrader:

We are coming with a big library of strategies/programs that can be used as reference. It is impossible to do certain calculations on the client side. Basically, having 10K symbols ticking at the same time and running strategies on them (not just plotting, but doing real calculations) is almost undoable without falling behind or crashing. We have most of the conditions solved on the server side.

Depending on what exactly a trader is doing, it's often unnecessary. Instead of tracking 10,000 symbols on 1 "server", you might prefer to track 1,000 symbols on each of 10 desktops, which is doable with "retail" software. The additional advantage is that if one lags behind, the others will keep running.

I don't doubt tracking 10,000 symbols on one machine is quite a feat of strength. It's just one thing some people may be prepared to pay up for and others may not.

Quote from mhtrader:

Not true. TS doesn't produce DLLs. TS compiles and saves things in its own storage. I think they use Microsoft structured storage.

I was saying you could compile a DLL using Visual C++, Delphi or another favourite tool and load a trading strategy implemented in this DLL into TradeStation.

Not only does this give a choice of programming languages to suit any taste, but specialised developer's tools also tend to have far superior debuggers, profilers and whatever else a serious programmer needs.

Quote from mhtrader:

There was a very famous company that rewrote their platform in .NET some years ago and had to revert it. I don't recall the name right now, but I can look it up for you.

I would very much like to read about them. Can you please look it up?

Quote from mhtrader:

Do you think they simulate the whole thing during back-testing as if it were real-time? Because that is the main problem I see on the existing platforms right now. The back-testing "events" are somehow always different from the real-time events. And it defeats the purpose of most automated strategies that were deemed "successful" when you back-tested them.

Most platforms don't have millisecond precision. The resolution of tick data is one second. This means there is no way such platforms can simulate, for example, the 300ms it takes to send an order and receive confirmation that it has been placed on an exchange.

Another issue is there are things that occur in real trading that platforms don't tend to produce on a backtest. For example, rejected orders.

However, I know at least one platform that can simulate that it takes time for an order to be cancelled.

Do you suggest your platform can simulate all the unusual states that may occur during real trading? Rejected orders, loss of network connectivity, realistic delays when orders are placed and cancelled, lag spikes when the time it takes to place, modify or cancel order unexpectedly increases etc...
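
To make the latency point concrete, here is a small sketch of how a backtester with millisecond timestamps could model an order delay. `Tick`, `firstFillableTick` and the latency value are assumptions for illustration; rejections, disconnects and lag spikes would need additional modelling on top of this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Tick {
    std::int64_t timeMs;  // timestamp in milliseconds
    double       price;
};

// The order is sent when ticks[signalIndex] is seen and reaches the
// exchange latencyMs later. Returns the index of the first later tick at
// or after that arrival time, or -1 if the data ends first.
inline int firstFillableTick(const std::vector<Tick>& ticks,
                             std::size_t signalIndex,
                             std::int64_t latencyMs) {
    assert(signalIndex < ticks.size());
    const std::int64_t arrival = ticks[signalIndex].timeMs + latencyMs;
    for (std::size_t i = signalIndex + 1; i < ticks.size(); ++i) {
        if (ticks[i].timeMs >= arrival) return static_cast<int>(i);
    }
    return -1;
}
```

With one-second timestamps, every tick inside the same second looks simultaneous, so a 300ms delay cannot be placed between them; that is the limitation described above.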

Quote from mhtrader:

I'm not talking about data, but about user-written code. To give you a hint: results from user-written code that doesn't use a random function inside can be cached based on certain input variables (don't take "inputs" literally, but as the math concept).
About optimizing the user code BEFORE it is compiled... never heard of that. But yes, we do code analysis and data analysis of the code. Say you want this calculation: Calc(x,y,z,x'....), where x can be the symbol, y the interval, z the range and x' can be, for example, your own value. You can request the cached values if they are available. And you can make them cacheable, but you have to specify that you want to do it.

This is exactly my point!

Imagine you run optimisation on a strategy. Every run you may change parameters but you know the market data will be the same.

Let's take a simple example. Imagine a moving average crossover strategy. You want to keep the "short" MA period fixed and optimise over the "long" MA period. Because the historical market data doesn't change and the "short" period doesn't change, the "short" moving average can be cached.
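
A sketch of that optimisation loop, with the "short" MA computed once and reused for every candidate "long" period. The function names and the crossover count used as the result are illustrative choices, and the sketch assumes the short period is smaller than every long period.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simple moving average; entries before the first full window stay at 0.
inline std::vector<double> sma(const std::vector<double>& prices,
                               std::size_t period) {
    assert(period > 0);
    std::vector<double> out(prices.size(), 0.0);
    double sum = 0.0;
    for (std::size_t i = 0; i < prices.size(); ++i) {
        sum += prices[i];
        if (i >= period) sum -= prices[i - period];
        if (i + 1 >= period) out[i] = sum / static_cast<double>(period);
    }
    return out;
}

// Counts upward crossovers of the short MA over the long MA,
// starting at index `warmup` so both averages are fully formed.
inline int countCrossUps(const std::vector<double>& shortMa,
                         const std::vector<double>& longMa,
                         std::size_t warmup) {
    assert(warmup >= 1);
    int n = 0;
    for (std::size_t i = warmup; i < shortMa.size(); ++i) {
        const bool wasBelow = shortMa[i - 1] <= longMa[i - 1];
        const bool isAbove  = shortMa[i] > longMa[i];
        if (wasBelow && isAbove) ++n;
    }
    return n;
}

// The short MA is computed once and reused for every candidate long
// period, because neither the data nor the short period changes.
inline std::vector<int> optimiseLong(const std::vector<double>& prices,
                                     std::size_t shortPeriod,
                                     const std::vector<std::size_t>& longPeriods) {
    const std::vector<double> shortMa = sma(prices, shortPeriod);  // cached
    std::vector<int> results;
    for (std::size_t longPeriod : longPeriods) {
        const std::vector<double> longMa = sma(prices, longPeriod);
        results.push_back(countCrossUps(shortMa, longMa, longPeriod));
    }
    return results;
}
```

For an indicator as cheap as a moving average the saving is small; the pattern only pays off when the cached component is expensive to recompute.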

The problem is the platform wouldn't know it can be cached unless the strategy exposes the moving average indicator as a separate component. If it does, there are platforms that allow calculating the "short" MA and then presenting it as data. I understand that's what your platform is effectively doing. It is nothing new.

If you could figure out from the strategy code what should and what should not be cached (for MA as a simple algorithm caching will likely make it slower), that would have been a great step.

There is a class of platforms that can figure out which parts of what the strategy "produces" can be cached without the user hinting at it... but these are platforms that generate the strategy themselves, so they have full control over what the strategy is doing.