How to build an automated system

gmst · Oct 17, 2013

Quote from bwolinsky:

These are two different definitions of "automated system" and we want the strategic not programmatic explanations from you, hft_boy.

Bowo - your viewpoint is too narrow. There are many people on this board who have moved beyond MC and want to develop their own trading platform, since having your own platform liberates you from the inadequacies of off-the-shelf products like MC, TS, NT etc.

I understand you would rather spend time learning and developing strategies in MC, and that's fine with me. But it would be great if you could also understand that there are quite a few people on this board who are very interested in threads like this.

So, thanks for your time and consideration.

slickpick · Oct 17, 2013

Quote from vincegata:

Thank you both for responding.

@slickpick - so I have a producer (Data Adapter) that reads the data from the Internet over sockets when in live mode, or reads the data from the database / text file in backtest mode. So I am using the same code for live and backtesting. The producer is housed in a separate process and sends the data to the consumer (Strategy Execution System) over named pipes (FIFO). This setup works well in live mode (I am not after HFT) but it's slow during backtesting when I run the app through millions of records. Hence, I am looking to replace the FIFO with either shared memory or callbacks, aka the Observer Pattern.

What do you use for your producer-consumer to communicate? I know some people on this forum use shared memory, others use messaging such as ZeroMQ, yet others use Observer.

So first question I have for you, is how are you reading the data? Are you loading all records into memory? Or iterating line by line?

I use callbacks via observer/observable for the backtester in our production trading system. That said, the majority of my research is done in either R or Python in a vectorized fashion. Not sure what your production system is coded in, but you run into difficulty iterating over sets with more traditional languages.
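A minimal sketch of the callback approach, assuming a tick is just a (timestamp, price) pair. The names `DataFeed`, `replay` and `RecordingStrategy` are made up for illustration and are not from anyone's production system. The point is that the strategy only sees `on_tick`, so a live socket reader and a backtest file reader can drive the same code.

```python
from typing import Callable, Iterable, List, Tuple

Tick = Tuple[float, float]  # (timestamp, price): hypothetical record layout


class DataFeed:
    """Observable: pushes each tick to every registered callback."""

    def __init__(self) -> None:
        self._callbacks: List[Callable[[float, float], None]] = []

    def subscribe(self, callback: Callable[[float, float], None]) -> None:
        self._callbacks.append(callback)

    def publish(self, timestamp: float, price: float) -> None:
        # Plain in-process function calls: no pipe, no serialization.
        for callback in self._callbacks:
            callback(timestamp, price)


def replay(feed: DataFeed, records: Iterable[Tick]) -> None:
    """Backtest driver: publish stored records in order.

    A live driver would call feed.publish() from its socket loop instead.
    """
    for timestamp, price in records:
        feed.publish(timestamp, price)


class RecordingStrategy:
    """Observer: a stand-in strategy that only records what it receives."""

    def __init__(self) -> None:
        self.seen: List[Tick] = []

    def on_tick(self, timestamp: float, price: float) -> None:
        self.seen.append((timestamp, price))


feed = DataFeed()
strategy = RecordingStrategy()
feed.subscribe(strategy.on_tick)
replay(feed, [(1.0, 1.3501), (2.0, 1.3503)])
```

In a backtest each record then costs one direct function call per subscriber, which is the main attraction over pushing millions of records through a FIFO.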

vincegata · Oct 17, 2013

Quote from slickpick:

So first question I have for you, is how are you reading the data? Are you loading all records into memory? Or iterating line by line?

I use callbacks via observer/observable for the backtester in our production trading system. That said, the majority of my research is done in either R or Python in a vectorized fashion. Not sure what your production system is coded in, but you run into difficulty iterating over sets with more traditional languages.

Just checked, I read the records line by line... So the disk reading could be the bottleneck.

I am also leaning now towards using callbacks. Have you ever looked at QuickFix? I wonder how they implemented the SocketInitiator class. I know the code is all there, but it's such spaghetti that it takes time to figure it out.

I use C++ on Linux to develop the platform and algos. I've done MATLAB and R (not Python), but I am doing research with C++ which is slower but when I do find something I do not have to translate it to C++.

slickpick · Oct 17, 2013

Quote from vincegata:

Just checked, I read the records line by line... So the disk reading could be the bottleneck.

I am also leaning now towards using callbacks. Have you ever looked at QuickFix? I wonder how they implemented the SocketInitiator class. I know the code is all there, but it's such spaghetti that it takes time to figure it out.

I use C++ on Linux to develop the platform and algos. I've done MATLAB and R (not Python), but I am doing research with C++ which is slower but when I do find something I do not have to translate it to C++.

Being read from a database or something else? For tick data I like HDF5 and for less granular data flat files are fine (we are far too cheap to pay for kdb). Iterating over a text file shouldn't be your bottleneck I don't think.

I've only briefly glanced at QuickFix, nothing really serious.

To be honest, I think C++ is an awful environment for research. The big difficulty I see is iterating over variants and it's not exactly flexible either. Though I do think the upside is that your strategy is isolated.

vincegata · Oct 17, 2013

Quote from slickpick:

Being read from a database or something else? For tick data I like HDF5 and for less granular data flat files are fine (we are far too cheap to pay for kdb). Iterating over a text file shouldn't be your bottleneck I don't think.

I've only briefly glanced at QuickFix, nothing really serious.

To be honest, I think C++ is an awful environment for research. The big difficulty I see is iterating over variants and it's not exactly flexible either. Though I do think the upside is that your strategy is isolated.

I just read from a text file right now. e.g. EUR/USD tick data file for one year is around 1GB with 75mln+ records.

MATLAB and R (I do not know about Python) are slow compared to C++. That's probably the main reason I am using C++. What takes MATLAB hours may take C++ only 15 minutes. The other reason is that I do not have to translate my code later from MATLAB/R to C++. MATLAB/R do have those nice vector operations that cut down development time.

bwolinsky · Oct 17, 2013

Quote from gmst:

Bowo - your viewpoint is too narrow. There are many people on this board who have moved beyond MC and want to develop their own trading platform, since having your own platform liberates you from the inadequacies of off-the-shelf products like MC, TS, NT etc.

I understand you would rather spend time learning and developing strategies in MC, and that's fine with me. But it would be great if you could also understand that there are quite a few people on this board who are very interested in threads like this.

So, thanks for your time and consideration.

If they want to be the next MC, I don't get the motivation.

vincegata · Oct 27, 2013

Someone suggested that, instead of using multiple processes communicating through IPC, I keep all the code in the same process (as long as it's on the same host) and use concurrency. So I have a strategies (S) class and an order execution (OE) class, where S sends orders to OE and OE sends order confirmations back to S. I can implement a producer-consumer model where the producer sends data asynchronously to the consumer, but I do not know how to make the two modules talk to each other in both directions.

So, do you guys have any code samples or suggestions on how to implement a two-way asynchronous communication between objects? THX
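One possible shape for this, sketched in Python rather than C++ to keep it short: two thread-safe queues, one per direction, with S and OE each running in their own thread. Everything here (the `STOP` sentinel, the function names, the order strings, the "FILLED" reply) is an assumption for illustration. In C++ the same layout could be built from two queues guarded by a mutex and a condition variable.

```python
import queue
import threading

STOP = object()  # sentinel telling the other side to shut down


def order_execution(orders: queue.Queue, confirmations: queue.Queue) -> None:
    """OE side: consume orders and reply with a confirmation for each."""
    while True:
        order = orders.get()  # blocks until S sends something
        if order is STOP:
            confirmations.put(STOP)  # tell S no more confirmations follow
            break
        confirmations.put(("FILLED", order))


def strategy(orders, confirmations, to_send, received) -> None:
    """S side: fire off all orders without waiting, then collect replies."""
    for order in to_send:
        orders.put(order)  # returns immediately, OE works in parallel
    orders.put(STOP)
    while True:
        confirmation = confirmations.get()
        if confirmation is STOP:
            break
        received.append(confirmation)


orders: queue.Queue = queue.Queue()         # S -> OE
confirmations: queue.Queue = queue.Queue()  # OE -> S
received: list = []

oe_thread = threading.Thread(target=order_execution, args=(orders, confirmations))
s_thread = threading.Thread(
    target=strategy,
    args=(orders, confirmations, ["BUY 1 EURUSD", "SELL 1 EURUSD"], received),
)
oe_thread.start()
s_thread.start()
s_thread.join()
oe_thread.join()
```

Neither object holds a reference to the other. Each only knows the two queues, which keeps the coupling loose and makes it easy to swap OE for a simulated fill engine in a backtest.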

GloriaBrown · Nov 14, 2013

Hi hft_boy, can you show us a very simple backtest program structure?

Quote from hft_boy:

Hey,
I figure I'll spend half an hour to an hour a week writing about various aspects of building the infrastructure behind a trading system. Why? I guess I'm just kind of bored, and there seems to be precious little written on this by people who know what they are doing. And I feel the need to thrust my views on the world.

I guess my first post will be about my general philosophy/approach towards programming.

First, don't optimize until it is needed. Focus on the logic of the code first, get it right, get it simple, make it easy to understand. Then, when optimizing, know your patterns of usage, and use the correct hardware/language/data structure for the job. It seems like every other day somebody recommends an SSD for accessing files faster. Well, for files which are read sequentially (e.g. data files), there is basically no difference between spinny drives and NAND drives because the bottleneck is in SATA transfer speeds, not seek time! Yeah there is like a millisecond difference in seeking to the start of the file but unless you are hammering the drive with a thousand requests a second, there is not going to be a noticeable difference.

Don't use too much abstraction, and don't use too little abstraction. Einstein is credited with saying that "everything should be as simple as it can be, but not simpler." I tend to agree. Don't make too many classes. Don't make too few. (Don't use C++, herp derp). It comes down to patterns of usage and not prematurely optimizing. There is no point in abstracting away five lines of code. More to the point, it can actually be dangerous -- too much abstraction kills the ability to know what can be assumed about the code, which is a serious problem come testing time. This brings me to my next agenda.

Write code which is easy to test for correctness. It's really hard to get this right, and even experienced programmers mess up all the time. What I try to do is write code in such a way that there are 'logical bottlenecks', so that the number of assumptions that have to be made about each section is limited and you can assert the crap out of it, so that when it breaks it isn't subtle. Test pre-conditions, post-conditions and invariants. Put as many of these tests as possible into compile time (e.g. const-correctness).
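A rough illustration of the runtime half of this advice, assuming a toy `Position` class with a made-up `max_size` risk limit (neither is from the original post). Python has no compile-time equivalent of const-correctness, so this only shows pre-conditions, post-conditions and an invariant checked with `assert`.

```python
class Position:
    """Tracks a signed position; max_size is a hypothetical risk limit."""

    def __init__(self, max_size: int) -> None:
        assert max_size > 0, "precondition: limit must be positive"
        self.max_size = max_size
        self.quantity = 0

    def _check_invariant(self) -> None:
        # Must hold after every public method returns.
        assert abs(self.quantity) <= self.max_size, "invariant: within limit"

    def apply_fill(self, signed_qty: int) -> None:
        assert signed_qty != 0, "precondition: a fill must change the position"
        before = self.quantity
        self.quantity += signed_qty
        assert self.quantity - before == signed_qty, "postcondition: exact change"
        self._check_invariant()


pos = Position(max_size=10)
pos.apply_fill(4)
pos.apply_fill(-6)  # position is now -2
```

A fill that would breach the limit, such as `pos.apply_fill(100)`, fails loudly with an `AssertionError` at the point of the mistake rather than surfacing later as a subtle P&L discrepancy. Note that Python strips `assert` statements when run with `-O`, so checks that must survive in production need explicit exceptions instead.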

This problem of actually knowing what various pieces of code are doing is one of the reasons I don't like using really high level abstractions/languages and try to keep external library usage to a minimum -- unless the documentation is very good, and you read it carefully, you won't know what assumptions you can make about the code (e.g. did you know that java.lang.Math.round is different from C's "math.h" round? #omgmymindwasblown). Even if you read the documentation, you should just go and read the source code anyways to double check that it actually does what it claims to do. IMHO not being aware of what assumptions can be made about code which executes behind the scenes especially as it pertains to the outcome of the code is seriously sloppy work and is not acceptable for production systems.
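The rounding example is easy to check. As far as I know, java.lang.Math.round sends ties toward positive infinity while C's round() sends ties away from zero, so the two disagree on negative halves. The sketch below mimics both in Python for ordinary values (the helper names are made up, and edge cases such as NaN or values near the floating-point limits are ignored), and Python's own built-in `round` adds a third behaviour: ties go to the nearest even integer.

```python
import math


def java_style_round(x: float) -> int:
    """Mimics java.lang.Math.round for ordinary values: ties toward +infinity."""
    return math.floor(x + 0.5)


def c_style_round(x: float) -> float:
    """Mimics C's round() from math.h: ties away from zero."""
    return math.copysign(math.floor(abs(x) + 0.5), x)


for x in (2.5, -2.5, 0.5, -0.5):
    # Columns: input, Java-style, C-style, Python built-in
    print(x, java_style_round(x), c_style_round(x), round(x))
```

For -2.5 the Java-style result is -2 and the C-style result is -3.0, while Python's built-in gives -2 for -2.5 but also 2 for 2.5. Three "round" functions, three answers, which is exactly why it pays to read the documentation.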

Well, anyways, that's about it for the week. Hope that this was entertaining or that maybe you even learned something from it!

eusdaiki · Nov 21, 2013

Hi hft_boy,

very interesting conversation you got here.
(at least it was before the whole flame war...)
It sort of gravitated towards the low level details of FIX gateway server implementations...

in case you're interested to come back around I would like to push the discussion towards the high level view of the automated system... the different programs involved and such... to get the discussion moving in a positive direction again.

regards.