Hey,
I figure I'll spend half an hour to an hour a week writing about various aspects of building the infrastructure behind a trading system. Why? I guess I'm just kind of bored, and there seems to be precious little written on by people who know what they are doing. And I feel the need to thrust my views on the world
.
I guess my first post will be about my general philosophy/approach towards programming.
First, don't optimize until it is needed. Focus on the logic of the code first, get it right, get it simple, make it easy to understand. Then, when optimizing, know your patterns of usage, and use the correct hardware/language/data structure for the job. It seems like every other day somebody recommends an SSD for accessing files faster. Well, for files which are read sequentially (e.g. data files), there is basically no difference between spinny drives and NAND drives because the bottleneck is in SATA transfer speeds, not seek time! Yeah there is like a millisecond difference in seeking to the start of the file but unless you are hammering the drive with a thousand requests a second, there is not going to be a noticeable difference.
Don't use too much abstraction, and don't use too little abstraction. Einstein is attributed with saying that "everything should be as simple as it can be, but not simpler." I tend to agree. Don't make too many classes. Don't make too few. (Don't use C++, herp derp). It comes down to patterns of usage and not prematurely optimizing. There is no point in abstracting away five lines of code. More to the point, it can actually be dangerous -- too much abstraction kills the ability to know what can be assumed about the code, which is a serious problem come testing time. This brings me to my next agenda.
Write code which is easy to test for correctness. It's really hard to get this right, and even experienced programmers mess up all the time. What I try to do is write code in such a way that there are 'logical bottlenecks', so that the number of assumptions that have to be made about each section is limited and you can assert the crap out of it, so that when it breaks it isn't subtle. Test pre-conditions, post-conditions and invariants. Put as many of these tests as possible into compile time (e.g. const-correctness).
This problem of actually knowing what various pieces of code are doing is one of the reasons I don't like using really high level abstractions/languages and try to keep external library usage to a minimum -- unless the documentation is very good, and you read it carefully, you won't know what assumptions you can make about the code (e.g. did you know that java.lang.Math.round is different from C's "math.h" round? #omgmymindwasblown). Even if you read the documentation, you should just go and read the source code anyways to double check that it actually does what it claims to do. IMHO not being aware of what assumptions can be made about code which executes behind the scenes especially as it pertains to the outcome of the code is seriously sloppy work and is not acceptable for production systems.
Well, anyways, that's about it for the week. Hope that this was entertaining or that maybe you even learned something from it!
I figure I'll spend half an hour to an hour a week writing about various aspects of building the infrastructure behind a trading system. Why? I guess I'm just kind of bored, and there seems to be precious little written on by people who know what they are doing. And I feel the need to thrust my views on the world
.I guess my first post will be about my general philosophy/approach towards programming.
First, don't optimize until it is needed. Focus on the logic of the code first, get it right, get it simple, make it easy to understand. Then, when optimizing, know your patterns of usage, and use the correct hardware/language/data structure for the job. It seems like every other day somebody recommends an SSD for accessing files faster. Well, for files which are read sequentially (e.g. data files), there is basically no difference between spinny drives and NAND drives because the bottleneck is in SATA transfer speeds, not seek time! Yeah there is like a millisecond difference in seeking to the start of the file but unless you are hammering the drive with a thousand requests a second, there is not going to be a noticeable difference.
Don't use too much abstraction, and don't use too little abstraction. Einstein is attributed with saying that "everything should be as simple as it can be, but not simpler." I tend to agree. Don't make too many classes. Don't make too few. (Don't use C++, herp derp). It comes down to patterns of usage and not prematurely optimizing. There is no point in abstracting away five lines of code. More to the point, it can actually be dangerous -- too much abstraction kills the ability to know what can be assumed about the code, which is a serious problem come testing time. This brings me to my next agenda.
Write code which is easy to test for correctness. It's really hard to get this right, and even experienced programmers mess up all the time. What I try to do is write code in such a way that there are 'logical bottlenecks', so that the number of assumptions that have to be made about each section is limited and you can assert the crap out of it, so that when it breaks it isn't subtle. Test pre-conditions, post-conditions and invariants. Put as many of these tests as possible into compile time (e.g. const-correctness).
This problem of actually knowing what various pieces of code are doing is one of the reasons I don't like using really high level abstractions/languages and try to keep external library usage to a minimum -- unless the documentation is very good, and you read it carefully, you won't know what assumptions you can make about the code (e.g. did you know that java.lang.Math.round is different from C's "math.h" round? #omgmymindwasblown). Even if you read the documentation, you should just go and read the source code anyways to double check that it actually does what it claims to do. IMHO not being aware of what assumptions can be made about code which executes behind the scenes especially as it pertains to the outcome of the code is seriously sloppy work and is not acceptable for production systems.
Well, anyways, that's about it for the week. Hope that this was entertaining or that maybe you even learned something from it!
. But really, what do you use a tick data file for? You don't need to cache queries, build inner joins and spreadsheets and whatnot, because in the end you are going to analyze it just the same way you analyze market data in real time -- sequentially, one tick at a time. I guess I just don't really understand what problem a database solves in terms of backtesting.