Best Language for developing a backtesting platform

Quote from BlackMage:

There are a few open source projects that seem to be in relatively active development:

http://www.activequant.org/
http://sourceforge.net/projects/eclipsetrader/
http://code.google.com/p/jbooktrader/
http://code.google.com/p/tradelink/
http://code.google.com/p/algo-trader/

None of them is particularly "high performance", and curiously no C++ open source trading engine seems to exist at all (at least not in active development).

Short story long, if you want raw performance you have to roll your own.

R is a pain with large data sets.

Well, the above list is quite decent, so if one were to make a genuine attempt at rolling one's own in C++, it would be wise to at least try out and research the above options to gain insight.

I am quite sure that if I spent a ton of time without any points of reference such as the above list of FOSS projects, I would miss out on many helpful design and implementation details. And it might be wise to simply run with one of the above and focus on creating strategies... I am not looking to slave away cloning the above in C++.

Thanks for the above list.
 
Quote from comintel:
Also the choice of tooling is controversial, as was mentioned. Most prefer RStudio, but I would only use Eclipse/StatET, in order to have its debugger. I am sure that RStudio will add a debugger at some point but it is not there now.

I used to use Eclipse, but switched to RStudio about a year ago - the server-based version makes it worthwhile and, on average, you don't really need a step-wise debugger in R. Also, RStudio is a hybrid development/analysis environment - there are table viewers, integrated doc writers etc. I just wish they would allow making custom colour schemes, I need my green on black back.

PS. Maybe it's just me, but does anyone else find the "shiny" package incredibly sexy for developing screens?
 
Quote from sle:

I used to use Eclipse, but switched to RStudio about a year ago - the server-based version makes it worthwhile and, on average, you don't really need a step-wise debugger in R. Also, RStudio is a hybrid development/analysis environment - there are table viewers, integrated doc writers etc. I just wish they would allow making custom colour schemes, I need my green on black back.

PS. Maybe it's just me, but does anyone else find the "shiny" package incredibly sexy for developing screens?

I will have to look at RStudio again.

StatET for Eclipse does have a superb table viewer now also. It can display huge data.tables and display any part of them instantly. I set the rownames based on some column and then it is better than Excel! It can handle much larger data much faster. I have dozens of columns and many thousands of rows. Maybe RStudio has that too - I will look.

Shiny is very popular but I have not tried it yet.
 
Am I the only one who likes a SQL backend combined with Matlab on the front? Throw the db onto an SSD and properly index the thing, and the only bottleneck is bad program design.

I can do anything I want -- and parallel computing toolbox can be used to speed things up ...
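As a rough illustration of the "properly index the thing" point, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for whatever SQL server is actually used. The table layout, the index name, and the helper functions `build_tick_db` and `iter_ticks` are all assumptions made up for this example, not anything described in the post.

```python
import sqlite3


def build_tick_db(rows):
    """Create an in-memory tick table (hypothetical schema) with a composite index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ticks (symbol TEXT, ts INTEGER, price REAL, size INTEGER)"
    )
    conn.executemany("INSERT INTO ticks VALUES (?, ?, ?, ?)", rows)
    # The composite index lets a per-symbol time-range query avoid a full table scan.
    conn.execute("CREATE INDEX idx_ticks_symbol_ts ON ticks (symbol, ts)")
    conn.commit()
    return conn


def iter_ticks(conn, symbol, start_ts, end_ts, chunk_size=10_000):
    """Yield (ts, price, size) rows for one symbol in time order, one chunk at a time."""
    cur = conn.execute(
        "SELECT ts, price, size FROM ticks "
        "WHERE symbol = ? AND ts >= ? AND ts < ? ORDER BY ts",
        (symbol, start_ts, end_ts),
    )
    while True:
        chunk = cur.fetchmany(chunk_size)  # only chunk_size rows are fetched per call
        if not chunk:
            break
        yield from chunk


# Made-up sample data, inserted out of order on purpose.
sample_rows = [
    ("ES", 3, 1402.25, 5),
    ("ES", 1, 1402.00, 2),
    ("NQ", 2, 2701.50, 1),
    ("ES", 2, 1402.50, 7),
]
conn = build_tick_db(sample_rows)
es_ticks = list(iter_ticks(conn, "ES", 1, 3, chunk_size=1))
print(es_ticks)
```

Fetching in chunks keeps the client from materializing the whole result set at once. Whether this is fast enough for tick-level volumes depends on the database and hardware.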
 
Quote from scriabinop23:

Am I the only one who likes a SQL backend combined with Matlab on the front? Throw the db onto an SSD and properly index the thing, and the only bottleneck is bad program design.

I can do anything I want -- and parallel computing toolbox can be used to speed things up ...

I am not opposed to the idea, but all of the niceties and overhead that an RDBMS adds, such as ACID compliance, do not matter much to me, so processing gigs of data via SQL seems inefficient?

I would imagine that SQL cannot compete with flat serialized files, performance-wise.
 
Why did you leave out C#/.NET? It's extremely fast to develop in, and soon the number of libraries for C# will surpass those available in C++, if it has not already. Go to Stack Overflow: the majority of true professionals (we could define them as users with 10k+ votes or whatever) code every single day in C#, and those are the guys that drive parts of Google, the SE network, ... Unless you are developing stuff for u-hft, I highly recommend testing and evaluating C# when choosing a higher-level compiled language.

Of course if one is well versed in R/Matlab then it can be done there, but R by nature is quite slow, though there are packages out there that now handle large memory allocations, parallel computing, concurrency, and even GPU outsourcing.

Quote from NetTecture:

What is the question?

I mean, the best one is the one you know best that is suitable. PHP obviously is not suitable, Python not sure (assume no), C++ is very slow to develop in. Leaves Java.
 
So, after you vectorize your backtest, how are you going to handle conditional branches? You can basically only vectorize what is repetitive; anything else you need to loop through. By the way, I process backtesting code inside loops at a rate of about 5-6 ticks per second, which pretty much blows away any vectorized code you could write in AmiBroker, R, or Matlab. I make heavy use of concurrency and parallelization, and I run everything in C#. I have not even hit the ceiling: I could easily outsource certain matrix computations in some of my correlation strategies to a GPU. So much for C# not being up to the task.
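To make the point about conditional branches concrete, here is a small sketch in Python (not the poster's C# code) of a path-dependent rule that is awkward to express as whole-vector operations: a breakout entry followed by a trailing stop. The function `backtest_trailing_stop`, its parameters, and the price series are hypothetical and chosen only for illustration.

```python
def backtest_trailing_stop(prices, entry_lookback=3, stop_pct=0.02):
    """Loop over prices: enter on a breakout, exit on a trailing stop.

    Returns a list of (entry_index, exit_index, pnl) tuples for closed trades.
    """
    trades = []
    in_position = False
    entry_idx = 0
    entry_price = 0.0
    high_water = 0.0
    for i, price in enumerate(prices):
        if not in_position:
            # Enter when the price exceeds the high of the previous N bars.
            if i >= entry_lookback and price > max(prices[i - entry_lookback:i]):
                in_position = True
                entry_idx, entry_price, high_water = i, price, price
        else:
            # The stop level depends on the running high since entry,
            # so each step depends on state built up in earlier steps.
            high_water = max(high_water, price)
            if price <= high_water * (1.0 - stop_pct):
                trades.append((entry_idx, i, price - entry_price))
                in_position = False
    return trades


# Made-up price path: breakout at index 3, stopped out at index 6.
prices = [100.0, 101.0, 100.5, 102.0, 104.0, 103.0, 101.0, 100.0, 105.0]
trades = backtest_trailing_stop(prices)
print(trades)
```

The exit condition depends on when the entry happened and on the high since then, which is the kind of state an explicit loop handles naturally.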


Quote from comintel:

Correct.

For example:

Read all daily ES closes for 10 years into a vector.
Also compute some other vectors with the same data lagged by x days (of course do not use a loop to compute these - use a vector-aware function).

Write some functions to compute some indicators on the vector all at once, producing new vectors. These functions should not contain explicit loops.

etc.

I had some code with loops over many commodity histories that took 4 hours. I sped it up to 2 minutes once I eliminated the loops. It really takes a new mindset and skills to reformulate problems to be amenable to this style.
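The steps listed above can be sketched as follows. The post is about R, so this Python/NumPy version is only an analogue of the same style. The helper names `lag` and `sma`, the window length, and the five sample closes are assumptions for illustration, not real ES data.

```python
import numpy as np


def lag(x, k):
    """Return x shifted forward by k bars (k >= 0), padding the start with NaN."""
    out = np.full(x.shape, np.nan)
    if k < len(x):
        out[k:] = x[:len(x) - k]
    return out


def sma(x, window):
    """Simple moving average from a cumulative sum, with no explicit Python loop."""
    csum = np.cumsum(np.insert(x, 0, 0.0))
    out = np.full(x.shape, np.nan)
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out


# Made-up daily closes standing in for a real price history.
closes = np.array([10.0, 11.0, 12.0, 11.0, 13.0])

prev_close = lag(closes, 1)            # same data lagged by one day
daily_ret = closes / prev_close - 1.0  # whole-vector arithmetic
avg3 = sma(closes, 3)                  # indicator computed on the vector at once
signal = np.where(closes > avg3, 1, 0) # 1 where close is above its average, else 0
print(signal.tolist())
```

Every step operates on whole arrays, so the per-element work runs inside NumPy's compiled routines instead of an interpreted loop.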

By the way, since functional programming was mentioned, R has functional capabilities also (as do many list processing languages). It is very nice to be able to program functions.
 
Excellent points made. I do not follow all that vector hype either. Only the most basic backtest algorithms can run vectorized. Anything else that is conditional will not work. I am not saying vectorization doesn't have its place, but it should never be the outer layer of a comprehensive testing architecture.

Quote from hft_boy:

When people say R vectorization is really fast -- that's just compared to R loops, which are incredibly slow because the expression to be executed in the loop has to get interpreted every iteration. With vectorization the expression is interpreted just once and the work gets passed to an internal C loop -- nothing magical going on. I'm guessing the authors put in all this vectorization stuff because they realized that interpreted loops were way too slow to get anything done. And then they realized they could sell it as a feature -- no more loops to manage your arrays, yay!

Personally I tend to shy away from R since it's unwieldy for this kind of event based stuff (although it can be used well for getting a feel for the data), and, for me, code bases larger than like 100 lines get incredibly difficult to manage.

If your attitude is just 'whatever gets the job done' (and it's a good attitude) why don't you just use the language you are most comfortable with and roll your own? Depending on how accurate your testing needs to be and how good of a coder you are, it can be done in a few hours, and you'll be on your way to a production system. If you're looking for Rails feel then I've heard Python has some good OO / script qualities. Not my style though -- personally, I generally go with Java for sketching things out since it's easy to scale up the code base and/or port to C(++) when you need it to go.

If you're going to be doing big backtesting (GB/TB of data, optimization, etc.) you might as well put in the effort to write your infrastructure in native and get incredible speedups. If you just want to try something here and there, probably not worth it. Up to you to decide though.
 
How long does it take you, from request to completion, to pull 150 million ticks out of your SQL server without consuming more than 1-2 GB of memory at a time (keep in mind that is just for one symbol over about half a year; you may want to expand later to test a basket strategy)? Sorry, but IMHO nothing beats a binary data store or an otherwise high-performance db, and any SQL-based solution is surely neither. Not trying to criticize your solution, just recommending to keep things in perspective.
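For reference, a binary tick store with bounded memory use can be as simple as fixed-width records read back in chunks. The sketch below uses only the Python standard library. The record layout (int64 timestamp, float64 price, int32 size), the file name, and the helpers `write_ticks` and `read_ticks` are assumptions for illustration, not the poster's actual store.

```python
import os
import struct
import tempfile

# Hypothetical fixed-width record: int64 timestamp, float64 price, int32 size.
# "<" selects little-endian with no padding, so each record is 20 bytes.
TICK = struct.Struct("<qdi")


def write_ticks(path, ticks):
    """Append-free writer: serialize (ts, price, size) tuples to a flat file."""
    with open(path, "wb") as f:
        for tick in ticks:
            f.write(TICK.pack(*tick))


def read_ticks(path, chunk_records=65_536):
    """Yield ticks in file order, holding at most chunk_records records in memory."""
    with open(path, "rb") as f:
        while True:
            buf = f.read(TICK.size * chunk_records)
            if not buf:
                break
            yield from TICK.iter_unpack(buf)


# Made-up sample ticks, round-tripped through a temporary file.
ticks = [(1, 1402.25, 5), (2, 1402.50, 7), (3, 1402.00, 2)]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ES.ticks")
    write_ticks(path, ticks)
    loaded = list(read_ticks(path, chunk_records=2))
print(loaded)
```

Memory use is governed by `chunk_records` rather than by the file size, so the same reader works for a few ticks or for hundreds of millions.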

Quote from scriabinop23:

Am I the only one who likes a SQL backend combined with Matlab on the front? Throw the db onto an SSD and properly index the thing, and the only bottleneck is bad program design.

I can do anything I want -- and parallel computing toolbox can be used to speed things up ...
 