Vectorized backtesting with pandas

Gotta vectorize. I did an initial backtest in Python that looped over lots of large DataFrames. That took almost an hour to complete.

Then I vectorized everything and it ran in 4 minutes.
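For anyone wondering what that change looks like in practice, here is a minimal sketch, not the poster's actual code. It assumes a made-up moving-average crossover rule on a synthetic, seeded price series; the names `backtest_loop` and `backtest_vectorized` and the 10/50 windows are hypothetical. Both functions compute the same summed return, one bar at a time and one column at a time.

```python
import numpy as np
import pandas as pd

# Synthetic prices (seeded geometric random walk), NOT real market data.
rng = np.random.default_rng(0)
prices = pd.Series(100 * np.exp(rng.normal(0, 0.01, 2_000).cumsum()), name="close")

# Hypothetical signal: fast moving average above slow moving average.
fast = prices.rolling(10).mean()
slow = prices.rolling(50).mean()


def backtest_loop(prices, fast, slow):
    """Bar-by-bar version: earn the bar's return if the previous bar had fast > slow."""
    total = 0.0
    for i in range(1, len(prices)):
        # Comparisons with NaN (warm-up period) are False, so no position is held.
        if fast.iloc[i - 1] > slow.iloc[i - 1]:
            total += prices.iloc[i] / prices.iloc[i - 1] - 1.0
    return total


def backtest_vectorized(prices, fast, slow):
    """Same rule written as whole-column operations."""
    # Shift the signal by one bar so each return uses the previous bar's signal.
    position = (fast > slow).shift(1, fill_value=False)
    returns = (prices / prices.shift(1) - 1.0).fillna(0.0)
    return float((returns * position).sum())


loop_result = backtest_loop(prices, fast, slow)
vec_result = backtest_vectorized(prices, fast, slow)
print(loop_result, vec_result)
```

The two results should agree up to floating-point rounding. The speed gap depends heavily on data size and on what the loop body does, so time it on your own data rather than trusting any fixed ratio.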
 
Depends how you loop. Sounds like you were using iterrows instead of itertuples or iteritems (renamed to items in newer pandas). iterrows builds a new Series for every row and converts the values to a common dtype along the way, and that's ridiculously slow.
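A small sketch of the difference, using a toy DataFrame with made-up column names (`qty`, `price`). It only shows the behaviour, it is not a benchmark.

```python
import pandas as pd

# Toy data; the column names are hypothetical.
df = pd.DataFrame({"qty": [10, 20, 30], "price": [1.5, 2.0, 2.5]})

# iterrows yields (index, Series). Each row becomes a Series with a single
# dtype, so the int64 "qty" value is upcast to float alongside "price".
_, first_row = next(df.iterrows())
qty_from_iterrows = first_row["qty"]
print(type(qty_from_iterrows))

# itertuples yields lightweight namedtuples and skips the per-row Series.
total_tuples = sum(row.qty * row.price for row in df.itertuples(index=False))

# The fully vectorized form avoids the Python-level loop altogether.
total_vectorized = (df["qty"] * df["price"]).sum()

print(total_tuples, total_vectorized)
```

If you really do need a loop, itertuples is usually the cheaper of the two; if the row logic can be written as column arithmetic, the last form is the one to aim for.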
 
I don't use iterrows but nice to learn about itertuples!
 
I don't have anything to add that is Python-specific. I find optimizing code one of the great joys of programming. First, I never optimize unless I find myself saying... "This is slow. This is painful!" So first, there must be pain. Next, I look for obvious performance bottlenecks. If I am still feeling pain after that, I start to question my original design, sometimes puzzling over the problem for days. The key is that you have to be willing to scrap code you have already written. I once had to optimize some JavaScript (Node.js) code, and I told the product owner we needed to rewrite the entire application in Go and it would take 9 months. Of course I was joking... he was not amused.
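For the "look for obvious bottlenecks" step, one Python-specific option is the standard library's cProfile. A minimal sketch, where `run_backtest` and `slow_sum` are hypothetical stand-ins for whatever code is causing the pain:

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately slow pure-Python loop, standing in for a real hot spot."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_backtest():
    """Hypothetical entry point; replace with the code you want to measure."""
    return slow_sum(200_000)


# Profile only the call we care about.
profiler = cProfile.Profile()
profiler.enable()
result = run_backtest()
profiler.disable()

# Collect the report in a string, sorted by cumulative time, top 5 entries.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The functions at the top of the cumulative-time listing are the ones worth looking at first, before any redesign.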
 
Haha, Go and Node.js.
 
Fail. It doesn't matter how long your backtest takes - the only thing that matters is that you can calculate an entry signal before you need to pull the trigger when your system is live.

See how you feel after backtesting a strategy on a day's worth of raw NASDAQ feed.

OP, I use pandas extensively for backtesting and it certainly helps circumvent the shortcomings of Python. The process you outlined above is a simple and effective workflow.
 
I've been thinking of keeping bid/ask data around for experimentation. It grows at a rate of about 1 GB per week. Should do it...
 
If speed is your biggest concern... check out dask and/or pytorch. Vectorization on crack... if you have an NVIDIA CUDA-enabled GPU.

Dask distributed is also nice if you want to split your work across multiple machines.
 
Not so far... But good to know about it, thanks!
 