Backtester for C++

Same Lazy Element · Jul 31, 2020

931 said:
But I can't fully understand why midprice would be better in historical order simulation.

Good question. In both cases you will have to create a model that will get over the execution threshold, obviously, and thus have to apply transaction costs to the backtest. If you are backtesting it at mid as I am suggesting, you will know your strategy risk parameters as well as trade value (i.e. changes in target quantity), PnL per tradevalue and periodic volume participation. Taking the TCA data for your execution setup (which can be arbitrarily complex even if you are a retail trader), you will see if your strategy is making enough to overcome your expected execution cost. The benefits are (in no particular order):

being able to use a much larger set of TCA data to optimize execution. If you want to be anal, you can have your strategy also output tradevalue per intraday bucket so you can use TCA specific to the volume skew.

take into account recent changes in liquidity or microstructure. Imagine that something has changed fairly recently, but you are backtesting for the last 10 years. You can use the most recent data to figure out what the expected T-costs are currently and apply that to your full backtest.

you will be able to instantly re-evaluate the viability of each strategy (live or in-conservation) when you change your execution setup by adding new broker algos or new order types. That also includes the case where you add another strategy and start crossing positions between the two strategies when necessary.

finally, it will give you an easy set of metrics to watch if you are tweaking parameters to increase PnL/tradevalue. There are various techniques for increasing PnL per trade without overfitting the strategy, such as adding hysteresis or cost-of-risk parameters.
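
To make the metrics in the post above concrete, here is a minimal C++ sketch of a backtest marked at mid, followed by a cost check. The names (`MidBacktestStats`, `run_at_mid`, `net_pnl`), the per-step input layout and the single flat cost in bps are assumptions for illustration, not anything specified in the thread; a real TCA-based cost would likely vary by stock, time bucket and participation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Summary of a backtest run at mid, before any execution cost is applied.
struct MidBacktestStats {
    double gross_pnl = 0.0;    // PnL marked at mid, in currency units
    double trade_value = 0.0;  // sum of |change in target quantity| * mid
};

// mids[t] is the mid price at step t; target_qty[t] is the position the
// strategy wants to hold from step t until step t+1 (hypothetical inputs).
MidBacktestStats run_at_mid(const std::vector<double>& mids,
                            const std::vector<double>& target_qty) {
    MidBacktestStats s;
    double prev_qty = 0.0;
    for (std::size_t t = 0; t < mids.size() && t < target_qty.size(); ++t) {
        // Move from the previous position to the new target, filled at mid.
        s.trade_value += std::abs(target_qty[t] - prev_qty) * mids[t];
        // The new position earns the next mid-to-mid price change.
        if (t + 1 < mids.size()) {
            s.gross_pnl += target_qty[t] * (mids[t + 1] - mids[t]);
        }
        prev_qty = target_qty[t];
    }
    return s;
}

// Gross PnL per unit of traded value, in basis points.
double pnl_per_trade_value_bps(const MidBacktestStats& s) {
    return s.trade_value > 0.0 ? 1e4 * s.gross_pnl / s.trade_value : 0.0;
}

// Net PnL after charging an assumed flat expected cost on all traded value.
double net_pnl(const MidBacktestStats& s, double expected_cost_bps) {
    return s.gross_pnl - s.trade_value * expected_cost_bps * 1e-4;
}
```

The strategy clears the bar when `pnl_per_trade_value_bps` exceeds the expected cost in bps. Because the cost is applied after the run, changing the execution assumption only means calling `net_pnl` again with a different number, not re-running the backtest.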

931 · Aug 1, 2020

I guess with mid it would also be possible to collect spread info from bid-ask data and create avg spread tables for each stock, for various days and times, without actually storing all the bid-ask info.

Also, building the tables from a few years of data and still backtesting over 10 years might work... If conditions changed in recent years, it might make old data more valid.

The tables could be keyed by hour and weekday, with an avg spread for each slot?

Keying by month as well would get complex, as there are not many samples to get in a few years.

I'd guess the avg might be quite accurate this way if using an avg spread pricing table per stock, unless the edges found are low-spread related and the actual spread is higher than avg under those circumstances.

That type of table would take a lot less memory compared to full bid-ask data.
And it would be easy to compare its accuracy vs full bid-ask: make 2 versions and run them with the same parameters.

Some NN dev might make a neural net that estimates spread, not just based on time and avg values.

It's probably a good example of what would be easy for a custom backtester but hard, if not hopeless, on a proprietary one.

A simpler solution would be bid or midprice + a fixed spread per stock, but that is a bad idea imo.

931 · Aug 1, 2020

931 said:
If conditions changed in recent years, it might make old data more valid.

Actually, I think it might not be valid.
If the spread was higher before, then price would have reflected the strategies available at that time.

SteveH · Aug 1, 2020

You could use Amibroker and access the backtester through its COM interface. You won't find anything faster or more robust.

Elji · Aug 1, 2020

@thefairarbiter
Maybe you could consider running your backtests on a trading platform that runs on C++.
I would recommend Sierra: it is fast, stable, and has a C++ interface.

Same Lazy Element · Aug 2, 2020

931 said:
Actually, I think it might not be valid.
If the spread was higher before, then price would have reflected the strategies available at that time.

It would only be true if you are leveraging some form of liquidity premium or somehow exploiting the microstructure. Imagine that you have found that the new moon influences the overnight returns of the French stock market - the moon has been around forever, the French stock market has been around for a fair bit, but tight spreads have only been a thing for the last ten to fifteen years. There is no reason to assume that the effect of the moon is liquidity driven, so you can reasonably assume that your 30-40 year backtest is valid, while assuming current transaction costs.

931 said:
I guess with mid it would also be possible to collect spread info from bid-ask data and create avg spread tables for each stock, for various days and times, without actually storing all the bid-ask info.

Well, ideally if you have been trading for a while and have been saving your fills vs arrival prices, you can create a reasonable execution data-set that would reflect your specific setup (access to venues, algos, latency etc) and use that. In absence of that, I'd ask the broker for some TCA data (any broker that has some institutional presence would have that available).
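
As a sketch of what a saved fills-vs-arrival data-set could be reduced to, the C++ below computes a value-weighted average slippage in bps. The `Fill` layout and the function name are hypothetical; a real execution data-set would normally also carry venue, algo, timestamps and order size relative to volume, so that costs can be bucketed rather than averaged into one number.

```cpp
#include <cmath>
#include <vector>

// One saved execution record (hypothetical layout).
struct Fill {
    double arrival_mid;  // mid price when the order was sent
    double fill_price;   // average price actually obtained
    double quantity;     // signed: positive = buy, negative = sell
};

// Value-weighted average slippage vs arrival, in basis points.
// Positive means execution cost money relative to the arrival mid.
double avg_slippage_bps(const std::vector<Fill>& fills) {
    double cost = 0.0;
    double value = 0.0;
    for (const Fill& f : fills) {
        // Buying above arrival and selling below arrival both add cost.
        cost += f.quantity * (f.fill_price - f.arrival_mid);
        value += std::abs(f.quantity) * f.arrival_mid;
    }
    return value > 0.0 ? 1e4 * cost / value : 0.0;
}
```

The resulting number is the kind of expected execution cost that can then be charged against a backtest run at mid.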

931 · Aug 3, 2020

For constructing midprice, what type of data do you use?

Same Lazy Element · Aug 3, 2020

931 said:
For constructing midprice, what type of data do you use?

It's a tricky question, especially if you only have access to top-of-book information. For markets that are always quoted a single tick wide and have a nice thick book, I'd use the weighted mid, i.e. bid*ask_size/(ask_size+bid_size) + ask*bid_size/(ask_size+bid_size). However, for something that is frequently quoted wide I'd use a simple arithmetic mid - otherwise, someone pennying a large order will actually bias your mid the wrong way.

If you happen to have the full order book dataset, there are several clever ways to construct a probabilistic micro-mid (the most notable one is from Sasha Stoikov, it's on SSRN). I'd only engage in implementing something like that if you are locked out of Netflix and cut off from porn sites.

931 · Aug 6, 2020

Nothing to be locked out of, no Netflix or porn accounts.

Probably not doing an order-book-based sim at this stage: too much learning, data and processing.

Same Lazy Element said:
you can reasonably assume that your 30-40 year backtest is valid, while assuming current transaction costs.

For positions where the spread is a small fraction, it seems wise to use mid prices + some way to generate spreads. 2x less memory needed.

But I have a few ideas involving scalping.

In that case, could historic prices & wider spreads reflect opportunities that were unavailable then and are unrealistic with current spreads, as price could also be affected by spreads?

Penny stocks have gigantic spreads, perhaps prices+spreads reflect opportunities.

Algokd · Mar 23, 2021

Same Lazy Element said:
Pretty much every open source backtesting engine I've looked at is not really suitable for hard-core use. That includes various venture sponsored projects like Lean or Zipline, as well as hobby projects like backtrader. Because most developers (and startup founders) lack actual quantitative trading experience, they have built features that are kinda useless and left out features that are a must have, at least in the institutional setting. @globalarbtrader's https://github.com/robcarver17/pysystemtrade is probably the closest I've seen to an institutional-quality product, but it's geared toward a very specific type of strategy.

Sorry to resurrect this thread, but I'm curious for your opinion on why the open source solutions are unsuitable for hardcore use. What are some of the features you find lacking in them?