Backtester for C++

guru · Jul 29, 2020

160 backtesting projects on GitHub, but only 4 of them are in C++. There are also only 4 in C#, which I use:
https://github.com/search?q=stock+backtest&type=Repositories

Same Lazy Element · Jul 29, 2020

thefairarbiter said:
First the status of my own work.

At my shop, we have developed and use both a separate C++ engine for backtesting high frequency strategies (though it's a bit of a tricky game and is a "glimpse" at best) and a separate Python backtest engine for lower turnover strategies. Sadly I can't share or contribute, but I can give my (biased but professional) opinion on the state of backtesting engines in the open source space as well as opine on what you are trying to build.

Pretty much every open source backtesting engine I've looked at is not really suitable for hard-core use. That includes various venture sponsored projects like Lean or Zipline, as well as hobby projects like backtrader. Because most developers (and startup founders) lack actual quantitative trading experience, they have built features that are kinda useless and left out features that are a must have, at least in the institutional setting. @globalarbtrader's pysystemtrade (https://github.com/robcarver17/pysystemtrade) is probably the closest I've seen to an institutional-quality product, but it's geared toward a very specific type of strategy.

There are two distinct types of backtesting engines. A fixed interval (bar-based) backtester creates a target position based on a snapshot of the market at some given point in time. An event-based backtester processes updates to the market state as they arrive. The latter usually consumes tick data and lets you build a variety of market state simulations and latency responses, which makes it suitable for simulating higher-turnover intraday strategies. The former takes bar data and allows for more complex computations, and is thus more suitable for lower-turnover statistical strategies. It's very difficult to combine these two in a single product and there really is no reason for it - decide on one or the other. Unless you have real life LL development experience, I'd avoid anything order-book based.
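
To make the distinction concrete, here is a minimal Python sketch of the two loop shapes. It is only an illustration under simple assumptions: `Bar`, `run_bar_backtest`, `run_event_backtest` and the handler signature are hypothetical names, not taken from any engine mentioned in this thread, and PnL is marked on bar closes with no costs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Tuple


@dataclass
class Bar:
    time: int      # bar index or timestamp (assumed integer here)
    close: float   # price used to mark the position


def run_bar_backtest(bars: Iterable[Bar],
                     target_fn: Callable[[List[Bar]], float]) -> float:
    """Fixed-interval loop: after each bar, ask the strategy for a target
    position based on the snapshot of history seen so far. The position
    decided on bar t earns the price change from bar t to bar t+1."""
    position = 0.0
    pnl = 0.0
    prev_close = None
    history: List[Bar] = []
    for bar in bars:
        if prev_close is not None:
            pnl += position * (bar.close - prev_close)
        history.append(bar)
        position = target_fn(history)   # new target, held over the next bar
        prev_close = bar.close
    return pnl


Event = Tuple[int, str, Any]  # (timestamp, kind, payload), an assumed layout


def run_event_backtest(events: Iterable[Event],
                       on_event: Callable[[Dict, int, str, Any], None]) -> Dict:
    """Event-driven loop: replay market updates in time order and let the
    handler mutate the simulated state as each update arrives."""
    state: Dict = {}
    for ts, kind, payload in sorted(events, key=lambda e: e[0]):
        on_event(state, ts, kind, payload)
    return state
```

The bar loop only ever sees completed snapshots, which keeps the strategy code simple. The event loop hands every update to the handler, which is where latency and market state modelling would have to live.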

A lot of what you are trying to build does not belong in the backtesting engine. Indicators are part of your alpha and should live in a separate, unrelated library. The state of your account and broker should not be part of your backtesting process. Also, do not try to combine backtesting engine with actual live execution, it's silly and will give you more problems than benefits.

Unless you are building an event-driven tool to test HFT strategies, do not bother with simulating the fills and trading costs. The general idea for lower-turnover strategies is to backtest at mid, establish your PnL/tradevalue and use various portfolio formation/trade reduction techniques to increase it until you get above your expected cost of execution. I.e. you are breaking TCA and alpha into two separate threads and can use TCA from your prior experience instead of building uncertain costs of execution into the actual backtest.
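
As a rough illustration of the PnL/tradevalue idea, the sketch below marks everything at mid and reports the edge in basis points of traded value, which can then be compared against a cost estimate coming from a separate TCA process. The function names and the cost figure are assumptions made for the example, not a standard API.

```python
from typing import Sequence


def pnl_per_trade_value_bps(mids: Sequence[float],
                            positions: Sequence[float]) -> float:
    """positions[i] is the position (in shares) held from mids[i] to
    mids[i + 1]. Trades are assumed to happen at mids[i], with no costs.
    Returns PnL per unit of traded value, in basis points."""
    pnl = 0.0
    traded_value = 0.0
    prev_pos = 0.0
    for i, pos in enumerate(positions):
        traded_value += abs(pos - prev_pos) * mids[i]   # value traded at mid
        if i + 1 < len(mids):
            pnl += pos * (mids[i + 1] - mids[i])        # mark-to-mid PnL
        prev_pos = pos
    if traded_value == 0:
        return 0.0
    return 1e4 * pnl / traded_value


def clears_cost(edge_bps: float, expected_cost_bps: float) -> bool:
    """expected_cost_bps is an external estimate (e.g. from prior TCA),
    deliberately kept out of the backtest itself."""
    return edge_bps > expected_cost_bps


# Usage: buy 10 shares at 100, sell them at 101, stay flat afterwards.
edge = pnl_per_trade_value_bps([100.0, 101.0, 102.0], [10.0, 0.0, 0.0])
worth_pursuing = clears_cost(edge, expected_cost_bps=5.0)  # 5 bps is made up
```

If the edge does not clear the cost estimate, the suggestion above is to reduce trading (for example through portfolio formation or trade reduction) rather than to bolt a fill simulator onto the backtest.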

It makes sense to spend a lot of time and effort on backtest analysis, especially on turnover, pnl/trade analysis, drawdowns (both depth and length) and market correlation. That said, I am not sure it makes sense to write all that in C++ or export the backtest into a separate analysis tool. We have built a pretty Excel tool for it and I am using it for both HF and LF strategies.
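
Two of these statistics are easy to sketch in plain Python. The definitions below are one reasonable choice among several (drawdown length is counted as the longest run of periods spent below a previous equity peak), and the function names are made up for the example.

```python
import math
from typing import Sequence, Tuple


def max_drawdown(equity: Sequence[float]) -> Tuple[float, int]:
    """Return (depth, length): the largest drop from a running peak, and
    the longest number of consecutive periods spent below a prior peak."""
    peak = equity[0]
    depth = 0.0
    longest = 0
    current = 0
    for value in equity:
        if value >= peak:
            peak = value        # new high: any drawdown is over
            current = 0
        else:
            current += 1
            depth = max(depth, peak - value)
            longest = max(longest, current)
    return depth, longest


def correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation, e.g. of strategy returns against market
    returns. Returns 0.0 if either series has no variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)
```

Both functions take plain sequences, so the same numbers can be computed on a backtest and on live results and then compared side by side.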

Make it easy to store backtests together with the relevant version of the alpha code and all of the parameters (we dump the whole thing into Mongo, for example, with code and parameters). Create a tool that allows you to read and compare multiple backtests graphically, as well as allows you to compare your backtest with live trading results. As you start running multiple live strategies and start making changes/revisions, you will appreciate these features.
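
A minimal sketch of such a record is shown below. It uses plain JSON as a stand-in for a document store like Mongo, and the record layout and function names are assumptions for illustration only; the point is simply that code, parameters and results travel together.

```python
import hashlib
import json
from typing import Dict, Sequence


def make_backtest_record(alpha_source: str,
                         params: Dict,
                         daily_pnl: Sequence[float]) -> Dict:
    """Bundle the alpha code, its parameters and the results in one
    document, keyed by a hash of the code so revisions are easy to tell
    apart."""
    return {
        "code_sha256": hashlib.sha256(alpha_source.encode("utf-8")).hexdigest(),
        "code": alpha_source,
        "params": params,
        "daily_pnl": list(daily_pnl),
    }


def to_json(record: Dict) -> str:
    """Serialize a record; sort_keys keeps the output stable for diffing."""
    return json.dumps(record, sort_keys=True)


def from_json(text: str) -> Dict:
    return json.loads(text)


def total_pnl_difference(a: Dict, b: Dict) -> float:
    """Simplest possible comparison between two stored runs, e.g. a
    backtest and the matching live period."""
    return sum(a["daily_pnl"]) - sum(b["daily_pnl"])
```

The same document shape could be inserted into a database instead of being written as JSON text; only the storage call would change.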

thefairarbiter · Jul 29, 2020

First, awesome post. Thank you.

I think the larger narrative you're conveying here is that the professional world of quantitative finance has a much harder time building a backtester that can actually emulate live trading. I totally believe that. But I'm not convinced that an amateur, do-it-yourself'er like me should be quite as concerned with the constraints of strategies like yours. In short, I've tried to build a live trading simulator as a backtester (hence the account and broker and stuff), and I believe that the strategies and technical implementations of my IB paper trader are simple enough for this to be possible. Let me elaborate:

Same Lazy Element said:
A fixed interval (bar-based) backtester creates a target position based on a snapshot of the market at some given point in time

I definitely fall into this category. This kind of backtesting is simple, and it's really just an abstraction of a live trader, but with pre-existing data. Hence my integration of the backtesting framework into a live trading application. Wouldn't you say that having a simple setting like this minimizes the drawbacks that I think you're talking about?

Same Lazy Element said:
Also, do not try to combine backtesting engine with actual live execution, it's silly and will give you more problems than benefits

Touched on this above. I suspect that this piece of advice is from professional experience at a real investment firm with complicated strategies. I imagine that the strategies you're trying to make are difficult to simulate in the first place, and may not be possible at all unless you start going crazy with randomization algorithms, since the latency of your data stream has a high variance in proportion to the mean (whereas my simple model has a variance in much lower proportion to the mean). Milliseconds just don't matter in the latter case, but matter a lot in the former. Is this the main motivation of your perspective here?

Same Lazy Element said:
backtest analysis, especially on turnover, pnl/trade analysis, drawdowns (both depth and length) and market correlation.

This is a great insight. I believe what you're trying to say is that the backtester is really just supposed to tell you what your strategy's expectations (as in probability) are in terms of trade volume, trade profit, available capital, and relative performance, for a given piece of data in a known market setting. It will also make 1:1 comparisons like you mentioned easy. This kind of approach will definitely make its way into my current project.

Same Lazy Element said:
Unless you have real life LL development experience, I'd avoid anything order-book based.

Order book-based (synonym for depth-of-book right, or L2 data right?) is supposed to mean HF strategies? Also, LL development?

Two more questions: how many trades a day does your typical "low frequency" strategy make? How many for the high frequency strategies?

traider · Jul 30, 2020

thefairarbiter said:
I checked out Lean. Super awesome project! I'd heard of quantconnect before starting on building a backtester. I didn't really look into them because at the time, I didn't want to learn C# and I thought that developing a C++ backtester like BackTrader would only take like one summer. The codebase they've developed looks massive, and it's probably supposed to support literally any kind of strategy. That's not what I'm going for. The strategies my backtester is supposed to support have a lot to do with statistical signal processing and discrete events (hence the time map). This system has been especially useful when running intra-day strategies on lots of data lines from IB. As you could probably guess, that SP has a lot of FFT and linear algebra in it, which plenty of C++ APIs will easily integrate into.

Accurate. Implementing a backtester like Lean is simply intractable for a single shmuck like myself. But I don't want the kind of generality that they're going for. My backtester will have enough stuff to support data-driven strategies for momentum, volatility, and others. The detailed stuff like reading data will come as it needs to.

If enough people care about a C++ backtester and I'm able to recruit interested parties, then one day a C++ backtester like theirs could be possible.

A valuable perspective to consider. I'm open to any kind of strategy that works, but according to my reading, volatility bets and momentum strategies can work well with sufficient signal processing. I'm definitely going to see what quantconnect/lean has to offer and learn some C# in the near future though. They seem to support a lot of broker APIs, and I like the idea of being able to move between brokers if the current one isn't being cooperative.

Just to confirm @traider, you're not interested in developing a C++ backtester?

I don't see any advantage to developing non-HFT strategies in C++, especially when Python can easily do the job much faster. Also, as your ideas become more complex and require machine learning, it will be very tough to implement this in C++.
IB has a Python API, so you might want to check that out.

Market_Diver · Jul 30, 2020

You don't need C++ for writing a backtester.
Python is much faster to develop in, and it's easier thanks to the ample third-party libraries available.

C++ should only be used for trade execution.

Same Lazy Element · Jul 30, 2020

thefairarbiter said:
Order book-based (synonym for depth-of-book right, or L2 data right?) is supposed to mean HF strategies? Also, LL development?

For the most part, yes, you'd use the full order book (at least a few levels away from the touch) to figure out order book pressures, overhangs etc. It's HFT's bread and butter. LL = low latency. Some trade/book pressure strategies would only use top of the book, especially for situations where you are doing it across related products.

thefairarbiter said:
Two more questions: how many trades a day does your typical "low frequency" strategy make? How many for the high frequency strategies?

I think it's best to think of this in turnover terms (since my target positions are sliced for execution, sometimes I get a lot of trades that are not really "trades"). Stuff that turns over from once a day and higher is "non-HFT" for the purposes of this discussion. To be honest, even higher turnover strategies can be simulated well enough using bar data (e.g. secondly bars) as long as you make some assumptions about the market microstructure.
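
For readers unfamiliar with the idea, here is one way to build secondly bars from tick data so that a bar-based engine can consume them. It is a sketch under simple assumptions: ticks arrive as (timestamp in seconds, price) pairs already sorted by time, and the function name and bar layout are hypothetical.

```python
import math
from typing import Dict, Iterable, Tuple


def ticks_to_second_bars(
        ticks: Iterable[Tuple[float, float]]) -> Dict[int, Dict[str, float]]:
    """Aggregate time-sorted (timestamp_seconds, price) ticks into
    one-second open/high/low/close bars, keyed by the whole second."""
    bars: Dict[int, Dict[str, float]] = {}
    for ts, price in ticks:
        second = math.floor(ts)
        bar = bars.get(second)
        if bar is None:
            # First tick in this second opens the bar.
            bars[second] = {"open": price, "high": price,
                            "low": price, "close": price}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price   # last tick seen so far in this second
    return bars
```

Seconds without any ticks simply have no bar here; whether to forward-fill them is one of the microstructure assumptions the strategy would have to make explicit.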

thefairarbiter said:
This is a great insight. I believe what you're trying to say is that the backtester is really just supposed to tell you what your strategy's expectations (as in probability) are in terms of trade volume, trade profit, available capital, and relative performance, for a given piece of data in a known market setting. It will also make 1:1 comparisons like you mentioned easy. This kind of approach will definitely make its way into my current project.

The best way to think of your backtest is that it's a Tinder profile. The ones that look ugly you swipe left right away and that's that. The ones that look passable (depending on how desperate you are) you swipe right, only to frequently discover flaming pimples and hairy armpits. Whenever that happens, you always wish that you had more pictures beforehand. Similarly, the task of a good backtesting framework is to show you the potential issues with the strategy in every way possible. Sometimes it's possible to fix these issues without overfitting and go on. Sometimes you drop the idea altogether after seeing these issues.

931 · Jul 31, 2020

thefairarbiter said:
Are you interested in sharing code @931?

If we can find common ground for future plans maybe develop some aspects together?

I would not think of open sourcing in this type of competitive field.

I had to cut corners, like documentation, to do things quicker, but it still took years.

I also use common timings for all instruments to save on memory and get faster seek times if accessing instruments by timing.

But also keep tick file position references to do accurate simulation of orders.

In developing a backtester I'd recommend implementing bid and ask data, at least for order simulation, if working on lower timeframes or penny stocks.

The right chart, for example, is displaying bid-ask bars of AAPL.
White is the bid, the center of the bar is shared, etc.
Which tone you focus on determines whether you see the bid or the ask.

With visible spreads like these it's important to use both in simulation IMO.

931 · Jul 31, 2020

Market_Diver said:
You don't need C++ for writing a backtester.

If you are using Python ML libs, then those are C anyway.

But if a large portion of the bottleneck code runs in Python, then the slow Python runtime could slow development, for example if model generation takes days.

In this scenario C++ both runs faster and allows faster development.

Same Lazy Element · Jul 31, 2020

931 said:
In developing a backtester I'd recommend implementing bid and ask data, at least for order simulation, if working on lower timeframes or penny stocks.

I would say exactly the opposite, unless you are dealing with very short holding times or doing any sort of market making. Avoid using bid/ask spread, order simulation or simulating fills directly in the backtest process and simulate everything at mid, while maintaining good alpha/trade and volume participation statistics.

Execution is highly uncertain and you would rather separate it into your implementation analysis that includes TCA/volume/slicing etc. This way you can also optimize execution holistically instead of doing it on per-strategy basis.

931 · Jul 31, 2020

Same Lazy Element said:
I would say exactly the opposite

Without spreads it's much easier to find too much unrealistic opportunity IMO.

Especially in penny stocks, where spreads are enormous. Or even in low-spread S&P 500 names if going to lower timeframes.
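
The size of the effect is easy to see with a toy round trip. The sketch below compares the same long trade filled at mid versus filled by crossing the spread (buy at the ask, sell at the bid). The quotes are invented numbers and `round_trip_pnl` is a hypothetical helper, not part of any poster's code.

```python
def round_trip_pnl(entry_bid: float, entry_ask: float,
                   exit_bid: float, exit_ask: float,
                   qty: float, use_mid: bool) -> float:
    """PnL of buying qty at entry and selling it at exit.
    use_mid=True fills both legs at the mid price; use_mid=False crosses
    the spread (buy at the ask, sell at the bid)."""
    if use_mid:
        buy_price = (entry_bid + entry_ask) / 2
        sell_price = (exit_bid + exit_ask) / 2
    else:
        buy_price = entry_ask
        sell_price = exit_bid
    return qty * (sell_price - buy_price)


# A penny stock quoted 1.00/1.04 that moves up to 1.02/1.06 (made-up quotes).
at_mid = round_trip_pnl(1.00, 1.04, 1.02, 1.06, qty=1000, use_mid=True)
crossing = round_trip_pnl(1.00, 1.04, 1.02, 1.06, qty=1000, use_mid=False)
```

With these quotes the trade makes money at mid and loses money when the spread is crossed on both legs, which is the concern raised here. The opposing view in this thread is that this cost belongs in a separate execution analysis rather than in the backtest itself.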

Same Lazy Element said:
Execution is highly uncertain and you would rather separate it into your implementation analysis that includes TCA/volume/slicing etc. This way you can also optimize execution holistically instead of doing it on per-strategy basis.

Generating signals and simulation can be separated, and it's great for rendering tests fast.
But even at this stage you could use a bid-ask strength multiplier, or something that pulls the fill toward the mid price or further, using both bid and ask data for (possibly) more accurate results.

I am thinking about retail level spreads...
It takes a lot to even overcome those on lower timeframes IMO.
That's why I include them in tests.

For algos I use mid in many places.
But I can't fully understand why mid price would be better in historical order simulation.

So far I have gone from bid -> mid -> bid-ask simulation,
while keeping backward compatibility in the form of preprocessor macros to disable ask data.

Using mid + simulated spread would make it easier, but it does not seem realistic if using SL & TP.