Storing trading strategy logic

Crikey you're brave. I would shudder with terror at the thought of running an 'algo' which wasn't comprehensible to any human being, and with that amount of data needed to describe it I'd be seriously concerned about overfitting.

On topic, I've tried various ways of doing this in the past. It makes most sense for a large organisation, where you might want to limit the amount of new code being written, all of which needs to be tested and understood all over again. For an individual, I don't mind specifying my strategy entirely in code, which during the development phase is natural anyway. Once you've 'locked down' the design you don't need the flexibility that the storage option offers you; if anything it's a temptation to change or meddle with the strategy. I have a very small number of additional parameters, which can easily live in a small database.


Global,

One of the downsides of human comprehensibility is the required dumbing down of what is a very complex process. The human mind can deal efficiently with 7 to 10 components/factors in a situation requiring evaluation or judgement. This is probably because for most of the last 5,000,000 years of our evolution life was pretty simple: is that cave going to be damp in the winter? Is that creature good to eat, or will it try to eat me? Is that life form good to have sex with, or not... not a lot of choices or complexity.

So we are for the moment stuck with a brain that when given a domain with hundreds or thousands of interrelated factors, like financial markets, can't do much until it groups/summarizes/ignores or in some way whittles the whole thing down to 7 to 10 factors before proceeding. For more info read any good text on cognitive neuroscience.

However, we are not stuck with our brain. Software exists that can model 3,000, 10,000, ... variables at the same time without the summarizing, fuzzifying process that would be required in a human brain.

Overfitting is a problem for folks who play with powerful software without the required skill, knowledge or training... like letting middle schoolers play with shoulder-fired surface-to-air missiles... they are as likely to shoot up the schoolhouse as take down an enemy aircraft.

In very simple terms, overfitting is avoided by rigidly segregating the learning, training and test data sets during model development. Definitions:

Learning Set: time series used to postulate and discover model components, interrelationships, functions and factors.
Training Set: time series used to train the model by weighting component interrelationships.
Test Set: a previously unseen time series to which the trained model is applied to determine likely real-world performance.
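
A minimal sketch of the three-way segregation described above, assuming the data is a single time-ordered series and that a plain chronological split is acceptable. The function name `split_time_series` and the 50/25/25 fractions are illustrative assumptions, not something the post prescribes.

```python
from typing import Sequence, Tuple


def split_time_series(
    series: Sequence[float],
    learning_frac: float = 0.5,
    training_frac: float = 0.25,
) -> Tuple[Sequence[float], Sequence[float], Sequence[float]]:
    """Split a time-ordered series into learning, training and test sets.

    The split is chronological (no shuffling), so the test set is the
    most recent data and is never touched during model development.
    The default fractions are illustrative, not recommendations.
    """
    if not 0 < learning_frac < 1 or not 0 < training_frac < 1:
        raise ValueError("fractions must be between 0 and 1")
    if learning_frac + training_frac >= 1:
        raise ValueError("fractions must leave room for a test set")

    n = len(series)
    learning_end = int(n * learning_frac)
    training_end = learning_end + int(n * training_frac)

    learning = series[:learning_end]             # discover model structure
    training = series[learning_end:training_end]  # fit the weights
    test = series[training_end:]                  # unseen, evaluate once
    return learning, training, test


prices = list(range(100))  # stand-in for a real price series
learning, training, test = split_time_series(prices)
```

The point of keeping the slices contiguous is that nothing from the test period can leak backwards into the data used to design or fit the model.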
 
I like having Alpha.[ModelName] for namespace organization. All the alpha models exist as their own class (with subclasses if needed). I like it because:

- I get "full" functionality to create a model. Whatever is possible in the language/platform/framework, I can use.

- Alpha models are isolated from each other. They'll re-use functionality from other areas (execution, transaction cost estimators, risk, etc), or subclass/polymorph if they need something custom.

- Bringing alpha models online and offline is relatively trivial.

- Big alpha models and little ones can exist peacefully together.

- By having alpha models grouped into the same application, it's pretty easy to say "strat x isn't allowed to open positions on an instrument that strat y already is working with", etc.
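
A rough sketch of how the points above could fit together, assuming one class per alpha model and a single host object that owns them all. Every name here (`AlphaModel`, `AlphaHost`, `try_open`, the placeholder price rules) is hypothetical and only illustrates the idea; it is not the poster's actual code.

```python
from abc import ABC, abstractmethod
from typing import Dict


class AlphaModel(ABC):
    """Base class; each concrete alpha model lives in its own subclass."""

    name = "base"

    @abstractmethod
    def wants_position(self, instrument: str, price: float) -> bool:
        """Return True if the model wants to open a position."""


class Momentum(AlphaModel):
    name = "momentum"

    def wants_position(self, instrument: str, price: float) -> bool:
        return price > 100.0  # placeholder rule, purely illustrative


class MeanReversion(AlphaModel):
    name = "mean_reversion"

    def wants_position(self, instrument: str, price: float) -> bool:
        return price < 120.0  # placeholder rule, purely illustrative


class AlphaHost:
    """Holds all alpha models in one application and arbitrates instruments."""

    def __init__(self) -> None:
        self._models: Dict[str, AlphaModel] = {}
        self._owner: Dict[str, str] = {}  # instrument -> owning model name

    def bring_online(self, model: AlphaModel) -> None:
        self._models[model.name] = model

    def take_offline(self, name: str) -> None:
        self._models.pop(name, None)
        # Assumption: an offline model releases the instruments it held.
        self._owner = {i: m for i, m in self._owner.items() if m != name}

    def try_open(self, name: str, instrument: str, price: float) -> bool:
        model = self._models.get(name)
        if model is None:
            return False  # model is offline
        owner = self._owner.get(instrument)
        if owner is not None and owner != name:
            return False  # another model is already working this instrument
        if not model.wants_position(instrument, price):
            return False
        self._owner[instrument] = name
        return True


host = AlphaHost()
host.bring_online(Momentum())
host.bring_online(MeanReversion())
first = host.try_open("momentum", "SPY", 110.0)         # momentum takes SPY
blocked = host.try_open("mean_reversion", "SPY", 110.0)  # SPY is already taken
other = host.try_open("mean_reversion", "QQQ", 110.0)    # QQQ is free
```

Because the ownership map lives in the host rather than in any one model, the "strat x may not trade what strat y is working" rule needs no cooperation between the models themselves.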
 
You do not need to describe your strategy in script code. But you can write scripts in the language of your choice, store them as strings and have them evaluated via reflection. C#, like many other languages, offers a variety of options (Roslyn, ScriptCS, CSScript, ...).

I am curious how you guys store the logic (the algorithm) of your strategies.

Previously I wrote the logic directly into the source code of the program. But why not use a common format like JSON to store the logic? That way you can share development with other developers without giving away your valuable secrets and IP.

The logic can be loaded into backtesting or live trading engine for execution.
Multiple strategies can be traded at the same time with different allocations.

What do you think about using JSON or XML for this?

Example of an EMA crossover system with a fixed 6% profit target and a fixed 2% stop loss (SL):

Code:
{
    "strategy_name": "EMA 20 Crossover 1",
    "instrument": "SPY",
    "timeframe": "5 minutes",
    "signal": {
        "long": {
            "entry": "price > ema(20)",
            "exit": "price >= entry * 1.06 || price <= entry * (1-0.02)"
        },
        "short": {
            "entry": "price < ema(20)",
            "exit": "price <= entry * (1-0.06) || price >= entry * 1.02"
        }
    }
}
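
The JSON above only becomes a strategy once something interprets its rule strings. The following is a minimal sketch of such an interpreter, assuming the rule syntax is close enough to Python that `||` and `&&` can simply be translated to `or` and `and`. The helper names (`compile_rule`, `ema`, `ema_of_history`) and the sample prices are hypothetical, and `eval` is used only for brevity: it is not safe for rule strings from untrusted sources.

```python
import json
from typing import Callable, List

STRATEGY_JSON = """
{
    "strategy_name": "EMA 20 Crossover 1",
    "instrument": "SPY",
    "timeframe": "5 minutes",
    "signal": {
        "long": {
            "entry": "price > ema(20)",
            "exit": "price >= entry * 1.06 || price <= entry * (1-0.02)"
        },
        "short": {
            "entry": "price < ema(20)",
            "exit": "price <= entry * (1-0.06) || price >= entry * 1.02"
        }
    }
}
"""


def ema(prices: List[float], period: int) -> float:
    """Exponential moving average, seeded with the first price."""
    alpha = 2.0 / (period + 1)
    value = prices[0]
    for p in prices[1:]:
        value = alpha * p + (1 - alpha) * value
    return value


def compile_rule(expression: str) -> Callable[..., bool]:
    """Turn a rule string into a callable taking the rule's variables."""
    python_expr = expression.replace("||", " or ").replace("&&", " and ")
    code = compile(python_expr, "<rule>", "eval")

    def rule(**names: object) -> bool:
        # Builtins are removed, but this is still NOT a real sandbox.
        return bool(eval(code, {"__builtins__": {}}, names))

    return rule


config = json.loads(STRATEGY_JSON)
long_rules = config["signal"]["long"]
entry_rule = compile_rule(long_rules["entry"])
exit_rule = compile_rule(long_rules["exit"])

history = [100.0, 101.0, 102.0, 103.0]  # made-up prices for illustration


def ema_of_history(period: int) -> float:
    return ema(history, period)


enter_long = entry_rule(price=history[-1], ema=ema_of_history)
take_profit = exit_rule(price=106.5, entry=100.0)
keep_holding = not exit_rule(price=101.0, entry=100.0)
```

Note how much still lives in code: the indicator, the meaning of `price` and `entry`, and the order handling are all outside the JSON.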
 
You can store the whole core logic as a script in a string and have it compiled via reflection or the other options I outlined above. The benefit is that it is human readable and can be easily swapped in and out; you can even change the core logic in a text editor, load it in as a string and have it evaluated with the changes. Obviously, this would not be an option if you cared about code theft. In that case you would develop the core strategy inside a library and obfuscate it. I do not understand the original motivation of the OP, but the choices are endless either way.
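
The post refers to C# scripting hosts; as a language-neutral illustration, here is a small Python analogue of the same idea using the built-in `compile` and `exec`. The script text, the `on_bar` entry point and the `load_strategy` helper are all assumptions made up for this sketch, and executing arbitrary strings like this is only reasonable for scripts you trust.

```python
from typing import Callable, Dict

# Strategy logic kept as plain text; it could equally come from a file or DB.
STRATEGY_SOURCE = '''
def on_bar(price, moving_average):
    # Return a signal for a single bar.
    if price > moving_average:
        return "buy"
    if price < moving_average:
        return "sell"
    return "hold"
'''


def load_strategy(source: str, entry_point: str = "on_bar") -> Callable[..., str]:
    """Compile a script string at run time and return its entry function."""
    namespace: Dict[str, object] = {}
    exec(compile(source, "<strategy>", "exec"), namespace)
    func = namespace.get(entry_point)
    if not callable(func):
        raise ValueError(f"script does not define a callable {entry_point!r}")
    return func


strategy = load_strategy(STRATEGY_SOURCE)
signal = strategy(price=101.0, moving_average=100.0)

# Swapping logic at run time: edit the text, then load it again.
edited_source = STRATEGY_SOURCE.replace('"buy"', '"strong_buy"')
strategy = load_strategy(edited_source)
edited_signal = strategy(price=101.0, moving_average=100.0)
```

The trade-off is the one the post names: the script is readable and hot-swappable, which also means anyone with access to the text has the strategy.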

I consider your JSON to be trading parameter configuration (not code or logic). You would still need code and logic to use that configuration.
 
In the interests of balance and in praise of the simpler approach, I will share a few anecdotes from my professional experience of systematic trading with multi-billion-dollar portfolios.

The first story occurs in 2011, when as you'll recall there was an earthquake in Japan. On Friday there hadn't been much news, but by Sunday it was clear the damage was much deeper than expected. I was part of a group of PMs each covering a different asset class, and we had a conference call with the CEO and Chief Risk Officer to decide what to do next, depending on how the Japanese markets opened on Monday.

Some of the PMs had absolutely no idea how their models would react conditional on price changes, because the complex non-linear interactions were just way beyond what someone could work out intuitively or on the back of an envelope. When the markets opened, in some cases with very large moves, the non-linear models did some really strange things. We completely lost trust in the fancy models. After that a decision was taken to strip the models back to their simplest elements. We lost about 5% of the in-sample performance, but probably almost nothing.

The second story concerns a really smart recent PhD who we interviewed for a job. After telling us about some fancy fitting technique he'd used on some data in his thesis, he was asked the killer question: how many degrees of freedom? He didn't know. When he eventually worked it out it was obvious that the model was horribly overfitted. A relatively stupid second-year undergraduate using simple statistical techniques would have been able to avoid this mistake.

The final story is about a really smart PhD who we did hire. He spent about 6 months with some really complex fitting tools, and managed to come up with a six-parameter non-linear model. Nobody could understand it, or predict exactly what it would do given a particular price movement, and it was 99% correlated to the much simpler model we already had. I left the shop shortly afterwards, so I can't tell you how that story pans out.
 
And what is the relationship to the topic of this thread?

o_O o_O o_O

 
Agreed with some. I use injection (visitor pattern), but I think simple is best: leave the logic in the code. I realize you can eval the JSON or XML, but then why stop there? You could also write a DSL, dink around with LLVM, or build a visual tool that links logic together.
 
In the interests of balance and in praise of the simpler approach, I will share a few anecdotes from my professional experience of systematic trading with multi billion dollar portfolios.

The first story occurs in 2011, when as you'll recall there was an earthquake in Japan.


So your models could not handle a Black Swan event? Why not just develop separate models for a large earthquake, a very large earthquake, a very very large earthquake with tsunami, an asteroid impact destroying 1/5 of the Earth's surface, an outbreak of zombies and so on? We have them. On slow days I love to play around with them by entering event conditionals like: tidal waves destroy London and Tokyo AND a zombie outbreak kills everyone in New Jersey.
 
Not sure I follow what you are saying.

I simply stated that the simplest way to swap in and out core strategy code during run-time and have it human readable is via script. Maybe the following as one of many choices makes it clearer what I tried to say:

http://www.csscript.net/help/script_hosting_guideline_.html
http://scottksmith.com/blog/2013/05/08/getting-started-with-scriptcs/
http://blogs.msdn.com/b/csharpfaq/archive/2011/12/02/introduction-to-the-roslyn-scripting-api.aspx


 