Developing "Spartan"

globalarbtrader · Nov 15, 2019

HobbyTrading said:
I am running a futures trading system largely based on the book you referred to. However, I did not use the Python code he provided, but wrote the entire code in Java. Although the core strategy is still in place, many of the practical aspects are rather different from how he implemented it.
I think that there are "two schools of thought". One school says that if you are convinced of a certain trade you should go full into it, or with a fixed portion of your account value (you often see "1% of account value", or statements like that).
The second school subscribes to the idea that your position size should be adjusted to (a) the riskiness of the trade and (b) how convinced you feel about this trade. This results in fading into, and fading out of, positions.
I like the second school of thought because it also enables a way to compare trade possibilities between various instruments. Which instrument has a higher conviction? Which instrument carries a higher risk?

Sorry I'm gradually reading through this thread from both ends :-)

It's important to say we have to separate out two components here: the first is 'how large should a trade be given how risky it is, my account size, and my risk target'; and the second is 'should I adjust my trade size given a forecast'. I'm not dogmatic about always doing the second part of this, cf. the fixed forecasts in Systematic Trading and the 'starter system' in Leveraged Trading (which only has 'all in' trades with a fixed risk, which get closed by a stoploss).

But I firmly believe you should properly size positions for risk, independent of whether you are doing the forecast adjustment thing. And that means using a fixed % of your account value is wrong, and potentially dangerous. The correct % of your account value to use will depend on:

- risk of the instrument (depending on whether 1% is based on the exposure you are taking or the risk you are taking - if the latter you can ignore this)
- forecast horizon (the faster you trade, the smaller the % risk on each trade)
- risk target
- number of instruments traded and any expected diversification benefits

Summing all these up might give you 1% as the right answer, but it's unlikely...
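
To make the list concrete, here is a minimal sketch of volatility-based sizing in Python. The function name, the parameter names, the example numbers and the simple scaling are all assumptions for illustration, not code from either book, and forecast horizon is left out for brevity. The point is that the share of the account you end up holding falls out of the inputs rather than being fixed in advance.

```python
def size_position(capital, annual_risk_target, instrument_weight,
                  diversification_multiplier, price, contract_multiplier,
                  annual_vol, forecast_scalar=1.0):
    """Hypothetical helper: contracts to hold for a volatility-targeted position.

    annual_risk_target and annual_vol are fractions (0.20 means 20% a year).
    forecast_scalar is 1.0 for a fixed-size trade; scale it up or down
    if you adjust size for conviction.
    """
    # Annual cash risk budget handed to this one instrument.
    risk_budget = (capital * annual_risk_target
                   * instrument_weight * diversification_multiplier)
    # Notional value and annual cash volatility of a single contract.
    notional_per_contract = price * contract_multiplier
    risk_per_contract = notional_per_contract * annual_vol
    contracts = forecast_scalar * risk_budget / risk_per_contract
    # Exposure as a fraction of the account: an output, not an input.
    exposure_fraction = contracts * notional_per_contract / capital
    return contracts, exposure_fraction


# Made-up inputs: 100k account, 20% risk target, 4 equally weighted instruments.
contracts, exposure = size_position(
    capital=100_000, annual_risk_target=0.20, instrument_weight=0.25,
    diversification_multiplier=1.5, price=3_000, contract_multiplier=5,
    annual_vol=0.16)
print(f"{contracts:.2f} contracts, {exposure:.0%} of account as exposure")
```

With these made-up inputs the position comes out at roughly three contracts, a bit under half the account in notional exposure. Halve the instrument's volatility and the position doubles, which is exactly what a fixed 1% rule cannot express.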

GAT

globalarbtrader · Nov 15, 2019

nooby_mcnoob said:
I think that's the right idea in general, to use other people's stuff as a base. It's so rare that the entirety of someone else's idea fits you properly. But of course, I would say that given the thread

Nice. I'm pretty much equally fancy.

Just to say that seeing people implementing my ideas with their own code is utterly brilliant - the whole idea of the stuff I do is to inspire people to go off and build their own systems: both writing their own code but also developing their own strategies.

Sadly not everyone has those skills, at least initially, which is why I've also shared my code. But I will never be one of those guys who sells a shrink-wrapped system that does XYZ and expects that is what people will use (apart from anything else, writing production-quality paid-for software and having to support it is a tough way to make not much money).

GAT

nooby_mcnoob · Nov 16, 2019

So it turned out that I did not have stop limits working in live trading.

That was not fun to discover.

Fixed bug.

FML.

nooby_mcnoob · Nov 16, 2019

nooby_mcnoob said:
You're running your own data center! I agree that the costs are converging.

Highly recommended. Whether I do remote or local, I use terraform + docker to deploy the code and the best part about this is the reproducibility. My terraform config file is the workhorse.

Not sure what problems you've had here but make sure that you use PRAGMA journal_mode=WAL. Additionally, if you are committing to the database on a "high frequency" basis - I batch ticks every 5 seconds which seems to work OK for my purposes - then you should have the high frequency table on a separate database that you attach as it could block other tables. I haven't had any problems with corruption yet, though it has not been long enough for that problem to manifest. I regularly kill processes without any nice exit though so I would have expected to see some issue by now.
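
A minimal sketch of that setup using Python's built-in sqlite3 module. The file names, table layout and the TickBatcher class are hypothetical; only the WAL pragma, the attached database and the 5 second batching come from the description above.

```python
import os
import sqlite3
import tempfile
import time


def open_store(main_path, ticks_path):
    """Open the main database and attach a separate file for tick data."""
    conn = sqlite3.connect(main_path)
    conn.execute("ATTACH DATABASE ? AS ticks", (ticks_path,))
    # Journal mode is a property of each database file, so set it on both.
    conn.execute("PRAGMA main.journal_mode=WAL")
    conn.execute("PRAGMA ticks.journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ticks.tick "
        "(ts REAL, symbol TEXT, price REAL, size REAL)")
    return conn


class TickBatcher:
    """Buffer ticks in memory and write them in one transaction per interval."""

    def __init__(self, conn, flush_seconds=5.0):
        self.conn = conn
        self.flush_seconds = flush_seconds
        self.pending = []
        self.last_flush = time.monotonic()

    def add(self, ts, symbol, price, size):
        self.pending.append((ts, symbol, price, size))
        if time.monotonic() - self.last_flush >= self.flush_seconds:
            self.flush()

    def flush(self):
        if self.pending:
            with self.conn:  # commits on success, rolls back on error
                self.conn.executemany(
                    "INSERT INTO ticks.tick VALUES (?, ?, ?, ?)", self.pending)
            self.pending = []
        self.last_flush = time.monotonic()


# Usage with throwaway files; real paths would be persistent.
workdir = tempfile.mkdtemp()
conn = open_store(os.path.join(workdir, "main.db"),
                  os.path.join(workdir, "ticks.db"))
batcher = TickBatcher(conn, flush_seconds=5.0)
batcher.add(time.time(), "EURUSD", 1.1051, 100000)
batcher.flush()  # force the final write, e.g. on shutdown
```

Keeping the tick table in its own file means the frequent batch commits only take the write lock on that file, not on the database holding everything else.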

I looked into Arctic/Arrow/etc but in my testing, it didn't seem to be any better than a mildly tuned SQLite database. Obviously, there wouldn't be so much global investment in such technology if I was right, so my testing must have been wrong. One of the things I want to do is start dumping realtime data into one of these fancy pants things and see if it does better without me tuning it, since the tuning is what makes SQLite fast. The only issue I have with SQLite is that SQLite -> Pandas is super duper slow, since apparently no one really cares about it. But even when I exported the tick data to parquet files, it was slow af to load in Pandas.

Re-ran some tests. Looks like Arctic's main benefit is that it works with Pandas dataframes natively which nicely steps around my SQLite <-> Pandas dataframe problem.

The query performance as a result was nearly instantaneous. Will start using Arctic alongside SQLite to store ticks and see if it makes me more productive.

For completeness, here is how the data is stored in mongodb:

1. Each date range is a document
2. Each column is a compressed base64 encoded value - presumably this is a serialized dataframe portion

So the query efficiency to me really seems to be due to the fact that it is storing pandas dataframes natively. This is fine, I guess.

It is 100% worth it to me to make the transition to use MongoDB for tick data. But then I realized... I could just do the same thing with SQLite: https://www.sqlite.org/json1.html

So the question comes back down to buy vs build.

Hmm...

nooby_mcnoob · Nov 16, 2019

Well that was easy (chose build). Took all of one hour to write and test, now to convert the data...

nooby_mcnoob · Nov 16, 2019

I expect this code will change, but this is all of it:

And the little "app" to convert the data:

Loading about 90 days of ticks for a currency (with some gaps), around 8 million rows, takes about 7 seconds unoptimized. I expect pickling is a big problem, but when I used feather as the binary format, I got segfaults. Will look into it again later but I'm very happy with this. Glad I did the investigation into how Arctic worked.
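
For readers who want the general shape of the idea, here is a minimal sketch, not the code from this post: one compressed, pickled blob per symbol per day in a SQLite table, which mirrors the "each date range is a document" layout. The table name, the function names, the one-day chunk size and the use of zlib are all assumptions, and plain lists of tuples stand in for pandas dataframes so it runs with the standard library alone.

```python
import pickle
import sqlite3
import zlib
from collections import defaultdict
from datetime import datetime, timezone


def create_chunk_table(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tick_chunk ("
        " symbol TEXT, day TEXT, payload BLOB,"
        " PRIMARY KEY (symbol, day))")


def write_ticks(conn, symbol, ticks):
    """Store ticks, given as (unix_ts, bid, ask) tuples, as one blob per UTC day.

    Writing a day again replaces that day's whole chunk.
    """
    by_day = defaultdict(list)
    for tick in ticks:
        day = datetime.fromtimestamp(tick[0], tz=timezone.utc).strftime("%Y-%m-%d")
        by_day[day].append(tick)
    with conn:  # one transaction for all chunks
        for day, rows in by_day.items():
            raw = pickle.dumps(rows, protocol=pickle.HIGHEST_PROTOCOL)
            conn.execute(
                "INSERT OR REPLACE INTO tick_chunk VALUES (?, ?, ?)",
                (symbol, day, zlib.compress(raw)))


def read_ticks(conn, symbol, first_day, last_day):
    """Load every chunk in [first_day, last_day], given as YYYY-MM-DD strings."""
    cur = conn.execute(
        "SELECT payload FROM tick_chunk"
        " WHERE symbol = ? AND day BETWEEN ? AND ? ORDER BY day",
        (symbol, first_day, last_day))
    out = []
    for (blob,) in cur:
        # Only unpickle data you wrote yourself; pickle is unsafe on untrusted input.
        out.extend(pickle.loads(zlib.decompress(blob)))
    return out
```

A date-range query then touches a handful of rows instead of millions, and the cost moves to decompressing and unpickling, which is why the chunk size and the codec matter so much.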

nooby_mcnoob · Nov 16, 2019

Updating the chunksize, now I can load months of ticks in milliseconds. wheeeeeee. And I don't need to run mongodb.

nooby_mcnoob · Nov 17, 2019

Super interesting result... Database size went up by only 5G from 100G after converting all the data overnight. Which means that my dumdum compression works surprisingly well.

nooby_mcnoob · Nov 17, 2019

Did some hard benchmarking with my duct tape (unoptimized) solution vs Arctic and discovered that my duct tape solution is about half as fast as Arctic. The majority of the slowdown in my solution is due to the compression being used, which is bz2. Because I'm stuck in the 90s. Now I will have to recompress using lz4.

nooby_mcnoob · Nov 17, 2019

After switching to lz4 compression, my method is now 2x faster than Arctic.
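
If you want to check the codec effect on your own data, a small timing harness along these lines is enough. This is a sketch under assumptions: the payload is synthetic, the helper name is made up, zlib at level 1 is included as a standard-library reference point, and lz4 is only measured if the third-party lz4 package happens to be installed. It makes no claim about what numbers you will see.

```python
import bz2
import pickle
import time
import zlib


def time_codec(compress, decompress, payload, repeats=3):
    """Return (compressed_size, best_decompress_seconds) for one codec."""
    blob = compress(payload)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        restored = decompress(blob)
        best = min(best, time.perf_counter() - start)
    assert restored == payload  # the round trip must be lossless
    return len(blob), best


codecs = {
    "bz2": (bz2.compress, bz2.decompress),
    "zlib-1": (lambda data: zlib.compress(data, 1), zlib.decompress),
}
try:
    import lz4.frame  # third-party "lz4" package, optional here
    codecs["lz4"] = (lz4.frame.compress, lz4.frame.decompress)
except ImportError:
    pass

# Synthetic stand-in for one pickled chunk of ticks: (timestamp, bid, ask).
payload = pickle.dumps(
    [(1573862400.0 + i * 0.25, 1.1050 + (i % 7) * 1e-5, 1.1052)
     for i in range(50_000)])

results = {name: time_codec(c, d, payload) for name, (c, d) in codecs.items()}
for name, (size, seconds) in results.items():
    print(f"{name:7s} {size:>9d} bytes  {seconds * 1000:8.2f} ms to decompress")
```

Read speed is dominated by decompression, so that is what gets timed; compressed size is printed alongside so the trade-off stays visible.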

Next!