Developing "Spartan"

nooby_mcnoob · Nov 14, 2019

HobbyTrading said:
The pleasure is all mine. I learn a lot from your posts.

By the way: the computer I use to run my automated trading is not your regular PC or laptop. It is actually an Intel NUC. It uses very little power, so hooking it up to a small UPS keeps it running for several hours during the occasional power cut. It is connected to my home network so I can connect to it from my desktop computer. Some time ago I found an app that also lets me connect to it from my iPad when I am away. That is a bit cumbersome though, so I only use it when absolutely necessary.

Oh boy... That sounds like a fun project. I will do it at the end of next year if I'm still trading. Because otherwise... I'm going to be down the rabbit hole for months!

djames said:
Typically quant data is suited to a columnar layout, not SQL rows.
Also, these days you can get extreme compression along the columns. Some good options for storing quant data are https://github.com/man-group/arctic or pyarrow parquet files. I use pyarrow parquet files and roll my own flat file database - very very fast!

Loading tick data from parquet files into pandas dataframes wasn't faster than loading from SQL. Take from that what you will but I did the tests.

Edit: Arctic can query millions of rows per second per client, achieves ~10x compression on network bandwidth, ~10x compression on disk, and scales to hundreds of millions of rows per second per MongoDB instance.

Yawn, I do this with SQLite, the best database in history.

d08 · Nov 14, 2019

nooby_mcnoob said:
I haven't thought about performance for a long time. It's sad, but true. And I'm the type of guy who sees value in something like this https://boost-experimental.github.io/di/benchmarks.html

Python is eminently productive. I rarely run into issues that slow me down and are NOT related to the algorithm, i.e., issues caused by the language itself. The main exception would be parallel processing, but multiprocessing handles that just fine if/when I need it. I wrote my own df_parallel_apply(grouper,function,*args,**kwargs) to apply a function to a grouped dataframe in parallel.
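The df_parallel_apply helper itself isn't posted, so here is a minimal sketch of what such a function could look like, assuming `grouper` is a pandas DataFrameGroupBy and `function` takes a sub-DataFrame plus the extra arguments. The worker is a module-level function because multiprocessing.Pool has to pickle it. The keyword-only `processes` parameter (with `processes=1` running serially for debugging) is a hypothetical addition, not part of the original signature.

```python
import multiprocessing as mp
from functools import partial

import pandas as pd


def _apply_to_group(item, function, args, kwargs):
    """Run `function` on one (key, sub-DataFrame) pair.

    Defined at module level so multiprocessing can pickle it.
    """
    key, group = item
    return key, function(group, *args, **kwargs)


def df_parallel_apply(grouper, function, *args, processes=None, **kwargs):
    """Hypothetical sketch: apply `function` to every group of a DataFrameGroupBy.

    `function` must be picklable (a top-level function, not a lambda) unless
    processes=1, which runs serially in-process and is handy for debugging.
    `processes` is keyword-only, so `function` cannot receive a kwarg of that name.
    """
    items = list(grouper)  # materialise (key, sub-DataFrame) pairs in group order
    worker = partial(_apply_to_group, function=function, args=args, kwargs=kwargs)
    if processes == 1:
        results = [worker(item) for item in items]
    else:
        with mp.Pool(processes=processes) as pool:
            results = pool.map(worker, items)  # map() preserves input order

    keys = [key for key, _ in results]
    values = [value for _, value in results]
    if values and all(isinstance(v, (pd.Series, pd.DataFrame)) for v in values):
        # Stack pandas results under their group keys, similar to groupby().apply().
        return pd.concat(values, keys=keys)
    # Scalar results: one value per group.
    return pd.Series(values, index=keys)
```

With the spawn start method (the default on Windows and macOS), the calling script also needs the usual `if __name__ == "__main__":` guard around the code that creates the pool.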

Excuse me if this is ignorant but why not just use Modin?

nooby_mcnoob · Nov 14, 2019

d08 said:
Excuse me if this is ignorant but why not just use Modin?

Just not that important yet. Probably will be at some point.

d08 · Nov 14, 2019

HobbyTrading said:
I see that you use the word "cloud" slightly differently than I do. You seem to have a computer instance in the cloud with computing power and storage, so that you can run software on it (e.g. an automated trading system). I only use "cloud storage", Dropbox in my case, to replicate files across multiple computers and to have access to settings files and log files while I'm not at home. I use a computer at home to run my trading system.
The difference: I don't have to pay $150/month, as you seem to, but I had to buy an extra computer. The data I have on Dropbox is less than 2 GB, so I use a free account.

Cloud computing (rather, VPS) can be much cheaper than that. I pay about $10 a month for a decent single core machine. I don't have a GUI and it runs Ubuntu with 2GB of RAM, perfectly fine. I definitely don't want to run a trading algo on a home computer ever again.

nooby_mcnoob · Nov 14, 2019

d08 said:
Cloud computing (rather, VPS) can be much cheaper than that. I pay about $10 a month for a decent single core machine. I don't have a GUI and it runs Ubuntu with 2GB of RAM, perfectly fine. I definitely don't want to run a trading algo on a home computer ever again.

How often do you modify the code? I'm still in the mode where I'm making daily tweaks

d08 · Nov 14, 2019

nooby_mcnoob said:
How often do you modify the code? I'm still in the mode where I'm making daily tweaks

Not very often, maybe every 2 weeks or even less often. I have a script to upload (scp) the compiled update to the VPS. Compiling takes about a minute, while the upload takes about 10 seconds. So there are a few steps, but nothing too bad.
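For anyone wanting to script the same compile-then-copy step, here is a minimal Python sketch; it is not d08's actual script. It assumes SSH key authentication to the VPS is already set up, and the build command, host, user and paths are hypothetical placeholders. `dry_run=True` only prints the commands.

```python
import shlex
import subprocess


def scp_command(local_path, host, remote_dir, user=None):
    """Build the scp argument list (a list, so no shell quoting is involved)."""
    target = f"{user}@{host}" if user else host
    return ["scp", local_path, f"{target}:{remote_dir}"]


def deploy(local_path, host, remote_dir, user=None, build_cmd=None, dry_run=False):
    """Optionally run a build step, then copy the artifact to the VPS with scp.

    build_cmd is a hypothetical argument list such as ["make", "release"].
    Returns the list of commands that were run (or printed, with dry_run=True).
    """
    steps = [build_cmd] if build_cmd else []
    steps.append(scp_command(local_path, host, remote_dir, user))
    for cmd in steps:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return steps
```

Stopping at the first failing step (via `check=True`) means a broken build never gets copied over the running version.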
Bear in mind that I'm in a country with historically bad internet connectivity (plus blackouts). I did run things locally for years, but that was extremely stressful. I trust my VPS's connectivity much more, and they're not far from IB, so intercontinental disruptions aren't a thing either.

HobbyTrading · Nov 14, 2019

d08 said:
Bear in mind that I'm in a country with historically bad internet connectivity (plus blackouts). I did run things locally for years, but that was extremely stressful. I trust my VPS's connectivity much more, and they're not far from IB, so intercontinental disruptions aren't a thing either.

I agree with you that the reliability of a "cloud computer" is most likely higher than having your hardware at home and relying on your ISP and local electricity supplier. However, in my case it has not yet reached the point where I'm willing to pay money for that extra reliability. My impression from what I'm reading on ET is that your and @nooby_mcnoob's trading systems are much more advanced than mine.

nooby_mcnoob · Nov 14, 2019

Great points all. Fundamentally, the "issue", if there is one, is that I'm just not really making use of the remote machine except to run data collection, and even that, I'm not really using it anymore.

For now, I will probably shut the remote machine down and return it to the cloud graveyard but I think I would eventually want to resurrect it for the reasons that d08 stated.

Will let the decision fester for a bit before I pull the trigger.

globalarbtrader · Nov 15, 2019

nooby_mcnoob said:
I like the simplicity of your setup. I currently have about 100 GB of data w/ Dropbox. I'm not too worried about $150 on the remote machine, but whether it is solving a problem for me.

The question still remains whether the desktop setup is stable enough. In the past, I've had problems with machines dying when they've been on 24/7, which is really the main reason I chose to use someone else's infrastructure, as they can avoid world-ending hardware failures better than I can. However, the local machines that died weren't built from high-quality parts. I have very high-quality parts in the machines I build today.

Yes, I think we'll get rid of the remote machine by the end of this year, assuming no serious problems.

Thanks for the chat, buddy

Two issues here: the data storage and the cloud.

I've discussed the cloud stuff before, but briefly: if machines breaking is your issue, then you can stay local by buying another computer. I am also using headless NUC-type units, and I have 3 of them that are high enough spec to run my system, which cost about $500 each new, though I bought two of them secondhand (there are a few slower machines knocking around the house that I keep meaning to use for various hobby projects and never get round to). I use one for development; the other two are live and backup, and I swap them round regularly. In the last 6 years I've had one machine failure, so I really wouldn't trust one machine entirely.

Similarly, I've had issues with backups, so I've gone to town with a RAID NAS drive that everything is backed up to every night, plus a USB drive that backs that up, plus everything is on at least 2 machines anyway. For belt and braces I should probably set up some offsite storage like Dropbox, and that's on my to-do list. That also stores all our household data, and I can write the cost of all this stuff off against tax, which I couldn't do if I was purely a trader.

Of course this doesn't help you if your internet fails, or your power fails... and that, I guess, is the appeal. $1000 of local hardware does buy you quite a lot of cloud computing time. I think local hardware is still cheaper, but every time I've done the maths it's got closer and closer, depending on how many years you amortise your hardware over.
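The trade-off can be put into numbers with a back-of-the-envelope calculation: straight-line amortisation of the hardware versus a flat monthly cloud fee. This is a simplified sketch; electricity, UPS batteries, bandwidth and resale value are ignored unless passed in through the hypothetical `local_running_per_year` parameter.

```python
def breakeven_months(hardware_cost: float, cloud_per_month: float) -> float:
    """How many months of cloud rental the hardware budget would pay for."""
    return hardware_cost / cloud_per_month


def annual_cost(hardware_cost: float, years: float, cloud_per_month: float,
                local_running_per_year: float = 0.0) -> tuple:
    """Return (local, cloud) cost per year, amortising hardware evenly over `years`.

    local_running_per_year is a placeholder for power, spares, etc.; leaving it
    at 0 flatters the local option.
    """
    local = hardware_cost / years + local_running_per_year
    cloud = 12.0 * cloud_per_month
    return local, cloud


print(round(breakeven_months(1000, 150), 1))  # 6.7
print(annual_cost(1000, 4, 150))              # (250.0, 1800.0)
```

For example, $1000 of hardware equals 100 months of the $10/month VPS mentioned earlier in the thread, but under 7 months at $150/month; the longer the amortisation period, the lower the local figure.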

I'm considering containerising my new system when it eventually goes online (that's basically when pysystemtrade is production ready), which will seriously reduce the up-front hassle of moving everything to the cloud, so this is still something I might well do in the future. I would still want local copies of everything for persistence reasons, and so that I could spin it up locally if I wanted to, so I'd need at least one machine to do this on. It is also pretty cool having a stack of computers though...

As for data storage, SQLite has done well for me and I think it's acceptable for low frequency trading, but I have had occasional issues with files becoming corrupted (writing when a process fails? It doesn't have the concept of a record lock, it just uses the OS file lock). I'm planning to move everything to MongoDB / Arctic, which is how the production side of pysystemtrade is set up. Having used it for the last few years, I find it's just a nicer solution once you get the NoSQL idea in your head, and a lot quicker. I'm a bit reluctant to rely on third-party libraries (even if it's AHL!), but I could easily write a native MongoDB pandas read/write client in about 5 minutes if I had to.
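In the spirit of that five-minute client, here is a minimal sketch of a pandas read/write layer over pymongo, storing one document per row. It assumes `collection` is a pymongo Collection (only `insert_many()` and `find()` are used) and a datetime-like index stored in a field called `timestamp`. The function names are hypothetical, and this is not how Arctic stores data internally.

```python
import pandas as pd


def write_df(collection, df: pd.DataFrame, index_name: str = "timestamp") -> int:
    """Insert one document per row; the index is stored as an ordinary field.

    Assumes pandas returns native Python scalars from to_dict() (true for recent
    versions), since BSON cannot encode NumPy scalar types.
    """
    records = df.rename_axis(index_name).reset_index().to_dict("records")
    if not records:
        return 0  # insert_many() rejects an empty list
    result = collection.insert_many(records)
    return len(result.inserted_ids)


def read_df(collection, query=None, index_name: str = "timestamp") -> pd.DataFrame:
    """Read matching documents back into a DataFrame, dropping Mongo's _id field."""
    docs = list(collection.find(query or {}, {"_id": 0}))
    if not docs:
        return pd.DataFrame()
    return pd.DataFrame(docs).set_index(index_name).sort_index()


# Usage with a real server (hypothetical database/collection names):
#   from pymongo import MongoClient
#   prices = MongoClient()["trading"]["prices"]
#   write_df(prices, df)
#   recent = read_df(prices, {"timestamp": {"$gte": some_datetime}})
```

One document per row is the simplest possible layout; for tick data it produces a lot of documents, which is one reason libraries like Arctic store chunks instead.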

The 'black box' nature of how stuff is stored in MongoDB slightly worries me, however: you can dump a backup file, but if you can't recover from that file for whatever reason, you're stuffed. So I'm also planning to write backup files in .csv format so I can always manually recover from a corruption issue.
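A minimal sketch of that .csv fallback with pandas, assuming each dataset is a DataFrame with a datetime index. Writing to a temporary file and then swapping it in with `os.replace` means a crash mid-write should leave the previous backup intact. The function names and the `timestamp` column label are hypothetical.

```python
import os

import pandas as pd


def backup_to_csv(df: pd.DataFrame, path: str) -> None:
    """Write a plain-text copy of df, replacing any previous backup at `path`."""
    tmp_path = path + ".tmp"
    df.to_csv(tmp_path, index_label="timestamp")
    os.replace(tmp_path, path)  # swap in only after the write has finished


def restore_from_csv(path: str) -> pd.DataFrame:
    """Read a backup written by backup_to_csv, parsing the index back into datetimes."""
    return pd.read_csv(path, index_col=0, parse_dates=True)
```

Because the files are plain text, they can also be inspected or repaired by hand, which is the point of having them alongside a binary dump.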

GAT

nooby_mcnoob · Nov 15, 2019

globalarbtrader said:
Two issues here: the data storage and the cloud.

I've discussed the cloud stuff before, but briefly: if machines breaking is your issue, then you can stay local by buying another computer. I am also using headless NUC-type units, and I have 3 of them that are high enough spec to run my system, which cost about $500 each new, though I bought two of them secondhand (there are a few slower machines knocking around the house that I keep meaning to use for various hobby projects and never get round to). I use one for development; the other two are live and backup, and I swap them round regularly. In the last 6 years I've had one machine failure, so I really wouldn't trust one machine entirely.

Similarly, I've had issues with backups, so I've gone to town with a RAID NAS drive that everything is backed up to every night, plus a USB drive that backs that up, plus everything is on at least 2 machines anyway. For belt and braces I should probably set up some offsite storage like Dropbox, and that's on my to-do list. That also stores all our household data, and I can write the cost of all this stuff off against tax, which I couldn't do if I was purely a trader.

Of course this doesn't help you if your internet fails, or your power fails... and that, I guess, is the appeal. $1000 of local hardware does buy you quite a lot of cloud computing time. I think local hardware is still cheaper, but every time I've done the maths it's got closer and closer, depending on how many years you amortise your hardware over.

You're running your own data center! I agree that the costs are converging.

globalarbtrader said:
I'm considering containerising my new system when it eventually goes online

Highly recommended. Whether I do remote or local, I use terraform + docker to deploy the code and the best part about this is the reproducibility. My terraform config file is the workhorse.

globalarbtrader said:
I think it's acceptable for low frequency trading, but I have had occasional issues with files becoming corrupted (writing when a process fails? it doesn't have the concept of a record lock, just uses the OS file lock).

Not sure what problems you've had here, but make sure that you use PRAGMA journal_mode=WAL. Additionally, if you are committing to the database on a "high frequency" basis (I batch ticks every 5 seconds, which seems to work OK for my purposes), then you should put the high-frequency table in a separate database that you attach, as otherwise it could block the other tables. I haven't had any problems with corruption yet, though it has not been long enough for that problem to manifest. I regularly kill processes without any clean exit, though, so I would have expected to see some issue by now.
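A minimal sketch of those three suggestions with Python's built-in sqlite3 module: WAL mode, a separate tick database attached to the main connection, and ticks buffered in memory and committed in one transaction every 5 seconds. The file names, the `ticks.tick` schema and the `TickBatcher` class are hypothetical.

```python
import sqlite3
import time


def open_db(main_path: str, ticks_path: str) -> sqlite3.Connection:
    """Open the main database and ATTACH a separate file for high-frequency ticks."""
    conn = sqlite3.connect(main_path)
    conn.execute("ATTACH DATABASE ? AS ticks", (ticks_path,))
    # journal_mode is a property of each database file, so set WAL on both schemas.
    for schema in ("main", "ticks"):
        conn.execute(f"PRAGMA {schema}.journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ticks.tick "
        "(ts REAL, symbol TEXT, price REAL, size INTEGER)"
    )
    return conn


class TickBatcher:
    """Buffer ticks in memory and commit them in one transaction every `interval` seconds."""

    def __init__(self, conn: sqlite3.Connection, interval: float = 5.0):
        self.conn = conn
        self.interval = interval
        self.buffer = []
        self.last_flush = time.monotonic()

    def add(self, ts, symbol, price, size):
        self.buffer.append((ts, symbol, price, size))
        if time.monotonic() - self.last_flush >= self.interval:
            self.flush()

    def flush(self):
        if self.buffer:
            with self.conn:  # one transaction: commits on success, rolls back on error
                self.conn.executemany(
                    "INSERT INTO ticks.tick VALUES (?, ?, ?, ?)", self.buffer
                )
            self.buffer.clear()
        self.last_flush = time.monotonic()
```

A real collector would also call `flush()` on shutdown; ticks still sitting in the buffer are lost if the process is killed, which is the price of batching.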

I looked into Arctic/Arrow/etc., but in my testing it didn't seem to be any better than a mildly tuned SQLite database. Obviously, we couldn't have so much global investment into such technology if I were right, so my testing must have been wrong. One of the things I want to do is start dumping realtime data into one of these fancy pants things and see if it does better without me tuning it, since the tuning is what makes SQLite fast. The only issue I have with using SQLite is that SQLite -> Pandas is super duper slow, since apparently no one really cares about it, but even when I exported the tick data to parquet files, it was slow af to load in Pandas.
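Since these results clearly depend on the setup, a small harness makes it easy to rerun this kind of test. This sketch times two SQLite -> Pandas paths (`pd.read_sql_query` versus `cursor.fetchall()` plus `DataFrame.from_records`) on synthetic ticks, and adds `pd.read_parquet` only if pyarrow is installed. The table name, column names and tick count are assumptions; it reports timings and makes no claim about which loader wins.

```python
import sqlite3
import time
from contextlib import closing

import numpy as np
import pandas as pd

QUERY = "SELECT ts, price, size FROM tick"  # hypothetical tick table


def make_tick_db(path: str, n: int = 100_000) -> None:
    """Write n synthetic ticks to a SQLite file (fake data, for timing only)."""
    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "ts": np.arange(n, dtype="float64"),
        "price": 100 + rng.standard_normal(n).cumsum(),
        "size": rng.integers(1, 10, n),
    })
    with closing(sqlite3.connect(path)) as conn:
        df.to_sql("tick", conn, if_exists="replace", index=False)
        conn.commit()


def load_read_sql(path: str) -> pd.DataFrame:
    with closing(sqlite3.connect(path)) as conn:
        return pd.read_sql_query(QUERY, conn)


def load_fetchall(path: str) -> pd.DataFrame:
    with closing(sqlite3.connect(path)) as conn:
        cur = conn.execute(QUERY)
        columns = [c[0] for c in cur.description]
        return pd.DataFrame.from_records(cur.fetchall(), columns=columns)


def time_loaders(path: str, repeats: int = 3) -> dict:
    """Best-of-`repeats` wall-clock seconds for each loader."""
    loaders = {
        "read_sql_query": lambda: load_read_sql(path),
        "fetchall+from_records": lambda: load_fetchall(path),
    }
    try:
        import pyarrow  # noqa: F401  (optional: only needed for the parquet case)
        parquet_path = path + ".parquet"
        load_read_sql(path).to_parquet(parquet_path, index=False)
        loaders["read_parquet"] = lambda: pd.read_parquet(parquet_path)
    except ImportError:
        pass

    timings = {}
    for name, load in loaders.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            load()
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return timings
```

Taking the best of several runs reduces noise from the OS file cache; running it against a copy of real tick data rather than the synthetic table gives a fairer comparison.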