Live data feed options and processing

DiceAreCast · Mar 29, 2021

Whether you use Linux or Windows is irrelevant in this case, and CPU cache is not make-or-break either. What dholliday suggested is that the data structure you store data in makes a big difference, and I can only agree with that. In the end you want to use the data you stream. How you store and access that data is key.

931 said:
Same design philosophy.
Using proprietary platforms would have created many limitations with awful workarounds.
Understood it early enough and never coded anything other than bridges leading out from proprietary apps.

Then your ideas were optimal from the beginning, or you know how to work around problems.

I build with Qt5/C++ and maintain it as a cross-platform app for Linux/Windows/Mac.
Lately I mostly use Linux due to better compiler and profiling support.

All performance-critical parts use parallelism + SIMD.
Learning/implementing new optimization tricks from time to time.

My ideas are ML-related and quite inefficient: the aim is to find non-random features and separate them from the chaos.
Memory access patterns are not cache friendly, and optimizations won't help much unless the core ideas and access patterns change.

If the data set is smaller than RAM, all backtesting data is kept in memory.
If it is larger, I use an M.2 SSD based cache system that auto-optimizes from past access patterns.

https://www.elitetrader.com/et/threads/tick-data-storage.346878/
I discuss some ideas related to that in that thread; a lot is still open and undone.
The current plan is to use a motherboard with 2-3 M.2 drives and create a software RAID 0, in hopes of getting closer to RAM speed with sequential reads.
Maybe you can recommend alternative ideas.

Lately I have also thought about testing whether I can get decent performance by just setting up a large Linux swap on the fastest M.2 drives.
But then there is no control over how it gets buffered, as the OS decides.
To my knowledge Linux can also use lz4 compression for swap, but at 5000 MB/s the CPU becomes the bottleneck and a much simpler algo would be needed.

Some modern CPUs have 60 MB+ caches now, so a 50 MB working set can probably run very fast as well.

Do you mean to say it creates better memory access patterns and performance if the data is in separate containers instead of a single container of structs holding all fields?
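To make the question concrete, here is a minimal sketch of the two layouts being compared. The `Tick` fields and all names are hypothetical, not taken from anyone's platform in this thread. When only `bid` and `ask` are read, the struct-of-arrays version walks two dense arrays, while the array-of-structs version drags every field of every tick through the cache. Whether that shows up as a real speedup depends on the access pattern and hardware, so treat it as something to profile.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Array-of-structs (AoS): a single container, each element holds every field.
// Field names are hypothetical, chosen only for illustration.
struct Tick {
    std::int64_t timestamp_us;
    double bid;
    double ask;
    double last;
    std::uint32_t volume;
};

// Struct-of-arrays (SoA): one container per field, all kept at the same length.
struct TickColumns {
    std::vector<std::int64_t> timestamp_us;
    std::vector<double> bid;
    std::vector<double> ask;
    std::vector<double> last;
    std::vector<std::uint32_t> volume;

    void push_back(const Tick& t) {
        timestamp_us.push_back(t.timestamp_us);
        bid.push_back(t.bid);
        ask.push_back(t.ask);
        last.push_back(t.last);
        volume.push_back(t.volume);
    }
};

// AoS scan: whole Tick structs are loaded although only bid and ask are used.
double mean_spread_aos(const std::vector<Tick>& ticks) {
    assert(!ticks.empty());
    double sum = 0.0;
    for (const Tick& t : ticks) sum += t.ask - t.bid;
    return sum / static_cast<double>(ticks.size());
}

// SoA scan: only the bid and ask arrays are touched, both sequentially.
double mean_spread_soa(const TickColumns& c) {
    assert(!c.bid.empty() && c.bid.size() == c.ask.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < c.bid.size(); ++i) sum += c.ask[i] - c.bid[i];
    return sum / static_cast<double>(c.bid.size());
}
```

Both functions return the same value; the only difference is which bytes have to travel from RAM to the core to compute it.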

How long have you been working on your platform? I have spent ~6 years and still a lot could be optimized and improved. Lately I have been getting time off due to long test periods.

931 · Apr 5, 2021

DiceAreCast said:
It's not relevant in this context whether you implement in C++ or C# or Java. They all perform as @dholliday suggested. I have similar profiling results. I stream around 500+ symbols into my engine on a tick basis.

By efficient algos I don't mean only data stream processing but evaluation as well.

DiceAreCast said:
Even whether using Linux or windows is irrelevant in this case. And cpu cache is not a make or break either.

My software can use lots of memory depending on configuration and has non-linear memory access patterns.
I compared the Windows paging file and Linux swap and there is an enormous difference in performance, especially with lots of jumps to random memory locations in small blocks.

CPU cache misses are where performance is lost.
For example, in my case interchanging a few for loops that control the memory access pattern gives a 25%+ speed difference due to fewer cache misses.
It means hours saved when creating or evaluating models.
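A minimal sketch of what interchanging loops means, assuming a row-major matrix (the `Matrix` type and function names are made up for illustration, this is not the actual code behind the 25% figure). Both functions compute the same sum; only the order of memory accesses differs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major matrix: element (r, c) lives at data[r * cols + c].
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> data;
};

// Column loop outside, row loop inside: consecutive reads are `cols`
// doubles apart, so large matrices tend to miss the cache often.
double sum_column_outer(const Matrix& m) {
    assert(m.data.size() == m.rows * m.cols);
    double sum = 0.0;
    for (std::size_t c = 0; c < m.cols; ++c)
        for (std::size_t r = 0; r < m.rows; ++r)
            sum += m.data[r * m.cols + c];
    return sum;
}

// Loops interchanged: consecutive reads are adjacent in memory, so each
// fetched cache line is fully used before moving on.
double sum_row_outer(const Matrix& m) {
    assert(m.data.size() == m.rows * m.cols);
    double sum = 0.0;
    for (std::size_t r = 0; r < m.rows; ++r)
        for (std::size_t c = 0; c < m.cols; ++c)
            sum += m.data[r * m.cols + c];
    return sum;
}
```

How big the gap is depends on matrix size, cache size and what else the loop body does, so it has to be measured per case.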

I also gained speed by using all cores on the same instrument instead of a different instrument on each core. I'm quite sure it's related to cache misses.

Imagine 64 people going back and forth in a narrow tunnel. That is CPU cores getting data from RAM.

DiceAreCast said:
What dholliday suggested is that it's the data structure you store data in that makes a big difference and I can only agree with that. In the end you want to use the data you stream. How you store and access that data is key.

If you mean the difference in speed, then that difference is due to fewer cache misses, IMO.

dholliday · Apr 5, 2021

931 said:
Packet serialization overhead was minimal compared to data feed providers' formats that need JSON parsers etc.; that part was not considered during the test.

I like the CSV (comma-separated value) format that IQFeed provides. When downloading historical data I write a line of data to a file exactly as I received it. Since each line starts with a timestamp I can easily check the data when I have questions as to why my analysis software did something unexpected.
Though JSON may be appropriate for sending data to a web app it is definitely not the best format to receive and write to an easily readable file quickly (IMO).
I don't know how tokenizing a CSV string compares to a JSON parser in speed, and though JSON results in many more bits over the wire, it's still minimal so if JSON is what you get you can save it as is or convert it to CSV before writing to the file.
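For a sense of how little work tokenizing a CSV line takes, here is a minimal sketch. The `timestamp,price,size` layout is a made-up example and NOT IQFeed's actual message format, and the splitter assumes there are no quoted fields or embedded commas.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split one CSV line on commas.
// Assumption: no quoted fields and no embedded commas.
std::vector<std::string> split_csv_line(const std::string& line) {
    std::vector<std::string> fields;
    std::size_t start = 0;
    while (true) {
        const std::size_t comma = line.find(',', start);
        if (comma == std::string::npos) {
            fields.push_back(line.substr(start));  // last field, may be empty
            break;
        }
        fields.push_back(line.substr(start, comma - start));
        start = comma + 1;
    }
    return fields;
}

// Hypothetical row layout for illustration only: timestamp,price,size
struct TradeRow {
    std::string timestamp;
    double price;
    long size;
};

TradeRow parse_trade_row(const std::string& line) {
    const std::vector<std::string> f = split_csv_line(line);
    assert(f.size() >= 3);
    // std::stod / std::stol throw on malformed numbers.
    return TradeRow{f[0], std::stod(f[1]), std::stol(f[2])};
}
```

Because the line is kept as plain text, the same string can be written to the file unchanged and parsed later only when needed.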

931 said:
The biggest bottleneck atm with 1000+ streams is that all incoming data gets evaluated on the same PC and at the same intervals,
causing 100% spikes in CPU usage and delays for orders.
It is not impossible to fix by collecting all data with shifted timings and processing at different times. But at some point I will still need to separate things more.

You may want to handle each tick and bid/ask change as they arrive.

931 said:
Also I'm wondering if any data provider sends out ~1sec interval bid and ask bar data as a live feed instead of tick data and covers 1000+ stocks?

DTN's real-time tick feed includes bid/ask changes.
Their historical tick data only includes the bid/ask with a trade. Not between trades. It would be nice if they included the complete tape as it had happened. I believe NxCore does this. Maybe check with polygon.io?
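If no provider turns out to offer ~1 sec bid/ask bars directly, they could be built locally from a feed that includes bid/ask changes. A minimal sketch, assuming quotes arrive sorted by time with non-negative millisecond timestamps; the `Quote` and `QuoteBar` types are hypothetical and not any provider's schema.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One bid/ask update. Hypothetical type for illustration.
struct Quote {
    std::int64_t time_ms;
    double bid;
    double ask;
};

// Open/high/low/close for bid and ask over one interval.
struct QuoteBar {
    std::int64_t start_ms;
    double bid_open, bid_high, bid_low, bid_close;
    double ask_open, ask_high, ask_low, ask_close;
};

// Group time-sorted quotes into fixed-width bars (default 1000 ms).
// Intervals with no quote update produce no bar.
std::vector<QuoteBar> build_bars(const std::vector<Quote>& quotes,
                                 std::int64_t interval_ms = 1000) {
    assert(interval_ms > 0);
    std::vector<QuoteBar> bars;
    for (const Quote& q : quotes) {
        const std::int64_t start = (q.time_ms / interval_ms) * interval_ms;
        if (bars.empty() || bars.back().start_ms != start) {
            // First quote of a new interval opens a new bar.
            bars.push_back(QuoteBar{start, q.bid, q.bid, q.bid, q.bid,
                                    q.ask, q.ask, q.ask, q.ask});
        } else {
            QuoteBar& b = bars.back();
            b.bid_high = std::max(b.bid_high, q.bid);
            b.bid_low = std::min(b.bid_low, q.bid);
            b.bid_close = q.bid;
            b.ask_high = std::max(b.ask_high, q.ask);
            b.ask_low = std::min(b.ask_low, q.ask);
            b.ask_close = q.ask;
        }
    }
    return bars;
}
```

Downstream code that wants a bar for every second would have to carry the last close forward through the gaps.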

931 · Apr 6, 2021

Most of the problems got solved a while back.
Can't make everything perfect, as time is a limited resource.

I still can't process each tick due to limited resources, but now I have more computers and no CPU spike + wait problem, as all instruments are processed continuously one by one without stopping, unless there is no incoming data.
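Roughly, the one-by-one scheme looks like the single-threaded sketch below. The `Instrument` fields are hypothetical and the model evaluation is reduced to a counter; the real code is more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-instrument state for illustration.
struct Instrument {
    int id;
    int pending_updates;  // data received since the last evaluation
    int evaluations;      // how many times this instrument was evaluated
};

// One pass over all instruments in fixed order. Instruments with no new
// data are skipped. Returns how many instruments were evaluated.
std::size_t process_pass(std::vector<Instrument>& instruments) {
    std::size_t processed = 0;
    for (Instrument& inst : instruments) {
        assert(inst.pending_updates >= 0);
        if (inst.pending_updates == 0) continue;  // nothing new, move on
        // Placeholder for the actual model evaluation on the buffered data.
        inst.pending_updates = 0;
        ++inst.evaluations;
        ++processed;
    }
    return processed;
}
```

Calling `process_pass` in a loop spreads the load evenly over time instead of evaluating everything at the same interval boundary; when it returns 0 the loop can sleep briefly until new data arrives.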

Also, due to retail spread levels and the time it takes to overcome the spread, I see no point in tick processing.
I have no special market privileges, just retail spreads...
It takes a while to overcome those.

If data gets processed once per 10-30 sec I'm fine.
I'd rather add more instruments to look for opportunity than evaluate fewer at a higher frequency.

So far I have used polygon.io for stocks and it's decent, although the forex price data is not great because it differs from my broker's spreads, so I made an MT4/MT5 script to collect forex data directly.