I'm wondering what it takes to gather and process tick data for 1000-3000 stocks,
in terms of computing hardware, software, networking and data feed.
The S&P 500 list, for example, is mandatory because its stocks have low spreads.
Basically, the more low-spread stocks the better, up to the point where the system bottlenecks.
What data rate could I expect to see when collecting tick data for all S&P 500 stocks while the market is very active?
It would also help if you mention your data provider, so the format overhead is known.
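To make the question concrete, here is a rough back-of-the-envelope sketch of how the data rate could be estimated. The function name `feed_rate` and every input value (messages per symbol per second, bytes per message) are placeholder assumptions, not measurements from any provider; real numbers would have to come from the feed itself.

```python
def feed_rate(symbols, msgs_per_symbol_per_sec, bytes_per_msg):
    """Return (messages/s, megabits/s) for a feed under the given assumptions."""
    msgs_per_sec = symbols * msgs_per_symbol_per_sec
    mbit_per_sec = msgs_per_sec * bytes_per_msg * 8 / 1_000_000
    return msgs_per_sec, mbit_per_sec


# Placeholder inputs only: 500 symbols, an assumed burst of 50 messages per
# symbol per second, and two assumed message sizes to show format overhead.
for label, size in (("compact binary", 40), ("JSON", 200)):
    msgs, mbit = feed_rate(symbols=500, msgs_per_symbol_per_sec=50, bytes_per_msg=size)
    print(f"{label}: {msgs} msg/s, {mbit:.1f} Mbit/s")
```

The point is only that bandwidth scales linearly with the per-message size, which is why the provider's format matters for the estimate.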
Today I ran some simulations using historical tick data as the incoming stream, to test how much the platform can handle.
It was done over a LAN, and I think the software is optimized enough to handle 1000+ stocks on a regular workstation PC, unless the historical data did not contain all ticks...
Packet serialization overhead was minimal compared to data feed provider formats that need JSON parsers etc., so that part was not considered during the test.
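For illustration, the size difference between a fixed binary record and a JSON message can be sketched like this. The record layout (`symbol id, timestamp, bid, ask`) and the JSON field names are hypothetical, chosen only for the example; they are not the format used in my test or by any specific provider.

```python
import json
import struct

# Hypothetical quote record: uint32 symbol id, int64 timestamp (ns),
# float64 bid, float64 ask. "<" means little-endian with no padding,
# so the record is 4 + 8 + 8 + 8 = 28 bytes.
RECORD = struct.Struct("<Iqdd")


def encode_binary(sym_id, ts_ns, bid, ask):
    """Pack one quote into the fixed-size binary record."""
    return RECORD.pack(sym_id, ts_ns, bid, ask)


def encode_json(symbol, ts_ns, bid, ask):
    """Encode the same quote as a JSON message (hypothetical field names)."""
    return json.dumps({"symbol": symbol, "ts": ts_ns, "bid": bid, "ask": ask}).encode()


binary = encode_binary(17, 1_700_000_000_000_000_000, 101.25, 101.27)
text = encode_json("AAPL", 1_700_000_000_000_000_000, 101.25, 101.27)
print(len(binary), len(text))  # the JSON message is several times larger
```

On top of the extra bytes, the JSON variant also has to be parsed on arrival, which is the part my LAN test skipped.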
However, I have the API connections and incoming data parsers running on a separate computer from the other parts of the software.
This spreads the workload and supports a hardware firewall with advanced rules, without the need to write new rules for each new API that gets tested.
I still suspect that at this stage I don't have the networking, computing and software infrastructure to process this amount of real incoming tick data with low delays.
The biggest bottleneck at the moment with 1000+ streams is that all incoming data gets evaluated on the same PC and at the same intervals,
causing 100% spikes in CPU usage and delays for orders.
This is not impossible to fix by shifting the timings so that the data is collected and processed at different times. But at some point the workload will still need to be separated further.
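The timing shift I have in mind could look roughly like the sketch below: instead of evaluating every symbol at the same instant, the symbols are spread round-robin over a number of slots inside the evaluation interval. The function name `build_schedule`, the 1 second interval and the 20 slots are assumptions for the example only.

```python
def build_schedule(symbols, interval_s=1.0, slots=20):
    """Spread symbols round-robin over `slots` evenly spaced offsets.

    Returns a list of (offset_seconds, symbols_in_slot) pairs, so each group
    is evaluated at its own offset inside every interval instead of all at once.
    """
    groups = [[] for _ in range(slots)]
    for i, sym in enumerate(symbols):
        groups[i % slots].append(sym)
    return [(k * interval_s / slots, group) for k, group in enumerate(groups)]


# Hypothetical symbol names, just to show the resulting group sizes.
schedule = build_schedule([f"SYM{i}" for i in range(1000)])
for offset, group in schedule[:3]:
    print(f"t+{offset:.2f}s: {len(group)} symbols")
```

With 1000 symbols and 20 slots, each evaluation pass touches 50 symbols instead of 1000, which flattens the CPU spike but does not reduce the total work.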
I'm also wondering whether any data provider sends out bid and ask bar data at ~1 sec intervals as a live feed, instead of tick data, and covers 1000+ stocks?
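To show what kind of data I mean, here is a minimal sketch of building 1 second bid/ask bars from quote ticks locally. The tick layout `(timestamp_seconds, bid, ask)` and the function name `quote_bars` are assumptions for the example; a provider-side bar feed would deliver something similar without me having to receive every tick.

```python
def quote_bars(ticks):
    """Aggregate (timestamp_s, bid, ask) ticks into 1-second bid/ask bars.

    Assumes ticks for a single symbol, in time order, with non-negative
    timestamps. Each bar stores [open, high, low, close] per side.
    """
    bars = {}
    for ts, bid, ask in ticks:
        sec = int(ts)  # bucket by whole second
        bar = bars.get(sec)
        if bar is None:
            # First tick of this second opens the bar on both sides.
            bars[sec] = {"bid": [bid] * 4, "ask": [ask] * 4}
            continue
        for side, px in (("bid", bid), ("ask", ask)):
            o, h, l, _ = bar[side]
            bar[side] = [o, max(h, px), min(l, px), px]
    return bars


ticks = [(0.1, 10.0, 10.2), (0.5, 10.1, 10.3), (0.9, 9.9, 10.1), (1.2, 10.0, 10.2)]
bars = quote_bars(ticks)
print(bars[0]["bid"])  # [10.0, 10.1, 9.9, 9.9]
```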
High-quality historical bid-ask data is also very important.
Ideally it goes back 10+ years, but it could be kept as a separate feed from the live one.
What provider could be useful in this scenario?