Whether you use Linux or Windows is irrelevant in this case, and CPU cache is not make-or-break either. What dholliday suggested is that the data structure you store your data in makes a big difference, and I can only agree with that. In the end you want to use the data you stream. How you store and access that data is key.
Same design philosophy.
Using proprietary platforms would have created many limitations with awful workarounds.
I understood that early enough and never coded anything other than bridges leading out of proprietary apps.
Then your ideas were optimal from the beginning, or you know how to work around problems.
I build using Qt5/C++ and maintain it as a cross-platform app for Linux/Windows/Mac.
Lately I use mostly Linux due to better compiler and profiling support.
All performance-critical parts use parallelism + SIMD.
I learn and implement new optimization tricks from time to time.
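To illustrate what "parallel + SIMD" can look like in C++, here is a minimal sketch, not my actual code. It assumes OpenMP is available (the post above does not say which mechanism is used), and the function name `dot` is made up for the example. The combined `parallel for simd` directive asks the compiler to split the loop across threads and vectorize each thread's chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example kernel: dot product over two contiguous arrays.
// Assumption: OpenMP is the parallelization mechanism (enable with
// -fopenmp on GCC/Clang). Without that flag the pragma is ignored and
// the loop simply runs serially with the same result.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    const double* pa = a.data();
    const double* pb = b.data();
    const long long n = static_cast<long long>(a.size());
    double sum = 0.0;
    // Threads each take a chunk of the index range; within a chunk the
    // loop is a candidate for SIMD. Partial sums are combined at the end.
    #pragma omp parallel for simd reduction(+:sum)
    for (long long i = 0; i < n; ++i) {
        sum += pa[i] * pb[i];
    }
    return sum;
}
```

This only pays off when the data is contiguous and the loop body has no dependencies between iterations, which is why the storage layout matters so much.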
My ideas are fairly inefficient and ML-related: finding non-random features and separating them from chaos.
Memory access patterns are not cache friendly, and optimizations won't help much unless the core ideas and access patterns change.
If the data set is smaller than memory, all backtesting data is kept in memory.
If it is larger, I use an M.2 SSD based cache system that auto-optimizes from past access patterns.
https://www.elitetrader.com/et/threads/tick-data-storage.346878/
I discuss some ideas related to that in that thread; a lot is still open and undone.
The current plan is to use a motherboard with 2-3 M.2 drives and create a software RAID 0, in hopes of getting closer to RAM speed with sequential reads.
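As a rough idea of what "optimizing from past access patterns" could mean, here is a toy sketch. It is only one possible approach (plain access counting) and not the design from the thread. The class name `AccessStats`, the notion of numbered blocks, and the tie-breaking rule are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical tracker: counts how often each data block was read, so a
// later run can place the most used blocks on the fastest storage tier.
class AccessStats {
public:
    void record(std::uint64_t block_id) { ++hits_[block_id]; }

    // Returns up to `capacity` block ids, most frequently read first.
    // Ties are broken by lower block id so the order is deterministic.
    std::vector<std::uint64_t> hottest(std::size_t capacity) const {
        using Entry = std::pair<std::uint64_t, std::uint64_t>;  // (id, hits)
        std::vector<Entry> entries(hits_.begin(), hits_.end());
        std::sort(entries.begin(), entries.end(),
                  [](const Entry& x, const Entry& y) {
                      if (x.second != y.second) return x.second > y.second;
                      return x.first < y.first;
                  });
        if (entries.size() > capacity) entries.resize(capacity);
        std::vector<std::uint64_t> ids;
        ids.reserve(entries.size());
        for (const Entry& e : entries) ids.push_back(e.first);
        assert(ids.size() <= capacity);
        return ids;
    }

private:
    std::unordered_map<std::uint64_t, std::uint64_t> hits_;
};
```

A real system would also need to persist the counts between runs and decay old ones, which this sketch leaves out.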
Maybe you can recommend alternative ideas.
Lately I also thought about testing whether I can get decent performance by just setting up a large Linux swap drive on the fastest M.2 drives.
But then I have no control over how it gets buffered, as the OS decides.
To my knowledge Linux can also use LZ4 compression for swap, but at 5000 MB/s the CPU is the bottleneck and a much simpler algorithm would be needed.
Some modern CPUs have 60 MB+ caches now, so they can probably run 50 MB very fast as well.
Do you mean to say it creates better memory access patterns and performance if the data is in separate containers instead of a single container of structs containing all fields?
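To make the question concrete, this is how I picture the two layouts. It is a minimal sketch with made-up field names (`timestamp_ns`, `price`, `volume`), not anyone's actual tick format. The first is one container of structs, the second is one container per field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Layout 1: single container, each element carries every field.
struct Tick {
    std::int64_t timestamp_ns;  // hypothetical field names
    double       price;
    double       volume;
};

// Layout 2: separate containers, one per field.
struct TickColumns {
    std::vector<std::int64_t> timestamp_ns;
    std::vector<double>       price;
    std::vector<double>       volume;

    void push_back(const Tick& t) {
        timestamp_ns.push_back(t.timestamp_ns);
        price.push_back(t.price);
        volume.push_back(t.volume);
    }
    std::size_t size() const { return price.size(); }
};

// Only `price` is needed, but every cache line loaded also contains the
// timestamps and volumes of the neighbouring ticks.
double sum_price_structs(const std::vector<Tick>& ticks) {
    double sum = 0.0;
    for (const Tick& t : ticks) sum += t.price;
    return sum;
}

// Walks one dense array of doubles, so every byte fetched is used.
double sum_price_columns(const TickColumns& cols) {
    assert(cols.price.size() == cols.timestamp_ns.size());
    double sum = 0.0;
    for (double p : cols.price) sum += p;
    return sum;
}
```

If a pass touches only one or two fields, the second layout moves less memory and is easier to vectorize. If every pass needs all fields of a tick together, the first layout may be just as good.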
How long have you been working on your platform? I have spent ~6 years on mine and a lot could still be optimized and improved. Lately I have been getting time off due to long test periods.