....I currently let the data be dumped into a text file per exchange per hour. Need to think about storage soonish.
My idea is to go with Blobs. Storing market updates (ticks) in a table is wastefull like hell. Timestamp, order in Timestamp, all fields very long - ridicuous wastage.
My idea is:
* One table.
* Ine entry PER INSTRUMENT PER TIMESPAN.
The TImespan has to be determined. I would possibly not go further than an hour, may go as low as 15 minutes (which is good to keep the ES off times in one complete entry) - allows one to store hourly/ 15 minutes totals in the table and take longer durations from there.
Tick data can be stored extremely efficient if one uses a simple "store as ticks, not floats" and "store delta only" approach. I can get a trade down to about two bytesif all is right (same timestamp, same volume and price like last - that is not THAT rare), and 3 bytes if the trade is "fast after the last". 4 bytes if the volume is al ittle different. Most bid/ask run into 4 bytes. Extraction is extremely fast also with this approach.... only problem is if one Asks for "the last 3 minutes" (needs to read blob to discard data), but seriously, how often does that happen? Mostly charts are created X hours / days back, and there the approach is good.
With this approach (15 minutes) there is also a good amount of data in every blob. This reduces the reading overhead significantly - as well as disc storage (SQL Server stores blobs in 8kb pages - wastefull if the data is just some hundred bytes). During the nights etc. one may even safely go up to about an hour or more with all full data.
Oh, note - I dont store ticks, I store all (bid, ask, best bid and ask). The text files I currently use are about 25mb/hour for just 4 instruments
I considered the appraoch to store ticks in a table, but it is wastefull. I get data in microseconds (Millionth of a second) from Zen-Fire. I get a lot of data per microsecond often, so I need another field for the order in the timestamp. Then SQL Server has no efficient way to store delta prices, which means a lot of wasted space, wasted network bandwidth when retrieving - all eating up time like mad.