I am collecting historical data, market data and order book data from Interactive Brokers (IB), currently focusing only on the E-mini S&P 500 (ES). I plan to extend this to all shares and commodities, and will probably add another data feed, potentially DTN IQFeed.
My current implementation stores all data in PostgreSQL. I am planning to load it into a Java class, for example to hold all 1-minute bars of historical data for all ES contracts over the most recent 2 years. That is about 3.5 million rows, each with date_time, open, high, low, close and volume.
My current thinking is to have a class, e.g. EsHist, which has:
- an array of a class called Row (Row will have public variables to store date_time, open, etc.; I expect public variables to allow faster access)
- a hashtable whose key is date_time and whose value is a Row, referring to the same objects as the array above
- a rolling window of 4 million rows. I will allocate a 5-million-element array to hold the 4-million-row window. While the market is active, I will let the array grow past 4 million; when the market closes, I will drop the oldest rows and shift the remaining rows down, so that the rolling window starts at array[0] again.
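To make the idea concrete, here is a minimal sketch of the structure described above. It assumes date_time is stored as epoch milliseconds in a long, and the method names (append, trimToWindow, find) are hypothetical names chosen for illustration. Capacity and window size are constructor parameters so that it can be tried with small numbers.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** One 1-minute bar with public fields, as described in the post. */
final class Row {
    public final long dateTime; // assumption: date_time encoded as epoch millis
    public final double open, high, low, close;
    public final long volume;

    Row(long dateTime, double open, double high, double low, double close, long volume) {
        this.dateTime = dateTime;
        this.open = open;
        this.high = high;
        this.low = low;
        this.close = close;
        this.volume = volume;
    }
}

/** Fixed-capacity array plus a hash map that points at the same Row objects. */
final class EsHist {
    private final Row[] rows;        // e.g. 5 million slots
    private final int window;        // e.g. 4 million rows kept after a trim
    private int size;                // number of slots currently in use
    private final Map<Long, Row> byTime = new HashMap<>();

    EsHist(int capacity, int window) {
        if (window > capacity) {
            throw new IllegalArgumentException("window exceeds capacity");
        }
        this.rows = new Row[capacity];
        this.window = window;
    }

    /** Adds a bar at the end of the array and indexes it by date_time. */
    void append(Row r) {
        if (size == rows.length) {
            throw new IllegalStateException("array full; trim first");
        }
        rows[size++] = r;
        byTime.put(r.dateTime, r);
    }

    /** Called after the market closes: drops the oldest rows and shifts the rest to index 0. */
    void trimToWindow() {
        int drop = size - window;
        if (drop <= 0) {
            return;
        }
        for (int i = 0; i < drop; i++) {
            byTime.remove(rows[i].dateTime);
        }
        System.arraycopy(rows, drop, rows, 0, window); // safe for overlapping ranges
        Arrays.fill(rows, window, size, null);         // release the stale references
        size = window;
    }

    Row get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return rows[index];
    }

    /** Returns the bar with this date_time, or null if it is not in memory. */
    Row find(long dateTime) {
        return byTime.get(dateTime);
    }

    int size() {
        return size;
    }
}
```

With these names, new EsHist(5_000_000, 4_000_000) would match the sizes in the post. Note that the map returns a Row and not an array index, so getting from a date_time to a position in the array needs something extra.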
The advantages of using this structure:
- I can iterate sequentially, bar by bar, using the array index.
- I can quickly find a date_time using the hashtable, which refers to the specific row; from there I can get the row's index, which allows me to iterate sequentially from that point.
- The historical data will still be in PostgreSQL, but the most recent data, the 4-million-row rolling window, is always in memory and is reshifted when the market is closed. I will use the in-memory data as the feed for the algorithmic trading module.
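One possible way to get from a date_time to an array index, sketched here as an alternative to the hashtable and not as part of the design above: if bars are appended in strictly ascending time order, a binary search over a parallel long[] of timestamps returns the index directly. The class and method names (BarSeries, indexOf, averageCloseFrom) are hypothetical, date_time is again assumed to be epoch milliseconds, and only the close column is kept to keep the example short.

```java
import java.util.Arrays;

/** Columnar bars sorted by time; lookup by date_time uses binary search. */
final class BarSeries {
    private final long[] times;    // assumption: epoch millis, strictly ascending
    private final double[] closes; // one column shown; the others would follow the same pattern
    private int size;

    BarSeries(int capacity) {
        times = new long[capacity];
        closes = new double[capacity];
    }

    /** Appends a bar; rejects out-of-order timestamps so the array stays sorted. */
    void append(long time, double close) {
        if (size == times.length) {
            throw new IllegalStateException("series full");
        }
        if (size > 0 && time <= times[size - 1]) {
            throw new IllegalArgumentException("bars must arrive in ascending time order");
        }
        times[size] = time;
        closes[size] = close;
        size++;
    }

    /** Returns the index of the bar with exactly this timestamp, or -1 if absent. */
    int indexOf(long time) {
        int i = Arrays.binarySearch(times, 0, size, time);
        return i >= 0 ? i : -1;
    }

    /** Example of "find, then iterate": mean close over up to n bars starting at start. */
    double averageCloseFrom(long start, int n) {
        int from = indexOf(start);
        if (from < 0 || n < 1) {
            throw new IllegalArgumentException("unknown start time or non-positive n");
        }
        int end = Math.min(size, from + n);
        double sum = 0.0;
        for (int i = from; i < end; i++) {
            sum += closes[i];
        }
        return sum / (end - from);
    }
}
```

Whether this is actually faster or leaner than the Row array plus hashtable would need to be measured on the real data.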
I am running this Java code on a box with a 6-core i7 and 64 GB of RAM, running Ubuntu Linux.
Any thoughts or suggestions on how to improve this data structure?
Thank you.