Quote from gtor514:
As a starting point, you could store the time in a single long variable, e.g. as Unix time (seconds since the 1970 epoch).
That's the most efficient solution!
Alternatively, you can use the OLE date-time format, in which a date-time is a floating-point number and an increase of 1 means exactly one day (24 hours) later. The best part of the OLE format is that dates and times stored this way in a CSV file can be read, understood and saved by Excel.
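If you keep Unix time internally but want Excel-friendly output, converting between the two is simple arithmetic. A minimal sketch, assuming times are UTC and no time-zone shift is wanted: the OLE epoch is 1899-12-30, which is 25569 days before 1970-01-01, so an OLE date is just Unix seconds divided by 86400 plus 25569. The function names `unixToOle`/`oleToUnix` are hypothetical. Excel's 1900 date system agrees with OLE dates from 1 March 1900 onward, which covers any realistic price history.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Days between the OLE Automation epoch (1899-12-30 00:00) and the Unix epoch
// (1970-01-01 00:00).
constexpr double kOleDaysAtUnixEpoch = 25569.0;
constexpr double kSecondsPerDay = 86400.0;

// Unix time (seconds since 1970-01-01 UTC) -> OLE date (days since 1899-12-30,
// fractional part = time of day). No time-zone conversion is applied.
double unixToOle(std::int64_t unixSeconds) {
    return static_cast<double>(unixSeconds) / kSecondsPerDay + kOleDaysAtUnixEpoch;
}

// OLE date -> Unix time, rounded to the nearest second to absorb
// floating-point error in the fractional day.
std::int64_t oleToUnix(double oleDate) {
    assert(std::isfinite(oleDate));
    return static_cast<std::int64_t>(
        std::llround((oleDate - kOleDaysAtUnixEpoch) * kSecondsPerDay));
}
```

For example, noon UTC on 1970-01-03 (Unix time 216000) becomes 25571.5. Rounding in `oleToUnix` matters because a fractional day such as 1/3 cannot be stored exactly in a double.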
As for the data structure, as others have suggested, a vector of OHLC structures is better than four separate vectors.
I assume you won't be adding new dates; instead, you'll just read the input file from the start. So I would avoid list containers like the plague.
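A minimal sketch of the "vector of OHLC structures" layout, assuming each bar carries its own Unix timestamp. The struct, its field names and the `highestHigh` helper are hypothetical, chosen only to show a typical query over a range of bars.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One bar; field names are illustrative.
struct OHLC {
    std::int64_t time;  // bar start as Unix seconds
    double open;
    double high;
    double low;
    double close;
};

// "Array of structures": the fields of one bar are adjacent in memory,
// so a bar is read, copied and passed around as a unit.
using BarSeries = std::vector<OHLC>;

// Example query: highest high over the half-open bar range [first, last).
double highestHigh(const BarSeries& bars, std::size_t first, std::size_t last) {
    assert(first < last && last <= bars.size());
    double best = bars[first].high;
    for (std::size_t i = first + 1; i < last; ++i) {
        if (bars[i].high > best) best = bars[i].high;
    }
    return best;
}
```

Since the file is read once from start to end, `push_back` into a `std::vector` (optionally after `reserve()` if the number of lines is known) is all that's needed. A `std::list` only pays off for insertions in the middle; here it would just add a heap allocation per node and scatter the bars across memory.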
A few ideas:
1) If you want to use strings for dates and times anyway, use a sortable format: year, month, day, hour, minute, in that order, each with leading zeros and with a 24-hour clock. Something like yyyy-MM-dd HH:mm or yy/MM/dd HH:mm. Then the strings will sort in the same order as the corresponding dates.
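A minimal sketch of producing such strings with the standard `std::strftime`, where `%Y-%m-%d %H:%M` is the strftime spelling of yyyy-MM-dd HH:mm. The helpers `formatSortable` and `makeTm` are hypothetical names for this example. Because every field is fixed-width and ordered from most to least significant, ordinary string comparison gives chronological order.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>

// Format a broken-down time as "yyyy-MM-dd HH:mm" (24-hour clock, zero-padded).
std::string formatSortable(const std::tm& t) {
    char buf[32];
    std::size_t n = std::strftime(buf, sizeof buf, "%Y-%m-%d %H:%M", &t);
    assert(n > 0);  // 0 would mean the buffer was too small
    return std::string(buf, n);
}

// Convenience constructor for the example; std::tm counts years from 1900
// and months from 0.
std::tm makeTm(int year, int month, int day, int hour, int minute) {
    std::tm t{};
    t.tm_year = year - 1900;
    t.tm_mon = month - 1;
    t.tm_mday = day;
    t.tm_hour = hour;
    t.tm_min = minute;
    return t;
}
```

For example, `formatSortable(makeTm(2011, 3, 7, 9, 5))` yields `"2011-03-07 09:05"`, which compares less than `"2011-11-02 14:30"`, as it should. A US-style `"3/7/2011"` compares greater than `"11/2/2011"`, because `'3' > '1'`, which is exactly the problem the fixed-width format avoids.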
2) Consider adding to the OHLC structure a boolean value meaning "price exists". Then there will be a bar for every minute and no need to search for the right time: a bar 60 minutes later is exactly 60 positions later in the vector/list/array.
Note that if you implement typical indicators such as a moving average in a naive way, this storage method comes with a performance penalty. For example, a 30-period moving average needs 30 valid prices, so you will have to iterate back through any gaps in the price series until you find 30 bars where the price is not missing.
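A minimal sketch of both halves of idea 2, assuming one slot per minute starting at a minute-aligned Unix time. `MinuteBar`, `MinuteGrid`, `indexOf` and `naiveSma` are hypothetical names. `indexOf` shows the search-free lookup; `naiveSma` shows the naive moving average that has to walk back over empty slots.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// One slot per minute; hasPrice is the "price exists" flag.
struct MinuteBar {
    bool hasPrice = false;
    double close = 0.0;
};

// Dense grid: slots[i] covers the minute starting at startUnix + 60 * i.
struct MinuteGrid {
    std::int64_t startUnix;  // assumed to be aligned to a whole minute
    std::vector<MinuteBar> slots;

    // No searching: the bar 60 minutes later is exactly 60 slots later.
    std::size_t indexOf(std::int64_t unixTime) const {
        assert(unixTime >= startUnix);
        return static_cast<std::size_t>((unixTime - startUnix) / 60);
    }
};

// Naive simple moving average of the last `period` valid closes up to and
// including slot `last`. It walks backwards over empty slots, so its cost is
// period + (number of gaps crossed) rather than just period.
std::optional<double> naiveSma(const MinuteGrid& g, std::size_t last, std::size_t period) {
    assert(period > 0 && last < g.slots.size());
    double sum = 0.0;
    std::size_t found = 0;
    std::size_t i = last + 1;
    while (i > 0 && found < period) {
        --i;
        if (g.slots[i].hasPrice) {
            sum += g.slots[i].close;
            ++found;
        }
    }
    if (found < period) return std::nullopt;  // not enough history yet
    return sum / static_cast<double>(period);
}
```

On a thinly traded security most slots may be empty, so each call can scan far more than `period` slots; that is the penalty described above.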
3) To handle bank holidays and weekends better, consider storing a list of only the days on which the security trades, with each day's intraday data arranged as an array. This way each day's data can start at market open rather than at midnight, and it is still very easy to find the right time on any day.
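A minimal sketch of idea 3, assuming a per-day record that knows its own market-open time. `Bar`, `TradingDay`, `barAt`, the yyyymmdd integer date and the minutes-after-midnight `openMinute` field are all hypothetical choices for this example. Non-trading days simply have no entry in the list of days, and within a day the right bar is found by subtracting the open time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Bar {
    double open, high, low, close;
};

// One traded session. Weekends and holidays simply have no TradingDay entry.
struct TradingDay {
    int date;               // yyyymmdd, e.g. 20110307 (hypothetical encoding)
    int openMinute;         // market open as minutes after midnight, e.g. 570 = 09:30
    std::vector<Bar> bars;  // bars[0] is the first minute after the open
};

// Bar for a given clock time on this day, or nullptr if outside the session.
const Bar* barAt(const TradingDay& day, int minuteOfDay) {
    assert(day.openMinute >= 0 && day.openMinute < 24 * 60);
    if (minuteOfDay < day.openMinute) return nullptr;
    std::size_t i = static_cast<std::size_t>(minuteOfDay - day.openMinute);
    return i < day.bars.size() ? &day.bars[i] : nullptr;
}
```

With a 09:30 open (minute 570), the 10:00 bar (minute 600) is `bars[30]`, and no storage is spent on the hours before the open.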
4) If you want to find the right date in an ordered array of days, binary search is much faster than iterating through all days (about log2(n) comparisons instead of up to n).
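The standard library already provides binary search as `std::lower_bound`. A minimal sketch, assuming the days are sorted ascending by a yyyymmdd integer; `TradingDay` and `findDay` are hypothetical names, and the struct is trimmed to the date field only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct TradingDay {
    int date;  // yyyymmdd (hypothetical encoding); sorts like the date itself
    // intraday bars would go here
};

// Index of `date` in `days`, or -1 if that day was not traded.
// Requires `days` sorted ascending by date; lower_bound does O(log n) comparisons.
std::ptrdiff_t findDay(const std::vector<TradingDay>& days, int date) {
    // Debug-only sanity check (O(n), compiled out with NDEBUG).
    assert(std::is_sorted(days.begin(), days.end(),
                          [](const TradingDay& a, const TradingDay& b) { return a.date < b.date; }));
    auto it = std::lower_bound(days.begin(), days.end(), date,
                               [](const TradingDay& d, int value) { return d.date < value; });
    if (it == days.end() || it->date != date) return -1;
    return it - days.begin();
}
```

`lower_bound` returns the first element not less than the requested date, so the extra equality check distinguishes "found" from "this would be the insertion point", e.g. a Saturday that has no entry.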
If you want speed at the cost of some flexibility, consider implementing backtesting as matrix operations. This way you can use one of the popular linear algebra implementations, such as ATLAS, MKL or GotoBLAS. For a case study of how linear algebra subroutines can be used to massively speed up backtesting, see Amibroker and its Amibroker Formula Language (AFL).
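A minimal sketch of what "backtesting as array operations" can mean in its simplest form, assuming a strategy has already produced a position for every bar (+1 long, 0 flat, -1 short). Instead of a bar-by-bar loop with branching, the P&L is a dot product of the position vector and the vector of price changes. `priceChanges` and `strategyPnl` are hypothetical names, costs and slippage are ignored, and `std::inner_product` stands in for a BLAS routine.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// changes[i] = close[i + 1] - close[i]: the move earned by a position held
// from the close of bar i to the close of bar i + 1.
std::vector<double> priceChanges(const std::vector<double>& close) {
    std::vector<double> changes(close.empty() ? 0 : close.size() - 1);
    for (std::size_t i = 0; i + 1 < close.size(); ++i) {
        changes[i] = close[i + 1] - close[i];
    }
    return changes;
}

// Strategy P&L in price points: sum over i of positions[i] * changes[i].
// With a BLAS library this is one call: cblas_ddot(n, pos.data(), 1, chg.data(), 1).
double strategyPnl(const std::vector<double>& positions, const std::vector<double>& changes) {
    assert(positions.size() == changes.size());
    return std::inner_product(positions.begin(), positions.end(), changes.begin(), 0.0);
}
```

For closes 10, 11, 13, 12 the changes are 1, 2, -1; positions 1, 1, -1 earn 1 + 2 + 1 = 4 points. Evaluating many parameter sets at once turns these vectors into matrices, which is where optimized BLAS routines pay off.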