HDF5 Layout for Multiple Stocks

Quote from sma202:

This thread is going off-topic. The question was how to best structure HDF5 for stock time-series. Whether you use a custom-built file or a commercial db is irrelevant; that depends on your own mix of strategy and timeframe.

Frankly, I'd like to hear how other people are using HDF5.

The website you referenced talks about Amazon and map-reduce.
 
I've been wanting to take some time to study possible uses of map reduce, but have never gotten around to it. Sounds like a powerful setup.
 
Quote from vikana:

I've been wanting to take some time to study possible uses of map reduce, but have never gotten around to it. Sounds like a powerful setup.

At first I thought map-reduce was a really difficult topic, but as I dug into it, what I liked was that it is a clean way to interact with data and set it up in different ways.

You have your data in format X. But when data mining you want format Y, which has ifs and the like in it. Normally you would write code to create that new format. With map-reduce, the server does it for you.

The end result is that you have multiple questions answered for you, and the data is in a format that you can easily process. I use map-reduce to scan equities into a score on a scale of 0 to 6.

In most cases map-reduce is something that comes with a NoSQL database, but there is no reason why you can't write it yourself, especially when you have access to the source code, as with HDF5.
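
For anyone who wants to see the "write it yourself" idea in miniature, here is a rough Python sketch. The rows, the two scoring rules, and the function names are all made up for illustration; this is not the 0-to-6 scan described above, just the general shape: the map step holds the ifs and emits (symbol, points) pairs, and the reduce step combines the points per symbol.

```python
from collections import defaultdict

# Hypothetical input in "format X": one row per symbol per day.
bars = [
    {"symbol": "AAA", "open": 10.0, "close": 10.5, "volume": 1200},
    {"symbol": "AAA", "open": 10.5, "close": 10.2, "volume": 900},
    {"symbol": "BBB", "open": 20.0, "close": 21.0, "volume": 5000},
    {"symbol": "BBB", "open": 21.0, "close": 21.4, "volume": 7000},
]


def map_bar(bar):
    """Map step: turn one row into a (key, value) pair.

    The conditional rules live here. Both rules are invented examples.
    """
    points = 0
    if bar["close"] > bar["open"]:   # up day
        points += 1
    if bar["volume"] > 1000:         # arbitrary volume threshold
        points += 1
    return bar["symbol"], points


def reduce_scores(pairs):
    """Reduce step: sum all values that share the same key."""
    totals = defaultdict(int)
    for symbol, points in pairs:
        totals[symbol] += points
    return dict(totals)


# "Format Y": one score per symbol.
scores = reduce_scores(map(map_bar, bars))
print(scores)
```

A server-side map-reduce does the same two steps, only spread across machines and with the grouping by key handled for you.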
 
Quote from sma202:

This thread is going off-topic. The question was how to best structure hdf5 for stock time-series. Whether you use a custom built file or commercial db is irrelevant, that depends on your own mix of strategy and timeframe.

Frankly, I'd like to hear how other people are using HDF5.

Totally agree. Why do people feel the need to argue every stupid point to death on here, at the total expense of the thread itself?
If you prefer SQL, great; go make a new thread.

I would love to hear more about the actual topic, as to me the biggest problem with HDF5 is the lack of educational material for the non-specialist.
 
First, great subforum, ModulusFE. It has been years since I've even wanted to check this forum on a regular basis. It would be nice if people interested in this subforum self-regulated and tried not to get too far off topic. If you like other things, that is great, but there is just not enough info on HDF5 for trading to clog up a good thread with debate, even good debate.

I've wanted to learn HDF5 for years but always run into a wall and get uninspired.

Maybe we can try "One directory per day with each column broken out into a separate file" on a simple structure using Yahoo data.
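
To make that layout concrete, here is a minimal sketch using only the Python standard library. It uses plain binary files rather than HDF5, and the file extension, column names, and sample values are assumptions for illustration only. In HDF5 terms the same shape would presumably become one group per day with one dataset per column.

```python
import tempfile
from array import array
from pathlib import Path


def write_day(root, day, columns):
    """Write each column of one trading day to its own file.

    Resulting layout (hypothetical naming): <root>/<day>/<column>.bin
    """
    day_dir = Path(root) / day
    day_dir.mkdir(parents=True, exist_ok=True)
    for name, values in columns.items():
        with open(day_dir / f"{name}.bin", "wb") as fh:
            array("d", values).tofile(fh)  # raw 8-byte floats


def read_column(root, day, name):
    """Read back a single column without touching the others."""
    col = array("d")
    with open(Path(root) / day / f"{name}.bin", "rb") as fh:
        col.frombytes(fh.read())
    return col.tolist()


# Usage with made-up values for one day.
with tempfile.TemporaryDirectory() as root:
    write_day(root, "2012-01-03", {
        "open": [10.0, 10.5, 10.2],
        "close": [10.5, 10.2, 10.8],
        "volume": [1200.0, 900.0, 1500.0],
    })
    closes = read_column(root, "2012-01-03", "close")
    print(closes)
```

The point of splitting by column is that a scan over closing prices reads only the close files and skips everything else.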
 
Quote from darthtrader3.9:

I've wanted to learn HDF5 for years but always run into a wall and get uninspired.

Maybe we can try "One directory per day with each column broken out into a separate file" on a simple structure using Yahoo data.

This is a good idea as a test case. If I manage to get some free time after the close, I'll give this a shot before doing something more complex and maybe write back to the thread.
 
I use HDF5 for futures: one file per symbol per day, capturing all messages (N levels deep). For single stocks you would need a lot of space. As far as layout goes, it's the book.

You can look into:
1) SSDs
2) a file system like Ceph
3) turning off all the OS stuff that updates when you write to disk, like last access time
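
On point 3, access-time updates are usually switched off for a whole file system with the noatime mount option. As a different, per-file variant, Linux also has an O_NOATIME open flag that asks the kernel not to update the access time when a file is read. The sketch below assumes Linux for the flag to have any effect; the helper name and sample bytes are made up, and on other platforms it falls back to a normal read.

```python
import os
import tempfile


def read_without_atime(path):
    """Read a file, asking the kernel not to update its access time.

    O_NOATIME is Linux-specific; where it is missing, no extra flag is set.
    """
    flags = os.O_RDONLY | getattr(os, "O_NOATIME", 0)
    try:
        fd = os.open(path, flags)
    except PermissionError:
        # The kernel only allows O_NOATIME on files the caller owns.
        fd = os.open(path, os.O_RDONLY)
    with os.fdopen(fd, "rb") as fh:
        return fh.read()


# Usage with a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"tick data")
data = read_without_atime(tmp.name)
os.remove(tmp.name)
print(data)
```

This only covers reads of individual files; the mount option is the broader fix.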
 