Quote from erdewit:
I've played with HDF5 through its Python
bindings (called PyTables) and found it to be
unsuitable for a tick database.
First of all, it is slower than just working with
regular files.
Second, the HDF5 database is one huge file that
is easily corruptible. For example, if you accidentally
let two processes have write access to
the database then it will become corrupted.
Repairing didn't always work for me so then all
data would be lost.
Third, HDF5 is a hierarchical database where
objects are retrieved via a path-like key. This is
the same as with a regular filesystem and offers
no advantage whatsoever over just a plain
regular filesystem.
The only use case for HDF5 is when working with
datasets that are too large to fit into memory.
Tick data does not fall into this category.
What I am using is just simple files. One tick file
per instrument per day. It's easy to see what's
going on, it's easy to compress files and make
incremental backups. Reading speed is 1 M ticks/s
for text files and 10-20 M ticks/s for binary files.
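
To make the "one tick file per instrument per day" layout concrete, here is
a rough sketch of how it could look in Python. The record layout (timestamp,
price, size), the directory scheme and the function names are all my own
assumptions for illustration, not necessarily what the poster uses.

```python
import struct
import tempfile
from datetime import date
from pathlib import Path

# Hypothetical record layout (an assumption, not from the post):
# int64 timestamp in ns, float64 price, int32 size, little-endian, no padding.
TICK = struct.Struct("<qdi")


def tick_path(root, symbol, day):
    """One file per instrument per day, e.g. <root>/EURUSD/2024-01-02.bin."""
    return Path(root) / symbol / f"{day.isoformat()}.bin"


def append_ticks(path, ticks):
    """Append (timestamp_ns, price, size) tuples to the day's file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "ab") as f:
        f.write(b"".join(TICK.pack(*t) for t in ticks))


def read_ticks(path):
    """Read a whole day file back as a list of tuples."""
    data = Path(path).read_bytes()
    return list(TICK.iter_unpack(data))


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    path = tick_path(root, "EURUSD", date(2024, 1, 2))
    append_ticks(path, [(1, 1.5, 10), (2, 1.25, 20)])
    print(path, read_ticks(path))
```

Because each day is a separate file with fixed-size records, compressing or
backing up a single day only touches that one file, and the file size alone
tells you the tick count (size divided by TICK.size).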