I'm more interested in hearing stories about how you did it one way, then realized what a f* up it was before deciding on reorganizing another way.
Try reading and writing 50,000 symbols as individual files on NTFS. The process can take several minutes.
Then try a memory-mapped file, which should take less than a second.
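If you want to try the comparison yourself, here is a rough Python sketch (Python rather than C# only to keep it short). The record layout (a single 8-byte price per symbol), the file names and the compare() helper are all made up for illustration, and this is a simplified test, not a proper benchmark:

```python
import mmap
import os
import struct
import tempfile
import time

# Hypothetical layout: one little-endian float64 price per symbol.
RECORD = struct.Struct("<d")

def write_individual_files(directory, n_symbols):
    """Write one tiny file per symbol, then read every file back."""
    for i in range(n_symbols):
        with open(os.path.join(directory, f"SYM{i}.bin"), "wb") as f:
            f.write(RECORD.pack(float(i)))
    total = 0.0
    for i in range(n_symbols):
        with open(os.path.join(directory, f"SYM{i}.bin"), "rb") as f:
            total += RECORD.unpack(f.read(RECORD.size))[0]
    return total

def write_mapped_file(path, n_symbols):
    """Write and read the same records through one memory-mapped file."""
    # Pre-size the file with zeros; an empty file cannot be mapped.
    with open(path, "wb") as f:
        f.write(bytes(n_symbols * RECORD.size))
    total = 0.0
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        for i in range(n_symbols):
            RECORD.pack_into(mm, i * RECORD.size, float(i))
        for i in range(n_symbols):
            total += RECORD.unpack_from(mm, i * RECORD.size)[0]
    return total

def compare(n_symbols=50_000):
    """Return both checksums and the elapsed seconds for each approach."""
    with tempfile.TemporaryDirectory() as d:
        t0 = time.perf_counter()
        a = write_individual_files(d, n_symbols)
        t1 = time.perf_counter()
        b = write_mapped_file(os.path.join(d, "all.bin"), n_symbols)
        t2 = time.perf_counter()
    return a, b, t1 - t0, t2 - t1

if __name__ == "__main__":
    a, b, t_files, t_map = compare()
    print(f"individual files: {t_files:.3f}s, mapped file: {t_map:.3f}s")
```

The two checksums should match, which confirms both paths stored the same data. The timings depend on the disk, the file system and the OS cache, so treat them as a rough indication only.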
This was discussed here before:
http://www.elitetrader.com/vb/showthread.php?threadid=81345&perpage=6&pagenumber=3
HDF5 or not, if you are constantly reading and writing many symbols, the best option is one large memory mapped file (can be several terabytes). Otherwise your file system becomes a bottleneck. You will only need to maintain the linear database once every so often by "growing" it.
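To make the "linear database" idea concrete, here is a minimal Python sketch of fixed-width records in one memory-mapped file. Everything in it is an assumption for illustration: the LinearDB class, the 16-byte record (int64 timestamp plus float64 price) and the row-major layout where each row holds one record per symbol. With that layout, "growing" just appends zeroed rows at the end and remaps, so existing data never moves.

```python
import mmap
import os
import struct

# Hypothetical record: int64 timestamp + float64 price, 16 bytes, no padding.
BAR = struct.Struct("<qd")

class LinearDB:
    """Fixed-width records in one memory-mapped file (a sketch, not a full store)."""

    def __init__(self, path, n_symbols, n_rows):
        # Creates a fresh zero-filled file; n_symbols and n_rows must be > 0
        # because an empty file cannot be mapped.
        self.n_symbols = n_symbols
        self.n_rows = n_rows
        self._file = open(path, "w+b")
        self._file.write(bytes(self._size(n_rows)))
        self._file.flush()
        self._map = mmap.mmap(self._file.fileno(), 0)

    def _size(self, n_rows):
        return n_rows * self.n_symbols * BAR.size

    def _offset(self, row, symbol):
        if not (0 <= row < self.n_rows and 0 <= symbol < self.n_symbols):
            raise IndexError("row or symbol out of range")
        return (row * self.n_symbols + symbol) * BAR.size

    def put(self, row, symbol, timestamp, price):
        BAR.pack_into(self._map, self._offset(row, symbol), timestamp, price)

    def get(self, row, symbol):
        return BAR.unpack_from(self._map, self._offset(row, symbol))

    def grow(self, extra_rows):
        """Append zeroed rows, then remap. Closing and reopening the map
        is used here because mmap.resize() is not available on every platform."""
        self._map.flush()
        self._map.close()
        self._file.seek(0, os.SEEK_END)
        self._file.write(bytes(self._size(extra_rows)))
        self._file.flush()
        self.n_rows += extra_rows
        self._map = mmap.mmap(self._file.fileno(), 0)

    def close(self):
        self._map.flush()
        self._map.close()
        self._file.close()
```

Usage would look something like db = LinearDB("ticks.bin", n_symbols=50_000, n_rows=390), then db.put(row, symbol, timestamp, price) and db.get(row, symbol). Every lookup is a single offset calculation, with no per-symbol file open and no index to walk.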
A single memory mapped file (or "linear database") is also much faster than a relational database such as MSSQL, MySQL, etc.
More on that:
"Having an RDBMS doesn't mean instant decision-support nirvana. As enabling as RDBMSs have been for users, they were never intended to provide powerful functions for data synthesis, analysis, and consolidation (functions collectively known as multidimensional data analysis)." - Ted Codd, inventor of the relational database model, 1993.
A look at traditional data storage
SQL databases consist of a set of row/column-based "tables", indexed by a "data dictionary". A table is a "container" that stores data. In reality, a table looks a lot like a spreadsheet, as it is composed of rows (records) and each row is composed of columns (fields). A collection of related tables is known as a database.
Using the very flexible SQL (structured query language), you can retrieve data from any table, or groups of related tables, and have that data presented to you as a "view".
This basic functionality, and the flexibility to store and relate almost anything, is what makes the RDBMS model so powerful and so widely used for nearly every serious business application.
Unfortunately, this "one size fits all" approach to data storage and retrieval is why the RDBMS model fails for financial applications.
The RDBMS model produces substantial overhead due to its inherent multiple row and table record structures. When you heap indices, clusters, and procedures on top, you create even more overhead which slows down performance considerably.
Since all RDBMS records are equally "important" to the database, they are not optimized for speed.
Also, since an RDBMS has no inherent data compression methods, it is usually combined with exception reporting and averaging techniques, which may result in data loss and inaccurately reproduced data.
RDBMS are too slow
The speed of writing to an RDBMS is quite slow (from the perspective of the computer). Major RDBMS vendors often claim benchmarks that include very high transactions per second (TPS). What they don't say is that the TPS speed refers to actions performed on the data after it is already in the database, and not to the speed at which it is written to the database or the data retrieval speed. What goes on inside of the database is of little interest to the end user. The data acquisition speed, and the actual time that it takes to put a set of results onto the screen, is where money is made and lost.
An additional SQL drawback, from the perspective of any financial data application, is that statistics are not automatically calculated by the RDBMS because SQL mathematics is limited to sums, minimums, maximums, and averages.
Note
C# supports memory-mapped files as of .NET 4.0 (the System.IO.MemoryMappedFiles namespace). Let me know if you need any pointers (no pun intended).