Up until now I've only worked with historical data for a small set of underlying assets, mostly indices. Right now I am storing them in a file-system-based hierarchy: a directory for each symbol, containing one sub-directory for option chains and one for volatility surfaces. The option chain for each day lives in its own file, and the same goes for volatility surfaces. This gives me granular access across assets and dates, and the flexibility I need to store volatility surfaces in detail. Later I aggregate the volatility surfaces into historical implied volatility files; most of my filters/models work with the individual option chain and volatility surface files as well as the aggregated data.
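To make the layout concrete, here is a minimal sketch of how a daily file path is resolved. The sub-directory names (`chains`, `surfaces`), the ISO-date file name, and the `.csv` extension are assumptions for illustration only, not my exact naming.

```python
from datetime import date
from pathlib import Path

# Hypothetical sub-directory names; the real ones may differ.
KINDS = ("chains", "surfaces")


def daily_file(root: Path, symbol: str, kind: str, day: date) -> Path:
    """Build the path root/<symbol>/<kind>/<YYYY-MM-DD>.csv for one daily file."""
    if kind not in KINDS:
        raise ValueError(f"kind must be one of {KINDS}, got {kind!r}")
    # One file per symbol, per data kind, per trading day.
    return root / symbol / kind / f"{day.isoformat()}.csv"


if __name__ == "__main__":
    print(daily_file(Path("data"), "SPX", "chains", date(2012, 3, 1)))
```

Every lookup by symbol and date is a single path construction, which is what makes the per-day granularity convenient.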
However, I am now planning to expand to many more underlying assets (from about a hundred to a few thousand), and I am starting to wonder whether there will be any performance penalty for storing so many files and directories on disk. In essence, there will be about 2k directories, each holding about 2.5k daily files. Is that going to be an issue? If so, what are my alternatives (hardware and software)?
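To put rough numbers on the scale, here is a back-of-the-envelope count using the round figures above. The function name is just for illustration, and the counts are estimates, not measurements.

```python
def estimate_file_counts(directories: int, files_per_directory: int) -> dict:
    """Return rough totals for a flat 'one file per day per directory' layout."""
    return {
        "directories": directories,
        "files_per_directory": files_per_directory,
        "total_files": directories * files_per_directory,
    }


if __name__ == "__main__":
    # About 2k directories with about 2.5k daily files each.
    counts = estimate_file_counts(2_000, 2_500)
    print(f"{counts['total_files']:,} files in total")
```

So the concern is less the roughly 2.5k entries in any single directory and more the total of around five million files across the whole tree.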
