I'm looking for some advice on what the best practices are for storing options and futures data. How do you folks store these instruments? I've been using data that I (quite literally with a USB flash drive) fetch from school, but that's not optimal, so I've decided to build a database. Here's my idea for the design:
1. Use the IQFeed symbol lookup from their C++ API to get all options for a specific security/contracts on a specific future.
....e.g. search for "index/equity options" on "SPY"
2. From the list that's returned, I check to see if each a table for each result on the list exists in my database. If not, create the table.
3. Loop over the list and store the data in the right table.
Concerns:
1. Relying on IQFeed's symbol lookup to catch every option or future. There's obviously some algorithm that determines when new futures or options are created and what their strikes/expirations/deliveries are. Where can I find these algorithms so that I can code them up and know what is created when instead of relying on the IQFeed symbol lookup?
2. Is it best to create a table for each option/future? Throw them all together in one table? Throw all of it in one big table? I imagine what I've designed will probably make the queries fastest of those three options, but I'm here to learn so tell me if I'm wrong.
3. The size of my database. This thing is obviously going to become HUGE. I figure I can't really know how huge until I try or someone tells me. I've been reading about different compression options from my database of choice and believe I'm using the right tools here, but maybe I will still need to trim back the amount of data I'm looking at. To be clear, I'm looking to database TRADES (not quotes) on S&P 500 equities and options, a few futures, and a few ETF/P/Ns. I really have no idea how big this is gonna get, so I'm just gonna try it and see. I'm prepared to invest in 4 or 5 TB hard drives every once in a while, but I don't have the means or the expertise to run a legitimate data server (I don't think I do at least, don't even know what the cost would be...if anyone has any estimates of what it will cost to scale this approach I'm all ears).
4. Backups. Simply put, I want to make sure all this data I'm working on capturing is safe. I'm prepared to pay a good amount for backup space, but again I worry it could become unmanageable as I scale.
5. Choice of databases. Currently kdb+ 32bit edition, I love it. I've heard anecdotally that you can address more RAM if you use multiple kdb+ processes? If anyone knows anything about that I'd love to hear. I've used other (SQL and noSQL) databases in the past but this is the fastest I've played with. I know the general consensus on EliteTrader is to store binary files, but I don't really know how to go about querying a bunch of flat files in the most efficient way, so either I learn to do that or stick with kdb+ 32 bit. Again, I'm open to advice there.
Thanks for your ideas and feedback.
1. Use the IQFeed symbol lookup from their C++ API to get all options for a specific security/contracts on a specific future.
....e.g. search for "index/equity options" on "SPY"
2. From the list that's returned, I check to see if each a table for each result on the list exists in my database. If not, create the table.
3. Loop over the list and store the data in the right table.
Concerns:
1. Relying on IQFeed's symbol lookup to catch every option or future. There's obviously some algorithm that determines when new futures or options are created and what their strikes/expirations/deliveries are. Where can I find these algorithms so that I can code them up and know what is created when instead of relying on the IQFeed symbol lookup?
2. Is it best to create a table for each option/future? Throw them all together in one table? Throw all of it in one big table? I imagine what I've designed will probably make the queries fastest of those three options, but I'm here to learn so tell me if I'm wrong.
3. The size of my database. This thing is obviously going to become HUGE. I figure I can't really know how huge until I try or someone tells me. I've been reading about different compression options from my database of choice and believe I'm using the right tools here, but maybe I will still need to trim back the amount of data I'm looking at. To be clear, I'm looking to database TRADES (not quotes) on S&P 500 equities and options, a few futures, and a few ETF/P/Ns. I really have no idea how big this is gonna get, so I'm just gonna try it and see. I'm prepared to invest in 4 or 5 TB hard drives every once in a while, but I don't have the means or the expertise to run a legitimate data server (I don't think I do at least, don't even know what the cost would be...if anyone has any estimates of what it will cost to scale this approach I'm all ears).
4. Backups. Simply put, I want to make sure all this data I'm working on capturing is safe. I'm prepared to pay a good amount for backup space, but again I worry it could become unmanageable as I scale.
5. Choice of databases. Currently kdb+ 32bit edition, I love it. I've heard anecdotally that you can address more RAM if you use multiple kdb+ processes? If anyone knows anything about that I'd love to hear. I've used other (SQL and noSQL) databases in the past but this is the fastest I've played with. I know the general consensus on EliteTrader is to store binary files, but I don't really know how to go about querying a bunch of flat files in the most efficient way, so either I learn to do that or stick with kdb+ 32 bit. Again, I'm open to advice there.
Thanks for your ideas and feedback.