For what it's worth, I have both a US stocks database and an EOD Option database.
I have an i5-2500 with 32 GB RAM and a 2 TB data storage.
I run FreeBSD and use Postgres for the database.
I get the data from Yahoo Finance.
There are 3 issues with this thread - (1) designing the database schema, (2) getting the data into the database, and (3) getting the data you want out of the database.
Item #1 is the most important because it directly affects #2 and #3. Number 3 is the 2nd most important item. What I've learned is that if you want optimal query performance, you need to identify and create indices on your frequently accessed data. Due to my liberal creation of indices (single and multi-column), I can get the entire price history for a stock symbol in milliseconds using Perl's DBI.
I perform all the calculations I desire in my queries ahead of time, and store them in the database. This way, the data I want is instantly available. I don't have to repeat the same calculations over and over for each study script I create.
I set up my databases in a relational manner because I don't want to have 1 gigantic table with 50-100 columns.
I also use Postgres because it works, I like it, I understand it, I know how to optimize the 3 items I listed above, and I know how to use Perl to get the data in and out in (mostly) optimal fashion.
My schema is as follows:
stocks_sectors: pk, sector, yahoo_url
stocks_industries: pk, industry, sector (fkey), yahoo_url
stocks_etfs: pk, symbol, company, type (stock or etf), exchange, yahoo_url, country, sector(fkey), industry(fkey), active, trades_options, etf_fund_family, etf_category
stocks_prices: pk, symbol (fkey), date, open, high, low, close, volume, factor_higher (for splits - multiply), factor_lower(for splits - divide)
stocks_facts: pk, symbol(fkey), shares_outstanding, yahoo_url, first_day_traded, days_traded, all_time_high, all_time_high_date, all_time_low, all_time_low_date
stocks_splitsdividends: pk, symbol(fkey) date, type (split or dividend), split, dividend, source (yahoo or stocksplithistory.com)
stocks_earningsdates: pk, symbol(fkey) date, source
stocks_volumes: pk, symbol(fkey), date, one_month, two_months, three_months, six_months, nine_months, one_year, two_year - this table stores the avg volume for the period for each date.
stocks_stddev: pk, symbol(fkey), date, std5, std10, std15, std20, std30, std40, std50, std60, std80, std90, std100, std120, std126, std150, std180 std200, std252, std365, std504, std730, std755 -- this table stores the std deviation of closing prices over 5 days, 10 days, etc.
stocks_hv: pk, symbol(fkey), date, hv5-hv755 -- this table stores the historical volatility of closing prices over 5 days, 10 days, etc.
stocks_pricespikes: pk, symbol(fkey), date, price_spike (based upon std20 in prices_stddev table)
Regarding the options database, due to the volume of data, I've simplified and created tables by expiration year. I haven't really split it up like it should.
But for the record, here are the row counts for the options database (from historicaloptionsdata.com):
2002 - 15.9 million
2003 - 25.9 million
2004 - 31 million
2005 - 36.9 million
2006 - 43.4 million
2007 - 49.3 million
2008 - 65.7 million
2009 - 77.3 million
2010 - 86.3 million
2011 - 101 million
2012 - 118.7 million
2013 - 130.8 million
2014 (until 8/31) - 137.4 million
Once again, liberal use of single and multi-column indices is the only way to pull the data out in milliseconds instead of minutes....
Hopefully this is a useful idea from someone who's doing instead of talking...
The key is finding a solution you like, you easily understand, you can extend with a little bit of coding, and most importantly, gives you the answers you seek as quickly and conveniently as possible with minimal maintenance and other issues.