Quote from Vorpal:
I would agree with a security table and a market data table for proper design.
"security" table
market_id autonumber int [primary key]
symbol varchar(5)
name varchar(50)
(IE: 1, 'AA', 'Alcoa Inc')
(possibly add fields for CUSIP or "delisted", because stocks can share the same symbol over time or you may only want to query active stocks).
"market_data" table
market_id int [primary key]
timestamp datetime [primary key]
openprice decimal(7,2)
highprice decimal(7,2)
lowprice decimal(7,2)
closeprice decimal(7,2)
volume int
(IE: 1, '11/14/2007 03:01:00 PM', 38, 39, 37, 37.5, 1000)
I chose decimal(7,2) for the price data so you could allow 5 digits to the left of the decimal place. You can't store large values like BRK/A in there, but you keep your field size to 5 bytes instead of 8 bytes. This can add up...
Performance and disk space will be your biggest issue with that much data. Somebody check my math, but based on the above design, you are looking at generating 36 bytes per record. That doesn't sound like much, but 36bytes x 900k rows per day = 31mb per day (1024 bytes per KB and 1024 KB per MB). And that doesn't count the size of the index on the primary key (which you need if you want to be able to query the data with any speed). It won't take long for this database to get huge if you plan on archiving the data indefinitely. The free versions of MS SQL only permit a max database size of a few GB, so you may run through that pretty fast unless you want to go with a commercial version that doesn't have database size restriction. Obviously if you settle for hourly, etc. bars, you will use far less space.
You >could< set up one database per symbol to get around size/performance, but this would be manual labor-intensive to manage. Commercial versions of SQL do this clustering for you in the background. There are some other databases out there that emphasize performance that you can consider... or free ones like mysql. But MS SQL is nice, and I use it for my stuff (daily bars only, so my DB is not that big). But MS SQL is a database is primarily built for transaction processing and not raw speed.