Quote from CloroxCowboy:
A flat file would be stored on a disk also, wouldn't it?
Yes but you can store as one column in a separate file. That way if you want a sliding window on C(Closing price) you load the file with just C(Closing Price) not all columns stored in this file.
It compresses better because you have similar values and thus load faster. It requires less memory and it can sit in memory for subsequent access.
This is the basis of column oriented databases.
If you have 1,000 columns as a result of extensive research your SQL queries would be painfully slow. To access just 1 column you read all 1,000 columns in each row. You can limit the range of rows read to make things faster but you cannot limit the number of columns you read with each row. You read all of them. It is how it is stored on a disk.
COTS so called online transactional databases (OLDB) were created when for example you have have customers and you store all data for each customer in one row. The row ID is your customer number and in each row you have Address, Phone, etc.
The access pattern is row by row one at a time. This organization is OK if you collect customer data and want to retrieve customer data by ID number. With the ID number you get all data for each customer very fast.
But if you want to do analysis for example how many customers buy something, this organization is not very efficient. This is why there are Online Analytical Databases (OLAP). But If you try them too you can see they are not exactly appropriate for analyzing trading data e.g. time series. They more appropriate for data warehouses, inventories analysis, etc. Also they are very expensive too.
We evaluated, tested and benchmarked most COTS solutions available so far and created an internal report of performance benchmarks and cost. Also analysed most of database patents.
The bottom line is they are not appropriate for time series analysis, large data sets, they are very expensive, do not utilize fully the processor resources, there are a limits on the number of CPU used, some of them charge for additional CPU.
No one ever tries to use the SIMD processing capabilities available already for 10 years and which is one way to do parallel processing via vector operations. The truth is SIMD programming is very hard.
Also in MS SQL you have a limitations to the number of columns you can create in a table. It is 500 columns max. The other solution is to create 1 Table with 1 Column, but SQL is not good at this too - the maitenance would be a nightmare.
The best way to get the maximum performance is do it as column oriented database optimized for vectorized SIMD and NVIDIA CUDA GPU parallel computing. That is what we are doing.