best way to hack up an rdbms to approximate something like kdb for poor/dumb people

Quote from garchbrooks:

I've got one minute tick data for around 3000 stocks. The total set size for me is 200 million rows.


You don't have tick data. You have 1 minute data of 3000 stocks. Tick data is different than 1 minute data.

Take each of the 3000 symbols and create one database per symbol or table per symbol. Call each database/table something like CSCO1Min, DELL1Min, etc.

Then if you want to create 15 minute databases you can create CSCO15Min, DELL15Min, ,etc.

Divide and conquer the problem space into smaller components istead of dealing with the whole crapload of data.
 
Quote from FutsTrader111:

You don't have tick data. You have 1 minute data of 3000 stocks. Tick data is different than 1 minute data.

Take each of the 3000 symbols and create one database per symbol or table per symbol. Call each database/table something like CSCO1Min, DELL1Min, etc.

Then if you want to create 15 minute databases you can create CSCO15Min, DELL15Min, ,etc.

Divide and conquer the problem space into smaller components istead of dealing with the whole crapload of data.

Will do. Thanks Futs and Darth.

Your help has been invaluable.
 
Quote from garchbrooks:
Code:
Using prices from stock A and prices from Stock B:

For a given date range:
   For a given set of times on each date:
      stocktuple = assemble tuple (price A from stock A, price from stock B)
      tuplelist.append( stocktuple )
Or course. setup
A_m table (mindata of A)
B_m table (mindata of B)
getTuple("A", "B", from, to) <- stored proc

getTuple can run the query and return the resultset. It will scan two tables only as opposed to many hundreds of millions of rows you have.

on the minute data table, I have NO keys. they are MyISAM tables, and I ensure they are ordered. This is subject to experimentation and change
 
Sometimes it isn't about dazzling everyone with hardware bullshit and going hardcore like some people do.

Its about really thinking long and hard about the problem at hand and THEN spending money if you need it on the hardware to speed up your solution.

You may just find that building a quote database and running queries on it shows you very little in your vested time or is good enough. No use making it worse by spending thousands of dollars on hardware just because you were so excited about the project.

Just use Ubuntu 64 and MySql Cluster 7. Ubuntu 64 bit to address large physical memory and Cluster so that you can keep the tables in-core memory and fully use the 64 bit address space reducing IO.

Make sense?
 
Quote from FutsTrader111:

Sometimes it isn't about dazzling everyone with hardware bullshit and going hardcore like some people do.

Its about really thinking long and hard about the problem at hand and THEN spending money if you need it on the hardware to speed up your solution.

You may just find that building a quote database and running queries on it shows you very little in your vested time or is good enough. No use making it worse by spending thousands of dollars on hardware just because you were so excited about the project.

Just use Ubuntu 64 and MySql Cluster 7. Ubuntu 64 bit to address large physical memory and Cluster so that you can keep the tables in-core memory and fully use the 64 bit address space reducing IO.

Make sense?

Absolutely. (Plus, I'm not comfortable spending money on something I don't fully understand. I figure an unnecessary hardware purchase is just as bad as losing money on the market.)
 
A correction. MyISAM tables (with minute data) does have a key on (date, time). Also, I just now tested with InnoDB tables. For typical queries (from datetime to datetime), MyISAM is faster, so I will stick with that for now.
 
Quote from garchbrooks:

Absolutely. (Plus, I'm not comfortable spending money on something I don't fully understand. I figure an unnecessary hardware purchase is just as bad as losing money on the market.)

A suggestion before you get too carried away in this...
You have to know what you are looking for, and most of the time, it won't be there.
 
Quote from DarthSidious:

garch, where are you getting minute data on stocks / what is a good place to get it from? Thanks

Sent PM; but the general answer is that I put a DLL in a retail broker platform with good historical data and funneled the data out.
 
Garchbrooks,

I have been reading your thread with some interest. I used to know everything about databases. I’m a retired Data Base Administrator (DB2 and Oracle). But now I trade full time to forget restoring DBs at 2:00 A.M. every other day. I’m not a SQL Server guru. I’m a mainframe DBA. But I will help if you want it.

I see that you are trying to do the ‘impossible’ of using a PC (they still don’t match the bigger hardware) and run a monster app against big databases. Like you found out you are going to be tied up with I/O trying to access all the data in raw form. Unless you have an absolute need for raw data then do what they do in real apps and condense it into something that is usable.

The first question I always get is why can’t a PC do what the mainframe does. So I will answer it in advance. On a mainframe you would spread your data across multiple disk packs. Next the data would be partitioned across the packs. When you ran your query each pack would grab part of the data. Imagine 20 packs working parallel on the I/O. That is a quick over view of what you can’t do on a P.C.

I’m very familiar with processing this type of data.
 
Back
Top