Tick Database, Now Want to Run SQL

Ok let me back up SEVERAL STEPS. I mis-used the word data mining.

My project is not the truest sense of the phrase but rather simply running some queries on tick data to find particular events like large spikes, slow drifting, and any other point of curiosity.

Once I have a solid footing with the SQL, I will look to find stock consolidation, channels, etc.

The next phase will be to overlay a news feed to determine what articles moved my one stock vs not.

Another use of my database will be tweaking filter-criteria which is based on price move, volume size, and timeframe. Currently I have a logically constructed setup with live exchange data to find special situations in the market and make plays. But i'm looking to use my database OFFLINE to see how effective my filters are...

So again, to re-emphasize, not data mining but rather me running many queries for various goals (stated above).
 
Big problem already.

> Once I have a solid footing with the SQL, I will look to find
> stock consolidation, channels, etc

Forget doing stuff like that in SQL. SQL is set based - the moment you have to run your stuff along in a loop, it is by far NOT the optimal solution.

I would go with a CPE / custom programming model here. Move your tick data into a simple binary coded file (like 8 bytes per event) then you can run it along very efficiently. Use a CPE model - with a PROPER programming langauge. C# or something easy and fast to write.

Becuase seriously, any type of time series analysis in SQL is an exercise in masochism. SQL simply is not done for that - it is set based.
 
Take a look at implementing sqlite inmemory with a disk based historic consolidated archive. You can very quickly load historic record sets for near real time analysis.

thecommon.net has some examples worth taking a look. ie. metatrader/northwind db loaded in memory feeding excel etc.
 
Quote from bscully27:

Hello,

I just created a simple equity tick-database in MS Access and now would like to run some queries and mine the information! (note: running sql through 3rd party software, cant stand MS access gui). However, i'm running into a couple basic issues:
1) When looking for a percent move over say 10 seconds, what is the best approach for this SQL?
2) I couldnt find on the web any examples related to point 1, any good ref sites?
3) Should I not pursue SQL queries but rather use software like amibroker or tradelink??

I prefer writing my own SQL for what it's worth..

Thanks!

Ben

Hi Scully,

I'm not into trading (yet) but I am a SQL DBA looking to get into trading. If you want to send me your schema, anticipated data streams and what kind of queries you need to run against the data, I will help you (for free, as this is something I want to set up for myself).

I would just ask in return you help me with a few very basic questions about trading Stateside (I'm a Brit).

Cheers,


Jaybee
 
Quote from NetTecture:

Big problem already.

> Once I have a solid footing with the SQL, I will look to find
> stock consolidation, channels, etc

Forget doing stuff like that in SQL. SQL is set based - the moment you have to run your stuff along in a loop, it is by far NOT the optimal solution.

I would go with a CPE / custom programming model here. Move your tick data into a simple binary coded file (like 8 bytes per event) then you can run it along very efficiently. Use a CPE model - with a PROPER programming langauge. C# or something easy and fast to write.

Becuase seriously, any type of time series analysis in SQL is an exercise in masochism. SQL simply is not done for that - it is set based.

I might add, that to get started, the OP could use ... ahem ... VBA.

Select * from myTicks

Then loop through the recordset, picking out "features".

Got to start somewhere...
 
I think you are blowing things out of proportion here. Help this dude if you have anything to contribute, I am not sure your negativism adds any value. Sorry, but I cannot stand those know-it-all who only say "cannot be done", "you are too naive",... but do not even add a single source of evidence why it cannot be done. Yes it CAN be done, and yes you need to have a willingness to get your hands dirty but it can be done. I programmed my own binary data store and stream pure tick fx data at a rate of almost 5 million quotes a second to my strategy container and OMS, PMS, Risk systems, all coded up myself. So, please do not tell others it cannot be done under 300,000 USD (was that you or someone else?)

Ben, true MS Access does not get you anywhere. Here are couple keywords you want to google and familiarize yourself with before asking more questions:

HDF5, KX+/KDB, BerkeleyDB, Column based Databases, MonetDB, Redis, HBase, InfiniDB, RavenDB, Cassandra, binary file data store, Esper/Nesper.

Some of the above are very pricey solutions with steep learning curve, others are open source and simple to learn to use over couple hours or weekend.

Quote from NetTecture:

Famous last words. Normally followed by a blank stare and the look of a tilted brain when "not an issue" turns into "what do you mean, it takes months to execute that statement?".



No, sorry. Data mining is a high end eandavour. There is a FREE non open sourrce version, though - Oracle ExaData is frequently used for that, comes free with the hardware. Starts around 300.000 USD for the smallest installation.

Seriously, your problem is not only the software side, it is also the hardware. One reason Access is a sad joke is that it can not utilize a 12+ core setup, and that is what you will need.




Given that no CEP / TSDS architecture I know of can do data MINING (which is not "test a script" but data mining as per definition - the computer finding certain elements, which requires special non open software and is FUNDAMENTALLY DIFFERENT FROM EXECUTING A SQL STATEMENT) this is a question you can only answer.

A lot depends what you actually want, and your definition of items seems to be very different from anyone who works for example in data mining definition of something.
 
what a bollocks, several hedge funds who use .Net (F# or C#) very successfully use Linq queries to "data mine", "aggregate", "sort", ... tick based data. In fact sort algorithms in Linq/Plinq in .Net 4.0 are so incredibly efficient you will not be able to speed things up by a whole lot with a pure C++ merge/sort algo.

Same with SQL. Have you ever run SQLite with SQL like syntax? Can you tell me how much faster it is to run a custom query on a hdf5 db over such sql query? Not much according to my own profiling exercise. Point is, everything is relative. Just debunking SQL does not show much practical experience. Same as saying KDB is the best and there is nothing faster out there (yes there is, I bet I can aggregate tick data into time based bins faster using my own self-written accessors to binary data store than KDB could ever do).

Quote from NetTecture:

Big problem already.

> Once I have a solid footing with the SQL, I will look to find
> stock consolidation, channels, etc

Forget doing stuff like that in SQL. SQL is set based - the moment you have to run your stuff along in a loop, it is by far NOT the optimal solution.

I would go with a CPE / custom programming model here. Move your tick data into a simple binary coded file (like 8 bytes per event) then you can run it along very efficiently. Use a CPE model - with a PROPER programming langauge. C# or something easy and fast to write.

Becuase seriously, any type of time series analysis in SQL is an exercise in masochism. SQL simply is not done for that - it is set based.
 
KDB is not the best per se, its the best if you dont care about money nor men-hours. the 32 bit version is NOT free, nothing is free in life. It time-outs after 2hours and deactivates after a certain period of time.

Quote from sle:

Why not use KDB? 32 bit version is free. That's what man grownup firms use :)
 
Quote from sle:

Why not use KDB? 32 bit version is free. That's what man grownup firms use :)

Yep .. KDB / q & VBA = man-grownup.

* I think the free version might boot you out every 4 hours.

I fiddled about talking to KDB from Java at one point, for algo analytics. Not the slowest part of the bank's infrastructure.
 
Back
Top