Time series DB?

@sle kdb+ is an excellent choice for a time-series DB implementation.

However, the real advantage is in the q (and k) language framework itself. To truly get the best out of it, you will need to master the language and design your system so that most of the heavy pre- and post-processing of data is done within a set of dedicated q servers. Only in this way will you be able to fully utilise the memory and speed optimisation capabilities of the kdb+ framework.

Any other front-end clients should just consume the finished results, for display only, for example.

Can python hook onto kdb+ and bypass the q language? Are there any good resources out there for tutorials on kdb?
 
Sure, a standard q server supports HTTP requests over a TCP port, but you could use some other programmatic interfaces too. There are also a few libraries that provide interfaces to R, ODBC, etc.
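
To make the HTTP route concrete, here is a minimal sketch of a Python client that sends a q expression as the query string of a GET request. The host, port and table name (`trade`) are hypothetical, and the URL layout (`http://host:port/?<q expression>`) is an assumption to check against your own server; the response is simply whatever the server's HTTP handler returns.

```python
from urllib.parse import quote
from urllib.request import urlopen


def build_query_url(host: str, port: int, q_expr: str) -> str:
    """Build a GET URL carrying a q expression as the query string.

    Assumption: the q process answers requests of the form
    http://host:port/?<q expression>.
    """
    return f"http://{host}:{port}/?{quote(q_expr)}"


def run_query(host: str, port: int, q_expr: str, timeout: float = 5.0) -> str:
    """Send the expression and return the raw response body as text."""
    with urlopen(build_query_url(host, port, q_expr), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Usage (needs a running q process; host, port and table are hypothetical):
# print(run_query("localhost", 5000, "select count i by sym from trade"))
```

Note that the query itself is still written in q, so this does not bypass the language; it only changes the transport.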

However, you would still need to know q quite well, and not just at a simple query level, to be able to create and maintain all the kdb data repositories.

But, after all this, suppose you were just trying to query/load, say, a whole year of raw tick data for a frequently traded instrument like SPY or QQQ, or any other option symbol, into a Java or Python client and do all the data processing at that end. That would be a waste of effort if you asked me, provided of course you did not run out of memory first.

J
 
Can python hook onto kdb+ and bypass the q language? Are there any good resources out there for tutorials on kdb?

Why would you want to? Do you really think you can do in pandas what is done on the server - like shift millions of rows of bid/ask by delta_t and join on itself? To do serious work you want to take advantage of how data is partitioned on the server and avoid transferring any intermediate data structure to the client.

Common example: is it faster to first select by date then by symbol or the other way around?
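
For readers who have not seen the operation, here is a small pandas sketch of "shift quotes by delta_t and join on itself", done as an as-of join. It only illustrates what the operation is on a toy frame; it says nothing about how it behaves on millions of rows. The column names, timestamps and prices are made up for illustration.

```python
import pandas as pd

# Toy quotes for one symbol; all values are made up.
quotes = pd.DataFrame(
    {
        "time": pd.to_datetime(
            [
                "2024-01-02 09:30:00",
                "2024-01-02 09:30:01",
                "2024-01-02 09:30:03",
                "2024-01-02 09:30:06",
            ]
        ),
        "sym": ["SPY"] * 4,
        "bid": [100.0, 100.1, 100.2, 100.4],
        "ask": [100.2, 100.3, 100.4, 100.6],
    }
)

delta_t = pd.Timedelta(seconds=2)

# Shift a copy of the quotes forward in time by delta_t and rename its prices.
lagged = quotes.assign(time=quotes["time"] + delta_t).rename(
    columns={"bid": "bid_lag", "ask": "ask_lag"}
)

# As-of join: for each quote, take the latest shifted row at or before its
# time, i.e. the quote that was in effect delta_t earlier (NaN if none yet).
joined = pd.merge_asof(quotes, lagged, on="time", by="sym")
print(joined)
```

On the client, both the original and the shifted copy have to sit in memory at once, which is exactly the intermediate data the post above suggests keeping on the server.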
 
Why would you want to? Do you really think you can do in pandas what is done on the server - like shift millions of rows of bid/ask by delta_t and join on itself? To do serious work you want to take advantage of how data is partitioned on the server and avoid transferring any intermediate data structure to the client.

Common example: is it faster to first select by date then by symbol or the other way around?

Depends on the structure of the database. If I were to design a financial DB, I would optimize for both by building indexes, etc.
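
A toy sketch of that idea in plain Python: keep one index per key, so a lookup by date and symbol costs the same whichever key you think of first. This is only an illustration with made-up rows, not how kdb+ or any particular database lays out its data.

```python
from collections import defaultdict

# Made-up rows: (date, symbol, price).
rows = [
    ("2024-01-02", "SPY", 472.6),
    ("2024-01-02", "QQQ", 402.1),
    ("2024-01-03", "SPY", 468.8),
    ("2024-01-03", "QQQ", 398.3),
]

# One index per key, each mapping a key value to the matching row positions.
by_date = defaultdict(set)
by_sym = defaultdict(set)
for i, (date, sym, _price) in enumerate(rows):
    by_date[date].add(i)
    by_sym[sym].add(i)


def select(date, sym):
    """Return rows matching both keys by intersecting the two indexes."""
    hits = by_date.get(date, set()) & by_sym.get(sym, set())
    return [rows[i] for i in sorted(hits)]


print(select("2024-01-03", "SPY"))
```

The trade-off is the usual one: every extra index costs memory and slows down inserts.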
 
So far I have downloaded the 32-bit version of kdb+ and was going to start playing with it in my spare time. Nothing I do at the moment really requires it, so the whole migration is going to be slow.
 
https://github.com/alpacahq/marketstore
MarketStore is a good open-source time-series database that works well with the Python pandas/NumPy ecosystem.

Any pointers?
- mostly for futures and stock tick data, but I might move my options there too
- Linux is the desired platform, but I can live with Windows if that's required
- Easy integration with Python and C++
- Free would be nice (especially if it's something that has commercial support level)
- Is there a reason to go with No-SQL vs SQL etc?
- I have never used anything besides kdb, but I hear there are better options now.
 