New trading platform collaboration C# .NET, anyone?

Zzzz1 · Jan 22, 2017

Many hedge funds and banks use kdb. The 32-bit version is free of charge but, as far as I remember, only runs on one core. I find the system over-engineered and overkill for my simpler needs, and the "q" learning curve is steep. Hence the next best thing, IMHO, is a binary data store that can search for entries directly on open file streams without having to read the entire file into memory.

cjdsellers said:
That sounds great, fan27. I'm not familiar with the Go language, but I see it popping up. InfluxDB is written in Go, it seems. Flat-file CSV is how I've previously been storing my data... it works... but I'm not yet convinced it's the optimal all-round solution. I'm sure the average hedge fund out there isn't directly working with CSV, except maybe when it's been downloaded from a vendor if they aren't capturing their own data.

Here are a few links I've come across tonight

DB-Engines Ranking of Time Series DBMS
http://db-engines.com/en/ranking/time+series+dbms

Management of Time Series Data (Thesis)
http://www.canberra.edu.au/research...7-7446-fcf2-6115-b94fbd7599c6/1/full_text.pdf

Developing Time-Oriented Database Applications in SQL (.PDF book)
http://sql-info.de/sql-notes/developing-time-oriented-database-applications-in-sql.html

InvBox · Jan 22, 2017

cjdsellers said:
I'm sure the average hedge fund out there isn't directly working with CSV

A DB has a great advantage in search. Using clustered indexes, you get great SELECT performance instead of scanning the whole CSV file. MySQL also has an in-memory cache mode that causes zero disk I/O operations. Surely an SSD cannot come even close to that.
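To illustrate the index-versus-scan point, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than MySQL, purely so the example is self-contained; that swap is an assumption, and timings will differ between engines. The table name, column names, and sample data are hypothetical. In SQLite, a WITHOUT ROWID table stores rows in primary-key order, which is roughly comparable to a clustered index, and EXPLAIN QUERY PLAN reports whether a range query searches that key or scans the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: rows are stored in (ts, seq) order because the table
# is WITHOUT ROWID, so a time-range query can seek instead of scanning.
conn.execute("""
    CREATE TABLE ticks (
        ts    INTEGER NOT NULL,   -- timestamp, arbitrary integer units
        seq   INTEGER NOT NULL,   -- tie-breaker for ticks sharing a timestamp
        price REAL    NOT NULL,
        PRIMARY KEY (ts, seq)
    ) WITHOUT ROWID
""")

# Made-up sample data: 1000 ticks, 10 time units apart.
conn.executemany(
    "INSERT INTO ticks VALUES (?, ?, ?)",
    [(i * 10, 0, 100.0 + (i % 7)) for i in range(1000)],
)

QUERY = (
    "SELECT ts, price FROM ticks "
    "WHERE ts >= ? AND ts < ? AND price BETWEEN ? AND ?"
)
params = (95, 150, 101.0, 103.0)

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY, params).fetchall()
plan_text = " ".join(row[-1] for row in plan)

rows = conn.execute(QUERY, params).fetchall()

print(plan_text)
print(rows)
```

The plan text should mention a SEARCH on the primary key rather than a SCAN of the table. The price filter is still applied row by row within the selected time range.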

InvBox · Jan 22, 2017

InvBox said:
A DB has a great advantage in search. Using clustered indexes, you get great SELECT performance instead of scanning the whole CSV file. MySQL also has an in-memory cache mode that causes zero disk I/O operations. Surely an SSD cannot come even close to that.

And do not forget: most DBs are scalable over the network (like cloud services). It means you can easily parallelize your huge queries across tens or hundreds of physical servers.

Zzzz1 · Jan 22, 2017

Even a full in-memory solution in MySQL is way too slow for large time series. Try it out for yourself: select tick data that falls within a time window and between two price levels, out of millions of data points in each tick time series, and you will see how slow it really is. You are right that CSV files need to be fully loaded into memory, but that is not the case with binary files. You can open a stream and, as long as each data point is prepended by an index (a timestamp or what have you), even a simple binary search algorithm is going to beat any time series database currently in existence. Test it for yourself and you shall see.
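A minimal sketch of that idea, under assumptions not stated in the post: records are fixed-size, sorted by timestamp, and laid out as an int64 timestamp followed by a float64 price. The layout, function names, and sample data are hypothetical; this is not the poster's actual data store. The binary search seeks within the open file to find the bounds of the time window, and only that slice is read and filtered by price.

```python
import os
import struct
import tempfile

# Hypothetical record layout: little-endian int64 timestamp + float64 price.
RECORD = struct.Struct("<qd")


def write_ticks(path, ticks):
    """Write (timestamp, price) pairs, already sorted by timestamp."""
    with open(path, "wb") as f:
        for ts, price in ticks:
            f.write(RECORD.pack(ts, price))


def read_timestamp(f, index):
    """Seek to record `index` and return its timestamp."""
    f.seek(index * RECORD.size)
    return RECORD.unpack(f.read(RECORD.size))[0]


def lower_bound(f, count, ts):
    """Index of the first record whose timestamp is >= ts (binary search)."""
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        if read_timestamp(f, mid) < ts:
            lo = mid + 1
        else:
            hi = mid
    return lo


def query(path, t_start, t_end, p_low, p_high):
    """Ticks with t_start <= ts < t_end and p_low <= price <= p_high."""
    count = os.path.getsize(path) // RECORD.size
    with open(path, "rb") as f:
        first = lower_bound(f, count, t_start)
        last = lower_bound(f, count, t_end)
        # Read only the records inside the time window.
        f.seek(first * RECORD.size)
        block = f.read((last - first) * RECORD.size)
    return [(ts, px) for ts, px in RECORD.iter_unpack(block)
            if p_low <= px <= p_high]


if __name__ == "__main__":
    # Made-up sample data: 1000 ticks, 10 time units apart.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "ticks.bin")
        write_ticks(path, [(i * 10, 100.0 + (i % 7)) for i in range(1000)])
        print(query(path, 95, 150, 101.0, 103.0))
```

Each lookup touches about log2(N) records before the window is read, so memory use depends on the window size rather than the file size. How this compares with a given database would still need to be measured on real data.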

Re your scaling suggestion: I do not see how this applies to any standard time series database unless you are Bloomberg, Reuters, or another massive data provider. You get incredible read performance with an SSD RAID, where you can perform concurrent reads. SQL is bad, bad, bad for time series databases. I have no clue why people still insist that it makes sense to apply a technology to a specific problem it was never designed to tackle. I have tested the performance differences between my own binary data store and every SQL and non-SQL solution that I could get my hands on, and I believe I speak with a high level of confidence and conviction.

InvBox said:
A DB has a great advantage in search. Using clustered indexes, you get great SELECT performance instead of scanning the whole CSV file. MySQL also has an in-memory cache mode that causes zero disk I/O operations. Surely an SSD cannot come even close to that.

cjdsellers · Jan 22, 2017

InfluxDB
https://www.influxdata.com/

InfluxData.NET (API wrapper in C#)
https://github.com/pootzko/InfluxData.Net

InfluxDB Management Studio (manager UI in C#)
https://github.com/CymaticLabs/InfluxDBStudio

Thoughts, guys?

fan27 · Jan 22, 2017

cjdsellers said:
That sounds great, fan27. I'm not familiar with the Go language, but I see it popping up. InfluxDB is written in Go, it seems. Flat-file CSV is how I've previously been storing my data... it works... but I'm not yet convinced it's the optimal all-round solution. I'm sure the average hedge fund out there isn't directly working with CSV, except maybe when it's been downloaded from a vendor if they aren't capturing their own data.

Here are a few links I've come across tonight

DB-Engines Ranking of Time Series DBMS
http://db-engines.com/en/ranking/time+series+dbms

Management of Time Series Data (Thesis)
http://www.canberra.edu.au/researchrepository/file/82315cf7-7446-fcf2-6115-b94fbd7599c6/1/full_text.pdf

Developing Time-Oriented Database Applications in SQL (.PDF book)
http://sql-info.de/sql-notes/developing-time-oriented-database-applications-in-sql.html

Time Series DBMS fastest growing in popularity
http://db-engines.com/en/blog_post//62

Yeah...CSV is certainly not the most optimal...but it is "easy" to deal with and will enable me to move on with the development of other components of my trading system until I need further optimization.

cjdsellers · Jan 22, 2017

fan27 said:
Yeah...CSV is certainly not the most optimal...but it is "easy" to deal with and will enable me to move on with the development of other components of my trading system until I need further optimization.

That's a good point. If CSV is adequate for now, then time could be better spent on other areas of one's system to get up and running, and you can revisit later for more optimal solutions, as you say.

InvBox · Feb 1, 2017

Zzzz1 said:
Even a full in-memory solution in MySQL is way too slow for large time series.

I am not trying to convince you. Flat files and databases have their own strengths and weaknesses. I've described my view of where a DB will have advantages.

Zzzz1 · Feb 2, 2017

Nobody denied that conventional SQL databases have their strengths, just not in managing, storing, and retrieving time series data. I am not trying to convince you either, just stating established facts. There is a reason for the existence of columnar databases.

InvBox said:
I am not trying to convince you. Flat files and databases have their own strengths and weaknesses. I've described my view of where a DB will have advantages.

cjdsellers · Feb 2, 2017

LMAX have won awards for their programming; they pretty much invented the Disruptor pattern. Interestingly, they use MySQL to store account and other administrative information, but not the historical market data available from their servers. (I'm not sure what they actually use, but from what I've read they tend to roll their own where necessary.)