Why use a database?

The technology has changed considerably.

Your stats may be accurate for single drives, but using a program like FancyCache can yield a 1000x performance boost.
http://www.romexsoftware.com/en-us/fancy-cache/index.html

InfiniBand / striping / hybrid arrays of SSD + disks / clustering etc. are all technologies within reach at consumer cost today.

Sustained reads/writes of 1 GB per second are doable today, and with clustering over InfiniBand 5 GB/sec is attainable while remaining affordable.

Even some of the journaled posts on index settings and speed may not accurately reflect the hardware impact.

Try these simple changes for a 1000x boost:

For the DB: use covering indexes and a 64k page size (cuts read I/O in half and gives better cache hit rates).
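To make the covering-index idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, column, and index names are made up for illustration, and SQLite is only a stand-in for whatever database you actually run. The point it shows: when the index contains every column the query touches, the engine can answer from the index alone and never reads the table rows.

```python
import sqlite3

# Hypothetical tick table; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ticks (symbol TEXT, ts INTEGER, price REAL, size INTEGER)"
)
conn.executemany(
    "INSERT INTO ticks VALUES (?, ?, ?, ?)",
    [("ES", t, 1000.0 + t * 0.25, 1) for t in range(1000)],
)

# The index holds every column the query below needs (symbol, ts, price),
# so the query can be served from the index without touching the table.
conn.execute("CREATE INDEX idx_ticks_cover ON ticks (symbol, ts, price)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT ts, price FROM ticks WHERE symbol = ? AND ts >= ?",
    ("ES", 500),
).fetchall()

# The last column of each plan row is the human-readable description.
plan_text = " ".join(str(row[-1]) for row in plan)
print(plan_text)
```

If the plan text mentions a covering index, the query is being answered without table lookups. Page size is a separate setting and is not demonstrated here.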

For disk: use FancyCache set up as a write-only cache (1 GB, 512k blocks, writes deferred 2 sec). This buffers writes to achieve near SATA 3 spec speeds. Deferring the writes for 2 seconds increases cache read hits and reduces the hammering of I/O write operations: random data gets written sequentially with less fragmentation, which in turn optimizes read I/O.

CAUTION: Write deferred caching can cause serious data corruption on power failure/system lock ups.

Quote from inflector:

I've got an eclectic background. Started programming in high school over 20 years ago writing futures trading systems. Had a bit of fame in my early twenties as a trader and then left for 15 years to start a few software companies.

One of them sold an embedded database which was the number one product on the Macintosh. I worked on the internals, disk access, etc. as well as the query optimization.

There is a huge difference in read time between a database and a binary file unless the database has been specifically optimized for large binary data storage (known as BLOBs in the business).

The reason is simple: even in a database with an efficient caching mechanism, large data sets generally involve multiple reads from the disk because the data is split up into chunks. Every separate read takes a while because, on average, it requires half a rotation of the disk before the data comes under the read heads and the read can start.

Unlike almost every other aspect of computing, disk speeds have not followed Moore's Law. Disks are maybe 30 to 100 times faster than they were 20 years ago while computers are 10,000 times faster.

Even a 10,000 RPM disk takes 6 milliseconds to rotate. So you only get 167 rotations per second. That's a lot of time when computers are doing billions of instructions per second.
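The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it ignores seek and transfer time, and it assumes every chunk read pays a fresh average wait of half a rotation. The function names and the 100-chunk figure are made up for illustration.

```python
def rotation_ms(rpm):
    """Time for one full platter rotation, in milliseconds."""
    return 60_000.0 / rpm


def rotations_per_second(rpm):
    """Full rotations completed each second."""
    return rpm / 60.0


def avg_rotational_latency_ms(rpm):
    """Average wait for data to come under the head: half a rotation."""
    return rotation_ms(rpm) / 2.0


def chunked_read_latency_ms(rpm, n_chunks):
    """Rotational wait alone if each chunk needs its own separate read."""
    return n_chunks * avg_rotational_latency_ms(rpm)


for rpm in (5_400, 7_200, 10_000, 15_000):
    print(
        f"{rpm:>6} RPM: {rotation_ms(rpm):5.2f} ms/rotation, "
        f"{rotations_per_second(rpm):6.1f} rotations/s, "
        f"{avg_rotational_latency_ms(rpm):5.2f} ms average wait"
    )

# Hypothetical case: a data set split into 100 separately read chunks.
print(f"{chunked_read_latency_ms(10_000, 100):.0f} ms of rotational wait")
```

Under these assumptions, 100 scattered chunk reads on a 10,000 RPM disk cost about 300 ms in rotational waiting alone, while a single sequential read of a contiguous file pays that wait once.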

For tick data analysis the speed of reading the data is the determining factor for the speed of testing unless you have very inefficient code or are doing esoteric analysis.

So I suggest storing information about your data in a database but storing the physical data on the disk in raw binary files.
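A minimal sketch of that split, assuming a SQLite catalog and fixed-width binary records. The record layout, file naming, symbol, date, and table schema are all invented for illustration, not anything described in this thread. The database only answers "which file holds this data?", and the ticks themselves come back in one sequential read.

```python
import os
import sqlite3
import struct
import tempfile

# Assumed record layout: int64 timestamp, float64 price, int32 size.
# "<" means little-endian with no padding, so each record is 20 bytes.
TICK = struct.Struct("<qdi")


def write_ticks(path, ticks):
    """Write ticks as fixed-width binary records."""
    with open(path, "wb") as f:
        for tick in ticks:
            f.write(TICK.pack(*tick))


def read_ticks(path):
    """Read the whole file in one sequential read, then decode it."""
    with open(path, "rb") as f:
        data = f.read()
    return list(TICK.iter_unpack(data))


workdir = tempfile.mkdtemp()

# The database stores information about the data, not the data itself.
catalog = sqlite3.connect(os.path.join(workdir, "catalog.db"))
catalog.execute(
    "CREATE TABLE files (symbol TEXT, day TEXT, path TEXT, n_ticks INTEGER, "
    "PRIMARY KEY (symbol, day))"
)

# Made-up sample ticks: (timestamp, price, size).
ticks = [(1_000_000 + i, 100.0 + i * 0.25, 1 + i) for i in range(5)]
path = os.path.join(workdir, "ES_2009-01-02.bin")
write_ticks(path, ticks)
catalog.execute(
    "INSERT INTO files VALUES (?, ?, ?, ?)",
    ("ES", "2009-01-02", path, len(ticks)),
)
catalog.commit()

# Look up the file in the catalog, then load the raw data from disk.
found_path, n_ticks = catalog.execute(
    "SELECT path, n_ticks FROM files WHERE symbol = ? AND day = ?",
    ("ES", "2009-01-02"),
).fetchone()
loaded = read_ticks(found_path)
print(n_ticks, loaded[0])
```

One file per symbol per day is just one possible layout; the same idea works with byte offsets into a single large file.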



You can get acceptable performance from a database if you know what you are doing; however, you will always pay a performance penalty.

- Curtis
 
Quote from PocketChange:

InfiniBand / striping / hybrid arrays of SSD + disks / clustering etc. are all technologies within reach at consumer cost today.

CAUTION: Write deferred caching can cause serious data corruption on power failure/system lock ups.

Creating a single point of failure/error...
At a complex, 3rd party, NAS device...
Is very poor design for a trading operation.

You are either spending $5,000,000/year and competing with banks...
Or you are not playing the latency/data warehousing game...
There is no real middle ground.
 
Of course a simple binary file is very fast, but there are many issues that need to be handled very carefully. You end up building a lot of your own infrastructure to handle real-world cases that you won't even be aware of until you run into them. It takes time and resources to build this capability, and for small-timers that effort is arguably better spent generating profits rather than trying to catch up with the rapidly increasing technology curve.

Quote from onelot:

There's been some discussion on databases lately and I'd like to pose the question of why even use a database? From the research I've done, it looks like a lot of traders prefer binary data files for storing price/quote information. Apparently binary data is much faster to read and write.

For instance, if my purpose was to have an efficient means of storing gigs of tick data, and this data was the backbone of my backtesting engine, whereby strategies were run off of different time periods etc., why would I need a database? Why not load and hold the pertinent data in memory? From my research, DBs excel at query-based searches... unless your backtesting analysis needs this type of functionality, again, why would you use a database?
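As a rough illustration of the hold-it-in-memory approach, here is a sketch that keeps ticks in a plain Python list and builds OHLC bars for whatever time period a strategy asks for. The function name, tick layout, and sample values are assumptions made for the example.

```python
def to_bars(ticks, period):
    """Aggregate time-sorted (timestamp, price) ticks into OHLC bars.

    `period` is the bar length in the same units as the timestamps.
    Returns tuples of (bar_start, open, high, low, close).
    """
    bars = []
    for ts, price in ticks:
        start = ts - ts % period  # start of the bar this tick falls in
        if bars and bars[-1][0] == start:
            bar = bars[-1]
            bar[2] = max(bar[2], price)  # high
            bar[3] = min(bar[3], price)  # low
            bar[4] = price               # close
        else:
            bars.append([start, price, price, price, price])
    return [tuple(bar) for bar in bars]


# Made-up ticks: (seconds since session start, price).
ticks = [(0, 10.0), (20, 12.0), (59, 11.0), (60, 9.0), (130, 9.5)]

one_min = to_bars(ticks, 60)
five_min = to_bars(ticks, 300)
print(one_min)
print(five_min)
```

The same in-memory ticks serve every bar size, so nothing needs to be queried again when a strategy switches time periods.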

Background: I'm currently developing a backtesting/trading engine and have been looking at the different ways large amounts of data can be stored. I have no experience with either method of storage. Here's a great discussion by experienced mechanical traders on the merits of binary data and the pitfalls of DBs. It's what I based most of my questions on and it's definitely worth a read:

http://www.turtletradingsoftware.com/forum/viewtopic.php?t=980

http://www.turtletradingsoftware.com/forum/viewtopic.php?t=791

Would love to hear your guys' thoughts on this. It would be great to hear examples... I've been reading a lot of the theory and it's a bit difficult to base decisions upon. Concrete examples of how one has used different methods, and their comparisons, always seem to triumph over theory.

Good trading.
onelot
 
Quote from SeventhCereal:

...rather than trying to catch up with the rapidly increasing technology curve.

The basic Art of Trading has not changed much...
People who try to substitute off-the-shelf technology...
For hard earned trading EXPERTISE...
Are wasting their time.

Actual professional EXPERTISE in any field = 10,000 hours...
And cannot be circumvented with bells and whistles.
 