I am going to be a bit of a Luddite here and ask the question: are these streaming DBs really that useful? And can somebody provide an example of how they are useful?
As I find it difficult to talk about software technology in the abstract, I will briefly discuss a little application I have written in Java as an illustration.
The application is a real-time market scanner (or alarm raiser if you like). You can set criteria like
Code:
(SMA10 > SMA50) AND (RSI7 < 30) AND (VOLUME > 1.5 * SMA5 (VOLUME) OR ......
In other words, relational expressions of arbitrary complexity involving any of the supported time series "functions". The expressions can include a mix of time series with different periods (e.g. 1 min or 5 min bars) or even time series with constant volume or constant tick bars. Multiple simultaneous screens are supported.
On each tick every screen is evaluated for the instrument the tick occurred on. The largest universe of stocks I have tried so far is the components of the NDX. Performance is very, very good. Half a dozen concurrent screens show almost no CPU utilization on an old Athlon 2800 XP (Socket A!). The components of the SPX should be no problem at all.
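To make the idea of composable screen expressions concrete, here is a minimal sketch, not the actual scanner code. The names `ScreenSketch`, `Condition`, `gt`, `lt`, `times`, `constant` and `exampleScreen` are all made up for illustration, and the latest indicator values are assumed to sit in a plain `double[]` in a fixed order. Only the three terms visible in the expression above are modelled; the elided remainder is left out.

```java
import java.util.function.DoubleSupplier;

class ScreenSketch {

    /** A boolean node in a screen expression (hypothetical interface). */
    interface Condition {
        boolean holds();

        default Condition and(Condition other) {
            return () -> holds() && other.holds();
        }

        default Condition or(Condition other) {
            return () -> holds() || other.holds();
        }
    }

    static Condition gt(DoubleSupplier left, DoubleSupplier right) {
        return () -> left.getAsDouble() > right.getAsDouble();
    }

    static Condition lt(DoubleSupplier left, DoubleSupplier right) {
        return () -> left.getAsDouble() < right.getAsDouble();
    }

    static DoubleSupplier constant(double c) {
        return () -> c;
    }

    static DoubleSupplier times(double factor, DoubleSupplier s) {
        return () -> factor * s.getAsDouble();
    }

    /**
     * Builds (SMA10 > SMA50) AND (RSI7 < 30) AND (VOLUME > 1.5 * SMA5(VOLUME)).
     * Assumed layout of 'latest': {SMA10, SMA50, RSI7, VOLUME, SMA5 of VOLUME}.
     * The suppliers read the array at evaluation time, so the same screen
     * can be re-evaluated after every tick without rebuilding it.
     */
    static Condition exampleScreen(double[] latest) {
        DoubleSupplier sma10 = () -> latest[0];
        DoubleSupplier sma50 = () -> latest[1];
        DoubleSupplier rsi7 = () -> latest[2];
        DoubleSupplier volume = () -> latest[3];
        DoubleSupplier sma5Volume = () -> latest[4];
        return gt(sma10, sma50)
                .and(lt(rsi7, constant(30.0)))
                .and(gt(volume, times(1.5, sma5Volume)));
    }

    public static void main(String[] args) {
        double[] latest = {11.0, 10.0, 25.0, 2000.0, 1000.0};
        Condition screen = exampleScreen(latest);
        System.out.println("alarm: " + screen.holds());
        latest[2] = 45.0; // RSI7 moves above 30 on the next tick
        System.out.println("alarm: " + screen.holds());
    }
}
```

A real scanner would parse the expression text into such a tree rather than hard-coding it, and would read the values from the time series themselves.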
Implementation - Historical data is stored in MySQL tables and loaded into arrays at startup (arrays of doubles, not Doubles). These arrays are encapsulated in TimeSeries objects. Arrays are grown as required. I have tried to avoid locking and it seems that I have been successful. TimeSeries objects have an event notification mechanism. There are basically two time series events - LAST_BAR_CHANGED and BAR_APPENDED. Event listeners listen for these, e.g. an SMA class listens for these events on its input, recalculates its last bar or appends a new bar to itself, and then notifies its own listeners. So an indefinite number of subclasses of TimeSeries can be chained together via event notification.
Other time series listeners include charting and the real-time scanner.
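The event chaining described above can be sketched roughly as follows. This is a simplified illustration under my own assumptions, not the real classes: only the event names LAST_BAR_CHANGED and BAR_APPENDED come from the description, while `TimeSeriesSketch`, `TimeSeriesListener`, `Sma`, `append` and `setLast` are hypothetical names. The sketch is single-threaded and averages over however many bars exist while the window is still filling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TimeSeriesSketch {

    enum EventType { LAST_BAR_CHANGED, BAR_APPENDED }

    interface TimeSeriesListener {
        void onEvent(TimeSeries source, EventType type);
    }

    /** A growable array of primitive doubles with event notification. */
    static class TimeSeries {
        private double[] values = new double[16];
        private int size = 0;
        private final List<TimeSeriesListener> listeners = new ArrayList<>();

        void addListener(TimeSeriesListener l) { listeners.add(l); }
        int size() { return size; }
        double get(int i) { return values[i]; }
        double last() { return values[size - 1]; }

        /** Appends a new bar, growing the backing array when it is full. */
        void append(double v) {
            if (size == values.length) {
                values = Arrays.copyOf(values, size * 2);
            }
            values[size++] = v;
            fire(EventType.BAR_APPENDED);
        }

        /** Overwrites the last bar, e.g. when a tick updates the current bar. */
        void setLast(double v) {
            values[size - 1] = v;
            fire(EventType.LAST_BAR_CHANGED);
        }

        private void fire(EventType type) {
            for (TimeSeriesListener l : listeners) {
                l.onEvent(this, type);
            }
        }
    }

    /** Simple moving average that follows its input and is itself a TimeSeries. */
    static class Sma extends TimeSeries implements TimeSeriesListener {
        private final int period;

        Sma(TimeSeries input, int period) {
            this.period = period;
            for (int i = 0; i < input.size(); i++) {
                append(average(input, i)); // back-fill from existing history
            }
            input.addListener(this);
        }

        private double average(TimeSeries in, int end) {
            int start = Math.max(0, end - period + 1);
            double sum = 0;
            for (int i = start; i <= end; i++) {
                sum += in.get(i);
            }
            return sum / (end - start + 1);
        }

        @Override
        public void onEvent(TimeSeries source, EventType type) {
            double v = average(source, source.size() - 1);
            if (type == EventType.BAR_APPENDED) {
                append(v);   // notifies our own listeners in turn
            } else {
                setLast(v);
            }
        }
    }

    public static void main(String[] args) {
        TimeSeries close = new TimeSeries();
        Sma sma2 = new Sma(close, 2);
        Sma smaOfSma = new Sma(sma2, 2); // chained via event notification
        close.append(1);
        close.append(2);
        close.append(3);
        close.setLast(5);
        System.out.println(sma2.last() + " " + smaOfSma.last());
    }
}
```

Recomputing the whole window on every event keeps the sketch short; a running sum would avoid the inner loop.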
It seems to me that there are lots of performance advantages in using in-memory arrays, not the least of which is that the portion of the time series needed to calculate the last bar should be in processor L2 cache for the duration of the calculation - and, because the values are contiguous, loaded into cache with only a handful of memory accesses. I find it hard to believe that anything else (e.g. most types of Collections or a streaming database) is going to achieve this.
The problem of course is limited memory. Solution - buy more memory, allocate a big swap partition and let the OS virtual memory system do its job. If we are talking analysis (e.g. backtesting) this should be OK too, because the sequential nature of access to the arrays should result in fairly orderly paging. Address space is no issue with 64-bit CPUs.
So here is the question - what would a streaming database do for me? And how would it match the performance?
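One way to lean on the OS virtual memory system from Java, sketched here as an alternative to heap arrays plus swap rather than as what the application does, is to memory-map a file of doubles with `FileChannel.map` and read it through a `DoubleBuffer`. The class and method names (`MappedSeriesSketch`, `map`, `writeAndSum`) and the temp-file layout are hypothetical. Note that a single mapping is limited to 2 GB, so a very long series would need several mappings.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.DoubleBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedSeriesSketch {

    /** Maps room for 'bars' doubles; the OS pages the data in and out on demand. */
    static DoubleBuffer map(Path file, int bars) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_WRITE, 0L,
                    (long) bars * Double.BYTES).asDoubleBuffer();
        }
    }

    /** Writes 0, 1, 2, ... as dummy bars and sums them with one sequential pass. */
    static double writeAndSum(int bars) {
        try {
            Path file = Files.createTempFile("series", ".bin");
            file.toFile().deleteOnExit();
            DoubleBuffer series = map(file, bars);
            for (int i = 0; i < bars; i++) {
                series.put(i, (double) i);
            }
            double sum = 0;
            for (int i = 0; i < bars; i++) {
                sum += series.get(i); // sequential access, so paging stays orderly
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAndSum(1000));
    }
}
```

The mapped data lives outside the Java heap, so the garbage collector never has to walk it.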