Sparohok>Sure, every application is different, etc. But I think there is a right way to handle large amounts of time series data, and that is to keep it in a true transactional database and then cache as necessary for performance.
I think this depends upon your definition of "time series data." If you are referring to one OHLC bar per day per instrument, then I could see the merits of your argument. If you mean intraday time-series data, perhaps down to the resolution of each tick in the Time and Sales data, then I'd disagree in most cases.
------------------------------------------
onelot>My testing involves running a stat analysis program against single/multiple products 100's of thousands of times. The closest thing I could compare it to, would be a brute force optimization engine that was looking at up to 10+ variables in a run on a LOT of data. It takes days sometimes. It's definitely not realtime. The results are just summaries and stats, small files. The main bottleneck is h/w, my code, and file I/O. H/w and code I can optimize, but I currently have all data in ascii's and just trying to figure out the best route to go.
This sounds like the kind of workload upon which institutionals can spend considerable resources. Assuming you're trying to do this on a private trader's budget, you probably won't get it as fast as them, but you may get close enough to tolerate.
For some context, consider that a strict interpretation of modern portfolio theory would require that beta for each stock in the S&P 500 be calculated by correlating against all 499 of the other stocks. This would require roughly 250,000 time-series correlations (500 x 499 = 249,500, or about 125,000 if each symmetric pair is computed only once). The common short-cut is to calculate beta from each stock's time-series against just the index itself. Very few institutionals will invest what is required for a strict calculation of beta.
If the institutionals take such short-cuts, then might you adapt this notion to your own needs?
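To put rough numbers on that short-cut, here is a minimal sketch in Python. It assumes the textbook definition of beta as cov(stock, index) / var(index), computed on return series; the function names and the sample returns are made up for illustration.

```python
def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    # Sample covariance of two equal-length return series.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def beta(stock_returns, index_returns):
    # Textbook beta: covariance with the index divided by index variance.
    return covariance(stock_returns, index_returns) / covariance(index_returns, index_returns)

n = 500
ordered_pairs = n * (n - 1)       # every stock against every other stock
unique_pairs = n * (n - 1) // 2   # each symmetric pair counted once
index_only = n                    # the short-cut: one calculation per stock

# Hypothetical data: a stock that always moves twice as much as the index.
index_returns = [0.01, -0.02, 0.015, 0.005]
stock_returns = [2 * r for r in index_returns]
example_beta = beta(stock_returns, index_returns)
```

The point is the ratio: 500 calculations against the index versus a couple of hundred thousand for the full pairwise version.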
__________________________________________________________
linuxtrader>The question here is one of proper application design. .... The toughest part of any application is figuring out the requirements and the most cost effective solution.
With what few details are available, it sounds like each input parameter is being varied across a wide range in order to find the values for each parameter which achieve the optimal end result for the whole system.
If so, then perhaps this problem might be a candidate for Linear Programming methods, as an alternative to brute force numerical methods, provided the objective and its constraints can be expressed as linear functions of the parameters.
On a related tack, if a way can be found to express the problem algebraically, then perhaps it might be possible to calculate the partial derivatives with respect to each input parameter, giving the rate of change of the net result for a change in each parameter. Knowing this might allow a binary search within the range of values for each parameter so as to navigate the solution space in a more precise and direct manner.
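As a rough sketch of that idea in Python: if the result has a single peak in each parameter over its range (a big assumption), the sign of the partial derivative tells you which half of the range holds the peak, so each parameter can be bisected instead of swept. Everything here is hypothetical, including the toy objective; the derivative is estimated numerically in case no algebraic form is available.

```python
def partial_derivative(f, params, i, h=1e-6):
    # Central finite-difference estimate of df/dparams[i].
    up, down = list(params), list(params)
    up[i] += h
    down[i] -= h
    return (f(up) - f(down)) / (2 * h)

def bisect_parameter(f, params, i, lo, hi, tol=1e-6):
    # Binary search for the peak of f along parameter i on [lo, hi],
    # holding the other parameters fixed. Assumes a single peak in range.
    p = list(params)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        p[i] = mid
        if partial_derivative(f, p, i) > 0:
            lo = mid   # still climbing, so the peak is to the right
        else:
            hi = mid   # falling (or flat), so the peak is to the left
    p[i] = (lo + hi) / 2
    return p

def coordinate_search(f, params, bounds, sweeps=20):
    # Bisect each parameter in turn, repeating for a fixed number of sweeps.
    p = list(params)
    for _ in range(sweeps):
        for i, (lo, hi) in enumerate(bounds):
            p = bisect_parameter(f, p, i, lo, hi)
    return p

def toy_objective(p):
    # Stand-in for the real stat run: peaks at x = 3, y = -1.
    x, y = p
    return -(x - 3.0) ** 2 - (y + 1.0) ** 2

best = coordinate_search(toy_objective, [0.0, 0.0], [(-10.0, 10.0), (-10.0, 10.0)])
```

Each bisection needs around 25 evaluation pairs to narrow a range of 20 down to 1e-6, versus thousands of evaluations for a fine grid. A real trading result is noisy and often has several peaks, in which case this will only find a local one.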
__________________________________________________________
Otherwise, for optimization on a budget perhaps consider some of the following:
1) Hawk whatever you must to get money for a motherboard allowing more RAM, and stuff it with several GB's.
1a) The most economical way to do this is probably with an AMD Opteron or Athlon 64 FX-51/53.
1b) More memory is more important than a fast CPU or fast disks.
2) If you can't afford fancy disk controllers, then at least buy multiple cheap disks and cheap controllers, with input files on different disks than your output files.
3) If you can't spread files out on a one per disk basis, then make each partition as small as it can be while still holding all of the files you use on that disk, so that seeks stay within a narrow band instead of sweeping across the whole disk.
4) ASCII text encoded numerical data is not only bulkier on disk and in memory, it also takes time to convert to numerical equivalents. Therefore, have a preprocessor which converts your persistent store ASCII versions into binary for use at run-time.
5) If you will not be doing purely sequential access through each file, but rather will be seeking to random positions within the file, then you may need an indexing scheme, whether you write it or you buy it. (If someone searches sequentially to find a row matching desired contents, then that's CS blasphemy, aka PEBCAK.)
6) Consider scrounging up multiple old retired computers with at least P4 1GHz cpu's, such that each subset of the total job can run on a dedicated machine and not have to compete for disk access.
7) Use an O/S with less overhead burden upon the h/w, e.g. Unix or Linux.
8) If you must use Windoze, then turn off many of those superfluous services you don't need.
9) If in a prop trading shop, consider negotiating to be allowed to run parallel distributed jobs across all workstations in the office, kind of like the SETI project, but only out of hours.
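For points 4 and 5, here is a minimal sketch in Python of what the preprocessor and a simple lookup might look like. It assumes an ASCII layout of one `timestamp,price,volume` line per tick with integer timestamps in ascending order; that layout, the record format, and the function names are all assumptions for illustration, so adjust them to your real files.

```python
import struct

# Fixed-width binary record: int64 timestamp, float64 price, int64 volume.
# "<" means little-endian with no padding, so every record is 24 bytes.
RECORD = struct.Struct("<qdq")

def ascii_to_binary(text_path, bin_path):
    # One-time preprocessing pass: parse each ASCII line once, write binary.
    count = 0
    with open(text_path, "r") as src, open(bin_path, "wb") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            ts, price, volume = line.split(",")
            dst.write(RECORD.pack(int(ts), float(price), int(volume)))
            count += 1
    return count

def read_record(f, n):
    # Fixed-width records let us jump straight to record n.
    f.seek(n * RECORD.size)
    return RECORD.unpack(f.read(RECORD.size))

def find_first_at_or_after(bin_path, target_ts):
    # Binary search on the timestamp field; returns the index of the first
    # record with timestamp >= target_ts, or None if there is no such record.
    with open(bin_path, "rb") as f:
        f.seek(0, 2)  # jump to end of file to learn its size
        total = f.tell() // RECORD.size
        lo, hi = 0, total
        while lo < hi:
            mid = (lo + hi) // 2
            if read_record(f, mid)[0] < target_ts:
                lo = mid + 1
            else:
                hi = mid
        return lo if lo < total else None
```

Because the records are fixed width and sorted by time, the file acts as its own index: a lookup in a file of a million ticks takes about 20 seeks rather than a scan. If you need to look rows up by something other than time, that is where a separate index (or a purchased one) comes in.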