Quote from jtrader33:
As lwlee suggested, I ended up caching the data in a persistent server app that reads the csv files and then stores the filtered 5min data into an ArrayList<OptionQuote> per day. The backtest app requests a day's worth of data from the server and receives it via ObjectInputStream* over the socket. It works great - a test that previously took over two hours now finishes in a few minutes (save for the initial data loading into the server). However, to quatron's point about scalability, the issue I am up against now is managing the memory consumption. One month of csv files (with all 1min data) is 1.0GB and yet my server loaded with just the 5min data is 1.2GB (I only have 24GB of RAM and need to test 36 months of data). I've tried to be careful with my serialized OptionQuote class: OptionQuote(long quoteDateTime, double undBid, double undAsk, long expiryDateTime, double strike, char right, double optBid, double optAsk) ...but perhaps I will need to keep each quote as a single String on the server side and then convert the ArrayList<String> to ArrayList<OptionQuote> on the client side. I'd rather not have to do that though since:
1) it will require splitting the string on the server side anyway (to determine if it's a 5min interval)
2) an additional iteration through the entire day of quotes on the client side to convert to ArrayList<OptionQuote> (this is necessary for methods that determine earliest expiry, closest strike, etc.)
3) I'm not certain that it will actually be more efficient from a memory standpoint
Any suggestions on the memory aspect would be helpful. Regardless, thanks for all the input - memory issue aside, I'm pretty pleased that I was able to get something that works well together in a single afternoon + evening.
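For anyone trying to reason about the memory numbers above, here is a rough back-of-the-envelope sketch of what one quote costs on the heap. The field list comes from the constructor in the post (2 longs, 5 doubles, 1 char). Everything else is an assumption: the class name QuoteFootprint and its methods are made up for illustration, and the 12-byte object header, 8-byte alignment and 4-byte reference are typical of a 64-bit HotSpot JVM with compressed references, not something measured on the poster's setup.

```java
// Hypothetical helper, not part of the original app: estimates the heap cost
// of one quote object holding the fields listed in the post.
class QuoteFootprint {

    // Raw payload of the eight fields: 2 longs + 5 doubles + 1 char.
    static long rawFieldBytes() {
        return 2L * Long.BYTES + 5L * Double.BYTES + Character.BYTES;
    }

    // Estimated heap bytes per quote held in a list.
    // headerBytes, alignment and refBytes are JVM-dependent ASSUMPTIONS
    // supplied by the caller (e.g. 12, 8, 4 for compressed references).
    static long estimatedHeapBytes(int headerBytes, int alignment, int refBytes) {
        long unpadded = headerBytes + rawFieldBytes();
        // Round the object size up to the next multiple of the alignment.
        long padded = (unpadded + alignment - 1) / alignment * alignment;
        // Add the reference slot the list's backing array holds for it.
        return padded + refBytes;
    }

    public static void main(String[] args) {
        System.out.println("raw field bytes: " + rawFieldBytes());
        System.out.println("estimated heap bytes per quote: "
                + estimatedHeapBytes(12, 8, 4));
    }
}
```

Multiplying the per-quote estimate by the number of 5min quotes actually held gives a figure to compare against the observed server footprint. If the two are far apart, the extra memory is probably coming from somewhere other than the quote objects themselves.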
*In case anyone finds this post in a search and attempts something similar:
There was a huge difference in performance between this...
out = new ObjectOutputStream(connection.getOutputStream());
in = new ObjectInputStream(connection.getInputStream());
and this...
out = new ObjectOutputStream(new BufferedOutputStream(connection.getOutputStream()));
in = new ObjectInputStream(new BufferedInputStream(connection.getInputStream()));
Without the inclusion of BufferedOutputStream/BufferedInputStream, it took ~10 seconds to push my ArrayList objects through the socket (which was no faster than reading from disk). After adding them, the transmission is virtually instantaneous.
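For completeness, here is a minimal self-contained sketch of the buffered setup over a loopback socket. It is not the actual server/backtest code: the class BufferedQuoteTransfer, the nested OptionQuote stand-in and the methods sampleDay and roundTrip are all made up for illustration, and the quote values are dummy data. One thing worth knowing when wrapping in BufferedOutputStream: the ObjectOutputStream constructor writes a stream header that sits in the buffer until flushed, and the ObjectInputStream constructor on the other end blocks until it reads that header, so flush before expecting the peer to proceed.

```java
import java.io.*;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

class BufferedQuoteTransfer {

    // Hypothetical stand-in for the quote class; fields follow the post.
    static class OptionQuote implements Serializable {
        private static final long serialVersionUID = 1L;
        final long quoteDateTime, expiryDateTime;
        final double undBid, undAsk, strike, optBid, optAsk;
        final char right;

        OptionQuote(long quoteDateTime, double undBid, double undAsk, long expiryDateTime,
                    double strike, char right, double optBid, double optAsk) {
            this.quoteDateTime = quoteDateTime; this.undBid = undBid; this.undAsk = undAsk;
            this.expiryDateTime = expiryDateTime; this.strike = strike; this.right = right;
            this.optBid = optBid; this.optAsk = optAsk;
        }
    }

    // Builds n dummy quotes (made-up values, for the demo only).
    static ArrayList<OptionQuote> sampleDay(int n) {
        ArrayList<OptionQuote> day = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            day.add(new OptionQuote(i, 100.0, 100.1, i + 1000L, 90.0 + i, 'C', 1.0, 1.1));
        }
        return day;
    }

    // Sends one day's list through a loopback socket using buffered object streams
    // and returns what the receiving side read.
    static List<OptionQuote> roundTrip(List<OptionQuote> day) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread sender = new Thread(() -> {
                try (Socket connection = server.accept();
                     ObjectOutputStream out = new ObjectOutputStream(
                             new BufferedOutputStream(connection.getOutputStream()))) {
                    out.writeObject(new ArrayList<>(day));
                    out.flush(); // push the buffered header and payload onto the wire
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            sender.start();
            try (Socket connection = new Socket(InetAddress.getLoopbackAddress(),
                                                server.getLocalPort());
                 ObjectInputStream in = new ObjectInputStream(
                         new BufferedInputStream(connection.getInputStream()))) {
                @SuppressWarnings("unchecked")
                List<OptionQuote> received = (List<OptionQuote>) in.readObject();
                sender.join();
                return received;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<OptionQuote> received = roundTrip(sampleDay(10_000));
        System.out.println("received " + received.size() + " quotes");
    }
}
```

The transfer here only goes one way, so a single flush after writeObject is enough. If both ends open an output and an input stream on the same socket, flushing each ObjectOutputStream right after constructing it avoids both sides waiting on each other's header.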