Java - Storing data in memory for post-runtime access

quatron · Jul 29, 2013

Quote from CalVolibrator:

I am not sure what kind of data throughput is acceptable to you but if you have no problem with parsing to and back from string based data then we are already in different camps.

I took a look at Redis benchmarks at http://redis.io/topics/benchmarks and the results are in line with "parsing to and back from string".

$ redis-benchmark -n 1000000 -t set,get -P 16 -q
GET: 508388.41 requests per second

I'm getting similar throughput when loading and parsing from a flat text file (csv). Not sure how Redis will improve things compared to loading from a csv. Is your Redis working faster?

lwlee · Jul 29, 2013

* You are probably right that most financials operate on legacy technology as long as it does not directly solve imminent front office problems. That is the reason why DB and other IBanks run tens of thousands of applications, not because they are needed but because nobody dared to consolidate and cut legacy back compatibility at the expense of higher IT budgets. It's something that is very unfortunate because it is a huge cost driver due to expensive bugs and errors that originate from such a chaotic stack, but it's most likely not gonna change because IT managers are paid to get today's problems solved, not the ones of tomorrow.

Yes, it's true, plus politics plays a role. Architecture committees decide the technical direction, which generally lasts for a few years. Practically speaking, jumping on the newest bandwagon and then trying to support it is not an easy task.

* May I please correct you: Redis is not a niche product. It's used by the back-end of some of the highest traffic websites worldwide and also by quite a number of large cap corporations. Most everybody who is involved in C++, Java, .Net, has heard of Redis. Lol, and why no experts are sought for Redis on Job Boards is because you don't need any expert!!! It's easy to configure and just works, contrary to the total SQL mess. I looked at Entity Framework and it's laughable how complicated problems are solved in the ORM world just to pull some data out of traditional RDBMSs.

Please. "don't need any expert"? The fact is, no one is hiring programmers to even start using the product. Which is not to say it couldn't grow. It's just that a young product like RedisDB needs time to develop and see if it will grow to be accepted by the masses. NoSQL has potential, just that it's not mainstream yet. You laugh at ORM but consider its precursor, EJB 2.0. ORM is a dream versus that standard. Btw, Wikipedia has youporn.com listed as one of the "many" companies that use it. You know you are stretching it when you need a porn site for endorsement.

* You do not seem to understand the task at hand, hence my questioning of your expertise: NOBODY needs to serialize data when loading historical data into a profiling/testing platform. The data is deserialized from binary data structures and loaded into the platform. This process is an order of magnitude faster than any REST, any RDBMS could ever be. No Oracle, no SQL Server, no other database can by definition compete with a flat binary data structure because of all the overhead such databases force you into.

I thought there was constantly new data that needed to be serialized. Anyway, if possible, I prefer separation of data and application. How does a client pull data from Redis? Socket protocol? Curious, since you're pushing Redis so hard, would like to know its client performance compared to HTTP.

* Now you suddenly advocate Redis as the solution for the OP? Confused....

It's just your arrogance. Never did I NOT advocate Redis. Hell, I never heard of it until today. Again, as I said, it's a nichey product from a relatively new category (NoSQL). There are a myriad of NoSQL contenders. Redis happens to fit what the OP is doing, but again, what's the client communication protocol?

* Lol, dude, REST is the preferred way to serve content on web pages over SOAP these days but this DOES NOT MEAN IN THE SLIGHTEST that it is incredibly slow to solve the problem OP has at hand which is loading historical data from a data source. Please get real!!!

Dude, we got 3 solutions.
1. data and app together, serialize/deserialize.
2. RedisDB
3. simple HTTP/RESTful Server

While I think we can agree 1 is the fastest (though not elegant), until we know what the Redis client communication protocol is, your arrogant support of Redis might be misplaced.

lwlee · Jul 29, 2013

Dude, I did the work for you.

Redis is a TCP server using the client-server model and what is called a Request/Response protocol.

Redis Request/Response protocols and RTT

An HTTP session is a sequence of network request-response transactions. An HTTP client initiates a request by establishing a Transmission Control Protocol (TCP) connection to a particular port on a server (typically port 80; see List of TCP and UDP port numbers).
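Both quotes describe request/response over a TCP connection; what differs is the wire format on top of it. As a rough sketch only, the snippet below builds the bytes a client would send for a Redis GET (encoded in RESP, the Redis serialization protocol, as an array of bulk strings) next to a minimal HTTP/1.1 GET. The class name, the key `bars:SPY`, the host, and the path are made up for illustration; neither request is actually sent anywhere.

```java
import java.nio.charset.StandardCharsets;

class WireFormats {
    // Encode a Redis command as a RESP array of bulk strings:
    // *<count>\r\n then, per part, $<byte length>\r\n<part>\r\n
    static String resp(String... parts) {
        StringBuilder sb = new StringBuilder("*").append(parts.length).append("\r\n");
        for (String p : parts) {
            int byteLen = p.getBytes(StandardCharsets.UTF_8).length;
            sb.append('$').append(byteLen).append("\r\n").append(p).append("\r\n");
        }
        return sb.toString();
    }

    // Minimal HTTP/1.1 GET request, for comparison only.
    static String httpGet(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\nHost: " + host + "\r\n\r\n";
    }

    public static void main(String[] args) {
        // Hypothetical key and URL; nothing is sent over the network here.
        String redis = resp("GET", "bars:SPY");
        String http = httpGet("localhost", "/bars/SPY");
        System.out.println(redis.length() + " bytes (RESP) vs " + http.length() + " bytes (HTTP)");
    }
}
```

Request size is only one small factor; response encoding, connection reuse, and pipelining matter at least as much, so this says nothing conclusive about throughput.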

Don't be hating client/server. See, I didn't question your expertise.

CalVolibrator · Jul 29, 2013

I need more information from you to judge whether my performance is similar to what you are seeing, but you can easily determine that your Redis instance is not as fast as it can be (possibly because of how you set up your config file; persistence can slow things down, among a host of other reasons) when you consider that:

* Redis : you serialize to binary and write to memory/ deserialize and read from memory through a very thin API layer.

vs

* csv : you parse data to string then you write line by line to a csv file/ you read line by line and then parse from string to whatever type. Of course you can read all at once if your memory permits but internally still the stream is processed line by line.

But more importantly, let me ask you a simple question. How do you read data stored in csv format back into memory, given you don't read the whole file? You basically have to traverse line by line until you reach the point you want to actually read. You do not have to do so at all with a binary file structure. You can intelligently determine the start and end point WITHIN the binary file structure and directly jump to it and read the portion into memory. So, you see, it's impossible, unless one is a very poor programmer, to reach as slow a speed for binary file (and memoryDB, such as Redis) operations as CSV file operations, read-wise and write-wise.
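The "jump directly to the portion you need" idea can be sketched in Java with fixed-width records and `RandomAccessFile.seek`. This is an illustration under assumptions, not the poster's actual store: the record layout (an 8-byte timestamp followed by an 8-byte price), the class name, and the method names are all hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

class FixedRecordStore {
    // Hypothetical layout: 8-byte minute timestamp + 8-byte price per record.
    static final int RECORD_BYTES = Long.BYTES + Double.BYTES;

    // Write one record per price, with consecutive timestamps from startMinute.
    static void write(Path file, long startMinute, double[] prices) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.setLength(0); // start from an empty file
            for (int i = 0; i < prices.length; i++) {
                raf.writeLong(startMinute + i);
                raf.writeDouble(prices[i]);
            }
        }
    }

    // Read prices of records [from, to) by seeking straight to the byte offset,
    // without touching any of the records before it.
    static double[] readRange(Path file, int from, int to) throws IOException {
        double[] out = new double[to - from];
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek((long) from * RECORD_BYTES);
            for (int i = 0; i < out.length; i++) {
                raf.readLong(); // timestamp, ignored in this sketch
                out[i] = raf.readDouble();
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("bars", ".bin");
        write(file, 0L, new double[] {10.0, 10.5, 11.0, 11.5, 12.0});
        double[] slice = readRange(file, 2, 4);
        System.out.println(slice[0] + " " + slice[1]); // prints "11.0 11.5"
        Files.delete(file);
    }
}
```

Because every record has the same width, the offset of record n is simply n * RECORD_BYTES; a CSV file has no such property, since line lengths vary. A real store would likely read in larger buffered or memory-mapped chunks rather than one field per call.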

CalVolibrator · Jul 29, 2013

* Not sure what you mean with data and app together. I wrote my own API that manages a full-fledged flat file binary store. Nothing I ever tested, including KDB in memory, is faster than that. My data is on disk (SSD) and my access layer sits within a segregated library that can be referenced in order to access the functionality. That is the fastest way there is.

* Redis is used by the StackExchange network, possibly one of the websites with the most traffic worldwide, and how about Twitter? Redis in both cases is one of the core components used for lookups. That you conveniently omit those and name a porn website shows your ignorance. (But hey, I am sure that porn website also has tons of traffic, maybe by you as well? You sound very emphatic about that site).

* Do your own homework, it's just a weblink away from your fingertips to find out what technology they use.

I do not advocate Redis, I merely say it could be used, and I know from my own usage that it is faster. And please drop your comical suggestion of REST; really, REST has no place in this discussion. Saying REST should be used to load time series data into a testing platform is just bizarre. There is a reason people serialize/deserialize to/from binary format for speed and do not serve the data over a REST API.

I bow out of our exchange because I have spent enough time on this stuff. I wrote my own profiling and testing platform, live trading framework, feed handlers, FIX gateways, optimizers, and yes, data handling, so I know what I am talking about. I tested about 10 different technologies and APIs before settling on a combination of binary flat file store and Redis. REST was not even among those 10. I am glad it works for you but it would be awfully slow for my needs. The OP needs to decide himself what works best for him.

CalVolibrator · Jul 29, 2013

I criticized the use of a REST API, not that client/server relationships are involved. My opinion took everything into account: (a) lower throughput performance, (b) need for DTOs, (c) possible need for ORM. REST is just not the right tool for the trade.

If I gave you the wrong impression then my bad.

As you already checked things out with Redis you should be able to easily pull some performance benchmarks others ran.

newguy05 · Jul 29, 2013

Quote from jtrader33:

That thread title is probably poorly worded but this is the issue:

- I've bought 1 min option data and stored it in csv files

- The actual data required for any given backtest is a small subset of the whole dataset (e.g. 5 min quotes derived from the 1 min) and should easily fit into RAM

- Rather than read through the entirety of the csv files on each backtest, I'd like to store only the data I need into ArrayLists and then have the backtesting code freely access it

- The catch is that I want to be able to make substantial changes to the backtesting code in between runs (not just simple parameter changes)

I know Amibroker does this - how can it be done in Java? Appreciate any suggestions.

get a real database

CalVolibrator · Jul 29, 2013

and with that we would be back to square one ;-)

Recommendation in order of preference:

(a) Binary flat file storage and management
(b) In memory database (in case all fits into memory), SUCH AS Redis
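For the OP's use case (keep a small subset around between backtest runs, even when the backtest code itself changes), option (a) might look roughly like the sketch below: extract the subset from the csv once, write it to a small binary cache file, and have every later run load that file into an ArrayList. The file format (an int count followed by raw doubles), the class name, and the method names are assumptions for illustration; a real cache would carry timestamps and more fields.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SubsetCache {
    // Write the subset once: a record count followed by raw 8-byte doubles.
    static void save(Path file, List<Double> quotes) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(quotes.size());
            for (double q : quotes) {
                out.writeDouble(q);
            }
        }
    }

    // Load the subset at the start of each backtest run; no string parsing involved.
    static ArrayList<Double> load(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            int n = in.readInt();
            ArrayList<Double> quotes = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                quotes.add(in.readDouble());
            }
            return quotes;
        }
    }

    public static void main(String[] args) throws IOException {
        Path cache = Files.createTempFile("subset", ".bin"); // hypothetical cache location
        save(cache, Arrays.asList(101.25, 101.5, 100.75));
        System.out.println(load(cache)); // prints "[101.25, 101.5, 100.75]"
        Files.delete(cache);
    }
}
```

Since the cache lives in a plain file rather than in the JVM, the backtesting code can be rewritten and recompiled freely between runs; only the load step has to stay compatible with the file layout.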

I am out of here. Good luck to the OP.

quatron · Jul 29, 2013

Quote from CalVolibrator:

REST was not even among those 10. I am glad it works for you but it would be awfully slow for my needs. The OP needs to decide himself what works best for him.

I believe the OP does not use REST to fetch tick by tick, but rather downloads a flat file for a day from a server that keeps those files in memory. This way it will certainly outperform Redis. And I guess he has already made his decision.

CalVolibrator · Jul 29, 2013

Ok, then I must have missed something. Sure, if he uses binary flat files then that is what I also recommended from the first post. But Redis would easily outperform reading such batches of data if they were stored in .csv format.
