Why use a database?

prophet · Oct 14, 2004

Quote from billgates:

I preselect 20-30K ticks, that fits nicely into 256K L2 cache.
All integer arithmetics (no floats), hand optimized.

Excellent design.
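
As a rough illustration of the quoted approach (not billgates' actual code), here is a minimal Python sketch of storing prices as integer tick counts and checking the cache arithmetic. The 0.25 tick size, the two 4-byte fields per tick, and the names to_ticks and footprint_bytes are all assumptions made up for this example.

```python
# Hypothetical instrument: prices quoted to two decimals, minimum move 0.25.
CENTS_PER_TICK = 25
L2_CACHE_BYTES = 256 * 1024


def to_ticks(price_text):
    """Parse a decimal price string into an integer tick count, no floats."""
    whole, _, frac = price_text.partition(".")
    cents = int(whole) * 100 + int((frac + "00")[:2])
    return cents // CENTS_PER_TICK


def footprint_bytes(n_ticks, fields_per_tick=2, bytes_per_field=4):
    """Memory for n_ticks, assuming a 32-bit timestamp and a 32-bit price."""
    return n_ticks * fields_per_tick * bytes_per_field


# Price differences become exact integer subtraction.
move = to_ticks("1134.50") - to_ticks("1134.25")

# Under these assumptions 30K ticks take 240,000 bytes, under the 256K limit.
fits = footprint_bytes(30_000) <= L2_CACHE_BYTES
```

Under the assumed layout, the upper end of the quoted 20-30K range is about the most that fits in a 256K cache.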

Sparohok · Oct 14, 2004

Prophet, I'm not saying tick data is useless, I'm just saying it's harder to analyze than bar data. The algorithms and data structures are significantly more complex than using bar data. That statement holds for every single algorithm you have described.

Also, as far as I can tell, your algorithms are essentially converting tick data to fixed intervals prior to numerical analysis. If, in fact, the easiest way to deal with tick data is to first bin it into fixed time intervals then once again that pretty much proves my point.

Martin

Sparohok · Oct 14, 2004

Quote from prophet:

Do you even realize what you are saying? Computational efficiency does matter when you are talking about 1 to 4 orders of magnitude improvements. Who wouldn't like to have the equivalent of 10 to 10,000 times as much computing power?

If I have a nightly data analysis program that takes 10 minutes to run, I could care less about a 4 order of magnitude improvement. It works, it makes the deadline, and optimizing the code may not be an effective way to spend my time. Instead, I could be designing a new strategy and reusing my clean, well-designed, non-optimal code.

Obviously there are other cases where performance matters very much. But the original poster in the thread was just starting to design their system. At that point it is madness to optimize against performance problems that may never arise.

It is much easier to optimize clean code than to clean optimized code.

Martin

prophet · Oct 14, 2004

Quote from Sparohok:
Prophet, I'm not saying tick data is useless, I'm just saying it's harder to analyze than bar data. The algorithms and data structures are significantly more complex than using bar data. That statement holds for every single algorithm you have described.

Yes, there may be a somewhat steeper learning curve. More advanced analysis always has a steeper learning curve. Significantly more complex? Certainly not. A little more complex, yes. However, none of the extra complexity matters once the basic data manipulation infrastructure is in place, and you are spending most of your time testing hypotheses, optimizing, trading the systems, etc.

Also, as far as I can tell, your algorithms are essentially converting tick data to fixed intervals prior to numerical analysis. If, in fact, the easiest way to deal with tick data is to first bin it into fixed time intervals then once again that pretty much proves my point.

My algorithms only convert tick data into fixed intervals for purposes of generating meaningful covariances and regularizing performance statistics into per-hour or per-day intervals. The bulk of my analysis is based on per-hybrid-tick and per-N-hybrid-tick calculations and is never converted to fixed-time until after the trading is done and I have a performance statistic and need to calculate a covariance or Sharpe ratio.

You asked about calculating covariances over tick data. Just because I describe a simple method to convert tick data to fixed-time does not prove anything about the algorithms I use. It does not prove that fixed time intervals are superior. It proves the opposite, that tick data is more versatile and trivially easy to convert to fixed-time intervals when necessary.
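
For readers who want to see what such a conversion can look like, here is a minimal Python sketch, not the method described above. It assumes each series is a sorted list of (timestamp_seconds, value) pairs, carries the last value forward through empty bins, and then takes the sample covariance of the per-bin changes. The function names and the sample data are invented for illustration.

```python
from statistics import mean


def last_value_per_bin(ticks, bin_seconds, start, end):
    """Last value seen before each bin boundary, carried forward if a bin is empty.

    Assumes ticks is sorted by time and has at least one tick in the first bin.
    """
    values = []
    i = 0
    last = None
    boundary = start + bin_seconds
    while boundary <= end:
        while i < len(ticks) and ticks[i][0] < boundary:
            last = ticks[i][1]
            i += 1
        values.append(last)
        boundary += bin_seconds
    return values


def changes(values):
    """Bin-to-bin differences."""
    return [b - a for a, b in zip(values, values[1:])]


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


# Two irregularly spaced series, binned into 10-second intervals.
series_a = [(0, 100), (5, 101), (12, 103), (31, 102)]
series_b = [(1, 50), (14, 52), (25, 51), (33, 53)]

bins_a = last_value_per_bin(series_a, 10, 0, 40)
bins_b = last_value_per_bin(series_b, 10, 0, 40)
cov = covariance(changes(bins_a), changes(bins_b))
```

The binning is a single pass over the ticks, so the fixed-time view can be produced on demand while everything else stays in tick form.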

Quote from Sparohok:
If I have a nightly data analysis program that takes 10 minutes to run, I could care less about a 4 order of magnitude improvement. It works, it makes the deadline, and optimizing the code may not be an effective way to spend my time. Instead, I could be designing a new strategy and reusing my clean, well-designed, non-optimal code.

And that 10 minutes will never add up because you'll only be running 10 minutes of computation per day right? Get real. Any serious optimization over a statistically significant amount of data will take a lot of time.

Obviously there are other cases where performance matters very much. But the original poster in the thread was just starting to design their system. At that point it is madness to optimize against performance problems that may never arise.

You misrepresent the situation. No one is suggesting all code should be optimized against future problems. Proper optimization only targets the code that matters most.

Regarding those just starting out... it is extremely foolish to design or code any program without a basic understanding of computational efficiency (algorithm complexity, caching intermediates, memory locality, etc.). If those concepts are too complicated, one should at least avoid deeply nested loops, i.e. reduce the polynomial order of algorithm running time. It is also trivially easy to determine what parts of code need optimization using a profiler. Repeated use of a profiler will train any programmer to write very efficient code without any additional effort. My top level code constitutes 90% of written lines, is all interpreted Matlab, yet constitutes less than 10% of running time. The 90% of running time is done either by a handful of highly vectorized Matlab lines and/or by a few MEX functions coded in straight C. None of these C functions are more than 50 lines. Many MEX functions are around 25 lines of C, with only 10 lines doing the actual calculations. This isn't very difficult given the performance improvements.
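
To make the profiler and nested-loop points concrete, here is a small sketch in Python rather than Matlab/MEX, so it is an analogy to the setup described above and not a copy of it. It uses the standard cProfile and pstats modules. The moving-sum functions and the profile helper are hypothetical examples.

```python
import cProfile
import io
import pstats


def moving_sum_nested(values, window):
    """O(n * window): re-adds the whole window at every position."""
    out = []
    for i in range(window - 1, len(values)):
        total = 0
        for j in range(i - window + 1, i + 1):
            total += values[j]
        out.append(total)
    return out


def moving_sum_running(values, window):
    """O(n): keeps a running total, adding the new value and dropping the old."""
    if len(values) < window:
        return []
    total = sum(values[:window])
    out = [total]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        out.append(total)
    return out


def profile(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


data = list(range(2000))
slow_result, slow_report = profile(moving_sum_nested, data, 50)
fast_result, fast_report = profile(moving_sum_running, data, 50)
```

Printing slow_report and fast_report shows where the time goes. Both functions return the same values, so the faster one can replace the slower one without touching the calling code.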

It is much easier to optimize clean code than to clean optimized code.

Optimized code does not imply dirty code. Code can easily be both clean and optimized. Besides, we are usually talking about the critical 10% of code anyway... not a big deal to optimize. Leave the rest unoptimized.

Sparohok · Oct 14, 2004

As far as analyzing tick data versus bar data, I think we've said all that needs to be said. Let the coder be the judge.

Quote from prophet:

And that 10 minutes will never add up because you'll only be running 10 minutes of computation per day right? Get real. Any serious optimization over a statistically significant amount of data will take a lot of time.

Jeez... kids these days. If 10 minutes on a modern processor isn't enough to do real optimization, then clearly everyone analyzing the markets a decade ago was losing money. Too bad they didn't have Pentium 4s.

If you think the markets have changed since then, well, not for me. I'm making good money with a swing strategy that runs in 10 minutes a night on an Athlon 64. In Python by the way... 100% interpreted language. Every application is different.

Quote from prophet:

No one ever advocated dirty code. Optimized code does not imply dirty code. Less maintainable, less concise or less understandable does not mean dirty.

OK... if less maintainable, less concise, and less understandable code is still clean, what does it take to make code dirty?!? Seriously.

Martin

prophet · Oct 14, 2004

Quote from Sparohok:
Jeez... kids these days. If 10 minutes on a modern processor isn't enough to do real optimization, then clearly everyone analyzing the markets a decade ago was losing money. Too bad they didn't have Pentium 4s.

Why do you make such silly arguments? Plenty of traders are making money without any computational optimization, both now and 10 years ago. Most of us system traders use our brains to analyze and trade, in combination with available computational tools, now and then. There were plenty of supercomputers 10 years ago being used efficiently for analysis, perhaps more efficiently than any of us program our computers today. Bloat is accepted today. It wasn't then. Markets were also very different 10 years ago. There was less automation/program trading to compete against. More dumb money too.

I'm making good money with a swing strategy that runs in 10 minutes a night on an Athlon 64. In Python by the way... 100% interpreted language. Every application is different.

I don't know whether to congratulate you or feel sorry for you here. It's wonderful that you have found a system that requires little computational time. On the other hand I feel sorry that you are content enough with the "good money" and "10 minutes a night" to not see the potential to ramp up the testing and perhaps turn "good money" into "great money". I hope your system continues to profit into the future, because if and when it stops working, you may blame yourself for resting on your laurels, not allocating more than 10 minutes a day for optimization, simulation, or at least exploring new markets. Are you proud of your efficiency or your efficiency plus the obvious laziness it affords you?

OK... if less maintainable, less concise, and less understandable code is still clean, what does it take to make code dirty?

What code do I consider dirty? Buggy code, poorly written, undocumented and hard to read, poor designs... code that doesn't achieve a very useful balance of correctness, performance and maintainability or readability. The primary purpose of code is to do a job correctly, sometimes as fast as possible. Its readability and maintainability are also important, but often secondary to correctness and performance, especially in the case of optimization where the optimized code constitutes a small fraction of the total project, and can be easily rewritten.

marist89 · Oct 14, 2004

Quote from kc11415:

marist89>...I can get 850K records out of my database of 500M records in less than a second. Of course, I run Oracle.

That sounds rather good.

May I ask:

1) Is your timing of this query fresh after the database is started? Or, is it after some of these 850K rows have had time to be accessed by other queries such that many are still cached in the disk buffer? You're not running this query multiple times and then reporting the later/faster result?

Give me a little credit.

2) You're not by chance pinning anything in memory in the KEEP POOL?

nope

3) What's the datatype of the column(s) in the WHERE clause?

NUMBER(18)

4) Are you using a hash or bitmap index for the column(s) referenced in the WHERE clause?

regular old b-tree

5) Do you have a composite index exactly matching the conditions referenced in your WHERE clause, or do you have individual indexes on each separate column referenced in the WHERE clause?

composite.

6) Are you using table partitioning? If so, do(es) the condition(s) in your WHERE clause relate to the column(s) used to specify partition boundaries?

I could, but for this exercise, no.

7) Is it safe to assume that you are not using an ORDER BY clause? If so, how do you ensure rows come out in the same order on every query? Just because they are inserted in a particular order is no guarantee they will come out in the same order.

who said anything about order? If I had to order this beast, I'd have a sort_area_size of about 256M and it would come out in less than 2 seconds.

8) Do you avoid UPDATES, or else INSERTS after DELETES on this table, which can lead to automatic free space management in the form of coalescing or row migration? (which can change the order of rows)

Sure that would affect performance. But if I needed to get 850K rows out real quick I wouldn't design it so that it carried a lot of DML's, now would I?
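
The answers above (b-tree, composite index matching the WHERE clause, no reliance on insertion order) can be sketched on a small scale. The example below swaps in SQLite through Python's standard sqlite3 module, not Oracle, so it says nothing about Oracle timings. The ticks table, its columns, and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (symbol_id INTEGER, ts INTEGER, price INTEGER)")

# Composite b-tree index whose columns match the WHERE clause below.
conn.execute("CREATE INDEX idx_ticks_symbol_ts ON ticks (symbol_id, ts)")

# Synthetic data: 5 symbols x 1000 timestamps, integer prices.
rows = [(s, t, 1000 + (t * 7 + s) % 50) for s in range(1, 6) for t in range(1000)]
conn.executemany("INSERT INTO ticks VALUES (?, ?, ?)", rows)

# ORDER BY is explicit, since insertion order is not a guarantee of output order.
QUERY = (
    "SELECT ts, price FROM ticks "
    "WHERE symbol_id = ? AND ts BETWEEN ? AND ? ORDER BY ts"
)


def query_plan(connection, sql, params):
    """Return SQLite's plan text; the last column of each row is the detail."""
    plan_rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " ".join(str(row[-1]) for row in plan_rows)


params = (3, 100, 199)
plan = query_plan(conn, QUERY, params)
result = conn.execute(QUERY, params).fetchall()
```

Printing plan shows whether the engine searched the composite index or scanned the whole table, which is the same check the index questions above are getting at.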

marist89 · Oct 14, 2004

Quote from billgates:

Duh good it is... What's your hardware, OS, Oracle version?

Not Windoz!

Sparohok · Oct 14, 2004

Quote from prophet:

Why do you make such silly arguments.

To refute your silly assertion, I guess. You said that one cannot do serious market analysis in 10 minutes a day of computer time. Fortunes have been made and Nobel prizes have been won using a lot less computation than 10 minutes on an Athlon 64. Not to say that I'm making a fortune or winning a Nobel prize, but it's not like I'm going to get there by optimizing my code some more.

Quote from prophet:

There were plenty of supercomputers 10 years ago being used efficiently for analysis, perhaps more efficiently than any of us program our computers today. Bloat is accepted today. It wasn't then.

Trust me, 10 years ago, people were complaining about bloat and reminiscing about the old days when people wrote tight code, just like you are today. It was also at least 20 years ago, when computers were a thousand times less powerful, that Donald Knuth wrote that "premature optimization is the root of all evil." The more things change the more they stay the same.

Quote from prophet:

On the other hand I feel sorry that you are content enough with the "good money" and "10 minutes a night" to not see the potential to ramp up the testing and perhaps turn "good money" into "great money".

Boy, you are making a lot of assumptions here. I'll just say that your assumptions are not correct and leave it at that.

Most programmers go through a phase where they think about programming as this samurai art of making their code leaner and meaner and faster than anyone else's. Then, if they spend enough time writing, debugging, maintaining, and reusing code, they usually figure out that the true lasting value of code is not how fast it runs or how clever it is, but rather how expressive it is, how well it bridges the gap between human understanding and machine execution.

You're welcome to call that laziness, I really don't mind.

Martin

prophet · Oct 15, 2004

Quote from Sparohok:
To refute your silly assertion, I guess. You said that one cannot do serious market analysis in 10 minutes a day of computer time.

Sure the analysis might be valid, maybe even serious. Plenty can do serious analysis with a calculator, or all in their head. However, in a comparative sense ten minutes per day of quantitative analysis is not serious relative to what is possible with longer computational times. Analysis on 100 markets is more relevant than analysis over 1 market right?

Fortunes have been made and Nobel prizes have been won using a lot less computation than 10 minutes on an Athlon 64.

So? Fortunes and Nobels have been won using zero computation too.

Not to say that I'm making a fortune or winning a Nobel prize, but it's not like I'm going to get there by optimizing my code some more.

Do you believe scores of software companies, quants, hedge funds and financial institutions optimize their code for the fun of it? Would they rather wait years for an analysis to finish that might only take a day with optimized code? Or why don't they just use recycled Pentium II PCs to cut costs?

Trust me, 10 years ago, people were complaining about bloat and reminiscing about the old days when people wrote tight code, just like you are today. It was also at least 20 years ago, when computers were a thousand times less powerful,

You initially asked how people made money in the markets quantitatively 10 or 20 years ago given that I say 10 minutes/day today is not a "serious" amount of computation. I then mentioned custom supercomputer codes for quant analysis, 10 or 20 years ago, with no bloat, most of which is still advanced by today's standards. The point is that it has been done, and that a decent amount of computation is often helpful or essential for success. Yes there are exceptions. Plenty have done ok with just calculators, slide rules or hand drawn charts. However, anyone who attempted serious (eg multi market or tick based) analysis with primitive tools like PCs 10 or 20 years ago probably had a slim chance of success unless they had a cluster, or brought novel skills or data to the table.

that Donald Knuth wrote that "premature optimization is the root of all evil." The more things change the more they stay the same.

What is the relevance of this? This statement seems to address the quality or form of optimization, not the quantity of computations involved.

Boy, you are making a lot of assumptions here. I'll just say that your assumptions are not correct and leave it at that.

Most programmers go through a phase where they think about programming as this samurai art of making their code leaner and meaner and faster than anyone else's. Then, if they spend enough time writing, debugging, maintaining, and reusing code, they usually figure out that the true lasting value of code is not how fast it runs or how clever it is, but rather how expressive it is, how well it bridges the gap between human understanding and machine execution.

Lean mean and fast can't be expressive too? Here you go again making unsubstantiated claims, ignoring all of my arguments, and portraying the issues as black or white. Why go through the trouble if the code isn't expressive and flexible? The whole point of using interpreted Matlab for 90% of my code is to achieve maximum expressiveness and maximum efficiency thanks to the optimized 10%. You never answered the point about only needing to optimize a fraction of code while the other 90% is highly expressive. It's not a complete overhaul like you would have us believe. More like replacing a fraction of functions with optimized equivalents and avoiding deeply nested loops. How hard is that?

You're welcome to call that laziness, I really don't mind.

And you are welcome to educate me on that point. I admit I could be wrong. I really don't mind either way. It just seemed to me that anyone who says they do quant analysis, but is content with 10 minutes of computation per day and little optimization must not be terribly motivated to maintain or improve their profitability.