Why use a database?

Gringinho · Oct 15, 2004

Quote from linuxtrader:

Similar things with mySQL databases on Linux... AND my SQL is free while oracle is not......
...

Isn't mySQL fairly simplistic in comparison to the also free PostgreSQL ?
I remember mySQL being used more for small tasks and web-oriented deployment. Also, the licensing of mySQL requires that you stay away from commercial deployment, while PostgreSQL uses the Berkeley (BSD) license?
Also, mySQL only supports row-locking, while PostgreSQL has more advanced locking as well as optional row-locking. PostgreSQL has PL/pgSQL stored procedures.

But I guess neither free alternative is perhaps what one would look for if the database grows very big, with lots of intraday data for many symbols. Then backup, replication, etc. might weigh in, as the data itself becomes more of an investment of time, resources and money.

linuxtrader · Oct 15, 2004

Quote from Gringinho:

Isn't mySQL fairly simplistic in comparison to the also free PostgreSQL ?
.....

They are both useful ... The newer versions of mySQL offer familiar features found elsewhere .....

Gringinho · Oct 15, 2004

Quote from linuxtrader:

They are both useful ... The newer versions of mySQL offer familiar features found elsewhere .....

I'm not that familiar with recent versions, as I only used them years and years back, but I still get the impression that mySQL is faster on simple databases and simple operations than PostgreSQL, which has some more advanced features.
I guess the licenses also determine usage a lot. Two great and free products anyway.

prophet · Oct 15, 2004

Quote from Sparohok:

Hey Prophet,
I think we're in violent agreement here. I'm not against optimization, I just think that someone just starting out with their implementation shouldn't worry about optimization right off the bat.

The point I keep trying to make is that optimization is actually easier, much easier than you portray. Yes, beginning programmers shouldn't have to worry about it. However, it is wrong to say they should neglect it completely. They should know just enough about proper algorithm design and use a profiler as a learning tool such that their code is decently optimized from the start. I'm not suggesting heavy profiling to get every last drop of performance or generating less readable code. I mean a basic algorithm understanding and exploratory profiling to see what runs fast and what doesn't. Not too much trouble if you bother to learn about it. What problem would anyone have with this philosophy?
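To make "exploratory profiling" concrete, here is a minimal sketch, assuming Python and its standard cProfile module. The two moving-average functions and the profile() helper are made up for illustration; they are not anyone's actual trading code. The point is only that a profiler quickly shows which of two equivalent algorithms is the slow one.

```python
import cProfile
import io
import pstats


def slow_moving_average(prices, window):
    """Recompute every window sum from scratch: O(n * window)."""
    return [sum(prices[i - window:i]) / window
            for i in range(window, len(prices) + 1)]


def fast_moving_average(prices, window):
    """Maintain a running sum and slide it along: O(n)."""
    total = sum(prices[:window])
    out = [total / window]
    for i in range(window, len(prices)):
        total += prices[i] - prices[i - window]
        out.append(total / window)
    return out


def profile(func, *args):
    """Run func under cProfile; return (result, text report). Hypothetical helper."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(5)  # top 5 entries only
    return result, buf.getvalue()


if __name__ == "__main__":
    # Synthetic prices, purely for timing purposes.
    prices = [float(i % 97) for i in range(20_000)]
    for func in (slow_moving_average, fast_moving_average):
        _, report = profile(func, prices, 200)
        print(func.__name__)
        print(report)
```

Both functions return the same values; only the amount of work differs, which is what the two reports let you compare.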

To get back to the original question of the thread, that's why I think it's better to start with a database which does an enormous amount of work for you, providing atomicity, a data model, a client-server model, etc., but as you point out not the best performance. It's a lot easier to take a good database and move the crucial bits into flat files.

Maybe we'll never agree on stuff... sadly. I feel that flat files have tremendous advantages starting out. ASCII files can be edited, understood and debugged more easily than databases. I store all my market data in CSV files and cache it in binary files as it's used. Why? I sometimes need to correct for market data gaps, patching my primary server's data with the backup server data. Just edit the CSV file and delete the binary cache file. I use flat binary files because they are fast, and I can manage them more easily, compressing and archiving data I don't need anymore. If I had to use a database, I would be adding, subtracting, purging, and archiving many GB of data per day through the database. Let's hope the database is smart enough to handle the internal layout so it doesn't take hours to process.
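The CSV-plus-binary-cache workflow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the poster's actual code: the file layout (a CSV with a "price" column), the function name load_prices, and the use of Python's array module for the cache are all hypothetical.

```python
import csv
import os
from array import array


def load_prices(csv_path, cache_path):
    """Return the CSV's 'price' column as array('d').

    If the binary cache exists it is read directly; otherwise the CSV is
    parsed and the cache is rebuilt. Deleting the cache file forces a rebuild.
    """
    prices = array("d")
    if os.path.exists(cache_path):
        # Fast path: raw doubles straight from disk, no text parsing.
        count = os.path.getsize(cache_path) // prices.itemsize
        with open(cache_path, "rb") as f:
            prices.fromfile(f, count)
        return prices

    # Slow path: parse the human-editable CSV (assumed to have a header row).
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            prices.append(float(row["price"]))

    # Write the cache so the next load skips parsing.
    with open(cache_path, "wb") as f:
        prices.tofile(f)
    return prices
```

To patch a data gap under this scheme, you would edit the CSV, delete the cache file, and call load_prices again. The cache here is in the machine's native byte order, so it is not portable between architectures; it is meant to be disposable.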

For example in my system all the data is in a database but it is cached in flat files. If the database gets too huge I might need to migrate all the data into flat files permanently and just keep metadata in the database.

I agree on that last point.

It's just kinda amusing for me to see the heavy focus on fancy hardware and algorithms here when you can accomplish great things with simple algorithms, obsolete hardware, and a (God forbid) CRT monitor instead of flat panel! I think that pushing the limits of performance will often obscure the underlying issues that affect profitability.

You know I hate saying this. Unfortunately, here you go again portraying things as black-or-white, one-or-the-other. The truth of the matter is that one can achieve a very nice combination of both performance and expression, with surprisingly little effort. Both performance and expression can speed the path to profitability. How does pushing performance limits necessarily interfere with profitability? I don't see how one can negate the other, except in the case of poor, unplanned or uneducated designs... incompetence.

One example is statistical validity, I think the more data you have and the more you analyze it, the easier it is to over fit your data and end up with serious implementation shortfall as a result.

You have it backwards. More market data allows greater statistical significance. Less data leads to over-fit results. Please prove your opposite point of view.
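A small sketch of the sampling side of this claim, assuming hypothetical i.i.d. normal daily returns (an assumption made for illustration, not real market data). It shows only that the standard error of a mean return shrinks roughly as 1/sqrt(n) as the sample grows; it says nothing about the separate risk of over-fitting by testing many strategies on the same data. The function names and parameter values are invented for the example.

```python
import math
import random
import statistics


def standard_error(returns):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(returns) / math.sqrt(len(returns))


def simulate_returns(n, mean=0.0005, vol=0.01, seed=42):
    """Draw n hypothetical i.i.d. normal returns (illustrative assumption)."""
    rng = random.Random(seed)
    return [rng.gauss(mean, vol) for _ in range(n)]


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        r = simulate_returns(n)
        print(n, round(statistics.mean(r), 5), round(standard_error(r), 5))
```

With 100 times more observations the standard error drops by about a factor of 10, so a small true edge becomes easier to tell apart from noise.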

I've seen this again and again, and in such cases throwing more hardware or data at the problem will make it worse.

Sure, plenty of quants have thrown hardware at a failed system, only to still end up with a failed system. It was their design at fault, not the use of hardware. I threw hardware at my systems, which resulted in smoother and more substantial returns, by virtue of greater diversification to more markets and more systems per market. You'll find examples either way. The question is does extra computation necessarily hurt? If yes, then why do you use any computation to begin with?

It is ridiculous to claim there is a certain optimal amount of computation, beyond which more is detrimental to profitability. That is what you are suggesting. It doesn't make sense logically, unless you are assuming an inherent amount of incompetence. In that case the problem is core incompetence of the designer, not the amount of computation. They will fail with any amount of computation.

linuxtrader · Oct 15, 2004

Quote from prophet:

... The truth of the matter is that one can achieve a very nice combination of both performance and expression, with surprisingly little effort. How does pushing performance limits necessarily interfere with profitability? I don't see how one can negate the other, except in the case of poor, unplanned or uneducated designs.

...

All of the debate can be boiled down to saying that if the system meets your present and future needs then you are done: no need to optimize further, change the design or do anything else.

A good system design balances cost and performance, and also scales appropriately to meet changes in capacity and demand which are within the design limits of the system.

You can use any combination of flat files, database systems and algorithms that accomplishes your design goals. Unless another system is identical to yours then comparing designs is largely irrelevant.

nononsense · Oct 15, 2004

Quote from linuxtrader:

He is not ... and neither is the other debater.

The whole argument is well known.

You can produce highly optimized code that will run fairly complicated optimizations in relatively small amounts of time: the trick is to match the technique to the problem and to understand how to optimize for your hardware, like the other poster said about matching the cache and pipelines with the data/instructions.
[...]

nice to learn about your tricks.

prophet · Oct 15, 2004

Quote from linuxtrader:
All of the debate can be boiled down to saying that if the system meets your present and future needs then you are done: no need to optimize further, change the design or do anything else.

Future needs? Markets change in unpredictable ways, especially long-term. You can't predict that.

Like I said to Sparohok earlier, what happens when the market changes, rendering your systems unprofitable? You will fault yourself for not trying to improve profitability in the past while there were more market opportunities to profit from, and you could have ramped up your analysis efforts.

You and Sparohok are both suggesting something very dangerous... namely contentment with the status quo. Many people and fortunes have been destroyed by that.

A good system design balances cost and performance, and also scales appropriately to meet changes in capacity and demand which are within the design limits of the system.

Design limits? Cost? "Scales appropriately"? We're not talking airplanes here! In the pursuit of trading systems there are often NO design limits when it comes to achievable processing speed, scalability, amount of data processed, number of systems traded, profitability, etc. Your motivation is your only limit. You mentioned cost. Computing hardware, books and self-education are cheap. There's no excuse for not using them. As a Linux user you already know how to cut costs.

A better educated programmer can do the work of 10 or 100 less educated programmers, and multiply the effective processing power of a computer by factors of a thousand or more.

kc11415 · Oct 15, 2004

kc11415>>1) Is your timing of this query fresh after the database is started?

marist89>Give me a little credit.

Please accept my apologies. I was just curious ;-)
_______________________________________________

linuxtrader>As far as answering a laundry list of inquiries about database configuration and query execution, my advice to the other poster that asked is to read their Oracle documentation: all of those issues are discussed there if they need confirmation of how to handle implementation and optimization - which you graciously answered .....

LinuxTrader, FYI: That tiny "laundry list" would not be known to someone who hadn't already RTFM'd ;-)

However, in hindsight the question about hash vs. bitmap index shouldn't have been asked since good index performance accompanied by a high degree of selectivity implies b-tree hash rather than a bitmap index.

Grizli · Oct 15, 2004

We are trying to find the best decisions for future bargains using backtesting. Many people say that a decision that was good for a past period may not be good in the future. So the question is: what information do you hope to get from a historical database, for example?

linuxtrader · Oct 15, 2004

Quote from prophet:

Design limits? Cost? "Scales appropriately"? We're not talking airplanes here! In the pursuit of trading systems there are often NO design limits when it comes to achievable processing speed, scalability, amount of data processed, number of systems traded, profitability, etc. Your motivation is your only limit.

.....

A better educated programmer can do the work of 10 or 100 less educated programmers, and multiply the effective processing power of a computer by factors of a thousand or more.

On the first point I can tell you that I would never approve a project where the design engineer did not know the limitations of their system design: if they can't predict how their system will respond to a spike in demand/load or a change in the problem regime, then I simply send them back to their desk to rework their idea before I approve a dollar of funding towards implementation time.

On the second point our experience differs: at a certain level in most businesses people are not wholly incompetent. I've never met anyone that produces a design that cannot be improved upon in subsequent iterations. However, if you start with a good system design that matches the problem regime and you are careful in your implementation, then you can arrive at something that requires very little change over a broad spectrum of applications. The idea that most programmers are incompetent is not true today: very few techniques or practices are secret today ... part of the reason why software people are commoditized and seeing their incomes decrease or stagnate.
