Hey Prophet,
I think we're in violent agreement here.
I'm not against optimization; I just think that someone starting out with their implementation shouldn't worry about it right off the bat. As you say, your 90/10 rule expresses much the same idea, so I don't think we're that far apart.
To get back to the original question of the thread, that's why I think it's better to start with a database, which does an enormous amount of work for you (atomicity, a data model, a client-server model, etc.) but, as you point out, doesn't give the best performance. It's a lot easier to start with a good database and move the crucial bits into flat files than to go the other way. For example, in my system all the data is in a database but is cached in flat files. If the database gets too huge, I might need to migrate all the data into flat files permanently and keep just the metadata in the database. But no matter where that ends up, the database will continue to provide all kinds of advantages. For example, I do data capture on a Windows box and analysis on Linux. My life is a lot easier because Postgres takes care of the networking, locking, transactions, notifications, and the like, which would be a lot of work if I had to develop my own client-server model.
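A rough sketch of the database-plus-flat-file-cache arrangement described above, not the actual system. Everything specific here is an assumption made for illustration: SQLite stands in for Postgres so the example runs on its own, and the `ticks` table, its columns, and the `load_prices` helper are hypothetical names.

```python
import csv
import os
import sqlite3
import tempfile


def load_prices(conn, symbol, cache_dir):
    """Return (timestamp, price) rows for a symbol, preferring the flat-file cache."""
    cache_path = os.path.join(cache_dir, f"{symbol}.csv")

    # Fast path: the flat file already exists, so skip the database entirely.
    if os.path.exists(cache_path):
        with open(cache_path, newline="") as f:
            return [(int(ts), float(price)) for ts, price in csv.reader(f)]

    # Slow path: the database remains the source of truth.
    rows = conn.execute(
        "SELECT ts, price FROM ticks WHERE symbol = ? ORDER BY ts", (symbol,)
    ).fetchall()

    # Write to a temporary file and rename it, so a reader never sees a
    # half-written cache file.
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    os.replace(tmp_path, cache_path)

    return [(int(ts), float(price)) for ts, price in rows]


def demo():
    # Hypothetical schema, in-memory so nothing is left on disk.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ticks (symbol TEXT, ts INTEGER, price REAL)")
    with conn:  # one transaction: all rows are committed or none are
        conn.executemany(
            "INSERT INTO ticks VALUES (?, ?, ?)",
            [("ABC", 1, 10.0), ("ABC", 2, 10.5), ("XYZ", 1, 99.0)],
        )

    with tempfile.TemporaryDirectory() as cache_dir:
        first = load_prices(conn, "ABC", cache_dir)   # reads the database
        conn.execute("DELETE FROM ticks")             # empty the table
        second = load_prices(conn, "ABC", cache_dir)  # served from the cache
    return first, second
```

The cache here is never invalidated, which only makes sense for data that doesn't change once captured (historical ticks, say). Anything mutable would need a freshness check against the database before trusting the file.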
It's just kinda amusing for me to see the heavy focus on fancy hardware and algorithms here when you can accomplish great things with simple algorithms, obsolete hardware, and a (God forbid) CRT monitor instead of flat panel!
I think that pushing the limits of performance will often obscure the underlying issues that affect profitability. One example is statistical validity: I think the more data you have and the more you analyze it, the easier it is to overfit your data and end up with serious implementation shortfall as a result. I've seen this again and again, and in such cases throwing more hardware or data at the problem will make it worse.
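A toy illustration of the overfitting point, under loudly stated assumptions: the returns are pure Gaussian noise, each "strategy" is nothing more than a fixed random long/short pattern, and the `snoop` function and its parameters are invented for this sketch. It is not a model of any real trading system; it only shows how picking the best of many tries inflates the in-sample figure.

```python
import random


def snoop(n_strategies, n_days=250, seed=0):
    """Try n_strategies random strategies on noise and return
    (in-sample mean return, out-of-sample mean return) of the in-sample winner."""
    rng = random.Random(seed)

    # Pure-noise daily returns: by construction there is no edge to find.
    in_sample = [rng.gauss(0, 0.01) for _ in range(n_days)]
    out_sample = [rng.gauss(0, 0.01) for _ in range(n_days)]

    best = None
    for _ in range(n_strategies):
        # A hypothetical "strategy": long (+1) or short (-1) each day by coin flip.
        signal = [rng.choice((-1, 1)) for _ in range(n_days)]
        in_mean = sum(s * r for s, r in zip(signal, in_sample)) / n_days
        out_mean = sum(s * r for s, r in zip(signal, out_sample)) / n_days

        # Keep whichever strategy looked best on the in-sample data.
        if best is None or in_mean > best[0]:
            best = (in_mean, out_mean)
    return best


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        in_mean, out_mean = snoop(n)
        print(f"{n:5d} tried: in-sample {in_mean:+.5f}, out-of-sample {out_mean:+.5f}")
```

Because the seed is fixed, every run sees the same data and the same sequence of candidates, so the winning in-sample number can only go up as more strategies are tried. The out-of-sample number has no reason to follow it, and that gap is the shortfall in miniature.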
Martin
