Quote from Sparohok:
Hey Prophet,
I think we're in violent agreement here.
I'm not against optimization, I just think that someone just starting out with their implementation shouldn't worry about optimization right off the bat.
The point I keep trying to make is that optimization is actually easier, much easier than you portray. Yes, beginning programmers shouldn't have to worry about it. However, it is wrong to say they should neglect it completely. They should know just enough about proper algorithm design and use a profiler as a learning tool such that their code is decently optimized from the start. I'm not suggesting heavy profiling to get every last drop of performance or generating less readable code. I mean a basic algorithm understanding and exploratory profiling to see what runs fast and what doesn't. Not too much trouble if you bother to learn about it. What problem would anyone have with this philosophy?
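As a rough illustration of the kind of exploratory timing meant here, the following Python sketch times two ways of counting how many trades fall in symbols on a watch list. The task, function names, and data are hypothetical. For whole programs, Python's built-in `cProfile` module gives a per-function breakdown rather than a single timing.

```python
import timeit

# Hypothetical task: count how many trades are in symbols on a watch list.
def count_hits_list(trades, watch_list):
    # `in` on a list scans it: O(len(watch_list)) per lookup.
    return sum(1 for sym in trades if sym in watch_list)

def count_hits_set(trades, watch_list):
    # Build a set once; each lookup is then O(1) on average.
    watch = set(watch_list)
    return sum(1 for sym in trades if sym in watch)

if __name__ == "__main__":
    watch_list = [f"SYM{i}" for i in range(2_000)]
    trades = [f"SYM{i % 4_000}" for i in range(5_000)]

    # The two versions must agree before their timings mean anything.
    assert count_hits_list(trades, watch_list) == count_hits_set(trades, watch_list)

    for fn in (count_hits_list, count_hits_set):
        # Bind fn as a default so each lambda times the intended function.
        secs = timeit.timeit(lambda f=fn: f(trades, watch_list), number=3)
        print(f"{fn.__name__}: {secs:.4f} s for 3 runs")
```

The set version does the same work with a hashed lookup instead of a linear scan: a basic algorithm choice that costs nothing in readability.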
To get back to the original question of the thread, that's why I think it's better to start with a database which does an enormous amount of work for you, providing atomicity, a data model, a client-server model, etc., but as you point out not the best performance. It's a lot easier to take a good database and move the crucial bits into flat files.
Maybe we'll never agree on stuff... sadly. I feel that flat files have tremendous advantages starting out. ASCII files can be read, edited, understood and debugged more easily than databases. I store all my market data in CSV files and cache it in binary files as it's used. Why? I sometimes need to correct for market data gaps, patching my primary server's data with the backup server data. I just edit the CSV file and delete the binary cache file. I use flat binary files because they are fast, and I can manage them more easily, compressing and archiving data I don't need anymore. If I had to use a database, I would be adding, subtracting, purging, and archiving many GB of data per day through the database. Let's hope the database is smart enough to handle the internal layout so it doesn't take hours to process.
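The CSV-plus-binary-cache workflow described above can be sketched in a few lines of Python. The bar layout (time, open, high, low, close, volume), the header name, and the `.bin` naming convention are assumptions for illustration, not the poster's actual format. The CSV stays the editable source of truth; deleting the cache file forces the next load to re-read the patched CSV.

```python
import csv
import os
import struct

# Hypothetical fixed-size record per bar, little-endian:
# int64 time, four float64 prices (open, high, low, close), int64 volume.
RECORD = struct.Struct("<q4dq")

def _parse_csv(csv_path):
    """Parse the editable CSV into a list of (time, o, h, l, c, volume) tuples."""
    bars = []
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if not row or row[0] == "time":  # skip blank lines and the header
                continue
            t, op, hi, lo, cl, vol = row
            bars.append((int(t), float(op), float(hi), float(lo), float(cl), int(vol)))
    return bars

def load_bars(csv_path):
    """Load bars from the binary cache, building it from the CSV if missing."""
    cache_path = csv_path + ".bin"  # hypothetical naming convention
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return list(RECORD.iter_unpack(f.read()))
    bars = _parse_csv(csv_path)
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "wb") as f:
        for bar in bars:
            f.write(RECORD.pack(*bar))
    # Swap in the finished file so a crash mid-write leaves no partial cache.
    os.replace(tmp_path, cache_path)
    return bars
```

Writing to a temporary file first and then calling `os.replace` means the cache either exists whole or not at all, so a delete-and-rebuild cycle never reads a half-written file.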
For example in my system all the data is in a database but it is cached in flat files. If the database gets too huge I might need to migrate all the data into flat files permanently and just keep metadata in the database.
I agree on that last point.
It's just kinda amusing for me to see the heavy focus on fancy hardware and algorithms here when you can accomplish great things with simple algorithms, obsolete hardware, and a (God forbid) CRT monitor instead of a flat panel!
I think that pushing the limits of performance will often obscure the underlying issues that affect profitability.
You know I hate saying this. Unfortunately, here you go again portraying things as black-or-white, one-or-the-other. The truth of the matter is that one can achieve a very nice combination of both performance and expression, with surprisingly little effort. Both performance and expression can speed the path to profitability. How does pushing performance limits necessarily interfere with profitability? I don't see how one can negate the other, except in the case of poor, unplanned or uneducated designs... incompetence.
One example is statistical validity: I think the more data you have and the more you analyze it, the easier it is to overfit your data and end up with serious implementation shortfall as a result.
You have it backwards. More market data allows greater statistical significance. Less data leads to over-fit results. Please prove your opposite point of view.
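Under the simplifying assumption that per-trade returns are independent with a fixed mean and standard deviation, the textbook t-statistic of the mean return grows with the square root of the number of trades, so the same edge becomes easier to distinguish from noise as data accumulates. The edge and volatility figures below are hypothetical.

```python
import math

def t_stat(mean, std, n):
    """t-statistic of a sample mean: mean / (std / sqrt(n))."""
    return mean / (std / math.sqrt(n))

# Hypothetical system: +0.05% mean return per trade, 1% std dev per trade.
for n in (100, 1_000, 10_000):
    print(f"{n:>6} trades: t = {t_stat(0.0005, 0.01, n):.2f}")
```

With these inputs the t-statistic goes from about 0.5 at 100 trades to about 5 at 10,000. The sketch assumes a single pre-specified rule; it does not model testing many variants on the same data.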
I've seen this again and again, and in such cases throwing more hardware or data at the problem will make it worse.
Sure, plenty of quants have thrown hardware at a failed system, only to still end up with a failed system. It was their design at fault, not the use of hardware. I threw hardware at my systems, which resulted in smoother and more substantial returns, by virtue of greater diversification to more markets and more systems per market. You'll find examples either way. The question is does extra computation necessarily hurt? If yes, then why do you use any computation to begin with?
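A standard way to see why adding more markets and systems smooths returns: if N return streams each have volatility σ and every pair has correlation ρ, an equally weighted combination has variance σ²(1/N + (1 - 1/N)ρ). The sketch below evaluates that formula; the 20% volatility and the correlation values are hypothetical, and real strategy correlations are neither equal nor stable.

```python
import math

def equal_weight_vol(sigma, n, rho):
    """Volatility of an equally weighted mix of n return streams.

    Assumes every stream has volatility sigma and every pair has
    correlation rho: variance = sigma^2 * (1/n + (1 - 1/n) * rho).
    """
    return sigma * math.sqrt(1.0 / n + (1.0 - 1.0 / n) * rho)

# Hypothetical 20% annual volatility per stream.
for n in (1, 4, 16, 64):
    uncorrelated = equal_weight_vol(0.20, n, 0.0)
    correlated = equal_weight_vol(0.20, n, 0.3)
    print(f"{n:>3} streams: rho=0 -> {uncorrelated:.3f}, rho=0.3 -> {correlated:.3f}")
```

With ρ = 0 volatility falls as 1/√N (20% down to 10% going from 1 to 4 streams); with ρ = 0.3 it levels off near σ√ρ, so uncorrelated additions matter more than sheer count.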
It is ridiculous to claim there is a certain optimal amount of computation, beyond which more computation is detrimental to profitability. That is what you are suggesting. It doesn't make sense logically, unless you are assuming an inherent amount of incompetence. In that case the problem is core incompetence of the designer, not the amount of computation. They will fail with any amount of computation.