So you don't know. You're trying to present yourself like you have real-world server experience when you don't. You deflect to other issues. I see this kind of answer all the time. Language makes a huge difference. I already noted that with Python in a previous post. You seemed to agree.
Let's go over the rest...
Why are you indicating that disk speeds are so blindingly fast? They may be able to burst data in their cache through the south bridge, to the north bridge, and into a DMA address somewhere in RAM, but this is rarely sustainable at burst speeds. The better SSDs can probably do 300-500 MB/sec in a quick burst. Sustained will be a lot lower. This is hardly keeping up with the processor. For the sake of argument, let's stick with SSDs, as rotational platter-based disks will be much slower... but they are still in use for most data storage back ends, so they shouldn't be totally ignored.
A semi-current processor will be 3 GHz or faster. Let's stick with the 3 GHz number for easy math. That's 3 billion clock ticks per second. The actual number of instructions executed per second can be higher still, since instructions per cycle (IPC) depends on the program and optimizations. I don't have current numbers, but L2 cache should be over 30 GB/sec and RAM speed should be over 5 GB/sec. I'm willing to admit that memtest numbers can be a little off, but the point is that internal CPU speeds are hugely faster than any disk. The CPU waiting for a file to load can add up to billions of clock ticks over the total length of the file. On a 3 GHz processor, a 20 ms delay means the CPU has to wait for 60 million clock ticks.
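To make that arithmetic concrete, here is a minimal sketch in Python. The function name stall_cycles is made up for illustration, and it assumes the clock keeps ticking at its nominal rate for the whole wait.

```python
def stall_cycles(clock_hz, delay_ms):
    """Clock ticks that pass while waiting delay_ms milliseconds at clock_hz."""
    # Integer math keeps the result exact for whole-millisecond delays.
    return clock_hz * delay_ms // 1000

# 3 GHz processor, 20 ms disk delay (the figures used above)
print(stall_cycles(3_000_000_000, 20))  # 60000000
```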
Some might argue to just run multiple instances of the program, so that during the kernel waits the other instances can fill in the gaps. That only works to a point. If the data set is large, a lot of instances will be fighting for the disk bandwidth. This is part of the reason I recommended converting CSV to a smaller binary format. This is also why, in some instances, compressed files may load faster than larger uncompressed ones. Those files get into system RAM faster, and the time saved offsets the extra CPU cycles needed to uncompress them. But as I hinted above, sometimes this works, many times not.
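A rough way to see when compression pays off is a simple break-even model. This is only a sketch: the function names and all the throughput figures are hypothetical, and it assumes the read and the decompression happen one after the other with no overlap.

```python
def plain_load_seconds(raw_mb, disk_mb_per_s):
    """Time to read the uncompressed file straight off the disk."""
    return raw_mb / disk_mb_per_s

def compressed_load_seconds(compressed_mb, raw_mb, disk_mb_per_s, decompress_mb_per_s):
    """Time to read the smaller file, then decompress it to its raw size."""
    return compressed_mb / disk_mb_per_s + raw_mb / decompress_mb_per_s

# Hypothetical slow disk (200 MB/s): the compressed file wins.
print(plain_load_seconds(1000, 200))                  # 5.0
print(compressed_load_seconds(250, 1000, 200, 500))   # 3.25

# Hypothetical fast disk (2000 MB/s): decompression now costs more than it saves.
print(plain_load_seconds(1000, 2000))                 # 0.5
print(compressed_load_seconds(250, 1000, 2000, 500))  # 2.125
```

Plug in measured numbers for your own disk and decompressor before drawing any conclusion from it.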
With that being said, anyone who thinks getting their data over a network connection (either a DB or a shared file system) will be faster needs their head examined. This hasn't been mentioned in this thread yet, but I know someone will be thinking about it. Gigabit NICs are fast, but they will never compete with a built-in disk controller. Even if the remote system the data is being fetched from is equally fast, it will still have to go through all of what I've stated so far PLUS the networking overhead. If the server isn't being heavily used, this isn't much of a problem. On the flip side, it is.
Going back to my kernel statements, the file system kernel driver will be responsible for multiple reads (and maybe a few status update writes) from various locations on the disk to find the disk data and get it into system RAM. Remember these will go back and forth through the motherboard bridges. If the disk data isn't too fragmented, this is relatively fast in disk terms (not CPU terms). The kernel file system driver will execute in some kind of loop that will add up to thousands of lines of code being executed (not including the necessary wait times for the disk to catch up). These add up.
Back to databases. The better databases (usually the more expensive ones) will bypass the file system totally and write to a raw partition directly. This eliminates the file system overhead and potentially some file fragmentation issues. The rest of the databases will use the file system like any other file. They will issue a file seek, get a find, read through multiple tables scattered about, start assembling the data, read more and repeat the assembly, and then return the data request to the program, usually through some kind of socket. Depending on the database server programming and data design, there might also be some temporary files and queue files scattered about. This has a HUGE overhead, especially if the data being fetched is hundreds of thousands of little OHLC bars. If there are multiple requests being made to the database server (like from dozens of the same program being run or other paying clients), there will be a lot of thrashing about (but minimized with SSDs).
From your point of view, you've got a database on a workstation that's minimally used. You're not running multiple instances of the same program. It will look deceptively fast. Don't go telling others that they can scale this up while ignoring real issues.
One thing you are quite right about is code efficiency when processing large amounts of data. I asked about your programming language and got ignored and deflected. Going back to my Python example, being 30-50x heavier than C/C++ makes a huge difference in language selection. Where people get deceived into thinking that other languages (interpreted ones in general) are fast is that they tend to return results faster than the user is expecting them. There's nothing wrong with that on individual workstations that are single use. I do that on mine all the time. What I keep indicating is that this mentality will kill a heavily used server for reasons already mentioned. If your code runs fast on your system, great! But don't mislead others when it comes to server scaling.
You've also mentioned that loading data from CSV files is fast enough. For a single workstation, yes. I do this with my scratch pad program from a RAM disk. For light server use, you can get away with it. For heavy server use, NO! Why? Break it down. 1) CSV files are larger than their binary counterparts. See above about disk bandwidth. 2) Every CSV field has to be parsed. You may give your program one line of code to do this, but in processor instructions, there are probably a dozen or so. 3) Every CSV value, once parsed, has to be converted back into an integer or float. That will take another dozen or so processor instructions. Compare that to a binary load. The data block has everything in fixed positions. There is no data conversion, just variable loading. These operations will take a few processor instructions per variable. That is a huge difference. Parsing hundreds of thousands of bars translates to millions of variables. On small data sets, this doesn't add up to much. On large data sets, it does. This is simple math. If you're using Python, multiply the CSV overhead by 30 (or maybe more).
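The two load paths described above can be sketched in Python using the standard struct module. The record layout here (a 32-bit timestamp, four 32-bit floats for OHLC and a 32-bit volume) is a hypothetical choice for illustration, and 32-bit floats will not be precise enough for every data set. The function names are made up as well.

```python
import struct

# Hypothetical fixed-width bar: uint32 timestamp, 4 x float32 OHLC, uint32 volume.
# "<" means little-endian with no padding, so each record is exactly 24 bytes.
BAR = struct.Struct("<I4fI")

def parse_csv_line(line):
    """Text path: split the fields, then convert each one from text to a number."""
    ts, o, h, l, c, v = line.strip().split(",")
    return (int(ts), float(o), float(h), float(l), float(c), int(v))

def pack_bars(bars):
    """Write bars as back-to-back fixed-width binary records."""
    return b"".join(BAR.pack(*bar) for bar in bars)

def unpack_bars(blob):
    """Binary path: every field sits at a fixed offset, so nothing is parsed as text."""
    return list(BAR.iter_unpack(blob))

csv_lines = [
    "1500000000,101.25,102.5,100.75,101.5,12000",
    "1500000060,101.5,103.0,101.25,102.75,15000",
]
bars_from_csv = [parse_csv_line(line) for line in csv_lines]
blob = pack_bars(bars_from_csv)
bars_from_binary = unpack_bars(blob)

print(BAR.size, len(blob))                 # 24 48
print(bars_from_binary == bars_from_csv)   # True for these sample values
```

The sample prices were picked so they survive the trip through float32 exactly. With real data you would likely want float64 or scaled integers, which makes the records bigger.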
Everything I've mentioned so far falls under the category of "Performance And Tuning". That topic is also much larger than what I've mentioned.
I keep telling fan27 to keep these points in mind when programming because if his site really takes off, he will need his server(s) to perform maximally. Inefficient programs in inefficient languages will mean more servers/cloud time will have to be purchased and costs will be unnecessarily high.
On a personal note, it also bugs me when I'm browsing the web and I hit some poorly scripted site that I need information from, but it's cratering under its own weight. As a business, it's bad to piss off your customers with these, especially if they're paying for access.
So... fan27, keep what I've said in mind. If your site takes off, you'll be running into these issues in the future. Plan ahead and you'll do great. Ignore them and you'll run into some nasty problems just like others have.
Zzzz1: I don't really mean to bash you, but you're the one who stood out the most. There's a lot more to admin than what's on the surface. To be fair, I had more programming experience than admin experience when I started to take over handling servers many years ago. I had some rough times to get through, but the programming experience was useful in watching server execution and figuring out where the processes were slowing down and hanging. Once I had this experience, I could then tell clients why they were having performance problems, and what needed to be done to fix them. They liked that compared to the useless answer of "just buy more expensive hardware".
I am not sure why you appear so agitated. This is not even an issue of language choice, much less of kernels. Any modern language today can read binary data from disk and de-serialize the data at a rate of several million data points per second. When you run even a single strategy that uses computationally intensive algorithms, your rate of processed data points can easily drop to less than one million per second. Hence my saying that the throughput of data imports is usually not the bottleneck. I believe the real bottleneck lies in un-optimized and inefficient algorithms in the code that consumes the data.
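One way to check this on your own machine is to time the import step and the consuming step separately. The following is only a sketch in Python: the one-float64-per-point layout, the toy moving-average consumer and all function names are assumptions for illustration, and the rates it prints depend entirely on the hardware and the consumer you plug in.

```python
import struct
import time

POINT = struct.Struct("<d")  # hypothetical layout: one float64 per data point

def make_blob(n):
    """Build n packed float64 values in memory (stands in for a binary file)."""
    return struct.pack(f"<{n}d", *(float(i) for i in range(n)))

def deserialize(blob):
    """Decode every float64 in the blob into a Python list."""
    return [value for (value,) in POINT.iter_unpack(blob)]

def consume(values, window=20):
    """Toy consumer: simple moving average kept with a running sum."""
    out = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]  # drop the value that left the window
        if i >= window - 1:
            out.append(total / window)
    return out

def rate(fn, arg, n):
    """Run fn(arg) once and return (result, data points per second)."""
    start = time.perf_counter()
    result = fn(arg)
    elapsed = time.perf_counter() - start
    return result, (n / elapsed if elapsed > 0 else float("inf"))

n = 200_000
blob = make_blob(n)
values, import_rate = rate(deserialize, blob, n)
_, consume_rate = rate(consume, values, n)
print(f"import:  {import_rate:,.0f} points/sec")
print(f"consume: {consume_rate:,.0f} points/sec")
```

Swap consume for the real strategy code to see which side is the slower one in practice.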
Your points re server, desktop, workstations, and choice of hardware today do not make much of a difference to what I said above. Most commodity hardware suffices to load data faster than it can be consumed. Regardless of language.
P.S.: I agree with all the points you made for the case where you iterate an empty consumer over your imported data structures. For bragging rights, all of what you said above is most certainly valid. But that is not reflective of real-world use cases, where you load and import data for the purpose of streaming it to a consumer that uses the data as the source for its algorithms.
Re your raw data import vs db import I do not see your point. A db either accesses files on disk or accesses memory. This determines the isolated process of data loading. But you are not done after loading a bunch of binary data, you need to de-serialize them also, you need to feed the target data structures into algorithms which are much more computationally intensive than the time it takes for most any columnar database, in memory or file based, or raw data loads.