Quite impressive. Aside from the memory issues with large files, pandas seems to be quite capable of importing text files.
It takes me 1.7s to run your code; you seem to be running on a pretty good SSD. By the way, I tested within Anaconda/Spyder on Python 3.4. This comes close to the 1.3s it takes C# to import the data (in C#, the rest of the time goes to parsing the data into strong types).
I revised the test to use the sample data the OP posted. It's now actually faster.
The code is in the attached text file. It generates the test data, saves it to a file (all outside pandas), reads it into a pandas DataFrame, and then strips out the row below the headers. It's a bit faster still if you just supply the headers and tell pandas to ignore the first 3 rows.
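The two read approaches described above can be sketched roughly as follows. The file layout (a title line, a header line, and a units row directly below the headers) and the column names are hypothetical stand-ins, not the OP's actual data, and io.StringIO replaces the real file so the snippet is self-contained.

```python
import io

import pandas as pd

# Hypothetical layout (an assumption, not the OP's format):
# line 0: title, line 1: column headers, line 2: units row below the
# headers, then the numeric data.
SAMPLE = (
    "Sensor export\n"
    "time,value\n"
    "s,mV\n"
    "0.0,1.5\n"
    "0.5,1.75\n"
    "1.0,1.25\n"
)

# Hypothetical column names, supplied by hand in the second approach.
COLUMNS = ["time", "value"]


def read_then_strip(text: str) -> pd.DataFrame:
    """Let pandas parse the header line, then drop the row below it."""
    df = pd.read_csv(io.StringIO(text), skiprows=1)  # skip the title line
    df = df.iloc[1:].reset_index(drop=True)  # drop the units row
    # The units row forced object dtype, so convert the columns to numbers.
    return df.apply(pd.to_numeric)


def read_with_names(text: str) -> pd.DataFrame:
    """Supply the headers ourselves and skip all three preamble lines."""
    # Passing names= makes pandas treat the file as having no header row.
    return pd.read_csv(io.StringIO(text), skiprows=3, names=COLUMNS)
```

The second variant lets the parser infer numeric dtypes directly instead of building object columns and converting them afterwards, which is one plausible reason it comes out slightly faster.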
So revised timings:
1 million rows: ~1.15s (edit: I originally said 1.27s, but that included writing the CSV out)
10 million rows: ~10.75s (edit: I originally said 14.5s, but that included writing the CSV out)
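Because the corrected timings exclude writing the CSV, a benchmark along these lines would start the clock only around the read. This is a minimal sketch, assuming a hypothetical three-column numeric file generated with the standard csv module (outside pandas, as in the attached test); the function names, column names, and value ranges are made up for illustration.

```python
import csv
import os
import random
import tempfile
import time

import pandas as pd


def write_test_file(path: str, n_rows: int, seed: int = 0) -> None:
    """Write synthetic numeric rows with the plain csv module (no pandas)."""
    rng = random.Random(seed)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["a", "b", "c"])  # hypothetical column names
        for i in range(n_rows):
            writer.writerow([i, rng.random(), rng.randint(0, 100)])


def time_read(path: str) -> tuple[pd.DataFrame, float]:
    """Time only pd.read_csv; the file must already exist on disk."""
    start = time.perf_counter()
    df = pd.read_csv(path)
    elapsed = time.perf_counter() - start
    return df, elapsed


def benchmark(n_rows: int) -> tuple[int, float]:
    """Generate a file (untimed), then return (rows read, read seconds)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.csv")
        write_test_file(path, n_rows)  # deliberately outside the timed region
        df, seconds = time_read(path)
        return len(df), seconds
```

Calling benchmark(1_000_000) would mirror the 1 million row case, though the absolute numbers will depend heavily on the disk and the machine.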
A surprising result to me. I'll try the Windows machine with the old SSD at some point soon-ish.