Quote from propseeker:
but, i think your complaint that c++ would take forever to write a parser and that python is oh so much faster, is, well... a bit hyperbolic. c++ can be as simple or as complex as you make it. using lex/yacc is an apples to oranges comparison to the few lines of python code. apples to apples would be using fstream and an istringstream object, same number of lines of code, and most likely better efficiency. python is great, but the only time i think it's ever applicable in a competitive trading firm is in one-off code or, when seeing a lot of reuse, where speed is not and will never be an issue.
Adapting the code to a new format is tedious if I have to rewrite the parser using istringstream. Log files can and do change in organizations. I'd much rather change the regex and the group ordering than go adjusting my reads out of an istringstream.
Also, my beef with istringstream and primitive parsers built on things like strtok() is that you don't get the added benefit of regex groups. For example, you can nest groups and then reference portions of a group. But it's not just regex groups -- there's also dealing with repeating patterns, special character classes, etc. A 2-character change to a regex makes more sense than either coding a loop structure or changing a large static expression.
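To make the regex point concrete, here's a rough sketch. The log formats, field names, and the parse() helper are all made up for illustration, not taken from any real system. The timestamp group is nested (date and time inside ts), and when the format changes, only the pattern changes; the code that consumes the fields stays the same.

```python
import re

# Hypothetical log lines, invented for illustration only.
OLD_LINE = "2009-03-02 14:31:07 FILL sym=MSFT qty=100 px=17.25"
NEW_LINE = "FILL|MSFT|17.25|100|2009-03-02 14:31:07"

# Nested named groups: "ts" wraps "date" and "time".
OLD_PATTERN = re.compile(
    r"(?P<ts>(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2})) "
    r"(?P<event>[A-Z]+) sym=(?P<sym>\w+) qty=(?P<qty>\d+) px=(?P<px>\d+\.\d+)"
)

# Same groups, different order and delimiter. Nothing else has to change.
NEW_PATTERN = re.compile(
    r"(?P<event>[A-Z]+)\|(?P<sym>\w+)\|(?P<px>\d+\.\d+)\|(?P<qty>\d+)\|"
    r"(?P<ts>(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}))"
)


def parse(line, pattern):
    """Return the named groups as a dict, with qty and px converted to numbers."""
    m = pattern.match(line)
    if m is None:
        raise ValueError("unrecognized line: %r" % line)
    fields = m.groupdict()
    fields["qty"] = int(fields["qty"])
    fields["px"] = float(fields["px"])
    return fields
```

Because the groups are named, parse(OLD_LINE, OLD_PATTERN) and parse(NEW_LINE, NEW_PATTERN) give back the same dict, so whatever sits downstream doesn't care that the field order moved around.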
Third, if speed were really a concern, I'd just write a C++ binding to deal with the slow part -- presumably, interpreting the data once it has been parsed.
Fourth, so you open an fstream. Say it's a CSV file -- C++, by default, doesn't come with classes to deal with quote-chars and such. You then iterate over the file with getline, and then you have to implement your own split() function. Is this really a productive use of your time? I don't think so. Boost helps in some regard, but now you've got a new makefile change on your hands.
If you wanted to use a library, you'd have to find it, link it in, then run the program. You get a debug build, run it, and if it crashes, you need to examine the core file to find the line where it flopped.
In Python, you simply import csv, iterate over the contents of the file, and the rows are returned to you pre-split into lists. If it f's up, you get a traceback with the line number and just go. There's no loading gdb, looking at the call stack, and wasting more time. And don't act like you don't make mistakes!
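For reference, a minimal sketch of what that looks like. The sample data and the read_rows() name are invented for illustration; an in-memory string stands in for a real file so the snippet runs on its own.

```python
import csv
import io

# Hypothetical CSV content with a quoted comma and an escaped quote.
SAMPLE = (
    'sym,note,qty\n'
    'MSFT,"partial fill, see desk",100\n'
    'IBM,"said ""ok""",50\n'
)


def read_rows(fileobj):
    """Return every CSV row as a list of strings; quoting is handled by csv."""
    return list(csv.reader(fileobj))


rows = read_rows(io.StringIO(SAMPLE))
for row in rows:
    print(row)
```

With a real file you'd pass open(path, newline="") instead of the StringIO object. The comma inside the quoted note stays in one field, which is exactly the part you'd otherwise be hand-rolling in your own split().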
Fifth, suppose you wanted to add a visualization on top. Are you going to talk to a COM interface with R? Are you going to code a GUI using a C++ GUI toolkit? Now you have to link a new library and make new makefile changes. That would take forever. It makes more sense to use RPy and wxWidgets' Python bindings and get the job done in half the time -- on Ubuntu, it's apt-get install whatever, then import magic. Simple as that.
Sixth, NumPy and SciPy. Enough said. With C++, you have to go hunt for or standardize internally on the computational package you want to use, then adjust your makefile to link in whatever routines you want. More makefile work.
Seventh, if I pay a guy $70 an hour for his quant abilities and he then spends his time writing split() or re.search()/match() over again in C++, I'd just get angry. I would much rather invest in him learning Python (or even Perl) so he doesn't have to go through this. If he were extremely proficient with the Boost regex classes, I -might- give him leeway on using C++.