Writing a fast parser in Python

panzi · Apr 27, 2010 · Viewed 17.2k times

I've written a hand-rolled, recursive, pure-Python parser for a file format (ARFF) that we use in one lecture. Running my exercise submission is now awfully slow, and it turns out that by far the most time is spent in my parser. It consumes a lot of CPU time; the hard disk is not the bottleneck.

What performant ways are there to write a parser in Python? I'd rather not rewrite it in C. I tried Jython, but that decreased performance a lot! Some of the files I parse are huge (> 150 MB) and have very long lines.

My current parser only needs a look-ahead of one character. I'd post the source here, but I don't know if that's such a good idea; after all, the submission deadline has not passed yet. Then again, the parser is not the focus of this exercise: you can choose whatever language you want, and there already is a parser for Java.

Note: I have an x86_64 system, so psyco (and, it seems, also PyPy) is not an option.

Update: I have now uploaded my parser/writer to Bitbucket.

Answer

wvd · Apr 27, 2010

You could use ANTLR or pyparsing; they might speed up your parsing process.

And if you want to keep your current code, you might want to look at Cython or PyPy, which can increase your performance (sometimes up to 4x).
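Separately from those tools, a pure-Python parser can sometimes be sped up by handing the character-by-character work to a C-implemented standard-library module. The sketch below is not the asker's parser and not something the answer above proposes; it is only an illustration that assumes a simplified ARFF file: dense, comma-separated rows after an `@data` line, `%` comment lines, and single-quoted strings. The helper names `_data_lines` and `iter_data_rows` are made up for this example, and sparse rows and escape sequences are deliberately ignored.

```python
import csv


def _data_lines(lines):
    """Yield only the data rows: skip blanks, % comments and the header."""
    in_data = False
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("%"):
            continue  # blank line or comment
        if not in_data:
            # Everything up to and including the @data marker is header.
            in_data = stripped.lower().startswith("@data")
            continue
        yield stripped


def iter_data_rows(lines):
    """Return an iterator of rows, each a list of strings.

    The splitting itself is done by csv.reader, which is implemented in C,
    so no Python-level loop runs per character.
    """
    return csv.reader(_data_lines(lines), quotechar="'", skipinitialspace=True)


if __name__ == "__main__":
    sample = [
        "% a comment",
        "@relation iris",
        "@attribute a numeric",
        "@DATA",
        "5.1, 3.5, 'Iris setosa'",
        "",
        "4.9,3.0,'a,b'",
    ]
    for row in iter_data_rows(sample):
        print(row)
```

Because `iter_data_rows` accepts any iterable of lines, an open file object can be passed directly, so a large file is streamed rather than loaded into memory at once. Whether this is actually faster for a given file is something to measure, for example with `cProfile`.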