Resources for lexing, tokenising and parsing in Python

Hamish Downer · Aug 31, 2008

Can people point me to resources on lexing, parsing and tokenising with Python?

I'm doing a little hacking on an open source project (hotwire) and wanted to make a few changes to the code that lexes, parses and tokenises the commands entered into it. As it is real working code, it is fairly complex and a bit hard to work out.

I haven't worked on code to lex/parse/tokenise before, so I was thinking one approach would be to work through a tutorial or two on this aspect. I would hope to learn enough to navigate around the code I actually want to alter. Is there anything suitable out there? (Ideally it could be done in an afternoon without having to buy and read the Dragon Book first ...)

Edit: (7 Oct 2008) None of the answers below quite gives what I want. With them I could generate a parser, but I want to learn how to write my own basic parser from scratch, not using lex and yacc or similar tools. Having done that, I can then understand the existing code better.

So could someone point me to a tutorial where I can build a basic parser from scratch, using just Python? For concreteness, the sketch below is roughly the kind of hand-rolled code I have in mind.
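This is a minimal sketch of a hand-written tokeniser plus recursive-descent parser for toy "+"/"*" arithmetic. All the names here are mine, purely for illustration; this is not hotwire's code.

```python
import re

# Match either a run of digits or any single non-space character.
TOKEN_RE = re.compile(r'\s*(?:(\d+)|(.))')

def tokenise(text):
    """Yield ('NUMBER', int) and ('OP', char) tokens."""
    for number, op in TOKEN_RE.findall(text):
        if number:
            yield ('NUMBER', int(number))
        else:
            yield ('OP', op)

class Parser:
    # Grammar:  expr -> term ('+' term)*
    #           term -> NUMBER ('*' NUMBER)*
    def __init__(self, text):
        self.tokens = list(tokenise(text))
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_expr(self):
        value = self.parse_term()
        while self.peek() == ('OP', '+'):
            self.advance()
            value += self.parse_term()
        return value

    def parse_term(self):
        value = self.parse_number()
        while self.peek() == ('OP', '*'):
            self.advance()
            value *= self.parse_number()
        return value

    def parse_number(self):
        kind, value = self.advance()
        if kind != 'NUMBER':
            raise SyntaxError('expected a number, got %r' % (value,))
        return value

print(Parser('2 + 3 * 4').parse_expr())  # -> 14
```

Each grammar rule becomes one method, and precedence falls out of the call structure: parse_expr calls parse_term, so '*' binds tighter than '+'.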

Answer

Eli Bendersky · Sep 20, 2008

I'm a happy user of PLY. It is a pure-Python implementation of Lex & Yacc, with lots of small niceties that make it quite Pythonic and easy to use. Since Lex & Yacc are the most popular lexing & parsing tools and are used by the most projects, PLY has the advantage of standing on the shoulders of giants. A lot of knowledge about Lex & Yacc exists online, and you can freely apply it to PLY.

PLY also has a good documentation page with some simple examples to get you started.
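To give a flavour, here is a minimal sketch of a PLY-based calculator for the same toy "+"/"*" arithmetic. It assumes PLY is installed (pip install ply); the grammar is my own small example, so treat it as a starting point rather than canonical usage.

```python
import ply.lex as lex
import ply.yacc as yacc

# --- Lexer: token names, then one regex per token ---
tokens = ('NUMBER', 'PLUS', 'TIMES')

t_PLUS = r'\+'
t_TIMES = r'\*'
t_ignore = ' \t'  # skip whitespace

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    print("Illegal character %r" % t.value[0])
    t.lexer.skip(1)

# --- Parser: one function per grammar rule; the docstring is the rule ---
# Later entries in the precedence table bind tighter, so '*' beats '+'.
precedence = (
    ('left', 'PLUS'),
    ('left', 'TIMES'),
)

def p_expression_plus(p):
    'expression : expression PLUS expression'
    p[0] = p[1] + p[3]

def p_expression_times(p):
    'expression : expression TIMES expression'
    p[0] = p[1] * p[3]

def p_expression_number(p):
    'expression : NUMBER'
    p[0] = p[1]

def p_error(p):
    print("Syntax error at %r" % (p,))

lexer = lex.lex()
parser = yacc.yacc()

print(parser.parse("2 + 3 * 4"))  # -> 14
```

Note how little machinery you write yourself: PLY builds the tables from the docstrings, which is exactly why it is worth contrasting with a hand-rolled parser if your goal is to understand what such tools generate.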

For a listing of lots of Python parsing tools, see this.