From what I can make out, the two main HTML parsing libraries in Python are lxml and BeautifulSoup. I've chosen BeautifulSoup for a project I'm working on, but I chose it for no particular reason other than finding the syntax a bit easier to learn and understand. But I see a lot of people seem to favour lxml and I've heard that lxml is faster.
So I'm wondering what are the advantages of one over the other? When would I want to use lxml and when would I be better off using BeautifulSoup? Are there any other libraries worth considering?
Pyquery
provides the jQuery selector interface to Python (using lxml under the hood).
http://pypi.python.org/pypi/pyquery
It's really awesome, I don't use anything else anymore.