Parsing HTML in Python - lxml or BeautifulSoup? Which of these is better for what kinds of purposes?

Monika Sulik · Dec 17, 2009 · Viewed 35.6k times

From what I can make out, the two main HTML parsing libraries in Python are lxml and BeautifulSoup. I've chosen BeautifulSoup for a project I'm working on, but for no particular reason other than finding its syntax a bit easier to learn and understand. However, a lot of people seem to favour lxml, and I've heard that lxml is faster.

So what are the advantages of one over the other? When would I want to use lxml, and when would I be better off using BeautifulSoup? Are there any other libraries worth considering?

Answer

mikeal · Dec 17, 2009

Pyquery provides the jQuery selector interface to Python (using lxml under the hood).

http://pypi.python.org/pypi/pyquery

It's really awesome; I don't use anything else anymore.