BeautifulSoup and lxml.html - what to prefer?

python beautifulsoup lxml

user225312 · Feb 11, 2011 · Viewed 38.6k times

I am working on a project that will involve parsing HTML.

After searching around, I found two likely options: BeautifulSoup and lxml.html.

Is there any reason to prefer one over the other? I used lxml for XML some time back and I feel I will be more comfortable with it; however, BeautifulSoup seems to be much more common.

I know I should use the one that works for me, but I was looking for personal experiences with both.

Answer

The simple answer, imo, is that if you trust your source to be well-formed, go with the lxml solution. Otherwise, BeautifulSoup all the way.

Edit:

This answer is three years old now. It's worth noting, as Jonathan Vanasco does in the comments, that BeautifulSoup4 now supports lxml as its internal parser, so you can use BeautifulSoup's advanced features and interface without most of the performance hit, if you wish. (I still reach straight for lxml myself, though; perhaps it's just force of habit :))
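To make the two routes concrete, here is a minimal sketch that pulls link targets out of the same markup, once through BeautifulSoup4 with lxml as the backend parser and once through lxml.html directly. It assumes both the beautifulsoup4 and lxml packages are installed; the sample markup and the function names (links_with_bs4, links_with_lxml) are made up for illustration and are not from the original answer.

```python
from bs4 import BeautifulSoup
import lxml.html

# Hypothetical sample markup, used only for this illustration.
HTML = (
    "<html><body>"
    "<p class='intro'>Hello <b>world</b></p>"
    "<a href='/docs'>Docs</a>"
    "</body></html>"
)


def links_with_bs4(markup):
    """Collect href values using the BeautifulSoup interface."""
    # Passing "lxml" as the second argument asks BeautifulSoup4 to use
    # lxml's HTML parser as its backend.
    soup = BeautifulSoup(markup, "lxml")
    return [a["href"] for a in soup.find_all("a", href=True)]


def links_with_lxml(markup):
    """Collect href values using lxml.html and XPath directly."""
    tree = lxml.html.fromstring(markup)
    # The XPath result is a list of string-like objects; convert to plain str.
    return [str(href) for href in tree.xpath("//a/@href")]


if __name__ == "__main__":
    print(links_with_bs4(HTML))
    print(links_with_lxml(HTML))
```

Both functions should give the same result on this input, so the choice here mostly comes down to which interface you prefer: BeautifulSoup's find_all style or lxml's XPath.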
