What's my best bet for parsing HTML if I can't use BeautifulSoup or lxml? I've got some code that uses SGMLlib but it's a bit low-level and it's now deprecated.
I would prefer if it could stomache a bit of malformed HTML although I'm pretty sure most of the input will be pretty clean.
Python has a native HTML parser, however the Tidy wrapper Nick suggested would probably be a solid choice as well. Tidy is a very common library, (written in C is it?)