Top "Html-parsing" questions

HTML parsing is the process of consuming a serialization of an HTML document and producing a representation that you can work with programmatically, for example to extract data from it.
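
As a rough illustration of that definition, here is a minimal sketch that uses Python's standard-library `html.parser` module to consume an HTML string and collect the `href` values of its `<a>` tags. The names `LinkCollector` and `extract_links` and the sample markup are hypothetical, chosen only for this example. The sketch gathers values through event callbacks and does not build a full DOM tree.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: records the href of every <a> start tag seen."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Feed the serialized HTML to the parser and return the collected hrefs."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


# Sample markup invented for this sketch
sample = '<p><a href="https://example.com/a">A</a> and <a href="https://example.com/b">B</a></p>'
print(extract_links(sample))
```

The event-based parser is tolerant of unclosed tags, which is one reason it is often suggested for quick extraction tasks. For heavily malformed documents a dedicated HTML5 parser may behave closer to what browsers do.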

ExpatError: junk after document element

I really don't know what the problem is. I get the following error: File "C:\Python27\lib\xml\dom\expatbuilder.…

python html-parsing minidom
How to Minify HTML code?

My idea is to somehow minify HTML code on the server side, so the client receives fewer bytes. What do I mean by "…

html html-parsing minify htmlpurifier min
How to parse malformed HTML in python, using standard libraries

There are so many HTML and XML libraries built into Python that it's hard to believe there's no support for …

python html dom parsing html-parsing
How does a parser (for example, HTML) work?

For argument's sake, let's assume an HTML parser. I've read that it tokenizes everything first, and then parses it. What …

html browser parsing html-parsing tokenize
get contents of <a> tags using python

Assuming I have HTML read into my program like this: <p><a href="http://vancouver.en.craigslist.…

python html-parsing sgml
DOMDocument remove script tags from HTML source

I used @Alex's approach here to remove script tags from an HTML document using the built-in DOMDocument. The problem …

php html-parsing xss domdocument script-tag
Writing an HTML Parser

I am currently attempting (or planning to attempt) to write a program, as simple as possible, to parse an HTML document …

html parsing html-parsing
Parsing the html meta tag with jsoup library

Just started exploring the Jsoup library, as I will use it for one of my projects. I tried googling but …

java html html-parsing jsoup
Jsoup set accept-header request doesn't work

I'm trying to parse data from tempobet.com in English format. The thing is, when I use Google REST client …

java html-parsing jsoup request-headers
How can I use iText to convert HTML with images and hyperlinks to PDF?

I'm trying to convert HTML to PDF using iTextSharp in an ASP.NET web application that uses both MVC and …

pdf itext html-parsing html-agility-pack xmlworker