Top "Html-parsing" questions

HTML parsing is the process of consuming a serialization of an HTML document and producing a representation that you can work with programmatically — e.g., in order to extract data from it.

Get all elements by class name using DOMDocument

This question seems to have been answered numerous times, but I still can't seem to put the pieces together. I …

php html-parsing domdocument
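With DOMDocument, a class is often matched through a DOMXPath query such as //*[contains(concat(' ', normalize-space(@class), ' '), ' item ')]. Padding with spaces makes the query compare whole class tokens, not substrings. The sketch below shows the same token-matching idea, but it swaps in Python's standard-library html.parser in place of PHP's DOMDocument. The helper names ClassCollector and elements_by_class and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Record (tag, attrs) for every start tag whose class list contains a target token."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # class is a whitespace-separated token list, so compare whole tokens,
        # not substrings ("item" must not match "item-note").
        # Valueless attributes arrive as None, hence the `or ""`.
        classes = (attributes.get("class") or "").split()
        if self.class_name in classes:
            self.matches.append((tag, attributes))


def elements_by_class(html, class_name):
    parser = ClassCollector(class_name)
    parser.feed(html)
    parser.close()
    return parser.matches


doc = '<div class="item big"><p class="item-note">x</p><span class="item">y</span></div>'
print([tag for tag, _ in elements_by_class(doc, "item")])  # ['div', 'span']
```

Self-closing tags such as <br class="a"/> are also reported, because HTMLParser's default handle_startendtag calls handle_starttag.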
Can you provide examples of parsing HTML?

How do you parse HTML with a variety of languages and parsing libraries? When answering: Individual comments will be linked …

html language-agnostic html-parsing
jQuery-like HTML parsing in Python?

Is there any Python library that allows me to parse an HTML document similar to what jQuery does? i.e. …

python jquery css-selectors html-parsing
Is there a built-in HTML validator in any major browser?

In Firefox, there's an extension called “Html Validator”. It adds a little indicator icon at the bottom right corner of …

html html-parsing
How to parse an HTML string in Google Apps Script without using XmlService?

I want to create a scraper using Google Spreadsheets with Google Apps Script. I know it is possible and I …

javascript parsing google-apps-script google-sheets html-parsing
XPath: find nodes that do not contain a child

I'm trying to create some xpath that will find all a tags that do not contain img tags, so that …

xpath html-parsing xml-parsing
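In XPath 1.0, //a[not(.//img)] selects a elements with no img anywhere beneath them. Use //a[not(img)] if only direct children should count. Python's xml.etree.ElementTree implements only a limited XPath subset without not(), so this sketch applies the same predicate as a Python filter. It assumes well-formed, XHTML-like input, because ElementTree is an XML parser. The function name links_without_images and the sample page are hypothetical.

```python
import xml.etree.ElementTree as ET


def links_without_images(markup):
    """Return every <a> element that has no <img> anywhere beneath it.

    Same intent as the XPath 1.0 expression //a[not(.//img)].
    Assumes well-formed (XHTML-like) markup, since ElementTree parses XML.
    """
    root = ET.fromstring(markup)
    # root.iter("a") walks the whole tree; a.find(".//img") is None
    # when the <a> has no <img> descendant at any depth.
    return [a for a in root.iter("a") if a.find(".//img") is None]


page = """<div>
  <a href="/one">text only</a>
  <a href="/two"><img src="logo.png"/></a>
  <a href="/three"><span><img src="x.png"/></span></a>
</div>"""
print([a.get("href") for a in links_without_images(page)])  # ['/one']
```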
Difference between "findAll" and "find_all" in BeautifulSoup

I would like to parse an HTML file with Python, and the module I am using is BeautifulSoup. It is …

python xml-parsing html-parsing beautifulsoup
PHP DOM: parsing an HTML list into an array?

I have the below HTML string, and I would like to turn it into an array. $string = ' <a …

php dom php-5.3 html-parsing
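The question's HTML string is truncated above. This sketch therefore assumes a hypothetical list of links and turns each a element into a small record with its href and its text. It uses Python's html.parser in place of PHP's DOM extension. The names LinkListParser and list_to_array are illustrative only.

```python
from html.parser import HTMLParser


class LinkListParser(HTMLParser):
    """Turn each <a> in the markup into a {"href": ..., "text": ...} dict."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._current = None  # record for the <a> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current = {"href": dict(attrs).get("href"), "text": ""}

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.items.append(self._current)
            self._current = None


def list_to_array(markup):
    parser = LinkListParser()
    parser.feed(markup)
    parser.close()
    return parser.items


# Hypothetical input: the question's actual HTML is elided.
html = '<ul><li><a href="/a">Alpha</a></li><li><a href="/b"> Beta </a></li></ul>'
print(list_to_array(html))
# [{'href': '/a', 'text': 'Alpha'}, {'href': '/b', 'text': 'Beta'}]
```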
Beautiful Soup: getting tag.id

I'm attempting to get a list of div ids from a page. When I print out the attributes, I get …

python html beautifulsoup html-parsing
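In Beautiful Soup 4, attribute values are read with tag.get("id") or tag["id"]. By contrast, tag.id is shorthand for tag.find("id"), a search for a child element named id, which normally returns None. For comparison, here is a minimal standard-library sketch that collects div ids with html.parser rather than Beautiful Soup. The names DivIdCollector and div_ids are hypothetical.

```python
from html.parser import HTMLParser


class DivIdCollector(HTMLParser):
    """Collect the id attribute of every <div> that has one."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            div_id = dict(attrs).get("id")
            if div_id:  # skip divs with no id or an empty id
                self.ids.append(div_id)


def div_ids(markup):
    collector = DivIdCollector()
    collector.feed(markup)
    collector.close()
    return collector.ids


print(div_ids('<div id="main"><div class="x"></div><div id="side"></div></div>'))
# ['main', 'side']
```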
How to extract a JSON object that was defined in an HTML page's JavaScript block using Python?

I am downloading HTML pages that have data defined in them in the following way: ... <script type= "text/javascript">…

python html-parsing beautifulsoup headless-browser
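Here is one possible approach, sketched under the assumption that the page assigns a JSON-compatible literal (double-quoted keys, no trailing commas) to a named variable. The variable name data and the sample page are hypothetical, because the question's markup is truncated. A regex locates the assignment. json.JSONDecoder().raw_decode then parses exactly one value from that position and leaves nested braces and braces inside strings to the JSON parser. Object literals that are not valid JSON would need a JavaScript-aware parser or a headless browser instead.

```python
import json
import re

# Hypothetical page: the "var data = ..." form is an assumption.
PAGE = """<html><body>
<script type="text/javascript">
  var data = {"name": "widget", "sizes": [1, 2, 3], "meta": {"note": "a } in a string"}};
  console.log(data);
</script>
</body></html>"""


def extract_js_object(page, var_name):
    """Return the value assigned to var_name, parsed as JSON.

    Assumes the literal is valid JSON (double-quoted keys, no trailing commas).
    """
    pattern = r"\b(?:var|let|const)\s+" + re.escape(var_name) + r"\s*=\s*"
    match = re.search(pattern, page)
    if match is None:
        raise ValueError(f"no assignment to {var_name!r} found")
    # raw_decode parses one JSON value starting at the given index and
    # reports where it stopped, so the trailing ";" and script are ignored.
    value, _end = json.JSONDecoder().raw_decode(page, match.end())
    return value


data = extract_js_object(PAGE, "data")
print(data["sizes"], data["meta"]["note"])
```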