Top "html-content-extraction" questions

Techniques for detecting the article text in an HTML document and extracting it from the surrounding markup.

What is the best way to parse html in C#?

I'm looking for a library/method to parse an html file with more html specific features than generic xml parsing …

c# .net html parsing html-content-extraction
Extracting text from HTML file using Python

I'd like to extract the text from an HTML file using Python. I want essentially the same output I would …

python html text html-content-extraction
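
For the text-extraction question above, here is a minimal sketch using only Python's standard-library `html.parser`. The names `TextExtractor` and `html_to_text` are hypothetical, and skipping `<script>` and `<style>` content is an assumption about what "the text" should include; this is not the approach from the original thread.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, ignoring the contents of script and style tags."""

    SKIPPED = {"script", "style"}  # assumption: these never count as text

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


print(html_to_text("<body><p>Hello <b>world</b></p><script>var x = 1;</script></body>"))
```

Whitespace handling here is deliberately crude (every text node is stripped and joined with a single space), so the output will not match what a browser would render for layout-sensitive pages.
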
How to extract img src, title and alt from html using php?

I would like to create a page where all images which reside on my website are listed with title and …

php html regex html-parsing html-content-extraction
Extract part of a regex match

I want a regular expression to extract the title from an HTML page. Currently I have this: title = re.search(…

python html regex html-content-extraction
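
As an illustration of extracting part of a match, the sketch below wraps the wanted portion in a capture group and reads it back with `group(1)`. The pattern and the name `extract_title` are assumptions made for the example, not the asker's elided expression, and a regex like this only suits simple, well-formed pages.

```python
import re


def extract_title(html):
    # The parentheses form capture group 1: only the text between the tags.
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    # re.search returns None when nothing matches, so guard before .group().
    return match.group(1).strip() if match else None


print(extract_title("<html><head><title> Example Page </title></head></html>"))
```

`match.group(0)` would return the whole match including the tags, while `match.group(1)` returns just the captured title text.
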
Options for HTML scraping?

I'm thinking of trying Beautiful Soup, a Python package for HTML scraping. Are there any other HTML scraping packages I …

html web-scraping html-parsing html-content-extraction
BeautifulSoup Grab Visible Webpage Text

Basically, I want to use BeautifulSoup to grab strictly the visible text on a webpage. For instance, this webpage is …

python text beautifulsoup html-content-extraction
Using BeautifulSoup to find an HTML tag that contains certain text

I'm trying to get the elements in an HTML doc that contain the following pattern of text: #\S{11} <h2&…

python regex beautifulsoup html-content-extraction
How do you parse HTML in VB.NET?

I would like to know if there is a simple way to parse HTML in VB.NET. I know that …

.net html vb.net parsing html-content-extraction
Parsing HTML on the iPhone

Can anyone recommend a C or Objective-C library for HTML parsing? It needs to handle messy HTML code that won't …

iphone html parsing html-content-extraction
Regular expression to extract text from HTML

I would like to extract all the text (displayed or not) from a general HTML page. I would like to …

html regex html-content-extraction text-extraction