How to read website content in C#?

Azeem Akram · May 14, 2012 · Viewed 11.7k times

I want to read a website's text without the HTML tags and headers. I just need the text that is displayed in the web browser.

I don't need output like this:

<html>
<body>
bla bla </td><td>
bla bla 
<body>
<html>

I just need the text "bla bla bla bla".

I have used WebClient and HttpWebRequest to get the HTML content and then tried to split the received data, but that approach does not hold up: if I change the website, the tags may change.

So is there any way to get only the displayed text of a website programmatically?

Answer

Tigran · May 14, 2012

You need to use a dedicated HTML parser. That is the only reliable way to extract content from a non-regular language such as HTML; splitting the string by hand breaks as soon as the markup changes.

See: What is the best way to parse html in C#?
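
As a rough illustration of the parser approach, here is a minimal sketch that assumes the third-party HtmlAgilityPack NuGet package is installed (the answer above does not name a specific parser, so that choice is mine). The class name VisibleText and the method FromHtml are made up for this example. Note that InnerText only approximates what a browser displays: it does not evaluate CSS or JavaScript, so hidden elements may still show up.

```csharp
// Assumption: the HtmlAgilityPack NuGet package is referenced by the project.
using System;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

// Hypothetical helper, named for this example only.
public static class VisibleText
{
    // Returns the text content of an HTML string without tags, scripts, or styles.
    public static string FromHtml(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Script and style contents are never displayed, so remove them first.
        // SelectNodes returns null when nothing matches.
        var hidden = doc.DocumentNode.SelectNodes("//script|//style");
        if (hidden != null)
        {
            foreach (var node in hidden)
            {
                node.Remove();
            }
        }

        // InnerText concatenates the remaining text nodes;
        // DeEntitize decodes entities such as &amp;.
        string text = HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);

        // Collapse the runs of whitespace left behind by the markup.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        string html = "<html><body><script>var x = 1;</script>"
                    + "<div>bla bla</div> <div>bla bla</div></body></html>";
        Console.WriteLine(FromHtml(html));
    }
}
```

For a live page, the html string can come from the same WebClient or HttpWebRequest call the question already uses; only the tag-splitting step is replaced by the parser, so the code no longer depends on which tags a particular site happens to use.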