I would like to extract all the text (displayed or not) from a general HTML page.
I would like to remove all HTML tags, JavaScript, and CSS.
Is there a regular expression (one or more) that will achieve that?
Remove JavaScript and CSS:
<(script|style).*?</\1>
Remove tags:
<.*?>
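
To make the question concrete, here is a minimal sketch of how the two patterns might be applied in sequence. Python is an assumption here (the question does not name a language), and the function name `strip_html` and the sample input are hypothetical. The `DOTALL` flag is added because `.` does not match newlines by default, so a multi-line script or style block would otherwise be missed; `IGNORECASE` is added so that `<SCRIPT>` is handled as well.

```python
import re

# Pattern 1: a <script> or <style> element together with its contents.
# \1 refers back to whichever tag name the opening tag matched.
SCRIPT_STYLE = re.compile(r"<(script|style).*?</\1>", re.DOTALL | re.IGNORECASE)

# Pattern 2: any remaining tag, matched lazily up to the first '>'.
TAG = re.compile(r"<.*?>", re.DOTALL)


def strip_html(html):
    """Remove script/style blocks first, then every remaining tag."""
    without_code = SCRIPT_STYLE.sub("", html)
    return TAG.sub("", without_code)


sample = (
    "<html><head><style>p { color: red; }</style>"
    "<script>var x = 1;</script></head>"
    "<body><p>Hello <b>world</b></p></body></html>"
)
print(strip_html(sample))  # prints: Hello world
```

The order matters: if the tags were removed first, the contents of the script and style blocks would be left behind as plain text. This sketch will still be fooled by input such as a `>` inside an attribute value or a `</script>` inside a string literal, which is where an HTML parser would likely be more reliable than regular expressions.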