How to extract texts between <p> tags

Question 1

How to extract texts between <p> tags

java html parsing jsoup

rena-c · May 23, 2013 · Viewed 29.1k times

Answer

This should do the job; it selects every p element in the document:

Elements e = doc.select("p");

The jsoup selector syntax documentation lists all the selectors you can use.
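Since the question asks about both p and li tags, a comma-separated selector group can match the two in a single call. The following is only a minimal sketch, assuming jsoup is on the classpath; the class name TagText and the method extractTexts are illustrative names, not part of jsoup.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TagText {
    // Hypothetical helper: returns the text of every <p> and <li> element.
    static List<String> extractTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        // "p, li" is a selector group: it matches elements that are either p or li.
        for (Element el : doc.select("p, li")) {
            texts.add(el.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        String html = "<p>first paragraph</p><ul><li>item one</li><li>item two</li></ul>";
        for (String text : extractTexts(html)) {
            System.out.println(text);
        }
    }
}
```

Note that if a p is nested inside an li, both elements match, so the same text would appear twice in the list.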

Suppose you have this HTML:

String html = "<p>some <strong>bold</strong> text</p>";

To get "some bold text" as the result, use:

Document doc = Jsoup.parse(html);
Element p = doc.select("p").first();
String text = doc.body().text(); // "some bold text" (text of the whole body)

or

String text = p.text(); // "some bold text" (text of the first p only)

Now suppose you have the following, more complex HTML:

String html = "<div id=someid><p>some text</p><span>some other text</span><p> another p tag</p></div>";

To get the text of the two p tags, you can do something like this:

Document doc = Jsoup.parse(html);
Element content = doc.getElementById("someid");
Elements p = content.getElementsByTag("p");

String pConcatenated = "";
for (Element x : p) {
  pConcatenated += x.text();
}

// prints "some textanother p tag": text() trims each paragraph and no separator is added
System.out.println(pConcatenated);
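Because text() returns trimmed text, concatenating with += runs the last word of one paragraph into the first word of the next. One possible alternative, sketched below, joins the paragraph texts with a space. It assumes a jsoup release that provides Elements.eachText() (older releases may not have it); ParagraphJoiner and joinParagraphs are made-up names for illustration.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ParagraphJoiner {
    // Hypothetical helper: joins the text of all <p> tags inside the element with the given id.
    static String joinParagraphs(String html, String containerId) {
        Document doc = Jsoup.parse(html);
        Element content = doc.getElementById(containerId);
        if (content == null) {
            return ""; // no element with that id
        }
        // eachText() returns one string per matched element that has text.
        List<String> texts = content.getElementsByTag("p").eachText();
        return String.join(" ", texts);
    }

    public static void main(String[] args) {
        String html = "<div id=someid><p>some text</p><span>some other text</span>"
                + "<p> another p tag</p></div>";
        System.out.println(joinParagraphs(html, "someid")); // some text another p tag
    }
}
```

Keeping the list returned by eachText() instead of joining it may be more convenient if each paragraph is going to be tokenized separately.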

You can find more info here also

Hope this helped

Question 2

I want to extract the text that is placed in p and li tags on HTML pages, so that I can tokenize each page and build an inverted index for it in order to answer search queries.

How can I get the p tags using jsoup?

Elements e = doc.select("");

What string should be passed as that parameter?
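Once the text of the p and li elements has been extracted, the tokenizing and indexing step described in the question could look roughly like the sketch below. It uses only the Java standard library and makes several assumptions that are not in the question: tokens are lowercased runs of letters and digits, a single index maps each term to page ids (rather than one index per page), and the names InvertedIndexSketch, addPage, and lookup are invented for illustration.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class InvertedIndexSketch {
    // term -> ids of the pages whose extracted text contains that term
    private final Map<String, Set<String>> index = new TreeMap<>();

    // Tokenizes the already-extracted page text and records each term for the page.
    void addPage(String pageId, String text) {
        // Lowercase, then split on runs of characters that are neither letters nor digits.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token when the text starts with a delimiter
            }
            index.computeIfAbsent(token, k -> new TreeSet<>()).add(pageId);
        }
    }

    // Returns the ids of the pages containing the term, or an empty set.
    Set<String> lookup(String term) {
        return index.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch idx = new InvertedIndexSketch();
        idx.addPage("page1", "some bold text");
        idx.addPage("page2", "another p tag, some text");
        System.out.println(idx.lookup("some")); // [page1, page2]
        System.out.println(idx.lookup("bold")); // [page1]
    }
}
```

A real search index would usually also need stop-word handling, stemming, and term positions or frequencies; those are left out here.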
