One block on the page is filled with content by JavaScript, and after loading the page with Jsoup none of that information is present. Is there a way to also get JavaScript-generated content when parsing a page with Jsoup?
I can't paste the page source here because it is too long: http://pastebin.com/qw4Rfqgw
Here is the element whose content I need: <div id='tags_list'></div>
I need to get this information in Java, preferably using Jsoup. The element is filled in by JavaScript:
<div id="tags_list">
<a href="/tagsc0t20099.html" style="font-size:14;">разведчик</a>
<a href="/tagsc0t1879.html" style="font-size:14;">Sr</a>
<a href="/tagsc0t3140.html" style="font-size:14;">стратегический</a>
</div>
Java code:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;

public class Test
{
    public static void main( String[] args )
    {
        try
        {
            // Fetch and parse the HTML exactly as the server sends it
            Document doc = Jsoup.connect( "http://www.bestreferat.ru/referat-32558.html" ).get();
            // Select the tag links inside the JavaScript-filled block
            Elements tags = doc.select( "#tags_list a" );
            for ( Element tag : tags )
            {
                System.out.println( tag.text() );
            }
        }
        catch ( IOException e )
        {
            e.printStackTrace();
        }
    }
}
Jsoup is an HTML parser, not an embedded browser engine. It does not execute scripts, so it is completely unaware of any content that JavaScript adds to the DOM after the initial page load.
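To illustrate the point, here is a minimal sketch that parses a small hypothetical page (the HTML string and the class name StaticParseDemo are made up for this example, not taken from the real site). The script's text is kept as data but never run, so the selector from the question finds nothing:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class StaticParseDemo {
    // Hypothetical page: in a real browser the script would fill #tags_list.
    static final String HTML =
        "<html><body>"
        + "<div id='tags_list'></div>"
        + "<script>document.getElementById('tags_list').innerHTML ="
        + " '<a href=\"/tagsc0t1879.html\">Sr</a>';</script>"
        + "</body></html>";

    // Counts the links Jsoup finds inside #tags_list in the given HTML.
    static int countTagLinks(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("#tags_list a").size();
    }

    public static void main(String[] args) {
        // Jsoup does not execute the script, so the div stays empty.
        System.out.println(countTagLinks(HTML)); // prints 0
    }
}
```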
To get access to that kind of content you will need an embedded browser component. There are a number of discussions on SO about such components, e.g. "Is there a way to embed a browser in Java?"
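As one possible sketch of that approach, the example below uses HtmlUnit, a headless browser library for Java, to load the page and run its scripts, then hands the resulting markup to Jsoup. HtmlUnit is my choice for illustration and is not the only option. The package names assume HtmlUnit 3.x or later (older 2.x releases use com.gargoylesoftware.htmlunit), the class name RenderedTags and the 5-second wait are arbitrary, and I have not verified that HtmlUnit's JavaScript engine copes with this particular page.

```java
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class RenderedTags {
    // Loads the URL in HtmlUnit, gives background scripts some time to run,
    // and returns the resulting DOM serialized as markup.
    static String renderedHtml(String url, long waitMillis) throws IOException {
        try (WebClient client = new WebClient()) {
            client.getOptions().setCssEnabled(false);
            // Assumption: keep going even if some script on the page fails.
            client.getOptions().setThrowExceptionOnScriptError(false);
            HtmlPage page = client.getPage(url);
            client.waitForBackgroundJavaScript(waitMillis);
            return page.asXml();
        }
    }

    // Extracts the link texts inside #tags_list with Jsoup, as in the question.
    static List<String> tagTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element tag : doc.select("#tags_list a")) {
            texts.add(tag.text());
        }
        return texts;
    }

    public static void main(String[] args) throws IOException {
        String html = renderedHtml("http://www.bestreferat.ru/referat-32558.html", 5000);
        for (String text : tagTexts(html)) {
            System.out.println(text);
        }
    }
}
```

Keeping the Jsoup part in its own method means the selection logic from the question stays unchanged; only the source of the HTML differs.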