Parsing/extracting an HTML table from a website in Java

Christian Steuer · Jul 11, 2015 · Viewed 15.5k times

I want to parse the contents of this HTML table:


Here is the full website with source code:

http://www.kantschule-falkensee.de/uploads/dmiadgspahw/klassen/A_Klasse_11.htm

I want to parse the data in each cell, for example all 5 cells under "Montag" (Monday). I have tried several ways of parsing this website with Jsoup, but I haven't had any success. My main goal is to show the contents in a ListView in an Android app. For now I am just trying to print the contents to a Java console. Both languages are accepted :). Any help is appreciated.

Answer

Sachin Nambiar Nalavattanon · Jul 11, 2015

Here are the steps you would need to follow:

1) You could use any Java library for HTML scraping, for example Jsoup, as in the example at the end of this answer.

2) Use XPath Helper to build and test XPath queries against the page:

Example 1: Enter "//tr[1]//td[1]" as the query and it will return all table cells at position (1,1).

Example 2: "/html/body[@class='tt']/center/table[1]/tbody/tr[4]/td[3]/table/tbody/tr/td" will give you all 15 values under Montag.

Example 3: "/html/body[@class='tt']/center/table[1]/tbody/tr/td/table/tbody/tr/td" will give you all 380 entries of the table.
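
To try queries like these outside the browser, the JDK's built-in javax.xml.xpath package can evaluate them. The following is only a minimal sketch, not the method used in this answer: the SAMPLE table, the class name XPathTableDemo and the select helper are made up for illustration, and the markup is a small well-formed stand-in rather than the real timetable page. The JDK parser only accepts well-formed XML, so real-world HTML would usually need cleaning up first.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathTableDemo {
    // Hypothetical, well-formed stand-in for a timetable; NOT the real page.
    static final String SAMPLE =
        "<html><body><table>"
      + "<tr><td>Stunde</td><td>Montag</td><td>Dienstag</td></tr>"
      + "<tr><td>1</td><td>Mathe</td><td>Deutsch</td></tr>"
      + "<tr><td>2</td><td>Physik</td><td>Englisch</td></tr>"
      + "</table></body></html>";

    // Evaluates an XPath query against the markup and returns the
    // trimmed text content of every matched node, in document order.
    static List<String> select(String xml, String query) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                query, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + query, e);
        }
    }

    public static void main(String[] args) {
        // First cell of the first row.
        System.out.println(select(SAMPLE, "//tr[1]//td[1]"));             // prints [Stunde]
        // Second cell of every row after the header, i.e. the "Montag" column.
        System.out.println(select(SAMPLE, "//tr[position() > 1]/td[2]")); // prints [Mathe, Physik]
    }
}
```

The second query is the kind of expression that picks out a single weekday column. The paths in the examples above are longer, apparently because the real page nests further tables inside its cells.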

OR

Example using Jsoup

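import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) throws IOException {
        // Fetch the timetable page and parse it into a DOM.
        Document doc = Jsoup.connect("http://www.kantschule-falkensee.de/uploads/dmiadgspahw/klassen/A_Klasse_11.htm").get();

        // Select every table row on the page, including rows of nested tables.
        Elements rows = doc.select("tr");
        for (Element row : rows) {
            // Select every <td> descendant of this row (cells of nested tables too).
            Elements columns = row.select("td");
            for (Element column : columns) {
                // Print the cell text, tab-separated so adjacent cells stay readable.
                System.out.print(column.text() + "\t");
            }
            // End the output line for this row.
            System.out.println();
        }
    }
}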