Reading and saving the full HTML contents of a URL to a text file

user2580745 · Mar 22, 2014 · Viewed 9k times

Requirement:

To read the HTML from any website, say "http://www.twitter.com".

Print the retrieved HTML.

Save it to a text file on the local machine.

Code:

import java.net.*;

import java.io.*;

public class oddless {
    public static void main(String[] args) throws Exception {

        URL oracle = new URL("http://www.fetagracollege.org");
        BufferedReader in = new BufferedReader(new InputStreamReader(oracle.openStream()));

        OutputStream os = new FileOutputStream("/Users/Rohan/new_sourcee.txt");


        String inputLine;
        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);
        in.close();
    }
}

The code above retrieves the data, prints it to the console, and saves it to a text file, but usually it retrieves only about half of the HTML (possibly because of blank lines in the HTML source). It does not save anything beyond that point.

Questions:

How can I save the full html code?

Are there any other alternatives?

Answer

Leos Literak picture Leos Literak · Mar 22, 2014

I used a different approach, but I received the same output as you. Isn't there a problem on the server side of this URL?

// Requires Apache HttpClient 4.3+
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

CloseableHttpClient httpclient = HttpClients.createDefault();
HttpGet httpGet = new HttpGet("http://www.fetagracollege.org");
CloseableHttpResponse response1 = httpclient.execute(httpGet);
try {
    System.out.println(response1.getStatusLine());
    HttpEntity entity1 = response1.getEntity();
    String content = EntityUtils.toString(entity1);
    System.out.println(content);
} finally {
    response1.close();
}

It finishes with:

    </table>
    <p><br>

UPDATE: This Faculty of Engineering and Technology site does not have a well-formed home page. The content is complete and your code works correctly. But the commenters are right: you should use a try/catch/finally block.
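To address the original requirement directly: the question's code opens a FileOutputStream but never writes to it, which is why nothing is saved. A minimal sketch of a fixed version follows, using try-with-resources so both streams are closed even on error. The class name, helper method, and output file name are placeholders of my own; the URL is the one from the question.

```java
import java.io.*;
import java.net.URL;

public class SaveUrl {
    // Copies every line from the reader to the writer, echoing each line to stdout.
    static void copyLines(BufferedReader in, PrintWriter out) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line); // print to console
            out.println(line);        // save to file
        }
    }

    public static void main(String[] args) throws IOException {
        URL url = new URL("http://www.fetagracollege.org");
        // try-with-resources closes both streams automatically
        try (BufferedReader in = new BufferedReader(
                 new InputStreamReader(url.openStream()));
             PrintWriter out = new PrintWriter(
                 new FileWriter("new_source.txt"))) {
            copyLines(in, out);
        }
    }
}
```

Because `copyLines` takes plain Reader/Writer arguments, the copy logic can be exercised without network access, and the same method works unchanged whether the input comes from a URL or a local file.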