How to retrieve HTML content from WebView (as a string)

JohnK · Mar 10, 2011 · Viewed 97.2k times

How do I retrieve all HTML content currently displayed in a WebView?

I found WebView.loadData(), but I couldn't find its opposite (e.g. WebView.getData()).

Please note that I am interested in retrieving that data for web pages that I have no control over (i.e. I cannot inject a JavaScript function into those pages so that it would call a JavaScript interface in WebView).

Answer

shridutt kothari · Feb 6, 2013

You can achieve this through:

final Context myApp = this;

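/* An instance of this class will be registered as a JavaScript interface */
class MyJavaScriptInterface
{
    /* Apps targeting API 17 or higher must annotate exposed methods with
       android.webkit.JavascriptInterface, or the page cannot call them. */
    @JavascriptInterface
    @SuppressWarnings("unused")
    public void processHTML(String html)
    {
        // process the html as needed by the app
    }
}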

final WebView browser = (WebView)findViewById(R.id.browser);
/* JavaScript must be enabled if you want it to work, obviously */
browser.getSettings().setJavaScriptEnabled(true);

/* Register a new JavaScript interface called HTMLOUT */
browser.addJavascriptInterface(new MyJavaScriptInterface(), "HTMLOUT");

/* WebViewClient must be set BEFORE calling loadUrl! */
browser.setWebViewClient(new WebViewClient() {
    @Override
    public void onPageFinished(WebView view, String url)
    {
        /* This call injects JavaScript into the page which just finished loading. */
        browser.loadUrl("javascript:window.HTMLOUT.processHTML('<html>'+document.getElementsByTagName('html')[0].innerHTML+'</html>');");
    }
});

/* load a web page */
browser.loadUrl("http://lexandera.com/files/jsexamples/gethtml.html");

You will get the whole HTML content in the processHTML method. This does not make another request for the web page, so it is also a more efficient way of doing this.
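
One thing to keep in mind when processing the HTML: the WebView documentation notes that JavaScript interface methods are called on a private background thread of the WebView, not on the UI thread, so the string should be handed over in a thread-safe way. Here is a minimal sketch in plain Java of how the receiving class might do that. The class name HtmlCapture and the latestHtml() accessor are hypothetical, made up for illustration; on Android, processHTML would also carry the @JavascriptInterface annotation.

```java
import java.util.concurrent.atomic.AtomicReference;

/* Hypothetical variant of MyJavaScriptInterface (names are illustrative only).
   It stores the most recent HTML so another thread can read it safely. */
final class HtmlCapture {
    /* AtomicReference gives safe publication between the WebView's
       background thread (writer) and the app's own threads (readers). */
    private final AtomicReference<String> latest = new AtomicReference<>();

    /* Called from the injected JavaScript; on Android this method
       would additionally be annotated with @JavascriptInterface. */
    public void processHTML(String html) {
        latest.set(html);
    }

    /* Returns the last HTML received, or null if no page has reported yet. */
    public String latestHtml() {
        return latest.get();
    }
}
```

An instance of this class could be registered in place of MyJavaScriptInterface in the code above, and the rest of the app would read latestHtml() once onPageFinished has fired. If the HTML is needed on the UI thread right away, posting it to the UI thread from processHTML is another option.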

Thanks.