How to get page metadata (title, description, images) like Facebook's URL attachment, using regex in Java

Bhanu Gupta · Jul 25, 2012 · Viewed 20.2k times

How can I get a page's metadata (title, description, images), the way Facebook does when you attach a URL, using regex in Java?

Answer

user2080225 · Sep 18, 2013

Here's a snippet that fetches a web page with jsoup and builds a small chunk of HTML showing the Open Graph image, with the title wrapping to its right. It falls back to the plain HTML title when the og:title tag is missing, so it can represent any web page.

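// Requires the jsoup library on the classpath.
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public static String parsePageHeaderInfo(String urlStr) throws Exception {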

    StringBuilder sb = new StringBuilder();
    Connection con = Jsoup.connect(urlStr);

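    /* Sending a browser-like User-Agent is important: it gets servers to send us the LARGEST versions of the images.
       Constants.BROWSER_USER_AGENT is not part of jsoup; define it yourself as a real browser's User-Agent string. */
    con.userAgent(Constants.BROWSER_USER_AGENT);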
    Document doc = con.get();

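    // select() never returns null: it returns an empty Elements when nothing matches,
    // and attr() then returns "", so test for an empty string rather than null.
    String text = doc.select("meta[property=og:title]").attr("content");
    if (text.isEmpty()) {
        text = doc.title();   // fall back to the plain <title>
    }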

    // attr() returns "" when there is no og:image tag, so check for empty instead of null.
    String imageUrl = doc.select("meta[property=og:image]").attr("content");

    // Only emit the <img> tag when the page actually declares an og:image.
    if (!imageUrl.isEmpty()) {
        sb.append("<img src='");
        sb.append(imageUrl);
        sb.append("' align='left' hspace='12' vspace='12' width='150px'>");
    }

    if (text!=null) {
        sb.append(text);
    }

    return sb.toString();       
}
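
Since the question asked specifically about regex, here is a rough sketch of the same title/image lookup done with java.util.regex on an HTML string that has already been downloaded. The class name MetaRegexSketch and its helper methods are made up for illustration. It assumes attribute values are quoted and contain no '>' character; real-world HTML breaks those assumptions often enough that an HTML parser is usually the safer choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class: regex-only extraction of Open Graph values.
class MetaRegexSketch {

    // Matches one whole <meta ...> tag. Attributes are inspected separately,
    // so the order of property= and content= does not matter.
    private static final Pattern META_TAG =
            Pattern.compile("<meta\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    private static final Pattern TITLE_TAG =
            Pattern.compile("<title[^>]*>(.*?)</title>",
                    Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the quoted value of the named attribute inside a single tag, or null if absent.
    static String attr(String tag, String name) {
        Pattern p = Pattern.compile(
                "\\s" + Pattern.quote(name) + "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(tag);
        if (!m.find()) {
            return null;
        }
        return m.group(1) != null ? m.group(1) : m.group(2);
    }

    // Returns the content of the first <meta property="..."> tag with the given property, or null.
    static String metaContent(String html, String property) {
        Matcher m = META_TAG.matcher(html);
        while (m.find()) {
            String tag = m.group();
            // attr() may return null; equalsIgnoreCase(null) is simply false.
            if (property.equalsIgnoreCase(attr(tag, "property"))) {
                return attr(tag, "content");
            }
        }
        return null;
    }

    // og:title if present and non-empty, otherwise the text of <title>, otherwise null.
    static String pageTitle(String html) {
        String og = metaContent(html, "og:title");
        if (og != null && !og.isEmpty()) {
            return og;
        }
        Matcher m = TITLE_TAG.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Plain title</title>"
                + "<meta property=\"og:title\" content=\"OG title\">"
                + "<meta content='https://example.com/a.png' property='og:image'>"
                + "</head></html>";
        System.out.println(pageTitle(html));
        System.out.println(metaContent(html, "og:image"));
    }
}
```

The tag is matched first and its attributes second, so a tag that lists content before property still works. Like the jsoup version, pageTitle falls back to the plain title when og:title is missing.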