How to extract links and titles from a .html page?

Toni Michel Caubet · Dec 12, 2010

For my website, I'd like to add a new feature.

I would like users to be able to upload their bookmarks backup file (from any browser, if possible) so I can import the bookmarks into their profile and they don't have to enter them all manually.

The only part I'm missing is extracting the title and URL of each bookmark from the uploaded file. Can anyone give me a clue where to start or what to read?

I used the search, and "How to extract data from a raw HTML file?" is the most closely related question I found, but it doesn't cover this.

I don't mind whether the solution uses jQuery or PHP.

Thank you very much.

Answer

Toni Michel Caubet · Dec 12, 2010

Thank you, everyone. I got it!

The final code:

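<?php
// Read the exported bookmarks file. Browsers such as Firefox and Chrome
// export bookmarks as an HTML file (the "Netscape bookmark" format).
$html = file_get_contents('bookmarks.html');

// Create a new DOM document
$dom = new DOMDocument;

// Parse the HTML. loadHTML() emits PHP warnings (not exceptions) for
// malformed markup, which is common in real-world HTML. Collect those
// warnings internally instead of hiding them with the @ operator.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Get all links. You could also use any other tag name here,
// like 'img' or 'table', to extract other tags.
$links = $dom->getElementsByTagName('a');

// Iterate over the extracted links and display their titles and URLs
foreach ($links as $link) {
    // The link text is the bookmark title and "href" is its URL.
    // Escape both before echoing, since the file is uploaded by a user.
    echo htmlspecialchars($link->nodeValue), ' - ',
         htmlspecialchars($link->getAttribute('href')), '<br>';
}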

This prints the anchor text (the bookmark title) and the href (the URL) of every link in the .html file.
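Since the goal is to import the bookmarks into a user's profile, it may be more convenient to collect them into an array than to echo them. The following is a minimal sketch of that variation, not the author's code: the function name `extractBookmarks` is hypothetical, it uses `DOMXPath` with the query `//a[@href]` to skip anchors that have no href, and it keeps only `http`/`https` URLs, an assumed filtering rule meant to drop non-web entries such as `javascript:` bookmarklets. Note that if a file has no charset declaration, `loadHTML()` assumes ISO-8859-1, so non-ASCII titles may come out garbled.

```php
<?php
// Hypothetical helper: returns a list of ['title' => ..., 'url' => ...]
// arrays extracted from an exported bookmarks HTML string.
function extractBookmarks(string $html): array
{
    // loadHTML() rejects an empty string, so return early.
    if (trim($html) === '') {
        return [];
    }

    $dom = new DOMDocument();

    // Collect libxml warnings about malformed markup instead of printing
    // them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select only <a> elements that actually have an href attribute.
    $xpath = new DOMXPath($dom);
    $bookmarks = [];
    foreach ($xpath->query('//a[@href]') as $link) {
        $url = trim($link->getAttribute('href'));

        // Assumed rule: keep only web links. parse_url() returns null or
        // false when there is no usable scheme, which casts to ''.
        $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
        if ($scheme !== 'http' && $scheme !== 'https') {
            continue; // e.g. javascript: bookmarklets
        }

        $bookmarks[] = [
            'title' => trim($link->textContent),
            'url'   => $url,
        ];
    }
    return $bookmarks;
}
```

With an uploaded file, the call could look like `extractBookmarks(file_get_contents($_FILES['bookmarks']['tmp_name']))`, where `bookmarks` is a hypothetical name for the file input field. The returned array can then be stored per user instead of being printed.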

Again, thanks a lot.