Simple HTML Dom - Fatal error when using load_file

Martin Fejes · Jul 14, 2012 · Viewed 7.1k times

I'm trying to parse an HTML file with a terrible (believe me, it is) HTML structure. Because of that, and my own lack of knowledge, I couldn't write my own parser, so I tried the Simple HTML Dom parser, which a lot of people (on SO as well) recommend.

I required simple_html_dom.php, then created the object. Both steps seem to work: require() returns "1", and var_dump()-ing the object shows an object.

After this I tried to load the URL as it was done in the manual, but I got a fatal error, no matter what URL I tried. The error was the following:

Fatal error: Call to undefined function mb_detect_encoding() in 
             /home/fema/web/subdomain/devel/www_root/parser/
             simplehtmldom_1_5/simple_html_dom.php on line 988

I checked what's on line 988 and it is the following:

// Have php try to detect the encoding from the text given to us.
        $charset = mb_detect_encoding($this->root->plaintext . "ascii", 
                   $encoding_list = array( "UTF-8", "CP1252" ) );

I understand that this is about character encoding, but that's all. I haven't found anything about it either with Google or on SO.

My whole code is (placeholder URL):

<?php

require('simplehtmldom_1_5/simple_html_dom.php');

// Create a DOM object
$dom = new simple_html_dom();

$dom->load_file('http://www.google.com/');

?>

Could anyone please tell me what to do, or give some advice on how to handle something like this?

Thanks in advance.

Answer

GordonM · Jul 14, 2012

Your build of PHP is missing the multibyte string (mbstring) extension. That is quite unusual unless you're using a really old build of PHP or one compiled with unusual options: although the multibyte extension isn't enabled by default, it is usually considered one of the essential extensions that more or less every PHP build has these days.

If you're running an old version of PHP, I'd strongly recommend upgrading. If you have a fairly recent build, check with phpinfo() whether the multibyte extension is installed. If it isn't, you might need to reinstall PHP or rebuild it from source.

If it's installed, --enable-mbstring should be in the list of compile options. See the PHP manual on the multibyte extension, especially the chapter on installation, for more details.
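
As a quicker alternative to reading through the phpinfo() output, the check can also be done from a script. The following is only a minimal sketch: the helper name mbstring_status() is made up for illustration and is not part of PHP or Simple HTML Dom, while extension_loaded() and function_exists() are standard PHP functions. It uses array() rather than the short array syntax so that it should also run on older builds.

```php
<?php

// Hypothetical helper (not part of PHP or Simple HTML Dom):
// reports whether the mbstring extension and the function
// that simple_html_dom.php calls are available in this build.
function mbstring_status()
{
    return array(
        'extension_loaded'   => extension_loaded('mbstring'),
        'mb_detect_encoding' => function_exists('mb_detect_encoding'),
    );
}

$status = mbstring_status();

if ($status['extension_loaded'] && $status['mb_detect_encoding']) {
    echo "mbstring is available; mb_detect_encoding() can be called.\n";
} else {
    echo "mbstring is missing; install or enable it before using load_file().\n";
}
```

If the script reports that mbstring is missing, the fatal error from the question is expected, and the extension has to be installed or enabled before load_file() can work.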