str_get_html is not loading a valid html string

Dani · Jan 5, 2013 · Viewed 23.4k times

I receive an HTML string using cURL:

curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html_string = curl_exec($ch);

When I echo it, I see perfectly good HTML, which is what I need for parsing. But when I pass this string to the Simple HTML DOM Parser function str_get_html($html_string), it does not load it (the call returns false).

I tried saving it to a file and opening that file with file_get_html, but the same thing happens.

What can be the cause of this? As I said, the html looks perfectly fine when I echo it.

Thanks a lot.

The code itself:

$html = file_get_html("http://www.bgu.co.il/tremp.aspx");
$v = $html->find('input[id=__VIEWSTATE]');
$viewState = $v[0]->attr['value'];
$e = $html->find('input[id=__EVENTVALIDATION]');
$event = $e[0]->attr['value'];

$html->clear(); 
unset($html);

$body = " A_STRING_THAT_CONTAINS_SOME_DATA ";

$ch = curl_init("http://www.bgu.co.il/tremp.aspx");
curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$html_string = curl_exec($ch);

$file_handle = fopen("file.txt", "w");
fwrite($file_handle, $html_string);
fclose($file_handle);

curl_close($ch);

$html = str_get_html($html_string);

Answer

twxia · Feb 9, 2014

The page you are fetching with cURL seems to contain a lot of elements, so it is a large file.

I was parsing a string (file) about as large as your page and ran into the same problem.

After reading the source code, I found the cause, and the fix below works for me.

simple_html_dom.php limits the size of the string it will read:

// get html dom from string
function str_get_html($str, $lowercase=true, $forceTagsClosed=true, $target_charset = DEFAULT_TARGET_CHARSET, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT, $defaultSpanText=DEFAULT_SPAN_TEXT)
{
    $dom = new simple_html_dom(null, $lowercase, $forceTagsClosed, $target_charset, $stripRN, $defaultBRText, $defaultSpanText);
    if (empty($str) || strlen($str) > MAX_FILE_SIZE)
    {
        $dom->clear();
        return false;
    }
    $dom->load($str, $lowercase, $stripRN);
    return $dom;
}
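
The guard in that function returns false in two different cases, so it can help to find out which one you are hitting. Here is a minimal sketch of a check that mirrors the same two conditions. The helper name explain_rejection and its $limit parameter are my own, hypothetical names; they are not part of simple_html_dom.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): reports which of the
// two conditions in the guard quoted above would reject the given string.
function explain_rejection($str, $limit)
{
    if (empty($str)) {
        // empty() is also true for false, null and "0", so a failed
        // curl_exec() call (which returns false) ends up here.
        return 'empty';
    }
    if (strlen($str) > $limit) {
        // strlen() counts bytes, not characters.
        return 'too large';
    }
    return 'ok';
}
```

With the library loaded, you could call explain_rejection($html_string, MAX_FILE_SIZE) right before str_get_html($html_string); a result of 'too large' points at the size limit described here.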

You need to raise the default size limit shown below (it is defined near the top of simple_html_dom.php). You could change it to 100000000, for example; the exact value is up to you.

define('MAX_FILE_SIZE', 6000000);
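
If you would rather not hard-code a bigger number in the library file, one option is to guard the define so the calling script can choose the limit. This is only a sketch of that idea, assuming you edit the define line in your own copy of simple_html_dom.php; I am not claiming the library ships this way. Both halves are shown in one file here so the snippet runs on its own.

```php
<?php
// In the calling script, before including the library:
// pick whatever limit suits the pages you parse.
define('MAX_FILE_SIZE', 100000000);

// In your edited copy of simple_html_dom.php, replacing the plain define():
// only set the default if the caller has not defined the constant already.
if (!defined('MAX_FILE_SIZE')) {
    define('MAX_FILE_SIZE', 6000000);
}

echo MAX_FILE_SIZE, "\n"; // prints 100000000
```

Without the defined() check, the second define() would not override the first one; PHP keeps the original value and reports that the constant is already defined.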