I am building an RSS parser using the SimpleXML class, and I was wondering whether the DOMDocument class would improve the parser's speed. The RSS document I parse is at least 1000 lines, and I use almost all of the data from those lines. I am looking for the method that will take the least time to complete.
SimpleXML and DOMDocument both use the same parser (libxml2), so the parsing difference between them is negligible.
This is easy to verify:
function time_load_dd($xml, $reps) {
    // Discard a few initial runs to prime caches.
    for ($i = 0; $i < 5; ++$i) {
        $dom = new DOMDocument();
        $dom->loadXML($xml);
    }
    $start = microtime(true);
    for ($i = 0; $i < $reps; ++$i) {
        $dom = new DOMDocument();
        $dom->loadXML($xml);
    }
    $stop = microtime(true) - $start;
    return $stop;
}

function time_load_sxe($xml, $reps) {
    // Same warm-up for SimpleXML.
    for ($i = 0; $i < 5; ++$i) {
        $sxe = simplexml_load_string($xml);
    }
    $start = microtime(true);
    for ($i = 0; $i < $reps; ++$i) {
        $sxe = simplexml_load_string($xml);
    }
    $stop = microtime(true) - $start;
    return $stop;
}

function main() {
    // This is an 1800-line Atom feed of some complexity.
    $url = 'http://feeds.feedburner.com/reason/AllArticles';
    $xml = file_get_contents($url);
    $reps = 10000;
    $methods = array('time_load_dd', 'time_load_sxe');
    echo "Time to complete $reps reps:\n";
    foreach ($methods as $method) {
        echo $method, ": ", $method($xml, $reps), "\n";
    }
}

main();
On my machine I get basically no difference:
Time to complete 10000 reps:
time_load_dd: 17.725028991699
time_load_sxe: 17.416455984116
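The numbers are this close because there is only one parser doing the work. In fact, PHP lets you move between the two APIs without re-parsing at all: simplexml_import_dom() and dom_import_simplexml() wrap the existing libxml2 tree rather than building a new one. A minimal sketch (the tiny document here is made up for illustration):

$xml = '<rss><channel><title>Example</title></channel></rss>';

// Parse once with DOMDocument...
$dom = new DOMDocument();
$dom->loadXML($xml);

// ...then view the same tree through SimpleXML; nothing is re-parsed here.
$sxe = simplexml_import_dom($dom);
echo $sxe->channel->title, "\n"; // Example

// And back again: a DOMElement view of a SimpleXMLElement.
$el = dom_import_simplexml($sxe->channel->title);
echo $el->textContent, "\n"; // Example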
The real issue here is what algorithms you use and what you do with the data. 1000 lines is not a big XML document; your slowdown will come from your application logic, not from memory usage or parsing speed.
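For example, pulling the fields you need out of a feed is a few lines with either API; if that loop is slow, the time is almost always going into what you do per item rather than into the parse. A hypothetical SimpleXML sketch for a standard RSS 2.0 feed (the element names and process_entry() are assumptions, not from the question):

$rss = simplexml_load_string($xml);
foreach ($rss->channel->item as $item) {
    $entry = array(
        'title' => (string) $item->title,
        'link'  => (string) $item->link,
        'date'  => (string) $item->pubDate,
    );
    // Whatever happens here (database writes, HTTP requests, heavy
    // string processing) is what will dominate the runtime.
    process_entry($entry); // process_entry() is a placeholder.
}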