Why is array_key_exists 1000x slower than isset on referenced arrays?

Kendall Hopkins · Jun 14, 2011 · Viewed 11k times

I have found that array_key_exists is over 1000x slower than isset at checking whether a key is set in an array reference. Can anyone who understands how PHP is implemented explain why this is true?

EDIT: I've added another case (isset_, a user-defined function that wraps isset) that seems to point to the overhead coming from calling a function with a reference.

Benchmark Example

function isset_( $key, array $array )
{
    return isset( $array[$key] );
}

$my_array = array();
$start = microtime( TRUE );
for( $i = 1; $i < 10000; $i++ ) {
    array_key_exists( $i, $my_array );
    $my_array[$i] = 0;
}
$stop = microtime( TRUE );
print "array_key_exists( \$my_array ) ".($stop-$start).PHP_EOL;
unset( $my_array, $my_array_ref, $start, $stop, $i );

$my_array = array();
$start = microtime( TRUE );
for( $i = 1; $i < 10000; $i++ ) {
    isset( $my_array[$i] );
    $my_array[$i] = 0;
}
$stop = microtime( TRUE );
print "isset( \$my_array ) ".($stop-$start).PHP_EOL;
unset( $my_array, $my_array_ref, $start, $stop, $i );

$my_array = array();
$start = microtime( TRUE );
for( $i = 1; $i < 10000; $i++ ) {
    isset_( $i, $my_array );
    $my_array[$i] = 0;
}
$stop = microtime( TRUE );
print "isset_( \$my_array ) ".($stop-$start).PHP_EOL;
unset( $my_array, $my_array_ref, $start, $stop, $i );

$my_array = array();
$my_array_ref = &$my_array;
$start = microtime( TRUE );
for( $i = 1; $i < 10000; $i++ ) {
    array_key_exists( $i, $my_array_ref );
    $my_array_ref[$i] = 0;
}
$stop = microtime( TRUE );
print "array_key_exists( \$my_array_ref ) ".($stop-$start).PHP_EOL;
unset( $my_array, $my_array_ref, $start, $stop, $i );

$my_array = array();
$my_array_ref = &$my_array;
$start = microtime( TRUE );
for( $i = 1; $i < 10000; $i++ ) {
    isset( $my_array_ref[$i] );
    $my_array_ref[$i] = 0;
}
$stop = microtime( TRUE );
print "isset( \$my_array_ref ) ".($stop-$start).PHP_EOL;
unset( $my_array, $my_array_ref, $start, $stop, $i );

$my_array = array();
$my_array_ref = &$my_array;
$start = microtime( TRUE );
for( $i = 1; $i < 10000; $i++ ) {
    isset_( $i, $my_array_ref );
    $my_array_ref[$i] = 0;
}
$stop = microtime( TRUE );
print "isset_( \$my_array_ref ) ".($stop-$start).PHP_EOL;
unset( $my_array, $my_array_ref, $start, $stop, $i );

Output

array_key_exists( $my_array ) 0.0056459903717
isset( $my_array ) 0.00234198570251
isset_( $my_array ) 0.00539588928223
array_key_exists( $my_array_ref ) 3.64232587814 // <~ what on earth?
isset( $my_array_ref ) 0.00222992897034
isset_( $my_array_ref ) 4.12856411934 // <~ what on earth?

I'm on PHP 5.3.6.

Codepad example.

Answer

Shane H · Jun 18, 2011

At work I've got a VM instance of PHP that includes a PECL extension called VLD. It lets you run PHP code from the command line and, rather than executing it, prints the generated opcodes.

It's brilliant at answering questions like this.

http://pecl.php.net/package/vld

Just in case you go this route (and if you're generally curious about how PHP works internally, I think you should), you should definitely install it on a virtual machine (that is, I wouldn't install it on a machine I'm trying to develop on or deploy to). This is the command you'll use to make it sing:

php -d vld.execute=0 -d vld.active=1 -f foo.php
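For reference, a file along these lines could serve as the input. The name foo.php comes from the command above; the contents are a hypothetical minimal test case, not something from the original benchmark. The idea is to put both checks side by side so their opcodes can be compared in the dump. Which opcodes appear depends on the PHP version.

```php
<?php
// foo.php: hypothetical minimal input for VLD (contents are illustrative only).
$my_array = array( 'a' => 0 );
$my_array_ref = &$my_array; // bind a reference, as in the slow benchmark cases

// isset is a language construct, so no function call is involved here.
$via_isset = isset( $my_array_ref['a'] );

// array_key_exists is written as an ordinary function call taking the array as an argument.
$via_function = array_key_exists( 'a', $my_array_ref );

var_dump( $via_isset, $via_function );
```

Run normally (without the vld flags), both checks report true, since the key exists and its value is not null.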

Looking at the opcodes will tell you a more complete story; however, I have a guess. As far as I understand it, most of PHP's built-ins make a copy of an Array/Object and act upon that copy (and not a copy-on-write either, an immediate copy). The most widely known example of this is foreach(). When you pass an array into foreach(), PHP is actually making a copy of that array and iterating on the copy. This is why you can see a significant performance benefit by passing an array as a reference into foreach like this:

foreach($someReallyBigArray as $k => &$v)

But this behavior, where passing an explicit reference like that avoids the copy, is unique to foreach(). So I would be very surprised if it made an array_key_exists() check any faster.
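The observable side of the foreach() behavior described above can be checked with a small sketch. It only shows that a by-value foreach iterates over the array as it was when the loop started. Whether PHP physically copies the array up front or lazily is an internal detail that varies by version and is not demonstrated here. The variable names are made up for illustration.

```php
<?php
$items = array( 1, 2, 3 );
$visited = 0;

foreach ( $items as $value ) {
    // Appending changes $items itself, but not the sequence being iterated.
    $items[] = $value * 10;
    $visited++;
}

print $visited . PHP_EOL;        // 3: the loop saw only the original elements
print count( $items ) . PHP_EOL; // 6: the appends still landed in $items
```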

OK, back to what I was getting at.

Most of the built-ins take a copy of an array and act upon that copy. I am going to venture a completely unqualified guess that isset() is highly optimized (it is a language construct rather than an ordinary function) and that one of those optimizations is perhaps to not make an immediate copy of an array when it's passed in.
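One way to probe that guess without reading opcodes is to time the same by-value function call with and without a reference bound to the array, at more than one array size. If a full copy were made on every call, the time in the referenced case would be expected to grow with the array size. This is only a sketch: has_key and time_calls are hypothetical helpers, the sizes are arbitrary, and the numbers will differ a lot between PHP versions.

```php
<?php
// Hypothetical stand-in for a by-value function that receives the array.
function has_key( $key, array $array )
{
    return isset( $array[$key] );
}

// Time $calls by-value calls on an array of $size elements.
// When $bind_reference is true, a reference to the array exists during the calls.
function time_calls( $size, $calls, $bind_reference )
{
    $data = array_fill( 0, $size, 0 );
    if ( $bind_reference ) {
        $alias = &$data; // $data is now part of a reference set
    }
    $start = microtime( TRUE );
    for ( $i = 0; $i < $calls; $i++ ) {
        has_key( $i, $data );
    }
    return microtime( TRUE ) - $start;
}

foreach ( array( 1000, 10000 ) as $size ) {
    printf(
        "size %5d  plain %.6f  referenced %.6f" . PHP_EOL,
        $size,
        time_calls( $size, 1000, FALSE ),
        time_calls( $size, 1000, TRUE )
    );
}
```

Keeping the number of calls fixed while varying the size separates per-call overhead from overhead that scales with the array.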

I'll try to answer any other questions you may have, but you could probably learn a lot if you google "zval_struct", which is the data structure in the PHP internals that stores each variable. It's a C struct (think: an associative array) that has keys like "value", "type", and "refcount".
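The refcount bookkeeping can also be peeked at from PHP itself with the core function debug_zval_dump(). A minimal sketch follows; the variable names are made up, and the exact output format and the reported counts differ between PHP versions, so no expected output is shown.

```php
<?php
// Build the array at runtime so it is an ordinary refcounted value.
$numbers = array();
for ( $i = 0; $i < 3; $i++ ) {
    $numbers[] = $i;
}

// Prints the type, the size and a refcount for the array.
debug_zval_dump( $numbers );

// A plain assignment shares the same storage until one side is written to,
// so the reported refcount is expected to be higher on the second dump.
$second_handle = $numbers;
debug_zval_dump( $numbers );
```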