what is faster: in_array or isset?

Fabrizio picture Fabrizio · Nov 20, 2012 · Viewed 45k times · Source

This question is merely for me as I always like to write optimized code that can run also on cheap slow servers (or servers with A LOT of traffic)

I looked around and I was not able to find an answer. I was wondering what is faster between those two examples keeping in mind that the array's keys in my case are not important (pseudo-code naturally):

<?php
$a = array();
while($new_val = 'get over 100k email addresses already lowercased'){
    if(!in_array($new_val, $a){
        $a[] = $new_val;
        //do other stuff
    }
}
?>

<?php
$a = array();
while($new_val = 'get over 100k email addresses already lowercased'){
    if(!isset($a[$new_val]){
        $a[$new_val] = true;
        //do other stuff
    }
}
?>

As the point of the question is not the array collision, I would like to add that if you are afraid of colliding inserts for $a[$new_value], you can use $a[md5($new_value)]. it can still cause collisions, but would take away from a possible DoS attack when reading from an user provided file (http://nikic.github.com/2011/12/28/Supercolliding-a-PHP-array.html)

Answer

David Harkness picture David Harkness · Nov 20, 2012

The answers so far are spot-on. Using isset in this case is faster because

  • It uses an O(1) hash search on the key whereas in_array must check every value until it finds a match.
  • Being an opcode, it has less overhead than calling the in_array built-in function.

These can be demonstrated by using an array with values (10,000 in the test below), forcing in_array to do more searching.

isset:    0.009623
in_array: 1.738441

This builds on Jason's benchmark by filling in some random values and occasionally finding a value that exists in the array. All random, so beware that times will fluctuate.

$a = array();
for ($i = 0; $i < 10000; ++$i) {
    $v = rand(1, 1000000);
    $a[$v] = $v;
}
echo "Size: ", count($a), PHP_EOL;

$start = microtime( true );

for ($i = 0; $i < 10000; ++$i) {
    isset($a[rand(1, 1000000)]);
}

$total_time = microtime( true ) - $start;
echo "Total time: ", number_format($total_time, 6), PHP_EOL;

$start = microtime( true );

for ($i = 0; $i < 10000; ++$i) {
    in_array(rand(1, 1000000), $a);
}

$total_time = microtime( true ) - $start;
echo "Total time: ", number_format($total_time, 6), PHP_EOL;