This question is merely for me as I always like to write optimized code that can run also on cheap slow servers (or servers with A LOT of traffic)
I looked around and I was not able to find an answer. I was wondering what is faster between those two examples keeping in mind that the array's keys in my case are not important (pseudo-code naturally):
<?php
$a = array();
while($new_val = 'get over 100k email addresses already lowercased'){
if(!in_array($new_val, $a){
$a[] = $new_val;
//do other stuff
}
}
?>
<?php
$a = array();
while($new_val = 'get over 100k email addresses already lowercased'){
if(!isset($a[$new_val]){
$a[$new_val] = true;
//do other stuff
}
}
?>
As the point of the question is not the array collision, I would like to add that if you are afraid of colliding inserts for $a[$new_value]
, you can use $a[md5($new_value)]
. it can still cause collisions, but would take away from a possible DoS attack when reading from an user provided file (http://nikic.github.com/2011/12/28/Supercolliding-a-PHP-array.html)
The answers so far are spot-on. Using isset
in this case is faster because
in_array
must check every value until it finds a match.in_array
built-in function.These can be demonstrated by using an array with values (10,000 in the test below), forcing in_array
to do more searching.
isset: 0.009623
in_array: 1.738441
This builds on Jason's benchmark by filling in some random values and occasionally finding a value that exists in the array. All random, so beware that times will fluctuate.
$a = array();
for ($i = 0; $i < 10000; ++$i) {
$v = rand(1, 1000000);
$a[$v] = $v;
}
echo "Size: ", count($a), PHP_EOL;
$start = microtime( true );
for ($i = 0; $i < 10000; ++$i) {
isset($a[rand(1, 1000000)]);
}
$total_time = microtime( true ) - $start;
echo "Total time: ", number_format($total_time, 6), PHP_EOL;
$start = microtime( true );
for ($i = 0; $i < 10000; ++$i) {
in_array(rand(1, 1000000), $a);
}
$total_time = microtime( true ) - $start;
echo "Total time: ", number_format($total_time, 6), PHP_EOL;