What is the difference between a generator and an array?

David Rodrigues · Jun 20, 2013

Today the PHP team released PHP 5.5.0, which includes support for generators. Reading the documentation, I noticed that generators seem to do exactly what I could already do with an array.

The PHP team's generator example:

// PHP 5.5+ only
function gen_one_to_three() {
    for ($i = 1; $i <= 3; $i++) {
        // Note that $i is preserved between yields.
        yield $i;
    }
}

$generator = gen_one_to_three();
foreach ($generator as $value) {
    echo "$value\n";
}

Result:

1
2
3
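Even this tiny example shows one concrete difference: calling a generator function does not build an array at all. The sketch below (assuming PHP 5.5 or later) checks the type of the returned value, turns it into a real array with iterator_to_array(), and shows that a finished generator cannot be traversed a second time, which an array can.

```php
<?php
// Requires PHP 5.5+.
function gen_one_to_three() {
    for ($i = 1; $i <= 3; $i++) {
        yield $i;
    }
}

$generator = gen_one_to_three();

// Calling the function runs none of its body yet; it returns a Generator
// object, which implements Iterator but is not an array.
$isArray    = is_array($generator);            // false
$isIterator = $generator instanceof Iterator;  // true

// iterator_to_array() runs the generator to the end and stores every value.
$values = iterator_to_array($generator);       // array(1, 2, 3)

// A generator is single-pass: once finished, traversing it again throws.
$secondPassFailed = false;
try {
    foreach ($generator as $value) {
        echo "$value\n";
    }
} catch (Exception $e) {
    $secondPassFailed = true;
}

var_dump($isArray, $isIterator, $values, $secondPassFailed);
```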

But I can do the same thing with an array, and still stay compatible with earlier versions of PHP.

Take a look:

// Compatible with 4.4.9!
function gen_one_to_three() {
    $results = array();
    for ($i = 1; $i <= 3; $i++) {
        $results[] = $i;
    }

    return $results;
}

$generator = gen_one_to_three();
foreach ($generator as $value) {
    echo "$value\n";
}
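The two versions do differ in *when* the work happens. Here is a minimal sketch using hypothetical traced variants (gen_traced() and array_traced() are illustrative names, not part of either snippet above) that record each step in a log passed by reference. The generator interleaves producing and consuming values, while the array version produces everything before the loop starts.

```php
<?php
// Hypothetical generator variant: logs each value as it is produced.
function gen_traced(array &$log) {
    for ($i = 1; $i <= 3; $i++) {
        $log[] = "produce $i";
        yield $i; // pauses here until the loop asks for the next value
    }
}

// Hypothetical array variant: every value is produced before returning.
function array_traced(array &$log) {
    $results = array();
    for ($i = 1; $i <= 3; $i++) {
        $log[] = "produce $i";
        $results[] = $i;
    }
    return $results;
}

$genLog = array();
foreach (gen_traced($genLog) as $value) {
    $genLog[] = "consume $value";
}

$arrayLog = array();
foreach (array_traced($arrayLog) as $value) {
    $arrayLog[] = "consume $value";
}

echo implode(', ', $genLog), "\n";
echo implode(', ', $arrayLog), "\n";
```

The first line printed is `produce 1, consume 1, produce 2, consume 2, produce 3, consume 3`; the second is `produce 1, produce 2, produce 3, consume 1, consume 2, consume 3`.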

So the question is: what is the purpose of this new feature? I was able to reproduce every example in the documentation without it, simply by replacing the generator with an array.

Can anyone give a good explanation, and perhaps an example that is not necessarily impossible in older versions, but where generators make development easier?

Answer

ircmaxell · Jun 20, 2013

The difference is efficiency. For example, some languages other than PHP (Python 2, for instance) provide two range functions, range() and xrange(). This is a really good example of generators and why to use them. Let's build our own:

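// PHP already has a built-in range(), and redeclaring it is a fatal error,
// so this array-building version is named my_range() instead.
function my_range($start, $end) {
    $array = array();
    for ($i = $start; $i <= $end; $i++) {
        $array[] = $i;
    }
    return $array;
}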

That's really straightforward. However, for large ranges it needs a HUGE amount of memory, because every element is stored in the array at once. If we tried to run it with $start = 0 and $end = 100000000, we'd likely run out of memory!

But if we used a generator:

function xrange($start, $end) {
    for ($i = $start; $i <= $end; $i++) {
        yield $i;
    }
}
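Under the hood, foreach drives a generator through the Iterator interface that PHP's Generator class implements. Calling valid(), key(), current() and next() by hand, as in this sketch, makes it visible that each value is computed only when it is asked for; the start and end values 10 and 12 are arbitrary.

```php
<?php
function xrange($start, $end) {
    for ($i = $start; $i <= $end; $i++) {
        yield $i;
    }
}

$gen = xrange(10, 12);
$seen = array();

// valid() starts the generator and runs it up to the first yield.
while ($gen->valid()) {
    // Keys are assigned automatically (0, 1, 2, ...) like array indexes.
    $seen[] = $gen->key() . '=>' . $gen->current();
    $gen->next(); // resume the body until the next yield (or the end)
}

echo implode(', ', $seen), "\n"; // 0=>10, 1=>11, 2=>12
```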

Now we use constant memory, yet still have an array-like structure that we can iterate over (and use with other iterators) wherever we would have used the array.
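Because a Generator is an Iterator, SPL iterators can wrap it. This sketch uses LimitIterator as one example wrapper (the choice of wrapper is an assumption for illustration): it takes the first five values of a nominal 100-million-element range, so only a handful of values are ever generated and no large array is built.

```php
<?php
function xrange($start, $end) {
    for ($i = $start; $i <= $end; $i++) {
        yield $i;
    }
}

// LimitIterator(iterator, offset, count) stops after `count` items, so the
// generator is resumed only a few times, never 100 million times.
$firstFive = iterator_to_array(new LimitIterator(xrange(0, 100000000), 0, 5));

echo implode(' ', $firstFive), "\n"; // 0 1 2 3 4
```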

It doesn't replace an array, but it does provide an efficient way to avoid needing all that memory.
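A rough way to observe the memory difference is to compare memory_get_usage() readings for both approaches. This sketch is illustrative only: the range size of 100,000 is arbitrary and the exact byte counts depend on the PHP version and build, but the array's footprint grows with the range while the generator's stays small.

```php
<?php
// my_range() is used because PHP already has a built-in range().
function my_range($start, $end) {
    $array = array();
    for ($i = $start; $i <= $end; $i++) {
        $array[] = $i;
    }
    return $array;
}

function xrange($start, $end) {
    for ($i = $start; $i <= $end; $i++) {
        yield $i;
    }
}

$n = 100000;

// Array version: the whole range is stored before the loop begins.
$before = memory_get_usage();
$array = my_range(1, $n);
$arrayBytes = memory_get_usage() - $before;
$sumArray = 0;
foreach ($array as $v) {
    $sumArray += $v;
}
unset($array);

// Generator version: sample memory during the loop; only the current
// value and the generator's own state are held at any time.
$before = memory_get_usage();
$genBytes = 0;
$sumGen = 0;
foreach (xrange(1, $n) as $v) {
    $sumGen += $v;
    $genBytes = max($genBytes, memory_get_usage() - $before);
}

printf("array: %d bytes, generator: %d bytes\n", $arrayBytes, $genBytes);
```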

It also saves work when generating the items. Since each result is generated as needed, you can delay the execution (fetching or computing) of each element until you actually need it. For example, if you need to fetch rows from a database and do some complex processing on each one, a generator lets you put off that work until you actually need the row:

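// $result is a query result object whose fetchArray() returns the next row,
// or false when no rows are left (SQLite3Result behaves this way).
// doSomeComplexProcessing() stands in for whatever per-row work you need.
function fetchFromDb($result) {
    while ($row = $result->fetchArray()) {
        $record = doSomeComplexProcessing($row);
        yield $record; // pauses here until the caller asks for the next record
    }
}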

So if you only need the first three results, only the first three records are fetched and processed.
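To see this lazy behavior without a real database, the sketch below swaps the query result for a hypothetical FakeResult class whose fetchArray() mimics the "next row or false" convention, and uses a trivial placeholder for doSomeComplexProcessing(). Both are assumptions for illustration; fetchFromDb() has the same shape as in the answer.

```php
<?php
// Hypothetical stand-in for a query result: fetchArray() returns the next
// row, or false when no rows are left, and counts how many rows it handed out.
class FakeResult {
    public $fetched = 0;
    private $rows;

    public function __construct(array $rows) {
        $this->rows = $rows;
    }

    public function fetchArray() {
        if (count($this->rows) === 0) {
            return false;
        }
        $this->fetched++;
        return array_shift($this->rows);
    }
}

// Hypothetical placeholder for the expensive per-row work.
function doSomeComplexProcessing($row) {
    return $row['id'] * 10;
}

function fetchFromDb($result) {
    while ($row = $result->fetchArray()) {
        $record = doSomeComplexProcessing($row);
        yield $record;
    }
}

// Ten rows are available, but the loop stops after three records.
$rows = array();
for ($id = 1; $id <= 10; $id++) {
    $rows[] = array('id' => $id);
}
$result = new FakeResult($rows);

$firstThree = array();
foreach (fetchFromDb($result) as $record) {
    $firstThree[] = $record;
    if (count($firstThree) === 3) {
        break; // the generator is never resumed, so rows 4..10 are untouched
    }
}

echo implode(', ', $firstThree), "\n";      // 10, 20, 30
echo "rows fetched: {$result->fetched}\n";  // rows fetched: 3
```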

For more info, I wrote a blog post on this exact subject.