Does (string) 'hard-copy' a string?

mzimmer · Feb 28, 2013 · Viewed 10.2k times

PHP uses a copy-on-modification system.

Does $a = (string) $a; (where $a is already a string) modify and copy anything?


Specifically, this is my problem:

Parameter 1 is mixed: I want to allow non-strings to be passed and convert them to strings.
But sometimes these strings are very large, so I want to avoid copying a parameter that is already a string.

Can I use version Foo or do I have to use version Bar?

class Foo {
    private $_foo;
    public function __construct($foo) {
        $this->_foo = (string) $foo;
    }
}

class Bar {
    private $_bar;
    public function __construct($bar) {
        if (is_string($bar)) {
            $this->_bar = $bar;
        } else {
            $this->_bar = (string) $bar;
        }
    }
}

Answer

ircmaxell · Feb 28, 2013

The answer is that yes, it does copy the string. Sort-of... Not really. Well, it depends on your definition of "copy"...

>= 5.4

To see what's happening, let's look at the source. In 5.5, the executor handles a cast to string like this:

    zend_make_printable_zval(expr, &var_copy, &use_copy);
    if (use_copy) {
        ZVAL_COPY_VALUE(result, &var_copy);
        // if optimized out
    } else {
        ZVAL_COPY_VALUE(result, expr);
        // if optimized out
        zendi_zval_copy_ctor(*result);
    }

As you can see, the call uses zend_make_printable_zval() which just short-circuits if the zval is already a string.

So the code that's executed to do the copy is (the else branch):

ZVAL_COPY_VALUE(result, expr);

Now, let's look at the definition of ZVAL_COPY_VALUE:

#define ZVAL_COPY_VALUE(z, v)                   \
    do {                                        \
        (z)->value = (v)->value;                \
        Z_TYPE_P(z) = Z_TYPE_P(v);              \
    } while (0)

Note what that's doing. The string itself (which is stored in the ->value block of the zval) is NOT copied. It's just referenced: the pointer stays the same, so both zvals share the same string data and no copy is made. What does get created is a new variable (the zval part that wraps the value).

Now we get to the zendi_zval_copy_ctor call, which does some interesting things of its own internally. Note:

case IS_STRING:
    CHECK_ZVAL_STRING_REL(zvalue);
    if (!IS_INTERNED(zvalue->value.str.val)) {
        zvalue->value.str.val = (char *) estrndup_rel(zvalue->value.str.val, zvalue->value.str.len);
    }
    break;

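Basically, that means that if it's an interned string, it won't be copied, but if it's not, it will be. So what's an interned string, and what does that mean? In short, it's a string that the engine stores once in a shared table and reuses; as of 5.4 that covers strings known at compile time, such as literals written in the script.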

<= 5.3

In 5.3, interned strings didn't exist. So the string is always copied. That's really the only difference...

Benchmark Time:

Well, in a case like this:

$a = "foo";
$b = (string) $a;

No copy of the string will happen in 5.4, but in 5.3 a copy will occur.

But in a case like this:

$a = str_repeat("a", 10);
$b = (string) $a;

A copy will occur in all versions. That's because not all strings in PHP are interned: this one is built at runtime by str_repeat(), so it isn't...

Let's try it out in a benchmark: http://3v4l.org/HEelW

$a = "foobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisoutfoobarbizbazbuztestingthisout";
$b = str_repeat("a", 300);

echo "Static Var\n";
testCopy($a);
echo "Dynamic Var\n";
testCopy($b);

function testCopy($var) {
    echo memory_get_usage() . "\n";
    $var = (string) $var;
    echo memory_get_usage() . "\n";
}

Results:

  • 5.4 to 5.5 alpha 1 (other alphas omitted, as their differences are too minor to matter):

    Static Var
    220152
    220200
    Dynamic Var
    220152
    220520
    

    So the static var increased by 48 bytes, and the dynamic var increased by 368 bytes.

  • 5.3.11 to 5.3.22:

    Static Var
    624472
    625408
    Dynamic Var
    624472
    624840
    

    The static var increased by 936 bytes while dynamic var increased by 368 bytes.

So notice that in 5.3, both the static and the dynamic variables were copied. So the string is always duplicated.

But in 5.4 with static strings, only the zval structure was copied, meaning that the string itself, which was interned, remains the same and is not copied...
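If you want to peek at this from userland, one option is debug_zval_dump(), which prints a value together with its reference information. The sketch below is only an illustration: dumpToString() is a hypothetical helper name, the exact output format differs between PHP versions, and the refcount shown includes the temporary references created by passing the value into the function. Optimizer settings may also change whether the second string is really built at runtime.

```php
<?php
// Hypothetical helper: capture what debug_zval_dump() prints for a value.
function dumpToString($value)
{
    ob_start();
    debug_zval_dump($value);
    return ob_get_clean();
}

$literal = "foo";                 // known at compile time
$dynamic = str_repeat("a", 10);   // built at runtime

// Compare the reference information printed for the two strings.
echo dumpToString($literal);
echo dumpToString($dynamic);
```

Treat the printed reference details as something to inspect on your own version rather than as a fixed format.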

One Other Thing

Another thing to note is that all of the above is moot. You're passing the variable as a parameter to the function and then casting it inside the function, so copy-on-write will be triggered by that line. Running it will always (well, in 99.9% of cases) trigger a variable copy. At best (interned strings) you're talking about a zval duplication and its associated overhead. At worst, you're talking about a string duplication...
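To relate this back to the Foo and Bar versions from the question, here is a minimal sketch that measures both approaches on a large string built at runtime. The function names storeWithCast(), storeWithCheck() and compareStorage() are hypothetical stand-ins for the two constructor bodies, not part of the original code. The numbers depend heavily on the PHP version and build, and engines newer than the ones benchmarked above may not copy at all, so measure on your own setup rather than relying on any particular result.

```php
<?php
// Hypothetical stand-in for Foo's constructor body: always cast.
function storeWithCast($value)
{
    return (string) $value;
}

// Hypothetical stand-in for Bar's constructor body: cast only non-strings.
function storeWithCheck($value)
{
    if (is_string($value)) {
        return $value;
    }
    return (string) $value;
}

// Report how much memory_get_usage() grows while each result is stored.
function compareStorage($value)
{
    $before = memory_get_usage();
    $viaCheck = storeWithCheck($value);
    $checkDelta = memory_get_usage() - $before;

    $before = memory_get_usage();
    $viaCast = storeWithCast($value);
    $castDelta = memory_get_usage() - $before;

    return array('check' => $checkDelta, 'cast' => $castDelta);
}

// Built at runtime, so it is not a compile-time literal.
$large = str_repeat("a", 1024 * 1024);
print_r(compareStorage($large));
```

If the 'cast' figure is roughly the size of the string while 'check' stays near zero, the is_string() branch is saving a string duplication on that version; if both are near zero, the cast is not duplicating the string there.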