While this question doesn't have any real use in practice, I am curious as to how Python does string interning. I have noticed the following.
>>> "string" is "string"
True
This is as I expected.
You can also do this.
>>> "strin"+"g" is "string"
True
And that's pretty clever!
But you can't do this.
>>> s1 = "strin"
>>> s2 = "string"
>>> s1+"g" is s2
False
Why wouldn't Python evaluate s1+"g", realize it is the same as s2, and point it to the same address? What is actually going on in that last block to have it return False?
This is implementation-specific: your interpreter is probably interning compile-time constants, but not the results of run-time expressions.
In what follows CPython 3.9.0+ is used.
In the second example, the expression "strin"+"g" is evaluated at compile time and replaced with "string". This makes the first two examples behave the same.
If we examine the bytecodes, we'll see that they are exactly the same:
# s1 = "string"
1 0 LOAD_CONST 0 ('string')
2 STORE_NAME 0 (s1)
# s2 = "strin" + "g"
2 4 LOAD_CONST 0 ('string')
6 STORE_NAME 1 (s2)
This bytecode was obtained with the following code (which prints a few more lines after the above):
import dis
source = 's1 = "string"\ns2 = "strin" + "g"'
code = compile(source, '', 'exec')
dis.dis(code)  # dis.dis() prints the disassembly itself and returns None
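The folding can also be checked without reading bytecode, by looking at the constants stored on the compiled code object. This is only a sketch, assuming CPython; the "<example>" filename and the variable names are placeholders, and another implementation may keep different constants.

```python
# Compile the two assignments and inspect the constants the compiler kept.
source = 's1 = "string"\ns2 = "strin" + "g"'
code = compile(source, "<example>", "exec")

constants = code.co_consts
print(constants)

# On CPython the folded result is stored, and the two pieces are not.
folded = "string" in constants and "strin" not in constants
print(folded)
```

If the concatenation had been left for run time, "strin" and "g" would appear as separate constants instead.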
The third example involves a run-time concatenation, the result of which is not automatically interned:
# s3a = "strin"
3 8 LOAD_CONST 1 ('strin')
10 STORE_NAME 2 (s3a)
# s3 = s3a + "g"
4 12 LOAD_NAME 2 (s3a)
14 LOAD_CONST 2 ('g')
16 BINARY_ADD
18 STORE_NAME 3 (s3)
20 LOAD_CONST 3 (None)
22 RETURN_VALUE
This bytecode was obtained with the following code (its output begins with the same lines as the first bytecode listing above):
import dis
source = (
's1 = "string"\n'
's2 = "strin" + "g"\n'
's3a = "strin"\n'
's3 = s3a + "g"')
code = compile(source, '', 'exec')
dis.dis(code)  # dis.dis() prints the disassembly itself and returns None
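Opcode names vary between CPython versions: the addition shows up as BINARY_ADD on 3.9 and 3.10, and as BINARY_OP from 3.11 onwards. A hedged way to confirm that the third example really performs a run-time addition, whichever of those names your interpreter uses, is to scan the instruction list. The "<example>" filename and variable names below are placeholders.

```python
import dis

# Only the run-time concatenation from the third example.
source = 's3a = "strin"\ns3 = s3a + "g"'
code = compile(source, "<example>", "exec")

# Collect opcode names; BINARY_ADD (3.9, 3.10) or BINARY_OP (3.11+) marks
# an addition that is carried out when the code runs.
opnames = [ins.opname for ins in dis.get_instructions(code)]
has_runtime_add = any(name in ("BINARY_ADD", "BINARY_OP") for name in opnames)
print(has_runtime_add)

# The pieces stay separate constants; no folded "string" is stored.
print(code.co_consts)
```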
If you were to manually sys.intern() the result of the third expression, you'd get the same object as before:
>>> import sys
>>> s3a = "strin"
>>> s3 = s3a + "g"
>>> s3 is "string"
False
>>> sys.intern(s3) is "string"
True
Also, Python 3.8 and later print a warning for the last two comparisons above:
SyntaxWarning: "is" with a literal. Did you mean "=="?
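The same experiment can be run without triggering the warning by comparing names rather than literals. This is a minimal sketch; the variable names are arbitrary, and the result of the plain identity check is implementation-specific (typically False on CPython), so only the interned comparison should be relied on.

```python
import sys

prefix = "strin"
literal = "string"       # compile-time constant
runtime = prefix + "g"   # built at run time, not automatically interned

print(runtime == literal)   # equal by value
print(runtime is literal)   # identity; implementation-specific

# sys.intern() returns the one canonical object for a given string value,
# so interning both sides always yields the same object.
same_after_intern = sys.intern(runtime) is sys.intern(literal)
print(same_after_intern)
```

In ordinary code, == is the right comparison for strings; interning mainly matters as an optimization, for example for speeding up dictionary key lookups.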