How Python handles integers, an accidental discovery via the is keyword
These outputs led me to believe that Python uses the is keyword inconsistently:
>>> a = 1
>>> b = 1
>>> a is b
True
>>> x = 4000
>>> y = 4000
>>> x is y
False
My guess was:
- When comparing lists, l1 is l2 means you're checking if they're the same object.
- When comparing integers, a is b means you're comparing their values.
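The truth is quite surprising: is always checks object identity, for integers as well as lists. It turns out that Python (CPython, to be precise) makes an optimization for commonly used integers, which are defined as the values from -5 to 256. For all of those values, int objects (or, more correctly, PyIntObjects, as they are called in the CPython 2 source) are created in memory when Python is initialized.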
If a user creates a variable with a value within the [-5, 256] range, they receive a reference to the preallocated "small integer" object. That's why a and b are references to the same "small integer" 1 object.
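One way to see the cached range for yourself is to probe it. The sketch below is only an illustration: is_cached is a helper name I made up, and it builds each integer from a string at runtime so that the compiler cannot quietly reuse a single constant for both values. The result depends on the interpreter; on CPython I would expect the cached range to come out as -5 through 256.

```python
def is_cached(n: int) -> bool:
    """Return True if two separately built ints equal to n are the same object."""
    # Parse from a string so each value is produced at runtime,
    # not shared as a compile-time constant.
    a = int(str(n))
    b = int(str(n))
    return a is b


# Probe a window that comfortably covers the documented range.
cached = [n for n in range(-10, 300) if is_cached(n)]
print(cached[0], cached[-1])
```

This relies on an implementation detail of CPython, so it is a diagnostic trick rather than something to depend on in real code.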
Python creates new int objects only for values outside of that range. This explains why x and y are not the same object.
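The practical takeaway is to compare integer values with == and keep is for identity checks. A small sketch, assuming CPython: the values are built with int("4000") rather than written as literals, because when two equal literals appear in the same script or function the compiler may store them as one shared constant, and then x is y can come out True even for large numbers.

```python
# Build the two values at runtime so they are separate objects.
x = int("4000")
y = int("4000")

print(x == y)          # value comparison: what you almost always want
print(x is y)          # identity comparison: are these the same object?
print(id(x) == id(y))  # the same identity check, spelled out via id()
```

On CPython the first line should print True and the other two False, matching the interactive session at the top of the post.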
Strictly speaking, the memory for these objects is not allocated on demand when x or y is assigned. Rather, during initialization, Python sets aside a block of uninitialized PyIntObjects in a structure called a PyIntBlock. When a large int is created, a free PyIntObject from this preallocated block is returned and initialized with the supplied value. When a block runs out of free objects, another block is allocated.
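To make the mechanism concrete, here is a toy model of it in plain Python. It is a sketch under loud assumptions, not CPython's actual code: IntObject, IntAllocator, from_value, and the tiny BLOCK_SIZE are all names and values I invented for illustration. Only NSMALLNEGINTS and NSMALLPOSINTS mirror the constants used in the CPython source.

```python
NSMALLNEGINTS = 5    # small ints start at -5
NSMALLPOSINTS = 257  # small ints stop at 256
BLOCK_SIZE = 4       # hypothetical; deliberately tiny for illustration


class IntObject:
    """Stand-in for a PyIntObject: a slot that holds one value."""
    __slots__ = ("value",)

    def __init__(self):
        self.value = None  # uninitialized until handed out


class IntAllocator:
    """Toy model: a small-int table plus a free list refilled block by block."""

    def __init__(self):
        self.free_list = []
        self.blocks_allocated = 0
        # Pre-create every small integer once, at "interpreter startup".
        self.small_ints = {}
        for v in range(-NSMALLNEGINTS, NSMALLPOSINTS):
            obj = self._take_free_slot()
            obj.value = v
            self.small_ints[v] = obj

    def _take_free_slot(self):
        # When no free slot is left, allocate a whole new block at once.
        if not self.free_list:
            self.free_list = [IntObject() for _ in range(BLOCK_SIZE)]
            self.blocks_allocated += 1
        return self.free_list.pop()

    def from_value(self, v):
        # Small values: return the shared, preallocated object.
        cached = self.small_ints.get(v)
        if cached is not None:
            return cached
        # Large values: initialize a free slot with the supplied value.
        obj = self._take_free_slot()
        obj.value = v
        return obj


alloc = IntAllocator()
print(alloc.from_value(1) is alloc.from_value(1))
print(alloc.from_value(4000) is alloc.from_value(4000))
```

The two prints reproduce the behaviour from the interactive session: the same object for 1, distinct objects for 4000. The model leaves out reference counting and returning freed objects to the free list, which the real implementation also has to handle.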
References:
- http://www.laurentluce.com/posts/python-integer-objects-implementation/
- https://davejingtian.org/2014/12/11/python-internals-integer-object-pool-pyintobject/