How Python handles integers, an accidental discovery via the is keyword
These outputs led me to believe that Python uses the is keyword inconsistently:
>>> a = 1
>>> b = 1
>>> a is b
True
>>> x = 4000
>>> y = 4000
>>> x is y
False
My guess was:
- When comparing lists, l1 is l2 means you're checking if they're the same object.
- When comparing integers, a is b means you're comparing their values.
The truth is quite surprising: is checks object identity for integers too. It turns out that CPython optimizes commonly used integers, which it defines as -5 through 256. For each of those values, an int object (or, more correctly, a PyIntObject in CPython 2; Python 3 uses PyLongObject) is created in memory when the interpreter is initialized.
If a user creates a variable with a value within the [-5, 256] range, they receive a reference to the preallocated "small integer" object. That's why a and b are references to the same "small integer" 1 object.
Python creates new int objects only for values outside of that range. This explains why x and y are not the same object.
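To check this outside the interactive prompt, here is a minimal sketch. The helper name same_object is made up for illustration, and the behavior it probes is a CPython implementation detail rather than a language guarantee, so other interpreters or versions may print different results near the edges of the range.

```python
def same_object(value):
    """Return True if two separately built ints with this value are one object."""
    # Build both ints at runtime from a string. Writing the same literal twice
    # in a script can mislead, because the compiler may merge equal constants
    # that appear in one code block.
    first = int(str(value))
    second = int(str(value))
    return first is second


# Probe values inside and outside the documented [-5, 256] range.
results = {v: same_object(v) for v in (-6, -5, 0, 1, 256, 257, 4000)}
for v, shared in results.items():
    print(v, shared)
```

The practical takeaway is to compare integer values with == and to keep is for identity checks such as x is None.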
Strictly speaking, in CPython 2 these objects are not allocated one at a time at the moment x or y is assigned. Rather, during initialization, the interpreter sets aside a block of uninitialized PyIntObjects in a structure called a PyIntBlock. When a large int is needed, a free PyIntObject from this preallocated block is returned and initialized with the supplied value.
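The scheme described above can be sketched as a toy model in pure Python. Everything here (ToyIntObject, ToyIntPool, make_int, the block size of 8) is hypothetical and exists only to illustrate the idea; the real implementation is C code inside the interpreter and differs in many details.

```python
class ToyIntObject:
    """Stand-in for an integer slot (hypothetical, not CPython's struct)."""
    __slots__ = ("value",)

    def __init__(self):
        self.value = None  # blank until the slot is handed out


class ToyIntPool:
    """Toy model: a shared small-int cache plus a free list of blank slots."""

    SMALL_MIN, SMALL_MAX = -5, 256
    BLOCK_SIZE = 8  # arbitrary, chosen only to keep the example small

    def __init__(self):
        # Small integers are fully created up front and shared by everyone.
        self.small = {}
        for v in range(self.SMALL_MIN, self.SMALL_MAX + 1):
            obj = ToyIntObject()
            obj.value = v
            self.small[v] = obj
        # Larger values draw from a preallocated block of blank slots.
        self.free_list = []
        self._add_block()

    def _add_block(self):
        self.free_list.extend(ToyIntObject() for _ in range(self.BLOCK_SIZE))

    def make_int(self, value):
        if self.SMALL_MIN <= value <= self.SMALL_MAX:
            return self.small[value]  # same object every time
        if not self.free_list:
            self._add_block()  # grow by a whole block when slots run out
        obj = self.free_list.pop()
        obj.value = value  # initialize the blank slot
        return obj


pool = ToyIntPool()
a, b = pool.make_int(1), pool.make_int(1)
x, y = pool.make_int(4000), pool.make_int(4000)
print(a is b)                # True: both names refer to the cached object
print(x is y)                # False: two different slots from the block
print(x.value == y.value)    # True: the values are still equal
```

In this model, a and b share one cached object while x and y occupy two separate slots, which mirrors the True and False results from the session at the top of the post.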
References:
- http://www.laurentluce.com/posts/python-integer-objects-implementation/
- https://davejingtian.org/2014/12/11/python-internals-integer-object-pool-pyintobject/