How Python handles integers, an accidental discovery via the is keyword

04 Jun, 2024

These outputs led me to believe that Python uses the is keyword inconsistently:

>>> a = 1
>>> b = 1
>>> a is b
True
>>> x = 4000
>>> y = 4000
>>> x is y
False

My guess was:

When comparing lists, l1 is l2 means you're checking if they're the same object.
When comparing integers, a is b means you're comparing their values.

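The truth is quite surprising. It turns out that CPython, the reference implementation, makes an optimization for commonly used integers, which are defined as -5 to 256. For all of those values, int objects already exist in memory before any user code runs. (The references below describe Python 2, where the C type is PyIntObject. In Python 3 the int type is backed by PyLongObject, but the small-integer cache covers the same range.) This is an implementation detail, not something the language guarantees.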

If a name is bound to an int whose value falls within the [-5, 256] range, it receives a reference to the preallocated "small integer" object rather than a new one. That's why a and b are references to the same "small integer" 1 object.
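To see the cache for yourself, here is a minimal sketch. It assumes CPython, and the helper name is_cached is my own, not part of any library. The ints are built from strings at run time so the compiler cannot merge identical literals into one constant.

```python
def is_cached(n):
    """Return True if two ints built separately at run time are the same object."""
    a = int(str(n))  # parsed from text at run time, not a compile-time constant
    b = int(str(n))
    return a is b

# Probe a window around the range described above.
cached = [n for n in range(-10, 300) if is_cached(n)]
print(cached[0], cached[-1])
```

On the CPython versions I'm aware of, this should report -5 and 256, but since the range is an implementation detail, other interpreters or future releases may differ.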

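Python creates new int objects only for values outside of that range. This explains why x and y are not the same object: each line typed at the interactive prompt is compiled separately, so each 4000 becomes its own object. (In a script, identical literals compiled together can be merged into a single constant, so the same test may print True there.) Either way, is always checks object identity. To compare values, use ==.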
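A short sketch of the difference between the two operators, again assuming CPython and again building the ints at run time so they are not shared compile-time constants:

```python
x = int("4000")
y = int("4000")

same_value = (x == y)           # compares values
same_object = (x is y)          # compares identity
same_id = (id(x) == id(y))      # `is` is equivalent to comparing id()s

print(same_value, same_object, same_id)
```

Whatever the interpreter does with caching, same_object and same_id always agree, and same_value is the one to rely on when you care about the number itself.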

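Strictly speaking, in Python 2 these objects are not allocated from scratch each time x or y is assigned. Rather, Python 2 sets aside a block of uninitialized PyIntObjects in a structure called a PyIntBlock. When a large int is needed, a free PyIntObject from this preallocated block is returned and initialized with the supplied value. Python 3 dropped PyIntObject and PyIntBlock: ints there are variable-sized PyLongObjects, and as far as I can tell, large ones are allocated through CPython's general object allocator instead.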

References:

http://www.laurentluce.com/posts/python-integer-objects-implementation/
https://davejingtian.org/2014/12/11/python-internals-integer-object-pool-pyintobject/

#python