Comparing primitives with identity checking best practice

Primitive data types such as strings and integers should be compared using == and != rather than is and is not.

The == operator checks equality, while is checks identity. Equality asks whether two objects have the same value; identity asks whether they are literally the same object.

The difference is that a Guido van Rossum lookalike is not Guido van Rossum.
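As a minimal sketch of the distinction, the snippet below uses lists, which Python never shares automatically. The variable names are made up for illustration.

```python
# Two separate list objects that happen to hold the same contents.
original = [1, 2, 3]
lookalike = [1, 2, 3]
alias = original  # a second name for the very same object

print(original == lookalike)  # True: the contents match
print(original is lookalike)  # False: two distinct objects
print(original is alias)      # True: one object, two names
```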

Under the hood, the object's identity is determined using the id() function. id() returns an integer that is unique to the object for the lifetime of the object. For CPython this is the memory address of the object. Two objects cannot share the same memory location at the same time, after all.
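To illustrate the relationship, the following sketch compares the result of is with a comparison of id() values for objects that are alive at the same time. The names are hypothetical.

```python
first = object()
second = object()
alias = first  # another name bound to the same object

# For live objects, `a is b` gives the same answer as comparing id() values.
print(first is alias, id(first) == id(alias))    # True True
print(first is second, id(first) == id(second))  # False False
```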

An object's equality is determined by its __eq__() method. This method's purpose is to report whether the relevant characteristics of the two objects match.
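A small sketch of how a class can define its own notion of equality. Point is a hypothetical class used only for illustration, and comparing the coordinates is just one reasonable choice of "characteristics".

```python
class Point:
    """Hypothetical value class used only for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Let Python try the other operand if it is not a Point.
        if not isinstance(other, Point):
            return NotImplemented
        # Two points are equal when their coordinates match.
        return (self.x, self.y) == (other.x, other.y)


p = Point(1, 2)
q = Point(1, 2)

print(p == q)  # True: __eq__ compares the coordinates
print(p is q)  # False: p and q are separate objects
```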

Note, though, that Python performs interning as a performance optimization. Interning ensures only one copy of each distinct value is stored in memory. CPython interns the integers from -5 to 256. This saves memory for commonly used values, but it also allows bugs to creep into your code if you rely on identity checking for primitives. For example, entered line by line in the CPython interactive interpreter:

foo = 256
bar = 256
foo is bar  # outputs True

baz = 257
qux = 257
baz is qux  # outputs False

It's worth underlining that this behavior is the result of two Python features interacting: identity checking and the memory usage optimization. Because of interning, foo and bar really do refer to the same object, so is reports True, while baz and qux hold equal values in two separate objects. The result of is therefore depends on an implementation detail rather than on the values being compared.

Notice the behavior changes above 256. If you use identity checking on primitives and your unit tests only use values of 256 or below, a hard-to-debug bug could surface in production. This shows that you cannot rely on identity checking for primitives.
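A minimal sketch of the safe alternative: comparing by value gives the same answer on both sides of the 256 boundary. The helper name reached_limit is hypothetical, and the integers are built at runtime so they are not shared literals.

```python
def reached_limit(count, limit):
    """Hypothetical helper: compare by value, so any integer works."""
    return count == limit


for limit in (256, 257, 10**6):
    count = int(str(limit))  # built at runtime rather than written as a literal
    print(limit, reached_limit(count, limit))  # True for every limit
```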

Similarly, CPython automatically interns some strings, and the details change between versions. For example, string constants produced by compile-time folding were limited to 20 characters before Python 3.7, while newer versions allow up to 4096 characters. Strings built at runtime are generally not interned at all.
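The sketch below builds two equal strings at runtime and compares them by value. It also shows sys.intern(), the standard library function for requesting interning explicitly. Whether left is right holds without that call is an implementation detail, so the example deliberately does not rely on it.

```python
import sys

# Two equal strings built at runtime.
left = "".join(["py", "thon"])
right = "".join(["py", "thon"])

print(left == right)  # True: the values match, whatever the interpreter shares

# sys.intern() returns the single canonical object for a given string value.
interned_left = sys.intern(left)
interned_right = sys.intern(right)
print(interned_left is interned_right)  # True
```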

If our GitHub code review bot spots this issue in your pull request it gives this advice: