```python
x = 42
type(x)        # <class 'int'>
type(type(x))  # <class 'type'>
```
Data Model
Objects are Python’s abstraction for data. All data in a Python program is represented by objects or by relations between objects. Even code is represented by objects.
Object values and Types
Every object has an identity, a type and a value. An object’s identity never changes once it has been created; you may think of it as the object’s address in memory. The is operator compares the identity of two objects, and the id() function returns an integer representing an object’s identity.
For CPython, id(x) is the memory address where x is stored.
An object’s type determines the operations that the object supports (e.g. does it have a length?) and also defines the possible values for objects of that type. The type() function returns an object’s type (which is an object itself).
The value of some objects can change. Objects whose value can change are said to be mutable; objects whose value is unchangeable once they are created are called immutable. (The value of an immutable container object that contains a reference to a mutable object can change when the latter’s value is changed; however, the container is still considered immutable, because the collection of objects it contains cannot be changed. So, immutability is not strictly the same thing as having an unchangeable value; it is more subtle.) An object’s mutability is determined by its type; for instance, numbers, strings and tuples are immutable, while dictionaries and lists are mutable.
Note that comparison semantics are not tied to mutability: most built-in types, both mutable and immutable, are compared by value (e.g. [1, 2] == [1, 2] is True), while instances of user-defined classes are, by default, compared by identity.
Objects are never explicitly destroyed; however when they become unreachable they may be garbage collected. CPython currently uses a reference counting scheme with delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references.
Note that debugging or tracing facilities may keep objects alive that would normally be collectable. Catching an exception with try...except may also keep objects alive.
Some objects contain references to external resources such as open files and windows. It is understood that these resources are freed when the object is garbage collected, but since garbage collection is not guaranteed to happen, such objects also provide an explicit way to release the external resource, usually some kind of close() method. The try...finally and with statements provide convenient ways to do this.
Some objects contain references to other objects; these are called containers. Examples of containers are tuples, lists and dictionaries. When we talk about the value of the container, we imply the value of the contained objects.
For immutable types, operations that compute new values may actually return a reference to any existing object with the same type and value, while for mutable objects this is not allowed. For example, after a = 1; b = 1, a and b may or may not refer to the same object with the value one, depending on the implementation. This is because int is an immutable type, so the reference to 1 can be reused.
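A quick demonstration of this implementation freedom (the small-integer cache is a CPython implementation detail, not a language guarantee):

```python
a = 1
b = 1
print(a is b)   # True in CPython: small integers are cached and reused

c = [1, 2]
d = [1, 2]
print(c is d)   # False: each mutable object gets a fresh identity
print(c == d)   # True: they still compare equal by value
```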
Built-in Types in Python
There is a core set of built-in types natively supported by the Python runtime. The standard library defines additional types.
None
This type has a single value. There is a single object with this value. It is used to signify the absence of a value in many situations. Its truth value is False.
NotImplemented
This type has a single value, and again there is a single object with this value. It is returned by binary special methods (e.g. __eq__()) to signal that the operation is not implemented for the given operands.
Ellipsis
This type also has a single value, and there is a single object with this value. The object is accessed through the literal .... Its truth value is True.
Numbers
Python numbers are of course strongly related to mathematical numbers, but subject to the limitations of numerical representation in computers.
Numeric objects are immutable; once created their value never changes. Hence, they can be used as keys in a dictionary.
Integers (int)
These represent numbers in an unlimited range, subject to available virtual memory only. For the purpose of shift and mask operations, a binary representation is assumed, and negative numbers are represented in a variant of \(2\)’s complement, which gives the illusion of an infinite string of sign bits extending to the left.
CPython has a global limit on conversion between int and str: \(4{,}300\) digits by default. This limitation helps prevent DoS attacks that exploit the quadratic complexity of integer-to-string conversion. The limit applies only to decimal and other non-power-of-two bases; hexadecimal, octal and binary conversions are unlimited.
The int type in CPython is an arbitrary-length number stored in binary form (commonly known as a bignum). No known algorithm converts a string to a binary integer, or a binary integer to a string, in linear time, unless the base is a power of \(2\). Converting a large value such as int('1' * 500_000) can take over a second on a fast CPU.
The default limit is \(4300\) digits as provided in sys.int_info.default_max_str_digits.
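The limit can be inspected and adjusted at runtime. A minimal sketch (the guard reflects the assumption that these APIs exist only on CPython 3.11+):

```python
import sys

if hasattr(sys, "set_int_max_str_digits"):
    print(sys.int_info.default_max_str_digits)   # 4300
    sys.set_int_max_str_digits(20_000)           # raise the per-process limit
    n = int("7" * 10_000)                        # exceeds the 4300-digit default, now fine
    print(len(str(n)))                           # 10000
```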
Booleans (bool)
These represent the truth values False and True. The two objects representing the values False and True are the only Boolean objects.
Real numbers (float)
These represent IEEE-754 double-precision 64-bit floating point numbers.
```python
import sys
sys.float_info
# sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308,
#                min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307,
#                dig=15, mant_dig=53, epsilon=2.220446049250313e-16,
#                radix=2, rounds=1)
```
Note that float represents infinity as float('inf'), which compares greater than any finite numerical value.
Complex numbers (complex)
These represent complex numbers as a pair of double-precision floating-point numbers. The real and imaginary parts of a complex number z can be retrieved through the read-only attributes z.real and z.imag.
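A quick illustration:

```python
z = 3 + 4j
print(z.real)   # 3.0
print(z.imag)   # 4.0
print(abs(z))   # 5.0, the magnitude of z
```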
Sequences
These represent finite ordered sets indexed by non-negative numbers. The built-in function len() returns the number of items of a sequence. When the length of a sequence is \(n\), the index set contains the numbers \(0,1,\ldots,n-1\). Item \(i\) of a sequence \(a\) is selected by \(a[i]\). Some sequences, including the built-in ones, interpret negative subscripts by adding the sequence length. For example, a[-2] means a[n-2], the second-to-last item of a sequence a of length n.
Sequences also support slicing: a[i:j:k] selects all items with index \(x\) where \(x = i + nk\) for integer \(n \geq 0\) and \(i \leq x < j\) (for positive \(k\)).
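For example, with i = 2, j = 8 and k = 2, the selected indices are 2, 4 and 6:

```python
a = list(range(10))
print(a[2:8:2])   # [2, 4, 6]
print(a[-2])      # 8, i.e. a[len(a) - 2]
```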
Immutable sequences
An object of an immutable sequence type cannot change once it is created. (If an object contains references to other objects, these other objects may be mutable and may be changed; however the collection of objects directly referenced by an immutable object cannot change.)
Strings
A string is an immutable sequence of values that represent Unicode code points. Python doesn’t have a separate char type; a character is simply a string of length one.
Tuples
The items of a tuple are arbitrary Python objects. Tuples of two or more items are formed by comma-separated lists of expressions. A tuple of one item (a singleton) is formed by affixing a comma to an expression. An empty tuple can be formed by an empty pair of parentheses.
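To illustrate the three forms:

```python
empty = ()            # empty tuple
single = (42,)        # the trailing comma, not the parentheses, makes the tuple
pair = "a", "b"       # parentheses are optional for two or more items
print(len(empty), len(single), len(pair))   # 0 1 2
```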
Bytes
A bytes object is an immutable array. The items are \(8\)-bit bytes, represented in the range \(0 \leq x < 256\). Bytes literals like b'abc' and the bytes() constructor can be used to create bytes objects.
Mutable sequences
Mutable sequences can be changed after they are created. The subscript and slicing notations can be used as a target of assignment and del (delete) statements.
There are currently two intrinsic mutable sequence types:
- Lists : The items of a list are arbitrary Python objects. Lists are formed by a comma separated list of expressions inside square brackets.
- Byte Arrays : A bytearray object is a mutable array. It is created by the built-in bytearray() constructor. Aside from being mutable, byte arrays provide the same interface and functionality as immutable bytes objects.
Since lists and bytearrays are mutable, they are unhashable.
Set types
These represent unordered, finite sets of unique elements. As such they cannot be indexed by a subscript. However, they can be iterated over, and the built-in function len() returns the number of items in a set. Common uses of sets are fast membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference and symmetric difference.
Note that if two numbers compare equal (e.g. 1 and 1.0), only one of them can be contained in a set.
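A quick demonstration (True == 1 as well, since bool is a subclass of int):

```python
s = {1, 1.0, True}
print(s)        # {1}: only the first of the equal elements is kept
print(len(s))   # 1
```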
There are currently two intrinsic set types:
Sets
These represent mutable sets. They are created by the built-in set() constructor and can be modified afterwards by methods such as add().
Frozen Sets
These represent an immutable set. They are created by the built-in frozenset() constructor. As a frozenset is immutable and hashable, it can be used again as an element of another set, or as a dictionary key.
Mappings
These represent finite sets of objects indexed by arbitrary index sets. The subscript notation a[k] selects the item indexed by k from the mapping a.
There is currently a single intrinsic mapping type: dict.
Dictionaries
They represent key-value pairs \(\texttt{key}\mapsto \texttt{value}\), or a set of mappings. A dictionary key can be any hashable type. Thus, the numeric types (int, bool, float, complex), the immutable sequence types (str, tuple, bytes) and the immutable set type (frozenset) can all be used as dictionary keys. The only values not acceptable as keys are those containing lists, dictionaries or other mutable types that are compared by value rather than by object identity. The reason for this is that an efficient implementation of dictionaries requires a key’s hash value to remain constant. Numeric types used for keys obey the normal rules for numeric comparison: if two numbers compare equal (e.g. 1 and 1.0), then they can be used interchangeably to index the same dictionary entry.
Dictionaries preserve insertion order: iterating over a dictionary yields keys in the order they were added. Replacing the value of an existing key does not change the order, but removing a key and re-inserting it appends it to the end instead of keeping its old place.
Dictionaries are mutable; they can be created by the {} notation.
An immutable container is hashable only if all of its elements are hashable. For example, a list is mutable and therefore not hashable. The tuple a = (1,) is hashable, while the tuple b = (2, [1, 2, 3]) is not, because it contains an unhashable element.
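Verifying this with hash():

```python
a = (1,)
print(hash(a) == hash((1,)))   # True: equal tuples hash equally

b = (2, [1, 2, 3])
try:
    hash(b)
except TypeError as exc:
    print(exc)                  # unhashable type: 'list'
```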
Callable types
These are the types to which the function call operation can be applied.
User-defined functions
A user-defined function object is created by a function definition.
Instance methods
An instance method object combines a class, a class instance and any callable object (normally a user-defined function).
Generator functions
A function or method which uses the yield statement is called a generator function. Such a function, when called, always returns an iterator object, which can then be used to execute the body of the function: calling the iterator’s __next__() method will cause the function to execute until it provides a value using the yield statement. When the function executes the return statement or falls off the end, a StopIteration exception is raised and the iterator will have reached the end of the set of values to be returned.
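A minimal sketch of a generator function:

```python
def countdown(n):
    """Yield n, n-1, ..., 1."""
    while n > 0:
        yield n
        n -= 1

g = countdown(3)
print(next(g))    # 3: the body runs until the first yield
print(list(g))    # [2, 1]: exhausting the iterator raises StopIteration internally
```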
Coroutine functions
A function or method which is defined using async def is called a coroutine function. A coroutine is a function that can be paused in the middle and resumed later. Such a function, when called, returns a coroutine object. It may contain await expressions, as well as async with and async for statements.
Asynchronous generator functions
A function or method which is defined using async def and which uses the yield statement is called an asynchronous generator function. Such a function, when called, returns an asynchronous iterator object which can be used in an async for statement to execute the body of the function.
Calling the asynchronous iterator’s __anext__() method will return an awaitable which, when awaited, will execute until it provides a value using the yield expression. When the function executes an empty return statement or falls off the end, a StopAsyncIteration exception is raised and the asynchronous iterator will have reached the end of the values to be yielded.
Built-in functions
A built-in function object is a wrapper around a C function. Examples of built-in functions are len(), zip() etc.
Built-in methods
This is really a different disguise of a built-in function, this time containing an object passed to the C function as an implicit extra argument. An example of a built-in method is .append() on a list object.
Classes
Classes are callable. These objects normally act as factories for new instances of themselves, but variations are possible for class types that override __new__().
Modules
The most basic organization unit of code in Python is a module. Python code in one module gains access to the code in another module by the process of importing it.
Special method names
A class can implement certain operations that are invoked by special syntax (such as arithmetic operations, subscripting or slicing). This is Python’s approach to operator overloading: allowing classes to define their own behavior with respect to language operators. For example, if a class defines a method __getitem__(self, pos) and x is an instance of the class, then x[i] is roughly equivalent to x.__getitem__(i).
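A small sketch of this equivalence (Squares is a made-up example class):

```python
class Squares:
    def __getitem__(self, pos):
        return pos * pos

x = Squares()
print(x[5])               # 25
print(x.__getitem__(5))   # 25: what x[5] roughly expands to
```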
Setting a special method to None means that the operation is not available.
Basic customization
__new__(cls,...)
This method is called to create a new instance of the class cls. __new__() is mainly intended to allow subclasses of immutable types (like int, str, tuple) to customize instance creation. If __new__() does not return an instance of cls, then the new instance’s __init__() method won’t be invoked.
If __new__(cls,...) is invoked during object construction and it returns an instance of cls, then the new instance’s .__init__() method will be invoked like __init__(self,...) where self is the new instance and the remaining arguments are the same as were passed to the object constructor.
__init__(self,...)
This method is called after the instance has been created by __new__(), but before it is returned to the caller. The arguments of __init__() are those passed to the class constructor and are typically used to initialize member-data of the class.
If a base class has an __init__() method, the derived class’s __init__() method must explicitly call it to ensure proper initialization of the base class member data using super().__init__([args...]).
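A minimal sketch (Base and Derived are illustrative names):

```python
class Base:
    def __init__(self, value):
        self.value = value

class Derived(Base):
    def __init__(self, value, extra):
        super().__init__(value)   # ensure Base's member data is initialized
        self.extra = extra

d = Derived(1, 2)
print(d.value, d.extra)   # 1 2
```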
object.__del__(self)
This method is called when the instance is about to be destroyed. This is also the so called finalizer or destructor. If a base class has a __del__() method, the derived class’ __del__() method, if any, must explicitly call the base class’ __del__().
del x doesn’t directly call x.__del__(): the former decrements the reference count for x by one, and the latter is only called when x’s reference count reaches zero.
object.__repr__(self)
This method is called by the repr() built-in function to compute the official string representation of the object.
object.__str__(self)
This method is called by str(object), by the default __format__() implementation, and by the built-in function print() to compute the informal string representation of an object.
object.__lt__(self, other) and friends
These are the so-called rich-comparison methods. The correspondence between operator symbols and method names is as follows: x<y calls x.__lt__(y), x<=y calls x.__le__(y), x==y calls x.__eq__(y), x!=y calls x.__ne__(y), x>y calls x.__gt__(y), and x>=y calls x.__ge__(y).
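A sketch of a class implementing __eq__() and __lt__(); functools.total_ordering fills in the remaining comparison methods (Version is a made-up example class):

```python
import functools

@functools.total_ordering
class Version:
    def __init__(self, major, minor):
        self.major, self.minor = major, minor

    def __eq__(self, other):
        return (self.major, self.minor) == (other.major, other.minor)

    def __lt__(self, other):
        return (self.major, self.minor) < (other.major, other.minor)

print(Version(1, 2) < Version(1, 10))    # True: calls __lt__
print(Version(1, 2) >= Version(1, 10))   # False: derived by total_ordering
```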
object.__hash__(self)->int
The __hash__() method should return an integer. Objects that compare equal must have the same hash value.
User-defined classes have __eq__() and __hash__() methods by default (inherited from the object class); with them, all objects compare unequal (except themselves) and x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y).
Special attributes
object.__dict__
A dictionary or other mapping object used to store an object’s (writable) attributes.
Modules and Packages
The import statement
import <module_name>
When Python executes this statement, it searches for the module in:
- the module cache, sys.modules
- the directory of the script being run (or the current directory)
- the list of directories contained in the PYTHONPATH environment variable
- installation-dependent default directories
Note that import <module_name> does not make the module contents directly accessible to the caller. A module creates a separate namespace.
The statement import <module_name> only places the <module_name> in the caller’s symbol table. The objects that are defined in the module remain in the module’s private symbol table.
From the caller, objects in the module are only accessible when prefixed with the <module_name> using the dot notation.
When a module is first imported, Python searches for the module and if found, it creates a module object, initializing it. If the named module cannot be found, a ModuleNotFoundError is raised. Python employs various strategies to search for the module when the import machinery is invoked.
from <package_name> import <module_name>
An alternate form of the import statement allows individual objects from the module to be imported directly into the caller’s symbol table:
```python
# mod.py
s = "Impossible is the word in the dictionary of fools."

def foo(arg):
    print(f"arg = {arg}")

class Foo:
    def __init__(self):
        pass
```

```python
# main.py
from mod import s, foo

print(f"s : {s}")
# s : Impossible is the word in the dictionary of fools.

foo('qux')
# arg = qux

from mod import Foo
x = Foo()
x
# <mod.Foo object at 0x02E3AD50>
```

Because this form of import places the object names directly into the caller’s symbol table, any objects that already exist with the same name will be overwritten.
import <module_name> as <alt_name>
You can also import an entire module under an alternate name:
For example, after import numpy as np, the module object is placed in the caller’s symbol table under the name np, and its contents are accessed through that name, e.g. np.sum().
Module contents can be imported from within a function definition. In that case, the import does not occur until the function is executed.
Executing a module as a script
Any .py file that contains a module is also a Python script, and you can execute it using python <my_module>.py.
We can distinguish between when the file is loaded as a module and when it is run as a standalone script. When a .py file is imported as a module, Python sets the special dunder variable __name__ to the name of the module. However, if the file is run as a standalone script, __name__ is set to the string __main__.
Modules are often designed with the capability to run as a standalone script for the purposes of testing the functionality that is contained within the module. This is called unit testing.
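The standard idiom uses this distinction (a minimal sketch; greeting.py and greet are made-up names):

```python
# greeting.py (hypothetical module)
def greet(name: str) -> str:
    return f"Hello, {name}!"

if __name__ == "__main__":
    # Runs only when this file is executed as a script, not when imported
    print(greet("world"))
```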
Namespaces
A python namespace is a mapping from names to objects. A namespace is like a container that holds the currently defined symbolic names and the objects each name references. You can think of a namespace as a dictionary, in which the keys are the object names and the values are the objects themselves. Each key-value pair maps a name to its corresponding object.
Namespaces let you use the same name in different contexts without collisions. It’s like giving everything its own little room in a well-organized house.
dir() function
The built-in function dir() returns a list of defined names in a namespace. For example, in the Jupyter session used to prepare these notes:

```python
dir()
# ['In', 'List', 'Out', '_', '_3', '__', '___', '__builtin__', '__builtins__',
#  '__name__', '_dh', '_i', '_i1', '_i2', '_i3', '_i4', '_i5', '_i6', '_i7',
#  '_i8', '_ih', '_ii', '_iii', '_oh', 'empty_tuple_example', 'exit',
#  'get_ipython', 'ojs_define', 'open', 'quit', 'sum_arr', 'sys', 'x']
```

(The underscore-prefixed names and In/Out are IPython bookkeeping, not user definitions.)
There are \(4\) different types of namespaces:
- Built-in
- Global
- Local
- Enclosing or non-local
These namespaces have differing lifetimes. As Python executes a program, it creates namespaces as necessary and removes them when it no longer needs them. Typically, many namespaces will exist at any given time.
The global, local and enclosing namespaces are implemented as dictionaries. In contrast, the built-in namespace isn’t a dictionary, but a module called builtins.
builtin namespace
The built-in namespace contains the names of all Python’s built-in objects. This namespace is available when the python interpreter is running. So, you can access the names in this namespace at any time in your code without explicitly importing them.
You can list the objects in the built-in namespace with the dir() function using __builtins__ as an argument:
dir(__builtins__)
Things like built-in exceptions, built-in functions, built-in data types live in this namespace.
The Global namespace
The global namespace contains names defined at the module level. Python creates a main global namespace when the main program’s body starts. This namespace remains in existence until the interpreter terminates.
Additionally, each module has its own global namespace. The interpreter creates a global namespace for any module your program loads with the import statement.
The local namespace
The Python interpreter creates a new and dedicated namespace whenever you call a function. The namespace is local to the function and exists only until the function returns.
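For example:

```python
def f():
    y = 10                  # y lives in f's local namespace
    return "y" in locals()

print(f())    # True
# print(y)    # NameError: y does not exist once f has returned
```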
The enclosing namespace
You can also define one function inside another; the nested function is known as an inner function.
```python
global_variable = "global"

def outer_func():
    # Enclosing (nonlocal) scope
    nonlocal_var = "nonlocal"

    def inner_func():
        # Local scope
        local_variable = "local"
        print(f"Hi from the '{local_variable}' scope!")
        print(f"Hi from the '{nonlocal_var}' scope!")
        print(f"Hi from the '{global_variable}' scope!")

    inner_func()

outer_func()
# Hi from the 'local' scope!
# Hi from the 'nonlocal' scope!
# Hi from the 'global' scope!
```
In this example, we first created a global variable at the module level. Then, we defined a function called outer_func(). Inside this function, we have a nonlocal_var, which is local to outer_func(), but non-local to inner_func(). In inner_func(), we create another variable called local_variable, which is local to the function itself.
Each of these namespaces remains in existence until the respective function returns. Python might not immediately reclaim the memory allocated for those namespaces, but all references to the objects become invalid.
Scope and LEGB rule
The existence of multiple, distinct namespaces allows you to have several different instances of a particular name simultaneously while a Python program runs. As long as each instance is in a different namespace, they’re all maintained separately and won’t interfere with one another.
That raises a question. Suppose you refer to the name x in your code, and x exists in several namespaces. How does Python know which one you mean each time?
Python looks for x in the following order:
\[ \text{Local} \longrightarrow \text{Enclosed} \longrightarrow \text{Global} \longrightarrow \text{Built-in} \]
The code snippet below agrees with our intuitive understanding of global and local variables: an assignment inside f creates a local x that shadows, but does not modify, the global x.

```python
x = 42

def f():
    x = 17        # a new local x; the global x is untouched
    print(x)      # 17

f()
print(x)          # 42
```
If we want to modify the value of the global variable x from within f, we can use the global keyword.
```python
x = 42

def f():
    global x   # rebind the module-level x
    x = 17

f()
print(x)  # 17
```
Similarly, when we have nested functions, we can use the nonlocal keyword.
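A sketch of nonlocal in action (counter and increment are made-up example names):

```python
def counter():
    count = 0

    def increment():
        nonlocal count   # rebind the enclosing function's variable
        count += 1
        return count

    return increment

inc = counter()
print(inc(), inc(), inc())   # 1 2 3
```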
Packages
To help organize modules and provide a naming hierarchy, Python has a concept of packages. You can think of packages as the directories on a file system and modules as files within directories, but don’t take this analogy too literally, since packages and modules need not originate from the file system. For the purposes of this blog note, we’ll use this convenient analogy of directories and files. Like file system directories, packages are organized hierarchically, and packages may themselves contain subpackages as well as regular modules.
It’s important to keep in mind that all packages are modules, but not all modules are packages. Or put another way, packages are just a special kind of module. Specifically, any module that contains a __path__ attribute is considered a package.
All modules have a name. Subpackage names are separated from their parent package name by a dot, akin to Python’s standard attribute access syntax.
import pkg only places the name pkg in the caller’s symbol table; it doesn’t import any of the package’s submodules.
Package initialization
If a file named __init__.py is present in the package directory, it is invoked when the package or a module in the package is imported. __init__.py can be used to effect automatic importing of modules from a package.
Earlier, we saw that the import pkg statement only places the name pkg in the caller’s symbol table and doesn’t import any modules. But, if __init__.py in the pkg directory contains the following:
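For instance, pkg/__init__.py might contain (a sketch; mod1 and mod2 are the example module names):

```python
# pkg/__init__.py
import pkg.mod1, pkg.mod2
```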
Then when you execute import pkg, modules mod1 and mod2 are automatically imported.
Importing * from a package
When import * is used for a module, all objects from the module are imported into the local symbol table, except those whose names begin with an underscore.
The analogous statement for a package is from <package_name> import *.
What does that do? Not much! Python follows the following convention: if the __init__.py file in the package directory contains a list named __all__, it is taken to be the list of modules that should be imported when the statement from <package_name> import * is encountered.
Subpackages
Packages can contain nested packages to arbitrary depth. Consider
```
pkg/
    __init__.py
    sub_pkg_1/
        __init__.py
        mod_in_sub_pkg_1.py
    sub_pkg_2/
        __init__.py
        mod_in_sub_pkg_2.py
```
We can use both absolute imports and relative imports.
- Absolute imports use the fully qualified name, e.g. pkg.sub_pkg_1.mod_in_sub_pkg_1.
- Relative imports use dots: .. evaluates to the parent package, and ..<some_sub_pkg> evaluates to a sub-package of the parent package.
Regular packages
Python defines two types of packages : regular packages and namespace packages. Regular packages are traditional packages as they existed in Python 3.2 and earlier. A regular package is typically implemented as a directory containing an __init__.py file. When a regular package is imported, this __init__.py is executed and the objects it defines are bound to the names in the package’s namespace.
Consider the following file-system layout that defines a top level parent package with three subpackages:
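A layout matching this description (a sketch):

```
parent/
    __init__.py
    one/
        __init__.py
    two/
        __init__.py
    three/
        __init__.py
```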
Importing parent.one will implicitly execute parent/__init__.py and parent/one/__init__.py. Subsequent imports of parent.two and parent.three will execute parent/two/__init__.py and parent/three/__init__.py respectively.
Namespace packages
A namespace package is a composite of various portions where each portion contributes a subpackage to the parent package. Portions may reside in different locations of the file system.
Searching
To begin searching, Python needs the fully qualified name of the module (or package, but for the purposes of this blog note, the difference is immaterial) being imported.
The fully qualified name will be used in various phases of the import search, and it may be a dotted path to a submodule, e.g. foo.bar.baz. In this case, Python first tries to import foo, then foo.bar, and finally foo.bar.baz. If any of the intermediate imports fail, a ModuleNotFoundError is raised.
The module cache
The first place checked during import search is sys.modules. This mapping serves as a cache of all modules that have been previously imported, including the intermediate paths. So, if foo.bar.baz was previously imported, sys.modules will contain entries for foo, foo.bar and foo.bar.baz. Each key will have as its value the corresponding module object.
If module A imports module B, and module B then tries to import module A, Python won’t re-import A. Since A’s name is already present in sys.modules (even though the module object is not fully loaded yet), the import system simply returns the existing (but partially initialized) module instead of loading it again.
Challenge puzzle
What is printed when you run python main.py? Note that the comments delineate separate files.
```python
# alpha.py
print("alpha", end="")
import beta
print("X", end="")
```

```python
# beta.py
print("beta", end="")
import alpha
print("Y", end="")
```

```python
# main.py
print("main", end="")
import alpha
print("Z", end="")
```

For more such puzzles, visit getcracked.io.
Python Type Checking
Type systems
All programming languages include some kind of type system that formalizes the categories of objects it can work with and how those categories are treated.
Dynamic typing
Python is a dynamically typed language. This means that the Python interpreter does type checking only at runtime, and that the type of a variable is allowed to change over its lifetime.
Static type
Many languages, such as C, C++ and Java, are statically typed: the type (and size) of each variable is known at compile time.
Python will always remain a dynamically typed language. However, PEP-484 introduced type hints, which makes it possible to also do static type checking of Python code.
Unlike how types work in most other statically typed languages, type hints by themselves don’t cause Python to enforce types. As the name suggests, type hints just suggest types. There are other tools like mypy that perform static type checking using type hints.
Duck Typing
Another term that is often used when talking about Python is duck typing. This moniker comes from the phrase: “If it walks like a duck, and quacks like a duck, it must be a duck!”.
Duck typing is a concept related to dynamic typing, where the type or class of the object is less important than the methods it defines. Using duck typing, you do not check types at all. Instead, you check for the presence of a given method or attribute.
For example, if I define:
```python
class TheHobbit:
    def __len__(self):
        return 1024

the_hobbit = TheHobbit()
print(f"Length = {len(the_hobbit)}")
# Length = 1024
```
I would say that all strings, tuples, lists, dicts and instances of TheHobbit have a length.
typing module
Python’s typing module provides tools for adding type hints to your code. Type hints make your programs more readable and safer to refactor, and they help static type checkers like mypy catch errors before runtime.
```python
from typing import TypedDict

class Person(TypedDict):
    name: str
    age: int

def print_people(people: list[Person]) -> None:
    for person in people:
        print(f"{person['name']} is {person['age']} years old")

print_people(
    [{"name": "Johannes Bernoulli", "age": 50},
     {"name": "Carl Friedrich Gauss", "age": 60}]
)
# Johannes Bernoulli is 50 years old
# Carl Friedrich Gauss is 60 years old
```
Features
- Supports type hinting for variables, functions and classes
- Allows for custom type creation using TypeVar and NewType
- Supports defining reusable type aliases for better code readability
- Enables complex type definitions
- Supports defining structural subtyping interfaces with protocols
Frequently used classes and functions
| Object | Type | Description |
|---|---|---|
| typing.NamedTuple | class | Creates tuple-like classes with named fields |
| typing.TypedDict | class | Creates dictionary-like classes with a fixed set of keys |
| typing.List | class | Represents a list (a deprecated alias of list since Python 3.9) |
| typing.Callable | class | Represents a callable with a specified signature |
| typing.Any | special form | Represents a value of any type |
| typing.TypeVar() | function | Creates a generic type variable |
| typing.NewType() | function | Creates a distinct type based on an existing one |
| typing.Protocol | class | Defines a structural subtyping interface that classes can implement |
Examples
Define a class using NamedTuple and a function that takes it as an argument.
```python
from typing import NamedTuple
import datetime as dt

class Employee(NamedTuple):
    employee_id: int
    name: str
    date_of_joining: dt.date

def greet_employee(employee: Employee) -> str:
    return f"Hi {employee.name}, employee_id : {employee.employee_id}, welcome to the firm!"

greet_employee(Employee(1, "Alice", dt.date(2025, 1, 1)))
# 'Hi Alice, employee_id : 1, welcome to the firm!'
```
We can define a type alias to simplify complex type annotations:
import datetime as dt
from typing import Dict, Union
UserData = Dict[str, Union[str, dt.date]]
def format_user(user: UserData) -> str:
    return f"{user['name']} has last login date {user['last_login']}"
format_user({"name": "John Carmack", "last_login": dt.date(2025,6,30)})
'John Carmack has last login date 2025-06-30'
Create a protocol and implement it in a class:
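The code for this example appears to be missing; here is a minimal sketch, with Greeter and FriendlyBot as illustrative names not taken from the original:

```python
from typing import Protocol

class Greeter(Protocol):
    def greet(self, name: str) -> str: ...

# No explicit inheritance from Greeter: the class satisfies
# the protocol structurally by having a matching greet() method.
class FriendlyBot:
    def greet(self, name: str) -> str:
        return f"Hello, {name}!"

def welcome(greeter: Greeter, name: str) -> str:
    return greeter.greet(name)

print(welcome(FriendlyBot(), "Ada"))
```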
Function Annotations
For functions you can annotate arguments and return value. This is done as follows:
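As a sketch (headline is an illustrative function, not from the original): parameter annotations go after a colon, and the return annotation goes after `->`:

```python
def headline(text: str, centered: bool = False) -> str:
    # argument annotations after ':', return annotation after '->'
    if not centered:
        return f"{text.title()}\n{'-' * len(text)}"
    return f" {text.title()} ".center(50, "o")

print(headline("python type checking"))
```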
There are special mypy expressions : reveal_type() and reveal_locals() that you can add to your code and mypy will report which types it has inferred.
Variable annotations
We can also annotate variable definitions just like function signatures.
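A few illustrative variable annotations (the names are made up for the example):

```python
pi: float = 3.142
names: list[str] = ["Guido", "Jukka", "Ivan"]
options: dict[str, bool] = {"centered": False, "capitalize": True}

# An annotation without a value only declares the intended type;
# the variable itself does not exist until it is assigned.
reveal: int

print(pi, names, options)
```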
Implementing a simple card game
import random
from typing import Dict, List, Tuple
Card = Tuple[str, str]
Deck = List[Card]
Hand = List[Card]
Player = str
suits = "♠ ♡ ♢ ♣".split()
ranks = [str(n) for n in range(2, 11)] + "J Q K A".split()
def create_deck(shuffle: bool = False) -> Deck:
"""Create a new deck of 52 cards"""
deck = [(s,r) for s in suits for r in ranks]
if shuffle:
random.shuffle(deck)
return deck
def deal_hands(deck : Deck) -> Tuple[Hand, Hand, Hand, Hand]:
"""Deal the cards in the deck into four hands"""
return (deck[0::4], deck[1::4], deck[2::4], deck[3::4])
def play():
"""Play a 4-player card game"""
deck: Deck = create_deck(shuffle=True)
players: List[Player] = "P1 P2 P3 P4".split()
hands: Dict[Player, Hand] = {player: hand for player, hand in zip(players, deal_hands(deck))}
for player, cards in hands.items():
card_str = [f"{s}{r}" for (s,r) in cards]
print(f"{player}: {card_str}")
play()
P1: ['♠5', '♣5', '♡9', '♢K', '♡7', '♠A', '♢A', '♠7', '♢Q', '♠2', '♣8', '♡2']
P2: ['♢2', '♡Q', '♡3', '♠Q', '♠9', '♡K', '♠6', '♢7', '♣3', '♢5', '♡8', '♢9']
P3: ['♢8', '♠K', '♣7', '♡J', '♡6', '♠8', '♣4', '♢6', '♣J', '♣9', '♠4', '♢J']
P4: ['♠3', '♡A', '♣A', '♢3', '♣6', '♠J', '♡4', '♡5', '♢4', '♣2', '♣Q', '♣K']
Type theory
Subtypes
One important concept is that of subtypes. If S is a subtype of T, written \(S <: T\), then S can be safely used in any context where T is expected. The following two conditions must hold:
- \(\forall s \in S\), \(s \in T\): every value of S is also a value of T.
- Every function defined on T is also defined on S. In particular, if \(S <: T\), then Callable[[T], Any] <: Callable[[S], Any].
For example bool <: int, because bool takes the values \(0\) and \(1\), which also belong to the set of values an int takes on, and every function defined on int is also defined on bool.
Covariance
Consider the following code snippet:
# covariance.py
from typing import Tuple
class Animal:
...
class Dog(Animal):
...
an_animal: Animal = Animal()
lassie: Dog = Dog()
snoopy: Dog = Dog()
animals: Tuple[Animal,...] = (an_animal, lassie)
dogs: Tuple[Dog, ...] = (lassie, snoopy)
dogs = animals
>>> mypy ./covariance.py
covariance.py:16: error: Incompatible types in assignment
    (expression has type "tuple[Animal, ...]",
    variable has type "tuple[Dog, ...]")  [assignment]
Found 1 error in 1 file (checked 1 source file)
If \(S <: T\), then Tuple[S, ...] <: Tuple[T, ...]. This is because Tuple is covariant in all its arguments. Covariant types preserve the ordering <: of types.
In Python, most immutable containers are covariant. Union is covariant in all its arguments. Callables are covariant in the return type.
Contravariance
Languages with first-class functions have function types, like a function expecting a Dog and returning an Animal (written Callable[[Dog], Animal], or Dog -> Animal). Those languages also need to specify when one function type is a subtype of another. It is safe to substitute a function f for a function g if f accepts a more general type of argument and returns a more specific type than g. For example, Animal -> Dog, Dog -> Dog and Animal -> Animal can all be used wherever a Dog -> Animal is expected. Function types are therefore contravariant in their argument types.
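A sketch of this substitution reusing the Animal/Dog classes (register and clone are illustrative names):

```python
from typing import Callable

class Animal: ...
class Dog(Animal): ...

def register(callback: Callable[[Dog], Animal]) -> Animal:
    # callback will only ever be handed a Dog here
    return callback(Dog())

# Accepts a *more general* argument (Animal) and returns a
# *more specific* result (Dog): a valid substitute for Dog -> Animal.
def clone(a: Animal) -> Dog:
    return Dog()

result = register(clone)
print(type(result).__name__)  # Dog
```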
Invariance
An invariant type is neither covariant nor contravariant. Mutable containers like List are invariant: List[Dog] is not a subtype of List[Animal], even though Dog <: Animal, because code given a List[Animal] could legally insert a Cat into it.
Gradual typing and consistent types
Python supports gradual typing, where you can gradually add type hints to your Python code. Gradual typing is essentially made possible by the Any type.
Any sits both at the top and at the bottom of the subtype hierarchy: every type behaves as if it is a subtype of Any, and Any behaves as if it is a subtype of every other type. Looking at the definition of subtypes, this is not really possible. Instead, we talk about consistent types.
The type T is consistent with the type U if T is a subtype of U or either T or U is Any.
The type checker only complains about inconsistent types. The takeaway is therefore that you will never see type errors arising from the Any type.
Type Variables
A type variable is a special variable that can take on any type, depending on the situation. The syntax for TypeVar is straightforward:
Here, T is the name of the type variable. You can use any name, but single uppercase letters like T are conventional. Let’s create a simple generic function:
from typing import TypeVar
T = TypeVar('T')
def echo(value: T) -> T:
return value
print(echo(42))
print(echo("Hello"))Generic Classes
TypeVar defines a type variable, a placeholder, aka a variable that stores a type. Generic lets you use that type variable in a class.
from typing import TypeVar, Generic
T = TypeVar('T')
class Box(Generic[T]):
def __init__(self, content:T):
self.content = content
def get_content(self) -> T:
return self.content
box = Box(123)
print(box.get_content())  # Outputs: 123
Now, you can create type-safe boxes.
At runtime, Generic does almost nothing special - it’s mainly for type checkers (mypy, pyright etc.). Python doesn’t enforce types at runtime.
You can have multiple type variables.
Constraints and Bounds
TypeVars can be constrained:
Or bounded(must be subclass of):
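The code for these two variants seems to be missing; here is a hedged sketch of both flavors (concat, Shape, Circle and largest are illustrative names):

```python
from typing import TypeVar

# Constrained type variable: S must be exactly str or bytes.
S = TypeVar("S", str, bytes)

def concat(a: S, b: S) -> S:
    return a + b

print(concat("Hello, ", "world"))

# Bounded type variable: T must be Shape or a subclass of it.
class Shape:
    def area(self) -> float:
        return 0.0

class Circle(Shape):
    def __init__(self, r: float) -> None:
        self.r = r
    def area(self) -> float:
        return 3.14159 * self.r ** 2

T = TypeVar("T", bound=Shape)

def largest(shapes: list[T]) -> T:
    # the bound guarantees every element has .area()
    return max(shapes, key=lambda s: s.area())

print(largest([Shape(), Circle(2.0)]).area())
```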
Duck Types and Protocols
Recall the following example from the introduction:
len() can return the length of any object that has implemented the .__len__() method. How can we add type hints to len() and in particular the obj argument?
The answer hides behind the academic sounding term structural subtyping. One way to categorize type systems is by whether they are nominal or structural:
- In a nominal system, comparisons between types are based on names and declarations. The Python type system is mostly nominal, where an int can be used in place of a float because of their subtype relationship.
- In a structural system, comparisons between types are based on structure. You could define a structural type Sized that includes all instances that define .__len__(), irrespective of their nominal type.
Structural subtyping was brought to Python through PEP 544, which adds a concept called protocols; typing.Protocol has been available since Python 3.8 and is fully supported by mypy.
A protocol specifies one or more methods that must be implemented. For example, all classes defining .__len__() fulfill the typing.Sized protocol. We can therefore annotate len() as follows:
Other examples of protocols defined in the typing module include Container, Iterable, Awaitable and ContextManager.
You can also define your own protocols. This is done by inheriting from Protocol and defining the function signatures (with empty function bodies) that the protocol expects. The following example shows how len() and Sized could have been implemented:
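A sketch of how Sized and a len()-like function could be written (the function is named length here to avoid shadowing the built-in):

```python
from typing import Protocol

class Sized(Protocol):
    def __len__(self) -> int: ...

# The built-in len() could be annotated the same way:
# anything with a __len__() method satisfies Sized.
def length(obj: Sized) -> int:
    return len(obj)

print(length("hello"), length([1, 2, 3]))
```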
How do you add type hints to a large python codebase?
TODO
Challenge Puzzle
Given the snippet below, (i) Is this valid Python code and (ii) will mypy raise an error?
In Python, the syntax for variable annotations allows you to declare the expected type of a variable even before assigning a value. This feature was introduced in PEP 526 to improve readability and static type checking without affecting runtime behavior. Declaring a variable with an annotation but no initial value simply informs tools like mypy of the intended type, without actually creating the variable until it’s assigned later in the code. Moreover, it’s not a syntax error.
For more such puzzles, visit getcracked.io
Control flow statements
Python supports if, for and while statements.
In a for or while loop, the break statement may be paired with an else clause. If the loop finishes without executing the break, the else clause executes.
In either kind of loop, the else clause is not executed if the loop was terminated by a break. Other ways of ending the loop early, such as a return statement or a raised exception, will also skip the execution of the else clause.
One way to think of the else clause is to imagine it paired with the if inside the loop.
The else clause has more in common with the else clause of a try statement than with that of if statements: a try statement’s else clause runs when no exception occurs, and a loop’s else clause runs when no break occurs.
match statements
A match statement takes an expression and compares its value to successive patterns given as one or more case blocks. This is similar to pattern matching in languages like Rust or Haskell. Only the first pattern that matches gets executed, and it can also extract components (sequence elements or object attributes) from the value into variables.
There are several other key features of the match statement:
- Like unpacking assignments, tuple and list patterns have exactly the same meaning and actually match arbitrary sequences.
- Sequence patterns support extended unpacking: [x, y, *rest] and (x, y, *rest) work similar to unpacking assignments. The name after * may also be _, so (x, y, *_) matches a sequence of at least two items without binding the remaining items.
Examples
Reverse a list
Find out whether a list is a palindrome
from typing import List, TypeVar
T = TypeVar('T')
def is_palindrome(lst: List[T]) -> bool:
match lst:
case []: return True
case [x]: return True
case [head, *tail]:
if head == tail[-1]:
return is_palindrome(tail[:-1])
else:
return False
print(is_palindrome([1,2,2,1]))
print(is_palindrome([1,2,3]))
print(is_palindrome([1,2,3,2,1]))
print(is_palindrome([1,1]))
print(is_palindrome([1]))
True
False
True
True
True
Flatten a nested list structure
# Transform a list, possibly holding lists as elements into a flat
# list replacing each list with its elements (recursively)
# >>> flatten([a, [b, [c, d], e]])
# [a, b, c, d, e]
# Traverse the list sequentially. We keep advancing the pointer to the
# current element. We always have a dichotomy.
# Either (1) the element is scalar-value (2) the element itself is a
# list. If the element is a scalar-value, then append it to the result.
# If the element itself is a nested list, flatten(element) will
# yield a 1-dimensional flat list.
from typing import TypeVar, List, Any
T = TypeVar('T')
def flatten(lst: Any, result: List[T]) -> List[T]:
# print(f"Input lst = {lst}")
if type(lst) is list:
if len(lst) == 0:
return result
head = lst[0]
tail = lst[1:]
flatten(head, result) # recursive call to flatten the head,
# the 1D-output of this flattening will be
# stored in `result`
flatten(tail, result) # call flatten on the tail of the list
return result
else:
result.append(lst)
# print(f"result = {result}")
print(flatten([1, [2, [3, 4], 5], 6, [[[7]]]], []))
[1, 2, 3, 4, 5, 6, 7]
Functions in Python
The def keyword introduces a function definition. It must be followed by the function name and parenthesized list of parameters. The statements that form the body of the function start at the next line and must be indented.
The execution of a function introduces a new symbol table used for local variables of the function. More precisely, all variable assignments in a function store the value in the local symbol table. Variable references follow the LEGB rule: they first look in the local symbol table, then the enclosing symbol table, then the global symbol table and finally in the table of builtin names.
The actual arguments to a function call are passed by object reference. That is, object references are passed by value.
from typing import List
def multiply_by_two(x: int) -> None:
print(f"multiply_by_two({x})")
x = 2 * x
def append_item(lst: List[int]) -> None:
print(f"append_item({lst})")
lst.append(5)
assert lst[-1] == 5
def reassign_list(lst: List[int]) -> None:
print(f"reassign_list({lst})")
lst = list(range(1,6))
assert lst == [1,2,3,4,5]
x = 10
print(f"x = {x}")
multiply_by_two(x)
print(f"x = {x}")
lst = list(range(0,11,2))
print(f"lst = {lst}")
append_item(lst)
print(f"lst = {lst}")
reassign_list(lst)
print(f"lst = {lst}")x = 10
multiply_by_two(10)
x = 10
lst = [0, 2, 4, 6, 8, 10]
append_item([0, 2, 4, 6, 8, 10])
lst = [0, 2, 4, 6, 8, 10, 5]
reassign_list([0, 2, 4, 6, 8, 10, 5])
lst = [0, 2, 4, 6, 8, 10, 5]
A function definition associates the function name with the function object in the current symbol table. The interpreter recognizes the object pointed to by that name as a user-defined function.
Default argument values
A default value can be specified for one or more arguments.
The default values are evaluated at the point of function definition, in the defining scope.
The default value is evaluated only once. This makes a difference when the default argument is a mutable object such as a list, a dictionary or instances of most classes. For example, the following function accumulates the arguments passed to it on subsequent function calls.
The state of mutable default arguments is shared between function calls in Python.
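A sketch of the gotcha and its conventional fix (the function names are illustrative):

```python
def append_to(element, to=[]):
    # the SAME list object is reused on every call
    to.append(element)
    return to

print(append_to(1))  # [1]
print(append_to(2))  # [1, 2] -- state leaked from the first call!

# The conventional fix: use None as a sentinel and create
# a fresh list inside the function body.
def append_to_fixed(element, to=None):
    if to is None:
        to = []
    to.append(element)
    return to
```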
Keyword arguments
Functions can also be called using keyword arguments of the form kwarg=value. Keyword arguments must follow positional arguments. All the keyword arguments passed must match one of the arguments accepted by the function, and their order is not important. No argument may receive a value more than once.
When a final parameter of the form **kwargs is present, it receives a dictionary containing all keyword arguments except for those corresponding to a formal parameter. *args receives a tuple of all the positional arguments beyond the formal parameter list.
**kwargs must be the final parameter in the parameter list.
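A small sketch of how *args and **kwargs collect extra arguments (collect is an illustrative helper):

```python
def collect(first, *args, **kwargs):
    # args is a tuple of positionals beyond the formal parameters;
    # kwargs is a dict of keyword arguments not matching any parameter
    return first, args, kwargs

f, a, k = collect(1, 2, 3, mode="fast", debug=False)
print(f, a, k)
```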
Special parameters
By default, arguments may be passed to a Python function either by position or explicitly by keyword. For readability and performance, it can make sense to restrict the way arguments can be passed, so that a developer need only look at the function definition to determine if items are passed by position, by position or keyword, or by keyword. The / marker makes the preceding parameters positional-only, and the * marker makes the following parameters keyword-only.
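A sketch of the / and * markers (demo is an illustrative name):

```python
def demo(pos_only, /, standard, *, kw_only):
    # pos_only: positional-only; standard: either way; kw_only: keyword-only
    return pos_only, standard, kw_only

print(demo(1, 2, kw_only=3))          # OK
print(demo(1, standard=2, kw_only=3)) # OK
# demo(pos_only=1, standard=2, kw_only=3)  -> TypeError
# demo(1, 2, 3)                            -> TypeError
```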
Exception handling
The try statement
The try statement specifies exception handlers and cleanup code for a group of statements.
The except clause specifies one or more exception handlers. When no exception occurs in the try clause, no exception handler is executed. When an exception occurs in the try suite, a search for an exception handler is started. The search inspects the except clauses in turn until one is found that matches the exception. A bare except clause (without an exception type), if present, must be the last.
For an except clause with an expression, the expression must evaluate to an exception type or a tuple of exception types.
If no except clause matches the exception, the search for an exception handler continues in the surrounding code and on the call-stack.
If the evaluation of an expression in the header of an except clause raises an exception, the original search for a handler is canceled and a search starts for the new exception.
except* clause
The except* clause specifies one or more handlers for groups of exceptions. A try statement can have either except or except* clauses, but not both. The exception type for matching is mandatory in the case of except*, so a bare except*: is a syntax error.
When an exception group is raised in the try block, each except* clause splits it into subgroups of matching and non-matching exceptions. If the matching subgroup is not empty, it becomes the handled exception and is assigned to the target of the except* clause. Then, the body of the except* clause executes. If the non-matching subgroup is non-empty, it is processed by the next except* clause in the same manner. This continues until all exceptions in the group have been matched.
After all except* clauses execute, the group of unhandled exceptions is merged with any exceptions that were raised or re-raised within except* clauses. This merged exception group propagates on.
else clause
The optional else clause is executed if control flow leaves the end of the try suite: no exception was raised, and no return, continue or break statement was executed.
finally clause
If finally is present, it specifies a cleanup handler. It is always executed even when an exception is thrown.
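A sketch combining all four clauses (safe_divide is an illustrative helper):

```python
def safe_divide(a: float, b: float):
    try:
        result = a / b
    except ZeroDivisionError:
        print("cannot divide by zero")
        return None
    else:
        # runs only if the try suite raised no exception
        return result
    finally:
        # always runs, even on the way out of a return
        print("division attempt finished")

print(safe_divide(10, 2))
print(safe_divide(1, 0))
```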
OOP
OOP is a programming paradigm that provides a means of structuring programs so that properties and behaviors are bundled into individual objects. There are four tenets of OOP.
- Encapsulation. Bundling data (attributes) and behaviors (methods) into a single unit. By defining methods to control access to attributes and their modification, encapsulation helps maintain data integrity and promotes modular, secure code.
- Inheritance. A child class inherits data-members and behaviors from the parent class promoting code reuse.
- Abstraction. Hiding low-level implementation details. Only expose the essentials to the outside world.
- Polymorphism. It allows you to treat objects of different types as instances of the base type.
Classes vs instances
A class allows us to create a user-defined data type. Classes are like the blueprint for an object. An object is an instance of a class. All objects of a class share the same blueprint.
When you call a class constructor, the .__new__() method is called first to create a new instance of the class. This returns an empty object. Then, another special method, .__init__(), is called that takes the resulting object and initializes it using the constructor arguments.
Subclassing immutable built-in types
Let’s explore another use-case of .__new__() method that consists of subclassing an immutable built-in type. As an example, say you need to write a Vector class as a subclass of Python’s tuple type. While this class doesn’t have any additional attributes, it provides methods to perform component-wise vector addition, subtraction and scalar multiplication.
>>> class Vector(tuple):
... def __init__(self, value):
... super().__init__(value)
...
>>> Vector((1,2))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
Vector((1,2))
~~~~~~^^^^^^^
File "<stdin>", line 3, in __init__
super().__init__(value)
~~~~~~~~~~~~~~~~^^^^^^^
TypeError: object.__init__() takes exactly one argument (the instance to initialize)
>>>
The problem is that tuple is immutable: its value is fixed by tuple.__new__() at creation time, and tuple.__init__() is just object.__init__(), which doesn’t accept extra arguments.
To work around this issue, we can initialize the object at creation time with .__new__() instead of overriding .__init__(). Here’s how you can do this in practice:
import math
class Vector(tuple):
def __new__(cls, value):
instance = super().__new__(cls, value)
return instance
def __add__(self, other):
result = ()
for e1, e2 in zip(self, other):
elem = e1 + e2
result = result + (elem,)
return Vector(result)
def __sub__(self, other):
result = ()
for e1, e2 in zip(self, other):
elem = e1 - e2
result = result + (elem,)
return Vector(result)
def dot(self, other) -> float:
result = 0.0
for e1, e2 in zip(self, other):
result += e1 * e2
return result
def length(self):
return math.sqrt(sum(x**2 for x in self))
v1 = Vector((1,2))
v2 = Vector((3,4))
v3 = v1 + v2
print(f"v1 + v2 = {v3}")
v4 = v2 - v1
print(f"v2 - v1 = {v4}")
inner_product = v1.dot(v2)
print(f"Inner product = {inner_product}")
v1_norm = v1.length()
v2_norm = v2.length()
print(f"v1_norm = {v1_norm}")
print(f"v2_norm = {v2_norm}")v1 + v2 = (4, 6)
v2 - v1 = (2, 2)
Inner product = 11.0
v1_norm = 2.23606797749979
v2_norm = 5.0
In this example, .__new__() runs the steps that you learned in the previous section. First, the method creates a new instance of the current class, cls, by calling super().__new__(). This time, the call rolls up to tuple.__new__(), which creates the new instance and initializes it using value as an argument. At this point you could also customize the instance by adding extra attributes if needed. Finally, the new instance gets returned.
PEP-8 says: - Always use self for the first argument to instance methods. - Always use cls for the first argument to class methods
Writing a factory class
Writing a factory class is often a requirement that can raise the need for a custom implementation of .__new__(). However, you should be careful, because in this case Python skips the initialization step entirely. So, we have the responsibility of putting the newly created object into a valid state before using it in our code.
from random import choice
class Pet:
def __new__(cls):
other = choice([Dog, Cat, Python])
instance = super().__new__(other)
print(f"I am a {type(instance).__name__}!")
return instance
class Dog:
def communicate(self):
print("woof! woof!")
class Cat:
def communicate(self):
print("meow! meow!")
class Python:
def communicate(self):
print("hiss! hiss!")
In this example, Pet provides a .__new__() method that creates a new instance by randomly selecting a class from a list of existing classes.
Also, remember .__init__() of Pet is never called. That’s because Pet.__new__() always returns objects of a different class rather than of Pet itself.
Allowing only a single instance of your class
Sometimes you need to implement a class that allows the creation of a single instance only. This type of class is commonly known as a singleton class. In this situation, the __new__() method comes in handy because it can help you restrict the number of instances that a given class can have.
Most experienced Python developers would argue that you don’t need to implement the singleton design pattern in Python unless you already have a working class and need to add the pattern’s functionality on top of it.
Here’s an example of coding a Singleton class with a .__new__() method that allows the creation of only one instance at a time. To do this, .__new__() checks for the existence of a previous instance cached on a class attribute.
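A minimal sketch of such a Singleton:

```python
class Singleton:
    _instance = None  # cache the single instance on a class attribute

    def __new__(cls):
        if cls._instance is None:
            # first call: create and cache the instance
            cls._instance = super().__new__(cls)
        return cls._instance

a = Singleton()
b = Singleton()
print(a is b)  # True: both names point at the cached instance
```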
@classmethod vs @staticmethod
Inheritance and composition, super keyword
Providing multiple constructors in a Python class
Operator and function overloading
Python Decorators
Python metaclasses
dataclass module
Implementing map/dict
Python closures
What are Mixin classes in Python?
Python bindings: Calling C or C++ with Python
collections module
all and any in Python
Consider the behavior of all() and any() on an empty iterable.
all(iterable) returns True if all elements in the iterable are truthy. For an empty iterable, there are no elements that violate the condition. In logic, this is called vacuously true: since there’s nothing to falsify the statement, it’s considered True. So, all([]) is True.
any(iterable) returns True if at least one element in the iterable is truthy. For an empty iterable, there are no elements at all, so the condition "at least one is true" cannot be satisfied. So, any([]) is False.
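A few concrete checks illustrating both built-ins:

```python
print(all([]))   # True  (vacuously true)
print(any([]))   # False (no truthy element exists)

numbers = [2, 4, 6, 8]
print(all(n % 2 == 0 for n in numbers))  # every element is even
print(any(n > 7 for n in numbers))       # 8 satisfies the predicate
```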
Flattening a dict of dicts
quote_types = {
'Bids' : {
1 : [10, 45],
2 : [25, 47.5],
3 : [30, 49.5]
},
'Offers' : {
1 : [30, 50.5],
2 : [25, 52.5],
3 : [10, 55]
}
}
dict_of_height_3 ={
'a' : {
'b' :{
'c' : 1,
'd' : 2,
},
'e' : {
'f' : 3,
'g' : 4,
}
},
'h' : {
'i' : {
'j' : 5,
'k' : 6,
},
'l' : {
'm' : 7,
'n' : 8,
},
}
}
def flatten_dict(d : dict, parent_key = '', sep = '_'):
result = {}
for k,v in d.items():
if (type(v) is dict):
# Recursively flatten the child element
child_flat_dict = flatten_dict(v, parent_key=str(k))
# We now have a dict-of-dicts of height 2
for child_k, child_v in child_flat_dict.items():
key = parent_key + sep + child_k if parent_key else child_k
result[key] = child_v
else:
key = parent_key + sep + str(k)
result[key] = v
return result
print("flattening quotes\n")
flatten_dict(quote_types)
flattening quotes
{'Bids_1': [10, 45],
'Bids_2': [25, 47.5],
'Bids_3': [30, 49.5],
'Offers_1': [30, 50.5],
'Offers_2': [25, 52.5],
'Offers_3': [10, 55]}
list() in Python
Lists are mutable sequences, typically used to store collections of homogeneous items.
list.append(x) adds a single item to the end of the list, in place. list.extend(iterable) extends the list in place by appending all items from the iterable; both return None.
list.insert(i, x) inserts an element x at the given index i. list.remove(x) removes the first item from the list whose value is equal to x. list.pop([i]) removes the item at the given position in the list and returns it. If no index is specified, list.pop() removes and returns the last element in the list.
Reverse a list
# recursive solution
from typing import List
def reverse(l : List, acc : List = []) -> List:
if(len(l) == 0):
return acc
if(len(l) == 1):
l.extend(acc)
return l
new_acc = [l[0]]
new_acc.extend(acc)
return reverse(l[1:], new_acc)
def reverse_iter(l : List) -> List:
result = []
for element in l:
result.insert(0, element)
return result
items = [2, 17, 42, 15, 3]
reverse(items)
[3, 15, 42, 17, 2]
Determine if the list is a palindrome
Flatten a nested list
Eliminate consecutive duplicates of list elements
Prefer key in my_dict over key in my_dict.keys() when checking whether a key exists in a dict. Both are \(O(1)\) in Python 3, where .keys() returns a view object, but the direct form is shorter and avoids creating the view; in Python 2, .keys() returned a list, making the membership test \(O(n)\).
from typing import List, Tuple
# Remove duplicates from a nested-list while preserving the structure
def array_unique(l : List, unique_elements : dict = None) -> Tuple[List, dict]:
if unique_elements is None:
unique_elements = {}
result = []
for element in l:
if type(element) is list:
# get the list of unique children and append it to result
child_list, unique_elements = array_unique(element, unique_elements=unique_elements)
result.append(child_list)
else:
if element in unique_elements:
continue
else:
result.append(element)
unique_elements[element] = True
return result, unique_elements
my_array = [1, [1, 2, [1, 2, 3], 4, 5], [5, 6], 7]
result, _ = array_unique(my_array)
result
[1, [2, [3], 4, 5], [6], 7]
List comprehensions
Nested List comprehensions
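A short sketch of a simple comprehension and a nested one (transposing a matrix):

```python
squares = [x ** 2 for x in range(6)]
print(squares)  # [0, 1, 4, 9, 16, 25]

# A condition filters elements as they are generated
evens = [x for x in range(10) if x % 2 == 0]

# Nested comprehension: transpose a 3x2 matrix into a 2x3 one
matrix = [[1, 2], [3, 4], [5, 6]]
transposed = [[row[i] for row in matrix] for i in range(2)]
print(transposed)  # [[1, 3, 5], [2, 4, 6]]
```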
tuples in Python
Lists are mutable whereas tuples are immutable. The contents of a tuple cannot be modified at run time. Tuples usually store a heterogeneous collection of items.
sets in Python
Python also includes a data-type for sets. A set is an unordered collection with no duplicate elements. Basic uses include membership testing and eliminating duplicate entries. Set objects also support mathematical operations such as union, intersection, difference and symmetric difference.
Curly braces or set() is used to create sets.
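A sketch of the basic set operations mentioned above:

```python
basket = {"apple", "orange", "apple", "pear"}
print(len(basket))        # 3: duplicates are removed
print("apple" in basket)  # fast membership test

a = set("abracadabra")
b = set("alacazam")
print(a - b)   # difference: letters in a but not in b
print(a | b)   # union
print(a & b)   # intersection
print(a ^ b)   # symmetric difference
```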
Python 3.8 walrus := operator
The walrus operator := assigns a value to a variable and simultaneously returns that value. For example:
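For instance, binding and testing a list’s length in one expression (a minimal sketch):

```python
my_list = [1, 2, 3, 4, 5]
# len() is computed once, bound to n, and tested in the same expression
if (n := len(my_list)) > 0:
    print(f"The list has non-zero length = {n}")
```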
The list has non-zero length = 5
Another motivating use-case is when looping over fixed-length blocks in a protocol parser.
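A sketch of that pattern using an in-memory stream (io.BytesIO stands in for a real socket or file here):

```python
import io

stream = io.BytesIO(b"abcdefghij")

# Read fixed-size blocks until the stream is exhausted; the walrus
# operator assigns the block and tests its truthiness in one step.
blocks = []
while (block := stream.read(4)):
    blocks.append(block)

print(blocks)  # [b'abcd', b'efgh', b'ij']
```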