Data Model

Objects are Python’s abstraction for data. All data in a Python program is reresented by objects or relations between objects. Even code is represented by objects.

Object values and Types

Every object has an identity, type and value. An object’s identity never changes once it has been created; you may think of it as the object’s address in memory. The is operator compares the identity of two objects, the id() function returns an integer representing its identity.

For CPython, id(x) is the memory address where x is stored.

An object’s type determines the operations that the object supports (e.g. does it have a length?) and also defines the possible value for the objects of that type. The type() function returns an object’s type (which is an object itself).

x = 42
print(f"x = {x}")
print(f"type(x) = {type(x)}")
print(f"type(type(x)) = {type(type(x))}")

x = 42
type(x) = <class 'int'>
type(type(x)) = <class 'type'>

The value of some objects can change. Objects whose value can change are said to be mutable; objects whose value is unchangeable once they are created are called immutable. (The value of an immutable container object that contains a reference to mutable object can change when the latter’s value is changed; however the container is still considered immutable, because the collection of objects it contains cannot be changed. So, immutability is not strictly the same thing as having an immuteable value, it is more subtle.) An object’s mutability is determined by its type; for instance, numbers, strings, and tuples are immutable, while dictionaries and lists are considered mutable.

Immutable types are always compared by identity. Mutable types are always compared by value.

Objects are never explicitly destroyed; however when they become unreachable they may be garbage collected. CPython currently uses a reference counting scheme with delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references.

Note that, debug or tracing facilities may keep objects alive that would be normally collectable. try...except may keep objects alive.

Some objects contain references to external resources such as open files and windows. It is understood that these resources are freed when the object is garbage collected, but since GC is not guaranteed to happen, such objects also provide an explicit way to release the external resource, usually some kind of close() method. The try...finally and with statement provide convenient ways to do this.

Some objects contain references to other objects; these are called containers. Examples of containers are tuples, lists and dictionaries. When we talk about the value of the container, we imply the value of the contained objects.

For immutable types, operations that compute new values may actually return a reference to any existing object with the same type and value, while for mutable objects this is not allowed. For example, after a = 1; b = 1, a and b may or may not refer to the same object with the value one, depending on the implementation. This is because int is an immutable type, so the reference to 1 can be reused.

Built-in Types in Python

There is a core set of built-in types natively supported by Python runtime. The standard library defines additional types.

`None`

This type has a single value. There is a single object with this value. It is used to signify the absence of a value in many situations. Its truth value is False.

`NotImplemented`

This type has a single value. Again, there is a single object with this value.

Ellipsis

This type also has a single value and there is single object with this value. The object is accessed through the literal .... Its truth value is True.

Numbers

Python numbers are of course strongly related to mathematical numbers, but subject to the limitations of numerical representation in computers.

Numeric objects are immutable; once created their value never changes. Hence, they can be used as keys in a dictionary.

Integers (`int`)

These represent numbers in an unlimited range, subject to available virtual memory only. For the purpose of shift and mask operations, a binary representation is assumed, and negative numbers are represented in a variant of \(2\)’s complement, which gives the illusion of an infinite string of sign bits extending to the left.

CPython has a global limit for converting between int and str upto \(4,300\) digits by default. This limitation aims to prevent DoS attacks that exploit the quadratic complexity of integer to string conversions. This limit only applies to decimal or other non-power of two number bases. Hexadecimal, octal and binary conversions are unlimited.

The int type in CPython is an arbitrary length number stored in the binary form(commonly known as bignum). There exists no algorithm that convert a string to a binary integer or a binary integer to a string in linear time, unless the base is a power of \(2\). Converting a large value such as int('1' * 500_000) can take over a second on a fast CPU.

The default limit is \(4300\) digits as provided in sys.int_info.default_max_str_digits.

import sys
assert sys.int_info.default_max_str_digits == 4300

Booleans (`bool`)

These represent the truth values False and True. The two objects representing the values False and True are the only Boolean objects.

Real numbers (`float`)

These represent IEEE-754 double-precision 64-bit floating point numbers.

import sys
sys.float_info

sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1)

print(f"float_info.max = {sys.float_info.max}, float_info.min = {sys.float_info.min}")

float_info.max = 1.7976931348623157e+308, float_info.min = 2.2250738585072014e-308

Note that float represents infinity as inf, which is considered larger than any numerical value.

print(float('inf') > sys.float_info.max)

True

Complex numbers (`complex`)

These represent com plex numbers as a pair of double-precision floating-point numbers. The real and imaginary parts of a complex number z can be retrieved through the read-only attributes z.real and z.imag.

Sequences

These represent finite ordered sets indexed by non-negative numbers. The built-in function len() returns the number of items of a sequence. When the length of a sequence is \(n\), the index set contains the numbers \(0,1,\ldots,n-1\). Item \(i\) of a sequence \(a\) is selected by \(a[i]\). Some sequences including built-in sequences interpret negative subscripts by adding the sequence length. For example, a[-2] means a[n-2], the second to last item of a sequence a of length n.

Sequences also support slicing. a[i:j:k] selects all items with index x where x = i + nk, \(0 \leq x < j\), \(n \geq 0\).

Immutable sequences

An object of an immutable sequence type cannot change once it is created. (If an object contains references to other objects, these other objects may be mutable and may be changed; however the collection of objects directly referenced by an immutable object cannot change.)

Strings

A string is a sequence of values that represent unicode code points. Python doesn’t have a char type.

Tuples

The items of a tuple are arbitrary python objects. Tuple of two or more items are formed by comma separated list of expression. An empty tuple can be formed by a empty pair of parentheses.

import sys

def empty_tuple_example():
    t = ()
    print(f"Empty tuple size = {sys.getsizeof(t)}")
    
empty_tuple_example()

Empty tuple size = 40

Bytes

A bytes object is an immutable array. The items are \(8\)-bit bytes, represented in the range \(0 \leq x < 256\). Bytes literals like b'abc' and the bytes() constructor can be used to create bytes objects.

Mutable sequences

Mutable sequences can be changed after they are created. The subscript and slicing notations can be used as a target of assignment and del (delete) statements.

There are currently two intrinsic mutable sequence types:

Lists : The items of a list are arbitrary Python objects. Lists are formed by a comma separated list of expressions inside square brackets.
Byte Arrays : A bytearray object is mutable array. It is created by the built-in bytearray() constructor. Aside from being mutable, byte arrays provide the same interface and functionality as immutable bytes objects.

Since lists and bytearrays are mutable, they are unhashable.

Set types

These represent unorderd, finite sets of unique elements. As such they cannot be indexed by a subscript. However, they can be iterated over, and the built-in function len() returns the number of items in a set. Common uses of sets are fast membership testing, removing duplicates from a sequence, computing mathematicl operations such as intersection, union, difference and symmetric difference.

Note that, if two numbers compare equal (e.g. 1 and 1.0) only one of them can be contained in a set.

There are currently two intrinsic set types:

Sets

They represent a mutable set. They are created by the set() constructor and can be modified by methods such as add.

Frozen Sets

These represent an immutable set. They are created by the built-in frozenset() constructor. As a frozenset is immutable and hashable, it can be used again as an element of another set, or as a dictionary key.

Mappings

These represent finite sets of objects indexed by arbitrary index sets. The subscript notation a[k] selects the item indexed by k from the mapping a.

There is currently a single intrinsic mapping type: dict.

Dictionaries

They represent key-value pairs \(\texttt{key}\mapsto \texttt{value}\) or a set of mappings. key can be any immutable type. Thus, numeric types (int, bool, float, complex), immutable sequence types (str, tuple, bytes) immutable set type (frozenset) can be used as dictionary keys. The only things not acceptable as keys are values containing lists or dictionaries or other mutable types that are compared by value rather than object identity. The reason for this is that an efficient implementation of dictionaries requires a key’s hash vaue to remain constant. Numeric types used for keys obey normal rules for numeric comparision: if two numbers compare equal (e.g. 1 and 1.0) then they can be used interchangeably.

Dictionaries preserve insertion order, meaning that keys will be produced in the same order they were added sequentially over the ddictionary. Replacing an existing key does not change the order. Removing a key and re-inserting it will add to the end instead of keeping its old place.

Dictionaries are mutable; they can be created by the {} notation.

Note

An immutable container is hashable if all of its elements are hashable. For example, a list is mutable and therefore not hashable. A tuple a=(1,) is hashable. The tuple b=(2,[1,2,3]) is not hashable, because it contains an unhashable type.

Callable types

These are the types to which the function call operation can be applied.

User-defined functions

A user-defined function object is created by a function definition.

Instance methods

An instance method object combines a class, a class instance and any callable object(normally a user-defined function).

Generator functions

A function or method which uses the yield statement is called a generator function. Such a function, when called always returns an iterator object, which can then be used to execute the body of the function: calling the iterator’s iterator.__next__() method will cause the function to execute until it provides a value using the yield statement. When the function executes the return statement or falls off the end, a StopIteration exception is raise and the iterator will have reached the end of the set of values to be returned.

Coroutine functions

A function or method which is defined using async_def is called a coroutine function. A coroutine is any function that can be paused in the middle. Such a function, when called, returns a coroutine object. It may contain await expressions, as well as async_with and async_for statements.

Asynchronous generator functions

A function or method which is defined using the async_def and which uses the yield statement is called a asynchronous generator function. Such a function, when called, returns an asynchronous iterator object which can be used in an async_for statement to execute the body of the function.

Calling the asynchronous iterator’s aiterator.__anext__() method will return an awaitable which when awaited will execute until it provides a value using the yield expression. When the function executes an empty return statement or falls off the end, a StopAsyncIteration exception is raised and the asynchronous iterator will have reached the end of the values to be yielded.

Built-in functions

A built-in function object is a wrapper around a C function. Examples of built-in unctions are len(), zip() etc.

Built-in methods

This is really a different disguise of a built-in function, this time containing an object passed to the C function as an implicit extra argument. An example of a built-in method is .append() on a list object.

Classes

Classes are callable. These object normally act as factories for new instances of themselves, but variations are possible for class types that override __new__().

Modules

The most basic organization unit of code in Python is a module. Python code in one module gains access to the code in another module by the process of importing it.

Special method names

A class can implement certain operations that are invoked by special syntax (such as arithmetic operations, subscripting or slicing). This is Python’s approach to operator overloading : allowing classes to define their own behavior w.r.t language operators. For example, if a class defines a method __getitem__(self, pos) and x is an instance of the class, x[i] is roughly equivalent to x.__getitem__(pos).

Setting a special method to None means that the operation is not available.

Basic customization

`new(cls,...)`

This method is called to create a new instance of the class cls. __new__() is mainly intended to allow subclasses of immutable types(like int, str, tuple) to customize instance creation. If __new__() does not return an instance of cls, then the new instance’s __init__() method won’t be invoked.

If __new__(cls,...) is invoked during object construction and it returns an instance of cls, then the new instance’s .__init__() method will be invoked like __init__(self,...) where self is the new instance and the remaining arguments are the same as were passed to the object constructor.

`init(self,...)`

This method is called after the instance has been created by __new__(), but before it is returned to the caller. The arguments of __init__() are those passed to the class constructor and are typically used to initialize member-data of the class.

If a base class has an __init__() method, the derived class’s __init__() method must explicitly call it to ensure proper initialization of the base class member data using super().__init__([args...]).

`object.del(self)`

This method is called when the instance is about to be destroyed. This is also the so called finalizer or destructor. If a base class has a __del__() method, the derived class’ __del__() method, if any, must explicitly call the base class’ __del__().

Note

del x doesn’t directly call x.__del__() - the former decrements the reference count for x by one, and the latter is only called when x’s reference count reaches zero.

`object.repr(self)`

This method is called by the repr() built-in function to compute the official string representation of the object.

`object.str(self)`

This method is called by the str(object), the default __format__() implementation and the built-in function print()

`object.lt(self, other)` and friends

These are the so-called rich-comparison methods. The correspondence between operator symbols and method names is as follows: x<y calls x.__lt__(y), x<=y calls x.__le__(y), x==y calls x.__eq__(y), x!=y calls x.__ne__(y), x>y calls x.__gt__(y), and x>=y calls x.__ge__(y).

`object.hash(self)->int`

The __hash__() method should return an integer. Objects that compare equal in value, must have the same hash value.

User-defined classes have __eq__() and __hash__() methods by default (inherited from the object class); with them, all objects compare unequal (except themselves) and x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y).

Special attributes

`object.dict`

A dictionary

Modules and Packages

The `import` statement

`import <module_name>`

When Python executes the statement

import foo

it searches for foo.py in a list of directories:

Module Cache sys.modules
Current directory
List of directories contained in the PYTHONPATH environment variable.

Note that import <module_name> does not make the module contents directly accessible to the caller. A module creates a separate namespace.

The statement import <module_name> only places the <module_name> in the caller’s symbol table. The objects that are defined in the module remain in the module’s private symbol table.

From the caller, objects in the module are only accessible when prefixed with the <module_name> using the dot notation.

When a module is first imported, Python searches for the module and if found, it creates a module object, initializing it. If the named module cannot be found, a ModuleNotFoundError is raised. Python employs various strategies to search for the module when the import machinery is invoked.

`from <package_name> import <module_name>`

An alternate form of the import statement allows individual objects from the module to be imported directly into the caller’s symbol table:

from <package_name> import <module_name>

# mod.py
s = "Impossible is the word in the dictionary of fools."

def foo(arg):
    print(f"arg = {arg}")

class Foo:
    def __init__(self):
        pass
        
# main.py
from mod import s, foo
print(f"s : {s}")   
# s : "Impossible is the word in the dictionary of fools."

print(f"{foo('qux')}")
# arg = qux

from mod import Foo
x = Foo()
x
# <mod.Foo object at 0x02E3AD50>

Because this form of import places the object names directly into the caller’s symbol table, any objects that already exist with the same name will be overwritten.

# mod.py
a = [100, 200, 300]

# main.py
a = ['foo','bar','baz']
from mod import a
print(f"a = {a}")
# a = [100, 200, 300]

`import <module_name> as <alt_name>`

You can also import an entire module under an alternate name:

import <module_name> as <alt_name>

For example, if we do import numpy as np then any symbol such as sum() will be added to the caller’s symbol table as np.sum().

Module contents can be imported from within a function definition. In that case, the import does not occur until the function is executed.

Executing a module as a script

Any .py file that contains a module is also a python script and you can execute using python <my_module>.py.

We can distinguish between when the file is loaded as a module and when it is run as a standalone script. When a .py file is imported as a module, Python sets the special dunder variable __name__ to the name of the module. However, if the file is run as a standalone script, __name__ is set to the string __main__.

Modules are often designed with the capability to run as a standalone script for the purposes of testing the functionaility that is contained within the module. This is called unit testing.

from typing import List

def sum_arr(arr: List[float]) -> float:
    result = sum(x for x in arr)
    return result
    
if __name__ == "__main__":
    assert sum_arr([1,2,3,4,5]) == 15

Namespaces

A python namespace is a mapping from names to objects. A namespace is like a container that holds the currently defined symbolic names and the objects each name references. You can think of a namespace as a dictionary, in which the keys are the object names and the values are the objects themselves. Each key-value pair maps a name to its corresponding object.

Namespaces let you use the same name in different contexts without collisions. It’s like giving everything its own little room in a well-organized house.

`dir()` function

The built-in function dir() returns a list of defined names in a namespace.

dir()

['In',
 'List',
 'Out',
 '_',
 '_3',
 '__',
 '___',
 '__builtin__',
 '__builtins__',
 '__name__',
 '_dh',
 '_i',
 '_i1',
 '_i2',
 '_i3',
 '_i4',
 '_i5',
 '_i6',
 '_i7',
 '_i8',
 '_ih',
 '_ii',
 '_iii',
 '_oh',
 'empty_tuple_example',
 'exit',
 'get_ipython',
 'ojs_define',
 'open',
 'quit',
 'sum_arr',
 'sys',
 'x']

There are \(4\) different types of namespaces:

Built-in
Global
Local
Enclosing or non-local

These namespaces have differing lifetimes. As Python executes a program, it creates namespaces as necessary and removes them when it no longer needs them. Typically, many namespaces will exist at any given time.

The global, local and enclosing namespaces are implemented as dictionaries. In contrast, the built-in namespace isn’t a dictionary, but a module called builtins.

`builtin` namespace

The built-in namespace contains the names of all Python’s built-in objects. This namespace is available when the python interpreter is running. So, you can access the names in this namespace at any time in your code without explicitly importing them.

You can list the objects in the built-in namespace with the dir() function using __builtins__ as an argument:

dir(__builtins__)

Things like built-in exceptions, built-in functions, built-in data types live in this namespace.

The Global namespace

The global namespace contains names defined at the module level. Python creates a main global namespace when the main program’s body starts. This namespace remains in existence until the interpreter terminates.

Additionally, each module has its own global namespace. The interpreter creates a global namespace for any module your program loads with the import statement.

The local namespace

The python interpreter creates a new and dedicated namespace whenever you call a function. The namespace is local to the function and exists only until the function returns:

>>> def double_number(number: int) -> int:
...     result = 2 * number
...     print(dir())
...     return result
... 
>>> double_number(4)
['number', 'result']
8
>>> result
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    result
NameError: name 'result' is not defined

The enclosing namespace

You can also define one function inside the other. In this case, you’d have what’s known as an inner function. In this case, you’d have what is known as an inner function.

global_variable = "global"

def outer_func():
    # Non local scope
    nonlocal_var = "nonlocal"
    def inner_func():
        # Local scope
        local_variable = "local"
        print(f"Hi from the '{local_variable}' scope!")
        print(f"Hi from the '{nonlocal_var}' scope!")    
        print(f"Hi from the '{global_variable}' scope!")            
        
    inner_func()
    
outer_func()

Hi from the 'local' scope!
Hi from the 'nonlocal' scope!
Hi from the 'global' scope!

In this example, we first created a global variable at the module level. Then, we defined a function called outer_func(). Inside this function, we have a nonlocal_var, which is local to outer_func(), but non-local to inner_func(). In inner_func(), we create another variable called local_variable, which is local to the function itself.

Each of these namespaces remains in existence until the respective function returns. Python might not immediately reclaim the memory allocated for those namespaces, but all references to the objects become invalid.

Scope and LEGB rule

The existence of multiple, distinct namespaces allows you to have several different instances of a particular name simultaneously while a Python program runs. As long as each instance is in a different namespace, they’re all maintained separately and won’t interfere with one another.

That raises a question. Suppose you refer to the name x in your code, and x exists in several namespaces. How does Python know which one you mean each time?

Python looks for x in the following order:

\[ \text{Local} \longrightarrow \text{Enclosed} \longrightarrow \text{Global} \longrightarrow \text{Built-in} \]

The below code snip agree’s with our intuitive understanding of global and local variables.

x = 42
print(f"x = {x}")

def f():
    x = 17
    print(f"x = {x}")
    
f()
print(f"x = {x}")

x = 42
x = 17
x = 42

If we want to modify the value of the global variable x from within f, we can use the global keyword.

x = 42
print(f"x = {x}")

def f():
    global x
    x = 17
    print(f"x = {x}")
    
f()
print(f"x = {x}")

x = 42
x = 17
x = 17

Similarly, when we have nested functions, we can use the nonlocal keyword.

Packages

To help organize modules and provide a naming hierarchy, Python has a concept of packages. You can think of packages as the directories on a file sytem and modules as files within directories, but don’t take this analogy too literally, since packages and modules need not originate from the file system. For the purposes of this blog note, we’ll use this convenient analogy of directories and files. Like file system directories, packages are organized hierarchically, and packages may themselves contain subpackages as well as regular modules.

Its important to keep in mind that all packages are modules, but not all modules are packages. Or put another way, packages are just a special kind of module. Specifically, any module that contains a __path__ attribute is considered a package.

All modules have a name. Subpackage names are separated from their parent package name by a dot, akin to Python’s standard attribute access syntax.

import pkg only places the name pkg in the caller’s symbol table and doesn’t import any modules.

Package initialization

If a file named __init__.py is present in the package directory, it is invoked when the package or a module in the package is imported. __init__.py can be used to effect automatic importing of modules from a package.

Earlier, we saw that the import pkg statement only places the name pkg in the caller’s symbol table and doesn’t import any modules. But, if __init__.py in the pkg directory contains the following:

# __init__.py
print(f"Invoking __init__.py for {__name__}")
import pkg.mod1, pkg.mod2

Then when you execute import pkg, modules mod1 and mod2 are automatically imported.

Importing * from a package

When import * is used for a module, all objects form the module are imported into the local symbol table, except those whose names begin with an underscore.

The analogous statement for a package is:

from <package_name> import *

What does that do? Not much! Python follows the following convention. If the __init__.py file in the package directory contains a list name __all__, it taken to be a list of modules that should be imported when the statement from <package_name> import * is encountered.

Subpackages

Packages can contain nested packages to arbitrary depth. Consider

/pkg
    __init__.py
    /sub_pkg_1
        __init__.py
        mod_in_sub_pkg_1.py
    /sub_pkg_2
        __init__.py
        mod_in_sub_pkg_2.py

We can use both absolute imports and relative imports.

Absolute imports use the fully qualified name.
.. evaluates to the parent package
...<some_sub_pkg> evaluates to some sub-package of the parent package.

Regular packages

Python defines two types of packages : regular packages and namespace packages. Regular packages are traditional packages as they existed in Python 3.2 and earlier. A regular package is typically implemented as a directory containing an __init__.py file. When a regular package is imported, this __init__.py is executed and the objects it defines are bound to the names in the package’s namespace.

Consider the following file-system layout that defines a top level parent package with three subpackages:

parent/
    __init__.py
    one/
        __init__.py
    two/
        __init__.py
    three/
        __init__.py

Importing parent.one will implicitly execute parent/__init__.py and /parent/one/__init__.py. Subsequent imports of parent.two and parent.three will execute parent/two/__init__.py and parent/three/__init__.py respectively.

Namespace packages

A namespace package is a composite of various portions where each portion contributes a subpackage to the parent package. Portions may reside in different locations of the file system.

Searching

To begin searching, Python needs the fully qualified name of the module (or package, but for the purposes of this blog note, the difference is immaterial) being imported.

The fully qualified name will be used in various phases of the import search, and it may be dotted path to the submodule e.g. foo.bar.baz. In this case, Python first tries to import foo, then foo.bar and then finally foo.bar.baz. If any of the intermediate imports fail, a ModuleNotFoundError is raised.

The module cache

The first place checked during import search is sys.modules. This mapping serves as a cache of all modules that have been previously imported, including the intermediate paths. So, if foo.bar.baz was previously imported, sys.modules will contain entries for foo, foo.bar and foo.bar.baz. Each key will have as its value the corresponding module object.

If module A imports module B, and module B then tries to import module A, Python won’t re-import A. Since A’s name is aleady present in sys.modules (even though module object is not fully loaded yet), the import system simply returns the existing (but partially initialized) module instead of loading it again.

Challenge puzzle

What is printed when you run python main.py? Note that the comment delineate separate files.

# alpha.py
print("alpha", end="")
import beta
print("X", end="")

# beta.py
print("beta", end="")
import alpha
print("Y",end="")

# main.py
print("main", end="")
import alpha
print("Z", end="")

For more such puzzles, visit getcracked.io.

Python Type Checking

Type systems

All programming languages include some kind of type system that formalizes the categories of objects it can work with and how those categories are treated.

Dynamic typing

Python is a dynamically typed language. This means that the Python interpreter does type checking only at runtime, and that the type of a variable is allowed to change over its lifetime.

Static type

Many higher-level languages such as C, C++, Java are statically typed. The type and size of the variable are known at compile-time.

Python will always remain a dynamically typed language. However, PEP-484 introduced type hints, which makes it possible to also do static type checking of Python code.

Unlike how types work in most other statically typed languages, type hints by themselves don’t cause Python to enforce types. As the name suggests, type hints just suggest types. There are other tools like mypy that perform static type checking using type hints.

Duck Typing

Another term that is often used when talking about Python is duck typing. This moniker comes from the phrase: “If it walks like a duck, and quacks like a duck, it must be a duck!”.

Duck typing is a concept related to dynamic typing, where the type or class of the object is less important than the methods it defines. Using duck typing, you do not check the types at all. Instead you check, for the presence of a given method or attribute.

For example, if I define:

class TheHobbit:
    def __len__(self):
        return 1024

the_hobbit = TheHobbit()
print(f"Length = len(the_hobbit)")

Length = len(the_hobbit)

I would say that all strings, tuples, lists, dicts and instances of TheHobbit have a length.

`typing` module

Python’s typing module provides tools for adding type hints to your code. Type hints make your programs more readable, safer to refactor and helps static type checkers like mypy to catch errors before runtime.

from typing import TypedDict

class Person(TypedDict):
    name: str
    age: int

def print_people(people: list[Person]) -> None:
    for person in people:
        print(f"{person['name']} is {person['age']} years old")
        
print_people(
    [{"name": "Johannes Bernoulli", "age": 50},
     {"name": "Carl Friedrich Gauss", "age":60}]
)

Johannes Bernoulli is 50 years old
Carl Friedrich Gauss is 60 years old

Features

Supports type hinting for variables, functions and classes
Allows for custom type creation using TypeVar and NewType
Supports defining reusable type aliases for better code readability
Enables complex type definitions
Supports defining structural subtyping interfaces with protocols

Frequently used classes and functions

Object	Type	Description
`typing.NamedTuple`	class	Creates tuple-like classes with named fields
`typing.TypeDict`	class	Creates dictionary-like classes with fixed set of keys
`typing.List`	class	Represents a list
`typing.Callable`	class	Represents a callable with a specified signature
`typing.Any`		Represents a type that can be of any value
`typing.TypeVar()`	Function	Creates a generic type variable
`typing.NewType()`	Function	Creates a distinct type based on an existing one
`typing.Protocol`		Defines a structural subtyping interface that classes can implement

Examples

Define a class using NamedTuple and a function that takes it as an argument.

from typing import NamedTuple
import datetime as dt

class Employee(NamedTuple):
    employee_id : int
    name: str
    date_of_joining : dt.date
    

def greet_employee(employee: Employee) -> str:
    return f"Hi {employee.name}, employee_id : {employee.employee_id}, welcome to the firm!"
    
greet_employee(Employee(1, "Alice", dt.date(2025,1,1)))

'Hi Alice, employee_id : 1, welcome to the firm!'

We can define a type alias to simplify complex type annotations:

import datetime as dt
from typing import Dict, Union
UserData = Dict[str, Union[str,int]]

def format_user(user: UserData) -> str:
   return f"{user["name"]} has last login date {user["last_login"]}"
   
format_user({"name": "John Carmack", "last_login": dt.date(2025,6,30)})

'John Carmack has last login date 2025-06-30'

Create a protocol and implement it in a class:

from typing import Protocol

class Greeter(Protocol):
    def greet(self) -> str:
        ...
        
class FriendlyGreeter:
    def greet(self)-> str:
        return "Hello, friend!"
        
def welcome(greeter: Greeter) -> None:
    print(greeter.greet())
    
welcome(FriendlyGreeter())

Hello, friend!

Function Annotations

For functions you can annotate arguments and return value. This is done as follows:

import math

def area(radius: float) -> float:
    return math.pi * radius**2

There are special mypy expressions : reveal_type() and reveal_locals() that you can add to your code and mypy will report which types it has inferred.

# reveal.py

import math
reveal_type(math.pi)

radius = 1
circumference = 2 * math.pi * radius
reveal_locals()

>>> mypy ./reveal.py                                                                               
reveal.py:4: note: Revealed type is "builtins.float"
reveal.py:8: note: Revealed local types are:
reveal.py:8: note:     circumference: builtins.float
reveal.py:8: note:     radius: builtins.int
Success: no issues found in 1 source file

Variable annotations

We can also annotate variable definitions just like function signatures.

Implementing a simple card game

import random
from typing import List, Tuple

Card = Tuple[str, str]
Deck = List[Card]
Hand = List[Card]
Player = str

suits = "♠ ♡ ♢ ♣".split()
ranks = list(range(2,10)) + "J Q K A".split()

def create_deck(shuffle: bool = False) -> Deck:
    """Create a new deck of 52 cards"""
    deck = [(s,r) for s in suits for r in ranks]
    if shuffle:
        random.shuffle(deck)
    return deck

def deal_hands(deck : Deck) -> Tuple[Hand, Hand, Hand, Hand]:
    """Deal the cards in the deck into four hands"""
    return (deck[0::4], deck[1::4], deck[2::4], deck[3::4])

def play():
    """Play a 4-player card game"""
    deck: Deck = create_deck(shuffle=True)
    players: List[Player] = "P1 P2 P3 P4".split()
    hands: Dict[Player, Hand] = {player: hand for player, hand in zip(players, deal_hands(deck))}

    for player, cards in hands.items():
        card_str = [f"{s}{r}" for (s,r) in cards]
        print(f"{player}: {card_str}")

play()

P1: ['♠5', '♣5', '♡9', '♢K', '♡7', '♠A', '♢A', '♠7', '♢Q', '♠2', '♣8', '♡2']
P2: ['♢2', '♡Q', '♡3', '♠Q', '♠9', '♡K', '♠6', '♢7', '♣3', '♢5', '♡8', '♢9']
P3: ['♢8', '♠K', '♣7', '♡J', '♡6', '♠8', '♣4', '♢6', '♣J', '♣9', '♠4', '♢J']
P4: ['♠3', '♡A', '♣A', '♢3', '♣6', '♠J', '♡4', '♡5', '♢4', '♣2', '♣Q', '♣K']

Type theory

Subtypes

One important concept is that of subtypes. If S is a subtype of T written as \(S <: T, S \subseteq T\) it means that S can be safely used in any context where T is expected. The following two conditions must hold:

\(\forall s \in S\), \(s \in T\).
If S <: T, Callable[T, Any] <: Callable[S, Any]

For example bool <: int, because bool takes values \(0\), \(1\) which also belong to set of values an int takes on. Every function from type int is also in the set of functions from type bool.

Covariance

Consider the following code snip:

# covariance.py
from typing import Tuple
 
class Animal:
     ...

class Dog(Animal):
     ...
 
an_animal: Animal = Animal()
lassie: Dog = Dog()
snoopy: Dog = Dog()
 
animals: Tuple[Animal,...] = (an_animal, lassie)
dogs: Tuple[Dog, ...] = (lassie, snoopy)
 
dogs = animals

>>> mypy ./covariance.py                                                                           
covariance.py:16: error: Incompatible types in assignment 
(expression has type "tuple[Animal, ...]", 
variable has type "tuple[Dog, ...]")  [assignment]
Found 1 error in 1 file (checked 1 source file)

If \(S <: T\), then Tuple[S,...] <: Tuple[T,...]. This is because Tuple is covariant in all its arguments. Covariant types preserve orderding <: of types.

In Python, most immutable containers are covariant. Union is covariant in all its arguments. Callables are covariant in the return type.

Contravariance

Languages with first-class functions have function types like a function expecting a Dog and returning an Animal (written Callable[Dog, Animal](or Dog -> Animal). Those languages also need to specify when one function type is a subtype of another. It is safe to substitute a function f for a function g if f accepts a more general type of argument and returns a more specific type then g. For example, Animal -> Dog, Dog -> Dog and Animal -> Animal can be used wherever a Dog -> Animal is expected.

Invariance

An invariant type is neither covariant nore contravariant. Mutable containers like List are invariant.

Gradual typing and consistent types

Python supports gradual typing, where you can gradyally add type hints to your python code. Gradual typing iss essentially made possible by the Any type.

Somehow Any sits both at the top and at the bottom of the type hierarchy of subtypes. Any type behaves as if it is a subtype of Any, and Any behaves as if it is a subtype of any other type. Looking at the definition of subtypes, this is not really possible. Instead, we talk about consistent types.

The type T is consistent with the type U if T is a subtype of U or either T or U is Any.

The type checker only complains about inconsistent types. The takeway is therefore that you will never see type errors arising from the Any type.

Type Variables

A type variable is a special variable that can take on any type, depending on the situation. The syntax for TypeVar is straightforward:

from typing import TypeVar
T = TypeVar('T')

Here, T is the name of the type variable. You can use any name, but single uppercase letters like T. Let’s create a simple generic function:

from typing import TypeVar

T = TypeVar('T')

def echo(value: T) -> T:
    return value
    
print(echo(42))
print(echo("Hello"))

Generic Classes

TypeVar defines a type variable, a placeholder, aka a variable that stores a type. Generic lets you use that type variable in a class.

from typing import TypeVar, Generic

T = TypeVar('T')

class Box(Generic[T]):
    def __init__(self, content:T):
        self.content = content
        
    def get_content(self) -> T:
        return self.content
        
    box = Box(123)
    print(box.get_content())    # Outputs: 123

Now, you can create typ-e-sage boxes:

int_box = Box[int](42)
str_box = Box[str]("hello")

At runtime, Generic does almost nothing special - its mainly for type checkers (mypy, pyright etc.). Python doesn’t enforce types at runtime.

You can have multiple type variables.

from typing import TypeVar, Generic

K = TypeVar('K')
V = TypeVar('V')

class Pair(Generic[K, V]):
    def __init__(self, key: K, value: V):
        self.key = key
        self.value = value
        
pair = Pair[str, int]("age", 30)

Constraints and Bounds

TypeVars can be constrained:

from typing import TypeVar, Generic

# only allow int or float
Numeric = TypeVar('Numeric', int, float)

class Calculator(Generic[Numeric]):
    def add(self, a: Numeric, b: Numeric) -> Numeric:
        return a + b

Or bounded(must be subclass of):

from typing import TypeVar

T = TypeVar('T', bound=int)

Duck Types and Protocols

Recall the following example from the introduction:

def len(obj):
    return obj.__len__()

len() can return the length of any object that has implemented the .__len__() method. How can we add type hints to len() and in particular the obj argument?

The answer hides behind the academic sounding term structural subtyping. One way to categorize type systems is by whether they are nominal or structural:

In a nominal system, comparisons between types are based on names and declarations. The Python type system is mostly nominal, where an int can be used in place of a float because of their subtype relationship.
In a structural system, comparisons between types are based on structure. You could define a structual type Sized that includes all instances that define .__len__(), irrespective of their nominal type.

There is ongoing work to bring a full-fledged structural type system to Python through PEP-544 which aims at adding a concept called protocols. Most of PEP 544 is already implemented in mypy though.

A protocol specifies one or more methods that must be implemented. For example, all classes defining .__len__() fulfill the typing.Sized protocol. We can therefore annotate len() as follows:

from typing import Sized

def len(obj: Sized) -> int:
    return obj.__len__()

Other examples of protocols defined in the typing module include Container, Iterable Awaitable and ContextManager.

You can also define your own protocols. This is done by inheriting from Protocol and defining the function signatures (with empty function bodies) that the protocol expects. The following example shows how len() and Sized could have been implemented:

from typing_extensions import Protocol

class Sized(Protocol):
    def __len__(self) -> int:
        ...

def len(obj: Sized) -> int:
    return obj.__len__()

How do you add type hints to a large python codebase?

TODO

Challenge Puzzle

Given the snippet below, (i) Is this valid Python code and (ii) will mypy raise an error?

import random

def foo() -> None:
    a: float | str
    
    a = random.random()
    a = input()

In Python, the syntax for variable annotations allows you to declare the expected type of a variable even before assigning a value. This feature was introduced in PEP 526 to improve the readability and static type checking without affecting run-time behavior. Declaring a variable with an annotation, but no initial value simply informs tools like mypy of the intended type, without actually creating the variable until its assigned later in the code. Moreover, its not a syntax error.

For more such puzzles, visit getcracked.io

Control flow statements

Python supports if, for and while statements.

In a for or while loop, the break statement may be paired with an else clause. If the loop finishes without executing the break, the else clause executes.

In either kind of loop, the else clause is not executed if the loop was terminated by a break. Other wiways of ending the loop early, such as return or raised exception will also skip the execution of the else clause.

One way to think of the else clause is to imagine it paired with the if inside the loop.

The else clause has more in common with the else clause of a try statement than it does with that of if statements: a try statements else clause runs when no exception occurs and loop’s else clasue runs when no break occurs.

`match` statements

A match statement takes an expression and compares its value to successive patterns given as one or more case blocks. This is similar to pattern matching in languages like Rust or Haskell. Only the first pattern that matches gets executed and it can also extract components(sequence elements or object attributes) from value into variables.

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        
    def where_is(point):
        match point:
            case Point(x=0, y=0):
                print("Origin")
            case Point(x=0, y=y):
                print(f"Y={y}")
            case Point(x=x, y=0):
                print(f"X={x}")
            case Point(x=x, y=y):
                print(f"X={x}, Y={y}")
            case _:
                raise ValueError("Not a point!")

There are several other key features of the assignment statement:

Like unpacking assignments, tuple and list patterns have exactly the same meaning and actually match arbitrary sequences.
Sequence pattens support extended unpacking: [x,y, *rest] and (x,y,*rest) work similar to unpacking assignments. The name after * may also be _, so (x,y, *_) matches a sequence of atleast two items without binding the remaining items.

Examples

Reverse a list

from typing import List, TypeVar

T = TypeVar('T')

def reverse(lst: List[T]) -> List[T]: 
    match lst:
        case []: return lst
        case [x]: return [x]
        case [head, *tail]: return reverse(tail) + [head] 
        
print(reverse([1,2,3,4,5]))

[5, 4, 3, 2, 1]

Find out whether a list is a palindrome

from typing import List, TypeVar

T = TypeVar('T')

def is_palindrome(lst: List[T]) -> bool:
    match lst:
        case []: return True
        case [x]: return True
        case [head, *tail]: 
            if head == tail[-1]:
                return is_palindrome(tail[:-1])
            else:
                return False

print(is_palindrome([1,2,2,1]))
print(is_palindrome([1,2,3]))
print(is_palindrome([1,2,3,2,1]))
print(is_palindrome([1,1]))
print(is_palindrome([1]))

True
False
True
True
True

Flatten a nested list structure

# Transform a list, possibly holding lists as elements into a flat 
# list replacing each list with its elements (recursively)
# >>> flatten([a, [b, [c, d], e]]]
# [a, b, c, d, e]

# Traverse the list sequentially. We keep advancing the pointer to the
# current element. We always have a dichotomy. 
# Either (1) the element is scalar-value (2) the element itself is a 
# list. If the element is a scalar-value, then append it to the result.
# If the element itself is a nested list, flatten(element) will  
# yield a 1-dimensional flat list. 

from typing import TypeVar, List, Any

T = TypeVar('T')

def flatten(lst: List[Any], result=List[T]):
    # print(f"Input lst = {lst}")
    if type(lst) is list:
        if len(lst) == 0:
            return result
        
        head = lst[0]
        tail = lst[1:]
        
        flatten(head, result)   # recursive call to flatten the head,
                                # the 1D-output of this flattening will be
                                # stored in `result`
                                
        flatten(tail, result)   # call flatten on the tail of the list
        return result
    else:
        result.append(lst)
        
    # print(f"result = {result}")
        
print(flatten([1, [2, [3, 4], 5], 6, [[[7]]]], []))

[1, 2, 3, 4, 5, 6, 7]

Functions in Python

The def keyword introduces a function definition. It must be followed by the function name and parenthesized list of parameters. The statements that form the body of the function start at the next line and must be indented.

The execution of a function introduces a new symbol table used for local variables of the function. More precisely, all variable assignments in a function store the value in the local symbol table. Variable references follow the LEGB rule: they first look in the local symbol table, then the enclosing symbol table, then the global symbol table and finally in the table of builtin names.

The actual arguments to a function call are passed by object reference. That is, object references are passed by value.

from typing import List
def multiply_by_two(x: int) -> None:
    print(f"multiply_by_two({x})")
    x = 2 * x
    
def append_item(lst: List[int]) -> None:
    print(f"append_item({lst})")
    lst.append(5)
    assert lst[-1] == 5
    
def reassign_list(lst: List[int]) -> None:
    print(f"reassign_list({lst})")
    lst = list(range(1,6))
    assert lst == [1,2,3,4,5]
    
x = 10
print(f"x = {x}")
multiply_by_two(x)
print(f"x = {x}")

lst = list(range(0,11,2))
print(f"lst = {lst}")

append_item(lst)
print(f"lst = {lst}")

reassign_list(lst)
print(f"lst = {lst}")

x = 10
multiply_by_two(10)
x = 10
lst = [0, 2, 4, 6, 8, 10]
append_item([0, 2, 4, 6, 8, 10])
lst = [0, 2, 4, 6, 8, 10, 5]
reassign_list([0, 2, 4, 6, 8, 10, 5])
lst = [0, 2, 4, 6, 8, 10, 5]

A function definition associates the function name with the function object in the current symbol table. The interpreter recognizes the object pointed to by that name as a user-defined function.

Default argument values

A default value can be specified for one or more arguments.

def norm(x: float = 0.0, y: float = 0.0)
    return (x**2 + y**2)**0.50

The default values are evaluated at the point of the function definition in the defining scope, so that:

i = 5

def f(arg=i):
    print(arg)

i = 6
f()

The default value is evaluated only once. This makes a difference when the default argument is a mutable object such as a list, a dictionary or instances of most classes. For example, the following function accumulates the arguments passed to it on subsequent function calls.

def f(a, L=[]):
    L.append(a)
    return L

print(f(1))
print(f(2))
print(f(3))

[1]
[1, 2]
[1, 2, 3]

The state of mutable default arguments is shared between function calls in Python.

Keyword arguments

Functions can also be called using keyword arguments of the form kwarg=value. Keyword arguments must follow positional arguments. All the keyword arguments passed must match one of the arguments accepted by the function, and their order is not important. No argument may receive a value more than once.

When a final parameter of the form **kwargs is present, it receives a dictionary containing all keyword arguments except for those corresponding to a formal parameter. *args receives a tuple of all the positional arguments beyond the formal parameter list.

**kwargs must be the final parameter in the parameter list.

Special parameters

By default, arguments may be passed to Python either by position or explicitly by keyword. For readability and performance, it makes sense to restrict the way arguments can be passed so that a developer need only look at the function definition

Exception handling

The `try` statement

The try statement specifies exception handlers and cleanup code for a group of statements.

The except clause specifies one or more exception handlers. When no exception occurs in the try clause, no execution handler is executed. When an excveption occurs in the try suit, a search for an exception handler is started. The search inspects the except clauses in turn until one is found that matches the exception. An exception-less except clause, if present, must be the last.

For an except clause with an expression, the expression must evaluate to an exception type or a tuple of exception types.

If no except clause matches the exception, the search for an exception handler continues in the surrounding code and on the call-stack.

If the evaluation of an expression in the header of an except clause raises an exception, the original search for a handler is canceled and asearch starts for the new exception.

`except*` clause

The except* clause specifies one or more handlers for groups of exeptions. A try statement can have either except or except* clauses, but not both. The exception type for matching is mandatory in the case of except*, so except*: is a syntax error.

When an exception group is raised in the try block, each except* clause splits it into subgroups of matching and non-matching exceptions. If the matching subgroup is not empty, it becomes the handled exception and assigned to the target of except* clause. Then, the body of the except* clause executes. If the non-matching subgroup is non-empty, it is processed by the next except* in the same manner. This continues untill all exceptions in the group have been matched.

After all except* clauses execute, the group of unhandled exceptions is merged with any exceptions that were raised or re-raised with except* clauses. This merged exception group propogates on.

`else` clause

The optional else clause is executed if the control flow leaves the try suite, no exception was raised, no return, continue or break statement was executed.

`finally` clause

If finally is present, it specifies a cleanup handler. It is always executed even when an exception is thrown.

OOPS

OOPs is a programming paradigm that provides a means of structuring programs so that properties and behaviors are bundled into individual objects. There are \(4\) tenets of OOPs.

Encapsulation. Bundling data(attributes) and behaviors(methods) into a single unit. By defining methods to control access to attributes and its modificcation, encapsurlation helps maintain data integrity and promotes modular, secular code.
Inheritance. A child class inherits data-members and behaviors from the parent class promoting code reuse.
Abstraction. Hiding low-level implementation details. Only expose the essentials to the outside world.
Polymorphism. It allows you to treat objects of different types as instances of the base type.

Classes vs instances

A class allows us to create a user-defined data type. Classes are like the blue-print for an object. An object is an instance of a class. All objects of a class have the same blueprint.

When you call a class constructor, first .__new__() method is called to create a new instance of the class. This returns an empty object. Then, another special method, .__init__() is called that takes the result object and initializes it using constructor-args.

Subclassing immutable built-in types

Let’s explore another use-case of .__new__() method that consists of subclassing an immutable built-in type. As an example, say you need to write a Vector class as a subclass of Python’s tuple type. While this class doesn’t have any additional attributes, it provides methods to perform component-wise vector addition, subtraction and scalar multiplication.

>>> class Vector(tuple):
...     def __init__(self, value):
...         super().__init__(value)
... 
>>> Vector((1,2))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    Vector((1,2))
    ~~~~~~^^^^^^^
  File "<stdin>", line 3, in __init__
    super().__init__(value)
    ~~~~~~~~~~~~~~~~^^^^^^^
TypeError: object.__init__() takes exactly one argument (the instance to initialize)
>>>

The problem is that tuple.__init__(value) calls tuple.__new__(value) and it doesn’t take arguments the same way as object.__new__().

To workaround this issue, we can initialize the object at creation time with .__new__() instead of overriding .__init__(). Here’s how you can do this in practice:

import math 

class Vector(tuple):
    def __new__(cls, value):
        instance = super().__new__(cls, value)
        return instance
    
    def __add__(self, other):
        result = ()
        for e1, e2 in zip(self, other):
            elem = e1 + e2
            result = result + (elem,)
        return Vector(result)
    
    def __sub__(self, other):
        result = ()
        for e1, e2 in zip(self, other):
            elem = e1 - e2
            result = result + (elem,)
        return Vector(result)
    
    def dot(self, other) -> float:
        result = 0.0
        for e1, e2 in zip(self, other):
            result += e1 * e2
            
        return result
    
    def length(self):
        return math.sqrt(sum(x**2 for x in self))

v1 = Vector((1,2))        
v2 = Vector((3,4))

v3 = v1 + v2
print(f"v1 + v2 = {v3}")

v4 = v2 - v1
print(f"v2 - v1 = {v4}")

inner_product = v1.dot(v2)
print(f"Inner product = {inner_product}")

v1_norm = v1.length()
v2_norm = v2.length()
print(f"v1_norm = {v1_norm}")
print(f"v2_norm = {v2_norm}")

v1 + v2 = (4, 6)
v2 - v1 = (2, 2)
Inner product = 11.0
v1_norm = 2.23606797749979
v2_norm = 5.0

In this example, .__new__() runs the steps that you learned in the previous section. First, the method creates a new instance of the current class, cls by calling super().__new__(). This time, the call rolls back to tuple.__new__() which creates a new instance and initializes it using value as an argument. Then, we customize the new instance by adding additional attributes that we’d like. Finally, a new instance gets returned.

Note

PEP-8 says: - Always use self for the first argument to instance methods. - Always use cls for the first argument to class methods

Writing a factory class

Writing a factory class is often a requirement that can raise the need for a custom implementation of .__new__(). However, you should be careful, because in this case, Python skips the initialization step entirely. So, we have the responsibility of taking the newly created object into a valid state, before using it in our code.

from random import choice

class Pet:
    def __new__(cls):
        other = choice([Dog, Cat, Python])
        instance = super().__new__(other)
        print(f"I am a {type(instance).__name__}!")
        return instance

class Dog:
    def communicate(self):
        print("woof! woof!")
        
class Cat:
    def communicate(self):
        print("meow! meow!")
        
class Python:
    def communicate(Self):
        print("hiss! hiss!")

In this example, Pet provides a .__new__() method that creates a new instance by randomly selecting a class from a list of existing classes.

Also, remember .__init__() of Pet is never called. That’s because Pet.__new__() always returns objects of a different class rather than of Pet itself.

Allowing only a single instance of your class

Sometimes you need to implement a class that allows the creation of a single instance only. This type of class is commonly known as a singleton class. In this situation, the __new__() method comes in handy because it can help you restrict the number of instance that a given class can have.

Note

Most experienced Python developers would argue that you don’t need to implement the singleton design pattern in Python unless you already have a working class and need to add the pattern’s functionality on top of it.

Here’s an example of coding a Singleton class with a .__new__() method that allows the creation of only one instance at a time. To do this, .__new__() checks the existence of previous instanced cached on a class attribute.

class Singleton(object):
    _instance = None
    
    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance
        
first = Singleton()
second = Singleton()
first is second

True

`@classmethod` vs `@staticmethod`

Inheritance and composition, `super` keyword

Providing multiple constructors in a Python class

Operator and function overloading

Python Decorators

Python metclasses

`dataclass` module

Implementing map/dict

Python closures

What are Mixin classes in Python?

Python bindings: Calling C or C++ with Python

`collections` module

`all` and `any` in Python

Consider the following code snip

print(bool(all([])), end="")
print(bool(any([])), end="")

TrueFalse

all(iterable) returns True if all elements in the iterable are truthy. For an empty iterable, there are no elements that violate the condition. In logic, this is called vacuously true: since there’s nothing to falsify the statement, it’s considered True. So, bool(all([])) is True

any(iterable) returns True if atleast one element in the iterable is truthy. For an empty iterable, there are no elements at all, so the condition atleast one is true cannot be satsified. So, any([]) is False.

Flattening a `dict` of `dict`s

quote_types = {
    'Bids' : {
        1 : [10, 45],
        2 : [25, 47.5],
        3 : [30, 49.5]
    },
    'Offers' : {
        1 : [30, 50.5],
        2 : [25, 52.5],
        3 : [10, 55]
    }
}

dict_of_height_3 ={
    'a' : {
        'b' :{
            'c' : 1,
            'd' : 2,
        },
        'e' : {
            'f' : 3,
            'g' : 4,
        }
    },
    'h' : {
        'i' : {
            'j' : 5,
            'k' : 6,
        },
        'l' : {
            'm' : 7,
            'n' : 8,
        },
    }
}

def flatten_dict(d : dict, parent_key = '', sep = '_'):
    result = {}
    for k,v in d.items():
        if (type(v) is dict):
            # Recursively flatten the child element
            child_flat_dict = flatten_dict(v, parent_key=str(k))

            # We now have a dict-of-dicts of height 2
            for child_k, child_v in child_flat_dict.items():
                key = parent_key + sep + child_k if parent_key > '' else child_k
                result[key] = child_v
        else:
            key = parent_key + sep + str(k)
            result[key] = v
            
    return result

print("flattening quotes\n")
flatten_dict(quote_types)

flattening quotes

{'Bids_1': [10, 45],
 'Bids_2': [25, 47.5],
 'Bids_3': [30, 49.5],
 'Offers_1': [30, 50.5],
 'Offers_2': [25, 52.5],
 'Offers_3': [10, 55]}

print("dict_of_height_3\n")
flatten_dict(dict_of_height_3)

dict_of_height_3

{'a_b_c': 1,
 'a_b_d': 2,
 'a_e_f': 3,
 'a_e_g': 4,
 'h_i_j': 5,
 'h_i_k': 6,
 'h_l_m': 7,
 'h_l_n': 8}

`list()` in Python

lists are mutable sequences typically used to store collections of homogenous items.

list.append(x:Any)->None adds a single-item to the end of the list, in-place. list.extend(Iterable)->None extends the list in-place by appending all items from the iterable, and returns None.

list.insert(i,x)->None inserts an element x at the given index i. list.remove(x) removes the first item from the list who value is equal to x. list.pop([i]) removes the item at the given position in the list and returns it. If no index is specified, list.pop() removes and returns the last element in the list.

Reverse a list

from typing import List
l = [1, 2, 3, 4, 5]

l.reverse()  # reverse in place
print(l)

[5, 4, 3, 2, 1]

# recursive solution
def reverse(l : List, acc : List = []) -> List:
    if(len(l) == 0):
        return acc
    
    if(len(l) == 1):
        l.extend(acc)
        return l
    
    new_acc = [l[0]]
    new_acc.extend(acc)
    return reverse(l[1:], new_acc)

def reverse_iter(l : List) -> List:
    result = []
    for element in l:
        result.insert(0, element)

    return result

items = [2, 17, 42, 15, 3]
reverse(items)

[3, 15, 42, 17, 2]

Determine if the list is a palindrome

from typing import List
def is_palindrome(l : List) -> bool:
    n = len(l)
    i = 0
    j = n - 1

    while(i <= j):
        if(l[i] != l[j]):
            return False
        
        i += 1
        j = n - i - 1

    return True

print(is_palindrome([1, 2, 3, 2, 1]))
print(is_palindrome([1, 2, 2, 1]))

True
True

Flatten a nested list

def flatten_list(l : List):
    result = []
    for element in l:
        if (type(element) is list):
            simple_list = flatten_list(element)
            result.extend(simple_list)
        else:
            result.append(element)
    return result

flatten_list(['a', ['b', ['c', 'd'], 'e']])

['a', 'b', 'c', 'd', 'e']

Eliminate consecutive duplicates of list elements

Always use key in my_dict directly instead of key in my_dict.keys(), if you want to check the existence of a key in a dict. That will use the dictionary’s \(O(1)\) hashing rather than \(O(n)\). my_dict.keys() returns a list of keys.

from typing import List

# Remove duplicates from a nested-list while preserving the
# the structure
def array_unique(l : List, unique_elements : dict={}) -> (List,dict):
    result = []
    for element in l:
        if type(element) is list:
            # get the list of unique children and append it to result
            child_list, unique_elements = array_unique(element, unique_elements=unique_elements)
            result.append(child_list)
        else:
            if element in unique_elements:
                continue
            else:
                result.append(element)
                unique_elements[element] = True

    return result, unique_elements

my_array = [1, [1, 2, [1, 2, 3], 4, 5], [5, 6], 7]
result, _ = array_unique(my_array)
result

[1, [2, [3], 4, 5], [6], 7]

List comprehensions

squares = [x**2 for x in range(5)]
print(squares)
combs = [(x,y,z) for x in range(2) for y in range(2) for z in range(2)]
print(combs)

[0, 1, 4, 9, 16]
[(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]

Nested List comprehensions

matrix = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

# Take the transpose of a matrix
[[row[i] for row in matrix]for i in range(4)]

[[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]

`tuple`s in Python

lists are mutable wherease tuples are immutable types. The contents of a tuple cannot be modified at run-time. They usually store a heterogenous collection of items.

`set`s in Python

Python also includes a data-type for sets. A set is an unordered collection with no duplicate elements. Basic uses include membership testing and eliminating duplicate entries. Set objects also support mathematical operations such as union, intersection, difference and symmetric difference.

Curly braces or set() is used to create sets.

a = set('abracadabra')
b = set('alcazam')

print(a)
print(a - b)
print(a | b)
print(a & b)

{'a', 'b', 'd', 'c', 'r'}
{'b', 'r', 'd'}
{'a', 'l', 'z', 'm', 'b', 'd', 'c', 'r'}
{'c', 'a'}

Python 3.8 walrus `:=` operator

:= assigns a value to a variable and simultaneous returns the value. For example:

my_list = [1, 2, 3, 4, 5]

if (n := len(my_list)):
    print(f"The list has non-zero length = {n}")

The list has non-zero length = 5

Another motivating use-case is when looping over fixed-length blocks in a protocol parser.

# Loop over fixed length blocks
while (block := f.read(256)) != '':
    process(block)

Data Model

Object values and Types

Built-in Types in Python

None

NotImplemented

Ellipsis

Numbers

Integers (int)

Booleans (bool)

Real numbers (float)

Complex numbers (complex)

Sequences

Immutable sequences

Strings

Tuples

Bytes

Mutable sequences

Set types

Sets

Frozen Sets

Mappings

Dictionaries

Callable types

User-defined functions

Instance methods

Generator functions

Coroutine functions

Asynchronous generator functions

Built-in functions

Built-in methods

Classes

Modules

Special method names

Basic customization

__new__(cls,...)

__init__(self,...)

object.__del__(self)

object.__repr__(self)

object.__str__(self)

object.__lt__(self, other) and friends

object.__hash__(self)->int

Special attributes

object.__dict__

Modules and Packages

The import statement

import <module_name>

from <package_name> import <module_name>

import <module_name> as <alt_name>

Executing a module as a script

Namespaces

dir() function

builtin namespace

The Global namespace

The local namespace

The enclosing namespace

Scope and LEGB rule

Packages

Package initialization

Importing * from a package

Subpackages

Regular packages

Namespace packages

Searching

The module cache

Challenge puzzle

Python Type Checking

Type systems

Dynamic typing

Static type

Duck Typing

typing module

Features

Frequently used classes and functions

Examples

Function Annotations

Variable annotations

Implementing a simple card game

Type theory

Subtypes

Covariance

`None`

`NotImplemented`

Integers (`int`)

Booleans (`bool`)

Real numbers (`float`)

Complex numbers (`complex`)

`new(cls,...)`

`init(self,...)`

`object.del(self)`

`object.repr(self)`

`object.str(self)`

`object.lt(self, other)` and friends

`object.hash(self)->int`

`object.dict`

The `import` statement

`import <module_name>`

`from <package_name> import <module_name>`

`import <module_name> as <alt_name>`

`dir()` function

`builtin` namespace

`typing` module

`match` statements

The `try` statement

`except*` clause

`else` clause

`finally` clause

`@classmethod` vs `@staticmethod`

Inheritance and composition, `super` keyword

`dataclass` module

`collections` module

`all` and `any` in Python

Flattening a `dict` of `dict`s

`list()` in Python

`tuple`s in Python

`set`s in Python

Python 3.8 walrus `:=` operator