Pythonic code

Python
Author

Quasar

Published

November 21, 2025


Context Managers

Context managers are a very useful feature in Python. Most of the time, we see context managers around resource management. For example, when we open files, we want to make sure they are closed after processing (so we do not leak file descriptors). Or, if we open a connection to a service (or even a socket), we also want to be sure to close it accordingly. In all of these cases, you would normally have to remember to free all of the resources that were allocated, and that is just thinking about the best case - there can also be exceptions and error handling to consider. Handling all possible combinations and execution paths of our program makes it harder to maintain. There is an elegant, Pythonic way of handling this.

with open(filename) as fd:
    process_file(fd)

The with statement (PEP-343) enters the context manager. In this case, the open function implements the context manager protocol, which means that the file will be automatically closed when the block finishes, even if an exception occurred.

Context managers consist of two magic methods: __enter__ and __exit__. On the first line, the with statement calls the __enter__ method, and whatever this method returns is assigned to the variable after as. This is optional - we don’t really need to return anything specific from __enter__, and even if we do, there is no strict reason to assign it to a variable if it is not needed.

After this line is executed, the code enters a new context, where any other Python code can be run. After the last statement in the block is finished, the context will be exited, meaning that Python will call the __exit__ method of the original context manager object we first invoked.

If there is an exception or error inside the context manager block, the __exit__ method will still be called, which makes it convenient for safely cleaning up resources. In fact, this method receives the exception that was raised in the block, in case we want to handle it in a custom fashion.

Context managers are a good way of separating concerns and isolating parts of the code that should be kept independent, because if we mix them, the logic becomes harder to maintain.

As an example, consider a situation where we want to run a backup of our database with a script. The caveat is that the backup is offline, which means that we can only do it while the database is not running, and for this we have to stop it. After running the backup, we want to be sure that we start the process again, regardless of how the backup itself went.

Instead of creating a monolithic function to do this, we can tackle this issue with context managers:

import subprocess

def run(command: str) -> None:
    # Small helper that executes a shell command and fails loudly on error.
    subprocess.run(command, shell=True, check=True)

def stop_database():
    run("systemctl stop postgresql.service")

def start_database():
    run("systemctl start postgresql.service")

class DBHandler:
    def __enter__(self):
        stop_database()
        return self

    def __exit__(self, exc_type, exc_value, exc_traceback):
        # Returning None (falsy) propagates any exception raised in the block.
        start_database()

def db_backup():
    run("pg_dump database")

def main():
    with DBHandler():
        db_backup()

Implementing context managers

In general, we can implement context managers like the one in the previous example. All we need is a class that implements the __enter__ and __exit__ magic methods; that object will then support the context manager protocol. While this is the most common way to implement context managers, it is not the only one.

The contextlib module in the Python standard library contains many helper functions and objects for implementing context managers, or for using ones already provided, which can help us write more compact code.

Let’s start by looking at the contextmanager decorator.

When the contextlib.contextmanager decorator is applied to a function, it converts the code in that function into a context manager. The function in question has to be a particular kind of function called a generator function, which separates its statements into what will run in the __enter__ and __exit__ magic methods, respectively.

The equivalent of the previous example can be written as:

import contextlib

@contextlib.contextmanager
def db_handler():
    try:
        stop_database()
        yield
    finally:
        start_database()

with db_handler():
    db_backup()

Here, we define the generator function and apply the @contextlib.contextmanager decorator to it. The function contains a yield statement, which makes it a generator function. Details on generators are not important at this point. All we need to know is that when the decorator is applied, everything before the yield statement will be run as if it were part of the __enter__ method. Then, the yielded value is going to be the result of the context manager evaluation (what __enter__ would return, and what would be assigned to a variable if we chose to capture it with as x:) - in this case, nothing is yielded.

At the yield statement, the generator function is suspended and the context is entered, where, again, we run the backup code for our database. After this completes, execution resumes, so we can consider every line that comes after the yield statement to be part of the __exit__ logic.

Writing context managers this way has the advantage that it is easier to refactor existing functions and reuse code, and in general it is a good idea when we need a context manager that doesn’t belong to any particular object.

Using context managers is considered idiomatic.

Comprehensions and assignment expressions

The use of comprehensions is recommended to create data structures in a single instruction, instead of multiple operations. For example, if we wanted to create a list with calculations over some numbers, instead of:

numbers = []
for i in range(10):
    numbers.append(run_calculation(i))

we could do:

numbers = [run_calculation(i) for i in range(10)]

Assignment expressions, introduced in PEP-572, are also very useful:

values = [1, 2, 3, 4, 5]  # sample data for the example

# Compute partial sums in a list comprehension
total = 0
partial_sums = [total := total + v for v in values]
print("Total:", total)  # Total: 15

The := operator is informally known as the walrus operator.

Keep in mind, however, that more compact code does not always mean better code. If, to write a one-liner, we have to create a convoluted expression, then it’s not worth it, and we would be better off using the naive approach. This is related to the keep it simple, stupid (KISS) principle.

Another good reason for using assignment expressions is performance. If we have to use a function as part of our transformation logic, we don’t want to call it more often than necessary. Assigning the result of the function to a temporary identifier is a good optimization technique.
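
As a sketch of this idea (transform here is a hypothetical stand-in for an expensive function), an assignment expression lets us filter on a computed value and keep it, calling the function only once per element:

def transform(x: int) -> int:
    # Stand-in for an expensive computation.
    return x ** 2

data = range(10)

# Without the walrus operator, transform() is evaluated twice per element:
results = [transform(x) for x in data if transform(x) > 10]

# With an assignment expression, each element is transformed only once:
results = [y for x in data if (y := transform(x)) > 10]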

Properties, attributes and different types of methods for objects

All of the properties and functions of an object are public in Python, which is different from other languages where properties can have access specifiers (public, private, or protected). That is, nothing actually prevents the caller from invoking any attribute an object has.

There is no strict enforcement, but there are some conventions. An attribute that starts with an underscore is meant to be private to that object, and we expect that no external agent calls it (but, again, nothing prevents this).

Underscores in Python

There are some conventions and implementation details that make use of underscores in Python, which is an interesting topic in itself that’s worthy of analysis.

As mentioned, by default all attributes of an object in Python are public. Consider the following example:

class Point:
    def __init__(self, x_value, y_value):
        self._x_value = x_value
        self._y_value = y_value

p1 = Point(1.0, 2.0)
p1.__dict__
{'_x_value': 1.0, '_y_value': 2.0}

Attributes that start with an underscore should be respected as private and not be called externally. Using a single underscore prefix is the Pythonic way of clearly delimiting the interface of an object.

Note that using too many internal methods and attributes could be a sign that the class has too many tasks and doesn’t comply with the single responsibility principle.

There is, however, a common misconception that a leading double underscore can actually make attributes and methods private. Let us imagine that the x_value and y_value attributes are defined with a leading double underscore instead.

class Point:
    def __init__(self, x_value: float, y_value: float):
        self.__x_value = x_value
        self.__y_value = y_value

    def scale(self, scale_factor: float):
        self.__x_value *= scale_factor
        self.__y_value *= scale_factor


p1 = Point(1.0, 2.0)
p1.scale(2.0)
p1.__x_value

Output:

--------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[7], line 13
     11 p1 = Point(1.0, 2.0)
     12 p1.scale(2.0)
---> 13 p1.__x_value

AttributeError: 'Point' object has no attribute '__x_value'

Some developers use this method to hide attributes, thinking that __x_value is now private and that no other object can modify it. Now, take a look at the exception raised when we tried to access __x_value: it’s an AttributeError saying that the attribute doesn’t exist. It doesn’t say anything like "this is private".

What’s actually happening is that, with the double underscores, Python creates a different name for the attribute (this is called name mangling). It creates the attribute with the name _<class_name>__<attribute_name> instead. In this case, an attribute named _Point__x_value is created.

p1.__dict__
{'_Point__x_value': 2.0, '_Point__y_value': 4.0}

The idea of the double underscore in Python is completely different: it was created as a means to avoid name collisions when overriding methods of a class that is going to be extended several times. Using double underscores to hide attributes is a non-Pythonic approach. If you need to define attributes as private, use a single underscore and respect the Pythonic convention that it marks a private attribute.
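
Indeed, knowing the name that the mangling produces, we can still reach the attribute directly, which confirms that nothing was actually made private:

p1._Point__x_value
2.0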

Properties

Typically, in object-oriented design we create objects to represent an abstraction over an entity of the problem domain. In this sense, objects can encapsulate behavior and data. And more often than not, the validity of the data determines whether an object can be created: entities can exist only for certain values of the data, and incorrect values should not be allowed.

That is why we create validation methods, typically used in setter operations. However, in Python we can encapsulate the setter and getter methods more compactly by using properties.

Consider the example of a geographical system that needs to deal with coordinates. There is only a certain range of values for which the latitude and the longitude make sense. Outside of those values, a coordinate cannot exist.

class Coordinate:
    def __init__(self, lat: float, long: float) -> None:
        self._latitude = self._longitude = None
        self.latitude = lat
        self.longitude = long

    @property
    def latitude(self) -> float:
        return self._latitude

    @latitude.setter
    def latitude(self, lat_value: float) -> None:
        if not -90 <= lat_value <= 90:
            raise ValueError(f"{lat_value} is an invalid value for latitude")

        self._latitude = lat_value

    @property
    def longitude(self) -> float:
        return self._longitude

    @longitude.setter
    def longitude(self, long_value: float) -> None:
        if not -180 <= long_value <= 180:
            raise ValueError(f"{long_value} is an invalid value for longitude")

        self._longitude = long_value
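
A quick usage sketch (the coordinate values are illustrative):

coord = Coordinate(4.6, -74.0)
print(coord.latitude)  # 4.6 - goes through the latitude property getter

coord.latitude = 95.0  # goes through the setter, which raises:
                       # ValueError: 95.0 is an invalid value for latitude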

Using dataclasses

A dataclass is a class that exists primarily to store values that are accessible by attribute lookup, rather than to implement complex logic. There is common boilerplate in the initialization of such objects: declaring every attribute the object will have in the __init__ method, and then assigning each to an internal variable.

class Point:
    def __init__(self, x: float, y: float) -> None:
        self._x = x
        self._y = y

Since Python 3.7 we can simplify this by using the dataclasses module, which was introduced in PEP-557. We’ll review this module briefly to understand how it helps us write compact code.

The module provides a @dataclass decorator which, when applied to a class, takes all the class attributes with annotations and treats them as instance attributes, as if they were declared in the initialization method. It also automatically generates the __init__ method for the class.

Additionally, this module provides a field object that will help us define particular traits for some of the attributes.

from dataclasses import dataclass, field

@dataclass
class VanillaOptionQuote:
    strike: float
    spot: float
    implied_vol: float
    underlying: str
    time_to_maturity: float = field(default=1.0)
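
A quick usage sketch (the values are illustrative); note that both __init__ and __repr__ were generated for us:

quote = VanillaOptionQuote(strike=100.0, spot=105.0, implied_vol=0.2, underlying="XYZ")
print(quote)
# VanillaOptionQuote(strike=100.0, spot=105.0, implied_vol=0.2, underlying='XYZ', time_to_maturity=1.0)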

The Field objects describe each field. These objects are created internally, and are returned by the fields() module-level function. Users should never instantiate a Field object directly. Its documented attributes are:

  • name: the name of the field.
  • type: the type of the field.
  • default: if provided, this will be the default value for this field.
  • default_factory: if provided, it must be a zero-argument callable that will be called when a default value is needed for this field.
  • init: if true (the default), this field is included as a parameter in the generated __init__ method.
  • repr: if true (the default), this field is included in the string returned by the generated __repr__ method.

A good use case for dataclasses is all those places where we need to use objects as data containers or wrappers.

Immutable dataclass

It is not possible to create truly immutable Python objects. However, by passing frozen=True to the @dataclass decorator, you can emulate immutability. In that case, dataclasses will add __setattr__ and __delattr__ methods to the class that raise a FrozenInstanceError when invoked.
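
A minimal sketch of this behavior:

from dataclasses import dataclass

@dataclass(frozen=True)
class FrozenPoint:
    x: float
    y: float

p = FrozenPoint(1.0, 2.0)
p.x = 3.0  # raises dataclasses.FrozenInstanceError: cannot assign to field 'x'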

Why not just use attrs?

  • attrs moves faster than could be accommodated if it were moved to the standard library.
  • attrs supports additional features not being proposed here: validators, converters, metadata, etc.

Iterable objects

In Python, we have objects that can be iterated by default. For example, lists, sets, tuples, and dictionaries not only hold data, but can also be iterated over in a for loop to get their values repeatedly.

However, the built-in iterable objects are not the only kind we can have in a for loop. We can also create our own iterable, with the logic we define for iteration.

In order to achieve this, we rely again on magic methods. Iteration works in Python through its own protocol (namely, the iterator protocol). When you try to iterate an object in the form for e in my_collection: ..., at a high level Python checks:

  • whether the object implements one of the iterator methods, __next__ or __iter__;
  • whether the object is a sequence and has __len__ and __getitem__.

Creating iterable objects

from datetime import timedelta
import datetime as dt

class DateRange:
    """An iterable that contains its own iterator object."""
    def __init__(self, start_date: dt.date, end_date: dt.date):
        self._start_date = start_date
        self._end_date = end_date
        self._present_date = start_date

    def __iter__(self):
        return self

    def __next__(self):
        if self._present_date >= self._end_date:
            raise StopIteration()

        today = self._present_date
        self._present_date += timedelta(days=1)
        return today

for day in DateRange(dt.date(2026, 1, 1), dt.date(2026, 1, 8)):
    print(f"{day}")
2026-01-01
2026-01-02
2026-01-03
2026-01-04
2026-01-05
2026-01-06
2026-01-07

The basic mechanics

Here, the for loop starts a new iteration over our object. At this point, Python calls the iter() function on it, which in turn calls the __iter__ magic method. This method returns self, indicating that the object is its own iterator. Every step of the loop calls the next() function on this object, which delegates to the __next__ method. When there is nothing left to produce, we have to signal this to Python by raising the StopIteration exception.

r = DateRange(dt.date(2026, 1, 1), dt.date(2026, 1, 8))
print(next(r))
print(next(r))
print(next(r))
print(next(r))
2026-01-01
2026-01-02
2026-01-03
2026-01-04

This example works, but it has a small problem: once exhausted, the iterable will continue to be empty, raising StopIteration from then on.

If there are two or more consecutive for loops over the same object, only the first one will produce values, while the second one will see it empty, because there is only one shared iteration state. The fix is to separate the iterator from the iterable.
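
One way to do this, sketched below, is to make __iter__ a generator function: every for loop then gets a fresh iterator, and the object itself holds no iteration state (ReusableDateRange is an illustrative name):

class ReusableDateRange:
    """An iterable that creates a new iterator on every call to __iter__."""
    def __init__(self, start_date: dt.date, end_date: dt.date):
        self._start_date = start_date
        self._end_date = end_date

    def __iter__(self):
        current_date = self._start_date
        while current_date < self._end_date:
            yield current_date
            current_date += timedelta(days=1)

r = ReusableDateRange(dt.date(2026, 1, 1), dt.date(2026, 1, 4))
print(list(r))  # three dates
print(list(r))  # the same three dates again - the object can be traversed repeatedly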

Creating sequences

Maybe our object does not define the __iter__() method, but we still want to be able to iterate over it. If __iter__ is not defined on the object, the iter() function will look for the presence of __getitem__, and if that is not found either, it will raise TypeError.

A sequence is an object that implements the __len__ and __getitem__ magic methods and expects to be able to get the elements it contains one at a time, in order, starting at \(0\) as the first index. This means that you should be careful to implement __getitem__ to expect this type of index, or the iteration will not work.

The example from the previous section had the advantage that it uses less memory: it only holds one date at a time and knows how to produce the days one by one. However, it has the drawback that if we want to get the \(n\)th element, we have no way to do so but to iterate \(n\) times until we reach it. This is the typical trade-off between memory and CPU usage in computer science.

The implementation with an iterable uses less memory, but takes up to \(O(n)\) time to reach an element, whereas implementing a sequence uses more memory (because we have to hold everything at once), but supports indexing in constant time, \(O(1)\).

class DateRangeSequence:
    def __init__(self, start_date: dt.date, end_date: dt.date):
        self._start_date = start_date
        self._end_date = end_date
        self._range = self._create_range()

    def _create_range(self):
        days = []
        current_date = self._start_date
        while current_date < self._end_date:
            days.append(current_date)
            current_date += timedelta(days=1)

        return days

    def __getitem__(self, day_idx):
        return self._range[day_idx]

    def __len__(self):
        return len(self._range)

s1 = DateRangeSequence(dt.date(2026, 1, 1), dt.date(2026, 1, 6))
for day in s1:
    print(day)
2026-01-01
2026-01-02
2026-01-03
2026-01-04
2026-01-05

Container objects

Containers are objects that implement a __contains__ method (which usually returns a Boolean value). This method is called in the presence of the Python keyword in: an expression like element in container is translated into container.__contains__(element).

from dataclasses import dataclass

@dataclass
class Interval:
    left_end_point: float
    right_end_point: float
    exclude_right_end: bool = True

    def __contains__(self, x_value: float):
        if self.exclude_right_end:
            return self.left_end_point <= x_value < self.right_end_point
        return self.left_end_point <= x_value <= self.right_end_point

interval = Interval(0.0, 1.0)
print(f"Is 0.5 contained in the interval [0,1) : {0.5 in interval}")
print(f"Is 1.0 contained in the interval [0,1) : {1.0 in interval}")
Is 0.5 contained in the interval [0,1) : True
Is 1.0 contained in the interval [0,1) : False

Dynamic attributes

It is possible to control the way attributes are obtained from objects by means of the __getattr__ magic method. When we access something like <my_object>.<my_attribute>, Python looks for <my_attribute> in the dictionary of the object by calling __getattribute__. If it is not found (that is, the object does not have the attribute we are looking for), then the extra method __getattr__ is called, receiving the name of the attribute as a parameter.

By receiving this name, we can control the way values are returned from our objects. We can even create new attributes on the fly.

class DynamicAttributes:
    def __init__(self, attribute):
        self.attribute = attribute
    
    def __getattr__(self, attr):
        if attr.startswith("fallback_"):
            name = attr.replace("fallback_", "")
            return f"[fallback resolved] {name}"
    
        raise AttributeError(f"{self.__class__.__name__} has no attribute {attr}")
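
A quick usage sketch:

dyn = DynamicAttributes("value")
print(dyn.attribute)      # 'value' - found in the object's __dict__, so __getattr__ is never called
print(dyn.fallback_test)  # '[fallback resolved] test' - resolved by __getattr__
dyn.missing               # raises AttributeError: DynamicAttributes has no attribute missing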

Callable objects
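
Python allows an object to be invoked as if it were a regular function when it implements the __call__ magic method: a statement like my_object(arg) is translated into my_object.__call__(arg). This is handy for objects that need to keep state between invocations, like the following counter: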

from collections import defaultdict

class CallCount:
    def __init__(self):
        self._counts = defaultdict(int)

    def __call__(self, argument):
        self._counts[argument] += 1
        return self._counts[argument]

call_count = CallCount()
print(call_count("Hello"))
print(call_count("World"))
print(call_count("Hello"))
1
1
2

Python collections.abc module

The best way to implement these magic methods is to declare our class as inheriting from the corresponding abstract base class defined in the collections.abc module.

These interfaces declare the methods that need to be implemented, making it easier to define the class correctly.
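
As an illustrative sketch (the Weekdays class is hypothetical): by inheriting from collections.abc.Sequence and implementing only __getitem__ and __len__, the abstract base class supplies mixin methods such as __contains__, __iter__, index, and count for free:

from collections.abc import Sequence

class Weekdays(Sequence):
    _days = ("Mon", "Tue", "Wed", "Thu", "Fri")

    def __getitem__(self, index):
        return self._days[index]

    def __len__(self):
        return len(self._days)

week = Weekdays()
print("Wed" in week)  # True - __contains__ is provided by the ABC
print(list(week))     # ['Mon', 'Tue', 'Wed', 'Thu', 'Fri'] - __iter__ is provided by the ABC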

With practice and experience, you become more fluent with these features of Python, until it becomes second nature for you to wrap the logic you’re writing behind abstractions with nice, small interfaces. Give it enough time, and you’ll naturally think of having small, clean interfaces in your programs.

Caveats

Mutable default arguments

Don’t use mutable objects as the default arguments of functions. Default values are evaluated only once, when the function is defined, so a mutable default ends up shared across every call to the function.
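
A minimal sketch of the problem and the idiomatic fix:

def append_item(item, items=[]):  # bug: the default list is created once and shared
    items.append(item)
    return items

print(append_item(1))  # [1]
print(append_item(2))  # [1, 2] - state from the previous call leaked in

# The idiomatic fix is to use None as a sentinel:
def append_item_fixed(item, items=None):
    if items is None:
        items = []  # a fresh list on every call
    items.append(item)
    return items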