3 Ways to Build Data Classes in Python

A Quick Guide to Data Class Builders in Python

Data classes are used in every OOP language. They are classes that contain only fields and CRUD methods for accessing them, and they act as data holders. But they can certainly be more!

Martin Fowler wrote in his book Refactoring: Improving the Design of Existing Code:

  • Data classes are like children. They are okay as a starting point, but to participate as a grownup object, they need to take some responsibility.

How to Build a Data Class?

Python offers multiple ways to build a data class, which is essentially a collection of fields and methods. We will cover three of the data class builders:

  • Named Tuples through collections.namedtuple

  • Typed Named Tuples through typing.NamedTuple, which is a Named Tuple but with type hints for its fields.

  • The @dataclasses.dataclass decorator

The classic way to build a class in OOP languages such as Java is to define everything in it explicitly. In Python, that translates to boilerplate code in the __init__ method:

class Family:
    def __init__(self, mother, father, daughter):
        self.mother = mother
        self.father = father
        self.daughter = daughter


adams_family = Family('Morticia Addams', 'Gomez Addams', 'Wednesday Addams')

print(adams_family)
# <__main__.Family object at 0x107142f10>
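To make the boilerplate cost concrete, here is a minimal sketch of what the hand-written class would still need before it behaves like a data holder: a readable __repr__ and a value-based __eq__. The method bodies below are illustrative choices, not part of the original example.

```python
class Family:
    def __init__(self, mother, father, daughter):
        self.mother = mother
        self.father = father
        self.daughter = daughter

    def __repr__(self):
        # Hand-written equivalent of the repr the data class builders generate
        return (f'Family(mother={self.mother!r}, father={self.father!r}, '
                f'daughter={self.daughter!r})')

    def __eq__(self, other):
        # Compare by field values instead of object identity
        if not isinstance(other, Family):
            return NotImplemented
        return (self.mother, self.father, self.daughter) == (
            other.mother, other.father, other.daughter)


a = Family('Morticia Addams', 'Gomez Addams', 'Wednesday Addams')
b = Family('Morticia Addams', 'Gomez Addams', 'Wednesday Addams')

print(a)
# Family(mother='Morticia Addams', father='Gomez Addams', daughter='Wednesday Addams')
print(a == b)
# True
```

Every new field has to be added in three places here, which is exactly the repetition the builders below remove.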

Now let's implement it the Python way!

Using Named Tuple from the Collections Package

from collections import namedtuple
Family = namedtuple('Family', 'mother father daughter')

adams_family = Family('Morticia Addams', 'Gomez Addams', 'Wednesday Addams')
print(adams_family)

# Family(mother='Morticia Addams', father='Gomez Addams', daughter='Wednesday Addams')

Using Named Tuple from the Typing Package

We can either implement it like this:

import typing
Family = typing.NamedTuple('Family', [('mother', str), ('father', str), ('daughter', str)])

adams_family = Family('Morticia Addams', 'Gomez Addams', 'Wednesday Addams')
print(adams_family)

# Family(mother='Morticia Addams', father='Gomez Addams', daughter='Wednesday Addams')

Or implement a class inheriting from NamedTuple, such as:

from typing import NamedTuple

class Family(NamedTuple):
    mother: str
    father: str
    daughter: str

    def __str__(self):
        return f'The Father is {self.father}, the mother is called {self.mother} and the daughter is {self.daughter}'

adams_family = Family('Morticia Addams', 'Gomez Addams', 'Wednesday Addams')

print(adams_family)
# The Father is Gomez Addams, the mother is called Morticia Addams and the daughter is Wednesday Addams

Using the Data Class Decorator

from dataclasses import dataclass

@dataclass(frozen=True)
class Family:
    mother: str
    father: str
    daughter: str

    def __str__(self):
        return f'The Father is {self.father}, the mother is called {self.mother} and the daughter is {self.daughter}'

adams_family = Family('Morticia Addams', 'Gomez Addams', 'Wednesday Addams')

print(adams_family)
# The Father is Gomez Addams, the mother is called Morticia Addams and the daughter is Wednesday Addams

Notice how the @dataclass decorator does not depend on inheritance or a metaclass so it should not interfere with our business logic.
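The difference can be checked directly. The sketch below uses two throwaway classes (DataFamily and TupleFamily are names made up for this illustration) and inspects their base classes: the decorated class still derives only from object, while the typed named tuple is a tuple subclass.

```python
from dataclasses import dataclass, is_dataclass
from typing import NamedTuple


@dataclass
class DataFamily:
    mother: str


class TupleFamily(NamedTuple):
    mother: str


# The decorator returns the same kind of plain class it was given
print(DataFamily.__bases__)
# (<class 'object'>,)
print(type(DataFamily) is type)
# True
print(is_dataclass(DataFamily))
# True

# The typed named tuple is a real tuple subclass
print(issubclass(TupleFamily, tuple))
# True
```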

namedtuple vs NamedTuple vs dataclass

Selected features compared across the three data class builders. Image from the book Fluent Python

  • The classes built by typing.NamedTuple and @dataclass have an __annotations__ attribute holding the type hints for the fields, accessible with inspect.get_annotations(MyClass) or typing.get_type_hints(MyClass).

  • Instances built with collections.namedtuple and typing.NamedTuple are immutable since the tuples are immutable, whereas @dataclass instances are mutable unless frozen is set to True.
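Both points can be verified with a short sketch. The class names below (TypedPoint, DataPoint, FrozenPoint) are invented for this illustration.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import NamedTuple, get_type_hints


class TypedPoint(NamedTuple):
    x: float
    y: float


@dataclass
class DataPoint:
    x: float
    y: float


@dataclass(frozen=True)
class FrozenPoint:
    x: float
    y: float


# Type hints are available at runtime for both builders
print(get_type_hints(TypedPoint))
# {'x': <class 'float'>, 'y': <class 'float'>}
print(get_type_hints(DataPoint))
# {'x': <class 'float'>, 'y': <class 'float'>}

# Named tuple instances reject attribute assignment
try:
    TypedPoint(1.0, 2.0).x = 5.0
    tuple_is_mutable = True
except AttributeError:
    tuple_is_mutable = False

# Regular dataclass instances accept it
point = DataPoint(1.0, 2.0)
point.x = 5.0

# Frozen dataclass instances reject it
try:
    FrozenPoint(1.0, 2.0).x = 5.0
    frozen_is_mutable = True
except FrozenInstanceError:
    frozen_is_mutable = False

print(tuple_is_mutable, point.x, frozen_is_mutable)
# False 5.0 False
```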

Building a Class with Named Tuples

The collections.namedtuple function is a factory that builds subclasses of tuple enhanced with field names, a class name, and an informative __repr__.

from collections import namedtuple

# Create the class
City = namedtuple('City', 'name country population coordinates')

# Create the instance
tokyo = City('Tokyo', 'JP', 36.933, (35.689722, 139.691667))

print(tokyo)
# City(name='Tokyo', country='JP', population=36.933, coordinates=(35.689722, 139.691667))

print(tokyo.coordinates)
# (35.689722, 139.691667)

As a tuple subclass, the City class inherits useful methods such as __eq__ and the special methods used for comparison, like __lt__. The informative __repr__ is generated by namedtuple itself.
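A short sketch of that tuple behavior, reusing the City class; the same_tokyo and delhi instances are added here only for the comparison.

```python
from collections import namedtuple

City = namedtuple('City', 'name country population coordinates')

tokyo = City('Tokyo', 'JP', 36.933, (35.689722, 139.691667))
same_tokyo = City('Tokyo', 'JP', 36.933, (35.689722, 139.691667))
delhi = City('Delhi NCR', 'IN', 21.935, (28.613889, 77.208889))

# Equality compares field by field, like any tuple
print(tokyo == same_tokyo)
# True

# Ordering is also field by field: 'Delhi NCR' sorts before 'Tokyo'
print(delhi < tokyo)
# True

# Indexing and unpacking work as with a plain tuple
print(tokyo[0])
# Tokyo
name, country, population, coordinates = tokyo
print(country)
# JP
```

One consequence worth knowing: a named tuple also compares equal to a plain tuple with the same values, because the comparison is inherited from tuple.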

As a namedtuple, the class also gets extra attributes and methods, such as the _fields class attribute, the _make(iterable) class method, and the _asdict() instance method.

Class attributes and class methods belong to the class and are shared by all instances. Instance attributes are not: they are specific to each instance.

from collections import namedtuple

# Create the class
City = namedtuple('City', 'name country population coordinates')

print(City._fields)
# ('name', 'country', 'population', 'coordinates')

# Create Coordinate class
Coordinate = namedtuple('Coordinate', 'lat lon')

# Create Tuple
delhi_data = ('Delhi NCR', 'IN', 21.935, Coordinate(28.613889, 77.208889))

# Create Instance from Tuple
delhi = City._make(delhi_data)

# Print a dictionary that is ready to be serialized to JSON
print(delhi._asdict())
# {'name': 'Delhi NCR', 'country': 'IN', 'population': 21.935, 'coordinates': Coordinate(lat=28.613889, lon=77.208889)}
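As a hedged follow-up, here is what serializing that dictionary with the standard json module could look like. Note that _asdict() is not recursive, so the nested Coordinate is still a named tuple, and json writes it as a plain array, dropping the lat and lon names.

```python
import json
from collections import namedtuple

City = namedtuple('City', 'name country population coordinates')
Coordinate = namedtuple('Coordinate', 'lat lon')

delhi = City._make(('Delhi NCR', 'IN', 21.935, Coordinate(28.613889, 77.208889)))

# Serialize the dict produced by _asdict()
payload = json.dumps(delhi._asdict())

# Read it back to see what survived the round trip
restored = json.loads(payload)
print(restored['name'])
# Delhi NCR
print(restored['coordinates'])
# [28.613889, 77.208889]
```

If the field names of the nested tuple matter, one option is to call _asdict() on the nested value as well before serializing.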

Building a Class with Typed Named Tuples

A typed named tuple is a named tuple with a type hint on every field. typing.NamedTuple also supports the regular class statement syntax:

from typing import NamedTuple

class Coordinate(NamedTuple):
    lat: float
    lon: float

Python by default does not enforce any type hints and there is no impact on the runtime behavior of your apps, so these are mostly for documentation purposes.

The type hints are intended primarily to support third-party type checkers, like Mypy or any IDE’s type checker.
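A small sketch shows the lack of runtime enforcement: the constructor accepts values that contradict the annotations without complaint. Only a static type checker would flag the call.

```python
from typing import NamedTuple


class Coordinate(NamedTuple):
    lat: float
    lon: float


# Neither argument is a float, yet no error is raised at runtime
wrong = Coordinate('not a float', None)
print(wrong)
# Coordinate(lat='not a float', lon=None)
```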

Building a Class with the Data Class Decorator

The dataclasses module provides the @dataclass decorator and helper functions for automatically adding generated special methods such as __init__.

We can pass it several keyword arguments, each of which controls whether a group of special methods is generated. Here's a selection of these arguments:

The list of arguments passed to @dataclass, provided in the book Fluent Python
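As a rough illustration of how these arguments change the generated class, the sketch below combines order=True and frozen=True on a made-up Version class. eq is left at its default of True.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(order=True, frozen=True)
class Version:
    major: int
    minor: int


old = Version(3, 9)
new = Version(3, 12)

# eq=True (the default) generates __eq__
print(old == Version(3, 9))
# True

# order=True generates __lt__, __le__, __gt__ and __ge__,
# comparing the fields in declaration order
print(old < new)
# True

# frozen=True together with eq=True makes instances hashable
print(len({old, Version(3, 9)}))
# 1

# frozen=True also blocks attribute assignment
try:
    old.major = 4
    was_blocked = False
except FrozenInstanceError:
    was_blocked = True
print(was_blocked)
# True
```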

A few points to keep in mind in your day-to-day code:

  • Class attributes are the equivalent of static attributes.

  • Instance attributes are specific to each instance.

  • If you give one field a default value, every field declared after it must also have a default, because Python does not allow parameters without defaults after parameters with defaults.

  • @dataclass rejects fields with a mutable default value such as a list, dict, or set, because a default shared by every instance is easily mutated by accident, which is a common source of bugs. So this class will be rejected:

from dataclasses import dataclass, field

# Rejected at class-definition time because of the mutable default
@dataclass
class Team:
    team_name: str
    members: list = []

# ValueError: mutable default <class 'list'> for field members is not allowed: use default_factory

# The solution is to use default_factory
@dataclass
class Team:
    team_name: str
    members: list = field(default_factory=list)

# Each instance gets its own members list instead of all instances
# sharing the same list from the class,
# which is rarely what we want and is often a bug
  • @dataclass does not generate a __post_init__ method. If you need validation or computed values after __init__ runs, you have to provide __post_init__ yourself; the generated __init__ calls it when it exists.

  • In a @dataclass, an annotated attribute becomes an instance field, so an annotation such as set[…] cannot by itself declare a typed class attribute. To do that, import the pseudotype typing.ClassVar, which uses the generics [] notation to set the type of the variable and also marks it as a class attribute.

from dataclasses import dataclass
from typing import ClassVar

# ClubMember is not defined in this article; a minimal stand-in
# with a name field is assumed here so the example runs.
@dataclass
class ClubMember:
    name: str

@dataclass
class HackerClubMember(ClubMember):
    all_handles: ClassVar[set[str]] = set()
    handle: str = ''

    def __post_init__(self):
        cls = self.__class__
        if self.handle == '':
            self.handle = self.name.split()[0]
        if self.handle in cls.all_handles:
            msg = f'handle {self.handle!r} already exists.'
            raise ValueError(msg)
        cls.all_handles.add(self.handle)

# all_handles is a class attribute of type set-of-str, with an empty set as its default value
  • If you need to pass an argument to __init__ that is not a field, for example to forward it to __post_init__, annotate it with another pseudotype, dataclasses.InitVar, which uses the same syntax as typing.ClassVar.
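from dataclasses import dataclass, InitVar

# DatabaseType is a hypothetical stand-in for a real database client
class DatabaseType:
    def lookup(self, key):
        return 42

@dataclass
class DbHandler:
    i: int
    j: int = None
    database: InitVar[DatabaseType] = None

    def __post_init__(self, database):
        if self.j is None and database is not None:
            self.j = database.lookup('j')

my_database = DatabaseType()
c = DbHandler(10, database=my_database)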
  • Finally, the @dataclass decorator does not care about the types in the annotations, except in the two cases just discussed: typing.ClassVar and dataclasses.InitVar.
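Several of the points above can be seen together in one sketch. It extends the Team example with a hypothetical max_size field (not part of the original class) to show field ordering, per-instance default lists, and validation in __post_init__.

```python
from dataclasses import dataclass, field


@dataclass
class Team:
    team_name: str                                # no default, so it comes first
    members: list = field(default_factory=list)   # fresh list for each instance
    max_size: int = 5                             # hypothetical field for this sketch

    def __post_init__(self):
        # Called by the generated __init__ after all fields are assigned
        if len(self.members) > self.max_size:
            raise ValueError(
                f'{self.team_name} has more than {self.max_size} members')


red = Team('Red')
blue = Team('Blue')
red.members.append('Wednesday')
print(blue.members)
# []

# Validation in __post_init__ rejects an oversized team
try:
    Team('Tiny', ['a', 'b'], max_size=1)
    validation_error = None
except ValueError as exc:
    validation_error = str(exc)
print(validation_error)
# Tiny has more than 1 members

# A field without a default after a field with one is rejected
# when the class is defined
try:
    @dataclass
    class Broken:
        name: str = 'x'
        size: int
    ordering_rejected = False
except TypeError:
    ordering_rejected = True
print(ordering_rejected)
# True
```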

Conclusion

Data classes have always been a cornerstone of OOP, and we must choose the right implementation to avoid subtle bugs or performance issues down the line. I hope this has been informative, but just in case, I am attaching some resources that could help your Python journey.

Further Reading
