3 Ways to Build Data Classes in Python
A Quick Guide to Data Class Builders in Python
Data classes appear in every OOP language. They are classes that contain only fields and the CRUD methods for accessing them, so they act as data holders, but they can certainly be more!
Martin Fowler wrote in his book Refactoring: Improving the Design of Existing Code:
- Data classes are like children. They are okay as a starting point, but to participate as a grownup object, they need to take some responsibility.
How to Build a Data Class?
Python offers multiple ways to build a data class, which is a collection of fields and methods. We will cover three of the data class builders:
Named Tuples through collections.namedtuple
Typed Named Tuples through typing.NamedTuple, which is a Named Tuple but with type hints for its fields.
The @dataclasses.dataclass decorator
The classic way to build a class in OOP languages such as Java is to define everything in it explicitly, which in Python translates to boilerplate code in the __init__ method.
class Family:
    def __init__(self, mother, father, daughter):
        self.mother = mother
        self.father = father
        self.daughter = daughter

adams_family = Family('Morticia Addams', 'Gomez Addams', 'Wednesday Addams')
print(adams_family)
# <__main__.Family object at 0x107142f10>
Now let's implement it the Python way!
Using Named Tuple from the Collections Package
from collections import namedtuple

Family = namedtuple('Family', 'mother father daughter')
adams_family = Family('Morticia Addams', 'Gomez Addams', 'Wednesday Addams')
print(adams_family)
# Family(mother='Morticia Addams', father='Gomez Addams', daughter='Wednesday Addams')
Using Named Tuple from the Typing Package
We can either implement it like this:
import typing

Family = typing.NamedTuple('Family', [('mother', str), ('father', str), ('daughter', str)])
adams_family = Family('Morticia Addams', 'Gomez Addams', 'Wednesday Addams')
print(adams_family)
# Family(mother='Morticia Addams', father='Gomez Addams', daughter='Wednesday Addams')
Or implement a class inheriting from NamedTuple, such as:
from typing import NamedTuple

class Family(NamedTuple):
    mother: str
    father: str
    daughter: str

    def __str__(self):
        return f'The Father is {self.father}, the mother is called {self.mother} and the daughter is {self.daughter}'

adams_family = Family('Morticia Addams', 'Gomez Addams', 'Wednesday Addams')
print(adams_family)
# The Father is Gomez Addams, the mother is called Morticia Addams and the daughter is Wednesday Addams
Using the Data Class Decorator
from dataclasses import dataclass

@dataclass(frozen=True)
class Family:
    mother: str
    father: str
    daughter: str

    def __str__(self):
        return f'The Father is {self.father}, the mother is called {self.mother} and the daughter is {self.daughter}'

adams_family = Family('Morticia Addams', 'Gomez Addams', 'Wednesday Addams')
print(adams_family)
# The Father is Gomez Addams, the mother is called Morticia Addams and the daughter is Wednesday Addams
Notice how the @dataclass decorator does not depend on inheritance or a metaclass so it should not interfere with our business logic.
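To make the difference with the hand-written class concrete, here is a minimal sketch comparing the two. The names PlainFamily and DataFamily are made up for this illustration so that both versions can live side by side.

```python
from dataclasses import dataclass


class PlainFamily:
    """Hand-written class: only __init__ is defined."""

    def __init__(self, mother, father, daughter):
        self.mother = mother
        self.father = father
        self.daughter = daughter


@dataclass(frozen=True)
class DataFamily:
    """Same fields; __init__, __repr__ and __eq__ are generated."""

    mother: str
    father: str
    daughter: str


names = ('Morticia Addams', 'Gomez Addams', 'Wednesday Addams')

# Plain objects fall back to identity comparison.
plain_equal = PlainFamily(*names) == PlainFamily(*names)

# Data class instances compare field by field.
data_equal = DataFamily(*names) == DataFamily(*names)

print(plain_equal, data_equal)
# False True
```

The decorator only adds methods to the class it receives, which is why the class body stays free for our own logic.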
namedtuple vs NamedTuple vs dataclass
The classes built by typing.NamedTuple and @dataclass have an __annotations__ attribute holding the type hints for the fields, accessible with inspect.get_annotations(MyClass) or typing.get_type_hints(MyClass).
Instances built with collections.namedtuple and typing.NamedTuple are immutable since tuples are immutable, whereas @dataclass instances are mutable unless frozen is set to True.
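Both differences can be checked in a few lines. This is only a sketch: the four point classes below are hypothetical names invented for the comparison.

```python
from collections import namedtuple
from dataclasses import dataclass, FrozenInstanceError
from typing import NamedTuple, get_type_hints

PlainPoint = namedtuple('PlainPoint', 'x y')


class TypedPoint(NamedTuple):
    x: float
    y: float


@dataclass
class MutablePoint:
    x: float
    y: float


@dataclass(frozen=True)
class FrozenPoint:
    x: float
    y: float


# Type hints are recorded for the typed builders.
typed_hints = get_type_hints(TypedPoint)
data_hints = get_type_hints(MutablePoint)
print(typed_hints)
print(data_hints)

# Named tuples reject attribute assignment.
immutable = []
for point in (PlainPoint(1.0, 2.0), TypedPoint(1.0, 2.0)):
    try:
        point.x = 5.0
    except AttributeError:
        immutable.append(type(point).__name__)
print(immutable)

# A default data class is mutable...
p = MutablePoint(1.0, 2.0)
p.x = 5.0

# ...unless frozen=True is passed.
frozen_rejected = False
try:
    FrozenPoint(1.0, 2.0).x = 5.0
except FrozenInstanceError:
    frozen_rejected = True
print(p, frozen_rejected)
```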
Building a Class with Named Tuples
The collections.namedtuple function is a factory that builds subclasses of tuple enhanced with field names, a class name, and an informative __repr__.
from collections import namedtuple
# Create the class
City = namedtuple('City', 'name country population coordinates')
# Create the instance
tokyo = City('Tokyo', 'JP', 36.933, (35.689722, 139.691667))
print(tokyo)
# City(name='Tokyo', country='JP', population=36.933, coordinates=(35.689722, 139.691667))
print(tokyo.coordinates)
# (35.689722, 139.691667)
As a tuple subclass, the City class inherits useful methods like __eq__ and even the special methods used for comparison like __lt__, while namedtuple itself supplies the informative __repr__.
As a namedtuple, it also gives us access to extra attributes and methods such as the _fields class attribute, the _make(iterable) class method, and the _asdict() instance method.
Class attributes and class methods belong to the class and are shared by all instances, whereas instance attributes hold values specific to one instance and instance methods operate on that instance.
from collections import namedtuple
# Create the class
City = namedtuple('City', 'name country population coordinates')
print(City._fields)
# ('name', 'country', 'population', 'coordinates')
# Create the Coordinate class
Coordinate = namedtuple('Coordinate', 'lat lon')
# Create Tuple
delhi_data = ('Delhi NCR', 'IN', 21.935, Coordinate(28.613889, 77.208889))
# Create Instance from Tuple
delhi = City._make(delhi_data)
# Print a dictionary that is ready to be serialized to JSON
print(delhi._asdict())
# {'name': 'Delhi NCR', 'country': 'IN', 'population': 21.935, 'coordinates': Coordinate(lat=28.613889, lon=77.208889)}
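The comparison methods inherited from tuple and the JSON-friendliness of _asdict() can be tried out with the same City and Coordinate classes. This is a small sketch; the variable names same_as_tuple, delhi_first and as_json are only for illustration.

```python
import json
from collections import namedtuple

Coordinate = namedtuple('Coordinate', 'lat lon')
City = namedtuple('City', 'name country population coordinates')

delhi = City('Delhi NCR', 'IN', 21.935, Coordinate(28.613889, 77.208889))
tokyo = City('Tokyo', 'JP', 36.933, Coordinate(35.689722, 139.691667))

# __eq__ is inherited from tuple: equal items mean equal instances,
# and a named tuple even equals a plain tuple with the same items.
same_as_tuple = delhi == ('Delhi NCR', 'IN', 21.935, (28.613889, 77.208889))

# __lt__ is inherited too: comparison goes item by item, starting with name.
delhi_first = delhi < tokyo

print(same_as_tuple, delhi_first)
# True True

# _asdict() returns a dict; json writes the nested Coordinate as a list
# because it is still a tuple.
as_json = json.dumps(delhi._asdict())
print(as_json)
```

Note that the item-by-item ordering is rarely meaningful for records like these, so it is worth knowing it exists mainly to avoid relying on it by accident.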
Building a Class with Typed Named Tuples
A typed named tuple is a named tuple with type hints for its fields, declared with the regular class statement syntax.
from typing import NamedTuple

class Coordinate(NamedTuple):
    lat: float
    lon: float
Python does not enforce type hints, and they have no impact on the runtime behavior of your apps, so they serve mostly as documentation.
The type hints are intended primarily to support third-party type checkers, like Mypy or any IDE’s type checker.
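A quick way to see that the hints are not enforced is to pass values of the wrong type on purpose. The variable name odd is just for this sketch.

```python
from typing import NamedTuple


class Coordinate(NamedTuple):
    lat: float
    lon: float


# Neither argument is a float, yet no error is raised at runtime.
odd = Coordinate('not a number', None)
print(odd)
# Coordinate(lat='not a number', lon=None)
```

A type checker such as Mypy would flag this call, but the interpreter runs it without complaint.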
Building a Class with the @dataclass Decorator
The dataclasses module provides the @dataclass decorator and functions for automatically adding generated special methods such as __init__.
We can pass it several keyword arguments, each controlling whether the equivalent method or behavior is generated. A selection of these arguments includes init, repr, eq, order, and frozen.
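As an illustration of how such arguments change the generated class, here is a minimal sketch using order and frozen. The Version class is a hypothetical example, not something from the standard library.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(order=True, frozen=True)
class Version:
    major: int
    minor: int


v1 = Version(1, 4)
v2 = Version(2, 0)

# order=True generates __lt__, __le__, __gt__ and __ge__,
# comparing instances as if they were tuples of their fields.
print(v1 < v2)
# True
print(sorted([v2, v1]))
# [Version(major=1, minor=4), Version(major=2, minor=0)]

# frozen=True makes assignment raise FrozenInstanceError.
try:
    v1.major = 3
except FrozenInstanceError:
    print('Version instances are read-only')

# eq=True (the default) together with frozen=True makes instances hashable.
print(len({Version(1, 4), Version(1, 4)}))
# 1
```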
A few points to keep in mind in your day-to-day code:
- Class attributes are the equivalent of static attributes.
- Instance attributes are specific to each instance.
- If you provide a default value for one field, you have to provide defaults for all the fields that follow it, as Python does not allow parameters without defaults after parameters with defaults.
- @dataclass will reject any field whose default value is mutable, since a default shared between instances is easily mutated by accident and is a common source of bugs, so this class will be rejected:
from dataclasses import dataclass, field

@dataclass
class Team:
    team_name: str
    members: list = []
# ValueError: mutable default <class 'list'> for field members is not allowed

# The solution is to use default_factory
@dataclass
class Team:
    team_name: str
    members: list = field(default_factory=list)
# Each instance will have its own members list instead of all instances
# sharing the same list from the class,
# which is rarely what we want and is often a bug
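Two of the points above, the ordering rule for defaults and the effect of default_factory, can be verified with a short sketch. BadTeam is a deliberately broken class made up for this example.

```python
from dataclasses import dataclass, field

# A field without a default cannot follow a field with one; the decorator
# raises TypeError when the class is created.
ordering_error = None
try:
    @dataclass
    class BadTeam:
        team_name: str = 'unnamed'
        members: list  # no default after a default
except TypeError as exc:
    ordering_error = str(exc)
    print('TypeError:', ordering_error)


@dataclass
class Team:
    team_name: str
    members: list = field(default_factory=list)


red = Team('red')
blue = Team('blue')
red.members.append('Ada')

# default_factory ran once per instance, so the lists are independent.
print(red.members, blue.members)
# ['Ada'] []
```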
- @dataclass does not generate a __post_init__ method, so if you need any validation or computation after __init__ runs, you need to provide it yourself.
- A field annotated with a plain type such as set[…] turns into an instance attribute. To declare a typed class attribute instead, you have to import a pseudotype named typing.ClassVar, which uses the generics [] notation to set the type of the variable and also declare it a class attribute.
from dataclasses import dataclass
from typing import ClassVar

# ClubMember is assumed to be a dataclass with a name field, defined elsewhere
@dataclass
class HackerClubMember(ClubMember):
    all_handles: ClassVar[set[str]] = set()
    handle: str = ''

    def __post_init__(self):
        cls = self.__class__
        if self.handle == '':
            self.handle = self.name.split()[0]
        if self.handle in cls.all_handles:
            msg = f'handle {self.handle!r} already exists.'
            raise ValueError(msg)
        cls.all_handles.add(self.handle)
# all_handles is a class attribute of type set-of-str, with an empty set as its default value
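Since ClubMember is not defined in this post, here is a self-contained sketch of the same idea that can be run as is. Member and HackerMember are hypothetical stand-ins, with Member reduced to a single name field; the set[str] annotation assumes Python 3.9 or later.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass
class Member:
    """Hypothetical stand-in for ClubMember, reduced to one field."""

    name: str


@dataclass
class HackerMember(Member):
    all_handles: ClassVar[set[str]] = set()  # shared by the whole class
    handle: str = ''

    def __post_init__(self):
        cls = self.__class__
        if self.handle == '':
            self.handle = self.name.split()[0]
        if self.handle in cls.all_handles:
            raise ValueError(f'handle {self.handle!r} already exists.')
        cls.all_handles.add(self.handle)


ada = HackerMember('Ada Lovelace')
print(ada)
# HackerMember(name='Ada Lovelace', handle='Ada')

duplicate_error = None
try:
    HackerMember('Ada Byron')
except ValueError as exc:
    duplicate_error = str(exc)
    print(duplicate_error)
# handle 'Ada' already exists.

print(HackerMember.all_handles)
# {'Ada'}
```

The ClassVar attribute does not show up in the generated __repr__ or __init__, which is a quick hint that it is not treated as a field.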
- If you want to pass an argument to __init__ only so that __post_init__ can use it, without storing it as a field, you have to import another pseudotype named InitVar from dataclasses, which uses the same syntax as typing.ClassVar.
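from dataclasses import dataclass, InitVar

# DatabaseType and my_database are assumed to be defined elsewhere
@dataclass
class DbHandler:
    i: int
    j: int = None
    database: InitVar[DatabaseType] = None

    def __post_init__(self, database):
        if self.j is None and database is not None:
            self.j = database.lookup('j')

c = DbHandler(10, database=my_database)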
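To run the InitVar example without a real database, one option is a small fake. FakeDatabase and its lookup method are hypothetical, written only so that this sketch is self-contained.

```python
from dataclasses import dataclass, InitVar
from typing import Optional


class FakeDatabase:
    """Hypothetical stand-in for a real database client."""

    def lookup(self, key: str) -> int:
        return {'j': 42}[key]


@dataclass
class DbHandler:
    i: int
    j: Optional[int] = None
    database: InitVar[Optional[FakeDatabase]] = None

    def __post_init__(self, database):
        # database is received here as an argument, not read from self.
        if self.j is None and database is not None:
            self.j = database.lookup('j')


handler = DbHandler(10, database=FakeDatabase())
print(handler)
# DbHandler(i=10, j=42)
```

The database argument is handed to __post_init__ and then dropped, so it does not appear in the generated __repr__.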
- Finally, the @dataclass decorator doesn’t care about the types in the annotations, except in the two cases just discussed: typing.ClassVar and InitVar.
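One way to observe this, sketched here with a made-up Example class, is to ask dataclasses.fields() which names were turned into real fields, and then pass a value that contradicts the annotation.

```python
from dataclasses import dataclass, fields, InitVar
from typing import ClassVar


@dataclass
class Example:
    count: int                     # regular field
    registry: ClassVar[dict] = {}  # class attribute, not a field
    scale: InitVar[int] = 1        # init-only argument, not a field

    def __post_init__(self, scale):
        self.count *= scale


# Only the regular field is reported.
field_names = [f.name for f in fields(Example)]
print(field_names)
# ['count']

# The int annotation on count is not checked: a string is accepted
# and repeated by scale like any other sequence.
e = Example('ab', scale=2)
print(e)
# Example(count='abab')
```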
Conclusion
Data classes have always been a cornerstone of OOP, and we must choose the right implementation to avoid subtle bugs or performance issues down the line. I hope this has been informative. In case you want to go further, I am attaching some resources that could help your Python journey.
Further Reading
The full article on my Blog discusses Data classes in more detail.
The official dataclasses documentation
Chapter 5 of the book Fluent Python