Implementing Data Classes in Python3

Mayank Saraswat
Nov 8, 2020

I use Python frequently in my day-to-day work, and more often than not I find it unnecessarily irritating to write all the boilerplate code just to implement a data class that keeps track of a few attributes. So I did a bit of digging and found some alternative ways to do the same thing.

1. Standard Class Definition

Say you want to implement a data class with attributes attr1 and attr2, along with some dunder methods. (The remaining items in this list try to imitate the same class by other means.)
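The class itself is not reproduced in the text, so here is a minimal sketch of what such a definition might look like. The choice of `__repr__` and `__eq__` as the dunder methods is an assumption; the original may have implemented others.

```python
class MyClass:
    """Plain class holding two attributes, with hand-written dunder methods."""

    def __init__(self, attr1, attr2):
        self.attr1 = attr1
        self.attr2 = attr2

    def __repr__(self):
        # Readable representation for debugging and logging.
        return f"MyClass(attr1={self.attr1!r}, attr2={self.attr2!r})"

    def __eq__(self, other):
        # Two instances are equal when all of their attributes match.
        if not isinstance(other, MyClass):
            return NotImplemented
        return (self.attr1, self.attr2) == (other.attr1, other.attr2)


my_object = MyClass('attr_value1', 'attr_value2')
print(my_object)  # MyClass(attr1='attr_value1', attr2='attr_value2')
```

Every new attribute has to be added in three places here (`__init__`, `__repr__`, `__eq__`), which is the boilerplate the other approaches try to avoid.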

2. Using Tuple:

my_object = ('attr_value1', 'attr_value2')

This implementation is as quick as it gets, but it shifts a lot of responsibility to the programmer, who has to remember the order of the attributes. As the number of attributes grows, the code gets harder to maintain. Another issue with tuples is immutability, i.e., the programmer can't update the object once it has been initialized.
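A short sketch of both drawbacks, positional access and immutability. The variable names `attr1`, `attr2`, `error` and `updated` are only illustrative.

```python
my_object = ('attr_value1', 'attr_value2')

# Attributes are only reachable by position, so the reader has to know
# that index 0 means attr1 and index 1 means attr2.
attr1 = my_object[0]
attr2 = my_object[1]

# Item assignment is rejected because tuples are immutable.
try:
    my_object[0] = 'new_value'
except TypeError as exc:
    error = str(exc)

# An "update" therefore means building a new tuple.
updated = ('new_value',) + my_object[1:]
```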

3. Using Dictionary:

my_object = dict(attr1='attr_value1', attr2='attr_value2')

This implementation is more explicit: one doesn't need to remember the order and can access an attribute by its name, i.e., my_object['attr1']. It does, however, put the burden of keeping attribute names consistent back on the programmer. The data structure is mutable, which helps if you want the object's state to change as the program runs.
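To illustrate the name-consistency problem, here is a small sketch. The misspelled key `'atr1'` and the missing key `'attr3'` are made up for the example.

```python
my_object = dict(attr1='attr_value1', attr2='attr_value2')

# Access is by name rather than by position.
value = my_object['attr1']

# The dictionary is mutable, so values can be updated in place.
my_object['attr2'] = 'new_value'

# Nothing checks key names: a typo on assignment silently adds a new key...
my_object['atr1'] = 'oops'
keys = sorted(my_object)

# ...and a typo on lookup only fails at runtime, with a KeyError.
try:
    my_object['attr3']
    missing = False
except KeyError:
    missing = True
```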

4. Using NamedTuples:

from collections import namedtuple
MyClass = namedtuple('MyClass', ['attr1', 'attr2'])
my_object = MyClass('attr_value1', 'attr_value2')

Named tuples, as the name suggests, add the ability to assign names to the tuple elements. One can now access a value just as one would access a class property, i.e., my_object.attr1. A named tuple is an extension of the tuple data type, so by nature it has certain side effects that you might not want for your class, e.g., immutability and iterability. Two named tuples with the same attribute values compare equal, even if they represent different entities (or are instances of different named-tuple types).
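The side effects mentioned above can be seen in a short sketch. `OtherClass` is a hypothetical second type, introduced only to show the equality behavior.

```python
from collections import namedtuple

MyClass = namedtuple('MyClass', ['attr1', 'attr2'])
OtherClass = namedtuple('OtherClass', ['x', 'y'])  # hypothetical unrelated type

my_object = MyClass('attr_value1', 'attr_value2')

# Fields are available by name and, as with any tuple, by position.
by_name = my_object.attr1
by_index = my_object[0]

# Iterability: the object unpacks like a plain tuple.
first, second = my_object

# Immutability: attribute assignment raises AttributeError.
try:
    my_object.attr1 = 'new_value'
    immutable = False
except AttributeError:
    immutable = True

# _replace returns a modified copy instead of changing the original.
updated = my_object._replace(attr1='new_value')

# Equality is plain tuple equality, so unrelated types compare equal.
same_as_other_type = my_object == OtherClass('attr_value1', 'attr_value2')
same_as_plain_tuple = my_object == ('attr_value1', 'attr_value2')
```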

5. Using DataClasses:

With PEP 557, dataclasses were introduced into the Python standard library (in Python 3.7). So the same class can now be defined without any boilerplate code. One can customize this behavior even further. Refer: https://docs.python.org/3.7/library/dataclasses.html
Since dataclasses are a comparatively recent addition to the standard library, you might find some packages that don't have proper support for them.
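A minimal sketch of the same class as a dataclass. The `str` annotations are an assumption about the attribute types, and `FrozenClass` is a hypothetical variant added only to show one of the customization options.

```python
from dataclasses import dataclass


@dataclass
class MyClass:
    # __init__, __repr__ and __eq__ are generated from these annotations.
    attr1: str
    attr2: str


my_object = MyClass('attr_value1', 'attr_value2')
print(my_object)  # MyClass(attr1='attr_value1', attr2='attr_value2')

# Instances are mutable by default.
my_object.attr2 = 'new_value'


@dataclass(frozen=True)
class FrozenClass:  # hypothetical variant: immutable, with a default value
    attr1: str
    attr2: str = 'default'


frozen_object = FrozenClass('attr_value1')
```

With `frozen=True`, assigning to a field raises `dataclasses.FrozenInstanceError`, which gives tuple-like immutability without losing named attributes.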

6. Using attrs:

At the time of writing, attrs has better support for different Python versions (even Python 2.7) and distributions. The dataclasses implementation is intentionally kept simple, so it leaves out a number of features that attrs provides; common examples include validators, converters and __slots__.
One downside(?) of attrs is that it is not shipped with the standard library, so it has to be installed separately. (The decision was taken so that the Python release cycle doesn't impede the development of the package.)
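A minimal sketch of the three features named above, assuming attrs is installed (`pip install attrs`) and using its classic `attr.s` / `attr.ib` API. The specific validator and converter chosen here are only illustrative.

```python
import attr


@attr.s(slots=True)  # generate __slots__ instead of a per-instance __dict__
class MyClass:
    # Validator: reject anything that is not a str.
    attr1 = attr.ib(validator=attr.validators.instance_of(str))
    # Converter: coerce the incoming value to int before storing it.
    attr2 = attr.ib(converter=int)


my_object = MyClass('attr_value1', '42')
converted = my_object.attr2  # stored as the int 42, not the string '42'

# The validator runs in __init__ and raises TypeError on a bad value.
try:
    MyClass(1, 2)
    rejected = False
except TypeError:
    rejected = True
```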

To conclude, no one method is better than the others; it depends entirely on the kind of behavior one wants from the implementation. I generally end up using dataclasses, given the ease and speed of the implementation.
