With PEP 557, data classes were introduced into the Python standard library.
They make use of the @dataclass decorator and are supposed to be "mutable namedtuples with defaults", but I'm not sure I understand what this actually means or how they differ from ordinary classes.
What exactly are Python data classes, and when is it best to use them?
Data classes are just regular classes that are geared towards storing state rather than containing a lot of logic. Every time you create a class that mostly consists of attributes, you have made a data class.
What the dataclasses module does is make it easier to create data classes. It takes care of a lot of boilerplate for you.
This is especially important when your data class must be hashable; this requires a __hash__ method as well as an __eq__ method. If you add a custom __repr__ method for ease of debugging, that can become quite verbose:
class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def __init__(
        self,
        name: str,
        unit_price: float,
        quantity_on_hand: int = 0
    ) -> None:
        self.name = name
        self.unit_price = unit_price
        self.quantity_on_hand = quantity_on_hand

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand

    def __repr__(self) -> str:
        return (
            'InventoryItem('
            f'name={self.name!r}, unit_price={self.unit_price!r}, '
            f'quantity_on_hand={self.quantity_on_hand!r})'
        )

    def __hash__(self) -> int:
        return hash((self.name, self.unit_price, self.quantity_on_hand))

    def __eq__(self, other) -> bool:
        if not isinstance(other, InventoryItem):
            return NotImplemented
        return (
            (self.name, self.unit_price, self.quantity_on_hand) ==
            (other.name, other.unit_price, other.quantity_on_hand))
With dataclasses you can reduce it to:
from dataclasses import dataclass

@dataclass(unsafe_hash=True)
class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand
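To check that the decorated class really behaves like the handwritten one, here is a small sketch that exercises the generated __init__, __repr__, __eq__ and __hash__ methods. The sample values (widget, gadget) are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(unsafe_hash=True)
class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand


# Illustrative values only.
widget = InventoryItem('widget', 2.5, 4)
same_widget = InventoryItem('widget', 2.5, 4)

print(widget)  # InventoryItem(name='widget', unit_price=2.5, quantity_on_hand=4)
print(widget == same_widget)              # True: generated __eq__ compares fields
print(hash(widget) == hash(same_widget))  # True: generated __hash__ uses the same fields
print(widget.total_cost())                # 10.0
print(InventoryItem('gadget', 1.0).quantity_on_hand)  # 0: the default is applied
```

Note that unsafe_hash=True hashes a mutable object, so changing a field after the instance has been put in a set or used as a dict key will break lookups.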
The same class decorator can also generate comparison methods (__lt__, __gt__, etc.) and handle immutability.
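As a rough sketch of those two options: order=True generates the ordering methods, comparing fields in definition order as if they were a tuple, and frozen=True makes assignment to a field raise FrozenInstanceError. The Version class is a hypothetical example, not something from the PEP.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(order=True, frozen=True)
class Version:
    # Hypothetical example class; fields are compared in definition order.
    major: int
    minor: int = 0


releases = [Version(3, 7), Version(2, 7), Version(3)]

# order=True supplies __lt__, __le__, __gt__ and __ge__, so sorting works.
print(sorted(releases))
# [Version(major=2, minor=7), Version(major=3, minor=0), Version(major=3, minor=7)]

# frozen=True blocks attribute assignment after construction.
try:
    releases[0].minor = 8
    mutated = True
except FrozenInstanceError:
    mutated = False
print(mutated)  # False
```

Because the class is both frozen and comparable for equality, the decorator also generates a __hash__ method, so instances can safely be used as dict keys without unsafe_hash=True.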
namedtuple classes are also data classes, but are immutable by default (as well as being sequences). dataclasses are much more flexible in this regard, and can easily be structured so that they fill the same role as a namedtuple class.
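A minimal side-by-side sketch of that difference, using a hypothetical two-field point type: the frozen data class covers the immutable-record role of the namedtuple, but it is not a sequence, so unpacking and tuple comparison need an explicit conversion.

```python
from collections import namedtuple
from dataclasses import astuple, dataclass, replace

# Hypothetical example types for comparison.
PointTuple = namedtuple('PointTuple', ['x', 'y'])


@dataclass(frozen=True)
class Point:
    x: int
    y: int


pt = PointTuple(1, 2)
p = Point(1, 2)

x, y = pt             # a namedtuple is a sequence and unpacks directly
x2, y2 = astuple(p)   # a data class is not; convert it explicitly

print(pt == (1, 2))   # True: a namedtuple equals a plain tuple with the same items
print(p == (1, 2))    # False: a data class only equals instances of its own class

moved = replace(p, x=10)  # plays the same role as pt._replace(x=10)
print(moved)              # Point(x=10, y=2)
```

Dropping frozen=True gives the mutable variant, which is where the "mutable namedtuple" description comes from.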
The PEP was inspired by the attrs project, which can do even more (including slots, validators, converters, metadata, etc.).
If you want to see some examples, I recently used dataclasses for several of my Advent of Code solutions; see the solutions for day 7, day 8, day 11 and day 20.
If you want to use the dataclasses module in Python versions < 3.7, you can install the backported module (requires Python 3.6) or use the attrs project mentioned above.