Setting default/empty attributes for user classes in __init__

Andy · Apr 22, 2019 · Viewed 14k times

I have a decent level of programming experience, and get much value from the community here. However, I have never had much academic teaching in programming, nor have I worked next to really experienced programmers. Consequently, I sometimes struggle with 'best practice'.

I can't find a better place for this question, and am posting this despite the likely flamers who hate this sort of question. So sorry if this upsets you. I am just trying to learn, not piss you off.

Question:

When I am creating a new class, should I set all instance attributes in __init__, even if they are None and are in fact only assigned values later, in class methods?

See example below for the attribute results of MyClass:

class MyClass:
    def __init__(self, df):
        self.df = df
        self.results = None

    def set_results(self, df_results):
        # Imagine some calculations here or something
        self.results = df_results

I have found in other projects that instance attributes can get buried when they only appear in class methods and there is a lot going on.

So, to an experienced professional programmer, what is standard practice for this? Would you define all instance attributes in __init__ for readability?

And if anyone has any links to materials covering such principles, then please put them in an answer; it will be much appreciated. I know about PEP-8 and have already searched for my question above several times, and can't find anyone touching on this.

Thanks

Andy

Answer

jferard · Apr 23, 2019

I think you should avoid both solutions, simply because you should avoid creating uninitialized or partially initialized objects, except in one case I will outline later.

Look at two slightly modified versions of your class, with a setter and a getter:

class MyClass1:
    def __init__(self, df):
        self.df = df
        self.results = None

    def set_results(self, df_results):
        self.results = df_results

    def get_results(self):
        return self.results

And

class MyClass2:
    def __init__(self, df):
        self.df = df

    def set_results(self, df_results):
        self.results = df_results

    def get_results(self):
        return self.results

The only difference between MyClass1 and MyClass2 is that the first one initializes results in the constructor while the second does it in set_results. Here comes the user of your class (usually you, but not always). Everyone knows you can't trust the user (even if it's you):

MyClass1("df").get_results()
# returns None

Or

MyClass2("df").get_results()
# Traceback (most recent call last):
# ...
# AttributeError: 'MyClass2' object has no attribute 'results'

You might think that the first case is better because it does not fail, but I do not agree. I would like the program to fail fast in this case, rather than go through a long debugging session to find out what happened. Hence, the first part of my answer is: do not set the uninitialized fields to None, because you lose a fail-fast hint.

But that's not the whole answer. Whichever version you choose, you have an issue: the object was used when it shouldn't have been, because it was not fully initialized. You can add a docstring to get_results: """Always use set_results **BEFORE** this method""". Unfortunately, the user doesn't read docstrings either.

You have two main reasons for uninitialized fields in your object: 1. you don't know (for now) the value of the field; 2. you want to avoid an expensive operation (computation, file access, network, ...), aka "lazy initialization". Both situations arise in the real world, and both collide with the need to use only fully initialized objects.
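For the second reason, one option in Python (not something this answer prescribes, just a common idiom) is functools.cached_property, available since Python 3.8. The sketch below assumes that results can be derived from df alone; the doubling is a hypothetical stand-in for the expensive work.

```python
from functools import cached_property


class MyClass:
    def __init__(self, df):
        self.df = df  # known immediately

    @cached_property
    def results(self):
        # Hypothetical stand-in for an expensive computation.
        # Runs on first access only; the value is then cached on the instance.
        return [x * 2 for x in self.df]


obj = MyClass([1, 2, 3])
print(obj.results)  # computed on first access
print(obj.results)  # served from the cache, not recomputed
```

This only covers the lazy case, and only when the value can be computed from data the object already holds. It does not help when the value is simply not known yet.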

Happily, there is a well-documented solution to this problem: Design Patterns, and more precisely Creational patterns. In your case, the Factory pattern or the Builder pattern might be the answer. E.g.:

class MyClassBuilder:
    def __init__(self, df):
        self._df = df  # df is known immediately
        # give a default value to other fields if possible

    def results(self, df_results):
        self._results = df_results
        return self  # for fluent style

    # ... other field initializers

    def build(self):
        # pass the other fields too, once they exist
        return MyClass(self._df, self._results)

class MyClass:
    def __init__(self, df, results):  # ... plus the other fields
        self.df = df
        self.results = results
        # ...

    def get_results(self):
        return self.results

    # ... other getters

(You can use a Factory too, but I find the Builder more flexible). Let's give a second chance to the user:

>>> b = MyClassBuilder("df").build()
Traceback (most recent call last):
...
AttributeError: 'MyClassBuilder' object has no attribute '_results'
>>> b = MyClassBuilder("df")
>>> b.results("r")
<__main__.MyClassBuilder object at ...>
>>> # ... other fields initialization
>>> x = b.build()
>>> x
<__main__.MyClass object at ...>
>>> x.get_results()
'r'

The advantages are clear:

  1. It's easier to detect and fix a creation failure than a late-use failure;
  2. You do not release into the wild an uninitialized (and thus potentially damaging) version of your object.

The presence of uninitialized fields in the Builder is not a contradiction: those fields are uninitialized by design, because the Builder's role is to initialize them. (Actually, those fields are some kind of foreign fields to the Builder.) This is the case I was talking about in my introduction. They should, in my mind, be set to a default value (if one exists) or left uninitialized, so that an exception is raised if you try to create an incomplete object.
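A possible variation on the Builder above, sketched under my own assumptions rather than taken from the answer: instead of relying on the AttributeError, the builder can track "never set" with a private sentinel and raise an explicit error in build(). The _MISSING name and the error message are hypothetical.

```python
_MISSING = object()  # hypothetical sentinel: "never set", distinct from a real None


class MyClass:
    def __init__(self, df, results):
        self.df = df
        self.results = results


class MyClassBuilder:
    def __init__(self, df):
        self._df = df
        self._results = _MISSING  # unset by design until results() is called

    def results(self, df_results):
        self._results = df_results
        return self  # for fluent style

    def build(self):
        # Refuse to create an incomplete object, with a readable message.
        if self._results is _MISSING:
            raise ValueError("results must be set before build()")
        return MyClass(self._df, self._results)


try:
    MyClassBuilder("df").build()
except ValueError as exc:
    print(exc)

x = MyClassBuilder("df").results("r").build()
print(x.results)
```

The trade-off is a little more code in the builder in exchange for an error message that names the missing field. The failure still happens at creation time, not at use time.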

Second part of my answer: use a Creational pattern to ensure the object is correctly initialized.
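For comparison, the Factory alternative mentioned earlier could be as small as a classmethod that does the work first and only then calls the constructor. This is a minimal sketch: from_df and compute_results are hypothetical names, and compute_results stands in for whatever calculation produces the results.

```python
def compute_results(df):
    # Hypothetical stand-in for the real (possibly expensive) calculation.
    return len(df)


class MyClass:
    def __init__(self, df, results):
        self.df = df
        self.results = results

    @classmethod
    def from_df(cls, df):
        # Compute everything first, then hand a complete set of values
        # to __init__, so no half-initialized instance ever exists.
        return cls(df, compute_results(df))

    def get_results(self):
        return self.results


x = MyClass.from_df("df")
print(x.get_results())
```

This fits when every field can be obtained in one step. When fields arrive at different times, the Builder is probably the better match.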

Side note: I'm very suspicious when I see a class with getters and setters. My rule of thumb is: always try to separate them because when they meet, objects become unstable.