I have a decent level of programming, and get much value from the community here. However I have never had much academic teaching in programming nor worked next to really experienced programmers. Consequently I sometimes struggle with 'best practice'.
I can't find a better place for this question, and am posting this despite the likely flamers that hate this sort of question. So sorry if this upsets you. I am just trying to learn, not piss you off.
Question:
When I am creating a new class, should I set all instance attributes in __init__, even if they start as None and are only assigned real values later in other methods?
See example below for the attribute results of MyClass:
class MyClass:
    def __init__(self, df):
        self.df = df
        self.results = None
    def set_results(self, df_results):
        # Imagine some calculations here or something
        self.results = df_results
I have found in other projects that instance attributes can get buried when they only appear in methods and there is a lot going on.
So to an experienced professional programmer, what is standard practice for this? Would you define all instance attributes in __init__ for readability?
And if anyone has any links to materials where I can find such principles, then please put them in an answer; it will be much appreciated. I know about PEP-8 and have already searched for my question above several times, and can't find anyone touching on this.
Thanks
Andy
I think you should avoid both solutions, simply because you should avoid creating uninitialized or partially initialized objects, except in one case I will outline later.
Look at two slightly modified versions of your class, with a setter and a getter:
class MyClass1:
    def __init__(self, df):
        self.df = df
        self.results = None
    def set_results(self, df_results):
        self.results = df_results
    def get_results(self):
        return self.results
And
class MyClass2:
    def __init__(self, df):
        self.df = df
    def set_results(self, df_results):
        self.results = df_results
    def get_results(self):
        return self.results
The only difference between MyClass1 and MyClass2 is that the first one initializes results in the constructor while the second does it in set_results. Here comes the user of your class (usually you, but not always). Everyone knows you can't trust the user (even if it's you):
MyClass1("df").get_results()
# returns None
Or
MyClass2("df").get_results()
# Traceback (most recent call last):
# ...
# AttributeError: 'MyClass2' object has no attribute 'results'
You might think that the first case is better because it does not fail, but I do not agree. I would like the program to fail fast in this case, rather than go through a long debugging session to find out what happened. Hence, the first part of my answer is: do not set the uninitialized fields to None, because you lose a fail-fast hint.
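To make the fail-fast argument concrete, here is a minimal sketch. The names WithNone, WithoutDefault and first_error are hypothetical and only mirror MyClass1 and MyClass2 above; the point is where each version raises once the caller actually uses the results.

```python
class WithNone:
    """Like MyClass1: results defaults to None in the constructor."""

    def __init__(self, df):
        self.df = df
        self.results = None

    def get_results(self):
        return self.results


class WithoutDefault:
    """Like MyClass2: results only exists once it has been set."""

    def __init__(self, df):
        self.df = df

    def get_results(self):
        return self.results


def first_error(obj):
    """Return the name of the exception raised when the results are used."""
    try:
        results = obj.get_results()  # WithoutDefault fails here, at the source
        return len(results)          # WithNone fails only here, one step later
    except (AttributeError, TypeError) as exc:
        return type(exc).__name__
```

With the None default, the failure surfaces as a TypeError in whatever code first uses the value, which may be far from the missing set_results call. Without it, the AttributeError points directly at the missing attribute.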
But that's not the whole answer. Whichever version you choose, you have an issue: the object was used when it shouldn't have been, because it was not fully initialized. You can add a docstring to get_results: """Always use set_results **BEFORE** this method""". Unfortunately the user doesn't read docstrings either.
You have two main reasons for uninitialized fields in your object: 1. you don't know (for now) the value of the field; 2. you want to avoid an expensive operation (computation, file access, network, ...), aka "lazy initialization". Both situations occur in the real world, and both collide with the need to use only fully initialized objects.
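For readers unfamiliar with the term, the second situation (lazy initialization) might look like the sketch below. This is only an illustration, not something taken from the question: LazyReport, compute_calls and the sorted() stand-in are hypothetical, and functools.cached_property assumes Python 3.8 or later.

```python
from functools import cached_property


class LazyReport:
    """Hypothetical class whose results are computed on first access only."""

    def __init__(self, df):
        self.df = df
        self.compute_calls = 0  # only here to show the computation runs once

    @cached_property
    def results(self):
        # Stand-in for an expensive operation (computation, file, network).
        self.compute_calls += 1
        return sorted(self.df)
```

Nothing is computed when the object is created; the first access to results runs the computation and stores the value, and later accesses reuse it.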
Happily, there is a well documented solution to this problem: Design Patterns, and more precisely Creational patterns. In your case, the Factory pattern or the Builder pattern might be the answer. E.g.:
class MyClassBuilder:
    def __init__(self, df):
        self._df = df  # df is known immediately
        # give a default value to other fields if possible
    def results(self, df_results):
        self._results = df_results
        return self  # for fluent style
    # ... other field initializers
    def build(self):
        return MyClass(self._df, self._results)  # ... plus the other fields
class MyClass:
    def __init__(self, df, results):  # ... plus the other fields
        self.df = df
        self.results = results
        # ... other fields
    def get_results(self):
        return self.results
    # ... other getters
(You can use a Factory too, but I find the Builder more flexible). Let's give a second chance to the user:
>>> b = MyClassBuilder("df").build()
Traceback (most recent call last):
...
AttributeError: 'MyClassBuilder' object has no attribute '_results'
>>> b = MyClassBuilder("df")
>>> b.results("r")
>>> # ... other fields initialization
>>> x = b.build()
>>> x
<__main__.MyClass object at ...>
>>> x.get_results()
'r'
The advantages are clear:
The presence of uninitialized fields in the Builder is not a contradiction: those fields are uninitialized by design, because the Builder's role is to initialize them. (Actually, those fields are some kind of foreign fields to the Builder.) This is the case I was talking about in my introduction. They should, in my mind, be set to a default value (if one exists) or left uninitialized, so that an exception is raised if you try to create an incomplete object.
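As a sketch of that last point, the builder below gives one field a default and leaves the other unset on purpose. Report, ReportBuilder and the title field are hypothetical names introduced only for illustration.

```python
class Report:
    """Hypothetical target class: the constructor requires every field."""

    def __init__(self, df, results, title):
        self.df = df
        self.results = results
        self.title = title


class ReportBuilder:
    def __init__(self, df):
        self._df = df
        self._title = "untitled"  # a sensible default exists, so set it
        # _results has no sensible default: deliberately left unset

    def results(self, df_results):
        self._results = df_results
        return self  # for fluent style

    def title(self, title):
        self._title = title
        return self

    def build(self):
        # Reading self._results raises AttributeError if results() was never called.
        return Report(self._df, self._results, self._title)
```

Calling ReportBuilder("df").results("r").build() yields a complete Report with the default title, while ReportBuilder("df").build() fails immediately, before any incomplete Report exists.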
Second part of my answer: use a Creational pattern to ensure the object is correctly initialized.
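The Factory alternative mentioned earlier could be sketched, in one simple form, as a classmethod that does the calculation first and only then constructs the object. The name from_df and the doubling step are hypothetical stand-ins for whatever the real calculation is.

```python
class MyClass:
    def __init__(self, df, results):
        self.df = df
        self.results = results

    @classmethod
    def from_df(cls, df):
        """Hypothetical factory: compute the results, then build a complete object."""
        results = [x * 2 for x in df]  # stand-in for the real calculation
        return cls(df, results)

    def get_results(self):
        return self.results
```

A caller writes MyClass.from_df(df) and never sees an instance without results, at the cost of less flexibility than the Builder when many fields are involved.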
Side note: I'm very suspicious when I see a class with getters and setters. My rule of thumb is: always try to separate them because when they meet, objects become unstable.