Why does Python's __import__ require fromlist?

ieure · Apr 27, 2010

In Python, if you want to programmatically import a module, you can do:

module = __import__('module_name')

If you want to import a submodule, you would think it would be a simple matter of:

module = __import__('module_name.submodule')

Of course, this doesn't work; you just get module_name again. You have to do:

module = __import__('module_name.submodule', fromlist=['blah'])

Why? The actual value of fromlist doesn't seem to matter at all, as long as it's non-empty. What is the point of requiring an argument, then ignoring its values?

Most stuff in Python seems to be done for good reason, but for the life of me, I can't come up with any reasonable explanation for this behavior to exist.
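To make the behaviour concrete, here is a small sketch that reproduces it with a standard-library package. The choice of json.decoder is only for illustration (it is not part of the original question); any package with a submodule should behave the same way.

```python
import json.decoder  # make sure the submodule is already loaded

# Dotted name, no fromlist: the left-most package comes back.
top = __import__('json.decoder')

# Dotted name, non-empty fromlist: the right-most module comes back,
# even though 'blah' is not a real name inside json.decoder.
sub_a = __import__('json.decoder', fromlist=['blah'])
sub_b = __import__('json.decoder', fromlist=['JSONDecoder'])

print(top.__name__)    # json
print(sub_a.__name__)  # json.decoder
print(sub_b is sub_a)  # True
```

The bogus entry 'blah' and the real entry 'JSONDecoder' give the same result here, which is exactly the puzzle the question describes.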

Answer

Thomas Wouters · Apr 28, 2010

In fact, the behaviour of __import__() is entirely a consequence of how the import statement, which calls __import__(), is implemented. There are basically five slightly different ways __import__() can be called by import (in two main categories):

import pkg
import pkg.mod
from pkg import mod, mod2
from pkg.mod import func, func2
from pkg.mod import submod

In the first and the second case, the import statement should assign the "left-most" module object to the "left-most" name: pkg. After import pkg.mod you can do pkg.mod.func() because the import statement introduced the local name pkg, which is a module object that has a mod attribute. So, the __import__() function has to return the "left-most" module object so it can be assigned to pkg. Those two import statements thus translate into:

pkg = __import__('pkg')
pkg = __import__('pkg.mod')
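As a rough sketch of the two translations above, the following builds a throwaway package on disk and calls __import__() with a dotted name. The package name demo_left, its mod.py and the func() inside it are hypothetical, made up purely for this illustration.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package: demo_left/__init__.py and demo_left/mod.py.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, 'demo_left')
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, '__init__.py'), 'w') as f:
    f.write('')
with open(os.path.join(pkg_dir, 'mod.py'), 'w') as f:
    f.write('def func():\n    return "called func"\n')
sys.path.insert(0, root)
importlib.invalidate_caches()

# Roughly what "import demo_left.mod" does: bind the left-most module.
demo_left = __import__('demo_left.mod')

print(demo_left.__name__)      # demo_left
print(demo_left.mod.__name__)  # demo_left.mod
print(demo_left.mod.func())    # called func
```

The submodule was still imported; it is simply reached through the mod attribute of the returned package rather than being the return value itself.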

In the third, fourth and fifth cases, the import statement has to do more work: it has to assign to (potentially) multiple names, which it has to get from the module object. The __import__() function can only return one object, and there's no real reason to make it retrieve each of those names from the module object (doing so would make the implementation a lot more complicated). So the simple approach would be something like (for the third case):

tmp = __import__('pkg')
mod = tmp.mod
mod2 = tmp.mod2

However, that won't work if pkg is a package and mod or mod2 are modules in that package that are not already imported, as they are in the third and fifth cases. The __import__() function needs to know that mod and mod2 are names that the import statement will want to have accessible, so that it can see if they are modules and try to import them too. So the call is closer to:

tmp = __import__('pkg', fromlist=['mod', 'mod2'])
mod = tmp.mod
mod2 = tmp.mod2

which causes __import__() to try to load pkg.mod and pkg.mod2 as well as pkg (if mod or mod2 doesn't exist, that is not an error in the __import__() call; producing an error is left to the import statement). But that still isn't the right thing for the fourth and fifth examples, because if the call were:

tmp = __import__('pkg.mod', fromlist=['submod'])
submod = tmp.submod

then tmp would end up being pkg, as before, and not the pkg.mod module you want to get the submod attribute from. The implementation could have decided to make it so the import statement does extra work, splitting the package name on . like the __import__() function already does and traversing the names, but this would have meant duplicating some of the effort. So, instead, the implementation made __import__() return the right-most module instead of the left-most one if and only if fromlist is passed and not empty.
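The fromlist handling described above can be checked with a small experiment. This is only a sketch: the package demo_from, its module mod, the subpackage sub and the name 'missing' are all hypothetical and exist only in a temporary directory created by the snippet.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()

def write(relpath, text=''):
    """Create a file (and its parent directories) under the temp root."""
    path = os.path.join(root, relpath)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, 'w') as f:
        f.write(text)

write('demo_from/__init__.py')
write('demo_from/mod.py', 'VALUE = 1\n')
write('demo_from/sub/__init__.py')
write('demo_from/sub/submod.py', 'VALUE = 2\n')
sys.path.insert(0, root)
importlib.invalidate_caches()

# No fromlist: the package is imported, its submodules are not.
pkg_only = __import__('demo_from')
had_mod_before = hasattr(pkg_only, 'mod')
print(had_mod_before)  # False

# Naming 'mod' in fromlist makes __import__() load demo_from.mod as well.
# 'missing' does not exist, yet the call itself raises nothing.
with_from = __import__('demo_from', fromlist=['mod', 'missing'])
print(with_from.mod.VALUE)            # 1
print(hasattr(with_from, 'missing'))  # False

# Dotted name plus non-empty fromlist: the right-most module is returned.
rightmost = __import__('demo_from.sub', fromlist=['submod'])
print(rightmost.__name__)      # demo_from.sub
print(rightmost.submod.VALUE)  # 2

# Same dotted name without fromlist: back to the left-most module.
leftmost = __import__('demo_from.sub')
print(leftmost.__name__)  # demo_from
```

Note that the unknown name only goes unreported at the __import__() level; a real "from demo_from import missing" statement would raise ImportError when it tries to fetch the attribute.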

(The import pkg as p and from pkg import mod as m syntax doesn't change anything about this story except which local names get assigned to -- the __import__() function sees nothing different when as is used, it all remains in the import statement implementation.)
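One way to see this for yourself is to temporarily wrap builtins.__import__ and record what the import statement passes in. The sketch below assumes a hypothetical throwaway package demo_as with a module mod; the wrapper recording_import is likewise just an illustration, not something the import system provides.

```python
import builtins
import importlib
import os
import sys
import tempfile

# Throwaway package: demo_as/__init__.py and demo_as/mod.py, both empty.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, 'demo_as'))
for filename in ('__init__.py', 'mod.py'):
    with open(os.path.join(root, 'demo_as', filename), 'w') as f:
        f.write('')
sys.path.insert(0, root)
importlib.invalidate_caches()

calls = []
original_import = builtins.__import__

def recording_import(name, globals=None, locals=None, fromlist=(), level=0):
    # Record only calls for the demo package; everything else passes through.
    if name.startswith('demo_as'):
        calls.append((name, tuple(fromlist or ())))
    return original_import(name, globals, locals, fromlist, level)

statements = [
    'import demo_as.mod',
    'import demo_as.mod as m',
    'from demo_as import mod',
    'from demo_as import mod as m',
]

builtins.__import__ = recording_import
try:
    for statement in statements:
        exec(statement, {})  # run each import in a fresh namespace
finally:
    builtins.__import__ = original_import  # always restore the real function

for statement, call in zip(statements, calls):
    print(statement, '->', call)
```

The recorded (name, fromlist) pairs are identical with and without "as": ('demo_as.mod', ()) for the two plain imports and ('demo_as', ('mod',)) for the two from-imports.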