`pickle`: yet another `ImportError: No module named my_module`

user89 · Nov 28, 2015 · Viewed 18.2k times

I have a class MyClass defined in my_module. MyClass has a method pickle_myself which pickles the instance of the class in question:

def pickle_myself(self, pkl_file_path):
    with open(pkl_file_path, 'w+') as f:
        pkl.dump(self, f, protocol=2)

I have made sure that my_module is in PYTHONPATH. In the interpreter, executing __import__('my_module') works fine:

>>> __import__('my_module')
<module 'my_module' from 'A:\my_stuff\my_module.pyc'>

However, when eventually loading the file, I get:

File "A:\Anaconda\lib\pickle.py", line 1128, in find_class
  __import__(module)
ImportError: No module named my_module

Some things I have made sure of:


EDIT -- A toy example that reproduces the error:

The example itself is spread over a bunch of files.

First, we have the module ball (stored in a file called ball.py):

class Ball():
    def __init__(self, ball_radius):
        self.ball_radius = ball_radius

    def say_hello(self):
        print "Hi, I'm a ball with radius {}!".format(self.ball_radius)

Then, we have the module test_environment:

import os
import ball
#import dill as pkl
import pickle as pkl

class Environment():
    def __init__(self, store_dir, num_balls, default_ball_radius):
        self.store_dir = store_dir
        self.balls_in_environment = [ball.Ball(default_ball_radius) for x in range(num_balls)]

    def persist(self):
        pkl_file_path = os.path.join(self.store_dir, "test_stored_env.pkl")

        with open(pkl_file_path, 'w+') as f:
            pkl.dump(self, f, protocol=2)

Then, we have a module that has functions to make environments, persist them, and load them, called make_persist_load:

import os
import test_environment
#import pickle as pkl
import dill as pkl


def make_env_and_persist():
    cwd = os.getcwd()

    my_env = test_environment.Environment(cwd, 5, 5)

    my_env.persist()

def load_env(store_path):
    stored_env = None

    with open(store_path, 'rb') as pkl_f:
        stored_env = pkl.load(pkl_f)

    return stored_env

Then we have a script to put it all together, in test_serialization.py:

import os
import make_persist_load

MAKE_AND_PERSIST = True
LOAD = (not MAKE_AND_PERSIST)

cwd = os.getcwd()
store_path = os.path.join(cwd, "test_stored_env.pkl")

if MAKE_AND_PERSIST == True:
    make_persist_load.make_env_and_persist()

if LOAD == True:
    loaded_env = make_persist_load.load_env(store_path)

To make this toy example easy to use, I have put it all up in a GitHub repository that simply needs to be cloned into your directory of choice. Please see the README containing instructions, which I also reproduce here:

Instructions:

1) Clone repository into a directory.

2) Add repository directory to PYTHONPATH.

3) Open up test_serialization.py, and set the variable MAKE_AND_PERSIST to True. Run the script in an interpreter.

4) Close the previous interpreter instance, and start up a new one. In test_serialization.py, change MAKE_AND_PERSIST to False, which programmatically sets LOAD to True. Run the script in an interpreter; it fails with ImportError: No module named test_environment.

5) By default, the test is set to use dill, instead of pickle. In order to change this, go into test_environment.py and make_persist_load.py, to change imports as required.


EDIT: after switching to dill '0.2.5.dev0', dill.detect.trace(True) gives the following output:

C2: test_environment.Environment
# C2
D2: <dict object at 0x000000000A9BDAE8>
C2: ball.Ball
# C2
D2: <dict object at 0x000000000AA25048>
# D2
D2: <dict object at 0x000000000AA25268>
# D2
D2: <dict object at 0x000000000A9BD598>
# D2
D2: <dict object at 0x000000000A9BD9D8>
# D2
D2: <dict object at 0x000000000A9B0BF8>
# D2
# D2

EDIT: the toy example works perfectly well when run on Mac/Ubuntu (i.e. Unix-like systems?). It only fails on Windows.

Answer

Mike McKerns · Nov 28, 2015

I can tell from your question that you are probably doing something like this, with a method on the class that attempts to pickle the instance itself. That design is ill-advised; it is much saner to call pkl.dump from outside the class instead (where pkl is pickle, dill, etc.). However, it can still work with this design, see below:

>>> class Thing(object):
...   def pickle_myself(self, pkl_file_path):
...     with open(pkl_file_path, 'w+') as f:
...       pkl.dump(self, f, protocol=2)
... 
>>> import dill as pkl
>>> 
>>> t = Thing()
>>> t.pickle_myself('foo.pkl')

Then restarting...

Python 2.7.10 (default, Sep  2 2015, 17:36:25) 
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> f = open('foo.pkl', 'r')
>>> t = dill.load(f)
>>> t
<__main__.Thing object at 0x1060ff410>

If you have a much more complicated class, which I'm sure you do, then you are likely to run into trouble, especially if that class uses another file that is sitting in the same directory.

>>> import dill
>>> from bar import Zap
>>> print dill.source.getsource(Zap)
class Zap(object):
    x = 1
    def __init__(self, y):
        self.y = y

>>> 
>>> class Thing2(Zap):   
...   def pickle_myself(self, pkl_file_path):
...     with open(pkl_file_path, 'w+') as f:
...       dill.dump(self, f, protocol=2)
... 
>>> t = Thing2(2)
>>> t.pickle_myself('foo2.pkl')

Then restarting…

Python 2.7.10 (default, Sep  2 2015, 17:36:25) 
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> f = open('foo2.pkl', 'r')
>>> t = dill.load(f)
>>> t
<__main__.Thing2 object at 0x10eca8090>
>>> t.y
2
>>> 

Well… shoot, that works too. You'll have to post your code, so we can see what pattern you are using that dill (and pickle) fails for. I know that having one module import another that is not "installed" (i.e. sitting in some local directory) and expecting the serialization to "just work" does not work in all cases.

See dill issues: https://github.com/uqfoundation/dill/issues/128 https://github.com/uqfoundation/dill/issues/129 and this SO question: Why dill dumps external classes by reference, no matter what? for some examples of failure and potential workarounds.
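To illustrate what "by reference" means here, the following is a minimal sketch, not taken from the linked issues. It uses the standard library's fractions.Fraction purely as a stand-in for any class that lives in an importable module (that choice is my assumption, made so the snippet runs anywhere). The point is that the pickle records only the module name and the class name, so loading it requires that module to be importable under exactly that name.

```python
import pickle
import pickletools
from fractions import Fraction  # stand-in for a class defined in its own module

# Protocol 2, as in the question.
payload = pickle.dumps(Fraction(1, 2), protocol=2)

# The pickle stores a reference to the class (module name + class name),
# not the class definition itself.
has_module_name = b"fractions" in payload
has_class_name = b"Fraction" in payload
has_class_body = b"__init__" in payload  # method names are not embedded

print(has_module_name)
print(has_class_name)
print(has_class_body)

# Disassemble the pickle to see the GLOBAL opcode that carries the reference.
pickletools.dis(payload)
```

When the pickle is loaded, the unpickler imports the recorded module name and looks the class up in it, which is where an ImportError such as the one in the question is raised if that name cannot be imported.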

EDIT with regard to updated question:

I don't see your issue. Running from the command line, importing from the interpreter (import test_serialization), and running the script in the interpreter (as below, and indicated in your steps 3-5) all work. That leads me to think you might be using an older version of dill?

>>> import os
>>> import make_persist_load
>>> 
>>> MAKE_AND_PERSIST = False #True
>>> LOAD = (not MAKE_AND_PERSIST)
>>> 
>>> cwd = os.getcwd()
>>> store_path = os.path.join(cwd, "test_stored_env.pkl")
>>> 
>>> if MAKE_AND_PERSIST == True:
...     make_persist_load.make_env_and_persist()
... 
>>> if LOAD == True:
...     loaded_env = make_persist_load.load_env(store_path)
... 
>>> 

EDIT based on discussion in comments:

Looks like it's probably an issue with Windows, as that seems to be the only OS where the error appears.

EDIT after some work (see: https://github.com/uqfoundation/dill/issues/140):

Using this minimal example, I can reproduce the same error on Windows, while on MacOSX it still works…

# test.py
class Environment():
    def __init__(self):
        pass

and

# doit.py
import test
import dill

env = test.Environment()
path = "test.pkl"
with open(path, 'w+') as f:
    dill.dump(env, f)

with open(path, 'rb') as _f:
    _env = dill.load(_f)
    print _env

However, if you use open(path, 'r') as _f, it works on both Windows and MacOSX. So it looks like the __import__ on Windows is more sensitive to file type than on non-Windows systems. Still, throwing an ImportError is weird… but this one small change should make it work.
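A possible explanation, offered as a sketch rather than something established in the linked issue: on Windows, a file opened in text mode ('w+') has every "\n" written as "\r\n", while 'rb' reads the bytes back untranslated. A protocol 2 pickle stores the module name followed by "\n", so the loader would then see a module name ending in a stray "\r". The snippet below simulates that translation with bytes.replace so it behaves the same on any OS, again using fractions.Fraction as a stand-in class; the helper names dump_to and load_from are hypothetical.

```python
import os
import pickle
import shutil
import tempfile
from fractions import Fraction  # stand-in for a class defined in its own module

original = Fraction(1, 2)
payload = pickle.dumps(original, protocol=2)

# Simulate a Windows text-mode write ('w+'): "\n" becomes "\r\n".
written_in_text_mode = payload.replace(b"\n", b"\r\n")

# Simulate a binary read ('rb'): the module name now carries a trailing "\r".
try:
    pickle.loads(written_in_text_mode)
    binary_read_error = None
except ImportError as exc:
    binary_read_error = exc

# Simulate a text-mode read ('r'): the translation is undone, so loading works.
read_in_text_mode = written_in_text_mode.replace(b"\r\n", b"\n")
recovered = pickle.loads(read_in_text_mode)


def dump_to(path, obj):
    """Write the pickle in binary mode so no newline translation happens."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=2)


def load_from(path):
    """Read the pickle in binary mode, matching how it was written."""
    with open(path, "rb") as f:
        return pickle.load(f)


tmp_dir = tempfile.mkdtemp()
try:
    pkl_path = os.path.join(tmp_dir, "test.pkl")
    dump_to(pkl_path, original)
    round_tripped = load_from(pkl_path)
finally:
    shutil.rmtree(tmp_dir)

print(repr(binary_read_error))
print(recovered == original)
print(round_tripped == original)
```

If that explanation holds, reading with 'r' works only because it reverses the earlier translation, and it could still mangle a payload that legitimately contains "\r\n". Opening the file with 'wb' for dumping and 'rb' for loading, as the pickle documentation recommends for binary protocols, avoids the mismatch on every platform and also keeps the dump call outside the class.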