I am trying to serialize thousands of objects, and some of these objects are lambdas. Since cPickle doesn't work for lambdas, I tried using dill. However, the drop in computational speed is more than 10x when unpickling (or "undilling"?). Looking through the source, it seems that dill uses pickle internally, which might be the reason for the speed drop.
Is there another option for me that combines the best of both modules?
EDIT: The most significant speed drop is during unpickling.
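For context, a minimal reproduction of what I mean (the exact pickle error message depends on the Python version):

import pickle  # cPickle in Python 2
import dill

f = lambda x: x + 1

try:
    pickle.dumps(f)            # plain pickle cannot serialize a lambda
except Exception as e:
    print("pickle failed:", e)

g = dill.loads(dill.dumps(f))  # dill round-trips it, but noticeably slower
print(g(1))                    # prints 2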
I'm the dill author. Yes, dill is typically slower, but that's the penalty you pay for more robust serialization. If you are serializing a lot of classes and functions, then you might want to try one of the dill variants in dill.settings. If you use byref=True, then dill will pickle several objects by reference (which is faster than the default). Other settings trade off picklability for speed on selected objects.
In [1]: import dill
In [2]: f = lambda x:x
In [3]: %timeit dill.loads(dill.dumps(f))
1000 loops, best of 3: 286 us per loop
In [4]: dill.settings['byref'] = True
In [5]: %timeit dill.loads(dill.dumps(f))
1000 loops, best of 3: 237 us per loop
In [6]: dill.settings
Out[6]: {'byref': True, 'fmode': 0, 'protocol': 2, 'recurse': False}
In [7]: dill.settings['recurse'] = True
In [8]: %timeit dill.loads(dill.dumps(f))
1000 loops, best of 3: 408 us per loop
In [9]: class Foo(object):
...: x = 1
...: def bar(self, y):
...: return y + self.x
...:
In [10]: g = Foo()
In [11]: %timeit dill.loads(dill.dumps(g))
10000 loops, best of 3: 87.6 us per loop
In [12]: dill.settings['recurse'] = False
In [13]: %timeit dill.loads(dill.dumps(g))
10000 loops, best of 3: 87.4 us per loop
In [14]: dill.settings['byref'] = False
In [15]: %timeit dill.loads(dill.dumps(g))
1000 loops, best of 3: 499 us per loop
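Depending on your dill version, these options can also be passed per call instead of being set globally; a minimal sketch (assuming dill.dumps accepts the byref/recurse keywords, as recent releases do):

import dill

f = lambda x: x

payload = dill.dumps(f, byref=True)  # pickle supporting objects by reference for speed
g = dill.loads(payload)
print(g(3))                          # prints 3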