Store a dictionary in a file for later retrieval

Ryflex picture Ryflex · Jun 26, 2013 · Viewed 13k times · Source

I've had a search around but can't find anything regarding this...

I'm looking for a way to save a dictionary to file and then later be able to load it back into a variable at a later date by reading the file.

The contents of the file don't have to be "human readable" it can be as messy as it wants.

Thanks - Hyflex

EDIT

import cPickle as pickle

BDICT = {}

## Automatically generated START
name = "BOB"
name_title = name.title()
count = 5
BDICT[name_title] = count

name = "TOM"
name_title = name.title()
count = 5
BDICT[name_title] = count

name = "TIMMY JOE"
name_title = name.title()
count = 5
BDICT[name_title] = count
## Automatically generated END

if BDICT:
    with open('DICT_ITEMS.txt', 'wb') as dict_items_save:
        pickle.dump(BDICT, dict_items_save)

BDICT = {} ## Wiping the dictionary

## Usually in a loop
firstrunDICT = True

if firstrunDICT:
    with open('DICT_ITEMS.txt', 'rb') as dict_items_open:
        dict_items_read = dict_items_open.read()
        if dict_items_read:
            BDICT = pickle.load(dict_items_open)
            firstrunDICT = False
            print BDICT

Error:

Traceback (most recent call last):
  File "C:\test3.py", line 35, in <module>
    BDICT = pickle.load(dict_items_open)
EOFError

Answer

Justin Carroll picture Justin Carroll · Jun 26, 2013

A few people have recommended shelve - I haven't used it, and I'm not knocking it. I have used pickle/cPickle and I'll offer the following approach:

How to use Pickle/cPickle (the abridged version)...

There are many reasons why you would use Pickle (or its noticable faster variant, cPickle). Put tersely Pickle is a way to store objects outside of your process.

Pickle not only gives you the options to store objects outside your python process, but also does so in a serialized fashion. Meaning, First In, First Out behavior (FIFO).

import pickle

## I am making up a dictionary here to show you how this works...
## Because I want to store this outside of this single run, it could be that this
## dictionary is dynamic and user based - so persistance beyond this run has
## meaning for me.  
myMadeUpDictionary = {"one": "banana", "two": "banana", "three": "banana", "four": "no-more"}

with open("mySavedDict.txt", "wb") as myFile:
    pickle.dump(myMadeUpDictionary, myFile)

So what just happened?

  • Step1: imported a module named 'pickle'
  • Step2: created my dictionary object
  • Step3: used a context manager to handle the opening/closing of a new file...
  • Step4: dump() the contents of the dictionary (which is referenced as 'pickling' the object) and then write it to a file (mySavedDict.txt).

If you then go into the file that was just created (located now on your filesystem), you can see the contents. It's messy - ugly - and not very insightlful.

nammer@crunchyQA:~/workspace/SandBox/POSTS/Pickle & cPickle$ cat mySavedDict.txt 
(dp0
S'four'
p1
S'no-more'
p2
sS'three'
p3
S'banana'
p4
sS'two'
p5
g4
sS'one'
p6
g4
s.

So what's next?

To bring that BACK into our program we simply do the following:

import pickle

with open("mySavedDict.txt", "rb") as myFile:
    myNewPulledInDictionary = pickle.load(myFile)

print myNewPulledInDictionary

Which provides the following return:

{'four': 'no-more', 'one': 'banana', 'three': 'banana', 'two': 'banana'}

cPickle vs Pickle

You won't see many people use pickle these days - I can't think off the top of my head why you would want to use the first implementation of pickle, especially when there is cPickle which does the same thing (more or less) but a lot faster!

So you can be lazy and do:

import cPickle as pickle

Which is great if you have something already built that uses pickle... but I argue that this is a bad recommendation and I fully expect to get scolded for even recommending that! (you should really look at your old implementation that used the original pickle and see if you need to change anything to follow cPickle patterns; if you have legacy code or production code you are working with, this saves you time refactoring (finding/replacing all instances of pickle with cPickle).

Otherwise, just:

import cPickle

and everywhere you see a reference to the pickle library, just replace accordingly. They have the same load() and dump() method.

Warning Warning I don't want to write this post any longer than it is, but I seem to have this painful memory of not making a distinction between load() and loads(), and dump() and dumps(). Damn... that was stupid of me! The short answer is that load()/dump() does it to a file-like object, wheres loads()/dumps() will perform similar behavior but to a string-like object (read more about it in the API, here).

Again, I haven't used shelve, but if it works for you (or others) - then yay!

RESPONSE TO YOUR EDIT

You need to remove the dict_items_read = dict_items_open.read() from your context-manager at the end. The file is already open and read in. You don't read it in like you would a text file to pull out strings... it's storing pickled python objects. It's not meant for eyes! It's meant for load().

Your code modified... works just fine for me (copy/paste and run the code below and see if it works). Notice near the bottom I've removed your read() of the file object.

import cPickle as pickle

BDICT = {}

## Automatically generated START
name = "BOB"
name_title = name.title()
count = 5
BDICT[name_title] = count

name = "TOM"
name_title = name.title()
count = 5
BDICT[name_title] = count

name = "TIMMY JOE"
name_title = name.title()
count = 5
BDICT[name_title] = count
## Automatically generated END

if BDICT:
    with open('DICT_ITEMS.txt', 'wb') as dict_items_save:
        pickle.dump(BDICT, dict_items_save)

BDICT = {} ## Wiping the dictionary

## Usually in a loop
firstrunDICT = True

if firstrunDICT:
    with open('DICT_ITEMS.txt', 'rb') as dict_items_open:
        BDICT = pickle.load(dict_items_open)
        firstrunDICT = False
        print BDICT