How to make a dictionary from a text file with Python

user2007220 · Jan 24, 2013 · Viewed 31.5k times

My file looks like this:

aaien 12 13 39
aan 10
aanbad 12 13 14 57 58 38
aanbaden 12 13 14 57 58 38
aanbeden 12 13 14 57 58 38
aanbid  12 13 14 57 58 39
aanbidden 12 13 14 57 58 39
aanbidt 12 13 14 57 58 39
aanblik 27 28
aanbreken 39
...

I want to make a dictionary where the key is the word (like 'aaien') and the value is a list of the numbers next to it. So it should look like this: {'aaien': ['12', '13', '39'], 'aan': ['10']}

This code doesn't seem to work.

document = open('LIWC_words.txt', 'r')
liwcwords = document.read()
dictliwc = {}
for line in liwcwords:
    k, v = line.strip().split(' ')
    answer[k.strip()] = v.strip()

liwcwords.close()

Python gives this error:

ValueError: need more than 1 value to unpack

Answer

Martijn Pieters · Jan 24, 2013

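You are splitting your line into a list of words, but unpacking it into only one key and one value. On top of that, document.read() returns the whole file as a single string, so the for loop iterates over individual characters rather than lines; splitting a single character yields just one element, which is what triggers the ValueError you see.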

This will work:

with open('LIWC_words.txt', 'r') as document:
    answer = {}
    for line in document:
        line = line.split()
        if not line:  # empty line?
            continue
        answer[line[0]] = line[1:]

Note that you don't need to give .split() an argument; without arguments it'll both split on whitespace and strip the results for you. That saves you having to explicitly call .strip().
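To make the difference concrete, here is a small sketch using the 'aanbid' line from the sample data, which has two spaces after the word. The variable names are illustrative only and are not part of the code above.

```python
# A sample line with two spaces after the key, as in the 'aanbid' row,
# plus the trailing newline that file iteration leaves in place.
line = "aanbid  12 13 14 57 58 39\n"

# Splitting on a single space keeps an empty string for the doubled space
# and leaves the newline attached to the last item.
by_space = line.split(' ')

# Splitting with no argument collapses runs of whitespace and drops the newline.
by_whitespace = line.split()

print(by_space)       # ['aanbid', '', '12', '13', '14', '57', '58', '39\n']
print(by_whitespace)  # ['aanbid', '12', '13', '14', '57', '58', '39']
```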

The alternative is to split only on the first whitespace:

with open('LIWC_words.txt', 'r') as document:
    answer = {}
    for line in document:
        if line.strip():  # non-empty line?
            key, value = line.split(None, 1)  # None means 'all whitespace', the default
            answer[key] = value.split()

The second argument to .split() limits the number of splits made, guaranteeing that at most 2 elements are returned, which makes it possible to unpack the result in the assignment to key and value.
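As a quick illustration of what the split limit does, this sketch applies it to one line of the sample data. The names key, value and numbers mirror the code above; the hard-coded line is just an assumption for demonstration.

```python
line = "aanbad 12 13 14 57 58 38\n"

# A limit of 1: split once on the first run of whitespace, keep the rest intact.
key, value = line.split(None, 1)
print(repr(key))    # 'aanbad'
print(repr(value))  # '12 13 14 57 58 38\n'

# A second, unrestricted split turns the remainder into the list of numbers
# and discards the trailing newline.
numbers = value.split()
print(numbers)      # ['12', '13', '14', '57', '58', '38']
```

Note that a line containing only a word and no numbers would produce a single element and still fail to unpack; the sample data always has at least one number per word.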

Either method results in:

{'aaien': ['12', '13', '39'],
 'aan': ['10'],
 'aanbad': ['12', '13', '14', '57', '58', '38'],
 'aanbaden': ['12', '13', '14', '57', '58', '38'],
 'aanbeden': ['12', '13', '14', '57', '58', '38'],
 'aanbid': ['12', '13', '14', '57', '58', '39'],
 'aanbidden': ['12', '13', '14', '57', '58', '39'],
 'aanbidt': ['12', '13', '14', '57', '58', '39'],
 'aanblik': ['27', '28'],
 'aanbreken': ['39']}
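If you want to check this without the real file, the following sketch runs the first method against a few of the sample lines held in memory. Using io.StringIO as a stand-in for LIWC_words.txt is an assumption made for the demonstration, and the int conversion at the end is an optional extra that is not part of the answer above.

```python
import io

# A few lines of the sample data (plus one blank line), held in memory
# in place of LIWC_words.txt.
sample = io.StringIO("aaien 12 13 39\naan 10\n\naanbad 12 13 14 57 58 38\n")

answer = {}
for line in sample:
    parts = line.split()
    if not parts:  # skip blank lines
        continue
    answer[parts[0]] = parts[1:]

print(answer)

# Optional: convert the number strings to integers.
as_ints = {word: [int(n) for n in nums] for word, nums in answer.items()}
print(as_ints)
```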

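If you still see only one key and the rest of the file as the (split) value, your input file is perhaps using a non-standard line separator. Python 3 already handles all common line endings in text mode by default (and removed the U flag in 3.11); on Python 2, open the file with universal line ending support by adding the U character to the mode: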

with open('LIWC_words.txt', 'rU') as document:
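On Python 3, plain 'r' mode should give the same behaviour. The sketch below writes a throwaway file that uses bare carriage returns as line separators and reads it back; the temporary file and its contents are made up for the demonstration.

```python
import os
import tempfile

# Write a small file that uses bare carriage returns ('\r') as line separators.
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(b"aaien 12 13 39\raan 10\r")
    path = tmp.name

# In Python 3, text mode with the default newline=None treats '\r', '\n'
# and '\r\n' all as line endings, so no 'U' flag is needed.
answer = {}
with open(path, 'r') as document:
    for line in document:
        parts = line.split()
        if parts:
            answer[parts[0]] = parts[1:]

os.remove(path)  # clean up the throwaway file
print(answer)    # {'aaien': ['12', '13', '39'], 'aan': ['10']}
```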