TypeError: list indices must be integers, not str (boolean conversion, actually)

RokiDGupta · Aug 3, 2016 · Viewed 63k times
import nltk
import random
from nltk.corpus import movie_reviews

documents=[(list(movie_reviews.words(fileid)),category)
           for category in movie_reviews.categories()
           for fileid in movie_reviews.fileids(category)]

random.shuffle(documents)
#print(documents[1])

all_words=[]

for w in movie_reviews.words():
    all_words.append(w.lower())

all_words=nltk.FreqDist(all_words)

word_features = list(all_words.keys())[:3000]

def find_features(document):
    words = set(document)
    features=[]
    for w in word_features:
        features[w]= (w in words)

    return features

print((find_features(movie_reviews.words('neg/cv000_29416.txt'))))

featuresets = [(find_features(rev), category) for (rev,category) in documents]

After running this, I get the following error:

features[w]= (w in words)
TypeError: list indices must be integers, not str

Please help me solve it.

Answer

Nickil Maveli · Aug 3, 2016

The only change needed is to initialize features as a dict ({}) rather than a list ([]); you can then populate its contents.

The TypeError occurs because features is a list, and you were trying to index it with the strings taken from word_features. Lists only accept integer (or slice) indices, not strings.
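
To see the problem in isolation, here is a minimal sketch that reproduces the error without NLTK. The short word list and the set below are made-up stand-ins for word_features and words, not the real movie_reviews data.

```python
# Hypothetical stand-ins for the real data in the question.
word_features = ["plot", "boring", "great"]
words = {"great", "plot"}

features = []  # a list, as in the question's code
error_message = None
try:
    for w in word_features:
        # w is a string, but a list can only be indexed by an integer or a slice
        features[w] = (w in words)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The exact wording of the message differs slightly between Python versions, but in each case it complains that list indices must be integers.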

features={}
for w in word_features:
    features[w] = (w in words)

Here, the elements of word_features become the keys of the dictionary features, and each value is a boolean: True if that word appears in words (which holds only unique items because of the set() call), and False otherwise.
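
Putting it together, the corrected function might look like the sketch below. It uses a small made-up review and word list instead of the movie_reviews corpus, and it passes word_features in as a parameter (an assumption made here so the example runs on its own; the question's code reads it from a global).

```python
def find_features(document, word_features):
    """Map each feature word to True/False depending on whether it occurs in document."""
    words = set(document)  # unique tokens; membership tests on a set are fast
    features = {}          # a dict, so string keys are allowed
    for w in word_features:
        features[w] = (w in words)
    return features


# Hypothetical sample data, standing in for the corpus in the question.
word_features = ["plot", "boring", "great"]
review = ["a", "great", "plot", "with", "a", "great", "cast"]

features = find_features(review, word_features)
print(features)

# The same mapping written as a dict comprehension.
review_words = set(review)
features_alt = {w: (w in review_words) for w in word_features}
```

Both forms build the same dictionary; the comprehension is just a more compact spelling of the loop.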