Generate bigrams with NLTK

Nikhil Raghavendra · Jun 6, 2016 · Viewed 45.9k times

I am trying to produce a list of bigrams from a given sentence. For example, if I type

    To be or not to be

I want the program to generate

     to be, be or, or not, not to, to be

I tried the following code, but it just gives me

    <generator object bigrams at 0x0000000009231360>

This is my code:

    import nltk
    bigrm = nltk.bigrams(text)
    print(bigrm)

So how do I get what I want? I want a list of combinations of the words like above (to be, be or, or not, not to, to be).

Answer

Ilja Everilä · Jun 6, 2016

nltk.bigrams() returns an iterator (specifically, a generator) of bigrams. If you want a list, pass the iterator to list(). It also expects a sequence of items to generate bigrams from, so you have to split the text into words before passing it, if you have not already done so:

    bigrm = list(nltk.bigrams(text.split()))
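
To see why the split matters, here is a minimal sketch that does not need NLTK. The helper `pairs` is a hypothetical stand-in I made up for illustration; it assumes that pairing adjacent items with `zip` mirrors what nltk.bigrams() yields for a simple sequence like this one. Passing the raw string pairs up characters, while passing the split text pairs up words:

```python
def pairs(seq):
    """Hypothetical stand-in for nltk.bigrams: lazily pair adjacent items."""
    items = list(seq)
    return zip(items, items[1:])

text = "to be or not to be"

# A string is a sequence of characters, so this pairs up characters.
char_pairs = list(pairs(text))

# After split(), the items are words, so this pairs up words.
word_pairs = list(pairs(text.split()))

print(char_pairs[:3])  # [('t', 'o'), ('o', ' '), (' ', 'b')]
print(word_pairs)
```

The second print shows the five word pairs from the question, each as a tuple.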

To print them separated by commas, you could do the following (in Python 3):

    print(*map(' '.join, bigrm), sep=', ')

On Python 2, you could use, for example:

    print ', '.join(' '.join((a, b)) for a, b in bigrm)

Note that if you only want to print the bigrams, you do not need to build a list; you can use the iterator directly.
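
One caveat when using the iterator directly: a generator can only be consumed once. The sketch below illustrates this with a hypothetical generator function `pairs`, written here as a stand-in so the example runs without NLTK; I am assuming it behaves like nltk.bigrams() in this respect, since both are plain Python generators.

```python
def pairs(words):
    """Hypothetical stand-in for nltk.bigrams: yield adjacent pairs one at a time."""
    previous = None
    for i, word in enumerate(words):
        if i > 0:
            yield (previous, word)
        previous = word

gen = pairs("to be or not to be".split())

# First pass: join straight from the generator, no list needed.
first_pass = ', '.join(' '.join(pair) for pair in gen)

# Second pass: the generator is already exhausted, so nothing is left.
second_pass = list(gen)

print(first_pass)   # to be, be or, or not, not to, to be
print(second_pass)  # []
```

So if you need the bigrams more than once, converting to a list first is the simpler choice.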