Find all locations / cities / places in a text

sardanes · May 10, 2015

If I have a text containing, for example, a newspaper article in Catalan, how can I find all the cities mentioned in it?

I have been looking at the nltk package for Python, and I have downloaded the Catalan corpus (nltk.corpus.cess_cat).

So far I have installed everything necessary via nltk.download(). Here is an example of what I have at the moment:

import nltk

te = nltk.word_tokenize('Tots els gats son de Sant Cugat del Valles.')

nltk.pos_tag(te)

The city here is 'Sant Cugat del Valles'. The output I get is:

[('Tots', 'NNS'),
 ('els', 'NNS'),
 ('gats', 'NNS'),
 ('son', 'VBP'),
 ('de', 'IN'),
 ('Sant', 'NNP'),
 ('Cugat', 'NNP'),
 ('del', 'NN'),
 ('Valles', 'NNP')]
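The tags above do carry some signal: the city name is a run of NNP tokens, interrupted only by the lowercase 'del'. Below is a minimal sketch that merges such runs into candidate names, assuming a hand-picked, hypothetical set of Catalan connector words (CONNECTORS) that may appear inside a name. It only narrows down candidates; it cannot tell a city from a person or an organization.

```python
# Tagged output copied from nltk.pos_tag above.
TAGGED = [
    ("Tots", "NNS"), ("els", "NNS"), ("gats", "NNS"), ("son", "VBP"),
    ("de", "IN"), ("Sant", "NNP"), ("Cugat", "NNP"), ("del", "NN"),
    ("Valles", "NNP"),
]

# Hypothetical list of lowercase Catalan words allowed *inside* a name,
# e.g. "Sant Cugat del Valles", "Sant Feliu de Llobregat".
CONNECTORS = {"de", "del", "dels", "la", "les", "i"}


def proper_noun_phrases(tagged, connectors=CONNECTORS):
    """Merge consecutive NNP/NNPS tokens (bridging connector words) into phrases."""
    phrases = []
    current = []   # tokens of the phrase being built
    pending = []   # connector words seen after a proper noun, not yet committed
    for word, tag in tagged:
        if tag in ("NNP", "NNPS"):
            # A connector followed by another proper noun belongs to the name.
            current.extend(pending)
            pending = []
            current.append(word)
        elif current and word.lower() in connectors:
            pending.append(word)
        else:
            # Any other token ends the phrase; trailing connectors are dropped.
            if current:
                phrases.append(" ".join(current))
            current, pending = [], []
    if current:
        phrases.append(" ".join(current))
    return phrases


print(proper_noun_phrases(TAGGED))  # ['Sant Cugat del Valles']
```

Note that a sentence like "Anna i Joan" would also be merged into one phrase, which is why a list of known places is still needed to decide which candidates are cities.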

NNP seems to indicate proper nouns, i.e. capitalized names. Is there a way to get only places or cities, rather than every name? Thank you.

Answer

Anindita Bhowmik · May 17, 2016

You can use the geotext Python library for this.

pip install geotext

is all it takes to install the library. Usage is as simple as:

from geotext import GeoText
places = GeoText("London is a great city")
places.cities

returns the list ['London'].

The library's list of cities is not exhaustive, but it covers a good number of them.
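Because the bundled list is limited, one option is to supplement it with your own gazetteer. The sketch below is not geotext's implementation; it is a minimal, standard-library-only illustration of the same list-lookup idea, assuming a small hypothetical list of Catalan municipalities (CITIES). It strips accents before comparing, so 'Valles' in the question's sentence still matches 'Vallès', and it prefers the longest multi-word match.

```python
import re
import unicodedata

# Hypothetical mini-gazetteer; in practice you would load a full list of
# municipalities from an open geographic dataset.
CITIES = ["Barcelona", "Sant Cugat del Vallès", "Sant Cugat", "Girona", "Lleida"]


def fold(text):
    """Lowercase and strip accents so 'Vallès' and 'Valles' compare equal."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def tokenize(text):
    """Split into word tokens (Unicode-aware, so accented letters are kept)."""
    return re.findall(r"\w+", text)


def find_cities(text, cities=CITIES):
    """Return gazetteer entries found in text, longest match first, no overlaps."""
    # Index each city by its folded token sequence.
    index = {tuple(tokenize(fold(c))): c for c in cities}
    if not index:
        return []
    max_len = max(len(key) for key in index)
    folded = [fold(t) for t in tokenize(text)]
    found = []
    i = 0
    while i < len(folded):
        # Try the longest possible window first, then shrink it.
        for n in range(min(max_len, len(folded) - i), 0, -1):
            city = index.get(tuple(folded[i:i + n]))
            if city:
                found.append(city)
                i += n
                break
        else:
            i += 1  # no city starts at this token
    return found


print(find_cities("Tots els gats son de Sant Cugat del Valles."))
# ['Sant Cugat del Vallès']
```

Since matching is case- and accent-insensitive, a large gazetteer will produce false positives for city names that are also ordinary words; requiring the matched span to start with an uppercase letter in the original text is one simple way to reduce them.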