English dictionary as txt or xml file with support of synonyms

Simon · Apr 19, 2010 · Viewed 29.4k times

Can someone point me to where I can download an English dictionary as a txt or XML file? I am building a simple app for myself and am looking for something that I could start using immediately without learning a complex API.

Support for synonyms would be great; that is, it should be easy to retrieve all the synonyms for a particular word.

It would be absolutely fantastic if the dictionary listed both the British and American spellings of words where they differ.

Even a small dictionary (a few thousand words) would be OK; I only need it for a small project.

I would even be willing to buy one if the price is reasonable and the dictionary is easy to use. Simple XML would be great.

Any pointers, please?

Answer

dmcer · Apr 19, 2010

WordNet is what you want. It's big, containing over a hundred thousand entries, and it's freely available.

However, it's not stored as XML. To access the data, you'll want to use one of the existing WordNet APIs for your language of choice.

Using the APIs is generally pretty straightforward, so I don't think you have to worry much about "learning (a) complex API". For example, borrowing from the WordNet HOWTO for the Python-based Natural Language Toolkit (NLTK):

 >>> # Requires the WordNet corpus, e.g. via nltk.download('wordnet')
 >>> from nltk.corpus import wordnet
 >>>
 >>> # Get all synsets for 'dog'
 >>> # This is essentially all senses of the word in the db
 >>> wordnet.synsets('dog')
 [Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'),
  Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'),
  Synset('andiron.n.01'), Synset('chase.v.01')]

 >>> # Get the definition and usage for the first synset
 >>> # (in NLTK 3.x, definition, examples and lemmas are methods)
 >>> wordnet.synset('dog.n.01').definition()
 'a member of the genus Canis (probably descended from the common
 wolf) that has been domesticated by man since prehistoric times;
 occurs in many breeds'
 >>> wordnet.synset('dog.n.01').examples()
 ['the dog barked all night']

 >>> # Get antonyms for 'good'
 >>> wordnet.synset('good.a.01').lemmas()[0].antonyms()
 [Lemma('bad.a.01.bad')]

 >>> # Get synonyms for the first noun sense of 'dog'
 >>> wordnet.synset('dog.n.01').lemmas()
 [Lemma('dog.n.01.dog'), Lemma('dog.n.01.domestic_dog'),
 Lemma('dog.n.01.Canis_familiaris')]

 >>> # Get synonyms for all senses of 'dog'
 >>> for synset in wordnet.synsets('dog'): print(synset.lemmas())
 [Lemma('dog.n.01.dog'), Lemma('dog.n.01.domestic_dog'),
 Lemma('dog.n.01.Canis_familiaris')]
 ...
 [Lemma('frank.n.02.frank'), Lemma('frank.n.02.frankfurter'),
 ...
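
Since the question asks for an easy way to get all synonyms of a word, the session above could be wrapped in a small helper. This is only a sketch: the `synonyms` function and its `synsets_for` parameter are hypothetical names introduced here, not part of NLTK, and it assumes NLTK 3.x (where `lemmas()` and `name()` are methods) with the WordNet corpus installed.

```python
def synonyms(word, synsets_for=None):
    """Return a sorted list of lemma names found in any synset of `word`.

    `synsets_for` is a hypothetical hook (not an NLTK feature) that lets
    callers supply their own lookup; by default NLTK's WordNet is used.
    """
    if synsets_for is None:
        # Imported lazily so the helper can be defined without NLTK installed.
        from nltk.corpus import wordnet
        synsets_for = wordnet.synsets

    names = set()
    for synset in synsets_for(word):
        for lemma in synset.lemmas():
            # WordNet joins multi-word lemmas with underscores.
            names.add(lemma.name().replace('_', ' '))

    # The word itself is not one of its own synonyms.
    names.discard(word)
    return sorted(names)
```

Calling `synonyms('dog')` would then merge the lemmas of every sense into one list, which mixes meanings (the animal, the sausage, and so on); filter by synset first if only one sense is wanted.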

While there is an American English bias in WordNet, it supports British spellings and usage. For example, you can look up 'colour', and one of the synsets for 'lift' is 'elevator.n.01'.
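
One way to check whether two spellings or regional terms are linked is to see whether they share a synset. The following is a minimal sketch under the same assumptions as before (NLTK 3.x, WordNet corpus installed); `shared_synsets` and its `synsets_for` parameter are hypothetical names used for illustration only.

```python
def shared_synsets(word_a, word_b, synsets_for=None):
    """Return the names of synsets that contain both words.

    `synsets_for` is a hypothetical hook for supplying a custom lookup;
    by default NLTK's WordNet is used.
    """
    if synsets_for is None:
        # Imported lazily so the helper can be defined without NLTK installed.
        from nltk.corpus import wordnet
        synsets_for = wordnet.synsets

    names_a = {synset.name() for synset in synsets_for(word_a)}
    # Keep the order in which WordNet lists the senses of word_b.
    return [synset.name() for synset in synsets_for(word_b)
            if synset.name() in names_a]
```

Going by the example above, `shared_synsets('lift', 'elevator')` should include `'elevator.n.01'`. A non-empty result for a pair such as `'colour'` and `'color'` would indicate that WordNet treats them as the same sense.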

Notes on XML

If having the data represented as XML is essential, you could easily use one of the APIs to access the WordNet database and convert it into XML, e.g. see Thinking XML: Querying WordNet as XML.
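
As a rough illustration of such a conversion, the sketch below builds a simple XML entry with Python's standard `xml.etree.ElementTree`. The element and attribute names (`entry`, `sense`, `synonym`, `word`, `synset`) are made up for this example and are not a WordNet or NLTK format, and `senses_from_wordnet` assumes NLTK 3.x with the WordNet corpus installed.

```python
import xml.etree.ElementTree as ET


def synsets_to_xml(word, senses):
    """Serialize {synset_name: [lemma_name, ...]} as a small XML string.

    The XML layout is an ad hoc one chosen for this sketch.
    """
    entry = ET.Element('entry', {'word': word})
    for synset_name, lemma_names in senses.items():
        sense = ET.SubElement(entry, 'sense', {'synset': synset_name})
        for lemma_name in lemma_names:
            ET.SubElement(sense, 'synonym').text = lemma_name
    return ET.tostring(entry, encoding='unicode')


def senses_from_wordnet(word):
    """Collect the senses of `word` from NLTK's WordNet (needs NLTK)."""
    from nltk.corpus import wordnet
    return {synset.name(): [lemma.name() for lemma in synset.lemmas()]
            for synset in wordnet.synsets(word)}


# Lemma names for 'dog.n.01' copied from the NLTK session shown earlier.
dog_senses = {'dog.n.01': ['dog', 'domestic_dog', 'Canis_familiaris']}
xml_text = synsets_to_xml('dog', dog_senses)
print(xml_text)
```

With NLTK available, `synsets_to_xml('dog', senses_from_wordnet('dog'))` would export every sense instead of the single hand-written one.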