Top "Nltk" questions

The Natural Language Toolkit is a Python library for computational linguistics.

Classifying Documents into Categories

I've got about 300k documents stored in a Postgres database that are tagged with topic categories (there are about 150 categories …

python machine-learning nlp nltk naivebayes
Python re.split() vs nltk word_tokenize and sent_tokenize

I was going through this question and am wondering whether NLTK would be faster than regex for word and sentence tokenization.

python regex nlp nltk tokenize
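
For orientation, here is a minimal sketch of the regex side of that comparison, using only the standard library. The function names `regex_word_tokenize` and `regex_sent_tokenize` are hypothetical, and the patterns are deliberately naive assumptions (they split on abbreviations such as "Mr."), so this is not a stand-in for what NLTK's tokenizers do.

```python
import re

def regex_word_tokenize(text):
    # Hypothetical helper: a word (optionally with one internal apostrophe)
    # or a single non-space punctuation character.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

def regex_sent_tokenize(text):
    # Hypothetical helper: split on whitespace that follows '.', '!' or '?'.
    # Naive by design: abbreviations like "Mr." also trigger a split.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

if __name__ == "__main__":
    print(regex_word_tokenize("Don't panic, it works!"))
    print(regex_sent_tokenize("Hello there. How are you? Fine!"))
```

Timing this against `nltk.word_tokenize` and `nltk.sent_tokenize` on the same text (for example with `timeit`) would answer the speed question for a given corpus, though the two approaches will not produce identical tokens.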
NLTK Context Free Grammar Generation

I'm working on a non-English parser with Unicode characters. For that, I decided to use NLTK. But it requires a …

python parsing nlp nltk context-free-grammar
NLTK package to estimate the (unigram) perplexity

I am trying to calculate the perplexity for the data I have. The code I am using is: import sys …

python-2.7 nlp nltk n-gram language-model
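
As a point of reference, unigram perplexity can be sketched without any NLTK model class. The following is a minimal illustration in Python 3 (not the asker's truncated Python 2.7 code), assuming add-one smoothing with one extra slot reserved for unseen tokens; the function name `unigram_perplexity` is hypothetical.

```python
import math
from collections import Counter

def unigram_perplexity(train_tokens, test_tokens):
    # Hypothetical helper: perplexity of test_tokens under an add-one
    # smoothed unigram model estimated from train_tokens.
    if not test_tokens:
        raise ValueError("test_tokens must not be empty")
    counts = Counter(train_tokens)
    total = len(train_tokens)
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen tokens
    log_prob = 0.0
    for tok in test_tokens:
        p = (counts[tok] + 1) / (total + vocab)  # add-one smoothed probability
        log_prob += math.log2(p)
    # Perplexity is 2 raised to the average negative log2 probability.
    return 2 ** (-log_prob / len(test_tokens))

if __name__ == "__main__":
    train = "the cat sat on the mat".split()
    test = "the cat sat".split()
    print(unigram_perplexity(train, test))
```

Other smoothing schemes will give different numbers; the add-one choice here is only an assumption made to keep the sketch short.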
How to generate bi/tri-grams using spacy/nltk

The input text is always a list of dish names, each with 1~3 adjectives and a noun. Inputs: thai iced tea …

python nlp nltk n-gram spacy
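
A minimal sketch of bigram and trigram generation for a dish name like the one in the excerpt, in plain Python. The helper name `make_ngrams` is hypothetical and whitespace splitting is an assumption about the input; `nltk.util.ngrams` offers comparable functionality as an iterator.

```python
def make_ngrams(tokens, n):
    # Hypothetical helper: zip n staggered views of the token list so that
    # each tuple holds n consecutive tokens.
    return list(zip(*(tokens[i:] for i in range(n))))

if __name__ == "__main__":
    tokens = "thai iced tea".split()  # assumes whitespace-separated words
    print(make_ngrams(tokens, 2))
    print(make_ngrams(tokens, 3))
```

When the input has fewer than `n` tokens, the result is simply an empty list.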
Docker NLTK Download

I am building a docker container using the following Dockerfile: FROM ubuntu:14.04 RUN apt-get update RUN apt-get install -y python …

python docker nltk
NLTK vs Stanford NLP

I have recently started using the NLTK toolkit to create a few solutions in Python. I hear a lot of community …

python nlp nltk stanford-nlp
nltk.word_tokenize() giving AttributeError: 'module' object has no attribute 'defaultdict'

I am new to nltk. I was trying some basics. import nltk nltk.word_tokenize("Tokenize me") gives me this …

nltk attributeerror defaultdict
Large scale machine learning - Python or Java?

I am currently embarking on a project that will involve crawling and processing huge amounts of data (hundreds of gigs), …

java python machine-learning nltk mahout
nltk doesn't add $NLTK_DATA to search path?

Under Linux, I have set the env var $NLTK_DATA ('/home/user/data/nltk'), and the test below works as expected …

python environment-variables nltk search-path