Top "Text-processing" questions

Mechanizing the creation or manipulation of electronic text.

Is there any way to convert Wikitext to Markdown in Python?

Is there a Python library that takes wikitext (as used in MediaWiki) as input and converts it to Markdown?

python mediawiki markdown text-processing
Efficient text preprocessing using PySpark (clean, tokenize, stopwords, stemming, filter)

I recently began learning Spark from the book "Learning Spark". In theory, everything is clear; in practice, I …

python apache-spark pyspark apache-spark-sql text-processing
How to find out if a sentence is a question (interrogative)?

Is there an open-source Java library/algorithm for determining whether a particular piece of text is a question or …

java algorithm nlp data-mining text-processing
Effects of stemming on term frequency?

How are term frequency (TF) and inverse document frequency (IDF) affected by stop-word removal and stemming? Thanks!

data-mining text-processing tf-idf stop-words stemming
Text features input format for classification algorithms in scikit-learn

I'm starting to use scikit-learn to do some NLP. I've already used some classifiers from NLTK and now I …

python scikit-learn classification text-processing feature-engineering
What is the difference between fit_transform and transform in sklearn's CountVectorizer?

I was recently practicing with Kaggle's bag-of-words introduction, and I want to clear up a few things: using vectorizer.fit_transform( " * on …

python scikit-learn tokenize text-processing