Top "Text-processing" questions

Mechanizing the creation or manipulation of electronic text.

sed how to delete first 17 lines and last 8 lines in a file

I have a big 150GB CSV file and I would like to remove the first 17 lines and the last 8 …

linux bash sed text-processing
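
A minimal sketch of one possible approach, written in Python rather than sed. It streams the input once and holds only the last few lines in memory, which matters for a file of this size. The function names `trim_lines` and `trim_file` and both file paths are hypothetical, introduced only for illustration.

```python
from collections import deque
from typing import Iterable, Iterator


def trim_lines(lines: Iterable[str], head: int = 17, tail: int = 8) -> Iterator[str]:
    """Yield lines, skipping the first `head` and the last `tail` of them."""
    buffer = deque()
    for index, line in enumerate(lines):
        if index < head:
            continue  # drop the leading lines
        buffer.append(line)
        if len(buffer) > tail:
            # The oldest buffered line can no longer be one of the last `tail`.
            yield buffer.popleft()
    # Whatever remains in the buffer is the trailing `tail` lines; discard it.


def trim_file(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path without the first 17 and last 8 lines."""
    # Both paths are placeholders supplied by the caller.
    with open(src_path, "r", newline="") as src, open(dst_path, "w", newline="") as dst:
        dst.writelines(trim_lines(src))
```

The buffer never holds more than `tail + 1` lines, so memory use stays constant regardless of file size. The trade-off is that the result goes to a second file rather than being edited in place.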
How to get nth column with regexp delimiter

Basically I get a line from the ls -la command: -rw-r--r-- 13 ondrejodchazel staff 442 Dec 10 16:23 some_file and want to get the size of …

bash unix text text-processing
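
A rough Python sketch of the idea in the title: split a line on a regular-expression delimiter and pick the n-th field. It assumes the size is the fifth whitespace-separated field, as in the sample line quoted above. The helper name `nth_column` is hypothetical.

```python
import re


def nth_column(line: str, n: int, delimiter: str = r"\s+") -> str:
    """Return the n-th (1-based) field of `line`, split on a regex delimiter."""
    if n < 1:
        raise ValueError("n is 1-based and must be at least 1")
    fields = re.split(delimiter, line.strip())
    return fields[n - 1]


# Sample line taken from the question excerpt.
line = "-rw-r--r-- 13 ondrejodchazel staff 442 Dec 10 16:23 some_file"
print(nth_column(line, 5))  # prints 442
```

Splitting on `\s+` treats any run of spaces or tabs as a single delimiter, which is what makes column counting reliable when ls pads its output.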
Extract words surrounding a search word

I have this script that does a word search in text. The search goes pretty well and results work as …

python regex find text-processing
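
One way the task in the title might be sketched, assuming "surrounding words" means a fixed number of words on each side of every match and that matching is case-insensitive. The function name `words_around` and the `window` parameter are hypothetical and not taken from the asker's script.

```python
import re
from typing import List, Tuple


def words_around(text: str, target: str, window: int = 2) -> List[Tuple[List[str], str, List[str]]]:
    """Return (words_before, match, words_after) for each occurrence of `target`."""
    words = re.findall(r"\w+", text)  # crude tokenizer: runs of word characters
    results = []
    for i, word in enumerate(words):
        if word.lower() == target.lower():
            before = words[max(0, i - window):i]
            after = words[i + 1:i + 1 + window]
            results.append((before, word, after))
    return results


print(words_around("the quick brown fox jumps over the lazy dog", "fox"))
```

Tokenizing first and then slicing the word list avoids building one large regular expression, and it copes with matches near the start or end of the text.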
Split text on paragraphs where paragraph delimiters are non-standard

If I have text with standard paragraph formatting (a blank line followed by an indent) such as text 1 it's easy …

python text-processing
Negation handling in NLP

I'm currently working on a project, where I want to extract emotion from text. As I'm using conceptnet5 (a semantic …

python regex nlp nltk text-processing
How To Use Backreference in Grep

I have a regular expression with a backreference. How can I use it in a bash script? Such as I want …

regex unix grep text-processing
Deleting the last line of a file with Java

I have a .txt file, which I want to process in Java. I want to delete its last line. I …

java file-io text-processing
Apache Tika and character limit when parsing documents

Could anybody please help me sort this out? It can be done like this Tika tika = new Tika(); tika.…

java text-processing apache-tika
Replacing all GUIDs in a file with new GUIDs from the command line

I have a file containing a large number of occurrences of the string Guid="GUID HERE" (where GUID HERE is …

shell replace sed guid text-processing
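
A hedged Python sketch of the replacement described in the title, offered as an alternative to a sed one-liner. It assumes every occurrence has the form Guid="..." and that each one should get its own freshly generated GUID. The question excerpt is truncated, so both points are assumptions. The function name `replace_guids` is hypothetical.

```python
import re
import uuid

# Matches Guid="<anything without a double quote>".
GUID_ATTR = re.compile(r'Guid="[^"]*"')


def replace_guids(text: str) -> str:
    """Replace every Guid="..." value with a newly generated random GUID."""
    # The callable runs once per match, so every occurrence gets a distinct value.
    return GUID_ATTR.sub(lambda match: f'Guid="{uuid.uuid4()}"', text)


sample = '<a Guid="GUID HERE"/> <b Guid="GUID HERE"/>'
print(replace_guids(sample))
```

Passing a callable to `re.sub` is what lets each match receive a different value. A fixed replacement string would reuse one GUID everywhere.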
TFIDF calculating confusion

I found the following code on the internet for calculating TFIDF: https://github.com/timtrueman/tf-idf/blob/master/tf-idf.py …

python data-mining text-processing information-retrieval tf-idf