Top "Text-extraction" questions

Text extraction is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents (text).

Extract text from a PDF file using JavaScript

I want to extract text from a PDF file using only JavaScript on the client side, without using a server. I've …

javascript pdf text-extraction pdf.js
How to extract text from MS Office documents in C#

I was trying to extract text (a string) from MS Word (.doc, .docx), Excel, and PowerPoint using C#. Where can …

c# ms-office text-extraction
PDF text extraction from given coordinates

I would like to extract text from a portion of a PDF (using coordinates) with Ghostscript. Can anyone help me out?

pdf ghostscript text-extraction
How to extract common / significant phrases from a series of text entries

I have a series of text items: raw HTML from a MySQL database. I want to find the most common …

nlp text-extraction nltk text-analysis
Regular expression to extract text from HTML

I would like to extract all the text (displayed or not) from a general HTML page. I would like to …

html regex html-content-extraction text-extraction
Text extraction from HTML in Java

I'm working on a program that downloads HTML pages and then selects some of the information and writes it to …

java html screen-scraping html-content-extraction text-extraction
C# Extract text from PDF using PdfSharp

Is it possible to extract plain text from a PDF file with PdfSharp? I don't want to use iTextSharp because …

c# text text-extraction pdfsharp
Extracting whole words

I have a large set of real-world text that I need to pull words out of to input into a …

python regex word alphabetical text-extraction
Extract columns of text from a PDF file using iText

I need to extract text from PDF files using iText. The problem is: some PDF files contain two columns, and when …

java pdf itext text-extraction