Top "Text-extraction" questions

Text extraction is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents (text).

Rule-based PDF text extraction for various bills and invoices

I have to extract text from invoice and bill PDF files. The file layouts can get complex, though it's mostly …

pdf text-extraction
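A minimal sketch of what "rule-based" extraction can look like once the PDF text has already been pulled out (for example with a PDF-to-text tool). The field names, regex rules, and sample invoice text below are hypothetical and are not taken from the question. Real bills need patterns tuned to each vendor's layout.

```python
import re

# Hypothetical rules: field name -> regex with one capture group.
# \b keeps "Total" from matching inside words such as "Subtotal".
RULES = {
    "invoice_number": re.compile(r"\bInvoice\s*(?:No\.?|Number|#)\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE),
    "date": re.compile(r"\bDate\s*:?\s*(\d{4}-\d{2}-\d{2}|\d{2}/\d{2}/\d{4})", re.IGNORECASE),
    "total": re.compile(r"\bTotal\s*(?:Due|Amount)?\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}


def extract_fields(text: str) -> dict:
    """Apply each rule to already-extracted text; fields with no match map to None."""
    result = {}
    for field, pattern in RULES.items():
        match = pattern.search(text)  # first match wins
        result[field] = match.group(1) if match else None
    return result


# Made-up sample standing in for text extracted from one invoice PDF.
sample = """ACME Supplies
Invoice No: INV-2023-0042
Date: 2023-05-17
Subtotal: $1,180.00
Total Due: $1,250.00
"""

print(extract_fields(sample))
```

Keeping the rules in a dictionary means a new layout can often be handled by adding or swapping patterns rather than changing the extraction loop.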
Jsoup - extracting text

I need to extract text from a node like this: <div> Some text <b>with tags&…

java iteration jsoup text-extraction
How to extract the contents of a table in pdf file?

I want to extract the contents of a table in a PDF like this: I wrote this Java programme using …

java pdf itext text-extraction pdf-extraction
iText - Get Font size and family of a text segment

I'm currently trying to automatically extract important keywords from a PDF file. I am able to get the text information …

java pdf itext text-extraction pdf-extraction
List the words in a vocabulary according to occurrence in a text corpus, with Scikit-Learn CountVectorizer

I have fitted a CountVectorizer to some documents in scikit-learn. I would like to see all the terms and their …

python machine-learning scikit-learn text-extraction countvectorizer
How to extract regex matches using Vim

An example: case Foo: ... break; case Bar: ... break; case More: case Complex: ... break: ... I’d like to retrieve all the …

regex vim match text-extraction
What's a good method for extracting text from a PDF using C# or classic ASP (VBScript)?

Is there a good library for extracting text from a PDF? I'm willing to pay for it if I have …

pdf text-extraction pdf-scraping
Extracting text from Image

Two, the type of number I am trying to extract. [image] Another sample. [image] Another sample. [image] The image above is the output …

python opencv text tesseract text-extraction
Python pdftotext ShellError Using textract

When I run the below Python script on a directory that contains a PDF file, I keep getting this error: …

python pdf text-extraction
Extracting data from an email message (or several thousand emails) [Exchange based]

My marketing department, bless them, has decided to make a sweepstakes where people enter over a webpage. That is great …

exchange-server text-extraction