Top "Text-extraction" questions

Text extraction is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents (text).

How to extract a substring using regex

I have a string that has two single quotes in it, the ' character. In between the single quotes is …

java regex string text-extraction
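
For the question above, one possible approach is a capturing group that matches everything between the first pair of single quotes. This is a minimal sketch in Python rather than the Java the question is tagged with; the function name `between_single_quotes` and the sample string are made up for illustration, since the original excerpt is truncated.

```python
import re

def between_single_quotes(text):
    """Return the text between the first pair of single quotes, or None."""
    # '([^']*)' captures any run of non-quote characters enclosed in quotes.
    match = re.search(r"'([^']*)'", text)
    return match.group(1) if match else None

# Hypothetical input, not taken from the original question.
print(between_single_quotes("mydata 'the data I want' more text"))
```

The negated character class `[^']*` stops at the next quote, so the match cannot run past the closing quote the way a greedy `.*` would.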
Extract a single (unsigned) integer from a string

I want to extract the digits from a string that contains numbers and letters like: "In My Cart : 11 items" I …

php string integer text-extraction
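
One way to approach the question above is to search for the first run of digits and convert it to an integer. The sketch below uses Python rather than the PHP the question is tagged with; `first_unsigned_int` is a hypothetical helper name, and the input is the sample string quoted in the excerpt.

```python
import re

def first_unsigned_int(text):
    """Return the first run of digits in text as an int, or None if absent."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None

print(first_unsigned_int("In My Cart : 11 items"))  # prints 11
```

Because the pattern has no sign or decimal point, a string such as "-3.5" would yield 3, which matches the "single (unsigned) integer" wording of the title.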
Python module for converting PDF to text

Is there any python module to convert PDF files into text? I tried one piece of code found in Activestate …

python pdf text-extraction pdf-scraping
How to extract text from a PDF?

Can anyone recommend a library/API for extracting the text and images from a PDF? We need to be able …

pdf text ghostscript extraction text-extraction
How to extract string following a pattern with grep, regex or perl

I have a file that looks something like this: <table name="content_analyzer" primary-key="id"> <type="global" /…

regex perl sed html-parsing text-extraction
How can I read pdf in python?

How can I read pdf in python? I know one way of converting it to text, but I want to …

python python-2.7 pdf text-extraction
Extracting text from a PDF file using PDFMiner in python?

I am looking for documentation or examples on how to extract text from a PDF file using PDFMiner with Python. …

python python-3.x python-2.7 text-extraction pdfminer
Getting URL parameter in java and extract a specific text from that URL

I have a URL and I need to get the value of v from this URL. Here is my URL: …

java url text-extraction
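
For the question above, a query-string parser is usually more robust than a hand-written regex. This is a rough sketch in Python rather than the Java the question is tagged with. The URL in the excerpt is elided, so the example URL below is purely hypothetical, as is the helper name `query_param`.

```python
from urllib.parse import urlparse, parse_qs

def query_param(url, name):
    """Return the first value of query parameter `name`, or None if missing."""
    # parse_qs maps each parameter name to a list of its values.
    values = parse_qs(urlparse(url).query).get(name)
    return values[0] if values else None

# Hypothetical URL, not the one from the original question.
print(query_param("https://example.com/watch?v=abc123&feature=related", "v"))
```

Parsing the query string also handles percent-decoding and parameter order, which a pattern such as `v=([^&]*)` would leave to the caller.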
PDF Parsing Using Python - extracting formatted and plain texts

I'm looking for a PDF library which will allow me to extract the text from a PDF document. I've looked …

python pdf parsing text-extraction information-extraction
How to extract just plain text from .doc & .docx files?

Anyone know of anything they can recommend in order to extract just the plain text from a .doc or .docx? …

unix extract docx doc text-extraction