I have to convert a .pdf file containing scanned images into .txt files. Tesseract OCR only converts images to .txt, so I first need to extract the pages as .tif images and then convert those. Can anyone help me with this?
Use ImageMagick:
convert -density 600 input.pdf output.tif
Density is in DPI; in my experience, 600 DPI works best. Put -density before the input file so that it applies when the PDF pages are rendered.
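To cover the whole task (PDF to TIFF to text), the two steps can be chained in a small shell function. This is a minimal sketch with some assumptions. It assumes ImageMagick 6, where the command is `convert` (ImageMagick 7 calls it `magick`), with Ghostscript installed so ImageMagick can read PDFs. It also assumes the `tesseract` CLI is on the PATH. The function name `pdf_to_txt` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: rasterize a scanned PDF with ImageMagick, then OCR it.
# Assumes ImageMagick 6 ("convert"; use "magick" on ImageMagick 7),
# Ghostscript for PDF input, and the tesseract CLI on the PATH.
pdf_to_txt() {
    pdf="$1"                      # input PDF, e.g. input.pdf
    base="${2:-${pdf%.pdf}}"      # output base name; defaults to the PDF name minus .pdf
    tif="$base.tif"

    # -density comes before the input so the pages are rendered at 600 DPI.
    convert -density 600 "$pdf" "$tif" || return 1

    # Tesseract reads the multi-page TIFF and writes "$base.txt".
    tesseract "$tif" "$base" || return 1
}
```

For example, `pdf_to_txt input.pdf` should produce `input.tif` and then `input.txt`. Keeping the intermediate TIFF lets you inspect the rendered pages if the OCR output looks poor.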