I am currently using PDF Box to parse a pdf and I am trying to figure out how to retrieve data about the text such as the font (bold, size, etc) and the location of the font.
Any suggestions?
After poking around the (hard to find) PDFBox docs, I found this little gem.
Apparently one of the examples shows exactly how to do everything you asked. Basically, you subclass PdfTextStripper
and override the processTextPosition
method. There, you query the TextPosition
for whatever information you need.
For future reference, you can find the javaDoc here: http://pdfbox.apache.org/apidocs/index.html
Edit 2018-04-02: original link is dead, but example can be found in the SVN repo here.