Is that even possible!?!
I have a bunch of legacy reports that I need to import into a database. However, they're all in PDF format. Are there any R
packages that can read PDFs? Or should I leave that to a command-line tool?
The reports were made in Excel and then converted to PDF, so they have a regular structure, but many blank "cells".
So... this gets me close even on a fairly complex table.
Download a sample PDF from bmi pdf and save it as bmi_tbl.pdf, the file name used below:
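library(tm)
# Build a reader that calls pdftotext with -layout to preserve column alignment
# (recent tm versions take the options via `control`; pdftotext must be installed)
pdf <- readPDF(engine = "xpdf", control = list(text = "-layout"))
dat <- pdf(elem = list(uri = 'bmi_tbl.pdf'), language = 'en', id = 'id1')
# Pull out the text lines and turn each run of spaces into a single comma
dat <- gsub(' +', ',', content(dat))
out <- read.csv(textConnection(dat), header = FALSE)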
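One catch with the gsub approach: a blank "cell" is just a longer run of spaces in the -layout text, so it collapses into one comma and the remaining values shift left. Since -layout keeps the columns aligned, a fixed-width read may cope better. Here is a minimal sketch, assuming you have already worked out the column widths by eye. The sample lines, widths, and column names are made up for illustration and are not taken from bmi_tbl.pdf.

```r
# Hypothetical lines shaped like `pdftotext -layout` output;
# the second data row has an empty middle cell
lines <- c(
  "Name      Height    Weight",
  "Alice     160       55",
  "Bob                 82"
)

# Assumed column widths (in characters), found by inspecting the text
widths <- c(10, 10, 6)

# Skip the header line and name the columns ourselves;
# strip.white trims the padding around each field
out <- read.fwf(textConnection(lines), widths = widths, skip = 1,
                col.names = c("name", "height", "weight"),
                strip.white = TRUE, stringsAsFactors = FALSE)
print(out)
```

With this layout the empty cell should come back as NA in the height column instead of pulling 82 over from weight. The downside is that the widths have to be worked out for each report layout.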