Tabula extract tables by area coordinates

Eric Choi · Aug 2, 2017

Tabula lets us extract tables from a PDF document by specifying the coordinates of the table area. On Windows, to get the coordinates you have to upload the PDF file to the Tabula web page, export the script that contains the coordinates, and then enter those coordinates into your code. On a Mac, you can simply use the Preview app and its crop inspector. I'm wondering whether there are any third-party programs or plug-ins that offer this to Windows users. I think this would be handy in the following situations:

  1. When you do not have internet access.
  2. I think the Preview app would be more accurate, because I have had inaccurate coordinates produced by the Tabula web page.

I would be grateful if anyone could point me to such a tool. Many thanks.

Answer

Manuel Aristarán · Aug 5, 2017

Tabula needs areas to be specified in PDF units (points), where one unit is 1/72 of an inch. If you use Acrobat Reader DC, you can use its Measure tool with readings in inches and multiply each reading by 72 to get PDF units.
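The conversion is just a multiplication by 72 per inch. Below is a minimal sketch of that arithmetic. The function name `to_pdf_points` is hypothetical, and the centimetre and millimetre entries are an assumption added for cases where the measuring tool reports metric units (1 inch = 2.54 cm = 25.4 mm).

```python
# PDF units (points): 72 per inch.
POINTS_PER_INCH = 72.0
# How many of each measurement unit make up one inch (assumed units).
UNITS_PER_INCH = {"in": 1.0, "cm": 2.54, "mm": 25.4}


def to_pdf_points(value, unit="in"):
    """Convert a length measured in `unit` to PDF points (1/72 inch)."""
    try:
        inches = value / UNITS_PER_INCH[unit]
    except KeyError:
        raise ValueError(f"unsupported unit: {unit!r}") from None
    return inches * POINTS_PER_INCH


print(to_pdf_points(1.5))  # 108.0
```

For example, a Measure tool reading of 1.5 inches corresponds to 108 PDF units.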

Tabula needs the area to be specified as top, left, bottom and right distances. To obtain them, measure the distances from the top edge of the page to the top and bottom edges of the table, and from the left edge of the page to the left and right edges of the table.
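Putting the four measurements together, a small helper can turn distances measured in inches into the `[top, left, bottom, right]` list in PDF units. This is a sketch under stated assumptions: the helper name `area_from_inches` is hypothetical, and the sample measurements are made up for illustration.

```python
POINTS_PER_INCH = 72.0


def area_from_inches(top, left, bottom, right):
    """Build a [top, left, bottom, right] area in PDF points.

    top/bottom: inches from the top edge of the page to the table's
    top/bottom edges. left/right: inches from the left edge of the page
    to the table's left/right edges.
    """
    # Reject areas whose edges are out of order or off the page.
    if not (0 <= top < bottom and 0 <= left < right):
        raise ValueError("expected 0 <= top < bottom and 0 <= left < right")
    return [d * POINTS_PER_INCH for d in (top, left, bottom, right)]


area = area_from_inches(1.0, 0.5, 4.0, 8.0)
print(area)  # [72.0, 36.0, 288.0, 576.0]
```

If you use tabula-py, the resulting list can be passed as the `area` argument, e.g. `tabula.read_pdf("report.pdf", pages=1, area=area)`, where `report.pdf` is a placeholder file name.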
