Invoice automatic data extraction OCR or PDF

Toby · Jul 1, 2017 · Viewed 12.1k times

I am looking for a solution to extract data from my invoices to send a summary to my accountant.

There are some companies out there which provide such services for around 20€ a month, and invoices are usually very well recognised. But the services I tried don't extract all the data I need, or are missing some functionality, like an Excel export to send the data to my accountant. And paying 20€ a month and having to manage another service for 5 invoices per month hasn't appealed to me so far.

I did a little research and found this Stack Overflow question: Can anyone recommend OCR software to process invoices?

It's a bit outdated, and I hope to find some more up-to-date recommendations. I tried the Ephesoft community edition and it looked very promising at first. But the software has a learning step and a review step, and the data from the review step doesn't seem to be fed back into the learning step. Plus, it feels more cumbersome than just doing it by hand. I assume it's made for big businesses.

I am looking for simple data extraction software that learns from each example I show it.

I also had a look at Apache Tika, but it doesn't seem ready to use with a simple web-interface.

  1. Do you have any recommendations for paid OCR services? They should be flexible enough to extract Total VAT amount / VAT % / Total Amount / Total Amount Currency / VAT Currency / Which account it was paid with / Company name, with an export to Excel.

  2. Do you have some recommendations for open source software?

  3. Do you have any general advice on how you handle your few (fewer than 50 a year) invoices?

Answer

Petr Baudis · May 10, 2018

Apart from raw OCR with regexes on top (which may work fine for some very limited use cases), there are several options that offer API access. You can start using these without any demo or sales process:

  • TagGun - specializes in receipts, can extract line items too, free for 50 receipts monthly
  • Elis - specializes in invoices, supports a wide variety of templates automatically (a pre-trained machine learning model), free for under 300 invoices monthly

If you are willing to go through a sales process (these vendors at least seem to be real and active):

  • LucidTech and Itemize (I'm not sure how accurate they are or which fields they extract, as their API details are not public)
  • FlexiCapture Engine - based on templates, if you are willing to define one for each specific invoice format

(disclaimer: I'm affiliated with Rossum, the vendor of Elis. Feel free to suggest edits adding other APIs!)
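
For the "raw OCR and regexes" route mentioned at the top of this answer, here is a minimal sketch of what it might look like for a handful of invoices. It assumes the OCR step has already produced plain text (for example with any OCR tool of your choice), and that the invoice uses one fixed layout with lines like "VAT 19%: 19,00 EUR" and "Total: 119,00 EUR". The sample text, the regex patterns, and the function names (`parse_amount`, `extract_fields`, `to_csv`) are all made up for illustration; real invoices will need their own patterns, which is exactly why this approach only suits very limited use cases.

```python
import csv
import io
import re
from decimal import Decimal

# Hypothetical OCR output for one invoice layout (not from a real vendor).
SAMPLE_TEXT = "\n".join([
    "ACME GmbH",
    "Invoice 2017-042",
    "Subtotal: 100,00 EUR",
    "VAT 19%: 19,00 EUR",
    "Total: 119,00 EUR",
])

# Assumed line formats; adjust per supplier.
VAT_LINE = re.compile(
    r"^VAT\s+(\d+(?:[.,]\d+)?)\s*%\s*:\s*([\d.,]+)\s*([A-Z]{3})\s*$",
    re.MULTILINE,
)
TOTAL_LINE = re.compile(
    r"^Total\s*:\s*([\d.,]+)\s*([A-Z]{3})\s*$",
    re.MULTILINE,
)


def parse_amount(raw: str) -> Decimal:
    """Parse '119,00', '1.234,56' or '1,234.56'.

    Assumption: the last '.' or ',' is the decimal mark, so a value
    like '1,234' (thousands only) would be misread.
    """
    last = max(raw.rfind(","), raw.rfind("."))
    if last == -1:
        return Decimal(raw)
    integer_part = re.sub(r"[.,]", "", raw[:last])
    return Decimal(f"{integer_part}.{raw[last + 1:]}")


def extract_fields(text: str) -> dict:
    """Pull a few summary fields out of OCR text; missing ones stay None."""
    fields = {
        "company": None,
        "vat_percent": None,
        "vat_amount": None,
        "vat_currency": None,
        "total_amount": None,
        "total_currency": None,
    }
    # Assumption: the first non-empty line is the company name.
    for line in text.splitlines():
        if line.strip():
            fields["company"] = line.strip()
            break
    vat = VAT_LINE.search(text)
    if vat:
        fields["vat_percent"] = parse_amount(vat.group(1))
        fields["vat_amount"] = parse_amount(vat.group(2))
        fields["vat_currency"] = vat.group(3)
    total = TOTAL_LINE.search(text)
    if total:
        fields["total_amount"] = parse_amount(total.group(1))
        fields["total_currency"] = total.group(2)
    return fields


def to_csv(rows: list) -> str:
    """Serialize extracted rows as CSV, which Excel can open directly."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv([extract_fields(SAMPLE_TEXT)]))
```

Fields that do not match are left empty in the CSV rather than guessed, so you can spot and fix them by hand before sending the file to your accountant.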