Reliable way to (programmatically) compare PDFs?

JohnIdol · Sep 30, 2010

Possible Duplicate:
Tool to compare large numbers of PDF files?

I am in the classic scenario where the business gives you a bunch of new PDF forms for the new year, with no revision notes whatsoever, and you are supposed to figure out what's different from last year's versions.

I am talking loads of forms here, so I am trying to find a way to compare PDFs and outline the differences without having people manually go through each and every one of them.

My idea was to extract all the text from the PDFs, dump it into .txt files and then diff the text files, but that sounds horrible.
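For what it's worth, the extract-then-diff idea is not much code once the text is out. The sketch below is in Python rather than C# and assumes the text has already been extracted to one .txt file per form (for example with Poppler's `pdftotext -layout form.pdf form.txt`), with last year's and this year's files in two separate folders and each form keeping the same file name across years. The folder layout and the helpers `normalize`, `diff_text` and `compare_dirs` are hypothetical. Whitespace is collapsed before diffing because extraction tends to shift spacing between otherwise identical forms; it will not help if the extractor reorders text blocks between versions.

```python
import difflib
from pathlib import Path


def normalize(text: str) -> list[str]:
    """Collapse runs of whitespace and drop blank lines.

    Assumption: spacing and blank-line changes are extraction noise,
    not real edits to the form.
    """
    lines = []
    for line in text.splitlines():
        collapsed = " ".join(line.split())
        if collapsed:
            lines.append(collapsed)
    return lines


def diff_text(old: str, new: str, old_name: str = "old", new_name: str = "new") -> list[str]:
    """Return unified-diff lines between two extracted texts (empty if identical)."""
    return list(
        difflib.unified_diff(
            normalize(old),
            normalize(new),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",
        )
    )


def compare_dirs(old_dir: Path, new_dir: Path) -> dict[str, list[str]]:
    """Pair .txt files by name across two folders and diff each pair.

    A file present on only one side is diffed against empty text, so it
    shows up as entirely added or removed. Unchanged forms are omitted.
    """
    old_files = {p.name: p for p in old_dir.glob("*.txt")}
    new_files = {p.name: p for p in new_dir.glob("*.txt")}
    report = {}
    for name in sorted(old_files.keys() | new_files.keys()):
        old = old_files[name].read_text(encoding="utf-8") if name in old_files else ""
        new = new_files[name].read_text(encoding="utf-8") if name in new_files else ""
        changes = diff_text(old, new, f"old/{name}", f"new/{name}")
        if changes:
            report[name] = changes
    return report
```

Calling `compare_dirs(Path("forms_old"), Path("forms_new"))` and printing each entry gives a reviewer a short list of forms that actually changed, plus the changed lines, instead of the whole stack. Forms that were renamed between years would need a manual mapping, since the pairing here relies only on file names.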

My question says programmatically, but I'd be happy with any reliable tool for comparing PDFs; I'm mainly looking to hear about people's experiences. I'm also willing to entertain any programmatic solution (preferably in C#, but please share any ideas).

Answer

Sorax · Sep 30, 2010

There are quite a few software products that claim to diff PDFs. I've never needed one myself, but if this is going to be a recurring process, I think it would be wise for your company to invest in one of them. Just Google "pdf diff" for a bunch of candidate applications.

Additionally, your situation is very similar to this question: Tool to compare large numbers of PDF files? I think its discussion may help.