What's a good method for extracting text from a PDF using C# or classic ASP (VBScript)?

Mark Biek · Sep 5, 2008 · Viewed 9.3k times

Is there a good library for extracting text from a PDF? I'm willing to pay for it if I have to.

Something that works with C# or classic ASP (VBScript) would be ideal. I also need to be able to separate the pages of the PDF.

This question had some interesting stuff, especially pdftotext, but I'd like to avoid calling out to an external command-line app if I can.

Answer

Ferruccio · Sep 5, 2008

You can use the IFilter interface built into Windows to extract text and properties (author, title, etc.) from any supported file type. It's a COM interface, so you would have to use the .NET interop facilities.

You'd also have to download the free PDF IFilter driver from Adobe.
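As a rough illustration of the interop involved, here is a minimal, untested C# sketch. It assumes a PDF IFilter is installed and registered with the same bitness as the calling process. The class name `PdfTextSketch`, the method `ExtractText`, and the buffer size are made up for this example. The interface GUID, struct layouts, and result codes are transcribed by hand from the Windows SDK header `filter.h`, so check them against the header before relying on them.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

[StructLayout(LayoutKind.Sequential)]
struct PROPSPEC { public uint ulKind; public IntPtr data; } // data: union of PROPID / LPWSTR

[StructLayout(LayoutKind.Sequential)]
struct FULLPROPSPEC { public Guid guidPropSet; public PROPSPEC psProperty; }

[StructLayout(LayoutKind.Sequential)]
struct STAT_CHUNK
{
    public uint idChunk;
    public uint breakType;      // 0 = CHUNK_NO_BREAK
    public uint flags;          // 1 = CHUNK_TEXT, 2 = CHUNK_VALUE
    public uint locale;
    public FULLPROPSPEC attribute;
    public uint idChunkSource;
    public uint cwcStartSource;
    public uint cwcLenSource;
}

[ComImport, Guid("89BCB740-6119-101A-BCB7-00DD010655AF"),
 InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
interface IFilter
{
    [PreserveSig] int Init(uint grfFlags, uint cAttributes, IntPtr aAttributes, out uint pFlags);
    [PreserveSig] int GetChunk(out STAT_CHUNK pStat);
    [PreserveSig] int GetText(ref uint pcwcBuffer, IntPtr awcBuffer);
    [PreserveSig] int GetValue(out IntPtr ppPropValue);
    // BindRegion is the last vtable slot and is not called here, so it is left undeclared.
}

static class PdfTextSketch   // hypothetical name
{
    const uint ChunkText = 1;
    const int FilterSLastText = 0x00041709;   // FILTER_S_LAST_TEXT
    const uint InitFlags = 1 | 2 | 4 | 8 | 16; // canonical paragraphs, line breaks, hyphens,
                                               // spaces, plus APPLY_INDEX_ATTRIBUTES
    const int BufChars = 4096;                 // arbitrary buffer size

    [DllImport("query.dll", CharSet = CharSet.Unicode)]
    static extern int LoadIFilter(string pwcsPath, IntPtr pUnkOuter, out IFilter ppIUnk);

    public static string ExtractText(string path)
    {
        IFilter filter;
        int hr = LoadIFilter(path, IntPtr.Zero, out filter);
        if (hr != 0 || filter == null)
            throw new InvalidOperationException("No IFilter available, HRESULT 0x" + hr.ToString("X8"));

        IntPtr buf = Marshal.AllocCoTaskMem(BufChars * 2); // 2 bytes per UTF-16 char
        try
        {
            uint initResult;
            Marshal.ThrowExceptionForHR(filter.Init(InitFlags, 0, IntPtr.Zero, out initResult));

            var sb = new StringBuilder();
            STAT_CHUNK chunk;
            // Simplification: stop at the first non-S_OK result, which includes
            // FILTER_E_END_OF_CHUNKS as well as any recoverable per-chunk errors.
            while (filter.GetChunk(out chunk) == 0)
            {
                if ((chunk.flags & ChunkText) == 0) continue; // skip property-value chunks
                if (chunk.breakType != 0) sb.AppendLine();
                while (true)
                {
                    uint count = BufChars;
                    hr = filter.GetText(ref count, buf);
                    if (hr < 0) break;                        // FILTER_E_NO_MORE_TEXT or an error
                    sb.Append(Marshal.PtrToStringUni(buf, (int)count));
                    if (hr == FilterSLastText) break;         // this was the final piece of the chunk
                }
            }
            return sb.ToString();
        }
        finally
        {
            Marshal.FreeCoTaskMem(buf);
            Marshal.ReleaseComObject(filter);
        }
    }

    [STAThread] // assumption: some filters may expect a single-threaded apartment
    static void Main(string[] args)
    {
        Console.WriteLine(ExtractText(args[0]));
    }
}
```

Run it with the path to a PDF as the only argument. Note that this sketch returns the text as one continuous stream; it does not attempt to split the output by page.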