Reading PDF documents in .Net

JRoppert picture JRoppert · Sep 17, 2008 · Viewed 195.8k times · Source

Is there an open source library that will help me with reading/parsing PDF documents in .Net/C#?

Answer

Brock Nusser picture Brock Nusser · Nov 2, 2011

Since this question was last answered in 2008, iTextSharp has improved their api dramatically. If you download the latest version of their api from http://sourceforge.net/projects/itextsharp/, you can use the following snippet of code to extract all text from a pdf into a string.

using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

namespace PdfParser
{
    public static class PdfTextExtractor
    {
        public static string pdfText(string path)
        {
            PdfReader reader = new PdfReader(path);
            string text = string.Empty;
            for(int page = 1; page <= reader.NumberOfPages; page++)
            {
                text += PdfTextExtractor.GetTextFromPage(reader,page);
            }
            reader.Close();
            return text;
        }   
    }
}