iTextSharp HTMLWorker.Parse error

Emon · Sep 3, 2012

I have a problem with HTMLWorker.Parse from iTextSharp in a Windows Forms program. Every time I execute the code and it reaches HTMLWorker.Parse, it throws an ObjectDisposedException. The exception says that it cannot access a closed file, but I have checked many times and cannot find which file is closed. Here is the code:

class HtmlToPdfConverter
{
    private iTextSharp.text.Document doc = new iTextSharp.text.Document();

    public HtmlToPdfConverter()
    {
        this.doc.SetPageSize(PageSize.A4);
    }

    public string Run(string html, string pdfName)
    {
        try
        {
            using (doc)
            {
                StyleSheet styles = new StyleSheet();
                using (PdfWriter writer = PdfWriter.GetInstance(this.doc, new FileStream(@"Z:\programs\" + pdfName + ".pdf", FileMode.Create)))
                {
                    this.doc.Open();
                    this.doc.OpenDocument();
                    this.doc.NewPage();
                    if (this.doc.IsOpen() == true)
                    {
                        StringReader reader = new StringReader(html);
                        //XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, reader);
                        this.doc.Add(new Paragraph(" "));
                        HTMLWorker worker = new HTMLWorker(this.doc);
                        worker.Open();
                        worker.StartDocument();
                        worker.NewPage();
                        worker.Parse(reader);
                        worker.SetStyleSheet(styles);

                        List<IElement> ie = iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList(reader, null);

                        foreach (IElement element in ie)
                        {
                            this.doc.Add((IElement)element);
                        }

                        worker.EndDocument();
                        worker.Close();
                    }
                }
            }
            return string.Empty;
        }
        catch (Exception ex)
        {
            return ex.Message;
        }
    }
}

This is the exception:

System.ObjectDisposedException was caught
  Message=Cannot access a closed file.
  Source=mscorlib
  ObjectName=""
  StackTrace:
       at System.IO.__Error.FileNotOpen()
       at System.IO.FileStream.Write(Byte[] array, Int32 offset, Int32 count)
       at iTextSharp.text.pdf.OutputStreamCounter.Write(Byte[] buffer, Int32 offset, Int32 count)
       at iTextSharp.text.pdf.PdfIndirectObject.WriteTo(Stream os)
       at iTextSharp.text.pdf.PdfWriter.PdfBody.Add(PdfObject objecta, Int32 refNumber, Boolean inObjStm)
       at iTextSharp.text.pdf.PdfWriter.PdfBody.Add(PdfObject objecta, Int32 refNumber)
       at iTextSharp.text.pdf.PdfWriter.PdfBody.Add(PdfObject objecta, PdfIndirectReference refa)
       at iTextSharp.text.pdf.PdfWriter.AddToBody(PdfObject objecta, PdfIndirectReference refa)
       at iTextSharp.text.pdf.Type1Font.WriteFont(PdfWriter writer, PdfIndirectReference piref, Object[] parms)
       at iTextSharp.text.pdf.FontDetails.WriteFont(PdfWriter writer)
       at iTextSharp.text.pdf.PdfWriter.AddSharedObjectsToBody()
       at iTextSharp.text.pdf.PdfWriter.Close()
       at iTextSharp.text.DocWriter.Dispose()
       at WebPageExtraction.HtmlToPdfConverter.Run(String html, String pdfName)
  InnerException: 

Answer

You are calling the document's close methods after it has already been disposed.

Your using block disposes the object automatically, so just remove these two lines:

doc.CloseDocument();
doc.Close();

If you don't trust the internal dispose code to properly close the document and want to do that yourself anyway, do it inside the using block:

using (doc)
{
    StyleSheet styles = new StyleSheet();
    using (PdfWriter writer = PdfWriter.GetInstance(this.doc, new FileStream(@"Z:\programs\" + pdfName + ".pdf", FileMode.Create)))
    {
        //.....
    }
    doc.CloseDocument();
    doc.Close();
}

Edit: after trying your code myself, I noticed some more problems and found the real reason for the error:

  • You close and dispose the class-level field doc but never create a new instance, so any later call to Run on the same converter works with a document that has already been closed.
  • You don't dispose of every disposable object (the FileStream, the StringReader and the HTMLWorker), which can leak resources or leave the PDF file locked.
  • The error itself occurs because, by default, PdfWriter closes the Stream it writes to, yet when the writer is disposed it still tries to write to that stream. To solve this, close the stream yourself and tell the writer not to do it (writer.CloseStream = false).
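The stream-ownership problem in the last bullet can be reproduced without iTextSharp. The sketch below is only an analogy built on System.IO, not iTextSharp's code: a StreamWriter constructed with leaveOpen = false closes its FileStream when disposed, so a later write to that stream throws the same ObjectDisposedException ("Cannot access a closed file.") seen in the stack trace. Passing leaveOpen = true plays roughly the role that writer.CloseStream = false plays for PdfWriter. The class and method names are hypothetical, chosen for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamOwnershipDemo
{
    // The writer owns the stream (leaveOpen: false), closes it on Dispose,
    // and a later write to the same FileStream fails.
    public static bool WriteAfterOwnerClosesThrows(string path)
    {
        using (FileStream fs = new FileStream(path, FileMode.Create))
        {
            using (StreamWriter owner = new StreamWriter(fs, Encoding.UTF8, 1024, false))
            {
                owner.Write("first pass");
            } // owner.Dispose() closes fs here

            try
            {
                byte[] data = Encoding.UTF8.GetBytes("second pass");
                fs.Write(data, 0, data.Length); // fs is already closed
                return false;
            }
            catch (ObjectDisposedException)
            {
                return true;
            }
        } // disposing an already-closed FileStream again is harmless
    }

    // Same sequence, but the writer leaves the stream open, so the caller
    // keeps control of its lifetime and can still write to it.
    public static bool WriteAfterOwnerLeavesOpenSucceeds(string path)
    {
        using (FileStream fs = new FileStream(path, FileMode.Create))
        {
            using (StreamWriter writer = new StreamWriter(fs, Encoding.UTF8, 1024, true))
            {
                writer.Write("first pass");
            } // flushes to fs but does not close it

            byte[] data = Encoding.UTF8.GetBytes("second pass");
            fs.Write(data, 0, data.Length); // still open
            return fs.Length > 0;
        } // the caller's using block closes fs
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        Console.WriteLine(WriteAfterOwnerClosesThrows(path));       // True
        Console.WriteLine(WriteAfterOwnerLeavesOpenSucceeds(path)); // True
        File.Delete(path);
    }
}
```

The design point is the same as in the fix: exactly one party should close the stream, and it should be whoever disposes last.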

Complete fixed code:

// Requires: using System.Collections.Generic; using System.IO;
// using iTextSharp.text; using iTextSharp.text.pdf; using iTextSharp.text.html.simpleparser;

// Create a new Document for every conversion instead of reusing a field.
Document doc = new Document();
StyleSheet styles = new StyleSheet();
string filePath = @"Z:\programs\" + pdfName + ".pdf";
using (FileStream pdfStream = new FileStream(filePath, FileMode.Create))
{
    using (PdfWriter writer = PdfWriter.GetInstance(doc, pdfStream))
    {
        // The FileStream's own using block closes it, so the writer must not.
        writer.CloseStream = false;
        doc.Open();
        doc.OpenDocument();
        doc.NewPage();
        if (doc.IsOpen())
        {
            using (StringReader reader = new StringReader(html))
            {
                //XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, reader);
                doc.Add(new Paragraph(" "));
                using (HTMLWorker worker = new HTMLWorker(doc))
                {
                    // Set the style sheet before parsing so it can take effect.
                    worker.SetStyleSheet(styles);
                    worker.Open();
                    worker.StartDocument();
                    worker.NewPage();
                    worker.Parse(reader);
                    // worker.Parse has already read reader to the end,
                    // so this call has nothing left to parse.
                    List<IElement> ie = HTMLWorker.ParseToList(reader, null);
                    foreach (IElement element in ie)
                    {
                        doc.Add(element);
                    }
                    worker.EndDocument();
                    worker.Close();
                }
            }
        }
        writer.Close();
    }
}

doc.CloseDocument();
doc.Close();
doc.Dispose();