Using itextsharp xmlworker to convert html to pdf and write text vertically

Daniel picture Daniel · Mar 23, 2016 · Viewed 28.3k times · Source

Is there possible to achieve writing text direction bottom-up in xmlworker? I would like to use it in table. My code is

     <table border=1>
     <tr>
     <td style="padding-right:18px">
          <p style="writing-mode:sideways-lr;text-align:center">First</p</td>
     <td style="padding-right:18px">
          <p style="writing-mode:sideways-lr;text-align:center">Second</p></td></tr>
     <tr><td><p style="text-align:center">1</p>  </td>
         <td><p style="text-align:center">2</p></td> 
     </tr>
        </table>

But it it doesn't work after conversion from html to pdf. Text FIRST and SECOND are not in direction bottom-to-up.

Answer

kuujinbo picture kuujinbo · Mar 25, 2016

This was a pretty interesting problem, so +1 to the question.

The first step was to lookup whether or not iTextSharp XML Worker supports the HTML td tag. The mappings can be found in the source in iTextSharp.tool.xml.html.Tags. There you find td is mapped to iTextSharp.tool.xml.html.table.TableData, which makes the job of implementing a custom tag processor a little easier. I.e. all we need to do inherit from the class and override End():

public class TableDataProcessor : TableData
{
    /*
     * a **very** simple implementation of the CSS writing-mode property:
     * https://developer.mozilla.org/en-US/docs/Web/CSS/writing-mode
     */
    bool HasWritingMode(IDictionary<string, string> attributeMap)
    {
        bool hasStyle = attributeMap.ContainsKey("style");
        return hasStyle
                && attributeMap["style"].Split(new char[] { ';' })
                .Where(x => x.StartsWith("writing-mode:"))
                .Count() > 0
            ? true : false;
    }

    public override IList<IElement> End(
        IWorkerContext ctx,
        Tag tag,
        IList<IElement> currentContent)
    {
        var cells = base.End(ctx, tag, currentContent);
        var attributeMap = tag.Attributes;
        if (HasWritingMode(attributeMap))
        {
            var pdfPCell = (PdfPCell) cells[0];
            // **always** 'sideways-lr'
            pdfPCell.Rotation = 90;
        }
        return cells;
    }
}

As noted in the inline comments, this is a very simple implementation for your specific needs. You'll need to add extra logic to support any other writing-mode CSS property value, and include any sanity checks.

UPDATE

Based on the comment left by @Daniel, it's not clear how to add custom CSS when converting the HTML to PDF. First the updated HTML:

string XHTML = @"
<h1>Table with Vertical Text</h1>
<table><tr>
<td style='writing-mode:sideways-lr;text-align:center;width:40px;'>First</td>
<td style='writing-mode:sideways-lr;text-align:center;width:40px;'>Second</td></tr>
<tr><td style='text-align:center'>1</td>
<td style='text-align:center'>2</td></tr></table>

<h1>Table <u>without</u> Vertical Text</h1>
<table width='50%'>
<tr><td class='light-yellow'>0</td></tr>
<tr><td>1</td></tr>
<tr><td class='light-yellow'>2</td></tr>
<tr><td>3</td></tr>
</table>";

Then a small snippet of custom CSS:

string CSS = @"
    body {font-size: 12px;}
    table {border-collapse:collapse; margin:8px;}
    .light-yellow {background-color:#ffff99;}
    td {border:1px solid #ccc;padding:4px;}
";

The slightly difficult part is the extra setup - you can't use the simple out of the box XMLWorkerHelper.GetInstance().ParseXHtml() commonly seen here at SO. Here's a simple helper method that should get you started:

public void ConvertHtmlToPdf(string xHtml, string css)
{
    using (var stream = new FileStream(OUTPUT_FILE, FileMode.Create))
    {
        using (var document = new Document())
        {
            var writer = PdfWriter.GetInstance(document, stream);
            document.Open();

            // instantiate custom tag processor and add to `HtmlPipelineContext`.
            var tagProcessorFactory = Tags.GetHtmlTagProcessorFactory();
            tagProcessorFactory.AddProcessor(
                new TableDataProcessor(), 
                new string[] { HTML.Tag.TD }
            );
            var htmlPipelineContext = new HtmlPipelineContext(null);
            htmlPipelineContext.SetTagFactory(tagProcessorFactory);

            var pdfWriterPipeline = new PdfWriterPipeline(document, writer);
            var htmlPipeline = new HtmlPipeline(htmlPipelineContext, pdfWriterPipeline);

            // get an ICssResolver and add the custom CSS
            var cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(true);
            cssResolver.AddCss(css, "utf-8", true);
            var cssResolverPipeline = new CssResolverPipeline(
                cssResolver, htmlPipeline
            );

            var worker = new XMLWorker(cssResolverPipeline, true);
            var parser = new XMLParser(worker);
            using (var stringReader = new StringReader(xHtml))
            {
                parser.Parse(stringReader);
            }
        }
    }
}

Instead of rehashing an explanation of the example code above, see the documentation (iText removed documentation, linked to Wayback Machine) to get a better idea of why you need to setup the parser that way.

Also note:

  1. XML Worker does not support all CSS2/CSS3 properties, so you may need to experiment with what works or doesn't work with regards to how close you want the PDF to look to the HTML displayed in the browser.
  2. The HTML snippet removed the p tag, since the style can be applied directly to the td tag.
  3. The inline width property. If omitted the columns will be variable widths that match if the text had been rendered horizontally.

Tested with iTextSharp and XML Worker versions 5.5.9 Here's the updated result:

enter image description here