List fonts used by a Word Document (faster method)

Peter picture Peter · Mar 10, 2011 · Viewed 21.9k times · Source

I am working on a process for validating documents to make sure that they meet corporate standards. One of the steps is to make sure that the Word document does not use non-approved fonts.

I have the following stub of code, which works:

    Dim wordApplication As Word.ApplicationClass = New Word.ApplicationClass()
    Dim wordDocument As Word.Document = Nothing

    Dim fontList As New List(Of String)()

    Try
        wordDocument = wordApplication.Documents.Open(FileName:="document Path")
        'I've also tried using a for loop with an integer counter, no change in speed'
        For Each c As Word.Range In wordDocument.Characters
            If Not fontList.Contains(c.Font.Name) Then
                fontList.Add(c.Font.Name)
            End If
        Next

But this is incredibly slow! Incredibly slow = 2500 characters/minute (I timed it with StopWatch). Most of my files are around 6,000 words/30,000 characters (about 25 pages). But there are some documents that are in the 100's of pages...

Is there a faster way of doing this? I have to support Office 2003 format files, so the Open XML SDK isn't an option.

--UPDATE--

I tried running this as a Word macro (using the code found @ http://word.tips.net/Pages/T001522_Creating_a_Document_Font_List.html) and it runs much faster (under a minute). Unfortunately for my purposes I don't believe a Macro will work.

--UPDATE #2--

I took Chris's advice and converted the document to Open XML format on the fly. I then used the following code to find all RunFonts objects and read the font name:

    Using docP As WordprocessingDocument = WordprocessingDocument.Open(tmpPath, False)
        Dim runFonts = docP.MainDocumentPart.Document.Descendants(Of RunFonts)().Select(
                            Function(c) If(c.Ascii.HasValue, c.Ascii.InnerText, String.Empty)).Distinct().ToList()

        fontList.AddRange(runFonts)
    End Using

Answer

Chris Haas picture Chris Haas · Mar 10, 2011

You might have to support Office 2003 but that doesn't mean you have to parse it in that format. Take the Office 2003 documents, temporarily convert them to DOCX files, open that as a ZIP file, parse the /word/fontTable.xml file and then delete the DOCX.