Reading a .doc file using the DocumentFormat.OpenXml DLL

Shardaprasad Soni · Apr 2, 2012

When I try to read a .doc file using the DocumentFormat.OpenXml DLL, it fails with the error "File contains corrupted data."

The same DLL reads .docx files without problems.

Can the DocumentFormat.OpenXml DLL be used to read .doc files?

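using DocumentFormat.OpenXml.Packaging;

string path = @"D:\Data\Test.doc";
string searchKeyWord = @"java";

private bool SearchWordIsMatched(string path, string searchKeyWord)
{
    // Open read-only (isEditable: false); searching does not modify the file.
    // Let exceptions propagate unchanged: catching and rethrowing with
    // "throw ex;" would reset the stack trace.
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(path, false))
    {
        // InnerText is the concatenated text of the main document part.
        var text = wordDoc.MainDocumentPart.Document.InnerText;
        return text.Contains(searchKeyWord);
    }
}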

Answer

svick · Apr 2, 2012

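The old .doc files have a completely different format from the new .docx files: .doc is the legacy Word binary format (an OLE compound file), whereas .docx is a ZIP package of XML parts (Office Open XML). So, no, you can't use the Open XML library to read .doc files.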

To read them, you would need either to convert the files to .docx first, or to use Office interop instead of the Open XML SDK you're using now.
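
Because the two formats differ from the very first bytes, one option is to check the file signature before handing a file to WordprocessingDocument.Open, so that a legacy file can be rejected with a clearer message than "File contains corrupted data." The following is only a minimal sketch of that idea. The names WordFileKind and WordFileSniffer are hypothetical helpers, not part of the Open XML SDK, and the check identifies only the container type: an OLE compound file is not necessarily a Word document, and a ZIP archive is not necessarily a valid .docx.

```csharp
using System.IO;

// Hypothetical helper types for illustration; not part of the Open XML SDK.
public enum WordFileKind { Unknown, LegacyBinary, ZipPackage }

public static class WordFileSniffer
{
    // Signature of an OLE compound file, the container used by legacy
    // binary Office files such as .doc.
    private static readonly byte[] OleSignature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // ZIP local file header signature ("PK" 0x03 0x04); .docx files are ZIP archives.
    private static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    public static WordFileKind Detect(Stream stream)
    {
        var header = new byte[8];
        int read = 0;

        // Stream.Read may return fewer bytes than requested,
        // so keep reading until the buffer is full or the stream ends.
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break;
            read += n;
        }

        if (StartsWith(header, read, OleSignature)) return WordFileKind.LegacyBinary;
        if (StartsWith(header, read, ZipSignature)) return WordFileKind.ZipPackage;
        return WordFileKind.Unknown;
    }

    public static WordFileKind Detect(string path)
    {
        using (var stream = File.OpenRead(path))
        {
            return Detect(stream);
        }
    }

    // True when the first 'length' valid bytes of 'buffer' begin with 'signature'.
    private static bool StartsWith(byte[] buffer, int length, byte[] signature)
    {
        if (length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (buffer[i] != signature[i]) return false;
        }
        return true;
    }
}
```

With something like this in place, a method such as SearchWordIsMatched could call WordFileSniffer.Detect(path) first and, for LegacyBinary, either throw a NotSupportedException or route the file to a conversion or interop step instead of opening it with the Open XML SDK.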