How can I debug a corrupt docx file?

Martin Hansen Lennox · Aug 12, 2013

I have an issue where .doc and .pdf files are coming out OK but a .docx file is coming out corrupt.

To solve that, I am trying to find out why the .docx is corrupt.

I learned that the .docx format is much stricter about extra characters than either .pdf or .doc, so I searched the various XML files inside the .docx file for invalid XML. I can't find any; it all validates fine.
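For anyone repeating this check, here is a minimal sketch that parses every XML part inside the package and reports the ones that are not well-formed. It assumes the file can still be opened as a ZIP archive; the function name `find_malformed_parts` is made up for illustration. It only tests well-formedness, not conformance to the Open XML schemas, so a clean result does not prove the document is valid.

```python
import zipfile
import xml.etree.ElementTree as ET


def find_malformed_parts(docx_path):
    """Return (part name, error message) for each XML part that fails to parse."""
    problems = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Only .xml and .rels parts are XML; skip media such as images.
            if not name.endswith((".xml", ".rels")):
                continue
            try:
                ET.fromstring(archive.read(name))
            except ET.ParseError as exc:
                problems.append((name, str(exc)))
    return problems
```

An empty list means every XML part parsed, which matches what I saw by hand.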


Could anyone suggest directions for me to investigate now?

UPDATE:

The full listing of files inside the folder is as follows:

/_rels
    .rels

/customXml
    /_rels
        .rels
    item1.xml
    itemProps1.xml

/docProps
    app.xml
    core.xml

/word
    /_rels
        document.xml.rels
    /media
        image1.jpeg
    /theme
        theme1.xml
    document.xml
    fontTable.xml
    numbering.xml
    settings.xml
    styles.xml
    stylesWithEffects.xml
    webSettings.xml

[Content_Types].xml

UPDATE 2:

I should also have mentioned that the cause of the corruption is almost certainly a bad binary file POST on my part.

Why are docx files corrupted by binary post, but .doc and .pdf are fine?
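Since a .docx is a ZIP package, one thing that can be checked after the upload is whether the container itself is still intact. The sketch below uses Python's standard `zipfile` module to verify the checksum of every member; `check_container` is a hypothetical helper name. It does not explain why .doc and .pdf survive the same POST. It only narrows down whether the damage is at the container level.

```python
import zipfile
import zlib


def check_container(docx_path):
    """Return a short description of the first container-level problem, or None."""
    try:
        with zipfile.ZipFile(docx_path) as archive:
            # testzip() reads every member and returns the name of the first
            # one with a bad CRC or header, or None if all members check out.
            bad_member = archive.testzip()
    except (zipfile.BadZipFile, zlib.error) as exc:
        # BadZipFile: the archive directory could not be read, for example
        # because the upload was truncated. zlib.error: compressed data is damaged.
        return f"container could not be read: {exc}"
    if bad_member is not None:
        return f"first damaged member: {bad_member}"
    return None
```

If this returns `None` for the uploaded file, the bytes of each part are probably intact and the problem is more likely in the content of a part.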

UPDATE 3:

I have tried the demo versions of various docx repair tools. They all seem to repair the file OK but give no clue as to the cause of the error.

My next step is to compare the contents of the corrupted file with the repaired version.
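A rough sketch of that comparison, assuming both files can still be opened as ZIP archives: it compares the CRC recorded for each part and lists the parts that differ or exist on one side only. `diff_parts` is a made-up name. A repair tool may rewrite parts that were never broken, so a reported difference is a place to look, not proof of the cause.

```python
import zipfile


def diff_parts(corrupt_path, repaired_path):
    """Map each differing part name to a short description of the difference."""
    with zipfile.ZipFile(corrupt_path) as a, zipfile.ZipFile(repaired_path) as b:
        # The CRC stored for each member is a checksum of its uncompressed content.
        crc_a = {info.filename: info.CRC for info in a.infolist()}
        crc_b = {info.filename: info.CRC for info in b.infolist()}
    report = {}
    for name in sorted(crc_a.keys() | crc_b.keys()):
        if name not in crc_b:
            report[name] = "only in corrupt"
        elif name not in crc_a:
            report[name] = "only in repaired"
        elif crc_a[name] != crc_b[name]:
            report[name] = "content differs"
    return report
```

Parts flagged as "content differs" can then be extracted from both files and compared in a text diff tool.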

If anybody knows of a docx repair tool that gives a decent error message I'd appreciate hearing about it. In fact I might post that as a separate question.

UPDATE 4 (2017)

I never solved this problem. I have tried all the tools suggested in the answers below but none of them worked for me.

I have since progressed a little further and found a block of 0000 missing when opening the .docx in Sublime Text. More details in the new question here: What could be causing this corruption in .docx files during httpwebrequest?

Answer

Jeremy K · Jan 24, 2014

I used the "Open XML SDK 2.5 Productivity Tool" (http://www.microsoft.com/en-us/download/details.aspx?id=30425) to find a problem with a broken hyperlink reference.

You have to download/install the SDK first, then the tool. The tool will open and analyze the document for problems.
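If installing the SDK is not an option, one kind of broken hyperlink reference can be checked by hand. The sketch below is not what the Productivity Tool does internally. It is a small illustration that looks for `w:hyperlink` elements in `word/document.xml` whose `r:id` has no matching entry in `word/_rels/document.xml.rels`. The function name is hypothetical, and the namespaces are the ones used by the transitional (default) Word format, which is an assumption about the file in question.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces of the transitional Open XML format (assumed here).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def dangling_hyperlink_ids(docx_path):
    """Return hyperlink relationship ids used in the document but never declared."""
    with zipfile.ZipFile(docx_path) as archive:
        document = ET.fromstring(archive.read("word/document.xml"))
        rels = ET.fromstring(archive.read("word/_rels/document.xml.rels"))
    declared = {rel.get("Id") for rel in rels.iter(f"{{{PKG_REL_NS}}}Relationship")}
    used = {link.get(f"{{{R_NS}}}id") for link in document.iter(f"{{{W_NS}}}hyperlink")}
    used.discard(None)  # hyperlinks that point at a bookmark carry no r:id
    return sorted(used - declared)
```

An empty list means every external hyperlink in the main document part resolves to a declared relationship; headers, footers and other parts are not covered by this sketch.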