How does the .doc format work?

stalepretzel · Sep 24, 2008

I recently learned about the basic structure of the .docx file (it's a specially structured zip archive). However, a .docx is not formatted like a .doc.

How does a .doc file work? What is the file format, its structure, etc.?

Answer

Jay · Sep 24, 2008

It's not a direct answer to your question, but I highly recommend reading Joel Spolsky's article, "Why are the Microsoft Office file formats so complicated? (And some workarounds)". It will give you some insight into how complex the .doc format really is, and why. Joel also gives a very basic overview of what the .doc format consists of:

You see, Excel 97-2003 files are OLE compound documents, which are, essentially, file systems inside a single file. These are sufficiently complicated that you have to read another 9 page spec to figure that out. And these “specs” look more like C data structures than what we traditionally think of as a spec. It's a whole hierarchical file system.

(The quote refers to Excel files, but it applies to Word docs as well.) The article is informative, and it helps explain why .docx and ODF files look so much more logically structured when you examine them from the outside.
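
To make the "file system inside a single file" point a bit more concrete, here is a minimal sketch that tells the two containers apart by their first bytes and, for an OLE compound file, reads the sector size out of the header. The function names (sniff_container, cfb_sector_size) are made up for illustration, and the header offsets are my reading of the compound file header layout, so treat this as a starting point rather than a parser.

```python
import struct

# Signature that opens every OLE compound file (.doc, .xls, .ppt of the 97-2003 era).
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Local file header signature that opens an ordinary zip archive (.docx, .odt).
ZIP_MAGIC = b"PK\x03\x04"


def sniff_container(data: bytes) -> str:
    """Guess the container type from the leading bytes of a file (hypothetical helper)."""
    if data.startswith(CFB_MAGIC):
        return "ole-compound"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def cfb_sector_size(header: bytes) -> int:
    """Return the sector size declared in a compound file header (hypothetical helper)."""
    if len(header) < 32 or not header.startswith(CFB_MAGIC):
        raise ValueError("not a compound file header")
    # Assumed layout: 8-byte signature, 16-byte CLSID, then four little-endian
    # uint16 fields at offset 24: minor version, major version, byte order, sector shift.
    _minor, _major, _byte_order, sector_shift = struct.unpack_from("<HHHH", header, 24)
    # The sector size is stored as a power of two.
    return 1 << sector_shift
```

To try it on a real file, read the first 512 bytes with open(path, "rb").read(512) and pass them to both functions. Everything past the header (the sector allocation tables, the directory of streams, and the Word-specific streams inside it) is where the real complexity of .doc lives, and this sketch does not touch any of that.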