Convert Word doc or docx files into text files?

CheeseConQueso picture CheeseConQueso · Jul 10, 2009 · Viewed 29.3k times · Source

I need a way to convert .doc or .docx extensions to .txt without installing anything. I also don't want to have to manually open Word to do this obviously. As long as it's running on auto.

I was thinking that either Perl or VBA could do the trick, but I can't find anything online for either.

Any suggestions?

Answer

jeje picture jeje · Jul 10, 2009

A simple Perl only solution for docx:

  1. Use Archive::Zip to get the word/document.xml file from your docx file. (A docx is just a zipped archive.)

  2. Use XML::LibXML to parse it.

  3. Then use XML::LibXSLT to transform it into text or html format. Seach the web to find a nice docx2txt.xsl file :)

Cheers !

J.