Using Python to extract images and text from a Word document

Preston Donovan · Jun 14, 2011 · Viewed 8.2k times

I would like to run a script on a folder full of Word documents that reads through each document and pulls out the images and their captions (the text right below each image). From the research I've done, I think pywin32 might be a viable solution. I know how to use pywin32 to find strings and pull them out, but I need help with the images part. How can I read through a docx file and have an event occur when an image is found? Thank you for any help! I am using Python 2.7.

Answer

Kevin C. · Aug 3, 2011

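A docx file is a ZIP archive, so you can unzip it to extract the images. Embedded images are normally stored under the word/media/ folder inside the archive.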
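As a rough illustration of the unzip approach, here is a minimal sketch using only the standard library `zipfile` module. The function name `extract_images` and the output folder layout are my own choices for the example, not part of any library, and it assumes the images live under `word/media/` (the usual location in a docx archive).

```python
import os
import zipfile

# Assumption: embedded images sit under this folder inside the archive.
MEDIA_PREFIX = "word/media/"


def extract_images(docx_path, out_dir):
    """Copy every file under word/media/ in the docx archive into out_dir.

    Returns the list of paths that were written. (Hypothetical helper,
    named for this example only.)
    """
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)

    saved = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Skip everything that is not a media file, and skip directory entries.
            if not name.startswith(MEDIA_PREFIX) or name.endswith("/"):
                continue
            target = os.path.join(out_dir, os.path.basename(name))
            with open(target, "wb") as handle:
                handle.write(archive.read(name))
            saved.append(target)
    return saved
```

To process a whole folder, you could loop over the docx files in it and call `extract_images(path, some_output_dir)` for each one, using a separate output directory per document so that files with the same name (image1.png and so on) do not overwrite each other. Note that this only pulls out the image files; the document text, including any captions, lives in word/document.xml inside the same archive and would need to be parsed separately to match captions to images.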