python-docx get Header text

Dipas · Jan 15, 2018 · Viewed 7.1k times

I want to read the header text from a docx file in Python. I am using the python-docx module.

Can someone help me do this, if the feature has already been implemented?

I tried the following, without success:

from docx import Document

document = Document(path)
section = document.sections[0]
print(section.text)

Error:
    <class 'AttributeError'>'Section' object has no attribute 'text'

And:

from docx import Document

document = Document(path)
header = document.sections[0].header
print(header.text)

Error:
    <class 'AttributeError'>'Section' object has no attribute 'header'

Answer

Matt Schouten · Jan 20, 2019

At the time you asked your question, this wasn't possible using the python-docx library. In the 0.8.8 release (January 7, 2019), header/footer support was added.

In a Word document, each section has a header. There are many potential wrinkles to headers (e.g. they can be linked from section to section, or differ between even and odd pages), but in the simple case, with one section and an uncomplicated header, you just need to go through the paragraphs in the section header.

from docx import Document
document = Document(path_and_filename)
section = document.sections[0]
header = section.header
for paragraph in header.paragraphs:
    print(paragraph.text) # or whatever you have in mind

I'm working with a document that has the header laid out with a table instead of simple text. In that case, you'd need to work with the rows in header.tables[0] instead of the paragraphs.