I want to read the header text from a .docx file in Python, using the python-docx module.
Can someone help me do this, if the feature has already been implemented?
I tried the following two approaches, without success.
from docx import Document
document = Document(path)
section = document.sections[0]
print(section.text)
Error:
<class 'AttributeError'>'Section' object has no attribute 'text'
And:
from docx import Document
document = Document(path)
header = document.sections[0].header
print(header.text)
Error:
<class 'AttributeError'>'Section' object has no attribute 'header'
At the time you asked your question, this wasn't possible using the python-docx library. In the 0.8.8 release (January 7, 2019), header/footer support was added.
In a Word document, each section has a header. There are a lot of potential wrinkles to headers (e.g. they can be linked from section to section, or differ between even and odd pages), but in the simple case, with one section and an uncomplicated header, you just need to go through the paragraphs in the section header.
from docx import Document
document = Document(path_and_filename)
section = document.sections[0]
header = section.header
for paragraph in header.paragraphs:
    print(paragraph.text)  # or whatever you have in mind
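For a document with more than one section, the loop above can be extended along these lines. This is only a sketch: headers_by_section is a name made up for illustration, and it assumes python-docx 0.8.8 or later, where each header exposes is_linked_to_previous. A linked header just repeats the one from the previous section, so it is skipped here.

```python
def headers_by_section(document):
    """Map section index -> list of header paragraph texts.

    Hypothetical helper, not part of python-docx. `document` is expected
    to be an object returned by docx.Document(...).
    """
    result = {}
    for index, section in enumerate(document.sections):
        header = section.header
        # A linked header has no definition of its own; it inherits the
        # previous section's header, so reading it would only duplicate text.
        if header.is_linked_to_previous:
            continue
        result[index] = [paragraph.text for paragraph in header.paragraphs]
    return result
```

You would call it as headers_by_section(Document(path_and_filename)). A first section with no header defined also reports itself as linked, so it simply does not appear in the result.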
I'm working with a document whose header is laid out with a table instead of simple text. In that case, you'd need to work with the rows in header.tables[0] instead of the paragraphs.
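A minimal sketch of that table case, assuming python-docx 0.8.8 or later. The names header_table_text and read_header_tables are invented for this example; only tables, rows, cells and text come from the library.

```python
def header_table_text(header):
    """Return the cell text of every table in a header, one list per row.

    Hypothetical helper, not part of python-docx. `header` is expected to
    be something like document.sections[0].header.
    """
    rows = []
    for table in header.tables:
        for row in table.rows:
            rows.append([cell.text for cell in row.cells])
    return rows


def read_header_tables(path):
    """Open the file at `path` and read the first section's header tables."""
    # Imported here so header_table_text() can also be used on a document
    # that is already open.
    from docx import Document

    document = Document(path)
    return header_table_text(document.sections[0].header)
```

Note that a horizontally merged cell can show up more than once in row.cells, so the same text may be repeated within a row.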