Python xml.dom.minidom: generate a <text>Some text</text> element on one line

Orjanp · Feb 3, 2009

I have the following code.

from xml.dom.minidom import Document

doc = Document()

root = doc.createElement('root')
doc.appendChild(root)
main = doc.createElement('Text')
root.appendChild(main)

text = doc.createTextNode('Some text here')
main.appendChild(text)

print doc.toprettyxml(indent='\t')

The result is:

<?xml version="1.0" ?>
<root>
    <Text>
        Some text here
    </Text>
</root>

This is all fine and dandy, but what if I want the output to look like this?

<?xml version="1.0" ?>
<root>
    <Text>Some text here</Text>
</root>

Can this easily be done?

Orjanp...

Answer

bobince · Feb 3, 2009

Can this easily be done?

It depends on exactly what rule you want, but generally no: minidom gives you little control over pretty-printing. If you want a specific format, you'll usually have to write your own walker.
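To give an idea of what such a walker might look like, here is a minimal sketch. The function name `serialize` and its parameters are made up for illustration and are not part of minidom. It assumes you only care about elements, attributes and text: comments and processing instructions are skipped, and text nodes in mixed content are stripped of surrounding whitespace.

```python
from xml.dom import Node
from xml.dom.minidom import Document
from xml.sax.saxutils import escape, quoteattr


def serialize(node, indent="    ", level=0):
    """Return a list of output lines for `node` (hypothetical helper)."""
    pad = indent * level
    if node.nodeType == Node.TEXT_NODE:
        # Text inside mixed content: drop whitespace-only nodes.
        text = node.data.strip()
        return [pad + escape(text)] if text else []
    if node.nodeType != Node.ELEMENT_NODE:
        return []  # comments, PIs etc. are ignored in this sketch
    attrs = "".join(
        " %s=%s" % (name, quoteattr(value))
        for name, value in sorted(node.attributes.items())
    )
    open_tag = "<%s%s" % (node.tagName, attrs)
    children = node.childNodes
    if not children:
        return [pad + open_tag + "/>"]
    if len(children) == 1 and children[0].nodeType == Node.TEXT_NODE:
        # The rule from the question: a lone text node stays on one line.
        text = escape(children[0].data)
        return [pad + "%s>%s</%s>" % (open_tag, text, node.tagName)]
    lines = [pad + open_tag + ">"]
    for child in children:
        lines.extend(serialize(child, indent, level + 1))
    lines.append(pad + "</%s>" % node.tagName)
    return lines


# Same document as in the question.
doc = Document()
root = doc.createElement("root")
doc.appendChild(root)
main = doc.createElement("Text")
root.appendChild(main)
main.appendChild(doc.createTextNode("Some text here"))

print("\n".join(serialize(doc.documentElement)))
```

This prints the root element without the XML declaration; if you need the declaration, write it yourself before the first line.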

The DOM Level 3 LS parameter format-pretty-print in pxdom comes pretty close to your example. Its rule is that if an element contains only a single Text node, no extra whitespace is added. However, it (currently) indents with two spaces rather than four.

>>> import pxdom
>>> doc = pxdom.parseString('<a><b>c</b></a>')
>>> doc.domConfig.setParameter('format-pretty-print', True)
>>> print doc.pxdomContent
<?xml version="1.0" encoding="utf-16"?>
<a>
  <b>c</b>
</a>

(Adjust encoding and output format for whatever type of serialisation you're doing.)

If that's the rule you want, and you can get away with it, you might also be able to monkey-patch minidom's Element.writexml, e.g.:

>>> from xml.dom import minidom
>>> def newwritexml(self, writer, indent='', addindent='', newl=''):
...     # An element whose only child is a text node is written on one line.
...     if len(self.childNodes) == 1 and self.firstChild.nodeType == self.TEXT_NODE:
...         writer.write(indent)
...         self.oldwritexml(writer)  # default arguments cancel the extra whitespace
...         writer.write(newl)
...     else:
...         self.oldwritexml(writer, indent, addindent, newl)
...
>>> minidom.Element.oldwritexml = minidom.Element.writexml
>>> minidom.Element.writexml = newwritexml

All the usual caveats about the badness of monkey-patching apply.
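One way to soften those caveats is to keep the patch in place only while you serialise. The following is a rough sketch of that idea, not something minidom provides: `compact_text_elements` and `patched_writexml` are hypothetical names, and it assumes nothing else in the process is serialising with minidom at the same time.

```python
import contextlib
from xml.dom import minidom


@contextlib.contextmanager
def compact_text_elements():
    """Temporarily patch minidom.Element.writexml; restore it on exit."""
    original = minidom.Element.writexml

    def patched_writexml(self, writer, indent="", addindent="", newl=""):
        only_text = (len(self.childNodes) == 1
                     and self.firstChild.nodeType == self.TEXT_NODE)
        if only_text:
            writer.write(indent)
            original(self, writer)  # default arguments add no whitespace
            writer.write(newl)
        else:
            original(self, writer, indent, addindent, newl)

    minidom.Element.writexml = patched_writexml
    try:
        yield
    finally:
        # Put the original method back even if serialisation raised.
        minidom.Element.writexml = original


doc = minidom.parseString("<root><Text>Some text here</Text></root>")
with compact_text_elements():
    pretty = doc.documentElement.toprettyxml(indent="    ")
print(pretty)
```

On newer Python versions minidom may already keep text-only elements on one line by itself, in which case the patch should be harmless but unnecessary; check the output of plain toprettyxml first.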