Is there a way to get a line number from an ElementTree Element

John Smith picture John Smith · Aug 5, 2011 · Viewed 9.7k times · Source

So I'm parsing some XML files using Python 3.2.1's cElementTree, and during the parsing I noticed that some of the tags were missing attribute information. I was wondering if there is any easy way of getting the line numbers of those Elements in the xml file.

Answer

Duncan Harris picture Duncan Harris · Apr 5, 2016

Took a while for me to work out how to do this using Python 3.x (using 3.3.2 here) so thought I would summarize:

# Force python XML parser not faster C accelerators
# because we can't hook the C implementation
sys.modules['_elementtree'] = None
import xml.etree.ElementTree as ET

class LineNumberingParser(ET.XMLParser):
    def _start_list(self, *args, **kwargs):
        # Here we assume the default XML parser which is expat
        # and copy its element position attributes into output Elements
        element = super(self.__class__, self)._start_list(*args, **kwargs)
        element._start_line_number = self.parser.CurrentLineNumber
        element._start_column_number = self.parser.CurrentColumnNumber
        element._start_byte_index = self.parser.CurrentByteIndex
        return element

    def _end(self, *args, **kwargs):
        element = super(self.__class__, self)._end(*args, **kwargs)
        element._end_line_number = self.parser.CurrentLineNumber
        element._end_column_number = self.parser.CurrentColumnNumber
        element._end_byte_index = self.parser.CurrentByteIndex
        return element

tree = ET.parse(filename, parser=LineNumberingParser())