How to create/write a simple XML parser from scratch?

XP1 · Jun 5, 2011

Rather than code samples, I'd like the simplified, basic steps in plain English.

How is a good parser designed? I understand that regex should not be used in a parser, but what role, if any, does regex play in parsing XML?

What is the recommended data structure to use? Should I use linked lists to store and retrieve nodes, attributes, and values?

I want to learn how to create an XML parser so that I can write one in the D programming language.

Answer

Michael Kay · Jun 5, 2011

If you don't know how to write a parser, then you need to do some reading. Get hold of any book on compiler writing (many of the best ones were written 30 or 40 years ago, e.g. Aho and Ullman) and study the chapters on lexical analysis and syntax analysis. XML is essentially no different, except that the lexical and grammar phases are not as clearly separated from each other as in some languages.
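To make the two phases concrete, here is a minimal sketch in Python (chosen for brevity; the structure carries over to D). It assumes a tiny subset of XML: elements, attributes and text only, with no comments, CDATA, processing instructions, DOCTYPE or entity expansion. A regex is used only at the lexical level, to recognise one token at a time; nesting is checked by the parser with an explicit stack, which is the part a regex cannot do. The names `Element`, `tokenize` and `parse` are hypothetical, and the tree keeps each element's children in an ordinary list rather than a hand-built linked list.

```python
import re
from dataclasses import dataclass, field

# Hypothetical tree node: children are kept in a plain list
# (Element objects or text strings), not a hand-built linked list.
@dataclass
class Element:
    name: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

# Lexical level: regexes recognise single tokens only.
# Assumed subset: elements, attributes and text; no comments, CDATA,
# processing instructions, DOCTYPE or entity expansion.
NAME = r"[A-Za-z_:][-A-Za-z0-9_:.]*"
VALUE = r"""(?:"[^"]*"|'[^']*')"""
TOKEN_RE = re.compile(
    rf"<(?P<end>/)?(?P<name>{NAME})"
    rf"(?P<attrs>(?:\s+{NAME}\s*=\s*{VALUE})*)\s*(?P<empty>/)?>"
    r"|(?P<text>[^<]+)"
)
ATTR_RE = re.compile(rf"({NAME})\s*=\s*({VALUE})")


def tokenize(src):
    """Yield ('start', name, attrs, is_empty), ('end', name) or ('text', s)."""
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        if m.group("text") is not None:
            yield ("text", m.group("text"))
        elif m.group("end"):
            if m.group("attrs") or m.group("empty"):
                raise SyntaxError(f"malformed end tag at offset {pos}")
            yield ("end", m.group("name"))
        else:
            attrs = {}
            for a in ATTR_RE.finditer(m.group("attrs")):
                if a.group(1) in attrs:
                    raise SyntaxError(f"duplicate attribute {a.group(1)}")
                attrs[a.group(1)] = a.group(2)[1:-1]  # strip the quotes
            yield ("start", m.group("name"), attrs, m.group("empty") is not None)
        pos = m.end()


def parse(src):
    """Syntax level: a stack makes each end tag close the innermost open element."""
    root, stack = None, []
    for tok in tokenize(src):
        if tok[0] == "text":
            if stack:
                stack[-1].children.append(tok[1])
            elif tok[1].strip():
                raise SyntaxError("text outside the root element")
        elif tok[0] == "start":
            _, name, attrs, empty = tok
            node = Element(name, attrs)
            if stack:
                stack[-1].children.append(node)
            elif root is None:
                root = node
            else:
                raise SyntaxError("more than one root element")
            if not empty:
                stack.append(node)
        else:
            if not stack or stack[-1].name != tok[1]:
                raise SyntaxError(f"unexpected end tag </{tok[1]}>")
            stack.pop()
    if root is None or stack:
        raise SyntaxError("missing root element or unclosed tag")
    return root
```

For example, `parse("<note id='7'><to>Tove</to><body/></note>")` returns a `note` element with attributes `{'id': '7'}` whose first child is a `to` element containing the text `Tove`, while mismatched nesting such as `<a><b></a></b>` raises `SyntaxError`. In a real parser the lexer usually needs to know where it is in the grammar (inside a tag, in content, in a DTD), which is the blurring of the two phases described above.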

One word of warning: if you want to write a fully conformant XML parser, then 90% of your effort will be spent getting edge cases right in obscure corners of the spec (parameter entities, for example) that most XML users aren't even aware of.
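As a small illustration of how quickly the details pile up, here is a hypothetical `expand_refs` helper in Python that covers only the easy part of references in text: the five predefined entities and decimal or hexadecimal character references, including the check that the referenced code point is a legal XML 1.0 character. Entities declared in a DTD, and the parameter entities used inside DTDs, are rejected rather than supported; handling those properly is where the real effort goes.

```python
import re

# The five entities that XML predefines; any other named entity
# would need a DTD declaration, which this sketch does not read.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}
REF_RE = re.compile(r"&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z_:][-A-Za-z0-9_:.]*);")


def is_xml_char(cp):
    """The Char production of XML 1.0: not every code point is allowed."""
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)


def expand_refs(text):
    """Hypothetical helper: expand predefined entities and character references."""
    if "&" in REF_RE.sub("", text):
        raise SyntaxError("bare '&' must be written as &amp;")

    def replace(m):
        ref = m.group(1)
        if ref.startswith("#"):
            cp = int(ref[2:], 16) if ref.startswith("#x") else int(ref[1:])
            if not is_xml_char(cp):
                raise SyntaxError(f"&{ref}; is not a legal XML character")
            return chr(cp)
        if ref in PREDEFINED:
            return PREDEFINED[ref]
        # Entities declared in a DTD (and the parameter entities used
        # inside DTDs) are not supported by this sketch.
        raise SyntaxError(f"undeclared entity &{ref};")

    # re.sub does not rescan its output, so "&amp;lt;" becomes "&lt;", not "<".
    return REF_RE.sub(replace, text)
```

For instance, `expand_refs("a &lt; b &amp;&amp; c &#x41;&#66;")` gives `"a < b && c AB"`, while `&nbsp;`, `&#0;` and a bare `AT&T` are all rejected. Note that `&#X41;` with an uppercase `X` is also rejected, because the spec only allows a lowercase `x`.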