Reading contents of XML file without having to remove the XML declaration

Pingpong · Dec 16, 2011

I want to read all XML contents from a file. The code below only works when the XML declaration (<?xml version="1.0" encoding="UTF-8"?>) is removed. What is the best way to read the file without removing the XML declaration?

XmlTextReader reader = new XmlTextReader(@"c:\my path\a.xml");
reader.Read();
string rs = reader.ReadOuterXml();

Without removing the XML declaration, reader.ReadOuterXml() returns an empty string.

<?xml version="1.0" encoding="UTF-8"?>  
<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope" xmlns:a="http://www.w3.org/2005/08/addressing">
  <s:Header>
    <a:Action s:mustUnderstand="1">http://www.as.com/ver/ver.IClaimver/Car</a:Action>
    <a:MessageID>urn:uuid:b22149b6-2e70-46aa-8b01-c2841c70c1c7</a:MessageID>
    <ActivityId CorrelationId="16b385f3-34bd-45ff-ad13-8652baeaeb8a" xmlns="http://schemas.microsoft.com/2004/09/ServiceModel/Diagnostics">04eb5b59-cd42-47c6-a946-d840a6cde42b</ActivityId>
    <a:ReplyTo>
      <a:Address>http://www.w3.org/2005/08/addressing/anonymous</a:Address>
    </a:ReplyTo>
    <a:To s:mustUnderstand="1">http://localhost/ver.Web/ver2011.svc</a:To>
  </s:Header>
  <s:Body xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <Car xmlns="http://www.as.com/ver">
      <carApplication>
        <HB_Base xsi:type="HB" xmlns="urn:core">
          <Header>
            <Advisor>
              <AdvisorLocalAuthorityCode>11</AdvisorLocalAuthorityCode>
              <AdvisorType>1</AdvisorType>
            </Advisor>
          </Header>
          <General>
            <ApplyForHB>yes</ApplyForHB>
            <ApplyForCTB>yes</ApplyForCTB>
            <ApplyForFSL>yes</ApplyForFSL>
            <ConsentSupplied>no</ConsentSupplied>
            <SupportingDocumentsSupplied>no</SupportingDocumentsSupplied>
          </General>
        </HB_Base>
      </carApplication>
    </Car>
  </s:Body>
</s:Envelope>

Update

I know there are other approaches that don't use an XML reader (e.g. File.ReadAllText()), but I need a way that uses an XML API.

Answer

dthorpe · Dec 16, 2011

There can be no text or whitespace before the <?xml ?> declaration other than a BOM, and no character data between the declaration and the root element other than whitespace (comments, processing instructions, and a DOCTYPE are also allowed there).

Anything else is not a well-formed document.

UPDATE:

I think your expectation of XmlTextReader.Read() is incorrect.

Each call to XmlTextReader.Read() advances to the next "token" (node) in the XML document, one at a time. "Token" here means elements, whitespace, text, and the XML declaration.

Your call to reader.ReadOuterXml() returns an empty string because the first token in your XML file is the XML declaration. ReadOuterXml() only returns markup when the reader is positioned on an element or attribute; on any other node type, including an XML declaration, it returns an empty string.

Consider this code:

    XmlTextReader reader = new XmlTextReader("test.xml");
    reader.Read();
    Console.WriteLine(reader.NodeType);  // XmlDeclaration
    reader.Read();
    Console.WriteLine(reader.NodeType);  // Whitespace
    reader.Read();
    Console.WriteLine(reader.NodeType);  // Element
    // The reader is now on <s:Envelope>, so this returns the whole element.
    string rs = reader.ReadOuterXml();

The code above produces this output:

XmlDeclaration
Whitespace
Element

The first "token" is the XML declaration.

The second "token" encountered is the line break after the XML declaration.

The third "token" encountered is the <s:Envelope> element. From here, a call to reader.ReadOuterXml() will return what I think you're expecting to see: the text of the <s:Envelope> element, which is the entire SOAP envelope.
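Rather than counting Read() calls by hand, you can let the reader skip ahead for you. The sketch below is one way to do that, not the only one: it uses XmlReader.MoveToContent(), which keeps reading past non-content nodes (the XML declaration, whitespace, comments, processing instructions) until it lands on a content node such as an element. The class and method names (RootElementReader, ReadRootOuterXml) are made up for illustration, and XmlReader.Create is swapped in for the XmlTextReader constructor; MoveToContent() is available on XmlTextReader as well.

```csharp
using System;
using System.Xml;

static class RootElementReader
{
    // Hypothetical helper: returns the outer XML of the root element,
    // or an empty string if the document has no element content.
    public static string ReadRootOuterXml(string path)
    {
        using (XmlReader reader = XmlReader.Create(path))
        {
            // Skips the XML declaration, whitespace, comments, and
            // processing instructions, stopping on the first content node.
            if (reader.MoveToContent() != XmlNodeType.Element)
            {
                return string.Empty;
            }

            // The reader is positioned on the root element here,
            // so ReadOuterXml() returns its markup and all children.
            return reader.ReadOuterXml();
        }
    }
}
```

Calling RootElementReader.ReadRootOuterXml(@"c:\my path\a.xml") on the file from the question should give you the <s:Envelope> element whether or not the declaration is present. Note that the declaration itself is not part of the returned string.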

If what you really want is to load the XML file into memory as objects, just call var doc = XDocument.Load("test.xml") and be done with the parsing in one fell swoop.
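If you also need the text back as a string, here is a rough sketch of the XDocument route. One thing to be aware of: XDocument.ToString() does not include the XML declaration, so the sketch prepends doc.Declaration when there is one. WholeDocumentLoader and LoadAsString are hypothetical names, and SaveOptions.DisableFormatting is just one choice; drop it if you prefer indented output.

```csharp
using System;
using System.Xml.Linq;

static class WholeDocumentLoader
{
    // Hypothetical helper: loads the file with XDocument and returns
    // the declaration (if any) followed by the document markup.
    public static string LoadAsString(string path)
    {
        XDocument doc = XDocument.Load(path);

        // doc.Declaration is null when the file has no <?xml ... ?> line.
        string declaration = doc.Declaration == null
            ? string.Empty
            : doc.Declaration + Environment.NewLine;

        // ToString() on the document serializes everything except the declaration.
        return declaration + doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

The root element on its own is available as doc.Root if that is all you need.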

Unless you're working with an XML doc that is so monstrously huge that it won't fit in system memory, there's really not a lot of reason to go poking through the XML document one token at a time.