I'm writing a GIS client tool in C# to retrieve "features" in a GML-based XML schema (sample below) from a server. Extracts are limited to 100,000 features.
I guesstimate that the largest extract.xml might get up to around 150 megabytes, so DOM parsers are obviously out. I've been trying to decide between XmlSerializer with XSD.EXE-generated bindings, or XmlReader with a hand-crafted object graph (a rough sketch of what I mean follows the sample XML below).
Or maybe there's a better way that I haven't considered yet? Like XLinq, or something else entirely?
Can anybody guide me, especially with regard to the memory efficiency of each approach? If not, I'll have to prototype both solutions and profile them side by side.
I'm a bit of a raw prawn in .NET. Any guidance would be greatly appreciated.
Thanking you. Keith.
Sample XML (up to 100,000 of these per extract, each with up to 234,600 coordinates per feature):
<feature featId="27168306" fType="vegetation" fTypeId="1129" fClass="vegetation" gType="Polygon" ID="0" cLockNr="51598" metadataId="51599" mdFileId="NRM/TIS/VEGETATION/9543_22_v3" dataScale="25000">
  <MultiGeometry>
    <geometryMember>
      <Polygon>
        <outerBoundaryIs>
          <LinearRing>
            <coordinates>153.505004,-27.42196 153.505044,-27.422015 153.503992 .... 172 coordinates omitted to save space ... 153.505004,-27.42196</coordinates>
          </LinearRing>
        </outerBoundaryIs>
      </Polygon>
    </geometryMember>
  </MultiGeometry>
</feature>
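For concreteness, the kind of hand-crafted object graph I have in mind is roughly this (class and field names are just my working placeholders, not anything final):

using System.Collections.Generic;

// Placeholder object graph mirroring the sample above; names are guesses.
public class Feature
{
    public long FeatId;                  // featId attribute
    public string FType;                 // fType attribute
    public List<Polygon> Polygons = new List<Polygon>();
}

public class Polygon
{
    // Outer boundary ring, parsed from the <coordinates> element.
    public List<Coordinate> OuterBoundary = new List<Coordinate>();
}

public struct Coordinate
{
    public double X;   // longitude
    public double Y;   // latitude
}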
Use XmlReader to parse large XML documents. XmlReader provides fast, forward-only, non-cached access to XML data (forward-only means you can read the XML file from beginning to end but cannot move backwards in the file). XmlReader uses a small, essentially constant amount of memory regardless of document size, equivalent to using a simple SAX reader.
using System.Xml;

using (XmlReader myReader = XmlReader.Create(@"c:\data\coords.xml"))
{
    // Read() advances one node at a time; nothing already read is cached.
    while (myReader.Read())
    {
        // Inspect myReader.NodeType, myReader.Name and (for text nodes)
        // myReader.Value here
        // ...
    }
}
You can use XmlReader to process files that are up to 2 gigabytes (GB) in size, so your estimated 150 MB extract is well within its range.
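As a rough, untested sketch against your sample (the element and attribute names are taken straight from it; the file name and the hand-off at the end are placeholders), you can stream one <feature> at a time so that only the current feature's data is ever in memory:

using System;
using System.Globalization;
using System.Xml;

class ExtractParser
{
    static void Main()
    {
        using (XmlReader reader = XmlReader.Create(@"c:\data\extract.xml"))
        {
            while (reader.ReadToFollowing("feature"))
            {
                string featId = reader.GetAttribute("featId");
                string fType = reader.GetAttribute("fType");

                // ReadSubtree() confines us to this <feature>, so we can't
                // accidentally run on into the next one.
                using (XmlReader feature = reader.ReadSubtree())
                {
                    while (feature.ReadToFollowing("coordinates"))
                    {
                        // Space-separated "x,y" pairs, as in your sample.
                        string text = feature.ReadElementContentAsString();
                        foreach (string pair in text.Split(
                            new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
                        {
                            string[] xy = pair.Split(',');
                            double x = double.Parse(xy[0], CultureInfo.InvariantCulture);
                            double y = double.Parse(xy[1], CultureInfo.InvariantCulture);
                            // ... hand (featId, fType, x, y) off to your
                            // object graph here ...
                        }
                    }
                }
            }
        }
    }
}

If you still fancy the XLinq idea, the same streaming loop combines with it: position the reader on a <feature> and call XNode.ReadFrom(reader) to materialize just that one feature as an XElement. That keeps memory bounded to one feature while giving you LINQ to XML query convenience within it.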