In C# (.NET 4.0 and 4.5 / VS2010 and VS12), when I serialize an object containing a string with an illegal character using XmlSerializer, no error is thrown. However, when I deserialize the result, an "invalid character" error is thrown.
// add to XML
Items items = new Items();
items.Item = "\v hello world"; // contains "illegal" character \v
// variables
System.Xml.Serialization.XmlSerializer serializer = new System.Xml.Serialization.XmlSerializer(typeof(Items));
string tmpFile = Path.GetTempFileName();
// serialize
using (FileStream tmpFileStream = new FileStream(tmpFile, FileMode.Open, FileAccess.ReadWrite))
{
    serializer.Serialize(tmpFileStream, items);
}
Console.WriteLine("Success! XML serialized in file " + tmpFile);
// deserialize
Items result = null;
using (FileStream plainTextFile = new FileStream(tmpFile, FileMode.Open, FileAccess.Read))
{
    result = (Items)serializer.Deserialize(plainTextFile); // FAILS here
}
Console.WriteLine(result.Item);
"Items" is just a small class autogenerated by xsd /c Items.xsd. Items.xsd is nothing more than a root element (Items) containing one child (Item):
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xs:element name="Items">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Item" type="xs:string" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
The error thrown during deserialization is
Unhandled Exception: System.InvalidOperationException: There is an error in XML document (3, 12). ---> System.Xml.XmlException: '♂', hexadecimal value 0x0B, is an invalid character. Line 3, position 12.
The serialized XML file contains on line 3 this:
<Item>&#xB; hello world</Item>
I know \v (written to the file as &#xB;) is an illegal character, but why does XmlSerializer allow it to be serialized without an error? I find it inconsistent that .NET lets me serialize something without a problem, only for me to find out that I cannot deserialize it.
Is there a way to have XmlSerializer remove the illegal characters automatically before serializing, or can I instruct the deserializer to ignore them?
Currently I work around it by reading the file contents as a string, manually replacing the illegal characters, and then deserializing the result, but I find that an ugly hack.
You can set the CheckCharacters property of XmlWriterSettings to true to avoid writing illegal characters (the Serialize method then throws an exception instead):
using (FileStream tmpFileStream = new FileStream(tmpFile, FileMode.OpenOrCreate, FileAccess.ReadWrite))
using (XmlWriter writer = XmlWriter.Create(tmpFileStream, new XmlWriterSettings { CheckCharacters = true }))
{
    // Serialize now throws if items contains characters that are illegal in XML
    serializer.Serialize(writer, items);
}
You can derive your own writer from XmlTextWriter to filter out unwanted characters while serializing:
using (FileStream tmpFileStream = new FileStream(tmpFile, FileMode.OpenOrCreate, FileAccess.ReadWrite))
{
    var writer = new MyXmlWriter(tmpFileStream);
    serializer.Serialize(writer, items);
}
public class MyXmlWriter : XmlTextWriter
{
    public MyXmlWriter(Stream s) : base(s, Encoding.UTF8)
    {
    }
    public override void WriteString(string text)
    {
        // Drop control characters, but keep tab, LF and CR, which are legal in XML
        string newText = String.Join("", text.Where(c => !char.IsControl(c) || c == '\t' || c == '\n' || c == '\r'));
        base.WriteString(newText);
    }
}
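Note that char.IsControl is defined in Unicode terms and only approximates XML's rules; for example, U+FFFE and U+FFFF are not control characters but are still invalid in XML 1.0. As a rough sketch, a filter that follows the XML 1.0 Char production directly could look like the following. The class name XmlTextSanitizer and its Sanitize method are hypothetical helpers made up for this illustration; they are not part of .NET.

```csharp
using System.Text;

// Hypothetical helper (not a framework type): removes characters that the
// XML 1.0 "Char" production does not allow.
public static class XmlTextSanitizer
{
    // True for a single UTF-16 code unit that is legal on its own in XML 1.0:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
    private static bool IsLegalXmlChar(char c)
    {
        return c == '\t' || c == '\n' || c == '\r'
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD);
    }

    public static string Sanitize(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return text;
        }

        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (IsLegalXmlChar(c))
            {
                sb.Append(c);
            }
            else if (char.IsHighSurrogate(c) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
            {
                // A well-formed surrogate pair encodes U+10000..U+10FFFF, which is legal XML
                sb.Append(c).Append(text[i + 1]);
                i++;
            }
            // Anything else (e.g. \v, lone surrogates, U+FFFE, U+FFFF) is dropped
        }
        return sb.ToString();
    }
}
```

Inside an overridden WriteString, this could be used as base.WriteString(XmlTextSanitizer.Sanitize(text)) in place of the LINQ filter.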
By deriving your own reader from XmlTextReader, you can filter out unwanted characters while deserializing:
using (FileStream plainTextFile = new FileStream(tmpFile, FileMode.Open, FileAccess.Read))
{
    var reader = new MyXmlReader(plainTextFile);
    result = (Items)serializer.Deserialize(reader);
}
public class MyXmlReader : XmlTextReader
{
    public MyXmlReader(Stream s) : base(s)
    {
    }
    public override string ReadString()
    {
        string text = base.ReadString();
        // Drop control characters, but keep tab, LF and CR, which are legal in XML
        string newText = String.Join("", text.Where(c => !char.IsControl(c) || c == '\t' || c == '\n' || c == '\r'));
        return newText;
    }
}
You can set the CheckCharacters property of XmlReaderSettings to false. Deserialization then works smoothly (you get \v back):
using (FileStream plainTextFile = new FileStream(tmpFile, FileMode.Open, FileAccess.Read))
using (XmlReader reader = XmlReader.Create(plainTextFile, new XmlReaderSettings { CheckCharacters = false }))
{
    // With character checking off, the illegal character reference is read back as \v
    result = (Items)serializer.Deserialize(reader);
}