illegal xml character on SQL Insert

James A Mohler picture James A Mohler · Apr 14, 2013 · Viewed 26.1k times · Source

I am trying to insert the following string into an sql xml field

<?xml version="1.0" encoding="UTF-8"?>
<Response> 
    <Ip>x.x.x.x</Ip>
    <CountryCode>CA</CountryCode> 
    <CountryName>Canada</CountryName>
    <RegionCode>QC</RegionCode> 
    <RegionName>Québec</RegionName> 
    <City>Dorval</City> 
    <ZipCode>h9p1j3</ZipCode> 
    <Latitude>45.45000076293945</Latitude> 
    <Longitude>-73.75</Longitude> 
    <MetroCode></MetroCode> 
    <AreaCode></AreaCode> 
</Response>

The insert code looks like:

INSERT 
    INTO Traffic(... , xmlGeoLocation, ...)
    VALUES (
        ...
        <!--- 
        <cfqueryparam CFSQLType="cf_sql_varchar" value="#xmlGeoLocation#">, 
        --->
        '#xmlGeoLocation#',

        ...
        )

Two bad things happen:

  1. Québec gets turned into Québec

  2. I get an error saying [Macromedia][SQLServer JDBC Driver][SQLServer]XML parsing: line 8, character 16, illegal xml character

UPDATE:

The incoming test stream is mostly single byte characters.

The é is a two byte character. In particular C3A9

Also I don't have control over the incoming xml stream

Answer

J. Holladay picture J. Holladay · Feb 28, 2015

I'm going to strip the header...

I'm having the same issue with a funny little apostrophe thing. I think the issue is that by the time the string is getting converted to XML, it's not UTF-8 anymore, but sql server is trying to use the header to decode it. If it's VARCHAR, it's in the client's encoding. If it's NVARCHAR, it's UTF-16. Here are some variations I tested:

SQL (varchar, UTF-8):

SELECT CONVERT(XML,'<?xml version="1.0" encoding="UTF-8"?><t>We’re sorry</t>')

Error:

XML parsing: line 1, character 44, illegal xml character

SQL (nvarchar, UTF-8):

SELECT CONVERT(XML,N'<?xml version="1.0" encoding="UTF-8"?><t>We’re sorry</t>')

Error: XML parsing: line 1, character 38, unable to switch the encoding

SQL (varchar, UTF-16)

SELECT CONVERT(XML,'<?xml version="1.0" encoding="UTF-16"?><t>We’re sorry</t>')

Error:

XML parsing: line 1, character 39, unable to switch the encoding

SQL (nvarchar, UTF-16)

SELECT CONVERT(XML,N'<?xml version="1.0" encoding="UTF-16"?><t>We’re sorry</t>')

Worked!