Track.getSimilar: An invalid XML character (Unicode: 0x3) was found in the element…

FRIDI Mourad · Apr 9, 2014

I use the last.fm API.

I have a list of songs (tracks) with their artists, and for each song I want to retrieve similar songs. The method Track.getSimilar(Artist, track, key) works perfectly, but when the artist or track name is in Arabic, I get the following exception:

    [Fatal Error] :2583:13: An invalid XML character (Unicode: 0x3) was found in the element content of the document.
    Exception in thread "main" de.umass.lastfm.CallException: org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x3) was found in the element content of the document.
        at de.umass.lastfm.Caller.call(Caller.java:268)
        at de.umass.lastfm.Caller.call(Caller.java:189)
        at de.umass.lastfm.Track.getSimilar(Track.java:369)

How can I solve this problem?

Thank you in advance

Answer

jasso · Apr 10, 2014

Unicode code point 0x3 is a control character (ETX, "end of text"). It is not a normal character in any script or writing system, so its presence is clearly an error, possibly in the database itself. It could be the result of a failed encoding conversion, a faulty character-to-byte conversion, or database write corruption.

XML 1.0 cannot contain control characters such as this one, not even as character references. Therefore your XML is not well-formed and cannot be processed with XML tools. Instead, you need to remove the erroneous character with string processing or a similar method before the document is parsed.

At the same time you can check for all other characters that are illegal in XML. XML 1.0 doesn't allow any code point from the Unicode surrogate blocks [0xD800 - 0xDFFF], the non-characters 0xFFFE and 0xFFFF, or characters below 0x20 (control characters) except 0x9 [tab], 0xA [LF] and 0xD [CR]. This is formally stated here: http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char
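
As a rough illustration of the string-processing approach, here is a minimal Java sketch that drops every code point outside the XML 1.0 character ranges listed above. The class name `XmlSanitizer` and its methods are hypothetical helpers written for this answer; they are not part of the last.fm Java bindings. The sketch also assumes you can get hold of the raw text before it reaches the XML parser, which may not be possible without changing how the response is fetched.

```java
// Hypothetical helper, not part of the last.fm bindings.
final class XmlSanitizer {
    private XmlSanitizer() {}

    // True if the code point is allowed by the XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of the input with all illegal code points removed.
    static String stripIllegalXmlChars(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            // codePointAt joins valid surrogate pairs into one code point;
            // an unpaired surrogate comes back as itself and is rejected.
            int cp = input.codePointAt(i);
            if (isLegalXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Calling `XmlSanitizer.stripIllegalXmlChars(text)` leaves Arabic letters untouched, since they fall inside the 0x20 - 0xD7FF range, and removes only characters such as 0x3. Iterating by code point rather than by `char` keeps valid supplementary characters intact while still discarding lone surrogates.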