Is UTF-8 the encoding of choice for QR codes with non-ASCII characters by now?

Gonzo · Mar 14, 2012

Google uses UTF-8 as the default for their very popular encoder. From what I can see, they don't even add a byte order mark.

The problem is that most scanners still seem to use JIS8 (QR 2000) instead of ISO-8859-1 (QR 2005) as the default, so encoding with ISO-8859-1 mostly does not work.

It seems like UTF-8 is the only practical choice, even though it goes against the specification.

Edit: I will go with UTF-8 without ECI and without a BOM. It goes against both the letter and the spirit of the spec, but it works best at the moment.

Answer

Sean Owen · Mar 14, 2012

The specification says that ISO-8859-1 is the default for byte-mode encoding. In practice, however, you'll see a lot of Shift-JIS in Japan, or UTF-8.
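To see why the default matters, here is a minimal Python sketch of what happens when the encoder writes UTF-8 bytes but the reader assumes the spec default of ISO-8859-1. The sample string and the helper name `decode_as` are illustrative only and do not come from any QR library.

```python
def decode_as(payload: bytes, assumed_charset: str) -> str:
    """Decode byte-mode payload the way a scanner with a fixed default would."""
    return payload.decode(assumed_charset)


text = "café"                       # illustrative payload with one non-ASCII char
utf8_payload = text.encode("utf-8")  # bytes an encoder using UTF-8 would store

# A reader that sticks to the ISO-8859-1 default turns each byte into one char,
# so the two-byte UTF-8 sequence for "é" shows up as two wrong characters.
misread = decode_as(utf8_payload, "iso-8859-1")

# A reader that guesses (or is told) UTF-8 recovers the original text.
correct = decode_as(utf8_payload, "utf-8")

print(repr(misread))
print(repr(correct))
```

The bytes in the symbol are identical in both cases; only the reader's assumption differs, which is why the stream needs to say which charset it uses.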

UTF-8 is the right choice. To do it properly, you need to put some indication in the stream that it's UTF-8. The spec does allow for this. You need to precede the byte segment with an ECI segment that indicates UTF-8.
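As a rough sketch of what that looks like in the bit stream: the ECI segment is the mode indicator `0111` followed by the assignment number, which is 26 for UTF-8 and fits in a single byte. The byte segment then follows as usual with mode indicator `0100`, a character count, and the data bytes. The function name `utf8_segment_bits` is hypothetical, and the code deliberately stops before the terminator, padding, and error correction, so it is not a complete encoder.

```python
ECI_MODE = "0111"    # mode indicator for an ECI segment
BYTE_MODE = "0100"   # mode indicator for a byte-mode segment
ECI_UTF8 = 26        # ECI assignment number for UTF-8


def utf8_segment_bits(text: str, version: int = 1) -> str:
    """Return the ECI header plus byte segment for `text` as a bit string."""
    data = text.encode("utf-8")

    # Byte-mode character count is 8 bits for versions 1-9, 16 bits for 10-40.
    count_bits = 8 if version <= 9 else 16
    if len(data) >= (1 << count_bits):
        raise ValueError("payload too long for the character count field")

    bits = ECI_MODE
    # Assignment numbers 0-127 are written as one byte with a leading 0 bit.
    bits += format(ECI_UTF8, "08b")
    bits += BYTE_MODE
    bits += format(len(data), f"0{count_bits}b")
    bits += "".join(format(byte, "08b") for byte in data)
    return bits


print(utf8_segment_bits("é"))
```

Note that the count field holds the number of bytes, not the number of characters, so a single "é" is counted as 2.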

The zxing encoder will do that for you if you send it a hint that the encoding is UTF-8.