What standard do language codes of the form "zh-Hans" belong to?

Anto · Sep 19, 2013

Through the REST API of an application, I receive language codes of the following form: ll-Xxxx.

  • a two-letter lowercase language code (looks like ISO 639-1),
  • a dash,
  • a code of up to four letters, starting with an uppercase letter (looks like an ISO 639-3 macrolanguage code).

Some examples:

az-Arab Azerbaijani in the Arabic script
az-Cyrl Azerbaijani in the Cyrillic script
az-Latn Azerbaijani in the Latin script

sr-Cyrl Serbian in the Cyrillic script
sr-Latn Serbian in the Latin script

uz-Cyrl Uzbek in the Cyrillic script
uz-Latn Uzbek in the Latin script

zh-Hans Chinese in the simplified script
zh-Hant Chinese in the traditional script

From what I found online:

[ISO 639-1] is the first part of the ISO 639 series of international standards for language codes. Part 1 covers the registration of two-letter codes.

and

ISO 639-3 is an international standard for language codes. In defining some of its language codes, some are defined as macrolanguages [...]

Now I need to write a piece of code to verify that I receive a valid language code.
But since what I receive is a mix of 639-1 (two-letter language) and 639-3 (macrolanguage), which standard am I supposed to stick with? Do these codes belong to some sort of mixed (perhaps common) standard?

Answer

Julien · Sep 21, 2013

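According to RFC 5646 (page 4), a language tag can be written in the form [language]-[script]. The language subtag is an ISO 639 code, and the script subtag is a four-letter ISO 15924 script code (Latn, Cyrl, Hans, ...), not an ISO 639-3 macrolanguage code.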
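Since the question asks for code to verify such a tag, here is a minimal sketch in Python. It assumes you only need to accept the two shapes discussed here: a bare language subtag, or a language subtag followed by a four-letter script subtag. The function name parse_language_script_tag is made up for illustration, and the check covers the shape of the tag only, not the full RFC 5646 grammar (regions, variants, extensions and so on are rejected).

```python
import re

# Shape check only: a 2-3 letter language subtag, optionally followed by a
# dash and a 4-letter script subtag. RFC 5646 tags are case-insensitive,
# so both upper and lower case are accepted on input.
_TAG_RE = re.compile(r"(?P<language>[A-Za-z]{2,3})(?:-(?P<script>[A-Za-z]{4}))?")


def parse_language_script_tag(tag):
    """Return (language, script) in conventional casing, or None if malformed.

    Hypothetical helper: it does not check that the subtags are registered.
    """
    match = _TAG_RE.fullmatch(tag)
    if match is None:
        return None
    # Conventional casing: language in lowercase, script in title case.
    language = match.group("language").lower()
    script = match.group("script")
    return language, (script.title() if script else None)


if __name__ == "__main__":
    for candidate in ("zh-Hans", "SR-latn", "uz", "zh_Hans", "zh-Han"):
        print(candidate, "->", parse_language_script_tag(candidate))
```

A well-formed tag is not necessarily a valid one: to confirm that "zh" and "Hans" are real subtags, you would still have to look them up in the IANA Language Subtag Registry, which this sketch does not do.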