Languages supported by "latin" vs "latin-extended" glyphs in fonts on Google Web Fonts?

its_me · Jan 13, 2013

[Screenshot: Google Web Fonts "Select character sets" option]

Some fonts on Google Web Fonts support multiple "character sets". The problem is that if the web font I use only serves the "latin" glyphs, users who translate the page into a language whose glyphs aren't supported will clearly notice the messed-up text.

I'd like my web fonts to support the most popular languages in the world aside from English, for example, Spanish, German, French, etc.

For this purpose, I'd like to know exactly which languages the "latin" and "latin-extended" character sets each cater to.

I expect the answer to look like:

Latin Character Set & Supported Languages:

- ..........
- ..........
- ..........

Latin-Extended Character Set & Supported Languages:

- ..........
- ..........
- ..........

I couldn't find this info in Google Web Fonts documentation, or by Googling.

Answer

Jan Turoň · Feb 15, 2013

Latin

Known in Unicode as the Latin-1 Supplement block (U+0080 to U+00FF; its printable characters start at U+00A0, because U+0080 to U+009F are control codes), it is meant primarily to support Western European languages (as you mentioned French, German and Spanish, but also Portuguese, Italian, Irish, Icelandic, the Scandinavian languages and, unintentionally, other languages from the list below). English is covered by standard ASCII. ASCII (the first 128 code points, U+0000 to U+007F, of which 95 are printable characters from U+0020 to U+007E) was placed as the very first Unicode block, named Basic Latin. This block is considered part of "Latin" and is usually supported even by non-Latin fonts, so that the font name displays correctly on Latin-based systems.
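The counts above can be double-checked with a minimal Python sketch using only the standard library: it counts the Basic Latin code points, keeps the printable ones, and confirms that the first 32 code points of Latin-1 Supplement are control codes rather than letters.

```python
import unicodedata

# Basic Latin (= ASCII): U+0000..U+007F, i.e. 128 code points.
basic_latin = [chr(cp) for cp in range(0x00, 0x80)]

# str.isprintable() counts U+0020 SPACE as printable and every control code as not,
# which leaves exactly the graphic range U+0020..U+007E.
ascii_printable = [c for c in basic_latin if c.isprintable()]

# In Latin-1 Supplement (U+0080..U+00FF), U+0080..U+009F are C1 control codes
# (general category "Cc"); the visible characters start at U+00A0.
c1_controls = [cp for cp in range(0x80, 0xA0) if unicodedata.category(chr(cp)) == "Cc"]

print(len(basic_latin), len(ascii_printable), len(c1_controls))  # 128 95 32
print(repr(ascii_printable[0]), repr(ascii_printable[-1]))       # ' ' '~'
```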

Latin Extended

Latin Extended on Google Fonts in practice means the Latin Extended-A block (U+0100 to U+017F), which (combined with "Latin") should cover all common Latin-script texts. Most languages that use this block also use characters from "Latin", so "Latin-Extended" fonts usually contain a superset of the "Latin" characters, but this is not guaranteed.

Unicode also has a Latin Extended-B block (U+0180 to U+024F), which some national alphabets need for characters such as Ə, Ș, Ț (these are often replaced with Ä from Latin-1 Supplement and Ş, Ţ from Extended-A) and the Vietnamese Ơ, Ư (but Vietnamese has its own category on Google Fonts).
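As a minimal sketch of the substitution described above (the mapping table is my own illustration, not something Google Fonts applies), Python's `str.translate` can swap these Extended-B letters for their Extended-A or Latin-1 look-alikes when a font lacks them. `EXT_B_FALLBACK` and `to_ext_a` are hypothetical names.

```python
# Hypothetical fallback table: Extended-B letters -> look-alikes from
# Latin Extended-A or Latin-1 Supplement, as described in the answer.
EXT_B_FALLBACK = str.maketrans({
    "\u0218": "\u015E",  # S with comma below (Ext-B) -> S with cedilla (Ext-A)
    "\u0219": "\u015F",  # s with comma below -> s with cedilla
    "\u021A": "\u0162",  # T with comma below (Ext-B) -> T with cedilla (Ext-A)
    "\u021B": "\u0163",  # t with comma below -> t with cedilla
    "\u018F": "\u00C4",  # capital schwa (Ext-B) -> A with diaeresis (Latin-1)
    "\u0259": "\u00E4",  # small schwa (IPA Extensions) -> a with diaeresis
})


def to_ext_a(text: str) -> str:
    """Replace the listed letters; every other character is left unchanged."""
    return text.translate(EXT_B_FALLBACK)


print(to_ext_a("\u0218tiin\u021b\u0103"))  # Romanian "science", comma below -> cedilla
print(to_ext_a("Az\u0259rbaycan"))         # Azeri, schwa -> a with diaeresis
```

Texts converted this way should be kept only for display; the comma-below letters remain the correct Romanian spelling.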

African Latin-script languages rely on the Unicode Latin Extended-B and Latin Extended Additional blocks (and the lowercase forms of letters such as Ɛ, Ɔ, Ɣ actually live in the IPA Extensions block), but these are mostly not covered by Google's Latin Extended category. There are even more exotic Latin Extended-C, -D and -E blocks (252 characters total), but I haven't seen them in real life, so I guess Google doesn't count them in its Latin Extended category either.
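To see which block a given letter lives in, here is a small lookup over the blocks mentioned in this answer. The helper `block_of` is hypothetical; the ranges are the standard Unicode block boundaries.

```python
# (first code point, last code point, Unicode block name)
LATIN_BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
    (0x0250, 0x02AF, "IPA Extensions"),
    (0x0300, 0x036F, "Combining Diacritical Marks"),
    (0x1E00, 0x1EFF, "Latin Extended Additional"),
    (0x2C60, 0x2C7F, "Latin Extended-C"),
    (0xA720, 0xA7FF, "Latin Extended-D"),
    (0xAB30, 0xAB6F, "Latin Extended-E"),
]


def block_of(ch: str) -> str:
    """Return the block name for a single character, or "other" if none matches."""
    cp = ord(ch)
    for first, last, name in LATIN_BLOCKS:
        if first <= cp <= last:
            return name
    return "other"


# Capital open E is in Extended-B, its lowercase form in IPA Extensions,
# E with dot below (Yoruba, Pan-Nigerian) in Latin Extended Additional.
for ch in ["\u0190", "\u025B", "\u1EB8", "\u0141", "\u20AC"]:
    print(f"U+{ord(ch):04X} {ch} -> {block_of(ch)}")
```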

From my observation, Google places a font into the Latin Extended category if it contains some, but not necessarily all, characters from the Latin Extended-A block. Web fonts need to be small so they don't slow down page loading (the WOFF/WOFF2 formats are preferred), and the more characters a font contains, the bigger it gets (fonts covering the whole BMP can grow beyond 10 MB). Authors often describe the purpose of their font, so only they can explain the logic behind its character support. For example, the Lato Google font supports only the Polish characters from the Latin Extended-A block (its author is Polish), yet it is in Google's "Latin Extended" category. To find out whether a font supports a specific language, try displaying that language's characters from the list below.
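Displaying the characters works, but the check can also be automated. The sketch below assumes you already have the set of code points a font covers; one way to get it (an assumption about your tooling, not something Google Fonts provides) is the fontTools package, e.g. `set(TTFont(path).getBestCmap())`. The `REQUIRED` table, `missing_letters` and the "Polish-only" font are hypothetical examples built from the lists below.

```python
# Letters copied from the language lists in this answer (uppercase; lowercase is derived).
REQUIRED = {
    "Polish": "ĄĆĘŁŃÓŚŹŻ",
    "Czech": "ÁČĎĚÉÍŇÓŘŠŤÚŮÝŽ",
}


def missing_letters(language: str, supported: set[int]) -> list[str]:
    """Return the required letters (both cases) whose code points the font lacks."""
    letters = REQUIRED[language]
    needed = set(letters) | set(letters.lower())
    return sorted(c for c in needed if ord(c) not in supported)


# Hypothetical font: Basic Latin + Latin-1 Supplement + only the Polish Ext-A letters.
polish_only = set(range(0x0000, 0x0100)) | {ord(c) for c in "ĄąĆćĘęŁłŃńŚśŹźŻż"}

print(missing_letters("Polish", polish_only))       # []
print(len(missing_letters("Czech", polish_only)))   # 18
```

A font like this would still be listed as "Latin Extended", which is exactly why checking the letters of each target language is safer than trusting the category.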

Language support

In the list of Latin-script alphabets below (checked against Omniglot and other sources), I do not include:

  • digraph characters from Latin Extended that are commonly written as two separate characters (Æ is in Latin-1 Supplement anyway, and ß used to be a digraph)
  • non-Latin alphabets, since the question is about Latin vs. Latin-Extended. Some languages use two writing systems: I leave these out where Latin is rare (like Abkhaz), unless an official step toward Latin has been made (like Kazakh)
  • minority and dead languages (Adyghe, Archi, old Baltic languages, Bislama, Chamorro, Chuvash, Cypriot, Dalecarlian, Extremaduran, Fala, Elfdalian, Faroese, Gilbertese, Glosa, Haida and Eskimo-Aleut languages, Ikizu, Iñupiaq, Latgalian, Istriot, Livonian, Ladin, Kashubian, Marshallese, Mirandese, Montenegrin, Old Norse, Nuxalk, Occitan, Romansh, Rotokas, Sami languages, Samoan, Upper and Lower Sorbian, Tahitian, Tawlu, Tetum, Tongan, Ulithian, Yapese, Zuni, native Indian latin alphabets)
  • historic characters unused in the latest versions of alphabets (like double grave accents, ſ, ĸ)
  • transliteration characters almost exclusive to linguists, namely Pinyin, IPA, UPA

Please comment if something important is missing or if some minority language is used in electronic communication.

ASCII (Basic Latin, often supported even in non-latin fonts)

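Classical Latin, Afrikaans, Asturian, Corsican, Dutch, Greenlandic, Gaelic, Haitian Creole, Malay, Shona, Sicilian, Swahili (although several of these, such as Afrikaans, Asturian, Dutch, Scottish Gaelic and Haitian Creole, also use some letters from Latin-1 Supplement, e.g. ë, ê, ñ, à, è, ò).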

English is also supported, with the handy '¢' (American) and '£' (British) added from Latin-1 Supplement. Other currency symbols such as '€' came much later and live in the Currency Symbols block starting at U+20A0 (the euro sign was added in Unicode 2.1, in 1998).
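Because Latin-1 Supplement mirrors the upper half of ISO-8859-1, a quick way to see which of these symbols belong to "Latin" is to try encoding them as Latin-1. A small stdlib-only sketch (`fits_latin1` is just an illustrative name):

```python
def fits_latin1(ch: str) -> bool:
    """True if the character is in Basic Latin or Latin-1 Supplement (U+0000..U+00FF)."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


for symbol in ["$", "\u00a2", "\u00a3", "\u20ac"]:  # $, cent, pound, euro
    print(f"U+{ord(symbol):04X} {symbol} fits Latin-1: {fits_latin1(symbol)}")
```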

Latin

  • Albanian Ç, Ë (Ç is not in Arbëresh dialect)
  • Catalan À, É, È, Í, Ï, Ŀ, Ó, Ò, Ú, Ü, Ç (Ŀ from Ext-A can be written as L with interpunct · character)
  • Danish, Norwegian Æ, Å, Ø
  • Finnish Å, Ä, Ö, Š, Ž (Š, Ž from Ext-A are rarely used and can be replaced with S, Z)
  • Filipino Á, À, Â, É, È, Ê, Ë, Í, Ì, Î, Ñ, Ó, Ò, Ô, Ú, Ù, Û
  • French Æ, Œ, Â, À, É, È, Ê, Ë, Ç, Î, Ï, Ô, Ù, Û, Ü, Ÿ, », « (Œ from Ext-A is less common and used on signposts, while people usually type oe in messages instead; the rare Ÿ from Ext-A appears only in French names; the rest, including ÿ, is in Latin-1 Supplement; see the story behind this [fr] and the note on Wikipedia [en])
  • German Ä, Ö, Ü, ß
  • Icelandic Æ, Á, É, Í, Ó, Ö, Ú, Ý, Þ, Ð
  • Irish Á, É, Í, Ó, Ú
  • Italian À, È, É, Ì, Ò, Ù, ª, º (the last two are sometimes underlined; in English, º is also popular in the numero sign Nº)
  • Khasi Ñ, Ï
  • Piedmontese Ë, Ò
  • Portuguese Á, Â, Ã, À, Ç, É, Ê, Í, Ó, Ô, Õ, Ú, ª, º
  • Sardinian Ç
  • Spanish and Galician Á, É, Í, Ó, Ú, Ñ, ¿, ¡, ª, º (plus Ü in Spanish)
  • Swedish Å, Ä, Ö
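The Ext-A exceptions noted in this list can be double-checked mechanically: anything above U+00FF is outside "Latin". A small sketch over a few of the entries above (`LATIN_LIST` and `beyond_latin1` are illustrative names):

```python
# A few entries from the "Latin" list above.
LATIN_LIST = {
    "German": "ÄÖÜß",
    "Catalan": "ÀÉÈÍÏĿÓÒÚÜÇ",
    "French": "ÆŒÂÀÉÈÊËÇÎÏÔÙÛÜŸ",
    "Finnish": "ÅÄÖŠŽ",
}


def beyond_latin1(letters: str) -> str:
    """Keep only the letters above U+00FF, i.e. the ones a "latin"-only font may lack."""
    return "".join(c for c in letters if ord(c) > 0xFF)


for language, letters in LATIN_LIST.items():
    print(language, beyond_latin1(letters) or "(all in Latin-1)")
```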

Latin Extended

  • Azeri Ç, Ğ, I (dotless lowercase), İ, Ö, Ş, Ü, Ə (Ə from Ext-B is replaceable by Ä, in which case the letters are those of Turkish plus Ä)
  • Crimean Tatar Ç, Ǧ, I (dotless lowercase), İ, Ñ, Ö, Ş, Ü (Ǧ from Ext-B can be substituted with Ğ from Ext-A)
  • Serbian, Bosnian and Croatian Ć, Č, Đ, Š, Ž
  • Czech Á, Č, Ď, Ě, É, Í, Ň, Ó, Ř, Š, Ť, Ú, Ů, Ý, Ž
  • Estonian Ä, Ö, Õ, Ü, Š, Ž
  • Esperanto Ĉ, Ĝ, Ĥ, Ĵ, Ŝ, Ŭ
  • Friulian Â, Ê, Î, Ô, Û
  • Gagauz (Moldavia) Ä, Ç, Ê, I (dotless lowercase), İ, Ö, Ş, Ţ, Ü
  • Guaraní (Paraguay) Á, Í, Ó, Ã, Ẽ, G̃, Ĩ, Ñ, Õ, Ũ, Ỹ (Ĩ, Ũ from Ext-A; Ẽ, Ỹ from Ext-Additional; G̃ has no precomposed form in Unicode and can only be written with a combining diacritical mark); characters outside the Ext-A scope are often written with a circumflex instead (Ê, Ĝ, Î, Û, Ŷ)
  • Hawaiian Ā, Ē, Ī, Ō, Ū and the ʻokina ʻ (U+02BB, from the Spacing Modifier Letters block)
  • Hungarian Á, É, Í, Ó, Ö, Ő, Ú, Ü, Ű
  • Kazakh (planned move from Cyrillic in 2017-2025) Ä, Ç, Ğ, I (dotless lowercase), İ, Ŋ, Ö, Ş, Ü (revised multiple times; this is the 2019 version)
  • Kurdish Ç, Ê, Î, Ş, Û
  • Latvian Ā, Č, Ē, Ģ, Ķ, Ī, Ļ, Ņ, Ō, Ū, Ŗ, Š, Ž (Ō and Ŗ are no longer part of standard modern Latvian)
  • Lithuanian Ą, Č, Ę, Ė, Į, Š, Ų, Ū, Ž
  • Maltese À, È, Ì, Ò, Ù, Ċ, Ġ, Ħ, Ż
  • Maori Ā, Ē, Ī, Ō, Ū (minority, but more known and popular since 2015)
  • Polish Ą, Ć, Ę, Ł, Ń, Ó, Ś, Ź, Ż
  • Romani Č, Š, Ž (spoken, but rarely written language)
  • Romanian Ă, Â, Î, Ș, Ț (Ș, Ț from Latin Ext-B, can use Ş, Ţ from Ext-A)
  • Sami (Northern, minority language, but has an exclusive Ŧ in Ext-A) Á, Č, Đ, Ŋ, Š, Ŧ, Ž
  • Slovak Ä, Á, Č, Ď, É, Í, Ĺ, Ľ, Ň, Ó, Ô, Ú, Š, Ŕ, Ť, Ý, Ž
  • Slovene Č, Š, Ž
  • Tatar (since 2012) Ä, Ç, Ğ, İ, I (dotless lowercase), Ñ, Ö, Ş, Ü
  • Turkish Ç, Ğ, I (dotless lowercase), İ, Ö, Ş, Ü
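  • Vietnamese Ă, Â, Đ, Ê, Ô, Ơ, Ư (Ơ, Ư from Ext-B, plus five tone marks: grave, acute, tilde, hook above and dot below, i.e. combining U+0300, U+0301, U+0303, U+0309, U+0323, see combining diacritical marks below; the precomposed letters are in Latin Extended Additional; Vietnamese has a special category on Google Fonts)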
  • Welsh Â, Ê, Î, Ô, Û, Ŵ, Ŷ
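To see why Vietnamese needs more than Latin Extended-A, a few words can be decomposed with Unicode normalization (NFD): a single vowel may carry both a base diacritic and a tone mark. A minimal stdlib sketch (`vietnamese_marks` is an illustrative name):

```python
import unicodedata


def vietnamese_marks(word: str) -> list[str]:
    """Return the combining marks (as U+XXXX) that NFD splits off the word."""
    decomposed = unicodedata.normalize("NFD", word)
    return [f"U+{ord(c):04X}" for c in decomposed if unicodedata.combining(c)]


print(vietnamese_marks("ti\u1ebfng"))  # ['U+0302', 'U+0301']  e + circumflex + acute
print(vietnamese_marks("ph\u1edf"))    # ['U+031B', 'U+0309']  o + horn + hook above
print(vietnamese_marks("Vi\u1ec7t"))   # ['U+0323', 'U+0302']  e + dot below + circumflex
```

A font therefore either needs the precomposed letters from Latin Extended Additional or well-positioned combining marks that can stack.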

Latin Extended, African (mostly not supported by Latin-Extended fonts). Fonts with full support for African alphabets include Ubuntu, Fira Sans, EB Garamond, Tinos, News Cycle, Didact Gothic, M Plus, Sawarabi, Cousine, Caudex, Judson, Andika (and of course Noto, see below)

  • Bari (Congo) Ŋ, Ö
  • Bambara (Mali) Ɛ, Ɲ, Ɔ (All from Ext-B)
  • Berber (Tuareg) (Sahara) Ă, Ḍ, Ɣ, Ǝ, Š, Ž, Ḥ, Ḷ, Ṣ, Ṭ, Ẓ (Ɣ, Ǝ from Ext-B, chars with dot below from Ext-Additional)
  • Chichewa (Chewa) (Eastern Africa) Ŵ
  • Dagbani (Ghana) Ɛ, Ɣ, Ɔ, Ŋ, Ʒ (Ɛ, Ɣ, Ɔ from Ext-B)
  • Dinka (Sudan) Ä, Ë, Ɛ, Ɛ̈, Ɣ, Ï, Ŋ, Ö, Ɔ, Ɔ̈ (Ɛ, Ɣ, Ɔ from Ext-B; Ɛ̈, Ɔ̈ have no precomposed form in Unicode and need a combining diacritical mark)
  • Fula (Western Africa) Ɓ, Ɗ, Ƴ, Ŋ (Ŋ from Ext-A, rest from Ext-B)
  • Hausa (Chad) Ɓ, Ɗ, Ƴ, Ƙ, R̃ (R̃ has no precomposed form in Unicode and needs a combining diacritical mark; the rest are from Ext-B)
  • Igbo (Nigeria) Ị, Ọ, Ụ, Ṅ (Ext-Additional)
  • Malagasy (Madagascar) N̈ (no precomposed form in Unicode, only with a combining diacritical mark; can be substituted with Ñ from Latin)
  • Pan-Nigerian Ɓ, Ɗ, Ǝ, Ẹ, Ị, Ƙ, Ṣ, Ụ (Ɓ, Ɗ, Ǝ, Ƙ from Ext-B, Ẹ, Ị, Ṣ, Ụ from Ext-Additional)
  • Wolof (Senegal) À, É, Ë, Ñ, Ŋ, Ó
  • Yoruba (Western Africa) Ẹ, Ọ, Ṣ (Ext-Additional + combining tones Á, À, Ā)
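The letters marked above as having no precomposed form can be verified with NFC normalization, which merges a base letter and a combining mark into one code point only when Unicode defines such a character. A stdlib sketch (`nfc_length` and `SEQUENCES` are illustrative names):

```python
import unicodedata


def nfc_length(seq: str) -> int:
    """Number of code points left after NFC composition (1 = a precomposed letter exists)."""
    return len(unicodedata.normalize("NFC", seq))


SEQUENCES = {
    "E + tilde (Guarani)": "E\u0303",          # composes to U+1EBC
    "G + tilde (Guarani)": "G\u0303",
    "open E + diaeresis (Dinka)": "\u0190\u0308",
    "open O + diaeresis (Dinka)": "\u0186\u0308",
    "R + tilde (Hausa)": "R\u0303",
    "N + diaeresis (Malagasy)": "N\u0308",
}

for label, seq in SEQUENCES.items():
    n = nfc_length(seq)
    print(f"{label}: {'precomposed' if n == 1 else 'combining mark needed'}")
```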

Combining diacritical marks

Alternatively, the font may support the Combining Diacritical Marks block (U+0300 to U+036F). For example, Ř can be typed either as U+0158 (a precomposed character) or as R + U+030C. A program that supports Unicode should display both forms the same way and treat them as the same character, but if the program or font doesn't support that repertoire, the combining mark may end up slightly misplaced (like the too-low Ɛ̈ here on my system); see this very detailed Unicode Q&A on the topic.
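For example, with Python's standard `unicodedata` module, the two spellings of Ř compare unequal as raw strings but become identical after normalization. For letters that have a precomposed form, normalizing text to NFC before rendering means the font only needs the precomposed glyph.

```python
import unicodedata

precomposed = "\u0158"   # LATIN CAPITAL LETTER R WITH CARON (Latin Extended-A)
decomposed = "R\u030C"   # R + COMBINING CARON (Combining Diacritical Marks)

print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
print(unicodedata.name(precomposed))  # LATIN CAPITAL LETTER R WITH CARON
```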

Useful fonts with multilanguage support

You might want to customize some fonts (if their licence allows it) with the Font Squirrel service, or use them as fallbacks. Here are some free fonts with wide character support to start with:

  • I really like the nice-looking serif OpenType font Quivira (11k+ characters, 1.5 MB)
  • many computers have Arial Unicode installed (part of MS Office, 50k+ chars, 22 MB)
  • Google's Noto project contains all but the most recent Unicode characters in serif, sans-serif and UI fonts, nicely sorted by block support (1.1 GB)
  • as a last-resort fallback font, you may consider the ugly-looking Unifont (50k+ characters, but only 11 MB and friendly to embedded devices)