What character encoding should I use for a web page containing mostly Arabic text? Is utf-8 okay?

Paul D. Waite · Jun 8, 2010 · Viewed 38.8k times

What character encoding should I use for a web page containing mostly Arabic text?

Is utf-8 okay?

Answer

JoeG · Jun 8, 2010

UTF-8 can store the full Unicode range, so it's fine to use for Arabic.


However, if you were wondering what encoding would be most efficient:

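Nearly every Arabic character fits in a single UTF-16 code unit (2 bytes). In UTF-8, the letters of the main Arabic block (U+0600 to U+06FF) take 2 bytes each, while characters at U+0800 and above, such as the Arabic presentation forms, take 3 bytes. So if you were encoding nothing but Arabic, UTF-16 would be at least as space-efficient as UTF-8, and more so wherever 3-byte characters appear.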
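To make the byte counts concrete, here is a minimal Python sketch. The helper name `encoded_sizes` and the two sample characters are my own choices for illustration, not part of the original answer. It uses `utf-16-le` so that Python does not add a byte order mark to the count.

```python
import unicodedata

def encoded_sizes(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for a string."""
    # "utf-16-le" avoids the 2-byte byte order mark that "utf-16" would prepend.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

# One letter from the main Arabic block and one from the presentation forms.
samples = ["\u0628", "\ufe8f"]

for ch in samples:
    utf8_len, utf16_len = encoded_sizes(ch)
    name = unicodedata.name(ch)
    print(f"U+{ord(ch):04X} {name}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

The main-block letter costs the same in both encodings; only the presentation form is larger in UTF-8.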

However, you're not just encoding Arabic. An HTML page also contains a significant number of characters that take a single byte in UTF-8 but two bytes in UTF-16: the markup characters (<, &, >, =), all the HTML element and attribute names, and the spaces and line breaks between them.
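A rough way to see the effect of markup is to encode a small page both ways and compare sizes. The page below is a made-up example (the text, class name, and variable names are assumptions for illustration), and real pages will have a different ratio of markup to text.

```python
# A tiny hypothetical page: ASCII markup wrapped around Arabic text.
# The UTF-16 encoding is computed purely for size comparison; only the
# UTF-8 bytes would match the charset declared in the <meta> tag.
page = (
    '<!DOCTYPE html>\n'
    '<html lang="ar" dir="rtl">\n'
    '<head><meta charset="utf-8"><title>مرحبا</title></head>\n'
    '<body><p class="greeting">مرحبا بالعالم</p></body>\n'
    '</html>\n'
)

utf8_size = len(page.encode("utf-8"))
utf16_size = len(page.encode("utf-16-le"))  # little-endian, no byte order mark
ascii_chars = sum(1 for ch in page if ord(ch) < 128)

print(f"characters: {len(page)} ({ascii_chars} of them ASCII)")
print(f"UTF-8:  {utf8_size} bytes")
print(f"UTF-16: {utf16_size} bytes")
```

Every ASCII character saves one byte in UTF-8 relative to UTF-16, so for a page like this, where the Arabic text is all from the main block, UTF-8 comes out smaller.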

It's a trade-off and, unless you're dealing with huge documents, it doesn't matter.