What are all of the allowable characters for people's names?

Paul W Homer · Jan 7, 2009 · Viewed 77k times

Beyond the standard A-Z and a-z characters, names can also contain hyphens, em dashes, quotes, etc.

Plus, there are all of the international characters, like umlauts, etc.

So, for an English-based system, what's the complete set? What about sets for other languages? And what about UTF-8, UTF-16, etc.?

Bonus question: How many name fields are needed, and what are their maximum lengths?
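Whatever maximum length is chosen, it is worth deciding which unit it counts. The same name has different lengths in Unicode code points, UTF-8 bytes and UTF-16 code units, and a decomposed "ë" (e followed by U+0308 COMBINING DIAERESIS) is one code point longer than the precomposed U+00EB. A small Python sketch; `name_lengths` is a hypothetical helper, and the samples are written as escapes so the exact code points are unambiguous.

```python
def name_lengths(name: str) -> dict:
    """Length of a name in three units a database column might limit."""
    return {
        "code_points": len(name),
        "utf8_bytes": len(name.encode("utf-8")),
        # "utf-16-le" avoids the 2-byte BOM that plain "utf-16" prepends.
        "utf16_units": len(name.encode("utf-16-le")) // 2,
    }


samples = {
    "Zoe (precomposed)": "Zo\u00eb",
    "Zoe (decomposed)": "Zoe\u0308",
    "CJK with U+20BB7": "\U00020bb7\u91ce",  # first character is outside the BMP
}
for label, name in samples.items():
    print(label, name_lengths(name))
```

For these samples the counts (code points / UTF-8 bytes / UTF-16 units) are 3/4/3, 4/5/4 and 2/7/3, so a limit of "N characters" means different things depending on how the storage layer measures it.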

EDIT: There are definitely two different types of characters in people's names: those that are there as part of the context, and those that are there for structural reasons. I don't want to limit or interfere with the context characters, but I do need to deal with the structural ones.

For example, I had a name come in that was separated by an em dash, but it was hard to distinguish from the minus character. To make searching easier, I want to take all five different types of dashes and map them onto one character (minus), so that the searcher doesn't need to know which symbol was originally entered.
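A minimal sketch of that dash mapping in Python, assuming the original string is stored unchanged and the folded form is used only as a search key. Rather than listing five dashes by hand, it folds every character in the Unicode "Pd" (dash punctuation) category, plus U+2212 MINUS SIGN, to the ASCII hyphen-minus that keyboards produce. The name `fold_dashes` is hypothetical, and "Pd" is broader than the five common dashes (it also covers, for example, the Hebrew maqaf and the fullwidth hyphen-minus), so the set is worth reviewing against real data.

```python
import unicodedata

# U+2212 MINUS SIGN is a math symbol (category "Sm"), not dash punctuation
# ("Pd"), so it is added explicitly.
EXTRA_DASHES = {"\u2212"}


def fold_dashes(text: str) -> str:
    """Return text with every dash-like character replaced by ASCII
    HYPHEN-MINUS (U+002D). Intended for search keys, not for storage."""
    return "".join(
        "-" if unicodedata.category(ch) == "Pd" or ch in EXTRA_DASHES else ch
        for ch in text
    )


stored = "Smith\u2014Jones"   # as entered, with an EM DASH
print(fold_dashes(stored))    # Smith-Jones
```

Applying the same function to the search term means a query typed with a plain minus matches a name that was entered with an em dash, en dash or any other "Pd" character.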

The problem exists for dashes, and probably for quotes as well, but how many other symbols are affected?
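For quotes, and for the open-ended "how many other symbols" part, one possible approach in Python combines an explicit table for apostrophe- and quote-like characters with NFKC normalization, which already folds many compatibility variants (fullwidth letters and punctuation, the "fi" ligature U+FB01, and so on) into their ordinary forms, and `str.casefold()` for case-insensitive matching. The table contents and the name `search_key` are assumptions for illustration, not a complete list. The table is applied before NFKC because NFKC decomposes U+00B4 ACUTE ACCENT into a space plus a combining accent.

```python
import unicodedata

# Hypothetical table of characters often typed in place of ASCII quotes
# in names such as O'Brien or William "Bill" Smith. Not exhaustive.
QUOTE_LIKE = {
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u02bc": "'",   # MODIFIER LETTER APOSTROPHE
    "\u0060": "'",   # GRAVE ACCENT
    "\u00b4": "'",   # ACUTE ACCENT
    "\u2032": "'",   # PRIME
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
}
_QUOTE_TABLE = str.maketrans(QUOTE_LIKE)


def search_key(name: str) -> str:
    """Build a comparison key for a name; the original is stored unchanged."""
    # Map quote-like characters first: NFKC would otherwise turn U+00B4
    # into a space followed by U+0301 COMBINING ACUTE ACCENT.
    key = name.translate(_QUOTE_TABLE)
    # NFKC folds compatibility variants such as fullwidth letters.
    key = unicodedata.normalize("NFKC", key)
    # casefold() is a stronger lower() designed for caseless matching.
    return key.casefold()


print(search_key("O\u2019Brien") == search_key("o'brien"))   # True
```

Mapping the grave and acute accents is a judgment call: in some data they are genuine accents rather than mistyped apostrophes, so the table should follow what actually shows up in the system.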

Answer

Joachim Sauer · Jan 7, 2009

There's a good article by the W3C called "Personal names around the world" that explains the problems (and possible solutions) pretty well. It was originally a two-part blog post by Richard Ishida: part 1 and part 2.

Personally, I'd say: support every printable Unicode character and, to be safe, provide just a single field "name" that contains the full, formatted name. This way you can store pretty much every form of name. You might need more structured storage, but then don't expect to be able to store every combination in a structured form, as there are simply too many different ones.
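One way to interpret "support every printable Unicode character" with a single free-form field, sketched in Python: normalize to NFC so that precomposed and decomposed spellings of the same name are stored identically, trim surrounding whitespace, and reject only control characters, surrogates, private-use and unassigned code points, and line/paragraph separators. Which categories count as "not printable" is an assumption here, and `clean_full_name` is a hypothetical name. Python's built-in `str.isprintable()` is not used because it also rejects format characters (category Cf) such as U+200C ZERO WIDTH NON-JOINER, which appear in correctly spelled text in some scripts, Persian for example.

```python
import unicodedata

# Categories rejected here (an assumption about what "printable" means):
#   Cc control, Cs surrogate, Co private use, Cn unassigned,
#   Zl line separator, Zp paragraph separator.
# Cf (format) is deliberately allowed so U+200C / U+200D survive.
# Note: which code points are "Cn" depends on Python's Unicode version.
REJECTED_CATEGORIES = {"Cc", "Cs", "Co", "Cn", "Zl", "Zp"}


def clean_full_name(raw: str) -> str:
    """Normalize and validate a free-form full name for a single 'name' field."""
    name = unicodedata.normalize("NFC", raw).strip()
    if not name:
        raise ValueError("name is empty")
    bad = [ch for ch in name if unicodedata.category(ch) in REJECTED_CATEGORIES]
    if bad:
        raise ValueError(f"name contains disallowed characters: {bad!r}")
    return name


print(clean_full_name("  Zoe\u0308 Saldan\u0303a "))  # Zoë Saldaña, now in NFC
```

Everything else (spaces, hyphens, apostrophes, any script) is accepted as typed, which matches the idea of not trying to force names into a structure.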