What are the valid characters that can show up in a URL host?

Daniel Magliola · Jul 15, 2009 · Viewed 14k times

I'm writing some code that processes URLs, and I want to make sure I'm not leaving out some strange case...

Are there any valid characters for a host other than: A-Z, 0-9, "-" and "."?

(This includes anything that can be in subdomains, etc. Essentially, anything between :// and the first /)

Thanks!

Answer

Andrew Hare · Jul 15, 2009

Please see Restrictions on valid host names:

Hostnames are composed of a series of labels concatenated with dots, as are all domain names. For example, "en.wikipedia.org" is a hostname. Each label must be between 1 and 63 characters long, and the entire hostname has a maximum of 255 characters.

RFCs mandate that a hostname's labels may contain only the ASCII letters 'a' through 'z' (case-insensitive), the digits '0' through '9', and the hyphen. Hostname labels cannot begin or end with a hyphen. No other symbols, punctuation characters, or blank spaces are permitted.
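
To make the rules above concrete, here is a minimal Python sketch of a hostname check. The names `is_valid_hostname`, `LABEL_RE`, `MAX_LABEL_LENGTH` and `MAX_HOSTNAME_LENGTH` are made up for illustration, and the 255-character limit is simply the figure from the quoted text (stricter validators often use 253 for the dotted text form). Treat it as a starting point rather than a complete validator.

```python
import re

MAX_LABEL_LENGTH = 63
# Assumption: 255 is the figure from the quoted text above; some
# validators use 253 for the dotted text form instead.
MAX_HOSTNAME_LENGTH = 255

# One label: ASCII letters, digits and hyphens, 1 to 63 characters,
# and no hyphen in the first or last position.
LABEL_RE = re.compile(
    r"(?!-)[A-Za-z0-9-]{1,%d}(?<!-)" % MAX_LABEL_LENGTH
)


def is_valid_hostname(hostname: str) -> bool:
    """Return True if hostname follows the label rules quoted above."""
    if not hostname or len(hostname) > MAX_HOSTNAME_LENGTH:
        return False
    # Labels are separated by dots; an empty label (for example from
    # "a..b" or a trailing dot) fails the 1-character minimum.
    labels = hostname.split(".")
    return all(LABEL_RE.fullmatch(label) for label in labels)


if __name__ == "__main__":
    for candidate in ["en.wikipedia.org", "-bad.example.com", "under_score.example.com"]:
        print(candidate, is_valid_hostname(candidate))
```

Note that this only checks the hostname itself. The text between :// and the first / in a real URL can also carry a port or userinfo, or an IP address instead of a name, and this sketch would reject those.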