I am making a website with articles, and I need the articles to have "friendly" URLs, based on the title.
For example, if the title of my article is "Article Test", I would like the URL to be http://www.example.com/articles/article_test.
However, article titles (like any string) can contain special characters that cannot be placed literally in a URL. For instance, I know that ? or # need to be replaced, but I don't know all the others.
What characters are permissible in URLs? What is safe to keep?
To quote section 2.3 of RFC 3986:
"Characters that are allowed in a URI but do not have a reserved purpose are called unreserved. These include uppercase and lowercase letters, decimal digits, hyphen, period, underscore, and tilde."
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
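Note that RFC 3986 lists fewer unreserved punctuation marks than the older RFC 2396, which also treated !, *, ', ( and ) as unreserved. In RFC 3986 those five characters are reserved.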
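For the article-title case in the question, one conservative approach is to keep only ASCII letters and digits and replace everything else with a separator, which stays well inside the unreserved set. The following is a minimal sketch, not a definitive implementation: the function name `slugify` and the `sep` parameter are made up for illustration, and the choice to strip accents and lowercase the result is an assumption about what a "friendly" URL should look like.

```python
import re
import unicodedata


def slugify(title: str, sep: str = "_") -> str:
    """Turn an article title into a URL path segment (hypothetical helper)."""
    # Decompose accented letters ("é" becomes "e" plus a combining accent),
    # then drop every character that is not plain ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse each run of characters other than letters and digits
    # into a single separator.
    slug = re.sub(r"[^A-Za-z0-9]+", sep, ascii_text)
    # Trim separators left over at the ends and normalise case.
    return slug.strip(sep).lower()


print(slugify("Article Test"))   # article_test
print(slugify("What? Why #1!"))  # what_why_1
```

Because the output contains only letters, digits, and the separator, it needs no percent-encoding. Two different titles can map to the same slug, so a real site would probably also want a uniqueness check or a numeric ID in the URL.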