According to RFC 3986, the following characters are reserved and must be percent-encoded when they appear in a URI for anything other than their reserved purpose:
:/?#[]@!$&'()*+,;=
Furthermore, it specifies some characters that are explicitly unreserved: a-z A-Z 0-9 - . _ ~
It seems clear that one should generally encode reserved characters (to prevent misinterpretation) and leave unreserved characters unencoded (for readability), but how should characters that fall into neither category be handled? For example, { and } do not appear in either list, but they are standard ASCII characters.
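To make the "neither list" case concrete, here is a minimal sketch that sorts printable ASCII characters into the two RFC 3986 sets quoted above. The classify helper and the variable names are my own, not something defined by the RFC.

```python
import string

# Character sets copied from RFC 3986, sections 2.2 and 2.3.
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
RESERVED = GEN_DELIMS | SUB_DELIMS
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def classify(ch: str) -> str:
    """Return 'reserved', 'unreserved', or 'neither' for a single character."""
    if ch in RESERVED:
        return "reserved"
    if ch in UNRESERVED:
        return "unreserved"
    return "neither"


# Printable, non-space ASCII characters that appear in neither list.
neither = [c for c in map(chr, range(0x21, 0x7F)) if classify(c) == "neither"]
print("".join(neither))
```

Both { and } end up in the "neither" group, together with a handful of other punctuation characters.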
Looking to modern browsers for guidance, it seems they sometimes have different behaviors.
For example, consider pasting the URL https://www.google.com/search?q={ into the address bar of a web browser:
https://www.google.com/search?q=%7B
However, if one pastes https://www.google.com/#q={ (removing "search" and changing the ? to a #, making the character part of the fragment/hash rather than the query string), we find that:
https://www.google.com/#q=%7B (via JavaScript)
https://www.google.com/#q=%7B (before executing JavaScript)
Furthermore, when using JavaScript to perform the request asynchronously (i.e. using this MDN example modified to use a URL of ?q={), the URL is not percent-encoded automatically. (I'm guessing this is because the XMLHttpRequest API assumes that the URL has been encoded/escaped beforehand.)
I would like to (for a reason related to a bizarre customer requirement) use { and } in the filename portion of URLs without (1) breaking things and ideally also without (2) creating ugly-looking percent-encoded entries in the network panel of modern browsers' web inspectors/debuggers.
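As a rough illustration of goal (1), the sketch below percent-encodes a filename containing braces and checks that decoding returns the original name. The filename and the example.com URL are made up for the example, and this says nothing about how a browser's network panel will display the result.

```python
from urllib.parse import quote, unquote

# Hypothetical filename containing the characters in question.
filename = "report{final}.pdf"

# safe="" encodes everything except unreserved characters,
# so a stray "/" in a filename would be encoded too.
encoded = quote(filename, safe="")
url = "https://example.com/files/" + encoded
print(url)

# A server (or any decoder) recovers the original name from the last segment.
decoded = unquote(url.rsplit("/", 1)[1])
print(decoded)
```

The round trip is lossless, so the braces survive as long as whatever builds the URL encodes them and whatever receives it decodes the path segment.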
You should encode every character in the "unwise" set of RFC 2396; the RFC gives the reason.
Additional information from the RFC. Account for:
the delimiters < > # % " primarily
any control characters, 00-1F and 7F
the characters marked as unwise in the RFC: { } | \ ^ [ ] `
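One way to act on that list is sketched below: encode only the characters RFC 2396 section 2.4.3 excludes (control characters, space, delimiters, unwise) plus non-ASCII, and leave everything else alone. The encode_excluded name is my own, and the sketch assumes the input is raw text that has not already been percent-encoded, since it also encodes %.

```python
# Characters excluded from URIs by RFC 2396, section 2.4.3.
DELIMS = '<>#%"'
UNWISE = "{}|\\^[]`"
CONTROL = "".join(chr(c) for c in range(0x00, 0x20)) + "\x7f"
EXCLUDED = set(DELIMS + UNWISE + CONTROL + " ")


def encode_excluded(text: str) -> str:
    """Percent-encode excluded and non-ASCII characters; keep the rest as-is.

    Assumes `text` is not already percent-encoded (a literal % becomes %25).
    """
    out = []
    for ch in text:
        if ch in EXCLUDED or ord(ch) > 0x7F:
            # Encode each UTF-8 byte of the character as %XX.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


print(encode_excluded("a{b}|c.txt"))
print(encode_excluded("x y"))
```

Because reserved characters such as / and ? pass through untouched, this is only suitable for text that is already known to belong to a single URI component.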
If you intend to allow # in query-string values, that is a special case, because # marks the start of the fragment identifier of a URI.
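The effect of an unencoded # is easy to see with a URL parser. In this sketch the example.com URL and the value a#b are placeholders; urlsplit is the standard-library splitter.

```python
from urllib.parse import quote, urlsplit

value = "a#b"  # hypothetical query value containing a '#'
base = "https://example.com/search?q="

# Unencoded: everything after '#' is parsed as the fragment.
raw = urlsplit(base + value)
print(raw.query, raw.fragment)

# Encoded: the '#' stays inside the query value as %23.
enc = urlsplit(base + quote(value, safe=""))
print(enc.query, enc.fragment)
```

In the unencoded case the part after # never reaches the server as part of the query, which is why it has to be encoded as %23 when it is meant as data.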
Some characters that do not have to be encoded are accepted either encoded or not, such as ~
There are two generally accepted encodings for a space: %20 and + (the latter only in form-encoded query strings, not in the path).
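A short sketch of the two space encodings and of the "encoded or not" case for ~, using Python's urllib.parse. The sample strings are made up; the point is that + only means a space to decoders that apply form-encoding rules.

```python
from urllib.parse import parse_qs, quote, quote_plus, unquote, unquote_plus

# Two ways to encode a space.
print(quote("a b"))       # percent-encoding
print(quote_plus("a b"))  # form-style encoding

# '+' is only treated as a space by form-style decoders.
print(unquote("a+b"))       # the plus sign is left alone
print(unquote_plus("a+b"))  # the plus sign becomes a space

# Query-string parsing uses the form-style rules, so both spellings agree.
print(parse_qs("q=a+b") == parse_qs("q=a%20b"))

# '~' decodes to the same text whether or not it was percent-encoded.
print(unquote("%7Euser") == unquote("~user"))
```

For a filename in the path, %20 is the safer choice, because a plain path decoder leaves + as a literal plus sign.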
Here's a fiddle with some of the test cases I'm using.