Say the path of your URL is:
/thisisa"quote/helloworld/
Then how do you create the rel=canonical URL?
Is this kosher?
<link rel="canonical" href="/thisisa&quot;/helloworld/" />
UPDATE
To clarify, I'm getting a form submission, I need to convert part of the query string into the URL. So the steps are:
So I need to know which processing has to be done at each step of the way. As a first cut, this is my take:
htmlspecialchars($rawQuery)
htmlspecialchars($rawQuery)
htmlspecialchars($rawQuery)
urlencode($rawQuery)
since it's coming from the URL, wouldn't it already be URL-encoded?
htmlspecialchars($rawQuery)
You have to split your question into two:
Yes, the quotation mark character (U+0022) is not allowed in a URL in plain form and must be percent-encoded as %22.
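To see the percent-encoding in practice, here is a minimal sketch comparing PHP's two built-in encoders. The sample string and variable names are made up for illustration. Both functions turn the double quote into %22; they differ in how they treat spaces, which is why rawurlencode is usually the better fit for path segments.

```php
<?php
// Hypothetical sample segment containing a space and a double quote.
$segment = 'this is a"quote';

// rawurlencode() follows RFC 3986: space becomes %20, " becomes %22.
$forPath = rawurlencode($segment);

// urlencode() uses form encoding: space becomes +, " still becomes %22.
$forQuery = urlencode($segment);

echo $forPath, "\n";  // this%20is%20a%22quote
echo $forQuery, "\n"; // this+is+a%22quote
```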
It depends on how you declare the attribute value:
By default, SGML requires that all attribute values be delimited using either double quotation marks (ASCII decimal 34) or single quotation marks (ASCII decimal 39). Single quote marks can be included within the attribute value when the value is delimited by double quote marks, and vice versa. Authors may also use numeric character references to represent double quotes (&#34;) and single quotes (&#39;). For double quotes authors can also use the character entity reference &quot;.
If you use double quotes (attr="value"), then you must encode the double quotation mark character inside the attribute value with a character reference (&#34;, &#x22; or &quot;).
If you use single quotes (attr='value'), then you don't need to encode the double quotation mark character, but it is recommended to do so.
And since you have a slash and a double quotation mark in your attribute value, the third case (using no quotes at all) is not applicable:
In certain cases, authors may specify the value of an attribute without any quotation marks. The attribute value may only contain letters (a-z and A-Z), digits (0-9), hyphens (ASCII decimal 45), periods (ASCII decimal 46), underscores (ASCII decimal 95), and colons (ASCII decimal 58). We recommend using quotation marks even when it is possible to eliminate them.
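The quoting cases above map onto the flags of htmlspecialchars. The following is only a sketch with a made-up sample value; the flags are passed explicitly because the default changed in PHP 8.1.

```php
<?php
// Hypothetical attribute value containing both kinds of quotes.
$value = 'say "hi" & \'bye\'';

// ENT_COMPAT escapes double quotes only: enough for attr="value".
$forDoubleQuoted = htmlspecialchars($value, ENT_COMPAT, 'UTF-8');

// ENT_QUOTES also escapes single quotes: safe for attr='value' as well.
$forEitherQuote = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

echo $forDoubleQuoted, "\n"; // say &quot;hi&quot; &amp; 'bye'
echo $forEitherQuote, "\n";  // say &quot;hi&quot; &amp; &#039;bye&#039;
```

Using ENT_QUOTES everywhere is the simpler habit, since the output is then valid whichever delimiter the template uses.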
Since a double quotation mark must be encoded in a URL (but the single quotation mark need not be), you can encode the path segments of your URL path like this:
$path = '/thisisa"quote/helloworld/';
$path = implode('/', array_map('rawurlencode', explode('/', $path)));
And if you want to put that URL path in an HTML attribute, use the htmlspecialchars function to encode any remaining special HTML characters:
echo '<link rel="canonical" href="' . htmlspecialchars($path) . '" />';
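Putting both steps together, a minimal sketch might look like the following. The function name canonicalLinkTag is hypothetical, and the sketch assumes the input is the raw, not yet percent-encoded path (for example a value PHP has already decoded from the query string); encoding an already encoded path would double-encode it.

```php
<?php
// Hypothetical helper; the name is an assumption for illustration only.
function canonicalLinkTag(string $rawPath): string
{
    // Step 1: percent-encode each path segment separately so that the
    // "/" separators between segments are preserved.
    $encodedPath = implode('/', array_map('rawurlencode', explode('/', $rawPath)));

    // Step 2: escape for use inside an HTML attribute. Flags are passed
    // explicitly because the default changed in PHP 8.1.
    $href = htmlspecialchars($encodedPath, ENT_QUOTES, 'UTF-8');

    return '<link rel="canonical" href="' . $href . '" />';
}

echo canonicalLinkTag('/thisisa"quote/helloworld/'), "\n";
// <link rel="canonical" href="/thisisa%22quote/helloworld/" />
```

After rawurlencode, each segment contains only letters, digits, "-", "_", ".", "~" and "%", so htmlspecialchars has nothing left to change here; it is kept as a safety net in case the path is ever built differently.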