I'm filtering $_SERVER["REQUEST_URI"] as follows:
$_request_uri = filter_input(INPUT_SERVER, 'REQUEST_URI', FILTER_SANITIZE_URL);
As explained on php.net:
FILTER_SANITIZE_URL
Remove all characters except letters, digits and $-_.+!*'(),{}|\^~[]`<>#%";/?:@&=.
However, the browser sends the REQUEST_URI value URL-encoded, and since "%" and hex digits are all allowed characters, this filter_input() call passes the encoded bytes through without sanitizing them. Say the address is
and then the sanitized request URL is
/abc/index.php?q=abc%EF%BF%BD%EF%BF%BD123
But it should be
/abc/index.php?q=abc123
It is possible to urldecode() $_SERVER["REQUEST_URI"] first and then pass the result through filter_var() to get a sanitized value:
$_request_uri = filter_var(urldecode($_SERVER['REQUEST_URI']), FILTER_SANITIZE_URL);
I don't know why, but the urldecode() version seems "inelegant" to me, and I'm looking for an elegant way to sanitize $_SERVER["REQUEST_URI"].
Maybe it is accessing a superglobal array directly ($_SERVER['REQUEST_URI']) in my code that bothers me.
Is there an elegant way?
I think you could use either mod_rewrite or Apache's SetEnv directive to decode the URL on the server side. This would change REQUEST_URI in Apache and consequently the value of $_SERVER["REQUEST_URI"] in PHP.
I don't like this solution, and you likely don't want to do this. The issues I see:
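A good solution that avoids the superglobal is to call filter_input or filter_input_array on INPUT_GET (instead of INPUT_SERVER). PHP has already URL-decoded the GET values by the time the filter runs, so the filter sees the decoded characters: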
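// filter_input_array() returns null when no GET parameters are present, so fall back to an empty array.
$urlParameters = http_build_query(
    filter_input_array(
        INPUT_GET,
        FILTER_SANITIZE_URL
    ) ?: array()
);
// SCRIPT_URL is typically only available when Apache's mod_rewrite is enabled.
$_request_uri = filter_input(INPUT_SERVER, 'SCRIPT_URL', FILTER_SANITIZE_URL)
    . ($urlParameters ? "?{$urlParameters}" : "");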
print_r($_request_uri);
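To try the same idea outside a web request, here is a minimal sketch that works on a raw URI string. It is not the code above: it swaps in parse_str() and filter_var_array() for filter_input_array(), and the function name rebuild_sanitized_uri is hypothetical. It assumes flat (non-array) query parameters.

```php
<?php
// Hypothetical helper: rebuilds a sanitized URI from a raw string
// instead of reading the live request.
function rebuild_sanitized_uri(string $rawUri): string
{
    // Split the raw URI into path and query string (either may be absent).
    $path  = (string) parse_url($rawUri, PHP_URL_PATH);
    $query = (string) parse_url($rawUri, PHP_URL_QUERY);

    // parse_str() URL-decodes the values, much as PHP does when filling $_GET.
    parse_str($query, $params);

    // Sanitize the decoded values; fall back to an empty array on failure.
    $clean = filter_var_array($params, FILTER_SANITIZE_URL) ?: array();

    $path        = (string) filter_var($path, FILTER_SANITIZE_URL);
    $queryString = http_build_query($clean);

    return $path . ($queryString !== '' ? "?{$queryString}" : '');
}

echo rebuild_sanitized_uri('/abc/index.php?q=abc%EF%BF%BD%EF%BF%BD123'), "\n";
// prints /abc/index.php?q=abc123
```

Because the filter runs on the decoded bytes, the non-ASCII characters are removed before http_build_query() encodes the query string again.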
A better solution would be to whitelist specific parameters, validate each one with its own rule, and use those parameters directly (avoiding building and parsing $_request_uri):
$_request_parameters = filter_input_array(
INPUT_GET,
array(
'q' => FILTER_SANITIZE_URL,
)
);
print_r($_request_parameters['q']);
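As a rough sketch of "specific rules" per parameter, the definition array can mix filters. The example uses filter_var_array() on a stand-in array so it runs from the command line; with a real request the same definition could be passed to filter_input_array(INPUT_GET, ...). The 'page' and 'debug' parameters are hypothetical and only there for illustration.

```php
<?php
// Stand-in for $_GET; 'page' and 'debug' are hypothetical parameters.
$input = array(
    'q'     => "abc\xEF\xBF\xBD\xEF\xBF\xBD123", // already URL-decoded, as in $_GET
    'page'  => '2',
    'debug' => '1',                              // not whitelisted below
);

// Whitelist: only these keys survive, each with its own rule.
$rules = array(
    'q'    => FILTER_SANITIZE_URL,
    'page' => array(
        'filter'  => FILTER_VALIDATE_INT,
        'options' => array('min_range' => 1),
    ),
);

$params = filter_var_array($input, $rules);

var_dump($params);
// 'q' becomes "abc123", 'page' becomes int 2, and 'debug' is absent
```

Keys missing from the input come back as null, and a value that fails FILTER_VALIDATE_INT comes back as false, so both cases should be checked before use.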