I've run into a bit of a problem with a Regex I'm using for humans names.
$rexName = '/^[a-z' -]$/i';
Suppose a user with the name Jürgen wishes to register? Or Böb? That's pretty commonplace in Europe. Is there a special notation for this?
EDIT:, just threw the Jürgen name against a regex creator, and it splits the word up at the ü letter...
http://www.txt2re.com/index.php3?s=J%FCrgen+Blalock&submit=Show+Matches
EDIT2: Allright, since checking for such specific things is hard, why not use a regex that simply checks for illegal characters?
$rexSafety = "/^[^<,\"@/{}()*$%?=>:|;#]*$/i";
(now which ones of these can actually be used in any hacking attempt?)
For instance. This allows ' and - signs, yet you need a ; to make it work in SQL, and those will be stopped.Any other characters that are commonly used for HTML injection of SQL attacks that I'm missing?
I would really say : don't try to validate names : one day or another, your code will meet a name that it thinks is "wrong"... And how do you think one would react when an application tells him "your name is not valid" ?
Depending on what you really want to achieve, you might consider using some kind of blacklist / filters, to exclude the "not-names" you thought about : it will maybe let some "bad-names" pass, but, at least, it shouldn't prevent any existing name from accessing your application.
Here are a few examples of rules that come to mind :
"~{()}@^$%?;:/*§£ø
and probably some othersYes, this is not perfect ; and yes, it will let some non-names pass... But it's probably way better for your application than saying someone "your name is wrong" (yes, I insist ^^ )
And, to answer a comment you left under one other answer :
I could just forbid the most command characters for SQL injection and XSS attacks,
About SQL Injection, you must escape your data before sending those to the database ; and, if you always escape those data (you should !), you don't have to care about what users may input or not : as it is escaped, always, there is no risk for you.
Same about XSS : as you always escape your data when ouputting it (you should !), there is no risk of injection ;-)
EDIT : if you just use that regex like that, it will not work quite well :
The following code :
$rexSafety = "/^[^<,\"@/{}()*$%?=>:|;#]*$/i";
if (preg_match($rexSafety, 'martin')) {
var_dump('bad name');
} else {
var_dump('ok');
}
Will get you at least a warning :
Warning: preg_match() [function.preg-match]: Unknown modifier '{'
You must escape at least some of those special chars ; I'll let you dig into PCRE Patterns for more informations (there is really a lot to know about PCRE / regex ; and I won't be able to explain it all)
If you actually want to check that none of those characters is inside a given piece of data, you might end up with something like that :
$rexSafety = "/[\^<,\"@\/\{\}\(\)\*\$%\?=>:\|;#]+/i";
if (preg_match($rexSafety, 'martin')) {
var_dump('bad name');
} else {
var_dump('ok');
}
(This is a quick and dirty proposition, which has to be refined!)
This one says "OK" (well, I definitly hope my own name is ok!)
And the same example with some specials chars, like this :
$rexSafety = "/[\^<,\"@\/\{\}\(\)\*\$%\?=>:\|;#]+/i";
if (preg_match($rexSafety, 'ma{rtin')) {
var_dump('bad name');
} else {
var_dump('ok');
}
Will say "bad name"
But please note I have not fully tested this, and it probably needs more work ! Do not use this on your site unless you tested it very carefully !
Also note that a single quote can be helpful when trying to do an SQL Injection... But it is probably a character that is legal in some names... So, just excluding some characters might no be enough ;-)