We have an application that accepts URLs from users. This data needs validation, and we're using ESAPI for this purpose. However, we're struggling with URLs containing ampersands.
The problem appears when ESAPI canonicalizes the data before validation. For example, &pid=123 in the URL turns into πd=123, because the leading &pi is decoded as the HTML entity for π. Since π is not whitelisted, the validation fails.
I've tried encoding it, but ESAPI is smarter than that and does canonicalization to avoid double encoding and mixed encoding. I'm a bit stumped here and I'm not sure how to proceed.
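The π substitution described above can be reproduced with a toy decoder. The following is a minimal sketch, not ESAPI's actual codec: the class name LenientEntityDemo, the four-entry entity table, and the example.com URL are all made up for illustration. It only assumes that the decoder accepts entity names without a trailing semicolon, which is what the observed behaviour suggests.

```java
import java.util.Map;

// Simplified illustration only; this is NOT ESAPI's real implementation.
class LenientEntityDemo {
    // Hypothetical, tiny entity table. Real decoders know hundreds of names.
    private static final Map<String, String> ENTITIES =
            Map.of("pi", "\u03C0", "amp", "&", "lt", "<", "gt", ">");

    // Decodes named entities, tolerating a missing trailing semicolon.
    static String decode(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '&') {
                // Look for the longest known entity name right after '&'.
                String match = null;
                for (String name : ENTITIES.keySet()) {
                    if (input.startsWith(name, i + 1)
                            && (match == null || name.length() > match.length())) {
                        match = name;
                    }
                }
                if (match != null) {
                    out.append(ENTITIES.get(match));
                    i += 1 + match.length();
                    // The semicolon is optional, so "&pid" still matches "&pi".
                    if (i < input.length() && input.charAt(i) == ';') {
                        i++;
                    }
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The "&pi" prefix of "&pid" is consumed as an entity.
        System.out.println(decode("http://example.com/page?id=1&pid=123"));
    }
}
```

Any query parameter whose name starts with an entity name (pi, lt, gt, amp and so on) would be affected in the same way by a decoder of this kind.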
I faced the same issue. In my case, for the string \fgdf\gghfh\fgh\dff, the canonicalize method produced the following:
Case 1: canonicalize(string) --> INTRUSION - Multiple (2x) encoding detected in \fgdf\gghfh\fgh\dff
Case 2: canonicalize(string, false) --> input=fgdfgghfhfghdff
In the second case, string validation failed because this ? character is not in the whitelist of allowed characters.
I finally managed to get it working. Below is the code:
value = ESAPI.encoder().encodeForURL(value);
value = value.replaceAll("", "");
isSafe = validator.isValidInput("APPNAME", value, "URLSTRING", 255, true, false);
The final argument, false, turns off the internal canonicalization, which is on by default.
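To see why encoding first helps, here is a small standalone sketch that uses java.net.URLEncoder from the JDK as a stand-in for ESAPI's encodeForURL. That the two produce comparable output is an assumption worth checking against your ESAPI version, and the class name PercentEncodeDemo is made up for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Stand-in demo using the JDK encoder; it does not call ESAPI.
class PercentEncodeDemo {
    // Percent-encodes the value so that no raw '&' or '=' remains.
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // '&' becomes %26 and '=' becomes %3D.
        System.out.println(encode("id=1&pid=123"));
    }
}
```

Once the ampersand is written as %26, the value no longer contains anything that looks like an HTML entity. With canonicalization switched off in isValidInput, the %26 is not decoded back to & before the whitelist check, so the whitelist pattern only has to allow the percent-encoded form.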
I hope this helps.