Preface:
I have a web page with a form containing a text field.
1) On submission, the text in the field is sent via Ajax to a PHP script (using the GET method).
2) The PHP script reads the text and passes it as an argument to a shell tool.
3) The shell tool, written in C, parses argv into an array of unichars (actually an NSString in my current implementation).
(4.. 5.. 6.. the tool then does its job and writes a result to stdout, which the PHP script serves back as the response to the web page...)
I'm looking for the correct / canonical / "Unicode" way to do each step so that the content is properly encoded and preserved and no security issues arise.
What I'm doing now:
1) (JavaScript) the text is retrieved from the form this way
theText = $('#theField').attr('value');
and sent to the server this way
httpReq.open('GET','myScript.php?theText=' + encodeURIComponent(theText),true);
2) (PHP) I get the text
$theText=(isset($_GET["theText"])?$_GET["theText"]:"");
I call the C tool
$cmd = "/usr/bin/thetool -theText ".escapeshellarg($theText);
echo shell_exec( $cmd );
3) (Objective-C) I'm on Mac OS X, so I take advantage of the NSString and NSUserDefaults classes (but a plain C solution would be fine for me as well, provided that I end up with an array of unichars)
int main(int argc, const char * argv[])
{
    NSUserDefaults *userDefaults = [NSUserDefaults standardUserDefaults];
    NSString *theText = [userDefaults stringForKey: @"theText"];
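For the plain C route mentioned in step 3, here is a minimal sketch of turning an argument into an array of UTF-16 code units (what unichar holds). It assumes the bytes in argv arrive as UTF-8, which is not guaranteed by anything in the setup above; the function name utf8_to_utf16 and its error convention are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a NUL-terminated string, ASSUMED to be UTF-8, into UTF-16 code units.
 * Returns the number of units written, or (size_t)-1 if the input is
 * malformed or `out` (capacity `cap` units) is too small. */
size_t utf8_to_utf16(const char *in, uint16_t *out, size_t cap)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    assert(in != NULL && out != NULL);

    while (*p) {
        uint32_t cp, min;
        int extra;  /* number of continuation bytes expected */

        if (*p < 0x80)                { cp = *p;        extra = 0; min = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; min = 0x80; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; min = 0x800; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; min = 0x10000; }
        else return (size_t)-1;       /* stray continuation or invalid lead byte */
        p++;

        for (; extra > 0; extra--, p++) {
            /* A NUL here also fails the test, so truncated input is rejected. */
            if ((*p & 0xC0) != 0x80) return (size_t)-1;
            cp = (cp << 6) | (*p & 0x3F);
        }

        /* Reject overlong forms, encoded surrogates and out-of-range values. */
        if (cp < min || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return (size_t)-1;

        if (cp < 0x10000) {
            if (n + 1 > cap) return (size_t)-1;
            out[n++] = (uint16_t)cp;
        } else {
            /* Supplementary code point: emit a surrogate pair. */
            if (n + 2 > cap) return (size_t)-1;
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 + (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 + (cp & 0x3FF));
        }
    }
    return n;
}
```

A caller would pass the relevant argv entry together with a buffer of at least strlen(arg) units, since a UTF-8 string never decodes to more UTF-16 units than it has bytes. Rejecting malformed input outright, rather than silently dropping bytes, makes it easier to notice when something upstream has altered the text.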
Question(s)
Is this the right way to do it?
Is escapeshellarg alone safe when invoking shell_exec?
Am I going to lose some characters along the way if the user types something peculiar?
While waiting for a competent reply, I've started running some empirical tests...
First I changed
echo shell_exec( $cmd );
to
echo $cmd;
to see what the command line invocation turned out to be for various texts entered in the form. It seems that escapeshellarg on the PHP side does a good job.
The text passed to the tool always seems to be properly enclosed in single quotes, with "dangerous" characters escaped. I found no way to tamper with the tool invocation.
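The quoting observed above (argument wrapped in single quotes, embedded single quotes rewritten) can be illustrated with a short C sketch. This is not PHP's implementation, only an illustration of the same POSIX shell quoting rule; the function name shell_quote and its return convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Wrap `arg` in single quotes for a POSIX shell, rewriting each embedded
 * single quote as '\'' (close quote, backslash-escaped quote, reopen quote).
 * Returns 0 on success, -1 if `out` (capacity `cap` bytes) is too small. */
int shell_quote(const char *arg, char *out, size_t cap)
{
    size_t n = 0;

    assert(arg != NULL && out != NULL);

    if (cap < 3) return -1;              /* need at least '' plus NUL */
    out[n++] = '\'';
    for (; *arg; arg++) {
        if (*arg == '\'') {
            if (n + 4 > cap) return -1;
            memcpy(out + n, "'\\''", 4); /* the four bytes '\'' */
            n += 4;
        } else {
            if (n + 1 > cap) return -1;
            out[n++] = *arg;
        }
    }
    if (n + 2 > cap) return -1;          /* closing quote + NUL */
    out[n++] = '\'';
    out[n] = '\0';
    return 0;
}
```

Inside single quotes a POSIX shell treats every character literally, so the only character that needs special handling is the single quote itself. That is why input such as `x; echo pwned` stays a single harmless argument.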
Then I checked whether any part of the text was getting lost along the way.
I set up the C tool this way and looked at the output
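#import <Foundation/Foundation.h>

int main(int argc, const char * argv[])
{
    // NSUserDefaults also exposes command-line arguments given as "-key value"
    NSUserDefaults *userDefaults = [NSUserDefaults standardUserDefaults];
    NSString *theText = [userDefaults stringForKey: @"theText"];
    NSUInteger i;
    unichar c;
    // Print each UTF-16 code unit of the received text as a decimal number
    for(i=0;i<[theText length];i++)
    {
        c = [theText characterAtIndex:i];
        printf("%d\n",c);
    }
    return 0;
}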
I tried various inputs and everything seems OK. As the last test I entered a "MUSICAL SYMBOL G CLEF" in the form
http://www.fileformat.info/info/unicode/char/1d11e/index.htm
It arrived correctly in the tool as a pair* of unichars
55348 56606
(* This is a very special character: its code point exceeds 65535, so it has to be represented as a pair of surrogate unichars. This is the most extreme edge case I found.)
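As a cross-check of the two numbers printed by the tool, a surrogate pair can be combined back into the code point it represents. The sketch below uses the standard UTF-16 arithmetic; the function name from_surrogates is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Combine a UTF-16 high/low surrogate pair into the code point it encodes. */
uint32_t from_surrogates(uint16_t hi, uint16_t lo)
{
    assert(hi >= 0xD800 && hi <= 0xDBFF);   /* high (leading) surrogate */
    assert(lo >= 0xDC00 && lo <= 0xDFFF);   /* low (trailing) surrogate */

    /* Each surrogate carries 10 bits; the pair is offset by 0x10000. */
    return 0x10000u + (((uint32_t)(hi - 0xD800)) << 10) + (uint32_t)(lo - 0xDC00);
}
```

With the values observed above, from_surrogates(55348, 56606) gives 0x1D11E, which matches the code point in the linked page's URL (1d11e). This suggests the character survived the whole chain intact.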
Anyway, as I said at the beginning, these are just empirical tests. I don't like to assume that sensitive code is good just because it passes a dozen tests. I'd be very happy to receive comments or suggestions (or warnings!).
I tested with Firefox on Mac OS X on the client side and MAMP on Mac OS X on the server side.