How does a stack overflow occur and what are the best ways to make sure it doesn't happen, or ways to prevent one, particularly on web servers, but other examples would be interesting as well?
A stack, in this context, is the last in, first out buffer you place data while your program runs. Last in, first out (LIFO) means that the last thing you put in is always the first thing you get back out - if you push 2 items on the stack, 'A' and then 'B', then the first thing you pop off the stack will be 'B', and the next thing is 'A'.
When you call a function in your code, the next instruction after the function call is stored on the stack, and any storage space that might be overwritten by the function call. The function you call might use up more stack for its own local variables. When it's done, it frees up the local variable stack space it used, then returns to the previous function.
A stack overflow is when you've used up more memory for the stack than your program was supposed to use. In embedded systems you might only have 256 bytes for the stack, and if each function takes up 32 bytes then you can only have function calls 8 deep - function 1 calls function 2 who calls function 3 who calls function 4 .... who calls function 8 who calls function 9, but function 9 overwrites memory outside the stack. This might overwrite memory, code, etc.
Many programmers make this mistake by calling function A that then calls function B, that then calls function C, that then calls function A. It might work most of the time, but just once the wrong input will cause it to go in that circle forever until the computer recognizes that the stack is overblown.
Recursive functions are also a cause for this, but if you're writing recursively (ie, your function calls itself) then you need to be aware of this and use static/global variables to prevent infinite recursion.
Generally, the OS and the programming language you're using manage the stack, and it's out of your hands. You should look at your call graph (a tree structure that shows from your main what each function calls) to see how deep your function calls go, and to detect cycles and recursion that are not intended. Intentional cycles and recursion need to be artificially checked to error out if they call each other too many times.
Beyond good programming practices, static and dynamic testing, there's not much you can do on these high level systems.
In the embedded world, especially in high reliability code (automotive, aircraft, space) you do extensive code reviews and checking, but you also do the following:
But in high level languages run on operating systems:
It depends on the 'sandbox' you have whether you can control or even see the stack. Chances are good you can treat web servers as you would any other high level language and operating system - it's largely out of your hands, but check the language and server stack you're using. It is possible to blow the stack on your SQL server, for instance.
-Adam