Our Website is in .NET but with some old ASP and 32bits libraries too in it. It had been working fine for a while (2 years). But for the past month, we have seen the following error on our IIS7 server, which we have been unable to track down and fix:
"Faulting application w3wp.exe, version 7.0.6001.18000, time stamp 0x47919413, faulting module kernel32.dll, version 6.0.6001.18215, time stamp 0x4995344f, exception code 0xe053534f, fault offset 0x0002f328, process id 0x%9, application start time 0x%10."
We are able to reproduce the error:
One of our .ASPX pages starts loading, executing code and queries (we have response.flush() all over the page to track where the code breaks), then it suddenly stops and we get the above error in IIS.
The page stops loading and, without the response.flush(), it's not redirecting to our error.aspx page (as configured in web.config)
The error does NOT happen all the time. Sometimes, it happens 3 times in a row, then it's working fine for 15 minutes non-stop with a proper redirection to error.aspx.
The error we get then is a classic: "Either BOF or EOF is True, or the current record has been deleted."
When the error occurs, the page hangs and all other session on the same computer from any browsers have hanging web pages as well (BTW, we only allow 1 worker process while we are testing). From other computers, the site loads fine.
I can recycle the Application Pool, kill w3wp.exe, restart IIS. Nothing will do. The only way to successfully load the page again is to Restart MS SQL which handles our Session States. I don't know why this is, but we guessed that the Session Cookies on the users browsers points to a thread which was not terminated properly (due to the above crash) and IIS is waiting for it to terminate to process more code (?). If someone can explain this better, that would be really helpful. Is there a timeout which we can set to "terminate" threads? Is it a MS SQL related issue?
I have also looked at the Private and Virtual Memory usages, because I think our code is not the most effective and I am certain we have remaining memory leaks. However, I saw the page crash even though both Private and Virtual Memories were still quite low (under 100MB each).
I have used Debug Diag and WinDbg as indicated here: http://blogs.msdn.com/b/tess/archive/2009/03/20/debugging-a-net-crash-with-rules-in-debug-diag.aspx, but we are not able to make windbg work, this is what we are trying to do at the moment.
If someone could help us or point us toward the right direction that would be really great, thank you.
"Either BOF or EOF is True, or the current record has been deleted" means the table is empty and you are attempting to do a MoveNext. So check for eof before you do any moves.
IIS is notorious for throwing kernel errors in w3wp.exe like this one. All your errors in session state are just symptoms of the crashed process. Multiple APP pools won't help much - they just spread the error around.
I''d wager it is SQL deadlocks due to your user environment changing. This will cause a 10-second lag as SQL tries to determine which query to kill off. One wins, one loses. The loser gets back a pointer to an unexpectedly empty table and you try a move and subsequent crash. You maybe could point your DB to an ODBC connection and turn on tracing, or figure out a way to get SQL to log it.
I had all the same symptoms as above in Perl. I was able to make a wrapper fn() to do all SQL queries and log all sql, + params and any errors to disk to track down the problem. It was deadlocks, then we were able to code in auto-retry, and eventually we recoded the query order and scanned columns to eliminate the deadlocks.