My production server has developed a problem over the last couple of days. I am running Apache httpd 2.2.3 and Tomcat 5.5.20, connected with mod_jk v1.3, with a Spring MVC site hosted on Tomcat. After being up for around 12 hours, the web site hangs for our users. When this first happened I could see several of the following errors in catalina.out:
WARN [org.apache.jk.core.MsgContext] Error sending end packet
java.net.SocketException: Broken pipe
After looking this up, I understood it to mean that a user had cancelled a request before it completed, so the return path was closed and the response could not be sent back. From searching the web, it looked like this could leave the thread open in Tomcat until it reached its timeout. That seemed to make sense, because when Tomcat fell over, the end of the catalina.out log showed:
All threads (200) are currently busy, waiting. Increase maxThreads (200) or check the servlet status
The suggestion was to make the following change to the mod_jk settings in the Apache httpd.conf:
JkOptions +DisableReuse
I made this change after checking that it caused no side effects on our site, and it ran fine the next day. Yesterday, however, the same symptoms appeared and the web site froze again. This time there were no errors at all in catalina.out; requests simply stopped getting through to Tomcat. The application log shows that it received its last request at 17:31, and mod_jk.log then shows the following:
[Thu Sep 06 17:37:07 2012] [18784:53792] [error] ajp_connection_tcp_get_message::jk_ajp_common.c (947): (worker1) can't receive the response message from tomcat, network problems or tomcat is down (127.0.0.1:8009), err=-104
[Thu Sep 06 17:37:07 2012] [18784:53792] [error] ajp_get_reply::jk_ajp_common.c (1536): (worker1) Tomcat is down or refused connection. No response has been sent to the client (yet)
and then in my httpd error_log:
[Thu Sep 06 17:38:39 2012] [error] server reached MaxClients setting, consider raising the MaxClients setting
So about 6 minutes passed between the last request and the first mod_jk error, and roughly another 1 minute 30 seconds before the MaxClients error. As before, restarting Tomcat cleared the problem.
There have been no changes to our Apache, Tomcat or connector config except the one I mentioned (current config below), but we have changed our site to make more Ajax requests per user. What I would like to understand is how best to analyse our system so that I can work out which settings to change to stop this problem from happening without overloading our server.
Thanks, Iain
Current Config
httpd.conf
Timeout 300
KeepAlive on
MaxKeepAliveRequests 100
KeepAliveTimeout 15
LoadModule jk_module modules/mod_jk.so
JkLogLevel error
JkLogStampFormat "[%a %b %d %H:%M:%S %Y] "
JkOptions +ForwardKeySize +ForwardURICompat -ForwardDirectories +DisableReuse
workers.properties
# Define 1 real worker using ajp13
worker.list=worker1
# Set properties for worker1 (ajp13)
worker.worker1.type=ajp13
worker.worker1.host=localhost
worker.worker1.port=8009
worker.worker1.lbfactor=50
worker.worker1.cachesize=10
worker.worker1.cache_timeout=600
worker.worker1.socket_keepalive=1
worker.worker1.recycle_timeout=300
httpd-mpm.conf
StartServers 5
MinSpareServers 5
MaxSpareServers 10
MaxClients 150
MaxRequestsPerChild 0
The Tomcat settings are just the standard Tomcat defaults.
It turned out that the answer was to change the keep-alive timeout. All I needed to do to stop this from happening was lower KeepAliveTimeout from 15 to 2 and set MaxRequestsPerChild to 5000 (it had been 0). After these changes the issue did not recur.
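As a rough sketch, the two changes would look like this when applied to the configuration listed in the question. The file each directive sits in is taken from that listing; the surrounding directives are assumed to be unchanged, and the comments are my reading of the Apache 2.2 documentation rather than anything confirmed by testing on this server.

```apache
# httpd.conf
KeepAlive on
MaxKeepAliveRequests 100
# Lowered from 15: an idle keep-alive connection now releases its
# httpd child after 2 seconds instead of holding it for 15.
KeepAliveTimeout 2

# httpd-mpm.conf
StartServers 5
MinSpareServers 5
MaxSpareServers 10
MaxClients 150
# Changed from 0 (never recycle): each child process exits and is
# replaced once it reaches this limit. With KeepAlive on, only the first
# request on each connection counts, so this is effectively a
# per-child connection limit.
MaxRequestsPerChild 5000
```

A plausible explanation for why this helps, though it is an assumption rather than something measured here: with the StartServers/MinSpareServers/MaxSpareServers style of MPM settings shown in the question, each open keep-alive connection occupies one of the 150 MaxClients slots until it times out, so more Ajax requests per user means more slots held idle. Shortening the timeout frees those slots sooner.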