I have a Java application running in WebLogic 11g on Windows which, after several days, becomes unresponsive. One suspicious symptom I've noticed is that a large number of connections (about 3000) show up in netstat with a CLOSE_WAIT status even when the server is idle. Since the application server is managing the client connections, I'm not sure what's causing this. We also make a number of web service calls that loop back to the same server, but I believe those connections get closed properly. What else could cause this, and how does one troubleshoot a problem like this?
CLOSE_WAIT is the state the local TCP state machine is in when the remote host has sent a FIN (closed its side of the connection) but the local application has not yet done the same and sent its own FIN in reply. The local machine can still send data at this point, though the remote host will not receive it unless it performed only a half-close on the connection.
When the remote host closes (sends a FIN), your local application will get an event of some sort (it's a "read" event on the socket in the base C library), but reading from that connection will return an end-of-file indication (a zero-byte read in C, -1 from a Java InputStream) to show that the connection has closed. At this point the local application should close the connection.
I know little about Java and nothing about WebLogic, but I suppose it's possible that the application is not handling that condition properly and thus never closes the connection.
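To make the point concrete, here is a minimal Java sketch of what "noticing the remote close and closing in response" might look like. It is not WebLogic code: the class name `CloseWaitDemo` and the helper `drainAndClose` are hypothetical, and it only applies to sockets your own code opens (for example, outbound calls), not to the connections the application server manages for you. It uses try/finally rather than try-with-resources on the assumption that a WebLogic 11g deployment may be running on an older JVM.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class CloseWaitDemo {

    // Hypothetical helper: reads until the peer's FIN shows up as
    // end-of-stream (-1), then closes our side so the socket can
    // leave CLOSE_WAIT. Returns the number of bytes read.
    public static int drainAndClose(Socket socket) throws IOException {
        int total = 0;
        try {
            InputStream in = socket.getInputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        } finally {
            // Skipping this close is what leaves the socket in CLOSE_WAIT.
            socket.close();
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        // Loopback listener on an ephemeral port, purely for demonstration.
        ServerSocket server = new ServerSocket(0, 1, InetAddress.getByName("127.0.0.1"));
        try {
            Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
            Socket accepted = server.accept();
            client.getOutputStream().write("hello".getBytes("US-ASCII"));
            client.close(); // remote side sends FIN; 'accepted' enters CLOSE_WAIT
            int bytes = drainAndClose(accepted);
            System.out.println("read " + bytes + " bytes, closed=" + accepted.isClosed());
        } finally {
            server.close();
        }
    }
}
```

If the `socket.close()` in the `finally` block were omitted, the accepted socket would stay in CLOSE_WAIT until the process exits or the object is garbage collected, which is the kind of accumulation netstat would show.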