Database connection unexpectedly closed with Glassfish, jTDS and SQL Server 2008

fernacolo picture fernacolo · Nov 18, 2011 · Viewed 8.9k times · Source

I have a Java EE application running on Glassfish and connecting to MSSQL Server 2008 through jTDS. For some unknown reason, the database connection becomes unexpectedly closed during requests. The application is huge, but here is a summary of how the error happens:

During Glassfish setup, with create a connection pool with asadmin create-jdbc-connection-pool and asadmin create-jdbc-resource. The datasource class is net.sourceforge.jtds.jdbcx.JtdsDataSource.

When Glassfish goes up, it calls our implementation of ServletContextListener.contextInitialized(), where we fetch the datasource from JNDI. The datasource is stored on a static variable.

For a while, everything goes fine. All requests are handled and no connection is closed. Our application performs processing using Timer and MDB (Message Driven Bean) EJBs.

This is a sample onMessage() implementation:

public void onMessage(Message message) {
  this.message = message;
  this.connection = dataSource.getConnection(userName, password);
  try {
    doQuery1();
    doTransaction1();
    doTransaction2();
    doQuery2();
    doQuery3();
  } finally {
    this.connection.close();
    this.connection = null;
  }
}

Eventually, we start to get the following exception (happens about 100 times during one hour):

java.sql.SQLException: Invalid state, the Connection object is closed.
  at net.sourceforge.jtds.jdbc.ConnectionJDBC2.checkOpen(ConnectionJDBC2.java)
  at net.sourceforge.jtds.jdbc.ConnectionJDBC2.prepareStatement(ConnectionJDBC2.java)
  at net.sourceforge.jtds.jdbc.ConnectionJDBC2.prepareStatement(ConnectionJDBC2.java)
  at com.sun.gjc.spi.base.ConnectionHolder.prepareStatement(ConnectionHolder.java:475)
  at com.acme.myejbs.MyMDB.doQuery2(MyMDB.java:123)
  at com.acme.myejbs.MyMDB.onMessage(MyMDB.java:614)
  at sun.reflect.GeneratedMethodAccessor115.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java)
  at java.lang.reflect.Method.invoke(Method.java)
  at com.sun.enterprise.security.application.EJBSecurityManager.runMethod(EJBSecurityManager.java:1011)
  ...
  at $Proxy92.onMessage(Unknown Source)
  at com.sun.messaging.jms.ra.OnMessageRunner.run(OnMessageRunner.java)
  at com.sun.enterprise.connectors.work.OneWork.doWork(OneWork.java:77)
  at com.sun.corba.ee.impl.orbutil.threadpool.ThreadPoolImpl$WorkerThread.run(ThreadPoolImpl.java:555)

The exception happens at random JDBC calls. Sometimes is during ResultSet iteration, other times during query execution.

In very rare cases (7 times during an hour) we get this exception:

java.sql.SQLException: Error in allocating a connection. Cause: This Managed Connection is not valid as the phyiscal connection is not usable
  at com.sun.gjc.spi.base.DataSource.getConnection(DataSource.java:136)
  at com.acme.myejbs.MyMDB.onMessage(MyMDB.java:614)
  at sun.reflect.GeneratedMethodAccessor115.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java)
  at java.lang.reflect.Method.invoke(Method.java)
  at com.sun.enterprise.security.application.EJBSecurityManager.runMethod(EJBSecurityManager.java:1011)
  ...
  at $Proxy92.onMessage(Unknown Source)
  at com.sun.messaging.jms.ra.OnMessageRunner.run(OnMessageRunner.java)
  at com.sun.enterprise.connectors.work.OneWork.doWork(OneWork.java:77)
  at com.sun.corba.ee.impl.orbutil.threadpool.ThreadPoolImpl$WorkerThread.run(ThreadPoolImpl.java:555)

Also in very rare cases (5 times during an hour) we get this exception:

java.sql.SQLException: I/O Error: Connection reset by peer: socket write error
  at net.sourceforge.jtds.jdbc.TdsCore.executeSQL(TdsCore.java)
  at net.sourceforge.jtds.jdbc.JtdsStatement.executeSQLQuery(JtdsStatement.java)
  at net.sourceforge.jtds.jdbc.JtdsPreparedStatement.executeQuery(JtdsPreparedStatement.java)
  at com.acme.myejbs.MyMDB.doQuery2(MyMDB.java:126)
  at com.acme.myejbs.MyMDB.onMessage(MyMDB.java:614)
  ...
Caused by: java.net.SocketException: Connection reset by peer: socket write error
  at java.net.SocketOutputStream.socketWrite0(Native Method)
  at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java)
  at java.net.SocketOutputStream.write(SocketOutputStream.java)
  at java.io.DataOutputStream.write(DataOutputStream.java)
  at net.sourceforge.jtds.jdbc.SharedSocket.sendNetPacket(SharedSocket.java)
  at net.sourceforge.jtds.jdbc.RequestStream.putPacket(RequestStream.java)
  at net.sourceforge.jtds.jdbc.RequestStream.flush(RequestStream.java)
  ... 44 more

In rare cases we get this scary exception (NPE inside jTDS):

java.lang.NullPointerException
  at net.sourceforge.jtds.jdbc.JtdsPreparedStatement.executeQuery(JtdsPreparedStatement.java)
  at com.acme.myejbs.MyMDB.doQuery2(MyMDB.java:126)
  at com.acme.myejbs.MyMDB.onMessage(MyMDB.java:614)
  ...

We can't find why this happens. The used connections never become idle for more than a second during a request. We don't know who is dropping the connection. It might be network instability, but then I guess jTDS should yield only network related exceptions, right?

Another option is some policy or configuration of Glassfish connection pool (maybe Glassfish is closing physical connections prematurely), but how can we track it?

Finally, MS SQL Server 2008 can be remotely dropping connections, but how can we monitor the server side to know if it's happening?

Answer

Chris Townsend picture Chris Townsend · Nov 23, 2011

I had an application that received these types of exceptions nearly exactly. All of my machines were brand new servers and all the network cards were set to auto-sense the network speed. They were all connected to an old switch that was 100MB / second HALF duplex.

Setting all of the machines on that switch to explicitly use the 100MB / second HALF duplex connection setting rather than auto-sense was what worked for me after doing countless hours of searching for a solution. You'll need to find out what your connectivity settings should be or experiment (it will be obvious if you pick the wrong one because you won't be able to connect to the box over remote desktop, so make sure you can get to the physical machine).

It is pretty low hanging fruit to test this. I set up a command window with a ping command from one of the worker machines pinging the database server and could see the packet loss periodically. Once I changed the NIC settings and got it right, the problem went away entirely. There are several articles about on the internet that discuss this problem. It is difficult to track down because it is: 1) periodic and 2) looks like something is wrong with the connection objects, etc.