Tomcat JDBC Connection Pool: testOnBorrow vs testWhileIdle

Antoine Mottier · Feb 2, 2017

For various reasons connections in a pool can become invalid: server connection timeout, network issues...

My understanding is that a Tomcat JDBC Connection Pool does not provide any guarantee about the validity of the connections it provides to the application.

To prevent getting an invalid connection from the pool (or, really, only to lower the risk), the solution seems to be configuring connection validation. Validating a connection means running a very basic query against the database (e.g. SELECT 1; on MySQL).

Tomcat JDBC Connection Pool offers several options to test the connection. The two I find most interesting are testOnBorrow and testWhileIdle.

At first I thought testOnBorrow was the best option because it validates the connection before providing it to the application (with a maximum frequency defined by validationInterval).
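The "max frequency" idea can be sketched as a small per-connection throttle. This is a simplified model and not Tomcat's actual code: the class `ThrottledValidator`, its method names, and the explicit clock argument are hypothetical, and the model assumes that only a successful check resets the timer.

```java
import java.util.function.BooleanSupplier;

/** Simplified model of a per-connection validation throttle (hypothetical, not Tomcat's code). */
class ThrottledValidator {
    private final long validationIntervalMillis;
    private long lastValidatedMillis;      // time of the last successful check
    private boolean everValidated = false; // no check has succeeded yet

    ThrottledValidator(long validationIntervalMillis) {
        this.validationIntervalMillis = validationIntervalMillis;
    }

    /**
     * Called when the connection is borrowed. The probe (e.g. a "SELECT 1")
     * runs only if the last successful check is older than the interval.
     */
    boolean validateOnBorrow(long nowMillis, BooleanSupplier probe) {
        if (everValidated && nowMillis - lastValidatedMillis < validationIntervalMillis) {
            return true; // recently validated: trusted without touching the database
        }
        boolean ok = probe.getAsBoolean();
        if (ok) {
            lastValidatedMillis = nowMillis;
            everValidated = true;
        }
        return ok;
    }

    public static void main(String[] args) {
        int[] probes = {0};
        BooleanSupplier probe = () -> { probes[0]++; return true; };
        ThrottledValidator v = new ThrottledValidator(3000);
        v.validateOnBorrow(0, probe);    // probe runs
        v.validateOnBorrow(1000, probe); // skipped, inside the interval
        v.validateOnBorrow(3000, probe); // probe runs again
        System.out.println("probes executed: " + probes[0]);
    }
}
```

The skipped check in the middle is exactly where the residual risk sits: a connection that broke after the last successful probe is still handed out as valid.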

But on second thought I realized that testing the connection right before using it might impact the responsiveness of the application. So I thought that using testWhileIdle could be more efficient, as it tests connections while they are not in use.

Whichever option I choose, it seems they only lower the risk of getting an invalid connection; the risk still exists.

So I end up asking: should I use testOnBorrow or testWhileIdle or a mix of both?

On a side note, I'm surprised that validationInterval does not apply to testOnReturn and I don't really get the purpose of testOnConnect.
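For reference, the options named in the question can be collected as plain key/value settings. The keys below are the pool's documented attribute names; every value is illustrative only, and the class `PoolValidationSettings` is a hypothetical helper. Handing the result to the pool (for instance through its `DataSourceFactory`) is left out so the sketch stays self-contained.

```java
import java.util.Properties;

/** Hypothetical helper that gathers the validation-related pool settings in one place. */
class PoolValidationSettings {
    static Properties build() {
        Properties p = new Properties();
        // Cheap query used by every kind of validation (MySQL-style, as in the question).
        p.setProperty("validationQuery", "SELECT 1");
        // Validate when a connection is handed to the application...
        p.setProperty("testOnBorrow", "true");
        // ...but at most once per this many milliseconds (illustrative value).
        p.setProperty("validationInterval", "30000");
        // Also validate connections that are sitting idle in the pool.
        p.setProperty("testWhileIdle", "true");
        // How often the background check runs, in milliseconds (illustrative value).
        p.setProperty("timeBetweenEvictionRunsMillis", "5000");
        return p;
    }

    public static void main(String[] args) {
        build().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Enabling both options, as done here, is only one possible combination and not a recommendation.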

Answer

Raja Nadar · Jun 24, 2017

There is no 100% right answer to this. It is a matter of trade-off and context.

  • Most of the time, testOnBorrow is the least risky option, since it ensures (as best it can) that a basic sanity check has confirmed the client and db-server are on talking terms before a connection is handed out from the pool.
  • It still doesn't prevent the race condition of the server connection going down between the time the sanity check was made and the time your application uses the connection.
  • But treating that as a corner case, testOnBorrow gives pretty good assurance.

  • The trade-off is that every time you request a connection, a query (no matter how lightweight) is made to the database server. This may be very fast, but the cost is still not zero.

And if you have a busy application with very reliable database connections, the data will start to show that the cost of a validity check on every connection request from the pool outweighs the benefit of detecting connection issues.
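A rough back-of-the-envelope estimate of that cost can be sketched as follows. All numbers are made up for illustration, the class `ValidationCostEstimate` is hypothetical, and the model assumes that a validationInterval of zero or less means every borrow is validated and that borrows are spread evenly over the pool.

```java
/** Rough, hypothetical cost model for validation on borrow; not a measurement. */
class ValidationCostEstimate {
    /**
     * Upper bound on validation queries per second across the pool.
     * With a positive interval, each connection is checked at most once per interval.
     */
    static double validationsPerSecond(double borrowsPerSecond, int poolSize,
                                       long validationIntervalMillis) {
        if (validationIntervalMillis <= 0) {
            return borrowsPerSecond; // model assumption: no throttle, validate every borrow
        }
        double throttledCap = poolSize * (1000.0 / validationIntervalMillis);
        return Math.min(borrowsPerSecond, throttledCap);
    }

    /** Total milliseconds spent on validation queries per second of wall-clock time. */
    static double overheadMillisPerSecond(double validationsPerSecond, double queryMillis) {
        return validationsPerSecond * queryMillis;
    }

    public static void main(String[] args) {
        // Illustrative inputs: 500 borrows/s, 10 connections, 0.5 ms per validation query.
        double unthrottled = validationsPerSecond(500, 10, 0);
        double throttled = validationsPerSecond(500, 10, 1000);
        System.out.println("unthrottled overhead ms/s: " + overheadMillisPerSecond(unthrottled, 0.5));
        System.out.println("throttled overhead ms/s:   " + overheadMillisPerSecond(throttled, 0.5));
    }
}
```

Whether either figure matters depends on what a failed database operation costs the application, which is the comparison the answer is pointing at.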

  • On the other hand, if your application is not uniformly busy (like most real-world applications), then the testOnBorrow option is extremely beneficial.
  • It gives you the best possible assurance that you have a good connection before you use it. This matters especially given the cost (retry + manual intervention + loss of workflow etc.) of not being able to recover easily from a failed DB operation.
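Since validation cannot close the race condition mentioned earlier, some applications pair it with a retry. Here is a minimal retry-once sketch, assuming the work is idempotent and that each attempt borrows a fresh connection from the pool. `RetryOnce`, `SqlWork`, and `withOneRetry` are hypothetical names, not part of any library.

```java
import java.sql.SQLException;

/** Hypothetical retry-once wrapper for idempotent database work. */
class RetryOnce {
    /** A unit of database work that may fail with SQLException. */
    @FunctionalInterface
    interface SqlWork<T> {
        T run() throws SQLException;
    }

    /**
     * Runs the work and, if it fails, runs it exactly one more time.
     * The work is expected to obtain its own connection on each attempt.
     */
    static <T> T withOneRetry(SqlWork<T> work) throws SQLException {
        try {
            return work.run();
        } catch (SQLException first) {
            try {
                return work.run();
            } catch (SQLException second) {
                second.addSuppressed(first); // keep the original failure for diagnosis
                throw second;
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        int[] attempts = {0};
        String result = withOneRetry(() -> {
            if (++attempts[0] == 1) {
                throw new SQLException("simulated broken connection");
            }
            return "ok";
        });
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

A blind retry like this is unsafe for non-idempotent writes or work inside an open transaction, which is part of why recovering from a failed operation can be expensive.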

  • Now consider the testWhileIdle option. Connections are checked only while they sit idle in the pool, by the pool's periodic background check (its frequency depends on timeBetweenEvictionRunsMillis), so a connection has to go idle before a sanity check can be made.

  • This is a performance improvement over testOnBorrow, but it comes with its own disadvantages.
    • Real-world app-to-db connections don't break only because of idle timeouts; they can also be dropped by firewall rules, network congestion, db-server maintenance/patching etc.
    • So it comes back to measurement: how many connection errors did you observe in the data when you had no connection validation at all?
  • One thing to watch out for with this option: suppose your pool is running at max connections, your app is performing well, and for some reason your db-server restarts or the like. All the live connections (from the client's perspective) will now mostly error out until the idle check kicks in. So your db issue (which would already have been a fire-fight) is compounded a bit, until the app connections recover or you restart the app as well.
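The restart scenario above can be illustrated with a toy model. This is not how the real pool is implemented: `FakeConnection`, `TinyPool`, and their methods are hypothetical stand-ins, and the model simply replaces dead connections with fresh ones.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model contrasting an idle sweep with validation on borrow (hypothetical classes). */
class IdleSweepDemo {
    /** Stand-in for a pooled connection; 'alive' models whether the server side still exists. */
    static final class FakeConnection {
        boolean alive = true;
    }

    static final class TinyPool {
        private final Deque<FakeConnection> idle = new ArrayDeque<>();
        private final boolean testOnBorrow;

        TinyPool(int size, boolean testOnBorrow) {
            this.testOnBorrow = testOnBorrow;
            for (int i = 0; i < size; i++) {
                idle.add(new FakeConnection());
            }
        }

        /** Models a database restart: every pooled connection is silently broken. */
        void simulateDatabaseRestart() {
            idle.forEach(c -> c.alive = false);
        }

        /** Models one run of the background idle check: dead idle connections are replaced. */
        void sweepIdle() {
            int size = idle.size();
            idle.removeIf(c -> !c.alive);
            while (idle.size() < size) {
                idle.add(new FakeConnection());
            }
        }

        /** Hands out a connection, checking it first only when testOnBorrow is set. */
        FakeConnection borrow() {
            FakeConnection c = idle.poll();
            if (c == null || (testOnBorrow && !c.alive)) {
                c = new FakeConnection(); // replace a missing or failed connection
            }
            return c;
        }
    }

    public static void main(String[] args) {
        TinyPool idleOnly = new TinyPool(2, false);
        idleOnly.simulateDatabaseRestart();
        System.out.println("idle-only, before sweep: alive=" + idleOnly.borrow().alive);
        idleOnly.sweepIdle();
        System.out.println("idle-only, after sweep:  alive=" + idleOnly.borrow().alive);

        TinyPool onBorrow = new TinyPool(2, true);
        onBorrow.simulateDatabaseRestart();
        System.out.println("testOnBorrow:            alive=" + onBorrow.borrow().alive);
    }
}
```

In this model the idle-only pool hands out a dead connection until the next sweep, while the borrow-time check catches it immediately at the price of a check on each borrow.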

One last data point: for some applications, the critical path is not the validation query time (hopefully in the low milliseconds); they have bigger issues to deal with. For other applications, of course, that time is very significant.