I have two Linux servers (A and B) with a PostgreSQL 9.5 database installed. I configured hot standby mode as described in the documentation. In this setup, A is configured as the master and B runs in hot standby mode. This is working well and behaves as expected.
Now I want to connect an independent Java EE application (running on a different machine) via Hibernate/JDBC through a TomEE datasource to this database setup.
The PostgreSQL driver documentation states that multiple hosts can be specified in the JDBC connection URL:
jdbc:postgresql://host1:port1,host2:port2/database
So my question is: if A is down and B is switched manually to normal operation mode, is my application still able to proceed with database operations using a JDBC connection URL as stated above?
Note: From various sources I learned that PostgreSQL does not support automatic failover (unless third-party software is involved in the process; see comments below). For this reason, failover needs to be performed manually, which is acceptable for this particular use case.
EDIT-1:
I decided to test pgBouncer (as suggested in the comments) as a workaround. It works well for my use case. I wrote a watchdog script that automates the manual steps:
1. Check periodically whether A is still alive and listens for incoming connections.
2. If not, switch B to normal operation mode, let it become the new master, and restart the service.
3. Adapt the pgBouncer settings to point to B instead of A and restart the service.
However, I would still be interested whether anybody has experience without third-party software?
In this kind of situation, it is probably best to test and measure.
I do not have 'hands-on' experience with the PostgreSQL hot standby mode, but I have done database fail-over for a Java application.
First, test the claims on the PostgreSQL driver documentation page about the ?targetServerType=master parameter (mentioned at the bottom of the page).
Write a small Java "PgHsm" class with a main method that uses the PostgreSQL JDBC driver via DriverManager.getConnection and runs a simple update query.
It should use server A to do the update query. Stop PostgreSQL on server A and run PgHsm: it should fail to connect, since server B is not a master.
Make server B the master and run PgHsm again: it should run OK.
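A minimal sketch of such a "PgHsm" class, assuming placeholder host names, credentials and a hypothetical heartbeat table (adjust all of these to your setup):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Sketch of the "PgHsm" test class: both servers are listed in the URL and
// targetServerType=master tells the driver to only use the host that accepts writes.
// Hosts, database, credentials and the heartbeat table are placeholders.
public class PgHsm {

    private static final String URL =
            "jdbc:postgresql://serverA:5432,serverB:5432/testdb?targetServerType=master";

    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(URL, "app_user", "secret");
             PreparedStatement ps = con.prepareStatement(
                     "update heartbeat set last_seen = now() where id = ?")) {
            ps.setInt(1, 1);
            System.out.println("Connected to " + con.getMetaData().getURL()
                    + ", rows updated: " + ps.executeUpdate());
        }
    }
}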
The datasource is backed by a database connection pool in TomEE. This page lists the ones available in TomEE. But not all database connection pools are equal, and I now prefer HikariCP because, in my experience, it handles the "database down" scenario more predictably. See also the test with results on HikariCP's "handling database down" page.
Unfortunately, HikariCP uses JDBC's get/setNetworkTimeout to behave predictably, and the PostgreSQL JDBC driver does not implement this (*). So to be sure that (Java EE) application threads do not hang forever on a database action, you need to set the connectTimeout and socketTimeout JDBC driver options. Setting a socketTimeout is precarious, as it automatically sets a time limit for ALL queries to the database.
(*) Update: since version 42.2.x, network timeouts are implemented.
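For illustration, the connectTimeout and socketTimeout driver options can be passed as connection properties (both values are in seconds; all names and values below are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

// Sketch: connectTimeout and socketTimeout set as PostgreSQL JDBC driver properties.
// Tune the timeout values to your environment.
public class TimeoutExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("user", "app_user");
        props.setProperty("password", "secret");
        props.setProperty("connectTimeout", "10"); // max seconds to establish the TCP connection
        props.setProperty("socketTimeout", "30");  // max seconds to wait for data on the socket,
                                                   // effectively a limit for ALL queries
        String url = "jdbc:postgresql://serverA:5432,serverB:5432/testdb?targetServerType=master";
        try (Connection con = DriverManager.getConnection(url, props)) {
            System.out.println("Connected to " + con.getMetaData().getURL());
        }
    }
}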
The second test to perform involves updating the Java "PgHsm" class to use the database connection pool implementation of your choosing and starting (at least) two threads that continually run simple update queries in a loop (in the loop, a database connection is acquired from the pool and returned to the pool after commit/rollback).
While you bring down server A and switch server B to "master" mode, monitor the exceptions logged by "PgHsm" and how long a thread waits/hangs on performing a database action.
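A sketch of such a pooled test, here with HikariCP (hosts, credentials, the heartbeat table, loop timing and run duration are all placeholders):

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

import java.sql.Connection;
import java.sql.PreparedStatement;

// Sketch of the second test: two threads continually acquire a connection from the
// pool, run a simple update and return the connection, while server A is brought
// down and server B is promoted. Watch the logged exceptions and how long a thread
// blocks on getConnection() / executeUpdate().
public class PgHsmPooled {

    public static void main(String[] args) throws Exception {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://serverA:5432,serverB:5432/testdb"
                + "?targetServerType=master&connectTimeout=10&socketTimeout=30");
        config.setUsername("app_user");
        config.setPassword("secret");
        config.setMaximumPoolSize(4);

        try (HikariDataSource ds = new HikariDataSource(config)) {
            Runnable updateLoop = () -> {
                while (!Thread.currentThread().isInterrupted()) {
                    long start = System.currentTimeMillis();
                    try (Connection con = ds.getConnection();
                         PreparedStatement ps = con.prepareStatement(
                                 "update heartbeat set last_seen = now() where id = ?")) {
                        ps.setInt(1, 1);
                        ps.executeUpdate();
                    } catch (Exception e) {
                        System.out.println(Thread.currentThread().getName() + " failed after "
                                + (System.currentTimeMillis() - start) + " ms: " + e);
                    }
                    try {
                        Thread.sleep(500);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                    }
                }
            };
            Thread t1 = new Thread(updateLoop, "updater-1");
            Thread t2 = new Thread(updateLoop, "updater-2");
            t1.start();
            t2.start();
            // Run for a few minutes while server A is stopped and server B is promoted.
            Thread.sleep(5 * 60 * 1000L);
            t1.interrupt();
            t2.interrupt();
        }
    }
}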
The results from the tests can be used to tune the JDBC driver options and pool settings. Focus on how long application threads wait or hang on a database action and which exceptions are thrown while doing so.
The second test relies on server A not being available so that connection test queries (performed by the database connection pool) fail.
In the case where both servers remain available, but master and slave switch, a connection test query will not help and the database connection pool will provide the wrong (now read-only) database connections to the application.
In that case, manual intervention is required. A "fail-over pattern" for HikariCP is described here (only available with the allowPoolSuspension option described on the configuration page): suspend the pool, evict the open connections, and resume the pool once the fail-over is complete.
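A sketch of that pattern, assuming a HikariDataSource configured with allowPoolSuspension=true (when and how this is triggered during the manual failover is up to you):

import com.zaxxer.hikari.HikariDataSource;
import com.zaxxer.hikari.HikariPoolMXBean;

// Sketch of the "suspend, flush and resume" fail-over pattern for HikariCP.
// Requires allowPoolSuspension=true on the pool configuration; intended to be
// invoked manually (or by a watchdog) while the master switch is performed.
public final class PoolFailover {

    public static void failover(HikariDataSource ds) {
        HikariPoolMXBean pool = ds.getHikariPoolMXBean();
        pool.suspendPool();            // stop handing out connections
        pool.softEvictConnections();   // evict idle connections now, busy ones when returned
        // ... perform the manual switch here: promote server B, repoint pgBouncer/DNS, etc. ...
        pool.resumePool();             // hand out fresh connections to the new master
    }
}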
The third test will be with the Java EE application, and by now you should have a good idea of what problems to expect. It is not uncommon for applications to be updated after these kinds of tests to improve the handling of "database down" scenarios (e.g. by setting (default) query timeouts). In your case, a "suspend, flush and resume database connection pool" feature (the pattern described above) to use during the manual failover would also be desirable.
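For example, a plain JDBC query timeout can be set per statement (the 5-second limit below is arbitrary and only for illustration):

import java.sql.Connection;
import java.sql.Statement;

// Illustrative only: a per-statement query timeout (in seconds) is one way to keep
// application threads from hanging on long-running queries during a failover.
public final class QueryTimeoutExample {

    public static int updateWithTimeout(Connection con, String sql) throws Exception {
        try (Statement st = con.createStatement()) {
            st.setQueryTimeout(5); // arbitrary limit; tune per query/application
            return st.executeUpdate(sql);
        }
    }
}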