My Airflow server periodically fails. When I check the gunicorn logs, the error just before all the workers shut down looks like this:
OperationalError: (psycopg2.OperationalError) could not translate host name "my-airflow-db.l9zijaslosu.us-east-1.rds.amazonaws.com" to address: Name or service not known
(Background on this error at: http://sqlalche.me/e/e3q8)
Each time, I immediately verify that the host name is correct and that the database is accepting requests from other tools.
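For reference, a minimal sketch of the kind of name-resolution check that can be run from the webserver host when the error appears. It assumes Python's standard `socket` module; `resolve_host` is a hypothetical helper name, and 5432 is assumed to be the database port (the PostgreSQL default). It is not part of Airflow.

```python
import socket


def resolve_host(hostname, port=5432):
    """Return the sorted IP addresses that hostname resolves to.

    Returns an empty list if resolution fails, which is the same
    condition that surfaces as "could not translate host name".
    The port default of 5432 is an assumption (PostgreSQL default).
    """
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # The resolver could not translate the name to an address.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


# Example usage (run on the same machine as the webserver):
# print(resolve_host("my-airflow-db.l9zijaslosu.us-east-1.rds.amazonaws.com"))
```

An empty list from this check on the webserver host, while other machines resolve the name fine, would point at the local resolver rather than at the database itself.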
If I restart the Airflow webserver, the server operates correctly for 4-5 days, and then the same error occurs.
This issue has been asked about before, but it is typically resolved by telling other developers not to use localhost or postgres host names. My host name is a fully qualified host name on AWS's domain. It seems exceedingly unlikely that this is a DNS error on Amazon's part.