What does the Amazon ELB automatic health check do and what does it expect?

nazgul · Apr 13, 2012

Here is the thing:

  1. We've implemented a C++ RESTful API server with a built-in HTTP parser, without any standard HTTP server such as Apache
  2. It has been in use on Amazon infrastructure for several months, over both plain and SSL connections, and no problems related to the Amazon infrastructure have been identified
  3. We are deploying our first backend behind Amazon ELB
  4. Amazon ELB has a customizable health check system but also an automatic one, as stated here
  5. We've found no documentation of what data the health check system sends
  6. The backend simply hangs on the socket read call and, eventually, the connection is closed

I'm not looking for a solution to the problem, since the backend is not based on a standard web server. I'd just like to know whether anyone knows what kind of message the ELB health check system sends, since we've found no documentation about this anywhere.

Help is much appreciated. Thank you.

Answer

Steffen Opel · Apr 13, 2012

Amazon ELB has a customizable health check system but also an automatic one, as stated here

By "customizable" you are presumably referring to the health check that can be configured via the AWS Management Console (see Configure Health Check Settings) or via the API (see ConfigureHealthCheck).

The requirements for passing health checks configured this way are outlined in the Target field of the HealthCheck data type documentation:

Specifies the instance being checked. The protocol is either TCP, HTTP, HTTPS, or SSL. The range of valid ports is one (1) through 65535.

Note

  • TCP is the default, specified as a TCP:port pair, for example "TCP:5000". In this case a healthcheck simply attempts to open a TCP connection to the instance on the specified port. Failure to connect within the configured timeout is considered unhealthy.

  • SSL is also specified as an SSL:port pair, for example "SSL:5000".

  • For HTTP or HTTPS protocol, the situation is different. You have to include a ping path in the string. HTTP is specified as an HTTP:port/PathToPing grouping, for example "HTTP:80/weather/us/wa/seattle". In this case, an HTTP GET request is issued to the instance on the given port and path. Any answer other than "200 OK" within the timeout period is considered unhealthy.

  • The total length of the HTTP ping target needs to be 1024 16-bit Unicode characters or less.

[emphasis mine]
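The Target rules quoted above can be captured in a small validator. This is a minimal sketch, not an AWS SDK function: the name ValidateHealthCheckTarget is hypothetical, and the code encodes only the quoted rules (protocol TCP, SSL, HTTP or HTTPS; port 1 through 65535; a ping path for HTTP/HTTPS; at most 1024 characters for the HTTP ping target). Rejecting a path on TCP/SSL targets is an inference from the "port pair" wording, and the code assumes ASCII targets, so byte length stands in for the documented count of 16-bit Unicode characters.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of any AWS SDK): checks a health check
// Target string against the rules quoted from the HealthCheck documentation.
bool ValidateHealthCheckTarget(const std::string& target) {
    const std::size_t colon = target.find(':');
    if (colon == std::string::npos) return false;

    const std::string protocol = target.substr(0, colon);
    const bool is_http = (protocol == "HTTP" || protocol == "HTTPS");
    if (!is_http && protocol != "TCP" && protocol != "SSL") return false;

    // The port runs from just after the colon up to the first '/' (if any).
    const std::size_t slash = target.find('/', colon + 1);
    const std::string port_str = target.substr(
        colon + 1, slash == std::string::npos ? std::string::npos : slash - colon - 1);
    if (port_str.empty() || port_str.size() > 5) return false;
    for (char c : port_str) {
        if (!std::isdigit(static_cast<unsigned char>(c))) return false;
    }
    const long port = std::stol(port_str);
    if (port < 1 || port > 65535) return false;  // valid range is 1..65535

    if (is_http) {
        // HTTP/HTTPS need a ping path, e.g. "HTTP:80/weather/us/wa/seattle".
        // Assumption: byte length approximates 16-bit characters (ASCII only).
        return slash != std::string::npos && target.size() <= 1024;
    }
    // Assumption: TCP/SSL targets are a bare protocol:port pair without a path.
    return slash == std::string::npos;
}
```

For example, "TCP:5000" and "HTTP:80/weather/us/wa/seattle" pass, while "HTTP:80" fails because it lacks a ping path.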

By "automatic" you are presumably referring to the health check described in the Cause paragraph of Why is the health check URL different from the URL displayed in API and Console?:

In addition to the health check you configure for your load balancer, a second health check is performed by the service to protect against potential side-effects caused by instances being terminated without being deregistered. To perform this check, the load balancer opens a TCP connection on the same port that the health check is configured to use, and then closes the connection after the health check is completed. [emphasis mine]

The Solution paragraph clarifies that the payload is empty here, i.e. this check behaves like the plain TCP (non-HTTP/HTTPS) variant of the configurable health check described above:

This extra health check does not affect the performance of your application because it is not sending any data to your back-end instances. You cannot disable or turn off this health check.
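To reproduce this behavior locally, the automatic check can be imitated with a plain POSIX socket. This is a sketch under assumptions: MimicElbTcpCheck is a hypothetical helper, it targets 127.0.0.1, and it follows only the quoted description (connect, send nothing, close), not ELB's actual timing or source addresses. On the server side, the accepted socket should then see recv() return 0, an orderly shutdown with no request bytes, which is consistent with a backend waiting in a read until the connection is closed.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>

// Hypothetical helper: imitates the extra ELB check as the AWS article
// describes it: open a TCP connection to 127.0.0.1:<port>, send no data,
// then close. Returns true if the connection could be established.
bool MimicElbTcpCheck(std::uint16_t port) {
    const int fd = ::socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return false;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    const bool connected =
        ::connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0;

    // Deliberately no write(): the connection is simply closed again.
    ::close(fd);
    return connected;
}
```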

Summary / Solution

Assuming your RESTful API server with its built-in HTTP parser is indeed supposed to serve HTTP only, you will need to handle two health checks:

  1. The first one, which you configured yourself as HTTP:port/PathToPing: you'll receive an HTTP GET request and must answer with 200 OK within the configured timeout period to be considered healthy.
  2. The second one, performed automatically by the service: it opens a TCP connection on the HTTP port configured above, sends no data, and then closes the connection once the health check is completed.
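A single connection routine can cope with both checks. The sketch below is illustrative only: ServeHealthConnection, HealthOutcome, the ping path and the timeout are hypothetical, POSIX poll()/recv() are assumed, and parsing is deliberately minimal (it only inspects the request line and assumes it arrives in the first read). It waits for data with a timeout instead of blocking indefinitely, treats an orderly close before any bytes as the automatic TCP check, and answers a GET for the ping path with 200 OK.

```cpp
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <cstddef>
#include <string>

enum class HealthOutcome { kClosedWithoutData, kTimedOut, kServedPing, kNotPing, kError };

// Hypothetical handler covering both ELB health checks on an accepted socket.
// The caller remains responsible for closing fd.
HealthOutcome ServeHealthConnection(int fd, const std::string& ping_path, int timeout_ms) {
    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLIN;
    const int ready = ::poll(&pfd, 1, timeout_ms);
    if (ready == 0) return HealthOutcome::kTimedOut;  // nothing arrived in time
    if (ready < 0) return HealthOutcome::kError;

    char buf[1024];
    const ssize_t n = ::recv(fd, buf, sizeof(buf), 0);
    if (n == 0) return HealthOutcome::kClosedWithoutData;  // automatic TCP check
    if (n < 0) return HealthOutcome::kError;

    // Minimal parsing: only the request line "GET <path> HTTP/1.x" is examined.
    const std::string data(buf, static_cast<std::size_t>(n));
    const std::string expected = "GET " + ping_path + " HTTP/1.";
    if (data.compare(0, expected.size(), expected) != 0) return HealthOutcome::kNotPing;

    static const char kResponse[] =
        "HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: close\r\n\r\n";
    if (::send(fd, kResponse, sizeof(kResponse) - 1, 0) < 0) return HealthOutcome::kError;
    return HealthOutcome::kServedPing;
}
```

A production server would also loop on partial reads and, on Linux, might pass MSG_NOSIGNAL to send() so that a peer hanging up early does not raise SIGPIPE.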

In conclusion, your server may well be behaving perfectly fine already, and you are just confused by the second health check's behavior. Does ELB actually consider your server to be unhealthy?