Elastic Beanstalk disable health state change based on 4xx responses

Martin Hansen picture Martin Hansen · Apr 4, 2016 · Viewed 13.4k times · Source

I have a rest api running on Elastic Beanstalk, which works great. Everything application-wise is running good, and working as expected.

The application is a rest api, used to lookup different users.

example url: http://service.com/user?uid=xxxx&anotherid=xxxx

If a user with either id's is found, the api responds with 200 OK, if not, responds with 404 Not Found as per. HTTP/1.1 status code defenitions.

It is not uncommon for our api to answer 404 Not Found on a lot of requests, and the elastic beanstalk transfers our environment from OK into Warning or even into Degraded because of this. And it looks like nginx has refused connection to the application because of this degraded state. (looks like it has a threshold of 30%+ into warningand 50%+ into degraded states. This is a problem, because the application is actually working as expected, but Elastic Beanstalks default settings thinks it is a problem, when it's really not.

Does anyone know of a way to edit the threshold of the 4xx warnings and state transitions in EB, or completely disable them?

Or should i really do a symptom-treatment and stop using 404 Not Found on a call like this? (i really do not like this option)

Answer

Elad Nava picture Elad Nava · Jul 6, 2016

Update: AWS EB finally includes a built-in setting for this: https://stackoverflow.com/a/51556599/1123355

Old Solution: Upon diving into the EB instance and spending several hours looking for where EB's health check daemon actually reports the status codes back to EB for evaluation, I finally found it, and came up with a patch that can serve as a perfectly fine workaround for preventing 4xx response codes from turning the environment into a Degraded environment health state, as well as pointlessly notifying you with this e-mail:

Environment health has transitioned from Ok to Degraded. 59.2 % of the requests are erroring with HTTP 4xx.

The status code reporting logic is located within healthd-appstat, a Ruby script developed by the EB team that constantly monitors /var/log/nginx/access.log and reports the status codes to EB, specifically in the following path:

/opt/elasticbeanstalk/lib/ruby/lib/ruby/gems/2.2.0/gems/healthd-appstat-1.0.1/lib/healthd-appstat/plugin.rb

The following .ebextensions file will patch this Ruby script to avoid reporting 4xx response codes back to EB. This means that EB will never degrade the environment health due to 4xx errors because it just won't know that they're occurring. This also means that the "Health" page in your EB environment will always display 0 for the 4xx response code count.

container_commands:
    01-patch-healthd:
        command: "sudo /bin/sed -i 's/\\# normalize units to seconds with millisecond resolution/if status \\&\\& status.index(\"4\") == 0 then next end/g' /opt/elasticbeanstalk/lib/ruby/lib/ruby/gems/2.2.0/gems/healthd-appstat-1.0.1/lib/healthd-appstat/plugin.rb"
    02-restart-healthd:
        command: "sudo /usr/bin/kill $(/bin/ps aux | /bin/grep -e '/bin/bash -c healthd' | /usr/bin/awk '{ print $2 }')"
        ignoreErrors: true

Yes, it's a bit ugly, but it gets the job done, at least until the EB team provide a way to ignore 4xx errors via some configuration parameter. Include it with your application when you deploy, in the following path relative to the root directory of your project:

.ebextensions/ignore_4xx.config

Good luck, and let me know if this helped!