AWS ECS 503 Service Temporarily Unavailable while deploying

vargen_ · Jul 5, 2017 · Viewed 15.1k times

I am using Amazon Web Services EC2 Container Service (ECS) with an Application Load Balancer (ALB) for my app. When I deploy a new version, I get 503 Service Temporarily Unavailable for about 2 minutes, which is a bit longer than the startup time of my application. This means that I currently cannot do a zero-downtime deployment.

Is there a setting that keeps traffic away from the new tasks while they are starting up? Or am I missing something here?

UPDATE:

The health check numbers for the target group of the ALB are the following:

Healthy threshold:     5
Unhealthy threshold:   2
Timeout:               5 seconds
Interval:              30 seconds
Success codes:         200 OK

Healthy threshold is 'The number of consecutive health checks successes required before considering an unhealthy target healthy'
Unhealthy threshold is 'The number of consecutive health check failures required before considering a target unhealthy.'
Timeout is 'The amount of time, in seconds, during which no response means a failed health check.'
Interval is 'The approximate amount of time between health checks of an individual target'
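
As a rough sanity check on these numbers, the sketch below estimates how long a freshly started target waits before the ALB treats it as healthy. The function name is made up for illustration, and the calculation assumes the app answers every check successfully from the start, so it ignores application startup time.

```python
# Health check settings copied from the target group above.
HEALTHY_THRESHOLD = 5   # consecutive successes required
INTERVAL_SECONDS = 30   # approximate time between checks


def seconds_until_healthy(healthy_threshold, interval_seconds):
    """Return a rough (best case, worst case) wait before a target is marked healthy.

    Best case: the first check fires as soon as the app is ready.
    Worst case: a full interval passes before the first successful check.
    """
    best = (healthy_threshold - 1) * interval_seconds
    worst = healthy_threshold * interval_seconds
    return best, worst


best, worst = seconds_until_healthy(HEALTHY_THRESHOLD, INTERVAL_SECONDS)
print(f"A new target is marked healthy after roughly {best}-{worst} seconds")
```

With a threshold of 5 and an interval of 30 seconds, this comes to roughly two to two and a half minutes, which is in the same range as the 503 window described above.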

UPDATE 2: My cluster consists of two EC2 instances, but it can scale up if needed. The desired and minimum count is 2. I run one task per instance because my app needs a specific port number. Before I deploy (Jenkins runs an AWS CLI script), I set the number of instances to 4. Without this, AWS cannot deploy my new tasks (this is another issue to solve). The networking mode is bridge.

Answer

vargen_ · Jul 20, 2017

The issue turned out to be the port mappings in the container settings of my task definition. Previously I used 80 as the host port and 8080 as the container port. I thought I had to use these values, but the host port can actually be any value. If you set it to 0, ECS assigns a port in the range 32768-61000, which makes it possible to run multiple tasks on one instance. For this to work, I also had to change my security group to allow traffic from the ALB to the instances on these ports.
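
To make the change concrete, here is a minimal sketch that builds the relevant part of a container definition in Python and prints it as JSON. The keys `containerPort`, `hostPort` and `protocol` are the task definition fields; the container name `app` and the helper `port_mapping` are placeholders invented for this example.

```python
import json


def port_mapping(container_port, host_port=0):
    """Build one entry of a task definition's portMappings list.

    host_port=0 asks ECS to pick a free host port (bridge network mode).
    """
    return {
        "containerPort": container_port,
        "hostPort": host_port,
        "protocol": "tcp",
    }


# Old setup: fixed host port, so only one task fits on each instance.
static_mapping = port_mapping(8080, host_port=80)

# New setup: dynamic host port, so several tasks can share an instance.
dynamic_mapping = port_mapping(8080)

container_definition = {
    "name": "app",  # hypothetical container name
    "portMappings": [dynamic_mapping],
}
print(json.dumps(container_definition, indent=2))
```

The printed fragment would go inside `containerDefinitions` in the task definition. The security group change is separate and is not shown here.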
Once ECS can run multiple tasks on the same instance, the 50/200 minimum/maximum healthy percent settings make sense, and a new task revision can be deployed without adding new instances. This also gives a zero-downtime deployment.
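
The arithmetic behind the 50/200 setting can be sketched as follows. The function name is invented for illustration, and the rounding (minimum rounded up, maximum rounded down) reflects my understanding of how ECS applies these percentages, so treat it as an assumption.

```python
def deployment_bounds(desired_count, min_healthy_percent, max_percent):
    """Return (min_running, max_running) task counts during a rolling deployment.

    Assumes the lower bound is rounded up and the upper bound is rounded down.
    Integer arithmetic avoids floating point surprises.
    """
    min_running = -(-desired_count * min_healthy_percent // 100)  # ceiling division
    max_running = desired_count * max_percent // 100              # floor division
    return min_running, max_running


# Desired count of 2 with the 50/200 settings mentioned above.
low, high = deployment_bounds(2, 50, 200)
print(f"During a deploy ECS keeps between {low} and {high} tasks running")
```

With a desired count of 2, ECS may run up to 4 tasks at once. With dynamic host ports the two extra tasks can start on the existing instances before the old ones are stopped.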

Thank you to everybody who asked or commented!