504 Gateway Timeout - Two EC2 instances with load balancer

user3035649 · Oct 24, 2014 · Viewed 36.7k times

This might be the impossible issue. I've tried everything. I feel like there's a guy at a switchboard somewhere, twirling his mustache.

The problem:

I have Amazon EC2 running an application. It functions without issue when there is only one instance and no load balancer.

But in my production environment I have two identical instances running behind one load balancer. When performing certain tasks, like a feature that generates a PDF and attaches it to an email, nothing happens at all. In the Network tab of Google Developer Tools I see the error "504 Gateway Timeout" once the timeout hits (I have it set at 30 seconds).

My database is external, on Amazon RDS.

I think that if I could force a client to stay connected to the server they initially logged in to, this problem would be solved, because it's my understanding that the 504 Gateway Timeout happens when instance-1 tries to reach out to instance-2 to perform the task.

This happens ONLY WHEN using Load Balancing, but never when connecting straight to one of my two servers.

Load Balancer Settings:

  • The load balancer has a CNAME record at my registrar so that app.myapplication.com points to myloadbalancerDNSname.elb.amazonaws.com
  • The load balancer has 2 healthy instances, each in the same region but they are in different availability zones.
  • The load balancer is using the same Security Groups as the Instances (allow ALL IPs on ports 22, 80, and 443)
  • The load balancer has cross-zone load balancing turned on.
  • CORS (in Amazon S3) is enabled to GET, POST, PUT, DELETE from * to * (I have no idea how this is associated with my instances but anyway I did it as the instructions said)
  • The load balancer has listeners configured as such:
    • Load Balancer Protocol: HTTP, Load Balancer Port: 80, Instance Protocol: HTTP, Instance Port: 80
    • Load Balancer Protocol: HTTPS, Load Balancer Port: 443, Instance Protocol: HTTP, Instance Port: 80 (cipher chosen correctly per my cert provider, and SSL fields 100% surely correct)

Some more ideas:

That being said, I'm not testing with HTTPS, but with normal HTTP instead. I'm not convinced SSL is set up properly even though my certificate provider said it is. The reason I'm suspicious is that when I try to open https://app.myapplication.com I get the error "(failed) net::ERR_CONNECTION_CLOSED" in the Network tab of Google Developer Tools. But this should be irrelevant because I'm having the problem even using regular HTTP. I can troubleshoot SSL later.

So to reiterate, I get the "504 Gateway Timeout" error when using certain functions, and occasionally (but rarely) at random when simply loading a page. It happens ONLY WHEN using load balancing, never when connecting straight to one of my two instances.

I don't know which question to ask, because I've followed every document to a T, double and triple checked all suggestions all over the web, and NOTHING.

Answer

Maximus · Dec 31, 2014

What web server are you using? I had a very similar issue with nginx and AWS load balancing. I added keepalive_timeout 75s; to the http block in my nginx config file and haven't seen the issue since.
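For reference, the change described above might look roughly like this in the nginx config. This is only a sketch: the location of your config file and whatever else is in your http block will differ, and the 60 second load balancer idle timeout mentioned in the comment is an assumption about your setup, so check the value configured on your own load balancer.

```nginx
http {
    # Keep idle client connections open for 75 seconds. The idea is for this
    # value to be longer than the load balancer's idle timeout (assumed here
    # to be 60 seconds), so that nginx does not close a connection the load
    # balancer still considers open.
    keepalive_timeout 75s;

    # ... the rest of your existing http block stays unchanged ...
}
```

If your load balancer's idle timeout is set higher than 75 seconds, you would presumably want to raise keepalive_timeout to match.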

Make sure you restart nginx after you add and save that line. On Ubuntu, run sudo service nginx restart. On Red Hat, stop nginx with /path/to/nginx/executable -s stop, then run /path/to/nginx/executable to start it again.

This fix was recommended by AWS on their help page, "AWS Load balancer troubleshooting".