I'm keen to understand exactly what the ELB Latency Statistic provided by CloudWatch means.
According to the docs:
What I'm not 100% clear on is whether the response gets buffered by the ELB before it is transferred to the client.
Does the statement in the docs mean:
Or:
I want to understand whether a poor Maximum Latency CloudWatch metric could be explained by having a significant number of users on ropey 3G connections, or whether it instead indicates an underlying problem with the app servers occasionally responding slowly.
According to AWS support:
As the ELB (when configured with HTTP listeners) acts as a proxy (request headers come in, get validated, and are then sent to the backend), the latency metric starts ticking as soon as the headers are sent to the backend, and stops when the backend sends the first byte of the response.
In the case of POSTs (or any HTTP method where the client is sending additional data), the latency will keep ticking while the client is uploading the data (as the backend needs the complete request before it can respond), and will stop once the backend sends out the first byte of the response. So if you have a slow client sending data, the latency will include the upload time plus the time the backend took to respond.
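So, per AWS support, the metric covers request upload plus backend first-byte time, but not response download time. A minimal sketch of that model (my own illustration, not AWS code; the 8-second upload stands in for a hypothetical slow 3G client POSTing a large body):

```python
def elb_latency(upload_seconds, backend_first_byte_seconds):
    """Model of the ELB Latency metric as described by AWS support:
    time for the client to finish sending the request body, plus the
    time until the backend emits its first response byte. Time spent
    streaming the response back down to the client is NOT included."""
    return upload_seconds + backend_first_byte_seconds

# 99 fast clients, one slow-3G client uploading a POST body for 8 s.
samples = [elb_latency(0.05, 0.10) for _ in range(99)]
samples.append(elb_latency(8.0, 0.10))

average = sum(samples) / len(samples)
maximum = max(samples)
print(f"Average: {average:.3f}s, Maximum: {maximum:.3f}s")
```

Under this model a single slow uploader inflates the Maximum statistic (here to roughly 8.1 s) while the Average barely moves, so a bad Maximum alone cannot distinguish slow clients from slow backends for requests with bodies; for bodiless GETs, though, the upload term is negligible and a high Maximum points at the backend.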