I have a Network Load Balancer and an associated Target Group that is configured to do health checks on the EC2 instances. The problem is that I am seeing a very high number of health check requests; multiple every second.
The default interval between checks is supposed to be 30 seconds, but they are coming about 100x more frequently than they should.
My stack is built in CloudFormation, and I've tried overriding HealthCheckIntervalSeconds
, which has no effect. Interestingly, when I tried to manually change the interval in the console, I found those values greyed out:
Here is the relevant part of the template, with my attempt at changing the interval commented out:
NLB:
Type: "AWS::ElasticLoadBalancingV2::LoadBalancer"
Properties:
Type: network
Name: api-load-balancer
Scheme: internal
Subnets:
- Fn::ImportValue: PrivateSubnetA
- Fn::ImportValue: PrivateSubnetB
- Fn::ImportValue: PrivateSubnetC
NLBListener:
Type : AWS::ElasticLoadBalancingV2::Listener
Properties:
DefaultActions:
- Type: forward
TargetGroupArn: !Ref NLBTargetGroup
LoadBalancerArn: !Ref NLB
Port: 80
Protocol: TCP
NLBTargetGroup:
Type: AWS::ElasticLoadBalancingV2::TargetGroup
Properties:
# HealthCheckIntervalSeconds: 30
HealthCheckPath: /healthcheck
HealthCheckProtocol: HTTP
# HealthyThresholdCount: 2
# UnhealthyThresholdCount: 5
# Matcher:
# HttpCode: 200-399
Name: api-nlb-http-target-group
Port: 80
Protocol: TCP
VpcId: !ImportValue PublicVPC
My EC2 instances are in private subnets with no access from the outside world. The NLB is internal, so there's no way of accessing them without going through API Gateway. API Gateway has no /healthcheck
endpoint configured, so that rules out any activity coming from outside of the AWS network, like people manually pinging the endpoint.
This is a sample of my app's log taken from CloudWatch, while the app should be idle:
07:45:33 {"label":"Received request URL","value":"/healthcheck","type":"trace"}
07:45:33 {"label":"Received request URL","value":"/healthcheck","type":"trace"}
07:45:33 {"label":"Received request URL","value":"/healthcheck","type":"trace"}
07:45:33 {"label":"Received request URL","value":"/healthcheck","type":"trace"}
07:45:34 {"label":"Received request URL","value":"/healthcheck","type":"trace"}
07:45:34 {"label":"Received request URL","value":"/healthcheck","type":"trace"}
07:45:34 {"label":"Received request URL","value":"/healthcheck","type":"trace"}
07:45:35 {"label":"Received request URL","value":"/healthcheck","type":"trace"}
07:45:35 {"label":"Received request URL","value":"/healthcheck","type":"trace"}
07:45:35 {"label":"Received request URL","value":"/healthcheck","type":"trace"}
I'm getting usually 3 to 6 requests every second, so I'm wondering if this is just the way the Network Load Balancers work, and AWS still haven't documented that (or I haven't found it), or otherwise how I might fix this issue.
Update: this has been answered on the related aws forum post which confirms that it's normal behaviour for network load balancers and cites their distributed nature as the reason. There is no way to configure a custom interval. At this moment, the docs are still out of date and specify otherwise.
This is either a bug in NLB Target Groups, or normal behaviour with incorrect documentation. I've come to this conclusion because:
In this case I think it might be normal behaviour that's been documented incorrectly, but there's no way of verifying that unless someone from AWS can, and it's almost impossible to get an answer for an issue like this on the aws forum.
It would be useful to be able to configure the setting, or at least have the docs updated.