ALB behind an NLB: Gotchas
An application load balancer (ALB) is deployed in Amazon Web Services (AWS) behind a network load balancer (NLB). In turn, behind the ALB are a few pods of a Kubernetes cluster:
NLB -> ALB -> pods
The reasons for such an architecture are beyond the scope of this post.
Problem statement: after the initial setup, things worked as expected. Soon(ish), however, the NLB started reporting the ALB as unhealthy.
Tip: consider the availability zones (AZs).
Details
To understand the reasons for this unexpected behavior, let's delve into the details:
- assume the NLB and the ALB handle only HTTP traffic on port 80;
- there are 3 AZs;
- the NLB was configured (by mistake) to reside only in AZ1 and AZ2;
- there are 3 pods, randomly distributed between the AZs; initially, one pod per AZ;
- the NLB has the default settings, with cross-AZ load balancing disabled. Hence each of the NLB's ENIs is connected only to the ALB's ENI in the same AZ (see the sketch after this list).
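A minimal Terraform sketch of this misconfiguration, with hypothetical resource and subnet names:
#
# the NLB as initially (mis)configured: only two AZs,
# cross-AZ load balancing left at its default (disabled)
#
resource "aws_lb" "nlb" {
  name               = "nlb"
  load_balancer_type = "network"

  # subnets in AZ1 and AZ2 only -- AZ3 is missing by mistake
  subnets = [aws_subnet.az1.id, aws_subnet.az2.id]

  # enable_cross_zone_load_balancing defaults to false for NLBs
}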
Right after the installation:
            +-----+         +-----+
============|     |=========|     |============
            |     |         |     |
AZ1         |     o---->----o     o-->--[pod1]
            |     |         |     |
============+     +=========+     +============
            |     |         |     |
AZ2         | NLB o---->----o ALB o-->--[pod2]
            |     |         |     |
============+     +=========+     +============
            |     |         |     |
AZ3         |     |         o     o-->--[pod3]
            |     |         |     |
============|     |=========|     |============
            +-----+         +-----+
Legend:
o - elastic network interface
= - AZ boundary
So far so good.
After a while:
- Pods can come and go as needed in different AZs. Say there are no longer any pods in AZ2.
- An AWS ALB optimization process kicks in, removing the ALB's ENIs in AZs that no longer contain targets, in this case AZ2.
- From the NLB's perspective, the health check for the ALB in AZ2 now fails.
            +-----+         +-----+
============|     |=========|     |============
            |     |         |     |
AZ1         |     o---->----o     o-->--[pod1]
            |     |         |     |
============+     +=========+     +============
            |     |         |     |
AZ2         | NLB o---->----o ALB |
            |     |         |     |
============+     +=========+     +============
            |     |         |     |
AZ3         |     |         o     o-->--[pod3]
            |     |         |     |
============|     |=========|     |============
            +-----+         +-----+
Lessons Learned
NLB Settings
The NLB should be configured:
- to have a presence in all the AZs;
- to perform cross-AZ load balancing (see the sketch below).
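Compared with the misconfigured sketch above, two things change; the resource and subnet names remain hypothetical:
#
# the NLB as it should be: present in all AZs,
# with cross-AZ load balancing enabled
#
resource "aws_lb" "nlb" {
  name               = "nlb"
  load_balancer_type = "network"

  # one subnet per AZ -- all three this time
  subnets = [aws_subnet.az1.id, aws_subnet.az2.id, aws_subnet.az3.id]

  # disabled by default for NLBs; enable it explicitly
  enable_cross_zone_load_balancing = true
}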
Optimize the ALB’s Health Check
Consider the ALB target group first. Its health checks are terminated on the pods (a sketch follows the list):
- protocol: HTTP
- port: 80
- path: /
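As a sketch, assuming the pods are registered by IP (e.g. by the AWS Load Balancer Controller) in a VPC referenced by a hypothetical var.vpc_id:
#
# target group whose health checks terminate on the pods
#
resource "aws_lb_target_group" "pods" {
  name        = "pods"
  port        = 80
  protocol    = "HTTP"
  target_type = "ip"       # pods registered by their IP addresses (assumption)
  vpc_id      = var.vpc_id # hypothetical variable

  health_check {
    protocol = "HTTP"
    port     = 80
    path     = "/"
  }
}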
Now let's talk about the health check for the target group embedding the ALB. If we simply repeat the health check above, the checks will pass through the ALB and be terminated on the pods. A better solution is to terminate the health checks on the ALB itself. To accomplish this, add a dedicated rule to the ALB listener:
- path: /alb-health-check
- return a fixed response: 200
#
# respond to the health check from the NLB
#
resource "aws_lb_listener_rule" "alb_health_check" {
  # the listener resource name "http" is an assumption
  listener_arn = aws_lb_listener.http.arn
  priority     = 555

  action {
    type = "fixed-response"

    fixed_response {
      content_type = "text/plain"
      message_body = "ok"
      status_code  = "200"
    }
  }

  condition {
    path_pattern {
      values = ["/alb-health-check"]
    }
  }
}
For the target group embedding the ALB, use the above path in its health check:
- protocol: HTTP
- port: 80
- path: /alb-health-check
resource "aws_lb_target_group" "alb_target_group" {
name = "alb_target_group"
port = 80
protocol = "HTTP"
target_type = "alb"
health_check {
path = "/alb-health-check"
port = 80
protocol = "HTTP"
}
}
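Finally, the ALB must be registered as the target of this group. A minimal sketch, assuming the ALB lives in the same Terraform configuration under the hypothetical name aws_lb.alb:
#
# attach the ALB to the NLB-facing target group
#
resource "aws_lb_target_group_attachment" "alb" {
  target_group_arn = aws_lb_target_group.alb_target_group.arn
  target_id        = aws_lb.alb.arn # hypothetical ALB resource name
  port             = 80             # must match the ALB listener port
}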