Notes to Self

Alex Sokolsky's Notes on Computers and Programming

4 May 2024

ALB behind an NLB Gotchas

An application load balancer (ALB) is deployed in Amazon Web Services (AWS) behind a network load balancer (NLB). In turn, behind the ALB are a few pods of a Kubernetes cluster:

NLB -> ALB -> pods

The reasons for such an architecture are beyond the scope of this post.

Problem statement: after the initial setup, things worked as expected. Soon(ish), the NLB started reporting the ALB as unhealthy.

Tip: consider the availability zones (AZs).

Details

To understand the reasons for this unexpected behavior, let’s delve into the details:

Right after the installation:

            +-----+         +-----+
============|     |=========|     |============
            |     |         |     |
AZ1         |     o---->----o     o-->--[pod1]
            |     |         |     |
============+     +=========+     +============
            |     |         |     |
AZ2         | NLB o---->----o ALB o-->--[pod2]
            |     |         |     |
============+     +=========+     +============
            |     |         |     |
AZ3         |     |         o     o-->--[pod3]
            |     |         |     |
============|     |=========|     |============
            +-----+         +-----+

Legend:

  o - elastic network interface
  = - AZ boundary

So far so good.

After a while, AZ2 no longer hosts a pod:

            +-----+         +-----+
============|     |=========|     |============
            |     |         |     |
AZ1         |     o---->----o     o-->--[pod1]
            |     |         |     |
============+     +=========+     +============
            |     |         |     |
AZ2         | NLB o---->----o ALB |
            |     |         |     |
============+     +=========+     +============
            |     |         |     |
AZ3         |     |         o     o-->--[pod3]
            |     |         |     |
============|     |=========|     |============
            +-----+         +-----+

Lessons Learned

NLB Settings

NLB should be configured:

Optimize the ALB’s Health Check

Consider the ALB target group first. Its health checks are terminated on the pods:
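A minimal sketch of what such a pod-facing target group could look like, assuming the pods are registered by IP and expose a hypothetical `/healthz` endpoint on port 8080. The resource name, port, path and `var.vpc_id` are illustrative assumptions, not values taken from the actual setup:

```hcl
# Target group the ALB forwards to; its targets are the pods themselves.
# Assumption: var.vpc_id is a hypothetical variable, declared elsewhere,
# holding the ID of the VPC that hosts the cluster.
resource "aws_lb_target_group" "pods_target_group" {
  name        = "pods-target-group"
  port        = 8080 # hypothetical container port
  protocol    = "HTTP"
  target_type = "ip"
  vpc_id      = var.vpc_id

  health_check {
    path     = "/healthz"     # hypothetical health endpoint served by each pod
    port     = "traffic-port" # probe the same port that receives traffic
    protocol = "HTTP"
    matcher  = "200"
  }
}
```

With a check like this, every probe travels through the ALB node to a pod, so the result reflects the health of the pods rather than of the ALB.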

Now let’s talk about the health check for the target group embedding the ALB. If we simply repeat the above health check, the NLB's checks will also be terminated on the pods. A better solution is to terminate them on the ALB itself. To accomplish this, add a special rule to the ALB listener:

#
# respond to the health check from NLB
#
resource "aws_lb_listener_rule" "alb_health_check" {
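  # Assumption: var.alb_listener_arn is a hypothetical variable holding the ARN of the ALB listener.
  listener_arn = var.alb_listener_arn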
  priority     = 555

  action {
    type = "fixed-response"
    fixed_response {
      content_type = "text/plain"
      message_body = "ok"
      status_code  = "200"
    }
  }

  condition {
    path_pattern {
      values = ["/alb-health-check"]
    }
  }
}

For the target group embedding the ALB, use the above path as the health check path:

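resource "aws_lb_target_group" "alb_target_group" {
  # Target group names may contain only alphanumeric characters and hyphens.
  name = "alb-target-group"
  port = 80
  # A target group of type "alb" accepts only the TCP protocol.
  protocol    = "TCP"
  target_type = "alb"
  # Assumption: var.vpc_id is a hypothetical variable holding the VPC ID,
  # which is required for this target type.
  vpc_id = var.vpc_id

  health_check {
    # Answered by the fixed-response rule on the ALB listener, not by the pods.
    path     = "/alb-health-check"
    port     = 80
    protocol = "HTTP"
  }
}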
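
To complete the NLB -> ALB chain, the ALB still has to be registered in this target group, and an NLB listener has to forward to it. A minimal sketch, assuming hypothetical variables `var.alb_arn` and `var.nlb_arn` (declared elsewhere) that hold the ARNs of the two load balancers, and an ALB listener on port 80:

```hcl
# Register the ALB as the single target of the target group.
# Assumption: var.alb_arn is a hypothetical variable with the ALB's ARN.
resource "aws_lb_target_group_attachment" "alb" {
  target_group_arn = aws_lb_target_group.alb_target_group.arn
  target_id        = var.alb_arn
  port             = 80 # must match a listener port on the ALB
}

# Forward TCP traffic arriving at the NLB to the ALB target group.
# Assumption: var.nlb_arn is a hypothetical variable with the NLB's ARN.
resource "aws_lb_listener" "nlb_tcp_80" {
  load_balancer_arn = var.nlb_arn
  port              = 80
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.alb_target_group.arn
  }
}
```

The resource names are illustrative. Only the reference to `aws_lb_target_group.alb_target_group` ties this sketch to the configuration above.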

See Also

tags: aws