Our migration from Kubernetes built-in NLB to ALB controller

A standard trap is trusting the in-tree Kubernetes service load balancer on AWS. When you delete a Service of type LoadBalancer, the in-tree controller frequently fails to delete the underlying AWS resources. You end up with dozens of orphaned Network Load Balancers silently racking up massive cloud bills. Transitioning to the out-of-tree AWS Load Balancer Controller is mandatory to stop the bleeding.
April 17, 2026
Pierre Mavro
CTO & Co-founder

Key Points:

  • The maintenance dead-end: The built-in Kubernetes NLB integration is legacy code. AWS does not actively maintain it, leading to unresolved bugs and orphaned infrastructure.
  • Feature limitations: The in-tree controller cannot handle modern networking requirements like PROXY protocol IP preservation or fine-grained target group attributes.
  • Migration hazards: Switching controllers provisions an entirely new load balancer with a new DNS name. Managing this DNS crossover without dropping traffic requires strict routing governance.

Working with Kubernetes Services is convenient, especially when you can deploy Load Balancers via cloud providers simply by declaring type: LoadBalancer.

At Qovery, our orchestration engine initially relied on the Kubernetes built-in Network Load Balancer (NLB). It seemed like the rational choice for maintaining cloud-agnostic deployments without adding extra dependencies.

The reality of Day-2 operations proved otherwise. We were forced to migrate to the AWS Load Balancer Controller (ALB Controller) to simplify management, stop billing leaks, and gain access to necessary routing features. If you are operating Amazon EKS clusters in production, moving to the out-of-tree controller from day one is non-negotiable.

The 1,000-cluster reality: why in-tree controllers fail at scale

Relying on the default Kubernetes load balancer works perfectly in a local development cluster. At enterprise scale, across thousands of clusters, the legacy in-tree cloud provider becomes a massive financial and operational liability. An orphaned load balancer on a single cluster is an annoyance.

Across a fleet of hundreds of Amazon EKS clusters, orphaned load balancers generate thousands of dollars in cloud waste every month. Resolving this requires migrating to the AWS Load Balancer Controller and using an Agentic Kubernetes Management Platform to enforce strict, standardized ingress configurations globally.


Why we started with the in-tree NLB controller

For our customers and many platform engineers, the built-in NLB is the default choice because it ships natively with Kubernetes.

  • Kubernetes native: It uses native objects, reducing the need for deep AWS-specific knowledge.
  • Cloud-agnostic intent: It theoretically makes it easier to migrate to other cloud providers without rewriting complex ingress manifests. As a platform managing multi-cloud deployments, we must maintain transparency for our customers.
  • Low initial overhead: It requires zero additional Helm charts or IAM roles to install.

The operational cost of legacy code

Migration to the ALB Controller came four years after we initially adopted the built-in NLB. We survived without it for a long time, but the technical debt eventually compounded into critical failures.

We began facing severe infrastructure leaks. When a developer deleted an environment, the Kubernetes Service was removed, but the underlying AWS Network Load Balancer was not cleaned up correctly. AWS support confirmed they were no longer prioritizing fixes for the in-tree load balancer code, directing everyone to use their out-of-tree AWS Load Balancer Controller instead.

When you use the Kubernetes built-in NLB, you are entirely on your own. We had to manually instrument our Rust-based Qovery Engine to hunt down and delete orphaned AWS resources via the AWS API to enforce Kubernetes cost optimization.

// Fix for NLBs not properly removed by the legacy in-tree controller.
pub fn clean_up_deleted_k8s_nlb(
    event_details: EventDetails,
    target: &DeploymentTarget,
) -> Result<(), Box<EngineError>> {
    // Custom logic to force-delete orphaned AWS load balancers
    // via the AWS API and prevent silent billing leaks.
    // ...
    Ok(())
}
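Conceptually, the cleanup is a diff: list the NLBs tagged as belonging to the cluster, then delete any whose owning Service no longer exists. A self-contained sketch of that matching step (the function name and tag layout are illustrative, not the actual Qovery Engine code, assuming the `kubernetes.io/service-name` tag that the in-tree controller sets on each load balancer):

```rust
use std::collections::HashSet;

// Hypothetical sketch: each NLB carries a tag naming the Service that
// created it ("namespace/name"). Orphans are the NLBs whose tag no
// longer matches any Service still present in the cluster.
pub fn find_orphaned_nlbs(
    nlb_service_tags: &[(&str, &str)], // (nlb_name, "namespace/service" tag)
    live_services: &HashSet<&str>,     // Services currently in the cluster
) -> Vec<String> {
    nlb_service_tags
        .iter()
        .filter(|(_, service)| !live_services.contains(service))
        .map(|(nlb_name, _)| nlb_name.to_string())
        .collect()
}
```

The actual deletion then walks this list and calls the AWS API for each entry, which is what the engine function above does.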

Feature gaps forced the migration

Beyond the bugs, we needed to leverage advanced AWS networking features that the built-in controller simply ignores. Moving to the AWS Load Balancer Controller provided access to critical annotations:

  • PROXY protocol support: service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*". This annotation preserves the client source IP address, which is mandatory for strict security auditing and rate limiting.
  • Direct pod routing: service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip". This bypasses kube-proxy and routes traffic directly to the pod IP addresses, reducing network hops and lowering latency.
  • Target group attributes: service.beta.kubernetes.io/aws-load-balancer-target-group-attributes. This allows fine-tuned control over the AWS target groups, such as tuning the deregistration delay or enabling sticky sessions directly from the Kubernetes manifest.

A minimal Service wired to the new controller looks like this:
apiVersion: v1
kind: Service
metadata:
  name: api-gateway
  namespace: production
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
spec:
  type: LoadBalancer
  selector:
    app: api-gateway
  ports:
    - port: 443
      targetPort: 8443
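Where client IP preservation or target group tuning is needed, the annotations from the list above layer onto the same Service. A sketch (the attribute values shown are illustrative defaults, not recommendations):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api-gateway
  namespace: production
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
    service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: "deregistration_delay.timeout_seconds=30"
spec:
  type: LoadBalancer
  selector:
    app: api-gateway
  ports:
    - port: 443
      targetPort: 8443
```

Note that PROXY protocol only works if the backend (NGINX, Envoy, etc.) is also configured to parse the PROXY header, otherwise connections will fail.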

The deployment hazard you must anticipate

When you migrate an existing Service from the in-tree controller to the AWS Load Balancer Controller, things will break if you are not careful.

The biggest failure point is DNS routing. The new controller provisions an entirely new load balancer with a completely new AWS DNS name. If you simply update your Service annotations on a live production deployment, Kubernetes will detach the old load balancer and spin up the new one. Because your external DNS (like Route53 or Cloudflare) still points to the old load balancer name, you will drop 100% of your incoming traffic while you wait for the new DNS records to propagate.

You must provision the new Service alongside the old one, update your DNS CNAME records, wait out the TTL expiration, and only then decommission the legacy Service.
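In practice that means running two Services side by side during the crossover. A sketch of the transitional state (names are illustrative):

```yaml
# New Service managed by the AWS Load Balancer Controller,
# deployed alongside the legacy one during the DNS crossover.
apiVersion: v1
kind: Service
metadata:
  name: api-gateway-v2
  namespace: production
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
spec:
  type: LoadBalancer
  selector:
    app: api-gateway        # same pods as the legacy Service
  ports:
    - port: 443
      targetPort: 8443
# Then: point the CNAME at the new load balancer's DNS name,
# wait out the record's TTL, and only then delete the legacy Service.
```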


Intent-based ingress with Qovery

Installing the AWS Load Balancer Controller requires configuring strict AWS IAM roles for Service Accounts (IRSA), deploying the Helm chart, and managing webhook certificates. Doing this manually across thousands of clusters introduces massive configuration drift.
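For reference, the manual setup boils down to an IRSA-annotated service account plus the official Helm chart. A hedged sketch of the Helm values (the cluster name and role ARN are placeholders you must substitute):

```yaml
# values.yaml for the aws-load-balancer-controller Helm chart
clusterName: my-eks-cluster   # placeholder: your EKS cluster name
serviceAccount:
  create: true
  name: aws-load-balancer-controller
  annotations:
    # placeholder: IAM role created for IRSA with the controller's policy attached
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/AWSLoadBalancerControllerRole
```

Repeating this, plus the IAM policy and webhook certificate management, identically across a fleet is where the drift creeps in.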

Qovery abstracts this complexity. As an Agentic Kubernetes Management Platform, Qovery natively handles the AWS Load Balancer Controller lifecycle across your Amazon EKS fleet.

# .qovery.yml
application:
  api-gateway:
    build_mode: docker
    ports:
      - internal_port: 8443
        publicly_accessible: true
        routing_type: custom_domain

Instead of fighting raw Kubernetes annotations and Terraform state files, platform teams declare their routing intent.

Qovery provisions the correct load balancers, attaches the target groups, and configures the networking automatically. This eliminates cost leaks from orphaned resources and keeps your ingress layer continuously maintained.

FAQs

Why did AWS stop maintaining the in-tree Kubernetes load balancer?

The Kubernetes community mandated moving all cloud-specific provider code out of the core Kubernetes repository to reduce bloat and separate release cycles. AWS shifted all development focus to the out-of-tree AWS Load Balancer Controller, leaving the built-in controller as legacy code that receives no new features or non-critical bug fixes.

What happens when you delete an in-tree LoadBalancer Service on Amazon EKS?

Due to unpatched bugs in the legacy in-tree controller, deleting the Kubernetes Service frequently fails to trigger the deletion of the corresponding AWS Network Load Balancer. This leaves orphaned load balancers running in your AWS account, quietly consuming your cloud budget until you manually audit and delete them via the AWS console.

How do I migrate to the AWS Load Balancer Controller without downtime?

Migrating a Service to the new controller provisions a completely new AWS load balancer with a different DNS name. To avoid downtime, you must deploy the new Service alongside the old one, update your DNS CNAME records to point to the new load balancer, wait for the DNS TTL to expire globally, and then delete the legacy Service.
