Scaling Kubernetes on AWS: Day-2 operations and fleet management



Key Points:
- The scaling triad: Control the interaction between HPA, VPA, and Karpenter to ensure reliable elasticity without metric thrashing.
- FinOps and cost control: Implement spot instance orchestration and automated idle environment teardowns to eliminate cloud waste across large fleets.
- Agentic fleet management: Transition from manual YAML edits to intent-based orchestration to manage thousands of clusters consistently.
Scaling Kubernetes on AWS requires balancing node-level elasticity with pod-level demands. A critical architectural oversight occurs when teams misconfigure the Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA) to compete over the same metrics, causing scaling thrash.
Effective Day-2 operations demand standardizing these configurations across your fleet to prevent runaway cloud costs.
The 1,000-cluster reality: why manual scaling breaks
As enterprises scale from a handful of clusters to hundreds or thousands, manual scaling interventions become a severe operational bottleneck. Platform Architects and SREs face configuration drift, inconsistent RBAC policies, and unpredictable cloud bills. Scaling Kubernetes on AWS is no longer about configuring a single Auto Scaling Group (ASG); it is about fleet-wide governance. This requires agentic automation—where intent-based configurations dictate scaling behavior globally, rather than relying on manual YAML fatigue.
The mechanics of scaling Kubernetes on AWS
Scaling workloads on Amazon EKS requires precise coordination between pod-level controllers and node-level provisioners. The Kubernetes SIG-Autoscaling documentation explicitly warns against overlapping metric targets, yet this misconfiguration remains a frequent cause of cluster instability in enterprise environments.
Horizontal pod autoscaler (HPA) vs vertical pod autoscaler (VPA)
HPA adds more pod replicas as traffic increases, while VPA allocates more CPU or memory to existing pods. A critical failure point occurs when teams apply both HPA and VPA to the same CPU or memory metrics, causing the controllers to fight. Instead, tie HPA to custom metrics (like queue length or request rate) and let VPA handle baseline resource right-sizing.
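One way to keep the two controllers from fighting is to run VPA in recommendation-only mode while HPA owns the replica count. A minimal sketch, assuming the VPA CRDs are installed and a payment-service Deployment exists:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payment-service-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-service
  updatePolicy:
    # "Off" surfaces right-sizing recommendations without evicting pods,
    # so VPA cannot thrash against HPA's replica scaling.
    updateMode: "Off"
```

With VPA limited to recommendations, HPA can safely drive replica counts from an external metric such as SQS queue length: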
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payment-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-service
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: External
    external:
      metric:
        name: sqs_queue_length
      target:
        type: AverageValue
        averageValue: "30"
```
Cluster autoscaling with Karpenter
For node-level scaling, the traditional AWS Cluster Autoscaler relies on rigid Auto Scaling Groups (ASGs). Modern Day-2 operations mandate Karpenter, an open-source node provisioning project built for AWS. Karpenter bypasses ASGs entirely, directly provisioning the exact compute capacity required by pending pods. This reduces scheduling latency from minutes to seconds.
```yaml
# Legacy karpenter.sh/v1alpha5 Provisioner shown here;
# newer Karpenter releases use the karpenter.sh/v1 NodePool API.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["spot", "on-demand"]
  - key: kubernetes.io/arch
    operator: In
    values: ["amd64", "arm64"]
  limits:
    resources:
      cpu: 1000
  providerRef:
    name: default
```
🚀 Real-world proof
When French unicorn Alan hit scaling and reliability limits with AWS Elastic Beanstalk, they used Qovery to abstract Kubernetes complexity and manage 100+ services without losing network control.
⭐ The result: Deployment times dropped from 55 minutes to 8 minutes (an 85% reduction) while eliminating the need for a dedicated infrastructure engineer. Read the Alan case study.
Optimizing networking and infrastructure at scale
Scaling pods rapidly exposes underlying AWS infrastructure limits. The most frequent networking failure at scale is IP address exhaustion within the Amazon VPC Container Network Interface (CNI). As Karpenter spins up new nodes and HPA schedules hundreds of pods, the subnet can run out of available IPv4 addresses, stalling the entire scaling event.
Platform teams must implement Prefix Delegation within the VPC CNI or transition to IPv6 architectures to ensure IP availability during massive scale-out events.
```shell
# Enable Prefix Delegation in the AWS VPC CNI
kubectl set env daemonset aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true
```
Furthermore, integrating the AWS Load Balancer Controller ensures that external traffic routes correctly as new worker nodes come online, bypassing the legacy in-tree cloud providers.
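For illustration, a minimal Ingress that the AWS Load Balancer Controller would reconcile into an ALB. The annotations are from the controller's documented set; the service name is a hypothetical carried over from the earlier examples:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: payment-service
  annotations:
    # Provision an internet-facing ALB and target pod IPs directly,
    # which pairs well with the VPC CNI's pod-level addressing.
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: payment-service
            port:
              number: 80
```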
Enforcing FinOps and cost controls
Scaling without financial governance guarantees cloud waste. Budget and Risk Owners require strict ROI tracking aligned with the FinOps Foundation principles. Effective FinOps for Kubernetes on AWS requires:
- Spot instance orchestration: Utilizing Karpenter to prioritize spot instances for stateless workloads and gracefully handling interruptions via the AWS Node Termination Handler.
- Ephemeral environments: Automatically tearing down development and staging clusters during non-business hours to eliminate idle runtime costs.
- Resource quotas: Enforcing namespace limits via native Kubernetes constructs to prevent individual development teams from consuming the entire cluster capacity.
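The quota point can be enforced with a plain ResourceQuota. A sketch with illustrative caps (the namespace name and values are assumptions to adapt per team):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-payments-quota
  namespace: team-payments
spec:
  hard:
    # Cap aggregate requests and limits across all pods in the namespace
    requests.cpu: "20"
    requests.memory: 64Gi
    limits.cpu: "40"
    limits.memory: 128Gi
    pods: "200"
```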
Intent-based fleet management with Qovery
Managing HPA, Karpenter, and VPC CNI configurations across a single cluster requires high engineering effort; enforcing it across thousands of clusters requires an Agentic Kubernetes Management Platform. Qovery abstracts these infrastructure components into intent-based configurations. Instead of manually applying disparate YAML files across environments via kubectl, SREs define the desired state, and Qovery agents execute the scaling logic globally.
```yaml
# .qovery.yml
application:
  payment-service:
    auto_scaling:
      min_instances: 3
      max_instances: 50
      cpu_threshold: 75
```
By utilizing a Kubernetes management platform like Qovery, platform teams eliminate toil, standardize multi-cluster deployments, and enforce FinOps controls automatically without requiring developers to understand the underlying AWS compute primitives.
Conclusion
Scaling Kubernetes on AWS demands a transition from manual operations to automated, agentic fleet management. By mastering HPA, Karpenter, and strict FinOps practices, enterprises ensure a highly available architecture without spiraling costs or unchecked configuration drift.
FAQs
Q: What is the difference between HPA and VPA in Kubernetes?
A: Horizontal Pod Autoscaler (HPA) adds or removes pod replicas based on metrics like CPU, memory, or external queue lengths. Vertical Pod Autoscaler (VPA) adjusts the CPU and memory requests and limits for existing pods. Using both simultaneously on the same metric causes resource conflicts and scaling thrash.
Q: How does Karpenter improve AWS EKS scaling?
A: Karpenter is a high-performance node provisioning project for Kubernetes on AWS. It bypasses traditional Auto Scaling Groups (ASGs) to provision right-sized compute nodes directly in response to pending pods. This reduces scheduling latency and lowers costs through dynamic instance type selection and spot instance prioritization.
Q: What are the biggest FinOps challenges with scaling Kubernetes fleets?
A: The main FinOps challenges include overprovisioning resources to handle peak loads, running idle development environments outside of business hours, and poor visibility into namespace-level consumption. Utilizing strict resource quotas, automated environment teardowns, and intent-based scaling platforms mitigates these financial risks.
