Scaling Kubernetes on AWS: Day-2 operations and fleet management



Key Points:
- The scaling triad: Control the interaction between HPA, VPA, and Karpenter to ensure reliable elasticity without metric thrashing.
- FinOps and cost control: Implement spot instance orchestration and automated idle environment teardowns to eliminate cloud waste across large fleets.
- Agentic fleet management: Transition from manual YAML edits to intent-based orchestration to manage thousands of clusters consistently.
Scaling Kubernetes on AWS requires balancing node-level elasticity with pod-level demands. A critical architectural oversight occurs when teams misconfigure the Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA) to compete over the same metrics, causing scaling thrash.
Effective Day-2 operations demand standardizing these configurations across your fleet to prevent runaway cloud costs.
The 1,000-cluster reality: why manual scaling breaks
As enterprises scale from a handful of clusters to hundreds or thousands, manual scaling interventions become a severe operational bottleneck. Platform Architects and SREs face configuration drift, inconsistent RBAC policies, and unpredictable cloud bills. Scaling Kubernetes on AWS is no longer about configuring a single Auto Scaling Group (ASG); it is about fleet-wide governance. This requires agentic automation—where intent-based configurations dictate scaling behavior globally, rather than relying on manual YAML fatigue.
The mechanics of scaling Kubernetes on AWS
Scaling workloads on Amazon EKS requires precise coordination between pod-level controllers and node-level provisioners. The Kubernetes SIG-Autoscaling documentation explicitly warns against overlapping metric targets, yet this misconfiguration remains a frequent cause of cluster instability in enterprise environments.
Horizontal pod autoscaler (HPA) vs vertical pod autoscaler (VPA)
HPA adds more pod replicas as traffic increases, while VPA allocates more CPU or memory to existing pods. A critical failure point occurs when teams apply both HPA and VPA to the same CPU or memory metrics, causing the controllers to fight. Instead, tie HPA to custom metrics (like queue length or request rate) and let VPA handle baseline resource right-sizing.
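One way to keep the two controllers from fighting is to run VPA in recommendation-only mode while HPA owns the replica count. A minimal sketch, assuming the VPA CRDs are installed and a payment-service Deployment exists:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payment-service-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-service
  updatePolicy:
    # "Off" surfaces right-sizing recommendations without evicting pods,
    # so VPA cannot thrash against HPA's replica scaling.
    updateMode: "Off"
```

With VPA limited to recommendations, HPA can safely drive replica counts from an external metric such as SQS queue length: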
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payment-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-service
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: External
    external:
      metric:
        name: sqs_queue_length
      target:
        type: AverageValue
        averageValue: "30"
```
Cluster autoscaling with Karpenter
For node-level scaling, the traditional AWS Cluster Autoscaler relies on rigid Auto Scaling Groups (ASGs). Modern Day-2 operations mandate Karpenter, an open-source node provisioning project built for AWS. Karpenter bypasses ASGs entirely, directly provisioning the exact compute capacity required by pending pods. This reduces scheduling latency from minutes to seconds.
```yaml
# Legacy karpenter.sh/v1alpha5 Provisioner shown here;
# newer Karpenter releases use the karpenter.sh/v1 NodePool API.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["spot", "on-demand"]
  - key: kubernetes.io/arch
    operator: In
    values: ["amd64", "arm64"]
  limits:
    resources:
      cpu: 1000
  providerRef:
    name: default
```
🚀 Real-world proof
When French unicorn Alan hit scaling and reliability limits with AWS Elastic Beanstalk, they used Qovery to abstract Kubernetes complexity and manage 100+ services without losing network control.
⭐ The result: Deployment times dropped from 55 minutes to 8 minutes (an 85% reduction) while eliminating the need for a dedicated infrastructure engineer. Read the Alan case study.
Optimizing networking and infrastructure at scale
Scaling pods rapidly exposes underlying AWS infrastructure limits. The most frequent networking failure at scale is IP address exhaustion within the Amazon VPC Container Network Interface (CNI). As Karpenter spins up new nodes and HPA schedules hundreds of pods, the subnet can run out of available IPv4 addresses, stalling the entire scaling event.
Platform teams must implement Prefix Delegation within the VPC CNI or transition to IPv6 architectures to ensure IP availability during massive scale-out events.
```shell
# Enable Prefix Delegation in the AWS VPC CNI
kubectl set env daemonset aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true
```
Furthermore, integrating the AWS Load Balancer Controller ensures that external traffic routes correctly as new worker nodes come online, bypassing the legacy in-tree cloud providers.
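For illustration, a minimal Ingress that the AWS Load Balancer Controller would reconcile into an ALB. The annotations are from the controller's documented set; the service name is a hypothetical carried over from the earlier examples:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: payment-service
  annotations:
    # Provision an internet-facing ALB and target pod IPs directly,
    # which pairs well with the VPC CNI's pod-level addressing.
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: payment-service
            port:
              number: 80
```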
Enforcing FinOps and cost controls
Scaling without financial governance guarantees cloud waste. Budget and Risk Owners require strict ROI tracking aligned with the FinOps Foundation principles. Effective FinOps for Kubernetes on AWS requires:
- Spot instance orchestration: Utilizing Karpenter to prioritize spot instances for stateless workloads and gracefully handling interruptions via the AWS Node Termination Handler.
- Ephemeral environments: Automatically tearing down development and staging clusters during non-business hours to eliminate idle runtime costs.
- Resource quotas: Enforcing namespace limits via native Kubernetes constructs to prevent individual development teams from consuming the entire cluster capacity.
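The quota point can be enforced with a plain ResourceQuota. A sketch with illustrative caps (the namespace name and values are assumptions to adapt per team):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-payments-quota
  namespace: team-payments
spec:
  hard:
    # Cap aggregate requests and limits across all pods in the namespace
    requests.cpu: "20"
    requests.memory: 64Gi
    limits.cpu: "40"
    limits.memory: 128Gi
    pods: "200"
```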
Intent-based fleet management with Qovery
Managing HPA, Karpenter, and VPC CNI configurations across a single cluster requires high engineering effort; enforcing it across thousands of clusters requires an Agentic Kubernetes Management Platform. Qovery abstracts these infrastructure components into intent-based configurations. Instead of manually applying disparate YAML files across environments via kubectl, SREs define the desired state, and Qovery agents execute the scaling logic globally.
```yaml
# .qovery.yml
application:
  payment-service:
    auto_scaling:
      min_instances: 3
      max_instances: 50
      cpu_threshold: 75
```
By utilizing a Kubernetes management platform like Qovery, platform teams eliminate toil, standardize multi-cluster deployments, and enforce FinOps controls automatically without requiring developers to understand the underlying AWS compute primitives.
Conclusion
Scaling Kubernetes on AWS demands a transition from manual operations to automated, agentic fleet management. By mastering HPA, Karpenter, and strict FinOps practices, enterprises ensure a highly available architecture without spiraling costs or unchecked configuration drift.
FAQs
Q: What is the difference between HPA and VPA in Kubernetes?
A: Horizontal Pod Autoscaler (HPA) adds or removes pod replicas based on metrics like CPU, memory, or external queue lengths. Vertical Pod Autoscaler (VPA) adjusts the CPU and memory requests and limits for existing pods. Using both simultaneously on the same metric causes resource conflicts and scaling thrash.
Q: How does Karpenter improve AWS EKS scaling?
A: Karpenter is a high-performance node provisioning project for Kubernetes on AWS. It bypasses traditional Auto Scaling Groups (ASGs) to provision right-sized compute nodes directly in response to pending pods. This reduces scheduling latency and lowers costs through dynamic instance type selection and spot instance prioritization.
Q: What are the biggest FinOps challenges with scaling Kubernetes fleets?
A: The main FinOps challenges include overprovisioning resources to handle peak loads, running idle development environments outside of business hours, and poor visibility into namespace-level consumption. Utilizing strict resource quotas, automated environment teardowns, and intent-based scaling platforms mitigates these financial risks.
