Everything I Wanted To Know About Kubernetes Autoscaling

Master the three dimensions of Kubernetes scaling: HPA, VPA, and Cluster Autoscaling. Learn why CPU scales easily, while memory requires architecture changes to avoid bottlenecks.
March 6, 2026
Pierre Mavro
CTO & Co-founder

Key Points

  • Three Dimensions of Scaling: Effective autoscaling requires balancing Horizontal (Pod count), Vertical (Resource size), and Cluster (Node count) scaling simultaneously to avoid bottlenecks.
  • The CPU vs. Memory Split: Scaling horizontally on CPU is straightforward, but memory-intensive applications often require an architectural redesign to distribute their workload efficiently.
  • Boot Delay Realities: Autoscaling is not instantaneous; you must account for Node boot times (avg. 2 mins on AWS) and Image pull times (which can take minutes for multi-GB images).

Kubernetes is today the most well-known container scheduler used by thousands of companies. Being able to quickly and automatically scale your application is standard nowadays; however, knowing how to do it well is another topic.

In this article, we'll cover how pod autoscaling works, how it can be used, and a specific Qovery internal use case.

There are three kinds of scaling:

  • Horizontal scaling: Your application can have multiple instances. As soon as you need to support more workload, new instances will pop up to handle it. Scaling ends when the limit you've set has been reached or when no more nodes can be used to support your workload.
  • Vertical scaling: Your application cannot run in parallel, so leveraging the current resources is the way to scale. Scaling issues occur when you reach the physical machine limits. Being able to have multiple instances with vertical scaling is possible but rare.
  • Multi-dimensional scaling: Less frequent, it combines horizontal and vertical scaling simultaneously. It's also more complex to manage because defining when to scale horizontally or vertically depends on many parameters.
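In Kubernetes, horizontal scaling maps to the HorizontalPodAutoscaler (HPA) and vertical scaling to the Vertical Pod Autoscaler (VPA). As a minimal sketch, an HPA scaling a hypothetical `my-api` Deployment on CPU utilization looks like this:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-api              # hypothetical Deployment name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # add replicas above 70% average CPU
```

The thresholds here are placeholders; the right values depend entirely on your application's behavior under load, which is exactly what the next section is about.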

Know your application limits and bottleneck

Applying autoscaling on the CPU doesn't always work, because your application's limits may not be (only) on the CPU side. First, identify where your application is constrained:

  • CPU: Calculation, compression, map reduce...
  • Memory: Cache data, store to then compute...
  • Disk: Store big data which can't fit into memory, flush on disk...
  • Network: Request external data (database, images, videos...), API calls...

If you wrote the application, you should already know where the first bottleneck will appear. If you don't, load test the application to find the resources where contention occurs. For a REST API, you can use existing tools to saturate the service with HTTP calls and observe which resource your application struggles with first.
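For example, a quick saturation run with an off-the-shelf HTTP load generator such as `hey` (the URL and numbers below are placeholders, not a tuned benchmark):

```shell
# 50 concurrent workers hammering the endpoint for 60 seconds;
# watch CPU, memory, disk, and network on the pods while it runs.
hey -z 60s -c 50 https://staging.example.com/api/endpoint
```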

It's essential to load test in the same conditions as production—thanks to Qovery, cloning an environment instantly is easy! Once you have results, consider these rules for autoscaling:

  • CPU: Scaling horizontally on CPU is generally one of the easiest ways to scale.
  • Memory: Memory can only scale vertically by design. If your application works this way, re-architecting it to distribute work across several instances is the way to scale.
  • Disk: Local disk is the most performant, but storing on a shared drive is preferable if you care less about performance and more about data availability across nodes.
  • Network: Scaling horizontally is common, but defining the metric (connection number, latency, throughput) may not be easy.
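As an illustration of the memory point: when re-architecting isn't yet possible, the Vertical Pod Autoscaler add-on can grow resource requests automatically instead. A minimal sketch, assuming a hypothetical `my-worker` Deployment and the VPA components installed in the cluster:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-worker           # hypothetical Deployment name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-worker
  updatePolicy:
    updateMode: "Auto"      # VPA evicts and recreates pods with new requests
```

Note that VPA in `Auto` mode restarts pods to apply new requests, so it only postpones the re-architecture; it doesn't remove the single-instance ceiling.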

Qovery use case

At Qovery, the Qovery Engine is a good example. The Engine itself doesn't consume much CPU or memory, but when it runs Terraform, each Terraform process can consume 500+ MB. To size pod instances correctly, we limit the number of parallel Terraform runs in a single instance to avoid Out Of Memory (OOM) kills.

Because memory is the bottleneck, we distribute the workload horizontally across several Engines. We created a custom metric based on the number of tasks an Engine executes in parallel: as soon as an Engine is busy with a task, we scale up so new requests are handled immediately.

We implemented a metric in the Engine application:

use lazy_static::lazy_static;
use prometheus::{register_int_gauge, IntGauge};

lazy_static! {
    // Registered in the default Prometheus registry and exposed for scraping.
    static ref METRICS_NB_RUNNING_TASKS: IntGauge = register_int_gauge!(
        "taskmanager_nb_running_tasks",
        "Number of tasks currently running"
    )
    .unwrap();
}
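Using the gauge is then straightforward: increment when a task starts, decrement when it finishes. Here is a self-contained sketch of that pattern, using a plain atomic counter in place of the prometheus `IntGauge` so it runs without external crates (the task logic is hypothetical, not the actual Engine code):

```rust
use std::sync::atomic::{AtomicI64, Ordering};

// Stand-in for the prometheus IntGauge: number of tasks currently running.
static NB_RUNNING_TASKS: AtomicI64 = AtomicI64::new(0);

fn run_task(task_id: u32) -> u32 {
    NB_RUNNING_TASKS.fetch_add(1, Ordering::SeqCst); // task starts
    let result = task_id * 2;                        // hypothetical work
    NB_RUNNING_TASKS.fetch_sub(1, Ordering::SeqCst); // task finishes
    result
}

fn main() {
    let result = run_task(21);
    println!(
        "result={} running={}",
        result,
        NB_RUNNING_TASKS.load(Ordering::SeqCst)
    );
}
```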

Prometheus is configured to scrape Qovery metrics every 30s:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/instance: qovery-engine
  name: qovery-engine
  namespace: qovery-prod
spec:
  endpoints:
  - interval: 30s 
    port: metrics
    scrapeTimeout: 5s
  namespaceSelector:
    matchNames:
    - qovery-prod
  selector:
    matchLabels:
      app.kubernetes.io/instance: qovery-engine

Then, with the Prometheus Adapter, we act on the Pod autoscaler:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: qovery-engine
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: qovery-engine
  minReplicas: 1
  maxReplicas: 30
  metrics:
    - type: Pods
      pods:
        metric:
          name: taskmanager_nb_running_tasks
        target:
          type: AverageValue
          averageValue: 0.5
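For completeness, the Prometheus Adapter also needs a rule that publishes the series through the custom metrics API before the HPA can consume it. A hedged sketch of such a rule (the label names are assumptions about how the metric is scraped, not Qovery's actual adapter config):

```yaml
rules:
- seriesQuery: 'taskmanager_nb_running_tasks{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    as: "taskmanager_nb_running_tasks"
  metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```

With `averageValue: 0.5`, the HPA keeps roughly twice as many Engine pods as running tasks: as soon as the average exceeds 0.5 tasks per pod, a replica is added, so there is always a free Engine ready for the next request.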

Enhance autoscaling pod boot time

Autoscaling pods can take some time for several reasons:

  1. Boot node: If resources are full, Kubernetes creates a new node (average 2 min on AWS).
  2. Boot pod (pull image): Pulling multi-GB images can take minutes.
  3. Application boot delay: varies by application (e.g., JVM-based apps need warm-up time).

We eliminate points 1 and 2 with overprovisioning pods: low-priority placeholders that keep spare nodes warm and images pulled. We use a priority class with a value of -1:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: qovery-engine-overprovisioning
value: -1
globalDefault: false

The Deployment for these preemptible placeholder pods ensures real Engine pods replace them instantly, because the image is already pulled and the node capacity is already allocated:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: qovery-engine-overprovisioning
spec:
  replicas: {{ .Values.overprovisioning.replicas }}
  selector:
    matchLabels:
      app: qovery-engine-overprovisioning
  template:
    metadata:
      labels:
        app: qovery-engine-overprovisioning
    spec:
      priorityClassName: qovery-engine-overprovisioning
      containers:
      - name: qovery-engine-overprovisioning
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
        command: ["/bin/sh", "-c", "tail -f /dev/null"]

Conclusion

Autoscaling is not magic. Kubernetes helps, but the most important thing is knowing your application's limits and bottlenecks. Taking the time to test, validate, and regularly load test your app is crucial to success.

