How to achieve zero downtime on Kubernetes: a Day-2 architecture guide



Key points:
- Enforce redundancy globally: Running at least two replicas alongside a strict Pod Disruption Budget (PDB) is the non-negotiable baseline for surviving node failures and cluster maintenance.
- Automate health diagnostics: Liveness probes let the kubelet self-heal broken pods, while readiness probes control traffic routing during seamless rolling updates.
- Abstract the configuration toil: Managing these YAML configurations across thousands of clusters manually destroys engineering velocity. Centralized management platforms automate zero-downtime standards without expanding DevOps headcount.
Pulling a container image and deploying a pod is straightforward. Keeping that application highly available during node failures, traffic spikes, and infrastructure upgrades is a complex engineering challenge.
Kubernetes provides the native primitives required to achieve true zero-downtime deployments, but it does not apply them automatically. Engineering teams must explicitly define how the orchestrator handles traffic routing, health checks, and termination signals.
In this architectural guide, we define the strict Day-2 operational standards required to achieve zero downtime on Kubernetes, and how to scale these configurations across a global fleet.
The 1,000-cluster reality: standardizing zero downtime at scale
Configuring zero downtime for a single application is a routine technical task. Enforcing these configurations across thousands of microservices and hundreds of global clusters is a massive Day-2 operational liability.
Without an automated, centralized control plane, platform engineers must manually define and maintain Pod Disruption Budgets, affinity rules, and custom probes via disparate YAML files. This manual approach inevitably leads to configuration drift, dropped connections during scaling events, and prolonged outages during routine node drains. To survive at an enterprise scale, organizations must abstract this manual configuration away from developers, utilizing an agentic management platform to enforce standard zero-downtime rollouts automatically.
🚀 Real-world proof
Getsafe faced escalating costs, compliance hurdles, and critical downtimes on legacy infrastructure that halted their rapid scaling.
⭐ The result: By utilizing Qovery to abstract their Kubernetes deployments, Getsafe eliminated downtime during critical upgrades, reduced infrastructure costs, and achieved full regulatory compliance. Read the Getsafe case study.
1. Control your container image registries
In a production environment, relying on a public or unauthenticated image registry introduces an immediate single point of failure. If the external registry experiences an outage, or an image tag is overwritten, your cluster will throw an ImagePullBackOff error, halting scaling events and rollbacks.
Enterprise platform teams must synchronize container images to private, dedicated registries hosted within their cloud provider account. A centralized control plane automates this process, ensuring that an unavailable external registry never impacts a live production workload.
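As an illustrative sketch (the registry host, image tag, and Secret name here are hypothetical), a workload can pin its image to a privately mirrored registry and authenticate via an imagePullSecret:

```yaml
# Hypothetical example: image mirrored to a private in-account registry,
# pulled with a pre-created registry-credential Secret.
spec:
  imagePullSecrets:
  - name: private-registry-credentials            # hypothetical Secret name
  containers:
  - name: app
    image: registry.internal.example.com/enterprise-app:1.4.2  # hypothetical mirror
```

Mirroring the image inside the cloud account means a public registry outage or an overwritten tag can no longer block a scaling event or rollback.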
2. High availability through replicas
Relying on a single application instance guarantees downtime. A common misconception is that a single replica survives rolling updates because Kubernetes starts a new instance before shutting down the old one. While true for basic deployments, this rule does not apply to underlying infrastructure failures.
If a node crashes, or the cluster initiates a node drain (such as during an EKS upgrade), a single pod receives a SIGTERM signal and enters a TERMINATING state. The service stops sending traffic, resulting in immediate downtime while the scheduler waits to pull the image and attach disks to a new node. Running a minimum of two replicas is the absolute baseline for high availability.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: enterprise-app
spec:
  replicas: 2
```
3. Enforce pod disruption budgets (PDB)
A PodDisruptionBudget (PDB) limits how many pods of an application can be taken down simultaneously during voluntary disruptions, such as cluster maintenance or upgrades.
If an application runs three replicas, a PDB ensures that at least two pods remain active at all times, preventing the orchestrator from taking the entire service offline simultaneously.
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: standard-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: enterprise-app
```
4. Configure rolling update strategies
Kubernetes offers two primary deployment strategies: Recreate (which forces the application to shut down entirely before starting the new version) and RollingUpdate.
To avoid downtime, RollingUpdate must be applied and tuned using the maxUnavailable and maxSurge parameters. This controls the speed of the deployment, ensuring that enough legacy pods remain active to handle traffic while new pods initialize.
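A minimal sketch of a tuned strategy (the values are illustrative): setting maxUnavailable to 0 guarantees full serving capacity throughout the rollout, while maxSurge allows one extra pod during the transition.

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never drop below the desired replica count
      maxSurge: 1         # allow one extra pod while the new version starts
```

Larger maxSurge values speed up rollouts at the cost of temporarily higher resource consumption on the cluster.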
5. Automate deployment rollbacks
Kubernetes does not natively revert a failed deployment to the previous state. If an application crashes on boot, the rollout simply stalls.
At scale, platform teams must utilize centralized Day-2 platforms or deployment tools (like Helm or ArgoCD) configured with atomic rollbacks. If the health probes of the newly deployed pods fail to return a healthy status within the timeout period, the system must automatically terminate the new pods and restore traffic strictly to the previous stable version.
6. Master liveness and readiness probes
Probes are the diagnostic backbone of zero downtime.
- Liveness probes dictate pod survival. If this probe fails, the kubelet kills the container and restarts it with exponential backoff.
- Readiness probes dictate traffic routing. If this probe fails, the pod remains alive, but the Service immediately stops routing requests to it.
A simple TCP check is insufficient for enterprise environments. Teams must configure custom HTTP endpoints within their applications to accurately reflect database connectivity and cache health.
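As a sketch, assuming the application exposes a custom /ready endpoint (hypothetical path) that verifies database and cache connectivity before reporting healthy:

```yaml
readinessProbe:
  httpGet:
    path: /ready       # hypothetical endpoint checking DB and cache health
    port: 8080
  periodSeconds: 5     # poll frequently to pull unhealthy pods quickly
  failureThreshold: 3  # tolerate brief blips before removing from rotation
```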
7. Tune the initial boot time delay
Heavy enterprise applications (like large Java Spring Boot services) require significant CPU time to initialize before they can accept traffic. If a liveness probe fires before the boot sequence completes, Kubernetes traps the pod in an infinite restart loop.
Use the initialDelaySeconds parameter to allow the application adequate time to boot before the kubelet begins polling for health.
```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 60
```
8. Handle graceful termination (SIGTERM)
Ignoring termination signals results in dropped user connections and corrupted database transactions. When Kubernetes terminates a pod, it sends a SIGTERM signal. The application must be programmed to intercept this signal, finish processing active HTTP requests, close database connections gracefully, and then exit.
If the application ignores the SIGTERM, Kubernetes waits for the terminationGracePeriodSeconds (defaulting to 30 seconds) before executing a hard SIGKILL.
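One common pattern (the durations here are illustrative) pairs an extended grace period with a preStop hook; since the kubelet runs the hook before delivering SIGTERM, the sleep gives load balancers time to deregister the pod while it is still serving:

```yaml
spec:
  terminationGracePeriodSeconds: 60   # extended from the 30-second default
  containers:
  - name: app
    lifecycle:
      preStop:
        exec:
          command: ["sleep", "10"]    # let endpoints/LB deregister before SIGTERM
```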
9. Implement pod anti-affinity
Deploying 50 replicas provides no high availability if the scheduler places all 50 pods on the exact same physical node. Pod anti-affinity forces Kubernetes to distribute replicas across different nodes or availability zones.
- Soft anti-affinity (preferredDuringSchedulingIgnoredDuringExecution) attempts to separate pods, but will group them if node resources are exhausted. This makes it the cost-effective default for FinOps control.
- Hard anti-affinity (requiredDuringSchedulingIgnoredDuringExecution) strictly forbids pods from sharing a node, ensuring absolute isolation at the cost of requiring more underlying infrastructure.
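A sketch of the soft variant, spreading replicas of enterprise-app across nodes (the weight is illustrative):

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: enterprise-app
        topologyKey: kubernetes.io/hostname   # one replica per node where possible
```

Swapping the topologyKey for topology.kubernetes.io/zone spreads replicas across availability zones instead of individual nodes.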
10. Define strict resource requests and limits
Failing to define CPU and memory requests and limits guarantees downtime. Without a memory limit, an application with a memory leak will eventually exhaust node memory and trigger an Out Of Memory (OOM) kill from the Linux kernel. Without CPU requests and limits, a single pod can monopolize node resources, starving critical system daemonsets and causing the node to become unresponsive.
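A minimal sketch with illustrative values; requests reserve capacity for the scheduler, while limits cap what the container may consume:

```yaml
resources:
  requests:
    cpu: 250m        # reserved at scheduling time
    memory: 256Mi
  limits:
    cpu: 500m        # throttled beyond this
    memory: 512Mi    # OOM-killed beyond this
```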
11. Configure horizontal pod autoscaling (HPA)
Autoscaling prevents downtime during severe traffic spikes by dynamically provisioning new replicas based on CPU utilization or custom metrics.
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: enterprise-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: enterprise-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
```
Autoscaling is not magic. It relies entirely on the successful configuration of probes, graceful terminations, and accurate resource requests. By abstracting these 11 configurations into a centralized platform engineering strategy, organizations can scale to thousands of clusters while guaranteeing zero downtime and eliminating manual YAML toil.
FAQs
Q: Why does a single Kubernetes replica cause downtime during node drains?
A: When a cluster initiates a node drain for maintenance or an upgrade, the active pod receives a SIGTERM signal and stops accepting traffic. If there is only one replica, the service experiences immediate downtime while the scheduler waits to provision a replacement pod on a new node. A minimum of two replicas is strictly required to maintain traffic routing during this hardware transition.
Q: What is the difference between a Liveness probe and a Readiness probe?
A: A liveness probe determines if a pod is healthy; if it fails, the kubelet kills and restarts the container. A readiness probe determines if the pod is capable of handling HTTP requests; if it fails, the pod remains alive, but the load balancer automatically stops routing user traffic to it until it recovers.
Q: How does Pod Anti-Affinity prevent Kubernetes outages?
A: If multiple replicas of an application are scheduled on the exact same physical node, a single hardware failure will take down all instances simultaneously. Pod anti-affinity rules force the Kubernetes scheduler to distribute replicas across different nodes or geographic availability zones, isolating the blast radius of hardware crashes.
