How to achieve zero downtime on Kubernetes: a Day-2 architecture guide



Key points:
- Enforce redundancy globally: Running at least two replicas alongside a strict Pod Disruption Budget (PDB) is the non-negotiable baseline for surviving node failures and cluster maintenance.
- Automate health diagnostics: Liveness probes let the kubelet self-heal broken pods, while readiness probes control traffic routing during seamless rolling updates.
- Abstract the configuration toil: Managing these YAML configurations across thousands of clusters manually destroys engineering velocity. Centralized management platforms automate zero-downtime standards without expanding DevOps headcount.
Pulling a container image and deploying a pod is straightforward. Keeping that application highly available during node failures, traffic spikes, and infrastructure upgrades is a complex engineering challenge.
Kubernetes provides the native primitives required to achieve true zero-downtime deployments, but it does not apply them automatically. Engineering teams must explicitly define how the orchestrator handles traffic routing, health checks, and termination signals.
In this architectural guide, we define the strict Day-2 operational standards required to achieve zero downtime on Kubernetes, and how to scale these configurations across a global fleet.
The 1,000-cluster reality: standardizing zero downtime at scale
Configuring zero downtime for a single application is a routine technical task. Enforcing these configurations across thousands of microservices and hundreds of global clusters is a massive Day-2 operational liability.
Without an automated, centralized control plane, platform engineers must manually define and maintain Pod Disruption Budgets, affinity rules, and custom probes via disparate YAML files. This manual approach inevitably leads to configuration drift, dropped connections during scaling events, and prolonged outages during routine node drains. To survive at an enterprise scale, organizations must abstract this manual configuration away from developers, utilizing an agentic management platform to enforce standard zero-downtime rollouts automatically.
🚀 Real-world proof
Getsafe faced escalating costs, compliance hurdles, and critical downtimes on legacy infrastructure that halted their rapid scaling.
⭐ The result: By utilizing Qovery to abstract their Kubernetes deployments, Getsafe eliminated downtime during critical upgrades, reduced infrastructure costs, and achieved full regulatory compliance. Read the Getsafe case study.
1. Control your container image registries
In a production environment, relying on a public or unauthenticated image registry introduces an immediate single point of failure. If the external registry experiences an outage, or an image tag is overwritten, your cluster will throw an ImagePullBackOff error, halting scaling events and rollbacks.
Enterprise platform teams must synchronize container images to private, dedicated registries hosted within their cloud provider account. A centralized control plane automates this process, ensuring that an unavailable external registry never impacts a live production workload.
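As an illustrative sketch (the registry host, image tag, and Secret name here are hypothetical), a workload can pin its image to a privately mirrored registry and authenticate via an imagePullSecret:

```yaml
# Hypothetical example: image mirrored to a private in-account registry,
# pulled with a pre-created registry-credential Secret.
spec:
  imagePullSecrets:
  - name: private-registry-credentials            # hypothetical Secret name
  containers:
  - name: app
    image: registry.internal.example.com/enterprise-app:1.4.2  # hypothetical mirror
```

Mirroring the image inside the cloud account means a public registry outage or an overwritten tag can no longer block a scaling event or rollback.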
2. High availability through replicas
Relying on a single application instance guarantees downtime. A common misconception is that a single replica survives rolling updates because Kubernetes starts a new instance before shutting down the old one. While true for basic deployments, this rule does not apply to underlying infrastructure failures.
If a node crashes, or the cluster initiates a node drain (such as during an EKS upgrade), a single pod receives a SIGTERM signal and enters a TERMINATING state. The service stops sending traffic, resulting in immediate downtime while the scheduler waits to pull the image and attach disks to a new node. Running a minimum of two replicas is the absolute baseline for high availability.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: enterprise-app
spec:
  replicas: 2
```
3. Enforce pod disruption budgets (PDB)
A PodDisruptionBudget (PDB) limits how many pods of an application can be taken down simultaneously during voluntary disruptions, such as cluster maintenance or upgrades.
If an application runs three replicas, a PDB ensures that at least two pods remain active at all times, preventing the orchestrator from taking the entire service offline simultaneously.
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: standard-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: enterprise-app
```
4. Configure rolling update strategies
Kubernetes offers two primary deployment strategies: Recreate (which forces the application to shut down entirely before starting the new version) and RollingUpdate.
To avoid downtime, RollingUpdate must be applied and tuned using the maxUnavailable and maxSurge parameters. This controls the speed of the deployment, ensuring that enough legacy pods remain active to handle traffic while new pods initialize.
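A minimal sketch of a tuned strategy (the values are illustrative): setting maxUnavailable to 0 guarantees full serving capacity throughout the rollout, while maxSurge allows one extra pod during the transition.

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never drop below the desired replica count
      maxSurge: 1         # allow one extra pod while the new version starts
```

Larger maxSurge values speed up rollouts at the cost of temporarily higher resource consumption on the cluster.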
5. Automate deployment rollbacks
Kubernetes does not natively revert a failed deployment to the previous state. If an application crashes on boot, the rollout simply stalls.
At scale, platform teams must utilize centralized Day-2 platforms or deployment tools (like Helm or ArgoCD) configured with atomic rollbacks. If the health probes of the newly deployed pods fail to return a healthy status within the timeout period, the system must automatically terminate the new pods and restore traffic strictly to the previous stable version.
6. Master liveness and readiness probes
Probes are the diagnostic backbone of zero downtime.
- Liveness probes dictate pod survival. If this probe fails, the kubelet kills the container and restarts it with exponential backoff.
- Readiness probes dictate traffic routing. If this probe fails, the pod remains alive, but the Service immediately stops routing requests to it.
A simple TCP check is insufficient for enterprise environments. Teams must configure custom HTTP endpoints within their applications to accurately reflect database connectivity and cache health.
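As a sketch, assuming the application exposes a custom /ready endpoint (hypothetical path) that verifies database and cache connectivity before reporting healthy:

```yaml
readinessProbe:
  httpGet:
    path: /ready       # hypothetical endpoint checking DB and cache health
    port: 8080
  periodSeconds: 5     # poll frequently to pull unhealthy pods quickly
  failureThreshold: 3  # tolerate brief blips before removing from rotation
```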
7. Tune the initial boot time delay
Heavy enterprise applications (like large Java Spring Boot services) require significant CPU time to initialize before they can accept traffic. If a liveness probe fires before the boot sequence completes, Kubernetes traps the pod in an infinite restart loop.
Use the initialDelaySeconds parameter to allow the application adequate time to boot before the kubelet begins polling for health.
```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 60
```
8. Handle graceful termination (SIGTERM)
Ignoring termination signals results in dropped user connections and corrupted database transactions. When Kubernetes terminates a pod, it sends a SIGTERM signal. The application must be programmed to intercept this signal, finish processing active HTTP requests, close database connections gracefully, and then exit.
If the application ignores the SIGTERM, Kubernetes waits for the terminationGracePeriodSeconds (defaulting to 30 seconds) before executing a hard SIGKILL.
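One common pattern (the durations here are illustrative) pairs an extended grace period with a preStop hook; since the kubelet runs the hook before delivering SIGTERM, the sleep gives load balancers time to deregister the pod while it is still serving:

```yaml
spec:
  terminationGracePeriodSeconds: 60   # extended from the 30-second default
  containers:
  - name: app
    lifecycle:
      preStop:
        exec:
          command: ["sleep", "10"]    # let endpoints/LB deregister before SIGTERM
```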
9. Implement pod anti-affinity
Deploying 50 replicas provides no high availability if the scheduler places all 50 pods on the exact same physical node. Pod anti-affinity forces Kubernetes to distribute replicas across different nodes or availability zones.
- Soft anti-affinity (preferredDuringSchedulingIgnoredDuringExecution) attempts to separate pods, but will group them if node resources are exhausted. This makes it the cost-effective default for FinOps control.
- Hard anti-affinity (requiredDuringSchedulingIgnoredDuringExecution) strictly forbids pods from sharing a node, ensuring absolute isolation at the cost of requiring more underlying infrastructure.
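A sketch of the soft variant, spreading replicas of enterprise-app across nodes (the weight is illustrative):

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: enterprise-app
        topologyKey: kubernetes.io/hostname   # one replica per node where possible
```

Swapping the topologyKey for topology.kubernetes.io/zone spreads replicas across availability zones instead of individual nodes.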
10. Define strict resource requests and limits
Failing to define CPU and memory requests and limits guarantees downtime. Without a memory limit, an application with a memory leak will eventually exhaust node memory and trigger an Out Of Memory (OOM) kill from the Linux kernel. Without CPU requests and limits, a single pod can monopolize node resources, starving critical system daemonsets and causing the node to become unresponsive.
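A minimal sketch with illustrative values; requests reserve capacity for the scheduler, while limits cap what the container may consume:

```yaml
resources:
  requests:
    cpu: 250m        # reserved at scheduling time
    memory: 256Mi
  limits:
    cpu: 500m        # throttled beyond this
    memory: 512Mi    # OOM-killed beyond this
```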
11. Configure horizontal pod autoscaling (HPA)
Autoscaling prevents downtime during severe traffic spikes by dynamically provisioning new replicas based on CPU utilization or custom metrics.
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: enterprise-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: enterprise-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
```
Autoscaling is not magic. It relies entirely on the successful configuration of probes, graceful terminations, and accurate resource requests. By abstracting these 11 configurations into a centralized platform engineering strategy, organizations can scale to thousands of clusters while guaranteeing zero downtime and eliminating manual YAML toil.
FAQs
Q: Why does a single Kubernetes replica cause downtime during node drains?
A: When a cluster initiates a node drain for maintenance or an upgrade, the active pod receives a SIGTERM signal and stops accepting traffic. If there is only one replica, the service experiences immediate downtime while the scheduler waits to provision a replacement pod on a new node. A minimum of two replicas is strictly required to maintain traffic routing during this hardware transition.
Q: What is the difference between a Liveness probe and a Readiness probe?
A: A liveness probe determines if a pod is healthy; if it fails, the kubelet kills and restarts the container. A readiness probe determines if the pod is capable of handling HTTP requests; if it fails, the pod remains alive, but the load balancer automatically stops routing user traffic to it until it recovers.
Q: How does Pod Anti-Affinity prevent Kubernetes outages?
A: If multiple replicas of an application are scheduled on the exact same physical node, a single hardware failure will take down all instances simultaneously. Pod anti-affinity rules force the Kubernetes scheduler to distribute replicas across different nodes or geographic availability zones, isolating the blast radius of hardware crashes.
