The evolution of Kubernetes mTLS: from manual logic to Ambient Mesh



Many mature, security-conscious companies want mTLS-encrypted traffic across their clusters, and they often look for solutions that avoid the typical complexity of a service mesh. This is a reasonable request, and several solutions exist, each with distinct trade-offs. This post explores these options.
What is mTLS?
Mutual TLS (mTLS) extends standard TLS (the protocol behind HTTPS) so that both peers authenticate each other with certificates, instead of only the client verifying the server.
While uncommon for public web browsers, mTLS is crucial for securing internal infrastructure (e.g., Kubernetes clusters). It provides the core properties necessary for a zero trust security model:
- Authenticity: Peers verify each other's identities.
- Confidentiality: Data is protected from eavesdroppers.
- Integrity: Data cannot be tampered with in transit.
These guarantees are often needed to satisfy internal security policies, the requirements of banking transaction systems, and governmental compliance standards (e.g., FIPS).
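To make the "mutual" part concrete, here is a minimal sketch using Python's standard `ssl` module. The function names and the optional file-path parameters are assumptions for illustration only. The important detail is that the server context sets `verify_mode` to `CERT_REQUIRED`, which is what turns ordinary TLS into mutual TLS.

```python
import ssl


def server_context(cert=None, key=None, ca=None):
    """Build a server-side context that demands a client certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # With plain TLS the server leaves this at CERT_NONE. Requiring a
    # certificate from the client is what makes the handshake mutual.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca:
        # CA bundle used to verify client certificates (hypothetical path).
        ctx.load_verify_locations(cafile=ca)
    if cert:
        # The server's own certificate and private key (hypothetical paths).
        ctx.load_cert_chain(certfile=cert, keyfile=key)
    return ctx


def client_context(cert=None, key=None, ca=None):
    """Build a client-side context that verifies the server and can present its own certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca)
    if cert:
        # The client's certificate, presented when the server asks for it.
        ctx.load_cert_chain(certfile=cert, keyfile=key)
    return ctx
```

In a real deployment both sides would load certificates issued by a shared internal CA. The paths are left optional here so the sketch runs without any files on disk.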
Solutions
Manual integration
The most straightforward, or "traditional", way to implement mutual TLS is to modify every application so that it handles its own certificate management. At Qovery we’ve seen multiple customers implement mTLS this way. While it works well, it’s time-consuming and takes a lot of effort to keep under control!
This approach quickly becomes difficult for several reasons, scaling from tricky in small deployments to extremely challenging in large, diverse environments.
Provisioning certificates manually is a massive undertaking:
- PKI Overhead: You must define a naming convention, establish and distribute roots of trust (Certificate Authority), sign certificates, and manage their storage.
- Rotation: You must have a mechanism in place that automatically and reliably rotates certificates before they expire, to prevent critical outages (the dreaded "2 AM alert").
- Integration: While tools like cert-manager or SPIRE exist to help, most large organizations need to integrate with pre-existing, non-Kubernetes Public Key Infrastructure (PKI), making the implementation even more complex.
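As an illustration of the rotation concern above, a renewal check can be sketched as follows. The function name and the default of renewing after two thirds of the lifetime are assumptions chosen for the example, not a recommendation from any specific tool.

```python
from datetime import datetime, timedelta, timezone


def needs_rotation(not_before, not_after, now, renew_fraction=2 / 3):
    """Return True once `renew_fraction` of the certificate lifetime has elapsed."""
    lifetime = not_after - not_before
    # Renew well before expiry so a failed attempt can still be retried.
    renew_at = not_before + lifetime * renew_fraction
    return now >= renew_at


# Hypothetical 90-day certificate issued on 1 January 2025.
issued = datetime(2025, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=90)

early = needs_rotation(issued, expires, issued + timedelta(days=30))
late = needs_rotation(issued, expires, issued + timedelta(days=75))
```

A check like this has to run continuously for every certificate in the cluster, which is exactly the kind of work that becomes painful to do by hand.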
Once certificates are managed, every application must be modified and maintained to correctly use them:
- Inconsistent Implementation: Across a polyglot environment (multiple programming languages and frameworks), it is hard to ensure consistent, correct logic for sending and verifying peer certificates.
- Off-the-shelf software: Many applications, especially vendor or open-source tools, cannot be modified (or not easily) and may not support mTLS or your specific certificate scheme.
- Development Cost: Rolling out a code change to every application in a company is a slow, costly, and high-risk endeavor.
Implementing this change across an existing cluster creates significant operational risks:
- Non-atomic Rollout: Since you cannot update all services simultaneously, how do you manage the partial state where some services use mTLS and others do not?
- Verification Gap: How do you guarantee that 100% of inter-service traffic is using mTLS? It is easy to miss a communication path or egress point. You can monitor everything, of course, but that is a heavy workload.
- Risk of Outages: Incorrectly implemented certificate rotation logic or missed expiration dates are major sources of preventable downtime.
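The verification gap can be pictured with a small sketch. It assumes you already have some inventory of observed connections as `(source, destination, uses_mtls)` tuples. That data format, the service names, and the function name are all hypothetical.

```python
def mtls_coverage(connections):
    """Return (fraction of paths using mTLS, list of plaintext paths).

    `connections` is an iterable of (source, destination, uses_mtls) tuples.
    """
    connections = list(connections)
    if not connections:
        # No traffic observed, so nothing is known to be unencrypted.
        return 1.0, []
    plaintext = [(src, dst) for src, dst, uses_mtls in connections if not uses_mtls]
    covered = len(connections) - len(plaintext)
    return covered / len(connections), plaintext


# Hypothetical mid-rollout state: half of the paths are still plaintext.
observed = [
    ("web", "api", True),
    ("api", "db", True),
    ("web", "cache", False),
    ("api", "billing", False),
]
coverage, gaps = mtls_coverage(observed)
```

The hard part in practice is not this arithmetic but collecting a complete and trustworthy list of connections in the first place.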
In short, manual, application-level mTLS adds significant complexity and an ongoing operational burden for DevOps teams. This is where a platform engineering solution like Qovery delivers its most significant value.
CNI-based solution
A common suggestion in security discussions is to use a Container Network Interface (CNI) for implementing Mutual TLS (mTLS). However, this approach has a critical flaw: no current CNI natively supports true Mutual TLS.
So why the confusion? CNIs are often mentioned alongside mTLS for a few reasons, despite not offering the full solution:
- NetworkPolicy: Most CNIs implement Kubernetes NetworkPolicy. This is a component of a zero-trust architecture, but it only handles basic network segmentation. It does not provide authenticity, integrity, or confidentiality, the core benefits of mTLS.
- Other Encryption Mechanisms: Some CNIs offer different forms of network encryption. For instance, Calico and Cilium support protocols like WireGuard or IPsec. While these provide a degree of encryption, they are not mTLS and offer different security guarantees.
- Cilium's "Mutual Authentication": Cilium offers a feature it calls "Mutual Authentication." Although inspired by mTLS, it does not carry the application traffic itself over the TLS protocol. It provides a form of peer authentication, but the similar naming creates confusion: it is not a technical equivalent of mTLS and does not satisfy a formal "Mutual TLS" requirement.
Service Mesh: The standard for automated mTLS
The most common and successful way to deploy Mutual TLS (mTLS) broadly across Kubernetes is by using a sidecar-based service mesh architecture. The most popular examples are Istio and Linkerd.
Instead of requiring you to modify your application code (unlike the manual approach described above), a service mesh automatically deploys a small, dedicated network proxy (the sidecar) alongside every application container in a Pod.
- Zero-Code mTLS: This proxy intercepts all incoming and outgoing network traffic for the application. It automatically performs the mTLS handshake, certificate management, and encryption/decryption on behalf of the application.
- Application Transparency: The result is that traffic between any two meshed workloads is secured with mTLS without a single change to the application code.
Sidecar meshes are popular because they resolve all the major operational headaches of manual mTLS implementation:
- Certificate Management: The mesh automatically handles the secure provisioning, distribution, and rotation of workload certificates.
- Seamless Migration: Both Istio and Linkerd provide a critical feature for migration: they can automatically send and accept both plaintext and mTLS traffic. This allows you to deploy the mesh incrementally without breaking existing services, before eventually enforcing a strict mTLS-only policy cluster-wide.
- Consistency: They enforce consistent mTLS implementation across all workloads, regardless of the application's programming language.
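The migration behavior described above can be sketched as a simple admission decision. This is a simplified illustration, not how Istio or Linkerd implement it: the mode names and the `admit` function are assumptions, and a real proxy also verifies the peer certificate rather than only detecting TLS. The sketch relies on the fact that a TLS connection opens with a handshake record, whose first byte is `0x16` followed by a major version byte of `0x03`.

```python
TLS_HANDSHAKE = 0x16  # TLS record content type for handshake messages
TLS_MAJOR_VERSION = 0x03  # major version byte used by TLS 1.0 through 1.3


def looks_like_tls(first_bytes):
    """Heuristic check that a connection starts with a TLS handshake record."""
    return (
        len(first_bytes) >= 2
        and first_bytes[0] == TLS_HANDSHAKE
        and first_bytes[1] == TLS_MAJOR_VERSION
    )


def admit(first_bytes, mode):
    """Decide whether to accept a connection under a given policy mode."""
    if mode not in ("permissive", "strict"):
        raise ValueError(f"unknown mode: {mode}")
    if looks_like_tls(first_bytes):
        return True
    # Plaintext is tolerated only while the rollout is still in progress.
    return mode == "permissive"
```

Starting in the permissive mode and switching to the strict one only after every workload is meshed is what makes an incremental rollout possible.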
While service meshes are the dominant choice for mTLS in production deployments, they come with a key trade-off:
- Complexity and Overhead: Service meshes are large platforms that offer many features beyond mTLS (e.g., traffic routing, rate limiting, observability). This can feel like overkill if your only goal is mTLS, and it adds resource consumption (CPU, memory) and potential latency due to the extra proxy hop.
Node-Based Architecture
The concerns regarding the overhead and complexity of sidecar-based meshes have led to the development of a lighter, more focused architecture known as Ambient Mode (or Node-Based Mesh). Currently, Istio is the primary service mesh offering this approach.
Ambient Mode was explicitly designed to simplify the adoption of security features, with the core goal being "mTLS everywhere with minimal overhead." Instead of deploying a resource-intensive proxy per-workload (the sidecar model), Ambient Mode deploys a proxy per-node. This proxy handles the transparent mTLS encryption for all pods running on that node.
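A rough way to see the difference in footprint is to count proxies under each model. The sketch below is back-of-the-envelope arithmetic with a hypothetical cluster layout. It ignores proxy size and any other components a mesh may deploy.

```python
def proxy_count(pods_per_node, model):
    """Count data-plane proxies for a cluster.

    `pods_per_node` lists the number of meshed pods running on each node.
    """
    if model == "sidecar":
        # One proxy injected next to every application pod.
        return sum(pods_per_node)
    if model == "per-node":
        # One shared proxy on every node, regardless of how many pods it hosts.
        return len(pods_per_node)
    raise ValueError(f"unknown model: {model}")


# Hypothetical 3-node cluster running 50 meshed pods in total.
cluster = [10, 25, 15]
sidecars = proxy_count(cluster, "sidecar")
node_proxies = proxy_count(cluster, "per-node")
```

The number of per-node proxies grows with the number of nodes rather than the number of workloads, which is where most of the overhead saving comes from.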
For users whose primary requirement is just mTLS, Ambient Mode often proves a better fit, directly addressing the sidecar pain points described above.

Conclusion
For mTLS on Kubernetes, the best choice today is typically Ambient Mode if mTLS is the only feature you need. It's the fastest, lowest-overhead way to deploy mTLS cluster-wide.
The Sidecar approach is also a very popular and viable option, but be aware that it adds a lot of functionality beyond mTLS, which might be overkill and requires significant maintenance.
Manual integration is an option of last resort, suitable only for organizations with very specific needs and mature operations, as it is extremely challenging.
