Blog
Engineering
6
minutes

The importance of SemVer for your applications

For some developers, SemVer can look just cosmetic, nice to have, or simply useless. But SemVer format is mandatory to make reliable software. I'll explain how over one year, we encountered 2 issues related to SemVer. The first one was critical and led to a production outage, while the other was a lot of trouble for several companies to upgrade a managed service.
September 26, 2025
Pierre Mavro
CTO & Co-founder
Summary
Twitter icon
linkedin icon

What is SemVer?

First of all, let's remind us what SemVer is: SemVer (for Semantic Versioning) is the process of assigning either "uniqueversion' names or "uniqueVersion" numbers to unique states of computer software. Within a given version number category (e.g., major, minor), these numbers are generally assigned in increasing order and correspond to new developments in the software. [Wikipedia]

1st issue: Patch version

To get a bit of context, we are using EKS (Kubernetes on AWS) for Qovery production, and we wanted to remove the SSH access (remote) from our Kubernetes nodes.

Config change

On Terraform, it was just a few lines to remove:

resource "aws_eks_node_group" "eks-cluster-workers" {
cluster_name = aws_eks_cluster.eks_cluster.name
...
// lines removed
remote_access {
ec2_ssh_key = "qovery"
source_security_group_ids = [aws_security_group.eks_cluster_workers.id]
}
...
}

The applying workflow for those changes was:

  1. Force the deployment of new EKS nodes (EC2 instances behind it)
  2. Move pods (a set of containers) from old nodes to fresh new nodes
  3. Delete old nodes

Before running the changes, we checked that every pod on the cluster was running smoothly; there was no issue. So we decided to apply the change as everything looked OK and the operation "as usual".

Outage

The rollout occurred, and all pods moved from the old nodes to the new ones. Pods were able to start without problems...instead of one! Our Qovery Engine (written in Rust), is dedicated to infrastructure deployments.

The pod was crashing 2s after starting during the initialization phase and was in CrashloopBackOff status:

2022-01-16T16:52:09Z DEBUG app::utils: message: Requesting deployment task at engine.local.infrastructure                                                    
thread 'tokio-warp-http' panicked at 'called `Result::unwrap()` on an `Err` value: Other("Failed to parse patch version")', /usr/local/cargo/registry/src/git
hub.com-1ecc6299db9ec823/procfs-0.9.1/src/lib.rs:303:34

We've got an unwrap() here (non caught error) on a Result. So this issue was voluntary, not handled, or definitively not expected.

Investigation

Looking deeper into our Cargo dependencies (Cargo.toml), we did not use this "procfs" library directly. Meaning it's a library dependency, so we dug into the Cargo.lock to find the dependency and bingo! Prometheus library was using it:

[[package]]
name = "prometheus"
version = "0.12.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5986aa8d62380092d2f50f8b1cdba9cb9b6731ffd4b25b51fd126b6c3e05b99c"
dependencies = [
"cfg-if 1.0.0",
"fnv",
"lazy_static",
"libc",
"memchr",
"parking_lot 0.11.2",
"procfs 0.9.1",
"protobuf",
"thiserror",
]

Alright, we see the procfs version is 0.9.1 as described in the error message. So looking at the code in src/lib.rs line 303 we got this:

    /// The version of the currently running kernel.
///
/// This is a lazily constructed static. You can also get this information via
/// [KernelVersion::new()].
static ref KERNEL: KernelVersion = {
KernelVersion::current().unwrap()
};

We can see inside current() that it reads a file (the kernel version):

    /// Returns the kernel version of the currently running kernel.
///
/// This is taken from `/proc/sys/kernel/osrelease`;
pub fn current() -> ProcResult {
read_value("/proc/sys/kernel/osrelease")
}

And the function in charge of this is splitting on dot and ASCII digits:

    pub fn from_str(s: &str) -> Result {
let pos = s.find(|c: char| c != '.' && !c.is_ascii_digit());
let kernel = if let Some(pos) = pos {
let (s, _) = s.split_at(pos);
s
} else {
s
};
let mut kernel_split = kernel.split('.');
...
let patch = patch.parse().map_err(|_| "Failed to parse patch version")?;

Ok(Version { major, minor, patch })
}

Here comes the fun fact! Let's take a look at the kernel version on AWS EC2 instances deployed by EKS:

4.14.256-197.484.amzn2.x86_64

And we expected to have this struct:

pub struct Version {
pub major: u8,
pub minor: u8,
pub patch: u8,
}

So here is the problem in the kernel number (4.14.256-197.484.amzn2.x86_64).

256: we expected to have a patch in u8, whereas the max of a u8 is 255, so we needed at least a u16. (https://github.com/eminence/procfs/pull/140).

It's not common to have such a high number, but not rare. We can see it on Chrome, vanilla Kernel, and many other big projects. After the patch version corresponds to a build change, it's also common to find the "-xxx" version.

Unfortunately, this library wasn't expecting to have numbers higher than 255, and SemVer discussion about this topic doesn't look to be simple: https://github.com/semver/semver/issues/304.

Solutions and fix

At this moment, we had 3 solutions in mind:

  1. Temporarily remove Prometheus lib
  2. Fix by ourselves those issues report them upstream. It can take some time, depending on the number of changes and impact behind it (primarily for lib retro-compatibility)
  3. Look into commits, and try to move to newer versions to see if those issues were fixed

Temporarily remove Prometheus lib

We decided not to remove Prometheus lib, because at Qovery, it's not only used for observability purposes but also to manage the Kubernetes Horizontal Pod Autoscaler (HPA). It helps us to absorb a load of infrastructure changes automatically and scale accordingly to the number of Engines for deployment requests:

engine infra hpa

Fix by ourselves those issues

We wanted to go fast. Making a fork, patching libs, and later on, making a PR to the official project repository sounds a good solution, but it would take time. So before investing this time, we needed to be sure that newer commits of profs and Prometheus were not embedding fixes.

Move to newer versions

Finally, "procfs" in the latest released version had the bug fixed! Prometheus lib was not up to date with this latest release of procfs version, but the main branch was up to date.

So thanks to Rust and Cargo, it's effortless to point to git commit. We updated the Cargo.toml file like this:

[dependencies]
prometheus = { git = "https://github.com/tikv/rust-prometheus", rev = "ac86a264223c8d918a43e739ca3c48bb4aaedb90", features = ["process"] }

And finally, we were able to quickly release and deploy a newer Engine version with those fixes.

2nd issue: AWS Elasticache and Terraform

This one was a pain for a lot of companies. AWS decided to change the version format of their Elasticache service. Before version 6, SemVer was present:

  • 5.0.0
  • 4.0.10
  • 3.2.10

When version 6 was released, they decided to use 6.x version format! (https://web.archive.org/web/20210819001915/https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/SelectEngine.html)

As you can imagine comparing 2 versions with 2 different types (int vs. string), would fail on a lot of software like Terraform. You can easily find all related issues due to this change on the "Terraform AWS provider" GitHub issue page: https://github.com/hashicorp/terraform-provider-aws/issues?q=elasticache+6.x

At this time, it was a headache for people using Terraform (like Qovery) who wanted to upgrade from a previous version and "Terraform AWS provider" developers. AWS understood their mistake as they recently removed 6.x and switched back to SemVer format (https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/SelectEngine.html). And, so...here we go again on the "Terraform AWS provider" on GitHub: https://github.com/hashicorp/terraform-provider-aws/issues/22385

Conclusion

Big corporations like AWS or small libraries maintained by a few developers can easily have a considerable impact when SemVer rules are not strictly respected with the correct type. Versioning your applications is not "cosmetic" at all, as you can see.

Following some rules and good practices is essential if we make reliable software in our tech industry. SemVer is definitively one of them!

Share on :
Twitter icon
linkedin icon
Tired of fighting your Kubernetes platform?
Qovery provides a unified Kubernetes control plane for cluster provisioning, security, and deployments - giving you an enterprise-grade platform without the DIY overhead.
See it in action

Suggested articles

Kubernetes
8
 minutes
Kubernetes management in 2026: mastering Day-2 ops with agentic control

The cluster coming up is the easy part. What catches teams off guard is what happens six months later: certificates expire without a single alert, node pools run at 40% over-provisioned because nobody revisited the initial resource requests, and a manual kubectl patch applied during a 2am incident is now permanent state. Agentic control planes enforce declared state continuously. Monitoring tools just report the problem.

Mélanie Dallé
Senior Marketing Manager
Kubernetes
6
 minutes
Kubernetes observability at scale: how to cut APM costs without losing visibility

The instinct when setting up Kubernetes observability is to instrument everything and send it all to your APM vendor. That works fine at ten nodes. At a hundred, the bill becomes a board-level conversation. The less obvious problem is the fix most teams reach for: aggressive sampling. That is how intermittent failures affecting 1% of requests disappear from your monitoring entirely.

Mélanie Dallé
Senior Marketing Manager
Kubernetes
 minutes
How to automate environment sleeping and stop paying for idle Kubernetes resources

Scaling your deployments to zero is only half the battle. If your cluster autoscaler does not aggressively bin-pack and terminate the underlying worker nodes, you are still paying for idle metal. True environment sleeping requires tight integration between your ingress layer and your node provisioner to actually realize FinOps savings.

Mélanie Dallé
Senior Marketing Manager
Kubernetes
DevOps
6
 minutes
10 best Kubernetes management tools for enterprise fleets in 2026

The structure, table, tool list, and code blocks are all worth keeping. The main work is fixing AI-isms in the prose, updating the case study to real metrics, correcting the FAQ format, and replacing the CTAs with the proper HTML blocks. The tool descriptions need the "Core strengths / Potential weaknesses" headers made less template-y, and the intro needs a sharper human voice.

Mélanie Dallé
Senior Marketing Manager
DevOps
Kubernetes
Platform Engineering
6
 minutes
10 best Red Hat OpenShift alternatives to reduce licensing costs

For years, Red Hat OpenShift has been the safe choice for heavily regulated, on-premise environments. It operates as a secure fortress. But in the public cloud, that fortress acts as an expensive prison. Paying proprietary per-core licensing fees on top of your standard AWS or GCP compute bill is a redundant "middleware tax." Escaping OpenShift requires decoupling your infrastructure from your developer experience by running standard, vanilla Kubernetes paired with an agentic control plane.

Morgan Perry
Co-founder
AI
Product
3
 minutes
Qovery Skill for AI Agents: Deploy Apps in One Prompt

Use Qovery from Claude Code, OpenCode, Codex, and 20+ AI Coding agents

Romaric Philogène
CEO & Co-founder
Kubernetes
 minutes
Stopping Kubernetes cloud waste: agentic automation for enterprise fleets

Agentic Kubernetes resource reclamation is the practice of using an autonomous control plane to continuously identify, suspend, and delete idle infrastructure across a multi-cloud Kubernetes fleet. It replaces manual cleanup and reactive autoscaling with intent-based policies that act on business state, eliminating the configuration drift and cloud waste typical of unmanaged fleets.

Mélanie Dallé
Senior Marketing Manager
Platform Engineering
Kubernetes
DevOps
10
 minutes
What is Kubernetes? The reality of Day-2 enterprise fleet orchestration

Kubernetes focuses on container orchestration, but the reality on the ground is far less forgiving. Provisioning a single cluster is a trivial Day-1 exercise. The true operational nightmare begins on Day 2. Teams that treat multi-cloud fleets like isolated pets inevitably face crushing YAML configuration drift, runaway AWS bills, and severe scaling bottlenecks.

Morgan Perry
Co-founder

It’s time to change
the way you manage K8s

Turn Kubernetes into your strategic advantage with Qovery, automating the heavy lifting while you stay in control.