How We Built Qovery - Part 1

I am excited to launch a new series of engineering articles digging into the details of how we built Qovery, a platform we have been building for DevOps, SREs, Platform Engineers, and Developers since January 2020. Since day 1, the Qovery team has strived to make Qovery as open as possible and to fight the black box effect! In this series of 5 articles, I will explain as much as possible about how things work behind the scenes. I will start with a high-level view of the services and the architectural design we have chosen. Let's go!

Romaric Philogène

September 21, 2022 · 6 min read
Written by Romaric Philogène

CEO and co-founder of Qovery. Romaric has 10+ years of experience in R&D. From Ad-Tech to the financial industry, he has deep expertise in highly reliable and performant systems.


Some context

Today, development teams need more autonomy to deliver new features rapidly. However, providing access to infrastructure without sufficient control results in IT waste, including costly cloud security breaches and skyrocketing infrastructure costs. This is not acceptable for most organizations, especially in highly regulated industries. Qovery's deep integration of services (databases, load balancers, domains…) enables developers to manage their own non-production cloud environments while enabling DevOps engineering teams to accelerate infrastructure provisioning dramatically.

In a nutshell, Qovery is a cloud deployment platform focused on the User Experience. As a Developer, you can test and release features by creating on-demand environments with a delightful experience. As a DevOps engineer, you can integrate Qovery into your existing workflow and keep full control of every piece to enforce best practices and security.

To provide such a consolidated experience for Developers and DevOps engineers, we had to create an abstraction layer on top of many infrastructure components such as Kubernetes, load balancers, VPCs, security groups, domains, etc. We were inspired by platforms such as Heroku for the Developer Experience, and VMware and Rancher for the Ops experience. This is how Qovery emerged over 3 years, and it now helps more than 30,000 developers and hundreds of Platform Engineering teams.

Read more about what Platform Engineering is.

It's all remote

If you look at this demo video, you will see that in less than 3 minutes, you can deploy an application from a GitHub repository to AWS. And if you look at this one, you will see that you can clone a complete environment and infrastructure in one click. All those actions happen remotely! Qovery is a remote system that executes actions on the linked cloud account. When using Qovery for the first time, you need to connect your cloud account, and then Qovery takes care of the rest: creating the VPC, Security Groups, Kubernetes cluster, Prometheus, and all the other services required to run your applications properly.

Overview of Qovery Architecture (simplistic view)

The Qovery Engine (open-source) is in charge of bootstrapping your infrastructure, managing upgrades, and deploying your applications.

A user installs Qovery on his cloud account (simplistic view)

Here is an example of what happens when a user installs Qovery on his cloud account. The interesting part is that a local Qovery Engine (2) pulls a task from the control plane, generates all the configuration files (Terraform, Helm, and others), and bootstraps the full infrastructure on the target remote cloud account. The process is fully automatic and takes between 10 and 30 minutes, depending on the target cloud provider. Once the infrastructure is ready, a remote Qovery Engine is created and initiates a secure connection to the control plane.
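
To make the pull, generate, and bootstrap sequence more concrete, here is a minimal Python sketch. It is an illustration only, not the actual Qovery Engine code: `BootstrapTask`, `render_configs`, the file names, and the task fields are all hypothetical, and the final "apply" step is a stand-in for running the real infrastructure tooling.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class BootstrapTask:
    """Hypothetical task payload; the real task format is not described here."""
    cloud_provider: str
    region: str
    cluster_name: str


def render_configs(task: BootstrapTask) -> dict[str, str]:
    """Render configuration files from the task (file names are illustrative)."""
    terraform_vars = (
        f'region       = "{task.region}"\n'
        f'cluster_name = "{task.cluster_name}"\n'
    )
    helm_values = f"clusterName: {task.cluster_name}\n"
    return {"terraform.tfvars": terraform_vars, "values.yaml": helm_values}


def bootstrap(pending: deque) -> list[str]:
    """Pull one task, render its configs, and return the ordered steps taken."""
    steps = []
    task = pending.popleft()  # 1. pull a task from the control plane
    steps.append(f"pulled task for {task.cloud_provider}")
    for name in render_configs(task):  # 2. generate the configuration files
        steps.append(f"rendered {name}")
    steps.append("applied infrastructure")  # 3. stand-in for the real tooling
    return steps


# The deque stands in for the control plane's list of pending tasks.
queue = deque([BootstrapTask("aws", "eu-west-3", "demo")])
steps = bootstrap(queue)
```

The point of the sketch is the ordering: nothing is applied to the cloud account until the task has been pulled and every configuration file has been generated from it.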

A user deploys a first application with Qovery (simplistic view)

This is where the user can deploy their first applications and services. Qovery takes care of everything, from the load balancers, temporary domain, and TLS to all the other steps required to make the application ready to use.

Deployment via Git (simplistic view)

And because Qovery integrates with GitHub, GitLab, and Bitbucket, a commit on a monitored repository automatically triggers a new deployment.
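
The idea of a commit triggering a deployment can be sketched as a lookup from a push event to the application that watches that repository and branch. This is a simplified Python illustration under assumptions: `PushEvent`, `Deployment`, the `MONITORED` registry, and the repository names are hypothetical, and each Git provider sends its own payload format that would first need to be normalized.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PushEvent:
    """Normalized push event (hypothetical shape, not a provider's real payload)."""
    repository: str
    branch: str
    commit_id: str


@dataclass(frozen=True)
class Deployment:
    application: str
    commit_id: str


# Hypothetical registry: (repository, branch) -> application to redeploy.
MONITORED = {
    ("acme/api", "main"): "api-production",
    ("acme/api", "staging"): "api-staging",
}


def on_push(event: PushEvent) -> Optional[Deployment]:
    """Return the deployment to trigger, or None if the push is not monitored."""
    application = MONITORED.get((event.repository, event.branch))
    if application is None:
        return None
    return Deployment(application, event.commit_id)
```

A push to `acme/api` on `main` yields a deployment for `api-production`, while a push to an unmonitored branch is simply ignored.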

Qovery also integrates into an existing CI with an existing Container Registry (simplistic view)

Qovery also integrates with external Container Registries and CI platforms like GitHub Actions, GitLab CI, and CircleCI. I will not cover all those use cases here, but I will in the coming parts, since they are very common among DevOps and Platform Engineers using Qovery.

Qovery Control Plane

Hundreds of Qovery Engines pull tasks from the Control Plane (simplistic view)

The Qovery Control Plane is in charge of forwarding tasks to the appropriate Qovery Engine. Hundreds of Qovery Engines are connected to the Control Plane, waiting for new tasks. Each connection is initiated by the Qovery Engine to the Control Plane via gRPC/TLS.
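
The pull model described here can be sketched with one task queue per engine. This is a minimal in-memory Python illustration, not how the Control Plane is actually implemented: the `ControlPlane` class, its method names, and the engine identifiers are hypothetical, and a plain method call stands in for the gRPC/TLS connection that each engine opens.

```python
from collections import defaultdict, deque
from typing import Optional


class ControlPlane:
    """Keeps one task queue per engine; engines pull, nothing is pushed to them."""

    def __init__(self) -> None:
        self._queues: dict[str, deque] = defaultdict(deque)

    def submit(self, engine_id: str, task: str) -> None:
        """Route a task to the queue of the engine that must execute it."""
        self._queues[engine_id].append(task)

    def pull(self, engine_id: str) -> Optional[str]:
        """Called by an engine over the connection it initiated itself."""
        queue = self._queues[engine_id]
        return queue.popleft() if queue else None
```

Because every call originates from the engine side, the customer's cloud account never has to accept inbound connections from the Control Plane in this model.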

Multiple user interfaces are provided (simplistic view)

The Control Plane also provides a rich open web API that serves our open-source web interface, Terraform Provider and CLI.

This is how Git hook requests are handled by Qovery

Thousands of requests per second can hit the Control Plane. That's why it's composed of multiple services, each with its own responsibility. (We'll talk about it in Part II.)

Qovery Engine

The Qovery Engine is in charge of executing the tasks pulled from the Control Plane, from infrastructure creation to application deployment and network configuration. The Qovery Engine behaves like a state machine.

Qovery Engine workflow (simplistic view)

The goal is to reach the desired state and to report every operation to the Control Plane along the way. If a task execution fails, the Qovery Engine can remediate it by itself. You can watch this short presentation to get an in-depth view of how the Engine works 👇
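
The state machine behavior can be sketched as follows. This is a simplified Python illustration under assumptions: the state names, the `run_task` function, and the retry-based remediation are hypothetical, since this article does not describe how the Engine actually remediates a failure.

```python
from enum import Enum
from typing import Callable


class State(Enum):
    PENDING = "pending"
    RUNNING = "running"
    REMEDIATING = "remediating"
    DONE = "done"
    FAILED = "failed"


def run_task(action: Callable[[], bool], report: Callable[[State], None],
             max_attempts: int = 3) -> State:
    """Drive a task toward DONE, reporting every state change.

    `action` returns True on success. On failure the task goes through
    REMEDIATING and is retried, up to `max_attempts` attempts in total.
    """
    state = State.PENDING
    report(state)
    for attempt in range(1, max_attempts + 1):
        state = State.RUNNING
        report(state)
        if action():
            state = State.DONE
            report(state)
            return state
        if attempt < max_attempts:
            state = State.REMEDIATING  # assumption: remediation is a plain retry
            report(state)
    state = State.FAILED
    report(state)
    return state


# Example: the first attempt fails, the second one succeeds.
outcomes = iter([False, True])
history: list[State] = []
final = run_task(lambda: next(outcomes), history.append)
```

In this example `history` records PENDING, RUNNING, REMEDIATING, RUNNING, and DONE, which is the kind of trail a control plane could display to the user.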

Other services

We developed dozens of services to provide the following features:

  • Remote Shell Service: a service written in Rust to provide secure remote access to a pod with a wonderful Developer Experience.
  • Scheduler: a service written in Rust to schedule tasks over time (e.g., the Start and Stop feature).
  • Agent: a service written in Rust that forwards app logs from Loki, as well as Kubernetes metrics and states, to the Control Plane.
  • Webhook Gateway: a service written in Rust built to handle thousands of Git webhook requests per second.
  • WebSocket Gateway: a service written in Rust to provide real-time data to the control plane.
  • Pleco: a tool that automatically removes cloud managed services and Kubernetes resources based on tags with a TTL.
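
As an illustration of the tag-plus-TTL idea behind Pleco, here is a minimal Python sketch. It is not Pleco's actual implementation: the `ttl` tag name, the assumption that the TTL is expressed in seconds, and the resource dictionaries are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_expired(tags: dict[str, str], created_at: datetime, now: datetime) -> bool:
    """A resource is expired once its age exceeds the TTL (in seconds) in its tags.

    Resources without a `ttl` tag are never considered expired.
    """
    ttl = tags.get("ttl")  # hypothetical tag name
    if ttl is None:
        return False
    return now - created_at > timedelta(seconds=int(ttl))


def select_expired(resources: list[dict], now: datetime) -> list[str]:
    """Return the names of the resources that should be removed."""
    return [
        r["name"] for r in resources
        if is_expired(r["tags"], r["created_at"], now)
    ]


now = datetime(2022, 9, 21, 12, 0, tzinfo=timezone.utc)
resources = [
    {"name": "preview-db", "tags": {"ttl": "3600"}, "created_at": now - timedelta(hours=2)},
    {"name": "fresh-cluster", "tags": {"ttl": "3600"}, "created_at": now - timedelta(minutes=10)},
    {"name": "production-db", "tags": {}, "created_at": now - timedelta(days=30)},
]
expired = select_expired(resources, now)
```

Treating untagged resources as permanent is a deliberate choice in this sketch: a cleanup tool should only ever delete what was explicitly marked as temporary.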

We also use third-party services like:

  • Postgres and Redis to store the control plane data
  • Auth0 (acquired by Okta) for authentication
  • Kubernetes for container scheduling
  • Loki to store infra logs in S3
  • Chargebee and Stripe for payment
  • Grafana and Tableau for data analysis
  • BigQuery for storing usage events
  • PostHog for product analytics
  • Intercom and Discourse for providing community and dedicated support

Side note: our control plane's main service (core) is not written in Rust (haha) but in Kotlin and with the Spring Boot framework.

3 Architectural Pillars

Zero Trust

Qovery is built to keep our customers' infrastructure safe from both external and internal attacks. Environment Variables, Cloud Credentials, and Connections to the Qovery Control Plane are all encrypted (KMS). We'll dig into this in the next parts.

Autonomous

If you remove the Control Plane, the Qovery Engine will no longer be able to pull tasks, but the overall remote infrastructure will not be impacted. This means that losing the Control Plane causes no downtime on the remote infrastructure, since it does not depend on Qovery to run.

Resilient

Since Qovery is built on Kubernetes and the managed services of the supported cloud providers, all resources are configured out of the box to be resilient. For example, when Qovery installs a Kubernetes cluster for you, it runs at least 3 worker nodes across 3 Availability Zones (AZ). So you don't have to worry about whether the best practices have been applied.

Transparency

Qovery V3 will add a fourth pillar - Transparency! In a nutshell, you will be able to track every change that happens on your infrastructure with Qovery. We'll have the chance to talk about it later on.

Wrapping up

This first part gives us a better sense of how Qovery works. Qovery is a remote system divided into two parts. On one side, the control plane is in charge of all the business logic; on the other, the Qovery Engine manages the infrastructure required to deploy the user applications. Security is at the heart of the Qovery design. The Qovery Engine initiates secure connections to the Control Plane and pulls the tasks that it is in charge of executing. Dozens of other services are also involved in providing an outstanding user experience.

In the second part, we will dig deeper into how the control plane works. Stay tuned!
