How We Built Qovery - Part 1
I am excited to launch a new series of engineering articles that digs into all the details of how we built Qovery, a platform for DevOps engineers, SREs, Platform Engineers, and Developers that we have been building since January 2020. Since day 1, the Qovery team has strived to make Qovery as open as possible and to fight the black box effect! In this series of 5 articles, I will explain as much as possible about how things work behind the scenes. I will start with a high-level view of the services and the architecture we have chosen. Let's go!
Romaric Philogène · September 21, 2022 · 6 min read
Today, development teams need to be more autonomous to deliver new features rapidly. However, giving them access to infrastructure without sufficient control results in IT waste, including costly cloud security breaches and skyrocketing infrastructure costs. This is not acceptable for most organizations, especially in highly regulated industries. The deep integration of Qovery services (databases, load balancers, domains…) lets developers manage their own non-production cloud environments while enabling DevOps engineering teams to accelerate infrastructure provisioning dramatically.
In a nutshell, Qovery is a cloud deployment platform focused on User Experience. As a Developer, you can test and release features by creating on-demand environments with a delightful experience. As a DevOps engineer, you can integrate Qovery into your existing workflow and keep full control over every piece to enforce best practices and security.
To provide such a consolidated experience for both Developers and DevOps engineers, we had to create an abstraction layer on top of many infrastructure components such as Kubernetes, load balancers, VPCs, security groups, domains, etc. We drew inspiration from platforms such as Heroku for the Developer Experience, and from VMware and Rancher for the Ops experience. This is how Qovery emerged over 3 years; it now helps more than 30,000 developers and hundreds of Platform Engineering teams.
If you look at this demo video, you will see that in less than 3 minutes, you can deploy an application from a GitHub repository to AWS. And if you look at this one, you will see that you can clone a complete environment and its infrastructure in one click. All those actions happen remotely! Qovery is a remote system that executes actions on the linked cloud account. When using Qovery for the first time, you connect your cloud account, and Qovery takes care of the rest. It creates the VPC, Security Groups, Kubernetes cluster, Prometheus, and all the other services required to run your applications the best way.
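Those components cannot be created in an arbitrary order: a Kubernetes cluster needs its network first, and Prometheus needs the cluster. Below is a minimal sketch of how such an ordering can be derived with a topological sort. The component names come from the paragraph above. The dependency edges are illustrative assumptions, not Qovery's actual provisioning graph.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each component maps to the components that must
# exist before it. Names come from the article; the edges are assumptions.
INFRA_DEPENDENCIES: dict[str, set[str]] = {
    "vpc": set(),
    "security_groups": {"vpc"},
    "kubernetes_cluster": {"vpc", "security_groups"},
    "prometheus": {"kubernetes_cluster"},
}


def provisioning_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every component is created after its dependencies."""
    # static_order() raises graphlib.CycleError if the graph contains a cycle.
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step, component in enumerate(provisioning_order(INFRA_DEPENDENCIES), start=1):
        print(f"{step}. create {component}")
```

With a real cloud account, each step would be a Terraform or Helm operation rather than a print statement. The point is only that an explicit dependency graph makes the bootstrap order reproducible.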
The Qovery Engine (open-source) is in charge of bootstrapping your infrastructure, managing upgrades, and deploying your applications.
Here is an example of what happens when a user installs Qovery on their cloud account. The interesting part is that a local Qovery Engine (2) pulls a task from the control plane, generates all the configuration files (Terraform, Helm, and others), and bootstraps the full infrastructure on the target remote cloud account. The process is fully automatic and takes between 10 and 30 minutes, depending on the target cloud provider. Once the infrastructure is ready, a remote Qovery Engine is created and initiates a secure connection to the control plane.
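To make the "generates all the configuration files" step a bit more concrete, here is a minimal sketch that renders per-tool configuration from a pulled task. The `ClusterTask` fields, file paths, and variable names are all hypothetical and are not the Qovery Engine's actual templates. The sketch relies on only two real conventions. First, Terraform automatically loads a `terraform.tfvars.json` file from its working directory. Second, Helm values files are YAML, and JSON is valid YAML.

```python
import json
from dataclasses import dataclass


@dataclass
class ClusterTask:
    # Hypothetical payload of a "create cluster" task pulled from the control plane.
    cluster_name: str
    cloud_provider: str
    region: str
    node_count: int


def render_config_files(task: ClusterTask) -> dict[str, str]:
    """Render configuration files (relative path -> content) for one task."""
    tfvars = {
        "cluster_name": task.cluster_name,
        "region": task.region,
        "min_nodes": task.node_count,
    }
    helm_values = {
        "clusterName": task.cluster_name,
        "cloudProvider": task.cloud_provider,
    }
    return {
        # Terraform picks up terraform.tfvars.json automatically in its working dir.
        f"{task.cloud_provider}/terraform.tfvars.json": json.dumps(tfvars, indent=2, sort_keys=True),
        # JSON is a subset of YAML, so Helm can consume this file via -f.
        "charts/values.json": json.dumps(helm_values, indent=2, sort_keys=True),
    }
```

Keeping the rendering step a pure function (task in, files out) makes it easy to test and to diff before anything touches the cloud account.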
At this point, the user can deploy their first applications and services. Qovery takes care of everything: load balancers, a temporary domain, TLS, and all the other steps required to make the application ready to use.
And because Qovery integrates with GitHub, GitLab, and Bitbucket, whenever a commit is pushed to a monitored repository, Qovery automatically triggers a new deployment.
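As an illustration of the commit-to-deployment path, here is a minimal sketch that turns a GitHub push webhook payload into deployment requests. The `ref`, `after`, `deleted`, and `repository.full_name` fields are part of GitHub's push event. The `MonitoredApp` and `DeploymentRequest` types are hypothetical. GitLab and Bitbucket send differently shaped payloads that would need their own parsing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitoredApp:
    # Hypothetical record: which repository and branch an application follows.
    app_id: str
    repository: str  # e.g. "acme/api"
    branch: str      # e.g. "main"


@dataclass(frozen=True)
class DeploymentRequest:
    app_id: str
    commit_sha: str


def deployments_for_push(payload: dict, apps: list[MonitoredApp]) -> list[DeploymentRequest]:
    """Map a GitHub push event to deployment requests for the apps that follow it."""
    ref = payload.get("ref", "")
    prefix = "refs/heads/"
    if not ref.startswith(prefix):
        return []  # tag pushes and other refs do not trigger a deployment here
    if payload.get("deleted"):
        return []  # a branch deletion is not something to deploy
    branch = ref[len(prefix):]
    repo = payload.get("repository", {}).get("full_name")
    sha = payload.get("after")
    return [
        DeploymentRequest(app.app_id, sha)
        for app in apps
        if app.repository == repo and app.branch == branch
    ]
```

A production gateway would also verify the webhook signature before trusting the payload. That step is omitted here to keep the sketch short.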
Qovery also integrates with external Container Registries and CI platforms like GitHub Actions, GitLab CI, and CircleCI. I will not cover all those use cases here, but I will in the coming parts, since they are very common among the DevOps and Platform Engineers using Qovery.
The Qovery Control Plane is in charge of forwarding tasks to the appropriate Qovery Engine. Hundreds of Qovery Engines are connected to the Control Plane, waiting for new tasks. Each connection is initiated by the Qovery Engine toward the Control Plane, via gRPC over TLS.
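The pull model can be sketched in a few lines. In this sketch, the Control Plane keeps one pending-task queue per engine, and each engine pulls only its own tasks over the connection it opened. This is an in-process illustration with hypothetical names (`Task`, `ControlPlaneDispatcher`, `engine_id`). The real system uses gRPC over TLS and multiple services, not a Python queue.

```python
import queue
import threading
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    engine_id: str  # hypothetical: ID of the Qovery Engine that must run this task
    kind: str       # hypothetical task type, e.g. "deploy_application"
    payload: dict = field(default_factory=dict)


class ControlPlaneDispatcher:
    """Keeps one pending-task queue per engine; each engine pulls only its own tasks."""

    def __init__(self) -> None:
        self._queues: dict[str, queue.Queue] = {}
        self._lock = threading.Lock()

    def _queue_for(self, engine_id: str) -> queue.Queue:
        # Create the engine's queue on first use; the lock prevents two threads
        # from racing to create it.
        with self._lock:
            q = self._queues.get(engine_id)
            if q is None:
                q = self._queues[engine_id] = queue.Queue()
            return q

    def submit(self, task: Task) -> None:
        # Route the task to the queue of the engine responsible for it.
        self._queue_for(task.engine_id).put(task)

    def pull(self, engine_id: str, timeout: float = 30.0) -> Optional[Task]:
        # Called on behalf of an engine over the connection it initiated. Blocks
        # until a task is available, or returns None once the timeout expires.
        try:
            return self._queue_for(engine_id).get(timeout=timeout)
        except queue.Empty:
            return None
```

The design choice this illustrates is that the Control Plane never has to open an inbound connection into a customer's cloud account. Engines reach out, and tasks flow back over that outbound channel.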
Thousands of requests per second can hit the Control Plane. That's why it is composed of multiple services, each with its own responsibility (we'll talk about them in Part 2).
The Qovery Engine is in charge of executing the tasks pulled from the Control Plane, from infrastructure creation to application deployment and network configuration. The Qovery Engine behaves like a state machine.
The goal is to reach the desired state and to report every operation to the Control Plane along the way. If a task fails, the Qovery Engine can remediate the failure by itself. You can watch this short presentation for an in-depth look at how the Engine works 👇
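Here is a minimal sketch of what "behaves like a state machine" can mean for a single task. The task moves only through allowed transitions, and every transition is reported. A failed attempt goes through a remediation step before being retried. The states, the `max_attempts` retry policy, and the `step`/`remediate`/`report` callbacks are illustrative assumptions, not the Engine's actual code.

```python
from enum import Enum
from typing import Callable


class State(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    REMEDIATING = "remediating"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Allowed transitions of this hypothetical task state machine.
TRANSITIONS: dict[State, set[State]] = {
    State.QUEUED: {State.RUNNING},
    State.RUNNING: {State.SUCCEEDED, State.REMEDIATING, State.FAILED},
    State.REMEDIATING: {State.RUNNING},
    State.SUCCEEDED: set(),
    State.FAILED: set(),
}


def run_task(
    step: Callable[[], bool],        # tries to reach the desired state; True on success
    remediate: Callable[[], None],   # attempts to fix whatever made the step fail
    report: Callable[[State], None], # e.g. sends the new state to the control plane
    max_attempts: int = 3,
) -> State:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    state = State.QUEUED

    def move(new_state: State) -> None:
        nonlocal state
        if new_state not in TRANSITIONS[state]:
            raise ValueError(f"illegal transition {state} -> {new_state}")
        state = new_state
        report(state)  # every transition is reported

    for attempt in range(1, max_attempts + 1):
        move(State.RUNNING)
        if step():
            move(State.SUCCEEDED)
            return state
        if attempt == max_attempts:
            move(State.FAILED)
            return state
        move(State.REMEDIATING)
        remediate()
    raise AssertionError("unreachable")
```

Encoding the allowed transitions in a table makes illegal jumps fail loudly. It also means the sequence of reported states is a complete history of what the task did.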
We developed dozens of services to provide the following features:
- Remote Shell Service: a service written in Rust that provides secure remote access to a pod with a wonderful Developer Experience.
- Scheduler: a service written in Rust that schedules tasks over time (e.g., the Start and Stop feature).
- Agent: a service written in Rust that forwards application logs from Loki and sends Kubernetes metrics and states to the Control Plane.
- Webhook Gateway: a service written in Rust built to handle thousands of Git webhook requests per second.
- WebSocket Gateway: a service written in Rust to provide real-time data to the control plane.
- Pleco: a tool that automatically removes cloud managed services and Kubernetes resources based on tags with a TTL.
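The TTL idea behind Pleco boils down to an expiry check per resource. Here is a minimal sketch. It assumes a `ttl` tag in seconds, a known creation time, and that a TTL of `0` means "keep forever". These tag conventions are assumptions for illustration and may not match Pleco's exact behavior.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class CloudResource:
    name: str
    created_at: datetime
    tags: dict[str, str] = field(default_factory=dict)


def is_expired(resource: CloudResource, now: datetime) -> bool:
    """True when the resource has a positive 'ttl' tag (seconds) that has elapsed."""
    raw_ttl = resource.tags.get("ttl")
    if raw_ttl is None:
        return False  # untagged resources are never touched
    try:
        ttl = int(raw_ttl)
    except ValueError:
        return False  # a malformed tag is ignored rather than risking a deletion
    if ttl <= 0:
        return False  # assumption: 0 means "keep forever"
    return now >= resource.created_at + timedelta(seconds=ttl)


def resources_to_delete(resources: list[CloudResource], now: Optional[datetime] = None) -> list[str]:
    """Names of the resources whose TTL has expired."""
    now = now or datetime.now(timezone.utc)
    return [r.name for r in resources if is_expired(r, now)]
```

Defaulting to "do not delete" whenever a tag is missing or unreadable is the conservative choice for a tool whose only action is destructive.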
We also use third-party services like:
- Postgres and Redis to store the control plane data
- Auth0 (acquired by Okta) for authentication
- Kubernetes for container scheduling
- Loki to store infra logs in S3
- Chargebee and Stripe for payment
- Grafana and Tableau for data analysis
- BigQuery for storing usage events
- Posthog for product analytics
- Intercom and Discourse for providing community and dedicated support
Qovery is built to keep our customers' infrastructure safe from external and even internal attacks. Environment variables, cloud credentials, and connections to the Qovery Control Plane are all encrypted (KMS). We'll dig into this in the next parts.
If you remove the Control Plane, the Qovery Engine will no longer be able to pull tasks, but the remote infrastructure itself will not be impacted. In other words, losing the Control Plane cannot cause downtime on the remote infrastructure, since that infrastructure does not depend on Qovery to run.
Since Qovery is built on Kubernetes and the managed services of the supported cloud providers, all resources are configured out of the box to be resilient. For example, when Qovery installs a Kubernetes cluster for you, it runs at least 3 worker nodes across 3 Availability Zones (AZs). So you don't have to worry about whether best practices have been applied.
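The reason for spreading nodes across AZs is easy to check with a little arithmetic. With 3 nodes in 3 zones, losing any one zone still leaves 2 nodes running. With 3 nodes in a single zone, one outage takes everything down. Below is a minimal sketch using a hypothetical round-robin placement. In practice, placement is handled by the cloud provider's node groups, not by code like this.

```python
from collections import Counter


def spread_nodes(node_count: int, zones: list[str]) -> dict[str, int]:
    """Assign worker nodes to availability zones round-robin (illustrative only)."""
    if not zones:
        raise ValueError("at least one availability zone is required")
    counts = Counter(zones[i % len(zones)] for i in range(node_count))
    return {zone: counts[zone] for zone in zones}


def survives_zone_loss(placement: dict[str, int]) -> bool:
    """True if losing any single zone still leaves at least one worker node."""
    total = sum(placement.values())
    return all(total - nodes > 0 for nodes in placement.values())
```

For example, `spread_nodes(3, ["eu-west-3a", "eu-west-3b", "eu-west-3c"])` places one node per zone, and that placement survives the loss of any single zone.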
Qovery V3 will add a fourth pillar: Transparency! In a nutshell, you will be able to track every change made to your infrastructure through Qovery. We'll have the chance to talk about it later on.
This first part gives us a better sense of how Qovery works. Qovery is a remote system divided into two parts. On one side, the Control Plane is in charge of all the business logic; on the other, the Qovery Engine manages the infrastructure required to deploy the user applications. Security is at the heart of the Qovery design: the Qovery Engine initiates secure connections to the Control Plane and pulls the tasks it is in charge of executing. Dozens of other services are also involved in providing an outstanding user experience.
In the second part, we will dig deeper into how the Control Plane works. Stay tuned!