How DoorDash migrated from Heroku to AWS
We are launching a new series of articles, called Tech Stories, dedicated to the stories behind the technology choices, decisions, and implementations of today's most inspiring companies (and people).
For the first article in this series, I will tell the story of the DoorDash migration and give you a closer look at the moment the engineering team realized they needed to switch, how they carried out the move, and their recommendations for others doing the same. Let’s go!
Morgan Perry · November 23, 2021 · 9 min read
Founded in 2013, DoorDash is an on-demand food delivery platform that allows customers to order food and beverages from nearby restaurants. People can order through the company’s website or its Android and iOS apps.
DoorDash became an instant success with customers due to its flexible workforce of drivers. The company went public in December 2020, making it one of the largest IPOs in the food delivery industry.
Knowing when to switch tooling as your company and application scale is tough. Switching from one provider or architecture to another is a path many teams find themselves following.
When DoorDash began, they started with Heroku and then later switched to AWS as their team and needs changed.
For any growing tech company, providing fast, real-time services would not be possible without a robust computing infrastructure to power backend systems. No matter how great the code and applications are, they are useless without the server capacity to run them.
Heroku is an ideal solution for early-stage startups. It offers cost-effective hosting, a streamlined continuous-deployment workflow, and easy integration with other software, which lets you (as a developer) deploy faster. All of this lets you concentrate on iterating on your product without assigning a dedicated person to DevOps for an extended period.
It was not much different for DoorDash. The team originally started on Heroku, mainly because "it was a simple and convenient way to get our app up and running. Instead of worrying about the low-level complexities of server infrastructure, we could focus our time on developing product features".
But the DoorDash tech team quickly realized the limits of this initial choice: their existing Heroku-based infrastructure wasn’t meeting their needs, and they would need to move to Amazon Web Services.
In the beginning, dev teams were autonomous and could ship code fast, independently of each other. But as traffic scaled up, they started to face some serious problems and limitations with Heroku:
Performance: The most pressing issue was the poor performance of Heroku server instances (known as dynos): "What you should know is that Heroku runs multiple dynos on a single Amazon EC2 instance, which makes each dyno tremendously constrained in its CPU and RAM resources". Even with extensive performance tuning of their apps, they were forced to run many more dynos than they would have liked, and further scaling became impossible to consider. Like most PaaS offerings, Heroku is inflexible when it comes to resource usage: it imposes strict machine resource plans.
Cost Efficiency: As traffic scaled up, they needed more and more dynos, and started to overpay. For the DoorDash teams, Heroku dynos were very expensive for the computing resources they provided: "For roughly the same price as a Heroku dyno with 1GB RAM, we could have rented an Amazon EC2 instance with 3.75GB RAM".
Reliability: Although it is a widely adopted solution used by hundreds of thousands of developers, Heroku seemed to experience frequent periods of degraded reliability. For instance, the DoorDash teams reported that an outage in Heroku’s deployment API would pop up every week or two. Worse, one memorable 12-hour incident prevented them from pushing a critical hotfix and permanently eroded their trust in the platform.
Control: Using a platform-as-a-service (PaaS) is not without tradeoffs. With Heroku, for instance, you are locked into Heroku’s proprietary file system and lose fine-grained control and visibility over your servers. For the DoorDash teams, installing custom software in their server stack was far from straightforward, and it was not possible to SSH into a server to debug a CPU or memory issue.
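The cost-efficiency point above boils down to simple arithmetic. The sketch below puts the quoted RAM figures in numbers; the monthly price is a hypothetical placeholder (the quote only says both options cost "roughly the same"), not an actual Heroku or AWS price:

```python
# Memory-per-dollar comparison based on the figures quoted above.
# monthly_price is an illustrative placeholder, identical for both options.
monthly_price = 50.0

heroku_ram_gb = 1.0   # Heroku dyno RAM (from the quote)
ec2_ram_gb = 3.75     # comparable EC2 instance RAM (from the quote)

heroku_gb_per_dollar = heroku_ram_gb / monthly_price
ec2_gb_per_dollar = ec2_ram_gb / monthly_price

print(f"EC2 gives {ec2_gb_per_dollar / heroku_gb_per_dollar:.2f}x more RAM per dollar")
# prints: EC2 gives 3.75x more RAM per dollar
```

Since the prices are roughly equal, the ratio is simply 3.75GB / 1GB: nearly four times the memory for the same spend.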
For many startups, Heroku is the ideal solution, until they start growing. Hosting on Heroku is primarily about speed: teams can develop, ship, and scale without worrying about the infrastructure. But it stops being enough once the business takes off.
DoorDash was no exception. All these limitations pushed the DoorDash tech team to leave Heroku as soon as possible and find a new hosting solution to support their scaling phase.
The logical upgrade choice for DoorDash: AWS and its Elastic Compute Cloud (EC2) service.
Amazon EC2 is a computing platform with a choice of processor, storage, networking, operating system, and purchase model. Basically, Amazon EC2 provides scalable computing capacity in the AWS cloud. Leveraging it enables organizations to develop and deploy applications faster, without needing to invest in hardware upfront.
For the DoorDash teams, EC2 virtual server instances provided a wide variety of CPU and memory configurations, featured full root access, and offered much better performance per dollar than Heroku.
While AWS looked like an attractive solution, it requires a significant investment and commitment in DevOps to maintain the infrastructure. AWS is like a set of Lego building blocks, offering more than a hundred services and thousands of features. The learning curve can be overwhelming and filled with technical challenges, far from the promise of technical abstraction.
While going to AWS seemed like a no-brainer for the DoorDash teams, migrating from a high-level, managed platform provider like Heroku can be a daunting task.
There was a significant amount of work needed to administer a cluster of servers and set up continuous code deployment. Here is how they described the challenge at the time:
To automate the process of setting up our servers, you would normally need to set up “configuration management” software such as Chef or Puppet. However, these tools tend to be clunky and require learning a domain-specific language. We would need to write scripts to perform tasks such as installing all of our app’s third-party dependencies or configuring all the software in our server stack. And to test that this code works, we would need to set up a local development environment using Vagrant. All of this amounted to a significant amount of complexity that was not looking too appealing. - Alvin Chow, Staff Software Engineer @ DoorDash
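To give a sense of the domain-specific language the quote refers to, here is a hypothetical Chef recipe sketch. Every name in it (packages, service, template) is invented for illustration; it is not DoorDash's actual configuration:

```ruby
# Hypothetical Chef recipe: install system packages and wire up a service.
# All package, service, and file names below are illustrative only.
package %w(python-dev libpq-dev nginx) do
  action :install
end

service 'myapp' do
  action [:enable, :start]
end

template '/etc/nginx/sites-available/myapp' do
  source 'myapp.conf.erb'
  notifies :reload, 'service[nginx]'
end
```

Multiply this by every dependency and every piece of the server stack, and the complexity the quote describes becomes apparent.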
Using Docker allowed us to make this transition in much less time and effort than would otherwise have been possible.
Given the complex task ahead, the teams were looking for an easier way to access AWS. That's when they considered using Docker to manage this migration.
In a nutshell, Docker is an open-source platform for building, deploying, and managing containerized applications. It is a high-level virtualization platform that lets you package software as isolated units called containers. Containers are created from images that include everything needed to run the packaged workload, such as executable binaries and dependency libraries.
Simply put, Docker containers are snapshots of a known, working system state, making it possible to “build once, run anywhere”.
After learning what Docker was capable of, we knew that it could play a key role in accelerating our AWS migration - Alvin Chow, Staff Software Engineer @ DoorDash
The magic could begin. Instead of spending effort configuring EC2 instances to run their apps, the idea was to move this complexity into the Docker container environment. The migration then came down to two steps.
Building the Docker Image
To briefly explain, a Docker image is a file used to execute code in a Docker container. It acts as a set of instructions for building a Docker container, like a template, and is the starting point when using Docker. An image is comparable to a snapshot in virtual machine (VM) environments.
So for the DoorDash team, the first step was to define the Docker image that would house their app, by writing a simple configuration file (a "Dockerfile"). The app being built on Django at the time, most of the work was figuring out how to install tricky third-party Python dependencies and how to set up the more complex software components in their stack (like web servers and databases).
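As an illustration, a Dockerfile for a Django app of that era might have looked like the sketch below. The base image, package names, and module paths are assumptions for the example, not DoorDash's actual configuration:

```dockerfile
# Hypothetical Dockerfile for a Django app; all names are illustrative.
FROM python:2.7

# System-level libraries that tricky Python dependencies often need
RUN apt-get update && apt-get install -y libpq-dev && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Install Python dependencies first, so this layer is cached between builds
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

EXPOSE 8000
CMD ["gunicorn", "myapp.wsgi:application", "--bind", "0.0.0.0:8000"]
```

Once such a file exists, `docker build -t myapp .` produces the image and `docker run -p 8000:8000 myapp` runs it, locally or on any server with a Docker daemon.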
Preparing the Docker Environment
Docker is both a development tool and a runtime environment. A running Docker container is an instantiation of an image (see step 1).
The second step was then to set up a Docker runtime environment on EC2. To save time, the DoorDash teams decided to use AWS OpsWorks (a configuration management service that provides managed instances of Chef and Puppet). The main benefit of OpsWorks for DoorDash was that it let them use Chef to automate how servers were configured, deployed, and managed across their EC2 instances. And while they could not avoid Chef entirely, they barely had to deal with it, because the vast majority of the system configuration was already defined inside their Docker image.
Now let's take a closer look at what the Docker container-based deployment flow looked like:
This code deployment flow mostly consisted of building a Docker image from their codebase and distributing it to the EC2 instances through a Docker image server. The only thing the DoorDash teams needed to do was write a few short Chef scripts to download the latest Docker image build and start up a new Docker container.
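Sketched as a shell script, the per-server deploy step those Chef scripts performed might look like this. The registry host, image, and container names are invented for illustration, and the real logic lived inside Chef recipes rather than a standalone file:

```shell
#!/bin/sh
# Hypothetical deploy step: pull the latest image build, swap containers.
# Registry, image, and container names are illustrative only.
set -e

IMAGE="registry.example.com/myapp:latest"

docker pull "$IMAGE"                    # fetch the newest image build
docker stop myapp 2>/dev/null || true   # stop the old container if it exists
docker rm myapp 2>/dev/null || true
docker run -d --name myapp -p 8000:8000 "$IMAGE"
```

Because the image already contains the whole system configuration, this handful of lines is essentially the entire deployment.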
- 2x performance gain compared to the Heroku environment
- DoorDash’s average API response time dropped from around 220ms to under 100ms
- Background task execution times dropped by half
- The hosting bill dropped dramatically (with more robust EC2 hardware, the team was able to run half the number of servers)
- With an extra degree of control over the software stack, the team could improve the web API throughput (by installing Nginx as a reverse proxy in front of the app servers)
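The Nginx reverse-proxy setup mentioned in the last point can be sketched as a minimal configuration. The ports and upstream layout below are assumptions for the example, not DoorDash's actual setup:

```nginx
# Hypothetical Nginx reverse proxy in front of the app servers.
upstream app_servers {
    server 127.0.0.1:8000;   # a local app-server process (e.g. gunicorn)
    server 127.0.0.1:8001;
}

server {
    listen 80;

    location / {
        proxy_pass http://app_servers;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

This is exactly the kind of stack-level tweak that a locked-down PaaS makes difficult and that root access on EC2 makes trivial.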
From conception to implementation, the migration from Heroku to AWS took the DoorDash team about a month with two full-time engineers. This also included learning Docker, AWS, integrating everything together, and testing (a lot).
The DoorDash teams were fully satisfied with their decision to move to AWS, and Docker was a key driver in doing it quickly and easily.
What we appreciated most about Docker was that once we got a container working locally, we were confident that it would work in a production environment as well. Because of this, we were able to make the switch with zero glitches or problems - Alvin Chow, Staff Software Engineer @ DoorDash
The team was able to greatly improve their server performance and gain much more control over their software stack, without having to manage and support a ton of DevOps and SysAdmin resources.
Another big advantage for the team was portability. Rather than being locked into a single hosting provider, they can now easily move an application (or its data) from one cloud provider to another without needing to rewrite or restructure it.
Docker is a powerful technology that significantly helped DoorDash (and can empower any developer) to take control of how they deploy their apps, while shrinking the marginal benefit of managed PaaS solutions such as Heroku.
Needless to say, Docker revolutionized software containerization and became the de facto choice for containerization. At Qovery, we leverage the full power of Docker containers by making it a core part of our product experience, allowing any developer to easily build and deploy great applications on AWS - in minutes instead of months.
Source: this story is inspired by the blog post by Alvin Chow, former Staff Software Engineer @ DoorDash.