
Coding Agents Write the Code. Who Verifies It Works? We Built the Answer.

Coding agents are good at reading a spec and producing code. But producing code is one step in a longer process. The real loop is Spec -> Code -> Deploy -> Test -> Verify -> Ship. Agents stop at step two.

Romaric Philogene

CEO & Co-founder

JUN 9, 2026 · 8 MIN

This article is part of our guide to Agentic Infrastructure.

Last week I assigned a Linear issue to a coding agent. Straightforward feature - add a filter to an API endpoint, update the tests, adjust the docs. Ten minutes later I had a PR. The code looked clean. The sandbox tests passed.

Then the real work started. I pulled the branch, spun up my local environment, configured the database seed, launched the app, and clicked through the feature manually. I found two edge cases the agent had missed, where the filter broke on empty arrays, then pushed fixes and waited for CI. The "10-minute PR" cost me an hour of verification.

The agent did the easy part.

Coding agents work

Coding agents work. They're not a gimmick.

40% of Cursor's internal PRs now come from cloud agents. Agent usage across their platform grew 15x in a single year. OpenAI's Codex runs tasks in parallel sandboxes, powered by codex-1 - a model trained via reinforcement learning on real software engineering tasks. Cisco, Temporal, and Superhuman use it in production. GitHub Copilot's coding agent lets you assign GitHub issues to it directly; it then works autonomously inside GitHub Actions, tapping into an ecosystem of 25,000+ actions. Claude Code plugs into any PR or issue with an @claude mention and generates working implementations.

For well-tested codebases with clear specs, these tools save real time. I use them daily. They're good at reading a specification and producing code.

But producing code is one step in a longer process.

What are coding agents?

Coding agents are AI systems that autonomously read a specification, write code, and open a pull request with minimal human intervention. Unlike a code-completion assistant that suggests the next line as you type, a coding agent takes a task end-to-end: it interprets the requirement, navigates the codebase, edits multiple files, runs tests in a sandbox, and produces a reviewable PR.

The current generation of AI coding agents includes:

GitHub Copilot coding agent - assign a GitHub issue and it works autonomously inside GitHub Actions.
OpenAI Codex - runs tasks in parallel sandboxes, powered by the codex-1 model.
Claude Code - triggered by an @claude mention in any PR or issue.
Cursor cloud agents - background agents that handle tasks in isolated VMs.
Gemini and OpenCode - additional runtimes gaining adoption.

These tools are genuinely effective at the generation step. Where every one of them stops short is the same place: verifying that the code actually works in a real environment. That gap - not code quality - is the real bottleneck, and it's what the rest of this article is about.

The broken loop

Every developer knows the real loop:

What developers do:
Spec -> Code -> Deploy -> Test -> Verify -> Ship ✓

Here's where coding agents stop:

What coding agents do:
Spec -> Code -> ???  🛑

Everything after the code is still on you. And "everything after the code" is where most of the risk lives. A function that passes unit tests in a sandbox can still break when it hits a real database with production-shaped data. A UI change that looks correct in isolation can blow up when it interacts with the actual authentication flow. An API endpoint that works in a clean room can time out when it talks to a real upstream service.
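To make the first of those failure modes concrete, here is a minimal Python sketch. Everything in it is hypothetical (the function names, the `tags` field, the sample records); it only illustrates how a filter can pass a unit test against a clean fixture and still break on production-shaped rows such as empty arrays, nulls, or missing fields.

```python
from typing import Any

Item = dict[str, Any]


def filter_by_primary_tag_naive(items: list[Item], tag: str) -> list[Item]:
    """Naive filter: assumes every item has a non-empty 'tags' list."""
    # Raises IndexError on an empty list, TypeError on None, KeyError if missing.
    return [item for item in items if item["tags"][0] == tag]


def filter_by_primary_tag_safe(items: list[Item], tag: str) -> list[Item]:
    """Defensive filter: treats missing, null, or empty 'tags' as no match."""
    matches = []
    for item in items:
        tags = item.get("tags") or []  # missing key or None becomes []
        if tags and tags[0] == tag:
            matches.append(item)
    return matches


# Hypothetical clean fixture of the kind a sandbox unit test would use.
SANDBOX_FIXTURE: list[Item] = [
    {"id": 1, "tags": ["billing"]},
    {"id": 2, "tags": ["auth"]},
]

# Hypothetical production-shaped data: same rows plus the messy ones.
PRODUCTION_SHAPED: list[Item] = SANDBOX_FIXTURE + [
    {"id": 3, "tags": []},
    {"id": 4, "tags": None},
    {"id": 5},
]

if __name__ == "__main__":
    # The sandbox test passes with the naive version.
    assert [i["id"] for i in filter_by_primary_tag_naive(SANDBOX_FIXTURE, "billing")] == [1]

    # The same code fails as soon as it meets production-shaped data.
    try:
        filter_by_primary_tag_naive(PRODUCTION_SHAPED, "billing")
    except (IndexError, TypeError, KeyError) as exc:
        print(f"naive filter failed: {type(exc).__name__}")

    print([i["id"] for i in filter_by_primary_tag_safe(PRODUCTION_SHAPED, "billing")])
```

Both versions agree on the clean fixture, which is why a sandbox test alone cannot tell them apart.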

The agents write code. The humans verify it works. We're back to doing the hard part manually.

The sandbox trap

Every major coding agent runs in some form of isolation. OpenAI's Codex launched with internet access completely disabled - the agent could only interact with code explicitly provided via GitHub repositories. They walked that back weeks later, adding limited network access, because the constraint was too restrictive for real work.

GitHub Copilot's coding agent limits internet access to "a trusted list of destinations." CI/CD workflows won't run without human approval. The agent can't push to existing branches - only ones it creates.

Claude Code runs on standard GitHub-hosted runners. Secure by default. No access to staging environments, no ability to deploy, no preview URLs.

Every agent operates in a clean room that looks nothing like the environment where its code will actually run.

What each agent can actually do today

Agent	Write Code	Run Unit Tests	Deploy to Real Env	Run E2E Tests	Preview URL
GitHub Copilot	Yes	Yes (CI)	No	No	No
OpenAI Codex	Yes	Yes (sandbox)	No	No	No
Claude Code	Yes	Yes (runner)	No	No	No
Cursor Cloud	Yes	Yes (VM)	No	No	No
Qovery Agent	Yes	Yes	Yes	Yes	Yes

The security-verification tradeoff

Sandboxing agents makes sense. Giving an autonomous AI agent unrestricted access to your infrastructure is a legitimate security concern. Codex isolates for safety. Copilot blocks CI until a human approves. Claude stays on GitHub's runners.

But security demands isolation, and verification demands access to real infrastructure. Every vendor chose security. Verification stays unsolved.

The result is a gap. The agent writes the code in a sandbox. A human deploys it to a real environment and verifies it works. The loop stays broken.

Cursor said it themselves

Cursor's engineering team published their findings from building and scaling cloud agents.

Cursor ended up building what they describe as "essentially enterprise IT for agents" - secret redaction, network policies, credential management. They migrated to Temporal for durable execution handling 50 million actions per day across 7 million workflows. All of that infrastructure to give agents a real working environment.

And even Cursor's solution addresses the development environment problem - giving agents access to repos, dependencies, and build tools. It doesn't solve the deployment verification problem: spinning up an ephemeral environment with real databases and real services, deploying the code, running end-to-end tests, and handing the reviewer a working preview URL.

That last mile - from "code that compiles" to "software that works in a real environment" - remains a manual human task across the entire market.
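As a rough illustration of what automating part of that last mile could look like, the Python sketch below waits for a deployed preview to answer a health check before any end-to-end tests run. It is an assumption-laden sketch, not any vendor's implementation: the function names, the retry parameters, and the idea of a `/health` endpoint are all hypothetical.

```python
import time
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """Return the HTTP status code for a GET request to `url`."""
    # urlopen raises HTTPError/URLError (both subclasses of OSError)
    # on 4xx/5xx responses and on connection failures.
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.status


def wait_until_healthy(
    fetch_status: Callable[[str], int],
    url: str,
    attempts: int = 5,
    delay_s: float = 2.0,
) -> bool:
    """Poll `url` until it returns 200 or the attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch_status(url) == 200:
                return True
        except OSError:
            pass  # the environment may still be starting up
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

Passing the fetch function in as a parameter keeps the gate testable without a network; in real use you would call something like `wait_until_healthy(http_status, preview_url + "/health")`, where the URL shape is again an assumption.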

😩 Before: reviewing a 2,000-line diff and hoping it works in production

😎 After: clicking a preview URL and seeing it work

Give your coding agents real environments.

Qovery Agent connects your AI coding agent to real infrastructure - with governance, audit trails, and preview URLs. Runs on your Kubernetes cluster.

Closing the loop

I've been building infrastructure-as-a-service for 5 years at Qovery. We manage deployments on Kubernetes for thousands of engineering teams. When I saw coding agents hitting this wall, the answer was obvious: they need the same thing developers need. Environments.

This is what the Qovery Agent does. Here's the actual flow:

You write a spec as a Linear issue
You assign it to the Qovery Agent - it shows up as a real member of your Linear workspace, mentionable and assignable like any teammate
The agent claims the issue, reports that it's starting work, and spins up an ephemeral environment on your Kubernetes cluster via Qovery
Inside that environment, your coding agent runs - Claude Code, Codex, Cursor, Gemini, or OpenCode
It clones the repo, creates a branch, implements the task, deploys the application, and runs the tests
It opens a PR. The reviewer gets a preview URL pointing to a live, deployed version of the feature
Progress shows up in Linear in real time - thoughts, actions, errors, a step-by-step checklist
After the PR is open, the environment stays alive for a configurable grace period, then auto-cleans

Linear Issue -> Qovery Agent -> K8s Environment -> Your Agent (Claude / Codex / Cursor) -> Code + Deploy + Test -> PR with Preview URL -> Human Reviews Working Software
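The flow above can be sketched as a small in-memory simulation in Python. This is not the Qovery Agent's code or API. The class names, the preview URL format, the one-hour default grace period, and the choice to skip the PR when tests fail are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Environment:
    name: str
    preview_url: str
    expires_at: Optional[datetime] = None  # set once the grace period starts


@dataclass
class Run:
    issue_id: str
    log: list[str] = field(default_factory=list)
    environment: Optional[Environment] = None
    pr_opened: bool = False

    def report(self, message: str) -> None:
        # Stand-in for posting progress back to the issue tracker.
        self.log.append(message)


def run_issue(
    issue_id: str,
    tests_pass: bool,
    now: datetime,
    grace: timedelta = timedelta(hours=1),
) -> Run:
    """Walk one issue through claim -> environment -> code -> deploy -> test -> PR."""
    run = Run(issue_id=issue_id)
    run.report("claimed issue")

    env_name = f"agent-{issue_id.lower()}"
    run.environment = Environment(
        name=env_name,
        preview_url=f"https://{env_name}.preview.example.com",  # hypothetical URL shape
    )
    run.report("ephemeral environment ready")
    run.report("branch created and task implemented")
    run.report("application deployed")

    if not tests_pass:
        # Assumption: a failed run reports back instead of opening a PR.
        run.report("tests failed; PR not opened")
        return run

    run.report("tests passed")
    run.pr_opened = True
    run.report(f"PR opened with preview {run.environment.preview_url}")
    run.environment.expires_at = now + grace  # grace period starts at PR time
    return run


def should_clean_up(env: Environment, now: datetime) -> bool:
    """True once the grace period after the PR has elapsed."""
    return env.expires_at is not None and now >= env.expires_at
```

Recording each step in `run.log` mirrors the step-by-step checklist the reviewer sees, and keeping the expiry on the environment makes the auto-clean decision a single comparison.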

The key point: Qovery Agent works with your coding agent. You bring your own. Connect your Anthropic API key and use Claude Code. Connect your OpenAI key and use Codex. Prefer Cursor or Gemini? Plug them in. Five runtimes are supported today. The agent you already use gets the infrastructure it's been missing - it can now deploy, test against real services, and verify its own work.

Security without the tradeoff

Qovery gives agents governed access on your own infrastructure.

Everything the Qovery Agent does runs on your Kubernetes cluster. No data leaves your infrastructure. Inside each workspace container, an HTTP proxy intercepts all outbound agent traffic with full governance controls:

DLP filters catch API keys, private keys, and sensitive file paths before they can leak
Domain allowlists and blocklists control what the agent can reach
A kill switch immediately blocks all outbound traffic if something goes wrong
A real-time approval queue lets admins review and approve specific requests
Every agent request is logged in a full audit trail
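The controls listed above can be pictured as a single decision function that every outbound request passes through. The Python sketch below is a simplified illustration under stated assumptions, not Qovery's proxy: the class names, the order of the checks, the three verdicts, and the example DLP patterns are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from urllib.parse import urlparse

# Illustrative DLP patterns only; a real rule set would be much broader.
DLP_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                # shape of an AWS access key ID
    re.compile(r"\.ssh/id_[a-z0-9]+"),                  # sensitive file path
]


@dataclass
class Policy:
    allowlist: set[str] = field(default_factory=set)
    blocklist: set[str] = field(default_factory=set)
    needs_approval: set[str] = field(default_factory=set)
    kill_switch: bool = False


@dataclass
class GovernedProxy:
    policy: Policy
    audit_log: list[tuple[str, str, str]] = field(default_factory=list)

    def decide(self, url: str, body: str = "") -> str:
        """Return 'allow', 'block', or 'pending' and record the decision."""
        host = urlparse(url).hostname or ""
        if self.policy.kill_switch:
            verdict, reason = "block", "kill switch engaged"
        elif host in self.policy.blocklist:
            verdict, reason = "block", "domain on blocklist"
        elif any(pattern.search(body) for pattern in DLP_PATTERNS):
            verdict, reason = "block", "DLP filter matched"
        elif host in self.policy.needs_approval:
            verdict, reason = "pending", "queued for admin approval"
        elif host in self.policy.allowlist:
            verdict, reason = "allow", "domain on allowlist"
        else:
            verdict, reason = "block", "domain not on allowlist"
        # Every request is logged, whatever the verdict.
        self.audit_log.append((url, verdict, reason))
        return verdict
```

Checking the kill switch first means a single flag overrides every other rule, and defaulting to `block` for unknown domains keeps the sketch deny-by-default.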

The agent gets a real environment. The environment is inside your infrastructure. You define the governance rules. The security-verification tradeoff dissolves.

What changes

The developer loop becomes:

Linear Issue -> Qovery Agent -> Code + Deploy + Test -> PR with Preview URL -> Human Reviews Working Software

The human's job shifts from "pull the branch, spin up an environment, verify the code works" to "click the preview URL, review working software, decide if it ships."

It's bidirectional too. If the agent gets stuck or the reviewer wants changes, they send a message from Linear. The agent wakes up, reads the instructions, and iterates. No context switching, no terminal juggling.

Coding agents solved the code generation problem. The environment problem is what's left. That's what we built - see Qovery for AI coding agents.

Frequently asked questions

What is a coding agent?

A coding agent is an AI system that autonomously takes a software task from specification to pull request - interpreting the requirement, editing code across files, running tests, and opening a reviewable PR. It goes beyond autocomplete-style assistants by handling the full generation step on its own, typically inside a sandboxed environment.

What are the best coding agents in 2026?

The leading coding agents are GitHub Copilot's coding agent, OpenAI Codex, Claude Code, and Cursor's cloud agents, with Gemini and OpenCode gaining adoption. They differ in how they're triggered and where they run, but all of them are strong at generating code and limited by the same constraint: they can't deploy to or verify against real environments on their own.

Can coding agents deploy and test their own code?

Not by default. Every major coding agent runs in an isolated sandbox for security, which means no access to staging environments, real databases, or preview URLs. They can write code and run unit tests, but deploying to a real environment and running end-to-end verification remains a manual human task - unless you connect them to a platform like the Qovery Agent that provides governed, real environments.

What is the difference between a coding agent and an AI coding assistant?

An AI coding assistant (like inline autocomplete) suggests code as you type and keeps the human in control of every step. A coding agent works autonomously: you hand it a task and it produces a finished PR. Agents take on more of the loop, which is exactly why the missing "deploy and verify" step matters so much more for them.

About the author

Romaric Philogene

Romaric founded Qovery to make Kubernetes accessible to every engineering team. He writes about platform strategy, developer experience, and the future of cloud infrastructure.
