
Coding Agents Write the Code. Who Verifies It Works? We Built the Answer.

Coding agents are good at reading a spec and producing code. But producing code is one step in a longer process. The real loop is Spec -> Code -> Deploy -> Test -> Verify -> Ship. Agents stop at step two.

Romaric Philogene
CEO & Co-founder
JUN 9, 2026 · 8 MIN
Last week I assigned a Linear issue to a coding agent. Straightforward feature - add a filter to an API endpoint, update the tests, adjust the docs. Ten minutes later I had a PR. The code looked clean. The sandbox tests passed.

Then the real work started. I pulled the branch, spun up my local environment, configured the database seed, launched the app, clicked through the feature manually, found two edge cases the agent missed where the filter broke on empty arrays, pushed fixes, waited for CI. The "10-minute PR" cost me an hour of verification.

The agent did the easy part.

Coding agents work

Coding agents work. They're not a gimmick.

40% of Cursor's internal PRs now come from cloud agents. Agent usage across their platform grew 15x in a single year. OpenAI's Codex runs tasks in parallel sandboxes, powered by codex-1 - a model trained via reinforcement learning on real software engineering tasks. Cisco, Temporal, and Superhuman use it in production. GitHub Copilot's coding agent lets you assign GitHub issues directly and it works autonomously inside GitHub Actions, tapping into an ecosystem of 25,000+ actions. Claude Code plugs into any PR or issue with an @claude mention and generates working implementations.

For well-tested codebases with clear specs, these tools save real time. I use them daily. They're good at reading a specification and producing code.

But producing code is one step in a longer process.

The broken loop

Every developer knows the real loop:

What developers do:
Spec -> Code -> Deploy -> Test -> Verify -> Ship

Here's where coding agents stop:

What coding agents do:
Spec -> Code -> ??? 🛑

Everything after the code is still on you. And "everything after the code" is where most of the risk lives. A function that passes unit tests in a sandbox can still break when it hits a real database with production-shaped data. A UI change that looks correct in isolation can blow up when it interacts with the actual authentication flow. An API endpoint that works in a clean room can time out when it talks to a real upstream service.
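
The empty-array bug from my opening anecdote is a typical case. As a hypothetical sketch only (the helper names `naive_in_clause` and `safe_in_clause` are invented for illustration, and the `%s` placeholder style is an assumption about the database driver), here is one common way a list filter can pass happy-path tests and still fail against a real database: with an empty list, the naive helper emits `IN ()`, which PostgreSQL rejects as a syntax error.

```python
from typing import List, Sequence, Tuple


def naive_in_clause(column: str, values: Sequence[str]) -> Tuple[str, List[str]]:
    """Build a parameterized IN clause. Emits invalid SQL when values is empty."""
    # `column` is assumed to be a trusted identifier, never user input.
    placeholders = ", ".join(["%s"] * len(values))
    return f"{column} IN ({placeholders})", list(values)


def safe_in_clause(column: str, values: Sequence[str]) -> Tuple[str, List[str]]:
    """Same helper, but an empty filter list becomes a clause that matches no rows."""
    if not values:
        # An empty list means "nothing can match", so skip IN () entirely.
        return "FALSE", []
    placeholders = ", ".join(["%s"] * len(values))
    return f"{column} IN ({placeholders})", list(values)


# Happy path: both helpers agree, so a test with one or two values passes.
print(naive_in_clause("status", ["open", "closed"]))

# Edge case: the naive helper yields "status IN ()", which only fails
# once the query reaches a real PostgreSQL instance.
print(naive_in_clause("status", []))
print(safe_in_clause("status", []))
```

A test suite that only asserts on non-empty lists never sees the difference. Running the query against a real database does.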

The agents write code. The humans verify it works. We're back to doing the hard part manually.

The sandbox trap

Every major coding agent runs in some form of isolation. OpenAI's Codex launched with internet access completely disabled - the agent could only interact with code explicitly provided via GitHub repositories. They walked that back weeks later, adding limited network access, because the constraint was too restrictive for real work.

GitHub Copilot's coding agent limits internet access to "a trusted list of destinations." CI/CD workflows won't run without human approval. The agent can't push to existing branches - only ones it creates.

Claude Code runs on standard GitHub-hosted runners. Secure by default. No access to staging environments, no ability to deploy, no preview URLs.

Every agent operates in a clean room that looks nothing like the environment where its code will actually run.

What each agent can actually do today

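| Agent | Write Code | Run Unit Tests | Deploy to Real Env | Run E2E Tests | Preview URL |
| --- | --- | --- | --- | --- | --- |
| GitHub Copilot | Yes | Yes (CI) | No | No | No |
| OpenAI Codex | Yes | Yes (sandbox) | No | No | No |
| Claude Code | Yes | Yes (runner) | No | No | No |
| Cursor Cloud | Yes | Yes (VM) | No | No | No |
| Qovery Agent | Yes | Yes | Yes | Yes | Yes |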

The security-verification tradeoff

Sandboxing agents makes sense. Giving an autonomous AI agent unrestricted access to your infrastructure is a legitimate security concern. Codex isolates for safety. Copilot blocks CI until a human approves. Claude stays on GitHub's runners.

But security demands isolation, and verification demands access to real infrastructure. Every vendor chose security. Verification stays unsolved.

The result is a gap. The agent writes the code in a sandbox. A human deploys it to a real environment and verifies it works. The loop stays broken.

Cursor said it themselves

Cursor's engineering team published their findings from building and scaling cloud agents.

Cursor ended up building what they describe as "essentially enterprise IT for agents" - secret redaction, network policies, credential management. They migrated to Temporal for durable execution handling 50 million actions per day across 7 million workflows. All of that infrastructure to give agents a real working environment.

And even Cursor's solution addresses the development environment problem - giving agents access to repos, dependencies, and build tools. It doesn't solve the deployment verification problem: spinning up an ephemeral environment with real databases and real services, deploying the code, running end-to-end tests, and handing the reviewer a working preview URL.

That last mile - from "code that compiles" to "software that works in a real environment" - remains a manual human task across the entire market.

Before: reviewing a 2,000-line diff and hoping it works in production.
After: clicking a preview URL and seeing it work.

Give your coding agents real environments.
Qovery Agent connects your AI coding agent to real infrastructure - with governance, audit trails, and preview URLs. Runs on your Kubernetes cluster.

Closing the loop

I've been building infrastructure-as-a-service for 5 years at Qovery. We manage deployments on Kubernetes for thousands of engineering teams. When I saw coding agents hitting this wall, the answer was obvious: they need the same thing developers need. Environments.

This is what the Qovery Agent does. Here's the actual flow:

  1. You write a spec as a Linear issue
  2. You assign it to the Qovery Agent - it shows up as a real member of your Linear workspace, mentionable and assignable like any teammate
  3. The agent claims the issue, reports that it's starting work, and spins up an ephemeral environment on your Kubernetes cluster via Qovery
  4. Inside that environment, your coding agent runs - Claude Code, Codex, Cursor, Gemini, or OpenCode
  5. It clones the repo, creates a branch, implements the task, deploys the application, runs the tests
  6. It opens a PR. The reviewer gets a preview URL pointing to a live, deployed version of the feature
  7. Progress shows up in Linear in real time - thoughts, actions, errors, a step-by-step checklist
  8. After the PR is open, the environment stays alive for a configurable grace period, then auto-cleans

Linear Issue -> Qovery Agent -> K8s Environment -> Your Agent (Claude / Codex / Cursor)
-> Code + Deploy + Test -> PR with Preview URL -> Human Reviews Working Software
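
To make the ordering of those steps concrete, here is a minimal sketch of the flow as a sequential pipeline. It is not Qovery's implementation or API: `Step`, `RunReport`, `run_issue`, and the handler functions are hypothetical names, and the handlers are stand-ins for the real work (creating the environment, running the coding agent, and so on). The point it illustrates is that a PR only counts as ready for review once deploy and tests have actually run.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Optional


class Step(Enum):
    # Definition order is execution order (Enum iteration preserves it).
    CLAIM_ISSUE = "claim issue"
    CREATE_ENVIRONMENT = "create ephemeral environment"
    RUN_CODING_AGENT = "run coding agent"
    DEPLOY = "deploy application"
    RUN_TESTS = "run tests"
    OPEN_PR = "open PR with preview URL"
    SCHEDULE_CLEANUP = "schedule cleanup after grace period"


@dataclass
class RunReport:
    issue_id: str
    completed: List[Step] = field(default_factory=list)
    failed_at: Optional[Step] = None
    error: Optional[str] = None

    @property
    def ready_for_review(self) -> bool:
        # Reviewable only if nothing failed and the PR step was reached.
        return self.failed_at is None and Step.OPEN_PR in self.completed


def run_issue(
    issue_id: str,
    handlers: Dict[Step, Callable[[str], None]],
    report_progress: Callable[[str], None] = print,
) -> RunReport:
    """Run each step in order, reporting progress and stopping at the first failure."""
    report = RunReport(issue_id)
    for step in Step:
        report_progress(f"[{issue_id}] {step.value}")
        try:
            handlers[step](issue_id)
        except Exception as exc:
            report.failed_at = step
            report.error = str(exc)
            report_progress(f"[{issue_id}] failed at '{step.value}': {exc}")
            break
        report.completed.append(step)
    return report


def noop(issue_id: str) -> None:
    """Placeholder handler standing in for real work."""


def failing_tests(issue_id: str) -> None:
    raise RuntimeError("filter breaks on empty arrays")


ok_report = run_issue("ENG-1", {step: noop for step in Step})
bad_report = run_issue("ENG-2", {**{step: noop for step in Step}, Step.RUN_TESTS: failing_tests})
print(ok_report.ready_for_review, bad_report.failed_at)
```

This sketch stops at the first failing step and leaves teardown of a half-built environment out of scope.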

The key point: Qovery Agent works with your coding agent. You bring your own. Connect your Anthropic API key and use Claude Code. Connect your OpenAI key and use Codex. Prefer Cursor or Gemini? Plug them in. Five runtimes are supported today. The agent you already use gets the infrastructure it's been missing - it can now deploy, test against real services, and verify its own work.

Security without the tradeoff

Qovery gives agents governed access on your own infrastructure.

Everything the Qovery Agent does runs on your Kubernetes cluster. No data leaves your infrastructure. Inside each workspace container, an HTTP proxy intercepts all outbound agent traffic with full governance controls:

  • DLP filters catch API keys, private keys, and sensitive file paths before they can leak
  • Domain allowlists and blocklists control what the agent can reach
  • A kill switch immediately blocks all outbound traffic if something goes wrong
  • A real-time approval queue lets admins review and approve specific requests
  • Every agent request is logged in a full audit trail
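
As a rough illustration of how those controls could compose into a single decision per outbound request, here is a hedged sketch. It is not Qovery's proxy code: `EgressPolicy`, its fields, the verdict strings, the order of the checks, and the two secret patterns are all assumptions made for the example, and the domains are placeholders.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Illustrative DLP patterns only: a PEM private-key header and the
# well-known shape of an AWS access key ID (AKIA + 16 characters).
SECRET_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
]


def _domain_matches(host: str, domains: Set[str]) -> bool:
    """True if host equals a listed domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


@dataclass
class EgressPolicy:
    allowed_domains: Set[str]
    blocked_domains: Set[str] = field(default_factory=set)
    kill_switch: bool = False
    audit_log: List[Tuple[str, str]] = field(default_factory=list)

    def decide(self, host: str, body: str) -> str:
        """Return a verdict for one outbound request and record it in the audit log."""
        verdict = self._evaluate(host, body)
        self.audit_log.append((host, verdict))  # every request is logged
        return verdict

    def _evaluate(self, host: str, body: str) -> str:
        if self.kill_switch:
            return "block: kill switch"
        if _domain_matches(host, self.blocked_domains):
            return "block: blocklisted domain"
        if any(p.search(body) for p in SECRET_PATTERNS):
            return "block: possible secret in payload"
        if _domain_matches(host, self.allowed_domains):
            return "allow"
        # Unknown destinations wait for a human instead of failing silently.
        return "pending: needs admin approval"


policy = EgressPolicy(
    allowed_domains={"api.anthropic.com", "github.com"},
    blocked_domains={"pastebin.com"},
)
print(policy.decide("api.github.com", "GET /repos"))
print(policy.decide("example.org", "GET /"))
```

Checking the kill switch first means flipping one flag overrides every allowlist entry, which is the behavior you want from an emergency stop.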

The agent gets a real environment. The environment is inside your infrastructure. You define the governance rules. The security-verification tradeoff dissolves.

What changes

The developer loop becomes:

Linear Issue -> Qovery Agent -> Code + Deploy + Test -> PR with Preview URL -> Human Reviews Working Software

The human's job shifts from "pull the branch, spin up an environment, verify the code works" to "click the preview URL, review working software, decide if it ships."

It's bidirectional too. If the agent gets stuck or the reviewer wants changes, they send a message from Linear. The agent wakes up, reads the instructions, and iterates. No context switching, no terminal juggling.

Coding agents solved the code generation problem. The environment problem is what's left. That's what we built.


About the author
Romaric Philogene

Romaric founded Qovery to make Kubernetes accessible to every engineering team. He writes about platform strategy, developer experience, and the future of cloud infrastructure.
