6 Comments
Steve Latz's avatar

If a major payments application is making 1,000 changes a week to its codebase, even if those changes are "thoroughly" tested, that indicates a very problematic approach to their foundation architecture as well as their change management processes.

SilverLionApps's avatar

I appreciated this article, but it was disappointing that there's no discussion of how Stripe handles the biggest bottleneck (imo) in this approach: the code review. Knowing how Stripe handles reviewing those 1,000 PRs/week would be valuable insight. I'm sure I could take guesses at what they do, but I hope there's a follow-up article on this topic.

JohnWick's avatar

You are right. That would be a good and vital insight considering that number of PRs. IMO, when the work is mostly run by AI with less human intervention, review comes down to behavioural and spec review through scenarios rather than code review itself. It likely sits within the feedback layer stack.

Lina Dikhtiaruk's avatar

Blueprints that mix deterministic guardrails with agentic loops are the architecture everyone building agent workflows should study.

Mitchell Kosowski's avatar

The attended vs. unattended framing is the big deal here. Many are still babysitting AI coding tools, watching every step and course-correcting constantly. Stripe skipped ahead to "fire and forget."

Important to note the model isn't what makes it work. It's the years of investment in deterministic environments, strong CI gates, and structured tooling that were built for humans. That infrastructure accidentally became the perfect runway for autonomous agents.

Companies with great developer tooling are now getting compounding returns they never planned for. That's a strong case for investing in infra even when the payoff isn't obvious yet.

Pawel Jozefiak's avatar

1,300 PRs a week from AI agents. The infrastructure behind that must be enormous.

I run a single AI agent on a dedicated Mac Mini and even that required serious infrastructure work. Display permissions for headless operation, virtual displays for screen access, 25 LaunchAgents running background services. And that's for one agent.

The coordination problem Stripe solved is what interests me most. When you scale agents, the bottleneck isn't capability. It's environment management. Each agent needs its own context, permissions, and state. Multiply that by hundreds and you need a whole platform team just for agents.

Makes me wonder how many of those 1,300 PRs are fixing issues from other agent PRs. At scale, agent-to-agent coordination is the real challenge.