What Shopify River Teaches Us About Internal AI Agent Architecture — and What It Changes for Insurance, Banking, and Telecom

6/28/2026

Tags: Shopify, River, AI Agents, Architecture, Insurance, Banking, Telecom

Jules Kovac

Business Analyst, Founder

Why This Matters Now

Shopify made a bet in 2024 that seemed absurd at the time: migrate the entire company to a single monorepo ("World") and impose Nix everywhere for reproducibility. The reason? "Code is going to be increasingly written with AI, and our infrastructure needs to be the substrate for that."

Two years later, that bet paid off. "Under the River," published on Shopify's engineering blog on May 28, 2026, and co-authored by the agent itself, details the infrastructure that lets River, Shopify's internal AI agent, run across the whole company.

The numbers: 59,918 sessions in 30 days, 5,170 Slack channels, more than 7,000 people reached, and 3,536 agent-co-authored PRs merged. But for the AI process architect, the numbers are not what matters. The architecture underneath is.

 

The 4-Layer Architecture

Here's the full flow from a question asked to @River in Slack to a co-authored commit:

slack { # Slack - Interface
n1: circle label:"@River in Slack"
n2: rectangle label:"Public channel question"
n11: rectangle label:"Response received"
n1.handle(right) -> n2.handle(left)
n2.handle(bottom) -> river.n3.handle(top) [label="1. Request"]
}

river { # River - Agent Surface
n3: rectangle label:"Analyzes request"
n4: rectangle label:"Delegates to Aquifer"
n10: rectangle label:"Formats response"
n3.handle(right) -> n4.handle(left)
n4.handle(bottom) -> aquifer.n5.handle(top) [label="2. Session"]
aquifer.n9.handle(top) -> n10.handle(bottom) [label="5. Result"]
n10.handle(top) -> slack.n11.handle(bottom) [label="6. Response"]
}

aquifer { # Aquifer - Foundation
n5: rectangle label:"Harness plans"
n6: rectangle label:"Sandbox executes"
n7: diamond label:"Done?"
n8: rectangle label:"Consolidates"
n9: rectangle label:"Returns result"
n5.handle(right) -> n6.handle(left)
n6.handle(bottom) -> world.n12.handle(top) [label="3. Read/write"]
n6.handle(right) -> n7.handle(left)
n7.handle(top) -> n5.handle(bottom) [label="No"]
n7.handle(bottom) -> n8.handle(top) [label="Yes"]
n8.handle(right) -> n9.handle(left)
}

world { # World - Monorepo
n12: rectangle label:"Executes on repo"
n13: rectangle label:"Runs tests - opens PR"
n12.handle(right) -> n13.handle(left)
n13.handle(top) -> aquifer.n7.handle(bottom) [label="4. Output"]
}

Four layers, six cross-lane messages. None of these layers is the model. The model is an implementation detail inside the Harness. The architecture is the decision that counts.
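To make the flow concrete, here is a minimal Python sketch of the same four layers as plain functions. Everything in it is hypothetical: the function names, the fixed three-step plan, and the stubbed `world_execute` stand in for components the article describes only at the architecture level. What it shows is the shape of the diagram: the Harness plans, the Sandbox executes against the repo, the "Done?" check loops back, and only the consolidated result travels back up to Slack.

```python
# Hypothetical sketch of the four-layer flow; no real Shopify APIs are used.
from dataclasses import dataclass, field


@dataclass
class AquiferRun:
    """State of one Aquifer session working on a single request."""
    request: str
    outputs: list[str] = field(default_factory=list)


def world_execute(step: str) -> str:
    # Stand-in for the monorepo layer: pretend to run the step and its tests.
    return f"ok: {step}"


def harness_plan(run: AquiferRun) -> str:
    # Hypothetical planner: a fixed three-step plan instead of an LLM call.
    plan = ["read code", "edit file", "run tests and open PR"]
    return plan[len(run.outputs)]


def is_done(run: AquiferRun) -> bool:
    # The "Done?" diamond: done once all three steps have produced output.
    return len(run.outputs) == 3


def aquifer_session(request: str) -> str:
    run = AquiferRun(request)
    while not is_done(run):                      # "No" loops back to the Harness
        step = harness_plan(run)
        run.outputs.append(world_execute(step))  # Sandbox reads/writes World
    return "; ".join(run.outputs)                # consolidates, returns result


def river_handle(question: str) -> str:
    # River analyzes the request, delegates to Aquifer, then formats the reply.
    result = aquifer_session(question)
    return f"@River: {result}"


print(river_handle("Why is checkout flaky?"))
```

Note that `river_handle` never touches the repo directly; only the Aquifer loop does, which mirrors the lane boundaries in the diagram.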

 

Decision 1: Agent-Friendly = Human-Friendly (the counterintuitive pattern)

Shopify discovered that every change made for agents was also the right change for humans:

Human Problem | Agent Problem | Shared Solution
Unreproducible dev environment | Agent can't reproduce anything either | Nix everywhere
Fragmented repo | Agent can't see across silos | Monorepo "World"
Undocumented knowledge | Agent can't learn what was never written down | Written skills files

"The work to make a codebase legible to an agent is simply the debt you owe to your human engineers. Agents make that debt visible." — Shopify Engineering

What this changes for the AI process architect: Stop building agents that compensate for technical debt. Build infrastructure that makes the codebase legible to humans and agents alike. The return on that investment is dual-purpose.

Insurance transposition: An insurance company that documents its underwriting rules for an agent also documents them for its human underwriters. The rule repository becomes the shared asset.
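A minimal sketch of that shared asset, assuming rules are stored as structured records. The `Rule` type, its field names, and the two sample rules are invented for illustration and are not real underwriting guidance. The same source renders a readable guide for human underwriters and a compact context block for an agent, so neither audience works from a second, drifting copy.

```python
# Hypothetical sketch: one rule repository, two renderings.
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One underwriting rule; all fields are illustrative assumptions."""
    rule_id: str
    condition: str
    action: str
    rationale: str


RULES = [
    Rule("UW-001", "applicant age < 18", "decline", "Illustrative eligibility rule."),
    Rule("UW-002", "more than 3 claims in 5 years", "refer to a senior underwriter",
         "Illustrative escalation rule."),
]


def render_for_humans(rules: list[Rule]) -> str:
    # Readable guide: one paragraph per rule, rationale included.
    return "\n\n".join(
        f"{r.rule_id}: If {r.condition}, {r.action}.\nWhy: {r.rationale}"
        for r in rules
    )


def render_for_agent(rules: list[Rule]) -> str:
    # Compact, one-line-per-rule form suited to an agent's context window.
    return "\n".join(f"[{r.rule_id}] IF {r.condition} THEN {r.action}" for r in rules)


print(render_for_humans(RULES))
print(render_for_agent(RULES))
```

Editing a rule once updates both outputs, which is the dual-purpose return described above.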

 

Decision 2: A Private Agent Has a Ceiling

River has a radical constraint: it only works in public channels. No direct messages. Every conversation becomes an indexed Slack transcript, searchable by all Shopify employees.

Why? Because a private agent has a ceiling: the person at the keyboard.

"If every interaction with an agent happens in a private window, the only person who learns anything is the person at the keyboard." — Tobi Lütke

Shopify mines this public conversation corpus. One developer's hard-won fix becomes the next developer's starting point. The agent improves without model fine-tuning, simply by absorbing patterns from the corpus. The codebase teaches the agent. The agent teaches the codebase.
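The article does not say how Shopify mines the corpus, so the following is only a sketch of the idea under a simple assumption: past public sessions are stored as (channel, transcript) records, and a new question is matched against them by word overlap before the agent starts. A production system would more likely use a search index or embeddings, and the sample transcripts are invented. The point is that retrieved transcripts, not a fine-tuned model, carry one developer's fix forward to the next.

```python
# Hypothetical sketch: surfacing earlier public sessions for a new question.
import re
from dataclasses import dataclass


@dataclass
class Transcript:
    """A past public session; the fields are assumptions for this sketch."""
    channel: str
    text: str


def tokens(s: str) -> set[str]:
    # Lowercase alphanumeric words, used for a crude overlap score.
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def related_sessions(question: str, corpus: list[Transcript], k: int = 2) -> list[Transcript]:
    # Rank past sessions by how many words they share with the new question.
    q = tokens(question)
    scored = [(len(q & tokens(t.text)), t) for t in corpus]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in scored[:k]]


corpus = [
    Transcript("#payments", "Flaky checkout test fixed by resetting the Redis mock"),
    Transcript("#infra", "Nix shell fails on M1: pin the toolchain version"),
    Transcript("#search", "Reindex job timed out after schema change"),
]

for t in related_sessions("checkout test is flaky again", corpus):
    print(t.channel, "->", t.text)
```

The retrieved transcripts would then be prepended to the new session's context, which is only possible because the original conversations were public.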

What this changes for the AI process architect: Your next internal agent should be public by default. An agent session's "privacy" is a disadvantage — it prevents collective learning. The conversation corpus is a compounding asset.

Banking transposition: In a bank, compliance officers and developers work in separate channels. A public internal agent forces cross-functional visibility: a regulatory question asked by a developer becomes visible to the entire compliance team. The corpus becomes institutional memory.

 

Decision 3: Decouple Brain from Hands

This is Aquifer's central architectural decision. Shopify decomposes the infrastructure into three entities:

- Session → Durable. An append-only event log in Postgres, and the canonical source of truth.
- Harness → The agent loop. Reads the history, calls the LLM, emits tool intents. Disposable.
- Sandbox → Where code runs: filesystem, shell, the repo. Disposable.
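A minimal sketch of the three entities, assuming an in-memory list in place of the Postgres event log and a scripted function in place of the LLM call. `SessionLog`, `Sandbox`, `fake_llm`, `run_harness_turn`, and the event shapes are all hypothetical. The property it illustrates: the Harness holds no state of its own. It rebuilds the conversation from the log, emits a tool intent, and appends whatever the Sandbox returns, so both the Harness and the Sandbox can be thrown away between turns.

```python
# Hypothetical sketch of Session / Harness / Sandbox separation.
from dataclasses import dataclass, field


@dataclass
class SessionLog:
    """Durable, append-only event log (a list standing in for Postgres)."""
    events: list[dict] = field(default_factory=list)

    def append(self, event: dict) -> None:
        self.events.append(dict(event))  # store a copy so callers can't mutate history

    def replay(self) -> list[dict]:
        return [dict(e) for e in self.events]


class Sandbox:
    """Disposable executor: it runs intents and never sees the log."""

    def execute(self, intent: dict) -> str:
        if intent["tool"] == "bash":
            return f"ran `{intent['cmd']}`"
        return f"unsupported tool {intent['tool']!r}"


def fake_llm(history: list[dict]) -> dict:
    # Stand-in for the model: request one command, then answer.
    if not any(e["type"] == "tool_result" for e in history):
        return {"tool": "bash", "cmd": "bin/test checkout"}  # hypothetical command
    return {"answer": "Tests pass."}


def run_harness_turn(log: SessionLog, sandbox: Sandbox) -> str:
    # A fresh, disposable Harness: all context comes from the log.
    while True:
        decision = fake_llm(log.replay())
        if "answer" in decision:
            log.append({"type": "answer", "text": decision["answer"]})
            return decision["answer"]
        log.append({"type": "tool_intent", **decision})
        log.append({"type": "tool_result", "output": sandbox.execute(decision)})


log = SessionLog()
log.append({"type": "user", "text": "Are the checkout tests green?"})
print(run_harness_turn(log, Sandbox()))
print([e["type"] for e in log.events])
```

Because every intent and result lands in the log before the turn ends, a crashed Harness or Sandbox loses nothing that a new instance cannot replay.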

dev { # Dev - Slack
n1: circle label:"@River mention"
n2: rectangle label:"Agent request"
n12: rectangle label:"Receives response"
n1.handle(right) -> n2.handle(left)
n2.handle(bottom) -> harness.n3.handle(top) [label="Prompt"]
}

harness { # Harness - Brain
n3: rectangle label:"Loads history"
n4: rectangle label:"Calls LLM"
n5: rectangle label:"Emits tool intents"
n11: rectangle label:"Responds to dev"
n3.handle(right) -> n4.handle(left)
n4.handle(right) -> n5.handle(left)
n5.handle(bottom) -> sandbox.n6.handle(top) [label="bash, edit"]
sandbox.n10.handle(top) -> n11.handle(bottom) [label="Result"]
n11.handle(top) -> dev.n12.handle(bottom) [label="Response"]
}

sandbox { # Sandbox - Hands
n6: rectangle label:"Sets up environment"
n7: rectangle label:"Runs commands"
n8: diamond label:"OK?"
n9: rectangle label:"Raw output"
n10: rectangle label:"Returns output"
n6.handle(right) -> n7.handle(left)
n7.handle(right) -> n8.handle(left)
n8.handle(right) -> n9.handle(left) [label="Yes"]
n8.handle(bottom) -> n6.handle(top) [label="No - retry"]
n9.handle(right) -> n10.handle(left)
}

The Harness lives outside the Sandbox. The agent doesn't live where the code lives.

This pattern isn't unique to Shopify. Anthropic applies it through tool use and computer use: the model emits tool intents, and a separate execution environment controlled by the developer carries them out. OpenAI does the same with Operator, where the "brain" (the model reasoning about the task) is decoupled from the "browser" that executes actions. Hermes Agent (Nous Research) applies the same principle through its profiles: the SOUL.md (brain) is distinct from the skills and the execution sandbox.

The key difference: Shopify pushes the decoupling all the way to the session. In a typical integration built on Anthropic's stateless Messages API, the conversation history lives in whichever client process drives the loop. At Shopify, the session outlives the process.

Three properties that follow from this, and you can't retrofit any of them:

- Security: The agent loop is not in the same blast radius as rm -rf
- Replaceability: Swap models, runtimes, even languages on the Harness side without touching the Sandbox
- Observability: The entire decision stream lives on the Harness side, visible in one place
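The replaceability property can be sketched with a structural interface, assuming the Harness talks to models only through a small `Model` protocol. The protocol and both backends are hypothetical stand-ins, not Shopify's or any vendor's API. Swapping the backend changes nothing on the Sandbox side, because the Sandbox only ever receives tool intents.

```python
# Hypothetical sketch: the Harness depends on an interface, not a vendor.
from typing import Protocol


class Model(Protocol):
    def next_intent(self, prompt: str) -> dict: ...


class BackendA:
    def next_intent(self, prompt: str) -> dict:
        return {"tool": "bash", "cmd": "git status"}


class BackendB:
    def next_intent(self, prompt: str) -> dict:
        # A different model may phrase the same intent differently.
        return {"tool": "bash", "cmd": "git status --short"}


ALLOWED_TOOLS = {"bash"}


def sandbox_execute(intent: dict) -> str:
    # Sandbox contract: accepts intents, knows nothing about which model sent them.
    if intent["tool"] not in ALLOWED_TOOLS:
        raise PermissionError(intent["tool"])
    return f"executed: {intent['cmd']}"


def harness_step(model: Model, prompt: str) -> str:
    return sandbox_execute(model.next_intent(prompt))


for backend in (BackendA(), BackendB()):
    print(type(backend).__name__, harness_step(backend, "What changed?"))
```

The same boundary also carries the security property: an intent for a tool outside `ALLOWED_TOOLS` is rejected by the Sandbox no matter which model produced it.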

What this changes for the AI process architect: If your agent lives in the same process as code execution, you won't be able to retrofit safety and observability later. Start with this boundary.

Telecom transposition: A network diagnostic agent at a telecom operator must NEVER have direct access to equipment. The Harness analyzes logs and emits intents ("check interface X"); the Sandbox executes them in an isolated environment with precise permissions. The audit trail is in the Harness, not the Sandbox.
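A sketch of that split, assuming intents are small records, the Sandbox exposes a fixed set of read-only diagnostic actions, and the Harness records every intent and its outcome in its own audit list. `Intent`, `DiagnosticSandbox`, the action names, and the device name are hypothetical. A real deployment would enforce permissions inside the isolated environment itself, not in a Python set.

```python
# Hypothetical sketch: intents flow out, the audit trail stays in the Harness.
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Intent:
    action: str
    target: str


class DiagnosticSandbox:
    """Isolated executor with a fixed, read-only permission set."""
    PERMITTED = {"check_interface", "read_logs"}

    def run(self, intent: Intent) -> str:
        if intent.action not in self.PERMITTED:
            return "denied"
        return f"{intent.action} {intent.target}: status up"  # canned output


class Harness:
    def __init__(self, sandbox: DiagnosticSandbox):
        self.sandbox = sandbox
        self.audit: list[tuple[str, str, str, str]] = []  # lives on the Harness side

    def dispatch(self, intent: Intent) -> str:
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit.append((stamp, "intent", intent.action, intent.target))
        outcome = self.sandbox.run(intent)
        self.audit.append((stamp, "outcome", intent.action, outcome))
        return outcome


h = Harness(DiagnosticSandbox())
h.dispatch(Intent("check_interface", "edge-router-7/ge-0/0/1"))
h.dispatch(Intent("restart_interface", "edge-router-7/ge-0/0/1"))
for row in h.audit:
    print(row[1:])
```

The denied restart still appears in the trail, so an auditor sees what the agent attempted, not only what the equipment allowed.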

 

Decision 4: Cattle, Not Pets — at the Session Level

The key line from the article: "Cells die, sandboxes die, machines die. The conversation doesn't."

Aquifer's session model is radically simple:

- A Session Cell = an ephemeral process on a host, running the Go runtime and the Harness
- Idle → it exits. Next interaction → fresh Cell, possibly on a different host
- The session identity is unchanged. The conversation is fully preserved in Postgres, not in memory.

"We don't nurse individual processes. We provision, run, suspend, destroy, and re-provision them, and we do it on a foundation that makes this cheap."

What this changes for the AI process architect: Don't build a monolithic agent that keeps everything in memory. Build a foundation that treats the session as the durable entity, and everything else as disposable.
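A minimal sketch of the Cell lifecycle, assuming a dict keyed by session ID stands in for Postgres and a `Cell` object stands in for the ephemeral process. The names, the idle threshold, and the host labels are hypothetical. A Cell keeps nothing that matters: once it has been idle past the threshold it is treated as exited, and the next message provisions a fresh Cell, possibly on another host, that continues the same conversation from the durable store.

```python
# Hypothetical sketch: disposable Cells, durable session.
import itertools
from dataclasses import dataclass

DURABLE_STORE: dict[str, list[str]] = {}   # stand-in for the Postgres session log
HOSTS = itertools.cycle(["host-a", "host-b"])
IDLE_LIMIT = 300.0                          # seconds; hypothetical threshold


@dataclass
class Cell:
    """Ephemeral process for one session; safe to destroy at any time."""
    session_id: str
    host: str
    last_active: float

    def handle(self, message: str, now: float) -> int:
        DURABLE_STORE.setdefault(self.session_id, []).append(message)
        self.last_active = now
        return len(DURABLE_STORE[self.session_id])  # turns in the session so far


live_cells: dict[str, Cell] = {}


def on_message(session_id: str, message: str, now: float) -> tuple[str, int]:
    cell = live_cells.get(session_id)
    if cell is None or now - cell.last_active > IDLE_LIMIT:
        # Never started, or idle long enough to have exited: provision a fresh Cell.
        cell = Cell(session_id, next(HOSTS), now)
        live_cells[session_id] = cell
    return cell.host, cell.handle(message, now)


print(on_message("s1", "first question", now=0.0))
print(on_message("s1", "follow-up", now=60.0))
print(on_message("s1", "next day", now=90_000.0))
```

The third message lands on a different host, yet the turn count keeps climbing: the session identity and history survive, the process does not.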

The semantic bridge with Hermes: a Hermes Agent profile works on the same principle. The SOUL.md, skills, and persistent memory survive across sessions; the Hermes process is disposable, while the conversation and configuration are durable. Shopify and Hermes arrived at the same pattern independently, a strong sign that it is structural rather than an implementation preference.

 

Decision 5: The Next Agent Is a Profile, Not a Platform

Once River shipped, other Shopify teams wanted their own:

- PR review agents
- Research agents
- Migration agents
- Compliance scans
- Performance investigations

All variants of the same idea: agentic workflows against the monorepo, in Slack, durable, multiplayer.

Shopify's answer: Aquifer, the platform. River is one profile on top. PR review is another profile. The headless "pi" agent is a third.

river { # Interactive Mode
n1: circle label:"@River Slack"
n2: rectangle label:"Public session"
n1.handle(right) -> n2.handle(left)
n2.handle(bottom) -> aquifer.n7.handle(top) [label="via Aquifer"]
}

review { # Automation Mode
n3: circle label:"PR Webhook"
n4: rectangle label:"Automated review"
n3.handle(right) -> n4.handle(left)
n4.handle(bottom) -> aquifer.n8.handle(top) [label="via Aquifer"]
}

ci { # Batch Mode
n5: circle label:"CI Trigger"
n6: rectangle label:"Ephemeral batch"
n5.handle(right) -> n6.handle(left)
n6.handle(bottom) -> aquifer.n9.handle(top) [label="via Aquifer"]
}

aquifer { # Aquifer - Shared Foundation
n7: rectangle label:"Session model"
n8: rectangle label:"Sandbox plane"
n9: rectangle label:"Gateway"
n7.handle(right) -> n8.handle(left)
n8.handle(right) -> n9.handle(left)
}

A profile = system prompt + skills + extensions + sandbox policy + model defaults. Adding a new agent = adding a bundle, not building a new platform.
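Read literally, that formula is a data structure. A minimal sketch, assuming a frozen dataclass with the five ingredients the article lists; the prompts, skill names, extensions, and model identifiers are placeholders, and Shopify's actual profiles are Nix bundles rather than Python objects. Both agents go through one shared `start_session` entry point (a hypothetical name); only the bundle differs.

```python
# Hypothetical sketch: a new agent is a new bundle, not a new platform.
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    system_prompt: str
    skills: tuple[str, ...]
    extensions: tuple[str, ...]
    sandbox_policy: frozenset[str]   # tools the sandbox will accept
    model_defaults: dict


RIVER = Profile(
    system_prompt="Answer in public Slack threads.",
    skills=("monorepo-navigation", "open-pr"),
    extensions=("slack",),
    sandbox_policy=frozenset({"bash", "edit"}),
    model_defaults={"model": "placeholder-large"},
)

PR_REVIEW = Profile(
    system_prompt="Review the diff and comment.",
    skills=("code-review",),
    extensions=("github-webhook",),
    sandbox_policy=frozenset({"bash"}),   # read and test, no edits
    model_defaults={"model": "placeholder-fast"},
)


def start_session(profile: Profile, trigger: str) -> dict:
    # One shared foundation: the same entry point for every profile.
    return {
        "trigger": trigger,
        "prompt": profile.system_prompt,
        "tools": sorted(profile.sandbox_policy),
        "model": profile.model_defaults["model"],
    }


print(start_session(RIVER, "@River mention"))
print(start_session(PR_REVIEW, "PR webhook"))
```

Adding a third agent in this sketch means defining a third `Profile` value; nothing in `start_session` changes.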

What this changes for the AI process architect: If your second agent forces a second platform, you haven't built the foundation yet.

The semantic bridge across ecosystems:

- Shopify calls it a "profile" (Nix bundle with prompt + skills + sandbox policy)
- Hermes Agent also calls it a "profile" — SOUL.md + skills + plugins + cron + memories
- OpenAI calls it a "custom GPT" — instructions + knowledge files + actions
- Anthropic has no direct equivalent — Claude Code and the Claude API are distinct products, not profiles on a unified foundation

The Shopify/Hermes convergence on the "profile" concept is striking — two independent ecosystems arriving at the same architectural pattern. The divergence with Anthropic is equally instructive: without a "profile" layer, every new agent forces a new integration.

 

The AI Process Architect's Roadmap

Shopify proved that a large-scale internal agent deployment is not a model problem — it's an architecture problem. Three priorities, extracted directly from the article and transposed to regulated sectors:

Priority | What It Means in Practice | Critical Sector
1. Decouple brain from hands | The Harness does NOT live in the Sandbox. Safety, replaceability, and observability are not optional: you cannot retrofit them. | Banking: the regulatory audit trail lives in the Harness. Insurance: the human underwriter validates Harness decisions before Sandbox execution.
2. Make the agent multiplayer by construction | A private agent has a ceiling: the person at the keyboard. A public agent teaches every session that follows. The corpus is the compounding asset. | Telecom: the network diagnostic corpus grows richer with every incident. Retail: merchant catalog questions become the collective knowledge base.
3. Treat the next agent as a profile | The cost of a new agent should be a new bundle on the same foundation. Not a new platform. | All sectors: if your compliance agent and your developer agent don't share the same foundation, you're building the infrastructure twice, and creating two attack surfaces.

And the meta-lesson, the one that runs through the entire article: the session is the thing that must survive. Not the process, not the sandbox, not the model. The conversation. If you don't build around this idea, you'll rebuild everything later.

 

Why This Architecture Matters to Regulated Sectors

The Shopify case study isn't just an e-commerce lesson. The five architectural decisions described in "Under the River" transfer directly to three sectors FlowZap prioritizes:

 

Insurance

An insurance company deploying an internal agent for underwriting or claims handling faces the same problem as Shopify: how do you immutably audit an agent's actions? The answer is in Decision 3 — the Harness (which contains the decision flow) is separated from the Sandbox (which executes). The audit trail is in the Harness. The auditor doesn't look at what the Sandbox did — they look at what the Harness decided.
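The article describes the session as an append-only log in Postgres; it does not say whether entries are cryptographically chained. If an insurer needs the Harness trail to be tamper-evident for an auditor, one common technique is a hash chain, where each entry stores the SHA-256 of the previous one. A minimal sketch, with hypothetical claim IDs and decision records:

```python
# Hypothetical sketch: a hash-chained, tamper-evident decision trail.
import hashlib
import json

GENESIS = "0" * 64


def entry_hash(prev_hash: str, record: dict) -> str:
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: list[dict], record: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def verify(chain: list[dict]) -> bool:
    # Recompute every link: an edited record no longer matches its stored hash,
    # and a recomputed hash no longer matches the next entry's "prev".
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True


trail: list[dict] = []
append(trail, {"claim": "C-1042", "decision": "request_documents"})
append(trail, {"claim": "C-1042", "decision": "approve", "approved_by": "human"})
print(verify(trail))                         # True
trail[0]["record"]["decision"] = "approve"   # tamper with history
print(verify(trail))                         # False
```

Anchoring the latest hash somewhere the Harness cannot write (for example, a periodic export to the auditor) would make a full rewrite of the chain detectable as well.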

 

Banking

A bank deploying an agent for regulatory compliance or credit analysis needs Decision 2: the agent must be public by default. The regulatory question asked by a developer becomes visible to the entire compliance team. The agent conversation corpus becomes the bank's institutional memory — searchable, auditable, presentable to the regulator.

 

Telecom

A telecom operator deploying a network diagnostic agent needs Decision 5: the next agent is a profile, not a platform. The N1 diagnostic agent (first level) and the N2 escalation agent (second level) share the same Aquifer-like foundation. The N1 profile has restricted permissions (read-only access to logs). The N2 profile has expanded permissions (it can restart interfaces). Same foundation, different sandbox policies.
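A minimal sketch of "same foundation, different sandbox policies", assuming a profile is a small frozen record and the N2 profile is derived from N1 with `dataclasses.replace`, so the only fields that change are the name and the permission set. `DiagProfile`, `sandbox_gate`, and the action names are hypothetical.

```python
# Hypothetical sketch: N2 is N1 with a wider sandbox policy.
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DiagProfile:
    name: str
    system_prompt: str
    allowed_actions: frozenset[str]


N1 = DiagProfile(
    name="n1-diagnostic",
    system_prompt="Diagnose network incidents from the available evidence.",
    allowed_actions=frozenset({"read_logs", "check_interface"}),
)

# N2 reuses everything from N1 except the name and the sandbox policy.
N2 = replace(N1, name="n2-escalation",
             allowed_actions=N1.allowed_actions | {"restart_interface"})


def sandbox_gate(profile: DiagProfile, action: str) -> bool:
    # The shared foundation enforces whatever policy the profile carries.
    return action in profile.allowed_actions


for profile in (N1, N2):
    print(profile.name, sandbox_gate(profile, "restart_interface"))
```

Deriving N2 from N1 keeps the two profiles from drifting apart on everything except the one thing that should differ: what the sandbox permits.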

 
