The Leak That Changed the Conversation
On March 31, 2026, developers reported that Claude Code v2.1.88 had shipped to npm with a 59.8 MB JavaScript source map, exposing the internal structure of Anthropic's coding-agent CLI. Public reporting and project documentation around the incident described the exposed package as containing roughly 512,000 lines of TypeScript across about 1,900 source files, which immediately turned a packaging mistake into a design document for the broader developer community.
That is the part most people miss. The real story was never just that code leaked; it was that the leak made the harness legible. Once developers could see the moving parts, it became obvious that the durable innovation was not "an LLM in a terminal," but the orchestration layer that turns a model into an operator.
What Claude Code Actually Is under the Hood
Anthropic describes Claude Code as an agentic coding tool that lives in the terminal, understands your codebase, and helps by executing routine tasks, explaining code, and handling git workflows through natural-language commands. In other words, Claude Code is not just autocomplete with a nicer interface; it is a runtime that can inspect files, decide on actions, call tools, run commands, and keep iterating until a task is complete.
That distinction matters because the user experience is emergent. The model generates reasoning and instructions, but the harness decides what context gets loaded, which tools are allowed, how outputs stream back, when permission is required, how memory persists, and how the agent recovers from failure. That is the machinery users actually feel.
Why Claw Code Took Off
Claw Code presents itself as a clean-room, open-source rewrite of the Claude Code harness architecture, built from scratch in Python and Rust rather than as a direct fork of Anthropic's proprietary codebase. The project page explicitly frames it as "architecture, reimagined," and the associated repository materials note that development continued through community-maintained channels during repository ownership changes.
That framing is why the project exploded as a cultural moment. Claw Code was not interesting merely because it mirrored a famous product; it was interesting because it turned a previously opaque category into something inspectable, modifiable, and portable. The moment a harness can be reimplemented in the open, vendor exclusivity starts to look less like a moat and more like a temporary packaging advantage.
Harness Engineering Explained
Harness engineering is the discipline of connecting model intelligence to reliable action. It sits between the raw LLM and the real world, and its job is to make the agent useful without making it dangerous, brittle, or impossible to debug.
In practical terms, the harness answers questions like these:
- What instructions are assembled before a turn starts?
- What memory is loaded, compacted, or discarded?
- Which tools exist, and what schema do they expose?
- When does the agent need explicit user approval?
- How are subprocesses sandboxed?
- How are streaming responses rendered back to the terminal?
- How do slash commands, sessions, and provider switching work?
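One way to make those questions concrete is to fold them into a per-turn configuration object. The sketch below is illustrative only; all names are invented for this article and do not come from any real codebase.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: each field answers one of the harness questions above.
@dataclass
class TurnConfig:
    system_instructions: str                         # what instructions are assembled
    memory_layers: list[str] = field(default_factory=lambda: ["session", "compact"])
    allowed_tools: dict[str, dict] = field(default_factory=dict)  # name -> schema
    approval_required: set[str] = field(default_factory=lambda: {"shell", "file_write"})
    sandboxed: bool = True                           # subprocess isolation on by default
    stream_output: bool = True                       # render responses incrementally

def needs_approval(cfg: TurnConfig, tool: str) -> bool:
    """The harness, not the model, decides when the user must confirm."""
    return tool in cfg.approval_required

cfg = TurnConfig(system_instructions="You are a coding agent.")
print(needs_approval(cfg, "shell"))       # True: shell actions need confirmation
print(needs_approval(cfg, "file_read"))   # False: reads pass through
```

The point of the sketch is that every one of these questions has a policy answer that lives outside the model.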
Claw Code's public architecture descriptions point directly at those layers: command handling, tool plugins, model abstraction, a query engine, task management, session persistence, permissions, MCP integration, and a Rust runtime for performance-critical execution paths.
That is why the term "harness" matters so much. The harness is what turns a frontier model into a system with posture, policy, and repeatability.
The Core Architecture
Claw Code describes a split design in which Python handles orchestration and Rust handles runtime-critical work. The published architecture overview names Python-side modules such as commands.py, tools.py, models.py, query_engine.py, task.py, and main.py, while a Rust core handles the lower-level runtime path.
That split is not stylistic; it is operational.
| Layer | Primary role | Why it belongs there |
|---|---|---|
| Python orchestration | Command registry, prompt assembly, model routing, session logic, task lifecycle. | Python is fast to evolve, easy to inspect, and well-suited to agent control flow. |
| Rust runtime | Performance-sensitive execution, tool runtime behavior, streaming, permissions, protocol plumbing. | Rust gives tighter control over latency, isolation, concurrency, and safety. |
A clean way to think about it is this: Python decides what should happen next, and Rust makes sure it happens safely, consistently, and fast. That is the essence of a production-grade agent harness.
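That division of labor can be sketched as an interface boundary. In a real system the runtime side would be a Rust extension module; here a pure-Python stand-in fills that slot, and every name is invented for illustration.

```python
from typing import Protocol

# Illustrative boundary: Python defines *what* must happen next; the runtime
# behind this Protocol (a Rust core in Claw Code's described design)
# guarantees *how* it happens.
class Runtime(Protocol):
    def execute(self, tool: str, args: dict) -> dict: ...
    def stream(self, event: dict) -> None: ...

class InProcessRuntime:
    """Pure-Python stand-in so orchestration logic can be developed and
    tested without the native runtime present."""
    def __init__(self) -> None:
        self.events: list[dict] = []

    def execute(self, tool: str, args: dict) -> dict:
        return {"tool": tool, "ok": True, "output": f"ran {tool}"}

    def stream(self, event: dict) -> None:
        self.events.append(event)

def run_step(rt: Runtime, tool: str, args: dict) -> dict:
    result = rt.execute(tool, args)   # the native runtime would sandbox here
    rt.stream({"type": "tool_result", "tool": tool})
    return result

print(run_step(InProcessRuntime(), "read_file", {"path": "README.md"})["ok"])  # True
```

Because the boundary is an interface rather than a pile of shared globals, either side can be swapped or hardened independently.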
Anatomy of a Task
The architecture becomes much clearer if you follow a single task from input to output.
- The user enters a command in the terminal.
- The session layer loads instructions, prior transcript state, and relevant context.
- The prompt layer assembles the current turn for the model.
- The query engine sends the request, handles streaming, and tracks budget or retries.
- The model returns either a direct response or an intended action.
- The planner routes that action through the tool system.
- The permission layer decides whether the action is allowed, denied, or requires confirmation.
- The runtime executes the tool in an isolated context and streams observations back.
- The agent consumes those observations, updates memory, and either ends the task or iterates again.
This loop is the product. The model is inside the loop, but the loop itself is the real software asset.
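The steps above reduce to a surprisingly small control loop. This is a minimal sketch under stated assumptions, with the model stubbed out; the real loop adds budgets, streaming, and compaction, and none of these names come from an actual implementation.

```python
# Minimal sketch of the harness control loop: plan, gate, act, observe, repeat.
def agent_loop(model, tools, permit, max_turns=8):
    memory = []
    for _ in range(max_turns):
        action = model(memory)                       # model proposes the next step
        if action["type"] == "respond":
            return action["text"]                    # task complete
        if not permit(action["tool"]):
            memory.append({"error": f"{action['tool']} denied"})
            continue                                 # harness, not model, said no
        observation = tools[action["tool"]](action["args"])
        memory.append({"observation": observation})  # feed results back into context
    return "stopped: turn limit reached"

# Stub model: read a file on the first turn, then answer.
def fake_model(memory):
    if not memory:
        return {"type": "tool", "tool": "read", "args": "config.toml"}
    return {"type": "respond", "text": "done"}

result = agent_loop(fake_model, {"read": lambda p: f"contents of {p}"}, lambda t: True)
print(result)  # done
```

Everything outside the `model(...)` call is harness code, which is exactly the point.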
See It in FlowZap Code
The following diagram is a conceptual rendering of the Claw Code harness workflow based on the project's published architecture overview. It captures the control loop and layer boundaries at the systems-design level, but it does not map one-to-one onto the implementation.
User { # User
n1: circle label="Start"
n2: rectangle label="Enter CLI command"
n3: rectangle label="View streamed output"
n4: circle label="Done"
n1.handle(right) -> n2.handle(left)
n2.handle(bottom) -> Python.n5.handle(top) [label="Command"]
n3.handle(right) -> n4.handle(left)
}
Python { # Python Orchestration
n5: rectangle label="Load session context"
n6: rectangle label="Build prompt"
n7: rectangle label="Call model"
n8: diamond label="Tool needed?"
n9: rectangle label="Route tool call"
n10: rectangle label="Update memory"
n11: rectangle label="Finalize response"
n5.handle(right) -> n6.handle(left)
n6.handle(right) -> n7.handle(left)
n7.handle(bottom) -> External.n15.handle(top) [label="LLM request"]
n8.handle(bottom) -> n9.handle(top) [label="Yes"]
n8.handle(right) -> n11.handle(left) [label="No"]
n9.handle(bottom) -> Rust.n12.handle(top) [label="Tool call"]
n10.handle(left) -> n7.handle(top) [label="Iterate"]
n10.handle(right) -> n11.handle(left)
n11.handle(top) -> User.n3.handle(bottom) [label="Answer"]
}
Rust { # Rust Runtime
n12: diamond label="Permission granted?"
n13: rectangle label="Execute tool in sandbox"
n14: rectangle label="Stream tool events"
n12.handle(right) -> n13.handle(left) [label="Yes"]
n12.handle(top) -> Python.n11.handle(bottom) [label="Denied"]
n13.handle(bottom) -> External.n16.handle(top) [label="File access"]
n13.handle(right) -> n14.handle(left)
n13.handle(top) -> External.n17.handle(bottom) [label="MCP call"]
n14.handle(top) -> Python.n10.handle(bottom) [label="Observations"]
n14.handle(left) -> User.n3.handle(right) [label="Stream"]
}
External { # External Systems
n15: rectangle label="LLM provider"
n16: rectangle label="File system"
n17: rectangle label="MCP server"
n15.handle(left) -> Python.n8.handle(right) [label="Plan"]
n16.handle(left) -> Rust.n14.handle(right) [label="File data"]
n17.handle(left) -> Rust.n14.handle(bottom) [label="Tool data"]
}
The Python Layer
According to the public architecture overview, the Python workspace is where the high-level agent behavior lives: command registration, tool definitions, model abstraction, query orchestration, and task management. This is the layer that gives the system its flexibility, because it is where prompts, policies, session rules, and workflow semantics can change quickly without rebuilding the low-level runtime.
You can think of the Python layer as five cooperating subsystems:
- Command registry: Maps slash commands and terminal inputs to structured behaviors.
- Model abstraction: Lets the agent target different providers behind a common interface.
- Query engine: Handles request construction, streaming, retries, and turn-level control.
- Task manager: Tracks lifecycle, iteration, and when a task should stop or continue.
- Memory/session manager: Preserves context across turns and compacts transcripts when needed.
This is also the layer where open-source transparency becomes powerful. When developers can inspect orchestration logic directly, they can swap providers, alter prompts, change permission defaults, rewrite tool routing, or build custom commands without waiting for a vendor roadmap.
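The model-abstraction subsystem is the clearest example of that flexibility. The sketch below shows the shape of a provider-agnostic layer; the registry, class names, and the echo backend are all invented for illustration.

```python
# Hedged sketch of provider-agnostic model routing: one interface, many
# backends, selected by name rather than hard-coded into orchestration.
class ModelProvider:
    def complete(self, prompt: str) -> str:
        raise NotImplementedError

class EchoProvider(ModelProvider):
    """Stand-in backend for tests; a real adapter would call a provider API."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

REGISTRY: dict[str, type[ModelProvider]] = {"echo": EchoProvider}

def get_model(name: str) -> ModelProvider:
    # Swapping providers becomes a lookup, not a rewrite of the harness.
    return REGISTRY[name]()

print(get_model("echo").complete("hi"))  # echo: hi
```

Registering a new provider means adding one adapter class, which is why provider lock-in weakens once the harness is open.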
The Rust Layer
Claw Code's public materials describe a Rust core for performance-critical runtime paths, including streaming, protocol handling, and tool execution concerns. The same materials also mention a multi-crate Rust workspace and an active migration path toward a more fully native runtime.
Why does that matter? Because once an agent can actually do things on your machine, runtime quality becomes non-negotiable.
The Rust layer is where systems concerns tend to land:
- Permission enforcement: Gate dangerous actions before they happen.
- Execution runtime: Run shell or tool actions with tighter operational guarantees.
- Streaming: Push token and event output back to the terminal without lag or state corruption.
- Protocol adapters: Support external integrations such as MCP transports and API client behavior.
- Safety and reliability: Reduce crash-prone glue code in the hottest execution path.
If Python is the brain of orchestration, Rust is the skeleton and musculature underneath it.
The Tool System
One of the clearest architectural signals in the public project description is the emphasis on a plugin-based tool system with permission-gated capabilities. The Claw Code site describes 19 built-in tools and treats each capability—such as file I/O, shell execution, git operations, web access, or agent spawning—as a self-contained tool with explicit controls.
That matters because good agent tooling is not "give the model bash and pray." A mature harness makes tools first-class objects with:
- A schema.
- A name and purpose.
- Permission semantics.
- Execution boundaries.
- Observable output.
- Error handling.
- Clear return data for the next reasoning step.
This is one of the reasons the harness layer is such a big deal. The more structured the tool system becomes, the less the agent feels like autocomplete and the more it feels like an operating environment.
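A schema-first tool, with the properties listed above, can be sketched in a few lines. Every name here is illustrative rather than taken from any real tool system.

```python
from dataclasses import dataclass
from typing import Callable

# Sketch of a first-class tool object: schema, purpose, permission
# semantics, execution boundary, and explicit error handling.
@dataclass
class Tool:
    name: str
    purpose: str
    schema: dict                    # JSON-schema-style argument contract
    requires_approval: bool
    run: Callable[[dict], dict]

def read_file(args: dict) -> dict:
    # Execution boundary: validate arguments before touching the system.
    if "path" not in args:
        return {"ok": False, "error": "missing 'path'"}
    return {"ok": True, "content": f"<contents of {args['path']}>"}

READ = Tool(
    name="file_read",
    purpose="Read a file from the workspace",
    schema={"type": "object", "properties": {"path": {"type": "string"}}},
    requires_approval=False,
    run=read_file,
)

print(READ.run({"path": "main.py"})["ok"])   # True
print(READ.run({})["error"])                 # missing 'path'
```

The structured return value matters as much as the schema: it is what the next reasoning step consumes.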
Permission Gating
The permission system is not just a UX feature; it is a security model. Claw Code's public materials highlight permission-gated tools and a policy engine with multiple permission modes, deny lists, and interactive prompting.
That tells you a lot about what serious agent builders are optimizing for. The safest architecture is not one where the model "knows better"; it is one where the model is structurally prevented from bypassing policy. In other words, the LLM can propose actions, but the harness remains the final authority.
That design has several benefits:
- It keeps policy outside the model.
- It makes behavior more auditable.
- It reduces trust in prompt obedience.
- It creates clean checkpoints for human approval.
- It lowers the blast radius of hallucinated or overconfident tool calls.
This is one of the deepest lessons from the whole Claw Code moment: do not rely on the model to police itself.
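A toy version of such a policy engine makes the structure obvious. The modes, deny-list entries, and function names below are invented for this sketch; the only claim is the shape of the decision flow.

```python
# Illustrative policy engine: modes, a deny list, and interactive prompting,
# with the harness (not the model) holding final authority.
DENY_LIST = {"shell:rm -rf", "net:raw"}

def decide(mode: str, tool: str, ask_user) -> str:
    """Return 'allow' or 'deny'; the model never gets a vote."""
    if tool in DENY_LIST:
        return "deny"                    # hard policy: cannot be talked around
    if mode == "auto":
        return "allow"
    if mode == "readonly":
        return "allow" if tool.startswith("read:") else "deny"
    # default interactive mode: a clean checkpoint for human approval
    return "allow" if ask_user(tool) else "deny"

print(decide("auto", "shell:rm -rf", ask_user=lambda t: True))        # deny
print(decide("readonly", "read:file", ask_user=lambda t: True))       # allow
print(decide("interactive", "write:file", ask_user=lambda t: False))  # deny
```

Note that the deny list wins even in `auto` mode: policy sits below every other layer, including user convenience.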
Query Engine and Turn Loop
The Claw Code project page describes the query engine as the central intelligence of the system, responsible for LLM calls, response streaming, caching, orchestration, turn limits, and budget controls. That makes it the traffic controller of the harness, because it sits between prompt assembly, model invocation, and tool-driven iteration.
Much of the product behavior users perceive as "the model being smart" actually lives here:
- How much context gets sent.
- When prior transcript is compacted.
- How retries happen.
- When the system stops looping.
- How budgets are enforced.
- How streamed output is surfaced to the user.
This is why two tools using the same underlying model can feel radically different. The query engine shapes tempo, discipline, and cost.
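The budget and turn-limit discipline in particular is simple to sketch. The limits below are made-up numbers for illustration, not figures from any real engine.

```python
# Sketch of the stopping discipline a query engine enforces around the loop.
class Budget:
    def __init__(self, max_turns: int, max_tokens: int):
        self.turns, self.tokens = 0, 0
        self.max_turns, self.max_tokens = max_turns, max_tokens

    def charge(self, tokens: int) -> bool:
        """Record one model call; return False when the loop must stop."""
        self.turns += 1
        self.tokens += tokens
        return self.turns <= self.max_turns and self.tokens <= self.max_tokens

b = Budget(max_turns=3, max_tokens=1000)
print(b.charge(400))  # True: within budget
print(b.charge(400))  # True: still within budget
print(b.charge(400))  # False: token budget exceeded, the engine stops looping
```

Putting this check in the engine rather than in prompts is what makes cost and runaway-loop behavior predictable.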
Memory and Sessions
The public Claw Code materials also emphasize session persistence, transcript compaction, and multi-layer memory. That is important because memory in agent systems is not one thing; it is usually several layers with different lifetimes and purposes.
A useful way to frame it is:
- Turn memory: What the model sees right now.
- Session memory: What persists through the current workflow.
- Compact memory: A summarized version of prior interaction used to save tokens.
- Discovered context: Files, docs, or code pulled in because the current task needs them.
Once you see memory this way, the harness becomes easier to reason about. Good agent products are not just "long context"; they are selective context systems.
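Transcript compaction, the bridge between session memory and compact memory, can be sketched in a few lines. The summarization here is a placeholder string; a real system would ask a model to summarize, and the function name is invented.

```python
# Sketch of layered memory: the full transcript persists on disk, but the
# turn context the model sees is rebuilt selectively each time.
def compact(transcript: list[str], keep_recent: int = 2) -> list[str]:
    """Collapse older turns into one summary line; keep recent turns verbatim."""
    if len(transcript) <= keep_recent:
        return transcript
    old, recent = transcript[:-keep_recent], transcript[-keep_recent:]
    summary = f"[summary of {len(old)} earlier turns]"
    return [summary] + recent

session = ["turn 1", "turn 2", "turn 3", "turn 4"]
print(compact(session))  # ['[summary of 2 earlier turns]', 'turn 3', 'turn 4']
```

The selectivity is the feature: tokens spent on stale history are tokens not available for the files the current task actually needs.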
Multi-Agent Orchestration
Claw Code's project materials describe support for spawning sub-agents, or "swarms," to parallelize complex work in isolated contexts with shared memory access patterns. That points to a more advanced harness model in which one parent loop can delegate research, code edits, validation, or analysis to subordinate workers before merging the results.
This is a major architectural step up from single-threaded tool calling. Once you introduce subagents, you need:
- Isolation.
- Result aggregation.
- Shared but bounded context.
- Failure containment.
- Supervisory logic for when to spawn, wait, retry, or abort.
That is why multi-agent systems live or die by harness quality. The orchestration logic gets exponentially more important as concurrency rises.
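A minimal sketch of that supervisory logic, using thread-based workers for illustration: the task names are invented, and a real harness would run sub-agents in isolated processes or contexts rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of parent/sub-agent delegation with failure containment.
def subagent(task: str) -> dict:
    if task == "broken":
        raise RuntimeError("worker failed")
    return {"task": task, "result": f"{task} done"}

def run_swarm(tasks: list[str]) -> list[dict]:
    results = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {pool.submit(subagent, t): t for t in tasks}
        for fut, task in futures.items():       # submission order preserved
            try:
                results.append(fut.result(timeout=5))
            except Exception as exc:
                # Failure containment: one bad worker cannot kill the parent.
                results.append({"task": task, "error": str(exc)})
    return results

out = run_swarm(["research", "broken", "edit"])
print([r.get("error", r.get("result")) for r in out])
# ['research done', 'worker failed', 'edit done']
```

Even this toy shows the pattern: the parent aggregates results, bounds waiting with a timeout, and treats worker failure as data rather than as a crash.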
MCP and External Systems
The public Claw Code site highlights full MCP support with multiple transport types including stdio, SSE, HTTP, WebSocket, SDK, and proxy-based access modes. That matters because MCP turns the harness into a connector layer, not just a local CLI wrapper.
Once MCP is native, the agent is no longer limited to local shell and file operations. It can plug into external tool servers, data systems, internal platforms, and service-specific adapters through a standard protocol. That is why people increasingly describe MCP as a portability layer for agent ecosystems: it gives the harness a stable way to reach the outside world.
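Underneath every MCP transport sits JSON-RPC 2.0 message framing. The sketch below builds one `tools/call` request; the method and parameter names follow the public MCP specification, but this is an illustration of the wire shape, not a working client, and `search_docs` is a made-up tool name.

```python
import json
from itertools import count

# Sketch of the message layer beneath an MCP stdio transport.
_ids = count(1)

def tool_call_request(name: str, arguments: dict) -> str:
    """Serialize one JSON-RPC 2.0 request for an MCP tool invocation."""
    msg = {
        "jsonrpc": "2.0",
        "id": next(_ids),                 # correlates the eventual response
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(msg)                # over stdio, this line goes to the server

wire = tool_call_request("search_docs", {"query": "harness"})
parsed = json.loads(wire)
print(parsed["method"], parsed["params"]["name"])  # tools/call search_docs
```

Because every transport carries the same messages, the harness can swap stdio for SSE or WebSocket without changing how tools are described or invoked.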
Slash Commands and UX Surface
Claw Code's materials also highlight a slash-command system with commands for things like session control, model selection, permissions, cost tracking, compaction, and export. This is important because agent UX is not just chat; it is command grammar.
Slash commands do three things well:
- They expose power features without bloating the main interaction loop.
- They make session state inspectable and controllable.
- They reduce ambiguity by turning meta-actions into explicit operator commands.
In other words, slash commands are part of the harness too. They are the human-control surface over the orchestration engine.
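A slash-command grammar is, at bottom, a registry plus a dispatcher. The commands and handlers below are invented for this sketch and do not correspond to any real command set.

```python
# Sketch of a slash-command surface: meta-actions become explicit,
# registered operations instead of free-form chat.
COMMANDS = {}

def command(name):
    def register(fn):
        COMMANDS[name] = fn
        return fn
    return register

@command("/model")
def set_model(args, state):
    state["model"] = args[0]
    return f"model set to {args[0]}"

@command("/compact")
def compact_session(args, state):
    state["compacted"] = True
    return "session compacted"

def dispatch(line: str, state: dict) -> str:
    name, *args = line.split()
    if name not in COMMANDS:
        return f"unknown command: {name}"
    return COMMANDS[name](args, state)

state = {}
print(dispatch("/model sonnet", state))  # model set to sonnet
print(dispatch("/compact", state))       # session compacted
```

The registry pattern is what keeps power features from bloating the main loop: each command mutates explicit session state, so behavior stays inspectable.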
Why Python Plus Rust Is Such a Strong Split
The Python-plus-Rust pattern works because it maps cleanly to the two hardest problems in agent tooling: experimentation and enforcement. Python is ideal where you want rapid iteration over prompts, routing logic, task policies, and provider abstractions. Rust is ideal where you want a tighter runtime around execution, streaming, permissions, and protocol correctness.
That split also helps teams evolve at different speeds. The orchestration layer can keep changing as the product learns, while the runtime layer can harden around performance and safety guarantees. For agent systems, that is often a better trade than forcing every concern into one language.
What Claw Code Revealed
Claw Code's real contribution is not just "open source Claude-like tooling." It is the public demonstration that the harness itself is now a competitive domain. The architecture people used to dismiss as glue code is turning into the actual product layer.
That shift changes the market in at least four ways:
- Models become swappable. Public project materials emphasize provider-agnostic design rather than single-vendor lock-in.
- Permissions become product features. Tool gating and policy engines move from implementation detail to buying criterion.
- Memory design becomes differentiating UX. Session persistence, compaction, and context discovery directly affect quality and cost.
- Protocol support becomes strategic. Native MCP support turns an agent from a terminal assistant into an ecosystem node.
That is why this moment feels larger than one leak or one rewrite. It clarified where the leverage really is.
Actionable Takeaways
If you are building agent tooling:
- Separate orchestration from execution.
- Make the tool system schema-first.
- Put policy below the model, not inside it.
- Treat the query engine as a product surface.
- Design memory as layered infrastructure, not as "just more context."
- Add MCP early if external integrations matter.
- Make slash commands part of the operator experience.
If you are evaluating agent tooling:
- Ask whether the tool layer is permission-gated.
- Ask whether sessions can persist and compact cleanly.
- Ask whether provider choice is abstracted or locked.
- Ask whether external systems connect through MCP or ad hoc adapters.
- Ask whether the runtime path is robust enough for real execution rather than demo-grade automation.
The enduring lesson is simple: the harness is becoming the operating system of AI work. Claw Code made that obvious.
