Building the Orchestration Runway
for AI Agent Systems
An open agent harness architecture that explores how curated tools, scoped domain context, enforced workflows, and engineer review loops can make AI agents more predictable and scalable for the enterprise.
Modern Agent Hurdles
AI agents are powerful reasoners. But without shared infrastructure, three problems emerge at enterprise scale.
Inconsistent Quality
Without shared standards, every agent produces different quality. What works for one team breaks another's conventions — and there's no way to enforce consistency across the org.
Tool Sprawl
Hundreds of MCP servers, plugins, and integrations — but no way to scope what's relevant. Agents hallucinate tool names, pick the wrong tool, and each team connects a different set.
Zero Auditability
No trace of what the agent did or why. When something breaks in production, you can't replay the decision chain — and you can't prove compliance to auditors.
Coding agents have changed the way we build software. But we're running into the limits of these models: attention degrades over long sessions and produces poor outcomes, static distributions of AI tools drift out of date, and context windows are fast approaching practical ceilings. The model alone cannot fix everything, but we can engineer around these limitations.
OpenHarness explores meeting these limitations halfway: scoped domain context, curated tool catalogs, enforced workflow checkpoints, and engineer review loops, all managed by the teams that run enterprise systems today. Every constraint is tunable. Every outcome is traceable. Every deployment is reproducible.
This isn't static software because we prefer static software. It's static software because that's how we build for global scale, high-risk, high-impact systems where the cost of a hallucinated orchestration decision is a production incident.
How a Request Flows Through the Harness
Raw request enters the system
"fix the login bug on mobile"
RAG surfaces relevant tools from the catalog
Scoped rules injected as constraints for this task
Agent executes a versioned workflow template inside an isolated dev box
Engineer inspects full execution trace
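The steps above can be sketched end to end. This is an illustrative toy, not the harness's real API: the tool names, the keyword "retrieval" standing in for vector RAG, and the template identifier are all invented for the example.

```typescript
type Request = { text: string };

// Hypothetical tool catalog; real deployments would hold hundreds of entries.
const toolCatalog = ["jira.search", "git.branch", "mobile.simulator", "billing.lookup"];

// Step 2: surface relevant tools. Naive substring matching stands in here
// for vector-similarity retrieval over the catalog.
function surfaceTools(req: Request): string[] {
  const words = req.text.toLowerCase().split(/\s+/);
  return toolCatalog.filter(tool => words.some(w => tool.includes(w)));
}

// Step 3: inject only the rules scoped to this task.
function scopeRules(req: Request): string[] {
  const rules: string[] = [];
  if (req.text.includes("mobile")) rules.push("follow the mobile accessibility checklist");
  if (req.text.includes("bug")) rules.push("add a regression test before the fix");
  return rules;
}

// Steps 2-4 assembled: the agent starts with constraints already loaded.
function runRequest(req: Request) {
  return {
    tools: surfaceTools(req),         // step 2: RAG-scoped tools
    rules: scopeRules(req),           // step 3: injected constraints
    template: "bugfix-workflow@v3",   // step 4: versioned template (hypothetical id)
    trace: [] as string[],            // step 6: filled during execution for review
  };
}
```

Running the sample request "fix the login bug on mobile" through this sketch surfaces only the mobile tooling and the two matching rules; everything else in the catalog stays invisible to the agent.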
The Agent Receives Constraints
Before the agent activates, the harness loads scoped tools, domain rules, and a versioned workflow template. Every constraint applied to the agent is a place it can't drift — keeping outcomes predictable, not just cheaper.
Tools are surfaced, rules are injected, workflow templates are loaded — the agent operates within engineered constraints, not open-ended autonomy.
Rules Files, Scoped to the Task
You already write rules files for your coding agents. The harness takes the same concept and scopes it — injecting only the standards, boundaries, and guardrails relevant to this specific task before the agent writes a single line.
Same concept as the rules files you already maintain — CLAUDE.md, .cursorrules, AGENTS.md — but scoped per task by the harness and delivered as structured constraints, not suggestions the agent might skip.
MCP Servers, Plugins, Skills
You already connect tools to your agents — MCP servers, plugins, skills. The harness takes the same catalog and scopes it, using vector similarity to surface only the tools relevant to this specific task. The agent never sees the rest.
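The scoping step can be sketched with plain cosine similarity. The three-dimensional embeddings and tool names below are toy values for illustration; a real catalog would embed tool descriptions with a learned model.

```typescript
type Tool = { name: string; embedding: number[] };

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Surface only the top-k tools most similar to the task embedding.
// The agent never sees the rest of the catalog.
function surfaceTopK(taskEmbedding: number[], catalog: Tool[], k: number): Tool[] {
  return [...catalog]
    .sort((a, b) => cosine(taskEmbedding, b.embedding) - cosine(taskEmbedding, a.embedding))
    .slice(0, k);
}
```

The payoff is that tool hallucination shrinks with the candidate set: an agent choosing among five scoped tools cannot pick the wrong one of five hundred.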
Versioned Templates, Isolated Execution
Engineers define workflow templates with checkpoints and gates. The engine spins up an isolated dev box, loads the template, and the agent executes within it — sandboxed, reproducible, and traceable.
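A template with checkpoints might look like the sketch below. The field names, gate types, and the `bugfix` template itself are hypothetical, illustrating only the idea that the engine refuses to advance past a gated step until the gate clears.

```typescript
type Step = { name: string; gate?: "engineer_approval" | "tests_pass" };
type WorkflowTemplate = { id: string; version: number; steps: Step[] };

// Hypothetical versioned template an engineer might define.
const bugfixTemplate: WorkflowTemplate = {
  id: "bugfix",
  version: 3,
  steps: [
    { name: "reproduce the bug" },
    { name: "write a failing test", gate: "tests_pass" },  // checkpoint
    { name: "implement the fix" },
    { name: "open PR", gate: "engineer_approval" },        // checkpoint
  ],
};

// The engine holds the agent at a gated step until the gate clears.
// Every transition would be appended to the session trace.
function advance(t: WorkflowTemplate, stepIndex: number, gateCleared: boolean): number {
  const step = t.steps[stepIndex];
  if (step.gate && !gateCleared) return stepIndex; // blocked at the checkpoint
  return Math.min(stepIndex + 1, t.steps.length - 1);
}
```

Because the template is versioned and the dev box is isolated, replaying the same template version against the same request reproduces the same gated path.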
The Engineer Sees Everything
Not just a PR diff. The review dashboard shows the complete session trace — from the original request to the final code change, with every decision visible.
can we add input validation to SignupForm.tsx? 12% of new signups are failing downstream because malformed emails hit the billing API. context in #billing-incidents and #frontend-platform, jira is ENG-4821
#eng-requests
Zod resolver matches our LoginForm patterns — nice consistency.
agreed, and it picks up the shared error boundary we added last sprint
Should we add a max length here? bcrypt has a 72-byte limit and we got burned on this in the billing form last quarter.
good catch — filing ENG-4830 to add it across all auth schemas
works for me, non-blocking
love the unicode email edge case, we missed that in LoginForm
The engineer sees everything — not just the PR diff. The session review exposes the four essential dials that shaped the output. The code review validates the result.
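A session trace like the one behind this dashboard can be modeled as an append-only log of decisions. The event kinds and field names below are assumptions for illustration, not the dashboard's actual schema.

```typescript
type TraceEvent = {
  step: number;
  kind: "tool_call" | "rule_check" | "checkpoint" | "code_change";
  detail: string;
};

// Append-only log: events are recorded in order and never mutated,
// so the full decision chain can be replayed after the fact.
class SessionTrace {
  private events: TraceEvent[] = [];

  record(kind: TraceEvent["kind"], detail: string): void {
    this.events.push({ step: this.events.length + 1, kind, detail });
  }

  // Replay the chain from the original request to the final code change.
  replay(): string[] {
    return this.events.map(e => `${e.step}. [${e.kind}] ${e.detail}`);
  }
}
```

An auditable replay of this shape is what turns "zero auditability" into a reviewable artifact: when something breaks, the engineer reads the chain instead of guessing.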