Open Architecture for Enterprise AI Agents

Building the Orchestration Runway for AI Agent Systems

An open agent harness architecture that explores how curated tools, scoped domain context, enforced workflows, and engineer review loops can make AI agents more predictable and scalable for the enterprise.

The Problem

Modern Agent Hurdles

AI agents are powerful reasoners. But without shared infrastructure, three problems emerge at enterprise scale.

Inconsistent Quality

Without shared standards, every agent produces different quality. What works for one team breaks another's conventions — and there's no way to enforce consistency across the org.

Constraint Engine Missing

Tool Sprawl
100+ tools, no curation

Hundreds of MCP servers, plugins, and integrations — but no way to scope what's relevant. Agents hallucinate tool names, pick the wrong tool, and each team connects a different set.

Semantic Scoping Required

Zero Auditability
0% decision traceability

No trace of what the agent did or why. When something breaks in production, you can't replay the decision chain — and you can't prove compliance to auditors.

Replay Logs Essential
The "Context" Problem and Why It Limits Scale

Typical Agent-first Development
Tools · Memory · APIs · Code · Search · Deploy
Agent decides everything

vs

With Harness Architecture
Gateway (code) → Tool Search (code) → Agent Workflow (AI) → Validation (code) → Engineer Review (human)
Code controls. Agent executes.

Coding agents have changed the way we build software. But we are running into these models' limits: attention degrades over long sessions and leads to poor outcomes, static catalogs of AI tools drift out of sync, and context windows are quickly reaching practical ceilings. The model alone cannot fix everything, but we can engineer around these limitations.

OpenHarness explores meeting these limitations in the middle: scoped domain context, curated tool catalogs, enforced workflow checkpoints, and engineer review loops, all managed by the teams that run enterprise systems today. Every constraint is tunable. Every outcome is traceable. Every deployment is reproducible.

This isn't static software because we prefer static software. It's static software because that's how we build for global-scale, high-risk, high-impact systems where the cost of a hallucinated orchestration decision is a production incident.

The Pipeline

How a Request Flows Through the Harness

1 Gateway

Raw request enters the system

"fix the login bug on mobile"
2 Tool Search

RAG surfaces relevant tools from the catalog

mobile_auth_sdk session_replay_search platform_ui_lint e2e_device_matrix
infra_capacity_planner data_residency_enforcer
3 Domain Rules

Scoped rules injected as constraints for this task

Coding Standards Architecture Boundaries Security Practices Naming Conventions Agent Guardrails
4 Workflow Template Engine

Agent executes a versioned workflow template inside an isolated dev box

Agent-gathered context:
Jira ticket Slack thread git history docs & wikis
1 clone repo
2 gather context
3 write code
4 run tests
5 create PR
5 Engineer Review

Engineer inspects full execution trace

Session Trace
original request
gathered context
agent reasoning
Code Review
src/auth/sessionManager.ts
+ const offset = getTimezoneOffset();
- token.refresh(Date.now());
+ token.refresh(Date.now() + offset);
Approve Reject
Deterministic (no AI) Stochastic (AI active)
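The five stages above can be sketched as typed transforms, where only the workflow step involves the model. This is a minimal sketch: every name here (HarnessRequest, hydrate, and so on) is illustrative, not a real SDK, and the hydration values are hard-coded stand-ins for the Tool Search and rules stages.

```typescript
interface HarnessRequest { raw: string }

interface HydratedContext extends HarnessRequest {
  tools: string[];   // surfaced via Tool Search
  rules: string[];   // scoped domain rules
  workflow: string;  // versioned template id
}

interface SessionTrace { context: HydratedContext; steps: string[] }

// 1. Gateway: raw request enters the system.
function gateway(raw: string): HarnessRequest {
  return { raw };
}

// 2 + 3. Tool Search and Domain Rules: deterministic hydration (hard-coded here).
function hydrate(req: HarnessRequest): HydratedContext {
  return {
    ...req,
    tools: ["mobile_auth_sdk", "session_replay_search"],
    rules: ["Coding Standards", "Security Practices"],
    workflow: "mobile-bugfix-v2.3",
  };
}

// 4. Workflow Template Engine: the agent executes inside the template,
// and every step it takes is appended to the session trace.
function execute(ctx: HydratedContext): SessionTrace {
  return {
    context: ctx,
    steps: ["clone repo", "gather context", "write code", "run tests", "create PR"],
  };
}

// 5. Engineer Review: a human decision; stubbed as a predicate for the sketch.
function review(trace: SessionTrace): "approve" | "reject" {
  return trace.steps.includes("run tests") ? "approve" : "reject";
}

const verdict = review(execute(hydrate(gateway("fix the login bug on mobile"))));
```

The point of the shape, not the stubs: the model appears only inside `execute`; everything before and after it is deterministic code the harness controls.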
Narrowed Decision Surface

The Agent Receives Constraints

Before the agent activates, the harness loads scoped tools, domain rules, and a versioned workflow template. Every constraint applied to the agent is a place it can't drift — keeping outcomes predictable, not just cheaper.

Without Pre-Hydration Agent decides everything
Agent browses 200+ tools... attention diluted
Agent guesses which rules apply... may ignore standards
Agent decides its own workflow... unpredictable path
Codes based on assumptions, backtracks often ~150,000+ tokens
vs
With Pre-Hydration Agent implements only
Relevant tools surfaced via Tool Search focused toolset
Domain rules injected as constraints standards enforced
Versioned workflow template loaded predictable path
Implements focused, constrained task ~25,000 tokens

Tools are surfaced, rules are injected, workflow templates are loaded — the agent operates within engineered constraints, not open-ended autonomy.

Scoped Rules

Rules Files, Scoped to the Task

You already write rules files for your coding agents. The harness takes the same concept and scopes it — injecting only the standards, boundaries, and guardrails relevant to this specific task before the agent writes a single line.

project-rules.md
Coding Standards Style guides, formatting rules, approved patterns
Architecture Boundaries Module ownership, dependency direction, layer constraints
Security Practices Auth patterns, input validation, secrets handling
Naming Conventions Variables, files, API endpoints, database schemas
Agent Guardrails Scope limits, forbidden actions, escalation triggers
Agent Context
Task Fix login session expiry on iOS 17
Scoped Rules Loaded
Coding Standards Architecture Boundaries Security Practices Naming Conventions Agent Guardrails
Scoped Tools 5 tools surfaced via Tool Search...
Workflow Template mobile-bugfix-v2.3

Same concept as the rules files you already maintain — CLAUDE.md, .cursorrules, AGENTS.md — but scoped per task by the harness and delivered as structured constraints, not suggestions the agent might skip.
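One way to implement per-task scoping is to tag each rules-file section and inject only the sections whose tags intersect the task's. This is a hedged sketch: the tag scheme, section names, and rule bodies below are hypothetical, not a prescribed format.

```typescript
interface RuleSection { name: string; tags: string[]; body: string }

// Hypothetical project-rules catalog; "all" marks sections injected for every task.
const projectRules: RuleSection[] = [
  { name: "Coding Standards",   tags: ["all"],             body: "Strict TypeScript; no any." },
  { name: "Security Practices", tags: ["auth", "secrets"], body: "Never log tokens." },
  { name: "Naming Conventions", tags: ["api", "schema"],   body: "snake_case for DB columns." },
  { name: "Agent Guardrails",   tags: ["all"],             body: "Never touch prod config." },
];

// Inject only sections whose tags overlap the task's tags, plus "all" sections.
function scopeRules(taskTags: string[], rules: RuleSection[]): RuleSection[] {
  return rules.filter(
    r => r.tags.includes("all") || r.tags.some(t => taskTags.includes(t))
  );
}

// "Fix login session expiry on iOS 17" might be tagged ["auth", "mobile"]:
const scoped = scopeRules(["auth", "mobile"], projectRules);
// Security Practices and both "all" sections load; Naming Conventions does not.
```

Delivering the result as a structured payload, rather than one monolithic rules file, is what lets the harness treat rules as constraints instead of suggestions.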

Scoped Tooling

MCP Servers, Plugins, Skills

You already connect tools to your agents — MCP servers, plugins, skills. The harness takes the same catalog and scopes it, using vector similarity to surface only the tools relevant to this specific task. The agent never sees the rest.

412
Full Catalog All registered MCP tools
~30
Vector Search Cosine similarity against task context
~12
Domain Filter Mobile task excludes infra/backend tools
5
Scoped Set Agent only sees these
Task Fix login session expiry on iOS 17
Surfaced (5)
mobile_auth_sdk session_replay_search platform_ui_lint e2e_device_matrix contract_test_runner
Filtered (7)
infra_capacity_planner data_residency_enforcer soc2_audit_trail_gen prod_incident_runbook billing_reconciler ml_model_registry k8s_cluster_scaler
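The funnel above can be sketched in a few lines: cosine similarity against a task embedding, a domain filter, then a top-k cut. The 3-d vectors and the two-domain catalog here are toys for illustration; a real catalog would embed tool descriptions with a text-embedding model.

```typescript
interface Tool { name: string; domain: "mobile" | "infra"; vec: number[] }

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

function scopeTools(
  taskVec: number[], catalog: Tool[], domain: Tool["domain"], k: number
): string[] {
  return catalog
    .map(t => ({ t, score: cosine(taskVec, t.vec) })) // vector search
    .filter(x => x.t.domain === domain)               // domain filter
    .sort((a, b) => b.score - a.score)                // rank by similarity
    .slice(0, k)                                      // scoped set
    .map(x => x.t.name);
}

const catalog: Tool[] = [
  { name: "mobile_auth_sdk",        domain: "mobile", vec: [0.9, 0.1, 0.0] },
  { name: "session_replay_search",  domain: "mobile", vec: [0.8, 0.2, 0.1] },
  { name: "infra_capacity_planner", domain: "infra",  vec: [0.1, 0.9, 0.0] },
  { name: "billing_reconciler",     domain: "infra",  vec: [0.0, 0.2, 0.9] },
];

// A mobile-auth task embedding surfaces only the two mobile tools.
const scopedTools = scopeTools([1, 0, 0], catalog, "mobile", 2);
```

The agent's prompt is built from `scopedTools` alone, so the 400-odd filtered tools cannot dilute attention or be hallucinated into a call.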
Workflow Template Engine

Versioned Templates, Isolated Execution

Engineers define workflow templates with checkpoints and gates. The engine spins up an isolated dev box, loads the template, and the agent executes within it — sandboxed, reproducible, and traceable.

harness — bugfix workflow · MOBILE-4821
Task: fix/MOBILE-4821-login-session
Context: agent-gathered (jira, slack, git, docs) · 2.1k tokens
Tools: file_edit, run_test, git_diff, eslint_fix
Workflow: bugfix-v2.yaml
Script Run
$ git checkout -b fix/MOBILE-4821-login-session
Switched to new branch 'fix/MOBILE-4821-login-session'
Coding Agent Run
$ claude "Fix login session expiry on iOS 17 per MOBILE-4821"
Claude is working...
Reading MOBILE-4821: "Login session expires immediately on iOS 17"
Analyzing src/auth/sessionManager.ts...
Root cause: token refresh uses Date.now() without timezone offset
→ Edited src/auth/sessionManager.ts
→ Edited src/auth/__tests__/session.test.ts
Script Run
$ eslint src/auth/ && npm test -- --filter=auth
eslint: 2 errors in sessionManager.ts
Line 47: Unexpected 'any' type
Line 52: Missing return type
Tests skipped (lint must pass first)
Coding Agent Run (retry 1/3)
$ claude --fix-lint --errors ./lint-output.json
Claude is working...
Fixing type annotation on line 47...
Adding return type to refreshToken()...
→ Edited src/auth/sessionManager.ts
Script Run
$ eslint src/auth/ && npm test -- --filter=auth
eslint: 0 errors
Tests: 23 passed, 0 failed (4.2s)
Script Run
$ git add -A && git commit -m "fix: session token refresh timezone offset"
2 files changed, 8 insertions, 3 deletions
Script Run
$ gh pr create --title "fix: session token refresh timezone offset"
PR #247 created
Session trace attached (8 steps, 3.4k agent tokens)
Awaiting engineer review
Script Run — Harness-controlled. No AI. Same input, same output.
Coding Agent Run — Claude reasoning. Retry budget: 3 attempts, then escalate.
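The transcript's alternation of script and agent steps suggests a simple step loop: script steps run once and must pass their gate, while agent steps get a retry budget (3 in the transcript) before the harness escalates to a human. This is a sketch under those assumptions; the step names and gate predicates are illustrative.

```typescript
interface Step {
  name: string;
  kind: "script" | "agent";
  run: (attempt: number) => boolean; // true = step passed its gate
}

function runWorkflow(steps: Step[], retryBudget = 3): string[] {
  const trace: string[] = [];
  for (const step of steps) {
    // Script steps are deterministic: one attempt. Agent steps may retry.
    const budget = step.kind === "agent" ? retryBudget : 1;
    let passed = false;
    for (let attempt = 1; attempt <= budget && !passed; attempt++) {
      passed = step.run(attempt);
      trace.push(`${step.kind}:${step.name} attempt ${attempt} ${passed ? "ok" : "fail"}`);
    }
    if (!passed) {
      trace.push(`escalate:${step.name}`); // budget exhausted, human takes over
      break;
    }
  }
  return trace;
}

// Mirrors the transcript's shape: the agent's first pass fails lint,
// its retry fixes it, and the workflow proceeds to the PR.
const workflowTrace = runWorkflow([
  { name: "checkout",  kind: "script", run: () => true },
  { name: "fix-bug",   kind: "agent",  run: a => a >= 2 }, // passes on retry 1
  { name: "create-pr", kind: "script", run: () => true },
]);
```

Because every attempt lands in the trace, the lint failure and the retry that fixed it are both visible at review time rather than silently absorbed.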
AI Can Never Merge

The Engineer Sees Everything

Not just a PR diff. The review dashboard shows the complete session trace — from the original request to the final code change, with every decision visible.

Engineer Review — Session Trace
Agent Inputs Close review of inputs to the agent
Request
SC sarah.chen 2:34 PM

can we add input validation to SignupForm.tsx? 12% of new signups are failing downstream because malformed emails hit the billing API. context in #billing-incidents and #frontend-platform, jira is ENG-4821

# eng-requests
Context
3 Slack threads 1 Jira ticket 14 git commits 2 scoped rules
Pre-Hydration Pipeline
Tools
acme_design_system_lint shared_schema_registry security_policy_gate nextjs_preview_deploy contract_test_runner
5 MCP tools scoped from hundreds via Tool Search · 0.82 avg cosine similarity
Workflow
feat-frontend v2.1
lint fetch-context generate test review commit
deterministic stochastic
Agent Reasoning
Parse signup form requirements
Evaluate validation approach
Zod schema vs manual checks
Zod — matches existing LoginForm patterns
Manual — more control, but inconsistent
Define field constraints
email: RFC 5322 + shared regex
password: min 8, max 72, mixed case
confirmPassword: must match
Wire schema into form submit handler
Click to explore full reasoning trace
Code Diff
+28 additions -1 deletion 4 files
src/components/SignupForm.tsx
3 import { useForm } from "react-hook-form";
4 + import { zodResolver } from "@hookform/resolvers/zod";
5 + import { signupSchema, type SignupInput } from "../schemas/auth";
6  
12 export function SignupForm() {
13 - const { register, handleSubmit } = useForm();
13 + const { register, handleSubmit, formState: { errors } } = useForm<SignupInput>({
14 + resolver: zodResolver(signupSchema),
15 + });
16  
22 return (
23 <form onSubmit={handleSubmit(onSubmit)}>
24 + <input {...register("email")} placeholder="Email" />
25 + {errors.email && <span className="error">{errors.email.message}</span>}
26 +  
27 + <input {...register("password")} type="password" placeholder="Password" />
28 + {errors.password && <span className="error">{errors.password.message}</span>}
29 +  
30 + <input {...register("confirmPassword")} type="password" placeholder="Confirm" />
31 + {errors.confirmPassword && (
32 + <span className="error">{errors.confirmPassword.message}</span>
33 + )}
src/schemas/auth.ts
1 + import { z } from "zod";
2 + import { EMAIL_REGEX } from "../utils/validation";
3 +  
4 + export const signupSchema = z.object({
5 + email: z.string().regex(EMAIL_REGEX, "Invalid email"),
6 + password: z.string().min(8).max(72),
7 + confirmPassword: z.string(),
8 + }).refine(d => d.password === d.confirmPassword, {
9 + message: "Passwords must match",
10 + path: ["confirmPassword"],
11 + });
12 +  
13 + export type SignupInput = z.infer<typeof signupSchema>;
Code Review — Pull Request #847 Approved
feat: add input validation to signup form opened by harness-agent · 4 files changed · +76 -2
src/components/SignupForm.tsx +18 -1
src/schemas/auth.ts +13
src/components/SignupForm.test.tsx +42
src/utils/validation.ts +3 -1
src/components/SignupForm.tsx line 14 Looks good
MR
marcus.r

Zod resolver matches our LoginForm patterns — nice consistency.

SC
sarah.chen

agreed, and it picks up the shared error boundary we added last sprint

src/schemas/auth.ts line 6 Suggestion
MR
marcus.r

Should we add a max length here? bcrypt has a 72-byte limit and we got burned on this in the billing form last quarter.

SC
sarah.chen

good catch — filing ENG-4830 to add it across all auth schemas

MR
marcus.r

works for me, non-blocking

src/components/SignupForm.test.tsx line 8 Looks good
SC
sarah.chen

love the unicode email edge case, we missed that in LoginForm

2 approved 1 suggestion

The engineer sees everything — not just the PR diff. The session review exposes the four essential dials that shaped the output. The code review validates the result.