harness — Agent Engineering Harness
name: harness
by bowen31337 · published 2026-03-22
$ claw add gh:bowen31337/bowen31337-harness

---
name: harness
description: >
  Agent engineering harness for any repo. Creates a short AGENTS.md table of contents,
  a structured docs/ knowledge base (ARCHITECTURE, QUALITY, CONVENTIONS, COORDINATION),
  custom agent-readable linters (WHAT/FIX/REF format), CI enforcement, and execution plan
  templates. Supports Rust, Go, TypeScript, and Python. Use when setting up any repo for
  agent-first development, upgrading an existing AGENTS.md, or enforcing architectural lint
  gates. Includes an --audit flag for tool lifecycle checks and L1/L2/L3 progressive disclosure.
license: MIT
---
# harness — Agent Engineering Harness
Implements the [OpenAI Codex team's agent-first engineering harness pattern](https://openai.com/index/harness-engineering/)
for any repo: short AGENTS.md TOC, structured docs/, custom linters with agent-readable errors,
CI enforcement, execution plan templates, doc-gardening.
Validated against: [Agent Tool Design Guidelines](https://github.com/bowen31337/agent-harness-skills/blob/main/docs/agent_tool_desig_guidelines.md) (2026-03-09)
## When to use

Use this skill when setting up a repo for agent-first development, upgrading an existing
AGENTS.md, or enforcing architectural lint gates in CI.

## Supported Languages

Rust, Go, TypeScript, and Python (auto-detected by the scaffolder).

## Usage
```bash
SKILL_DIR="$HOME/.openclaw/workspace/skills/harness"

# Scaffold harness for a repo (language auto-detected: Rust/Go/TypeScript/Python)
uv run python "$SKILL_DIR/scripts/scaffold.py" --repo /path/to/repo

# Scaffold with force-overwrite of existing AGENTS.md
uv run python "$SKILL_DIR/scripts/scaffold.py" --repo /path/to/repo --force

# Audit harness freshness (tool lifecycle check — no writes)
uv run python "$SKILL_DIR/scripts/scaffold.py" --repo /path/to/repo --audit

# Run lints locally
bash /path/to/repo/scripts/agent-lint.sh

# Check doc freshness (finds stale references in docs/)
uv run python "$SKILL_DIR/scripts/doc_garden.py" --repo /path/to/repo --dry-run

# Check doc freshness and open a fix PR
uv run python "$SKILL_DIR/scripts/doc_garden.py" --repo /path/to/repo --pr

# Generate execution plan for a complex task
uv run python "$SKILL_DIR/scripts/plan.py" \
  --task "Add IBC timeout handling" \
  --repo /path/to/repo
```

## What gets created
| File | Description |
|------|-------------|
| `AGENTS.md` | ~100 line TOC with L1/L2/L3 progressive disclosure markers |
| `docs/ARCHITECTURE.md` | Layer diagram + dependency rules (auto-generated from repo structure) |
| `docs/QUALITY.md` | Coverage targets + security invariants |
| `docs/CONVENTIONS.md` | Naming rules (language-specific) |
| `docs/COORDINATION.md` | Multi-agent task ownership + conflict resolution rules ← new |
| `docs/EXECUTION_PLAN_TEMPLATE.md` | Structured plan format for complex tasks |
| `scripts/agent-lint.sh` | Custom linter with agent-readable errors (WHAT / FIX / REF) |
| `.github/workflows/agent-lint.yml` | CI gate on every PR |
## Lint error format
Every lint error produced by `scripts/agent-lint.sh` follows this format:
```
LINT ERROR [<rule-id>]: <description of the problem>
WHAT: <why this is a problem>
FIX: <exact steps to resolve it>
REF: <which doc to consult>
```

This means agents can read lint output and fix problems without asking a human.
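As a sketch, a custom lint rule could render its findings in this format with a small formatter. This is a hypothetical helper for illustration, not part of the shipped `agent-lint.sh`:

```python
def format_lint_error(rule_id: str, problem: str, what: str, fix: str, ref: str) -> str:
    """Render one lint finding in the agent-readable WHAT / FIX / REF format."""
    return (
        f"LINT ERROR [{rule_id}]: {problem}\n"
        f"WHAT: {what}\n"
        f"FIX: {fix}\n"
        f"REF: {ref}"
    )
```

Keeping FIX imperative and REF pointed at a single doc gives an agent everything it needs to self-correct in one pass.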
## Agent Design Checklist (from tool design guidelines)
Before shipping any tool or skill change, verify:
## Progressive Disclosure Layers
The harness enforces a 3-layer context discipline:
| Layer | Where | When to load |
|-------|-------|--------------|
| L1 | `AGENTS.md` | Always — orientation, commands, invariants |
| L2 | `docs/` | Before coding — architecture, quality, conventions |
| L3 | Source files | On demand — grep/read specific files as needed |
**Rule:** Start with L1. Pull L2 before touching code. Pull L3 only when you need it.
Never pre-load all three layers — it crowds out working context.
## Tool Lifecycle (`--audit`)
Run `--audit` quarterly to check harness freshness; it reports staleness without writing anything.
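One plausible core of such a freshness check is scanning `docs/` for backtick-quoted file paths that no longer exist in the repo. This is a hypothetical sketch in the spirit of `doc_garden.py --dry-run`, not its actual implementation:

```python
import re
from pathlib import Path

# Matches backtick-quoted paths with a recognized extension, e.g. `scripts/agent-lint.sh`.
PATH_RE = re.compile(r"`([\w./-]+\.(?:py|rs|go|ts|md|sh))`")

def find_stale_references(repo: Path) -> list[tuple[Path, str]]:
    """Return (doc, referenced_path) pairs where the referenced file is missing."""
    stale = []
    for doc in sorted((repo / "docs").glob("*.md")):
        for match in PATH_RE.finditer(doc.read_text()):
            ref = match.group(1)
            if not (repo / ref).exists():
                stale.append((doc, ref))
    return stale
```

A dry run would print these pairs; a `--pr` mode would turn each pair into a doc fix.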
## Safety

## References