Skip to content

Latest commit

 

History

History
660 lines (493 loc) · 19.2 KB

File metadata and controls

660 lines (493 loc) · 19.2 KB

Harness Usage Guide

Installation

git clone https://github.com/majiayu000/harness.git
cd harness
cargo build --release

The binary is at ./target/release/harness.

Server Startup

Important: Never start the server from within Claude Code or other agent sessions. The CLAUDECODE and CLAUDE_CODE_ENTRYPOINT environment variables propagate to spawned agents and cause SIGTRAP crashes. Always use a standalone terminal.

Single Project

./target/release/harness serve \
  --transport http \
  --port 9800 \
  --project-root /path/to/your/project

Multi-Project via Config File (Recommended)

Create or edit config/default.toml:

[server]
transport = "stdio"
http_addr = "127.0.0.1:9800"
data_dir = "~/.local/share/harness"

[agents]
default_agent = "auto"
# complexity_preferred_agents = ["codex", "claude"]
sandbox_mode = "danger-full-access"

[agents.claude]
cli_path = "claude"
default_model = "sonnet"

[agents.codex]
cli_path = "codex"

[agents.review]
enabled = true
reviewer_agent = "codex"
max_rounds = 3

[gc]
max_drafts_per_run = 5
budget_per_signal_usd = 0.50
total_budget_usd = 5.0
draft_ttl_hours = 72

[observe]
log_retention_days = 90

[otel]
environment = "development"
exporter = "disabled"

[[projects]]
name = "my-app"
root = "/path/to/my-app"
default = true
max_concurrent = 2

[[projects]]
name = "my-lib"
root = "/path/to/my-lib"
max_concurrent = 1
# default_agent = "auto" # optional override; or set a registered agent name

Start with:

./target/release/harness serve \
  --transport http \
  --port 9800 \
  --config config/default.toml

Multi-Project via CLI Flags

./target/release/harness serve \
  --transport http \
  --port 9800 \
  --project my-app=/path/to/my-app \
  --project my-lib=/path/to/my-lib \
  --default-project my-app

CLI --project flags merge with config [[projects]] entries. CLI overrides config on name conflict.

With GitHub Token

Enable auto-review bot comments on PRs:

GITHUB_TOKEN=ghp_xxx ./target/release/harness serve \
  --transport http \
  --port 9800 \
  --config config/default.toml

With Anthropic API Key

Enable the direct Anthropic API agent:

ANTHROPIC_API_KEY=sk-ant-xxx ./target/release/harness serve \
  --transport http \
  --port 9800 \
  --config config/default.toml

Submitting Tasks

By Prompt

For ad-hoc work without a GitHub issue:

curl -X POST http://127.0.0.1:9800/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "project": "/path/to/project",
    "prompt": "Add input validation to the user registration endpoint. Check email format, password strength (min 8 chars), and sanitize the username field.",
    "description": "feat: input validation for registration"
  }'

Response:

{ "status": "running", "task_id": "a1b2c3d4-..." }

By GitHub Issue

The agent reads the issue title and body, then implements it:

curl -X POST http://127.0.0.1:9800/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "project": "/path/to/project",
    "issue": 42,
    "description": "fix: handle edge case in parser"
  }'

By Pull Request

For reviewing or fixing an existing PR:

curl -X POST http://127.0.0.1:9800/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "project": "/path/to/project",
    "pr": 100
  }'

Batch Submit

Submit multiple issues at once:

curl -X POST http://127.0.0.1:9800/tasks/batch \
  -H "Content-Type: application/json" \
  -d '{
    "project": "/path/to/project",
    "issues": [10, 11, 12, 13]
  }'

Response:

[
  { "task_id": "...", "status": "running" },
  { "task_id": "...", "status": "queued" },
  { "task_id": "...", "status": "queued" },
  { "task_id": "...", "status": "queued" }
]

Tasks respect concurrency limits — excess tasks are queued.

Monitoring

Dashboard

Get a snapshot of all projects and tasks:

curl -s http://127.0.0.1:9800/api/dashboard | python3 -m json.tool
{
  "global": {
    "running": 3,
    "queued": 1,
    "done": 42,
    "failed": 2,
    "grade": "A",
    "max_concurrent": 4,
    "latest_pr": "https://github.com/owner/repo/pull/123"
  },
  "projects": [
    {
      "id": "my-app",
      "root": "/path/to/my-app",
      "tasks": { "running": 2, "queued": 1 }
    },
    {
      "id": "my-lib",
      "root": "/path/to/my-lib",
      "tasks": { "running": 1, "queued": 0 }
    }
  ]
}

Task Status

# Single task
curl http://127.0.0.1:9800/tasks/{task_id}

# All tasks
curl http://127.0.0.1:9800/tasks

SSE Streaming

Stream real-time output from a running task:

curl -N http://127.0.0.1:9800/tasks/{task_id}/stream

Health Check

curl http://127.0.0.1:9800/health

Project Management API

List Projects

curl http://127.0.0.1:9800/api/projects

Register a Project at Runtime

No server restart needed:

curl -X POST http://127.0.0.1:9800/api/projects \
  -H "Content-Type: application/json" \
  -d '{
    "id": "new-project",
    "root": "/path/to/new-project",
    "max_concurrent": 2,
    "default_agent": "codex",
    "active": true
  }'

Remove a Project

curl -X DELETE http://127.0.0.1:9800/api/projects/new-project

Configuration Reference

[server]

Field Default Description
transport "stdio" Transport protocol: stdio, http, or web_socket
http_addr "127.0.0.1:9800" HTTP listen address
data_dir "~/.local/share/harness" Data directory for SQLite databases
project_root "." Default project root (single-project mode)
github_webhook_secret HMAC-SHA256 secret for GitHub webhook verification
notification_broadcast_capacity 256 Internal notification channel capacity

[agents]

Field Default Description
default_agent "auto" Default execution agent; "auto" picks the first registered agent
complexity_preferred_agents [] Optional ordered list for complex/critical routing (for example ["codex","claude"])
sandbox_mode "danger-full-access" Sandbox policy: read-only, workspace-write, danger-full-access
approval_policy "auto-edit" Approval policy for agent actions

[agents.claude]

Field Default Description
cli_path "claude" Path to Claude Code CLI binary
default_model "sonnet" Default model for Claude agent
reasoning_budget Optional reasoning budget for per-phase model selection

[agents.codex]

Field Default Description
cli_path "codex" Path to Codex CLI binary

[agents.codex.cloud]

Field Default Description
enabled false Enable cloud execution mode
cache_ttl_hours 12 Setup phase cache TTL
setup_commands [] Commands run during cloud setup phase
setup_secret_env [] Env vars available during setup but removed for agent execution

[agents.review]

Field Default Description
enabled true Enable independent agent review after PR creation
reviewer_agent "codex" Agent used for review (must differ from implementor)
max_rounds 3 Maximum review-fix cycles

[review]

Field Default Description
enabled false Enable periodic whole-repo review
interval_hours 24 Hours between review cycles
agent Agent for review tasks (defaults to agents.default_agent)
strategy "single" Review mode: single (one reviewer) or cross (dual-review + synthesis)
timeout_secs 900 Per-turn timeout for review tasks

When enabled, the scheduler runs a background loop that:

  1. Checks for new commits since the last review (git log --since=<last_review>)
  2. If new commits exist, gathers repo structure, diff stats, and commit log
  3. Constructs a comprehensive review prompt and enqueues it as a task
  4. The agent reviews the entire codebase and may create a PR with fixes
  5. Logs a periodic_review event as checkpoint for the next cycle

If no new commits have landed since the last review, the cycle is skipped.

[gc]

Field Default Description
max_drafts_per_run 5 Max remediation drafts generated per GC cycle
budget_per_signal_usd 0.50 Budget cap per signal
total_budget_usd 5.0 Total budget cap per GC run
adopt_wait_secs 120 Wait time before adopting a draft
adopt_max_rounds 3 Max adoption retry rounds
draft_ttl_hours 72 Draft expiration time

[observe]

Field Default Description
session_renewal_secs 1800 Session renewal interval
log_retention_days 90 Log retention period

[otel]

Field Default Description
environment "development" Environment tag for traces
exporter "disabled" OTLP exporter: disabled, otlp-http, otlp-grpc
endpoint OTLP collector endpoint URL
log_user_prompt false Include user prompts in trace spans

[concurrency]

Field Default Description
max_concurrent_tasks 4 Global maximum concurrent tasks across all projects
max_queue_size 32 Maximum queued tasks before rejecting

[validation]

Field Default Description
pre_commit [] Commands run after agent changes (auto-detected if empty: cargo fmt, cargo check for Rust)
pre_push [] Commands run before pushing
timeout_secs 120 Validation command timeout
max_retries 2 Retry count on validation failure

[[projects]]

Field Required Default Description
name yes Unique project identifier
root yes Absolute path to project root (must be a git repo)
default no false Mark as default project
default_agent no Override agents.default_agent for this project
max_concurrent no Override concurrency.max_concurrent_tasks for this project

Task Execution Pipeline

1. POST /tasks                    → validate request, resolve project
2. TaskQueue.acquire()            → acquire per-project + global semaphore
3. WorkspaceManager.create()      → create isolated git worktree
4. Agent.execute_stream()         → agent runs in worktree (Claude/Codex/API)
5. PostValidator.run()            → cargo fmt, cargo check (language-detected)
   └─ on failure: retry up to max_retries, agent fixes issues
6. Agent creates PR               → git push + gh pr create
7. Codex review                   → independent review, up to max_rounds
   └─ on issues: agent fixes → Codex re-reviews → repeat
8. QualityGrader.score()          → compute quality grade (A/B/C/D/F)
9. WorkspaceManager.cleanup()     → remove worktree
10. Task status → done/failed

Scheduled Background Systems

Harness runs four background schedulers automatically when the server starts:

1. Periodic Review ([review])

Whole-repo code review on a timer. Disabled by default.

[review]
enabled = true
interval_hours = 24
# strategy = "cross"

What happens when enabled:

  • Every interval_hours, checks if new commits exist since the last review
  • If yes: gathers repo structure + diff stats + commit log → constructs review prompt → enqueues as a task
  • Agent reviews the entire codebase, may create a PR with fixes
  • If no new commits: cycle is skipped (no wasted resources)
  • Review events are logged to EventStore for audit trail

2. Health Tick (always on)

Every 24 hours, runs RuleEngine::scan() on the project root:

  • Checks all registered guard scripts against the codebase
  • Persists violations as rule_check events
  • Generates a health report with quality grade and violation summary
  • Logged as scheduler: periodic health report

3. GC Runner (always on)

Frequency adapts to code quality:

Grade Interval Meaning
A (≥90) 7 days Code is healthy, rare scans
B (≥75) 3 days Minor issues, moderate scanning
C (≥60) 1 day Needs attention, daily scans
D (<60) 1 hour Critical issues, aggressive scanning

Scans for violation signals → generates remediation drafts → optionally adopts fixes.

4. Self-Evolution Tick (always on)

Every 24 hours, Harness runs a learning cycle for the configured project:

  • Calls learn_rules to extract reusable guard/rule patterns from adopted drafts
  • Calls learn_skills to extract reusable execution skills from adopted drafts
  • Scores skill outcomes from recent skill_used events and task status, then updates governance_status (active / watch / quarantine / retired) with canary gating
  • Persists summary events as:
    • self_evolution_tick (rules learned / skills learned / skills scored)
    • skill_governance_tick (status distribution and transitions)

This is independent of manual GC commands: you can still run gc_run, gc_adopt, and learn_* on demand.

GC Learn Pipeline (Self-Improving Rules)

Harness can learn from its own execution history: detect recurring problems, generate fixes, and extract reusable rules/skills. This is a 4-step pipeline.

Prerequisites

  • Server running with accumulated task data (events.db)
  • RPC handshake required before each session:
curl -X POST http://127.0.0.1:9800/rpc -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize"}'
curl -X POST http://127.0.0.1:9800/rpc -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":2,"method":"initialized"}'

Step 1: Signal Detection (gc_run)

Scans the event store for recurring problem patterns:

curl -X POST http://127.0.0.1:9800/rpc -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":3,"method":"gc_run","params":{"project_id":null}}'

Detected signal types:

Signal Meaning Remediation
RepeatedWarn Same hook fires N+ warnings Guard script
ChronicBlock M+ hard blocks (CI failures) Rule
HotFiles Same files edited K+ times Skill
SlowSessions Operations exceed T ms Skill
WarnEscalation Warn rate exceeds baseline Rule
LinterViolations M+ violations of same rule Guard script

This call spawns an agent per signal to generate remediation drafts. May take several minutes depending on the number of signals and agent availability.

Step 2: Review Drafts (gc_drafts)

List generated drafts and their status:

curl -X POST http://127.0.0.1:9800/rpc -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":4,"method":"gc_drafts"}'

Draft statuses: pendingadopted | rejected | expired

You can also inspect drafts directly:

ls ~/Library/Application\ Support/harness/drafts/
# Each .json file contains: signal, rationale, artifacts (rules/guards/skills)

Step 3: Adopt or Reject (gc_adopt / gc_reject)

Adopt a draft to mark it as approved for learning:

# Adopt (also spawns a task to apply the fix)
curl -X POST http://127.0.0.1:9800/rpc -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":5,"method":"gc_adopt","params":{"draft_id":"<DRAFT_ID>"}}'

# Reject
curl -X POST http://127.0.0.1:9800/rpc -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":6,"method":"gc_reject","params":{"draft_id":"<DRAFT_ID>"}}'

Step 4: Extract Rules or Skills (learn_rules / learn_skills)

After drafts are adopted, extract reusable rules or skills from the remediation content:

# Extract guard rules from adopted drafts
curl -X POST http://127.0.0.1:9800/rpc -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":7,"method":"learn_rules","params":{"project_root":"/path/to/project"}}'

# Extract reusable skills from adopted drafts
curl -X POST http://127.0.0.1:9800/rpc -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":8,"method":"learn_skills","params":{"project_root":"/path/to/project"}}'

These calls invoke an agent to analyze adopted draft artifacts and produce:

  • Rules: Structured ## RULE_ID: Title blocks with severity, added to RuleEngine
  • Skills: Structured === skill: name === blocks, added to SkillStore

Full Pipeline Diagram

Events (task execution telemetry)
  ↓
Signal Detector (gc_run)
  ├→ RepeatedWarn
  ├→ ChronicBlock
  ├→ WarnEscalation
  └→ ...
  ↓
Draft Generation (agent analyzes signals)
  ↓
Drafts (pending)
  ↓ (user reviews)
gc_adopt / gc_reject
  ↓
Adopted Drafts
  ↓
learn_rules / learn_skills (agent extracts patterns)
  ↓
RuleEngine / SkillStore (permanently prevents recurrence)

Tips

  • Budget: Default budget_per_signal_usd = 0.50 may be too low for complex analysis. Increase to 1.0 in config/default.toml if drafts are truncated with "Exceeded USD budget".
  • Timing: Run gc_run after accumulating 50+ tasks for meaningful signals. Running too early produces noise.
  • learn_rules is synchronous: It blocks until the agent finishes. If other tasks are running, the agent may queue — consider running learn when the server is idle.
  • Manual review: Always inspect draft content before adopting. Draft quality depends on agent capability and available context.
  • Auto learning: Even without manual learn_* calls, scheduler ticks will run periodic self-evolution and log results in self_evolution_tick.

CLI Commands

# Start server
harness serve --transport http --port 9800 --config config/default.toml

# One-shot execution
harness exec "Fix the failing test in src/lib.rs"

# Rule engine
harness rule load .        # Load rules from project
harness rule check .       # Run rule checks

# GC cycle
harness gc run .           # Detect signals, generate remediation drafts

# Skills
harness skill list         # List discovered skills

# ExecPlan
harness plan init spec.md           # Initialize execution plan
harness plan status exec-plan.md    # Check plan status

# Version
harness --version

Troubleshooting

Server won't start: "sandbox_mode not supported on macOS"

macOS Seatbelt sandbox blocks Claude Code syscalls. Set sandbox_mode = "danger-full-access" in config.

Tasks fail with SIGTRAP

Started server from within Claude Code. Restart from a standalone terminal.

Codex review shows "unexpected argument"

Codex CLI updated. Check codex exec --help for current flags and update crates/harness-agents/src/codex.rs.

All tasks show no_pr status

PR extraction failed. Check server logs for agent output. Common cause: agent didn't create a PR (build failure, empty diff).

Tasks queued but not running

Global concurrency limit reached. Check with:

curl -s http://127.0.0.1:9800/api/dashboard | python3 -c "import sys,json; d=json.load(sys.stdin); print(f'running={d[\"global\"][\"running\"]} max={d[\"global\"][\"max_concurrent\"]}')"

Increase [concurrency] max_concurrent_tasks in config if needed.