git clone https://github.com/majiayu000/harness.git
cd harness
cargo build --release

The binary is at `./target/release/harness`.
Important: Never start the server from within Claude Code or other agent sessions. The `CLAUDECODE` and `CLAUDE_CODE_ENTRYPOINT` environment variables propagate to spawned agents and cause SIGTRAP crashes. Always use a standalone terminal.
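As a defensive sketch (not part of harness), a small launcher check can catch a leaked agent session before the server starts; the variable names come from the warning above:

```python
import os
import sys

# Variables that leak from Claude Code sessions (names from the warning above)
LEAKED_VARS = ("CLAUDECODE", "CLAUDE_CODE_ENTRYPOINT")

def check_environment() -> str:
    """Return a status message; refuse when an agent session is detected."""
    if any(os.environ.get(v) for v in LEAKED_VARS):
        return "refusing to start: agent session detected"
    return "ok to start"

if __name__ == "__main__":
    message = check_environment()
    print(message)
    if message != "ok to start":
        sys.exit(1)
```

Run it before `harness serve`; a non-zero exit means you are inside an agent session.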
./target/release/harness serve \
--transport http \
--port 9800 \
--project-root /path/to/your/project

Create or edit `config/default.toml`:
[server]
transport = "stdio"
http_addr = "127.0.0.1:9800"
data_dir = "~/.local/share/harness"
[agents]
default_agent = "auto"
# complexity_preferred_agents = ["codex", "claude"]
sandbox_mode = "danger-full-access"
[agents.claude]
cli_path = "claude"
default_model = "sonnet"
[agents.codex]
cli_path = "codex"
[agents.review]
enabled = true
reviewer_agent = "codex"
max_rounds = 3
[gc]
max_drafts_per_run = 5
budget_per_signal_usd = 0.50
total_budget_usd = 5.0
draft_ttl_hours = 72
[observe]
log_retention_days = 90
[otel]
environment = "development"
exporter = "disabled"
[[projects]]
name = "my-app"
root = "/path/to/my-app"
default = true
max_concurrent = 2
[[projects]]
name = "my-lib"
root = "/path/to/my-lib"
max_concurrent = 1
# default_agent = "auto" # optional override; or set a registered agent name

Start with:
./target/release/harness serve \
--transport http \
--port 9800 \
--config config/default.toml

Or register projects via `--project` flags on the command line:

./target/release/harness serve \
--transport http \
--port 9800 \
--project my-app=/path/to/my-app \
--project my-lib=/path/to/my-lib \
--default-project my-app

CLI `--project` flags merge with config `[[projects]]` entries. CLI overrides config on name conflict.
Enable auto-review bot comments on PRs:
GITHUB_TOKEN=ghp_xxx ./target/release/harness serve \
--transport http \
--port 9800 \
--config config/default.toml

Enable the direct Anthropic API agent:
ANTHROPIC_API_KEY=sk-ant-xxx ./target/release/harness serve \
--transport http \
--port 9800 \
--config config/default.toml

For ad-hoc work without a GitHub issue:
curl -X POST http://127.0.0.1:9800/tasks \
-H "Content-Type: application/json" \
-d '{
"project": "/path/to/project",
"prompt": "Add input validation to the user registration endpoint. Check email format, password strength (min 8 chars), and sanitize the username field.",
"description": "feat: input validation for registration"
}'

Response:
{ "status": "running", "task_id": "a1b2c3d4-..." }

The agent reads the issue title and body, then implements it:
curl -X POST http://127.0.0.1:9800/tasks \
-H "Content-Type: application/json" \
-d '{
"project": "/path/to/project",
"issue": 42,
"description": "fix: handle edge case in parser"
}'

For reviewing or fixing an existing PR:
curl -X POST http://127.0.0.1:9800/tasks \
-H "Content-Type: application/json" \
-d '{
"project": "/path/to/project",
"pr": 100
}'

Submit multiple issues at once:
curl -X POST http://127.0.0.1:9800/tasks/batch \
-H "Content-Type: application/json" \
-d '{
"project": "/path/to/project",
"issues": [10, 11, 12, 13]
}'

Response:
[
{ "task_id": "...", "status": "running" },
{ "task_id": "...", "status": "queued" },
{ "task_id": "...", "status": "queued" },
{ "task_id": "...", "status": "queued" }
]

Tasks respect concurrency limits — excess tasks are queued.
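The queueing behavior can be sketched with a per-project limit nested inside a global one (a simplification; the class and method names here are hypothetical, not harness internals):

```python
import threading

class TaskLimits:
    """Sketch: a task runs only when both its project's limit and the
    global limit have capacity; otherwise it is queued."""

    def __init__(self, global_max: int, per_project: dict):
        self.global_sem = threading.BoundedSemaphore(global_max)
        self.project_sems = {name: threading.BoundedSemaphore(n)
                             for name, n in per_project.items()}

    def try_acquire(self, project: str) -> str:
        if not self.project_sems[project].acquire(blocking=False):
            return "queued"  # per-project limit reached
        if not self.global_sem.acquire(blocking=False):
            self.project_sems[project].release()
            return "queued"  # global limit reached
        return "running"

limits = TaskLimits(global_max=3, per_project={"my-app": 2, "my-lib": 1})
print([limits.try_acquire("my-app") for _ in range(3)])
# → ['running', 'running', 'queued']
```

Releasing both semaphores when a task finishes lets the next queued task run, matching the batch response above.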
Get a snapshot of all projects and tasks:
curl -s http://127.0.0.1:9800/api/dashboard | python3 -m json.tool

{
"global": {
"running": 3,
"queued": 1,
"done": 42,
"failed": 2,
"grade": "A",
"max_concurrent": 4,
"latest_pr": "https://github.com/owner/repo/pull/123"
},
"projects": [
{
"id": "my-app",
"root": "/path/to/my-app",
"tasks": { "running": 2, "queued": 1 }
},
{
"id": "my-lib",
"root": "/path/to/my-lib",
"tasks": { "running": 1, "queued": 0 }
}
]
}

# Single task
curl http://127.0.0.1:9800/tasks/{task_id}
# All tasks
curl http://127.0.0.1:9800/tasks

Stream real-time output from a running task:
curl -N http://127.0.0.1:9800/tasks/{task_id}/stream

Health check:

curl http://127.0.0.1:9800/health

List registered projects:

curl http://127.0.0.1:9800/api/projects

Add a project (no server restart needed):
curl -X POST http://127.0.0.1:9800/api/projects \
-H "Content-Type: application/json" \
-d '{
"id": "new-project",
"root": "/path/to/new-project",
"max_concurrent": 2,
"default_agent": "codex",
"active": true
}'

Remove a project:

curl -X DELETE http://127.0.0.1:9800/api/projects/new-project

**[server]**

| Field | Default | Description |
|---|---|---|
| `transport` | `"stdio"` | Transport protocol: `stdio`, `http`, or `web_socket` |
| `http_addr` | `"127.0.0.1:9800"` | HTTP listen address |
| `data_dir` | `"~/.local/share/harness"` | Data directory for SQLite databases |
| `project_root` | `"."` | Default project root (single-project mode) |
| `github_webhook_secret` | — | HMAC-SHA256 secret for GitHub webhook verification |
| `notification_broadcast_capacity` | `256` | Internal notification channel capacity |
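For `github_webhook_secret`, verification follows GitHub's standard scheme: the `X-Hub-Signature-256` header carries `sha256=` plus the HMAC-SHA256 hex digest of the raw request body. A minimal sketch, independent of harness internals:

```python
import hashlib
import hmac

def verify_signature(secret: str, body: bytes, signature_header: str) -> bool:
    """Return True if signature_header matches HMAC-SHA256(secret, body)."""
    expected = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # constant-time comparison to avoid timing side channels
    return hmac.compare_digest(expected, signature_header)

body = b'{"action":"opened"}'
sig = "sha256=" + hmac.new(b"s3cret", body, hashlib.sha256).hexdigest()
print(verify_signature("s3cret", body, sig))  # True
```

The digest must be computed over the raw bytes of the request body, before any JSON parsing.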
**[agents]**

| Field | Default | Description |
|---|---|---|
| `default_agent` | `"auto"` | Default execution agent; `"auto"` picks the first registered agent |
| `complexity_preferred_agents` | `[]` | Optional ordered list for complex/critical routing (for example `["codex", "claude"]`) |
| `sandbox_mode` | `"danger-full-access"` | Sandbox policy: `read-only`, `workspace-write`, `danger-full-access` |
| `approval_policy` | `"auto-edit"` | Approval policy for agent actions |
**[agents.claude]**

| Field | Default | Description |
|---|---|---|
| `cli_path` | `"claude"` | Path to the Claude Code CLI binary |
| `default_model` | `"sonnet"` | Default model for the Claude agent |
| `reasoning_budget` | — | Optional reasoning budget for per-phase model selection |
**[agents.codex]**

| Field | Default | Description |
|---|---|---|
| `cli_path` | `"codex"` | Path to the Codex CLI binary |
**Cloud execution**

| Field | Default | Description |
|---|---|---|
| `enabled` | `false` | Enable cloud execution mode |
| `cache_ttl_hours` | `12` | Setup phase cache TTL |
| `setup_commands` | `[]` | Commands run during the cloud setup phase |
| `setup_secret_env` | `[]` | Env vars available during setup but removed for agent execution |
**[agents.review]**

| Field | Default | Description |
|---|---|---|
| `enabled` | `true` | Enable independent agent review after PR creation |
| `reviewer_agent` | `"codex"` | Agent used for review (must differ from the implementor) |
| `max_rounds` | `3` | Maximum review-fix cycles |
**[review]**

| Field | Default | Description |
|---|---|---|
| `enabled` | `false` | Enable periodic whole-repo review |
| `interval_hours` | `24` | Hours between review cycles |
| `agent` | — | Agent for review tasks (defaults to `agents.default_agent`) |
| `strategy` | `"single"` | Review mode: `single` (one reviewer) or `cross` (dual review + synthesis) |
| `timeout_secs` | `900` | Per-turn timeout for review tasks |
When enabled, the scheduler runs a background loop that:
- Checks for new commits since the last review (`git log --since=<last_review>`)
- If new commits exist, gathers repo structure, diff stats, and commit log
- Constructs a comprehensive review prompt and enqueues it as a task
- The agent reviews the entire codebase and may create a PR with fixes
- Logs a `periodic_review` event as a checkpoint for the next cycle
If no new commits have landed since the last review, the cycle is skipped.
**[gc]**

| Field | Default | Description |
|---|---|---|
| `max_drafts_per_run` | `5` | Max remediation drafts generated per GC cycle |
| `budget_per_signal_usd` | `0.50` | Budget cap per signal |
| `total_budget_usd` | `5.0` | Total budget cap per GC run |
| `adopt_wait_secs` | `120` | Wait time before adopting a draft |
| `adopt_max_rounds` | `3` | Max adoption retry rounds |
| `draft_ttl_hours` | `72` | Draft expiration time |
**[observe]**

| Field | Default | Description |
|---|---|---|
| `session_renewal_secs` | `1800` | Session renewal interval |
| `log_retention_days` | `90` | Log retention period |
**[otel]**

| Field | Default | Description |
|---|---|---|
| `environment` | `"development"` | Environment tag for traces |
| `exporter` | `"disabled"` | OTLP exporter: `disabled`, `otlp-http`, `otlp-grpc` |
| `endpoint` | — | OTLP collector endpoint URL |
| `log_user_prompt` | `false` | Include user prompts in trace spans |
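To export traces, point the exporter at a collector. A sketch (the endpoint is an assumption: `4318` is the conventional OTLP/HTTP port; adjust for your collector):

```toml
[otel]
environment = "production"
exporter = "otlp-http"
endpoint = "http://127.0.0.1:4318"
```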
**[concurrency]**

| Field | Default | Description |
|---|---|---|
| `max_concurrent_tasks` | `4` | Global maximum concurrent tasks across all projects |
| `max_queue_size` | `32` | Maximum queued tasks before rejecting |
**[validation]**

| Field | Default | Description |
|---|---|---|
| `pre_commit` | `[]` | Commands run after agent changes (auto-detected if empty: `cargo fmt`, `cargo check` for Rust) |
| `pre_push` | `[]` | Commands run before pushing |
| `timeout_secs` | `120` | Validation command timeout |
| `max_retries` | `2` | Retry count on validation failure |
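A typical override disables auto-detection by listing the commands explicitly. A sketch (the commands shown are examples for a Rust project, not defaults):

```toml
[validation]
pre_commit = ["cargo fmt --check", "cargo clippy -- -D warnings"]
pre_push = ["cargo test"]
timeout_secs = 300
max_retries = 2
```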
**[[projects]]**

| Field | Required | Default | Description |
|---|---|---|---|
| `name` | yes | — | Unique project identifier |
| `root` | yes | — | Absolute path to project root (must be a git repo) |
| `default` | no | `false` | Mark as default project |
| `default_agent` | no | — | Override `agents.default_agent` for this project |
| `max_concurrent` | no | — | Override `concurrency.max_concurrent_tasks` for this project |
1. POST /tasks → validate request, resolve project
2. TaskQueue.acquire() → acquire per-project + global semaphore
3. WorkspaceManager.create() → create isolated git worktree
4. Agent.execute_stream() → agent runs in worktree (Claude/Codex/API)
5. PostValidator.run() → cargo fmt, cargo check (language-detected)
└─ on failure: retry up to max_retries, agent fixes issues
6. Agent creates PR → git push + gh pr create
7. Codex review → independent review, up to max_rounds
└─ on issues: agent fixes → Codex re-reviews → repeat
8. QualityGrader.score() → compute quality grade (A/B/C/D/F)
9. WorkspaceManager.cleanup() → remove worktree
10. Task status → done/failed
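Step 5's retry loop can be sketched as follows (a simplification; the function names are hypothetical, not harness internals):

```python
def run_with_validation(validate, agent_fix, max_retries: int = 2) -> str:
    """Sketch of step 5: run validation; on failure, let the agent fix
    the reported issues and retry up to max_retries times."""
    for attempt in range(max_retries + 1):
        if validate():
            return "passed"
        if attempt < max_retries:
            agent_fix()  # agent addresses the validation failures
    return "failed"

# Simulated run: validation fails twice, then passes after agent fixes
results = iter([False, False, True])
print(run_with_validation(lambda: next(results), lambda: None))  # passed
```

Only after validation passes does the flow continue to PR creation and review.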
Harness runs four background schedulers automatically when the server starts:
Whole-repo code review on a timer. Disabled by default.
[review]
enabled = true
interval_hours = 24
# strategy = "cross"

What happens when enabled:
- Every `interval_hours`, checks if new commits exist since the last review
- If yes: gathers repo structure + diff stats + commit log → constructs review prompt → enqueues as a task
- Agent reviews the entire codebase, may create a PR with fixes
- If no new commits: cycle is skipped (no wasted resources)
- Review events are logged to EventStore for audit trail
Every 24 hours, runs `RuleEngine::scan()` on the project root:
- Checks all registered guard scripts against the codebase
- Persists violations as `rule_check` events
- Generates a health report with quality grade and violation summary
- Logged as `scheduler: periodic health report`
Frequency adapts to code quality:
| Grade | Interval | Meaning |
|---|---|---|
| A (≥90) | 7 days | Code is healthy, rare scans |
| B (≥75) | 3 days | Minor issues, moderate scanning |
| C (≥60) | 1 day | Needs attention, daily scans |
| D (<60) | 1 hour | Critical issues, aggressive scanning |
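The schedule above maps a numeric score to an interval; as a sketch (thresholds taken from the table, function name hypothetical):

```python
def scan_interval_hours(score: float) -> int:
    """Map a quality score to the adaptive scan interval from the table."""
    if score >= 90:   # grade A: weekly
        return 7 * 24
    if score >= 75:   # grade B: every 3 days
        return 3 * 24
    if score >= 60:   # grade C: daily
        return 24
    return 1          # grade D: hourly

print(scan_interval_hours(92), scan_interval_hours(58))  # 168 1
```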
Scans for violation signals → generates remediation drafts → optionally adopts fixes.
Every 24 hours, Harness runs a learning cycle for the configured project:
- Calls `learn_rules` to extract reusable guard/rule patterns from adopted drafts
- Calls `learn_skills` to extract reusable execution skills from adopted drafts
- Scores skill outcomes from recent `skill_used` events and task status, then updates `governance_status` (active/watch/quarantine/retired) with canary gating
- Persists summary events as:
  - `self_evolution_tick` (rules learned / skills learned / skills scored)
  - `skill_governance_tick` (status distribution and transitions)
This is independent of manual GC commands: you can still run gc_run, gc_adopt, and learn_* on demand.
Harness can learn from its own execution history: detect recurring problems, generate fixes, and extract reusable rules/skills. This is a 4-step pipeline.
- Server running with accumulated task data (`events.db`)
- RPC handshake required before each session:
curl -X POST http://127.0.0.1:9800/rpc -H 'Content-Type: application/json' \
-d '{"jsonrpc":"2.0","id":1,"method":"initialize"}'
curl -X POST http://127.0.0.1:9800/rpc -H 'Content-Type: application/json' \
-d '{"jsonrpc":"2.0","id":2,"method":"initialized"}'

Scans the event store for recurring problem patterns:
curl -X POST http://127.0.0.1:9800/rpc -H 'Content-Type: application/json' \
-d '{"jsonrpc":"2.0","id":3,"method":"gc_run","params":{"project_id":null}}'

Detected signal types:

| Signal | Meaning | Remediation |
|---|---|---|
| `RepeatedWarn` | Same hook fires N+ warnings | Guard script |
| `ChronicBlock` | M+ hard blocks (CI failures) | Rule |
| `HotFiles` | Same files edited K+ times | Skill |
| `SlowSessions` | Operations exceed T ms | Skill |
| `WarnEscalation` | Warn rate exceeds baseline | Rule |
| `LinterViolations` | M+ violations of same rule | Guard script |
This call spawns an agent per signal to generate remediation drafts. May take several minutes depending on the number of signals and agent availability.
List generated drafts and their status:
curl -X POST http://127.0.0.1:9800/rpc -H 'Content-Type: application/json' \
-d '{"jsonrpc":"2.0","id":4,"method":"gc_drafts"}'

Draft statuses: pending → adopted | rejected | expired
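As a triage convenience, the draft files can also be filtered by status from a short script. A sketch only: the field names `status` and `signal` are assumptions based on the file layout described here, not a documented schema:

```python
import json
import pathlib

def pending_drafts(drafts_dir: str):
    """Yield (draft_id, signal) for drafts still awaiting gc_adopt/gc_reject."""
    for path in sorted(pathlib.Path(drafts_dir).glob("*.json")):
        draft = json.loads(path.read_text())
        # treat missing status as pending (assumption)
        if draft.get("status", "pending") == "pending":
            yield path.stem, draft.get("signal")
```

Point it at the drafts directory under the harness data dir.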
You can also inspect drafts directly:
ls ~/Library/Application\ Support/harness/drafts/
# Each .json file contains: signal, rationale, artifacts (rules/guards/skills)

Adopt a draft to mark it as approved for learning:
# Adopt (also spawns a task to apply the fix)
curl -X POST http://127.0.0.1:9800/rpc -H 'Content-Type: application/json' \
-d '{"jsonrpc":"2.0","id":5,"method":"gc_adopt","params":{"draft_id":"<DRAFT_ID>"}}'
# Reject
curl -X POST http://127.0.0.1:9800/rpc -H 'Content-Type: application/json' \
-d '{"jsonrpc":"2.0","id":6,"method":"gc_reject","params":{"draft_id":"<DRAFT_ID>"}}'

After drafts are adopted, extract reusable rules or skills from the remediation content:
# Extract guard rules from adopted drafts
curl -X POST http://127.0.0.1:9800/rpc -H 'Content-Type: application/json' \
-d '{"jsonrpc":"2.0","id":7,"method":"learn_rules","params":{"project_root":"/path/to/project"}}'
# Extract reusable skills from adopted drafts
curl -X POST http://127.0.0.1:9800/rpc -H 'Content-Type: application/json' \
-d '{"jsonrpc":"2.0","id":8,"method":"learn_skills","params":{"project_root":"/path/to/project"}}'

These calls invoke an agent to analyze adopted draft artifacts and produce:
- Rules: Structured `## RULE_ID: Title` blocks with severity, added to `RuleEngine`
- Skills: Structured `=== skill: name ===` blocks, added to `SkillStore`
Events (task execution telemetry)
↓
Signal Detector (gc_run)
├→ RepeatedWarn
├→ ChronicBlock
├→ WarnEscalation
└→ ...
↓
Draft Generation (agent analyzes signals)
↓
Drafts (pending)
↓ (user reviews)
gc_adopt / gc_reject
↓
Adopted Drafts
↓
learn_rules / learn_skills (agent extracts patterns)
↓
RuleEngine / SkillStore (permanently prevents recurrence)
- Budget: The default `budget_per_signal_usd = 0.50` may be too low for complex analysis. Increase it to `1.0` in `config/default.toml` if drafts are truncated with "Exceeded USD budget".
- Timing: Run `gc_run` after accumulating 50+ tasks for meaningful signals. Running too early produces noise.
gc_runafter accumulating 50+ tasks for meaningful signals. Running too early produces noise. - learn_rules is synchronous: It blocks until the agent finishes. If other tasks are running, the agent may queue — consider running learn when the server is idle.
- Manual review: Always inspect draft content before adopting. Draft quality depends on agent capability and available context.
- Auto learning: Even without manual `learn_*` calls, scheduler ticks will run periodic self-evolution and log results in `self_evolution_tick`.
# Start server
harness serve --transport http --port 9800 --config config/default.toml
# One-shot execution
harness exec "Fix the failing test in src/lib.rs"
# Rule engine
harness rule load . # Load rules from project
harness rule check . # Run rule checks
# GC cycle
harness gc run . # Detect signals, generate remediation drafts
# Skills
harness skill list # List discovered skills
# ExecPlan
harness plan init spec.md # Initialize execution plan
harness plan status exec-plan.md # Check plan status
# Version
harness --version

macOS Seatbelt sandbox blocks Claude Code syscalls. Set `sandbox_mode = "danger-full-access"` in config.
Started server from within Claude Code. Restart from a standalone terminal.
Codex CLI updated. Check codex exec --help for current flags and update crates/harness-agents/src/codex.rs.
PR extraction failed. Check server logs for agent output. Common cause: agent didn't create a PR (build failure, empty diff).
Global concurrency limit reached. Check with:
curl -s http://127.0.0.1:9800/api/dashboard | python3 -c "import sys,json; d=json.load(sys.stdin); print(f'running={d[\"global\"][\"running\"]} max={d[\"global\"][\"max_concurrent\"]}')"

Increase `[concurrency]` `max_concurrent_tasks` in config if needed.