Add a minimal runtime host control surface so Harness can track external executors and safely lease pending tasks to them.

Harness is strong as a centralized control plane, but the runtime host lifecycle is implicit today. We need explicit host registration and claim semantics before scaling execution beyond directly attached local agents.

Scope:

- In-memory runtime host registry with:
  - host register
  - host deregister
  - host heartbeat
  - host listing with `online` status derived from heartbeat freshness
- In-memory task lease manager for pending tasks:
  - a runtime host can claim one pending task
  - duplicate-claim prevention across hosts
  - lease TTL support
- HTTP API endpoints:
  - `GET /api/runtime-hosts`
  - `POST /api/runtime-hosts/register`
  - `POST /api/runtime-hosts/{id}/heartbeat`
  - `POST /api/runtime-hosts/{id}/deregister`
  - `POST /api/runtime-hosts/{id}/tasks/claim`
- Tests for lease correctness and heartbeat-driven status.
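
The registry bullet above can be sketched as a plain in-memory map. This is a minimal, dependency-free sketch, not Harness's implementation: `HostRegistry` and its methods are hypothetical names, timestamps are simplified to Unix seconds (`u64`) instead of `DateTime<Utc>`, and `now` is passed in explicitly so heartbeat freshness is easy to test.

```rust
use std::collections::HashMap;

/// Simplified `RuntimeHost`: timestamps are Unix seconds, not `DateTime<Utc>`.
#[derive(Debug, Clone)]
pub struct RuntimeHost {
    pub id: String,
    pub display_name: String,
    pub capabilities: Vec<String>,
    pub registered_at: u64,
    pub last_heartbeat_at: u64,
}

/// Hypothetical in-memory registry; not Harness's actual type.
pub struct HostRegistry {
    hosts: HashMap<String, RuntimeHost>,
    heartbeat_timeout_secs: u64,
}

impl HostRegistry {
    pub fn new(heartbeat_timeout_secs: u64) -> Self {
        Self { hosts: HashMap::new(), heartbeat_timeout_secs }
    }

    /// Idempotent upsert by host id: replaces metadata and refreshes the
    /// heartbeat. Keeping the original `registered_at` is an assumption.
    pub fn register(&mut self, id: &str, display_name: &str, capabilities: Vec<String>, now: u64) {
        let host = self.hosts.entry(id.to_string()).or_insert_with(|| RuntimeHost {
            id: id.to_string(),
            display_name: String::new(),
            capabilities: Vec::new(),
            registered_at: now,
            last_heartbeat_at: now,
        });
        host.display_name = display_name.to_string();
        host.capabilities = capabilities;
        host.last_heartbeat_at = now;
    }

    /// Refreshes the heartbeat; an unknown host is an error.
    pub fn heartbeat(&mut self, id: &str, now: u64) -> Result<(), String> {
        match self.hosts.get_mut(id) {
            Some(host) => {
                host.last_heartbeat_at = now;
                Ok(())
            }
            None => Err(format!("unknown runtime host: {id}")),
        }
    }

    /// Removes the host; returns false if it was not registered.
    pub fn deregister(&mut self, id: &str) -> bool {
        self.hosts.remove(id).is_some()
    }

    /// Lists hosts sorted by id, each paired with its derived `online` flag:
    /// online when `now - last_heartbeat_at <= heartbeat_timeout_secs`.
    pub fn list(&self, now: u64) -> Vec<(&RuntimeHost, bool)> {
        let mut out: Vec<(&RuntimeHost, bool)> = self
            .hosts
            .values()
            // saturating_sub guards against a heartbeat timestamp ahead of `now`.
            .map(|h| (h, now.saturating_sub(h.last_heartbeat_at) <= self.heartbeat_timeout_secs))
            .collect();
        out.sort_by(|a, b| a.0.id.cmp(&b.0.id));
        out
    }
}
```

Whether re-registration should reset `registered_at` is not specified; this sketch keeps the first value. Leases are deliberately not handled here, so a real `deregister` would also have to release the host's leases.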
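
The lease manager bullet can be sketched the same way. `LeaseManager`, `TaskStatus`, and the numeric task id are hypothetical stand-ins (the real `TaskId` type is not shown here), and timestamps are Unix seconds. Leases are keyed by task id so at most one lease exists per task, and a lease counts as active while `now < expires_at`.

```rust
use std::collections::HashMap;

/// Hypothetical task status; only `Pending` tasks are claimable.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskStatus {
    Pending,
    Running,
}

/// Simplified `TaskLease`: numeric task id, Unix-second timestamps.
#[derive(Debug, Clone)]
pub struct TaskLease {
    pub task_id: u64,
    pub host_id: String,
    pub claimed_at: u64,
    pub expires_at: u64,
}

/// Hypothetical in-memory lease manager: at most one lease per task id.
pub struct LeaseManager {
    leases: HashMap<u64, TaskLease>,
    ttl_secs: u64,
}

impl LeaseManager {
    pub fn new(ttl_secs: u64) -> Self {
        Self { leases: HashMap::new(), ttl_secs }
    }

    /// Linear scan: leases the first pending task with no active lease.
    /// Any active lease skips the task, including one held by the caller
    /// (an assumption; the design only requires blocking other hosts).
    pub fn claim(&mut self, host_id: &str, tasks: &[(u64, TaskStatus)], now: u64) -> Option<TaskLease> {
        for &(task_id, status) in tasks {
            if status != TaskStatus::Pending {
                continue;
            }
            let blocked = self
                .leases
                .get(&task_id)
                .map_or(false, |lease| now < lease.expires_at);
            if blocked {
                continue;
            }
            // No lease exists or the previous one expired: (re)claim the task.
            let lease = TaskLease {
                task_id,
                host_id: host_id.to_string(),
                claimed_at: now,
                expires_at: now + self.ttl_secs,
            };
            self.leases.insert(task_id, lease.clone());
            return Some(lease);
        }
        None
    }
}
```

Because the map holds a single entry per task, replacing an expired lease and blocking a live one fall out of the same lookup; the scan order is also why claim fairness is only best-effort.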
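
The endpoint list maps onto five routes. The framework-agnostic matcher below is only an illustration: the `Route` enum and `route` function are hypothetical, and a real server would more likely rely on its HTTP framework's router.

```rust
/// Hypothetical route table for the runtime-host endpoints.
#[derive(Debug, PartialEq, Eq)]
pub enum Route<'a> {
    ListHosts,
    Register,
    Heartbeat(&'a str),
    Deregister(&'a str),
    ClaimTask(&'a str),
}

/// Matches a method and path; the `{id}` segment is borrowed from `path`.
pub fn route<'a>(method: &str, path: &'a str) -> Option<Route<'a>> {
    let rest = path.strip_prefix("/api/runtime-hosts")?;
    // Reject look-alike prefixes such as "/api/runtime-hostsX".
    if !rest.is_empty() && !rest.starts_with('/') {
        return None;
    }
    let segments: Vec<&'a str> = rest.split('/').filter(|s| !s.is_empty()).collect();
    match (method, segments.as_slice()) {
        ("GET", []) => Some(Route::ListHosts),
        ("POST", ["register"]) => Some(Route::Register),
        ("POST", [id, "heartbeat"]) => Some(Route::Heartbeat(*id)),
        ("POST", [id, "deregister"]) => Some(Route::Deregister(*id)),
        ("POST", [id, "tasks", "claim"]) => Some(Route::ClaimTask(*id)),
        _ => None,
    }
}
```

Matching on segment count keeps `POST /api/runtime-hosts/register` distinct from the per-host routes, which always carry an `{id}` segment plus an action.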

Out of scope:

- Persisting runtime hosts and leases across restarts.
- Remote executor callback APIs (start/progress/complete/fail).
- Scheduler integration that auto-routes all tasks to runtime hosts.
- Workspace/multi-tenant model migration.

Data model:

`RuntimeHost`
- `id: String`
- `display_name: String`
- `capabilities: Vec<String>`
- `registered_at: DateTime<Utc>`
- `last_heartbeat_at: DateTime<Utc>`

`TaskLease`
- `task_id: TaskId`
- `host_id: String`
- `claimed_at: DateTime<Utc>`
- `expires_at: DateTime<Utc>`

Semantics:

- `register` is idempotent by `host_id` (upserts metadata and refreshes the heartbeat).
- `heartbeat` on an unknown host returns an error.
- `deregister` removes the host and all leases owned by that host.
- `claim` selects only tasks in `pending` status.
- A non-expired lease blocks claims by other hosts.
- Expired leases are reclaimable by any host.
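
The `deregister` cascade can be expressed with `HashMap::retain`. In this sketch `ControlPlane` and its fields are hypothetical, task ids are assumed numeric, and returning `None` for an unknown host is an assumption (the semantics above only define the error case for `heartbeat`).

```rust
use std::collections::HashMap;

/// Minimal lease record; numeric task id and Unix-second expiry are assumptions.
#[derive(Debug, Clone)]
pub struct TaskLease {
    pub task_id: u64,
    pub host_id: String,
    pub expires_at: u64,
}

/// Hypothetical combined state: hosts by id, leases by task id.
pub struct ControlPlane {
    /// host id -> last_heartbeat_at (Unix seconds)
    pub hosts: HashMap<String, u64>,
    pub leases: HashMap<u64, TaskLease>,
}

impl ControlPlane {
    /// Removes the host and every lease it owns, returning how many leases
    /// were released, or `None` if the host was not registered (assumed).
    pub fn deregister(&mut self, host_id: &str) -> Option<usize> {
        self.hosts.remove(host_id)?;
        let before = self.leases.len();
        // Drop every lease owned by this host, whether expired or not.
        self.leases.retain(|_, lease| lease.host_id != host_id);
        Some(before - self.leases.len())
    }
}
```

Assuming a claim leaves the task in `pending` status (the callbacks that would advance it are out of scope), removing the lease makes the task claimable by another host immediately rather than after the TTL runs out.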

API:

`GET /api/runtime-hosts`
- Response: `hosts: [{ id, display_name, capabilities, registered_at, last_heartbeat_at, online }]`
- `online` is computed as `now - last_heartbeat_at <= heartbeat_timeout_secs`.

`POST /api/runtime-hosts/{id}/tasks/claim`
- Success: `{ claimed: true, task_id, lease_expires_at }`
- None available: `{ claimed: false }`
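
The two claim payloads map naturally onto an enum. This hypothetical `ClaimResponse` hand-builds the JSON to stay dependency-free, and it assumes a numeric task id and an RFC 3339 expiry string; in a real Rust service, deriving `serde`'s `Serialize` would be the more idiomatic route.

```rust
/// Hypothetical response for `POST /api/runtime-hosts/{id}/tasks/claim`.
pub enum ClaimResponse {
    /// Assumes a numeric task id and a pre-formatted RFC 3339 expiry.
    Claimed { task_id: u64, lease_expires_at: String },
    NoneAvailable,
}

impl ClaimResponse {
    /// Hand-rolled JSON; safe only because RFC 3339 strings contain no quotes
    /// or backslashes that would need escaping.
    pub fn to_json(&self) -> String {
        match self {
            ClaimResponse::Claimed { task_id, lease_expires_at } => format!(
                r#"{{"claimed":true,"task_id":{task_id},"lease_expires_at":"{lease_expires_at}"}}"#
            ),
            ClaimResponse::NoneAvailable => r#"{"claimed":false}"#.to_string(),
        }
    }
}
```

Returning `{ claimed: false }` with a success status, rather than an error, lets hosts poll the endpoint without treating an empty queue as a failure.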

Known limitations:

- In-memory leases are lost on restart.
- Claim fairness is best-effort: pending tasks are selected by a linear scan, so the first unleased pending task in scan order is always claimed first.

Test plan:

- Unit tests for the lease manager:
  - duplicate claim blocked while a lease is active
  - claim succeeds after lease expiry
  - deregister clears the host's leases
- Handler tests:
  - register + heartbeat + list returns an online host
  - claim endpoint returns the expected payload and prevents double claims