
Onboarding Agent --help Crawl Discovery & GitHub CLI Agent #2193

Open
TalZaccai wants to merge 32 commits into main from talzacc/onboarding_agent

@TalZaccai TalZaccai commented Apr 14, 2026

CLI Help Crawl Discovery & GitHub CLI Agent

Extends the onboarding agent with CLI application discovery via --help crawling and adds a fully functional github-cli agent as the first integration built with this workflow.

Onboarding Agent — CLI Discovery

  • Add CrawlCliHelp action that recursively crawls --help / -h output of any CLI tool
  • Parse subcommands, flags, and descriptions into DiscoveredAction[] / ApiSurface
  • Cycle detection via visited set and default maxDepth of 4 to prevent infinite recursion
  • Auto-generate CLI handler from discovery data — buildCliHandler() reads api-surface.json and produces working buildArgs()/runCli() with switch-cases for all discovered actions
  • CLI handler template extracted to cliHandler.template file on disk for easier maintenance
  • Boolean flag codegen — correctly generates flag-only push for boolean params vs value flags
  • Better manifest defaults — defaultEnabled: true, configurable emojiChar parameter
  • Fix all 6 LLM model factories to use createChatModel() with optional endpoint param for configurable model selection
  • Fix test runner (runTests.ts) to load .env for API keys, add missing ClientIO interaction methods
  • TypeChat migration — replace raw model.complete() + regex JSON extraction with createJsonTranslator<CliDiscoveryResult> for strong typing, auto-validation, and retry with error correction
  • Entity extraction — LLM now identifies domain entities (repos, issues, users) alongside actions, stored in ApiSurface.entities for downstream phases
  • Safety guardrail: -h fallback only attempted for known-safe CLIs (allowlist); unknown CLIs only get --help
  • Debug truncation warning when help output exceeds 12K chars
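
The crawl described in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's crawlCliRecursive: the `HelpFetcher` callback, the `parseSubcommands` heuristic (indented names under a heading containing "COMMANDS"), and the choice to detect cycles by repeated help text are all assumptions made for the example. Only the visited set and the default maxDepth of 4 come from the description.

```typescript
// Hypothetical types and names; the PR's actual implementation may differ.
type HelpFetcher = (argv: string[]) => string | undefined;

interface CrawledCommand {
    path: string[]; // e.g. ["gh", "repo", "view"]
    helpText: string;
}

// Assumed help layout: subcommands appear as indented "name:  description"
// (or "name  description") lines under a heading containing "COMMANDS".
function parseSubcommands(helpText: string): string[] {
    const names: string[] = [];
    let inCommands = false;
    for (const line of helpText.split("\n")) {
        if (/^\S/.test(line)) {
            // A non-indented line starts a new section.
            inCommands = /commands/i.test(line);
            continue;
        }
        if (!inCommands) continue;
        const name = /^\s+([a-z][\w-]*):?\s{2,}\S/.exec(line)?.[1];
        if (name) names.push(name);
    }
    return names;
}

function crawlCliHelp(
    command: string,
    fetchHelp: HelpFetcher,
    maxDepth = 4,
): CrawledCommand[] {
    // Assumption: a cycle shows up as help text that was already seen.
    const visited = new Set<string>();
    const results: CrawledCommand[] = [];

    const visit = (path: string[], depth: number): void => {
        if (depth > maxDepth) return;
        const helpText = fetchHelp([...path.slice(1), "--help"]);
        if (helpText === undefined || visited.has(helpText)) return;
        visited.add(helpText);
        results.push({ path, helpText });
        for (const sub of parseSubcommands(helpText)) {
            visit([...path, sub], depth + 1);
        }
    };

    visit([command], 0);
    return results;
}
```

Injecting `fetchHelp` keeps the traversal testable without spawning a process; a real fetcher would run the CLI and return its stdout.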

GitHub CLI Agent (github-cli)

  • Scaffold agent through all 7 onboarding phases using gh as the CLI target (51 discovered actions)
  • Real gh CLI execution via execFileAsync with 30s timeout
  • Read actions: repo search/view, issue list/view, PR list/view/diff/checks, release list, gist list, workflow list/runs, org list, extension list, auth status, SSH key list, GPG key list, dependabot alerts, contributors, latest-N queries
  • Write actions: issue create/close/reopen/comment, PR create-draft/close/merge/checkout, repo fork/star
  • distillRepoField() for focused answers to specific questions about repos
  • Rich output formatting: markdown hyperlinks, HTML tables, colored status badges
  • Simplified grammar: 674 → 292 lines (57% reduction), 30 grammar rules, 98% test pass rate (253/258)
  • Fix test dispatcher to use path-based agent resolution for locally-developed agents
  • README with prerequisites, actions table, examples, and demo instructions
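
A minimal sketch of the execution path mentioned above (execFileAsync with a 30s timeout), assuming `execFileAsync` is `promisify(execFile)` from Node's standard library. The `runGh` name, the `command` parameter, the `maxBuffer` value, and the error messages are hypothetical and not taken from the PR.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);
const GH_TIMEOUT_MS = 30_000; // the 30s timeout from the description

// Hypothetical helper: runs the CLI without a shell, so arguments are
// passed through verbatim rather than being interpreted by a shell.
async function runGh(args: string[], command = "gh"): Promise<string> {
    try {
        const { stdout } = await execFileAsync(command, args, {
            timeout: GH_TIMEOUT_MS,
            maxBuffer: 10 * 1024 * 1024, // assumed limit for large JSON output
        });
        return stdout.trim();
    } catch (err) {
        const e = err as NodeJS.ErrnoException & {
            killed?: boolean;
            stderr?: string;
        };
        if (e.code === "ENOENT") {
            throw new Error(`${command} was not found on PATH`);
        }
        if (e.killed) {
            // Node kills the child once the timeout elapses.
            throw new Error(`${command} timed out after ${GH_TIMEOUT_MS} ms`);
        }
        throw new Error(e.stderr?.trim() || e.message);
    }
}
```

A call such as `await runGh(["repo", "view", "microsoft/TypeAgent"])` would then return the trimmed stdout, provided gh is installed and authenticated.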

Demo & Documentation

  • Shell and CLI demo scripts
  • github-cli/README.md with full action reference
  • Updated onboarding/README.md with CLI help crawl documentation

Test Results

  • Grammar matching: 253/258 (98%); the 5 failures are genuinely ambiguous phrases (extensionInstall → authLogin, statusPrint → issueList)
  • Build: Clean TypeScript compilation, no errors
  • All 7 onboarding phases: Marked approved in state
  • CI: build-ts ✅, repo policy ✅, license/cla ✅

TalZaccai and others added 23 commits April 13, 2026 23:24
- Add crawlCliHelp action to discover actions from CLI --help output
- Implement recursive help crawling with configurable depth
- Parse subcommands and flags from help text
- Fix LLM model selection for schema/grammar generation (use createChatModel instead of createChatModelDefault to avoid json_object format constraint)
- Fix Windows path bug in scaffolder (use fileURLToPath instead of URL.pathname)
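
To illustrate the last item, here is a small sketch of why `fileURLToPath` is safer than reading `URL.pathname` directly. The file name and directory are hypothetical; the point is only that `pathname` stays percent-encoded (and on Windows keeps a leading slash before the drive letter), while `fileURLToPath` returns a platform-native path.

```typescript
import * as path from "node:path";
import { fileURLToPath, pathToFileURL } from "node:url";

// Hypothetical location; the space makes the encoding difference visible.
const originalPath = path.resolve("my agents", "scaffolderHandler.js");
const fileUrl = pathToFileURL(originalPath);

// Still percent-encoded ("my%20agents"); on Windows it would also
// look like "/C:/..." rather than "C:\...".
const viaPathname = fileUrl.pathname;

// Decoded and converted back to the platform's native path format.
const viaHelper = fileURLToPath(fileUrl);

console.log({ viaPathname, viaHelper });
```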

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Scaffold github-cli agent with 51 actions from gh CLI help crawl
- Fix schema (remove block comments for asc) and grammar (proper .agr format)
- Register github-cli agent in config.json and defaultAgentProvider deps
- Set defaultEnabled: true in manifest
- Fix test dispatcher to enable target integration agent in commands list

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Replace getExternalAppAgentProviders with getTestAgentProvider using
  path-based resolution via createNpmAppAgentProvider, avoiding circular
  dependency with default-agent-provider
- Fix AGENTS_DIR path resolution using path.resolve pattern (matches
  scaffolderHandler.ts) instead of new URL which strips filename
- Remove @ prefix from test commands so phrases go through grammar
  matching instead of command handler routing
- Enable cache for grammar matching (was disabled, which skipped
  grammar match and fell through to slow LLM translation)
- Result: 248/258 tests pass (96%) for github-cli agent

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ments

- Implement full gh CLI execution handler with buildArgs() for all 55 actions
- Add prList, prView, issueList, issueView, repoView action types to schema
- Add grammar rules for PR/issue list/view, repo view, API request, contributors
- Smart JSON formatting for repo view (--json) and API array responses
- Auto-expand owner/repo to /repos/owner/repo/contributors for API requests
- Fix test dispatcher: enable schemas+actions, use temp persistDir, validate routing
- Add standalone test runner (runTests.ts) for bypassing MCP timeout

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Demo walks through status, stars, contributors, PRs, issues, search,
and auth - showcasing the onboarding agent's auto-generated github-cli
agent with real gh CLI execution.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When asking 'how many stars/forks does X have' or 'what language is X
written in', the agent now returns a focused one-line answer instead
of dumping all repo stats. General 'show repository X' still returns
the full overview.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PR list items link to their pull request page, issue list items link
to their issue page, and search results link to the repository. Each
entry shows a result count header and relevant metadata (state, branch,
labels, stars).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- 'top N contributors' returns exactly N results with a bold header
- Issue view shows rich markdown: linked title, author, labels, body
- gh status output has bold section headers (Assigned Issues, etc.)
- ApiRequest supports limit param via query string for pagination

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…kout

- Add PrCheckoutAction, RepoForkAction, StarRepoAction types
- Add draft parameter to PrCreateAction
- Add repo parameter to IssueCreateAction, IssueCloseAction, IssueReopenAction
- Add repo parameter to ReleaseListAction
- Implement repoFork, starRepo/unstarRepo, prCheckout handlers
- Add rich prView formatting with JSON output
- Add getMutationSuccessMessage for friendly write action feedback
- Default empty body on issue/PR create for non-interactive use
- Expand grammar with natural write action phrasings
- Update demo scripts with PR view, top contributors, star/unstar

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix gh status: parse │-table into clean markdown with bold headers, links, emoji
- Fix contributors: add hyperlinks to GitHub profile pages
- Add 'latest N PRs/issues' grammar rules with limit parameter
- Add DependabotAlertsAction with severity/state filters and color-coded output
- Handle empty dependabot results with friendly message
- Update demo scripts with new features (microsoft/TypeAgent only)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Reduce to 1-3 phrasings per action, matching the style of other agents
(player, calendar, email). The dispatcher's LLM translation fallback
handles natural language variations that aren't covered by grammar rules.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Scaffolder generates working buildArgs()/runCli() handler for CLI agents
  using path and parameters from api-surface.json discovery data
- Changed manifest defaults: defaultEnabled=true, configurable emojiChar
- Fixed getPackagingModel() to use createChatModel() (avoids json_object
  constraint on non-JSON prompts)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Switch getDiscoveryModel() and getPhraseGenModel() from
  createChatModelDefault() to createChatModel() to avoid json_object
  response format constraint
- Load ts/.env in runTests.ts so LLM translation fallback works
- Remove duplicate __filename declaration and debug logging
- Post-simplification retest: 254/258 (98%), up from 249/258 (97%)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add phrasings for 'status of issues and PRs', 'latest updates on
repositories', and 'summary of issues' to reduce LLM misrouting
statusPrint -> issueList. 3 remaining failures are inherently
ambiguous phrases that the LLM reasonably maps to issueList.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- github-cli: prerequisites, supported actions table, example phrases,
  output formatting notes, demo instructions
- onboarding: document CLI help crawl discovery, auto-generated handler,
  add example phrases for CLI crawling

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…methods, extra brace

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- buildCliHandler: generate flag-only push for boolean params
- crawlCliRecursive: add visited set and default maxDepth of 4

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Restores: For Best Results, Agent Patterns, TODO, Trademarks sections.
Updates phase 1 description and TODO to reflect CLI --help as implemented.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace corrupted ΓÇö bytes (0xCE93 C387 C3B6) with proper UTF-8
em dash U+2014 (0xE2 0x80 0x94) across 19 occurrences.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@TalZaccai TalZaccai changed the title from "Onboarding Agent Help Crawl Discovery & GitHub CLI Agent" to "Onboarding Agent --help Crawl Discovery & GitHub CLI Agent" Apr 14, 2026
@TalZaccai TalZaccai requested a review from Copilot April 14, 2026 19:17

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Extends the onboarding agent to discover CLI APIs by recursively crawling --help/-h output, and adds a new github-cli agent integration that executes real gh commands based on the generated schema/grammar.

Changes:

  • Added CLI help crawling discovery flow (crawlCliHelp) and updated onboarding messaging/docs accordingly.
  • Enhanced scaffolding to generate a CLI-backed action handler when CLI-sourced discovery data is present; added manifest defaults and emojiChar support.
  • Added a new github-cli-agent package, demo scripts, and updated default agent provider registration.

Reviewed changes

Copilot reviewed 22 out of 23 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| ts/packages/shell/demo/github_cli.txt | Adds a shell demo script for the new GitHub CLI agent. |
| ts/packages/defaultAgentProvider/package.json | Registers github-cli-agent as a workspace dependency. |
| ts/packages/defaultAgentProvider/data/config.json | Adds github-cli → github-cli-agent mapping so it can be resolved by name. |
| ts/packages/cli/demo/github_cli.txt | Adds a CLI demo script for the GitHub CLI agent. |
| ts/packages/agents/onboarding/src/testing/testingHandler.ts | Updates test dispatcher creation to resolve local agents by path and adjusts routing logic for tests. |
| ts/packages/agents/onboarding/src/testing/runTests.ts | Adds a standalone test runner that loads .env and runs phrase→action routing. |
| ts/packages/agents/onboarding/src/scaffolder/scaffolderSchema.ts | Adds emojiChar to scaffolder parameters. |
| ts/packages/agents/onboarding/src/scaffolder/scaffolderHandler.ts | Loads discovery artifacts to generate a CLI handler; updates manifest defaults; adds CLI handler codegen. |
| ts/packages/agents/onboarding/src/onboardingActionHandler.ts | Wires crawlCliHelp into the onboarding action router and hints. |
| ts/packages/agents/onboarding/src/lib/llm.ts | Switches model factories to createChatModel() with tags. |
| ts/packages/agents/onboarding/src/discovery/discoverySchema.ts | Adds crawlCliHelp action type to the discovery schema. |
| ts/packages/agents/onboarding/src/discovery/discoverySchema.agr | Adds grammar rules to invoke crawlCliHelp. |
| ts/packages/agents/onboarding/src/discovery/discoveryHandler.ts | Implements recursive CLI help crawling + LLM extraction into ApiSurface. |
| ts/packages/agents/onboarding/README.md | Documents CLI help discovery and updates phase table and TODOs. |
| ts/packages/agents/github-cli/tsconfig.json | Adds TypeScript project config for the new agent package. |
| ts/packages/agents/github-cli/src/tsconfig.json | Adds TS build config for the agent source → dist output. |
| ts/packages/agents/github-cli/src/github-cliSchema.ts | Introduces the GitHub CLI actions schema (generated). |
| ts/packages/agents/github-cli/src/github-cliSchema.agr | Introduces the GitHub CLI action grammar (generated). |
| ts/packages/agents/github-cli/src/github-cliManifest.json | Adds the agent manifest (emoji, schema/grammar dist locations). |
| ts/packages/agents/github-cli/src/github-cliActionHandler.ts | Implements real gh execution and rich formatting for results. |
| ts/packages/agents/github-cli/package.json | Adds build scripts and dependencies for github-cli-agent. |
| ts/packages/agents/github-cli/README.md | Adds prerequisites, supported actions, examples, and demo instructions. |

- Fix path.resolve to use path.dirname() for correct directory resolution
- Use Date.now() suffix for unique test tmpDir per run
- Fix emojiChar comment to match actual default (🔎)
- Fix maxDepth comment to reflect default of 4
- Fix codegen: use === true for booleans, !== undefined/null for values
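
The codegen fix in the last item can be pictured with a runtime equivalent. This is only a sketch of the behavior the generated switch-cases are meant to have, under the assumption that each discovered parameter carries a name, a flag, and a type; `ParamSpec` and `pushFlags` are hypothetical names, not the PR's code.

```typescript
// Hypothetical description of one discovered CLI parameter.
interface ParamSpec {
    name: string; // property name on the action's parameters
    flag: string; // e.g. "--draft"
    type: "boolean" | "string" | "number";
}

function pushFlags(
    args: string[],
    specs: ParamSpec[],
    params: Record<string, unknown>,
): string[] {
    for (const spec of specs) {
        const value = params[spec.name];
        if (spec.type === "boolean") {
            // Boolean params push the flag alone, and only when exactly true.
            if (value === true) args.push(spec.flag);
        } else if (value !== undefined && value !== null) {
            // Value params push flag + value; 0 and "" are still emitted.
            args.push(spec.flag, String(value));
        }
    }
    return args;
}
```

Checking `!== undefined` and `!== null` instead of truthiness matters for values such as a limit of 0, which a plain `if (value)` test would silently drop.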

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Run prettier --write on onboarding-agent and github-cli files
to fix code style issues detected by build-ts lint step.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@TalZaccai TalZaccai temporarily deployed to development-fork April 14, 2026 20:09 — with GitHub Actions Inactive
@TalZaccai TalZaccai requested a review from robgruen April 14, 2026 20:16
@robgruen

@TalZaccai, I took a look. Let me know if you have any questions about any of my comments.

- Add copyright header to runTests.ts
- Add homepage and repository fields to github-cli package.json
- Sort github-cli and defaultAgentProvider package.json keys
- Fix alphabetical ordering of github-cli-agent dependency

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move CLI handler template generator (buildCliHandler + flagToCamel) from
scaffolderHandler.ts into its own file for easier maintenance per PR review.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extract the auto-generated handler code from an inline string literal
into cliHandler.template — a standalone file that's easier to read and
maintain. buildCliHandler now reads the template from disk and
interpolates {{NAME}}, {{PASCAL_NAME}}, {{CLI_COMMAND}}, {{SWITCH_CASES}}.
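
A minimal sketch of the placeholder interpolation described above, assuming the template uses `{{UPPER_SNAKE}}` markers as listed. `renderTemplate` is a hypothetical helper and reading the template file from disk is left out.

```typescript
// Replaces every {{KEY}} marker with values[KEY]; unknown keys are an error
// so a typo in the template does not silently survive into generated code.
function renderTemplate(
    template: string,
    values: Record<string, string>,
): string {
    return template.replace(
        /\{\{([A-Z_]+)\}\}/g,
        (placeholder, key: string) => {
            const value = values[key];
            if (value === undefined) {
                throw new Error(`No value provided for ${placeholder}`);
            }
            return value;
        },
    );
}
```

Using a replacer function rather than a replacement string avoids special handling of `$` sequences, which is useful when the inserted value is itself generated code.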

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- llm.ts: All model factories now accept optional endpoint param to
  override the default model (e.g. 'openai:gpt-5' for best results)
- discoveryHandler.ts: Add debug('typeagent:onboarding:discovery')
  warning when CLI help output is truncated before LLM extraction
- Add debug + @types/debug dependencies
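
The truncation warning could look something like the sketch below. The 12,000-character constant is an assumed reading of the "12K chars" mentioned in the PR description, and the `warn` callback stands in for the `debug('typeagent:onboarding:discovery')` logger so the example has no external dependency.

```typescript
// Assumed limit; the PR description only says "12K chars".
const MAX_HELP_CHARS = 12_000;

function truncateHelp(
    helpText: string,
    warn: (message: string) => void = console.warn,
    maxChars = MAX_HELP_CHARS,
): string {
    if (helpText.length <= maxChars) return helpText;
    // Surface the loss so missing actions can be traced back to truncation.
    warn(
        `CLI help output truncated from ${helpText.length} to ${maxChars} chars before LLM extraction`,
    );
    return helpText.slice(0, maxChars);
}
```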

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TalZaccai and others added 3 commits April 14, 2026 16:10
Replace raw model.complete() + regex JSON parsing with TypeChat's
createJsonTranslator<CliDiscoveryResult>. This gives:
- Strong typing via TypeScript schema validation
- Automatic JSON validation against the CliDiscoveryResult type
- Built-in error correction with retry on validation failure
- No more fragile regex extraction of JSON from LLM output

New file discoveryLlmSchema.ts defines the schema that TypeChat reads
as text and sends to the LLM alongside the prompt.
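
The validate-and-retry behavior listed above can be illustrated without the library. The sketch below is not TypeChat's implementation and does not use its API: the `CliDiscoveryResult` shape, the `Complete` callback, and the repair prompt wording are all hypothetical, chosen only to show the general pattern of feeding a validation error back to the model.

```typescript
// Hypothetical shape; the real type lives in discoveryLlmSchema.ts.
interface CliDiscoveryResult {
    actions: { name: string; description: string }[];
}

type Complete = (prompt: string) => Promise<string>;

// Parses and validates the model's reply; throws with a readable message.
function parseDiscoveryResult(jsonText: string): CliDiscoveryResult {
    const parsed: unknown = JSON.parse(jsonText);
    const actions = (parsed as { actions?: unknown } | null)?.actions;
    if (!Array.isArray(actions)) {
        throw new Error("'actions' must be an array");
    }
    for (const action of actions) {
        if (
            typeof action?.name !== "string" ||
            typeof action?.description !== "string"
        ) {
            throw new Error("each action needs string 'name' and 'description'");
        }
    }
    return { actions };
}

async function translateWithRetry(
    complete: Complete,
    prompt: string,
    maxAttempts = 2,
): Promise<CliDiscoveryResult> {
    let request = prompt;
    let lastError = "";
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
        const raw = await complete(request);
        try {
            return parseDiscoveryResult(raw);
        } catch (err) {
            lastError = err instanceof Error ? err.message : String(err);
        }
        // Ask again, telling the model what was wrong with its last reply.
        request = `${prompt}\nThe previous response was invalid: ${lastError}\nReturn corrected JSON only.`;
    }
    throw new Error(
        `No valid response after ${maxAttempts} attempts: ${lastError}`,
    );
}
```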

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Only attempt -h fallback for CLIs in a known-safe allowlist (gh, git,
az, kubectl, docker, etc.). For unknown CLIs, only --help is tried
since -h can mean something else in some tools. Logs a debug warning
when the short flag is skipped.
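
A minimal sketch of the allowlist check, assuming the set is keyed by bare command name. Only gh, git, az, kubectl, and docker are named in the commit message, so the set below is deliberately incomplete; the path and extension normalization and the `helpFlagsFor` / `onSkip` names are hypothetical.

```typescript
// Only the CLIs named in the commit message; the PR's full list is longer.
const SAFE_SHORT_HELP_CLIS = new Set(["gh", "git", "az", "kubectl", "docker"]);

function helpFlagsFor(
    command: string,
    onSkip: (message: string) => void = () => {},
): string[] {
    // Assumed normalization: "/usr/bin/gh" or "C:\tools\gh.exe" -> "gh".
    const base = command
        .split(/[\\/]/)
        .pop()!
        .replace(/\.(exe|cmd|bat)$/i, "")
        .toLowerCase();
    if (SAFE_SHORT_HELP_CLIS.has(base)) {
        return ["--help", "-h"];
    }
    // For unknown tools -h may mean something else, so it is never tried.
    onSkip(`Skipping -h fallback for "${base}": not in the allowlist`);
    return ["--help"];
}
```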

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extend CliDiscoveryResult schema with optional entities field so the
TypeChat translator also identifies domain entities (repos, issues,
users, etc.) from CLI help output. Entities are stored in ApiSurface
alongside actions, deduplicated by name when merging.

Downstream phases can optionally consume entities for richer grammar
generation (noun recognition) and test phrase generation.
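
Deduplicating entities by name when merging might be sketched as follows. The `DiscoveredEntity` shape, the first-occurrence-wins rule, and the case-insensitive comparison are assumptions for illustration; the commit message only states that entities are deduplicated by name.

```typescript
// Hypothetical entity shape.
interface DiscoveredEntity {
    name: string;
    description?: string;
}

function mergeEntities(
    existing: DiscoveredEntity[],
    incoming: DiscoveredEntity[],
): DiscoveredEntity[] {
    const byName = new Map<string, DiscoveredEntity>();
    for (const entity of [...existing, ...incoming]) {
        // Assumption: names are compared case-insensitively.
        const key = entity.name.trim().toLowerCase();
        // First occurrence wins; later duplicates are ignored.
        if (!byName.has(key)) byName.set(key, entity);
    }
    return Array.from(byName.values());
}
```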

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@TalZaccai TalZaccai temporarily deployed to development-fork April 14, 2026 23:29 — with GitHub Actions Inactive
@TalZaccai TalZaccai requested a review from robgruen April 15, 2026 00:29
function createCliDiscoveryTranslator(): TypeChatJsonTranslator<CliDiscoveryResult> {
const model = getDiscoveryModel();
// At runtime __dirname is dist/discovery/; resolve back to src/discovery/
const schemaPath = path.resolve(

we have loadSchema() in schema.ts that does this, maybe use that here?

// CLIs where both --help and -h are known to be safe help flags.
// For unlisted commands only --help is attempted (since -h can mean
// something else, e.g. -h is "human-readable" in some Unix tools).
const SAFE_SHORT_HELP_CLIS = new Set([

Maybe if the user wants to do this for something that's not in this list, we pop up an "Are you sure you want to do this?" prompt. I think that means we have to wire up the yes/no questions from clientIO through to reasoning. Maybe that's a separate PR.


3 participants