feat: dynamic Sugar docs fetcher + /refresh-docs endpoint for live RAG updates #91

@skypank-coder

Problem

The RAG vectorstore loads documents only at server startup,
from static PDF files. This causes two problems:

  1. Sugar activity creation docs were missing entirely,
    so questions like "How do I create a Sugar activity?"
    returned poor answers with no real context.

  2. There is no way to update RAG knowledge without
    fully restarting the server, which makes doc updates
    painful in production.

Proposed Solution

This issue tracks the addition of:

1. scripts/fetch_sugar_docs.py

  • Fetches the latest Sugar activity docs live from the GitHub API
  • Sources:
    • sugarlabs/sugar-docs → desktop-activity.md
    • sugarlabs/sugar-docs → web-activity.md
    • sugarlabs/sugar-docs → contributing.md
  • Converts markdown → clean plain text
  • Adds source URL + timestamp header to each file
  • Graceful error handling (404s, timeouts, network failures)
  • Optional GITHUB_TOKEN for higher API rate limits
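
A minimal sketch of how such a fetcher could be structured, using only the Python standard library. This is not the actual contents of scripts/fetch_sugar_docs.py: the file paths inside sugarlabs/sugar-docs, the output directory, the header layout, and every function name below are assumptions made for illustration, and the markdown stripping is deliberately rough.

```python
import os
import re
import sys
import urllib.error
import urllib.request
from datetime import datetime, timezone

REPO = "sugarlabs/sugar-docs"
# Assumption: the real paths inside the repo may differ from these bare names.
DOC_PATHS = ["desktop-activity.md", "web-activity.md", "contributing.md"]
OUT_DIR = "docs"  # assumption: wherever the RAG loader reads text files from


def build_request(path, token=None):
    """Build a GitHub contents-API request that asks for the raw file body."""
    url = f"https://api.github.com/repos/{REPO}/contents/{path}"
    headers = {
        "Accept": "application/vnd.github.raw+json",
        "User-Agent": "sugar-docs-fetcher",
    }
    if token:  # optional; authenticated requests get higher rate limits
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)


def fetch_doc(path, token=None, timeout=10):
    """Return the markdown text, or None on 404s, timeouts, or network errors."""
    try:
        with urllib.request.urlopen(build_request(path, token), timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        print(f"{path}: HTTP {err.code}", file=sys.stderr)
    except OSError as err:  # covers URLError and socket timeouts
        print(f"{path}: network failure ({err})", file=sys.stderr)
    return None


def markdown_to_text(md):
    """Rough markdown stripping; a real converter would handle more syntax."""
    text = re.sub(r"^```.*$", "", md, flags=re.MULTILINE)       # fence lines
    text = re.sub(r"!\[([^\]]*)\]\([^)]*\)", r"\1", text)       # images
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)        # links
    text = re.sub(r"^#{1,6}[ \t]*", "", text, flags=re.MULTILINE)  # headings
    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)                # bold
    text = re.sub(r"`([^`]*)`", r"\1", text)                    # inline code
    return re.sub(r"\n{3,}", "\n\n", text).strip()


def with_header(text, source_url, fetched_at):
    """Prepend the source URL and fetch timestamp to the plain text."""
    return f"Source: {source_url}\nFetched: {fetched_at}\n\n{text}\n"


def main():
    token = os.environ.get("GITHUB_TOKEN")
    os.makedirs(OUT_DIR, exist_ok=True)
    for path in DOC_PATHS:
        md = fetch_doc(path, token)
        if md is None:
            continue  # skip this doc, keep going with the others
        stamp = datetime.now(timezone.utc).isoformat()
        body = with_header(markdown_to_text(md), build_request(path).full_url, stamp)
        name = os.path.splitext(os.path.basename(path))[0] + ".txt"
        with open(os.path.join(OUT_DIR, name), "w", encoding="utf-8") as fh:
            fh.write(body)
```

Calling `main()` would fetch each document in turn and skip any that fail, so one missing file does not abort the whole run.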

2. POST /refresh-docs endpoint

  • Admin-only (requires an API key with can_change_model: true)
  • Re-fetches all Sugar docs from GitHub
  • Rebuilds the FAISS vectorstore at runtime
  • Returns a JSON status with the docs refreshed and a timestamp
  • Server keeps running during the rebuild, so no restart is needed
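
The endpoint behaviour listed above could be sketched, independent of any web framework, roughly as follows. Everything here is hypothetical: the `VectorStoreHolder` class, the `refresh_docs` signature, the status codes, and the response field names are illustrative assumptions, and the FAISS rebuild is represented by an injected `build_index` callable rather than real FAISS calls. The point is the pattern: check the admin flag, build the new index off to the side, then swap it in so in-flight requests keep using the old one.

```python
import threading
from datetime import datetime, timezone


class VectorStoreHolder:
    """Holds the live index; readers see the old one until swap() runs."""

    def __init__(self, index=None):
        self._lock = threading.Lock()
        self._index = index

    def get(self):
        with self._lock:
            return self._index

    def swap(self, new_index):
        with self._lock:
            self._index = new_index


def refresh_docs(key_info, holder, fetch_docs, build_index):
    """Handler body for POST /refresh-docs; returns (status_code, json_body).

    key_info    -- dict describing the caller's API key (assumed shape)
    fetch_docs  -- callable returning {filename: text} for fetched docs
    build_index -- callable turning that dict into a new vector index
    """
    if not key_info.get("can_change_model"):
        return 403, {"status": "error", "detail": "admin API key required"}

    docs = fetch_docs()
    if not docs:
        # Keep serving the existing index rather than replacing it with nothing.
        return 502, {"status": "error", "detail": "no documents fetched"}

    new_index = build_index(docs)  # built while the old index still serves
    holder.swap(new_index)         # brief lock, then new queries see new docs

    return 200, {
        "status": "success",
        "docs_refreshed": sorted(docs),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

A route in the server would then only need to look up the caller's key, call `refresh_docs`, and serialize the returned body.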

Testing Done

✅ All 3 docs fetch successfully (~35KB total)
✅ /refresh-docs endpoint returns success response
✅ FAISS vectorstore rebuilds correctly
✅ /ask?question=How do I create a Sugar activity? now returns accurate Sugar activity guidance
✅ All existing endpoints unchanged

Related

Builds on #21: extends the static doc addition
with a live, maintainable fetching system
