Eliminate Idle EC2 Costs along with Simplifying the Architecture #90

@MostlyKIGuess

Some Background

  • Last year, the entire Sugar-AI platform (frontend and backend) ran on a single AWS EC2 instance that stayed up 24/7. This burned through cloud credits because we paid hourly rates even when no one was using the AI.
  • To fix this simply, while still using open-source models, we will split the application into two parts: a free, always-on frontend and an auto-sleeping backend.

Plan

  1. Frontend on GitHub Pages (Free & Always On): The website UI will be extracted and hosted statically on GitHub Pages. Even if the backend AI server is turned off to save money, users will still see the website.
  2. Backend on EC2 (Auto-Sleeping): The heavy FastAPI/PyTorch backend stays on EC2, but we configure it to automatically turn itself off after 30 minutes of inactivity.
  3. The "Wake Up" Button: To turn the server back on, the GitHub Pages UI will have a "Wake Up Server" button. Clicking it calls a single, very simple AWS Lambda Function URL that starts the EC2 instance if it is stopped. Users only need the button when the AI chat in the UI fails because the backend is off.

Tasks

Phase 1: Decouple Frontend & Database calls to GitHub Pages

  • Move the HTML files from app/templates into plain static files in a docs/ directory for GitHub Pages.
  • Database Connection Shift: Since GitHub Pages cannot run Python or connect to SQLite, we must replace the Jinja2 backend-rendered data (like the User API Key dashboard) with REST API calls.
  • Write straightforward JavaScript (fetch()) inside the HTML files to query our EC2 server (e.g., GET /api/user) for data like API keys/quotas, and render it dynamically into the DOM.
  • Verify that the CORS headers configured in app/main.py allow requests from our GitHub Pages URL.
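
For the CORS item above, a minimal sketch of what the check in app/main.py could look like. The origin `https://example.github.io`, the allowed methods and headers, and both function names are placeholders made up for illustration, not values from the current codebase. Note that a browser origin is only scheme plus host, so the repository path of the Pages URL is not part of it.

```python
# Hypothetical origin: replace with the real GitHub Pages origin (scheme + host only).
GITHUB_PAGES_ORIGIN = "https://example.github.io"


def cors_settings(origin: str = GITHUB_PAGES_ORIGIN) -> dict:
    """Build keyword arguments for FastAPI's CORSMiddleware.

    Only the single Pages origin is allowed instead of the wildcard "*".
    The method and header lists are assumptions about what the UI needs.
    """
    return {
        "allow_origins": [origin],
        "allow_methods": ["GET", "POST"],
        "allow_headers": ["Authorization", "Content-Type"],
    }


def configure_cors(app, origin: str = GITHUB_PAGES_ORIGIN) -> None:
    """Attach the CORS middleware to an existing FastAPI app object."""
    # Imported lazily so cors_settings() can be inspected without FastAPI installed.
    from fastapi.middleware.cors import CORSMiddleware

    app.add_middleware(CORSMiddleware, **cors_settings(origin))
```

Calling `configure_cors(app)` right after the app is created would be enough. Keeping the settings in a plain dict makes it easy to verify that only the Pages origin is allowed.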

Phase 2: Implement "Auto-Sleep" for the Backend

  • In the AWS Console, navigate to CloudWatch and create an Alarm for our EC2 instance. Set the metric to CPUUtilization < 5% for 30 consecutive minutes, and set the Alarm Action to "Stop this instance". The server will then shut down automatically when idle, which we expect to save 90%+ of our cloud costs.
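
The same alarm could also be created from a script instead of the console. The sketch below assumes boto3 is available and expresses "30 consecutive minutes" as six 5-minute periods, which matches the default (basic) EC2 monitoring interval. The alarm name, instance ID, and region are placeholders.

```python
def idle_stop_alarm_params(instance_id: str, region: str,
                           idle_minutes: int = 30,
                           cpu_threshold: float = 5.0,
                           period_s: int = 300) -> dict:
    """Build arguments for CloudWatch put_metric_alarm.

    The alarm fires when average CPUUtilization stays below cpu_threshold
    for idle_minutes, and its action stops the instance.
    """
    if (idle_minutes * 60) % period_s != 0:
        raise ValueError("idle_minutes must be a whole number of periods")
    return {
        "AlarmName": f"stop-idle-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": period_s,
        "EvaluationPeriods": idle_minutes * 60 // period_s,
        "Threshold": cpu_threshold,
        "ComparisonOperator": "LessThanThreshold",
        # Built-in EC2 alarm action that stops the instance.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:stop"],
    }


def create_idle_stop_alarm(cloudwatch, instance_id: str, region: str) -> None:
    """Create or update the alarm using a boto3 CloudWatch client."""
    cloudwatch.put_metric_alarm(**idle_stop_alarm_params(instance_id, region))
```

Usage would be roughly `create_idle_stop_alarm(boto3.client("cloudwatch", region_name="us-east-1"), "i-0123456789abcdef0", "us-east-1")`. One caveat: CPU utilization is only a proxy for "no interaction", so the 5% threshold may need tuning against real traffic.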

Phase 3: Implement the "Wake Server" Button

Finally:

  • Add a "Wake Up Server" button in our GitHub Pages HTML that sends a POST request to the Lambda Function URL described in the Plan.
    (After clicking, users will wait ~2-3 minutes for the EC2 instance to boot up.)
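
A minimal sketch of the Lambda behind that Function URL, assuming a Python runtime with boto3 and an `INSTANCE_ID` environment variable (both the variable name and the default ID are hypothetical). It only calls `start_instances` when the instance is fully stopped, since starting an instance that is still stopping would be rejected.

```python
import json
import os

# Hypothetical configuration: the real instance ID would come from the Lambda environment.
INSTANCE_ID = os.environ.get("INSTANCE_ID", "i-0123456789abcdef0")


def wake_instance(ec2, instance_id: str) -> str:
    """Start the instance if it is stopped; return the resulting state name."""
    resp = ec2.describe_instances(InstanceIds=[instance_id])
    state = resp["Reservations"][0]["Instances"][0]["State"]["Name"]
    if state == "stopped":
        ec2.start_instances(InstanceIds=[instance_id])
        return "starting"
    # "pending", "running", "stopping", ...: report the state and do nothing.
    return state


def lambda_handler(event, context, ec2=None):
    """Entry point for the Function URL; ec2 can be injected for testing."""
    if ec2 is None:
        import boto3  # available by default in the AWS Lambda Python runtime
        ec2 = boto3.client("ec2")
    state = wake_instance(ec2, INSTANCE_ID)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"state": state}),
    }
```

The Lambda's execution role would need permission for `ec2:DescribeInstances` and `ec2:StartInstances`, ideally scoped to this one instance.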

Final Acceptance

  • GitHub Pages URL always loads the UI instantly.
  • If the backend is off, the user is presented with the "Wake Up Server" button.
  • Clicking the button starts the EC2 instance, and the UI polls until the API connects.
  • EC2 automatically turns off after 30 minutes of no interaction.
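
To check the "polls until the API connects" criterion outside the browser, a small script along these lines could be used. It is a sketch only: the real UI would do the same thing in JavaScript, and the health-check URL passed to `backend_is_up` is an assumption, since the issue does not name one.

```python
import time
import urllib.error
import urllib.request


def backend_is_up(url: str, timeout_s: float = 3.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not ready yet.
        return False


def wait_until_ready(probe, max_wait_s: float = 300.0, interval_s: float = 10.0,
                     sleep=time.sleep, clock=time.monotonic) -> bool:
    """Call probe() repeatedly until it returns True or max_wait_s elapses."""
    deadline = clock() + max_wait_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

For example, `wait_until_ready(lambda: backend_is_up("https://api.example.org/health"))` would poll a hypothetical health endpoint every 10 seconds for up to 5 minutes, which leaves some margin over the ~2-3 minute boot time.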

Discussions

  • Because this will be public, we might still run into issues if someone sets up a bot to click our wake-up button. To mitigate this, we could add a shared secret header such as X-Wake-Token, which the GitHub Pages JS sends when a human clicks the button and which a bot that merely discovers the URL would not have. Since the site is public, a determined attacker can still look up the token, so for stronger protection we can add reserved-concurrency throttling on the Lambda as a second layer, or host the site from a private repository.
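
A sketch of how the Lambda could check the proposed X-Wake-Token header, assuming the expected token is supplied to the function through configuration. The function name `is_authorized` is made up for illustration. Header names are matched case-insensitively because they may arrive lowercased in the Function URL event.

```python
import hmac


def is_authorized(headers, expected_token: str) -> bool:
    """Check the X-Wake-Token request header against the shared secret."""
    if not expected_token:
        # Refuse everything if no secret is configured.
        return False
    supplied = ""
    for name, value in (headers or {}).items():
        if name.lower() == "x-wake-token":
            supplied = value or ""
            break
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(supplied.encode(), expected_token.encode())
```

A handler would return a 403 response when this check fails, before touching EC2. As noted above, this only filters out casual bots, because the token ships with the public page.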
