# System Architecture

## Overview

This system provides an LLM-powered analytics platform that queries educational data, generates insights, and learns from user interactions through a feedback loop.

## Architecture Diagram

```mermaid
graph TB
    subgraph "Data Layer"
        META[Metadata<br/>ar_data_dictionary.csv]
        CSV1[Synthetic Data<br/>ar_courses.csv]
        CSV2[ar_financial_aid.csv]
        CSV3[ar_cohorts.csv]
    end

    subgraph "Database Layer"
        SQL[(SQL Server<br/>InsightAlchemists DB)]
        T1[ar_courses<br/>90k+ rows]
        T2[ar_financial_aid<br/>300+ rows]
        T3[ar_cohorts]
        T4[llm_debug_logs]
        T5[weekly_trends]

        SQL --> T1
        SQL --> T2
        SQL --> T3
        SQL --> T4
        SQL --> T5
    end

    subgraph "LLM Application Layer"
        LLM[LLM Engine]
        PROMPT[Prompt Library<br/>- Query templates<br/>- Analysis patterns<br/>- Viz specifications]
        QUERY[Query Generator]
        VIZ[Visualization Engine<br/>- Charts<br/>- Tables<br/>- Dashboards]

        LLM <--> PROMPT
        LLM --> QUERY
        LLM --> VIZ
    end

    subgraph "Analytics & Learning Layer"
        LOGS[Conversation Logger]
        TRENDS[Trend Analyzer<br/>- Topic extraction<br/>- Weekly aggregation]
        FEEDBACK[Feedback Processor]
    end

    subgraph "User Interface"
        USER[User]
        DEPT[User Department<br/>Context]
    end

    %% Data flow
    META -.->|Schema info| LLM
    CSV1 -.->|Initial load| SQL
    CSV2 -.->|Initial load| SQL
    CSV3 -.->|Initial load| SQL

    USER -->|Ask question| LLM
    DEPT -.->|Context| LLM

    LLM -->|Generate SQL| QUERY
    QUERY -->|Execute query| SQL
    SQL -->|Return data| LLM

    LLM -->|Generate viz| VIZ
    VIZ -->|Display| USER

    %% Logging & feedback loop
    LLM -->|Log interaction| LOGS
    LOGS -->|Store| T4
    USER -->|Provide feedback| FEEDBACK
    FEEDBACK -->|Store| T4

    T4 -.->|Extract topics| TRENDS
    TRENDS -->|Aggregate| T5

    %% Learning loop
    T4 -.->|Historical context| PROMPT
    T5 -.->|Popular topics| PROMPT
    PROMPT -.->|Improved prompts| LLM

    style LLM fill:#4A90E2
    style SQL fill:#50C878
    style USER fill:#FFB347
    style T4 fill:#FF6B6B
    style T5 fill:#FF6B6B
```

## Data Flow

### 1. Query Flow

```mermaid
sequenceDiagram
    participant U as User
    participant L as LLM
    participant P as Prompt Library
    participant Q as Query Generator
    participant S as SQL Server
    participant V as Viz Engine

    U->>L: Ask: "What's the average GPA by race?"
    L->>P: Retrieve relevant prompt template
    P-->>L: Query pattern + context
    L->>Q: Generate SQL query
    Q->>S: Execute: SELECT Race, AVG(GPA)...
    S-->>Q: Return results
    Q-->>L: Data payload
    L->>V: Generate visualization spec
    V-->>U: Display chart + insights
```
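The query flow can be sketched end to end in a few lines. This is a minimal illustration only: it uses Python's built-in `sqlite3` as a stand-in for SQL Server, and the template and column names are hypothetical. In the real system, the LLM fills the template from the Prompt Library and the query runs through pyodbc.

```python
import sqlite3

# Hypothetical query template, as the Prompt Library might store it.
TEMPLATE = ("SELECT {group_col}, ROUND(AVG({metric_col}), 2) AS avg_value "
            "FROM ar_cohorts GROUP BY {group_col}")

def answer_question(conn, group_col, metric_col):
    """Fill the template, execute it, and return rows for the viz engine."""
    sql = TEMPLATE.format(group_col=group_col, metric_col=metric_col)
    return conn.execute(sql).fetchall()

# In-memory stand-in for the ar_cohorts table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ar_cohorts (race TEXT, gpa REAL)")
conn.executemany("INSERT INTO ar_cohorts VALUES (?, ?)",
                 [("A", 3.2), ("A", 3.6), ("B", 2.9)])

rows = answer_question(conn, "race", "gpa")
print(dict(rows))
```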

### 2. Logging & Feedback Flow

```mermaid
sequenceDiagram
    participant U as User
    participant L as LLM
    participant LOG as Logger
    participant DB as llm_debug_logs
    participant T as Trend Analyzer
    participant WT as weekly_trends

    L->>LOG: Log interaction
    U->>LOG: Submit feedback
    LOG->>DB: INSERT (timestamp, user, logs, dept, feedback)

    Note over T: Runs periodically
    T->>DB: Query logs from past week
    DB-->>T: Return conversations
    T->>T: Extract topics & count
    T->>WT: INSERT/UPDATE (topic, count, week_dates)

    Note over L: Next session
    L->>WT: Query popular topics
    WT-->>L: Trending questions
    L->>L: Adjust prompts & suggestions
```
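The two logger steps, storing the interaction and attaching feedback to the same row later, can be sketched like so. Again `sqlite3` stands in for SQL Server, and the function names (`log_interaction`, `record_feedback`) are hypothetical, not the project's actual API.

```python
import sqlite3
from datetime import datetime, timezone

# In-memory stand-in mirroring the llm_debug_logs schema.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE llm_debug_logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT, user TEXT, summarized_logs TEXT,
    user_department TEXT, user_feedback TEXT)""")

def log_interaction(conn, user, summary, department):
    """Step 1: store the conversation summary; feedback starts empty."""
    cur = conn.execute(
        "INSERT INTO llm_debug_logs "
        "(timestamp, user, summarized_logs, user_department, user_feedback) "
        "VALUES (?, ?, ?, ?, NULL)",
        (datetime.now(timezone.utc).isoformat(), user, summary, department))
    return cur.lastrowid

def record_feedback(conn, log_id, feedback):
    """Step 2: feedback, when it arrives, updates the same row."""
    conn.execute("UPDATE llm_debug_logs SET user_feedback = ? WHERE id = ?",
                 (feedback, log_id))

log_id = log_interaction(conn, "jdoe", "Asked about GPA by race", "Institutional Research")
record_feedback(conn, log_id, "Helpful, 5/5")
row = conn.execute("SELECT user, user_feedback FROM llm_debug_logs WHERE id = ?",
                   (log_id,)).fetchone()
print(row)  # ('jdoe', 'Helpful, 5/5')
```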

### 3. Learning Loop

```mermaid
graph LR
    A[User Question] --> B[LLM Processing]
    B --> C[Generate Query]
    C --> D[Execute & Return]
    D --> E[Log Interaction]
    E --> F[Store in llm_debug_logs]

    F --> G{User Feedback?}
    G -->|Yes| H[Update log with feedback]
    G -->|No| I[Track as answered]

    H --> J[Weekly Analysis]
    I --> J

    J --> K[Update weekly_trends]
    K --> L[Identify patterns]
    L --> M[Update Prompt Library]
    M --> B

    style B fill:#4A90E2
    style F fill:#FF6B6B
    style K fill:#FF6B6B
    style M fill:#50C878
```

## Component Details

### Database Tables

#### ar_courses

- Course-level enrollment data
- 90,411 rows
- Used for: enrollment analysis, course performance, student tracking

#### ar_financial_aid

- Financial aid information per student
- 300+ rows
- Used for: aid distribution analysis, cost analysis, need assessment

#### ar_cohorts

- Cohort-level student outcomes
- Used for: retention analysis, graduation tracking, student success metrics

#### llm_debug_logs

```
- id: INT (auto-increment)
- timestamp: DATETIME2 (conversation time)
- user: VARCHAR(255) (user identifier)
- summarized_logs: NVARCHAR(MAX) (conversation summary)
- user_department: VARCHAR(100) (user's department/role)
- user_feedback: NVARCHAR(1000) (user's rating/comments)
```

#### weekly_trends

```
- id: INT (auto-increment)
- question_topic: VARCHAR(255) (topic/category)
- count: INT (number of queries)
- week_start_date: DATE
- week_end_date: DATE
- created_at: DATETIME2
- updated_at: DATETIME2
```
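The Trend Analyzer's weekly aggregation can be sketched from this schema. The keyword-to-topic map below is purely illustrative; the real analyzer could just as well use the LLM itself to categorize conversation summaries before counting.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical keyword-based topic extraction.
TOPIC_KEYWORDS = {"gpa": "academic performance",
                  "aid": "financial aid",
                  "retention": "retention"}

def extract_topic(summary):
    """Map a conversation summary to a coarse topic label."""
    for keyword, topic in TOPIC_KEYWORDS.items():
        if keyword in summary.lower():
            return topic
    return "other"

def aggregate_week(summaries, week_start):
    """Turn one week of summaries into weekly_trends rows."""
    counts = Counter(extract_topic(s) for s in summaries)
    week_end = week_start + timedelta(days=6)
    return [{"question_topic": topic, "count": n,
             "week_start_date": week_start, "week_end_date": week_end}
            for topic, n in counts.most_common()]

rows = aggregate_week(
    ["Average GPA by race", "Aid distribution by cohort", "GPA trends over time"],
    week_start=date(2024, 1, 1))
print(rows[0])
```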

### LLM Application Components

#### Prompt Library

- Pre-built query templates for common questions
- Context-aware prompts based on the user's department
- Dynamic prompts that evolve based on weekly_trends
- Visualization specifications for different data types
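Context-aware prompt assembly might look like the following sketch, where the base template is enriched with the user's department and the week's trending topics. The wording and function name are assumptions for illustration, not the library's actual contents.

```python
BASE_PROMPT = ("You are an analytics assistant for educational data. "
               "Answer using the ar_courses, ar_financial_aid, and ar_cohorts tables.")

def build_prompt(department, trending_topics):
    """Compose a system prompt from the base template plus runtime context."""
    parts = [BASE_PROMPT]
    if department:
        parts.append(f"The user works in the {department} department; "
                     "prioritize metrics relevant to them.")
    if trending_topics:
        parts.append("Frequently asked topics this week: "
                     + ", ".join(trending_topics) + ".")
    return " ".join(parts)

prompt = build_prompt("Finance", ["financial aid", "cost per credit"])
print(prompt)
```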

#### Query Generator

- Translates natural language to SQL
- Uses the metadata schema for accurate column mapping
- Validates queries before execution
- Handles complex joins and aggregations
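The validation step can be sketched as two pre-execution checks: the query must be read-only, and every referenced column must exist in the metadata schema. The whitelist below is hardcoded for illustration; in the real system it would come from ar_data_dictionary.csv.

```python
# Hypothetical schema whitelist (stand-in for ar_data_dictionary.csv).
KNOWN_COLUMNS = {"ar_cohorts": {"race", "gpa", "cohort_year", "retained"}}

def validate_query(sql, table, columns):
    """Reject non-SELECT statements and columns absent from the schema."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only read-only SELECT queries are allowed")
    unknown = set(columns) - KNOWN_COLUMNS.get(table, set())
    if unknown:
        raise ValueError(f"unknown columns for {table}: {sorted(unknown)}")
    return True

ok = validate_query("SELECT race, AVG(gpa) FROM ar_cohorts GROUP BY race",
                    "ar_cohorts", ["race", "gpa"])
```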

#### Visualization Engine

- Generates charts (bar, line, pie, scatter)
- Creates data tables with formatting
- Builds dashboards for multi-metric views
- Supports export to various formats
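One decision the engine has to make is which chart type fits a given result set. A minimal sketch of that choice, with heuristics that are illustrative rather than the engine's actual rules:

```python
def choose_chart(rows):
    """Pick a chart type from the shape of (label, value) result rows."""
    if not rows:
        return "table"
    first = rows[0]
    if len(first) != 2:
        return "table"      # wide results render best as a data table
    if isinstance(first[0], (int, float)):
        return "scatter"    # numeric x vs numeric y
    if len(rows) <= 6:
        return "pie"        # few categories: share-of-whole view
    return "bar"            # many categories: ranked comparison

print(choose_chart([("A", 3.4), ("B", 2.9)]))  # pie
```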

## Feedback Loop Benefits

1. **Improved query accuracy**: learn from successful and failed queries
2. **Topic prioritization**: surface the most-asked questions to users
3. **Prompt optimization**: refine templates based on user feedback
4. **Department-specific insights**: tailor responses by user role
5. **Trend detection**: identify emerging questions and data needs

## Example Workflows

### Workflow 1: First-Time User

```
User asks → LLM checks weekly_trends for similar topics
→ Suggests refined question → Generates query
→ Returns viz → Logs interaction
```

### Workflow 2: Repeat Question

```
User asks common question → LLM recognizes it from trends
→ Uses optimized prompt template
→ Faster, better response → Logs positive interaction
```

### Workflow 3: Department-Specific

```
Finance dept user → LLM loads finance-specific context
→ Prioritizes cost/aid queries
→ Returns relevant metrics → Logs by department
```

## Technology Stack

- **Database**: SQL Server 2019
- **Containerization**: Docker + Docker Compose
- **Database connectivity**: Python with pyodbc
- **Data format**: CSV for synthetic data
- **LLM**: to be specified (GPT-4, Claude, etc.)
- **Visualization**: to be specified (matplotlib, plotly, etc.)
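The database half of this stack can be brought up with a compose file along these lines. This is a sketch, not the project's actual configuration: the image tag, password, and port mapping are placeholder values.

```yaml
services:
  sqlserver:
    image: mcr.microsoft.com/mssql/server:2019-latest
    environment:
      ACCEPT_EULA: "Y"
      MSSQL_SA_PASSWORD: "ChangeMe_Str0ng!"   # placeholder; use a secret in practice
    ports:
      - "1433:1433"
    volumes:
      - mssql-data:/var/opt/mssql

volumes:
  mssql-data:
```

With the container running, a Python client would connect via pyodbc using the host port published above.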