# System Architecture

## Overview

This system provides an LLM-powered analytics platform that queries educational data, generates insights, and learns from user interactions through a feedback loop.

## Architecture Diagram

```mermaid
graph TB
    subgraph "Data Layer"
        META[Metadata<br/>ar_data_dictionary.csv]
        CSV1[Synthetic Data<br/>ar_courses.csv]
        CSV2[ar_financial_aid.csv]
        CSV3[ar_cohorts.csv]
    end

    subgraph "Database Layer"
        SQL[(SQL Server<br/>InsightAlchemists DB)]
        T1[ar_courses<br/>90k+ rows]
        T2[ar_financial_aid<br/>300+ rows]
        T3[ar_cohorts]
        T4[llm_debug_logs]
        T5[weekly_trends]

        SQL --> T1
        SQL --> T2
        SQL --> T3
        SQL --> T4
        SQL --> T5
    end

    subgraph "LLM Application Layer"
        LLM[LLM Engine]
        PROMPT[Prompt Library<br/>- Query templates<br/>- Analysis patterns<br/>- Viz specifications]
        QUERY[Query Generator]
        VIZ[Visualization Engine<br/>- Charts<br/>- Tables<br/>- Dashboards]

        LLM <--> PROMPT
        LLM --> QUERY
        LLM --> VIZ
    end

    subgraph "Analytics & Learning Layer"
        LOGS[Conversation Logger]
        TRENDS[Trend Analyzer<br/>- Topic extraction<br/>- Weekly aggregation]
        FEEDBACK[Feedback Processor]
    end

    subgraph "User Interface"
        USER[User]
        DEPT[User Department<br/>Context]
    end

    %% Data flow
    META -.->|Schema info| LLM
    CSV1 -.->|Initial load| SQL
    CSV2 -.->|Initial load| SQL
    CSV3 -.->|Initial load| SQL

    USER -->|Ask question| LLM
    DEPT -.->|Context| LLM

    LLM -->|Generate SQL| QUERY
    QUERY -->|Execute query| SQL
    SQL -->|Return data| LLM

    LLM -->|Generate viz| VIZ
    VIZ -->|Display| USER

    %% Logging & feedback loop
    LLM -->|Log interaction| LOGS
    LOGS -->|Store| T4
    USER -->|Provide feedback| FEEDBACK
    FEEDBACK -->|Store| T4

    T4 -.->|Extract topics| TRENDS
    TRENDS -->|Aggregate| T5

    %% Learning loop
    T4 -.->|Historical context| PROMPT
    T5 -.->|Popular topics| PROMPT
    PROMPT -.->|Improved prompts| LLM

    style LLM fill:#4A90E2
    style SQL fill:#50C878
    style USER fill:#FFB347
    style T4 fill:#FF6B6B
    style T5 fill:#FF6B6B
```

## Data Flow

### 1. Query Flow

```mermaid
sequenceDiagram
    participant U as User
    participant L as LLM
    participant P as Prompt Library
    participant Q as Query Generator
    participant S as SQL Server
    participant V as Viz Engine

    U->>L: Ask: "What's the average GPA by race?"
    L->>P: Retrieve relevant prompt template
    P-->>L: Query pattern + context
    L->>Q: Generate SQL query
    Q->>S: Execute: SELECT Race, AVG(GPA)...
    S-->>Q: Return results
    Q-->>L: Data payload
    L->>V: Generate visualization spec
    V-->>U: Display chart + insights
```
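The query flow can be sketched end to end in a few lines. This is a minimal illustration only: it uses Python's built-in `sqlite3` as a stand-in for SQL Server, and the template and column names are hypothetical. In the real system, the LLM fills the template from the Prompt Library and the query runs through pyodbc.

```python
import sqlite3

# Hypothetical query template, as the Prompt Library might store it.
TEMPLATE = ("SELECT {group_col}, ROUND(AVG({metric_col}), 2) AS avg_value "
            "FROM ar_cohorts GROUP BY {group_col}")

def answer_question(conn, group_col, metric_col):
    """Fill the template, execute it, and return rows for the viz engine."""
    sql = TEMPLATE.format(group_col=group_col, metric_col=metric_col)
    return conn.execute(sql).fetchall()

# In-memory stand-in for the ar_cohorts table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ar_cohorts (race TEXT, gpa REAL)")
conn.executemany("INSERT INTO ar_cohorts VALUES (?, ?)",
                 [("A", 3.2), ("A", 3.6), ("B", 2.9)])

rows = answer_question(conn, "race", "gpa")
print(dict(rows))
```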

### 2. Logging & Feedback Flow

```mermaid
sequenceDiagram
    participant U as User
    participant L as LLM
    participant LOG as Logger
    participant DB as llm_debug_logs
    participant T as Trend Analyzer
    participant WT as weekly_trends

    L->>LOG: Log interaction
    U->>LOG: Submit feedback
    LOG->>DB: INSERT (timestamp, user, logs, dept, feedback)

    Note over T: Runs periodically
    T->>DB: Query logs from past week
    DB-->>T: Return conversations
    T->>T: Extract topics & count
    T->>WT: INSERT/UPDATE (topic, count, week_dates)

    Note over L: Next session
    L->>WT: Query popular topics
    WT-->>L: Trending questions
    L->>L: Adjust prompts & suggestions
```
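The two logger steps, storing the interaction and attaching feedback to the same row later, can be sketched like so. Again `sqlite3` stands in for SQL Server, and the function names (`log_interaction`, `record_feedback`) are hypothetical, not the project's actual API.

```python
import sqlite3
from datetime import datetime, timezone

# In-memory stand-in mirroring the llm_debug_logs schema.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE llm_debug_logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT, user TEXT, summarized_logs TEXT,
    user_department TEXT, user_feedback TEXT)""")

def log_interaction(conn, user, summary, department):
    """Step 1: store the conversation summary; feedback starts empty."""
    cur = conn.execute(
        "INSERT INTO llm_debug_logs "
        "(timestamp, user, summarized_logs, user_department, user_feedback) "
        "VALUES (?, ?, ?, ?, NULL)",
        (datetime.now(timezone.utc).isoformat(), user, summary, department))
    return cur.lastrowid

def record_feedback(conn, log_id, feedback):
    """Step 2: feedback, when it arrives, updates the same row."""
    conn.execute("UPDATE llm_debug_logs SET user_feedback = ? WHERE id = ?",
                 (feedback, log_id))

log_id = log_interaction(conn, "jdoe", "Asked about GPA by race", "Institutional Research")
record_feedback(conn, log_id, "Helpful, 5/5")
row = conn.execute("SELECT user, user_feedback FROM llm_debug_logs WHERE id = ?",
                   (log_id,)).fetchone()
print(row)  # ('jdoe', 'Helpful, 5/5')
```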

### 3. Learning Loop

```mermaid
graph LR
    A[User Question] --> B[LLM Processing]
    B --> C[Generate Query]
    C --> D[Execute & Return]
    D --> E[Log Interaction]
    E --> F[Store in llm_debug_logs]

    F --> G{User Feedback?}
    G -->|Yes| H[Update log with feedback]
    G -->|No| I[Track as answered]

    H --> J[Weekly Analysis]
    I --> J

    J --> K[Update weekly_trends]
    K --> L[Identify patterns]
    L --> M[Update Prompt Library]
    M --> B

    style B fill:#4A90E2
    style F fill:#FF6B6B
    style K fill:#FF6B6B
    style M fill:#50C878
```

## Component Details

### Database Tables

#### ar_courses

- Course-level enrollment data
- 90,411 rows
- Used for: enrollment analysis, course performance, student tracking

#### ar_financial_aid

- Financial aid information per student
- 300+ rows
- Used for: aid distribution analysis, cost analysis, need assessment

#### ar_cohorts

- Cohort-level student outcomes
- Used for: retention analysis, graduation tracking, student success metrics

#### llm_debug_logs

```
- id: INT (auto-increment)
- timestamp: DATETIME2 (conversation time)
- user: VARCHAR(255) (user identifier)
- summarized_logs: NVARCHAR(MAX) (conversation summary)
- user_department: VARCHAR(100) (user's department/role)
- user_feedback: NVARCHAR(1000) (user's rating/comments)
```

#### weekly_trends

```
- id: INT (auto-increment)
- question_topic: VARCHAR(255) (topic/category)
- count: INT (number of queries)
- week_start_date: DATE
- week_end_date: DATE
- created_at: DATETIME2
- updated_at: DATETIME2
```
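The Trend Analyzer's weekly aggregation can be sketched from this schema. The keyword-to-topic map below is purely illustrative; the real analyzer could just as well use the LLM itself to categorize conversation summaries before counting.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical keyword-based topic extraction.
TOPIC_KEYWORDS = {"gpa": "academic performance",
                  "aid": "financial aid",
                  "retention": "retention"}

def extract_topic(summary):
    """Map a conversation summary to a coarse topic label."""
    for keyword, topic in TOPIC_KEYWORDS.items():
        if keyword in summary.lower():
            return topic
    return "other"

def aggregate_week(summaries, week_start):
    """Turn one week of summaries into weekly_trends rows."""
    counts = Counter(extract_topic(s) for s in summaries)
    week_end = week_start + timedelta(days=6)
    return [{"question_topic": topic, "count": n,
             "week_start_date": week_start, "week_end_date": week_end}
            for topic, n in counts.most_common()]

rows = aggregate_week(
    ["Average GPA by race", "Aid distribution by cohort", "GPA trends over time"],
    week_start=date(2024, 1, 1))
print(rows[0])
```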

### LLM Application Components

#### Prompt Library

- Pre-built query templates for common questions
- Context-aware prompts based on the user's department
- Dynamic prompts that evolve based on weekly_trends
- Visualization specifications for different data types
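Context-aware prompt assembly might look like the following sketch, where the base template is enriched with the user's department and the week's trending topics. The wording and function name are assumptions for illustration, not the library's actual contents.

```python
BASE_PROMPT = ("You are an analytics assistant for educational data. "
               "Answer using the ar_courses, ar_financial_aid, and ar_cohorts tables.")

def build_prompt(department, trending_topics):
    """Compose a system prompt from the base template plus runtime context."""
    parts = [BASE_PROMPT]
    if department:
        parts.append(f"The user works in the {department} department; "
                     "prioritize metrics relevant to them.")
    if trending_topics:
        parts.append("Frequently asked topics this week: "
                     + ", ".join(trending_topics) + ".")
    return " ".join(parts)

prompt = build_prompt("Finance", ["financial aid", "cost per credit"])
print(prompt)
```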

#### Query Generator

- Translates natural language to SQL
- Uses the metadata schema for accurate column mapping
- Validates queries before execution
- Handles complex joins and aggregations
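The validation step can be sketched as two pre-execution checks: the query must be read-only, and every referenced column must exist in the metadata schema. The whitelist below is hardcoded for illustration; in the real system it would come from ar_data_dictionary.csv.

```python
# Hypothetical schema whitelist (stand-in for ar_data_dictionary.csv).
KNOWN_COLUMNS = {"ar_cohorts": {"race", "gpa", "cohort_year", "retained"}}

def validate_query(sql, table, columns):
    """Reject non-SELECT statements and columns absent from the schema."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only read-only SELECT queries are allowed")
    unknown = set(columns) - KNOWN_COLUMNS.get(table, set())
    if unknown:
        raise ValueError(f"unknown columns for {table}: {sorted(unknown)}")
    return True

ok = validate_query("SELECT race, AVG(gpa) FROM ar_cohorts GROUP BY race",
                    "ar_cohorts", ["race", "gpa"])
```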

#### Visualization Engine

- Generates charts (bar, line, pie, scatter)
- Creates data tables with formatting
- Builds dashboards for multi-metric views
- Supports export to various formats
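One decision the engine has to make is which chart type fits a given result set. A minimal sketch of that choice, with heuristics that are illustrative rather than the engine's actual rules:

```python
def choose_chart(rows):
    """Pick a chart type from the shape of (label, value) result rows."""
    if not rows:
        return "table"
    first = rows[0]
    if len(first) != 2:
        return "table"      # wide results render best as a data table
    if isinstance(first[0], (int, float)):
        return "scatter"    # numeric x vs numeric y
    if len(rows) <= 6:
        return "pie"        # few categories: share-of-whole view
    return "bar"            # many categories: ranked comparison

print(choose_chart([("A", 3.4), ("B", 2.9)]))  # pie
```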

## Feedback Loop Benefits

1. **Improved query accuracy**: learn from successful and failed queries
2. **Topic prioritization**: surface the most-asked questions to users
3. **Prompt optimization**: refine templates based on user feedback
4. **Department-specific insights**: tailor responses by user role
5. **Trend detection**: identify emerging questions and data needs

## Example Workflows

### Workflow 1: First-Time User

```
User asks → LLM checks weekly_trends for similar topics
→ Suggests refined question → Generates query
→ Returns viz → Logs interaction
```

### Workflow 2: Repeat Question

```
User asks common question → LLM recognizes it from trends
→ Uses optimized prompt template
→ Faster, better response → Logs positive interaction
```

### Workflow 3: Department-Specific

```
Finance dept user → LLM loads finance-specific context
→ Prioritizes cost/aid queries
→ Returns relevant metrics → Logs by department
```

## Technology Stack

- **Database**: SQL Server 2019
- **Containerization**: Docker + Docker Compose
- **Database connectivity**: Python with pyodbc
- **Data format**: CSV for synthetic data
- **LLM**: to be specified (GPT-4, Claude, etc.)
- **Visualization**: to be specified (matplotlib, plotly, etc.)
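The database half of this stack can be brought up with a compose file along these lines. This is a sketch, not the project's actual configuration: the image tag, password, and port mapping are placeholder values.

```yaml
services:
  sqlserver:
    image: mcr.microsoft.com/mssql/server:2019-latest
    environment:
      ACCEPT_EULA: "Y"
      MSSQL_SA_PASSWORD: "ChangeMe_Str0ng!"   # placeholder; use a secret in practice
    ports:
      - "1433:1433"
    volumes:
      - mssql-data:/var/opt/mssql

volumes:
  mssql-data:
```

With the container running, a Python client would connect via pyodbc using the host port published above.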