This system provides an LLM-powered analytics platform that queries educational data, generates insights, and learns from user interactions through a feedback loop.
```mermaid
graph TB
    subgraph "Data Layer"
        META[Metadata<br/>ar_data_dictionary.csv]
        CSV1[Synthetic Data<br/>ar_courses.csv]
        CSV2[ar_financial_aid.csv]
        CSV3[ar_cohorts.csv]
    end

    subgraph "Database Layer"
        SQL[(SQL Server<br/>InsightAlchemists DB)]
        T1[ar_courses<br/>90k+ rows]
        T2[ar_financial_aid<br/>300+ rows]
        T3[ar_cohorts]
        T4[llm_debug_logs]
        T5[weekly_trends]
        SQL --> T1
        SQL --> T2
        SQL --> T3
        SQL --> T4
        SQL --> T5
    end

    subgraph "LLM Application Layer"
        LLM[LLM Engine]
        PROMPT[Prompt Library<br/>- Query templates<br/>- Analysis patterns<br/>- Viz specifications]
        QUERY[Query Generator]
        VIZ[Visualization Engine<br/>- Charts<br/>- Tables<br/>- Dashboards]
        LLM <--> PROMPT
        LLM --> QUERY
        LLM --> VIZ
    end

    subgraph "Analytics & Learning Layer"
        LOGS[Conversation Logger]
        TRENDS[Trend Analyzer<br/>- Topic extraction<br/>- Weekly aggregation]
        FEEDBACK[Feedback Processor]
    end

    subgraph "User Interface"
        USER[User]
        DEPT[User Department<br/>Context]
    end

    %% Data flow
    META -.->|Schema info| LLM
    CSV1 -.->|Initial load| SQL
    CSV2 -.->|Initial load| SQL
    CSV3 -.->|Initial load| SQL
    USER -->|Ask question| LLM
    DEPT -.->|Context| LLM
    LLM -->|Generate SQL| QUERY
    QUERY -->|Execute query| SQL
    SQL -->|Return data| LLM
    LLM -->|Generate viz| VIZ
    VIZ -->|Display| USER

    %% Logging & feedback loop
    LLM -->|Log interaction| LOGS
    LOGS -->|Store| T4
    USER -->|Provide feedback| FEEDBACK
    FEEDBACK -->|Store| T4
    T4 -.->|Extract topics| TRENDS
    TRENDS -->|Aggregate| T5

    %% Learning loop
    T4 -.->|Historical context| PROMPT
    T5 -.->|Popular topics| PROMPT
    PROMPT -.->|Improved prompts| LLM

    style LLM fill:#4A90E2
    style SQL fill:#50C878
    style USER fill:#FFB347
    style T4 fill:#FF6B6B
    style T5 fill:#FF6B6B
```
```mermaid
sequenceDiagram
    participant U as User
    participant L as LLM
    participant P as Prompt Library
    participant Q as Query Generator
    participant S as SQL Server
    participant V as Viz Engine

    U->>L: Ask: "What's the average GPA by race?"
    L->>P: Retrieve relevant prompt template
    P-->>L: Query pattern + context
    L->>Q: Generate SQL query
    Q->>S: Execute: SELECT Race, AVG(GPA)...
    S-->>Q: Return results
    Q-->>L: Data payload
    L->>V: Generate visualization spec
    V-->>U: Display chart + insights
```
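The query-generation step in the sequence above can be sketched in Python. This is a minimal, hypothetical template lookup — the function name `generate_sql`, the keyword-matching logic, and the column names are illustrative, not the actual implementation:

```python
# Hypothetical sketch: match a natural-language question against pre-built
# query templates (as held by the Prompt Library). Keys are keyword tuples;
# a miss would fall through to free-form LLM generation.
QUERY_TEMPLATES = {
    ("average", "gpa", "race"):
        "SELECT Race, AVG(GPA) AS avg_gpa FROM ar_courses GROUP BY Race",
    ("enrollment", "course"):
        "SELECT CourseID, COUNT(*) AS enrolled FROM ar_courses GROUP BY CourseID",
}

def generate_sql(question: str):
    """Return the first template whose keywords all appear in the question."""
    words = question.lower()
    for keywords, sql in QUERY_TEMPLATES.items():
        if all(k in words for k in keywords):
            return sql
    return None  # no template matched; defer to the LLM

sql = generate_sql("What's the average GPA by race?")
```

In practice the LLM would also consult `ar_data_dictionary.csv` for column mapping rather than relying on hard-coded templates alone.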
```mermaid
sequenceDiagram
    participant U as User
    participant L as LLM
    participant LOG as Logger
    participant DB as llm_debug_logs
    participant T as Trend Analyzer
    participant WT as weekly_trends

    L->>LOG: Log interaction
    U->>LOG: Submit feedback
    LOG->>DB: INSERT (timestamp, user, logs, dept, feedback)
    Note over T: Runs periodically
    T->>DB: Query logs from past week
    DB-->>T: Return conversations
    T->>T: Extract topics & count
    T->>WT: INSERT/UPDATE (topic, count, week_dates)
    Note over L: Next session
    L->>WT: Query popular topics
    WT-->>L: Trending questions
    L->>L: Adjust prompts & suggestions
```
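The "extract topics & count" step above can be sketched as follows. The keyword-based `extract_topic` is a stand-in for the real topic extraction; the row shape mirrors the `weekly_trends` schema described later in this document:

```python
# Illustrative weekly aggregation: count question topics from logged
# conversation summaries and emit weekly_trends-shaped rows.
from collections import Counter
from datetime import date, timedelta

TOPIC_KEYWORDS = {"gpa": "academic performance", "aid": "financial aid",
                  "retention": "retention"}

def extract_topic(summary: str) -> str:
    """Crude keyword lookup standing in for real topic extraction."""
    for kw, topic in TOPIC_KEYWORDS.items():
        if kw in summary.lower():
            return topic
    return "other"

def aggregate_week(summaries: list, week_start: date) -> list:
    """Build rows of (question_topic, count, week_start_date, week_end_date)."""
    counts = Counter(extract_topic(s) for s in summaries)
    week_end = week_start + timedelta(days=6)
    return [{"question_topic": t, "count": c,
             "week_start_date": week_start, "week_end_date": week_end}
            for t, c in counts.most_common()]

rows = aggregate_week(
    ["What's the average GPA by race?", "Show aid totals", "GPA trends"],
    week_start=date(2024, 1, 1),
)
```

Each row would then be INSERTed or UPDATEd in `weekly_trends` keyed on topic and week.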
```mermaid
graph LR
    A[User Question] --> B[LLM Processing]
    B --> C[Generate Query]
    C --> D[Execute & Return]
    D --> E[Log Interaction]
    E --> F[Store in llm_debug_logs]
    F --> G{User Feedback?}
    G -->|Yes| H[Update log with feedback]
    G -->|No| I[Track as answered]
    H --> J[Weekly Analysis]
    I --> J
    J --> K[Update weekly_trends]
    K --> L[Identify patterns]
    L --> M[Update Prompt Library]
    M --> B

    style B fill:#4A90E2
    style F fill:#FF6B6B
    style K fill:#FF6B6B
    style M fill:#50C878
```
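The log-then-update branch of this loop can be sketched with an in-memory list standing in for the `llm_debug_logs` table (the function names are illustrative):

```python
# Sketch of the log/feedback flow: every interaction gets a row immediately;
# feedback, if it arrives, updates that row later.
from datetime import datetime

logs = []  # in-memory stand-in for llm_debug_logs

def log_interaction(user: str, dept: str, summary: str) -> int:
    """Insert a row; returns its id (stand-in for the auto-increment key)."""
    logs.append({
        "id": len(logs) + 1,
        "timestamp": datetime.now(),
        "user": user,
        "user_department": dept,
        "summarized_logs": summary,
        "user_feedback": None,  # stays NULL if the user never responds
    })
    return logs[-1]["id"]

def attach_feedback(log_id: int, feedback: str) -> None:
    """Update an existing log row with the user's rating/comments."""
    logs[log_id - 1]["user_feedback"] = feedback

row_id = log_interaction("jdoe", "Finance", "Asked about aid distribution")
attach_feedback(row_id, "Helpful, correct numbers")
```

Rows where `user_feedback` remains NULL correspond to the "Track as answered" branch of the diagram.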
**ar_courses**
- Course-level enrollment data
- 90,411 rows
- Used for: enrollment analysis, course performance, student tracking

**ar_financial_aid**
- Financial aid information per student
- 300+ rows
- Used for: aid distribution analysis, cost analysis, need assessment

**ar_cohorts**
- Cohort-level student outcomes
- Used for: retention analysis, graduation tracking, student success metrics
**llm_debug_logs**
- id: INT (auto-increment)
- timestamp: DATETIME2 (conversation time)
- user: VARCHAR(255) (user identifier)
- summarized_logs: NVARCHAR(MAX) (conversation summary)
- user_department: VARCHAR(100) (user's department/role)
- user_feedback: NVARCHAR(1000) (user's rating/comments)

**weekly_trends**
- id: INT (auto-increment)
- question_topic: VARCHAR(255) (topic/category)
- count: INT (number of queries)
- week_start_date: DATE
- week_end_date: DATE
- created_at: DATETIME2
- updated_at: DATETIME2

**Prompt Library**
- Pre-built query templates for common questions
- Context-aware prompts based on user department
- Dynamic prompts that evolve based on weekly_trends
- Visualization specifications for different data types
**Query Generator**
- Translates natural language to SQL
- Uses metadata schema for accurate column mapping
- Validates queries before execution
- Handles complex joins and aggregations
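The validation step can be sketched as a read-only allow-list check. This is an assumed approach, not the system's actual validator; the table allow-list follows the schema described in this document:

```python
# Illustrative pre-execution validation: accept only SELECT statements
# that touch known tables, and reject anything with write/DDL keywords.
import re

ALLOWED_TABLES = {"ar_courses", "ar_financial_aid", "ar_cohorts", "weekly_trends"}
FORBIDDEN = re.compile(r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|EXEC)\b", re.IGNORECASE)

def validate_query(sql: str) -> bool:
    """True only for single read-only SELECTs against allowed tables."""
    if FORBIDDEN.search(sql):
        return False
    if not sql.lstrip().upper().startswith("SELECT"):
        return False
    # Collect table names after FROM/JOIN and check the allow-list.
    matches = re.findall(r"\bFROM\s+(\w+)|\bJOIN\s+(\w+)", sql, re.IGNORECASE)
    names = {n.lower() for pair in matches for n in pair if n}
    return bool(names) and names <= ALLOWED_TABLES
```

A production validator would likely use a real SQL parser rather than regexes, but the allow-list principle is the same.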
**Visualization Engine**
- Generates charts (bar, line, pie, scatter)
- Creates data tables with formatting
- Builds dashboards for multi-metric views
- Supports export to various formats
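One plausible way the engine could pick a chart type is from the shape of the returned columns; this heuristic (and the `choose_viz` name) is an assumption for illustration, not the documented behavior:

```python
# Hypothetical chart-type selection: infer a viz spec from column kinds.
def choose_viz(columns: list) -> dict:
    """columns: (name, kind) pairs; kind is 'categorical', 'numeric', or 'temporal'."""
    kinds = [k for _, k in columns]
    if "temporal" in kinds:
        chart = "line"            # time on an axis -> trend line
    elif kinds.count("numeric") >= 2:
        chart = "scatter"         # two measures -> correlation view
    elif kinds == ["categorical", "numeric"]:
        chart = "bar"             # one dimension, one measure
    else:
        chart = "table"           # fall back to a formatted table
    return {"chart": chart, "columns": [n for n, _ in columns]}

spec = choose_viz([("Race", "categorical"), ("avg_gpa", "numeric")])
```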
- Improved Query Accuracy: Learn from successful/failed queries
- Topic Prioritization: Surface most-asked questions to users
- Prompt Optimization: Refine templates based on user feedback
- Department-Specific Insights: Tailor responses by user role
- Trend Detection: Identify emerging questions and data needs
User asks → LLM checks weekly_trends for similar topics → Suggests refined question → Generates query → Returns viz → Logs interaction

User asks common question → LLM recognizes from trends → Uses optimized prompt template → Faster, better response → Logs positive interaction

Finance dept user → LLM loads finance-specific context → Prioritizes cost/aid queries → Returns relevant metrics → Logs by department
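The department-context scenario can be sketched as a simple lookup merged into the prompt. The department names, context strings, and `build_prompt` helper here are illustrative assumptions:

```python
# Sketch of department-aware prompting: prepend role-specific context
# (from the User Department node in the architecture) to the question.
DEPT_CONTEXT = {
    "Finance": "Prioritize cost and financial-aid metrics; default to ar_financial_aid.",
    "Registrar": "Prioritize enrollment and course metrics; default to ar_courses.",
}

def build_prompt(question: str, department: str) -> str:
    context = DEPT_CONTEXT.get(department, "No department context available.")
    return f"[context] {context}\n[question] {question}"

prompt = build_prompt("How much aid did we award last term?", "Finance")
```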
- Database: SQL Server 2019
- Container: Docker + Docker Compose
- Python: pyodbc for database connectivity
- Data Format: CSV for synthetic data
- LLM: (To be specified - GPT-4, Claude, etc.)
- Visualization: (To be specified - matplotlib, plotly, etc.)
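A minimal connectivity sketch for this stack, using pyodbc against the Dockerized SQL Server; the server address, driver version, and authentication mode are placeholders that will differ per deployment:

```python
# Build a SQL Server ODBC connection string; values here are placeholders.
def build_conn_str(server: str, database: str) -> str:
    return (
        "DRIVER={ODBC Driver 17 for SQL Server};"
        f"SERVER={server};DATABASE={database};Trusted_Connection=yes;"
    )

conn_str = build_conn_str("localhost,1433", "InsightAlchemists")

# With pyodbc installed and the container running:
# import pyodbc
# with pyodbc.connect(conn_str) as conn:
#     rows = conn.execute("SELECT TOP 5 * FROM ar_courses").fetchall()
```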