vllm-mlx Dashboard

Real-time monitoring dashboard for local LLM inference servers running on Apple Silicon. Tracks both llama.cpp and vllm-mlx backends from a single UI.

Features

- Global stats: aggregated online count, throughput, token totals, active/deferred requests, slot utilization, and GPU usage
- Per-server cards with live metrics:
  - llama.cpp: generation and prompt tok/s, active/deferred requests, slot status
  - vllm-mlx: uptime, running/waiting requests, completion and prompt tokens, Metal GPU memory (active/peak), KV-cache hit rate and utilization
- Sparkline charts per server: tok/s throughput and active-request history
- Throughput chart: real-time tok/s computed from token-count deltas (not averaged gauges)
- GPU chart: utilization % and power draw over time
- Auto-refresh via SWR polling (every 2 s for server data, every 5 s for GPU data)
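A minimal sketch of how the global stats could be rolled up from per-server data. The `ServerSnapshot` and `GlobalStats` types and the `aggregate` function are hypothetical, not the dashboard's actual code. The sketch covers only a subset of the listed stats (GPU usage is omitted) and assumes offline servers are excluded from the totals.

```typescript
// Hypothetical per-server snapshot; the dashboard's real types may differ.
interface ServerSnapshot {
  name: string;
  online: boolean;
  tokensPerSecond: number; // current generation throughput
  totalTokens: number; // cumulative tokens reported by the server
  activeRequests: number; // "running" for vllm-mlx
  deferredRequests: number; // "waiting" for vllm-mlx
  slotsBusy: number; // 0 for backends without slots
  slotsTotal: number;
}

interface GlobalStats {
  online: number;
  tokensPerSecond: number;
  totalTokens: number;
  activeRequests: number;
  deferredRequests: number;
  slotUtilization: number; // busy / total slots, in [0, 1]
}

// Sums metrics across online servers only.
function aggregate(servers: ServerSnapshot[]): GlobalStats {
  const live = servers.filter((s) => s.online);
  const sum = (pick: (s: ServerSnapshot) => number): number =>
    live.reduce((acc, s) => acc + pick(s), 0);
  const slotsTotal = sum((s) => s.slotsTotal);
  return {
    online: live.length,
    tokensPerSecond: sum((s) => s.tokensPerSecond),
    totalTokens: sum((s) => s.totalTokens),
    activeRequests: sum((s) => s.activeRequests),
    deferredRequests: sum((s) => s.deferredRequests),
    // Avoid dividing by zero when no online server exposes slots.
    slotUtilization: slotsTotal > 0 ? sum((s) => s.slotsBusy) / slotsTotal : 0,
  };
}
```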

Monitored Servers

| Server | Port | Framework |
|---|---|---|
| GPT-OSS-20B | 1235 | llama.cpp |
| Qwen3-VL-8B | 1236 | llama.cpp |
| Qwen3-30B | 1238 | llama.cpp |
| Qwen3-Next-80B-MLX | 1239 | vllm-mlx |
| GPT-OSS-20B-MLX | 1240 | vllm-mlx |

The server list is configured in `src/lib/server-config.ts`.
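For illustration, a typed version of the table above might look like the following. The `ServerConfig` shape, the `SERVERS` constant, and the `baseUrl` helper are assumptions about what `src/lib/server-config.ts` could contain; the default `127.0.0.1` host assumes every server runs on the same machine as the dashboard.

```typescript
// Hypothetical config shape; the real file may use different field names.
type Framework = "llama.cpp" | "vllm-mlx";

interface ServerConfig {
  name: string;
  port: number;
  framework: Framework; // selects which endpoints the API route polls
}

const SERVERS: ServerConfig[] = [
  { name: "GPT-OSS-20B", port: 1235, framework: "llama.cpp" },
  { name: "Qwen3-VL-8B", port: 1236, framework: "llama.cpp" },
  { name: "Qwen3-30B", port: 1238, framework: "llama.cpp" },
  { name: "Qwen3-Next-80B-MLX", port: 1239, framework: "vllm-mlx" },
  { name: "GPT-OSS-20B-MLX", port: 1240, framework: "vllm-mlx" },
];

// Assumes servers listen locally; pass a different host if they do not.
function baseUrl(server: ServerConfig, host = "127.0.0.1"): string {
  return `http://${host}:${server.port}`;
}
```

Adding a server would then mean appending one entry with its port and framework.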

Tech Stack

- Next.js 16 (App Router)
- React 19 + TypeScript
- Tailwind CSS 4 + shadcn/ui
- Recharts for time-series charts
- SWR for data fetching

Getting Started

npm install
npm run dev

The dashboard runs at http://localhost:3000.

API Endpoints

| Endpoint | Description |
|---|---|
| `GET /api/servers` | Aggregated status from all configured servers |
| `GET /api/gpu` | GPU utilization and power metrics |

How It Works

The Next.js API routes poll each inference server on every request:

- llama.cpp servers: fetch `/health`, `/metrics` (Prometheus format), and `/slots`
- vllm-mlx servers: fetch `/health` and `/v1/status` via `Promise.allSettled`, so the server still reports status if `/v1/status` is unavailable
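The two polling strategies can be sketched as follows, assuming a hypothetical injectable `Fetcher` so the logic can run without live servers. `parsePrometheus` is a deliberately simplified reader of the Prometheus text format (it drops labels and does not handle label values containing spaces), and `pollVllmMlx` shows the `Promise.allSettled` pattern: a failed `/v1/status` call yields `status: null` instead of rejecting the whole poll. Neither function is taken from the repository.

```typescript
// Hypothetical injectable fetcher: resolves with the response body as text.
// In a Next.js route handler it could wrap the global fetch().
type Fetcher = (url: string) => Promise<string>;

// Simplified reader for Prometheus text exposition (e.g. llama.cpp /metrics).
// Skips blank lines and # HELP / # TYPE comments, drops label sets, and keeps
// the last value seen per metric name. Label values with spaces are not handled.
function parsePrometheus(text: string): Record<string, number> {
  const metrics: Record<string, number> = {};
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    const [rawName, rawValue] = trimmed.split(/\s+/);
    const name = rawName.split("{")[0];
    const value = Number(rawValue);
    if (!Number.isNaN(value)) metrics[name] = value;
  }
  return metrics;
}

// Returns null instead of throwing when a body is not valid JSON.
function tryJson(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return null;
  }
}

interface VllmMlxResult {
  online: boolean; // true if /health responded
  status: unknown; // parsed /v1/status body, or null if unavailable
}

// Both requests run in parallel; allSettled never rejects, so a failing
// /v1/status only nulls out the status field.
async function pollVllmMlx(base: string, get: Fetcher): Promise<VllmMlxResult> {
  const [healthRes, statusRes] = await Promise.allSettled([
    get(`${base}/health`),
    get(`${base}/v1/status`),
  ]);
  return {
    online: healthRes.status === "fulfilled",
    status: statusRes.status === "fulfilled" ? tryJson(statusRes.value) : null,
  };
}
```

A real fetcher should reject on non-2xx responses, because `fetch` itself only rejects on network errors; otherwise an HTTP 404 from `/v1/status` would count as fulfilled.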

The frontend uses SWR to poll /api/servers every 2 seconds and computes real-time throughput from the delta of cumulative token counters between consecutive polls.
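A sketch of the delta calculation, assuming the client keeps the previous poll's cumulative counter and timestamp and divides by the measured time between polls rather than the nominal 2 s interval. `CounterSample` and `throughput` are hypothetical names, and the guard for a counter going backwards assumes counters reset when a server restarts.

```typescript
// One poll's cumulative counter reading; field names are hypothetical.
interface CounterSample {
  timestampMs: number; // when the poll completed
  completionTokens: number; // cumulative generated tokens
}

// Tokens per second between two consecutive polls. Returns 0 when there is
// no previous sample, the clock did not advance, or the counter decreased
// (e.g. the server restarted and its counters reset).
function throughput(prev: CounterSample | undefined, curr: CounterSample): number {
  if (!prev) return 0;
  const dtSeconds = (curr.timestampMs - prev.timestampMs) / 1000;
  const dTokens = curr.completionTokens - prev.completionTokens;
  if (dtSeconds <= 0 || dTokens < 0) return 0;
  return dTokens / dtSeconds;
}
```

This follows the same idea as Prometheus's `rate()` over a counter: the result is the average rate over the interval between samples, derived from monotonically increasing totals.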
