From b8f59875d9c6a6aaa2e039002a227cb75f346711 Mon Sep 17 00:00:00 2001
From: LyAhn
Date: Wed, 29 Apr 2026 08:33:43 +0100
Subject: [PATCH] docs: add AGENTS.md and update README with CPU/CUDA setup

---
 AGENTS.md | 129 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 README.md |  39 ++++++++++++-----
 2 files changed, 158 insertions(+), 10 deletions(-)
 create mode 100644 AGENTS.md

diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..ed18570
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,129 @@
+# VibePod — Agent Guide
+
+This file gives AI coding agents (Jules, Copilot, Claude Code, etc.) the context needed to work effectively on this repo without breaking the two-service setup.
+
+---
+
+## Project overview
+
+VibePod is a text-to-speech web app. It has two services that must both run for the app to work:
+
+| Service | Language | Entry point | Port |
+|---------|----------|-------------|------|
+| **server** | Python 3.10+ (FastAPI + VibeVoice) | `server/start.sh` | 8000 |
+| **web** | TypeScript (Next.js 15, React 19) | `pnpm --filter vibepod-web dev` | 3000 |
+
+The Next.js frontend proxies all model requests through its own API routes to the FastAPI server — it never calls the Python server directly from the browser.
+
+---
+
+## Environment (Jules sandbox)
+
+- **No GPU** — always use CPU mode (`pnpm dev:cpu` / `start.sh --cpu`)
+- Python venv lives at `server/.venv-cpu` — do **not** use `server/.venv`
+- The VibeVoice model (~1 GB) is pre-downloaded to `~/.cache/huggingface` during setup
+- Voice presets live at `server/voices/streaming_model/`
+- `server/uv.lock` is committed and must not be modified — if `uv sync` rewrites it, run `git checkout server/uv.lock`
+
+---
+
+## Running the app
+
+```bash
+# Full stack — CPU (correct for Jules)
+pnpm dev:cpu
+
+# Full stack — CUDA (local dev with GPU)
+pnpm dev
+
+# Individual services
+pnpm dev:server:cpu   # Python server, CPU only
+pnpm dev:server       # Python server, CUDA
+pnpm dev:web          # Next.js only
+
+# Production build
+pnpm build
+```
+
+---
+
+## Device selection
+
+The `--cpu` flag in `start.sh` sets `VIBEPOD_DEVICE=cpu` and uses a separate venv (`server/.venv-cpu`) so CUDA and CPU installs never conflict. `vibevoice_server.py` reads `VIBEPOD_DEVICE` at startup via `_resolve_device()` — do not remove or rename that function.
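Review note: the patch names `_resolve_device()` but does not show it. A minimal sketch of what such a helper could look like follows, assuming it only needs to read `VIBEPOD_DEVICE` and accept the two documented values. The function name `resolve_device`, the `env` parameter, the fallback to `cuda` when the variable is unset, and the error handling are all assumptions for illustration, not the project's code.

```python
import os
from typing import Mapping, Optional

# The two values the patch documents for VIBEPOD_DEVICE.
ALLOWED_DEVICES = ("cpu", "cuda")


def resolve_device(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical stand-in for _resolve_device(); not the project's code."""
    source = os.environ if env is None else env
    # Assumption: fall back to CUDA when the variable is unset,
    # mirroring plain `pnpm dev` being the CUDA path.
    raw = source.get("VIBEPOD_DEVICE", "cuda")
    device = raw.strip().lower()
    if device not in ALLOWED_DEVICES:
        raise ValueError(
            f"VIBEPOD_DEVICE must be one of {ALLOWED_DEVICES}, got {raw!r}"
        )
    return device
```

Passing the environment as a mapping keeps the helper testable without mutating `os.environ`. Whether the real function rejects unknown values or silently falls back is not stated in the patch.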
+
+| Env var | Values | Set by |
+|---------|--------|--------|
+| `VIBEPOD_DEVICE` | `cpu` \| `cuda` | `server/start.sh` |
+| `UV_PROJECT_ENVIRONMENT` | `.venv-cpu` \| `.venv` | `server/start.sh` |
+| `HF_TOKEN` | HuggingFace token | Jules secret / `.env.local` |
+| `VIBEVOICE_SERVER_URL` | `http://localhost:8000` | `.env.local` |
+
+---
+
+## Python environment rules
+
+- Python deps are managed by [uv](https://docs.astral.sh/uv/) — **never use pip directly**
+- Always `cd server` before running uv commands
+- Add a package: `uv add <package>`
+- Remove a package: `uv remove <package>`
+- Upgrade deps: `uv lock --upgrade`
+- The `[tool.uv.sources]` block in `pyproject.toml` points torch at the CUDA 12.4 index — `--no-sources` bypasses this for CPU installs
+
+---
+
+## Key files
+
+```
+server/
+├── vibevoice_server.py   FastAPI app — /health and /generate (SSE) endpoints
+├── download_model.py     Standalone model prefetch script
+├── start.sh              Startup: parses --cpu flag, syncs venv, launches uvicorn
+└── pyproject.toml        Python deps (torch CUDA index configured here)
+
+web/
+├── app/api/generate/     Proxies POST → Python server, streams SSE to browser
+├── app/api/health/       Proxies GET /health from Python server
+└── app/page.tsx          Main UI
+
+package.json              Root — defines all pnpm dev:* scripts
+dev.sh                    Concurrent launcher (forwards flags to start.sh)
+```
+
+---
+
+## API reference
+
+### `GET /health`
+Returns server status. Safe to poll.
+```json
+{
+  "status": "online",
+  "model": "microsoft/VibeVoice-Realtime-0.5B",
+  "device": "cpu",
+  "voices": ["carter", "davis", "emma", "frank", "grace", "mike"]
+}
+```
+`status` values: `downloading` | `loading` | `online` | `error`
+
+### `POST /generate`
+Streams audio as SSE events.
+```json
+{ "text": "Hello world", "speaker": "carter", "cfg_scale": 1.5, "inference_steps": 10 }
+```
+Event types: `audio_chunk` (base64 float32 PCM) | `complete` | `error` | `cancelled`
+
+---
+
+## Do / Don't
+
+**Do:**
+- Use `pnpm dev:cpu` in Jules — never plain `pnpm dev`
+- Run `git checkout server/uv.lock` if uv rewrites it during setup
+- Keep `_resolve_device()` in `vibevoice_server.py` — it's the CPU/CUDA switching logic
+- Test server changes against `GET /health` and `POST /generate`
+
+**Don't:**
+- Run `uv sync` without `UV_PROJECT_ENVIRONMENT=.venv-cpu` in the Jules sandbox
+- Install Python packages with pip
+- Modify `server/uv.lock` manually
+- Remove the `[tool.uv.sources]` torch entry from `pyproject.toml` — it's needed for CUDA installs
diff --git a/README.md b/README.md
index f60ef17..f8202f5 100644
--- a/README.md
+++ b/README.md
@@ -25,8 +25,8 @@ The Next.js app proxies audio generation requests to the FastAPI server, keeping
 
 ```bash
 # 1. Clone
-git clone https://github.com/LyAhn/VibePod.git
-cd VibePod
+git clone https://github.com/JezzWTF/vibepod.git
+cd vibepod
 
 # 2. Install Node dependencies (root + web workspace)
 pnpm install
@@ -35,22 +35,39 @@ pnpm install
 cp .env.example .env.local
 
 # 4. Start everything
-pnpm dev
+pnpm dev          # CUDA (requires NVIDIA GPU + driver >= 525.60)
+pnpm dev:cpu      # CPU-only (no GPU required)
 ```
 
-`pnpm dev` starts both services concurrently:
+`pnpm dev` / `pnpm dev:cpu` start both services concurrently:
 
-- **SERVER** — `http://localhost:8000` — on first run `uv sync` creates the Python venv and downloads the ~1 GB VibeVoice model from HuggingFace
+- **SERVER** — `http://localhost:8000` — on first run uv creates the Python venv and downloads the ~1 GB VibeVoice model from HuggingFace
 - **WEB** — `http://localhost:3000` — Next.js dev server with Turbopack
 
 The frontend shows a loading indicator while the model downloads. Once the server reports `status: online`, generation is available.
 
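Review note: the readiness check described above can be sketched as a small polling loop against the documented `GET /health` endpoint and its `status` values (`downloading`, `loading`, `online`, `error`). This is an illustration, not the frontend's code. The names `fetch_health` and `wait_until_online`, the timing defaults, and treating `error` as fatal are assumptions.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(base_url: str = "http://localhost:8000") -> dict:
    """GET /health on the Python server and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_online(
    fetch: Callable[[], dict] = fetch_health,
    interval: float = 2.0,
    max_attempts: int = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until status is "online"; raise on "error" or after max_attempts."""
    for _ in range(max_attempts):
        health = fetch()
        status = health.get("status")
        if status == "online":
            return health
        if status == "error":
            # Assumption: an "error" status is not recoverable by waiting.
            raise RuntimeError("server reported status 'error'")
        # "downloading" and "loading" mean the model is not ready yet.
        sleep(interval)
    raise TimeoutError("server did not report status 'online' in time")
```

The `fetch` and `sleep` parameters exist so the loop can be exercised without a running server. The sketch does not handle connection errors raised while the server process is still starting.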
+## CUDA vs CPU
+
+VibePod maintains two completely separate Python virtual environments so CUDA and CPU torch installs never conflict:
+
+| Mode | Command | venv | torch source |
+|------|---------|------|--------------|
+| CUDA (default) | `pnpm dev` | `server/.venv` | PyTorch CUDA 12.4 index |
+| CPU-only | `pnpm dev:cpu` | `server/.venv-cpu` | PyPI (CPU wheel) |
+
+On first run, each mode creates its own venv automatically. You can switch between them freely — they are fully independent. The active device is reported by the `/health` endpoint as `"device": "cpu"` or `"device": "cuda"`.
+
+> **CUDA requirement:** driver >= 525.60 (RTX 30/40 series all qualify). Run `nvidia-smi` to check.
+
 ## Individual commands
 
 ```bash
-pnpm dev:web # Next.js only
-pnpm dev:server # Python server only
-pnpm build # Production build of the frontend
+pnpm dev                # CUDA — server + web
+pnpm dev:cpu            # CPU — server + web
+pnpm dev:server         # CUDA — Python server only
+pnpm dev:server:cpu     # CPU — Python server only
+pnpm dev:web            # Next.js only (no Python server)
+pnpm build              # Production build of the frontend
 ```
 
 ## Environment variables
@@ -93,8 +110,8 @@ server/
 | Parameter | Range | Default | Effect |
 |-----------|-------|---------|--------|
 | `speaker` | `carter`, `davis`, `emma`, `frank`, `grace`, `mike` | `carter` | Voice preset used for the generated audio |
-| `cfg_scale` | 0.5 - 4.0 | 1.5 | Higher = more expressive guidance |
-| `inference_steps` | 5 - 20 | 10 | More steps = higher quality, slower generation |
+| `cfg_scale` | 0.5 – 4.0 | 1.5 | Higher = more expressive guidance |
+| `inference_steps` | 5 – 20 | 10 | More steps = higher quality, slower generation |
 
 ## How it works
 
@@ -115,3 +132,5 @@ cd server && uv add
 # Upgrade all dependencies
 cd server && uv lock --upgrade
 ```
+
+> **Note:** The `[tool.uv.sources]` block in `pyproject.toml` pulls torch from the PyTorch CUDA 12.4 index by default. Running with `--cpu` (or `uv sync --no-sources`) bypasses this and installs the standard PyPI CPU wheel instead.