Mirror of https://github.com/JezzWTF/vibepod.git (synced 2026-06-01 15:22:14 +00:00)
docs: add AGENTS.md and update README with CPU/CUDA setup
@@ -0,0 +1,129 @@
# VibePod: Agent Guide

This file gives AI coding agents (Jules, Copilot, Claude Code, etc.) the context needed to work effectively on this repo without breaking the two-service setup.


---

## Project overview

VibePod is a text-to-speech web app. It has two services that must both run for the app to work:

| Service | Language | Entry point | Port |
|---------|----------|-------------|------|
| **server** | Python 3.10+ (FastAPI + VibeVoice) | `server/start.sh` | 8000 |
| **web** | TypeScript (Next.js 15, React 19) | `pnpm --filter vibepod-web dev` | 3000 |

The Next.js frontend proxies all model requests through its own API routes to the FastAPI server. The browser never calls the Python server directly.


---

## Environment (Jules sandbox)

- **No GPU**: always use CPU mode (`pnpm dev:cpu` / `start.sh --cpu`)
- The Python venv lives at `server/.venv-cpu`; do **not** use `server/.venv`
- The VibeVoice model (~1 GB) is pre-downloaded to `~/.cache/huggingface` during setup
- Voice presets live at `server/voices/streaming_model/`
- `server/uv.lock` is committed and must not be modified. If `uv sync` rewrites it, run `git checkout server/uv.lock`


---

## Running the app

```bash
# Full stack, CPU (correct for Jules)
pnpm dev:cpu

# Full stack, CUDA (local dev with GPU)
pnpm dev

# Individual services
pnpm dev:server:cpu   # Python server, CPU only
pnpm dev:server       # Python server, CUDA
pnpm dev:web          # Next.js only

# Production build
pnpm build
```


---

## Device selection

The `--cpu` flag in `start.sh` sets `VIBEPOD_DEVICE=cpu` and uses a separate venv (`server/.venv-cpu`) so CUDA and CPU installs never conflict. `vibevoice_server.py` reads `VIBEPOD_DEVICE` at startup via `_resolve_device()`. Do not remove or rename that function.

| Env var | Values | Set by |
|---------|--------|--------|
| `VIBEPOD_DEVICE` | `cpu` \| `cuda` | `server/start.sh` |
| `UV_PROJECT_ENVIRONMENT` | `.venv-cpu` \| `.venv` | `server/start.sh` |
| `HF_TOKEN` | HuggingFace token | Jules secret / `.env.local` |
| `VIBEVOICE_SERVER_URL` | `http://localhost:8000` | `.env.local` |

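
To make the role of `VIBEPOD_DEVICE` concrete, here is a minimal sketch of what a device resolver of this kind might look like. It is not the code in `vibevoice_server.py`: the name `resolve_device_sketch`, the `cuda_available` parameter, and the fallback policy when the variable is unset are all assumptions made for illustration. In a real server the availability flag would most likely come from `torch.cuda.is_available()`.

```python
import os
from typing import Mapping, Optional

VALID_DEVICES = ("cpu", "cuda")  # the two values listed in the table above


def resolve_device_sketch(
    env: Optional[Mapping[str, str]] = None,
    cuda_available: bool = False,
) -> str:
    """Hypothetical stand-in for _resolve_device(); not the repo's code."""
    env = os.environ if env is None else env
    requested = env.get("VIBEPOD_DEVICE", "").strip().lower()

    if requested and requested not in VALID_DEVICES:
        raise ValueError(
            f"VIBEPOD_DEVICE must be one of {VALID_DEVICES}, got {requested!r}"
        )
    if requested == "cpu":
        return "cpu"
    if requested == "cuda":
        if not cuda_available:
            # Assumed policy: fail loudly instead of silently using the CPU.
            raise RuntimeError("VIBEPOD_DEVICE=cuda but no CUDA device is available")
        return "cuda"
    # Variable unset: assumed default is CUDA when present, otherwise CPU.
    return "cuda" if cuda_available else "cpu"
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.
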

---

## Python environment rules

- Python deps are managed by [uv](https://docs.astral.sh/uv/). **Never use pip directly**
- Always `cd server` before running uv commands
- Add a package: `uv add <package>`
- Remove a package: `uv remove <package>`
- Upgrade deps: `uv lock --upgrade`
- The `[tool.uv.sources]` block in `pyproject.toml` points torch at the CUDA 12.4 index; `--no-sources` bypasses this for CPU installs

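
The relationship between the two modes, the venv each one uses, and the `--no-sources` flag can be summarised in a few lines. The sketch below is illustrative only: `uv_sync_invocation` is a hypothetical helper, and `server/start.sh` remains the source of truth for the commands that actually run. It assumes the CUDA mode is a plain `uv sync` and the CPU mode adds `--no-sources`, which matches the rules listed above.

```python
from typing import Dict, List, Tuple


def uv_sync_invocation(mode: str) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, extra_env) for syncing the venv of the given mode.

    Hypothetical helper for illustration; start.sh defines the real behaviour.
    """
    if mode == "cpu":
        # --no-sources makes uv ignore [tool.uv.sources], so torch is not
        # pulled from the CUDA 12.4 index.
        return (
            ["uv", "sync", "--no-sources"],
            {"UV_PROJECT_ENVIRONMENT": ".venv-cpu", "VIBEPOD_DEVICE": "cpu"},
        )
    if mode == "cuda":
        return (
            ["uv", "sync"],
            {"UV_PROJECT_ENVIRONMENT": ".venv", "VIBEPOD_DEVICE": "cuda"},
        )
    raise ValueError(f"unknown mode: {mode!r}")
```

A caller would merge `extra_env` into `os.environ` and run `argv` with `subprocess.run(..., cwd="server", check=True)`, which respects the rule about running uv commands from the `server` directory.
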

---

## Key files

```
server/
├── vibevoice_server.py     FastAPI app: /health and /generate (SSE) endpoints
├── download_model.py       Standalone model prefetch script
├── start.sh                Startup: parses --cpu flag, syncs venv, launches uvicorn
└── pyproject.toml          Python deps (torch CUDA index configured here)

web/
├── app/api/generate/       Proxies POST → Python server, streams SSE to browser
├── app/api/health/         Proxies GET /health from Python server
└── app/page.tsx            Main UI

package.json                Root: defines all pnpm dev:* scripts
dev.sh                      Concurrent launcher (forwards flags to start.sh)
```


---

## API reference

### `GET /health`
Returns server status. Safe to poll.
```json
{
  "status": "online",
  "model": "microsoft/VibeVoice-Realtime-0.5B",
  "device": "cpu",
  "voices": ["carter", "davis", "emma", "frank", "grace", "mike"]
}
```
`status` values: `downloading` | `loading` | `online` | `error`

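
Because `/health` is safe to poll, a test script can wait for the model to finish loading before it calls `/generate`. The following is a minimal sketch under stated assumptions: the helper names, the timeout, and the polling interval are invented for illustration, and the URL assumes the default port from the table in the project overview. Connection errors while the server is still starting are not handled here.

```python
import json
import time
import urllib.request
from typing import Callable, Dict


def fetch_health(url: str = "http://localhost:8000/health") -> Dict:
    """GET /health and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def wait_until_online(
    fetch: Callable[[], Dict] = fetch_health,
    timeout_s: float = 600.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll until status is 'online'; raise on 'error' or on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        health = fetch()
        status = health.get("status")
        if status == "online":
            return health
        if status == "error":
            raise RuntimeError(f"server reported an error: {health}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"server still {status!r} after {timeout_s}s")
        sleep(interval_s)  # 'downloading' or 'loading': keep waiting
```

The `fetch` and `sleep` parameters are injectable so the loop can be exercised in a unit test without a running server.
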
### `POST /generate`
Streams audio as SSE events.
```json
{ "text": "Hello world", "speaker": "carter", "cfg_scale": 1.5, "inference_steps": 10 }
```

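
A client can range-check the request body before sending it. The sketch below uses the voice names from the `/health` example and the ranges and defaults from the README parameter table in this same commit (`cfg_scale` 0.5 to 4.0, `inference_steps` 5 to 20). The function name and the empty-text check are assumptions, and whether the server enforces the same limits is not stated in this guide.

```python
from typing import Any, Dict

VOICES = ("carter", "davis", "emma", "frank", "grace", "mike")


def build_generate_payload(
    text: str,
    speaker: str = "carter",
    cfg_scale: float = 1.5,
    inference_steps: int = 10,
) -> Dict[str, Any]:
    """Build a POST /generate body, rejecting out-of-range values early."""
    if not text.strip():
        # Assumed rule: an empty prompt is never useful to send.
        raise ValueError("text must not be empty")
    if speaker not in VOICES:
        raise ValueError(f"unknown speaker {speaker!r}; expected one of {VOICES}")
    if not 0.5 <= cfg_scale <= 4.0:
        raise ValueError("cfg_scale must be between 0.5 and 4.0")
    if not 5 <= inference_steps <= 20:
        raise ValueError("inference_steps must be between 5 and 20")
    return {
        "text": text,
        "speaker": speaker,
        "cfg_scale": cfg_scale,
        "inference_steps": inference_steps,
    }
```

Calling `build_generate_payload("Hello world")` reproduces the example body shown above.
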
Event types: `audio_chunk` (base64 float32 PCM) | `complete` | `error` | `cancelled`

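
When testing `/generate`, it helps to confirm that an `audio_chunk` really decodes to float samples. This is a minimal sketch that assumes the base64 string has already been pulled out of the event payload (the field name is not documented here) and that the samples are little-endian. The guide states neither the byte order nor the sample rate, so treat both as unknowns to verify against `vibevoice_server.py`.

```python
import base64
import struct
from typing import List


def decode_audio_chunk(b64_pcm: str) -> List[float]:
    """Decode a base64 string of float32 PCM samples into Python floats.

    Assumes little-endian samples; adjust the '<' prefix if the server differs.
    """
    raw = base64.b64decode(b64_pcm)
    if len(raw) % 4:
        raise ValueError("float32 PCM payload must be a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))
```

A quick sanity check is that decoded samples of normal speech fall roughly within the range -1.0 to 1.0; values far outside that range usually indicate a wrong byte order or sample format.
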

---

## Do / Don't

**Do:**
- Use `pnpm dev:cpu` in Jules, never plain `pnpm dev`
- Run `git checkout server/uv.lock` if uv rewrites it during setup
- Keep `_resolve_device()` in `vibevoice_server.py`; it is the CPU/CUDA switching logic
- Test server changes against `GET /health` and `POST /generate`

**Don't:**
- Run `uv sync` without `UV_PROJECT_ENVIRONMENT=.venv-cpu` in the Jules sandbox
- Install Python packages with pip
- Modify `server/uv.lock` manually
- Remove the `[tool.uv.sources]` torch entry from `pyproject.toml`; it is needed for CUDA installs

@@ -25,8 +25,8 @@ The Next.js app proxies audio generation requests to the FastAPI server, keeping

 ```bash
 # 1. Clone
-git clone https://github.com/LyAhn/VibePod.git
-cd VibePod
+git clone https://github.com/JezzWTF/vibepod.git
+cd vibepod

 # 2. Install Node dependencies (root + web workspace)
 pnpm install
@@ -35,22 +35,39 @@ pnpm install
 cp .env.example .env.local

 # 4. Start everything
-pnpm dev
+pnpm dev          # CUDA (requires NVIDIA GPU + driver >= 525.60)
+pnpm dev:cpu      # CPU-only (no GPU required)
 ```

-`pnpm dev` starts both services concurrently:
+`pnpm dev` / `pnpm dev:cpu` start both services concurrently:

-- **SERVER** - `http://localhost:8000` - on first run `uv sync` creates the Python venv and downloads the ~1 GB VibeVoice model from HuggingFace
+- **SERVER** - `http://localhost:8000` - on first run uv creates the Python venv and downloads the ~1 GB VibeVoice model from HuggingFace
 - **WEB** - `http://localhost:3000` - Next.js dev server with Turbopack

 The frontend shows a loading indicator while the model downloads. Once the server reports `status: online`, generation is available.

+## CUDA vs CPU
+
+VibePod maintains two completely separate Python virtual environments so CUDA and CPU torch installs never conflict:
+
+| Mode | Command | venv | torch source |
+|------|---------|------|--------------|
+| CUDA (default) | `pnpm dev` | `server/.venv` | PyTorch CUDA 12.4 index |
+| CPU-only | `pnpm dev:cpu` | `server/.venv-cpu` | PyPI (CPU wheel) |
+
+On first run, each mode creates its own venv automatically. You can switch between them freely because they are fully independent. The active device is reported by the `/health` endpoint as `"device": "cpu"` or `"device": "cuda"`.
+
+> **CUDA requirement:** driver >= 525.60 (RTX 30/40 series all qualify). Run `nvidia-smi` to check.
+
 ## Individual commands

 ```bash
-pnpm dev:web      # Next.js only
-pnpm dev:server   # Python server only
-pnpm build        # Production build of the frontend
+pnpm dev              # CUDA - server + web
+pnpm dev:cpu          # CPU - server + web
+pnpm dev:server       # CUDA - Python server only
+pnpm dev:server:cpu   # CPU - Python server only
+pnpm dev:web          # Next.js only (no Python server)
+pnpm build            # Production build of the frontend
 ```

 ## Environment variables
@@ -93,8 +110,8 @@ server/
 | Parameter | Range | Default | Effect |
 |-----------|-------|---------|--------|
 | `speaker` | `carter`, `davis`, `emma`, `frank`, `grace`, `mike` | `carter` | Voice preset used for the generated audio |
-| `cfg_scale` | 0.5 - 4.0 | 1.5 | Higher = more expressive guidance |
-| `inference_steps` | 5 - 20 | 10 | More steps = higher quality, slower generation |
+| `cfg_scale` | 0.5 – 4.0 | 1.5 | Higher = more expressive guidance |
+| `inference_steps` | 5 – 20 | 10 | More steps = higher quality, slower generation |

 ## How it works

@@ -115,3 +132,5 @@ cd server && uv add <package>
 # Upgrade all dependencies
 cd server && uv lock --upgrade
 ```
+
+> **Note:** The `[tool.uv.sources]` block in `pyproject.toml` pulls torch from the PyTorch CUDA 12.4 index by default. Running with `--cpu` (or `uv sync --no-sources`) bypasses this and installs the standard PyPI CPU wheel instead.