docs: add AGENTS.md and update README with CPU/CUDA setup

2026-04-29 08:33:43 +01:00
parent 59d3280cb5
commit b8f59875d9
2 changed files with 158 additions and 10 deletions
+29 -10
@@ -25,8 +25,8 @@ The Next.js app proxies audio generation requests to the FastAPI server, keeping
```bash
# 1. Clone
git clone https://github.com/LyAhn/VibePod.git
cd VibePod
git clone https://github.com/JezzWTF/vibepod.git
cd vibepod
# 2. Install Node dependencies (root + web workspace)
pnpm install
@@ -35,22 +35,39 @@ pnpm install
cp .env.example .env.local
# 4. Start everything
pnpm dev
pnpm dev # CUDA (requires NVIDIA GPU + driver >= 525.60)
pnpm dev:cpu # CPU-only (no GPU required)
```
`pnpm dev` starts both services concurrently:
`pnpm dev` / `pnpm dev:cpu` start both services concurrently:
- **SERVER** — `http://localhost:8000` — on first run `uv sync` creates the Python venv and downloads the ~1 GB VibeVoice model from HuggingFace
- **SERVER** — `http://localhost:8000` — on first run uv creates the Python venv and downloads the ~1 GB VibeVoice model from HuggingFace
- **WEB** — `http://localhost:3000` — Next.js dev server with Turbopack
The frontend shows a loading indicator while the model downloads. Once the server reports `status: online`, generation is available.
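A script can wait for the same readiness signal the frontend uses. The sketch below polls until the server reports `status: online`. It assumes the status is exposed as a `status` field in the JSON returned by `/health` on port 8000. The helper names (`fetch_health`, `wait_until_online`) are hypothetical and not part of the repository.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# Assumption: the readiness status is served as JSON from this URL.
HEALTH_URL = "http://localhost:8000/health"


def fetch_health(url: str = HEALTH_URL) -> dict:
    """Fetch and decode the JSON health payload from the server."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


def wait_until_online(
    fetch: Callable[[], dict] = fetch_health,
    attempts: int = 60,
    delay_seconds: float = 5.0,
) -> Optional[dict]:
    """Poll until the payload reports status "online".

    Returns the payload on success, or None if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            payload = fetch()
        except (OSError, ValueError):
            # Connection refused or invalid JSON: the server is still starting.
            payload = {}
        if payload.get("status") == "online":
            return payload
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return None
```

With the defaults this waits up to roughly five minutes, which may or may not be enough for the first-run model download on a slow connection; adjust `attempts` and `delay_seconds` as needed.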
## CUDA vs CPU
VibePod maintains two completely separate Python virtual environments so CUDA and CPU torch installs never conflict:
| Mode | Command | venv | torch source |
|------|---------|------|--------------|
| CUDA (default) | `pnpm dev` | `server/.venv` | PyTorch CUDA 12.4 index |
| CPU-only | `pnpm dev:cpu` | `server/.venv-cpu` | PyPI (CPU wheel) |
On first run, each mode creates its own venv automatically. You can switch between them freely — they are fully independent. The active device is reported by the `/health` endpoint as `"device": "cpu"` or `"device": "cuda"`.
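One way to picture the two-venv layout is as a small lookup from mode to venv directory and `uv sync` flags. This is a sketch only: the repository's actual scripts are not shown here, and the use of uv's `UV_PROJECT_ENVIRONMENT` variable to select the venv is an assumption. The `Mode` class and `sync_invocation` function are hypothetical names.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    venv: str          # venv directory, as listed in the table above
    extra_args: tuple  # extra `uv sync` flags for this mode


MODES = {
    "cuda": Mode(venv="server/.venv", extra_args=()),
    # --no-sources skips [tool.uv.sources], so torch comes from PyPI.
    "cpu": Mode(venv="server/.venv-cpu", extra_args=("--no-sources",)),
}


def sync_invocation(mode: str) -> tuple:
    """Return (argv, env) for a hypothetical per-mode `uv sync` call."""
    try:
        selected = MODES[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None
    env = dict(os.environ)
    # Assumption: the venv is chosen through UV_PROJECT_ENVIRONMENT.
    env["UV_PROJECT_ENVIRONMENT"] = selected.venv
    return ["uv", "sync", *selected.extra_args], env
```

Because each mode points at its own directory, syncing one never rewrites the packages installed for the other, which is the property the paragraph above describes.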
> **CUDA requirement:** driver >= 525.60 (RTX 30/40 series all qualify). Run `nvidia-smi` to check.
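The driver check can also be scripted. The sketch below reads the version from `nvidia-smi` and compares it numerically against the 525.60 minimum stated above. The function names are illustrative, and the comparison rule (component by component, as integers) is an assumption about how the requirement should be read.

```python
import subprocess
from typing import Tuple

MINIMUM_DRIVER = (525, 60)  # from the CUDA requirement note


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as "535.104.05" into (535, 104, 5)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_supported(
    version: str, minimum: Tuple[int, ...] = MINIMUM_DRIVER
) -> bool:
    """Compare versions numerically, component by component."""
    return parse_version(version) >= minimum


def installed_driver_version() -> str:
    """Ask nvidia-smi for the driver version of the first GPU."""
    output = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return output.splitlines()[0].strip()
```

Comparing integer tuples rather than strings matters here: as text, `"525.105"` sorts before `"525.60"`, but numerically it is the newer driver.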
## Individual commands
```bash
pnpm dev:web # Next.js only
pnpm dev:server # Python server only
pnpm build # Production build of the frontend
pnpm dev # CUDA — server + web
pnpm dev:cpu # CPU — server + web
pnpm dev:server # CUDA — Python server only
pnpm dev:server:cpu # CPU — Python server only
pnpm dev:web # Next.js only (no Python server)
pnpm build # Production build of the frontend
```
## Environment variables
@@ -93,8 +110,8 @@ server/
| Parameter | Range | Default | Effect |
|-----------|-------|---------|--------|
| `speaker` | `carter`, `davis`, `emma`, `frank`, `grace`, `mike` | `carter` | Voice preset used for the generated audio |
| `cfg_scale` | 0.5 - 4.0 | 1.5 | Higher = more expressive guidance |
| `inference_steps` | 5 - 20 | 10 | More steps = higher quality, slower generation |
| `cfg_scale` | 0.5 – 4.0 | 1.5 | Higher = more expressive guidance |
| `inference_steps` | 5 – 20 | 10 | More steps = higher quality, slower generation |
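The ranges and defaults in this table can be expressed as a small validated container. This is a minimal sketch, assuming a client wants to reject out-of-range values before sending a request. `GenerationParams` is a hypothetical name, and the server's own validation is not shown in this document.

```python
from dataclasses import dataclass

SPEAKERS = ("carter", "davis", "emma", "frank", "grace", "mike")


@dataclass(frozen=True)
class GenerationParams:
    """Generation parameters with the defaults from the table."""

    speaker: str = "carter"
    cfg_scale: float = 1.5
    inference_steps: int = 10

    def __post_init__(self) -> None:
        # Reject values outside the documented ranges.
        if self.speaker not in SPEAKERS:
            raise ValueError(f"speaker must be one of {SPEAKERS}")
        if not 0.5 <= self.cfg_scale <= 4.0:
            raise ValueError("cfg_scale must be between 0.5 and 4.0")
        if not 5 <= self.inference_steps <= 20:
            raise ValueError("inference_steps must be between 5 and 20")
```

For example, `GenerationParams(speaker="emma", inference_steps=20)` is accepted, while `GenerationParams(cfg_scale=5.0)` raises `ValueError`.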
## How it works
@@ -115,3 +132,5 @@ cd server && uv add <package>
# Upgrade all dependencies
cd server && uv lock --upgrade
```
> **Note:** The `[tool.uv.sources]` block in `pyproject.toml` pulls torch from the PyTorch CUDA 12.4 index by default. Running with `--cpu` (or `uv sync --no-sources`) bypasses this and installs the standard PyPI CPU wheel instead.