VibePod — Agent Guide
This file gives AI coding agents (Jules, Copilot, Claude Code, etc.) the context needed to work effectively on this repo without breaking the two-service setup.
Project overview
VibePod is a text-to-speech web app. It has two services that must both run for the app to work:
| Service | Language | Entry point | Port |
|---|---|---|---|
| server | Python 3.10+ (FastAPI + VibeVoice) | `server/start.sh` | 8000 |
| web | TypeScript (Next.js 15, React 19) | `pnpm --filter vibepod-web dev` | 3000 |
The Next.js frontend proxies all model requests through its own API routes to the FastAPI server — it never calls the Python server directly from the browser.
Environment (Jules sandbox)
- No GPU: always use CPU mode (`pnpm dev:cpu` / `start.sh --cpu`)
- Python venv lives at `server/.venv-cpu`; do not use `server/.venv`
- The VibeVoice model (~1 GB) is pre-downloaded to `~/.cache/huggingface` during setup
- Voice presets live at `server/voices/streaming_model/`
- `server/uv.lock` is committed and must not be modified. If `uv sync` rewrites it, run `git checkout server/uv.lock`
Running the app
```bash
# Full stack, CPU (correct for Jules)
pnpm dev:cpu

# Full stack, CUDA (local dev with GPU)
pnpm dev

# Individual services
pnpm dev:server:cpu   # Python server, CPU only
pnpm dev:server       # Python server, CUDA
pnpm dev:web          # Next.js only

# Production build
pnpm build
```
Device selection
The `--cpu` flag in `start.sh` sets `VIBEPOD_DEVICE=cpu` and uses a separate venv (`server/.venv-cpu`), so CUDA and CPU installs never conflict. `vibevoice_server.py` reads `VIBEPOD_DEVICE` at startup via `_resolve_device()`; do not remove or rename that function.
| Env var | Values | Set by |
|---|---|---|
| `VIBEPOD_DEVICE` | `cpu` / `cuda` | `server/start.sh` |
| `UV_PROJECT_ENVIRONMENT` | `.venv-cpu` / `.venv` | `server/start.sh` |
| `HF_TOKEN` | HuggingFace token | Jules secret / `.env.local` |
| `VIBEVOICE_SERVER_URL` | `http://localhost:8000` | `.env.local` |
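The sketch below shows the kind of logic `_resolve_device()` performs. It is for orientation only and is not the code in `vibevoice_server.py`. Several details are assumptions:

- the `resolve_device` name and its parameters;
- the default when the variable is unset;
- the fallback rule, where a `cuda` request on a machine without a GPU drops to CPU.

GPU availability is passed in as a flag so the sketch runs without torch installed.

```python
import os
from typing import Mapping, Optional

VALID_DEVICES = ("cpu", "cuda")


def resolve_device(env: Optional[Mapping[str, str]] = None,
                   cuda_available: bool = False) -> str:
    """Hypothetical stand-in for _resolve_device() in vibevoice_server.py.

    Reads VIBEPOD_DEVICE (exported by server/start.sh) and returns "cpu" or "cuda".
    ASSUMPTION: unset means "use the GPU if there is one", and a "cuda"
    request without a GPU falls back to "cpu" instead of failing.
    """
    if env is None:
        env = os.environ
    requested = env.get("VIBEPOD_DEVICE", "").strip().lower()
    if requested and requested not in VALID_DEVICES:
        raise ValueError(
            f"VIBEPOD_DEVICE must be one of {VALID_DEVICES}, got {requested!r}")
    if requested == "cpu":
        # start.sh --cpu path: never touch CUDA, whatever the hardware.
        return "cpu"
    # "cuda" or unset: only use the GPU if one is actually present.
    return "cuda" if cuda_available else "cpu"
```

With `start.sh --cpu`, `VIBEPOD_DEVICE` is `cpu`, so the first branch wins regardless of hardware. That is the behaviour the Jules sandbox relies on. In a real server, `cuda_available` would presumably come from `torch.cuda.is_available()`.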
Python environment rules
- Python deps are managed by uv; never use pip directly
- Always `cd server` before running uv commands
- Add a package: `uv add <package>`
- Remove a package: `uv remove <package>`
- Upgrade deps: `uv lock --upgrade`
- The `[tool.uv.sources]` block in `pyproject.toml` points torch at the CUDA 12.4 index; `--no-sources` bypasses this for CPU installs
Key files
```
server/
├── vibevoice_server.py   FastAPI app: /health and /generate (SSE) endpoints
├── download_model.py     Standalone model prefetch script
├── start.sh              Startup: parses --cpu flag, syncs venv, launches uvicorn
└── pyproject.toml        Python deps (torch CUDA index configured here)
web/
├── app/api/generate/     Proxies POST → Python server, streams SSE to browser
├── app/api/health/       Proxies GET /health from Python server
└── app/page.tsx          Main UI
package.json              Root: defines all pnpm dev:* scripts
dev.sh                    Concurrent launcher (forwards flags to start.sh)
```
API reference
GET /health
Returns server status. Safe to poll.
```json
{
  "status": "online",
  "model": "microsoft/VibeVoice-Realtime-0.5B",
  "device": "cpu",
  "voices": ["carter", "davis", "emma", "frank", "grace", "mike"]
}
```
`status` values: `downloading` | `loading` | `online` | `error`
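`/health` is safe to poll and reports whether the model is still `downloading` or `loading`. A script or agent can therefore wait for `online` before calling `/generate`.

Below is a minimal stdlib-only sketch. It assumes the server listens at `http://localhost:8000`, the `VIBEVOICE_SERVER_URL` value from the environment table. The function names, poll interval and timeout are illustrative choices, not part of the repo.

```python
import json
import time
import urllib.request
from typing import Callable, Dict

SERVER_URL = "http://localhost:8000"  # assumption: default VIBEVOICE_SERVER_URL


def fetch_health(base_url: str = SERVER_URL) -> Dict:
    """GET /health and decode its JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.load(resp)


def wait_until_online(fetch: Callable[[], Dict] = fetch_health,
                      interval: float = 2.0,
                      timeout: float = 600.0,
                      sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll /health until status is "online"; fail fast on "error".

    "downloading" and "loading" are treated as transient. fetch and sleep are
    injectable so the loop can be exercised without a running server.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            health = fetch()
        except OSError:
            # Port not open yet (connection refused, URLError, socket timeout).
            # "unreachable" is a local placeholder, not a server status.
            health = {"status": "unreachable"}
        status = health.get("status")
        if status == "online":
            return health
        if status == "error":
            raise RuntimeError(f"server reported error: {health}")
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"server not online after {timeout}s (last status: {status})")
        sleep(interval)
```

The generous default timeout reflects that the first run may still be downloading the ~1 GB model. In the Jules sandbox the model is pre-downloaded, so `online` should normally arrive much sooner.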
POST /generate
Streams audio as SSE events. Request body:
```json
{ "text": "Hello world", "speaker": "carter", "cfg_scale": 1.5, "inference_steps": 10 }
```
Event types: `audio_chunk` (base64 float32 PCM) | `complete` | `error` | `cancelled`
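The following stdlib-only sketch is a client for this endpoint. The function names are hypothetical, and it rests on three assumptions about the wire format:

- the event name arrives in the SSE `event:` field;
- each `audio_chunk` carries the bare base64 string as its `data:` payload;
- samples are little-endian float32.

If the server wraps payloads in JSON instead, `collect_audio` would need to unpack them first.

```python
import base64
import json
import sys
import urllib.request
from array import array
from typing import Iterable, Iterator, Tuple


def iter_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from text/event-stream lines.

    Simplified SSE framing: "event:" sets the type, "data:" lines are joined
    with newlines, a blank line dispatches. "id"/"retry" fields are ignored.
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
    # As in the SSE spec, an event not terminated by a blank line is discarded.


def decode_pcm_f32(b64: str) -> array:
    """Decode one audio_chunk payload into float32 samples.

    ASSUMPTION: samples are little-endian float32.
    """
    samples = array("f")
    samples.frombytes(base64.b64decode(b64))
    if sys.byteorder == "big":
        samples.byteswap()
    return samples


def collect_audio(events: Iterable[Tuple[str, str]]) -> array:
    """Concatenate audio_chunk samples until "complete"; raise on error/cancelled.

    ASSUMPTION: the event name is in the SSE "event:" field and audio_chunk
    data is the bare base64 string (not JSON-wrapped).
    """
    audio = array("f")
    for event, data in events:
        if event == "audio_chunk":
            audio.extend(decode_pcm_f32(data))
        elif event == "complete":
            return audio
        elif event in ("error", "cancelled"):
            raise RuntimeError(f"generation {event}: {data}")
    raise RuntimeError("stream ended without a complete event")


def stream_generate(text: str, speaker: str = "carter",
                    base_url: str = "http://localhost:8000") -> array:
    """POST /generate with the example request body and return all samples."""
    body = json.dumps({"text": text, "speaker": speaker,
                       "cfg_scale": 1.5, "inference_steps": 10}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/generate", data=body, method="POST",
        headers={"Content-Type": "application/json",
                 "Accept": "text/event-stream"})
    with urllib.request.urlopen(req) as resp:
        # HTTPResponse iterates line by line, yielding bytes.
        return collect_audio(iter_sse(raw.decode("utf-8") for raw in resp))
```

Keeping framing (`iter_sse`), decoding (`decode_pcm_f32`) and event handling (`collect_audio`) separate makes it easy to adjust a single layer once the real payload shape is confirmed.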
Do / Don't
Do:
- Use `pnpm dev:cpu` in Jules; never plain `pnpm dev`
- Run `git checkout server/uv.lock` if uv rewrites it during setup
- Keep `_resolve_device()` in `vibevoice_server.py`; it's the CPU/CUDA switching logic
- Test server changes against `GET /health` and `POST /generate`
Don't:
- Run `uv sync` without `UV_PROJECT_ENVIRONMENT=.venv-cpu` in the Jules sandbox
- Install Python packages with pip
- Modify `server/uv.lock` manually
- Remove the `[tool.uv.sources]` torch entry from `pyproject.toml`; it's needed for CUDA installs