mirror of
https://github.com/JezzWTF/vibepod.git
synced 2026-06-01 15:22:14 +00:00
5b8b3a011d49eebfcd9ba67b598f2f89eac39e65
VibePod
A text-to-speech podcast generator powered by VibeVoice 0.5B. Paste a script, tune a couple of sliders, and get a WAV back.
Architecture
VibePod/
├── web/ Next.js 15 frontend (React 19, Tailwind CSS 4, TypeScript)
└── server/ FastAPI TTS backend (Python 3.10+, VibeVoice, UV)
The Next.js app proxies audio generation requests to the FastAPI server, keeping CORS out of the picture and the Python model off the browser.
Prerequisites
| Tool | Install |
|---|---|
| Node.js 20+ | winget install OpenJS.NodeJS.LTS |
| pnpm | npm i -g pnpm |
| Python 3.10+ | winget install Python.Python.3.13 |
| uv | winget install astral-sh.uv |
Getting started
# 1. Clone
git clone https://github.com/LyAhn/VibePod.git
cd VibePod
# 2. Install Node dependencies (root + web workspace)
pnpm install
# 3. Copy env file and fill in values
cp .env.example .env.local
# 4. Start everything
pnpm dev
pnpm dev starts both services concurrently:
- SERVER (http://localhost:8000): on first run, uv sync creates the Python venv and downloads the ~1 GB VibeVoice model from HuggingFace
- WEB (http://localhost:3000): Next.js dev server with Turbopack
The frontend shows a loading indicator while the model downloads. Once the server reports status: online, generation is available.
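The wait for status: online can also be scripted, for example before running an automated test. The sketch below is not part of the repository: the function names are hypothetical, and it assumes /health returns a JSON object with a "status" field holding loading, online, or error.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_status_http(url: str = "http://localhost:8000/health") -> str:
    """Read the status from the health endpoint (JSON shape is an assumption)."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return json.load(resp).get("status", "error")
    except OSError:
        # Connection refused usually means the server is still starting up.
        return "loading"


def wait_until_online(
    fetch_status: Callable[[], str] = fetch_status_http,
    timeout_s: float = 600.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the server is online.

    Returns True once the status is "online", False if the timeout elapses,
    and raises RuntimeError if the server reports "error".
    """
    waited = 0.0
    while waited <= timeout_s:
        status = fetch_status()
        if status == "online":
            return True
        if status == "error":
            raise RuntimeError("server reported status: error")
        sleep(interval_s)
        waited += interval_s
    return False
```

Calling wait_until_online() with no arguments polls the local server every two seconds. The generous default timeout leaves room for the first-run model download.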
Individual commands
pnpm dev:web # Next.js only
pnpm dev:server # Python server only
pnpm build # Production build of the frontend
Environment variables
Copy .env.example to .env.local and set:
| Variable | Default | Description |
|---|---|---|
| VIBEVOICE_SERVER_URL | http://localhost:8000 | URL the Next.js API routes use to reach the Python server |
| HF_TOKEN | (none) | HuggingFace token (required if the model repo is gated) |
| HF_HOME | (none) | Override the HuggingFace model cache directory |
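As an illustration of how these defaults resolve, here is a small Python sketch. It is not taken from the repository (in the project, VIBEVOICE_SERVER_URL is read by the Next.js API routes). The Settings type and load_settings helper are hypothetical names, and treating an empty string as unset is an assumption.

```python
import os
from typing import Mapping, NamedTuple, Optional


class Settings(NamedTuple):
    server_url: str
    hf_token: Optional[str]
    hf_home: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Resolve the three variables, applying the documented default URL."""
    # "or" makes an empty value fall back to the default (an assumption).
    server_url = env.get("VIBEVOICE_SERVER_URL") or "http://localhost:8000"
    return Settings(
        server_url=server_url.rstrip("/"),  # avoid "//" when joining paths
        hf_token=env.get("HF_TOKEN") or None,
        hf_home=env.get("HF_HOME") or None,
    )
```

Passing a plain dict instead of os.environ makes the helper easy to exercise in tests.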
Project structure
web/
├── app/
│ ├── api/generate/ Proxies POST requests to the Python server
│ ├── api/health/ Proxies health checks (status: loading | online | error)
│ ├── page.tsx Main UI — script input, controls, audio player
│ └── layout.tsx
├── components/
│ ├── Header.tsx
│ ├── TextInputPanel.tsx
│ ├── GenerationControls.tsx cfg_scale and inference_steps sliders
│ ├── AudioPlayer.tsx
│ └── StatusLog.tsx
└── hooks/
└── useAudioPlayer.ts
server/
├── vibevoice_server.py FastAPI app — /health and /generate endpoints
├── download_model.py One-shot HuggingFace model prefetch
├── start.sh Entry point: uv sync → model check → uvicorn
└── pyproject.toml Python deps managed by uv
Generation parameters
| Parameter | Range | Default | Effect |
|---|---|---|---|
| speaker | carter, davis, emma, frank, grace, mike | carter | Voice preset used for the generated audio |
| cfg_scale | 0.5 - 4.0 | 1.5 | Higher = more expressive guidance |
| inference_steps | 5 - 20 | 10 | More steps = higher quality, slower generation |
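The ranges and defaults in the table can be checked before a request is sent. The following is a minimal sketch, not the server's actual request model: the GenerationParams class and the text field name are assumptions, while speaker, cfg_scale, and inference_steps come from the table above.

```python
from dataclasses import asdict, dataclass

SPEAKERS = ("carter", "davis", "emma", "frank", "grace", "mike")


@dataclass(frozen=True)
class GenerationParams:
    text: str  # hypothetical field name for the script
    speaker: str = "carter"
    cfg_scale: float = 1.5
    inference_steps: int = 10

    def __post_init__(self) -> None:
        # Reject values outside the documented ranges.
        if not self.text.strip():
            raise ValueError("text must not be empty")
        if self.speaker not in SPEAKERS:
            raise ValueError(f"speaker must be one of {SPEAKERS}")
        if not 0.5 <= self.cfg_scale <= 4.0:
            raise ValueError("cfg_scale must be between 0.5 and 4.0")
        if not 5 <= self.inference_steps <= 20:
            raise ValueError("inference_steps must be between 5 and 20")


# Build a JSON-serializable payload using the documented defaults.
payload = asdict(GenerationParams(text="Hello and welcome to the show."))
```

Validating on construction means an out-of-range slider value fails early with a clear message instead of reaching the model.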
How it works
- The user pastes a script and hits Generate
- The Next.js /api/generate route forwards the request to FastAPI on port 8000
- FastAPI runs the text through the VibeVoice streaming processor and inference model
- Audio chunks stream back to the browser as SSE events containing base64 float32 PCM
- The browser plays the chunks live, assembles a WAV Blob, and loads it into the audio player
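The last two steps run in the browser, but the same decoding can be sketched in Python for scripting or debugging. This is an illustration under stated assumptions, not the project's code: each SSE data field is assumed to carry a bare base64 string of little-endian float32 samples, the audio is assumed to be mono at 24 kHz, and all function names are hypothetical.

```python
import base64
import io
import struct
import wave
from typing import Iterable, Iterator, List

SAMPLE_RATE = 24_000  # assumption: check the rate the server actually emits


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event (a blank line ends an event)."""
    buf: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            # The SSE format strips a single leading space after the colon.
            buf.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and buf:
            yield "\n".join(buf)
            buf = []
    if buf:  # lenient: also emit a final event that lacks a trailing blank line
        yield "\n".join(buf)


def decode_chunk(b64: str) -> List[float]:
    """Decode base64 little-endian float32 PCM into a list of samples."""
    raw = base64.b64decode(b64)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


def to_wav_bytes(samples: Iterable[float], sample_rate: int = SAMPLE_RATE) -> bytes:
    """Clamp samples to [-1, 1], convert to 16-bit PCM, and wrap in a mono WAV."""
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    out = io.BytesIO()
    with wave.open(out, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return out.getvalue()
```

Given the lines of a response body, extend one list with decode_chunk(payload) for each payload from iter_sse_data, then pass the list to to_wav_bytes. Converting to 16-bit PCM is a choice made for this sketch because Python's wave module writes integer PCM. The browser code may keep float32 samples instead.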
Python dependencies
Dependencies are managed by uv. The server/uv.lock file is committed, so installs are reproducible.
# Add a package
cd server && uv add <package>
# Upgrade all dependencies
cd server && uv lock --upgrade