mirror of https://github.com/JezzWTF/vibepod.git synced 2026-07-31 13:07:06 +00:00


Claude 5c5d739bf1 Fix ROCm torch wheel not replacing CPU torch

uv pip install without --reinstall-package silently skips the ROCm wheel
when CPU torch already satisfies torch>=2.0.0, leaving a CPU installation
in .venv-rocm and causing a broken import at startup.

https://claude.ai/code/session_0168pSswiaoEf6LEx6UQWfBu

2026-05-04 09:33:12 +00:00

server

Fix ROCm torch wheel not replacing CPU torch

2026-05-04 09:33:12 +00:00

web

Improve code documentation and maintainer notes

2026-05-02 16:44:38 +00:00

.editorconfig

chore: add prettier + enforce LF line endings

2026-05-01 18:36:04 +01:00

.env.example

perf(cpu): tune streaming playback

2026-04-30 23:20:46 +01:00

.gitattributes

chore: add prettier + enforce LF line endings

2026-05-01 18:36:04 +01:00

.gitignore

docs: update roadmap and ignore Claude settings

2026-04-30 23:32:20 +01:00

.prettierignore

chore: add prettier + enforce LF line endings

2026-05-01 18:36:04 +01:00

.prettierrc

chore: add prettier + enforce LF line endings

2026-05-01 18:36:04 +01:00

AGENTS.md

style: apply prettier formatting across all source files

2026-05-01 18:36:42 +01:00

DESIGN.md

style: apply prettier formatting across all source files

2026-05-01 18:36:42 +01:00

dev.sh

feat: pass --cpu flag through dev.sh to server/start.sh

2026-04-28 14:17:59 +01:00

package.json

Add AMD ROCm GPU support

2026-05-04 01:54:57 +00:00

pnpm-lock.yaml

chore: add prettier + enforce LF line endings

2026-05-01 18:36:04 +01:00

pnpm-workspace.yaml

style: apply prettier formatting across all source files

2026-05-01 18:36:42 +01:00

README.md

Add AMD ROCm GPU support

2026-05-04 01:54:57 +00:00

roadmap.md

docs: update roadmap and ignore Claude settings

2026-04-30 23:32:20 +01:00

README.md

VibePod

A text-to-speech podcast generator powered by VibeVoice 0.5B. Paste a script, tune a couple of sliders, and get a WAV back.

Architecture

VibePod/
├── web/        Next.js 15 frontend (React 19, Tailwind CSS 4, TypeScript)
└── server/     FastAPI TTS backend  (Python 3.10+, VibeVoice, UV)

The Next.js app proxies audio generation requests to the FastAPI server, keeping CORS out of the picture and the Python model off the browser.

Prerequisites

Tool	Install
Node.js 20+	`winget install OpenJS.NodeJS.LTS`
pnpm	`npm i -g pnpm`
Python 3.10+	`winget install Python.Python.3.13`
uv	`winget install astral-sh.uv`

Getting started

# 1. Clone
git clone https://github.com/JezzWTF/vibepod.git
cd vibepod

# 2. Install Node dependencies (root + web workspace)
pnpm install

# 3. Copy env file and fill in values
cp .env.example .env.local

# 4. Start everything
pnpm dev          # CUDA (requires NVIDIA GPU + driver >= 525.60)
pnpm dev:cpu      # CPU-only (no GPU required)
pnpm dev:rocm     # ROCm (requires AMD GPU + ROCm 6.2+, Linux only)

pnpm dev, pnpm dev:cpu, and pnpm dev:rocm each start both services concurrently:

SERVER — http://localhost:8000 — on first run uv creates the Python venv and downloads the ~1 GB VibeVoice model from HuggingFace
WEB — http://localhost:3000 — Next.js dev server with Turbopack

The frontend shows a loading indicator while the model downloads. Once the server reports status: online, generation is available.
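The wait-for-model behaviour described above can be sketched as a small polling loop. This is a minimal illustration, not the frontend's actual code: it assumes /health returns JSON with a "status" field taking the values loading, online, or error, and the function names, polling interval, and timeout are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_status_http(base_url: str = "http://localhost:8000") -> str:
    """Read the status field from /health (JSON shape is an assumption)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return json.load(resp).get("status", "loading")
    except (urllib.error.URLError, OSError):
        # The server may not be listening yet on first run; treat as loading.
        return "loading"


def wait_until_online(
    fetch_status: Callable[[], str],
    interval_s: float = 2.0,      # hypothetical polling interval
    timeout_s: float = 600.0,     # hypothetical upper bound for the model download
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the server is online; return False if the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status == "online":
            return True
        if status == "error":
            raise RuntimeError("server reported status: error")
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Calling wait_until_online(fetch_status_http) blocks until generation is available. Injecting sleep and clock keeps the loop testable without a running server.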

CUDA vs CPU vs ROCm

VibePod maintains three completely separate Python virtual environments so torch installs never conflict:

Mode	Command	venv	torch source
CUDA (default)	`pnpm dev`	`server/.venv`	PyTorch CUDA 12.4 index
CPU-only	`pnpm dev:cpu`	`server/.venv-cpu`	PyPI (CPU wheel)
ROCm (AMD GPU)	`pnpm dev:rocm`	`server/.venv-rocm`	PyTorch ROCm 6.2 index

On first run, each mode creates its own venv automatically. You can switch between them freely — they are fully independent. The active device is reported by the /health endpoint as "device": "cpu", "device": "cuda", or "device": "rocm".
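The table above amounts to a lookup from mode to venv directory and torch source. A minimal sketch of that lookup follows; the Mode type and venv_path helper are hypothetical names for illustration, and start.sh may select its venv differently.

```python
from pathlib import Path
from typing import NamedTuple


class Mode(NamedTuple):
    command: str
    venv: str
    torch_source: str


# Values copied from the table above; the structure itself is illustrative.
MODES = {
    "cuda": Mode("pnpm dev", ".venv", "PyTorch CUDA 12.4 index"),
    "cpu": Mode("pnpm dev:cpu", ".venv-cpu", "PyPI (CPU wheel)"),
    "rocm": Mode("pnpm dev:rocm", ".venv-rocm", "PyTorch ROCm 6.2 index"),
}


def venv_path(mode: str, server_dir: Path = Path("server")) -> Path:
    """Return the venv directory for a mode, e.g. server/.venv-rocm."""
    try:
        return server_dir / MODES[mode].venv
    except KeyError:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODES)}") from None
```

Because each mode resolves to a different directory, installing torch for one mode never touches another mode's environment.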

CUDA requirement: driver >= 525.60 (RTX 30/40 series all qualify). Run nvidia-smi to check.

ROCm requirement: ROCm 6.2+ installed on Linux. Supported GPUs: AMD RX 6000 series (RDNA2), RX 7000 series (RDNA3) or newer, and Instinct accelerators. The ROCm mode is not supported on Windows. Flash attention is not available on ROCm; SDPA is used instead.

Individual commands

pnpm dev              # CUDA — server + web
pnpm dev:cpu          # CPU  — server + web
pnpm dev:rocm         # ROCm — server + web
pnpm dev:server       # CUDA — Python server only
pnpm dev:server:cpu   # CPU  — Python server only
pnpm dev:server:rocm  # ROCm — Python server only
pnpm dev:web          # Next.js only (no Python server)
pnpm build            # Production build of the frontend

Environment variables

Copy .env.example to .env.local and set:

Variable	Default	Description
`VIBEVOICE_SERVER_URL`	`http://localhost:8000`	URL the Next.js API routes use to reach the Python server
`HF_TOKEN`	—	HuggingFace token (required if the model repo is gated)
`HF_HOME`	—	Override the HuggingFace model cache directory
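As an illustration of how these three variables and their defaults fit together, here is a minimal Python sketch. In the project itself VIBEVOICE_SERVER_URL is read by the Next.js API routes, so the Settings class and load_settings function below are hypothetical and only show the defaulting logic from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    server_url: str
    hf_token: Optional[str]
    hf_home: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Apply the defaults from the table; unset or empty values become None."""
    return Settings(
        # A trailing slash is trimmed so paths like /health can be appended safely.
        server_url=env.get("VIBEVOICE_SERVER_URL", "http://localhost:8000").rstrip("/"),
        hf_token=env.get("HF_TOKEN") or None,
        hf_home=env.get("HF_HOME") or None,
    )
```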

Project structure

web/
├── app/
│   ├── api/generate/   Proxies POST requests to the Python server
│   ├── api/health/     Proxies health checks (status: loading | online | error)
│   ├── page.tsx        Main UI — script input, controls, audio player
│   └── layout.tsx
├── components/
│   ├── Header.tsx
│   ├── TextInputPanel.tsx
│   ├── GenerationControls.tsx   cfg_scale and inference_steps sliders
│   ├── AudioPlayer.tsx
│   └── StatusLog.tsx
└── hooks/
    └── useAudioPlayer.ts

server/
├── vibevoice_server.py   FastAPI app — /health and /generate endpoints
├── download_model.py     One-shot HuggingFace model prefetch
├── start.sh              Entry point: uv sync → model check → uvicorn
└── pyproject.toml        Python deps managed by uv

Generation parameters

Parameter	Range	Default	Effect
`speaker`	`carter`, `davis`, `emma`, `frank`, `grace`, `mike`	`carter`	Voice preset used for the generated audio
`cfg_scale`	0.5 – 4.0	1.5	Higher = more expressive guidance
`inference_steps`	5 – 20	10	More steps = higher quality, slower generation
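The ranges and defaults in this table can be expressed as a small validated data class. This is a sketch under the assumption that out-of-range values should be rejected; the actual server may clamp or validate differently, and the GenerationParams name is hypothetical.

```python
from dataclasses import dataclass

# Speaker presets as listed in the table above.
SPEAKERS = ("carter", "davis", "emma", "frank", "grace", "mike")


@dataclass(frozen=True)
class GenerationParams:
    speaker: str = "carter"
    cfg_scale: float = 1.5
    inference_steps: int = 10

    def __post_init__(self) -> None:
        # Reject anything outside the documented ranges.
        if self.speaker not in SPEAKERS:
            raise ValueError(f"speaker must be one of {SPEAKERS}")
        if not 0.5 <= self.cfg_scale <= 4.0:
            raise ValueError("cfg_scale must be between 0.5 and 4.0")
        if not 5 <= self.inference_steps <= 20:
            raise ValueError("inference_steps must be between 5 and 20")
```

For example, GenerationParams(speaker="emma", inference_steps=20) is accepted, while GenerationParams(cfg_scale=5.0) raises ValueError.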

How it works

1. The user pastes a script and hits Generate
2. The Next.js /api/generate route forwards the request to FastAPI on port 8000
3. FastAPI runs the text through the VibeVoice streaming processor and inference model
4. Audio chunks stream back to the browser as SSE events containing base64 float32 PCM
5. The browser plays the chunks live, assembles a WAV Blob, and loads it into the audio player
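The audio path in the last two steps can be sketched end to end in Python. Everything about the wire format here is an assumption made for illustration: the "audio" JSON key, little-endian sample order, mono output, the 24 kHz sample rate, and the 16-bit WAV encoding are not confirmed by this README, and the real decoding happens in the browser rather than in Python.

```python
import base64
import io
import json
import struct
import wave

SAMPLE_RATE = 24_000  # assumption: not stated in this README


def encode_chunk(samples: list[float]) -> str:
    """Server side: pack float32 PCM into one SSE event (hypothetical schema)."""
    raw = struct.pack(f"<{len(samples)}f", *samples)
    payload = json.dumps({"audio": base64.b64encode(raw).decode("ascii")})
    return f"data: {payload}\n\n"


def decode_chunk(event: str) -> list[float]:
    """Client side: recover the float32 samples from one SSE event."""
    payload = json.loads(event.strip().removeprefix("data: "))
    raw = base64.b64decode(payload["audio"])
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


def to_wav_bytes(samples: list[float], sample_rate: int = SAMPLE_RATE) -> bytes:
    """Assemble all received samples into a mono 16-bit WAV file in memory."""
    # Clamp to [-1, 1] before scaling so loud samples cannot overflow int16.
    pcm16 = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(pcm16)}h", *pcm16))
    return buf.getvalue()
```

A client would call decode_chunk on each event as it arrives, play or buffer the samples, and call to_wav_bytes once on the concatenated samples when the stream ends.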

Python dependencies

Managed by uv. The server/uv.lock is committed so installs are fully reproducible.

# Add a package
cd server && uv add <package>

# Upgrade all dependencies (updates uv.lock; the next uv sync installs them)
cd server && uv lock --upgrade

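Note: The [tool.uv.sources] block in pyproject.toml pulls torch from the PyTorch CUDA 12.4 index by default. Running with --cpu or --rocm (or uv sync --no-sources) bypasses this block and installs the standard PyPI CPU wheel. In ROCm mode, that CPU wheel is then replaced with the PyTorch ROCm 6.2 build. The replacement has to force a reinstall of torch, because the existing CPU wheel already satisfies torch>=2.0.0 and uv pip install would otherwise skip the ROCm wheel.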
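The ROCm replacement step can be sketched as a command builder plus a build check. This is a hedged illustration, not the contents of start.sh: the venv interpreter path, the helper names, and the exact argument order are assumptions. The uv flags (--python, --reinstall-package, --index-url) and the PyTorch ROCm 6.2 wheel index URL are real, and torch.version.hip is None on builds without ROCm support.

```python
from typing import Optional

# PyTorch's wheel index for ROCm 6.2 builds.
ROCM_INDEX_URL = "https://download.pytorch.org/whl/rocm6.2"


def rocm_torch_install_cmd(
    venv_python: str = "server/.venv-rocm/bin/python",  # assumed interpreter path
) -> list[str]:
    """Build a uv command that forces torch to be reinstalled from the ROCm index."""
    return [
        "uv", "pip", "install",
        "--python", venv_python,
        # Without this flag uv keeps the already-installed CPU wheel.
        "--reinstall-package", "torch",
        "--index-url", ROCM_INDEX_URL,
        "torch",
    ]


def is_rocm_build(hip_version: Optional[str]) -> bool:
    """Pass torch.version.hip here; it is None on CPU and CUDA builds."""
    return hip_version is not None
```

The command list can be passed to subprocess.run(cmd, check=True). Afterwards, is_rocm_build(torch.version.hip) run inside the ROCm venv gives a quick confirmation that the CPU wheel was actually replaced.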
