Mirror of https://github.com/JezzWTF/vibepod.git (synced 2026-06-13 03:58:07 +00:00)

chore(phase-0): stabilise foundation for Studio build

- Extract WAV assembly (buildWav, mergeFloat32Arrays, decodeFloat32Chunk, SAMPLE_RATE) into web/lib/audio/wav.ts so it can be reused by the Studio playback engine and library waveform previews
- Add server/waveform.py with compute_peaks() / write_peaks(), which reads any WAV, mixes it to mono, and returns min/max peak arrays matching the WaveformPeaks TypeScript type
- Add server/ids.py with prefixed URL-safe ID helpers (gen_id, proj_id, asset_id, etc.) using stdlib secrets (no new dependency)
- Add docs/studio-build-plan.md, a full execution spec covering stack decisions, data models, API contract, component hierarchy, phase breakdown and acceptance criteria
- Ignore data/ directory (generated audio, waveforms, SQLite DB)

```diff
@@ -20,8 +20,12 @@ web/tsconfig.tsbuildinfo
 web/next-env.d.ts
 web/node_modules/
 
+# Generated data (audio files, waveforms, SQLite DB)
+data/
+
 # OS
 .DS_Store
 Thumbs.db
 .vscode/settings.json
 .claude/settings.local.json
+*.txt
```
@@ -0,0 +1,925 @@

# VibePod Studio — Build Plan

**Version:** 1.0

**Authors:** LyAhn + Claude (Anthropic) + Codex

**Date:** 2026-05-02

**Status:** Active

---

## Table of Contents

1. [Product Goal](#1-product-goal)
2. [Current Architecture](#2-current-architecture)
3. [Locked Technical Decisions](#3-locked-technical-decisions)
4. [Non-Goals](#4-non-goals)
5. [Data Models](#5-data-models)
6. [Storage Layout](#6-storage-layout)
7. [API Contract](#7-api-contract)
8. [Frontend Routes](#8-frontend-routes)
9. [Component Hierarchy](#9-component-hierarchy)
10. [Timeline Rendering Model](#10-timeline-rendering-model)
11. [Playback Model](#11-playback-model)
12. [Export Rendering Model](#12-export-rendering-model)
13. [Phase Breakdown](#13-phase-breakdown)
14. [Acceptance Criteria](#14-acceptance-criteria)
15. [Deferred Decisions](#15-deferred-decisions)
16. [Third-Party Library Policy](#16-third-party-library-policy)

---

## 1. Product Goal

VibePod becomes a **script-aware AI podcast creation studio** where users can generate, arrange, edit, regenerate, and export podcast episodes from written scripts.

The headline feature that differentiates VibePod from generic audio editors:

> Every generated clip knows where it came from in the script and can be regenerated, replaced, or compared — without touching the original audio file.

The existing fast-generation page stays as the quick path. Studio grows alongside it.

---

## 2. Current Architecture

### What exists today

| Layer | Details |
|---|---|
| Frontend | Next.js 15 · React 19 · TypeScript 5 · Tailwind CSS 4 · App Router |
| Backend | FastAPI · uvicorn · Python 3.10+ |
| Model | VibeVoice Realtime 0.5B via JezzWTF fork |
| Audio format | 24 kHz · float32 PCM · streamed as SSE · assembled to WAV in browser |
| State management | React `useReducer` in `page.tsx` — no Zustand, no Redux |
| Persistence | None — generated audio is an ephemeral browser Blob |
| Packages | pnpm workspace (frontend) · uv (Python) |

### Current generation flow

```
User submits script
  → POST /api/generate (Next.js proxy)
  → POST /generate (FastAPI)
  → VibeVoice inference thread
  → SSE stream of base64 float32 PCM chunks
  → Browser decodes chunks, adaptive buffering, live Web Audio playback
  → On complete: chunks merged, RIFF/WAV header prepended, Blob URL created
  → User can play back or download the WAV
```
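
The last steps of this flow (decoding each SSE chunk, merging the chunks, and prepending a RIFF/WAV header) are what the Phase 0 commit moves into `web/lib/audio/wav.ts`. The sketch below shows one way those helpers could look. The function names come from the commit message, but the signatures, the little-endian chunk encoding, and the choice of a 32-bit float WAV (format tag 3) with a minimal 44-byte header are assumptions. The real module may differ, and some strict WAV readers expect the extended float header with a `fact` chunk.

```typescript
// Hypothetical sketch of web/lib/audio/wav.ts: names from the Phase 0 commit, signatures assumed.
const SAMPLE_RATE = 24_000; // VibeVoice output rate

/** Decode one SSE payload (base64 of little-endian float32 samples) into a Float32Array. */
function decodeFloat32Chunk(b64: string): Float32Array {
  const bin = atob(b64);
  const view = new DataView(new ArrayBuffer(bin.length));
  for (let i = 0; i < bin.length; i++) view.setUint8(i, bin.charCodeAt(i));
  const out = new Float32Array(Math.floor(bin.length / 4));
  // Read through DataView so byte order is explicit regardless of host endianness.
  for (let i = 0; i < out.length; i++) out[i] = view.getFloat32(i * 4, true);
  return out;
}

/** Concatenate streamed chunks into one contiguous buffer. */
function mergeFloat32Arrays(chunks: Float32Array[]): Float32Array {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const out = new Float32Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}

/** Wrap mono float32 samples in a 44-byte RIFF/WAVE header (format tag 3 = IEEE float). */
function buildWav(samples: Float32Array, sampleRate: number = SAMPLE_RATE): ArrayBuffer {
  const bytesPerSample = 4;
  const dataSize = samples.length * bytesPerSample;
  const view = new DataView(new ArrayBuffer(44 + dataSize));
  const ascii = (offset: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };
  ascii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the 8-byte RIFF header
  ascii(8, "WAVE");
  ascii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 3, true); // audio format: IEEE float
  view.setUint16(22, 1, true); // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 32, true); // bits per sample
  ascii(36, "data");
  view.setUint32(40, dataSize, true);
  for (let i = 0; i < samples.length; i++) view.setFloat32(44 + i * 4, samples[i], true);
  return view.buffer;
}
```

In the browser, the result would become a playable URL with `URL.createObjectURL(new Blob([buildWav(mergeFloat32Arrays(chunks))], { type: "audio/wav" }))`. Keeping all three helpers pure makes the Phase 0 "pure function with unit tests" acceptance criterion straightforward to meet.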

### Key files

```
web/
  app/
    page.tsx                    main generation UI (AppState via useReducer)
    api/generate/route.ts       SSE proxy to FastAPI
    api/health/route.ts         health check proxy
  components/
    Header.tsx
    TextInputPanel.tsx
    AudioPlayer.tsx
    GenerationControls.tsx
    StatusLog.tsx
  hooks/
    useStreamingGeneration.ts   core streaming + WAV assembly
    useAudioPlayer.ts           HTML5 audio element wrapper

server/
  vibevoice_server.py           entire FastAPI app (972 lines)
  start.sh                      launcher (CPU/CUDA detection, uv sync, uvicorn)
  download_model.py             HuggingFace prefetch

docs/
  studio-build-plan.md          this file

roadmap.md                      high-level phase vision
DESIGN.md                       brand + design system (colours, type, spacing)
AGENTS.md                       AI agent / CI guide
```

---

## 3. Locked Technical Decisions

These decisions are final and must not be revisited without explicit agreement. New phases build on them.

### 3.1 Rendering approach — Hybrid DOM + Canvas 2D

The Studio uses a **hybrid rendering model**:

| Layer | Technology |
|---|---|
| App shell, layout, sidebars, panels, inspector | React + Tailwind CSS (DOM) |
| Track headers, controls, transport, modals | React + Tailwind CSS (DOM) |
| Timeline clip containers + positioning | React + CSS (`left`/`width` from time → pixels) |
| Waveform rendering inside clips | Raw Canvas 2D |
| Timeline ruler | Raw Canvas 2D |
| Playhead overlay | Raw Canvas 2D |
| Browser playback preview | Web Audio API |
| Final render + export | Python + FFmpeg (server-side only) |

**Why not full canvas:** Every non-waveform element (buttons, text, inputs, scroll, keyboard focus, accessibility) works better in DOM. Reimplementing all of that in canvas is wasted effort.

**Why not pure CSS:** Waveform peaks are thousands of pixel-height values per clip. Rendering each one as a DOM element would be extremely slow, whereas Canvas draws them in a tight loop in milliseconds.

**Why not WaveSurfer.js as the core:** WaveSurfer owns playback and its own event model. VibePod Studio needs its own clip model, its own timeline, and eventually its own multi-track playback. Adapting around WaveSurfer's assumptions creates friction. It may be used for standalone audio preview components (e.g., the generation page player), but not for the Studio timeline.

**Why not Konva.js:** Overkill for v1. Konva is designed for fully canvas-based scenes (whiteboards, diagrams). VibePod's timeline is mostly DOM. The additional mental model (Stage/Layer/Group/Transformer) is not justified unless interaction complexity grows significantly beyond v1.

### 3.2 Frontend stack — No additions without justification

Build inside the existing stack. New packages require a written reason in this document.

**Approved additions (to be installed when their phase begins):**

| Package | Purpose | Phase |
|---|---|---|
| `zustand` | Studio editor state | Phase 2 |
| `@dnd-kit/core` + `@dnd-kit/utilities` | Clip drag-and-drop in timeline | Phase 2 |
| `better-sqlite3` | SQLite for job and project persistence | Phase 1 |

**Conditionally approved (evaluate at phase start):**

| Package | Purpose | Condition |
|---|---|---|
| `framer-motion` | Clip move animations | Only if DnD-kit transitions feel rough after prototype |
| `@radix-ui/react-*` | Accessible modal/dropdown primitives | Only if building custom is taking too long |

### 3.3 Backend stack — Python + FFmpeg for all rendering

Browser-side audio mixing is only for **preview**. Export always goes to the Python backend.

**Approved backend additions:**

| Package | Purpose | Phase |
|---|---|---|
| `soundfile` | Already present — WAV read/write | Phase 1 |
| `numpy` | Audio array manipulation | Phase 1 |
| `pydub` | Audio trimming, mixing, concatenation | Phase 2 |
| `pyloudnorm` | Loudness normalisation (LUFS) | Phase 5 |

FFmpeg must be available on the server host. The render endpoint assumes `ffmpeg` is on PATH.

### 3.4 State management — Zustand for Studio, useReducer stays on generation page

The existing generation page uses `useReducer` and works well. Do not refactor it.

Studio requires a shared store that multiple components read and write (timeline, inspector, transport, script panel). Zustand is the right tool. It is lightweight, does not require providers, and handles editor-style state (undo stacks, selection, playhead) cleanly.
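
Zustand does not ship an undo stack itself, so the store has to hold one. A common pattern, sketched here as an assumption rather than a decided design, is a `past / present / future` history of immutable project snapshots that the store's actions update. The `History` type and the `createHistory`, `commit`, `undo`, and `redo` helpers are hypothetical names.

```typescript
// Hypothetical undo/redo history for the Studio store. Snapshots are treated as immutable.
type History<T> = { past: T[]; present: T; future: T[] };

function createHistory<T>(initial: T): History<T> {
  return { past: [], present: initial, future: [] };
}

/** Record a new state; drops the redo branch and keeps at most `limit` undo steps. */
function commit<T>(h: History<T>, next: T, limit = 100): History<T> {
  const past = [...h.past, h.present].slice(-limit);
  return { past, present: next, future: [] };
}

function undo<T>(h: History<T>): History<T> {
  if (h.past.length === 0) return h;
  return {
    past: h.past.slice(0, -1),
    present: h.past[h.past.length - 1],
    future: [h.present, ...h.future],
  };
}

function redo<T>(h: History<T>): History<T> {
  if (h.future.length === 0) return h;
  return {
    past: [...h.past, h.present],
    present: h.future[0],
    future: h.future.slice(1),
  };
}
```

Inside a Zustand store, an action could wrap these with `set((s) => ({ history: commit(s.history, nextProject) }))`. In this sketch, selection and playhead would live outside `history`, so moving the playhead is not an undoable step.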

### 3.5 Database — SQLite from Phase 1

Do not start with flat JSON files. SQLite is still a single file, requires no server process, and gives proper queries, transactions, and schema migrations from day one. Use `better-sqlite3` in the Next.js API layer.

Schema lives in `web/lib/db/schema.sql`. Migrations are numbered SQL files in `web/lib/db/migrations/`.

### 3.6 Audio sample rate

All generated audio is 24 kHz float32 mono (VibeVoice output). Studio renders at 44.1 kHz stereo WAV or 48 kHz for podcast MP3 export. The render pipeline handles resampling.

---

## 4. Non-Goals

These will not be built and must not creep in:

- **Real-time collaborative editing** — single-user per project only
- **Cloud sync or user accounts** — local-first, no auth system
- **MIDI or music composition** — audio clips only, no MIDI tracks
- **Plugin system** — no third-party audio plugin API
- **Browser-side FFmpeg (ffmpeg.wasm)** — all rendering is server-side
- **Mobile / responsive Studio layout** — Studio targets desktop viewport only
- **Offline PWA** — the server must be running; no service worker caching of model output
- **Real-time voice cloning** — out of scope until VibeVoice supports it cleanly
- **Exporting to streaming platforms** — export to file only; no Spotify/Apple Podcasts upload

---

## 5. Data Models

### 5.1 Generation job

```ts
type GenerationJob = {
  id: string; // "gen_<nanoid>"
  createdAt: string; // ISO 8601
  status: "pending" | "generating" | "complete" | "error" | "cancelled";
  script: string;
  speaker: string;
  cfgScale: number;
  inferenceSteps: number;
  durationSecs: number | null; // set on complete
  sampleRate: number; // always 24000
  audioPath: string | null; // relative path, e.g. data/generations/<id>/audio.wav
  waveformPath: string | null; // relative path, e.g. data/generations/<id>/waveform.json
  errorMessage: string | null;
};
```
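
The `"gen_<nanoid>"` comments describe a type prefix plus a short URL-safe random suffix. The Phase 0 commit implements this in Python (`server/ids.py`, using `secrets`). If the Next.js persistence routes also need to mint IDs such as `proj_<id>` or `clip_<id>`, a dependency-free equivalent on the Node.js runtime could look like the following. The `genId` name, the 12-character default (72 random bits), and the alphabet are assumptions, not the contents of `ids.py`.

```typescript
import { randomBytes } from "node:crypto";

// 64 URL-safe characters, so `byte & 63` selects each one with equal probability.
const ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_";

/** Hypothetical helper: `${prefix}_` followed by `size` random URL-safe characters. */
function genId(prefix: string, size = 12): string {
  const bytes = randomBytes(size);
  let suffix = "";
  for (let i = 0; i < bytes.length; i++) suffix += ALPHABET[bytes[i] & 63];
  return `${prefix}_${suffix}`;
}

const newProjectId = () => genId("proj");
const newClipId = () => genId("clip");
```

Because `node:crypto` is Node-only, this would run in route handlers on the Node.js runtime rather than the Edge runtime or the browser.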

### 5.2 Studio project

```ts
type StudioProject = {
  id: string; // "proj_<nanoid>"
  name: string;
  createdAt: string;
  updatedAt: string;
  script: ScriptDocument;
  assets: AudioAsset[];
  tracks: Track[];
  edits: EditOperation[];
  renderSettings: RenderSettings;
};
```

### 5.3 Script document

```ts
type ScriptDocument = {
  blocks: ScriptBlock[];
};

type ScriptBlock = {
  id: string; // "block_<nanoid>"
  speakerId: string;
  text: string;
  order: number;
  generatedAssetId: string | null;
  timelineClipIds: string[];
};
```

### 5.4 Audio asset

```ts
type AudioAsset = {
  id: string; // "asset_<nanoid>"
  projectId: string | null; // null = generation library asset
  kind: "generated_voice" | "upload" | "music" | "sfx" | "render";
  filePath: string;
  durationSecs: number;
  sampleRate: number;
  channels: number;
  waveformPath: string | null;
  source: {
    generationJobId?: string;
    scriptBlockId?: string;
    providerId?: string;
    modelId?: string;
    voiceId?: string;
    settings?: Record<string, unknown>;
  } | null;
};
```

### 5.5 Track

```ts
type Track = {
  id: string; // "track_<nanoid>"
  name: string;
  type: "voice" | "music" | "sfx" | "ambience" | "master";
  order: number;
  muted: boolean;
  solo: boolean;
  gainDb: number;
  clips: TimelineClip[];
};
```

### 5.6 Timeline clip

```ts
type TimelineClip = {
  id: string; // "clip_<nanoid>"
  assetId: string;
  trackId: string;
  startTime: number; // seconds from timeline origin
  sourceStart: number; // trim start within source asset (seconds)
  sourceEnd: number; // trim end within source asset (seconds)
  gainDb: number;
  fadeInMs: number;
  fadeOutMs: number;
  linkedScriptRange: {
    blockId: string;
    startChar: number;
    endChar: number;
  } | null;
};
```

### 5.7 Edit operation (non-destructive EDL)

```ts
type EditOperation =
  | { type: "split"; clipId: string; at: number }
  | { type: "trim"; clipId: string; sourceStart: number; sourceEnd: number }
  | { type: "move"; clipId: string; startTime: number; trackId: string }
  | { type: "gain"; clipId: string; gainDb: number }
  | { type: "fade"; clipId: string; fadeInMs: number; fadeOutMs: number }
  | { type: "delete"; clipId: string }
  | { type: "mute"; trackId: string; muted: boolean }
  | { type: "solo"; trackId: string; solo: boolean };
```
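
Because the model is non-destructive, the timeline the user sees can be treated as the result of applying `EditOperation`s to clip and track data while the audio assets stay untouched. Below is a minimal reducer sketch under stated assumptions: `split.at` is read as timeline seconds, the right-hand half of a split gets a derived id (`<clipId>_b`), and fades are zeroed at the cut. The plan does not fix any of these choices. The types are trimmed copies of `TimelineClip`, `Track`, and `EditOperation`, included so the block stands alone.

```typescript
type Clip = {
  id: string; trackId: string; startTime: number; sourceStart: number; sourceEnd: number;
  gainDb: number; fadeInMs: number; fadeOutMs: number;
};
type TrackState = { id: string; muted: boolean; solo: boolean; clips: Clip[] };
type Edit =
  | { type: "split"; clipId: string; at: number }
  | { type: "trim"; clipId: string; sourceStart: number; sourceEnd: number }
  | { type: "move"; clipId: string; startTime: number; trackId: string }
  | { type: "gain"; clipId: string; gainDb: number }
  | { type: "fade"; clipId: string; fadeInMs: number; fadeOutMs: number }
  | { type: "delete"; clipId: string }
  | { type: "mute"; trackId: string; muted: boolean }
  | { type: "solo"; trackId: string; solo: boolean };

/** Replace the clip with `clipId` by whatever `fn` returns (zero, one, or two clips). */
function mapClip(tracks: TrackState[], clipId: string, fn: (c: Clip) => Clip[]): TrackState[] {
  return tracks.map((t) => ({ ...t, clips: t.clips.flatMap((c) => (c.id === clipId ? fn(c) : [c])) }));
}

/** Pure reducer: returns new track data, never mutates the input. */
function applyEdit(tracks: TrackState[], op: Edit): TrackState[] {
  switch (op.type) {
    case "split": {
      const at = op.at; // assumed to be timeline seconds
      return mapClip(tracks, op.clipId, (c) => {
        const offset = at - c.startTime;
        if (offset <= 0 || offset >= c.sourceEnd - c.sourceStart) return [c]; // cut outside the clip: no-op
        const cut = c.sourceStart + offset;
        return [
          { ...c, sourceEnd: cut, fadeOutMs: 0 },
          { ...c, id: `${c.id}_b`, startTime: at, sourceStart: cut, fadeInMs: 0 },
        ];
      });
    }
    case "trim": {
      const { sourceStart, sourceEnd } = op;
      return mapClip(tracks, op.clipId, (c) => [{ ...c, sourceStart, sourceEnd }]);
    }
    case "gain": {
      const { gainDb } = op;
      return mapClip(tracks, op.clipId, (c) => [{ ...c, gainDb }]);
    }
    case "fade": {
      const { fadeInMs, fadeOutMs } = op;
      return mapClip(tracks, op.clipId, (c) => [{ ...c, fadeInMs, fadeOutMs }]);
    }
    case "delete":
      return mapClip(tracks, op.clipId, () => []);
    case "move": {
      const { clipId, startTime, trackId } = op;
      const clip = tracks.flatMap((t) => t.clips).find((c) => c.id === clipId);
      if (!clip || !tracks.some((t) => t.id === trackId)) return tracks; // unknown clip or track: no-op
      const moved: Clip = { ...clip, startTime, trackId };
      return tracks.map((t) => {
        const rest = t.clips.filter((c) => c.id !== clipId);
        return { ...t, clips: t.id === trackId ? [...rest, moved] : rest };
      });
    }
    case "mute": {
      const { trackId, muted } = op;
      return tracks.map((t) => (t.id === trackId ? { ...t, muted } : t));
    }
    case "solo": {
      const { trackId, solo } = op;
      return tracks.map((t) => (t.id === trackId ? { ...t, solo } : t));
    }
  }
}
```

Replaying the stored `edits` array with `edits.reduce(applyEdit, initialTracks)` would then rebuild the current timeline, and undo becomes "replay one fewer operation".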

### 5.8 Take (regeneration history)

```ts
type Take = {
  id: string; // "take_<nanoid>"
  scriptBlockId: string;
  assetId: string;
  voiceId: string;
  modelId: string;
  settings: Record<string, unknown>;
  createdAt: string;
  rating: number | null; // 1-5 stars, optional
  notes: string | null;
  isActive: boolean; // true = the one placed on the timeline
};
```

### 5.9 Render settings

```ts
type RenderSettings = {
  format: "wav" | "mp3";
  sampleRate: 44100 | 48000;
  bitrate: number | null; // kbps, null for WAV
  normaliseLoudness: boolean;
  lufsTarget: number; // default -16 LUFS for podcast
  metadata: {
    title: string;
    artist: string;
    album: string;
    episodeNumber: number | null;
    description: string;
  } | null;
};
```

### 5.10 Waveform peaks

```ts
type WaveformPeaks = {
  sampleRate: number;
  durationSecs: number;
  channels: number;
  samplesPerPixel: number;
  length: number;
  data: {
    min: number[]; // range -1.0 to 0.0
    max: number[]; // range 0.0 to 1.0
  };
};
```
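
Peaks are produced server-side by `compute_peaks()` in `server/waveform.py`. For reference, and for cases where the browser already holds the samples (for example, right after a stream completes), the same bucketing can be expressed in TypeScript. This sketch assumes mono input and starts each bucket's min/max at 0, which keeps values inside the documented ranges (`min` in -1.0 to 0.0, `max` in 0.0 to 1.0). Whether `compute_peaks()` clamps the same way is not stated here.

```typescript
type WaveformPeaks = {
  sampleRate: number;
  durationSecs: number;
  channels: number;
  samplesPerPixel: number;
  length: number;
  data: { min: number[]; max: number[] };
};

/** Bucket mono samples into min/max pairs, one pair per `samplesPerPixel` samples. */
function computePeaks(samples: Float32Array, sampleRate: number, samplesPerPixel = 256): WaveformPeaks {
  const length = Math.ceil(samples.length / samplesPerPixel);
  const min: number[] = new Array(length);
  const max: number[] = new Array(length);
  for (let i = 0; i < length; i++) {
    const end = Math.min(samples.length, (i + 1) * samplesPerPixel);
    let lo = 0; // starting at 0 keeps min <= 0 <= max, matching the documented ranges
    let hi = 0;
    for (let j = i * samplesPerPixel; j < end; j++) {
      const s = samples[j];
      if (s < lo) lo = s;
      if (s > hi) hi = s;
    }
    min[i] = Math.max(-1, lo); // clip out-of-range float samples
    max[i] = Math.min(1, hi);
  }
  return {
    sampleRate,
    durationSecs: samples.length / sampleRate,
    channels: 1,
    samplesPerPixel,
    length,
    data: { min, max },
  };
}
```

At the 256 samples/pixel used for `waveform.json`, one second of 24 kHz audio yields 24000 / 256 = 93.75 peak pairs, so a 10-minute episode produces 56,250 pairs.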

---

## 6. Storage Layout

```
data/
  generations/
    gen_<id>/
      audio.wav                  raw float32 WAV at 24 kHz
      waveform.json              WaveformPeaks at 256 samples/pixel
      metadata.json              GenerationJob fields (denormalised)

  projects/
    proj_<id>/
      project.json               full StudioProject serialised
      assets/
        asset_<id>.wav           uploaded or imported audio
      renders/
        render_<timestamp>.wav   exported renders
        render_<timestamp>.mp3

  db/
    vibepod.db                   SQLite database
```

The SQLite database is the source of truth for IDs, status, and relationships. Files on disk under `data/` (WAV audio and waveform JSON) are the source of truth for audio and waveform data.

---

## 7. API Contract

All new routes are under `/api/`. The Next.js app proxies to FastAPI for generation and health, and (from Phase 5) forwards render jobs to the Python backend. Persistence routes are handled directly by Next.js API routes talking to SQLite.

### 7.1 Generation (existing, extended)

```
POST   /api/generate                      start streaming generation (existing)
GET    /api/health                        server health check (existing)
```

### 7.2 Generation library (Phase 1)

```
GET    /api/generations                   list all jobs, newest first
         query: ?limit=20&offset=0&status=complete
GET    /api/generations/:id               get single job metadata
GET    /api/generations/:id/audio         stream WAV file
GET    /api/generations/:id/waveform      get WaveformPeaks JSON
DELETE /api/generations/:id               delete job and files
```
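
The list endpoint's query string needs defensive parsing before it reaches SQLite. A sketch under assumptions: `limit` is clamped to the range 1 to 100 (the cap of 100 is hypothetical; the defaults of 20 and 0 come from the example above), negative offsets become 0, and an unrecognised `status` is ignored rather than rejected. Rejecting it with a 400 and the standard error shape would be an equally valid choice.

```typescript
type GenerationStatus = "pending" | "generating" | "complete" | "error" | "cancelled";
const STATUSES: readonly GenerationStatus[] = ["pending", "generating", "complete", "error", "cancelled"];

type ListQuery = { limit: number; offset: number; status: GenerationStatus | null };

/** Parse ?limit, ?offset and ?status with safe defaults and bounds. */
function parseListQuery(params: URLSearchParams, maxLimit = 100): ListQuery {
  const int = (key: string, fallback: number): number => {
    const n = Number.parseInt(params.get(key) ?? "", 10);
    return Number.isFinite(n) ? n : fallback;
  };
  const raw = params.get("status");
  return {
    limit: Math.min(Math.max(int("limit", 20), 1), maxLimit),
    offset: Math.max(int("offset", 0), 0),
    status: STATUSES.find((s) => s === raw) ?? null, // unknown values are ignored
  };
}

// Hypothetical usage inside a route handler with better-sqlite3:
// const { limit, offset, status } = parseListQuery(new URL(request.url).searchParams);
// db.prepare(
//   "SELECT * FROM generations WHERE (? IS NULL OR status = ?) ORDER BY created_at DESC LIMIT ? OFFSET ?"
// ).all(status, status, limit, offset);
```

Binding every value as a parameter, as in the commented query, keeps user input out of the SQL text.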

### 7.3 Projects (Phase 2)

```
POST   /api/projects                      create project
GET    /api/projects                      list projects
GET    /api/projects/:id                  get project with full StudioProject
PUT    /api/projects/:id                  save/autosave project
DELETE /api/projects/:id                  delete project and assets
```

### 7.4 Project assets (Phase 2)

```
POST   /api/projects/:id/assets           upload audio file or import from generation
GET    /api/projects/:id/assets/:aid      get asset metadata
DELETE /api/projects/:id/assets/:aid      remove asset
```

### 7.5 Takes (Phase 3)

```
GET    /api/projects/:id/takes/:blockId            list takes for a script block
POST   /api/projects/:id/takes/:blockId            save new take
PUT    /api/projects/:id/takes/:blockId/:takeId    set active take
DELETE /api/projects/:id/takes/:blockId/:takeId    delete take
```

### 7.6 Render (Phase 5)

```
POST   /api/projects/:id/render                    start render job (sends EDL to Python)
GET    /api/projects/:id/renders                   list render history
GET    /api/projects/:id/renders/:rid              poll render status
GET    /api/projects/:id/renders/:rid/download     stream rendered file
```

All error responses follow:

```json
{ "error": "human-readable message", "code": "SNAKE_CASE_CODE" }
```
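
A small helper keeps every route handler on this one error shape. The sketch below uses the standard Fetch `Response` that Next.js App Router route handlers can return. The HTTP status codes and the example `GENERATION_NOT_FOUND` code are illustrative, since the plan defines only the body shape.

```typescript
type ApiError = { error: string; code: string };

/** Build a JSON error response in the shared `{ error, code }` shape. */
function apiError(status: number, error: string, code: string): Response {
  const body: ApiError = { error, code };
  return new Response(JSON.stringify(body), {
    status,
    headers: { "Content-Type": "application/json" },
  });
}

/** Client-side guard for parsing failed responses. */
function isApiError(value: unknown): value is ApiError {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.error === "string" && typeof v.code === "string";
}

// e.g. in GET /api/generations/:id when no row matches:
// return apiError(404, "Generation not found", "GENERATION_NOT_FOUND");
```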

---

## 8. Frontend Routes

```
/                     generation page (current fast path, unchanged)
/library              generation library — browse and replay saved jobs
/projects             project dashboard — create or open Studio projects
/studio/:projectId    Studio workspace
/studio/new           redirect: creates project + navigates to /studio/:id
```

---

## 9. Component Hierarchy

### Generation page (existing — do not restructure without reason)

```
page.tsx
  Header
  TextInputPanel
  GenerationControls
  StatusLog
  AudioPlayer
```

### Library page (Phase 1)

```
/app/library/page.tsx
  Header
  LibraryPage
    GenerationCard[]
      WaveformPreview         (small static canvas render of peaks)
      GenerationMetadata      (speaker, duration, date, settings)
      GenerationActions       (play, download, open in studio, delete)
```

### Projects dashboard (Phase 2)

```
/app/projects/page.tsx
  Header
  ProjectsPage
    NewProjectButton
    ProjectCard[]
      ProjectThumbnail
      ProjectMetadata
      ProjectActions
```

### Studio workspace (Phase 2+)

```
/app/studio/[projectId]/page.tsx
  StudioShell
    StudioTopBar
      ProjectNameInput
      UndoButton / RedoButton
      SaveStatus
      ExportButton
    StudioBody
      ProjectSidebar
        SidebarTabs           (Script | Voices | Media | Effects | Templates)
        ScriptPanel           (Phase 3)
        VoicesPanel
        MediaBin              (Phase 2)
        EffectsPanel          (Phase 4)
      StudioMain
        TimelineRuler         (canvas)
        TimelineArea
          TimelineTrack[]     (one per track)
            TrackHeader
            TrackClipArea
              TimelineClip[]
                WaveformCanvas    (canvas)
                ClipLabel
      InspectorPanel
        ClipInspector         (when clip selected)
          ClipMetadata
          GainControl
          FadeControls
          RegenerateButton    (Phase 3)
          TakeStack           (Phase 3)
        TrackInspector        (when track selected)
        EmptyInspector        (nothing selected)
    TransportBar
      PlayPauseButton
      StopButton
      PlayheadTimeDisplay
      ZoomControls
      SnapToggle
```

---

## 10. Timeline Rendering Model

### Clip positioning

Timeline clips are DOM elements with CSS `position: absolute`. Position and size derive from the project's pixels-per-second zoom level:

```ts
const left = clip.startTime * pixelsPerSecond;
const width = (clip.sourceEnd - clip.sourceStart) * pixelsPerSecond;
```

The `pixelsPerSecond` value lives in Zustand and changes with zoom. All clip positions recompute via derived selectors.
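
Dragging needs the inverse mapping, from a pointer delta in pixels back to seconds, and the transport bar's `SnapToggle` implies optional rounding to a grid. A sketch of both follows; the snap grid size, the clamp at time 0, and the helper names are assumptions.

```typescript
type ClipTiming = { startTime: number; sourceStart: number; sourceEnd: number };

/** Pixel geometry for a clip at the current zoom. */
function clipRect(clip: ClipTiming, pixelsPerSecond: number): { left: number; width: number } {
  return {
    left: clip.startTime * pixelsPerSecond,
    width: (clip.sourceEnd - clip.sourceStart) * pixelsPerSecond,
  };
}

/** Round to the nearest grid line; a grid of 0 means snapping is off. */
function snapTime(t: number, gridSecs: number): number {
  return gridSecs > 0 ? Math.round(t / gridSecs) * gridSecs : t;
}

/** New startTime after dragging a clip horizontally by `deltaPx` pixels. */
function draggedStartTime(clip: ClipTiming, deltaPx: number, pixelsPerSecond: number, gridSecs = 0): number {
  return Math.max(0, snapTime(clip.startTime + deltaPx / pixelsPerSecond, gridSecs));
}
```

In this sketch, a drag-end handler would dispatch a single `move` edit with the computed `startTime`, so intermediate pointer positions never enter the undo history.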

### Waveform canvas

Each `TimelineClip` contains a `<canvas>` element that receives peak data as a prop. The renderer draws one vertical line per pixel column:

```ts
function drawWaveform(
  ctx: CanvasRenderingContext2D,
  peaks: WaveformPeaks,
  width: number,
  height: number,
  color: string
): void {
  const midY = height / 2;
  ctx.clearRect(0, 0, width, height);
  if (peaks.length === 0 || width <= 0) return;

  ctx.strokeStyle = color;
  ctx.lineWidth = 1;

  // Batch every column into one path; calling stroke() once per column is much slower.
  ctx.beginPath();
  for (let x = 0; x < width; x++) {
    // Map the pixel column to a peak bucket, clamped to the last index.
    const peakIndex = Math.min(peaks.length - 1, Math.floor((x / width) * peaks.length));
    // Canvas y grows downward: positive amplitudes go above midY, negative ones below.
    const minY = midY - peaks.data.min[peakIndex] * midY;
    const maxY = midY - peaks.data.max[peakIndex] * midY;
    ctx.moveTo(x + 0.5, minY);
    ctx.lineTo(x + 0.5, maxY);
  }
  ctx.stroke();
}
```
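
`drawWaveform` spreads the whole peak array across the canvas, which matches an untrimmed clip. For a trimmed clip, only the peaks between `sourceStart` and `sourceEnd` belong in the canvas. One way to handle that is to slice the peaks first and pass the result to the same function. This assumes the peaks are evenly spaced at `sampleRate / samplesPerPixel` per second, and `slicePeaks` is a hypothetical helper.

```typescript
type WaveformPeaks = {
  sampleRate: number;
  durationSecs: number;
  channels: number;
  samplesPerPixel: number;
  length: number;
  data: { min: number[]; max: number[] };
};

/** Return the subset of peaks covering [sourceStart, sourceEnd) seconds of the asset. */
function slicePeaks(peaks: WaveformPeaks, sourceStart: number, sourceEnd: number): WaveformPeaks {
  const peaksPerSecond = peaks.sampleRate / peaks.samplesPerPixel;
  // Clamp both ends into [0, peaks.length] so bad trim values can never yield a negative length.
  const from = Math.min(peaks.length, Math.max(0, Math.floor(sourceStart * peaksPerSecond)));
  const to = Math.min(peaks.length, Math.max(from, Math.ceil(sourceEnd * peaksPerSecond)));
  return {
    ...peaks,
    durationSecs: (to - from) / peaksPerSecond,
    length: to - from,
    data: { min: peaks.data.min.slice(from, to), max: peaks.data.max.slice(from, to) },
  };
}

// Usage inside WaveformCanvas (sketch):
// drawWaveform(ctx, slicePeaks(peaks, clip.sourceStart, clip.sourceEnd), width, height, color);
```

On HiDPI screens the canvas backing store would also need to be sized by `window.devicePixelRatio` (with a matching `ctx.scale`) to keep the one-line-per-column drawing crisp.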

The canvas rerenders when `width`, `peaks`, or zoom changes. It does not rerender on playback.

### Playhead

The playhead is a separate absolutely-positioned element (thin vertical line) that overlays the entire timeline area. Its `left` position is updated via `requestAnimationFrame` during playback — no React state updates, direct DOM style write.
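
Here is a sketch of that loop. It assumes the playhead time is derived from `AudioContext.currentTime` relative to the moment playback started, so it stays locked to the audio clock rather than drifting with frame timing. The clock, element, and frame scheduler are passed in as minimal interfaces so the loop can be tested without a browser. In the app they would be the real `AudioContext`, the playhead `div`, and wrappers around `requestAnimationFrame`/`cancelAnimationFrame`.

```typescript
type Clock = { readonly currentTime: number };
type Positioned = { style: { left: string } };
type FrameScheduler = {
  request: (cb: () => void) => number;
  cancel: (id: number) => void;
};

/**
 * Move the playhead element every frame without touching React state.
 * `ctxStart` is the AudioContext time at which `playheadAtStart` is audible.
 * Returns a stop function.
 */
function startPlayheadLoop(
  el: Positioned,
  clock: Clock,
  ctxStart: number,
  playheadAtStart: number,
  pixelsPerSecond: number,
  frames: FrameScheduler
): () => void {
  let frameId = 0;
  let running = true;
  const tick = () => {
    if (!running) return;
    const t = playheadAtStart + Math.max(0, clock.currentTime - ctxStart);
    el.style.left = `${t * pixelsPerSecond}px`;
    frameId = frames.request(tick);
  };
  tick();
  return () => {
    running = false;
    frames.cancel(frameId);
  };
}

// In the browser (sketch). Wrap rAF in arrows: calling it as a method of another object throws.
// const stop = startPlayheadLoop(playheadDiv, audioCtx, ctxStart, playheadTime, pps, {
//   request: (cb) => requestAnimationFrame(cb),
//   cancel: (id) => cancelAnimationFrame(id),
// });
```

On pause, the final time would be written back to Zustand once, rather than on every frame.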

### Timeline ruler

A `<canvas>` element spanning the full timeline width. Draws time markers at intervals derived from current zoom level. Rerenders on zoom change only.
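
Choosing the interval usually means picking the smallest "nice" step whose on-screen spacing reaches some minimum width. The ladder of steps, the 80 px minimum, and the `m:ss` label format below are assumptions for illustration.

```typescript
// Candidate tick intervals in seconds (hypothetical ladder).
const TICK_STEPS = [0.1, 0.2, 0.5, 1, 2, 5, 10, 15, 30, 60, 120, 300, 600];

function chooseTickStep(pixelsPerSecond: number, minPx = 80): number {
  for (const step of TICK_STEPS) {
    if (step * pixelsPerSecond >= minPx) return step;
  }
  return TICK_STEPS[TICK_STEPS.length - 1];
}

/** Format seconds as m:ss, or m:ss.s when the step is sub-second. */
function formatTick(secs: number, step: number): string {
  const decimals = step < 1 ? 1 : 0;
  const m = Math.floor(secs / 60);
  const s = (secs - m * 60).toFixed(decimals).padStart(decimals ? 4 : 2, "0");
  return `${m}:${s}`;
}

/** Tick x-positions and labels across a ruler covering [0, durationSecs]. */
function rulerTicks(durationSecs: number, pixelsPerSecond: number): { x: number; label: string }[] {
  const step = chooseTickStep(pixelsPerSecond);
  const ticks: { x: number; label: string }[] = [];
  // Multiply an integer index instead of accumulating `t += step`, which drifts with float error.
  for (let i = 0; i * step <= durationSecs; i++) {
    const t = i * step;
    ticks.push({ x: t * pixelsPerSecond, label: formatTick(t, step) });
  }
  return ticks;
}
```

The ruler's canvas draw routine would then draw one short line and one label per entry.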

---

## 11. Playback Model

Studio preview uses the **Web Audio API**, not HTML5 `<audio>`.

### Graph

```
AudioBufferSourceNode (per clip)
  → GainNode (clip gain + fades)
  → GainNode (track gain)
  → GainNode (master gain)
  → AudioContext.destination
```

### Scheduling

On play, the engine:

1. Reads the current `playheadTime` from Zustand and fixes one shared anchor, `ctxStart = audioCtx.currentTime + 0.01`, so every source is scheduled against the same clock reading and all clips start in sync.
2. For each clip that has not yet finished (`clip.startTime + (clip.sourceEnd - clip.sourceStart) > playheadTime`):
   - If the playhead is already inside the clip, starts it immediately (`when = ctxStart`) at `offset = clip.sourceStart + (playheadTime - clip.startTime)`.
   - If the clip lies ahead of the playhead, delays it (`when = ctxStart + (clip.startTime - playheadTime)`) and plays from `offset = clip.sourceStart`.
   - Calls `source.start(when, offset, clip.sourceEnd - offset)`, so each source stops at the clip's trim end rather than at the end of the asset.

On stop or seek, all sources are stopped and disconnected and the graph is torn down. A new graph is built on the next play call.
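
The scheduling logic can be kept in a pure function that turns clips and the playhead position into `start()` arguments, which makes it unit-testable without an `AudioContext`. It starts clips already under the playhead immediately at the right offset, delays clips that lie ahead, and stops each at its trim end. `planPlayback` and `ScheduledStart` are hypothetical names; the Web Audio calls that would consume the plan are shown as comments.

```typescript
type PlayableClip = { id: string; startTime: number; sourceStart: number; sourceEnd: number };
type ScheduledStart = { clipId: string; when: number; offset: number; duration: number };

/**
 * Compute AudioBufferSourceNode.start(when, offset, duration) arguments.
 * `ctxStart` is the shared anchor, e.g. audioCtx.currentTime + 0.01.
 */
function planPlayback(clips: PlayableClip[], playheadTime: number, ctxStart: number): ScheduledStart[] {
  const plan: ScheduledStart[] = [];
  for (const clip of clips) {
    const clipEnd = clip.startTime + (clip.sourceEnd - clip.sourceStart);
    if (clipEnd <= playheadTime) continue; // already finished
    const intoClip = Math.max(0, playheadTime - clip.startTime); // > 0 only if the playhead is inside the clip
    const offset = clip.sourceStart + intoClip;
    const when = ctxStart + Math.max(0, clip.startTime - playheadTime);
    plan.push({ clipId: clip.id, when, offset, duration: clip.sourceEnd - offset });
  }
  return plan;
}

// Consuming the plan (sketch; the lookups are hypothetical):
// for (const p of planPlayback(clips, playheadTime, audioCtx.currentTime + 0.01)) {
//   const source = audioCtx.createBufferSource();
//   source.buffer = bufferForClip(p.clipId);
//   source.connect(clipGainNodeFor(p.clipId));
//   source.start(p.when, p.offset, p.duration);
// }
```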

### Audio buffer cache

Fetched WAV files are decoded to `AudioBuffer` via `AudioContext.decodeAudioData()` and cached by asset ID. Cache is invalidated if the asset is deleted or replaced.
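
Caching the promise rather than the decoded buffer also deduplicates concurrent requests: two clips that reference the same asset trigger a single fetch and decode. Here is a sketch with the loader injected, so the cache itself has no Web Audio dependency. The `BufferCache` name and the audio URL helper in the usage comment are hypothetical.

```typescript
/** Per-asset cache of in-flight or completed decodes, keyed by asset ID. */
class BufferCache<B> {
  private readonly entries = new Map<string, Promise<B>>();
  private readonly load: (assetId: string) => Promise<B>;

  constructor(load: (assetId: string) => Promise<B>) {
    this.load = load;
  }

  get(assetId: string): Promise<B> {
    const cached = this.entries.get(assetId);
    if (cached) return cached;
    const pending = this.load(assetId);
    this.entries.set(assetId, pending);
    // Forget failed loads so a later call can retry, unless the entry was already replaced.
    pending.catch(() => {
      if (this.entries.get(assetId) === pending) this.entries.delete(assetId);
    });
    return pending;
  }

  /** Call when an asset is deleted or replaced. */
  invalidate(assetId: string): void {
    this.entries.delete(assetId);
  }
}

// Browser wiring (sketch; assetAudioUrl is hypothetical):
// const cache = new BufferCache(async (id) => {
//   const res = await fetch(assetAudioUrl(id));
//   return audioCtx.decodeAudioData(await res.arrayBuffer());
// });
```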

### Fade implementation

Fades are implemented as `AudioParam` ramps on the clip GainNode:

```ts
// clipGain is linear: 10 ** (clip.gainDb / 20).
// startTime / endTime are the AudioContext times of the clip's first and last audible sample.
gainNode.gain.setValueAtTime(0, startTime);
gainNode.gain.linearRampToValueAtTime(clipGain, startTime + fadeInSecs);
gainNode.gain.setValueAtTime(clipGain, endTime - fadeOutSecs);
gainNode.gain.linearRampToValueAtTime(0, endTime);
```
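
The ramps above assume playback starts at or before the clip's first sample. When the playhead starts inside a fade, the ramp's earlier anchor is already in the past, so the engine needs the envelope value at an arbitrary point. A pure envelope function provides that value. The same function could also serve as the reference for the linear fades in step 4e of the render pipeline, so that preview and export agree. `clipGainAt` and `dbToLinear` are hypothetical helpers.

```typescript
type FadedClip = { sourceStart: number; sourceEnd: number; gainDb: number; fadeInMs: number; fadeOutMs: number };

const dbToLinear = (db: number): number => 10 ** (db / 20);

/** Linear gain for a point `t` seconds into the clip's audible (trimmed) region. */
function clipGainAt(t: number, clip: FadedClip): number {
  const length = clip.sourceEnd - clip.sourceStart;
  if (t < 0 || t > length) return 0;
  const fadeIn = clip.fadeInMs / 1000;
  const fadeOut = clip.fadeOutMs / 1000;
  let envelope = 1;
  if (fadeIn > 0 && t < fadeIn) envelope = Math.min(envelope, t / fadeIn);
  if (fadeOut > 0 && t > length - fadeOut) envelope = Math.min(envelope, (length - t) / fadeOut);
  return dbToLinear(clip.gainDb) * envelope;
}

// Starting mid-clip (sketch): pin the current value, then schedule only the remaining ramps.
// gainNode.gain.setValueAtTime(clipGainAt(intoClip, clip), when);
```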

---

## 12. Export Rendering Model

When the user triggers export, the frontend sends a render request to the Python backend. The browser is never involved in mixing.

### Request payload

```ts
type RenderRequest = {
  projectId: string;
  tracks: Track[];
  clips: TimelineClip[];
  assets: Array<{ id: string; filePath: string }>;
  settings: RenderSettings;
};
```
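
Since `Track` already nests its clips, the payload can be derived from a saved `StudioProject`: flatten the clips, keep only the assets they reference, and fail early if a clip points at a missing asset. The sketch below uses trimmed structural types. The sort order and the error on a dangling `assetId` are assumptions.

```typescript
type AssetRef = { id: string; filePath: string };
type ClipRef = { id: string; assetId: string; startTime: number };
type ProjectLike<T extends { clips: ClipRef[] }, S> = {
  id: string;
  assets: AssetRef[];
  tracks: T[];
  renderSettings: S;
};

function buildRenderRequest<T extends { clips: ClipRef[] }, S>(project: ProjectLike<T, S>) {
  const clips = project.tracks
    .flatMap((t) => t.clips)
    .sort((a, b) => a.startTime - b.startTime); // render pipeline processes clips by startTime
  const used = new Set(clips.map((c) => c.assetId));
  const assets = project.assets
    .filter((a) => used.has(a.id))
    .map(({ id, filePath }) => ({ id, filePath })); // send only what the renderer needs
  const missing = Array.from(used).filter((id) => !assets.some((a) => a.id === id));
  if (missing.length > 0) throw new Error(`Clips reference unknown assets: ${missing.join(", ")}`);
  return { projectId: project.id, tracks: project.tracks, clips, assets, settings: project.renderSettings };
}
```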

### Python render pipeline

```python
# Pseudocode — actual implementation lives in server/render.py
def render_project(req: RenderRequest) -> str:
    # 1. Load all referenced source audio files into numpy arrays
    # 2. Determine total timeline duration (latest clip end)
    # 3. Create one zeroed buffer per track at the target sample rate
    # 4. For each clip (sorted by startTime):
    #    a. Take the source audio already loaded for clip.assetId
    #    b. Resample to the target sample rate if needed
    #    c. Apply trim (sourceStart → sourceEnd)
    #    d. Apply clip gain (dB → linear)
    #    e. Apply fade in/out (linear ramp)
    #    f. Add at the clip.startTime offset into its track's buffer
    # 5. Apply track gain to each track's summed signal
    # 6. Sum all tracks into the master buffer
    # 7. Apply master gain
    # 8. If normaliseLoudness: apply pyloudnorm to target LUFS
    # 9. Export WAV or MP3 via soundfile / ffmpeg
    # 10. Write to data/projects/<id>/renders/render_<timestamp>.<wav|mp3>
    # 11. Return file path
    ...
```
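
The core of steps 4 to 7 is sample arithmetic. It is shown here in TypeScript to match the other examples in this plan; `server/render.py` would do the equivalent with numpy. The sketch rests on several assumptions, as do its names. Every source is taken as already decoded to mono at the target sample rate (steps 1 and 4b). Mute and solo follow the usual editor convention: if any track is soloed, only soloed tracks are heard. The output length is computed from all clips, so muting a track never changes the export length.

```typescript
type MixClip = {
  assetId: string; startTime: number; sourceStart: number; sourceEnd: number;
  gainDb: number; fadeInMs: number; fadeOutMs: number;
};
type MixTrack = { gainDb: number; muted: boolean; solo: boolean; clips: MixClip[] };

const dbToLinear = (db: number): number => 10 ** (db / 20);

function mixProject(tracks: MixTrack[], sources: Map<string, Float32Array>, rate: number, masterGainDb = 0): Float32Array {
  // Step 2: total duration is the latest clip end on any track.
  let endSecs = 0;
  for (const t of tracks) for (const c of t.clips) endSecs = Math.max(endSecs, c.startTime + c.sourceEnd - c.sourceStart);
  const master = new Float32Array(Math.ceil(endSecs * rate));

  const anySolo = tracks.some((t) => t.solo);
  for (const track of tracks) {
    if (track.muted || (anySolo && !track.solo)) continue;
    const bus = new Float32Array(master.length); // per-track buffer so track gain applies to its sum
    for (const clip of track.clips) {
      const src = sources.get(clip.assetId);
      if (!src) throw new Error(`Missing source for ${clip.assetId}`);
      const from = Math.round(clip.sourceStart * rate); // step 4c: trim
      const n = Math.max(0, Math.min(src.length, Math.round(clip.sourceEnd * rate)) - from);
      const at = Math.round(clip.startTime * rate);
      const gain = dbToLinear(clip.gainDb); // step 4d
      const fadeIn = Math.round((clip.fadeInMs / 1000) * rate);
      const fadeOut = Math.round((clip.fadeOutMs / 1000) * rate);
      for (let i = 0; i < n && at + i < bus.length; i++) {
        let env = 1; // step 4e: linear fades
        if (i < fadeIn) env = Math.min(env, i / fadeIn);
        if (i >= n - fadeOut) env = Math.min(env, (n - i) / fadeOut);
        bus[at + i] += src[from + i] * gain * env; // step 4f: place at startTime
      }
    }
    const trackGain = dbToLinear(track.gainDb); // steps 5 and 6
    for (let i = 0; i < bus.length; i++) master[i] += bus[i] * trackGain;
  }

  const masterGain = dbToLinear(masterGainDb); // step 7
  if (masterGain !== 1) for (let i = 0; i < master.length; i++) master[i] *= masterGain;
  return master;
}
```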

Render runs in a background thread. The client polls `GET /api/projects/:id/renders/:rid` for status.
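
A client-side polling loop might look like the following. The status values (`queued`, `rendering`, `complete`, `error`), the 1 s interval, and the 10-minute timeout are assumptions, since the plan does not yet define the render status payload. The fetcher and the wait function are injected to keep the loop testable.

```typescript
type RenderStatus = {
  status: "queued" | "rendering" | "complete" | "error";
  progress?: number; // 0..1, if the server reports it
  error?: string;
};

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

/** Poll until the render reaches a terminal state or the timeout elapses. */
async function pollRender(
  fetchStatus: () => Promise<RenderStatus>,
  onProgress: (s: RenderStatus) => void = () => {},
  intervalMs = 1000,
  timeoutMs = 10 * 60_000,
  wait: (ms: number) => Promise<void> = sleep
): Promise<RenderStatus> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const s = await fetchStatus();
    onProgress(s);
    if (s.status === "complete" || s.status === "error") return s;
    if (Date.now() >= deadline) throw new Error("Render polling timed out");
    await wait(intervalMs);
  }
}

// Usage (sketch):
// const final = await pollRender(() =>
//   fetch(`/api/projects/${projectId}/renders/${renderId}`).then((r) => r.json())
// );
```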

---

## 13. Phase Breakdown

### Phase 0 — Stabilise (current state → pre-Phase 1)

**Goal:** Clean foundation. No new features.

Tasks:

- [ ] Extract WAV assembly from `useStreamingGeneration.ts` into `web/lib/audio/wav.ts`
- [ ] Extract waveform peak generation into `server/waveform.py`
- [ ] Confirm generation cancellation works cleanly (stream abort + server cancel_event)
- [ ] Add `nanoid` to backend for stable generation IDs
- [ ] Add `data/` directory to `.gitignore`

**Acceptance:** WAV assembly is a pure function with unit tests. Generation IDs are stable.

---

### Phase 1 — Persistent Generation Library

**Goal:** Every generation is saved. Users can browse, play, and download past generations.

**Backend tasks:**

- [ ] Add SQLite setup (`data/db/vibepod.db`, schema migration 001)
- [ ] `generations` table: `id, created_at, status, script, speaker, cfg_scale, inference_steps, duration_secs, sample_rate, audio_path, waveform_path, error_message`
- [ ] On generation complete: save WAV to `data/generations/<id>/audio.wav`
- [ ] On generation complete: compute and save waveform peaks to `data/generations/<id>/waveform.json`
- [ ] Implement `GET /api/generations` (list, paginated)
- [ ] Implement `GET /api/generations/:id` (single)
- [ ] Implement `GET /api/generations/:id/audio` (stream file)
- [ ] Implement `GET /api/generations/:id/waveform` (peaks JSON)
- [ ] Implement `DELETE /api/generations/:id` (delete row + files)

**Frontend tasks:**

- [ ] Install `better-sqlite3` + types
- [ ] Create `web/lib/db/` — schema, migration runner, query helpers
- [ ] Create `/library` route and `LibraryPage` component
- [ ] `GenerationCard` component: waveform preview canvas, metadata, play/download/delete actions
- [ ] `WaveformPreview` component: draws peaks on canvas (static, no playback)
- [ ] Mini audio player for library card playback (reuse `useAudioPlayer` hook)
- [ ] Link "Open in Studio" button (navigates to `/studio/new?fromGeneration=<id>`)
- [ ] Add "Library" link to `Header`

**Acceptance:**

- Generate audio → close browser → reopen → generation appears in library with waveform
- Play button plays correct audio
- Delete removes from library and disk
- Library renders without error when empty
|
---
|
||||||
|
|
||||||
|
### Phase 2 — Studio MVP
|
||||||
|
|
||||||
|
**Goal:** Single-track timeline editor. Open a generation, view waveform, trim/split/delete, export WAV.
|
||||||
|
|
||||||
|
**Backend tasks:**
|
||||||
|
- [ ] `projects` table: `id, name, created_at, updated_at, project_json`
|
||||||
|
- [ ] `assets` table: `id, project_id, kind, file_path, duration_secs, sample_rate, channels, waveform_path, source_json`
|
||||||
|
- [ ] Implement `POST /api/projects`
|
||||||
|
- [ ] Implement `GET /api/projects` (list)
|
||||||
|
- [ ] Implement `GET /api/projects/:id`
|
||||||
|
- [ ] Implement `PUT /api/projects/:id` (save)
|
||||||
|
- [ ] Implement `DELETE /api/projects/:id`
|
||||||
|
- [ ] Implement `POST /api/projects/:id/assets` (import from generation or upload)
|
||||||
|
- [ ] Implement basic render endpoint (single voice track, WAV out only)
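
For a single voice track, rendering reduces to copying each clip's slice of its asset into an output buffer at the clip's timeline position, scaled by the clip's gain. The sketch assumes a clip shape (`start_secs`, `offset_secs`, `duration_secs`, `gain`) and mono float sample lists held in memory; neither is defined by this task list, and a real render would more likely work on NumPy arrays and write the result with `soundfile`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """Hypothetical clip shape: timeline position plus the slice of the asset it plays."""
    asset_id: str
    start_secs: float     # position on the timeline
    offset_secs: float    # where playback starts inside the asset
    duration_secs: float  # how much of the asset is played
    gain: float = 1.0     # linear clip gain


def render_track(clips: list[Clip], assets: dict[str, list[float]], sample_rate: int) -> list[float]:
    """Mix one track's clips into a mono float buffer (overlaps, if any, are summed)."""
    if not clips:
        return []
    end_secs = max(c.start_secs + c.duration_secs for c in clips)
    out = [0.0] * round(end_secs * sample_rate)
    for clip in clips:
        source = assets[clip.asset_id]
        dst = round(clip.start_secs * sample_rate)
        src = round(clip.offset_secs * sample_rate)
        count = round(clip.duration_secs * sample_rate)
        # Never read past the end of the asset or write past the output buffer.
        count = max(0, min(count, len(source) - src, len(out) - dst))
        for i in range(count):
            out[dst + i] += source[src + i] * clip.gain
    # Clamp to the valid float range before encoding to WAV.
    return [max(-1.0, min(1.0, s)) for s in out]
```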

**Frontend tasks:**
- [ ] Install `zustand`, `@dnd-kit/core`, `@dnd-kit/utilities`
- [ ] Create Studio Zustand store (`web/stores/studioStore.ts`)
  - Project state, selected clip, playhead time, zoom, isPlaying, undo stack
  - Actions: selectClip, moveClip, splitClip, trimClip, setClipGain, undo, redo
- [ ] Create `/projects` route and dashboard
- [ ] Create `/studio/[projectId]` route
- [ ] `StudioShell` — top-level layout
- [ ] `StudioTopBar` — project name, undo/redo, save status, export button
- [ ] `ProjectSidebar` — tabs shell + `MediaBin` tab
- [ ] `MediaBin` — list assets, drag to timeline
- [ ] `TimelineArea` — scrollable container with tracks
- [ ] `TimelineRuler` — canvas ruler, rerenders on zoom
- [ ] `TimelineTrack` — track header + clip area
- [ ] `TimelineClip` — positioned div, selectable, draggable
- [ ] `WaveformCanvas` — canvas inside clip, draws peaks
- [ ] `InspectorPanel` — shows selected clip properties
- [ ] `TransportBar` — play/pause/stop, time display, zoom slider
- [ ] Web Audio playback engine (`web/lib/audio/playbackEngine.ts`)
- [ ] Autosave: debounced PUT on every store change (500 ms delay)
- [ ] Export dialog: format picker → `POST /api/projects/:id/render` → poll → download
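
Section 14 asks for pure store actions and undo/redo over every timeline mutation. One way to get both, sketched here in Python rather than the planned TypeScript/Zustand code, is to make each edit a pure function from one immutable clip list to the next, and to keep undo/redo as two stacks of snapshots. The `Clip` fields, `split_clip` and `History` are hypothetical names, not the store's actual API.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Clip:
    id: str
    start_secs: float     # timeline position
    offset_secs: float    # start point inside the source asset
    duration_secs: float


def split_clip(clips: tuple[Clip, ...], clip_id: str, at_secs: float, new_id: str) -> tuple[Clip, ...]:
    """Split one clip at a timeline time; clips are returned unchanged if the time is outside it."""
    out: list[Clip] = []
    for c in clips:
        cut = at_secs - c.start_secs  # split point relative to the clip start
        if c.id != clip_id or not 0 < cut < c.duration_secs:
            out.append(c)
            continue
        out.append(replace(c, duration_secs=cut))
        out.append(Clip(new_id, at_secs, c.offset_secs + cut, c.duration_secs - cut))
    return tuple(out)


@dataclass
class History:
    """Snapshot-based undo/redo over immutable clip tuples."""
    present: tuple[Clip, ...]
    past: list[tuple[Clip, ...]] = field(default_factory=list)
    future: list[tuple[Clip, ...]] = field(default_factory=list)

    def apply(self, new_state: tuple[Clip, ...]) -> None:
        if new_state == self.present:
            return  # no-op edits should not create undo entries
        self.past.append(self.present)
        self.present = new_state
        self.future.clear()  # a new edit invalidates the redo branch

    def undo(self) -> None:
        if self.past:
            self.future.append(self.present)
            self.present = self.past.pop()

    def redo(self) -> None:
        if self.future:
            self.past.append(self.present)
            self.present = self.future.pop()
```

Because states are immutable, snapshots share unchanged clips, so keeping many undo steps is cheap.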

**Acceptance:**
- Open generation from library → Studio loads with waveform on single track
- Play button plays audio in sync with playhead
- Drag clip moves it on timeline
- Split at playhead creates two clips
- Trim handles reduce clip duration
- Delete removes clip
- Export produces downloadable WAV
- Undo/redo works for all operations

---

### Phase 3 — Script-Linked Regeneration

**Goal:** Script blocks are the source of truth. Clicking a clip highlights the script. Regenerating a clip produces a new take.

**Backend tasks:**
- [ ] `takes` table: `id, project_id, script_block_id, asset_id, voice_id, model_id, settings_json, created_at, rating, notes, is_active`
- [ ] Implement takes API endpoints (list, create, set active, delete)
- [ ] Waveform peak generation on regenerated takes
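
"Set active" is easiest to keep consistent if the database itself guarantees at most one active take per script block. SQLite supports partial indexes, so a unique index restricted to `is_active = 1` does this. The column list follows the `takes` task above; the types, the index name and the `set_active_take` helper are assumptions.

```python
import sqlite3

TAKES_SCHEMA = """
CREATE TABLE IF NOT EXISTS takes (
    id              TEXT PRIMARY KEY,
    project_id      TEXT NOT NULL,
    script_block_id TEXT NOT NULL,
    asset_id        TEXT NOT NULL,
    voice_id        TEXT,
    model_id        TEXT,
    settings_json   TEXT NOT NULL DEFAULT '{}',
    created_at      TEXT NOT NULL,
    rating          INTEGER,
    notes           TEXT,
    is_active       INTEGER NOT NULL DEFAULT 0 CHECK (is_active IN (0, 1))
);
-- At most one active take per script block.
CREATE UNIQUE INDEX IF NOT EXISTS takes_one_active_per_block
    ON takes (script_block_id) WHERE is_active = 1;
"""


def set_active_take(conn: sqlite3.Connection, take_id: str) -> None:
    """Make take_id the active take for its block; older takes are kept, only deactivated."""
    with conn:  # both updates commit together or not at all
        row = conn.execute("SELECT script_block_id FROM takes WHERE id = ?", (take_id,)).fetchone()
        if row is None:
            raise KeyError(take_id)
        conn.execute(
            "UPDATE takes SET is_active = 0 WHERE script_block_id = ? AND is_active = 1",
            (row[0],),
        )
        conn.execute("UPDATE takes SET is_active = 1 WHERE id = ?", (take_id,))
```

Deactivating before activating matters: doing it in the other order would trip the unique index.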

**Frontend tasks:**
- [ ] `ScriptPanel` sidebar tab — editable script blocks with speaker labels
- [ ] Script block → clip bidirectional linking (click clip → highlight block, click block → select clip)
- [ ] Clip inspector: show source script text (read-only in Phase 3)
- [ ] `RegenerateButton` in inspector — sends block text + current voice settings → new generation
- [ ] Regeneration creates new Take, new Asset, new Clip (does not replace existing clip automatically)
- [ ] `TakeStack` in inspector — list takes for selected block, click to preview, "Replace in timeline" action
- [ ] Per-block voice setting override (speaker, cfg_scale, inference_steps)

**Acceptance:**
- Clicking a clip selects the related script block in ScriptPanel
- Clicking a script block selects the clip on the timeline
- Regenerate produces a new take visible in TakeStack
- "Replace in timeline" swaps the clip's asset to the new take
- Previous take is preserved and can be restored
- Undo works across take replacements

---

### Phase 4 — Multi-Speaker Podcast Builder

**Goal:** Multiple voice tracks, music/SFX tracks, speaker assignment, show templates.

**Tasks:**
- [ ] Multiple tracks: Host, Guest, Music, SFX, Ambience
- [ ] Track type icons and colour coding per track type
- [ ] Per-track mute/solo buttons (functional in Web Audio engine)
- [ ] Track gain slider
- [ ] Speaker assignment per track (voice preset tied to track)
- [ ] Music/SFX uploads to media bin
- [ ] Basic music ducking (automatically lower the music track's gain while a voice track is playing)
- [ ] Show template: save a project's track layout + speaker assignments as a reusable template
- [ ] Template picker on new project creation
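
Mute/solo and ducking both come down to computing an effective gain per track at a given moment. The sketch follows the common mixer convention that, once any track is soloed, only soloed tracks are heard and mute always wins; it also drops music by a fixed amount whenever a voice region is active. The track kinds, the -12 dB default, the `voice_regions` input and all names are illustrative assumptions, and a production ducker would ramp the gain (attack/release) rather than switch it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    id: str
    kind: str            # assumed kinds: "voice", "music", "sfx", "ambience"
    gain: float = 1.0    # linear track gain
    muted: bool = False
    soloed: bool = False


def db_to_linear(db: float) -> float:
    return 10 ** (db / 20)


def track_gains(
    tracks: list[Track],
    t_secs: float,
    voice_regions: list[tuple[float, float]],
    duck_db: float = -12.0,
) -> dict[str, float]:
    """Effective linear gain per track at time t_secs."""
    any_solo = any(tr.soloed for tr in tracks)
    # voice_regions: (start, end) spans where any voice clip is playing.
    voice_active = any(start <= t_secs < end for start, end in voice_regions)
    gains: dict[str, float] = {}
    for tr in tracks:
        audible = not tr.muted and (tr.soloed or not any_solo)
        g = tr.gain if audible else 0.0
        if tr.kind == "music" and voice_active:
            g *= db_to_linear(duck_db)  # duck the music bed under speech
        gains[tr.id] = g
    return gains
```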

**Acceptance:**
- Two voice tracks play independently and mix correctly
- Mute/solo work
- Music bed plays under voice tracks
- Creating a new project from a saved template reproduces its track layout and speaker assignments
- Exported WAV contains all tracks mixed

---

### Phase 5 — Production Export

**Goal:** MP3 export, loudness normalisation, podcast metadata, render queue, mastering presets.

**Tasks:**
- [ ] MP3 export via FFmpeg on render backend
- [ ] `pyloudnorm` integration — LUFS targeting per preset
- [ ] Export presets: Podcast Balanced, Podcast Loud, Audiobook, Raw WAV, YouTube Audio
- [ ] ID3 metadata fields in export dialog (title, artist, episode number, cover art, description)
- [ ] Render job queue — multiple renders can be queued
- [ ] Render status polling with progress bar
- [ ] Render history panel in project
- [ ] Autosave recovery: on crash/close, restore last autosaved state on next open
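
Once integrated loudness has been measured (for example with `pyloudnorm`'s `Meter(rate).integrated_loudness(data)`), targeting a preset is arithmetic: the gain in dB is target minus measured, applied as a linear factor. The stdlib-only sketch takes the measured value as an input; the helper names are hypothetical, and the clip flag is only a warning, since a real mastering chain would follow the gain with a true-peak limiter.

```python
import math


def normalisation_gain(measured_lufs: float, target_lufs: float) -> float:
    """Linear gain that moves integrated loudness from measured_lufs to target_lufs."""
    if math.isinf(measured_lufs):
        raise ValueError("silence has no finite loudness to normalise")
    return 10 ** ((target_lufs - measured_lufs) / 20)


def apply_gain(samples: list[float], gain: float, ceiling: float = 1.0) -> tuple[list[float], bool]:
    """Scale samples and report whether any sample would exceed `ceiling`."""
    scaled = [s * gain for s in samples]
    clipped = any(abs(s) > ceiling for s in scaled)
    return scaled, clipped


def within_tolerance(result_lufs: float, target_lufs: float, tol_lu: float = 0.5) -> bool:
    """The plan's acceptance check: result within +/-0.5 LU of the target."""
    return abs(result_lufs - target_lufs) <= tol_lu
```

For example, a render measured at -22 LUFS with a -16 LUFS target needs +6 dB, a linear gain of about 1.995.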

**Acceptance:**
- MP3 export produces valid file with correct ID3 tags
- Normalised output is within ±0.5 LU of the preset's target LUFS
- Render queue processes jobs sequentially
- Recovering an autosave restores timeline to last saved state

---

## 14. Acceptance Criteria

### Cross-cutting criteria (all phases)

- No TypeScript `any` types anywhere in Studio code
- Zustand store actions are pure (no side effects except explicit async actions)
- Autosave never blocks the UI thread
- Undo/redo covers every timeline mutation
- No orphaned audio files — deleting a project deletes its files
- Waveform canvas does not rerender on playback (only on zoom/resize)
- Playhead position updates at 60 fps via `requestAnimationFrame`, not React state

---

## 15. Deferred Decisions

These are intentionally not decided yet. Revisit at the phase that needs them.

| Decision | Deferred until |
|---|---|
| Voice cloning / custom voice upload | Depends on VibeVoice roadmap |
| XTTS or ElevenLabs as second provider | Phase 3+ — only after VoiceModelProvider abstraction is proven |
| Clip crossfades (overlapping clips) | Phase 4 — requires mixing model update |
| Clip-level EQ / compression | Phase 5 |
| Per-segment emotion / style tags | Phase 3 evaluation |
| WebSocket vs SSE for render progress | Phase 5 — evaluate based on render durations seen in practice |
| IndexedDB caching of AudioBuffers | Phase 2 evaluation — only if cache miss latency is a real problem |
| Noise gate / background removal | Post-Phase 5 |
| Multi-window Studio (popout inspector etc.) | Not planned |

---

## 16. Third-Party Library Policy

VibePod is an open-source project. Any third-party library integrated into the codebase must satisfy:

1. **License:** MIT, Apache 2.0, BSD 2/3-Clause, or ISC. No GPL unless the entire application is separately GPL-licensed. No CC-NC.
2. **Attribution:** Add an entry to a `LICENSES.md` file in the repo root when integrating. Include library name, version, license type, and project URL.
3. **Size:** For frontend packages, check the bundle size on Bundlephobia (bundlephobia.com) before adding. Prefer packages under 20 kB gzipped unless there is no alternative.
4. **Maintenance:** Prefer actively maintained packages. Check the last commit date and open issue count before adding.
5. **Source code:** If copying or adapting a snippet (not a full package), add an inline comment with the source URL and license.

---

*This document is the execution specification. The high-level vision lives in `roadmap.md`. When in doubt about scope, refer to Section 4 (Non-Goals) first.*

@@ -0,0 +1,35 @@
"""Stable, URL-safe ID generation for VibePod entities."""

import secrets


def _make_id(prefix: str) -> str:
    return f"{prefix}_{secrets.token_urlsafe(8)}"


def gen_id() -> str:
    return _make_id("gen")


def proj_id() -> str:
    return _make_id("proj")


def asset_id() -> str:
    return _make_id("asset")


def track_id() -> str:
    return _make_id("track")


def clip_id() -> str:
    return _make_id("clip")


def block_id() -> str:
    return _make_id("block")


def take_id() -> str:
    return _make_id("take")
@@ -0,0 +1,77 @@
"""Waveform peak generation for VibePod.

Reads a WAV file and produces min/max peak arrays suitable for canvas rendering.
The output format matches the WaveformPeaks TypeScript type in the frontend.
"""

from __future__ import annotations

import json
from pathlib import Path

import numpy as np
import soundfile as sf


def compute_peaks(
    audio_path: str | Path,
    samples_per_pixel: int = 256,
) -> dict:
    """Compute min/max waveform peaks from a WAV file.

    Args:
        audio_path: Path to a WAV file (any bit depth, any channel count).
        samples_per_pixel: How many audio samples are condensed into one peak pair.
            At 24 kHz, 256 gives ~94 peak pairs per second of audio;
            larger values give coarser, smaller output.

    Returns:
        A dict matching the WaveformPeaks TypeScript type:
        {
            "sampleRate": int,
            "durationSecs": float,
            "channels": int,
            "samplesPerPixel": int,
            "length": int,  # number of peak pairs
            "data": {
                "min": [float, ...],  # values in [-1.0, 0.0]
                "max": [float, ...],  # values in [0.0, 1.0]
            }
        }
    """
    samples, sample_rate = sf.read(str(audio_path), dtype="float32", always_2d=True)

    # Mix to mono by averaging channels
    mono = samples.mean(axis=1)
    total_samples = len(mono)
    duration_secs = total_samples / sample_rate
    channels = samples.shape[1]

    # Pad so total_samples is divisible by samples_per_pixel
    remainder = total_samples % samples_per_pixel
    if remainder:
        pad = samples_per_pixel - remainder
        mono = np.concatenate([mono, np.zeros(pad, dtype=np.float32)])

    frames = mono.reshape(-1, samples_per_pixel)
    peak_min = np.clip(frames.min(axis=1), -1.0, 0.0).tolist()  # enforce documented range
    peak_max = np.clip(frames.max(axis=1), 0.0, 1.0).tolist()  # enforce documented range
    length = len(peak_min)

    return {
        "sampleRate": int(sample_rate),
        "durationSecs": round(duration_secs, 4),
        "channels": int(channels),
        "samplesPerPixel": samples_per_pixel,
        "length": length,
        "data": {
            "min": [round(float(v), 5) for v in peak_min],
            "max": [round(float(v), 5) for v in peak_max],
        },
    }


def write_peaks(audio_path: str | Path, output_path: str | Path, samples_per_pixel: int = 256) -> None:
    """Compute peaks and write them to a JSON file."""
    peaks = compute_peaks(audio_path, samples_per_pixel)
    Path(output_path).write_text(json.dumps(peaks, separators=(",", ":")), encoding="utf-8")
@@ -1,8 +1,8 @@
 "use client";
 
 import { useCallback, useEffect, useRef, useState } from "react";
+import { buildWav, decodeFloat32Chunk, mergeFloat32Arrays, SAMPLE_RATE } from "@/lib/audio/wav";
 
-const SAMPLE_RATE = 24_000;
 const DEFAULT_PREBUFFER_SECS = 5.0;
 const DEFAULT_REBUFFER_THRESHOLD_SECS = 1.0;
 const DEFAULT_RESUME_THRESHOLD_SECS = 3.0;
@@ -30,53 +30,6 @@ interface UseStreamingGenerationOptions {
   resumeThresholdSecs?: number;
 }
 
-function mergeFloat32Arrays(chunks: Float32Array<ArrayBuffer>[]): Float32Array<ArrayBuffer> {
-  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
-  const out = new Float32Array(total);
-  let offset = 0;
-  for (const chunk of chunks) {
-    out.set(chunk, offset);
-    offset += chunk.length;
-  }
-  return out;
-}
-
-function buildWav(samples: Float32Array<ArrayBuffer>, sampleRate: number): Blob {
-  const dataSize = samples.length * 4;
-  const buffer = new ArrayBuffer(44 + dataSize);
-  const view = new DataView(buffer);
-  const writeString = (offset: number, value: string) => {
-    for (let i = 0; i < value.length; i += 1) {
-      view.setUint8(offset + i, value.charCodeAt(i));
-    }
-  };
-
-  writeString(0, "RIFF");
-  view.setUint32(4, 36 + dataSize, true);
-  writeString(8, "WAVE");
-  writeString(12, "fmt ");
-  view.setUint32(16, 16, true);
-  view.setUint16(20, 3, true);
-  view.setUint16(22, 1, true);
-  view.setUint32(24, sampleRate, true);
-  view.setUint32(28, sampleRate * 4, true);
-  view.setUint16(32, 4, true);
-  view.setUint16(34, 32, true);
-  writeString(36, "data");
-  view.setUint32(40, dataSize, true);
-  new Float32Array(buffer, 44).set(samples);
-  return new Blob([buffer], { type: "audio/wav" });
-}
-
-function decodeFloat32Chunk(data: string): Float32Array<ArrayBuffer> {
-  const raw = atob(data);
-  const bytes = new Uint8Array(raw.length);
-  for (let i = 0; i < raw.length; i += 1) {
-    bytes[i] = raw.charCodeAt(i);
-  }
-  return new Float32Array(bytes.buffer as ArrayBuffer);
-}
-
 export function useStreamingGeneration({
   onLog,
   onStart,

@@ -0,0 +1,56 @@
export const SAMPLE_RATE = 24_000;

export function decodeFloat32Chunk(data: string): Float32Array<ArrayBuffer> {
  const raw = atob(data);
  const bytes = new Uint8Array(raw.length);
  for (let i = 0; i < raw.length; i += 1) {
    bytes[i] = raw.charCodeAt(i);
  }
  return new Float32Array(bytes.buffer as ArrayBuffer);
}

export function mergeFloat32Arrays(
  chunks: Float32Array<ArrayBuffer>[]
): Float32Array<ArrayBuffer> {
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const out = new Float32Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}

/** Builds a float32 RIFF/WAV Blob from raw PCM samples. */
export function buildWav(
  samples: Float32Array<ArrayBuffer>,
  sampleRate: number
): Blob {
  const dataSize = samples.length * 4;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeString = (offset: number, value: string) => {
    for (let i = 0; i < value.length; i += 1) {
      view.setUint8(offset + i, value.charCodeAt(i));
    }
  };

  writeString(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true);
  writeString(8, "WAVE");
  writeString(12, "fmt ");
  view.setUint32(16, 16, true);
  view.setUint16(20, 3, true); // format tag 3 = IEEE float
  view.setUint16(22, 1, true); // mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 4, true); // byte rate
  view.setUint16(32, 4, true); // block align
  view.setUint16(34, 32, true); // bits per sample
  writeString(36, "data");
  view.setUint32(40, dataSize, true);
  new Float32Array(buffer, 44).set(samples);

  return new Blob([buffer], { type: "audio/wav" });
}
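
To check the 44-byte header that `buildWav` writes (for example from a backend test), the same layout can be packed and unpacked with Python's `struct` module. This is a sketch mirroring the TypeScript above, not part of the commit; `build_float32_wav` and `parse_float32_wav` are hypothetical names, and the parser only accepts this exact layout with no extra chunks.

```python
import struct

FORMAT_IEEE_FLOAT = 3
# 44 bytes, little-endian, same field order and offsets as buildWav.
HEADER = struct.Struct("<4sI4s4sIHHIIHH4sI")


def build_float32_wav(samples: list[float], sample_rate: int) -> bytes:
    """Mono float32 WAV bytes with the header layout buildWav writes."""
    data = struct.pack(f"<{len(samples)}f", *samples)
    header = HEADER.pack(
        b"RIFF", 36 + len(data), b"WAVE",
        b"fmt ", 16, FORMAT_IEEE_FLOAT, 1,      # fmt chunk size, format tag, channels
        sample_rate, sample_rate * 4, 4, 32,    # sample rate, byte rate, block align, bits
        b"data", len(data),
    )
    return header + data


def parse_float32_wav(blob: bytes) -> tuple[int, list[float]]:
    """Inverse of build_float32_wav for this exact 44-byte layout."""
    (riff_id, _riff_size, wave_id, fmt_id, _fmt_size, tag, channels,
     rate, _byte_rate, _block_align, bits, data_id, data_size) = HEADER.unpack_from(blob)
    if (riff_id, wave_id, fmt_id, data_id) != (b"RIFF", b"WAVE", b"fmt ", b"data"):
        raise ValueError("not a RIFF/WAVE file with the expected chunk layout")
    if (tag, channels, bits) != (FORMAT_IEEE_FLOAT, 1, 32):
        raise ValueError("expected mono 32-bit float WAV")
    count = data_size // 4
    return rate, list(struct.unpack_from(f"<{count}f", blob, HEADER.size))
```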