style: apply prettier formatting across all source files

2026-05-01 18:36:42 +01:00
parent d60c5ae498
commit a351910fd2
15 changed files with 376 additions and 318 deletions
+18 -10
@@ -8,10 +8,10 @@ This file gives AI coding agents (Jules, Copilot, Claude Code, etc.) the context
VibePod is a text-to-speech web app. It has two services that must both run for the app to work: VibePod is a text-to-speech web app. It has two services that must both run for the app to work:
| Service | Language | Entry point | Port | | Service | Language | Entry point | Port |
|---------|----------|-------------|------| | ---------- | ---------------------------------- | ------------------------------- | ---- |
| **server** | Python 3.10+ (FastAPI + VibeVoice) | `server/start.sh` | 8000 | | **server** | Python 3.10+ (FastAPI + VibeVoice) | `server/start.sh` | 8000 |
| **web** | TypeScript (Next.js 15, React 19) | `pnpm --filter vibepod-web dev` | 3000 | | **web** | TypeScript (Next.js 15, React 19) | `pnpm --filter vibepod-web dev` | 3000 |
The Next.js frontend proxies all model requests through its own API routes to the FastAPI server — it never calls the Python server directly from the browser. The Next.js frontend proxies all model requests through its own API routes to the FastAPI server — it never calls the Python server directly from the browser.
@@ -51,12 +51,12 @@ pnpm build
The `--cpu` flag in `start.sh` sets `VIBEPOD_DEVICE=cpu` and uses a separate venv (`server/.venv-cpu`) so CUDA and CPU installs never conflict. `vibevoice_server.py` reads `VIBEPOD_DEVICE` at startup via `_resolve_device()` — do not remove or rename that function. The `--cpu` flag in `start.sh` sets `VIBEPOD_DEVICE=cpu` and uses a separate venv (`server/.venv-cpu`) so CUDA and CPU installs never conflict. `vibevoice_server.py` reads `VIBEPOD_DEVICE` at startup via `_resolve_device()` — do not remove or rename that function.
| Env var | Values | Set by | | Env var | Values | Set by |
|---------|--------|--------| | ------------------------ | ----------------------- | --------------------------- |
| `VIBEPOD_DEVICE` | `cpu` \| `cuda` | `server/start.sh` | | `VIBEPOD_DEVICE` | `cpu` \| `cuda` | `server/start.sh` |
| `UV_PROJECT_ENVIRONMENT` | `.venv-cpu` \| `.venv` | `server/start.sh` | | `UV_PROJECT_ENVIRONMENT` | `.venv-cpu` \| `.venv` | `server/start.sh` |
| `HF_TOKEN` | HuggingFace token | Jules secret / `.env.local` | | `HF_TOKEN` | HuggingFace token | Jules secret / `.env.local` |
| `VIBEVOICE_SERVER_URL` | `http://localhost:8000` | `.env.local` | | `VIBEVOICE_SERVER_URL` | `http://localhost:8000` | `.env.local` |
--- ---
@@ -94,7 +94,9 @@ dev.sh Concurrent launcher (forwards flags to start.sh)
## API reference ## API reference
### `GET /health` ### `GET /health`
Returns server status. Safe to poll. Returns server status. Safe to poll.
```json ```json
{ {
"status": "online", "status": "online",
@@ -103,13 +105,17 @@ Returns server status. Safe to poll.
"voices": ["carter", "davis", "emma", "frank", "grace", "mike"] "voices": ["carter", "davis", "emma", "frank", "grace", "mike"]
} }
``` ```
`status` values: `downloading` | `loading` | `online` | `error` `status` values: `downloading` | `loading` | `online` | `error`
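As an illustration of consuming the `status` field, the TypeScript sketch below narrows an arbitrary `/health` payload to the four values listed above. The names `ServerStatus`, `isServerStatus`, and `readStatus` are hypothetical and not taken from the VibePod codebase; only the `status` key and its values come from this document.

```typescript
// Hypothetical helper names; only the status values come from the docs above.
const SERVER_STATUSES = ["downloading", "loading", "online", "error"] as const;
type ServerStatus = (typeof SERVER_STATUSES)[number];

// Type guard: true only for one of the four documented status strings.
function isServerStatus(value: unknown): value is ServerStatus {
  return (
    typeof value === "string" &&
    (SERVER_STATUSES as readonly string[]).indexOf(value) !== -1
  );
}

// Returns the status if the payload carries a known value, otherwise null.
function readStatus(payload: unknown): ServerStatus | null {
  if (typeof payload !== "object" || payload === null) return null;
  const status = (payload as Record<string, unknown>).status;
  return isServerStatus(status) ? status : null;
}
```

A caller polling `/health` could treat a `null` result as "unknown" rather than assuming the server is online.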
### `POST /generate` ### `POST /generate`
Streams audio as SSE events. Streams audio as SSE events.
```json ```json
{ "text": "Hello world", "speaker": "carter", "cfg_scale": 1.5, "inference_steps": 10 } { "text": "Hello world", "speaker": "carter", "cfg_scale": 1.5, "inference_steps": 10 }
``` ```
Event types: `audio_chunk` (base64 float32 PCM) | `complete` | `error` | `cancelled` Event types: `audio_chunk` (base64 float32 PCM) | `complete` | `error` | `cancelled`
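To make the `audio_chunk` encoding concrete, here is a minimal TypeScript sketch that turns a base64 string into `Float32Array` samples. It assumes little-endian byte order, which this document does not state, and the function name `decodeFloat32Pcm` is hypothetical. How the base64 string is carried inside each SSE event is not specified here, so the sketch starts from the already-extracted string.

```typescript
// Hypothetical helper: decode base64-encoded float32 PCM into samples.
// Assumption: samples are little-endian (not confirmed by the docs above).
function decodeFloat32Pcm(b64: string): Float32Array {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  // Ignore trailing bytes that do not form a complete 4-byte sample.
  const sampleCount = Math.floor(bytes.length / 4);
  const view = new DataView(bytes.buffer);
  const samples = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    samples[i] = view.getFloat32(i * 4, true); // true = little-endian
  }
  return samples;
}
```

Reading through a `DataView` keeps the byte order explicit and avoids alignment problems if the decoded buffer is later sliced at an offset that is not a multiple of four.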
--- ---
@@ -117,12 +123,14 @@ Event types: `audio_chunk` (base64 float32 PCM) | `complete` | `error` | `cancel
## Do / Don't ## Do / Don't
**Do:** **Do:**
- Use `pnpm dev:cpu` in Jules — never plain `pnpm dev` - Use `pnpm dev:cpu` in Jules — never plain `pnpm dev`
- Run `git checkout server/uv.lock` if uv rewrites it during setup - Run `git checkout server/uv.lock` if uv rewrites it during setup
- Keep `_resolve_device()` in `vibevoice_server.py` — it's the CPU/CUDA switching logic - Keep `_resolve_device()` in `vibevoice_server.py` — it's the CPU/CUDA switching logic
- Test server changes against `GET /health` and `POST /generate` - Test server changes against `GET /health` and `POST /generate`
**Don't:** **Don't:**
- Run `uv sync` without `UV_PROJECT_ENVIRONMENT=.venv-cpu` in the Jules sandbox - Run `uv sync` without `UV_PROJECT_ENVIRONMENT=.venv-cpu` in the Jules sandbox
- Install Python packages with pip - Install Python packages with pip
- Modify `server/uv.lock` manually - Modify `server/uv.lock` manually
+5
@@ -173,16 +173,21 @@ The shape language is a hybrid of structural precision and tactile softness.
## Components ## Components
### Card Containers ### Card Containers
The fundamental building block of the UI. Every distinct section (Script, Player, Controls, Logs) is housed in a card featuring the `card-bg`, a 1px `border`, and `rounded-xl` corners. The internal layout always features an uppercase teal header for immediate section identification. The fundamental building block of the UI. Every distinct section (Script, Player, Controls, Logs) is housed in a card featuring the `card-bg`, a 1px `border`, and `rounded-xl` corners. The internal layout always features an uppercase teal header for immediate section identification.
### Primary Action Buttons ### Primary Action Buttons
Used for high-leverage actions like "Generate Audio" and "Play/Pause." These buttons utilize the `gradient-primary-dim` background, bold white text, and emit a soft teal glow to draw the eye and signify their importance. Used for high-leverage actions like "Generate Audio" and "Play/Pause." These buttons utilize the `gradient-primary-dim` background, bold white text, and emit a soft teal glow to draw the eye and signify their importance.
### Range Sliders ### Range Sliders
Custom-styled input ranges replace default browser styles. The tracks are muted and slim, while the thumbs are bright teal, fully rounded, and emit a glow that intensifies on hover, providing a premium, tactile scrubbing experience. Custom-styled input ranges replace default browser styles. The tracks are muted and slim, while the thumbs are bright teal, fully rounded, and emit a glow that intensifies on hover, providing a premium, tactile scrubbing experience.
### Status Indicators & Logs ### Status Indicators & Logs
A critical component of the application. Status badges utilize a minimalist pill shape with a pulsing ring animation to indicate active server processing. The log panel explicitly uses monospace typography and color-codes messages (green for success, red for error, white for neutral) to provide a terminal-like readout of the backend systems. A critical component of the application. Status badges utilize a minimalist pill shape with a pulsing ring animation to indicate active server processing. The log panel explicitly uses monospace typography and color-codes messages (green for success, red for error, white for neutral) to provide a terminal-like readout of the backend systems.
### Gradients ### Gradients
Gradients are used purposefully to indicate progress, activity, or brand presence. The primary gradient (`135deg` from teal to violet) is used for branding (the logo icon and text) and primary buttons. Horizontal gradients (`90deg`) are used dynamically in progress bars to represent the flow of data over time (e.g., loading, downloading, and audio generation). Gradients are used purposefully to indicate progress, activity, or brand presence. The primary gradient (`135deg` from teal to violet) is used for branding (the logo icon and text) and primary buttons. Horizontal gradients (`90deg`) are used dynamically in progress bars to represent the flow of data over time (e.g., loading, downloading, and audio generation).
+18 -18
@@ -14,12 +14,12 @@ The Next.js app proxies audio generation requests to the FastAPI server, keeping
## Prerequisites ## Prerequisites
| Tool | Install | | Tool | Install |
|------|---------| | ---------------------------------- | ----------------------------------- |
| [Node.js 20+](https://nodejs.org) | `winget install OpenJS.NodeJS.LTS` | | [Node.js 20+](https://nodejs.org) | `winget install OpenJS.NodeJS.LTS` |
| [pnpm](https://pnpm.io) | `npm i -g pnpm` | | [pnpm](https://pnpm.io) | `npm i -g pnpm` |
| [Python 3.10+](https://python.org) | `winget install Python.Python.3.13` | | [Python 3.10+](https://python.org) | `winget install Python.Python.3.13` |
| [uv](https://docs.astral.sh/uv/) | `winget install astral-sh.uv` | | [uv](https://docs.astral.sh/uv/) | `winget install astral-sh.uv` |
## Getting started ## Getting started
@@ -50,10 +50,10 @@ The frontend shows a loading indicator while the model downloads. Once the serve
VibePod maintains two completely separate Python virtual environments so CUDA and CPU torch installs never conflict: VibePod maintains two completely separate Python virtual environments so CUDA and CPU torch installs never conflict:
| Mode | Command | venv | torch source | | Mode | Command | venv | torch source |
|------|---------|------|--------------| | -------------- | -------------- | ------------------ | ----------------------- |
| CUDA (default) | `pnpm dev` | `server/.venv` | PyTorch CUDA 12.4 index | | CUDA (default) | `pnpm dev` | `server/.venv` | PyTorch CUDA 12.4 index |
| CPU-only | `pnpm dev:cpu` | `server/.venv-cpu` | PyPI (CPU wheel) | | CPU-only | `pnpm dev:cpu` | `server/.venv-cpu` | PyPI (CPU wheel) |
On first run, each mode creates its own venv automatically. You can switch between them freely — they are fully independent. The active device is reported by the `/health` endpoint as `"device": "cpu"` or `"device": "cuda"`. On first run, each mode creates its own venv automatically. You can switch between them freely — they are fully independent. The active device is reported by the `/health` endpoint as `"device": "cpu"` or `"device": "cuda"`.
@@ -74,11 +74,11 @@ pnpm build # Production build of the frontend
Copy `.env.example` to `.env.local` and set: Copy `.env.example` to `.env.local` and set:
| Variable | Default | Description | | Variable | Default | Description |
|----------|---------|-------------| | ---------------------- | ----------------------- | --------------------------------------------------------- |
| `VIBEVOICE_SERVER_URL` | `http://localhost:8000` | URL the Next.js API routes use to reach the Python server | | `VIBEVOICE_SERVER_URL` | `http://localhost:8000` | URL the Next.js API routes use to reach the Python server |
| `HF_TOKEN` | — | HuggingFace token (required if the model repo is gated) | | `HF_TOKEN` | — | HuggingFace token (required if the model repo is gated) |
| `HF_HOME` | — | Override the HuggingFace model cache directory | | `HF_HOME` | — | Override the HuggingFace model cache directory |
## Project structure ## Project structure
@@ -107,11 +107,11 @@ server/
## Generation parameters ## Generation parameters
| Parameter | Range | Default | Effect | | Parameter | Range | Default | Effect |
|-----------|-------|---------|--------| | ----------------- | --------------------------------------------------- | -------- | ---------------------------------------------- |
| `speaker` | `carter`, `davis`, `emma`, `frank`, `grace`, `mike` | `carter` | Voice preset used for the generated audio | | `speaker` | `carter`, `davis`, `emma`, `frank`, `grace`, `mike` | `carter` | Voice preset used for the generated audio |
| `cfg_scale` | 0.5 – 4.0 | 1.5 | Higher = more expressive guidance | | `cfg_scale` | 0.5 – 4.0 | 1.5 | Higher = more expressive guidance |
| `inference_steps` | 5 – 20 | 10 | More steps = higher quality, slower generation | | `inference_steps` | 5 – 20 | 10 | More steps = higher quality, slower generation |
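The ranges and defaults in this table can be applied on the client before a request is sent. The TypeScript sketch below is one way to do that; `GenerationParams`, `clamp`, and `normalizeParams` are hypothetical names, and rounding `inference_steps` to an integer is an assumption. This document does not say whether the server clamps or rejects out-of-range values.

```typescript
// Hypothetical names; ranges and defaults are copied from the table above.
interface GenerationParams {
  cfg_scale: number;
  inference_steps: number;
}

// Restrict a number to the inclusive range [min, max].
function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Fill in the documented defaults, then clamp to the documented ranges.
function normalizeParams(input: Partial<GenerationParams>): GenerationParams {
  const cfgScale = input.cfg_scale ?? 1.5;
  const steps = input.inference_steps ?? 10;
  return {
    cfg_scale: clamp(cfgScale, 0.5, 4.0),
    // Assumption: step counts are whole numbers.
    inference_steps: Math.round(clamp(steps, 5, 20)),
  };
}
```

For example, `normalizeParams({ cfg_scale: 9 })` yields a `cfg_scale` of 4.0 and the default 10 steps.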
## How it works ## How it works
+1 -1
@@ -1,2 +1,2 @@
packages: packages:
- 'web' - "web"
+2 -2
@@ -7,7 +7,7 @@ export async function POST(request: NextRequest) {
const pythonServerUrl = process.env.VIBEVOICE_SERVER_URL ?? "http://localhost:8000"; const pythonServerUrl = process.env.VIBEVOICE_SERVER_URL ?? "http://localhost:8000";
try { try {
const body = await request.json() as { const body = (await request.json()) as {
text: string; text: string;
speaker?: string; speaker?: string;
cfg_scale?: number; cfg_scale?: number;
@@ -41,7 +41,7 @@ export async function POST(request: NextRequest) {
headers: { headers: {
"Content-Type": "text/event-stream", "Content-Type": "text/event-stream",
"Cache-Control": "no-cache, no-transform", "Cache-Control": "no-cache, no-transform",
"Connection": "keep-alive", Connection: "keep-alive",
"X-Content-Type-Options": "nosniff", "X-Content-Type-Options": "nosniff",
"X-Accel-Buffering": "no", "X-Accel-Buffering": "no",
}, },
+1 -2
@@ -4,8 +4,7 @@ const OFFLINE_RESPONSE = { status: "offline" };
const COMMON_OPTIONS = { headers: { "Cache-Control": "no-store" } }; const COMMON_OPTIONS = { headers: { "Cache-Control": "no-store" } };
export async function GET() { export async function GET() {
const pythonServerUrl = const pythonServerUrl = process.env.VIBEVOICE_SERVER_URL ?? "http://localhost:8000";
process.env.VIBEVOICE_SERVER_URL ?? "http://localhost:8000";
try { try {
const res = await fetch(`${pythonServerUrl}/health`, { const res = await fetch(`${pythonServerUrl}/health`, {
+4 -2
@@ -12,8 +12,10 @@
--muted: #64748b; --muted: #64748b;
--success: #22c55e; --success: #22c55e;
--error: #ef4444; --error: #ef4444;
--font-sans: ui-sans-serif, system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif; --font-sans:
--font-mono: ui-monospace, SFMono-Regular, "SF Mono", Menlo, Consolas, "Liberation Mono", monospace; ui-sans-serif, system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif;
--font-mono:
ui-monospace, SFMono-Regular, "SF Mono", Menlo, Consolas, "Liberation Mono", monospace;
} }
@theme inline { @theme inline {
+58 -26
@@ -69,19 +69,39 @@ type AppAction =
function reducer(state: AppState, action: AppAction): AppState { function reducer(state: AppState, action: AppAction): AppState {
switch (action.type) { switch (action.type) {
case "SET_SCRIPT": return { ...state, script: action.payload }; case "SET_SCRIPT":
case "SET_SPEAKER": return { ...state, speaker: action.payload }; return { ...state, script: action.payload };
case "SET_CFG_SCALE": return { ...state, cfgScale: action.payload }; case "SET_SPEAKER":
case "SET_INFERENCE_STEPS": return { ...state, inferenceSteps: action.payload }; return { ...state, speaker: action.payload };
case "SET_PREBUFFER_SECS": return { ...state, prebufferSecs: action.payload }; case "SET_CFG_SCALE":
case "SET_REBUFFER_THRESHOLD": return { ...state, rebufferThresholdSecs: action.payload }; return { ...state, cfgScale: action.payload };
case "SET_RESUME_THRESHOLD": return { ...state, resumeThresholdSecs: action.payload }; case "SET_INFERENCE_STEPS":
return { ...state, inferenceSteps: action.payload };
case "SET_PREBUFFER_SECS":
return { ...state, prebufferSecs: action.payload };
case "SET_REBUFFER_THRESHOLD":
return { ...state, rebufferThresholdSecs: action.payload };
case "SET_RESUME_THRESHOLD":
return { ...state, resumeThresholdSecs: action.payload };
case "START_GENERATION": case "START_GENERATION":
return { ...state, isGenerating: true, audioUrl: null, logs: [], genElapsed: 0, genPct: null }; return {
...state,
isGenerating: true,
audioUrl: null,
logs: [],
genElapsed: 0,
genPct: null,
};
case "GEN_PROGRESS": case "GEN_PROGRESS":
return { ...state, genElapsed: action.elapsed, genPct: action.pct }; return { ...state, genElapsed: action.elapsed, genPct: action.pct };
case "GENERATION_SUCCESS": case "GENERATION_SUCCESS":
return { ...state, isGenerating: false, genElapsed: 0, genPct: null, audioUrl: action.payload }; return {
...state,
isGenerating: false,
genElapsed: 0,
genPct: null,
audioUrl: action.payload,
};
case "GENERATION_CANCELLED": case "GENERATION_CANCELLED":
case "GENERATION_ERROR": case "GENERATION_ERROR":
return { ...state, isGenerating: false, genElapsed: 0, genPct: null }; return { ...state, isGenerating: false, genElapsed: 0, genPct: null };
@@ -89,21 +109,27 @@ function reducer(state: AppState, action: AppAction): AppState {
return { ...state, logs: [...state.logs, action.payload] }; return { ...state, logs: [...state.logs, action.payload] };
case "SET_SERVER_STATUS": { case "SET_SERVER_STATUS": {
const isNewConfig = !state.serverConfig && action.payload.config; const isNewConfig = !state.serverConfig && action.payload.config;
const deviceChanged = !!(state.serverConfig && action.payload.config && state.serverConfig.device !== action.payload.config.device); const deviceChanged = !!(
state.serverConfig &&
action.payload.config &&
state.serverConfig.device !== action.payload.config.device
);
const nextSteps = (isNewConfig || deviceChanged) const nextSteps =
isNewConfig || deviceChanged
? action.payload.config!.default_inference_steps ? action.payload.config!.default_inference_steps
: state.inferenceSteps; : state.inferenceSteps;
const nextPrebuffer = (isNewConfig || deviceChanged) const nextPrebuffer =
? action.payload.config!.prebuffer_secs isNewConfig || deviceChanged ? action.payload.config!.prebuffer_secs : state.prebufferSecs;
: state.prebufferSecs;
const nextRebuffer = (isNewConfig || deviceChanged) const nextRebuffer =
isNewConfig || deviceChanged
? action.payload.config!.rebuffer_threshold_secs ? action.payload.config!.rebuffer_threshold_secs
: state.rebufferThresholdSecs; : state.rebufferThresholdSecs;
const nextResume = (isNewConfig || deviceChanged) const nextResume =
isNewConfig || deviceChanged
? action.payload.config!.resume_threshold_secs ? action.payload.config!.resume_threshold_secs
: state.resumeThresholdSecs; : state.resumeThresholdSecs;
@@ -121,7 +147,8 @@ function reducer(state: AppState, action: AppAction): AppState {
resumeThresholdSecs: nextResume, resumeThresholdSecs: nextResume,
}; };
} }
default: return state; default:
return state;
} }
} }
@@ -213,7 +240,10 @@ export default function HomePage() {
} }
poll(); poll();
return () => { cancelled = true; clearTimeout(timeoutId); }; return () => {
cancelled = true;
clearTimeout(timeoutId);
};
}, []); }, []);
const handleGenerate = useCallback(async () => { const handleGenerate = useCallback(async () => {
@@ -241,7 +271,6 @@ export default function HomePage() {
<Header /> <Header />
<main className="flex-1 container mx-auto px-4 py-6 max-w-6xl"> <main className="flex-1 container mx-auto px-4 py-6 max-w-6xl">
<div className="grid grid-cols-1 lg:grid-cols-3 gap-6"> <div className="grid grid-cols-1 lg:grid-cols-3 gap-6">
{/* Left: script + audio player */} {/* Left: script + audio player */}
<div className="lg:col-span-2 flex flex-col gap-6"> <div className="lg:col-span-2 flex flex-col gap-6">
<TextInputPanel <TextInputPanel
@@ -261,12 +290,16 @@ export default function HomePage() {
onCfgScaleChange={(v) => dispatch({ type: "SET_CFG_SCALE", payload: v })} onCfgScaleChange={(v) => dispatch({ type: "SET_CFG_SCALE", payload: v })}
inferenceSteps={state.inferenceSteps} inferenceSteps={state.inferenceSteps}
onInferenceStepsChange={(v) => dispatch({ type: "SET_INFERENCE_STEPS", payload: v })} onInferenceStepsChange={(v) => dispatch({ type: "SET_INFERENCE_STEPS", payload: v })}
prebufferSecs={state.prebufferSecs} prebufferSecs={state.prebufferSecs}
onPrebufferSecsChange={(v) => dispatch({ type: "SET_PREBUFFER_SECS", payload: v })} onPrebufferSecsChange={(v) => dispatch({ type: "SET_PREBUFFER_SECS", payload: v })}
rebufferThresholdSecs={state.rebufferThresholdSecs} rebufferThresholdSecs={state.rebufferThresholdSecs}
onRebufferThresholdChange={(v) => dispatch({ type: "SET_REBUFFER_THRESHOLD", payload: v })} onRebufferThresholdChange={(v) =>
resumeThresholdSecs={state.resumeThresholdSecs} dispatch({ type: "SET_REBUFFER_THRESHOLD", payload: v })
onResumeThresholdChange={(v) => dispatch({ type: "SET_RESUME_THRESHOLD", payload: v })} }
resumeThresholdSecs={state.resumeThresholdSecs}
onResumeThresholdChange={(v) =>
dispatch({ type: "SET_RESUME_THRESHOLD", payload: v })
}
onGenerate={handleGenerate} onGenerate={handleGenerate}
onStop={stop} onStop={stop}
onPauseStream={pauseStream} onPauseStream={pauseStream}
@@ -281,7 +314,6 @@ export default function HomePage() {
/> />
<StatusLog messages={state.logs} /> <StatusLog messages={state.logs} />
</div> </div>
</div> </div>
</main> </main>
</div> </div>
+8 -28
@@ -14,15 +14,8 @@ function formatTime(seconds: number): string {
} }
export default function AudioPlayer({ audioUrl }: AudioPlayerProps) { export default function AudioPlayer({ audioUrl }: AudioPlayerProps) {
const { const { isPlaying, currentTime, duration, volume, toggle, seek, setVolume } =
isPlaying, useAudioPlayer(audioUrl);
currentTime,
duration,
volume,
toggle,
seek,
setVolume,
} = useAudioPlayer(audioUrl);
if (!audioUrl) return null; if (!audioUrl) return null;
@@ -56,12 +49,10 @@ export default function AudioPlayer({ audioUrl }: AudioPlayerProps) {
background: "rgba(45, 212, 191, 0.05)", background: "rgba(45, 212, 191, 0.05)",
}} }}
onMouseEnter={(e) => { onMouseEnter={(e) => {
(e.currentTarget as HTMLButtonElement).style.background = (e.currentTarget as HTMLButtonElement).style.background = "rgba(45, 212, 191, 0.15)";
"rgba(45, 212, 191, 0.15)";
}} }}
onMouseLeave={(e) => { onMouseLeave={(e) => {
(e.currentTarget as HTMLButtonElement).style.background = (e.currentTarget as HTMLButtonElement).style.background = "rgba(45, 212, 191, 0.05)";
"rgba(45, 212, 191, 0.05)";
}} }}
> >
<svg <svg
@@ -115,27 +106,18 @@ export default function AudioPlayer({ audioUrl }: AudioPlayerProps) {
onClick={toggle} onClick={toggle}
className="w-10 h-10 rounded-full flex items-center justify-center transition-transform active:scale-95 cursor-pointer" className="w-10 h-10 rounded-full flex items-center justify-center transition-transform active:scale-95 cursor-pointer"
style={{ style={{
background: background: "linear-gradient(135deg, var(--accent-teal-dim), var(--accent-violet-dim))",
"linear-gradient(135deg, var(--accent-teal-dim), var(--accent-violet-dim))",
boxShadow: "0 4px 12px rgba(45, 212, 191, 0.3)", boxShadow: "0 4px 12px rgba(45, 212, 191, 0.3)",
}} }}
aria-label={isPlaying ? "Pause" : "Play"} aria-label={isPlaying ? "Pause" : "Play"}
> >
{isPlaying ? ( {isPlaying ? (
<svg <svg className="w-4 h-4 text-white" viewBox="0 0 24 24" fill="currentColor">
className="w-4 h-4 text-white"
viewBox="0 0 24 24"
fill="currentColor"
>
<rect x="6" y="4" width="4" height="16" /> <rect x="6" y="4" width="4" height="16" />
<rect x="14" y="4" width="4" height="16" /> <rect x="14" y="4" width="4" height="16" />
</svg> </svg>
) : ( ) : (
<svg <svg className="w-4 h-4 text-white" viewBox="0 0 24 24" fill="currentColor">
className="w-4 h-4 text-white"
viewBox="0 0 24 24"
fill="currentColor"
>
<polygon points="5 3 19 12 5 21 5 3" /> <polygon points="5 3 19 12 5 21 5 3" />
</svg> </svg>
)} )}
@@ -143,9 +125,7 @@ export default function AudioPlayer({ audioUrl }: AudioPlayerProps) {
{/* Duration info */} {/* Duration info */}
<div className="flex-1 flex items-center gap-1 text-sm"> <div className="flex-1 flex items-center gap-1 text-sm">
<span style={{ color: "var(--foreground)" }}> <span style={{ color: "var(--foreground)" }}>{formatTime(currentTime)}</span>
{formatTime(currentTime)}
</span>
<span style={{ color: "var(--muted)" }}>/</span> <span style={{ color: "var(--muted)" }}>/</span>
<span style={{ color: "var(--muted)" }}>{formatTime(duration)}</span> <span style={{ color: "var(--muted)" }}>{formatTime(duration)}</span>
</div> </div>
+67 -18
@@ -36,18 +36,27 @@ const STATUS_CONFIG: Record<
Exclude<ServerStatus, "online">, Exclude<ServerStatus, "online">,
{ color: string; label: (p: DownloadProgress | null) => string } { color: string; label: (p: DownloadProgress | null) => string }
> = { > = {
offline: { color: "var(--error)", label: () => "Server offline — waiting for connection..." }, offline: { color: "var(--error)", label: () => "Server offline — waiting for connection..." },
downloading: { color: "#60a5fa", label: (p) => p && p.total > 0 ? `Downloading model... (${p.done} / ${p.total} files)` : "Downloading model (~1 GB)..." }, downloading: {
loading: { color: "#fbbf24", label: () => "Loading model into memory..." }, color: "#60a5fa",
error: { color: "var(--error)", label: () => "Server error — check the terminal for details." }, label: (p) =>
p && p.total > 0
? `Downloading model... (${p.done} / ${p.total} files)`
: "Downloading model (~1 GB)...",
},
loading: { color: "#fbbf24", label: () => "Loading model into memory..." },
error: { color: "var(--error)", label: () => "Server error — check the terminal for details." },
}; };
function SpinnerIcon() { function SpinnerIcon() {
return ( return (
<svg className="animate-spin w-4 h-4" viewBox="0 0 24 24" fill="none"> <svg className="animate-spin w-4 h-4" viewBox="0 0 24 24" fill="none">
<circle className="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" strokeWidth="4" /> <circle className="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" strokeWidth="4" />
<path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4z" /> <path
className="opacity-75"
fill="currentColor"
d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4z"
/>
</svg> </svg>
); );
} }
@@ -146,7 +155,10 @@ export default function GenerationControls({
onChange={(e) => onCfgScaleChange(parseFloat(e.target.value))} onChange={(e) => onCfgScaleChange(parseFloat(e.target.value))}
className="w-full" className="w-full"
/> />
<div className="flex items-center justify-between text-xs" style={{ color: "var(--muted)" }}> <div
className="flex items-center justify-between text-xs"
style={{ color: "var(--muted)" }}
>
<span>Flat (0.5)</span> <span>Flat (0.5)</span>
<span>CFG Scale</span> <span>CFG Scale</span>
<span>Expressive (4.0)</span> <span>Expressive (4.0)</span>
@@ -176,7 +188,10 @@ export default function GenerationControls({
className="w-full" className="w-full"
style={{ "--thumb-color": "var(--accent-violet)" } as React.CSSProperties} style={{ "--thumb-color": "var(--accent-violet)" } as React.CSSProperties}
/> />
<div className="flex items-center justify-between text-xs" style={{ color: "var(--muted)" }}> <div
className="flex items-center justify-between text-xs"
style={{ color: "var(--muted)" }}
>
<span>Faster (5)</span> <span>Faster (5)</span>
<span>Diffusion Steps</span> <span>Diffusion Steps</span>
<span>Better (20)</span> <span>Better (20)</span>
@@ -207,7 +222,11 @@ export default function GenerationControls({
</div> </div>
{showAdvanced && ( {showAdvanced && (
<div id="advanced-buffering-panel" className="flex flex-col gap-4 pl-2 border-l" style={{ borderColor: "var(--border)" }}> <div
id="advanced-buffering-panel"
className="flex flex-col gap-4 pl-2 border-l"
style={{ borderColor: "var(--border)" }}
>
{/* Pre-buffer */} {/* Pre-buffer */}
<div className="flex flex-col gap-2"> <div className="flex flex-col gap-2">
<div className="flex items-center justify-between"> <div className="flex items-center justify-between">
@@ -232,7 +251,11 @@ export default function GenerationControls({
{/* Re-buffer threshold */} {/* Re-buffer threshold */}
<div className="flex flex-col gap-2"> <div className="flex flex-col gap-2">
<div className="flex items-center justify-between"> <div className="flex items-center justify-between">
<label htmlFor="rebuffer-threshold" className="text-xs font-medium" style={{ color: "var(--foreground)" }}> <label
htmlFor="rebuffer-threshold"
className="text-xs font-medium"
style={{ color: "var(--foreground)" }}
>
Re-buffer Threshold Re-buffer Threshold
</label> </label>
<span className="text-xs font-mono" style={{ color: "var(--accent-teal)" }}> <span className="text-xs font-mono" style={{ color: "var(--accent-teal)" }}>
@@ -260,7 +283,11 @@ export default function GenerationControls({
{/* Resume threshold */} {/* Resume threshold */}
<div className="flex flex-col gap-2"> <div className="flex flex-col gap-2">
<div className="flex items-center justify-between"> <div className="flex items-center justify-between">
<label htmlFor="resume-threshold" className="text-xs font-medium" style={{ color: "var(--foreground)" }}> <label
htmlFor="resume-threshold"
className="text-xs font-medium"
style={{ color: "var(--foreground)" }}
>
Resume Threshold Resume Threshold
</label> </label>
<span className="text-xs font-mono" style={{ color: "var(--accent-teal)" }}> <span className="text-xs font-mono" style={{ color: "var(--accent-teal)" }}>
@@ -302,7 +329,10 @@ export default function GenerationControls({
</div> </div>
{serverStatus === "downloading" && ( {serverStatus === "downloading" && (
<div className="w-full rounded-full h-1.5 overflow-hidden" style={{ background: "var(--border)" }}> <div
className="w-full rounded-full h-1.5 overflow-hidden"
style={{ background: "var(--border)" }}
>
<div <div
className="h-1.5 rounded-full transition-all duration-500" className="h-1.5 rounded-full transition-all duration-500"
style={{ style={{
@@ -315,10 +345,16 @@ export default function GenerationControls({
)} )}
{serverStatus === "loading" && ( {serverStatus === "loading" && (
<div className="w-full rounded-full h-1.5 overflow-hidden" style={{ background: "var(--border)" }}> <div
className="w-full rounded-full h-1.5 overflow-hidden"
style={{ background: "var(--border)" }}
>
<div <div
className="h-1.5 rounded-full animate-pulse" className="h-1.5 rounded-full animate-pulse"
style={{ width: "60%", background: "linear-gradient(90deg, #fbbf24, var(--accent-teal))" }} style={{
width: "60%",
background: "linear-gradient(90deg, #fbbf24, var(--accent-teal))",
}}
/> />
</div> </div>
)} )}
@@ -328,11 +364,17 @@ export default function GenerationControls({
{/* Generation progress bar */} {/* Generation progress bar */}
{isGenerating && ( {isGenerating && (
<div className="flex flex-col gap-1.5"> <div className="flex flex-col gap-1.5">
<div className="flex items-center justify-between text-xs" style={{ color: "var(--muted)" }}> <div
className="flex items-center justify-between text-xs"
style={{ color: "var(--muted)" }}
>
<span>{genElapsed}s elapsed</span> <span>{genElapsed}s elapsed</span>
<span>{genPct !== null ? `${genPct}%` : "starting..."}</span> <span>{genPct !== null ? `${genPct}%` : "starting..."}</span>
</div> </div>
<div className="w-full rounded-full h-1.5 overflow-hidden" style={{ background: "var(--border)" }}> <div
className="w-full rounded-full h-1.5 overflow-hidden"
style={{ background: "var(--border)" }}
>
<div <div
className="h-1.5 rounded-full transition-all duration-500" className="h-1.5 rounded-full transition-all duration-500"
style={{ style={{
@@ -355,7 +397,8 @@ export default function GenerationControls({
buttonDisabled buttonDisabled
? { background: "var(--border)", color: "var(--muted)" } ? { background: "var(--border)", color: "var(--muted)" }
: { : {
background: "linear-gradient(135deg, var(--accent-teal-dim), var(--accent-violet-dim))", background:
"linear-gradient(135deg, var(--accent-teal-dim), var(--accent-violet-dim))",
color: "#fff", color: "#fff",
boxShadow: "0 4px 15px rgba(45, 212, 191, 0.2)", boxShadow: "0 4px 15px rgba(45, 212, 191, 0.2)",
} }
@@ -373,7 +416,13 @@ export default function GenerationControls({
</> </>
) : ( ) : (
<> <>
<svg className="w-4 h-4" viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="2"> <svg
className="w-4 h-4"
viewBox="0 0 24 24"
fill="none"
stroke="currentColor"
strokeWidth="2"
>
<polygon points="5 3 19 12 5 21 5 3" /> <polygon points="5 3 19 12 5 21 5 3" />
</svg> </svg>
Generate Audio Generate Audio
+22 -25
@@ -6,8 +6,8 @@ type ServerStatus = "checking" | "downloading" | "loading" | "online" | "error"
type Device = "cpu" | "cuda" | null; type Device = "cpu" | "cuda" | null;
// Polling intervals: poll quickly until the server is online, then slow down. // Polling intervals: poll quickly until the server is online, then slow down.
const FAST_INTERVAL_MS = 3000; // while checking / loading const FAST_INTERVAL_MS = 3000; // while checking / loading
const SLOW_INTERVAL_MS = 30000; // once online const SLOW_INTERVAL_MS = 30000; // once online
export default function Header() { export default function Header() {
const [status, setStatus] = useState<ServerStatus>("checking"); const [status, setStatus] = useState<ServerStatus>("checking");
@@ -31,7 +31,10 @@ export default function Header() {
intervalRef.current = setInterval(checkHealth, SLOW_INTERVAL_MS); intervalRef.current = setInterval(checkHealth, SLOW_INTERVAL_MS);
} }
// Switch to fast polling if we detect the server went offline/loading // Switch to fast polling if we detect the server went offline/loading
if ((newStatus === "offline" || newStatus === "downloading" || newStatus === "loading") && intervalRef.current) { if (
(newStatus === "offline" || newStatus === "downloading" || newStatus === "loading") &&
intervalRef.current
) {
clearInterval(intervalRef.current); clearInterval(intervalRef.current);
intervalRef.current = setInterval(checkHealth, FAST_INTERVAL_MS); intervalRef.current = setInterval(checkHealth, FAST_INTERVAL_MS);
} }
@@ -95,23 +98,20 @@ export default function Header() {
const cfg = statusConfig[status]; const cfg = statusConfig[status];
// Device badge — only shown once the server is online and device is known // Device badge — only shown once the server is online and device is known
const deviceBadge = status === "online" && device ? ( const deviceBadge =
<span status === "online" && device ? (
className="px-2 py-0.5 rounded-full text-xs font-semibold tracking-wide uppercase" <span
style={{ className="px-2 py-0.5 rounded-full text-xs font-semibold tracking-wide uppercase"
background: device === "cuda" style={{
? "var(--accent-violet-dim)" background: device === "cuda" ? "var(--accent-violet-dim)" : "var(--accent-teal-dim)",
: "var(--accent-teal-dim)", color: device === "cuda" ? "var(--accent-violet)" : "var(--accent-teal)",
color: device === "cuda" border: `1px solid ${device === "cuda" ? "var(--accent-violet-dim)" : "var(--accent-teal-dim)"}`,
? "var(--accent-violet)" }}
: "var(--accent-teal)", title={device === "cuda" ? "Running on NVIDIA GPU" : "Running on CPU"}
border: `1px solid ${device === "cuda" ? "var(--accent-violet-dim)" : "var(--accent-teal-dim)"}`, >
}} {device.toUpperCase()}
title={device === "cuda" ? "Running on NVIDIA GPU" : "Running on CPU"} </span>
> ) : null;
{device.toUpperCase()}
</span>
) : null;
return ( return (
<header <header
@@ -136,8 +136,7 @@ export default function Header() {
<h1 <h1
className="text-xl font-bold tracking-tight" className="text-xl font-bold tracking-tight"
style={{ style={{
background: background: "linear-gradient(135deg, var(--accent-teal), var(--accent-violet))",
"linear-gradient(135deg, var(--accent-teal), var(--accent-violet))",
WebkitBackgroundClip: "text", WebkitBackgroundClip: "text",
WebkitTextFillColor: "transparent", WebkitTextFillColor: "transparent",
}} }}
@@ -167,9 +166,7 @@ export default function Header() {
className={`animate-ping absolute inline-flex h-full w-full rounded-full opacity-75 ${cfg.color}`} className={`animate-ping absolute inline-flex h-full w-full rounded-full opacity-75 ${cfg.color}`}
/> />
)} )}
<span <span className={`relative inline-flex rounded-full h-2 w-2 ${cfg.color}`} />
className={`relative inline-flex rounded-full h-2 w-2 ${cfg.color}`}
/>
</span> </span>
<span style={{ color: "var(--foreground)" }}>{cfg.label}</span> <span style={{ color: "var(--foreground)" }}>{cfg.label}</span>
</div> </div>
+1 -2
@@ -47,8 +47,7 @@ export default function StatusLog({ messages }: StatusLogProps) {
) : ( ) : (
messages.map((msg, i) => { messages.map((msg, i) => {
const isError = const isError =
msg.toLowerCase().includes("error") || msg.toLowerCase().includes("error") || msg.toLowerCase().includes("failed");
msg.toLowerCase().includes("failed");
const isSuccess = const isSuccess =
msg.toLowerCase().includes("done") || msg.toLowerCase().includes("done") ||
msg.toLowerCase().includes("complete") || msg.toLowerCase().includes("complete") ||
+6 -16
@@ -15,10 +15,7 @@ interface TextInputPanelProps {
onChange: (text: string) => void; onChange: (text: string) => void;
} }
export default function TextInputPanel({ export default function TextInputPanel({ value, onChange }: TextInputPanelProps) {
value,
onChange,
}: TextInputPanelProps) {
const charCount = value.length; const charCount = value.length;
const wordCount = value.trim() === "" ? 0 : value.trim().split(/\s+/).length; const wordCount = value.trim() === "" ? 0 : value.trim().split(/\s+/).length;
@@ -43,15 +40,12 @@ export default function TextInputPanel({
color: "var(--muted)", color: "var(--muted)",
}} }}
onMouseEnter={(e) => { onMouseEnter={(e) => {
(e.target as HTMLButtonElement).style.color = (e.target as HTMLButtonElement).style.color = "var(--accent-violet)";
"var(--accent-violet)"; (e.target as HTMLButtonElement).style.borderColor = "var(--accent-violet)";
(e.target as HTMLButtonElement).style.borderColor =
"var(--accent-violet)";
}} }}
onMouseLeave={(e) => { onMouseLeave={(e) => {
(e.target as HTMLButtonElement).style.color = "var(--muted)"; (e.target as HTMLButtonElement).style.color = "var(--muted)";
(e.target as HTMLButtonElement).style.borderColor = (e.target as HTMLButtonElement).style.borderColor = "var(--border)";
"var(--border)";
}} }}
> >
Load sample script Load sample script
@@ -69,8 +63,7 @@ export default function TextInputPanel({
}} }}
onMouseLeave={(e) => { onMouseLeave={(e) => {
(e.target as HTMLButtonElement).style.color = "var(--muted)"; (e.target as HTMLButtonElement).style.color = "var(--muted)";
(e.target as HTMLButtonElement).style.borderColor = (e.target as HTMLButtonElement).style.borderColor = "var(--border)";
"var(--border)";
}} }}
> >
Clear Clear
@@ -98,10 +91,7 @@ export default function TextInputPanel({
}} }}
/> />
<div <div className="flex items-center justify-between text-xs" style={{ color: "var(--muted)" }}>
className="flex items-center justify-between text-xs"
style={{ color: "var(--muted)" }}
>
<span> <span>
{wordCount} word{wordCount !== 1 ? "s" : ""} {wordCount} word{wordCount !== 1 ? "s" : ""}
</span> </span>
+6 -10
@@ -55,16 +55,12 @@ export function useAudioPlayer(audioUrl: string | null) {
() => setState((prev) => ({ ...prev, isPlaying: false, currentTime: 0 })), () => setState((prev) => ({ ...prev, isPlaying: false, currentTime: 0 })),
{ signal } { signal }
); );
audio.addEventListener( audio.addEventListener("play", () => setState((prev) => ({ ...prev, isPlaying: true })), {
"play", signal,
() => setState((prev) => ({ ...prev, isPlaying: true })), });
{ signal } audio.addEventListener("pause", () => setState((prev) => ({ ...prev, isPlaying: false })), {
); signal,
audio.addEventListener( });
"pause",
() => setState((prev) => ({ ...prev, isPlaying: false })),
{ signal }
);
return () => { return () => {
audio.pause(); audio.pause();
+159 -158
@@ -92,7 +92,7 @@ export function useStreamingGeneration({
let resumeThresholdSecs = rawResumeThresholdSecs; let resumeThresholdSecs = rawResumeThresholdSecs;
if (resumeThresholdSecs <= rebufferThresholdSecs) { if (resumeThresholdSecs <= rebufferThresholdSecs) {
console.warn( console.warn(
`[useStreamingGeneration] resumeThresholdSecs (${resumeThresholdSecs}) must be greater than rebufferThresholdSecs (${rebufferThresholdSecs}). Clamping resumeThresholdSecs to ${rebufferThresholdSecs + 0.5}.`, `[useStreamingGeneration] resumeThresholdSecs (${resumeThresholdSecs}) must be greater than rebufferThresholdSecs (${rebufferThresholdSecs}). Clamping resumeThresholdSecs to ${rebufferThresholdSecs + 0.5}.`
); );
resumeThresholdSecs = rebufferThresholdSecs + 0.5; resumeThresholdSecs = rebufferThresholdSecs + 0.5;
} }
@@ -162,177 +162,178 @@ export function useStreamingGeneration({
hasStartedPlaybackRef.current = true; hasStartedPlaybackRef.current = true;
}, [enqueue]); }, [enqueue]);
const handleAudioChunk = useCallback((chunk: Float32Array<ArrayBuffer>) => { const handleAudioChunk = useCallback(
const ctx = audioCtxRef.current; (chunk: Float32Array<ArrayBuffer>) => {
if (!ctx) return; const ctx = audioCtxRef.current;
if (!ctx) return;
chunksRef.current.push(chunk); chunksRef.current.push(chunk);
totalAudioSamplesRef.current += chunk.length; totalAudioSamplesRef.current += chunk.length;
if (!firstChunkSeenRef.current) { if (!firstChunkSeenRef.current) {
firstChunkSeenRef.current = true; firstChunkSeenRef.current = true;
onLog("First audio chunk received"); onLog("First audio chunk received");
}
if (!hasStartedPlaybackRef.current) {
const bufferedSecs = chunksRef.current.reduce((sum, c) => sum + c.length, 0) / SAMPLE_RATE;
if (bufferedSecs >= prebufferSecs) {
onLog(`Playback started after ${bufferedSecs.toFixed(1)}s buffered`);
flushBufferedAudio();
}
return;
}
enqueue(ctx, chunk);
if (isUserPausedRef.current) return;
const ahead = nextStartTimeRef.current - ctx.currentTime;
if (
ctx.state === "running" &&
!isAutoBufferingRef.current &&
ahead < rebufferThresholdSecs
) {
isAutoBufferingRef.current = true;
underrunCountRef.current += 1;
adaptiveResumeSecsRef.current = Math.min(
MAX_ADAPTIVE_RESUME_SECS,
Math.max(resumeThresholdSecs, prebufferSecs + underrunCountRef.current * 2),
);
ctx.suspend().catch(() => {});
onLog(
`Buffer underrun ${underrunCountRef.current}; refilling to ${adaptiveResumeSecsRef.current.toFixed(1)}s`,
);
} else if (
isAutoBufferingRef.current &&
ahead >= adaptiveResumeSecsRef.current
) {
isAutoBufferingRef.current = false;
ctx.resume().catch(() => {});
onLog(`Buffer recovered with ${ahead.toFixed(1)}s queued`);
}
}, [enqueue, flushBufferedAudio, onLog, prebufferSecs, rebufferThresholdSecs, resumeThresholdSecs]);
const generate = useCallback(async (options: GenerateOptions) => {
if (!options.text.trim()) return;
resetPlayback();
revokeCurrentUrl();
audioCtxRef.current = new AudioContext({ sampleRate: SAMPLE_RATE });
const controller = new AbortController();
abortRef.current = controller;
onStart();
onLog(`Voice: ${options.speaker}`);
onLog(`CFG ${options.cfgScale.toFixed(1)}, steps ${options.inferenceSteps}`);
const startedAt = Date.now();
const timerId = window.setInterval(() => {
onProgress((Date.now() - startedAt) / 1000, null);
}, 500);
try {
const res = await fetch("/api/generate", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
text: options.text,
speaker: options.speaker,
cfg_scale: options.cfgScale,
inference_steps: options.inferenceSteps,
}),
signal: controller.signal,
});
if (!res.ok || !res.body) {
const err = await res.json().catch(() => ({})) as { error?: string };
throw new Error(err.error ?? `HTTP ${res.status}`);
} }
const reader = res.body.getReader(); if (!hasStartedPlaybackRef.current) {
const decoder = new TextDecoder(); const bufferedSecs = chunksRef.current.reduce((sum, c) => sum + c.length, 0) / SAMPLE_RATE;
let buffer = ""; if (bufferedSecs >= prebufferSecs) {
onLog(`Playback started after ${bufferedSecs.toFixed(1)}s buffered`);
flushBufferedAudio();
}
return;
}
while (true) { enqueue(ctx, chunk);
const { done, value } = await reader.read(); if (isUserPausedRef.current) return;
if (done) break;
buffer += decoder.decode(value, { stream: true }); const ahead = nextStartTimeRef.current - ctx.currentTime;
const lines = buffer.split("\n"); if (ctx.state === "running" && !isAutoBufferingRef.current && ahead < rebufferThresholdSecs) {
buffer = lines.pop() ?? ""; isAutoBufferingRef.current = true;
underrunCountRef.current += 1;
adaptiveResumeSecsRef.current = Math.min(
MAX_ADAPTIVE_RESUME_SECS,
Math.max(resumeThresholdSecs, prebufferSecs + underrunCountRef.current * 2)
);
ctx.suspend().catch(() => {});
onLog(
`Buffer underrun ${underrunCountRef.current}; refilling to ${adaptiveResumeSecsRef.current.toFixed(1)}s`
);
} else if (isAutoBufferingRef.current && ahead >= adaptiveResumeSecsRef.current) {
isAutoBufferingRef.current = false;
ctx.resume().catch(() => {});
onLog(`Buffer recovered with ${ahead.toFixed(1)}s queued`);
}
},
[enqueue, flushBufferedAudio, onLog, prebufferSecs, rebufferThresholdSecs, resumeThresholdSecs]
);
for (const line of lines) { const generate = useCallback(
if (!line.startsWith("data: ")) continue; async (options: GenerateOptions) => {
const event = JSON.parse(line.slice(6)) as { if (!options.text.trim()) return;
type: "audio_chunk" | "complete" | "error" | "cancelled";
data?: string;
elapsed?: number;
audio_secs?: number;
realtime_factor?: number | null;
chunks?: number;
first_chunk_secs?: number | null;
max_chunk_gap_secs?: number;
message?: string;
};
if (event.type === "audio_chunk" && event.data) { resetPlayback();
handleAudioChunk(decodeFloat32Chunk(event.data)); revokeCurrentUrl();
} else if (event.type === "complete") { audioCtxRef.current = new AudioContext({ sampleRate: SAMPLE_RATE });
if (!hasStartedPlaybackRef.current) {
flushBufferedAudio(); const controller = new AbortController();
} else if (isAutoBufferingRef.current) { abortRef.current = controller;
isAutoBufferingRef.current = false;
audioCtxRef.current?.resume().catch(() => {}); onStart();
} onLog(`Voice: ${options.speaker}`);
const wavBlob = buildWav(mergeFloat32Arrays(chunksRef.current), SAMPLE_RATE); onLog(`CFG ${options.cfgScale.toFixed(1)}, steps ${options.inferenceSteps}`);
const audioUrl = URL.createObjectURL(wavBlob);
audioUrlRef.current = audioUrl; const startedAt = Date.now();
const kb = (wavBlob.size / 1024).toFixed(0); const timerId = window.setInterval(() => {
const audioSecs = event.audio_secs ?? totalAudioSamplesRef.current / SAMPLE_RATE; onProgress((Date.now() - startedAt) / 1000, null);
const realtimeFactor = }, 500);
event.realtime_factor ??
(event.elapsed && event.elapsed > 0 ? audioSecs / event.elapsed : null); try {
const speedText = const res = await fetch("/api/generate", {
realtimeFactor === null ? "" : ` - ${realtimeFactor.toFixed(2)}x realtime`; method: "POST",
onLog(`Done in ${event.elapsed}s - ${audioSecs.toFixed(1)}s audio${speedText} - ${kb} KB`); headers: { "Content-Type": "application/json" },
if (event.chunks && event.first_chunk_secs !== undefined) { body: JSON.stringify({
text: options.text,
speaker: options.speaker,
cfg_scale: options.cfgScale,
inference_steps: options.inferenceSteps,
}),
signal: controller.signal,
});
if (!res.ok || !res.body) {
const err = (await res.json().catch(() => ({}))) as { error?: string };
throw new Error(err.error ?? `HTTP ${res.status}`);
}
const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = "";
while (true) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split("\n");
buffer = lines.pop() ?? "";
for (const line of lines) {
if (!line.startsWith("data: ")) continue;
const event = JSON.parse(line.slice(6)) as {
type: "audio_chunk" | "complete" | "error" | "cancelled";
data?: string;
elapsed?: number;
audio_secs?: number;
realtime_factor?: number | null;
chunks?: number;
first_chunk_secs?: number | null;
max_chunk_gap_secs?: number;
message?: string;
};
if (event.type === "audio_chunk" && event.data) {
handleAudioChunk(decodeFloat32Chunk(event.data));
} else if (event.type === "complete") {
if (!hasStartedPlaybackRef.current) {
flushBufferedAudio();
} else if (isAutoBufferingRef.current) {
isAutoBufferingRef.current = false;
audioCtxRef.current?.resume().catch(() => {});
}
const wavBlob = buildWav(mergeFloat32Arrays(chunksRef.current), SAMPLE_RATE);
const audioUrl = URL.createObjectURL(wavBlob);
audioUrlRef.current = audioUrl;
const kb = (wavBlob.size / 1024).toFixed(0);
const audioSecs = event.audio_secs ?? totalAudioSamplesRef.current / SAMPLE_RATE;
const realtimeFactor =
event.realtime_factor ??
(event.elapsed && event.elapsed > 0 ? audioSecs / event.elapsed : null);
const speedText =
realtimeFactor === null ? "" : ` - ${realtimeFactor.toFixed(2)}x realtime`;
onLog( onLog(
`Stream: first chunk ${event.first_chunk_secs}s, ${event.chunks} chunks, max gap ${event.max_chunk_gap_secs}s`, `Done in ${event.elapsed}s - ${audioSecs.toFixed(1)}s audio${speedText} - ${kb} KB`
); );
if (event.chunks && event.first_chunk_secs !== undefined) {
onLog(
`Stream: first chunk ${event.first_chunk_secs}s, ${event.chunks} chunks, max gap ${event.max_chunk_gap_secs}s`
);
}
onSuccess(audioUrl);
} else if (event.type === "cancelled") {
throw new DOMException("Generation cancelled", "AbortError");
} else if (event.type === "error") {
throw new Error(event.message ?? "Generation failed");
} }
onSuccess(audioUrl);
} else if (event.type === "cancelled") {
throw new DOMException("Generation cancelled", "AbortError");
} else if (event.type === "error") {
throw new Error(event.message ?? "Generation failed");
} }
} }
} catch (err) {
if (err instanceof Error && err.name === "AbortError") {
onLog("Cancelled.");
onCancel();
} else {
const message = err instanceof Error ? err.message : "Unknown error";
onLog(`Error: ${message}`);
onError();
}
} finally {
window.clearInterval(timerId);
abortRef.current = null;
} }
} catch (err) { },
if (err instanceof Error && err.name === "AbortError") { [
onLog("Cancelled."); flushBufferedAudio,
onCancel(); handleAudioChunk,
} else { onCancel,
const message = err instanceof Error ? err.message : "Unknown error"; onError,
onLog(`Error: ${message}`); onLog,
onError(); onProgress,
} onStart,
} finally { onSuccess,
window.clearInterval(timerId); resetPlayback,
abortRef.current = null; revokeCurrentUrl,
} ]
}, [ );
flushBufferedAudio,
handleAudioChunk,
onCancel,
onError,
onLog,
onProgress,
onStart,
onSuccess,
resetPlayback,
revokeCurrentUrl,
]);
const pauseStream = useCallback(() => { const pauseStream = useCallback(() => {
isUserPausedRef.current = true; isUserPausedRef.current = true;