Commit Graph

37 Commits

Author SHA1 Message Date
LyAhn 737d315c1a docs: update roadmap and ignore Claude settings 2026-04-30 23:32:20 +01:00
LyAhn 01ab3d1fc4 perf(cpu): tune streaming playback
Keep CPU async decode enabled without CFG parallelism, expand CPU buffering defaults for smooth playback, prevent CPU startup from mutating the lockfile during thread autodetection, and document runtime tuning variables in the example environment file.
2026-04-30 23:20:46 +01:00
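The lockfile protection mentioned above (and in 7591d15a52, where uv.lock is "preserved around uv sync --no-sources") can be pictured as a snapshot-and-restore wrapper. This is a minimal sketch assuming that approach. The helper name `preserve_file` is hypothetical, and the real start script may well do this in shell rather than Python.

```python
import contextlib
from pathlib import Path
from typing import Iterator


@contextlib.contextmanager
def preserve_file(path: Path) -> Iterator[None]:
    """Hypothetical helper: restore `path` to its original bytes on exit.

    The file is snapshotted before the wrapped block runs and written back
    afterwards, even if the block raises. If the file did not exist
    beforehand, anything the block created at that path is removed.
    """
    original = path.read_bytes() if path.exists() else None
    try:
        yield
    finally:
        if original is None:
            path.unlink(missing_ok=True)
        else:
            path.write_bytes(original)
```

In use, the `uv sync --no-sources` call for CPU mode would run inside `with preserve_file(Path("uv.lock")):`, so whatever the sync writes to the shared lockfile is rolled back once it finishes.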
LyAhn d80d5ba46b fix: update lock to vibevoice fe832f2 (inference_mode thread fix) 2026-04-30 21:46:02 +01:00
LyAhn 98e2bf9237 perf: migrate to JezzWTF/VibeVoice fork, parallel CFG executors
Switch vibevoice dependency from microsoft/VibeVoice to JezzWTF/VibeVoice
fork (commit e76701f) which contains the async decode + parallel CFG
optimisations directly in generate(). Removes the instance-method
patching approach (vibevoice_generate_patch.py deleted).

server/vibevoice_server.py:
- Add _cfg_executor (ThreadPoolExecutor, 1 worker) alongside _decode_executor
- _install_cpu_pipeline_optimizations now sets both executors directly as
  model._vibepod_decode_executor and model._vibepod_cfg_executor
- Both executors shut down in lifespan on exit
- Remove vibevoice_generate_patch import/install (no longer needed)

server/pyproject.toml:
- vibevoice source changed to git+https://github.com/JezzWTF/VibeVoice.git
- No machine-local paths; works identically on any clone
2026-04-30 21:30:07 +01:00
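The executor wiring described above (two single-worker pools exposed as `model._vibepod_decode_executor` and `model._vibepod_cfg_executor`, both shut down on exit) might look roughly like this. It is a sketch under assumptions: the function name `executor_lifespan` is hypothetical, and in the real server this logic would sit in FastAPI's `lifespan` hook, which receives the app rather than the model.

```python
import contextlib
from concurrent.futures import ThreadPoolExecutor
from typing import Any, AsyncIterator


@contextlib.asynccontextmanager
async def executor_lifespan(model: Any) -> AsyncIterator[None]:
    # One worker each: work is serialised within each executor, but both can
    # run alongside the main generation thread.
    decode_executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="vibepod-decode")
    cfg_executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="vibepod-cfg")
    # Expose the pools on the model under the attribute names from the commit.
    model._vibepod_decode_executor = decode_executor
    model._vibepod_cfg_executor = cfg_executor
    try:
        yield
    finally:
        # Runs on shutdown, and also if the wrapped block raises.
        decode_executor.shutdown(wait=True)
        cfg_executor.shutdown(wait=True)
```

Once shut down, each pool rejects new work with `RuntimeError`, so a late submission after exit fails loudly instead of leaking a thread.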
LyAhn 7591d15a52 perf: CPU async pipeline overlap + INT8 quantization
Overlap acoustic_decode with forward_tts_lm calls using a background
ThreadPoolExecutor, hiding ~72s of decode cost behind tts_lm work.
Achieved 0.67x realtime (up from 0.43x, ~56% improvement).

- vibevoice_generate_patch.py: patched generate() loop reordered to
  submit decode to thread before running connector + tts_lm×2, then
  resolve future. Installed as instance method via types.MethodType so
  uv sync reinstalling the package cannot revert the patch.
- Dynamic INT8 quantization of Linear layers (VIBEPOD_QUANTIZE=1,
  default on CPU). prediction_head excluded — small fixed-size tensors
  regressed ~20% with INT8 due to pack/unpack overhead.
- Auto-detect AVX512_BF16 and load model in bfloat16 if supported
  (VIBEPOD_CPU_BF16=auto, overridable with 0/1).
- CPU thread count auto-configured from logical CPU count; OMP/MKL env
  vars set accordingly. Lock file preserved around uv sync --no-sources
  so CPU mode does not alter the shared uv.lock.
- torch.compile retained as opt-in (VIBEPOD_COMPILE=1) but marked not
  recommended — dynamic KV cache shapes prevent kernel reuse.
2026-04-30 20:46:29 +01:00
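The reordering in this commit (submit the decode to a background thread, run connector + tts_lm, then resolve the future) is a standard overlap pattern. Below is a toy sketch of it, not the fork's actual generate() loop. `acoustic_decode` and `forward_tts_lm` here are hypothetical stubs that sleep to mimic cost. With real PyTorch ops, the overlap relies on those ops releasing the GIL.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def acoustic_decode(latent: int) -> str:
    """Hypothetical stand-in for the acoustic decoder; sleeps to mimic its cost."""
    time.sleep(0.1)
    return f"audio[{latent}]"


def forward_tts_lm(step: int) -> None:
    """Hypothetical stand-in for connector + tts_lm x2; sleeps to mimic its cost."""
    time.sleep(0.1)


def generate_overlapped(num_steps: int, executor: ThreadPoolExecutor) -> List[str]:
    chunks: List[str] = []
    for step in range(num_steps):
        latent = step  # placeholder for the latent produced at this step
        # Hand the decode to the background worker first...
        future = executor.submit(acoustic_decode, latent)
        # ...so the language-model work runs while the decode is in flight.
        forward_tts_lm(step)
        # Resolve the decode before the next step, keeping chunk order intact.
        chunks.append(future.result())
    return chunks
```

With equal stub costs, each step takes roughly one cost instead of two. The real saving depends on how much decode time the tts_lm work can actually hide, which the commit reports as about 72 s.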
LyAhn 75b84b211b perf: improve streaming generation pipeline
Add CUDA inference hot-path optimizations, safer attention fallback handling, and generation profiling hooks. Improve SSE streaming, browser buffering telemetry, and playback recovery while preserving default audio quality settings.
2026-04-30 18:54:14 +01:00
LyAhn a39ec536fd Improve CPU Inference Stability: Adaptive Buffering & Chunk Accumulation (#11)
* Improve CPU Inference Stability: Implement Adaptive Buffering and Chunk Accumulation

This change addresses audio stuttering issues when running on CPU-only hardware by:
- Implementing server-side audio chunk accumulation to reduce SSE overhead.
- Introducing device-aware default configurations for buffering and inference steps.
- Exposing key performance parameters as environment variables.
- Enabling the frontend to adaptively adjust its buffering thresholds based on the server's configuration.

Changes:
- Modified `server/vibevoice_server.py` to support accumulation and provide config via `/health`.
- Updated `web/hooks/useStreamingGeneration.ts` to accept configurable buffering parameters.
- Updated `web/app/page.tsx` to fetch and apply server-side configuration.

Verified on CPU mode in the development environment.

Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>

* Improve CPU Inference Stability: Adaptive Buffering UI & Logic

This change enhances the initial CPU stability fix by:
- Exposing adaptive buffering settings (Pre-buffer, Re-buffer Threshold, Resume Threshold) in a new "Advanced Buffering" UI section.
- Managing buffering settings in the application state to allow for manual overrides.
- Implementing robust re-initialization of buffering and inference defaults whenever the server's device (CPU/CUDA) changes.
- Including the active device in the server's config object for reliable client-side detection.

Verified with frontend screenshots and full build. Responds to PR feedback regarding actioning the adaptive logic.

Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>

* Refine adaptive buffering: env helpers, threshold validation, a11y fixes

- Extract _env_int/_env_float helpers in server to validate env-var config
  with graceful fallback instead of bare int/float casts
- Fix inference_steps falsy-check (0 is valid) to use explicit None guard
- Enforce rebufferThresholdSecs < resumeThresholdSecs in both the hook
  (with console.warn + clamp) and the GenerationControls UI (sliders block
  invalid states by auto-bumping or ignoring the drag)
- Add type="button", aria-expanded, aria-controls, htmlFor, and input id
  attributes to GenerationControls for accessibility
- Add .vscode/settings.json to .gitignore; sort package.json scripts

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
2026-04-30 16:03:35 +01:00
LyAhn 87185e6289 Merge pull request #9 from JezzWTF/jules-7774489883029094316-4fc49929
🔒 secure backend by binding uvicorn to localhost
2026-04-29 13:44:24 +01:00
google-labs-jules[bot] 706b318abb 🔒 secure backend by binding uvicorn to localhost
🎯 What: Changed the uvicorn host binding from 0.0.0.0 to 127.0.0.1 in server/start.sh.
⚠️ Risk: Binding to 0.0.0.0 exposes the unauthenticated backend API to any network interface, potentially allowing unauthorized access.
🛡️ Solution: Binding to 127.0.0.1 ensures the FastAPI backend is only accessible from the local machine, relying on the Next.js frontend to securely proxy external requests.

Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>
2026-04-29 12:42:40 +00:00
google-labs-jules[bot] edfc6dc501 🔒 secure backend by binding uvicorn to localhost
🎯 What: Changed the uvicorn host binding from 0.0.0.0 to 127.0.0.1 in server/start.sh.
⚠️ Risk: Binding to 0.0.0.0 exposes the unauthenticated backend API to any network interface, potentially allowing unauthorized access.
🛡️ Solution: Binding to 127.0.0.1 ensures the FastAPI backend is only accessible from the local machine, relying on the Next.js frontend to securely proxy external requests.

Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>
2026-04-29 11:07:47 +00:00
LyAhn 84e387ec42 Merge pull request #7 from JezzWTF/refactor-audio-player-abort-controller-5486095809189155006
🧹 [Refactor] Use AbortController for event listeners in useAudioPlayer
2026-04-29 09:26:33 +01:00
google-labs-jules[bot] 153b63a90c 🧹 [Refactor] Use AbortController for event listeners in useAudioPlayer
- Replaced multiple named event handler functions with inline state setters.
- Used an AbortController to cleanly remove all event listeners with a single `controller.abort()` call in the cleanup hook.
- This improves maintainability and readability by reducing verbosity without changing functionality.
- Formatted inline callbacks across multiple lines for better readability as requested.

Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>
2026-04-29 08:19:17 +00:00
LyAhn e3d12819a4 Merge pull request #5 from JezzWTF/refactor-load-model-1716154043227557412
🧹 Refactor _load_model_sync to improve code readability
2026-04-29 09:10:36 +01:00
google-labs-jules[bot] af85b444a7 🧹 Refactor model loading in vibevoice_server.py
🎯 What: Extracted inline model loading logic from `_load_model_sync` into distinct helper functions (`_init_processor`, `_init_model`, and `_load_voice_presets`). Added exc_info to model load exception logging.
💡 Why: This significantly reduces the complexity of `_load_model_sync`, making the code easier to read and maintain. Better logging helps diagnose initialization failures.
Verification: Ran a syntax check (`python -m py_compile`), started the backend server with CPU inference, and verified the model initialized and correctly processed a text-to-speech request to the `/generate` endpoint without regressions.
Result: Improved code modularity while preserving identical behavior.

Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>
2026-04-29 08:08:17 +00:00
LyAhn 9c116b3219 Merge pull request #6 from JezzWTF/jules-2757652384498664157-be7c14be
docs: add DESIGN.md
2026-04-29 08:45:25 +01:00
LyAhn 68174b9d67 feat: surface VIBEPOD_DEVICE (CPU/CUDA) in the frontend header 2026-04-29 08:43:07 +01:00
LyAhn b8f59875d9 docs: add AGENTS.md and update README with CPU/CUDA setup 2026-04-29 08:33:43 +01:00
google-labs-jules[bot] 18a97e0bea 🧹 [Refactor] Use AbortController for event listeners in useAudioPlayer
- Replaced multiple named event handler functions with inline state setters.
- Used an AbortController to cleanly remove all event listeners with a single `controller.abort()` call in the cleanup hook.
- This improves maintainability and readability by reducing verbosity without changing functionality.

Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>
2026-04-28 19:23:44 +00:00
google-labs-jules[bot] a80220d03f docs: add DESIGN.md
Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>
2026-04-28 16:48:25 +00:00
google-labs-jules[bot] 09d9727c20 🧹 Refactor model loading in vibevoice_server.py
🎯 What: Extracted inline model loading logic from `_load_model_sync` into distinct helper functions (`_init_processor`, `_init_model`, and `_load_voice_presets`).
💡 Why: This significantly reduces the complexity of `_load_model_sync`, making the code easier to read and maintain.
Verification: Ran a syntax check (`python -m py_compile`), started the backend server with CPU inference, and verified the model initialized and correctly processed a text-to-speech request to the `/generate` endpoint without regressions.
Result: Improved code modularity while preserving identical behavior.

Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>
2026-04-28 16:35:26 +00:00
LyAhn 59d3280cb5 Merge pull request #4 from JezzWTF/jules-refactor-spinner-7897005482205256093
🧹 [Code Health] Extract duplicated SVG spinner into a shared component
2026-04-28 15:56:40 +01:00
google-labs-jules[bot] 2d2ab26994 🧹 [Code Health] Extract duplicated SVG spinner into a shared component
🎯 What: Extracted the duplicated <svg> spinner code in web/components/GenerationControls.tsx into a new lightweight React component, SpinnerIcon.
💡 Why: This improves maintainability and keeps the code DRY by removing the inline duplication of the SVG path and properties.
Verification: Ran pnpm install and pnpm run build in the web directory, confirming the code compiles successfully.
Result: The isGenerating and !serverReady branches now cleanly reference the <SpinnerIcon /> component.
Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>
2026-04-28 14:53:41 +00:00
LyAhn fa0c5ec916 Merge pull request #3 from JezzWTF/fix-unhandled-exception-leakage-12139097266042119477
🔒 [security fix] Unhandled Exception Details Exposed to Users
2026-04-28 15:38:06 +01:00
google-labs-jules[bot] adebfceeb0 🔒 security: fix unhandled exception details exposure
Replace detailed exception strings with generic error messages in
the health and generate endpoints to prevent information leakage.
Internal logs still contain full exception details for debugging.

Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>
2026-04-28 14:36:06 +00:00
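The pattern in this fix (log full exception details server-side, return only a generic message to the client) is sketched below in a framework-agnostic way. `run_endpoint` and the message text are hypothetical. In FastAPI the 500 response would typically be raised as `HTTPException(status_code=500, detail=...)` instead of returned as a tuple.

```python
import logging
from typing import Any, Callable, Dict, Tuple

logger = logging.getLogger("vibepod.example")

# Hypothetical client-facing message; the real endpoints' wording may differ.
GENERIC_ERROR = "Internal server error"


def run_endpoint(handler: Callable[[], Dict[str, Any]]) -> Tuple[int, Dict[str, Any]]:
    """Run a request handler, hiding exception details from the caller."""
    try:
        return 200, handler()
    except Exception:
        # logger.exception records the message and full traceback server-side.
        logger.exception("Request handler failed")
        return 500, {"detail": GENERIC_ERROR}
```

Paths, model names, or stack details in the exception text stay in the server log and never reach the response body.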
LyAhn 7b9cc8c7c2 Merge pull request #2 from JezzWTF/cleanup-offline-response-duplication-1343325800701975982
🧹 cleanup-offline-response-duplication
2026-04-28 15:19:31 +01:00
google-labs-jules[bot] bd5c667307 chore: refactor duplicated offline response in health api route
Extract the duplicated offline response payload and common headers into
constants to improve maintainability and readability.

- Define OFFLINE_RESPONSE for { status: "offline" }
- Define COMMON_OPTIONS for { headers: { "Cache-Control": "no-store" } }
- Use these constants across all response paths in the route.

Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>
2026-04-28 14:17:19 +00:00
LyAhn c8110ccdde feat: honour VIBEPOD_DEVICE env var for CPU/CUDA device selection 2026-04-28 14:22:38 +01:00
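A device selector that honours VIBEPOD_DEVICE could look like the sketch below. Only the variable name and the CPU/CUDA choice come from the commit. The case-insensitive parsing, the error on an unknown value or unavailable CUDA, and the auto-detect fallback when unset are all assumptions. `resolve_device` is a hypothetical name. In a real server `cuda_available` would come from `torch.cuda.is_available()`.

```python
from typing import Mapping


def resolve_device(env: Mapping[str, str], cuda_available: bool) -> str:
    """Hypothetical: pick "cpu" or "cuda" from VIBEPOD_DEVICE, else auto-detect."""
    requested = env.get("VIBEPOD_DEVICE", "").strip().lower()
    if requested == "cpu":
        return "cpu"
    if requested == "cuda":
        if not cuda_available:
            raise RuntimeError("VIBEPOD_DEVICE=cuda but CUDA is not available")
        return "cuda"
    if requested:
        raise ValueError(f"Unsupported VIBEPOD_DEVICE={requested!r}")
    # Unset or empty: prefer CUDA when present.
    return "cuda" if cuda_available else "cpu"
```

Taking the environment as a mapping, rather than reading `os.environ` directly, keeps the choice testable without mutating process state.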
LyAhn 64cf431c2a feat: add dev:cpu and dev:server:cpu npm scripts 2026-04-28 14:20:17 +01:00
LyAhn 55937308b3 feat: pass --cpu flag through dev.sh to server/start.sh 2026-04-28 14:17:59 +01:00
LyAhn 8901ae10b0 chore: ignore .venv-cpu (CPU-only virtual environment) 2026-04-28 14:16:10 +01:00
LyAhn 5b8b3a011d feat: add --cpu flag to start.sh — separate venv via UV_PROJECT_ENVIRONMENT 2026-04-28 14:15:11 +01:00
LyAhn e2f52473ea Merge pull request #1 from JezzWTF/copilot/create-vibepod-tts-podcast-generator
Add VibePod — Next.js 15 TTS podcast generator GUI backed by VibeVoice 0.5B
2026-04-28 00:33:32 +01:00
LyAhn 34ec879cdb feat: add studio roadmap and streaming cleanup 2026-04-28 00:09:15 +01:00
copilot-swe-agent[bot] 11ffc7df7c Improve dev startup: model download script, loading state in health check, faster polling
Agent-Logs-Url: https://github.com/JezzWTF/vibepod/sessions/3c05c740-b0a3-497d-88f1-dfa63121424d

Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>
2026-04-27 16:00:53 +00:00
copilot-swe-agent[bot] 3974a4cf69 Create VibePod TTS podcast generator application
Agent-Logs-Url: https://github.com/JezzWTF/vibepod/sessions/a78fcf03-e979-4777-a428-18cc8eccc095

Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>
2026-04-27 15:41:46 +00:00
copilot-swe-agent[bot] ee85bece74 Initial plan 2026-04-27 15:28:49 +00:00
LyAhn c75501e14e Initial commit 2026-04-27 16:28:46 +01:00