vibepod

mirror of https://github.com/JezzWTF/vibepod.git synced 2026-06-13 03:58:07 +00:00

Author	SHA1	Message	Date
LyAhn	13085166fb	feat(phase-1): persistent generation library - Save every completed generation to SQLite (generation_store.py) with WAV and waveform peaks written to data/generations/<id>/ - Deferred DB write until success — cancelled/errored generations never touch the DB and never appear in the library - Fixed cancel+regenerate IndexError: _reset_scheduler_caches() now directly zeros scheduler._step_index and running state in addition to clearing VibePod cache dicts; same explicit resets added in the fresh path of prepare_noise_scheduler as belt-and-suspenders - Added /library page with GenerationCard, WaveformPreview, waveform fetch, play/pause, download, delete, pagination, empty + error states - Added generation API routes (list, single, audio stream, waveform, delete) proxying to Python server - Added Library nav link to Header with active state - Persist script/speaker/CFG to localStorage so generate page state survives navigation - Updated build plan: Phase 0+1 ticked off, better-sqlite3 moved to Phase 2, architectural note on Python owning all persistence	2026-05-02 23:05:11 +01:00
LyAhn	8d4b3f3af7	style: apply ruff formatting and lint fixes to server	2026-05-01 19:06:13 +01:00
LyAhn	01ab3d1fc4	perf(cpu): tune streaming playback Keep CPU async decode enabled without CFG parallelism, expand CPU buffering defaults for smooth playback, prevent CPU startup from mutating the lockfile during thread autodetection, and document runtime tuning variables in the example environment file.	2026-04-30 23:20:46 +01:00
LyAhn	98e2bf9237	perf: migrate to JezzWTF/VibeVoice fork, parallel CFG executors Switch vibevoice dependency from microsoft/VibeVoice to JezzWTF/VibeVoice fork (commit e76701f) which contains the async decode + parallel CFG optimisations directly in generate(). Removes the instance-method patching approach (vibevoice_generate_patch.py deleted). server/vibevoice_server.py: - Add _cfg_executor (ThreadPoolExecutor, 1 worker) alongside _decode_executor - _install_cpu_pipeline_optimizations now sets both executors directly as model._vibepod_decode_executor and model._vibepod_cfg_executor - Both executors shut down in lifespan on exit - Remove vibevoice_generate_patch import/install (no longer needed) server/pyproject.toml: - vibevoice source changed to git+https://github.com/JezzWTF/VibeVoice.git - No machine-local paths; works identically on any clone	2026-04-30 21:30:07 +01:00
LyAhn	7591d15a52	perf: CPU async pipeline overlap + INT8 quantization Overlap acoustic_decode with forward_tts_lm calls using a background ThreadPoolExecutor, hiding ~72s of decode cost behind tts_lm work. Achieved 0.67x realtime (up from 0.43x, ~56% improvement). - vibevoice_generate_patch.py: patched generate() loop reordered to submit decode to thread before running connector + tts_lm×2, then resolve future. Installed as instance method via types.MethodType so uv sync reinstalling the package cannot revert the patch. - Dynamic INT8 quantization of Linear layers (VIBEPOD_QUANTIZE=1, default on CPU). prediction_head excluded — small fixed-size tensors regressed ~20% with INT8 due to pack/unpack overhead. - Auto-detect AVX512_BF16 and load model in bfloat16 if supported (VIBEPOD_CPU_BF16=auto, overridable with 0/1). - CPU thread count auto-configured from logical CPU count; OMP/MKL env vars set accordingly. Lock file preserved around uv sync --no-sources so CPU mode does not alter the shared uv.lock. - torch.compile retained as opt-in (VIBEPOD_COMPILE=1) but marked not recommended — dynamic KV cache shapes prevent kernel reuse.	2026-04-30 20:46:29 +01:00
LyAhn	75b84b211b	perf: improve streaming generation pipeline Add CUDA inference hot-path optimizations, safer attention fallback handling, and generation profiling hooks. Improve SSE streaming, browser buffering telemetry, and playback recovery while preserving default audio quality settings.	2026-04-30 18:54:14 +01:00
LyAhn	a39ec536fd	Improve CPU Inference Stability: Adaptive Buffering & Chunk Accumulation (#11 ) * Improve CPU Inference Stability: Implement Adaptive Buffering and Chunk Accumulation This change addresses audio stuttering issues when running on CPU-only hardware by: - Implementing server-side audio chunk accumulation to reduce SSE overhead. - Introducing device-aware default configurations for buffering and inference steps. - Exposing key performance parameters as environment variables. - Enabling the frontend to adaptively adjust its buffering thresholds based on the server's configuration. Changes: - Modified `server/vibevoice_server.py` to support accumulation and provide config via `/health`. - Updated `web/hooks/useStreamingGeneration.ts` to accept configurable buffering parameters. - Updated `web/app/page.tsx` to fetch and apply server-side configuration. Verified on CPU mode in the development environment. Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com> * Improve CPU Inference Stability: Implement Adaptive Buffering and Chunk Accumulation This change addresses audio stuttering issues when running on CPU-only hardware by: - Implementing server-side audio chunk accumulation to reduce SSE overhead. - Introducing device-aware default configurations for buffering and inference steps. - Exposing key performance parameters as environment variables. - Enabling the frontend to adaptively adjust its buffering thresholds based on the server's configuration. Changes: - Modified `server/vibevoice_server.py` to support accumulation and provide config via `/health`. - Updated `web/hooks/useStreamingGeneration.ts` to accept configurable buffering parameters. - Updated `web/app/page.tsx` to fetch and apply server-side configuration. Verified on CPU mode in the development environment. Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com> * Improve CPU Inference Stability: Adaptive Buffering UI & Logic This change enhances the initial CPU stability fix by: - Exposing adaptive buffering settings (Pre-buffer, Re-buffer Threshold, Resume Threshold) in a new "Advanced Buffering" UI section. - Managing buffering settings in the application state to allow for manual overrides. - Implementing robust re-initialization of buffering and inference defaults whenever the server's device (CPU/CUDA) changes. - Including the active device in the server's config object for reliable client-side detection. Verified with frontend screenshots and full build. Responds to PR feedback regarding actioning the adaptive logic. Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com> * Refine adaptive buffering: env helpers, threshold validation, a11y fixes - Extract _env_int/_env_float helpers in server to validate env-var config with graceful fallback instead of bare int/float casts - Fix inference_steps falsy-check (0 is valid) to use explicit None guard - Enforce rebufferThresholdSecs < resumeThresholdSecs in both the hook (with console.warn + clamp) and the GenerationControls UI (sliders block invalid states by auto-bumping or ignoring the drag) - Add type="button", aria-expanded, aria-controls, htmlFor, and input id attributes to GenerationControls for accessibility - Add .vscode/settings.json to .gitignore; sort package.json scripts --------- Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>	2026-04-30 16:03:35 +01:00
google-labs-jules[bot]	af85b444a7	🧹 Refactor model loading in vibevoice_server.py 🎯 What: Extracted inline model loading logic from `_load_model_sync` into distinct helper functions (`_init_processor`, `_init_model`, and `_load_voice_presets`). Added exc_info to model load exception logging. 💡 Why: This significantly reduces the complexity of `_load_model_sync`, making the code easier to read and maintain. Better logging helps diagnose initialization failures. ✅ Verification: Ran a syntax check (`python -m py_compile`), started the backend server with CPU inference, and verified the model initialized and correctly processed a text-to-speech request to the `/generate` endpoint without regressions. ✨ Result: Improved code modularity while preserving identical behavior. Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>	2026-04-29 08:08:17 +00:00
google-labs-jules[bot]	09d9727c20	🧹 Refactor model loading in vibevoice_server.py 🎯 What: Extracted inline model loading logic from `_load_model_sync` into distinct helper functions (`_init_processor`, `_init_model`, and `_load_voice_presets`). 💡 Why: This significantly reduces the complexity of `_load_model_sync`, making the code easier to read and maintain. ✅ Verification: Ran a syntax check (`python -m py_compile`), started the backend server with CPU inference, and verified the model initialized and correctly processed a text-to-speech request to the `/generate` endpoint without regressions. ✨ Result: Improved code modularity while preserving identical behavior. Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>	2026-04-28 16:35:26 +00:00
LyAhn	fa0c5ec916	Merge pull request #3 from JezzWTF/fix-unhandled-exception-leakage-12139097266042119477 🔒 [security fix] Unhandled Exception Details Exposed to Users	2026-04-28 15:38:06 +01:00
google-labs-jules[bot]	adebfceeb0	🔒 security: fix unhandled exception details exposure Replace detailed exception strings with generic error messages in the health and generate endpoints to prevent information leakage. Internal logs still contain full exception details for debugging. Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>	2026-04-28 14:36:06 +00:00
LyAhn	c8110ccdde	feat: honour VIBEPOD_DEVICE env var for CPU/CUDA device selection	2026-04-28 14:22:38 +01:00
LyAhn	34ec879cdb	feat: add studio roadmap and streaming cleanup	2026-04-28 00:09:15 +01:00

13 Commits