vibepod

mirror of https://github.com/JezzWTF/vibepod.git synced 2026-07-31 13:07:06 +00:00

Author	SHA1	Message	Date
Claude	5c5d739bf1	Fix ROCm torch wheel not replacing CPU torch uv pip install without --reinstall-package silently skips the ROCm wheel when CPU torch already satisfies torch>=2.0.0, leaving a CPU installation in .venv-rocm and causing a broken import at startup. https://claude.ai/code/session_0168pSswiaoEf6LEx6UQWfBu	2026-05-04 09:33:12 +00:00
Claude	bb6da662de	Add AMD ROCm GPU support Introduces a third hardware mode alongside CUDA and CPU: ROCm (AMD GPU). AMD GPUs present as CUDA devices under PyTorch ROCm, so the existing GPU path is reused with minimal changes — the main additions are wheel management, device detection, and suppressing flash_attn (unsupported on ROCm). - server/vibevoice_server.py: extend _resolve_device() to recognise 'rocm' (auto-detected via torch.version.hip); add _torch_device() helper that maps 'rocm' → 'cuda' for all PyTorch API calls; apply GPU optimisations for both cuda and rocm in _init_model(); always use sdpa on ROCm; propagate _torch_device() to _load_voice_presets() map_location. - server/start.sh: add --rocm flag; sync .venv-rocm with uv sync --no-sources then replace torch with the ROCm 6.2 wheel via uv pip install; set VIBEPOD_DEVICE=rocm for uvicorn. - server/pyproject.toml: register pytorch-rocm62 index (explicit); add .venv-rocm to ruff excludes. - package.json: add dev:rocm and dev:server:rocm scripts. - README.md: document ROCm mode, prerequisites (RX 6000+, ROCm 6.2+, Linux), and new commands; expand CUDA vs CPU section to CUDA vs CPU vs ROCm. https://claude.ai/code/session_0168pSswiaoEf6LEx6UQWfBu	2026-05-04 01:54:57 +00:00
LyAhn	f4d759c385	Merge pull request #15 from JezzWTF/improve-docs-and-maintainer-notes-9165053560558121838 Improve codebase documentation and maintainer notes	2026-05-02 18:20:08 +01:00
google-labs-jules[bot]	e64048e500	Improve code documentation and maintainer notes - Add a top-level doc comment to useStreamingGeneration.ts and document the streaming lifecycle. - Add docstrings to helper functions in useStreamingGeneration.ts. - Add section comments to web/app/page.tsx around reducer state, server health polling, and generation handling. - Add file-level comments to API proxy routes explaining the security architecture. - Add a file map / maintainer guide comment to server/vibevoice_server.py. - Add docstrings for key internal helpers in server/vibevoice_server.py. - Document environment variables used by the server in server/vibevoice_server.py. - Add comments identifying VibePod-specific patches around VibeVoice internals. - Format server/vibevoice_server.py with black. Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>	2026-05-02 16:44:38 +00:00
LyAhn	0236807928	Merge pull request #12 from JezzWTF/codex/gen-optim Improve streaming generation flow and CPU startup handling	2026-05-02 16:46:59 +01:00
LyAhn	0e17c60bdc	fix(server): guard flash-attn install behind OS platform check The win_amd64 wheel URL was attempted on any OS with matching Python/ torch/CUDA tags. On Linux CUDA setups with VIBEPOD_ENABLE_FLASH_ATTN=1 this caused `uv pip install` to fail with an incompatible wheel; with set -e the script then exited before launch instead of falling back to SDPA. - Add uname -s case statement inside the version-tag match: only set the wheel URL on MINGW/CYGWIN/MSYS* (Windows/Git Bash); all other platforms print a clear message and leave FLASH_ATTN_WHEEL_URL empty - Move the install step into a separate `if [[ -n "$FLASH_ATTN_WHEEL_URL" ]]` block so non-Windows platforms skip it entirely - Wrap `uv pip install` in an `if` so a wheel failure is non-fatal and falls through to SDPA regardless of set -e - Update header comment to reflect cross-platform behaviour	2026-05-01 19:09:33 +01:00
LyAhn	8d4b3f3af7	style: apply ruff formatting and lint fixes to server	2026-05-01 19:06:13 +01:00
LyAhn	acb615b918	chore: add ruff linter/formatter for Python server - Add ruff>=0.11.0 as dev dependency via [dependency-groups] - Configure [tool.ruff]: line-length=100, py310 target, LF line endings - Lint rules: E, F, UP, B, SIM, I (ignoring E501/B905) - Add lint:server and lint:server:fix scripts to root package.json - Update format/format:check to also run ruff for server/	2026-05-01 19:05:39 +01:00
LyAhn	a351910fd2	style: apply prettier formatting across all source files	2026-05-01 18:36:42 +01:00
LyAhn	d60c5ae498	chore: add prettier + enforce LF line endings - Add .prettierrc (double quotes, 2-space, trailing comma es5, LF, 100 cols) - Add .prettierignore (excludes node_modules, .next, server/, lock files) - Add .editorconfig (LF + per-language indent rules for all editors) - Expand .gitattributes to cover all text file types with eol=lf - Add prettier@^3.5.3 devDep at workspace root with format/format:check scripts - Add format/format:check scripts to web/package.json	2026-05-01 18:36:04 +01:00
LyAhn	737d315c1a	docs: update roadmap and ignore Claude settings	2026-04-30 23:32:20 +01:00
LyAhn	01ab3d1fc4	perf(cpu): tune streaming playback Keep CPU async decode enabled without CFG parallelism, expand CPU buffering defaults for smooth playback, prevent CPU startup from mutating the lockfile during thread autodetection, and document runtime tuning variables in the example environment file.	2026-04-30 23:20:46 +01:00
LyAhn	d80d5ba46b	fix: update lock to vibevoice fe832f2 (inference_mode thread fix)	2026-04-30 21:46:02 +01:00
LyAhn	98e2bf9237	perf: migrate to JezzWTF/VibeVoice fork, parallel CFG executors Switch vibevoice dependency from microsoft/VibeVoice to JezzWTF/VibeVoice fork (commit e76701f) which contains the async decode + parallel CFG optimisations directly in generate(). Removes the instance-method patching approach (vibevoice_generate_patch.py deleted). server/vibevoice_server.py: - Add _cfg_executor (ThreadPoolExecutor, 1 worker) alongside _decode_executor - _install_cpu_pipeline_optimizations now sets both executors directly as model._vibepod_decode_executor and model._vibepod_cfg_executor - Both executors shut down in lifespan on exit - Remove vibevoice_generate_patch import/install (no longer needed) server/pyproject.toml: - vibevoice source changed to git+https://github.com/JezzWTF/VibeVoice.git - No machine-local paths; works identically on any clone	2026-04-30 21:30:07 +01:00
LyAhn	7591d15a52	perf: CPU async pipeline overlap + INT8 quantization Overlap acoustic_decode with forward_tts_lm calls using a background ThreadPoolExecutor, hiding ~72s of decode cost behind tts_lm work. Achieved 0.67x realtime (up from 0.43x, ~56% improvement). - vibevoice_generate_patch.py: patched generate() loop reordered to submit decode to thread before running connector + tts_lm×2, then resolve future. Installed as instance method via types.MethodType so uv sync reinstalling the package cannot revert the patch. - Dynamic INT8 quantization of Linear layers (VIBEPOD_QUANTIZE=1, default on CPU). prediction_head excluded — small fixed-size tensors regressed ~20% with INT8 due to pack/unpack overhead. - Auto-detect AVX512_BF16 and load model in bfloat16 if supported (VIBEPOD_CPU_BF16=auto, overridable with 0/1). - CPU thread count auto-configured from logical CPU count; OMP/MKL env vars set accordingly. Lock file preserved around uv sync --no-sources so CPU mode does not alter the shared uv.lock. - torch.compile retained as opt-in (VIBEPOD_COMPILE=1) but marked not recommended — dynamic KV cache shapes prevent kernel reuse.	2026-04-30 20:46:29 +01:00
LyAhn	75b84b211b	perf: improve streaming generation pipeline Add CUDA inference hot-path optimizations, safer attention fallback handling, and generation profiling hooks. Improve SSE streaming, browser buffering telemetry, and playback recovery while preserving default audio quality settings.	2026-04-30 18:54:14 +01:00
LyAhn	a39ec536fd	Improve CPU Inference Stability: Adaptive Buffering & Chunk Accumulation (#11 ) * Improve CPU Inference Stability: Implement Adaptive Buffering and Chunk Accumulation This change addresses audio stuttering issues when running on CPU-only hardware by: - Implementing server-side audio chunk accumulation to reduce SSE overhead. - Introducing device-aware default configurations for buffering and inference steps. - Exposing key performance parameters as environment variables. - Enabling the frontend to adaptively adjust its buffering thresholds based on the server's configuration. Changes: - Modified `server/vibevoice_server.py` to support accumulation and provide config via `/health`. - Updated `web/hooks/useStreamingGeneration.ts` to accept configurable buffering parameters. - Updated `web/app/page.tsx` to fetch and apply server-side configuration. Verified on CPU mode in the development environment. Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com> * Improve CPU Inference Stability: Implement Adaptive Buffering and Chunk Accumulation This change addresses audio stuttering issues when running on CPU-only hardware by: - Implementing server-side audio chunk accumulation to reduce SSE overhead. - Introducing device-aware default configurations for buffering and inference steps. - Exposing key performance parameters as environment variables. - Enabling the frontend to adaptively adjust its buffering thresholds based on the server's configuration. Changes: - Modified `server/vibevoice_server.py` to support accumulation and provide config via `/health`. - Updated `web/hooks/useStreamingGeneration.ts` to accept configurable buffering parameters. - Updated `web/app/page.tsx` to fetch and apply server-side configuration. Verified on CPU mode in the development environment. Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com> * Improve CPU Inference Stability: Adaptive Buffering UI & Logic This change enhances the initial CPU stability fix by: - Exposing adaptive buffering settings (Pre-buffer, Re-buffer Threshold, Resume Threshold) in a new "Advanced Buffering" UI section. - Managing buffering settings in the application state to allow for manual overrides. - Implementing robust re-initialization of buffering and inference defaults whenever the server's device (CPU/CUDA) changes. - Including the active device in the server's config object for reliable client-side detection. Verified with frontend screenshots and full build. Responds to PR feedback regarding actioning the adaptive logic. Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com> * Refine adaptive buffering: env helpers, threshold validation, a11y fixes - Extract _env_int/_env_float helpers in server to validate env-var config with graceful fallback instead of bare int/float casts - Fix inference_steps falsy-check (0 is valid) to use explicit None guard - Enforce rebufferThresholdSecs < resumeThresholdSecs in both the hook (with console.warn + clamp) and the GenerationControls UI (sliders block invalid states by auto-bumping or ignoring the drag) - Add type="button", aria-expanded, aria-controls, htmlFor, and input id attributes to GenerationControls for accessibility - Add .vscode/settings.json to .gitignore; sort package.json scripts --------- Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>	2026-04-30 16:03:35 +01:00
LyAhn	87185e6289	Merge pull request #9 from JezzWTF/jules-7774489883029094316-4fc49929 🔒 secure backend by binding uvicorn to localhost	2026-04-29 13:44:24 +01:00
google-labs-jules[bot]	706b318abb	🔒 secure backend by binding uvicorn to localhost 🎯 What: Changed the uvicorn host binding from 0.0.0.0 to 127.0.0.1 in server/start.sh. ⚠️ Risk: Binding to 0.0.0.0 exposes the unauthenticated backend API to any network interface, potentially allowing unauthorized access. 🛡️ Solution: Binding to 127.0.0.1 ensures the FastAPI backend is only accessible from the local machine, relying on the Next.js frontend to securely proxy external requests. Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>	2026-04-29 12:42:40 +00:00
google-labs-jules[bot]	edfc6dc501	🔒 secure backend by binding uvicorn to localhost 🎯 What: Changed the uvicorn host binding from 0.0.0.0 to 127.0.0.1 in server/start.sh. ⚠️ Risk: Binding to 0.0.0.0 exposes the unauthenticated backend API to any network interface, potentially allowing unauthorized access. 🛡️ Solution: Binding to 127.0.0.1 ensures the FastAPI backend is only accessible from the local machine, relying on the Next.js frontend to securely proxy external requests. Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>	2026-04-29 11:07:47 +00:00
LyAhn	84e387ec42	Merge pull request #7 from JezzWTF/refactor-audio-player-abort-controller-5486095809189155006 🧹 [Refactor] Use AbortController for event listeners in useAudioPlayer	2026-04-29 09:26:33 +01:00
google-labs-jules[bot]	153b63a90c	🧹 [Refactor] Use AbortController for event listeners in useAudioPlayer - Replaced multiple named event handler functions with inline state setters. - Used an AbortController to cleanly remove all event listeners with a single `controller.abort()` call in the cleanup hook. - This improves maintainability and readability by reducing verbosity without changing functionality. - Formatted inline callbacks across multiple lines for better readability as requested. Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>	2026-04-29 08:19:17 +00:00
LyAhn	e3d12819a4	Merge pull request #5 from JezzWTF/refactor-load-model-1716154043227557412 🧹 Refactor _load_model_sync to improve code readability	2026-04-29 09:10:36 +01:00
google-labs-jules[bot]	af85b444a7	🧹 Refactor model loading in vibevoice_server.py 🎯 What: Extracted inline model loading logic from `_load_model_sync` into distinct helper functions (`_init_processor`, `_init_model`, and `_load_voice_presets`). Added exc_info to model load exception logging. 💡 Why: This significantly reduces the complexity of `_load_model_sync`, making the code easier to read and maintain. Better logging helps diagnose initialization failures. ✅ Verification: Ran a syntax check (`python -m py_compile`), started the backend server with CPU inference, and verified the model initialized and correctly processed a text-to-speech request to the `/generate` endpoint without regressions. ✨ Result: Improved code modularity while preserving identical behavior. Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>	2026-04-29 08:08:17 +00:00
LyAhn	9c116b3219	Merge pull request #6 from JezzWTF/jules-2757652384498664157-be7c14be docs: add DESIGN.md	2026-04-29 08:45:25 +01:00
LyAhn	68174b9d67	feat: surface VIBEPOD_DEVICE (CPU/CUDA) in the frontend header	2026-04-29 08:43:07 +01:00
LyAhn	b8f59875d9	docs: add AGENTS.md and update README with CPU/CUDA setup	2026-04-29 08:33:43 +01:00
google-labs-jules[bot]	18a97e0bea	🧹 [Refactor] Use AbortController for event listeners in useAudioPlayer - Replaced multiple named event handler functions with inline state setters. - Used an AbortController to cleanly remove all event listeners with a single `controller.abort()` call in the cleanup hook. - This improves maintainability and readability by reducing verbosity without changing functionality. Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>	2026-04-28 19:23:44 +00:00
google-labs-jules[bot]	a80220d03f	docs: add DESIGN.md Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>	2026-04-28 16:48:25 +00:00
google-labs-jules[bot]	09d9727c20	🧹 Refactor model loading in vibevoice_server.py 🎯 What: Extracted inline model loading logic from `_load_model_sync` into distinct helper functions (`_init_processor`, `_init_model`, and `_load_voice_presets`). 💡 Why: This significantly reduces the complexity of `_load_model_sync`, making the code easier to read and maintain. ✅ Verification: Ran a syntax check (`python -m py_compile`), started the backend server with CPU inference, and verified the model initialized and correctly processed a text-to-speech request to the `/generate` endpoint without regressions. ✨ Result: Improved code modularity while preserving identical behavior. Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>	2026-04-28 16:35:26 +00:00
LyAhn	59d3280cb5	Merge pull request #4 from JezzWTF/jules-refactor-spinner-7897005482205256093 🧹 [Code Health] Extract duplicated SVG spinner into a shared component	2026-04-28 15:56:40 +01:00
google-labs-jules[bot]	2d2ab26994	🧹 [Code Health] Extract duplicated SVG spinner into a shared component 🎯 What: Extracted the duplicated `<svg>` spinner code in `web/components/GenerationControls.tsx` into a new lightweight React component `SpinnerIcon`. 💡 Why: This improves maintainability and keeps the code DRY by removing the inline duplication of the SVG path and properties. ✅ Verification: Ran `pnpm install` and `pnpm run build` in the `web` directory, confirming the code compiles successfully. ✨ Result: The `isGenerating` and `!serverReady` branches now cleanly reference the `<SpinnerIcon />` component. Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>	2026-04-28 14:53:41 +00:00
LyAhn	fa0c5ec916	Merge pull request #3 from JezzWTF/fix-unhandled-exception-leakage-12139097266042119477 🔒 [security fix] Unhandled Exception Details Exposed to Users	2026-04-28 15:38:06 +01:00
google-labs-jules[bot]	adebfceeb0	🔒 security: fix unhandled exception details exposure Replace detailed exception strings with generic error messages in the health and generate endpoints to prevent information leakage. Internal logs still contain full exception details for debugging. Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>	2026-04-28 14:36:06 +00:00
LyAhn	7b9cc8c7c2	Merge pull request #2 from JezzWTF/cleanup-offline-response-duplication-1343325800701975982 🧹 cleanup-offline-response-duplication	2026-04-28 15:19:31 +01:00
google-labs-jules[bot]	bd5c667307	chore: refactor duplicated offline response in health api route Extract the duplicated offline response payload and common headers into constants to improve maintainability and readability. - Define OFFLINE_RESPONSE for { status: "offline" } - Define COMMON_OPTIONS for { headers: { "Cache-Control": "no-store" } } - Use these constants across all response paths in the route. Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>	2026-04-28 14:17:19 +00:00
LyAhn	c8110ccdde	feat: honour VIBEPOD_DEVICE env var for CPU/CUDA device selection	2026-04-28 14:22:38 +01:00
LyAhn	64cf431c2a	feat: add dev:cpu and dev:server:cpu npm scripts	2026-04-28 14:20:17 +01:00
LyAhn	55937308b3	feat: pass --cpu flag through dev.sh to server/start.sh	2026-04-28 14:17:59 +01:00
LyAhn	8901ae10b0	chore: ignore .venv-cpu (CPU-only virtual environment)	2026-04-28 14:16:10 +01:00
LyAhn	5b8b3a011d	feat: add --cpu flag to start.sh — separate venv via UV_PROJECT_ENVIRONMENT	2026-04-28 14:15:11 +01:00
LyAhn	e2f52473ea	Merge pull request #1 from JezzWTF/copilot/create-vibepod-tts-podcast-generator Add VibePod — Next.js 15 TTS podcast generator GUI backed by VibeVoice 0.5B	2026-04-28 00:33:32 +01:00
LyAhn	34ec879cdb	feat: add studio roadmap and streaming cleanup	2026-04-28 00:09:15 +01:00
copilot-swe-agent[bot]	11ffc7df7c	Improve dev startup: model download script, loading state in health check, faster polling Agent-Logs-Url: https://github.com/JezzWTF/vibepod/sessions/3c05c740-b0a3-497d-88f1-dfa63121424d Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>	2026-04-27 16:00:53 +00:00
copilot-swe-agent[bot]	3974a4cf69	Create VibePod TTS podcast generator application Agent-Logs-Url: https://github.com/JezzWTF/vibepod/sessions/a78fcf03-e979-4777-a428-18cc8eccc095 Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>	2026-04-27 15:41:46 +00:00
copilot-swe-agent[bot]	ee85bece74	Initial plan	2026-04-27 15:28:49 +00:00
LyAhn	c75501e14e	Initial commit	2026-04-27 16:28:46 +01:00

47 Commits
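
Commit 7591d15a52 describes a CPU pipeline overlap, later folded into the JezzWTF/VibeVoice fork's `generate()` in 98e2bf9237. It can be sketched in plain Python as below.

This is a minimal illustration under stated assumptions, not the fork's actual code:
- `generate_overlapped`, `sample_latent`, `decode_audio` and `advance_lm` are hypothetical names.
- They stand in for the per-step latent sampling, `acoustic_decode`, and the connector + `tts_lm` forward passes.
- The sketch assumes the LM work for a step needs only the latent, not the decoded audio. That independence is what lets the decode run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

Latent = TypeVar("Latent")
Audio = TypeVar("Audio")


def generate_overlapped(
    num_steps: int,
    sample_latent: Callable[[int], Latent],
    decode_audio: Callable[[Latent], Audio],
    advance_lm: Callable[[Latent], None],
    executor: ThreadPoolExecutor,
) -> List[Audio]:
    """Hypothetical generation loop that hides decode cost behind LM work."""
    chunks: List[Audio] = []
    for step in range(num_steps):
        latent = sample_latent(step)
        # Submit the expensive decode to the background worker first...
        future = executor.submit(decode_audio, latent)
        # ...then run the LM forward passes on the calling thread while it decodes.
        advance_lm(latent)
        # Block only after the LM work is done; chunks stay in step order.
        chunks.append(future.result())
    return chunks


if __name__ == "__main__":
    lm_calls = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        out = generate_overlapped(
            3,
            lambda i: i * 10,  # stub latent sampler
            lambda z: f"audio-{z}",  # stub decoder
            lm_calls.append,  # stub LM step that just records its input
            pool,
        )
    print(out)  # ['audio-0', 'audio-10', 'audio-20']
```

The single-worker executor mirrors the one-worker `_decode_executor` mentioned in 98e2bf9237. Because `future.result()` is awaited once per step, at most one decode is in flight and chunks come back in order.

The real speedup relies on the decode spending most of its time in native code that releases the GIL, as PyTorch kernels typically do. These stubs do not model that.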