vibepod

mirror of https://github.com/JezzWTF/vibepod.git synced 2026-06-01 15:22:14 +00:00

Author	SHA1	Message	Date
LyAhn	0e17c60bdc	fix(server): guard flash-attn install behind OS platform check The win_amd64 wheel URL was attempted on any OS with matching Python/ torch/CUDA tags. On Linux CUDA setups with VIBEPOD_ENABLE_FLASH_ATTN=1 this caused `uv pip install` to fail with an incompatible wheel; with set -e the script then exited before launch instead of falling back to SDPA. - Add uname -s case statement inside the version-tag match: only set the wheel URL on MINGW/CYGWIN/MSYS* (Windows/Git Bash); all other platforms print a clear message and leave FLASH_ATTN_WHEEL_URL empty - Move the install step into a separate `if [[ -n "$FLASH_ATTN_WHEEL_URL" ]]` block so non-Windows platforms skip it entirely - Wrap `uv pip install` in an `if` so a wheel failure is non-fatal and falls through to SDPA regardless of set -e - Update header comment to reflect cross-platform behaviour	2026-05-01 19:09:33 +01:00
LyAhn	01ab3d1fc4	perf(cpu): tune streaming playback Keep CPU async decode enabled without CFG parallelism, expand CPU buffering defaults for smooth playback, prevent CPU startup from mutating the lockfile during thread autodetection, and document runtime tuning variables in the example environment file.	2026-04-30 23:20:46 +01:00
LyAhn	7591d15a52	perf: CPU async pipeline overlap + INT8 quantization Overlap acoustic_decode with forward_tts_lm calls using a background ThreadPoolExecutor, hiding ~72s of decode cost behind tts_lm work. Achieved 0.67x realtime (up from 0.43x, ~56% improvement). - vibevoice_generate_patch.py: patched generate() loop reordered to submit decode to thread before running connector + tts_lm×2, then resolve future. Installed as instance method via types.MethodType so uv sync reinstalling the package cannot revert the patch. - Dynamic INT8 quantization of Linear layers (VIBEPOD_QUANTIZE=1, default on CPU). prediction_head excluded — small fixed-size tensors regressed ~20% with INT8 due to pack/unpack overhead. - Auto-detect AVX512_BF16 and load model in bfloat16 if supported (VIBEPOD_CPU_BF16=auto, overridable with 0/1). - CPU thread count auto-configured from logical CPU count; OMP/MKL env vars set accordingly. Lock file preserved around uv sync --no-sources so CPU mode does not alter the shared uv.lock. - torch.compile retained as opt-in (VIBEPOD_COMPILE=1) but marked not recommended — dynamic KV cache shapes prevent kernel reuse.	2026-04-30 20:46:29 +01:00
LyAhn	75b84b211b	perf: improve streaming generation pipeline Add CUDA inference hot-path optimizations, safer attention fallback handling, and generation profiling hooks. Improve SSE streaming, browser buffering telemetry, and playback recovery while preserving default audio quality settings.	2026-04-30 18:54:14 +01:00
google-labs-jules[bot]	edfc6dc501	🔒 secure backend by binding uvicorn to localhost 🎯 What: Changed the uvicorn host binding from 0.0.0.0 to 127.0.0.1 in server/start.sh. ⚠️ Risk: Binding to 0.0.0.0 exposes the unauthenticated backend API to any network interface, potentially allowing unauthorized access. 🛡️ Solution: Binding to 127.0.0.1 ensures the FastAPI backend is only accessible from the local machine, relying on the Next.js frontend to securely proxy external requests. Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>	2026-04-29 11:07:47 +00:00
LyAhn	5b8b3a011d	feat: add --cpu flag to start.sh — separate venv via UV_PROJECT_ENVIRONMENT	2026-04-28 14:15:11 +01:00
LyAhn	34ec879cdb	feat: add studio roadmap and streaming cleanup	2026-04-28 00:09:15 +01:00

7 Commits