mirror of
https://github.com/JezzWTF/vibepod.git
synced 2026-06-01 15:22:14 +00:00
a39ec536fd
* Improve CPU Inference Stability: Implement Adaptive Buffering and Chunk Accumulation

  This change addresses audio stuttering issues when running on CPU-only hardware by:
  - Implementing server-side audio chunk accumulation to reduce SSE overhead.
  - Introducing device-aware default configurations for buffering and inference steps.
  - Exposing key performance parameters as environment variables.
  - Enabling the frontend to adaptively adjust its buffering thresholds based on the server's configuration.

  Changes:
  - Modified `server/vibevoice_server.py` to support accumulation and provide config via `/health`.
  - Updated `web/hooks/useStreamingGeneration.ts` to accept configurable buffering parameters.
  - Updated `web/app/page.tsx` to fetch and apply server-side configuration.

  Verified on CPU mode in the development environment.

  Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>

* Improve CPU Inference Stability: Adaptive Buffering UI & Logic

  This change enhances the initial CPU stability fix by:
  - Exposing adaptive buffering settings (Pre-buffer, Re-buffer Threshold, Resume Threshold) in a new "Advanced Buffering" UI section.
  - Managing buffering settings in the application state to allow for manual overrides.
  - Implementing robust re-initialization of buffering and inference defaults whenever the server's device (CPU/CUDA) changes.
  - Including the active device in the server's config object for reliable client-side detection.

  Verified with frontend screenshots and full build. Responds to PR feedback regarding actioning the adaptive logic.

  Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>

* Refine adaptive buffering: env helpers, threshold validation, a11y fixes

  - Extract _env_int/_env_float helpers in server to validate env-var config with graceful fallback instead of bare int/float casts
  - Fix inference_steps falsy-check (0 is valid) to use explicit None guard
  - Enforce rebufferThresholdSecs < resumeThresholdSecs in both the hook (with console.warn + clamp) and the GenerationControls UI (sliders block invalid states by auto-bumping or ignoring the drag)
  - Add type="button", aria-expanded, aria-controls, htmlFor, and input id attributes to GenerationControls for accessibility
  - Add .vscode/settings.json to .gitignore; sort package.json scripts

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
60 lines
1.4 KiB
Python
#!/usr/bin/env python3
"""
Download microsoft/VibeVoice-Realtime-0.5B to the local HuggingFace cache.

Run once before starting the server:
    python download_model.py

Set HF_HOME or HUGGINGFACE_HUB_CACHE to control where the model is stored.
Set HF_TOKEN (or HUGGINGFACE_TOKEN) if you need an access token.
"""

import os
import sys
import time

MODEL_ID = "microsoft/VibeVoice-Realtime-0.5B"

# Patterns that are not needed for PyTorch inference
_IGNORE = [
    "*.msgpack",
    "flax_model*",
    "tf_model*",
    "rust_model*",
    "*.ot",
]


def download() -> str:
    try:
        from huggingface_hub import snapshot_download
    except ImportError:
        print(
            "ERROR: huggingface_hub is not installed.\n"
            "Run: pip install huggingface_hub",
            file=sys.stderr,
        )
        sys.exit(1)

    token: str | None = os.environ.get("HF_TOKEN") or os.environ.get(
        "HUGGINGFACE_TOKEN"
    )

    print(f"Checking / downloading model: {MODEL_ID}")
    print("(This may take several minutes on first run — the model is ~1 GB)")
    start = time.time()

    cache_path = snapshot_download(
        repo_id=MODEL_ID,
        ignore_patterns=_IGNORE,
        token=token or None,
    )

    elapsed = time.time() - start
    print(f"Model ready in {elapsed:.1f}s -> {cache_path}")
    return cache_path


if __name__ == "__main__":
    download()
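
The `_IGNORE` list in the script skips weight formats that PyTorch inference does not need. As a rough illustration of which files such glob-style patterns would exclude, the sketch below applies the same patterns with the standard-library `fnmatch` module. The `should_skip` helper and the sample filenames are hypothetical and exist only for this example; the actual filtering is done inside `snapshot_download` and may differ in detail.

```python
from fnmatch import fnmatch

# Same patterns as _IGNORE in download_model.py
IGNORE = ("*.msgpack", "flax_model*", "tf_model*", "rust_model*", "*.ot")


def should_skip(filename, patterns=IGNORE):
    """Return True if filename matches any glob-style ignore pattern."""
    return any(fnmatch(filename, pattern) for pattern in patterns)


# Hypothetical repository listing, for illustration only
sample_files = [
    "config.json",
    "model.safetensors",
    "flax_model.msgpack",
    "tf_model.h5",
    "rust_model.ot",
]

# Keep only the files that no ignore pattern matches
kept = [name for name in sample_files if not should_skip(name)]
print(kept)
```

With this sample listing, only the config and the safetensors weights survive the filter, which matches the intent of the comment above `_IGNORE`.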