perf: migrate to JezzWTF/VibeVoice fork, parallel CFG executors

Switch the vibevoice dependency from microsoft/VibeVoice to the JezzWTF/VibeVoice fork (commit e76701f), which contains the async decode + parallel CFG optimisations directly in generate(). This removes the instance-method patching approach (vibevoice_generate_patch.py deleted).

server/vibevoice_server.py:
- Add _cfg_executor (ThreadPoolExecutor, 1 worker) alongside _decode_executor
- _install_cpu_pipeline_optimizations now sets both executors directly as model._vibepod_decode_executor and model._vibepod_cfg_executor
- Both executors are shut down in lifespan on exit
- Remove the vibevoice_generate_patch import/install (no longer needed)

server/pyproject.toml:
- vibevoice source changed to git+https://github.com/JezzWTF/VibeVoice.git
- No machine-local paths; works identically on any clone
2026-06-13 03:58:07 +00:00 · 2026-04-30 21:30:07 +01:00
parent 7591d15a52
commit 98e2bf9237
4 changed files with 36 additions and 496 deletions
@@ -8,7 +8,8 @@ dependencies = [
    # To switch back to CPU-only, remove the [tool.uv.sources] torch entry below.
    "torch>=2.0.0",
    # VibeVoice custom model + processor classes (not yet in upstream transformers)
-    "vibevoice @ git+https://github.com/microsoft/VibeVoice.git",
+    # Uses JezzWTF/VibeVoice fork so VibePod-specific optimisations land here.
+    "vibevoice @ git+https://github.com/JezzWTF/VibeVoice.git",
    # Exact version required by vibevoice's streaming TTS module
    "transformers==4.51.3",
    "fastapi>=0.111.0",