perf: migrate to JezzWTF/VibeVoice fork, parallel CFG executors

Switch the vibevoice dependency from microsoft/VibeVoice to the
JezzWTF/VibeVoice fork (commit e76701f), which contains the async decode
and parallel CFG optimisations directly in generate(). This removes the
instance-method patching approach (vibevoice_generate_patch.py is deleted).

server/vibevoice_server.py:
- Add _cfg_executor (ThreadPoolExecutor, 1 worker) alongside _decode_executor
- _install_cpu_pipeline_optimizations now sets both executors directly as
  model._vibepod_decode_executor and model._vibepod_cfg_executor
- Shut down both executors in lifespan on exit
- Remove vibevoice_generate_patch import/install (no longer needed)

server/pyproject.toml:
- Change vibevoice source to git+https://github.com/JezzWTF/VibeVoice.git
- No machine-local paths; works identically on any clone
2026-04-30 21:30:07 +01:00
parent 7591d15a52
commit 98e2bf9237
4 changed files with 36 additions and 496 deletions
+2 -1
@@ -8,7 +8,8 @@ dependencies = [
 # To switch back to CPU-only, remove the [tool.uv.sources] torch entry below.
 "torch>=2.0.0",
 # VibeVoice custom model + processor classes (not yet in upstream transformers)
-"vibevoice @ git+https://github.com/microsoft/VibeVoice.git",
+# Uses JezzWTF/VibeVoice fork so VibePod-specific optimisations land here.
+"vibevoice @ git+https://github.com/JezzWTF/VibeVoice.git",
 # Exact version required by vibevoice's streaming TTS module
 "transformers==4.51.3",
 "fastapi>=0.111.0",