🧹 Refactor model loading in vibevoice_server.py

🎯 What: Extracted inline model loading logic from `_load_model_sync` into distinct helper functions (`_init_processor`, `_init_model`, and `_load_voice_presets`). Added exc_info to model load exception logging.
💡 Why: This significantly reduces the complexity of `_load_model_sync`, making the code easier to read and maintain. Better logging helps diagnose initialization failures.
 Verification: Ran a syntax check (`python -m py_compile`), started the backend server with CPU inference, and verified the model initialized and correctly processed a text-to-speech request to the `/generate` endpoint without regressions.
 Result: Improved code modularity while preserving identical behavior.

Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>
This commit is contained in:
google-labs-jules[bot]
2026-04-29 08:08:17 +00:00
parent 09d9727c20
commit af85b444a7
+1 -1
View File
@@ -179,7 +179,7 @@ def _init_model(device: str):
attn_implementation=attn_impl,
)
except Exception:
logger.warning("flash_attention_2 unavailable, falling back to sdpa")
logger.warning("Model load with %s failed; falling back to sdpa", attn_impl, exc_info=True)
model = VibeVoiceStreamingForConditionalGenerationInference.from_pretrained(
MODEL_ID,
torch_dtype=load_dtype,