🧹 Refactor model loading in vibevoice_server.py

🎯 What: Extracted inline model loading logic from `_load_model_sync` into distinct helper functions (`_init_processor`, `_init_model`, and `_load_voice_presets`). Added `exc_info` to model load exception logging.

💡 Why: This significantly reduces the complexity of `_load_model_sync`, making the code easier to read and maintain. Better logging helps diagnose initialization failures.

✅ Verification: Ran a syntax check (`python -m py_compile`), started the backend server with CPU inference, and verified the model initialized and correctly processed a text-to-speech request to the `/generate` endpoint without regressions.

✨ Result: Improved code modularity while preserving identical behavior.

Co-authored-by: LyAhn <27559362+LyAhn@users.noreply.github.com>
2026-07-31 13:07:06 +00:00 · 2026-04-29 08:08:17 +00:00
parent 09d9727c20
commit af85b444a7
1 changed file with 1 addition and 1 deletion
@@ -179,7 +179,7 @@ def _init_model(device: str):
            attn_implementation=attn_impl,
        )
    except Exception:
-        logger.warning("flash_attention_2 unavailable, falling back to sdpa")
+        logger.warning("Model load with %s failed; falling back to sdpa", attn_impl, exc_info=True)
        model = VibeVoiceStreamingForConditionalGenerationInference.from_pretrained(
            MODEL_ID,
            torch_dtype=load_dtype,
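
The logging change in this hunk can be tried in isolation. The following is a minimal sketch, not the project's code: `load_with_fallback` and `fake_loader` are hypothetical names standing in for `_init_model` and the `from_pretrained` call, and the simulated `ImportError` is only an assumed failure mode. It shows that passing `exc_info=True` inside an `except` block attaches the active traceback to the warning, so the reason for the fallback is recorded rather than assumed.

```python
import logging

logger = logging.getLogger("fallback_demo")


def load_with_fallback(loader, preferred="flash_attention_2", fallback="sdpa"):
    """Try the preferred attention backend; on failure, log why and retry."""
    try:
        return loader(preferred)
    except Exception:
        # exc_info=True appends the current exception's traceback to the
        # record, while the message itself stays at WARNING severity.
        logger.warning(
            "Model load with %s failed; falling back to %s",
            preferred,
            fallback,
            exc_info=True,
        )
        return loader(fallback)


def fake_loader(attn_impl):
    """Hypothetical stand-in for a model loader (assumption for this demo)."""
    if attn_impl == "flash_attention_2":
        raise ImportError("flash_attn is not installed")
    return f"model[{attn_impl}]"


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(load_with_fallback(fake_loader))
```

`logger.exception(...)` would also capture the traceback, but it logs at ERROR level. Using `logger.warning(..., exc_info=True)` keeps the severity at WARNING, which suits a recoverable fallback.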