vibepod

LyAhn/vibepod

Fork 0

mirror of https://github.com/JezzWTF/vibepod.git synced 2026-06-13 03:58:07 +00:00

Commit Graph

Author	SHA1	Message	Date
LyAhn	7591d15a52	perf: CPU async pipeline overlap + INT8 quantization Overlap acoustic_decode with forward_tts_lm calls using a background ThreadPoolExecutor, hiding ~72s of decode cost behind tts_lm work. Achieved 0.67x realtime (up from 0.43x, ~56% improvement). - vibevoice_generate_patch.py: patched generate() loop reordered to submit decode to thread before running connector + tts_lm×2, then resolve future. Installed as instance method via types.MethodType so uv sync reinstalling the package cannot revert the patch. - Dynamic INT8 quantization of Linear layers (VIBEPOD_QUANTIZE=1, default on CPU). prediction_head excluded — small fixed-size tensors regressed ~20% with INT8 due to pack/unpack overhead. - Auto-detect AVX512_BF16 and load model in bfloat16 if supported (VIBEPOD_CPU_BF16=auto, overridable with 0/1). - CPU thread count auto-configured from logical CPU count; OMP/MKL env vars set accordingly. Lock file preserved around uv sync --no-sources so CPU mode does not alter the shared uv.lock. - torch.compile retained as opt-in (VIBEPOD_COMPILE=1) but marked not recommended — dynamic KV cache shapes prevent kernel reuse.	2026-04-30 20:46:29 +01:00

Author

SHA1

Message

Date

LyAhn

7591d15a52

perf: CPU async pipeline overlap + INT8 quantization

Overlap acoustic_decode with forward_tts_lm calls using a background
ThreadPoolExecutor, hiding ~72s of decode cost behind tts_lm work.
Achieved 0.67x realtime (up from 0.43x, ~56% improvement).

- vibevoice_generate_patch.py: patched generate() loop reordered to
  submit decode to thread before running connector + tts_lm×2, then
  resolve future. Installed as instance method via types.MethodType so
  uv sync reinstalling the package cannot revert the patch.
- Dynamic INT8 quantization of Linear layers (VIBEPOD_QUANTIZE=1,
  default on CPU). prediction_head excluded — small fixed-size tensors
  regressed ~20% with INT8 due to pack/unpack overhead.
- Auto-detect AVX512_BF16 and load model in bfloat16 if supported
  (VIBEPOD_CPU_BF16=auto, overridable with 0/1).
- CPU thread count auto-configured from logical CPU count; OMP/MKL env
  vars set accordingly. Lock file preserved around uv sync --no-sources
  so CPU mode does not alter the shared uv.lock.
- torch.compile retained as opt-in (VIBEPOD_COMPILE=1) but marked not
  recommended — dynamic KV cache shapes prevent kernel reuse.

2026-04-30 20:46:29 +01:00

1 Commits