Echo Sampler Node Documentation

Overview

The Echo_Sampler node in the ComfyUI_EchoMimic repository generates audio-driven portrait animations. It takes an input image and an audio track and uses pre-trained neural networks to produce video frames in which the portrait's expressions and movements follow the audio. The node supports both the V1 and V2 versions of EchoMimic.

Functionality

The Echo_Sampler node takes images and audio as input and runs them through pre-trained models to produce an animated video. Its infer modes cover both audio-driven and pose-driven animation generation. The node is available under the EchoMimic category within the ComfyUI interface.

Inputs

The Echo_Sampler node accepts the following inputs:

Image: The input image for animation (should be in IMAGE format, typically RGB).
Audio: The accompanying audio file for the animation (should be in AUDIO format).
Model: The EchoMimic model pipeline used for processing.
Face Detector: Model for face detection, which varies depending on the selected version and infer mode.
Pose Directory: Specifies the directory containing pose-related data or models.
Seed: Integer used to initialize the random number generator; fixing it helps make results repeatable.
Configuration Parameters:
- CFG: Guidance scale for the generation process, from 0.0 to 10.0.
- Steps: Integer number of sampling steps used to generate the animation, from 1 to 100.
- FPS (Frames Per Second): Frame rate of the output video, from 5 to 120.
- Sample Rate: Audio sample rate in Hz, from 8000 to 48000.
Facemask Ratio and Facecrop Ratio: Float values affecting face processing in the animation, each ranging from 0.0 to 1.0.
Context Frames, Context Overlap: Parameters that help manage temporal coherence. Context Frames ranges from 0 to 50 and Context Overlap from 0 to 10.
Length, Width, and Height: Length of the output video and its width and height in pixels.
Save Video: Boolean option to save the output video directly.
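
As an illustration of how Context Frames and Context Overlap can work together, the sketch below splits a frame sequence into overlapping windows so that neighboring windows share frames. It assumes a plain uniform sliding window; the function name and the scheduling scheme are assumptions made for illustration, and the node's actual scheduler may differ.

```python
from typing import List


def context_windows(total_frames: int, context_frames: int, context_overlap: int) -> List[List[int]]:
    """Split frame indices into overlapping windows (illustrative only).

    Hypothetical helper: not part of ComfyUI_EchoMimic.
    """
    if context_frames <= 0 or total_frames <= context_frames:
        # Nothing to split: handle every frame in a single window.
        return [list(range(total_frames))]
    if not 0 <= context_overlap < context_frames:
        raise ValueError("context_overlap must be >= 0 and smaller than context_frames")

    # Each new window starts this many frames after the previous one.
    stride = context_frames - context_overlap
    windows = []
    start = 0
    while True:
        end = min(start + context_frames, total_frames)
        windows.append(list(range(start, end)))
        if end == total_frames:
            break
        start += stride
    return windows


if __name__ == "__main__":
    for window in context_windows(total_frames=10, context_frames=4, context_overlap=1):
        print(window)
```

With 10 frames, a window of 4, and an overlap of 1, this yields three windows that each share one frame with the next. A larger overlap gives more shared frames between windows at the cost of more windows to process.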

Optional Inputs

The node also supports these optional parameters:

Visualizer: Integrates a visualizer model for refining animations.
Video Images: Accepts multiple image arrays when working with video-based inputs.

Outputs

The node produces the following outputs:

Image: The generated video frames in IMAGE format.
Audio: The processed audio associated with those video frames.
Frame Rate: A float value representing the frame rate of the output video, reflecting the FPS input setting.
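
The inputs and outputs listed above map naturally onto ComfyUI's custom node conventions (an INPUT_TYPES classmethod plus RETURN_TYPES, FUNCTION, and CATEGORY attributes). The following is a minimal sketch of such a declaration, not the repository's actual class: the key names, the custom socket names ("ECHO_MODEL", "FACE_DETECTOR", "VISUALIZER"), the method name, and the FPS type are assumptions, and defaults are left out because the documentation does not state them. Only the numeric ranges, the category, and the IMAGE/AUDIO/FLOAT outputs come from the text above.

```python
class EchoSamplerSketch:
    """Illustrative stand-in for the node declaration (hypothetical names)."""

    CATEGORY = "EchoMimic"
    FUNCTION = "sample"  # hypothetical entry-point name
    RETURN_TYPES = ("IMAGE", "AUDIO", "FLOAT")
    RETURN_NAMES = ("image", "audio", "frame_rate")  # hypothetical labels

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "image": ("IMAGE",),
                "audio": ("AUDIO",),
                "model": ("ECHO_MODEL",),            # hypothetical socket type
                "face_detector": ("FACE_DETECTOR",),  # hypothetical socket type
                "pose_dir": ("STRING", {}),           # widget type assumed
                "seed": ("INT", {}),
                "cfg": ("FLOAT", {"min": 0.0, "max": 10.0}),
                "steps": ("INT", {"min": 1, "max": 100}),
                "fps": ("FLOAT", {"min": 5.0, "max": 120.0}),  # type assumed
                "sample_rate": ("INT", {"min": 8000, "max": 48000}),
                "facemask_ratio": ("FLOAT", {"min": 0.0, "max": 1.0}),
                "facecrop_ratio": ("FLOAT", {"min": 0.0, "max": 1.0}),
                "context_frames": ("INT", {"min": 0, "max": 50}),
                "context_overlap": ("INT", {"min": 0, "max": 10}),
                "length": ("INT", {}),
                "width": ("INT", {}),
                "height": ("INT", {}),
                "save_video": ("BOOLEAN", {}),
            },
            "optional": {
                "visualizer": ("VISUALIZER",),  # hypothetical socket type
                "video_images": ("IMAGE",),
            },
        }


def out_of_range(values):
    """Return the names in `values` that fall outside the documented ranges."""
    specs = EchoSamplerSketch.INPUT_TYPES()["required"]
    bad = []
    for name, value in values.items():
        spec = specs.get(name)
        if spec is None:
            continue  # unknown names are ignored in this sketch
        options = spec[1] if len(spec) > 1 else {}
        low, high = options.get("min"), options.get("max")
        if (low is not None and value < low) or (high is not None and value > high):
            bad.append(name)
    return bad


if __name__ == "__main__":
    print(out_of_range({"cfg": 12.0, "steps": 30, "fps": 25}))
```

The small out_of_range helper is only a convenience for checking a set of values against the documented ranges before building a workflow; ComfyUI itself enforces min and max on its widgets.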

Usage in ComfyUI Workflows

The Echo_Sampler can be used in ComfyUI workflows to animate a portrait from an input image and a corresponding audio track. The pre-trained models handle the generation, while the adjustable parameters let users control the fidelity and style of the result. Use cases include animated avatars, virtual puppeteering, and audio-driven video synthesis.
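
When planning a workflow, it can help to estimate how many frames a given audio clip corresponds to at a chosen frame rate. The sketch below is plain arithmetic under stated assumptions: the video is meant to cover the whole clip, and an optional cap (here called max_frames, a hypothetical name) limits the result. How the node itself relates Length to audio duration is not described in this document, so treat this as a planning aid rather than the node's behavior.

```python
import math
from typing import Optional


def frames_for_audio(num_samples: int, sample_rate: int, fps: float,
                     max_frames: Optional[int] = None) -> int:
    """Estimate the frame count needed to cover an audio clip (illustrative)."""
    if sample_rate <= 0 or fps <= 0:
        raise ValueError("sample_rate and fps must be positive")
    duration_s = num_samples / sample_rate  # clip duration in seconds
    frames = math.ceil(duration_s * fps)    # round up so the clip is fully covered
    if max_frames is not None:
        frames = min(frames, max_frames)    # optional cap (assumed behavior)
    return frames


if __name__ == "__main__":
    # A 4-second clip sampled at 16000 Hz, rendered at 25 frames per second.
    print(frames_for_audio(num_samples=64000, sample_rate=16000, fps=25))
```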

Special Features and Considerations

Infer Modes: The node supports multiple modes for audio-driven and pose-driven animations, including options that consider acceleration models and custom pose data.
Version Flexibility: Users can switch between V1 and V2 for varying features and optimizations, with each version having specific default behaviors and optional inputs.
Advanced Face Processing: The node integrates face detection and pose estimation models such as DWPose and Sapiens, particularly in V2, to produce more realistic movements.
Low VRAM Mode: For systems with limited GPU memory, the node can operate in a low VRAM mode, trading off speed for memory efficiency.

Overall, the Echo_Sampler node brings EchoMimic's audio-driven and pose-driven portrait animation into ComfyUI as a single configurable node. For more information, refer to the ComfyUI_EchoMimic GitHub repository.
