ComfyUI_EchoMimic
Available Nodes
Echo_Sampler
Echo Sampler Node Documentation
Overview
The Echo_Sampler node in the ComfyUI_EchoMimic repository generates lifelike audio-driven portrait animations. It takes an input image and an audio clip and uses pre-trained neural networks to produce video frames that mimic human expressions and movements. It works with both the V1 and V2 versions of EchoMimic.
Functionality
The Echo_Sampler node takes images and audio as input and processes them with pre-trained models to produce an animated video output. Its infer modes cover both audio-driven and pose-driven animation generation. The node is available under the EchoMimic category in the ComfyUI interface.
Inputs
The Echo_Sampler node accepts the following inputs:
- Image: The input image for animation (should be in IMAGE format, typically RGB).
- Audio: The accompanying audio file for the animation (should be in AUDIO format).
- Model: The EchoMimic model pipeline used for processing.
- Face Detector: Model for face detection, which varies depending on the selected version and infer mode.
- Pose Directory: Specifies the directory containing pose-related data or models.
- Seed: Integer used to initialize the random number generator, which affects the randomness of the generation process.
- Configuration Parameters:
  - CFG: Guidance scale that influences the generation process, from 0.0 to 10.0.
  - Steps: Integer number of processing steps used to generate the animation, from 1 to 100.
  - FPS (Frames Per Second): Frame rate of the output video, from 5 to 120.
  - Sample Rate: Audio sample rate in Hz, from 8000 to 48000.
  - Facemask Ratio and Facecrop Ratio: Float values affecting face processing in the animation, each from 0.0 to 1.0.
  - Context Frames, Context Overlap: Parameters that help manage temporal coherence, with context frames from 0 to 50 and context overlap from 0 to 10.
  - Length, Width, and Height: Length of the output video and its pixel dimensions.
- Save Video: Boolean option to save the output video directly.
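The numeric ranges listed above can be checked before a workflow is queued. The following is a minimal sketch, not part of the node itself: the dictionary keys and the `out_of_range` helper are hypothetical names chosen for illustration, and only the ranges come from this page.

```python
# Ranges copied from the Inputs list on this page.
# The key names are hypothetical; the node's real widget names may differ.
RANGES = {
    "cfg": (0.0, 10.0),
    "steps": (1, 100),
    "fps": (5, 120),
    "sample_rate": (8000, 48000),
    "facemask_ratio": (0.0, 1.0),
    "facecrop_ratio": (0.0, 1.0),
    "context_frames": (0, 50),
    "context_overlap": (0, 10),
}


def out_of_range(settings):
    """Return {name: (value, low, high)} for each setting outside its documented range.

    Settings with no documented range are ignored.
    """
    problems = {}
    for name, value in settings.items():
        if name not in RANGES:
            continue
        low, high = RANGES[name]
        if not (low <= value <= high):
            problems[name] = (value, low, high)
    return problems


# Example: fps is above the documented maximum of 120, the rest are fine.
print(out_of_range({"cfg": 2.5, "steps": 30, "fps": 240}))
```

An empty result means every supplied value falls inside the documented range; it says nothing about whether the combination of values produces a good animation.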
Optional Inputs
The node also supports these optional parameters:
- Visualizer: Integrates a visualizer model for refining animations.
- Video Images: Accepts multiple image arrays when working with video-based inputs.
Outputs
The node produces the following outputs:
- Image: The generated video frames in IMAGE format.
- Audio: The processed audio associated with those video frames.
- Frame Rate: A float value representing the frame rate of the output video, reflecting the FPS input setting.
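Since the node returns both the frames and a frame rate, the clip duration follows from simple arithmetic. The sketch below assumes the number of generated frames is known (for example, from the batch size of the IMAGE output); the function names are hypothetical and are not part of the node.

```python
def clip_duration_seconds(num_frames: int, frame_rate: float) -> float:
    """Duration of a clip with num_frames frames played at frame_rate fps."""
    if frame_rate <= 0:
        raise ValueError("frame_rate must be positive")
    return num_frames / frame_rate


def audio_samples_needed(num_frames: int, frame_rate: float, sample_rate: int) -> int:
    """Number of audio samples that span the same duration as the frames."""
    return round(clip_duration_seconds(num_frames, frame_rate) * sample_rate)


print(clip_duration_seconds(50, 25.0))        # 2.0
print(audio_samples_needed(50, 25.0, 16000))  # 32000
```

This kind of check is useful when a downstream node combines the frames and audio into a video file, because a mismatch between the two durations usually shows up as truncated or trailing audio.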
Usage in ComfyUI Workflows
The Echo_Sampler node can be used in ComfyUI workflows to create animations from an input image and a corresponding audio clip. The pre-trained models do the generation, while the adjustable parameters let users control animation fidelity and style. Use cases include generating animated avatars, virtual puppeteering, and video synthesis driven by audio inputs.
Special Features and Considerations
- Infer Modes: The node supports multiple modes for audio-driven and pose-driven animations, including options that consider acceleration models and custom pose data.
- Version Flexibility: Users can switch between V1 and V2 for varying features and optimizations, with each version having specific default behaviors and optional inputs.
- Advanced Face Processing: The node integrates sophisticated face detection and pose estimation models like DWpose and Sapiens, particularly in V2, to ensure realistic movements.
- Low VRAM Mode: For systems with limited GPU memory, the node can operate in a low VRAM mode, trading off speed for memory efficiency.
Overall, the Echo_Sampler node makes EchoMimic's audio-driven and pose-driven portrait animation available directly within ComfyUI. For more information, refer to the ComfyUI_EchoMimic GitHub repository.