ComfyUI_Sonic
Overview
The ComfyUI Sonic repository integrates Sonic, a method for portrait animation based on global audio perception, into ComfyUI. It provides custom nodes that drive facial animation from audio inside a ComfyUI workflow.
Sonic is based on the paper "Shifting Focus to Global Audio Perception in Portrait Animation," and the implementation within this repository allows you to leverage this technique when working within ComfyUI workflows.
Installation Guide
To incorporate ComfyUI Sonic into your workspace, follow the steps below:
1. Clone the Repository
Navigate to the ComfyUI custom nodes directory and clone the repository:
cd ./ComfyUI/custom_nodes
git clone https://github.com/smthemex/ComfyUI_Sonic.git
2. Install the Required Packages
Navigate into the repository directory and install the necessary dependencies:
cd ComfyUI_Sonic
pip install -r requirements.txt
3. Download the Models
3.1 Sonic Model Checkpoints
- Download the required checkpoints from Google Drive.
- Download openai/whisper-tiny from Hugging Face.
Organize them in the following structure:
-- ComfyUI/models/sonic/
    |-- audio2bucket.pth
    |-- audio2token.pth
    |-- unet.pth
    |-- yoloface_v5m.pt
    |-- whisper-tiny/
        |-- config.json
        |-- model.safetensors
        |-- preprocessor_config.json
    |-- RIFE/
        |-- flownet.pkl
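Before launching ComfyUI, it can help to confirm that every file sits where the layout above expects it. The following is a minimal sketch and not part of the repository: the helper name missing_sonic_files is hypothetical, and the example path assumes the script runs from the directory that contains ComfyUI/. Only the file names come from the listing above.

```python
from pathlib import Path

# Relative paths taken from the directory listing above.
EXPECTED_SONIC_FILES = [
    "audio2bucket.pth",
    "audio2token.pth",
    "unet.pth",
    "yoloface_v5m.pt",
    "whisper-tiny/config.json",
    "whisper-tiny/model.safetensors",
    "whisper-tiny/preprocessor_config.json",
    "RIFE/flownet.pkl",
]


def missing_sonic_files(sonic_dir):
    """Return the expected files that are absent under sonic_dir.

    Hypothetical helper for illustration; it only checks that each path
    exists as a file, not that the download is complete or valid.
    """
    root = Path(sonic_dir)
    return [rel for rel in EXPECTED_SONIC_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumed location: run from the directory that contains ComfyUI/.
    missing = missing_sonic_files("ComfyUI/models/sonic")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All Sonic model files found.")
```

An empty result means every listed file is present; any returned entry is a path relative to ComfyUI/models/sonic/ that still needs to be downloaded or moved.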
3.2 SVD Checkpoints
- Download either svd_xt.safetensors or svd_xt_1_1.safetensors from Stable Video Diffusion on Hugging Face.
Place it in:
-- ComfyUI/models/checkpoints/
    |-- svd_xt.safetensors or svd_xt_1_1.safetensors
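Since either SVD checkpoint is acceptable, a quick check only needs to find one of the two. This is a small sketch under the same assumptions as the directory layout above; find_svd_checkpoint is a hypothetical helper name, not something the repository provides.

```python
from pathlib import Path

# Either file name is acceptable, per the instructions above.
SVD_CANDIDATES = ("svd_xt.safetensors", "svd_xt_1_1.safetensors")


def find_svd_checkpoint(checkpoints_dir):
    """Return the path of the first SVD checkpoint found, or None.

    Hypothetical helper for illustration only.
    """
    for name in SVD_CANDIDATES:
        path = Path(checkpoints_dir) / name
        if path.is_file():
            return path
    return None
```

For example, find_svd_checkpoint("ComfyUI/models/checkpoints") returns None when neither file has been placed in the checkpoints directory yet.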
Custom Nodes
This repository provides the following nodes, implemented in the sonic_node.py file:
- SONICTLoader: Handles data loading tasks.
- SONIC_PreData: Prepares data for processing.
- SONICSampler: Executes the sampling process for animations.
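For readers unfamiliar with how ComfyUI discovers nodes like these, custom nodes are generally Python classes that declare their inputs and outputs and are exported through a NODE_CLASS_MAPPINGS dictionary. The sketch below shows that general shape with a hypothetical node. The class name ExampleDurationNode, its single duration input, and its category are assumptions made for illustration; they do not reflect the actual inputs or outputs of the Sonic nodes in sonic_node.py.

```python
class ExampleDurationNode:
    """Hypothetical node showing the usual ComfyUI node layout.

    This is not one of the Sonic nodes; it simply passes a number through.
    """

    @classmethod
    def INPUT_TYPES(cls):
        # Declares the sockets and widgets ComfyUI shows on the node.
        return {
            "required": {
                "duration": (
                    "FLOAT",
                    {"default": 5.0, "min": 0.1, "max": 60.0, "step": 0.1},
                ),
            }
        }

    RETURN_TYPES = ("FLOAT",)      # one output socket of type FLOAT
    RETURN_NAMES = ("duration",)   # label shown on that output
    FUNCTION = "run"               # name of the method ComfyUI calls
    CATEGORY = "examples"          # menu category (assumed name)

    def run(self, duration):
        # ComfyUI expects a tuple whose length matches RETURN_TYPES.
        return (float(duration),)


# ComfyUI reads these dictionaries when it imports a custom node package.
NODE_CLASS_MAPPINGS = {"ExampleDurationNode": ExampleDurationNode}
NODE_DISPLAY_NAME_MAPPINGS = {"ExampleDurationNode": "Example Duration"}
```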
Key Features and Capabilities
- Global Audio Perception: Unlike traditional methods that focus on localized features, this repository shifts the focus toward the integration of global audio perception with visual animations, enhancing the realism of portrait animations.
- Flexible Device Compatibility: Specific updates have been made to address device compatibility issues, such as fixing errors related to CUDA and MPS devices.
- Dynamic Frame and Audio Management: The length of audio to infer ('infer audio seconds') is set through a 'duration' parameter in seconds rather than as a number of frames.
- Enhanced Output: Supports non-square image outputs, which may help in cases where memory constraints would otherwise lead to out-of-memory (OOM) errors.
- Adjustable Output Resolution: The minimum size of output images can be controlled, providing flexibility to manage resource consumption.
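To make the duration and resolution features above concrete, the following sketch shows the arithmetic such settings typically imply. It is not the repository's code. The 25 fps frame rate, the rounding of dimensions to a multiple of 64, and the function names frames_for_duration and scaled_size are all assumptions chosen for illustration.

```python
import math


def frames_for_duration(duration_s, fps=25):
    """Number of frames needed to cover duration_s seconds.

    The default of 25 fps is an assumption, not a value from the repository.
    """
    if duration_s <= 0 or fps <= 0:
        raise ValueError("duration_s and fps must be positive")
    return math.ceil(duration_s * fps)


def scaled_size(width, height, min_side, multiple=64):
    """Scale (width, height) so the shorter side is about min_side.

    The aspect ratio is kept approximately, so non-square inputs stay
    non-square. Snapping to a multiple of 64 is an assumption that is
    common for diffusion models, not a documented Sonic requirement.
    """
    if min(width, height, min_side, multiple) <= 0:
        raise ValueError("all arguments must be positive")
    scale = min_side / min(width, height)

    def snap(value):
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width * scale), snap(height * scale)


if __name__ == "__main__":
    print(frames_for_duration(4.0))      # 100 frames at the assumed 25 fps
    print(scaled_size(768, 1024, 512))   # (512, 704)
```

Lowering min_side shrinks both dimensions, which reduces the number of pixels processed per frame and is the usual lever for managing memory use.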
Utility in ComfyUI Workflows
ComfyUI Sonic adds audio-driven face animation to ComfyUI workflows. It is aimed at users who want realistic portrait animations driven by audio cues: its three custom nodes, SONICTLoader, SONIC_PreData, and SONICSampler, integrate into existing ComfyUI workflows to produce animations that respond to the supplied audio.