ComfyUI Sonic

Overview

The ComfyUI Sonic repository integrates the Sonic method into ComfyUI. Sonic improves portrait animation by using global audio perception, and the repository exposes the method as custom nodes that connect audio input to facial animation inside a ComfyUI workflow.

Sonic is based on the paper "Shifting Focus to Global Audio Perception in Portrait Animation," and the implementation within this repository allows you to leverage this technique when working within ComfyUI workflows.

Installation Guide

To incorporate ComfyUI Sonic into your workspace, follow the steps below:

1. Clone the Repository

Navigate to the ComfyUI custom node directory and execute the command to clone the repository:

cd ./ComfyUI/custom_nodes
git clone https://github.com/smthemex/ComfyUI_Sonic.git

2. Install the Required Packages

Navigate into the repository directory and install the necessary dependencies:

cd ComfyUI_Sonic
pip install -r requirements.txt

3. Download the Models

3.1 Sonic Model Checkpoints

Download the required checkpoints from Google Drive.
Download openai/whisper-tiny from Hugging Face.

Organize them in the following structure:

--  ComfyUI/models/sonic/
    |-- audio2bucket.pth
    |-- audio2token.pth
    |-- unet.pth
    |-- yoloface_v5m.pt
    |-- whisper-tiny/
        |-- config.json
        |-- model.safetensors
        |-- preprocessor_config.json
    |-- RIFE/
        |-- flownet.pkl
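
Before launching ComfyUI, it can help to confirm that every file in the layout above is in place. The following is a minimal sketch, not part of the repository: the function name missing_sonic_files and the assumption that the script runs from the directory containing ComfyUI/ are illustrative only. The file list is copied from the structure shown above.

```python
from pathlib import Path

# Relative paths copied from the directory layout above.
EXPECTED_FILES = [
    "audio2bucket.pth",
    "audio2token.pth",
    "unet.pth",
    "yoloface_v5m.pt",
    "whisper-tiny/config.json",
    "whisper-tiny/model.safetensors",
    "whisper-tiny/preprocessor_config.json",
    "RIFE/flownet.pkl",
]


def missing_sonic_files(sonic_dir):
    """Return the expected files that are not present under sonic_dir.

    Hypothetical helper for illustration; it is not provided by ComfyUI_Sonic.
    """
    root = Path(sonic_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumption: run from the directory that contains ComfyUI/.
    missing = missing_sonic_files("ComfyUI/models/sonic")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected Sonic model files are present.")
```

An empty result means the layout matches the listing; any printed path points to a file that still needs to be downloaded or moved.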

3.2 SVD Checkpoints

Download either svd_xt.safetensors or svd_xt_1_1.safetensors from Stable Video Diffusion on Hugging Face.

Place it in:

--  ComfyUI/models/checkpoints/
    |-- svd_xt.safetensors  or  svd_xt_1_1.safetensors
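
Since either of the two SVD files is acceptable, a quick check only needs to find one of them. This is a small sketch under the same assumptions as the layout above; find_svd_checkpoint is a hypothetical helper name, not something the repository defines.

```python
from pathlib import Path

# Either file name satisfies the SVD requirement described above.
SVD_CANDIDATES = ("svd_xt.safetensors", "svd_xt_1_1.safetensors")


def find_svd_checkpoint(checkpoints_dir):
    """Return the path of the first SVD checkpoint found, or None.

    Hypothetical helper for illustration; it is not provided by ComfyUI_Sonic.
    """
    root = Path(checkpoints_dir)
    for name in SVD_CANDIDATES:
        candidate = root / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Assumption: run from the directory that contains ComfyUI/.
    found = find_svd_checkpoint("ComfyUI/models/checkpoints")
    print(found if found else "No SVD checkpoint found.")
```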

Custom Nodes

This repository provides the following nodes, implemented in the sonic_node.py file:

SONICTLoader: Handles data loading tasks.
SONIC_PreData: Prepares data for processing.
SONICSampler: Executes the sampling process for animations.

Key Features and Capabilities

Global Audio Perception: Rather than focusing on localized features, Sonic shifts the focus to global audio perception when driving the animation, with the aim of more realistic portrait animation.
Flexible Device Compatibility: Updates address device compatibility issues, including fixes for errors on CUDA and MPS devices.
Dynamic Frame and Audio Management: The amount of audio used for inference ('infer audio seconds') is set with a 'duration' parameter rather than with a frame count.
Enhanced Output: Supports non-square output images, which may help in cases where memory constraints (OOM) would otherwise be a limitation.
Adjustable Output Resolution: The minimum size of output images can be set, which gives some control over resource consumption.
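
To make the relationship between a duration in seconds and a frame count concrete, the arithmetic can be sketched as below. Everything here is an assumption for illustration: the function name, the default frame rate of 25 fps, and the choice to cap the duration at the audio length are not taken from the repository, whose internal conversion may differ.

```python
import math


def frames_for_duration(duration_s, audio_length_s, fps=25):
    """Frames needed to cover the requested duration, capped at the audio length.

    Hypothetical helper; the fps default and the capping rule are assumptions.
    """
    if duration_s <= 0 or audio_length_s <= 0 or fps <= 0:
        raise ValueError("duration, audio length, and fps must be positive")
    # Never request more seconds than the audio actually contains.
    seconds = min(duration_s, audio_length_s)
    return math.ceil(seconds * fps)


if __name__ == "__main__":
    # Under the assumed 25 fps, a 2-second duration covers 50 frames.
    print(frames_for_duration(2.0, audio_length_s=10.0))
```

Working in seconds means the same setting stays meaningful if the frame rate changes, which is the practical advantage of a duration over a raw frame number.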

Utility in ComfyUI Workflows

The ComfyUI Sonic repository adds audio-driven animation to ComfyUI workflows centered on face animation. Its three custom nodes, SONICTLoader, SONIC_PreData, and SONICSampler, plug into existing workflows, which makes the repository useful for creating realistic portrait animations that respond to an audio track.
