ComfyUI Sonic

Overview

The ComfyUI Sonic repository integrates the Sonic method into ComfyUI for portrait animation driven by global audio perception. It provides custom nodes that animate a face from an audio track.

The Sonic method comes from the paper "Shifting Focus to Global Audio Perception in Portrait Animation," and this repository makes the technique available inside ComfyUI workflows.

Installation Guide

To incorporate ComfyUI Sonic into your workspace, follow the steps below:

1. Clone the Repository

Navigate to the ComfyUI custom_nodes directory and clone the repository:

cd ./ComfyUI/custom_nodes
git clone https://github.com/smthemex/ComfyUI_Sonic.git

2. Install the Required Packages

Change into the cloned repository directory and install the dependencies:

cd ComfyUI_Sonic
pip install -r requirements.txt

3. Download the Models

3.1 Sonic Model Checkpoints

Download the Sonic model checkpoints and arrange them in the following structure:

-- ComfyUI/models/sonic/
    |-- audio2bucket.pth
    |-- audio2token.pth
    |-- unet.pth
    |-- yoloface_v5m.pt
    |-- whisper-tiny/
        |-- config.json
        |-- model.safetensors
        |-- preprocessor_config.json
    |-- RIFE/
        |-- flownet.pkl
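
Because a single misplaced file is easy to miss in this layout, a quick check can help before you load a workflow. The following is a minimal sketch, not part of the repository: the function name missing_sonic_files is hypothetical, the file list is copied from the structure above, and the script assumes it is run from the directory that contains ComfyUI/.

```python
from pathlib import Path

# Relative paths copied from the directory listing in this guide.
EXPECTED_SONIC_FILES = [
    "audio2bucket.pth",
    "audio2token.pth",
    "unet.pth",
    "yoloface_v5m.pt",
    "whisper-tiny/config.json",
    "whisper-tiny/model.safetensors",
    "whisper-tiny/preprocessor_config.json",
    "RIFE/flownet.pkl",
]


def missing_sonic_files(sonic_dir):
    """Return the expected files that are not present under sonic_dir."""
    root = Path(sonic_dir)
    return [rel for rel in EXPECTED_SONIC_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumption: the current directory is the parent of ComfyUI/.
    missing = missing_sonic_files("ComfyUI/models/sonic")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All Sonic model files found.")
```

The check only confirms that each file exists at the expected path; it does not validate file contents or sizes.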

3.2 SVD Checkpoints

Place either of the following checkpoint files in:

-- ComfyUI/models/checkpoints/
    |-- svd_xt.safetensors  or  svd_xt_1_1.safetensors
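
Since either of the two file names is acceptable, a small lookup can report which one is actually installed. This is an illustrative sketch under the assumption that the checkpoint sits directly in the checkpoints folder; find_svd_checkpoint is a hypothetical helper, not a function from the repository.

```python
from pathlib import Path

# File names taken from this guide; either one is acceptable.
SVD_CANDIDATES = ("svd_xt.safetensors", "svd_xt_1_1.safetensors")


def find_svd_checkpoint(checkpoints_dir):
    """Return the path of the first SVD checkpoint found, or None."""
    root = Path(checkpoints_dir)
    for name in SVD_CANDIDATES:
        candidate = root / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Assumption: the current directory is the parent of ComfyUI/.
    found = find_svd_checkpoint("ComfyUI/models/checkpoints")
    print(found if found else "No SVD checkpoint found.")
```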

Custom Nodes

This repository provides the following nodes, implemented in the sonic_node.py file:

  • SONICTLoader
  • SONIC_PreData
  • SONICSampler

Key Features and Capabilities

  • Global Audio Perception: Where traditional methods focus on localized features, Sonic uses global audio perception to drive the visual animation, which improves the realism of portrait animations.
  • Flexible Device Compatibility: Updates fix device compatibility errors on CUDA and MPS devices.
  • Dynamic Frame and Audio Management: The length of audio to infer ('infer audio seconds') is set with the 'duration' parameter rather than as a number of frames.
  • Enhanced Output: Supports non-square output images, which may help where memory constraints would otherwise cause out-of-memory (OOM) errors.
  • Adjustable Output Resolution: The minimum size of output images can be set, which helps manage resource consumption.
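
To make the duration and resolution settings concrete, the arithmetic behind them can be sketched as follows. This is not the repository's code. The helper names are hypothetical, the 25 fps default and the rounding to multiples of 64 are assumptions chosen for illustration, and "minimum size" is interpreted here as the target length of the shorter side.

```python
import math


def frames_for_duration(duration_s, fps=25.0):
    """Number of video frames needed to cover duration_s seconds of audio.

    The 25 fps default is an assumption, not a value taken from the repository.
    """
    if duration_s <= 0 or fps <= 0:
        raise ValueError("duration_s and fps must be positive")
    return math.ceil(duration_s * fps)


def scale_to_min_side(width, height, min_side, multiple=64):
    """Scale (width, height) so the shorter side is about min_side.

    Keeps the aspect ratio and rounds each side to the nearest multiple
    (64 is an assumed value), never going below one multiple.
    """
    if min(width, height, min_side, multiple) <= 0:
        raise ValueError("all arguments must be positive")
    scale = min_side / min(width, height)

    def snap(value):
        return max(multiple, round(value * scale / multiple) * multiple)

    return snap(width), snap(height)
```

Under these assumptions, frames_for_duration(3) gives 75 frames, and lowering min_side shrinks both sides of a non-square image proportionally, which is one way a smaller minimum size reduces memory use.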

Utility in ComfyUI Workflows

ComfyUI Sonic adds audio-driven animation to ComfyUI workflows centered on face animation. It suits anyone who wants realistic portrait animations driven by audio cues, and its custom nodes (SONICTLoader, SONIC_PreData, and SONICSampler) plug into existing ComfyUI workflows.