Run ComfyUI Easily with InstaSD

Skip the complex setup. InstaSD helps creative professionals build workflows and deploy them to the world:

  • One-click deployment
  • Any model, any node
  • Powerful GPUs for rapid iteration
Get Started

Documentation

ComfyUI Gemini Repository

Introduction

The ComfyUI Gemini repository is designed to integrate Google Gemini's capabilities into the ComfyUI framework. This integration enables users to generate prompts, describe images, and engage in interactive dialogues using advanced multimodal models. The repository includes custom nodes for a seamless workflow experience, facilitating interactions with the Gemini models directly from ComfyUI.

Installation

Recommended Installation

  1. Use ComfyUI Manager for a streamlined installation process.

Manual Installation

  1. Navigate to the custom_nodes directory in your ComfyUI environment:
    cd custom_nodes
    
  2. Clone the repository:
    git clone https://github.com/ZHO-ZHO-ZHO/ComfyUI-Gemini.git
    
  3. Change into the newly created directory:
    cd custom_nodes/ComfyUI-Gemini
    
  4. Install the required dependencies:
    pip install -r requirements.txt
    
  5. Restart ComfyUI to apply the changes.

Repository Purpose

This repository enables users to leverage Google Gemini's powerful features within the ComfyUI framework. It allows for interactive and multimodal dialogues, image analysis, and generation of system instructions. With the integration of Gemini 1.5 Pro and other models, users can extend the functional capabilities of ComfyUI, enhancing both dialogue and visual processing tasks.

Available Nodes

The repository provides a variety of custom nodes tailored for different functionalities within the Gemini ecosystem. Below is the list of nodes available in this repository:

Key Features

  • Advanced Gemini 1.5 Pro Model: Supports system instructions, multimodal inputs, and large token limits (up to 1,048,576 tokens).
  • Multimodal Capability: Handles text, image, and various file types including audio and video (up to 20GB).
  • Flexible Node Options: Both implicit and explicit API Key management for enhanced security and convenience.
  • Extensive Workflow Integration: Provides nodes for seamless interaction between Gemini API and ComfyUI workflows.
  • File Upload Support: Currently allows single file uploads with plans for future support of multiple file types.

Usage in ComfyUI Workflows

The integration with ComfyUI enables users to augment their workflows with advanced dialogue and image processing capabilities powered by Google Gemini. Users can leverage the custom nodes to create sophisticated multi-turn dialogues, analyze and interpret image data, and dynamically generate system instructions. This integration enhances the overall functionality of ComfyUI, making it a powerful tool for developers and researchers working with AI-driven applications.

By leveraging these nodes, users can create comprehensive workflows that combine text, image, and video data processing in a single, cohesive pipeline.