Gemini_15P_API_S_Advance_Zho Node Documentation

Overview

The Gemini_15P_API_S_Advance_Zho node is part of the ComfyUI-Gemini custom node pack for ComfyUI. It uses Google's Gemini 1.5 Pro model to generate text from a prompt. Its distinguishing feature is support for system instructions: rules about role, style, length, or format that the model follows for the whole response. This lets users tailor the output more precisely than with a prompt alone.

Functional Description

The node sends the prompt and system instruction to Gemini 1.5 Pro and returns the generated text. It also accepts an optional image alongside the prompt, which makes it suitable for multi-modal tasks such as describing a picture. A typical use is writing prompts for Stable Diffusion: the model describes an input image and condenses that description into a focused text prompt.
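As a rough illustration of what a node like this does on each run, the sketch below assembles the request and calls Gemini through the google-generativeai Python SDK (a GenerativeModel created with system_instruction, then generate_content). It assumes that SDK and is not the node's actual source; build_contents and generate_text are hypothetical names. The SDK import is deferred so the request-building helper works without it installed.

```python
from typing import Any, List, Optional


def build_contents(prompt: str, image: Optional[Any] = None) -> List[Any]:
    """Assemble the multi-modal request: the text prompt, plus an image if given."""
    contents: List[Any] = [prompt]
    if image is not None:
        # The google-generativeai SDK accepts PIL images next to strings.
        contents.append(image)
    return contents


def generate_text(
    api_key: str,
    prompt: str,
    system_instruction: str,
    model_name: str = "gemini-1.5-pro-latest",
    stream: bool = False,
    image: Optional[Any] = None,
) -> str:
    """Call Gemini once and return plain text (needs network access and a valid key)."""
    # Deferred import: only this function depends on the SDK.
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(
        model_name=model_name, system_instruction=system_instruction
    )
    response = model.generate_content(build_contents(prompt, image), stream=stream)
    if stream:
        # Streaming yields partial responses; join their text into one string.
        return "".join(chunk.text for chunk in response)
    return response.text
```

Calling generate_text requires a real API key and network access; build_contents can be exercised offline.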

Inputs

  1. Prompt

    • Type: String
    • Description: The main request sent to the model: a question, an instruction, or a statement for the model to elaborate on.
    • Properties: Multiline
  2. System Instruction

    • Type: String
    • Description: Special instructions to guide the model in generating content. This allows the user to set conditions or constraints on the output, such as style, length, or tone.
    • Properties: Multiline
    • Default: "You are creating a prompt for Stable Diffusion to generate an image. First step: describe this image, then put description into text. Second step: generate a text prompt for %s based on first step. Only respond with the prompt itself, but embellish it as needed but keep it under 80 tokens."
  3. Model Name

    • Type: String (Selection)
    • Options: ["gemini-1.5-pro-latest"]
    • Description: Specifies the model version used for generation. gemini-1.5-pro-latest is the only option; the -latest alias resolves to the most recent Gemini 1.5 Pro version served by the API.
  4. Stream

    • Type: Boolean
    • Description: When enabled, the response is requested in streaming mode, so the API returns it in incremental chunks rather than as a single reply. Because the node has one Text output, downstream nodes still receive the complete text once generation finishes.
  5. Image (Optional)

    • Type: Image
    • Description: Enables multi-modal input. If provided, the model considers the image together with the prompt when generating text.
    • Use Case: Supply an image when the output should describe it or draw on its visual content.
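Two of these inputs may need preparation when the node is driven from code. The default System Instruction contains a literal %s placeholder; this page does not say whether the node fills it, so if you reuse the template, substitute a target yourself (SDXL below is only an illustrative value). The optional Image arrives as a ComfyUI IMAGE, a float tensor shaped [batch, height, width, channels] with values from 0 to 1, whereas the Gemini Python SDK takes image objects such as PIL images. A minimal sketch assuming NumPy and Pillow; all function names are hypothetical and not taken from the node's source.

```python
import numpy as np

# Default System Instruction text, as documented for this node.
DEFAULT_SYSTEM_INSTRUCTION = (
    "You are creating a prompt for Stable Diffusion to generate an image. "
    "First step: describe this image, then put description into text. "
    "Second step: generate a text prompt for %s based on first step. "
    "Only respond with the prompt itself, but embellish it as needed "
    "but keep it under 80 tokens."
)


def fill_placeholder(template: str, target: str) -> str:
    """Replace the first literal %s with target; other % signs are left alone."""
    return template.replace("%s", target, 1)


def comfy_image_to_uint8(image_batch, index: int = 0) -> np.ndarray:
    """Convert one frame of a ComfyUI IMAGE batch to an H x W x C uint8 array.

    ComfyUI IMAGE values are floats in [0, 1] shaped [batch, height, width,
    channels]; np.asarray also accepts a CPU torch tensor.
    """
    frame = np.asarray(image_batch, dtype=np.float32)[index]
    # Scale to 0..255, round, and clip out-of-range values before casting.
    return np.clip(np.round(frame * 255.0), 0, 255).astype(np.uint8)


def to_pil(image_batch, index: int = 0):
    """Wrap one frame as a PIL image, a form the Gemini Python SDK accepts."""
    from PIL import Image  # deferred import: Pillow is only needed for this step

    return Image.fromarray(comfy_image_to_uint8(image_batch, index))
```

For a single-frame batch, to_pil(image) returns a PIL image that can be appended to the request contents after the prompt string.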

Outputs

  • Text
    • Type: String
    • Description: The generated text, shaped by the Prompt and System Instruction. With the default instruction, this is a Stable Diffusion prompt describing the input image.

Usage in ComfyUI Workflows

The Gemini_15P_API_S_Advance_Zho node fits ComfyUI workflows that need text generated under specific constraints. Examples include:

  • Image Prompt Generation: Converting visual inputs into detailed, descriptive prompts for use in AI art generation systems such as Stable Diffusion.
  • Contextual Dialogues: Crafting dialogue or narrative content in an interactive UI setting, where system instructions adjust content tone or style for applications like chatbots or virtual environments.
  • Content Customization: Tailoring AI-generated scripts, stories, or descriptions for specific aesthetic or qualitative goals defined by the user.
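The image prompt generation case can be sketched as a fragment of a workflow in ComfyUI's API (JSON) format, where each node has a class_type and inputs, and links are [source_node_id, output_index] pairs. The Gemini node's input names here (prompt, system_instruction, model_name, stream, image) are assumptions derived from the labels on this page, not confirmed from the node's source, and the file names are placeholders. This is only a fragment: a graph ComfyUI will execute also needs a sampler and an output node such as SaveImage.

```python
import json


def build_workflow(image_filename: str, ckpt_name: str) -> dict:
    """Hypothetical API-format fragment: LoadImage -> Gemini node -> CLIPTextEncode."""
    return {
        "1": {"class_type": "LoadImage", "inputs": {"image": image_filename}},
        "2": {
            "class_type": "Gemini_15P_API_S_Advance_Zho",
            # Assumed input names, derived from the labels on this page.
            "inputs": {
                "prompt": "Describe this image.",
                "system_instruction": "Write a Stable Diffusion prompt for this image.",
                "model_name": "gemini-1.5-pro-latest",
                "stream": False,
                "image": ["1", 0],  # LoadImage output 0 is the IMAGE
            },
        },
        "3": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": ckpt_name}},
        "4": {
            "class_type": "CLIPTextEncode",
            # Gemini text (output 0) feeds the prompt; loader output 1 is CLIP.
            "inputs": {"text": ["2", 0], "clip": ["3", 1]},
        },
    }


if __name__ == "__main__":
    # Placeholder names; replace with files present in your ComfyUI folders.
    print(json.dumps(build_workflow("example.png", "model.safetensors"), indent=2))
```

Wiring the Text output straight into CLIPTextEncode's text input means the generated prompt conditions the image model without manual copying.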

Special Features or Considerations

  • System Instructions: Define rules for the output (role, style, length, format) that apply to every run, which makes the node easy to adapt to a specific task.
  • Multi-Modal Capability: Accepts an image alongside the text prompt, for tasks that require understanding visual content.
  • Model: Uses Gemini 1.5 Pro through the gemini-1.5-pro-latest alias; output quality depends on the model and on how clearly the prompt and system instruction are written.
  • API Key Requirement: A valid Gemini API key must be configured, or the node's requests to the Gemini API will fail.
  • Streaming Option: Lets the response be retrieved in incremental chunks instead of as a single reply.
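This page does not say where the node reads its API key from. One common pattern, sketched here purely as an assumption, is to check an environment variable first and fall back to a JSON config file. The function load_api_key, the variable name GEMINI_API_KEY, and the config layout are hypothetical; check the ComfyUI-Gemini README for the mechanism the node actually uses.

```python
import json
import os
from pathlib import Path
from typing import Optional


def load_api_key(
    config_path: Optional[Path] = None, env_var: str = "GEMINI_API_KEY"
) -> str:
    """Return a Gemini API key from an environment variable or a JSON config file.

    Both the variable name and the {"GEMINI_API_KEY": "..."} layout are
    assumptions for this sketch, not the node's documented mechanism.
    """
    # 1) Environment variable takes precedence.
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    # 2) Fall back to a JSON file, if one exists.
    if config_path is not None and config_path.is_file():
        data = json.loads(config_path.read_text(encoding="utf-8"))
        key = str(data.get("GEMINI_API_KEY", "")).strip()
        if key:
            return key
    raise RuntimeError(
        f"No Gemini API key found: set {env_var} or add GEMINI_API_KEY to {config_path}"
    )
```

With the google-generativeai SDK, the returned key would be passed to genai.configure(api_key=...) before any request is made.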

For more information, see the ComfyUI-Gemini repository on GitHub.