Yoloworld ESAM Node Documentation

Overview

The Yoloworld_ESAM_Zho node is a ComfyUI node for object detection and instance segmentation. It combines YOLO-World, an open-vocabulary detector from the YOLO (You Only Look Once) family, with EfficientSAM, a lightweight Segment Anything model, to detect objects in an image and generate segmentation masks for them. Video is typically passed between ComfyUI nodes as a batch of image frames, so the node can process both still images and video frames.

Node Functionality

What This Node Does

The Yoloworld_ESAM_Zho node performs two primary functions:

Object Detection: It uses the YOLO-World model to locate objects in the input image and classify them against the category names supplied in the categories input. YOLO-World is an open-vocabulary detector, so these categories are free-form text rather than a fixed, predefined label set.
Instance Segmentation: EfficientSAM then segments the detected objects, producing a mask that outlines each individual object.
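Because YOLO-World takes its vocabulary as text, the categories string has to become a list of class names, and each detection's class index then maps back into that list to build its on-image label. The following is a minimal, hypothetical sketch of that bookkeeping in plain Python; parse_categories and make_labels are illustrative names, not functions from the extension, and the "name 0.87" label format is an assumption modeled on the with_confidence input.

```python
# Hypothetical sketch: not the extension's code. Shows how a comma-separated
# category string can become a class list, and how class indices map to labels.


def parse_categories(categories: str) -> list[str]:
    """Split a comma-separated string, trimming whitespace and dropping empty entries."""
    return [name.strip() for name in categories.split(",") if name.strip()]


def make_labels(class_ids, confidences, class_names, with_confidence=True):
    """Build one label per detection; append the score when with_confidence is set."""
    labels = []
    for class_id, confidence in zip(class_ids, confidences):
        name = class_names[class_id]  # class_id indexes into the parsed category list
        labels.append(f"{name} {confidence:.2f}" if with_confidence else name)
    return labels


if __name__ == "__main__":
    names = parse_categories("person, bicycle, car")
    print(names)  # ['person', 'bicycle', 'car']
    print(make_labels([0, 2], [0.874, 0.51], names))  # ['person 0.87', 'car 0.51']
```

Trimming whitespace matters because users naturally type "person, bicycle" with spaces after the commas; without it, " bicycle" would be sent to the detector as a different prompt.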

Inputs

The node requires the following inputs to function:

yolo_world_model: The pre-loaded YOLO-World model used for object detection.
esam_model: The EfficientSAM model used for segmentation.
image: The input image, or batch of video frames, to process.
categories: A comma-separated string of category names for the model to detect, for example "person, bicycle, car".
confidence_threshold: A float setting the minimum confidence score a detection must reach to be kept. Raising it filters out less certain detections.
iou_threshold: A float setting the Intersection over Union (IoU) threshold for non-maximum suppression (NMS). When two boxes overlap by more than this value, the lower-scoring one is suppressed.
box_thickness: An integer setting the line thickness of the bounding boxes drawn around detected objects.
text_thickness: An integer setting the stroke thickness of the label text.
text_scale: A float setting the scaling factor for the label text.
with_confidence: A boolean; when enabled, each detection's confidence score is shown alongside its label.
with_class_agnostic_nms: A boolean; when enabled, NMS ignores class labels, so overlapping boxes can suppress each other even when they belong to different categories.
with_segmentation: A boolean that enables or disables instance segmentation.
mask_combined: A boolean; when enabled, all detected masks are merged into a single output mask.
mask_extracted: A boolean; when enabled, only the mask at mask_extracted_index is output.
mask_extracted_index: An integer selecting which mask to extract when mask_extracted is enabled.
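The interaction between confidence_threshold, iou_threshold, and with_class_agnostic_nms can be illustrated with standard greedy NMS: low-scoring detections are dropped first, then detections are visited from highest to lowest score, and any box whose IoU with an already kept box exceeds the threshold is suppressed. Class-aware NMS only compares boxes of the same class; class-agnostic NMS compares all boxes. This is a plain-Python sketch of the general technique, not the node's actual implementation, and iou and filter_detections are hypothetical names.

```python
# Sketch of confidence filtering followed by greedy NMS (not the node's code).
# Boxes are (x1, y1, x2, y2) in pixels.


def iou(a, b):
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(boxes, scores, class_ids, *, confidence_threshold,
                      iou_threshold, class_agnostic=False):
    """Return indices of detections kept after thresholding and greedy NMS."""
    # 1. Drop detections below the confidence threshold.
    candidates = [i for i, s in enumerate(scores) if s >= confidence_threshold]
    # 2. Visit the survivors from highest to lowest score.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        # 3. Suppress a box that overlaps a kept box too much; in class-aware
        #    mode only boxes of the same class are compared.
        suppressed = any(
            (class_agnostic or class_ids[i] == class_ids[k])
            and iou(boxes[i], boxes[k]) > iou_threshold
            for k in kept
        )
        if not suppressed:
            kept.append(i)
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.3]
    class_ids = [0, 1, 0]  # first two boxes overlap (IoU about 0.68) but differ in class
    print(filter_detections(boxes, scores, class_ids,
                            confidence_threshold=0.1, iou_threshold=0.5))  # [0, 1, 2]
    print(filter_detections(boxes, scores, class_ids, confidence_threshold=0.1,
                            iou_threshold=0.5, class_agnostic=True))  # [0, 2]
```

Class-agnostic suppression can help when two overlapping prompts (for example "car" and "vehicle") both fire on the same object, at the cost of possibly removing a genuinely distinct object of another class that overlaps heavily.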

Outputs

The node produces the following outputs:

IMAGE: The processed image, annotated with bounding boxes and labels for each detected object.
MASK: The segmentation mask(s) for the detected objects. Depending on the mask_combined and mask_extracted settings, this is a single combined mask, a single extracted mask, or the individual per-object masks.
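Combining masks amounts to a pixel-wise logical OR, while extraction selects one mask by index. In ComfyUI, masks travel between nodes as tensors; this hypothetical sketch uses nested lists of 0/1 values so it stays self-contained. How the node itself handles an out-of-range mask_extracted_index is not documented here, so raising IndexError is an assumption of the sketch.

```python
# Hypothetical sketch of the mask_combined / mask_extracted behaviour.
# Each mask is an H x W nested list of 0/1 values, one per detected object.


def combine_masks(masks):
    """Merge per-object masks into one mask via pixel-wise logical OR."""
    if not masks:
        raise ValueError("need at least one mask to combine")
    height, width = len(masks[0]), len(masks[0][0])
    return [
        [1 if any(mask[y][x] for mask in masks) else 0 for x in range(width)]
        for y in range(height)
    ]


def extract_mask(masks, index):
    """Return the single mask at `index` (stand-in for mask_extracted_index)."""
    if not 0 <= index < len(masks):
        raise IndexError(f"mask index {index} out of range for {len(masks)} masks")
    return masks[index]


if __name__ == "__main__":
    person = [[1, 1, 0], [1, 1, 0]]
    car = [[0, 0, 0], [0, 1, 1]]
    print(combine_masks([person, car]))  # [[1, 1, 0], [1, 1, 1]]
    print(extract_mask([person, car], 1))  # [[0, 0, 0], [0, 1, 1]]
```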

Usage in ComfyUI Workflows

The Yoloworld_ESAM_Zho node can be used in ComfyUI workflows to automate tasks involving object detection and segmentation. It is well-suited for use cases such as:

Image Annotation: Automatically label objects within images and create training data for machine learning models.
Surveillance Systems: Detect and segment objects in video feeds for monitoring and security purposes.
Content Creation: Extract and manipulate object masks for digital content like photo editing or augmented reality applications.
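For the image-annotation use case, detections are commonly exported in the YOLO text label format: one line per object, written as class_id x_center y_center width height, with coordinates normalized by the image width and height. The sketch below assumes you already have pixel boxes as (x1, y1, x2, y2) plus class indices. The node's documented outputs are an annotated IMAGE and a MASK, so getting raw boxes out of it is an assumption, and to_yolo_line and to_yolo_file are hypothetical helpers.

```python
# Hypothetical export helper: pixel boxes -> YOLO-format label lines.


def to_yolo_line(class_id, box, image_width, image_height):
    """Convert a pixel box (x1, y1, x2, y2) to a normalized YOLO label line."""
    x1, y1, x2, y2 = box
    x_center = (x1 + x2) / 2 / image_width
    y_center = (y1 + y2) / 2 / image_height
    width = (x2 - x1) / image_width
    height = (y2 - y1) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def to_yolo_file(detections, image_width, image_height):
    """Join one line per (class_id, box) pair, as stored in a per-image .txt file."""
    return "\n".join(
        to_yolo_line(class_id, box, image_width, image_height)
        for class_id, box in detections
    )


if __name__ == "__main__":
    print(to_yolo_line(0, (100, 50, 300, 250), image_width=400, image_height=500))
    # 0 0.500000 0.300000 0.500000 0.400000
```

The class_id values should index into the same ordered category list used for detection, so the dataset's class-name file must list the categories in that order.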

Special Features and Considerations

Dual Model Support: Detection (YOLO-World) and segmentation (EfficientSAM) run in a single node, so one step produces both labeled bounding boxes and per-object masks without chaining separate nodes.
Confidence and IoU Thresholds: Adjustable thresholds let users trade precision against recall. A higher confidence threshold keeps fewer, more certain detections, while a lower one finds more objects at the cost of more false positives.
Mask Options: Users can choose between combining masks into a single output or extracting specific masks as needed, offering flexibility in output handling.
Device Compatibility: The node can work with both CUDA (GPU) and CPU, making it adaptable to different hardware environments.
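The precision/recall trade-off controlled by confidence_threshold can be seen on a small example. The scores and true/false-positive flags below are made-up toy data, not measurements of the node, and precision_recall is a hypothetical helper. On this data, raising the threshold from 0.1 to 0.7 raises precision from 0.60 to 1.00 while recall falls from 0.75 to 0.50.

```python
# Toy illustration of how a confidence threshold trades precision for recall.


def precision_recall(scored, num_ground_truth, threshold):
    """Precision and recall for detections scoring at or above `threshold`.

    scored: list of (confidence, is_true_positive) pairs.
    num_ground_truth: number of real objects that should have been found.
    """
    kept = [is_tp for confidence, is_tp in scored if confidence >= threshold]
    true_positives = sum(kept)  # True counts as 1
    # With no detections there are no false positives; treat precision as 1.0 here.
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / num_ground_truth if num_ground_truth else 0.0
    return precision, recall


# Hypothetical toy data: five detections, four real objects in the image.
TOY_DETECTIONS = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.2, False)]

if __name__ == "__main__":
    for threshold in (0.1, 0.3, 0.7):
        p, r = precision_recall(TOY_DETECTIONS, num_ground_truth=4, threshold=threshold)
        print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
```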

In summary, the Yoloworld_ESAM_Zho node is a flexible tool for object detection and segmentation within ComfyUI, covering both image and video-frame processing.

This node is part of the ComfyUI-YoloWorld-EfficientSAM extension.
