Segment Anything (Fast)
Feature Description
This operator uses FastSAM (Fast Segment Anything Model) to perform rapid instance segmentation on input images based on user-provided prompts (such as bounding boxes, points, or text descriptions), identifying and segmenting the image regions that correspond to the prompts.
Use Cases
- Interactive Segmentation: When specific objects of interest need to be segmented quickly, prompts such as bounding boxes, points, or text descriptions supplied via the parameters can guide the model's segmentation.
- Target Extraction: Precisely extract the contour information of specific targets from complex backgrounds.
- Automated Annotation Assistance: As part of an automated annotation workflow, quickly generate initial segmentation masks for targets from simple prompts.
Inputs and Outputs
Input Item
- Image: The color image to be segmented (must be in RGB format).
- Prompt point list: A list of point coordinates [X, Y] indicating the target region to segment, for example several clicks on the target.
- Prompt box list: A list of bounding boxes, each defined by four corner points, framing the target region to segment.
Output Item
- Detection results: A list of segmentation results. Each result represents one segmented instance and contains its bounding box (rotated or horizontal), category (defaults to 0 or the specified category), confidence score, and segmented polygon contour.
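The output described above can be sketched as a simple data structure. This is a minimal illustration only; the field names are assumptions, not the operator's exact output schema:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Sketch of one detection result; field names are illustrative
# assumptions, not the operator's exact output schema.
@dataclass
class Detection:
    box: Tuple[float, float, float, float]   # horizontal box (x1, y1, x2, y2); may also be rotated
    category: int = 0                        # defaults to 0 or the configured class
    score: float = 0.0                       # confidence score
    polygon: List[Tuple[float, float]] = field(default_factory=list)  # segmented contour

det = Detection(box=(10, 20, 110, 220), score=0.87,
                polygon=[(10, 20), (110, 20), (110, 220)])
print(det.category)  # 0
```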
Parameter Descriptions
This operator relies on the fastsam Python library. If it is not yet installed in your environment, install it from Qianyi's internal PyPI source with pip install fastsam.
Weight files
Parameter Description
Specifies the FastSAM model weight file (usually .pt format) to be used for segmentation. A valid model file must be selected.
Parameter Tuning Guide
Select a model that matches your task requirements and hardware capabilities. Generally, larger models (like FastSAM-x) have higher accuracy but are slower, while smaller models (like FastSAM-s) are faster but may have slightly lower accuracy.
Enable GPU
Parameter Description
Select whether to use the GPU for model inference. If checked, ensure the computer has an available NVIDIA graphics card and the corresponding CUDA environment.
Parameter Tuning Guide
Enabling the GPU can significantly improve processing speed. If no compatible GPU is available, this should be unchecked (use the CPU).
Image size
Parameter Description
The size to which the input image will be scaled before being fed into the model for segmentation.
Parameter Tuning Guide
Larger image sizes generally lead to higher segmentation accuracy but also increase computation time and GPU/CPU memory consumption. Smaller sizes have the opposite effect. Common values include 640, 1024, etc. A trade-off between accuracy and speed needs to be made based on the specific application scenario.
Parameter Range
Default value: 640
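One common way an image-size parameter like this is applied is to scale the longer side to the target size while keeping the aspect ratio, then pad each side up to a multiple of the model stride. This is a sketch of that YOLO-style convention; the operator's actual preprocessing may differ:

```python
import math

def scaled_shape(width, height, imgsz=640, stride=32):
    """Scale the longer side to imgsz, keep the aspect ratio, and round
    each side up to a multiple of the model stride (a common YOLO-style
    convention; the operator's exact preprocessing is an assumption here)."""
    ratio = imgsz / max(width, height)
    new_w = math.ceil(width * ratio / stride) * stride
    new_h = math.ceil(height * ratio / stride) * stride
    return new_w, new_h

print(scaled_shape(1920, 1080))  # (640, 384)
```

A smaller imgsz shrinks both dimensions proportionally, which is where the speed/accuracy trade-off described above comes from.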
Confidence threshold
Parameter Description
Confidence score threshold for filtering FastSAM's initial segmentation results. Only segmentation results with confidence higher than this threshold will be retained.
Parameter Tuning Guide
Increasing this value will result in fewer segmentation outputs, retaining only objects the model is very confident about, which can reduce missegmentations and speed up post-processing. Decreasing this value will yield more segmentation results, potentially including some lower-confidence targets, but may increase missegmentations and post-processing time. Usually, start with the default value and adjust.
Parameter Range
[0, 1], Default value: 0.5
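The filtering this threshold performs can be sketched as follows. The (score, label) layout is an assumption made for illustration:

```python
def filter_by_confidence(results, conf_thresh=0.5):
    """Keep only results whose confidence exceeds the threshold.
    Each result is a (score, label) pair in this sketch."""
    return [r for r in results if r[0] > conf_thresh]

raw = [(0.92, "a"), (0.48, "b"), (0.63, "c")]
print(filter_by_confidence(raw))       # [(0.92, 'a'), (0.63, 'c')]
print(filter_by_confidence(raw, 0.9))  # [(0.92, 'a')]
```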
Overlap Filter Threshold
Parameter Description
Intersection over Union (IoU) threshold for Non-Maximum Suppression (NMS). When multiple segmentation results (masks or boxes) overlap by more than this threshold, the lower-confidence results will be suppressed.
Parameter Tuning Guide
Increasing this value allows more overlapping results to coexist, which might be suitable for scenes with dense and mutually occluding objects. Decreasing this value will more aggressively remove overlapping results, ensuring each target outputs only one best result. The default value is usually suitable for most scenarios.
Parameter Range
[0, 1], Default value: 0.9
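Non-Maximum Suppression with an IoU threshold can be sketched with a minimal greedy implementation over horizontal boxes (the operator may also suppress by mask overlap; this is an illustration, not its exact algorithm):

```python
def iou(a, b):
    """IoU of two horizontal boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(dets, iou_thresh=0.9):
    """Greedy NMS: visit boxes in descending score order and drop any
    box whose IoU with an already-kept box exceeds the threshold."""
    dets = sorted(dets, key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in dets:
        if all(iou(box, k[0]) <= iou_thresh for k in kept):
            kept.append((box, score))
    return kept

dets = [((0, 0, 10, 10), 0.9), ((1, 1, 10, 10), 0.8), ((20, 20, 30, 30), 0.7)]
# With a 0.5 threshold, the 0.8 box (IoU 0.81 with the 0.9 box) is suppressed.
print(nms(dets, iou_thresh=0.5))
```

A higher threshold lets heavily overlapping results coexist, matching the tuning guidance above.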
Extra class name
Parameter Description
Assigns a category name (ID) to the output segmentation results. FastSAM itself does not distinguish specific categories; this parameter is used to tag these segmentation results for subsequent processing (such as filtering or statistics).
Parameter Tuning Guide
Assign a meaningful category ID to the segmented objects according to your application scenario.
Parameter Range
Provides category name options from "0" to "29", defaults to 0.
Prompt
Parameter Description
A text description used to guide the model toward segmenting objects related to the text content. For example, "bag" or "red box".
Parameter Tuning Guide
Try using concise, specific nouns or phrases to describe the target you want to segment. You can use commas to separate multiple prompts, for example "a blue car, the traffic light". The effectiveness of text prompts depends on the model's comprehension ability.
Text prompt threshold
Parameter Description
When using text prompts, this threshold filters segmentation results based on their text similarity scores. Only results with similarity scores higher than this threshold will be retained.
Parameter Tuning Guide
This is a relatively sensitive parameter that needs adjustment based on actual results. If text prompts do not segment the desired results, try lowering this threshold; if many irrelevant results are segmented, try raising it. Note that this threshold is not the confidence score in the final output results and is usually set relatively low.
Parameter Range
[0, 10], Default value: 0.01
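The text-prompt filtering can be sketched as below, assuming each candidate mask carries a text-similarity score (for example a CLIP-style probability, which is typically small — hence the low default threshold). Both the scoring mechanism and the data layout are assumptions for illustration:

```python
def filter_by_text_similarity(masks, scores, text_thresh=0.01):
    """Keep masks whose text-similarity score (assumed to be a small
    CLIP-style probability) exceeds the threshold. This score is
    separate from the confidence score in the final output."""
    return [m for m, s in zip(masks, scores) if s > text_thresh]

masks = ["bag", "floor", "shelf"]          # hypothetical candidate regions
scores = [0.034, 0.004, 0.012]             # hypothetical similarity scores
print(filter_by_text_similarity(masks, scores))  # ['bag', 'shelf']
```

Lowering the threshold admits weaker matches; raising it rejects loosely related regions, mirroring the tuning guidance above.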