The Segment Anything Model (SAM) lets you automatically create labels around distinct features in all supported file formats. Read more about advances in auto-segmentation in our explainer blog post here.

SAM is available for Ontologies that have the Polygon, Bounding box, or Bitmask annotation types.

Watch the video, or follow the step-by-step tutorial below, to learn how to use SAM effectively.


1. Start Labeling Using SAM

Creating a new instance label

When viewing a task in the label editor, click the wand icon on a Polygon, Bounding box, or Bitmask class to make the SAM pop-up appear.

Use the Shift + A keyboard shortcut to toggle SAM mode.

Creating labels for existing instances

You can use SAM to create labels for an existing object instance.

  1. Navigate to the frame you want to add the instance to.
  2. Click Instantiate object next to the instance, or press the instance’s hotkey on your keyboard (W in the example below).
  3. Press Shift + A on your keyboard to enable SAM.

SAM mode remains active when you switch to a different annotation class, and is only deactivated by clicking the icon again.

2. Using SAM to Segment a Frame or Image

Click the area on the image or frame you would like to segment. The pop-up indicates that auto-segmentation is running, as shown below.

You can also click and drag your cursor to select the part of an image you would like to segment.
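Conceptually, a single click is a point prompt and a click-and-drag is a box prompt. If you want to reproduce this behavior outside the editor, here is a minimal sketch using Meta’s open-source segment-anything package (this is not the editor’s internal API; the checkpoint path, model type, and coordinates are placeholder assumptions):

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load a SAM checkpoint (model type and path are placeholders).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Load the frame as an HxWx3 uint8 RGB array and embed it once.
image = cv2.cvtColor(cv2.imread("frame.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A single click becomes a foreground point prompt.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[450, 320]]),  # (x, y) of the click
    point_labels=np.array([1]),           # 1 = include this point
    multimask_output=True,
)

# A click-and-drag becomes a box prompt instead.
masks, scores, _ = predictor.predict(
    box=np.array([300, 200, 600, 450]),   # x0, y0, x1, y1
    multimask_output=False,
)
```

Because set_image computes the image embedding once, subsequent prompts on the same frame are cheap, which is why interactive refinement feels immediate after the first prediction.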


3. Including / Excluding Areas from Selection

Once the prediction has run, a part of the image or frame will be highlighted in blue.

  • You can add to the selected area by left-clicking the part of the image you’d like to include in the label. This is shown in the following video.
  • You can exclude parts of the selected area by right-clicking the part you’d like to exclude from the label (see the sketch after this list for how these clicks map to point prompts).
  • If you’ve made a mistake and want to start over, click Reset on the SAM pop-up.
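In the open-source SAM API, these refinements correspond to labeled point prompts: label 1 includes a point in the object, and label 0 excludes it. The editor’s internals may differ; as an illustrative sketch continuing from the predictor above (all coordinates are placeholders):

```python
import numpy as np

# Refine the prediction with include (1) and exclude (0) points.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[450, 320],    # original click (include)
                           [470, 340],    # left-click: add this area
                           [520, 300]]),  # right-click: exclude this area
    point_labels=np.array([1, 1, 0]),
    multimask_output=False,
)
```

Passing all accumulated points in a single call re-runs the prediction with the full prompt history, which is what makes iterative refinement fast.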

4. Confirming the Label

Once the correct area is highlighted, click Apply Label on the SAM pop-up, or press Enter on your keyboard to confirm the selection.

A label appears outlining the highlighted area. The label’s shape depends on whether the annotation class is a bounding box, a bitmask, or a polygon.
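As a rough illustration of how a highlighted mask becomes each label shape (this is not the editor’s actual conversion code), continuing the sketch above, a binary mask can be reduced to a bounding box or traced into a polygon:

```python
import cv2
import numpy as np

mask = masks[0].astype(np.uint8)  # HxW binary mask from the prediction

# Bounding box: the extent of the mask's nonzero pixels.
ys, xs = np.nonzero(mask)
x0, y0, x1, y1 = xs.min(), ys.min(), xs.max(), ys.max()

# Polygon: trace the mask's outer contour and simplify it.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
largest = max(contours, key=cv2.contourArea)
polygon = cv2.approxPolyDP(largest, 2.0, True).squeeze()  # Nx2 (x, y) vertices

# Bitmask: the mask itself is stored directly.
```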

SAM works in conjunction with the ‘Permanent drawing’ setting, which can increase the speed at which SAM labels are created in video frames.

SAM 2 Tracking

For information on how to use SAM 2 object tracking, see our documentation here.

SAM 2 brings state-of-the-art segmentation and tracking capabilities for both video and images into a single model. This unification eliminates the need to combine SAM with a separate video object segmentation model, streamlining video segmentation into a single, efficient tool. It maintains a simple design and fast inference speed, keeping it accessible to a wide range of users.

The model can track objects consistently across video frames in real time, which opens up numerous possibilities for video editing and mixed-reality applications. SAM 2 builds on the success of the original Segment Anything Model, offering improved performance and efficiency.

SAM 2 can also be used to annotate visual data for training computer vision systems, and it enables creative ways to select and interact with objects in real-time or live video.

SAM 2 Key Features

  • Improved Accuracy and Speed: SAM 2 demonstrates superior accuracy in video segmentation with three times fewer interactions than previous models, and an 8x speedup for video annotation. For image segmentation, it is both more accurate and six times faster than its predecessor, SAM.

  • Object Selection and Adjustment: SAM 2 extends the prompt-based object segmentation abilities of SAM to tracking objects across video frames.

  • Robust Segmentation of Unfamiliar Videos: The model is capable of zero-shot generalization, meaning it can segment objects in images and videos from domains not seen during training, making it versatile for real-world applications.

  • Real-Time Interactivity: SAM 2 utilizes a streaming memory architecture that processes video frames one at a time, allowing for real-time, interactive applications.
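To make the prompt-then-propagate workflow concrete, here is a minimal sketch using Meta’s open-source sam2 package (the config name, checkpoint path, video directory, and coordinates are placeholder assumptions; check the sam2 repository for the current API):

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

# Model config and checkpoint paths are placeholders.
predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_l.yaml",
    "checkpoints/sam2.1_hiera_large.pt",
)

with torch.inference_mode():
    # Index the video's frames once; prompts and propagation reuse this state.
    state = predictor.init_state(video_path="./video_frames")

    # Prompt the object on one frame with a single foreground click.
    predictor.add_new_points_or_box(
        inference_state=state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[450, 320]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),  # 1 = include, 0 = exclude
    )

    # The streaming memory then tracks the object frame by frame.
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        frame_masks = (mask_logits > 0.0).cpu().numpy()  # per-object masks
```

Because frames are processed one at a time and object memories are carried forward, prompts added on any frame can be propagated without reprocessing the entire video, which is what enables the real-time, interactive behavior described above.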