Follow these steps to set up annotator training in Encord. The annotator training workflow enables you to assess the accuracy and performance of your annotation workforce.

Overview

  1. Add your training files to Encord.

  2. Create a Benchmark Project. Establish ground truth labels by having a trusted expert annotate the data. This Project must be completed before annotator training begins.

  3. Set up Annotator Training Projects. Create one training Project per annotator, using the same Dataset as the Benchmark Project in each one. Annotators label the data, and their work is compared against the gold standard created in the benchmark.

  4. Annotators Label Training Tasks. Annotators must complete all the training tasks assigned to them.

  5. Evaluate Annotator Performance. Use the provided SDK script to compare annotator labels with the benchmark, then analyze the results to assess accuracy and provide targeted feedback.

STEP 1: Add Files to Encord

You must first add your data to Encord. These files are used to train your annotators.

1

Create a Folder to Store your Files

  1. Navigate to Files under the Index heading in the Encord platform.
  2. Click the + New folder button to create a new folder. A dialog to create a new folder appears.
  3. Give the folder a meaningful name and description.
  4. Click Create to create the folder. The folder is listed in Files.

2

Create JSON file for Registration

To register files from cloud storage into Encord, you must create a JSON file specifying the files you want to upload.

While you can use a CSV file, we strongly recommend using JSON files for uploading cloud data to Encord for better compatibility and performance.

Find helpful scripts for creating JSON files for the data registration process here.

All types of data (videos, images, image groups, image sequences, and DICOM) from a private cloud are added to a Dataset in the same way, by using a JSON or CSV file. The file includes links to all images, image groups, videos and DICOM files in your cloud storage.
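
For illustration only, the short sketch below writes a minimal registration file in Python. The bucket URLs are placeholders, and the exact keys should be checked against the JSON format reference for your cloud provider and data types.

import json

# Minimal example spec: one top-level key per data type, each entry pointing
# at a file in your cloud storage. The URLs below are placeholders only.
registration_spec = {
    "videos": [
        {"objectUrl": "https://my-bucket.s3.eu-west-2.amazonaws.com/videos/training_video_1.mp4"}
    ],
    "images": [
        {"objectUrl": "https://my-bucket.s3.eu-west-2.amazonaws.com/images/training_image_1.jpg"}
    ],
}

with open("registration_spec.json", "w") as f:
    json.dump(registration_spec, f, indent=2)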

For a list of supported file formats for each data type, go here.
Encord supports file names of up to 300 characters for any file or video you upload.

Encord enforces the following upload limits for each JSON file used for file registration:

  • Up to 1 million URLs
  • A maximum of 500,000 items (e.g. images, image groups, videos, DICOMs)
  • URLs can be up to 16 KB in size

Optimal upload chunking can vary depending on your data type and the amount of associated metadata. For tailored recommendations, contact Encord support. We recommend starting with smaller uploads and gradually increasing the size based on how quickly jobs are processed. Generally, smaller chunks are reflected in the platform more quickly.
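
If you do split a large registration into smaller chunks, a hypothetical helper along these lines can generate the files; the chunk size shown is an arbitrary starting point, not an Encord recommendation.

import json

def write_chunked_specs(video_urls, chunk_size=10_000, prefix="registration_chunk"):
    """Split a long list of video object URLs into several smaller JSON registration files."""
    for index, start in enumerate(range(0, len(video_urls), chunk_size)):
        chunk = video_urls[start : start + chunk_size]
        spec = {"videos": [{"objectUrl": url} for url in chunk]}
        with open(f"{prefix}_{index:04d}.json", "w") as f:
            json.dump(spec, f, indent=2)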

3

Import your Files

STEP 2: Create a Benchmark Project

The benchmark Project contains reference labels used to evaluate your annotators’ labels. These gold standard labels should be created by a trusted expert to ensure accurate assessment.

1

Create a Training Dataset

Create a Dataset containing tasks designed to establish ground truth labels. These files are used to generate ‘gold-standard’ labels against which annotator performance can be evaluated. Give the Dataset a meaningful name.

Learn how to create Datasets here.
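
If you prefer to script this step, a Dataset can also be created with the SDK. The sketch below is a minimal example assuming Encord-hosted storage; if your training files live in a private cloud, pick the matching StorageLocation or create the Dataset in the UI instead, and note that the files registered in Step 1 still need to be added to the Dataset.

from encord import EncordUserClient
from encord.orm.dataset import StorageLocation

user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)

# Creates an empty Dataset; the training files registered in Step 1 still
# need to be added to it before annotation can begin.
dataset = user_client.create_dataset(
    dataset_title="Annotator Training Data",
    dataset_type=StorageLocation.CORD_STORAGE,
    dataset_description="Tasks used to establish ground-truth labels",
)
print(dataset.dataset_hash)
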
2

Create an Ontology

Create an Ontology to label your data. The same Ontology must be used in the Benchmark Project AND the Annotator Training Project.

Learn how to create Ontologies here.
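
An Ontology can likewise be created with the SDK. The sketch below is an assumption-heavy example with a single, hypothetical bounding-box class called "Vehicle"; the evaluation script in Step 5 only scores bounding boxes, so include at least one bounding-box object.

from encord import EncordUserClient
from encord.objects.common import Shape
from encord.objects.ontology_structure import OntologyStructure

user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)

# Hypothetical single-class Ontology; replace "Vehicle" with your own classes.
structure = OntologyStructure()
structure.add_object(name="Vehicle", shape=Shape.BOUNDING_BOX)

ontology = user_client.create_ontology(
    title="Annotator Training Ontology",
    description="Shared by the Benchmark and Annotator Training Projects",
    structure=structure,
)
print(ontology.ontology_hash)
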
3

Create the Benchmark Project

Ensure that you attach ONLY the Training Dataset to the Project.

  1. Go to Annotate > Projects.
  2. Click the + New annotation project button to create a new Project.
  3. Give the Project a meaningful title and description. For example, “Benchmark Labels”.
  4. Click the Attach ontology button and attach the Ontology you created.
  5. Click the Attach dataset button and attach the Training Dataset you created.
  6. Click Invite collaborators. Add collaborators to the Project and add them to the relevant Workflow stages. Your annotators should be experts you trust to create gold-standard labels.
  7. Click Create project to finish creating the Project. You have now created the Project used to establish ground-truth labels.

STEP 3: Create Annotator Training Projects

Create a Project where your annotation workforce labels data and is evaluated against benchmark labels.

1

Create an Annotator Training Workflow Template

Create a Workflow template and give it a meaningful name like “Annotator Training”.

Creating templates makes creating one Project per annotator quicker and easier.

Create a Workflow template to use in all of your Annotator Training Projects. Documentation on how to create new Workflow templates can be found here.

2

Create Annotator Training Projects

You must create one Annotator Training Project per annotator, so repeat this step for each annotator.

Ensure that you:

  • Attach the Training Dataset you created in Step 2.1 for the Benchmark Project.
  • Attach the SAME Ontology you created in Step 2.2 for the Benchmark Project.
  • Attach the Annotator Training Workflow Template to the Project.
  1. Go to Annotate > Projects.
  2. Click the + New annotation project button to create a new Project.
  3. Give the Project a meaningful title and description. For example “Annotator Training - Alex” for an annotator named Alex.
  4. Click the Attach ontology button and attach the SAME Ontology you created in Step 2.2 for the Benchmark Project.
  5. Click the Attach dataset button and attach the training Dataset you created in Step 2.1.
  6. Click the Load from template button to attach the “Annotator Training” template you created in Step 3.1.
  7. Click Invite collaborators. Add the annotator you want to train in this Project to the annotation stage.
  8. Click Create Project to create the Project. You have now created the Project to train the selected annotator.
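
Because these Projects are identical apart from the title and the assigned annotator, you may prefer to script the repetition. The sketch below is a rough, assumption-laden example: the dataset hash, ontology hash, and annotator emails are placeholders, and it does not attach the “Annotator Training” Workflow template, so attach that in the UI (or through your SDK version’s workflow options, if available).

from encord import EncordUserClient
from encord.utilities.project_user import ProjectUserRole

user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)

# Placeholders for the Training Dataset and Ontology created in Step 2.
training_dataset_hash = "<training-dataset-hash>"
ontology_hash = "<ontology-hash>"

# One Annotator Training Project is created per annotator email.
annotators = ["alex@example.com", "sam@example.com"]

for email in annotators:
    name = email.split("@")[0].capitalize()
    project_hash = user_client.create_project(
        project_title=f"Annotator Training - {name}",
        dataset_hashes=[training_dataset_hash],
        ontology_hash=ontology_hash,
    )
    # Add the annotator to their Project. The Workflow template is not
    # attached here; do that in the UI before annotation starts.
    project = user_client.get_project(project_hash)
    project.add_users([email], ProjectUserRole.ANNOTATOR)
    print(f"Created project {project_hash} for {email}")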

STEP 4: Annotator Training

Your annotators must now complete all tasks in the Annotator Training Project they are assigned to. Only tasks in the Complete stage are evaluated.

Information on how to label can be found here.

STEP 5: Evaluate Annotators

This example only evaluates Bounding Boxes.

Save and run the following script to evaluate annotator performance. The script must be run once for each Annotator Training Project. It outputs a CSV file called iou_results.csv containing the results. The evaluation metrics used are Intersection over Union (IoU) and Class score.

  • IoU (Intersection over Union): Quantifies the overlap between the annotator’s label and the ground truth. It ranges from 0 to 1:
    • 1.0: A perfect overlap between the annotator’s label and the ground truth.
    • 0.0: No overlap between the annotator’s label and the ground truth.
    • Values between 0 and 1: The degree of overlap. For example, an IoU of 0.6 means the overlapping area is 60% of the combined (union) area of the two labels.

  • Class Score (0 or 1): 1 if the label was created using the correct class, 0 if it was created using the wrong class.

Ensure that you:

  • Replace <private_key_path> with the full path to your private SSH key.
  • Replace <benchmark-project-id> with the id of your Benchmark Project.
  • Replace <training-project-id> with the id of the Training Project you want to evaluate.
import os

import pandas as pd

from encord import EncordUserClient
from encord.objects.common import Shape
from encord.objects.coordinates import BoundingBoxCoordinates

# Instantiate Encord client by substituting the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)

training_project_id = "<training-project-id>"
benchmark_project_id = "<benchmark-project-id>"

training_project = user_client.get_project(training_project_id)
benchmark_project = user_client.get_project(benchmark_project_id)

training_label_rows = training_project.list_label_rows_v2(workflow_graph_node_title_eq='Complete')
benchmark_label_rows = benchmark_project.list_label_rows_v2(workflow_graph_node_title_eq='Complete')

# Match by data_hash
benchmark_dict = {lr.data_hash: lr for lr in benchmark_label_rows}
paired_label_rows = [
    (benchmark_dict[lr.data_hash], lr)
    for lr in training_label_rows
    if lr.data_hash in benchmark_dict
]

# Initialise labels
with training_project.create_bundle() as bundle:
    for _, prod_lr in paired_label_rows:
        prod_lr.initialise_labels(bundle=bundle, overwrite=True)

with benchmark_project.create_bundle() as bundle:
    for bm_lr, _ in paired_label_rows:
        bm_lr.initialise_labels(bundle=bundle, overwrite=True)

# IoU calculation
def calculate_iou(bbox1: BoundingBoxCoordinates, bbox2: BoundingBoxCoordinates) -> float:
    x_left = max(bbox1.top_left_x, bbox2.top_left_x)
    y_top = max(bbox1.top_left_y, bbox2.top_left_y)
    x_right = min(bbox1.top_left_x + bbox1.width, bbox2.top_left_x + bbox2.width)
    y_bottom = min(bbox1.top_left_y + bbox1.height, bbox2.top_left_y + bbox2.height)
    intersection = max(0, x_right - x_left) * max(0, y_bottom - y_top)
    union = bbox1.width * bbox1.height + bbox2.width * bbox2.height - intersection
    return intersection / union if union > 0 else 0.0

# Compare labels and extract information
results = []
for bm_lr, prod_lr in paired_label_rows:
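    # Only bounding-box instances with an annotation on frame 0 are compared (single images, or the first frame of a video).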
    prod_instances = [oi for oi in prod_lr.get_object_instances() if oi.ontology_item.shape == Shape.BOUNDING_BOX and oi.get_annotation(0)]
    bm_instances = [oi for oi in bm_lr.get_object_instances() if oi.ontology_item.shape == Shape.BOUNDING_BOX and oi.get_annotation(0)]

    training_data_unit_name = prod_lr.data_title
    training_label_id = prod_lr.label_hash

    for prod_obj in prod_instances:
        best_iou = 0.0
        best_match_hash = None
        prod_bbox = prod_obj.get_annotation(0).coordinates
        training_email = prod_obj.get_annotation(0).created_by

        for bm_obj in bm_instances:
            bm_bbox = bm_obj.get_annotation(0).coordinates
            iou = calculate_iou(prod_bbox, bm_bbox)
            if iou > best_iou:
                best_iou = iou
                best_match_hash = bm_obj.feature_hash

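        # The class is correct when the annotator's ontology class (feature_hash) matches that of the best-overlapping benchmark object.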
        class_score = 1.0 if best_match_hash == prod_obj.feature_hash and best_match_hash is not None else 0.0

        results.append({
            'training_email': training_email,
            'data_unit_name': training_data_unit_name,
            'label_id': training_label_id,
            'iou_score': best_iou,
            'class_score': class_score
        })

# Output the results to a CSV file
if results:
    df_results = pd.DataFrame(results)
    script_dir = os.path.dirname(os.path.abspath(__file__))
    csv_file_path = os.path.join(script_dir, "iou_results.csv")
    df_results.to_csv(csv_file_path, index=False)
    print(f"Results saved to: {csv_file_path}")
else:
    print("No matching label rows found for comparison.")
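
Once iou_results.csv has been produced for a Training Project, a quick pandas aggregation like the one below (column names match those written by the script) can summarise each annotator’s label count, average IoU, and class accuracy.

import pandas as pd

# Summarise the evaluation output per annotator.
df = pd.read_csv("iou_results.csv")
summary = df.groupby("training_email").agg(
    labels=("iou_score", "size"),
    mean_iou=("iou_score", "mean"),
    class_accuracy=("class_score", "mean"),
)
print(summary)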