
# Human-in-the-loop Validation

<span className="inline-block rounded-full border border-sky-600/40 px-2 py-0.5 text-xs font-medium text-sky-700 mr-2">
  Human-in-the-loop
</span>

<span className="inline-block rounded-full border border-emerald-600/40 px-2 py-0.5 text-xs font-medium text-emerald-700 mr-2">
  Multimodal
</span>

<span className="inline-block rounded-full border border-emerald-600/40 px-2 py-0.5 text-xs font-medium text-emerald-700 mr-2">
  Basic Cloud Storage
</span>

<span className="inline-block rounded-full border border-emerald-600/40 px-2 py-0.5 text-xs font-medium text-emerald-700">
  Data Groups (text and videos)
</span>

## Why do this?

You want to guarantee correctness, completeness, or fairness in the predictions of your models.

**This guide is a quick way to get started with Data Groups in Encord using data in your cloud storage.**

<Tip>If you intend to use Data Groups in Encord at scale, we strongly recommend using the Encord SDK.</Tip>

### Pros and Cons

<table>
  <thead>
    <tr>
      <th>Pros</th>
      <th>Cons</th>
    </tr>
  </thead>

  <tbody>
    <tr>
      <td>
        * Simple way to get data into Encord
        * Able to sync your cloud data with Encord easily
        * Data groups available for multi-tile multi-modal functionality
      </td>

      <td>
        * Requires a little technical knowledge to set up integrations
        * No data management or curation (custom metadata needs to be imported separately)
      </td>
    </tr>
  </tbody>
</table>

<Note>
  Data Groups can include custom metadata, but this end-to-end example does not include any.
</Note>

## Import/Register Data

We're going to register a dataset of videos (a portion of the Nexar open-source dataset) and text files (events captured for the videos).

<Steps>
  <Step title="Create Integration">
    Select your cloud provider.

    <div className="grid gap-4 sm:grid-cols-2 lg:grid-cols-3">
      {[
              { name: 'AWS Integration', href: '/platform-documentation/General/annotate-data-integrations/annotate-aws-integration' },
              { name: 'GCP Integration', href: '/platform-documentation/General/annotate-data-integrations/annotate-gcp-integration' },
              { name: 'Azure Integration', href: '/platform-documentation/General/annotate-data-integrations/annotate-azure-blob-integration' },
              { name: 'OTC Integration', href: '/platform-documentation/General/annotate-data-integrations/annotate-otc-integration' },
              { name: 'Wasabi Integration', href: '/platform-documentation/General/annotate-data-integrations/annotate-wasabi-integration' },
              { name: 'Oracle Integration', href: '/platform-documentation/General/annotate-data-integrations/annotate-oracle-integration' },
              { name: 'Direct Access Integration', href: '/platform-documentation/General/annotate-data-integrations/annotate-direct-access-integration' },
            ].map(({ name, href }) => (
              <a
                key={name}
                href={href}
                className="block p-4 rounded-xl border border-gray-200 dark:border-gray-700 bg-white dark:bg-gray-800 text-sm font-medium text-gray-800 dark:text-gray-100 hover:bg-gray-100 dark:hover:bg-gray-700 transition"
              >
                {name}
              </a>
            ))}
    </div>
  </Step>

  <Step title="Download Data">
    Download [nexar-first-100-osds.zip](https://storage.googleapis.com/docs-media.encord.com/E2E%20Artifacts/nexar-first-100-osds.zip) and extract its contents.
  </Step>

  <Step title="Re-encode Videos">
    <Tip>We strongly recommend re-encoding any videos with issues. Re-encoding your videos ensures the best performance when annotating your data.</Tip>

    You can do this locally or in your cloud storage.

    <Note>
      For more information on re-encoding videos, go [here](/platform-documentation/Curate/files/files-re-encoding).
    </Note>
  </Step>

  <Step title="Import Data to Cloud Storage">
    Upload the extracted contents of `nexar-first-100-osds.zip` to your cloud storage.
  </Step>

  <Step title="Create Cloud-synced Folder">
    Syncing the data registers the data in Encord. Your data stays in your cloud storage.

    <iframe width="560" height="315" src="https://www.loom.com/embed/408011d156214d769f650ad33eb346d2?sid=f235ca1e-19b3-4438-962a-f0771eac029b" title="Loom Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen autoplay />

    1. Navigate to **Data** > **Files & Folders**.

    2. Click **New folder > Cloud-synced folder**.
       The New Cloud-synced folder dialog appears.

    3. Provide the following:

       * **Title:** `E2E - Data Groups - Cloud-synced Folder`.
       * **Description:** OPTIONAL - Provide a meaningful description for the Cloud-synced folder.
       * **Select your integration:** Select the integration to use from the dropdown.
       * **Storage path:** Specify the storage/file path to your cloud storage. For example: `gs://encord-gcp-bucket/CloudSync/` or `s3://encord-aws-bucket/CloudSync`.

    4. Click **Test** to verify that Encord can communicate with your cloud storage.

    5. Click **Create**.
       The page for the new Cloud-synced folder appears.

    #### Find Storage Path

    Finding the Storage path for your folder or object varies across Cloud Storage platforms.

    **AWS**

    ![Find AWS storage path](https://storage.googleapis.com/docs-media.encord.com/static/img/Index/aws-storage-path.gif)

    **GCP**

    ![Find GCP storage path](https://storage.googleapis.com/docs-media.encord.com/static/img/Index/gcp-storage-path.gif)
  </Step>

  <Step title="Sync Data Between Encord and Cloud Storage">
    1. Go to **Data** > **Files & Folders**.

    2. Click into your cloud-synced folder.

    3. Click **Initiate sync**.
       The sync between the folder and your cloud storage begins.

    <Important>
      Record the `Folder ID` for use later.
    </Important>
  </Step>
</Steps>

## Create Ontology

Create the following Ontology for the Project.

Ontology name: `E2E - Ontology - Data Groups`

**Classifications**

* `Prediction correct?`
  * `YES!` (Radio button)
  * `No`  (Radio button)
    * `What's wrong?` (Text)

* `Summary correct?`
  * `YES!` (Radio button)
  * `No`  (Radio button)
    * `What's wrong?` (Text)

![Data Groups Ontology](https://storage.googleapis.com/docs-media.encord.com/E2E%20Artifacts/Screenshots/e2e-data-groups-ontology.png)

<iframe width="560" height="315" src="https://www.loom.com/embed/3e1908be811d42dba1c3d43829f3b481?sid=b5099dd4-e6b2-4404-9d07-8b18bb975ab9" title="Loom Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen autoplay />

## Create Dataset

Create a Dataset for your Data Groups.

Name: `E2E - Dataset - Data Groups`

<Important>
  Record the `Dataset ID` for use later.
</Important>

<iframe width="560" height="315" src="https://www.loom.com/embed/f9b066cb85094083843e8475f3c251f2?sid=798deb66-83ca-4f6d-ba7b-08d78efa684e" title="Loom Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen autoplay />

## Create Project

Once all the videos are re-encoded and you have created an Ontology and a Dataset, you are ready to create an Annotate Project. After you create the Project, create your Data Groups so that your team can start annotating your data.

Name: `E2E - Project - Data Groups`

<Important>
  Record the `Project ID` for use later.
</Important>

<iframe width="560" height="315" src="https://www.loom.com/embed/bf134498ca564b97aafff45005643a99?sid=6ff5b2f7-042d-419e-bbcd-9d4be4ecfd68" title="Loom Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen autoplay />
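
The scripts in this guide expect the Folder, Dataset, and Project IDs as UUID strings. The following is a minimal sketch, using only the Python standard library, of a check you could run before the scripts to catch malformed IDs or placeholders that were never replaced. The `check_ids` helper and the sample values are hypothetical and are not part of the Encord SDK.

```python
from uuid import UUID

# The all-zero ID is used as a placeholder in this guide's scripts
PLACEHOLDER = UUID(int=0)


def check_ids(ids):
    """Return {name: problem} for every ID that looks wrong."""
    problems = {}
    for name, value in ids.items():
        try:
            parsed = UUID(value)
        except (ValueError, AttributeError, TypeError):
            problems[name] = "not a well-formed UUID"
            continue
        if parsed == PLACEHOLDER:
            problems[name] = "placeholder not replaced"
    return problems


# Hypothetical sample values, for illustration only
ids = {
    "FOLDER_ID": "00000000-0000-0000-0000-000000000000",
    "DATASET_ID": "11111111-1111-1111-1111-111111111111",
    "PROJECT_ID": "22222222-2222-2222-2222-222222222222 ",  # trailing space
}
print(check_ids(ids))
```

In this sample, `FOLDER_ID` is reported as an unreplaced placeholder and `PROJECT_ID` as malformed because of its trailing space.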

## Mapping File for Data Units

Creating Data Groups requires mapping each data unit to a tile in the layout used during annotation and review. Currently, this mapping uses the File ID (UUID) that Encord assigns to each data unit.

To find the File IDs of your data units, use `storage_folder.list_items`. The following script lists the file name and File ID of each data unit in a folder and saves the output to a JSON file and a CSV file.

```python List File Name and File ID theme={"dark"}

from encord import EncordUserClient
import json
import csv

# --- Configuration ---
SSH_PATH = "/Users/chris-encord/ssh-private-key.txt" # Replace with the file path to your SSH private key
FOLDER_ID = "00000000-0000-0000-0000-000000000000"  # Replace with the Folder ID

# Output file paths
JSON_OUTPUT_PATH = "/file/path/to/save/file_mapping.json" # Update this as required
CSV_OUTPUT_PATH = "/file/path/to/save/file_mapping.csv" # Update this as required

# Authenticate with Encord using the path to your private key
user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path=SSH_PATH,
    # For US platform users use "https://api.us.encord.com"
    domain="https://api.encord.com",
)

# Get the storage folder by its ID
storage_folder = user_client.get_storage_folder(FOLDER_ID)

# List all data units in the folder
items = list(storage_folder.list_items())

# Create a list of dicts for structured output
file_data = [
    {
        "file_id": str(item.uuid),  # Convert UUID to string
        "file_name": item.name,
        "file_type": item.item_type,
    }
    for item in items
]

# --- Save to JSON File ---
with open(JSON_OUTPUT_PATH, "w") as f:
    json.dump(file_data, f, indent=4)

# --- Save to CSV File ---
with open(CSV_OUTPUT_PATH, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["file_id", "file_name", "file_type"])
    writer.writeheader()
    writer.writerows(file_data)

print(f"Saved output to:\n- {JSON_OUTPUT_PATH}\n- {CSV_OUTPUT_PATH}")

```
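
Copying File IDs by hand from the mapping file into group definitions is error-prone. The following is a minimal sketch of how the CSV output (columns `file_id`, `file_name`, `file_type`) could be turned into group definitions keyed by file name. The helpers `load_mapping` and `build_group` are hypothetical, and the sample IDs and `file_type` values are placeholders. It assumes one text file and four videos per group, using the tile keys `instructions`, `top-left`, `top-right`, `bottom-left`, and `bottom-right`.

```python
import csv
import io
from uuid import UUID

# Video tile keys, in the order the videos are passed to build_group
VIDEO_KEYS = ("top-left", "top-right", "bottom-left", "bottom-right")


def load_mapping(csv_file):
    """Read mapping CSV rows into a {file_name: UUID} lookup."""
    reader = csv.DictReader(csv_file)
    # strip() guards against stray whitespace, which UUID() rejects
    return {row["file_name"]: UUID(row["file_id"].strip()) for row in reader}


def build_group(name, text_file, video_files, mapping):
    """Build one group definition (name + UUIDs) from file names."""
    if len(video_files) != len(VIDEO_KEYS):
        raise ValueError(
            f"{name}: expected {len(VIDEO_KEYS)} videos, got {len(video_files)}"
        )
    missing = [f for f in [text_file, *video_files] if f not in mapping]
    if missing:
        raise KeyError(f"{name}: not found in mapping: {missing}")
    uuids = {"instructions": mapping[text_file]}
    uuids.update({key: mapping[v] for key, v in zip(VIDEO_KEYS, video_files)})
    return {"name": name, "uuids": uuids}


# Placeholder rows for illustration; in practice, open your CSV output file
sample_csv = io.StringIO(
    "file_id,file_name,file_type\n"
    "00000000-0000-0000-0000-000000000000,clustered_event_log_01.txt,text\n"
    "11111111-1111-1111-1111-111111111111,00001_normalized.mp4,video\n"
    "22222222-2222-2222-2222-222222222222,00002_normalized.mp4,video\n"
    "33333333-3333-3333-3333-333333333333,00009.mp4,video\n"
    "44444444-4444-4444-4444-444444444444,00011_normalized.mp4,video\n"
)
mapping = load_mapping(sample_csv)
group = build_group(
    "group-001",
    "clustered_event_log_01.txt",
    ["00001_normalized.mp4", "00002_normalized.mp4", "00009.mp4", "00011_normalized.mp4"],
    mapping,
)
print(group)
```

Looking files up by name means a typo in a file name fails immediately with a `KeyError`, rather than silently assigning the wrong File ID to a tile.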

## Create Data Groups

<Note>
  Use the output file from the `Mapping File for Data Units` section to map File IDs to their corresponding tiles in the Data Group layout.
</Note>

Use the script in this section to create Data Groups, add those Data Groups to a Dataset, and initialise the label rows for the Data Groups in your Project.

The script creates Data Groups with five data units in the following layout:

```text theme={"dark"}
+-------------------------------------------+
|              text file                    |
+------------------+------------------------+
|     video 1      |        video 2         |
+------------------+------------------------+
|     video 3      |        video 4         |
+------------------+------------------------+
```

To create Data Groups, map the File ID of each data unit to a tile in the layout.

The following excerpt from the script shows the group definitions:

```python theme={"dark"}

from uuid import UUID

# --- Group definitions (name + UUIDs) ---
groups = [
    {
        "name": "group-001",
        "uuids": {
            "instructions": UUID("00000000-0000-0000-0000-000000000000"), # Replace with File ID of clustered_event_log_01.txt
            "top-left": UUID("11111111-1111-1111-1111-111111111111"), # Replace with File ID of 00001_normalized.mp4
            "top-right": UUID("22222222-2222-2222-2222-222222222222"), # Replace with File ID of 00002_normalized.mp4
            "bottom-left": UUID("33333333-3333-3333-3333-333333333333"), # Replace with File ID of 00009.mp4
            "bottom-right": UUID("44444444-4444-4444-4444-444444444444"), # Replace with File ID of 00011_normalized.mp4
        },
    },
    {
        "name": "group-002",
        "uuids": {
            "instructions": UUID("55555555-5555-5555-5555-555555555555"), # Replace with File ID of clustered_event_log_02.txt
            "top-left": UUID("66666666-6666-6666-6666-666666666666"), # Replace with File ID of 00012.mp4
            "top-right": UUID("77777777-7777-7777-7777-777777777777"), # Replace with File ID of 00020.mp4
            "bottom-left": UUID("88888888-8888-8888-8888-888888888888"), # Replace with File ID of 00030.mp4
            "bottom-right": UUID("99999999-9999-9999-9999-999999999999"), # Replace with File ID of 00033.mp4
        },
    },
    {
        "name": "group-003",
        "uuids": {
            "instructions": UUID("12312312-3123-1231-2312-312312312312"), # Replace with File ID of clustered_event_log_03.txt
            "top-left": UUID("23232323-2323-2323-2323-232323232323"), # Replace with File ID of 00034.mp4
            "top-right": UUID("31313131-3131-3131-3131-313131313131"), # Replace with File ID of 00035_normalized.mp4
            "bottom-left": UUID("45645645-6456-4564-5645-645645645645"), # Replace with File ID of 00038_normalized.mp4
            "bottom-right": UUID("56565656-6565-5656-6565-656565656565"), # Replace with File ID of 00045.mp4
        },
    },
    # More groups...
]
```

**Run this script to create Data Groups:**

```python theme={"dark"}

from uuid import UUID

from encord.constants.enums import DataType
from encord.objects.metadata import DataGroupMetadata
from encord.orm.storage import DataGroupCustom, StorageItemType
from encord.user_client import EncordUserClient

# --- Configuration ---
SSH_PATH = "/Users/chris-encord/ssh-private-key.txt"  # Replace with the file path to your SSH private key
FOLDER_ID = "00000000-0000-0000-0000-000000000000"  # Replace with the Folder ID
DATASET_ID = "00000000-0000-0000-0000-000000000000"  # Replace with the Dataset ID
PROJECT_ID = "00000000-0000-0000-0000-000000000000"  # Replace with the Project ID

# --- Connect to Encord ---
user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path=SSH_PATH,
    # For US platform users use "https://api.us.encord.com"
    domain="https://api.encord.com",
)

folder = user_client.get_storage_folder(FOLDER_ID)

# --- Reusable layout and settings ---
layout = {
    "direction": "column",
    "first": {"type": "data_unit", "key": "instructions"},
    "second": {
        "direction": "column",
        "first": {
            "direction": "row",
            "first": {"type": "data_unit", "key": "top-left"},
            "second": {"type": "data_unit", "key": "top-right"},
            "splitPercentage": 50,
        },
        "second": {
            "direction": "row",
            "first": {"type": "data_unit", "key": "bottom-left"},
            "second": {"type": "data_unit", "key": "bottom-right"},
            "splitPercentage": 50,
        },
        "splitPercentage": 50,
    },
    "splitPercentage": 20,
}
settings = {"tile_settings": {"instructions": {"is_read_only": True}}}

# --- Group definitions (name + UUIDs) ---
groups = [
    {
        "name": "group-001",
        "uuids": {
            "instructions": UUID("00000000-0000-0000-0000-000000000000"), # Replace with File ID of clustered_event_log_01.txt
            "top-left": UUID("11111111-1111-1111-1111-111111111111"), # Replace with File ID of 00001_normalized.mp4
            "top-right": UUID("22222222-2222-2222-2222-222222222222"), # Replace with File ID of 00002_normalized.mp4
            "bottom-left": UUID("33333333-3333-3333-3333-333333333333"), # Replace with File ID of 00009.mp4
            "bottom-right": UUID("44444444-4444-4444-4444-444444444444"), # Replace with File ID of 00011_normalized.mp4
        },
    },
    {
        "name": "group-002",
        "uuids": {
            "instructions": UUID("55555555-5555-5555-5555-555555555555"), # Replace with File ID of clustered_event_log_02.txt
            "top-left": UUID("66666666-6666-6666-6666-666666666666"), # Replace with File ID of 00012.mp4
            "top-right": UUID("77777777-7777-7777-7777-777777777777"), # Replace with File ID of 00020.mp4
            "bottom-left": UUID("88888888-8888-8888-8888-888888888888"), # Replace with File ID of 00030.mp4
            "bottom-right": UUID("99999999-9999-9999-9999-999999999999"), # Replace with File ID of 00033.mp4
        },
    },
    {
        "name": "group-003",
        "uuids": {
            "instructions": UUID("12312312-3123-1231-2312-312312312312"), # Replace with File ID of clustered_event_log_03.txt
            "top-left": UUID("23232323-2323-2323-2323-232323232323"), # Replace with File ID of 00034.mp4
            "top-right": UUID("31313131-3131-3131-3131-313131313131"), # Replace with File ID of 00035_normalized.mp4
            "bottom-left": UUID("45645645-6456-4564-5645-645645645645"), # Replace with File ID of 00038_normalized.mp4
            "bottom-right": UUID("56565656-6565-5656-6565-656565656565"), # Replace with File ID of 00045.mp4
        },
    },
    # More groups...
]

# Create the data groups

for g in groups:
    group = folder.create_data_group(
        DataGroupCustom(
            name=g["name"],
            layout=layout,
            layout_contents=g["uuids"],
            settings=settings,
        )
    )
    print(f"✅ Created group '{g['name']}' with UUID {group}")

# Add all the data groups in a folder to a Dataset
group_items = folder.list_items(item_types=[StorageItemType.GROUP])
d = user_client.get_dataset(DATASET_ID)
d.link_items([item.uuid for item in group_items])

# Fetch the Project's label rows, including the children of each Data Group

p = user_client.get_project(PROJECT_ID)
rows = p.list_label_rows_v2(include_children=True)

# Label Rows of Data Groups use DataGroupMetadata for the layout to Annotate and Review
for row in rows:
    if row.data_type == DataType.GROUP:
        row.initialise_labels()
        assert isinstance(row.metadata, DataGroupMetadata)
        print(row.metadata.children)
```
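
Each group's `uuids` dictionary is keyed by the same tile keys that appear in the `layout`. The following is a minimal sketch of a pre-flight check you could run before creating any Data Groups. It assumes the layout is a nested dictionary in which split nodes hold children under `first` and `second` and leaf tiles have `"type": "data_unit"`, as in the script in this section. The helpers `layout_keys` and `check_group` are hypothetical and are not part of the Encord SDK.

```python
def layout_keys(node):
    """Collect the key of every data_unit tile in a nested layout dict."""
    if node.get("type") == "data_unit":
        return [node["key"]]
    # Split nodes hold two children under "first" and "second"
    return layout_keys(node["first"]) + layout_keys(node["second"])


def check_group(layout, uuids):
    """Compare the layout's tile keys with the keys of one group's uuids dict."""
    keys = layout_keys(layout)
    return {
        "missing": sorted(set(keys) - set(uuids)),  # tiles with no File ID
        "unused": sorted(set(uuids) - set(keys)),  # File IDs with no tile
        "duplicates": sorted({k for k in keys if keys.count(k) > 1}),
    }


# Same layout as the script in this section
layout = {
    "direction": "column",
    "first": {"type": "data_unit", "key": "instructions"},
    "second": {
        "direction": "column",
        "first": {
            "direction": "row",
            "first": {"type": "data_unit", "key": "top-left"},
            "second": {"type": "data_unit", "key": "top-right"},
            "splitPercentage": 50,
        },
        "second": {
            "direction": "row",
            "first": {"type": "data_unit", "key": "bottom-left"},
            "second": {"type": "data_unit", "key": "bottom-right"},
            "splitPercentage": 50,
        },
        "splitPercentage": 50,
    },
    "splitPercentage": 20,
}

# A deliberately incomplete group: "bottom-right" has no File ID
example_uuids = dict.fromkeys(
    ["instructions", "top-left", "top-right", "bottom-left"], "placeholder-id"
)
report = check_group(layout, example_uuids)
print(report)
```

Running `check_group(layout, g["uuids"])` for each entry in `groups` before the creation loop surfaces mismatched keys without making any API calls.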

## Annotate Data Groups

How you annotate the videos depends on your Ontology. Our Ontology, `E2E - Ontology - Data Groups`, uses classifications.

In this section, you'll see the following Collaborators:

* **Annotators** labeling data
* **Reviewers** reviewing labels created by Annotators
* **Team Manager** managing the Annotators and Reviewers
* **Project Admin** managing the Project and exporting labels

<Steps>
  <Step title="Prepare to Label">
    <AccordionGroup>
      <Accordion title="Team Manager or Project Admin">
        The Team Manager or Project Admin can prioritize certain data to be labeled and reviewed first. Let's prioritize a few Data Groups to be labeled first by setting the priority for those files to `75`.

        **Set Priority to `75`**

        <iframe width="560" height="315" src="https://www.loom.com/embed/4cdf778197f449c09550c3c9343e38f0?sid=1c0b213c-f4b5-4ab0-9a73-6c5f2df2c86d" title="Loom Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen autoplay />
      </Accordion>

      <Accordion title="Annotators">
        Annotators can configure the Annotate Label Editor so they can more effectively and efficiently label data.

        <iframe width="560" height="315" src="https://www.loom.com/embed/399c5980e45f4645b9413de475ddc8a1?sid=5883d1c3-2b02-4e06-8e8f-d54b4fbd79db" title="Loom Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen autoplay />
      </Accordion>
    </AccordionGroup>
  </Step>

  <Step title="Label Data">
    <AccordionGroup>
      <Accordion title="Team Manager or Project Admin">
        The Team Manager or Project Admin can monitor the performance and progress of the annotation team.

        <iframe width="560" height="315" src="https://www.loom.com/embed/a753e9ac5b094c14bccce20130cbd3c6?sid=2a170a8b-c3e6-4a54-b276-b7792ffde822" title="Loom Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen autoplay />
      </Accordion>

      <Accordion title="Annotators">
        Annotators use the text file to determine if the `Prediction` and `Summary` for each video are correct.

        <Note>Use hotkeys to speed up your annotation process.</Note>

        <Tip>You can [synchronize playback across all videos](/platform-documentation/Annotate/annotate-label-editor/multimodal-groups).</Tip>

        <iframe width="560" height="315" src="https://www.loom.com/embed/ff23f8db0ec240779568c630d96cf441?sid=9e5bac73-303c-4810-887a-a2643d92d3d4" title="Loom Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen autoplay />
      </Accordion>
    </AccordionGroup>
  </Step>

  <Step title="Review Labels">
    <AccordionGroup>
      <Accordion title="Team Manager or Project Admin">
        The Team Manager or Project Admin can monitor the performance and progress of the review team.

        <iframe width="560" height="315" src="https://www.loom.com/embed/48fa7f2c10f746928cb0c7979eb2fc42?sid=99d24493-7860-4434-972f-0598a210bd57" title="Loom Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen autoplay />
      </Accordion>

      <Accordion title="Reviewers">
        Reviewers verify that the labels are correct.

        <Note>You can approve labels/classifications on a task one at a time (from the left panel) or all at once (using the **Approve all** button).</Note>

        <Tip>
          When there is an issue with labels/classifications, Reviewers can:

          * Reject the task and add a comment about why a task was rejected. Rejected tasks go back to the person who added the labels/classifications.
          * Edit labels directly using the **Edit labels** button and then approve the task.
        </Tip>

        <iframe width="560" height="315" src="https://www.loom.com/embed/b0f1665322d442289ed7e684eb3a0161?sid=2ec741f9-0a81-492e-969b-50d810090c63" title="Loom Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen autoplay />
      </Accordion>
    </AccordionGroup>
  </Step>

  <Step title="Export Labels">
    Only Project Admins can export labels from Encord.

    <Accordion title="Project Admin">
      Export the labels from the Project Labels page.

      <iframe width="560" height="315" src="https://www.loom.com/embed/c9d1e0eeca4f4fc080e23a9d70a8a36a?sid=3263db85-d8c1-4b5e-afe8-3342997c66fb" title="Loom Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen autoplay />
    </Accordion>
  </Step>
</Steps>
