# Entity Mapping - E2E

<span className="inline-block rounded-full border border-sky-600/40 px-2 py-0.5 text-xs font-medium text-sky-700 mr-2">
  Entity Mapping
</span>

<span className="inline-block rounded-full border border-emerald-600/40 px-2 py-0.5 text-xs font-medium text-emerald-700 mr-2">
  Basic Cloud Storage
</span>

<span className="inline-block rounded-full border border-emerald-600/40 px-2 py-0.5 text-xs font-medium text-emerald-700">
  Tabular Data
</span>

## Why do this?

You want to reconcile messy, fragmented, or duplicated rows into consistent, unique entities for your downstream models.

**You want to understand how to create Projects in Encord that use Tabular Data (CSV files). This example assumes your Tabular Data is stored in cloud storage.**

<Tip>If you intend to use Encord at scale, we strongly recommend using the Encord SDK.</Tip>

### Pros and Cons

<table>
  <thead>
    <tr>
      <th>Pros</th>
      <th>Cons</th>
    </tr>
  </thead>

  <tbody>
    <tr>
      <td>
        * Simple way to get data into Encord
        * Tabular data available for annotation
      </td>

      <td>
        * Requires some technical knowledge to set up integrations
        * No data management or curation of your tabular data in this example
      </td>
    </tr>
  </tbody>
</table>

<Note>
  Tabular data currently supports [**CONSENSUS** Projects](/platform-documentation/Annotate/annotate-projects/annotate-workflows-consensus) only.
</Note>

## Import/Register Data

We're going to register our tiny dataset of CSV files.

<Steps>
  <Step title="Create Integration">
    Select your cloud provider.

    <div className="grid gap-4 sm:grid-cols-2 lg:grid-cols-3">
      {[
              { name: 'AWS Integration', href: '/platform-documentation/General/annotate-data-integrations/annotate-aws-integration' },
              { name: 'GCP Integration', href: '/platform-documentation/General/annotate-data-integrations/annotate-gcp-integration' },
              { name: 'Azure Integration', href: '/platform-documentation/General/annotate-data-integrations/annotate-azure-blob-integration' },
              { name: 'OTC Integration', href: '/platform-documentation/General/annotate-data-integrations/annotate-otc-integration' },
              { name: 'Wasabi Integration', href: '/platform-documentation/General/annotate-data-integrations/annotate-wasabi-integration' },
              { name: 'Oracle Integration', href: '/platform-documentation/General/annotate-data-integrations/annotate-oracle-integration' },
              { name: 'Direct Access Integration', href: '/platform-documentation/General/annotate-data-integrations/annotate-direct-access-integration' },
            ].map(({ name, href }) => (
              <a
                key={name}
                href={href}
                className="block p-4 rounded-xl border border-gray-200 dark:border-gray-700 bg-white dark:bg-gray-800 text-sm font-medium text-gray-800 dark:text-gray-100 hover:bg-gray-100 dark:hover:bg-gray-700 transition"
              >
                {name}
              </a>
            ))}
    </div>
  </Step>

  <Step title="Download Data">
    Download and extract the contents of the [e2e-tabular-data.zip](https://storage.googleapis.com/docs-media.encord.com/E2E%20Artifacts/Tab%20Data/e2e-tabular-data.zip) file.
  </Step>

  <Step title="Modify JSON">
    Modify the `tabular-data.json` file that you extracted from `e2e-tabular-data.zip`.

    1. Open the `tabular-data.json` file and replace `<file-path>` with the file path to the data stored in your cloud storage.

    <Note>The `tabular-data.json` file includes the file path and title for each CSV file. It does NOT include `clientMetadata`.</Note>
  </Step>

  <Step title="Create a Mirrored Dataset">
    Create a mirrored Dataset called `E2E - Tabular Data - Dataset` using the UI. Using mirrored Datasets is a simple way to sync data from folders to Datasets. Mirrored Datasets provide no method of curating or managing your data.

    To add more data to your Dataset, add the new entries to the JSON file and re-import it. The new data is automatically added to your Dataset and Project.

    <iframe width="560" height="315" src="https://www.loom.com/embed/0a94090e88b749b085640078fc1f89b2?sid=12a72497-6e6d-4a29-a276-be83e8db6054" title="Loom Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen autoplay />
  </Step>

  <Step title="Register/Import Data">
    Use the `tabular-data.json` file from `e2e-tabular-data.zip` to register/import the data to the mirrored Dataset.

    <iframe width="560" height="315" src="https://www.loom.com/embed/33bd5fad37b1482eb3c49e1c1edf5f5b?sid=76d77210-33cd-4705-b8b2-43fedb0c6c8c" title="Loom Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen autoplay />
  </Step>
</Steps>
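
The **Modify JSON** step above can also be scripted. The following is a minimal sketch, assuming the placeholder appears in the file as the literal text `<file-path>`. The helper names `fill_file_paths` and `update_json_file` are hypothetical and are not part of the Encord SDK. The sketch performs a plain text replacement and then parses the result to confirm it is still valid JSON.

```python
import json
from pathlib import Path

PLACEHOLDER = "<file-path>"


def fill_file_paths(json_text: str, storage_path: str) -> str:
    """Return json_text with every <file-path> placeholder replaced."""
    updated = json_text.replace(PLACEHOLDER, storage_path)
    # Fail early if the replacement produced invalid JSON.
    json.loads(updated)
    return updated


def update_json_file(json_path: Path, storage_path: str) -> int:
    """Rewrite the file in place and return the number of placeholders replaced."""
    original = json_path.read_text(encoding="utf-8")
    json_path.write_text(fill_file_paths(original, storage_path), encoding="utf-8")
    return original.count(PLACEHOLDER)
```

For example, `update_json_file(Path("tabular-data.json"), "s3://my-bucket/e2e-tabular-data")` rewrites the file using a made-up bucket path. Substitute the location of your own data.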

## Create Ontology

For this step you need the following:

* `genre-options.csv` and `platform-options.csv` from the `e2e-tabular-data.zip` file.
* One `video_game_annotation_X.csv` from the `e2e-tabular-data.zip` file.
* The `tabular_create_ontology.py` script. You create this script yourself.

The `tabular_create_ontology.py` script does the following:

* Creates the Ontology based on the structure of any of the `video_game_annotation_X.csv` files.
* Creates feature mapping for the *genre* column using `genre-options.csv`.
* Creates feature mapping for the *platform* column using `platform-options.csv`.

<iframe width="560" height="315" src="https://www.loom.com/embed/f606386b84dc483f8a68f0bfe19a9faa?sid=d225972d-e6ff-4a35-9a89-3b4f039cd578" title="Loom Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen autoplay />

`E2E - Tabular Data - Ontology` appears in your Ontology list after running the script.

```python tabular_create_ontology script theme={"dark"}

import pandas as pd
from encord.objects import OntologyStructure, Shape, TextAttribute
from encord.objects.attributes import RadioAttribute
from encord.user_client import EncordUserClient

# --- Configuration ---
ENCORD_SSH_KEY = "/Users/chris-encord/ssh-private-key.txt" # Replace with the file path to your SSH private key
TASK_CSV_PATH = "/file/path/to/video_game_annotation_1.csv" # Replace with the file path to any of the video_game_annotation_X.csv files

READ_ONLY_COLUMNS = [0, 1, 2]
ANNOTATION_COLUMNS = [3, 4]

# Map each annotation column name to the file path of its options file
MAPPING_FIELD_OPTION_PATHS = {
    "genre": "/file/path/to/genre-options.csv",
    "platform": "/file/path/to/platform-options.csv",
}

ONTOLOGY_NAME = "E2E - Tabular Data - Ontology"
OBJECT_NAME = "Game Row"


def parse_csv():
    csv_df = pd.read_csv(TASK_CSV_PATH)
    readonly_columns = csv_df.columns[READ_ONLY_COLUMNS].tolist()
    mapping_columns = csv_df.columns[ANNOTATION_COLUMNS].tolist()

    return mapping_columns, readonly_columns


def create_ontology(text_attribute_names, radio_option_names):
    ontology_structure = OntologyStructure()
    text_object = ontology_structure.add_object(name=OBJECT_NAME, shape=Shape.TEXT)

    for attribute in text_attribute_names:
        text_object.add_attribute(TextAttribute, attribute)

    for column_name in radio_option_names:
        options_path = MAPPING_FIELD_OPTION_PATHS.get(column_name)
        if options_path is None:
            raise ValueError(f"No options file defined for column '{column_name}'")

        options = pd.read_csv(options_path).iloc[:, 0].dropna().astype(str).tolist()

        radio_attribute = text_object.add_attribute(RadioAttribute, column_name, required=True)
        for option in options:
            radio_attribute.add_option(option)

    user_client = EncordUserClient.create_with_ssh_private_key(
        ssh_private_key_path=ENCORD_SSH_KEY,
        domain="https://api.encord.com",
    )
    return user_client.create_ontology(ONTOLOGY_NAME, structure=ontology_structure)


if __name__ == "__main__":
    mapping_columns, readonly_columns = parse_csv()
    ontology = create_ontology(readonly_columns, mapping_columns)
    print(f"Created ontology {ontology.title}, id: {ontology.ontology_hash}")

```
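
To see how the script picks its columns, the sketch below applies the same pandas calls to small in-memory samples. The sample column names and option values are hypothetical; the real ones come from your `video_game_annotation_X.csv` and options files. `READ_ONLY_COLUMNS` and `ANNOTATION_COLUMNS` are positional indexes, so the first three columns become text attributes and the next two become radio attributes.

```python
import io

import pandas as pd

# Hypothetical sample: the real column names come from your CSV files.
SAMPLE_CSV = """title,year,publisher,genre,platform
Game A,2001,Studio X,,
Game B,2015,Studio Y,,
"""

# Hypothetical options file: a header followed by one option per row.
SAMPLE_OPTIONS = """genre
Action
Puzzle
Strategy
"""

READ_ONLY_COLUMNS = [0, 1, 2]
ANNOTATION_COLUMNS = [3, 4]


def split_columns(csv_source):
    """Return (read-only column names, annotation column names) by position."""
    df = pd.read_csv(csv_source)
    return (
        df.columns[READ_ONLY_COLUMNS].tolist(),
        df.columns[ANNOTATION_COLUMNS].tolist(),
    )


def read_options(options_source):
    """Return the non-empty values of the first column as strings."""
    return pd.read_csv(options_source).iloc[:, 0].dropna().astype(str).tolist()


readonly, annotation = split_columns(io.StringIO(SAMPLE_CSV))
options = read_options(io.StringIO(SAMPLE_OPTIONS))
print(readonly)    # ['title', 'year', 'publisher']
print(annotation)  # ['genre', 'platform']
print(options)     # ['Action', 'Puzzle', 'Strategy']
```

If your CSV files have a different column order, adjust the two index lists before running the Ontology script.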

## Create Project

After you create the mirrored Dataset, register/import the CSV files, and create the Ontology, create a **CONSENSUS** Project.

<Note>
  * Tabular data currently supports **CONSENSUS** Projects only.
  * An AGENT block must be the first block for tabular data.
  * The AGENT block and AGENT pathway **MUST use the exact names** specified below.
</Note>

* Name: `E2E - Tabular Data - Project`
* Agent name: `Pre-label`
* Agent pathway: `Labelled`

<Important>
  - Record the `Project ID` for use later.
  - Add your annotators, reviewers, team managers, and Project Admin after creating the Project.
  - Adjust the number of annotators per task to your own requirements.
</Important>

<iframe width="560" height="315" src="https://www.loom.com/embed/6bf890b7f39b494597225a766717ae3b?sid=e171b8b1-9ab6-431e-b00a-3e4fb1c6a8d7" title="Loom Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen autoplay />

## Run the Agent script

The `tabular_run_agent.py` script populates tasks in the AGENT block in your workflow.

Create the following Python scripts. Both scripts must be in the same directory.

* `tabular_run_agent.py`
* `tabular_utils.py`

After creating the scripts, run the `tabular_run_agent.py` script.

After running the script, tasks that were in the AGENT stage are now in the CONSENSUS - ANNOTATE stage.

<iframe width="560" height="315" src="https://www.loom.com/embed/a115913a6ee84318a6005c56dcba4a8f?sid=c21d8141-d86f-4eed-b43a-c5d860fe299f" title="Loom Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen autoplay />

<CodeGroup>
  ```py tabular_run_agent script theme={"dark"}

  from typing import Annotated
  from pathlib import Path
  import os

  from encord_agents.tasks import Runner
  from encord.objects.ontology_labels_impl import LabelRowV2
  from encord.project import Project
  from encord_agents.tasks.dependencies import dep_asset
  from encord_agents.core.dependencies import Depends
  from encord.objects.common import Shape

  from tabular_utils import parse_csv_and_add_objects

  # --- Configuration ---
  ENCORD_SSH_KEY = "/Users/chris-encord/ssh-private-key.txt" # Replace with the file path to your SSH private key
  PROJECT_HASH = "00000000-0000-0000-0000-000000000000" # Replace with the unique Project ID of your tabular data Project
  AGENT_STAGE = "Pre-label"
  AGENT_PATHWAY = "Labelled"

  # Inject into environment so Encord Agents can pick it up
  os.environ["ENCORD_SSH_KEY_FILE"] = ENCORD_SSH_KEY

  runner = Runner(project_hash=PROJECT_HASH)

  @runner.stage(stage=AGENT_STAGE)
  def agent_logic(
      lr: LabelRowV2, project: Project, asset: Annotated[Path, Depends(dep_asset)]
  ):
      ontology = project.ontology_structure
      # Check for an empty Ontology first; indexing an empty list raises IndexError
      if not ontology.objects:
          raise Exception("No objects found")
      text_object = ontology.objects[0]
      if text_object.shape is not Shape.TEXT:
          raise Exception("Text object required")

      parse_csv_and_add_objects(text_object, lr, asset)

      return AGENT_PATHWAY

  if __name__ == "__main__":
      runner.run()
  ```

  ```py tabular_utils script theme={"dark"}

  import pandas as pd
  from pathlib import Path

  from encord.exceptions import OntologyError
  from encord.objects.attributes import Attribute
  from encord.objects.coordinates import TextCoordinates
  from encord.objects.frames import Range
  from encord.objects.ontology_labels_impl import LabelRowV2
  from encord.objects.ontology_object import Object

  def parse_csv_and_add_objects(
      text_object: Object, label_row: LabelRowV2, asset_link: Path
  ):
      csv_df = pd.read_csv(asset_link)
      filtered_columns = csv_df.columns

      # Read the raw bytes so each row can be mapped to a byte range
      read = asset_link.read_bytes()

      for index, row in csv_df.iloc[0:].iterrows():
          byte_range = get_byte_range_for_row(read, index + 1)
          if byte_range is None:
              raise Exception("Row does not exist")
          (start, end) = byte_range

          row_dict = row[filtered_columns].to_dict()

          add_text_object(text_object, label_row, Range(start=start, end=end), row_dict)

      label_row.save()


  def get_byte_range_for_row(csv_content: bytes, row: int) -> tuple[int, int] | None:
      # Find all newline positions
      newline_positions = []
      current_pos = 0

      while True:
          newline_pos = csv_content.find(b"\n", current_pos)
          if newline_pos == -1:
              break
          newline_positions.append(newline_pos)
          current_pos = newline_pos + 1

      # Check if requested row exists
      if row > len(newline_positions):
          return None

      # Calculate start and end positions
      if row == 0:
          start_byte = 0
      else:
          start_byte = newline_positions[row - 1] + 1

      if row < len(newline_positions):
          end_byte = newline_positions[row]
      else:
          # Last row might not have a trailing newline
          end_byte = len(csv_content)

      return (start_byte, end_byte)


  def add_text_object(
      text_object: Object,
      lr: LabelRowV2,
      byte_range: Range,
      csv_column_to_value: dict[str, str],
  ):
      new_instance = text_object.create_instance()
      new_instance.set_for_frames(coordinates=TextCoordinates(range=[byte_range]))
      for key, value in csv_column_to_value.items():
          try:
              attribute = text_object.get_child_by_title(title=key, type_=Attribute)
              if attribute.required:
                  # If required attribute then this is the one they are writing so ignore
                  continue
              new_instance.set_answer(attribute=attribute, answer=str(value))
          except OntologyError:
              # Ignore columns that don't have attributes
              pass
      lr.add_object_instance(new_instance)

  ```
</CodeGroup>
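
The agent attaches each text object to the byte range of one CSV row. The sketch below is a simplified, standalone variant of `get_byte_range_for_row` that returns the range of every line at once, run on a hypothetical two-row CSV. The function name `row_byte_ranges` and the sample data are illustrative and are not part of the scripts above.

```python
def row_byte_ranges(csv_content: bytes) -> list[tuple[int, int]]:
    """Return (start, end) byte offsets for each line, excluding the newline."""
    ranges = []
    start = 0
    while start < len(csv_content):
        end = csv_content.find(b"\n", start)
        if end == -1:
            # Last line without a trailing newline.
            end = len(csv_content)
        ranges.append((start, end))
        start = end + 1
    return ranges


# Hypothetical two-row CSV; line 0 is the header.
sample = b"title,genre\nGame A,Action\nGame B,Puzzle\n"
ranges = row_byte_ranges(sample)
for start, end in ranges:
    print((start, end), sample[start:end])
# (0, 11) b'title,genre'
# (12, 25) b'Game A,Action'
# (26, 39) b'Game B,Puzzle'
```

Line 0 is the header, which is why the agent script requests `index + 1` for each data row. Both this sketch and the script split on raw newline bytes, so a quoted field that contains a line break would be split across two ranges.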

## Annotate Tabular Project

How CSV files are annotated depends on your Ontology. Our Ontology `E2E - Tabular Data - Ontology` uses text regions, but your annotators and reviewers work with dropdowns.

In this section, you'll see the following Collaborators:

* **Annotators** labeling data
* **Reviewers** reviewing labels created by Annotators
* **Team Manager** managing the Annotators and Reviewers
* **Project Admin** managing the Project and exporting labels

<Steps>
  <Step title="Prepare to Label">
    <AccordionGroup>
      <Accordion title="Team Manager or Project Admin">
        The Team Manager or Project Admin can prioritize certain data to be labeled and reviewed first. Let's prioritize two data units to be labeled first by setting the priority for those files to `75`.

        **Set Priority to `75`**

        <iframe width="560" height="315" src="https://www.loom.com/embed/f6b6971589f94640bc3dd70ea38bdef0?sid=7154b4e5-026a-454e-bd82-5c6a476079d4" title="Loom Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen autoplay />
      </Accordion>

      <Accordion title="Annotators">
        Annotators can sort by priority, search by file name, or filter by Dataset or Issue Status.

        <iframe width="560" height="315" src="https://www.loom.com/embed/f7ef69898c0e437c8260e439c12b2745?sid=025b2fa2-0485-4541-b3a4-32f55cd373c7" title="Loom Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen autoplay />
      </Accordion>
    </AccordionGroup>
  </Step>

  <Step title="Label Data">
    <AccordionGroup>
      <Accordion title="Team Manager or Project Admin">
        The Team Manager or Project Admin can monitor the performance and progress of the annotation team.

        <iframe width="560" height="315" src="https://www.loom.com/embed/203cfcfa0c524eda93b0eccbc1557ca5?sid=43430c3f-efc9-4af4-914c-0c32485add9f" title="Loom Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen autoplay />
      </Accordion>

      <Accordion title="Annotators">
        Annotators use dropdowns to select the genre and platform for each row.

        <iframe width="560" height="315" src="https://www.loom.com/embed/76c352565b9b40128ded02c940755388?sid=b0e70825-f08c-487b-a893-eae1914d700a" title="Loom Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen autoplay />
      </Accordion>
    </AccordionGroup>
  </Step>

  <Step title="Review Labels">
    <AccordionGroup>
      <Accordion title="Team Manager or Project Admin">
        The Team Manager or Project Admin can monitor the performance and progress of the review team.

        <iframe width="560" height="315" src="https://www.loom.com/embed/e1bb0ad1b3af4705b4b8a2474ff28766?sid=7638f83a-17ab-4aea-b463-7cf905983ebb" title="Loom Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen autoplay />
      </Accordion>

      <Accordion title="Reviewers">
        Reviewers verify that the labels are correct.

        <Note>Use any column in a row to select correct answers.</Note>

        <Tip>
          When there is an issue with labels/classifications, Reviewers can:

          * Reject the task and add a comment about why a task was rejected. Rejected tasks go back to the person who added the labels/classifications.
          * Edit labels directly using the **Edit labels** button and then approve the task.
        </Tip>

        <iframe width="560" height="315" src="https://www.loom.com/embed/3144b127de484c2ba939b3df51d1a95f?sid=302e4d87-29fc-42b3-b924-ffe1ad58871b" title="Loom Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen autoplay />
      </Accordion>
    </AccordionGroup>
  </Step>

  <Step title="Export Labels">
    Only Project Admins can export labels from Encord.

    <Accordion title="Project Admin">
      Export the labels from the Project Labels page.

      <iframe width="560" height="315" src="https://www.loom.com/embed/3ef36f8389cc404d9b34e2ea8757a2bf?sid=33fac4e7-8431-4521-958a-4b6d55c10432" title="Loom Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen autoplay />
    </Accordion>
  </Step>
</Steps>
