We strongly recommend that highly technical users (for example, IT professionals, software developers, or system administrators) perform the steps outlined in this process.
Third-party products, such as OpenAI’s API, come with their own terms and conditions.

The following example uses GPT-4o to automatically update classifications in Encord each time the Agent is triggered. The example uses FastAPI as the server that hosts and runs the code. The server returns an empty 200 response, which indicates that the request succeeded and causes the Label Editor to refresh.

Only HTTPS endpoints are supported.

Assumptions

This example makes the following assumptions:

  • Encord’s payload uses the application/json content type in its headers.

  • The payload includes the following data fields:

    • projectHash: A unique identifier for the project.
    • dataHash: A unique identifier for the data item.
    • frame: The frame number within the data item.

The following is an example of the payload structure:

{
  "dataHash": "038ed92d-dbe8-4991-a3aa-6ede35d6e62d",
  "projectHash": "027e0c65-c53f-426d-9f02-04fafe8bd80e",
  "frame": 10
}

Step 1: Create a directory to host your code

Ensure that CORS (Cross-Origin Resource Sharing) is configured correctly for your Agent. Refer to the relevant documentation.

Create a directory containing the following files.

  • requirements.txt: Installs all the dependencies necessary for the example to work.
  • openai_api_agent.py: Makes the call to the OpenAI API. Ensure that you replace <classification_name> with the name of the classification you want GPT-4o to update.
  • data_models.py: Ensures that the response from OpenAI has the specific format required by Encord.
  • dependencies.py: Retrieves the information necessary for the query to be made and the label row to be updated.
  • main.py: Runs the program in the correct order.

import base64
import logging
from pathlib import Path

from encord.objects.classification import Classification
from encord.objects.classification_instance import ClassificationInstance
from encord.objects.ontology_labels_impl import LabelRowV2
from encord.objects.ontology_structure import OntologyStructure
from openai import OpenAI
from openai.types.chat import (
    ChatCompletionContentPartImageParam,
    ChatCompletionContentPartTextParam,
    ChatCompletionSystemMessageParam,
    ChatCompletionUserMessageParam,
)
from openai.types.chat.chat_completion_content_part_image_param import ImageURL
from pydantic import BaseModel, Field


def to_image_completion_content_part(
    image_path: Path,
) -> ChatCompletionContentPartImageParam:
    """
    Read an image file and wrap it as a base64-encoded data URL content part to send to GPT-4o.
    """
    with image_path.open("rb") as image_file:
        content = base64.b64encode(image_file.read()).decode("utf-8")
    return ChatCompletionContentPartImageParam(
        image_url=ImageURL(url=f"data:image/jpeg;base64,{content}", detail="auto"),
        type="image_url",
    )


def get_ontology_classification(ontology: OntologyStructure) -> Classification:
    """
    Replace <classification_name> with the name of the classification in your Ontology you want to update.
    """
    return ontology.get_child_by_title("<classification_name>", type_=Classification)


"""
Below is an example of how to define a pydantic model for extracting text.
GPT also understands if you use list types and enums. For more examples,
have a look at these notebooks:
    - [GPT-4o example with videos](https://colab.research.google.com/drive/1ctV-Zpoks7PDEXisVvpP1NeocyBkWXzp?usp=sharing)
    - [Gemini 1.5 Pro with advanced pydantic models](https://colab.research.google.com/drive/1jeCCZrumLnCwdVHbn-wK46xUPQQ9KCtf?usp=sharing)
"""


class DescriptionModel(BaseModel):
    description: str = Field(
        min_length=25,
        max_length=1000,
        description="A detailed description of the scene",
    )


def describe_scene(label_row: LabelRowV2, asset: Path) -> ClassificationInstance | None:
    system_instruction = f"""
You are an image analysis expert. Your task is to extract the most relevant description of the image content provided.

You are expected to only respond in the form of the following JSON schema.
```json
{DescriptionModel.model_json_schema()}
```

Ensure that you do not wrap the object in a list. Only a single object conforming to the JSON schema is allowed.  
"""  
    client = OpenAI()

    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            ChatCompletionSystemMessageParam(role="system", content=system_instruction),
            ChatCompletionUserMessageParam(
                role="user",
                content=[to_image_completion_content_part(asset)]
                + [
                    ChatCompletionContentPartTextParam(
                        type="text",
                        text="Please build a JSON object with respect to this visual data. Follow the JSON schema provided to fill in the schema as accurately as you can.",
                    )
                ],
            ),
        ],
        response_format={"type": "json_object"},
        max_tokens=1000,
    )

    raw_text = completion.choices[0].message.content
    if raw_text is None:
        logging.error("No response")
        raise ValueError("Missing response from GPT-4o")

    try:
        labels = DescriptionModel.model_validate_json(raw_text)
    except Exception:
        # Log the parse failure together with the raw model output for debugging.
        logging.exception("Unable to parse GPT-4o response: %s", raw_text)
        return None

    ontology_classification = get_ontology_classification(label_row.ontology_structure)
    instance = ontology_classification.create_instance()
    instance.set_answer(labels.description)
    return instance
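
Note that `to_image_completion_content_part` labels every image as `image/jpeg`. If your assets include other formats, a variant along the following lines could derive the MIME type from the file extension instead. This is a standard-library sketch; `to_data_url` is a hypothetical helper and not part of the example files.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(image_path: Path, default_mime: str = "image/jpeg") -> str:
    """Encode an image file as a base64 data URL with a guessed MIME type."""
    mime_type, _ = mimetypes.guess_type(image_path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        # Fall back to the MIME type hardcoded in the example above.
        mime_type = default_mime
    encoded = base64.b64encode(image_path.read_bytes()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string could then be passed as the `url` argument of `ImageURL` in place of the hardcoded f-string.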

Step 2: Test your Agent locally

We strongly recommend you test your Agent locally before deploying it to your server.

  1. Create a new virtual environment and install all the requirements using the following terminal commands:
$ python -m venv venv
$ source venv/bin/activate
$ python -m pip install -r requirements.txt
  2. Run your Agent and server locally using the following terminal command. Ensure that you replace:
  • <private_key_path> with the path to your Encord private key file.
  • <openai_key_path> with your OpenAI API key.
ENCORD_SSH_KEY_FILE=<private_key_path> OPENAI_API_KEY='<openai_key_path>' fastapi dev main.py
Keep this terminal window open for as long as you want the server to run locally. Perform the following steps in a new terminal window.
  3. In a new terminal window, send a request that points the server to a specific data asset in Encord. This step replicates the triggering of the Agent in the Encord platform. Ensure that you:
  • Replace <project_hash> with the hash of your Project.
  • Replace <data_hash> with the hash of the data unit you want to run your Agent on.

Both the <project_hash> and the <data_hash> can be found in the URL of the Label Editor. The URL is structured like this: https://app.encord.com/label_editor/<project_hash>/<data_hash> or like this: https://app.us.encord.com/label_editor/<project_hash>/<data_hash>

curl localhost:8000 -H "Content-type: application/json" -d '{
    "projectHash": "<project_hash>",
    "dataHash": "<data_hash>",
    "frame": 0
}'
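
The same test request can also be built in Python. The sketch below extracts the two hashes from a Label Editor URL and prepares the JSON request. It assumes the URL path has the form `/label_editor/<project_hash>/<data_hash>` and that the local server listens on `http://localhost:8000`; `parse_label_editor_url` and `build_trigger_request` are hypothetical helpers.

```python
import json
import urllib.request
from urllib.parse import urlparse


def parse_label_editor_url(url: str) -> tuple[str, str]:
    """Return (project_hash, data_hash) from a Label Editor URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 3 or parts[0] != "label_editor":
        raise ValueError(f"Not a Label Editor URL: {url}")
    return parts[1], parts[2]


def build_trigger_request(
    project_hash: str,
    data_hash: str,
    frame: int = 0,
    server_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Prepare the POST request that mimics Encord triggering the Agent."""
    body = json.dumps(
        {"projectHash": project_hash, "dataHash": data_hash, "frame": frame}
    ).encode("utf-8")
    return urllib.request.Request(
        server_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send the request while the local server is running, pass it to `urllib.request.urlopen`.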

Step 3: Deploy the Agent to your server

The Agent is deployed to your server as a container image, for example one built with Docker. The exact method of deployment depends on your choice of server.

  • Install the requirements from the requirements.txt file. For guidance on creating a Docker image for FastAPI, refer to the FastAPI documentation.

  • Ensure that you set up environment variables to authenticate with Encord and OpenAI. For information on authenticating with OpenAI, refer to the OpenAI documentation.

    • ENCORD_SSH_KEY_FILE=/path/to/your/private_key_file
    • OPENAI_API_KEY=<your_openai_api_key>
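
A missing variable may only surface when the first request arrives, so it can help to check the environment when the container starts. A minimal sketch, assuming the two variable names above; `missing_env_vars` and `require_env` are hypothetical helpers.

```python
import os
from collections.abc import Mapping

REQUIRED_ENV_VARS = ("ENCORD_SSH_KEY_FILE", "OPENAI_API_KEY")


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def require_env(env: Mapping[str, str] = os.environ) -> None:
    """Raise a descriptive error if any required variable is missing."""
    missing = missing_env_vars(env)
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
```

Calling `require_env()` at the top of the server module would make a misconfigured container fail at startup rather than on the first trigger.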