Basic Geometric Example
A simple example showing how to pass objectHashes to your agent. Your agent can then use the dep_objects dependency to gain immediate access to these specific object instances, which greatly simplifies integrating your OCR model for targeted processing.
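As a rough illustration, an agent.py built around dep_objects could look like the sketch below. This is a minimal sketch, assuming the encord-agents GCP-style editor_agent decorator and the dep_label_row / dep_objects dependencies behave as described above; the module paths and the OCR hook are assumptions to verify against the library's reference documentation.

```python
# Minimal sketch of agent.py. Module paths, the editor_agent decorator, and the
# dependency names are assumptions based on the description above; the OCR hook
# is a hypothetical placeholder.
from typing import Annotated

from encord.objects import LabelRowV2, ObjectInstance
from encord_agents import FrameData
from encord_agents.gcp import Depends, editor_agent
from encord_agents.gcp.dependencies import dep_label_row, dep_objects


def run_ocr(instance: ObjectInstance) -> str:
    """Hypothetical OCR hook -- replace with a call to your own OCR model."""
    return "recognised text"


@editor_agent()
def agent(
    frame_data: FrameData,
    label_row: Annotated[LabelRowV2, Depends(dep_label_row)],
    object_instances: Annotated[list[ObjectInstance], Depends(dep_objects)],
) -> None:
    # dep_objects resolves the objectHash(es) passed from the Label Editor into
    # concrete ObjectInstance objects, so no manual lookup is needed.
    for instance in object_instances:
        text = run_ocr(instance)
        # ...write `text` back onto the instance's text attribute as required...
    label_row.save()
```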
Test the Agent
- Save the above code as agent.py.
- Run the following command to run the agent in debug mode in your terminal.
- Open your Project in the Encord platform and navigate to a frame with an object that you want to act on. Choose an object from the bottom-left sidebar and click Copy URL as shown:

The URL should have roughly this format:
"https://app.encord.com/label_editor/{project_hash}/{data_hash}/{frame}/0?other_query_params&objectHash={objectHash}"
- In another shell operating from the same working directory, source your virtual environment and test the agent.
- To check that the test was successful, refresh your browser to see the action taken by the agent. If the test has run successfully, the agent can be deployed. Visit the deployment documentation to learn more.
Nested Classification using Claude 3.5 Sonnet
The goals of this example are to:
- Create an editor agent that can automatically fill in frame-level classifications in the Label Editor.
- Demonstrate how to use the OntologyDataModel for classifications.
- Demonstrate how to build an agent using FastAPI that can be self-hosted.
This example assumes that you have:
- Created a virtual Python environment.
- Installed all necessary dependencies.
- An Anthropic API key.
- The ability to authenticate with Encord.

Ontology JSON and Script


- Import dependencies, authenticate with Encord, and set up the Project. Ensure you insert your Project’s unique identifier.
- Create a data model and a system prompt based on the Project Ontology to tell Claude how to structure its response.
- Set up an Anthropic API client to establish communication with the Claude model.
- Define the Editor Agent. This includes:
- Receiving frame data using FastAPI’s Form dependency.
- Retrieving the associated label row and frame content using Encord Agents’ dependencies.
- Constructing a Frame object from the content.
- Sending the frame image to Claude for analysis.
- Parsing Claude’s response into classification instances.
- Adding these classifications to the label row and saving the updated data, as sketched below.
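Putting the steps above together, the agent might look roughly like the following. Treat it as a hedged sketch: the encord-agents module paths, the OntologyDataModel attributes (such as model_json_schema_str and calling the data model to parse Claude's reply), the return type of dep_single_frame, and the Claude model string are assumptions drawn from this example's description and should be checked against the current library and Anthropic documentation.

```python
# Hedged sketch of the frame-classification agent described above. Names marked
# as assumptions should be verified against the encord-agents reference docs.
import base64
import os
from typing import Annotated

import anthropic
import cv2
import numpy as np
from fastapi import Depends, FastAPI, Form

from encord.objects import LabelRowV2
from encord.user_client import EncordUserClient
from encord_agents import FrameData
from encord_agents.core.ontology import OntologyDataModel
from encord_agents.fastapi.dependencies import dep_label_row, dep_single_frame

# 1. Authenticate with Encord and load the Project (insert your Project's unique identifier).
user_client = EncordUserClient.create_with_ssh_private_key()
project = user_client.get_project("<your_project_hash>")

# 2. Data model and system prompt derived from the Project Ontology's classifications.
data_model = OntologyDataModel(project.ontology_structure.classifications)
system_prompt = (
    "You analyse images and answer ONLY with JSON that follows this schema:\n\n"
    f"{data_model.model_json_schema_str}"  # assumed attribute name
)

# 3. Anthropic client for talking to Claude.
claude = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

app = FastAPI()


@app.post("/frame_classification")
def classify_frame(
    frame_data: Annotated[FrameData, Form()],                   # posted by the Label Editor
    label_row: Annotated[LabelRowV2, Depends(dep_label_row)],   # label row for the task
    frame: Annotated[np.ndarray, Depends(dep_single_frame)],    # assumed to yield an image array
) -> None:
    # 4. Encode the frame and send it to Claude together with the system prompt.
    _, jpeg = cv2.imencode(".jpg", frame)
    message = claude.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        system=system_prompt,
        messages=[{
            "role": "user",
            "content": [{
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/jpeg",
                    "data": base64.b64encode(jpeg.tobytes()).decode(),
                },
            }],
        }],
    )

    # 5. Parse Claude's JSON reply into classification instances (assumed behaviour
    #    of calling the data model), attach them to the label row, and save.
    classifications = data_model(message.content[0].text)
    for instance in classifications:
        instance.set_for_frames(frames=frame_data.frame, confidence=0.5, manual_annotation=False)
        label_row.add_classification_instance(instance)
    label_row.save()
```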
Test the Agent
- In your current terminal, run the following command to run the FastAPI server in development mode with auto-reload enabled.
- Open your Project in the Encord platform and navigate to a frame you want to add a classification to. Copy the URL from your browser.
The URL should have the following format:
"https://app.encord.com/label_editor/{project_hash}/{data_hash}/{frame}"
- In another shell operating from the same working directory, source your virtual environment and test the agent.
- To see if the test is successful, refresh your browser to view the classifications generated by Claude. Once the test runs successfully, you are ready to deploy your agent. Visit the deployment documentation to learn more.
Nested Attributes using Claude 3.5 Sonnet
The goals of this example are to:
- Create an editor agent that can convert generic object annotations (class-less coordinates) into class-specific annotations with nested attributes like descriptions, radio buttons, and checklists.
- Demonstrate how to use both the OntologyDataModel and the dep_object_crops dependency.
This example assumes that you have:
- Created a virtual Python environment.
- Installed all necessary dependencies.
- An Anthropic API key.
- The ability to authenticate with Encord.

Ontology JSON and Script


- Import Dependencies and Configure Project: Import necessary dependencies and set up your project. Remember to insert your project’s unique identifier.
- Create a data model and a system prompt based on the Project Ontology to tell Claude how to structure its response.
- Initialize Anthropic API Client: Set up an API client to establish communication with the Claude model.
- Define the Editor Agent:
- Arguments are automatically injected when the agent is called. See the documentation on dependency injection for details.
- The dep_object_crops dependency filters to include only “generic” object crops that still need classification.
- Call Claude with Image Crops: Use the crop.b64_encoding method to send each image crop to Claude in a format it understands.
- Parse Claude’s Response and Update Labels: The data_model parses Claude’s JSON response, creating a new Encord object instance. If successful, the original generic object is replaced with the newly classified instance on the label row.
- Save Labels (see the sketch following this list).
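The flow above might be sketched as follows. This is a hedged sketch, not the example's exact code: the dep_object_crops filter argument, the crop object's b64_encoding output format, the "generic" object title, and the label-row calls used to swap instances are all assumptions taken from this example's description.

```python
# Hedged sketch of the object-crop classification agent described above. The
# dep_object_crops filter argument, b64_encoding output format, and label-row
# calls are assumptions; "generic" is the assumed title of the class-less object.
import os
from typing import Annotated

import anthropic
from fastapi import Depends, FastAPI, Form

from encord.objects import LabelRowV2
from encord.user_client import EncordUserClient
from encord_agents import FrameData
from encord_agents.core.ontology import OntologyDataModel
from encord_agents.fastapi.dependencies import dep_label_row, dep_object_crops

user_client = EncordUserClient.create_with_ssh_private_key()
project = user_client.get_project("<your_project_hash>")

# Split the Ontology: the class-less "generic" object vs. the specific classes
# whose nested attributes Claude should fill in.
generic_object = project.ontology_structure.get_child_by_title("generic")
specific_objects = [o for o in project.ontology_structure.objects if o.title != "generic"]
data_model = OntologyDataModel(specific_objects)

system_prompt = (
    "Classify the object shown in the image crop. Answer ONLY with JSON that "
    f"follows this schema:\n\n{data_model.model_json_schema_str}"  # assumed attribute name
)

claude = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
app = FastAPI()


@app.post("/object_classification")
def classify_objects(
    frame_data: Annotated[FrameData, Form()],
    label_row: Annotated[LabelRowV2, Depends(dep_label_row)],
    crops: Annotated[list, Depends(dep_object_crops(filter_ontology_objects=[generic_object]))],
) -> None:
    for crop in crops:
        # Send the image crop to Claude in a format it understands.
        message = claude.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            system=system_prompt,
            messages=[{
                "role": "user",
                "content": [crop.b64_encoding(output_format="anthropic")],  # assumed output format
            }],
        )
        try:
            # Parse the JSON reply into a new, class-specific object instance.
            new_instance = data_model(message.content[0].text)
        except Exception:
            continue  # leave the generic object untouched if parsing fails

        # Re-use the generic object's coordinates, then swap the instances.
        annotation = crop.instance.get_annotation(frame=frame_data.frame)
        new_instance.set_for_frames(coordinates=annotation.coordinates, frames=frame_data.frame)
        label_row.remove_object(crop.instance)
        label_row.add_object_instance(new_instance)

    label_row.save()
```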
Test the Agent
- In your current terminal, run the following command to run the FastAPI server in development mode with auto-reload enabled.
- Open your Project in the Encord platform and navigate to a frame you want to add a classification to. Copy the URL from your browser.
The URL should have roughly this format:
"https://app.encord.com/label_editor/{project_hash}/{data_hash}/{frame}"
- In another shell operating from the same working directory, source your virtual environment and test the agent:
- To see if the test is successful, refresh your browser to view the classifications generated by Claude. Once the test runs successfully, you are ready to deploy your agent. Visit the deployment documentation to learn more.
Video Recaptioning using GPT-4o-mini
The goals of this example are to:
- Create an Editor Agent that automatically generates multiple variations of video captions.
- Demonstrate how to use OpenAI's GPT-4o-mini model to enhance human-created video captions with a FastAPI-based agent.
This example assumes that you have:
- Created a virtual Python environment.
- Installed all necessary dependencies.
- An OpenAI API key.
- The ability to authenticate with Encord.

The Project Ontology used in this example contains:
- One text classification for human-created summaries of what is happening in the video.
- Three text classifications to be automatically filled by the LLM.

Ontology JSON and Script
- A human watches the video and enters a caption in the first text field.
- The agent is then triggered and generates three additional caption variations for review.
- Each video is first annotated by a human (ANNOTATE stage).
- Next, a data agent automatically generates alternative captions (AGENT stage).
- Finally, a human reviews all four captions (REVIEW stage) before the task is marked complete.
If no human caption is present when the agent is triggered, the task is sent back for annotation.
If the review stage results in rejection, the task is also returned for re-annotation.
- Set up imports and create a Pydantic model for the LLM's structured output (a hedged sketch of this model and the structured-output call follows this list).
- Create a detailed system prompt for the LLM that explains exactly what kind of rephrasing we want.
- Configure the LLM to use structured outputs based on the Pydantic model.
- Create a helper function to prompt the model with both text and image:
- Initialize the FastAPI app with the required CORS middleware:
- Define the agent to handle the recaptioning. This includes:
- Retrieving the existing human-created caption, prioritizing captions from the current frame or falling back to frame zero.
- Sending the first frame of the video along with the human caption to the LLM.
- Processing the response from the LLM, which provides three alternative phrasings of the original caption.
- Updating the label row with the new captions, replacing any existing ones.
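As a sketch of the LLM side of the steps above (the Encord-side reading and writing of caption classifications is omitted), the Pydantic model and the structured-output call could look like the following. The field names, prompt wording, and helper name are illustrative assumptions, not the example's exact code.

```python
# Hedged sketch of the recaptioning LLM plumbing: a Pydantic model for structured
# output, a system prompt, and a helper that sends the first frame plus the human
# caption to GPT-4o-mini. Field names and prompt text are illustrative.
import base64
import os

from openai import OpenAI
from pydantic import BaseModel


class CaptionRephrasings(BaseModel):
    """Structured output: three alternative phrasings of the human caption."""
    rephrase_1: str
    rephrase_2: str
    rephrase_3: str


SYSTEM_PROMPT = (
    "You rephrase video captions. Given one human-written caption and the first "
    "frame of the video, return three alternative captions that keep the original "
    "meaning but vary the wording and sentence structure."
)

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])


def rephrase_caption(frame_jpeg: bytes, human_caption: str) -> CaptionRephrasings:
    """Prompt the model with both text and image, parsing the structured reply."""
    b64 = base64.b64encode(frame_jpeg).decode()
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                    {"type": "text", "text": human_caption},
                ],
            },
        ],
        # Structured outputs constrained to the Pydantic model above.
        response_format=CaptionRephrasings,
    )
    parsed = completion.choices[0].message.parsed
    if parsed is None:
        raise ValueError("Model refused or returned no parsable output")
    return parsed
```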
Test the Agent
- In your current terminal, run the following command to run the FastAPI server:
- Open your Project in the Encord platform, navigate to a video frame, and add your initial caption. Copy the URL from your browser.
- In another shell operating from the same working directory, source your virtual environment and test the agent:
- Refresh your browser to view the three AI-generated caption variations. Once the test runs successfully, you are ready to deploy your agent. Visit the deployment documentation to learn more.