We strongly recommend that highly technical users (for example, IT professionals, software developers, or system administrators) perform the steps outlined in this process.
Third-party products, such as OpenAI’s API, come with their own terms and conditions.
The following example uses GPT-4o to automatically update classifications in Encord each time the Agent is triggered. The example uses FastAPI as a server to host and run the code. The server returns an empty 200 response, indicating that the request succeeded and causing the Label Editor to refresh.
Create a directory containing the following files.
requirements.txt: Lists all the dependencies necessary for the example to work.
openai_api_agent.py: Makes the call to the OpenAI API. Ensure that you replace <classification_name> with the name of the classification you want GPT-4o to update.
data_models.py: Ensures that the response from OpenAI has the specific format required by Encord.
dependencies.py: Retrieves the information necessary for the query to be made and the label row to be updated.
main.py: Runs the program in the correct order.
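The full contents of requirements.txt are not reproduced here. A plausible minimal set, inferred from the imports this example uses, is sketched below; pin versions as appropriate for your environment. The fastapi[standard] extra provides the fastapi dev CLI used later in this guide.

```text
fastapi[standard]
encord
openai
pydantic
```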
openai_api_agent.py:

````python
import base64
import logging
from pathlib import Path

from encord.objects.classification import Classification
from encord.objects.classification_instance import ClassificationInstance
from encord.objects.ontology_labels_impl import LabelRowV2
from encord.objects.ontology_structure import OntologyStructure
from openai import OpenAI
from openai.types.chat import (
    ChatCompletionContentPartImageParam,
    ChatCompletionContentPartTextParam,
    ChatCompletionSystemMessageParam,
    ChatCompletionUserMessageParam,
)
from openai.types.chat.chat_completion_content_part_image_param import ImageURL
from pydantic import BaseModel, Field


def to_image_completion_content_part(
    image_path: Path,
) -> ChatCompletionContentPartImageParam:
    """Convert an image into a base64-encoded content part to be sent to GPT."""
    with image_path.open("rb") as image_file:
        content = base64.b64encode(image_file.read()).decode("utf-8")
    return ChatCompletionContentPartImageParam(
        image_url=ImageURL(url=f"data:image/jpeg;base64,{content}", detail="auto"),
        type="image_url",
    )


def get_ontology_classification(ontology: OntologyStructure) -> Classification:
    """
    Replace <classification_name> with the name of the classification in your
    Ontology that you want to update.
    """
    return ontology.get_child_by_title("<classification_name>", type_=Classification)


"""
Below is an example of how to define a pydantic model for extracting text.
GPT also understands if you use list types and enums. For more examples,
have a look at these notebooks:

- [GPT-4o example with videos](https://colab.research.google.com/drive/1ctV-Zpoks7PDEXisVvpP1NeocyBkWXzp?usp=sharing)
- [Gemini 1.5 Pro with advanced pydantic models](https://colab.research.google.com/drive/1jeCCZrumLnCwdVHbn-wK46xUPQQ9KCtf?usp=sharing)
"""


class DescriptionModel(BaseModel):
    description: str = Field(
        min_length=25,
        max_length=1000,
        description="A detailed description of the scene",
    )


def describe_scene(label_row: LabelRowV2, asset: Path) -> ClassificationInstance | None:
    system_instruction = f"""\
You are an image analysis expert. Your task is to extract the most relevant description of the image content provided.
You are expected to only respond in the form of the following JSON schema.

```json
{DescriptionModel.model_json_schema()}
```

Ensure that you do not wrap the object in a list. Only a single object conforming to the JSON schema is allowed.
"""
    client = OpenAI()
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            ChatCompletionSystemMessageParam(
                role="system", content=system_instruction
            ),
            ChatCompletionUserMessageParam(
                role="user",
                content=[
                    to_image_completion_content_part(asset),
                    ChatCompletionContentPartTextParam(
                        type="text",
                        text="Please build a JSON object with respect to this visual data. Follow the JSON schema provided to fill in the schema as accurately as you can.",
                    ),
                ],
            ),
        ],
        response_format={"type": "json_object"},
        max_tokens=1000,
    )
    raw_text = completion.choices[0].message.content
    if raw_text is None:
        logging.error("No response")
        raise ValueError("Missing response from GPT-4o")
    try:
        labels = DescriptionModel.model_validate_json(raw_text)
    except Exception:
        logging.error("Unable to parse text")
        logging.error(raw_text)
        return None

    ontology_classification = get_ontology_classification(label_row.ontology_structure)
    instance = ontology_classification.create_instance()
    instance.set_answer(labels.description)
    return instance
````
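The data_models.py, dependencies.py, and main.py files are not reproduced above. As a rough illustration of how the pieces fit together, the following is a minimal sketch of what main.py might look like. The FrameData field names, the `/` route, and the download_asset helper are assumptions standing in for data_models.py and dependencies.py, not the exact code from this example.

```python
# main.py -- a minimal sketch, not the verbatim file from this example.
import os
from pathlib import Path

from encord.user_client import EncordUserClient
from fastapi import FastAPI, Response
from pydantic import BaseModel

from openai_api_agent import describe_scene


class FrameData(BaseModel):
    # Assumed payload shape; align it with whatever your trigger sends
    # (this would normally live in data_models.py).
    project_hash: str
    data_hash: str
    frame: int = 0


app = FastAPI()

# Authenticate with Encord using the key file referenced by ENCORD_SSH_KEY_FILE.
user_client = EncordUserClient.create_with_ssh_private_key(
    Path(os.environ["ENCORD_SSH_KEY_FILE"]).read_text()
)


def download_asset(label_row, frame: int) -> Path:
    """Hypothetical stand-in for dependencies.py: fetch the image locally
    and return a path that describe_scene() can read."""
    raise NotImplementedError


@app.post("/")
def classify_frame(frame_data: FrameData) -> Response:
    # Look up the label row for the data unit the Agent was triggered on.
    project = user_client.get_project(frame_data.project_hash)
    label_row = project.list_label_rows_v2(data_hashes=[frame_data.data_hash])[0]
    label_row.initialise_labels()

    # Ask GPT-4o for a description and attach it as a classification instance.
    asset = download_asset(label_row, frame_data.frame)
    instance = describe_scene(label_row, asset)
    if instance is not None:
        label_row.add_classification_instance(instance)
        label_row.save()

    # An empty 200 response tells the Label Editor the Agent succeeded,
    # prompting it to refresh.
    return Response(status_code=200)
```

The empty 200 response at the end is what signals success to the Label Editor and triggers its refresh, as described at the start of this example.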
Run your Agent and server locally using the following terminal command. Ensure that you replace:
<private_key_path> with the path to your Encord SSH private key file.
<openai_api_key> with your OpenAI API key (the key itself, not a file path).

```bash
ENCORD_SSH_KEY_FILE='<private_key_path>' OPENAI_API_KEY='<openai_api_key>' fastapi dev main.py
```
Keep this terminal window open as long as you want to run your server locally. The following steps should be performed in a new terminal window.
In a new terminal window, send a request that points the server to a specific data asset in Encord (an example request is shown after this list). This step replicates the triggering of the Agent in the Encord platform. Ensure that you:
Replace <project_hash> with the hash of your Project.
Replace <data_hash> with the hash of the data unit you want to run your Agent on.
Both the <project_hash> and the <data_hash> can be found in the URL of the Label Editor, which is structured like this:
https://app.encord.com/label_editor/<project_hash>/<data_hash> or https://app.us.encord.com/label_editor/<project_hash>/<data_hash>
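For example, assuming the `/` route and payload field names from the main.py sketch above (adjust both to match your actual server), the trigger might look like:

```bash
curl -X POST 'http://localhost:8000/' \
  -H 'Content-Type: application/json' \
  -d '{"project_hash": "<project_hash>", "data_hash": "<data_hash>", "frame": 0}'
```

fastapi dev serves on port 8000 by default.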
Deploy the Agent to your server using a container image, for example a Docker image. The exact method of deployment varies depending on your choice of server.
Install the requirements from the requirements.txt file. Documentation for creating a Docker image for a FastAPI application can be found in the FastAPI documentation: https://fastapi.tiangolo.com/deployment/docker/.
Ensure that you set up environment variables to authenticate with Encord and OpenAI. Information on authenticating with OpenAI can be found in the OpenAI API reference: https://platform.openai.com/docs/api-reference/authentication.
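As a rough starting point, a Dockerfile for this server might look like the following sketch; the base image, port, and layout are assumptions to adapt, not part of the original example.

```dockerfile
FROM python:3.11-slim
WORKDIR /app

# Install dependencies first so Docker can cache this layer.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# ENCORD_SSH_KEY_FILE and OPENAI_API_KEY must be supplied at runtime,
# e.g. via `docker run -e` or your orchestrator's secret store.
CMD ["fastapi", "run", "main.py", "--port", "80"]
```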