This documentation is only relevant for customers with early access to Encord Files. Contact support@encord.com to learn more and gain access. If you do not have access to Files, see our documentation here to learn about uploading cloud data. Data from your private cloud can be uploaded to Files, or directly to a Dataset.

At least one data integration is required to upload cloud data to Encord. Encord can integrate with the following cloud service providers: AWS S3, GCP Cloud Storage, Azure Blob Storage, and Open Telekom Cloud OSS.


Upload cloud data to Files

We recommend uploading smaller batches of data: limit uploads to 100 videos and up to 1000 images at a time. Familiarize yourself with our limits and best practices for data import before uploading data to Encord.
  1. Navigate to the Files section of Index in the Encord platform.
  2. Click into a folder.
  3. Click + Upload files. A dialog appears.
  4. Click Import from cloud data.
We recommend turning on the Ignore individual file errors feature. This ensures that individual file errors do not cause the whole upload process to be aborted.
  5. Click Add JSON or CSV files to add a JSON or CSV file specifying the cloud data that is to be added. A minimal example is shown below; see the JSON Format and CSV Format sections for full details.
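For reference, a minimal JSON specification for this step might look like the following sketch, which uses the same placeholder URL convention as the examples later on this page:

{
  "videos": [
    {
      "objectUrl": "<object url_1>"
    }
  ],
  "skip_duplicate_urls": true
}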

Upload cloud data to Datasets

We recommend uploading files in batches not exceeding 2GB, to ensure that the upload does not exceed 3 hours.
  1. Create a Dataset.

  2. Select the Dataset you want to upload data to.

  3. Click +Upload files.

  4. Select a folder to store the files in, or create a new folder.

  5. Select the Import from private cloud tab and select the integration you want to use.

  6. Click Add JSON or CSV files to upload a JSON or CSV file specifying the cloud data that is to be added to the Dataset.
We recommend enabling the Ignore individual file errors toggle. This ensures that the entire upload does not fail if a single file cannot be added, for example because it is not supported by Encord.
  7. Click Import to add your cloud data to the Dataset.
The data is fetched from your cloud storage and processed asynchronously. This involves fetching the appropriate metadata and other file information to help us render the files correctly and to check for any framerate inconsistencies. We do not store your files in any way.

Check upload status

You can check the progress of the processing job by clicking the bell icon in the top right.
A spinning progress indicator shows that the processing job is still in progress.

  • If successful, the processing completes with a green tick icon.
  • If unsuccessful, the processing completes with a red cross icon.

If this is the case, please check that your provider permissions have been set correctly, that the object data format is supported, and that the JSON or CSV file is correctly formatted.

Check which files failed to upload by clicking the Export icon to download a CSV log file. Every row in the CSV corresponds to a file that failed to upload.

You will only see failed uploads if the Ignore individual file errors toggle wasn’t enabled when uploading your data.

Specify cloud data

To upload private cloud data, you must supply a JSON or CSV file specifying the URLs of all the files you want to add. Click Add JSON or CSV files when uploading cloud data to attach this file.

JSON Format

The JSON file format is a JSON object whose top-level keys specify the type of data, and whose values contain the object URLs of the files you want to upload to Encord. You can add one data type at a time, or combine multiple data types in one JSON file.

The supported top-level keys are: videos, audio, image_groups, images, and dicom_series. The details for each data format are given in the sections below.

An optional top-level skip_duplicate_urls key can be included and set to true, ensuring that all object URLs that exactly match existing files in the Dataset are skipped.

Encord enforces the following upload limits for each JSON file used for file uploads:

  • Up to 1 million URLs
  • A maximum of 500,000 items (e.g. images, image groups, videos, DICOMs)
  • URLs can be up to 16 KB in size

Optimal upload chunking can vary depending on your data type and the amount of associated metadata. For tailored recommendations, contact Encord support. We recommend starting with smaller uploads and gradually increasing the size based on how quickly jobs are processed. Generally, smaller chunks result in faster data reflection within the platform.
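
As an illustrative sketch (using the same placeholder URL convention as the examples below), a JSON file that combines videos and images under their respective top-level keys and skips duplicate URLs could look like this:

{
  "videos": [
    {
      "objectUrl": "<object url_1>"
    }
  ],
  "images": [
    {
      "objectUrl": "<object url_2>"
    },
    {
      "objectUrl": "<object url_3>"
    }
  ],
  "skip_duplicate_urls": true
}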


Client metadata & skip duplicate URLs

You can optionally add custom client metadata per data item in the clientMetadata field (the examples below show how this is done). Client metadata is separate from video_metadata, and is intended as an arbitrary store of data you want to associate with a file.

We enforce a 10MB limit on the client metadata for each data item. Internally, client metadata is stored as a PostgreSQL jsonb type. Read the relevant PostgreSQL documentation about the jsonb type and its behaviors. For example, the jsonb type does not preserve key order or duplicate keys.

Add the "skip_duplicate_urls": true flag at the top level to make the uploads idempotent. Skipping URLs in the Dataset can help speed up large upload operations. Since previously processed assets do not have to be uploaded again, you can simply retry the failed operation without editing the upload specification file. The flag’s default value isfalse.

These features are currently only supported for JSON uploads.
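
For illustration, a video item carrying arbitrary nested client metadata might look like the sketch below; the metadata keys shown here are hypothetical:

{
  "videos": [
    {
      "objectUrl": "<object url_1>",
      "clientMetadata": {
        "capture_site": "warehouse-3",
        "camera": {"id": "cam-07", "fps": 30},
        "tags": ["night", "rainy"]
      }
    }
  ],
  "skip_duplicate_urls": true
}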

Update client metadata

To update client metadata:

  1. Create an upload JSON file with the updated client metadata. Include the "skip_duplicate_urls": true and "upsert_metadata": true flags.
  • Client metadata updates require "skip_duplicate_urls": true to function. Updates do not occur if "skip_duplicate_urls": false.
  • Only client metadata for pre-existing files is updated. Any new files present in the JSON are uploaded as normal.
Update client metadata example
{
  "videos": [
    {
      "objectUrl": "<object url_1>"
    },
    {
      "objectUrl": "<object url_2>",
      "title": "my-custom-video-title.mp4",
      "clientMetadata": {"optional": "metadata"}
    }
  ],
  "skip_duplicate_urls": true,
  "upsert_metadata": true
}
  2. Start a new file upload to Encord using the new JSON file.

When using a Multi-Region Access Point

When using a Multi-Region Access Point for your AWS S3 buckets, specify objects using the ARN of the Multi-Region Access Point followed by the object name. The following example demonstrates how to specify video files from a Multi-Region Access Point.

{
  "videos": [
    {
      "objectUrl": "Multi-Region-Access-Point-ARN + <object name_1>"
    },
    {
      "objectUrl": "Multi-Region-Access-Point-ARN + <object name_2>",
      "title": "my-custom-video-title.mp4",
      "clientMetadata": {"optional": "metadata"}
    }
  ],
  "skip_duplicate_urls": true
}
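
For example, assuming a hypothetical Multi-Region Access Point in account 111122223333 with the alias mfzwi23gnjvgw.mrap, the first entry above might be specified as follows (the ARN and object key are illustrative only):

{
  "videos": [
    {
      "objectUrl": "arn:aws:s3::111122223333:accesspoint/mfzwi23gnjvgw.mrap/videos/video_1.mp4"
    }
  ],
  "skip_duplicate_urls": true
}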

CSV Format

In the CSV file format, the column headers specify which type of data is being uploaded. You can add a single data type at a time, or combine multiple data types in a single CSV file.

Details for each data format are given in the sections below.

Encord supports up to 10,000 entries for upload in the CSV file.
  • Object URLs can’t contain whitespace.
  • For backwards compatibility reasons, a single column CSV is supported. A file with a single ObjectUrl column is interpreted as a request for video upload (a minimal sketch is shown below). If your objects are of a different type (for example, images), this error displays: “Expected a video, got a file of type XXX”.
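
A minimal single-column CSV of this kind might look like the following sketch (placeholder URLs, containing no whitespace):

ObjectUrl
<object_url_1>
<object_url_2>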

Help Scripts and Examples

Use the following examples and helper scripts to quickly learn how to create JSON and CSV files formatted for the dataset creation process, constructing the URLs from the specified paths in your private storage.