Upload Cloud Data
At least one data integration is required to upload cloud data to Encord. Encord can integrate with the following cloud service providers:
Any files you upload to Encord must be stored in folders. Click here to learn how to create a folder to store your files.
Import Cloud Data to Files
STEP 1: Create a JSON or CSV File for Import
Before importing your cloud data to Encord you must first create a JSON or CSV file specifying the files you want to import.
JSON Format
We provide helpful scripts and examples that automatically generate JSON and CSV files for all the files in a folder or bucket within your cloud storage. This makes importing large datasets easier and more efficient.
The JSON file format is a JSON object with top-level keys specifying the type of data and object URLs of the files you want to upload to Encord. You can add one data type at a time, or combine multiple data types in one JSON.
The supported top-level keys are: videos, audio, image_groups, images, and dicom_series. The details for each data format are given in the sections below.
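As a rough illustration of the structure described above, the following sketch builds an upload specification with Python's standard library. The per-item field name objectUrl, the helper name build_upload_spec, and the example URLs are assumptions made for illustration. The sketch only covers the data types that can be described by a single URL per item; image groups and DICOM series need additional structure that is not shown here.

```python
import json

# Data types that this sketch treats as "one URL per item".
SIMPLE_TYPES = {"videos", "audio", "images"}


def build_upload_spec(urls_by_type, skip_duplicate_urls=False):
    """Return a dict with one top-level key per data type.

    "objectUrl" is an assumed per-item field name, not confirmed by this page.
    """
    spec = {}
    for data_type, urls in urls_by_type.items():
        if data_type not in SIMPLE_TYPES:
            raise ValueError(f"This sketch does not handle: {data_type}")
        spec[data_type] = [{"objectUrl": url} for url in urls]
    if skip_duplicate_urls:
        # Top-level flag; its default value is false when omitted.
        spec["skip_duplicate_urls"] = True
    return spec


# Hypothetical bucket and object names.
spec = build_upload_spec(
    {
        "videos": ["https://example-bucket.s3.amazonaws.com/clips/a.mp4"],
        "images": ["https://example-bucket.s3.amazonaws.com/frames/b.jpg"],
    },
    skip_duplicate_urls=True,
)
print(json.dumps(spec, indent=2))
```

Combining several data types in one dictionary mirrors the option of combining multiple data types in one JSON file.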
See our tips for increasing the speed of file uploads here.
Add the "skip_duplicate_urls": true flag at the top level to make the uploads idempotent. Skipping URLs can help speed up large upload operations. Since previously processed assets do not have to be uploaded again, you can simply retry the failed operation without editing the upload specification file. The flag’s default value is false.
Encord enforces the following upload limits for each JSON file used for file uploads:
- Up to 1 million URLs
- A maximum of 500,000 items (e.g. images, image groups, videos, DICOMs)
- URLs can be up to 16 KB in size
Optimal upload chunking can vary depending on your data type and the amount of associated metadata. For tailored recommendations, contact Encord support. We recommend starting with smaller uploads and gradually increasing the size based on how quickly jobs are processed. Generally, smaller chunks result in faster data reflection within the platform.
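One way to start with smaller uploads is to split a long list of URLs into chunks and write one JSON file per chunk. The sketch below is only an illustration: the helper name chunk_urls and the starting chunk size of 1,000 are arbitrary assumptions, and the 16 KB URL limit is assumed to mean 16 × 1024 bytes of UTF-8.

```python
MAX_URL_BYTES = 16 * 1024  # assumed interpretation of the 16 KB URL limit


def chunk_urls(urls, max_items=1000):
    """Split urls into lists of at most max_items entries each."""
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    for url in urls:
        if len(url.encode("utf-8")) > MAX_URL_BYTES:
            raise ValueError(f"URL exceeds {MAX_URL_BYTES} bytes: {url[:60]}")
    return [urls[i:i + max_items] for i in range(0, len(urls), max_items)]


# Hypothetical URLs, used only to show the chunk sizes.
all_urls = [f"https://example-bucket.s3.amazonaws.com/img/{n}.jpg" for n in range(2500)]
chunks = chunk_urls(all_urls, max_items=1000)
print([len(chunk) for chunk in chunks])
```

Each chunk can then be written to its own JSON file, and the chunk size can be raised once you see how quickly jobs are processed.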
CSV Format
In the CSV file format, the column headers specify which type of data is being uploaded. You can add a single data type at a time, or combine multiple data types in a single CSV file.
Details for each data format are given in the sections below.
- Object URLs can’t contain whitespace.
- For backwards compatibility reasons, a single column CSV is supported. A file with the single ObjectUrl column is interpreted as a request for video upload. If your objects are of a different type (for example, images), this error displays: “Expected a video, got a file of type XXX”.
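The single-column form described above can be produced with Python's csv module. This is a minimal sketch: the helper name write_object_url_csv and the example URLs are hypothetical, and the whitespace check simply enforces the rule that object URLs cannot contain whitespace before anything is written.

```python
import csv
import io


def write_object_url_csv(urls, handle):
    """Write a single-column CSV with the ObjectUrl header (video upload)."""
    # Validate everything first so a bad URL never leaves a partial file.
    for url in urls:
        if any(ch.isspace() for ch in url):
            raise ValueError(f"Object URL contains whitespace: {url!r}")
    writer = csv.writer(handle)
    writer.writerow(["ObjectUrl"])
    for url in urls:
        writer.writerow([url])


# An in-memory buffer stands in for a real file opened with newline="".
buffer = io.StringIO()
write_object_url_csv(
    [
        "https://example-bucket.s3.amazonaws.com/clips/a.mp4",
        "https://example-bucket.s3.amazonaws.com/clips/b.mp4",
    ],
    buffer,
)
csv_text = buffer.getvalue()
print(csv_text)
```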
STEP 2: Import Your Cloud Data
- Navigate to the Files section of Index in the Encord platform.
- Click into a Folder.
- Click + Upload files. A dialog appears.
- Click Import from cloud data.
Custom Metadata
Custom metadata can only be added through JSON uploads in the Encord Platform or via the Encord SDK.
Custom metadata, also known as client metadata, is supplementary information you can add to all data imported into Encord. It is provided in the form of a Python dictionary, as shown in examples. Client metadata serves several key functions:
- Filtering and sorting in Index and Active.
- Creating custom Label Editor layouts based on metadata.
You can optionally add some custom metadata per data item in the clientMetadata field (examples show how this is done) of your JSON file.
We enforce a 10 MB limit on the custom metadata for each data item. Internally, we store custom metadata as a PostgreSQL jsonb type. Read the relevant PostgreSQL documentation about the jsonb type and its behaviors. For example, the jsonb type does not preserve key order or duplicate keys.
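The sketch below shows one way to attach clientMetadata to an item and check its size locally before upload. The field name objectUrl, the helper name video_item, and the way the size is measured (UTF-8 bytes of the serialized JSON, with 10 MB read as 10 × 1024 × 1024 bytes) are assumptions; Encord may measure the limit differently. The last lines use Python's json module to illustrate the duplicate-key behavior: like jsonb, json.loads keeps only the last value for a repeated key.

```python
import json

MAX_METADATA_BYTES = 10 * 1024 * 1024  # assumed reading of the 10 MB limit


def video_item(object_url, client_metadata):
    """Return one upload item with custom metadata attached."""
    encoded = json.dumps(client_metadata).encode("utf-8")
    if len(encoded) > MAX_METADATA_BYTES:
        raise ValueError("clientMetadata exceeds the assumed 10 MB limit")
    # "objectUrl" is an assumed field name; "clientMetadata" is from this page.
    return {"objectUrl": object_url, "clientMetadata": client_metadata}


item = video_item(
    "https://example-bucket.s3.amazonaws.com/clips/site.mp4",  # hypothetical URL
    {"site_location": "Algiers", "project_phase": "foundation"},
)
print(json.dumps(item, indent=2))

# Duplicate keys collapse to the last value, so avoid them in metadata.
collapsed = json.loads('{"phase": "foundation", "phase": "framing"}')
print(collapsed)
```

Because key order is not preserved either, do not rely on the order of keys inside clientMetadata to carry meaning.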
Metadata Schema
Metadata schemas, including custom embeddings, can only be imported through the Encord SDK.
Based on your Data Discoverability Strategy, you need to create a metadata schema. The schema provides a method of organization for your custom metadata. Encord supports:
- Scalars: Methods for filtering.
- Enums: Methods with options for filtering.
- Embeddings: Method for embedding plot visualization, similarity search, and natural language search.
Custom metadata
Custom metadata refers to any additional information you attach to files, allowing for better data curation and management based on your specific needs. It can include any details relevant to your workflow, helping you organize, filter, and retrieve data more efficiently. For example, for a video of a construction site, custom metadata could include fields like "site_location": "Algiers", "project_phase": "foundation", or "weather_conditions": "sunny". This enables more precise tracking and management of your data.
Before importing any files with custom metadata to Encord, we recommend that you import a metadata schema. Encord uses metadata schemas to validate custom metadata uploaded to Encord and to instruct Index and Active how to display your metadata.
video.description, while team B could use audio.description. Another example could be TeamName.MetadataKey. This approach maintains clarity and avoids key collisions across departments.
Metadata schema table
Use add_scalar to add a scalar key to your metadata schema.
Scalar Key | Description | Display Benefits |
---|---|---|
boolean | Binary data type with values “true” or “false”. | Filtering by binary values |
datetime | ISO 8601 formatted date and time. | Filtering by time and date |
number | Numeric data type supporting float values. | Filtering by numeric values |
uuid | Customer specified unique identifier for a data unit. | Filtering by customer specified unique identifier |
varchar | Textual data type. Formerly string. string can be used as an alias for varchar, but we STRONGLY RECOMMEND that you use varchar. | Filtering by string. |
text | Text data with unlimited length (example: transcripts for audio). Formerly long_string. long_string can be used as an alias for text, but we STRONGLY RECOMMEND that you use text. | Storing and filtering large amounts of text. |
Use add_enum and add_enum_options to add an enum and enum options to your metadata schema.
Key | Description | Display Benefits |
---|---|---|
enum | Enumerated type with predefined set of values. | Facilitates categorical filtering and data validation |
Use add_embedding to add an embedding to your metadata schema.
Key | Description | Display Benefits |
---|---|---|
embedding | Embedding size: 1 to 4096 for Index, 1 to 2000 for Active. | Filtering by embeddings, similarity search, 2D scatter plot visualization (Coming Soon) |
Incorrectly specifying a data type in the schema can cause errors when filtering your data in Index or Active. If you encounter errors while filtering, verify your schema is correct. If your schema has errors, correct the errors, re-import the schema, and then re-sync your Active Project.
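Before uploading, it can help to check locally that metadata values match the scalar types you intend to declare. The sketch below is a simplified, local pre-check and not Encord's validation logic: the schema is represented as a plain dictionary of key to scalar type name, the helper name find_type_errors is hypothetical, and enums and embeddings are not covered.

```python
import uuid
from datetime import datetime


def _is_datetime(value):
    # datetime.fromisoformat accepts only a subset of ISO 8601 on older
    # Python versions (for example, no trailing "Z" before 3.11).
    if not isinstance(value, str):
        return False
    try:
        datetime.fromisoformat(value)
    except ValueError:
        return False
    return True


def _is_uuid(value):
    if not isinstance(value, str):
        return False
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


CHECKS = {
    "boolean": lambda v: isinstance(v, bool),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "varchar": lambda v: isinstance(v, str),
    "text": lambda v: isinstance(v, str),
    "datetime": _is_datetime,
    "uuid": _is_uuid,
}


def find_type_errors(schema, metadata):
    """Return a message for every value that does not match its declared type."""
    errors = []
    for key, value in metadata.items():
        check = CHECKS.get(schema.get(key))
        if check is None:
            continue  # keys without a known scalar type are not checked here
        if not check(value):
            errors.append(f"{key}: expected {schema[key]}, got {value!r}")
    return errors


schema = {"is_night": "boolean", "captured_at": "datetime", "speed": "number"}
metadata = {"is_night": "true", "captured_at": "2024-05-01T12:30:00", "speed": 12.5}
errors = find_type_errors(schema, metadata)
print(errors)
```

In this example the string "true" is flagged because the key is declared as boolean, which is the kind of mismatch that can surface later as a filtering error.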
Import your metadata schema to Encord
Verify your schema
After importing your schema to Encord, we recommend that you verify that the import was successful. Run the following code to verify that your metadata schema was imported and that the schema is correct.
Update Custom Metadata (JSON)
When updating custom metadata using a JSON file, you MUST specify "skip_duplicate_urls": true and "upsert_metadata": true.
Specifying the "skip_duplicate_urls": true and "upsert_metadata": true flags in the JSON file means the import does the following:
- New files (and the custom metadata for those files) import into Encord.
- Existing files have their existing custom metadata overwritten with the custom metadata specified in the JSON file.
To update custom metadata with a JSON file:
- Create an upload JSON file with the updated custom metadata. Include the "skip_duplicate_urls": true and "upsert_metadata": true flags.
  - Custom metadata updates require "skip_duplicate_urls": true to function. It does not work if "skip_duplicate_urls": false.
  - Only custom metadata for pre-existing files is updated. Any new files present in the JSON are uploaded.
- Start a new file upload to Encord using the new JSON file.
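A JSON file for a metadata update could be generated along these lines. This is a sketch under assumptions: the per-item field name objectUrl, the helper name build_metadata_update, and the example URL are illustrative, and the items are placed under the videos key purely as an example.

```python
import json


def build_metadata_update(updates):
    """Build an upload spec that overwrites custom metadata.

    updates maps each object URL to its new custom metadata dictionary.
    """
    return {
        "videos": [
            {"objectUrl": url, "clientMetadata": metadata}
            for url, metadata in updates.items()
        ],
        # Both flags are required for custom metadata updates.
        "skip_duplicate_urls": True,
        "upsert_metadata": True,
    }


update_spec = build_metadata_update(
    {
        # Hypothetical URL of a file that already exists in Encord.
        "https://example-bucket.s3.amazonaws.com/clips/site.mp4": {
            "project_phase": "framing",
            "weather_conditions": "sunny",
        }
    }
)
update_json = json.dumps(update_spec, indent=2)
print(update_json)
```

json.dumps serializes Python's True as true, so the flags appear in the file in the form the import expects.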
Custom Embeddings
Metadata schemas, including custom embeddings, can only be imported through the Encord SDK.
Encord enables the use of custom embeddings for images, image sequences, image groups, and individual video frames.
To learn how to use custom embeddings in Encord, see our documentation here.
Step 1: Create a New Embedding Type
A key is required in your custom metadata schema for your embeddings. You can use any string as the key for your embeddings. We strongly recommend that you use a string that is meaningful.
If you do not include a key in your metadata schema, your imported embeddings are treated as strings.
Use add_embedding to add an embedding to your metadata schema.
Key | Description | Display Benefits |
---|---|---|
embedding | Embedding size: 1 to 4096 for Index, 1 to 2000 for Active. | Filtering by embeddings, similarity search, 2D scatter plot visualization (Coming Soon) |
Step 2: Upload Embeddings
With the key in the custom metadata schema ready, we can now import our embeddings.
Custom embedding sizes are flexible and can be set anywhere between 1 and 4096.
You can import embeddings after you have imported your data or during your data import.
If config is not specified, the sampling_rate is 1 frame per second, and the keyframe_mode is frame.
A sampling_rate of 0 only imports the first frame and all keyframes of your video into Index.
How To Increase File Upload Speed
To speed up file imports into Encord, you can include metadata for each file in the upload JSON. This metadata is used directly without additional validation and is not stored on our servers. Ensuring accuracy in the metadata you provide is essential to maintain precise labels.
The metadata referenced here is distinct from clientMetadata and serves a different purpose. Documentation for clientMetadata can be found here.
- imageMetadata for images:
  - mimeType: MIME type of the image (e.g., image/jpeg).
  - fileSize: Size of the file in bytes.
  - width: Width of the image in pixels.
  - height: Height of the image in pixels.
- audioMetadata for audio files:
  - duration_seconds (float): Audio duration in seconds.
  - file_size (int): Size of the audio file in bytes.
  - mime_type (str): MIME type (e.g., audio/mpeg, audio/wav).
  - sample_rate (int): Sample rate in Hz.
  - bit_depth (int): Size of each sample in bits.
  - codec (str): Codec used (e.g., mp3, pcm).
  - num_channels (int): Number of audio channels.
- videoMetadata for videos:
  - fps: Frames per second.
  - duration: Duration in seconds.
  - width / height: Dimensions in pixels.
  - file_size: File size in bytes.
  - mime_type: File type (MIME standard).
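As an illustration of the image case, the sketch below attaches imageMetadata to an item. It assumes that imageMetadata sits next to an objectUrl field inside each item; the helper name image_item and the example values are hypothetical. Because this metadata is used without additional validation, the file size and dimensions should come from a trusted source such as your own asset inventory rather than from a guess.

```python
import mimetypes


def image_item(object_url, file_size, width, height):
    """Return an image item carrying file metadata to speed up import."""
    # Guess the MIME type from the file extension in the URL.
    mime_type, _ = mimetypes.guess_type(object_url)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type for {object_url}")
    return {
        "objectUrl": object_url,  # assumed field name
        "imageMetadata": {
            "mimeType": mime_type,
            "fileSize": file_size,  # bytes
            "width": width,         # pixels
            "height": height,       # pixels
        },
    }


# Hypothetical URL and values.
img = image_item(
    "https://example-bucket.s3.amazonaws.com/frames/b.jpg",
    file_size=245_760,
    width=1920,
    height=1080,
)
print(img)
```

Note that the image fields use camelCase (mimeType, fileSize) while the audio and video fields listed above use snake_case (mime_type, file_size), so each data type needs its own key names.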
Check Data Upload Status
You can check the progress of the processing job by clicking the bell icon in the top right corner of the Encord app.
- A spinning progress indicator shows that the processing job is still in progress.
- If successful, the processing completes with a green tick icon.
- If unsuccessful, there is a red cross icon, as seen below.
If the upload is unsuccessful, ensure that:
- Your provider permissions are set correctly.
- The object data format is supported.
- The upload JSON or CSV file is correctly formatted.
Check which files failed to upload by clicking the Export icon to download a CSV log file. Every row in the CSV corresponds to a file which failed to be uploaded.
Helpful Scripts and Examples
Use the following examples and helpful scripts to quickly learn how to create JSON and CSV files formatted for the dataset creation process, by constructing the URLs from the specified path in your private storage.