The following script prints the storage locations of all files in a Dataset. This includes the cloud storage locations for private cloud data and the Encord storage locations for local data in the Dataset. Knowing where your files are stored helps you cross-verify that all data from a cloud bucket has been added to a Dataset.
To learn how to view the storage locations of all files in a Project, see our documentation here.
In the following script, ensure that you:
Replace <private_key_path> with the path to your private key.
Replace <dataset_hash> with the hash of the Dataset whose file storage locations you want to print.
# Import dependencies
from encord import EncordUserClient, Project, Dataset
from encord.objects.project import ProjectDataset
from encord.orm.dataset import DatasetAccessSettings

# Instantiate client
user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)

# Print URLs of all files in the Dataset
dataset_level_file_links = []
dataset: Dataset = user_client.get_dataset("<dataset_hash>")
for data in dataset.list_data_rows():
    dataset_level_file_links.append(data.file_link)
print(dataset_level_file_links)
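To cross-verify a bucket against the Dataset, you can compare the printed file links with a listing of the files you expect. The following is a minimal sketch, not part of the Encord SDK: `find_missing_files` is a hypothetical helper, the URLs are made-up placeholders, and it assumes your bucket listing uses the same URL format as the values returned in `file_link`.

```python
def find_missing_files(expected_urls, dataset_file_links):
    """Return the expected URLs that do not appear among the Dataset's file links."""
    # Set difference ignores ordering and duplicates in either list.
    return sorted(set(expected_urls) - set(dataset_file_links))


# Hypothetical bucket listing (placeholder URLs for illustration only).
expected_urls = [
    "https://example-bucket.example.com/images/001.jpg",
    "https://example-bucket.example.com/images/002.jpg",
    "https://example-bucket.example.com/images/003.jpg",
]

# Stand-in for the list built by the script above.
dataset_level_file_links = [
    "https://example-bucket.example.com/images/001.jpg",
    "https://example-bucket.example.com/images/003.jpg",
]

missing = find_missing_files(expected_urls, dataset_level_file_links)
print(missing)
```

An empty list means every expected file is present in the Dataset. Any URL that is printed has not been added yet, or is stored under a different link format than the one you listed.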