Folder Sync
This example shows how to use the clearml-data folder sync function.
clearml-data folder sync mode is useful when users have a single point of truth (i.e. a folder) that is updated from time to time. Whenever that folder changes, users can call clearml-data sync and the changes (file additions, modifications, or removals) will be reflected in ClearML.
Creating Initial Version
Prerequisites
First, make sure that you have cloned the clearml repository. It contains all the files needed for this example.
- Open a terminal and change directory to the cloned repository's examples/reporting folder:
cd clearml/examples/reporting
Syncing a Folder
Create a dataset and sync the data_samples folder from the repo to ClearML:
clearml-data sync --project datasets --name sync_folder --folder data_samples
Expected response:
clearml-data - Dataset Management & Versioning CLI
Creating a new dataset:
New dataset created id=0d8f5f3e5ebd4f849bfb218021be1ede
Syncing dataset id 0d8f5f3e5ebd4f849bfb218021be1ede to local folder data_samples
Generating SHA2 hash for 5 files
Hash generation completed
Sync completed: 0 files removed, 5 added / modified
Finalizing dataset
Pending uploads, starting dataset upload to https://files.community.clear.ml
Uploading compressed dataset changes (5 files, total 222.17 KB) to https://files.community.clear.ml
Upload completed (222.17 KB)
2021-05-04 09:57:56,809 - clearml.Task - INFO - Waiting to finish uploads
2021-05-04 09:57:57,581 - clearml.Task - INFO - Finished uploading
Dataset closed and finalized
As the output shows, the clearml-data sync command creates the dataset, uploads the files, and then closes (finalizes) the dataset.
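The same create, sync, upload, and finalize sequence can also be sketched with the ClearML Python SDK. This is a minimal sketch rather than the CLI's actual implementation: the helper name sync_folder_to_dataset and its parameters are hypothetical, and it assumes the clearml package is installed and configured against a ClearML server.

```python
from typing import Optional, Sequence


def sync_folder_to_dataset(
    folder: str,
    project: str,
    name: str,
    parent_ids: Optional[Sequence[str]] = None,
) -> str:
    """Create a dataset version from a local folder and return its ID.

    Hypothetical helper, written for illustration only.
    """
    # Imported lazily so this module can be loaded even where clearml is not installed.
    from clearml import Dataset

    dataset = Dataset.create(
        dataset_name=name,
        dataset_project=project,
        # Optional parent dataset IDs; None creates a dataset with no parent.
        parent_datasets=list(parent_ids) if parent_ids else None,
    )
    # Record files that were added, modified, or removed in the local folder.
    dataset.sync_folder(local_path=folder)
    # Upload the pending file changes to the file server.
    dataset.upload()
    # Close the dataset so this version can no longer be changed.
    dataset.finalize()
    return dataset.id
```

Under these assumptions, calling sync_folder_to_dataset("data_samples", "datasets", "sync_folder") would roughly correspond to the command above.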
Modifying Synced Folder
Now we'll modify the folder:
- Add another line to one of the files in the data_samples folder.
- Add a file to the data_samples folder. Run echo "data data data" > data_samples/new_data.txt (this creates the file new_data.txt and puts it in the data_samples folder).
We'll repeat the process, this time creating a new dataset with the previous one as its parent, and syncing the folder again. Pass the ID of the previous dataset (printed when it was created) to --parents:
clearml-data sync --project datasets --name second_ds --parents a1ddc8b0711b4178828f6c6e6e994b7c --folder data_samples
Expected response:
clearml-data - Dataset Management & Versioning CLI
Creating a new dataset:
New dataset created id=0992dd6bae6144388e0f2ef131d9724a
Syncing dataset id 0992dd6bae6144388e0f2ef131d9724a to local folder data_samples
Generating SHA2 hash for 6 files
Hash generation completed
Sync completed: 0 files removed, 2 added / modified
Finalizing dataset
Pending uploads, starting dataset upload to https://files.community.clear.ml
Uploading compressed dataset changes (2 files, total 742 bytes) to https://files.community.clear.ml
Upload completed (742 bytes)
2021-05-04 10:05:42,353 - clearml.Task - INFO - Waiting to finish uploads
2021-05-04 10:05:43,106 - clearml.Task - INFO - Finished uploading
Dataset closed and finalized
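We can see that 2 files were added or modified, just as we expected: the file we edited and the new_data.txt file we added. All 6 files in the folder were hashed, but only the 2 changed files were uploaded.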
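To illustrate why 6 files are hashed but only 2 are reported as changed, here is a minimal standard-library sketch of hash-based change detection. It is not ClearML's implementation: SHA-256 is assumed as the SHA2 variant, the functions snapshot, diff, and demo are hypothetical, and the sample_N.txt files are placeholders for the repository's sample data.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(folder: Path) -> dict[str, str]:
    """Map each file's relative path to the SHA-256 hex digest of its content."""
    return {
        path.relative_to(folder).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def diff(parent: dict[str, str], current: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (removed, added_or_modified) relative paths between two snapshots."""
    removed = sorted(name for name in parent if name not in current)
    changed = sorted(name for name, digest in current.items() if parent.get(name) != digest)
    return removed, changed


def demo() -> tuple[int, int, int]:
    """Replay the walkthrough's two changes; return (hashed, removed, changed) counts."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp) / "data_samples"
        folder.mkdir()
        # Five placeholder files stand in for the initial dataset version.
        for i in range(5):
            (folder / f"sample_{i}.txt").write_text(f"sample {i}\n")
        parent = snapshot(folder)

        # Same two changes as in the walkthrough: edit one file, add one file.
        with (folder / "sample_0.txt").open("a") as fh:
            fh.write("another line\n")
        (folder / "new_data.txt").write_text("data data data\n")

        current = snapshot(folder)
        removed, changed = diff(parent, current)
        return len(current), len(removed), len(changed)


if __name__ == "__main__":
    hashed, removed, changed = demo()
    print(f"Hashed {hashed} files: {removed} removed, {changed} added / modified")
```

Every file in the folder has to be hashed to find out what changed, but only the files whose digest differs from the parent snapshot count as added or modified.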