---
title: Dataset Management Using CIFAR10
---

In this tutorial, we are going to use a CIFAR example, manage the CIFAR dataset with `clearml-data`, and then replace our
current dataset read method with one that interfaces with `clearml-data`.

## Creating the Dataset

### Downloading the Data
Before we can register the CIFAR dataset with `clearml-data`, we need to obtain a local copy of it.

Execute this Python script to download the data:
```python
from clearml import StorageManager
# We're using the StorageManager to download the data for us!
# It's a neat little utility that helps us download
# files we need and cache them :)

manager = StorageManager()
dataset_path = manager.get_local_copy(remote_url="https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz")
# make sure to copy the printed value
print("COPY THIS DATASET PATH: {}".format(dataset_path))
```

Expected response:
```bash
COPY THIS DATASET PATH: ~/.clearml/cache/storage_manager/global/f2751d3a22ccb78db0e07874912b5c43.cifar-10-python_artifacts_archive_None
```
The script prints the path to the downloaded data. It'll be needed later on.

### Creating the Dataset
To create the dataset, execute the following in a CLI:
```
clearml-data create --project cifar --name cifar_dataset
```

Expected response:
```
clearml-data - Dataset Management & Versioning CLI
Creating a new dataset:
New dataset created id=*********
```
Where \*\*\*\*\*\*\*\*\* is the dataset ID.

## Adding Files
Add the files we just downloaded to the dataset:
```
clearml-data add --files <dataset_path>
```

where `dataset_path` is the path that was printed earlier, which denotes the location of the downloaded dataset.

:::note
There's no need to specify a *dataset_id*, as the *clearml-data* session stores it.
:::

## Finalizing the Dataset
Run the close command to upload the files (by default, they are uploaded to the file server):<br/>
```
clearml-data close
```

![image](../../../img/examples_data_management_cifar_dataset.png)

## Using the Dataset
Now that we have a new dataset registered, we can consume it.

We take [this script](https://github.com/allegroai/clearml/blob/master/examples/frameworks/ignite/cifar_ignite.py) as a base to train on the CIFAR dataset.

We replace the file load part with ClearML's Dataset object. The Dataset's `get_local_copy()` method will return a path
to the cached, downloaded dataset.
Then we provide the path to PyTorch's dataset object.

```python
dataset_id = "ee1c35f60f384e65bc800f42f0aca5ec"

from clearml import Dataset
dataset_path = Dataset.get(dataset_id=dataset_id).get_local_copy()

trainset = datasets.CIFAR10(root=dataset_path,
                            train=True,
                            download=False,
                            transform=transform)
```
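
If you need a standalone, modifiable copy of the data rather than the shared cache, the full example below mentions `get_mutable_local_copy()`. A minimal sketch (the target folder name here is just an illustration):

```python
from clearml import Dataset

# Download a standalone copy of the dataset into a folder you own,
# instead of reading from the shared cache ("./cifar_data" is an arbitrary example path)
dataset_path = Dataset.get(dataset_id=dataset_id).get_mutable_local_copy("./cifar_data")
```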

<details className="cml-expansion-panel info">
<summary className="cml-expansion-panel-summary">Full example code using the dataset:</summary>
<div className="cml-expansion-panel-content">

```python
# These are the obligatory imports
from pathlib import Path

import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from ignite.contrib.handlers import TensorboardLogger
from ignite.engine import Events, create_supervised_trainer, create_supervised_evaluator
from ignite.handlers import global_step_from_engine
from ignite.metrics import Accuracy, Loss, Recall
from ignite.utils import setup_logger
from torch.utils.tensorboard import SummaryWriter
from tqdm import tqdm

from clearml import Task, StorageManager

# Connecting ClearML with the current process,
# from here on everything is logged automatically
task = Task.init(project_name='Image Example', task_name='image classification CIFAR10')
params = {'number_of_epochs': 20, 'batch_size': 64, 'dropout': 0.25, 'base_lr': 0.001, 'momentum': 0.9, 'loss_report': 100}
params = task.connect(params)  # enabling configuration override by clearml
print(params)  # printing actual configuration (after override in remote mode)

# This is our original data retrieval code. It uses StorageManager to just download and cache our dataset.
'''
manager = StorageManager()

dataset_path = Path(manager.get_local_copy(remote_url="https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"))
'''

# Let's now modify it to utilize the new dataset API. You'll need to copy the created dataset id
# to the next variable

dataset_id = "ee1c35f60f384e65bc800f42f0aca5ec"

# The below gets the dataset and stores it in the cache. If you want to download the dataset regardless of whether
# it's in the cache, use Dataset.get(dataset_id).get_mutable_local_copy(path_to_download)
from clearml import Dataset
dataset_path = Dataset.get(dataset_id=dataset_id).get_local_copy()

# Dataset and Dataloader initializations
transform = transforms.Compose([transforms.ToTensor()])

trainset = datasets.CIFAR10(root=dataset_path,
                            train=True,
                            download=False,
                            transform=transform)
trainloader = torch.utils.data.DataLoader(trainset,
                                          batch_size=params.get('batch_size', 4),
                                          shuffle=True,
                                          num_workers=10)

testset = datasets.CIFAR10(root=dataset_path,
                           train=False,
                           download=False,
                           transform=transform)
testloader = torch.utils.data.DataLoader(testset,
                                         batch_size=params.get('batch_size', 4),
                                         shuffle=False,
                                         num_workers=10)

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

tb_logger = TensorboardLogger(log_dir="cifar-output")


# Helper function to store predictions and scores using matplotlib
def predictions_gt_images_handler(engine, logger, *args, **kwargs):
    x, _ = engine.state.batch
    y_pred, y = engine.state.output

    num_x = num_y = 4
    le = num_x * num_y
    fig = plt.figure(figsize=(20, 20))
    trans = transforms.ToPILImage()
    for idx in range(le):
        preds = torch.argmax(F.softmax(y_pred[idx], dim=0))
        probs = torch.max(F.softmax(y_pred[idx], dim=0))
        ax = fig.add_subplot(num_x, num_y, idx + 1, xticks=[], yticks=[])
        ax.imshow(trans(x[idx]))
        ax.set_title("{0} {1:.1f}% (label: {2})".format(
            classes[preds],
            probs * 100,
            classes[y[idx]]),
            color=("green" if preds == y[idx] else "red")
        )
    logger.writer.add_figure('predictions vs actuals', figure=fig, global_step=engine.state.epoch)


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(16 * 6 * 6, 120)
        self.fc2 = nn.Linear(120, 84)
        self.dropout = nn.Dropout(p=params.get('dropout', 0.25))
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 6 * 6)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(self.dropout(x))
        return x


# Training
def run(epochs, lr, momentum, log_interval):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    net = Net().to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=lr, momentum=momentum)

    trainer = create_supervised_trainer(net, optimizer, criterion, device=device)
    trainer.logger = setup_logger("trainer")

    val_metrics = {"accuracy": Accuracy(), "loss": Loss(criterion), "recall": Recall()}
    evaluator = create_supervised_evaluator(net, metrics=val_metrics, device=device)
    evaluator.logger = setup_logger("evaluator")

    # Attach handler to plot trainer's loss every 100 iterations
    tb_logger.attach_output_handler(
        trainer,
        event_name=Events.ITERATION_COMPLETED(every=params.get('loss_report')),
        tag="training",
        output_transform=lambda loss: {"loss": loss},
    )

    # Attach handler to dump evaluator's metrics every epoch completed
    for tag, evaluator in [("training", trainer), ("validation", evaluator)]:
        tb_logger.attach_output_handler(
            evaluator,
            event_name=Events.EPOCH_COMPLETED,
            tag=tag,
            metric_names="all",
            global_step_transform=global_step_from_engine(trainer),
        )

    # Attach function to build debug images and report every epoch end
    tb_logger.attach(
        evaluator,
        log_handler=predictions_gt_images_handler,
        event_name=Events.EPOCH_COMPLETED(once=1),
    )

    desc = "ITERATION - loss: {:.2f}"
    pbar = tqdm(initial=0, leave=False, total=len(trainloader), desc=desc.format(0))

    @trainer.on(Events.ITERATION_COMPLETED(every=log_interval))
    def log_training_loss(engine):
        pbar.desc = desc.format(engine.state.output)
        pbar.update(log_interval)

    @trainer.on(Events.EPOCH_COMPLETED)
    def log_training_results(engine):
        pbar.refresh()
        evaluator.run(trainloader)
        metrics = evaluator.state.metrics
        avg_accuracy = metrics["accuracy"]
        avg_nll = metrics["loss"]
        tqdm.write(
            "Training Results - Epoch: {} Avg accuracy: {:.2f} Avg loss: {:.2f}".format(
                engine.state.epoch, avg_accuracy, avg_nll
            )
        )

    @trainer.on(Events.EPOCH_COMPLETED)
    def log_validation_results(engine):
        evaluator.run(testloader)
        metrics = evaluator.state.metrics
        avg_accuracy = metrics["accuracy"]
        avg_nll = metrics["loss"]
        tqdm.write(
            "Validation Results - Epoch: {} Avg accuracy: {:.2f} Avg loss: {:.2f}".format(
                engine.state.epoch, avg_accuracy, avg_nll
            )
        )

        pbar.n = pbar.last_print_n = 0

    @trainer.on(Events.EPOCH_COMPLETED | Events.COMPLETED)
    def log_time():
        tqdm.write(
            "{} took {} seconds".format(trainer.last_event_name.name, trainer.state.times[trainer.last_event_name.name])
        )

    trainer.run(trainloader, max_epochs=epochs)
    pbar.close()

    PATH = './cifar_net.pth'
    torch.save(net.state_dict(), PATH)

    print('Finished Training')
    print('Task ID number is: {}'.format(task.id))


run(params.get('number_of_epochs'), params.get('base_lr'), params.get('momentum'), 10)
```

</div></details>

<br/><br/>
That's it! All you need to do now is run the full script.
---
title: Folder Sync
---

This example shows how to use the *clearml-data* folder sync function.

*clearml-data* folder sync mode is useful for cases where users have a single point of truth (i.e., a folder) that updates
from time to time. When the point of truth is updated, users can call `clearml-data sync`, and the
changes (file addition, modification, or removal) will be reflected in ClearML.

## Prerequisites
First, make sure that you have cloned the [clearml](https://github.com/allegroai/clearml) repository. It contains all
the needed files.
1. Open a terminal and change directory to the cloned repository's examples folder:

   `cd clearml/examples/reporting`

## Creating Initial Version

### Syncing a Folder
Create a dataset and sync the `data_samples` folder from the repo to ClearML:
```bash
clearml-data sync --project datasets --name sync_folder --folder data_samples
```

Expected response:

```
clearml-data - Dataset Management & Versioning CLI
Creating a new dataset:
New dataset created id=0d8f5f3e5ebd4f849bfb218021be1ede
Syncing dataset id 0d8f5f3e5ebd4f849bfb218021be1ede to local folder data_samples
Generating SHA2 hash for 5 files
Hash generation completed
Sync completed: 0 files removed, 5 added / modified
Finalizing dataset
Pending uploads, starting dataset upload to https://files.community.clear.ml
Uploading compressed dataset changes (5 files, total 222.17 KB) to https://files.community.clear.ml
Upload completed (222.17 KB)
2021-05-04 09:57:56,809 - clearml.Task - INFO - Waiting to finish uploads
2021-05-04 09:57:57,581 - clearml.Task - INFO - Finished uploading
Dataset closed and finalized
```

As can be seen, the `clearml-data sync` command creates the dataset, then uploads the files, and closes the dataset.
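
If you prefer to drive the same flow from Python, the SDK exposes a folder-sync capability as well. The snippet below is a sketch, assuming `Dataset.create()` and `sync_folder()` behave as described in the SDK reference; verify the exact signatures for your `clearml` version:

```python
from clearml import Dataset

# Create a dataset version and sync it with the local folder
# (roughly what `clearml-data sync` does in one command)
dataset = Dataset.create(dataset_name="sync_folder", dataset_project="datasets")
dataset.sync_folder(local_path="data_samples")

# Upload the changes and finalize the version
dataset.upload()
dataset.finalize()
```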

## Modifying Synced Folder

Now we'll modify the folder:
1. Add another line to one of the files in the `data_samples` folder.
1. Add a file to the `data_samples` folder.<br/>
   Run `echo "data data data" > data_samples/new_data.txt` (this will create the file `new_data.txt` and put it in the `data_samples` folder).

We'll repeat the process of creating a new dataset with the previous one as its parent, and syncing the folder.

```bash
clearml-data sync --project datasets --name second_ds --parents a1ddc8b0711b4178828f6c6e6e994b7c --folder data_samples
```

Expected response:
```
clearml-data - Dataset Management & Versioning CLI
Creating a new dataset:
New dataset created id=0992dd6bae6144388e0f2ef131d9724a
Syncing dataset id 0992dd6bae6144388e0f2ef131d9724a to local folder data_samples
Generating SHA2 hash for 6 files
Hash generation completed
Sync completed: 0 files removed, 2 added / modified
Finalizing dataset
Pending uploads, starting dataset upload to https://files.community.clear.ml
Uploading compressed dataset changes (2 files, total 742 bytes) to https://files.community.clear.ml
Upload completed (742 bytes)
2021-05-04 10:05:42,353 - clearml.Task - INFO - Waiting to finish uploads
2021-05-04 10:05:43,106 - clearml.Task - INFO - Finished uploading
Dataset closed and finalized
```

We can see that 2 files were added or modified, just as we expected!
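
To consume the new version from code, fetch it by ID and inspect or download its contents. A short sketch using the ID printed above (assuming the standard `Dataset.get()` API):

```python
from clearml import Dataset

# Fetch the second dataset version
dataset = Dataset.get(dataset_id="0992dd6bae6144388e0f2ef131d9724a")

# List every file in this version, including files inherited from the parent
print(dataset.list_files())

# Download (and cache) a local copy of the full version
local_path = dataset.get_local_copy()
print(local_path)
```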
---
title: Data Management Example
---

In this example, we'll create a simple dataset and demonstrate basic actions on it.

## Prerequisites
First, make sure that you have cloned the [clearml](https://github.com/allegroai/clearml) repository. It contains all
the needed files.
1. Open a terminal and change directory to the cloned repository's examples folder:

   `cd clearml/examples/reporting`

## Creating Initial Dataset

1. To create the dataset, run this code:

   ```bash
   clearml-data create --project datasets --name HelloDataset
   ```

   Expected response:

   ```bash
   clearml-data - Dataset Management & Versioning CLI
   Creating a new dataset:
   New dataset created id=24d05040f3e14fbfbed8edb1bf08a88c
   ```

1. Now let's add a folder. File addition is recursive, so it's enough to point at the folder
   to capture all files and subfolders:

   ```bash
   clearml-data add --files data_samples
   ```

   Expected response:

   ```bash
   clearml-data - Dataset Management & Versioning CLI
   Adding files/folder to dataset id 24d05040f3e14fbfbed8edb1bf08a88c
   Generating SHA2 hash for 2 files
   Hash generation completed
   5 files added
   ```
   :::note
   After creating a dataset, we don't have to specify its ID when running commands such as *add*, *remove*, or *list*.
   :::

1. Close the dataset - this command uploads the files. By default, the files are uploaded to the file server, but
   this can be configured with the `--storage` flag to any of ClearML's supported storage mediums (see [storage](../../integrations/storage.md)).
   The command also finalizes the dataset, making it immutable and ready to be consumed.

   ```bash
   clearml-data close
   ```

   Expected response:

   ```bash
   clearml-data - Dataset Management & Versioning CLI
   Finalizing dataset id 24d05040f3e14fbfbed8edb1bf08a88c
   Pending uploads, starting dataset upload to https://files.community.clear.ml
   Uploading compressed dataset changes (4 files, total 221.56 KB) to https://files.community.clear.ml
   Upload completed (221.56 KB)
   2021-05-04 09:32:03,388 - clearml.Task - INFO - Waiting to finish uploads
   2021-05-04 09:32:04,067 - clearml.Task - INFO - Finished uploading
   Dataset closed and finalized
   ```

## Listing Dataset Content

To see that all the files were added to the created dataset, use `clearml-data list` and enter the ID of the dataset
that was just closed.

```bash
clearml-data list --id 24d05040f3e14fbfbed8edb1bf08a88c
```

Expected response:

```console
clearml-data - Dataset Management & Versioning CLI

List dataset content: 24d05040f3e14fbfbed8edb1bf08a88c
Listing dataset content
file name | size | hash
------------------------------------------------------------------------------------------------------------------------------------------------
dancing.jpg | 40,484 | 78e804c0c1d54da8d67e9d072c1eec514b91f4d1f296cdf9bf16d6e54d63116a
data.csv | 21,440 | b618696f57b822cd2e9b92564a52b3cc93a2206f41df3f022956bb6cfe4e7ad5
picasso.jpg | 114,573 | 6b3c67ea9ec82b09bd7520dd09dad2f1176347d740fd2042c88720e780691a7c
sample.json | 132 | 9c42a9a978ac7a71873ebd5c65985e613cfaaff1c98f655af0d2ee0246502fd7
sample.mp3 | 72,142 | fbb756ae14005420ff00ccdaff99416bebfcea3adb7e30963a69e68e9fbe361b
Total 5 files, 248771 bytes
```
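
The same listing can be done from Python; a minimal sketch, assuming the SDK's `list_files()` method:

```python
from clearml import Dataset

# Equivalent of `clearml-data list --id 24d05040f3e14fbfbed8edb1bf08a88c`
dataset = Dataset.get(dataset_id="24d05040f3e14fbfbed8edb1bf08a88c")
for file_name in dataset.list_files():
    print(file_name)
```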

## Creating a Child Dataset

In ClearML Data, it's possible to create datasets that inherit the content of other datasets; these are called child datasets.

1. Create a new dataset, specifying the previously created one as its parent:

   ```bash
   clearml-data create --project datasets --name HelloDataset-improved --parents 24d05040f3e14fbfbed8edb1bf08a88c
   ```
   :::note
   You'll need to input the dataset ID you received when you created the dataset above.
   :::

1. Now, we want to add a new file.
   * Create a new file: `echo "data data data" > new_data.txt` (this will create the file `new_data.txt`).
   * Now add the file to the dataset:

   ```bash
   clearml-data add --files new_data.txt
   ```
   Which should return this output:

   ```console
   clearml-data - Dataset Management & Versioning CLI
   Adding files/folder to dataset id 8b68686a4af040d081027ba3cf6bbca6
   1 file added
   ```

1. Let's also remove a file. We'll need to specify the file's full path (within the dataset, not locally) to remove it.

   ```bash
   clearml-data remove --files data_samples/dancing.jpg
   ```

   Expected response:
   ```bash
   clearml-data - Dataset Management & Versioning CLI
   Removing files/folder from dataset id 8b68686a4af040d081027ba3cf6bbca6
   1 files removed
   ```

1. Close and finalize the dataset:

   ```bash
   clearml-data close
   ```

1. Let's take a look again at the files in the dataset:

   ```
   clearml-data list --id 8b68686a4af040d081027ba3cf6bbca6
   ```

   And we see that our changes have been made! `new_data.txt` has been added, and `dancing.jpg` has been removed.

   ```
   file name | size | hash
   ------------------------------------------------------------------------------------------------------------------------------------------------
   data.csv | 21,440 | b618696f57b822cd2e9b92564a52b3cc93a2206f41df3f022956bb6cfe4e7ad5
   new_data.txt | 15 | 6df986a2154902260a836febc5a32543f5337eac60560c57db99257a7e012051
   picasso.jpg | 114,573 | 6b3c67ea9ec82b09bd7520dd09dad2f1176347d740fd2042c88720e780691a7c
   sample.json | 132 | 9c42a9a978ac7a71873ebd5c65985e613cfaaff1c98f655af0d2ee0246502fd7
   sample.mp3 | 72,142 | fbb756ae14005420ff00ccdaff99416bebfcea3adb7e30963a69e68e9fbe361b
   Total 5 files, 208302 bytes
   ```

By using `clearml-data`, a clear lineage is created for the data. As seen in this example, when a dataset is closed, the
only way to add or remove data is to create a new dataset, using the previous dataset as its parent. This way, the data
is not reliant on the code and is reproducible.
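
The same parent/child versioning is available through the SDK. The sketch below mirrors the CLI steps above, under the assumption that `Dataset.create()` accepts a `parent_datasets` argument and that `add_files()` / `remove_files()` behave as in the SDK reference:

```python
from clearml import Dataset

# Create a child version on top of the finalized dataset
# (roughly `clearml-data create --parents 24d05040f3e14fbfbed8edb1bf08a88c`)
child = Dataset.create(
    dataset_name="HelloDataset-improved",
    dataset_project="datasets",
    parent_datasets=["24d05040f3e14fbfbed8edb1bf08a88c"],
)

# Apply the same changes as the CLI steps: add one file, remove another
child.add_files(path="new_data.txt")
child.remove_files(dataset_path="data_samples/dancing.jpg")

# Upload the delta and finalize the new version
child.upload()
child.finalize()
```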