Explicit Reporting Tutorial
In this tutorial, learn how to extend ClearML's automagical capturing of inputs and outputs with explicit reporting.
In this example, you will add the following to the pytorch_mnist.py example script from ClearML's GitHub repo:
- Setting an output destination for model checkpoints (snapshots).
- Explicitly logging a scalar, other (non-scalar) data, and logging text.
- Registering an artifact, which uploads it to ClearML Server and logs any later changes to it.
- Uploading an artifact, which uploads it to ClearML Server without logging later changes to it.
Prerequisites
- The clearml repository is cloned.
- The clearml package is installed.
Before Starting
Make a copy of pytorch_mnist.py to add explicit reporting to it.
cp pytorch_mnist.py pytorch_mnist_tutorial.py
Step 1: Setting an Output Destination for Model Checkpoints
Specify a default output location, which is where model checkpoints (snapshots) and artifacts will be stored when the experiment runs. Some possible destinations include:
- Local destination
- Shared folder
- Cloud storage:
  - Amazon S3
  - Google Cloud Storage
  - Azure Storage
Specify the output location in the output_uri parameter of Task.init().
In this tutorial, specify a local folder destination.
In pytorch_mnist_tutorial.py, change the code from:
task = Task.init(project_name='examples', task_name='pytorch mnist train')
to:
import os

model_snapshots_path = '/mnt/clearml'
# Create the output folder if it does not exist yet
os.makedirs(model_snapshots_path, exist_ok=True)

task = Task.init(
    project_name='examples',
    task_name='extending automagical ClearML example',
    output_uri=model_snapshots_path
)
When the script runs, ClearML creates the following directory structure:
+ - <output destination name>
|   +-- <project name>
|       +-- <task name>.<Task Id>
|           +-- models
|           +-- artifacts
and puts the model checkpoints (snapshots) and artifacts in that folder.
For example, if the Task ID is 9ed78536b91a44fbb3cc7a006128c1b0, then the directory structure will be:
+ - clearml
|   +-- examples
|       +-- extending automagical ClearML example.9ed78536b91a44fbb3cc7a006128c1b0
|           +-- models
|           +-- artifacts
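The layout described above can be sketched as a small path-building function. This is only an illustration of the folder structure stated in this tutorial, assuming the /mnt/clearml destination set earlier; `expected_output_dirs` is a hypothetical helper, not part of the clearml package, and ClearML may adjust names (for example, special characters) in ways this sketch does not model.

```python
import posixpath


def expected_output_dirs(output_uri, project_name, task_name, task_id):
    """Return the models and artifacts folders implied by the layout above.

    Hypothetical helper for illustration only; not a clearml API.
    """
    # The task folder is named "<task name>.<Task Id>" under the project folder
    task_folder = posixpath.join(
        output_uri, project_name, "{}.{}".format(task_name, task_id)
    )
    return {
        "models": posixpath.join(task_folder, "models"),
        "artifacts": posixpath.join(task_folder, "artifacts"),
    }


dirs = expected_output_dirs(
    "/mnt/clearml",
    "examples",
    "extending automagical ClearML example",
    "9ed78536b91a44fbb3cc7a006128c1b0",
)
print(dirs["models"])
print(dirs["artifacts"])
```

A helper like this can be handy for checking, after a run, that checkpoints landed where you expected.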
Step 2: Logger Class Reporting Methods
In addition to ClearML automagical logging, the clearml Python package contains methods for explicit reporting of plots, log text, media, and tables. These methods include:
- Logger.report_histogram
- Logger.report_confusion_matrix
- Logger.report_scatter2d
- Logger.report_scatter3d
- Logger.report_surface (surface diagrams)
- Logger.report_image - Report an image and upload its contents.
- Logger.report_table - Report a table as a Pandas DataFrame, CSV file, or URL for a CSV file.
- Logger.report_media - Report media including images, audio, and video.
- Logger.get_default_upload_destination - Retrieve the destination that is set for uploaded media.
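As one example from the list above, here is a minimal sketch of reporting a 2D scatter plot. It assumes `logger` is the object returned by `task.get_logger()`; the helper name `report_accuracy_curve`, the plot titles, and the idea of plotting accuracy per epoch are illustrative assumptions and are not part of the tutorial script.

```python
def report_accuracy_curve(logger, accuracies, iteration=0):
    """Report per-epoch accuracy values as a 2D scatter plot.

    Hypothetical helper: `logger` is expected to come from task.get_logger().
    """
    # report_scatter2d takes the data as a sequence of (x, y) pairs
    points = [(epoch, acc) for epoch, acc in enumerate(accuracies, start=1)]
    logger.report_scatter2d(
        title="Accuracy per epoch",
        series="test accuracy",
        scatter=points,
        iteration=iteration,
        xaxis="Epoch",
        yaxis="Accuracy",
        mode="lines+markers",
    )
    return points
```

In the tutorial script, this could be called once after the last epoch, for example `report_accuracy_curve(task.get_logger(), [0.97, 0.98])`.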
Get a Logger
First, create a logger for the Task using Task.get_logger():
logger = task.get_logger()
Plot Scalar Metrics
Use Logger.report_scalar() to report the training loss as a scalar metric.
def train(args, model, device, train_loader, optimizer, epoch):
    save_loss = []
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        save_loss.append(loss)
        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
            # Add manual scalar reporting for loss metrics
            logger.report_scalar(title='Scalar example {} - epoch'.format(epoch),
                                 series='Loss', value=loss.item(), iteration=batch_idx)
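The code above creates one plot per epoch, because the epoch number is part of the title. If a single continuous loss curve is preferred, one possible variation is to keep the title fixed and derive a global iteration number. The sketch below assumes that `epoch` starts at 1 and that `logger` comes from `task.get_logger()`; `report_training_loss` and `batches_per_epoch` are hypothetical names introduced for illustration.

```python
def report_training_loss(logger, loss_value, epoch, batch_idx, batches_per_epoch):
    """Report a loss value on one continuous plot across all epochs.

    Hypothetical helper; assumes `epoch` is 1-based.
    """
    # Convert (epoch, batch index) into a single increasing step counter
    global_iteration = (epoch - 1) * batches_per_epoch + batch_idx
    logger.report_scalar(
        title="Training loss",
        series="Loss",
        value=loss_value,
        iteration=global_iteration,
    )
    return global_iteration
```

Inside `train`, this might be called as `report_training_loss(logger, loss.item(), epoch, batch_idx, len(train_loader))`.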
Plot Other (Not Scalar) Data
The script contains a function named test, which computes the loss and the number of correct predictions for the trained model. Add a histogram and a confusion matrix to log them.
def test(args, model, device, test_loader):
    save_test_loss = []
    save_correct = []
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # sum up batch loss
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            # get the index of the max log-probability
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
            save_test_loss.append(test_loss)
            save_correct.append(correct)
    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
    # Manually report the accumulated correct counts as a histogram
    logger.report_histogram(
        title='Histogram example',
        series='correct',
        iteration=1,
        values=save_correct,
        xaxis='Test',
        yaxis='Correct'
    )
    # Manually report test loss and correct as a confusion matrix
    matrix = np.array([save_test_loss, save_correct])
    logger.report_confusion_matrix(
        title='Confusion matrix example',
        series='Test loss / correct',
        matrix=matrix,
        iteration=1
    )
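The tutorial passes the accumulated loss and correct counts as the matrix, which exercises the reporting call but is not a confusion matrix in the classification sense. For comparison, here is a minimal sketch of building a true-class by predicted-class count matrix in plain Python. The function name and the sample labels are illustrative assumptions, not part of the tutorial script.

```python
def build_confusion_matrix(targets, predictions, num_classes):
    """Count how often each true class (row) was predicted as each class (column)."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_label, predicted_label in zip(targets, predictions):
        matrix[true_label][predicted_label] += 1
    return matrix


# Small made-up example with three classes
confusion = build_confusion_matrix(
    targets=[0, 1, 2, 2, 1],
    predictions=[0, 2, 2, 2, 1],
    num_classes=3,
)
print(confusion)
```

In the test function, the labels could be collected per batch (for example from `target` and `pred`), and the result converted with `np.array(confusion)` before being passed as the `matrix` argument of `logger.report_confusion_matrix()`.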
Log Text
Extend ClearML by explicitly logging text, including errors, warnings, and debugging statements. Use Logger.report_text() and its level argument to report a debugging message.
# Requires `import logging` at the top of the script
logger.report_text(
    'The default output destination for model snapshots and artifacts is: {}'.format(
        model_snapshots_path
    ),
    level=logging.DEBUG
)
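The level argument accepts the standard Python logging levels, so the same call can report warnings or informational messages. The following sketch picks a level based on a condition; `report_checkpoint_status` and its messages are hypothetical and only illustrate the pattern, assuming `logger` comes from `task.get_logger()`.

```python
import logging


def report_checkpoint_status(logger, path, exists):
    """Report where checkpoints go, as a warning if the folder is missing.

    Hypothetical helper for illustration only.
    """
    if exists:
        level = logging.INFO
        message = "Model snapshots will be stored in: {}".format(path)
    else:
        level = logging.WARNING
        message = "Output destination does not exist yet: {}".format(path)
    logger.report_text(message, level=level)
    return level
```

For example, `report_checkpoint_status(logger, model_snapshots_path, os.path.exists(model_snapshots_path))` would report at the INFO level once the folder has been created.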
Step 3: Registering Artifacts
Registering an artifact uploads it to ClearML Server, and if it changes, the change is logged in ClearML Server. Currently, ClearML supports Pandas DataFrames as registered artifacts.
Register the Artifact
In the test function of the tutorial script, assign the test loss and correct data to a Pandas DataFrame object and register that DataFrame using Task.register_artifact().
# Create the Pandas DataFrame
test_loss_correct = {
    'test loss': save_test_loss,
    'correct': save_correct
}
df = pd.DataFrame(test_loss_correct, columns=['test loss', 'correct'])
# Register the test loss and correct as a Pandas DataFrame artifact
task.register_artifact(
    'Test_Loss_Correct',
    df,
    metadata={
        'metadata string': 'apple',
        'metadata int': 100,
        'metadata dict': {'dict string': 'pear', 'dict int': 200}
    }
)
Reference the Registered Artifact
Once an artifact is registered, it can be referenced and utilized in the Python experiment script.
In the tutorial script, add Task.current_task() and Task.get_registered_artifacts() to take a sample.
# Once the artifact is registered, we can get it and work with it. Here, we sample it.
sample = Task.current_task().get_registered_artifacts()['Test_Loss_Correct'].sample(
    frac=0.5,
    replace=True,
    random_state=1
)
Step 4: Uploading Artifacts
Artifacts can also be uploaded to ClearML Server without being registered. In that case, the artifact is stored as uploaded, and later changes to it are not logged.
Supported artifacts include:
- Pandas DataFrames
- Files of any type, including image files
- Folders - stored as ZIP files
- Images - stored as PNG files
- Dictionaries - stored as JSONs
- Numpy arrays - stored as NPZ files
In the tutorial script, upload the loss data as an artifact using Task.upload_artifact() with metadata specified in the metadata parameter.
# Upload test loss as an artifact. Here, the artifact is a NumPy array
task.upload_artifact(
    'Predictions',
    artifact_object=np.array(save_test_loss),
    metadata={
        'metadata string': 'banana',
        'metadata integer': 300,
        'metadata dictionary': {'dict string': 'orange', 'dict int': 400}
    }
)
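Since dictionaries are among the supported artifact types, a small run summary can be uploaded the same way. The sketch below assumes `task` is the object returned by `Task.init()` and that the lists hold running totals, as `save_test_loss` and `save_correct` do in the test function. The helper name, the artifact name `Test_Summary`, and the summary keys are illustrative assumptions.

```python
def upload_test_summary(task, test_losses, correct_counts, dataset_size):
    """Upload a small summary dictionary as a static (non-registered) artifact.

    Hypothetical helper; the lists are expected to hold running totals,
    so the last entry covers the whole test set.
    """
    summary = {
        "final_test_loss": test_losses[-1] / dataset_size,
        "final_accuracy": correct_counts[-1] / dataset_size,
        "batches_evaluated": len(test_losses),
    }
    task.upload_artifact(
        "Test_Summary",
        artifact_object=summary,
        metadata={"source": "test function"},
    )
    return summary
```

In the test function, this might be called as `upload_test_summary(task, save_test_loss, save_correct, len(test_loader.dataset))`.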
Additional Information
After extending the Python experiment script, run it and view the results in the ClearML Web UI.
python pytorch_mnist_tutorial.py
To view the experiment results, do the following:
- In the ClearML Web UI, on the Projects page, click the examples project.
- In the experiments table, click the extending automagical ClearML example experiment.
- In the ARTIFACTS tab, DATA AUDIT section, click Test_Loss_Correct. The registered Pandas DataFrame appears, including the file path, size, hash, metadata, and a preview.
- In the OTHER section, click Predictions. The uploaded NumPy array appears, including its related information.
- Click the CONSOLE tab, and see the debugging message reported with Logger.report_text().
- Click the SCALARS tab, and see the scalar plots of the training loss, one per epoch.
- Click the PLOTS tab, and see the confusion matrix and histogram.
Next Steps
- See the User Interface section to learn about its features.
- See the ClearML Python Package Reference to learn about all the available classes and methods.