add advanced to ToC, add image, add more info to execute_remotely.md

This commit is contained in:
Revital
2021-08-23 11:26:36 +03:00
parent 8920bbe8a7
commit cd0bd1baf7
3 changed files with 54 additions and 6 deletions


@@ -7,13 +7,60 @@ script demonstrates the use of the [`execute_remotely`](../../references/sdk/tas
The script does the following:
* Trains a simple deep neural network on the PyTorch built-in MNIST dataset.
* Uses ClearML's automatic and explicit logging.
* Creates an experiment named `remote_execution pytorch mnist train`, which is associated with the `examples` project.
## Execution Flow
The following describes the code's execution flow:
1. The training runs for one epoch.
1. The code passes the `execute_remotely` method which terminates the local execution of the code.
1. Execution switches to the agent listening to the queue specified in the method's `queue_name` parameter.
The `execute_remotely` method is especially helpful when running code on a development machine for a few iterations
to debug and make sure the code doesn't crash, or when setting up an environment. Afterwards, the training can be
moved to a stronger machine for the full run.
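This pattern can be sketched as follows (a minimal sketch, not the example script itself; the queue name, project name, and task name are placeholders):

```python
def train_with_remote_execution(queue_name="default", debug_epochs=1):
    """Run a short local debug pass, then hand the task off to an agent.

    Placeholder names throughout -- adapt to your own project setup.
    """
    # Imported inside the function so the sketch can be read and
    # imported even where the clearml package is not installed.
    from clearml import Task

    task = Task.init(project_name="examples",
                     task_name="remote_execution pytorch mnist train")

    for epoch in range(debug_epochs):
        pass  # ... one debug epoch on the local machine ...

    # Terminates the local process (exit_process=True is the default)
    # and enqueues the task for the agent listening on `queue_name`.
    task.execute_remotely(queue_name=queue_name)

    # Code after this call runs only in the remote (agent) execution.
    # ... full training loop ...
```

When the agent picks the task up, it reproduces the recorded environment and runs the script from the top, and `execute_remotely` becomes a no-op in the remote run.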
## Scalars
In the example script's `train` function, the following code explicitly reports scalars to **ClearML**:
```python
Logger.current_logger().report_scalar(
"train", "loss", iteration=(epoch * len(train_loader) + batch_idx), value=loss.item())
```
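The `iteration` argument above gives each batch a single, monotonically increasing step index across epochs. A standalone sketch of that arithmetic (the batch count is an assumption for illustration):

```python
def global_iteration(epoch, batch_idx, batches_per_epoch):
    """Map (epoch, batch) to one global step, matching the script's
    `epoch * len(train_loader) + batch_idx` expression."""
    return epoch * batches_per_epoch + batch_idx

# With 938 batches per epoch (MNIST, 60,000 samples, batch size 64),
# batch 10 of epoch 2 maps to step 2 * 938 + 10 = 1886.
step = global_iteration(epoch=2, batch_idx=10, batches_per_epoch=938)
```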
In the `test` function, the code explicitly reports `loss` and `accuracy` scalars.
```python
Logger.current_logger().report_scalar(
"test", "loss", iteration=epoch, value=test_loss)
Logger.current_logger().report_scalar(
"test", "accuracy", iteration=epoch, value=(correct / len(test_loader.dataset)))
```
These scalars appear as plots in the ClearML web UI, on the experiment's
page **>** **RESULTS** **>** **SCALARS**.
![image](../../img/examples_pytorch_mnist_07.png)
## Hyperparameters
ClearML automatically logs command line options defined with `argparse`. They appear in **CONFIGURATIONS** **>** **HYPER PARAMETERS** **>** **Args**.
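What ClearML captures here is the parsed `argparse` namespace. A minimal sketch (the argument names and defaults are illustrative, not necessarily the example script's):

```python
import argparse

parser = argparse.ArgumentParser(description="MNIST training (sketch)")
parser.add_argument("--batch-size", type=int, default=64)
parser.add_argument("--epochs", type=int, default=1)
parser.add_argument("--lr", type=float, default=0.01)

# Once Task.init() has been called, ClearML hooks parse_args() and
# records each option under CONFIGURATIONS > HYPER PARAMETERS > Args.
args = parser.parse_args([])  # empty list -> defaults, for illustration
```

No explicit logging call is needed for these values; parsing them is enough for them to show up in the UI.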
![image](../../img/examples_pytorch_mnist_01.png)
## Console
Text printed to the console for training progress, as well as all other console output, appears in **RESULTS** **>** **CONSOLE**.
![image](../../img/examples_pytorch_mnist_06.png)
## Artifacts
Model artifacts associated with the experiment appear in the info panel of the **EXPERIMENTS** tab and in
the info panel of the **MODELS** tab.
![image](../../img/examples_remote_execution_artifacts.png)



@@ -57,6 +57,7 @@ module.exports = {
],
guidesSidebar: [
'guides/guidemain',
{'Advanced': ['guides/advanced/execute_remotely', 'guides/advanced/multiple_tasks_single_process']},
{'Automation': ['guides/automation/manual_random_param_search_example', 'guides/automation/task_piping']},
{'Data Management': ['guides/data management/data_man_simple', 'guides/data management/data_man_folder_sync', 'guides/data management/data_man_cifar_classification']},
{'ClearML Task': ['guides/clearml-task/clearml_task_tutorial']},