---
title: Remote Execution
---
The [execute_remotely_example](https://github.com/allegroai/clearml/blob/master/examples/advanced/execute_remotely_example.py)
script demonstrates the use of the [`Task.execute_remotely`](../../references/sdk/task.md#execute_remotely) method.
:::note
Make sure at least one [ClearML Agent](../../clearml_agent.md) is running and assigned to listen to the `default` queue:
```
clearml-agent daemon --queue default
```
:::
## Execution Flow
The script trains a simple deep neural network on the PyTorch built-in MNIST dataset. The following describes the code's
execution flow:
1. The training runs for one epoch.
1. The code uses [`Task.execute_remotely()`](../../references/sdk/task.md#execute_remotely), which terminates the local execution of the code and enqueues the task
to the `default` queue, as specified in the `queue_name` parameter.
1. An agent listening to the queue fetches the task and restarts task execution remotely. When the agent executes the task,
the `execute_remotely` call is treated as a no-op.
An execution flow that uses `execute_remotely` is especially helpful when you first run the code on a development machine
for a few iterations to debug it, to make sure it doesn't crash, or to set up an environment. After that, the training
can be moved to a more powerful machine.
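The flow described above can be sketched as follows. This is a minimal sketch, not the example script itself: `run_epochs`, `train_fn`, and `test_fn` are hypothetical names, and `task` is assumed to be the object returned by `Task.init`. The only ClearML call involved is `execute_remotely(queue_name=...)`.

```python
def run_epochs(task, train_fn, test_fn, epochs, queue_name="default"):
    """Run the first epoch locally, then hand the task over to an agent.

    `task` is assumed to be a ClearML Task (for example, the return value of
    `Task.init`). `train_fn` and `test_fn` are hypothetical callables that
    take the epoch number.
    """
    for epoch in range(1, epochs + 1):
        if epoch > 1:
            # Local run: terminates this process and enqueues the task.
            # Agent run: treated as a no-op, so the loop simply continues.
            task.execute_remotely(queue_name=queue_name)
        train_fn(epoch)
        test_fn(epoch)


# Intended usage (needs a configured ClearML server and a running agent):
#
#   from clearml import Task
#   task = Task.init(project_name="examples",
#                    task_name="Remote_execution PyTorch MNIST train")
#   run_epochs(task, train_fn, test_fn, epochs=10)  # epochs=10 is illustrative
```

Placing the call at the top of the second epoch means that exactly one full epoch runs locally before the handoff. The agent then starts the script from the beginning and runs every epoch.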
During the execution of the example script, the code does the following:
* Uses ClearML's automatic and explicit logging.
* Creates an experiment named `Remote_execution PyTorch MNIST train` in the `examples` project.
## Scalars
In the example script's `train` function, the following code explicitly reports scalars to ClearML:
```python
Logger.current_logger().report_scalar(
"train", "loss", iteration=(epoch * len(train_loader) + batch_idx), value=loss.item()
)
```
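The `iteration` argument in the snippet above folds the epoch and the batch index into one running step count, so points from successive epochs line up along a single x-axis. Here is a small sketch of that arithmetic, using a hypothetical helper `global_iteration` and an illustrative figure of 100 batches per epoch (not the real loader size):

```python
def global_iteration(epoch, batches_per_epoch, batch_idx):
    """Mirror `epoch * len(train_loader) + batch_idx` from the snippet above."""
    return epoch * batches_per_epoch + batch_idx


# Assuming 100 batches per epoch (an illustrative figure only):
first_of_epoch_1 = global_iteration(epoch=1, batches_per_epoch=100, batch_idx=0)
last_of_epoch_1 = global_iteration(epoch=1, batches_per_epoch=100, batch_idx=99)
first_of_epoch_2 = global_iteration(epoch=2, batches_per_epoch=100, batch_idx=0)
print(first_of_epoch_1, last_of_epoch_1, first_of_epoch_2)  # 100 199 200
```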
In the script's `test` function, the code explicitly reports `loss` and `accuracy` scalars:
```python
Logger.current_logger().report_scalar(
"test", "loss", iteration=epoch, value=test_loss
)
Logger.current_logger().report_scalar(
"test", "accuracy", iteration=epoch, value=(correct / len(test_loader.dataset))
)
```
These scalars can be visualized in plots, which appear in the ClearML web UI, in the experiment's **SCALARS** tab.
![Experiment Scalars](../../img/examples_pytorch_mnist_07.png)
## Hyperparameters
ClearML automatically logs command line options defined with `argparse`. They appear in **CONFIGURATION** **>** **HYPERPARAMETERS** **>** **Args**.
![Experiment hyperparameters](../../img/examples_pytorch_mnist_01.png)
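As a rough illustration of what gets captured, here is a minimal `argparse` parser. The option names and defaults are assumptions chosen for illustration and may not match the example script. ClearML picks up options like these once `Task.init` has been called in the same script.

```python
import argparse


def build_parser():
    # Hypothetical options; the example script may define different ones.
    parser = argparse.ArgumentParser(description="PyTorch MNIST example (sketch)")
    parser.add_argument("--batch-size", type=int, default=64, help="training batch size")
    parser.add_argument("--epochs", type=int, default=10, help="number of epochs to train")
    parser.add_argument("--lr", type=float, default=0.01, help="learning rate")
    return parser


# Parse an explicit list so the sketch runs without real command-line input.
args = build_parser().parse_args(["--epochs", "2"])
print(vars(args))  # {'batch_size': 64, 'epochs': 2, 'lr': 0.01}
```

With a task initialized, each of these options would be expected to show up under **Args** with its parsed value.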
## Console
Text printed to the console for training progress, as well as all other console output, appears in **CONSOLE**.
![Experiment console log](../../img/examples_pytorch_mnist_06.png)
## Artifacts
Models created by the experiment appear in the experiment's **ARTIFACTS** tab. ClearML automatically logs and tracks models
and any snapshots created using PyTorch.
![Experiment artifacts](../../img/examples_remote_execution_artifacts.png)