clearml/docs/clearml-task.md
2021-08-13 10:57:04 +03:00

168 lines
9.8 KiB
Markdown

# `clearml-task` - Execute ANY python code on a remote machine
Using only your command line and __zero__ additional lines of code, you can easily integrate the ClearML platform
into your experiment. With the `clearml-task` command, you can create a [Task](https://clear.ml/docs/latest/docs/fundamentals/task)
using any script from **any python code or repository and launch it on a remote machine**.
The remote execution is fully monitored. All outputs - including console / tensorboard / matplotlib -
are logged in real-time into the ClearML UI.
## What does it do?
With the `clearml-task` command, you specify the details of your experiment including:
* Project and task name
* Repository / commit / branch
* [Queue](https://clear.ml/docs/latest/docs/fundamentals/agents_and_queues#what-is-a-queue)
name
* Optional: the base docker image to be used as underlying environment
* Optional: alternative python requirements, in case `requirements.txt` is not found inside the repository.
Then `clearml-task` does the rest of the heavy-lifting. It creates a new experiment or Task on your `clearml-server`
according to your specifications, and then, it will enqueue the experiment to the selected execution queue.
While the Task is executed on the remote machine (performed by an available `clearml-agent`), all the console outputs
will be logged in real-time, alongside your TensorBoard and matplotlib. During and after the Task execution, you can
track and visualize the results in the ClearML Web UI.
### Use-cases for `clearml-task` remote execution
- You have an off-the-shelf code, and you want to launch it on a remote machine with a specific resource (i.e., GPU)
- You want to run the [hyper-parameter optimization]() on a codebase that is still not connected to `clearml`
- You want to create a [pipeline]() from an assortment of scripts, and you need to create Tasks for those scripts
- Sometimes, you just want to run some code on a remote machine, either using an on-prem cluster or on the cloud...
### Prerequisites
- A single python script, or an up-to-date repository containing the codebase.
- `clearml` installed. `clearml` also has a [Task](https://clear.ml/docs/latest/docs/fundamentals/task)
feature but it requires two lines of code in order to integrate the platform.
- `clearml-agent` running on at least one machine (to execute the experiment)
## Tutorial
### Launching a job from a repository
You will be launching this [script](https://github.com/allegroai/events/blob/master/webinar-0620/keras_mnist.py)
on a remote machine. You will be using the following command-line options:
1. Give the experiment a name and select a project, for example: `--project keras_examples --name remote_test`. If the project
doesn't exist, a new project will be created with the selected name.
2. Select the repository with your code. For example: `--repo https://github.com/allegroai/events.git` You can specify a
branch and/or commit using `--branch <branch_name> --commit <commit_id>`. If you do not specify the
branch / commit, it will use by default the latest commit from the master branch,
3. Specify which script in the repository needs to be run, for example: `--script /webinar-0620/keras_mnist.py`
By default, the execution working directory will be the root of the repository. If you need to change it,
add `--cwd <folder>`
4. If you need, pass an argument to your scripts, use `--args`, followed by the arguments.
The names of the arguments should match the argparse arguments, but without the '--' prefix. Instead
of --key=value -> use `--args key=value`, for example `--args batch_size=64 epochs=1`
5. Select the queue for your Task's execution, for example: `--queue default`. If a queue isn't chosen, the Task
will not be executed, it will be left in [draft mode](https://clear.ml/docs/latest/docs/fundamentals/task#task-states),
and you can enqueue and execute the Task at a later point.
6. Add required packages. If your repo has a requirements.txt file, you don't need to do anything; `clearml-task`
will automatically find the file and put it in your Task. If your repo does __not__ have a requirements file and
there are packages that are necessary for the execution of your code, use --packages <package_name>. For example:
`--packages "keras" "tensorflow>2.2"`.
``` bash
clearml-task --project keras_examples --name remote_test --repo https://github.com/allegroai/events.git
--script /webinar-0620/keras_mnist.py --args batch_size=64 epochs=1 --queue default
```
### Launching a job from a local script
You will be launching a single local script file (no git repo needed) on a remote machine:
1. Give the experiment a name and select a project (`--project examples --name remote_test`)
2. Select the script file on your machine, `--script /path/to/my/script.py`
3. If you require specific packages to run your code, you can specify them manually with `--packages "package_name" "package_name2`,
for example: `packages "keras" "tensorflow>2.2"`
or you can pass a requirements file `--requirements /path/to/my/requirements.txt`
4. If you need to pass arguments, like in the repo case, add `--args key=value` and make sure that the key names match
the argparse arguments (`--args batch_size=64 epochs=1`)
5. If you have a docker container with an entire environment in which you want your script to run inside,
add e.g. `--docker nvcr.io/nvidia/pytorch:20.11-py3`
6. Select the queue for your Task's execution, for example: `--queue dual_gpu`. If a queue isn't chosen, the Task
will not be executed, it will be left in [draft mode](https://clear.ml/docs/latest/docs/fundamentals/task#task-states),
and you can enqueue and execute it at a later point.
``` bash
clearml-task --project examples --name remote_test --script /path/to/my/script.py
--packages "keras" "tensorflow>2.2" --args epochs=1 batch_size=64
--queue dual_gpu
```
### CLI options
``` bash
clearml-task --help
```
``` console
ClearML launch - launch any codebase on remote machine running clearml-agent
optional arguments:
-h, --help show this help message and exit
--version Display the clearml-task utility version
--project PROJECT Required: set the project name for the task. If
--base-task-id is used, this arguments is optional.
--name NAME Required: select a name for the remote task
--repo REPO remote URL for the repository to use. Example: --repo
https://github.com/allegroai/clearml.git
--branch BRANCH Select specific repository branch/tag (implies the
latest commit from the branch)
--commit COMMIT Select specific commit id to use (default: latest
commit, or when used with local repository matching
the local commit id)
--folder FOLDER Remotely execute the code in the local folder. Notice!
It assumes a git repository already exists. Current
state of the repo (commit id and uncommitted changes)
is logged and will be replicated on the remote machine
--script SCRIPT Specify the entry point script for the remote
execution. When used in tandem with --repo the script
should be a relative path inside the repository, for
example: --script source/train.py .When used with
--folder it supports a direct path to a file inside
the local repository itself, for example: --script
~/project/source/train.py
--cwd CWD Working directory to launch the script from. Default:
repository root folder. Relative to repo root or local
folder
--args [ARGS [ARGS ...]]
Arguments to pass to the remote execution, list of
<argument>=<value> strings.Currently only argparse
arguments are supported. Example: --args lr=0.003
batch_size=64
--queue QUEUE Select the queue to launch the task. If not provided a
Task will be created but it will not be launched.
--requirements REQUIREMENTS
Specify requirements.txt file to install when setting
the session. If not provided, the requirements.txt
from the repository will be used.
--packages [PACKAGES [PACKAGES ...]]
Manually specify a list of required packages. Example:
--packages "tqdm>=2.1" "scikit-learn"
--docker DOCKER Select the docker image to use in the remote session
--docker_args DOCKER_ARGS
Add docker arguments, pass a single string
--docker_bash_setup_script DOCKER_BASH_SETUP_SCRIPT
Add bash script to be executed inside the docker
before setting up the Task's environment
--output-uri OUTPUT_URI
Optional: set the Task `output_uri` (automatically
upload model destination)
--task-type TASK_TYPE
Set the Task type, optional values: training, testing,
inference, data_processing, application, monitor,
controller, optimizer, service, qc, custom
--skip-task-init If set, Task.init() call is not added to the entry
point, and is assumed to be called in within the
script. Default: add Task.init() call entry point
script
--base-task-id BASE_TASK_ID
Use a pre-existing task in the system, instead of a
local repo/script. Essentially clones an existing task
and overrides arguments/requirements.
```