Merge branch 'allegroai:main' into uri_links

pollfly 2021-08-22 12:19:54 +03:00 committed by GitHub
commit 5ac6108623
GPG Key ID: 4AEE18F83AFDEB23
7 changed files with 72 additions and 16 deletions

@ -12,12 +12,13 @@ in the UI and send it for long-term training on a remote machine.
**If you are not that lucky**, this section is for you :)
## What does ClearML Session do?
`clearml-session` is a feature that allows to launch a session of Jupyterlab and VS Code, and to execute code on a remote
`clearml-session` is a feature that lets you launch a session of JupyterLab and VS Code, and execute code on a remote
machine that better meets resource needs. With this feature, local links are provided, which can be used to access
JupyterLab and VS Code on a remote machine over a secure and encrypted SSH connection.
JupyterLab and VS Code on a remote machine over a secure and encrypted SSH connection. By default, the JupyterLab and
VS Code remote sessions use ports 8878 and 8898 respectively.
<details className="cml-expansion-panel screenshot">
<summary className="cml-expansion-panel-summary">Jupyter-Lab Window</summary>
<summary className="cml-expansion-panel-summary">JupyterLab Window</summary>
<div className="cml-expansion-panel-content">
![image](../img/session_jupyter.png)
@ -138,7 +139,7 @@ The Task must be connected to a git repository, since currently single script de
| Command line options | Description | Default value |
|-----|---|---|
| `--jupyter-lab` | Download a Jupyter-Lab environment | `true` |
| `--jupyter-lab` | Download a JupyterLab environment | `true` |
| `--vscode-server` | Download a VSCode environment | `true` |
| `--public-ip` | Register the public IP of the remote machine (if you are running the session on a public cloud) | Session runs on the machine whose agent is executing the session|
| `--init-script` | Specify a BASH init script file to be executed when the interactive session is being set up | `none` or previously entered BASH script |

@ -6,7 +6,7 @@ ClearML Task is ClearML's Zero Code Integration Module. Using only the command l
you can easily track your work and integrate ClearML with your existing code.
`clearml-task` automatically integrates ClearML into any script or **any** python repository. `clearml-task` has the option
to send the Task to a queue, where a **ClearML Agent** listening to the queue will fetch the Task it and executes it on a
to send the task to a queue, where a **ClearML Agent** listening to the queue will fetch the task and execute it on a
remote or local machine. It's even possible to provide command line arguments and provide Python module dependencies and requirements.txt file!
## How Does ClearML Task Work?
@ -14,8 +14,8 @@ remote or local machine. It's even possible to provide command line arguments an
1. Execute `clearml-task`, pointing it to your script or repository, and optionally an execution queue.
1. `clearml-task` does its magic! It creates a new experiment on the [ClearML Server](../deploying_clearml/clearml_server.md),
and, if a queue was specified, it sends the experiment to the queue to be fetched and executed by a **ClearML Agent**.
1. The command line will provide you with a link to your Task's page in the ClearML web UI,
where you will be able to view the Task's details.
1. The command line will provide you with a link to your task's page in the ClearML web UI,
where you will be able to view the task's details.
## Features and Options
### Docker
@ -24,12 +24,12 @@ The ClearML Agent will pull it from dockerhub or a docker artifactory automatica
### Package Dependencies
If the local script requires packages to be installed or the remote repository doesn't have a requirements.txt file,
specify manually the required python packages using<br/>
specify manually the required python packages using <br/>
`--packages "<package_name>"`, for example `--packages "keras" "tensorflow>2.2"`.
### Queue
Tasks are passed to ClearML Agents via [Queues](../fundamentals/agents_and_queues.md). Specify a queue to enqueue the Task to.
If a queue isn't chosen in the `clearml-task` command, the Task will not be executed; it will be left in draft mode,
Tasks are passed to ClearML Agents via [Queues](../fundamentals/agents_and_queues.md). Specify a queue to enqueue the task to.
If a queue isn't chosen in the `clearml-task` command, the task will not be executed; it will be left in draft mode,
and can be enqueued at a later point.
### Branch and Working Directory
@ -37,5 +37,34 @@ A specific branch and commit ID, other than latest commit in master, to be execu
`--branch <branch_name> --commit <commit_id>` flags.
If unspecified, `clearml-task` will use the latest commit from the master branch.
Learn how to use the `clearml-task` feature [here](../guides/clearml-task/clearml_task_tutorial.md).
### Command line options
<div className="tbl-cmd">
|Name | Description| Optional |
|---|----|---|
| `--version` | Display the `clearml-task` utility version | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
| `--project`| Set the project name for the task (Required, unless using `--base-task-id`) | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
| `--name` | Select a name for the remote task | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
| `--repo` | URL of remote repository. Example: `--repo https://github.com/allegroai/clearml.git` | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
| `--branch` | Select specific repository branch / tag. By default, latest commit from the master branch | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
| `--commit` | Select specific commit ID to use. By default, latest commit, or local commit ID when using local repository | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
| `--folder` | Remotely execute the code in a local folder. Notice! It assumes a git repository already exists. Current state of the repo (commit ID and uncommitted changes) is logged and will be replicated on the remote machine | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
| `--script` | Entry point script for the remote execution. When used in tandem with `--repo`, the script should be a relative path inside the repository. For example: `--script source/train.py`. When used with `--folder`, it supports a direct path to a file inside the local repository itself, for example: `--script ~/project/source/train.py` | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
| `--cwd` | Working directory to launch the script from. Relative to repo root or local `--folder` | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
| `--args` | Arguments to pass to the remote task, list of `<argument>=<value>` strings. Currently only argparse arguments are supported. Example: `--args lr=0.003 batch_size=64` | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
| `--queue` | Select task's execution queue. If not provided, a task will be created but it will not be launched | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
| `--requirements` | Specify `requirements.txt` file to install when setting the session. By default, the `requirements.txt` from the repository will be used | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
| `--packages` | Manually specify a list of required packages. Example: `--packages "tqdm>=2.1" "scikit-learn"` | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
| `--docker` | Select the docker image to use in the remote task | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
| `--docker_args` | Add docker arguments, pass a single string | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
| `--docker_bash_setup_script` | Add bash script to be executed inside the docker before setting up the task's environment | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
| `--output-uri` | Set the task `output_uri`, upload destination for task models and artifacts (Optional) | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
| `--task-type` | Set the task type. Optional values: training, testing, inference, data_processing, application, monitor, controller, optimizer, service, qc, custom | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
| `--skip-task-init` | If set, `Task.init()` call is not added to the entry point, and is assumed to be called within the script. Default: Add `Task.init()` call to entry point script | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
| `--base-task-id` | Use a pre-existing task in the system, instead of a local repo / script. Essentially clones an existing task and overrides arguments / requirements | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
</div>
## Tutorial
Learn how to use the `clearml-task` feature [here](../guides/clearml-task/clearml_task_tutorial.md).
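
As an illustration of how the options in the table above combine, the following is a minimal sketch that assembles a `clearml-task` command line in Python. The helper name `build_clearml_task_command`, the project and task names, the queue name `default`, and the script path are assumptions made for this example; only the flags themselves come from the table. The sketch prints the command and does not run it.

```python
import shlex


def build_clearml_task_command(project, name, script, repo=None, queue=None,
                               args=None, packages=None):
    """Assemble a clearml-task argument list from flags listed in the options table.

    Hypothetical helper for illustration; it is not part of ClearML.
    """
    cmd = ["clearml-task", "--project", project, "--name", name, "--script", script]
    if repo:
        cmd += ["--repo", repo]
    if args:
        # --args takes a list of <argument>=<value> strings
        cmd += ["--args"] + [f"{key}={value}" for key, value in args.items()]
    if packages:
        # --packages takes a list of requirement strings
        cmd += ["--packages"] + list(packages)
    if queue:
        # Without --queue the task is created but left in draft mode
        cmd += ["--queue", queue]
    return cmd


command = build_clearml_task_command(
    project="examples",              # assumed project name
    name="remote run",               # assumed task name
    repo="https://github.com/allegroai/clearml.git",
    script="source/train.py",        # assumed path inside the repository
    args={"lr": 0.003, "batch_size": 64},
    packages=["tqdm>=2.1", "scikit-learn"],
    queue="default",                 # assumed queue name
)

# Print a shell-safe version of the command for copy and paste
print(shlex.join(command))
```

To actually launch the task, the list could be passed to `subprocess.run(command, check=True)` on a machine where `clearml-task` is installed and configured.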

@ -300,6 +300,7 @@ the watchdog marks them as `aborted`. The non-responsive experiment watchdog is
Modify the following settings for the watchdog:
* Watchdog status - enabled / disabled
* The time threshold (in seconds) of experiment inactivity (default value is 7200 seconds (2 hours)).
* The time interval (in seconds) between watchdog cycles.
@ -312,6 +313,8 @@ Modify the following settings for the watchdog:
tasks {
non_responsive_tasks_watchdog {
enabled: true
# In-progress tasks that haven't been updated for at least 'value' seconds will be stopped by the watchdog
threshold_sec: 7200
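
The threshold rule described in the comment above can be sketched in a few lines of Python. This is only an illustration of the comparison, not the ClearML Server's implementation; the function name `find_non_responsive` and the task IDs are hypothetical.

```python
import time


def find_non_responsive(last_update_by_task, threshold_sec=7200, now=None):
    """Return IDs of in-progress tasks not updated for at least threshold_sec seconds."""
    now = time.time() if now is None else now
    return [
        task_id
        for task_id, last_update in last_update_by_task.items()
        if now - last_update >= threshold_sec
    ]


# Fixed reference time so the example is deterministic
now = 1_000_000.0
in_progress = {
    "task_a": now - 60,      # updated a minute ago
    "task_b": now - 7200,    # exactly at the threshold
    "task_c": now - 10_000,  # well past the threshold
}
print(find_non_responsive(in_progress, threshold_sec=7200, now=now))
# prints ['task_b', 'task_c']
```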

@ -94,6 +94,7 @@ title: FAQ
* [How do I bypass a proxy configuration to access my local ClearML Server?](#proxy-localhost)
* [Trains is failing to update ClearML Server. I get an error 500 (or 400). How do I fix this?](#elastic_watermark)
* [Why is my Trains Web-App (UI) not showing any data?](#web-ui-empty)
* [Why can't I access my ClearML Server when I run my code in a virtual machine?](#vm_server)
**ClearML Agent**
@ -816,7 +817,7 @@ Do the following:
<br/>
**The ClearML Server keeps returning HTTP 500 (or 400) errors. How do I fix this?**
**The ClearML Server keeps returning HTTP 500 (or 400) errors. How do I fix this?** <a id="elastic_watermark"></a>
The ClearML Server will return HTTP error responses (5XX, or 4XX) when some of its [backend components](deploying_clearml/clearml_server.md)
are failing.
@ -839,6 +840,28 @@ A likely indication of this situation can be determined by searching your clearm
If your ClearML Web-App (UI) does not show anything, it may be an error authenticating with the server. Try clearing the application cookies for the site in your browser's developer tools.
**Why can't I access my ClearML Server when I run my code in a virtual machine?** <a id="vm_server"></a>
The network definitions inside a virtual machine (or container) are different from those of the host. The virtual machine's
and the server machine's IP addresses are different, so you have to make sure that the machine that is executing the
experiment can access the server's machine.
Make sure to have an independent configuration file for the virtual machine where you are running your experiments.
Edit the `api` section of your `clearml.conf` file and insert IP addresses of the server machine that are accessible
from the VM. It should look something like this:
```
api {
web_server: http://192.168.1.2:8080
api_server: http://192.168.1.2:8008
credentials {
"access_key" = "KEY"
"secret_key" = "SECRET"
}
}
```
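
Before editing `clearml.conf`, it can help to confirm that the addresses are actually reachable from inside the VM. The following is a minimal sketch using only the Python standard library; the function name `can_reach` is hypothetical, and the check only verifies that a TCP connection can be opened, not that the ClearML Server is healthy.

```python
import socket
from urllib.parse import urlparse


def can_reach(url, timeout=3.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    # Fall back to the scheme's default port when the URL does not name one
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts
        return False


# Example usage from inside the VM, with the addresses from the snippet above:
#   can_reach("http://192.168.1.2:8080")   # web_server
#   can_reach("http://192.168.1.2:8008")   # api_server
```

If either call returns `False`, the address is not accessible from the VM and a different server IP or network setting is needed.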
## ClearML Agent
**How can I execute ClearML Agent without installing packages each time?** <a className="tr_top_negative" id="system_site_packages"></a>

@ -30,7 +30,7 @@ pip install clearml-agent
Connect the Agent to the server by [creating credentials](https://app.community.clear.ml/profile), then run this:
```bash
clearml-init
clearml-agent init
```
:::note

@ -9,7 +9,7 @@ The example script does the following:
* Trains a simple deep neural network on the PyTorch built-in [MNIST](https://pytorch.org/docs/stable/torchvision/datasets.html#mnist)
dataset.
* Uses **ClearML** automatic logging.
* Calls the [Logger.report_scalar](../../../references/sdk/logger.md#report_scalar) method to demonstrate explicit reporting and explicit reporting,
* Calls the [Logger.report_scalar](../../../references/sdk/logger.md#report_scalar) method to demonstrate explicit reporting,
which allows adding customized reporting to the code.
* Creates an experiment named `pytorch mnist train`, which is associated with the `examples` project.

@ -59,7 +59,7 @@ module.exports = {
'guides/guidemain',
{'Automation': ['guides/automation/manual_random_param_search_example', 'guides/automation/task_piping']},
{'Data Management': ['guides/data management/data_man_simple', 'guides/data management/data_man_folder_sync', 'guides/data management/data_man_cifar_classification']},
{'Clearml Task': ['guides/clearml-task/clearml_task_tutorial']},
{'ClearML Task': ['guides/clearml-task/clearml_task_tutorial']},
{'Distributed': ['guides/distributed/distributed_pytorch_example', 'guides/distributed/subprocess_example']},
{'Docker': ['guides/docker/extra_docker_shell_script']},
{'Frameworks': [
@ -106,7 +106,7 @@ module.exports = {
],
rnSidebar: {
'Release Notes': ['release_notes/ver_1_0', 'release_notes/ver_0_17', 'release_notes/ver_0_16', 'release_notes/ver_0_15', 'release_notes/ver_0_14',
'Release Notes': ['release_notes/ver_1_1', 'release_notes/ver_1_0', 'release_notes/ver_0_17', 'release_notes/ver_0_16', 'release_notes/ver_0_15', 'release_notes/ver_0_14',
'release_notes/ver_0_13', 'release_notes/ver_0_12', 'release_notes/ver_0_11', 'release_notes/ver_0_10',
'release_notes/ver_0_9',
],