Mirror of https://github.com/clearml/clearml-docs (synced 2025-06-26 18:17:44 +00:00)
Commit 5ac6108623: Merge branch 'allegroai:main' into uri_links
@@ -12,12 +12,13 @@ in the UI and send it for long-term training on a remote machine.
 **If you are not that lucky**, this section is for you :)

 ## What does ClearML Session do?
-`clearml-session` is a feature that allows to launch a session of Jupyterlab and VS Code, and to execute code on a remote
+`clearml-session` is a feature that allows to launch a session of JupyterLab and VS Code, and to execute code on a remote
 machine that better meets resource needs. With this feature, local links are provided, which can be used to access
-JupyterLab and VS Code on a remote machine over a secure and encrypted SSH connection.
+JupyterLab and VS Code on a remote machine over a secure and encrypted SSH connection. By default, the JupyterLab and
+VS Code remote sessions use ports 8878 and 8898 respectively.

 <details className="cml-expansion-panel screenshot">
-<summary className="cml-expansion-panel-summary">Jupyter-Lab Window</summary>
+<summary className="cml-expansion-panel-summary">JupyterLab Window</summary>
 <div className="cml-expansion-panel-content">

@@ -138,7 +139,7 @@ The Task must be connected to a git repository, since currently single script de

 | Command line options | Description | Default value |
 |-----|---|---|
-| `--jupyter-lab` | Download a Jupyter-Lab environment | `true` |
+| `--jupyter-lab` | Download a JupyterLab environment | `true` |
 | `--vscode-server` | Download a VSCode environment | `true` |
 | `--public-ip` | Register the public IP of the remote machine (if you are running the session on a public cloud) | Session runs on the machine whose agent is executing the session|
 | `--init-script` | Specify a BASH init script file to be executed when the interactive session is being set up | `none` or previously entered BASH script |

@@ -6,7 +6,7 @@ ClearML Task is ClearML's Zero Code Integration Module. Using only the command l
 you can easily track your work and integrate ClearML with your existing code.

 `clearml-task` automatically integrates ClearML into any script or **any** python repository. `clearml-task` has the option
-to send the Task to a queue, where a **ClearML Agent** listening to the queue will fetch the Task it and executes it on a
+to send the task to a queue, where a **ClearML Agent** listening to the queue will fetch the task and execute it on a
 remote or local machine. It's even possible to provide command line arguments and provide Python module dependencies and requirements.txt file!

 ## How Does ClearML Task Work?
@@ -14,8 +14,8 @@ remote or local machine. It's even possible to provide command line arguments an
 1. Execute `clearml-task`, pointing it to your script or repository, and optionally an execution queue.
 1. `clearml-task` does its magic! It creates a new experiment on the [ClearML Server](../deploying_clearml/clearml_server.md),
 and, if a queue was specified, it sends the experiment to the queue to be fetched and executed by a **ClearML Agent**.
-1. The command line will provide you with a link to your Task's page in the ClearML web UI,
-where you will be able to view the Task's details.
+1. The command line will provide you with a link to your task's page in the ClearML web UI,
+where you will be able to view the task's details.

 ## Features and Options
 ### Docker
@@ -24,12 +24,12 @@ The ClearML Agent will pull it from dockerhub or a docker artifactory automatica

 ### Package Dependencies
 If the local script requires packages to be installed installed or the remote repository doesn't have a requirements.txt file,
-specify manually the required python packages using<br/>
+specify manually the required python packages using <br/>
 `--packages "<package_name>"`, for example `--packages "keras" "tensorflow>2.2"`.

 ### Queue
-Tasks are passed to ClearML Agents via [Queues](../fundamentals/agents_and_queues.md). Specify a queue to enqueue the Task to.
-If a queue isn't chosen in the `clearml-task` command, the Task will not be executed; it will be left in draft mode,
+Tasks are passed to ClearML Agents via [Queues](../fundamentals/agents_and_queues.md). Specify a queue to enqueue the task to.
+If a queue isn't chosen in the `clearml-task` command, the task will not be executed; it will be left in draft mode,
 and can be enqueued at a later point.

 ### Branch and Working Directory
@@ -37,5 +37,34 @@ A specific branch and commit ID, other than latest commit in master, to be execu
 `--branch <branch_name> --commit <commit_id>` flags.
 If unspecified, `clearml-task` will use the latest commit from the master branch.

-Learn how to use the `clearml-task` feature [here](../guides/clearml-task/clearml_task_tutorial.md).
+### Command line options
+
+<div className="tbl-cmd">
+
+|Name | Description| Optional |
+|---|----|---|
+| `--version` | Display the `clearml-task` utility version | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
+| `--project`| Set the project name for the task (Required, unless using `--base-task-id`) | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
+| `--name` | Select a name for the remote task | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
+| `--repo` | URL of remote repository. Example: `--repo https://github.com/allegroai/clearml.git` | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
+| `--branch` | Select specific repository branch / tag. By default, latest commit from the master branch | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
+| `--commit` | Select specific commit ID to use. By default, latest commit, or local commit ID when using local repository | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
+| `--folder` | Remotely execute the code in a local folder. Notice! It assumes a git repository already exists. Current state of the repo (commit ID and uncommitted changes) is logged and will be replicated on the remote machine | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
+| `--script` | Entry point script for the remote execution. When used in tandem with `--repo`, the script should be a relative path inside the repository. For example: `--script source/train.py`. When used with `--folder`, it supports a direct path to a file inside the local repository itself, for example: `--script ~/project/source/train.py` | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
+| `--cwd` | Working directory to launch the script from. Relative to repo root or local `--folder` | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
+| `--args` | Arguments to pass to the remote task, list of `<argument>=<value>` strings. Currently only argparse arguments are supported. Example: `--args lr=0.003 batch_size=64` | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
+| `--queue` | Select task's execution queue. If not provided, a task will be created but it will not be launched | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
+| `--requirements` | Specify `requirements.txt` file to install when setting the session. By default, the` requirements.txt` from the repository will be used | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
+| `--packages` | Manually specify a list of required packages. Example: `--packages "tqdm>=2.1" "scikit-learn"` | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
+| `--docker` | Select the docker image to use in the remote task | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
+| `--docker_args` | Add docker arguments, pass a single string | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
+| `--docker_bash_setup_script` | Add bash script to be executed inside the docker before setting up the task's environment | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
+| `--output-uri` | Set the task `output_uri`, upload destination for task models and artifacts (Optional) | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
+| `--task-type` | Set the task type. Optional values: training, testing, inference, data_processing, application, monitor, controller, optimizer, service, qc, custom | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
+| `--skip-task-init` | If set, `Task.init()` call is not added to the entry point, and is assumed to be called within the script. Default: Add `Task.init()` call to entry point script | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
+| `--base-task-id` | Use a pre-existing task in the system, instead of a local repo / script. Essentially clones an existing task and overrides arguments / requirements | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
+
+</div>
+
+## Tutorial
+Learn how to use the `clearml-task` feature [here](../guides/clearml-task/clearml_task_tutorial.md).

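Reviewer's sketch for the options table added in the hunk above: the helper `build_clearml_task_cmd` is hypothetical (it is not part of ClearML) and only assembles an argument list from flags that appear in the table (`--project`, `--name`, `--repo`, `--branch`, `--script`, `--args`, `--packages`, `--queue`). It does not run anything, and the project, task, and queue names are made-up placeholders.

```python
import shlex
from typing import Dict, List, Optional


def build_clearml_task_cmd(
    project: str,
    name: str,
    script: str,
    repo: Optional[str] = None,
    branch: Optional[str] = None,
    queue: Optional[str] = None,
    args: Optional[Dict[str, object]] = None,
    packages: Optional[List[str]] = None,
) -> List[str]:
    """Assemble a `clearml-task` argument list (hypothetical helper)."""
    cmd = ["clearml-task", "--project", project, "--name", name]
    if repo:
        cmd += ["--repo", repo]
    if branch:
        cmd += ["--branch", branch]
    cmd += ["--script", script]
    if args:
        # --args takes a list of <argument>=<value> strings
        cmd += ["--args"] + [f"{key}={value}" for key, value in args.items()]
    if packages:
        cmd += ["--packages"] + list(packages)
    if queue:
        # Per the table, without --queue the task is created but not launched
        cmd += ["--queue", queue]
    return cmd


# Placeholder values, chosen only to mirror the examples in the table
cmd = build_clearml_task_cmd(
    project="examples",
    name="remote train",
    script="source/train.py",
    repo="https://github.com/allegroai/clearml.git",
    branch="master",
    queue="default",
    args={"lr": 0.003, "batch_size": 64},
    packages=["tqdm>=2.1", "scikit-learn"],
)
print(shlex.join(cmd))  # shell-quoted form, suitable for pasting into a terminal
```

Returning a list rather than a single string keeps each value as its own argument, so names containing spaces survive without manual quoting.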
@@ -300,6 +300,7 @@ the watchdog marks them as `aborted`. The non-responsive experiment watchdog is

 Modify the following settings for the watchdog:

+* Watchdog status - enabled / disabled
 * The time threshold (in seconds) of experiment inactivity (default value is 7200 seconds (2 hours)).
 * The time interval (in seconds) between watchdog cycles.

@@ -312,6 +313,8 @@ Modify the following settings for the watchdog:

 tasks {
 non_responsive_tasks_watchdog {
+enabled: true
+
 # In-progress tasks that haven't been updated for at least 'value' seconds will be stopped by the watchdog
 threshold_sec: 7200

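To make the `threshold_sec` comment in the hunk above concrete, here is a minimal sketch of the inactivity check it describes. This is an illustration only, not the server's implementation; the function name `is_non_responsive` and the timestamps are assumptions.

```python
from datetime import datetime, timedelta, timezone


def is_non_responsive(last_update: datetime, now: datetime, threshold_sec: int = 7200) -> bool:
    """True if the task has gone at least `threshold_sec` seconds without an update."""
    return (now - last_update) >= timedelta(seconds=threshold_sec)


# Made-up reference time for the illustration
now = datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_non_responsive(now - timedelta(hours=3), now))     # idle longer than the 2-hour default
print(is_non_responsive(now - timedelta(minutes=30), now))  # updated recently
```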
docs/faq.md (25 changed lines)
@@ -94,6 +94,7 @@ title: FAQ
 * [How do I bypass a proxy configuration to access my local ClearML Server?](#proxy-localhost)
 * [Trains is failing to update ClearML Server. I get an error 500 (or 400). How do I fix this?](#elastic_watermark)
 * [Why is my Trains Web-App (UI) not showing any data?](#web-ui-empty)
+* [Why can't I access my ClearML Server when I run my code in a virtual machine?](#vm_server)

 **ClearML Agent**

@@ -816,7 +817,7 @@ Do the following:

 <br/>

-**The ClearML Server keeps returning HTTP 500 (or 400) errors. How do I fix this?**
+**The ClearML Server keeps returning HTTP 500 (or 400) errors. How do I fix this?** <a id="elastic_watermark"></a>

 The ClearML Server will return HTTP error responses (5XX, or 4XX) when some of its [backend components](deploying_clearml/clearml_server.md)
 are failing.
@@ -839,6 +840,28 @@ A likely indication of this situation can be determined by searching your clearm

 If your ClearML Web-App (UI) does not show anything, it may be an error authenticating with the server. Try clearing the application cookies for the site in your browser's developer tools.

+**Why can't I access my ClearML Server when I run my code in a virtual machine?** <a id="vm_server"></a>
+
+The network definitions inside a virtual machine (or container) are different from those of the host. The virtual machine's
+and the server machine's IP addresses are different, so you have to make sure that the machine that is executing the
+experiment can access the server's machine.
+
+Make sure to have an independent configuration file for the virtual machine where you are running your experiments.
+Edit the `api` section of your `clearml.conf` file and insert IP addresses of the server machine that are accessible
+from the VM. It should look something like this:
+
+```
+api {
+web_server: http://192.168.1.2:8080
+api_server: http://192.168.1.2:8008
+credentials {
+"access_key" = "KEY"
+"secret_key" = "SECRET"
+}
+}
+```
+
+
 ## ClearML Agent

 **How can I execute ClearML Agent without installing packages each time?** <a className="tr_top_negative" id="system_site_packages"></a>

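The virtual-machine FAQ entry added above comes down to whether the addresses in the `api` section are reachable from inside the VM. A minimal sketch of such a check, using only the Python standard library, is given here. The helper names `host_port` and `is_reachable` are hypothetical, and the addresses are the example values from the config snippet, not real endpoints.

```python
import socket
from typing import Tuple
from urllib.parse import urlparse


def host_port(url: str) -> Tuple[str, int]:
    """Split a server URL such as http://192.168.1.2:8008 into (host, port)."""
    parsed = urlparse(url)
    if parsed.hostname is None:
        raise ValueError(f"no host in URL: {url!r}")
    # Fall back to the scheme's default port when the URL does not name one
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    return parsed.hostname, port


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = host_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unroutable addresses
        return False
```

Calling `is_reachable("http://192.168.1.2:8008")` from inside the VM would then indicate whether the `api_server` address in `clearml.conf` can be reached. A successful TCP connection only shows that the port is open; it does not validate the credentials.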
@@ -30,7 +30,7 @@ pip install clearml-agent
 Connect the Agent to the server by [creating credentials](https://app.community.clear.ml/profile), then run this:

 ```bash
-clearml-init
+clearml-agent init
 ```

 :::note

@@ -9,7 +9,7 @@ The example script does the following:
 * Trains a simple deep neural network on the PyTorch built-in [MNIST](https://pytorch.org/docs/stable/torchvision/datasets.html#mnist)
 dataset.
 * Uses **ClearML** automatic logging.
-* Calls the [Logger.report_scalar](../../../references/sdk/logger.md#report_scalar) method to demonstrate explicit reporting and explicit reporting,
+* Calls the [Logger.report_scalar](../../../references/sdk/logger.md#report_scalar) method to demonstrate explicit reporting,
 which allows adding customized reporting to the code.
 * Creates an experiment named `pytorch mnist train`, which is associated with the `examples` project.

@@ -59,7 +59,7 @@ module.exports = {
 'guides/guidemain',
 {'Automation': ['guides/automation/manual_random_param_search_example', 'guides/automation/task_piping']},
 {'Data Management': ['guides/data management/data_man_simple', 'guides/data management/data_man_folder_sync', 'guides/data management/data_man_cifar_classification']},
-{'Clearml Task': ['guides/clearml-task/clearml_task_tutorial']},
+{'ClearML Task': ['guides/clearml-task/clearml_task_tutorial']},
 {'Distributed': ['guides/distributed/distributed_pytorch_example', 'guides/distributed/subprocess_example']},
 {'Docker': ['guides/docker/extra_docker_shell_script']},
 {'Frameworks': [
@@ -106,7 +106,7 @@ module.exports = {

 ],
 rnSidebar: {
-'Release Notes': ['release_notes/ver_1_0', 'release_notes/ver_0_17', 'release_notes/ver_0_16', 'release_notes/ver_0_15', 'release_notes/ver_0_14',
+'Release Notes': ['release_notes/ver_1_1', 'release_notes/ver_1_0', 'release_notes/ver_0_17', 'release_notes/ver_0_16', 'release_notes/ver_0_15', 'release_notes/ver_0_14',
 'release_notes/ver_0_13', 'release_notes/ver_0_12', 'release_notes/ver_0_11', 'release_notes/ver_0_10',
 'release_notes/ver_0_9',
 ],