diff --git a/docs/apps/clearml_session.md b/docs/apps/clearml_session.md index d440b91b..fe9e1d42 100644 --- a/docs/apps/clearml_session.md +++ b/docs/apps/clearml_session.md @@ -12,12 +12,13 @@ in the UI and send it for long-term training on a remote machine. **If you are not that lucky**, this section is for you :) ## What does ClearML Session do? -`clearml-session` is a feature that allows to launch a session of Jupyterlab and VS Code, and to execute code on a remote +`clearml-session` is a feature that allows launching a session of JupyterLab and VS Code, and executing code on a remote machine that better meets resource needs. With this feature, local links are provided, which can be used to access -JupyterLab and VS Code on a remote machine over a secure and encrypted SSH connection. +JupyterLab and VS Code on a remote machine over a secure and encrypted SSH connection. By default, the JupyterLab and +VS Code remote sessions use ports 8878 and 8898, respectively.
-Jupyter-Lab Window +JupyterLab Window
![image](../img/session_jupyter.png) @@ -138,7 +139,7 @@ The Task must be connected to a git repository, since currently single script de | Command line options | Description | Default value | |-----|---|---| -| `--jupyter-lab` | Download a Jupyter-Lab environment | `true` | +| `--jupyter-lab` | Download a JupyterLab environment | `true` | | `--vscode-server` | Download a VSCode environment | `true` | | `--public-ip` | Register the public IP of the remote machine (if you are running the session on a public cloud) | Session runs on the machine whose agent is executing the session| | `--init-script` | Specify a BASH init script file to be executed when the interactive session is being set up | `none` or previously entered BASH script | diff --git a/docs/apps/clearml_task.md b/docs/apps/clearml_task.md index b8aeb43b..458e84c2 100644 --- a/docs/apps/clearml_task.md +++ b/docs/apps/clearml_task.md @@ -6,7 +6,7 @@ ClearML Task is ClearML's Zero Code Integration Module. Using only the command l you can easily track your work and integrate ClearML with your existing code. `clearml-task` automatically integrates ClearML into any script or **any** python repository. `clearml-task` has the option -to send the Task to a queue, where a **ClearML Agent** listening to the queue will fetch the Task it and executes it on a +to send the task to a queue, where a **ClearML Agent** listening to the queue will fetch the task and execute it on a remote or local machine. It's even possible to provide command line arguments and provide Python module dependencies and requirements.txt file! ## How Does ClearML Task Work? @@ -14,8 +14,8 @@ remote or local machine. It's even possible to provide command line arguments an 1. Execute `clearml-task`, pointing it to your script or repository, and optionally an execution queue. 1. `clearml-task` does its magic! 
It creates a new experiment on the [ClearML Server](../deploying_clearml/clearml_server.md), and, if a queue was specified, it sends the experiment to the queue to be fetched and executed by a **ClearML Agent**. -1. The command line will provide you with a link to your Task's page in the ClearML web UI, - where you will be able to view the Task's details. +1. The command line will provide you with a link to your task's page in the ClearML web UI, + where you will be able to view the task's details. ## Features and Options ### Docker @@ -24,12 +24,12 @@ The ClearML Agent will pull it from dockerhub or a docker artifactory automatica ### Package Dependencies If the local script requires packages to be installed installed or the remote repository doesn't have a requirements.txt file, -specify manually the required python packages using
+manually specify the required Python packages using
`--packages ""`, for example `--packages "keras" "tensorflow>2.2"`. ### Queue -Tasks are passed to ClearML Agents via [Queues](../fundamentals/agents_and_queues.md). Specify a queue to enqueue the Task to. -If a queue isn't chosen in the `clearml-task` command, the Task will not be executed; it will be left in draft mode, +Tasks are passed to ClearML Agents via [Queues](../fundamentals/agents_and_queues.md). Specify a queue to enqueue the task to. +If a queue isn't chosen in the `clearml-task` command, the task will not be executed; it will be left in draft mode, and can be enqueued at a later point. ### Branch and Working Directory @@ -37,5 +37,34 @@ A specific branch and commit ID, other than latest commit in master, to be execu `--branch --commit ` flags. If unspecified, `clearml-task` will use the latest commit from the master branch. -Learn how to use the `clearml-task` feature [here](../guides/clearml-task/clearml_task_tutorial.md). +### Command line options +
+ +|Name | Description| Optional | +|---|----|---| +| `--version` | Display the `clearml-task` utility version | Yes | +| `--project`| Set the project name for the task (Required, unless using `--base-task-id`) | No | +| `--name` | Select a name for the remote task | No | +| `--repo` | URL of remote repository. Example: `--repo https://github.com/allegroai/clearml.git` | Yes | +| `--branch` | Select specific repository branch / tag. By default, latest commit from the master branch | Yes | +| `--commit` | Select specific commit ID to use. By default, latest commit, or local commit ID when using local repository | Yes | +| `--folder` | Remotely execute the code in a local folder. Notice! It assumes a git repository already exists. Current state of the repo (commit ID and uncommitted changes) is logged and will be replicated on the remote machine | Yes | +| `--script` | Entry point script for the remote execution. When used in tandem with `--repo`, the script should be a relative path inside the repository. For example: `--script source/train.py`. When used with `--folder`, it supports a direct path to a file inside the local repository itself, for example: `--script ~/project/source/train.py` | No | +| `--cwd` | Working directory to launch the script from. Relative to repo root or local `--folder` | Yes | +| `--args` | Arguments to pass to the remote task, list of `=` strings. Currently only argparse arguments are supported. Example: `--args lr=0.003 batch_size=64` | Yes | +| `--queue` | Select task's execution queue. If not provided, a task will be created but it will not be launched | Yes | +| `--requirements` | Specify `requirements.txt` file to install when setting up the session. By default, the `requirements.txt` from the repository will be used | Yes | +| `--packages` | Manually specify a list of required packages. 
Example: `--packages "tqdm>=2.1" "scikit-learn"` | Yes | +| `--docker` | Select the docker image to use in the remote task | Yes | +| `--docker_args` | Add docker arguments, pass a single string | Yes | +| `--docker_bash_setup_script` | Add bash script to be executed inside the docker before setting up the task's environment | Yes | +| `--output-uri` | Set the task `output_uri`, upload destination for task models and artifacts (Optional) | Yes | +| `--task-type` | Set the task type. Optional values: training, testing, inference, data_processing, application, monitor, controller, optimizer, service, qc, custom | Yes | +| `--skip-task-init` | If set, `Task.init()` call is not added to the entry point, and is assumed to be called within the script. Default: Add `Task.init()` call to entry point script | Yes | +| `--base-task-id` | Use a pre-existing task in the system, instead of a local repo / script. Essentially clones an existing task and overrides arguments / requirements | Yes | + +
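
As a rough illustration of how the options in the table combine, the sketch below assembles a `clearml-task` command line in Python and prints it for review. It only uses flags listed in the table. The helper names (`build_clearml_task_command`, `submit`), the task name, the `default` queue, and the `source/train.py` entry point are hypothetical placeholders, not part of ClearML, so treat this as a sketch under those assumptions and not as the recommended way to launch tasks.

```python
import shlex
import subprocess


def build_clearml_task_command(project, name, script, repo=None, branch=None,
                               args=None, packages=None, queue=None):
    """Assemble a clearml-task argument list (hypothetical helper, not a ClearML API)."""
    # --project, --name and --script are marked as non-optional in the table.
    cmd = ["clearml-task", "--project", project, "--name", name, "--script", script]
    if repo:
        cmd += ["--repo", repo]
    if branch:
        cmd += ["--branch", branch]
    if args:
        # Mirrors the table's example: --args lr=0.003 batch_size=64
        cmd += ["--args"] + [f"{key}={value}" for key, value in args.items()]
    if packages:
        cmd += ["--packages"] + list(packages)
    if queue:
        # Without --queue the task is created but not launched.
        cmd += ["--queue", queue]
    return cmd


def submit(cmd):
    """Run the assembled command; requires clearml-task to be installed and configured."""
    subprocess.run(cmd, check=True)


command = build_clearml_task_command(
    project="examples",                                # assumed project name
    name="remote training",                            # assumed task name
    script="source/train.py",                          # placeholder path from the table
    repo="https://github.com/allegroai/clearml.git",   # example URL from the table
    branch="master",
    args={"lr": 0.003, "batch_size": 64},
    packages=["tqdm>=2.1", "scikit-learn"],
    queue="default",                                   # assumed queue name
)

# Print a shell-quoted version of the command for inspection before running it.
print(shlex.join(command))
```

Printing the command first makes it easy to check the quoting of version specifiers such as `tqdm>=2.1` before calling `submit(command)`.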
+ +## Tutorial +Learn how to use the `clearml-task` feature [here](../guides/clearml-task/clearml_task_tutorial.md). diff --git a/docs/deploying_clearml/clearml_server_config.md b/docs/deploying_clearml/clearml_server_config.md index 5eb4b600..3d15d9ed 100644 --- a/docs/deploying_clearml/clearml_server_config.md +++ b/docs/deploying_clearml/clearml_server_config.md @@ -300,6 +300,7 @@ the watchdog marks them as `aborted`. The non-responsive experiment watchdog is Modify the following settings for the watchdog: +* Watchdog status - enabled / disabled * The time threshold (in seconds) of experiment inactivity (default value is 7200 seconds (2 hours)). * The time interval (in seconds) between watchdog cycles. @@ -312,6 +313,8 @@ Modify the following settings for the watchdog: tasks { non_responsive_tasks_watchdog { + enabled: true + # In-progress tasks that haven't been updated for at least 'value' seconds will be stopped by the watchdog threshold_sec: 7200 diff --git a/docs/faq.md b/docs/faq.md index b71fb78f..def29305 100644 --- a/docs/faq.md +++ b/docs/faq.md @@ -94,6 +94,7 @@ title: FAQ * [How do I bypass a proxy configuration to access my local ClearML Server?](#proxy-localhost) * [Trains is failing to update ClearML Server. I get an error 500 (or 400). How do I fix this?](#elastic_watermark) * [Why is my Trains Web-App (UI) not showing any data?](#web-ui-empty) +* [Why can't I access my ClearML Server when I run my code in a virtual machine?](#vm_server) **ClearML Agent** @@ -816,7 +817,7 @@ Do the following:
-**The ClearML Server keeps returning HTTP 500 (or 400) errors. How do I fix this?** +**The ClearML Server keeps returning HTTP 500 (or 400) errors. How do I fix this?** The ClearML Server will return HTTP error responses (5XX, or 4XX) when some of its [backend components](deploying_clearml/clearml_server.md) are failing. @@ -839,6 +840,28 @@ A likely indication of this situation can be determined by searching your clearm If your ClearML Web-App (UI) does not show anything, it may be an error authenticating with the server. Try clearing the application cookies for the site in your browser's developer tools. +**Why can't I access my ClearML Server when I run my code in a virtual machine?** + +The network definitions inside a virtual machine (or container) are different from those of the host. The virtual machine's +and the server machine's IP addresses are different, so you have to make sure that the machine that is executing the +experiment can access the server's machine. + +Make sure to have an independent configuration file for the virtual machine where you are running your experiments. +Edit the `api` section of your `clearml.conf` file and insert IP addresses of the server machine that are accessible +from the VM. 
It should look something like this: + +``` +api { + web_server: http://192.168.1.2:8080 + api_server: http://192.168.1.2:8008 + credentials { + "access_key" = "KEY" + "secret_key" = "SECRET" + } +} +``` + + ## ClearML Agent **How can I execute ClearML Agent without installing packages each time?** diff --git a/docs/getting_started/mlops/mlops_first_steps.md b/docs/getting_started/mlops/mlops_first_steps.md index 4b573b45..17fa158e 100644 --- a/docs/getting_started/mlops/mlops_first_steps.md +++ b/docs/getting_started/mlops/mlops_first_steps.md @@ -30,7 +30,7 @@ pip install clearml-agent Connect the Agent to the server by [creating credentials](https://app.community.clear.ml/profile), then run this: ```bash -clearml-init +clearml-agent init ``` :::note diff --git a/docs/guides/frameworks/pytorch/pytorch_mnist.md b/docs/guides/frameworks/pytorch/pytorch_mnist.md index c8b75191..d2dc9353 100644 --- a/docs/guides/frameworks/pytorch/pytorch_mnist.md +++ b/docs/guides/frameworks/pytorch/pytorch_mnist.md @@ -9,7 +9,7 @@ The example script does the following: * Trains a simple deep neural network on the PyTorch built-in [MNIST](https://pytorch.org/docs/stable/torchvision/datasets.html#mnist) dataset. * Uses **ClearML** automatic logging. -* Calls the [Logger.report_scalar](../../../references/sdk/logger.md#report_scalar) method to demonstrate explicit reporting and explicit reporting, +* Calls the [Logger.report_scalar](../../../references/sdk/logger.md#report_scalar) method to demonstrate explicit reporting, which allows adding customized reporting to the code. * Creates an experiment named `pytorch mnist train`, which is associated with the `examples` project. 
diff --git a/sidebars.js b/sidebars.js index 63b06510..9ee7ff60 100644 --- a/sidebars.js +++ b/sidebars.js @@ -59,7 +59,7 @@ module.exports = { 'guides/guidemain', {'Automation': ['guides/automation/manual_random_param_search_example', 'guides/automation/task_piping']}, {'Data Management': ['guides/data management/data_man_simple', 'guides/data management/data_man_folder_sync', 'guides/data management/data_man_cifar_classification']}, - {'Clearml Task': ['guides/clearml-task/clearml_task_tutorial']}, + {'ClearML Task': ['guides/clearml-task/clearml_task_tutorial']}, {'Distributed': ['guides/distributed/distributed_pytorch_example', 'guides/distributed/subprocess_example']}, {'Docker': ['guides/docker/extra_docker_shell_script']}, {'Frameworks': [ @@ -107,7 +107,7 @@ module.exports = { ], rnSidebar: { - 'Release Notes': ['release_notes/ver_1_0', 'release_notes/ver_0_17', 'release_notes/ver_0_16', 'release_notes/ver_0_15', 'release_notes/ver_0_14', + 'Release Notes': ['release_notes/ver_1_1', 'release_notes/ver_1_0', 'release_notes/ver_0_17', 'release_notes/ver_0_16', 'release_notes/ver_0_15', 'release_notes/ver_0_14', 'release_notes/ver_0_13', 'release_notes/ver_0_12', 'release_notes/ver_0_11', 'release_notes/ver_0_10', 'release_notes/ver_0_9', ],