merge
@ -17,7 +17,7 @@ title: ClearML Agent
|
||||
|
||||
**ClearML Agent** is a virtual environment and execution manager for DL / ML solutions on GPU machines. It integrates with the **ClearML Python Package** and ClearML Server to provide a full AI cluster solution. <br/>
|
||||
Its main focus is on:
|
||||
- Reproducing tasks, including their complete environments.
|
||||
- Reproducing task runs, including their complete environments.
|
||||
- Scaling workflows on multiple target machines.
|
||||
|
||||
ClearML Agent executes a task or other workflow by reproducing the state of the code from the original machine
|
||||
@ -46,7 +46,7 @@ install Python, so make sure to use a container or environment with the version
|
||||
While the agent is running, it continuously reports system metrics to the ClearML Server (these can be monitored in the
|
||||
[**Orchestration**](webapp/webapp_workers_queues.md) page).
|
||||
|
||||
Continue using ClearML Agent once it is running on a target machine. Reproduce tasks and execute
|
||||
Continue using ClearML Agent once it is running on a target machine. Reproduce task runs and execute
|
||||
automated workflows in one (or both) of the following ways:
|
||||
* Programmatically (using [`Task.enqueue()`](references/sdk/task.md#taskenqueue) or [`Task.execute_remotely()`](references/sdk/task.md#execute_remotely))
|
||||
* Through the ClearML Web UI (without working directly with code), by cloning tasks and enqueuing them to the
|
||||
|
@ -27,6 +27,7 @@ but can be overridden by command-line arguments.
|
||||
|**CLEARML_AGENT_DOCKER_ARGS_HIDE_ENV** | Hide Docker environment variables containing secrets when printing out the Docker command. When printed, the variable values will be replaced by `********`. See [`agent.hide_docker_command_env_vars`](../configs/clearml_conf.md#hide_docker) |
|
||||
|**CLEARML_AGENT_DISABLE_SSH_MOUNT** | Disables the auto `.ssh` mount into the docker |
|
||||
|**CLEARML_AGENT_FORCE_CODE_DIR**| Allows overriding the remote execution code directory to bypass repository cloning and use a repo already available where the remote agent is running. |
|
||||
|**CLEARML_AGENT_FORCE_UV**| If set to `1`, force the agent to use UV as the package manager. Overrides the default manager set in the [clearml.conf](../configs/clearml_conf.md) under `agent.package_manager.type` |
|
||||
|**CLEARML_AGENT_FORCE_EXEC_SCRIPT**| Allows overriding the remote execution script to bypass repository cloning and execute code already available where the remote agent is running. Use `module:file.py` format to specify a module and a script to execute (e.g. `.:main.py` to run `main.py` from the working dir)|
|
||||
|**CLEARML_AGENT_FORCE_TASK_INIT**| If set to `1`, ClearML Agent adds `Task.init()` to scripts that do not have the call, creating a Task to capture code execution information and output, which is then sent to the ClearML Server. If set to `0` and the script does not include `Task.init()`, the agent will capture only the output streams and console output, without tracking code execution details, metrics, or models. |
|
||||
|**CLEARML_AGENT_FORCE_SYSTEM_SITE_PACKAGES** | If set to `1`, overrides default [`agent.package_manager.system_site_packages: true`](../configs/clearml_conf.md#system_site_packages) behavior when running tasks in containers (docker mode and k8s-glue)|
|
||||
|
@ -13,6 +13,7 @@ multiple tasks (see [Virtual Environment Reuse](clearml_agent_env_caching.md#vir
|
||||
ClearML Agent supports working with one of the following package managers:
|
||||
* [`pip`](https://en.wikipedia.org/wiki/Pip_(package_manager)) (default)
|
||||
* [`conda`](https://docs.conda.io/en/latest/)
|
||||
* [`uv`](https://docs.astral.sh/uv/)
|
||||
* [`poetry`](https://python-poetry.org/)
|
||||
|
||||
To change the package manager used by the agent, edit the [`package_manager.type`](../configs/clearml_conf.md#agentpackage_manager)
|
||||
|
@ -80,7 +80,7 @@ For either setup, you can set up in your Enterprise ClearML Agent Helm chart the
|
||||
each queue. When a task is enqueued in ClearML, it translates into a Kubernetes pod running on the designated device
|
||||
with the specified fractional resource as defined in the Agent Helm chart.
|
||||
|
||||
#### MIG-enabled GPUs
|
||||
#### MIG-enabled GPUs
|
||||
The **ClearML Dynamic MIG Operator** (CDMO) chart enables running AI workloads on K8s with optimized hardware utilization
|
||||
and workload performance by facilitating MIG GPU partitioning. Make sure you have a [MIG capable GPU](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#supported-gpus).
|
||||
|
||||
@ -232,7 +232,7 @@ ranging from 2 GB to 12 GB (see [clearml-fractional-gpu repository](https://gith
|
||||
|
||||
This example runs the ClearML Ubuntu 22 with CUDA 12.3 container on GPU 0, which is limited to use up to 8GB of its memory.
|
||||
:::note
|
||||
--pid=host is required to allow the driver to differentiate between the container's processes and other host processes when limiting memory usage
|
||||
`--pid=host` is required to allow the driver to differentiate between the container's processes and other host processes when limiting memory usage
|
||||
:::
|
||||
1. Run the following command inside the container to verify that the fractional GPU memory limit is working correctly:
|
||||
```bash
|
||||
|
@ -40,7 +40,7 @@ it can't do that when running from a virtual environment.
|
||||
If the setup wizard's response indicates that a configuration file already exists, follow the instructions [here](#adding-clearml-agent-to-a-configuration-file).
|
||||
The wizard does not edit or overwrite existing configuration files.
|
||||
|
||||
1. At the command prompt `Paste copied configuration here:`, copy and paste the ClearML credentials and press **Enter**.
|
||||
1. At the command prompt `Paste copied configuration here:`, paste the ClearML credentials and press **Enter**.
|
||||
The setup wizard confirms the credentials.
|
||||
|
||||
```
|
||||
|
@ -2,7 +2,7 @@
|
||||
title: ClearML Python Package
|
||||
---
|
||||
|
||||
This is step-by-step guide for installing the `clearml` Python package and connecting it to the ClearML Server. Once done,
|
||||
This is a step-by-step guide for installing the `clearml` Python package and connecting it to the ClearML Server. Once done,
|
||||
you can integrate `clearml` into your code.
|
||||
|
||||
## Install ClearML
|
||||
@ -68,7 +68,7 @@ pip install clearml
|
||||
The **LOCAL PYTHON** tab shows the data required by the setup wizard (a copy to clipboard action is available on
|
||||
hover).
|
||||
|
||||
1. At the command prompt `Paste copied configuration here:`, copy and paste the ClearML credentials.
|
||||
1. At the command prompt `Paste copied configuration here:`, paste the ClearML credentials.
|
||||
The setup wizard verifies the credentials.
|
||||
```console
|
||||
Detected credentials key="********************" secret="*******"
|
||||
|
@ -71,8 +71,8 @@ optimization.
|
||||
from clearml import Task
|
||||
|
||||
task = Task.init(
|
||||
project_name='Hyper-Parameter Optimization',
|
||||
task_name='Automatic Hyper-Parameter Optimization',
|
||||
project_name='Hyperparameter Optimization',
|
||||
task_name='Automatic Hyperparameter Optimization',
|
||||
task_type=Task.TaskTypes.optimizer,
|
||||
reuse_last_task_id=False
|
||||
)
|
||||
|
@ -65,6 +65,7 @@ After invoking `Task.init` in a script, ClearML starts its automagical logging,
|
||||
* [argparse](../guides/reporting/hyper_parameters.md#argparse-command-line-options)
|
||||
* [Python Fire](../integrations/python_fire.md)
|
||||
* [LightningCLI](../integrations/pytorch_lightning.md)
|
||||
* [jsonargparse](../integrations/jsonargparse.md)
|
||||
* TensorFlow Definitions (`absl-py`)
|
||||
* [Hydra](../integrations/hydra.md) - ClearML logs the OmegaConf which holds all the configuration files, as well as values overridden during runtime.
|
||||
* **Models** - ClearML automatically logs and updates the models and all snapshot paths saved with the following frameworks:
|
||||
@ -74,6 +75,7 @@ After invoking `Task.init` in a script, ClearML starts its automagical logging,
|
||||
* [AutoKeras](../integrations/autokeras.md)
|
||||
* [CatBoost](../integrations/catboost.md)
|
||||
* [Fast.ai](../integrations/fastai.md)
|
||||
* [Hugging Face Transformers](../integrations/transformers.md)
|
||||
* [LightGBM](../integrations/lightgbm.md)
|
||||
* [MegEngine](../integrations/megengine.md)
|
||||
* [MONAI](../integrations/monai.md)
|
||||
|
@ -15,7 +15,7 @@ The following page goes over how to set up and upgrade `clearml-serving`.
|
||||
[free hosted service](https://app.clear.ml)
|
||||
1. Connect the `clearml` SDK to the server. See instructions [here](../clearml_sdk/clearml_sdk_setup#install-clearml)
|
||||
|
||||
1. Install clearml-serving CLI:
|
||||
1. Install the `clearml-serving` CLI:
|
||||
|
||||
```bash
|
||||
pip3 install clearml-serving
|
||||
@ -27,21 +27,22 @@ The following page goes over how to set up and upgrade `clearml-serving`.
|
||||
clearml-serving create --name "serving example"
|
||||
```
|
||||
|
||||
The new serving service UID should be printed
|
||||
This command prints the Serving Service UID:
|
||||
|
||||
```console
|
||||
New Serving Service created: id=aa11bb22aa11bb22
|
||||
```
|
||||
|
||||
Write down the Serving Service UID
|
||||
Copy the Serving Service UID (e.g., `aa11bb22aa11bb22`), as you will need it in the next steps.
|
||||
|
||||
1. Clone the `clearml-serving` repository:
|
||||
```bash
|
||||
git clone https://github.com/clearml/clearml-serving.git
|
||||
```
|
||||
|
||||
1. Edit the environment variables file (docker/example.env) with your clearml-server credentials and Serving Service UID.
|
||||
For example, you should have something like
|
||||
1. Edit the environment variables file (`docker/example.env`) with your `clearml-server` API credentials and Serving Service UID.
|
||||
For example:
|
||||
|
||||
```bash
|
||||
cat docker/example.env
|
||||
```
|
||||
@ -55,31 +56,30 @@ The following page goes over how to set up and upgrade `clearml-serving`.
|
||||
CLEARML_SERVING_TASK_ID="<serving_service_id_here>"
|
||||
```
|
||||
|
||||
1. Spin up the `clearml-serving` containers with `docker-compose` (or if running on Kubernetes, use the helm chart)
|
||||
1. Spin up the `clearml-serving` containers with `docker-compose` (or if running on Kubernetes, use the helm chart):
|
||||
|
||||
```bash
|
||||
cd docker && docker-compose --env-file example.env -f docker-compose.yml up
|
||||
```
|
||||
|
||||
If you need Triton support (keras/pytorch/onnx etc.), use the triton docker-compose file
|
||||
If you need Triton support (Keras/PyTorch/ONNX etc.), use the Triton `docker-compose` file:
|
||||
```bash
|
||||
cd docker && docker-compose --env-file example.env -f docker-compose-triton.yml up
|
||||
```
|
||||
|
||||
If running on a GPU instance with Triton support (keras/pytorch/onnx etc.), use the triton gpu docker-compose file:
|
||||
If running on a GPU instance with Triton support (Keras/PyTorch/ONNX etc.), use the Triton GPU `docker-compose` file:
|
||||
```bash
|
||||
cd docker && docker-compose --env-file example.env -f docker-compose-triton-gpu.yml up
|
||||
```
|
||||
|
||||
:::note
|
||||
Any model that registers with Triton engine will run the pre/post-processing code on the Inference service container,
|
||||
Any model that registers with Triton engine will run the pre/post-processing code in the Inference service container,
|
||||
and the model inference itself will be executed in the Triton Engine container.
|
||||
:::
|
||||
|
||||
## Advanced Setup - S3/GS/Azure Access (Optional)
|
||||
To add access credentials and allow the inference containers to download models from your S3/GS/Azure object-storage,
|
||||
add the respective environment variables to your env files (example.env). For further details, see
|
||||
[Configuring Storage](../integrations/storage.md#configuring-storage).
|
||||
To enable inference containers to download models from S3, Google Cloud Storage (GS), or Azure,
|
||||
add the access credentials as the respective environment variables in your env file (`example.env`):
|
||||
|
||||
```
|
||||
AWS_ACCESS_KEY_ID
|
||||
@ -92,14 +92,21 @@ AZURE_STORAGE_ACCOUNT
|
||||
AZURE_STORAGE_KEY
|
||||
```
|
||||
|
||||
For further details, see [Configuring Storage](../integrations/storage.md#configuring-storage).
|
||||
|
||||
## Upgrading ClearML Serving
|
||||
|
||||
**Upgrading to v1.1**
|
||||
|
||||
1. Take down the serving containers (`docker-compose` or k8s)
|
||||
1. Update the `clearml-serving` CLI `pip3 install -U clearml-serving`
|
||||
1. Shut down the serving containers (`docker-compose` or k8s)
|
||||
1. Update the `clearml-serving` CLI:
|
||||
|
||||
```
|
||||
pip3 install -U clearml-serving
|
||||
```
|
||||
|
||||
1. Re-add a single existing endpoint with `clearml-serving model add ...` (press yes when asked). It will upgrade the
|
||||
`clearml-serving` session definitions
|
||||
`clearml-serving` session definitions.
|
||||
1. Pull the latest serving containers (`docker-compose pull ...` or k8s)
|
||||
1. Re-spin serving containers (`docker-compose` or k8s)
|
||||
|
||||
|
@ -212,7 +212,7 @@ Example:
|
||||
ClearML serving instances automatically send serving statistics (count/latency) to Prometheus, and Grafana can be used
|
||||
to visualize and create live dashboards.
|
||||
|
||||
The default docker-compose installation is preconfigured with Prometheus and Grafana. Notice that by default data/ate
|
||||
The default `docker-compose` installation is preconfigured with Prometheus and Grafana. Notice that by default the data/state
|
||||
of both containers is *not* persistent. To add persistence, it is recommended to add a volume mount.
|
||||
|
||||
You can also add many custom metrics on the input/predictions of your models. Once a model endpoint is registered,
|
||||
|
@ -22,7 +22,7 @@ The values in the ClearML configuration file can be overridden by environment va
|
||||
and command-line arguments.
|
||||
:::
|
||||
|
||||
# Editing Your Configuration File
|
||||
## Editing Your Configuration File
|
||||
|
||||
To add, change, or delete options, edit your configuration file.
|
||||
|
||||
@ -515,8 +515,12 @@ These settings define which Docker image and arguments should be used unless [ex
|
||||
|
||||
**`agent.package_manager`** (*dict*)
|
||||
|
||||
* Dictionary containing the options for the Python package manager. The currently supported package managers are pip, conda,
|
||||
and, if the repository contains a `poetry.lock` file, poetry.
|
||||
* Dictionary containing the options for the Python package manager.
|
||||
* The currently supported package managers are
|
||||
* pip
|
||||
* conda
|
||||
* uv, if the root repository contains a `uv.lock` or `pyproject.toml` file
|
||||
* poetry, if the repository contains a `poetry.lock` or `pyproject.toml` file
|
||||
|
||||
---
|
||||
|
||||
@ -661,13 +665,38 @@ Torch Nightly builds are ephemeral and are deleted from time to time.
|
||||
* `pip`
|
||||
* `conda`
|
||||
* `poetry`
|
||||
* `uv`
|
||||
|
||||
* If `pip` or `conda` is used, the agent installs the required packages based on the "Python Packages" section of the
|
||||
Task. If the "Python Packages" section is empty, it will revert to using `requirements.txt` from the repository's root
|
||||
directory. If `poetry` is selected, and the root repository contains `poetry.lock` or `pyproject.toml`, the "Python
|
||||
directory.
|
||||
* If `poetry` is selected, and the root repository contains `poetry.lock` or `pyproject.toml`, the "Python
|
||||
Packages" section is ignored, and `poetry` is used. If `poetry` is selected and no lock file is found, it reverts to
|
||||
`pip` package manager behavior.
|
||||
|
||||
* If `uv` is selected, and the root repository contains `uv.lock` or `pyproject.toml`, the "Python
|
||||
Packages" section is ignored, and `uv` is used. If `uv` is selected and no lock file is found, it reverts to
|
||||
`pip` package manager behavior.
|
||||
|
||||
---
|
||||
|
||||
**`agent.package_manager.uv_files_from_repo_working_dir`** (*bool*)
|
||||
|
||||
* If set to `true`, the agent will look for the `uv.lock` or `pyproject.toml` file in the provided directory path instead of
|
||||
the repository's root directory.
|
||||
|
||||
---
|
||||
|
||||
**`agent.package_manager.uv_sync_extra_args`** (*list*)
|
||||
|
||||
* A list of extra command-line arguments to pass when using `uv`.
|
||||
|
||||
---
|
||||
|
||||
**`agent.package_manager.uv_version`** (*string*)
|
||||
|
||||
* The `uv` version requirements. For example, `">0.4"`, `"==0.4"`, `""` (empty string will install the latest version).
|
||||
|
||||
|
||||
<br/>
|
||||
|
||||
#### agent.pip_download_cache
|
||||
@ -1548,7 +1577,7 @@ environment {
|
||||
}
|
||||
```
|
||||
|
||||
### files section
|
||||
### files section
|
||||
|
||||
**`files`** (*dict*)
|
||||
|
||||
|
@ -4,7 +4,7 @@ title: ClearML Server
|
||||
|
||||
## What is ClearML Server?
|
||||
The ClearML Server is the backend service infrastructure for ClearML. It allows multiple users to collaborate and
|
||||
manage their tasks by working seamlessly with the ClearML Python package and [ClearML Agent](../clearml_agent.md).
|
||||
manage their tasks by working seamlessly with the [ClearML Python package](../clearml_sdk/clearml_sdk_setup.md) and [ClearML Agent](../clearml_agent.md).
|
||||
|
||||
ClearML Server is composed of the following:
|
||||
* Web server including the [ClearML Web UI](../webapp/webapp_overview.md), which is the user interface for tracking, comparing, and managing tasks.
|
||||
|
@ -233,7 +233,7 @@ The following example, which is based on AWS load balancing, demonstrates the co
|
||||
|
||||
|
||||
|
||||
### Opening Elasticsearch, MongoDB, and Redis for External Access
|
||||
### Opening Elasticsearch, MongoDB, and Redis for External Access
|
||||
|
||||
For improved security, the ports for ClearML Server Elasticsearch, MongoDB, and Redis servers are not exposed by default;
|
||||
they are only open internally in the docker network. If external access is needed, open these ports (but make sure to
|
||||
|
@ -5,8 +5,8 @@ title: Linux and macOS
|
||||
Deploy the ClearML Server in Linux or macOS using the pre-built Docker image.
|
||||
|
||||
For ClearML docker images, including previous versions, see [https://hub.docker.com/r/allegroai/clearml](https://hub.docker.com/r/allegroai/clearml).
|
||||
However, pulling the ClearML Docker image directly is not required. ClearML provides a docker-compose YAML file that does this.
|
||||
The docker-compose file is included in the instructions on this page.
|
||||
However, pulling the ClearML Docker image directly is not required. ClearML provides a `docker-compose` YAML file that does this.
|
||||
The `docker-compose` file is included in the instructions on this page.
|
||||
|
||||
For information about upgrading ClearML Server in Linux or macOS, see [here](upgrade_server_linux_mac.md).
|
||||
|
||||
@ -134,7 +134,7 @@ Deploying the server requires a minimum of 8 GB of memory, 16 GB is recommended.
|
||||
sudo chown -R $(whoami):staff /opt/clearml
|
||||
```
|
||||
|
||||
2. Download the ClearML Server docker-compose YAML file.
|
||||
2. Download the ClearML Server `docker-compose` YAML file:
|
||||
```
|
||||
sudo curl https://raw.githubusercontent.com/clearml/clearml-server/master/docker/docker-compose.yml -o /opt/clearml/docker-compose.yml
|
||||
```
|
||||
|
@ -54,7 +54,7 @@ Deploying the server requires a minimum of 8 GB of memory, 16 GB is recommended.
|
||||
mkdir c:\opt\clearml\logs
|
||||
```
|
||||
|
||||
1. Save the ClearML Server docker-compose YAML file.
|
||||
1. Save the ClearML Server `docker-compose` YAML file.
|
||||
|
||||
```
|
||||
curl https://raw.githubusercontent.com/clearml/clearml-server/master/docker/docker-compose-win10.yml -o c:\opt\clearml\docker-compose-win10.yml
|
||||
|
@ -29,12 +29,12 @@ The `General` section is the root-level section of the configuration file, and c
|
||||
* `id` - A unique id for the application
|
||||
* `name` - The name to display in the web application
|
||||
* `version` - The version of the application implementation. It is recommended to use three numbers and to bump it when updating applications, so that older running instances can still be displayed
|
||||
* `provider` - The person/team/group who is the owner of the application. This will appears in the UI
|
||||
* `provider` - The person/team/group who is the owner of the application. This will appear in the UI
|
||||
* `description` - Short description of the application to be displayed in the ClearML Web UI
|
||||
* `icon` (*Optional*) - Small image to display in the ClearML web UI as an icon for the application. Can be a public web url or an image in the application’s assets directory (described below)
|
||||
* `no_info_html` (*Optional*) - HTML content to display as a placeholder for the dashboard when no instance is available. Can be a public web url or a file in the application’s assets directory (described below)
|
||||
* `default-queue` - The queue to which the application instance will be sent when launching a new instance. This queue should have an appropriate agent servicing it. See details in the Custom Apps Agent section below.
|
||||
* `badges` (*Optional*) - List of strings to display as a bacge/label in the UI
|
||||
* `badges` (*Optional*) - List of strings to display as a badge/label in the UI
|
||||
* `resumable` - Boolean indicating whether a running application instance can be restarted if required. Default is false.
|
||||
* `category` (*Optional*) - Way to separate apps into different tabs in the ClearML web UI
|
||||
* `featured` (*Optional*) - Value affecting the order of applications. Lower values are displayed first. Defaults to 500
|
||||
@ -61,7 +61,7 @@ The `task` section describes the task to run, containing the following fields:
|
||||
* `branch` - The branch to use
|
||||
* `entry_point` - The python file to run
|
||||
* `working_dir` - The directory to run it from
|
||||
* `hyperparams` (*Optional*) - A list of the task’s hyper-parameters used by the application, with their default values. There is no need to specify all the parameters here, but it enables summarizing of the parameters that will be targeted by the wizard entries described below, and allows to specify default values to optional parameters appearing in the wizard.
|
||||
* `hyperparams` (*Optional*) - A list of the task’s hyperparameters used by the application, with their default values. There is no need to specify all the parameters here, but it enables summarizing of the parameters that will be targeted by the wizard entries described below, and allows to specify default values to optional parameters appearing in the wizard.
|
||||
|
||||
#### Example
|
||||
The `task` section in the simple application example:
|
||||
@ -120,8 +120,8 @@ The `wizard` section defines the entries to display in the application instance
|
||||
* `model`
|
||||
* `queue`
|
||||
* `dataset_version`
|
||||
* `display_field` - The field of the source object to display in the list. Usually “name”
|
||||
* `value_field` - The field of the source object to use for configuring the app instance. Usually “id”
|
||||
* `display_field` - The field of the source object to display in the list. Usually "name"
|
||||
* `value_field` - The field of the source object to use for configuring the app instance. Usually "id"
|
||||
* `filter` - Allows limiting the choices list by setting a filter on one or more of the object’s fields. See the Project Selection example below
|
||||
* `target` - Where in the application instance’s task the values will be set. Contains the following:
|
||||
* `field` - Either `configuration` or `hyperparams`
|
||||
@ -264,7 +264,7 @@ The dashboard elements are organized into lines.
|
||||
|
||||
The section contains the following information:
|
||||
* `lines` - The array of line elements, each containing:
|
||||
* `style` - CSS definitions for the line e.g setting the line height
|
||||
* `style` - CSS definitions for the line e.g. setting the line height
|
||||
* `contents` - An array of dashboard elements to display in a given line. Each element may have several fields:
|
||||
* `title` - Text to display at the top of the field
|
||||
* `type` - one of the following:
|
||||
@ -276,7 +276,7 @@ The section contains the following information:
|
||||
* hyperparameter
|
||||
* configuration
|
||||
* html
|
||||
* `text` - For HTML. You can refer to task elements such as hyper-parameters by using `${hyperparams.<section>.<parameter name>.value}`
|
||||
* `text` - For HTML. You can refer to task elements such as hyperparameters by using `${hyperparams.<section>.<parameter name>.value}`
|
||||
* `metric` - For plot, scalar-histogram, debug-images, scalar - Name of the metric
|
||||
* `variant` - For plot, scalar-histogram, debug-images, scalar - List of variants to display
|
||||
* `key` - For histograms, one of the following: `iter`, `timestamp`, or `iso_time`
|
||||
|
@ -13,7 +13,7 @@ without any coding. Applications are installed on top of the ClearML Server.
|
||||
To run applications, you will need the following:
|
||||
* RAM: Make sure you have at least 400 MB of RAM per application instance.
|
||||
* Applications Service: Make sure that the applications agent service is up and running on your server:
|
||||
* If you are using a docker-compose solution, make sure that the clearml-apps-agent service is running.
|
||||
* If you are using a `docker-compose` solution, make sure that the clearml-apps-agent service is running.
|
||||
* If you are using a Kubernetes cluster, check for the clearml-clearml-enterprise-apps component.
|
||||
* Installation Files: Each application has its installation zip file. Make sure you have the relevant files for the
|
||||
applications you wish to install.
|
||||
|
@ -30,12 +30,12 @@ their instances:
|
||||
* [Embedding Model Deployment](../../webapp/applications/apps_embed_model_deployment.md)
|
||||
* [Llama.cpp Model Deployment](../../webapp/applications/apps_llama_deployment.md)
|
||||
|
||||
The AI Application Gateway is provided through an additional component to the ClearML Server deployment: The ClearML Task Traffic Router.
|
||||
If your ClearML Deployment does not have the Task Traffic Router properly installed, these application instances may not be accessible.
|
||||
The AI Application Gateway requires an additional component to the ClearML Server deployment: the **ClearML App Gateway Router**.
|
||||
If your ClearML Deployment does not have the App Gateway Router properly installed, these application instances may not be accessible.
|
||||
|
||||
#### Installation
|
||||
|
||||
The Task Traffic Router supports two deployment options:
|
||||
The App Gateway Router supports two deployment options:
|
||||
|
||||
* [Docker Compose](appgw_install_compose.md)
|
||||
* [Kubernetes](appgw_install_k8s.md)
|
||||
|
@ -13,11 +13,11 @@ The Application Gateway is available under the ClearML Enterprise plan.
|
||||
* Credentials for the ClearML/allegroai docker repository
|
||||
* A valid ClearML Server installation
|
||||
|
||||
## Host configurations
|
||||
## Host Configurations
|
||||
|
||||
### Docker installation
|
||||
### Docker Installation
|
||||
|
||||
Installing docker and docker-compose might vary depending on the specific operating system you’re using. Here is an example for AmazonLinux:
|
||||
Installing `docker` and `docker-compose` might vary depending on the specific operating system you’re using. Here is an example for AmazonLinux:
|
||||
|
||||
```
|
||||
sudo dnf -y install docker
|
||||
@ -33,87 +33,82 @@ sudo docker login
|
||||
|
||||
Use the ClearML/allegroai DockerHub credentials when prompted by `docker login`.
|
||||
|
||||
### Docker-compose file
|
||||
### Docker-compose File
|
||||
|
||||
This is an example of the docker-compose file you will need:
|
||||
This is an example of the `docker-compose` file you will need:
|
||||
|
||||
```
|
||||
version: '3.5'
|
||||
services:
|
||||
task_traffic_webserver:
|
||||
image: allegroai/task-traffic-router-webserver:${TASK-TRAFFIC-ROUTER-WEBSERVER-TAG}
|
||||
ports:
|
||||
- "80:8080"
|
||||
restart: unless-stopped
|
||||
container_name: task_traffic_webserver
|
||||
volumes:
|
||||
- ./task_traffic_router/config/nginx:/etc/nginx/conf.d:ro
|
||||
- ./task_traffic_router/config/lua:/usr/local/openresty/nginx/lua:ro
|
||||
task_traffic_router:
|
||||
image: allegroai/task-traffic-router:${TASK-TRAFFIC-ROUTER-TAG}
|
||||
restart: unless-stopped
|
||||
container_name: task_traffic_router
|
||||
volumes:
|
||||
- /var/run/docker.sock:/var/run/docker.sock
|
||||
- ./task_traffic_router/config/nginx:/etc/nginx/conf.d:rw
|
||||
- ./task_traffic_router/config/lua:/usr/local/openresty/nginx/lua:rw
|
||||
environment:
|
||||
- LOGGER_LEVEL=INFO
|
||||
- CLEARML_API_HOST=${CLEARML_API_HOST:?err}
|
||||
- CLEARML_API_ACCESS_KEY=${CLEARML_API_ACCESS_KEY:?err}
|
||||
- CLEARML_API_SECRET_KEY=${CLEARML_API_SECRET_KEY:?err}
|
||||
- ROUTER_URL=${ROUTER_URL:?err}
|
||||
- ROUTER_NAME=${ROUTER_NAME:?err}
|
||||
- AUTH_ENABLED=${AUTH_ENABLED:?err}
|
||||
- SSL_VERIFY=${SSL_VERIFY:?err}
|
||||
- AUTH_COOKIE_NAME=${AUTH_COOKIE_NAME:?err}
|
||||
- AUTH_BASE64_JWKS_KEY=${AUTH_BASE64_JWKS_KEY:?err}
|
||||
- LISTEN_QUEUE_NAME=${LISTEN_QUEUE_NAME}
|
||||
- EXTRA_BASH_COMMAND=${EXTRA_BASH_COMMAND}
|
||||
- TCP_ROUTER_ADDRESS=${TCP_ROUTER_ADDRESS}
|
||||
- TCP_PORT_START=${TCP_PORT_START}
|
||||
- TCP_PORT_END=${TCP_PORT_END}
|
||||
|
||||
task_traffic_webserver:
|
||||
image: clearml/ai-gateway-proxy:${PROXY_TAG:?err}
|
||||
network_mode: "host"
|
||||
restart: unless-stopped
|
||||
container_name: task_traffic_webserver
|
||||
volumes:
|
||||
- ./task_traffic_router/config/nginx:/etc/nginx/conf.d:ro
|
||||
- ./task_traffic_router/config/lua:/usr/local/openresty/nginx/lua:ro
|
||||
task_traffic_router:
|
||||
image: clearml/ai-gateway-router:${ROUTER_TAG:?err}
|
||||
restart: unless-stopped
|
||||
container_name: task_traffic_router
|
||||
volumes:
|
||||
- /var/run/docker.sock:/var/run/docker.sock
|
||||
- ./task_traffic_router/config/nginx:/etc/nginx/conf.d:rw
|
||||
- ./task_traffic_router/config/lua:/usr/local/openresty/nginx/lua:rw
|
||||
environment:
|
||||
- ROUTER_NAME=${ROUTER_NAME:?err}
|
||||
- ROUTER__WEBSERVER__SERVER_PORT=${ROUTER__WEBSERVER__SERVER_PORT:?err}
|
||||
- ROUTER_URL=${ROUTER_URL:?err}
|
||||
- CLEARML_API_HOST=${CLEARML_API_HOST:?err}
|
||||
- CLEARML_API_ACCESS_KEY=${CLEARML_API_ACCESS_KEY:?err}
|
||||
- CLEARML_API_SECRET_KEY=${CLEARML_API_SECRET_KEY:?err}
|
||||
- AUTH_COOKIE_NAME=${AUTH_COOKIE_NAME:?err}
|
||||
- AUTH_SECURE_ENABLED=${AUTH_SECURE_ENABLED}
|
||||
- TCP_ROUTER_ADDRESS=${TCP_ROUTER_ADDRESS}
|
||||
- TCP_PORT_START=${TCP_PORT_START}
|
||||
- TCP_PORT_END=${TCP_PORT_END}
|
||||
```
|
||||
|
||||
Create a *runtime.env* file containing the following entries:
|
||||
Create a `runtime.env` file containing the following entries:
|
||||
|
||||
```
|
||||
TASK-TRAFFIC-ROUTER-WEBSERVER-TAG=
|
||||
TASK-TRAFFIC-ROUTER-TAG=
|
||||
CLEARML_API_HOST=https://api.
|
||||
PROXY_TAG=
|
||||
ROUTER_TAG=
|
||||
ROUTER_NAME=main-router
|
||||
ROUTER__WEBSERVER__SERVER_PORT=8010
|
||||
ROUTER_URL=
|
||||
CLEARML_API_HOST=
|
||||
CLEARML_API_ACCESS_KEY=
|
||||
CLEARML_API_SECRET_KEY=
|
||||
ROUTER_URL=
|
||||
ROUTER_NAME=main-router
|
||||
AUTH_ENABLED=true
|
||||
SSL_VERIFY=true
|
||||
AUTH_COOKIE_NAME=
|
||||
AUTH_BASE64_JWKS_KEY=
|
||||
LISTEN_QUEUE_NAME=
|
||||
EXTRA_BASH_COMMAND=
|
||||
AUTH_SECURE_ENABLED=true
|
||||
TCP_ROUTER_ADDRESS=
|
||||
TCP_PORT_START=
|
||||
TCP_PORT_END=
|
||||
```
|
||||
|
||||
Edit it according to the following guidelines:
|
||||
|
||||
* `CLEARML_API_HOST`: URL usually starting with `https://api.`
|
||||
* `CLEARML_API_ACCESS_KEY`: ClearML server api key
|
||||
* `CLEARML_API_SECRET_KEY`: ClearML server secret key
|
||||
* `ROUTER_URL`: URL for this router that was previously configured in the load balancer starting with `https://`
|
||||
* `ROUTER_NAME`: unique name for this router
|
||||
* `AUTH_ENABLED`: enable or disable http calls authentication when the router is communicating with the ClearML server
|
||||
* `SSL_VERIFY`: enable or disable SSL certificate validation when the router is communicating with the ClearML server
|
||||
* `AUTH_COOKIE_NAME`: the cookie name used by the ClearML server to store the ClearML authentication cookie. This can usually be found in the `value_prefix` key starting with `allegro_token` in `envoy.yaml` file in the ClearML server installation (`/opt/allegro/config/envoy/envoy.yaml`) (see below)
|
||||
* `AUTH_SECURE_ENABLED`: enable the Set-Cookie `secure` parameter
|
||||
* `AUTH_BASE64_JWKS_KEY`: value form `k` key in the `jwks.json` file in the ClearML server installation
|
||||
* `LISTEN_QUEUE_NAME`: (optional) name of queue to check for tasks (if none, every task is checked)
|
||||
* `EXTRA_BASH_COMMAND`: command to be launched before starting the router
|
||||
* `TCP_ROUTER_ADDRESS`: router external address, can be an IP or the host machine or a load balancer hostname, depends on network configuration
|
||||
* `TCP_PORT_START`: start port for the TCP Session feature
|
||||
* `TCP_PORT_END`: end port port for the TCP Session feature
|
||||
**Configuration Options:**
|
||||
* `PROXY_TAG`: AI Application Gateway proxy tag. The Docker image tag for the proxy component, which needs to be
|
||||
specified during installation. This tag is provided by ClearML to ensure compatibility with the recommended version.
|
||||
* `ROUTER_TAG`: App Gateway Router tag. The Docker image tag for the router component. It defines the specific version
|
||||
to be installed and is provided by ClearML as part of the setup process.
|
||||
* `ROUTER_NAME`: In the case of [multiple routers on the same tenant](#multiple-routers-in-the-same-tenant), each router
|
||||
needs to have a unique name.
|
||||
* `ROUTER__WEBSERVER__SERVER_PORT`: Webserver port. The default port is 8080, but it can be adjusted to meet specific network requirements.
|
||||
* `ROUTER_URL`: External address to access the router. This can be the IP address or DNS of the node where the router
|
||||
is running, or the address of a load balancer if the router operates behind a proxy/load balancer. This URL is used
|
||||
to access AI workload applications (e.g. remote IDE, model deployment, etc.), so it must be reachable and resolvable for them.
|
||||
* `CLEARML_API_HOST`: ClearML API server URL starting with `https://api.`
|
||||
* `CLEARML_API_ACCESS_KEY`: ClearML server API key.
|
||||
* `CLEARML_API_SECRET_KEY`: ClearML server secret key.
|
||||
* `AUTH_COOKIE_NAME`: Cookie used by the ClearML server to store the ClearML authentication cookie. This can usually be
|
||||
found in the `envoy.yaml` file in the ClearML server installation (`/opt/allegro/config/envoy/envoy.yaml`), under the
|
||||
`value_prefix` key starting with `allegro_token`
|
||||
* `AUTH_SECURE_ENABLED`: Enable the Set-Cookie `secure` parameter. Set to `false` in case services are exposed with `http`.
|
||||
* `TCP_ROUTER_ADDRESS`: Router external address. This can be an IP or hostname of the host machine, or a load balancer hostname, depending on the network configuration
|
||||
* `TCP_PORT_START`: Start port for the TCP Session feature
|
||||
* `TCP_PORT_END`: End port for the TCP Session feature
|
||||
|
||||
Run the following command to start the router:
|
||||
|
||||
@ -121,12 +116,42 @@ Run the following command to start the router:
|
||||
sudo docker compose --env-file runtime.env up -d
|
||||
```
|
||||
|
||||
:::Note How to find my jwkskey
|
||||
### Advanced Configuration
|
||||
|
||||
The *JSON Web Key Set* (*JWKS*) is a set of keys containing the public keys used to verify any JSON Web Token (JWT).
|
||||
#### Using Open HTTP
|
||||
|
||||
In a docker-compose server installation, this can be found in the `CLEARML__secure__auth__token_secret` env var in the apiserver server component.
|
||||
To deploy the App Gateway Router on open HTTP (without a certificate), set the `AUTH_SECURE_ENABLED` entry
|
||||
to `false` in the `runtime.env` file.
|
||||
|
||||
:::
|
||||
#### Multiple Routers in the Same Tenant
|
||||
|
||||
If you have workloads running in separate networks that cannot communicate with each other, you need to deploy multiple
|
||||
routers, one for each isolated environment. Each router will only process tasks from designated queues, ensuring that
|
||||
tasks are correctly routed to agents within the same network.
|
||||
|
||||
For example:
|
||||
* If Agent A and Agent B are in separate networks, each must have its own router to receive tasks.
|
||||
* Router A will handle tasks from Agent A’s queues. Router B will handle tasks from Agent B’s queues.
|
||||
|
||||
To achieve this, each router must be configured with:
|
||||
* A unique `ROUTER_NAME`
|
||||
* A distinct set of queues defined in `LISTEN_QUEUE_NAME`.
|
||||
|
||||
##### Example Configuration
|
||||
Each router's `runtime.env` file should include:
|
||||
|
||||
* Router A:
|
||||
|
||||
```
|
||||
ROUTER_NAME=router-a
|
||||
LISTEN_QUEUE_NAME=queue1,queue2
|
||||
```
|
||||
|
||||
* Router B:
|
||||
|
||||
```
|
||||
ROUTER_NAME=router-b
|
||||
LISTEN_QUEUE_NAME=queue3,queue4
|
||||
```
|
||||
|
||||
Make sure `LISTEN_QUEUE_NAME` is set in the [`docker-compose` environment variables](#docker-compose-file) for each router instance.
|
||||
|
@ -3,17 +3,26 @@ title: Kubernetes Deployment
|
||||
---
|
||||
|
||||
:::important Enterprise Feature
|
||||
The Application Gateway is available under the ClearML Enterprise plan.
|
||||
The AI Application Gateway is available under the ClearML Enterprise plan.
|
||||
:::
|
||||
|
||||
This guide details the installation of the ClearML App Gateway Router.
|
||||
The App Gateway Router enables access to your AI workload applications (e.g. remote IDEs like VSCode and Jupyter, model API interface, etc.).
|
||||
It acts as a proxy, identifying ClearML Tasks running within its [K8s namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/)
|
||||
and making them available for network access.
|
||||
|
||||
:::important
|
||||
The App Gateway Router must be installed in the same K8s namespace as a dedicated ClearML Agent.
|
||||
It can only configure access for ClearML Tasks within its own namespace.
|
||||
:::
|
||||
|
||||
This guide details the installation of the ClearML AI Application Gateway, specifically the ClearML Task Router Component.
|
||||
|
||||
## Requirements
|
||||
|
||||
* Kubernetes cluster: `>= 1.21.0-0 < 1.32.0-0`
|
||||
* Helm installed and configured
|
||||
* Helm token to access `allegroai` helm-chart repo
|
||||
* Credentials for `allegroai` docker repo
|
||||
* Helm token to access `clearml` helm-chart repo
|
||||
* Credentials for `clearml` docker repo
|
||||
* A valid ClearML Server installation
|
||||
|
||||
## Optional for HTTPS
|
||||
@ -26,62 +35,55 @@ This guide details the installation of the ClearML AI Application Gateway, speci
|
||||
### Login
|
||||
|
||||
```
|
||||
helm repo add allegroai-enterprise \
|
||||
helm repo add clearml-enterprise \
|
||||
https://raw.githubusercontent.com/clearml/clearml-enterprise-helm-charts/gh-pages \
|
||||
--username <GITHUB_TOKEN> \
|
||||
--password <GITHUB_TOKEN>
|
||||
```
|
||||
|
||||
### Prepare values
|
||||
Replace `<GITHUB_TOKEN>` with your valid GitHub token that has access to the ClearML Enterprise Helm charts repository.
|
||||
|
||||
Before installing the TTR create an helm-override files named `task-traffic-router.values-override.yaml`:
|
||||
### Prepare Values
|
||||
|
||||
Before installing the App Gateway Router, create a Helm override file:
|
||||
|
||||
```
|
||||
imageCredentials:
|
||||
password: "<DOCKERHUB_TOKEN>"
|
||||
password: ""
|
||||
clearml:
|
||||
apiServerKey: ""
|
||||
apiServerSecret: ""
|
||||
apiServerUrlReference: "https://api."
|
||||
jwksKey: ""
|
||||
authCookieName: ""
|
||||
apiServerKey: ""
|
||||
apiServerSecret: ""
|
||||
apiServerUrlReference: ""
|
||||
authCookieName: ""
|
||||
sslVerify: true
|
||||
ingress:
|
||||
enabled: true
|
||||
hostName: "task-router.dev"
|
||||
enabled: true
|
||||
hostName: ""
|
||||
tcpSession:
|
||||
routerAddress: ""
|
||||
portRange:
|
||||
start:
|
||||
end:
|
||||
routerAddress: ""
|
||||
service:
|
||||
type: LoadBalancer
|
||||
portRange:
|
||||
start:
|
||||
end:
|
||||
```
|
||||
|
||||
Edit it accordingly to this guidelines:
|
||||
**Configuration Options:**
|
||||
|
||||
* `clearml.apiServerUrlReference`: url usually starting with `https://api.`
|
||||
* `clearml.apiServerKey`: ClearML server api key
|
||||
* `clearml.apiServerSecret`: ClearML server secret key
|
||||
* `ingress.hostName`: url of router we configured previously for loadbalancer starting with `https://`
|
||||
* `clearml.sslVerify`: enable or disable SSL certificate validation on apiserver calls check
|
||||
* `clearml.authCookieName`: value from `value_prefix` key starting with `allegro_token` in `envoy.yaml` file in ClearML server installation.
|
||||
* `clearml.jwksKey`: value form `k` key in `jwks.json` file in ClearML server installation (see below)
|
||||
* `tcpSession.routerAddress`: router external address can be an IP or the host machine or a loadbalancer hostname, depends on the network configuration
|
||||
* `tcpSession.portRange.start`: start port for the TCP Session feature
|
||||
* `tcpSession.portRange.end`: end port port for the TCP Session feature
|
||||
|
||||
::: How to find my jwkskey
|
||||
|
||||
The *JSON Web Key Set* (*JWKS*) is a set of keys containing the public keys used to verify any JSON Web Token (JWT).
|
||||
|
||||
```
|
||||
kubectl -n clearml get secret clearml-conf \
|
||||
-o jsonpath='{.data.secure_auth_token_secret}' \
|
||||
| base64 -d && echo
|
||||
```
|
||||
|
||||
:::
|
||||
* `imageCredentials.password`: ClearML DockerHub Access Token.
|
||||
* `clearml.apiServerKey`: ClearML server API key.
|
||||
* `clearml.apiServerSecret`: ClearML server secret key.
|
||||
* `clearml.apiServerUrlReference`: ClearML API server URL starting with `https://api.`.
|
||||
* `clearml.authCookieName`: Cookie used by the ClearML server to store the ClearML authentication cookie.
|
||||
* `clearml.sslVerify`: Enable or disable SSL certificate validation on `apiserver` calls check.
|
||||
* `ingress.hostName`: Hostname of the router, used by the ingress controller to access it.
|
||||
* `tcpSession.routerAddress`: The external router address (can be an IP, hostname, or load balancer address) depending on your network setup. Ensure this address is accessible for TCP connections.
|
||||
* `tcpSession.service.type`: Service type used to expose TCP functionality, default is `NodePort`.
|
||||
* `tcpSession.portRange.start`: Start port for the TCP Session feature.
|
||||
* `tcpSession.portRange.end`: End port for the TCP Session feature.
|
||||
|
||||
|
||||
The whole list of supported configuration is available with the command:
|
||||
The full list of supported configuration options is available with the command:
|
||||
|
||||
```
|
||||
helm show readme clearml-enterprise/clearml-enterprise-task-traffic-router
|
||||
@ -94,9 +96,22 @@ To install the TTR component via Helm use the following command:
|
||||
```
|
||||
helm upgrade --install \
|
||||
<RELEASE_NAME> \
|
||||
-n <NAME_SPACE> \
|
||||
-n <WORKLOAD_NAMESPACE> \
|
||||
clearml-enterprise/clearml-enterprise-task-traffic-router \
|
||||
--version <CURRENT CHART VERSION> \
|
||||
-f task-traffic-router.values-override.yaml
|
||||
--version <CHART_VERSION> \
|
||||
-f override.yaml
|
||||
```
|
||||
|
||||
Replace the placeholders with the following values:
|
||||
|
||||
* `<RELEASE_NAME>` - Unique name for the App Gateway Router within the K8s namespace. This is a required parameter in
|
||||
Helm, which identifies a specific installation of the chart. The release name also defines the router’s name and
|
||||
appears in the UI within AI workload application URLs (e.g. Remote IDE URLs). This can be customized to support multiple installations within the same
|
||||
namespace by assigning different release names.
|
||||
* `<WORKLOAD_NAMESPACE>` - [Kubernetes Namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/)
|
||||
where workloads will be executed. This namespace must be shared between a dedicated ClearML Agent and an App
|
||||
Gateway Router. The agent is responsible for monitoring its assigned task queues and spawning workloads within this
|
||||
namespace. The router monitors the same namespace for AI workloads (e.g. remote IDE applications). The router has a
|
||||
namespace-limited scope, meaning it can only detect and manage tasks within its
|
||||
assigned namespace.
|
||||
* `<CHART_VERSION>` - Version recommended by the ClearML Support Team.
|
@ -36,7 +36,7 @@ them before exporting.
|
||||
Execute the data tool within the `apiserver` container.
|
||||
|
||||
Open a bash session inside the `apiserver` container of the server:
|
||||
* In docker-compose:
|
||||
* In `docker-compose`:
|
||||
|
||||
```commandline
|
||||
sudo docker exec -it clearml-apiserver /bin/bash
|
||||
|
@ -100,9 +100,10 @@ Install the ClearML chart with the required configuration:
|
||||
1. Prepare the `overrides.yaml` file and input the following content. Make sure to replace `<BASE_DOMAIN>` and `<SSO_*>`
|
||||
with a valid domain that will have records pointing to the ingress controller accordingly.
|
||||
The credentials specified in `<SUPERVISOR_USER_KEY>` and `<SUPERVISOR_USER_SECRET>` can be used to log in as the
|
||||
supervisor user in the web UI.
|
||||
supervisor user in the web UI.
|
||||
|
||||
Note that the `<SUPERVISOR_USER_EMAIL>` value must be explicitly quoted. To do so, put `\\"` around the quoted value.
|
||||
For example `"\\"email@example.com\\””`
|
||||
For example `"\\"email@example.com\\””`.
|
||||
|
||||
```
|
||||
imageCredentials:
|
||||
@ -192,7 +193,7 @@ Install the ClearML chart with the required configuration:
|
||||
enabled: true
|
||||
```
|
||||
|
||||
2. Install ClearML
|
||||
2. Install ClearML:
|
||||
|
||||
```
|
||||
helm install -n clearml \\
|
||||
@ -305,9 +306,9 @@ spec:
|
||||
kubernetes.io/metadata.name: clearml
|
||||
```
|
||||
|
||||
## Applications Installation
|
||||
## Application Installation
|
||||
|
||||
To install ClearML GUI applications, follow these steps:
|
||||
To install ClearML GUI applications:
|
||||
|
||||
1. Get the apps to install and the installation script by downloading and extracting the archive provided by ClearML
|
||||
|
||||
@ -491,7 +492,7 @@ To install the ClearML Agent Chart, follow these steps:
|
||||
-d '{"name":"default"}'
|
||||
```
|
||||
|
||||
### Tenant Namespace isolation with NetworkPolicies
|
||||
### Tenant Namespace Isolation with NetworkPolicies
|
||||
|
||||
To ensure network isolation for each tenant, you need to create a `NetworkPolicy` in the tenant namespace. This way
|
||||
the entire namespace/tenant will not accept any connection from other namespaces.
|
||||
@ -512,31 +513,30 @@ Create a `NetworkPolicy` in the tenant namespace with the following configuratio
|
||||
- podSelector: {}
|
||||
```
|
||||
|
||||
### Install Task Traffic Router Chart
|
||||
### Install the App Gateway Router Chart
|
||||
|
||||
Install the [Task Traffic Router](appgw.md) in your Kubernetes cluster, allowing it to manage and route tasks:
|
||||
Install the App Gateway Router in your Kubernetes cluster, allowing it to manage and route tasks:
|
||||
|
||||
1. Prepare the `overrides.yaml` file with the following content:
|
||||
|
||||
```
|
||||
imageCredentials:
|
||||
password: "<allegroaienterprise_DockerHub_TOKEN>"
|
||||
password: "<clearmlenterprise_DockerHub_TOKEN>"
|
||||
clearml:
|
||||
apiServerUrlReference: "<http://clearml-enterprise-apiserver.clearml:8008>"
|
||||
apiserverKey: "<TENANT_KEY>"
|
||||
apiserverSecret: "<TENANT_SECRET>"
|
||||
jwksKey: "<JWKS_KEY>"
|
||||
ingress:
|
||||
enabled: true
|
||||
hostName: "<unique url in same domain as apiserver/webserver>"
|
||||
```
|
||||
|
||||
2. Install Task Traffic Router in the specified tenant namespace:
|
||||
2. Install App Gateway Router in the specified tenant namespace:
|
||||
|
||||
```
|
||||
helm install -n <TENANT_NAMESPACE> \\
|
||||
clearml-ttr \\
|
||||
allegroai-enterprise/clearml-task-traffic-router \\
|
||||
clearml-enterprise/clearml-task-traffic-router \\
|
||||
--create-namespace \\
|
||||
-f overrides.yaml
|
||||
```
|
||||
|
@ -43,7 +43,7 @@ should be reviewed and modified prior to the server installation
|
||||
## Installing ClearML Server
|
||||
### Preliminary Steps
|
||||
|
||||
1. Install Docker CE
|
||||
1. Install Docker CE:
|
||||
|
||||
```
|
||||
https://docs.docker.com/install/linux/docker-ce/ubuntu/
|
||||
@ -113,10 +113,10 @@ should be reviewed and modified prior to the server installation
|
||||
sudo systemctl enable disable-thp
|
||||
```
|
||||
|
||||
1. Restart the machine
|
||||
1. Restart the machine.
|
||||
|
||||
### Installing the Server
|
||||
1. Remove any previous installation of ClearML Server
|
||||
1. Remove any previous installation of ClearML Server:
|
||||
|
||||
```
|
||||
sudo rm -R /opt/clearml/
|
||||
@ -141,7 +141,7 @@ should be reviewed and modified prior to the server installation
|
||||
sudo mkdir -pv /opt/allegro/config/onprem_poc
|
||||
```
|
||||
|
||||
1. Copy the following ClearML configuration files to `/opt/allegro`
|
||||
1. Copy the following ClearML configuration files to `/opt/allegro`:
|
||||
* `constants.env`
|
||||
* `docker-compose.override.yml`
|
||||
* `docker-compose.yml`
|
||||
@ -165,10 +165,13 @@ should be reviewed and modified prior to the server installation
|
||||
sudo docker login -u=$DOCKERHUB_USER -p=$DOCKERHUB_PASSWORD
|
||||
```
|
||||
|
||||
1. Start the `docker-compose` by changing directories to the directory containing the docker-compose files and running the following command:
|
||||
sudo docker-compose --env-file constants.env up -d
|
||||
|
||||
1. Verify web access by browsing to your URL (IP address) and port 8080.
|
||||
1. Start the `docker-compose` by changing directories to the directory containing the `docker-compose` files and running the following command:
|
||||
|
||||
```
|
||||
sudo docker-compose --env-file constants.env up -d
|
||||
```
|
||||
|
||||
1. Verify web access by browsing to your URL (IP address) and port 8080:
|
||||
|
||||
```
|
||||
http://<server_ip_here>:8080
|
||||
@ -191,7 +194,10 @@ the following subdomains should be forwarded to the corresponding ports on the s
|
||||
* `https://app.<domain>` should be forwarded to port 8080
|
||||
* `https://files.<domain>` should be forwarded to port 8081
|
||||
|
||||
|
||||
:::warning
|
||||
**Critical: Ensure no other ports are open to maintain the highest level of security.**
|
||||
:::
|
||||
|
||||
Additionally, ensure that the following URLs are correctly configured in the server's environment file:
|
||||
|
||||
|
@ -8,7 +8,7 @@ It covers the following:
|
||||
* Set up security groups and IAM role
|
||||
* Create EC2 instance with required disks
|
||||
* Install dependencies and mount disks
|
||||
* Deploy ClearML version using docker-compose
|
||||
* Deploy ClearML version using `docker-compose`
|
||||
* Set up load balancer and DNS
|
||||
* Set up server backup
|
||||
|
||||
@ -38,7 +38,7 @@ It is recommended to use a VPC with IPv6 enabled for future usage expansion.
|
||||
1. Create a security group for the main server (`clearml-main`):
|
||||
|
||||
* Ingress:
|
||||
* TCP port 10000, from the load balancer's security group
|
||||
* TCP port 10000 from the load balancer's security group
|
||||
* TCP port 22 from trusted IP addresses.
|
||||
* Egress: All addresses and ports
|
||||
|
||||
@ -117,10 +117,10 @@ Instance requirements:
|
||||
## Load Balancer
|
||||
|
||||
1. Create a TLS certificate:
|
||||
1. Choose a domain name to be used with the server. The main URL that will be used by the system’s users will be app.\<domain\>
|
||||
1. Choose a domain name to be used with the server. The main URL that will be used by the system’s users will be `app.<domain>`
|
||||
2. Create a certificate, with the following DNS names:
|
||||
1. \<domain name\>
|
||||
2. \*.\<domain name\>
|
||||
1. `<domain name>`
|
||||
2. `*.<domain name>`
|
||||
|
||||
2. Create the `envoy` target group for the server:
|
||||
1. Port: 10000
|
||||
@ -284,7 +284,7 @@ log would usually indicate the reason for the failure.
|
||||
|
||||
## Maintenance
|
||||
|
||||
### Removing app containers
|
||||
### Removing App Containers
|
||||
|
||||
To remove old application containers, add the following to the cron:
|
||||
|
||||
|
@ -82,7 +82,7 @@ Currently, these runtime properties can only be set using an ClearML REST API ca
|
||||
endpoint, as follows:
|
||||
|
||||
* The body of the request must contain the `worker-id`, and the runtime property to add.
|
||||
* An expiry date is optional. Use the format `"expiry":<time>`. For example, `"expiry":86400` will set an expiry of 24 hours.
|
||||
* An expiry date is optional. Use the format `"expiry":<time>`. For example, `"expiry":86400` will set an expiry of 24 hours.
|
||||
* To delete the property, set the expiry date to zero, `"expiry":0`.
|
||||
|
||||
For example, to force a worker on for 24 hours:
|
||||
|
@ -6,9 +6,22 @@ ClearML provides a comprehensive set of monitoring tools to help effectively tra
|
||||
These tools offer both high-level overviews and detailed insights into task execution, resource
|
||||
utilization, and project performance.
|
||||
|
||||
## Offerings
|
||||
|
||||
### Project Dashboard
|
||||
## Project Overview
|
||||
|
||||
A project's **OVERVIEW** tab in the UI presents a general picture of a project:
|
||||
* Metric Snapshot – A graphical representation of selected metric values across project tasks, offering a quick assessment of progress.
|
||||
* Task Status Tracking – When a single metric variant is selected for the snapshot, task status is color-coded (e.g.,
|
||||
Completed, Aborted, Published, Failed) for better visibility.
|
||||
|
||||
Use the Metric Snapshot to track project progress and identify trends in task performance.
|
||||
|
||||
For more information, see [Project Overview](../webapp/webapp_project_overview.md).
|
||||
|
||||

|
||||

|
||||
|
||||
## Project Dashboard
|
||||
|
||||
:::info Pro Plan Offering
|
||||
The Project Dashboard app is available under the ClearML Pro plan.
|
||||
@ -28,16 +41,22 @@ For more information, see [Project Dashboard](../webapp/applications/apps_dashbo
|
||||

|
||||

|
||||
|
||||
### Project Overview
|
||||
## Task Monitoring
|
||||
|
||||
A project's **OVERVIEW** tab in the UI presents a general picture of a project:
|
||||
* Metric Snapshot – A graphical representation of selected metric values across project tasks, offering a quick assessment of progress.
|
||||
* Task Status Tracking – When a single metric variant is selected for the snapshot, task status is color-coded (e.g.,
|
||||
Completed, Aborted, Published, Failed) for better visibility.
|
||||
ClearML provides task monitoring capabilities through the [`clearml.automation.Monitor`](https://github.com/clearml/clearml/blob/master/clearml/automation/monitor.py)
|
||||
class. With this class you can implement monitoring workflows such as:
|
||||
|
||||
Use the Metric Snapshot to track project progress and identify trends in task performance.
|
||||
* Send notifications via Slack or other channels
|
||||
* Trigger automated responses based on specific task conditions
|
||||
|
||||
For more information, see [Project Overview](../webapp/webapp_project_overview.md).
|
||||
For a practical example, see the [Slack Alerts Example](../guides/services/slack_alerts.md), which demonstrates how to:
|
||||
|
||||
* Track task status (completion, failure, etc.)
|
||||
* Send notifications to a specified Slack channel
|
||||
* Retrieve task details such as status, console logs, and links to the ClearML Web UI
|
||||
|
||||
You can also configure filters for task types and projects to reduce unnecessary notifications.
|
||||
|
||||

|
||||

|
||||
|
||||

|
||||


@ -14,7 +14,7 @@ powerful remote machine. This is useful for:

* Managing execution through ClearML's queue system.

This guide focuses on transitioning a locally executed process to a remote machine for scalable execution. To learn how
to reproduce a previously executed process on a remote machine, see [Reproducing Tasks](reproduce_tasks.md).
to reproduce a previously executed process on a remote machine, see [Reproducing Task Runs](reproduce_tasks.md).

## Running a Task Remotely
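
For a quick sense of what this looks like in code, the minimal sketch below stops local execution and enqueues the current task for an agent; the project, task, and queue names are placeholders.

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="remote execution demo")

# Stop running locally and enqueue this task; an agent listening on the queue
# will recreate the environment and run the rest of the script remotely.
task.execute_remotely(queue_name="default", exit_process=True)

# Everything below this line executes on the remote machine.
print("Running on the agent")
```

The same task can instead be enqueued programmatically with `Task.enqueue()` or from the Web UI by cloning and enqueuing it.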
@ -1,5 +1,5 @@
---
title: Reproducing Tasks
title: Reproducing Task Runs
---

:::note
@ -31,7 +31,7 @@ The pip package also includes `clearml-data`. It can help you keep track of your

Both the 2 magic lines and the data tool will send all of their information to a ClearML server. This server then keeps an overview of your experiment runs and data sets over time, so you can always go back to a previous experiment, see how it was created and even recreate it exactly. Keep track of your best models by creating leaderboards based on your own metrics, and you can even directly compare multiple experiment runs, helping you to figure out the best way forward for your models.

To get started with a server right away, you can make use of the free tier. And when your needs grow, we've got you covered too! Just check out our website to find a tier that fits your organisation best. But, because we're open source, you can also host your own completely for free. We have AWS images, Google Cloud images, you can run it on docker-compose locally or even, if you really hate yourself, run it on a self-hosted kubernetes cluster using our helm charts.
To get started with a server right away, you can make use of the free tier. And when your needs grow, we've got you covered too! Just check out our website to find a tier that fits your organisation best. But, because we're open source, you can also host your own completely for free. We have AWS images, Google Cloud images, you can run it on `docker-compose` locally or even, if you really hate yourself, run it on a self-hosted kubernetes cluster using our helm charts.

So, to recap: to get started, all you need is a pip package and a server to store everything. Easy right? But MLOps is much more than experiment and data management. It's also about automation and orchestration, which is exactly where the `clearml-agent` comes into play.
@ -18,22 +18,26 @@ The example does the following:

The loss and accuracy metric scalar plots appear in **SCALARS**, along with the resource utilization plots, which are titled **:monitor: machine**.





## Plots

The example calls Matplotlib methods to create several sample plots, and TensorBoard methods to plot histograms for layer density.
They appear in **PLOTS**.








## Debug Samples

The example calls Matplotlib methods to log debug sample images. They appear in **DEBUG SAMPLES**.





## Hyperparameters

@ -55,17 +59,20 @@ task_params['hidden_dim'] = 512

Parameter dictionaries appear in **CONFIGURATION** **>** **HYPERPARAMETERS** **>** **General**.





The TensorFlow Definitions appear in the **TF_DEFINE** subsection.





## Console

Text printed to the console for training appears in **CONSOLE**.





## Artifacts

@ -74,9 +81,11 @@ created using Keras.

The task info panel shows model tracking, including the model name and design in **ARTIFACTS** **>** **Output Model**.





Clicking on the model name takes you to the [model's page](../../../webapp/webapp_model_viewing.md), where you can view
the model's details and access the model.





@ -25,31 +25,36 @@ The example script does the following:

The loss and accuracy metric scalar plots appear in **SCALARS**, along with the resource utilization plots,
which are titled **:monitor: machine**.





## Histograms

Histograms for layer density appear in **PLOTS**.





## Hyperparameters

ClearML automatically logs command line options generated with `argparse`, and TensorFlow Definitions.
ClearML automatically logs command line options generated with `argparse` and TensorFlow Definitions.

Command line options appear in **CONFIGURATION** **>** **HYPERPARAMETERS** **>** **Args**.





TensorFlow Definitions appear in **TF_DEFINE**.





## Console

Text printed to the console for training progress, as well as all other console output, appears in **CONSOLE**.





## Configuration Objects

@ -64,4 +69,5 @@ task.connect_configuration(

It appears in **CONFIGURATION** **>** **CONFIGURATION OBJECTS** **>** **MyConfig**.





@ -12,16 +12,19 @@ and `matplotlib` to create a scatter diagram. When the script runs, it creates a

ClearML automatically logs the scatter plot, which appears in the [task's page](../../../webapp/webapp_exp_track_visual.md)
in the ClearML web UI, under **PLOTS**.





## Artifacts

Models created by the task appear in the task's **ARTIFACTS** tab.





Clicking on the model name takes you to the [model's page](../../../webapp/webapp_model_viewing.md), where you can
view the model's details and access the model.





@ -16,30 +16,35 @@ The script does the following:

The loss and accuracy metric scalar plots appear in the task's page in the **ClearML web UI**, under
**SCALARS**. This also includes resource utilization plots, which are titled **:monitor: machine**.





## Hyperparameters

ClearML automatically logs command line options defined with `argparse`. They appear in **CONFIGURATION** **>**
**HYPERPARAMETERS** **>** **Args**.





## Console

Text printed to the console for training progress, as well as all other console output, appears in **CONSOLE**.





## Artifacts

Models created by the task appear in the task's **ARTIFACTS** tab. ClearML automatically logs and tracks
models and any snapshots created using PyTorch.





Clicking on the model's name takes you to the [model's page](../../../webapp/webapp_model_viewing.md), where you can
view the model's details and access the model.





@ -14,5 +14,6 @@ the `examples` project.

ClearML automatically captures the video data that is added to the `SummaryWriter` object, using the `add_video` method.
The video appears in the task's **DEBUG SAMPLES** tab.





@ -44,28 +44,33 @@ When the script runs, it logs:

ClearML logs the scalars from training each network. They appear in the task's page in the **ClearML web UI**, under
**SCALARS**.





## Summary of Hyperparameter Optimization

ClearML automatically logs the parameters of each task run in the hyperparameter search. They appear in tabular
form in **PLOTS**.





## Artifacts

ClearML automatically stores the output model. It appears in **ARTIFACTS** **>** **Output Model**.





Model details, such as snapshot locations, appear in the **MODELS** tab.





The model configuration is stored with the model.





## Configuration Objects

@ -73,12 +78,14 @@ The model configuration is stored with the model.

ClearML automatically logs the TensorFlow Definitions, which appear in **CONFIGURATION** **>** **HYPERPARAMETERS**.





### Configuration

The Task configuration appears in **CONFIGURATION** **>** **General**.





@ -20,24 +20,29 @@ In the **ClearML Web UI**, the PR Curve summaries appear in the task's page unde

* Blue PR curves





* Green PR curves





* Red PR curves





## Hyperparameters

ClearML automatically logs TensorFlow Definitions. They appear in **CONFIGURATION** **>** **HYPERPARAMETERS** **>** **TF_DEFINE**.





## Console

All other console output appears in **CONSOLE**.
All console output appears in the **CONSOLE** tab.





@ -14,25 +14,29 @@ project.

The `tf.summary.scalar` output appears in the ClearML web UI, in the task's
**SCALARS**. Resource utilization plots, which are titled **:monitor: machine**, also appear in the **SCALARS** tab.





## Plots

The `tf.summary.histogram` output appears in **PLOTS**.





## Debug Samples

ClearML automatically tracks images and text output to TensorFlow. They appear in **DEBUG SAMPLES**.





## Hyperparameters

ClearML automatically logs TensorFlow Definitions. They appear in **CONFIGURATION** **>** **HYPERPARAMETERS** **>**
**TF_DEFINE**.





@ -13,30 +13,35 @@ When the script runs, it creates a task named `Tensorflow v2 mnist with summarie

The loss and accuracy metric scalar plots appear in the task's page in the **ClearML web UI** under
**SCALARS**. Resource utilization plots, which are titled **:monitor: machine**, also appear in the **SCALARS** tab.





## Hyperparameters

ClearML automatically logs TensorFlow Definitions. They appear in **CONFIGURATION** **>** **HYPERPARAMETERS**
**>** **TF_DEFINE**.





## Console

All console output appears in **CONSOLE**.





## Artifacts

Models created by the task appear in the task's **ARTIFACTS** tab. ClearML automatically logs and tracks
models and any snapshots created using TensorFlow.





Clicking on a model's name takes you to the [model's page](../../../webapp/webapp_model_viewing.md), where you can
view the model's details and access the model.





@ -13,7 +13,8 @@ the `examples` project.

ClearML automatically captures scalars logged with XGBoost, which can be visualized in plots in the
ClearML WebApp, in the task's **SCALARS** tab.





## Models

@ -21,14 +22,17 @@ ClearML automatically captures the model logged using the `xgboost.save` method,

View saved snapshots in the task's **ARTIFACTS** tab.





To view the model details, click the model name in the **ARTIFACTS** page, which will open the model's info tab. Alternatively, download the model.





## Console

All console output during the script's execution appears in the task's **CONSOLE** page.





@ -18,25 +18,30 @@ classification dataset using XGBoost

The feature importance plot and tree plot appear in the task's page in the **ClearML web UI**, under
**PLOTS**.








## Console

All other console output appears in **CONSOLE**.





## Artifacts

Models created by the task appear in the task's **ARTIFACTS** tab. ClearML automatically logs and tracks
models and any snapshots created using XGBoost.





Clicking on the model's name takes you to the [model's page](../../../webapp/webapp_model_viewing.md), where you can
view the model's details and access the model.





@ -6,11 +6,12 @@ The [Slack alerts example](https://github.com/clearml/clearml/blob/master/exampl

demonstrates how to use the `clearml.automation.monitor` class to implement a service that monitors the completion and
failure of tasks, and posts alert messages on a Slack channel.





## Creating a Slack Bot
## Creating a Slackbot

Before configuring and running the Slack alert service, create a Slack Bot (**ClearML Bot**).
Before configuring and running the Slack alert service, create a Slackbot (**ClearML Bot**).

:::important
The Slack API token and channel you create are required to configure the Slack alert service.
@ -16,6 +16,10 @@ Use annotation tasks to efficiently organize the annotation of frames in Dataset

Click on an annotation task card to open the frame viewer, where you can view the task's frames and annotate them.

Use the search bar <img src="/docs/latest/icons/ico-search.svg" alt="Magnifying glass" className="icon size-md space-sm" />
to find specific annotation tasks. You can query by the task’s name, hyper-dataset version, and ID.
To search using regex, click the `.*` icon on the search bar.

## Annotation Task Actions
Click <img src="/docs/latest/icons/ico-bars-menu.svg" alt="Menu" className="icon size-md space-sm" /> on the top right
of an annotation task card to open its context menu and access annotation task actions.
@ -145,7 +145,7 @@ filters.

* Source rule - Query frame source information. Enter a Lucene query of frame metadata fields in the format
`sources.<key>:<value>` (can use AND, OR, and NOT operators).

A frame filter can contain a number of rules. For each frame filter, the rules are applied with a logical AND operator. For example, the dataset version in the image below has one filter. “Frame Filter 1” has two rules:
A frame filter can contain a number of rules. For each frame filter, the rules are applied with a logical AND operator. For example, the dataset version in the image below has one filter. "Frame Filter 1" has two rules:
1. ROI rule - the frame must include an ROI with the `cat` label
2. Source rule - the frames must be 640 pixels wide.
@ -18,6 +18,10 @@ using the buttons on the top left of the page. Use the table view for a comparat

columns of interest. Use the details view to access a selected Dataview's details, while keeping the Dataview list in view.
The details view can also be accessed by double-clicking a specific Dataview in the table view.

Use the search bar <img src="/docs/latest/icons/ico-search.svg" alt="Magnifying glass" className="icon size-md space-sm" />
to find specific dataviews. You can query by the dataview name, ID, description, hyper-datasets, and versions.
To search using regex, click the `.*` icon on the search bar.

You can archive Dataviews so the Dataview table doesn't get too cluttered. Click **OPEN ARCHIVE** on the top of the
table to open the archive and view all archived Dataviews. From the archive, you can restore
Dataviews to remove them from the archive. You can also permanently delete Dataviews.
Binary image changes (docs/img/): several existing example screenshots were updated, and dark-mode variants were added as new files:
examples_keras_00_dark.png, examples_keras_00a_dark.png, examples_keras_01_dark.png, examples_keras_02_dark.png,
examples_keras_jupyter_03_dark.png, examples_keras_jupyter_03a_dark.png, examples_keras_jupyter_04_dark.png,
examples_keras_jupyter_07_dark.png, examples_keras_jupyter_08_dark.png, examples_keras_jupyter_20_dark.png,
examples_keras_jupyter_21_dark.png, examples_keras_jupyter_23_dark.png, examples_keras_jupyter_24_dark.png,
examples_sklearn_joblib_example_01_dark.png, examples_sklearn_joblib_example_02_dark.png, examples_sklearn_joblib_example_06_dark.png,
examples_slack_alerts_dark.png, examples_tensorboard_pr_curve_01_dark.png, examples_tensorboard_pr_curve_02_dark.png,
examples_tensorboard_pr_curve_03_dark.png, examples_tensorboard_pr_curve_04_dark.png, examples_tensorboard_pr_curve_05_dark.png,
examples_tensorboard_toy_01_dark.png, examples_tensorboard_toy_03_dark.png, examples_tensorboard_toy_04_dark.png, examples_tensorboard_toy_05_dark.png