diff --git a/docs/clearml_agent.md b/docs/clearml_agent.md
index 0750c4b8..768b8e9a 100644
--- a/docs/clearml_agent.md
+++ b/docs/clearml_agent.md
@@ -17,7 +17,7 @@ title: ClearML Agent
**ClearML Agent** is a virtual environment and execution manager for DL / ML solutions on GPU machines. It integrates with the **ClearML Python Package** and ClearML Server to provide a full AI cluster solution.
Its main focus is around:
-- Reproducing tasks, including their complete environments.
+- Reproducing task runs, including their complete environments.
- Scaling workflows on multiple target machines.
ClearML Agent executes a task or other workflow by reproducing the state of the code from the original machine
@@ -46,7 +46,7 @@ install Python, so make sure to use a container or environment with the version
While the agent is running, it continuously reports system metrics to the ClearML Server (these can be monitored in the
[**Orchestration**](webapp/webapp_workers_queues.md) page).
-Continue using ClearML Agent once it is running on a target machine. Reproduce tasks and execute
+Continue using ClearML Agent once it is running on a target machine. Reproduce task runs and execute
automated workflows in one (or both) of the following ways:
* Programmatically (using [`Task.enqueue()`](references/sdk/task.md#taskenqueue) or [`Task.execute_remotely()`](references/sdk/task.md#execute_remotely))
* Through the ClearML Web UI (without working directly with code), by cloning tasks and enqueuing them to the
diff --git a/docs/clearml_agent/clearml_agent_env_var.md b/docs/clearml_agent/clearml_agent_env_var.md
index 63122ed1..75a8f3bb 100644
--- a/docs/clearml_agent/clearml_agent_env_var.md
+++ b/docs/clearml_agent/clearml_agent_env_var.md
@@ -27,6 +27,7 @@ but can be overridden by command-line arguments.
|**CLEARML_AGENT_DOCKER_ARGS_HIDE_ENV** | Hide Docker environment variables containing secrets when printing out the Docker command. When printed, the variable values will be replaced by `********`. See [`agent.hide_docker_command_env_vars`](../configs/clearml_conf.md#hide_docker) |
|**CLEARML_AGENT_DISABLE_SSH_MOUNT** | Disables the auto `.ssh` mount into the docker |
|**CLEARML_AGENT_FORCE_CODE_DIR**| Allows overriding the remote execution code directory to bypass repository cloning and use a repo already available where the remote agent is running. |
+|**CLEARML_AGENT_FORCE_UV**| If set to `1`, forces the agent to use `uv` as the package manager. Overrides the default manager set in the [clearml.conf](../configs/clearml_conf.md) under `agent.package_manager.type` |
|**CLEARML_AGENT_FORCE_EXEC_SCRIPT**| Allows overriding the remote execution script to bypass repository cloning and execute code already available where the remote agent is running. Use `module:file.py` format to specify a module and a script to execute (e.g. `.:main.py` to run `main.py` from the working dir)|
|**CLEARML_AGENT_FORCE_TASK_INIT**| If set to `1`, ClearML Agent adds `Task.init()` to scripts that do not have the call, creating a Task to capture code execution information and output, which is then sent to the ClearML Server. If set to `0` and the script does not include `Task.init()`, the agent will capture only the output streams and console output, without tracking code execution details, metrics, or models. |
|**CLEARML_AGENT_FORCE_SYSTEM_SITE_PACKAGES** | If set to `1`, overrides default [`agent.package_manager.system_site_packages: true`](../configs/clearml_conf.md#system_site_packages) behavior when running tasks in containers (docker mode and k8s-glue)|
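As a sketch of how the `CLEARML_AGENT_FORCE_UV` variable from the table above might be used (the queue name and the commented launch command are placeholders, not taken from this page):

```shell
# Hypothetical example: force uv as the package manager for this agent run,
# overriding agent.package_manager.type from clearml.conf.
export CLEARML_AGENT_FORCE_UV=1
echo "CLEARML_AGENT_FORCE_UV=${CLEARML_AGENT_FORCE_UV}"
# clearml-agent daemon --queue default   # then start the agent as usual
```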
diff --git a/docs/clearml_agent/clearml_agent_execution_env.md b/docs/clearml_agent/clearml_agent_execution_env.md
index 0180da86..fcb094bd 100644
--- a/docs/clearml_agent/clearml_agent_execution_env.md
+++ b/docs/clearml_agent/clearml_agent_execution_env.md
@@ -13,6 +13,7 @@ multiple tasks (see [Virtual Environment Reuse](clearml_agent_env_caching.md#vir
ClearML Agent supports working with one of the following package managers:
* [`pip`](https://en.wikipedia.org/wiki/Pip_(package_manager)) (default)
* [`conda`](https://docs.conda.io/en/latest/)
+* [`uv`](https://docs.astral.sh/uv/)
* [`poetry`](https://python-poetry.org/)
To change the package manager used by the agent, edit the [`package_manager.type`](../configs/clearml_conf.md#agentpackage_manager)
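For instance, selecting `uv` in `clearml.conf` could look like the following minimal sketch (only the relevant keys are shown):

```
agent {
    package_manager {
        # one of: pip (default), conda, uv, poetry
        type: uv
    }
}
```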
diff --git a/docs/clearml_agent/clearml_agent_fractional_gpus.md b/docs/clearml_agent/clearml_agent_fractional_gpus.md
index 79c4c2b1..c114d4e4 100644
--- a/docs/clearml_agent/clearml_agent_fractional_gpus.md
+++ b/docs/clearml_agent/clearml_agent_fractional_gpus.md
@@ -80,7 +80,7 @@ For either setup, you can set up in your Enterprise ClearML Agent Helm chart the
each queue. When a task is enqueued in ClearML, it translates into a Kubernetes pod running on the designated device
with the specified fractional resource as defined in the Agent Helm chart.
-#### MIG-enabled GPUs
+#### MIG-enabled GPUs
The **ClearML Dynamic MIG Operator** (CDMO) chart enables running AI workloads on K8s with optimized hardware utilization
and workload performance by facilitating MIG GPU partitioning. Make sure you have a [MIG capable GPU](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#supported-gpus).
@@ -232,7 +232,7 @@ ranging from 2 GB to 12 GB (see [clearml-fractional-gpu repository](https://gith
This example runs the ClearML Ubuntu 22 with CUDA 12.3 container on GPU 0, which is limited to use up to 8GB of its memory.
:::note
- --pid=host is required to allow the driver to differentiate between the container's processes and other host processes when limiting memory usage
+ `--pid=host` is required to allow the driver to differentiate between the container's processes and other host processes when limiting memory usage
:::
1. Run the following command inside the container to verify that the fractional gpu memory limit is working correctly:
```bash
diff --git a/docs/clearml_agent/clearml_agent_setup.md b/docs/clearml_agent/clearml_agent_setup.md
index 2e744de0..2126b322 100644
--- a/docs/clearml_agent/clearml_agent_setup.md
+++ b/docs/clearml_agent/clearml_agent_setup.md
@@ -40,7 +40,7 @@ it can't do that when running from a virtual environment.
If the setup wizard's response indicates that a configuration file already exists, follow the instructions [here](#adding-clearml-agent-to-a-configuration-file).
The wizard does not edit or overwrite existing configuration files.
-1. At the command prompt `Paste copied configuration here:`, copy and paste the ClearML credentials and press **Enter**.
+1. At the command prompt `Paste copied configuration here:`, paste the ClearML credentials and press **Enter**.
The setup wizard confirms the credentials.
```
diff --git a/docs/clearml_data/data_management_examples/data_man_cifar_classification.md b/docs/clearml_data/data_management_examples/data_man_cifar_classification.md
index 1e4cc2a3..860c5af6 100644
--- a/docs/clearml_data/data_management_examples/data_man_cifar_classification.md
+++ b/docs/clearml_data/data_management_examples/data_man_cifar_classification.md
@@ -68,7 +68,8 @@ reproducibility.
Information about the dataset can be viewed in the WebApp, in the dataset's [details panel](../../webapp/datasets/webapp_dataset_viewing.md#version-details-panel).
In the panel's **CONTENT** tab, you can see a table summarizing version contents, including file names, file sizes, and hashes.
-
+
+
## Using the Dataset
diff --git a/docs/clearml_data/data_management_examples/data_man_python.md b/docs/clearml_data/data_management_examples/data_man_python.md
index c8e5ea14..5d4adc06 100644
--- a/docs/clearml_data/data_management_examples/data_man_python.md
+++ b/docs/clearml_data/data_management_examples/data_man_python.md
@@ -79,7 +79,8 @@ After a dataset has been closed, it can no longer be modified. This ensures futu
Information about the dataset can be viewed in the WebApp, in the dataset's [details panel](../../webapp/datasets/webapp_dataset_viewing.md#version-details-panel).
In the panel's **CONTENT** tab, you can see a table summarizing version contents, including file names, file sizes, and hashes.
-
+
+
## Data Ingestion
diff --git a/docs/clearml_sdk/clearml_sdk_setup.md b/docs/clearml_sdk/clearml_sdk_setup.md
index 6f8a82c7..7a7c53a7 100644
--- a/docs/clearml_sdk/clearml_sdk_setup.md
+++ b/docs/clearml_sdk/clearml_sdk_setup.md
@@ -2,7 +2,7 @@
title: ClearML Python Package
---
-This is step-by-step guide for installing the `clearml` Python package and connecting it to the ClearML Server. Once done,
+This is a step-by-step guide for installing the `clearml` Python package and connecting it to the ClearML Server. Once done,
you can integrate `clearml` into your code.
## Install ClearML
@@ -68,7 +68,7 @@ pip install clearml
The **LOCAL PYTHON** tab shows the data required by the setup wizard (a copy to clipboard action is available on
hover).
-1. At the command prompt `Paste copied configuration here:`, copy and paste the ClearML credentials.
+1. At the command prompt `Paste copied configuration here:`, paste the ClearML credentials.
The setup wizard verifies the credentials.
```console
Detected credentials key="********************" secret="*******"
diff --git a/docs/clearml_sdk/hpo_sdk.md b/docs/clearml_sdk/hpo_sdk.md
index ad39a9f6..6f5e7c2b 100644
--- a/docs/clearml_sdk/hpo_sdk.md
+++ b/docs/clearml_sdk/hpo_sdk.md
@@ -71,8 +71,8 @@ optimization.
from clearml import Task
task = Task.init(
- project_name='Hyper-Parameter Optimization',
- task_name='Automatic Hyper-Parameter Optimization',
+ project_name='Hyperparameter Optimization',
+ task_name='Automatic Hyperparameter Optimization',
task_type=Task.TaskTypes.optimizer,
reuse_last_task_id=False
)
diff --git a/docs/clearml_sdk/task_sdk.md b/docs/clearml_sdk/task_sdk.md
index 611e4a29..5f90e42c 100644
--- a/docs/clearml_sdk/task_sdk.md
+++ b/docs/clearml_sdk/task_sdk.md
@@ -65,6 +65,7 @@ After invoking `Task.init` in a script, ClearML starts its automagical logging,
* [argparse](../guides/reporting/hyper_parameters.md#argparse-command-line-options)
* [Python Fire](../integrations/python_fire.md)
* [LightningCLI](../integrations/pytorch_lightning.md)
+ * [jsonargparse](../integrations/jsonargparse.md)
* TensorFlow Definitions (`absl-py`)
* [Hydra](../integrations/hydra.md) - ClearML logs the OmegaConf which holds all the configuration files, as well as values overridden during runtime.
* **Models** - ClearML automatically logs and updates the models and all snapshot paths saved with the following frameworks:
@@ -74,6 +75,7 @@ After invoking `Task.init` in a script, ClearML starts its automagical logging,
* [AutoKeras](../integrations/autokeras.md)
* [CatBoost](../integrations/catboost.md)
* [Fast.ai](../integrations/fastai.md)
+ * [Hugging Face Transformers](../integrations/transformers.md)
* [LightGBM](../integrations/lightgbm.md)
* [MegEngine](../integrations/megengine.md)
* [MONAI](../integrations/monai.md)
diff --git a/docs/clearml_serving/clearml_serving_setup.md b/docs/clearml_serving/clearml_serving_setup.md
index 05219921..4820a946 100644
--- a/docs/clearml_serving/clearml_serving_setup.md
+++ b/docs/clearml_serving/clearml_serving_setup.md
@@ -15,7 +15,7 @@ The following page goes over how to set up and upgrade `clearml-serving`.
[free hosted service](https://app.clear.ml)
1. Connect `clearml` SDK to the server, see instructions [here](../clearml_sdk/clearml_sdk_setup#install-clearml)
-1. Install clearml-serving CLI:
+1. Install the `clearml-serving` CLI:
```bash
pip3 install clearml-serving
@@ -27,21 +27,22 @@ The following page goes over how to set up and upgrade `clearml-serving`.
clearml-serving create --name "serving example"
```
- The new serving service UID should be printed
+ This command prints the Serving Service UID:
```console
New Serving Service created: id=aa11bb22aa11bb22
```
- Write down the Serving Service UID
+ Copy the Serving Service UID (e.g., `aa11bb22aa11bb22`), as you will need it in the next steps.
1. Clone the `clearml-serving` repository:
```bash
git clone https://github.com/clearml/clearml-serving.git
```
-1. Edit the environment variables file (docker/example.env) with your clearml-server credentials and Serving Service UID.
- For example, you should have something like
+1. Edit the environment variables file (`docker/example.env`) with your `clearml-server` API credentials and Serving Service UID.
+ For example:
+
```bash
cat docker/example.env
```
@@ -55,31 +56,30 @@ The following page goes over how to set up and upgrade `clearml-serving`.
CLEARML_SERVING_TASK_ID=""
```
-1. Spin up the `clearml-serving` containers with `docker-compose` (or if running on Kubernetes, use the helm chart)
+1. Spin up the `clearml-serving` containers with `docker-compose` (or if running on Kubernetes, use the helm chart):
```bash
cd docker && docker-compose --env-file example.env -f docker-compose.yml up
```
- If you need Triton support (keras/pytorch/onnx etc.), use the triton docker-compose file
+ If you need Triton support (Keras/PyTorch/ONNX etc.), use the Triton `docker-compose` file:
```bash
cd docker && docker-compose --env-file example.env -f docker-compose-triton.yml up
```
- If running on a GPU instance with Triton support (keras/pytorch/onnx etc.), use the triton gpu docker-compose file:
+ If running on a GPU instance with Triton support (Keras/PyTorch/ONNX etc.), use the Triton GPU `docker-compose` file:
```bash
cd docker && docker-compose --env-file example.env -f docker-compose-triton-gpu.yml up
```
:::note
-Any model that registers with Triton engine will run the pre/post-processing code on the Inference service container,
+Any model that registers with the Triton engine will run the pre/post-processing code in the Inference service container,
and the model inference itself will be executed on the Triton Engine container.
:::
## Advanced Setup - S3/GS/Azure Access (Optional)
-To add access credentials and allow the inference containers to download models from your S3/GS/Azure object-storage,
-add the respective environment variables to your env files (example.env). For further details, see
-[Configuring Storage](../integrations/storage.md#configuring-storage).
+To enable inference containers to download models from S3, Google Cloud Storage (GS), or Azure,
+set your access credentials in the respective environment variables in your env file (`example.env`):
```
AWS_ACCESS_KEY_ID
@@ -92,14 +92,21 @@ AZURE_STORAGE_ACCOUNT
AZURE_STORAGE_KEY
```
+For further details, see [Configuring Storage](../integrations/storage.md#configuring-storage).
+
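As an illustration (placeholder values only, to be replaced with your own credentials), the AWS entries in `docker/example.env` might look like:

```
# docker/example.env -- placeholder values, replace with your own credentials
AWS_ACCESS_KEY_ID=<your-access-key-id>
AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
AWS_DEFAULT_REGION=<your-region>
```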
## Upgrading ClearML Serving
**Upgrading to v1.1**
-1. Take down the serving containers (`docker-compose` or k8s)
-1. Update the `clearml-serving` CLI `pip3 install -U clearml-serving`
+1. Shut down the serving containers (`docker-compose` or k8s)
+1. Update the `clearml-serving` CLI:
+
+ ```
+ pip3 install -U clearml-serving
+ ```
+
1. Re-add a single existing endpoint with `clearml-serving model add ...` (press yes when asked). It will upgrade the
- `clearml-serving` session definitions
+ `clearml-serving` session definitions.
1. Pull the latest serving containers (`docker-compose pull ...` or k8s)
1. Re-spin serving containers (`docker-compose` or k8s)
diff --git a/docs/clearml_serving/clearml_serving_tutorial.md b/docs/clearml_serving/clearml_serving_tutorial.md
index 64667af5..c13e81b9 100644
--- a/docs/clearml_serving/clearml_serving_tutorial.md
+++ b/docs/clearml_serving/clearml_serving_tutorial.md
@@ -212,7 +212,7 @@ Example:
ClearML serving instances send serving statistics (count/latency) automatically to Prometheus and Grafana can be used
to visualize and create live dashboards.
-The default docker-compose installation is preconfigured with Prometheus and Grafana. Notice that by default data/ate
+The default `docker-compose` installation is preconfigured with Prometheus and Grafana. Notice that by default the data
of both containers is *not* persistent. To add persistence, adding a volume mount is recommended.
You can also add many custom metrics on the input/predictions of your models. Once a model endpoint is registered,
diff --git a/docs/configs/clearml_conf.md b/docs/configs/clearml_conf.md
index 9aeb0959..a45308c2 100644
--- a/docs/configs/clearml_conf.md
+++ b/docs/configs/clearml_conf.md
@@ -22,7 +22,7 @@ The values in the ClearML configuration file can be overridden by environment va
and command-line arguments.
:::
-# Editing Your Configuration File
+## Editing Your Configuration File
To add, change, or delete options, edit your configuration file.
@@ -515,8 +515,12 @@ These settings define which Docker image and arguments should be used unless [ex
**`agent.package_manager`** (*dict*)
-* Dictionary containing the options for the Python package manager. The currently supported package managers are pip, conda,
- and, if the repository contains a `poetry.lock` file, poetry.
+* Dictionary containing the options for the Python package manager.
+* The currently supported package managers are
+ * pip
+ * conda
+ * uv, if the root repository contains a `uv.lock` or `pyproject.toml` file
+ * poetry, if the repository contains a `poetry.lock` or `pyproject.toml` file
---
@@ -661,13 +665,38 @@ Torch Nightly builds are ephemeral and are deleted from time to time.
* `pip`
* `conda`
* `poetry`
+ * `uv`
* If `pip` or `conda` are used, the agent installs the required packages based on the "Python Packages" section of the
Task. If the "Python Packages" section is empty, it will revert to using `requirements.txt` from the repository's root
- directory. If `poetry` is selected, and the root repository contains `poetry.lock` or `pyproject.toml`, the "Python
+ directory.
+* If `poetry` is selected, and the root repository contains `poetry.lock` or `pyproject.toml`, the "Python
Packages" section is ignored, and `poetry` is used. If `poetry` is selected and no lock file is found, it reverts to
`pip` package manager behaviour.
-
+* If `uv` is selected, and the root repository contains `uv.lock` or `pyproject.toml`, the "Python
+ Packages" section is ignored, and `uv` is used. If `uv` is selected and no lock file is found, it reverts to
+ `pip` package manager behaviour.
+
+---
+
+**`agent.package_manager.uv_files_from_repo_working_dir`** (*bool*)
+
+* If set to `true`, the agent will look for the `uv.lock` or `pyproject.toml` file in the provided directory path instead of
+ the repository's root directory.
+
+---
+
+**`agent.package_manager.uv_sync_extra_args`** (*list*)
+
+* List of extra command-line arguments to pass when using `uv`.
+
+---
+
+**`agent.package_manager.uv_version`** (*string*)
+
+* The `uv` version requirement. For example, `">0.4"`, `"==0.4"`, `""` (an empty string installs the latest version).
+
+
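Taken together, a `uv` setup in `clearml.conf` could look like the following sketch (all values are illustrative, not defaults):

```
agent {
    package_manager {
        type: uv
        # pin the uv version the agent installs (empty string = latest)
        uv_version: ">0.4"
        # extra arguments appended to the uv sync command
        uv_sync_extra_args: ["--frozen"]
        # look for uv.lock / pyproject.toml in the working directory
        # instead of the repository root
        uv_files_from_repo_working_dir: true
    }
}
```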
#### agent.pip_download_cache
@@ -1548,7 +1577,7 @@ environment {
}
```
-### files section
+### files section
**`files`** (*dict*)
diff --git a/docs/deploying_clearml/clearml_server.md b/docs/deploying_clearml/clearml_server.md
index dc137d76..9a545482 100644
--- a/docs/deploying_clearml/clearml_server.md
+++ b/docs/deploying_clearml/clearml_server.md
@@ -4,7 +4,7 @@ title: ClearML Server
## What is ClearML Server?
The ClearML Server is the backend service infrastructure for ClearML. It allows multiple users to collaborate and
-manage their tasks by working seamlessly with the ClearML Python package and [ClearML Agent](../clearml_agent.md).
+manage their tasks by working seamlessly with the [ClearML Python package](../clearml_sdk/clearml_sdk_setup.md) and [ClearML Agent](../clearml_agent.md).
ClearML Server is composed of the following:
* Web server including the [ClearML Web UI](../webapp/webapp_overview.md), which is the user interface for tracking, comparing, and managing tasks.
diff --git a/docs/deploying_clearml/clearml_server_config.md b/docs/deploying_clearml/clearml_server_config.md
index 014161cc..c036c574 100644
--- a/docs/deploying_clearml/clearml_server_config.md
+++ b/docs/deploying_clearml/clearml_server_config.md
@@ -233,7 +233,7 @@ The following example, which is based on AWS load balancing, demonstrates the co
-### Opening Elasticsearch, MongoDB, and Redis for External Access
+### Opening Elasticsearch, MongoDB, and Redis for External Access
For improved security, the ports for ClearML Server Elasticsearch, MongoDB, and Redis servers are not exposed by default;
they are only open internally in the docker network. If external access is needed, open these ports (but make sure to
@@ -361,10 +361,16 @@ You can also use hashed passwords instead of plain-text passwords. To do that:
### Non-responsive Task Watchdog
-The non-responsive task watchdog monitors tasks that were not updated for a specified time interval, and then
-the watchdog marks them as `aborted`. The non-responsive experiment watchdog is always active.
+The non-responsive task watchdog monitors for running tasks that have stopped communicating with the ClearML Server for a specified
+time interval. If a task remains unresponsive beyond the set threshold, the watchdog marks it as `aborted`.
-Modify the following settings for the watchdog:
+The watchdog starts counting after each successful communication with the server. If no further updates are received
+within the specified time, the task is considered non-responsive. This typically happens if:
+* The task's main process is stuck but has not exited.
+* There is a network issue preventing the task from communicating with the server.
+
+You can configure the following watchdog settings:
* Watchdog status - enabled / disabled
* The time threshold (in seconds) of experiment inactivity (default value is 7200 seconds (2 hours)).
@@ -372,10 +378,15 @@ Modify the following settings for the watchdog:
**To configure the non-responsive watchdog for the ClearML Server:**
-1. In the ClearML Server `/opt/clearml/config/services.conf` file, add or edit the `tasks.non_responsive_tasks_watchdog`
- section and specify the watchdog settings.
+1. Open the ClearML Server `/opt/clearml/config/services.conf` file.
+
+ :::tip
+ If the `services.conf` file does not exist, create your own in ClearML Server's `/opt/clearml/config` directory (or
+ an alternate folder you configured).
+ :::
+
+1. Add or edit the `tasks.non_responsive_tasks_watchdog` section and specify the watchdog settings. For example:
- For example:
```
tasks {
non_responsive_tasks_watchdog {
@@ -389,11 +400,6 @@ Modify the following settings for the watchdog:
}
}
```
-
- :::tip
- If the `services.conf` file does not exist, create your own in ClearML Server's `/opt/clearml/config` directory (or
- an alternate folder you configured), and input the modified configuration
- :::
1. Restart ClearML Server.
diff --git a/docs/deploying_clearml/clearml_server_linux_mac.md b/docs/deploying_clearml/clearml_server_linux_mac.md
index 52b25658..60979018 100644
--- a/docs/deploying_clearml/clearml_server_linux_mac.md
+++ b/docs/deploying_clearml/clearml_server_linux_mac.md
@@ -5,8 +5,8 @@ title: Linux and macOS
Deploy the ClearML Server in Linux or macOS using the pre-built Docker image.
For ClearML docker images, including previous versions, see [https://hub.docker.com/r/allegroai/clearml](https://hub.docker.com/r/allegroai/clearml).
-However, pulling the ClearML Docker image directly is not required. ClearML provides a docker-compose YAML file that does this.
-The docker-compose file is included in the instructions on this page.
+However, pulling the ClearML Docker image directly is not required. ClearML provides a `docker-compose` YAML file that does this.
+The `docker-compose` file is included in the instructions on this page.
For information about upgrading ClearML Server in Linux or macOS, see [here](upgrade_server_linux_mac.md).
@@ -134,7 +134,7 @@ Deploying the server requires a minimum of 8 GB of memory, 16 GB is recommended.
sudo chown -R $(whoami):staff /opt/clearml
```
-2. Download the ClearML Server docker-compose YAML file.
+2. Download the ClearML Server `docker-compose` YAML file:
```
sudo curl https://raw.githubusercontent.com/clearml/clearml-server/master/docker/docker-compose.yml -o /opt/clearml/docker-compose.yml
```
diff --git a/docs/deploying_clearml/clearml_server_win.md b/docs/deploying_clearml/clearml_server_win.md
index 5cf0e768..154dcb7f 100644
--- a/docs/deploying_clearml/clearml_server_win.md
+++ b/docs/deploying_clearml/clearml_server_win.md
@@ -54,7 +54,7 @@ Deploying the server requires a minimum of 8 GB of memory, 16 GB is recommended.
mkdir c:\opt\clearml\logs
```
-1. Save the ClearML Server docker-compose YAML file.
+1. Save the ClearML Server `docker-compose` YAML file.
```
curl https://raw.githubusercontent.com/clearml/clearml-server/master/docker/docker-compose-win10.yml -o c:\opt\clearml\docker-compose-win10.yml
diff --git a/docs/deploying_clearml/enterprise_deploy/app_custom.md b/docs/deploying_clearml/enterprise_deploy/app_custom.md
index 84a37223..28a71960 100644
--- a/docs/deploying_clearml/enterprise_deploy/app_custom.md
+++ b/docs/deploying_clearml/enterprise_deploy/app_custom.md
@@ -29,12 +29,12 @@ The `General` section is the root-level section of the configuration file, and c
* `id` - A unique id for the application
* `name` - The name to display in the web application
* `version` - The version of the application implementation. Recommended to have three numbers and to bump up when updating applications, so that older running instances can still be displayed
-* `provider` - The person/team/group who is the owner of the application. This will appears in the UI
+* `provider` - The person/team/group who is the owner of the application. This will appear in the UI
* `description` - Short description of the application to be displayed in the ClearML Web UI
* `icon` (*Optional*) - Small image to display in the ClearML web UI as an icon for the application. Can be a public web url or an image in the application’s assets directory (described below)
* `no_info_html` (*Optional*) - HTML content to display as a placeholder for the dashboard when no instance is available. Can be a public web url or a file in the application’s assets directory (described below)
* `default-queue` - The queue to which application instance will be sent when launching a new instance. This queue should have an appropriate agent servicing it. See details in the Custom Apps Agent section below.
-* `badges` (*Optional*) - List of strings to display as a bacge/label in the UI
+* `badges` (*Optional*) - List of strings to display as a badge/label in the UI
* `resumable` - Boolean indication whether a running application instance can be restarted if required. Default is false.
* `category` (*Optional*) - Way to separate apps into different tabs in the ClearML web UI
* `featured` (*Optional*) - Value affecting the order of applications. Lower values are displayed first. Defaults to 500
@@ -61,7 +61,7 @@ The `task` section describes the task to run, containing the following fields:
* `branch` - The branch to use
* `entry_point` - The python file to run
* `working_dir` - The directory to run it from
-* `hyperparams` (*Optional*) - A list of the task’s hyper-parameters used by the application, with their default values. There is no need to specify all the parameters here, but it enables summarizing of the parameters that will be targeted by the wizard entries described below, and allows to specify default values to optional parameters appearing in the wizard.
+* `hyperparams` (*Optional*) - A list of the task’s hyperparameters used by the application, with their default values. There is no need to specify all the parameters here, but it enables summarizing of the parameters that will be targeted by the wizard entries described below, and allows to specify default values to optional parameters appearing in the wizard.
#### Example
The `task` section in the simple application example:
@@ -120,8 +120,8 @@ The `wizard` section defines the entries to display in the application instance
* `model`
* `queue`
* `dataset_version`
- * `display_field` - The field of the source object to display in the list. Usually “name”
- * `value_field` - The field of the source object to use for configuring the app instance. Usually “id”
+ * `display_field` - The field of the source object to display in the list. Usually "name"
+ * `value_field` - The field of the source object to use for configuring the app instance. Usually "id"
* `filter` - Allows to limit the choices list by setting a filter on one or more of the object’s fields. See Project Selection example below
* `target` - Where in the application instance’s task the values will be set. Contains the following:
* `field` - Either `configuration` or `hyperparams`
@@ -264,7 +264,7 @@ The dashboard elements are organized into lines.
The section contains the following information:
* `lines` - The array of line elements, each containing:
- * `style` - CSS definitions for the line e.g setting the line height
+ * `style` - CSS definitions for the line e.g. setting the line height
* `contents` - An array of dashboard elements to display in a given line. Each element may have several fields:
* `title` - Text to display at the top of the field
* `type` - one of the following:
@@ -276,7 +276,7 @@ The section contains the following information:
* hyperparameter
* configuration
* html
- * `text` - For HTML. You can refer to task elements such as hyper-parameters by using `${hyperparams...value}`
+ * `text` - For HTML. You can refer to task elements such as hyperparameters by using `${hyperparams...value}`
* `metric` - For plot, scalar-histogram, debug-images, scalar - Name of the metric
* `variant` - For plot, scalar-histogram, debug-images, scalar - List of variants to display
* `key` - For histograms, one of the following: `iter`, `timestamp` or, `iso_time`
diff --git a/docs/deploying_clearml/enterprise_deploy/app_install_ubuntu_on_prem.md b/docs/deploying_clearml/enterprise_deploy/app_install_ubuntu_on_prem.md
index 4bbc1e7e..d43f2fe1 100644
--- a/docs/deploying_clearml/enterprise_deploy/app_install_ubuntu_on_prem.md
+++ b/docs/deploying_clearml/enterprise_deploy/app_install_ubuntu_on_prem.md
@@ -13,7 +13,7 @@ without any coding. Applications are installed on top of the ClearML Server.
To run application you will need the following:
* RAM: Make sure you have at least 400 MB of RAM per application instance.
* Applications Service: Make sure that the applications agent service is up and running on your server:
- * If you are using a docker-compose solution, make sure that the clearml-apps-agent service is running.
+ * If you are using a `docker-compose` solution, make sure that the clearml-apps-agent service is running.
* If you are using a Kubernetes cluster, check for the clearml-clearml-enterprise-apps component.
* Installation Files: Each application has its installation zip file. Make sure you have the relevant files for the
applications you wish to install.
diff --git a/docs/deploying_clearml/enterprise_deploy/appgw.md b/docs/deploying_clearml/enterprise_deploy/appgw.md
index 2679df85..647c575a 100644
--- a/docs/deploying_clearml/enterprise_deploy/appgw.md
+++ b/docs/deploying_clearml/enterprise_deploy/appgw.md
@@ -30,12 +30,12 @@ their instances:
* [Embedding Model Deployment](../../webapp/applications/apps_embed_model_deployment.md)
* [Llama.cpp Model Deployment](../../webapp/applications/apps_llama_deployment.md)
-The AI Application Gateway is provided through an additional component to the ClearML Server deployment: The ClearML Task Traffic Router.
-If your ClearML Deployment does not have the Task Traffic Router properly installed, these application instances may not be accessible.
+The AI Application Gateway requires an additional component to the ClearML Server deployment: the **ClearML App Gateway Router**.
+If your ClearML Deployment does not have the App Gateway Router properly installed, these application instances may not be accessible.
#### Installation
-The Task Traffic Router supports two deployment options:
+The App Gateway Router supports two deployment options:
* [Docker Compose](appgw_install_compose.md)
* [Kubernetes](appgw_install_k8s.md)
diff --git a/docs/deploying_clearml/enterprise_deploy/appgw_install_compose.md b/docs/deploying_clearml/enterprise_deploy/appgw_install_compose.md
index 91cc338a..990bc078 100644
--- a/docs/deploying_clearml/enterprise_deploy/appgw_install_compose.md
+++ b/docs/deploying_clearml/enterprise_deploy/appgw_install_compose.md
@@ -13,11 +13,11 @@ The Application Gateway is available under the ClearML Enterprise plan.
-* Credentials for the ClearML/allegroai docker repository
+* Credentials for the ClearML docker repository
* A valid ClearML Server installation
-## Host configurations
+## Host Configurations
-### Docker installation
+### Docker Installation
-Installing docker and docker-compose might vary depending on the specific operating system you’re using. Here is an example for AmazonLinux:
+Installing `docker` and `docker-compose` might vary depending on the specific operating system you’re using. Here is an example for AmazonLinux:
```
sudo dnf -y install docker
@@ -33,87 +33,82 @@ sudo docker login
-Use the ClearML/allegroai dockerhub credentials when prompted by docker login.
+Use the ClearML dockerhub credentials when prompted by `docker login`.
-### Docker-compose file
+### Docker-compose File
-This is an example of the docker-compose file you will need:
+This is an example of the `docker-compose` file you will need:
```
version: '3.5'
services:
-task_traffic_webserver:
- image: allegroai/task-traffic-router-webserver:${TASK-TRAFFIC-ROUTER-WEBSERVER-TAG}
- ports:
- - "80:8080"
- restart: unless-stopped
- container_name: task_traffic_webserver
- volumes:
- - ./task_traffic_router/config/nginx:/etc/nginx/conf.d:ro
- - ./task_traffic_router/config/lua:/usr/local/openresty/nginx/lua:ro
-task_traffic_router:
- image: allegroai/task-traffic-router:${TASK-TRAFFIC-ROUTER-TAG}
- restart: unless-stopped
- container_name: task_traffic_router
- volumes:
- - /var/run/docker.sock:/var/run/docker.sock
- - ./task_traffic_router/config/nginx:/etc/nginx/conf.d:rw
- - ./task_traffic_router/config/lua:/usr/local/openresty/nginx/lua:rw
- environment:
- - LOGGER_LEVEL=INFO
- - CLEARML_API_HOST=${CLEARML_API_HOST:?err}
- - CLEARML_API_ACCESS_KEY=${CLEARML_API_ACCESS_KEY:?err}
- - CLEARML_API_SECRET_KEY=${CLEARML_API_SECRET_KEY:?err}
- - ROUTER_URL=${ROUTER_URL:?err}
- - ROUTER_NAME=${ROUTER_NAME:?err}
- - AUTH_ENABLED=${AUTH_ENABLED:?err}
- - SSL_VERIFY=${SSL_VERIFY:?err}
- - AUTH_COOKIE_NAME=${AUTH_COOKIE_NAME:?err}
- - AUTH_BASE64_JWKS_KEY=${AUTH_BASE64_JWKS_KEY:?err}
- - LISTEN_QUEUE_NAME=${LISTEN_QUEUE_NAME}
- - EXTRA_BASH_COMMAND=${EXTRA_BASH_COMMAND}
- - TCP_ROUTER_ADDRESS=${TCP_ROUTER_ADDRESS}
- - TCP_PORT_START=${TCP_PORT_START}
- - TCP_PORT_END=${TCP_PORT_END}
-
+ task_traffic_webserver:
+ image: clearml/ai-gateway-proxy:${PROXY_TAG:?err}
+ network_mode: "host"
+ restart: unless-stopped
+ container_name: task_traffic_webserver
+ volumes:
+ - ./task_traffic_router/config/nginx:/etc/nginx/conf.d:ro
+ - ./task_traffic_router/config/lua:/usr/local/openresty/nginx/lua:ro
+ task_traffic_router:
+ image: clearml/ai-gateway-router:${ROUTER_TAG:?err}
+ restart: unless-stopped
+ container_name: task_traffic_router
+ volumes:
+ - /var/run/docker.sock:/var/run/docker.sock
+ - ./task_traffic_router/config/nginx:/etc/nginx/conf.d:rw
+ - ./task_traffic_router/config/lua:/usr/local/openresty/nginx/lua:rw
+ environment:
+ - ROUTER_NAME=${ROUTER_NAME:?err}
+ - ROUTER__WEBSERVER__SERVER_PORT=${ROUTER__WEBSERVER__SERVER_PORT:?err}
+ - ROUTER_URL=${ROUTER_URL:?err}
+ - CLEARML_API_HOST=${CLEARML_API_HOST:?err}
+ - CLEARML_API_ACCESS_KEY=${CLEARML_API_ACCESS_KEY:?err}
+ - CLEARML_API_SECRET_KEY=${CLEARML_API_SECRET_KEY:?err}
+ - AUTH_COOKIE_NAME=${AUTH_COOKIE_NAME:?err}
+ - AUTH_SECURE_ENABLED=${AUTH_SECURE_ENABLED}
+ - TCP_ROUTER_ADDRESS=${TCP_ROUTER_ADDRESS}
+ - TCP_PORT_START=${TCP_PORT_START}
+ - TCP_PORT_END=${TCP_PORT_END}
```
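The `${VAR:?err}` entries above make Compose abort early when a required variable is unset or empty. The same parameter expansion can be sketched in plain shell to see the behavior (variable values here are illustrative):

```shell
# ${VAR:?msg} fails when VAR is unset or empty; Compose applies the same rule.
ROUTER_NAME=main-router
echo "${ROUTER_NAME:?err}"   # prints: main-router

# Expanding an unset required variable aborts (here, only the subshell).
unset CLEARML_API_HOST
( echo "${CLEARML_API_HOST:?err}" ) 2>/dev/null || echo "missing required variable"
```

When a variable is missing, Compose reports which one failed along with the `err` message, which is why the file marks mandatory settings with `:?err`.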
-Create a *runtime.env* file containing the following entries:
+Create a `runtime.env` file containing the following entries:
```
-TASK-TRAFFIC-ROUTER-WEBSERVER-TAG=
-TASK-TRAFFIC-ROUTER-TAG=
-CLEARML_API_HOST=https://api.
+PROXY_TAG=
+ROUTER_TAG=
+ROUTER_NAME=main-router
+ROUTER__WEBSERVER__SERVER_PORT=8010
+ROUTER_URL=
+CLEARML_API_HOST=
CLEARML_API_ACCESS_KEY=
CLEARML_API_SECRET_KEY=
-ROUTER_URL=
-ROUTER_NAME=main-router
-AUTH_ENABLED=true
-SSL_VERIFY=true
AUTH_COOKIE_NAME=
-AUTH_BASE64_JWKS_KEY=
-LISTEN_QUEUE_NAME=
-EXTRA_BASH_COMMAND=
+AUTH_SECURE_ENABLED=true
TCP_ROUTER_ADDRESS=
TCP_PORT_START=
TCP_PORT_END=
```
-Edit it according to the following guidelines:
-
-* `CLEARML_API_HOST`: URL usually starting with `https://api.`
-* `CLEARML_API_ACCESS_KEY`: ClearML server api key
-* `CLEARML_API_SECRET_KEY`: ClearML server secret key
-* `ROUTER_URL`: URL for this router that was previously configured in the load balancer starting with `https://`
-* `ROUTER_NAME`: unique name for this router
-* `AUTH_ENABLED`: enable or disable http calls authentication when the router is communicating with the ClearML server
-* `SSL_VERIFY`: enable or disable SSL certificate validation when the router is communicating with the ClearML server
-* `AUTH_COOKIE_NAME`: the cookie name used by the ClearML server to store the ClearML authentication cookie. This can usually be found in the `value_prefix` key starting with `allegro_token` in `envoy.yaml` file in the ClearML server installation (`/opt/allegro/config/envoy/envoy.yaml`) (see below)
-* `AUTH_SECURE_ENABLED`: enable the Set-Cookie `secure` parameter
-* `AUTH_BASE64_JWKS_KEY`: value form `k` key in the `jwks.json` file in the ClearML server installation
-* `LISTEN_QUEUE_NAME`: (optional) name of queue to check for tasks (if none, every task is checked)
-* `EXTRA_BASH_COMMAND`: command to be launched before starting the router
-* `TCP_ROUTER_ADDRESS`: router external address, can be an IP or the host machine or a load balancer hostname, depends on network configuration
-* `TCP_PORT_START`: start port for the TCP Session feature
-* `TCP_PORT_END`: end port port for the TCP Session feature
+**Configuration Options:**
+* `PROXY_TAG`: AI Application Gateway proxy tag. The Docker image tag for the proxy component, which needs to be
+ specified during installation. This tag is provided by ClearML to ensure compatibility with the recommended version.
+* `ROUTER_TAG`: App Gateway Router tag. The Docker image tag for the router component. It defines the specific version
+ to be installed and is provided by ClearML as part of the setup process.
+* `ROUTER_NAME`: In the case of [multiple routers on the same tenant](#multiple-router-in-the-same-tenant), each router
+ needs to have a unique name.
+* `ROUTER__WEBSERVER__SERVER_PORT`: Webserver port. The default port is 8080, but it can be adjusted to meet specific network requirements.
+* `ROUTER_URL`: External address to access the router. This can be the IP address or DNS of the node where the router
+  is running, or the address of a load balancer if the router operates behind a proxy/load balancer. This URL is used
+  to access AI workload applications (e.g. remote IDE, model deployment, etc.), so it must be reachable and resolvable by them.
+* `CLEARML_API_HOST`: ClearML API server URL starting with `https://api.`
+* `CLEARML_API_ACCESS_KEY`: ClearML server API key.
+* `CLEARML_API_SECRET_KEY`: ClearML server secret key.
+* `AUTH_COOKIE_NAME`: Name of the cookie in which the ClearML server stores the ClearML authentication token. This can
+  usually be found in the `envoy.yaml` file in the ClearML server installation (`/opt/allegro/config/envoy/envoy.yaml`),
+  under the `value_prefix` key starting with `allegro_token`.
+* `AUTH_SECURE_ENABLED`: Enable the Set-Cookie `secure` parameter. Set to `false` in case services are exposed with `http`.
+* `TCP_ROUTER_ADDRESS`: Router external address; this can be an IP or hostname of the host machine, or a load balancer hostname, depending on the network configuration.
+* `TCP_PORT_START`: Start port for the TCP Session feature.
+* `TCP_PORT_END`: End port for the TCP Session feature.
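As a sanity check before starting the router, a short shell sketch can confirm that no required entry in `runtime.env` was left empty. The required list mirrors the `:?err` entries in the compose file above; the sample values are placeholders, not real credentials:

```shell
# Write a sample runtime.env (placeholder values for illustration only).
cat > runtime.env <<'EOF'
PROXY_TAG=1.0.0
ROUTER_TAG=1.0.0
ROUTER_NAME=main-router
ROUTER__WEBSERVER__SERVER_PORT=8010
ROUTER_URL=https://router.example.com
CLEARML_API_HOST=https://api.example.com
CLEARML_API_ACCESS_KEY=EXAMPLEKEY
CLEARML_API_SECRET_KEY=EXAMPLESECRET
AUTH_COOKIE_NAME=allegro_token_example
AUTH_SECURE_ENABLED=true
EOF

# Flag any required entry that is missing or has no value.
missing=0
for var in PROXY_TAG ROUTER_TAG ROUTER_NAME ROUTER__WEBSERVER__SERVER_PORT \
           ROUTER_URL CLEARML_API_HOST CLEARML_API_ACCESS_KEY \
           CLEARML_API_SECRET_KEY AUTH_COOKIE_NAME; do
  grep -q "^${var}=..*" runtime.env || { echo "missing or empty: ${var}"; missing=1; }
done
[ "${missing}" -eq 0 ] && echo "runtime.env looks complete"
```

Catching an empty entry here is friendlier than letting Compose fail one variable at a time at startup.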
Run the following command to start the router:
@@ -121,12 +116,42 @@ Run the following command to start the router:
sudo docker compose --env-file runtime.env up -d
```
-:::Note How to find my jwkskey
+### Advanced Configuration
-The *JSON Web Key Set* (*JWKS*) is a set of keys containing the public keys used to verify any JSON Web Token (JWT).
+#### Using Open HTTP
-In a docker-compose server installation, this can be found in the `CLEARML__secure__auth__token_secret` env var in the apiserver server component.
+To deploy the App Gateway Router on open HTTP (without a certificate), set the `AUTH_SECURE_ENABLED` entry
+to `false` in the `runtime.env` file.
-:::
+#### Multiple Router in the Same Tenant
+If you have workloads running in separate networks that cannot communicate with each other, you need to deploy multiple
+routers, one for each isolated environment. Each router will only process tasks from designated queues, ensuring that
+tasks are correctly routed to agents within the same network.
+For example:
+* If Agent A and Agent B are in separate networks, each must have its own router to receive tasks.
+* Router A will handle tasks from Agent A’s queues. Router B will handle tasks from Agent B’s queues.
+
+To achieve this, each router must be configured with:
+* A unique `ROUTER_NAME`
+* A distinct set of queues defined in `LISTEN_QUEUE_NAME`
+
+##### Example Configuration
+Each router's `runtime.env` file should include:
+
+* Router A:
+
+ ```
+ ROUTER_NAME=router-a
+ LISTEN_QUEUE_NAME=queue1,queue2
+ ```
+
+* Router B:
+
+ ```
+ ROUTER_NAME=router-b
+ LISTEN_QUEUE_NAME=queue3,queue4
+ ```
+
+Make sure `LISTEN_QUEUE_NAME` is set in the [`docker-compose` environment variables](#docker-compose-file) for each router instance.
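The two `runtime.env` variants above differ only in `ROUTER_NAME` and `LISTEN_QUEUE_NAME`, so they can be derived from a single template. A minimal sketch (file names, function name, and queue names are illustrative):

```shell
# Shared template; placeholder lines are rewritten per router.
cat > runtime.template.env <<'EOF'
ROUTER_NAME=placeholder
LISTEN_QUEUE_NAME=placeholder
EOF

# Generate runtime-<name>.env with a unique router name and queue list.
gen_router_env() {
  name="$1"
  queues="$2"
  sed -e "s/^ROUTER_NAME=.*/ROUTER_NAME=${name}/" \
      -e "s/^LISTEN_QUEUE_NAME=.*/LISTEN_QUEUE_NAME=${queues}/" \
      runtime.template.env > "runtime-${name}.env"
}

gen_router_env router-a queue1,queue2
gen_router_env router-b queue3,queue4
cat runtime-router-a.env
```

Each generated file can then be passed to its own router instance via `--env-file`.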
diff --git a/docs/deploying_clearml/enterprise_deploy/appgw_install_k8s.md b/docs/deploying_clearml/enterprise_deploy/appgw_install_k8s.md
index 4274f844..16d4efc0 100644
--- a/docs/deploying_clearml/enterprise_deploy/appgw_install_k8s.md
+++ b/docs/deploying_clearml/enterprise_deploy/appgw_install_k8s.md
@@ -3,17 +3,26 @@ title: Kubernetes Deployment
---
:::important Enterprise Feature
-The Application Gateway is available under the ClearML Enterprise plan.
+The AI Application Gateway is available under the ClearML Enterprise plan.
+:::
+
+This guide details the installation of the ClearML App Gateway Router.
+The App Gateway Router enables access to your AI workload applications (e.g. remote IDEs such as VSCode and Jupyter, model API interfaces).
+It acts as a proxy, identifying ClearML Tasks running within its [K8s namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/)
+and making them available for network access.
+
+:::important
+The App Gateway Router must be installed in the same K8s namespace as a dedicated ClearML Agent.
+It can only configure access for ClearML Tasks within its own namespace.
:::
-This guide details the installation of the ClearML AI Application Gateway, specifically the ClearML Task Router Component.
## Requirements
* Kubernetes cluster: `>= 1.21.0-0 < 1.32.0-0`
* Helm installed and configured
-* Helm token to access `allegroai` helm-chart repo
-* Credentials for `allegroai` docker repo
+* Helm token to access `clearml` helm-chart repo
+* Credentials for `clearml` docker repo
* A valid ClearML Server installation
## Optional for HTTPS
@@ -26,62 +35,55 @@ This guide details the installation of the ClearML AI Application Gateway, speci
### Login
```
-helm repo add allegroai-enterprise \
+helm repo add clearml-enterprise \
https://raw.githubusercontent.com/clearml/clearml-enterprise-helm-charts/gh-pages \
--username \
--password
```
-### Prepare values
+Replace `` with your valid GitHub token that has access to the ClearML Enterprise Helm charts repository.
-Before installing the TTR create an helm-override files named `task-traffic-router.values-override.yaml`:
+### Prepare Values
+
+Before installing the App Gateway Router, create a Helm override file:
```
imageCredentials:
- password: ""
+ password: ""
clearml:
- apiServerKey: ""
- apiServerSecret: ""
- apiServerUrlReference: "https://api."
- jwksKey: ""
- authCookieName: ""
+ apiServerKey: ""
+ apiServerSecret: ""
+ apiServerUrlReference: ""
+ authCookieName: ""
+ sslVerify: true
ingress:
- enabled: true
- hostName: "task-router.dev"
+ enabled: true
+ hostName: ""
tcpSession:
- routerAddress: ""
- portRange:
- start:
- end:
+ routerAddress: ""
+ service:
+ type: LoadBalancer
+ portRange:
+ start:
+ end:
```
-Edit it accordingly to this guidelines:
+**Configuration options:**
-* `clearml.apiServerUrlReference`: url usually starting with `https://api.`
-* `clearml.apiServerKey`: ClearML server api key
-* `clearml.apiServerSecret`: ClearML server secret key
-* `ingress.hostName`: url of router we configured previously for loadbalancer starting with `https://`
-* `clearml.sslVerify`: enable or disable SSL certificate validation on apiserver calls check
-* `clearml.authCookieName`: value from `value_prefix` key starting with `allegro_token` in `envoy.yaml` file in ClearML server installation.
-* `clearml.jwksKey`: value form `k` key in `jwks.json` file in ClearML server installation (see below)
-* `tcpSession.routerAddress`: router external address can be an IP or the host machine or a loadbalancer hostname, depends on the network configuration
-* `tcpSession.portRange.start`: start port for the TCP Session feature
-* `tcpSession.portRange.end`: end port port for the TCP Session feature
-
-::: How to find my jwkskey
-
-The *JSON Web Key Set* (*JWKS*) is a set of keys containing the public keys used to verify any JSON Web Token (JWT).
-
-```
-kubectl -n clearml get secret clearml-conf \
--o jsonpath='{.data.secure_auth_token_secret}' \
-| base64 -d && echo
-```
-
-:::
+* `imageCredentials.password`: ClearML DockerHub Access Token.
+* `clearml.apiServerKey`: ClearML server API key.
+* `clearml.apiServerSecret`: ClearML server secret key.
+* `clearml.apiServerUrlReference`: ClearML API server URL starting with `https://api.`.
+* `clearml.authCookieName`: Cookie used by the ClearML server to store the ClearML authentication cookie.
+* `clearml.sslVerify`: Enable or disable SSL certificate validation for `apiserver` calls.
+* `ingress.hostName`: Hostname of the router, used by the ingress controller to access it.
+* `tcpSession.routerAddress`: The external router address (can be an IP, hostname, or load balancer address) depending on your network setup. Ensure this address is accessible for TCP connections.
+* `tcpSession.service.type`: Service type used to expose TCP functionality, default is `NodePort`.
+* `tcpSession.portRange.start`: Start port for the TCP Session feature.
+* `tcpSession.portRange.end`: End port for the TCP Session feature.
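A quick way to catch override values that were left empty is to grep the file for empty quoted strings before installing. A minimal sketch (the sample file and its values are illustrative):

```shell
# Sample override file with one value intentionally left empty.
cat > override.yaml <<'EOF'
imageCredentials:
  password: "EXAMPLETOKEN"
clearml:
  apiServerKey: "EXAMPLEKEY"
  apiServerSecret: "EXAMPLESECRET"
  apiServerUrlReference: "https://api.example.com"
  authCookieName: ""
EOF

# Print every key whose quoted value is still empty.
grep -n ': ""' override.yaml && echo "fill in the values above before installing"
```

Running this before `helm upgrade --install` avoids deploying a router with blank credentials or cookie settings.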
-The whole list of supported configuration is available with the command:
+The full list of supported configuration options is available with the following command:
```
-helm show readme allegroai-enterprise/clearml-enterprise-task-traffic-router
+helm show readme clearml-enterprise/clearml-enterprise-task-traffic-router
@@ -94,9 +96,22 @@ To install the TTR component via Helm use the following command:
```
helm upgrade --install \
\
--n \
+-n \
-allegroai-enterprise/clearml-enterprise-task-traffic-router \
+clearml-enterprise/clearml-enterprise-task-traffic-router \
---version \
--f task-traffic-router.values-override.yaml
+--version \
+-f override.yaml
```
+Replace the placeholders with the following values:
+
+* `` - Unique name for the App Gateway Router within the K8s namespace. This is a required parameter in
+ Helm, which identifies a specific installation of the chart. The release name also defines the router’s name and
+ appears in the UI within AI workload application URLs (e.g. Remote IDE URLs). This can be customized to support multiple installations within the same
+ namespace by assigning different release names.
+* `` - [Kubernetes Namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/)
+ where workloads will be executed. This namespace must be shared between a dedicated ClearML Agent and an App
+ Gateway Router. The agent is responsible for monitoring its assigned task queues and spawning workloads within this
+ namespace. The router monitors the same namespace for AI workloads (e.g. remote IDE applications). The router has a
+ namespace-limited scope, meaning it can only detect and manage tasks within its
+ assigned namespace.
+* `` - Version recommended by the ClearML Support Team.
\ No newline at end of file
diff --git a/docs/deploying_clearml/enterprise_deploy/import_projects.md b/docs/deploying_clearml/enterprise_deploy/import_projects.md
index 3a162644..6c5d0387 100644
--- a/docs/deploying_clearml/enterprise_deploy/import_projects.md
+++ b/docs/deploying_clearml/enterprise_deploy/import_projects.md
@@ -36,7 +36,7 @@ them before exporting.
Execute the data tool within the `apiserver` container.
Open a bash session inside the `apiserver` container of the server:
-* In docker-compose:
+* In `docker-compose`:
```commandline
sudo docker exec -it clearml-apiserver /bin/bash
diff --git a/docs/deploying_clearml/enterprise_deploy/multi_tenant_k8s.md b/docs/deploying_clearml/enterprise_deploy/multi_tenant_k8s.md
index 45b4d4d2..60650bf2 100644
--- a/docs/deploying_clearml/enterprise_deploy/multi_tenant_k8s.md
+++ b/docs/deploying_clearml/enterprise_deploy/multi_tenant_k8s.md
@@ -100,9 +100,10 @@ Install the ClearML chart with the required configuration:
1. Prepare the `overrides.yaml` file and input the following content. Make sure to replace `` and ``
with a valid domain that will have records pointing to the ingress controller accordingly.
The credentials specified in `` and `` can be used to log in as the
- supervisor user in the web UI.
+ supervisor user in the web UI.
+
Note that the `` value must be explicitly quoted. To do so, put `\\"` around the quoted value.
- For example `"\\"email@example.com\\””`
+ For example `"\\"email@example.com\\""`.
```
imageCredentials:
@@ -192,7 +193,7 @@ Install the ClearML chart with the required configuration:
enabled: true
```
-2. Install ClearML
+2. Install ClearML:
```
helm install -n clearml \\
@@ -305,9 +306,9 @@ spec:
kubernetes.io/metadata.name: clearml
```
-## Applications Installation
+## Application Installation
-To install ClearML GUI applications, follow these steps:
+To install ClearML GUI applications:
1. Get the apps to install and the installation script by downloading and extracting the archive provided by ClearML
@@ -332,8 +333,8 @@ must be substituted with valid domain names or values from responses.
```
APISERVER_URL="https://api."
- APISERVER_KEY="GGS9F4M6XB2DXJ5AFT9F"
- APISERVER_SECRET="2oGujVFhPfaozhpuz2GzQfA5OyxmMsR3WVJpsCR5hrgHFs20PO"
+ APISERVER_KEY=""
+ APISERVER_SECRET=""
```
2. Create a *Tenant* (company):
@@ -491,7 +492,7 @@ To install the ClearML Agent Chart, follow these steps:
-d '{"name":"default"}'
```
-### Tenant Namespace isolation with NetworkPolicies
+### Tenant Namespace Isolation with NetworkPolicies
To ensure network isolation for each tenant, you need to create a `NetworkPolicy` in the tenant namespace. This way
the entire namespace/tenant will not accept any connection from other namespaces.
@@ -512,31 +513,30 @@ Create a `NetworkPolicy` in the tenant namespace with the following configuratio
- podSelector: {}
```
-### Install Task Traffic Router Chart
+### Install the App Gateway Router Chart
-Install the [Task Traffic Router](appgw.md) in your Kubernetes cluster, allowing it to manage and route tasks:
+Install the App Gateway Router in your Kubernetes cluster, allowing it to manage and route tasks:
1. Prepare the `overrides.yaml` file with the following content:
```
imageCredentials:
- password: ""
+ password: ""
clearml:
apiServerUrlReference: ""
apiserverKey: ""
apiserverSecret: ""
- jwksKey: "ymLh1ok5k5xNUQfS944Xdx9xjf0wueokqKM2dMZfHuH9ayItG2"
ingress:
enabled: true
hostName: ""
```
-2. Install Task Traffic Router in the specified tenant namespace:
+2. Install App Gateway Router in the specified tenant namespace:
```
helm install -n \\
clearml-ttr \\
- allegroai-enterprise/clearml-task-traffic-router \\
+ clearml-enterprise/clearml-task-traffic-router \\
--create-namespace \\
-f overrides.yaml
```
diff --git a/docs/deploying_clearml/enterprise_deploy/on_prem_ubuntu.md b/docs/deploying_clearml/enterprise_deploy/on_prem_ubuntu.md
index 1d823655..749f5066 100644
--- a/docs/deploying_clearml/enterprise_deploy/on_prem_ubuntu.md
+++ b/docs/deploying_clearml/enterprise_deploy/on_prem_ubuntu.md
@@ -43,7 +43,7 @@ should be reviewed and modified prior to the server installation
## Installing ClearML Server
### Preliminary Steps
-1. Install Docker CE
+1. Install Docker CE:
```
https://docs.docker.com/install/linux/docker-ce/ubuntu/
@@ -113,10 +113,10 @@ should be reviewed and modified prior to the server installation
sudo systemctl enable disable-thp
```
-1. Restart the machine
+1. Restart the machine.
### Installing the Server
-1. Remove any previous installation of ClearML Server
+1. Remove any previous installation of ClearML Server:
```
sudo rm -R /opt/clearml/
@@ -141,7 +141,7 @@ should be reviewed and modified prior to the server installation
sudo mkdir -pv /opt/allegro/config/onprem_poc
```
-1. Copy the following ClearML configuration files to `/opt/allegro`
+1. Copy the following ClearML configuration files to `/opt/allegro`:
* `constants.env`
* `docker-compose.override.yml`
* `docker-compose.yml`
@@ -165,10 +165,13 @@ should be reviewed and modified prior to the server installation
sudo docker login -u=$DOCKERHUB_USER -p=$DOCKERHUB_PASSWORD
```
-1. Start the `docker-compose` by changing directories to the directory containing the docker-compose files and running the following command:
-sudo docker-compose --env-file constants.env up -d
-
-1. Verify web access by browsing to your URL (IP address) and port 8080.
+1. Start the services with `docker-compose`: change to the directory containing the `docker-compose` files and run the following command:
+
+ ```
+ sudo docker-compose --env-file constants.env up -d
+ ```
+
+1. Verify web access by browsing to your URL (IP address) and port 8080:
```
http://:8080
@@ -191,7 +194,10 @@ the following subdomains should be forwarded to the corresponding ports on the s
* `https://app.` should be forwarded to port 8080
* `https://files.` should be forwarded to port 8081
+
+:::warning
**Critical: Ensure no other ports are open to maintain the highest level of security.**
+:::
Additionally, ensure that the following URLs are correctly configured in the server's environment file:
diff --git a/docs/deploying_clearml/enterprise_deploy/vpc_aws.md b/docs/deploying_clearml/enterprise_deploy/vpc_aws.md
index 6f9680a9..17f048f6 100644
--- a/docs/deploying_clearml/enterprise_deploy/vpc_aws.md
+++ b/docs/deploying_clearml/enterprise_deploy/vpc_aws.md
@@ -8,7 +8,7 @@ It covers the following:
* Set up security groups and IAM role
* Create EC2 instance with required disks
* Install dependencies and mount disks
-* Deploy ClearML version using docker-compose
+* Deploy ClearML version using `docker-compose`
* Set up load balancer and DNS
* Set up server backup
@@ -38,7 +38,7 @@ It is recommended to use a VPC with IPv6 enabled for future usage expansion.
1. Create a security group for the main server (`clearml-main`):
* Ingress:
- * TCP port 10000, from the load balancer's security group
+ * TCP port 10000 from the load balancer's security group
* TCP port 22 from trusted IP addresses.
* Egress: All addresses and ports
@@ -117,10 +117,10 @@ Instance requirements:
## Load Balancer
1. Create a TLS certificate:
- 1. Choose a domain name to be used with the server. The main URL that will be used by the system’s users will be app.\
+ 1. Choose a domain name to be used with the server. The main URL that will be used by the system’s users will be `app.`
2. Create a certificate, with the following DNS names:
- 1. \
- 2. \*.\
+ 1. ``
+ 2. `*.`
2. Create the `envoy` target group for the server:
1. Port: 10000
@@ -284,7 +284,7 @@ log would usually indicate the reason for the failure.
## Maintenance
-### Removing app containers
+### Removing App Containers
To remove old application containers, add the following to the cron:
diff --git a/docs/faq.md b/docs/faq.md
index 3b5f642c..097a25ea 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -137,7 +137,8 @@ the following numbers are displayed:
* API server version
* API version
-
+
+
ClearML Python package information can be obtained by using `pip freeze`.
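For example, filtering the `pip freeze` output for ClearML packages (assuming `python3` with `pip` is on the path):

```shell
# Show installed ClearML-related package versions, if any.
python3 -m pip freeze 2>/dev/null | grep -i '^clearml' || echo "clearml not installed"
```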
@@ -593,7 +594,8 @@ Due to speed/optimization issues, the console displays only the last several hun
You can always download the full log as a file using the ClearML Web UI. In the **ClearML Web UI >** task's **CONSOLE**
tab, click `Download full log`.
-
+
+
@@ -604,17 +606,19 @@ and accuracy values of several tasks. In the task comparison page, under the **H
you can visualize tasks' hyperparameter values in relation to performance metrics in a scatter plot or parallel
coordinates plot:
* [Scatter plot](webapp/webapp_exp_comparing.md#scatter-plot): View the correlation between a selected hyperparameter and
- metric. For example, the image below shows a scatter plot that displays the values of a performance metric (`epoch_accuracy`)
+ metric. For example, the image below shows a scatter plot that displays the values of a performance metric (`accuracy`)
and a hyperparameter (`epochs`) of a few tasks:
- 
+ 
+ 
* [Parallel coordinates plot](webapp/webapp_exp_comparing.md#parallel-coordinates-mode): View the impact of hyperparameters
on selected metric(s). For example, the image below shows
- a parallel coordinates plot which displays the values of selected hyperparameters (`base_lr`, `batch_size`, and
- `number_of_epochs`) and a performance metric (`accuracy`) of three tasks:
+ a parallel coordinates plot which displays the values of selected hyperparameters (`epochs`, `lr`, and `batch_size`)
+ and a performance metric (`accuracy`) of a few tasks:
- 
+ 
+ 
diff --git a/docs/getting_started/clearml_agent_scheduling.md b/docs/getting_started/clearml_agent_scheduling.md
index ed3c5948..358a72cd 100644
--- a/docs/getting_started/clearml_agent_scheduling.md
+++ b/docs/getting_started/clearml_agent_scheduling.md
@@ -82,7 +82,7 @@ Currently, these runtime properties can only be set using an ClearML REST API ca
endpoint, as follows:
* The body of the request must contain the `worker-id`, and the runtime property to add.
-* An expiry date is optional. Use the format `"expiry":