diff --git a/docs/clearml_agent/clearml_agent_env_var.md b/docs/clearml_agent/clearml_agent_env_var.md index 63122ed1..75a8f3bb 100644 --- a/docs/clearml_agent/clearml_agent_env_var.md +++ b/docs/clearml_agent/clearml_agent_env_var.md @@ -27,6 +27,7 @@ but can be overridden by command-line arguments. |**CLEARML_AGENT_DOCKER_ARGS_HIDE_ENV** | Hide Docker environment variables containing secrets when printing out the Docker command. When printed, the variable values will be replaced by `********`. See [`agent.hide_docker_command_env_vars`](../configs/clearml_conf.md#hide_docker) | |**CLEARML_AGENT_DISABLE_SSH_MOUNT** | Disables the auto `.ssh` mount into the docker | |**CLEARML_AGENT_FORCE_CODE_DIR**| Allows overriding the remote execution code directory to bypass repository cloning and use a repo already available where the remote agent is running. | +|**CLEARML_AGENT_FORCE_UV**| If set to `1`, force the agent to use UV as the package manager. Overrides the default manager set in the [clearml.conf](../configs/clearml_conf.md) under `agent.package_manager.type` | |**CLEARML_AGENT_FORCE_EXEC_SCRIPT**| Allows overriding the remote execution script to bypass repository cloning and execute code already available where the remote agent is running. Use `module:file.py` format to specify a module and a script to execute (e.g. `.:main.py` to run `main.py` from the working dir)| |**CLEARML_AGENT_FORCE_TASK_INIT**| If set to `1`, ClearML Agent adds `Task.init()` to scripts that do not have the call, creating a Task to capture code execution information and output, which is then sent to the ClearML Server. If set to `0` and the script does not include `Task.init()`, the agent will capture only the output streams and console output, without tracking code execution details, metrics, or models. 
| |**CLEARML_AGENT_FORCE_SYSTEM_SITE_PACKAGES** | If set to `1`, overrides default [`agent.package_manager.system_site_packages: true`](../configs/clearml_conf.md#system_site_packages) behavior when running tasks in containers (docker mode and k8s-glue)| diff --git a/docs/clearml_agent/clearml_agent_execution_env.md b/docs/clearml_agent/clearml_agent_execution_env.md index 0180da86..fcb094bd 100644 --- a/docs/clearml_agent/clearml_agent_execution_env.md +++ b/docs/clearml_agent/clearml_agent_execution_env.md @@ -13,6 +13,7 @@ multiple tasks (see [Virtual Environment Reuse](clearml_agent_env_caching.md#vir ClearML Agent supports working with one of the following package managers: * [`pip`](https://en.wikipedia.org/wiki/Pip_(package_manager)) (default) * [`conda`](https://docs.conda.io/en/latest/) +* [`uv`](https://docs.astral.sh/uv/) * [`poetry`](https://python-poetry.org/) To change the package manager used by the agent, edit the [`package_manager.type`](../configs/clearml_conf.md#agentpackage_manager) diff --git a/docs/clearml_agent/clearml_agent_fractional_gpus.md b/docs/clearml_agent/clearml_agent_fractional_gpus.md index 5aa55ed1..c114d4e4 100644 --- a/docs/clearml_agent/clearml_agent_fractional_gpus.md +++ b/docs/clearml_agent/clearml_agent_fractional_gpus.md @@ -80,7 +80,7 @@ For either setup, you can set up in your Enterprise ClearML Agent Helm chart the each queue. When a task is enqueued in ClearML, it translates into a Kubernetes pod running on the designated device with the specified fractional resource as defined in the Agent Helm chart. -#### MIG-enabled GPUs +#### MIG-enabled GPUs The **ClearML Dynamic MIG Operator** (CDMO) chart enables running AI workloads on K8s with optimized hardware utilization and workload performance by facilitating MIG GPU partitioning. Make sure you have a [MIG capable GPU](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#supported-gpus). 
diff --git a/docs/clearml_sdk/clearml_sdk_setup.md b/docs/clearml_sdk/clearml_sdk_setup.md index 6f8a82c7..ccb56523 100644 --- a/docs/clearml_sdk/clearml_sdk_setup.md +++ b/docs/clearml_sdk/clearml_sdk_setup.md @@ -2,7 +2,7 @@ title: ClearML Python Package --- -This is step-by-step guide for installing the `clearml` Python package and connecting it to the ClearML Server. Once done, +This is a step-by-step guide for installing the `clearml` Python package and connecting it to the ClearML Server. Once done, you can integrate `clearml` into your code. ## Install ClearML diff --git a/docs/clearml_sdk/task_sdk.md b/docs/clearml_sdk/task_sdk.md index 611e4a29..f787eb6f 100644 --- a/docs/clearml_sdk/task_sdk.md +++ b/docs/clearml_sdk/task_sdk.md @@ -74,6 +74,7 @@ After invoking `Task.init` in a script, ClearML starts its automagical logging, * [AutoKeras](../integrations/autokeras.md) * [CatBoost](../integrations/catboost.md) * [Fast.ai](../integrations/fastai.md) + * [Hugging Face Transformers](../integrations/transformers.md) * [LightGBM](../integrations/lightgbm.md) * [MegEngine](../integrations/megengine.md) * [MONAI](../integrations/monai.md) diff --git a/docs/clearml_serving/clearml_serving_setup.md b/docs/clearml_serving/clearml_serving_setup.md index 05219921..4820a946 100644 --- a/docs/clearml_serving/clearml_serving_setup.md +++ b/docs/clearml_serving/clearml_serving_setup.md @@ -15,7 +15,7 @@ The following page goes over how to set up and upgrade `clearml-serving`. [free hosted service](https://app.clear.ml) 1. Connect `clearml` SDK to the server, see instructions [here](../clearml_sdk/clearml_sdk_setup#install-clearml) -1. Install clearml-serving CLI: +1. Install the `clearml-serving` CLI: ```bash pip3 install clearml-serving @@ -27,21 +27,22 @@ The following page goes over how to set up and upgrade `clearml-serving`. 
clearml-serving create --name "serving example" ``` - The new serving service UID should be printed + This command prints the Serving Service UID: ```console New Serving Service created: id=aa11bb22aa11bb22 ``` - Write down the Serving Service UID + Copy the Serving Service UID (e.g., `aa11bb22aa11bb22`), as you will need it in the next steps. 1. Clone the `clearml-serving` repository: ```bash git clone https://github.com/clearml/clearml-serving.git ``` -1. Edit the environment variables file (docker/example.env) with your clearml-server credentials and Serving Service UID. - For example, you should have something like +1. Edit the environment variables file (`docker/example.env`) with your `clearml-server` API credentials and Serving Service UID. + For example: + ```bash cat docker/example.env ``` @@ -55,31 +56,30 @@ The following page goes over how to set up and upgrade `clearml-serving`. CLEARML_SERVING_TASK_ID="" ``` -1. Spin up the `clearml-serving` containers with `docker-compose` (or if running on Kubernetes, use the helm chart) +1. 
Spin up the `clearml-serving` containers with `docker-compose` (or if running on Kubernetes, use the helm chart): ```bash cd docker && docker-compose --env-file example.env -f docker-compose.yml up ``` - If you need Triton support (keras/pytorch/onnx etc.), use the triton docker-compose file + If you need Triton support (Keras/PyTorch/ONNX etc.), use the triton `docker-compose` file: ```bash cd docker && docker-compose --env-file example.env -f docker-compose-triton.yml up ``` - If running on a GPU instance with Triton support (keras/pytorch/onnx etc.), use the triton gpu docker-compose file: + If running on a GPU instance with Triton support (Keras/PyTorch/ONNX etc.), use the triton gpu docker-compose file: ```bash cd docker && docker-compose --env-file example.env -f docker-compose-triton-gpu.yml up ``` :::note -Any model that registers with Triton engine will run the pre/post-processing code on the Inference service container, +Any model that registers with Triton engine will run the pre/post-processing code in the Inference service container, and the model inference itself will be executed on the Triton Engine container. ::: ## Advanced Setup - S3/GS/Azure Access (Optional) -To add access credentials and allow the inference containers to download models from your S3/GS/Azure object-storage, -add the respective environment variables to your env files (example.env). For further details, see -[Configuring Storage](../integrations/storage.md#configuring-storage). +To enable inference containers to download models from S3, Google Cloud Storage (GS), or Azure, +add access credentials in the respective environment variables to your env files (`example.env`): ``` AWS_ACCESS_KEY_ID @@ -92,14 +92,21 @@ AZURE_STORAGE_ACCOUNT AZURE_STORAGE_KEY ``` +For further details, see [Configuring Storage](../integrations/storage.md#configuring-storage). + ## Upgrading ClearML Serving **Upgrading to v1.1** -1. Take down the serving containers (`docker-compose` or k8s) -1. 
Update the `clearml-serving` CLI `pip3 install -U clearml-serving` +1. Shut down the serving containers (`docker-compose` or k8s) +1. Update the `clearml-serving` CLI: + + ``` + pip3 install -U clearml-serving + ``` + 1. Re-add a single existing endpoint with `clearml-serving model add ...` (press yes when asked). It will upgrade the - `clearml-serving` session definitions + `clearml-serving` session definitions. 1. Pull the latest serving containers (`docker-compose pull ...` or k8s) 1. Re-spin serving containers (`docker-compose` or k8s) diff --git a/docs/configs/clearml_conf.md b/docs/configs/clearml_conf.md index b24a46a8..a45308c2 100644 --- a/docs/configs/clearml_conf.md +++ b/docs/configs/clearml_conf.md @@ -515,8 +515,12 @@ These settings define which Docker image and arguments should be used unless [ex **`agent.package_manager`** (*dict*) -* Dictionary containing the options for the Python package manager. The currently supported package managers are pip, conda, - and, if the repository contains a `poetry.lock` file, poetry. +* Dictionary containing the options for the Python package manager. +* The currently supported package managers are + * pip + * conda + * uv, if the root repository contains a `uv.lock` or `pyproject.toml` file + * poetry, if the repository contains a `poetry.lock` or `pyproject.toml` file --- @@ -661,13 +665,38 @@ Torch Nightly builds are ephemeral and are deleted from time to time. * `pip` * `conda` * `poetry` + * `uv` * If `pip` or `conda` are used, the agent installs the required packages based on the "Python Packages" section of the Task. If the "Python Packages" section is empty, it will revert to using `requirements.txt` from the repository's root - directory. If `poetry` is selected, and the root repository contains `poetry.lock` or `pyproject.toml`, the "Python + directory. 
+* If `poetry` is selected, and the root repository contains `poetry.lock` or `pyproject.toml`, the "Python Packages" section is ignored, and `poetry` is used. If `poetry` is selected and no lock file is found, it reverts to `pip` package manager behaviour. - +* If `uv` is selected, and the root repository contains `uv.lock` or `pyproject.toml`, the "Python + Packages" section is ignored, and `uv` is used. If `uv` is selected and no lock file is found, it reverts to + `pip` package manager behaviour. + +--- + +**`agent.package_manager.uv_files_from_repo_working_dir`** (*bool*) + +* If set to `true`, the agent will look for the `uv.lock` or `pyproject.toml` file in the provided directory path instead of + the repository's root directory. + +--- + +**`agent.package_manager.uv_sync_extra_args`** (*list*) + +* List of extra command-line arguments to pass when using `uv`. + +--- + +**`agent.package_manager.uv_version`** (*string*) + +* The `uv` version requirements. For example, `">0.4"`, `"==0.4"`, `""` (empty string will install the latest version). + +
#### agent.pip_download_cache diff --git a/docs/deploying_clearml/clearml_server.md b/docs/deploying_clearml/clearml_server.md index dc137d76..9a545482 100644 --- a/docs/deploying_clearml/clearml_server.md +++ b/docs/deploying_clearml/clearml_server.md @@ -4,7 +4,7 @@ title: ClearML Server ## What is ClearML Server? The ClearML Server is the backend service infrastructure for ClearML. It allows multiple users to collaborate and -manage their tasks by working seamlessly with the ClearML Python package and [ClearML Agent](../clearml_agent.md). +manage their tasks by working seamlessly with the [ClearML Python package](../clearml_sdk/clearml_sdk_setup.md) and [ClearML Agent](../clearml_agent.md). ClearML Server is composed of the following: * Web server including the [ClearML Web UI](../webapp/webapp_overview.md), which is the user interface for tracking, comparing, and managing tasks. diff --git a/docs/deploying_clearml/clearml_server_config.md b/docs/deploying_clearml/clearml_server_config.md index 09b4e835..c036c574 100644 --- a/docs/deploying_clearml/clearml_server_config.md +++ b/docs/deploying_clearml/clearml_server_config.md @@ -233,7 +233,7 @@ The following example, which is based on AWS load balancing, demonstrates the co -### Opening Elasticsearch, MongoDB, and Redis for External Access +### Opening Elasticsearch, MongoDB, and Redis for External Access For improved security, the ports for ClearML Server Elasticsearch, MongoDB, and Redis servers are not exposed by default; they are only open internally in the docker network. 
If external access is needed, open these ports (but make sure to diff --git a/docs/deploying_clearml/enterprise_deploy/app_custom.md b/docs/deploying_clearml/enterprise_deploy/app_custom.md index 84a37223..3b46cc93 100644 --- a/docs/deploying_clearml/enterprise_deploy/app_custom.md +++ b/docs/deploying_clearml/enterprise_deploy/app_custom.md @@ -29,12 +29,12 @@ The `General` section is the root-level section of the configuration file, and c * `id` - A unique id for the application * `name` - The name to display in the web application * `version` - The version of the application implementation. Recommended to have three numbers and to bump up when updating applications, so that older running instances can still be displayed -* `provider` - The person/team/group who is the owner of the application. This will appears in the UI +* `provider` - The person/team/group who is the owner of the application. This will appear in the UI * `description` - Short description of the application to be displayed in the ClearML Web UI * `icon` (*Optional*) - Small image to display in the ClearML web UI as an icon for the application. Can be a public web url or an image in the application’s assets directory (described below) * `no_info_html` (*Optional*) - HTML content to display as a placeholder for the dashboard when no instance is available. Can be a public web url or a file in the application’s assets directory (described below) * `default-queue` - The queue to which application instance will be sent when launching a new instance. This queue should have an appropriate agent servicing it. See details in the Custom Apps Agent section below. -* `badges` (*Optional*) - List of strings to display as a bacge/label in the UI +* `badges` (*Optional*) - List of strings to display as a badge/label in the UI * `resumable` - Boolean indication whether a running application instance can be restarted if required. Default is false. 
* `category` (*Optional*) - Way to separate apps into different tabs in the ClearML web UI * `featured` (*Optional*) - Value affecting the order of applications. Lower values are displayed first. Defaults to 500 @@ -264,7 +264,7 @@ The dashboard elements are organized into lines. The section contains the following information: * `lines` - The array of line elements, each containing: - * `style` - CSS definitions for the line e.g setting the line height + * `style` - CSS definitions for the line e.g. setting the line height * `contents` - An array of dashboard elements to display in a given line. Each element may have several fields: * `title` - Text to display at the top of the field * `type` - one of the following: diff --git a/docs/deploying_clearml/enterprise_deploy/appgw.md b/docs/deploying_clearml/enterprise_deploy/appgw.md index 2679df85..647c575a 100644 --- a/docs/deploying_clearml/enterprise_deploy/appgw.md +++ b/docs/deploying_clearml/enterprise_deploy/appgw.md @@ -30,12 +30,12 @@ their instances: * [Embedding Model Deployment](../../webapp/applications/apps_embed_model_deployment.md) * [Llama.cpp Model Deployment](../../webapp/applications/apps_llama_deployment.md) -The AI Application Gateway is provided through an additional component to the ClearML Server deployment: The ClearML Task Traffic Router. -If your ClearML Deployment does not have the Task Traffic Router properly installed, these application instances may not be accessible. +The AI Application Gateway requires an additional component to the ClearML Server deployment: the **ClearML App Gateway Router**. +If your ClearML Deployment does not have the App Gateway Router properly installed, these application instances may not be accessible. 
#### Installation -The Task Traffic Router supports two deployment options: +The App Gateway Router supports two deployment options: * [Docker Compose](appgw_install_compose.md) * [Kubernetes](appgw_install_k8s.md) diff --git a/docs/deploying_clearml/enterprise_deploy/appgw_install_compose.md b/docs/deploying_clearml/enterprise_deploy/appgw_install_compose.md index c77f4113..de16511f 100644 --- a/docs/deploying_clearml/enterprise_deploy/appgw_install_compose.md +++ b/docs/deploying_clearml/enterprise_deploy/appgw_install_compose.md @@ -40,77 +40,72 @@ This is an example of the `docker-compose` file you will need: ``` version: '3.5' services: -task_traffic_webserver: - image: allegroai/task-traffic-router-webserver:${TASK-TRAFFIC-ROUTER-WEBSERVER-TAG} - ports: - - "80:8080" - restart: unless-stopped - container_name: task_traffic_webserver - volumes: - - ./task_traffic_router/config/nginx:/etc/nginx/conf.d:ro - - ./task_traffic_router/config/lua:/usr/local/openresty/nginx/lua:ro -task_traffic_router: - image: allegroai/task-traffic-router:${TASK-TRAFFIC-ROUTER-TAG} - restart: unless-stopped - container_name: task_traffic_router - volumes: - - /var/run/docker.sock:/var/run/docker.sock - - ./task_traffic_router/config/nginx:/etc/nginx/conf.d:rw - - ./task_traffic_router/config/lua:/usr/local/openresty/nginx/lua:rw - environment: - - LOGGER_LEVEL=INFO - - CLEARML_API_HOST=${CLEARML_API_HOST:?err} - - CLEARML_API_ACCESS_KEY=${CLEARML_API_ACCESS_KEY:?err} - - CLEARML_API_SECRET_KEY=${CLEARML_API_SECRET_KEY:?err} - - ROUTER_URL=${ROUTER_URL:?err} - - ROUTER_NAME=${ROUTER_NAME:?err} - - AUTH_ENABLED=${AUTH_ENABLED:?err} - - SSL_VERIFY=${SSL_VERIFY:?err} - - AUTH_COOKIE_NAME=${AUTH_COOKIE_NAME:?err} - - AUTH_BASE64_JWKS_KEY=${AUTH_BASE64_JWKS_KEY:?err} - - LISTEN_QUEUE_NAME=${LISTEN_QUEUE_NAME} - - EXTRA_BASH_COMMAND=${EXTRA_BASH_COMMAND} - - TCP_ROUTER_ADDRESS=${TCP_ROUTER_ADDRESS} - - TCP_PORT_START=${TCP_PORT_START} - - TCP_PORT_END=${TCP_PORT_END} - + 
task_traffic_webserver: + image: clearml/ai-gateway-proxy:${PROXY_TAG:?err} + network_mode: "host" + restart: unless-stopped + container_name: task_traffic_webserver + volumes: + - ./task_traffic_router/config/nginx:/etc/nginx/conf.d:ro + - ./task_traffic_router/config/lua:/usr/local/openresty/nginx/lua:ro + task_traffic_router: + image: clearml/ai-gateway-router:${ROUTER_TAG:?err} + restart: unless-stopped + container_name: task_traffic_router + volumes: + - /var/run/docker.sock:/var/run/docker.sock + - ./task_traffic_router/config/nginx:/etc/nginx/conf.d:rw + - ./task_traffic_router/config/lua:/usr/local/openresty/nginx/lua:rw + environment: + - ROUTER_NAME=${ROUTER_NAME:?err} + - ROUTER__WEBSERVER__SERVER_PORT=${ROUTER__WEBSERVER__SERVER_PORT:?err} + - ROUTER_URL=${ROUTER_URL:?err} + - CLEARML_API_HOST=${CLEARML_API_HOST:?err} + - CLEARML_API_ACCESS_KEY=${CLEARML_API_ACCESS_KEY:?err} + - CLEARML_API_SECRET_KEY=${CLEARML_API_SECRET_KEY:?err} + - AUTH_COOKIE_NAME=${AUTH_COOKIE_NAME:?err} + - AUTH_SECURE_ENABLED=${AUTH_SECURE_ENABLED} + - TCP_ROUTER_ADDRESS=${TCP_ROUTER_ADDRESS} + - TCP_PORT_START=${TCP_PORT_START} + - TCP_PORT_END=${TCP_PORT_END} ``` -Create a *runtime.env* file containing the following entries: +Create a `runtime.env` file containing the following entries: ``` -TASK-TRAFFIC-ROUTER-WEBSERVER-TAG= -TASK-TRAFFIC-ROUTER-TAG= -CLEARML_API_HOST=https://api. 
+PROXY_TAG= +ROUTER_TAG= +ROUTER_NAME=main-router +ROUTER__WEBSERVER__SERVER_PORT=8010 +ROUTER_URL= +CLEARML_API_HOST= CLEARML_API_ACCESS_KEY= CLEARML_API_SECRET_KEY= -ROUTER_URL= -ROUTER_NAME=main-router -AUTH_ENABLED=true -SSL_VERIFY=true AUTH_COOKIE_NAME= -AUTH_BASE64_JWKS_KEY= -LISTEN_QUEUE_NAME= -EXTRA_BASH_COMMAND= +AUTH_SECURE_ENABLED=true TCP_ROUTER_ADDRESS= TCP_PORT_START= TCP_PORT_END= ``` Edit it according to the following guidelines: - -* `CLEARML_API_HOST`: URL usually starting with `https://api.` -* `CLEARML_API_ACCESS_KEY`: ClearML server api key -* `CLEARML_API_SECRET_KEY`: ClearML server secret key -* `ROUTER_URL`: URL for this router that was previously configured in the load balancer starting with `https://` -* `ROUTER_NAME`: Unique name for this router -* `AUTH_ENABLED`: Enable or disable http calls authentication when the router is communicating with the ClearML server -* `SSL_VERIFY`: Enable or disable SSL certificate validation when the router is communicating with the ClearML server -* `AUTH_COOKIE_NAME`: Cookie name used by the ClearML server to store the ClearML authentication cookie. This can usually be found in the `value_prefix` key starting with `allegro_token` in `envoy.yaml` file in the ClearML server installation (`/opt/allegro/config/envoy/envoy.yaml`) (see below) -* `AUTH_SECURE_ENABLED`: Enable the Set-Cookie `secure` parameter -* `AUTH_BASE64_JWKS_KEY`: Value form `k` key in the `jwks.json` file in the ClearML server installation -* `LISTEN_QUEUE_NAME`: (*optional*) Name of queue to check for tasks (if none, every task is checked) -* `EXTRA_BASH_COMMAND`: Command to be launched before starting the router +* `PROXY_TAG`: AI Application Gateway proxy tag. The Docker image tag for the proxy component, which needs to be + specified during installation. This tag is provided by ClearML to ensure compatibility with the recommended version. +* `ROUTER_TAG`: App Gateway Router tag. The Docker image tag for the router component. 
It defines the specific version + to be installed and is provided by ClearML as part of the setup process. +* `ROUTER_NAME`: In the case of [multiple routers on the same tenant](#multiple-router-in-the-same-tenant), each router + needs to have a unique name. +* `ROUTER__WEBSERVER__SERVER_PORT`: Webserver port. The default port is 8080, but it can be adjusted to meet specific network requirements. +* `ROUTER_URL`: External address to access the router. This can be the IP address or DNS of the node where the router + is running, or the address of a load balancer if the router operates behind a proxy/load balancer. This URL is used + to access AI workload applications (e.g. remote IDE, model deployment, etc.), so it must be reachable and resolvable for them. +* `CLEARML_API_HOST`: ClearML API server URL starting with `https://api.` +* `CLEARML_API_ACCESS_KEY`: ClearML server API key. +* `CLEARML_API_SECRET_KEY`: ClearML server secret key. +* `AUTH_COOKIE_NAME`: Name of the cookie used by the ClearML server for authentication. This can usually be + found in the `envoy.yaml` file in the ClearML server installation (`/opt/allegro/config/envoy/envoy.yaml`), under the + `value_prefix` key starting with `allegro_token` +* `AUTH_SECURE_ENABLED`: Enable the Set-Cookie `secure` parameter. Set to `false` if services are exposed over `http`. * `TCP_ROUTER_ADDRESS`: Router external address, can be an IP or the host machine or a load balancer hostname, depends on network configuration * `TCP_PORT_START`: Start port for the TCP Session feature * `TCP_PORT_END`: End port for the TCP Session feature @@ -121,12 +116,42 @@ Run the following command to start the router: sudo docker compose --env-file runtime.env up -d ``` -:::note How to find my jwkskey +### Advanced Configuration -The *JSON Web Key Set* (*JWKS*) is a set of keys containing the public keys used to verify any JSON Web Token (JWT). 
+#### Using Open HTTP -In a `docker-compose` server installation, this can be found in the `CLEARML__secure__auth__token_secret` env var in the apiserver server component. +To deploy the App Gateway Router on open HTTP (without a certificate), set the `AUTH_SECURE_ENABLED` entry +to `false` in the `runtime.env` file. -::: +#### Multiple Router in the Same Tenant +If you have workloads running in separate networks that cannot communicate with each other, you need to deploy multiple +routers, one for each isolated environment. Each router will only process tasks from designated queues, ensuring that +tasks are correctly routed to agents within the same network. +For example: +* If Agent A and Agent B are in separate networks, each must have its own router to receive tasks. +* Router A will handle tasks from Agent A’s queues. Router B will handle tasks from Agent B’s queues. + +To achieve this, each router must be configured with: +* A unique `ROUTER_NAME` +* A distinct set of queues defined in `LISTEN_QUEUE_NAME`. + +##### Example Configuration +Each router's `runtime.env` file should include: + +* Router A: + + ``` + ROUTER_NAME=router-a + LISTEN_QUEUE_NAME=queue1,queue2 + ``` + +* Router B: + + ``` + ROUTER_NAME=router-b + LISTEN_QUEUE_NAME=queue3,queue4 + ``` + +Make sure `LISTEN_QUEUE_NAME` is set in the [`docker-compose` environment variables](#docker-compose-file) for each router instance. diff --git a/docs/deploying_clearml/enterprise_deploy/appgw_install_k8s.md b/docs/deploying_clearml/enterprise_deploy/appgw_install_k8s.md index 906429c6..3a7e546f 100644 --- a/docs/deploying_clearml/enterprise_deploy/appgw_install_k8s.md +++ b/docs/deploying_clearml/enterprise_deploy/appgw_install_k8s.md @@ -3,17 +3,26 @@ title: Kubernetes Deployment --- :::important Enterprise Feature -The Application Gateway is available under the ClearML Enterprise plan. +The AI Application Gateway is available under the ClearML Enterprise plan. 
+::: + +This guide details the installation of the ClearML App Gateway Router. +The App Gateway Router enables access to your AI workload applications (e.g. remote IDEs like VSCode and Jupyter, model API interface, etc.). +It acts as a proxy, identifying ClearML Tasks running within its [K8s namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/) +and making them available for network access. + +:::important +The App Gateway Router must be installed in the same K8s namespace as a dedicated ClearML Agent. +It can only configure access for ClearML Tasks within its own namespace. ::: -This guide details the installation of the ClearML AI Application Gateway, specifically the ClearML Task Router Component. ## Requirements * Kubernetes cluster: `>= 1.21.0-0 < 1.32.0-0` * Helm installed and configured -* Helm token to access `allegroai` helm-chart repo -* Credentials for `allegroai` docker repo +* Helm token to access `clearml` helm-chart repo +* Credentials for `clearml` docker repo * A valid ClearML Server installation ## Optional for HTTPS @@ -26,62 +35,55 @@ This guide details the installation of the ClearML AI Application Gateway, speci ### Login ``` -helm repo add allegroai-enterprise \ +helm repo add clearml-enterprise \ https://raw.githubusercontent.com/clearml/clearml-enterprise-helm-charts/gh-pages \ --username \ --password ``` +Replace `` with your valid GitHub token that has access to the ClearML Enterprise Helm charts repository. + ### Prepare Values -Before installing the TTR, create a `helm-override` files named `task-traffic-router.values-override.yaml`: +Before installing the App Gateway Router, create a Helm override file: ``` imageCredentials: - password: "" + password: "" clearml: - apiServerKey: "" - apiServerSecret: "" - apiServerUrlReference: "https://api." 
- jwksKey: "" - authCookieName: "" + apiServerKey: "" + apiServerSecret: "" + apiServerUrlReference: "" + authCookieName: "" + sslVerify: true ingress: - enabled: true - hostName: "task-router.dev" + enabled: true + hostName: "" tcpSession: - routerAddress: "" - portRange: - start: - end: + routerAddress: "" + service: + type: LoadBalancer + portRange: + start: + end: ``` -Edit it accordingly to these guidelines: +Configuration options: -* `clearml.apiServerUrlReference`: URL usually starting with `https://api.` -* `clearml.apiServerKey`: ClearML server api key -* `clearml.apiServerSecret`: ClearML server secret key -* `ingress.hostName`: URL of router we configured previously for load balancer starting with `https://` -* `clearml.sslVerify`: Enable or disable SSL certificate validation on apiserver calls check -* `clearml.authCookieName`: Value from `value_prefix` key starting with `allegro_token` in `envoy.yaml` file in ClearML server installation. -* `clearml.jwksKey`: Value form `k` key in `jwks.json` file in ClearML server installation (see below) -* `tcpSession.routerAddress`: Router external address can be an IP or the host machine or a load balancer hostname, depends on the network configuration -* `tcpSession.portRange.start`: Start port for the TCP Session feature -* `tcpSession.portRange.end`: End port for the TCP Session feature - -:::note How to find my jwkskey - -The *JSON Web Key Set* (*JWKS*) is a set of keys containing the public keys used to verify any JSON Web Token (JWT). - -``` -kubectl -n clearml get secret clearml-conf \ --o jsonpath='{.data.secure_auth_token_secret}' \ -| base64 -d && echo -``` - -::: +* `imageCredentials.password`: ClearML DockerHub Access Token. +* `clearml.apiServerKey`: ClearML server API key. +* `clearml.apiServerSecret`: ClearML server secret key. +* `clearml.apiServerUrlReference`: ClearML API server URL starting with `https://api.`. 
+* `clearml.authCookieName`: Name of the cookie used by the ClearML server for authentication. +* `clearml.sslVerify`: Enable or disable SSL certificate validation on calls to the `apiserver`. +* `ingress.hostName`: Hostname of router used by the ingress controller to access it. +* `tcpSession.routerAddress`: The external router address (can be an IP, hostname, or load balancer address) depending on your network setup. Ensure this address is accessible for TCP connections. +* `tcpSession.service.type`: Service type used to expose TCP functionality, default is `NodePort`. +* `tcpSession.portRange.start`: Start port for the TCP Session feature. +* `tcpSession.portRange.end`: End port for the TCP Session feature. -The whole list of supported configuration is available with the command: +The full list of supported configuration options is available with the command: ``` helm show readme allegroai-enterprise/clearml-enterprise-task-traffic-router @@ -94,9 +96,22 @@ To install the TTR component via Helm use the following command: ``` helm upgrade --install \ \ --n \ +-n \ allegroai-enterprise/clearml-enterprise-task-traffic-router \ ---version \ --f task-traffic-router.values-override.yaml +--version \ +-f override.yaml ``` +Replace the placeholders with the following values: + +* `` - Unique name for the App Gateway Router within the K8s namespace. This is a required parameter in + Helm, which identifies a specific installation of the chart. The release name also defines the router’s name and + appears in the UI within AI workload application URLs (e.g. Remote IDE URLs). This can be customized to support multiple installations within the same + namespace by assigning different release names. +* `` - [Kubernetes Namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/) + where workloads will be executed. This namespace must be shared between a dedicated ClearML Agent and an App + Gateway Router. 
The agent is responsible for monitoring its assigned task queues and spawning workloads within this + namespace. The router monitors the same namespace for AI workloads (e.g. remote IDE applications). The router has a + namespace-limited scope, meaning it can only detect and manage tasks within its + assigned namespace. +* `` - Version recommended by the ClearML Support Team. \ No newline at end of file diff --git a/docs/deploying_clearml/enterprise_deploy/multi_tenant_k8s.md b/docs/deploying_clearml/enterprise_deploy/multi_tenant_k8s.md index 7ab41ef4..60650bf2 100644 --- a/docs/deploying_clearml/enterprise_deploy/multi_tenant_k8s.md +++ b/docs/deploying_clearml/enterprise_deploy/multi_tenant_k8s.md @@ -513,31 +513,30 @@ Create a `NetworkPolicy` in the tenant namespace with the following configuratio - podSelector: {} ``` -### Install Task Traffic Router Chart +### Install the App Gateway Router Chart -Install the [Task Traffic Router](appgw.md) in your Kubernetes cluster, allowing it to manage and route tasks: +Install the App Gateway Router in your Kubernetes cluster, allowing it to manage and route tasks: 1. Prepare the `overrides.yaml` file with the following content: ``` imageCredentials: - password: "" + password: "" clearml: apiServerUrlReference: "" apiserverKey: "" apiserverSecret: "" - jwksKey: "" ingress: enabled: true hostName: "" ``` -2. Install Task Traffic Router in the specified tenant namespace: +2. 
Install App Gateway Router in the specified tenant namespace: ``` helm install -n \\ clearml-ttr \\ - allegroai-enterprise/clearml-task-traffic-router \\ + clearml-enterprise/clearml-task-traffic-router \\ --create-namespace \\ -f overrides.yaml ``` diff --git a/docs/getting_started/clearml_agent_scheduling.md b/docs/getting_started/clearml_agent_scheduling.md index ed3c5948..358a72cd 100644 --- a/docs/getting_started/clearml_agent_scheduling.md +++ b/docs/getting_started/clearml_agent_scheduling.md @@ -82,7 +82,7 @@ Currently, these runtime properties can only be set using an ClearML REST API ca endpoint, as follows: * The body of the request must contain the `worker-id`, and the runtime property to add. -* An expiry date is optional. Use the format `"expiry":