From b971dcaff932202ce46a26cdd06f9903fb5ec2c3 Mon Sep 17 00:00:00 2001
From: revital <revital@allegro.ai>
Date: Wed, 12 Mar 2025 09:18:07 +0200
Subject: [PATCH 1/3] add hosted app gateway

---
 .../appgw_intall_compose_hosted.md | 156 ++++++++++++++++++
 1 file changed, 156 insertions(+)
 create mode 100644 docs/deploying_clearml/enterprise_deploy/appgw_intall_compose_hosted.md

diff --git a/docs/deploying_clearml/enterprise_deploy/appgw_intall_compose_hosted.md b/docs/deploying_clearml/enterprise_deploy/appgw_intall_compose_hosted.md
new file mode 100644
index 00000000..7f64c9b1
--- /dev/null
+++ b/docs/deploying_clearml/enterprise_deploy/appgw_intall_compose_hosted.md
@@ -0,0 +1,156 @@
+---
+title: Installing AI Application Gateway with docker-compose - hosted server
+---
+
+:::important Enterprise Feature
+The Application Gateway is available under the ClearML Enterprise plan.
+:::
+
+The AI Application Gateway enables external access to ClearML tasks, and applications running on workload nodes that
+require HTTP or TCP access. The gateway is configured with an endpoint or external address, making these services
+accessible from the user's machine, outside the workload nodes’ network.
+
+This guide details the installation of the ClearML AI Application Gateway for ClearML users who use ClearML’s SaaS control
+plane while hosting their own workload nodes.
+
+## Requirements
+
+* Linux OS (x86) machine with root access
+* The machine needs to be reachable from your user network
+* The machine needs to have network reachability to workload nodes
+* Credentials for the ClearML docker repository
+* A valid ClearML Server installation
+
+## Recommendations
+
+* For a secure connection, we recommend having a DNS entry and a valid SSL Certificate assigned to the machine IP.
+
+## Host Configuration
+
+### Docker Installation
+
+Installing `docker` and `docker-compose` might vary depending on the specific operating system you're using. Here is an
+example for AmazonLinux:
+
+```
+sudo dnf -y install docker
+DOCKER_CONFIG="/usr/local/lib/docker"
+sudo mkdir -p $DOCKER_CONFIG/cli-plugins
+sudo curl -SL https://github.com/docker/compose/releases/download/v2.17.3/docker-compose-linux-x86_64 -o $DOCKER_CONFIG/cli-plugins/docker-compose
+sudo chmod +x $DOCKER_CONFIG/cli-plugins/docker-compose
+sudo systemctl enable docker
+sudo systemctl start docker
+
+sudo docker login
+```
+
+Use the ClearML docker hub credentials when prompted by docker login.
+
+### Docker-compose File
+
+This is an example of the docker-compose file you will need to create:
+
+```
+version: '3.5'
+services:
+  task_traffic_webserver:
+    image: clearml/ai-gateway-proxy:${PROXY_TAG:?err}
+    network_mode: "host"
+    restart: unless-stopped
+    container_name: task_traffic_webserver
+    volumes:
+      - ./task_traffic_router/config/nginx:/etc/nginx/conf.d:ro
+      - ./task_traffic_router/config/lua:/usr/local/openresty/nginx/lua:ro
+  task_traffic_router:
+    image: clearml/ai-gateway-router:${ROUTER_TAG:?err}
+    restart: unless-stopped
+    container_name: task_traffic_router
+    volumes:
+      - /var/run/docker.sock:/var/run/docker.sock
+      - ./task_traffic_router/config/nginx:/etc/nginx/conf.d:rw
+      - ./task_traffic_router/config/lua:/usr/local/openresty/nginx/lua:rw
+    environment:
+      - LOGGER_LEVEL=INFO
+      - ROUTER__WEBSERVER__SERVER_PORT="8010"
+      - ROUTER_NAME=${ROUTER_NAME:?err}
+      - ROUTER_URL=${ROUTER_URL:?err}
+      - CLEARML_API_HOST=${CLEARML_API_HOST:?err}
+      - CLEARML_API_ACCESS_KEY=${CLEARML_API_ACCESS_KEY:?err}
+      - CLEARML_API_SECRET_KEY=${CLEARML_API_SECRET_KEY:?err}
+      - AUTH_COOKIE_NAME=${AUTH_COOKIE_NAME:?err}
+      - AUTH_SECURE_ENABLED=${AUTH_SECURE_ENABLED}
+      - TCP_ROUTER_ADDRESS=${TCP_ROUTER_ADDRESS}
+      - TCP_PORT_START=${TCP_PORT_START}
+      - TCP_PORT_END=${TCP_PORT_END}
+```
+
+### Configuration File
+
+You will be provided with a prefilled `runtime.env` file containing the following entries:
+
+```
+# PREFILLED SECTION, PROVIDED BY CLEARML
+PROXY_TAG=
+ROUTER_TAG=
+CLEARML_API_HOST=https://api.
+AUTH_COOKIE_NAME=
+
+# TO BE FILLED BY USER
+ROUTER_NAME=main-router
+ROUTER_URL=http://<ROUTER-HOST-PUBLIC-IP>:8010
+CLEARML_API_ACCESS_KEY=
+CLEARML_API_SECRET_KEY=
+AUTH_SECURE_ENABLED=true
+TCP_ROUTER_ADDRESS=<ROUTER-HOST-PUBLIC-IP>
+TCP_PORT_START=
+TCP_PORT_END=
+```
+
+Edit it according to the following guidelines:
+
+* `ROUTER_NAME`: The name of the Router, which needs to be unique for each tenant.
+* `CLEARML_API_ACCESS_KEY, CLEARML_API_SECRET_KEY:` API credentials created in the ClearML web UI, for Admin user or Service Account with admin privileges. Make sure to label these credentials clearly, so that they will not be revoked by mistake.
+* `ROUTER_URL`: The URL for this router. This URL will be shown in the UI of any application for users to access (Like hosted Jupyter or LLM UI).
+* `TCP_ROUTER_ADDRESS`: The TCP Router external address, which is an IP of the host machine or a load balancer hostname, depending on the customer network configuration.
+* `TCP_PORT_START`: The start port for the TCP Tasks, chosen by the customer. Ensure that ports are open and can be allocated on the host.
+* `TCP_PORT_END`: The end port for the TCP Tasks, chosen by the customer. Ensure that ports are open and can be allocated on the host.
+
+### Installation
+
+Run the following command to start the router:
+
+```
+sudo docker compose --env-file runtime.env up -d
+```
+
+### Advanced
+
+#### Running without Certificates
+
+When running on docker-compose with an HTTP interface and without certificates please set the following entry in the *runtime.env* as below:
+
+```
+AUTH_SECURE_ENABLED=false
+```
+
+#### Install Multiple Routers for the Same Tenant
+
+To install multiple routers in the same tenant it is necessary to set parameters to identify and split the workload. Using this setting, each router will only handle routing to tasks that have originated from the specific queues it was assigned. This is important in case you have two different networks with two different agents, and tasks started by Agent A can only be reached by Router A (in the same network), but simply cannot be reached by Router B. The assumption in this case is that Agent A and Agent B will service different queues, and the Routers handling routing to the tasks executed by each agent will need to match the queue definitions.
+Multiple routers in the same tenant must have different `ROUTER_NAME` and listen to different queues (`LISTEN_QUEUE_NAME`).
+
+**Router-A** *runtime.env*
+
+```
+ROUTER_NAME=router-a
+LISTEN_QUEUE_NAME=queue1,queue2
+```
+
+**Router-2** *runtime.env*
+
+```
+ROUTER_NAME=router-b
+LISTEN_QUEUE_NAME=queue3,queue4
+```
+
+The environment variable `LISTEN_QUEUE_NAME` needs to be specified in the docker-compose file in case.
+The `LISTEN_QUEUE_NAME` is a list of string names split by a comma. It supports wildcards.
\ No newline at end of file

From caa2ed6b4f4930fca26593f04d916c1a30a30a92 Mon Sep 17 00:00:00 2001
From: revital <revital@allegro.ai>
Date: Wed, 12 Mar 2025 11:18:04 +0200
Subject: [PATCH 2/3] Add AI App Gateway docker-compose deployment for hosted servers

---
 ...ted.md => appgw_install_compose_hosted.md} | 66 +++++++++++--------
 sidebars.js | 1 +
 2 files changed, 40 insertions(+), 27 deletions(-)
 rename docs/deploying_clearml/enterprise_deploy/{appgw_intall_compose_hosted.md => appgw_install_compose_hosted.md} (61%)

diff --git a/docs/deploying_clearml/enterprise_deploy/appgw_intall_compose_hosted.md b/docs/deploying_clearml/enterprise_deploy/appgw_install_compose_hosted.md
similarity index 61%
rename from docs/deploying_clearml/enterprise_deploy/appgw_intall_compose_hosted.md
rename to docs/deploying_clearml/enterprise_deploy/appgw_install_compose_hosted.md
index 7f64c9b1..9ded7f78 100644
--- a/docs/deploying_clearml/enterprise_deploy/appgw_intall_compose_hosted.md
+++ b/docs/deploying_clearml/enterprise_deploy/appgw_install_compose_hosted.md
@@ -1,5 +1,5 @@
 ---
-title: Installing AI Application Gateway with docker-compose - hosted server
+title: Docker-Compose - Hosted Server
 ---
 
 :::important Enterprise Feature
@@ -21,9 +21,7 @@ plane while hosting their own workload nodes.
 * Credentials for the ClearML docker repository
 * A valid ClearML Server installation
 
-## Recommendations
-
-* For a secure connection, we recommend having a DNS entry and a valid SSL Certificate assigned to the machine IP.
+Additionally, for a secure connection, it is recommended to have a DNS entry and a valid SSL Certificate assigned to the machine IP.
 
 ## Host Configuration
 
@@ -44,11 +42,11 @@ sudo systemctl start docker
 sudo docker login
 ```
 
-Use the ClearML docker hub credentials when prompted by docker login.
+Use the ClearML docker hub credentials when prompted by `docker` login.
 
 ### Docker-compose File
 
-This is an example of the docker-compose file you will need to create:
+This is an example of the `docker-compose` file you will need to create:
 
 ```
 version: '3.5'
@@ -108,12 +106,13 @@ TCP_PORT_END=
 
 Edit it according to the following guidelines:
 
-* `ROUTER_NAME`: The name of the Router, which needs to be unique for each tenant.
-* `CLEARML_API_ACCESS_KEY, CLEARML_API_SECRET_KEY:` API credentials created in the ClearML web UI, for Admin user or Service Account with admin privileges. Make sure to label these credentials clearly, so that they will not be revoked by mistake.
-* `ROUTER_URL`: The URL for this router. This URL will be shown in the UI of any application for users to access (Like hosted Jupyter or LLM UI).
-* `TCP_ROUTER_ADDRESS`: The TCP Router external address, which is an IP of the host machine or a load balancer hostname, depending on the customer network configuration.
-* `TCP_PORT_START`: The start port for the TCP Tasks, chosen by the customer. Ensure that ports are open and can be allocated on the host.
-* `TCP_PORT_END`: The end port for the TCP Tasks, chosen by the customer. Ensure that ports are open and can be allocated on the host.
+* `ROUTER_NAME`: Unique name for this router.
+* `CLEARML_API_ACCESS_KEY, CLEARML_API_SECRET_KEY:` API credentials for Admin user or Service Account with admin privileges
+  created in the ClearML web UI. Make sure to label these credentials clearly, so that they will not be revoked by mistake.
+* `ROUTER_URL`: The URL for this router. This URL will be shown in the UI of any application for users to access (e.g. hosted Jupyter or LLM UI).
+* `TCP_ROUTER_ADDRESS`: Router external address, can be an IP or the host machine or a load balancer hostname, depends on network configuration.
+* `TCP_PORT_START`: Start port for the TCP Tasks, chosen by the customer. Ensure that ports are open and can be allocated on the host.
+* `TCP_PORT_END`: End port for the TCP Tasks, chosen by the customer. Ensure that ports are open and can be allocated on the host.
 
 ### Installation
 
@@ -127,7 +126,7 @@ sudo docker compose --env-file runtime.env up -d
 
 #### Running without Certificates
 
-When running on docker-compose with an HTTP interface and without certificates please set the following entry in the *runtime.env* as below:
+When running on `docker-compose` with an HTTP interface and without certificates, set the following entry in the `runtime.env`:
 
 ```
 AUTH_SECURE_ENABLED=false
@@ -135,22 +134,35 @@ AUTH_SECURE_ENABLED=false
 
 #### Install Multiple Routers for the Same Tenant
 
-To install multiple routers in the same tenant it is necessary to set parameters to identify and split the workload. Using this setting, each router will only handle routing to tasks that have originated from the specific queues it was assigned. This is important in case you have two different networks with two different agents, and tasks started by Agent A can only be reached by Router A (in the same network), but simply cannot be reached by Router B. The assumption in this case is that Agent A and Agent B will service different queues, and the Routers handling routing to the tasks executed by each agent will need to match the queue definitions.
-Multiple routers in the same tenant must have different `ROUTER_NAME` and listen to different queues (`LISTEN_QUEUE_NAME`).
+To deploy multiple routers within the same tenant, you must configure each router to handle specific workloads.
 
-**Router-A** *runtime.env*
+Using this setting, each router will only route tasks that originated from its assigned queues. This
+is important in case you have multiple networks with different agents. For example:
+* Tasks started by Agent A can only be reached by Router A (within the same network), but cannot be reached by Router B
+* Agent B will handle a separate set of tasks which can only be reached by Router B
 
-```
-ROUTER_NAME=router-a
-LISTEN_QUEUE_NAME=queue1,queue2
-```
+The assumption in this case is that Agent A and Agent B will service different queues, and routers must be configured to
+route tasks based on these queue definitions.
 
-**Router-2** *runtime.env*
+Each router in the same tenant must have:
+* A unique `ROUTER_NAME`
+* Distinct set of queues listed in `LISTEN_QUEUE_NAME`. It supports wildcards.
 
-```
-ROUTER_NAME=router-b
-LISTEN_QUEUE_NAME=queue3,queue4
-```
+For example:
+* **Router-A** `runtime.env`
+
+  ```
+  ROUTER_NAME=router-a
+  LISTEN_QUEUE_NAME=queue1,queue2
+  ```
+
+* **Router-B** `runtime.env`
+
+  ```
+  ROUTER_NAME=router-b
+  LISTEN_QUEUE_NAME=queue3,queue4
+  ````
+
+Ensure that `LISTEN_QUEUE_NAME` is included in the [`docker-compose` environment variables](#docker-compose-file) for each router
+instance.
 
-The environment variable `LISTEN_QUEUE_NAME` needs to be specified in the docker-compose file in case.
-The `LISTEN_QUEUE_NAME` is a list of string names split by a comma. It supports wildcards.
\ No newline at end of file
diff --git a/sidebars.js b/sidebars.js
index d76a812e..502c1377 100644
--- a/sidebars.js
+++ b/sidebars.js
@@ -659,6 +659,7 @@ module.exports = {
             label: 'ClearML Application Gateway',
             items: [
                 'deploying_clearml/enterprise_deploy/appgw_install_compose',
+                'deploying_clearml/enterprise_deploy/appgw_install_compose_hosted',
                 'deploying_clearml/enterprise_deploy/appgw_install_k8s',
             ]
         },

From b0d9a1357ab9195ceb77c5283087ea04b00eff33 Mon Sep 17 00:00:00 2001
From: revital <revital@allegro.ai>
Date: Thu, 20 Mar 2025 09:15:50 +0200
Subject: [PATCH 3/3] Edits

---
 .../enterprise_deploy/appgw.md | 3 +-
 .../appgw_install_compose_hosted.md | 92 +++++++++----------
 2 files changed, 47 insertions(+), 48 deletions(-)

diff --git a/docs/deploying_clearml/enterprise_deploy/appgw.md b/docs/deploying_clearml/enterprise_deploy/appgw.md
index 647c575a..3466dd5a 100644
--- a/docs/deploying_clearml/enterprise_deploy/appgw.md
+++ b/docs/deploying_clearml/enterprise_deploy/appgw.md
@@ -35,9 +35,10 @@ If your ClearML Deployment does not have the App Gateway Router properly install
 
 #### Installation
 
-The App Gateway Router supports two deployment options:
+The App Gateway Router supports the following deployment options:
 
 * [Docker Compose](appgw_install_compose.md)
+* [Docker Compose for hosted servers](appgw_install_compose_hosted.md)
 * [Kubernetes](appgw_install_k8s.md)
 
 The deployment configuration specifies the external and internal address and port mappings for routing requests.
diff --git a/docs/deploying_clearml/enterprise_deploy/appgw_install_compose_hosted.md b/docs/deploying_clearml/enterprise_deploy/appgw_install_compose_hosted.md
index 9ded7f78..ad6ce13e 100644
--- a/docs/deploying_clearml/enterprise_deploy/appgw_install_compose_hosted.md
+++ b/docs/deploying_clearml/enterprise_deploy/appgw_install_compose_hosted.md
@@ -3,14 +3,14 @@ title: Docker-Compose - Hosted Server
 ---
 
 :::important Enterprise Feature
-The Application Gateway is available under the ClearML Enterprise plan.
+The AI Application Gateway is available under the ClearML Enterprise plan.
 :::
 
-The AI Application Gateway enables external access to ClearML tasks, and applications running on workload nodes that
+The AI Application Gateway enables external access to ClearML tasks, and applications running on workload nodes that
 require HTTP or TCP access. The gateway is configured with an endpoint or external address, making these services
 accessible from the user's machine, outside the workload nodes’ network.
 
-This guide details the installation of the ClearML AI Application Gateway for ClearML users who use ClearML’s SaaS control
+This guide details the installation of the App Gateway Router for ClearML users who use ClearML's SaaS control
 plane while hosting their own workload nodes.
 
 ## Requirements
@@ -104,12 +104,15 @@ TCP_PORT_START=
 TCP_PORT_END=
 ```
 
-Edit it according to the following guidelines:
+**Configuration Options:**
 
-* `ROUTER_NAME`: Unique name for this router.
+* `ROUTER_NAME`: In the case of [multiple routers on the same tenant](#multiple-router-in-the-same-tenant), each router
+  needs to have a unique name.
 * `CLEARML_API_ACCESS_KEY, CLEARML_API_SECRET_KEY:` API credentials for Admin user or Service Account with admin privileges
   created in the ClearML web UI. Make sure to label these credentials clearly, so that they will not be revoked by mistake.
-* `ROUTER_URL`: The URL for this router. This URL will be shown in the UI of any application for users to access (e.g. hosted Jupyter or LLM UI).
+* `ROUTER_URL`: External address to access the router. This can be the IP address or DNS of the node where the router
+  is running, or the address of a load balancer if the router operates behind a proxy/load balancer. This URL is used
+  to access AI workload applications (e.g. remote IDE, model deployment, etc.), so it must be reachable and resolvable for them.
 * `TCP_ROUTER_ADDRESS`: Router external address, can be an IP or the host machine or a load balancer hostname, depends on network configuration.
 * `TCP_PORT_START`: Start port for the TCP Tasks, chosen by the customer. Ensure that ports are open and can be allocated on the host.
 * `TCP_PORT_END`: End port for the TCP Tasks, chosen by the customer. Ensure that ports are open and can be allocated on the host.
@@ -122,47 +125,42 @@ Run the following command to start the router:
 sudo docker compose --env-file runtime.env up -d
 ```
 
-### Advanced
+### Advanced Configuration
 
-#### Running without Certificates
+#### Using Open HTTP
 
-When running on `docker-compose` with an HTTP interface and without certificates, set the following entry in the `runtime.env`:
-
-```
-AUTH_SECURE_ENABLED=false
-```
-
-#### Install Multiple Routers for the Same Tenant
-
-To deploy multiple routers within the same tenant, you must configure each router to handle specific workloads.
-
-Using this setting, each router will only route tasks that originated from its assigned queues. This
-is important in case you have multiple networks with different agents. For example:
-* Tasks started by Agent A can only be reached by Router A (within the same network), but cannot be reached by Router B
-* Agent B will handle a separate set of tasks which can only be reached by Router B
-
-The assumption in this case is that Agent A and Agent B will service different queues, and routers must be configured to
-route tasks based on these queue definitions.
-
-Each router in the same tenant must have:
-* A unique `ROUTER_NAME`
-* Distinct set of queues listed in `LISTEN_QUEUE_NAME`. It supports wildcards.
-
-For example:
-* **Router-A** `runtime.env`
-
-  ```
-  ROUTER_NAME=router-a
-  LISTEN_QUEUE_NAME=queue1,queue2
-  ```
-
-* **Router-B** `runtime.env`
-
-  ```
-  ROUTER_NAME=router-b
-  LISTEN_QUEUE_NAME=queue3,queue4
-  ````
-
-Ensure that `LISTEN_QUEUE_NAME` is included in the [`docker-compose` environment variables](#docker-compose-file) for each router
-instance.
+To deploy the App Gateway Router on open HTTP (without a certificate), set the `AUTH_SECURE_ENABLED` entry
+to `false` in the `runtime.env` file.
 
+#### Multiple Router in the Same Tenant
+
+ If you have workloads running in separate networks that cannot communicate with each other, you need to deploy multiple
+ routers, one for each isolated environment. Each router will only process tasks from designated queues, ensuring that
+ tasks are correctly routed to agents within the same network.
+
+ For example:
+ * If Agent A and Agent B are in separate networks, each must have its own router to receive tasks.
+ * Router A will handle tasks from Agent A’s queues. Router B will handle tasks from Agent B’s queues.
+
+ To achieve this, each router must be configured with:
+ * A unique `ROUTER_NAME`
+ * A distinct set of queues defined in `LISTEN_QUEUE_NAME`.
+
+ ##### Example Configuration
+ Each router's `runtime.env` file should include:
+
+ * Router A:
+
+   ```
+   ROUTER_NAME=router-a
+   LISTEN_QUEUE_NAME=queue1,queue2
+   ```
+
+ * Router B:
+
+   ```
+   ROUTER_NAME=router-b
+   LISTEN_QUEUE_NAME=queue3,queue4
+   ```
+
+ Make sure `LISTEN_QUEUE_NAME` is set in the [`docker-compose` environment variables](#docker-compose-file) for each router instance.
\ No newline at end of file
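
The compose file added in this series marks eight variables with `${VAR:?err}`, so `docker compose` stops with an error when any of them is unset or empty in `runtime.env`. As a minimal sketch of a pre-flight check, the following prints each required key that has no value. The `missing_env_keys` helper is hypothetical and not part of ClearML's tooling; the key list is copied from the `:?err` entries in the compose file above.

```shell
#!/usr/bin/env bash
# Sketch only: list runtime.env keys that are missing or have an empty value.
# missing_env_keys is a hypothetical helper, not a ClearML or Docker command.
# The keys below are the variables marked `:?err` in the compose file.

missing_env_keys() {
  env_file="$1"
  for key in PROXY_TAG ROUTER_TAG ROUTER_NAME ROUTER_URL \
             CLEARML_API_HOST CLEARML_API_ACCESS_KEY \
             CLEARML_API_SECRET_KEY AUTH_COOKIE_NAME; do
    # A key counts as set only if a line "KEY=<non-blank value>" exists.
    if ! grep -Eq "^${key}=[^[:space:]]" "$env_file"; then
      printf '%s\n' "$key"
    fi
  done
}
```

Running `missing_env_keys runtime.env` before `sudo docker compose --env-file runtime.env up -d` should print nothing when every required key is filled in.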