mirror of
https://github.com/clearml/clearml-docs
synced 2025-05-21 20:46:45 +00:00
Merge branch 'main' of https://github.com/allegroai/clearml-docs into enterprise
commit
d94d777e55
@ -73,8 +73,8 @@ services:
|
|||||||
Create a `runtime.env` file containing the following entries:
|
Create a `runtime.env` file containing the following entries:
|
||||||
|
|
||||||
```
|
```
|
||||||
PROXY_TAG=
|
PROXY_TAG=1.5.1
|
||||||
ROUTER_TAG=
|
ROUTER_TAG=2.6.2
|
||||||
ROUTER_NAME=main-router
|
ROUTER_NAME=main-router
|
||||||
ROUTER__WEBSERVER__SERVER_PORT=8010
|
ROUTER__WEBSERVER__SERVER_PORT=8010
|
||||||
ROUTER_URL=
|
ROUTER_URL=
|
||||||
|
@ -6,13 +6,13 @@ title: Kubernetes Deployment
|
|||||||
The AI Application Gateway is available under the ClearML Enterprise plan.
|
The AI Application Gateway is available under the ClearML Enterprise plan.
|
||||||
:::
|
:::
|
||||||
|
|
||||||
This guide details the installation of the ClearML App Gateway Router.
|
This guide details the installation of the ClearML App Gateway.
|
||||||
The App Gateway Router enables access to your AI workload applications (e.g. remote IDEs like VSCode and Jupyter, model API interface, etc.).
|
The App Gateway enables access to your AI workload applications (e.g. remote IDEs like VSCode and Jupyter, model API interface, etc.).
|
||||||
It acts as a proxy, identifying ClearML Tasks running within its [K8s namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/)
|
It acts as a proxy, identifying ClearML Tasks running within its [K8s namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/)
|
||||||
and making them available for network access.
|
and making them available for network access.
|
||||||
|
|
||||||
:::important
|
:::important
|
||||||
The App Gateway Router must be installed in the same K8s namespace as a dedicated ClearML Agent.
|
The App Gateway must be installed in the same K8s namespace as a dedicated ClearML Agent.
|
||||||
It can only configure access for ClearML Tasks within its own namespace.
|
It can only configure access for ClearML Tasks within its own namespace.
|
||||||
:::
|
:::
|
||||||
|
|
||||||
@ -27,35 +27,31 @@ It can only configure access for ClearML Tasks within its own namespace.
|
|||||||
|
|
||||||
## Optional for HTTPS
|
## Optional for HTTPS
|
||||||
|
|
||||||
* A valid DNS entry for the new App Gateway Router instance
|
* A valid DNS entry for the new App Gateway instance
|
||||||
* A valid SSL certificate
|
* A valid SSL certificate
|
||||||
|
|
||||||
## Helm
|
## Helm
|
||||||
|
|
||||||
### Login
|
### Login
|
||||||
|
|
||||||
```
|
``` bash
|
||||||
helm repo add clearml-enterprise \
|
helm repo add clearml-enterprise https://raw.githubusercontent.com/clearml/clearml-enterprise-helm-charts/gh-pages --username <GITHUB_TOKEN> --password <GITHUB_TOKEN>
|
||||||
https://raw.githubusercontent.com/clearml/clearml-enterprise-helm-charts/gh-pages \
|
|
||||||
--username <GITHUB_TOKEN> \
|
|
||||||
--password <GITHUB_TOKEN>
|
|
||||||
```
|
```
|
||||||
|
|
||||||
Replace `<GITHUB_TOKEN>` with your valid GitHub token that has access to the ClearML Enterprise Helm charts repository.
|
Replace `<GITHUB_TOKEN>` with your valid GitHub token that has access to the ClearML Enterprise Helm charts repository.
|
||||||
|
|
||||||
### Prepare Values
|
### Prepare Values
|
||||||
|
|
||||||
Before installing the App Gateway Router, create a Helm override file:
|
Before installing the App Gateway, create a Helm override `clearml-app-gateway-values.override.yaml` file:
|
||||||
|
|
||||||
```
|
```yaml
|
||||||
imageCredentials:
|
imageCredentials:
|
||||||
password: ""
|
password: ""
|
||||||
clearml:
|
clearml:
|
||||||
apiServerKey: ""
|
apiKey: ""
|
||||||
apiServerSecret: ""
|
apiSecret: ""
|
||||||
apiServerUrlReference: ""
|
apiServerUrlReference: ""
|
||||||
authCookieName: ""
|
authCookieName: ""
|
||||||
sslVerify: true
|
|
||||||
ingress:
|
ingress:
|
||||||
enabled: true
|
enabled: true
|
||||||
hostName: ""
|
hostName: ""
|
||||||
@ -71,13 +67,12 @@ tcpSession:
|
|||||||
**Configuration options:**
|
**Configuration options:**
|
||||||
|
|
||||||
* `imageCredentials.password`: ClearML DockerHub Access Token.
|
* `imageCredentials.password`: ClearML DockerHub Access Token.
|
||||||
* `clearml.apiServerKey`: ClearML server API key.
|
* `clearml.apiKey` and `clearml.apiSecret`: [API credentials](../../webapp/settings/webapp_settings_profile.md#clearml-api-credentials) created in the ClearML web UI by an Admin user or Service
|
||||||
* `clearml.apiServerSecret`: ClearML server secret key.
|
Account with admin privileges. Make sure to label these credentials clearly, so that they will not be revoked by mistake.
|
||||||
* `clearml.apiServerUrlReference`: ClearML API server URL starting with `https://api.`.
|
* `clearml.apiServerUrlReference`: ClearML API server URL starting with `https://api.`.
|
||||||
* `clearml.authCookieName`: Name of the cookie used by the ClearML server to store the ClearML authentication token.
|
* `clearml.authCookieName`: Name of the cookie used by the ClearML server to store the ClearML authentication token.
|
||||||
* `clearml.sslVerify`: Enable or disable SSL certificate validation on `apiserver` calls check.
|
* `ingress.hostName`: Hostname of App Gateway used by the ingress controller to access it.
|
||||||
* `ingress.hostName`: Hostname of router used by the ingress controller to access it.
|
* `tcpSession.routerAddress`: The external App Gateway address (can be an IP, hostname, or load balancer address) depending on your network setup. Ensure this address is accessible for TCP connections.
|
||||||
* `tcpSession.routerAddress`: The external router address (can be an IP, hostname, or load balancer address) depending on your network setup. Ensure this address is accessible for TCP connections.
|
|
||||||
* `tcpSession.service.type`: Service type used to expose TCP functionality, default is `NodePort`.
|
* `tcpSession.service.type`: Service type used to expose TCP functionality, default is `NodePort`.
|
||||||
* `tcpSession.portRange.start`: Start port for the TCP Session feature.
|
* `tcpSession.portRange.start`: Start port for the TCP Session feature.
|
||||||
* `tcpSession.portRange.end`: End port for the TCP Session feature.
|
* `tcpSession.portRange.end`: End port for the TCP Session feature.
|
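The `tcpSession.portRange.*` values can be sanity-checked before they go into the override file. The sketch below assumes the default `NodePort` service type and the Kubernetes default NodePort range of 30000-32767; your cluster may be configured with a different range, and the function name `validate_port_range` is hypothetical rather than part of the chart.

```bash
#!/usr/bin/env bash
# Sketch: sanity-check a TCP session port range before writing it to the
# override file. Assumes the Kubernetes default NodePort range (30000-32767);
# adjust NODEPORT_MIN/NODEPORT_MAX if your cluster uses a custom range.

NODEPORT_MIN=30000
NODEPORT_MAX=32767

# validate_port_range START END
# Prints a one-line verdict and returns non-zero when the range is unusable.
validate_port_range() {
  local start="$1" end="$2"
  if [ "$start" -gt "$end" ]; then
    echo "invalid: start is greater than end"
    return 1
  fi
  if [ "$start" -lt "$NODEPORT_MIN" ] || [ "$end" -gt "$NODEPORT_MAX" ]; then
    echo "invalid: outside ${NODEPORT_MIN}-${NODEPORT_MAX}"
    return 1
  fi
  echo "ok: $((end - start + 1)) ports"
}

# Example values only; pick a range that suits your network setup.
validate_port_range 31000 31010 || true
```

The same range must also be reachable at the address given in `tcpSession.routerAddress`, so any firewall or load balancer rules need to cover it.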
||||||
@ -85,33 +80,28 @@ tcpSession:
|
|||||||
|
|
||||||
The full list of supported configuration options is available with the command:
|
The full list of supported configuration options is available with the command:
|
||||||
|
|
||||||
```
|
``` bash
|
||||||
helm show readme allegroai-enterprise/clearml-enterprise-task-traffic-router
|
helm show readme clearml-enterprise/clearml-enterprise-app-gateway
|
||||||
```
|
```
|
||||||
|
|
||||||
### Install
|
### Install
|
||||||
|
|
||||||
To install the App Gateway Router component via Helm, use the following command:
|
To install the App Gateway component via Helm, use the following command:
|
||||||
|
|
||||||
```
|
``` bash
|
||||||
helm upgrade --install \
|
helm upgrade --install <RELEASE_NAME> -n <WORKLOAD_NAMESPACE> clearml-enterprise/clearml-enterprise-app-gateway --version <CHART_VERSION> -f clearml-app-gateway-values.override.yaml
|
||||||
<RELEASE_NAME> \
|
|
||||||
-n <WORKLOAD_NAMESPACE> \
|
|
||||||
allegroai-enterprise/clearml-enterprise-task-traffic-router \
|
|
||||||
--version <CHART_VERSION> \
|
|
||||||
-f override.yaml
|
|
||||||
```
|
```
|
||||||
|
|
||||||
Replace the placeholders with the following values:
|
Replace the placeholders with the following values:
|
||||||
|
|
||||||
* `<RELEASE_NAME>` - Unique name for the App Gateway Router within the K8s namespace. This is a required parameter in
|
* `<RELEASE_NAME>` - Unique name for the App Gateway within the K8s namespace. This is a required parameter in
|
||||||
Helm, which identifies a specific installation of the chart. The release name also defines the router’s name and
|
Helm, which identifies a specific installation of the chart. The release name also defines the App Gateway's name and
|
||||||
appears in the UI within AI workload application URLs (e.g. Remote IDE URLs). This can be customized to support multiple installations within the same
|
appears in the UI within AI workload application URLs (e.g. Remote IDE URLs). This can be customized to support multiple installations within the same
|
||||||
namespace by assigning different release names.
|
namespace by assigning different release names.
|
||||||
* `<WORKLOAD_NAMESPACE>` - [Kubernetes Namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/)
|
* `<WORKLOAD_NAMESPACE>` - [Kubernetes Namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/)
|
||||||
where workloads will be executed. This namespace must be shared between a dedicated ClearML Agent and an App
|
where workloads will be executed. This namespace must be shared between a dedicated ClearML Agent and an App
|
||||||
Gateway Router. The agent is responsible for monitoring its assigned task queues and spawning workloads within this
|
Gateway. The agent is responsible for monitoring its assigned task queues and spawning workloads within this
|
||||||
namespace. The router monitors the same namespace for AI workloads (e.g. remote IDE applications). The router has a
|
namespace. The App Gateway monitors the same namespace for AI workloads (e.g. remote IDE applications). The App Gateway has a
|
||||||
namespace-limited scope, meaning it can only detect and manage tasks within its
|
namespace-limited scope, meaning it can only detect and manage tasks within its
|
||||||
assigned namespace.
|
assigned namespace.
|
||||||
* `<CHART_VERSION>` - Version recommended by the ClearML Support Team.
|
* `<CHART_VERSION>` - Version recommended by the ClearML Support Team.
|
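As an illustration of how the three placeholders combine, the following sketch assembles the install command as a string and prints it instead of running it. The release names, namespace, and chart version are made-up example values, and `build_app_gateway_install_cmd` is a hypothetical helper; use the chart version recommended by the ClearML Support Team.

```bash
#!/usr/bin/env bash
# Sketch: build (but do not execute) the App Gateway install command.
# All argument values used below are illustrative placeholders.

# build_app_gateway_install_cmd RELEASE_NAME WORKLOAD_NAMESPACE CHART_VERSION
build_app_gateway_install_cmd() {
  local release_name="$1" namespace="$2" chart_version="$3"
  printf 'helm upgrade --install %s -n %s %s --version %s -f %s\n' \
    "$release_name" \
    "$namespace" \
    "clearml-enterprise/clearml-enterprise-app-gateway" \
    "$chart_version" \
    "clearml-app-gateway-values.override.yaml"
}

# Two installations sharing one namespace need distinct release names.
build_app_gateway_install_cmd gateway-a clearml-workloads 1.0.0
build_app_gateway_install_cmd gateway-b clearml-workloads 1.0.0
```

Printing the command first makes it easy to review the release name, since that name later appears in the application URLs shown in the UI.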
@ -2,408 +2,168 @@
|
|||||||
title: Kubernetes
|
title: Kubernetes
|
||||||
---
|
---
|
||||||
|
|
||||||
|
This guide provides step-by-step instructions for installing the ClearML Enterprise Server (control-plane) in a Kubernetes cluster.
|
||||||
|
|
||||||
This guide provides step-by-step instructions for installing the ClearML Enterprise setup in a Kubernetes cluster.
|
The ClearML Enterprise Server includes the ClearML `apiserver`, `fileserver`, and `webserver` components.
|
||||||
|
The package also includes MongoDB, ElasticSearch, and Redis as Helm dependencies.
|
||||||
|
|
||||||
|
|
||||||
## Prerequisites
|
## Prerequisites
|
||||||
|
|
||||||
|
To deploy a ClearML Server, ensure the following components and configurations are in place:
|
||||||
|
|
||||||
* A Kubernetes cluster
|
- Kubernetes Cluster: A standard Kubernetes cluster is preferred for optimal GPU support.
|
||||||
* An ingress controller (e.g. `nginx-ingress`) and the ability to create LoadBalancer services (e.g. MetalLB) if needed
|
- CLI Tools: `kubectl` and `helm` must be installed and configured.
|
||||||
to expose ClearML
|
- Ingress Controller: An Ingress controller (e.g., `nginx-ingress`) is required. If exposing services externally, a
|
||||||
* Credentials for ClearML Enterprise GitHub Helm chart repository
|
LoadBalancer-capable solution (e.g. `MetalLB`) should also be configured.
|
||||||
* Credentials for ClearML Enterprise DockerHub repository
|
- Network: Server and workers communicate over HTTP/S (ports 80 and 443). Additionally, the TCP session feature requires a
|
||||||
* URL for downloading the ClearML Enterprise applications configuration
|
range of ports for TCP traffic based on your configuration (see [AI App Gateway installation](appgw_install_k8s.md)).
|
||||||
|
- DNS Configuration: A domain with subdomain support is required, ideally with trusted TLS certificates. All entries must
|
||||||
|
be resolvable by the Ingress controller. Example subdomains:
|
||||||
|
- Server:
|
||||||
|
- `api.<BASE_DOMAIN>`
|
||||||
|
- `app.<BASE_DOMAIN>`
|
||||||
|
- `files.<BASE_DOMAIN>`
|
||||||
|
- Worker:
|
||||||
|
- `router.<BASE_DOMAIN>`
|
||||||
|
- `tcp-router.<BASE_DOMAIN>` (optional, for TCP sessions)
|
||||||
|
- Storage: A configured StorageClass and an accessible storage backend.
|
||||||
|
- ClearML Enterprise Access:
|
||||||
|
- Helm repository credentials (`<HELM_REPO_TOKEN>`)
|
||||||
|
- DockerHub registry credentials (`<CLEARML_DOCKERHUB_TOKEN>`)
|
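The CLI and DNS prerequisites above can be checked with a short pre-flight script. This is a minimal sketch, not an official ClearML tool: the helper names and the `BASE_DOMAIN` and `CHECK_DNS` variables are assumptions made for illustration, and the DNS check relies on `getent`, which is typically available on Linux only.

```bash
#!/usr/bin/env bash
# Pre-flight sketch for the prerequisites list. Helper names and the
# BASE_DOMAIN / CHECK_DNS variables are illustrative, not part of ClearML.

# Print the server hostnames expected under a base domain, one per line.
clearml_server_hosts() {
  local base_domain="$1" sub
  for sub in api app files; do
    printf '%s.%s\n' "$sub" "$base_domain"
  done
}

# Report whether a required CLI tool is on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

# Report whether a hostname resolves (requires getent).
check_dns() {
  if command -v getent >/dev/null 2>&1 && getent hosts "$1" >/dev/null 2>&1; then
    echo "resolves: $1"
  else
    echo "unresolved: $1"
  fi
}

BASE_DOMAIN="${BASE_DOMAIN:-example.com}"
check_tool kubectl
check_tool helm

# DNS lookups touch the network, so they run only when CHECK_DNS=1 is set.
if [ "${CHECK_DNS:-0}" = "1" ]; then
  for host in $(clearml_server_hosts "$BASE_DOMAIN"); do
    check_dns "$host"
  done
else
  clearml_server_hosts "$BASE_DOMAIN"
fi
```

Run it as `BASE_DOMAIN=<BASE_DOMAIN> CHECK_DNS=1 bash preflight.sh` (the file name is arbitrary) from a machine that uses the same DNS as the Ingress controller.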
||||||
|
|
||||||
|
### Recommended Cluster Specifications
|
||||||
|
|
||||||
## Control Plane Installation
|
For optimal performance, a Kubernetes cluster with at least 3 nodes is recommended, each having:
|
||||||
|
|
||||||
|
- 8 vCPUs
|
||||||
|
- 32 GB RAM
|
||||||
|
- 500 GB storage
|
||||||
|
|
||||||
The following steps cover installing the control plane (server and required charts) and will
|
## Installation
|
||||||
require some or all of the tokens/deliverables mentioned above.
|
|
||||||
|
|
||||||
|
### Add the Helm Repo Locally
|
||||||
|
|
||||||
### Requirements
|
Add the ClearML Helm repository:
|
||||||
|
|
||||||
|
``` bash
|
||||||
|
helm repo add clearml-enterprise https://raw.githubusercontent.com/clearml/clearml-enterprise-helm-charts/gh-pages --username <HELM_REPO_TOKEN> --password <HELM_REPO_TOKEN>
|
||||||
|
```
|
||||||
|
|
||||||
* Add the ClearML Enterprise repository:
|
Update the repository locally:
|
||||||
|
``` bash
|
||||||
|
helm repo update
|
||||||
|
```
|
||||||
|
|
||||||
|
### Prepare Values
|
||||||
|
|
||||||
```
|
Create a `clearml-values.override.yaml` file with the following content:
|
||||||
helm repo add clearml-enterprise https://raw.githubusercontent.com/clearml/clearml-enterprise-helm-charts/gh-pages --username <clearmlenterprise_GitHub_TOKEN> --password <clearmlenterprise_GitHub_TOKEN>
|
|
||||||
```
|
|
||||||
|
|
||||||
|
|
||||||
* Update the repository locally:
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
helm repo update
|
|
||||||
```
|
|
||||||
|
|
||||||
|
|
||||||
### Install ClearML Enterprise Chart
|
|
||||||
|
|
||||||
|
|
||||||
#### Configuration
|
|
||||||
|
|
||||||
|
|
||||||
The Helm Chart must be installed with an `overrides.yaml` overriding values as follows:
|
|
||||||
|
|
||||||
|
|
||||||
:::note
|
:::note
|
||||||
In the following configuration, replace `<BASE_DOMAIN>` with a valid domain
|
In the following configuration, replace the `<BASE_DOMAIN>` placeholders with a valid domain that will have records
|
||||||
that will have records pointing to the cluster’s ingress controller (see ingress details in the values below).
|
pointing to the cluster's Ingress Controller. This will be the base domain for reaching your ClearML installation.
|
||||||
:::
|
:::
|
||||||
|
|
||||||
|
``` yaml
|
||||||
```yaml
|
|
||||||
imageCredentials:
|
imageCredentials:
|
||||||
password: "<clearml_enterprise_DockerHub_TOKEN>"
|
password: "<CLEARML_DOCKERHUB_TOKEN>"
|
||||||
|
|
||||||
clearml:
|
clearml:
|
||||||
cookieDomain: "<BASE_DOMAIN>"
|
cookieDomain: "<BASE_DOMAIN>"
|
||||||
# Set values for improved security
|
|
||||||
apiserverKey: "<GENERATED_API_SERVER_KEY>"
|
|
||||||
apiserverSecret: "<GENERATED_API_SERVER_SECRET>"
|
|
||||||
fileserverKey: "<GENERATED_FILE_SERVER_KEY>"
|
|
||||||
fileserverSecret: "<GENERATED_FILE_SERVER_SECRET>"
|
|
||||||
secureAuthTokenSecret: "<GENERATED_AUTH_TOKEN_SECRET>"
|
|
||||||
testUserKey: "<GENERATED_TEST_USER_KEY>"
|
|
||||||
testUserSecret: "<GENERATED_TEST_USER_SECRET>"
|
|
||||||
|
|
||||||
apiserver:
|
apiserver:
|
||||||
ingress:
|
ingress:
|
||||||
enabled: true
|
enabled: true
|
||||||
hostName: "api.<BASE_DOMAIN>"
|
hostName: "api.<BASE_DOMAIN>"
|
||||||
service:
|
service:
|
||||||
type: ClusterIP
|
type: ClusterIP
|
||||||
|
|
||||||
fileserver:
|
fileserver:
|
||||||
ingress:
|
ingress:
|
||||||
enabled: true
|
enabled: true
|
||||||
hostName: "file.<BASE_DOMAIN>"
|
hostName: "files.<BASE_DOMAIN>"
|
||||||
service:
|
service:
|
||||||
type: ClusterIP
|
type: ClusterIP
|
||||||
|
|
||||||
webserver:
|
webserver:
|
||||||
ingress:
|
ingress:
|
||||||
enabled: true
|
enabled: true
|
||||||
hostName: "app.<BASE_DOMAIN>"
|
hostName: "app.<BASE_DOMAIN>"
|
||||||
service:
|
service:
|
||||||
type: ClusterIP
|
type: ClusterIP
|
||||||
|
|
||||||
clearmlApplications:
|
clearmlApplications:
|
||||||
enabled: true
|
enabled: true
|
||||||
```
|
```
|
||||||
|
|
||||||
#### Additional Configuration Options
|
### Install the Chart
|
||||||
##### Fixed Users (Simple Login)
|
|
||||||
|
|
||||||
|
Install the ClearML Enterprise Helm chart using the previous values override file.
|
||||||
|
|
||||||
Enable static login with username and password in `overrides.yaml`.
|
``` bash
|
||||||
|
helm upgrade -i -n clearml clearml clearml-enterprise/clearml-enterprise --create-namespace -f clearml-values.override.yaml
|
||||||
|
|
||||||
This is an optional step in case SSO (Identity provider) configuration will not be performed.
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
apiserver:
|
|
||||||
additionalConfigs:
|
|
||||||
apiserver.conf: |
|
|
||||||
auth {
|
|
||||||
fixed_users {
|
|
||||||
enabled: true
|
|
||||||
pass_hashed: false
|
|
||||||
users: [
|
|
||||||
{
|
|
||||||
username: "my_user"
|
|
||||||
password: "my_password"
|
|
||||||
name: "My User"
|
|
||||||
admin: true
|
|
||||||
},
|
|
||||||
]
|
|
||||||
}
|
|
||||||
}
|
|
||||||
```
|
```
|
||||||
|
|
||||||
|
## Additional Configuration Options
|
||||||
|
|
||||||
##### SSO (Identity Provider)
|
:::note
|
||||||
|
You can view the full set of available and documented values of the chart by running the following command:
|
||||||
|
|
||||||
|
```bash
|
||||||
The following examples (Auth0 and Keycloak) show how to configure an identity provider on the ClearML server.
|
helm show readme clearml-enterprise/clearml-enterprise
|
||||||
|
# or
|
||||||
|
helm show values clearml-enterprise/clearml-enterprise
|
||||||
Add the following values configuring `extraEnvs` for `apiserver` in the `clearml-enterprise` values `override.yaml` file.
|
|
||||||
|
|
||||||
|
|
||||||
Substitute all `<PLACEHOLDER>`s with the correct value for your configuration.
|
|
||||||
|
|
||||||
|
|
||||||
##### Auth0 Identity Provider
|
|
||||||
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
apiserver:
|
|
||||||
extraEnvs:
|
|
||||||
- name: CLEARML__secure__login__sso__oauth_client__auth0__client_id
|
|
||||||
value: "<SSO_CLIENT_ID>"
|
|
||||||
- name: CLEARML__secure__login__sso__oauth_client__auth0__client_secret
|
|
||||||
value: "<SSO_CLIENT_SECRET>"
|
|
||||||
- name: CLEARML__services__login__sso__oauth_client__auth0__base_url
|
|
||||||
value: "<SSO_CLIENT_URL>"
|
|
||||||
- name: CLEARML__services__login__sso__oauth_client__auth0__authorize_url
|
|
||||||
value: "<SSO_CLIENT_AUTHORIZE_URL>"
|
|
||||||
- name: CLEARML__services__login__sso__oauth_client__auth0__access_token_url
|
|
||||||
value: "<SSO_CLIENT_ACCESS_TOKEN_URL>"
|
|
||||||
- name: CLEARML__services__login__sso__oauth_client__auth0__audience
|
|
||||||
value: "<SSO_CLIENT_AUDIENCE>"
|
|
||||||
```
|
```
|
||||||
|
:::
|
||||||
|
|
||||||
|
### Default Secret Values
|
||||||
|
|
||||||
##### Keycloak Identity Provider
|
For improved security, all the internal credentials are auto-generated randomly and stored in a Secret in
|
||||||
|
Kubernetes.
|
||||||
|
|
||||||
|
If you need to define your own credentials to be used instead, replace the default key and secret values in `clearml-values.override.yaml`.
|
||||||
|
|
||||||
```yaml
|
``` yaml
|
||||||
apiserver:
|
|
||||||
extraEnvs:
|
|
||||||
- name: CLEARML__secure__login__sso__oauth_client__keycloak__client_id
|
|
||||||
value: "<KC_CLIENT_ID>"
|
|
||||||
- name: CLEARML__secure__login__sso__oauth_client__keycloak__client_secret
|
|
||||||
value: "<KC_SECRET_ID>"
|
|
||||||
- name: CLEARML__services__login__sso__oauth_client__keycloak__base_url
|
|
||||||
value: "<KC_URL>/realms/<REALM_NAME>/"
|
|
||||||
- name: CLEARML__services__login__sso__oauth_client__keycloak__authorize_url
|
|
||||||
value: "<KC_URL>/realms/<REALM_NAME>/protocol/openid-connect/auth"
|
|
||||||
- name: CLEARML__services__login__sso__oauth_client__keycloak__access_token_url
|
|
||||||
value: "<KC_URL>/realms/<REALM_NAME>/protocol/openid-connect/token"
|
|
||||||
- name: CLEARML__services__login__sso__oauth_client__keycloak__idp_logout
|
|
||||||
value: "true"
|
|
||||||
```
|
|
||||||
|
|
||||||
|
|
||||||
#### Installing the Chart
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
helm install -n clearml \
|
|
||||||
clearml \
|
|
||||||
clearml-enterprise/clearml-enterprise \
|
|
||||||
--create-namespace \
|
|
||||||
-f overrides.yaml
|
|
||||||
```
|
|
||||||
|
|
||||||
|
|
||||||
### Install ClearML Agent Chart
|
|
||||||
|
|
||||||
|
|
||||||
#### Configuration
|
|
||||||
|
|
||||||
|
|
||||||
To configure the agent you will need to choose a Redis password and use that when setting up Redis as well
|
|
||||||
(see [Shared Redis installation](multi_tenant_k8s.md#shared-redis-installation)).
|
|
||||||
|
|
||||||
|
|
||||||
The Helm Chart must be installed with `overrides.yaml`:
|
|
||||||
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
imageCredentials:
|
|
||||||
password: "<CLEARML_DOCKERHUB_TOKEN>"
|
|
||||||
clearml:
|
clearml:
|
||||||
agentk8sglueKey: "<ACCESS_KEY>"
|
# Replace the following values to use custom internal credentials.
|
||||||
agentk8sglueSecret: "<SECRET_KEY>"
|
apiserverKey: ""
|
||||||
agentk8sglue:
|
apiserverSecret: ""
|
||||||
apiServerUrlReference: "https://api.<BASE_DOMAIN>"
|
fileserverKey: ""
|
||||||
fileServerUrlReference: "https://files.<BASE_DOMAIN>"
|
fileserverSecret: ""
|
||||||
webServerUrlReference: "https://app.<BASE_DOMAIN>"
|
secureAuthTokenSecret: ""
|
||||||
defaultContainerImage: "python:3.9"
|
testUserKey: ""
|
||||||
|
testUserSecret: ""
|
||||||
```
|
```
|
||||||
|
|
||||||
|
In a shell, if `openssl` is installed, you can use this simple command to generate random strings suitable as keys and secrets:
|
||||||
|
|
||||||
#### Installing the Chart
|
``` bash
|
||||||
|
openssl rand -hex 16
|
||||||
|
|
||||||
```bash
|
|
||||||
helm install -n <WORKLOAD_NAMESPACE> \
|
|
||||||
clearml-agent \
|
|
||||||
clearml-enterprise/clearml-enterprise-agent \
|
|
||||||
--create-namespace \
|
|
||||||
-f overrides.yaml
|
|
||||||
```
|
```
|
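If all seven values need replacing, the `openssl` command can be wrapped in a small loop. The sketch below prints a `clearml:` fragment with a fresh random value for each key listed above; the function names are hypothetical, and the `/dev/urandom` fallback is an assumption added for machines without `openssl`.

```bash
#!/usr/bin/env bash
# Sketch: print a `clearml:` block with freshly generated credentials.

# Generate one 32-character hex string (16 random bytes).
gen_secret() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 16
  else
    # Fallback (an assumption, not from the guide): read 16 bytes from /dev/urandom.
    od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
    echo
  fi
}

# Print the YAML fragment, one generated value per credential key.
print_clearml_credentials() {
  local key
  echo "clearml:"
  for key in apiserverKey apiserverSecret fileserverKey fileserverSecret \
             secureAuthTokenSecret testUserKey testUserSecret; do
    printf '  %s: "%s"\n' "$key" "$(gen_secret)"
  done
}

print_clearml_credentials
```

Copy the printed keys under the existing `clearml:` section of `clearml-values.override.yaml` rather than appending a second `clearml:` key, and store the output securely since it contains secrets.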
||||||
|
|
||||||
|
### Fixed Users
|
||||||
|
|
||||||
To create a queue by API:
|
Enable and configure simple login with username and password in `clearml-values.override.yaml`. This is useful for simple PoC
|
||||||
|
installations. This is an optional step in case the SSO (Identity provider) configuration is not performed.
|
||||||
|
|
||||||
|
Please note that this setup is not ideal for multi-tenant setups as fixed users will only be associated with the default tenant.
|
||||||
|
|
||||||
```bash
|
``` yaml
|
||||||
curl $APISERVER_URL/queues.create \
|
apiserver:
|
||||||
-H "Content-Type: application/json" \
|
additionalConfigs:
|
||||||
-H "X-Clearml-Impersonate-As:<USER_ID>" \
|
apiserver.conf: |
|
||||||
-u $APISERVER_KEY:$APISERVER_SECRET \
|
auth {
|
||||||
-d '{"name":"default"}'
|
fixed_users {
|
||||||
|
enabled: true
|
||||||
|
pass_hashed: false
|
||||||
|
users: [
|
||||||
|
{
|
||||||
|
username: "my_user"
|
||||||
|
password: "my_password"
|
||||||
|
name: "My User"
|
||||||
|
admin: true
|
||||||
|
},
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
|
## Next Steps
|
||||||
|
|
||||||
## ClearML AI Application Gateway Installation
|
Once the ClearML Enterprise Server is up and running, proceed with installing the ClearML Enterprise Agent and
|
||||||
|
[AI App Gateway](appgw_install_k8s.md).
|
||||||
|
|
||||||
### Configuring Chart
|
|
||||||
|
|
||||||
|
|
||||||
The Helm Chart must be installed with `overrides.yaml`:
|
|
||||||
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
imageCredentials:
|
|
||||||
password: "<DOCKERHUB_TOKEN>"
|
|
||||||
clearml:
|
|
||||||
apiServerKey: ""
|
|
||||||
apiServerSecret: ""
|
|
||||||
apiServerUrlReference: "https://api."
|
|
||||||
authCookieName: ""
|
|
||||||
ingress:
|
|
||||||
enabled: true
|
|
||||||
hostName: "task-router.dev"
|
|
||||||
tcpSession:
|
|
||||||
routerAddress: "<NODE_IP OR EXTERNAL_NAME>"
|
|
||||||
portRange:
|
|
||||||
start: <START_PORT>
|
|
||||||
end: <END_PORT>
|
|
||||||
```
|
|
||||||
|
|
||||||
|
|
||||||
**Configuration options:**
|
|
||||||
|
|
||||||
|
|
||||||
* **`clearml.apiServerUrlReference`:** URL usually starting with `https://api.`
|
|
||||||
* **`clearml.apiServerKey`:** ClearML server API key
|
|
||||||
* **`clearml.apiServerSecret`:** ClearML server secret key
|
|
||||||
* **`ingress.hostName`:** URL of the router we configured previously for load balancer starting with `https://`
|
|
||||||
* **`clearml.sslVerify`:** Enable or disable SSL certificate validation on apiserver calls check
|
|
||||||
* **`clearml.authCookieName`:** Value from `value_prefix` key starting with `allegro_token` in `envoy.yaml` file in ClearML server installation.
|
|
||||||
* **`tcpSession.routerAddress`**: Router external address can be an IP or the host machine or a load balancer hostname, depends on the network configuration
|
|
||||||
* **`tcpSession.portRange.start`**: Start port for the TCP Session feature
|
|
||||||
* **`tcpSession.portRange.end`**: End port for the TCP Session feature
|
|
||||||
|
|
||||||
|
|
||||||
### Installing the Chart
|
|
||||||
|
|
||||||
|
|
||||||
```bash
|
|
||||||
helm install -n <WORKLOAD_NAMESPACE> \
|
|
||||||
clearml-ttr \
|
|
||||||
clearml-enterprise/clearml-enterprise-task-traffic-router \
|
|
||||||
--create-namespace \
|
|
||||||
-f overrides.yaml
|
|
||||||
```
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
## Applications Installation
|
|
||||||
|
|
||||||
|
|
||||||
To install the ClearML Applications on the newly installed ClearML Enterprise control-plane, download the applications
|
|
||||||
package using the URL provided by the ClearML staff.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
### Download and Extract
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
wget -O apps.zip "<ClearML enterprise applications configuration download url>"
|
|
||||||
unzip apps.zip
|
|
||||||
```
|
|
||||||
|
|
||||||
|
|
||||||
### Adjust Application Docker Images Location (Air-Gapped Systems)
|
|
||||||
|
|
||||||
|
|
||||||
ClearML applications use pre-built docker images provided by ClearML on the ClearML DockerHub
|
|
||||||
repository. If you are using an air-gapped system, these images must be available as part of your internal docker
|
|
||||||
registry, and the correct docker images location must be specified before installing the applications.
|
|
||||||
|
|
||||||
|
|
||||||
Use the following script to adjust the applications packages accordingly before installing the applications:
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
python convert_image_registry.py \
|
|
||||||
--apps-dir /path/to/apps/ \
|
|
||||||
--repo local_registry/clearml-apps
|
|
||||||
```
|
|
||||||
|
|
||||||
|
|
||||||
The script will change the application zip files to point to the new registry, and will output the list of containers
|
|
||||||
that need to be copied to the local registry. For example:
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
make sure allegroai/clearml-apps:hpo-1.10.0-1062 was added to local_registry/clearml-apps
|
|
||||||
```
|
|
||||||
|
|
||||||
|
|
||||||
### Install Applications
|
|
||||||
|
|
||||||
|
|
||||||
Use the `upload_apps.py` script to upload the application packages to the ClearML server:
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
python upload_apps.py \
|
|
||||||
--host $APISERVER_ADDRESS \
|
|
||||||
--user $APISERVER_USER --password $APISERVER_PASSWORD \
|
|
||||||
--dir apps -ml
|
|
||||||
```
|
|
||||||
|
|
||||||
|
|
||||||
## Configuring Shared Memory for Large Model Deployment
|
|
||||||
|
|
||||||
|
|
||||||
Deploying large models may fail due to shared memory size limitations. This issue commonly arises when the allocated
|
|
||||||
`/dev/shm` space is insufficient.:
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
> 3d3e22c3066f:168:168 [0] misc/shmutils.cc:72 NCCL WARN Error: failed to extend /dev/shm/nccl-UbzKZ9 to 9637892 bytes
|
|
||||||
> 3d3e22c3066f:168:168 [0] misc/shmutils.cc:113 NCCL WARN Error while creating shared memory segment /dev/shm/nccl-UbzKZ9 (size 9637888)
|
|
||||||
> 3d3e22c3066f:168:168 [0] NCCL INFO transport/shm.cc:114 -> 2
|
|
||||||
> 3d3e22c3066f:168:168 [0] NCCL INFO transport.cc:33 -> 2
|
|
||||||
> 3d3e22c3066f:168:168 [0] NCCL INFO transport.cc:113 -> 2
|
|
||||||
> 3d3e22c3066f:168:168 [0] NCCL INFO init.cc:1263 -> 2
|
|
||||||
> 3d3e22c3066f:168:168 [0] NCCL INFO init.cc:1548 -> 2
|
|
||||||
> 3d3e22c3066f:168:168 [0] NCCL INFO init.cc:1799 -> 2
|
|
||||||
```
|
|
||||||
|
|
||||||
|
|
||||||
To configure a proper SHM size you can use the following configuration in the agent `overrides.yaml`.
|
|
||||||
|
|
||||||
|
|
||||||
Replace `<SIZE>` with the desired memory allocation in GiB, based on your model requirements.
|
|
||||||
|
|
||||||
|
|
||||||
This example configures a specific queue, but you can include this setting in the `basePodTemplate` if you need to
|
|
||||||
apply it to all tasks.
|
|
||||||
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
agentk8sglue:
|
|
||||||
queues:
|
|
||||||
GPUshm:
|
|
||||||
templateOverrides:
|
|
||||||
env:
|
|
||||||
- name: VLLM_SKIP_P2P_CHECK
|
|
||||||
value: "1"
|
|
||||||
volumeMounts:
|
|
||||||
- name: dshm
|
|
||||||
mountPath: /dev/shm
|
|
||||||
volumes:
|
|
||||||
- name: dshm
|
|
||||||
emptyDir:
|
|
||||||
medium: Memory
|
|
||||||
sizeLimit: <SIZE>Gi
|
|
||||||
```
|
|
||||||
|
@ -154,7 +154,7 @@ should be reviewed and modified prior to the server installation
|
|||||||
fixed_users {
|
fixed_users {
|
||||||
enabled: true,
|
enabled: true,
|
||||||
users: [
|
users: [
|
||||||
{username: "support", password: "<enter password here>", admin: true, name: "allegro.ai Support User"},
|
{username: "support", password: "<enter password here>", admin: true, name: "ClearML Support User"},
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
@ -2,26 +2,6 @@
|
|||||||
title: Version 2.0
|
title: Version 2.0
|
||||||
---
|
---
|
||||||
|
|
||||||
### ClearML Server 2.0.1
|
|
||||||
|
|
||||||
**New Features**
|
|
||||||
* New UI task creation options
|
|
||||||
* Support bash as well as python scripts
|
|
||||||
* Support file upload
|
|
||||||
|
|
||||||
**Bug Fixes**
|
|
||||||
* Fix ctrl-f does not open a search bar in UI editor modals ([ClearML Web GitHub issue #99](https://github.com/clearml/clearml-web/issues/99))
|
|
||||||
* Fix UI smoothed plots are dimmer than original plots in dark mode ([ClearML Server GitHub issue #270](https://github.com/clearml/clearml-server/issues/270))
|
|
||||||
* Fix webserver configuration environment variables don't load with single-quoted strings ([ClearML Server GitHub issue #271](https://github.com/clearml/clearml-server/issues/271))
|
|
||||||
* Fix image plots sometimes not rendered in UI
|
|
||||||
* Fix "All" tag filter not working in UI model selection modal in comparison pages
|
|
||||||
* Fix manual refresh function sometimes does not work in UI task
|
|
||||||
* Fix UI embedded plot colors do not change upon UI theme change
|
|
||||||
* Fix deleting a parameter in the UI task creation modal incorrectly removes another parameter
|
|
||||||
* Fix UI global search displays aborted tasks as completed
|
|
||||||
* Fix can't show/hide specific UI plot variants
|
|
||||||
* Fix UI breadcrumbs sometimes does not display project name
|
|
||||||
|
|
||||||
### ClearML Server 2.0.0
|
### ClearML Server 2.0.0
|
||||||
|
|
||||||
**Breaking Changes**
|
**Breaking Changes**
|
||||||
|
@ -81,7 +81,7 @@ The ClearML UI provides two search options on most pages:
|
|||||||
on the top banner of every page searches the whole WebApp for objects that match the queries as specified above and
|
on the top banner of every page searches the whole WebApp for objects that match the queries as specified above and
|
||||||
returns results grouped by object (projects, tasks, models, etc.).
|
returns results grouped by object (projects, tasks, models, etc.).
|
||||||
|
|
||||||
To use regular expressions, click the .* icon in the search bar.
|
To use regular expressions, click the `.*` icon in the search bar.
|
||||||
|
|
||||||

|

|
||||||

|

|
||||||
|