clearml-docs/docs/deploying_clearml/enterprise_deploy/k8s.md
2025-05-13 08:31:25 +03:00


---
title: Kubernetes
---
This guide provides step-by-step instructions for installing the ClearML Enterprise control plane in a Kubernetes cluster.
The ClearML Enterprise control plane is the main ClearML Server, comprising the ClearML `apiserver`, `fileserver`, and `webserver` components.
The Helm chart also installs MongoDB, Elasticsearch, and Redis as Helm dependencies.
## Prerequisites
To deploy ClearML Enterprise, ensure the following components and configurations are in place:
- Kubernetes Cluster: A vanilla Kubernetes cluster is preferred for optimal GPU support.
- CLI Tools: `kubectl` and `helm` must be installed and configured.
- Ingress Controller: An Ingress controller (e.g., `nginx-ingress`) is required. If exposing services externally, a
  LoadBalancer-capable solution (e.g., `MetalLB`) should also be configured.
- The server and workers must be able to communicate over HTTP/S (ports 80 and 443). Additionally, the TCP session feature requires a
  range of ports for TCP traffic based on your configuration (see [AI App Gateway installation](appgw_install_k8s.md)).
- DNS Configuration: A domain with subdomain support is required, ideally with trusted TLS certificates. All entries must
  be resolvable by the Ingress controller. Example subdomains:
  - Control Plane:
    - `api.<BASE_DOMAIN>`
    - `app.<BASE_DOMAIN>`
    - `files.<BASE_DOMAIN>`
  - Worker:
    - `router.<BASE_DOMAIN>`
    - `tcp-router.<BASE_DOMAIN>` (optional, for TCP sessions)
- Storage: A configured StorageClass and an accessible storage backend.
- ClearML Enterprise Access:
  - Helm repository credentials (`<HELM_REPO_TOKEN>`)
  - DockerHub registry credentials (`<CLEARML_DOCKERHUB_TOKEN>`)
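Some of these prerequisites can be checked from a workstation before installing. The following is a minimal sketch, not part of the ClearML tooling: the function names `clearml_hosts` and `check_prereqs` are hypothetical, and the DNS lookup assumes a Linux host where `getent` is available. It only checks resolution from the machine running it, which may differ from what the Ingress controller sees.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; not part of the ClearML chart or CLI.

# Print the example subdomains for a given base domain.
clearml_hosts() {
  local base_domain="$1"
  local prefix
  for prefix in api app files router tcp-router; do
    printf '%s.%s\n' "$prefix" "$base_domain"
  done
}

# Report missing CLI tools and unresolvable hostnames.
# Returns non-zero if any check fails.
check_prereqs() {
  local base_domain="$1"
  local status=0
  local tool host
  for tool in kubectl helm; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      status=1
    fi
  done
  # Assumption: getent exists (Linux/glibc); use nslookup or dig elsewhere.
  for host in $(clearml_hosts "$base_domain"); do
    if ! getent hosts "$host" >/dev/null 2>&1; then
      echo "unresolved host: $host" >&2
      status=1
    fi
  done
  return "$status"
}
```

After sourcing the file, `check_prereqs <BASE_DOMAIN>` prints one line per failed check. Since `tcp-router.<BASE_DOMAIN>` is optional, an unresolved entry for it can be ignored when TCP sessions are not used.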
### Recommended Cluster Specifications
For optimal performance, a Kubernetes cluster with at least 3 nodes is recommended, each having:
- 8 vCPUs
- 32 GB RAM
- 500 GB storage
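To compare an existing cluster against these recommendations, the node count and per-node capacity can be read with `kubectl`. This is a rough sketch: the helper names `list_node_capacity` and `has_min_nodes` are hypothetical, and the memory column is shown in the units Kubernetes reports (typically `Ki`), so it needs to be converted by eye.

```bash
#!/usr/bin/env bash
# Hypothetical helper: show per-node CPU and memory capacity as reported by the cluster.
list_node_capacity() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,CPU:.status.capacity.cpu,MEMORY:.status.capacity.memory'
}

# Hypothetical helper: succeed only if the cluster has at least N nodes (default 3).
has_min_nodes() {
  local min_nodes="${1:-3}"
  local count
  # grep -c '' counts lines; it prints 0 when kubectl returns nothing.
  count="$(kubectl get nodes --no-headers 2>/dev/null | grep -c '')"
  [ "$count" -ge "$min_nodes" ]
}
```

For example, `has_min_nodes 3 && list_node_capacity` prints the capacity table only when at least three nodes are registered.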
## Installation
### Add the Helm Repo Locally
Add the ClearML Helm repository:
``` bash
helm repo add clearml-enterprise https://raw.githubusercontent.com/clearml/clearml-enterprise-helm-charts/gh-pages --username <HELM_REPO_TOKEN> --password <HELM_REPO_TOKEN>
```
Update the repository locally:
``` bash
helm repo update
```
### Prepare Values
Create a `clearml-values.override.yaml` file with the following content:
:::note
In the following configuration, replace the `<BASE_DOMAIN>` placeholders with a valid domain that will have records
pointing to the cluster's Ingress Controller. This will be the base domain for reaching your ClearML installation.
:::
``` yaml
imageCredentials:
  password: "<CLEARML_DOCKERHUB_TOKEN>"
clearml:
  cookieDomain: "<BASE_DOMAIN>"
apiserver:
  ingress:
    enabled: true
    hostName: "api.<BASE_DOMAIN>"
  service:
    type: ClusterIP
fileserver:
  ingress:
    enabled: true
    hostName: "files.<BASE_DOMAIN>"
  service:
    type: ClusterIP
webserver:
  ingress:
    enabled: true
    hostName: "app.<BASE_DOMAIN>"
  service:
    type: ClusterIP
clearmlApplications:
  enabled: true
```
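If the same values file is reused across environments, the `<BASE_DOMAIN>` placeholder can be filled in with `sed` rather than by hand. The sketch below assumes the YAML above has been saved unchanged as a template file; the function name `fill_base_domain` and the template filename used in the usage note are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace every <BASE_DOMAIN> placeholder in a values template.
# Usage: fill_base_domain <base-domain> <template-file>
fill_base_domain() {
  local base_domain="$1"
  local template="$2"
  # '|' is used as the sed delimiter because it does not occur in domain names.
  sed "s|<BASE_DOMAIN>|${base_domain}|g" "$template"
}
```

For example, `fill_base_domain example.com clearml-values.template.yaml > clearml-values.override.yaml` writes a values file with `api.example.com`, `files.example.com`, and `app.example.com` as hostnames. The `<CLEARML_DOCKERHUB_TOKEN>` placeholder is left untouched and still needs to be set.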
### Install the Chart
Install the ClearML Enterprise Helm chart using the values override file created in the previous step:
``` bash
helm upgrade -i -n clearml clearml clearml-enterprise/clearml-enterprise --create-namespace -f clearml-values.override.yaml
```
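After the command returns, it can be useful to confirm that the release deployed and that the pods and Ingress resources exist. A minimal sketch follows, using standard `helm` and `kubectl` commands; the helper name `show_clearml_status` is hypothetical, and the defaults mirror the namespace and release name used in the install command above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize the state of the release installed above.
# Usage: show_clearml_status [namespace] [release]
show_clearml_status() {
  local namespace="${1:-clearml}"
  local release="${2:-clearml}"
  # Release status as recorded by Helm.
  helm status "$release" -n "$namespace"
  # Pods created by the chart and its dependencies.
  kubectl get pods -n "$namespace"
  # Ingress resources for the api, files, and app hostnames.
  kubectl get ingress -n "$namespace"
}
```

Pods may take some time to become ready after the first install, so the pod listing may need to be repeated.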
## Additional Configuration Options
:::note
You can view the full set of available and documented values of the chart by running the following command:
```bash
helm show readme clearml-enterprise/clearml-enterprise
# or
helm show values clearml-enterprise/clearml-enterprise
```
:::
### Default Secret Values
For improved security, all internal credentials are randomly generated by default and stored in a Kubernetes Secret.
To use your own credentials instead, set the key and secret values in `clearml-values.override.yaml`:
``` yaml
clearml:
  # Replace the following values to use custom internal credentials.
  apiserverKey: ""
  apiserverSecret: ""
  fileserverKey: ""
  fileserverSecret: ""
  secureAuthTokenSecret: ""
  testUserKey: ""
  testUserSecret: ""
```
If `openssl` is installed, the following command generates a random string suitable for use as a key or secret:
``` bash
openssl rand -hex 16
```
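To fill in all seven values at once, the command can be wrapped in a small loop that prints a ready-to-paste YAML fragment. This is a sketch only: the function names `gen_secret` and `print_clearml_secrets` are hypothetical, and the `/dev/urandom` fallback is an assumption added for hosts without `openssl`.

```bash
#!/usr/bin/env bash
# Generate one 32-character hex string (16 random bytes).
gen_secret() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 16
  else
    # Assumed fallback when openssl is unavailable: read 16 bytes from /dev/urandom.
    head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
  fi
}

# Print a `clearml:` block with a fresh random value for each credential key.
print_clearml_secrets() {
  local key
  printf 'clearml:\n'
  for key in apiserverKey apiserverSecret fileserverKey fileserverSecret \
             secureAuthTokenSecret testUserKey testUserSecret; do
    printf '  %s: "%s"\n' "$key" "$(gen_secret)"
  done
}
```

Running `print_clearml_secrets` writes the fragment to standard output, where it can be reviewed and merged into the `clearml:` section of `clearml-values.override.yaml`.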
### Fixed Users
Enable and configure simple username and password login in `clearml-values.override.yaml`. This is useful for simple PoC
installations. This step is optional and applies when SSO (identity provider) configuration is not performed.
Note that this setup is not ideal for multi-tenant deployments, as fixed users are only associated with the default tenant.
``` yaml
apiserver:
  additionalConfigs:
    apiserver.conf: |
      auth {
        fixed_users {
          enabled: true
          pass_hashed: false
          users: [
            {
              username: "my_user"
              password: "my_password"
              name: "My User"
              admin: true
            },
          ]
        }
      }
```
## Next Steps
Once the ClearML Enterprise control-plane is up and running, proceed with installing the ClearML Enterprise Agent and
[AI App Gateway](appgw_install_k8s.md).