mirror of
https://github.com/clearml/clearml-docs
synced 2025-06-16 11:29:48 +00:00
119 lines
4.7 KiB
Markdown
119 lines
4.7 KiB
Markdown
---
|
|
title: ClearML Enterprise K8s Installation and Configuration
|
|
---
|
|
|
|
This guides walks you through the complete installation and configuration of ClearML Enterprise Server on Kubernetes
|
|
from initial setup to advanced configuration options.
|
|
|
|
Follow the steps in the order presented for a smooth setup process.
|
|
|
|
## Prerequisites
|
|
|
|
Before you begin, ensure the following requirements are met:
|
|
|
|
- Kubernetes Cluster: A standard Kubernetes (vanilla) cluster is recommended, especially for optimal GPU support.
|
|
- CLI Tools: `kubectl` and `helm` must be installed and configured.
|
|
- Ingress Controller: Required to expose services via HTTP/S (e.g., `nginx-ingress`). For external access,
|
|
configure a LoadBalancer (e.g., `MetalLB`).
|
|
- Network Ports:
|
|
- HTTP/S communication (ports 80 and 443) must be available between the server and agents.
|
|
- TCP session support (for interactive apps) requires an additional port range (see the [AI App Gateway installation](appgw_install_k8s.md)).
|
|
- DNS Configuration: A domain with subdomain support is required, ideally with trusted TLS certificates. All entries must
|
|
be resolvable by the Ingress controller. Example subdomains:
|
|
- Server:
|
|
- `api.<BASE_DOMAIN>`
|
|
- `app.<BASE_DOMAIN>`
|
|
- `files.<BASE_DOMAIN>`
|
|
- Worker:
|
|
- `router.<BASE_DOMAIN>`
|
|
- `tcp-router.<BASE_DOMAIN>` (optional, for TCP sessions)
|
|
- Storage: A configured StorageClass and an accessible storage backend.
|
|
- ClearML Enterprise Access:
|
|
- Helm repository token (`<HELM_REPO_TOKEN>`)
|
|
- DockerHub registry token (`<CLEARML_DOCKERHUB_TOKEN>`)
|
|
|
|
### Recommended Cluster Specifications
|
|
|
|
A 3-node cluster is recommended for production setups, with each node provisioned with:
|
|
|
|
- 8 vCPUs
|
|
- 32 GB RAM
|
|
- 500 GB storage
|
|
|
|
## Installation
|
|
|
|
### ClearML Enterprise Server
|
|
|
|
The ClearML Enterprise Server (control plane) includes the ClearML `apiserver`, `fileserver`, and `webserver` components.
|
|
The package also includes MongoDB, ElasticSearch, and Redis as Helm dependencies.
|
|
|
|
See the [ClearML Server on Kubernetes installation guide](k8s.md).
|
|
|
|
### ClearML Applications
|
|
|
|
ClearML Applications are plugin services that automate ML workloads
|
|
without any coding. Applications are installed on top of the ClearML Server and are provided by the ClearML team.
|
|
|
|
See the [Application Installation guide](extra_configs/apps.md).
|
|
|
|
### ClearML Enterprise Agent
|
|
|
|
The ClearML Enterprise Agent enables scheduling and execution of distributed workloads (Tasks) on your Kubernetes cluster.
|
|
|
|
See the [ClearML Agent installation guide](agent_k8s.md).
|
|
|
|
### AI Application Gateway
|
|
|
|
The AI App Gateway enables secure, authenticated access to interactive ClearML applications (e.g., JupyterLab, Streamlit)
|
|
based on ClearML user permissions. It routes HTTPS traffic from users to running pods on the cluster.
|
|
|
|
See the [AI Application Gateway installation guide](appgw_install_k8s.md).
|
|
|
|
## Additional Configuration Options
|
|
|
|
### GPU Operator
|
|
|
|
Deploy the NVIDIA GPU Operator in order to use Nvidia GPUs in ClearML.
|
|
|
|
See the [GPU Operator Basic Deployment guide](extra_configs/gpu_operator.md)
|
|
|
|
### Fractional GPU Support
|
|
|
|
To optimize GPU utilization:
|
|
|
|
* ClearML Dynamic MIG Orchestrator (CDMO): Manage GPU fractions using NVIDIA MIGs. See the [CDMO guide](fractional_gpus/cdmo.md)
|
|
* ClearML Fractional GPU Injector (CFGI): Use fractional (non-MIG) GPU slices for efficient resource sharing. See the [CFGI guide](fractional_gpus/cfgi.md)
|
|
* Mixed Deployments: Deploy both CDMO and CFGI in clusters with diverse GPU types. Use the NVIDIA GPU Operator to handle
|
|
mixed hardware setups. See the [CDMO and CFGI guide](fractional_gpus/cdmo_cfgi_same_cluster.md).
|
|
|
|
### Multi-Tenant Setup
|
|
|
|
Run multiple isolated tenants on a single ClearML Server instance, each with its own configuration and user namespaces.
|
|
|
|
See the [Multi-Tenant Service guide](multi_tenant_k8s.md).
|
|
|
|
### SSO (Identity Provider) Setup
|
|
|
|
Integrate identity providers to enable SSO login for ClearML Enterprise users.
|
|
|
|
See the [SSO Setup guide](extra_configs/sso_login.md).
|
|
|
|
### ClearML Custom Events
|
|
|
|
ClearML Enterprise supports sending custom events to selected Kafka topics.
|
|
|
|
See the [Custom Event guide](extra_configs/custom_billing.md).
|
|
|
|
### ClearML Presign Service
|
|
|
|
The ClearML Presign Service securely generates pre-signed storage URLs for authenticated users.
|
|
|
|
See the [ClearML Presign Service guide](extra_configs/presign_service.md).
|
|
|
|
## Install with a Non-Root User
|
|
|
|
In some Helm charts, you will find a values file called `values-enterprise-non-root-privileged.yaml` to be used for a
|
|
non-root installation.
|
|
|
|
These values are for Enterprise versions only, and they need to be adapted to specific infrastructure needs. The
|
|
`containerSecurityContext` is related to the Kubernetes distribution used/configuration and will need to be customized accordingly. |