mirror of
https://github.com/clearml/clearml-docs
synced 2025-06-26 18:17:44 +00:00
Add ClearML Enterprise Server on k8s overview and ToC
parent
9fdc893582
commit
8e1c790532
@@ -1,22 +1,23 @@
|
|||||||
---
|
---
|
||||||
title: Complete ClearML Enterprise K8s Installation and Configuration
|
title: ClearML Enterprise K8s Installation and Configuration
|
||||||
---
|
---
|
||||||
|
|
||||||
This guide walks you through installing and configuring ClearML Enterprise on Kubernetes, from basic installation
|
This guide walks you through the complete installation and configuration of ClearML Enterprise Server on Kubernetes
|
||||||
to advanced configuration options.
|
from initial setup to advanced configuration options.
|
||||||
|
|
||||||
Follow the steps in the order presented for a smooth setup process.
|
Follow the steps in the order presented for a smooth setup process.
|
||||||
|
|
||||||
## Prerequisites
|
## Prerequisites
|
||||||
|
|
||||||
Before installing ClearML Enterprise, verify that the following components are in place:
|
Before you begin, ensure the following requirements are met:
|
||||||
|
|
||||||
- Kubernetes Cluster: A vanilla Kubernetes cluster is recommended for optimal GPU support.
|
- Kubernetes Cluster: A standard (vanilla) Kubernetes cluster is recommended, especially for optimal GPU support.
|
||||||
- CLI Tools: `kubectl` and `helm` must be installed and configured.
|
- CLI Tools: `kubectl` and `helm` must be installed and configured.
|
||||||
- Ingress Controller: Required to expose services via HTTP/S (e.g., `nginx-ingress`). If you need external access, also
|
- Ingress Controller: Required to expose services via HTTP/S (e.g., `nginx-ingress`). For external access,
|
||||||
configure a LoadBalancer (e.g., `MetalLB`).
|
configure a LoadBalancer (e.g., `MetalLB`).
|
||||||
- Server and workers communicating on HTTP/S (ports 80 and 443). Additionally, the TCP session feature requires a range
|
- Network Ports:
|
||||||
of ports for TCP traffic based on your configuration (see [AI App Gateway installation](appgw_install_k8s.md)).
|
- HTTP/S communication (ports 80 and 443) must be available between the server and agents.
|
||||||
|
- TCP session support (for interactive apps) requires an additional port range (see the [AI App Gateway installation](appgw_install_k8s.md)).
|
||||||
- DNS Configuration: A domain with subdomain support is required, ideally with trusted TLS certificates. All entries must
|
- DNS Configuration: A domain with subdomain support is required, ideally with trusted TLS certificates. All entries must
|
||||||
be resolvable by the Ingress controller. Example subdomains:
|
be resolvable by the Ingress controller. Example subdomains:
|
||||||
- Server:
|
- Server:
|
||||||
@@ -50,54 +51,50 @@ See the [ClearML Server on Kubernetes installation guide](k8s.md).
|
|||||||
|
|
||||||
### ClearML Applications
|
### ClearML Applications
|
||||||
|
|
||||||
ClearML Applications are like plugins that allow you to manage ML workloads and automatically run recurring workflows
|
ClearML Applications are plugin services that automate ML workloads
|
||||||
without any coding. Applications are installed on top of the ClearML Server and are provided by the ClearML team.
|
without any coding. Applications are installed on top of the ClearML Server and are provided by the ClearML team.
|
||||||
|
|
||||||
See the [Application Installation guide](extra_configs/apps.md).
|
See the [Application Installation guide](extra_configs/apps.md).
|
||||||
|
|
||||||
### ClearML Enterprise Agent
|
### ClearML Enterprise Agent
|
||||||
|
|
||||||
The ClearML Enterprise Agent Enables scheduling and execution of distributed workloads (Tasks) on your Kubernetes cluster.
|
The ClearML Enterprise Agent enables scheduling and execution of distributed workloads (Tasks) on your Kubernetes cluster.
|
||||||
|
|
||||||
See the [ClearML Agent installation guide](agent_k8s.md).
|
See the [ClearML Agent installation guide](agent_k8s.md).
|
||||||
|
|
||||||
### AI Application Gateway
|
### AI Application Gateway
|
||||||
|
|
||||||
The ClearML AI Application Gateway provides secure and authenticated routing of HTTPS connections from a
|
The AI App Gateway enables secure, authenticated access to interactive ClearML applications (e.g., JupyterLab, Streamlit)
|
||||||
user's browser running the ClearML WebApp to pods running interactive ClearML applications.
|
based on ClearML user permissions. It routes HTTPS traffic from users to running pods on the cluster.
|
||||||
|
|
||||||
Some ClearML applications (e.g., JupyterLab, Streamlit) may require users to access running ClearML tasks in a secure
|
|
||||||
and authenticated manner, based on ClearML user permissions. To provide access to these tasks running inside pods, an AI
|
|
||||||
App Gateway service must run on the same network as the agents and pods running the tasks.
|
|
||||||
|
|
||||||
See the [AI Application Gateway installation guide](appgw_install_k8s.md).
|
See the [AI Application Gateway installation guide](appgw_install_k8s.md).
|
||||||
|
|
||||||
## Additional Configurations
|
## Additional Configuration Options
|
||||||
|
|
||||||
### Setup GPUs
|
### GPU Operator
|
||||||
|
|
||||||
#### GPU Operator
|
Deploy the NVIDIA GPU Operator to use NVIDIA GPUs in ClearML.
|
||||||
|
|
||||||
|
See the [GPU Operator Basic Deployment guide](extra_configs/gpu_operator.md).
|
||||||
$$$$$$$$$$$$$$$$$$$$In order to use Nvidia GPUs in ClearML.
|
|
||||||
|
|
||||||
See the [guide for deploying the NVIDIA GPU Operator alongside ClearML Enterprise](extra_configs/gpu_operator.md)
|
|
||||||
|
|
||||||
### Fractional GPU Support
|
### Fractional GPU Support
|
||||||
|
|
||||||
Enable allocating a fraction of the available GPU cores and memory for better utilization of shared GPU nodes.
|
To optimize GPU utilization:
|
||||||
|
|
||||||
$$$$$$TODO link
|
* ClearML Dynamic MIG Operator (CDMO): Manage GPU fractions using NVIDIA MIGs. See the [CDMO guide](fractional_gpus/cdmo.md).
|
||||||
|
* ClearML Fractional GPU Injector (CFGI): Use fractional (non-MIG) GPU slices for efficient resource sharing. See the [CFGI guide](fractional_gpus/cfgi.md).
|
||||||
|
* Mixed Deployments: Deploy both CDMO and CFGI in clusters with diverse GPU types. Use the NVIDIA GPU Operator to handle
|
||||||
|
mixed hardware setups. See the [CDMO and CFGI guide](fractional_gpus/cdmo_cfgi_same_cluster.md).
|
||||||
|
|
||||||
### Multi-Tenant Setup
|
### Multi-Tenant Setup
|
||||||
|
|
||||||
Enable isolated tenants within the same ClearML Server, each with separate configuration, users, and project namespaces.
|
Run multiple isolated tenants on a single ClearML Server instance, each with its own configuration and user namespaces.
|
||||||
|
|
||||||
See the [multi-tenant service installation guide](multi_tenant_k8s.md).
|
See the [Multi-Tenant Service guide](multi_tenant_k8s.md).
|
||||||
|
|
||||||
### SSO (Identity Provider) Setup
|
### SSO (Identity Provider) Setup
|
||||||
|
|
||||||
Configure Single sign-on identity providers on ClearML Enterprise.
|
Integrate identity providers to enable SSO login for ClearML Enterprise users.
|
||||||
|
|
||||||
See the [SSO Setup guide](extra_configs/sso_login.md).
|
See the [SSO Setup guide](extra_configs/sso_login.md).
|
||||||
|
|
||||||
@@ -109,12 +106,11 @@ See the [Custom Event guide](extra_configs/custom_events.md).
|
|||||||
|
|
||||||
### ClearML Presign Service
|
### ClearML Presign Service
|
||||||
|
|
||||||
The ClearML Presign Service is a secure component for generating and redirecting pre-signed storage URLs for
|
The ClearML Presign Service securely generates pre-signed storage URLs for authenticated users.
|
||||||
authenticated users.
|
|
||||||
|
|
||||||
See the [ClearML Presign Service guide](extra_configs/presign_service.md).
|
See the [ClearML Presign Service guide](extra_configs/presign_service.md).
|
||||||
|
|
||||||
### Install with a non-root user
|
## Install with a Non-Root User
|
||||||
|
|
||||||
In some Helm charts, you will find a values file called `values-enterprise-non-root-privileged.yaml` to be used for a
|
In some Helm charts, you will find a values file called `values-enterprise-non-root-privileged.yaml` to be used for a
|
||||||
non-root installation.
|
non-root installation.
|
||||||
|
71
sidebars.js
@@ -664,7 +664,7 @@ module.exports = {
|
|||||||
{'ClearML Application Gateway': [
|
{'ClearML Application Gateway': [
|
||||||
'deploying_clearml/enterprise_deploy/appgw_install_compose',
|
'deploying_clearml/enterprise_deploy/appgw_install_compose',
|
||||||
'deploying_clearml/enterprise_deploy/appgw_install_compose_hosted',
|
'deploying_clearml/enterprise_deploy/appgw_install_compose_hosted',
|
||||||
'deploying_clearml/enterprise_deploy/appgw_install_k8s',
|
'deploying_clearml/enterprise_deploy/appgw_install_k8s',
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
'deploying_clearml/enterprise_deploy/custom_billing',
|
'deploying_clearml/enterprise_deploy/custom_billing',
|
||||||
@@ -719,5 +719,74 @@ module.exports = {
|
|||||||
},
|
},
|
||||||
],
|
],
|
||||||
},
|
},
|
||||||
|
],
|
||||||
|
enterpriseDeploy: [
|
||||||
|
{
|
||||||
|
type: 'category',
|
||||||
|
collapsible: true,
|
||||||
|
label: 'ClearML Enterprise K8s Installation and Configuration',
|
||||||
|
link: {type: 'doc', id: 'deploying_clearml/enterprise_deploy/k8s_overview'},
|
||||||
|
items: [
|
||||||
|
'deploying_clearml/enterprise_deploy/agent_k8s',
|
||||||
|
'deploying_clearml/enterprise_deploy/extra_configs/apps',
|
||||||
|
{
|
||||||
|
type: 'category',
|
||||||
|
collapsible: true,
|
||||||
|
label: 'Extra Configuration',
|
||||||
|
items: [
|
||||||
|
{
|
||||||
|
type: 'doc',
|
||||||
|
label: 'GPU Operator Basic Deployment',
|
||||||
|
id: 'deploying_clearml/enterprise_deploy/extra_configs/gpu_operator'
|
||||||
|
}, {
|
||||||
|
type: 'doc',
|
||||||
|
id: 'deploying_clearml/enterprise_deploy/extra_configs/custom_events'
|
||||||
|
},
|
||||||
|
{
|
||||||
|
type: 'doc',
|
||||||
|
id: 'deploying_clearml/enterprise_deploy/extra_configs/presign_service'
|
||||||
|
},
|
||||||
|
{
|
||||||
|
type: 'doc',
|
||||||
|
id: 'deploying_clearml/enterprise_deploy/extra_configs/dynamic_edit_task_pod_template'
|
||||||
|
},
|
||||||
|
{
|
||||||
|
type: 'doc',
|
||||||
|
id: 'deploying_clearml/enterprise_deploy/extra_configs/multi_node_training'
|
||||||
|
},
|
||||||
|
|
||||||
|
{
|
||||||
|
type: 'doc',
|
||||||
|
label: 'K8s Deployment with Self-Signed Certificates',
|
||||||
|
id: 'deploying_clearml/enterprise_deploy/extra_configs/self_signed_certificates'
|
||||||
|
},
|
||||||
|
{
|
||||||
|
type: 'doc',
|
||||||
|
id: 'deploying_clearml/enterprise_deploy/extra_configs/sso_login'
|
||||||
|
},
|
||||||
|
],
|
||||||
|
},
|
||||||
|
{
|
||||||
|
type: 'category',
|
||||||
|
collapsible: true,
|
||||||
|
label: 'Fractional GPUs',
|
||||||
|
items: [
|
||||||
|
{
|
||||||
|
type: 'doc',
|
||||||
|
label: 'ClearML Dynamic MIG Operator (CDMO)',
|
||||||
|
id: 'deploying_clearml/enterprise_deploy/fractional_gpus/cdmo'
|
||||||
|
},
|
||||||
|
{
|
||||||
|
type: 'doc',
|
||||||
|
id: 'deploying_clearml/enterprise_deploy/fractional_gpus/cfgi'
|
||||||
|
},
|
||||||
|
{
|
||||||
|
type: 'doc',
|
||||||
|
id: 'deploying_clearml/enterprise_deploy/fractional_gpus/cdmo_cfgi_same_cluster'
|
||||||
|
},
|
||||||
|
],
|
||||||
|
},
|
||||||
|
]
|
||||||
|
}
|
||||||
]
|
]
|
||||||
};
|
};
|
||||||
|