Add ClearML Enterprise Server on k8s overview and ToC

2025-06-26 18:17:44 +00:00 · 2025-06-10 09:27:24 +03:00
commit 8e1c790532
parent 9fdc893582
2 changed files with 97 additions and 32 deletions
--- a/docs/deploying_clearml/enterprise_deploy/k8s_overview.md
+++ b/docs/deploying_clearml/enterprise_deploy/k8s_overview.md
@@ -1,22 +1,23 @@
 ---
-title: Complete ClearML Enterprise K8s Installation and Configuration
+title: ClearML Enterprise K8s Installation and Configuration
 ---

-This guides walks you through installing and configuring ClearML Enterprise on Kubernetes, from basic installation 
-to advanced configuration options.
+This guides walks you through the complete installation and configuration of ClearML Enterprise Server on Kubernetes
+from initial setup to advanced configuration options.

 Follow the steps in the order presented for a smooth setup process.

 ## Prerequisites

-Before installing ClearML Enterprise, verify that the following components are in place:
+Before you begin, ensure the following requirements are met:

-- Kubernetes Cluster: A vanilla Kubernetes cluster is recommended for optimal GPU support.
+- Kubernetes Cluster: A standard Kubernetes (vanilla) cluster is recommended, especially for optimal GPU support.
 - CLI Tools: `kubectl` and `helm` must be installed and configured.
-- Ingress Controller:  Required to expose services via HTTP/S (e.g., `nginx-ingress`). If you need external access, also 
+- Ingress Controller:  Required to expose services via HTTP/S (e.g., `nginx-ingress`). For external access,
  configure a LoadBalancer (e.g., `MetalLB`).
-- Server and workers communicating on HTTP/S (ports 80 and 443). Additionally, the TCP session feature requires a range 
-  of ports for TCP traffic based on your configuration (see [AI App Gateway installation](appgw_install_k8s.md)).
+- Network Ports:
+  - HTTP/S communication (ports 80 and 443) must be available between the server and agents.
+   - TCP session support (for interactive apps) requires an additional port range (see the [AI App Gateway installation](appgw_install_k8s.md)).
 - DNS Configuration: A domain with subdomain support is required, ideally with trusted TLS certificates. All entries must 
  be resolvable by the Ingress controller. Example subdomains:
  - Server:
@@ -50,54 +51,50 @@ See the [ClearML Server on Kubernetes installation guide](k8s.md).

 ### ClearML Applications

-ClearML Applications are like plugins that allow you to manage ML workloads and automatically run recurring workflows 
+ClearML Applications are plugin services that automate ML workloads 
 without any coding. Applications are installed on top of the ClearML Server and are provided by the ClearML team.

 See the [Application Installation guide](extra_configs/apps.md).

 ### ClearML Enterprise Agent

-The ClearML Enterprise Agent Enables scheduling and execution of distributed workloads (Tasks) on your Kubernetes cluster.
+The ClearML Enterprise Agent enables scheduling and execution of distributed workloads (Tasks) on your Kubernetes cluster.

 See the [ClearML Agent installation guide](agent_k8s.md).

 ### AI Application Gateway

-The ClearML AI Application Gateway provides secure and authenticated routing of HTTPS connections from a 
-user's browser running the ClearML WebApp to pods running interactive ClearML applications.
-
-Some ClearML applications (e.g., JupyterLab, Streamlit)  may require users to access running ClearML tasks in a secure 
-and authenticated manner, based on ClearML user permissions. To provide access to these tasks running inside pods, an AI 
-App Gateway service must run on the same network as the agents and pods running the tasks.
+The AI App Gateway enables secure, authenticated access to interactive ClearML applications (e.g., JupyterLab, Streamlit) 
+based on ClearML user permissions. It routes HTTPS traffic from users to running pods on the cluster.

 See the [AI Application Gateway installation guide](appgw_install_k8s.md).

-## Additional Configurations
+## Additional Configuration Options

-### Setup GPUs
+### GPU Operator

-#### GPU Operator
+Deploy the NVIDIA GPU Operator in order to use Nvidia GPUs in ClearML.

-
-$$$$$$$$$$$$$$$$$$$$In order to use Nvidia GPUs in ClearML.
-
-See the [guide for deploying the NVIDIA GPU Operator alongside ClearML Enterprise](extra_configs/gpu_operator.md)
+See the [GPU Operator Basic Deployment guide](extra_configs/gpu_operator.md)

 ### Fractional GPU Support

-Enable allocating a fraction of the available GPU cores and memory for better utilization of shared GPU nodes.
+To optimize GPU utilization:

-$$$$$$TODO link
+* ClearML Dynamic MIG Orchestrator (CDMO): Manage GPU fractions using NVIDIA MIGs. See the [CDMO guide](fractional_gpus/cdmo.md)
+* ClearML Fractional GPU Injector (CFGI): Use fractional (non-MIG) GPU slices for efficient resource sharing. See the [CFGI guide](fractional_gpus/cfgi.md)
+* Mixed Deployments: Deploy both CDMO and CFGI in clusters with diverse GPU types. Use the NVIDIA GPU Operator to handle 
+  mixed hardware setups. See the [CDMO and CFGI guide](fractional_gpus/cdmo_cfgi_same_cluster.md).

 ### Multi-Tenant Setup

-Enable isolated tenants within the same ClearML Server, each with separate configuration, users, and project namespaces.
+Run multiple isolated tenants on a single ClearML Server instance, each with its own configuration and user namespaces.

-See the [multi-tenant service installation guide](multi_tenant_k8s.md).
+See the [Multi-Tenant Service guide](multi_tenant_k8s.md).

 ### SSO (Identity Provider) Setup

-Configure Single sign-on identity providers on ClearML Enterprise.
+Integrate identity providers to enable SSO login for ClearML Enterprise users.

 See the [SSO Setup guide](extra_configs/sso_login.md).

@@ -109,12 +106,11 @@ See the [Custom Event guide](extra_configs/custom_events.md).

 ### ClearML Presign Service

-The ClearML Presign Service is a secure component for generating and redirecting pre-signed storage URLs for
-authenticated users.
+The ClearML Presign Service securely generates pre-signed storage URLs for authenticated users.

 See the [ClearML Presign Service guide](extra_configs/presign_service.md).

-### Install with a non-root user
+## Install with a Non-Root User

 In some Helm charts, you will find a values file called `values-enterprise-non-root-privileged.yaml` to be used for a 
 non-root installation.
--- a/sidebars.js
+++ b/sidebars.js
@@ -664,7 +664,7 @@ module.exports = {
                {'ClearML Application Gateway': [
                    'deploying_clearml/enterprise_deploy/appgw_install_compose',
                    'deploying_clearml/enterprise_deploy/appgw_install_compose_hosted',
-                'deploying_clearml/enterprise_deploy/appgw_install_k8s',
+                    'deploying_clearml/enterprise_deploy/appgw_install_k8s',
                    ]
                },
                'deploying_clearml/enterprise_deploy/custom_billing',
@@ -719,5 +719,74 @@ module.exports = {
                },
            ],
        },
+    ],
+    enterpriseDeploy: [
+       {
+          type: 'category',
+          collapsible: true,
+          label: 'ClearML Enterprise K8s Installation and Configuration',
+          link: {type: 'doc', id: 'deploying_clearml/enterprise_deploy/k8s_overview'},
+          items: [
+            'deploying_clearml/enterprise_deploy/agent_k8s',
+            'deploying_clearml/enterprise_deploy/extra_configs/apps',
+            {
+                type: 'category',
+                collapsible: true,
+                label: 'Extra Configuration',
+                items: [
+                    {
+                        type: 'doc',
+                        label: 'GPU Operator Basic Deployment',
+                        id: 'deploying_clearml/enterprise_deploy/extra_configs/gpu_operator'
+                    },                    {
+                        type: 'doc',
+                        id: 'deploying_clearml/enterprise_deploy/extra_configs/custom_events'
+                    },
+                    {
+                        type: 'doc',
+                        id: 'deploying_clearml/enterprise_deploy/extra_configs/presign_service'
+                    },
+                    {
+                        type: 'doc',
+                        id: 'deploying_clearml/enterprise_deploy/extra_configs/dynamic_edit_task_pod_template'
+                    },
+                    {
+                        type: 'doc',
+                        id: 'deploying_clearml/enterprise_deploy/extra_configs/multi_node_training'
+                    },
+
+                    {
+                        type: 'doc',
+                        label: 'K8s Deployment with Self-Signed Certificates',
+                        id: 'deploying_clearml/enterprise_deploy/extra_configs/self_signed_certificates'
+                    },
+                    {
+                        type: 'doc',
+                        id: 'deploying_clearml/enterprise_deploy/extra_configs/sso_login'
+                    },
+                ],
+            },
+            {
+                type: 'category',
+                collapsible: true,
+                label: 'Fractional GPUs',
+                items: [
+                    {
+                        type: 'doc',
+                        label: 'ClearML Dynamic MIG Operator (CDMO)',
+                        id: 'deploying_clearml/enterprise_deploy/fractional_gpus/cdmo'
+                    },
+                    {
+                        type: 'doc',
+                        id: 'deploying_clearml/enterprise_deploy/fractional_gpus/cfgi'
+                    },
+                    {
+                        type: 'doc',
+                        id: 'deploying_clearml/enterprise_deploy/fractional_gpus/cdmo_cfgi_same_cluster'
+                    },
+                ],
+            },
+          ]
+       }
    ]
 };