Add ClearML Enterprise Server on k8s overview and ToC

2025-06-26 18:17:44 +00:00 · 2025-06-10 09:27:24 +03:00 · 2025-06-10 09:27:24 +03:00 · 8e1c790532
commit 8e1c790532
parent 9fdc893582
2 changed files with 97 additions and 32 deletions
--- a/docs/deploying_clearml/enterprise_deploy/k8s_overview.md
+++ b/docs/deploying_clearml/enterprise_deploy/k8s_overview.md
@@ -1,22 +1,23 @@
 ---
-title: Complete ClearML Enterprise K8s Installation and Configuration
+title: ClearML Enterprise K8s Installation and Configuration
 ---
-This guides walks you through installing and configuring ClearML Enterprise on Kubernetes, from basic installation 
+This guide walks you through the complete installation and configuration of ClearML Enterprise Server on Kubernetes,
-to advanced configuration options.
+from initial setup to advanced configuration options.
 Follow the steps in the order presented for a smooth setup process.
 ## Prerequisites
-Before installing ClearML Enterprise, verify that the following components are in place:
+Before you begin, ensure the following requirements are met:
-- Kubernetes Cluster: A vanilla Kubernetes cluster is recommended for optimal GPU support.
+- Kubernetes Cluster: A standard (vanilla) Kubernetes cluster is recommended, especially for optimal GPU support.
 - CLI Tools: `kubectl` and `helm` must be installed and configured.
-- Ingress Controller:  Required to expose services via HTTP/S (e.g., `nginx-ingress`). If you need external access, also 
+- Ingress Controller:  Required to expose services via HTTP/S (e.g., `nginx-ingress`). For external access,
  configure a LoadBalancer (e.g., `MetalLB`).
-- Server and workers communicating on HTTP/S (ports 80 and 443). Additionally, the TCP session feature requires a range 
+- Network Ports:
-  of ports for TCP traffic based on your configuration (see [AI App Gateway installation](appgw_install_k8s.md)).
+  - HTTP/S communication (ports 80 and 443) must be available between the server and agents.
   - TCP session support (for interactive apps) requires an additional port range (see the [AI App Gateway installation](appgw_install_k8s.md)).
 - DNS Configuration: A domain with subdomain support is required, ideally with trusted TLS certificates. All entries must 
  be resolvable by the Ingress controller. Example subdomains:
  - Server:
@@ -50,54 +51,50 @@ See the [ClearML Server on Kubernetes installation guide](k8s.md).
 ### ClearML Applications
-ClearML Applications are like plugins that allow you to manage ML workloads and automatically run recurring workflows 
+ClearML Applications are plugin services that automate ML workloads 
 without any coding. Applications are installed on top of the ClearML Server and are provided by the ClearML team.
 See the [Application Installation guide](extra_configs/apps.md).
 ### ClearML Enterprise Agent
-The ClearML Enterprise Agent Enables scheduling and execution of distributed workloads (Tasks) on your Kubernetes cluster.
+The ClearML Enterprise Agent enables scheduling and execution of distributed workloads (Tasks) on your Kubernetes cluster.
 See the [ClearML Agent installation guide](agent_k8s.md).
 ### AI Application Gateway
-The ClearML AI Application Gateway provides secure and authenticated routing of HTTPS connections from a 
+The AI App Gateway enables secure, authenticated access to interactive ClearML applications (e.g., JupyterLab, Streamlit) 
-user's browser running the ClearML WebApp to pods running interactive ClearML applications.
+based on ClearML user permissions. It routes HTTPS traffic from users to running pods on the cluster.
 Some ClearML applications (e.g., JupyterLab, Streamlit)  may require users to access running ClearML tasks in a secure 
 and authenticated manner, based on ClearML user permissions. To provide access to these tasks running inside pods, an AI 
 App Gateway service must run on the same network as the agents and pods running the tasks.
 See the [AI Application Gateway installation guide](appgw_install_k8s.md).
-## Additional Configurations
+## Additional Configuration Options
-### Setup GPUs
+### GPU Operator
-#### GPU Operator
+Deploy the NVIDIA GPU Operator to use NVIDIA GPUs in ClearML.
-
+See the [GPU Operator Basic Deployment guide](extra_configs/gpu_operator.md).
 $$$$$$$$$$$$$$$$$$$$In order to use Nvidia GPUs in ClearML.
 See the [guide for deploying the NVIDIA GPU Operator alongside ClearML Enterprise](extra_configs/gpu_operator.md)
 ### Fractional GPU Support
-Enable allocating a fraction of the available GPU cores and memory for better utilization of shared GPU nodes.
+To optimize GPU utilization:
-$$$$$$TODO link
+* ClearML Dynamic MIG Operator (CDMO): Manage GPU fractions using NVIDIA MIG. See the [CDMO guide](fractional_gpus/cdmo.md).
 * ClearML Fractional GPU Injector (CFGI): Use fractional (non-MIG) GPU slices for efficient resource sharing. See the [CFGI guide](fractional_gpus/cfgi.md)
 * Mixed Deployments: Deploy both CDMO and CFGI in clusters with diverse GPU types. Use the NVIDIA GPU Operator to handle 
  mixed hardware setups. See the [CDMO and CFGI guide](fractional_gpus/cdmo_cfgi_same_cluster.md).
 ### Multi-Tenant Setup
-Enable isolated tenants within the same ClearML Server, each with separate configuration, users, and project namespaces.
+Run multiple isolated tenants on a single ClearML Server instance, each with its own configuration and user namespaces.
-See the [multi-tenant service installation guide](multi_tenant_k8s.md).
+See the [Multi-Tenant Service guide](multi_tenant_k8s.md).
 ### SSO (Identity Provider) Setup
-Configure Single sign-on identity providers on ClearML Enterprise.
+Integrate identity providers to enable SSO login for ClearML Enterprise users.
 See the [SSO Setup guide](extra_configs/sso_login.md).
@@ -109,12 +106,11 @@ See the [Custom Event guide](extra_configs/custom_events.md).
 ### ClearML Presign Service
-The ClearML Presign Service is a secure component for generating and redirecting pre-signed storage URLs for
+The ClearML Presign Service securely generates pre-signed storage URLs for
 authenticated users.
 See the [ClearML Presign Service guide](extra_configs/presign_service.md).
-### Install with a non-root user
+## Install with a Non-Root User
 In some Helm charts, you will find a values file called `values-enterprise-non-root-privileged.yaml` to be used for a 
 non-root installation.
--- a/sidebars.js
+++ b/sidebars.js
@@ -664,7 +664,7 @@ module.exports = {
                {'ClearML Application Gateway': [
                    'deploying_clearml/enterprise_deploy/appgw_install_compose',
                    'deploying_clearml/enterprise_deploy/appgw_install_compose_hosted',
-                'deploying_clearml/enterprise_deploy/appgw_install_k8s',
+                    'deploying_clearml/enterprise_deploy/appgw_install_k8s',
                    ]
                },
                'deploying_clearml/enterprise_deploy/custom_billing',
@@ -719,5 +719,74 @@ module.exports = {
                },
            ],
        },
    ],
    enterpriseDeploy: [
       {
          type: 'category',
          collapsible: true,
          label: 'ClearML Enterprise K8s Installation and Configuration',
          link: {type: 'doc', id: 'deploying_clearml/enterprise_deploy/k8s_overview'},
          items: [
            'deploying_clearml/enterprise_deploy/agent_k8s',
            'deploying_clearml/enterprise_deploy/extra_configs/apps',
            {
                type: 'category',
                collapsible: true,
                label: 'Extra Configuration',
                items: [
                    {
                        type: 'doc',
                        label: 'GPU Operator Basic Deployment',
                        id: 'deploying_clearml/enterprise_deploy/extra_configs/gpu_operator'
                    },
                    {
                        type: 'doc',
                        id: 'deploying_clearml/enterprise_deploy/extra_configs/custom_events'
                    },
                    {
                        type: 'doc',
                        id: 'deploying_clearml/enterprise_deploy/extra_configs/presign_service'
                    },
                    {
                        type: 'doc',
                        id: 'deploying_clearml/enterprise_deploy/extra_configs/dynamic_edit_task_pod_template'
                    },
                    {
                        type: 'doc',
                        id: 'deploying_clearml/enterprise_deploy/extra_configs/multi_node_training'
                    },
                    {
                        type: 'doc',
                        label: 'K8s Deployment with Self-Signed Certificates',
                        id: 'deploying_clearml/enterprise_deploy/extra_configs/self_signed_certificates'
                    },
                    {
                        type: 'doc',
                        id: 'deploying_clearml/enterprise_deploy/extra_configs/sso_login'
                    },
                ],
            },
            {
                type: 'category',
                collapsible: true,
                label: 'Fractional GPUs',
                items: [
                    {
                        type: 'doc',
                        label: 'ClearML Dynamic MIG Operator (CDMO)',
                        id: 'deploying_clearml/enterprise_deploy/fractional_gpus/cdmo'
                    },
                    {
                        type: 'doc',
                        id: 'deploying_clearml/enterprise_deploy/fractional_gpus/cfgi'
                    },
                    {
                        type: 'doc',
                        id: 'deploying_clearml/enterprise_deploy/fractional_gpus/cdmo_cfgi_same_cluster'
                    },
                ],
            },
          ]
       }
    ]
 };
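
The new `enterpriseDeploy` sidebar mixes three item shapes: bare doc id strings, `{type: 'doc', id}` objects, and nested `{type: 'category', items}` objects (one of which carries a `link` to a doc). A minimal sketch of how the doc ids referenced by such a structure could be collected, for example to spot duplicates before building the site, is shown below. The `collectDocIds` helper and the trimmed sample data are illustrative assumptions and are not part of this commit or of the Docusaurus API.

```javascript
// Hypothetical helper: recursively collect every doc id referenced by a
// Docusaurus-style sidebar item list.
function collectDocIds(items) {
  const ids = [];
  for (const item of items) {
    if (typeof item === 'string') {
      // Shorthand doc reference, e.g. 'deploying_clearml/enterprise_deploy/agent_k8s'
      ids.push(item);
    } else if (item.type === 'doc') {
      ids.push(item.id);
    } else if (item.type === 'category') {
      // A category may link to a doc of its own, then lists child items.
      if (item.link && item.link.type === 'doc') ids.push(item.link.id);
      ids.push(...collectDocIds(item.items || []));
    } else if (!item.type) {
      // Shorthand category: {'Label': [ ...items ]}
      for (const children of Object.values(item)) {
        ids.push(...collectDocIds(children));
      }
    }
  }
  return ids;
}

// Trimmed copy of the enterpriseDeploy sidebar from this commit (illustration only).
const enterpriseDeploy = [
  {
    type: 'category',
    collapsible: true,
    label: 'ClearML Enterprise K8s Installation and Configuration',
    link: {type: 'doc', id: 'deploying_clearml/enterprise_deploy/k8s_overview'},
    items: [
      'deploying_clearml/enterprise_deploy/agent_k8s',
      {
        type: 'category',
        collapsible: true,
        label: 'Fractional GPUs',
        items: [
          {type: 'doc', id: 'deploying_clearml/enterprise_deploy/fractional_gpus/cdmo'},
          {type: 'doc', id: 'deploying_clearml/enterprise_deploy/fractional_gpus/cfgi'},
        ],
      },
    ],
  },
];

const ids = collectDocIds(enterpriseDeploy);
// An id is a duplicate if it also appears earlier in the list.
const duplicates = ids.filter((id, i) => ids.indexOf(id) !== i);
console.log(`${ids.length} doc ids, ${duplicates.length} duplicates`);
```

Comparing the collected ids against the files under `docs/` would additionally catch references to pages that do not exist, but that step depends on the repository layout and is left out here.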