This commit is contained in:
revital 2025-05-15 14:55:35 +03:00
parent d94d777e55
commit aaa3851de3
10 changed files with 139 additions and 147 deletions

View File

@ -6,8 +6,8 @@ The ClearML Agent enables scheduling and executing distributed experiments on a
## Prerequisites
- A ClearML Enterprise server is up and running.
- Generate a set of `<ACCESS_KEY>` and `<SECRET_KEY>` credentials in the ClearML Server. The easiest way is via
- A running [ClearML Enterprise Server](k8s.md)
- API credentials (`<ACCESS_KEY>` and `<SECRET_KEY>`) generated via
the ClearML UI (**Settings > Workspace > API Credentials > Create new credentials**). For more information, see [ClearML API Credentials](../../webapp/settings/webapp_settings_profile.md#clearml-api-credentials).
:::note
@ -15,7 +15,7 @@ The ClearML Agent enables scheduling and executing distributed experiments on a
:::
- The worker environment must be able to access the ClearML Server over the same network.
- * Helm token to access `clearml-enterprise` helm-chart repo
- Helm token to access `clearml-enterprise` Helm chart repo
## Installation
@ -36,9 +36,9 @@ helm repo update
Create a `clearml-agent-values.override.yaml` file with the following content:
:::note
Replace the `<ACCESS_KEY>` and `<SECRET_KEY>` with the admin credentials
you created earlier. Set `<api|file|web>ServerUrlReference` to the relevant URLs of your ClearML
Server installation.
Replace the `<ACCESS_KEY>` and `<SECRET_KEY>` with the API credentials you generated earlier.
Set the `<api|file|web>ServerUrlReference` fields to match your ClearML
Server URLs.
:::
```yaml
@ -60,7 +60,7 @@ agentk8sglue:
### Install the Chart
Install the ClearML Enterprise Agent Helm chart using the previous values override file:
Install the ClearML Enterprise Agent Helm chart:
```bash
helm upgrade -i -n <WORKER_NAMESPACE> clearml-agent clearml-enterprise/clearml-enterprise-agent --create-namespace -f clearml-agent-values.override.yaml
@ -68,7 +68,7 @@ helm upgrade -i -n <WORKER_NAMESPACE> clearml-agent clearml-enterprise/clearml-e
## Additional Configuration Options
To view all configurable options for the Helm chart, run the following command:
To view available configuration options for the Helm chart, run the following command:
```bash
helm show readme clearml-enterprise/clearml-enterprise-agent
@ -76,7 +76,7 @@ helm show readme clearml-enterprise/clearml-enterprise-agent
helm show values clearml-enterprise/clearml-enterprise-agent
```
### Set GPU Availability in Orchestration Dashboard
### Reporting GPU Availability to Orchestration Dashboard
To show GPU availability in the [Orchestration Dashboard](../../webapp/webapp_orchestration_dash.md), explicitly set the number of GPUs:
@ -88,25 +88,22 @@ agentk8sglue:
### Queues
The ClearML Agent monitors ClearML queues and pulls tasks that are scheduled for execution.
The ClearML Agent monitors [ClearML queues](../../fundamentals/agents_and_queues.md) and pulls tasks that are
scheduled for execution.
A single agent can monitor multiple queues. By default, the queues share a base pod template (`agentk8sglue.basePodTemplate`)
used when submitting a task to Kubernetes after it has been extracted from the queue.
A single agent can monitor multiple queues. By default, all queues share a base pod template (`agentk8sglue.basePodTemplate`)
used when launching tasks on Kubernetes after they have been pulled from the queue.
Each queue can be configured with dedicated Pod template spec override (`templateOverrides`). This way queue definitions
can be tailored to different use cases.
Each queue can override the base pod template with its own settings via a `templateOverrides` entry.
This way, queue definitions can be tailored to different use cases.
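For orientation, the following is a minimal sketch of this layout. The queue name, label, and memory values are illustrative only, and the exact field names should be verified with `helm show values`; the examples below elaborate on concrete use cases:

```yaml
agentk8sglue:
  # Defaults applied to every task pod, regardless of queue
  basePodTemplate:
    labels:
      team: red
    resources:
      limits:
        memory: 1Gi
  createQueues: true
  queues:
    # Illustrative queue that overrides only the label; the memory limit is inherited
    blue:
      templateOverrides:
        labels:
          team: blue
```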
The following are a few examples of agent queue templates:
#### Example: GPU Queues
To support GPU queues, you must deploy the NVIDIA GPU Operator on your Kubernetes cluster. For more information, see [GPU Operator](extra_configs/gpu_operator.md).
GPU queue support requires deploying the NVIDIA GPU Operator on your Kubernetes cluster.
For more information, see [GPU Operator](extra_configs/gpu_operator.md).
``` yaml
```yaml
agentk8sglue:
createQueues: true
queues:
@ -122,8 +119,9 @@ agentk8sglue:
nvidia.com/gpu: 2
```
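For reference, here is a minimal sketch of a complete GPU queue definition. The queue name is illustrative, and the exact nesting of `templateOverrides` and `resources` should be verified with `helm show values clearml-enterprise/clearml-enterprise-agent`:

```yaml
agentk8sglue:
  createQueues: true
  queues:
    # Illustrative queue name; tasks pulled from it request two full GPUs
    gpu-queue:
      templateOverrides:
        resources:
          limits:
            nvidia.com/gpu: 2
```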
#### Example: Overriding Pod Templates per Queue
#### Example: Custom Pod Template per Queue
This example demonstrates how to override the base pod template definitions on a per-queue basis.
In this example:
- The `red` queue inherits both the label `team=red` and the 1Gi memory limit from the `basePodTemplate` section.
@ -167,5 +165,5 @@ agentk8sglue:
## Next Steps
Once the agent is up and running, proceed with deploying the[ ClearML Enterprise App Gateway](appgw_install_k8s.md).
Once the agent is up and running, proceed with deploying the [ClearML Enterprise App Gateway](appgw_install_k8s.md).

View File

@ -2,18 +2,15 @@
title: Dynamically Edit Task Pod Template
---
The ClearML Enterprise Agent supports defining custom Python code to modify a task's Pod template before it is applied
to Kubernetes.
ClearML Agent allows you to inject custom Python code to dynamically modify the Kubernetes Pod template before applying it.
This enables dynamic customization of Task Pod manifests in the context of a ClearML Enterprise Agent, which is useful
for injecting values or changing configurations based on runtime context.
## Agent Configuration
The `CLEARML_K8S_GLUE_TEMPLATE_MODULE` environment variable defines the Python module and function inside that
module that the ClearML Enterprise Agent should invoke before applying a Task Pod template.
module to be invoked by the agent before applying a task pod template.
The Agent will run this code in its own context, pass arguments (including the actual template) to the function, and use
The agent will run this code in its own context, pass arguments (including the actual template) to the function, and use
the returned template to create the final Task Pod in Kubernetes.
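For example, the environment variable can be set through the agent's Helm values. The `extraEnvs` key and the module reference shown below are assumptions; check the chart README for the exact mechanism and value format:

```yaml
agentk8sglue:
  # Assumed key for injecting environment variables into the agent pod
  extraEnvs:
    - name: CLEARML_K8S_GLUE_TEMPLATE_MODULE
      # Hypothetical placeholder; the chart README documents the expected module/function reference
      value: "<CUSTOM_TEMPLATE_MODULE>"
```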
Arguments passed to the function include:
@ -60,13 +57,13 @@ agentk8sglue:
```
:::note notes
* Make sure to include `*args, **kwargs` at the end of the function's argument list and to only use keyword arguments.
* Always include `*args, **kwargs` at the end of the function's argument list and only use keyword arguments.
This is needed to maintain backward compatibility.
* Custom code modules can be included as a file in the pod's container, and the environment variable can be used to
point to the file and entry point.
* When defining a custom code module, by default the Agent will start watching pods in all namespaces
* When defining a custom code module, by default the agent will start watching pods in all namespaces
across the cluster. If you do not intend to give a `ClusterRole` permission, make sure to set the
`CLEARML_K8S_GLUE_MONITOR_ALL_NAMESPACES` env to `"0"` to prevent the agent from listing pods in all namespaces.
Set it to `"1"` only if namespace-related changes are needed in the code.
@ -80,13 +77,13 @@ agentk8sglue:
To customize the bash startup scripts instead of the pod spec, use:
```yaml
agentk8sglue:
  # -- Custom Bash script for the Agent pod run by the Glue Agent
  customBashScript: ""
  # -- Custom Bash script for the Task Pods run by the Glue Agent
  containerCustomBashScript: ""
```
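For instance, a hypothetical task-pod startup script could look like the following; the script content is purely illustrative:

```yaml
agentk8sglue:
  # Illustrative example: extra setup executed inside every Task Pod before the task starts
  containerCustomBashScript: |
    echo "Preparing task environment"
    export MY_EXTRA_FLAG=1
```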
## Examples
@ -167,7 +164,7 @@ agentk8sglue:
### Example: Bind PVC Resource to Task Pod
In this example, a PVC is created and attached to every Pod created from a dedicated queue, then deleted afterwards.
In this example, a PVC is created and attached to every pod created from a dedicated queue, then it is deleted.
Key points:

View File

@ -2,10 +2,12 @@
title: Basic Deployment - Suggested GPU Operator Values
---
This guide provides recommended configuration values for deploying the NVIDIA GPU Operator alongside ClearML Enterprise.
## Add the Helm Repo Locally
Add the NVIDIA GPU Operator Helm repository:
```bash
helm repo add nvidia https://nvidia.github.io/gpu-operator
```
@ -17,10 +19,8 @@ helm repo update
## Installation
To prevent unprivileged containers from bypassing the Kubernetes Device Plugin API, configure the GPU operator with the
following override values.
Create a `gpu-operator.override.yaml` file with the following content:
To prevent unprivileged containers from bypassing the Kubernetes Device Plugin API, configure the GPU operator
using the following `gpu-operator.override.yaml` file:
```yaml
toolkit:
@ -53,7 +53,7 @@ helm install -n gpu-operator gpu-operator nvidia/gpu-operator --create-namespace
## Fractional GPU Support
For support with fractional GPUs, refer to the dedicated guides:
To enable fractional GPU allocation or manage mixed GPU configurations, refer to the following guides:
* [ClearML Dynamic MIG Operator](../fractional_gpus/cdmo.md) (CDMO): Dynamically configures MIG GPUs on supported devices.
* [ClearML Enterprise Fractional GPU Injector](../fractional_gpus/cfgi.md) (CFGI): Enables fractional (non-MIG) GPU
allocation for better hardware utilization and workload distribution in Kubernetes.

View File

@ -2,7 +2,8 @@
title: Multi-Node Training
---
The ClearML Enterprise Agent supports horizontal multi-node training--running a single Task across multiple Pods on different nodes.
The ClearML Enterprise Agent supports horizontal multi-node training, allowing a single Task to run across multiple pods
on different nodes.
Below is a configuration example using `clearml-agent-values.override.yaml`:

View File

@ -7,16 +7,16 @@ users, enabling direct access to cloud-hosted data (e.g., S3) without exposing c
## Prerequisites
- A ClearML Enterprise server is up and running.
- Generate `<ACCESS_KEY>` and `<SECRET_KEY>` credentials in the ClearML Server. The easiest way is via the ClearML UI
(**Settings > Workspace > App Credentials > Create new credentials**).
- A ClearML Enterprise Server is up and running.
- API credentials (`<ACCESS_KEY>` and `<SECRET_KEY>`) generated via
the ClearML UI (**Settings > Workspace > API Credentials > Create new credentials**). For more information, see [ClearML API Credentials](../../../webapp/settings/webapp_settings_profile.md#clearml-api-credentials).
:::note
Make sure these credentials belong to an admin user or a service user with admin privileges.
:::
- The worker environment must be able to access the ClearML Server over the same network.
- Token to access `clearml-enterprise` Helm chart repo
## Installation
@ -50,7 +50,7 @@ ingress:
### Deploy the Helm Chart
Install the `clearml-presign-service` helm chart in the same namespace as the ClearML Enterprise server:
Install the `clearml-presign-service` Helm chart in the same namespace as the ClearML Enterprise server:
```bash
helm install -n clearml clearml-presign-service clearml-enterprise/clearml-presign-service -f presign-service.override.yaml

View File

@ -2,10 +2,8 @@
title: ClearML Tenant with Self Signed Certificates
---
This guide covers the configuration to support SSL Custom certificates for the following components:
- ClearML Enterprise AI Application Gateway
- ClearML Enterprise Agent
This guide covers how to configure the [AI Application Gateway](#ai-application-gateway) and [ClearML Agent](#clearml-agent)
to use self-signed or custom SSL certificates.
## AI Application Gateway
@ -25,15 +23,15 @@ customCertificates:
-----END CERTIFICATE-----
```
In this section, there are two options:
You have two configuration options:
- [**Replace**](#replace-the-whole-ca-certificatescrt-file) the entire `ca-certificates.crt` file
- [**Replace**](#replace-entire-ca-certificatescrt-file) the entire `ca-certificates.crt` file
- [**Append**](#append-extra-certificates-to-the-existing-ca-certificatescrt) extra certificates to the existing `ca-certificates.crt`
### Replace Entire `ca-certificates.crt` File
To replace the whole ca-bundle, you can attach a concatenation of all your valid CA in a `pem` format as
To replace the whole ca-bundle, provide a concatenated list of all trusted CA certificates in `pem` format as
they are stored in a standard `ca-certificates.crt`.
```yaml
@ -55,7 +53,7 @@ customCertificates:
### Append Extra Certificates to the Existing `ca-certificates.crt`
You can add certificates to the existing CA bundle. Ensure each certificate has a unique `alias`.
You can add certificates to the existing CA bundle. Each certificate must have a unique `alias`.
```yaml
# -- Custom certificates
@ -82,9 +80,9 @@ To apply the changes, run the update command:
helm upgrade -i <RELEASE_NAME> -n <WORKLOAD_NAMESPACE> clearml-enterprise/clearml-enterprise-app-gateway --version <CHART_VERSION> -f clearml-app-gateway-values.override.yaml
```
## ClearML Enterprise Agent
## ClearML Agent
For the Agent, configure certificates in the `clearml-agent-values.override.yaml` file:
For the ClearML Agent, configure certificates in the `clearml-agent-values.override.yaml` file:
```yaml
# -- Custom certificates
@ -100,17 +98,18 @@ customCertificates:
-----END CERTIFICATE-----
```
In the section, there are two options:
You have two configuration options:
- [**Replace**](#replace-the-whole-ca-certificatescrt-file) the entire `ca-certificates.crt` file
- [**Append**](#append-extra-certificates-to-the-existing-ca-certificatescrt) extra certificates to the existing `ca-certificates.crt`
- [**Replace**](#replace-entire-ca-certificatescrt-file-1) the entire `ca-certificates.crt` file
- [**Append**](#append-extra-certificates-to-the-existing-ca-certificatescrt-1) extra certificates to the existing `ca-certificates.crt`
### Replace Entire `ca-certificates.crt` File
If you need to replace the whole ca-bundle you can attach a concatenation of all your valid CA in a `pem` format like
To replace the whole ca-bundle, provide a concatenated list of all trusted CA certificates in `pem` format as
they are stored in a standard `ca-certificates.crt`.
```yaml
# -- Custom certificates
customCertificates:
@ -130,7 +129,7 @@ customCertificates:
### Append Extra Certificates to the Existing `ca-certificates.crt`
You can add certificates to the existing CA bundle. Ensure each certificate has a unique `alias`.
You can add certificates to the existing CA bundle. Each certificate must have a unique `alias`.
```yaml
# -- Custom certificates
@ -151,7 +150,7 @@ customCertificates:
### Add Certificates to Task Pods
If your workloads need access to these certificates (e.g., for HTTPS requests), configure the agent to inject them into Pods:
If your workloads need access to these certificates (e.g., for HTTPS requests), configure the agent to inject them into pods:
```yaml
agentk8sglue:
@ -195,7 +194,7 @@ Their names are usually prefixed with the Helm release name, so adjust according
### Apply Changes
Applying the changes by running the the update command:
Apply the changes by running the update command:
``` bash
helm upgrade -i -n <WORKER_NAMESPACE> clearml-agent clearml-enterprise/clearml-enterprise-agent --create-namespace -f clearml-agent-values.override.yaml

View File

@ -3,7 +3,8 @@ title: SSO (Identity Provider) Setup
---
ClearML Enterprise Server supports various Single Sign-On (SSO) identity providers.
SSO configuration is managed via environment variables in your `clearml-values.override.yaml` file and applied to the `apiserver` component.
SSO configuration is managed via environment variables in your `clearml-values.override.yaml` file and is applied to the
`apiserver` component.
The following are configuration examples for commonly used providers. Other supported systems include:
* Auth0
@ -11,7 +12,7 @@ The following are configuration examples for commonly used providers. Other supp
* Okta
* Azure AD
* Google
* and AWS Cognito
* AWS Cognito
## Auth0
@ -56,17 +57,17 @@ apiserver:
To map Keycloak groups into the ClearML user's SSO token:
1. Go to the **Client Scopes** tab.
1. Click on the first row `<clearml client>-dedicated`.
1. Click **Add Mapper > By configuration > Group membership**
1. In the dialog:
* select the **Name** "groups"
1. Click on the `<clearml client>-dedicated` scope.
1. Click **Add Mapper > By Configuration > Group Membership**
1. Configure the mapper:
* Select the **Name** "groups"
* Set **Token Claim Name** "groups"
* Uncheck the **Full group path**
* Save the mapper.
To verify:
1. Return to **Client Details > Client scope** tab.
1. Go to the Evaluate sub-tab and select a user who has any group memberships.
1. Go to **Generated ID token** and then to **Generated User Info**.
1Inspect that in both cases you can see the group's claim in the displayed user data.
1. Go to the **Client Details > Client scope** tab.
1. Go to the **Evaluate** sub-tab and select a user with any group memberships.
1. Go to **Generated ID Token** and then to **Generated User Info**.
1. Verify that in both cases the `groups` claim appears in the displayed user data.

View File

@ -2,14 +2,12 @@
title: ClearML Dynamic MIG Operator (CDMO)
---
The ClearML Dynamic MIG Operator (CDMO) enables dynamic MIG GPU configurations.
The ClearML Dynamic MIG Operator (CDMO) enables dynamic MIG (Multi-Instance GPU) configurations.
## Installation
### Requirements
* Install the official NVIDIA `gpu-operator` using Helm with one of the following configurations.
* Add and update the Nvidia Helm repo:
```bash
@ -46,7 +44,7 @@ The ClearML Dynamic MIG Operator (CDMO) enables dynamic MIG GPU configurations.
value: all
```
* Install the official NVIDIA `gpu-operator` using Helm with the previous configuration:
* Install the NVIDIA `gpu-operator` using Helm with the previous configuration:
```bash
helm install -n gpu-operator gpu-operator nvidia/gpu-operator --create-namespace -f gpu-operator.override.yaml
@ -54,33 +52,33 @@ The ClearML Dynamic MIG Operator (CDMO) enables dynamic MIG GPU configurations.
### Installing CDMO
* Create a `cdmo-values.override.yaml` file with the following content:
1. Create a `cdmo-values.override.yaml` file with the following content:
```yaml
imageCredentials:
password: "<CLEARML_DOCKERHUB_TOKEN>"
```
* Install the CDMO Helm Chart using the previous override file:
1. Install the CDMO Helm Chart using the previous override file:
```bash
helm install -n cdmo cdmo clearml-enterprise/clearml-dynamic-mig-operator --create-namespace -f cdmo-values.override.yaml
```
* Enable the NVIDIA MIG support on your cluster by running the following command on all nodes with a MIG-supported GPU
1. Enable NVIDIA MIG support on your cluster by running the following command on all nodes with a MIG-supported GPU
(run it for each GPU `<GPU_ID>` on the host):
```bash
nvidia-smi -mig 1
```
:::note notes
* The node reboot may be required if the command output indicates so.
:::note notes
* A node reboot may be required if the command output indicates so.
* For convenience, this command can be issued from inside the `nvidia-device-plugin-daemonset` pod running on the related node.
:::
* For convenience, this command can be run from within the `nvidia-device-plugin-daemonset` pod running on the related node.
:::
* Any MIG-enabled GPU node `<NODE_NAME>` from the last point must be labeled accordingly as follows:
1. Label each MIG-enabled GPU node `<NODE_NAME>` from the previous step:
```bash
kubectl label nodes <NODE_NAME> "cdmo.clear.ml/gpu-partitioning=mig"
@ -88,7 +86,7 @@ The ClearML Dynamic MIG Operator (CDMO) enables dynamic MIG GPU configurations.
## Disabling MIGs
To disable MIG, follow these steps:
To disable MIG mode and restore standard full-GPU access:
1. Ensure no running workflows are using GPUs on the target node(s).
@ -108,7 +106,7 @@ To disable MIG, follow these steps:
nvidia-smi -mig 0
```
4. Edit the `gpu-operator.override.yaml` file to have a standard configuration for full GPUs, and upgrade the `gpu-operator`:
4. Edit the `gpu-operator.override.yaml` file to restore full-GPU access, and upgrade the `gpu-operator`:
```yaml
toolkit:

View File

@ -1,7 +1,8 @@
---
title: Install CDMO and CFGI on the same Cluster
title: Install CDMO and CFGI on the Same Cluster
---
You can install both CDMO (ClearML Dynamic MIG Operator) and CFGI (ClearML Fractional GPU Injector) on a shared Kubernetes cluster.
In clusters with multiple nodes and varying GPU types, the `gpu-operator` can be used to manage different device configurations
and fractioning modes.
@ -11,7 +12,7 @@ The NVIDIA `gpu-operator` supports defining multiple configurations for the Devi
The following example YAML defines two configurations: "mig" and "ts" (time-slicing).
``` yaml
```yaml
migManager:
enabled: false
mig:
@ -69,24 +70,15 @@ devicePlugin:
## Applying Configuration to Nodes
To activate a configuration, label the Kubernetes node accordingly. After a node is labeled,
the NVIDIA `device-plugin` will automatically reload the new configuration.
Label each Kubernetes node accordingly to activate a specific GPU mode:
Example usage:
* Apply the `mig` (MIG mode) config:
``` bash
kubectl label node <NODE_NAME> nvidia.com/device-plugin.config=mig
```
|Mode| Label command|
|----|-----|
| `mig` | `kubectl label node <NODE_NAME> nvidia.com/device-plugin.config=mig` |
| `ts` (time slicing) | `kubectl label node <NODE_NAME> nvidia.com/device-plugin.config=ts` |
| Standard full-GPU access | `kubectl label node <NODE_NAME> nvidia.com/device-plugin.config=all-disabled` |
* Apply the `ts` (time slicing) config:
``` bash
kubectl label node <NODE_NAME> nvidia.com/device-plugin.config=ts
```
* Apply the `all-disabled` (standard full GPU access) config:
``` bash
kubectl label node <NODE_NAME> nvidia.com/device-plugin.config=all-disabled
```
After a node is labeled, the NVIDIA `device-plugin` will automatically reload the new configuration.
## Installing CDMO and CFGI
@ -97,22 +89,26 @@ and [CFGI](cfgi.md).
### Time Slicing
To switch between time-slicing and full GPU access, update the node label using the `--overwrite` flag:
To disable time-slicing and use full GPU access, update the node label using the `--overwrite` flag:
```bash
kubectl label node <NODE_NAME> nvidia.com/device-plugin.config=all-disabled --overwrite
```
### MIG
To disable MIG mode:
1. Ensure there are no more running workflows requesting any form of GPU on the node(s) before re-configuring it.
2. Remove the CDMO label from the target node(s) to disable the dynamic MIG reconfiguration.
1. Ensure there are no more running workflows requesting any form of GPU on the node(s).
2. Remove the CDMO label from the target node(s).
``` bash
```bash
kubectl label nodes <NODE_NAME> "cdmo.clear.ml/gpu-partitioning-"
```
3. Execute a shell in the `device-plugin-daemonset` Pod instance running on the target node(s) and execute the following commands:
3. Open a shell in the `device-plugin-daemonset` pod instance running on the target node(s) and run the following commands:
``` bash
```bash
nvidia-smi mig -dci
nvidia-smi mig -dgi
@ -120,8 +116,8 @@ To disable MIG mode:
nvidia-smi -mig 0
```
4. Relabel the target node to disable MIG:
4. Label the node to use standard (non-MIG) GPU mode:
``` bash
```bash
kubectl label node <NODE_NAME> nvidia.com/device-plugin.config=all-disabled --overwrite
```

View File

@ -2,34 +2,36 @@
title: ClearML Fractional GPU Injector (CFGI)
---
The **ClearML Enterprise Fractional GPU Injector** (CFGI) allows AI workloads to run on Kubernetes using non-MIG GPU
fractions, optimizing both hardware utilization and performance.
The **ClearML Enterprise Fractional GPU Injector** (CFGI) allows AI workloads to utilize fractional (non-MIG) GPU slices
on Kubernetes clusters, maximizing hardware efficiency and performance.
## Installation
### Add the Local ClearML Helm Repository
``` bash
```bash
helm repo add clearml-enterprise https://raw.githubusercontent.com/clearml/clearml-enterprise-helm-charts/gh-pages --username <GITHUB_TOKEN> --password <GITHUB_TOKEN>
helm repo update
```
### Requirements
* Install the official NVIDIA `gpu-operator` using Helm with one of the following configurations.
* The number of slices must be 8.
* Install the NVIDIA `gpu-operator` using Helm
* Set the number of GPU slices to 8
* Add and update the Nvidia Helm repo:
``` bash
```bash
helm repo add nvidia https://nvidia.github.io/gpu-operator
helm repo update
```
#### GPU Operator Configuration
* Credentials for the ClearML Enterprise DockerHub repository
##### For CFGI Version >= 1.3.0
### GPU Operator Configuration
1. Create a docker-registry secret named `clearml-dockerhub-access` in the `gpu-operator` Namespace, making sure to replace your `<CLEARML_DOCKERHUB_TOKEN>`:
#### For CFGI Version >= 1.3.0
1. Create a Docker Registry secret named `clearml-dockerhub-access` in the `gpu-operator` namespace. Make sure to replace `<CLEARML_DOCKERHUB_TOKEN>` with your token.
```bash
kubectl create secret -n gpu-operator docker-registry clearml-dockerhub-access \
@ -101,11 +103,11 @@ devicePlugin:
replicas: 8
```
##### For CFGI version < 1.3.0 (Legacy GPU Operator)
#### For CFGI version < 1.3.0 (Legacy)
Create a `gpu-operator.override.yaml` file:
``` yaml
```yaml
toolkit:
env:
- name: ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED
@ -144,26 +146,26 @@ devicePlugin:
replicas: 8
```
### Install
### Install GPU Operator and CFGI
Install the nvidia `gpu-operator` using the previously created `gpu-operator.override.yaml` override file:
1. Install the NVIDIA `gpu-operator` using the previously created `gpu-operator.override.yaml` file:
```bash
helm install -n gpu-operator gpu-operator nvidia/gpu-operator --create-namespace -f gpu-operator.override.yaml
```
```bash
helm install -n gpu-operator gpu-operator nvidia/gpu-operator --create-namespace -f gpu-operator.override.yaml
```
Create a `cfgi-values.override.yaml` file with the following content:
1. Create a `cfgi-values.override.yaml` file with the following content:
```yaml
imageCredentials:
```yaml
imageCredentials:
password: "<CLEARML_DOCKERHUB_TOKEN>"
```
```
Install the CFGI Helm Chart using the previous override file:
1. Install the CFGI Helm Chart using the previous override file:
```bash
helm install -n cfgi cfgi clearml-enterprise/clearml-fractional-gpu-injector --create-namespace -f cfgi-values.override.yaml
```
```bash
helm install -n cfgi cfgi clearml-enterprise/clearml-fractional-gpu-injector --create-namespace -f cfgi-values.override.yaml
```
## Usage
@ -187,9 +189,9 @@ Valid values for `"<GPU_FRACTION_VALUE>"` include:
* "0.875"
* Integer representation of GPUs such as `1.000`, `2`, `2.0`, etc.
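As a sketch, and assuming the injector acts on any pod carrying the label, a workload can request a GPU fraction by setting `clearml-injector/fraction` in its pod metadata; the pod name and image below are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fraction-demo                 # illustrative name
  labels:
    # Request half of a single GPU via the injector label
    clearml-injector/fraction: "0.500"
spec:
  containers:
    - name: workload
      image: nvidia/cuda:12.2.0-base-ubuntu22.04   # illustrative image
```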
### ClearML Enterprise Agent Configuration
### ClearML Agent Configuration
To run ClearML jobs that request specific GPU fractions, configure the queues in your `clearml-agent-values.override.yaml` file.
To run ClearML jobs with fractional GPU allocation, configure the queues accordingly in your `clearml-agent-values.override.yaml` file.
Each queue should include a `templateOverride` that sets the `clearml-injector/fraction` label, which determines the
fraction of a GPU to allocate (e.g., "0.500" for half a GPU).
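A minimal sketch of such a queue definition, assuming the same `templateOverrides` nesting used for queue definitions above (the queue name is illustrative):

```yaml
agentk8sglue:
  createQueues: true
  queues:
    # Illustrative queue that allocates half a GPU to each task it runs
    gpu-half:
      templateOverrides:
        labels:
          clearml-injector/fraction: "0.500"
```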
@ -259,16 +261,16 @@ agentk8sglue:
nvidia.com/gpu: 1
```
## Upgrading Chart
## Upgrading CFGI Chart
To upgrade to the latest version of this chart:
To upgrade to the latest chart version:
```bash
helm repo update
helm upgrade -n cfgi cfgi clearml-enterprise/clearml-fractional-gpu-injector
```
To apply changes to values on an existing installation:
To apply new values to an existing installation:
```bash
helm upgrade -n cfgi cfgi clearml-enterprise/clearml-fractional-gpu-injector -f cfgi-values.override.yaml