mirror of
https://github.com/clearml/clearml-docs
synced 2025-06-26 18:17:44 +00:00
---
title: ClearML Agent on Kubernetes
---

The ClearML Agent enables scheduling and executing distributed experiments on a Kubernetes cluster.

## Prerequisites

- A running [ClearML Enterprise Server](k8s.md)
- API credentials (`<ACCESS_KEY>` and `<SECRET_KEY>`) generated via
  the ClearML UI (**Settings > Workspace > API Credentials > Create new credentials**). For more information, see [ClearML API Credentials](../../webapp/settings/webapp_settings_profile.md#clearml-api-credentials).

  :::note
  Make sure these credentials belong to an admin user or a service user with admin privileges.
  :::

- The worker environment must be able to access the ClearML Server over the same network.
- A token to access the `clearml-enterprise` Helm chart repository

## Installation

### Add the Helm Repo Locally

Add the ClearML Helm repository:

```bash
helm repo add clearml-enterprise https://raw.githubusercontent.com/clearml/clearml-enterprise-helm-charts/gh-pages --username <HELM_REPO_TOKEN> --password <HELM_REPO_TOKEN>
```

Update the repository locally:

```bash
helm repo update
```

### Create a Values Override File

Create a `clearml-agent-values.override.yaml` file with the following content:

:::note
Replace `<ACCESS_KEY>` and `<SECRET_KEY>` with the API credentials you generated earlier.
Set the `<api|file|web>ServerUrlReference` fields to match your ClearML Server URLs.
:::

```yaml
imageCredentials:
  password: "<CLEARML_DOCKERHUB_TOKEN>"
clearml:
  agentk8sglueKey: "<ACCESS_KEY>"
  agentk8sglueSecret: "<SECRET_KEY>"
agentk8sglue:
  apiServerUrlReference: "<CLEARML_API_SERVER_REFERENCE>"
  fileServerUrlReference: "<CLEARML_FILE_SERVER_REFERENCE>"
  webServerUrlReference: "<CLEARML_WEB_SERVER_REFERENCE>"
  createQueues: true
  queues:
    exampleQueue:
      templateOverrides: {}
      queueSettings: {}
```

### Install the Chart

Install the ClearML Enterprise Agent Helm chart:

```bash
helm upgrade -i -n <WORKER_NAMESPACE> clearml-agent clearml-enterprise/clearml-enterprise-agent --create-namespace -f clearml-agent-values.override.yaml
```

## Additional Configuration Options

To view the available configuration options for the Helm chart, run one of the following commands:

```bash
helm show readme clearml-enterprise/clearml-enterprise-agent
# or
helm show values clearml-enterprise/clearml-enterprise-agent
```

### Reporting GPU Availability to Orchestration Dashboard

To show GPU availability in the [Orchestration Dashboard](../../webapp/webapp_orchestration_dash.md), explicitly set the number of GPUs:

```yaml
agentk8sglue:
  # -- Maximum number of available GPUs the agent reports to the Dashboard. This will report 2 GPUs.
  dashboardReportMaxGpu: 2
```

### Queues

The ClearML Agent monitors [ClearML queues](../../fundamentals/agents_and_queues.md) and pulls tasks that are
scheduled for execution.

A single agent can monitor multiple queues. By default, all queues share a base pod template (`agentk8sglue.basePodTemplate`),
which is used when launching a task on Kubernetes after it has been pulled from its queue.

Each queue can override the base pod template with its own settings using a `templateOverrides` queue template.
This way, queue definitions can be tailored to different use cases.

The following are a few examples of agent queue templates:

#### Example: GPU Queues

To support GPU queues, you must deploy the NVIDIA GPU Operator on your Kubernetes cluster. For more information, see [GPU Operator](extra_configs/gpu_operator.md).

```yaml
agentk8sglue:
  createQueues: true
  queues:
    1xGPU:
      templateOverrides:
        resources:
          limits:
            nvidia.com/gpu: 1
    2xGPU:
      templateOverrides:
        resources:
          limits:
            nvidia.com/gpu: 2
```

#### Example: Custom Pod Template per Queue

This example demonstrates how to override the base pod template definitions on a per-queue basis.
In this example:

- The `red` queue inherits both the label `team=red` and the 1Gi memory limit from the `basePodTemplate` section.
- The `blue` queue overrides the label by setting it to `team=blue`, and inherits the 1Gi memory limit from the `basePodTemplate` section.
- The `green` queue overrides the label by setting it to `team=green`, and overrides the memory limit by setting it to 2Gi.
  It also sets an annotation and an environment variable.

```yaml
agentk8sglue:
  # Defines common template
  basePodTemplate:
    labels:
      team: red
    resources:
      limits:
        memory: 1Gi
  createQueues: true
  queues:
    red:
      # Does not override
      templateOverrides: {}
    blue:
      # Overrides labels
      templateOverrides:
        labels:
          team: blue
    green:
      # Overrides labels and resources, and sets new fields
      templateOverrides:
        labels:
          team: green
        annotations:
          example: "example value"
        resources:
          limits:
            memory: 2Gi
        env:
          - name: MY_ENV
            value: "my_value"
```

## Next Steps

Once the agent is up and running, proceed with deploying the [ClearML Enterprise App Gateway](appgw_install_k8s.md).
---
title: Dynamically Edit Task Pod Template
---

ClearML Agent allows you to inject custom Python code to dynamically modify the Kubernetes Pod template before applying it.

## Agent Configuration

The `CLEARML_K8S_GLUE_TEMPLATE_MODULE` environment variable defines the Python module, and the function inside that
module, to be invoked by the agent before applying a task pod template.

The agent will run this code in its own context, pass arguments (including the actual template) to the function, and use
the returned template to create the final Task Pod in Kubernetes.

Arguments passed to the function include:

* `queue` (string) - ID of the queue from which the task was pulled.
* `queue_name` (string) - Name of the queue from which the task was pulled.
* `template` (Python dictionary) - Base Pod template created from the agent's configuration and any queue-specific overrides.
* `task_data` (object) - Task data object (as returned by the `tasks.get_by_id` API call). For example, use `task_data.project` to get the task's project ID.
* `providers_info` (dictionary) - Provider info containing optional information collected for the user running this task
  when the user logged into the system (requires additional server configuration).
* `task_config` (`clearml_agent.backend_config.Config` object) - Task configuration containing configuration vaults applicable
  to the user running this task, and other configuration. Use `task_config.get("...")` to get specific configuration values.
* `worker` - The agent's Python object, in case custom calls are required.

### Usage

Update `clearml-agent-values.override.yaml` to include:

```yaml
agentk8sglue:
  extraEnvs:
    - name: CLEARML_K8S_GLUE_TEMPLATE_MODULE
      value: "custom_code:update_template"
  fileMounts:
    - name: "custom_code.py"
      folderPath: "/root"
      fileContent: |-
        import json
        from pprint import pformat

        def update_template(queue, task_data, providers_info, template, task_config, worker, queue_name, *args, **kwargs):
            print(pformat(template))

            my_var_name = "foo"
            my_var_value = "bar"

            try:
                template["spec"]["containers"][0]["env"][0]["name"] = str(my_var_name)
                template["spec"]["containers"][0]["env"][0]["value"] = str(my_var_value)
            except KeyError as ex:
                raise Exception("Failed modifying template: {}".format(ex))

            return {"template": template}
```

:::note notes
* Always include `*args, **kwargs` at the end of the function's argument list, and only use keyword arguments.
  This is needed to maintain backward compatibility.

* Custom code modules can be included as a file in the pod's container, and the environment variable can be used to
  point to the file and entry point.

* When defining a custom code module, by default the agent will start watching pods in all namespaces
  across the cluster. If you do not intend to grant a `ClusterRole` permission, make sure to set the
  `CLEARML_K8S_GLUE_MONITOR_ALL_NAMESPACES` env var to `"0"` to prevent the agent from trying to list pods in all namespaces.
  Set it to `"1"` only if namespace-related changes are needed in the code.

  ```yaml
  agentk8sglue:
    extraEnvs:
      - name: CLEARML_K8S_GLUE_MONITOR_ALL_NAMESPACES
        value: "0"
  ```
:::

To customize the bash startup scripts instead of the pod spec, use:

```yaml
agentk8sglue:
  # -- Custom Bash script for the Agent pod run by the Glue Agent
  customBashScript: ""
  # -- Custom Bash script for the Task Pods run by the Glue Agent
  containerCustomBashScript: ""
```

## Examples

### Example: Edit Template Based on ENV Var

```yaml
agentk8sglue:
  extraEnvs:
    - name: CLEARML_K8S_GLUE_TEMPLATE_MODULE
      value: "custom_code:update_template"
  fileMounts:
    - name: "custom_code.py"
      folderPath: "/root"
      fileContent: |-
        import json
        from pprint import pformat

        def update_template(queue, task_data, providers_info, template, task_config, worker, queue_name, *args, **kwargs):
            print(pformat(template))

            my_var = "some_var"

            try:
                template["spec"]["initContainers"][0]["command"][-1] = \
                    template["spec"]["initContainers"][0]["command"][-1].replace("MY_VAR", str(my_var))
                template["spec"]["containers"][0]["volumeMounts"][0]["subPath"] = str(my_var)
            except KeyError as ex:
                raise Exception("Failed modifying template with MY_VAR: {}".format(ex))

            return {"template": template}
  basePodTemplate:
    initContainers:
      - name: my-init-container
        image: docker/ubuntu:18.04
        command:
          - /bin/bash
          - -c
          - >
            echo MY_VAR;
    volumeMounts:
      - name: my-templated-mount
        mountPath: MY_VAR
    volumes:
      - name: my-templated-mount
        emptyDir: {}
```

### Example: Inject NFS Mount Path

```yaml
agentk8sglue:
  extraEnvs:
    - name: CLEARML_K8S_GLUE_TEMPLATE_MODULE
      value: "custom_code:update_template"
  fileMounts:
    - name: "custom_code.py"
      folderPath: "/root"
      fileContent: |-
        import json
        from pprint import pformat

        def update_template(queue, task_data, providers_info, template, task_config, worker, queue_name, *args, **kwargs):
            nfs = task_config.get("nfs")
            # ad_role = providers_info.get("ad-role")
            if nfs:  # and ad_role == "some-value"
                print(pformat(template))

                try:
                    template["spec"]["containers"][0]["volumeMounts"].append(
                        {"name": "custom-mount", "mountPath": nfs.get("mountPath")}
                    )
                    # Volumes are defined at the Pod spec level
                    template["spec"]["volumes"].append(
                        {"name": "custom-mount", "nfs": {"server": nfs.get("server.ip"), "path": nfs.get("server.path")}}
                    )
                except KeyError as ex:
                    raise Exception("Failed modifying template: {}".format(ex))

            return {"template": template}
```

### Example: Bind PVC Resource to Task Pod

In this example, a PVC is created and attached to every pod created from a dedicated queue, and deleted afterwards.

Key points:

- The `CLEARML_K8S_GLUE_POD_PRE_APPLY_CMD` and `CLEARML_K8S_GLUE_POD_POST_DELETE_CMD` env vars let you define custom bash
  code hooks to be executed around the main apply command by the agent, such as creating and deleting a PVC object.

- The `CLEARML_K8S_GLUE_TEMPLATE_MODULE` env var and a file mount let you define custom Python code in a specific context,
  useful to dynamically update the main Pod template before the agent applies it.

:::note notes
* This example uses a queue named `pvc-test`; make sure to replace all occurrences of it.

* `CLEARML_K8S_GLUE_POD_PRE_APPLY_CMD` can reference templated variables such as `{queue_name}`, `{pod_name}`, and
  `{namespace}`, which are replaced with the actual values by the agent at execution time.
:::

```yaml
agentk8sglue:
  # Bind a pre-defined custom 'custom-agent-role' Role with the ability to handle 'persistentvolumeclaims'
  additionalRoleBindings:
    - custom-agent-role
  extraEnvs:
    # Required unless the agent has permissions to list pods in all namespaces
    - name: CLEARML_K8S_GLUE_MONITOR_ALL_NAMESPACES
      value: "0"
    # Executed before applying the Task Pod. Replaces the $PVC_NAME placeholder in the manifest template with the Pod name and applies it, only for a specific queue.
    - name: CLEARML_K8S_GLUE_POD_PRE_APPLY_CMD
      value: "[[ {queue_name} == 'pvc-test' ]] && sed 's/\\$PVC_NAME/{pod_name}/g' /mnt/yaml-manifests/pvc.yaml | kubectl apply -n {namespace} -f - || echo 'Skipping PRE_APPLY PVC creation...'"
    # Executed after deleting the Task Pod. Deletes the PVC.
    - name: CLEARML_K8S_GLUE_POD_POST_DELETE_CMD
      value: "kubectl delete pvc {pod_name} -n {namespace} || echo 'Skipping POST_DELETE PVC deletion...'"
    # Define a custom code module for updating the Pod template
    - name: CLEARML_K8S_GLUE_TEMPLATE_MODULE
      value: "custom_code:update_template"
  fileMounts:
    # Mount a PVC manifest file with a templated $PVC_NAME name
    - name: "pvc.yaml"
      folderPath: "/mnt/yaml-manifests"
      fileContent: |
        apiVersion: v1
        kind: PersistentVolumeClaim
        metadata:
          name: $PVC_NAME
        spec:
          resources:
            requests:
              storage: 5Gi
          volumeMode: Filesystem
          storageClassName: standard
          accessModes:
            - ReadWriteOnce
    # Custom code module for updating the Pod template
    - name: "custom_code.py"
      folderPath: "/root"
      fileContent: |-
        import json
        from pprint import pformat

        def update_template(queue, task_data, providers_info, template, task_config, worker, queue_name, *args, **kwargs):
            if queue_name == "pvc-test":
                # Set PVC_NAME to the name of the Pod
                PVC_NAME = f"clearml-id-{task_data.id}"
                try:
                    # Replace the claimName placeholder with a dynamic value
                    template["spec"]["volumes"][0]["persistentVolumeClaim"]["claimName"] = str(PVC_NAME)
                except KeyError as ex:
                    raise Exception("Failed modifying template with PVC_NAME: {}".format(ex))
            # Return the edited template
            return {"template": template}
  createQueues: true
  queues:
    # Define a queue with an override `volumes` and `volumeMounts` section for binding a PVC
    pvc-test:
      templateOverrides:
        volumes:
          - name: task-pvc
            persistentVolumeClaim:
              # PVC_NAME placeholder. This will get replaced in the custom code module.
              claimName: PVC_NAME
        volumeMounts:
          - mountPath: "/tmp/task/"
            name: task-pvc
```

### Example: Required Role

The following is an example of a `custom-agent-role` Role with permissions to handle `persistentvolumeclaims`:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: custom-agent-role
rules:
  - apiGroups:
      - ""
    resources:
      - persistentvolumeclaims
    verbs:
      - get
      - list
      - watch
      - create
      - patch
      - delete
```
---
title: Basic Deployment - Suggested GPU Operator Values
---

This guide provides recommended configuration values for deploying the NVIDIA GPU Operator alongside ClearML Enterprise.

## Add the Helm Repo Locally

Add the NVIDIA GPU Operator Helm repository:

```bash
helm repo add nvidia https://nvidia.github.io/gpu-operator
```

Update the repository locally:

```bash
helm repo update
```

## Installation

To prevent unprivileged containers from bypassing the Kubernetes Device Plugin API, configure the GPU Operator
using the following `gpu-operator.override.yaml` file:

```yaml
toolkit:
  env:
    - name: ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED
      value: "false"
    - name: ACCEPT_NVIDIA_VISIBLE_DEVICES_AS_VOLUME_MOUNTS
      value: "true"
devicePlugin:
  env:
    - name: PASS_DEVICE_SPECS
      value: "true"
    - name: FAIL_ON_INIT_ERROR
      value: "true"
    - name: DEVICE_LIST_STRATEGY # Use volume-mounts
      value: volume-mounts
    - name: DEVICE_ID_STRATEGY
      value: uuid
    - name: NVIDIA_VISIBLE_DEVICES
      value: all
    - name: NVIDIA_DRIVER_CAPABILITIES
      value: all
```

Install the `gpu-operator`:

```bash
helm install -n gpu-operator gpu-operator nvidia/gpu-operator --create-namespace -f gpu-operator.override.yaml
```

## Fractional GPU Support

To enable fractional GPU allocation or manage mixed GPU configurations, refer to the following guides:

* [ClearML Dynamic MIG Operator](../fractional_gpus/cdmo.md) (CDMO) – Dynamically configures MIG GPUs on supported devices.
* [ClearML Enterprise Fractional GPU Injector](../fractional_gpus/cfgi.md) (CFGI) – Enables fractional (non-MIG) GPU
  allocation for better hardware utilization and workload distribution in Kubernetes.
* [CDMO and CFGI on the same Cluster](../fractional_gpus/cdmo_cfgi_same_cluster.md) – In clusters with multiple nodes and
  varying GPU types, the `gpu-operator` can be used to manage different device configurations and fractioning modes.
---
title: Multi-Node Training
---

The ClearML Enterprise Agent supports horizontal multi-node training, allowing a single Task to run across multiple pods
on different nodes.

Below is a configuration example using `clearml-agent-values.override.yaml`:

```yaml
agentk8sglue:
  # Cluster access is required to run multi-node Tasks
  serviceAccountClusterAccess: true
  multiNode:
    enabled: true
  createQueues: true
  queues:
    multi-node-example:
      queueSettings:
        # Defines the distribution of GPU Tasks across multiple nodes. The format [x, y, ...]
        # specifies the distribution of Tasks as 'x' GPUs on one node and 'y' GPUs on another node.
        # Multiple Pods will be spawned based on the lowest common denominator defined.
        multiNode: [ 4, 2 ]
      templateOverrides:
        resources:
          limits:
            # Note: use the lowest common denominator of the GPU distribution defined in `queueSettings.multiNode`.
            nvidia.com/gpu: 2
```
---
title: ClearML Presign Service
---

The ClearML Presign Service is a secure service that generates and redirects pre-signed storage URLs for authenticated
users, enabling direct access to cloud-hosted data (e.g., S3) without exposing credentials.

## Prerequisites

- A running ClearML Enterprise Server
- API credentials (`<ACCESS_KEY>` and `<SECRET_KEY>`) generated via
  the ClearML UI (**Settings > Workspace > API Credentials > Create new credentials**). For more information, see [ClearML API Credentials](../../../webapp/settings/webapp_settings_profile.md#clearml-api-credentials).

  :::note
  Make sure these credentials belong to an admin user or a service user with admin privileges.
  :::

- The worker environment must be able to access the ClearML Server over the same network.
- A token to access the `clearml-enterprise` Helm chart repository

## Installation

### Add the Helm Repo Locally

Add the ClearML Helm repository:

```bash
helm repo add clearml-enterprise https://raw.githubusercontent.com/clearml/clearml-enterprise-helm-charts/gh-pages --username <HELM_REPO_TOKEN> --password <HELM_REPO_TOKEN>
```

Update the repository locally:

```bash
helm repo update
```

### Prepare Configuration

Create a `presign-service.override.yaml` file (make sure to replace the placeholders):

```yaml
imageCredentials:
  password: "<CLEARML_DOCKERHUB_TOKEN>"
clearml:
  apiServerUrlReference: "<CLEARML_API_SERVER_URL>"
  apiKey: "<ACCESS_KEY>"
  apiSecret: "<SECRET_KEY>"
ingress:
  enabled: true
  hostName: "<PRESIGN_SERVICE_URL>"
```

### Deploy the Helm Chart

Install the `clearml-presign-service` Helm chart in the same namespace as the ClearML Enterprise server:

```bash
helm install -n clearml clearml-presign-service clearml-enterprise/clearml-presign-service -f presign-service.override.yaml
```

### Update ClearML Enterprise Server Configuration

Enable the ClearML Server to use the Presign Service by editing your `clearml-values.override.yaml` file.
Add the following to the `apiserver.extraEnvs` section (make sure to replace `<PRESIGN_SERVICE_URL>`):

```yaml
apiserver:
  extraEnvs:
    - name: CLEARML__SERVICES__SYSTEM__COMPANY__DEFAULT__SERVICES
      value: "[{\"type\":\"presign\",\"url\":\"https://<PRESIGN_SERVICE_URL>\",\"use_fallback\":\"false\",\"match_sets\":[{\"rules\":[{\"field\":\"\",\"obj_type\":\"\",\"regex\":\"^s3://\"}]}]}]"
```

Apply the changes with a Helm upgrade.

---
title: ClearML Tenant with Self-Signed Certificates
---

This guide covers how to configure the [AI Application Gateway](#ai-application-gateway) and [ClearML Agent](#clearml-agent)
to use self-signed or custom SSL certificates.

## AI Application Gateway

To configure certificates for the Application Gateway, update your `clearml-app-gateway-values.override.yaml` file:

```yaml
# -- Custom certificates
customCertificates:
  # -- Override the system crt certificate bundle. Mutually exclusive with extraCerts.
  overrideCaCertificatesCrt:
  # -- Extra certificates to add to the standard bundle. Requires root permissions to run update-ca-certificates. Mutually exclusive with overrideCaCertificatesCrt.
  extraCerts:
    - alias: certificateName
      pem: |
        -----BEGIN CERTIFICATE-----
        ###
        -----END CERTIFICATE-----
```

You have two configuration options:

- [**Replace**](#replace-entire-ca-certificatescrt-file) the entire `ca-certificates.crt` file
- [**Append**](#append-extra-certificates-to-the-existing-ca-certificatescrt) extra certificates to the existing `ca-certificates.crt`

### Replace Entire `ca-certificates.crt` File

To replace the whole CA bundle, provide a concatenated list of all trusted CA certificates in `pem` format, as
they are stored in a standard `ca-certificates.crt`:

```yaml
# -- Custom certificates
customCertificates:
  # -- Override the system crt certificate bundle. Mutually exclusive with extraCerts.
  overrideCaCertificatesCrt: |
    -----BEGIN CERTIFICATE-----
    ### CERT 1
    -----END CERTIFICATE-----
    -----BEGIN CERTIFICATE-----
    ### CERT 2
    -----END CERTIFICATE-----
    -----BEGIN CERTIFICATE-----
    ### CERT 3
    -----END CERTIFICATE-----
    ...
```

### Append Extra Certificates to the Existing `ca-certificates.crt`

You can add certificates to the existing CA bundle. Each certificate must have a unique `alias`.

```yaml
# -- Custom certificates
customCertificates:
  # -- Extra certificates to add to the standard bundle. Requires root permissions to run update-ca-certificates. Mutually exclusive with overrideCaCertificatesCrt.
  extraCerts:
    - alias: certificate-name-1
      pem: |
        -----BEGIN CERTIFICATE-----
        ###
        -----END CERTIFICATE-----
    - alias: certificate-name-2
      pem: |
        -----BEGIN CERTIFICATE-----
        ###
        -----END CERTIFICATE-----
```

### Apply Changes

To apply the changes, run the update command:

```bash
helm upgrade -i <RELEASE_NAME> -n <WORKLOAD_NAMESPACE> clearml-enterprise/clearml-enterprise-app-gateway --version <CHART_VERSION> -f clearml-app-gateway-values.override.yaml
```

## ClearML Agent

For the ClearML Agent, configure certificates in the `clearml-agent-values.override.yaml` file:

```yaml
# -- Custom certificates
customCertificates:
  # -- Override the system crt certificate bundle. Mutually exclusive with extraCerts.
  overrideCaCertificatesCrt:
  # -- Extra certificates to add to the standard bundle. Requires root permissions to run update-ca-certificates. Mutually exclusive with overrideCaCertificatesCrt.
  extraCerts:
    - alias: certificateName
      pem: |
        -----BEGIN CERTIFICATE-----
        ###
        -----END CERTIFICATE-----
```

You have two configuration options:

- [**Replace**](#replace-entire-ca-certificatescrt-file-1) the entire `ca-certificates.crt` file
- [**Append**](#append-extra-certificates-to-the-existing-ca-certificatescrt-1) extra certificates to the existing `ca-certificates.crt`

### Replace Entire `ca-certificates.crt` File

To replace the whole CA bundle, provide a concatenated list of all trusted CA certificates in `pem` format, as
they are stored in a standard `ca-certificates.crt`:

```yaml
# -- Custom certificates
customCertificates:
  # -- Override the system crt certificate bundle. Mutually exclusive with extraCerts.
  overrideCaCertificatesCrt: |
    -----BEGIN CERTIFICATE-----
    ### CERT 1
    -----END CERTIFICATE-----
    -----BEGIN CERTIFICATE-----
    ### CERT 2
    -----END CERTIFICATE-----
    -----BEGIN CERTIFICATE-----
    ### CERT 3
    -----END CERTIFICATE-----
    ...
```

### Append Extra Certificates to the Existing `ca-certificates.crt`

You can add certificates to the existing CA bundle. Each certificate must have a unique `alias`.

```yaml
# -- Custom certificates
customCertificates:
  # -- Extra certificates to add to the standard bundle. Requires root permissions to run update-ca-certificates. Mutually exclusive with overrideCaCertificatesCrt.
  extraCerts:
    - alias: certificate-name-1
      pem: |
        -----BEGIN CERTIFICATE-----
        ###
        -----END CERTIFICATE-----
    - alias: certificate-name-2
      pem: |
        -----BEGIN CERTIFICATE-----
        ###
        -----END CERTIFICATE-----
```

### Add Certificates to Task Pods

If your workloads need access to these certificates (e.g., for HTTPS requests), configure the agent to inject them into pods:

```yaml
agentk8sglue:
  basePodTemplate:
    initContainers:
      - command:
          - /bin/sh
          - -c
          - update-ca-certificates
        image: allegroai/clearml-enterprise-agent-k8s-base:<AGENT-VERSION-AVAILABLE-ON-REPO>
        imagePullPolicy: IfNotPresent
        name: init-task
        volumeMounts:
          - name: etc-ssl-certs
            mountPath: "/etc/ssl/certs"
          - name: clearml-extra-ca-certs
            mountPath: "/usr/local/share/ca-certificates"
    env:
      - name: REQUESTS_CA_BUNDLE
        value: /etc/ssl/certs/ca-certificates.crt
    volumeMounts:
      - name: etc-ssl-certs
        mountPath: "/etc/ssl/certs"
    volumes:
      - name: etc-ssl-certs
        emptyDir: {}
      - name: clearml-extra-ca-certs
        projected:
          defaultMode: 420
          sources:
            # List here the ConfigMaps created by the agent chart; their number depends on how many certificates were provided.
            - configMap:
                name: clearml-agent-clearml-enterprise-agent-custom-ca-0
            - configMap:
                name: clearml-agent-clearml-enterprise-agent-custom-ca-1
```

The `clearml-extra-ca-certs` volume must include all `ConfigMap` resources generated by the agent for the custom certificates.
These `ConfigMaps` are automatically created by the Helm chart based on the number of certificates provided.
Their names are usually prefixed with the Helm release name, so adjust accordingly if you used a custom release name.

### Apply Changes

Apply the changes by running the update command:

```bash
helm upgrade -i -n <WORKER_NAMESPACE> clearml-agent clearml-enterprise/clearml-enterprise-agent --create-namespace -f clearml-agent-values.override.yaml
```
---
|
||||
title: SSO (Identity Provider) Setup
|
||||
---
|
||||
|
||||
ClearML Enterprise Server supports various Single Sign-On (SSO) identity providers.
|
||||
SSO configuration is managed via environment variables in your `clearml-values.override.yaml` file and is applied to the
|
||||
`apiserver` component.
|
||||
|
||||
The following are configuration examples for commonly used providers. Other supported systems include:
|
||||
* Auth0
|
||||
* Keycloak
|
||||
* Okta
|
||||
* Azure AD
|
||||
* Google
|
||||
* AWS Cognito
|
||||
|
||||
## Auth0

```yaml
apiserver:
  extraEnvs:
    - name: CLEARML__secure__login__sso__oauth_client__auth0__client_id
      value: "<AUTH0_CLIENT_ID>"
    - name: CLEARML__secure__login__sso__oauth_client__auth0__client_secret
      value: "<AUTH0_CLIENT_SECRET>"
    - name: CLEARML__services__login__sso__oauth_client__auth0__base_url
      value: "<AUTH0_BASE_URL>"
    - name: CLEARML__services__login__sso__oauth_client__auth0__authorize_url
      value: "<AUTH0_AUTHORIZE_URL>"
    - name: CLEARML__services__login__sso__oauth_client__auth0__access_token_url
      value: "<AUTH0_ACCESS_TOKEN_URL>"
    - name: CLEARML__services__login__sso__oauth_client__auth0__audience
      value: "<AUTH0_AUDIENCE>"
```
## Keycloak

```yaml
apiserver:
  extraEnvs:
    - name: CLEARML__secure__login__sso__oauth_client__keycloak__client_id
      value: "<KC_CLIENT_ID>"
    - name: CLEARML__secure__login__sso__oauth_client__keycloak__client_secret
      value: "<KC_SECRET_ID>"
    - name: CLEARML__services__login__sso__oauth_client__keycloak__base_url
      value: "<KC_URL>/realms/<REALM_NAME>/"
    - name: CLEARML__services__login__sso__oauth_client__keycloak__authorize_url
      value: "<KC_URL>/realms/<REALM_NAME>/protocol/openid-connect/auth"
    - name: CLEARML__services__login__sso__oauth_client__keycloak__access_token_url
      value: "<KC_URL>/realms/<REALM_NAME>/protocol/openid-connect/token"
    - name: CLEARML__services__login__sso__oauth_client__keycloak__idp_logout
      value: "true"
```
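
The variable names above follow ClearML's environment-variable override convention: take the configuration path, join its segments with double underscores, and prefix the result with `CLEARML`. A minimal sketch of that mapping (the helper function is ours, for illustration only):

```python
def clearml_env_var(config_path: str) -> str:
    """Map a dotted ClearML configuration path to the override
    environment-variable form used above: path segments joined by
    double underscores, prefixed with CLEARML."""
    return "CLEARML__" + config_path.replace(".", "__")

# The Keycloak client_id setting shown above:
print(clearml_env_var("secure.login.sso.oauth_client.keycloak.client_id"))
# CLEARML__secure__login__sso__oauth_client__keycloak__client_id
```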

## Group Membership Mapping in Keycloak

To map Keycloak groups into the ClearML user's SSO token:

1. Go to the **Client Scopes** tab.
1. Click the `<clearml client>-dedicated` scope.
1. Click **Add Mapper > By Configuration > Group Membership**.
1. Configure the mapper:
   * Set **Name** to "groups".
   * Set **Token Claim Name** to "groups".
   * Uncheck **Full group path**.
   * Save the mapper.

To verify:

1. Go to the **Client Details > Client scope** tab.
1. Go to the **Evaluate** sub-tab and select a user with group memberships.
1. Check **Generated ID Token** and **Generated User Info**.
1. Verify that the groups claim appears in the displayed user data in both cases.
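
Outside of Keycloak's **Evaluate** tab, you can also inspect a claim by decoding an ID token's payload directly: a JWT is three base64url segments separated by dots, and the middle one holds the claims. A short sketch (the token below is fabricated for illustration, and no signature verification is performed):

```python
import base64
import json

def jwt_claims(token: str) -> dict:
    """Decode the payload (second segment) of a JWT without verifying
    the signature -- enough to check for a 'groups' claim."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))

# Fabricated token: placeholder header/signature, payload carrying a groups claim.
payload = base64.urlsafe_b64encode(
    json.dumps({"sub": "jane", "groups": ["ml-team", "admins"]}).encode()
).rstrip(b"=").decode()
token = f"eyJhbGciOiJSUzI1NiJ9.{payload}.sig"

print(jwt_claims(token)["groups"])  # ['ml-team', 'admins']
```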
---
title: ClearML Dynamic MIG Operator (CDMO)
---

The ClearML Dynamic MIG Operator (CDMO) enables dynamic MIG (Multi-Instance GPU) configurations.

## Installation

### Requirements

* Add and update the NVIDIA Helm repo:

  ```bash
  helm repo add nvidia https://nvidia.github.io/gpu-operator
  helm repo update
  ```
* Create a `gpu-operator.override.yaml` file with the following content:

  ```yaml
  migManager:
    enabled: false
  mig:
    strategy: mixed
  toolkit:
    env:
      - name: ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED
        value: "false"
      - name: ACCEPT_NVIDIA_VISIBLE_DEVICES_AS_VOLUME_MOUNTS
        value: "true"
  devicePlugin:
    env:
      - name: PASS_DEVICE_SPECS
        value: "true"
      - name: FAIL_ON_INIT_ERROR
        value: "true"
      - name: DEVICE_LIST_STRATEGY # Use volume-mounts
        value: volume-mounts
      - name: DEVICE_ID_STRATEGY
        value: uuid
      - name: NVIDIA_VISIBLE_DEVICES
        value: all
      - name: NVIDIA_DRIVER_CAPABILITIES
        value: all
  ```
* Install the NVIDIA `gpu-operator` using Helm with the above configuration:

  ```bash
  helm install -n gpu-operator gpu-operator nvidia/gpu-operator --create-namespace -f gpu-operator.override.yaml
  ```
### Installing CDMO

1. Create a `cdmo-values.override.yaml` file with the following content:

   ```yaml
   imageCredentials:
     password: "<CLEARML_DOCKERHUB_TOKEN>"
   ```

1. Install the CDMO Helm chart using the override file:

   ```bash
   helm install -n cdmo cdmo clearml-enterprise/clearml-dynamic-mig-operator --create-namespace -f cdmo-values.override.yaml
   ```

1. Enable NVIDIA MIG support on your cluster by running the following command on every node with a MIG-capable GPU, once for each GPU `<GPU_ID>` on the host:

   ```bash
   nvidia-smi -i <GPU_ID> -mig 1
   ```

   :::note notes
   * A node reboot may be required if the command output indicates so.
   * For convenience, this command can be run from within the `nvidia-device-plugin-daemonset` pod running on the related node.
   :::

1. Label each MIG-enabled GPU node `<NODE_NAME>` from the previous step:

   ```bash
   kubectl label nodes <NODE_NAME> "cdmo.clear.ml/gpu-partitioning=mig"
   ```
## Disabling MIGs

To disable MIG mode and restore standard full-GPU access:

1. Ensure no running workloads are using GPUs on the target node(s).

2. Remove the CDMO label from the target node(s) to disable dynamic MIG reconfiguration:

   ```bash
   kubectl label nodes <NODE_NAME> "cdmo.clear.ml/gpu-partitioning-"
   ```

3. Open a shell into the `device-plugin-daemonset` pod instance running on the target node(s) and execute the following commands:

   ```bash
   nvidia-smi mig -dci

   nvidia-smi mig -dgi

   nvidia-smi -mig 0
   ```

4. Edit the `gpu-operator.override.yaml` file to restore full-GPU access, and upgrade the `gpu-operator`:

   ```yaml
   toolkit:
     env:
       - name: ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED
         value: "false"
       - name: ACCEPT_NVIDIA_VISIBLE_DEVICES_AS_VOLUME_MOUNTS
         value: "true"
   devicePlugin:
     env:
       - name: PASS_DEVICE_SPECS
         value: "true"
       - name: FAIL_ON_INIT_ERROR
         value: "true"
       - name: DEVICE_LIST_STRATEGY # Use volume-mounts
         value: volume-mounts
       - name: DEVICE_ID_STRATEGY
         value: uuid
       - name: NVIDIA_VISIBLE_DEVICES
         value: all
       - name: NVIDIA_DRIVER_CAPABILITIES
         value: all
   ```
---
title: Install CDMO and CFGI on the Same Cluster
---

You can install both CDMO (ClearML Dynamic MIG Operator) and CFGI (ClearML Fractional GPU Injector) on a shared Kubernetes cluster. In clusters with multiple nodes and varying GPU types, the `gpu-operator` can be used to manage different device configurations and fractioning modes.

## Configuring the NVIDIA GPU Operator

The NVIDIA `gpu-operator` supports defining multiple configurations for the Device Plugin. The following example YAML defines two configurations, "mig" and "ts" (time-slicing):
```yaml
migManager:
  enabled: false
mig:
  strategy: mixed
toolkit:
  env:
    - name: ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED
      value: "false"
    - name: ACCEPT_NVIDIA_VISIBLE_DEVICES_AS_VOLUME_MOUNTS
      value: "true"
devicePlugin:
  enabled: true
  env:
    - name: PASS_DEVICE_SPECS
      value: "true"
    - name: FAIL_ON_INIT_ERROR
      value: "true"
    - name: DEVICE_LIST_STRATEGY
      value: volume-mounts
    - name: DEVICE_ID_STRATEGY
      value: uuid
    - name: NVIDIA_VISIBLE_DEVICES
      value: all
    - name: NVIDIA_DRIVER_CAPABILITIES
      value: all
  config:
    name: device-plugin-config
    create: true
    default: "all-disabled"
    data:
      all-disabled: |-
        version: v1
        flags:
          migStrategy: none
      ts: |-
        version: v1
        flags:
          migStrategy: none
        sharing:
          timeSlicing:
            renameByDefault: false
            failRequestsGreaterThanOne: false
            # Edit the following configuration as needed, adding one entry per GPU index installed on the host.
            resources:
              - name: nvidia.com/gpu
                rename: nvidia.com/gpu-0
                devices:
                  - "0"
                replicas: 8
      mig: |-
        version: v1
        flags:
          migStrategy: mixed
```
## Applying Configuration to Nodes

Label each Kubernetes node to activate a specific GPU mode:

| Mode | Label command |
|------|---------------|
| `mig` | `kubectl label node <NODE_NAME> nvidia.com/device-plugin.config=mig` |
| `ts` (time-slicing) | `kubectl label node <NODE_NAME> nvidia.com/device-plugin.config=ts` |
| Standard full-GPU access | `kubectl label node <NODE_NAME> nvidia.com/device-plugin.config=all-disabled` |

After a node is labeled, the NVIDIA `device-plugin` automatically reloads the new configuration.
## Installing CDMO and CFGI

After configuring the NVIDIA `gpu-operator` and labeling nodes, proceed with the standard installations of [CDMO](cdmo.md) and [CFGI](cfgi.md).

## Disabling Configurations

### Time Slicing

To disable time-slicing and restore full GPU access, update the node label using the `--overwrite` flag:

```bash
kubectl label node <NODE_NAME> nvidia.com/device-plugin.config=all-disabled --overwrite
```

### MIG

To disable MIG mode:

1. Ensure there are no running workloads requesting any form of GPU on the node(s).

2. Remove the CDMO label from the target node(s):

   ```bash
   kubectl label nodes <NODE_NAME> "cdmo.clear.ml/gpu-partitioning-"
   ```

3. Open a shell in the `device-plugin-daemonset` pod instance running on the target node(s) and execute the following commands:

   ```bash
   nvidia-smi mig -dci

   nvidia-smi mig -dgi

   nvidia-smi -mig 0
   ```

4. Label the node to use standard (non-MIG) GPU mode:

   ```bash
   kubectl label node <NODE_NAME> nvidia.com/device-plugin.config=all-disabled --overwrite
   ```
---
title: ClearML Fractional GPU Injector (CFGI)
---

The **ClearML Enterprise Fractional GPU Injector** (CFGI) allows AI workloads to utilize fractional (non-MIG) GPU slices on Kubernetes clusters, maximizing hardware efficiency and performance.

## Installation

### Add the Local ClearML Helm Repository

```bash
helm repo add clearml-enterprise https://raw.githubusercontent.com/clearml/clearml-enterprise-helm-charts/gh-pages --username <GITHUB_TOKEN> --password <GITHUB_TOKEN>
helm repo update
```
### Requirements

* Install the NVIDIA `gpu-operator` using Helm
* Set the number of GPU slices to 8
* Add and update the NVIDIA Helm repo:

  ```bash
  helm repo add nvidia https://nvidia.github.io/gpu-operator
  helm repo update
  ```

* Credentials for the ClearML Enterprise DockerHub repository
### GPU Operator Configuration

#### For CFGI Version >= 1.3.0

1. Create a Docker Registry secret named `clearml-dockerhub-access` in the `gpu-operator` namespace. Make sure to replace `<CLEARML_DOCKERHUB_TOKEN>` with your token:

   ```bash
   kubectl create secret -n gpu-operator docker-registry clearml-dockerhub-access \
     --docker-server=docker.io \
     --docker-username=allegroaienterprise \
     --docker-password="<CLEARML_DOCKERHUB_TOKEN>" \
     --docker-email=""
   ```

1. Create a `gpu-operator.override.yaml` file as follows:
   * Set `devicePlugin.repository` to `docker.io/clearml`.
   * Configure `devicePlugin.config.data.renamed-resources.sharing.timeSlicing.resources` for each GPU index on the host.
   * Use the `nvidia.com/gpu-<INDEX>` format for the `rename` field, and set `replicas` to `8`.
   ```yaml
   gfd:
     imagePullSecrets:
       - "clearml-dockerhub-access"
   toolkit:
     env:
       - name: ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED
         value: "false"
       - name: ACCEPT_NVIDIA_VISIBLE_DEVICES_AS_VOLUME_MOUNTS
         value: "true"
   devicePlugin:
     repository: docker.io/clearml
     image: k8s-device-plugin
     version: v0.17.1-gpu-card-selection
     imagePullPolicy: Always
     imagePullSecrets:
       - "clearml-dockerhub-access"
     env:
       - name: PASS_DEVICE_SPECS
         value: "true"
       - name: FAIL_ON_INIT_ERROR
         value: "true"
       - name: DEVICE_LIST_STRATEGY # Use volume-mounts
         value: volume-mounts
       - name: DEVICE_ID_STRATEGY
         value: uuid
       - name: NVIDIA_VISIBLE_DEVICES
         value: all
       - name: NVIDIA_DRIVER_CAPABILITIES
         value: all
     config:
       name: device-plugin-config
       create: true
       default: "renamed-resources"
       data:
         renamed-resources: |-
           version: v1
           flags:
             migStrategy: none
           sharing:
             timeSlicing:
               renameByDefault: false
               failRequestsGreaterThanOne: false
               # Edit the following configuration as needed, adding one entry per GPU index installed on the host.
               resources:
                 - name: nvidia.com/gpu
                   rename: nvidia.com/gpu-0
                   devices:
                     - "0"
                   replicas: 8
                 - name: nvidia.com/gpu
                   rename: nvidia.com/gpu-1
                   devices:
                     - "1"
                   replicas: 8
   ```
#### For CFGI Version < 1.3.0 (Legacy)

Create a `gpu-operator.override.yaml` file:

```yaml
toolkit:
  env:
    - name: ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED
      value: "false"
    - name: ACCEPT_NVIDIA_VISIBLE_DEVICES_AS_VOLUME_MOUNTS
      value: "true"
devicePlugin:
  env:
    - name: PASS_DEVICE_SPECS
      value: "true"
    - name: FAIL_ON_INIT_ERROR
      value: "true"
    - name: DEVICE_LIST_STRATEGY # Use volume-mounts
      value: volume-mounts
    - name: DEVICE_ID_STRATEGY
      value: uuid
    - name: NVIDIA_VISIBLE_DEVICES
      value: all
    - name: NVIDIA_DRIVER_CAPABILITIES
      value: all
  config:
    name: device-plugin-config
    create: true
    default: "any"
    data:
      any: |-
        version: v1
        flags:
          migStrategy: none
        sharing:
          timeSlicing:
            renameByDefault: false
            failRequestsGreaterThanOne: false
            resources:
              - name: nvidia.com/gpu
                replicas: 8
```
### Install GPU Operator and CFGI

1. Install the NVIDIA `gpu-operator` using the previously created `gpu-operator.override.yaml` file:

   ```bash
   helm install -n gpu-operator gpu-operator nvidia/gpu-operator --create-namespace -f gpu-operator.override.yaml
   ```

1. Create a `cfgi-values.override.yaml` file with the following content:

   ```yaml
   imageCredentials:
     password: "<CLEARML_DOCKERHUB_TOKEN>"
   ```

1. Install the CFGI Helm chart using the override file:

   ```bash
   helm install -n cfgi cfgi clearml-enterprise/clearml-fractional-gpu-injector --create-namespace -f cfgi-values.override.yaml
   ```
## Usage

To use fractional GPUs, label your pod with:

```yaml
labels:
  clearml-injector/fraction: "<GPU_FRACTION_VALUE>"
```

Valid values for `<GPU_FRACTION_VALUE>` include:
* Fractions:
  * "0.0625" (1/16th of a GPU)
  * "0.125" (1/8th)
  * "0.250"
  * "0.375"
  * "0.500"
  * "0.625"
  * "0.750"
  * "0.875"
* Whole GPUs, as integer representations such as `1.000`, `2`, `2.0`, etc.
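
In other words, the injector expects either a whole number of GPUs or a fraction on a 1/8th step (plus the single 1/16th value). A quick pre-flight check for a label value can be sketched as follows (the helper is illustrative, not part of CFGI):

```python
def is_valid_fraction(value: str) -> bool:
    """Check a clearml-injector/fraction label value: whole GPUs,
    0.0625 (1/16th), or eighths of a GPU strictly between 0 and 1."""
    f = float(value)
    if f <= 0:
        return False
    if f.is_integer():                      # 1, "2", "2.0", "1.000", ...
        return True
    return f == 0.0625 or (f < 1 and (f / 0.125).is_integer())

print([v for v in ("0.500", "0.3", "2.0", "0.0625") if is_valid_fraction(v)])
# ['0.500', '2.0', '0.0625']
```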

### ClearML Agent Configuration

To run ClearML jobs with fractional GPU allocation, configure your queues accordingly in your `clearml-agent-values.override.yaml` file.

Each queue should include a `templateOverrides` entry that sets the `clearml-injector/fraction` label, which determines the fraction of a GPU to allocate (e.g., "0.500" for half a GPU). This label is used by CFGI to assign the correct portion of GPU resources to the pod running the task.

#### CFGI Version >= 1.3.0

Starting from version 1.3.0, there is no need to specify the `resources` field. You only need to set the labels:
```yaml
agentk8sglue:
  createQueues: true
  queues:
    gpu-fraction-1_000:
      templateOverrides:
        labels:
          clearml-injector/fraction: "1.000"
    gpu-fraction-0_500:
      templateOverrides:
        labels:
          clearml-injector/fraction: "0.500"
    gpu-fraction-0_250:
      templateOverrides:
        labels:
          clearml-injector/fraction: "0.250"
    gpu-fraction-0_125:
      templateOverrides:
        labels:
          clearml-injector/fraction: "0.125"
```
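
The queue names above are just a naming convention: the fraction value with its decimal point replaced by an underscore. If you define many such queues, the name and label can be generated together (illustrative helper, not part of the chart):

```python
def fraction_queue(fraction: float) -> tuple[str, str]:
    """Return (queue name, label value) following the naming pattern
    used above, e.g. 0.5 -> ("gpu-fraction-0_500", "0.500")."""
    label = f"{fraction:.3f}"
    return f"gpu-fraction-{label.replace('.', '_')}", label

for f in (1.0, 0.5, 0.25, 0.125):
    name, label = fraction_queue(f)
    print(f"{name}: clearml-injector/fraction={label}")
```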

#### CFGI Version < 1.3.0

For versions older than 1.3.0, the GPU limits must also be defined:
```yaml
agentk8sglue:
  createQueues: true
  queues:
    gpu-fraction-1_000:
      templateOverrides:
        resources:
          limits:
            nvidia.com/gpu: 8
    gpu-fraction-0_500:
      templateOverrides:
        labels:
          clearml-injector/fraction: "0.500"
        resources:
          limits:
            nvidia.com/gpu: 4
    gpu-fraction-0_250:
      templateOverrides:
        labels:
          clearml-injector/fraction: "0.250"
        resources:
          limits:
            nvidia.com/gpu: 2
    gpu-fraction-0_125:
      templateOverrides:
        labels:
          clearml-injector/fraction: "0.125"
        resources:
          limits:
            nvidia.com/gpu: 1
```
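
The limits above follow directly from the 8-way time slicing configured earlier: with 8 replicas advertised per physical card, a fraction maps to `fraction * 8` time-sliced replicas, so "0.500" needs 4, "0.250" needs 2, and so on. As a sanity check:

```python
REPLICAS_PER_GPU = 8  # matches the "replicas: 8" time-slicing configuration above

def gpu_limit(fraction: float) -> int:
    """Translate a GPU fraction into the legacy nvidia.com/gpu limit:
    one time-sliced replica per eighth of a GPU."""
    limit = fraction * REPLICAS_PER_GPU
    if not float(limit).is_integer():
        raise ValueError(f"{fraction} is not a multiple of 1/{REPLICAS_PER_GPU}")
    return int(limit)

print([gpu_limit(f) for f in (1.0, 0.5, 0.25, 0.125)])  # [8, 4, 2, 1]
```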

## Upgrading CFGI Chart

To upgrade to the latest chart version:

```bash
helm repo update
helm upgrade -n cfgi cfgi clearml-enterprise/clearml-fractional-gpu-injector
```

To apply new values to an existing installation:

```bash
helm upgrade -n cfgi cfgi clearml-enterprise/clearml-fractional-gpu-injector -f cfgi-values.override.yaml
```
## Disabling Fractions

To revert to standard GPU scheduling (without time slicing), remove the `devicePlugin.config` section from the `gpu-operator.override.yaml` file and upgrade the `gpu-operator`:

```yaml
toolkit:
  env:
    - name: ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED
      value: "false"
    - name: ACCEPT_NVIDIA_VISIBLE_DEVICES_AS_VOLUME_MOUNTS
      value: "true"
devicePlugin:
  env:
    - name: PASS_DEVICE_SPECS
      value: "true"
    - name: FAIL_ON_INIT_ERROR
      value: "true"
    - name: DEVICE_LIST_STRATEGY # Use volume-mounts
      value: volume-mounts
    - name: DEVICE_ID_STRATEGY
      value: uuid
    - name: NVIDIA_VISIBLE_DEVICES
      value: all
    - name: NVIDIA_DRIVER_CAPABILITIES
      value: all
```