revital 2025-05-20 13:33:57 +03:00
parent cc47ae1465
commit f0e9cbe027
12 changed files with 88 additions and 162 deletions

View File

@@ -11,7 +11,7 @@ The ClearML Agent enables scheduling and executing distributed experiments on a
 the ClearML UI (**Settings > Workspace > API Credentials > Create new credentials**). For more information, see [ClearML API Credentials](../../webapp/settings/webapp_settings_profile.md#clearml-api-credentials).
 :::note
-Make sure these credentials belong to an admin user or a service user with admin privileges.
+Make sure these credentials belong to an admin user or a service account with admin privileges.
 :::
 - The worker environment must be able to access the ClearML Server over the same network.
@@ -26,7 +26,7 @@ Add the ClearML Helm repository:
 helm repo add clearml-enterprise https://raw.githubusercontent.com/clearml/clearml-enterprise-helm-charts/gh-pages --username <HELM_REPO_TOKEN> --password <HELM_REPO_TOKEN>
 ```
-Update the repository locally:
+Update the local repository:
 ```bash
 helm repo update
 ```

View File

@@ -18,8 +18,8 @@ Arguments passed to the function include:
 * `queue` (string) - ID of the queue from which the task was pulled.
 * `queue_name` (string) - Name of the queue from which the task was pulled.
 * `template` (Python dictionary) - Base Pod template created from the agent's configuration and any queue-specific overrides.
-* `task_data` (object) - Task data object (as returned by the `tasks.get_by_id` API call). For example, use `task_data.project` to get the task's project ID.
-* `providers_info` (dictionary) - Provider info containing optional information collected for the user running this task
+* `task_data` (object) - [Task object](../../../references/sdk/task.md) (as returned by the `tasks.get_by_id` API call). For example, use `task_data.project` to get the task's project ID.
+* `providers_info` (dictionary) - [Identity provider](sso_login.md) info containing optional information collected for the user running this task
 when the user logged into the system (requires additional server configuration).
 * `task_config` (`clearml_agent.backend_config.Config` object) - Task configuration containing configuration vaults applicable
 for the user running this task, and other configuration. Use `task_config.get("...")` to get specific configuration values.
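As an illustration of how these arguments might be used together, here is a hypothetical hook (the function name, label keys, and vault key are invented for this sketch; only the argument names and their types come from the list above):

```python
def resolve_template(queue, queue_name, template, task_data, providers_info, task_config):
    """Hypothetical example: annotate the base Pod template before it is applied."""
    # Tag the Pod with the task's project ID (task_data fields as returned by tasks.get_by_id)
    labels = template.setdefault("metadata", {}).setdefault("labels", {})
    labels["clearml-project"] = str(task_data.project)
    # Read an optional value from the user's configuration vaults (key name is illustrative)
    region = task_config.get("custom.region", None)
    if region:
        labels["region"] = region
    return template
```

The template dictionary is mutated in place and returned, mirroring how a base Pod template plus queue-specific overrides would flow through such a hook.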
@@ -248,11 +248,8 @@ agentk8sglue:
         - mountPath: "/tmp/task/"
           name: task-pvc
 ```
-:::
-### Example: Required Role
-The following is an example of `custom-agent-role` Role with permissions to handle `persistentvolumeclaims`:
+* The following is an example of `custom-agent-role` Role with permissions to handle `persistentvolumeclaims`:
 ```yaml
 apiVersion: rbac.authorization.k8s.io/v1
@@ -272,3 +269,5 @@ rules:
   - patch
   - delete
 ```
+:::

View File

@@ -12,7 +12,7 @@ Add the NVIDIA GPU Operator Helm repository:
 helm repo add nvidia https://nvidia.github.io/gpu-operator
 ```
-Update the repository locally:
+Update the local repository:
 ```bash
 helm repo update
 ```

View File

@@ -2,10 +2,28 @@
 title: Multi-Node Training
 ---
-The ClearML Enterprise Agent supports horizontal multi-node training, allowing a single Task to run across multiple pods
+The ClearML Enterprise Agent supports horizontal multi-node training, allowing a single ClearML Task to run across multiple pods
 on different nodes.
-Below is a configuration example using `clearml-agent-values.override.yaml`:
+This is useful for distributed training where the training job needs to span multiple GPUs and potentially
+multiple nodes.
+To enable multi-node scheduling, set both `agentk8sglue.serviceAccountClusterAccess` and `agentk8sglue.multiNode` to `true`.
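In `clearml-agent-values.override.yaml`, those two flags would look like this (a minimal sketch; all other values omitted):

```yaml
agentk8sglue:
  # Both settings are required for multi-node scheduling
  serviceAccountClusterAccess: true
  multiNode: true
```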
+Multi-node behavior is controlled using the `multiNode` key in a queue configuration. This setting tells the
+agent how to divide a Task's GPU requirements across multiple pods, with each pod running a part of the training job.
+Below is a configuration example using `clearml-agent-values.override.yaml` to enable multi-node training.
+In this example:
+* The `multiNode: [4, 2]` setting splits the Task into two workloads:
+  * One workload will need 4 GPUs
+  * The other workload will need 2 GPUs
+* The GPU limit per pod is set to `nvidia.com/gpu: 2`, meaning each pod will be limited to 2 GPUs
+With this setup:
+* The first workload (which needs 4 GPUs) will be scheduled as 2 pods, each with 2 GPUs
+* The second workload (which needs 2 GPUs) will be scheduled as 1 pod with 2 GPUs
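The pod math described above can be sketched as follows (illustrative only, not ClearML code):

```python
import math

def pods_per_workload(multi_node, gpu_limit_per_pod):
    # Each workload needing g GPUs is split into ceil(g / limit) pods,
    # each capped at the per-pod GPU limit.
    return [math.ceil(g / gpu_limit_per_pod) for g in multi_node]

print(pods_per_workload([4, 2], 2))  # [2, 1] -> 2 pods + 1 pod, each with 2 GPUs
```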
 ```yaml
 agentk8sglue:
@@ -17,7 +35,7 @@ agentk8sglue:
   queues:
     multi-node-example:
       queueSettings:
-        # Defines the distribution of GPUs Tasks across multiple nodes. The format [x, y, ...] specifies the distribution of Tasks as 'x' GPUs on a node and 'y' GPUs on another node. Multiple Pods will be spawned respectively based on the lowest-common-denominator defined.
+        # Defines GPU needs per worker (e.g., 4 GPUs and 2 GPUs). Multiple Pods will be spawned respectively based on the lowest-common-denominator defined.
         multiNode: [ 4, 2 ]
       templateOverrides:
         resources:

View File

@@ -1,9 +1,18 @@
 ---
-title: ClearML Presign Service
+title: ClearML S3 Presign Service
 ---
 The ClearML Presign Service is a secure service that generates and redirects pre-signed storage URLs for authenticated
-users, enabling direct access to cloud-hosted data (e.g., S3) without exposing credentials.
+users, enabling direct access to S3 data without exposing credentials.
+When configured, the ClearML WebApp automatically redirects requests for matching storage URLs (like `s3://...`) to the
+Presign Service. The service:
+* Verifies the user's ClearML authentication.
+* Generates a temporary, secure (pre-signed) S3 URL.
+* Redirects the user's browser to the URL for direct access.
+This setup ensures secure access to S3-hosted data.
 ## Prerequisites
@@ -12,7 +21,7 @@ users, enabling direct access to cloud-hosted data (e.g., S3) without exposing c
 the ClearML UI (**Settings > Workspace > API Credentials > Create new credentials**). For more information, see [ClearML API Credentials](../../../webapp/settings/webapp_settings_profile.md#clearml-api-credentials).
 :::note
-Make sure these credentials belong to an admin user or a service user with admin privileges.
+Make sure these credentials belong to an admin user or a service account with admin privileges.
 :::
 - The worker environment must be able to access the ClearML Server over the same network.
@@ -27,7 +36,7 @@ Add the ClearML Helm repository:
 helm repo add clearml-enterprise https://raw.githubusercontent.com/clearml/clearml-enterprise-helm-charts/gh-pages --username <HELM_REPO_TOKEN> --password <HELM_REPO_TOKEN>
 ```
-Update the repository locally:
+Update the local repository:
 ```bash
 helm repo update
 ```

View File

@@ -1,13 +1,15 @@
 ---
-title: ClearML Tenant with Self Signed Certificates
+title: Self-Signed Certificates for ClearML Agent and AI App Gateway
 ---
-This guide covers how to configure the [AI Application Gateway](#ai-application-gateway) and [ClearML Agent](#clearml-agent)
+This guide covers how to configure the [AI Application Gateway](../appgw.md) and [ClearML Agent](../agent_k8s.md)
 to use self-signed or custom SSL certificates.
-## AI Application Gateway
-To configure certificates for the Application Gateway, update your `clearml-app-gateway-values.override.yaml` file:
+## Certificate Configuration
+To configure certificates, update the following files:
+* For AI Application Gateway: `clearml-app-gateway-values.override.yaml`
+* For ClearML Agent: `clearml-agent-values.override.yaml`
 ```yaml
 # -- Custom certificates
@@ -72,83 +74,7 @@ customCertificates:
     -----END CERTIFICATE-----
 ```
-### Apply Changes
-To apply the changes, run the update command:
-```bash
-helm upgrade -i <RELEASE_NAME> -n <WORKLOAD_NAMESPACE> clearml-enterprise/clearml-enterprise-app-gateway --version <CHART_VERSION> -f clearml-app-gateway-values.override.yaml
-```
-## ClearML Agent
-For the ClearML Agent, configure certificates in the `clearml-agent-values.override.yaml` file:
-```yaml
-# -- Custom certificates
-customCertificates:
-  # -- Override system crt certificate bundle. Mutual exclusive with extraCerts.
-  overrideCaCertificatesCrt:
-  # -- Extra certs usable in case of needs of adding more certificates to the standard bundle, Requires root permissions to run update-ca-certificates. Mutual exclusive with overrideCaCertificatesCrt.
-  extraCerts:
-    - alias: certificateName
-      pem: |
-        -----BEGIN CERTIFICATE-----
-        ###
-        -----END CERTIFICATE-----
-```
-You have two configuration options:
-- [**Replace**](#replace-entire-ca-certificatescrt-file-1) the entire `ca-certificates.crt` file
-- [**Append**](#append-extra-certificates-to-the-existing-ca-certificatescrt-1) extra certificates to the existing `ca-certificates.crt`
-### Replace Entire ca-certificates.crt File
-To replace the whole ca-bundle, provide a concatenated list of all trusted CA certificates in `pem` format as
-they are stored in a standard `ca-certificates.crt`.
-```yaml
-# -- Custom certificates
-customCertificates:
-  # -- Override system crt certificate bundle. Mutual exclusive with extraCerts.
-  overrideCaCertificatesCrt: |
-    -----BEGIN CERTIFICATE-----
-    ### CERT 1
-    -----END CERTIFICATE-----
-    -----BEGIN CERTIFICATE-----
-    ### CERT 2
-    -----END CERTIFICATE-----
-    -----BEGIN CERTIFICATE-----
-    ### CERT 3
-    -----END CERTIFICATE-----
-  ...
-```
-### Append Extra Certificates to the Existing ca-certificates.crt
-You can add certificates to the existing CA bundle. Each certificate must have a unique `alias`.
-```yaml
-# -- Custom certificates
-customCertificates:
-  # -- Extra certs usable in case of needs of adding more certificates to the standard bundle, Requires root permissions to run update-ca-certificates. Mutual exclusive with overrideCaCertificatesCrt.
-  extraCerts:
-    - alias: certificate-name-1
-      pem: |
-        -----BEGIN CERTIFICATE-----
-        ###
-        -----END CERTIFICATE-----
-    - alias: certificate-name-2
-      pem: |
-        -----BEGIN CERTIFICATE-----
-        ###
-        -----END CERTIFICATE-----
-```
-### Add Certificates to Task Pods
+### ClearML Agent: Add Certificates to Task Pods
 If your workloads need access to these certificates (e.g., for HTTPS requests), configure the agent to inject them into pods:
@@ -194,7 +120,14 @@ Their names are usually prefixed with the Helm release name, so adjust according
 ### Apply Changes
-Apply the changes by running the update command:
+To apply the changes, run the update command:
+* For AI Application Gateway:
+```bash
+helm upgrade -i <RELEASE_NAME> -n <WORKLOAD_NAMESPACE> clearml-enterprise/clearml-enterprise-app-gateway --version <CHART_VERSION> -f clearml-app-gateway-values.override.yaml
+```
+* For ClearML Agent:
 ```bash
 helm upgrade -i -n <WORKER_NAMESPACE> clearml-agent clearml-enterprise/clearml-enterprise-agent --create-namespace -f clearml-agent-values.override.yaml

View File

@@ -6,13 +6,8 @@ ClearML Enterprise Server supports various Single Sign-On (SSO) identity provide
 SSO configuration is managed via environment variables in your `clearml-values.override.yaml` file and is applied to the
 `apiserver` component.
-The following are configuration examples for commonly used providers. Other supported systems include:
-* Auth0
-* Keycloak
-* Okta
-* Azure AD
-* Google
-* AWS Cognito
+The following are configuration examples for commonly used identity providers. See [full list of supported identity providers](../../../webapp/settings/webapp_settings_id_providers.md).
 ## Auth0
@@ -52,7 +47,7 @@ apiserver:
         value: "true"
 ```
-## Group Membership Mapping in Keycloak
+### Group Membership Mapping in Keycloak
 To map Keycloak groups into the ClearML user's SSO token:

View File

@@ -1,54 +1,20 @@
 ---
-title: ClearML Dynamic MIG Operator (CDMO)
+title: Managing GPU Fragments with ClearML Dynamic MIG Operator (CDMO)
 ---
-The ClearML Dynamic MIG Operator (CDMO) enables dynamic MIG (Multi-Instance GPU) configurations.
-This guide covers:
-* Installing CDMO
-* Enabling MIG mode on your cluster
-* Managing GPU partitioning dynamically
+This guide covers using GPU fragments in Kubernetes clusters using NVIDIA MIGs and
+ClearML's Dynamic MIG Operator (CDMO). CDMO enables dynamic MIG (Multi-Instance GPU) configurations.
 ## Installation
 ### Requirements
-* Add and update the Nvidia Helm repo:
+* Install the NVIDIA `gpu-operator` using Helm. For instructions, see [Basic Deployment](../extra_configs/gpu_operator.md).
-  ```bash
-  helm repo add nvidia https://nvidia.github.io/gpu-operator
-  helm repo update
-  ```
-* Create a `gpu-operator.override.yaml` file with the following content:
-  ```yaml
-  migManager:
-    enabled: false
-  mig:
-    strategy: mixed
-  toolkit:
-    env:
-      - name: ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED
-        value: "false"
-      - name: ACCEPT_NVIDIA_VISIBLE_DEVICES_AS_VOLUME_MOUNTS
-        value: "true"
-  devicePlugin:
-    env:
-      - name: PASS_DEVICE_SPECS
-        value: "true"
-      - name: FAIL_ON_INIT_ERROR
-        value: "true"
-      - name: DEVICE_LIST_STRATEGY # Use volume-mounts
-        value: volume-mounts
-      - name: DEVICE_ID_STRATEGY
-        value: uuid
-      - name: NVIDIA_VISIBLE_DEVICES
-        value: all
-      - name: NVIDIA_DRIVER_CAPABILITIES
-        value: all
-  ```
-* Install the NVIDIA `gpu-operator` using Helm with the previous configuration:
-  ```bash
-  helm install -n gpu-operator gpu-operator nvidia/gpu-operator --create-namespace -f gpu-operator.override.yaml
-  ```
 ### Installing CDMO
@@ -78,7 +44,7 @@ The ClearML Dynamic MIG Operator (CDMO) enables dynamic MIG (Multi-Instance GPU
 * For convenience, this command can be run from within the `nvidia-device-plugin-daemonset` pod running on the related node.
 :::
-1. Label all MIG-enabled GPU node `<NODE_NAME>` from the previous step:
+1. Label all MIG-enabled GPU nodes `<NODE_NAME>` from the previous step:
 ```bash
 kubectl label nodes <NODE_NAME> "cdmo.clear.ml/gpu-partitioning=mig"
@@ -106,7 +72,7 @@ To disable MIG mode and restore standard full-GPU access:
 nvidia-smi -mig 0
 ```
-4. Edit the `gpu-operator.override.yaml` file to restore full-GPU access, and upgrade the `gpu-operator`:
+4. Edit the `gpu-operator.override.yaml` file to restore full-GPU access:
 ```yaml
 toolkit:
@@ -130,3 +96,9 @@ To disable MIG mode and restore standard full-GPU access:
       - name: NVIDIA_DRIVER_CAPABILITIES
         value: all
 ```
+5. Upgrade the `gpu-operator`:
+```bash
+helm upgrade -n gpu-operator gpu-operator nvidia/gpu-operator -f gpu-operator.override.yaml
+```

View File

@@ -16,7 +16,7 @@ helm repo update
 ### Requirements
-* Install the NVIDIA `gpu-operator` using Helm
+* Install the NVIDIA `gpu-operator` using Helm. For instructions, see [Basic Deployment](../extra_configs/gpu_operator.md).
 * Set the number of GPU slices to 8
 * Add and update the Nvidia Helm repo:
@@ -191,7 +191,7 @@ Valid values for `"<GPU_FRACTION_VALUE>"` include:
 ### ClearML Agent Configuration
-To run ClearML jobs with fractional GPU allocation, configure your queues in accordingly in your `clearml-agent-values.override.yaml` file.
+To run ClearML jobs with fractional GPU allocation, configure your queues in your `clearml-agent-values.override.yaml` file.
 Each queue should include a `templateOverride` that sets the `clearml-injector/fraction` label, which determines the
 fraction of a GPU to allocate (e.g., "0.500" for half a GPU).
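A hedged sketch of such a queue in `clearml-agent-values.override.yaml` (the queue name is invented and the exact key layout is inferred from the surrounding examples; verify against your chart version):

```yaml
agentk8sglue:
  queues:
    half-gpu:
      templateOverrides:
        labels:
          # Allocate half a GPU to each task pulled from this queue
          clearml-injector/fraction: "0.500"
```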

View File

@@ -50,7 +50,7 @@ Add the ClearML Helm repository:
 helm repo add clearml-enterprise https://raw.githubusercontent.com/clearml/clearml-enterprise-helm-charts/gh-pages --username <HELM_REPO_TOKEN> --password <HELM_REPO_TOKEN>
 ```
-Update the repository locally:
+Update the local repository:
 ``` bash
 helm repo update
 ```

View File

@@ -709,7 +709,7 @@ The following features can be assigned to groups via the `features` configuratio
 | `reports` | Enables access to Reports. | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
 | `resource_dashboard` | Enables access to the compute resource dashboard feature. | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
 | `sso_management` | Enables the SSO (Single Sign-On) configuration wizard. | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
-| `service_users` | Enables support for creating and managing service users (API keys). | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
+| `service_users` | Enables support for creating and managing service accounts (API keys). | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
 | `resource_policy` | Enables the resource policy feature. | May default to a trial feature if not explicitly enabled. |
 | `model_serving` | Enables access to the model serving endpoints feature. | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
 | `show_dashboard` | Makes the "Dashboard" menu item visible in the UI sidebar. | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |

View File

@@ -29,7 +29,7 @@ title: Version 3.25
 * Display per-GPU metrics in "CPU and GPU Usage" and "Video Memory" graphs when multiple GPUs are available
 * Add "GPU Count" column to the Resource Groups table in the Orchestration Dashboard
 * Add global search bar to all UI pages
-* Enable setting service users as admins
+* Enable setting service accounts as admins
 * Add filter to UI Model Endpoints table
 * Add UI scalar viewing configuration on a per-project basis ([ClearML GitHub issue #1377](https://github.com/clearml/clearml/issues/1377))
 * Add clicking project name in breadcrumbs of full-screen task opens the task in details view ([ClearML GitHub issue #1376](https://github.com/clearml/clearml/issues/1376))
@@ -42,7 +42,7 @@ title: Version 3.25
 * Fix EMA smoothing in UI scalars is incorrect in first data point ([ClearML Web GitHub issue #101](https://github.com/clearml/clearml-web/issues/101))
 * Improve UI scalar smoothing algorithms (ClearML Web GitHub issues [#101](https://github.com/clearml/clearml-web/issues/101), [#102](https://github.com/clearml/clearml-web/issues/102), [#103](https://github.com/clearml/clearml-web/issues/103))
 * Fix UI Users & Groups table's "Groups" column data remains condensed after column is expanded
-* Fix setting service users as admins causes apiserver to crash
+* Fix setting service accounts as admins causes apiserver to crash
 * Fix UI "New Dataview" modal's version selection sometimes does not display draft versions
 * Fix GCS and Azure credential input popups not displaying in UI task debug samples
 * Fix UI pipeline "Preview" tab sometimes displays "Failed to get plot charts" error