Edit Enterprise Server pages

2025-06-26 18:17:44 +00:00 · 2025-05-22 13:48:22 +03:00 · ac733ba3f0
commit ac733ba3f0
parent ffa57dcd4b 41e455f46c
12 changed files with 87 additions and 121 deletions
--- a/docs/deploying_clearml/enterprise_deploy/agent_k8s.md
+++ b/docs/deploying_clearml/enterprise_deploy/agent_k8s.md
@@ -11,7 +11,7 @@ The ClearML Agent enables scheduling and executing distributed experiments on a
  the ClearML UI (**Settings > Workspace > API Credentials > Create new credentials**). For more information, see [ClearML API Credentials](../../webapp/settings/webapp_settings_profile.md#clearml-api-credentials). 

  :::note
-  Make sure these credentials belong to an admin user or a service user with admin privileges.
+  Make sure these credentials belong to an admin user or a service account with admin privileges.
  :::
 
 - The worker environment must be able to access the ClearML Server over the same network.
@@ -26,7 +26,7 @@ Add the ClearML Helm repository:
 helm repo add clearml-enterprise https://raw.githubusercontent.com/clearml/clearml-enterprise-helm-charts/gh-pages --username <HELM_REPO_TOKEN> --password <HELM_REPO_TOKEN>
 ```

-Update the repository locally:
+Update the local repository:
 ```bash
 helm repo update
 ```
--- a/docs/deploying_clearml/enterprise_deploy/extra_configs/dynamic_edit_task_pod_template.md
+++ b/docs/deploying_clearml/enterprise_deploy/extra_configs/dynamic_edit_task_pod_template.md
@@ -18,8 +18,8 @@ Arguments passed to the function include:
 * `queue` (string) - ID of the queue from which the task was pulled.
 * `queue_name` (string) - Name of the queue from which the task was pulled.
 * `template` (Python dictionary) - Base Pod template created from the agent's configuration and any queue-specific overrides.
-* `task_data` (object) - Task data object (as returned by the `tasks.get_by_id` API call). For example, use `task_data.project` to get the task's project ID.
-* `providers_info` (dictionary) - Provider info containing optional information collected for the user running this task 
+* `task_data` (object) - [Task object](../../../references/sdk/task.md) (as returned by the `tasks.get_by_id` API call). For example, use `task_data.project` to get the task's project ID.
+* `providers_info` (dictionary) - [Identity provider](sso_login.md) info containing optional information collected for the user running this task 
  when the user logged into the system (requires additional server configuration).
 * `task_config` (`clearml_agent.backend_config.Config` object) - Task configuration containing configuration vaults applicable 
  for the user running this task, and other configuration. Use `task_config.get("...")` to get specific configuration values.
@@ -248,11 +248,8 @@ agentk8sglue:
          - mountPath: "/tmp/task/"
            name: task-pvc
 ```
-:::

-### Example: Required Role 
-
-The following is an example of `custom-agent-role` Role with permissions to handle `persistentvolumeclaims`:
+* The following is an example of a `custom-agent-role` Role with permissions to handle `persistentvolumeclaims`:

 ```yaml
 apiVersion: rbac.authorization.k8s.io/v1
@@ -271,4 +268,6 @@ rules:
  - create
  - patch
  - delete
-```
+```
+
+:::
--- a/docs/deploying_clearml/enterprise_deploy/extra_configs/gpu_operator.md
+++ b/docs/deploying_clearml/enterprise_deploy/extra_configs/gpu_operator.md
@@ -12,7 +12,7 @@ Add the NVIDIA GPU Operator Helm repository:
 helm repo add nvidia https://nvidia.github.io/gpu-operator
 ```

-Update the repository locally:
+Update the local repository:
 ```bash
 helm repo update
 ```
--- a/docs/deploying_clearml/enterprise_deploy/extra_configs/multi_node_training.md
+++ b/docs/deploying_clearml/enterprise_deploy/extra_configs/multi_node_training.md
@@ -2,10 +2,28 @@
 title: Multi-Node Training
 --- 

-The ClearML Enterprise Agent supports horizontal multi-node training, allowing a single Task to run across multiple pods 
+The ClearML Enterprise Agent supports horizontal multi-node training, allowing a single ClearML Task to run across multiple pods 
 on different nodes.

-Below is a configuration example using `clearml-agent-values.override.yaml`:
+This is useful for distributed training where the training job needs to span multiple GPUs and potentially 
+multiple nodes.
+
+To enable multi-node scheduling, set both `agentk8sglue.serviceAccountClusterAccess` and `agentk8sglue.multiNode` to `true`. 
+
+Multi-node behavior is controlled using the `multiNode` key in a queue configuration. This setting tells the 
+agent how to divide a Task's GPU requirements across multiple pods, with each pod running a part of the training job.
+
+Below is a configuration example using `clearml-agent-values.override.yaml` to enable multi-node training.
+
+In this example:
+* The `multiNode: [4, 2]` setting splits the Task into two workloads:
+  * One workload will need 4 GPUs
+  * The other workload will need 2 GPUs
+* The GPU limit per pod is set to `nvidia.com/gpu: 2`, meaning each pod will be limited to 2 GPUs
+
+With this setup:
+* The first workload (which needs 4 GPUs) will be scheduled as 2 pods, each with 2 GPUs
+* The second workload (which needs 2 GPUs) will be scheduled as 1 pod with 2 GPUs

 ```yaml
 agentk8sglue:
@@ -17,7 +35,7 @@ agentk8sglue:
  queues:
    multi-node-example:
      queueSettings:
-        # Defines the distribution of GPUs Tasks across multiple nodes. The format [x, y, ...] specifies the distribution of Tasks as 'x' GPUs on a node and 'y' GPUs on another node. Multiple Pods will be spawned respectively based on the lowest-common-denominator defined.
+         # Defines GPU needs per worker (e.g., 4 GPUs and 2 GPUs). Multiple Pods will be spawned respectively based on the lowest-common-denominator defined.
        multiNode: [ 4, 2 ]
      templateOverrides:
        resources:
--- a/docs/deploying_clearml/enterprise_deploy/extra_configs/presign_service.md
+++ b/docs/deploying_clearml/enterprise_deploy/extra_configs/presign_service.md
@@ -1,9 +1,18 @@
 ---
-title: ClearML Presign Service
+title: ClearML S3 Presign Service
 ---

 The ClearML Presign Service is a secure service that generates and redirects pre-signed storage URLs for authenticated 
-users, enabling direct access to cloud-hosted data (e.g., S3) without exposing credentials.
+users, enabling direct access to S3 data without exposing credentials.
+
+When configured, the ClearML WebApp automatically redirects requests for matching storage URLs (like `s3://...`) to the 
+Presign Service. The service:
+
+* Authenticates the user with ClearML.
+* Generates a temporary, secure (pre-signed) S3 URL.
+* Redirects the user's browser to the URL for direct access.
+
+This setup ensures secure access to S3-hosted data.

 ## Prerequisites

@@ -12,7 +21,7 @@ users, enabling direct access to cloud-hosted data (e.g., S3) without exposing c
  the ClearML UI (**Settings > Workspace > API Credentials > Create new credentials**). For more information, see [ClearML API Credentials](../../../webapp/settings/webapp_settings_profile.md#clearml-api-credentials).

  :::note
-  Make sure these credentials belong to an admin user or a service user with admin privileges.
+  Make sure these credentials belong to an admin user or a service account with admin privileges.
  :::
 
 - The worker environment must be able to access the ClearML Server over the same network.
@@ -27,7 +36,7 @@ Add the ClearML Helm repository:
 helm repo add clearml-enterprise https://raw.githubusercontent.com/clearml/clearml-enterprise-helm-charts/gh-pages --username <HELM_REPO_TOKEN> --password <HELM_REPO_TOKEN>
 ```

-Update the repository locally:
+Update the local repository:
 ```bash
 helm repo update
 ```
--- a/docs/deploying_clearml/enterprise_deploy/extra_configs/self_signed_certificates.md
+++ b/docs/deploying_clearml/enterprise_deploy/extra_configs/self_signed_certificates.md
@@ -1,13 +1,15 @@
 ---
-title: ClearML Tenant with Self Signed Certificates
+title: Kubernetes Deployment with Self-Signed Certificates
 ---

-This guide covers how to configure the [AI Application Gateway](#ai-application-gateway) and [ClearML Agent](#clearml-agent) 
+This guide covers how to configure the [AI Application Gateway](../appgw.md) and [ClearML Agent](../agent_k8s.md) 
 to use self-signed or custom SSL certificates. 

-## AI Application Gateway
+## Certificate Configuration

-To configure certificates for the Application Gateway, update your `clearml-app-gateway-values.override.yaml` file:
+To configure certificates, update the applicable overrides file:
+* For AI Application Gateway: `clearml-app-gateway-values.override.yaml` file
+* For ClearML Agent: `clearml-agent-values.override.yaml` file

 ```yaml
 # -- Custom certificates
@@ -72,83 +74,7 @@ customCertificates:
         -----END CERTIFICATE-----
 ```

-### Apply Changes
-
-To apply the changes, run the update command:
-
-```bash
-helm upgrade -i <RELEASE_NAME> -n <WORKLOAD_NAMESPACE> clearml-enterprise/clearml-enterprise-app-gateway --version <CHART_VERSION> -f clearml-app-gateway-values.override.yaml
-```
-
-## ClearML Agent
-
-For the ClearML Agent, configure certificates in the `clearml-agent-values.override.yaml` file:
-
-```yaml
-# -- Custom certificates
-customCertificates:
-  # -- Override system crt certificate bundle. Mutual exclusive with extraCerts.
-  overrideCaCertificatesCrt:
-  # -- Extra certs usable in case of needs of adding more certificates to the standard bundle, Requires root permissions to run update-ca-certificates. Mutual exclusive with overrideCaCertificatesCrt.
-  extraCerts:
-     - alias: certificateName
-       pem: |
-         -----BEGIN CERTIFICATE-----
-         ###
-         -----END CERTIFICATE-----
-```
-
-You have two configuration options:
-
- [**Replace**](#replace-entire-ca-certificatescrt-file-1) the entire `ca-certificates.crt` file
- [**Append**](#append-extra-certificates-to-the-existing-ca-certificatescrt-1) extra certificates to the existing `ca-certificates.crt`
-
-
-### Replace Entire ca-certificates.crt File
-
-To replace the whole ca-bundle, provide a concatenated list of all trusted CA certificates in `pem` format as 
-they are stored in a standard `ca-certificates.crt`.
-
-
-```yaml
-# -- Custom certificates
-customCertificates:
-  # -- Override system crt certificate bundle. Mutual exclusive with extraCerts.
-  overrideCaCertificatesCrt: |
-    -----BEGIN CERTIFICATE-----
-    ### CERT 1
-    -----END CERTIFICATE-----
-    -----BEGIN CERTIFICATE-----
-    ### CERT 2
-    -----END CERTIFICATE-----
-    -----BEGIN CERTIFICATE-----
-    ### CERT 3
-    -----END CERTIFICATE-----
-   ...
-```
-
-### Append Extra Certificates to the Existing ca-certificates.crt
-
-You can add certificates to the existing CA bundle. Each certificate must have a unique `alias`.
-
-```yaml
-# -- Custom certificates
-customCertificates:
-  # -- Extra certs usable in case of needs of adding more certificates to the standard bundle, Requires root permissions to run update-ca-certificates. Mutual exclusive with overrideCaCertificatesCrt.
-  extraCerts:
-     - alias: certificate-name-1
-       pem: |
-         -----BEGIN CERTIFICATE-----
-         ###
-         -----END CERTIFICATE-----
-     - alias: certificate-name-2
-       pem: |
-         -----BEGIN CERTIFICATE-----
-         ###
-         -----END CERTIFICATE-----
-```
-
-### Add Certificates to Task Pods
+### ClearML Agent: Add Certificates to Task Pods

 If your workloads need access to these certificates (e.g., for HTTPS requests), configure the agent to inject them into pods:

@@ -194,8 +120,15 @@ Their names are usually prefixed with the Helm release name, so adjust according

 ### Apply Changes

-Apply the changes by running the update command:
+To apply the changes, run the update command:
+* For AI Application Gateway:

-``` bash
-helm upgrade -i -n <WORKER_NAMESPACE> clearml-agent clearml-enterprise/clearml-enterprise-agent --create-namespace -f clearml-agent-values.override.yaml
-```
+   ```bash
+   helm upgrade -i <RELEASE_NAME> -n <WORKLOAD_NAMESPACE> clearml-enterprise/clearml-enterprise-app-gateway --version <CHART_VERSION> -f clearml-app-gateway-values.override.yaml
+   ```
+
+* For ClearML Agent: 
+
+   ```bash
+   helm upgrade -i -n <WORKER_NAMESPACE> clearml-agent clearml-enterprise/clearml-enterprise-agent --create-namespace -f clearml-agent-values.override.yaml
+   ```
--- a/docs/deploying_clearml/enterprise_deploy/extra_configs/sso_login.md
+++ b/docs/deploying_clearml/enterprise_deploy/extra_configs/sso_login.md
@@ -6,13 +6,8 @@ ClearML Enterprise Server supports various Single Sign-On (SSO) identity provide
 SSO configuration is managed via environment variables in your `clearml-values.override.yaml` file and is applied to the 
 `apiserver` component.

-The following are configuration examples for commonly used providers. Other supported systems include: 
-* Auth0
-* Keycloak
-* Okta
-* Azure AD
-* Google
-* AWS Cognito
+The following are configuration examples for commonly used identity providers. See the [full list of supported identity providers](../../../webapp/settings/webapp_settings_id_providers.md).
+

 ## Auth0

@@ -52,7 +47,7 @@ apiserver:
      value: "true"
 ```

-## Group Membership Mapping in Keycloak
+### Group Membership Mapping in Keycloak

 To map Keycloak groups into the ClearML user's SSO token:

--- a/docs/deploying_clearml/enterprise_deploy/fractional_gpus/cdmo.md
+++ b/docs/deploying_clearml/enterprise_deploy/fractional_gpus/cdmo.md
@@ -1,8 +1,14 @@
 ---
-title: ClearML Dynamic MIG Operator (CDMO)
+title: Managing GPU Fractions with ClearML Dynamic MIG Operator (CDMO)
 ---

-The  ClearML Dynamic MIG Operator (CDMO) enables dynamic MIG (Multi-Instance GPU) configurations.
+This guide explains how to use GPU fractions in Kubernetes clusters with NVIDIA MIG (Multi-Instance GPU) and
+ClearML's Dynamic MIG Operator (CDMO), which enables dynamic MIG configurations.
+
+This guide covers:
+* Installing CDMO
+* Enabling MIG mode on your cluster
+* Managing GPU partitioning dynamically 

 ## Installation

@@ -78,7 +84,7 @@ The  ClearML Dynamic MIG Operator (CDMO) enables dynamic MIG (Multi-Instance GPU
   * For convenience, this command can be run from within the `nvidia-device-plugin-daemonset` pod running on the related node.
   :::

-1. Label all MIG-enabled GPU node `<NODE_NAME>` from the previous step:
+1. Label all MIG-enabled GPU nodes `<NODE_NAME>` from the previous step:

   ```bash
   kubectl label nodes <NODE_NAME> "cdmo.clear.ml/gpu-partitioning=mig"
@@ -106,7 +112,7 @@ To disable MIG mode and restore standard full-GPU access:
    nvidia-smi -mig 0
    ```

-4. Edit the `gpu-operator.override.yaml` file to restore full-GPU access, and upgrade the `gpu-operator`:
+4. Edit the `gpu-operator.override.yaml` file to restore full-GPU access: 

    ```yaml
    toolkit:
@@ -129,4 +135,10 @@ To disable MIG mode and restore standard full-GPU access:
          value: all
        - name: NVIDIA_DRIVER_CAPABILITIES
          value: all
-    ```
+    ```
+   
+5. Upgrade the `gpu-operator`:
+
+   ```bash
+   helm upgrade -n gpu-operator gpu-operator nvidia/gpu-operator -f gpu-operator.override.yaml
+   ```
--- a/docs/deploying_clearml/enterprise_deploy/fractional_gpus/cfgi.md
+++ b/docs/deploying_clearml/enterprise_deploy/fractional_gpus/cfgi.md
@@ -16,7 +16,7 @@ helm repo update

 ### Requirements

-* Install the NVIDIA `gpu-operator` using Helm
+* Install the NVIDIA `gpu-operator` using Helm. For instructions, see [Basic Deployment](../extra_configs/gpu_operator.md).
 * Set the number of GPU slices to 8
 * Add and update the Nvidia Helm repo:

@@ -191,7 +191,7 @@ Valid values for `"<GPU_FRACTION_VALUE>"` include:

 ### ClearML Agent Configuration

-To run ClearML jobs with fractional GPU allocation, configure your queues in accordingly in your `clearml-agent-values.override.yaml` file.
+To run ClearML jobs with fractional GPU allocation, configure your queues in your `clearml-agent-values.override.yaml` file.

 Each queue should include a `templateOverride` that sets the `clearml-injector/fraction` label, which determines the 
 fraction of a GPU to allocate (e.g., "0.500" for half a GPU).
--- a/docs/deploying_clearml/enterprise_deploy/k8s.md
+++ b/docs/deploying_clearml/enterprise_deploy/k8s.md
@@ -50,7 +50,7 @@ Add the ClearML Helm repository:
 helm repo add clearml-enterprise https://raw.githubusercontent.com/clearml/clearml-enterprise-helm-charts/gh-pages --username <HELM_REPO_TOKEN> --password <HELM_REPO_TOKEN>
 ```

-Update the repository locally:
+Update the local repository:
 ``` bash
 helm repo update
 ```
--- a/docs/deploying_clearml/enterprise_deploy/multi_tenant_k8s.md
+++ b/docs/deploying_clearml/enterprise_deploy/multi_tenant_k8s.md
@@ -709,7 +709,7 @@ The following features can be assigned to groups via the `features` configuratio
 | `reports` | Enables access to Reports. | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
 | `resource_dashboard` | Enables access to the compute resource dashboard feature. | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
 | `sso_management` | Enables the SSO (Single Sign-On) configuration wizard. | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
-| `service_users` | Enables support for creating and managing service users (API keys). | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
+| `service_users` | Enables support for creating and managing service accounts (API keys). | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
 | `resource_policy` | Enables the resource policy feature. | May default to a trial feature if not explicitly enabled. |
 | `model_serving` | Enables access to the model serving endpoints feature. | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
 | `show_dashboard` | Makes the "Dashboard" menu item visible in the UI sidebar. | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
--- a/docs/release_notes/clearml_server/enterprise/ver_3_25.md
+++ b/docs/release_notes/clearml_server/enterprise/ver_3_25.md
@@ -29,7 +29,7 @@ title: Version 3.25
  * Display per-GPU metrics in "CPU and GPU Usage" and "Video Memory" graphs when multiple GPUs are available
  * Add "GPU Count" column to the Resource Groups table in the Orchestration Dashboard
 * Add global search bar to all UI pages
-* Enable setting service users as admins
+* Enable setting service accounts as admins
 * Add filter to UI Model Endpoints table 
 * Add UI scalar viewing configuration on a per-project basis ([ClearML GitHub issue #1377](https://github.com/clearml/clearml/issues/1377))
 * Add clicking project name in breadcrumbs of full-screen task opens the task in detail’s view ([ClearML GitHub issue #1376](https://github.com/clearml/clearml/issues/1376))
@@ -42,7 +42,7 @@ title: Version 3.25
 * Fix EMA smoothing in UI scalars is incorrect in first data point ([ClearML Web GitHub issue #101](https://github.com/clearml/clearml-web/issues/101))
 * Improve UI scalar smoothing algorithms (ClearML Web GitHub issues [#101](https://github.com/clearml/clearml-web/issues/101), [#102](https://github.com/clearml/clearml-web/issues/102), [#103](https://github.com/clearml/clearml-web/issues/103))
 * Fix UI Users & Groups table's "Groups" column data remains condensed after column is expanded
-* Fix setting service users as admins causes apiserver to crash
+* Fix setting service accounts as admins causes apiserver to crash
 * Fix UI "New Dataview" modal's version selection sometimes does not display draft versions
 * Fix GCS and Azure credential input popups not displaying in UI task debug samples
 * Fix UI pipeline "Preview" tab sometimes displays "Failed to get plot charts" error