Update multi-tenant setup guide

revital 2025-05-26 10:06:18 +03:00
commit 3d9abe5bf7
15 changed files with 127 additions and 125 deletions


@ -65,6 +65,7 @@ errors in identifying the correct default branch.
| `--docker_bash_setup_script` | Add a bash script to be executed inside the container before setting up the task's environment | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> | | `--docker_bash_setup_script` | Add a bash script to be executed inside the container before setting up the task's environment | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
| `--docker_args` | Add Docker arguments. Pass a single string in the following format: `--docker_args "<argument_string>"`. For example: `--docker_args "-v some_dir_1:other_dir_1 -v some_dir_2:other_dir_2"` | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> | | `--docker_args` | Add Docker arguments. Pass a single string in the following format: `--docker_args "<argument_string>"`. For example: `--docker_args "-v some_dir_1:other_dir_1 -v some_dir_2:other_dir_2"` | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
| `--folder` | Execute the code from a local folder. Notice, it assumes a git repository already exists. Current state of the repo (commit ID and uncommitted changes) is logged and replicated on the remote machine | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> | | `--folder` | Execute the code from a local folder. Notice, it assumes a git repository already exists. Current state of the repo (commit ID and uncommitted changes) is logged and replicated on the remote machine | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
| `--force-no-requirements` | If specified, skips all package and requirements installation, and neither packages nor a requirements file need to be provided. | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
| `--import-offline-session`| Specify the path to the offline session you want to import.| <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> | | `--import-offline-session`| Specify the path to the offline session you want to import.| <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
| `--name` | Set a target name for the new task | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> | | `--name` | Set a target name for the new task | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
| `--output-uri` | Set the task `output_uri`, upload destination for task models and artifacts | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> | | `--output-uri` | Set the task `output_uri`, upload destination for task models and artifacts | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
@ -73,7 +74,9 @@ errors in identifying the correct default branch.
| `--queue` | Select a task's execution queue. If not provided, a task is created but not launched | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> | | `--queue` | Select a task's execution queue. If not provided, a task is created but not launched | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
| `--repo` | URL of remote repository. Example: `--repo https://github.com/clearml/clearml.git` | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> | | `--repo` | URL of remote repository. Example: `--repo https://github.com/clearml/clearml.git` | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
| `--requirements` | Specify `requirements.txt` file to install when setting the session. By default, the `requirements.txt` from the repository will be used | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> | | `--requirements` | Specify `requirements.txt` file to install when setting the session. By default, the `requirements.txt` from the repository will be used | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
| `--skip-python-env-install` | If specified, agent will use the existing Python environment without installing packages. Only applies when running in Docker mode or on Kubernetes. | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
| `--script` | Entry point script for the remote execution. When used with `--repo`, input the script's relative path inside the repository. For example: `--script source/train.py`. When used with `--folder`, it supports a direct path to a file inside the local repository itself, for example: `--script ~/project/source/train.py` | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> | | `--script` | Entry point script for the remote execution. When used with `--repo`, input the script's relative path inside the repository. For example: `--script source/train.py`. When used with `--folder`, it supports a direct path to a file inside the local repository itself, for example: `--script ~/project/source/train.py` | <img src="/docs/latest/icons/ico-optional-yes.svg" alt="Yes" className="icon size-md center-md" /> |
| `--skip-repo-detection` | If specified, skip repository detection when a repository is not specified. No repository will be set in remote execution | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
| `--skip-task-init` | If set, `Task.init()` call is not added to the entry point, and is assumed to be called within the script | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> | | `--skip-task-init` | If set, `Task.init()` call is not added to the entry point, and is assumed to be called within the script | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
| `--tags` | Add tags to the newly created task. For example: `--tags "base" "job"` | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> | | `--tags` | Add tags to the newly created task. For example: `--tags "base" "job"` | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |
| `--task-type` | Set the task type. Optional values: training, testing, inference, data_processing, application, monitor, controller, optimizer, service, qc, custom | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> | | `--task-type` | Set the task type. Optional values: training, testing, inference, data_processing, application, monitor, controller, optimizer, service, qc, custom | <img src="/docs/latest/icons/ico-optional-no.svg" alt="No" className="icon size-md center-md" /> |


@ -11,7 +11,7 @@ The ClearML Agent enables scheduling and executing distributed experiments on a
the ClearML UI (**Settings > Workspace > API Credentials > Create new credentials**). For more information, see [ClearML API Credentials](../../webapp/settings/webapp_settings_profile.md#clearml-api-credentials). the ClearML UI (**Settings > Workspace > API Credentials > Create new credentials**). For more information, see [ClearML API Credentials](../../webapp/settings/webapp_settings_profile.md#clearml-api-credentials).
:::note :::note
Make sure these credentials belong to an admin user or a service user with admin privileges. Make sure these credentials belong to an admin user or a service account with admin privileges.
::: :::
- The worker environment must be able to access the ClearML Server over the same network. - The worker environment must be able to access the ClearML Server over the same network.
@ -26,7 +26,7 @@ Add the ClearML Helm repository:
helm repo add clearml-enterprise https://raw.githubusercontent.com/clearml/clearml-enterprise-helm-charts/gh-pages --username <HELM_REPO_TOKEN> --password <HELM_REPO_TOKEN> helm repo add clearml-enterprise https://raw.githubusercontent.com/clearml/clearml-enterprise-helm-charts/gh-pages --username <HELM_REPO_TOKEN> --password <HELM_REPO_TOKEN>
``` ```
Update the repository locally: Update the local repository:
```bash ```bash
helm repo update helm repo update
``` ```


@ -11,14 +11,14 @@ Applications are installed on top of the ClearML Server and are provided by the
## Requirements ## Requirements
- Python 3 installed on your local machine to run the provided installation scripts) - Python 3 installed on your local machine to run the provided installation scripts
- A ClearML Enterprise Server is up and running with `clearmlApplications.enabled` set to `"true"` in the server's `overrides.yaml` file. - A ClearML Enterprise Server is up and running with `clearmlApplications.enabled` set to `"true"` in the server's `overrides.yaml` file.
- Applications package provided by ClearML, including the following scripts: - Applications package provided by ClearML, including the following scripts:
- `convert_image_registry.py` - `convert_image_registry.py`
- `upload_apps.py` - `upload_apps.py`
- API credentials (`<ACCESS_KEY>` and `<SECRET_KEY>`) generated via - API credentials (`<ACCESS_KEY>` and `<SECRET_KEY>`) generated via
the ClearML UI (**Settings > Workspace > API Credentials > Create new credentials**). Make sure these credentials the ClearML UI (**Settings > Workspace > API Credentials > Create new credentials**). Make sure these credentials
belong to an admin user or a service user with admin privileges. For more information, see [ClearML API Credentials](../../webapp/settings/webapp_settings_profile.md#clearml-api-credentials). belong to an admin user or a service user with admin privileges. For more information, see [ClearML API Credentials](../../../webapp/settings/webapp_settings_profile.md#clearml-api-credentials).
## Installation ## Installation


@ -18,8 +18,8 @@ Arguments passed to the function include:
* `queue` (string) - ID of the queue from which the task was pulled. * `queue` (string) - ID of the queue from which the task was pulled.
* `queue_name` (string) - Name of the queue from which the task was pulled. * `queue_name` (string) - Name of the queue from which the task was pulled.
* `template` (Python dictionary) - Base Pod template created from the agent's configuration and any queue-specific overrides. * `template` (Python dictionary) - Base Pod template created from the agent's configuration and any queue-specific overrides.
* `task_data` (object) - Task data object (as returned by the `tasks.get_by_id` API call). For example, use `task_data.project` to get the task's project ID. * `task_data` (object) - [Task object](../../../references/sdk/task.md) (as returned by the `tasks.get_by_id` API call). For example, use `task_data.project` to get the task's project ID.
* `providers_info` (dictionary) - Provider info containing optional information collected for the user running this task * `providers_info` (dictionary) - [Identity provider](sso_login.md) info containing optional information collected for the user running this task
when the user logged into the system (requires additional server configuration). when the user logged into the system (requires additional server configuration).
* `task_config` (`clearml_agent.backend_config.Config` object) - Task configuration containing configuration vaults applicable * `task_config` (`clearml_agent.backend_config.Config` object) - Task configuration containing configuration vaults applicable
for the user running this task, and other configuration. Use `task_config.get("...")` to get specific configuration values. for the user running this task, and other configuration. Use `task_config.get("...")` to get specific configuration values.
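As an illustration of how these arguments might be used, the sketch below adds a label carrying the task's project ID to the Pod template. The function name `update_template`, its return value, and the label key are assumptions made for this example and are not confirmed by the text above; check the agent's documentation for the exact contract.

```python
import copy


def update_template(queue, queue_name, template, task_data,
                    providers_info, task_config, **kwargs):
    """Hypothetical hook: return a modified copy of the base Pod template.

    The parameter names mirror the arguments listed above; the signature
    and the return convention are assumptions made for illustration.
    """
    # Work on a copy so the base template built by the agent is not mutated.
    updated = copy.deepcopy(template)

    # Attach the task's project ID as a Pod label (the label key is illustrative).
    labels = updated.setdefault("metadata", {}).setdefault("labels", {})
    labels["example/project-id"] = str(task_data.project)

    return updated
```

Returning a copy rather than editing `template` in place keeps the base template reusable for the next task pulled from the same queue.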
@ -248,11 +248,8 @@ agentk8sglue:
- mountPath: "/tmp/task/" - mountPath: "/tmp/task/"
name: task-pvc name: task-pvc
``` ```
:::
### Example: Required Role * The following is an example of `custom-agent-role` Role with permissions to handle `persistentvolumeclaims`:
The following is an example of `custom-agent-role` Role with permissions to handle `persistentvolumeclaims`:
```yaml ```yaml
apiVersion: rbac.authorization.k8s.io/v1 apiVersion: rbac.authorization.k8s.io/v1
@ -272,3 +269,5 @@ rules:
- patch - patch
- delete - delete
``` ```
:::


@ -12,7 +12,7 @@ Add the NVIDIA GPU Operator Helm repository:
helm repo add nvidia https://nvidia.github.io/gpu-operator helm repo add nvidia https://nvidia.github.io/gpu-operator
``` ```
Update the repository locally: Update the local repository:
```bash ```bash
helm repo update helm repo update
``` ```


@ -2,10 +2,28 @@
title: Multi-Node Training title: Multi-Node Training
--- ---
The ClearML Enterprise Agent supports horizontal multi-node training, allowing a single Task to run across multiple pods The ClearML Enterprise Agent supports horizontal multi-node training, allowing a single ClearML Task to run across multiple pods
on different nodes. on different nodes.
Below is a configuration example using `clearml-agent-values.override.yaml`: This is useful for distributed training where the training job needs to span multiple GPUs and potentially
multiple nodes.
To enable multi-node scheduling, set both `agentk8sglue.serviceAccountClusterAccess` and `agentk8sglue.multiNode` to `true`.
Multi-node behavior is controlled using the `multiNode` key in a queue configuration. This setting tells the
agent how to divide a Task's GPU requirements across multiple pods, with each pod running a part of the training job.
Below is a configuration example using `clearml-agent-values.override.yaml` to enable multi-node training.
In this example:
* The `multiNode: [4, 2]` setting splits the Task into two workloads:
* One workload will need 4 GPUs
* The other workload will need 2 GPUs
* The GPU limit per pod is set to `nvidia.com/gpu: 2`, meaning each pod will be limited to 2 GPUs
With this setup:
* The first workload (which needs 4 GPUs) will be scheduled as 2 pods, each with 2 GPUs
* The second workload (which needs 2 GPUs) will be scheduled as 1 pod with 2 GPUs
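The pod counts above follow from dividing each workload's GPU requirement by the per-pod GPU limit. The following is a minimal sketch of that arithmetic, assuming the count is rounded up when a requirement is not an exact multiple of the limit. The helper name `pods_per_workload` is hypothetical, and the agent's actual scheduling logic may differ.

```python
import math


def pods_per_workload(multi_node, gpus_per_pod):
    """Return the number of pods needed for each workload.

    multi_node: GPU requirements, one entry per workload (e.g. [4, 2]).
    gpus_per_pod: the per-pod GPU limit (e.g. the `nvidia.com/gpu` value).
    Rounding up is an assumption made for this illustration.
    """
    if gpus_per_pod <= 0:
        raise ValueError("gpus_per_pod must be a positive integer")
    return [math.ceil(gpus / gpus_per_pod) for gpus in multi_node]


if __name__ == "__main__":
    # multiNode: [4, 2] with a limit of 2 GPUs per pod
    print(pods_per_workload([4, 2], 2))  # [2, 1]
```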
```yaml ```yaml
agentk8sglue: agentk8sglue:
@ -17,7 +35,7 @@ agentk8sglue:
queues: queues:
multi-node-example: multi-node-example:
queueSettings: queueSettings:
# Defines the distribution of GPUs Tasks across multiple nodes. The format [x, y, ...] specifies the distribution of Tasks as 'x' GPUs on a node and 'y' GPUs on another node. Multiple Pods will be spawned respectively based on the lowest-common-denominator defined. # Defines GPU needs per worker (e.g., 4 GPUs and 2 GPUs). Multiple Pods will be spawned respectively based on the lowest-common-denominator defined.
multiNode: [ 4, 2 ] multiNode: [ 4, 2 ]
templateOverrides: templateOverrides:
resources: resources:


@ -1,9 +1,18 @@
--- ---
title: ClearML Presign Service title: ClearML S3 Presign Service
--- ---
The ClearML Presign Service is a secure service that generates and redirects pre-signed storage URLs for authenticated The ClearML Presign Service is a secure service that generates and redirects pre-signed storage URLs for authenticated
users, enabling direct access to cloud-hosted data (e.g., S3) without exposing credentials. users, enabling direct access to S3 data without exposing credentials.
When configured, the ClearML WebApp automatically redirects requests for matching storage URLs (like `s3://...`) to the
Presign Service. The service:
* Authenticates the user with ClearML.
* Generates a temporary, secure (pre-signed) S3 URL.
* Redirects the user's browser to the URL for direct access.
This setup ensures secure access to S3-hosted data.
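To make the matching step concrete, the sketch below shows one way to recognize an `s3://` URL and split it into a bucket and an object key using only the Python standard library. The helper name `parse_s3_url` is hypothetical, and this is not the Presign Service's actual implementation; generating the pre-signed URL itself would additionally require S3 credentials and a signing library.

```python
from urllib.parse import urlparse


def parse_s3_url(url):
    """Split an s3:// URL into (bucket, key), or return None if it does not match.

    Hypothetical helper for illustration only. Keys containing '?' or '#'
    are not handled, because urlparse treats them as query and fragment.
    """
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        return None
    # The bucket is the authority part; the key is the path without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


if __name__ == "__main__":
    print(parse_s3_url("s3://my-bucket/path/to/file.png"))
```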
## Prerequisites ## Prerequisites
@ -12,7 +21,7 @@ users, enabling direct access to cloud-hosted data (e.g., S3) without exposing c
the ClearML UI (**Settings > Workspace > API Credentials > Create new credentials**). For more information, see [ClearML API Credentials](../../../webapp/settings/webapp_settings_profile.md#clearml-api-credentials). the ClearML UI (**Settings > Workspace > API Credentials > Create new credentials**). For more information, see [ClearML API Credentials](../../../webapp/settings/webapp_settings_profile.md#clearml-api-credentials).
:::note :::note
Make sure these credentials belong to an admin user or a service user with admin privileges. Make sure these credentials belong to an admin user or a service account with admin privileges.
::: :::
- The worker environment must be able to access the ClearML Server over the same network. - The worker environment must be able to access the ClearML Server over the same network.
@ -27,7 +36,7 @@ Add the ClearML Helm repository:
helm repo add clearml-enterprise https://raw.githubusercontent.com/clearml/clearml-enterprise-helm-charts/gh-pages --username <HELM_REPO_TOKEN> --password <HELM_REPO_TOKEN> helm repo add clearml-enterprise https://raw.githubusercontent.com/clearml/clearml-enterprise-helm-charts/gh-pages --username <HELM_REPO_TOKEN> --password <HELM_REPO_TOKEN>
``` ```
Update the repository locally: Update the local repository:
```bash ```bash
helm repo update helm repo update
``` ```


@ -1,13 +1,15 @@
--- ---
title: ClearML Tenant with Self Signed Certificates title: Kubernetes Deployment with Self-Signed Certificates
--- ---
This guide covers how to configure the [AI Application Gateway](#ai-application-gateway) and [ClearML Agent](#clearml-agent) This guide covers how to configure the [AI Application Gateway](../appgw.md) and [ClearML Agent](../agent_k8s.md)
to use self-signed or custom SSL certificates. to use self-signed or custom SSL certificates.
## AI Application Gateway ## Certificate Configuration
To configure certificates for the Application Gateway, update your `clearml-app-gateway-values.override.yaml` file: To configure certificates, update the applicable overrides file:
* For AI Application Gateway: `clearml-app-gateway-values.override.yaml` file
* For ClearML Agent: `clearml-agent-values.override.yaml` file
```yaml ```yaml
# -- Custom certificates # -- Custom certificates
@ -72,83 +74,7 @@ customCertificates:
-----END CERTIFICATE----- -----END CERTIFICATE-----
``` ```
### Apply Changes ### ClearML Agent: Add Certificates to Task Pods
To apply the changes, run the update command:
```bash
helm upgrade -i <RELEASE_NAME> -n <WORKLOAD_NAMESPACE> clearml-enterprise/clearml-enterprise-app-gateway --version <CHART_VERSION> -f clearml-app-gateway-values.override.yaml
```
## ClearML Agent
For the ClearML Agent, configure certificates in the `clearml-agent-values.override.yaml` file:
```yaml
# -- Custom certificates
customCertificates:
# -- Override system crt certificate bundle. Mutual exclusive with extraCerts.
overrideCaCertificatesCrt:
# -- Extra certs usable in case of needs of adding more certificates to the standard bundle, Requires root permissions to run update-ca-certificates. Mutual exclusive with overrideCaCertificatesCrt.
extraCerts:
- alias: certificateName
pem: |
-----BEGIN CERTIFICATE-----
###
-----END CERTIFICATE-----
```
You have two configuration options:
- [**Replace**](#replace-entire-ca-certificatescrt-file-1) the entire `ca-certificates.crt` file
- [**Append**](#append-extra-certificates-to-the-existing-ca-certificatescrt-1) extra certificates to the existing `ca-certificates.crt`
### Replace Entire ca-certificates.crt File
To replace the whole ca-bundle, provide a concatenated list of all trusted CA certificates in `pem` format as
they are stored in a standard `ca-certificates.crt`.
```yaml
# -- Custom certificates
customCertificates:
# -- Override system crt certificate bundle. Mutual exclusive with extraCerts.
overrideCaCertificatesCrt: |
-----BEGIN CERTIFICATE-----
### CERT 1
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
### CERT 2
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
### CERT 3
-----END CERTIFICATE-----
...
```
### Append Extra Certificates to the Existing ca-certificates.crt
You can add certificates to the existing CA bundle. Each certificate must have a unique `alias`.
```yaml
# -- Custom certificates
customCertificates:
# -- Extra certs usable in case of needs of adding more certificates to the standard bundle, Requires root permissions to run update-ca-certificates. Mutual exclusive with overrideCaCertificatesCrt.
extraCerts:
- alias: certificate-name-1
pem: |
-----BEGIN CERTIFICATE-----
###
-----END CERTIFICATE-----
- alias: certificate-name-2
pem: |
-----BEGIN CERTIFICATE-----
###
-----END CERTIFICATE-----
```
### Add Certificates to Task Pods
If your workloads need access to these certificates (e.g., for HTTPS requests), configure the agent to inject them into pods: If your workloads need access to these certificates (e.g., for HTTPS requests), configure the agent to inject them into pods:
@ -194,8 +120,15 @@ Their names are usually prefixed with the Helm release name, so adjust according
### Apply Changes ### Apply Changes
Apply the changes by running the update command: To apply the changes, run the update command:
* For AI Application Gateway:
``` bash ```bash
helm upgrade -i -n <WORKER_NAMESPACE> clearml-agent clearml-enterprise/clearml-enterprise-agent --create-namespace -f clearml-agent-values.override.yaml helm upgrade -i <RELEASE_NAME> -n <WORKLOAD_NAMESPACE> clearml-enterprise/clearml-enterprise-app-gateway --version <CHART_VERSION> -f clearml-app-gateway-values.override.yaml
``` ```
* For ClearML Agent:
```bash
helm upgrade -i -n <WORKER_NAMESPACE> clearml-agent clearml-enterprise/clearml-enterprise-agent --create-namespace -f clearml-agent-values.override.yaml
```


@ -6,13 +6,8 @@ ClearML Enterprise Server supports various Single Sign-On (SSO) identity provide
SSO configuration is managed via environment variables in your `clearml-values.override.yaml` file and is applied to the SSO configuration is managed via environment variables in your `clearml-values.override.yaml` file and is applied to the
`apiserver` component. `apiserver` component.
The following are configuration examples for commonly used providers. Other supported systems include: The following are configuration examples for commonly used identity providers. See [full list of supported identity providers](../../../webapp/settings/webapp_settings_id_providers.md).
* Auth0
* Keycloak
* Okta
* Azure AD
* Google
* AWS Cognito
## Auth0 ## Auth0
@ -52,7 +47,7 @@ apiserver:
value: "true" value: "true"
``` ```
## Group Membership Mapping in Keycloak ### Group Membership Mapping in Keycloak
To map Keycloak groups into the ClearML user's SSO token: To map Keycloak groups into the ClearML user's SSO token:


@ -1,8 +1,14 @@
--- ---
title: ClearML Dynamic MIG Operator (CDMO) title: Managing GPU Fractions with ClearML Dynamic MIG Operator (CDMO)
--- ---
The ClearML Dynamic MIG Operator (CDMO) enables dynamic MIG (Multi-Instance GPU) configurations. This guide covers using GPU fractions in Kubernetes clusters using NVIDIA MIGs and
ClearML's Dynamic MIG Operator (CDMO). CDMO enables dynamic MIG (Multi-Instance GPU) configurations.
This guide covers:
* Installing CDMO
* Enabling MIG mode on your cluster
* Managing GPU partitioning dynamically
## Installation ## Installation
@ -78,7 +84,7 @@ The ClearML Dynamic MIG Operator (CDMO) enables dynamic MIG (Multi-Instance GPU
* For convenience, this command can be run from within the `nvidia-device-plugin-daemonset` pod running on the related node. * For convenience, this command can be run from within the `nvidia-device-plugin-daemonset` pod running on the related node.
::: :::
1. Label all MIG-enabled GPU node `<NODE_NAME>` from the previous step: 1. Label all MIG-enabled GPU nodes `<NODE_NAME>` from the previous step:
```bash ```bash
kubectl label nodes <NODE_NAME> "cdmo.clear.ml/gpu-partitioning=mig" kubectl label nodes <NODE_NAME> "cdmo.clear.ml/gpu-partitioning=mig"
@ -106,7 +112,7 @@ To disable MIG mode and restore standard full-GPU access:
nvidia-smi -mig 0 nvidia-smi -mig 0
``` ```
4. Edit the `gpu-operator.override.yaml` file to restore full-GPU access, and upgrade the `gpu-operator`: 4. Edit the `gpu-operator.override.yaml` file to restore full-GPU access:
```yaml ```yaml
toolkit: toolkit:
@ -130,3 +136,9 @@ To disable MIG mode and restore standard full-GPU access:
- name: NVIDIA_DRIVER_CAPABILITIES - name: NVIDIA_DRIVER_CAPABILITIES
value: all value: all
``` ```
5. Upgrade the `gpu-operator`:
```bash
helm upgrade -n gpu-operator gpu-operator nvidia/gpu-operator -f gpu-operator.override.yaml
```


@ -16,7 +16,7 @@ helm repo update
### Requirements ### Requirements
* Install the NVIDIA `gpu-operator` using Helm * Install the NVIDIA `gpu-operator` using Helm. For instructions, see [Basic Deployment](../extra_configs/gpu_operator.md).
* Set the number of GPU slices to 8 * Set the number of GPU slices to 8
* Add and update the Nvidia Helm repo: * Add and update the Nvidia Helm repo:
@ -59,7 +59,7 @@ helm repo update
devicePlugin: devicePlugin:
repository: docker.io/clearml repository: docker.io/clearml
image: k8s-device-plugin image: k8s-device-plugin
version: v0.17.1-gpu-card-selection version: "v0.17.2-gpu-card-selection"
imagePullPolicy: Always imagePullPolicy: Always
imagePullSecrets: imagePullSecrets:
- "clearml-dockerhub-access" - "clearml-dockerhub-access"
@ -191,7 +191,7 @@ Valid values for `"<GPU_FRACTION_VALUE>"` include:
### ClearML Agent Configuration ### ClearML Agent Configuration
To run ClearML jobs with fractional GPU allocation, configure your queues in accordingly in your `clearml-agent-values.override.yaml` file. To run ClearML jobs with fractional GPU allocation, configure your queues in your `clearml-agent-values.override.yaml` file.
Each queue should include a `templateOverride` that sets the `clearml-injector/fraction` label, which determines the Each queue should include a `templateOverride` that sets the `clearml-injector/fraction` label, which determines the
fraction of a GPU to allocate (e.g., "0.500" for half a GPU). fraction of a GPU to allocate (e.g., "0.500" for half a GPU).


@ -50,7 +50,7 @@ Add the ClearML Helm repository:
helm repo add clearml-enterprise https://raw.githubusercontent.com/clearml/clearml-enterprise-helm-charts/gh-pages --username <HELM_REPO_TOKEN> --password <HELM_REPO_TOKEN> helm repo add clearml-enterprise https://raw.githubusercontent.com/clearml/clearml-enterprise-helm-charts/gh-pages --username <HELM_REPO_TOKEN> --password <HELM_REPO_TOKEN>
``` ```
Update the repository locally: Update the local repository:
``` bash ``` bash
helm repo update helm repo update
``` ```


@ -29,7 +29,7 @@ title: Version 3.25
* Display per-GPU metrics in "CPU and GPU Usage" and "Video Memory" graphs when multiple GPUs are available * Display per-GPU metrics in "CPU and GPU Usage" and "Video Memory" graphs when multiple GPUs are available
* Add "GPU Count" column to the Resource Groups table in the Orchestration Dashboard * Add "GPU Count" column to the Resource Groups table in the Orchestration Dashboard
* Add global search bar to all UI pages * Add global search bar to all UI pages
* Enable setting service users as admins * Enable setting service accounts as admins
* Add filter to UI Model Endpoints table * Add filter to UI Model Endpoints table
* Add UI scalar viewing configuration on a per-project basis ([ClearML GitHub issue #1377](https://github.com/clearml/clearml/issues/1377)) * Add UI scalar viewing configuration on a per-project basis ([ClearML GitHub issue #1377](https://github.com/clearml/clearml/issues/1377))
* Add clicking project name in breadcrumbs of full-screen task opens the task in details view ([ClearML GitHub issue #1376](https://github.com/clearml/clearml/issues/1376)) * Add clicking project name in breadcrumbs of full-screen task opens the task in details view ([ClearML GitHub issue #1376](https://github.com/clearml/clearml/issues/1376))
@ -42,7 +42,7 @@ title: Version 3.25
* Fix EMA smoothing in UI scalars is incorrect in first data point ([ClearML Web GitHub issue #101](https://github.com/clearml/clearml-web/issues/101)) * Fix EMA smoothing in UI scalars is incorrect in first data point ([ClearML Web GitHub issue #101](https://github.com/clearml/clearml-web/issues/101))
* Improve UI scalar smoothing algorithms (ClearML Web GitHub issues [#101](https://github.com/clearml/clearml-web/issues/101), [#102](https://github.com/clearml/clearml-web/issues/102), [#103](https://github.com/clearml/clearml-web/issues/103)) * Improve UI scalar smoothing algorithms (ClearML Web GitHub issues [#101](https://github.com/clearml/clearml-web/issues/101), [#102](https://github.com/clearml/clearml-web/issues/102), [#103](https://github.com/clearml/clearml-web/issues/103))
* Fix UI Users & Groups table's "Groups" column data remains condensed after column is expanded * Fix UI Users & Groups table's "Groups" column data remains condensed after column is expanded
* Fix setting service users as admins causes apiserver to crash * Fix setting service accounts as admins causes apiserver to crash
* Fix UI "New Dataview" modal's version selection sometimes does not display draft versions * Fix UI "New Dataview" modal's version selection sometimes does not display draft versions
* Fix GCS and Azure credential input popups not displaying in UI task debug samples * Fix GCS and Azure credential input popups not displaying in UI task debug samples
* Fix UI pipeline "Preview" tab sometimes displays "Failed to get plot charts" error * Fix UI pipeline "Preview" tab sometimes displays "Failed to get plot charts" error


@ -0,0 +1,33 @@
---
title: Version 2.0
---
### ClearML 2.0.0
**New Features**
* Clean up exception handling in `cleanup_service.py` ([ClearML GitHub PR #1386](https://github.com/clearml/clearml/pull/1386))
* Add support for `clearml-task` command line options `--force-no-requirements`, `--skip-repo-detection`, and `--skip-python-env-install`
* Allow calling the same pipeline step multiple times with inputs that originate from tasks/controller
* Add `Task.upload_artifact()` argument `sort_keys` to allow disabling sorting yaml/json keys when uploading artifacts
* Add Python annotations to all methods
* Update `pyjwt` constraint version
**Bug Fixes**
* Fix local file uploads without scheme ([ClearML GitHub PR #1313](https://github.com/clearml/clearml/pull/1313))
* Fix argument order mismatch in `PipelineController` ([ClearML GitHub PR #1406](https://github.com/clearml/clearml/pull/1406))
* Fix `_logger` property might be `None` in Session ([ClearML GitHub PR #1412](https://github.com/clearml/clearml/pull/1412))
* Fix unhandled `None` value in project IDs when listing all datasets ([ClearML GitHub PR #1413](https://github.com/clearml/clearml/pull/1413))
* Fix typo in config exception string ([ClearML GitHub PR #1418](https://github.com/clearml/clearml/pull/1418))
* Fix experiments are created twice during HPO ([ClearML GitHub issue #644](https://github.com/clearml/clearml/issues/644))
* Fix `clearml-task-run` HPO breaks up ([ClearML GitHub issue #1151](https://github.com/clearml/clearml/issues/1151))
* Fix oversized event reports cause subsequent events to be lost ([ClearML GitHub issue #1316](https://github.com/clearml/clearml/issues/1316))
* Fix downloading datasets with multiple parents might not work ([ClearML GitHub issue #1398](https://github.com/clearml/clearml/issues/1398))
* Fix GPU reporting fails to detect GPU when the `NVIDIA_VISIBLE_DEVICES` env var contains a directory reference
* Fix verify configuration option for S3 storage (boto3) is not used when testing buckets
* Fix `PipelineDecorator.component()` ignores `*args` and crashes with `**kwargs`
* Fix Pipelines run via `clearml-task` do not appear in the UI
* Fix task log URL print for API v2.31 should show `"/tasks/{}/output/log"`
* Fix tqdm upload/download reporting, remove warning
* Fix pipeline from CLI with no args fails
* Fix pillow constraint for `Python<=3.7`
* Fix requests constraint for `Python<3.8`


@ -328,10 +328,10 @@ module.exports = {
{ {
'Open Source': 'Open Source':
[ [
'release_notes/sdk/open_source/ver_1_18', 'release_notes/sdk/open_source/ver_2_0',
{ {
'Older Versions': [ 'Older Versions': [
'release_notes/sdk/open_source/ver_1_17', 'release_notes/sdk/open_source/ver_1_18', 'release_notes/sdk/open_source/ver_1_17',
'release_notes/sdk/open_source/ver_1_16', 'release_notes/sdk/open_source/ver_1_15', 'release_notes/sdk/open_source/ver_1_16', 'release_notes/sdk/open_source/ver_1_15',
'release_notes/sdk/open_source/ver_1_14', 'release_notes/sdk/open_source/ver_1_13', 'release_notes/sdk/open_source/ver_1_14', 'release_notes/sdk/open_source/ver_1_13',
'release_notes/sdk/open_source/ver_1_12', 'release_notes/sdk/open_source/ver_1_11', 'release_notes/sdk/open_source/ver_1_12', 'release_notes/sdk/open_source/ver_1_11',