Mirror of https://github.com/clearml/clearml-docs (synced 2025-05-20 03:58:13 +00:00)

Commit 60be54d54b ("Small edits"), parent 9420513df8
@@ -125,7 +125,7 @@ If the data and the configuration need to be restored:

The following section contains a list of Custom Image URLs (exported in different formats) for each released ClearML Server version.

-### Latest Version - v1.13.1
+### Latest Version - v1.13.0

- [https://storage.googleapis.com/allegro-files/clearml-server/clearml-server.tar.gz](https://storage.googleapis.com/allegro-files/clearml-server/clearml-server.tar.gz)
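The exported archive above can be fetched and loaded locally. A minimal sketch, printed as a dry run so it can be reviewed first; the `docker load` step assumes the archive is a `docker save` export, which the page does not state:

```bash
# Derive the local filename from the published URL, then print the
# download-and-load commands (dry run; remove the echoes to execute).
URL="https://storage.googleapis.com/allegro-files/clearml-server/clearml-server.tar.gz"
FILE="${URL##*/}"   # strip everything up to the last '/' -> clearml-server.tar.gz
echo "curl -fSL $URL -o $FILE"
echo "docker load -i $FILE   # assumption: archive is a 'docker save' export"
```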
@@ -8,7 +8,7 @@ The Application Gateway is available under the ClearML Enterprise plan.

The AI Application Gateway enables external HTTP(S) or direct TCP access to ClearML tasks and applications running on
nodes. The gateway is configured with an endpoint or external address, making these services accessible from the user's
-machine, outside the workload’ network.
+machine, outside the workload's network.

This guide describes how to install and run the ClearML AI Application Gateway using docker-compose for environments
where you manage both the ClearML Server and the workload nodes.
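The docker-compose workflow described above can be sketched as a dry run. The directory name is a placeholder; the actual compose bundle layout comes with the Enterprise package:

```bash
# Print the typical gateway lifecycle commands (dry run; the compose
# directory name is a hypothetical placeholder, not the real bundle path).
COMPOSE_DIR="./clearml-app-gateway"
for CMD in "docker compose up -d" "docker compose ps" "docker compose logs -f"; do
  echo "cd $COMPOSE_DIR && $CMD"
done
```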
@@ -84,6 +84,8 @@ agentk8sglue:
+    # -- Custom Bash script for the Task Pods run by the Glue Agent
+    containerCustomBashScript: ""
  ```
:::

## Examples
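The `containerCustomBashScript` value added above can be set in a Helm values override. A minimal sketch; the override filename and the script body are illustrative placeholders, not chart defaults:

```bash
# Write a values override that sets a custom bash script for Task Pods
# (hypothetical file name and script content), then sanity-check it.
cat > agent-values.override.yaml <<'EOF'
agentk8sglue:
  containerCustomBashScript: |
    echo "running custom pre-task setup"
EOF
grep -q "containerCustomBashScript" agent-values.override.yaml && echo "override written"
```

It would then be applied with `helm upgrade ... -f agent-values.override.yaml` against the agent chart.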
@@ -246,6 +248,7 @@ agentk8sglue:
        - mountPath: "/tmp/task/"
          name: task-pvc
  ```
:::

### Example: Required Role
@@ -29,7 +29,7 @@ You have two configuration options:

- [**Append**](#append-extra-certificates-to-the-existing-ca-certificatescrt) extra certificates to the existing `ca-certificates.crt`

-### Replace Entire `ca-certificates.crt` File
+### Replace Entire ca-certificates.crt File

To replace the whole CA bundle, provide a concatenated list of all trusted CA certificates in `pem` format, as
they are stored in a standard `ca-certificates.crt`.
@@ -51,7 +51,7 @@ customCertificates:
  ...
  ```

-### Append Extra Certificates to the Existing `ca-certificates.crt`
+### Append Extra Certificates to the Existing ca-certificates.crt

You can add certificates to the existing CA bundle. Each certificate must have a unique `alias`.
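The append option described above might look like the following sketch. The `extraCerts`/`pem` field names are assumptions inferred from the `customCertificates` fragment; check the chart's `values.yaml` for the real schema:

```bash
# Write a values override appending one CA certificate with a unique alias
# (field names below customCertificates are assumptions, not confirmed schema).
cat > certs-values.override.yaml <<'EOF'
customCertificates:
  extraCerts:
    - alias: my-org-root-ca
      pem: |
        -----BEGIN CERTIFICATE-----
        (certificate body in pem format)
        -----END CERTIFICATE-----
EOF
grep -q "alias: my-org-root-ca" certs-values.override.yaml && echo "override written"
```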
@@ -104,7 +104,7 @@ You have two configuration options:

- [**Append**](#append-extra-certificates-to-the-existing-ca-certificatescrt-1) extra certificates to the existing `ca-certificates.crt`

-### Replace Entire `ca-certificates.crt` File
+### Replace Entire ca-certificates.crt File

To replace the whole CA bundle, provide a concatenated list of all trusted CA certificates in `pem` format, as
they are stored in a standard `ca-certificates.crt`.
@@ -127,7 +127,7 @@ customCertificates:
  ...
  ```

-### Append Extra Certificates to the Existing `ca-certificates.crt`
+### Append Extra Certificates to the Existing ca-certificates.crt

You can add certificates to the existing CA bundle. Each certificate must have a unique `alias`.
@@ -52,37 +52,37 @@ The ClearML Dynamic MIG Operator (CDMO) enables dynamic MIG (Multi-Instance GPU)

### Installing CDMO

1. Create a `cdmo-values.override.yaml` file with the following content:

   ```yaml
   imageCredentials:
     password: "<CLEARML_DOCKERHUB_TOKEN>"
   ```

1. Install the CDMO Helm Chart using the previous override file:

   ```bash
   helm install -n cdmo cdmo clearml-enterprise/clearml-dynamic-mig-operator --create-namespace -f cdmo-values.override.yaml
   ```

1. Enable NVIDIA MIG support on your cluster by running the following command on all nodes with a MIG-supported GPU
   (run it for each GPU `<GPU_ID>` on the host):

   ```bash
   nvidia-smi -mig 1
   ```

   :::note notes
   * A node reboot may be required if the command output indicates so.
   * For convenience, this command can be run from within the `nvidia-device-plugin-daemonset` pod running on the related node.
   :::

1. Label each MIG-enabled GPU node `<NODE_NAME>` from the previous step:

   ```bash
   kubectl label nodes <NODE_NAME> "cdmo.clear.ml/gpu-partitioning=mig"
   ```
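For clusters with several MIG nodes, the per-GPU enable and per-node label steps above can be batched. A dry-run sketch; the node names and GPU IDs are illustrative placeholders, and the `-i <GPU_ID>` flag selects the target GPU:

```bash
# Print the enable + label commands for each node and GPU (dry run;
# node and GPU lists below are placeholders for your inventory).
NODES="gpu-node-1 gpu-node-2"
GPU_IDS="0 1"
for NODE in $NODES; do
  for GPU_ID in $GPU_IDS; do
    echo "nvidia-smi -i $GPU_ID -mig 1        # run on $NODE"
  done
  echo "kubectl label nodes $NODE \"cdmo.clear.ml/gpu-partitioning=mig\""
done
```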
## Disabling MIGs
@@ -33,75 +33,75 @@ helm repo update

1. Create a Docker Registry secret named `clearml-dockerhub-access` in the `gpu-operator` namespace. Make sure to replace `<CLEARML_DOCKERHUB_TOKEN>` with your token:

   ```bash
   kubectl create secret -n gpu-operator docker-registry clearml-dockerhub-access \
     --docker-server=docker.io \
     --docker-username=allegroaienterprise \
     --docker-password="<CLEARML_DOCKERHUB_TOKEN>" \
     --docker-email=""
   ```

1. Create a `gpu-operator.override.yaml` file as follows:
   * Set `devicePlugin.repository` to `docker.io/clearml`
   * Configure `devicePlugin.config.data.renamed-resources.sharing.timeSlicing.resources` for each GPU index on the host
   * Use the `nvidia.com/gpu-<INDEX>` format for the `rename` field, and set `replicas` to `8`

   ```yaml
   gfd:
     imagePullSecrets:
     - "clearml-dockerhub-access"
   toolkit:
     env:
     - name: ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED
       value: "false"
     - name: ACCEPT_NVIDIA_VISIBLE_DEVICES_AS_VOLUME_MOUNTS
       value: "true"
   devicePlugin:
     repository: docker.io/clearml
     image: k8s-device-plugin
     version: v0.17.1-gpu-card-selection
     imagePullPolicy: Always
     imagePullSecrets:
     - "clearml-dockerhub-access"
     env:
     - name: PASS_DEVICE_SPECS
       value: "true"
     - name: FAIL_ON_INIT_ERROR
       value: "true"
     - name: DEVICE_LIST_STRATEGY # Use volume-mounts
       value: volume-mounts
     - name: DEVICE_ID_STRATEGY
       value: uuid
     - name: NVIDIA_VISIBLE_DEVICES
       value: all
     - name: NVIDIA_DRIVER_CAPABILITIES
       value: all
     config:
       name: device-plugin-config
       create: true
       default: "renamed-resources"
       data:
         renamed-resources: |-
           version: v1
           flags:
             migStrategy: none
           sharing:
             timeSlicing:
               renameByDefault: false
               failRequestsGreaterThanOne: false
               # Edit the following configuration as needed, adding one entry per GPU index installed on the host
               resources:
               - name: nvidia.com/gpu
                 rename: nvidia.com/gpu-0
                 devices:
                 - "0"
                 replicas: 8
               - name: nvidia.com/gpu
                 rename: nvidia.com/gpu-1
                 devices:
                 - "1"
                 replicas: 8
   ```
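Since the `timeSlicing.resources` list above needs one entry per GPU index, the repetitive entries can be generated rather than typed by hand. A small sketch; the GPU count and replica value are illustrative:

```bash
# Emit one renamed-resources entry per GPU index, ready to paste under
# sharing.timeSlicing.resources (NUM_GPUS and REPLICAS are placeholders).
NUM_GPUS=2
REPLICAS=8
for IDX in $(seq 0 $((NUM_GPUS - 1))); do
  cat <<EOF
- name: nvidia.com/gpu
  rename: nvidia.com/gpu-$IDX
  devices:
  - "$IDX"
  replicas: $REPLICAS
EOF
done
```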
#### For CFGI version < 1.3.0 (Legacy)
@@ -150,22 +150,22 @@ devicePlugin:

1. Install the NVIDIA `gpu-operator` using the previously created `gpu-operator.override.yaml` file:

   ```bash
   helm install -n gpu-operator gpu-operator nvidia/gpu-operator --create-namespace -f gpu-operator.override.yaml
   ```

1. Create a `cfgi-values.override.yaml` file with the following content:

   ```yaml
   imageCredentials:
     password: "<CLEARML_DOCKERHUB_TOKEN>"
   ```

1. Install the CFGI Helm Chart using the previous override file:

   ```bash
   helm install -n cfgi cfgi clearml-enterprise/clearml-fractional-gpu-injector --create-namespace -f cfgi-values.override.yaml
   ```
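After the installs above, a quick post-install check can be sketched as a dry run (with cluster access you would run the commands directly):

```bash
# Print verification commands for both releases installed above (dry run).
for NS in gpu-operator cfgi; do
  echo "helm list -n $NS"
  echo "kubectl get pods -n $NS"
done
```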
## Usage