mirror of https://github.com/clearml/clearml-docs synced 2025-05-24 14:04:55 +00:00
Fixed: cdmo
parent ca17d1563a
commit 26fd03a81d
@@ -1,8 +1,8 @@
 ---
-title: Managing GPU Fragments with ClearML Dynamic MIG Operator (CDMO)
+title: Managing GPU Fractions with ClearML Dynamic MIG Operator (CDMO)
 ---
 
-This guide covers using GPU fragments in Kubernetes clusters using NVIDIA MIGs and
+This guide covers using GPU fractions in Kubernetes clusters using NVIDIA MIGs and
 ClearML's Dynamic MIG Operator (CDMO). CDMO enables dynamic MIG (Multi-Instance GPU) configurations.
 
 This guide covers:
@@ -14,7 +14,46 @@ This guide covers:
 
 ### Requirements
 
-* Install the NVIDIA `gpu-operator` using Helm. For instructions, see [Basic Deployment](../extra_configs/gpu_operator.md).
+* Add and update the Nvidia Helm repo:
+
+```bash
+helm repo add nvidia https://nvidia.github.io/gpu-operator
+helm repo update
+```
+
+* Create a `gpu-operator.override.yaml` file with the following content:
+
+```yaml
+migManager:
+  enabled: false
+mig:
+  strategy: mixed
+toolkit:
+  env:
+    - name: ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED
+      value: "false"
+    - name: ACCEPT_NVIDIA_VISIBLE_DEVICES_AS_VOLUME_MOUNTS
+      value: "true"
+devicePlugin:
+  env:
+    - name: PASS_DEVICE_SPECS
+      value: "true"
+    - name: FAIL_ON_INIT_ERROR
+      value: "true"
+    - name: DEVICE_LIST_STRATEGY # Use volume-mounts
+      value: volume-mounts
+    - name: DEVICE_ID_STRATEGY
+      value: uuid
+    - name: NVIDIA_VISIBLE_DEVICES
+      value: all
+    - name: NVIDIA_DRIVER_CAPABILITIES
+      value: all
+```
+* Install the NVIDIA `gpu-operator` using Helm with the previous configuration:
+
+```bash
+helm install -n gpu-operator gpu-operator nvidia/gpu-operator --create-namespace -f gpu-operator.override.yaml
+```
 
 ### Installing CDMO
 
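The requirement steps added in this commit could be collected into one script along the following lines. This is only a sketch: the `OVERRIDE_FILE` variable and the choice to print the Helm commands rather than run them are assumptions made for illustration, not part of the ClearML docs. The YAML nesting is assumed to follow the usual `gpu-operator` chart values layout, since the rendered diff does not preserve indentation.

```bash
#!/usr/bin/env bash
# Sketch only: writes the override file described in the diff and prints
# the Helm commands instead of executing them.
set -eu

# Hypothetical variable; the docs use the fixed name gpu-operator.override.yaml.
OVERRIDE_FILE="${OVERRIDE_FILE:-gpu-operator.override.yaml}"

# Write the override values (nesting assumed, see the note above).
cat > "$OVERRIDE_FILE" <<'EOF'
migManager:
  enabled: false
mig:
  strategy: mixed
toolkit:
  env:
    - name: ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED
      value: "false"
    - name: ACCEPT_NVIDIA_VISIBLE_DEVICES_AS_VOLUME_MOUNTS
      value: "true"
devicePlugin:
  env:
    - name: PASS_DEVICE_SPECS
      value: "true"
    - name: FAIL_ON_INIT_ERROR
      value: "true"
    - name: DEVICE_LIST_STRATEGY # Use volume-mounts
      value: volume-mounts
    - name: DEVICE_ID_STRATEGY
      value: uuid
    - name: NVIDIA_VISIBLE_DEVICES
      value: all
    - name: NVIDIA_DRIVER_CAPABILITIES
      value: all
EOF

# Fail early if the file was not written as expected.
grep -q 'strategy: mixed' "$OVERRIDE_FILE"

# Print the commands from the diff; run them by hand once reviewed.
echo "helm repo add nvidia https://nvidia.github.io/gpu-operator"
echo "helm repo update"
echo "helm install -n gpu-operator gpu-operator nvidia/gpu-operator --create-namespace -f $OVERRIDE_FILE"
```

Printing the commands keeps the sketch safe to run on a machine without cluster access; the actual installation still requires running the three `helm` commands against the target cluster.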