mirror of
https://github.com/clearml/clearml-docs
synced 2025-02-07 13:21:46 +00:00
Rewrite K8s static MIG fraction section (#876)
parent ce6cfecc8a
commit cbb65cb974
@@ -317,9 +317,10 @@ See example custom Dockerfiles in the [clearml-fractional-gpu repository](https:
 Set up NVIDIA MIG (Multi-Instance GPU) support for Kubernetes to define GPU fraction profiles for specific workloads
 through your NVIDIA device plugin.
 
-The ClearML Agent Helm chart lets you specify a pod template for each queue which describes the resources that the pod
-will use. The template should specify the requested GPU slices under `Containers.resources.limits` to have the pods use
-the defined resources. For example, the following configures a K8s pod to run a `3g.20gb` MIG device:
+The standard way to configure a Kubernetes pod template to use specific MIG slices is for the template to specify the
+requested GPU slices under `Containers.resources.limits`. For example, the
+following configures a K8s pod to run a 3g.20gb MIG device:
+
 ```
 # tf-benchmarks-mixed.yaml
 apiVersion: v1
@@ -340,6 +341,10 @@ spec:
 nvidia.com/gpu.product: A100-SXM4-40GB
 ```
 
+The ClearML Agent Helm chart lets you specify a pod template for each queue which describes the resources that the pod
+will use. The ClearML Agent uses this configuration to generate the necessary Kubernetes pod template for executing
+tasks based on the queue through which they are scheduled.
+
 When tasks are added to the relevant queue, the agent pulls the task and creates a pod to execute it, using the
 specified GPU slice.
 
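For readers of this diff, here is a minimal sketch of what requesting a MIG slice under a container's `resources.limits` can look like. It is not the file from the commit (the lines between the two hunks are not shown here). It assumes the NVIDIA device plugin runs with the `mixed` MIG strategy, which exposes slices under resource names of the form `nvidia.com/mig-<profile>`. The pod name, container name, and image are hypothetical placeholders.

```yaml
# Illustrative sketch only; not the tf-benchmarks-mixed.yaml from the commit.
apiVersion: v1
kind: Pod
metadata:
  name: mig-slice-example              # hypothetical pod name
spec:
  restartPolicy: Never
  containers:
    - name: worker                     # hypothetical container name
      image: example.registry/my-gpu-image:latest   # placeholder image, replace with your own
      resources:
        limits:
          nvidia.com/mig-3g.20gb: 1    # one 3g.20gb slice (assumes the "mixed" MIG strategy)
  nodeSelector:
    nvidia.com/gpu.product: A100-SXM4-40GB   # label taken from the diff context above
```

The same `resources.limits` entry would go into the per-queue pod template in the ClearML Agent Helm chart values. The exact key layout there depends on the chart version, so check the chart's values file before copying it.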