commit 90a412d9db (mirror of https://github.com/clearml/clearml-docs)
@@ -19,6 +19,40 @@ each instance is spun up.

For more information about how autoscalers work, see [Autoscalers Overview](../../cloud_autoscaling/autoscaling_overview.md#autoscaler-applications).

:::info AWS NVIDIA GPU Support
* Recent NVIDIA AMIs only install the required drivers on initial user login. To make use of such AMIs, the autoscaler
  needs to mimic an initial user login. This can be accomplished by adding the following script to the `Init script`
  field in the app instance launch form:

  ```bash
  # Update packages non-interactively, keeping existing configuration files on conflicts
  apt-get update
  DEBIAN_FRONTEND=noninteractive apt-get -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" upgrade
  # Simulate an initial login for the "ubuntu" user so the AMI installs the NVIDIA drivers
  su -l ubuntu -c '/usr/bin/bash /home/ubuntu/.profile'
  ```

* Training jobs may sometimes fail to detect GPUs after autoscaler provisioning because the NVIDIA drivers load slowly.
  To ensure GPU detection, add the following to the `Init script` field in the application instance launch form:

  ```bash
  # Print the current Docker daemon configuration for reference
  cat /etc/docker/daemon.json

  # Switch Docker to the cgroupfs cgroup driver
  sed -i 's|"runtimes": {|"exec-opts": ["native.cgroupdriver=cgroupfs"], "runtimes": {|g' /etc/docker/daemon.json
  cat /etc/docker/daemon.json

  # Keep cgroup support enabled in the NVIDIA container runtime (uncomment "no-cgroups = false")
  sed -i 's|#no-cgroups = false|no-cgroups = false|g' /etc/nvidia-container-runtime/config.toml
  cat /etc/nvidia-container-runtime/config.toml

  # Reload systemd units and restart Docker to apply the changes
  systemctl daemon-reload
  systemctl restart docker

  # Verify that containers can see the GPUs
  echo "try nvidia"
  docker run -t --rm --ipc=host --gpus all bitnami/minideb:bullseye bash -c "nvidia-smi && apt update && nvidia-smi"
  ```
:::
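
After provisioning, you can optionally verify on the instance that the changes above took effect. This is only a sketch, not something the autoscaler requires; it assumes the AMI already ships Docker and the NVIDIA container toolkit:

```bash
# Confirm Docker now uses the cgroupfs cgroup driver set by the init script above
docker info --format '{{.CgroupDriver}}'   # expected: cgroupfs

# Confirm the host driver is loaded and that containers can see the GPUs
nvidia-smi --query-gpu=name,driver_version --format=csv
docker run --rm --gpus all bitnami/minideb:bullseye nvidia-smi
```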

## Autoscaler Instance Configuration

When configuring a new AWS Autoscaler instance, you can fill in the required parameters or reuse the configuration of
@@ -69,19 +103,11 @@ to open the app's instance launch form.
* Availability Zone - The [EC2 availability zone](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html#Concepts.RegionsAndAvailabilityZones.AvailabilityZones)
  to launch this resource in
* AMI ID - The AWS AMI to launch
  :::note AMI prerequisites
  The AMI used for the autoscaler must include docker runtime and virtualenv.

  Recent NVIDIA AMIs only install the required drivers on initial user login. To make use of such AMIs, the autoscaler
  needs to mimic an initial user login. This can be accomplished by adding the following script to the `Init script`
  field:

  ```bash
  apt-get update
  DEBIAN_FRONTEND=noninteractive apt-get -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" upgrade
  su -l ubuntu -c '/usr/bin/bash /home/ubuntu/.profile'
  ```
  :::

  :::note AMI prerequisites
  The AMI used for the autoscaler must include docker runtime and virtualenv (a quick check is sketched after this list).
  :::
* Max Number of Instances - Maximum number of concurrent running instances of this type allowed
* Monitored Queue - Queue associated with this instance type. The tasks enqueued to this queue will be executed on
  instances of this type
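
The note above says the AMI must provide a docker runtime and virtualenv. As a quick sanity check, you can run something like the following on an instance launched from a candidate AMI; this is only a sketch, not something the autoscaler runs for you:

```bash
# Verify the Docker runtime is installed and the daemon is reachable
docker --version
docker info > /dev/null && echo "docker daemon OK"

# Verify virtualenv is available
virtualenv --version
```

To route work to this instance type, enqueue tasks to the monitored queue configured above. A minimal sketch using the `clearml-task` CLI (the project, script, and queue names here are placeholders):

```bash
clearml-task --project examples --name remote_gpu_training \
  --script train.py --queue aws_gpu_queue
```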