From 9b0dd064b7e9237dcbd8a6cb8266717a5291de4b Mon Sep 17 00:00:00 2001
From: revital <revital@allegro.ai>
Date: Wed, 11 Jun 2025 11:39:53 +0300
Subject: [PATCH] Add AWS Autoscaler AWS NVIDIA GPU Support  admonition

---
 .../applications/apps_aws_autoscaler.md       | 52 ++++++++++++++-----
 1 file changed, 39 insertions(+), 13 deletions(-)

diff --git a/docs/webapp/applications/apps_aws_autoscaler.md b/docs/webapp/applications/apps_aws_autoscaler.md
index 3068db42..a2630785 100644
--- a/docs/webapp/applications/apps_aws_autoscaler.md
+++ b/docs/webapp/applications/apps_aws_autoscaler.md
@@ -19,6 +19,40 @@ each instance is spun up.
 
 For more information about how autoscalers work, see [Autoscalers Overview](../../cloud_autoscaling/autoscaling_overview.md#autoscaler-applications).
 
+:::info AWS NVIDIA GPU Support   
+* Recent NVIDIA AMIs only install the required drivers on initial user login. To make use of such AMIs, the autoscaler 
+  needs to mimic an initial user login. This can be accomplished by, adding the following script to the `Init script`
+  field in the app instance launch form:
+    
+  ```
+  apt-get update
+  DEBIAN_FRONTEND=noninteractive apt-get -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" upgrade
+  su -l ubuntu -c '/usr/bin/bash /home/ubuntu/.profile'
+  ```
+
+* Sometimes training jobs may fail to detect GPUs after autoscaler provisioning, due to Nvidia drivers loading slowly.
+  To ensure GPU detection, add the following to the `Init script` field in the application instance launch form:
+
+  ```bash
+  cat /etc/docker/daemon.json 
+
+  sed -i 's|"runtimes": {|"exec-opts": ["native.cgroupdriver=cgroupfs"], "runtimes": {|g' /etc/docker/daemon.json
+
+  cat /etc/docker/daemon.json 
+
+  sed -i 's|#no-cgroups = false|no-cgroups = false|g' /etc/nvidia-container-runtime/config.toml
+
+  cat /etc/nvidia-container-runtime/config.toml
+
+  systemctl daemon-reload
+  systemctl restart docker
+
+  echo "try nvidia"
+  docker run -t --rm --ipc=host --gpus all bitnami/minideb:bullseye bash -c "nvidia-smi && apt update && nvidia-smi" 
+  ```
+
+:::
+
 ## Autoscaler Instance Configuration
 
 When configuring a new AWS Autoscaler instance, you can fill in the required parameters or reuse the configuration of 
@@ -69,19 +103,11 @@ to open the app's instance launch form.
     * Availability Zone - The [EC2 availability zone](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html#Concepts.RegionsAndAvailabilityZones.AvailabilityZones) 
       to launch this resource in
     * AMI ID - The AWS AMI to launch
-    :::note AMI prerequisites
-    The AMI used for the autoscaler must include docker runtime and virtualenv.
-    
-    Recent NVIDIA AMIs only install the required drivers on initial user login. To make use of such AMIs, the autoscaler 
-    needs to mimic an initial user login. This can be accomplished by, adding the following script to the `Init script`
-    field:
-    
-    ```
-    apt-get update
-    DEBIAN_FRONTEND=noninteractive apt-get -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" upgrade
-    su -l ubuntu -c '/usr/bin/bash /home/ubuntu/.profile'
-    ```
-    :::
+     
+      :::note AMI prerequisites
+      The AMI used for the autoscaler must include docker runtime and virtualenv.
+      :::
+  
     * Max Number of Instances - Maximum number of concurrent running instances of this type allowed
     * Monitored Queue - Queue associated with this instance type. The tasks enqueued to this queue will be executed on 
       instances of this type