Update nvidia-container-runtime README

Signed-off-by: Evan Lezar <elezar@nvidia.com>
This commit is contained in:
Evan Lezar 2023-10-23 13:35:19 +02:00
parent 53b24618a5
commit ebff62f56b


@@ -85,3 +85,126 @@ Alternatively the NVIDIA Container Runtime can be set as the default runtime for
}
}
```
## Environment variables (OCI spec)
Each environment variable maps to a command-line argument for `nvidia-container-cli` from [libnvidia-container](https://github.com/NVIDIA/libnvidia-container).
These variables are already set in our [official CUDA images](https://hub.docker.com/r/nvidia/cuda/).
### `NVIDIA_VISIBLE_DEVICES`
This variable controls which GPUs will be made accessible inside the container.
#### Possible values
* `0,1,2`, `GPU-fef8089b` …: a comma-separated list of GPU UUID(s) or index(es).
* `all`: all GPUs will be accessible; this is the default value in our container images.
* `none`: no GPU will be accessible, but driver capabilities will be enabled.
* `void` or *empty* or *unset*: `nvidia-container-runtime` will have the same behavior as `runc`.
**Note**: When running on a MIG-capable device, the following values will also be available:
* `0:0,0:1,1:0`, `MIG-GPU-fef8089b/0/1` …: a comma-separated list of MIG Device UUID(s) or index(es).
Where the MIG device indices have the form `<GPU Device Index>:<MIG Device Index>` as seen in the example output:
```
$ nvidia-smi -L
GPU 0: Graphics Device (UUID: GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5)
  MIG Device 0: (UUID: MIG-GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5/1/0)
  MIG Device 1: (UUID: MIG-GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5/1/1)
  MIG Device 2: (UUID: MIG-GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5/11/0)
```
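As an illustration, with a container engine such as Docker configured to use the NVIDIA runtime as described above, device selection might look like the following sketch (the image tag and UUID are example values):
```sh
# Expose only the first GPU to the container (index-based selection).
docker run --rm --runtime=nvidia \
    -e NVIDIA_VISIBLE_DEVICES=0 \
    nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

# Select a device by UUID instead of index (the UUID is illustrative).
docker run --rm --runtime=nvidia \
    -e NVIDIA_VISIBLE_DEVICES=GPU-fef8089b \
    nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```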
### `NVIDIA_MIG_CONFIG_DEVICES`
This variable controls which of the visible GPUs can have their MIG
configuration managed from within the container. This includes enabling and
disabling MIG mode, creating and destroying GPU Instances and Compute
Instances, etc.
#### Possible values
* `all`: Allow all MIG-capable GPUs in the visible device list to have their
MIG configurations managed.
**Note**:
* This feature is only available on MIG-capable devices (e.g. the A100).
* To use this feature, the container must be started with `CAP_SYS_ADMIN` privileges.
* When not running as `root`, the container user must have read access to the
`/proc/driver/nvidia/capabilities/mig/config` file on the host.
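As a sketch of the requirements above, assuming Docker as the engine and an illustrative image tag:
```sh
# MIG configuration management requires CAP_SYS_ADMIN (see note above).
docker run --rm --runtime=nvidia --cap-add=SYS_ADMIN \
    -e NVIDIA_VISIBLE_DEVICES=all \
    -e NVIDIA_MIG_CONFIG_DEVICES=all \
    nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```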
### `NVIDIA_MIG_MONITOR_DEVICES`
This variable controls which of the visible GPUs can have aggregate information
about all of their MIG devices monitored from within the container. This
includes inspecting the aggregate memory usage, listing the aggregate running
processes, etc.
#### Possible values
* `all`: Allow all MIG-capable GPUs in the visible device list to have their
MIG devices monitored.
**Note**:
* This feature is only available on MIG-capable devices (e.g. the A100).
* To use this feature, the container must be started with `CAP_SYS_ADMIN` privileges.
* When not running as `root`, the container user must have read access to the
`/proc/driver/nvidia/capabilities/mig/monitor` file on the host.
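Analogously to the configuration case, a hedged sketch assuming Docker and an illustrative image tag:
```sh
# Aggregate MIG monitoring also requires CAP_SYS_ADMIN (see note above).
docker run --rm --runtime=nvidia --cap-add=SYS_ADMIN \
    -e NVIDIA_VISIBLE_DEVICES=all \
    -e NVIDIA_MIG_MONITOR_DEVICES=all \
    nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```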
### `NVIDIA_DRIVER_CAPABILITIES`
This option controls which driver libraries/binaries will be mounted inside the container.
#### Possible values
* `compute,video`, `graphics,utility` …: a comma-separated list of driver features the container needs.
* `all`: enable all available driver capabilities.
* *empty* or *unset*: use the default driver capabilities: `utility,compute`.
#### Supported driver capabilities
* `compute`: required for CUDA and OpenCL applications.
* `compat32`: required for running 32-bit applications.
* `graphics`: required for running OpenGL and Vulkan applications.
* `utility`: required for using `nvidia-smi` and NVML.
* `video`: required for using the Video Codec SDK.
* `display`: required for leveraging the X11 display.
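For example, a video-transcoding workload might request only the capabilities it needs; the following is a sketch assuming Docker and an illustrative image tag:
```sh
# Mount only the video and utility driver libraries/binaries into the container.
docker run --rm --runtime=nvidia \
    -e NVIDIA_VISIBLE_DEVICES=all \
    -e NVIDIA_DRIVER_CAPABILITIES=video,utility \
    nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```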
### `NVIDIA_REQUIRE_*`
A logical expression to define constraints on the configurations supported by the container.
#### Supported constraints
* `cuda`: constraint on the CUDA driver version.
* `driver`: constraint on the driver version.
* `arch`: constraint on the compute architectures of the selected GPUs.
* `brand`: constraint on the brand of the selected GPUs (e.g. GeForce, Tesla, GRID).
#### Expressions
Multiple constraints can be expressed in a single environment variable: space-separated constraints are ORed, comma-separated constraints are ANDed.
Multiple environment variables of the form `NVIDIA_REQUIRE_*` are ANDed together.
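The following sketch illustrates these semantics; the variable suffixes, constraint values, and image tag are illustrative only:
```sh
# The two NVIDIA_REQUIRE_* variables are ANDed together; within
# NVIDIA_REQUIRE_BRAND the two space-separated terms are ORed.
docker run --rm --runtime=nvidia \
    -e "NVIDIA_REQUIRE_DRIVER=driver>=450" \
    -e "NVIDIA_REQUIRE_BRAND=brand=tesla brand=geforce" \
    nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```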
### `NVIDIA_DISABLE_REQUIRE`
Single switch to disable all the constraints of the form `NVIDIA_REQUIRE_*`.
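A sketch of the switch, assuming Docker; the truthy value shown is one common choice:
```sh
# Bypass all NVIDIA_REQUIRE_* checks, e.g. to start an image whose CUDA
# requirement is newer than the installed driver supports (at your own risk).
docker run --rm --runtime=nvidia \
    -e NVIDIA_DISABLE_REQUIRE=true \
    nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```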
### `NVIDIA_REQUIRE_CUDA`
The version of the CUDA toolkit used by the container. It is an instance of the generic `NVIDIA_REQUIRE_*` case and is set by the official CUDA images.
If the version of the NVIDIA driver is insufficient to run this version of CUDA, the container will not be started.
#### Possible values
* `cuda>=7.5`, `cuda>=8.0`, `cuda>=9.0` …: any valid CUDA version in the form `major.minor`.
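For reference, the constraint can also be supplied explicitly at container start; the version and image tag below are illustrative:
```sh
# Refuse to start the container if the installed driver cannot support CUDA 11.8.
docker run --rm --runtime=nvidia \
    -e "NVIDIA_REQUIRE_CUDA=cuda>=11.8" \
    nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```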
### `CUDA_VERSION`
Similar to `NVIDIA_REQUIRE_CUDA`, for legacy CUDA images.
In addition, if `NVIDIA_REQUIRE_CUDA` is not set, `NVIDIA_VISIBLE_DEVICES` and `NVIDIA_DRIVER_CAPABILITIES` will default to `all`.
## Usage example
**NOTE:** The use of the `nvidia-container-runtime` as a CLI replacement for `runc` is uncommon and is only provided for completeness.
Although the `nvidia-container-runtime` is typically configured as a replacement for `runc` or `crun` in various container engines, it can also be
invoked from the command line as `runc` would be. For example:
```sh
# Set up a rootfs based on Ubuntu 16.04
cd $(mktemp -d) && mkdir rootfs
curl -sS http://cdimage.ubuntu.com/ubuntu-base/releases/16.04/release/ubuntu-base-16.04-core-amd64.tar.gz | tar --exclude 'dev/*' -C rootfs -xz
# Create an OCI runtime spec
nvidia-container-runtime spec
sed -i 's;"sh";"nvidia-smi";' config.json
sed -i 's;\("TERM=xterm"\);\1, "NVIDIA_VISIBLE_DEVICES=0";' config.json
# Run the container
sudo nvidia-container-runtime run nvidia_smi
```