## Environment variables (OCI spec)

Each environment variable maps to a command-line argument for `nvidia-container-cli` from [libnvidia-container](https://github.com/NVIDIA/libnvidia-container).
These variables are already set in our [official CUDA images](https://hub.docker.com/r/nvidia/cuda/).

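As a rough illustration of that mapping (a sketch only; flag names and availability vary by libnvidia-container version, so check `nvidia-container-cli configure --help` before relying on this), `NVIDIA_VISIBLE_DEVICES=0` together with `NVIDIA_DRIVER_CAPABILITIES=utility` corresponds approximately to:

```sh
# Sketch, not a verbatim command: configure an already-created container
# rootfs so that GPU 0 and the utility libraries (nvidia-smi, NVML) are
# injected into it. --device corresponds to NVIDIA_VISIBLE_DEVICES and
# --utility to the "utility" entry of NVIDIA_DRIVER_CAPABILITIES.
sudo nvidia-container-cli --load-kmods configure \
  --device=0 \
  --utility \
  "$(pwd)/rootfs"
```
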
### `NVIDIA_VISIBLE_DEVICES`
This variable controls which GPUs will be made accessible inside the container.

#### Possible values
* `0,1,2`, `GPU-fef8089b` …: a comma-separated list of GPU UUID(s) or index(es).
* `all`: all GPUs will be accessible; this is the default value in our container images.
* `none`: no GPU will be accessible, but driver capabilities will be enabled.
* `void` or *empty* or *unset*: `nvidia-container-runtime` will have the same behavior as `runc`.

**Note**: When running on a MIG-capable device, the following values will also be available:
* `0:0,0:1,1:0`, `MIG-GPU-fef8089b/0/1` …: a comma-separated list of MIG device UUID(s) or index(es).

Here the MIG device indices have the form `<GPU Device Index>:<MIG Device Index>`, as seen in this example output:
```
$ nvidia-smi -L
GPU 0: Graphics Device (UUID: GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5)
  MIG Device 0: (UUID: MIG-GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5/1/0)
  MIG Device 1: (UUID: MIG-GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5/1/1)
  MIG Device 2: (UUID: MIG-GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5/11/0)
```

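For instance, with Docker configured to use the NVIDIA runtime, device selection might look like the following (a sketch: the image tag is illustrative, and the exact `--runtime` wiring depends on your engine configuration):

```sh
# Expose only GPU 0 to the container; all other GPUs stay invisible.
docker run --rm --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=0 \
  nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

# Expose a single MIG device using the <GPU Device Index>:<MIG Device Index> form.
docker run --rm --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=0:0 \
  nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi -L
```
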
### `NVIDIA_MIG_CONFIG_DEVICES`
This variable controls which of the visible GPUs can have their MIG
configuration managed from within the container. This includes enabling and
disabling MIG mode, creating and destroying GPU Instances and Compute
Instances, etc.

#### Possible values
* `all`: Allow all MIG-capable GPUs in the visible device list to have their
  MIG configurations managed.

**Note**:
* This feature is only available on MIG-capable devices (e.g. the A100).
* To use this feature, the container must be started with `CAP_SYS_ADMIN` privileges.
* When not running as `root`, the container user must have read access to the
  `/proc/driver/nvidia/capabilities/mig/config` file on the host.

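As a concrete sketch (the image tag and device selection are illustrative), granting a container the ability to manage the MIG configuration of GPU 0 might look like:

```sh
# CAP_SYS_ADMIN is required for MIG management; the container can then
# enable or disable MIG mode and manage GPU/Compute Instances on GPU 0.
docker run --rm --runtime=nvidia \
  --cap-add=SYS_ADMIN \
  -e NVIDIA_VISIBLE_DEVICES=0 \
  -e NVIDIA_MIG_CONFIG_DEVICES=all \
  nvidia/cuda:12.2.0-base-ubuntu22.04 \
  nvidia-smi -mig 1
```
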
### `NVIDIA_MIG_MONITOR_DEVICES`
This variable controls which of the visible GPUs can have aggregate information
about all of their MIG devices monitored from within the container. This
includes inspecting the aggregate memory usage, listing the aggregate running
processes, etc.

#### Possible values
* `all`: Allow all MIG-capable GPUs in the visible device list to have their
  MIG devices monitored.

**Note**:
* This feature is only available on MIG-capable devices (e.g. the A100).
* To use this feature, the container must be started with `CAP_SYS_ADMIN` privileges.
* When not running as `root`, the container user must have read access to the
  `/proc/driver/nvidia/capabilities/mig/monitor` file on the host.

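A monitoring container is set up analogously (again a sketch with an illustrative image tag); note that the same `CAP_SYS_ADMIN` requirement applies:

```sh
# List all devices, including MIG devices, from inside the container.
docker run --rm --runtime=nvidia \
  --cap-add=SYS_ADMIN \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -e NVIDIA_MIG_MONITOR_DEVICES=all \
  nvidia/cuda:12.2.0-base-ubuntu22.04 \
  nvidia-smi -L
```
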
### `NVIDIA_DRIVER_CAPABILITIES`
This variable controls which driver libraries/binaries will be mounted inside the container.

#### Possible values
* `compute,video`, `graphics,utility` …: a comma-separated list of driver features the container needs.
* `all`: enable all available driver capabilities.
* *empty* or *unset*: use the default driver capabilities: `utility,compute`.

#### Supported driver capabilities
* `compute`: required for CUDA and OpenCL applications.
* `compat32`: required for running 32-bit applications.
* `graphics`: required for running OpenGL and Vulkan applications.
* `utility`: required for using `nvidia-smi` and NVML.
* `video`: required for using the Video Codec SDK.
* `display`: required for leveraging X11 display.

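For example (a sketch; the image tag is illustrative), a workload that also needs OpenGL/Vulkan libraries would request `graphics` on top of the defaults:

```sh
# Mount the compute, utility, and graphics driver libraries into the container.
docker run --rm --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=0 \
  -e NVIDIA_DRIVER_CAPABILITIES=compute,utility,graphics \
  nvidia/cuda:12.2.0-base-ubuntu22.04 \
  nvidia-smi
```
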
### `NVIDIA_REQUIRE_*`
A logical expression to define constraints on the configurations supported by the container.

#### Supported constraints
* `cuda`: constraint on the CUDA driver version.
* `driver`: constraint on the driver version.
* `arch`: constraint on the compute architectures of the selected GPUs.
* `brand`: constraint on the brand of the selected GPUs (e.g. GeForce, Tesla, GRID).

#### Expressions
Multiple constraints can be expressed in a single environment variable: space-separated constraints are ORed, comma-separated constraints are ANDed.
Multiple environment variables of the form `NVIDIA_REQUIRE_*` are ANDed together.

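To make the OR/AND semantics concrete, here is an illustrative pair of constraints (the specific versions and brand are made up for the example):

```sh
# Read as: cuda>=11.0 OR (brand=tesla AND driver>=418 AND driver<419).
# Spaces separate ORed terms; commas separate ANDed terms.
NVIDIA_REQUIRE_CUDA="cuda>=11.0 brand=tesla,driver>=418,driver<419"

# A second NVIDIA_REQUIRE_* variable is ANDed with the first one.
NVIDIA_REQUIRE_DRIVER="driver>=418"
```
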
### `NVIDIA_DISABLE_REQUIRE`
Single switch to disable all constraints of the form `NVIDIA_REQUIRE_*`.

### `NVIDIA_REQUIRE_CUDA`

The version of the CUDA toolkit used by the container. It is an instance of the generic `NVIDIA_REQUIRE_*` mechanism and is set by the official CUDA images.
If the version of the NVIDIA driver is insufficient to run this version of CUDA, the container will not be started.

#### Possible values
* `cuda>=7.5`, `cuda>=8.0`, `cuda>=9.0` …: any valid CUDA version in the form `major.minor`.

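For instance (a sketch; the image tag is illustrative), the constraint below prevents the container from starting on a driver that cannot run CUDA 11.0, unless the check is explicitly disabled:

```sh
# Fails to start if the installed driver cannot support CUDA 11.0.
docker run --rm --runtime=nvidia \
  -e NVIDIA_REQUIRE_CUDA="cuda>=11.0" \
  nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

# Same container, but with all NVIDIA_REQUIRE_* checks turned off.
docker run --rm --runtime=nvidia \
  -e NVIDIA_REQUIRE_CUDA="cuda>=11.0" \
  -e NVIDIA_DISABLE_REQUIRE=true \
  nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```
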
### `CUDA_VERSION`
Similar to `NVIDIA_REQUIRE_CUDA`, for legacy CUDA images.
In addition, if `NVIDIA_REQUIRE_CUDA` is not set, `NVIDIA_VISIBLE_DEVICES` and `NVIDIA_DRIVER_CAPABILITIES` will default to `all`.

## Usage example

**NOTE:** Using `nvidia-container-runtime` as a CLI replacement for `runc` is uncommon; this example is provided for completeness.

Although `nvidia-container-runtime` is typically configured as a replacement for `runc` or `crun` in various container engines, it can also be
invoked from the command line as `runc` would be. For example:

```sh
# Set up a rootfs based on Ubuntu 16.04
cd $(mktemp -d) && mkdir rootfs
curl -sS http://cdimage.ubuntu.com/ubuntu-base/releases/16.04/release/ubuntu-base-16.04-core-amd64.tar.gz | tar --exclude 'dev/*' -C rootfs -xz

# Create an OCI runtime spec, then edit it so that the container runs
# `nvidia-smi` and makes GPU 0 visible via NVIDIA_VISIBLE_DEVICES
nvidia-container-runtime spec
sed -i 's;"sh";"nvidia-smi";' config.json
sed -i 's;\("TERM=xterm"\);\1, "NVIDIA_VISIBLE_DEVICES=0";' config.json

# Run the container
sudo nvidia-container-runtime run nvidia_smi
```
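If everything is set up correctly, the final command should print the usual `nvidia-smi` table for GPU 0 from inside the container; here `nvidia_smi` is simply the name given to the container instance, as it would be with `runc run`.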