Signed-off-by: Evan Lezar <elezar@nvidia.com>
7.6 KiB
The Experimental NVIDIA Container Runtime
Introduction
The experimental NVIDIA Container Runtime is a proof-of-concept runtime that approaches the problem of making GPUs (or other NVIDIA devices) available in containerized environments in a different manner to the existing NVIDIA Container Runtime. Wherease the current runtime relies on the NVIDIA Container Library to perform the modifications to a container, the experiemental runtime aims to express the required modifications in terms of changes to a container's OCI runtime specification. This also aligns with open initiatives such as the Container Device Interface (CDI).
Known Limitations
- The path of NVIDIA CUDA libraries / binaries injected into the container currently match that of the host system. This means that on an Ubuntu-based host systems these would be at
/usr/lib/x86_64-linux-gnu
even if the container distribution would normally expect these at another location (e.g./usr/lib64
) - Tools such as
nvidia-smi
may create additional device nodes in the container when run. This is prevented in the "classic" runtime (and the NVIDIA Container Library) by modifying the/proc/driver/nvidia/params
file in the container. - Other
NVIDIA_*
environment variables (e.g.NVIDIA_DRIVER_CAPABILITIES
) are not considered to filter mounted libraries or binaries. This is equivalent to always usingNVIDIA_DRIVER_CAPABILITIES=all
.
Building / Installing
The experimental NVIDIA Container Runtime is a self-contained golang binary and
can thus be built from source, or installed directly using go install
.
From source
After cloning the nvidia-container-toolkit
repository from GitLab
or from the read-only mirror on GitHub
running the following make command in the repository root:
make cmd-nvidia-container-runtime.experimental
will create an executable file nvidia-container-runtime.experimental
in the
root.
A dockerized target:
make docker-cmd-nvidia-container-runtime.experimental
will also create the executable file nvidia-container-runtime.experimental
without requiring the setup of a development environment (with the exception)
of having make
and docker
installed.
Go install
The experimental NVIDIA Container Runtime can also be go installed
by running
the following command:
go install github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-container-runtime.experimental@experimental
which will build and install the nvidia-container-runtime.experimental
executable in the ${GOPATH}/bin
folder.
Using the Runtime
The experimental NVIDIA Container Runtime is intended as a drop-in replacement for the "classic" NVIDIA Container Runtime. As such it is used in the same way (with the exception of the known limitiations noted above).
In general terms, to use the experimental NVIDIA Container Runtime to launch a
container with GPU support, it should be inserted as a shim for the desired
low-level OCI-compliant runtime (e.g. runc
or crun
). How this is achieved
depends on how containers are being launched.
Docker
In the case of docker
for example, the runtime must be registered with the
Docker daemon. This can be done by modifying the /etc/docker/daemon.json
file
to contain the following:
{
"runtimes": {
"nvidia-experimental": {
"path": "nvidia-container-runtime.experimental",
"runtimeArgs": []
}
},
}
This can then be invoked from docker by including the --runtime=nvidia-experimental
option when executing a docker run
command.
Runc
If runc
is being used to run a container directly substituting the runc
command for nvidia-container-runtime.experimental
should be sufficient as
the latter will exec
to runc
once the required (in-place) modifications have
been made to the container's OCI spec (config.json
file).
Configuration
Runtime Path
The experimental NVIDIA Container Runtime allows for the path to the low-level
runtime to be specified. This is done by setting the following option in the
/etc/nvidia-container-runtime/config.toml
file or setting the
NVIDIA_CONTAINER_RUNTIME_PATH
environment variable.
[nvidia-container-runtime.experimental]
runtime-path = "/path/to/low-level-runtime"
This path can be set to the path for runc
or crun
on a system and if it is
a relative path, the PATH
is searched for a matching executable.
Device Selection
In order to select a specific device, the experimental NVIDIA Container Runtime mimics the behaviour of the "classic" runtime. That is to say that the values of certain environment variables in the container's OCI specification control the behaviour of the runtime.
NVIDIA_VISIBLE_DEVICES
This variable controls which GPUs will be made accessible inside the container.
Possible values
0,1,2
,GPU-fef8089b
…: a comma-separated list of GPU UUID(s) or index(es).all
: all GPUs will be accessible, this is the default value in our container images.none
: no GPU will be accessible, but driver capabilities will be enabled.void
or empty or unset:nvidia-container-runtime
will have the same behavior asrunc
.
Note: When running on a MIG capable device, the following values will also be available:
0:0,0:1,1:0
,MIG-GPU-fef8089b/0/1
…: a comma-separated list of MIG Device UUID(s) or index(es).
Where the MIG device indices have the form <GPU Device Index>:<MIG Device Index>
as seen in the example output:
$ nvidia-smi -L
GPU 0: Graphics Device (UUID: GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5)
MIG Device 0: (UUID: MIG-GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5/1/0)
MIG Device 1: (UUID: MIG-GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5/1/1)
MIG Device 2: (UUID: MIG-GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5/11/0)
NVIDIA_MIG_CONFIG_DEVICES
This variable controls which of the visible GPUs can have their MIG configuration managed from within the container. This includes enabling and disabling MIG mode, creating and destroying GPU Instances and Compute Instances, etc.
Possible values
all
: Allow all MIG-capable GPUs in the visible device list to have their MIG configurations managed.
Note:
- This feature is only available on MIG capable devices (e.g. the A100).
- To use this feature, the container must be started with
CAP_SYS_ADMIN
privileges. - When not running as
root
, the container user must have read access to the/proc/driver/nvidia/capabilities/mig/config
file on the host.
NVIDIA_MIG_MONITOR_DEVICES
This variable controls which of the visible GPUs can have aggregate information about all of their MIG devices monitored from within the container. This includes inspecting the aggregate memory usage, listing the aggregate running processes, etc.
Possible values
all
: Allow all MIG-capable GPUs in the visible device list to have their MIG devices monitored.
Note:
- This feature is only available on MIG capable devices (e.g. the A100).
- To use this feature, the container must be started with
CAP_SYS_ADMIN
privileges. - When not running as
root
, the container user must have read access to the/proc/driver/nvidia/capabilities/mig/monitor
file on the host.