nvidia-container-toolkit/cmd/nvidia-container-runtime.experimental/README.md
Evan Lezar cf192169a8 Add basic README for experimental runtime
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2021-12-08 21:43:06 +01:00

The Experimental NVIDIA Container Runtime

Introduction

The experimental NVIDIA Container Runtime is a proof-of-concept runtime that takes a different approach from the existing NVIDIA Container Runtime to making GPUs (or other NVIDIA devices) available in containerized environments. Whereas the current runtime relies on the NVIDIA Container Library to modify a container, the experimental runtime aims to express the required modifications as changes to the container's OCI runtime specification. This also aligns with open initiatives such as the Container Device Interface (CDI).
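To make the idea of "changes to a container's OCI runtime specification" concrete, here is a minimal Python sketch. It is not the runtime's actual implementation (which is written in Go); the helper name add_gpu_to_spec, the device numbers, and the library path are assumptions chosen for illustration. It adds a device node, a matching cgroup allow rule, and read-only bind mounts to a spec that has been loaded as a dictionary.

```python
import copy


def add_gpu_to_spec(spec, device_path, major, minor, host_files):
    """Return a copy of an OCI runtime spec with one character device and
    read-only bind mounts for the given host files added.

    Hypothetical helper for illustration only.
    """
    spec = copy.deepcopy(spec)
    linux = spec.setdefault("linux", {})

    # Device node to create inside the container.
    linux.setdefault("devices", []).append(
        {"path": device_path, "type": "c", "major": major, "minor": minor}
    )

    # Device cgroup rule allowing access to that node.
    resources = linux.setdefault("resources", {})
    resources.setdefault("devices", []).append(
        {"allow": True, "type": "c", "major": major, "minor": minor, "access": "rwm"}
    )

    # Bind-mount each host file at the same path inside the container.
    mounts = spec.setdefault("mounts", [])
    for path in host_files:
        mounts.append(
            {"destination": path, "source": path, "type": "bind", "options": ["ro", "bind"]}
        )
    return spec


# Example values below are assumptions, not output of the real runtime.
original = {"ociVersion": "1.0.2", "process": {"env": []}}
modified = add_gpu_to_spec(
    original,
    "/dev/nvidia0",
    195,
    0,
    ["/usr/lib/x86_64-linux-gnu/libcuda.so.1"],
)
print(sorted(modified.keys()))
```

Because the sketch works on a copy, the original spec is left untouched. The experimental runtime itself is described as modifying config.json in place.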

Known Limitations

  • The paths of NVIDIA CUDA libraries and binaries injected into the container currently match those on the host system. This means that on an Ubuntu-based host system these would be at /usr/lib/x86_64-linux-gnu, even if the container distribution would normally expect them at another location (e.g. /usr/lib64).
  • Tools such as nvidia-smi may create additional device nodes in the container when run. This is prevented in the "classic" runtime (and the NVIDIA Container Library) by modifying the /proc/driver/nvidia/params file in the container.
  • Other NVIDIA_* environment variables (e.g. NVIDIA_DRIVER_CAPABILITIES) are not used to filter the mounted libraries or binaries. This is equivalent to always using NVIDIA_DRIVER_CAPABILITIES=all.

Building / Installing

The experimental NVIDIA Container Runtime is a self-contained Go binary and can thus be built from source or installed directly using go install.

From source

After cloning the nvidia-container-toolkit repository from GitLab or from the read-only mirror on GitHub, running the following make command in the repository root:

make cmd-nvidia-container-runtime.experimental

will create an executable file nvidia-container-runtime.experimental in the root.

A dockerized target:

make docker-cmd-nvidia-container-runtime.experimental

will also create the executable file nvidia-container-runtime.experimental without requiring the setup of a development environment (with the exception of having make and docker installed).

Go install

The experimental NVIDIA Container Runtime can also be installed with go install by running the following command:

go install github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-container-runtime.experimental@experimental

which will build and install the nvidia-container-runtime.experimental executable in the ${GOPATH}/bin folder.

Using the Runtime

The experimental NVIDIA Container Runtime is intended as a drop-in replacement for the "classic" NVIDIA Container Runtime. As such, it is used in the same way (with the exception of the known limitations noted above).

In general terms, to use the experimental NVIDIA Container Runtime to launch a container with GPU support, it should be inserted as a shim for the desired low-level OCI-compliant runtime (e.g. runc or crun). How this is achieved depends on how containers are being launched.

Docker

In the case of Docker, for example, the runtime must be registered with the Docker daemon. This can be done by modifying the /etc/docker/daemon.json file to contain the following:

{
    "runtimes": {
        "nvidia-experimental": {
            "path": "nvidia-container-runtime.experimental",
            "runtimeArgs": []
        }
    }
}

This can then be invoked from docker by including the --runtime=nvidia-experimental option when executing a docker run command.
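If /etc/docker/daemon.json already contains other settings, the runtime entry has to be merged in rather than replacing the whole file. The following Python sketch shows one way to do that on the parsed JSON. The helper name register_runtime is hypothetical, and the "log-driver" entry only stands in for whatever settings a real file might already hold.

```python
import json


def register_runtime(daemon_config, name, path):
    """Return a copy of a parsed daemon.json with the named runtime added,
    keeping all other top-level settings and already registered runtimes.

    Hypothetical helper for illustration only.
    """
    updated = dict(daemon_config)
    runtimes = dict(updated.get("runtimes", {}))
    runtimes[name] = {"path": path, "runtimeArgs": []}
    updated["runtimes"] = runtimes
    return updated


# Placeholder for the existing contents of /etc/docker/daemon.json.
existing = {"log-driver": "json-file"}

merged = register_runtime(
    existing, "nvidia-experimental", "nvidia-container-runtime.experimental"
)
print(json.dumps(merged, indent=4))
```

The printed JSON can be written back to /etc/docker/daemon.json. The Docker daemon generally has to be restarted or reloaded before it picks up a changed configuration.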

Runc

If runc is being used to run a container directly, replacing the runc command with nvidia-container-runtime.experimental should be sufficient, as the experimental runtime will exec runc once the required (in-place) modifications have been made to the container's OCI spec (config.json file).

Configuration

Runtime Path

The experimental NVIDIA Container Runtime allows for the path to the low-level runtime to be specified. This is done by setting the following option in the /etc/nvidia-container-runtime/config.toml file or setting the NVIDIA_CONTAINER_RUNTIME_PATH environment variable.

[nvidia-container-runtime.experimental]

runtime-path = "/path/to/low-level-runtime"

This can be set to the path of runc or crun on the system. If it is a relative path, the PATH is searched for a matching executable.
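The lookup rule for relative paths can be sketched in Python as follows. This only illustrates the behaviour described above and is not the runtime's code. The helper name resolve_runtime_path is hypothetical, and the sketch does not model how the config file and the NVIDIA_CONTAINER_RUNTIME_PATH environment variable are prioritised relative to each other.

```python
import os
import shutil


def resolve_runtime_path(runtime_path, search_path=None):
    """Resolve a configured low-level runtime path.

    Absolute paths are returned unchanged. Anything else is looked up in
    the directories of search_path (the PATH environment variable when
    search_path is None). Returns None if no executable is found.

    Hypothetical helper for illustration only.
    """
    if os.path.isabs(runtime_path):
        return runtime_path
    return shutil.which(runtime_path, path=search_path)


# Prints the resolved location of runc, or None if it is not on the PATH.
print(resolve_runtime_path("runc"))
```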

Device Selection

In order to select a specific device, the experimental NVIDIA Container Runtime mimics the behaviour of the "classic" runtime. That is to say that the values of certain environment variables in the container's OCI specification control the behaviour of the runtime.
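In an OCI runtime specification, the container's environment is stored as a list of NAME=VALUE strings under process.env. With Docker these are usually supplied with the -e option of docker run. When runc is used directly, they have to be present in config.json. A minimal Python sketch of setting such a variable on a parsed spec follows; the helper name set_container_env is hypothetical.

```python
def set_container_env(spec, name, value):
    """Set NAME=VALUE in spec["process"]["env"], replacing any existing
    entry for the same variable. The spec is modified in place.

    Hypothetical helper for illustration only.
    """
    process = spec.setdefault("process", {})
    prefix = name + "="
    env = [entry for entry in process.get("env", []) if not entry.startswith(prefix)]
    env.append(prefix + value)
    process["env"] = env
    return spec


# Example spec fragment; a real config.json contains many more fields.
spec = {"process": {"env": ["PATH=/usr/bin", "NVIDIA_VISIBLE_DEVICES=all"]}}
set_container_env(spec, "NVIDIA_VISIBLE_DEVICES", "0")
print(spec["process"]["env"])
```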

NVIDIA_VISIBLE_DEVICES

This variable controls which GPUs will be made accessible inside the container.

Possible values
  • 0,1,2, GPU-fef8089b …: a comma-separated list of GPU UUID(s) or index(es).
  • all: all GPUs will be accessible. This is the default value in our container images.
  • none: no GPU will be accessible, but driver capabilities will be enabled.
  • void, empty, or unset: nvidia-container-runtime will have the same behavior as runc.

Note: When running on a MIG-capable device, the following values will also be available:

  • 0:0,0:1,1:0, MIG-GPU-fef8089b/0/1 …: a comma-separated list of MIG Device UUID(s) or index(es).

The MIG device indices have the form <GPU Device Index>:<MIG Device Index>, as seen in the following example output:

$ nvidia-smi -L
GPU 0: Graphics Device (UUID: GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5)
  MIG Device 0: (UUID: MIG-GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5/1/0)
  MIG Device 1: (UUID: MIG-GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5/1/1)
  MIG Device 2: (UUID: MIG-GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5/11/0)
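The accepted forms of NVIDIA_VISIBLE_DEVICES can be summarised with a small Python sketch that classifies a value. This is only an illustration of the rules listed above, not the parser used by the runtime. The function name parse_visible_devices and the labels it returns are assumptions.

```python
def parse_visible_devices(value):
    """Classify an NVIDIA_VISIBLE_DEVICES value.

    Returns a (mode, devices) tuple. mode is "void", "all", "none" or
    "list". devices is a list of (kind, identifier) tuples that is only
    populated when mode is "list".

    Hypothetical helper for illustration only.
    """
    if value is None or value.strip() in ("", "void"):
        return ("void", [])
    value = value.strip()
    if value in ("all", "none"):
        return (value, [])

    devices = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        if item.startswith("MIG-"):
            kind = "mig-uuid"
        elif item.startswith("GPU-"):
            kind = "gpu-uuid"
        elif ":" in item:
            # <GPU Device Index>:<MIG Device Index>
            kind = "mig-index"
        else:
            kind = "gpu-index"
        devices.append((kind, item))
    return ("list", devices)


print(parse_visible_devices("0,1:0,GPU-fef8089b"))
```

The sketch does not check that the listed devices actually exist on the host.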

NVIDIA_MIG_CONFIG_DEVICES

This variable controls which of the visible GPUs can have their MIG configuration managed from within the container. This includes enabling and disabling MIG mode, creating and destroying GPU Instances and Compute Instances, etc.

Possible values
  • all: Allow all MIG-capable GPUs in the visible device list to have their MIG configurations managed.

Note:

  • This feature is only available on MIG-capable devices (e.g. the A100).
  • To use this feature, the container must be started with CAP_SYS_ADMIN privileges.
  • When not running as root, the container user must have read access to the /proc/driver/nvidia/capabilities/mig/config file on the host.

NVIDIA_MIG_MONITOR_DEVICES

This variable controls which of the visible GPUs can have aggregate information about all of their MIG devices monitored from within the container. This includes inspecting the aggregate memory usage, listing the aggregate running processes, etc.

Possible values
  • all: Allow all MIG-capable GPUs in the visible device list to have their MIG devices monitored.

Note:

  • This feature is only available on MIG-capable devices (e.g. the A100).
  • To use this feature, the container must be started with CAP_SYS_ADMIN privileges.
  • When not running as root, the container user must have read access to the /proc/driver/nvidia/capabilities/mig/monitor file on the host.
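The read-access requirement on the MIG capability files can be checked on the host before starting a container. The Python sketch below reports whether the user running it can read the config and monitor files named above. The helper name mig_capability_access is hypothetical, and the result is only meaningful if the script runs on the host as the same user that the container will use.

```python
import os

# Directory holding the MIG capability files referenced above.
MIG_CAPABILITIES_DIR = "/proc/driver/nvidia/capabilities/mig"


def mig_capability_access(base_dir=MIG_CAPABILITIES_DIR):
    """Report whether the current user can read the MIG "config" and
    "monitor" capability files. Missing files are reported as False.

    Hypothetical helper for illustration only.
    """
    return {
        name: os.access(os.path.join(base_dir, name), os.R_OK)
        for name in ("config", "monitor")
    }


print(mig_capability_access())
```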