This change copies the cmd/nvidia-container-runtime, internal, and test
folders from github.com/NVIDIA/nvidia-container-runtime@8a63b4b34f3ce3b4167f0516aa3f7207ca280dfb
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds support for the NVIDIA_FABRIC_DEVICES envvar. The (non-empty)
value of this envvar is passed to the NVIDIA Container CLI using the --fabric-device
command line flag, allowing NVSwitch and NVLink devices to be mounted
into the container.
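A rough sketch of the intended mapping (illustrative only; the helper name below is an assumption, not the actual implementation):
    // Illustrative sketch: translate a non-empty NVIDIA_FABRIC_DEVICES value
    // into an extra flag for the nvidia-container-cli invocation.
    package main

    import (
        "fmt"
        "os"
        "strings"
    )

    // fabricDeviceArgs is a hypothetical helper returning the additional
    // command line arguments derived from the envvar value.
    func fabricDeviceArgs(value string) []string {
        if strings.TrimSpace(value) == "" {
            // An unset or empty envvar adds no flags.
            return nil
        }
        return []string{"--fabric-device=" + value}
    }

    func main() {
        fmt.Println(fabricDeviceArgs(os.Getenv("NVIDIA_FABRIC_DEVICES")))
    }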
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change improves the CI for the container toolkit. The go targets are
executed in a Docker container, which allows for reproducible behaviour on
local systems as well as in CI. The Makefile is updated to facilitate this.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change moves the pkg folder to `cmd/nvidia-container-toolkit` to
better match Go best practices. This allows, for example, the
`cmd/nvidia-container-toolkit` command to be installed using `go install`.
The only package included in `pkg` was `main`.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change fixes a bug where the value of NVIDIA_VISIBLE_DEVICES would be used to
select devices even when the `swarm-resource` config option was specified.
Note that this does not change the value of NVIDIA_VISIBLE_DEVICES in the container.
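A rough sketch of the intended precedence (the helper name and the DOCKER_RESOURCE_GPU example are illustrative, not the actual implementation):
    // Illustrative sketch: when swarm-resource names an envvar
    // (e.g. DOCKER_RESOURCE_GPU), that envvar drives device selection and
    // NVIDIA_VISIBLE_DEVICES is no longer consulted for selection.
    package main

    import "fmt"

    // deviceSelectionEnvvar is a hypothetical helper returning the name of
    // the envvar that device selection should be based on.
    func deviceSelectionEnvvar(swarmResource string) string {
        if swarmResource != "" {
            return swarmResource
        }
        return "NVIDIA_VISIBLE_DEVICES"
    }

    func main() {
        fmt.Println(deviceSelectionEnvvar("DOCKER_RESOURCE_GPU")) // swarm-resource set
        fmt.Println(deviceSelectionEnvvar(""))                    // swarm-resource unset
    }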
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change simplifies the build process by only targeting ubuntu20.04-amd64
and adds logic to push tagged builds to Artifactory.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds check targets for Golang to the Makefile. These are also
added as stages to the Jenkinsfile definition, and the GitLab CI
configuration is modified to use them as well.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change ignores the value of NVIDIA_VISIBLE_DEVICES instead of
raising an error when launching a container with insufficient permissions.
This changes the behaviour under the following conditions:
NVIDIA_VISIBLE_DEVICES is set
and
accept-nvidia-visible-devices-envvar-when-unprivileged = false (default: true)
or
privileged = false (default: false)
This means that a user need not explicitly clear the NVIDIA_VISIBLE_DEVICES
environment variable if no GPUs are to be used in unprivileged containers.
Note that this envvar is set to 'all' by default in many CUDA images that
are used as base images.
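A rough sketch of the check, assuming the usual reading of the conditions above (the type and function names are illustrative, not the actual implementation):
    // Illustrative sketch: the envvar is ignored (rather than raising an
    // error) when it is set, the container is unprivileged, and the config
    // does not accept the envvar for unprivileged containers.
    package main

    import "fmt"

    type config struct {
        // accept-nvidia-visible-devices-envvar-when-unprivileged (default: true)
        AcceptEnvvarUnprivileged bool
    }

    // ignoreVisibleDevicesEnvvar is a hypothetical helper encoding that reading.
    func ignoreVisibleDevicesEnvvar(cfg config, privileged, envvarSet bool) bool {
        return envvarSet && !privileged && !cfg.AcceptEnvvarUnprivileged
    }

    func main() {
        cfg := config{AcceptEnvvarUnprivileged: false}
        // Previously this combination raised an error; now the envvar is ignored.
        fmt.Println(ignoreVisibleDevicesEnvvar(cfg, false, true)) // true
    }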
Signed-off-by: Evan Lezar <elezar@nvidia.com>
For most practical purposes, it should be fine to set
NVIDIA_DRIVER_CAPABILITIES=all nowadays.
Historically, these different capabilities exist because they were added
incrementally, with varying degrees of stability. It's fairly common to
run with GPUs in containers today, but a few years ago the driver didn't
support them very well, and it was important to make sure the libraries
being injected into the container actually worked in a containerized
environment. When they didn't, it was common to get information leaks,
crashes, or even silent failures.
In the past, whenever a new set of libraries was being vetted for
injection, a new capability was added to make sure that users had control
to explicitly include only those libraries they were comfortable having
injected into their containers.
The idea was that whoever puts together a container image for use with
GPUs should know what capabilities the software in that
container image requires, and can set the NVIDIA_DRIVER_CAPABILITIES
envvar in that image appropriately.
After some back and forth, we've decided it doesn't quite make sense to
set it to "all" just yet, but we should set it to "utility, compute"
instead of just "utility", so that at least the core CUDA libraries work
by default (once installed in the container).
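A minimal sketch of the resulting default, assuming a simple fallback when the envvar is unset (the constant and helper below are illustrative, not the actual implementation):
    // Illustrative sketch: fall back to "utility,compute" when
    // NVIDIA_DRIVER_CAPABILITIES is not set in the container image.
    package main

    import "fmt"

    const defaultDriverCapabilities = "utility,compute"

    // resolveDriverCapabilities is a hypothetical helper; the real lookup and
    // validation in the toolkit is more involved.
    func resolveDriverCapabilities(value string) string {
        if value == "" {
            return defaultDriverCapabilities
        }
        return value
    }

    func main() {
        fmt.Println(resolveDriverCapabilities(""))    // prints utility,compute
        fmt.Println(resolveDriverCapabilities("all")) // an explicit value is kept as-is
    }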
Signed-off-by: Kevin Klues <kklues@nvidia.com>