This change switches to using a single image for the NVIDIA Container Toolkit container.
Here the contents of the architecture-specific deb and rpm packages are extracted
to a known root. These contents can then be installed using the installation
mechanism, which has been updated to detect the source root based on the packaging type.
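A minimal sketch of selecting the source root from the packaging type, assuming each package's contents were extracted under a common root. The `PackageType` values and the `/artifacts/<type>` layout are hypothetical placeholders, not the toolkit's actual paths.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// PackageType identifies the packaging format whose extracted contents are installed.
type PackageType string

const (
	PackageDeb PackageType = "deb"
	PackageRPM PackageType = "rpm"
)

// artifactsRoot is a HYPOTHETICAL known root into which the
// architecture-specific packages were extracted when the image was built.
const artifactsRoot = "/artifacts"

// sourceRoot returns the directory holding the extracted contents for the
// given packaging type, or an error for unsupported types.
func sourceRoot(pt PackageType) (string, error) {
	switch pt {
	case PackageDeb, PackageRPM:
		return filepath.Join(artifactsRoot, string(pt)), nil
	default:
		return "", fmt.Errorf("unsupported packaging type %q", pt)
	}
}

func main() {
	root, err := sourceRoot(PackageRPM)
	if err != nil {
		panic(err)
	}
	fmt.Println(root) // /artifacts/rpm
}
```

Keeping the mapping in one function means the install logic stays identical for both package types; only the root it copies from changes.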
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change allows hooks to be configured with debug logging. This
is currently only enabled for the hooks generated from the runtime.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
To allow for CDI hooks to be added gradually we provide a generic no-op hook
for unrecognised subcommands. This will log a warning instead of erroring out.
An unsupported hook could be the result of a CDI specification referring to a
new hook that is not yet supported by an older NVIDIA Container Toolkit
version, or to a hook that has been removed in a newer version.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change uses the reexec package to run the ldcache update for a
container in a separate process with isolated namespaces.
Since the hook is invoked as a createContainer hook, these
namespaces are cloned from the container's namespaces.
In the reexec handler, we further isolate the proc filesystem,
mount the host ldconfig to a tmpfs, and pivot into the container's
root.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an nvidia-container-runtime.modes.legacy.cuda-compat-mode
config option. This can be set to one of four values:
* ldconfig (default): the --cuda-compat-mode=ldconfig flag is passed to the nvidia-container-cli
* mount: the --cuda-compat-mode=mount flag is passed to the nvidia-container-cli
* disabled: the --cuda-compat-mode=disabled flag is passed to the nvidia-container-cli
* hook: the --cuda-compat-mode=disabled flag is passed to the nvidia-container-cli AND the
enable-cuda-compat hook is used to provide forward compatibility.
Note that the disable-cuda-compat-lib-hook feature flag will prevent the enable-cuda-compat
hook from being used. This change also means that the allow-cuda-compat-libs-from-container
feature flag no longer has any effect.
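The mapping from the cuda-compat-mode value to the nvidia-container-cli flag and the hook decision can be sketched as a single function. The type and function names are illustrative; treating an empty value as the `ldconfig` default is an assumption of this sketch.

```go
package main

import "fmt"

// CUDACompatMode mirrors the values accepted by the
// nvidia-container-runtime.modes.legacy.cuda-compat-mode option.
type CUDACompatMode string

const (
	ModeLdconfig CUDACompatMode = "ldconfig" // default
	ModeMount    CUDACompatMode = "mount"
	ModeDisabled CUDACompatMode = "disabled"
	ModeHook     CUDACompatMode = "hook"
)

// resolve returns the flag passed to nvidia-container-cli and whether the
// enable-cuda-compat hook should be used. disableHookFeature models the
// disable-cuda-compat-lib-hook feature flag.
func resolve(mode CUDACompatMode, disableHookFeature bool) (string, bool, error) {
	switch mode {
	case "", ModeLdconfig: // ASSUMPTION: empty means the default.
		return "--cuda-compat-mode=ldconfig", false, nil
	case ModeMount:
		return "--cuda-compat-mode=mount", false, nil
	case ModeDisabled:
		return "--cuda-compat-mode=disabled", false, nil
	case ModeHook:
		// The CLI handling is disabled and forward compatibility comes from
		// the hook, unless the feature flag suppresses the hook.
		return "--cuda-compat-mode=disabled", !disableHookFeature, nil
	default:
		return "", false, fmt.Errorf("invalid cuda-compat-mode %q", mode)
	}
}

func main() {
	flag, useHook, _ := resolve(ModeHook, false)
	fmt.Println(flag, useHook) // --cuda-compat-mode=disabled true
}
```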
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This ensures that mount propagation is set to rprivate for
mounts from the host into the container. This aligns with the
default in Docker.
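Forcing rprivate propagation on a mount can be sketched as rewriting its options. `Mount` here is a trimmed local stand-in for the OCI runtime-spec type so the example is self-contained; replacing any pre-existing propagation option is a choice made for this sketch, not necessarily what the toolkit does.

```go
package main

import "fmt"

// Mount is a trimmed-down local stand-in for the OCI runtime-spec Mount type.
type Mount struct {
	Destination string
	Type        string
	Source      string
	Options     []string
}

// propagationOptions are the mount propagation modes that may appear in Options.
var propagationOptions = map[string]bool{
	"shared": true, "rshared": true,
	"slave": true, "rslave": true,
	"private": true, "rprivate": true,
	"unbindable": true, "runbindable": true,
}

// withRPrivate returns a copy of m whose propagation is rprivate, dropping
// any propagation option that was already present.
func withRPrivate(m Mount) Mount {
	opts := make([]string, 0, len(m.Options)+1)
	for _, o := range m.Options {
		if !propagationOptions[o] {
			opts = append(opts, o)
		}
	}
	m.Options = append(opts, "rprivate")
	return m
}

func main() {
	m := withRPrivate(Mount{
		Source:      "/usr/bin/nvidia-smi",
		Destination: "/usr/bin/nvidia-smi",
		Options:     []string{"ro", "nosuid", "bind", "rslave"},
	})
	fmt.Println(m.Options) // [ro nosuid bind rprivate]
}
```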
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Since this is running in a container, the contents of the
/etc/nvidia-container-runtime/config.toml file are equivalent to the
default config. This change makes this explicit.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds support for specifying the container runtime
executable path. This can be used if, for example, there are
two containerd or crio executables and a specific one must be used.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change updates how the ldconfig arguments are constructed. This
makes the update-ldcache hook more robust and ensures that folders are
specified last, if at all.
Checks are also included for empty container roots at the start of the
hook to simplify later checks.
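The argument ordering can be sketched as follows: reject an empty container root up front, emit options first, and append folders only when there are any. Only ldconfig's `-r ROOT` option is used here; `buildLdconfigArgs` and the example paths are hypothetical, and the real hook may pass further options.

```go
package main

import (
	"errors"
	"fmt"
)

// buildLdconfigArgs assembles an ldconfig invocation for a container.
// Options come first; directories, if any, are appended last as positional
// arguments. An empty containerRoot is rejected up front so later steps can
// assume it is set.
func buildLdconfigArgs(ldconfigPath, containerRoot string, folders []string) ([]string, error) {
	if containerRoot == "" {
		return nil, errors.New("empty container root")
	}
	args := []string{
		ldconfigPath,
		// -r: use containerRoot as the root directory.
		"-r", containerRoot,
	}
	// Appending a nil or empty slice is a no-op, so folders appear only if given.
	args = append(args, folders...)
	return args, nil
}

func main() {
	args, err := buildLdconfigArgs("/sbin/ldconfig", "/run/example/rootfs", []string{"/usr/local/cuda/compat"})
	if err != nil {
		panic(err)
	}
	fmt.Println(args)
}
```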
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Parsing positional arguments requires additional processing
compared to relying on named flags. This change switches to
using a named flag for specifying the toolkit installation directory.
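The difference can be illustrated with Go's standard `flag` package rather than the CLI framework the toolkit actually uses. The flag name `--toolkit-install-dir` and its default are hypothetical; the point is that the value arrives already parsed, and stray positional arguments can be rejected explicitly.

```go
package main

import (
	"flag"
	"fmt"
)

// parseArgs reads the toolkit installation directory from a named flag
// instead of a positional argument. Flag name and default are HYPOTHETICAL.
func parseArgs(args []string) (string, error) {
	fs := flag.NewFlagSet("installer", flag.ContinueOnError)
	installDir := fs.String("toolkit-install-dir", "/usr/local/nvidia", "directory to install the toolkit into")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	// Positional arguments are no longer accepted.
	if fs.NArg() > 0 {
		return "", fmt.Errorf("unexpected positional arguments: %v", fs.Args())
	}
	return *installDir, nil
}

func main() {
	dir, err := parseArgs([]string{"--toolkit-install-dir=/opt/nvidia"})
	if err != nil {
		panic(err)
	}
	fmt.Println(dir) // /opt/nvidia
}
```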
Signed-off-by: Evan Lezar <elezar@nvidia.com>
The GPU Operator no longer sets the RUNTIME_ARGS environment variable
to control the runtime and instead sets general environment variables.
This change removes support for setting RUNTIME_ARGS in the CLI as these
settings have no effect.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
The urfave/cli update to v2.27.6 fixes the behaviour when disabling the separator
for repeated StringSliceFlags. This change updates the nvidia-ctk config
command to allow list options to be specified as comma-separated values.
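With the framework's own separator disabled, the command can split list values itself. This sketch shows the intended result using only `strings.Split`; `splitListValues` is a hypothetical helper, not the nvidia-ctk implementation, and trimming whitespace and dropping empty elements are choices made here.

```go
package main

import (
	"fmt"
	"strings"
)

// splitListValues flattens the values of a repeated list option, splitting
// each on commas. Whitespace is trimmed and empty elements are dropped.
func splitListValues(values []string) []string {
	var out []string
	for _, v := range values {
		for _, part := range strings.Split(v, ",") {
			if p := strings.TrimSpace(part); p != "" {
				out = append(out, p)
			}
		}
	}
	return out
}

func main() {
	fmt.Println(splitListValues([]string{"runc,crun", "docker-runc"}))
	// [runc crun docker-runc]
}
```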
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Management containers don't generally need forward compatibility.
We therefore disable the enable-cuda-compat hook so that it is not included in the
generated CDI specifications.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds the enable-cuda-compat hook to the incoming OCI runtime spec
if the allow-cuda-compat-libs-from-container feature flag is not enabled.
An update-ldcache hook is also injected to ensure that the required folders
are processed.
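The injection can be sketched against minimal local stand-ins for the OCI runtime-spec hook types. The hook binary path and argument layout are assumptions; placing update-ldcache after enable-cuda-compat reflects this sketch's reading that the cache must be rebuilt after the new ld.so.conf.d entry exists.

```go
package main

import "fmt"

// Hook and Spec are minimal local stand-ins for the OCI runtime-spec types.
type Hook struct {
	Path string
	Args []string
}

type Spec struct {
	CreateContainerHooks []Hook
}

// injectCUDACompatHooks appends enable-cuda-compat followed by update-ldcache
// unless the allow-cuda-compat-libs-from-container feature flag is enabled.
func injectCUDACompatHooks(spec *Spec, allowCompatFromContainer bool, hookPath string) {
	if allowCompatFromContainer {
		return
	}
	spec.CreateContainerHooks = append(spec.CreateContainerHooks,
		Hook{Path: hookPath, Args: []string{"nvidia-cdi-hook", "enable-cuda-compat"}},
		// Runs afterwards so that folders added by enable-cuda-compat are processed.
		Hook{Path: hookPath, Args: []string{"nvidia-cdi-hook", "update-ldcache"}},
	)
}

func main() {
	var s Spec
	injectCUDACompatHooks(&s, false, "/usr/bin/nvidia-cdi-hook")
	for _, h := range s.CreateContainerHooks {
		fmt.Println(h.Args[1])
	}
}
```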
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an nvidia-cdi-hook enable-cuda-compat hook that checks the
container for CUDA compat libs and updates /etc/ld.so.conf.d to include their
parent folder if the driver major version of these libs is sufficient.
This allows CUDA Forward Compatibility to be used when it is not available
through libnvidia-container.
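The version check can be sketched as parsing the major version from a compat library name such as `libcuda.so.550.54.15` and comparing it with the host driver version. Treating "sufficient" as "strictly newer than the host driver" is an assumption, and the sketch returns the drop-in file contents instead of writing a (hypothetically named) file under /etc/ld.so.conf.d.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strconv"
	"strings"
)

// driverMajor extracts the major version from a library name such as
// "libcuda.so.550.54.15" or from a version string such as "535.104.05".
func driverMajor(s string) (int, error) {
	s = strings.TrimPrefix(filepath.Base(s), "libcuda.so.")
	major, _, _ := strings.Cut(s, ".")
	return strconv.Atoi(major)
}

// compatLdConfig returns the contents of an ld.so.conf.d drop-in adding the
// compat library's parent folder, or "" if the compat libs should not be used.
func compatLdConfig(compatLib, hostDriverVersion string) (string, error) {
	compatMajor, err := driverMajor(compatLib)
	if err != nil {
		return "", fmt.Errorf("parsing compat lib version: %w", err)
	}
	hostMajor, err := driverMajor(hostDriverVersion)
	if err != nil {
		return "", fmt.Errorf("parsing host driver version: %w", err)
	}
	// ASSUMPTION: compat libs only help when they are newer than the host driver.
	if compatMajor <= hostMajor {
		return "", nil
	}
	return filepath.Dir(compatLib) + "\n", nil
}

func main() {
	conf, err := compatLdConfig("/usr/local/cuda/compat/libcuda.so.550.54.15", "535.104.05")
	if err != nil {
		panic(err)
	}
	fmt.Print(conf) // /usr/local/cuda/compat
}
```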
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This allows the NVIDIA Container Toolkit to ignore IMEX channel requests
made through the NVIDIA_IMEX_CHANNELS environment variable or through volume
mounts. This ensures that the NVIDIA Container Toolkit cannot be used to provide
out-of-band access to an IMEX channel simply by specifying an environment
variable, possibly bypassing other checks by an orchestration system such as
Kubernetes.
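The environment-variable side of this behaviour can be sketched as a lookup that short-circuits when requests are ignored. The `ignoreRequests` parameter models the setting described above under a hypothetical name; volume-mount requests and special values are not modelled, and letting the last occurrence win is a choice made here.

```go
package main

import (
	"fmt"
	"strings"
)

// imexChannelRequests returns the IMEX channels requested through
// NVIDIA_IMEX_CHANNELS. When ignoreRequests is set, nothing is returned, so
// an environment variable alone cannot grant access to a channel.
func imexChannelRequests(env []string, ignoreRequests bool) []string {
	if ignoreRequests {
		return nil
	}
	var channels []string
	for _, kv := range env {
		name, value, ok := strings.Cut(kv, "=")
		if ok && name == "NVIDIA_IMEX_CHANNELS" && value != "" {
			channels = strings.Split(value, ",") // last occurrence wins
		}
	}
	return channels
}

func main() {
	env := []string{"PATH=/usr/bin", "NVIDIA_IMEX_CHANNELS=0,1"}
	fmt.Println(imexChannelRequests(env, false)) // [0 1]
	fmt.Println(imexChannelRequests(env, true))  // []
}
```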
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an EnableCDI method to the container engine config files and
updates the 'nvidia-ctk runtime configure' command to use this new method.
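A per-engine EnableCDI method can be sketched as an interface with one implementation per config format. The `Interface` and `configure` names are hypothetical, and the Docker implementation assumes that setting `features.cdi` to `true` in daemon.json enables CDI, which holds for recent Docker releases.

```go
package main

import "fmt"

// Interface is a HYPOTHETICAL subset of a container-engine config abstraction.
type Interface interface {
	EnableCDI()
}

// dockerConfig models daemon.json as a generic map.
type dockerConfig map[string]interface{}

// EnableCDI sets features.cdi = true, preserving any other feature settings.
func (c dockerConfig) EnableCDI() {
	features, ok := c["features"].(map[string]interface{})
	if !ok {
		features = map[string]interface{}{}
	}
	features["cdi"] = true
	c["features"] = features
}

// configure calls EnableCDI only when CDI is requested.
func configure(cfg Interface, enableCDI bool) {
	if enableCDI {
		cfg.EnableCDI()
	}
}

func main() {
	cfg := dockerConfig{}
	configure(cfg, true)
	fmt.Println(cfg) // map[features:map[cdi:true]]
}
```

Putting the engine-specific key into a method lets the configure command stay engine-agnostic.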
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
This change adds an allow-cuda-compat-libs-from-container feature flag
to the NVIDIA Container Toolkit config. This allows a user to opt-in
to the previous default behaviour of overriding certain driver
libraries with CUDA compat libraries from the container.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change passes the --no-cntlibs argument to the nvidia-container-cli
from the nvidia-container-runtime-hook to disable overwriting host
drivers with the compat libs from a container being started.
Note that this may be a breaking change for some applications.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change moves the containerized installer from nvidia-toolkit to
cmd/nvidia-ctk-installer to allow for its use in CI.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This removes the untested watch option from the
nvidia-ctk system create-dev-char-symlinks command.
This also removes the direct dependency on fsnotify.
Signed-off-by: Evan Lezar <elezar@nvidia.com>