This change fixes a bug when using just-in-time CDI spec generation for the
NVIDIA Container Runtime for specific devices (i.e. not 'all').
Instead of unconditionally using the default nvsandboxutils library -- leading
to errors due to undefined symbols -- we check whether the library can be
properly initialised before continuing.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change aligns the creation of symlinks under CDI with
the implementation in libnvidia-container. If the driver libraries
are present, the following symlinks are created:
* {{ .LibRoot }}/libcuda.so -> libcuda.so.1
* {{ .LibRoot }}/libnvidia-opticalflow.so -> libnvidia-opticalflow.so.1
* {{ .LibRoot }}/libGLX_indirect.so.0 -> libGLX_nvidia.so.{{ .Version }}
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds the vdpau subfolder to the paths searched
for driver libraries. This allows the libvdpau_nvidia.so.RM_VERSION
library to also be discovered.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a new discoverer for Sandboxutils to report the file
system paths and associated symbolic links using GetGpuResource and
GetFileContent APIs. Both GPU and MIG devices are supported. If the
Sandboxutils discoverer fails, the NVML discoverer is used to report
the file system information.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Sananya Majumder <sananyam@nvidia.com>
This change includes the usage of Sandboxutils GetDriverVersion API to
retrieve the CUDA driver version. If the library is not available on the
system or the API call fails for some other reason, it will fallback to
the NVML API to return the driver version.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Sananya Majumder <sananyam@nvidia.com>
This change adds an include-persistenced-socket flag to the
nvidia-ctk cdi generate command that ensures that a generated
specification includes the nvidia-persistenced socket if present on
the host.
Note that for mangement mode, these sockets are always included
if detected.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change ensures that the internal CDI representation includes
the persistenced socket if the include-persistenced-socket feature
flag is enabled.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Although the nvidia-ctk cdi generate command generates
specs with 644 permissions, the nvidia-ctk cdi transform
commands do not. This change sets the default permissions
to 600 instead of 644.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This changes adds an option to the toolkit container to allow
the dev root to be specified. This adds support for driver installations
where the driver files are at one root and the dev nodes are created
elsewhere -- most typically at /. This is the case, for example, for
GKE driver installations.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change ensures that nvnllib and devicelib are constructed
before these are used to construct infolib.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change creates an nvidia-cdi-hook binary for implementing
CDI hooks. This allows for these hooks to be separated from the
nvidia-ctk command which may, for example, require libnvidia-ml
to support other functionality.
The nvidia-ctk hook subcommand is maintained as an alias for the
time being to allow for existing CDI specifications referring to
this path to work as expected.
Signed-off-by: Avi Deitcher <avi@deitcher.net>
This change moves from using strings to useing device.Identifiers
as input for requesting CDI specifications for specific
devices.
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change ensures taht NVIDIA_VISIBLE_DEVICES=void is included in
generated CDI specs. This prevents the NVIDIA Container Runtime Hook
from injecting devices if NVIDIA_VISIBLE_DEVICES=all is set.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Since the `createContainer` `runc` hook runs with the environment that
the container's config.json specifies, the path to `ldconfig` may not be
easily resolvable if the host environment differs enough from the
container (e.g. on a NixOS host where all binaries are under hashed
paths in /nix/store with an Ubuntu container whose PATH contains
FHS-style paths such as /bin and /usr/bin). This change allows for
specifying exactly where ldconfig comes from.
Signed-off-by: Jared Baur <jaredbaur@fastmail.com>
This change renames the root transformer to indicate that it
operates on host paths and adds a container root transformer for
explicitly transforming container roots.
The transform.NewRootTransformer constructor still exists, but has
been marked as deprecated.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a driver root abstraction that defines how
libraries are located relative to the root. This allows for
this driver root to be constructed once and passed to discovery
code.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
A driverRoot defines both the driver library root and the
root for device nodes. In the case of preinstalled drivers or
the driver container, these are equal, but in cases such as GKE
they do not match. In this case, drivers are extracted to a folder
and devices exist at the root /.
The changes here add a devRoot option to the nvcdi API that allows the
parent of /dev to be specified explicitly.
Signed-off-by: Evan Lezar <elezar@nvidia.com>