This change fixes a bug when using just-in-time CDI spec generation for the
NVIDIA Container Runtime for specific devices (i.e. not 'all').
Instead of unconditionally using the default nvsandboxutils library -- leading
to errors due to undefined symbols -- we check whether the library can be
properly initialised before continuing.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change updates the create-symlinks hook to always evaluate
link paths in the container's root filesystem. In addition the
executable is updated to return an error if a link could not
be created.
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
This change ensures that we always treat the link path as a path
relative to the container root. Without this change, relative paths
in link paths would result in links being created relative to the
current working directory where the hook is executed.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
The hostRoot argument is always empty and not applicable to
how links are specified.
Links are specified by the paths in the container filesystem and as such
the only transform required to change the root is a join of the filepath.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Since hostRoot is always the empty string and we are changing the root in the
target path to /, the call to changeRoot is redundant.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change updates the ldcache locator to read the ldcache at construction
and use these contents to perform future lookups. Each of the cache
entries is resolved and lookups return the resolved target.
Assuming a symlink chain: libcuda.so -> libcuda.so.1 -> libcuda.so.VERSION, this
means that libcuda.so.VERSION will be returned for any of the following inputs:
libcuda.so, libcuda.so.1, libcuda.so.*.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a test for locating libcuda as a driver library.
This includes a failing test on a system where libcuda.so.1 is in
the ldcache, but not at one of the predefined library search paths.
A testdata folder with sample root filesystems is included to test
various combinations.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Since we use a map to keep track of the elements of a symlink chain,
the construction of the final list of located elements is not stable.
This change constructs the output as it is being discovered and as
such maintains the original ordering.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change removes support for specifying csv-filenames when
calling the create-symlinks hook. This is no longer required
as tegra-based systems generate hooks with `--link` arguments.
This also allows the hook to better serve as a reference implementation
for upstream projects wanting to implement a set of standard CDI hooks.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change ensures that the toolkit works with older
versions of the GPU Operator where runtime-specific envvars are
used to set options such as the config file location.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change allows IMEX channels to be requested using the
volume mount mechanism.
A mount from /dev/null to /var/run/nvidia-container-devices/imex/{{ .ChannelID }}
is equivalent to including {{ .ChannelID }} in the NVIDIA_IMEX_CHANNELS
environment variable.
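As an illustrative example (the image name and channel ID 0 are placeholders), a
request for IMEX channel 0 could look like:
docker run --runtime=nvidia -v /dev/null:/var/run/nvidia-container-devices/imex/0 ubuntu true
which is equivalent to setting NVIDIA_IMEX_CHANNELS=0 for the container.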
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change enables opt-in (off-by-default) features to be opted into.
These features can be toggled by name by specifying the (repeated)
--opt-in-features command line argument or as a comma-separated list
in the NVIDIA_CONTAINER_TOOLKIT_OPT_IN_FEATURES environment variable.
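For example (the feature names are placeholders):
NVIDIA_CONTAINER_TOOLKIT_OPT_IN_FEATURES="feature-a,feature-b"
would enable both hypothetical features.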
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change REMOVES the ability to set opt-in features
(e.g. GDS, MOFED, GDRCOPY) in the config file. The existing
per-container envvars are unaffected.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
add default runtime binary path to runtimes field of toolkit config toml
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
[no-relnote] Get low-level runtimes consistently
We ensure that we use the same low-level runtimes regardless
of the runtime engine being configured. This ensures consistent
behaviour.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Co-authored-by: Evan Lezar <elezar@nvidia.com>
address review comment
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
This change aligns the creation of symlinks under CDI with
the implementation in libnvidia-container. If the driver libraries
are present, the following symlinks are created:
* {{ .LibRoot }}/libcuda.so -> libcuda.so.1
* {{ .LibRoot }}/libnvidia-opticalflow.so -> libnvidia-opticalflow.so.1
* {{ .LibRoot }}/libGLX_indirect.so.0 -> libGLX_nvidia.so.{{ .Version }}
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change converts the toolkit installation logic to a go package
and invokes this installation over the go API instead of starting
this executable.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change refactors the toml config file handling for runtimes
such as containerd or crio. A toml.Loader is introduced that
encapsulates loading the required file.
This can be extended to allow other mechanisms for loading
the current config.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Since we expect .so.1 symlinks to be created by ldconfig, we don't
explicitly request these. This change removes the creation of
a libnvidia-allocator.so.1 -> libnvidia-allocator.so.RM_VERSION symlink
through the create-symlinks hook.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds the vdpau subfolder to the paths searched
for driver libraries. This allows the libvdpau_nvidia.so.RM_VERSION
library to also be discovered.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a safe pointer conversion to fix an
incompatible C pointer conversion, which caused build failures on some
architectures.
Signed-off-by: Sananya Majumder <sananyam@nvidia.com>
This change adds a new discoverer for Sandboxutils to report the file
system paths and associated symbolic links using GetGpuResource and
GetFileContent APIs. Both GPU and MIG devices are supported. If the
Sandboxutils discoverer fails, the NVML discoverer is used to report
the file system information.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Sananya Majumder <sananyam@nvidia.com>
This change includes the usage of Sandboxutils GetDriverVersion API to
retrieve the CUDA driver version. If the library is not available on the
system or the API call fails for some other reason, it will fallback to
the NVML API to return the driver version.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Sananya Majumder <sananyam@nvidia.com>
This change adds manual wrappers around the generated bindings to make
them into more user-friendly APIs for the caller. Some helper functions
are also added.
The APIs that are currently present in the library and implemented here
are:
nvSandboxUtilsInit
nvSandboxUtilsShutdown
nvSandboxUtilsGetDriverVersion
nvSandboxUtilsGetGpuResource
nvSandboxUtilsGetFileContent
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Sananya Majumder <sananyam@nvidia.com>
This change adds the internal bindings for Sandboxutils, some of which
have been automatically generated with the help of c-for-go. The format
followed is similar to what is used in go-nvml. These would need to be
regenerated when the header file is modified and new APIs are added.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Sananya Majumder <sananyam@nvidia.com>
This change adds a script and related files to generate the internal
bindings for Sandboxutils library with the help of c-for-go.
This can be used to update the bindings when the header file is modified
with reference to how they are generated with the Makefile in go-nvml.
Run: ./update-bindings.sh
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Sananya Majumder <sananyam@nvidia.com>
With the move to dependabot to update the libnvidia-container
submodule it is no longer required to have checks in place that
ensure that this is up to date when building.
In addition, when re-building old commits in CI, for example, we explicitly
do NOT want to have this check.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change enables opt-in (off-by-default) features to be opted into.
These features can be toggled by name by specifying the (repeated)
--opt-in-feature command line argument or as a comma-separated list
in the NVIDIA_CONTAINER_TOOLKIT_OPT_IN_FEATURES environment variable.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an include-persistenced-socket flag to the
nvidia-ctk cdi generate command that ensures that a generated
specification includes the nvidia-persistenced socket if present on
the host.
Note that for management mode, these sockets are always included
if detected.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change ensures that the internal CDI representation includes
the persistenced socket if the include-persistenced-socket feature
flag is enabled.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change skips the injection of the nvidia-persistenced socket by
default.
An include-persistenced-socket feature flag is added to allow the
injection of this socket to be explicitly requested.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change updates the create-symlinks hook to check whether
link paths resolve in the container's filesystem. In addition
the executable is updated to return an error if a link could
not be created.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change ensures that we do not unnecessarily print warnings for
runtimes where these configs are not applicable.
This removes the following warnings:
WARN[0000] Ignoring runtime-config-override flag for docker
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds logic to extract the golang version from a Dockerfile.
This allows us to trigger golang updates using dependabot.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change updates the logic to populate the options for the
nvidia runtime configs added to containerd or crio from a default runtime
if this is specified and a runc entry is not found.
This allows the default runtime values for settings such as SystemdCgroup
to be applied correctly.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change moves most of the logging for the NVIDIA Container runtime
from Info to Debug or Trace -- especially in the case where no
modifications are required.
This removes excessive logging.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change makes the following changes:
* Allows the toolkit.pid path to be specified
* Creates the toolkit.pid file at /run/nvidia/toolkit/toolkit.pid by default
* Handles failures to remove the /run/nvidia/toolkit folder
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change ensures that the created /etc/ld.so.conf.d file
has a higher priority to ensure that the injected libraries
take precedence over non-compat libraries.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Tools such as oc mirror do not support the provenance metadata
added to the image manifests with newer docker buildx versions.
This change disables the addition of provenance information.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Although the nvidia-ctk cdi generate command generates
specs with 644 permissions, the nvidia-ctk cdi transform
commands do not. This change sets the default permissions
to 644 instead of 600.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change renames the nvidia-ctk system create-device-nodes
flag driver-root to root. This makes it clearer that this is
used to load the kernel modules and is not specific to the
user-mode driver installation.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an option to the toolkit container to allow
the dev root to be specified. This adds support for driver installations
where the driver files are at one root and the dev nodes are created
elsewhere -- most typically at /. This is the case, for example, for
GKE driver installations.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change ensures that nvmllib and devicelib are constructed
before these are used to construct infolib.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change creates an nvidia-cdi-hook binary for implementing
CDI hooks. This allows for these hooks to be separated from the
nvidia-ctk command which may, for example, require libnvidia-ml
to support other functionality.
The nvidia-ctk hook subcommand is maintained as an alias for the
time being to allow for existing CDI specifications referring to
this path to work as expected.
Signed-off-by: Avi Deitcher <avi@deitcher.net>
This allows settings such as:
nvidia-ctk config --set nvidia-container-runtime.runtimes=crun:runc
to be applied correctly.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change moves from using strings to using device.Identifiers
as input for requesting CDI specifications for specific
devices.
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
When running nvidia-ctk on a system that uses a custom XDG_DATA_DIRS
environment variable value, the configuration files for `glvnd`,
`vulkan`, and `egl` fail to get passed through from the host to the
container. Reading from XDG_DATA_DIRS instead of hardcoding the default
value allows for finding said files so they can be mounted in the
container.
Signed-off-by: Jared Baur <jaredbaur@fastmail.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a features config that allows
individual features to be toggled at a global level. Each feature can (by default)
be controlled by an environment variable.
The GDS, MOFED, NVSWITCH, and GDRCOPY features are examples of such features.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change includes an arm64 arch check when installing golang.
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change ensures that NVIDIA_VISIBLE_DEVICES=void is included in
generated CDI specs. This prevents the NVIDIA Container Runtime Hook
from injecting devices if NVIDIA_VISIBLE_DEVICES=all is set.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change removes the additional libnvidia-container0=0.10.0+jetpack dependency
that was introduced for Tegra-based systems. These have since been migrated to
CDI-based direct injection using the NVIDIA Container Runtime.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a --create-device-nodes option to the toolkit config CLI.
Most notably, this allows the creation of control devices to be skipped
when CDI spec generation is enabled.
Currently values of "", "node", and "control" are supported and can be set
via the command line flag or the CREATE_DEVICE_NODES environment variable.
The default value of CREATE_DEVICE_NODES=control will trigger the creation
of control device nodes. Setting this envvar to include the (comma-separated)
strings of "" or "none" will disable device node creation regardless of
whether other supported strings are included.
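For example (illustrative), device node creation can be disabled by setting:
CREATE_DEVICE_NODES=none
or by passing --create-device-nodes=none on the command line.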
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change ensures that CLI tools that require the path to the
driver root accept both the NVIDIA_DRIVER_ROOT and DRIVER_ROOT
environment variables in addition to the --driver-root command
line argument.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change removes the centos8-x86_64 and centos8-aarch64 pipeline jobs.
These packages are no longer used since centos7 packages are used instead.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Instead of relying only on Experimental mode, the docker daemon
config requires that CDI is an opt-in feature.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Allow for customizing the path to ldconfig and call ldconfig with explicit cache/config paths
See merge request nvidia/container-toolkit/container-toolkit!525
Since the `update-ldcache` hook uses the host's `ldconfig`, the default
cache and config files configured on the host will be used. If those
defaults differ from what nvidia-ctk expects it to be (/etc/ld.so.cache
and /etc/ld.so.conf, respectively), then the hook will fail. This change
makes the call to ldconfig explicit in which cache and config files are
being used.
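Illustratively, the resulting invocation is along the lines of:
ldconfig -f /etc/ld.so.conf -C /etc/ld.so.cache
where -f and -C are the standard glibc ldconfig options for selecting the config
and cache files.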
Signed-off-by: Jared Baur <jaredbaur@fastmail.com>
Since the `createContainer` `runc` hook runs with the environment that
the container's config.json specifies, the path to `ldconfig` may not be
easily resolvable if the host environment differs enough from the
container (e.g. on a NixOS host where all binaries are under hashed
paths in /nix/store with an Ubuntu container whose PATH contains
FHS-style paths such as /bin and /usr/bin). This change allows for
specifying exactly where ldconfig comes from.
Signed-off-by: Jared Baur <jaredbaur@fastmail.com>
This change adds crun as a configured low-level runtime.
Note that runc is still preferred and will be used if present on the
system.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
For users running the nvidia-container-runtime it would be useful
to determine the runtime mode used from the logs directly instead
of relying on other log messages as signals. This change ensures
that an explicitly selected mode is also logged instead of only
when mode=auto.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Using `WithDevRoot` on Tegra platforms was incorrectly setting
`driverRoot`, fix it so that it correctly sets `devRoot`.
Signed-off-by: Jared Baur <jaredbaur@fastmail.com>
Refactor the engine.Interface such that the Set() API does not return an extraneous error
See merge request nvidia/container-toolkit/container-toolkit!515
This change adds a --relative-to option to the nvidia-ctk transform root
command. This defaults to "host" maintaining the existing behaviour.
If --relative-to=container is specified, the root transform is applied to
container paths in the CDI specification instead of host paths.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change renames the root transformer to indicate that it
operates on host paths and adds a container root transformer for
explicitly transforming container roots.
The transform.NewRootTransformer constructor still exists, but has
been marked as deprecated.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change switches to using the reflect package to determine
the type of config options instead of inferring the type from the
Toml data structure.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds support for an NVIDIA_NVSWITCH environment variable.
When set to `enabled` this triggers the injection of all available
/dev/nvidia-nvswitch* device nodes.
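For example (the image name is a placeholder):
docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all -e NVIDIA_NVSWITCH=enabled ubuntu sh -c 'ls /dev/nvidia-nvswitch*'
should show the injected nvswitch device nodes in the container.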
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a driver root abstraction that defines how
libraries are located relative to the root. This allows for
this driver root to be constructed once and passed to discovery
code.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Instead of relying solely on a static config, we resolve the path
to ldconfig. The path is checked for existence and a .real suffix is preferred.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change renames NewGraphicsDiscoverer to NewDRMNodesDiscoverer and
instead calls NewGraphicsMountsDiscoverer explicitly when constructing
a graphics modifier.
This avoids the import of config.Config into the discover package
which leads to a transitive dependency on toml-specifics and
requires that the vendor/github.com/pelletier/ package
be vendored in to consumers.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
A driverRoot defines both the driver library root and the
root for device nodes. In the case of preinstalled drivers or
the driver container, these are equal, but in cases such as GKE
they do not match. In this case, drivers are extracted to a folder
and devices exist at the root /.
The changes here add a devRoot option to the nvcdi API that allows the
parent of /dev to be specified explicitly.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
In some cases we might get a permission error when trying to chmod -
most likely this is due to something beyond our control,
like the whole `/dev` being mounted.
Do not fail container creation in this case.
Due to losing control of the program after `exec()`-ing the `chmod(1)` program
and therefore not being able to handle errors,
refactor to use the `chmod(2)` syscall instead of `exec()`-ing the `chmod(1)` program.
Fixes: #143
Signed-off-by: Ievgen Popovych <jmennius@gmail.com>
This change skips the update of the ldcache in the container if it
doesn't exist. Instead, the -N flag is used to only create the
relevant symlinks.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Since we allow pattern inputs for locating symlinks we could have
multiple matches. The error being checked for is resolved by the deduplication.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
These changes make library lookups more robust. The core change is that
library lookups now first check a set of predefined locations before checking
the ldcache. This also handles cases where an ldcache is not available more
gracefully.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change allows CDI devices to be requested as mounts in the
container. This enables their use in environments such as kind
where environment variables or annotations cannot be used.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change refactors the use of the symlink filter to make it extensible.
A blocked filter can be set on the Tegra CSV discoverer to ensure that the correct
symlink libraries are filtered out. Here, globs can be used to select multiple libraries,
and a **/ prefix on the globs indicates that the pattern that follows is only applied to
the filename of the symlink entry in the CSV file.
A --csv.ignore-pattern command line argument is added to the nvidia-ctk cdi generate
command that allows this to be set.
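For example (the pattern shown is purely illustrative):
nvidia-ctk cdi generate --csv.ignore-pattern='**/libnvidia-egl-*.so*'
would skip symlink entries whose filename matches libnvidia-egl-*.so*.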
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change improves the testability of the CSV discoverer.
This is done by adding injection points for mocks for library discovery and
symlink resolution.
Note that this highlights a bug in the current implementation where the
library filter causes valid symlinks to be skipped.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a "required" option to the new toml config
that controls whether a default config is returned or not.
This is useful from the NVIDIA Container Runtime Hook, where
/run/driver/nvidia/etc/nvidia-container-runtime/config.toml
is checked before the standard path.
This fixes a bug where the default config was always applied
when this config was not used.
See https://github.com/NVIDIA/nvidia-container-toolkit/issues/106
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a UsesNVGPUModule function that checks whether the nvgpu
kernel module is used by NVML. This allows for more robust detection of
Tegra-based platforms where libnvidia-ml.so can be used to enumerate the
iGPU.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change updates go-nvlib to include logic to skip NVIDIA PCI-E
devices where the name or class id is not known.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change renames the csv.library-search-path option to
library-search-path so as to be more generally applicable in
future. Note that the option is still only applied in csv mode.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change simplifies the nvidia-ctk config default command.
By default it now outputs the default config to STDOUT, and can
optionally output this to file.
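For example, the default config could be written to the standard location with a
redirect such as:
nvidia-ctk config default > /etc/nvidia-container-runtime/config.toml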
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change introduces a config.Toml type that is used as the base for
config file processing and manipulation. This ensures that configs --
including commented values -- can be handled consistently.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Migrate to internal/config.Config structs for the NVIDIA Container Runtime Hook config.
See merge request nvidia/container-toolkit/container-toolkit!463
This change ensures that the Config structs from internal.Config
are used for the NVIDIA Container Runtime Hook config too.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change removes installation of the oci-nvidia-hook files.
These files conflict with CDI use in runtimes that support it.
The use of the hook should be considered deprecated on these platforms.
If a hook is required, the
nvidia-ctk runtime configure --config-mode=oci-hook
command should be used to create the hook file(s).
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change extends the nvidia-ctk runtime configure command
with a --config-mode=oci-hook that creates an OCI hook json file.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change ensures that the centos7 and ubuntu18.04 packages are
published to the generic rpm and deb repos, respectively.
All other packages except the centos8-ppc64le packages are skipped
as these use cases are covered by the generic packages.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
In order to properly handle systems with both iGPU and dGPU
drivers included, we skip "sym" mount specifications which
refer to .so or .so.[1-9] files.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change explicitly generates a CDI specification from
the supplied CSV files when cdi mode is detected. This
ensures consistent behaviour on Tegra-based
systems.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
If the config.toml has an empty root specified, this could be
passed to the NVIDIA Container CLI through the --root flag
which causes argument parsing to fail. This change only
adds the --root flag if the config option is specified
and is non-empty.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change generates CDI specifications for Tegra devices
with the nvidia.com/gpu=0 name by default. The type-index
naming strategy is also supported and will generate a device
with the name nvidia.com/gpu=gpu0.
The uuid naming strategy will raise an error if selected.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Since the incoming OCI spec has already been parsed and used to
construct a CUDA image representation, pass this to the CSV
modifier constructor instead of re-creating an image representation.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change sets the default CDI spec dirs at a config level instead
of when a CDI runtime modifier is constructed. This makes this setting
consistent with other options such as the nvidia-ctk path.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Include Shared Compiler Library (libnvidia-gpucomp.so) in the list of compute libraries.
See merge request nvidia/container-toolkit/container-toolkit!442
Path to locate the GSP firmware is explicitly set to /lib/firmware/nvidia.
Users may choose to install the GSP firmware in alternate locations where
the kernel would look for firmware on the root filesystem.
Add locate functionality which looks for the GSP firmware files in the
same location as the kernel would
(https://docs.kernel.org/driver-api/firmware/fw_search_path.html).
The paths searched in order are:
- path described in /sys/module/firmware_class/parameters/path
- /lib/firmware/updates/UTS_RELEASE/
- /lib/firmware/updates/
- /lib/firmware/UTS_RELEASE/
- /lib/firmware/
Signed-off-by: Evan Lezar <elezar@nvidia.com>
The debian and rpm packages are updated to trigger the generation
of a default config if no config exists at the expected location.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change ensures that the nvidia-ctk config default command
generates a config file that is compatible with the official documentation
to, for example, disable cgroups in the NVIDIA Container CLI.
This requires that whitespace around comments is stripped before outputting the
contents.
This also adds an option to load a config and modify it in-place instead. This can
be triggered as a post-install step, for example.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
The stable-dind image is out of date and has not been updated for 3 years.
This change updates to the latest dind image.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change uses the actual discovered path of nvidia-smi when
creating a symlink to the binary on WSL2 platforms.
This ensures that cases where multiple driver store paths are
detected are supported.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change splits the functionality in the internal system package
into two packages: one for dealing with devices and one for dealing
with kernel modules. This removes ambiguity around the meaning of
driver / device roots in each case.
In each case, a root can be specified where device nodes are created
or kernel modules loaded.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds the --nvidia-runtime-dir as a command line
argument when configuring container runtimes in the toolkit container.
This removes the need to set it via the command line.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a --create-device-nodes option to the
nvidia-ctk system create-dev-char-symlinks command to create
device nodes. This currently only creates control device nodes.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
These changes add a --load-kernel-modules option to the
nvidia-ctk system commands. If specified the NVIDIA kernel modules
(nvidia, nvidia-uvm, and nvidia-modeset) are loaded before any
operations on device nodes are performed.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change uses the installed NVIDIA Container Runtime Hook wrapper
as the path in the applied config. This prevents conflicts with other
installations of the NVIDIA Container Toolkit.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an nvidia-container-runtime-hook.path config option
to allow the path used for the prestart hook to be overridden. This
is useful in cases where multiple NVIDIA Container Toolkit installations
are present.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change switches to generating an OCI runtime hook to create
individual symlinks instead of processing a CSV file in the hook.
This allows for better reuse of the logic generating CDI
specifications, for example.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a symlinks.Resolve function for resolving symlinks and
updates usages across the code to make use of it.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change updates go-nvlib to ensure that non-migcapable GPUs
are skipped when generating CDI specifications for MIG devices.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change allows the csv mode option to be specified in the
nvidia-ctk cdi generate command and adds a --csv.file option
that can be repeated to specify the CSV files to be processed.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change ensures that libcuda.so can be located on systems
where no patch version is specified in the driver version.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
The nvcdi API is extended to allow for merged device options to
be specified. If any options are specified, then a merged device
is generated.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
These changes remove the use of discover.Config which was used
to pass the driver root and the nvidiaCTK path in some cases.
Instead, the nvidiaCTKPath is resolved at the beginning of runtime
invocation to ensure that this is valid at all points where it is
used.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Since the default configuration is now platform specific,
there is no need to install specific versions as part of
the package installation.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Also include manifest.txt with, for example, version
info when extracting packages from the packaging image.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a GetDefaultConfigToml function to the config package.
This function returns the default config in the form of raw TOML
including comments. This is useful for generating a default config at
installation time, with platform-specific differences codified.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This ensures that the artifacts associated with a particular
release version are preserved along with the container
images that are used as operands for this version.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a CLI command to generate a default config.
This config checks the host operating system to apply specific
modifications that were previously captured in static config
files.
These include:
* select /sbin/ldconfig or /sbin/ldconfig.real depending on which exists on the host
* set the user to allow device access on SUSE-based systems
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change renames the struct for storing CLI flag values to options
(instead of config) to avoid a conflict with the config package.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Generate CDI specifications with 644 permissions to allow non-root clients to consume them
See merge request nvidia/container-toolkit/container-toolkit!381
By default, temporary files are created with permissions 600 and
this means that the files created when updating the ldcache are
not readable in non-root containers.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Support building nvidia-docker and nvidia-container-runtime as dist-independent packages
See merge request nvidia/container-toolkit/container-toolkit!379
In order to add the libnvidia-container0 packages to our ubuntu18.04
kitmaker archive, a workaround was added that downloaded these packages
before constructing the archive. Since the packages have now been
published -- and will not change -- this workaround is no longer needed.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change ensures that the nvidia-docker2 and nvidia-container-runtime
components are not built and distributed for patch releases.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an nvidia-container-runtime.runtimes config option.
If this is unset no changes are made to the config and the default values are used. This
allows this setting to be overridden in cases where this is required. One such example is
crio where crun is set as the default runtime.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an nvidia-ctk system create-device-nodes command for
creating NVIDIA device nodes. Currently this is limited to control devices
(nvidia-uvm, nvidia-uvm-tools, nvidia-modeset, nvidiactl).
A --dry-run mode is included for outputting commands that would be executed and
the driver root can be specified.
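An illustrative invocation:
nvidia-ctk system create-device-nodes --dry-run
prints the commands that would be executed without creating any device nodes.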
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an nvidia-container-runtime.modes.cdi.annotation-prefixes config
option that defaults to cdi.k8s.io/. This allows the annotation prefixes parsed
for CDI devices to be overridden in cases where CDI support in container engines such
as containerd or crio needs to be overridden.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change allows nvcdi.New to return an error in addition to the
constructed library instead of panicking.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
A simplified CDI spec has no duplicate entities in any single set of container edits.
Furthermore, container edits defined at a spec-level are not included in the container
edits for a device.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Since we relied on finding libcuda.so in the LDCache to determine both the CUDA
version and the expected directory for the driver libraries, the generation of the
management CDI specifications fails in containers where the LDCache has not been updated.
This change falls back to searching a set of predefined paths instead when the lookup of
libcuda.so in the cache fails.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
CDI generation modes such as management and wsl don't require
NVML. This change removes the top-level instantiation of nvmllib
and replaces it with an instantiation in the nvml CDI spec generation
code.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change prefers (non-symlink) sockets at /run over /var/run for
nvidia-persistenced and nvidia-fabricmanager sockets.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds .cdi and .legacy mode-specific runtimes to the list of
runtimes supported by the operator. These are also installed as
part of the NVIDIA Container Toolkit.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change switches to using nvidia-container-runtime.experimental as the
wrapper name over nvidia-container-runtime-experimental. This is consistent
with upcoming mode-specific binaries.
The wrapper is created at nvidia-container-runtime.experimental.real.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change allows the nvidia-container-runtime.modes.cdi.default-kind
to be set in the toolkit-container.
The NVIDIA_CONTAINER_RUNTIME_MODES_CDI_DEFAULT_KIND envvar is used.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
The following changes are made:
* The default-cdi-kind config option is used to convert an envvar entry to a fully-qualified device name, as illustrated in the example after this list
* If annotation devices exist, these are used instead of the envvar devices.
* The `all` device is no longer treated as a special case and MUST exist in the CDI spec.
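For example (illustrative; assuming a default-cdi-kind of nvidia.com/gpu), an
NVIDIA_VISIBLE_DEVICES entry of 0 is converted to the fully-qualified device name
nvidia.com/gpu=0.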
Signed-off-by: Evan Lezar <elezar@nvidia.com>
These changes add support for generating a management spec to the nvcdi API.
A management spec consists of a single CDI device (`all`) which includes all expected
NVIDIA device nodes, driver libraries, binaries, and IPC sockets.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Fix handling of envvars in toolkit container which modify the NVIDIA Container Runtime config
See merge request nvidia/container-toolkit/container-toolkit!317
This change generates device folder permission hooks per device instead of
at a spec level. This ensures that the hook is not injected for a device that
does not have any nested device nodes.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
These changes add a wsl discovery mode to the nvidia-ctk cdi generate command.
If wsl mode is enabled, the driver store for the available devices is used as
the source for discovered entities.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a --discovery-mode flag to the nvidia-ctk cdi generate
command and plumbs this through to the CDI API.
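For example (illustrative):
nvidia-ctk cdi generate --discovery-mode=wsl
generates a specification using the wsl discovery mode.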
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change copies dxcore.h and dxcore.c from libnvidia-container to
allow for the driver store path to be queried. Modifications are made
to dxcore to remove the code associated with checking the components
in the driver store path.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change allows the nvidia-container-runtime.mode option to be set
by the toolkit container.
This is controlled by the --nvidia-container-runtime-mode command line
argument and the NVIDIA_CONTAINER_RUNTIME_MODE envvar.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Newer drivers have split the GSP firmware into multiple files so a simple match
against gsp.bin in the firmware directory is no longer possible. This patch
adds globbing capabilities to match any GSP firmware files of the form gsp*.bin
and mount them all into the container.
Signed-off-by: Kevin Klues <kklues@nvidia.com>
This change adds an nvcdi package that exposes a basic API for
CDI spec generation. This is used from the nvidia-ctk cdi generate
command and can be consumed by DRA implementations and the device plugin.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an nvidia-container-runtime.cdi executable that
overrides the runtime mode from the config to "cdi".
Signed-off-by: Evan Lezar <elezar@nvidia.com>
The version for RPM release candidates has the form `1.13.0-0.1.rc.1-1` whereas debian packages have the form `1.13.0~rc.1-1`.
Note that since the `~` is handled in [the same way](https://docs.fedoraproject.org/en-US/packaging-guidelines/Versioning/#_handling_non_sorting_versions_with_tilde_dot_and_caret) as for Debian packages, there does not seem to be a specific reason for this and dealing with multiple version strings in our entire pipeline adds complexity.
This change aligns the package versioning for rpm packages with Debian packages.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This makes the intent of the command line argument clearer since this
relates specifically to the root where the NVIDIA driver is installed.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change ensures that the update-ldcache hook is created in a manner
consistent with other nvidia-ctk hooks ensuring that a full path is
used.
Without this change the update-ldcache hook on Tegra-based systems had an
invalid path.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
If this is not done, the default config which sets the nvidia-ctk.path
option as "nvidia-ctk" will result in an invalid OCI spec if a hook is
injected. This change ensures that the path used is always an absolute
path as required by the hook spec.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change simplifies how Kitmaker archives are constructed.
Currently only centos8 and ubuntu18.04 packages are included.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change uses the `index` mode for the --device-name-strategy when
generating CDI specifications by default. This generates device names
such as nvidia.com/gpu=0 or nvidia.com/gpu=1:0 by default.
Note that this requires a CDI spec version of 0.5.0 and for consumers
(e.g. podman) that are only compatible with older versions one of the
other strategies (`type-index` or `uuid`) should be used instead to
generate a v0.3.0 or v0.4.0 specification.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a --create-all mode to the create-dev-char-symlinks hook.
This mode creates all POSSIBLE symlinks to device nodes for regular and cap
devices, with the number of GPUs inferred from the PCI device information.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a --watch option to the create-dev-char-symlinks hook. This
installs an fsnotify watcher that creates symlinks for ADDED device nodes under
/dev/char.
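An illustrative invocation:
nvidia-ctk hook create-dev-char-symlinks --watch
starts the watcher and creates symlinks under /dev/char as device nodes are added.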
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an nvidia-ctk hook create-dev-char-symlinks
subcommand that creates symlinks to device nodes (as required by
systemd) under /dev/char.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a --device-name-strategy flag for generating a CDI
specification. This allows a CDI spec to be generated with the following
names used for devices:
* type-index: gpu0 and mig0:1
* index: 0 and 0:1
* uuid: GPU and MIG UUIDs
Note that the use of 'index' generates a v0.5.0 CDI specification since
this relaxes the restriction on the device names.
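For example (illustrative):
nvidia-ctk cdi generate --device-name-strategy=type-index
generates devices with names such as gpu0 and mig0:1.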
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change ensures that the first match of an executable in the path
is returned instead of a list of candidates. This prevents a CDI spec,
for example, from containing multiple entries for a single executable
(e.g. nvidia-smi).
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change uses functionality from the CDI package to determine
the minimum required CDI spec version. This allows for a spec with
the widest compatibility to be specified.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change implements the discovery of versioned driver libraries
by reusing the mounts and update-ldcache discoverers used for, for example,
CSV file discovery. This allows the container paths to be correctly generated
without requiring specific manipulation.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an --nvidia-ctk-path option to the nvidia-ctk cdi generate
command. This ensures that the executable path for the generated
hooks can be specified consistently.
Since the NVIDIA Container Runtime already allows for the executable
path to be specified in the config, the utility code to update the
LDCache and create other nvidia-ctk hooks is also updated.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change refactors the generation of CDI specifications
to use discoverers and generate the CDI specifications from these
discoverers. This allows for better reuse.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
The HostPath field was added in the v0.5.0 CDI specification.
The cdi package uses strict unmarshalling when loading specs
from file causing failures for unexpected fields.
Since the behaviour for HostPath == "" and HostPath == Path are
equivalent, we clear HostPath if it is equal to Path to ensure
compatibility with the widest range of specs.
This allows, for example, a v0.4.0 spec to be generated as required.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change replaces the `--json` flag of the nvidia-ctk cdi generate
command with a --format flag that accepts a string format of either
json or yaml.
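For example (illustrative):
nvidia-ctk cdi generate --format=json
outputs the generated specification as JSON.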
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change extends the support for multiple envvars when
specifying swarm resources to consider ALL of the specified
environment variables instead of the first match.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Inject DRM device nodes into containers when Graphics or Display capabilities are requested
See merge request nvidia/container-toolkit/container-toolkit!235
This change adds the discovery of DRM devices associated with requested
devices. This means that the /dev/dri/card* and /dev/dri/renderD*
devices associated with each requested NVIDIA GPU are injected into
the container and that the /dev/dri/by-path symlinks associated with
these devices are created in the container.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds support for filtering entities by specifying a filter.
This can be used, for example, to check whether a mount or device
has a particular property and removing it from the set of discovered
entities if it does not.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a Devices abstraction to the CUDA image utilities. This
allows for checking whether a device is selected, for example.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Add nvidia-ctk hook chmod command to set permissions and ensure permissions of `/dev/nvidia-caps` is set
See merge request nvidia/container-toolkit/container-toolkit!232
This change generates one or more createContainer hooks for ensuring
that subfolders in /dev have the required permissions in the container.
As an example, a user requires read permissions on the /dev/nvidia-caps folder
in addition to including the specific caps devices under this folder.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an nvidia-ctk hook chmod command that can be used
to update the permissions for paths in the container.
This prepends the container root to the paths to allow these to be
updated by runtime executables.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change ensures that the CDI spec mounts the ipc sockets with the
noexec flag to allow these to function in rootless mode with podman.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change updates the docker config update logic for simplicity.
This also allows for the API to match the crio update code.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This adds support for updating crio configs (instead of installing hooks)
and adds crio support to the nvidia-ctk runtime configure command.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change updates the ordering of internal pipeline dependencies to
ensure that the correct rules are applied.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change includes meta devices (e.g. /dev/nvidiactl) in the
generated CDI spec. Missing device nodes are ignored.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change generates a v0.4.0 CDI spec instead of a v0.5.0 spec.
This allows older versions of podman, for example, to be used.
This requires that the device names do not start with a numeric character
and that the HostPath for a device is unspecified.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change allows the swarm-resource config option to specify a
comma-separated list of environment variables instead of a single
environment variable.
The first environment variable matched is considered and other
environment variables are ignored.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds functionality to generate CDI specifications
for all devices detected on the system. A specification containing
all GPUs and MIG devices is generated. All libraries on the host
ldcache that have an NVIDIA Driver Version suffix are included as
are the required binaries and IPC sockets.
A hook (based on the nvidia-ctk hook subcommand) to update the ldcache
in the container for the libraries being injected is also added to the
CDI specification.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change allows the NVIDIA Container Runtime to inject vulkan
loaders and libraries by modifying the OCI runtime specification.
This allows vulkan applications to run in containers without
additional modifications.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a Locator that can be used to locate libraries.
If library names are specified, the ldcache is searched otherwise
symlinks are resolved.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change ensures that a more concrete error is provided by the NVIDIA
Container Runtime if the NVIDIA Container Runtime hook cannot be
located.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change allows the destination / root to be set as the
first positional argument OR as a command line flag. This
allows for the GPU Operator to transition to a case where
only the flag / envvar is used.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This reuses the Dockerfile for yum-based rpm distros (centos, amazonlinux)
instead of maintaining two files with the same contents.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change splits the nvidia-container-toolkit package into the top-level package and
an nvidia-container-toolkit-base package.
The nvidia-container-toolkit-base package allows the NVIDIA Container Runtime and
NVIDIA Container Toolkit CLI to be installed on systems without requiring that the
NVIDIA Container Runtime Hook and the transitive dependencies included in the NVIDIA
Container Library and NVIDIA Container CLI also be installed.
This allows the runtime to be used on systems where the CSV or CDI mode of the runtime
is used exclusively.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a modifier that injects the tegra platform files
* /etc/nv_tegra_release
* /sys/devices/soc0/family
allowing these files to be used for platform detection in a containerized
context such as the GPU device plugin.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change allows the
* accept-nvidia-visible-devices-envvar-when-unprivileged
* accept-nvidia-visible-devices-as-volume-mounts
options to be set in the toolkit-container. These are controlled
by command line flags or the following environment variables:
* ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED
* ACCEPT_NVIDIA_VISIBLE_DEVICES_AS_VOLUME_MOUNTS
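For example (the value shown is an assumption for illustration):
ACCEPT_NVIDIA_VISIBLE_DEVICES_AS_VOLUME_MOUNTS=true
enables volume-mount-based device requests when set on the toolkit container.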
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a root member to the mounts type that is used to
perform most of the lookups for files and devices. This allows
for consistent handling of relative paths.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change improves the error message when invoking the NVIDIA
Runtime Hook in non-legacy mode. This should guide users to specifying
the --runtime=nvidia flag when using docker.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
The Relative method added to the Locator interface was
not correctly implemented in the file type. The root was
never set when instantiating the object.
This change removes this method from the interface and the file
type, switching to a local implementation in the mounts type
instead.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Since the creation of symlinks may include other libraries / folders
the ldcache should be updated AFTER the symlinks are created.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a `runtime configure` command to the nvidia-ctk CLI. This
command is currently limited to configuring the docker config on the
system by modifying the daemon.json config file associated with docker.
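A typical invocation (flags shown as in current documentation and may differ by
version) is:
sudo nvidia-ctk runtime configure --runtime=docker
which updates the docker daemon.json in place.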
Signed-off-by: Evan Lezar <elezar@nvidia.com>
In preparation for adding a command to the nvidia-ctk CLI to modify
the docker config, this change refactors load, update, and flush logic
from the toolkit container docker CLI to an internal package.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an nvidia-container-runtime.modes.cdi.spec-dirs
config option that allows the default spec dirs to be overridden.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change reuses the code that checks for the existing NVIDIA
Container Runtime hook to ensure that both nvidia-container-toolkit
and nvidia-container-runtime-hook are detected.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change renames the nvidia-container-toolkit executable
to nvidia-container-runtime-hook. Here nvidia-container-toolkit
is created as a symlink to nvidia-container-runtime-hook.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
For rc releases we allow nvidia-container-toolkit versions
to not match libnvidia-container versions. This change ensures
that only changed packages are released.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This mirrors what is done in cri-o and allows for device nodes
from, for example, the driver container to be injected into a
container at /dev instead of <ROOT>/dev
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a charDevices discoverer and uses this
for CSV, GDS, and MOFED discovery. Internally the discoverer
is a "mounts" discoverer with a charDevice locator.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Instead of creating a set of discoverers per file, this change creates
a discoverer per type by first concatenating the mount specifications
from all files. This will allow all device nodes, for example, to
be treated as a single device.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This adds a Relative function to the Locator interface and uses
this to determine the host and container paths for located files
(and devices). This ensures that the root (e.g. the nvidia driver
root) is stripped from the container path.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change creates GDS and MOFED modifiers and adds them to the
modifier created for the selected runtime mode if the NVIDIA_GDS
and NVIDIA_MOFED envvars are set to "enabled", respectively.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change uses modifier composition and the discoverModifier to
refactor the existing CSV modifier.
This change adds a discoverModifier to the internal/modifier package and
refactors the CSV modifier to use this abstraction.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change prevents errors when downloading ubuntu repos on
amd64 architectures. The `stable` images were last pushed
2 years ago.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
- Fix bug when using just-in-time CDI spec generation
- Check for valid paths in create-symlinks hook
## v1.17.0-rc.2
- Fix bug in locating libcuda.so from ldcache
- Fix bug in sorting of symlink chain
- Remove unsupported print-ldcache command
- Remove csv-filename support from create-symlinks
### Changes in the Toolkit Container
- Fallback to `crio-status` if `crio status` does not work when configuring the crio runtime
## v1.17.0-rc.1
- Allow IMEX channels to be requested as volume mounts
- Fix typo in error message
- Add disable-imex-channel-creation feature flag
- Add -z,lazy to LDFLAGS
- Add imex channels to management CDI spec
- Add support to fetch current container runtime config from the command line.
- Add creation of select driver symlinks to CDI spec generation.
- Remove support for config overrides when configuring runtimes.
- Skip explicit creation of libnvidia-allocator.so.1 symlink
- Add vdpau as a driver library search path.
- Add support for using libnvsandboxutils to generate CDI specifications.
### Changes in the Toolkit Container
- Allow opt-in features to be selected when deploying the toolkit-container.
- Bump CUDA base image version to 12.6.2
- Remove support for config overrides when configuring runtimes.
### Changes in libnvidia-container
- Add no-create-imex-channels command line option.
## v1.16.2
- Exclude libnvidia-allocator from graphics mounts. This fixes a bug that leaks mounts when a container is started with bi-directional mount propagation.
- Use empty string for default runtime-config-override. This removes a redundant warning for runtimes (e.g. Docker) where this is not applicable.
### Changes in the Toolkit Container
- Bump CUDA base image version to 12.6.0
### Changes in libnvidia-container
- Add no-gsp-firmware command line option
- Add no-fabricmanager command line option
- Add no-persistenced command line option
- Skip directories and symlinks when mounting libraries.
## v1.16.1
- Fix bug with processing errors during CDI spec generation for MIG devices
## v1.16.0
- Promote v1.16.0-rc.2 to v1.16.0
### Changes in the Toolkit Container
- Bump CUDA base image version to 12.5.1
## v1.16.0-rc.2
- Use relative path to locate driver libraries
- Add RelativeToRoot function to Driver
- Inject additional libraries for full X11 functionality
- Extract options from default runtime if runc does not exist
- Avoid using map pointers as maps are always passed by reference
- Reduce logging for the NVIDIA Container runtime
- Fix bug in argument parsing for logger creation
## v1.16.0-rc.1
- Support vulkan ICD files directly in a driver root. This allows for the discovery of vulkan files in GKE driver installations.
- Increase priority of ld.so.conf.d config file injected into container. This ensures that injected libraries are preferred over libraries present in the container.
- Set default CDI spec permissions to 644. This fixes permission issues when using the `nvidia-ctk cdi transform` functions.
- Add `dev-root` option to `nvidia-ctk system create-device-nodes` command.
- Fix location of `libnvidia-ml.so.1` when a non-standard driver root is used. This enables CDI spec generation when using the driver container on a host.
- Recalculate minimum required CDI spec version on save.
- Move `nvidia-ctk hook` commands to a separate `nvidia-cdi-hook` binary. The same subcommands are supported.
- Use `:` as an `nvidia-ctk config --set` list separator. This fixes a bug when trying to set config options that are lists.
- [toolkit-container] Bump CUDA base image version to 12.5.0
- [toolkit-container] Allow the path to `toolkit.pid` to be specified directly.
- [toolkit-container] Remove provenance information from image manifests.
- [toolkit-container] Add `dev-root` option when configuring the toolkit. This adds support for GKE driver installations.
## v1.15.0
* Remove `nvidia-container-runtime` and `nvidia-docker2` packages.
* Use `XDG_DATA_DIRS` environment variable when locating config files such as graphics config files.
* Add support for v0.7.0 Container Device Interface (CDI) specification.
* Add `--config-search-path` option to `nvidia-ctk cdi generate` command. These paths are used when locating driver files such as graphics config files.
* Use D3DKMTEnumAdapters3 to enumerate adapters on WSL2 if available.
* Add support for v1.2.0 OCI Runtime specification.
* Explicitly set `NVIDIA_VISIBLE_DEVICES=void` in generated CDI specifications. This prevents the NVIDIA Container Runtime from making additional modifications.
* [libnvidia-container] Use D3DKMTEnumAdapters3 to enumerate adapters on WSL2 if available.
* [toolkit-container] Bump CUDA base image version to 12.4.1
## v1.15.0-rc.4
* Add a `--spec-dir` option to the `nvidia-ctk cdi generate` command. This allows specs outside of `/etc/cdi` and `/var/run/cdi` to be processed.
* Add support for extracting device major number from `/proc/devices` if `nvidia` is used as a device name over `nvidia-frontend`.
* Allow multiple device naming strategies for `nvidia-ctk cdi generate` command. This allows a single
CDI spec to be generated that includes GPUs by index and UUID.
* Set the default `--device-name-strategy` for the `nvidia-ctk cdi generate` command to `[index, uuid]`.
* Remove `libnvidia-container0` jetpack dependency included for legacy Tegra-based systems.
* Add `NVIDIA_VISIBLE_DEVICES=void` to generated CDI specifications.
* [toolkit-container] Remove centos7 image. The ubi8 image can be used on all RPM-based platforms.
* [toolkit-container] Bump CUDA base image version to 12.3.2
## v1.15.0-rc.3
* Fix bug in `nvidia-ctk hook update-ldcache` where default `--ldconfig-path` value was not applied.
## v1.15.0-rc.2
* Extend the `runtime.nvidia.com/gpu` CDI kind to support full-GPUs and MIG devices specified by index or UUID.
* Fix bug when specifying `--dev-root` for Tegra-based systems.
* Log explicitly requested runtime mode.
* Remove package dependency on libseccomp.
* Added detection of libnvdxgdmal.so.1 on WSL2
* Use devRoot to resolve MIG device nodes.
* Fix bug in determining default nvidia-container-runtime.user config value on SUSE-based systems.
* Add `crun` to the list of configured low-level runtimes.
* Added support for `--ldconfig-path` to `nvidia-ctk cdi generate` command.
* Fix `nvidia-ctk runtime configure --cdi.enabled` for Docker.
* Add discovery of the GDRCopy device (`gdrdrv`) if the `NVIDIA_GDRCOPY` environment variable of the container is set to `enabled`
* [toolkit-container] Bump CUDA base image version to 12.3.1.
## v1.15.0-rc.1
* Skip update of ldcache in containers without ldconfig. The .so.SONAME symlinks are still created.
* Normalize ldconfig path on use. This automatically adjusts the ldconfig setting applied to ldconfig.real on systems where this exists.
* Include `nvidia/nvoptix.bin` in list of graphics mounts.
* Include `vulkan/icd.d/nvidia_layers.json` in list of graphics mounts.
* Add support for `--library-search-paths` to `nvidia-ctk cdi generate` command.
* Add support for injecting /dev/nvidia-nvswitch* devices if the NVIDIA_NVSWITCH=enabled envvar is specified.
* Added support for `nvidia-ctk runtime configure --enable-cdi` for the `docker` runtime. Note that this requires Docker >= 25.
* Fixed bug in `nvidia-ctk config` command when using `--set`. The types of applied config options are now applied correctly.
* Add `--relative-to` option to `nvidia-ctk transform root` command. This controls whether the root transformation is applied to host or container paths.
* Added automatic CDI spec generation when the `runtime.nvidia.com/gpu=all` device is requested by a container.
* [libnvidia-container] Fix device permission check when using cgroupv2 (fixes #227)
## v1.14.3
* [toolkit-container] Bump CUDA base image version to 12.2.2.
## v1.14.2
* Fix bug on Tegra-based systems where symlinks were not created in containers.
* Add --csv.ignore-pattern command line option to nvidia-ctk cdi generate command.
## v1.14.1
* Fixed bug where the contents of `/etc/nvidia-container-runtime/config.toml` are ignored by the NVIDIA Container Runtime Hook.
* [libnvidia-container] Use libelf.so on RPM-based systems due to removed mageia repositories hosting pmake and bmake.
## v1.14.0
* Promote v1.14.0-rc.3 to v1.14.0
## v1.14.0-rc.3
* Added support for generating OCI hook JSON file to `nvidia-ctk runtime configure` command.
* Remove installation of OCI hook JSON from RPM package.
* Refactored config for `nvidia-container-runtime-hook`.
* Added a `nvidia-ctk config` command which supports setting config options using a `--set` flag.
* Added `--library-search-path` option to `nvidia-ctk cdi generate` command in `csv` mode. This allows folders where
libraries are located to be specified explicitly.
* Updated go-nvlib to support devices which are not present in the PCI device database. This allows the creation of dev/char symlinks on systems with such devices installed.
* Added `UsesNVGPUModule` info function for more robust platform detection. This is required on Tegra-based systems where libnvidia-ml.so is also supported.
* [toolkit-container] Set `NVIDIA_VISIBLE_DEVICES=void` to prevent injection of NVIDIA devices and drivers into the NVIDIA Container Toolkit container.
## v1.14.0-rc.2
* Fix bug causing incorrect nvidia-smi symlink to be created on WSL2 systems with multiple driver roots.
* Remove dependency on coreutils when installing package on RPM-based systems.
* Create output folders if required when running `nvidia-ctk runtime configure`
* Generate default config as post-install step.
* Added support for detecting GSP firmware at custom paths when generating CDI specifications.
* Added logic to skip the extraction of image requirements if `NVIDIA_DISABLE_REQUIRES` is set to `true`.
* [libnvidia-container] Include Shared Compiler Library (libnvidia-gpucomp.so) in the list of compute libraries.
* [toolkit-container] Ensure that common envvars have higher priority when configuring the container engines.
* [toolkit-container] Bump CUDA base image version to 12.2.0.
* [toolkit-container] Remove installation of nvidia-experimental runtime. This is superseded by the NVIDIA Container Runtime in CDI mode.
## v1.14.0-rc.1
* Add support for updating containerd configs to the `nvidia-ctk runtime configure` command.
* Create file in `etc/ld.so.conf.d` with permissions `644` to support non-root containers.
* Generate CDI specification files with `644` permissions to allow rootless applications (e.g. podman)
* Add `nvidia-ctk cdi list` command to show the known CDI devices.
* Add support for generating merged devices (e.g. `all` device) to the nvcdi API.
* Use `*.*` pattern to locate libcuda.so when generating a CDI specification to support platforms where a patch version is not specified.
* Update go-nvlib to skip devices that are not MIG capable when generating CDI specifications.
* Fix bug in creation of `/dev/char` symlinks by failing operation if kernel modules are not loaded.
* Add option to load kernel modules when creating device nodes
* Add option to create device nodes when creating `/dev/char` symlinks
* [libnvidia-container] Support OpenSSL 3 with the Encrypt/Decrypt library
* [toolkit-container] Allow same envars for all runtime configs
## v1.13.1
* Update `update-ldcache` hook to only update ldcache if it exists.
* Update `update-ldcache` hook to create `/etc/ld.so.conf.d` folder if it doesn't exist.
* Fix failure when libcuda cannot be located during XOrg library discovery.
* Fix CDI spec generation on systems that use `/etc/alternatives` (e.g. Debian)
## v1.13.0
* Promote 1.13.0-rc.3 to 1.13.0
## v1.13.0-rc.3
* Only initialize NVML for modes that require it when running `nvidia-ctk cdi generate`.
* Prefer /run over /var/run when locating nvidia-persistenced and nvidia-fabricmanager sockets.
* Fix the generation of CDI specifications for management containers when the driver libraries are not in the LDCache.
* Add transformers to deduplicate and simplify CDI specifications.
* Generate a simplified CDI specification by default. This means that entities in the common edits in a spec are not included in device definitions.
* Also return an error from the nvcdi.New constructor instead of panicking.
* Detect XOrg libraries for injection and CDI spec generation.
* Add `nvidia-ctk system create-device-nodes` command to create control devices.
* Add `nvidia-ctk cdi transform` command to apply transforms to CDI specifications.
* Add `--vendor` and `--class` options to `nvidia-ctk cdi generate`
* [libnvidia-container] Fix segmentation fault when RPC initialization fails.
* [libnvidia-container] Build centos variants of the NVIDIA Container Library with static libtirpc v1.3.2.
* [libnvidia-container] Remove make targets for fedora35 as the centos8 packages are compatible.
* [toolkit-container] Add `nvidia-container-runtime.modes.cdi.annotation-prefixes` config option that allows the CDI annotation prefixes that are read to be overridden.
* [toolkit-container] Create device nodes when generating CDI specification for management containers.
* [toolkit-container] Add `nvidia-container-runtime.runtimes` config option to set the low-level runtime for the NVIDIA Container Runtime
## v1.13.0-rc.2
* Don't fail chmod hook if paths are not injected
* Only create `by-path` symlinks if CDI devices are actually requested.
* Fix possible blank `nvidia-ctk` path in generated CDI specifications
* Fix error in postun scriptlet on RPM-based systems
* Only check `NVIDIA_VISIBLE_DEVICES` for environment variables if no annotations are specified.
* Add `cdi.default-kind` config option for constructing fully-qualified CDI device names in CDI mode
* Add support for `accept-nvidia-visible-devices-envvar-unprivileged` config setting in CDI mode
* Add `nvidia-container-runtime-hook.skip-mode-detection` config option to bypass mode detection. This allows `legacy` and `cdi` mode, for example, to be used at the same time.
* Add support for generating CDI specifications for GDS and MOFED devices
* Ensure CDI specification is validated on save when generating a spec
* Rename `--discovery-mode` argument to `--mode` for `nvidia-ctk cdi generate`
* [libnvidia-container] Fix segfault on WSL2 systems
* [toolkit-container] Add `--cdi-enabled` flag to toolkit config
* [toolkit-container] Install `nvidia-ctk` from toolkit container
* [toolkit-container] Use installed `nvidia-ctk` path in NVIDIA Container Toolkit config
* [toolkit-container] Bump CUDA base images to 12.1.0
* [toolkit-container] Set `nvidia-ctk` path in the
* [toolkit-container] Add `cdi.k8s.io/*` to set of allowed annotations in containerd config
* [toolkit-container] Generate CDI specification for use in management containers
* [toolkit-container] Install experimental runtime as `nvidia-container-runtime.experimental` instead of `nvidia-container-runtime-experimental`
* [toolkit-container] Install and configure mode-specific runtimes for `cdi` and `legacy` modes
## v1.13.0-rc.1
* Include MIG-enabled devices as GPUs when generating CDI specification
* Fix missing NVML symbols when running `nvidia-ctk` on some platforms [#49]
* Add CDI spec generation for WSL2-based systems to `nvidia-ctk cdi generate` command
* Add `auto` mode to `nvidia-ctk cdi generate` command to automatically detect a WSL2-based system over a standard NVML-based system.
* Add mode-specific (`.cdi` and `.legacy`) NVIDIA Container Runtime binaries for use in the GPU Operator
* Discover all `gsp*.bin` GSP firmware files when generating CDI specification.
* Align `.deb` and `.rpm` release candidate package versions
* Remove `fedora35` packaging targets
* [libnvidia-container] Include all `gsp*.bin` firmware files if present
* [libnvidia-container] Align `.deb` and `.rpm` release candidate package versions
* [toolkit-container] Install `nvidia-container-toolkit-operator-extensions` package for mode-specific executables.
* [toolkit-container] Allow `nvidia-container-runtime.mode` to be set when configuring the NVIDIA Container Toolkit
## v1.12.0
* Promote `v1.12.0-rc.5` to `v1.12.0`
* Rename the `nvidia-ctk cdi generate` `--root` flag to `--driver-root` to better indicate intent
* [libnvidia-container] Add nvcubins.bin to DriverStore components under WSL2
* [toolkit-container] Bump CUDA base images to 12.0.1
## v1.12.0-rc.5
* Fix bug where the `nvidia-ctk` path was not properly resolved. This causes failures to run containers when the runtime is configured in `csv` mode or if `NVIDIA_DRIVER_CAPABILITIES` includes `graphics` or `display` (e.g. `all`).
## v1.12.0-rc.4
* Generate a minimum CDI spec version for improved compatibility.
* Add `--device-name-strategy` options to the `nvidia-ctk cdi generate` command that can be used to control how device names are constructed.
* Set default for CDI device name generation to `index` to generate device names such as `nvidia.com/gpu=0` or `nvidia.com/gpu=1:0` by default.
## v1.12.0-rc.3
* Don't fail if by-path symlinks for DRM devices do not exist
* Replace the --json flag with a --format [json|yaml] flag for the nvidia-ctk cdi generate command
* Ensure that the CDI output folder is created if required
* When generating a CDI specification use a blank host path for devices to ensure compatibility with the v0.4.0 CDI specification
* Add injection of Wayland JSON files
* Add GSP firmware paths to generated CDI specification
* Add --root flag to nvidia-ctk cdi generate command
## v1.12.0-rc.2
* Inject Direct Rendering Manager (DRM) devices into a container using the NVIDIA Container Runtime
* Improve logging of errors from the NVIDIA Container Runtime
* Improve CDI specification generation to support rootless podman
* Use `nvidia-ctk cdi generate` to generate CDI specifications instead of `nvidia-ctk info generate-cdi`
* [libnvidia-container] Skip creation of existing files when these are already mounted
## v1.12.0-rc.1
* Add support for multiple Docker Swarm resources
* Improve injection of Vulkan configurations and libraries
* Add `nvidia-ctk info generate-cdi` command to generate a CDI specification for available devices
* [libnvidia-container] Include NVVM compiler library in compute libs
## v1.11.0
* Promote v1.11.0-rc.3 to v1.11.0
## v1.11.0-rc.3
* Build fedora35 packages
* Introduce an `nvidia-container-toolkit-base` package for better dependency management
* Fix removal of `nvidia-container-runtime-hook` on RPM-based systems
* Inject platform files into container on Tegra-based systems
* [toolkit-container] Update CUDA base images to 11.7.1
* [libnvidia-container] Preload libgcc_s.so.1 on arm64 systems
## v1.11.0-rc.2
* Allow `accept-nvidia-visible-devices-*` config options to be set by toolkit container
* [libnvidia-container] Fix bug where LDCache was not updated when the `--no-pivot-root` option was specified
## v1.11.0-rc.1
* Add discovery of GPUDirect Storage (`nvidia-fs*`) devices if the `NVIDIA_GDS` environment variable of the container is set to `enabled`
* Add discovery of MOFED Infiniband devices if the `NVIDIA_MOFED` environment variable of the container is set to `enabled`
* Fix bug in CSV mode where libraries listed as `sym` entries in mount specification are not added to the LDCache.
* Rename `nvidia-container-toolkit` executable to `nvidia-container-runtime-hook` and create `nvidia-container-toolkit` as a symlink to `nvidia-container-runtime-hook` instead.
* Add `nvidia-ctk runtime configure` command to configure the Docker config file (e.g. `/etc/docker/daemon.json`) for use with the NVIDIA Container Runtime.
The NVIDIA Container Toolkit consists of the following artifacts:
- The NVIDIA Container Toolkit container
- Packages for debian-based systems
- Packages for rpm-based systems
# Release Process Checklist:
- [ ] Create a release PR:
- [ ] Run the `./hack/prepare-release.sh` script to update the version in all the needed files. This also creates a [release issue](https://github.com/NVIDIA/cloud-native-team/issues?q=is%3Aissue+is%3Aopen+label%3Arelease)
- [ ] Run the `./hack/generate-changelog.sh` script to generate a draft changelog and update `CHANGELOG.md` with the changes.
- [ ] Create a PR from the created `bump-release-{{ .VERSION }}` branch.
- [ ] Merge the release PR
- [ ] Tag the release and push the tag to the `internal` mirror:
log.Panicf("Invalid value for config option '%v'; %v (supported: %v)\n", configName, config.SupportedDriverCapabilities, allSupportedDriverCapabilities.String())
}
return config, nil
}
// getConfigOption returns the toml config option associated with the
log.Panicln("invoking the NVIDIA Container Runtime Hook directly (e.g. specifying the docker --gpus flag) is not supported. Please use the NVIDIA Container Runtime instead.")
log.Panicln("invoking the NVIDIA Container Runtime Hook directly (e.g. specifying the docker --gpus flag) is not supported. Please use the NVIDIA Container Runtime (e.g. specify the --runtime=nvidia flag) instead.")
@@ -85,3 +85,126 @@ Alternatively the NVIDIA Container Runtime can be set as the default runtime for
}
}
```
## Environment variables (OCI spec)
Each environment variable maps to a command-line argument for `nvidia-container-cli` from [libnvidia-container](https://github.com/NVIDIA/libnvidia-container).
These variables are already set in our [official CUDA images](https://hub.docker.com/r/nvidia/cuda/).
### `NVIDIA_VISIBLE_DEVICES`
This variable controls which GPUs will be made accessible inside the container.
#### Possible values
* `0,1,2`, `GPU-fef8089b` …: a comma-separated list of GPU UUID(s) or index(es).
* `all`: all GPUs will be accessible; this is the default value in our container images.
* `none`: no GPU will be accessible, but driver capabilities will be enabled.
* `void` or *empty* or *unset*: `nvidia-container-runtime` will have the same behavior as `runc`.
**Note**: When running on a MIG-capable device, the following values will also be available:
* `0:0,0:1,1:0`, `MIG-GPU-fef8089b/0/1` …: a comma-separated list of MIG Device UUID(s) or index(es).
Where the MIG device indices have the form `<GPU Device Index>:<MIG Device Index>` as seen in the example output:
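The example output itself is not reproduced here. As a hedged illustration of how these values are used with the legacy `--runtime=nvidia` setup (the image tag is an arbitrary example and assumes MIG has been configured on GPU 1):

```bash
# Illustrative only: expose GPU 0 and the first MIG device on GPU 1 by index.
docker run --rm --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=0,1:0 \
  nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi -L
```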
By default, all commands output to `STDOUT`, but specifying the `--output` flag writes the config to the specified file.
### Generate CDI specifications
The [Container Device Interface (CDI)](https://tags.cncf.io/container-device-interface) provides
a vendor-agnostic mechanism to make arbitrary devices accessible in containerized environments. To allow NVIDIA devices to be
used in these environments, the NVIDIA Container Toolkit CLI includes functionality to generate a CDI specification for the
available NVIDIA GPUs in a system.
To generate the CDI specification for the available devices, run the following command:
```bash
nvidia-ctk cdi generate
```
The default is to print the specification to STDOUT; a filename can be specified using the `--output` flag.
The specification will contain device entries as follows (where applicable):
* An `nvidia.com/gpu=gpu{INDEX}` device for each non-MIG-enabled full GPU in the system
* An `nvidia.com/gpu=mig{GPU_INDEX}:{MIG_INDEX}` device for each MIG-device in the system
* A special device called `nvidia.com/gpu=all` which represents all available devices.
For example, to generate the CDI specification in the default location where CDI-enabled tools such as `podman`, `containerd`, `cri-o`, or the NVIDIA Container Runtime can be configured to load it, the following command can be run:
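The command itself is not shown in this excerpt; it is presumably of the following form, with `/etc/cdi/nvidia.yaml` as the conventional output location:

```bash
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```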
Usage: "Generate CDI specifications for use with CDI-enabled runtimes",
Before: func(c *cli.Context) error {
    return m.validateFlags(c, &opts)
},
Action: func(c *cli.Context) error {
    return m.run(c, &opts)
},
}
c.Flags = []cli.Flag{
    &cli.StringSliceFlag{
        Name: "config-search-path",
        Usage: "Specify the path to search for config files when discovering the entities that should be included in the CDI specification.",
        Destination: &opts.configSearchPaths,
    },
    &cli.StringFlag{
        Name: "output",
        Usage: "Specify the file to output the generated CDI specification to. If this is '' the specification is output to STDOUT",
        Destination: &opts.output,
    },
    &cli.StringFlag{
        Name: "format",
        Usage: "The output format for the generated spec [json | yaml]. This overrides the format defined by the output file extension (if specified).",
        Value: spec.FormatYAML,
        Destination: &opts.format,
    },
    &cli.StringFlag{
        Name: "mode",
        Aliases: []string{"discovery-mode"},
        Usage: "The mode to use when discovering the available entities. One of [auto | nvml | wsl]. If mode is set to 'auto' the mode will be determined based on the system configuration.",
        Value: nvcdi.ModeAuto,
        Destination: &opts.mode,
    },
    &cli.StringFlag{
        Name: "dev-root",
        Usage: "Specify the root where `/dev` is located. If this is not specified, the driver-root is assumed.",
        Destination: &opts.devRoot,
    },
    &cli.StringSliceFlag{
        Name: "device-name-strategy",
        Usage: "Specify the strategy for generating device names. If this is specified multiple times, the devices will be duplicated for each strategy. One of [index | uuid | type-index]",
    },
    &cli.StringFlag{
        Name: "driver-root",
        Usage: "Specify the NVIDIA GPU driver root to use when discovering the entities that should be included in the CDI specification.",
        Destination: &opts.driverRoot,
    },
    &cli.StringSliceFlag{
        Name: "library-search-path",
        Usage: "Specify the path to search for libraries when discovering the entities that should be included in the CDI specification.\n\tNote: This option only applies to CSV mode.",
        Destination: &opts.librarySearchPaths,
    },
    &cli.StringFlag{
        Name: "nvidia-cdi-hook-path",
        Aliases: []string{"nvidia-ctk-path"},
        Usage: "Specify the path to use for the nvidia-cdi-hook in the generated CDI specification. " +
            "If not specified, the PATH will be searched for `nvidia-cdi-hook`. " +
            "NOTE: That if this is specified as `nvidia-ctk`, the PATH will be searched for `nvidia-ctk` instead.",
        Destination: &opts.nvidiaCDIHookPath,
    },
    &cli.StringFlag{
        Name: "ldconfig-path",
        Usage: "Specify the path to use for ldconfig in the generated CDI specification",
        Destination: &opts.ldconfigPath,
    },
    &cli.StringFlag{
        Name: "vendor",
        Aliases: []string{"cdi-vendor"},
        Usage: "the vendor string to use for the generated CDI specification.",
        Value: "nvidia.com",
        Destination: &opts.vendor,
    },
    &cli.StringFlag{
        Name: "class",
        Aliases: []string{"cdi-class"},
        Usage: "the class string to use for the generated CDI specification.",
        Value: "gpu",
        Destination: &opts.class,
    },
    &cli.StringSliceFlag{
        Name: "csv.file",
        Usage: "The path to the list of CSV files to use when generating the CDI specification in CSV mode.",
// config defines the options that can be set for the CLI through config files,
// environment variables, or command line config
type config struct {
    dryRun bool
    runtime string
    configFilePath string
    configSource string
    mode string
    hookFilePath string
    runtimeConfigOverrideJSON string
    nvidiaRuntime struct {
        name string
        path string
        hookPath string
        setAsDefault bool
    }
    // cdi-specific options
    cdi struct {
        enabled bool
    }
}
func (m command) build() *cli.Command {
    // Create a config struct to hold the parsed environment variables or command line flags
    config := config{}
    // Create the 'configure' command
    configure := cli.Command{
        Name: "configure",
        Usage: "Add a runtime to the specified container engine",
        Before: func(c *cli.Context) error {
            return m.validateFlags(c, &config)
        },
        Action: func(c *cli.Context) error {
            return m.configureWrapper(c, &config)
        },
    }
    configure.Flags = []cli.Flag{
        &cli.BoolFlag{
            Name: "dry-run",
            Usage: "update the runtime configuration as required but don't write changes to disk",
            Destination: &config.dryRun,
        },
        &cli.StringFlag{
            Name: "runtime",
            Usage: "the target runtime engine; one of [containerd, crio, docker]",
            Value: defaultRuntime,
            Destination: &config.runtime,
        },
        &cli.StringFlag{
            Name: "config",
            Usage: "path to the config file for the target runtime",
            Destination: &config.configFilePath,
        },
        &cli.StringFlag{
            Name: "config-mode",
            Usage: "the config mode for runtimes that support multiple configuration mechanisms",
            Destination: &config.mode,
        },
        &cli.StringFlag{
            Name: "config-source",
            Usage: "the source to retrieve the container runtime configuration; one of [command, file]",
            Destination: &config.configSource,
            Value: defaultConfigSource,
        },
        &cli.StringFlag{
            Name: "oci-hook-path",
            Usage: "the path to the OCI runtime hook to create if --config-mode=oci-hook is specified. If no path is specified, the generated hook is output to STDOUT.\n\tNote: The use of OCI hooks is deprecated.",
            Destination: &config.hookFilePath,
        },
        &cli.StringFlag{
            Name: "nvidia-runtime-name",
            Usage: "specify the name of the NVIDIA runtime that will be added",
            Value: defaultNVIDIARuntimeName,
            Destination: &config.nvidiaRuntime.name,
        },
        &cli.StringFlag{
            Name: "nvidia-runtime-path",
            Aliases: []string{"runtime-path"},
            Usage: "specify the path to the NVIDIA runtime executable",
            Value: defaultNVIDIARuntimeExecutable,
            Destination: &config.nvidiaRuntime.path,
        },
        &cli.StringFlag{
            Name: "nvidia-runtime-hook-path",
            Usage: "specify the path to the NVIDIA Container Runtime hook executable",
            Value: defaultNVIDIARuntimeHookExpecutablePath,
            Destination: &config.nvidiaRuntime.hookPath,
        },
        &cli.BoolFlag{
            Name: "nvidia-set-as-default",
            Aliases: []string{"set-as-default"},
            Usage: "set the NVIDIA runtime as the default runtime",
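For reference, a minimal usage sketch of the `configure` command defined above, assuming the Docker engine; the flag names are taken from the excerpt, and the engine restart is a typical follow-up step rather than part of the command itself:

```bash
# Sketch only: register the NVIDIA runtime with Docker and make it the default.
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
sudo systemctl restart docker
```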
Usage: "A utility to create symlinks to possible /dev/nv* devices in /dev/char",
Before: func(c *cli.Context) error {
    return m.validateFlags(c, &cfg)
},
Action: func(c *cli.Context) error {
    return m.run(c, &cfg)
},
}
c.Flags = []cli.Flag{
    &cli.StringFlag{
        Name: "dev-char-path",
        Usage: "The path at which the symlinks will be created. Symlinks will be created as `DEV_CHAR`/MAJOR:MINOR where MAJOR and MINOR are the major and minor numbers of a corresponding device node.",
        Value: defaultDevCharPath,
        Destination: &cfg.devCharPath,
        EnvVars: []string{"DEV_CHAR_PATH"},
    },
    &cli.StringFlag{
        Name: "driver-root",
        Usage: "The path to the driver root. `DRIVER_ROOT`/dev is searched for NVIDIA device nodes.",
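A hedged usage sketch for the utility whose flags are shown above; the `nvidia-ctk system create-dev-char-symlinks` subcommand name and the driver root path are assumptions for a driver-container installation:

```bash
# Sketch only: create /dev/char symlinks for NVIDIA device nodes under a driver-container root.
sudo nvidia-ctk system create-dev-char-symlinks --driver-root=/run/nvidia/driver
```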
RUN GOPATH=/artifacts go install github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-container-runtime.experimental@${NVIDIA_CONTAINER_RUNTIME_EXPERIMENTAL_VERSION}
RUN GOPATH=/artifacts go install github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-container-runtime.experimental@${NVIDIA_CONTAINER_RUNTIME_EXPERIMENTAL_VERSION}
# Hook for Project Atomic's fork of Docker: https://github.com/projectatomic/docker/tree/docker-1.13.1-rhel#add-dockerhooks-exec-custom-hooks-for-prestartpoststop-containerspatch
# This might not be useful on Amazon Linux, but it's simpler to keep the RHEL
# and Amazon Linux packages identical.
COPY oci-nvidia-hook $DIST_DIR/oci-nvidia-hook
# Hook for libpod/CRI-O: https://github.com/containers/libpod/blob/v0.8.5/pkg/hooks/docs/oci-hooks.5.md
# Hook for Project Atomic's fork of Docker: https://github.com/projectatomic/docker/tree/docker-1.13.1-rhel#add-dockerhooks-exec-custom-hooks-for-prestartpoststop-containerspatch
COPY oci-nvidia-hook $DIST_DIR/oci-nvidia-hook
# Hook for libpod/CRI-O: https://github.com/containers/libpod/blob/v0.8.5/pkg/hooks/docs/oci-hooks.5.md
# Hook for Project Atomic's fork of Docker: https://github.com/projectatomic/docker/tree/docker-1.13.1-rhel#add-dockerhooks-exec-custom-hooks-for-prestartpoststop-containerspatch
COPY oci-nvidia-hook $DIST_DIR/oci-nvidia-hook
# Hook for libpod/CRI-O: https://github.com/containers/libpod/blob/v0.8.5/pkg/hooks/docs/oci-hooks.5.md
tar -czvf ${PACKAGE_ROOT}/nvidia-container-toolkit_${VERSION#v}_deb_amd64.tar.gz ${PACKAGE_ROOT}/packages/ubuntu18.04/amd64/*_${PACKAGE_VERSION}-1_amd64.deb
tar -czvf ${PACKAGE_ROOT}/nvidia-container-toolkit_${VERSION#v}_deb_arm64.tar.gz ${PACKAGE_ROOT}/packages/ubuntu18.04/arm64/*_${PACKAGE_VERSION}-1_arm64.deb
tar -czvf ${PACKAGE_ROOT}/nvidia-container-toolkit_${VERSION#v}_rpm_aarch64.tar.gz ${PACKAGE_ROOT}/packages/centos7/aarch64/*-${PACKAGE_VERSION}-1.aarch64.rpm
tar -czvf ${PACKAGE_ROOT}/nvidia-container-toolkit_${VERSION#v}_rpm_x86_64.tar.gz ${PACKAGE_ROOT}/packages/centos7/x86_64/*-${PACKAGE_VERSION}-1.x86_64.rpm
is_command() (
  command -v "$1" >/dev/null
)
hash_sha256() (
  TARGET=${1:-/dev/stdin}
  if is_command gsha256sum; then
    hash=$(gsha256sum "$TARGET") || return 1
    echo "$hash" | cut -d ' ' -f 1
  elif is_command sha256sum; then
    hash=$(sha256sum "$TARGET") || return 1
    echo "$hash" | cut -d ' ' -f 1
  elif is_command shasum; then
    hash=$(shasum -a 256 "$TARGET" 2>/dev/null) || return 1