This patch modifies the container toolkit installer, used by the
GPU operator, to use the new Go wrapper program.
Signed-off-by: Jean-Francois Roy <jeroy@nvidia.com>
Some platforms and Kubernetes distributions do not include a shell. This
patch replaces the shell wrapper scripts with a small Go program.
Signed-off-by: Jean-Francois Roy <jeroy@nvidia.com>
This change adds basic toolkit installation unit tests. This required
that the source for files be specified when installing to allow for
a testdata folder to be used.
This replaces the currently unused shell-based tests in /test/container.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change ensures that the toolkit works with older
versions of the GPU Operator where runtime-specific envvars are
used to set options such as the config file location.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change allows opt-in (off-by-default) features to be enabled.
These features can be toggled by name by specifying the (repeated)
--opt-in-features command line argument or as a comma-separated list
in the NVIDIA_CONTAINER_TOOLKIT_OPT_IN_FEATURES environment variable.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Add the default runtime binary path to the runtimes field of the toolkit config TOML
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
[no-relnote] Get low-level runtimes consistently
We ensure that we use the same low-level runtimes regardless
of the runtime engine being configured. This ensures consistent
behaviour.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Co-authored-by: Evan Lezar <elezar@nvidia.com>
address review comment
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
This change converts the toolkit installation logic to a Go package
and invokes the installation through the Go API instead of starting
a separate executable.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change refactors the TOML config file handling for runtimes
such as containerd or crio. A toml.Loader is introduced that
encapsulates loading the required file.
This can be extended to allow other mechanisms for
loading the current config.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
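As a rough illustration of the toml.Loader refactoring above, the sketch
below shows one way such a loader abstraction could look. The interface
shape, the fromFile and fromString types, and the choice to treat a missing
file as an empty config are all assumptions made for illustration; the
sketch returns raw bytes so that it needs no TOML library.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Loader is a hypothetical stand-in for the toml.Loader described above.
// It returns raw bytes here so that the sketch needs no TOML library.
type Loader interface {
	Load() ([]byte, error)
}

// fromFile loads the config from a path on disk.
type fromFile string

// Load reads the file. Treating a missing file as an empty config is an
// assumption made for this sketch.
func (f fromFile) Load() ([]byte, error) {
	data, err := os.ReadFile(string(f))
	if errors.Is(err, os.ErrNotExist) {
		return nil, nil
	}
	return data, err
}

// fromString serves fixed contents, standing in for "other mechanisms"
// of obtaining the current config.
type fromString string

// Load returns the fixed contents unchanged.
func (s fromString) Load() ([]byte, error) {
	return []byte(s), nil
}

func main() {
	// The file path is only an example location.
	loaders := []Loader{
		fromFile("/etc/containerd/config.toml"),
		fromString("version = 2\n"),
	}
	for _, l := range loaders {
		data, err := l.Load()
		fmt.Printf("%d bytes, err=%v\n", len(data), err)
	}
}
```

Callers depend only on the Loader interface, so a new source for the
config can be added without touching the code that consumes it.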
This change allows opt-in (off-by-default) features to be enabled.
These features can be toggled by name by specifying the (repeated)
--opt-in-feature command line argument or as a comma-separated list
in the NVIDIA_CONTAINER_TOOLKIT_OPT_IN_FEATURES environment variable.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
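The comma-separated form of the opt-in feature list described above could
be parsed roughly as follows. This is a minimal sketch: the
parseOptInFeatures helper is hypothetical, and trimming whitespace and
dropping empty or duplicate names are assumptions, not behaviour confirmed
by the commit.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseOptInFeatures splits a comma-separated list of feature names,
// trimming whitespace and dropping empty and duplicate entries.
// (Hypothetical helper; the normalisation rules are assumptions.)
func parseOptInFeatures(value string) []string {
	var features []string
	seen := make(map[string]bool)
	for _, name := range strings.Split(value, ",") {
		name = strings.TrimSpace(name)
		if name == "" || seen[name] {
			continue
		}
		seen[name] = true
		features = append(features, name)
	}
	return features
}

func main() {
	// The environment variable name is taken from the commit message.
	value := os.Getenv("NVIDIA_CONTAINER_TOOLKIT_OPT_IN_FEATURES")
	fmt.Println(parseOptInFeatures(value))
}
```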
This change updates the logic to populate the options for the
nvidia runtime configs added to containerd or crio from a default runtime
if this is specified and a runc entry is not found.
This allows the default runtime values for settings such as SystemdCgroup
to be applied correctly.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
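The fallback described above (use the runc entry if present, otherwise the
configured default runtime) can be sketched as below. The runtimeEntry type
and baseOptions helper are hypothetical simplifications, and BinaryName is
used only as an illustrative option key.

```go
package main

import "fmt"

// runtimeEntry is a simplified, hypothetical view of one runtime entry in a
// containerd or crio config; only the options table is modelled.
type runtimeEntry struct {
	Options map[string]interface{}
}

// baseOptions picks the options that a new nvidia runtime entry starts from:
// the runc entry if present, otherwise the configured default runtime.
func baseOptions(runtimes map[string]runtimeEntry, defaultRuntime string) map[string]interface{} {
	source, ok := runtimes["runc"]
	if !ok && defaultRuntime != "" {
		source, ok = runtimes[defaultRuntime]
	}
	options := make(map[string]interface{})
	if !ok {
		return options
	}
	// Copy so that later edits to the nvidia entry do not alter the source.
	for key, value := range source.Options {
		options[key] = value
	}
	return options
}

func main() {
	// No runc entry exists, so the default runtime (crun) supplies the options.
	runtimes := map[string]runtimeEntry{
		"crun": {Options: map[string]interface{}{"SystemdCgroup": true}},
	}
	options := baseOptions(runtimes, "crun")
	options["BinaryName"] = "/usr/bin/nvidia-container-runtime" // illustrative path
	fmt.Println(options)
}
```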
This change does the following:
* Allows the toolkit.pid path to be specified
* Creates the toolkit.pid file at /run/nvidia/toolkit/toolkit.pid by default
* Handles failures to remove the /run/nvidia/toolkit folder
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an option to the toolkit container to allow
the dev root to be specified. This adds support for driver installations
where the driver files are at one root and the dev nodes are created
elsewhere -- most typically at /. This is the case, for example, for
GKE driver installations.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change creates an nvidia-cdi-hook binary for implementing
CDI hooks. This allows for these hooks to be separated from the
nvidia-ctk command which may, for example, require libnvidia-ml
to support other functionality.
The nvidia-ctk hook subcommand is maintained as an alias for the
time being to allow for existing CDI specifications referring to
this path to work as expected.
Signed-off-by: Avi Deitcher <avi@deitcher.net>
This change adds a --create-device-nodes option to the toolkit config CLI.
Most notably, this allows the creation of control devices to be skipped
when CDI spec generation is enabled.
Currently the values "", "none", and "control" are supported; they can be set
via the command line flag or the CREATE_DEVICE_NODES environment variable.
The default value of CREATE_DEVICE_NODES=control triggers the creation
of control device nodes. If the comma-separated value includes "" or "none",
device node creation is disabled regardless of whether other supported
strings are included.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
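The CREATE_DEVICE_NODES semantics described above can be sketched as
follows. The shouldCreateControlNodes helper is hypothetical; trimming
whitespace and rejecting unknown values with an error are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// shouldCreateControlNodes interprets a comma-separated mode list.
// Any "" or "none" entry disables creation, even if "control" is present.
// (Hypothetical helper; rejecting unknown modes is an assumption.)
func shouldCreateControlNodes(value string) (bool, error) {
	sawNone, sawControl := false, false
	for _, mode := range strings.Split(value, ",") {
		switch strings.TrimSpace(mode) {
		case "", "none":
			sawNone = true
		case "control":
			sawControl = true
		default:
			return false, fmt.Errorf("unsupported device node mode: %q", mode)
		}
	}
	return sawControl && !sawNone, nil
}

func main() {
	// Fall back to the documented default when the variable is unset.
	value, ok := os.LookupEnv("CREATE_DEVICE_NODES")
	if !ok {
		value = "control"
	}
	create, err := shouldCreateControlNodes(value)
	fmt.Println(create, err)
}
```

Note that an explicitly empty variable is distinct from an unset one in
this sketch: the former disables creation, the latter uses the default.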
This change ensures that CLI tools that require the path to the
driver root accept both the NVIDIA_DRIVER_ROOT and DRIVER_ROOT
environment variables in addition to the --driver-root command
line argument.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
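One way the driver root lookup above might be resolved is sketched below.
The resolveDriverRoot helper is hypothetical, and both the precedence
(flag, then NVIDIA_DRIVER_ROOT, then DRIVER_ROOT) and the "/" fallback are
assumptions; the commit only states that all three inputs are accepted.

```go
package main

import (
	"fmt"
	"os"
)

// resolveDriverRoot returns the driver root from the flag value if set,
// otherwise from the first non-empty environment variable.
// (Hypothetical helper; precedence and the "/" default are assumptions.)
func resolveDriverRoot(flagValue string, getenv func(string) string) string {
	if flagValue != "" {
		return flagValue
	}
	for _, name := range []string{"NVIDIA_DRIVER_ROOT", "DRIVER_ROOT"} {
		if value := getenv(name); value != "" {
			return value
		}
	}
	return "/"
}

func main() {
	// An empty flag value stands in for --driver-root not being passed.
	fmt.Println(resolveDriverRoot("", os.Getenv))
}
```

Passing the lookup function as a parameter keeps the helper testable
without modifying the process environment.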
This change renames the root transformer to indicate that it
operates on host paths and adds a container root transformer for
explicitly transforming container roots.
The transform.NewRootTransformer constructor still exists, but has
been marked as deprecated.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Instead of relying solely on a static config, we resolve the path
to ldconfig. The path is checked for existence and a .real suffix is preferred.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change extends the nvidia-ctk runtime configure command
with a --config-mode=oci-hook that creates an OCI hook json file.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change splits the functionality in the internal system package
into two packages: one for dealing with devices and one for dealing
with kernel modules. This removes ambiguity around the meaning of
driver / device roots in each case.
In each case, a root can be specified where device nodes are created
or kernel modules loaded.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds --nvidia-runtime-dir as a command line
argument when configuring container runtimes in the toolkit container.
This removes the need to set it via the command line.
Signed-off-by: Evan Lezar <elezar@nvidia.com>