If the config.toml has an empty root specified, this could be
passed to the NVIDIA Container CLI through the --root flag
which causes argument parsing to fail. This change only
adds the --root flag if the config option is specified
and is non-empty.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change ensures that the nvidia-ctk config default command
generates a config file that is compatible with the official documentation
to, for example, disable cgroups in the NVIDIA Container CLI.
This requires that whitespace around comments is stripped before outputing the
contets.
This also adds an option to load a config and modify it in-place instead. This can
be triggered as a post-install step, for example.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This changes splits the functionality in the internal system package
into two packages: one for dealing with devices and one for dealing
with kernel modules. This removes ambiguity around the meaning of
driver / device roots in each case.
In each case, a root can be specified where device nodes are created
or kernel modules loaded.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a --create-device-nodes option to the
nvidia-ctk system create-dev-char-symlinks command to create
device nodes. The currently only creates control device nodes.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
These changes add a --load-kernel-modules option to the
nvidia-ctk system commands. If specified the NVIDIA kernel modules
(nvidia, nvidia-uvm, and nvidia-modeset) are loaded before any
operations on device nodes are performed.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an nvidia-container-runtime-hook.path config option
to allow the path used for the prestart hook to be overridden. This
is useful in cases where multiple NVIDIA Container Toolkit installations
are present.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a symlinks.Resolve function for resolving symlinks and
updates usages across the code to make use of it.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This chagne allows the csv mode option to specified in the
nvidia-ctk cdi generate command and adds a --csv.file option
that can be repeated to specify the CSV files to be processed.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
The nvcid api is extended to allow for merged device options to
be specified. If any options are specified, then a merged device
is generated.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a CLI command to generate a default config.
This config checks the host operating system to apply specific
modifications that were previously captured in static config
files.
These include:
* select /sbin/ldconfig or /sbin/ldconfig.real depending on which exists on the host
* set the user to allow device access on SUSE-based systems
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change renames the struct for storing CLI flag values options over
config to avoid a conflict with the config package.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Generate CDI specifications with 644 permissions to allow non-root clients to consume them
See merge request nvidia/container-toolkit/container-toolkit!381
By default, temporary files are created with permissions 600 and
this means that the files created when updating the ldcache are
not readable in non-root containers.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an nvidia-ctk system create-device-nodes command for
creating NVIDIA device nodes. Currently this is limited to control devices
(nvidia-uvm, nvidia-uvm-tools, nvidia-modeset, nvidiactl).
A --dry-run mode is included for outputing commands that would be executed and
the driver root can be specified.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change allows nvcdi.New to return an error in addition to the
constructed library instead of panicing.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
CDI generation modes such as management and wsl don't require
NVML. This change removes the top-level instantiation of nvmllib
and replaces it with an instanitation in the nvml CDI spec generation
code.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change generates device folder permission hooks per device instead of
at a spec level. This ensures that the hook is not injected for a device that
does not have any nested device nodes.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
These changes add a wsl discovery mode to the nvidia-ctk cdi generate command.
If wsl mode is enabled, the driver store for the available devices is used as
the source for discovered entities.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds --discovery-mode flag to the nvidia-ctk cdi generate
command and plumbs this through to the CDI API.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an nvcdi package that exposes a basic API for
CDI spec generation. This is used from the nvidia-ctk cdi generate
command and can be consumed by DRA implementations and the device plugin.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an nvidia-container-runtime.cdi executable that
overrides the runtime mode from the config to "cdi".
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This makes the intent of the command line argument clearer since this
relates specifically to the root where the NVIDIA driver is installed.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change uses the `index` mode for the --device-name-strategy when
generating CDI specifications by default. This generates device names
such as nvidia.com/gpu=0 or nvidia.com/gpu=1:0 by default.
Note that this requires a CDI spec version of 0.5.0 and for consumers
(e.g. podman) that are only compatible with older versions one of the
other stragegies (`type-index` or `uuid`) should be used instead to
generate a v0.3.0 or v0.4.0 specification.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a --create-all mode to the create-dev-char-symlinks hook.
This mode creates all POSSIBLE symlinks to device nodes for regular and cap
devices. With the number of GPUs inferred from the PCI device information.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a --watch option to the create-dev-char-symlinks hook. This
installs an fsnotify watcher that creates symlinks for ADDED device nodes under
/dev/char.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an nvidia-ctk hook create-dev-char-symlinks
subcommand that creates symlinks to device nodes (as required by
systemd) under /dev/char.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a --device-name-strategy flag for generating a CDI
specificaion. This allows a CDI spec to be generated with the following
names used for device:
* type-index: gpu0 and mig0:1
* index: 0 and 0:1
* uuid: GPU and MIG UUIDs
Note that the use of 'index' generates a v0.5.0 CDI specification since
this relaxes the restriction on the device names.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change uses functionality from the CDI package to determine
the minimum required CDI spec version. This allows for a spec with
the widest compatibility to be specified.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change implements the discovery of versioned driver libaries
by reusing the mounts and update ldcache discoverers use for, for example,
CVS file discovery. This allows the container paths to be correctly generated
without requiring specific manipulation.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an --nvidia-ctk-path to the nvidia-ctk cdi generate
command. This ensures that the executable path for the generated
hooks can be specified consistently.
Since the NVIDIA Container Runtime already allows for the executable
path to be specified in the config the utility code to update the
LDCache and create other nvidia-ctk hooks are also updated.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change refactors the generation of CDI specifications
to use discoverers and generate the CDI specifications from these
discoverers. This allows for better reuse.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change replaces the `--json` flag of the nvidia-ctk cdi generate
command with a --format flag that accepts a string format of either
json or yaml.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change extends the support for multiple envvars when
specifying swarm resources to consider ALL of the specified
environment variables instead of the first match.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a Devices abstraction to the CUDA image utilities. This
allows for checking whether a devices is selected, for example.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change generates one or more createContainer hooks for ensuring
that subfolders in /dev have the required permissions in the container.
As an example, a user requires read permissions to the /dev/nvidia-caps
in addition to including the specific caps devices under this folder.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an nvidia-ctk hook chmod command that can be used
to update the permissions for paths in the container.
This prepends the container root to the paths to allow these to be
updated by runtime executables.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change ensures that the CDI spec mounts the ipc sockets with the
noexec flag to allow these to function in rootless mode with podman.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change updates the docker config update for simplicitly.
This also allows for the API to match the crio update code.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This adds support for updating crio configs (instead of installing hooks)
and adds crio support to the nvidia-ctk runtime configure command.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change includes meta devices (e.g. /dev/nvidiactl) in the
generated CDI spec. Missing device nodes are ignored.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change generates a v0.4.0 CDI spec instead of a v0.5.0 spec.
This allows older versions of podman, for example, to be used.
This requires that the device names do not start on a numeric character
and that the HostPath for a device is unspecified.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change allows the swarm-resource config option to specify a
comma-separated list of environment variables instead of a single
environment variable.
The first environment variable matched is considered and other
environment variables are ignored.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds functionality to generate CDI specifications
for all devices detected on the system. A specification containing
all GPUs and MIG devices is generated. All libraries on the host
ldcache that have an NVIDIA Driver Version suffix are included as
are the required binaries and IPC sockets.
A hook (based on the nvidia-ctk hook subcommand) to update the ldcache
in the container for the libraries being injected is also added to the
CDI specificiation.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change allows the NVIDIA Container Runtime to inject vulkan
loaders and libraries by modifying the OCI runtime specification.
This allows vulkan applications to run in containers without
additional modifications.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a modifier to that injects the tegra platform files
* /etc/nv_tegra_release
* /sys/devices/soc0/family
allowing these files to be used for platform detection in a containerized
context such as the GPU device plugin.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change improves the error message when invoking the NVIDIA
Runtime Hook in non-legacy mode. This should guide users to specifying
the --runtime=nvidia flag when using docker.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a `runtime configure` command to the nvidia-ctk CLI. This
command is currently limited to configuring the docker config on the
system by modifying the daemon.json config file associated with docker.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change renames the nvidia-container-toolkit executable
to nvidia-container-runtime-hook. Here nvidia-container-toolkit
is created as a symlink to nvidia-container-runtime-hook.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change creates GDS and MOFED modifiers and adds them to the
modifer created for the selected runtime mode if the NVIDIA_GDS
and NVIDIA_MOFED envvars are set to "enabled", respectively.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change uses modifier compositioning and the discoverModifier to
refactor the existing CSV modifier.
This change adds a discoverModifier to the internal/modifier package and
refactors the CSV modifier to use this abstraction.
Signed-off-by: Evan Lezar <elezar@nvidia.com>