This change allows nvcdi.New to return an error in addition to the
constructed library instead of panicing.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
CDI generation modes such as management and wsl don't require
NVML. This change removes the top-level instantiation of nvmllib
and replaces it with an instanitation in the nvml CDI spec generation
code.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change generates device folder permission hooks per device instead of
at a spec level. This ensures that the hook is not injected for a device that
does not have any nested device nodes.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
These changes add a wsl discovery mode to the nvidia-ctk cdi generate command.
If wsl mode is enabled, the driver store for the available devices is used as
the source for discovered entities.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds --discovery-mode flag to the nvidia-ctk cdi generate
command and plumbs this through to the CDI API.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an nvcdi package that exposes a basic API for
CDI spec generation. This is used from the nvidia-ctk cdi generate
command and can be consumed by DRA implementations and the device plugin.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an nvidia-container-runtime.cdi executable that
overrides the runtime mode from the config to "cdi".
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This makes the intent of the command line argument clearer since this
relates specifically to the root where the NVIDIA driver is installed.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change uses the `index` mode for the --device-name-strategy when
generating CDI specifications by default. This generates device names
such as nvidia.com/gpu=0 or nvidia.com/gpu=1:0 by default.
Note that this requires a CDI spec version of 0.5.0 and for consumers
(e.g. podman) that are only compatible with older versions one of the
other stragegies (`type-index` or `uuid`) should be used instead to
generate a v0.3.0 or v0.4.0 specification.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a --create-all mode to the create-dev-char-symlinks hook.
This mode creates all POSSIBLE symlinks to device nodes for regular and cap
devices. With the number of GPUs inferred from the PCI device information.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a --watch option to the create-dev-char-symlinks hook. This
installs an fsnotify watcher that creates symlinks for ADDED device nodes under
/dev/char.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an nvidia-ctk hook create-dev-char-symlinks
subcommand that creates symlinks to device nodes (as required by
systemd) under /dev/char.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a --device-name-strategy flag for generating a CDI
specificaion. This allows a CDI spec to be generated with the following
names used for device:
* type-index: gpu0 and mig0:1
* index: 0 and 0:1
* uuid: GPU and MIG UUIDs
Note that the use of 'index' generates a v0.5.0 CDI specification since
this relaxes the restriction on the device names.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change uses functionality from the CDI package to determine
the minimum required CDI spec version. This allows for a spec with
the widest compatibility to be specified.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change implements the discovery of versioned driver libaries
by reusing the mounts and update ldcache discoverers use for, for example,
CVS file discovery. This allows the container paths to be correctly generated
without requiring specific manipulation.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an --nvidia-ctk-path to the nvidia-ctk cdi generate
command. This ensures that the executable path for the generated
hooks can be specified consistently.
Since the NVIDIA Container Runtime already allows for the executable
path to be specified in the config the utility code to update the
LDCache and create other nvidia-ctk hooks are also updated.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change refactors the generation of CDI specifications
to use discoverers and generate the CDI specifications from these
discoverers. This allows for better reuse.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change replaces the `--json` flag of the nvidia-ctk cdi generate
command with a --format flag that accepts a string format of either
json or yaml.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change extends the support for multiple envvars when
specifying swarm resources to consider ALL of the specified
environment variables instead of the first match.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a Devices abstraction to the CUDA image utilities. This
allows for checking whether a devices is selected, for example.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change generates one or more createContainer hooks for ensuring
that subfolders in /dev have the required permissions in the container.
As an example, a user requires read permissions to the /dev/nvidia-caps
in addition to including the specific caps devices under this folder.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an nvidia-ctk hook chmod command that can be used
to update the permissions for paths in the container.
This prepends the container root to the paths to allow these to be
updated by runtime executables.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change ensures that the CDI spec mounts the ipc sockets with the
noexec flag to allow these to function in rootless mode with podman.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change updates the docker config update for simplicitly.
This also allows for the API to match the crio update code.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This adds support for updating crio configs (instead of installing hooks)
and adds crio support to the nvidia-ctk runtime configure command.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change includes meta devices (e.g. /dev/nvidiactl) in the
generated CDI spec. Missing device nodes are ignored.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change generates a v0.4.0 CDI spec instead of a v0.5.0 spec.
This allows older versions of podman, for example, to be used.
This requires that the device names do not start on a numeric character
and that the HostPath for a device is unspecified.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change allows the swarm-resource config option to specify a
comma-separated list of environment variables instead of a single
environment variable.
The first environment variable matched is considered and other
environment variables are ignored.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds functionality to generate CDI specifications
for all devices detected on the system. A specification containing
all GPUs and MIG devices is generated. All libraries on the host
ldcache that have an NVIDIA Driver Version suffix are included as
are the required binaries and IPC sockets.
A hook (based on the nvidia-ctk hook subcommand) to update the ldcache
in the container for the libraries being injected is also added to the
CDI specificiation.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change allows the NVIDIA Container Runtime to inject vulkan
loaders and libraries by modifying the OCI runtime specification.
This allows vulkan applications to run in containers without
additional modifications.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a modifier to that injects the tegra platform files
* /etc/nv_tegra_release
* /sys/devices/soc0/family
allowing these files to be used for platform detection in a containerized
context such as the GPU device plugin.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change improves the error message when invoking the NVIDIA
Runtime Hook in non-legacy mode. This should guide users to specifying
the --runtime=nvidia flag when using docker.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a `runtime configure` command to the nvidia-ctk CLI. This
command is currently limited to configuring the docker config on the
system by modifying the daemon.json config file associated with docker.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change renames the nvidia-container-toolkit executable
to nvidia-container-runtime-hook. Here nvidia-container-toolkit
is created as a symlink to nvidia-container-runtime-hook.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change creates GDS and MOFED modifiers and adds them to the
modifer created for the selected runtime mode if the NVIDIA_GDS
and NVIDIA_MOFED envvars are set to "enabled", respectively.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change uses modifier compositioning and the discoverModifier to
refactor the existing CSV modifier.
This change adds a discoverModifier to the internal/modifier package and
refactors the CSV modifier to use this abstraction.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds version output to the nvidia-continer-runtime,
nvidia-container-toolkit, and nvidia-ctk CLIs. The same version
is used in all cases and includes a version string and a git
revision if set.
The construction of the version string mirrors what is done in runc.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
These changes replace the nvidia-container-runtime config options
experimental and discover-mode with a single mode config option.
Note that mode is now a string with a default value of "auto"
and a mode value of "legacy" is equivalent to experimental == false.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change updates the create-symlinks hook to also create symlinks for
libcuda.so, libGLX_indirect.so.0, and libnvidia-opticalflow.so
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change processes and supports runc logging command line arguments.
This allows for better integration into container engines such as
docker.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This also removes a test that invokes nvidia-container-runtime run --bundle
expecting an error. This test is no longer valid since this command line
is forwared to runc where the error should be detected.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a DefaultExecutableDir = /usr/bin constant that is used
to construct default paths for executables instead of specifying these
explicitly.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a discovered hook for updating the ldcache as a container-create
hook. The mounts from a discoverer are inspected to determine the folders that must
be added to the cache using the nvidia-ctk hook update-ldcache command.
This is added to the "csv" discovery mode for the experimental runtime.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an nvidia-ctk CLI that is used as the basis for
utilities related to the NVIDIA Container Toolkit.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an 'auto' discover mode that attempts to select the correct mode
for a given platform. This currently attempts to detect whether the platform is a
Tegra-based system in which case the 'csv' discover mode is used. The 'legacy'
discover mode is used as the fallback.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change ensures that by default, the CSV discovery only considers the base CSV
files (l4t.csv, drivers.csv, devices.csv) and skips the rest unless the
NVIDIA_REQUIRE_JETPACK is set to "csv-mounts=all", in which case, all CSV files in the
specified folder are considered.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds support for a "csv" discovery mode to the experimental runtime.
If this is set with experimental = true, a CSV-based discovery of devices and
mounts are used to define the modifications required to the OCI spec. The edits
are expressed as CDI ContainerEdits.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change enables the experimental mode of the NVIDIA Container Runtime. If
enabled, the nvidia-container-runtime.discover-mode config option is
queried to determine how required OCI spec modifications should be defined.
If "legacy" is selected, the existing NVIDIA Container Runtime hooks is
discovered and injected into the OCI spec.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an experimental option to the NVIDIA Container Runtime config. To
simplify the extension of this experimental mode in future an error is raised if
this is enabled.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change moves the code defining the insertion of the nvidia-container-runtime
hook to a separate package. This allows for better distinction between the existing
and experimental modifications.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change imports the modifying runtime abstraction from the
experimental branch. This encapsulates the checks for whether
modification is required, and forwards the loaded spec to
the specified modifier. This allows for the same code to be
reused when performing more complex modifications.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change removes unneeded logging and renames the return error value to rerr
to avoid it being aliased by local error values.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds support for a supported-driver-capabilities config
option in the config.toml file that allows the driver capabilities
associated with the NVIDIA_DRIVER_CAPABILITIES=all environment variable.
This can be used on platforms such as Jetson to remove unsupported
capabilities such as "ngx".
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change copies the cmd/nvidia-container-runtime, internal, and test
folders from github.com/NVIDIA/nvidia-container-runtime@8a63b4b34f3ce3b4167f0516aa3f7207ca280dfb
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds support for the NVIDIA_FABRIC_DEVICES envvar. The (non-empty)
value of this envvar is passed to the NVIDIA Container CLI using the --fabric-device
command line flag and allows for nvswitch and nvlink devices to be mounted
into the container.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change moves the pkg folder to `cmd/nvidia-container-toolkit` to
better match go best practices. This allows, for example, for the
`cmd/nvidia-container-toolkit` to be go installed.
The only package included in `pkg` was `main`.
Signed-off-by: Evan Lezar <elezar@nvidia.com>