Commit Graph

82 Commits

Author SHA1 Message Date
Evan Lezar
f63ad3d9e7 Refactor symlink filter
This change refactors the use of the symlink filter to make it extendible.
A blocked filter can be set on the Tegra CSV discoverer to ensure that the correct
symlink libraries are filtered out. Here, globs can be used to select mulitple libraries,
and a **/ prefix on the globs indicates that the pattern that follows is only applied to
the filename of the symlink entry in the CSV file.

A --csv.ignore-pattern command line argument is added to the nvidia-ctk cdi generate
command that allows this to be set.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-09-22 22:04:06 +02:00
Evan Lezar
c4b4478d1a Remove default symlink filter
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-09-22 22:02:51 +02:00
Evan Lezar
be570fce65 Bump version to 1.14.2
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-09-22 22:02:28 +02:00
Evan Lezar
7187608a36
Update libnvidia-container 2023-09-07 15:55:06 +00:00
Evan Lezar
56dd69ff1c Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-09-07 17:35:48 +02:00
Evan Lezar
4ec9bd751e Add required option to new toml config
This change adds a "required" option to the new toml config
that controls whether a default config is returned or not.
This is useful from the NVIDIA Container Runtime Hook, where
/run/driver/nvidia/etc/nvidia-container-runtime/config.toml
is checked before the standard path.

This fixes a bug where the default config was always applied
when this config was not used.

See https://github.com/NVIDIA/nvidia-container-toolkit/issues/106

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-09-07 11:56:01 +02:00
Evan Lezar
d74f7fef4e Update libnvidia-container to fix rpm builds
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-09-07 11:55:46 +02:00
Evan Lezar
538d4020df Bump version to v1.14.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-09-06 17:51:52 +02:00
Evan Lezar
2bf8017516 Bump verison to v1.14.0
Note that v1.14.0-rc.3 was an internal-only release.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-29 16:03:41 +02:00
Evan Lezar
1dc028cdf2 Add UsesNVGPUModule info function
This change adds a UsesNVGPUModule function that checks whether the nvgpu
kernel module is used by NVML. This allows for more robust detection of
Tegra-based platforms where libnvidia-ml.so is supported to enumerate the
iGPU.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-25 11:24:34 +02:00
Evan Lezar
546f810159 Update go-nvlib dependency to v0.0.0-20230818092907-09424fdc8884
This change updates go-nvlib to include logic to skip NVIDIA PCI-E
devices where the name or class id is not known.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-18 11:40:15 +02:00
Evan Lezar
7221b6b24b Set NVIDIA_VISIBLE_DEVICES=void in toolkit-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-15 11:52:56 +02:00
Evan Lezar
86df7c6696 Add library-search-path option to cdi generate
This change renames the csv.library-search-path option to
library-search-path so as to be more generally applicable in
future. Note that the option is still only applied in csv mode.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-14 15:04:33 +02:00
Evan Lezar
4addb292b1 Extend nvidia-ctk config command to allow options to be set
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-14 11:33:26 +02:00
Evan Lezar
4dcaa61167 Use internal/config structs in hook
This change ensures that the Config structs from internal.Config
are used for the NVIDIA Container Runtime Hook config too.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-14 10:40:41 +02:00
Evan Lezar
65f6f46846 Remove installation of oci-nvidia-hook files in RPM packages
This change removes installation of the oci-nvidia-hook files.
These files conflict with CDI use in runtimes that support it.

The use of the hook should be considered deprecated on these platforms.

If a hook is required, the

nvidia-ctk runtime configure --config-mode=oci-hook

command should be used to create the hook file(s).

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-11 16:34:58 +02:00
Evan Lezar
f6a4986c15 Add support for creating oci hook to nvidia-ctk
This change extends the nvidia-ctk runtime configure command
with a --config-mode=oci-hook that creates an OCI hook json file.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-11 16:34:58 +02:00
Evan Lezar
7c2c42b8da Bump version to 1.14.0-rc.3
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-04 15:12:29 +02:00
Evan Lezar
a883c65dd6 Fix changelog for 1.14.0-rc.2 2023-07-17 12:04:38 +00:00
Evan Lezar
e25576d26d Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-13 14:15:33 +02:00
Evan Lezar
1081cecea9 Return empty requirements if NVIDIA_DISABLE_REQUIRE is true
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-11 13:47:37 +02:00
Evan Lezar
81908c8cc9 Search custom firmware paths first
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-11 10:34:14 +02:00
Evan Lezar
0938576618 Remove NVIDIA experimental runtime from toolkit container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-10 11:44:55 +02:00
Evan Lezar
6fac6c237b Bump cuda base image to 12.2.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-05 11:28:32 +02:00
Evan Lezar
bc6ca7ff88 Generate default config post-install
The debian and rpm packages are updated to trigger the generation of
of a default config if no config exists at the expected location.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-05 11:26:53 +02:00
Evan Lezar
6342dae0e9 Ensure that parent directories exist for config files
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-03 15:30:31 +02:00
Evan Lezar
3dee9d9a4c Support OpenSSL 3 with the Encrypt/Decrypt library
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-06 16:47:21 +02:00
Evan Lezar
78f619b1e7 Bump CUDA baseimage version to 12.1.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-01 14:42:30 +02:00
Evan Lezar
eca13e72bf Update CHANGELOG
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-31 19:33:31 +02:00
Evan Lezar
ac11727ec5 Add nvidia-contianer-runtime-hook.path config option
This change adds an nvidia-container-runtime-hook.path config option
to allow the path used for the prestart hook to be overridden. This
is useful in cases where multiple NVIDIA Container Toolkit installations
are present.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-25 12:05:33 +02:00
Evan Lezar
e11f65e51e Update go-nvlib to skip non-MIG devices
This change updates go-nvlib to ensure that non-migcapable GPUs
are skipped when generating CDI specifications for MIG devices.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-22 15:36:55 +02:00
Evan Lezar
fe37196788 Generate all device using merged transform
The nvcid api is extended to allow for merged device options to
be specified. If any options are specified, then a merged device
is generated.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-12 13:52:58 +02:00
Evan Lezar
3945abb2f2 Add nvidia-ctk cdi list command
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-09 19:59:00 +02:00
Evan Lezar
3056428eda Generate spec file with 644 permissions
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-02 16:47:44 +02:00
Evan Lezar
d77f46aa09 Create ld.so.conf file with permissions 644
By default, temporary files are created with permissions 600 and
this means that the files created when updating the ldcache are
not readable in non-root containers.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-02 12:51:27 +02:00
Evan Lezar
70920d7a04 Add support for containerd to the runtime configure CLI
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-24 18:32:28 +02:00
Evan Lezar
fa9c6116a4 Bump version to 1.14.0-rc.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-24 17:33:04 +02:00
Evan Lezar
fc7c8f7520 Resolve all symlinks in ldcache
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-21 17:28:49 +02:00
Evan Lezar
2136266d1d Make discovery of Xorg libraries optional
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-13 18:41:38 +02:00
Evan Lezar
29c6288128 Only update ldcache if it exists
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-13 17:18:09 +02:00
Evan Lezar
7f6c9851fe Bump version to 1.13.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-13 13:11:09 +02:00
Evan Lezar
622a0649ce Bump version to v1.13.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-31 11:35:53 +02:00
Evan Lezar
c46b118f37 Add nvidia-container-runtime.modes.cdi.annotation-prefixes config option.
This change adds an nvidia-container-runtime.modes.cdi.annotation-prefixes config
option that defaults to cdi.k8s.io/. This allows the annotation prefixes parsed
for CDI devices to be overridden in cases where CDI support in container engines such
as containerd or crio need to be overridden.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-27 16:36:54 +02:00
Evan Lezar
c13c6ebadb Inject xorg libs and config in container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-26 17:04:06 +02:00
Evan Lezar
ff2767ee7b Reorder changelog
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-26 17:03:05 +02:00
Evan Lezar
a3ee58a294 Reorder changelog
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-26 16:51:59 +02:00
Evan Lezar
226c54613e Also return an error from nvcdi.New
This change allows nvcdi.New to return an error in addition to the
constructed library instead of panicing.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-26 16:13:12 +02:00
Evan Lezar
33f6fe0217 Generate a simplified CDI spec by default
As simplified CDI spec has no duplicate entities in any single set of container edits.
Furthermore, contianer edits defined at a spec-level are not included in the container
edits for a device.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-24 11:01:46 +02:00
Evan Lezar
5ff206e1a9 Add transform to deduplicate entities in CDI spec
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-24 11:01:23 +02:00
Evan Lezar
9506bd9da0 Fix generation of management CDI spec in containers
Since we relied on finding libcuda.so in the LDCache to determine both the CUDA
version and the expected directory for the driver libraries, the generation of the
management CDI specifications fails in containers where the LDCache has not been updated.

This change falls back to searching a set of predefined paths instead when the lookup of
libcuda.so in the cache fails.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-23 15:59:01 +02:00