Commit Graph

277 Commits

Author SHA1 Message Date
Evan Lezar
2a3afdd5d9 Merge branch 'fix-platform-detection' into 'main'
Add UsesNVGPUModule info function

See merge request nvidia/container-toolkit/container-toolkit!473
2023-08-28 15:58:41 +00:00
Evan Lezar
1dc028cdf2 Add UsesNVGPUModule info function
This change adds a UsesNVGPUModule function that checks whether the nvgpu
kernel module is used by NVML. This allows for more robust detection of
Tegra-based platforms where libnvidia-ml.so is supported to enumerate the
iGPU.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-25 11:24:34 +02:00
Evan Lezar
ca1055588d Remove /sys/devices/soc0/family from CDI spec
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-25 10:25:33 +02:00
Tariq Ibrahim
f904ec41eb Merge branch 'log-unresolved-devices' into 'main'
add a warning statement listing unresolved CDI devices

See merge request nvidia/container-toolkit/container-toolkit!461
2023-08-14 17:22:36 +00:00
Evan Lezar
4addb292b1 Extend nvidia-ctk config command to allow options to be set
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-14 11:33:26 +02:00
Evan Lezar
a69657dde7 Add config.Toml type to handle config files
This change introduced a config.Toml type that is used as the base for
config file processing and manipulation. This ensures that configs --
including commented values -- can be handled consistently.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-14 11:32:54 +02:00
Evan Lezar
c2d4de54b0 Add function to get config file path.
2023-08-14 11:32:54 +02:00
Evan Lezar
b18ac09f77 Refactor handling of DriverCapabilities
This change consolidates the handling of NVIDIA_DRIVER_CAPABILITIES in the
internal/image package.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-14 10:40:42 +02:00
Evan Lezar
4dcaa61167 Use internal/config structs in hook
This change ensures that the Config structs from internal.Config
are used for the NVIDIA Container Runtime Hook config too.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-14 10:40:41 +02:00
Evan Lezar
8bf52e1dec Export config.GetDefault function
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-14 10:35:33 +02:00
Tariq Ibrahim
6d3b29f3ca add a warning statement listing unresolved CDI devices
2023-08-10 08:38:33 -07:00
Evan Lezar
8553fce68a Specify library search paths for CSV CDI spec generation
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-04 16:49:30 +02:00
Evan Lezar
03a4e2f8a9 Skip symlinks to libraries
In order to properly handle systems with both iGPU and dGPU
drivers included, we skip "sym" mount specifications which
refer to .so or .so.[1-9] files.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-04 16:49:30 +02:00
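
The filter described in this commit message can be illustrated with a minimal Go sketch. It is not the toolkit's implementation: the `mountSpec` type, its fields, and the `skipLibrarySymlinks` helper are hypothetical names chosen for illustration, and the pattern only encodes the ".so or .so.[1-9]" rule quoted above.

```go
package main

import (
	"fmt"
	"regexp"
)

// mountSpec is a hypothetical stand-in for a single CSV mount specification.
type mountSpec struct {
	Type string // for example "sym", "lib" or "dev"
	Path string
}

// librarySymlink matches paths that end in ".so" or in ".so.N",
// where N is a single digit from 1 to 9.
var librarySymlink = regexp.MustCompile(`\.so(\.[1-9])?$`)

// skipLibrarySymlinks drops "sym" specifications whose path looks like an
// unversioned or major-versioned library link and keeps everything else.
func skipLibrarySymlinks(specs []mountSpec) []mountSpec {
	var kept []mountSpec
	for _, s := range specs {
		if s.Type == "sym" && librarySymlink.MatchString(s.Path) {
			continue
		}
		kept = append(kept, s)
	}
	return kept
}

func main() {
	// Example paths are made up for the demonstration.
	specs := []mountSpec{
		{Type: "sym", Path: "/usr/lib/libcuda.so"},
		{Type: "sym", Path: "/usr/lib/libcuda.so.1"},
		{Type: "lib", Path: "/usr/lib/libcuda.so.1.1"},
	}
	for _, s := range skipLibrarySymlinks(specs) {
		fmt.Println(s.Type, s.Path)
	}
}
```

Only the specification type "sym" is filtered here, so regular library entries with the same suffix are left untouched.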
Evan Lezar
918bd03488 Move tegra-specifics to new package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-04 16:49:30 +02:00
Evan Lezar
01a7f7bb8e Explicitly generate CDI spec for CSV mode
This change explicitly generates a CDI specification from
the supplied CSV files when cdi mode is detected. This
ensures consistency between the behaviour on Tegra-based
systems.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-04 16:49:30 +02:00
Evan Lezar
6b48cbd1dc Move CDI modifier to separate package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-04 16:49:30 +02:00
Evan Lezar
ec63533eb1 Ensure default config comments are consistent
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-19 14:37:49 +02:00
Evan Lezar
ce7d5f7a51 Use functional options when constructing directory locator
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-18 15:36:03 +02:00
Evan Lezar
9b64d74f6a Use functional options when constructing Symlink locator
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-18 15:31:15 +02:00
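
For readers unfamiliar with the pattern named in the two commits above, the following is a minimal Go sketch of functional options. The `locator` type, its fields, the default root, and the `withRoot` / `withSearchPaths` / `newLocator` names are all assumptions made for illustration and are not taken from the repository.

```go
package main

import "fmt"

// locator is a hypothetical stand-in for a file locator.
type locator struct {
	root        string
	searchPaths []string
}

// option mutates a locator while it is being constructed.
type option func(*locator)

// withRoot sets the root that lookups are performed relative to.
func withRoot(root string) option {
	return func(l *locator) { l.root = root }
}

// withSearchPaths appends additional directories to search.
func withSearchPaths(paths ...string) option {
	return func(l *locator) { l.searchPaths = append(l.searchPaths, paths...) }
}

// newLocator applies the supplied options on top of the defaults.
func newLocator(opts ...option) *locator {
	l := &locator{root: "/"} // assumed default for this sketch
	for _, opt := range opts {
		opt(l)
	}
	return l
}

func main() {
	l := newLocator(
		withRoot("/run/nvidia/driver"),
		withSearchPaths("/usr/lib", "/usr/lib64"),
	)
	fmt.Println(l.root, l.searchPaths)
}
```

The benefit of this style is that new settings can be added later without changing the constructor's signature or existing call sites.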
Evan Lezar
cca343abb0 Pass image when constructing CSV modifier
Since the incoming OCI spec has already been parsed and used to
construct a CUDA image representation, pass this to the CSV
modifier constructor instead of re-creating an image representation.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-18 15:27:16 +02:00
Evan Lezar
e2f8d2a15f Set default spec dirs at config level
This change sets the default CDI spec dirs at a config level instead
of when a CDI runtime modifier is constructed. This makes this setting
consistent with other options such as the nvidia-ctk path.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-18 15:23:09 +02:00
Evan Lezar
481000b4ce Remove unused argument
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-18 15:20:24 +02:00
Evan Lezar
083b789102 Use cdi parser package for IsQualifiedName
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-18 15:16:25 +02:00
Evan Lezar
6750ce1667 Print invalid version on parse error
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-11 13:47:39 +02:00
Evan Lezar
1081cecea9 Return empty requirements if NVIDIA_DISABLE_REQUIRE is true
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-11 13:47:37 +02:00
Evan Lezar
0938576618 Remove NVIDIA experimental runtime from toolkit container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-10 11:44:55 +02:00
Evan Lezar
f78d3a858f Rework default config generation to not use toml
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-05 11:26:55 +02:00
Evan Lezar
65ae6f1dab Fix generation of default config
This change ensures that the nvidia-ctk config default command
generates a config file that is compatible with the official documentation
to, for example, disable cgroups in the NVIDIA Container CLI.

This requires that whitespace around comments is stripped before outputting the
contents.

This also adds an option to load a config and modify it in-place instead. This can
be triggered as a post-install step, for example.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-05 11:26:04 +02:00
Evan Lezar
d52dbeaa7a Split internal system package
This change splits the functionality in the internal system package
into two packages: one for dealing with devices and one for dealing
with kernel modules. This removes ambiguity around the meaning of
driver / device roots in each case.

In each case, a root can be specified where device nodes are created
or kernel modules loaded.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-15 09:01:13 +02:00
Evan Lezar
82347eb9bc Resolve auto mode as cdi for fully-qualified names
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-13 16:05:37 +02:00
Evan Lezar
84c7bf8b18 Minor refactor of mode resolver
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-13 16:04:03 +02:00
Evan Lezar
d92300506c Construct CUDA image object once
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-13 10:36:02 +02:00
Evan Lezar
1d0a733487 Replace logger.Warn(f) with logger.Warning(f)
This aligns better with klog used in other projects.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-12 10:48:04 +02:00
Evan Lezar
9464953924 Use logger.Interface when resolving auto mode
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-12 10:46:11 +02:00
Evan Lezar
a02bc27c3e Define a basic logger interface
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-12 10:46:10 +02:00
Evan Lezar
3f03a71afd Skip additional modifications in CDI mode
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-05 15:01:58 +02:00
Evan Lezar
43c44a0f48 Merge branch 'treat-log-errors-as-non-fatal' into 'main'
Ignore errors when creating debug log file

See merge request nvidia/container-toolkit/container-toolkit!404
2023-06-01 07:44:56 +00:00
Evan Lezar
6b1e8171c8 Merge branch 'add-mod-probe' into 'main'
Add option to load NVIDIA kernel modules

See merge request nvidia/container-toolkit/container-toolkit!409
2023-05-31 18:14:45 +00:00
Evan Lezar
2e50b3da7c Merge branch 'ldcache-resolve-circular' into 'main'
Fix infinite recursion when resolving libraries in LDCache

Closes #13

See merge request nvidia/container-toolkit/container-toolkit!406
2023-05-31 17:35:27 +00:00
Evan Lezar
7b801a0ce0 Add option to load NVIDIA kernel modules
These changes add a --load-kernel-modules option to the
nvidia-ctk system commands. If specified the NVIDIA kernel modules
(nvidia, nvidia-uvm, and nvidia-modeset) are loaded before any
operations on device nodes are performed.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-31 19:31:38 +02:00
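
The ordering described in this commit message (load the modules first, then operate on device nodes) can be sketched in Go as follows. This is a sketch under stated assumptions, not the toolkit's code: the use of `modprobe` is inferred from the 'add-mod-probe' branch name, and the `runner` type and the `loadModules` and `execRunner` helpers are hypothetical.

```go
package main

import (
	"fmt"
	"os/exec"
)

// nvidiaModules lists the modules named in the commit message, in that order.
var nvidiaModules = []string{"nvidia", "nvidia-uvm", "nvidia-modeset"}

// runner abstracts command execution so the logic can be exercised without root.
type runner func(name string, args ...string) error

// loadModules calls modprobe once per module and stops at the first failure.
func loadModules(run runner) error {
	for _, m := range nvidiaModules {
		if err := run("modprobe", m); err != nil {
			return fmt.Errorf("failed to load kernel module %s: %w", m, err)
		}
	}
	return nil
}

// execRunner runs the command for real; it typically requires root privileges.
func execRunner(name string, args ...string) error {
	return exec.Command(name, args...).Run()
}

func main() {
	// Dry run: print the commands instead of executing them.
	dryRun := func(name string, args ...string) error {
		fmt.Println(name, args)
		return nil
	}
	if err := loadModules(dryRun); err != nil {
		fmt.Println("error:", err)
		return
	}
	// Device node operations would follow here; pass execRunner to load for real.
}
```

Injecting the runner keeps the module list and error handling testable on machines without NVIDIA drivers.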
Evan Lezar
528cbbb636 Merge branch 'fix-device-symlinks' into 'main'
Fix creation of device symlinks in /dev/char

See merge request nvidia/container-toolkit/container-toolkit!399
2023-05-31 17:31:04 +00:00
Evan Lezar
a6b0f45d2c Fix infinite recursion when resolving ldcache
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-30 11:03:36 +02:00
Evan Lezar
05632c0a40 Treat missing nvidia device majors as an error
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-26 10:24:36 +02:00
Evan Lezar
02656b624d Create log directory if required
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-25 15:17:00 +02:00
Evan Lezar
61af2aee8e Ignore errors when creating debug log file
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-25 14:44:00 +02:00
Evan Lezar
ac11727ec5 Add nvidia-container-runtime-hook.path config option
This change adds an nvidia-container-runtime-hook.path config option
to allow the path used for the prestart hook to be overridden. This
is useful in cases where multiple NVIDIA Container Toolkit installations
are present.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-25 12:05:33 +02:00
Evan Lezar
013a1b413b Fix ineffectual assignment
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-23 21:14:02 +02:00
Evan Lezar
3be16d8077 Create individual links instead of processing CSV
This change switches to generating an OCI runtime hook to create
individual symlinks instead of processing a CSV file in the hook.
This allows for better reuse of the logic generating CDI
specifications, for example.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-23 20:43:36 +02:00
Evan Lezar
927ec78b6e Add symlinks package with Resolve function
This change adds a symlinks.Resolve function for resolving symlinks and
updates usages across the code to make use of it.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-23 20:42:17 +02:00
Evan Lezar
e7d2a9c212 Merge branch 'CNT-1876/cdi-specs-from-csv' into 'main'
Add csv mode to CDI spec generation

See merge request nvidia/container-toolkit/container-toolkit!393
2023-05-23 14:47:19 +00:00