Commit Graph

326 Commits

Author SHA1 Message Date
Evan Lezar
c90211e070 Log explicitly requested runtime mode
For users running the nvidia-container-runtime it would be useful
to determine the runtime mode used from the logs directly instead
of relying on other log messages as signals. This change ensures
that an explicitly selected mode is also logged instead of only
when mode=auto.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-12-15 15:35:35 +01:00
Jared Baur
95b8ebc297
Use devRoot for discovering character devices on Tegra platforms
Signed-off-by: Jared Baur <jaredbaur@fastmail.com>
2023-12-14 11:46:21 -08:00
Jared Baur
508438a0c5
Fix using devRoot on Tegra platforms
Using `WithDevRoot` on Tegra platforms was incorrectly setting
`driverRoot`, fix it so that it correctly sets `devRoot`.

Signed-off-by: Jared Baur <jaredbaur@fastmail.com>
2023-12-13 19:56:02 -08:00
Christopher Desiniotis
32c3bd1ded Fallback to standard CDI modifier when creation of automatic CDI modifier fails
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-06 09:02:19 -08:00
Christopher Desiniotis
b9ac54b922 Add GetDeviceSpecsByID() API to the nvcdi Interface
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-06 09:02:19 -08:00
Christopher Desiniotis
ae1b7e126c Extend the 'runtime.nvidia.com/gpu' CDI device kind to support full-GPUs specified by index or UUID
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-06 09:02:19 -08:00
Tariq Ibrahim
7627d48a5c run goimports -local against the entire codebase
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-12-01 11:13:17 +01:00
Evan Lezar
efae501834 Add support for injecting NVSWITCH devices
This change adds support for an NVIDIA_NVSWITCH environment variable.
When set to `enabled` this striggers the injection of all available
/dev/nvidia-nvswitch* device nodes.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-22 21:59:39 +01:00
Evan Lezar
3045954cd9 Consolidate GDS and MOFED modifiers
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-22 21:59:17 +01:00
Evan Lezar
1ab3ef0af4 Locate libnvidia-egl-gbm.so.*
Searching for a pattern allows platforms where no `.so` symlink
exists to function as expected.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-22 21:57:36 +01:00
Evan Lezar
1909b1fe60 Merge branch 'library-search-path-cdi-generate' into 'main'
Allow search paths when locating libcuda.so

See merge request nvidia/container-toolkit/container-toolkit!462
2023-11-22 19:49:15 +00:00
Evan Lezar
7d79b311d8 Include vulkan/icd.d/nvidia_layers.json
This change includes vulkan/icd.d/nvidia_layers.json in the list of
possible graphics mounts.
2023-11-22 13:54:12 +01:00
Evan Lezar
b46bc10c44 Include nvidia/nvoptix.bin in graphics mounts
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-22 13:53:59 +01:00
Evan Lezar
bbd9222206 Add driver root abstraction
This change adds a driver root abstraction that defines how
libraries are located relative to the root. This allows for
this driver root to be constructed once and passed to discovery
code.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-22 13:27:48 +01:00
Evan Lezar
f20ab793a2 Add support for specifying search paths
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-22 13:27:47 +01:00
Evan Lezar
e5391760e6 Remove duplicate not found error
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-21 18:09:42 +01:00
Evan Lezar
5505886655 Use options for NewLibraryLocator
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-21 18:08:53 +01:00
Evan Lezar
64f554ef41 Add builder for file locator
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-21 18:07:47 +01:00
Evan Lezar
232df647c1 Resolve LDConfig path passed to nvidia-container-cli
Instead of relying solely on a static config, we resolve the path
to ldconfig. The path is checked for existence and a .real suffix is preferred.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-21 15:31:12 +01:00
Evan Lezar
039d7fd324 Merge branch 'remove-config-import-from-discover' into 'main'
Remove NewGraphicsDiscoverer API simplification

See merge request nvidia/container-toolkit/container-toolkit!498
2023-11-20 22:52:02 +00:00
Evan Lezar
255181a5ff Rename NewGraphicsDiscoverer as NewDRMNodesDiscoverer
This change renames NewGraphicsDiscoverer to NewDRMNodesDiscoverer and
instead calls NewGraphicsMountsDiscoverer explicitly when constructing
a graphics modifier.

This avoids the import of config.Config into the discover package
which leads to a transitive dependency on toml-specifics and
requires that the vendor/github.com/pelletier/ package
be vendored in to consumers.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-20 23:10:57 +01:00
Christopher Desiniotis
dc36ea76e8 Automatically generate CDI spec for the runtime.nvidia.com/gpu=all device
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-11-20 13:35:07 -08:00
Evan Lezar
b4c6832828 Add additional debug
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-20 21:29:35 +01:00
Evan Lezar
3a96a00362 Simplify meta device discovery
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-20 21:29:35 +01:00
Evan Lezar
d4e21fdd10 Add devRoot option to CDI api
A driverRoot defines both the driver library root and the
root for device nodes. In the case of preinstalled drivers or
the driver container, these are equal, but in cases such as GKE
they do not match. In this case, drivers are extracted to a folder
and devices exist at the root /.

The changes here add a devRoot option to the nvcdi API that allows the
parent of /dev to be specified explicitly.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-20 21:29:35 +01:00
Evan Lezar
e609e41a64 Allow multiple pattern matches for symlinks
Since we allow pattern inputs for locating symlinks we could have
multiples. The error being checked is resolved by the deduplication.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-17 10:43:52 +01:00
Evan Lezar
80ecd024ee Add tests for library locator
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-17 10:43:52 +01:00
Evan Lezar
e8dbb216a5 Return empty ldcache if cache does not exist
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-17 10:14:03 +01:00
Christopher Desiniotis
f5d8d248b7 Deduplicate symlinks
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-11-16 17:57:31 -08:00
Evan Lezar
c63fb35ba8 Use github.com/NVIDIA/go-nvlib imports
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-15 21:38:26 +01:00
Evan Lezar
04b28d116c Make library lookups more robust
These changes make library lookups more robust. The core change is that
library lookups now first look a set of predefined locations before checking
the ldcache. This also handles cases where an ldcache is not available more
gracefully.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-06 12:15:28 -06:00
Evan Lezar
e56bb09889 Use tags.cncf.io for CDI imports
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-01 12:40:51 +01:00
Evan Lezar
833254fa59 Support CDI devices as mounts
This change allows CDI devices to be requested as mounts in the
container. This enables their use in environments such as kind
where environment variables or annotations cannot be used.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-27 21:24:53 +02:00
Evan Lezar
acc50969dc Fix ifElseChain lint errors
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:11:34 +02:00
Evan Lezar
48d68e4eff Add nolint for exec calls
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:11:34 +02:00
Evan Lezar
709e27bf4b Fix implicit memory aliasing in for loop
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:11:34 +02:00
Evan Lezar
1b16b341dd Fix default permissions
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:11:34 +02:00
Evan Lezar
2e1f94aedf Fix append assignments
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:11:34 +02:00
Evan Lezar
f8870b31be Fix filepath.Join with single arg
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:11:34 +02:00
Evan Lezar
73857eb8e3 Fix unnecessary conversion
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:11:34 +02:00
Evan Lezar
dd2f218226 Use MustCompile for static regexp
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:00:24 +02:00
Evan Lezar
8a9f367067 Check returned error values
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:00:24 +02:00
Evan Lezar
e0df157f70 Remove unnecessary assignment to the blank identifier
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:00:24 +02:00
Evan Lezar
12dc12ce09 Fix misspellings
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:00:24 +02:00
Evan Lezar
73749285d5 Remove unused loadSaver interface
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:00:24 +02:00
Evan Lezar
f63ad3d9e7 Refactor symlink filter
This change refactors the use of the symlink filter to make it extendible.
A blocked filter can be set on the Tegra CSV discoverer to ensure that the correct
symlink libraries are filtered out. Here, globs can be used to select mulitple libraries,
and a **/ prefix on the globs indicates that the pattern that follows is only applied to
the filename of the symlink entry in the CSV file.

A --csv.ignore-pattern command line argument is added to the nvidia-ctk cdi generate
command that allows this to be set.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-09-22 22:04:06 +02:00
Evan Lezar
c4b4478d1a Remove default symlink filter
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-09-22 22:02:51 +02:00
Evan Lezar
963250a58f Refactor CSV discovery for testability
This change improves the testibility of the CSV discoverer.
This is done by adding injection points for mocks for library discovery and
symlink resolution.

Note that this highlights a bug in the current implementation where the
library filter causes valid symlinks to be skipped.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-09-22 22:02:30 +02:00
Evan Lezar
4ec9bd751e Add required option to new toml config
This change adds a "required" option to the new toml config
that controls whether a default config is returned or not.
This is useful from the NVIDIA Container Runtime Hook, where
/run/driver/nvidia/etc/nvidia-container-runtime/config.toml
is checked before the standard path.

This fixes a bug where the default config was always applied
when this config was not used.

See https://github.com/NVIDIA/nvidia-container-toolkit/issues/106

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-09-07 11:56:01 +02:00
Evan Lezar
2a3afdd5d9 Merge branch 'fix-platform-detection' into 'main'
Add UsesNVGPUModule info function

See merge request nvidia/container-toolkit/container-toolkit!473
2023-08-28 15:58:41 +00:00