Commit Graph

454 Commits

Author SHA1 Message Date
Evan Lezar
ed3b52eb8d
Add allow-cuda-compat-libs-from-container feature flag
This change adds an allow-cuda-compat-libs-from-container feature flag
to the NVIDIA Container Toolkit config. This allows a user to opt-in
to the previous default behaviour of overriding certain driver
libraries with CUDA compat libraries from the container.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-22 17:34:20 +01:00
Evan Lezar
991b9c222f
Skip graphics modifier in CSV mode
In CSV mode the CSV files at /etc/nvidia-container-runtime/host-files-for-container.d/
should be the source of truth for container modifications. This change skips graphics
modifications to a container. This prevents conflicts when handling files such as
vulkan icd files which are already defined in the CSV file.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-22 13:58:31 +01:00
Evan Lezar
fdad3927b4
[no-relnote] Refactor oci spec modifier list
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-22 13:58:31 +01:00
Evan Lezar
d584f9a40b
[no-relnote] Sort feature flags
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-15 13:27:14 +01:00
Evan Lezar
2529aebd6c
Fix create-device-node test when devices exist
This changes fixes the TestCreateControlDevices test on systems
where device nodes exist.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-12-06 14:05:51 +01:00
Evan Lezar
00f1d5a627
Only allow host-relative LDConfig paths
This change only allows host-relative LDConfig paths.

An allow-ldconfig-from-container feature flag is added to allow for this
the default behaviour to be changed.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-11-22 14:25:17 +01:00
Evan Lezar
1afada7de5
[no-relnote] Remove unused code
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-11-22 13:11:04 +01:00
Evan Lezar
e188090a92
Fix NVIDIA_IMEX_CHANNELS handling on legacy images
For legacy images (images with a CUDA_VERSION set but no CUDA_REQUIRES set), the
default behaviour for device envvars is to treat non-existence as all.

This change ensures that the NVIDIA_IMEX_CHANNELS envvar is not treated in the same
way, instead returning no devices if the envvar is not set.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-11-14 13:26:30 -07:00
Evan Lezar
832818084d
Merge pull request #773 from elezar/fix-libcuda-symlink
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
Force creation of symlinks in create-symlink hook
2024-11-07 17:41:20 +01:00
Evan Lezar
0c687be794
[no-relnote] Also validate CDI management spec
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-11-05 14:23:36 -08:00
Evan Lezar
324096c979
Force symlink creation in create-symlink hook
This change updates the create-symlink hook to be equivalent to
ln -f -s target link

This ensures that links are updated even if they exist in the container
being run.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-11-05 09:39:11 -08:00
Evan Lezar
fa5a4ac499
Read ldcache at construction instead of on each locate call
This change udpates the ldcache locator to read the ldcache at construction
and use these contents to perform future lookups against. Each of the cache
entries are resolved and lookups return the resolved target.

Assuming a symlink chain: libcuda.so -> libcuda.so.1 -> libcuda.so.VERSION, this
means that libcuda.so.VERION will be returned for any of the following inputs:
libcuda.so, libcuda.so.1, libcudal.so.*.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-24 23:12:58 +02:00
Evan Lezar
9f1bd62c42 [no-relnote] Add failing libcuda locate test
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-24 15:56:53 +02:00
Evan Lezar
9534249936 [no-relnote] Add test for libcuda lookup
This change adds a test for locating libcuda as a driver library.
This includes a failing test on a system where libcuda.so.1 is in
the ldcache, but not at one of the predefined library search paths.

A testdata folder with sample root filesystems is included to test
various combinations.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-24 15:56:25 +02:00
Evan Lezar
e1ea0056b9 Fix bug in sorting of symlink chain
Since we use a map to keep track of the elements of a symlink chain
the construction of the final list of located elements is not stable.
This change constructs the output as this is being discovered and as
such maintains the original ordering.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-24 15:56:25 +02:00
Evan Lezar
e5175c270e
Merge pull request #745 from elezar/fix-symlink-logging
Fix symlink resolution error message
2024-10-17 18:04:54 +02:00
Evan Lezar
2e6712d2bc Allow IMEX channels to be requested as volume mounts
This change allows IMEX channels to be requested using the
volume mount mechanism.

A mount from /dev/null to /var/run/nvidia-container-devices/imex/{{ .ChannelID }}
is equivalent to including {{ .ChannelID }} in the NVIDIA_IMEX_CHANNELS
envvironment variables.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 16:54:29 +02:00
Evan Lezar
92df542f2f [no-relnote] Use image.CUDA to extract visible devices
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 16:53:17 +02:00
Evan Lezar
c30ca0fdc3 Fix typo in error message
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 15:46:49 +02:00
Evan Lezar
457d71c170 Add disable-imex-channel-creation feature flag
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 14:26:24 +02:00
Evan Lezar
1d9d0acf7d [no-relnote] Remove feature flag for per-container features
This change REMOVES the ability to set opt-in features
(e.g. GDS, MOFED, GDRCOPY) in the config file. The existing
per-container envvars are unaffected.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-16 15:30:31 +02:00
Tariq Ibrahim
b90ee5d100 [no-relnote] minor cleanup and improvements
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-11 16:14:41 +02:00
Tariq Ibrahim
f477dc0df1
fetch current container runtime config through the command line
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>

add default runtime binary path to runtimes field of toolkit config toml

Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>

[no-relnote] Get low-level runtimes consistently

We ensure that we use the same low-level runtimes regardless
of the runtime engine being configured. This ensures consistent
behaviour.

Signed-off-by: Evan Lezar <elezar@nvidia.com>

Co-authored-by: Evan Lezar <elezar@nvidia.com>

address review comment

Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
2024-10-10 01:13:20 -07:00
Evan Lezar
82ae2e615a Add creation of select driver symlinks to CDI spec
This change aligns the creation of symlinks under CDI with
the implementation in libnvidia-container. If the driver libraries
are present, the following symlinks are created:

* {{ .LibRoot }}/libcuda.so -> libcuda.so.1
* {{ .LibRoot }}/libnvidia-opticalflow.so -> libnvidia-opticalflow.so.1
* {{ .LibRoot }}/libGLX_indirect.so.0 -> libGLX_nvidia.so.{{ .Version }}

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-01 11:34:58 +02:00
Evan Lezar
f8141aab27 Skip explicit creation of libnvidia-allocator.so.1 symlink
Since we expect .so.1 symlinks to be created by ldconfig, we don't
explicitly request these. This change removes the creation of
a libnvidia-allocator.so.1 -> libnvidia-allocator.so.RM_VERSION symlink
through the create-symlinks hook.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-26 16:34:39 +02:00
Sananya Majumder
906531fee3 Fix incompatible pointer conversion
This change adds a safe pointer conversion to fix an
incompatible C pointer conversion, which caused build failures on some
architectures.

Signed-off-by: Sananya Majumder <sananyam@nvidia.com>
2024-09-25 16:40:43 -07:00
Sananya Majumder
563db0e0be nvsandboxutils: Add usage of GetGpuResource and GetFileContent APIs
This change adds a new discoverer for Sandboxutils to report the file
system paths and associated symbolic links using GetGpuResource and
GetFileContent APIs. Both GPU and MIG devices are supported. If the
Sandboxutils discoverer fails, the NVML discoverer is used to report
the file system information.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Sananya Majumder <sananyam@nvidia.com>
2024-09-24 10:05:14 -07:00
Sananya Majumder
dcbf5bc81f nvsandboxutils: Add implementation for the APIs
This change adds manual wrappers around the generated bindings to make
them into more user-friendly APIs for the caller. Some helper functions
are also added.
The APIs that are currently present in the library and implemented here
are:
nvSandboxUtilsInit
nvSandboxUtilsShutdown
nvSandboxUtilsGetDriverVersion
nvSandboxUtilsGetGpuResource
nvSandboxUtilsGetFileContent

Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Sananya Majumder <sananyam@nvidia.com>
2024-09-24 10:05:09 -07:00
Sananya Majumder
978d439cf8 nvsandboxutils: Add internal bindings
This change adds the internal bindings for Sandboxutils, some of which
have been automatically generated with the help of c-for-go. The format
followed is similar to what is used in go-nvml. These would need to be
regenerated when the header file is modified and new APIs are added.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Sananya Majumder <sananyam@nvidia.com>
2024-09-24 10:05:05 -07:00
Sananya Majumder
aa946f3f59 nvsandboxutils: Add script to generate bindings
This change adds a script and related files to generate the internal
bindings for Sandboxutils library with the help of c-for-go.
This can be used to update the bindings when the header file is modified
with reference to how they are generated with the Makefile in go-nvml.

Run: ./update-bindings.sh

Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Sananya Majumder <sananyam@nvidia.com>
2024-09-24 10:04:48 -07:00
Evan Lezar
5145b0a4b6 Revert "Merge pull request #694 from elezar/add-opt-in-to-sockets"
This reverts commit b061446694, reversing
changes made to c490baab63.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-20 20:26:45 +02:00
Evan Lezar
a4bfccc3fe Use include-persistenced-socket feature for CDI mode
This change ensures that the internal CDI representation includes
the persistenced socket if the include-persistenced-socket feature
flag is enabled.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-18 22:30:27 +02:00
Evan Lezar
ba1ed3232f Skip injection of nvidia-persistenced socket by default
This changes skips the injection of the nvidia-persistenced socket by
default.

An include-persistenced-socket feature flag is added to allow the
injection of this socket to be explicitly requested.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-18 22:10:09 +02:00
Evan Lezar
c3c0cdcc89
Merge pull request #660 from elezar/fix-libnvidia-allocator-duplicate-mount
Exclude libnvidia-allocator from graphics mounts
2024-08-22 14:00:25 +02:00
Evan Lezar
beb921fafe Exclude libnvidia-allocator from graphics mounts
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-08-20 16:28:14 +02:00
Evan Lezar
5316c2a618
Merge pull request #657 from yeahdongcn/typo
Rename icp_test.go to ipc_test.go
2024-08-19 11:38:56 +02:00
dependabot[bot]
070b40d62a Bump github.com/golangci/golangci-lint in /deployments/devel
Bumps [github.com/golangci/golangci-lint](https://github.com/golangci/golangci-lint) from 1.59.1 to 1.60.1.
- [Release notes](https://github.com/golangci/golangci-lint/releases)
- [Changelog](https://github.com/golangci/golangci-lint/blob/master/CHANGELOG.md)
- [Commits](https://github.com/golangci/golangci-lint/compare/v1.59.1...v1.60.1)

---
updated-dependencies:
- dependency-name: github.com/golangci/golangci-lint
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-08-19 11:31:11 +02:00
Xiaodong Ye
13862f3c75 Rename icp_test.go to ipc_test.go
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
2024-08-19 15:39:54 +08:00
Evan Lezar
122136fe92
Merge pull request #627 from elezar/add-mig-cdi-test
[no-relnotes] Add initial unit test for MIG CDI spec generation
2024-08-01 11:00:07 +02:00
Evan Lezar
fa16d83494 [no-relnotes] Add initial unit test for MIG CDI spec generation
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-31 11:56:36 +02:00
Christopher Desiniotis
d8ccd50349
[bugfix] Process error type instead of nvml.Return in internal/platform-support/dgpu
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2024-07-22 13:57:14 -07:00
Evan Lezar
448a3853ad
Merge pull request #552 from elezar/refactor-dgpu-discovery
Refactor dGPU device discovery
2024-07-10 11:48:30 +02:00
Evan Lezar
e527cc1ff5 Use relative path to locate driver libraries
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-09 13:54:07 +02:00
Evan Lezar
55ea268829 Add RelativeToRoot function to Driver
This adds a function to return a path as a path relative to
the specified driver root.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-09 13:54:07 +02:00
Seungmin Kim
aae3da88c3 Inject additional libraries for full X11 functionality
This change updates X11 library detection to ensure that this
works for a wider range of driver installations.

Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com>
Signed-off-by: tux-rampage <tuxrampage@gmail.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-09 13:53:55 +02:00
Evan Lezar
be11cf428b [no-relnote] Add MIG discoverer to dgpu package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-09 13:34:04 +02:00
Evan Lezar
b42a5d3e3a [no-relnote] Refactor dGPU device discovery
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-09 13:34:04 +02:00
Evan Lezar
3aeba886d4 Reduce logging for the NVIDIA Container runtime
This change moves most of the logging for the NVIDIA Container runtime
from Info to Debug or Trace -- especially in the case where no
modifications are requried.

This removes execessive logging.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-01 12:12:38 +02:00
Evan Lezar
490c7dd599 Add Tracef to logger Interface
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-01 12:12:38 +02:00
Evan Lezar
8df5e33ef6 Add String function to oci.Runtime interface
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-01 12:12:38 +02:00
Evan Lezar
2c8431c1f8 Fix bug in argument parsing for logger creation
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-01 12:03:53 +02:00
German Maltsev
692dac4cbd [no-relnote] Fixed spelling error
Signed-off-by: German Maltsev <maltsev.german@gmail.com>
2024-06-19 17:40:56 +03:00
Evan Lezar
5f2be72335 Support vulkan ICD files in a driver root
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-17 11:33:29 +02:00
Evan Lezar
ef57c07199 Bump github.com/NVIDIA/go-nvlib to v0.5.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-05-28 13:28:28 +02:00
Evan Lezar
edda11d647
Merge pull request #428 from elezar/fix-cdi-mode-resolution
Fix cdi mode resolution
2024-05-21 13:22:10 +02:00
Evan Lezar
3defc6babb Use go-nvlib mode resolution
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-05-21 12:25:54 +02:00
Avi Deitcher
179d8655f9 Move nvidia-ctk hook command into own binary
This change creates an nvidia-cdi-hook binary for implementing
CDI hooks. This allows for these hooks to be separated from the
nvidia-ctk command which may, for example, require libnvidia-ml
to support other functionality.

The nvidia-ctk hook subcommand is maintained as an alias for the
time being to allow for existing CDI specifications referring to
this path to work as expected.

Signed-off-by: Avi Deitcher <avi@deitcher.net>
2024-05-21 12:19:44 +02:00
Evan Lezar
d5f6e6f868 Use nvml/mock package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-18 14:53:37 +02:00
Evan Lezar
082ce066ed Replace go-nvlib/pkg/nvml with go-nvml/pkg/nvml
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-18 14:53:37 +02:00
Jared Baur
5788e622f4 Use XDG_DATA_DIRS instead of hardcoding /usr/share
When running nvidia-ctk on a system that uses a custom XDG_DATA_DIRS
environment variable value, the configuration files for `glvnd`,
`vulkan`, and `egl` fail to get passed through from the host to the
container. Reading from XDG_DATA_DIRS instead of hardcoding the default
value allows for finding said files so they can be mounted in the
container.

Signed-off-by: Jared Baur <jaredbaur@fastmail.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-11 17:13:02 +02:00
Evan Lezar
29c0f82ed2
Merge pull request #327 from elezar/add-driver-config
Add config search path option to driver root
2024-04-11 16:58:33 +02:00
Evan Lezar
011c658945 Create root.Driver instance at first usage
This allows for testing through injection of the driver root.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-03 15:03:33 +02:00
Evan Lezar
09341a0934 Add support for feature flags
This change adds a features config that allows
individual features to be toggled at a global level. Each feature can (by default)
be controlled by an environment variable.

The GDS, MOFED, NVSWITCH, and GDRCOPY features are examples of such features.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-03 11:58:37 +02:00
Evan Lezar
2a9e3537ec Add config search paths option to driver root.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-02 23:03:05 +02:00
Evan Lezar
643b89e539 Add driver.Config
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-25 20:18:15 +02:00
Evan Lezar
93763d25f0 Use functional options to construct driver root
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-25 20:13:25 +02:00
Jakub Bujak
44ae31d101 Use D3DKMTEnumAdapters3 for adapter enumeration
D3DKMTEnumAdapters3 is required to enumerate MCDM compute-only adapters

Signed-off-by: Jakub Bujak <jbujak@nvidia.com>
2024-03-12 11:42:08 +01:00
Evan Lezar
6780afbed1
Merge pull request #379 from NVIDIA/exists-fallback
[R550 driver support] add fallback logic to device.Exists(name)
2024-02-27 22:25:33 +02:00
Tariq Ibrahim
f80f4c485d [R550 driver support] add fallback logic to device.Exists(name)
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
2024-02-27 11:59:35 -08:00
Evan Lezar
a2a1a78620
Merge pull request #330 from tariq1890/nvidia-dev-maj-num-lookup
add fallback logic when retrieving major number of the nvidia control device
2024-02-12 17:22:32 +01:00
Evan Lezar
e64b723b71 Add proc.devices.New constructor
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-06 11:19:43 +01:00
Tariq Ibrahim
f414ac2865 add fallback logic when retrieving major number of the nvidia control device
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
2024-02-05 22:55:54 -08:00
Evan Lezar
772cf77dcc Fix build and test on darwin
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-05 23:58:28 +01:00
Christopher Desiniotis
55097b3d7d Add a new gated modifier for GDRCopy which injects the gdrdrv device node
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2024-01-24 14:25:58 -08:00
Tariq Ibrahim
9c1f0bb08b fix minor typos and rm unused logger param
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
2024-01-22 16:48:11 -08:00
Jared Baur
838493b8b9
Allow for customizing the path to ldconfig
Since the `createContainer` `runc` hook runs with the environment that
the container's config.json specifies, the path to `ldconfig` may not be
easily resolvable if the host environment differs enough from the
container (e.g. on a NixOS host where all binaries are under hashed
paths in /nix/store with an Ubuntu container whose PATH contains
FHS-style paths such as /bin and /usr/bin). This change allows for
specifying exactly where ldconfig comes from.

Signed-off-by: Jared Baur <jaredbaur@fastmail.com>
2024-01-17 21:07:00 -08:00
Evan Lezar
f6c252cbde Add crun as a default low-level runtime.
This change adds crun as a configured low-level runtime.
Note that runc still preferred and will be used if present on the
system.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-17 11:31:07 +01:00
Evan Lezar
9c029cac72 Fix bug in determining CLI user on SUSE systems
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-11 13:54:40 +01:00
Evan Lezar
c90211e070 Log explicitly requested runtime mode
For users running the nvidia-container-runtime it would be useful
to determine the runtime mode used from the logs directly instead
of relying on other log messages as signals. This change ensures
that an explicitly selected mode is also logged instead of only
when mode=auto.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-12-15 15:35:35 +01:00
Jared Baur
95b8ebc297
Use devRoot for discovering character devices on Tegra platforms
Signed-off-by: Jared Baur <jaredbaur@fastmail.com>
2023-12-14 11:46:21 -08:00
Jared Baur
508438a0c5
Fix using devRoot on Tegra platforms
Using `WithDevRoot` on Tegra platforms was incorrectly setting
`driverRoot`, fix it so that it correctly sets `devRoot`.

Signed-off-by: Jared Baur <jaredbaur@fastmail.com>
2023-12-13 19:56:02 -08:00
Christopher Desiniotis
32c3bd1ded Fallback to standard CDI modifier when creation of automatic CDI modifier fails
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-06 09:02:19 -08:00
Christopher Desiniotis
b9ac54b922 Add GetDeviceSpecsByID() API to the nvcdi Interface
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-06 09:02:19 -08:00
Christopher Desiniotis
ae1b7e126c Extend the 'runtime.nvidia.com/gpu' CDI device kind to support full-GPUs specified by index or UUID
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-06 09:02:19 -08:00
Tariq Ibrahim
7627d48a5c run goimports -local against the entire codebase
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-12-01 11:13:17 +01:00
Evan Lezar
efae501834 Add support for injecting NVSWITCH devices
This change adds support for an NVIDIA_NVSWITCH environment variable.
When set to `enabled` this striggers the injection of all available
/dev/nvidia-nvswitch* device nodes.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-22 21:59:39 +01:00
Evan Lezar
3045954cd9 Consolidate GDS and MOFED modifiers
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-22 21:59:17 +01:00
Evan Lezar
1ab3ef0af4 Locate libnvidia-egl-gbm.so.*
Searching for a pattern allows platforms where no `.so` symlink
exists to function as expected.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-22 21:57:36 +01:00
Evan Lezar
1909b1fe60 Merge branch 'library-search-path-cdi-generate' into 'main'
Allow search paths when locating libcuda.so

See merge request nvidia/container-toolkit/container-toolkit!462
2023-11-22 19:49:15 +00:00
Evan Lezar
7d79b311d8 Include vulkan/icd.d/nvidia_layers.json
This change includes vulkan/icd.d/nvidia_layers.json in the list of
possible graphics mounts.
2023-11-22 13:54:12 +01:00
Evan Lezar
b46bc10c44 Include nvidia/nvoptix.bin in graphics mounts
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-22 13:53:59 +01:00
Evan Lezar
bbd9222206 Add driver root abstraction
This change adds a driver root abstraction that defines how
libraries are located relative to the root. This allows for
this driver root to be constructed once and passed to discovery
code.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-22 13:27:48 +01:00
Evan Lezar
f20ab793a2 Add support for specifying search paths
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-22 13:27:47 +01:00
Evan Lezar
e5391760e6 Remove duplicate not found error
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-21 18:09:42 +01:00
Evan Lezar
5505886655 Use options for NewLibraryLocator
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-21 18:08:53 +01:00
Evan Lezar
64f554ef41 Add builder for file locator
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-21 18:07:47 +01:00
Evan Lezar
232df647c1 Resolve LDConfig path passed to nvidia-container-cli
Instead of relying solely on a static config, we resolve the path
to ldconfig. The path is checked for existence and a .real suffix is preferred.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-21 15:31:12 +01:00
Evan Lezar
039d7fd324 Merge branch 'remove-config-import-from-discover' into 'main'
Remove NewGraphicsDiscoverer API simplification

See merge request nvidia/container-toolkit/container-toolkit!498
2023-11-20 22:52:02 +00:00
Evan Lezar
255181a5ff Rename NewGraphicsDiscoverer as NewDRMNodesDiscoverer
This change renames NewGraphicsDiscoverer to NewDRMNodesDiscoverer and
instead calls NewGraphicsMountsDiscoverer explicitly when constructing
a graphics modifier.

This avoids the import of config.Config into the discover package
which leads to a transitive dependency on toml-specifics and
requires that the vendor/github.com/pelletier/ package
be vendored in to consumers.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-20 23:10:57 +01:00
Christopher Desiniotis
dc36ea76e8 Automatically generate CDI spec for the runtime.nvidia.com/gpu=all device
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-11-20 13:35:07 -08:00