Evan Lezar
ac61306900
Use require-nvidia-kernel-modules feature for toolkit installation
...
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-30 15:22:44 +01:00
Evan Lezar
7263d26817
Add feature gate to require NVIDIA kernel modules
...
This change adds an opt-in feature to the NVIDIA Container Runtime that
only uses the NVIDIA runtime if the NVIDIA kernel modules are loaded.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-30 15:15:18 +01:00
Evan Lezar
efb18a72ad
Merge pull request #762 from elezar/fix-auto-cdi-runtime-mode
...
Fix bug when using just-in-time CDI spec generation
2024-10-30 13:08:26 +01:00
Evan Lezar
75376d3df2
Fix bug when using just-in-time CDI spec generation
...
This change fixes a bug when using just-in-time CDI spec generation for the
NVIDIA Container Runtime for specific devices (i.e. not 'all').
Instead of unconditionally using the default nvsandboxutils library -- leading
to errors due to undefined symbols -- we check whether the library can be
properly initialised before continuing.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-30 12:20:36 +01:00
Evan Lezar
d78868cd31
Merge pull request #760 from elezar/bump-release-v1.17.0-rc.2
...
Bump version for v1.17.0-rc.2 release
2024-10-28 14:26:53 +01:00
Evan Lezar
74b1e5ea8c
Bump version for v1.17.0-rc.2 release
...
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-28 14:09:12 +01:00
Evan Lezar
88608781b6
Merge pull request #755 from elezar/fix-libcuda-so
...
Fix bug where libcuda.so is not found in ldcache
2024-10-24 23:33:12 +02:00
Evan Lezar
fa5a4ac499
Read ldcache at construction instead of on each locate call
...
This change updates the ldcache locator to read the ldcache at construction
and use these contents to perform future lookups against. Each of the cache
entries is resolved and lookups return the resolved target.
Assuming a symlink chain: libcuda.so -> libcuda.so.1 -> libcuda.so.VERSION, this
means that libcuda.so.VERSION will be returned for any of the following inputs:
libcuda.so, libcuda.so.1, libcuda.so.*.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-24 23:12:58 +02:00
Evan Lezar
9f1bd62c42
[no-relnote] Add failing libcuda locate test
...
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-24 15:56:53 +02:00
Evan Lezar
9534249936
[no-relnote] Add test for libcuda lookup
...
This change adds a test for locating libcuda as a driver library.
This includes a failing test on a system where libcuda.so.1 is in
the ldcache, but not at one of the predefined library search paths.
A testdata folder with sample root filesystems is included to test
various combinations.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-24 15:56:25 +02:00
Evan Lezar
e1ea0056b9
Fix bug in sorting of symlink chain
...
Since we use a map to keep track of the elements of a symlink chain
the construction of the final list of located elements is not stable.
This change constructs the output as the elements are discovered and as
such maintains the original ordering.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-24 15:56:25 +02:00
Evan Lezar
c802c3089c
Remove unsupported print-ldcache command
...
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-24 15:56:25 +02:00
Tariq
771ac6b88a
Merge pull request #756 from NVIDIA/cli-source-fallback
...
[TOML ConfigSource] add support for executing fallback CLI commands
2024-10-23 14:31:45 -07:00
Tariq Ibrahim
0f7aba9c3c
[TOML ConfigSource] add support for executing fallback CLI commands
...
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Co-authored-by: Evan Lezar <elezar@nvidia.com>
2024-10-23 14:26:17 -07:00
Tariq
3c07ea0b17
Merge pull request #726 from NVIDIA/dependabot/docker/deployments/devel/main/golang-1.23.2
...
Bump golang from 1.23.1 to 1.23.2 in /deployments/devel
2024-10-21 10:11:21 -07:00
Evan Lezar
183dff9161
Merge pull request #750 from elezar/remove-csv-filenames-support
...
Remove csv filenames support
2024-10-21 11:10:27 +02:00
Evan Lezar
5e3e91a010
[no-relnote] Minor cleanup in create-symlinks
...
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-18 16:27:38 +02:00
Evan Lezar
dc0e191093
Remove csv-filename support from create-symlinks
...
This change removes support for specifying csv-filenames when
calling the create-symlinks hook. This is no longer required
as tegra-based systems generate hooks with `--link` arguments.
This also allows the hook to better serve as a reference implementation
for upstream projects wanting to implement a set of standard CDI hooks.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-18 16:27:27 +02:00
Evan Lezar
8a6c1944a5
Merge pull request #749 from elezar/bump-release-v1.17.0-rc.1
...
Bump version for v1.17.0-rc.1 release
2024-10-18 15:35:34 +02:00
Evan Lezar
5d057dce66
Bump version for v1.17.0-rc.1 release
...
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-18 15:33:12 +02:00
Evan Lezar
5931136879
Merge pull request #748 from elezar/fix-operator
...
Add aliases for runtime-specific envvars
2024-10-18 14:49:32 +02:00
Evan Lezar
1145ce2283
Add aliases for runtime-specific envvars
...
This change ensures that the toolkit works with older
versions of the GPU Operator where runtime-specific envvars are
used to set options such as the config file location.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-18 12:16:50 +02:00
Evan Lezar
38790c5df0
Merge pull request #747 from NVIDIA/dependabot/submodules/main/third_party/libnvidia-container-63d366e
...
Bump third_party/libnvidia-container from `921e2f3` to `63d366e`
2024-10-17 18:18:53 +02:00
Evan Lezar
e5175c270e
Merge pull request #745 from elezar/fix-symlink-logging
...
Fix symlink resolution error message
2024-10-17 18:04:54 +02:00
dependabot[bot]
d18a2b6fc7
Bump third_party/libnvidia-container from 921e2f3 to 63d366e
...
Bumps [third_party/libnvidia-container](https://github.com/NVIDIA/libnvidia-container) from `921e2f3` to `63d366e`.
- [Release notes](https://github.com/NVIDIA/libnvidia-container/releases)
- [Commits](921e2f3197...63d366ee3b)
---
updated-dependencies:
- dependency-name: third_party/libnvidia-container
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
2024-10-17 16:02:52 +00:00
Evan Lezar
2987c4d670
Merge pull request #740 from elezar/imex-by-volume-mount
...
Allow IMEX channel requests by volume mount
2024-10-17 17:56:12 +02:00
Evan Lezar
2e6712d2bc
Allow IMEX channels to be requested as volume mounts
...
This change allows IMEX channels to be requested using the
volume mount mechanism.
A mount from /dev/null to /var/run/nvidia-container-devices/imex/{{ .ChannelID }}
is equivalent to including {{ .ChannelID }} in the NVIDIA_IMEX_CHANNELS
environment variable.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 16:54:29 +02:00
Evan Lezar
92df542f2f
[no-relnote] Use image.CUDA to extract visible devices
...
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 16:53:17 +02:00
Evan Lezar
1991b3ef2a
[no-relnote] Use string slice for devices in hook
...
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 16:53:17 +02:00
Evan Lezar
cdf39fbad3
[no-relnote] Use symlinks.Resolve in hook
...
This change removes duplicate logic from the create-symlinks hook
and uses symlinks.Resolve instead.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 15:47:13 +02:00
Evan Lezar
c30ca0fdc3
Fix typo in error message
...
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 15:46:49 +02:00
Evan Lezar
b077e2648d
Merge pull request #741 from elezar/imex-default
...
Add disableIMEXChannelCreation feature flag
2024-10-17 15:26:21 +02:00
Evan Lezar
457d71c170
Add disable-imex-channel-creation feature flag
...
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 14:26:24 +02:00
Evan Lezar
bc9180b59d
Expose opt-in features in toolkit-container
...
This change allows opt-in (off-by-default) features to be enabled.
These features can be toggled by name by specifying the (repeated)
--opt-in-features command line argument or as a comma-separated list
in the NVIDIA_CONTAINER_TOOLKIT_OPT_IN_FEATURES environment variable.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 14:26:24 +02:00
Evan Lezar
ec8dfaf779
Merge pull request #743 from elezar/remove-opt-in-features
...
Remove ability to set per-container features in the config file
2024-10-17 13:46:23 +02:00
Evan Lezar
c129122da6
Merge pull request #742 from NVIDIA/dependabot/docker/deployments/container/main/nvidia/cuda-12.6.2-base-ubuntu20.04
...
Bump nvidia/cuda from 12.6.1-base-ubuntu20.04 to 12.6.2-base-ubuntu20.04 in /deployments/container
2024-10-17 11:49:05 +02:00
Evan Lezar
0abf800000
Merge pull request #744 from elezar/fix-script
...
[no-relnote] Fix typo in script
2024-10-16 15:32:09 +02:00
Evan Lezar
1d9d0acf7d
[no-relnote] Remove feature flag for per-container features
...
This change REMOVES the ability to set opt-in features
(e.g. GDS, MOFED, GDRCOPY) in the config file. The existing
per-container envvars are unaffected.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-16 15:30:31 +02:00
Evan Lezar
17f14278a9
[no-relnote] Fix typo in script
...
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-16 10:53:45 +02:00
dependabot[bot]
1fa5bbf351
Bump nvidia/cuda in /deployments/container
...
Bumps nvidia/cuda from 12.6.1-base-ubuntu20.04 to 12.6.2-base-ubuntu20.04.
---
updated-dependencies:
- dependency-name: nvidia/cuda
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
2024-10-15 09:07:59 +00:00
Evan Lezar
f794d09df1
Merge pull request #729 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.26.0
...
Bump golang.org/x/sys from 0.25.0 to 0.26.0
2024-10-11 16:16:15 +02:00
Evan Lezar
17a2377ad5
Merge pull request #734 from NVIDIA/minor-cleanup
...
minor cleanup and improvements
2024-10-11 16:15:19 +02:00
Tariq Ibrahim
b90ee5d100
[no-relnote] minor cleanup and improvements
...
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-11 16:14:41 +02:00
Evan Lezar
1ef3f4048f
Merge pull request #733 from elezar/add-imex-channels-to-management-spec
...
Add imex channels to management CDI spec
2024-10-11 15:28:50 +02:00
Evan Lezar
7fb31bd1dc
Merge pull request #732 from elezar/add-z-lazy
...
Add -z,lazy to LDFLAGS
2024-10-11 15:20:30 +02:00
Evan Lezar
e2fe591535
Add -z,lazy to LDFLAGS
...
This fixes undefined symbol errors on platforms where -z,lazy may
not be the default.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-11 15:20:06 +02:00
Evan Lezar
adf3708d0b
Add imex channels to management CDI spec
...
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-10 14:38:33 +02:00
Evan Lezar
a06d838b1c
Merge pull request #686 from NVIDIA/get-config-from-cmdline
...
Fetch current container runtime config
2024-10-10 11:58:08 +02:00
Tariq Ibrahim
f477dc0df1
fetch current container runtime config through the command line
...
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
add default runtime binary path to runtimes field of toolkit config toml
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
[no-relnote] Get low-level runtimes consistently
We ensure that we use the same low-level runtimes regardless
of the runtime engine being configured. This ensures consistent
behaviour.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Co-authored-by: Evan Lezar <elezar@nvidia.com>
address review comment
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
2024-10-10 01:13:20 -07:00
dependabot[bot]
879bb9ffd5
Bump golang.org/x/sys from 0.25.0 to 0.26.0
...
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.25.0 to 0.26.0.
- [Commits](https://github.com/golang/sys/compare/v0.25.0...v0.26.0)
---
updated-dependencies:
- dependency-name: golang.org/x/sys
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2024-10-06 08:57:11 +00:00