Commit Graph

2032 Commits

Author SHA1 Message Date
Evan Lezar
5bc0315448
Merge pull request #766 from elezar/bump-release-v1.17.0
Bump version for v1.17.0 release
2024-10-31 10:17:13 +01:00
Evan Lezar
3fb1615d26
[no-relnote] Address lint errors in test
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-31 10:11:58 +01:00
Evan Lezar
9e4696bf7d
Bump version for v1.17.0 release
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-31 07:20:10 +01:00
Evan Lezar
8c9d3d8f65
Merge commit from fork
Check for valid paths in create-symlinks hook
2024-10-31 07:16:01 +01:00
Evan Lezar
efb18a72ad
Merge pull request #762 from elezar/fix-auto-cdi-runtime-mode
Fix bug when using just-in-time CDI spec generation
2024-10-30 13:08:26 +01:00
Evan Lezar
75376d3df2
Fix bug when using just-in-time CDI spec generation
This change fixes a bug when using just-in-time CDI spec generation for the
NVIDIA Container Runtime for specific devices (i.e. not 'all').
Instead of unconditionally using the default nvsandboxutils library -- leading
to errors due to undefined symbols -- we check whether the library can be
properly initialised before continuing.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-30 12:20:36 +01:00
Christopher Desiniotis
7e0cd45b1c
Check for valid paths in create-symlinks hook
This change updates the create-symlinks hook to always evaluate
link paths in the container's root filesystem. In addition, the
executable is updated to return an error if a link could not
be created.

Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2024-10-29 12:16:51 -07:00
Christopher Desiniotis
a04e3ac4f7
Write failing test case for create-symlinks hook
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2024-10-29 12:16:51 -07:00
Christopher Desiniotis
92779e71b3
Handle case where symlink already exists in create-symlinks hook
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2024-10-29 12:16:51 -07:00
Christopher Desiniotis
23f1ba3e93
Add unit tests for create-symlinks hook
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-29 12:16:51 -07:00
Evan Lezar
d0d85a8c5c
Always use paths relative to the container root for links
This change ensures that we always treat the link path as a path
relative to the container root. Without this change, relative paths
in link paths would result in links being created relative to the
current working directory where the hook is executed.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-29 12:16:51 -07:00
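
The path handling described in the commit above can be illustrated with a minimal sketch. The helper name `linkPathInContainer` and the example paths are assumptions made for illustration and are not taken from the repository. The sketch is purely lexical and does not resolve symlinks that already exist inside the container root.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// linkPathInContainer is a hypothetical helper: it maps a link path from a
// hook argument onto the container root, whether the link path is absolute
// or relative.
func linkPathInContainer(containerRoot, linkPath string) string {
	// Anchoring the path at "/" before cleaning collapses any leading ".."
	// elements, so the joined result stays lexically below containerRoot.
	return filepath.Join(containerRoot, filepath.Clean("/"+linkPath))
}

func main() {
	root := "/run/containers/example/rootfs" // assumed container root
	for _, link := range []string{
		"/usr/lib/libcuda.so",
		"usr/lib/libcuda.so",
		"../../etc/passwd",
	} {
		fmt.Println(linkPathInContainer(root, link))
	}
}
```

With this approach the current working directory of the hook never influences where a link is created, because no relative path is ever passed to the filesystem.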
Evan Lezar
bfea673d6a
[no-relnote] Remove unused hostRoot argument
The hostRoot argument is always empty and not applicable to
how links are specified.

Links are specified by the paths in the container filesystem and as such
the only transform required to change the root is a join of the filepath.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-29 12:16:50 -07:00
Evan Lezar
6a6a3e6055
[no-relnote] Remove redundant changeRoot for link target
Since hostRoot is always the empty string and we are changing the root in the
target path to /, the call to changeRoot is redundant.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-29 12:16:50 -07:00
Evan Lezar
fa59d12973
[no-relnote] Check created outside of create loop
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-29 12:16:49 -07:00
Evan Lezar
d78868cd31
Merge pull request #760 from elezar/bump-release-v1.17.0-rc.2
Bump version for v1.17.0-rc.2 release
2024-10-28 14:26:53 +01:00
Evan Lezar
74b1e5ea8c
Bump version for v1.17.0-rc.2 release
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-28 14:09:12 +01:00
Evan Lezar
88608781b6
Merge pull request #755 from elezar/fix-libcuda-so
Fix bug where libcuda.so is not found in ldcache
2024-10-24 23:33:12 +02:00
Evan Lezar
fa5a4ac499
Read ldcache at construction instead of on each locate call
This change updates the ldcache locator to read the ldcache at construction
and use these contents to perform future lookups against. Each of the cache
entries is resolved and lookups return the resolved target.

Assuming a symlink chain: libcuda.so -> libcuda.so.1 -> libcuda.so.VERSION, this
means that libcuda.so.VERSION will be returned for any of the following inputs:
libcuda.so, libcuda.so.1, libcuda.so.*.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-24 23:12:58 +02:00
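
A rough sketch of the behaviour described in the commit above: entries are resolved once at construction, and every lookup returns the resolved target. The type `resolvedCache`, its methods, and the `demo` function are hypothetical names chosen for illustration. A list of file paths in a temporary directory stands in for the real ldcache contents, which this sketch does not parse.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// resolvedCache is a hypothetical stand-in for the ldcache locator. It maps
// a library name to the fully resolved path of the entry with that name.
type resolvedCache map[string]string

// newResolvedCache resolves every entry once, at construction. entries
// stands in for the paths that would be read from the ldcache.
func newResolvedCache(entries []string) resolvedCache {
	cache := make(resolvedCache)
	for _, entry := range entries {
		target, err := filepath.EvalSymlinks(entry)
		if err != nil {
			continue // this sketch skips entries that cannot be resolved
		}
		cache[filepath.Base(entry)] = target
	}
	return cache
}

// locate returns the distinct resolved targets of all entries whose name
// matches the glob pattern. It does not touch the filesystem.
func (c resolvedCache) locate(pattern string) []string {
	seen := make(map[string]bool)
	var targets []string
	for name, target := range c {
		if ok, _ := filepath.Match(pattern, name); ok && !seen[target] {
			seen[target] = true
			targets = append(targets, target)
		}
	}
	sort.Strings(targets) // map iteration order is random; sort for stability
	return targets
}

// demo builds libcuda.so -> libcuda.so.1 -> libcuda.so.VERSION in a temporary
// directory and reports the base name located for each input pattern.
func demo() (map[string]string, error) {
	dir, err := os.MkdirTemp("", "ldcache-sketch")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	versioned := filepath.Join(dir, "libcuda.so.VERSION")
	if err := os.WriteFile(versioned, nil, 0o644); err != nil {
		return nil, err
	}
	if err := os.Symlink("libcuda.so.VERSION", filepath.Join(dir, "libcuda.so.1")); err != nil {
		return nil, err
	}
	if err := os.Symlink("libcuda.so.1", filepath.Join(dir, "libcuda.so")); err != nil {
		return nil, err
	}

	cache := newResolvedCache([]string{
		filepath.Join(dir, "libcuda.so"),
		filepath.Join(dir, "libcuda.so.1"),
		versioned,
	})
	results := make(map[string]string)
	for _, pattern := range []string{"libcuda.so", "libcuda.so.1", "libcuda.so.*"} {
		targets := cache.locate(pattern)
		if len(targets) != 1 {
			return nil, fmt.Errorf("pattern %q: expected one target, got %d", pattern, len(targets))
		}
		results[pattern] = filepath.Base(targets[0])
	}
	return results, nil
}

func main() {
	results, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(results)
}
```

Resolving at construction means a lookup for any link in the chain lands on the same file, which is the property the commit message describes for libcuda.so.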
Evan Lezar
9f1bd62c42
[no-relnote] Add failing libcuda locate test
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-24 15:56:53 +02:00
Evan Lezar
9534249936
[no-relnote] Add test for libcuda lookup
This change adds a test for locating libcuda as a driver library.
This includes a failing test on a system where libcuda.so.1 is in
the ldcache, but not at one of the predefined library search paths.

A testdata folder with sample root filesystems is included to test
various combinations.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-24 15:56:25 +02:00
Evan Lezar
e1ea0056b9
Fix bug in sorting of symlink chain
Since we use a map to keep track of the elements of a symlink chain
the construction of the final list of located elements is not stable.
This change constructs the output as the chain is being discovered and,
as such, maintains the original ordering.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-24 15:56:25 +02:00
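
The ordering issue in the commit above comes from Go's randomized map iteration order. A minimal sketch of the fix pattern, assuming a hypothetical `followChain` function and an in-memory `links` table in place of real `os.Readlink` calls: the map is kept only for membership checks, while a slice records elements in the order they are found.

```go
package main

import "fmt"

// followChain walks a symlink chain from start and returns every element in
// discovery order. links is a hypothetical in-memory stand-in for
// os.Readlink: it maps a link name to its target.
func followChain(links map[string]string, start string) []string {
	seen := make(map[string]bool) // membership only; never iterated
	var chain []string            // preserves discovery order
	for current := start; !seen[current]; {
		seen[current] = true
		chain = append(chain, current)
		next, isLink := links[current]
		if !isLink {
			break // current is a regular file: end of the chain
		}
		current = next
	}
	return chain
}

func main() {
	links := map[string]string{
		"libcuda.so":   "libcuda.so.1",
		"libcuda.so.1": "libcuda.so.VERSION",
	}
	fmt.Println(followChain(links, "libcuda.so"))
}
```

Building the result from the keys of `seen` afterwards would reintroduce the instability, since Go does not guarantee any iteration order for maps.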
Evan Lezar
c802c3089c
Remove unsupported print-ldcache command
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-24 15:56:25 +02:00
Tariq
771ac6b88a
Merge pull request #756 from NVIDIA/cli-source-fallback
[TOML ConfigSource] add support for executing fallback CLI commands
2024-10-23 14:31:45 -07:00
Tariq Ibrahim
0f7aba9c3c
[TOML ConfigSource] add support for executing fallback CLI commands
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Co-authored-by: Evan Lezar <elezar@nvidia.com>
2024-10-23 14:26:17 -07:00
Tariq
3c07ea0b17
Merge pull request #726 from NVIDIA/dependabot/docker/deployments/devel/main/golang-1.23.2
Bump golang from 1.23.1 to 1.23.2 in /deployments/devel
2024-10-21 10:11:21 -07:00
Evan Lezar
183dff9161
Merge pull request #750 from elezar/remove-csv-filenames-support
Remove csv filenames support
2024-10-21 11:10:27 +02:00
Evan Lezar
5e3e91a010
[no-relnote] Minor cleanup in create-symlinks
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-18 16:27:38 +02:00
Evan Lezar
dc0e191093
Remove csv-filename support from create-symlinks
This change removes support for specifying csv-filenames when
calling the create-symlinks hook. This is no longer required
as tegra-based systems generate hooks with `--link` arguments.

This also allows the hook to better serve as a reference implementation
for upstream projects wanting to implement a set of standard CDI hooks.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-18 16:27:27 +02:00
Evan Lezar
8a6c1944a5
Merge pull request #749 from elezar/bump-release-v1.17.0-rc.1
Bump version for v1.17.0-rc.1 release
2024-10-18 15:35:34 +02:00
Evan Lezar
5d057dce66
Bump version for v1.17.0-rc.1 release
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-18 15:33:12 +02:00
Evan Lezar
5931136879
Merge pull request #748 from elezar/fix-operator
Add aliases for runtime-specific envvars
2024-10-18 14:49:32 +02:00
Evan Lezar
1145ce2283
Add aliases for runtime-specific envvars
This change ensures that the toolkit works with older
versions of the GPU Operator where runtime-specific envvars are
used to set options such as the config file location.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-18 12:16:50 +02:00
Evan Lezar
38790c5df0
Merge pull request #747 from NVIDIA/dependabot/submodules/main/third_party/libnvidia-container-63d366e
Bump third_party/libnvidia-container from `921e2f3` to `63d366e`
2024-10-17 18:18:53 +02:00
Evan Lezar
e5175c270e
Merge pull request #745 from elezar/fix-symlink-logging
Fix symlink resolution error message
2024-10-17 18:04:54 +02:00
dependabot[bot]
d18a2b6fc7
Bump third_party/libnvidia-container from 921e2f3 to 63d366e
Bumps [third_party/libnvidia-container](https://github.com/NVIDIA/libnvidia-container) from `921e2f3` to `63d366e`.
- [Release notes](https://github.com/NVIDIA/libnvidia-container/releases)
- [Commits](921e2f3197...63d366ee3b)

---
updated-dependencies:
- dependency-name: third_party/libnvidia-container
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-10-17 16:02:52 +00:00
Evan Lezar
2987c4d670
Merge pull request #740 from elezar/imex-by-volume-mount
Allow IMEX channel requests by volume mount
2024-10-17 17:56:12 +02:00
Evan Lezar
2e6712d2bc
Allow IMEX channels to be requested as volume mounts
This change allows IMEX channels to be requested using the
volume mount mechanism.

A mount from /dev/null to /var/run/nvidia-container-devices/imex/{{ .ChannelID }}
is equivalent to including {{ .ChannelID }} in the NVIDIA_IMEX_CHANNELS
environment variable.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 16:54:29 +02:00
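
As an illustration of the volume-mount mechanism described in the commit above, the following sketch extracts channel IDs from a list of mounts. The `mount` type and the `imexChannelsFromMounts` function are hypothetical. Only the `/dev/null` source and the `/var/run/nvidia-container-devices/imex/` destination prefix come from the commit message.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// mount is a hypothetical, pared-down view of a container mount.
type mount struct {
	Source      string
	Destination string
}

// imexMountRoot is the destination directory named in the commit message.
const imexMountRoot = "/var/run/nvidia-container-devices/imex"

// imexChannelsFromMounts returns the channel IDs requested through mounts
// from /dev/null to imexMountRoot/<id>, in the order the mounts appear.
func imexChannelsFromMounts(mounts []mount) []string {
	var ids []string
	for _, m := range mounts {
		if m.Source != "/dev/null" {
			continue
		}
		dest := path.Clean(m.Destination)
		if !strings.HasPrefix(dest, imexMountRoot+"/") {
			continue
		}
		id := strings.TrimPrefix(dest, imexMountRoot+"/")
		if strings.Contains(id, "/") {
			continue // nested paths are not treated as channel IDs here
		}
		ids = append(ids, id)
	}
	return ids
}

func main() {
	mounts := []mount{
		{Source: "/dev/null", Destination: imexMountRoot + "/0"},
		{Source: "/data", Destination: "/data"},
		{Source: "/dev/null", Destination: imexMountRoot + "/2047"},
	}
	fmt.Println(imexChannelsFromMounts(mounts))
}
```

How the real hook validates channel IDs, or combines them with NVIDIA_IMEX_CHANNELS, is not stated in the commit message and is left out of this sketch.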
Evan Lezar
92df542f2f
[no-relnote] Use image.CUDA to extract visible devices
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 16:53:17 +02:00
Evan Lezar
1991b3ef2a
[no-relnote] Use string slice for devices in hook
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 16:53:17 +02:00
Evan Lezar
cdf39fbad3
[no-relnote] Use symlinks.Resolve in hook
This change removes duplicate logic from the create-symlinks hook
and uses symlinks.Resolve instead.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 15:47:13 +02:00
Evan Lezar
c30ca0fdc3
Fix typo in error message
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 15:46:49 +02:00
Evan Lezar
b077e2648d
Merge pull request #741 from elezar/imex-default
Add disableIMEXChannelCreation feature flag
2024-10-17 15:26:21 +02:00
Evan Lezar
457d71c170
Add disable-imex-channel-creation feature flag
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 14:26:24 +02:00
Evan Lezar
bc9180b59d
Expose opt-in features in toolkit-container
This change enables opt-in (off-by-default) features to be opted into.
These features can be toggled by name by specifying the (repeated)
--opt-in-features command line argument or as a comma-separated list
in the NVIDIA_CONTAINER_TOOLKIT_OPT_IN_FEATURES environment variable.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 14:26:24 +02:00
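
The two input forms described in the commit above (a repeated flag and a comma-separated environment variable) could be normalised into one list roughly as follows. The function name `collectOptInFeatures` and the feature names are placeholders. Merging both sources into a single list is also an assumption, since the commit message does not say how they interact.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// collectOptInFeatures merges feature names from repeated flag values and a
// comma-separated environment value. Names are trimmed, empty entries are
// dropped, and duplicates are removed while keeping first-seen order.
func collectOptInFeatures(flagValues []string, envValue string) []string {
	candidates := append([]string{}, flagValues...)
	candidates = append(candidates, strings.Split(envValue, ",")...)

	seen := make(map[string]bool)
	var features []string
	for _, candidate := range candidates {
		name := strings.TrimSpace(candidate)
		if name == "" || seen[name] {
			continue
		}
		seen[name] = true
		features = append(features, name)
	}
	return features
}

func main() {
	// The variable name comes from the commit message; "feature-a" is a
	// placeholder for a value passed via a repeated --opt-in-features flag.
	env := os.Getenv("NVIDIA_CONTAINER_TOOLKIT_OPT_IN_FEATURES")
	fmt.Println(collectOptInFeatures([]string{"feature-a"}, env))
}
```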
Evan Lezar
ec8dfaf779
Merge pull request #743 from elezar/remove-opt-in-features
Remove ability to set per-container features in the config file
2024-10-17 13:46:23 +02:00
Evan Lezar
c129122da6
Merge pull request #742 from NVIDIA/dependabot/docker/deployments/container/main/nvidia/cuda-12.6.2-base-ubuntu20.04
Bump nvidia/cuda from 12.6.1-base-ubuntu20.04 to 12.6.2-base-ubuntu20.04 in /deployments/container
2024-10-17 11:49:05 +02:00
Evan Lezar
0abf800000
Merge pull request #744 from elezar/fix-script
[no-relnote] Fix typo in script
2024-10-16 15:32:09 +02:00
Evan Lezar
1d9d0acf7d
[no-relnote] Remove feature flag for per-container features
This change REMOVES the ability to set opt-in features
(e.g. GDS, MOFED, GDRCOPY) in the config file. The existing
per-container envvars are unaffected.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-16 15:30:31 +02:00
Evan Lezar
17f14278a9
[no-relnote] Fix typo in script
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-16 10:53:45 +02:00
dependabot[bot]
1fa5bbf351
Bump nvidia/cuda in /deployments/container
Bumps nvidia/cuda from 12.6.1-base-ubuntu20.04 to 12.6.2-base-ubuntu20.04.

---
updated-dependencies:
- dependency-name: nvidia/cuda
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-10-15 09:07:59 +00:00