Commit Graph

2450 Commits

Author SHA1 Message Date
Evan Lezar
d1286bceed
Merge pull request #763 from elezar/allow-config-override-by-envvar
Add support for specifying the config file path in an environment variable
2025-06-24 19:54:36 +02:00
Evan Lezar
2f204147f9
Merge pull request #1030 from elezar/add-driver-lib-root-envvar
Add envvar for libcuda.so parent dir to CDI spec
2025-06-24 14:00:46 +02:00
Evan Lezar
4bab94baa6
Add envvar for libcuda.so parent dir to CDI spec
This change adds an NVIDIA_CTK_LIBCUDA_DIR envvar to
a generated CDI specification. This reports where the `libcuda.so.*`
libraries will be injected into the container.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-24 13:24:50 +02:00
Evan Lezar
f642825ad4
Add EnvVar to Discover interface
This change adds environment variables to the Discover interface.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-24 13:24:50 +02:00
Evan Lezar
5bc2f50299
Merge pull request #1154 from elezar/switch-to-distroless
Switch to distroless Base image
2025-06-24 11:05:35 +02:00
Evan Lezar
60706815a5
Create /work/nvidia-toolkit symlink
This change ensures that a symlink from /work/nvidia-toolkit to
/work/nvidia-ctk-installer exists to allow GPU Operator versions
that override the entrypoint and assume nvidia-toolkit as the
original entrypoint.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-19 10:23:09 +02:00
Evan Lezar
69b0f0ba61
[no-relnote] Update release scripts for distroless
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-19 10:20:59 +02:00
Evan Lezar
7abf5fa6a4
Use Apache license for images
This change removes the NGC-DL-CONTAINER-LICENSE (since this
is not available in the distroless images) and includes the
repo's Apache LICENSE file in the image.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-19 10:20:59 +02:00
Evan Lezar
0dddd5cfd8
Merge pull request #910 from elezar/default-to-cdi
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Use just-in-time CDI spec generation by default in the NVIDIA Container Runtime
2025-06-18 23:40:44 +02:00
Evan Lezar
28ddc1454c
Switch to golang distroless image
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-18 23:39:26 +02:00
Evan Lezar
17c5d1dc87
Resolve to legacy by default in nvidia-container-runtime-hook
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-18 23:36:12 +02:00
Evan Lezar
6149592bf6
Default to jit-cdi mode in the nvidia runtime
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-18 23:36:12 +02:00
Evan Lezar
d3ece78bc9
[no-relnote] Add RuntimeMode type
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-18 23:36:12 +02:00
Evan Lezar
980ca5d1bc
Use functional options to construct runtime mode resolver
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-18 23:36:12 +02:00
Evan Lezar
76b71a5498
Merge pull request #1083 from elezar/bump-image-in-tests
[no-relnote] Use cuda 12.9.0 image in tests
2025-06-18 23:33:54 +02:00
Evan Lezar
5a1b4e7c1e
[no-relnote] Fix tests for compat mode
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-18 15:22:08 +02:00
Evan Lezar
39fd15d273
[no-relnote] Use 550 driver in tests
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-18 15:18:21 +02:00
Evan Lezar
da6b849cf6
[no-relnote] Use cuda 12.9.0 image in tests
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-18 15:18:21 +02:00
Evan Lezar
849691d290
[no-relnote] Use NVIDIA_CTK_CONFIG_FILE_PATH in toolkit install
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-18 15:06:00 +02:00
Evan Lezar
f672d38aa5
[no-relnote] Rename config constants
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-18 15:02:35 +02:00
Evan Lezar
eb39b972a5
Add NVIDIA_CTK_CONFIG_FILE_PATH envvar
This change adds support for explicitly specifying the path
to the config file through an environment variable.
The NVIDIA_CTK_CONFIG_FILE_PATH variable is used for both the nvidia-container-runtime
and nvidia-container-runtime-hook.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-18 15:01:46 +02:00
Evan Lezar
d9c7ec9714
[no-relnote] Don't refer to target image distribution
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-18 14:32:54 +02:00
Evan Lezar
614e469dac
Merge pull request #1153 from elezar/switch-to-ubi9
Switch to cuda ubi9 base image
2025-06-18 13:10:32 +02:00
Evan Lezar
aafe4d7ad0
Switch to cuda ubi9 base image
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-18 11:58:23 +02:00
Evan Lezar
1f7c7ffec2
Merge pull request #1152 from elezar/remove-dist-tag
Update image tags for single image
2025-06-18 11:56:14 +02:00
Evan Lezar
7e1beb7aa6
[no-relnote] Update gitlab CI for new image names
We now release all images with vX.Y.Z and vX.Y.Z-packaging tags.

This change updates the gitlab CI to allow for this.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-18 11:51:52 +02:00
Evan Lezar
d560888f1f
Use single version tag for image
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-17 19:46:42 +02:00
Evan Lezar
81fb7bb9c1
Merge pull request #602 from elezar/use-ubi8-image
Use single image for deb and rpm-based systems
2025-06-17 15:09:47 +02:00
Evan Lezar
208896d87d
Merge pull request #1130 from ArangoGutierrez/fix/1049
BUGFIX: modifier: respect GPU volume-mount device requests
2025-06-17 15:04:22 +02:00
Evan Lezar
82b62898bf
[no-relnote] Make devicesFromEnvvars private
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-17 11:55:00 +02:00
Carlos Eduardo Arango Gutierrez
d03a06029a
BUGFIX: modifier: respect GPU volume-mount device requests
The gated modifiers used to add support for GDS, Mofed, and CUDA Forward Comatibility
only check the NVIDIA_VISIBLE_DEVICES envvar to determine whether GPUs are requested
and modifications should be made. This means that use cases where volume mounts are
used to request devices are not supported.

This change ensures that device extraction is consistent for all use cases.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-17 11:55:00 +02:00
Evan Lezar
f4f7da65f1
Ensure consistent sorting of annotation devices
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-17 11:55:00 +02:00
Carlos Eduardo Arango Gutierrez
5fe7b06514
[no-relnote] new helper func visibleEnvVars
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-17 11:54:53 +02:00
Evan Lezar
5606caa5af
Extract deb and rpm packages to single image
This change swithces to using a single image for the NVIDIA Container Toolkit contianer.
Here the contents of the architecture-specific deb and rpm packages are extracted
to a known root. These contents can then be installed using the updated installation
mechanism which has been updated to detect the source root based on the packaging type.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-16 19:15:56 +02:00
Evan Lezar
8149be09ac
Merge pull request #1149 from NVIDIA/dependabot/go_modules/main/github.com/urfave/cli/v2-2.27.7
Bump github.com/urfave/cli/v2 from 2.27.6 to 2.27.7
2025-06-16 16:28:29 +02:00
Evan Lezar
d935648722
Merge pull request #1140 from NVIDIA/dependabot/go_modules/main/github.com/NVIDIA/go-nvlib-0.7.3
Bump github.com/NVIDIA/go-nvlib from 0.7.2 to 0.7.3
2025-06-16 16:26:57 +02:00
dependabot[bot]
1f43b71dd8
Bump github.com/urfave/cli/v2 from 2.27.6 to 2.27.7
Bumps [github.com/urfave/cli/v2](https://github.com/urfave/cli) from 2.27.6 to 2.27.7.
- [Release notes](https://github.com/urfave/cli/releases)
- [Changelog](https://github.com/urfave/cli/blob/main/docs/CHANGELOG.md)
- [Commits](https://github.com/urfave/cli/compare/v2.27.6...v2.27.7)

---
updated-dependencies:
- dependency-name: github.com/urfave/cli/v2
  dependency-version: 2.27.7
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-06-16 08:56:44 +00:00
Evan Lezar
b33d475ff3
Merge pull request #1145 from elezar/remove-docker-runc
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Remove docker-run as default runtime candidate
2025-06-13 19:27:08 +02:00
Evan Lezar
6359cc9919
Remove docker-run as default runtime candidate
This change removes docker-runc as the highest priority
default candidate for the low-level runtimes supported by
the nvidia-container-runtime.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-13 17:26:19 +02:00
Evan Lezar
4f9c860a37
Merge pull request #927 from elezar/disable-device-node-creation
Disable device node creation in CDI mode
2025-06-13 16:44:18 +02:00
Evan Lezar
bab9fdf607
Merge pull request #1144 from NVIDIA/dependabot/submodules/main/third_party/libnvidia-container-710a0f1
Bump third_party/libnvidia-container from `6eda4d7` to `710a0f1`
2025-06-13 16:42:53 +02:00
Evan Lezar
cc7812470f
Merge pull request #1143 from elezar/add-device-ids-to-getspec
Add device IDs to nvcdi.GetSpec API
2025-06-13 16:41:43 +02:00
Evan Lezar
bdcdcb7449
Merge pull request #1132 from elezar/make-cdi-device-extraction-consistent
Make CDI device requests consistent with other methods
2025-06-13 16:05:44 +02:00
Evan Lezar
8be03cfc41
[no-relnote] Ignore annotation devices for non-CDI modes
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-13 15:26:48 +02:00
Evan Lezar
8650ca6533
[no-relnote] Move hookCreator initialisation for readability
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-13 15:00:51 +02:00
Evan Lezar
1bc2a9fee3
Return annotation devices from VisibleDevices
This change includes annotation devices in CUDA.VisibleDevices
with the highest priority. This allows for the CDI device
request extraction to be consistent across all request mechanisms.

Note that this does change behaviour in the following ways:
1. Annotations are considered when resolving the runtime mode.
2. Incorrectly formed device names in annotations are no longer treated as an error.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-13 14:56:08 +02:00
Evan Lezar
dc87dcf786
Make CDI device requests consistent with other methods
Following the refactoring of device request extraction, we can
now make CDI device requests consistent with other methods.

This change moves to using image.VisibleDevices instead of
separate calls to CDIDevicesFromMounts and VisibleDevicesFromEnvVar.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-13 14:34:02 +02:00
Evan Lezar
f17d424248
Construct container info once
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-13 14:05:51 +02:00
Evan Lezar
426186c992
Add logic to extract annotation device requests to image type
This change updates the image.CUDA type to also extract CDI
device requests. These are only relevant IF CDI prefixes are
specifically set.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-13 14:05:51 +02:00
Evan Lezar
6849ebd621
Add IsPrivileged function to CUDA container type
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-13 14:05:51 +02:00