Compare commits

..

1348 Commits

Author SHA1 Message Date
Evan Lezar
178348b782 Merge pull request #1159 from elezar/fix-gitlab-references
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
[no-relnote] Replace GitLab with GitHub URLs
2025-06-25 16:34:25 +02:00
Evan Lezar
433687e57a Merge pull request #1158 from elezar/fix-dummy-sign-job
[no-relnote] Fix dummy image signing job
2025-06-25 16:33:57 +02:00
Evan Lezar
468790938c [no-relnote] Replace GitLab with GitHub URLs
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-25 13:07:21 +02:00
Evan Lezar
3493ec59c2 [no-relnote] Fix dummy image signing job
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-25 12:38:29 +02:00
Evan Lezar
ced79e51ed Merge pull request #1131 from elezar/bump-release-v1.18.0-rc.1
Bump release v1.18.0 rc.1
2025-06-25 12:04:28 +02:00
Evan Lezar
d1e25abd6c [no-relnote] Add v1.18.0-rc.1 changelog
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-25 12:02:19 +02:00
Evan Lezar
7d3defccd2 Bump version for v1.18.0-rc.1 release
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-24 19:59:15 +02:00
Evan Lezar
c95d36db52 Merge pull request #947 from elezar/ensure-libcuda.so-in-ldcache
Some checks are pending
CI Pipeline / code-scanning (push) Waiting to run
CI Pipeline / variables (push) Waiting to run
CI Pipeline / golang (push) Waiting to run
CI Pipeline / image (push) Blocked by required conditions
CI Pipeline / e2e-test (push) Blocked by required conditions
Ensure that libcuda.so is in the ldcache
2025-06-24 19:55:51 +02:00
Evan Lezar
36950ba03f Merge pull request #1100 from elezar/pin-libnvidia-container-tools-version
Require matching version of libnvidia-container-tools
2025-06-24 19:55:10 +02:00
Evan Lezar
d1286bceed Merge pull request #763 from elezar/allow-config-override-by-envvar
Add support for specifying the config file path in an environment variable
2025-06-24 19:54:36 +02:00
Evan Lezar
2f204147f9 Merge pull request #1030 from elezar/add-driver-lib-root-envvar
Add envvar for libcuda.so parent dir to CDI spec
2025-06-24 14:00:46 +02:00
Evan Lezar
39975fc77b [no-relnote] Refactor ldconfig hooks
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-24 13:53:20 +02:00
Evan Lezar
4bf7421a80 Add create-soname-symlinks hook
This change adds a create-soname-symlinks hook that can be used to ensure
that the soname symlinks for injected libraries exist in a container.

This is done by calling ldconfig -n -N for the directories containing the injected
libraries.

This also ensures that libcuda.so is present in the ldcache when the update-ldcache
hook is run.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-24 13:49:24 +02:00
Evan Lezar
f0ea60a28f Require matching version of libnvidia-container-tools
Since the nvidia-container-tools and libnvidia-container* packages
are now released at the same time with the same version, a more
restrictive version makes sense. Here we specifically require an
equal version for the nvidia-container-toolkit* packages and the
libnvidia-container* packages.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-24 13:26:31 +02:00
Evan Lezar
4bab94baa6 Add envvar for libcuda.so parent dir to CDI spec
This change adds an NVIDIA_CTK_LIBCUDA_DIR envvar to
a generated CDI specification. This reports where the `libcuda.so.*`
libraries will be injected into the container.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-24 13:24:50 +02:00
Evan Lezar
f642825ad4 Add EnvVar to Discover interface
This change adds environment variables to the Discover interface.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-24 13:24:50 +02:00
Evan Lezar
5bc2f50299 Merge pull request #1154 from elezar/switch-to-distroless
Switch to distroless Base image
2025-06-24 11:05:35 +02:00
Evan Lezar
60706815a5 Create /work/nvidia-toolkit symlink
This change ensures that a symlink from /work/nvidia-toolkit to
/work/nvidia-ctk-installer exists to allow GPU Operator versions
that override the entrypoint and assume nvidia-toolkit as the
original entrypoint.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-19 10:23:09 +02:00
Evan Lezar
69b0f0ba61 [no-relnote] Update release scripts for distroless
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-19 10:20:59 +02:00
Evan Lezar
7abf5fa6a4 Use Apache license for images
This change removes the NGC-DL-CONTAINER-LICENSE (since this
is not available in the distroless images) and includes the
repo's Apache LICENSE file in the image.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-19 10:20:59 +02:00
Evan Lezar
0dddd5cfd8 Merge pull request #910 from elezar/default-to-cdi
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Use just-in-time CDI spec generation by default in the NVIDIA Container Runtime
2025-06-18 23:40:44 +02:00
Evan Lezar
28ddc1454c Switch to golang distroless image
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-18 23:39:26 +02:00
Evan Lezar
17c5d1dc87 Resolve to legacy by default in nvidia-container-runtime-hook
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-18 23:36:12 +02:00
Evan Lezar
6149592bf6 Default to jit-cdi mode in the nvidia runtime
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-18 23:36:12 +02:00
Evan Lezar
d3ece78bc9 [no-relnote] Add RuntimeMode type
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-18 23:36:12 +02:00
Evan Lezar
980ca5d1bc Use functional options to construct runtime mode resolver
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-18 23:36:12 +02:00
Evan Lezar
76b71a5498 Merge pull request #1083 from elezar/bump-image-in-tests
[no-relnote] Use cuda 12.9.0 image in tests
2025-06-18 23:33:54 +02:00
Evan Lezar
5a1b4e7c1e [no-relnote] Fix tests for compat mode
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-18 15:22:08 +02:00
Evan Lezar
39fd15d273 [no-relnote] Use 550 driver in tests
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-18 15:18:21 +02:00
Evan Lezar
da6b849cf6 [no-relnote] Use cuda 12.9.0 image in tests
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-18 15:18:21 +02:00
Evan Lezar
849691d290 [no-relnote] Use NVIDIA_CTK_CONFIG_FILE_PATH in toolkit install
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-18 15:06:00 +02:00
Evan Lezar
f672d38aa5 [no-relnote] Rename config constants
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-18 15:02:35 +02:00
Evan Lezar
eb39b972a5 Add NVIDIA_CTK_CONFIG_FILE_PATH envvar
This change adds support for explicitly specifying the path
to the config file through an environment variable.
The NVIDIA_CTK_CONFIG_FILE_PATH variable is used for both the nvidia-container-runtime
and nvidia-container-runtime-hook.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-18 15:01:46 +02:00
Evan Lezar
d9c7ec9714 [no-relnote] Don't refer to target image distribution
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-18 14:32:54 +02:00
Evan Lezar
614e469dac Merge pull request #1153 from elezar/switch-to-ubi9
Switch to cuda ubi9 base image
2025-06-18 13:10:32 +02:00
Evan Lezar
aafe4d7ad0 Switch to cuda ubi9 base image
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-18 11:58:23 +02:00
Evan Lezar
1f7c7ffec2 Merge pull request #1152 from elezar/remove-dist-tag
Update image tags for single image
2025-06-18 11:56:14 +02:00
Evan Lezar
7e1beb7aa6 [no-relnote] Update gitlab CI for new image names
We now release all images with vX.Y.Z and vX.Y.Z-packaging tags.

This change updates the gitlab CI to allow for this.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-18 11:51:52 +02:00
Evan Lezar
d560888f1f Use single version tag for image
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-17 19:46:42 +02:00
Evan Lezar
81fb7bb9c1 Merge pull request #602 from elezar/use-ubi8-image
Use single image for deb and rpm-based systems
2025-06-17 15:09:47 +02:00
Evan Lezar
208896d87d Merge pull request #1130 from ArangoGutierrez/fix/1049
BUGFIX: modifier: respect GPU volume-mount device requests
2025-06-17 15:04:22 +02:00
Evan Lezar
82b62898bf [no-relnote] Make devicesFromEnvvars private
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-17 11:55:00 +02:00
Carlos Eduardo Arango Gutierrez
d03a06029a BUGFIX: modifier: respect GPU volume-mount device requests
The gated modifiers used to add support for GDS, Mofed, and CUDA Forward Comatibility
only check the NVIDIA_VISIBLE_DEVICES envvar to determine whether GPUs are requested
and modifications should be made. This means that use cases where volume mounts are
used to request devices are not supported.

This change ensures that device extraction is consistent for all use cases.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-17 11:55:00 +02:00
Evan Lezar
f4f7da65f1 Ensure consistent sorting of annotation devices
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-17 11:55:00 +02:00
Carlos Eduardo Arango Gutierrez
5fe7b06514 [no-relnote] new helper func visibleEnvVars
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-17 11:54:53 +02:00
Evan Lezar
5606caa5af Extract deb and rpm packages to single image
This change swithces to using a single image for the NVIDIA Container Toolkit contianer.
Here the contents of the architecture-specific deb and rpm packages are extracted
to a known root. These contents can then be installed using the updated installation
mechanism which has been updated to detect the source root based on the packaging type.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-16 19:15:56 +02:00
Evan Lezar
8149be09ac Merge pull request #1149 from NVIDIA/dependabot/go_modules/main/github.com/urfave/cli/v2-2.27.7
Bump github.com/urfave/cli/v2 from 2.27.6 to 2.27.7
2025-06-16 16:28:29 +02:00
Evan Lezar
d935648722 Merge pull request #1140 from NVIDIA/dependabot/go_modules/main/github.com/NVIDIA/go-nvlib-0.7.3
Bump github.com/NVIDIA/go-nvlib from 0.7.2 to 0.7.3
2025-06-16 16:26:57 +02:00
dependabot[bot]
1f43b71dd8 Bump github.com/urfave/cli/v2 from 2.27.6 to 2.27.7
Bumps [github.com/urfave/cli/v2](https://github.com/urfave/cli) from 2.27.6 to 2.27.7.
- [Release notes](https://github.com/urfave/cli/releases)
- [Changelog](https://github.com/urfave/cli/blob/main/docs/CHANGELOG.md)
- [Commits](https://github.com/urfave/cli/compare/v2.27.6...v2.27.7)

---
updated-dependencies:
- dependency-name: github.com/urfave/cli/v2
  dependency-version: 2.27.7
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-06-16 08:56:44 +00:00
Evan Lezar
b33d475ff3 Merge pull request #1145 from elezar/remove-docker-runc
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Remove docker-run as default runtime candidate
2025-06-13 19:27:08 +02:00
Evan Lezar
6359cc9919 Remove docker-run as default runtime candidate
This change removes docker-runc as the highest priority
default candidate for the low-level runtimes supported by
the nvidia-container-runtime.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-13 17:26:19 +02:00
Evan Lezar
4f9c860a37 Merge pull request #927 from elezar/disable-device-node-creation
Disable device node creation in CDI mode
2025-06-13 16:44:18 +02:00
Evan Lezar
bab9fdf607 Merge pull request #1144 from NVIDIA/dependabot/submodules/main/third_party/libnvidia-container-710a0f1
Bump third_party/libnvidia-container from `6eda4d7` to `710a0f1`
2025-06-13 16:42:53 +02:00
Evan Lezar
cc7812470f Merge pull request #1143 from elezar/add-device-ids-to-getspec
Add device IDs to nvcdi.GetSpec API
2025-06-13 16:41:43 +02:00
Evan Lezar
bdcdcb7449 Merge pull request #1132 from elezar/make-cdi-device-extraction-consistent
Make CDI device requests consistent with other methods
2025-06-13 16:05:44 +02:00
Evan Lezar
8be03cfc41 [no-relnote] Ignore annotation devices for non-CDI modes
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-13 15:26:48 +02:00
Evan Lezar
8650ca6533 [no-relnote] Move hookCreator initialisation for readability
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-13 15:00:51 +02:00
Evan Lezar
1bc2a9fee3 Return annotation devices from VisibleDevices
This change includes annotation devices in CUDA.VisibleDevices
with the highest priority. This allows for the CDI device
request extraction to be consistent across all request mechanisms.

Note that this does change behaviour in the following ways:
1. Annotations are considered when resolving the runtime mode.
2. Incorrectly formed device names in annotations are no longer treated as an error.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-13 14:56:08 +02:00
Evan Lezar
dc87dcf786 Make CDI device requests consistent with other methods
Following the refactoring of device request extraction, we can
now make CDI device requests consistent with other methods.

This change moves to using image.VisibleDevices instead of
separate calls to CDIDevicesFromMounts and VisibleDevicesFromEnvVar.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-13 14:34:02 +02:00
Evan Lezar
f17d424248 Construct container info once
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-13 14:05:51 +02:00
Evan Lezar
426186c992 Add logic to extract annotation device requests to image type
This change updates the image.CUDA type to also extract CDI
device requests. These are only relevant IF CDI prefixes are
specifically set.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-13 14:05:51 +02:00
Evan Lezar
6849ebd621 Add IsPrivileged function to CUDA container type
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-13 14:05:51 +02:00
dependabot[bot]
4a6685d3a8 Bump third_party/libnvidia-container from 6eda4d7 to 710a0f1
Bumps [third_party/libnvidia-container](https://github.com/NVIDIA/libnvidia-container) from `6eda4d7` to `710a0f1`.
- [Release notes](https://github.com/NVIDIA/libnvidia-container/releases)
- [Commits](6eda4d76c8...710a0f1304)

---
updated-dependencies:
- dependency-name: third_party/libnvidia-container
  dependency-version: 710a0f1304dacb9de06716a4e24160698952aa81
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-06-13 08:14:38 +00:00
Evan Lezar
2ccf67c40f [no-relnote] Remove test output file
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-12 16:34:08 +02:00
Evan Lezar
0134ba4250 Add device IDs to nvcdi.GetSpec API
This change allows device IDs to the specified in the GetSpec API.
This simplifies cases where CDI specs are being generated for specific
devices by ID.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-12 16:23:11 +02:00
dependabot[bot]
eab9cdf1c2 Bump github.com/NVIDIA/go-nvlib from 0.7.2 to 0.7.3
Bumps [github.com/NVIDIA/go-nvlib](https://github.com/NVIDIA/go-nvlib) from 0.7.2 to 0.7.3.
- [Release notes](https://github.com/NVIDIA/go-nvlib/releases)
- [Commits](https://github.com/NVIDIA/go-nvlib/compare/v0.7.2...v0.7.3)

---
updated-dependencies:
- dependency-name: github.com/NVIDIA/go-nvlib
  dependency-version: 0.7.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-06-11 12:58:33 +00:00
Evan Lezar
dba15acdcc Merge pull request #1135 from NVIDIA/dependabot/go_modules/tests/main/golang.org/x/crypto-0.39.0
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Bump golang.org/x/crypto from 0.38.0 to 0.39.0 in /tests
2025-06-11 14:57:38 +02:00
Evan Lezar
8339fb1ec3 Merge pull request #1139 from NVIDIA/dependabot/go_modules/main/github.com/NVIDIA/go-nvml-0.12.9-0
Bump github.com/NVIDIA/go-nvml from 0.12.4-1 to 0.12.9-0
2025-06-11 14:57:18 +02:00
Evan Lezar
b9d646c80d Merge pull request #1136 from NVIDIA/dependabot/docker/deployments/devel/main/golang-1.24.4
Bump golang from 1.24.3 to 1.24.4 in /deployments/devel
2025-06-11 14:53:57 +02:00
Evan Lezar
7380cff645 Merge pull request #1134 from NVIDIA/dependabot/go_modules/main/golang.org/x/mod-0.25.0
Bump golang.org/x/mod from 0.24.0 to 0.25.0
2025-06-11 14:53:34 +02:00
dependabot[bot]
f91736d832 Bump github.com/NVIDIA/go-nvml from 0.12.4-1 to 0.12.9-0
Bumps [github.com/NVIDIA/go-nvml](https://github.com/NVIDIA/go-nvml) from 0.12.4-1 to 0.12.9-0.
- [Release notes](https://github.com/NVIDIA/go-nvml/releases)
- [Commits](https://github.com/NVIDIA/go-nvml/compare/v0.12.4-1...v0.12.9-0)

---
updated-dependencies:
- dependency-name: github.com/NVIDIA/go-nvml
  dependency-version: 0.12.9-0
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-06-11 09:03:11 +00:00
dependabot[bot]
5ccac4da5a Bump golang from 1.24.3 to 1.24.4 in /deployments/devel
Bumps golang from 1.24.3 to 1.24.4.

---
updated-dependencies:
- dependency-name: golang
  dependency-version: 1.24.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-06-06 09:01:57 +00:00
dependabot[bot]
1aee45be2d Bump golang.org/x/crypto from 0.38.0 to 0.39.0 in /tests
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.38.0 to 0.39.0.
- [Commits](https://github.com/golang/crypto/compare/v0.38.0...v0.39.0)

---
updated-dependencies:
- dependency-name: golang.org/x/crypto
  dependency-version: 0.39.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-06-06 08:43:50 +00:00
dependabot[bot]
0d4a7f1d5a Bump golang.org/x/mod from 0.24.0 to 0.25.0
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.24.0 to 0.25.0.
- [Commits](https://github.com/golang/mod/compare/v0.24.0...v0.25.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-version: 0.25.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-06-06 08:43:41 +00:00
Evan Lezar
27f5ec83de Merge pull request #1125 from elezar/vulkan-target-cpu
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Add discovery of arch-specific vulkan ICD
2025-06-05 14:09:50 +02:00
Carlos Eduardo Arango Gutierrez
0a3146f74f Merge pull request #1076 from ArangoGutierrez/refresh_cdi
Add nvidia-cdi-refresh service
2025-06-05 13:08:57 +02:00
Carlos Eduardo Arango Gutierrez
28add0a532 Merge pull request #1123 from ArangoGutierrez/cdi_generate_env_cli
Add EnvVars option for all nvidia-ctk cdi commands
2025-06-05 13:05:46 +02:00
Evan Lezar
b55255e31f Merge pull request #1110 from ArangoGutierrez/i/1049
Refactor extracting requested devices from the container image
2025-06-05 13:00:32 +02:00
Carlos Eduardo Arango Gutierrez
dede03f322 Refactor extracting requested devices from the container image
This change consolidates the logic for determining requested devices
from the container image. The logic for this has been integrated into
the image.CUDA type so that multiple implementations are not required.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Co-authored-by: Evan Lezar <elezar@nvidia.com>
2025-06-05 12:38:45 +02:00
Carlos Eduardo Arango Gutierrez
ce3e2c1ed5 Add EnvVars option for all nvidia-ctk cdi commands
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-06-05 11:22:35 +02:00
Carlos Eduardo Arango Gutierrez
a537d0323d Add nvidia-cdi-refresh service
Automatic regeneration of /var/run/cdi/nvidia.yaml
New units:
	•	nvidia-cdi-refresh.service – one-shot wrapper for
			nvidia-ctk cdi generate (adds sleep + required caps).
	•	nvidia-cdi-refresh.path   – fires on driver install/upgrade via
			modules.dep.bin changes.
Packaging
	•	RPM %post reloads systemd and enables the path unit on fresh
			installs.
	•	DEB postinst does the same (configure, skip on upgrade).

Result: CDI spec is always up to date

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-06-05 10:54:15 +02:00
Evan Lezar
2de997e25b Add discovery of arch-specific vulkan ICD
On some RPM-based platforms, the path of the Vulkan ICD file
include an architecture-specific infix to distinguish it from
other architectures. (Most notably x86_64 vs i686). This change
attempts to discover the arch-specific ICD file in addition to
the standard nvidia_icd.json.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-04 23:06:16 +02:00
Evan Lezar
e046d6ae79 Add disabled-device-node-modification hook to CDI spec
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
This hook is not added to management specs.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-04 19:08:48 +02:00
Evan Lezar
0c8723a93a Add a hook to disable device node creation in a container
If required, this hook creates a modified params file (with ModifyDeviceFiles: 0) in a tmpfs
and mounts this over /proc/driver/nvidia/params.

This prevents device node creation when running tools such as nvidia-smi. In general the
creation of these devices is cosmetic as a container does not have the required cgroup
access for the devices.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-04 19:08:48 +02:00
Evan Lezar
fdcd250362 Merge pull request #1129 from elezar/fix-deduplicate-driver-store-wsl
Minor cleanup of WSL2 CDI spec generation
2025-06-04 10:16:36 +02:00
Evan Lezar
b66d37bedb [no-relnote] Minor code cleanup in WSL2 discoverer
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-03 23:43:17 +02:00
Evan Lezar
0c905d0de2 [no-relnote] Remove unneeded indirection
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-03 23:43:17 +02:00
Evan Lezar
0d0b56816e Remove redundant deduplication of search paths for WSL
The GetDriverStorePaths function is implemented so as to
remove duplicate driver store paths which means that the
additional deduplication (which had a bug) can be removed.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-03 23:43:10 +02:00
Evan Lezar
d59fd3da11 Merge pull request #1077 from ArangoGutierrez/1074
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Add the ability to disable specific (or all) CDI hooks when generating a CDI specification
2025-06-03 16:03:18 +02:00
Carlos Eduardo Arango Gutierrez
6cf0248321 Added ability to disable specific (or all) CDI hooks
This change adds the ability to disabled specific (or all) CDI hooks to
both the nvidia-ctk cdi generate command and the nvcdi API.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-06-03 16:01:20 +02:00
Carlos Eduardo Arango Gutierrez
b4787511d2 Consolidate HookName functionality on internal/discover pkg
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-06-03 15:24:43 +02:00
Evan Lezar
890db82b46 Merge pull request #1127 from ArangoGutierrez/e2e/internal_runner
[no-relnote] E2E GitHub action to run with internal runner
2025-06-03 15:18:42 +02:00
Carlos Eduardo Arango Gutierrez
5915328be5 [no-relnote] E2E GitHub action to run with internal runner
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-06-03 12:47:35 +02:00
Evan Lezar
bb3a54f7f4 Merge pull request #1126 from elezar/bump-holodeck-0.2.12
Some checks are pending
CI Pipeline / code-scanning (push) Waiting to run
CI Pipeline / variables (push) Waiting to run
CI Pipeline / golang (push) Waiting to run
CI Pipeline / image (push) Blocked by required conditions
CI Pipeline / e2e-test (push) Blocked by required conditions
Bump NVIDIA/holodeck from 0.2.7 to 0.2.12
2025-06-03 10:55:50 +02:00
dependabot[bot]
a909914cd6 Bump NVIDIA/holodeck from 0.2.7 to 0.2.12
Bumps [NVIDIA/holodeck](https://github.com/nvidia/holodeck) from 0.2.7 to 0.2.12.
- [Release notes](https://github.com/nvidia/holodeck/releases)
- [Commits](https://github.com/nvidia/holodeck/compare/v0.2.7...v0.2.12)

---
updated-dependencies:
- dependency-name: NVIDIA/holodeck
  dependency-version: 0.2.12
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-06-03 10:11:47 +02:00
Evan Lezar
f973271da1 Merge pull request #1119 from elezar/use-public-runners
[no-relnote] Switch to public runners
2025-06-02 21:53:02 +02:00
Evan Lezar
535e023828 Merge pull request #1120 from elezar/remove-release-archive
[no-relnote] Remove release:archive CI step
2025-06-02 13:44:33 +02:00
Evan Lezar
f2cf3e8deb [no-relnote] Remove release:archive CI step
Remove unneeded release:archive internal CI step.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-05-30 17:10:03 +02:00
Evan Lezar
03e8b9e0f5 [no-relnote] Switch to public runners
Since we now use pr-copy-bot, we are not running image builds
from forks and as such the required secrets for pushing to the
ghcr are present.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-05-30 16:40:29 +02:00
Evan Lezar
450f73a046 Merge commit from fork
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Add `NVIDIA_CTK_DEBUG=false` to hook envs
2025-05-30 15:31:26 +02:00
Evan Lezar
479df7134a Add envvar to control debug logging in CDI hooks
This change allows hooks to be configured with debug logging. This
is currently only enabled for the hooks generated from the runtime.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-05-30 15:27:52 +02:00
Evan Lezar
19a83e3542 Merge pull request #1112 from NVIDIA/dependabot/submodules/main/third_party/libnvidia-container-6eda4d7
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Bump third_party/libnvidia-container from `caf057b` to `6eda4d7`
2025-05-28 10:33:44 +02:00
dependabot[bot]
d2344cba34 Bump third_party/libnvidia-container from caf057b to 6eda4d7
Bumps [third_party/libnvidia-container](https://github.com/NVIDIA/libnvidia-container) from `caf057b` to `6eda4d7`.
- [Release notes](https://github.com/NVIDIA/libnvidia-container/releases)
- [Commits](caf057b009...6eda4d76c8)

---
updated-dependencies:
- dependency-name: third_party/libnvidia-container
  dependency-version: 6eda4d76c8c5f8fc174e4abca83e513fb4dd63b0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-05-28 08:16:30 +00:00
Evan Lezar
c8c22162b7 Merge pull request #958 from ArangoGutierrez/codecov
Some checks are pending
CI Pipeline / code-scanning (push) Waiting to run
CI Pipeline / variables (push) Waiting to run
CI Pipeline / golang (push) Waiting to run
CI Pipeline / image (push) Blocked by required conditions
CI Pipeline / e2e-test (push) Blocked by required conditions
[no-relnote] Enable Coveralls for code coverage
2025-05-27 15:48:54 +02:00
Evan Lezar
ea9b8721c0 Merge pull request #1108 from NVIDIA/dependabot/github_actions/main/NVIDIA/holodeck-0.2.9
Some checks are pending
CI Pipeline / code-scanning (push) Waiting to run
CI Pipeline / variables (push) Waiting to run
CI Pipeline / golang (push) Waiting to run
CI Pipeline / image (push) Blocked by required conditions
CI Pipeline / e2e-test (push) Blocked by required conditions
Bump NVIDIA/holodeck from 0.2.7 to 0.2.9
2025-05-27 10:21:15 +02:00
dependabot[bot]
eaaa8536e4 Bump NVIDIA/holodeck from 0.2.7 to 0.2.9
Bumps [NVIDIA/holodeck](https://github.com/nvidia/holodeck) from 0.2.7 to 0.2.9.
- [Release notes](https://github.com/nvidia/holodeck/releases)
- [Commits](https://github.com/nvidia/holodeck/compare/v0.2.7...v0.2.9)

---
updated-dependencies:
- dependency-name: NVIDIA/holodeck
  dependency-version: 0.2.9
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-05-27 08:19:31 +00:00
Carlos Eduardo Arango Gutierrez
e955f65d8f [no-relnote] Enable Coveralls
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-05-23 13:53:45 +02:00
Evan Lezar
b934c68bef Merge pull request #1103 from elezar/reenable-nvsandboxutils
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Reenable nvsandboxutils for driver discovery
2025-05-23 11:38:46 +02:00
Evan Lezar
7bd65da91e Add FeatureFlags to the nvcdi API
This change adds support for feature flags to the nvcdi API.

A feature flag to disable nvsandboxutils is also added to allow
more flexibility in cases where this library causes issue.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-05-22 17:26:41 +02:00
Evan Lezar
872aa2fe1c Reenable nvsandboxutils for driver discovery
This change reenables nvsandboxutils for driver discovery. This
was disabled due to an error in a specific driver version (v565)
so as to not block the release of the DRA driver for ComputeDomains.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-05-22 17:26:40 +02:00
Carlos Eduardo Arango Gutierrez
be6a36c023 Merge pull request #1102 from ArangoGutierrez/i/1094
Some checks are pending
CI Pipeline / code-scanning (push) Waiting to run
CI Pipeline / variables (push) Waiting to run
CI Pipeline / golang (push) Waiting to run
CI Pipeline / image (push) Blocked by required conditions
CI Pipeline / e2e-test (push) Blocked by required conditions
Edit discover.mounts to have a deterministic output
2025-05-22 15:53:22 +02:00
Carlos Eduardo Arango Gutierrez
aaaa3c6275 Edit discover.mounts to have a deterministic output
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-05-22 15:25:03 +02:00
Evan Lezar
f93d96a0de Merge pull request #1090 from ArangoGutierrez/hookcreator
Some checks are pending
CI Pipeline / code-scanning (push) Waiting to run
CI Pipeline / variables (push) Waiting to run
CI Pipeline / golang (push) Waiting to run
CI Pipeline / image (push) Blocked by required conditions
CI Pipeline / e2e-test (push) Blocked by required conditions
Refactor the way we handle Hook Creation
2025-05-21 14:23:51 +02:00
Carlos Eduardo Arango Gutierrez
2a4cf4c0a0 [no-relnote] Update gitignore
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-05-21 10:19:52 +02:00
Carlos Eduardo Arango Gutierrez
cf3b9317ef Refactor the way we create CDI Hooks
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-05-21 10:19:47 +02:00
Evan Lezar
6ba25e7288 Merge pull request #1095 from NVIDIA/dependabot/submodules/main/third_party/libnvidia-container-caf057b
Some checks are pending
CI Pipeline / code-scanning (push) Waiting to run
CI Pipeline / variables (push) Waiting to run
CI Pipeline / golang (push) Waiting to run
CI Pipeline / image (push) Blocked by required conditions
CI Pipeline / e2e-test (push) Blocked by required conditions
Bump third_party/libnvidia-container from `51a7f20` to `caf057b`
2025-05-20 20:31:49 +02:00
dependabot[bot]
296633d148 Bump third_party/libnvidia-container from 51a7f20 to caf057b
Bumps [third_party/libnvidia-container](https://github.com/NVIDIA/libnvidia-container) from `51a7f20` to `caf057b`.
- [Release notes](https://github.com/NVIDIA/libnvidia-container/releases)
- [Commits](51a7f20088...caf057b009)

---
updated-dependencies:
- dependency-name: third_party/libnvidia-container
  dependency-version: caf057b00987a6d874d519cc80b742c43faa859a
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-05-20 13:20:23 +00:00
Evan Lezar
ac8f190c99 Merge commit from fork
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Run update-ldcache in isolated namespaces
2025-05-16 15:15:21 +02:00
Evan Lezar
3c1f1a6519 Merge pull request #1086 from NVIDIA/dependabot/submodules/main/third_party/libnvidia-container-51a7f20
Bump third_party/libnvidia-container from `d26524a` to `51a7f20`
2025-05-16 14:18:17 +02:00
dependabot[bot]
3ee5ff0aa2 Bump third_party/libnvidia-container from d26524a to 51a7f20
Bumps [third_party/libnvidia-container](https://github.com/NVIDIA/libnvidia-container) from `d26524a` to `51a7f20`.
- [Release notes](https://github.com/NVIDIA/libnvidia-container/releases)
- [Commits](d26524ab5d...51a7f20088)

---
updated-dependencies:
- dependency-name: third_party/libnvidia-container
  dependency-version: 51a7f20088dc0c3e7ddbb67629bf8e63b9130339
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-05-16 08:27:41 +00:00
Evan Lezar
6dfd63f4a8 Merge pull request #980 from elezar/add-rprivate-to-mount-options
Some checks are pending
CI Pipeline / code-scanning (push) Waiting to run
CI Pipeline / variables (push) Waiting to run
CI Pipeline / golang (push) Waiting to run
CI Pipeline / image (push) Blocked by required conditions
CI Pipeline / e2e-test (push) Blocked by required conditions
Add rprivate to CDI mount options
2025-05-16 07:53:39 +02:00
Evan Lezar
35e583b623 Merge pull request #1000 from elezar/ignore-unknown-hooks
Issue warning on unsupported CDI hook
2025-05-16 07:52:25 +02:00
Evan Lezar
7d71932d2a Merge pull request #1085 from elezar/add-security-md
[no-relnote] Add SECURITY.md to repo
2025-05-16 07:51:27 +02:00
Evan Lezar
d3ea72c440 [no-relnote] Add SECURITY.md to repo
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-05-15 16:38:43 +02:00
Evan Lezar
c0dda358a3 Issue warning on unsupported CDI hook
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
To allow for CDI hooks to be added gradually we provide a generic no-op hook
for unrecognised subcommands. This will log a warning instead of erroring out.

An unsupported hook could be the result of a CDI specification referring to a
new hook that is not yet supported by an older NVIDIA Container Toolkit
version or a hook that has been removed in newer version.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-05-15 14:05:19 +02:00
Evan Lezar
ec29b602c3 Run update-ldcache in isolated namespaces
This change uses the reexec package to run the update of the
ldcache in a container in a process with isolated namespaces.
Since the hook is invoked as a createContainer hook, these
namespaces are cloned from the container's namespaces.

In the reexec handler, we further isolate the proc filesystem,
mount the host ldconfig to a tmpfs, and pivot into the containers
root.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-05-15 12:45:49 +02:00
Carlos Eduardo Arango Gutierrez
241881f12f Merge pull request #1048 from ArangoGutierrez/updated_e2e
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
[no-relnote] Update E2E test suite
2025-05-14 12:27:01 +02:00
Carlos Eduardo Arango Gutierrez
eb40f240ac [no-relnote] Update E2E suite
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-05-14 11:22:14 +02:00
Evan Lezar
72b2ee9ce0 Merge pull request #1055 from elezar/add-cuda-compat-mode
Some checks are pending
CI Pipeline / code-scanning (push) Waiting to run
CI Pipeline / variables (push) Waiting to run
CI Pipeline / golang (push) Waiting to run
CI Pipeline / image (push) Blocked by required conditions
CI Pipeline / e2e-test (push) Blocked by required conditions
Add nvidia-container-cli.compat-mode config option
2025-05-13 21:56:18 +02:00
Evan Lezar
f4981f0876 Add cuda-compat-mode config option
This change adds an nvidia-container-runtime.modes.legacy.cuda-compat-mode
config option. This can be set to one of four values:

* ldconfig (default): the --cuda-compat-mode=ldconfig flag is passed to the nvidia-container-cli
* mount: the --cuda-compat-mode=mount flag is passed to the nvidia-conainer-cli
* disabled: the --cuda-compat-mode=disabled flag is passed to the nvidia-container-cli
* hook: the --cuda-compat-mode=disabled flag is passed to the nvidia-container-cli AND the
  enable-cuda-compat hook is used to provide forward compatibility.

Note that the disable-cuda-compat-lib-hook feature flag will prevent the enable-cuda-compat
hook from being used. This change also means that the allow-cuda-compat-libs-from-container
feature flag no longer has any effect.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-05-13 21:49:53 +02:00
Evan Lezar
2ec67033c0 Merge pull request #1081 from NVIDIA/dependabot/submodules/main/third_party/libnvidia-container-d26524a
Bump third_party/libnvidia-container from `a198166` to `d26524a`
2025-05-13 21:49:21 +02:00
dependabot[bot]
f8eda79aaf Bump third_party/libnvidia-container from a198166 to d26524a
Bumps [third_party/libnvidia-container](https://github.com/NVIDIA/libnvidia-container) from `a198166` to `d26524a`.
- [Release notes](https://github.com/NVIDIA/libnvidia-container/releases)
- [Commits](a198166e1c...d26524ab5d)

---
updated-dependencies:
- dependency-name: third_party/libnvidia-container
  dependency-version: d26524ab5db96a55ae86033f53de50d3794fb547
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-05-13 19:48:20 +00:00
Evan Lezar
51504097d8 Merge pull request #1078 from elezar/add-thor-support
Fix mode detection on Thor-based systems
2025-05-13 21:33:25 +02:00
Evan Lezar
a4dc28bb3f Fix mode detection on Thor-based systems
This change updates github.com/NVIDIA/go-nvlib from v0.7.1 to v0.7.2
to allow Thor systems to be detected as Tegra-based. This allows fixes
automatic mode detection to work on these systems.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-05-13 21:25:11 +02:00
Evan Lezar
d0103aa6a3 Add rprivate to CDI mount options
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
This ensures that mount propagation is set to rprivate for
mounts from the host into the container. This aligns with the
default in docker.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-05-09 15:16:13 +02:00
Evan Lezar
adb5e6719d Merge pull request #1046 from elezar/resolve-ldcache-libs-on-arm64
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Fix resolution of libs in LDCache on ARM
2025-05-09 15:04:00 +02:00
Evan Lezar
0c254711e7 Merge pull request #1066 from NVIDIA/dependabot/docker/deployments/container/main/nvidia/cuda-12.9.0-base-ubuntu20.04
Bump nvidia/cuda from 12.8.1-base-ubuntu20.04 to 12.9.0-base-ubuntu20.04 in /deployments/container
2025-05-09 13:51:10 +02:00
Evan Lezar
27adebaa44 Merge pull request #1065 from elezar/skip-nill-discoverers
Skip nil discoverers in merge
2025-05-09 13:50:44 +02:00
dependabot[bot]
496cdb5463 Bump nvidia/cuda in /deployments/container
Bumps nvidia/cuda from 12.8.1-base-ubuntu20.04 to 12.9.0-base-ubuntu20.04.

---
updated-dependencies:
- dependency-name: nvidia/cuda
  dependency-version: 12.9.0-base-ubuntu20.04
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-05-08 08:38:12 +00:00
Evan Lezar
132c9afb6c Merge pull request #1063 from NVIDIA/dependabot/github_actions/main/slackapi/slack-github-action-2.1.0
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Bump slackapi/slack-github-action from 2.0.0 to 2.1.0
2025-05-07 17:19:08 +02:00
Evan Lezar
c879fb59c1 Merge pull request #1058 from NVIDIA/dependabot/github_actions/main/golangci/golangci-lint-action-8
Bump golangci/golangci-lint-action from 7 to 8
2025-05-07 16:54:45 +02:00
Evan Lezar
fbff2c4943 Merge pull request #1064 from NVIDIA/dependabot/docker/deployments/devel/main/golang-1.24.3
Bump golang from 1.24.2 to 1.24.3 in /deployments/devel
2025-05-07 16:54:18 +02:00
Evan Lezar
0c765c6536 Skip nil discoverers in merge
When constructing a list of discoverers using discover.Merge we
explicitly skip `nil` discoverers to simplify usage as we don't
have to explicitly check validity when processing the discoverers
in the list.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-05-07 12:51:38 +02:00
dependabot[bot]
0863749de3 Bump golang from 1.24.2 to 1.24.3 in /deployments/devel
Bumps golang from 1.24.2 to 1.24.3.

---
updated-dependencies:
- dependency-name: golang
  dependency-version: 1.24.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-05-07 08:53:04 +00:00
dependabot[bot]
a8ca8e91f2 Bump slackapi/slack-github-action from 2.0.0 to 2.1.0
Bumps [slackapi/slack-github-action](https://github.com/slackapi/slack-github-action) from 2.0.0 to 2.1.0.
- [Release notes](https://github.com/slackapi/slack-github-action/releases)
- [Commits](https://github.com/slackapi/slack-github-action/compare/v2.0.0...v2.1.0)

---
updated-dependencies:
- dependency-name: slackapi/slack-github-action
  dependency-version: 2.1.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-05-07 08:29:10 +00:00
Evan Lezar
cf395e765a Merge pull request #1061 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.33.0
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Bump golang.org/x/sys from 0.32.0 to 0.33.0
2025-05-06 14:32:35 +02:00
dependabot[bot]
f859c9a671 Bump golang.org/x/sys from 0.32.0 to 0.33.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.32.0 to 0.33.0.
- [Commits](https://github.com/golang/sys/compare/v0.32.0...v0.33.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-version: 0.33.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-05-06 09:31:22 +00:00
Carlos Eduardo Arango Gutierrez
f50e815837 Merge pull request #1062 from NVIDIA/dependabot/go_modules/tests/main/golang.org/x/crypto-0.38.0
Bump golang.org/x/crypto from 0.37.0 to 0.38.0 in /tests
2025-05-06 11:30:11 +02:00
dependabot[bot]
ffcef4f9a8 Bump golang.org/x/crypto from 0.37.0 to 0.38.0 in /tests
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.37.0 to 0.38.0.
- [Commits](https://github.com/golang/crypto/compare/v0.37.0...v0.38.0)

---
updated-dependencies:
- dependency-name: golang.org/x/crypto
  dependency-version: 0.38.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-05-06 08:42:33 +00:00
dependabot[bot]
194a1663ab Bump golangci/golangci-lint-action from 7 to 8
Bumps [golangci/golangci-lint-action](https://github.com/golangci/golangci-lint-action) from 7 to 8.
- [Release notes](https://github.com/golangci/golangci-lint-action/releases)
- [Commits](https://github.com/golangci/golangci-lint-action/compare/v7...v8)

---
updated-dependencies:
- dependency-name: golangci/golangci-lint-action
  dependency-version: '8'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-05-05 09:11:37 +00:00
Evan Lezar
51d603aec6 Merge pull request #1024 from NVIDIA/dependabot/go_modules/tests/main/golang.org/x/crypto-0.37.0
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Bump golang.org/x/crypto from 0.36.0 to 0.37.0 in /tests
2025-05-02 13:14:34 +02:00
Evan Lezar
3f9359eba2 Merge pull request #1056 from tariq1890/bump-runc-dep
Bump github.com/opencontainers/runc from v1.2.6 to v1.3.0
2025-05-02 13:14:13 +02:00
dependabot[bot]
574d204953 Bump golang.org/x/crypto from 0.36.0 to 0.37.0 in /tests
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.36.0 to 0.37.0.
- [Commits](https://github.com/golang/crypto/compare/v0.36.0...v0.37.0)

---
updated-dependencies:
- dependency-name: golang.org/x/crypto
  dependency-version: 0.37.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-04-30 13:52:05 +00:00
Evan Lezar
ca061bb4f0 Merge pull request #1026 from NVIDIA/dependabot/go_modules/tests/main/github.com/onsi/ginkgo/v2-2.23.4
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Bump github.com/onsi/ginkgo/v2 from 2.23.3 to 2.23.4 in /tests
2025-04-30 15:50:42 +02:00
Tariq Ibrahim
f7a415f480 bump runc go dep to v1.3.0
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
2025-04-29 19:01:38 -07:00
Evan Lezar
e6cd7a3b53 Fix resolution of libs in LDCache on ARM
Since we explicitly check for the architecture of the
libraries in the ldcache, we need to also check the architecture
flag against the ARM constants.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-04-23 14:28:28 +02:00
dependabot[bot]
9f6b45817b Bump github.com/onsi/ginkgo/v2 from 2.23.3 to 2.23.4 in /tests
Bumps [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) from 2.23.3 to 2.23.4.
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](https://github.com/onsi/ginkgo/compare/v2.23.3...v2.23.4)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.23.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-04-22 08:27:11 +00:00
Evan Lezar
de3d736663 Merge pull request #1035 from NVIDIA/dependabot/go_modules/tests/golang.org/x/net-0.38.0
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Bump golang.org/x/net from 0.37.0 to 0.38.0 in /tests
2025-04-22 10:25:58 +02:00
Evan Lezar
e4e7c5d857 Merge pull request #1039 from NVIDIA/dependabot/submodules/main/third_party/libnvidia-container-a198166
Bump third_party/libnvidia-container from `95d3e86` to `a198166`
2025-04-22 09:34:29 +02:00
dependabot[bot]
0620dfa6f9 Bump third_party/libnvidia-container from 95d3e86 to a198166
Bumps [third_party/libnvidia-container](https://github.com/NVIDIA/libnvidia-container) from `95d3e86` to `a198166`.
- [Release notes](https://github.com/NVIDIA/libnvidia-container/releases)
- [Commits](95d3e86522...a198166e1c)

---
updated-dependencies:
- dependency-name: third_party/libnvidia-container
  dependency-version: a198166e1c1166f4847598438115ea97dacc7a92
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-04-18 08:16:29 +00:00
Evan Lezar
6394e9e9e7 Merge pull request #1033 from JunAr7112/migrate_ngc_changes
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Updated .release:staging to stage images in nvstaging
2025-04-17 15:13:43 +02:00
Evan Lezar
a2e2a44516 Merge pull request #990 from elezar/refactor-toolkit-installer
Refactor toolkit installer
2025-04-17 15:08:39 +02:00
Arjun
6605bfb5fa Updated .release:staging to stage images in nvstaging
Signed-off-by: Arjun <arjun.gadiyar@gmail.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-04-17 14:02:33 +02:00
Evan Lezar
14806f019b Refactor toolkit installer
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-04-17 13:56:58 +02:00
Evan Lezar
2437630421 [no-relnote] Explicitly use blank config
Since this is running in a contianer the contents of the
/etc/nvidia-container-runtime/config.toml file is equivalent to the
default config. This change makes it explicit.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-04-17 13:56:58 +02:00
Evan Lezar
cdad158f0f [no-relnote] Move toolkit installer package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-04-17 13:56:56 +02:00
Evan Lezar
baa4f907ab [no-relnote] Allow relative paths from operator structs
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-04-17 13:56:14 +02:00
Evan Lezar
afc05f6713 Merge pull request #1004 from dschervov/main
Add support for building ubuntu22.04 on amd64
2025-04-17 13:54:08 +02:00
dependabot[bot]
abea8d375c Bump golang.org/x/net from 0.37.0 to 0.38.0 in /tests
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.37.0 to 0.38.0.
- [Commits](https://github.com/golang/net/compare/v0.37.0...v0.38.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-version: 0.38.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-04-16 23:29:16 +00:00
Evan Lezar
af985c22ea Merge pull request #1025 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.32.0
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Bump golang.org/x/sys from 0.31.0 to 0.32.0
2025-04-09 15:47:53 +02:00
Evan Lezar
b4edc3e730 Merge pull request #1016 from elezar/allow-runtime-path
Some checks are pending
CI Pipeline / code-scanning (push) Waiting to run
CI Pipeline / variables (push) Waiting to run
CI Pipeline / golang (push) Waiting to run
CI Pipeline / image (push) Blocked by required conditions
CI Pipeline / e2e-test (push) Blocked by required conditions
Allow container runtime executable path to be specified
2025-04-08 17:35:19 +02:00
Evan Lezar
c57bdf3391 Allow container runtime executable path to be specified
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
This change adds support for specifying the container runtime
executable path. This can be used if, for example, there are
two containerd or crio executables and a specific one must be used.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-04-07 17:28:11 +02:00
dependabot[bot]
4c241f22ef Bump golang.org/x/sys from 0.31.0 to 0.32.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.31.0 to 0.32.0.
- [Commits](https://github.com/golang/sys/compare/v0.31.0...v0.32.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-version: 0.32.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-04-07 09:35:18 +00:00
Dmitrii Chervov
cd3fdd9f0f Add support for building ubuntu22.04 on arm64
Signed-off-by: Dmitrii Chervov <dschervov1@yandex.ru>
2025-04-06 18:06:36 +03:00
Evan Lezar
cb7605e132 [no-relnote] Remove unused runtimeConfigOverideJSON variable
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-04-04 10:57:12 +02:00
Evan Lezar
a2caaec99d Merge pull request #1014 from NVIDIA/dependabot/docker/deployments/devel/main/golang-1.24.2
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Bump golang from 1.24.1 to 1.24.2 in /deployments/devel
2025-04-03 15:52:17 +02:00
Carlos Eduardo Arango Gutierrez
7dc93d6c9a Merge pull request #1017 from NVIDIA/dependabot/go_modules/tests/main/github.com/onsi/gomega-1.37.0
Bump github.com/onsi/gomega from 1.36.2 to 1.37.0 in /tests
2025-04-03 14:25:08 +02:00
dependabot[bot]
aacfaed40b Bump github.com/onsi/gomega from 1.36.2 to 1.37.0 in /tests
Bumps [github.com/onsi/gomega](https://github.com/onsi/gomega) from 1.36.2 to 1.37.0.
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](https://github.com/onsi/gomega/compare/v1.36.2...v1.37.0)

---
updated-dependencies:
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.37.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-04-03 08:30:59 +00:00
Evan Lezar
7833723be1 Merge pull request #737 from elezar/use-with-cache-for-mounts
Some checks are pending
CI Pipeline / code-scanning (push) Waiting to run
CI Pipeline / variables (push) Waiting to run
CI Pipeline / golang (push) Waiting to run
CI Pipeline / image (push) Blocked by required conditions
CI Pipeline / e2e-test (push) Blocked by required conditions
Fix race condition in mounts cache
2025-04-02 16:30:05 +02:00
Evan Lezar
986f3db971 Fix race condition in mounts cache
This change switches to using the WithCache decorator for
mounts instead of keeping track of a cache locally.

This addresses a race condition when using the mounts structure.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-04-02 16:29:49 +02:00
Evan Lezar
98deb7e4bc Merge pull request #1015 from NVIDIA/dependabot/github_actions/main/NVIDIA/holodeck-0.2.7
Bump NVIDIA/holodeck from 0.2.6 to 0.2.7
2025-04-02 16:28:42 +02:00
dependabot[bot]
3fbb6a8dc6 Bump NVIDIA/holodeck from 0.2.6 to 0.2.7
Bumps [NVIDIA/holodeck](https://github.com/nvidia/holodeck) from 0.2.6 to 0.2.7.
- [Release notes](https://github.com/nvidia/holodeck/releases)
- [Commits](https://github.com/nvidia/holodeck/compare/v0.2.6...v0.2.7)

---
updated-dependencies:
- dependency-name: NVIDIA/holodeck
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-04-02 12:25:55 +00:00
Evan Lezar
50a7e2fd15 Merge pull request #1009 from NVIDIA/dependabot/github_actions/main/golangci/golangci-lint-action-7
Bump golangci/golangci-lint-action from 6 to 7
2025-04-02 14:24:43 +02:00
dependabot[bot]
8ab6ab984e Bump golang from 1.24.1 to 1.24.2 in /deployments/devel
Bumps golang from 1.24.1 to 1.24.2.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-04-02 12:18:33 +00:00
Evan Lezar
0fb3eec1bb [no-relnote] Fix ST1005: error strings should not be capitalized lint errors
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-04-02 14:18:32 +02:00
Evan Lezar
3913a6392b [no-relnote] Fix QF1012: Use fmt.Fprintf(...) instead of WriteString(fmt.Sprintf(...)) lint errors
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-04-02 14:18:32 +02:00
Evan Lezar
80fb4dc0e9 [no-relnote] Fix ST1019: package tags.cncf.io/container-device-interface/specs-go is being imported more than once lint errors
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-04-02 14:18:32 +02:00
Evan Lezar
be9d7b6db1 [no-relnote] Fix QF1002: could use tagged switch on info.SubType lint errors
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-04-02 14:18:32 +02:00
Evan Lezar
c1c6534b1f [no-relnote] Fix QF1008: could remove embedded field lint errors
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-04-02 14:18:32 +02:00
Evan Lezar
bee1969bbf [no-relnote] Migrate golangci-lint config to v2
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-04-02 14:18:32 +02:00
dependabot[bot]
47fc33ea9b Bump golangci/golangci-lint-action from 6 to 7
Bumps [golangci/golangci-lint-action](https://github.com/golangci/golangci-lint-action) from 6 to 7.
- [Release notes](https://github.com/golangci/golangci-lint-action/releases)
- [Commits](https://github.com/golangci/golangci-lint-action/compare/v6...v7)

---
updated-dependencies:
- dependency-name: golangci/golangci-lint-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-04-02 14:18:32 +02:00
Evan Lezar
c1cf52d1f7 Merge pull request #1008 from NVIDIA/dependabot/go_modules/main/tags.cncf.io/container-device-interface-1.0.1
Some checks are pending
CI Pipeline / code-scanning (push) Waiting to run
CI Pipeline / variables (push) Waiting to run
CI Pipeline / golang (push) Waiting to run
CI Pipeline / image (push) Blocked by required conditions
CI Pipeline / e2e-test (push) Blocked by required conditions
Bump tags.cncf.io/container-device-interface from 1.0.0 to 1.0.1
2025-04-02 12:11:02 +02:00
Evan Lezar
1c845ea2d5 [no-relnote] Update CDI tests for new whitespace formatting
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-04-02 11:38:46 +02:00
dependabot[bot]
5549824559 Bump tags.cncf.io/container-device-interface from 1.0.0 to 1.0.1
Bumps [tags.cncf.io/container-device-interface](https://github.com/cncf-tags/container-device-interface) from 1.0.0 to 1.0.1.
- [Release notes](https://github.com/cncf-tags/container-device-interface/releases)
- [Changelog](https://github.com/cncf-tags/container-device-interface/blob/main/RELEASE.md)
- [Commits](https://github.com/cncf-tags/container-device-interface/compare/v1.0.0...v1.0.1)

---
updated-dependencies:
- dependency-name: tags.cncf.io/container-device-interface
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-03-24 09:08:27 +00:00
Dmitrii Chervov
35b471d833 Add support for building ubuntu22.04 on amd64
Signed-off-by: Dmitrii Chervov <dschervov1@yandex.ru>
2025-03-21 19:34:07 +03:00
Evan Lezar
e33f15a128 Merge pull request #1002 from elezar/improve-update-ldcache-args
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Fix update-ldcache arguments
2025-03-20 13:19:25 +02:00
Evan Lezar
77bf9157ab Merge pull request #989 from NVIDIA/dependabot/go_modules/tests/golang.org/x/net-0.36.0
Bump golang.org/x/net from 0.35.0 to 0.36.0 in /tests
2025-03-20 13:01:20 +02:00
Evan Lezar
995e56306d Fix update-ldcache arguments
This change updates how the lconfig arguments are constructed. This
makes the update-ldcache more robust and ensures that folders are
specified last if at al.

Checks are also included for empty container roots at the start of the
hook to simplify later checks.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-03-20 12:51:31 +02:00
dependabot[bot]
3f58c67fed Bump golang.org/x/net from 0.35.0 to 0.36.0 in /tests
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.35.0 to 0.36.0.
- [Commits](https://github.com/golang/net/compare/v0.35.0...v0.36.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-03-18 13:26:57 +02:00
Evan Lezar
e7a0067aae Merge pull request #998 from NVIDIA/dependabot/go_modules/main/github.com/opencontainers/runc-1.2.6
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Bump github.com/opencontainers/runc from 1.2.5 to 1.2.6
2025-03-18 13:26:18 +02:00
dependabot[bot]
002197d5b1 Bump github.com/opencontainers/runc from 1.2.5 to 1.2.6
Bumps [github.com/opencontainers/runc](https://github.com/opencontainers/runc) from 1.2.5 to 1.2.6.
- [Release notes](https://github.com/opencontainers/runc/releases)
- [Changelog](https://github.com/opencontainers/runc/blob/v1.2.6/CHANGELOG.md)
- [Commits](https://github.com/opencontainers/runc/compare/v1.2.5...v1.2.6)

---
updated-dependencies:
- dependency-name: github.com/opencontainers/runc
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-03-18 08:39:18 +00:00
Evan Lezar
a72050442e Merge pull request #992 from elezar/cleanup-nvidia-ctk-installer
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Cleanup nvidia ctk installer
2025-03-17 12:41:04 +02:00
Evan Lezar
75a30af36a Remove positional arguments from nvidia-ctk-installer
Parsing positional arguments require additional processing
instead of relying on named flags. This change switches to
using a named flag for specifying the toolkit installation directory.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-03-14 14:57:24 +02:00
Evan Lezar
89b99ff786 Remove deprecated --runtime-args from nvidia-ctk-installer
The GPU Operator no longer sets the RUNTIME_ARGS environment variable
to control the runtime and instead sets general environment variables.

This change removes support for setting RUNTIME_ARGS in the CLI as these
settings have no effect.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-03-14 14:57:24 +02:00
Evan Lezar
025407543f Merge pull request #993 from NVIDIA/dependabot/docker/deployments/container/main/nvidia/cuda-12.8.1-base-ubuntu20.04
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Bump nvidia/cuda from 12.8.0-base-ubuntu20.04 to 12.8.1-base-ubuntu20.04 in /deployments/container
2025-03-14 11:24:11 +02:00
dependabot[bot]
41fd83ab2b Bump nvidia/cuda in /deployments/container
Bumps nvidia/cuda from 12.8.0-base-ubuntu20.04 to 12.8.1-base-ubuntu20.04.

---
updated-dependencies:
- dependency-name: nvidia/cuda
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-03-14 08:20:16 +00:00
Evan Lezar
dd55eeecc9 Merge pull request #991 from elezar/rename-app-name
Some checks are pending
CI Pipeline / code-scanning (push) Waiting to run
CI Pipeline / variables (push) Waiting to run
CI Pipeline / golang (push) Waiting to run
CI Pipeline / image (push) Blocked by required conditions
CI Pipeline / e2e-test (push) Blocked by required conditions
Minor updates to nvidia-ctk-installer
2025-03-13 16:17:54 +02:00
Evan Lezar
62497870fa Add version info to nvidia-ctk-installer
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-03-13 16:15:29 +02:00
Evan Lezar
eb932bef8a Update nvidia-ctk-installer app name to match binary name
This change updates the nvidia-ctk-installer app name to match the binary name.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-03-13 16:12:13 +02:00
Evan Lezar
e4547bdda6 Merge pull request #969 from elezar/allow-comma-separated-config-values
Some checks are pending
CI Pipeline / code-scanning (push) Waiting to run
CI Pipeline / variables (push) Waiting to run
CI Pipeline / golang (push) Waiting to run
CI Pipeline / image (push) Blocked by required conditions
CI Pipeline / e2e-test (push) Blocked by required conditions
Allow nvidia-ctk config --set to accept comma-separated lists
2025-03-12 16:21:52 +02:00
Evan Lezar
d32449b2d2 Allow nvidia-ctk config --set to accept comma-separated lists
The urfave update to v2.27.6 fixes the behaviour when disabling a separator
for repeated StringSliceFlags. This change updates the nvidia-ctk config
command to allow list options to be specified as comma-separated values.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-03-12 15:47:32 +02:00
Evan Lezar
63ed478ce9 Merge pull request #959 from NVIDIA/dependabot/docker/deployments/devel/main/golang-1.24.1
Bump golang from 1.24.0 to 1.24.1 in /deployments/devel
2025-03-12 14:05:19 +02:00
Evan Lezar
047225a9ae Merge pull request #987 from elezar/fix-mock-generation
[no-relnotes] Update moq to use rm and goimports
2025-03-12 13:44:02 +02:00
Evan Lezar
2a9bae8e80 [no-relnotes] Update moq to use rm and goimports
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-03-12 13:12:13 +02:00
Evan Lezar
57f077fce7 Merge pull request #985 from elezar/fix-signing-container
[no-relnote] Use centos:stream9 for signing container
2025-03-12 12:45:57 +02:00
Evan Lezar
26101ea023 [no-relnote] Use centos:stream9 for signing container
The signing container need not be based on a legacy centos version.
This change updates the signing contianer to be centos:stream9 based.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-03-12 12:44:35 +02:00
Evan Lezar
64dd017801 Merge pull request #984 from NVIDIA/dependabot/go_modules/deployments/devel/main/github.com/golangci/golangci-lint-1.64.7
Bump github.com/golangci/golangci-lint from 1.64.6 to 1.64.7 in /deployments/devel
2025-03-12 11:49:00 +02:00
dependabot[bot]
37ba7a2801 Bump github.com/golangci/golangci-lint in /deployments/devel
Bumps [github.com/golangci/golangci-lint](https://github.com/golangci/golangci-lint) from 1.64.6 to 1.64.7.
- [Release notes](https://github.com/golangci/golangci-lint/releases)
- [Changelog](https://github.com/golangci/golangci-lint/blob/master/CHANGELOG.md)
- [Commits](https://github.com/golangci/golangci-lint/compare/v1.64.6...v1.64.7)

---
updated-dependencies:
- dependency-name: github.com/golangci/golangci-lint
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-03-12 08:20:59 +00:00
Evan Lezar
3df59b955a Merge pull request #978 from NVIDIA/dependabot/go_modules/main/tags.cncf.io/container-device-interface-1.0.0
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Bump tags.cncf.io/container-device-interface from 0.8.1 to 1.0.0
2025-03-10 10:45:38 +02:00
Evan Lezar
33280cd2b2 [no-relnote] Address stricter validation
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-03-10 10:41:40 +02:00
Evan Lezar
3306d5081e [no-relnote] Output sorted specs
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-03-10 10:41:40 +02:00
Evan Lezar
7c3ab75d08 [no-relnote] Update cdi.CurrentVersion reference
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-03-10 10:41:40 +02:00
dependabot[bot]
71985df972 Bump tags.cncf.io/container-device-interface from 0.8.1 to 1.0.0
Bumps [tags.cncf.io/container-device-interface](https://github.com/cncf-tags/container-device-interface) from 0.8.1 to 1.0.0.
- [Release notes](https://github.com/cncf-tags/container-device-interface/releases)
- [Changelog](https://github.com/cncf-tags/container-device-interface/blob/main/RELEASE.md)
- [Commits](https://github.com/cncf-tags/container-device-interface/compare/v0.8.1...v1.0.0)

---
updated-dependencies:
- dependency-name: tags.cncf.io/container-device-interface
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-03-10 10:41:40 +02:00
Evan Lezar
4255d73d89 Merge pull request #981 from NVIDIA/dependabot/submodules/main/third_party/libnvidia-container-95d3e86
Bump third_party/libnvidia-container from `f23e5e5` to `95d3e86`
2025-03-10 10:40:48 +02:00
dependabot[bot]
9bdb74aec2 Bump third_party/libnvidia-container from f23e5e5 to 95d3e86
Bumps [third_party/libnvidia-container](https://github.com/NVIDIA/libnvidia-container) from `f23e5e5` to `95d3e86`.
- [Release notes](https://github.com/NVIDIA/libnvidia-container/releases)
- [Commits](f23e5e55ea...95d3e86522)

---
updated-dependencies:
- dependency-name: third_party/libnvidia-container
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-03-10 08:34:52 +00:00
Evan Lezar
e436533a6f Merge pull request #968 from elezar/allow-hooks-disable
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Allow enable-cuda-compat hook to be disabled in CDI spec generation
2025-03-07 16:54:34 +02:00
Evan Lezar
0f299c3431 Disable enable-cuda-compat hook for management containers
Management containers don't generally need forward compatibility.
We disable the enable-cuda-compat hook to not include this in the
generated CDI specifications.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-03-07 16:54:09 +02:00
Evan Lezar
f852043078 Allow enable-cuda-compat hook to be disabled in CDI spec generation
This change adds support to the nvcdi package to opt out of specific hooks.

Currently only the `enable-cuda-compat` hook is supported. This allows clients to
generate a CDI spec that is compatible with older nvidia-cdi-hook CLIs.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-03-07 16:54:09 +02:00
Carlos Eduardo Arango Gutierrez
ef0b16bc24 Merge pull request #966 from NVIDIA/dependabot/go_modules/tests/main/golang.org/x/crypto-0.36.0
Bump golang.org/x/crypto from 0.35.0 to 0.36.0 in /tests
2025-03-07 15:40:44 +01:00
dependabot[bot]
225dfec83f Bump golang.org/x/crypto from 0.35.0 to 0.36.0 in /tests
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.35.0 to 0.36.0.
- [Commits](https://github.com/golang/crypto/compare/v0.35.0...v0.36.0)

---
updated-dependencies:
- dependency-name: golang.org/x/crypto
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-03-07 14:32:55 +00:00
Carlos Eduardo Arango Gutierrez
03c48a6824 Merge pull request #956 from NVIDIA/dependabot/go_modules/deployments/devel/main/github.com/golangci/golangci-lint-1.64.6
Bump github.com/golangci/golangci-lint from 1.64.5 to 1.64.6 in /deployments/devel
2025-03-07 15:16:06 +01:00
dependabot[bot]
6530826293 Bump github.com/golangci/golangci-lint in /deployments/devel
Bumps [github.com/golangci/golangci-lint](https://github.com/golangci/golangci-lint) from 1.64.5 to 1.64.6.
- [Release notes](https://github.com/golangci/golangci-lint/releases)
- [Changelog](https://github.com/golangci/golangci-lint/blob/master/CHANGELOG.md)
- [Commits](https://github.com/golangci/golangci-lint/compare/v1.64.5...v1.64.6)

---
updated-dependencies:
- dependency-name: github.com/golangci/golangci-lint
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-03-07 14:05:34 +00:00
Carlos Eduardo Arango Gutierrez
971fd195b3 Merge pull request #964 from NVIDIA/dependabot/go_modules/tests/main/github.com/onsi/ginkgo/v2-2.23.0
Bump github.com/onsi/ginkgo/v2 from 2.22.2 to 2.23.0 in /tests
2025-03-07 15:04:17 +01:00
dependabot[bot]
3b10afd0fe Bump github.com/onsi/ginkgo/v2 from 2.22.2 to 2.23.0 in /tests
Bumps [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) from 2.22.2 to 2.23.0.
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](https://github.com/onsi/ginkgo/compare/v2.22.2...v2.23.0)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-03-07 13:44:49 +00:00
Evan Lezar
6b7ed26fba Merge pull request #963 from NVIDIA/dependabot/go_modules/main/github.com/urfave/cli/v2-2.27.6
Bump github.com/urfave/cli/v2 from 2.27.5 to 2.27.6
2025-03-07 15:43:32 +02:00
dependabot[bot]
8d5f1e2427 Bump github.com/urfave/cli/v2 from 2.27.5 to 2.27.6
Bumps [github.com/urfave/cli/v2](https://github.com/urfave/cli) from 2.27.5 to 2.27.6.
- [Release notes](https://github.com/urfave/cli/releases)
- [Changelog](https://github.com/urfave/cli/blob/main/docs/CHANGELOG.md)
- [Commits](https://github.com/urfave/cli/compare/v2.27.5...v2.27.6)

---
updated-dependencies:
- dependency-name: github.com/urfave/cli/v2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-03-06 10:48:36 +00:00
Evan Lezar
d82a9ccd89 Merge pull request #961 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.31.0
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Bump golang.org/x/sys from 0.30.0 to 0.31.0
2025-03-06 12:47:14 +02:00
dependabot[bot]
8ac213e3e6 Bump golang.org/x/sys from 0.30.0 to 0.31.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.30.0 to 0.31.0.
- [Commits](https://github.com/golang/sys/compare/v0.30.0...v0.31.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-03-06 09:39:17 +00:00
Evan Lezar
0128762832 Merge pull request #962 from NVIDIA/dependabot/go_modules/main/golang.org/x/mod-0.24.0
Bump golang.org/x/mod from 0.23.0 to 0.24.0
2025-03-06 11:38:07 +02:00
dependabot[bot]
d7b150a2e6 Bump golang.org/x/mod from 0.23.0 to 0.24.0
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.23.0 to 0.24.0.
- [Commits](https://github.com/golang/mod/compare/v0.23.0...v0.24.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-03-06 11:34:42 +02:00
Evan Lezar
57c917e3b1 [no-relnote] Use --exit-code instead of --quiet for mod check
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-03-06 11:33:48 +02:00
dependabot[bot]
ed8faa2d2e Bump golang from 1.24.0 to 1.24.1 in /deployments/devel
Bumps golang from 1.24.0 to 1.24.1.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-03-05 08:17:42 +00:00
Evan Lezar
bc9ec77fdd Merge pull request #943 from elezar/add-disable-imex-channels-feature
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Add ignore-imex-channel-requests feature flag
2025-02-28 17:53:28 +02:00
Evan Lezar
82f2eb7b73 Merge pull request #949 from NVIDIA/dependabot/go_modules/main/github.com/opencontainers/runtime-spec-1.2.1
Bump github.com/opencontainers/runtime-spec from 1.2.0 to 1.2.1
2025-02-28 14:49:09 +02:00
dependabot[bot]
712d829018 Bump github.com/opencontainers/runtime-spec from 1.2.0 to 1.2.1
Bumps [github.com/opencontainers/runtime-spec](https://github.com/opencontainers/runtime-spec) from 1.2.0 to 1.2.1.
- [Release notes](https://github.com/opencontainers/runtime-spec/releases)
- [Changelog](https://github.com/opencontainers/runtime-spec/blob/main/ChangeLog)
- [Commits](https://github.com/opencontainers/runtime-spec/compare/v1.2.0...v1.2.1)

---
updated-dependencies:
- dependency-name: github.com/opencontainers/runtime-spec
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-28 12:01:30 +00:00
Evan Lezar
598b9740fc Merge pull request #941 from elezar/seal-ldconfig
Use memfd when running ldconfig
2025-02-28 14:00:23 +02:00
Evan Lezar
968e2ccca4 Merge pull request #906 from elezar/add-compat-lib-hook
Add CUDA forward compatibility hook
2025-02-27 17:25:19 +02:00
Evan Lezar
aff9301f2e Add disable-cuda-compat-lib-hook feature flag
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-27 15:58:15 +02:00
Evan Lezar
011fb72330 Add basic integration tests for forward compat
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-27 15:58:15 +02:00
Evan Lezar
2adef9903e Ensure that mode hook is executed last
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-27 15:58:15 +02:00
Evan Lezar
70b1f5af98 Add enable-cuda-compat hook to CDI spec generation
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-27 15:58:15 +02:00
Evan Lezar
c9422f12b3 [no-relnote] Add basic CDI generate test
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-27 15:58:15 +02:00
Evan Lezar
b7fbd56f7e Add ldconfig hook in legacy mode
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-27 15:58:15 +02:00
Evan Lezar
bd87c009ba Add enable-cuda-compat hook if required
This change adds the enable-cuda-compat hook to the incomming OCI runtime spec
if the allow-cuda-compat-libs-from-container feature flag is not enabled.

An update-ldcache hook is also injected to ensure that the required folders
are processed.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-27 15:58:15 +02:00
Evan Lezar
fc65d3a784 Add enable-cuda-compat hook to allow compat libs to be discovered
This change adds an nvidia-cdi-hook enable-cuda-compat hook that checks the
container for cuda compat libs and updates /etc/ld.so.conf.d to include their
parent folder if their driver major version is sufficient.

This allows CUDA Forward Compatibility to be used when this is not available
through the libnvidia-container.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-27 15:58:15 +02:00
Evan Lezar
52b9631333 Use libcontainer execseal to run ldconfig
This change copies ldconfig into a memfd before executing it from
the createContainer hook.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-27 13:52:27 +02:00
Evan Lezar
9429fbac5f [no-relnote] Move root to separate file
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-27 13:48:29 +02:00
Evan Lezar
04e9bf4ac1 Merge pull request #937 from NVIDIA/dependabot/go_modules/main/tags.cncf.io/container-device-interface-0.8.1
Bump tags.cncf.io/container-device-interface from 0.8.0 to 0.8.1
2025-02-27 11:22:09 +02:00
Evan Lezar
3ceaf1f85c Merge pull request #938 from NVIDIA/dependabot/go_modules/tests/main/golang.org/x/crypto-0.35.0
Bump golang.org/x/crypto from 0.33.0 to 0.35.0 in /tests
2025-02-27 11:12:45 +02:00
Evan Lezar
9f0c1042c4 Merge pull request #935 from elezar/disable-nvsandboxutils
Disable nvsandboxutils in nvcdi API
2025-02-27 11:07:42 +02:00
Evan Lezar
352b55c8ce Add ignore-imex-channel-requests feature flag
This allows the NVIDIA Container Toolkit to ignore IMEX channel requests
through the NVIDIA_IMEX_CHANNELS envvar or volume mounts and ensures that
the NVIDIA Container Toolkit cannot be used to provide out-of-band access
to an IMEX channel by simply specifying an environment variable, possibly
bypassing other checks by an orchestration system such as kubernetes.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-26 17:46:36 +02:00
Evan Lezar
b13139793b Disable nvsandboxutils in nvcdi API
Repeated calls to nvsandboxutils.Init and Shutdown are causing
segmentation violations. Here we disabled nvsandbox utils unless explicitly
specified.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-26 14:45:22 +02:00
dependabot[bot]
05f44b7752 Bump golang.org/x/crypto from 0.33.0 to 0.35.0 in /tests
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.33.0 to 0.35.0.
- [Commits](https://github.com/golang/crypto/compare/v0.33.0...v0.35.0)

---
updated-dependencies:
- dependency-name: golang.org/x/crypto
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-25 09:12:04 +00:00
dependabot[bot]
a109f28cb6 Bump tags.cncf.io/container-device-interface from 0.8.0 to 0.8.1
Bumps [tags.cncf.io/container-device-interface](https://github.com/cncf-tags/container-device-interface) from 0.8.0 to 0.8.1.
- [Release notes](https://github.com/cncf-tags/container-device-interface/releases)
- [Commits](https://github.com/cncf-tags/container-device-interface/compare/v0.8.0...v0.8.1)

---
updated-dependencies:
- dependency-name: tags.cncf.io/container-device-interface
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-25 09:11:59 +00:00
Evan Lezar
65b575fa96 Merge pull request #933 from elezar/move-wrapper
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
[no-relnote] Move nvcdi wrapper to separate file
2025-02-21 22:43:45 +02:00
Evan Lezar
6e413d8445 [no-relnote] Move nvcdi wrapper to separate file
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-20 23:23:22 +02:00
Evan Lezar
5d9b27f107 Merge pull request #925 from NVIDIA/dependabot/docker/deployments/devel/main/golang-1.24.0
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Bump golang from 1.23.6 to 1.24.0 in /deployments/devel
2025-02-17 17:27:25 +01:00
Evan Lezar
767aee913d Merge pull request #922 from NVIDIA/dependabot/go_modules/deployments/devel/main/github.com/matryer/moq-0.5.3
Bump github.com/matryer/moq from 0.5.2 to 0.5.3 in /deployments/devel
2025-02-17 11:27:53 +01:00
dependabot[bot]
25bac4c62a Bump golang from 1.23.6 to 1.24.0 in /deployments/devel
Bumps golang from 1.23.6 to 1.24.0.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-15 08:33:48 +00:00
dependabot[bot]
7e371aa024 Bump github.com/matryer/moq from 0.5.2 to 0.5.3 in /deployments/devel
Bumps [github.com/matryer/moq](https://github.com/matryer/moq) from 0.5.2 to 0.5.3.
- [Release notes](https://github.com/matryer/moq/releases)
- [Changelog](https://github.com/matryer/moq/blob/main/.goreleaser.yml)
- [Commits](https://github.com/matryer/moq/compare/v0.5.2...v0.5.3)

---
updated-dependencies:
- dependency-name: github.com/matryer/moq
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-02-15 09:29:32 +01:00
Evan Lezar
bc2b937574 Merge pull request #928 from NVIDIA/dependabot/go_modules/deployments/devel/main/github.com/golangci/golangci-lint-1.64.5
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Bump github.com/golangci/golangci-lint from 1.64.2 to 1.64.5 in /deployments/devel
2025-02-14 16:18:07 +01:00
dependabot[bot]
ae8561b509 Bump github.com/golangci/golangci-lint in /deployments/devel
Bumps [github.com/golangci/golangci-lint](https://github.com/golangci/golangci-lint) from 1.64.2 to 1.64.5.
- [Release notes](https://github.com/golangci/golangci-lint/releases)
- [Changelog](https://github.com/golangci/golangci-lint/blob/master/CHANGELOG.md)
- [Commits](https://github.com/golangci/golangci-lint/compare/v1.64.2...v1.64.5)

---
updated-dependencies:
- dependency-name: github.com/golangci/golangci-lint
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-14 08:25:58 +00:00
Evan Lezar
64633903c7 Merge pull request #926 from ArangoGutierrez/slack_botv2
Some checks are pending
CI Pipeline / code-scanning (push) Waiting to run
CI Pipeline / variables (push) Waiting to run
CI Pipeline / golang (push) Waiting to run
CI Pipeline / image (push) Blocked by required conditions
CI Pipeline / e2e-test (push) Blocked by required conditions
[no-relnote] Update Slack notification action step
2025-02-13 17:33:05 +01:00
Carlos Eduardo Arango Gutierrez
1273f879cc [no-relnote] Update Slack notification action step
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-02-13 14:16:44 +01:00
Evan Lezar
8713a173fe Merge pull request #921 from NVIDIA/dependabot/go_modules/main/github.com/NVIDIA/go-nvlib-0.7.1
Some checks are pending
CI Pipeline / code-scanning (push) Waiting to run
CI Pipeline / variables (push) Waiting to run
CI Pipeline / golang (push) Waiting to run
CI Pipeline / image (push) Blocked by required conditions
CI Pipeline / e2e-test (push) Blocked by required conditions
Bump github.com/NVIDIA/go-nvlib from 0.7.0 to 0.7.1
2025-02-13 14:08:55 +01:00
Evan Lezar
e6d99934f8 Merge pull request #924 from NVIDIA/dependabot/github_actions/main/NVIDIA/holodeck-0.2.6
Bump NVIDIA/holodeck from 0.2.5 to 0.2.6
2025-02-13 14:06:59 +01:00
dependabot[bot]
0cf2d60406 Bump NVIDIA/holodeck from 0.2.5 to 0.2.6
Bumps [NVIDIA/holodeck](https://github.com/nvidia/holodeck) from 0.2.5 to 0.2.6.
- [Release notes](https://github.com/nvidia/holodeck/releases)
- [Commits](https://github.com/nvidia/holodeck/compare/v0.2.5...v0.2.6)

---
updated-dependencies:
- dependency-name: NVIDIA/holodeck
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-13 08:39:07 +00:00
dependabot[bot]
0e759d4ad8 Bump github.com/NVIDIA/go-nvlib from 0.7.0 to 0.7.1
Bumps [github.com/NVIDIA/go-nvlib](https://github.com/NVIDIA/go-nvlib) from 0.7.0 to 0.7.1.
- [Release notes](https://github.com/NVIDIA/go-nvlib/releases)
- [Commits](https://github.com/NVIDIA/go-nvlib/compare/v0.7.0...v0.7.1)

---
updated-dependencies:
- dependency-name: github.com/NVIDIA/go-nvlib
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-13 08:33:08 +00:00
Evan Lezar
9e85fb54fc Merge pull request #903 from NVIDIA/dependabot/github_actions/main/slackapi/slack-github-action-2.0.0
Some checks are pending
CI Pipeline / code-scanning (push) Waiting to run
CI Pipeline / variables (push) Waiting to run
CI Pipeline / golang (push) Waiting to run
CI Pipeline / image (push) Blocked by required conditions
CI Pipeline / e2e-test (push) Blocked by required conditions
Bump slackapi/slack-github-action from 1.27.0 to 2.0.0
2025-02-12 17:21:01 +01:00
dependabot[bot]
ce79d01cf5 Bump slackapi/slack-github-action from 1.27.0 to 2.0.0
Bumps [slackapi/slack-github-action](https://github.com/slackapi/slack-github-action) from 1.27.0 to 2.0.0.
- [Release notes](https://github.com/slackapi/slack-github-action/releases)
- [Commits](https://github.com/slackapi/slack-github-action/compare/v1.27.0...v2.0.0)

---
updated-dependencies:
- dependency-name: slackapi/slack-github-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-12 15:14:20 +00:00
Evan Lezar
5d27489d18 Merge pull request #890 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.30.0
Some checks are pending
CI Pipeline / code-scanning (push) Waiting to run
CI Pipeline / variables (push) Waiting to run
CI Pipeline / golang (push) Waiting to run
CI Pipeline / image (push) Blocked by required conditions
CI Pipeline / e2e-test (push) Blocked by required conditions
Bump golang.org/x/sys from 0.29.0 to 0.30.0
2025-02-12 13:12:50 +01:00
Evan Lezar
d80e8b9733 Merge pull request #904 from NVIDIA/dependabot/docker/deployments/devel/main/golang-1.23.6
Bump golang from 1.23.4 to 1.23.6 in /deployments/devel
2025-02-12 13:12:25 +01:00
Evan Lezar
a72b9cd12c Merge pull request #919 from NVIDIA/dependabot/go_modules/deployments/devel/main/github.com/matryer/moq-0.5.2
Bump github.com/matryer/moq from 0.5.1 to 0.5.2 in /deployments/devel
2025-02-12 13:12:03 +01:00
dependabot[bot]
c2f4ac0959 Bump github.com/matryer/moq from 0.5.1 to 0.5.2 in /deployments/devel
Bumps [github.com/matryer/moq](https://github.com/matryer/moq) from 0.5.1 to 0.5.2.
- [Release notes](https://github.com/matryer/moq/releases)
- [Changelog](https://github.com/matryer/moq/blob/main/.goreleaser.yml)
- [Commits](https://github.com/matryer/moq/compare/v0.5.1...v0.5.2)

---
updated-dependencies:
- dependency-name: github.com/matryer/moq
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-12 11:47:17 +00:00
Carlos Eduardo Arango Gutierrez
5db5ddb31b Merge pull request #915 from ArangoGutierrez/e2e_tests02
[no-relnote] Configure all GitHub actions as reusable workflow
2025-02-12 12:10:53 +01:00
dependabot[bot]
ff1897f2e4 Bump golang.org/x/sys from 0.29.0 to 0.30.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.29.0 to 0.30.0.
- [Commits](https://github.com/golang/sys/compare/v0.29.0...v0.30.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-12 10:24:31 +00:00
Evan Lezar
88f0157cae Merge pull request #920 from NVIDIA/dependabot/go_modules/deployments/devel/main/github.com/golangci/golangci-lint-1.64.2
Bump github.com/golangci/golangci-lint from 1.63.4 to 1.64.2 in /deployments/devel
2025-02-12 11:23:25 +01:00
Carlos Eduardo Arango Gutierrez
27f1738c3d [no-relnote] COnfigure all GitHub actions as reusable workflow
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-02-12 11:05:57 +01:00
Evan Lezar
f52df9522e [no-relnote] Remove toolchain from deployments/devel go.mod
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-12 10:57:20 +01:00
Evan Lezar
edf79d6e8b [no-relnote] Update vendor make target to handle submodules
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-12 10:56:56 +01:00
dependabot[bot]
6aadd6ed21 Bump github.com/golangci/golangci-lint in /deployments/devel
Bumps [github.com/golangci/golangci-lint](https://github.com/golangci/golangci-lint) from 1.63.4 to 1.64.2.
- [Release notes](https://github.com/golangci/golangci-lint/releases)
- [Changelog](https://github.com/golangci/golangci-lint/blob/master/CHANGELOG.md)
- [Commits](https://github.com/golangci/golangci-lint/compare/v1.63.4...v1.64.2)

---
updated-dependencies:
- dependency-name: github.com/golangci/golangci-lint
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-12 08:50:19 +00:00
Evan Lezar
a1d60910b6 Merge pull request #912 from NVIDIA/dependabot/go_modules/tests/main/golang.org/x/crypto-0.33.0
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos7-aarch64) (push) Has been cancelled
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos7-x86_64) (push) Has been cancelled
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Has been cancelled
image / image (ubi8, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Has been cancelled
image / image (ubuntu20.04, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Has been cancelled
image / End-to-end Tests (push) Has been cancelled
Bump golang.org/x/crypto from 0.31.0 to 0.33.0 in /tests
2025-02-10 19:28:11 +01:00
Evan Lezar
03f5a09aec Merge pull request #891 from NVIDIA/dependabot/go_modules/main/golang.org/x/mod-0.23.0
Bump golang.org/x/mod from 0.22.0 to 0.23.0
2025-02-10 19:27:33 +01:00
Evan Lezar
1f2232fc5f Merge pull request #894 from elezar/remove-nvidia-container-runtime-hook-in-cdi-mode
Allow cdi mode to work with --gpus flag
2025-02-10 19:19:33 +01:00
Evan Lezar
b19f5d8f7d Merge pull request #914 from elezar/bump-version-v1.17.4-main
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos7-aarch64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos7-x86_64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos8-ppc64le) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
image / image (ubi8, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
image / End-to-end Tests (push) Blocked by required conditions
Bump version v1.17.4 main
2025-02-10 13:44:54 +01:00
Carlos Eduardo Arango Gutierrez
fa1379c6d5 Merge pull request #913 from ArangoGutierrez/e2e_tests01
[no-relnote] Run e2e tests as reusable workflow
2025-02-10 13:44:14 +01:00
Carlos Eduardo Arango Gutierrez
4ded119c31 [no-relnote] Run e2e tests as reusable workflow
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-02-10 13:28:36 +01:00
Evan Lezar
e0813e0c4d Bump version for v1.17.4 release
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-10 13:24:41 +01:00
Evan Lezar
4add0a6bf2 Bump version for v1.17.4 release
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-10 13:24:27 +01:00
dependabot[bot]
fbcf8ef12e Bump golang.org/x/crypto from 0.31.0 to 0.33.0 in /tests
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.31.0 to 0.33.0.
- [Commits](https://github.com/golang/crypto/compare/v0.31.0...v0.33.0)

---
updated-dependencies:
- dependency-name: golang.org/x/crypto
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-10 08:28:19 +00:00
Evan Lezar
92472bd0ed Merge pull request #909 from ArangoGutierrez/reg_test08
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos7-aarch64) (push) Has been cancelled
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos7-x86_64) (push) Has been cancelled
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Has been cancelled
image / image (ubi8, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Has been cancelled
image / image (ubuntu20.04, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Has been cancelled
[no-relnote] fix host and port string generation for remote E2E
2025-02-07 12:53:58 +01:00
Carlos Eduardo Arango Gutierrez
ac236a80f8 [no-relnote] fix host and port string generation for remote E2E
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-02-07 12:00:17 +01:00
Carlos Eduardo Arango Gutierrez
b69d98d7cf Merge pull request #905 from ArangoGutierrez/reg_test07
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos7-aarch64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos7-x86_64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos8-ppc64le) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
image / image (ubi8, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
[no-relnote] fix step holodeck_public_dns_name variable name E2E
2025-02-06 14:01:28 +01:00
Evan Lezar
adbe127afd [no-relnote] Remove on push trigger
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-06 12:02:49 +01:00
Carlos Eduardo Arango Gutierrez
82ccae23b1 [no-relnote] fix step holodeck_public_dns_name variable name E2E
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-02-06 11:03:29 +01:00
dependabot[bot]
44cf1f4e3b Bump golang from 1.23.4 to 1.23.6 in /deployments/devel
Bumps golang from 1.23.4 to 1.23.6.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-06 09:07:30 +00:00
Carlos Eduardo Arango Gutierrez
1d0777ee01 Merge pull request #902 from ArangoGutierrez/reg_test06
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos7-aarch64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos7-x86_64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos8-ppc64le) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
image / image (ubi8, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
[no-relnote] set E2E_INSTALL_CTK to true during GH Actions E2E
2025-02-06 10:02:48 +01:00
Carlos Eduardo Arango Gutierrez
9f82e910f6 [no-relnote] set E2E_INSTALL_CTK to true during GH Actions E2E
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-02-06 09:52:06 +01:00
Evan Lezar
cb172ba336 Merge pull request #901 from ArangoGutierrez/reg_test05
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos7-aarch64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos7-x86_64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos8-ppc64le) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
image / image (ubi8, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
[no-relnote] Fix flag name on E2E
2025-02-05 22:02:21 +01:00
Carlos Eduardo Arango Gutierrez
bbdb13c47d [no-relnote] Fix flag name on E2E
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-02-05 20:29:27 +01:00
Evan Lezar
03152dba8d Allow cdi mode to work with --gpus flag
This changes ensures that the cdi modifier also removes the NVIDIA
Container Runtime Hook from the incoming spec. This aligns with what is
done for CSV modifications and prevents an error when starting the
container.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-05 19:01:43 +01:00
Evan Lezar
cf026dce9a [no-relnote] Remove duplicate test case
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-05 19:01:43 +01:00
Evan Lezar
073fb138d7 Merge pull request #900 from elezar/fix-make-target
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos7-aarch64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos7-x86_64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos8-ppc64le) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
image / image (ubi8, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
[no-relnote] Fix e2e test make target
2025-02-05 19:01:20 +01:00
Evan Lezar
948bc113f0 [no-relnote] Fix e2e test make target
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-05 19:00:40 +01:00
Evan Lezar
b457247a0c Merge pull request #895 from ArangoGutierrez/reg_test03
Add E2E GitHub Action for Container Toolkit
2025-02-05 17:51:18 +01:00
Evan Lezar
bae4b3ebd3 [no-relnote] Wait on image
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-05 17:28:29 +01:00
Evan Lezar
3a06066f6b [no-relnote] Simplify multi-arch checks
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-05 16:46:29 +01:00
Evan Lezar
c0292f5048 [no-relnote] Use nvcr.io/nvidia/cuda
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-05 16:26:09 +01:00
Carlos Eduardo Arango Gutierrez
cd8937bc5b Add E2E GitHub Action for Container Toolkit
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-02-05 16:22:14 +01:00
Evan Lezar
cec3445318 Merge pull request #897 from elezar/fix-copy-prs
Fix detection of PRs
2025-02-05 16:21:44 +01:00
Evan Lezar
f830653738 [no-relnote] Remove unused LABEL_IMAGE_SOURCE
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-05 16:13:16 +01:00
Evan Lezar
6e9ff446c8 [no-relnote] Use github.ref_name to detect PRs
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-05 16:13:16 +01:00
Evan Lezar
9a07de0ee8 Merge pull request #896 from elezar/onboard-pr-copy-bot
[no-relnote] Enable pr-copy-bot
2025-02-05 15:55:49 +01:00
Evan Lezar
517873e97d [no-relnote] Enable pr-copy-bot
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-05 15:55:12 +01:00
dependabot[bot]
49b45693ee Bump golang.org/x/mod from 0.22.0 to 0.23.0
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.22.0 to 0.23.0.
- [Commits](https://github.com/golang/mod/compare/v0.22.0...v0.23.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-05 08:21:37 +00:00
Carlos Eduardo Arango Gutierrez
78d6cdc7f7 Merge pull request #876 from ArangoGutierrez/reg_test02
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
Add remote-test option for E2E
2025-02-04 18:33:34 +01:00
Carlos Eduardo Arango Gutierrez
61640591ba Add remote-test option for E2E
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-02-04 15:05:46 +01:00
Evan Lezar
df4c87b877 Merge pull request #838 from cdesiniotis/enable-cdi-toolkit-container
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
Enable CDI in the container runtime if enabled in the toolkit
2025-02-03 16:37:49 +01:00
Tariq
d6c312956b Merge pull request #887 from NVIDIA/latest-qemu
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
[no-relnote] switch to the newer qemu artifacts image
2025-01-31 10:33:38 -05:00
Tariq Ibrahim
51f765dd71 [no-relnote] switch to the newer qemu artifacts image
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
2025-01-31 09:28:30 -05:00
Evan Lezar
d8cd5438e4 [no-relnote] Add basic cdi-enabled tests
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-29 15:37:32 +01:00
Evan Lezar
5ed25bb375 [no-relnote] Add unit test for installer command
This change adds a basic unit test for the nvidia-ckt-installer command.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-29 15:37:32 +01:00
Evan Lezar
a7786d4d41 Enable CDI in runtime if CDI_ENABLED is set
This change also enables CDI in the configured runtime when the toolkit
is installed with CDI enabled.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-29 15:37:32 +01:00
Evan Lezar
be1ac24f2a Merge pull request #882 from NVIDIA/dependabot/docker/deployments/container/main/nvidia/cuda-12.8.0-base-ubuntu20.04
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Bump nvidia/cuda from 12.6.3-base-ubuntu20.04 to 12.8.0-base-ubuntu20.04 in /deployments/container
2025-01-29 14:11:47 +01:00
dependabot[bot]
dd86f598b5 Bump nvidia/cuda in /deployments/container
Bumps nvidia/cuda from 12.6.3-base-ubuntu20.04 to 12.8.0-base-ubuntu20.04.

---
updated-dependencies:
- dependency-name: nvidia/cuda
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-01-28 08:57:31 +00:00
Evan Lezar
2b417c1a9a Fix overwriting docker feature flags
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-27 15:52:08 +01:00
Christopher Desiniotis
e89be14c86 Add option in toolkit container to enable CDI in runtime
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2025-01-27 15:52:08 +01:00
Evan Lezar
f625242ed6 Remove Set from engine config API
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-27 15:52:08 +01:00
Christopher Desiniotis
df73db7e1e Add EnableCDI() method to engine.Interface
This change adds an EnableCDI method to the container engine config files and
Updates the 'nvidia-ctk runtime configure' command to use this new method.

Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2025-01-27 15:51:23 +01:00
Evan Lezar
89f33bdf71 Merge pull request #881 from elezar/add-imex-binaries
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Add IMEX binaries to CDI discovery
2025-01-27 13:27:58 +01:00
Evan Lezar
6834d5e0a4 Merge pull request #879 from NVIDIA/dependabot/submodules/main/third_party/libnvidia-container-95d3e86
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Bump third_party/libnvidia-container from `f23e5e5` to `95d3e86`
2025-01-24 18:30:23 +01:00
Evan Lezar
c91a1b1dc8 Add IMEX binaries to CDI discovery
This change adds the nvidia-imex and nivdia-imex-ctl binaries to
the list of driver binaries that are searched when using CDI.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-24 14:46:31 +01:00
dependabot[bot]
96d91df78e Bump third_party/libnvidia-container from f23e5e5 to 95d3e86
Bumps [third_party/libnvidia-container](https://github.com/NVIDIA/libnvidia-container) from `f23e5e5` to `95d3e86`.
- [Release notes](https://github.com/NVIDIA/libnvidia-container/releases)
- [Commits](f23e5e55ea...95d3e86522)

---
updated-dependencies:
- dependency-name: third_party/libnvidia-container
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-01-24 09:05:17 +00:00
Carlos Eduardo Arango Gutierrez
a990860bfa Merge pull request #873 from ArangoGutierrez/tests_rename
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Rename test folder to tests
2025-01-23 11:54:59 +01:00
Carlos Eduardo Arango Gutierrez
bf9d618ff2 Rename test folder to tests
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-23 11:46:14 +01:00
Evan Lezar
7ae5c2901f Merge commit from fork
Disable mounting of compat libs from container by default
2025-01-23 10:56:32 +01:00
Evan Lezar
6b236746ce Bump libnvidia-container to f23e5e55
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-23 10:51:32 +01:00
Evan Lezar
ed3b52eb8d Add allow-cuda-compat-libs-from-container feature flag
This change adds an allow-cuda-compat-libs-from-container feature flag
to the NVIDIA Container Toolkit config. This allows a user to opt-in
to the previous default behaviour of overriding certain driver
libraries with CUDA compat libraries from the container.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-22 17:34:20 +01:00
Evan Lezar
1176430278 Disable mounting of compat libs from container
This change passes the --no-cntlibs argument to the nvidia-container-cli
from the nvidia-container-runtime-hook to disable overwriting host
drivers with the compat libs from a container being started.

Note that this may be a breaking change for some applications.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-22 17:34:20 +01:00
Evan Lezar
c22f3bd56c Merge pull request #765 from elezar/use-logger-in-toolkit-install
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
Simplify standalone installer
2025-01-22 14:42:50 +01:00
Evan Lezar
6375e832ff Merge pull request #874 from elezar/skip-graphics-for-csv
Skip graphics modifier in CSV mode
2025-01-22 14:37:07 +01:00
Evan Lezar
991b9c222f Skip graphics modifier in CSV mode
In CSV mode the CSV files at /etc/nvidia-container-runtime/host-files-for-container.d/
should be the source of truth for container modifications. This change skips graphics
modifications to a container. This prevents conflicts when handling files such as
vulkan icd files which are already defined in the CSV file.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-22 13:58:31 +01:00
Evan Lezar
fdad3927b4 [no-relnote] Refactor oci spec modifier list
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-22 13:58:31 +01:00
Evan Lezar
6bd292eff8 [no-relnote] Move tools/container to cmd/nvidia-ctk-installer
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-21 14:28:46 +01:00
Evan Lezar
9429ce039b [no-relnote] Rename run.go to main.go
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-21 14:28:46 +01:00
Evan Lezar
d953bbb977 Move nvidia-toolkit to nvidia-ctk-installer
This change moves the containerized installer from nvidia-toolkit to
cmd/nvidia-ctk-installer to allow for its use in CI.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-21 14:28:44 +01:00
Evan Lezar
5cbf3f82d9 [no-relnote] Move fileinstaller to separate file
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-21 14:27:21 +01:00
Evan Lezar
69375d7889 [no-relnote] Use logger in toolkit installation
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-21 14:27:21 +01:00
Evan Lezar
9753096398 [no-relnote] Add app struct for nvidia-toolkit
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-21 14:27:21 +01:00
Evan Lezar
76edd1d7ca [no-relnote] Remove unused TryDelete function
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-21 14:27:21 +01:00
Evan Lezar
a6476193c8 [no-relnote] Merge verifyFlags and validateFlags
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-21 14:27:21 +01:00
Evan Lezar
d75b1adeda Merge pull request #868 from elezar/docker-file-fixes
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
[no-relnote] Fix Dockerfile lint issues
2025-01-20 14:21:22 +01:00
Evan Lezar
9aa2f67a22 Merge pull request #866 from NVIDIA/dependabot/go_modules/main/github.com/urfave/cli/v2-2.27.5
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Bump github.com/urfave/cli/v2 from 2.27.4 to 2.27.5
2025-01-16 14:19:17 +01:00
Evan Lezar
b89e7ca849 Merge pull request #865 from NVIDIA/dependabot/go_modules/main/github.com/NVIDIA/go-nvlib-0.7.0
Bump github.com/NVIDIA/go-nvlib from 0.6.1 to 0.7.0
2025-01-16 14:18:55 +01:00
Evan Lezar
7c5ed8157b [no-relnote] Fix Dockerfile lint issues
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-16 14:17:39 +01:00
dependabot[bot]
a401e4a5a6 Bump github.com/urfave/cli/v2 from 2.27.4 to 2.27.5
Bumps [github.com/urfave/cli/v2](https://github.com/urfave/cli) from 2.27.4 to 2.27.5.
- [Release notes](https://github.com/urfave/cli/releases)
- [Changelog](https://github.com/urfave/cli/blob/main/docs/CHANGELOG.md)
- [Commits](https://github.com/urfave/cli/compare/v2.27.4...v2.27.5)

---
updated-dependencies:
- dependency-name: github.com/urfave/cli/v2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-01-16 08:25:45 +00:00
dependabot[bot]
2bc4201205 Bump github.com/NVIDIA/go-nvlib from 0.6.1 to 0.7.0
Bumps [github.com/NVIDIA/go-nvlib](https://github.com/NVIDIA/go-nvlib) from 0.6.1 to 0.7.0.
- [Release notes](https://github.com/NVIDIA/go-nvlib/releases)
- [Commits](https://github.com/NVIDIA/go-nvlib/compare/v0.6.1...v0.7.0)

---
updated-dependencies:
- dependency-name: github.com/NVIDIA/go-nvlib
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-01-16 08:25:38 +00:00
Carlos Eduardo Arango Gutierrez
cb52e779ba Merge pull request #860 from ArangoGutierrez/reg_test01
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Add basic integration tests for docker runtime
2025-01-15 15:59:47 +01:00
Carlos Eduardo Arango Gutierrez
8cc672bed9 Automated regression testing for the NVIDIA Container Toolkit
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-01-15 14:39:44 +01:00
Evan Lezar
be001d938c Merge pull request #863 from elezar/sort-feature-flags
[no-relnote] Sort feature flags
2025-01-15 13:33:34 +01:00
Evan Lezar
d584f9a40b [no-relnote] Sort feature flags
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-15 13:27:14 +01:00
Evan Lezar
11acbe7ca2 Merge pull request #805 from alam0rt/add-v3-containerd-config
Add support for containerd v3 configs
2025-01-15 10:46:30 +01:00
Sam Lockart
95bda3ad72 Add support for containerd version 3 config
This change adds support for containerd configs with version=3.
From the perspective of the runtime configuration the contents of the
config are the same. This means that we just have to load the new
version and ensure that this is propagated to the generated config.

Note that v3 config also requires a switch to the 'io.containerd.cri.v1.runtime'
CRI runtime plugin. See:
https://github.com/containerd/containerd/blob/v2.0.0/docs/PLUGINS.md
https://github.com/containerd/containerd/issues/10132

Note that we still use a default config of version=2 since we need to
ensure compatibility with older containerd versions (1.6.x and 1.7.x).

Signed-off-by: Sam Lockart <sam.lockart@zendesk.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2025-01-15 10:33:43 +01:00
Evan Lezar
6e4b0632c5 Merge pull request #861 from elezar/remove-fsnotify
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
Remove watch option from create-dev-char-symlinks
2025-01-14 20:26:20 +01:00
Evan Lezar
e4be26ff55 Merge pull request #853 from NVIDIA/dependabot/go_modules/main/golang.org/x/mod-0.22.0
Bump golang.org/x/mod from 0.20.0 to 0.22.0
2025-01-14 18:34:59 +01:00
Evan Lezar
8e6b57b38a Remove watch option from create-dev-char-symlinks
This removes the untested watch option from the
nvidia-ctk system create-dev-char-symlinks command.

This also removes the direct dependency on fsnotify.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-14 18:34:14 +01:00
Evan Lezar
c935779693 Merge pull request #855 from NVIDIA/dependabot/go_modules/main/github.com/NVIDIA/go-nvml-0.12.4-1
Bump github.com/NVIDIA/go-nvml from 0.12.4-0 to 0.12.4-1
2025-01-14 18:28:09 +01:00
Evan Lezar
4b43a29957 Merge pull request #844 from NVIDIA/dependabot/go_modules/deployments/devel/main/github.com/golangci/golangci-lint-1.63.4
Bump github.com/golangci/golangci-lint from 1.62.2 to 1.63.4 in /deployments/devel
2025-01-14 18:27:36 +01:00
Evan Lezar
b3b0d107de Merge pull request #851 from NVIDIA/dependabot/go_modules/main/github.com/stretchr/testify-1.10.0
Bump github.com/stretchr/testify from 1.9.0 to 1.10.0
2025-01-14 18:27:00 +01:00
Evan Lezar
d22fca51b8 Bump go version in go.mod to go 1.22.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-14 18:25:47 +01:00
dependabot[bot]
adad9b9c92 Bump golang.org/x/mod from 0.20.0 to 0.22.0
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.20.0 to 0.22.0.
- [Commits](https://github.com/golang/mod/compare/v0.20.0...v0.22.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-01-14 18:18:58 +01:00
Evan Lezar
b6d360fbbe Merge pull request #847 from mihalicyn/fix_custom_config_path_handling
Properly pass configSearchPaths to a Driver constructor
2025-01-14 16:03:40 +01:00
Evan Lezar
5699ec1cff Merge pull request #807 from elezar/add-cdi-imex-channels
Add imex mode to CDI spec generation
2025-01-14 16:03:17 +01:00
Evan Lezar
ec182705f1 Add string TOML source
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-14 10:35:29 +01:00
Evan Lezar
7598fe14bc Improve the implementation for UseLegacyConfig
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-14 10:35:29 +01:00
Evan Lezar
4fc181a418 Merge pull request #821 from NVIDIA/dependabot/docker/deployments/devel/main/golang-1.23.4
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Bump golang from 1.23.2 to 1.23.4 in /deployments/devel
2025-01-10 13:42:31 +01:00
dependabot[bot]
4bf5894833 Bump github.com/NVIDIA/go-nvml from 0.12.4-0 to 0.12.4-1
Bumps [github.com/NVIDIA/go-nvml](https://github.com/NVIDIA/go-nvml) from 0.12.4-0 to 0.12.4-1.
- [Release notes](https://github.com/NVIDIA/go-nvml/releases)
- [Commits](https://github.com/NVIDIA/go-nvml/compare/v0.12.4-0...v0.12.4-1)

---
updated-dependencies:
- dependency-name: github.com/NVIDIA/go-nvml
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-01-10 09:01:45 +00:00
dependabot[bot]
272f891204 Bump github.com/stretchr/testify from 1.9.0 to 1.10.0
Bumps [github.com/stretchr/testify](https://github.com/stretchr/testify) from 1.9.0 to 1.10.0.
- [Release notes](https://github.com/stretchr/testify/releases)
- [Commits](https://github.com/stretchr/testify/compare/v1.9.0...v1.10.0)

---
updated-dependencies:
- dependency-name: github.com/stretchr/testify
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-01-09 14:16:21 +00:00
Evan Lezar
4b352e6bac Merge pull request #845 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.29.0
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
Bump golang.org/x/sys from 0.26.0 to 0.29.0
2025-01-09 09:28:17 +01:00
Evan Lezar
178369eb8e Merge pull request #824 from elezar/post-release
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
Perform some cleanups following the v1.17.3 release
2025-01-08 11:24:09 +01:00
Alexander Mikhalitsyn
3c012499db Properly pass configSearchPaths to a Driver constructor
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-01-07 19:11:01 +01:00
dependabot[bot]
6e883bf10a Bump golang.org/x/sys from 0.26.0 to 0.29.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.26.0 to 0.29.0.
- [Commits](https://github.com/golang/sys/compare/v0.26.0...v0.29.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-01-05 08:44:56 +00:00
dependabot[bot]
cc61f186a8 Bump github.com/golangci/golangci-lint in /deployments/devel
Bumps [github.com/golangci/golangci-lint](https://github.com/golangci/golangci-lint) from 1.62.2 to 1.63.4.
- [Release notes](https://github.com/golangci/golangci-lint/releases)
- [Changelog](https://github.com/golangci/golangci-lint/blob/master/CHANGELOG.md)
- [Commits](https://github.com/golangci/golangci-lint/compare/v1.62.2...v1.63.4)

---
updated-dependencies:
- dependency-name: github.com/golangci/golangci-lint
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-01-05 08:31:49 +00:00
Evan Lezar
aac0a62528 Merge pull request #826 from elezar/fix-test
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Fix create-device-node test when devices exist
2024-12-06 14:07:59 +01:00
Evan Lezar
2529aebd6c Fix create-device-node test when devices exist
This changes fixes the TestCreateControlDevices test on systems
where device nodes exist.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-12-06 14:05:51 +01:00
Evan Lezar
784917a0d9 [no-relnote] Fix archive-packages script
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-12-04 12:50:35 +01:00
Evan Lezar
1330467652 [no-relnote] Update dependabot
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-12-04 12:50:35 +01:00
Evan Lezar
22035d4561 [no-relnote] Update changelog
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-12-04 12:50:35 +01:00
Evan Lezar
9318aaf275 Bump version for v1.17.3 release
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-12-04 12:37:16 +01:00
Evan Lezar
70261d84f1 Bump version for v1.17.2 release
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-12-04 12:36:39 +01:00
Evan Lezar
f8f506fd85 Bump version for v1.17.1 release
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-12-04 12:36:33 +01:00
Evan Lezar
b0a5fb9f86 Merge pull request #822 from NVIDIA/dependabot/submodules/main/third_party/libnvidia-container-16f37fc
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Bump third_party/libnvidia-container from `63d366e` to `16f37fc`
2024-12-04 10:40:54 +01:00
dependabot[bot]
83790dbccd Bump third_party/libnvidia-container from 63d366e to 16f37fc
Bumps [third_party/libnvidia-container](https://github.com/NVIDIA/libnvidia-container) from `63d366e` to `16f37fc`.
- [Release notes](https://github.com/NVIDIA/libnvidia-container/releases)
- [Commits](63d366ee3b...16f37fcafc)

---
updated-dependencies:
- dependency-name: third_party/libnvidia-container
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-12-04 09:39:44 +00:00
dependabot[bot]
7ceae92ac1 Bump golang from 1.23.2 to 1.23.4 in /deployments/devel
Bumps golang from 1.23.2 to 1.23.4.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-12-04 08:26:02 +00:00
Evan Lezar
8dfa61338b Merge pull request #818 from NVIDIA/dependabot/go_modules/deployments/devel/main/github.com/golangci/golangci-lint-1.62.2
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
Bump github.com/golangci/golangci-lint from 1.61.0 to 1.62.2 in /deployments/devel
2024-12-03 20:19:12 +01:00
Evan Lezar
484cae273c Merge pull request #820 from NVIDIA/dependabot/docker/deployments/container/main/nvidia/cuda-12.6.3-base-ubuntu20.04
Bump nvidia/cuda from 12.6.2-base-ubuntu20.04 to 12.6.3-base-ubuntu20.04 in /deployments/container
2024-12-03 20:18:26 +01:00
dependabot[bot]
6e67bb247b Bump nvidia/cuda in /deployments/container
Bumps nvidia/cuda from 12.6.2-base-ubuntu20.04 to 12.6.3-base-ubuntu20.04.

---
updated-dependencies:
- dependency-name: nvidia/cuda
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-12-03 09:03:05 +00:00
dependabot[bot]
8bd7dd7060 Bump github.com/golangci/golangci-lint in /deployments/devel
Bumps [github.com/golangci/golangci-lint](https://github.com/golangci/golangci-lint) from 1.61.0 to 1.62.2.
- [Release notes](https://github.com/golangci/golangci-lint/releases)
- [Changelog](https://github.com/golangci/golangci-lint/blob/master/CHANGELOG.md)
- [Commits](https://github.com/golangci/golangci-lint/compare/v1.61.0...v1.62.2)

---
updated-dependencies:
- dependency-name: github.com/golangci/golangci-lint
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-11-29 16:49:15 +00:00
Evan Lezar
e25c052aa6 Merge pull request #810 from NVIDIA/dependabot/go_modules/deployments/devel/main/github.com/matryer/moq-0.5.1
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Bump github.com/matryer/moq from 0.5.0 to 0.5.1 in /deployments/devel
2024-11-29 17:47:36 +01:00
Evan Lezar
21ba9932d0 Merge pull request #778 from zhanluxianshen/fix-fsnotify-logic
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
Fix fsnotify.Remove logic function.
2024-11-29 14:17:03 +01:00
Evan Lezar
dcd65a6410 Merge pull request #690 from elezar/ignore-ldconfig-option
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Only allow host-relative LDConfig paths
2024-11-26 13:35:33 +01:00
Evan Lezar
8603d605aa Add imex mode to CDI spec generation
This change adds a imex mode to CDI spec generation. This mode detected
generates CDI specifications for existing IMEX channels. By default these
devices have the fully qualified CDI device names:

nvidia.com/imex-channel=<ID>

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-11-25 13:46:43 +01:00
Evan Lezar
e1efa28fe6 [no-relnote] Refactor CDI mode handling
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-11-25 13:46:34 +01:00
Evan Lezar
75a934e11c [no-relnote] Fix error string
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-11-25 13:46:34 +01:00
dependabot[bot]
43aca46f10 Bump github.com/matryer/moq from 0.5.0 to 0.5.1 in /deployments/devel
Bumps [github.com/matryer/moq](https://github.com/matryer/moq) from 0.5.0 to 0.5.1.
- [Release notes](https://github.com/matryer/moq/releases)
- [Changelog](https://github.com/matryer/moq/blob/main/.goreleaser.yml)
- [Commits](https://github.com/matryer/moq/compare/v0.5.0...v0.5.1)

---
updated-dependencies:
- dependency-name: github.com/matryer/moq
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-11-24 08:26:22 +00:00
Evan Lezar
00f1d5a627 Only allow host-relative LDConfig paths
This change only allows host-relative LDConfig paths.

An allow-ldconfig-from-container feature flag is added to allow for this
the default behaviour to be changed.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-11-22 14:25:17 +01:00
Evan Lezar
c0764366d9 [no-relnote] Refactor config handling for hook
This change removes indirect calls to get the default config
from the nvidia-container-runtime-hook.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-11-22 13:19:05 +01:00
Evan Lezar
1afada7de5 [no-relnote] Remove unused code
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-11-22 13:11:04 +01:00
Evan Lezar
5c3ffc2fba Merge pull request #799 from elezar/fix-legacy-nvidia-imex-channels
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Failing after 51m27s
Golang / check (push) Failing after 6m27s
Golang / Unit test (push) Failing after 2m23s
Golang / Build (push) Failing after 2m24s
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Failing after 6m40s
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Failing after 3m9s
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Failing after 2m38s
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Failing after 2m51s
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Failing after 2m32s
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Failing after 2m56s
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been skipped
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been skipped
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been skipped
Fix NVIDIA_IMEX_CHANNELS handling on legacy images
2024-11-15 10:55:45 -07:00
Evan Lezar
e188090a92 Fix NVIDIA_IMEX_CHANNELS handling on legacy images
For legacy images (images with a CUDA_VERSION set but no CUDA_REQUIRES set), the
default behaviour for device envvars is to treat non-existence as all.

This change ensures that the NVIDIA_IMEX_CHANNELS envvar is not treated in the same
way, instead returning no devices if the envvar is not set.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-11-14 13:26:30 -07:00
Evan Lezar
1995925a7d Merge pull request #783 from elezar/fix-config-file-path
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Fix bug in default config file path
2024-11-08 15:57:42 -08:00
Evan Lezar
46fc47c67a Fix bug in default config file path
This fix ensures that the default config file path for the nvidia-ctk runtime configure
command is set consistently.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-11-08 15:38:28 -08:00
Evan Lezar
b142091234 Merge pull request #777 from elezar/add-config-fallback
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
Fallback to file for runtime config
2024-11-08 09:09:18 -08:00
Evan Lezar
e9b8ad95df Fallback to file for runtime config
This change ensures that we fall back to the previous behaviour of
reading the existing config from the specified config file if extracting
the current config from the command line fails. This fixes use cases where
the containerd / crio executables are not available.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-11-08 08:43:09 -08:00
zhanluxianshen
f77a808105 Fix fsnotify.Remove logic function.
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2024-11-08 22:07:21 +08:00
Evan Lezar
832818084d Merge pull request #773 from elezar/fix-libcuda-symlink
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
Force creation of symlinks in create-symlink hook
2024-11-07 17:41:20 +01:00
Evan Lezar
ab0a4eaa19 Merge pull request #768 from elezar/add-toolkit-install-unit-tests
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
[no-relnote] Add toolkit install unit test
2024-11-06 00:41:55 +01:00
Evan Lezar
0c687be794 [no-relnote] Also validate CDI management spec
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-11-05 14:23:36 -08:00
Evan Lezar
8d869acce5 [no-relnote] Add toolkit install unit test
This change adds basic toolkit installation unit tests. This required
that the source for files be specified when installing to allow for
a testdata folder to be used.

This replaces the currently unused shell-based tests in /test/container.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-11-05 14:23:35 -08:00
Evan Lezar
324096c979 Force symlink creation in create-symlink hook
This change updates the create-symlink hook to be equivalent to
ln -f -s target link

This ensures that links are updated even if they exist in the container
being run.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-11-05 09:39:11 -08:00
Evan Lezar
5bc0315448 Merge pull request #766 from elezar/bump-release-v1.17.0
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Bump version for v1.17.0 release
2024-10-31 10:17:13 +01:00
Evan Lezar
3fb1615d26 [no-relnote] Address lint errors in test
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-31 10:11:58 +01:00
Evan Lezar
9e4696bf7d Bump version for v1.17.0 release
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-31 07:20:10 +01:00
Evan Lezar
8c9d3d8f65 Merge commit from fork
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
Check for valid paths in create-symlinks hook
2024-10-31 07:16:01 +01:00
Evan Lezar
efb18a72ad Merge pull request #762 from elezar/fix-auto-cdi-runtime-mode
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
Fix bug when using just-in-time CDI spec generation
2024-10-30 13:08:26 +01:00
Evan Lezar
75376d3df2 Fix bug when using just-in-time CDI spec generation
This change fixes a bug when using just-in-time CDI spec generation for the
NVIDIA Container Runtime for specific devices (i.e. not 'all').
Instead of unconditionally using the default nvsandboxutils library -- leading
to errors due to undefined symbols -- we check whether the library can be
properly initialised before continuing.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-30 12:20:36 +01:00
Christopher Desiniotis
7e0cd45b1c Check for valid paths in create-symlinks hook
This change updates the create-symlinks hook to always evaluate
link paths in the container's root filesystem. In addition the
executable is updated to return an error if a link could not
be created.

Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2024-10-29 12:16:51 -07:00
Christopher Desiniotis
a04e3ac4f7 Write failing test case for create-symlinks hook
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2024-10-29 12:16:51 -07:00
Christopher Desiniotis
92779e71b3 Handle case where symlink already exists in create-symlinks hook
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2024-10-29 12:16:51 -07:00
Christopher Desiniotis
23f1ba3e93 Add unit tests for create-symlinks hook
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-29 12:16:51 -07:00
Evan Lezar
d0d85a8c5c Always use paths relative to the container root for links
This chagne ensures that we always treat the link path as a path
relative to the container root. Without this change, relative paths
in link paths would result links being created relative to the
current working directory where the hook is executed.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-29 12:16:51 -07:00
Evan Lezar
bfea673d6a [no-relnote] Remove unused hostRoot argument
The hostRoot argument is always empty and not applicable to
how links are specified.

Links are specified by the paths in the container filesystem and as such
the only transform required to change the root is a join of the filepath.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-29 12:16:50 -07:00
Evan Lezar
6a6a3e6055 [no-relnote] Remove redundant changeRoot for link target
Since hostRoot is always the empty string and we are changing the root in the
target path to /, the call to changeRoot is redundant.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-29 12:16:50 -07:00
Evan Lezar
fa59d12973 [no-relnote] Check created outside of create loop
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-29 12:16:49 -07:00
Evan Lezar
d78868cd31 Merge pull request #760 from elezar/bump-release-v1.17.0-rc.2
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Bump version for v1.17.0-rc.2 release
2024-10-28 14:26:53 +01:00
Evan Lezar
74b1e5ea8c Bump version for v1.17.0-rc.2 release
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-28 14:09:12 +01:00
Evan Lezar
88608781b6 Merge pull request #755 from elezar/fix-libcuda-so
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Fix bug where libcuda.so is not found in ldcache
2024-10-24 23:33:12 +02:00
Evan Lezar
fa5a4ac499 Read ldcache at construction instead of on each locate call
This change udpates the ldcache locator to read the ldcache at construction
and use these contents to perform future lookups against. Each of the cache
entries are resolved and lookups return the resolved target.

Assuming a symlink chain: libcuda.so -> libcuda.so.1 -> libcuda.so.VERSION, this
means that libcuda.so.VERION will be returned for any of the following inputs:
libcuda.so, libcuda.so.1, libcudal.so.*.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-24 23:12:58 +02:00
Evan Lezar
9f1bd62c42 [no-relnote] Add failing libcuda locate test
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-24 15:56:53 +02:00
Evan Lezar
9534249936 [no-relnote] Add test for libcuda lookup
This change adds a test for locating libcuda as a driver library.
This includes a failing test on a system where libcuda.so.1 is in
the ldcache, but not at one of the predefined library search paths.

A testdata folder with sample root filesystems is included to test
various combinations.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-24 15:56:25 +02:00
Evan Lezar
e1ea0056b9 Fix bug in sorting of symlink chain
Since we use a map to keep track of the elements of a symlink chain
the construction of the final list of located elements is not stable.
This change constructs the output as this is being discovered and as
such maintains the original ordering.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-24 15:56:25 +02:00
Evan Lezar
c802c3089c Remove unsupported print-ldcache command
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-24 15:56:25 +02:00
Tariq
771ac6b88a Merge pull request #756 from NVIDIA/cli-source-fallback
[TOML ConfigSource] add support for executing fallback CLI commands
2024-10-23 14:31:45 -07:00
Tariq Ibrahim
0f7aba9c3c [TOML ConfigSource] add support for executing fallback CLI commands
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Co-authored-by: Evan Lezar <elezar@nvidia.com>
2024-10-23 14:26:17 -07:00
Tariq
3c07ea0b17 Merge pull request #726 from NVIDIA/dependabot/docker/deployments/devel/main/golang-1.23.2
Bump golang from 1.23.1 to 1.23.2 in /deployments/devel
2024-10-21 10:11:21 -07:00
Evan Lezar
183dff9161 Merge pull request #750 from elezar/remove-csv-filenames-support
Remove csv filenames support
2024-10-21 11:10:27 +02:00
Evan Lezar
5e3e91a010 [no-relnote] Minor cleanup in create-symlinks
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-18 16:27:38 +02:00
Evan Lezar
dc0e191093 Remove csv-filename support from create-symlinks
This change removes support for specifying csv-filenames when
calling the create-symlinks hook. This is no longer required
as tegra-based systems generate hooks with `--link` arguments.

This also allows the hook to better serve as a reference implementation
for upstream projects wanting to implement a set of standard CDI hooks.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-18 16:27:27 +02:00
Evan Lezar
8a6c1944a5 Merge pull request #749 from elezar/bump-release-v1.17.0-rc.1
Bump version for v1.17.0-rc.1 release
2024-10-18 15:35:34 +02:00
Evan Lezar
5d057dce66 Bump version for v1.17.0-rc.1 release
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-18 15:33:12 +02:00
Evan Lezar
5931136879 Merge pull request #748 from elezar/fix-operator
Add aliases for runtime-specific envvars
2024-10-18 14:49:32 +02:00
Evan Lezar
1145ce2283 Add aliases for runtime-specific envvars
This change ensures that the toolkit works with older
versions of the GPU Operator where runtime-specific envvars are
used to set options such as the config file location.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-18 12:16:50 +02:00
Evan Lezar
38790c5df0 Merge pull request #747 from NVIDIA/dependabot/submodules/main/third_party/libnvidia-container-63d366e
Bump third_party/libnvidia-container from `921e2f3` to `63d366e`
2024-10-17 18:18:53 +02:00
Evan Lezar
e5175c270e Merge pull request #745 from elezar/fix-symlink-logging
Fix symlink resolution error message
2024-10-17 18:04:54 +02:00
dependabot[bot]
d18a2b6fc7 Bump third_party/libnvidia-container from 921e2f3 to 63d366e
Bumps [third_party/libnvidia-container](https://github.com/NVIDIA/libnvidia-container) from `921e2f3` to `63d366e`.
- [Release notes](https://github.com/NVIDIA/libnvidia-container/releases)
- [Commits](921e2f3197...63d366ee3b)

---
updated-dependencies:
- dependency-name: third_party/libnvidia-container
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-10-17 16:02:52 +00:00
Evan Lezar
2987c4d670 Merge pull request #740 from elezar/imex-by-volume-mount
Allow IMEX channel requests by volume mount
2024-10-17 17:56:12 +02:00
Evan Lezar
2e6712d2bc Allow IMEX channels to be requested as volume mounts
This change allows IMEX channels to be requested using the
volume mount mechanism.

A mount from /dev/null to /var/run/nvidia-container-devices/imex/{{ .ChannelID }}
is equivalent to including {{ .ChannelID }} in the NVIDIA_IMEX_CHANNELS
envvironment variables.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 16:54:29 +02:00
Evan Lezar
92df542f2f [no-relnote] Use image.CUDA to extract visible devices
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 16:53:17 +02:00
Evan Lezar
1991b3ef2a [no-relnote] Use string slice for devices in hook
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 16:53:17 +02:00
Evan Lezar
cdf39fbad3 [no-relnote] Use symlinks.Resolve in hook
This change removes duplicate logic from the create-symlinks hook
and uses symlinks.Resolve instead.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 15:47:13 +02:00
Evan Lezar
c30ca0fdc3 Fix typo in error message
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 15:46:49 +02:00
Evan Lezar
b077e2648d Merge pull request #741 from elezar/imex-default
Add disableIMEXChannelCreation feature flag
2024-10-17 15:26:21 +02:00
Evan Lezar
457d71c170 Add disable-imex-channel-creation feature flag
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 14:26:24 +02:00
Evan Lezar
bc9180b59d Expose opt-in features in toolkit-container
This change enables opt-in (off-by-default) features to be opted into.
These features can be toggled by name by specifying the (repeated)
--opt-in-features command line argument or as a comma-separated list
in the NVIDIA_CONTAINER_TOOLKIT_OPT_IN_FEATURES environment variable.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 14:26:24 +02:00
Evan Lezar
ec8dfaf779 Merge pull request #743 from elezar/remove-opt-in-features
Remove ability to set per-container features in the config file
2024-10-17 13:46:23 +02:00
Evan Lezar
c129122da6 Merge pull request #742 from NVIDIA/dependabot/docker/deployments/container/main/nvidia/cuda-12.6.2-base-ubuntu20.04
Bump nvidia/cuda from 12.6.1-base-ubuntu20.04 to 12.6.2-base-ubuntu20.04 in /deployments/container
2024-10-17 11:49:05 +02:00
Evan Lezar
0abf800000 Merge pull request #744 from elezar/fix-script
[no-relnote] Fix typo in script
2024-10-16 15:32:09 +02:00
Evan Lezar
1d9d0acf7d [no-relnote] Remove feature flag for per-container features
This change REMOVES the ability to set opt-in features
(e.g. GDS, MOFED, GDRCOPY) in the config file. The existing
per-container envvars are unaffected.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-16 15:30:31 +02:00
Evan Lezar
17f14278a9 [no-relnote] Fix typo in script
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-16 10:53:45 +02:00
dependabot[bot]
1fa5bbf351 Bump nvidia/cuda in /deployments/container
Bumps nvidia/cuda from 12.6.1-base-ubuntu20.04 to 12.6.2-base-ubuntu20.04.

---
updated-dependencies:
- dependency-name: nvidia/cuda
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-10-15 09:07:59 +00:00
Evan Lezar
f794d09df1 Merge pull request #729 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.26.0
Bump golang.org/x/sys from 0.25.0 to 0.26.0
2024-10-11 16:16:15 +02:00
Evan Lezar
17a2377ad5 Merge pull request #734 from NVIDIA/minor-cleanup
minor cleanup and improvements
2024-10-11 16:15:19 +02:00
Tariq Ibrahim
b90ee5d100 [no-relnote] minor cleanup and improvements
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-11 16:14:41 +02:00
Evan Lezar
1ef3f4048f Merge pull request #733 from elezar/add-imex-channels-to-management-spec
Add imex channels to management CDI spec
2024-10-11 15:28:50 +02:00
Evan Lezar
7fb31bd1dc Merge pull request #732 from elezar/add-z-lazy
Add -z,lazy to LDFLAGS
2024-10-11 15:20:30 +02:00
Evan Lezar
e2fe591535 Add -z,lazy to LDFLAGS
This fixes undefined symbol errors on platforms where -z,lazy may
not be the default.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-11 15:20:06 +02:00
Evan Lezar
adf3708d0b Add imex channels to management CDI spec
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-10 14:38:33 +02:00
Evan Lezar
a06d838b1c Merge pull request #686 from NVIDIA/get-config-from-cmdline
Fetch current container runtime config
2024-10-10 11:58:08 +02:00
Tariq Ibrahim
f477dc0df1 fetch current container runtime config through the command line
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>

add default runtime binary path to runtimes field of toolkit config toml

Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>

[no-relnote] Get low-level runtimes consistently

We ensure that we use the same low-level runtimes regardless
of the runtime engine being configured. This ensures consistent
behaviour.

Signed-off-by: Evan Lezar <elezar@nvidia.com>

Co-authored-by: Evan Lezar <elezar@nvidia.com>

address review comment

Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
2024-10-10 01:13:20 -07:00
dependabot[bot]
879bb9ffd5 Bump golang.org/x/sys from 0.25.0 to 0.26.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.25.0 to 0.26.0.
- [Commits](https://github.com/golang/sys/compare/v0.25.0...v0.26.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-10-06 08:57:11 +00:00
Tariq
4604e3b6c8 Merge pull request #725 from elezar/fix-nvsandboxutils
Ensure that nvsandboxutils is available for version
2024-10-04 04:58:03 +08:00
dependabot[bot]
a9ca6995f7 Bump golang from 1.23.1 to 1.23.2 in /deployments/devel
Bumps golang from 1.23.1 to 1.23.2.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-10-02 08:37:23 +00:00
Evan Lezar
7cd2aef0d8 Ensure that nvsandboxutils is available for version
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-02 09:37:05 +02:00
Evan Lezar
19482dac6f Merge pull request #715 from elezar/add-libcuda-so-symlink
Align driver symlinks with libnvidia-container
2024-10-01 18:16:36 +02:00
Evan Lezar
78c4ca8a12 Merge pull request #693 from NVIDIA/dependabot/go_modules/deployments/devel/main/github.com/golangci/golangci-lint-1.61.0
Bump github.com/golangci/golangci-lint from 1.60.1 to 1.61.0 in /deployments/devel
2024-10-01 11:40:31 +02:00
Evan Lezar
b12bdfc52a Merge pull request #691 from elezar/fix-codeql
Add an explicit CodeQL workflow to this repostitory
2024-10-01 11:39:34 +02:00
Evan Lezar
82ae2e615a Add creation of select driver symlinks to CDI spec
This change aligns the creation of symlinks under CDI with
the implementation in libnvidia-container. If the driver libraries
are present, the following symlinks are created:

* {{ .LibRoot }}/libcuda.so -> libcuda.so.1
* {{ .LibRoot }}/libnvidia-opticalflow.so -> libnvidia-opticalflow.so.1
* {{ .LibRoot }}/libGLX_indirect.so.0 -> libGLX_nvidia.so.{{ .Version }}

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-01 11:34:58 +02:00
Tariq
4f440dedda Merge pull request #722 from tariq1890/use-go-api-for-toolkit-install-rebase 2024-10-01 07:41:28 +08:00
Evan Lezar
3ee678f4f6 Convert crio to runtime package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-30 14:40:30 -07:00
Evan Lezar
103375e504 Convert containerd to runtime package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-30 14:39:52 -07:00
Evan Lezar
5bedbc2b50 Convert docker to runtime package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-30 14:36:35 -07:00
Evan Lezar
94337b7427 Add runtime package for runtime setup
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-30 14:36:35 -07:00
Evan Lezar
046a05921f Convert toolkit to go package
This change converts the toolkit installation logic to a go package
and invokes this installation over the go API instead of starting
this executable.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-30 14:36:35 -07:00
Tariq
6ca2700a17 Merge pull request #721 from NVIDIA/devel-check-modules
add go modules check for deployments/devel
2024-10-01 05:35:45 +08:00
Tariq Ibrahim
0d626cfbb7 add go modules check for deployments/devel
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
2024-09-30 12:13:56 -07:00
Tariq
10bafd1d09 Merge pull request #643 from elezar/refactor-toml-source
Refactor handling of TOML config files for runtimes
2024-10-01 00:59:52 +08:00
Evan Lezar
bf2bdfd35e Refactor Toml config handling
This change refactors the toml config file handlig for runtimes
such as containerd or crio. A toml.Loader is introduced that
encapsulates loading the required file.

This can be extended to allow other mechanisms for loading
loading the current config.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-30 14:24:18 +02:00
Evan Lezar
f126877254 Merge pull request #716 from elezar/discover-vdpau-libraries
Also search for driver libraries in vdpau
2024-09-30 11:18:49 +02:00
Evan Lezar
006aebf31e Merge pull request #717 from elezar/fix-libnvidia-allocator-so-1
Skip explicit creation of libnvidia-allocator.so.1 symlink
2024-09-30 11:06:08 +02:00
Evan Lezar
6c5f4eea63 Remove support for config overrides
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-27 13:23:35 +02:00
dependabot[bot]
b0b7c7c9ee Bump github.com/golangci/golangci-lint in /deployments/devel
Bumps [github.com/golangci/golangci-lint](https://github.com/golangci/golangci-lint) from 1.60.1 to 1.61.0.
- [Release notes](https://github.com/golangci/golangci-lint/releases)
- [Changelog](https://github.com/golangci/golangci-lint/blob/master/CHANGELOG.md)
- [Commits](https://github.com/golangci/golangci-lint/compare/v1.60.1...v1.61.0)

---
updated-dependencies:
- dependency-name: github.com/golangci/golangci-lint
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-09-26 14:45:16 +00:00
Evan Lezar
b466270a24 Merge pull request #666 from NVIDIA/dependabot/go_modules/deployments/devel/main/github.com/matryer/moq-0.5.0
Bump github.com/matryer/moq from 0.3.4 to 0.5.0 in /deployments/devel
2024-09-26 16:43:49 +02:00
Evan Lezar
d806f1045b Merge pull request #677 from NVIDIA/dependabot/docker/deployments/container/main/nvidia/cuda-12.6.1-base-ubuntu20.04
Bump nvidia/cuda from 12.6.0-base-ubuntu20.04 to 12.6.1-base-ubuntu20.04 in /deployments/container
2024-09-26 16:43:08 +02:00
Evan Lezar
35ee96ac41 Merge pull request #685 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.25.0
Bump golang.org/x/sys from 0.24.0 to 0.25.0
2024-09-26 16:42:30 +02:00
Evan Lezar
f8141aab27 Skip explicit creation of libnvidia-allocator.so.1 symlink
Since we expect .so.1 symlinks to be created by ldconfig, we don't
explicitly request these. This change removes the creation of
a libnvidia-allocator.so.1 -> libnvidia-allocator.so.RM_VERSION symlink
through the create-symlinks hook.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-26 16:34:39 +02:00
Evan Lezar
98ffe2aa67 Also search for driver libraries in vdpau
This change adds the vdpau subfolder to the paths searched
for driver libraries. This allows the libvdpau_nvidia.so.RM_VERSION
library to also be discovered.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-26 14:42:29 +02:00
Evan Lezar
79c59aeb7f Merge pull request #713 from sananya12/nvsandboxutils-sananya
Fix an incompatible pointer conversion in nvsandboxutils
2024-09-26 13:12:21 +02:00
Sananya Majumder
906531fee3 Fix incompatible pointer conversion
This change adds a safe pointer conversion to fix an
incompatible C pointer conversion, which caused build failures on some
architectures.

Signed-off-by: Sananya Majumder <sananyam@nvidia.com>
2024-09-25 16:40:43 -07:00
Evan Lezar
0e68f60c0b Merge pull request #629 from sananya12/nvsandboxutils-sananya
Add changes for usage of nvsandboxutils
2024-09-25 19:17:05 +02:00
Sananya Majumder
563db0e0be nvsandboxutils: Add usage of GetGpuResource and GetFileContent APIs
This change adds a new discoverer for Sandboxutils to report the file
system paths and associated symbolic links using GetGpuResource and
GetFileContent APIs. Both GPU and MIG devices are supported. If the
Sandboxutils discoverer fails, the NVML discoverer is used to report
the file system information.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Sananya Majumder <sananyam@nvidia.com>
2024-09-24 10:05:14 -07:00
Sananya Majumder
7b770f63c3 nvsandboxutils: Add usage of GetDriverVersion API
This change includes the usage of Sandboxutils GetDriverVersion API to
retrieve the CUDA driver version. If the library is not available on the
system or the API call fails for some other reason, it will fallback to
the NVML API to return the driver version.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Sananya Majumder <sananyam@nvidia.com>
2024-09-24 10:05:11 -07:00
Sananya Majumder
dcbf5bc81f nvsandboxutils: Add implementation for the APIs
This change adds manual wrappers around the generated bindings to make
them into more user-friendly APIs for the caller. Some helper functions
are also added.
The APIs that are currently present in the library and implemented here
are:
nvSandboxUtilsInit
nvSandboxUtilsShutdown
nvSandboxUtilsGetDriverVersion
nvSandboxUtilsGetGpuResource
nvSandboxUtilsGetFileContent

Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Sananya Majumder <sananyam@nvidia.com>
2024-09-24 10:05:09 -07:00
Sananya Majumder
978d439cf8 nvsandboxutils: Add internal bindings
This change adds the internal bindings for Sandboxutils, some of which
have been automatically generated with the help of c-for-go. The format
followed is similar to what is used in go-nvml. These would need to be
regenerated when the header file is modified and new APIs are added.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Sananya Majumder <sananyam@nvidia.com>
2024-09-24 10:05:05 -07:00
Sananya Majumder
aa946f3f59 nvsandboxutils: Add script to generate bindings
This change adds a script and related files to generate the internal
bindings for Sandboxutils library with the help of c-for-go.
This can be used to update the bindings when the header file is modified
with reference to how they are generated with the Makefile in go-nvml.

Run: ./update-bindings.sh

Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Sananya Majumder <sananyam@nvidia.com>
2024-09-24 10:04:48 -07:00
Evan Lezar
a5a5833c14 Merge pull request #710 from elezar/update-changelog
Update CHANGELOG.md for v1.16.2 release
2024-09-24 17:22:29 +02:00
Evan Lezar
8e07d90802 Update CHANGELOG.md for v1.16.2 release
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-24 14:12:24 +02:00
Evan Lezar
2aadbbf22d Merge pull request #709 from elezar/bump-version-v1.16.2
Bump version to v1.16.2
2024-09-20 20:52:54 +02:00
Evan Lezar
88f9414849 Bump version to v1.16.2
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-20 20:50:33 +02:00
Evan Lezar
53c2dc6301 Merge pull request #705 from elezar/remove-libnvidia-contianer-update-check
[no-relnote] Don't check for submodule changes
2024-09-20 20:50:17 +02:00
Evan Lezar
3121663537 Merge pull request #708 from elezar/revert-check-symlink-resolution
Revert "Merge pull request #696 from elezar/check-link-resolution"
2024-09-20 20:50:02 +02:00
Evan Lezar
4b6de8036d Merge pull request #707 from elezar/revert-no-persistenced-flag-by-default
Revert #703 and #694
2024-09-20 20:46:13 +02:00
Evan Lezar
dc2ccdd2fa Revert "Merge pull request #696 from elezar/check-link-resolution"
This reverts commit c490baab63, reversing
changes made to 32486cf1e9.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-20 20:29:42 +02:00
Evan Lezar
5145b0a4b6 Revert "Merge pull request #694 from elezar/add-opt-in-to-sockets"
This reverts commit b061446694, reversing
changes made to c490baab63.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-20 20:26:45 +02:00
Evan Lezar
a819cfdab4 Revert "Merge pull request #703 from elezar/fix-no-persistenced-flag"
This reverts commit c02b144ed4, reversing
changes made to b061446694.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-20 20:26:33 +02:00
Evan Lezar
629db79b4f [no-relnote] Don't check for submodule changes
With the move to dependabot to udpate the libnvidia-container
submodule it is no longer required to have checks in place that
ensure that this is up to date when building.

In addition when re-building old commits in CI, for example we explicitly
do NOT want to have this check.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-20 20:20:36 +02:00
Evan Lezar
8693dd6962 [no-relnote] Add CodeQL workflow
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-19 14:36:45 +02:00
dependabot[bot]
51cc619eab Bump github.com/matryer/moq from 0.3.4 to 0.5.0 in /deployments/devel
Bumps [github.com/matryer/moq](https://github.com/matryer/moq) from 0.3.4 to 0.5.0.
- [Release notes](https://github.com/matryer/moq/releases)
- [Changelog](https://github.com/matryer/moq/blob/main/.goreleaser.yml)
- [Commits](https://github.com/matryer/moq/compare/v0.3.4...v0.5.0)

---
updated-dependencies:
- dependency-name: github.com/matryer/moq
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-09-18 22:45:12 +00:00
Tariq
9b72161b63 Merge pull request #702 from NVIDIA/sync-gomod2 2024-09-19 06:43:55 +08:00
Evan Lezar
d925b6596b Merge pull request #687 from NVIDIA/dependabot/docker/deployments/devel/main/golang-1.23.1
Bump golang from 1.23.0 to 1.23.1 in /deployments/devel
2024-09-18 23:30:30 +02:00
Evan Lezar
c02b144ed4 Merge pull request #703 from elezar/fix-no-persistenced-flag
[no-relnote] Move --no-persistenced flag to after configure
2024-09-18 22:58:43 +02:00
Evan Lezar
16fdef50e6 [no-relnote] Move --no-persistenced flag to after configure
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-18 22:52:25 +02:00
Christopher Desiniotis
b061446694 Merge pull request #694 from elezar/add-opt-in-to-sockets
Skip injection of nvidia-persistenced socket by default
2024-09-18 13:39:28 -07:00
Evan Lezar
9c2476c98d Expose opt-in features in toolkit-container
This change enables opt-in (off-by-default) features to be opted into.
These features can be toggled by name by specifying the (repeated)
--opt-in-feature command line argument or as a comma-separated list
in the NVIDIA_CONTAINER_TOOLKIT_OPT_IN_FEATURES environment variable.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-18 22:30:27 +02:00
Evan Lezar
70da6cfa50 Allow inclusion of persistenced socket in CDI specification
This change adds an include-persistenced-socket flag to the
nvidia-ctk cdi generate command that ensures that a generated
specification includes the nvidia-persistenced socket if present on
the host.

Note that for mangement mode, these sockets are always included
if detected.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-18 22:30:27 +02:00
Evan Lezar
a4bfccc3fe Use include-persistenced-socket feature for CDI mode
This change ensures that the internal CDI representation includes
the persistenced socket if the include-persistenced-socket feature
flag is enabled.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-18 22:30:27 +02:00
Tariq Ibrahim
66f50d91bd sync go.mod with go mod tidy in deployments/devel
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
2024-09-18 13:17:54 -07:00
Evan Lezar
ba1ed3232f Skip injection of nvidia-persistenced socket by default
This changes skips the injection of the nvidia-persistenced socket by
default.

An include-persistenced-socket feature flag is added to allow the
injection of this socket to be explicitly requested.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-18 22:10:09 +02:00
Evan Lezar
c490baab63 Merge pull request #696 from elezar/check-link-resolution
Check for valid paths in create-symlinks hook
2024-09-18 22:05:14 +02:00
Evan Lezar
32486cf1e9 Merge pull request #701 from NVIDIA/dependabot/submodules/main/third_party/libnvidia-container-921e2f3
Bump third_party/libnvidia-container from `6b8b8e4` to `921e2f3`
2024-09-18 22:04:37 +02:00
dependabot[bot]
b9c3185d72 Bump third_party/libnvidia-container from 6b8b8e4 to 921e2f3
Bumps [third_party/libnvidia-container](https://github.com/NVIDIA/libnvidia-container) from `6b8b8e4` to `921e2f3`.
- [Release notes](https://github.com/NVIDIA/libnvidia-container/releases)
- [Commits](6b8b8e425e...921e2f3197)

---
updated-dependencies:
- dependency-name: third_party/libnvidia-container
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-09-18 20:00:15 +00:00
Evan Lezar
e383b75a4d Check for valid paths in create-symlinks hook
This change updates the create-symlinks hook to check whether
link paths resolve in the container's filesystem. In addition
the executable is updated to return an error if a link could
not be created.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-18 17:47:19 +02:00
Evan Lezar
5827434cae Merge pull request #699 from NVIDIA/dependabot/submodules/main/third_party/libnvidia-container-6b8b8e4
Bump third_party/libnvidia-container from `4c2494f` to `6b8b8e4`
2024-09-18 16:22:21 +02:00
dependabot[bot]
6ad699c095 Bump third_party/libnvidia-container from 4c2494f to 6b8b8e4
Bumps [third_party/libnvidia-container](https://github.com/NVIDIA/libnvidia-container) from `4c2494f` to `6b8b8e4`.
- [Release notes](https://github.com/NVIDIA/libnvidia-container/releases)
- [Commits](4c2494f165...6b8b8e425e)

---
updated-dependencies:
- dependency-name: third_party/libnvidia-container
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-09-18 09:05:59 +00:00
dependabot[bot]
2450b002a8 Bump golang from 1.23.0 to 1.23.1 in /deployments/devel
Bumps golang from 1.23.0 to 1.23.1.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-09-09 08:25:21 +00:00
dependabot[bot]
03d1acc7b0 Bump golang.org/x/sys from 0.24.0 to 0.25.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.24.0 to 0.25.0.
- [Commits](https://github.com/golang/sys/compare/v0.24.0...v0.25.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-09-08 08:25:45 +00:00
dependabot[bot]
39120d5878 Bump nvidia/cuda in /deployments/container
Bumps nvidia/cuda from 12.6.0-base-ubuntu20.04 to 12.6.1-base-ubuntu20.04.

---
updated-dependencies:
- dependency-name: nvidia/cuda
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-09-06 08:44:24 +00:00
Evan Lezar
da5e3ce8c3 Merge pull request #634 from NVIDIA/dependabot/go_modules/main/golang.org/x/mod-0.20.0
Bump golang.org/x/mod from 0.19.0 to 0.20.0
2024-09-04 12:41:04 +02:00
dependabot[bot]
b47f954b7c Bump golang.org/x/mod from 0.19.0 to 0.20.0
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.19.0 to 0.20.0.
- [Commits](https://github.com/golang/mod/compare/v0.19.0...v0.20.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-08-27 07:44:17 +00:00
Evan Lezar
f959c0daaa Merge pull request #644 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.24.0
Bump golang.org/x/sys from 0.22.0 to 0.24.0
2024-08-27 09:43:10 +02:00
Evan Lezar
c3c0cdcc89 Merge pull request #660 from elezar/fix-libnvidia-allocator-duplicate-mount
Exclude libnvidia-allocator from graphics mounts
2024-08-22 14:00:25 +02:00
Evan Lezar
e3936fd623 Merge pull request #661 from elezar/fix-golangci-lint
[no-relnote] Ignore integer overflow in golangci-lint
2024-08-22 11:43:17 +02:00
Evan Lezar
f4987580d2 [no-relnote] Ignore integer overflow in golangci-lint
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-08-22 11:33:59 +02:00
dependabot[bot]
f3fceab317 Bump golang.org/x/sys from 0.22.0 to 0.24.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.22.0 to 0.24.0.
- [Commits](https://github.com/golang/sys/compare/v0.22.0...v0.24.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-08-20 18:29:08 +00:00
Evan Lezar
ff2ba2bbc5 Merge pull request #655 from NVIDIA/dependabot/go_modules/main/github.com/urfave/cli/v2-2.27.4
Bump github.com/urfave/cli/v2 from 2.27.3 to 2.27.4
2024-08-20 20:27:53 +02:00
Evan Lezar
0c554cbf7e Merge pull request #652 from NVIDIA/dependabot/docker/deployments/devel/main/golang-1.23.0
Bump golang from 1.22.6 to 1.23.0 in /deployments/devel
2024-08-20 17:18:06 +02:00
Evan Lezar
beb921fafe Exclude libnvidia-allocator from graphics mounts
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-08-20 16:28:14 +02:00
Evan Lezar
45030d1169 Merge pull request #651 from NVIDIA/dependabot/docker/deployments/container/main/nvidia/cuda-12.6.0-base-ubuntu20.04
Bump nvidia/cuda from 12.5.1-base-ubuntu20.04 to 12.6.0-base-ubuntu20.04 in /deployments/container
2024-08-19 11:39:46 +02:00
dependabot[bot]
2f37055bc6 Bump golang from 1.22.6 to 1.23.0 in /deployments/devel
Bumps golang from 1.22.6 to 1.23.0.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-08-19 09:39:15 +00:00
Evan Lezar
5316c2a618 Merge pull request #657 from yeahdongcn/typo
Rename icp_test.go to ipc_test.go
2024-08-19 11:38:56 +02:00
dependabot[bot]
89e4ae70c8 Bump github.com/urfave/cli/v2 from 2.27.3 to 2.27.4
Bumps [github.com/urfave/cli/v2](https://github.com/urfave/cli) from 2.27.3 to 2.27.4.
- [Release notes](https://github.com/urfave/cli/releases)
- [Changelog](https://github.com/urfave/cli/blob/main/docs/CHANGELOG.md)
- [Commits](https://github.com/urfave/cli/compare/v2.27.3...v2.27.4)

---
updated-dependencies:
- dependency-name: github.com/urfave/cli/v2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-08-19 09:38:51 +00:00
Evan Lezar
c68b4be12f Merge pull request #658 from elezar/fix-default-value
Use empty string for default runtime-config-override
2024-08-19 11:34:36 +02:00
Evan Lezar
89c12c1368 Use empty string for default runtime-config-override
This change ensures that we unnecessarily print warnings for
runtimes where these configs are not applicable.

This removes the following warnings:
WARN[0000] Ignoring runtime-config-override flag for docker

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-08-19 11:33:26 +02:00
Evan Lezar
9d4294450d Merge pull request #656 from NVIDIA/dependabot/go_modules/deployments/devel/main/github.com/golangci/golangci-lint-1.60.1
Bump github.com/golangci/golangci-lint from 1.59.1 to 1.60.1 in /deployments/devel
2024-08-19 11:32:34 +02:00
dependabot[bot]
070b40d62a Bump github.com/golangci/golangci-lint in /deployments/devel
Bumps [github.com/golangci/golangci-lint](https://github.com/golangci/golangci-lint) from 1.59.1 to 1.60.1.
- [Release notes](https://github.com/golangci/golangci-lint/releases)
- [Changelog](https://github.com/golangci/golangci-lint/blob/master/CHANGELOG.md)
- [Commits](https://github.com/golangci/golangci-lint/compare/v1.59.1...v1.60.1)

---
updated-dependencies:
- dependency-name: github.com/golangci/golangci-lint
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-08-19 11:31:11 +02:00
Xiaodong Ye
13862f3c75 Rename icp_test.go to ipc_test.go
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
2024-08-19 15:39:54 +08:00
dependabot[bot]
05449807d1 Bump nvidia/cuda in /deployments/container
Bumps nvidia/cuda from 12.5.1-base-ubuntu20.04 to 12.6.0-base-ubuntu20.04.

---
updated-dependencies:
- dependency-name: nvidia/cuda
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-08-13 08:52:55 +00:00
Evan Lezar
b7948428ff Merge pull request #624 from NVIDIA/dependabot/go_modules/main/github.com/urfave/cli/v2-2.27.3
Bump github.com/urfave/cli/v2 from 2.27.2 to 2.27.3
2024-08-07 17:40:35 +02:00
Evan Lezar
d2750e86c3 Merge pull request #633 from NVIDIA/dependabot/docker/deployments/devel/main/golang-1.22.6
Bump golang from 1.22.5 to 1.22.6 in /deployments/devel
2024-08-07 17:39:35 +02:00
Evan Lezar
0e4759cf36 Merge pull request #628 from elezar/switch-release-branch
[no-relnote] Switch dependabot to release-1.16 branch
2024-08-07 17:38:37 +02:00
dependabot[bot]
d767e4cfe9 Bump golang from 1.22.5 to 1.22.6 in /deployments/devel
Bumps golang from 1.22.5 to 1.22.6.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-08-07 09:08:13 +00:00
Evan Lezar
e454caf577 [no-relnote] Switch dependabot to release-1.16 branch
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-08-01 14:48:35 +02:00
Evan Lezar
fa46c01331 Merge pull request #616 from NVIDIA/dependabot/go_modules/main/github.com/NVIDIA/go-nvlib-0.6.1
Bump github.com/NVIDIA/go-nvlib from 0.6.0 to 0.6.1
2024-08-01 11:01:10 +02:00
Evan Lezar
122136fe92 Merge pull request #627 from elezar/add-mig-cdi-test
[no-relnotes] Add initial unit test for MIG CDI spec generation
2024-08-01 11:00:07 +02:00
Evan Lezar
fa16d83494 [no-relnotes] Add initial unit test for MIG CDI spec generation
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-31 11:56:36 +02:00
dependabot[bot]
faabbd36d6 Bump github.com/urfave/cli/v2 from 2.27.2 to 2.27.3
Bumps [github.com/urfave/cli/v2](https://github.com/urfave/cli) from 2.27.2 to 2.27.3.
- [Release notes](https://github.com/urfave/cli/releases)
- [Changelog](https://github.com/urfave/cli/blob/main/docs/CHANGELOG.md)
- [Commits](https://github.com/urfave/cli/compare/v2.27.2...v2.27.3)

---
updated-dependencies:
- dependency-name: github.com/urfave/cli/v2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-07-28 08:18:49 +00:00
Christopher Desiniotis
a470818ba7 Merge pull request #620 from NVIDIA/bump-v1.16.1
Bump version for v1.16.1 release
2024-07-23 07:55:16 -07:00
Christopher Desiniotis
404a2763cb Bump version for v1.16.1 release
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2024-07-22 20:38:53 -07:00
Christopher Desiniotis
87a63a6159 Merge pull request #619 from NVIDIA/fix-mig-get-minor-number
[bugfix] Process error type instead of nvml.Return
2024-07-22 15:21:49 -07:00
Christopher Desiniotis
d8ccd50349 [bugfix] Process error type instead of nvml.Return in internal/platform-support/dgpu
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2024-07-22 13:57:14 -07:00
dependabot[bot]
95694f082b Bump github.com/NVIDIA/go-nvlib from 0.6.0 to 0.6.1
Bumps [github.com/NVIDIA/go-nvlib](https://github.com/NVIDIA/go-nvlib) from 0.6.0 to 0.6.1.
- [Release notes](https://github.com/NVIDIA/go-nvlib/releases)
- [Commits](https://github.com/NVIDIA/go-nvlib/compare/v0.6.0...v0.6.1)

---
updated-dependencies:
- dependency-name: github.com/NVIDIA/go-nvlib
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-07-21 08:24:28 +00:00
Evan Lezar
c5c124b882 Merge pull request #612 from elezar/bump-release-v1.16.0
Bump version for v1.16.0 release
2024-07-15 15:39:37 +02:00
Evan Lezar
a67c3a1bb5 Bump version for v1.16.0 release
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-15 15:07:11 +02:00
Evan Lezar
51911974fb Merge pull request #611 from elezar/fix-generate-changelog
[no-relnote] Fix generate changelog
2024-07-15 15:06:44 +02:00
Evan Lezar
73a6c4d521 [no-relnote] Fix generate changelog
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-15 15:06:24 +02:00
Evan Lezar
eb7983af41 Merge pull request #610 from elezar/fix-generate-changelog
[no-relnote] Fix generate-changelog
2024-07-15 14:48:13 +02:00
Evan Lezar
2273664328 [no-relnote] Fix generate-changelog
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-15 14:47:54 +02:00
Evan Lezar
49acaef503 [no-relnote] Fix release script
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-15 14:40:25 +02:00
Evan Lezar
fb6a767ada Merge pull request #609 from elezar/add-release-md
[no-relnote] Add RELEASE.md
2024-07-15 14:27:30 +02:00
Evan Lezar
76f419e3ba [no-relnote] Add RELEASE.md
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-15 14:25:12 +02:00
Evan Lezar
df3a7729ce Merge pull request #606 from NVIDIA/dependabot/docker/deployments/container/main/nvidia/cuda-12.5.1-base-ubuntu20.04
Bump nvidia/cuda from 12.5.0-base-ubuntu20.04 to 12.5.1-base-ubuntu20.04 in /deployments/container
2024-07-15 11:04:34 +02:00
dependabot[bot]
509993e138 Bump nvidia/cuda in /deployments/container
Bumps nvidia/cuda from 12.5.0-base-ubuntu20.04 to 12.5.1-base-ubuntu20.04.

---
updated-dependencies:
- dependency-name: nvidia/cuda
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-07-15 08:36:11 +00:00
Evan Lezar
fce56f905b Merge pull request #601 from elezar/post-release-tooling-fix
[no-relnote] Fix release tooling
2024-07-11 14:58:02 +02:00
Evan Lezar
832fcba5cd [no-relnote] fix archive script
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-11 10:34:28 +02:00
Evan Lezar
b646372998 [no-relnote] Fix release tooling
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-10 17:38:19 +02:00
Evan Lezar
d51cee6a9d Merge pull request #600 from NVIDIA/dependabot/docker/deployments/devel/main/golang-1.22.5
Bump golang from 1.22.1 to 1.22.5 in /deployments/devel
2024-07-10 13:31:09 +02:00
dependabot[bot]
2d9c45575c Bump golang from 1.22.1 to 1.22.5 in /deployments/devel
Bumps golang from 1.22.1 to 1.22.5.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-07-10 11:29:58 +00:00
Evan Lezar
49ada6ce56 Merge pull request #599 from elezar/add-build-image-tooling
Update build image tooling
2024-07-10 13:29:33 +02:00
Evan Lezar
d8edb83673 [no-relnote] Add dependabot rule for devel/Dockerfile
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-10 13:28:58 +02:00
Evan Lezar
969d6c3a4b [no-relnotes] Add golangci-lint to devel image
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-10 13:26:02 +02:00
Evan Lezar
ee4f1f4a24 Merge pull request #597 from elezar/add-build-image-tooling
[no-relnotes] Use a local development image
2024-07-10 13:25:22 +02:00
Evan Lezar
1b1b03d46e [no-relnotes] Use a local development image
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-10 13:21:46 +02:00
Evan Lezar
cf417b0308 Merge pull request #595 from elezar/manage-golang-version-dependabot
Manage golang version with dependabot
2024-07-10 13:13:14 +02:00
Evan Lezar
aedd3b8f04 [no-relnote] Define golang version in Dockefile
This change adds logic to extract the golang version from a Dockefile.
This allows us to trigger golang updates using dependabot.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-10 13:12:44 +02:00
Evan Lezar
d969c6f302 Merge pull request #593 from elezar/bump-release-v1.16.0-rc.2
Bump version for v1.16.0-rc.2 release
2024-07-10 12:01:02 +02:00
Evan Lezar
976fdae5d0 [no-relnote] Fix version bump in release script
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-10 11:56:42 +02:00
Evan Lezar
0eec445d7a Bump version for v1.16.0-rc.2 release
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-10 11:56:42 +02:00
Evan Lezar
448a3853ad Merge pull request #552 from elezar/refactor-dgpu-discovery
Refactor dGPU device discovery
2024-07-10 11:48:30 +02:00
Evan Lezar
9dd4e357b7 Merge pull request #562 from elezar/add-release-ci
[no-relnote] Add basic release workflow
2024-07-10 11:32:20 +02:00
Evan Lezar
6358c13dab [no-relnote] Add basic release workflow
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-10 11:20:40 +02:00
Evan Lezar
e4cdc48854 Merge pull request #591 from elezar/fix-release-scripts
[no-relnote] fix package release scripts
2024-07-09 15:34:13 +02:00
Evan Lezar
8e079c040a [no-relnote] fix package release scripts
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-09 15:33:41 +02:00
Evan Lezar
8b010313e5 Merge pull request #548 from elezar/additional-graphics-libs
Inject additional libraries for full X11 functionality
2024-07-09 14:57:55 +02:00
Evan Lezar
bd7295702e Merge pull request #584 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.22.0
Bump golang.org/x/sys from 0.21.0 to 0.22.0
2024-07-09 14:26:10 +02:00
dependabot[bot]
135f682a87 Bump golang.org/x/sys from 0.21.0 to 0.22.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.21.0 to 0.22.0.
- [Commits](https://github.com/golang/sys/compare/v0.21.0...v0.22.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-07-09 12:01:41 +00:00
Evan Lezar
077ad3eb25 Merge pull request #583 from NVIDIA/dependabot/go_modules/main/golang.org/x/mod-0.19.0
Bump golang.org/x/mod from 0.18.0 to 0.19.0
2024-07-09 14:00:54 +02:00
Evan Lezar
e527cc1ff5 Use relative path to locate driver libraries
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-09 13:54:07 +02:00
Evan Lezar
55ea268829 Add RelativeToRoot function to Driver
This adds a function to return a path as a path relative to
the specified driver root.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-09 13:54:07 +02:00
Seungmin Kim
aae3da88c3 Inject additional libraries for full X11 functionality
This change updates X11 library detection to ensure that this
works for a wider range of driver installations.

Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com>
Signed-off-by: tux-rampage <tuxrampage@gmail.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-09 13:53:55 +02:00
dependabot[bot]
bb2be19a6c Bump golang.org/x/mod from 0.18.0 to 0.19.0
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.18.0 to 0.19.0.
- [Commits](https://github.com/golang/mod/compare/v0.18.0...v0.19.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-07-09 11:53:13 +00:00
Evan Lezar
6fd2fc0c24 Merge pull request #587 from NVIDIA/dependabot/go_modules/main/tags.cncf.io/container-device-interface-0.8.0
Bump tags.cncf.io/container-device-interface from 0.7.2 to 0.8.0
2024-07-09 13:52:27 +02:00
Evan Lezar
7c2e624492 Merge pull request #589 from elezar/fix-dependabot
[no-relnote] Remove gh-pages action update
2024-07-09 13:52:14 +02:00
Evan Lezar
c0a3864ab4 [no-relnote] Remove gh-pages action update
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-09 13:50:09 +02:00
dependabot[bot]
0c309df7e7 Bump tags.cncf.io/container-device-interface from 0.7.2 to 0.8.0
Bumps [tags.cncf.io/container-device-interface](https://github.com/cncf-tags/container-device-interface) from 0.7.2 to 0.8.0.
- [Release notes](https://github.com/cncf-tags/container-device-interface/releases)
- [Commits](https://github.com/cncf-tags/container-device-interface/compare/v0.7.2...v0.8.0)

---
updated-dependencies:
- dependency-name: tags.cncf.io/container-device-interface
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-07-09 11:48:04 +00:00
Evan Lezar
46838b1a44 Merge pull request #576 from elezar/get-options-from-default
Extract options from default runtime if runc does not exist
2024-07-09 13:34:57 +02:00
Evan Lezar
be11cf428b [no-relnote] Add MIG discoverer to dgpu package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-09 13:34:04 +02:00
Evan Lezar
b42a5d3e3a [no-relnote] Refactor dGPU device discovery
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-09 13:34:04 +02:00
Evan Lezar
b8389283d5 Extract options from default runtime if runc does not exist
This change updates the logic to populate the options for the
nvidia runtime configs added to containerd or crio from a default runtime
if this is specified and a runc entry is not found.

This allows the default runtime values for settings such as SystemdCgroup
to be applied correctly.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-04 12:44:08 +02:00
Evan Lezar
6732f6d13b Merge pull request #577 from tariq1890/no-map-ptr
avoid using map pointers as maps are always passed by reference
2024-07-03 12:36:58 +02:00
Tariq Ibrahim
70ef0fb973 avoid using map pointers as maps are always passed by reference
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
2024-07-02 17:35:44 -07:00
Evan Lezar
15c884e99f Merge pull request #558 from NVIDIA/dependabot/go_modules/main/github.com/NVIDIA/go-nvlib-0.6.0
Bump github.com/NVIDIA/go-nvlib from 0.5.0 to 0.6.0
2024-07-02 16:21:02 +02:00
Evan Lezar
17d4d7da1f Merge pull request #574 from elezar/fix-centos7-builds
Use centos:7 vault repos for builds
2024-07-01 16:41:40 +02:00
Evan Lezar
c96ac07bf7 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-01 16:32:41 +02:00
Evan Lezar
a6a96a8d0e Merge pull request #560 from elezar/reduce-logging
Reduce logging verbosity in NVIDIA Container Runtime
2024-07-01 16:21:59 +02:00
Evan Lezar
be61ba01d9 [no-relnote] Use centos:7 vault repos for builds
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-01 16:21:20 +02:00
Evan Lezar
3aeba886d4 Reduce logging for the NVIDIA Container runtime
This change moves most of the logging for the NVIDIA Container runtime
from Info to Debug or Trace -- especially in the case where no
modifications are requried.

This removes execessive logging.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-01 12:12:38 +02:00
Evan Lezar
490c7dd599 Add Tracef to logger Interface
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-01 12:12:38 +02:00
Evan Lezar
8df5e33ef6 Add String function to oci.Runtime interface
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-01 12:12:38 +02:00
Evan Lezar
f55aac0232 Merge pull request #573 from elezar/bugfix-in-log-argument
Fix bug in argument parsing for logger creation
2024-07-01 12:12:14 +02:00
Evan Lezar
2c8431c1f8 Fix bug in argument parsing for logger creation
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-01 12:03:53 +02:00
Evan Lezar
f35f3903ab Merge pull request #551 from Weselton/main
[no-relnote] Fixed spelling error
2024-06-24 15:00:56 +02:00
Evan Lezar
bbc5363009 Merge pull request #561 from elezar/update-generate-changelog
[no-relnote] Update generate changelog filter
2024-06-24 13:04:55 +02:00
Evan Lezar
0b944ba274 [no-relnote] Update generate changelog filter
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-24 13:04:13 +02:00
dependabot[bot]
ca528c4f53 Bump github.com/NVIDIA/go-nvlib from 0.5.0 to 0.6.0
Bumps [github.com/NVIDIA/go-nvlib](https://github.com/NVIDIA/go-nvlib) from 0.5.0 to 0.6.0.
- [Release notes](https://github.com/NVIDIA/go-nvlib/releases)
- [Commits](https://github.com/NVIDIA/go-nvlib/compare/v0.5.0...v0.6.0)

---
updated-dependencies:
- dependency-name: github.com/NVIDIA/go-nvlib
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-23 08:42:01 +00:00
German Maltsev
692dac4cbd [no-relnote] Fixed spelling error
Signed-off-by: German Maltsev <maltsev.german@gmail.com>
2024-06-19 17:40:56 +03:00
Evan Lezar
6b78c72fec Merge pull request #550 from ArangoGutierrez/ref_name
Use ref_name on release workflow
2024-06-18 11:49:59 +02:00
Carlos Eduardo Arango Gutierrez
a3223f32e0 Use ref_name on release workflow
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-06-17 16:56:39 +02:00
Evan Lezar
f2eb4ea9ba Merge pull request #549 from elezar/bump-v1.16.0-rc.1
Bump v1.16.0 rc.1
2024-06-17 15:35:27 +02:00
Evan Lezar
4686f9499c Add basic release workflow
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-17 15:33:11 +02:00
Evan Lezar
3f481cd20a Update changelog for v1.16.0-rc.1 release
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-17 15:17:28 +02:00
Evan Lezar
cd52be86e6 Merge pull request #529 from elezar/vulkan-location
Support vulkan ICD files in a driver root
2024-06-17 15:00:29 +02:00
Evan Lezar
b5743da52f Merge pull request #526 from elezar/add-dev-root-to-create-device-nodes
Add dev-root option to create-device-nodes
2024-06-17 14:57:26 +02:00
Evan Lezar
03ccd64f33 Merge pull request #512 from NVIDIA/dependabot/go_modules/main/github.com/NVIDIA/go-nvml-0.12.4-0
Bump github.com/NVIDIA/go-nvml from 0.12.0-6 to 0.12.4-0
2024-06-17 14:23:00 +02:00
Evan Lezar
33369861fc Merge pull request #547 from NVIDIA/revert-490-main
Revert "Inject additional libraries required for full display functionality"
2024-06-17 11:47:36 +02:00
Evan Lezar
d9a1106e00 Revert "Inject additional libraries required for full display functionality"
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-17 11:47:15 +02:00
Evan Lezar
f26425d3fd Merge pull request #490 from ehfd/main
Inject additional libraries required for full display functionality
2024-06-17 11:41:38 +02:00
Evan Lezar
272585d261 Merge pull request #544 from elezar/allow-toolkit-pid-file-to-be-specified
Allow toolkit.pid path to be specified
2024-06-17 11:39:44 +02:00
Evan Lezar
fe5a44cb35 Merge pull request #527 from elezar/increase-priority-of-injected-libs
Increase priority of ld.so.conf.d config file
2024-06-17 11:34:43 +02:00
Evan Lezar
5f2be72335 Support vulkan ICD files in a driver root
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-17 11:33:29 +02:00
Evan Lezar
ae074e7ba2 Merge pull request #545 from elezar/bump-version-v1.16.0-rc.1
Bump version to v1.16.0-rc.1
2024-06-17 11:32:07 +02:00
Evan Lezar
876d479308 Allow toolkit.pid path to be specified
This change makes the following changes:
* Allows the toolkit.pid path to be specified
* Creates the toolkit.pid file at /run/nvidia/toolkit/toolkit.pid by default
* Handles failures to remove the /run/nvidia/toolkit folder

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-17 11:26:23 +02:00
dependabot[bot]
abd638add9 Bump github.com/NVIDIA/go-nvml from 0.12.0-6 to 0.12.4-0
Bumps [github.com/NVIDIA/go-nvml](https://github.com/NVIDIA/go-nvml) from 0.12.0-6 to 0.12.4-0.
- [Commits](https://github.com/NVIDIA/go-nvml/compare/v0.12.0-6...v0.12.4-0)

---
updated-dependencies:
- dependency-name: github.com/NVIDIA/go-nvml
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-14 13:27:46 +00:00
Evan Lezar
1dd59101c7 Bump version to v1.16.0-rc.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-14 15:25:21 +02:00
Evan Lezar
55630bc2c0 Merge pull request #542 from elezar/remove-provenance
Remove provenance information from image manifests
2024-06-14 15:16:06 +02:00
Evan Lezar
4f0de9f1ef Increase priority of ld.so.conf.d config file
This change ensures that the created /etc/ld.so.conf.d file
has a higher priority to ensure that the injected libraries
take precendence over non-compat libraries.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-13 13:49:14 +02:00
Evan Lezar
bced007f87 Remove provenance information from image manifests
Tools such as oc mirror do not support the provenence metadata
added to the image manifests with newer docker buildx versions.

This change disables the addition of provenance information.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-13 13:36:07 +02:00
Evan Lezar
ac90b7963d Merge pull request #519 from shivakunv/ngc_signing_job
add ngc image signing job for auto signing
2024-06-13 12:01:19 +02:00
shiva kumar
2e947edbe4 add ngc image signing job for auto signing
Signed-off-by: shiva kumar <shivaku@nvidia.com>
2024-06-12 13:20:35 +05:30
Evan Lezar
9fde4b21df Merge pull request #539 from elezar/fix-ppcle64-builds
Fix ppcle64 builds
2024-06-11 14:49:37 +02:00
Evan Lezar
84e0060fe8 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-11 14:31:31 +02:00
Evan Lezar
024dd3126d Use archived package repo for centos:stream8
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-11 14:31:06 +02:00
Evan Lezar
86b272cc7b Merge pull request #533 from NVIDIA/dependabot/go_modules/main/golang.org/x/mod-0.18.0
Bump golang.org/x/mod from 0.17.0 to 0.18.0
2024-06-10 13:43:15 +02:00
dependabot[bot]
2bc24970e0 Bump golang.org/x/mod from 0.17.0 to 0.18.0
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.17.0 to 0.18.0.
- [Commits](https://github.com/golang/mod/compare/v0.17.0...v0.18.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-10 11:35:48 +00:00
Evan Lezar
00dc0daecc Merge pull request #532 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.21.0
Bump golang.org/x/sys from 0.20.0 to 0.21.0
2024-06-10 13:34:42 +02:00
dependabot[bot]
e3120cbe64 Bump golang.org/x/sys from 0.20.0 to 0.21.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.20.0 to 0.21.0.
- [Commits](https://github.com/golang/sys/compare/v0.20.0...v0.21.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-10 11:32:41 +00:00
Evan Lezar
00d82dd540 Merge pull request #536 from elezar/bump-github.com/xrash/smetrics
Bump github.com/xrash/smetrics to v0.0.0-20240521201337-686a1a2994c1
2024-06-10 13:31:58 +02:00
Evan Lezar
8fe366683e Bump github.com/xrash/smetrics to v0.0.0-20240521201337-686a1a2994c1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-10 13:30:14 +02:00
Evan Lezar
7320fcd86d Merge pull request #524 from NVIDIA/dependabot/docker/deployments/container/main/nvidia/cuda-12.5.0-base-ubuntu20.04
Bump nvidia/cuda from 12.4.1-base-ubuntu20.04 to 12.5.0-base-ubuntu20.04 in /deployments/container
2024-06-10 13:23:25 +02:00
Evan Lezar
01f212b7a8 Merge pull request #528 from elezar/set-cdi-permissions-644
Set default CDI spec permissions to 644
2024-06-10 13:20:18 +02:00
Evan Lezar
71e0b8590f Set default CDI spec permissions to 644
Although the nvidia-ctk cdi generate command generates
specs with 644 permissions, the nvidia-ctk cdi transform
commands do not. This change sets the default permissions
to 600 instead of 644.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-05 11:27:03 +02:00
tux-rampage
e841c6256a Add driver hooks 2024-06-05 06:42:25 +02:00
tux-rampage
c2411e644e Build xorg lib search paths dynamically 2024-06-04 19:40:05 +02:00
Evan Lezar
dffce25637 Rename driver-root option to root
This change renames the nvidia-ctk system create-device-nodes
flag driver-root to root. This makes it clearer that this is
used to load the kernel modules and is not specific to the
user-mode driver installation.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-04 10:25:49 +02:00
Evan Lezar
f5a4b23041 Add dev-root option to create-device-nodes
This allows for dev nodes to be created in cases
where the driver root and the dev root do not
match.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-04 10:24:26 +02:00
Evan Lezar
dfc8e22e12 Merge pull request #360 from elezar/add-device-root-to-toolkit-container
Add dev-root option to toolkit container
2024-06-04 10:09:37 +02:00
Seungmin Kim
155fe66575 Merge branch 'NVIDIA:main' into main 2024-06-04 16:29:35 +09:00
Evan Lezar
9208159263 Add dev-root option to toolkit container
This changes adds an option to the toolkit container to allow
the dev root to be specified. This adds support for driver installations
where the driver files are at one root and the dev nodes are created
elsewhere -- most typically at /. This is the case, for example, for
GKE driver installations.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-03 20:40:30 +02:00
Evan Lezar
9b83d09f18 Merge pull request #440 from elezar/cdi-generation-with-driver-root
Find libnvidia-ml in driver root
2024-06-03 13:31:54 +02:00
Evan Lezar
c5eda7af8e Ensure that libnvidia-ml.so.1 is found in driver root
This change ensures that the driver root is used to locate libnvidia-ml.so.1
if required.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-03 12:01:10 +02:00
dependabot[bot]
572b0401a4 Bump nvidia/cuda in /deployments/container
Bumps nvidia/cuda from 12.4.1-base-ubuntu20.04 to 12.5.0-base-ubuntu20.04.

---
updated-dependencies:
- dependency-name: nvidia/cuda
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-03 08:21:46 +00:00
Seungmin Kim
0d70052105 Merge branch 'NVIDIA:main' into main 2024-06-02 17:42:21 +09:00
Evan Lezar
bead6f98f3 Merge pull request #518 from elezar/fix-staging-image-repo
Remove trailing slash from staging registry
2024-05-29 16:51:39 +02:00
Evan Lezar
533d7119db Remove trailing slash from staging registry
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-05-29 16:51:20 +02:00
Evan Lezar
e4b46a09a7 Merge pull request #516 from elezar/update-infolib-construction
Update infolib construction
2024-05-28 13:32:55 +02:00
Evan Lezar
8fc4b9c742 Add WithInfoLib option to CDI package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-05-28 13:30:31 +02:00
Evan Lezar
ef57c07199 Bump github.com/NVIDIA/go-nvlib to v0.5.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-05-28 13:28:28 +02:00
Evan Lezar
b407109bdf Merge pull request #515 from elezar/fix-cdi-construction
Ensure consistent construction order for libs
2024-05-28 12:33:37 +02:00
Evan Lezar
abb5abaea4 Ensure consistent construction order for libs
This change ensures that nvnllib and devicelib are constructed
before these are used to construct infolib.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-05-28 12:05:44 +02:00
Evan Lezar
e55e6abc09 Merge pull request #506 from elezar/set-version-on-transform
Set mininum spec version on save
2024-05-28 11:48:28 +02:00
Evan Lezar
17c044eef8 Set minimum version on save
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-05-24 15:36:18 +02:00
Evan Lezar
edda11d647 Merge pull request #428 from elezar/fix-cdi-mode-resolution
Fix cdi mode resolution
2024-05-21 13:22:10 +02:00
Evan Lezar
52d0383b47 Bump github.com/NVIDIA/go-nvlib to v0.4.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-05-21 12:25:54 +02:00
Evan Lezar
3defc6babb Use go-nvlib mode resolution
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-05-21 12:25:54 +02:00
Evan Lezar
7b988f15ab Merge pull request #474 from deitch/separate-hook-binary
move nvidia-ctk hook command into nvidia-cdi-hook binary
2024-05-21 12:24:56 +02:00
Avi Deitcher
179d8655f9 Move nvidia-ctk hook command into own binary
This change creates an nvidia-cdi-hook binary for implementing
CDI hooks. This allows for these hooks to be separated from the
nvidia-ctk command which may, for example, require libnvidia-ml
to support other functionality.

The nvidia-ctk hook subcommand is maintained as an alias for the
time being to allow for existing CDI specifications referring to
this path to work as expected.

Signed-off-by: Avi Deitcher <avi@deitcher.net>
2024-05-21 12:19:44 +02:00
Evan Lezar
2d7b2360d2 Merge pull request #497 from elezar/systemdcgroup
Add option to set additional containerd configs per runtime
2024-05-21 12:05:51 +02:00
Evan Lezar
a61dc148b2 Merge pull request #501 from NVIDIA/dependabot/go_modules/main/github.com/NVIDIA/go-nvml-0.12.0-6
Bump github.com/NVIDIA/go-nvml from 0.12.0-5 to 0.12.0-6
2024-05-21 11:35:58 +02:00
dependabot[bot]
3f6b916a85 Bump github.com/NVIDIA/go-nvml from 0.12.0-5 to 0.12.0-6
Bumps [github.com/NVIDIA/go-nvml](https://github.com/NVIDIA/go-nvml) from 0.12.0-5 to 0.12.0-6.
- [Commits](https://github.com/NVIDIA/go-nvml/compare/v0.12.0-5...v0.12.0-6)

---
updated-dependencies:
- dependency-name: github.com/NVIDIA/go-nvml
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-05-21 11:28:09 +02:00
Seungmin Kim
cf388e7e63 Update internal/discover/graphics.go
Co-authored-by: Evan Lezar <evanlezar@gmail.com>
Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com>
2024-05-17 23:55:36 +09:00
Evan Lezar
b435b797af Add support for adding additional containerd configs
This allow for options such as SystemdCgroup to be optionally set.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-05-17 12:58:08 +02:00
Evan Lezar
c86c3aeeaf Allow per-runtime config overrides
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-05-17 12:58:08 +02:00
Evan Lezar
f13f1bdba4 Merge pull request #484 from elezar/fix-config-set
Use : as a config --set slice separator
2024-05-13 12:32:53 +02:00
Seungmin Kim
55440f40b3 Make X11 search paths accurate
Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com>
2024-05-11 04:31:11 +09:00
Seungmin Kim
cc34996684 Inject additional libraries required for full X11 functionality and fix paths
Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com>
2024-05-11 00:33:41 +09:00
Evan Lezar
5a3eda4cba Use : as a config --set list separator
This allows settings such as:

nvidia-ctk config --set nvidia-container-runtime.runtimes=crun:runc

to be applied correctly.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-05-08 17:26:59 +02:00
Evan Lezar
973a6633b3 Merge pull request #479 from NVIDIA/dependabot/go_modules/main/github.com/urfave/cli/v2-2.27.2
Bump github.com/urfave/cli/v2 from 2.27.1 to 2.27.2
2024-05-07 22:46:49 +02:00
Evan Lezar
f4d0cfb687 Merge pull request #318 from cdesiniotis/update-func-signature
Get device specs by Identifier
2024-05-07 22:45:38 +02:00
Christopher Desiniotis
35b23c5a2c Accept device.Identifiers for requesting CDI specs
This change moves from using strings to useing device.Identifiers
as input for requesting CDI specifications for specific
devices.

Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-05-07 21:50:28 +02:00
Evan Lezar
0dc87e5d69 Merge pull request #488 from NVIDIA/dependabot/github_actions/golangci/golangci-lint-action-6
Bump golangci/golangci-lint-action from 5 to 6
2024-05-07 15:22:24 +02:00
Evan Lezar
edc50f6e49 Merge pull request #485 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.20.0
Bump golang.org/x/sys from 0.19.0 to 0.20.0
2024-05-07 15:21:58 +02:00
dependabot[bot]
7de7444b0f Bump golangci/golangci-lint-action from 5 to 6
Bumps [golangci/golangci-lint-action](https://github.com/golangci/golangci-lint-action) from 5 to 6.
- [Release notes](https://github.com/golangci/golangci-lint-action/releases)
- [Commits](https://github.com/golangci/golangci-lint-action/compare/v5...v6)

---
updated-dependencies:
- dependency-name: golangci/golangci-lint-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-05-07 08:25:32 +00:00
dependabot[bot]
8d3ffcd122 Bump github.com/urfave/cli/v2 from 2.27.1 to 2.27.2
Bumps [github.com/urfave/cli/v2](https://github.com/urfave/cli) from 2.27.1 to 2.27.2.
- [Release notes](https://github.com/urfave/cli/releases)
- [Changelog](https://github.com/urfave/cli/blob/main/docs/CHANGELOG.md)
- [Commits](https://github.com/urfave/cli/compare/v2.27.1...v2.27.2)

---
updated-dependencies:
- dependency-name: github.com/urfave/cli/v2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-05-06 11:45:55 +02:00
dependabot[bot]
d72481cbd7 Bump golang.org/x/sys from 0.19.0 to 0.20.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.19.0 to 0.20.0.
- [Commits](https://github.com/golang/sys/compare/v0.19.0...v0.20.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-05-05 08:27:16 +00:00
Evan Lezar
a442a5ed1f Merge pull request #478 from NVIDIA/dependabot/go_modules/main/github.com/NVIDIA/go-nvml-0.12.0-5
Bump github.com/NVIDIA/go-nvml from 0.12.0-4 to 0.12.0-5
2024-05-03 13:14:38 +02:00
dependabot[bot]
7de58b4af4 Bump github.com/NVIDIA/go-nvml from 0.12.0-4 to 0.12.0-5
Bumps [github.com/NVIDIA/go-nvml](https://github.com/NVIDIA/go-nvml) from 0.12.0-4 to 0.12.0-5.
- [Commits](https://github.com/NVIDIA/go-nvml/compare/v0.12.0-4...v0.12.0-5)

---
updated-dependencies:
- dependency-name: github.com/NVIDIA/go-nvml
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-28 08:29:54 +00:00
Carlos Eduardo Arango Gutierrez
fde099d25b Merge pull request #476 from NVIDIA/dependabot/github_actions/golangci/golangci-lint-action-5
Bump golangci/golangci-lint-action from 4 to 5
2024-04-25 12:31:18 +02:00
dependabot[bot]
0a3eb67df8 Bump golangci/golangci-lint-action from 4 to 5
Bumps [golangci/golangci-lint-action](https://github.com/golangci/golangci-lint-action) from 4 to 5.
- [Release notes](https://github.com/golangci/golangci-lint-action/releases)
- [Commits](https://github.com/golangci/golangci-lint-action/compare/v4...v5)

---
updated-dependencies:
- dependency-name: golangci/golangci-lint-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-25 08:31:16 +00:00
Carlos Eduardo Arango Gutierrez
78f250a6b0 Merge pull request #469 from elezar/switch-to-ghimages
Switch to ghcr.io/nvidia/container-toolkit staging images
2024-04-22 10:19:56 +02:00
Evan Lezar
0aed9a16ad Merge pull request #460 from NVIDIA/dependabot/go_modules/main/github.com/NVIDIA/go-nvml-0.12.0-4
Bump github.com/NVIDIA/go-nvml from 0.12.0-3 to 0.12.0-4
2024-04-19 11:46:20 +02:00
Evan Lezar
f46b99c2f7 Switch to ghcr.io/nvidia/container-toolkit staging images
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-18 15:04:59 +02:00
Evan Lezar
d5f6e6f868 Use nvml/mock package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-18 14:53:37 +02:00
Evan Lezar
082ce066ed Replace go-nvlib/pkg/nvml with go-nvml/pkg/nvml
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-18 14:53:37 +02:00
Evan Lezar
bbaf543537 Update github.com/NVIDIA/go-nvlib to v0.3.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-18 14:53:37 +02:00
dependabot[bot]
50dd460eaa Bump github.com/NVIDIA/go-nvml from 0.12.0-3 to 0.12.0-4
Bumps [github.com/NVIDIA/go-nvml](https://github.com/NVIDIA/go-nvml) from 0.12.0-3 to 0.12.0-4.
- [Commits](https://github.com/NVIDIA/go-nvml/compare/v0.12.0-3...v0.12.0-4)

---
updated-dependencies:
- dependency-name: github.com/NVIDIA/go-nvml
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-18 14:53:37 +02:00
Evan Lezar
b3af77166b Merge pull request #467 from NVIDIA/dependabot/go_modules/main/tags.cncf.io/container-device-interface-0.7.2
Bump tags.cncf.io/container-device-interface from 0.7.1 to 0.7.2
2024-04-18 14:50:33 +02:00
dependabot[bot]
d8cb812c8e Bump tags.cncf.io/container-device-interface from 0.7.1 to 0.7.2
Bumps [tags.cncf.io/container-device-interface](https://github.com/cncf-tags/container-device-interface) from 0.7.1 to 0.7.2.
- [Release notes](https://github.com/cncf-tags/container-device-interface/releases)
- [Commits](https://github.com/cncf-tags/container-device-interface/compare/v0.7.1...v0.7.2)

---
updated-dependencies:
- dependency-name: tags.cncf.io/container-device-interface
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-18 10:12:16 +00:00
Carlos Eduardo Arango Gutierrez
80386a7fb2 Merge pull request #463 from elezar/switch-maintenance-branch
Update maintenance dependabot rule
2024-04-18 12:11:04 +02:00
Evan Lezar
c0a5bbe7db Update maintenance dependabot rule
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-16 15:36:20 +02:00
Evan Lezar
ddeeca392c Merge pull request #462 from elezar/bump-version-v1.15.0
Bump version to v1.15.0
2024-04-15 15:21:52 +02:00
Evan Lezar
9944feee45 Bump version to v1.15.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-15 14:37:16 +02:00
Evan Lezar
762b14b6cd Merge pull request #459 from elezar/remove-runtime-docker
Remove runtime and docker packages
2024-04-15 11:51:32 +02:00
Evan Lezar
e76e10fb36 Remove third_party package folders
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-12 14:35:03 +02:00
Evan Lezar
fcdf565586 Remove tooling to build packages
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-12 14:32:46 +02:00
Evan Lezar
7a9bc14d98 Merge pull request #425 from jmbaur/discover-use-xdg
Use XDG_DATA_DIRS instead of hardcoding /usr/share
2024-04-11 19:09:53 +02:00
Jared Baur
5788e622f4 Use XDG_DATA_DIRS instead of hardcoding /usr/share
When running nvidia-ctk on a system that uses a custom XDG_DATA_DIRS
environment variable value, the configuration files for `glvnd`,
`vulkan`, and `egl` fail to get passed through from the host to the
container. Reading from XDG_DATA_DIRS instead of hardcoding the default
value allows for finding said files so they can be mounted in the
container.

Signed-off-by: Jared Baur <jaredbaur@fastmail.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-11 17:13:02 +02:00
Evan Lezar
29c0f82ed2 Merge pull request #327 from elezar/add-driver-config
Add config search path option to driver root
2024-04-11 16:58:33 +02:00
Evan Lezar
e1417bee64 Merge pull request #456 from elezar/fix-ubi8-image-build
Remove unneeded repo manipulation
2024-04-11 16:54:31 +02:00
Evan Lezar
5f9e49705c Remove unneeded repo manipulation
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-11 16:44:42 +02:00
Evan Lezar
1d2b61ee11 Merge pull request #455 from elezar/fix-ubi8-image-build
Fix typo in dockerfile
2024-04-11 15:10:04 +02:00
Evan Lezar
271987d448 Fix typo in dockerfile
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-11 15:06:22 +02:00
Evan Lezar
6cac2f5848 Merge pull request #446 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.19.0
Bump golang.org/x/sys from 0.18.0 to 0.19.0
2024-04-11 11:41:51 +02:00
dependabot[bot]
ef4eb0d3c6 Bump golang.org/x/sys from 0.18.0 to 0.19.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.18.0 to 0.19.0.
- [Commits](https://github.com/golang/sys/compare/v0.18.0...v0.19.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-11 09:39:33 +00:00
Evan Lezar
04ab0595fa Merge pull request #449 from NVIDIA/dependabot/go_modules/main/golang.org/x/mod-0.17.0
Bump golang.org/x/mod from 0.16.0 to 0.17.0
2024-04-11 11:38:52 +02:00
Evan Lezar
9d3418d603 Merge pull request #454 from NVIDIA/dependabot/docker/deployments/container/nvidia/cuda-12.4.1-base-ubuntu20.04
Bump nvidia/cuda from 12.3.2-base-ubuntu20.04 to 12.4.1-base-ubuntu20.04 in /deployments/container
2024-04-11 11:38:18 +02:00
dependabot[bot]
57acd85fb1 Bump nvidia/cuda in /deployments/container
Bumps nvidia/cuda from 12.3.2-base-ubuntu20.04 to 12.4.1-base-ubuntu20.04.

---
updated-dependencies:
- dependency-name: nvidia/cuda
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-10 14:11:46 +00:00
Evan Lezar
6d69ca81de Merge pull request #453 from elezar/bump-cuda-version-in-containers
Add dependabot update for CUDA base images
2024-04-10 16:11:26 +02:00
Evan Lezar
be73581489 Add dependabot update for Dockefiles
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-10 16:06:27 +02:00
Evan Lezar
5682ce3149 Specify CUDA base images directly in Dockerfile
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-10 16:06:27 +02:00
dependabot[bot]
cb2b000ddc Bump golang.org/x/mod from 0.16.0 to 0.17.0
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.16.0 to 0.17.0.
- [Commits](https://github.com/golang/mod/compare/v0.16.0...v0.17.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-08 11:01:21 +00:00
Evan Lezar
cbc6ff73a4 Merge pull request #441 from elezar/bump-cdi-version
Update tags.cncf.io/container-device-interface to v0.7.1
2024-04-08 13:00:33 +02:00
Evan Lezar
4cd86caf67 Use NewCache instead of GetRegistry
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-05 17:09:17 +02:00
Evan Lezar
885313af3b Bump tags.cncf.io/container-device-interface to v0.7.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-05 17:05:40 +02:00
Evan Lezar
26e52b8013 Merge pull request #438 from elezar/refactor-driver-root
Create root.Driver instance at first usage
2024-04-03 15:11:45 +02:00
Evan Lezar
011c658945 Create root.Driver instance at first usage
This allows for testing through injection of the driver root.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-03 15:03:33 +02:00
Evan Lezar
413da20838 Merge pull request #362 from elezar/add-feature-flags
Add support for feature flags
2024-04-03 12:04:18 +02:00
Evan Lezar
09341a0934 Add support for feature flags
This change adds a features config that allows
individual features to be toggled at a global level. Each feature can (by default)
be controlled by an environment variable.

The GDS, MOFED, NVSWITCH, and GDRCOPY features are examples of such features.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-03 11:58:37 +02:00
Evan Lezar
2a9e3537ec Add config search paths option to driver root.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-02 23:03:05 +02:00
Evan Lezar
c374520b64 Merge pull request #436 from elezar/remove-verbose-from-tests
Remove verbose from tests
2024-04-02 17:46:33 +02:00
Evan Lezar
e982b9798c Remove verbose from tests
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-02 17:45:25 +02:00
Evan Lezar
7eb031919c Merge pull request #430 from lengrongfu/main
fix doc link
2024-04-02 17:41:19 +02:00
Evan Lezar
97950d6b8d Merge pull request #418 from elezar/bump-golang-version
Bump golang version to v1.22.1
2024-04-02 10:59:45 +02:00
Evan Lezar
1613f35bf5 Merge pull request #419 from elezar/rename-build-deployments
Rename build folder deployments
2024-04-02 10:57:54 +02:00
rongfu.leng
a78a7f866f fix doc
Signed-off-by: rongfu.leng <lenronfu@gmail.com>
2024-03-28 13:55:25 +08:00
Evan Lezar
643b89e539 Add driver.Config
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-25 20:18:15 +02:00
Evan Lezar
bdfa525a75 Merge pull request #427 from elezar/add-functional-options-to-driver-root
Use functional options to construct driver root
2024-03-25 20:17:54 +02:00
Evan Lezar
93763d25f0 Use functional options to construct driver root
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-25 20:13:25 +02:00
Evan Lezar
5800e55027 Rename build folder deployments
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-19 14:00:24 +02:00
Evan Lezar
c572c3b787 Remove lint-internal make target
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-19 13:54:16 +02:00
Evan Lezar
3f7ed7c8db Rename golangci-lint target to lint
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-19 13:54:16 +02:00
Evan Lezar
cc6cbd4a89 Use versions.mk GOLANG version in CI
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-19 13:54:16 +02:00
Evan Lezar
98ad835a77 Add vendor and check-vendor make targets
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-19 13:48:58 +02:00
Evan Lezar
3a1ac85020 Bump golang version to 1.22.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-19 13:46:47 +02:00
Evan Lezar
1ddc859700 Merge pull request #417 from elezar/bump-version-v1.15.0-rc.4
Bump version v1.15.0 rc.4
2024-03-19 10:47:10 +02:00
Evan Lezar
f1f629674e Bump CUDA base image to 12.3.2
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-19 10:37:36 +02:00
Evan Lezar
5a6bf02914 Bump version to v1.15.0-rc.4
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-19 10:36:02 +02:00
Evan Lezar
197cbbe0c6 Merge pull request #411 from elezar/bump-go-nvlib
Bump go-nvlib to v0.2.0  and go-nvml v0.12.0-3
2024-03-15 15:12:01 +02:00
Evan Lezar
b9abb44613 Bump go-nvlib to v0.2.0 and go-nvml v0.12.0-3
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-15 15:09:12 +02:00
Evan Lezar
c4ec4a01f8 Merge pull request #384 from tariq1890/upd-go
Fix dockerfile arch selection
2024-03-15 08:58:27 +02:00
Tariq Ibrahim
f40f4369a1 Fix dockerfile arch selection
This change includes an arm64 arch check when installing golang.

Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-15 08:54:00 +02:00
Evan Lezar
2733661125 Merge pull request #383 from elezar/fix-golangci-lint-in-shell
Add GOLANGCI_LINT_CACHE to docker make targets
2024-03-15 08:51:29 +02:00
Evan Lezar
4806f6e70d Merge pull request #395 from elezar/add-nvidia-visible-devices-void
Add NVIDIA_VISIBLE_DEVICES=void to CDI specs
2024-03-13 10:00:38 +02:00
Evan Lezar
db21f5f9a8 Merge pull request #400 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.18.0
Bump golang.org/x/sys from 0.17.0 to 0.18.0
2024-03-12 15:16:05 +02:00
Evan Lezar
07443a0e86 Merge pull request #405 from NVIDIA/dependabot/submodules/main/third_party/libnvidia-container-6c8f1df
Bump third_party/libnvidia-container from `86f1946` to `6c8f1df`
2024-03-12 15:15:32 +02:00
dependabot[bot]
675db67ebb Bump third_party/libnvidia-container from 86f1946 to 6c8f1df
Bumps [third_party/libnvidia-container](https://github.com/NVIDIA/libnvidia-container) from `86f1946` to `6c8f1df`.
- [Release notes](https://github.com/NVIDIA/libnvidia-container/releases)
- [Commits](86f19460fb...6c8f1df7fd)

---
updated-dependencies:
- dependency-name: third_party/libnvidia-container
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-12 12:10:07 +00:00
Evan Lezar
14ecacf6d1 Merge pull request #403 from elezar/add-github-submodule-update
Add dependabot config to update libnvidia-container submodule
2024-03-12 13:47:17 +02:00
Evan Lezar
9451da1e6d Add dependabot config to update libnvidia-container submodule
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-12 13:45:04 +02:00
Evan Lezar
e30ddb398f Merge pull request #397 from jbujak/enumAdapters3
Use D3DKMTEnumAdapters3 for adapter enumeration
2024-03-12 13:33:25 +02:00
dependabot[bot]
375188495e Bump golang.org/x/sys from 0.17.0 to 0.18.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.17.0 to 0.18.0.
- [Commits](https://github.com/golang/sys/compare/v0.17.0...v0.18.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-12 11:14:05 +00:00
Evan Lezar
5ff48a5a89 Merge pull request #401 from NVIDIA/dependabot/go_modules/main/golang.org/x/mod-0.16.0
Bump golang.org/x/mod from 0.15.0 to 0.16.0
2024-03-12 13:13:24 +02:00
Jakub Bujak
44ae31d101 Use D3DKMTEnumAdapters3 for adapter enumeration
D3DKMTEnumAdapters3 is required to enumerate MCDM compute-only adapters

Signed-off-by: Jakub Bujak <jbujak@nvidia.com>
2024-03-12 11:42:08 +01:00
dependabot[bot]
942e5c7224 Bump golang.org/x/mod from 0.15.0 to 0.16.0
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.15.0 to 0.16.0.
- [Commits](https://github.com/golang/mod/compare/v0.15.0...v0.16.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-10 08:26:44 +00:00
Evan Lezar
88ad42ccd1 Add NVIDIA_VISIBLE_DEVICES=void to CDI specs
This change ensures taht NVIDIA_VISIBLE_DEVICES=void is included in
generated CDI specs. This prevents the NVIDIA Container Runtime Hook
from injecting devices if NVIDIA_VISIBLE_DEVICES=all is set.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-04 16:10:06 +02:00
Evan Lezar
df9732dae4 Merge pull request #369 from NVIDIA/dependabot/go_modules/main/github.com/opencontainers/runtime-spec-1.2.0
Bump github.com/opencontainers/runtime-spec from 1.1.0 to 1.2.0
2024-03-01 22:49:29 +02:00
Evan Lezar
e66cc6a7b1 Merge pull request #392 from elezar/fix-non-fork-image-builds
Allow finer control of pushing images
2024-03-01 22:49:01 +02:00
dependabot[bot]
8f5a9a1918 Bump github.com/opencontainers/runtime-spec from 1.1.0 to 1.2.0
Bumps [github.com/opencontainers/runtime-spec](https://github.com/opencontainers/runtime-spec) from 1.1.0 to 1.2.0.
- [Release notes](https://github.com/opencontainers/runtime-spec/releases)
- [Changelog](https://github.com/opencontainers/runtime-spec/blob/main/ChangeLog)
- [Commits](https://github.com/opencontainers/runtime-spec/compare/v1.1.0...v1.2.0)

---
updated-dependencies:
- dependency-name: github.com/opencontainers/runtime-spec
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-01 22:42:29 +02:00
Evan Lezar
6b9dee5b77 Allow finer control of pushing images
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-01 22:42:13 +02:00
Evan Lezar
50bbf32cf0 Add GOLANGCI_LINT_CACHE to docker make targets
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-01 22:27:06 +02:00
Evan Lezar
413c1264ce Merge pull request #391 from elezar/add-lint-excludes-for-prestart
Add lint exclude for hooks.Prestart deprecation
2024-03-01 22:20:54 +02:00
Evan Lezar
c084756e48 Add lint exclude for hooks.Prestart deprecation
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-01 22:18:27 +02:00
Evan Lezar
6265a2d89e Merge pull request #389 from NVIDIA/dependabot/go_modules/main/github.com/stretchr/testify-1.9.0
Bump github.com/stretchr/testify from 1.8.4 to 1.9.0
2024-03-01 22:16:05 +02:00
dependabot[bot]
72778ee536 Bump github.com/stretchr/testify from 1.8.4 to 1.9.0
Bumps [github.com/stretchr/testify](https://github.com/stretchr/testify) from 1.8.4 to 1.9.0.
- [Release notes](https://github.com/stretchr/testify/releases)
- [Commits](https://github.com/stretchr/testify/compare/v1.8.4...v1.9.0)

---
updated-dependencies:
- dependency-name: github.com/stretchr/testify
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-01 15:45:26 +00:00
Evan Lezar
2f11a190bf Merge pull request #388 from elezar/add-gh-pages-dependabot
Add dependabot for gh-pages branches
2024-03-01 17:05:02 +02:00
Evan Lezar
2d394f4624 Add dependabot for gh-pages branches
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-01 16:55:14 +02:00
Evan Lezar
ea55757bc3 Merge pull request #382 from elezar/remove-centos7-image
Remove centos7 container-toolkit image
2024-02-29 12:01:26 +02:00
Evan Lezar
2a620dc845 Merge pull request #387 from elezar/update-libnvidia-container
Update libnvidia-container
2024-02-29 12:00:34 +02:00
Evan Lezar
bad5369760 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-29 11:53:51 +02:00
Evan Lezar
2623e8a707 Allow finer control of pushing images
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-29 11:45:40 +02:00
Evan Lezar
05dd438489 Remove centos7 container-toolkit image
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-28 15:45:09 +02:00
Evan Lezar
6780afbed1 Merge pull request #379 from NVIDIA/exists-fallback
[R550 driver support] add fallback logic to device.Exists(name)
2024-02-27 22:25:33 +02:00
Tariq Ibrahim
f80f4c485d [R550 driver support] add fallback logic to device.Exists(name)
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
2024-02-27 11:59:35 -08:00
Evan Lezar
ac63063362 Merge pull request #375 from klueska/add-imex-support
Add imex support
2024-02-27 13:24:07 +02:00
Kevin Klues
761a425e0d Update reference to latest libnvidia-container
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2024-02-27 10:17:32 +01:00
Kevin Klues
296d4560b0 Add support for an NVIDIA_IMEX_CHANNELS envvar
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2024-02-26 20:09:43 +01:00
Evan Lezar
0409824106 Merge pull request #370 from elezar/remove-libnvidia-container0-dependency
Remove additional libnvidia-container0 dependency
2024-02-19 13:56:02 +01:00
Evan Lezar
562addc3c6 Remove additional libnvidia-container0 dependency
This change removes the additional libnvidia-container0=0.10.0+jetpack dependency
that was introduced for Tegra-based systems. These have since been migrated to
CDI-based direct injection using the NVIDIA Container Runtime.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-19 13:12:57 +01:00
Evan Lezar
ae82b2d9b6 Merge pull request #345 from elezar/allow-skip-of-device-nodes
Add --create-device-nodes option to toolkit config
2024-02-14 20:28:15 +01:00
Evan Lezar
355997d2d6 Merge pull request #314 from elezar/CNT-4032/mulitple-naming-strategies
Allow multiple naming strategies when generating CDI specification
2024-02-13 16:46:54 +01:00
Evan Lezar
b6efd3091d Use index and uuid as default device-name-strategies
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-13 16:38:18 +01:00
Evan Lezar
52da12cf9a Allow multiple device name strategies to be specified
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-13 16:38:05 +01:00
Evan Lezar
cd7d586afa Also ignore CDI errors if required
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-13 12:37:41 +01:00
Evan Lezar
cc4c2783a3 Add --create-device-nodes option to toolkit config
This change adds a --create-device-nodes option to the toolkit config CLI.
Most noteably, this allows the creation of control devices to be skipped
when CDI spec generation is enabled.

Currently values of "", "node", and "control" are supported and can be set
via the command line flag or the CREATE_DEVICE_NODES environment variable.

The default value of CREATE_DEVICE_NODES=control will trigger the creation
of control device nodes. Setting this envvar to include the (comma-separated)
strings of "" or "none" will disable device node creation regardless of
whether other supported strings are included.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-13 12:37:41 +01:00
Evan Lezar
a8d48808d7 Merge pull request #354 from elezar/add-release-1.14-dependabot
Add dependabot for release-1.14
2024-02-12 17:34:38 +01:00
Evan Lezar
aa724f1ac6 Add dependabot for release-1.14
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-12 17:32:18 +01:00
Evan Lezar
519b9f3cc8 Merge pull request #353 from elezar/update-changelog
Update changelog for #330
2024-02-12 17:29:06 +01:00
Evan Lezar
6e1bc0d7fb Update changelog for #330
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-12 17:28:40 +01:00
Evan Lezar
a2a1a78620 Merge pull request #330 from tariq1890/nvidia-dev-maj-num-lookup
add fallback logic when retrieving major number of the nvidia control device
2024-02-12 17:22:32 +01:00
Evan Lezar
ab7693ac9f Merge pull request #347 from NVIDIA/dependabot/go_modules/main/golang.org/x/mod-0.15.0
Bump golang.org/x/mod from 0.14.0 to 0.15.0
2024-02-12 15:47:01 +01:00
dependabot[bot]
f4df5308d0 Bump golang.org/x/mod from 0.14.0 to 0.15.0
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.14.0 to 0.15.0.
- [Commits](https://github.com/golang/mod/compare/v0.14.0...v0.15.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-12 13:35:09 +00:00
Evan Lezar
8dcc57c614 Merge pull request #348 from NVIDIA/dependabot/go_modules/main/github.com/sirupsen/logrus-1.9.3
Bump github.com/sirupsen/logrus from 1.9.0 to 1.9.3
2024-02-12 14:35:07 +01:00
Evan Lezar
6594f06e9a Merge pull request #349 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.17.0
Bump golang.org/x/sys from 0.16.0 to 0.17.0
2024-02-12 14:34:27 +01:00
Evan Lezar
8a706a97a0 Merge pull request #350 from NVIDIA/dependabot/github_actions/golangci/golangci-lint-action-4
Bump golangci/golangci-lint-action from 3 to 4
2024-02-12 14:33:43 +01:00
Evan Lezar
39f0bf21ce Merge pull request #346 from elezar/consistent-driver-root
Specify DRIVER_ROOT consistently
2024-02-12 14:33:15 +01:00
dependabot[bot]
0915a12e38 Bump golangci/golangci-lint-action from 3 to 4
Bumps [golangci/golangci-lint-action](https://github.com/golangci/golangci-lint-action) from 3 to 4.
- [Release notes](https://github.com/golangci/golangci-lint-action/releases)
- [Commits](https://github.com/golangci/golangci-lint-action/compare/v3...v4)

---
updated-dependencies:
- dependency-name: golangci/golangci-lint-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-12 08:13:37 +00:00
dependabot[bot]
e6cd897cc4 Bump golang.org/x/sys from 0.16.0 to 0.17.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.16.0 to 0.17.0.
- [Commits](https://github.com/golang/sys/compare/v0.16.0...v0.17.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-11 08:49:59 +00:00
dependabot[bot]
35600910e0 Bump github.com/sirupsen/logrus from 1.9.0 to 1.9.3
Bumps [github.com/sirupsen/logrus](https://github.com/sirupsen/logrus) from 1.9.0 to 1.9.3.
- [Release notes](https://github.com/sirupsen/logrus/releases)
- [Changelog](https://github.com/sirupsen/logrus/blob/master/CHANGELOG.md)
- [Commits](https://github.com/sirupsen/logrus/compare/v1.9.0...v1.9.3)

---
updated-dependencies:
- dependency-name: github.com/sirupsen/logrus
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-11 08:49:55 +00:00
Evan Lezar
f89cef307d Specify DRIVER_ROOT consistently
This change ensures that CLI tools that require the path to the
driver root accept both the NVIDIA_DRIVER_ROOT and DRIVER_ROOT
environment variables in addition to the --driver-root command
line argument.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-09 14:28:56 +01:00
Evan Lezar
e96edb3f36 Merge pull request #342 from elezar/add-spec-dirs-to-cdi-list
Add spec-dir flag to nvidia-ctk cdi list command
2024-02-09 14:17:45 +01:00
Evan Lezar
bab4ec30af Improve error reporting for cdi list
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-08 14:58:48 +01:00
Evan Lezar
b6ab444529 Add spec-dirs argument to cdi list
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-08 14:50:14 +01:00
Evan Lezar
15d905def0 Merge pull request #309 from NVIDIA/dependabot/go_modules/main/github.com/urfave/cli/v2-2.27.1
Bump github.com/urfave/cli/v2 from 2.3.0 to 2.27.1
2024-02-07 11:01:46 +01:00
Evan Lezar
e64b723b71 Add proc.devices.New constructor
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-06 11:19:43 +01:00
Evan Lezar
f0545dd979 Merge pull request #333 from elezar/test-on-darwin
Fix build and tests targets on darwin
2024-02-06 10:16:04 +01:00
Tariq Ibrahim
f414ac2865 add fallback logic when retrieving major number of the nvidia control device
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
2024-02-05 22:55:54 -08:00
Evan Lezar
772cf77dcc Fix build and test on darwin
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-05 23:58:28 +01:00
dependabot[bot]
026055a0b7 Bump github.com/urfave/cli/v2 from 2.3.0 to 2.27.1
Bumps [github.com/urfave/cli/v2](https://github.com/urfave/cli) from 2.3.0 to 2.27.1.
- [Release notes](https://github.com/urfave/cli/releases)
- [Changelog](https://github.com/urfave/cli/blob/main/docs/CHANGELOG.md)
- [Commits](https://github.com/urfave/cli/compare/v2.3.0...v2.27.1)

---
updated-dependencies:
- dependency-name: github.com/urfave/cli/v2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-05 15:04:18 +00:00
Evan Lezar
812e6a2402 Merge pull request #308 from NVIDIA/dependabot/go_modules/main/github.com/pelletier/go-toml-1.9.5
Bump github.com/pelletier/go-toml from 1.9.4 to 1.9.5
2024-02-05 16:03:36 +01:00
Evan Lezar
b56aebb26f Merge pull request #310 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.16.0
Bump golang.org/x/sys from 0.7.0 to 0.16.0
2024-02-05 10:39:58 +01:00
dependabot[bot]
870903e03e Bump golang.org/x/sys from 0.7.0 to 0.16.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.7.0 to 0.16.0.
- [Commits](https://github.com/golang/sys/compare/v0.7.0...v0.16.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-05 09:33:36 +00:00
dependabot[bot]
b233a8b6ba Bump github.com/pelletier/go-toml from 1.9.4 to 1.9.5
Bumps [github.com/pelletier/go-toml](https://github.com/pelletier/go-toml) from 1.9.4 to 1.9.5.
- [Release notes](https://github.com/pelletier/go-toml/releases)
- [Changelog](https://github.com/pelletier/go-toml/blob/v2/.goreleaser.yaml)
- [Commits](https://github.com/pelletier/go-toml/compare/v1.9.4...v1.9.5)

---
updated-dependencies:
- dependency-name: github.com/pelletier/go-toml
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-05 09:32:35 +00:00
Evan Lezar
e96e1baed5 Merge pull request #336 from NVIDIA/dependabot/go_modules/main/github.com/fsnotify/fsnotify-1.7.0
Bump github.com/fsnotify/fsnotify from 1.5.4 to 1.7.0
2024-02-05 10:31:45 +01:00
dependabot[bot]
dce368e308 Bump github.com/fsnotify/fsnotify from 1.5.4 to 1.7.0
Bumps [github.com/fsnotify/fsnotify](https://github.com/fsnotify/fsnotify) from 1.5.4 to 1.7.0.
- [Release notes](https://github.com/fsnotify/fsnotify/releases)
- [Changelog](https://github.com/fsnotify/fsnotify/blob/main/CHANGELOG.md)
- [Commits](https://github.com/fsnotify/fsnotify/compare/v1.5.4...v1.7.0)

---
updated-dependencies:
- dependency-name: github.com/fsnotify/fsnotify
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-04 08:26:11 +00:00
Evan Lezar
15f609a52d Merge pull request #332 from elezar/filter-actions
Enable a subset of package and image builds in PRs
2024-02-02 11:34:10 +01:00
Evan Lezar
0bf08085ce Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-02 09:21:54 +01:00
Evan Lezar
da68ad393c Enable subset of image builds in PRs
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-02 09:20:05 +01:00
Evan Lezar
2f3600af9a Merge pull request #329 from elezar/migrate-libnvidia-container-to-github
Update libnvidia-container to github ref
2024-02-01 17:08:55 +01:00
Evan Lezar
0ff28aa21b Update libnvidia-container to github ref
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-01 16:36:10 +01:00
Evan Lezar
b88ff4470c Merge pull request #328 from NVIDIA/revert-323-sheckler/cuda-12.3.2
Revert "chore: Update CUDA base image to 12.3.2"
2024-02-01 16:28:59 +01:00
Evan Lezar
cfb1daee0a Revert "chore: Update CUDA base image to 12.3.2" 2024-02-01 16:27:53 +01:00
Evan Lezar
e3ab55beed Merge pull request #323 from heckler1/sheckler/cuda-12.3.2
chore: Update CUDA base image to 12.3.2
2024-02-01 14:02:33 +01:00
Evan Lezar
9530d9949f Merge pull request #325 from elezar/remove-jenkinsfile
Remove unneeded Jenkinsfile
2024-02-01 08:41:41 +01:00
Evan Lezar
6b2cd487a6 Remove unneeded Jenkinsfile
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-31 22:05:01 +01:00
Stephen Heckler
e5ec408a5c Release 1.15.0-rc4
Signed-off-by: Stephen Heckler <sheckler@cloudflare.com>
2024-01-31 13:49:31 -06:00
Stephen Heckler
301b666790 chore: Update CUDA base image to 12.3.2
Signed-off-by: Stephen Heckler <sheckler@cloudflare.com>
2024-01-31 13:48:05 -06:00
Evan Lezar
e99b519509 Merge pull request #321 from elezar/bump-libnvidia-container
Update libnvidia-container
2024-01-30 16:44:01 +01:00
Evan Lezar
d123273800 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-30 16:11:57 +01:00
Evan Lezar
07136d9ac4 Merge pull request #312 from NVIDIA/dependabot/go_modules/main/golang.org/x/mod-0.14.0
Bump golang.org/x/mod from 0.5.0 to 0.14.0
2024-01-30 16:04:48 +01:00
Evan Lezar
0ef06be477 Merge pull request #320 from elezar/remove-centos8-jobs
Remove centos8 jobs
2024-01-30 16:03:30 +01:00
Evan Lezar
5a70e75547 Use stable/rpm repo for release tests
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-30 12:40:57 +01:00
Evan Lezar
46b4cd7b03 Remove unused centos8 jobs
This change removes the centos8-x86_64 and centos8-aarch64 pipeline jobs.

These packages are no longer used since centos7 packages are used instead.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-30 12:40:16 +01:00
Evan Lezar
93e15bc641 Merge pull request #319 from elezar/bump-version-v1.15.0-rc.3
Bump version to v1.15.0-rc.3
2024-01-30 11:41:17 +01:00
Evan Lezar
07d1f48778 Bump version to v1.15.0-rc.3
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-30 11:18:17 +01:00
Carlos Eduardo Arango Gutierrez
21ed60bc46 Merge pull request #313 from elezar/fix-update-ldconfig
Fix bug in update-ldcache hook
2024-01-29 18:24:33 +01:00
dependabot[bot]
7abbd98ff0 Bump golang.org/x/mod from 0.5.0 to 0.14.0
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.5.0 to 0.14.0.
- [Commits](https://github.com/golang/mod/compare/v0.5.0...v0.14.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-01-29 14:20:35 +00:00
Evan Lezar
862f071557 Fix bug in update-ldcache hook
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-29 15:20:21 +01:00
Carlos Eduardo Arango Gutierrez
73cd63e4e5 Merge pull request #307 from ArangoGutierrez/githubactions
Add GitHub Actions
2024-01-29 15:18:32 +01:00
Carlos Eduardo Arango Gutierrez
6857f538a6 Add github actions
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-01-29 14:32:02 +01:00
Carlos Eduardo Arango Gutierrez
195e3a22b4 Drop go checks from gitlab ci
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-01-28 18:23:20 +01:00
Carlos Eduardo Arango Gutierrez
88debb8e34 Use test-infra devel image
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-01-28 18:22:51 +01:00
Christopher Desiniotis
03cbf9c6cd Merge branch 'CNT-4798/gdrcopy' into 'main'
Add a new gated modifier for GDRCopy which injects the gdrdrv device node

See merge request nvidia/container-toolkit/container-toolkit!530
2024-01-24 23:17:15 +00:00
Christopher Desiniotis
55097b3d7d Add a new gated modifier for GDRCopy which injects the gdrdrv device node
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2024-01-24 14:25:58 -08:00
Evan Lezar
738ebd83d3 Merge branch 'fix-cdi-enable-docker' into 'main'
Fix --cdi.enabled for Docker

See merge request nvidia/container-toolkit/container-toolkit!541
2024-01-23 20:16:12 +00:00
Tariq Ibrahim
9c6dd94ac8 Merge branch 'disc-typos' into 'main'
fix minor typos and rm unused logger param

See merge request nvidia/container-toolkit/container-toolkit!540
2024-01-23 18:42:21 +00:00
Evan Lezar
f936f4c0bc Add --cdi.enable as alias for --cdi.enabled
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-23 14:57:15 +01:00
Evan Lezar
ab598f004d Fix --cdi.enabled for Docker
Instead of relying only on Experimental mode, the docker daemon
config requires that CDI is an opt-in feature.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-23 14:56:08 +01:00
Tariq Ibrahim
9c1f0bb08b fix minor typos and rm unused logger param
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
2024-01-22 16:48:11 -08:00
Evan Lezar
b3519fadc4 Merge branch 'ldconfig-path' into 'main'
Allow for customizing the path to ldconfig and call ldconfig with explicit cache/config paths

See merge request nvidia/container-toolkit/container-toolkit!525
2024-01-22 12:36:51 +00:00
Jared Baur
d80657dd0a Explicitly set ldconfig cache and config file
Since the `update-ldcache` hook uses the host's `ldconfig`, the default
cache and config files configured on the host will be used. If those
defaults differ from what nvidia-ctk expects it to be (/etc/ld.so.cache
and /etc/ld.so.conf, respectively), then the hook will fail. This change
makes the call to ldconfig explicit in which cache and config files are
being used.

Signed-off-by: Jared Baur <jaredbaur@fastmail.com>
2024-01-18 02:23:27 -08:00
Jared Baur
838493b8b9 Allow for customizing the path to ldconfig
Since the `createContainer` `runc` hook runs with the environment that
the container's config.json specifies, the path to `ldconfig` may not be
easily resolvable if the host environment differs enough from the
container (e.g. on a NixOS host where all binaries are under hashed
paths in /nix/store with an Ubuntu container whose PATH contains
FHS-style paths such as /bin and /usr/bin). This change allows for
specifying exactly where ldconfig comes from.

Signed-off-by: Jared Baur <jaredbaur@fastmail.com>
2024-01-17 21:07:00 -08:00
Evan Lezar
26a4eb327c Merge branch 'add-crun-as-configured-runtime' into 'main'
Set default low-level runtimes to runc, crun

See merge request nvidia/container-toolkit/container-toolkit!536
2024-01-17 21:28:14 +00:00
Evan Lezar
f6c252cbde Add crun as a default low-level runtime.
This change adds crun as a configured low-level runtime.
Note that runc still preferred and will be used if present on the
system.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-17 11:31:07 +01:00
Evan Lezar
11692a8499 Merge branch 'bump-cuda-12.3.1' into 'main'
Bump CUDA base image to 12.3.1

See merge request nvidia/container-toolkit/container-toolkit!535
2024-01-11 14:03:32 +00:00
Evan Lezar
ba3d80e8ea Merge branch 'fix-user-group' into 'main'
Fix bug in determining CLI user on SUSE systems

See merge request nvidia/container-toolkit/container-toolkit!532
2024-01-11 13:19:25 +00:00
Evan Lezar
9c029cac72 Fix bug in determining CLI user on SUSE systems
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-11 13:54:40 +01:00
Evan Lezar
dd065fa69e Bump CUDA base image to 12.3.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-11 10:35:54 +01:00
Evan Lezar
6f3d9307bb Merge branch 'update-libnvidia-container' into 'main'
Update libnvidia-container to a4ef85eb

See merge request nvidia/container-toolkit/container-toolkit!533
2024-01-10 14:14:39 +00:00
Evan Lezar
72584cd863 Update libnvidia-container to a4ef85eb
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-10 14:29:03 +01:00
Evan Lezar
2a7bfcd36b Merge branch 'fix-mid-device-nodes' into 'main'
Use devRoot to resolve MIG device nodes

See merge request nvidia/container-toolkit/container-toolkit!526
2024-01-09 14:58:19 +00:00
Evan Lezar
21fc1f24e4 Use devRoot to resolve MIG device nodes
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-09 15:40:17 +01:00
Evan Lezar
9396858834 Merge branch 'libnvdxgdmal' into 'main'
Add libnvdxgdmal library

See merge request nvidia/container-toolkit/container-toolkit!529
2024-01-09 14:37:08 +00:00
Jakub Bujak
79acd7acff Add libnvdxgdmal library
This change adds the new libnvdxgdmal.so.1 library to the list of files copied from the DriverStore.

Signed-off-by: Jakub Bujak <jbujak@nvidia.com>
2024-01-09 15:29:55 +01:00
Evan Lezar
fab711ddf3 Merge branch 'remove-libseccomp-dependency' into 'main'
Remove libseccomp package dependency

See merge request nvidia/container-toolkit/container-toolkit!531
2024-01-09 09:44:23 +00:00
Evan Lezar
760cf93317 Remove libseccomp package dependency
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-09 10:02:06 +01:00
Evan Lezar
f4838dde9b Merge branch 'log-requested-mode' into 'main'
Log explicitly requested runtime mode

See merge request nvidia/container-toolkit/container-toolkit!527
2024-01-08 11:42:42 +00:00
Evan Lezar
c90211e070 Log explicitly requested runtime mode
For users running the nvidia-container-runtime it would be useful
to determine the runtime mode used from the logs directly instead
of relying on other log messages as signals. This change ensures
that an explicitly selected mode is also logged instead of only
when mode=auto.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-12-15 15:35:35 +01:00
Evan Lezar
a2262d00cc Merge branch 'tegra-chardev-discover' into 'main'
Use `devRoot` for discovering character devices on Tegra platforms

See merge request nvidia/container-toolkit/container-toolkit!524
2023-12-15 14:34:07 +00:00
Jared Baur
95b8ebc297 Use devRoot for discovering character devices on Tegra platforms
Signed-off-by: Jared Baur <jaredbaur@fastmail.com>
2023-12-14 11:46:21 -08:00
Evan Lezar
99b3050d20 Merge branch 'update-changelog' into 'main'
Update changelog

See merge request nvidia/container-toolkit/container-toolkit!523
2023-12-14 15:35:45 +00:00
Evan Lezar
883f7ec3d8 Update changelog
See merge request nvidia/container-toolkit/container-toolkit!522

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-12-14 13:47:38 +01:00
Evan Lezar
9dd324be9c Merge branch 'tegra-dev-root' into 'main'
Fix using `devRoot` on Tegra platforms

See merge request nvidia/container-toolkit/container-toolkit!522
2023-12-14 09:46:05 +00:00
Jared Baur
508438a0c5 Fix using devRoot on Tegra platforms
Using `WithDevRoot` on Tegra platforms was incorrectly setting
`driverRoot`, fix it so that it correctly sets `devRoot`.

Signed-off-by: Jared Baur <jaredbaur@fastmail.com>
2023-12-13 19:56:02 -08:00
Christopher Desiniotis
9baed635d1 Merge branch 'CNT-4778/bump-gonvlib' into 'main'
Update to github.com/NVIDIA/go-nvlib@f3264c8a6a7a

See merge request nvidia/container-toolkit/container-toolkit!520
2023-12-13 18:11:59 +00:00
Christopher Desiniotis
895a5ed73a Update to github.com/NVIDIA/go-nvlib@f3264c8a6a7a
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-13 10:08:14 -08:00
Christopher Desiniotis
2d7b126bc9 Merge branch 'CNT-4762/extend-runtime-cdi-device-names' into 'main'
Extend the 'runtime.nvidia.com/gpu' CDI device kind to support full-GPUs...

See merge request nvidia/container-toolkit/container-toolkit!514
2023-12-06 17:10:26 +00:00
Christopher Desiniotis
86d86395ea Update changelog for the automatic CDI spec generation added for the 'runtime.nvidia.com/gpu' CDI kind
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-06 09:09:10 -08:00
Christopher Desiniotis
32c3bd1ded Fallback to standard CDI modifier when creation of automatic CDI modifier fails
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-06 09:02:19 -08:00
Christopher Desiniotis
3158146946 Extend the 'runtime.nvidia.com/gpu' CDI device kind to support MIG devices specified by index or UUID
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-06 09:02:19 -08:00
Christopher Desiniotis
def7d09f85 Refactor how device identifiers are parsed before performing automatic CDI spec generation
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-06 09:02:19 -08:00
Christopher Desiniotis
b9ac54b922 Add GetDeviceSpecsByID() API to the nvcdi Interface
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-06 09:02:19 -08:00
Christopher Desiniotis
ae1b7e126c Extend the 'runtime.nvidia.com/gpu' CDI device kind to support full-GPUs specified by index or UUID
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-06 09:02:19 -08:00
Evan Lezar
08ef3e7969 Merge branch 'bump-version-1.15.0-rc.2' into 'main'
Bump version to v1.15.0-rc.2

See merge request nvidia/container-toolkit/container-toolkit!519
2023-12-06 16:45:29 +00:00
Evan Lezar
ea977fb43e Bump version to v1.15.0-rc.2
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-12-06 17:45:08 +01:00
Evan Lezar
7b47eee634 Merge branch 'CNT-4774/implement-set-for-crio' into 'main'
Implement Set() for the crio implementation of engine.Interface

See merge request nvidia/container-toolkit/container-toolkit!517
2023-12-05 09:30:42 +00:00
Christopher Desiniotis
d7a3d93024 Implement Set() for the crio implementation of engine.Interface
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-04 15:27:35 -08:00
Christopher Desiniotis
527248ef5b Merge branch 'CNT-4764/cleanup-engine-interface' into 'main'
Refactor the engine.Interface such that the Set() API does not return an extraneous error

See merge request nvidia/container-toolkit/container-toolkit!515
2023-12-04 23:05:30 +00:00
Christopher Desiniotis
83ad09b179 Refactor the engine.Interface such that the Set() API does not return an extraneous error
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-01 15:59:34 -08:00
Evan Lezar
ffe7ed313a Merge branch 'goimports-local' into 'main'
run goimports -local against the entire codebase

See merge request nvidia/container-toolkit/container-toolkit!512
2023-12-01 10:54:26 +00:00
Tariq Ibrahim
7627d48a5c run goimports -local against the entire codebase
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-12-01 11:13:17 +01:00
Evan Lezar
5c78e2b7e6 Merge branch 'CNT-4659/transform-container-roots' into 'main'
Add transformer for container roots

See merge request nvidia/container-toolkit/container-toolkit!507
2023-12-01 09:38:39 +00:00
Evan Lezar
bc4e19aa48 Add --relative-to option to nvidia-ctk transform root
This change adds a --relative-to option to the nvidia-ctk transform root
command. This defaults to "host" maintaining the existing behaviour.

If --relative-to=container is specified, the root transform is applied to
container paths in the CDI specification instead of host paths.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-30 20:26:42 +01:00
Evan Lezar
879cc99aac Add transformer for container roots
This change renames the root transformer to indicate that it
operates on host paths and adds a container root transformer for
explicitly transforming container roots.

The transform.NewRootTransformer constructor still exists, but has
been marked as deprecated.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-30 20:26:42 +01:00
Evan Lezar
aa72dcde97 Merge branch 'golangci-lint-on-darwin' into 'main'
Allow make check to run on non-linux platforms

See merge request nvidia/container-toolkit/container-toolkit!509
2023-11-27 13:51:08 +00:00
Evan Lezar
a545810981 Allow make check to run on non-linux platforms
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-27 14:10:34 +01:00
Evan Lezar
cff50aa5d6 Merge branch 'bump-version-1.15.0-rc.1' into 'main'
Bump version to v1.15.0-rc.1

See merge request nvidia/container-toolkit/container-toolkit!508
2023-11-27 13:06:31 +00:00
Evan Lezar
84d857b497 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-27 13:22:52 +01:00
Evan Lezar
7840e7d650 Bump version to v1.15.0-rc.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-27 11:49:54 +01:00
Evan Lezar
c014f72ffb Merge branch 'fix-ldconfig-update' into 'main'
Fix incorrect ldconfig path

See merge request nvidia/container-toolkit/container-toolkit!505
2023-11-24 10:49:23 +00:00
Evan Lezar
893b3c1824 Fix incorrect ldconfig path
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-24 11:03:51 +01:00
Evan Lezar
097e203f1f Merge branch 'fix-config-update-command' into 'main'
Switch to reflect package for config updates

See merge request nvidia/container-toolkit/container-toolkit!500
2023-11-23 12:35:09 +00:00
Evan Lezar
671d787a42 Switch to reflect package for config updates
This change switches to using the reflect package to determine
the type of config options instead of inferring the type from the
Toml data structure.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-23 10:29:38 +01:00
Evan Lezar
fcc9922133 Merge branch 'CNT-4761/enable-cdi-in-docker' into 'main'
Add option to nvidia-ctk to enable CDI in docker

See merge request nvidia/container-toolkit/container-toolkit!499
2023-11-23 09:16:33 +00:00
Christopher Desiniotis
64fb26b086 Add option to nvidia-ctk to enable CDI in docker
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-23 10:15:58 +01:00
Evan Lezar
16a4de1a2b Merge branch 'CNT-4645/add-nvswitch-devices' into 'main'
Add support for NVIDIA_NVSWITCH envvar to inject nvidia-nvswitch device nodes

See merge request nvidia/container-toolkit/container-toolkit!502
2023-11-22 21:00:54 +00:00
Evan Lezar
efae501834 Add support for injecting NVSWITCH devices
This change adds support for an NVIDIA_NVSWITCH environment variable.
When set to `enabled` this striggers the injection of all available
/dev/nvidia-nvswitch* device nodes.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-22 21:59:39 +01:00
Evan Lezar
3045954cd9 Consolidate GDS and MOFED modifiers
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-22 21:59:17 +01:00
Evan Lezar
886c6b973e Merge branch 'library-search-path-cdi-generate' into 'main'
Locate libnvidia-egl-gbm.so.*

See merge request nvidia/container-toolkit/container-toolkit!504
2023-11-22 20:58:55 +00:00
Evan Lezar
1ab3ef0af4 Locate libnvidia-egl-gbm.so.*
Searching for a pattern allows platforms where no `.so` symlink
exists to function as expected.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-22 21:57:36 +01:00
Evan Lezar
dd9b13cb58 Merge branch 'bump-changelog' into 'main'
Add missing changelog

See merge request nvidia/container-toolkit/container-toolkit!503
2023-11-22 20:46:09 +00:00
Evan Lezar
8a7a6e8a70 Add missing changelog
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-22 20:51:12 +01:00
Evan Lezar
1909b1fe60 Merge branch 'library-search-path-cdi-generate' into 'main'
Allow search paths when locating libcuda.so

See merge request nvidia/container-toolkit/container-toolkit!462
2023-11-22 19:49:15 +00:00
Evan Lezar
881e440d22 Merge branch 'include-nvoptix' into 'main'
Update list of graphics mounts

See merge request nvidia/container-toolkit/container-toolkit!501
2023-11-22 19:47:52 +00:00
Evan Lezar
7d79b311d8 Include vulkan/icd.d/nvidia_layers.json
This change includes vulkan/icd.d/nvidia_layers.json in the list of
possible graphics mounts.
2023-11-22 13:54:12 +01:00
Evan Lezar
b46bc10c44 Include nvidia/nvoptix.bin in graphics mounts
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-22 13:53:59 +01:00
Evan Lezar
bbd9222206 Add driver root abstraction
This change adds a driver root abstraction that defines how
libraries are located relative to the root. This allows for
this driver root to be constructed once and passed to discovery
code.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-22 13:27:48 +01:00
Evan Lezar
f20ab793a2 Add support for specifying search paths
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-22 13:27:47 +01:00
Evan Lezar
e5391760e6 Remove duplicate not found error
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-21 18:09:42 +01:00
Evan Lezar
5505886655 Use options for NewLibraryLocator
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-21 18:08:53 +01:00
Evan Lezar
64f554ef41 Add builder for file locator
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-21 18:07:47 +01:00
Evan Lezar
fc8c5f82dc Merge branch 'fix-ldconfig-resolution' into 'main'
Resolve LDConfig path

See merge request nvidia/container-toolkit/container-toolkit!490
2023-11-21 16:45:21 +00:00
Evan Lezar
d792e64f38 Resolve ldconfig path in update-ldcache hook
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-21 15:31:12 +01:00
Evan Lezar
232df647c1 Resolve LDConfig path passed to nvidia-container-cli
Instead of relying solely on a static config, we resolve the path
to ldconfig. The path is checked for existence and a .real suffix is preferred.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-21 15:31:12 +01:00
Evan Lezar
adc516fd59 Merge branch 'ctk-hook-chmod-improve-eperm-handling' into 'main'
nvidia-ctk hook chmod: Improve permission error handling

See merge request nvidia/container-toolkit/container-toolkit!496
2023-11-21 11:05:03 +00:00
Evan Lezar
039d7fd324 Merge branch 'remove-config-import-from-discover' into 'main'
Remove NewGraphicsDiscoverer API simplification

See merge request nvidia/container-toolkit/container-toolkit!498
2023-11-20 22:52:02 +00:00
Christopher Desiniotis
2768023ff5 Merge branch 'gen-cdi-spec-at-runtime' into 'main'
Automatically generate CDI spec for the runtime.nvidia.com/gpu=all device

See merge request nvidia/container-toolkit/container-toolkit!492
2023-11-20 22:15:05 +00:00
Evan Lezar
255181a5ff Rename NewGraphicsDiscoverer as NewDRMNodesDiscoverer
This change renames NewGraphicsDiscoverer to NewDRMNodesDiscoverer and
instead calls NewGraphicsMountsDiscoverer explicitly when constructing
a graphics modifier.

This avoids the import of config.Config into the discover package
which leads to a transitive dependency on toml-specifics and
requires that the vendor/github.com/pelletier/ package
be vendored in to consumers.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-20 23:10:57 +01:00
Christopher Desiniotis
dc36ea76e8 Automatically generate CDI spec for the runtime.nvidia.com/gpu=all device
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-11-20 13:35:07 -08:00
Evan Lezar
e315d7d74b Merge branch 'allow-separate-dev-root' into 'main'
Add devRoot option to CDI api

See merge request nvidia/container-toolkit/container-toolkit!497
2023-11-20 21:10:12 +00:00
Evan Lezar
b4c6832828 Add additional debug
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-20 21:29:35 +01:00
Evan Lezar
3a96a00362 Simplify meta device discovery
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-20 21:29:35 +01:00
Evan Lezar
00a712d018 Add --dev-root option to CDI spec generation
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-20 21:29:35 +01:00
Evan Lezar
d4e21fdd10 Add devRoot option to CDI api
A driverRoot defines both the driver library root and the
root for device nodes. In the case of preinstalled drivers or
the driver container, these are equal, but in cases such as GKE
they do not match. In this case, drivers are extracted to a folder
and devices exist at the root /.

The changes here add a devRoot option to the nvcdi API that allows the
parent of /dev to be specified explicitly.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-20 21:29:35 +01:00
Ievgen Popovych
9085cb7dd5 nvidia-ctk hook chmod: Move file mode parsing into flag validation function
Signed-off-by: Ievgen Popovych <jmennius@gmail.com>
2023-11-20 14:49:29 +02:00
Evan Lezar
f6e3593a72 Merge branch 'update-go-nvlib' into 'main'
Update to github.com/NVIDIA/go-nvlib@9fd385bace0d2b8949cf60d9fcaab6169bde87ef

See merge request nvidia/container-toolkit/container-toolkit!495
2023-11-20 11:25:25 +00:00
Evan Lezar
f2ef7ee661 Update to github.com/NVIDIA/go-nvlib@9fd385bace0d2b8949cf60d9fcaab6169bde87ef
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-20 12:25:08 +01:00
Evan Lezar
27777f4dab Merge branch 'bump-libnvidia-container' into 'main'
Update libnvidia-container

See merge request nvidia/container-toolkit/container-toolkit!494
2023-11-20 11:11:15 +00:00
Evan Lezar
34175f15d3 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-20 10:57:35 +01:00
Ievgen Popovych
eb35d9b30a nvidia-ctk hook chmod: Ignore permission errors
In some cases we might get a permission error trying to chmod -
most likely this is due to something beyond our control
like whole `/dev` being mounted.
Do not fail container creation in this case.

Due to loosing control of the program after `exec()`-ing `chmod(1)` program
and therefore not being able to handle errors -
refactor to use `chmod(2)` syscall instead of `exec()` `chmod(1)` program.

Fixes: #143
Signed-off-by: Ievgen Popovych <jmennius@gmail.com>
2023-11-20 01:29:51 +02:00
Ievgen Popovych
f1d32f2cd3 nvidia-ctk hook chmod: Only chmod if desired permissions are different
This is to avoid any unnecessary potential errors (e.g. due to permissions).

Fixes: #143
Signed-off-by: Ievgen Popovych <jmennius@gmail.com>
2023-11-20 01:18:36 +02:00
Evan Lezar
ee713adf33 Merge branch 'fix-update-ldcache' into 'main'
Allow ldcache update in container to be skipped

See merge request nvidia/container-toolkit/container-toolkit!485
2023-11-18 11:37:56 +00:00
Christopher Desiniotis
33cb1b68df Merge branch 'deduplicate-symlinks' into 'main'
Deduplicate symlinks

See merge request nvidia/container-toolkit/container-toolkit!493
2023-11-17 17:09:18 +00:00
Evan Lezar
6dc9ee3f33 Allow ldcache update in container to be skipped
This change skips the update of ld.cache in the container if it
doesn't exist. Instead, the -N flag is used to only create the
relevant symlinks.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-17 11:56:19 +01:00
Evan Lezar
e609e41a64 Allow multiple pattern matches for symlinks
Since we allow pattern inputs for locating symlinks we could have
multiples. The error being checked is resolved by the deduplication.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-17 10:43:52 +01:00
Evan Lezar
80ecd024ee Add tests for library locator
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-17 10:43:52 +01:00
Evan Lezar
e8dbb216a5 Return empty ldcache if cache does not exist
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-17 10:14:03 +01:00
Christopher Desiniotis
f5d8d248b7 Deduplicate symlinks
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-11-16 17:57:31 -08:00
Evan Lezar
5d7ee25b37 Merge branch 'migrate-go-nvlib' into 'main'
Use github.com/NVIDIA/go-nvlib imports

See merge request nvidia/container-toolkit/container-toolkit!491
2023-11-15 20:39:35 +00:00
Evan Lezar
2ff2d84283 Update vendoring
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-15 21:38:54 +01:00
Evan Lezar
c63fb35ba8 Use github.com/NVIDIA/go-nvlib imports
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-15 21:38:26 +01:00
Evan Lezar
da0755769f Merge branch 'improve-library-lookup' into 'main'
Make CDI-based library discovery more robust

See merge request nvidia/container-toolkit/container-toolkit!488
2023-11-06 18:17:25 +00:00
Evan Lezar
04b28d116c Make library lookups more robust
These changes make library lookups more robust. The core change is that
library lookups now first look a set of predefined locations before checking
the ldcache. This also handles cases where an ldcache is not available more
gracefully.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-06 12:15:28 -06:00
Evan Lezar
65b0b2b5e0 Merge branch 'make-graphics-optional' into 'main'
Make discovery of graphics libraries optional

See merge request nvidia/container-toolkit/container-toolkit!489
2023-11-03 21:22:47 +00:00
Evan Lezar
8d52cc18ce Make discovery of graphics libraries optional
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-03 22:15:41 +01:00
Evan Lezar
c25376afa0 Merge branch 'update-cdi' into 'main'
Use tags.cncf.io for CDI imports

See merge request nvidia/container-toolkit/container-toolkit!487
2023-11-02 09:14:30 +00:00
Evan Lezar
7cc0c1f1cf Run go mod tidy
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-01 12:42:07 +01:00
Evan Lezar
e56bb09889 Use tags.cncf.io for CDI imports
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-01 12:40:51 +01:00
Evan Lezar
c7a7083e64 Merge branch 'add-option-to-enable-cdi' into 'main'
Add cdi.enabled config option to nvidia-ctk runtime configure command

See merge request nvidia/container-toolkit/container-toolkit!486
2023-11-01 09:54:19 +00:00
Evan Lezar
61595aa0fa Add cdi.enabled option to runtime configure
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-31 17:23:55 +01:00
Evan Lezar
b8b134f389 Ensure git works in container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-31 15:25:41 +01:00
Evan Lezar
c5a9ed6594 Merge branch 'support-cdi-mount-devices' into 'main'
Support CDI devices as mounts

See merge request nvidia/container-toolkit/container-toolkit!480
2023-10-27 20:04:46 +00:00
Evan Lezar
833254fa59 Support CDI devices as mounts
This change allows CDI devices to be requested as mounts in the
container. This enables their use in environments such as kind
where environment variables or annotations cannot be used.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-27 21:24:53 +02:00
Evan Lezar
1b1aae9c4a Merge branch 'golangci-lint' into 'main'
Use golanglint-ci to check code in NVIDIA Container Toolkit

See merge request nvidia/container-toolkit/container-toolkit!474
2023-10-24 18:59:46 +00:00
Evan Lezar
acc50969dc Fix ifElseChain lint errors
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:11:34 +02:00
Evan Lezar
48d68e4eff Add nolint for exec calls
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:11:34 +02:00
Evan Lezar
709e27bf4b Fix implicit memory aliasing in for loop
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:11:34 +02:00
Evan Lezar
1b16b341dd Fix default permissions
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:11:34 +02:00
Evan Lezar
2e1f94aedf Fix append assignments
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:11:34 +02:00
Evan Lezar
2f48ab99c3 Address singleCaseSwitch errors
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:11:34 +02:00
Evan Lezar
f8870b31be Fix filepath.Join with single arg
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:11:34 +02:00
Evan Lezar
73857eb8e3 Fix unnecessary conversion
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:11:34 +02:00
Evan Lezar
dd2f218226 Use MustCompile for static regexp
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:00:24 +02:00
Evan Lezar
8a9f367067 Check returned error values
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:00:24 +02:00
Evan Lezar
e0df157f70 Remove unnecessary assignment to the blank identifier
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:00:24 +02:00
Evan Lezar
f2c9937ca8 Use cdi parser package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:00:24 +02:00
Evan Lezar
12dc12ce09 Fix misspellings
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:00:24 +02:00
Evan Lezar
2fad708556 Address ioutil deprecation
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:00:24 +02:00
Evan Lezar
73749285d5 Remove unused loadSaver interface
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:00:24 +02:00
Evan Lezar
49dbae5c32 Use .golangci config for toml.Delete issues
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:00:24 +02:00
Evan Lezar
d8d56e18f9 Add nolint for dxcore
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:00:24 +02:00
Evan Lezar
7f610d19ed Add .golangci.yml config
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:00:24 +02:00
Evan Lezar
3eca7dfd7b Replace check targets with golangci-lint
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:00:24 +02:00
Evan Lezar
b02d5538b0 Merge branch 'update-container-runtime-readme' into 'main'
Update nvidia-container-runtime README

See merge request nvidia/container-toolkit/container-toolkit!484
2023-10-24 13:05:12 +00:00
Evan Lezar
ebff62f56b Update nvidia-container-runtime README
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-23 13:35:19 +02:00
Evan Lezar
53b24618a5 Merge branch 'bump-cuda-version' into 'main'
Add missing changelog entry

See merge request nvidia/container-toolkit/container-toolkit!483
2023-10-19 11:22:22 +00:00
Evan Lezar
867151fe25 Bump version to v1.14.3
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-19 13:20:23 +02:00
Evan Lezar
82fc309c4e Merge branch 'bump-cuda-version' into 'main'
Bump version to v1.14.3

See merge request nvidia/container-toolkit/container-toolkit!482
2023-10-19 09:32:30 +00:00
Evan Lezar
27521c0b5e Bump CUDA base image version to 12.2.2
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-19 10:47:13 +02:00
Evan Lezar
e611d4403b Bump version to v1.14.3
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-19 10:46:46 +02:00
Evan Lezar
807c87e057 Merge branch 'fix-symlink-discovery' into 'main'
Fix bug in creating symlinks in containers on Tegra-based systems

See merge request nvidia/container-toolkit/container-toolkit!479
2023-09-25 10:01:08 +00:00
Evan Lezar
f63ad3d9e7 Refactor symlink filter
This change refactors the use of the symlink filter to make it extendible.
A blocked filter can be set on the Tegra CSV discoverer to ensure that the correct
symlink libraries are filtered out. Here, globs can be used to select mulitple libraries,
and a **/ prefix on the globs indicates that the pattern that follows is only applied to
the filename of the symlink entry in the CSV file.

A --csv.ignore-pattern command line argument is added to the nvidia-ctk cdi generate
command that allows this to be set.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-09-22 22:04:06 +02:00
Evan Lezar
c4b4478d1a Remove default symlink filter
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-09-22 22:02:51 +02:00
Evan Lezar
963250a58f Refactor CSV discovery for testability
This change improves the testibility of the CSV discoverer.
This is done by adding injection points for mocks for library discovery and
symlink resolution.

Note that this highlights a bug in the current implementation where the
library filter causes valid symlinks to be skipped.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-09-22 22:02:30 +02:00
Evan Lezar
be570fce65 Bump version to 1.14.2
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-09-22 22:02:28 +02:00
Evan Lezar
6094effd58 Merge branch 'debug-no-cgroups' into 'main'
Update libnvidia-container

See merge request nvidia/container-toolkit/container-toolkit!478
2023-09-07 15:55:06 +00:00
Evan Lezar
7187608a36 Update libnvidia-container 2023-09-07 15:55:06 +00:00
Evan Lezar
a54d9d2118 Merge branch 'debug-no-cgroups' into 'main'
Update libnvidia-container

See merge request nvidia/container-toolkit/container-toolkit!477
2023-09-07 15:36:23 +00:00
Evan Lezar
56dd69ff1c Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-09-07 17:35:48 +02:00
Evan Lezar
89240cecae Merge branch 'debug-no-cgroups' into 'main'
Add required option to new toml config

See merge request nvidia/container-toolkit/container-toolkit!476
2023-09-07 11:16:34 +00:00
Evan Lezar
4ec9bd751e Add required option to new toml config
This change adds a "required" option to the new toml config
that controls whether a default config is returned or not.
This is useful from the NVIDIA Container Runtime Hook, where
/run/driver/nvidia/etc/nvidia-container-runtime/config.toml
is checked before the standard path.

This fixes a bug where the default config was always applied
when this config was not used.

See https://github.com/NVIDIA/nvidia-container-toolkit/issues/106

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-09-07 11:56:01 +02:00
Evan Lezar
d74f7fef4e Update libnvidia-container to fix rpm builds
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-09-07 11:55:46 +02:00
Evan Lezar
538d4020df Bump version to v1.14.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-09-06 17:51:52 +02:00
Evan Lezar
f2bd3173d4 Merge branch 'bump-version-v1.14.0' into 'main'
Bump verison to v1.14.0

See merge request nvidia/container-toolkit/container-toolkit!475
2023-08-29 14:45:37 +00:00
Evan Lezar
2bf8017516 Bump verison to v1.14.0
Note that v1.14.0-rc.3 was an internal-only release.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-29 16:03:41 +02:00
Evan Lezar
2a3afdd5d9 Merge branch 'fix-platform-detection' into 'main'
Add UsesNVGPUModule info function

See merge request nvidia/container-toolkit/container-toolkit!473
2023-08-28 15:58:41 +00:00
Evan Lezar
1dc028cdf2 Add UsesNVGPUModule info function
This change adds a UsesNVGPUModule function that checks whether the nvgpu
kernel module is used by NVML. This allows for more robust detection of
Tegra-based platforms where libnvidia-ml.so is supported to enumerate the
iGPU.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-25 11:24:34 +02:00
Evan Lezar
72c56567fe Merge branch 'CNT-4496/remove/sys/devices/soc' into 'main'
Remove /sys/devices/soc0/family from CDI spec

See merge request nvidia/container-toolkit/container-toolkit!472
2023-08-25 08:26:15 +00:00
Evan Lezar
ca1055588d Remove /sys/devices/soc0/family from CDI spec
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-25 10:25:33 +02:00
Evan Lezar
fca30d7acc Merge branch 'fix-config-file' into 'main'
Properly create output for config file

See merge request nvidia/container-toolkit/container-toolkit!471
2023-08-23 08:55:58 +00:00
Evan Lezar
5bf2209fdb Properly create output for config file
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-23 09:41:46 +02:00
Evan Lezar
f86a5abeb6 Merge branch 'CNT-4478/fix-unknown-devices' into 'main'
Update go-nvlib dependency to  v0.0.0-20230818092907-09424fdc8884

See merge request nvidia/container-toolkit/container-toolkit!470
2023-08-21 09:05:58 +00:00
Evan Lezar
9ac313f551 Instantiate nvpci.Interface with logger
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-18 11:40:17 +02:00
Evan Lezar
546f810159 Update go-nvlib dependency to v0.0.0-20230818092907-09424fdc8884
This change updates go-nvlib to include logic to skip NVIDIA PCI-E
devices where the name or class id is not known.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-18 11:40:15 +02:00
Evan Lezar
7affdafcd3 Merge branch 'CNT-4286/set-nvidia-visible-devices-to-void' into 'main'
Set NVIDIA_VISIBLE_DEVICES=void in toolkit-container

See merge request nvidia/container-toolkit/container-toolkit!469
2023-08-15 10:36:09 +00:00
Evan Lezar
7221b6b24b Set NVIDIA_VISIBLE_DEVICES=void in toolkit-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-15 11:52:56 +02:00
Tariq Ibrahim
f904ec41eb Merge branch 'log-unresolved-devices' into 'main'
add a warning statement listing unresolved CDI devices

See merge request nvidia/container-toolkit/container-toolkit!461
2023-08-14 17:22:36 +00:00
Evan Lezar
e7ae0f183c Merge branch 'rename-library-search-path' into 'main'
Add library-search-path option to cdi generate

See merge request nvidia/container-toolkit/container-toolkit!468
2023-08-14 13:49:26 +00:00
Evan Lezar
86df7c6696 Add library-search-path option to cdi generate
This change renames the csv.library-search-path option to
library-search-path so as to be more generally applicable in
future. Note that the option is still only applied in csv mode.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-14 15:04:33 +02:00
Evan Lezar
99923b57b8 Merge branch 'add-config-set-command' into 'main'
Allow config options to be set usign the nvidia-ctk config command

See merge request nvidia/container-toolkit/container-toolkit!464
2023-08-14 11:18:57 +00:00
Evan Lezar
4addb292b1 Extend nvidia-ctk config command to allow options to be set
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-14 11:33:26 +02:00
Evan Lezar
149a8d7bd8 Simplify nvidia-ctk config default command
This chagne simplifies the nvidia-ctk config default command.
By default it now outputs the default config to STDOUT, and can
optionally output this to file.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-14 11:32:54 +02:00
Evan Lezar
a69657dde7 Add config.Toml type to handle config files
This change introduced a config.Toml type that is used as the base for
config file processing and manipulation. This ensures that configs --
including commented values -- can be handled consistently.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-14 11:32:54 +02:00
Evan Lezar
c2d4de54b0 Add function to get config file path. 2023-08-14 11:32:54 +02:00
Evan Lezar
5216e89a70 Merge branch 'refactor-hook-configs' into 'main'
Migrate to internal/config.Config structs for the NVIDIA Container Runtime Hook config.

See merge request nvidia/container-toolkit/container-toolkit!463
2023-08-14 09:23:43 +00:00
Evan Lezar
96766aa719 Remove BurntSushi/toml go dependency
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-14 10:40:42 +02:00
Evan Lezar
3670e7b89e Refactor loading of hook configs
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-14 10:40:42 +02:00
Evan Lezar
b18ac09f77 Refactor handling of DriverCapabilities
This change consolidates the handling of NVIDIA_DRIVER_CAPABILITIES in the
interal/image package.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-14 10:40:42 +02:00
Evan Lezar
4dcaa61167 Use internal/config structs in hook
This change ensures that the Config structs from internal.Config
are used for the NVIDIA Container Runtime Hook config too.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-14 10:40:41 +02:00
Evan Lezar
8bf52e1dec Export config.GetDefault function
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-14 10:35:33 +02:00
Evan Lezar
e4722e9642 Merge branch 'add-runtime-hook-config' into 'main'
Add support for creating oci hook to nvidia-ctk

See merge request nvidia/container-toolkit/container-toolkit!467
2023-08-14 08:33:56 +00:00
Evan Lezar
65f6f46846 Remove installation of oci-nvidia-hook files in RPM packages
This change removes installation of the oci-nvidia-hook files.
These files conflict with CDI use in runtimes that support it.

The use of the hook should be considered deprecated on these platforms.

If a hook is required, the

nvidia-ctk runtime configure --config-mode=oci-hook

command should be used to create the hook file(s).

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-11 16:34:58 +02:00
Evan Lezar
f6a4986c15 Add support for creating oci hook to nvidia-ctk
This change extends the nvidia-ctk runtime configure command
with a --config-mode=oci-hook that creates an OCI hook json file.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-11 16:34:58 +02:00
Tariq Ibrahim
6d3b29f3ca add a warning statement listing unresolved CDI devices 2023-08-10 08:38:33 -07:00
Evan Lezar
30c0848487 Merge branch 'fix-libnvidia-container0-url' into 'main'
Use stable repo URL directly

See merge request nvidia/container-toolkit/container-toolkit!465
2023-08-10 14:31:06 +00:00
Evan Lezar
ee1b0c3e4f Use stable repo URL directly
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-10 16:30:25 +02:00
Evan Lezar
37ac294a11 Merge branch 'add-deb-and-rpm-repos' into 'main'
Publish generic deb and rpm repos.

See merge request nvidia/container-toolkit/container-toolkit!460
2023-08-10 13:35:12 +00:00
Evan Lezar
0d862efa9c Publish generic deb and rpm repos.
This change ensures that the centos7 and ubuntu18.04 packages are
published to the generic rpm and deb repos, respectively.

All other packages except the centos8-ppc64le packages are skipped
as these use cases are covered by the generic packages.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-09 17:54:41 +02:00
Evan Lezar
22d7b52a58 Merge branch 'set-libnvidia-container-version' into 'main'
Set libnvidia-container version to toolkit version

See merge request nvidia/container-toolkit/container-toolkit!459
2023-08-09 12:10:36 +00:00
Evan Lezar
9f1c9b2a31 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-09 13:24:48 +02:00
Evan Lezar
0483eebc7b Set libnvidia-container version to toolkit version
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-09 13:24:08 +02:00
Evan Lezar
54aacb4245 Merge branch 'list-shows-errors' into 'main'
Log registry refresh errors in cdi list

See merge request nvidia/container-toolkit/container-toolkit!458
2023-08-08 15:26:42 +00:00
Evan Lezar
5cb367e771 Merge branch 'sort-cdi-entities' into 'main'
Sort CDI entities in generated CDI specifications

See merge request nvidia/container-toolkit/container-toolkit!457
2023-08-08 14:11:17 +00:00
Evan Lezar
feb069a2e9 Log registry refresh errors in cdi list
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-08 16:00:36 +02:00
Evan Lezar
cbdbcd87ff Add sorter to simplifying transformer
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-08 15:27:04 +02:00
Evan Lezar
7a4d2cff67 Add merged CDI spec transformer
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-08 14:45:31 +02:00
Evan Lezar
5638f47cb0 Add sort CDI spec transoformer
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-08 14:45:31 +02:00
Evan Lezar
4c513d536b Merge branch 'improve-csv-cdi-spec-generation' into 'main'
Rework CSV file support to enable more robust CDI spec generation

See merge request nvidia/container-toolkit/container-toolkit!447
2023-08-04 16:40:15 +00:00
Evan Lezar
8553fce68a Specify library search paths for CSV CDI spec generation
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-04 16:49:30 +02:00
Evan Lezar
03a4e2f8a9 Skip symlinks to libraries
In order to properly handle systems with both iGPU and dGPU
drivers included, we skip "sym" mount specifications which
refer to .so or .so.[1-9] files.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-04 16:49:30 +02:00
Evan Lezar
918bd03488 Move tegra-specifics to new package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-04 16:49:30 +02:00
Evan Lezar
01a7f7bb8e Explicitly generate CDI spec for CSV mode
This change explicitly generates a CDI specification from
the supplied CSV files when cdi mode is detected. This
ensures consistency between the behaviour on Tegra-based
systems.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-04 16:49:30 +02:00
Evan Lezar
6b48cbd1dc Move CDI modifier to separate package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-04 16:49:30 +02:00
Evan Lezar
64a0a67eb4 Merge branch 'bump-version' into 'main'
Bump version to 1.14.0-rc.3

See merge request nvidia/container-toolkit/container-toolkit!456
2023-08-04 14:16:56 +00:00
Evan Lezar
93d9e18f04 Update libnvidia-container to 1.14.0~rc.3
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-04 15:17:00 +02:00
Evan Lezar
7c2c42b8da Bump version to 1.14.0-rc.3
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-04 15:12:29 +02:00
Evan Lezar
e4fee325cb Merge branch 'fix-hook' into 'main'
Handle empty root in config

See merge request nvidia/container-toolkit/container-toolkit!454
2023-07-19 12:45:49 +00:00
Evan Lezar
ec63533eb1 Ensure default config comments are consistent
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-19 14:37:49 +02:00
Evan Lezar
e51621aa7f Handle empty root in config
If the config.toml has an empty root specified, this could be
passed to the NVIDIA Container CLI through the --root flag
which causes argument parsing to fail. This change only
adds the --root flag if the config option is specified
and is non-empty.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-19 14:02:23 +02:00
Evan Lezar
80a78e60d1 Merge branch 'device-namer' into 'main'
Refactor device namer

See merge request nvidia/container-toolkit/container-toolkit!453
2023-07-18 14:16:01 +00:00
Evan Lezar
9f46c34587 Support device name strategies for Tegra devices
This change generates CDI specifications for Tegra devices
with the nvidia.com/gpu=0 name by default. The type-index
nameing strategy is also supported and will generate a device
with the name nvidia.com/gpu=gpu0.

The uuid naming strategy will raise an error if selected.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-18 16:13:38 +02:00
Evan Lezar
f07a0585fc Refactor device namer
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-18 16:13:37 +02:00
Evan Lezar
32ec10485e Merge branch 'lookup-functional-options' into 'main'
Use functional options when creating Symlink and Directory locators

See merge request nvidia/container-toolkit/container-toolkit!452
2023-07-18 13:39:23 +00:00
Evan Lezar
ce7d5f7a51 Use functional options when constructing direcory locator
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-18 15:36:03 +02:00
Evan Lezar
9b64d74f6a Use functional options when constructing Symlink locator
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-18 15:31:15 +02:00
Evan Lezar
99cc0aebd6 Merge branch 'pass-image-to-csv-constructor' into 'main'
Pass image when constructing CSV modifier

See merge request nvidia/container-toolkit/container-toolkit!451
2023-07-18 13:30:53 +00:00
Evan Lezar
cca343abb0 Pass image when constructing CSV modifier
Since the incoming OCI spec has already been parsed and used to
construct a CUDA image representation, pass this to the CSV
modifier constructor instead of re-creating an image representation.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-18 15:27:16 +02:00
Evan Lezar
f08e48e700 Merge branch 'set-cdi-spec-dirs-in-config' into 'main'
Set default spec dirs at config level

See merge request nvidia/container-toolkit/container-toolkit!450
2023-07-18 13:25:29 +00:00
Evan Lezar
e2f8d2a15f Set default spec dirs at config level
This change sets the default CDI spec dirs at a config level instead
of when a CDI runtime modifier is constructed. This makes this setting
consistent with other options such as the nvidia-ctk path.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-18 15:23:09 +02:00
Evan Lezar
2c5761d32e Merge branch 'bug-fixes' into 'main'
Minor fixes and cleanups

See merge request nvidia/container-toolkit/container-toolkit!449
2023-07-18 13:20:46 +00:00
Evan Lezar
3c9d95c62f Fix usage string in CLI
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-18 15:20:24 +02:00
Evan Lezar
481000b4ce Remove unused argument
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-18 15:20:24 +02:00
Evan Lezar
b2126722e5 Update vendoring
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-18 15:16:25 +02:00
Evan Lezar
083b789102 Use cdi parser package for IsQualiedName
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-18 15:16:25 +02:00
Evan Lezar
a564b38b7e Merge branch 'remove-centos7-aarch64-scan' into 'main'
Remove centos7-arm64 scan

See merge request nvidia/container-toolkit/container-toolkit!445
2023-07-17 14:29:17 +00:00
Evan Lezar
5427249cfc Remove centos7-arm64 scan
Since we don't publish a centos7-arm64 image, the scan does not
make sense.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-17 16:28:31 +02:00
Evan Lezar
032982ab9c Merge branch 'bump-dependencies' into 'main'
Bump dependencies

See merge request nvidia/container-toolkit/container-toolkit!444
2023-07-17 14:13:12 +00:00
Evan Lezar
96aeb9bf64 Update container-device-interface to v0.6.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-17 14:12:06 +02:00
Evan Lezar
c98f6ea395 Update containerized docker files for golang 1.20.5
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-17 14:10:05 +02:00
Evan Lezar
073f9cf120 Bump golang version to 1.20.5
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-17 14:06:48 +02:00
Evan Lezar
1fdd0c1248 Merge branch 'bump-changelog' into 'main'
Fix changelog for 1.14.0-rc.2

See merge request nvidia/container-toolkit/container-toolkit!443
2023-07-17 12:04:39 +00:00
Evan Lezar
a883c65dd6 Fix changelog for 1.14.0-rc.2 2023-07-17 12:04:38 +00:00
Evan Lezar
aac39f89cc Merge branch 'update-libnvidia-container' into 'main'
Include Shared Compiler Library (libnvidia-gpucomp.so) in the list of compute libaries.

See merge request nvidia/container-toolkit/container-toolkit!442
2023-07-13 12:57:40 +00:00
Evan Lezar
e25576d26d Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-13 14:15:33 +02:00
Evan Lezar
3626a13273 Merge branch 'fix-disable-require' into 'main'
Return empty requirements if NVIDIA_DISABLE_REQUIRE is true

See merge request nvidia/container-toolkit/container-toolkit!438
2023-07-11 11:48:35 +00:00
Evan Lezar
6750ce1667 Print invalid version on parse error
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-11 13:47:39 +02:00
Evan Lezar
1081cecea9 Return empty requirements if NVIDIA_DISABLE_REQUIRE is true
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-11 13:47:37 +02:00
Evan Lezar
7451e6eb75 Merge branch 'custom-firmware-paths' into 'main'
Add firmware search paths when generating CDI specifications

See merge request nvidia/container-toolkit/container-toolkit!439
2023-07-11 09:16:33 +00:00
Evan Lezar
81908c8cc9 Search custom firmware paths first
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-11 10:34:14 +02:00
Evan Lezar
d3d41a3e1d Simplify handling of custom firmware path
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-11 10:31:50 +02:00
Evan Lezar
0a37f8798a Add firmware search paths when generating CDI specifications
Path to locate the GSP firmware is explicitly set to /lib/firmware/nvidia.
Users may chose to install the GSP firmware in alternate locations where
the kernel would look for firmware on the root filesystem.

Add locate functionality which looks for the GSP firmware files in the
same location as the kernel would
(https://docs.kernel.org/driver-api/firmware/fw_search_path.html).

The paths searched in order are:
- path described in /sys/module/firmware_class/parameters/path
- /lib/firmware/updates/UTS_RELEASE/
- /lib/firmware/updates/
- /lib/firmware/UTS_RELEASE/
- /lib/firmware/

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-11 10:31:50 +02:00
Evan Lezar
4f89b60ab9 Merge branch 'remove-experimental-runtime' into 'main'
Remove NVIDIA experimental runtime from toolkit container

See merge request nvidia/container-toolkit/container-toolkit!238
2023-07-10 10:25:55 +00:00
Evan Lezar
0938576618 Remove NVIDIA experimental runtime from toolkit container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-10 11:44:55 +02:00
Evan Lezar
4ca8d4173a Merge branch 'revert-d5cbe48d' into 'main'
Revert "Merge branch 'bump-golang-1.20.5' into 'main'"

See merge request nvidia/container-toolkit/container-toolkit!437
2023-07-05 15:12:01 +00:00
Evan Lezar
978549dc58 Revert "Merge branch 'bump-golang-1.20.5' into 'main'"
This reverts merge request !436
2023-07-05 15:11:41 +00:00
Evan Lezar
d5cbe48d59 Merge branch 'bump-golang-1.20.5' into 'main'
Bump golang version to 1.20.5

See merge request nvidia/container-toolkit/container-toolkit!436
2023-07-05 14:07:58 +00:00
Evan Lezar
e8ec795883 Bump golang version to 1.20.5
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-05 16:07:41 +02:00
Evan Lezar
62bc6b211f Merge branch 'bump-cuda-12.2.0' into 'main'
Bump cuda base image to 12.2.0

See merge request nvidia/container-toolkit/container-toolkit!435
2023-07-05 10:11:27 +00:00
Evan Lezar
6fac6c237b Bump cuda base image to 12.2.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-05 11:28:32 +02:00
Evan Lezar
20ff4e2fb9 Merge branch 'generate-default-config-post-install' into 'main'
Ensure that default config is created on the file system as a post-install step

See merge request nvidia/container-toolkit/container-toolkit!431
2023-07-05 09:27:29 +00:00
Evan Lezar
f78d3a858f Rework default config generation to not use toml
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-05 11:26:55 +02:00
Evan Lezar
bc6ca7ff88 Generate default config post-install
The debian and rpm packages are updated to trigger the generation of
of a default config if no config exists at the expected location.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-05 11:26:53 +02:00
Evan Lezar
65ae6f1dab Fix generation of default config
This change ensures that the nvidia-ctk config default command
generates a config file that is compatible with the official documentation
to, for example, disable cgroups in the NVIDIA Container CLI.

This requires that whitespace around comments is stripped before outputing the
contets.

This also adds an option to load a config and modify it in-place instead. This can
be triggered as a post-install step, for example.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-05 11:26:04 +02:00
Evan Lezar
ba24338122 Add quiet mode to nvidia-ctk cli
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-05 11:26:04 +02:00
Evan Lezar
2299c9588d Merge branch 'create-config-folders' into 'main'
Ensure that folders exist when creating config files

See merge request nvidia/container-toolkit/container-toolkit!433
2023-07-05 09:25:28 +00:00
Evan Lezar
ba80d0318f Merge branch 'rpm-fix-missing-coreutils-during-install' into 'main'
RPM spec: Avoid scriptlet failure during initial system installation

See merge request nvidia/container-toolkit/container-toolkit!432
2023-07-05 08:43:26 +00:00
Evan Lezar
6342dae0e9 Ensure that parent directories exist for config files
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-03 15:30:31 +02:00
Evan Lezar
baf94181aa Add engine.Config to encapsulate writing
This change adds an engine.Config type to encapsulate the writing
of config files for container engines.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-03 15:26:47 +02:00
Evan Lezar
bbe9742c46 Merge branch 'switch-to-latest-dind' into 'main'
Switch to latest dind image for tests

See merge request nvidia/container-toolkit/container-toolkit!430
2023-06-30 09:46:49 +00:00
Evan Lezar
1447ef3818 Switch to latest dind image for tests
The stable-dind image is out of date and has not been updated for 3 years.
This change updates to the latest dind image.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-30 11:03:07 +02:00
Claudius Volz
5598dbf9d7 RPM spec: Only run fixup code if the package is being upgraded, to avoid a scenario where the coreutils (mkdir, cp) are not available yet during a fresh system installation.
Signed-off-by: Claudius Volz <c.volz@gmx.de>
2023-06-29 00:23:24 +02:00
Evan Lezar
8967e851c4 Merge branch 'fix-multiple-driver-roots-wsl' into 'main'
Fix bug with multiple driver store paths

See merge request nvidia/container-toolkit/container-toolkit!425
2023-06-27 14:15:38 +00:00
Evan Lezar
15378f6ced Merge branch 'fix-ordering-of-envvars' into 'main'
Ensure common envvars have higher precedence

See merge request nvidia/container-toolkit/container-toolkit!426
2023-06-27 13:26:34 +00:00
Evan Lezar
4d2e8d1913 Ensure common envvars have higher precedence
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-27 14:45:15 +02:00
Evan Lezar
4feaee0fe6 Merge branch 'bump-version' into 'main'
Bump version to v1.14.0-rc.2

See merge request nvidia/container-toolkit/container-toolkit!427
2023-06-27 12:38:05 +00:00
Evan Lezar
51984d49cf Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-27 14:37:26 +02:00
Evan Lezar
a6a8bb940c Bump version to v1.14.0-rc.2
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-27 14:02:59 +02:00
Evan Lezar
6265e34afb Fix bug with multiple driver store paths
This change uses the actual discovered path of nvidia-smi when
creating a symlink to the binary on WSL2 platforms.

This ensures that cases where multiple driver store paths are
detected are supported.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-26 21:37:14 +02:00
Evan Lezar
d08a2394b3 Merge branch 'fix-package-archive-script' into 'main'
Fix package archive script

See merge request nvidia/container-toolkit/container-toolkit!424
2023-06-26 11:46:26 +00:00
Evan Lezar
c0f1263d78 Fix package archive script
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-26 13:46:08 +02:00
Evan Lezar
a25b1c1048 Merge branch 'fix-load-kernel-modules' into 'main'
Split internal system package

See merge request nvidia/container-toolkit/container-toolkit!420
2023-06-26 08:30:51 +00:00
Evan Lezar
99859e461d Merge branch 'import-wrapper-and-runtime' into 'main'
Import NVIDIA Docker and NVIDIA Container Runtime to in-tree folders

See merge request nvidia/container-toolkit/container-toolkit!418
2023-06-21 12:33:05 +00:00
Evan Lezar
d52dbeaa7a Split internal system package
This changes splits the functionality in the internal system package
into two packages: one for dealing with devices and one for dealing
with kernel modules. This removes ambiguity around the meaning of
driver / device roots in each case.

In each case, a root can be specified where device nodes are created
or kernel modules loaded.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-15 09:01:13 +02:00
Evan Lezar
c11c7695cb Merge branch 'update-go-nvlib' into 'main'
Update go-nvlib with new constructor API

See merge request nvidia/container-toolkit/container-toolkit!422
2023-06-14 22:50:12 +00:00
Evan Lezar
c4d3b13ae2 Update go-nvlib with new constructor
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-14 17:55:33 +02:00
Evan Lezar
bcf3a70174 Update vendoring
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-14 17:55:18 +02:00
Evan Lezar
743d290577 Merge branch 'CNT-4301/resolve-auto-to-cdi' into 'main'
Resolve auto mode to cdi if all devices are cdi devices

See merge request nvidia/container-toolkit/container-toolkit!421
2023-06-13 14:48:32 +00:00
Evan Lezar
82347eb9bc Resolve auto mode as cdi for fully-qualified names
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-13 16:05:37 +02:00
Evan Lezar
84c7bf8b18 Minor refactor of mode resolver
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-13 16:04:03 +02:00
Evan Lezar
d92300506c Construct CUDA image object once
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-13 10:36:02 +02:00
Evan Lezar
2da32970b9 Merge branch 'refactor-logger' into 'main'
Refactor Logging

See merge request nvidia/container-toolkit/container-toolkit!416
2023-06-12 08:59:41 +00:00
Evan Lezar
1d0a733487 Replace logger.Warn(f) with logger.Warning(f)
This aligns better with klog used in other projects.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-12 10:48:04 +02:00
Evan Lezar
9464953924 Use logger.Interface when resolving auto mode
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-12 10:46:11 +02:00
Evan Lezar
c9b05d8fed Use logger Interface in runtime configuration
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-12 10:46:11 +02:00
Evan Lezar
a02bc27c3e Define a basic logger interface
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-12 10:46:10 +02:00
Carlos Eduardo Arango Gutierrez
6a04e97bca Merge branch 'use-same-envvars-for-runtime-config' into 'main'
Allow same envars for all runtime configs

See merge request nvidia/container-toolkit/container-toolkit!357
2023-06-12 08:32:34 +00:00
Evan Lezar
0780621024 Ensure runtime dir is set for crio cleanup
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-11 12:46:08 +02:00
Evan Lezar
2bc0f45a52 Remove unused constants and variables
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-11 11:38:22 +02:00
Evan Lezar
178eb5c5a8 Rework restart logic
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-10 12:41:53 +02:00
Evan Lezar
761fc29567 Add version info to config CLIs
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-09 18:49:17 +02:00
Evan Lezar
9f5c82420a Refactor toolking to setup and cleanup configs
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-09 18:49:15 +02:00
Evan Lezar
23041be511 Add runtimeDir as argument
Thsi change adds the --nvidia-runtime-dir as a command line
argument when configuring container runtimes in the toolkit container.
This removes the need to set it via the command line.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-09 18:48:34 +02:00
Evan Lezar
dcbf4b4f2a Allow same envars for all runtime configs
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-09 18:46:34 +02:00
Evan Lezar
652345bc4d Add nvidia-container-runtime as folder
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-07 13:17:48 +02:00
Evan Lezar
69a1a9ef7a Add nvidia-docker as folder
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-07 13:15:23 +02:00
Evan Lezar
2464181d2b Remove runtime and wrapper submodules
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-07 13:09:18 +02:00
Evan Lezar
c3c1d19a5c Merge branch 'bump-libnvidia-container' into 'main'
Update libnvidia-container

See merge request nvidia/container-toolkit/container-toolkit!415
2023-06-06 19:22:43 +00:00
Evan Lezar
75f288a6e4 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-06 21:22:25 +02:00
Evan Lezar
94259baea1 Merge branch 'CNT-4302/cdi-only' into 'main'
Skip additional modifications in CDI mode

See merge request nvidia/container-toolkit/container-toolkit!413
2023-06-06 18:29:59 +00:00
Evan Lezar
9e8ff003b6 Merge branch 'bump-libnvidia-container' into 'main'
Bump libnvidia container

See merge request nvidia/container-toolkit/container-toolkit!414
2023-06-06 15:29:35 +00:00
Evan Lezar
3dee9d9a4c Support OpenSSL 3 with the Encrypt/Decrypt library
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-06 16:47:21 +02:00
Evan Lezar
3f03a71afd Skip additional modifications in CDI mode
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-05 15:01:58 +02:00
Evan Lezar
093e93cfbf Merge branch 'bump-cuda-base-image' into 'main'
Bump CUDA baseimage version to 12.1.1

See merge request nvidia/container-toolkit/container-toolkit!412
2023-06-01 12:42:48 +00:00
Evan Lezar
78f619b1e7 Bump CUDA baseimage version to 12.1.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-01 14:42:30 +02:00
Evan Lezar
43c44a0f48 Merge branch 'treat-log-errors-as-non-fatal' into 'main'
Ignore errors when creating debug log file

See merge request nvidia/container-toolkit/container-toolkit!404
2023-06-01 07:44:56 +00:00
Evan Lezar
6b1e8171c8 Merge branch 'add-mod-probe' into 'main'
Add option to load NVIDIA kernel modules

See merge request nvidia/container-toolkit/container-toolkit!409
2023-05-31 18:14:45 +00:00
Evan Lezar
2e50b3da7c Merge branch 'ldcache-resolve-circular' into 'main'
Fix infinite recursion when resolving libraries in LDCache

Closes #13

See merge request nvidia/container-toolkit/container-toolkit!406
2023-05-31 17:35:27 +00:00
Evan Lezar
eca13e72bf Update CHANGELOG
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-31 19:33:31 +02:00
Evan Lezar
b64ba6ac2d Add option to create device nodes
This change adds a --create-device-nodes option to the
nvidia-ctk system create-dev-char-symlinks command to create
device nodes. The currently only creates control device nodes.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-31 19:31:38 +02:00
Evan Lezar
7b801a0ce0 Add option to load NVIDIA kernel modules
These changes add a --load-kernel-modules option to the
nvidia-ctk system commands. If specified the NVIDIA kernel modules
(nvidia, nvidia-uvm, and nvidia-modeset) are loaded before any
operations on device nodes are performed.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-31 19:31:38 +02:00
Evan Lezar
528cbbb636 Merge branch 'fix-device-symlinks' into 'main'
Fix creation of device symlinks in /dev/char

See merge request nvidia/container-toolkit/container-toolkit!399
2023-05-31 17:31:04 +00:00
Evan Lezar
fd48233c13 Merge branch 'fix-ubi-pipeline-dependency' into 'main'
Fix ui8 image job dependencies

See merge request nvidia/container-toolkit/container-toolkit!411
2023-05-31 16:39:10 +00:00
Evan Lezar
b72764af5a Fix ui8 image job dependencies
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-31 17:54:41 +02:00
Evan Lezar
7e7c45fb0f Merge branch 'switch-to-centos7' into 'main'
Use centos7 packages instead of centos8 packages

See merge request nvidia/container-toolkit/container-toolkit!410
2023-05-31 15:17:08 +00:00
Evan Lezar
61f515b3dd Use centos7 packages in kitmaker archives
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-31 16:28:54 +02:00
Evan Lezar
e05686cbe8 Use centos7 packages for ubi8 image
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-31 16:26:43 +02:00
Carlos Eduardo Arango Gutierrez
1fc8ae32bd Merge branch 'rorajani-rename-ci' into 'main'
Rename blossom ci file

See merge request nvidia/container-toolkit/container-toolkit!408
2023-05-30 11:41:33 +00:00
rorajani
e80d43f4c4 Rename blossom ci file
Signed-off-by: rorajani <rorajani@nvidia.com>
2023-05-30 16:56:32 +05:30
Evan Lezar
a6b0f45d2c Fix infinite recursion when resolving ldcache
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-30 11:03:36 +02:00
Evan Lezar
39263ea365 Add command to print ldcache
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-30 11:02:33 +02:00
Evan Lezar
9ea214d0b3 Correct typo in info command
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-30 10:58:30 +02:00
Evan Lezar
5371ff039b Merge branch 'CNT-4285/add-runtime-hook-path' into 'main'
Add nvidia-contianer-runtime-hook.path config option

See merge request nvidia/container-toolkit/container-toolkit!401
2023-05-26 08:29:52 +00:00
Evan Lezar
315f4adb8f Check for required device majors
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-26 10:24:36 +02:00
Evan Lezar
05632c0a40 Treat missing nvidia device majors as an error
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-26 10:24:36 +02:00
Evan Lezar
8df4a98d7b Merge branch 'pre-sanity-check' into 'main'
Add pre sanity check for gothub repo

See merge request nvidia/container-toolkit/container-toolkit!396
2023-05-25 14:35:43 +00:00
Evan Lezar
02656b624d Create log directory if required
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-25 15:17:00 +02:00
Evan Lezar
61af2aee8e Ignore errors when creating debug log file
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-25 14:44:00 +02:00
Evan Lezar
ddebd69128 Use installed hook path in toolkit container
This change uses the installed NVIDIA Container Runtime Hook wrapper
as the path in the applied config. This prevents conflicts with other
installations of the NVIDIA Container Toolkit.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-25 12:05:33 +02:00
Evan Lezar
ac11727ec5 Add nvidia-contianer-runtime-hook.path config option
This change adds an nvidia-container-runtime-hook.path config option
to allow the path used for the prestart hook to be overridden. This
is useful in cases where multiple NVIDIA Container Toolkit installations
are present.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-25 12:05:33 +02:00
Evan Lezar
5748d220ba Merge branch 'add-centos7-aarch64' into 'main'
Add centos7-aarch64 CI jobs

See merge request nvidia/container-toolkit/container-toolkit!403
2023-05-24 14:48:59 +00:00
Evan Lezar
3b86683843 Add centos7-aarch64 CI jobs
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-24 16:48:14 +02:00
Evan Lezar
3bd5baa3c5 Merge branch 'add-centos7-aarch64' into 'main'
Add centos7-aarch64 targets

See merge request nvidia/container-toolkit/container-toolkit!402
2023-05-24 14:42:32 +00:00
Evan Lezar
330aa16687 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-24 15:32:34 +02:00
Evan Lezar
8a4d6b5bcf Add centos7-aarch64 targets
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-24 15:32:09 +02:00
Evan Lezar
40d0a88cf9 Merge branch 'update-go-nvlib' into 'main'
Update go-nvlib to skip non-MIG devices

See merge request nvidia/container-toolkit/container-toolkit!398
2023-05-24 08:38:21 +00:00
Evan Lezar
dc6a895db8 Merge branch 'pass-single-links-instead-of-csv' into 'main'
Pass individual links in create-symlinks hook instead of CSV filename

See merge request nvidia/container-toolkit/container-toolkit!394
2023-05-23 19:56:18 +00:00
Evan Lezar
3b1b89e6c0 Merge branch 'better-support-for-skipping-update' into 'main'
Skip update of components on SKIP_UPDATE_COMPONENTS=yes

See merge request nvidia/container-toolkit/container-toolkit!400
2023-05-23 19:17:29 +00:00
Evan Lezar
013a1b413b Fix ineffectual assignment
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-23 21:14:02 +02:00
Evan Lezar
3be16d8077 Create individual links instead of processing CSV
This change switches to generating a OCI runtime hook to create
individual symlinks instead of processing a CSV file in the hook.
This allows for better reuse of the logic generating CDI
specifications, for example.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-23 20:43:36 +02:00
Evan Lezar
927ec78b6e Add symlinks package with Resolve function
This change adds a symlinks.Resolve function for resolving symlinks and
updates usages across the code to make use of it.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-23 20:42:17 +02:00
Evan Lezar
8ca606f7ac Skip update of components on SKIP_UPDATE_COMPONENTS=yes
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-23 20:34:20 +02:00
Evan Lezar
e7d2a9c212 Merge branch 'CNT-1876/cdi-specs-from-csv' into 'main'
Add csv mode to CDI spec generation

See merge request nvidia/container-toolkit/container-toolkit!393
2023-05-23 14:47:19 +00:00
Evan Lezar
fcb4e379e3 Fix mode resolution tests
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-23 16:02:07 +02:00
rorajani
cda96f2f9e Add pre sanity
Signed-off-by: rorajani <rorajani@nvidia.com>
2023-05-22 20:39:50 +05:30
Evan Lezar
e11f65e51e Update go-nvlib to skip non-MIG devices
This change updates go-nvlib to ensure that non-migcapable GPUs
are skipped when generating CDI specifications for MIG devices.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-22 15:36:55 +02:00
Evan Lezar
3ea02d13fc Merge branch 'use-major-minor-for-cuda-version' into 'main'
Use *.* pattern when locating libcuda.so

See merge request nvidia/container-toolkit/container-toolkit!397
2023-05-22 13:02:33 +00:00
Evan Lezar
e30fd0f4ad Add csv mode to nvidia-ctk cdi generate command
This chagne allows the csv mode option to specified in the
nvidia-ctk cdi generate command and adds a --csv.file option
that can be repeated to specify the CSV files to be processed.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-22 13:56:45 +02:00
Evan Lezar
418e4014e6 Resolve to csv for CDI generation on tegra systems
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-22 13:56:00 +02:00
Evan Lezar
e78a4f5eac Add csv mode to nvcdi api
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-22 13:55:58 +02:00
Evan Lezar
540dbcbc03 Move tegra system mounts to tegra-specific discoverer
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-22 13:55:22 +02:00
Evan Lezar
a8265f8846 Add tegra discoverer
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-22 13:55:22 +02:00
Evan Lezar
c120c511d5 Merge branch 'CNT-3939/generate-all-device' into 'main'
Add options to generate all device to nvcdi API

See merge request nvidia/container-toolkit/container-toolkit!348
2023-05-22 11:53:39 +00:00
Evan Lezar
424b8c9d46 Use *.* pattern when locating libcuda.so
This change ensures that libcuda.so can be located on systems
where no patch version is specified in the driver version.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-22 13:53:19 +02:00
Evan Lezar
5bc72b70b8 Merge branch 'minor-refactor' into 'main'
Include xorg discoverer with graphics mounts

See merge request nvidia/container-toolkit/container-toolkit!392
2023-05-12 13:12:06 +00:00
Evan Lezar
fe37196788 Generate all device using merged transform
The nvcid api is extended to allow for merged device options to
be specified. If any options are specified, then a merged device
is generated.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-12 13:52:58 +02:00
Evan Lezar
ba44c50f4e Add MergedDevice transform to generate all device
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-12 13:52:58 +02:00
Evan Lezar
729ca941be Merge branch 'refactor-nvidia-ctk-path' into 'main'
Refactor discover.Config to prepare for CSV CDI spec generation.

See merge request nvidia/container-toolkit/container-toolkit!391
2023-05-12 10:45:36 +00:00
Evan Lezar
0ee947dba6 Merge branch 'CNT-4257/remove-redundant-packages' into 'main'
Remove config.toml from installation

See merge request nvidia/container-toolkit/container-toolkit!388
2023-05-12 10:41:47 +00:00
Evan Lezar
d1fd0a7384 Merge branch 'CNT-4270/centos7-packages' into 'main'
Publish centos7 packages as a kitmaker branch

See merge request nvidia/container-toolkit/container-toolkit!390
2023-05-11 10:10:18 +00:00
Evan Lezar
ae2c582138 Merge branch 'clean-scan-archives' into 'main'
Remove image archives after scan

See merge request nvidia/container-toolkit/container-toolkit!389
2023-05-11 10:10:05 +00:00
Evan Lezar
b7e5cef934 Include xorg discoverer with graphics mounts
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-10 17:07:55 +02:00
Evan Lezar
9378d0cd0f Move discover.FindNvidiaCTK to config.ResolveNVIDIACTKPath
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-10 15:12:44 +02:00
Evan Lezar
f9df36c473 Rename config struct to options
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-10 15:12:00 +02:00
Evan Lezar
8bb0235c92 Remove discover.Config
These changes remove the use of discover.Config which was used
to pass the driver root and the nvidiaCTK path in some cases.

Instead, the nvidiaCTKPath is resolved at the begining of runtime
invocation to ensure that this is valid at all points where it is
used.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-10 15:03:37 +02:00
Evan Lezar
fc310e429e Publish centos7 packages as a kitmaker branch
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-10 13:44:47 +02:00
Evan Lezar
8d0ffb2fa5 Remove unneeded targets from scripts
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-10 13:13:05 +02:00
Evan Lezar
9f07cc9ab2 Remove CVE_UPDATES from dockerfiles
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-10 13:10:10 +02:00
Evan Lezar
1fff80e10d Remove unused CI variables
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-10 13:10:10 +02:00
Evan Lezar
0a57cdc6e8 Remove redundant packaging targets from CI
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-10 13:10:10 +02:00
Evan Lezar
1a86b20f7c Remove config.toml from installation
Since the default configuration is now platform specific,
there is no need to install specific versions as part of
the package installation.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-10 13:10:10 +02:00
Evan Lezar
0068750a5c Remove image archives after scan
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-10 10:52:47 +02:00
Evan Lezar
ee47f26d1c Merge branch 'cdi-list' into 'main'
Add nvidia-ctk cdi list command

See merge request nvidia/container-toolkit/container-toolkit!387
2023-05-09 18:59:56 +00:00
Evan Lezar
3945abb2f2 Add nvidia-ctk cdi list command
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-09 19:59:00 +02:00
Evan Lezar
9de4f7f4b9 Merge branch 'CNT-4262/create-release-archives' into 'main'
Create and upload release archives to artifactory

See merge request nvidia/container-toolkit/container-toolkit!386
2023-05-09 14:38:10 +00:00
Evan Lezar
3610b5073b Add package release archive
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-09 16:01:53 +02:00
Evan Lezar
1991138185 Merge branch 'CNT-4259/check-for-images-before-release' into 'main'
Skip publishing of images if these already exist

See merge request nvidia/container-toolkit/container-toolkit!385
2023-05-09 09:03:52 +00:00
Evan Lezar
8ebc21cd1f Merge branch 'CNT-4260/add-packaging-scan-and-release' into 'main'
Add scan and release steps for packaging image

See merge request nvidia/container-toolkit/container-toolkit!384
2023-05-08 16:50:09 +00:00
Evan Lezar
1c1ce2c6f7 Use version from manifest to extract packages
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-08 16:32:35 +02:00
Evan Lezar
39b0830a66 Extract manifest from packaging image
Also include manifest.txt with, for example, version
info when extracting packages from the packagin image.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-08 15:55:27 +02:00
Evan Lezar
6b367445a3 Merge branch 'CNT-4016/add-nvidia-ctk-config-default' into 'main'
Add nvidia-ctk config generate-default command to generate default config file contents

See merge request nvidia/container-toolkit/container-toolkit!338
2023-05-08 10:40:42 +00:00
Evan Lezar
37c66fc33c Ensure that the nvidia-container-cli.user option is uncommented on suse
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-08 11:26:54 +02:00
Evan Lezar
1bd5798a99 Use toml representation to get defaults
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-08 11:26:53 +02:00
Evan Lezar
90c4c4811a Fallback to ldconfig if ldconfig.real does not exist
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-08 11:26:24 +02:00
Evan Lezar
49de170652 Generate default config.toml contents
This change adds a GetDefaultConfigToml function to the config package.

This function returns the default config in the form of raw TOML
including comments. This is useful for generating a default config at
installation time, with platform-specific differences codified.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-08 11:26:22 +02:00
Evan Lezar
07c89fa975 Always publish external images
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-08 10:27:48 +02:00
Evan Lezar
7a1f23e2e4 Skip publishing of images if these already exist
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-05 15:06:32 +02:00
Evan Lezar
25165b0771 Add scan and release steps for packaging image
This ensures that the artifacts associated with a particular
release version are preserved along with the container
images that are used as operands for this version.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-05 13:56:10 +02:00
Evan Lezar
3e7acec0b4 Add nvidia-ctk config generate-default command
This change adds a CLI command to generate a default config.
This config checks the host operating system to apply specific
modifications that were previously captured in static config
files.

These include:
* select /sbin/ldconfig or /sbin/ldconfig.real depending on which exists on the host
* set the user to allow device access on SUSE-based systems

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-03 16:11:05 +02:00
Evan Lezar
4165961d31 Rename config struct options to avoid conflict
This change renames the struct for storing CLI flag values options over
config to avoid a conflict with the config package.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-03 15:59:02 +02:00
Evan Lezar
2e3a12438a Fix toml definition in cli config struct
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-03 15:59:02 +02:00
Evan Lezar
731c99b52c Merge branch 'fix-cdi-permissions' into 'main'
Properly set spec permissions

See merge request nvidia/container-toolkit/container-toolkit!383
2023-05-03 08:47:58 +00:00
Evan Lezar
470b4eebd8 Properly set spec permissions
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-03 10:45:11 +02:00
Carlos Eduardo Arango Gutierrez
6750df8e01 Merge branch 'fix-cdi-spec-permissions' into 'main'
Generate CDI specifications with 644 permissions to allow non-root clients to consume them

See merge request nvidia/container-toolkit/container-toolkit!381
2023-05-02 19:36:40 +00:00
Evan Lezar
8736d1e78f Merge branch 'fix/minor-spelling' into 'main'
chore(cmd): Fixing minor spelling error.

See merge request nvidia/container-toolkit/container-toolkit!382
2023-05-02 17:58:13 +00:00
Elliot Courant
140b1e33ef chore(cmd): Fixing minor spelling error.
Fixed a minor spelling error inside `nvidia-ctk system create-device-nodes`.

Signed-off-by: Elliot Courant <me@elliotcourant.dev>
2023-05-02 12:53:45 -05:00
Evan Lezar
3056428eda Generate spec file with 644 permissions
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-02 16:47:44 +02:00
Evan Lezar
367a30827f Allow spec file permisions to be specified
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-02 16:27:50 +02:00
Evan Lezar
fe8ef9e0bd Merge branch 'fix-ld.so.conf-permissions' into 'main'
Create ld.so.conf file with permissions 644

See merge request nvidia/container-toolkit/container-toolkit!380
2023-05-02 10:51:40 +00:00
Evan Lezar
d77f46aa09 Create ld.so.conf file with permissions 644
By default, temporary files are created with permissions 600 and
this means that the files created when updating the ldcache are
not readable in non-root containers.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-02 12:51:27 +02:00
Evan Lezar
043e283db3 Merge branch 'nvidia-docker-as-meta-package' into 'main'
Support building nvidia-docker and nvidia-container-runtime as dist-independent packages

See merge request nvidia/container-toolkit/container-toolkit!379
2023-05-02 08:25:45 +00:00
Evan Lezar
2019f1e7ea Preserve timestamps when copying meta packages
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-26 17:03:36 +02:00
Evan Lezar
22c7178561 Build meta-packages before others
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-26 17:03:36 +02:00
Evan Lezar
525aeb102f Update third_party submodules
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-26 13:52:56 +02:00
Evan Lezar
9fb5ac36ed Allow update of subcomponents to be skipped
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-26 13:14:19 +02:00
Evan Lezar
c30764b7cc Update build all components for meta packages
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-26 13:14:19 +02:00
Evan Lezar
8a2de90c28 Merge branch 'revert-kitmaker-workaround' into 'main'
Remove workaround to add libnvidia-container0 to kitmaker archive

See merge request nvidia/container-toolkit/container-toolkit!378
2023-04-26 10:10:23 +00:00
Evan Lezar
243c439bb8 Remove workaround to add libnvidia-container0 to kitmaker archive
In order to add the libnvidia-container0 packages to our ubuntu18.04
kitmaker archive, a workaround was added that downloaded these packages
before constructing the archive. Since the packages have now been
published -- and will not change -- this workaround is not longer needed.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-26 11:47:39 +02:00
Evan Lezar
060ac46bd8 Merge branch 'bump-runc' into 'main'
Bump golang version and update dependencies

See merge request nvidia/container-toolkit/container-toolkit!377
2023-04-25 10:28:07 +00:00
Evan Lezar
ae2a683929 Run go fmt
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-25 11:27:58 +02:00
Evan Lezar
2b5eeb8d24 Regenerate mocks for formatting
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-25 11:26:55 +02:00
Evan Lezar
bbb94be213 Bump golang version to 1.20.3
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-25 10:42:22 +02:00
Evan Lezar
e1c75aec6c Bump runc version and update vendoring
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-25 10:40:46 +02:00
Carlos Eduardo Arango Gutierrez
3030d281d9 Merge branch 'engine_export' into 'main'
Export pkg config/engine

See merge request nvidia/container-toolkit/container-toolkit!375
2023-04-25 05:17:20 +00:00
Carlos Eduardo Arango Gutierrez
81d8b94cdc Export pkg config/engine
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-04-25 07:16:59 +02:00
Evan Lezar
276e1960b1 Merge branch 'CNT-2350/configure-containerd' into 'main'
Add support for containerd configs to nvidia-ctk runtime configure command

See merge request nvidia/container-toolkit/container-toolkit!355
2023-04-24 17:24:24 +00:00
Evan Lezar
70920d7a04 Add support for containerd to the runtime configure CLI
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-24 18:32:28 +02:00
Evan Lezar
f1e201d368 Refactor runtime configure cli
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-24 18:32:04 +02:00
Evan Lezar
ef863f5fd4 Merge branch 'bump-version-1.14.0-rc.1' into 'main'
Bump version to 1.14.0-rc.1

See merge request nvidia/container-toolkit/container-toolkit!376
2023-04-24 16:27:05 +00:00
Evan Lezar
ce65df7d17 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-24 17:33:37 +02:00
Evan Lezar
fa9c6116a4 Bump version to 1.14.0-rc.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-24 17:33:04 +02:00
Evan Lezar
28b70663f1 Merge branch 'skip-for-point-release' into 'main'
Skip components for patch releases

See merge request nvidia/container-toolkit/container-toolkit!374
2023-04-24 12:12:36 +00:00
Evan Lezar
c0fe8f27eb Skip components for patch releases
This change ensures that the nvidia-docker2 and nvidia-container-runtime
components are not build and distributed for patch releases.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-24 14:00:10 +02:00
Evan Lezar
926ac77bc0 Merge branch 'fix-cdi-spec-generation-on-debian' into 'main'
Resolve all symlinks when finding libraries in LDCache

See merge request nvidia/container-toolkit/container-toolkit!370
2023-04-24 10:09:37 +00:00
Evan Lezar
fc7c8f7520 Resolve all symlinks in ldcache
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-21 17:28:49 +02:00
Evan Lezar
46c1c45d85 Add /usr/lib/current to search path
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-21 11:47:42 +02:00
Evan Lezar
f99e863649 Merge branch 'CNT-4142/xorg-missing-not-fatal' into 'main'
Make discovery of Xorg libraries optional

See merge request nvidia/container-toolkit/container-toolkit!368
2023-04-21 09:47:10 +00:00
Evan Lezar
dcc21ece97 Merge branch 'add-debug-output' into 'main'
Fix target folder for kitmaker

See merge request nvidia/container-toolkit/container-toolkit!373
2023-04-20 18:38:05 +00:00
Evan Lezar
a53e3604a6 Fix target folder for kitmaker
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-20 20:37:46 +02:00
Evan Lezar
cfea6c1179 Merge branch 'add-debug-output' into 'main'
Properly create target folder for kitmaker

See merge request nvidia/container-toolkit/container-toolkit!372
2023-04-20 14:53:57 +00:00
Evan Lezar
4d1daa0b6c Properly create target folder for kitmaker
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-20 16:53:32 +02:00
Evan Lezar
df925bc7fd Merge branch 'include-libnvidia-container-in-kitmaker' into 'main'
Add a workaround to publish libnvidia-container0

See merge request nvidia/container-toolkit/container-toolkit!371
2023-04-20 08:29:02 +00:00
Evan Lezar
df22e37dfd Add a workaround to publish libnvidia-container0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-18 23:23:36 +02:00
Evan Lezar
2136266d1d Make discovery of Xorg libraries optional
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-13 18:41:38 +02:00
Evan Lezar
a95232dd33 Merge branch 'CNT-4144/non-ldcache' into 'main'
Only update ldcache if it exists

See merge request nvidia/container-toolkit/container-toolkit!369
2023-04-13 16:12:22 +00:00
Evan Lezar
29c6288128 Only update ldcache if it exists
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-13 17:18:09 +02:00
Evan Lezar
cd6fcb5297 Merge branch 'bump-version-1.13.1' into 'main'
Bump version to 1.13.1

See merge request nvidia/container-toolkit/container-toolkit!367
2023-04-13 11:12:37 +00:00
Evan Lezar
36989deff7 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-13 13:11:39 +02:00
Evan Lezar
7f6c9851fe Bump version to 1.13.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-13 13:11:09 +02:00
Evan Lezar
b7079454b5 Merge branch 'bump-versions' into 'main'
Bump nvidia-docker and nvidia-container-runtime versions

See merge request nvidia/container-toolkit/container-toolkit!366
2023-03-31 13:00:58 +00:00
Evan Lezar
448bd45ab4 Bump nvidia-docker and nvidia-container-runtime versions
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-31 15:00:15 +02:00
Evan Lezar
dde6170df1 Merge branch 'bump-version-v1.13.0' into 'main'
Bump version to v1.13.0

See merge request nvidia/container-toolkit/container-toolkit!365
2023-03-31 10:58:18 +00:00
Evan Lezar
e4b9350e65 Update libnvidia-container to v1.13.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-31 11:38:01 +02:00
Evan Lezar
622a0649ce Bump version to v1.13.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-31 11:35:53 +02:00
1952 changed files with 584662 additions and 38263 deletions

View File

@@ -19,21 +19,13 @@ default:
variables:
GIT_SUBMODULE_STRATEGY: recursive
BUILDIMAGE: "${CI_REGISTRY_IMAGE}/build:${CI_COMMIT_SHORT_SHA}"
BUILD_MULTI_ARCH_IMAGES: "true"
stages:
- trigger
- image
- lint
- go-checks
- go-build
- unit-tests
- package-build
- image-build
- test
- pull
- scan
- release
- sign
.pipeline-trigger-rules:
rules:
@@ -53,139 +45,6 @@ workflow:
# We then add all the regular triggers
- !reference [.pipeline-trigger-rules, rules]
# The main or manual job is used to filter out distributions or architectures that are not required on
# every build.
.main-or-manual:
rules:
- !reference [.pipeline-trigger-rules, rules]
- if: $CI_PIPELINE_SOURCE == "schedule"
when: manual
# The trigger-pipeline job adds a manualy triggered job to the pipeline on merge requests.
trigger-pipeline:
stage: trigger
script:
- echo "starting pipeline"
rules:
- !reference [.main-or-manual, rules]
- if: $CI_PIPELINE_SOURCE == "merge_request_event"
when: manual
allow_failure: false
- when: always
# Define the distribution targets
.dist-amazonlinux2:
rules:
- !reference [.main-or-manual, rules]
variables:
DIST: amazonlinux2
PACKAGE_REPO_TYPE: rpm
.dist-centos7:
rules:
- !reference [.main-or-manual, rules]
variables:
DIST: centos7
CVE_UPDATES: "cyrus-sasl-lib"
PACKAGE_REPO_TYPE: rpm
.dist-centos8:
variables:
DIST: centos8
CVE_UPDATES: "cyrus-sasl-lib"
PACKAGE_REPO_TYPE: rpm
.dist-debian10:
rules:
- !reference [.main-or-manual, rules]
variables:
DIST: debian10
PACKAGE_REPO_TYPE: debian
.dist-opensuse-leap15.1:
rules:
- !reference [.main-or-manual, rules]
variables:
DIST: opensuse-leap15.1
PACKAGE_REPO_TYPE: rpm
.dist-ubi8:
rules:
- !reference [.main-or-manual, rules]
variables:
DIST: ubi8
CVE_UPDATES: "cyrus-sasl-lib"
PACKAGE_REPO_TYPE: rpm
.dist-ubuntu18.04:
variables:
DIST: ubuntu18.04
CVE_UPDATES: "libsasl2-2 libsasl2-modules-db"
PACKAGE_REPO_TYPE: debian
.dist-ubuntu20.04:
variables:
DIST: ubuntu20.04
CVE_UPDATES: "libsasl2-2 libsasl2-modules-db"
PACKAGE_REPO_TYPE: debian
.dist-packaging:
variables:
DIST: packaging
# Define architecture targets
.arch-aarch64:
variables:
ARCH: aarch64
.arch-amd64:
variables:
ARCH: amd64
.arch-arm64:
variables:
ARCH: arm64
.arch-ppc64le:
rules:
- !reference [.main-or-manual, rules]
variables:
ARCH: ppc64le
.arch-x86_64:
variables:
ARCH: x86_64
# Define the platform targets
.platform-amd64:
variables:
PLATFORM: linux/amd64
.platform-arm64:
variables:
PLATFORM: linux/arm64
# Define test helpers
.integration:
stage: test
variables:
IMAGE_NAME: "${CI_REGISTRY_IMAGE}/container-toolkit"
VERSION: "${CI_COMMIT_SHORT_SHA}"
before_script:
- apk add --no-cache make bash jq
- docker login -u "${CI_REGISTRY_USER}" -p "${CI_REGISTRY_PASSWORD}" "${CI_REGISTRY}"
- docker pull "${IMAGE_NAME}:${VERSION}-${DIST}"
script:
- make -f build/container/Makefile test-${DIST}
# Define the test targets
test-packaging:
extends:
- .integration
- .dist-packaging
needs:
- image-packaging
# Download the regctl binary for use in the release steps
.regctl-setup:
before_script:
@@ -195,92 +54,3 @@ test-packaging:
- curl -sSLo bin/regctl https://github.com/regclient/regclient/releases/download/${REGCTL_VERSION}/regctl-linux-amd64
- chmod a+x bin/regctl
- export PATH=$(pwd)/bin:${PATH}
# .release forms the base of the deployment jobs which push images to the CI registry.
# This is extended with the version to be deployed (e.g. the SHA or TAG) and the
# target os.
.release:
stage: release
variables:
# Define the source image for the release
IMAGE_NAME: "${CI_REGISTRY_IMAGE}/container-toolkit"
VERSION: "${CI_COMMIT_SHORT_SHA}"
# OUT_IMAGE_VERSION is overridden for external releases
OUT_IMAGE_VERSION: "${CI_COMMIT_SHORT_SHA}"
before_script:
- !reference [.regctl-setup, before_script]
# We ensure that the OUT_IMAGE_VERSION is set
- 'echo Version: ${OUT_IMAGE_VERSION} ; [[ -n "${OUT_IMAGE_VERSION}" ]] || exit 1'
# In the case where we are deploying a different version to the CI_COMMIT_SHA, we
# need to tag the image.
# Note: a leading 'v' is stripped from the version if present
- apk add --no-cache make bash
script:
# Log in to the "output" registry, tag the image and push the image
- 'echo "Logging in to CI registry ${CI_REGISTRY}"'
- regctl registry login "${CI_REGISTRY}" -u "${CI_REGISTRY_USER}" -p "${CI_REGISTRY_PASSWORD}"
- '[ ${CI_REGISTRY} = ${OUT_REGISTRY} ] || echo "Logging in to output registry ${OUT_REGISTRY}"'
- '[ ${CI_REGISTRY} = ${OUT_REGISTRY} ] || regctl registry login "${OUT_REGISTRY}" -u "${OUT_REGISTRY_USER}" -p "${OUT_REGISTRY_TOKEN}"'
# Since OUT_IMAGE_NAME and OUT_IMAGE_VERSION are set, this will push the CI image to the
# Target
- make -f build/container/Makefile push-${DIST}
# Define a staging release step that pushes an image to an internal "staging" repository
# This is triggered for all pipelines (i.e. not only tags) to test the pipeline steps
# outside of the release process.
.release:staging:
extends:
- .release
variables:
OUT_REGISTRY_USER: "${CI_REGISTRY_USER}"
OUT_REGISTRY_TOKEN: "${CI_REGISTRY_PASSWORD}"
OUT_REGISTRY: "${CI_REGISTRY}"
OUT_IMAGE_NAME: "${CI_REGISTRY_IMAGE}/staging/container-toolkit"
# Define an external release step that pushes an image to an external repository.
# This includes a devlopment image off main.
.release:external:
extends:
- .release
rules:
- if: $CI_COMMIT_TAG
variables:
OUT_IMAGE_VERSION: "${CI_COMMIT_TAG}"
- if: $CI_COMMIT_BRANCH == $RELEASE_DEVEL_BRANCH
variables:
OUT_IMAGE_VERSION: "${DEVEL_RELEASE_IMAGE_VERSION}"
# Define the release jobs
release:staging-centos7:
extends:
- .release:staging
- .dist-centos7
needs:
- image-centos7
release:staging-ubi8:
extends:
- .release:staging
- .dist-ubi8
needs:
- image-ubi8
release:staging-ubuntu20.04:
extends:
- .release:staging
- .dist-ubuntu20.04
needs:
- test-toolkit-ubuntu20.04
- test-containerd-ubuntu20.04
- test-crio-ubuntu20.04
- test-docker-ubuntu20.04
release:staging-packaging:
extends:
- .release:staging
- .dist-packaging
needs:
- test-packaging

3
.github/copy-pr-bot.yaml vendored Normal file
View File

@@ -0,0 +1,3 @@
# https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#configuration
enabled: true

120
.github/dependabot.yml vendored Normal file
View File

@@ -0,0 +1,120 @@
# Please see the documentation for all configuration options:
# https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates
version: 2
updates:
# main branch
- package-ecosystem: "gomod"
target-branch: main
directories:
- "/"
- "deployments/devel"
- "tests"
schedule:
interval: "daily"
labels:
- dependencies
groups:
k8sio:
patterns:
- k8s.io/*
exclude-patterns:
- k8s.io/klog/*
- package-ecosystem: "docker"
target-branch: main
directories:
# CUDA image
- "/deployments/container"
# Golang version
- "/deployments/devel"
schedule:
interval: "daily"
labels:
- dependencies
- package-ecosystem: "github-actions"
target-branch: main
directory: "/"
schedule:
interval: "daily"
labels:
- dependencies
# Allow dependabot to update the libnvidia-container submodule.
- package-ecosystem: "gitsubmodule"
target-branch: main
directory: "/"
allow:
- dependency-name: "third_party/libnvidia-container"
schedule:
interval: "daily"
labels:
- dependencies
- libnvidia-container
# The release branch(es):
- package-ecosystem: "gomod"
target-branch: release-1.17
directories:
- "/"
# We don't update development or test dependencies on release branches
# - "deployments/devel"
# - "tests"
schedule:
interval: "weekly"
day: "sunday"
labels:
- dependencies
- maintenance
ignore:
# For release branches we only consider patch updates.
- dependency-name: "*"
update-types:
- version-update:semver-major
- version-update:semver-minor
groups:
k8sio:
patterns:
- k8s.io/*
exclude-patterns:
- k8s.io/klog/*
- package-ecosystem: "docker"
target-branch: release-1.17
directories:
# CUDA image
- "/deployments/container"
# Golang version
- "/deployments/devel"
schedule:
interval: "weekly"
day: "sunday"
ignore:
# For release branches we only apply patch updates to the golang version.
- dependency-name: "*golang*"
update-types:
- version-update:semver-major
- version-update:semver-minor
labels:
- dependencies
- maintenance
- package-ecosystem: "github-actions"
target-branch: release-1.17
directory: "/"
schedule:
interval: "weekly"
day: "sunday"
labels:
- dependencies
- maintenance
# Github actions need to be gh-pages branches.
- package-ecosystem: "github-actions"
target-branch: gh-pages
directory: "/"
schedule:
interval: "daily"
labels:
- dependencies

View File

@@ -1,113 +0,0 @@
# Copyright (c) 2020-2023, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# A workflow to trigger ci on hybrid infra (github + self hosted runner)
name: Blossom-CI
on:
issue_comment:
types: [created]
workflow_dispatch:
inputs:
platform:
description: 'runs-on argument'
required: false
args:
description: 'argument'
required: false
jobs:
Authorization:
name: Authorization
runs-on: blossom
outputs:
args: ${{ env.args }}
# This job only runs for pull request comments
if: |
contains( '\
anstockatnv,\
rohitrajani2018,\
cdesiniotis,\
shivamerla,\
ArangoGutierrez,\
elezar,\
klueska,\
zvonkok,\
', format('{0},', github.actor)) &&
github.event.comment.body == '/blossom-ci'
steps:
- name: Check if comment is issued by authorized person
run: blossom-ci
env:
OPERATION: 'AUTH'
REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
REPO_KEY_DATA: ${{ secrets.BLOSSOM_KEY }}
Vulnerability-scan:
name: Vulnerability scan
needs: [Authorization]
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v2
with:
repository: ${{ fromJson(needs.Authorization.outputs.args).repo }}
ref: ${{ fromJson(needs.Authorization.outputs.args).ref }}
lfs: 'true'
# repo specific steps
#- name: Setup java
# uses: actions/setup-java@v1
# with:
# java-version: 1.8
# add blackduck properties https://synopsys.atlassian.net/wiki/spaces/INTDOCS/pages/631308372/Methods+for+Configuring+Analysis#Using-a-configuration-file
#- name: Setup blackduck properties
# run: |
# PROJECTS=$(mvn -am dependency:tree | grep maven-dependency-plugin | awk '{ out="com.nvidia:"$(NF-1);print out }' | grep rapids | xargs | sed -e 's/ /,/g')
# echo detect.maven.build.command="-pl=$PROJECTS -am" >> application.properties
# echo detect.maven.included.scopes=compile >> application.properties
- name: Run blossom action
uses: NVIDIA/blossom-action@main
env:
REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
REPO_KEY_DATA: ${{ secrets.BLOSSOM_KEY }}
with:
args1: ${{ fromJson(needs.Authorization.outputs.args).args1 }}
args2: ${{ fromJson(needs.Authorization.outputs.args).args2 }}
args3: ${{ fromJson(needs.Authorization.outputs.args).args3 }}
Job-trigger:
name: Start ci job
needs: [Vulnerability-scan]
runs-on: blossom
steps:
- name: Start ci job
run: blossom-ci
env:
OPERATION: 'START-CI-JOB'
CI_SERVER: ${{ secrets.CI_SERVER }}
REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
Upload-Log:
name: Upload log
runs-on: blossom
if : github.event_name == 'workflow_dispatch'
steps:
- name: Jenkins log for pull request ${{ fromJson(github.event.inputs.args).pr }} (click here)
run: blossom-ci
env:
OPERATION: 'POST-PROCESSING'
CI_SERVER: ${{ secrets.CI_SERVER }}
REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}

53
.github/workflows/ci.yaml vendored Normal file
View File

@@ -0,0 +1,53 @@
# Copyright 2025 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: CI Pipeline
on:
push:
branches:
- "pull-request/[0-9]+"
- main
- release-*
jobs:
code-scanning:
uses: ./.github/workflows/code_scanning.yaml
variables:
runs-on: ubuntu-latest
outputs:
version: ${{ steps.version.outputs.version }}
steps:
- name: Generate Commit Short SHA
id: version
run: echo "version=$(echo $GITHUB_SHA | cut -c1-8)" >> "$GITHUB_OUTPUT"
golang:
uses: ./.github/workflows/golang.yaml
image:
uses: ./.github/workflows/image.yaml
needs: [variables, golang, code-scanning]
secrets: inherit
with:
version: ${{ needs.variables.outputs.version }}
build_multi_arch_images: ${{ github.ref_name == 'main' || startsWith(github.ref_name, 'release-') }}
e2e-test:
needs: [image, variables]
secrets: inherit
uses: ./.github/workflows/e2e.yaml
with:
version: ${{ needs.variables.outputs.version }}

49
.github/workflows/code_scanning.yaml vendored Normal file
View File

@@ -0,0 +1,49 @@
# Copyright 2024 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: "CodeQL"
on:
workflow_call: {}
pull_request:
types:
- opened
- synchronize
branches:
- main
- release-*
jobs:
analyze:
name: Analyze Go code with CodeQL
runs-on: ubuntu-latest
timeout-minutes: 360
permissions:
security-events: write
packages: read
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Initialize CodeQL
uses: github/codeql-action/init@v3
with:
languages: go
build-mode: manual
- shell: bash
run: |
make build
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v3
with:
category: "/language:go"

103
.github/workflows/e2e.yaml vendored Normal file
View File

@@ -0,0 +1,103 @@
# Copyright 2025 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: End-to-end Tests
on:
workflow_call:
inputs:
version:
required: true
type: string
secrets:
AWS_ACCESS_KEY_ID:
required: true
AWS_SECRET_ACCESS_KEY:
required: true
AWS_SSH_KEY:
required: true
E2E_SSH_USER:
required: true
SLACK_BOT_TOKEN:
required: true
SLACK_CHANNEL_ID:
required: true
jobs:
e2e-tests:
runs-on: linux-amd64-cpu4
steps:
- name: Check out code
uses: actions/checkout@v4
- name: Calculate build vars
id: vars
run: |
echo "COMMIT_SHORT_SHA=${GITHUB_SHA:0:8}" >> $GITHUB_ENV
echo "LOWERCASE_REPO_OWNER=$(echo "${GITHUB_REPOSITORY_OWNER}" | awk '{print tolower($0)}')" >> $GITHUB_ENV
GOLANG_VERSION=$(./hack/golang-version.sh)
echo "GOLANG_VERSION=${GOLANG_VERSION##GOLANG_VERSION := }" >> $GITHUB_ENV
- name: Install Go
uses: actions/setup-go@v5
with:
go-version: ${{ env.GOLANG_VERSION }}
- name: Set up Holodeck
uses: NVIDIA/holodeck@v0.2.12
with:
aws_access_key_id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws_secret_access_key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws_ssh_key: ${{ secrets.AWS_SSH_KEY }}
holodeck_config: "tests/e2e/infra/aws.yaml"
- name: Get public dns name
id: holodeck_public_dns_name
uses: mikefarah/yq@master
with:
cmd: yq '.status.properties[] | select(.name == "public-dns-name") | .value' /github/workspace/.cache/holodeck.yaml
- name: Run e2e tests
env:
E2E_INSTALL_CTK: "true"
E2E_IMAGE_NAME: ghcr.io/nvidia/container-toolkit
E2E_IMAGE_TAG: ${{ inputs.version }}
E2E_SSH_USER: ${{ secrets.E2E_SSH_USER }}
E2E_SSH_HOST: ${{ steps.holodeck_public_dns_name.outputs.result }}
run: |
e2e_ssh_key=$(mktemp)
echo "${{ secrets.AWS_SSH_KEY }}" > "$e2e_ssh_key"
chmod 600 "$e2e_ssh_key"
export E2E_SSH_KEY="$e2e_ssh_key"
make -f tests/e2e/Makefile test
- name: Archive Ginkgo logs
uses: actions/upload-artifact@v4
with:
name: ginkgo-logs
path: ginkgo.json
retention-days: 15
- name: Send Slack alert notification
if: ${{ failure() }}
uses: slackapi/slack-github-action@v2.1.0
with:
method: chat.postMessage
token: ${{ secrets.SLACK_BOT_TOKEN }}
payload: |
channel: ${{ secrets.SLACK_CHANNEL_ID }}
text: |
:x: On repository ${{ github.repository }}, the Workflow *${{ github.workflow }}* has failed.
Details: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}

102
.github/workflows/golang.yaml vendored Normal file
View File

@@ -0,0 +1,102 @@
# Copyright 2024 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: Golang
on:
workflow_call: {}
pull_request:
types:
- opened
- synchronize
branches:
- main
- release-*
jobs:
check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
name: Checkout code
- name: Get Golang version
id: vars
run: |
GOLANG_VERSION=$(./hack/golang-version.sh)
echo "GOLANG_VERSION=${GOLANG_VERSION##GOLANG_VERSION := }" >> $GITHUB_ENV
- name: Install Go
uses: actions/setup-go@v5
with:
go-version: ${{ env.GOLANG_VERSION }}
- name: Lint
uses: golangci/golangci-lint-action@v8
with:
version: latest
args: -v --timeout 5m
skip-cache: true
- name: Check golang modules
run: |
make check-vendor
make -C deployments/devel check-modules
test:
name: Unit test
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Get Golang version
id: vars
run: |
GOLANG_VERSION=$(./hack/golang-version.sh)
echo "GOLANG_VERSION=${GOLANG_VERSION##GOLANG_VERSION := }" >> $GITHUB_ENV
- name: Install Go
uses: actions/setup-go@v5
with:
go-version: ${{ env.GOLANG_VERSION }}
- name: Run unit tests and generate coverage report
run: make coverage
- name: Upload to Coveralls
uses: coverallsapp/github-action@v2
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
file: coverage.out
build:
name: Build
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Get Golang version
id: vars
run: |
GOLANG_VERSION=$(./hack/golang-version.sh)
echo "GOLANG_VERSION=${GOLANG_VERSION##GOLANG_VERSION ?= }" >> $GITHUB_ENV
- name: Install Go
uses: actions/setup-go@v5
with:
go-version: ${{ env.GOLANG_VERSION }}
- run: make build

120
.github/workflows/image.yaml vendored Normal file
View File

@@ -0,0 +1,120 @@
# Copyright 2024 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Run this workflow on pull requests
name: image
on:
workflow_call:
inputs:
version:
required: true
type: string
build_multi_arch_images:
required: true
type: string
jobs:
packages:
runs-on: ubuntu-latest
strategy:
matrix:
target:
- ubuntu18.04-arm64
- ubuntu18.04-amd64
- ubuntu18.04-ppc64le
- centos7-aarch64
- centos7-x86_64
- centos8-ppc64le
ispr:
- ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}
exclude:
- ispr: true
target: ubuntu18.04-arm64
- ispr: true
target: ubuntu18.04-ppc64le
- ispr: true
target: centos7-aarch64
- ispr: true
target: centos8-ppc64le
fail-fast: false
steps:
- uses: actions/checkout@v4
name: Check out code
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
with:
image: tonistiigi/binfmt:master
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: build ${{ matrix.target }} packages
run: |
sudo apt-get install -y coreutils build-essential sed git bash make
echo "Building packages"
./scripts/build-packages.sh ${{ matrix.target }}
- name: 'Upload Artifacts'
uses: actions/upload-artifact@v4
with:
compression-level: 0
name: toolkit-container-${{ matrix.target }}-${{ github.run_id }}
path: ${{ github.workspace }}/dist/*
image:
runs-on: ubuntu-latest
strategy:
matrix:
target:
- application
- packaging
needs: packages
steps:
- uses: actions/checkout@v4
name: Check out code
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
with:
image: tonistiigi/binfmt:master
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Get built packages
uses: actions/download-artifact@v4
with:
path: ${{ github.workspace }}/dist/
pattern: toolkit-container-*-${{ github.run_id }}
merge-multiple: true
- name: Login to GitHub Container Registry
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Build image
env:
IMAGE_NAME: ghcr.io/nvidia/container-toolkit
VERSION: ${{ inputs.version }}
PUSH_ON_BUILD: "true"
BUILD_MULTI_ARCH_IMAGES: ${{ inputs.build_multi_arch_images }}
run: |
echo "${VERSION}"
make -f deployments/container/Makefile build-${{ matrix.target }}

38
.github/workflows/release.yaml vendored Normal file
View File

@@ -0,0 +1,38 @@
# Copyright 2024 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Run this workflow on new tags
name: Release
on:
push:
tags:
- v*
jobs:
release:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
name: Check out code
- name: Prepare Artifacts
run: |
./hack/prepare-artifacts.sh ${{ github.ref_name }}
- name: Create Draft Release
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
./hack/create-release.sh ${{ github.ref_name }}

16
.gitignore vendored
View File

@@ -1,13 +1,11 @@
dist
artifacts
/dist
/artifacts
*.swp
*.swo
/coverage.out*
/test/output/
/nvidia-container-runtime
/nvidia-container-runtime.*
/nvidia-container-runtime-hook
/nvidia-container-toolkit
/nvidia-ctk
/tests/output/
/nvidia-*
/shared-*
/release-*
/release-*
/bin
/toolkit-test

View File

@@ -1,314 +0,0 @@
# Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
include:
- .common-ci.yml
build-dev-image:
stage: image
script:
- apk --no-cache add make bash
- make .build-image
- docker login -u "${CI_REGISTRY_USER}" -p "${CI_REGISTRY_PASSWORD}" "${CI_REGISTRY}"
- make .push-build-image
.requires-build-image:
image: "${BUILDIMAGE}"
.go-check:
extends:
- .requires-build-image
stage: go-checks
fmt:
extends:
- .go-check
script:
- make assert-fmt
vet:
extends:
- .go-check
script:
- make vet
lint:
extends:
- .go-check
script:
- make lint
allow_failure: true
ineffassign:
extends:
- .go-check
script:
- make ineffassign
allow_failure: true
misspell:
extends:
- .go-check
script:
- make misspell
go-build:
extends:
- .requires-build-image
stage: go-build
script:
- make build
unit-tests:
extends:
- .requires-build-image
stage: unit-tests
script:
- make coverage
# Define the package build helpers
.multi-arch-build:
before_script:
- apk add --no-cache coreutils build-base sed git bash make
- '[[ -n "${SKIP_QEMU_SETUP}" ]] || docker run --rm --privileged multiarch/qemu-user-static --reset -p yes -c yes'
.package-artifacts:
variables:
ARTIFACTS_NAME: "toolkit-container-${CI_PIPELINE_ID}"
ARTIFACTS_ROOT: "toolkit-container-${CI_PIPELINE_ID}"
DIST_DIR: ${CI_PROJECT_DIR}/${ARTIFACTS_ROOT}
.package-build:
extends:
- .multi-arch-build
- .package-artifacts
stage: package-build
timeout: 3h
script:
- ./scripts/build-packages.sh ${DIST}-${ARCH}
artifacts:
name: ${ARTIFACTS_NAME}
paths:
- ${ARTIFACTS_ROOT}
# Define the package build targets
package-amazonlinux2-aarch64:
extends:
- .package-build
- .dist-amazonlinux2
- .arch-aarch64
package-amazonlinux2-x86_64:
extends:
- .package-build
- .dist-amazonlinux2
- .arch-x86_64
package-centos7-x86_64:
extends:
- .package-build
- .dist-centos7
- .arch-x86_64
package-centos8-aarch64:
extends:
- .package-build
- .dist-centos8
- .arch-aarch64
package-centos8-ppc64le:
extends:
- .package-build
- .dist-centos8
- .arch-ppc64le
package-centos8-x86_64:
extends:
- .package-build
- .dist-centos8
- .arch-x86_64
package-debian10-amd64:
extends:
- .package-build
- .dist-debian10
- .arch-amd64
package-opensuse-leap15.1-x86_64:
extends:
- .package-build
- .dist-opensuse-leap15.1
- .arch-x86_64
package-ubuntu18.04-amd64:
extends:
- .package-build
- .dist-ubuntu18.04
- .arch-amd64
package-ubuntu18.04-arm64:
extends:
- .package-build
- .dist-ubuntu18.04
- .arch-arm64
package-ubuntu18.04-ppc64le:
extends:
- .package-build
- .dist-ubuntu18.04
- .arch-ppc64le
.buildx-setup:
before_script:
- export BUILDX_VERSION=v0.6.3
- apk add --no-cache curl
- mkdir -p ~/.docker/cli-plugins
- curl -sSLo ~/.docker/cli-plugins/docker-buildx "https://github.com/docker/buildx/releases/download/${BUILDX_VERSION}/buildx-${BUILDX_VERSION}.linux-amd64"
- chmod a+x ~/.docker/cli-plugins/docker-buildx
- docker buildx create --use --platform=linux/amd64,linux/arm64
- '[[ -n "${SKIP_QEMU_SETUP}" ]] || docker run --rm --privileged multiarch/qemu-user-static --reset -p yes'
# Define the image build targets
.image-build:
stage: image-build
variables:
IMAGE_NAME: "${CI_REGISTRY_IMAGE}/container-toolkit"
VERSION: "${CI_COMMIT_SHORT_SHA}"
PUSH_ON_BUILD: "true"
before_script:
- !reference [.buildx-setup, before_script]
- apk add --no-cache bash make git
- 'echo "Logging in to CI registry ${CI_REGISTRY}"'
- docker login -u "${CI_REGISTRY_USER}" -p "${CI_REGISTRY_PASSWORD}" "${CI_REGISTRY}"
script:
- make -f build/container/Makefile build-${DIST}
image-centos7:
extends:
- .image-build
- .package-artifacts
- .dist-centos7
needs:
- package-centos7-x86_64
image-ubi8:
extends:
- .image-build
- .package-artifacts
- .dist-ubi8
needs:
# Note: The ubi8 image uses the centos8 packages
- package-centos8-aarch64
- package-centos8-x86_64
- package-centos8-ppc64le
image-ubuntu20.04:
extends:
- .image-build
- .package-artifacts
- .dist-ubuntu20.04
needs:
- package-ubuntu18.04-amd64
- package-ubuntu18.04-arm64
- job: package-ubuntu18.04-ppc64le
optional: true
# The DIST=packaging target creates an image containing all built packages
image-packaging:
extends:
- .image-build
- .package-artifacts
- .dist-packaging
needs:
- job: package-centos8-aarch64
- job: package-centos8-x86_64
- job: package-ubuntu18.04-amd64
- job: package-ubuntu18.04-arm64
- job: package-amazonlinux2-aarch64
optional: true
- job: package-amazonlinux2-x86_64
optional: true
- job: package-centos7-x86_64
optional: true
- job: package-centos8-ppc64le
optional: true
- job: package-debian10-amd64
optional: true
- job: package-opensuse-leap15.1-x86_64
optional: true
- job: package-ubuntu18.04-ppc64le
optional: true
# Define publish test helpers
.test:toolkit:
extends:
- .integration
variables:
TEST_CASES: "toolkit"
.test:docker:
extends:
- .integration
variables:
TEST_CASES: "docker"
.test:containerd:
# TODO: The containerd tests fail due to issues with SIGHUP.
# Until this is resolved with retry up to twice and allow failure here.
retry: 2
allow_failure: true
extends:
- .integration
variables:
TEST_CASES: "containerd"
.test:crio:
extends:
- .integration
variables:
TEST_CASES: "crio"
# Define the test targets
test-toolkit-ubuntu20.04:
extends:
- .test:toolkit
- .dist-ubuntu20.04
needs:
- image-ubuntu20.04
test-containerd-ubuntu20.04:
extends:
- .test:containerd
- .dist-ubuntu20.04
needs:
- image-ubuntu20.04
test-crio-ubuntu20.04:
extends:
- .test:crio
- .dist-ubuntu20.04
needs:
- image-ubuntu20.04
test-docker-ubuntu20.04:
extends:
- .test:docker
- .dist-ubuntu20.04
needs:
- image-ubuntu20.04

10
.gitmodules vendored
View File

@@ -1,12 +1,4 @@
[submodule "third_party/libnvidia-container"]
path = third_party/libnvidia-container
url = https://gitlab.com/nvidia/container-toolkit/libnvidia-container.git
branch = main
[submodule "third_party/nvidia-container-runtime"]
path = third_party/nvidia-container-runtime
url = https://gitlab.com/nvidia/container-toolkit/container-runtime.git
branch = main
[submodule "third_party/nvidia-docker"]
path = third_party/nvidia-docker
url = https://gitlab.com/nvidia/container-toolkit/nvidia-docker.git
url = https://github.com/NVIDIA/libnvidia-container.git
branch = main

72
.golangci.yml Normal file
View File

@@ -0,0 +1,72 @@
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
version: "2"
linters:
enable:
- contextcheck
- gocritic
- gosec
- misspell
- unconvert
exclusions:
generated: lax
presets:
- comments
- common-false-positives
- legacy
- std-error-handling
rules:
# Exclude the gocritic dupSubExpr issue for cgo files.
- linters:
- gocritic
path: internal/dxcore/dxcore.go
text: dupSubExpr
# Exclude the checks for usage of returns to config.Delete(Path) in the
# crio and containerd config packages.
- linters:
- errcheck
path: pkg/config/engine/
text: config.Delete
# RENDERD refers to the Render Device and not the past tense of render.
- linters:
- misspell
path: .*.go
text: '`RENDERD` is a misspelling of `RENDERED`'
# The legacy hook relies on spec.Hooks.Prestart, which is deprecated as of
# the v1.2.0 OCI runtime spec.
- path: (.+)\.go$
text: SA1019:(.+).Prestart is deprecated(.+)
# TODO: We should address each of the following integer overflows.
- path: (.+)\.go$
text: 'G115: integer overflow conversion(.+)'
paths:
- third_party$
- builtin$
- examples$
formatters:
enable:
- gofmt
- goimports
settings:
goimports:
local-prefixes:
- github.com/NVIDIA/nvidia-container-toolkit
exclusions:
generated: lax
paths:
- third_party$
- builtin$
- examples$

View File

@@ -33,24 +33,68 @@ variables:
# On the multi-arch builder we don't need the qemu setup.
SKIP_QEMU_SETUP: "1"
# Define the public staging registry
STAGING_REGISTRY: registry.gitlab.com/nvidia/container-toolkit/container-toolkit/staging
STAGING_REGISTRY: ghcr.io/nvidia
STAGING_VERSION: ${CI_COMMIT_SHORT_SHA}
ARTIFACTORY_REPO_BASE: "https://urm.nvidia.com/artifactory/sw-gpu-cloudnative"
KITMAKER_RELEASE_FOLDER: "kitmaker"
PACKAGE_ARCHIVE_RELEASE_FOLDER: "releases"
.image-pull:
stage: image-build
# .copy-images copies the required application and packaging images from the
# IN_IMAGE="${IN_IMAGE_NAME}:${IN_IMAGE_TAG}${TAG_SUFFIX}"
# to
# OUT_IMAGE="${OUT_IMAGE_NAME}:${OUT_IMAGE_TAG}${TAG_SUFFIX}"
# The script also logs into IN_REGISTRY and OUT_REGISTRY using the supplied
# username and tokens.
.copy-images:
parallel:
matrix:
- TAG_SUFFIX: ["", "-packaging"]
before_script:
- !reference [.regctl-setup, before_script]
- apk add --no-cache make bash
variables:
REGCTL: regctl
script:
- |
if [ -n ${IN_REGISTRY} ] && [ -n ${IN_REGISTRY_USER} ]; then
echo "Logging in to ${IN_REGISTRY}"
${REGCTL} registry login "${IN_REGISTRY}" -u "${IN_REGISTRY_USER}" -p "${IN_REGISTRY_TOKEN}" || exit 1
fi
if [ -n ${OUT_REGISTRY} ] && [ -n ${OUT_REGISTRY_USER} ] && [ "${IN_REGISTRY}" != "${OUT_REGISTRY}" ]; then
echo "Logging in to ${OUT_REGISTRY}"
${REGCTL} registry login "${OUT_REGISTRY}" -u "${OUT_REGISTRY_USER}" -p "${OUT_REGISTRY_TOKEN}" || exit 1
fi
export IN_IMAGE="${IN_IMAGE_NAME}:${IN_IMAGE_TAG}${TAG_SUFFIX}"
export OUT_IMAGE="${OUT_IMAGE_NAME}:${OUT_IMAGE_TAG}${TAG_SUFFIX}"
echo "Copying ${IN_IMAGE} to ${OUT_IMAGE}"
${REGCTL} image copy ${IN_IMAGE} ${OUT_IMAGE}
# pull-images pulls images from the public CI registry to the internal CI registry.
pull-images:
extends:
- .copy-images
stage: pull
variables:
IN_REGISTRY: "${STAGING_REGISTRY}"
IN_IMAGE_NAME: container-toolkit
IN_VERSION: "${STAGING_VERSION}"
IN_IMAGE_NAME: ${STAGING_REGISTRY}/container-toolkit
IN_IMAGE_TAG: "${STAGING_VERSION}"
OUT_REGISTRY: "${CI_REGISTRY}"
OUT_REGISTRY_USER: "${CI_REGISTRY_USER}"
OUT_REGISTRY_TOKEN: "${CI_REGISTRY_PASSWORD}"
OUT_REGISTRY: "${CI_REGISTRY}"
OUT_IMAGE_NAME: "${CI_REGISTRY_IMAGE}/container-toolkit"
PUSH_MULTIPLE_TAGS: "false"
OUT_IMAGE_TAG: "${CI_COMMIT_SHORT_SHA}"
# We delay the job start to allow the public pipeline to generate the required images.
rules:
# If the pipeline is triggered from a tag or the WEB UI we don't delay the
# start of the pipeline.
- if: $CI_COMMIT_TAG || $CI_PIPELINE_SOURCE == "web"
# If the pipeline is triggered through other means (i.e. a branch or MR)
# we add a 30 minute delay to ensure that the images are available in the
# public CI registry.
- when: delayed
start_in: 30 minutes
timeout: 30 minutes
@@ -59,35 +103,6 @@ variables:
when:
- job_execution_timeout
- stuck_or_timeout_failure
before_script:
- !reference [.regctl-setup, before_script]
- apk add --no-cache make bash
- >
regctl manifest get ${IN_REGISTRY}/${IN_IMAGE_NAME}:${IN_VERSION}-${DIST} --list > /dev/null && echo "${IN_REGISTRY}/${IN_IMAGE_NAME}:${IN_VERSION}-${DIST}" || ( echo "${IN_REGISTRY}/${IN_IMAGE_NAME}:${IN_VERSION}-${DIST} does not exist" && sleep infinity )
script:
- regctl registry login "${OUT_REGISTRY}" -u "${OUT_REGISTRY_USER}" -p "${OUT_REGISTRY_TOKEN}"
- make -f build/container/Makefile IMAGE=${IN_REGISTRY}/${IN_IMAGE_NAME}:${IN_VERSION}-${DIST} OUT_IMAGE=${OUT_IMAGE_NAME}:${CI_COMMIT_SHORT_SHA}-${DIST} push-${DIST}
image-centos7:
extends:
- .dist-centos7
- .image-pull
image-ubi8:
extends:
- .dist-ubi8
- .image-pull
image-ubuntu20.04:
extends:
- .dist-ubuntu20.04
- .image-pull
# The DIST=packaging target creates an image containing all built packages
image-packaging:
extends:
- .dist-packaging
- .image-pull
# We skip the integration tests for the internal CI:
.integration:
@@ -99,26 +114,37 @@ image-packaging:
# The .scan step forms the base of the image scan operation performed before releasing
# images.
.scan:
scan-images:
stage: scan
needs:
- pull-images
image: "${PULSE_IMAGE}"
parallel:
matrix:
- TAG_SUFFIX: [""]
PLATFORM: ["linux/amd64", "linux/arm64"]
- TAG_SUFFIX: "-packaging"
PLATFORM: "linux/amd64"
variables:
IMAGE: "${CI_REGISTRY_IMAGE}/container-toolkit:${CI_COMMIT_SHORT_SHA}-${DIST}"
IMAGE_ARCHIVE: "container-toolkit-${DIST}-${ARCH}-${CI_JOB_ID}.tar"
IMAGE: "${CI_REGISTRY_IMAGE}/container-toolkit:${CI_COMMIT_SHORT_SHA}"
IMAGE_ARCHIVE: "container-toolkit-${CI_JOB_ID}.tar"
rules:
- if: $SKIP_SCANS != "yes"
- when: manual
before_script:
- docker login -u "${CI_REGISTRY_USER}" -p "${CI_REGISTRY_PASSWORD}" "${CI_REGISTRY}"
# TODO: We should specify the architecture here and scan all architectures
- docker pull --platform="${PLATFORM}" "${IMAGE}"
- docker save "${IMAGE}" -o "${IMAGE_ARCHIVE}"
- AuthHeader=$(echo -n $SSA_CLIENT_ID:$SSA_CLIENT_SECRET | base64 -w0)
- >
export SSA_TOKEN=$(curl --request POST --header "Authorization: Basic $AuthHeader" --header "Content-Type: application/x-www-form-urlencoded" ${SSA_ISSUER_URL} | jq ".access_token" | tr -d '"')
- if [ -z "$SSA_TOKEN" ]; then exit 1; else echo "SSA_TOKEN set!"; fi
- if: $IGNORE_SCANS == "yes"
allow_failure: true
- when: on_success
script:
- pulse-cli -n $NSPECT_ID --ssa $SSA_TOKEN scan -i $IMAGE_ARCHIVE -p $CONTAINER_POLICY -o
- |
docker login -u "${CI_REGISTRY_USER}" -p "${CI_REGISTRY_PASSWORD}" "${CI_REGISTRY}"
export SCAN_IMAGE=${IMAGE}${TAG_SUFFIX}
echo "Scanning image ${SCAN_IMAGE} for ${PLATFORM}"
docker pull --platform="${PLATFORM}" "${SCAN_IMAGE}"
docker save "${SCAN_IMAGE}" -o "${IMAGE_ARCHIVE}"
AuthHeader=$(echo -n $SSA_CLIENT_ID:$SSA_CLIENT_SECRET | base64 -w0)
export SSA_TOKEN=$(curl --request POST --header "Authorization: Basic $AuthHeader" --header "Content-Type: application/x-www-form-urlencoded" ${SSA_ISSUER_URL} | jq ".access_token" | tr -d '"')
if [ -z "$SSA_TOKEN" ]; then exit 1; else echo "SSA_TOKEN set!"; fi
pulse-cli -n $NSPECT_ID --ssa $SSA_TOKEN scan -i $IMAGE_ARCHIVE -p $CONTAINER_POLICY -o
rm -f "${IMAGE_ARCHIVE}"
artifacts:
when: always
expire_in: 1 week
@@ -129,72 +155,10 @@ image-packaging:
- vulns.json
- policy_evaluation.json
# Define the scan targets
scan-centos7-amd64:
extends:
- .dist-centos7
- .platform-amd64
- .scan
needs:
- image-centos7
scan-centos7-arm64:
extends:
- .dist-centos7
- .platform-arm64
- .scan
needs:
- image-centos7
- scan-centos7-amd64
scan-ubuntu20.04-amd64:
extends:
- .dist-ubuntu20.04
- .platform-amd64
- .scan
needs:
- image-ubuntu20.04
scan-ubuntu20.04-arm64:
extends:
- .dist-ubuntu20.04
- .platform-arm64
- .scan
needs:
- image-ubuntu20.04
- scan-ubuntu20.04-amd64
scan-ubi8-amd64:
extends:
- .dist-ubi8
- .platform-amd64
- .scan
needs:
- image-ubi8
scan-ubi8-arm64:
extends:
- .dist-ubi8
- .platform-arm64
- .scan
needs:
- image-ubi8
- scan-ubi8-amd64
# Define external release helpers
.release:ngc:
extends:
- .release:external
variables:
OUT_REGISTRY_USER: "${NGC_REGISTRY_USER}"
OUT_REGISTRY_TOKEN: "${NGC_REGISTRY_TOKEN}"
OUT_REGISTRY: "${NGC_REGISTRY}"
OUT_IMAGE_NAME: "${NGC_REGISTRY_IMAGE}"
.release:packages:
upload-kitmaker-packages:
stage: release
needs:
- image-packaging
- pull-images
variables:
VERSION: "${CI_COMMIT_SHORT_SHA}"
PACKAGE_REGISTRY: "${CI_REGISTRY}"
@@ -203,39 +167,126 @@ scan-ubi8-arm64:
PACKAGE_IMAGE_NAME: "${CI_REGISTRY_IMAGE}/container-toolkit"
PACKAGE_IMAGE_TAG: "${CI_COMMIT_SHORT_SHA}-packaging"
KITMAKER_ARTIFACTORY_REPO: "${ARTIFACTORY_REPO_BASE}-generic-local/${KITMAKER_RELEASE_FOLDER}"
ARTIFACTS_DIR: "${CI_PROJECT_DIR}/artifacts"
script:
- !reference [.regctl-setup, before_script]
- apk add --no-cache bash git
- regctl registry login "${PACKAGE_REGISTRY}" -u "${PACKAGE_REGISTRY_USER}" -p "${PACKAGE_REGISTRY_TOKEN}"
- ./scripts/extract-packages.sh "${PACKAGE_IMAGE_NAME}:${PACKAGE_IMAGE_TAG}"
# TODO: ./scripts/release-packages-artifactory.sh "${DIST}-${ARCH}" "${PACKAGE_ARTIFACTORY_REPO}"
- ./scripts/release-kitmaker-artifactory.sh "${KITMAKER_ARTIFACTORY_REPO}"
- rm -rf ${ARTIFACTS_DIR}
# Define the package release targets
release:packages:kitmaker:
push-images-to-staging:
extends:
- .release:packages
release:staging-ubuntu20.04:
extends:
- .release:staging
- .dist-ubuntu20.04
- .copy-images
stage: release
needs:
- image-ubuntu20.04
- scan-images
variables:
IN_REGISTRY: "${CI_REGISTRY}"
IN_REGISTRY_USER: "${CI_REGISTRY_USER}"
IN_REGISTRY_TOKEN: "${CI_REGISTRY_PASSWORD}"
IN_IMAGE_NAME: "${CI_REGISTRY_IMAGE}/container-toolkit"
IN_IMAGE_TAG: "${CI_COMMIT_SHORT_SHA}"
# Define the external release targets
# Release to NGC
release:ngc-centos7:
extends:
- .dist-centos7
- .release:ngc
OUT_REGISTRY: "${NGC_REGISTRY}"
OUT_REGISTRY_USER: "${NGC_REGISTRY_USER}"
OUT_REGISTRY_TOKEN: "${NGC_REGISTRY_TOKEN}"
OUT_IMAGE_NAME: "${NGC_STAGING_REGISTRY}/container-toolkit"
OUT_IMAGE_TAG: "${CI_COMMIT_SHORT_SHA}"
release:ngc-ubuntu20.04:
.release-images:
extends:
- .dist-ubuntu20.04
- .release:ngc
- .copy-images
stage: release
needs:
- scan-images
- push-images-to-staging
variables:
IN_REGISTRY: "${CI_REGISTRY}"
IN_REGISTRY_USER: "${CI_REGISTRY_USER}"
IN_REGISTRY_TOKEN: "${CI_REGISTRY_PASSWORD}"
IN_IMAGE_NAME: "${CI_REGISTRY_IMAGE}/container-toolkit"
IN_IMAGE_TAG: "${CI_COMMIT_SHORT_SHA}"
release:ngc-ubi8:
OUT_REGISTRY: "${NGC_REGISTRY}"
OUT_REGISTRY_USER: "${NGC_REGISTRY_USER}"
OUT_REGISTRY_TOKEN: "${NGC_REGISTRY_TOKEN}"
OUT_IMAGE_NAME: "${NGC_REGISTRY_IMAGE}"
OUT_IMAGE_TAG: "${CI_COMMIT_TAG}"
release-images-to-ngc:
extends:
- .dist-ubi8
- .release:ngc
- .release-images
rules:
- if: $CI_COMMIT_TAG
release-images-dummy:
extends:
- .release-images
variables:
REGCTL: "echo [DUMMY] regctl"
rules:
- if: $CI_COMMIT_TAG == null || $CI_COMMIT_TAG == ""
# .sign-images forms the base of the jobs which sign images in the NGC registry.
.sign-images:
stage: sign
image: ubuntu:latest
parallel:
matrix:
- TAG_SUFFIX: ["", "-packaging"]
variables:
IMAGE_NAME: "${NGC_REGISTRY_IMAGE}"
IMAGE_TAG: "${CI_COMMIT_TAG}"
NGC_CLI: "ngc-cli/ngc"
before_script:
- !reference [.ngccli-setup, before_script]
script:
- |
# We ensure that the IMAGE_NAME and IMAGE_TAG is set
echo Image Name: ${IMAGE_NAME} && [[ -n "${IMAGE_NAME}" ]] || exit 1
echo Image Tag: ${IMAGE_TAG} && [[ -n "${IMAGE_TAG}" ]] || exit 1
export IMAGE=${IMAGE_NAME}:${IMAGE_TAG}${TAG_SUFFIX}
echo "Signing the image ${IMAGE}"
${NGC_CLI} registry image publish --source ${IMAGE} ${IMAGE} --public --discoverable --allow-guest --sign --org nvidia
# Define the external image signing steps for NGC
# Download the ngc cli binary for use in the sign steps
.ngccli-setup:
before_script:
- apt-get update && apt-get install -y curl unzip jq
- |
if [ -z "${NGCCLI_VERSION}" ]; then
NGC_VERSION_URL="https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions"
# Extract the latest version from the JSON data using jq
export NGCCLI_VERSION=$(curl -s $NGC_VERSION_URL | jq -r '.recipe.latestVersionIdStr')
fi
echo "NGCCLI_VERSION ${NGCCLI_VERSION}"
- curl -sSLo ngccli_linux.zip https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/${NGCCLI_VERSION}/files/ngccli_linux.zip
- unzip ngccli_linux.zip
- chmod u+x ngc-cli/ngc
sign-ngc-images:
extends:
- .sign-images
needs:
- release-images-to-ngc
rules:
- if: $CI_COMMIT_TAG
variables:
NGC_CLI_API_KEY: "${NGC_REGISTRY_TOKEN}"
retry:
max: 2
sign-images-dummy:
extends:
- .sign-images
needs:
- release-images-dummy
variables:
NGC_CLI: "echo [DUMMY] ngc-cli/ngc"
IMAGE_TAG: "${CI_COMMIT_SHORT_SHA}"
rules:
- if: $CI_COMMIT_TAG == null || $CI_COMMIT_TAG == ""

View File

@@ -1,5 +1,352 @@
# NVIDIA Container Toolkit Changelog
## v1.18.0-rc.1
- Add create-soname-symlinks hook
- Require matching version of libnvidia-container-tools
- Add envvar for libcuda.so parent dir to CDI spec
- Add EnvVar to Discover interface
- Resolve to legacy by default in nvidia-container-runtime-hook
- Default to jit-cdi mode in the nvidia runtime
- Use functional options to construct runtime mode resolver
- Add NVIDIA_CTK_CONFIG_FILE_PATH envvar
- Switch to cuda ubi9 base image
- Use single version tag for image
- BUGFIX: modifier: respect GPU volume-mount device requests
- Ensure consistent sorting of annotation devices
- Extract deb and rpm packages to single image
- Remove docker-run as default runtime candidate
- Return annotation devices from VisibleDevices
- Make CDI device requests consistent with other methods
- Construct container info once
- Add logic to extract annotation device requests to image type
- Add IsPrivileged function to CUDA container type
- Add device IDs to nvcdi.GetSpec API
- Refactor extracting requested devices from the container image
- Add EnvVars option for all nvidia-ctk cdi commands
- Add nvidia-cdi-refresh service
- Add discovery of arch-specific vulkan ICD
- Add disabled-device-node-modification hook to CDI spec
- Add a hook to disable device node creation in a container
- Remove redundant deduplication of search paths for WSL
- Added ability to disable specific (or all) CDI hooks
- Consolidate HookName functionality on internal/discover pkg
- Add envvar to control debug logging in CDI hooks
- Add FeatureFlags to the nvcdi API
- Reenable nvsandboxutils for driver discovery
- Edit discover.mounts to have a deterministic output
- Refactor the way we create CDI Hooks
- Issue warning on unsupported CDI hook
- Run update-ldcache in isolated namespaces
- Add cuda-compat-mode config option
- Fix mode detection on Thor-based systems
- Add rprivate to CDI mount options
- Skip nil discoverers in merge
- bump runc go dep to v1.3.0
- Fix resolution of libs in LDCache on ARM
- Updated .release:staging to stage images in nvstaging
- Refactor toolkit installer
- Allow container runtime executable path to be specified
- Add support for building ubuntu22.04 on arm64
- Fix race condition in mounts cache
- Add support for building ubuntu22.04 on amd64
- Fix update-ldcache arguments
- Remove positional arguments from nvidia-ctk-installer
- Remove deprecated --runtime-args from nvidia-ctk-installer
- Add version info to nvidia-ctk-installer
- Update nvidia-ctk-installer app name to match binary name
- Allow nvidia-ctk config --set to accept comma-separated lists
- Disable enable-cuda-compat hook for management containers
- Allow enable-cuda-compat hook to be disabled in CDI spec generation
- Add disable-cuda-compat-lib-hook feature flag
- Add basic integration tests for forward compat
- Ensure that mode hook is executed last
- Add enable-cuda-compat hook to CDI spec generation
- Add ldconfig hook in legacy mode
- Add enable-cuda-compat hook if required
- Add enable-cuda-compat hook to allow compat libs to be discovered
- Use libcontainer execseal to run ldconfig
- Add ignore-imex-channel-requests feature flag
- Disable nvsandboxutils in nvcdi API
- Allow cdi mode to work with --gpus flag
- Add E2E GitHub Action for Container Toolkit
- Add remote-test option for E2E
- Enable CDI in runtime if CDI_ENABLED is set
- Fix overwriting docker feature flags
- Add option in toolkit container to enable CDI in runtime
- Remove Set from engine config API
- Add EnableCDI() method to engine.Interface
- Add IMEX binaries to CDI discovery
- Rename test folder to tests
- Add allow-cuda-compat-libs-from-container feature flag
- Disable mounting of compat libs from container
- Skip graphics modifier in CSV mode
- Move nvidia-toolkit to nvidia-ctk-installer
- Automated regression testing for the NVIDIA Container Toolkit
- Add support for containerd version 3 config
- Remove watch option from create-dev-char-symlinks
- Add string TOML source
- Improve the implementation for UseLegacyConfig
- Properly pass configSearchPaths to a Driver constructor
- Fix create-device-node test when devices exist
- Add imex mode to CDI spec generation
- Only allow host-relative LDConfig paths
- Fix NVIDIA_IMEX_CHANNELS handling on legacy images
- Fix bug in default config file path
- Fix fsnotify.Remove logic function.
- Force symlink creation in create-symlink hook
### Changes in the Toolkit Container
- Create /work/nvidia-toolkit symlink
- Use Apache license for images
- Switch to golang distroless image
- Switch to cuda ubi9 base image
- Use single version tag for image
- Extract deb and rpm packages to single image
- Bump nvidia/cuda in /deployments/container
- Bump nvidia/cuda in /deployments/container
- Add E2E GitHub Action for Container Toolkit
- Bump nvidia/cuda in /deployments/container
- Move nvidia-toolkit to nvidia-ctk-installer
- Add support for containerd version 3 config
- Improve the implementation for UseLegacyConfig
- Bump nvidia/cuda in /deployments/container
- Add imex mode to CDI spec generation
- Only allow host-relative LDConfig paths
- Fallback to file for runtime config
### Changes in libnvidia-container
- Fix pointer accessing local variable out of scope
- Require version match between libnvidia-container-tools and libnvidia-container1
- Add libnvidia-gpucomp.so to the list of compute libs
- Use VERSION_ prefix for version parts in makefiles
- Add additional logging
- Do not discard container flags when --cuda-compat-mode is not specified
- Remove unneeded --no-cntlibs argument from list command
- Add cuda-compat-mode flag to configure command
- Skip files when user has insufficient permissions
- Fix building with Go 1.24
- Add no-cntlibs CLI option to nvidia-container-cli
- Fix always using fallback
- Add fallback for systems without memfd_create()
- Create virtual copy of host ldconfig binary before calling fexecve()
- Fix some typos in text.
## v1.17.0
- Promote v1.17.0-rc.2 to v1.17.0
- Fix bug when using just-in-time CDI spec generation
- Check for valid paths in create-symlinks hook
## v1.17.0-rc.2
- Fix bug in locating libcuda.so from ldcache
- Fix bug in sorting of symlink chain
- Remove unsupported print-ldcache command
- Remove csv-filename support from create-symlinks
### Changes in the Toolkit Container
- Fallback to `crio-status` if `crio status` does not work when configuring the crio runtime
## v1.17.0-rc.1
- Allow IMEX channels to be requested as volume mounts
- Fix typo in error message
- Add disable-imex-channel-creation feature flag
- Add -z,lazy to LDFLAGS
- Add imex channels to management CDI spec
- Add support to fetch current container runtime config from the command line.
- Add creation of select driver symlinks to CDI spec generation.
- Remove support for config overrides when configuring runtimes.
- Skip explicit creation of libnvidia-allocator.so.1 symlink
- Add vdpau as as a driver library search path.
- Add support for using libnvsandboxutils to generate CDI specifications.
### Changes in the Toolkit Container
- Allow opt-in features to be selected when deploying the toolkit-container.
- Bump CUDA base image version to 12.6.2
- Remove support for config overrides when configuring runtimes.
### Changes in libnvidia-container
- Add no-create-imex-channels command line option.
## v1.16.2
- Exclude libnvidia-allocator from graphics mounts. This fixes a bug that leaks mounts when a container is started with bi-directional mount propagation.
- Use empty string for default runtime-config-override. This removes a redundant warning for runtimes (e.g. Docker) where this is not applicable.
### Changes in the Toolkit Container
- Bump CUDA base image version to 12.6.0
### Changes in libnvidia-container
- Add no-gsp-firmware command line option
- Add no-fabricmanager command line option
- Add no-persistenced command line option
- Skip directories and symlinks when mounting libraries.
## v1.16.1
- Fix bug with processing errors during CDI spec generation for MIG devices
## v1.16.0
- Promote v1.16.0-rc.2 to v1.16.0
### Changes in the Toolkit Container
- Bump CUDA base image version to 12.5.1
## v1.16.0-rc.2
- Use relative path to locate driver libraries
- Add RelativeToRoot function to Driver
- Inject additional libraries for full X11 functionality
- Extract options from default runtime if runc does not exist
- Avoid using map pointers as maps are always passed by reference
- Reduce logging for the NVIDIA Container runtime
- Fix bug in argument parsing for logger creation
## v1.16.0-rc.1
- Support vulkan ICD files directly in a driver root. This allows for the discovery of vulkan files in GKE driver installations.
- Increase priority of ld.so.conf.d config file injected into container. This ensures that injected libraries are preferred over libraries present in the container.
- Set default CDI spec permissions to 644. This fixes permission issues when using the `nvidia-ctk cdi transform` functions.
- Add `dev-root` option to `nvidia-ctk system create-device-nodes` command.
- Fix location of `libnvidia-ml.so.1` when a non-standard driver root is used. This enabled CDI spec generation when using the driver container on a host.
- Recalculate minimum required CDI spec version on save.
- Move `nvidia-ctk hook` commands to a separate `nvidia-cdi-hook` binary. The same subcommands are supported.
- Use `:` as an `nvidia-ctk config --set` list separator. This fixes a bug when trying to set config options that are lists.
- [toolkit-container] Bump CUDA base image version to 12.5.0
- [toolkit-container] Allow the path to `toolkit.pid` to be specified directly.
- [toolkit-container] Remove provenance information from image manifests.
- [toolkit-container] Add `dev-root` option when configuring the toolkit. This adds support for GKE driver installations.
## v1.15.0
* Remove `nvidia-container-runtime` and `nvidia-docker2` packages.
* Use `XDG_DATA_DIRS` environment variable when locating config files such as graphics config files.
* Add support for v0.7.0 Container Device Interface (CDI) specification.
* Add `--config-search-path` option to `nvidia-ctk cdi generate` command. These paths are used when locating driver files such as graphics config files.
* Use D3DKMTEnumAdapters3 to enumerate adpaters on WSL2 if available.
* Add support for v1.2.0 OCI Runtime specification.
* Explicitly set `NVIDIA_VISIBLE_DEVICES=void` in generated CDI specifications. This prevents the NVIDIA Container Runtime from making additional modifications.
* [libnvidia-container] Use D3DKMTEnumAdapters3 to enumerate adpaters on WSL2 if available.
* [toolkit-container] Bump CUDA base image version to 12.4.1
## v1.15.0-rc.4
* Add a `--spec-dir` option to the `nvidia-ctk cdi generate` command. This allows specs outside of `/etc/cdi` and `/var/run/cdi` to be processed.
* Add support for extracting device major number from `/proc/devices` if `nvidia` is used as a device name over `nvidia-frontend`.
* Allow multiple device naming strategies for `nvidia-ctk cdi generate` command. This allows a single
CDI spec to be generated that includes GPUs by index and UUID.
* Set the default `--device-name-strategy` for the `nvidia-ctk cdi generate` command to `[index, uuid]`.
* Remove `libnvidia-container0` jetpack dependency included for legacy Tegra-based systems.
* Add `NVIDIA_VISIBLE_DEVICES=void` to generated CDI specifications.
* [toolkit-container] Remove centos7 image. The ubi8 image can be used on all RPM-based platforms.
* [toolkit-container] Bump CUDA base image version to 12.3.2
## v1.15.0-rc.3
* Fix bug in `nvidia-ctk hook update-ldcache` where default `--ldconfig-path` value was not applied.
## v1.15.0-rc.2
* Extend the `runtime.nvidia.com/gpu` CDI kind to support full-GPUs and MIG devices specified by index or UUID.
* Fix bug when specifying `--dev-root` for Tegra-based systems.
* Log explicitly requested runtime mode.
* Remove package dependency on libseccomp.
* Added detection of libnvdxgdmal.so.1 on WSL2
* Use devRoot to resolve MIG device nodes.
* Fix bug in determining default nvidia-container-runtime.user config value on SUSE-based systems.
* Add `crun` to the list of configured low-level runtimes.
* Added support for `--ldconfig-path` to `nvidia-ctk cdi generate` command.
* Fix `nvidia-ctk runtime configure --cdi.enabled` for Docker.
* Add discovery of the GDRCopy device (`gdrdrv`) if the `NVIDIA_GDRCOPY` environment variable of the container is set to `enabled`
* [toolkit-container] Bump CUDA base image version to 12.3.1.
## v1.15.0-rc.1
* Skip update of ldcache in containers without ldconfig. The .so.SONAME symlinks are still created.
* Normalize ldconfig path on use. This automatically adjust the ldconfig setting applied to ldconfig.real on systems where this exists.
* Include `nvidia/nvoptix.bin` in list of graphics mounts.
* Include `vulkan/icd.d/nvidia_layers.json` in list of graphics mounts.
* Add support for `--library-search-paths` to `nvidia-ctk cdi generate` command.
* Add support for injecting /dev/nvidia-nvswitch* devices if the NVIDIA_NVSWITCH=enabled envvar is specified.
* Added support for `nvidia-ctk runtime configure --enable-cdi` for the `docker` runtime. Note that this requires Docker >= 25.
* Fixed bug in `nvidia-ctk config` command when using `--set`. The types of applied config options are now applied correctly.
* Add `--relative-to` option to `nvidia-ctk transform root` command. This controls whether the root transformation is applied to host or container paths.
* Added automatic CDI spec generation when the `runtime.nvidia.com/gpu=all` device is requested by a container.
* [libnvidia-container] Fix device permission check when using cgroupv2 (fixes #227)
## v1.14.3
* [toolkit-container] Bump CUDA base image version to 12.2.2.
## v1.14.2
* Fix bug on Tegra-based systems where symlinks were not created in containers.
* Add --csv.ignore-pattern command line option to nvidia-ctk cdi generate command.
## v1.14.1
* Fixed bug where contents of `/etc/nvidia-container-runtime/config.toml` is ignored by the NVIDIA Container Runtime Hook.
* [libnvidia-container] Use libelf.so on RPM-based systems due to removed mageia repositories hosting pmake and bmake.
## v1.14.0
* Promote v1.14.0-rc.3 to v1.14.0
## v1.14.0-rc.3
* Added support for generating OCI hook JSON file to `nvidia-ctk runtime configure` command.
* Remove installation of OCI hook JSON from RPM package.
* Refactored config for `nvidia-container-runtime-hook`.
* Added a `nvidia-ctk config` command which supports setting config options using a `--set` flag.
* Added `--library-search-path` option to `nvidia-ctk cdi generate` command in `csv` mode. This allows folders where
libraries are located to be specified explicitly.
* Updated go-nvlib to support devices which are not present in the PCI device database. This allows the creation of dev/char symlinks on systems with such devices installed.
* Added `UsesNVGPUModule` info function for more robust platform detection. This is required on Tegra-based systems where libnvidia-ml.so is also supported.
* [toolkit-container] Set `NVIDIA_VISIBLE_DEVICES=void` to prevent injection of NVIDIA devices and drivers into the NVIDIA Container Toolkit container.
## v1.14.0-rc.2
* Fix bug causing incorrect nvidia-smi symlink to be created on WSL2 systems with multiple driver roots.
* Remove dependency on coreutils when installing package on RPM-based systems.
* Create output folders if required when running `nvidia-ctk runtime configure`
* Generate default config as post-install step.
* Added support for detecting GSP firmware at custom paths when generating CDI specifications.
* Added logic to skip the extraction of image requirements if `NVIDIA_DISABLE_REQUIRES` is set to `true`.
* [libnvidia-container] Include Shared Compiler Library (libnvidia-gpucomp.so) in the list of compute libaries.
* [toolkit-container] Ensure that common envvars have higher priority when configuring the container engines.
* [toolkit-container] Bump CUDA base image version to 12.2.0.
* [toolkit-container] Remove installation of nvidia-experimental runtime. This is superceded by the NVIDIA Container Runtime in CDI mode.
## v1.14.0-rc.1
* Add support for updating containerd configs to the `nvidia-ctk runtime configure` command.
* Create file in `etc/ld.so.conf.d` with permissions `644` to support non-root containers.
* Generate CDI specification files with `644` permissions to allow rootless applications (e.g. podman)
* Add `nvidia-ctk cdi list` command to show the known CDI devices.
* Add support for generating merged devices (e.g. `all` device) to the nvcdi API.
* Use *.* pattern to locate libcuda.so when generating a CDI specification to support platforms where a patch version is not specified.
* Update go-nvlib to skip devices that are not MIG capable when generating CDI specifications.
* Add `nvidia-container-runtime-hook.path` config option to specify NVIDIA Container Runtime Hook path explicitly.
* Fix bug in creation of `/dev/char` symlinks by failing operation if kernel modules are not loaded.
* Add option to load kernel modules when creating device nodes
* Add option to create device nodes when creating `/dev/char` symlinks
* [libnvidia-container] Support OpenSSL 3 with the Encrypt/Decrypt library
* [toolkit-container] Allow same envars for all runtime configs
## v1.13.1
* Update `update-ldcache` hook to only update ldcache if it exists.
* Update `update-ldcache` hook to create `/etc/ld.so.conf.d` folder if it doesn't exist.
* Fix failure when libcuda cannot be located during XOrg library discovery.
* Fix CDI spec generation on systems that use `/etc/alternatives` (e.g. Debian)
## v1.13.0
* Promote 1.13.0-rc.3 to 1.13.0
## v1.13.0-rc.3
* Only initialize NVML for modes that require it when runing `nvidia-ctk cdi generate`.
@@ -9,12 +356,18 @@
* Generate a simplified CDI specification by default. This means that entities in the common edits in a spec are not included in device definitions.
* Also return an error from the nvcdi.New constructor instead of panicing.
* Detect XOrg libraries for injection and CDI spec generation.
* Add `nvidia-container-runtime.modes.cdi.annotation-prefixes` config option that allows the CDI annotation prefixes that are read to be overridden.
* Add `nvidia-ctk system create-device-nodes` command to create control devices.
* Add `nvidia-ctk cdi transform` command to apply transforms to CDI specifications.
* Add `--vendor` and `--class` options to `nvidia-ctk cdi generate`
* [libnvidia-container] Fix segmentation fault when RPC initialization fails.
* [libnvidia-container] Build centos variants of the NVIDIA Container Library with static libtirpc v1.3.2.
* [libnvidia-container] Remove make targets for fedora35 as the centos8 packages are compatible.
* [toolkit-container] Add `nvidia-container-runtime.modes.cdi.annotation-prefixes` config option that allows the CDI annotation prefixes that are read to be overridden.
* [toolkit-container] Create device nodes when generating CDI specification for management containers.
* [toolkit-container] Add `nvidia-container-runtime.runtimes` config option to set the low-level runtime for the NVIDIA Container Runtime
## v1.13.0-rc.2
* Don't fail chmod hook if paths are not injected

View File

@@ -19,7 +19,7 @@ where `TARGET` is a make target that is valid for each of the sub-components.
These include:
* `ubuntu18.04-amd64`
* `centos8-x86_64`
* `centos7-x86_64`
If no `TARGET` is specified, all valid release targets are built.
@@ -34,7 +34,7 @@ environment variables.
## Testing packages locally
The [test/release](./test/release/) folder contains documentation on how the installation of local or staged packages can be tested.
The [tests/release](./tests/release/) folder contains documentation on how the installation of local or staged packages can be tested.
## Releasing

142
Jenkinsfile vendored
View File

@@ -1,142 +0,0 @@
/*
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
*/
podTemplate (cloud:'sw-gpu-cloudnative',
containers: [
containerTemplate(name: 'docker', image: 'docker:dind', ttyEnabled: true, privileged: true),
containerTemplate(name: 'golang', image: 'golang:1.16.3', ttyEnabled: true)
]) {
node(POD_LABEL) {
def scmInfo
stage('checkout') {
scmInfo = checkout(scm)
}
stage('dependencies') {
container('golang') {
sh 'GO111MODULE=off go get -u github.com/client9/misspell/cmd/misspell'
sh 'GO111MODULE=off go get -u github.com/gordonklaus/ineffassign'
sh 'GO111MODULE=off go get -u golang.org/x/lint/golint'
}
container('docker') {
sh 'apk add --no-cache make bash git'
}
}
stage('check') {
parallel (
getGolangStages(["assert-fmt", "lint", "vet", "ineffassign", "misspell"])
)
}
stage('test') {
parallel (
getGolangStages(["test"])
)
}
def versionInfo
stage('version') {
container('docker') {
versionInfo = getVersionInfo(scmInfo)
println "versionInfo=${versionInfo}"
}
}
def dist = 'ubuntu20.04'
def arch = 'amd64'
def stageLabel = "${dist}-${arch}"
stage('build-one') {
container('docker') {
stage (stageLabel) {
sh "make ${dist}-${arch}"
}
}
}
stage('release') {
container('docker') {
stage (stageLabel) {
def component = 'main'
def repository = 'sw-gpu-cloudnative-debian-local/pool/main/'
def uploadSpec = """{
"files":
[ {
"pattern": "./dist/${dist}/${arch}/*.deb",
"target": "${repository}",
"props": "deb.distribution=${dist};deb.component=${component};deb.architecture=${arch}"
}
]
}"""
sh "echo starting release with versionInfo=${versionInfo}"
if (versionInfo.isTag) {
// upload to artifactory repository
def server = Artifactory.server 'sw-gpu-artifactory'
server.upload spec: uploadSpec
} else {
sh "echo skipping release for non-tagged build"
}
}
}
}
}
}
def getGolangStages(def targets) {
stages = [:]
for (t in targets) {
stages[t] = getLintClosure(t)
}
return stages
}
def getLintClosure(def target) {
return {
container('golang') {
stage(target) {
sh "make ${target}"
}
}
}
}
// getVersionInfo returns a hash of version info
def getVersionInfo(def scmInfo) {
def versionInfo = [
isTag: isTag(scmInfo.GIT_BRANCH)
]
scmInfo.each { k, v -> versionInfo[k] = v }
return versionInfo
}
def isTag(def branch) {
if (!branch.startsWith('v')) {
return false
}
def version = shOutput('git describe --all --exact-match --always')
return version == "tags/${branch}"
}
def shOuptut(def script) {
return sh(script: script, returnStdout: true).trim()
}

116
Makefile
View File

@@ -38,8 +38,8 @@ EXAMPLE_TARGETS := $(patsubst %,example-%, $(EXAMPLES))
CMDS := $(patsubst ./cmd/%/,%,$(sort $(dir $(wildcard ./cmd/*/))))
CMD_TARGETS := $(patsubst %,cmd-%, $(CMDS))
CHECK_TARGETS := assert-fmt vet lint ineffassign misspell
MAKE_TARGETS := binaries build check fmt lint-internal test examples cmds coverage generate licenses $(CHECK_TARGETS)
CHECK_TARGETS := lint
MAKE_TARGETS := binaries build check fmt test examples cmds coverage generate licenses vendor check-vendor $(CHECK_TARGETS)
TARGETS := $(MAKE_TARGETS) $(EXAMPLE_TARGETS) $(CMD_TARGETS)
@@ -53,22 +53,26 @@ CLI_VERSION = $(VERSION)
endif
CLI_VERSION_PACKAGE = github.com/NVIDIA/nvidia-container-toolkit/internal/info
GOOS ?= linux
binaries: cmds
ifneq ($(PREFIX),)
cmd-%: COMMAND_BUILD_OPTIONS = -o $(PREFIX)/$(*)
endif
cmds: $(CMD_TARGETS)
ifneq ($(shell uname),Darwin)
EXTLDFLAGS = -Wl,--export-dynamic -Wl,--unresolved-symbols=ignore-in-object-files -Wl,-z,lazy
else
EXTLDFLAGS = -Wl,-undefined,dynamic_lookup
endif
$(CMD_TARGETS): cmd-%:
GOOS=$(GOOS) go build -ldflags "-extldflags=-Wl,-z,lazy -s -w -X $(CLI_VERSION_PACKAGE).gitCommit=$(GIT_COMMIT) -X $(CLI_VERSION_PACKAGE).version=$(CLI_VERSION)" $(COMMAND_BUILD_OPTIONS) $(MODULE)/cmd/$(*)
go build -ldflags "-s -w '-extldflags=$(EXTLDFLAGS)' -X $(CLI_VERSION_PACKAGE).gitCommit=$(GIT_COMMIT) -X $(CLI_VERSION_PACKAGE).version=$(CLI_VERSION)" $(COMMAND_BUILD_OPTIONS) $(MODULE)/cmd/$(*)
build:
GOOS=$(GOOS) go build ./...
go build ./...
examples: $(EXAMPLE_TARGETS)
$(EXAMPLE_TARGETS): example-%:
GOOS=$(GOOS) go build ./examples/$(*)
go build ./examples/$(*)
all: check test build binary
check: $(CHECK_TARGETS)
@@ -78,71 +82,75 @@ fmt:
go list -f '{{.Dir}}' $(MODULE)/... \
| xargs gofmt -s -l -w
assert-fmt:
go list -f '{{.Dir}}' $(MODULE)/... \
| xargs gofmt -s -l > fmt.out
@if [ -s fmt.out ]; then \
echo "\nERROR: The following files are not formatted:\n"; \
cat fmt.out; \
rm fmt.out; \
exit 1; \
else \
rm fmt.out; \
fi
ineffassign:
ineffassign $(MODULE)/...
# Apply goimports -local github.com/NVIDIA/container-toolkit to the codebase
goimports:
go list -f {{.Dir}} $(MODULE)/... \
| xargs goimports -local $(MODULE) -w
lint:
# We use `go list -f '{{.Dir}}' $(MODULE)/...` to skip the `vendor` folder.
go list -f '{{.Dir}}' $(MODULE)/... | xargs golint -set_exit_status
golangci-lint run ./...
misspell:
misspell $(MODULE)/...
vendor: | mod-tidy mod-vendor mod-verify
vet:
go vet $(MODULE)/...
mod-tidy:
@for mod in $$(find . -name go.mod -not -path "./testdata/*" -not -path "./third_party/*"); do \
echo "Tidying $$mod..."; ( \
cd $$(dirname $$mod) && go mod tidy \
) || exit 1; \
done
mod-vendor:
@for mod in $$(find . -name go.mod -not -path "./testdata/*" -not -path "./third_party/*" -not -path "./deployments/*"); do \
echo "Vendoring $$mod..."; ( \
cd $$(dirname $$mod) && go mod vendor \
) || exit 1; \
done
mod-verify:
@for mod in $$(find . -name go.mod -not -path "./testdata/*" -not -path "./third_party/*"); do \
echo "Verifying $$mod..."; ( \
cd $$(dirname $$mod) && go mod verify | sed 's/^/ /g' \
) || exit 1; \
done
check-vendor: vendor
git diff --exit-code HEAD -- go.mod go.sum vendor
licenses:
go-licenses csv $(MODULE)/...
COVERAGE_FILE := coverage.out
test: build cmds
go test -v -coverprofile=$(COVERAGE_FILE) $(MODULE)/...
go test -coverprofile=$(COVERAGE_FILE).with-mocks $(MODULE)/...
coverage: test
cat $(COVERAGE_FILE) | grep -v "_mock.go" > $(COVERAGE_FILE).no-mocks
go tool cover -func=$(COVERAGE_FILE).no-mocks
cat $(COVERAGE_FILE).with-mocks | grep -v "_mock.go" > $(COVERAGE_FILE)
go tool cover -func=$(COVERAGE_FILE)
generate:
go generate $(MODULE)/...
# Generate an image for containerized builds
# Note: This image is local only
.PHONY: .build-image .pull-build-image .push-build-image
.build-image: docker/Dockerfile.devel
if [ x"$(SKIP_IMAGE_BUILD)" = x"" ]; then \
$(DOCKER) build \
--progress=plain \
--build-arg GOLANG_VERSION="$(GOLANG_VERSION)" \
--tag $(BUILDIMAGE) \
-f $(^) \
docker; \
fi
.PHONY: .build-image
.build-image:
make -f deployments/devel/Makefile .build-image
.pull-build-image:
$(DOCKER) pull $(BUILDIMAGE)
ifeq ($(BUILD_DEVEL_IMAGE),yes)
$(DOCKER_TARGETS): .build-image
.shell: .build-image
endif
.push-build-image:
$(DOCKER) push $(BUILDIMAGE)
$(DOCKER_TARGETS): docker-%: .build-image
@echo "Running 'make $(*)' in docker container $(BUILDIMAGE)"
$(DOCKER_TARGETS): docker-%:
@echo "Running 'make $(*)' in container image $(BUILDIMAGE)"
$(DOCKER) run \
--rm \
-e GOCACHE=/tmp/.cache \
-v $(PWD):$(PWD) \
-w $(PWD) \
-e GOCACHE=/tmp/.cache/go \
-e GOMODCACHE=/tmp/.cache/gomod \
-e GOLANGCI_LINT_CACHE=/tmp/.cache/golangci-lint \
-v $(PWD):/work \
-w /work \
--user $$(id -u):$$(id -g) \
$(BUILDIMAGE) \
make $(*)
@@ -153,8 +161,10 @@ PHONY: .shell
$(DOCKER) run \
--rm \
-ti \
-e GOCACHE=/tmp/.cache \
-v $(PWD):$(PWD) \
-w $(PWD) \
-e GOCACHE=/tmp/.cache/go \
-e GOMODCACHE=/tmp/.cache/gomod \
-e GOLANGCI_LINT_CACHE=/tmp/.cache/golangci-lint \
-v $(PWD):/work \
-w /work \
--user $$(id -u):$$(id -g) \
$(BUILDIMAGE)

View File

@@ -28,4 +28,4 @@ The [user guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolk
[Checkout the Contributing document!](CONTRIBUTING.md)
* Please let us know by [filing a new issue](https://github.com/NVIDIA/nvidia-container-toolkit/issues/new)
* You can contribute by creating a [merge request](https://gitlab.com/nvidia/container-toolkit/container-toolkit/-/merge_requests/new) to our public GitLab repository
* You can contribute by creating a [pull request](https://github.com/NVIDIA/nvidia-container-toolkit/compare) to our public GitHub repository

36
RELEASE.md Normal file
View File

@@ -0,0 +1,36 @@
# Release Process
The NVIDIA Container Toolkit consists of the following artifacts:
- The NVIDIA Container Toolkit container
- Packages for debian-based systems
- Packages for rpm-based systems
# Release Process Checklist:
- [ ] Create a release PR:
- [ ] Run the `./hack/prepare-release.sh` script to update the version in all the needed files. This also creates a [release issue](https://github.com/NVIDIA/cloud-native-team/issues?q=is%3Aissue+is%3Aopen+label%3Arelease)
- [ ] Run the `./hack/generate-changelog.sh` script to generate the a draft changelog and update `CHANGELOG.md` with the changes.
- [ ] Create a PR from the created `bump-release-{{ .VERSION }}` branch.
- [ ] Merge the release PR
- [ ] Tag the release and push the tag to the `internal` mirror:
- [ ] Image release pipeline: https://gitlab-master.nvidia.com/dl/container-dev/container-toolkit/-/pipelines/16466098
- [ ] Wait for the image release to complete.
- [ ] Push the tag to the the upstream GitHub repo.
- [ ] Wait for the [`Release`](https://github.com/NVIDIA/k8s-device-plugin/actions/workflows/release.yaml) GitHub Action to complete
- [ ] Publish the [draft release](https://github.com/NVIDIA/k8s-device-plugin/releases) created by the GitHub Action
- [ ] Publish the packages to the gh-pages branch of the libnvidia-container repo
- [ ] Create a KitPick
## Troubleshooting
*Note*: This assumes that we have the release tag checked out locally.
- If the `Release` GitHub Action fails:
- Check the logs for the error first.
- Create the helm packages locally by running:
```bash
./hack/prepare-artifacts.sh {{ .VERSION }}
```
- Create the draft release by running:
```bash
./hack/create-release.sh {{ .VERSION }}
```

24
SECURITY.md Normal file
View File

@@ -0,0 +1,24 @@
# Security
NVIDIA is dedicated to the security and trust of our software products and services, including all source code repositories managed through our organization.
If you need to report a security issue, please use the appropriate contact points outlined below. **Please do not report security vulnerabilities through GitHub.**
## Reporting Potential Security Vulnerability in an NVIDIA Product
To report a potential security vulnerability in any NVIDIA product:
- Web: [Security Vulnerability Submission Form](https://www.nvidia.com/object/submit-security-vulnerability.html)
- E-Mail: psirt@nvidia.com
- We encourage you to use the following PGP key for secure email communication: [NVIDIA public PGP Key for communication](https://www.nvidia.com/en-us/security/pgp-key)
- Please include the following information:
- Product/Driver name and version/branch that contains the vulnerability
- Type of vulnerability (code execution, denial of service, buffer overflow, etc.)
- Instructions to reproduce the vulnerability
- Proof-of-concept or exploit code
- Potential impact of the vulnerability, including how an attacker could exploit the vulnerability
While NVIDIA currently does not have a bug bounty program, we do offer acknowledgement when an externally reported security issue is addressed under our coordinated vulnerability disclosure policy. Please visit our [Product Security Incident Response Team (PSIRT)](https://www.nvidia.com/en-us/security/psirt-policies/) policies page for more information.
## NVIDIA Product Security
For all security-related concerns, please visit NVIDIA's Product Security portal at https://www.nvidia.com/en-us/security

View File

@@ -1,97 +0,0 @@
# Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
ARG BASE_DIST
ARG CUDA_VERSION
ARG GOLANG_VERSION=x.x.x
ARG VERSION="N/A"
# NOTE: In cases where the libc version is a concern, we would have to use an
# image based on the target OS to build the golang executables here -- especially
# if cgo code is included.
FROM golang:${GOLANG_VERSION} as build
# We override the GOPATH to ensure that the binaries are installed to
# /artifacts/bin
ARG GOPATH=/artifacts
# Install the experiemental nvidia-container-runtime
# NOTE: This will be integrated into the nvidia-container-toolkit package / repo
ARG NVIDIA_CONTAINER_RUNTIME_EXPERIMENTAL_VERSION=experimental
RUN GOPATH=/artifacts go install github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-container-runtime.experimental@${NVIDIA_CONTAINER_RUNTIME_EXPERIMENTAL_VERSION}
WORKDIR /build
COPY . .
# NOTE: Until the config utilities are properly integrated into the
# nvidia-container-toolkit repository, these are built from the `tools` folder
# and not `cmd`.
RUN GOPATH=/artifacts go install -ldflags="-s -w -X 'main.Version=${VERSION}'" ./tools/...
FROM nvidia/cuda:${CUDA_VERSION}-base-${BASE_DIST}
ARG BASE_DIST
# See https://www.centos.org/centos-linux-eol/
# and https://stackoverflow.com/a/70930049 for move to vault.centos.org
# and https://serverfault.com/questions/1093922/failing-to-run-yum-update-in-centos-8 for move to vault.epel.cloud
RUN [[ "${BASE_DIST}" != "centos8" ]] || \
( \
sed -i 's/mirrorlist/#mirrorlist/g' /etc/yum.repos.d/CentOS-Linux-* && \
sed -i 's|#baseurl=http://mirror.centos.org|baseurl=http://vault.epel.cloud|g' /etc/yum.repos.d/CentOS-Linux-* \
)
ENV NVIDIA_DISABLE_REQUIRE="true"
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=utility
ARG ARTIFACTS_ROOT
ARG PACKAGE_DIST
COPY ${ARTIFACTS_ROOT}/${PACKAGE_DIST} /artifacts/packages/${PACKAGE_DIST}
WORKDIR /artifacts/packages
ARG PACKAGE_VERSION
ARG TARGETARCH
ENV PACKAGE_ARCH ${TARGETARCH}
RUN PACKAGE_ARCH=${PACKAGE_ARCH/amd64/x86_64} && PACKAGE_ARCH=${PACKAGE_ARCH/arm64/aarch64} && \
yum localinstall -y \
${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container1-1.*.rpm \
${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container-tools-1.*.rpm \
${PACKAGE_DIST}/${PACKAGE_ARCH}/nvidia-container-toolkit*-${PACKAGE_VERSION}*.rpm
WORKDIR /work
COPY --from=build /artifacts/bin /work
ENV PATH=/work:$PATH
LABEL io.k8s.display-name="NVIDIA Container Runtime Config"
LABEL name="NVIDIA Container Runtime Config"
LABEL vendor="NVIDIA"
LABEL version="${VERSION}"
LABEL release="N/A"
LABEL summary="Automatically Configure your Container Runtime for GPU support."
LABEL description="See summary"
RUN mkdir /licenses && mv /NGC-DL-CONTAINER-LICENSE /licenses/NGC-DL-CONTAINER-LICENSE
# Install / upgrade packages here that are required to resolve CVEs
ARG CVE_UPDATES
RUN if [ -n "${CVE_UPDATES}" ]; then \
yum update -y ${CVE_UPDATES} && \
rm -rf /var/cache/yum/*; \
fi
ENTRYPOINT ["/work/nvidia-toolkit"]

View File

@@ -1,41 +0,0 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
ARG BASE_DIST
ARG CUDA_VERSION
ARG GOLANG_VERSION=x.x.x
FROM nvidia/cuda:${CUDA_VERSION}-base-${BASE_DIST}
ARG ARTIFACTS_ROOT
COPY ${ARTIFACTS_ROOT} /artifacts/packages/
WORKDIR /artifacts/packages
# build-args are added to the manifest.txt file below.
ARG BASE_DIST
ARG PACKAGE_DIST
ARG PACKAGE_VERSION
ARG GIT_BRANCH
ARG GIT_COMMIT
ARG GIT_COMMIT_SHORT
ARG SOURCE_DATE_EPOCH
ARG VERSION
# Create a manifest.txt file with the absolute paths of all deb and rpm packages in the container
RUN echo "#IMAGE_EPOCH=$(date '+%s')" > /artifacts/manifest.txt && \
env | sed 's/^/#/g' >> /artifacts/manifest.txt && \
find /artifacts/packages -iname '*.deb' -o -iname '*.rpm' >> /artifacts/manifest.txt
RUN mkdir /licenses && mv /NGC-DL-CONTAINER-LICENSE /licenses/NGC-DL-CONTAINER-LICENSE

View File

@@ -1,105 +0,0 @@
# Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
ARG BASE_DIST
ARG CUDA_VERSION
ARG GOLANG_VERSION=x.x.x
ARG VERSION="N/A"
# NOTE: In cases where the libc version is a concern, we would have to use an
# image based on the target OS to build the golang executables here -- especially
# if cgo code is included.
FROM golang:${GOLANG_VERSION} as build
# We override the GOPATH to ensure that the binaries are installed to
# /artifacts/bin
ARG GOPATH=/artifacts
# Install the experiemental nvidia-container-runtime
# NOTE: This will be integrated into the nvidia-container-toolkit package / repo
ARG NVIDIA_CONTAINER_RUNTIME_EXPERIMENTAL_VERSION=experimental
RUN GOPATH=/artifacts go install github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-container-runtime.experimental@${NVIDIA_CONTAINER_RUNTIME_EXPERIMENTAL_VERSION}
WORKDIR /build
COPY . .
# NOTE: Until the config utilities are properly integrated into the
# nvidia-container-toolkit repository, these are built from the `tools` folder
# and not `cmd`.
RUN GOPATH=/artifacts go install -ldflags="-s -w -X 'main.Version=${VERSION}'" ./tools/...
FROM nvcr.io/nvidia/cuda:${CUDA_VERSION}-base-${BASE_DIST}
# Remove the CUDA repository configurations to avoid issues with rotated GPG keys
RUN rm -f /etc/apt/sources.list.d/cuda.list
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
libcap2 \
curl \
&& \
rm -rf /var/lib/apt/lists/*
ENV NVIDIA_DISABLE_REQUIRE="true"
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=utility
ARG ARTIFACTS_ROOT
ARG PACKAGE_DIST
COPY ${ARTIFACTS_ROOT}/${PACKAGE_DIST} /artifacts/packages/${PACKAGE_DIST}
WORKDIR /artifacts/packages
ARG PACKAGE_VERSION
ARG TARGETARCH
ENV PACKAGE_ARCH ${TARGETARCH}
ARG LIBNVIDIA_CONTAINER_REPO="https://nvidia.github.io/libnvidia-container"
ARG LIBNVIDIA_CONTAINER0_VERSION
RUN if [ "${PACKAGE_ARCH}" = "arm64" ]; then \
curl -L ${LIBNVIDIA_CONTAINER_REPO}/${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container0_${LIBNVIDIA_CONTAINER0_VERSION}_${PACKAGE_ARCH}.deb \
--output ${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container0_${LIBNVIDIA_CONTAINER0_VERSION}_${PACKAGE_ARCH}.deb && \
dpkg -i ${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container0_${LIBNVIDIA_CONTAINER0_VERSION}_${PACKAGE_ARCH}.deb; \
fi
RUN dpkg -i \
${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container1_1.*.deb \
${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container-tools_1.*.deb \
${PACKAGE_DIST}/${PACKAGE_ARCH}/nvidia-container-toolkit*_${PACKAGE_VERSION}*.deb
WORKDIR /work
COPY --from=build /artifacts/bin /work/
ENV PATH=/work:$PATH
LABEL io.k8s.display-name="NVIDIA Container Runtime Config"
LABEL name="NVIDIA Container Runtime Config"
LABEL vendor="NVIDIA"
LABEL version="${VERSION}"
LABEL release="N/A"
LABEL summary="Automatically Configure your Container Runtime for GPU support."
LABEL description="See summary"
RUN mkdir /licenses && mv /NGC-DL-CONTAINER-LICENSE /licenses/NGC-DL-CONTAINER-LICENSE
# Install / upgrade packages here that are required to resolve CVEs
ARG CVE_UPDATES
RUN if [ -n "${CVE_UPDATES}" ]; then \
apt-get update && apt-get upgrade -y ${CVE_UPDATES} && \
rm -rf /var/lib/apt/lists/*; \
fi
ENTRYPOINT ["/work/nvidia-toolkit"]

View File

@@ -0,0 +1,31 @@
# NVIDIA CDI Hook
The CLI `nvidia-cdi-hook` provides container device runtime hook capabilities when
called by a container runtime, as specific in a
[Container Device Interface](https://tags.cncf.io/container-device-interface/blob/main/SPEC.md)
file.
## Generating a CDI
The CDI itself is created for an NVIDIA-capable device using the
[`nvidia-ctk cdi generate`](../nvidia-ctk/) command.
When `nvidia-ctk cdi generate` is run, the CDI specification is generated as a yaml file.
The CDI specification provides instructions for a container runtime to set up devices, files and
other resources for the container prior to starting it. Those instructions
may include executing command-line tools to prepare the filesystem. The execution
of such command-line tools is called a hook.
`nvidia-cdi-hook` is the CLI tool that is expected to be called by the container runtime,
when specified by the CDI file.
See the [`nvidia-ctk` documentation](../nvidia-ctk/README.md) for more information
on generating a CDI file.
## Functionality
The `nvidia-cdi-hook` CLI provides the following functionality:
* `chmod` - Change the permissions of a file or directory inside the directory path to be mounted into a container.
* `create-symlinks` - Create symlinks inside the directory path to be mounted into a container.
* `update-ldcache` - Update the dynamic linker cache inside the directory path to be mounted into a container.

View File

@@ -17,30 +17,33 @@
package chmod
import (
"errors"
"fmt"
"io/fs"
"os"
"path/filepath"
"strconv"
"strings"
"syscall"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
)
type command struct {
logger *logrus.Logger
logger logger.Interface
}
type config struct {
paths cli.StringSlice
mode string
modeStr string
mode fs.FileMode
containerSpec string
}
// NewCommand constructs a chmod command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
}
@@ -66,13 +69,13 @@ func (m command) build() *cli.Command {
c.Flags = []cli.Flag{
&cli.StringSliceFlag{
Name: "path",
Usage: "Specifiy a path to apply the specified mode to",
Usage: "Specify a path to apply the specified mode to",
Destination: &cfg.paths,
},
&cli.StringFlag{
Name: "mode",
Usage: "Specify the file mode",
Destination: &cfg.mode,
Destination: &cfg.modeStr,
},
&cli.StringFlag{
Name: "container-spec",
@@ -85,10 +88,16 @@ func (m command) build() *cli.Command {
}
func validateFlags(c *cli.Context, cfg *config) error {
if strings.TrimSpace(cfg.mode) == "" {
if strings.TrimSpace(cfg.modeStr) == "" {
return fmt.Errorf("a non-empty mode must be specified")
}
modeInt, err := strconv.ParseUint(cfg.modeStr, 8, 32)
if err != nil {
return fmt.Errorf("failed to parse mode as octal: %v", err)
}
cfg.mode = fs.FileMode(modeInt)
for _, p := range cfg.paths.Value() {
if strings.TrimSpace(p) == "" {
return fmt.Errorf("paths must not be empty")
@@ -112,33 +121,38 @@ func (m command) run(c *cli.Context, cfg *config) error {
return fmt.Errorf("empty container root detected")
}
paths := m.getPaths(containerRoot, cfg.paths.Value())
paths := m.getPaths(containerRoot, cfg.paths.Value(), cfg.mode)
if len(paths) == 0 {
m.logger.Debugf("No paths specified; exiting")
return nil
}
locator := lookup.NewExecutableLocator(m.logger, "")
targets, err := locator.Locate("chmod")
if err != nil {
return fmt.Errorf("failed to locate chmod: %v", err)
for _, path := range paths {
err = os.Chmod(path, cfg.mode)
// in some cases this is not an issue (e.g. whole /dev mounted), see #143
if errors.Is(err, fs.ErrPermission) {
m.logger.Debugf("Ignoring permission error with chmod: %v", err)
err = nil
}
}
chmodPath := targets[0]
args := append([]string{filepath.Base(chmodPath), cfg.mode}, paths...)
return syscall.Exec(chmodPath, args, nil)
return err
}
// getPaths updates the specified paths relative to the root.
func (m command) getPaths(root string, paths []string) []string {
func (m command) getPaths(root string, paths []string, desiredMode fs.FileMode) []string {
var pathsInRoot []string
for _, f := range paths {
path := filepath.Join(root, f)
if _, err := os.Stat(path); err != nil {
stat, err := os.Stat(path)
if err != nil {
m.logger.Debugf("Skipping path %q: %v", path, err)
continue
}
if (stat.Mode()&(fs.ModePerm|fs.ModeSetuid|fs.ModeSetgid|fs.ModeSticky))^desiredMode == 0 {
m.logger.Debugf("Skipping path %q: already desired mode", path)
continue
}
pathsInRoot = append(pathsInRoot, path)
}

View File

@@ -0,0 +1,55 @@
/**
# Copyright 2024 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package commands
import (
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-cdi-hook/chmod"
createsonamesymlinks "github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-cdi-hook/create-soname-symlinks"
symlinks "github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-cdi-hook/create-symlinks"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-cdi-hook/cudacompat"
disabledevicenodemodification "github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-cdi-hook/disable-device-node-modification"
ldcache "github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-cdi-hook/update-ldcache"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
)
// New creates the commands associated with supported CDI hooks.
// These are shared by the nvidia-cdi-hook and nvidia-ctk hook commands.
func New(logger logger.Interface) []*cli.Command {
return []*cli.Command{
ldcache.NewCommand(logger),
symlinks.NewCommand(logger),
chmod.NewCommand(logger),
cudacompat.NewCommand(logger),
createsonamesymlinks.NewCommand(logger),
disabledevicenodemodification.NewCommand(logger),
}
}
// IssueUnsupportedHookWarning logs a warning that no hook or an unsupported
// hook has been specified.
// This happens if a subcommand is provided that does not match one of the
// subcommands that has been explicitly specified.
func IssueUnsupportedHookWarning(logger logger.Interface, c *cli.Context) {
args := c.Args().Slice()
if len(args) == 0 {
logger.Warningf("No CDI hook specified")
} else {
logger.Warningf("Unsupported CDI hook: %v", args[0])
}
}

View File

@@ -0,0 +1,166 @@
/**
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package create_soname_symlinks
import (
"errors"
"fmt"
"log"
"os"
"github.com/moby/sys/reexec"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/internal/ldconfig"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
)
const (
reexecUpdateLdCacheCommandName = "reexec-create-soname-symlinks"
)
type command struct {
logger logger.Interface
}
type options struct {
folders cli.StringSlice
ldconfigPath string
containerSpec string
}
func init() {
reexec.Register(reexecUpdateLdCacheCommandName, createSonameSymlinksHandler)
if reexec.Init() {
os.Exit(0)
}
}
// NewCommand constructs an create-soname-symlinks command with the specified logger
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
}
return c.build()
}
// build the create-soname-symlinks command
func (m command) build() *cli.Command {
cfg := options{}
// Create the 'create-soname-symlinks' command
c := cli.Command{
Name: "create-soname-symlinks",
Usage: "Create soname symlinks libraries in specified directories",
Before: func(c *cli.Context) error {
return m.validateFlags(c, &cfg)
},
Action: func(c *cli.Context) error {
return m.run(c, &cfg)
},
}
c.Flags = []cli.Flag{
&cli.StringSliceFlag{
Name: "folder",
Usage: "Specify a directory to generate soname symlinks in. Can be specified multiple times",
Destination: &cfg.folders,
},
&cli.StringFlag{
Name: "ldconfig-path",
Usage: "Specify the path to ldconfig on the host",
Destination: &cfg.ldconfigPath,
Value: "/sbin/ldconfig",
},
&cli.StringFlag{
Name: "container-spec",
Usage: "Specify the path to the OCI container spec. If empty or '-' the spec will be read from STDIN",
Destination: &cfg.containerSpec,
},
}
return &c
}
func (m command) validateFlags(c *cli.Context, cfg *options) error {
if cfg.ldconfigPath == "" {
return errors.New("ldconfig-path must be specified")
}
return nil
}
func (m command) run(c *cli.Context, cfg *options) error {
s, err := oci.LoadContainerState(cfg.containerSpec)
if err != nil {
return fmt.Errorf("failed to load container state: %v", err)
}
containerRootDir, err := s.GetContainerRoot()
if err != nil || containerRootDir == "" || containerRootDir == "/" {
return fmt.Errorf("failed to determined container root: %v", err)
}
cmd, err := ldconfig.NewRunner(
reexecUpdateLdCacheCommandName,
cfg.ldconfigPath,
containerRootDir,
cfg.folders.Value()...,
)
if err != nil {
return err
}
return cmd.Run()
}
// createSonameSymlinksHandler wraps createSonameSymlinks with error handling.
func createSonameSymlinksHandler() {
if err := createSonameSymlinks(os.Args); err != nil {
log.Printf("Error updating ldcache: %v", err)
os.Exit(1)
}
}
// createSonameSymlinks ensures that soname symlinks are created in the
// specified directories.
// It is invoked from a reexec'd handler and provides namespace isolation for
// the operations performed by this hook. At the point where this is invoked,
// we are in a new mount namespace that is cloned from the parent.
//
// args[0] is the reexec initializer function name
// args[1] is the path of the ldconfig binary on the host
// args[2] is the container root directory
// The remaining args are directories where soname symlinks need to be created.
func createSonameSymlinks(args []string) error {
if len(args) < 3 {
return fmt.Errorf("incorrect arguments: %v", args)
}
hostLdconfigPath := args[1]
containerRootDirPath := args[2]
ldconfig, err := ldconfig.New(
hostLdconfigPath,
containerRootDirPath,
)
if err != nil {
return fmt.Errorf("failed to construct ldconfig runner: %w", err)
}
return ldconfig.CreateSonameSymlinks(args[3:]...)
}

View File

@@ -0,0 +1,172 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package symlinks
import (
"errors"
"fmt"
"os"
"path/filepath"
"strings"
"github.com/moby/sys/symlink"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup/symlinks"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
)
type command struct {
logger logger.Interface
}
type config struct {
links cli.StringSlice
containerSpec string
}
// NewCommand constructs a hook command with the specified logger
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
}
return c.build()
}
// build creates the create-symlink command.
func (m command) build() *cli.Command {
cfg := config{}
c := cli.Command{
Name: "create-symlinks",
Usage: "A hook to create symlinks in the container.",
Action: func(c *cli.Context) error {
return m.run(c, &cfg)
},
}
c.Flags = []cli.Flag{
&cli.StringSliceFlag{
Name: "link",
Usage: "Specify a specific link to create. The link is specified as target::link. If the link exists in the container root, it is removed.",
Destination: &cfg.links,
},
// The following flags are testing-only flags.
&cli.StringFlag{
Name: "container-spec",
Usage: "Specify the path to the OCI container spec. If empty or '-' the spec will be read from STDIN. This is only intended for testing.",
Destination: &cfg.containerSpec,
Hidden: true,
},
}
return &c
}
func (m command) run(c *cli.Context, cfg *config) error {
s, err := oci.LoadContainerState(cfg.containerSpec)
if err != nil {
return fmt.Errorf("failed to load container state: %v", err)
}
containerRoot, err := s.GetContainerRoot()
if err != nil {
return fmt.Errorf("failed to determined container root: %v", err)
}
created := make(map[string]bool)
for _, l := range cfg.links.Value() {
if created[l] {
m.logger.Debugf("Link %v already processed", l)
continue
}
parts := strings.Split(l, "::")
if len(parts) != 2 {
return fmt.Errorf("invalid symlink specification %v", l)
}
err := m.createLink(containerRoot, parts[0], parts[1])
if err != nil {
return fmt.Errorf("failed to create link %v: %w", parts, err)
}
created[l] = true
}
return nil
}
// createLink creates a symbolic link in the specified container root.
// This is equivalent to:
//
// chroot {{ .containerRoot }} ln -f -s {{ .target }} {{ .link }}
//
// If the specified link already exists and points to the same target, this
// operation is a no-op.
// If a file exists at the link path or the link points to a different target
// this file is removed before creating the link.
//
// Note that if the link path resolves to an absolute path oudside of the
// specified root, this is treated as an absolute path in this root.
func (m command) createLink(containerRoot string, targetPath string, link string) error {
linkPath := filepath.Join(containerRoot, link)
exists, err := linkExists(targetPath, linkPath)
if err != nil {
return fmt.Errorf("failed to check if link exists: %w", err)
}
if exists {
m.logger.Debugf("Link %s already exists", linkPath)
return nil
}
// We resolve the parent of the symlink that we're creating in the container root.
// If we resolve the full link path, an existing link at the location itself
// is also resolved here and we are unable to force create the link.
resolvedLinkParent, err := symlink.FollowSymlinkInScope(filepath.Dir(linkPath), containerRoot)
if err != nil {
return fmt.Errorf("failed to follow path for link %v relative to %v: %w", link, containerRoot, err)
}
resolvedLinkPath := filepath.Join(resolvedLinkParent, filepath.Base(linkPath))
m.logger.Infof("Symlinking %v to %v", resolvedLinkPath, targetPath)
err = os.MkdirAll(filepath.Dir(resolvedLinkPath), 0755)
if err != nil {
return fmt.Errorf("failed to create directory: %v", err)
}
err = symlinks.ForceCreate(targetPath, resolvedLinkPath)
if err != nil {
return fmt.Errorf("failed to create symlink: %v", err)
}
return nil
}
// linkExists checks whether the specified link exists.
// A link exists if the path exists, is a symlink, and points to the specified target.
func linkExists(target string, link string) (bool, error) {
currentTarget, err := symlinks.Resolve(link)
if errors.Is(err, os.ErrNotExist) {
return false, nil
}
if err != nil {
return false, fmt.Errorf("failed to resolve existing symlink %s: %w", link, err)
}
if currentTarget == target {
return true, nil
}
return false, nil
}

View File

@@ -0,0 +1,297 @@
package symlinks
import (
"os"
"path/filepath"
"strings"
"testing"
testlog "github.com/sirupsen/logrus/hooks/test"
"github.com/stretchr/testify/require"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup/symlinks"
)
func TestLinkExist(t *testing.T) {
tmpDir := t.TempDir()
require.NoError(
t,
makeFs(tmpDir,
dirOrLink{path: "/a/b/c", target: "d"},
dirOrLink{path: "/a/b/e", target: "/a/b/f"},
),
)
exists, err := linkExists("d", filepath.Join(tmpDir, "/a/b/c"))
require.NoError(t, err)
require.True(t, exists)
exists, err = linkExists("/a/b/f", filepath.Join(tmpDir, "/a/b/e"))
require.NoError(t, err)
require.True(t, exists)
exists, err = linkExists("different-target", filepath.Join(tmpDir, "/a/b/c"))
require.NoError(t, err)
require.False(t, exists)
exists, err = linkExists("/a/b/d", filepath.Join(tmpDir, "/a/b/c"))
require.NoError(t, err)
require.False(t, exists)
exists, err = linkExists("foo", filepath.Join(tmpDir, "/a/b/does-not-exist"))
require.NoError(t, err)
require.False(t, exists)
}
func TestCreateLink(t *testing.T) {
type link struct {
path string
target string
}
type expectedLink struct {
link
err error
}
testCases := []struct {
description string
containerContents []dirOrLink
link link
expectedCreateError error
expectedLinks []expectedLink
}{
{
description: "link to / resolves to container root",
containerContents: []dirOrLink{
{path: "/lib/foo", target: "/"},
},
link: link{
path: "/lib/foo/libfoo.so",
target: "libfoo.so.1",
},
expectedLinks: []expectedLink{
{
link: link{
path: "{{ .containerRoot }}/libfoo.so",
target: "libfoo.so.1",
},
},
},
},
{
description: "link to / resolves to container root; parent relative link",
containerContents: []dirOrLink{
{path: "/lib/foo", target: "/"},
},
link: link{
path: "/lib/foo/libfoo.so",
target: "../libfoo.so.1",
},
expectedLinks: []expectedLink{
{
link: link{
path: "{{ .containerRoot }}/libfoo.so",
target: "../libfoo.so.1",
},
},
},
},
{
description: "link to / resolves to container root; absolute link",
containerContents: []dirOrLink{
{path: "/lib/foo", target: "/"},
},
link: link{
path: "/lib/foo/libfoo.so",
target: "/a-path-in-container/foo/libfoo.so.1",
},
expectedLinks: []expectedLink{
{
link: link{
path: "{{ .containerRoot }}/libfoo.so",
target: "/a-path-in-container/foo/libfoo.so.1",
},
},
{
// We also check that the target is NOT created.
link: link{
path: "{{ .containerRoot }}/a-path-in-container/foo/libfoo.so.1",
},
err: os.ErrNotExist,
},
},
},
}
for _, tc := range testCases {
t.Run(tc.description, func(t *testing.T) {
tmpDir := t.TempDir()
hostRoot := filepath.Join(tmpDir, "/host-root/")
containerRoot := filepath.Join(tmpDir, "/container-root")
require.NoError(t, makeFs(hostRoot))
require.NoError(t, makeFs(containerRoot, tc.containerContents...))
// nvidia-cdi-hook create-symlinks --link linkSpec
err := getTestCommand().createLink(containerRoot, tc.link.target, tc.link.path)
// TODO: We may be able to replace this with require.ErrorIs.
if tc.expectedCreateError != nil {
require.Error(t, err)
} else {
require.NoError(t, err)
}
for _, expectedLink := range tc.expectedLinks {
path := strings.ReplaceAll(expectedLink.path, "{{ .containerRoot }}", containerRoot)
path = strings.ReplaceAll(path, "{{ .hostRoot }}", hostRoot)
if expectedLink.target != "" {
target, err := symlinks.Resolve(path)
require.ErrorIs(t, err, expectedLink.err)
require.Equal(t, expectedLink.target, target)
} else {
_, err := os.Stat(path)
require.ErrorIs(t, err, expectedLink.err)
}
}
})
}
}
func TestCreateLinkRelativePath(t *testing.T) {
tmpDir := t.TempDir()
hostRoot := filepath.Join(tmpDir, "/host-root/")
containerRoot := filepath.Join(tmpDir, "/container-root")
require.NoError(t, makeFs(hostRoot))
require.NoError(t, makeFs(containerRoot, dirOrLink{path: "/lib/"}))
// nvidia-cdi-hook create-symlinks --link libfoo.so.1::/lib/libfoo.so
err := getTestCommand().createLink(containerRoot, "libfoo.so.1", "/lib/libfoo.so")
require.NoError(t, err)
target, err := symlinks.Resolve(filepath.Join(containerRoot, "/lib/libfoo.so"))
require.NoError(t, err)
require.Equal(t, "libfoo.so.1", target)
}
func TestCreateLinkAbsolutePath(t *testing.T) {
tmpDir := t.TempDir()
hostRoot := filepath.Join(tmpDir, "/host-root/")
containerRoot := filepath.Join(tmpDir, "/container-root")
require.NoError(t, makeFs(hostRoot))
require.NoError(t, makeFs(containerRoot, dirOrLink{path: "/lib/"}))
// nvidia-cdi-hook create-symlinks --link /lib/libfoo.so.1::/lib/libfoo.so
err := getTestCommand().createLink(containerRoot, "/lib/libfoo.so.1", "/lib/libfoo.so")
require.NoError(t, err)
target, err := symlinks.Resolve(filepath.Join(containerRoot, "/lib/libfoo.so"))
require.NoError(t, err)
require.Equal(t, "/lib/libfoo.so.1", target)
}
func TestCreateLinkAlreadyExists(t *testing.T) {
testCases := []struct {
description string
containerContents []dirOrLink
shouldExist []string
}{
{
description: "link already exists with correct target",
containerContents: []dirOrLink{{path: "/lib/libfoo.so", target: "libfoo.so.1"}},
shouldExist: []string{},
},
{
description: "link already exists with different target",
containerContents: []dirOrLink{{path: "/lib/libfoo.so", target: "different-target"}, {path: "different-target"}},
shouldExist: []string{"{{ .containerRoot }}/different-target"},
},
}
for _, tc := range testCases {
t.Run(tc.description, func(t *testing.T) {
tmpDir := t.TempDir()
hostRoot := filepath.Join(tmpDir, "/host-root/")
containerRoot := filepath.Join(tmpDir, "/container-root")
require.NoError(t, makeFs(hostRoot))
require.NoError(t, makeFs(containerRoot, tc.containerContents...))
// nvidia-cdi-hook create-symlinks --link libfoo.so.1::/lib/libfoo.so
err := getTestCommand().createLink(containerRoot, "libfoo.so.1", "/lib/libfoo.so")
require.NoError(t, err)
target, err := symlinks.Resolve(filepath.Join(containerRoot, "lib/libfoo.so"))
require.NoError(t, err)
require.Equal(t, "libfoo.so.1", target)
for _, p := range tc.shouldExist {
require.DirExists(t, strings.ReplaceAll(p, "{{ .containerRoot }}", containerRoot))
}
})
}
}
func TestCreateLinkOutOfBounds(t *testing.T) {
tmpDir := t.TempDir()
hostRoot := filepath.Join(tmpDir, "/host-root")
containerRoot := filepath.Join(tmpDir, "/container-root")
require.NoError(t,
makeFs(hostRoot,
dirOrLink{path: "libfoo.so"},
),
)
require.NoError(t,
makeFs(containerRoot,
dirOrLink{path: "/lib"},
dirOrLink{path: "/lib/foo", target: hostRoot},
),
)
path, err := symlinks.Resolve(filepath.Join(containerRoot, "/lib/foo"))
require.NoError(t, err)
require.Equal(t, hostRoot, path)
// nvidia-cdi-hook create-symlinks --link ../libfoo.so.1::/lib/foo/libfoo.so
_ = getTestCommand().createLink(containerRoot, "../libfoo.so.1", "/lib/foo/libfoo.so")
require.NoError(t, err)
target, err := symlinks.Resolve(filepath.Join(containerRoot, hostRoot, "libfoo.so"))
require.NoError(t, err)
require.Equal(t, "../libfoo.so.1", target)
require.DirExists(t, filepath.Join(hostRoot, "libfoo.so"))
}
type dirOrLink struct {
path string
target string
}
func makeFs(tmpdir string, fs ...dirOrLink) error {
if err := os.MkdirAll(tmpdir, 0o755); err != nil {
return err
}
for _, s := range fs {
s.path = filepath.Join(tmpdir, s.path)
if s.target == "" {
_ = os.MkdirAll(s.path, 0o755)
continue
}
if err := os.MkdirAll(filepath.Dir(s.path), 0o755); err != nil {
return err
}
if err := os.Symlink(s.target, s.path); err != nil && !os.IsExist(err) {
return err
}
}
return nil
}
// getTestCommand creates a command for running tests against.
func getTestCommand() *command {
logger, _ := testlog.NewNullLogger()
return &command{
logger: logger,
}
}

View File

@@ -0,0 +1,76 @@
/**
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package cudacompat
import (
"os"
"path/filepath"
"github.com/moby/sys/symlink"
)
// A containerRoot represents the root filesystem of a container.
type containerRoot string
// hasPath checks whether the specified path exists in the root.
func (r containerRoot) hasPath(path string) bool {
resolved, err := r.resolve(path)
if err != nil {
return false
}
if _, err := os.Stat(resolved); err != nil && os.IsNotExist(err) {
return false
}
return true
}
// globFiles matches the specified pattern in the root.
// The files that match must be regular files.
func (r containerRoot) globFiles(pattern string) ([]string, error) {
patternPath, err := r.resolve(pattern)
if err != nil {
return nil, err
}
matches, err := filepath.Glob(patternPath)
if err != nil {
return nil, err
}
var files []string
for _, match := range matches {
info, err := os.Lstat(match)
if err != nil {
return nil, err
}
// Ignore symlinks.
if info.Mode()&os.ModeSymlink != 0 {
continue
}
// Ignore directories.
if info.IsDir() {
continue
}
files = append(files, match)
}
return files, nil
}
// resolve returns the absolute path including root path.
// Symlinks are resolved, but are guaranteed to resolve in the root.
func (r containerRoot) resolve(path string) (string, error) {
absolute := filepath.Clean(filepath.Join(string(r), path))
return symlink.FollowSymlinkInScope(absolute, string(r))
}

View File

@@ -0,0 +1,221 @@
/**
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package cudacompat
import (
"fmt"
"os"
"path/filepath"
"strconv"
"strings"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
)
const (
cudaCompatPath = "/usr/local/cuda/compat"
// cudaCompatLdsoconfdFilenamePattern specifies the pattern for the filename
// in ld.so.conf.d that includes a reference to the CUDA compat path.
// The 00-compat prefix is chosen to ensure that these libraries have a
// higher precedence than other libraries on the system.
cudaCompatLdsoconfdFilenamePattern = "00-compat-*.conf"
)
type command struct {
logger logger.Interface
}
type options struct {
hostDriverVersion string
containerSpec string
}
// NewCommand constructs a cuda-compat command with the specified logger
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
}
return c.build()
}
// build the enable-cuda-compat command
func (m command) build() *cli.Command {
cfg := options{}
// Create the 'enable-cuda-compat' command
c := cli.Command{
Name: "enable-cuda-compat",
Usage: "This hook ensures that the folder containing the CUDA compat libraries is added to the ldconfig search path if required.",
Before: func(c *cli.Context) error {
return m.validateFlags(c, &cfg)
},
Action: func(c *cli.Context) error {
return m.run(c, &cfg)
},
}
c.Flags = []cli.Flag{
&cli.StringFlag{
Name: "host-driver-version",
Usage: "Specify the host driver version. If the CUDA compat libraries detected in the container do not have a higher MAJOR version, the hook is a no-op.",
Destination: &cfg.hostDriverVersion,
},
&cli.StringFlag{
Name: "container-spec",
Hidden: true,
Category: "testing-only",
Usage: "Specify the path to the OCI container spec. If empty or '-' the spec will be read from STDIN",
Destination: &cfg.containerSpec,
},
}
return &c
}
func (m command) validateFlags(_ *cli.Context, cfg *options) error {
return nil
}
func (m command) run(_ *cli.Context, cfg *options) error {
if cfg.hostDriverVersion == "" {
return nil
}
s, err := oci.LoadContainerState(cfg.containerSpec)
if err != nil {
return fmt.Errorf("failed to load container state: %w", err)
}
containerRootDir, err := s.GetContainerRoot()
if err != nil {
return fmt.Errorf("failed to determined container root: %w", err)
}
containerForwardCompatDir, err := m.getContainerForwardCompatDir(containerRoot(containerRootDir), cfg.hostDriverVersion)
if err != nil {
return fmt.Errorf("failed to get container forward compat directory: %w", err)
}
if containerForwardCompatDir == "" {
return nil
}
return m.createLdsoconfdFile(containerRoot(containerRootDir), cudaCompatLdsoconfdFilenamePattern, containerForwardCompatDir)
}
func (m command) getContainerForwardCompatDir(containerRoot containerRoot, hostDriverVersion string) (string, error) {
if hostDriverVersion == "" {
m.logger.Debugf("Host driver version not specified")
return "", nil
}
if !containerRoot.hasPath(cudaCompatPath) {
m.logger.Debugf("No CUDA forward compatibility libraries directory in container")
return "", nil
}
if !containerRoot.hasPath("/etc/ld.so.cache") {
m.logger.Debugf("The container does not have an LDCache")
return "", nil
}
libs, err := containerRoot.globFiles(filepath.Join(cudaCompatPath, "libcuda.so.*.*"))
if err != nil {
m.logger.Warningf("Failed to find CUDA compat library: %w", err)
return "", nil
}
if len(libs) == 0 {
m.logger.Debugf("No CUDA forward compatibility libraries container")
return "", nil
}
if len(libs) != 1 {
m.logger.Warningf("Unexpected number of CUDA compat libraries in container: %v", libs)
return "", nil
}
compatDriverVersion := strings.TrimPrefix(filepath.Base(libs[0]), "libcuda.so.")
compatMajor, err := extractMajorVersion(compatDriverVersion)
if err != nil {
return "", fmt.Errorf("failed to extract major version from %q: %v", compatDriverVersion, err)
}
driverMajor, err := extractMajorVersion(hostDriverVersion)
if err != nil {
return "", fmt.Errorf("failed to extract major version from %q: %v", hostDriverVersion, err)
}
if driverMajor >= compatMajor {
m.logger.Debugf("Compat major version is not greater than the host driver major version (%v >= %v)", hostDriverVersion, compatDriverVersion)
return "", nil
}
resolvedCompatDir := strings.TrimPrefix(filepath.Dir(libs[0]), string(containerRoot))
return resolvedCompatDir, nil
}
// createLdsoconfdFile creates a file at /etc/ld.so.conf.d/ in the specified root.
// The file is created at /etc/ld.so.conf.d/{{ .pattern }} using `CreateTemp` and
// contains the specified directories on each line.
func (m command) createLdsoconfdFile(in containerRoot, pattern string, dirs ...string) error {
if len(dirs) == 0 {
m.logger.Debugf("No directories to add to /etc/ld.so.conf")
return nil
}
ldsoconfdDir, err := in.resolve("/etc/ld.so.conf.d")
if err != nil {
return err
}
if err := os.MkdirAll(ldsoconfdDir, 0755); err != nil {
return fmt.Errorf("failed to create ld.so.conf.d: %w", err)
}
configFile, err := os.CreateTemp(ldsoconfdDir, pattern)
if err != nil {
return fmt.Errorf("failed to create config file: %w", err)
}
defer configFile.Close()
m.logger.Debugf("Adding directories %v to %v", dirs, configFile.Name())
added := make(map[string]bool)
for _, dir := range dirs {
if added[dir] {
continue
}
_, err = fmt.Fprintf(configFile, "%s\n", dir)
if err != nil {
return fmt.Errorf("failed to update config file: %w", err)
}
added[dir] = true
}
// The created file needs to be world readable for the cases where the container is run as a non-root user.
if err := configFile.Chmod(0644); err != nil {
return fmt.Errorf("failed to chmod config file: %w", err)
}
return nil
}
// extractMajorVersion parses a version string and returns the major version as an int.
func extractMajorVersion(version string) (int, error) {
majorString := strings.SplitN(version, ".", 2)[0]
return strconv.Atoi(majorString)
}

View File

@@ -0,0 +1,182 @@
/*
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
*/
package cudacompat
import (
"os"
"path/filepath"
"strings"
"testing"
testlog "github.com/sirupsen/logrus/hooks/test"
"github.com/stretchr/testify/require"
)
func TestCompatLibs(t *testing.T) {
logger, _ := testlog.NewNullLogger()
testCases := []struct {
description string
contents map[string]string
hostDriverVersion string
expectedContainerForwardCompatDir string
}{
{
description: "empty root",
hostDriverVersion: "222.55.66",
},
{
description: "compat lib is newer; no ldcache",
contents: map[string]string{
"/usr/local/cuda/compat/libcuda.so.333.88.99": "",
},
hostDriverVersion: "222.55.66",
},
{
description: "compat lib is newer; ldcache",
contents: map[string]string{
"/etc/ld.so.cache": "",
"/usr/local/cuda/compat/libcuda.so.333.88.99": "",
},
hostDriverVersion: "222.55.66",
expectedContainerForwardCompatDir: "/usr/local/cuda/compat",
},
{
description: "compat lib is older; ldcache",
contents: map[string]string{
"/etc/ld.so.cache": "",
"/usr/local/cuda/compat/libcuda.so.111.88.99": "",
},
hostDriverVersion: "222.55.66",
expectedContainerForwardCompatDir: "",
},
{
description: "compat lib has same major version; ldcache",
contents: map[string]string{
"/etc/ld.so.cache": "",
"/usr/local/cuda/compat/libcuda.so.222.88.99": "",
},
hostDriverVersion: "222.55.66",
expectedContainerForwardCompatDir: "",
},
{
description: "numeric comparison is used; ldcache",
contents: map[string]string{
"/etc/ld.so.cache": "",
"/usr/local/cuda/compat/libcuda.so.222.88.99": "",
},
hostDriverVersion: "99.55.66",
expectedContainerForwardCompatDir: "/usr/local/cuda/compat",
},
{
description: "driver version empty; ldcache",
contents: map[string]string{
"/etc/ld.so.cache": "",
"/usr/local/cuda/compat/libcuda.so.222.88.99": "",
},
hostDriverVersion: "",
},
{
description: "symlinks are followed",
contents: map[string]string{
"/etc/ld.so.cache": "",
"/etc/alternatives/cuda/compat/libcuda.so.333.88.99": "",
"/usr/local/cuda": "symlink=/etc/alternatives/cuda",
},
hostDriverVersion: "222.55.66",
expectedContainerForwardCompatDir: "/etc/alternatives/cuda/compat",
},
{
description: "symlinks stay in container",
contents: map[string]string{
"/etc/ld.so.cache": "",
"/compat/libcuda.so.333.88.99": "",
"/usr/local/cuda": "symlink=../../../../../../",
},
hostDriverVersion: "222.55.66",
expectedContainerForwardCompatDir: "/compat",
},
}
for _, tc := range testCases {
t.Run(tc.description, func(t *testing.T) {
containerRootDir := t.TempDir()
for name, contents := range tc.contents {
target := filepath.Join(containerRootDir, name)
require.NoError(t, os.MkdirAll(filepath.Dir(target), 0755))
if strings.HasPrefix(contents, "symlink=") {
require.NoError(t, os.Symlink(strings.TrimPrefix(contents, "symlink="), target))
continue
}
require.NoError(t, os.WriteFile(target, []byte(contents), 0600))
}
c := command{
logger: logger,
}
containerForwardCompatDir, err := c.getContainerForwardCompatDir(containerRoot(containerRootDir), tc.hostDriverVersion)
require.NoError(t, err)
require.EqualValues(t, tc.expectedContainerForwardCompatDir, containerForwardCompatDir)
})
}
}
func TestUpdateLdconfig(t *testing.T) {
logger, _ := testlog.NewNullLogger()
testCases := []struct {
description string
folders []string
expectedContents string
}{
{
description: "no folders; have no contents",
},
{
description: "single folder is added",
folders: []string{"/usr/local/cuda/compat"},
expectedContents: "/usr/local/cuda/compat\n",
},
}
for _, tc := range testCases {
t.Run(tc.description, func(t *testing.T) {
containerRootDir := t.TempDir()
c := command{
logger: logger,
}
err := c.createLdsoconfdFile(containerRoot(containerRootDir), cudaCompatLdsoconfdFilenamePattern, tc.folders...)
require.NoError(t, err)
matches, err := filepath.Glob(filepath.Join(containerRootDir, "/etc/ld.so.conf.d/00-compat-*.conf"))
require.NoError(t, err)
if tc.expectedContents == "" {
require.Empty(t, matches)
return
}
require.Len(t, matches, 1)
contents, err := os.ReadFile(matches[0])
require.NoError(t, err)
require.EqualValues(t, tc.expectedContents, string(contents))
})
}
}

View File

@@ -0,0 +1,144 @@
/**
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package disabledevicenodemodification
import (
"bufio"
"bytes"
"errors"
"fmt"
"io"
"os"
"strings"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
)
const (
nvidiaDriverParamsPath = "/proc/driver/nvidia/params"
)
type options struct {
containerSpec string
}
// NewCommand constructs an disable-device-node-modification subcommand with the specified logger
func NewCommand(logger logger.Interface) *cli.Command {
cfg := options{}
c := cli.Command{
Name: "disable-device-node-modification",
Usage: "Ensure that the /proc/driver/nvidia/params file present in the container does not allow device node modifications.",
Before: func(c *cli.Context) error {
return validateFlags(c, &cfg)
},
Action: func(c *cli.Context) error {
return run(c, &cfg)
},
}
c.Flags = []cli.Flag{
&cli.StringFlag{
Name: "container-spec",
Hidden: true,
Usage: "Specify the path to the OCI container spec. If empty or '-' the spec will be read from STDIN",
Destination: &cfg.containerSpec,
},
}
return &c
}
func validateFlags(c *cli.Context, cfg *options) error {
return nil
}
func run(_ *cli.Context, cfg *options) error {
modifiedParamsFileContents, err := getModifiedNVIDIAParamsContents()
if err != nil {
return fmt.Errorf("failed to get modified params file contents: %w", err)
}
if len(modifiedParamsFileContents) == 0 {
return nil
}
s, err := oci.LoadContainerState(cfg.containerSpec)
if err != nil {
return fmt.Errorf("failed to load container state: %w", err)
}
containerRootDirPath, err := s.GetContainerRoot()
if err != nil {
return fmt.Errorf("failed to determined container root: %w", err)
}
return createParamsFileInContainer(containerRootDirPath, modifiedParamsFileContents)
}
func getModifiedNVIDIAParamsContents() ([]byte, error) {
hostNvidiaParamsFile, err := os.Open(nvidiaDriverParamsPath)
if errors.Is(err, os.ErrNotExist) {
return nil, nil
}
if err != nil {
return nil, fmt.Errorf("failed to load params file: %w", err)
}
defer hostNvidiaParamsFile.Close()
modifiedContents, err := getModifiedParamsFileContentsFromReader(hostNvidiaParamsFile)
if err != nil {
return nil, fmt.Errorf("failed to get modfied params file contents: %w", err)
}
return modifiedContents, nil
}
// getModifiedParamsFileContentsFromReader returns the contents of a modified params file from the specified reader.
func getModifiedParamsFileContentsFromReader(r io.Reader) ([]byte, error) {
var modified bytes.Buffer
scanner := bufio.NewScanner(r)
var requiresModification bool
for scanner.Scan() {
line := scanner.Text()
if strings.HasPrefix(line, "ModifyDeviceFiles: ") {
if line == "ModifyDeviceFiles: 0" {
return nil, nil
}
if line == "ModifyDeviceFiles: 1" {
line = "ModifyDeviceFiles: 0"
requiresModification = true
}
}
if _, err := modified.WriteString(line + "\n"); err != nil {
return nil, fmt.Errorf("failed to create output buffer: %w", err)
}
}
if err := scanner.Err(); err != nil {
return nil, fmt.Errorf("failed to read params file: %w", err)
}
if !requiresModification {
return nil, nil
}
return modified.Bytes(), nil
}

View File

@@ -0,0 +1,91 @@
/**
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package disabledevicenodemodification
import (
"bytes"
"testing"
"github.com/stretchr/testify/require"
)
func TestGetModifiedParamsFileContentsFromReader(t *testing.T) {
testCases := map[string]struct {
contents []byte
expectedError error
expectedContents []byte
}{
"no contents": {
contents: nil,
expectedError: nil,
expectedContents: nil,
},
"other contents are ignored": {
contents: []byte(`# Some other content
that we don't care about
`),
expectedError: nil,
expectedContents: nil,
},
"already zero requires no modification": {
contents: []byte("ModifyDeviceFiles: 0"),
expectedError: nil,
expectedContents: nil,
},
"leading spaces require no modification": {
contents: []byte(" ModifyDeviceFiles: 1"),
},
"Trailing spaces require no modification": {
contents: []byte("ModifyDeviceFiles: 1 "),
},
"Not 1 require no modification": {
contents: []byte("ModifyDeviceFiles: 11"),
},
"single line requires modification": {
contents: []byte("ModifyDeviceFiles: 1"),
expectedError: nil,
expectedContents: []byte("ModifyDeviceFiles: 0\n"),
},
"single line with trailing newline requires modification": {
contents: []byte("ModifyDeviceFiles: 1\n"),
expectedError: nil,
expectedContents: []byte("ModifyDeviceFiles: 0\n"),
},
"other content is maintained": {
contents: []byte(`ModifyDeviceFiles: 1
other content
that
is maintained`),
expectedError: nil,
expectedContents: []byte(`ModifyDeviceFiles: 0
other content
that
is maintained
`),
},
}
for description, tc := range testCases {
t.Run(description, func(t *testing.T) {
contents, err := getModifiedParamsFileContentsFromReader(bytes.NewReader(tc.contents))
require.EqualValues(t, tc.expectedError, err)
require.EqualValues(t, string(tc.expectedContents), string(contents))
})
}
}

View File

@@ -0,0 +1,63 @@
//go:build linux
/**
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package disabledevicenodemodification
import (
"fmt"
"os"
"path/filepath"
"github.com/opencontainers/runc/libcontainer/utils"
"golang.org/x/sys/unix"
)
func createParamsFileInContainer(containerRootDirPath string, contents []byte) error {
tmpRoot, err := os.MkdirTemp("", "nvct-empty-dir*")
if err != nil {
return fmt.Errorf("failed to create temp root: %w", err)
}
if err := createTmpFs(tmpRoot, len(contents)); err != nil {
return fmt.Errorf("failed to create tmpfs mount for params file: %w", err)
}
modifiedParamsFile, err := os.OpenFile(filepath.Join(tmpRoot, "nvct-params"), os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0444)
if err != nil {
return fmt.Errorf("failed to open modified params file: %w", err)
}
defer modifiedParamsFile.Close()
if _, err := modifiedParamsFile.Write(contents); err != nil {
return fmt.Errorf("failed to write temporary params file: %w", err)
}
err = utils.WithProcfd(containerRootDirPath, nvidiaDriverParamsPath, func(nvidiaDriverParamsFdPath string) error {
return unix.Mount(modifiedParamsFile.Name(), nvidiaDriverParamsFdPath, "", unix.MS_BIND|unix.MS_RDONLY|unix.MS_NODEV|unix.MS_PRIVATE|unix.MS_NOSYMFOLLOW, "")
})
if err != nil {
return fmt.Errorf("failed to mount modified params file: %w", err)
}
return nil
}
func createTmpFs(target string, size int) error {
return unix.Mount("tmpfs", target, "tmpfs", 0, fmt.Sprintf("size=%d", size))
}

View File

@@ -0,0 +1,27 @@
//go:build !linux
// +build !linux
/**
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package disabledevicenodemodification
import "fmt"
func createParamsFileInContainer(containerRootDirPath string, contents []byte) error {
return fmt.Errorf("not supported")
}

107
cmd/nvidia-cdi-hook/main.go Normal file
View File

@@ -0,0 +1,107 @@
/**
# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package main
import (
"os"
"github.com/sirupsen/logrus"
"github.com/NVIDIA/nvidia-container-toolkit/internal/info"
cli "github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-cdi-hook/commands"
)
// options defines the options that can be set for the CLI through config files,
// environment variables, or command line flags
type options struct {
// Debug indicates whether the CLI is started in "debug" mode
Debug bool
// Quiet indicates whether the CLI is started in "quiet" mode
Quiet bool
}
func main() {
logger := logrus.New()
// Create a options struct to hold the parsed environment variables or command line flags
opts := options{}
// Create the top-level CLI
c := cli.NewApp()
c.Name = "NVIDIA CDI Hook"
c.UseShortOptionHandling = true
c.EnableBashCompletion = true
c.Usage = "Command to structure files for usage inside a container, called as hooks from a container runtime, defined in a CDI yaml file"
c.Version = info.GetVersionString()
// We set the default action for the `nvidia-cdi-hook` command to issue a
// warning and exit with no error.
// This means that if an unsupported hook is run, a container will not fail
// to launch. An unsupported hook could be the result of a CDI specification
// referring to a new hook that is not yet supported by an older NVIDIA
// Container Toolkit version or a hook that has been removed in newer
// version.
c.Action = func(ctx *cli.Context) error {
commands.IssueUnsupportedHookWarning(logger, ctx)
return nil
}
// Setup the flags for this command
c.Flags = []cli.Flag{
&cli.BoolFlag{
Name: "debug",
Aliases: []string{"d"},
Usage: "Enable debug-level logging",
Destination: &opts.Debug,
// TODO: Support for NVIDIA_CDI_DEBUG is deprecated and NVIDIA_CTK_DEBUG should be used instead.
EnvVars: []string{"NVIDIA_CTK_DEBUG", "NVIDIA_CDI_DEBUG"},
},
&cli.BoolFlag{
Name: "quiet",
Usage: "Suppress all output except for errors; overrides --debug",
Destination: &opts.Quiet,
// TODO: Support for NVIDIA_CDI_QUIET is deprecated and NVIDIA_CTK_QUIET should be used instead.
EnvVars: []string{"NVDIA_CTK_QUIET", "NVIDIA_CDI_QUIET"},
},
}
// Set log-level for all subcommands
c.Before = func(c *cli.Context) error {
logLevel := logrus.InfoLevel
if opts.Debug {
logLevel = logrus.DebugLevel
}
if opts.Quiet {
logLevel = logrus.ErrorLevel
}
logger.SetLevel(logLevel)
return nil
}
// Define the subcommands
c.Commands = commands.New(logger)
// Run the CLI
err := c.Run(os.Args)
if err != nil {
logger.Errorf("%v", err)
os.Exit(1)
}
}

View File

@@ -0,0 +1,164 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package ldcache
import (
"errors"
"fmt"
"log"
"os"
"github.com/moby/sys/reexec"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/internal/ldconfig"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
)
const (
reexecUpdateLdCacheCommandName = "reexec-update-ldcache"
)
type command struct {
logger logger.Interface
}
type options struct {
folders cli.StringSlice
ldconfigPath string
containerSpec string
}
func init() {
reexec.Register(reexecUpdateLdCacheCommandName, updateLdCacheHandler)
if reexec.Init() {
os.Exit(0)
}
}
// NewCommand constructs an update-ldcache command with the specified logger
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
}
return c.build()
}
// build the update-ldcache command
func (m command) build() *cli.Command {
cfg := options{}
// Create the 'update-ldcache' command
c := cli.Command{
Name: "update-ldcache",
Usage: "Update ldcache in a container by running ldconfig",
Before: func(c *cli.Context) error {
return m.validateFlags(c, &cfg)
},
Action: func(c *cli.Context) error {
return m.run(c, &cfg)
},
}
c.Flags = []cli.Flag{
&cli.StringSliceFlag{
Name: "folder",
Usage: "Specify a folder to add to /etc/ld.so.conf before updating the ld cache",
Destination: &cfg.folders,
},
&cli.StringFlag{
Name: "ldconfig-path",
Usage: "Specify the path to the ldconfig program",
Destination: &cfg.ldconfigPath,
Value: "/sbin/ldconfig",
},
&cli.StringFlag{
Name: "container-spec",
Usage: "Specify the path to the OCI container spec. If empty or '-' the spec will be read from STDIN",
Destination: &cfg.containerSpec,
},
}
return &c
}
func (m command) validateFlags(c *cli.Context, cfg *options) error {
if cfg.ldconfigPath == "" {
return errors.New("ldconfig-path must be specified")
}
return nil
}
func (m command) run(c *cli.Context, cfg *options) error {
s, err := oci.LoadContainerState(cfg.containerSpec)
if err != nil {
return fmt.Errorf("failed to load container state: %v", err)
}
containerRootDir, err := s.GetContainerRoot()
if err != nil || containerRootDir == "" || containerRootDir == "/" {
return fmt.Errorf("failed to determined container root: %v", err)
}
cmd, err := ldconfig.NewRunner(
reexecUpdateLdCacheCommandName,
cfg.ldconfigPath,
containerRootDir,
cfg.folders.Value()...,
)
if err != nil {
return err
}
return cmd.Run()
}
// updateLdCacheHandler wraps updateLdCache with error handling.
func updateLdCacheHandler() {
if err := updateLdCache(os.Args); err != nil {
log.Printf("Error updating ldcache: %v", err)
os.Exit(1)
}
}
// updateLdCache ensures that the ldcache in the container is updated to include
// libraries that are mounted from the host.
// It is invoked from a reexec'd handler and provides namespace isolation for
// the operations performed by this hook. At the point where this is invoked,
// we are in a new mount namespace that is cloned from the parent.
//
// args[0] is the reexec initializer function name
// args[1] is the path of the ldconfig binary on the host
// args[2] is the container root directory
// The remaining args are folders where soname symlinks need to be created.
func updateLdCache(args []string) error {
if len(args) < 3 {
return fmt.Errorf("incorrect arguments: %v", args)
}
hostLdconfigPath := args[1]
containerRootDirPath := args[2]
ldconfig, err := ldconfig.New(
hostLdconfigPath,
containerRootDirPath,
)
if err != nil {
return fmt.Errorf("failed to construct ldconfig runner: %w", err)
}
return ldconfig.UpdateLDCache(args[3:]...)
}

View File

@@ -2,15 +2,6 @@ package main
import (
"log"
"strings"
)
const (
allDriverCapabilities = DriverCapabilities("compute,compat32,graphics,utility,video,display,ngx")
defaultDriverCapabilities = DriverCapabilities("utility,compute")
none = DriverCapabilities("")
all = DriverCapabilities("all")
)
func capabilityToCLI(cap string) string {
@@ -34,50 +25,3 @@ func capabilityToCLI(cap string) string {
}
return ""
}
// DriverCapabilities is used to process the NVIDIA_DRIVER_CAPABILITIES environment
// variable. Operations include default values, filtering, and handling meta values such as "all"
type DriverCapabilities string
// Intersection returns intersection between two sets of capabilities.
func (d DriverCapabilities) Intersection(capabilities DriverCapabilities) DriverCapabilities {
if capabilities == all {
return d
}
if d == all {
return capabilities
}
lookup := make(map[string]bool)
for _, c := range d.list() {
lookup[c] = true
}
var found []string
for _, c := range capabilities.list() {
if lookup[c] {
found = append(found, c)
}
}
intersection := DriverCapabilities(strings.Join(found, ","))
return intersection
}
// String returns the string representation of the driver capabilities
func (d DriverCapabilities) String() string {
return string(d)
}
// list returns the driver capabilities as a list
func (d DriverCapabilities) list() []string {
var caps []string
for _, c := range strings.Split(string(d), ",") {
trimmed := strings.TrimSpace(c)
if len(trimmed) == 0 {
continue
}
caps = append(caps, trimmed)
}
return caps
}

View File

@@ -1,134 +0,0 @@
/**
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package main
import (
"fmt"
"testing"
"github.com/stretchr/testify/require"
)
func TestDriverCapabilitiesIntersection(t *testing.T) {
testCases := []struct {
capabilities DriverCapabilities
supportedCapabilities DriverCapabilities
expectedIntersection DriverCapabilities
}{
{
capabilities: none,
supportedCapabilities: none,
expectedIntersection: none,
},
{
capabilities: all,
supportedCapabilities: none,
expectedIntersection: none,
},
{
capabilities: all,
supportedCapabilities: allDriverCapabilities,
expectedIntersection: allDriverCapabilities,
},
{
capabilities: allDriverCapabilities,
supportedCapabilities: all,
expectedIntersection: allDriverCapabilities,
},
{
capabilities: none,
supportedCapabilities: all,
expectedIntersection: none,
},
{
capabilities: none,
supportedCapabilities: DriverCapabilities("cap1"),
expectedIntersection: none,
},
{
capabilities: DriverCapabilities("cap0,cap1"),
supportedCapabilities: DriverCapabilities("cap1,cap0"),
expectedIntersection: DriverCapabilities("cap0,cap1"),
},
{
capabilities: defaultDriverCapabilities,
supportedCapabilities: allDriverCapabilities,
expectedIntersection: defaultDriverCapabilities,
},
{
capabilities: DriverCapabilities("compute,compat32,graphics,utility,video,display"),
supportedCapabilities: DriverCapabilities("compute,compat32,graphics,utility,video,display,ngx"),
expectedIntersection: DriverCapabilities("compute,compat32,graphics,utility,video,display"),
},
{
capabilities: DriverCapabilities("cap1"),
supportedCapabilities: none,
expectedIntersection: none,
},
{
capabilities: DriverCapabilities("compute,compat32,graphics,utility,video,display,ngx"),
supportedCapabilities: DriverCapabilities("compute,compat32,graphics,utility,video,display"),
expectedIntersection: DriverCapabilities("compute,compat32,graphics,utility,video,display"),
},
}
for i, tc := range testCases {
t.Run(fmt.Sprintf("test case %d", i), func(t *testing.T) {
intersection := tc.supportedCapabilities.Intersection(tc.capabilities)
require.EqualValues(t, tc.expectedIntersection, intersection)
})
}
}
func TestDriverCapabilitiesList(t *testing.T) {
testCases := []struct {
capabilities DriverCapabilities
expected []string
}{
{
capabilities: DriverCapabilities(""),
},
{
capabilities: DriverCapabilities(" "),
},
{
capabilities: DriverCapabilities(","),
},
{
capabilities: DriverCapabilities(",cap"),
expected: []string{"cap"},
},
{
capabilities: DriverCapabilities("cap,"),
expected: []string{"cap"},
},
{
capabilities: DriverCapabilities("cap0,,cap1"),
expected: []string{"cap0", "cap1"},
},
{
capabilities: DriverCapabilities("cap1,cap0,cap3"),
expected: []string{"cap1", "cap0", "cap3"},
},
}
for i, tc := range testCases {
t.Run(fmt.Sprintf("test case %d", i), func(t *testing.T) {
require.EqualValues(t, tc.expected, tc.capabilities.list())
})
}
}

View File

@@ -6,46 +6,29 @@ import (
"log"
"os"
"path"
"path/filepath"
"strings"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config/image"
"github.com/opencontainers/runtime-spec/specs-go"
"golang.org/x/mod/semver"
)
const (
envCUDAVersion = "CUDA_VERSION"
envNVRequirePrefix = "NVIDIA_REQUIRE_"
envNVRequireCUDA = envNVRequirePrefix + "CUDA"
envNVDisableRequire = "NVIDIA_DISABLE_REQUIRE"
envNVVisibleDevices = "NVIDIA_VISIBLE_DEVICES"
envNVMigConfigDevices = "NVIDIA_MIG_CONFIG_DEVICES"
envNVMigMonitorDevices = "NVIDIA_MIG_MONITOR_DEVICES"
envNVDriverCapabilities = "NVIDIA_DRIVER_CAPABILITIES"
)
const (
capSysAdmin = "CAP_SYS_ADMIN"
)
const (
deviceListAsVolumeMountsRoot = "/var/run/nvidia-container-devices"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config/image"
)
type nvidiaConfig struct {
Devices string
Devices []string
MigConfigDevices string
MigMonitorDevices string
ImexChannels []string
DriverCapabilities string
Requirements []string
DisableRequire bool
// Requirements defines the requirements DSL for the container to run.
// This is empty if no specific requirements are needed, or if requirements are
// explicitly disabled.
Requirements []string
}
type containerConfig struct {
Pid int
Rootfs string
Env map[string]string
Image image.CUDA
Nvidia *nvidiaConfig
}
@@ -72,23 +55,14 @@ type LinuxCapabilities struct {
Ambient []string `json:"ambient,omitempty" platform:"linux"`
}
// Mount from OCI runtime spec
// https://github.com/opencontainers/runtime-spec/blob/v1.0.0/specs-go/config.go#L103
type Mount struct {
Destination string `json:"destination"`
Type string `json:"type,omitempty" platform:"linux,solaris"`
Source string `json:"source,omitempty"`
Options []string `json:"options,omitempty"`
}
// Spec from OCI runtime spec
// We use pointers to structs, similarly to the latest version of runtime-spec:
// https://github.com/opencontainers/runtime-spec/blob/v1.0.0/specs-go/config.go#L5-L28
type Spec struct {
Version *string `json:"ociVersion"`
Process *Process `json:"process,omitempty"`
Root *Root `json:"root,omitempty"`
Mounts []Mount `json:"mounts,omitempty"`
Version *string `json:"ociVersion"`
Process *Process `json:"process,omitempty"`
Root *Root `json:"root,omitempty"`
Mounts []specs.Mount `json:"mounts,omitempty"`
}
// HookState holds state information about the hook
@@ -125,9 +99,9 @@ func loadSpec(path string) (spec *Spec) {
return
}
func isPrivileged(s *Spec) bool {
if s.Process.Capabilities == nil {
return false
func (s *Spec) GetCapabilities() []string {
if s == nil || s.Process == nil || s.Process.Capabilities == nil {
return nil
}
var caps []string
@@ -140,141 +114,72 @@ func isPrivileged(s *Spec) bool {
if err != nil {
log.Panicln("could not decode Process.Capabilities in OCI spec:", err)
}
for _, c := range caps {
if c == capSysAdmin {
return true
}
}
return false
return caps
}
// Otherwise, parse s.Process.Capabilities as:
// github.com/opencontainers/runtime-spec/blob/v1.0.0/specs-go/config.go#L30-L54
process := specs.Process{
Env: s.Process.Env,
}
err := json.Unmarshal(*s.Process.Capabilities, &process.Capabilities)
capabilities := specs.LinuxCapabilities{}
err := json.Unmarshal(*s.Process.Capabilities, &capabilities)
if err != nil {
log.Panicln("could not decode Process.Capabilities in OCI spec:", err)
}
fullSpec := specs.Spec{
Version: *s.Version,
Process: &process,
}
return image.IsPrivileged(&fullSpec)
return image.OCISpecCapabilities(capabilities).GetCapabilities()
}
func getDevicesFromEnvvar(image image.CUDA, swarmResourceEnvvars []string) *string {
// We check if the image has at least one of the Swarm resource envvars defined and use this
// if specified.
var hasSwarmEnvvar bool
for _, envvar := range swarmResourceEnvvars {
if _, exists := image[envvar]; exists {
hasSwarmEnvvar = true
break
func isPrivileged(s *Spec) bool {
return image.IsPrivileged(s)
}
func getMigConfigDevices(i image.CUDA) *string {
return getMigDevices(i, image.EnvVarNvidiaMigConfigDevices)
}
func getMigMonitorDevices(i image.CUDA) *string {
return getMigDevices(i, image.EnvVarNvidiaMigMonitorDevices)
}
func getMigDevices(image image.CUDA, envvar string) *string {
if !image.HasEnvvar(envvar) {
return nil
}
devices := image.Getenv(envvar)
return &devices
}
func (hookConfig *hookConfig) getImexChannels(image image.CUDA, privileged bool) []string {
if hookConfig.Features.IgnoreImexChannelRequests.IsEnabled() {
return nil
}
// If enabled, try and get the device list from volume mounts first
if hookConfig.AcceptDeviceListAsVolumeMounts {
devices := image.ImexChannelsFromMounts()
if len(devices) > 0 {
return devices
}
}
var devices []string
if hasSwarmEnvvar {
devices = image.DevicesFromEnvvars(swarmResourceEnvvars...).List()
} else {
devices = image.DevicesFromEnvvars(envNVVisibleDevices).List()
}
devices := image.ImexChannelsFromEnvVar()
if len(devices) == 0 {
return nil
}
devicesString := strings.Join(devices, ",")
return &devicesString
}
func getDevicesFromMounts(mounts []Mount) *string {
var devices []string
for _, m := range mounts {
root := filepath.Clean(deviceListAsVolumeMountsRoot)
source := filepath.Clean(m.Source)
destination := filepath.Clean(m.Destination)
// Only consider mounts who's host volume is /dev/null
if source != "/dev/null" {
continue
}
// Only consider container mount points that begin with 'root'
if len(destination) < len(root) {
continue
}
if destination[:len(root)] != root {
continue
}
// Grab the full path beyond 'root' and add it to the list of devices
device := destination[len(root):]
if len(device) > 0 && device[0] == '/' {
device = device[1:]
}
if len(device) == 0 {
continue
}
devices = append(devices, device)
}
if devices == nil {
return nil
}
ret := strings.Join(devices, ",")
return &ret
}
func getDevices(hookConfig *HookConfig, image image.CUDA, mounts []Mount, privileged bool) *string {
// If enabled, try and get the device list from volume mounts first
if hookConfig.AcceptDeviceListAsVolumeMounts {
devices := getDevicesFromMounts(mounts)
if devices != nil {
return devices
}
}
// Fallback to reading from the environment variable if privileges are correct
devices := getDevicesFromEnvvar(image, hookConfig.getSwarmResourceEnvvars())
if devices == nil {
return nil
}
if privileged || hookConfig.AcceptEnvvarUnprivileged {
return devices
}
configName := hookConfig.getConfigOption("AcceptEnvvarUnprivileged")
log.Printf("Ignoring devices specified in NVIDIA_VISIBLE_DEVICES (privileged=%v, %v=%v) ", privileged, configName, hookConfig.AcceptEnvvarUnprivileged)
return nil
}
func getMigConfigDevices(env map[string]string) *string {
if devices, ok := env[envNVMigConfigDevices]; ok {
return &devices
}
return nil
}
func getMigMonitorDevices(env map[string]string) *string {
if devices, ok := env[envNVMigMonitorDevices]; ok {
return &devices
}
return nil
}
func getDriverCapabilities(env map[string]string, supportedDriverCapabilities DriverCapabilities, legacyImage bool) DriverCapabilities {
func (hookConfig *hookConfig) getDriverCapabilities(cudaImage image.CUDA, legacyImage bool) image.DriverCapabilities {
// We use the default driver capabilities by default. This is filtered to only include the
// supported capabilities
capabilities := supportedDriverCapabilities.Intersection(defaultDriverCapabilities)
supportedDriverCapabilities := image.NewDriverCapabilities(hookConfig.SupportedDriverCapabilities)
capabilities := supportedDriverCapabilities.Intersection(image.DefaultDriverCapabilities)
capsEnv, capsEnvSpecified := env[envNVDriverCapabilities]
capsEnvSpecified := cudaImage.HasEnvvar(image.EnvVarNvidiaDriverCapabilities)
capsEnv := cudaImage.Getenv(image.EnvVarNvidiaDriverCapabilities)
if !capsEnvSpecified && legacyImage {
// Environment variable unset with legacy image: set all capabilities.
@@ -283,9 +188,9 @@ func getDriverCapabilities(env map[string]string, supportedDriverCapabilities Dr
if capsEnvSpecified && len(capsEnv) > 0 {
// If the envvironment variable is specified and is non-empty, use the capabilities value
envCapabilities := DriverCapabilities(capsEnv)
envCapabilities := image.NewDriverCapabilities(capsEnv)
capabilities = supportedDriverCapabilities.Intersection(envCapabilities)
if envCapabilities != all && capabilities != envCapabilities {
if !envCapabilities.IsAll() && len(capabilities) != len(envCapabilities) {
log.Panicln(fmt.Errorf("unsupported capabilities found in '%v' (allowed '%v')", envCapabilities, capabilities))
}
}
@@ -293,14 +198,12 @@ func getDriverCapabilities(env map[string]string, supportedDriverCapabilities Dr
return capabilities
}
func getNvidiaConfig(hookConfig *HookConfig, image image.CUDA, mounts []Mount, privileged bool) *nvidiaConfig {
func (hookConfig *hookConfig) getNvidiaConfig(image image.CUDA, privileged bool) *nvidiaConfig {
legacyImage := image.IsLegacy()
var devices string
if d := getDevices(hookConfig, image, mounts, privileged); d != nil {
devices = *d
} else {
// 'nil' devices means this is not a GPU container.
devices := image.VisibleDevices()
if len(devices) == 0 {
// empty devices means this is not a GPU container.
return nil
}
@@ -320,26 +223,33 @@ func getNvidiaConfig(hookConfig *HookConfig, image image.CUDA, mounts []Mount, p
log.Panicln("cannot set MIG_MONITOR_DEVICES in non privileged container")
}
driverCapabilities := getDriverCapabilities(image, hookConfig.SupportedDriverCapabilities, legacyImage).String()
imexChannels := hookConfig.getImexChannels(image, privileged)
driverCapabilities := hookConfig.getDriverCapabilities(image, legacyImage).String()
requirements, err := image.GetRequirements()
if err != nil {
log.Panicln("failed to get requirements", err)
}
disableRequire := image.HasDisableRequire()
return &nvidiaConfig{
Devices: devices,
MigConfigDevices: migConfigDevices,
MigMonitorDevices: migMonitorDevices,
ImexChannels: imexChannels,
DriverCapabilities: driverCapabilities,
Requirements: requirements,
DisableRequire: disableRequire,
}
}
func getContainerConfig(hook HookConfig) (config containerConfig) {
func (hookConfig *hookConfig) getContainerConfig() (config *containerConfig) {
hookConfig.Lock()
defer hookConfig.Unlock()
if hookConfig.containerConfig != nil {
return hookConfig.containerConfig
}
var h HookState
d := json.NewDecoder(os.Stdin)
if err := d.Decode(&h); err != nil {
@@ -353,16 +263,28 @@ func getContainerConfig(hook HookConfig) (config containerConfig) {
s := loadSpec(path.Join(b, "config.json"))
image, err := image.NewCUDAImageFromEnv(s.Process.Env)
privileged := isPrivileged(s)
i, err := image.New(
image.WithEnv(s.Process.Env),
image.WithMounts(s.Mounts),
image.WithPrivileged(privileged),
image.WithDisableRequire(hookConfig.DisableRequire),
image.WithAcceptDeviceListAsVolumeMounts(hookConfig.AcceptDeviceListAsVolumeMounts),
image.WithAcceptEnvvarUnprivileged(hookConfig.AcceptEnvvarUnprivileged),
image.WithPreferredVisibleDevicesEnvVars(hookConfig.getSwarmResourceEnvvars()...),
)
if err != nil {
log.Panicln(err)
}
privileged := isPrivileged(s)
return containerConfig{
cc := containerConfig{
Pid: h.Pid,
Rootfs: s.Root.Path,
Env: image,
Nvidia: getNvidiaConfig(&hook, image, s.Mounts, privileged),
Image: i,
Nvidia: hookConfig.getNvidiaConfig(i, privileged),
}
hookConfig.containerConfig = &cc
return hookConfig.containerConfig
}

File diff suppressed because it is too large Load Diff

View File

@@ -1,114 +1,76 @@
package main
import (
"fmt"
"log"
"os"
"path"
"reflect"
"strings"
"sync"
"github.com/BurntSushi/toml"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config/image"
"github.com/NVIDIA/nvidia-container-toolkit/internal/info"
)
const (
configPath = "/etc/nvidia-container-runtime/config.toml"
driverPath = "/run/nvidia/driver"
)
var defaultPaths = [...]string{
path.Join(driverPath, configPath),
configPath,
// hookConfig wraps the toolkit config.
// This allows for functions to be defined on the local type.
type hookConfig struct {
sync.Mutex
*config.Config
containerConfig *containerConfig
}
// CLIConfig : options for nvidia-container-cli.
type CLIConfig struct {
Root *string `toml:"root"`
Path *string `toml:"path"`
Environment []string `toml:"environment"`
Debug *string `toml:"debug"`
Ldcache *string `toml:"ldcache"`
LoadKmods bool `toml:"load-kmods"`
NoPivot bool `toml:"no-pivot"`
NoCgroups bool `toml:"no-cgroups"`
User *string `toml:"user"`
Ldconfig *string `toml:"ldconfig"`
}
// HookConfig : options for the nvidia-container-runtime-hook.
type HookConfig struct {
DisableRequire bool `toml:"disable-require"`
SwarmResource *string `toml:"swarm-resource"`
AcceptEnvvarUnprivileged bool `toml:"accept-nvidia-visible-devices-envvar-when-unprivileged"`
AcceptDeviceListAsVolumeMounts bool `toml:"accept-nvidia-visible-devices-as-volume-mounts"`
SupportedDriverCapabilities DriverCapabilities `toml:"supported-driver-capabilities"`
NvidiaContainerCLI CLIConfig `toml:"nvidia-container-cli"`
NVIDIAContainerRuntime config.RuntimeConfig `toml:"nvidia-container-runtime"`
NVIDIAContainerRuntimeHook config.RuntimeHookConfig `toml:"nvidia-container-runtime-hook"`
}
func getDefaultHookConfig() HookConfig {
return HookConfig{
DisableRequire: false,
SwarmResource: nil,
AcceptEnvvarUnprivileged: true,
AcceptDeviceListAsVolumeMounts: false,
SupportedDriverCapabilities: allDriverCapabilities,
NvidiaContainerCLI: CLIConfig{
Root: nil,
Path: nil,
Environment: []string{},
Debug: nil,
Ldcache: nil,
LoadKmods: true,
NoPivot: false,
NoCgroups: false,
User: nil,
Ldconfig: nil,
},
NVIDIAContainerRuntime: *config.GetDefaultRuntimeConfig(),
NVIDIAContainerRuntimeHook: *config.GetDefaultRuntimeHookConfig(),
// loadConfig loads the required paths for the hook config.
func loadConfig() (*config.Config, error) {
configFilePath, required := getConfigFilePath()
cfg, err := config.New(
config.WithConfigFile(configFilePath),
config.WithRequired(true),
)
if err == nil {
return cfg.Config()
} else if os.IsNotExist(err) && !required {
return config.GetDefault()
}
return nil, fmt.Errorf("couldn't open required configuration file: %v", err)
}
func getHookConfig() (config HookConfig) {
var err error
if len(*configflag) > 0 {
config = getDefaultHookConfig()
_, err = toml.DecodeFile(*configflag, &config)
if err != nil {
log.Panicln("couldn't open configuration file:", err)
}
} else {
for _, p := range defaultPaths {
config = getDefaultHookConfig()
_, err = toml.DecodeFile(p, &config)
if err == nil {
break
} else if !os.IsNotExist(err) {
log.Panicln("couldn't open default configuration file:", err)
}
}
func getConfigFilePath() (string, bool) {
if configFromFlag := *configflag; configFromFlag != "" {
return configFromFlag, true
}
if config.SupportedDriverCapabilities == all {
config.SupportedDriverCapabilities = allDriverCapabilities
if configFromEnvvar := os.Getenv(config.FilePathOverrideEnvVar); configFromEnvvar != "" {
return configFromEnvvar, true
}
// We ensure that the supported-driver-capabilites option is a subset of allDriverCapabilities
if intersection := allDriverCapabilities.Intersection(config.SupportedDriverCapabilities); intersection != config.SupportedDriverCapabilities {
return config.GetConfigFilePath(), false
}
func getHookConfig() (*hookConfig, error) {
cfg, err := loadConfig()
if err != nil {
return nil, fmt.Errorf("failed to load config: %v", err)
}
config := &hookConfig{Config: cfg}
allSupportedDriverCapabilities := image.SupportedDriverCapabilities
if config.SupportedDriverCapabilities == "all" {
config.SupportedDriverCapabilities = allSupportedDriverCapabilities.String()
}
configuredCapabilities := image.NewDriverCapabilities(config.SupportedDriverCapabilities)
// We ensure that the configured value is a subset of all supported capabilities
if !allSupportedDriverCapabilities.IsSuperset(configuredCapabilities) {
configName := config.getConfigOption("SupportedDriverCapabilities")
log.Panicf("Invalid value for config option '%v'; %v (supported: %v)\n", configName, config.SupportedDriverCapabilities, allDriverCapabilities)
log.Panicf("Invalid value for config option '%v'; %v (supported: %v)\n", configName, config.SupportedDriverCapabilities, allSupportedDriverCapabilities.String())
}
return config
return config, nil
}
// getConfigOption returns the toml config option associated with the
// specified struct field.
func (c HookConfig) getConfigOption(fieldName string) string {
t := reflect.TypeOf(c)
func (c *hookConfig) getConfigOption(fieldName string) string {
t := reflect.TypeOf(&c)
f, ok := t.FieldByName(fieldName)
if !ok {
return fieldName
@@ -121,12 +83,12 @@ func (c HookConfig) getConfigOption(fieldName string) string {
}
// getSwarmResourceEnvvars returns the swarm resource envvars for the config.
func (c *HookConfig) getSwarmResourceEnvvars() []string {
if c.SwarmResource == nil {
func (c *hookConfig) getSwarmResourceEnvvars() []string {
if c == nil || c.SwarmResource == "" {
return nil
}
candidates := strings.Split(*c.SwarmResource, ",")
candidates := strings.Split(c.SwarmResource, ",")
var envvars []string
for _, c := range candidates {
@@ -138,3 +100,44 @@ func (c *HookConfig) getSwarmResourceEnvvars() []string {
return envvars
}
// nvidiaContainerCliCUDACompatModeFlags returns required --cuda-compat-mode
// flag(s) depending on the hook and runtime configurations.
func (c *hookConfig) nvidiaContainerCliCUDACompatModeFlags() []string {
var flag string
switch c.NVIDIAContainerRuntimeConfig.Modes.Legacy.CUDACompatMode {
case config.CUDACompatModeLdconfig:
flag = "--cuda-compat-mode=ldconfig"
case config.CUDACompatModeMount:
flag = "--cuda-compat-mode=mount"
case config.CUDACompatModeDisabled, config.CUDACompatModeHook:
flag = "--cuda-compat-mode=disabled"
default:
if !c.Features.AllowCUDACompatLibsFromContainer.IsEnabled() {
flag = "--cuda-compat-mode=disabled"
}
}
if flag == "" {
return nil
}
return []string{flag}
}
func (c *hookConfig) assertModeIsLegacy() error {
if c.NVIDIAContainerRuntimeHookConfig.SkipModeDetection {
return nil
}
mr := info.NewRuntimeModeResolver(
info.WithLogger(&logInterceptor{}),
info.WithImage(&c.containerConfig.Image),
info.WithDefaultMode(info.LegacyRuntimeMode),
)
mode := mr.ResolveRuntimeMode(c.NVIDIAContainerRuntimeConfig.Mode)
if mode == "legacy" {
return nil
}
return fmt.Errorf("invoking the NVIDIA Container Runtime Hook directly (e.g. specifying the docker --gpus flag) is not supported. Please use the NVIDIA Container Runtime (e.g. specify the --runtime=nvidia flag) instead")
}

View File

@@ -22,22 +22,25 @@ import (
"testing"
"github.com/stretchr/testify/require"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config/image"
)
func TestGetHookConfig(t *testing.T) {
testCases := []struct {
lines []string
expectedPanic bool
expectedDriverCapabilities DriverCapabilities
expectedDriverCapabilities string
}{
{
expectedDriverCapabilities: allDriverCapabilities,
expectedDriverCapabilities: image.SupportedDriverCapabilities.String(),
},
{
lines: []string{
"supported-driver-capabilities = \"all\"",
},
expectedDriverCapabilities: allDriverCapabilities,
expectedDriverCapabilities: image.SupportedDriverCapabilities.String(),
},
{
lines: []string{
@@ -47,19 +50,19 @@ func TestGetHookConfig(t *testing.T) {
},
{
lines: []string{},
expectedDriverCapabilities: allDriverCapabilities,
expectedDriverCapabilities: image.SupportedDriverCapabilities.String(),
},
{
lines: []string{
"supported-driver-capabilities = \"\"",
},
expectedDriverCapabilities: none,
expectedDriverCapabilities: "",
},
{
lines: []string{
"supported-driver-capabilities = \"utility,compute\"",
"supported-driver-capabilities = \"compute,utility\"",
},
expectedDriverCapabilities: DriverCapabilities("utility,compute"),
expectedDriverCapabilities: "compute,utility",
},
}
@@ -82,14 +85,15 @@ func TestGetHookConfig(t *testing.T) {
configflag = &filename
for _, line := range tc.lines {
_, err := configFile.WriteString(fmt.Sprintf("%s\n", line))
_, err := fmt.Fprintf(configFile, "%s\n", line)
require.NoError(t, err)
}
}
var config HookConfig
var cfg *hookConfig
getHookConfig := func() {
config = getHookConfig()
c, _ := getHookConfig()
cfg = c
}
if tc.expectedPanic {
@@ -99,7 +103,7 @@ func TestGetHookConfig(t *testing.T) {
getHookConfig()
require.EqualValues(t, tc.expectedDriverCapabilities, config.SupportedDriverCapabilities)
require.EqualValues(t, tc.expectedDriverCapabilities, cfg.SupportedDriverCapabilities)
})
}
}
@@ -109,10 +113,6 @@ func TestGetSwarmResourceEnvvars(t *testing.T) {
value string
expected []string
}{
{
value: "nil",
expected: nil,
},
{
value: "",
expected: nil,
@@ -145,13 +145,10 @@ func TestGetSwarmResourceEnvvars(t *testing.T) {
for i, tc := range testCases {
t.Run(fmt.Sprintf("%d", i), func(t *testing.T) {
c := &HookConfig{
SwarmResource: func() *string {
if tc.value == "nil" {
return nil
}
return &tc.value
}(),
c := &hookConfig{
Config: &config.Config{
SwarmResource: tc.value,
},
}
envvars := c.getSwarmResourceEnvvars()

View File

@@ -13,7 +13,9 @@ import (
"strings"
"syscall"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/internal/info"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup"
)
@@ -36,16 +38,12 @@ func exit() {
os.Exit(0)
}
func getCLIPath(config CLIConfig) string {
if config.Path != nil {
return *config.Path
func getCLIPath(config config.ContainerCLIConfig) string {
if config.Path != "" {
return config.Path
}
var root string
if config.Root != nil {
root = *config.Root
}
if err := os.Setenv("PATH", lookup.GetPath(root)); err != nil {
if err := os.Setenv("PATH", lookup.GetPath(config.Root)); err != nil {
log.Panicln("couldn't set PATH variable:", err)
}
@@ -57,7 +55,7 @@ func getCLIPath(config CLIConfig) string {
}
// getRootfsPath returns an absolute path. We don't need to resolve symlinks for now.
func getRootfsPath(config containerConfig) string {
func getRootfsPath(config *containerConfig) string {
rootfs, err := filepath.Abs(config.Rootfs)
if err != nil {
log.Panicln(err)
@@ -71,53 +69,61 @@ func doPrestart() {
defer exit()
log.SetFlags(0)
hook := getHookConfig()
cli := hook.NvidiaContainerCLI
if !hook.NVIDIAContainerRuntimeHook.SkipModeDetection && info.ResolveAutoMode(&logInterceptor{}, hook.NVIDIAContainerRuntime.Mode) != "legacy" {
log.Panicln("invoking the NVIDIA Container Runtime Hook directly (e.g. specifying the docker --gpus flag) is not supported. Please use the NVIDIA Container Runtime (e.g. specify the --runtime=nvidia flag) instead.")
hook, err := getHookConfig()
if err != nil || hook == nil {
log.Panicln("error getting hook config:", err)
}
cli := hook.NVIDIAContainerCLIConfig
container := getContainerConfig(hook)
container := hook.getContainerConfig()
nvidia := container.Nvidia
if nvidia == nil {
// Not a GPU container, nothing to do.
return
}
if err := hook.assertModeIsLegacy(); err != nil {
log.Panicf("%v", err)
}
rootfs := getRootfsPath(container)
args := []string{getCLIPath(cli)}
if cli.Root != nil {
args = append(args, fmt.Sprintf("--root=%s", *cli.Root))
if cli.Root != "" {
args = append(args, fmt.Sprintf("--root=%s", cli.Root))
}
if cli.LoadKmods {
args = append(args, "--load-kmods")
}
if hook.Features.DisableImexChannelCreation.IsEnabled() {
args = append(args, "--no-create-imex-channels")
}
if cli.NoPivot {
args = append(args, "--no-pivot")
}
if *debugflag {
args = append(args, "--debug=/dev/stderr")
} else if cli.Debug != nil {
args = append(args, fmt.Sprintf("--debug=%s", *cli.Debug))
} else if cli.Debug != "" {
args = append(args, fmt.Sprintf("--debug=%s", cli.Debug))
}
if cli.Ldcache != nil {
args = append(args, fmt.Sprintf("--ldcache=%s", *cli.Ldcache))
if cli.Ldcache != "" {
args = append(args, fmt.Sprintf("--ldcache=%s", cli.Ldcache))
}
if cli.User != nil {
args = append(args, fmt.Sprintf("--user=%s", *cli.User))
if cli.User != "" {
args = append(args, fmt.Sprintf("--user=%s", cli.User))
}
args = append(args, "configure")
if cli.Ldconfig != nil {
args = append(args, fmt.Sprintf("--ldconfig=%s", *cli.Ldconfig))
args = append(args, hook.nvidiaContainerCliCUDACompatModeFlags()...)
if ldconfigPath := cli.NormalizeLDConfigPath(); ldconfigPath != "" {
args = append(args, fmt.Sprintf("--ldconfig=%s", ldconfigPath))
}
if cli.NoCgroups {
args = append(args, "--no-cgroups")
}
if len(nvidia.Devices) > 0 {
args = append(args, fmt.Sprintf("--device=%s", nvidia.Devices))
if devicesString := strings.Join(nvidia.Devices, ","); len(devicesString) > 0 {
args = append(args, fmt.Sprintf("--device=%s", devicesString))
}
if len(nvidia.MigConfigDevices) > 0 {
args = append(args, fmt.Sprintf("--mig-config=%s", nvidia.MigConfigDevices))
@@ -125,6 +131,9 @@ func doPrestart() {
if len(nvidia.MigMonitorDevices) > 0 {
args = append(args, fmt.Sprintf("--mig-monitor=%s", nvidia.MigMonitorDevices))
}
if imexString := strings.Join(nvidia.ImexChannels, ","); len(imexString) > 0 {
args = append(args, fmt.Sprintf("--imex-channel=%s", imexString))
}
for _, cap := range strings.Split(nvidia.DriverCapabilities, ",") {
if len(cap) == 0 {
@@ -133,16 +142,15 @@ func doPrestart() {
args = append(args, capabilityToCLI(cap))
}
if !hook.DisableRequire && !nvidia.DisableRequire {
for _, req := range nvidia.Requirements {
args = append(args, fmt.Sprintf("--require=%s", req))
}
for _, req := range nvidia.Requirements {
args = append(args, fmt.Sprintf("--require=%s", req))
}
args = append(args, fmt.Sprintf("--pid=%s", strconv.FormatUint(uint64(container.Pid), 10)))
args = append(args, rootfs)
env := append(os.Environ(), cli.Environment...)
//nolint:gosec // TODO: Can we harden this so that there is less risk of command injection?
err = syscall.Exec(args[0], args, env)
log.Panicln("exec failed:", err)
}
@@ -185,11 +193,11 @@ func main() {
}
}
// logInterceptor implements the info.Logger interface to allow for logging from this function.
type logInterceptor struct{}
// logInterceptor implements the logger.Interface to allow for logging from executable.
type logInterceptor struct {
logger.NullLogger
}
func (l *logInterceptor) Infof(format string, args ...interface{}) {
log.Printf(format, args...)
}
func (l *logInterceptor) Debugf(format string, args ...interface{}) {}

View File

@@ -21,8 +21,8 @@ The `runtimes` config option allows for the low-level runtime to be specified. T
The default value for this setting is:
```toml
runtimes = [
"docker-runc",
"runc",
"crun",
]
```
@@ -85,3 +85,126 @@ Alternatively the NVIDIA Container Runtime can be set as the default runtime for
}
}
```
## Environment variables (OCI spec)
Each environment variable maps to an command-line argument for `nvidia-container-cli` from [libnvidia-container](https://github.com/NVIDIA/libnvidia-container).
These variables are already set in our [official CUDA images](https://hub.docker.com/r/nvidia/cuda/).
### `NVIDIA_VISIBLE_DEVICES`
This variable controls which GPUs will be made accessible inside the container.
#### Possible values
* `0,1,2`, `GPU-fef8089b` …: a comma-separated list of GPU UUID(s) or index(es).
* `all`: all GPUs will be accessible, this is the default value in our container images.
* `none`: no GPU will be accessible, but driver capabilities will be enabled.
* `void` or *empty* or *unset*: `nvidia-container-runtime` will have the same behavior as `runc`.
**Note**: When running on a MIG capable device, the following values will also be available:
* `0:0,0:1,1:0`, `MIG-GPU-fef8089b/0/1` …: a comma-separated list of MIG Device UUID(s) or index(es).
Where the MIG device indices have the form `<GPU Device Index>:<MIG Device Index>` as seen in the example output:
```
$ nvidia-smi -L
GPU 0: Graphics Device (UUID: GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5)
MIG Device 0: (UUID: MIG-GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5/1/0)
MIG Device 1: (UUID: MIG-GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5/1/1)
MIG Device 2: (UUID: MIG-GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5/11/0)
```
### `NVIDIA_MIG_CONFIG_DEVICES`
This variable controls which of the visible GPUs can have their MIG
configuration managed from within the container. This includes enabling and
disabling MIG mode, creating and destroying GPU Instances and Compute
Instances, etc.
#### Possible values
* `all`: Allow all MIG-capable GPUs in the visible device list to have their
MIG configurations managed.
**Note**:
* This feature is only available on MIG capable devices (e.g. the A100).
* To use this feature, the container must be started with `CAP_SYS_ADMIN` privileges.
* When not running as `root`, the container user must have read access to the
`/proc/driver/nvidia/capabilities/mig/config` file on the host.
### `NVIDIA_MIG_MONITOR_DEVICES`
This variable controls which of the visible GPUs can have aggregate information
about all of their MIG devices monitored from within the container. This
includes inspecting the aggregate memory usage, listing the aggregate running
processes, etc.
#### Possible values
* `all`: Allow all MIG-capable GPUs in the visible device list to have their
MIG devices monitored.
**Note**:
* This feature is only available on MIG capable devices (e.g. the A100).
* To use this feature, the container must be started with `CAP_SYS_ADMIN` privileges.
* When not running as `root`, the container user must have read access to the
`/proc/driver/nvidia/capabilities/mig/monitor` file on the host.
### `NVIDIA_DRIVER_CAPABILITIES`
This option controls which driver libraries/binaries will be mounted inside the container.
#### Possible values
* `compute,video`, `graphics,utility` …: a comma-separated list of driver features the container needs.
* `all`: enable all available driver capabilities.
* *empty* or *unset*: use default driver capability: `utility,compute`.
#### Supported driver capabilities
* `compute`: required for CUDA and OpenCL applications.
* `compat32`: required for running 32-bit applications.
* `graphics`: required for running OpenGL and Vulkan applications.
* `utility`: required for using `nvidia-smi` and NVML.
* `video`: required for using the Video Codec SDK.
* `display`: required for leveraging X11 display.
### `NVIDIA_REQUIRE_*`
A logical expression to define constraints on the configurations supported by the container.
#### Supported constraints
* `cuda`: constraint on the CUDA driver version.
* `driver`: constraint on the driver version.
* `arch`: constraint on the compute architectures of the selected GPUs.
* `brand`: constraint on the brand of the selected GPUs (e.g. GeForce, Tesla, GRID).
#### Expressions
Multiple constraints can be expressed in a single environment variable: space-separated constraints are ORed, comma-separated constraints are ANDed.
Multiple environment variables of the form `NVIDIA_REQUIRE_*` are ANDed together.
### `NVIDIA_DISABLE_REQUIRE`
Single switch to disable all the constraints of the form `NVIDIA_REQUIRE_*`.
### `NVIDIA_REQUIRE_CUDA`
The version of the CUDA toolkit used by the container. It is an instance of the generic `NVIDIA_REQUIRE_*` case and it is set by official CUDA images.
If the version of the NVIDIA driver is insufficient to run this version of CUDA, the container will not be started.
#### Possible values
* `cuda>=7.5`, `cuda>=8.0`, `cuda>=9.0` …: any valid CUDA version in the form `major.minor`.
### `CUDA_VERSION`
Similar to `NVIDIA_REQUIRE_CUDA`, for legacy CUDA images.
In addition, if `NVIDIA_REQUIRE_CUDA` is not set, `NVIDIA_VISIBLE_DEVICES` and `NVIDIA_DRIVER_CAPABILITIES` will default to `all`.
## Usage example
**NOTE:** The use of the `nvidia-container-runtime` as CLI replacement for `runc` is uncommon and is only provided for completeness.
Although the `nvidia-container-runtime` is typically configured as a replacement for `runc` or `crun` in various container engines, it can also be
invoked from the command line as `runc` would. For example:
```sh
# Setup a rootfs based on Ubuntu 16.04
cd $(mktemp -d) && mkdir rootfs
curl -sS http://cdimage.ubuntu.com/ubuntu-base/releases/16.04/release/ubuntu-base-16.04.6-base-amd64.tar.gz | tar --exclude 'dev/*' -C rootfs -xz
# Create an OCI runtime spec
nvidia-container-runtime spec
sed -i 's;"sh";"nvidia-smi";' config.json
sed -i 's;\("TERM=xterm"\);\1, "NVIDIA_VISIBLE_DEVICES=0";' config.json
# Run the container
sudo nvidia-container-runtime run nvidia_smi
```

View File

@@ -3,26 +3,28 @@ package main
import (
"bytes"
"encoding/json"
"io/ioutil"
"io"
"log"
"os"
"os/exec"
"path/filepath"
"strings"
"testing"
"github.com/opencontainers/runtime-spec/specs-go"
testlog "github.com/sirupsen/logrus/hooks/test"
"github.com/stretchr/testify/require"
"github.com/NVIDIA/nvidia-container-toolkit/internal/modifier"
"github.com/NVIDIA/nvidia-container-toolkit/internal/test"
"github.com/opencontainers/runtime-spec/specs-go"
"github.com/sirupsen/logrus"
"github.com/stretchr/testify/require"
)
const (
nvidiaRuntime = "nvidia-container-runtime"
nvidiaHook = "nvidia-container-runtime-hook"
bundlePathSuffix = "test/output/bundle/"
bundlePathSuffix = "tests/output/bundle/"
specFile = "config.json"
unmodifiedSpecFileSuffix = "test/input/test_spec.json"
unmodifiedSpecFileSuffix = "tests/input/test_spec.json"
)
const (
@@ -42,10 +44,10 @@ func TestMain(m *testing.M) {
var err error
moduleRoot, err := test.GetModuleRoot()
if err != nil {
logrus.Fatalf("error in test setup: could not get module root: %v", err)
log.Fatalf("error in test setup: could not get module root: %v", err)
}
testBinPath := filepath.Join(moduleRoot, "test", "bin")
testInputPath := filepath.Join(moduleRoot, "test", "input")
testBinPath := filepath.Join(moduleRoot, "tests", "bin")
testInputPath := filepath.Join(moduleRoot, "tests", "input")
// Set the environment variables for the test
os.Setenv("PATH", test.PrependToPath(testBinPath, moduleRoot))
@@ -54,11 +56,11 @@ func TestMain(m *testing.M) {
// Confirm that the environment is configured correctly
runcPath, err := exec.LookPath(runcExecutableName)
if err != nil || filepath.Join(testBinPath, runcExecutableName) != runcPath {
logrus.Fatalf("error in test setup: mock runc path set incorrectly in TestMain(): %v", err)
log.Fatalf("error in test setup: mock runc path set incorrectly in TestMain(): %v", err)
}
hookPath, err := exec.LookPath(nvidiaHook)
if err != nil || filepath.Join(testBinPath, nvidiaHook) != hookPath {
logrus.Fatalf("error in test setup: mock hook path set incorrectly in TestMain(): %v", err)
log.Fatalf("error in test setup: mock hook path set incorrectly in TestMain(): %v", err)
}
// Store the root and binary paths in the test Config
@@ -85,6 +87,7 @@ func TestBadInput(t *testing.T) {
t.Fatal(err)
}
//nolint:gosec // TODO: Can we harden this so that there is less risk of command injection
cmdCreate := exec.Command(nvidiaRuntime, "create", "--bundle")
t.Logf("executing: %s\n", strings.Join(cmdCreate.Args, " "))
err = cmdCreate.Run()
@@ -102,6 +105,7 @@ func TestGoodInput(t *testing.T) {
t.Fatalf("error generating runtime spec: %v", err)
}
//nolint:gosec // TODO: Can we harden this so that there is less risk of command injection
cmdRun := exec.Command(nvidiaRuntime, "run", "--bundle", cfg.bundlePath(), "testcontainer")
t.Logf("executing: %s\n", strings.Join(cmdRun.Args, " "))
output, err := cmdRun.CombinedOutput()
@@ -112,16 +116,16 @@ func TestGoodInput(t *testing.T) {
require.NoError(t, err, "should be no errors when reading and parsing spec from config.json")
require.Empty(t, spec.Hooks, "there should be no hooks in config.json")
//nolint:gosec // TODO: Can we harden this so that there is less risk of command injection
cmdCreate := exec.Command(nvidiaRuntime, "create", "--bundle", cfg.bundlePath(), "testcontainer")
t.Logf("executing: %s\n", strings.Join(cmdCreate.Args, " "))
err = cmdCreate.Run()
require.NoError(t, err, "runtime should not return an error")
// Check config.json for NVIDIA prestart hook
// Check config.json to ensure that the NVIDIA prestart was not inserted.
spec, err = cfg.getRuntimeSpec()
require.NoError(t, err, "should be no errors when reading and parsing spec from config.json")
require.NotEmpty(t, spec.Hooks, "there should be hooks in config.json")
require.Equal(t, 1, nvidiaHookCount(spec.Hooks), "exactly one nvidia prestart hook should be inserted correctly into config.json")
require.Empty(t, spec.Hooks, "there should be no hooks in config.json")
}
// NVIDIA prestart hook already present in config file
@@ -157,22 +161,23 @@ func TestDuplicateHook(t *testing.T) {
}
// Test how runtime handles already existing prestart hook in config.json
//nolint:gosec // TODO: Can we harden this so that there is less risk of command injection
cmdCreate := exec.Command(nvidiaRuntime, "create", "--bundle", cfg.bundlePath(), "testcontainer")
t.Logf("executing: %s\n", strings.Join(cmdCreate.Args, " "))
output, err := cmdCreate.CombinedOutput()
require.NoErrorf(t, err, "runtime should not return an error", "output=%v", string(output))
// Check config.json for NVIDIA prestart hook
// Check config.json to ensure that the NVIDIA prestart hook was removed.
spec, err = cfg.getRuntimeSpec()
require.NoError(t, err, "should be no errors when reading and parsing spec from config.json")
require.NotEmpty(t, spec.Hooks, "there should be hooks in config.json")
require.Equal(t, 1, nvidiaHookCount(spec.Hooks), "exactly one nvidia prestart hook should be inserted correctly into config.json")
require.Empty(t, spec.Hooks, "there should be no hooks in config.json")
}
// addNVIDIAHook is a basic wrapper for an addHookModifier that is used for
// testing.
func addNVIDIAHook(spec *specs.Spec) error {
m := modifier.NewStableRuntimeModifier(logrus.StandardLogger())
logger, _ := testlog.NewNullLogger()
m := modifier.NewStableRuntimeModifier(logger, nvidiaHook)
return m.Modify(spec)
}
@@ -186,15 +191,16 @@ func (c testConfig) getRuntimeSpec() (specs.Spec, error) {
}
defer jsonFile.Close()
jsonContent, err := ioutil.ReadAll(jsonFile)
if err != nil {
jsonContent, err := io.ReadAll(jsonFile)
switch {
case err != nil:
return spec, err
} else if json.Valid(jsonContent) {
case json.Valid(jsonContent):
err = json.Unmarshal(jsonContent, &spec)
if err != nil {
return spec, err
}
} else {
default:
err = json.NewDecoder(bytes.NewReader(jsonContent)).Decode(&spec)
if err != nil {
return spec, err
@@ -224,6 +230,7 @@ func (c testConfig) generateNewRuntimeSpec() error {
return err
}
//nolint:gosec // TODO: Can we harden this so that there is less risk of command injection
cmd := exec.Command("cp", c.unmodifiedSpecFile(), c.specFilePath())
err = cmd.Run()
if err != nil {
@@ -231,18 +238,3 @@ func (c testConfig) generateNewRuntimeSpec() error {
}
return nil
}
// Return number of valid NVIDIA prestart hooks in runtime spec
func nvidiaHookCount(hooks *specs.Hooks) int {
if hooks == nil {
return 0
}
count := 0
for _, hook := range hooks.Prestart {
if strings.Contains(hook.Path, nvidiaHook) {
count++
}
}
return count
}

View File

@@ -15,28 +15,23 @@ docker setup \
/run/nvidia/toolkit
```
Configure the `nvidia-container-runtime` as a docker runtime named `NAME`. If the `--runtime-name` flag is not specified, this runtime would be called `nvidia`. A runtime named `nvidia-experimental` will also be configured using the `nvidia-container-runtime.experimental` OCI-compliant runtime shim.
Configure the `nvidia-container-runtime` as a docker runtime named `NAME`. If the `--runtime-name` flag is not specified, this runtime would be called `nvidia`.
Since `--set-as-default` is enabled by default, the specified runtime name will also be set as the default docker runtime. This can be disabled by explicityly specifying `--set-as-default=false`.
**Note**: If `--runtime-name` is specified as `nvidia-experimental` explicitly, the `nvidia-experimental` runtime will be configured as the default runtime, with the `nvidia` runtime still configured and available for use.
The following table describes the behaviour for different `--runtime-name` and `--set-as-default` flag combinations.
| Flags | Installed Runtimes | Default Runtime |
|-------------------------------------------------------------|:--------------------------------|:----------------------|
| **NONE SPECIFIED** | `nvidia`, `nvidia-experimental` | `nvidia` |
| `--runtime-name nvidia` | `nvidia`, `nvidia-experimental` | `nvidia` |
| `--runtime-name NAME` | `NAME`, `nvidia-experimental` | `NAME` |
| `--runtime-name nvidia-experimental` | `nvidia`, `nvidia-experimental` | `nvidia-experimental` |
| `--set-as-default` | `nvidia`, `nvidia-experimental` | `nvidia` |
| `--set-as-default --runtime-name nvidia` | `nvidia`, `nvidia-experimental` | `nvidia` |
| `--set-as-default --runtime-name NAME` | `NAME`, `nvidia-experimental` | `NAME` |
| `--set-as-default --runtime-name nvidia-experimental` | `nvidia`, `nvidia-experimental` | `nvidia-experimental` |
| `--set-as-default=false` | `nvidia`, `nvidia-experimental` | **NOT SET** |
| `--set-as-default=false --runtime-name NAME` | `NAME`, `nvidia-experimental` | **NOT SET** |
| `--set-as-default=false --runtime-name nvidia` | `nvidia`, `nvidia-experimental` | **NOT SET** |
| `--set-as-default=false --runtime-name nvidia-experimental` | `nvidia`, `nvidia-experimental` | **NOT SET** |
| **NONE SPECIFIED** | `nvidia` | `nvidia` |
| `--runtime-name nvidia` | `nvidia` | `nvidia` |
| `--runtime-name NAME` | `NAME` | `NAME` |
| `--set-as-default` | `nvidia` | `nvidia` |
| `--set-as-default --runtime-name nvidia` | `nvidia` | `nvidia` |
| `--set-as-default --runtime-name NAME` | `NAME` | `NAME` |
| `--set-as-default=false` | `nvidia` | **NOT SET** |
| `--set-as-default=false --runtime-name NAME` | `NAME` | **NOT SET** |
| `--set-as-default=false --runtime-name nvidia` | `nvidia` | **NOT SET** |
These combinations also hold for the environment variables that map to the command line flags: `DOCKER_RUNTIME_NAME`, `DOCKER_SET_AS_DEFAULT`.
@@ -48,7 +43,7 @@ containerd setup \
/run/nvidia/toolkit
```
Configure the `nvidia-container-runtime` as a runtime class named `NAME`. If the `--runtime-class` flag is not specified, this runtime would be called `nvidia`. A runtime class named `nvidia-experimental` will also be configured using the `nvidia-container-runtime.experimental` OCI-compliant runtime shim.
Configure the `nvidia-container-runtime` as a runtime class named `NAME`. If the `--runtime-class` flag is not specified, this runtime would be called `nvidia`.
Adding the `--set-as-default` flag as follows:
```bash
@@ -59,19 +54,15 @@ containerd setup \
```
will set the runtime class `NAME` (or `nvidia` if not specified) as the default runtime class.
**Note**: If `--runtime-class` is specified as `nvidia-experimental` explicitly and `--set-as-default` is specified, the `nvidia-experimental` runtime will be configured as the default runtime class, with the `nvidia` runtime class still configured and available for use.
The following table describes the behaviour for different `--runtime-class` and `--set-as-default` flag combinations.
| Flags | Installed Runtime Classes | Default Runtime Class |
|--------------------------------------------------------|:--------------------------------|:----------------------|
| **NONE SPECIFIED** | `nvidia`, `nvidia-experimental` | **NOT SET** |
| `--runtime-class NAME` | `NAME`, `nvidia-experimental` | **NOT SET** |
| `--runtime-class nvidia` | `nvidia`, `nvidia-experimental` | **NOT SET** |
| `--runtime-class nvidia-experimental` | `nvidia`, `nvidia-experimental` | **NOT SET** |
| `--set-as-default` | `nvidia`, `nvidia-experimental` | `nvidia` |
| `--set-as-default --runtime-class NAME` | `NAME`, `nvidia-experimental` | `NAME` |
| `--set-as-default --runtime-class nvidia` | `nvidia`, `nvidia-experimental` | `nvidia` |
| `--set-as-default --runtime-class nvidia-experimental` | `nvidia`, `nvidia-experimental` | `nvidia-experimental` |
| **NONE SPECIFIED** | `nvidia` | **NOT SET** |
| `--runtime-class NAME` | `NAME` | **NOT SET** |
| `--runtime-class nvidia` | `nvidia` | **NOT SET** |
| `--set-as-default` | `nvidia` | `nvidia` |
| `--set-as-default --runtime-class NAME` | `NAME` | `NAME` |
| `--set-as-default --runtime-class nvidia` | `nvidia` | `nvidia` |
These combinations also hold for the environment variables that map to the command line flags.

View File

@@ -0,0 +1,182 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package container
import (
"fmt"
"os"
"os/exec"
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/container/operator"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/engine"
)
const (
restartModeNone = "none"
restartModeSignal = "signal"
restartModeSystemd = "systemd"
)
// Options defines the shared options for the CLIs to configure containers runtimes.
type Options struct {
Config string
Socket string
// ExecutablePath specifies the path to the container runtime executable.
// This is used to extract the current config, for example.
// If a HostRootMount is specified, this path is relative to the host root
// mount.
ExecutablePath string
// EnabledCDI indicates whether CDI should be enabled.
EnableCDI bool
RuntimeName string
RuntimeDir string
SetAsDefault bool
RestartMode string
HostRootMount string
}
// ParseArgs parses the command line arguments to the CLI
func ParseArgs(c *cli.Context, o *Options) error {
if o.RuntimeDir != "" {
logrus.Debug("Runtime directory already set; ignoring arguments")
return nil
}
args := c.Args()
logrus.Infof("Parsing arguments: %v", args.Slice())
if c.NArg() != 1 {
return fmt.Errorf("incorrect number of arguments")
}
o.RuntimeDir = args.Get(0)
logrus.Infof("Successfully parsed arguments")
return nil
}
// Configure applies the options to the specified config
func (o Options) Configure(cfg engine.Interface) error {
err := o.UpdateConfig(cfg)
if err != nil {
return fmt.Errorf("unable to update config: %v", err)
}
return o.flush(cfg)
}
// Unconfigure removes the options from the specified config
func (o Options) Unconfigure(cfg engine.Interface) error {
err := o.RevertConfig(cfg)
if err != nil {
return fmt.Errorf("unable to update config: %v", err)
}
return o.flush(cfg)
}
// flush flushes the specified config to disk
func (o Options) flush(cfg engine.Interface) error {
logrus.Infof("Flushing config to %v", o.Config)
n, err := cfg.Save(o.Config)
if err != nil {
return fmt.Errorf("unable to flush config: %v", err)
}
if n == 0 {
logrus.Infof("Config file is empty, removed")
}
return nil
}
// UpdateConfig updates the specified config to include the nvidia runtimes
func (o Options) UpdateConfig(cfg engine.Interface) error {
runtimes := operator.GetRuntimes(
operator.WithNvidiaRuntimeName(o.RuntimeName),
operator.WithSetAsDefault(o.SetAsDefault),
operator.WithRoot(o.RuntimeDir),
)
for name, runtime := range runtimes {
err := cfg.AddRuntime(name, runtime.Path, runtime.SetAsDefault)
if err != nil {
return fmt.Errorf("failed to update runtime %q: %v", name, err)
}
}
if o.EnableCDI {
cfg.EnableCDI()
}
return nil
}
// RevertConfig reverts the specified config to remove the nvidia runtimes
func (o Options) RevertConfig(cfg engine.Interface) error {
runtimes := operator.GetRuntimes(
operator.WithNvidiaRuntimeName(o.RuntimeName),
operator.WithSetAsDefault(o.SetAsDefault),
operator.WithRoot(o.RuntimeDir),
)
for name := range runtimes {
err := cfg.RemoveRuntime(name)
if err != nil {
return fmt.Errorf("failed to remove runtime %q: %v", name, err)
}
}
return nil
}
// Restart restarts the specified service
func (o Options) Restart(service string, withSignal func(string) error) error {
switch o.RestartMode {
case restartModeNone:
logrus.Warningf("Skipping restart of %v due to --restart-mode=%v", service, o.RestartMode)
return nil
case restartModeSignal:
return withSignal(o.Socket)
case restartModeSystemd:
return o.SystemdRestart(service)
}
return fmt.Errorf("invalid restart mode specified: %v", o.RestartMode)
}
// SystemdRestart restarts the specified service using systemd
func (o Options) SystemdRestart(service string) error {
var args []string
var msg string
if o.HostRootMount != "" {
msg = " on host"
args = append(args, "chroot", o.HostRootMount)
}
args = append(args, "systemctl", "restart", service)
logrus.Infof("Restarting %v%v using systemd: %v", service, msg, args)
//nolint:gosec // TODO: Can we harden this so that there is less risk of command injection
cmd := exec.Command(args[0], args[1:]...)
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
err := cmd.Run()
if err != nil {
return fmt.Errorf("error restarting %v using systemd: %v", service, err)
}
return nil
}

View File

@@ -19,21 +19,18 @@ package operator
import "path/filepath"
const (
defaultRuntimeName = "nvidia"
experimentalRuntimeName = "nvidia-experimental"
defaultRoot = "/usr/bin"
defaultRuntimeName = "nvidia"
)
// Runtime defines a runtime to be configured.
// The path and whether the runtime is the default runtime can be specfied
// The path and whether the runtime is the default runtime can be specified
type Runtime struct {
name string
Path string
SetAsDefault bool
}
// Runtimes defines a set of runtimes to be configure for use in the GPU Operator
// Runtimes defines a set of runtimes to be configured for use in the GPU Operator
type Runtimes map[string]Runtime
type config struct {
@@ -49,9 +46,6 @@ func GetRuntimes(opts ...Option) Runtimes {
opt(c)
}
if c.root == "" {
c.root = defaultRoot
}
if c.nvidiaRuntimeName == "" {
c.nvidiaRuntimeName = defaultRuntimeName
}
@@ -59,7 +53,7 @@ func GetRuntimes(opts ...Option) Runtimes {
runtimes := make(Runtimes)
runtimes.add(c.nvidiaRuntime())
modes := []string{"experimental", "cdi", "legacy"}
modes := []string{"cdi", "legacy"}
for _, mode := range modes {
runtimes.add(c.modeRuntime(mode))
}
@@ -77,17 +71,16 @@ func (r Runtimes) DefaultRuntimeName() string {
}
// Add a runtime to the set of runtimes.
func (r *Runtimes) add(runtime Runtime) {
(*r)[runtime.name] = runtime
func (r Runtimes) add(runtime Runtime) {
r[runtime.name] = runtime
}
// nvidiaRuntime creates a runtime that corresponds to the nvidia runtime.
// If name is equal to one of the predefined runtimes, `nvidia` is used as the runtime name instead.
func (c config) nvidiaRuntime() Runtime {
predefinedRuntimes := map[string]struct{}{
"nvidia-experimental": {},
"nvidia-cdi": {},
"nvidia-legacy": {},
"nvidia-cdi": {},
"nvidia-legacy": {},
}
name := c.nvidiaRuntimeName
if _, isPredefinedRuntime := predefinedRuntimes[name]; isPredefinedRuntime {

View File

@@ -27,7 +27,6 @@ func TestOptions(t *testing.T) {
testCases := []struct {
setAsDefault bool
nvidiaRuntimeName string
root string
expectedDefaultRuntime string
expectedRuntimes Runtimes
}{
@@ -37,10 +36,6 @@ func TestOptions(t *testing.T) {
name: "nvidia",
Path: "/usr/bin/nvidia-container-runtime",
},
"nvidia-experimental": Runtime{
name: "nvidia-experimental",
Path: "/usr/bin/nvidia-container-runtime.experimental",
},
"nvidia-cdi": Runtime{
name: "nvidia-cdi",
Path: "/usr/bin/nvidia-container-runtime.cdi",
@@ -60,10 +55,6 @@ func TestOptions(t *testing.T) {
Path: "/usr/bin/nvidia-container-runtime",
SetAsDefault: true,
},
"nvidia-experimental": Runtime{
name: "nvidia-experimental",
Path: "/usr/bin/nvidia-container-runtime.experimental",
},
"nvidia-cdi": Runtime{
name: "nvidia-cdi",
Path: "/usr/bin/nvidia-container-runtime.cdi",
@@ -84,10 +75,6 @@ func TestOptions(t *testing.T) {
Path: "/usr/bin/nvidia-container-runtime",
SetAsDefault: true,
},
"nvidia-experimental": Runtime{
name: "nvidia-experimental",
Path: "/usr/bin/nvidia-container-runtime.experimental",
},
"nvidia-cdi": Runtime{
name: "nvidia-cdi",
Path: "/usr/bin/nvidia-container-runtime.cdi",
@@ -108,10 +95,6 @@ func TestOptions(t *testing.T) {
Path: "/usr/bin/nvidia-container-runtime",
SetAsDefault: true,
},
"nvidia-experimental": Runtime{
name: "nvidia-experimental",
Path: "/usr/bin/nvidia-container-runtime.experimental",
},
"nvidia-cdi": Runtime{
name: "nvidia-cdi",
Path: "/usr/bin/nvidia-container-runtime.cdi",
@@ -130,56 +113,6 @@ func TestOptions(t *testing.T) {
name: "NAME",
Path: "/usr/bin/nvidia-container-runtime",
},
"nvidia-experimental": Runtime{
name: "nvidia-experimental",
Path: "/usr/bin/nvidia-container-runtime.experimental",
},
"nvidia-cdi": Runtime{
name: "nvidia-cdi",
Path: "/usr/bin/nvidia-container-runtime.cdi",
},
"nvidia-legacy": Runtime{
name: "nvidia-legacy",
Path: "/usr/bin/nvidia-container-runtime.legacy",
},
},
},
{
setAsDefault: true,
nvidiaRuntimeName: "nvidia-experimental",
expectedDefaultRuntime: "nvidia-experimental",
expectedRuntimes: Runtimes{
"nvidia": Runtime{
name: "nvidia",
Path: "/usr/bin/nvidia-container-runtime",
},
"nvidia-experimental": Runtime{
name: "nvidia-experimental",
Path: "/usr/bin/nvidia-container-runtime.experimental",
SetAsDefault: true,
},
"nvidia-cdi": Runtime{
name: "nvidia-cdi",
Path: "/usr/bin/nvidia-container-runtime.cdi",
},
"nvidia-legacy": Runtime{
name: "nvidia-legacy",
Path: "/usr/bin/nvidia-container-runtime.legacy",
},
},
},
{
setAsDefault: false,
nvidiaRuntimeName: "nvidia-experimental",
expectedRuntimes: Runtimes{
"nvidia": Runtime{
name: "nvidia",
Path: "/usr/bin/nvidia-container-runtime",
},
"nvidia-experimental": Runtime{
name: "nvidia-experimental",
Path: "/usr/bin/nvidia-container-runtime.experimental",
},
"nvidia-cdi": Runtime{
name: "nvidia-cdi",
Path: "/usr/bin/nvidia-container-runtime.cdi",
@@ -197,7 +130,7 @@ func TestOptions(t *testing.T) {
runtimes := GetRuntimes(
WithNvidiaRuntimeName(tc.nvidiaRuntimeName),
WithSetAsDefault(tc.setAsDefault),
WithRoot(tc.root),
WithRoot("/usr/bin"),
)
require.EqualValues(t, tc.expectedRuntimes, runtimes)

View File

@@ -14,24 +14,28 @@
# limitations under the License.
*/
package main
package containerd
import (
"fmt"
"testing"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config/engine/containerd"
"github.com/pelletier/go-toml"
testlog "github.com/sirupsen/logrus/hooks/test"
"github.com/stretchr/testify/require"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/container"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/engine/containerd"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/toml"
)
func TestUpdateV1ConfigDefaultRuntime(t *testing.T) {
logger, _ := testlog.NewNullLogger()
const runtimeDir = "/test/runtime/dir"
testCases := []struct {
legacyConfig bool
setAsDefault bool
runtimeClass string
runtimeName string
expectedDefaultRuntimeName interface{}
expectedDefaultRuntimeBinary interface{}
}{
@@ -51,17 +55,10 @@ func TestUpdateV1ConfigDefaultRuntime(t *testing.T) {
{
legacyConfig: true,
setAsDefault: true,
runtimeClass: "NAME",
runtimeName: "NAME",
expectedDefaultRuntimeName: nil,
expectedDefaultRuntimeBinary: "/test/runtime/dir/nvidia-container-runtime",
},
{
legacyConfig: true,
setAsDefault: true,
runtimeClass: "nvidia-experimental",
expectedDefaultRuntimeName: nil,
expectedDefaultRuntimeBinary: "/test/runtime/dir/nvidia-container-runtime.experimental",
},
{
legacyConfig: false,
setAsDefault: false,
@@ -77,72 +74,64 @@ func TestUpdateV1ConfigDefaultRuntime(t *testing.T) {
{
legacyConfig: false,
setAsDefault: true,
runtimeClass: "NAME",
runtimeName: "NAME",
expectedDefaultRuntimeName: "NAME",
expectedDefaultRuntimeBinary: nil,
},
{
legacyConfig: false,
setAsDefault: true,
runtimeClass: "nvidia-experimental",
expectedDefaultRuntimeName: "nvidia-experimental",
expectedDefaultRuntimeBinary: nil,
},
}
for i, tc := range testCases {
t.Run(fmt.Sprintf("%d", i), func(t *testing.T) {
o := &options{
useLegacyConfig: tc.legacyConfig,
setAsDefault: tc.setAsDefault,
runtimeClass: tc.runtimeClass,
runtimeType: runtimeType,
runtimeDir: runtimeDir,
o := &container.Options{
RuntimeName: tc.runtimeName,
RuntimeDir: runtimeDir,
SetAsDefault: tc.setAsDefault,
}
config, err := toml.TreeFromMap(map[string]interface{}{})
require.NoError(t, err, "%d: %v", i, tc)
v1, err := containerd.New(
containerd.WithLogger(logger),
containerd.WithConfigSource(toml.Empty),
containerd.WithRuntimeType(runtimeType),
containerd.WithUseLegacyConfig(tc.legacyConfig),
containerd.WithConfigVersion(1),
)
require.NoError(t, err)
v1 := &containerd.ConfigV1{
Tree: config,
UseDefaultRuntimeName: !tc.legacyConfig,
RuntimeType: runtimeType,
}
err = o.UpdateConfig(v1)
require.NoError(t, err)
err = UpdateConfig(v1, o)
require.NoError(t, err, "%d: %v", i, tc)
cfg := v1.(*containerd.ConfigV1)
defaultRuntimeName := cfg.GetPath([]string{"plugins", "cri", "containerd", "default_runtime_name"})
require.EqualValues(t, tc.expectedDefaultRuntimeName, defaultRuntimeName)
defaultRuntimeName := v1.GetPath([]string{"plugins", "cri", "containerd", "default_runtime_name"})
require.EqualValues(t, tc.expectedDefaultRuntimeName, defaultRuntimeName, "%d: %v", i, tc)
defaultRuntime := v1.GetPath([]string{"plugins", "cri", "containerd", "default_runtime"})
defaultRuntime := cfg.GetPath([]string{"plugins", "cri", "containerd", "default_runtime"})
if tc.expectedDefaultRuntimeBinary == nil {
require.Nil(t, defaultRuntime, "%d: %v", i, tc)
require.Nil(t, defaultRuntime)
} else {
require.NotNil(t, defaultRuntime)
expected, err := defaultRuntimeTomlConfigV1(tc.expectedDefaultRuntimeBinary.(string))
require.NoError(t, err, "%d: %v", i, tc)
require.NoError(t, err)
configContents, _ := toml.Marshal(defaultRuntime.(*toml.Tree))
expectedContents, _ := toml.Marshal(expected)
require.Equal(t, string(expectedContents), string(configContents), "%d: %v: %v", i, tc)
}
})
}
}
func TestUpdateV1Config(t *testing.T) {
logger, _ := testlog.NewNullLogger()
const runtimeDir = "/test/runtime/dir"
testCases := []struct {
runtimeClass string
runtimeName string
expectedConfig map[string]interface{}
}{
{
runtimeClass: "nvidia",
runtimeName: "nvidia",
expectedConfig: map[string]interface{}{
"version": int64(1),
"plugins": map[string]interface{}{
@@ -160,17 +149,6 @@ func TestUpdateV1Config(t *testing.T) {
"Runtime": "/test/runtime/dir/nvidia-container-runtime",
},
},
"nvidia-experimental": map[string]interface{}{
"runtime_type": "runtime_type",
"runtime_root": "",
"runtime_engine": "",
"privileged_without_host_devices": false,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.experimental",
"Runtime": "/test/runtime/dir/nvidia-container-runtime.experimental",
},
},
"nvidia-cdi": map[string]interface{}{
"runtime_type": "runtime_type",
"runtime_root": "",
@@ -200,7 +178,7 @@ func TestUpdateV1Config(t *testing.T) {
},
},
{
runtimeClass: "NAME",
runtimeName: "NAME",
expectedConfig: map[string]interface{}{
"version": int64(1),
"plugins": map[string]interface{}{
@@ -218,75 +196,6 @@ func TestUpdateV1Config(t *testing.T) {
"Runtime": "/test/runtime/dir/nvidia-container-runtime",
},
},
"nvidia-experimental": map[string]interface{}{
"runtime_type": "runtime_type",
"runtime_root": "",
"runtime_engine": "",
"privileged_without_host_devices": false,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.experimental",
"Runtime": "/test/runtime/dir/nvidia-container-runtime.experimental",
},
},
"nvidia-cdi": map[string]interface{}{
"runtime_type": "runtime_type",
"runtime_root": "",
"runtime_engine": "",
"privileged_without_host_devices": false,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.cdi",
"Runtime": "/test/runtime/dir/nvidia-container-runtime.cdi",
},
},
"nvidia-legacy": map[string]interface{}{
"runtime_type": "runtime_type",
"runtime_root": "",
"runtime_engine": "",
"privileged_without_host_devices": false,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.legacy",
"Runtime": "/test/runtime/dir/nvidia-container-runtime.legacy",
},
},
},
},
},
},
},
},
{
runtimeClass: "nvidia-experimental",
expectedConfig: map[string]interface{}{
"version": int64(1),
"plugins": map[string]interface{}{
"cri": map[string]interface{}{
"containerd": map[string]interface{}{
"runtimes": map[string]interface{}{
"nvidia": map[string]interface{}{
"runtime_type": "runtime_type",
"runtime_root": "",
"runtime_engine": "",
"privileged_without_host_devices": false,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"BinaryName": "/test/runtime/dir/nvidia-container-runtime",
"Runtime": "/test/runtime/dir/nvidia-container-runtime",
},
},
"nvidia-experimental": map[string]interface{}{
"runtime_type": "runtime_type",
"runtime_root": "",
"runtime_engine": "",
"privileged_without_host_devices": false,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.experimental",
"Runtime": "/test/runtime/dir/nvidia-container-runtime.experimental",
},
},
"nvidia-cdi": map[string]interface{}{
"runtime_type": "runtime_type",
"runtime_root": "",
@@ -319,42 +228,41 @@ func TestUpdateV1Config(t *testing.T) {
for i, tc := range testCases {
t.Run(fmt.Sprintf("%d", i), func(t *testing.T) {
o := &options{
runtimeClass: tc.runtimeClass,
runtimeType: runtimeType,
runtimeDir: runtimeDir,
o := &container.Options{
RuntimeName: tc.runtimeName,
RuntimeDir: runtimeDir,
}
config, err := toml.TreeFromMap(map[string]interface{}{})
v1, err := containerd.New(
containerd.WithLogger(logger),
containerd.WithConfigSource(toml.Empty),
containerd.WithRuntimeType(runtimeType),
containerd.WithConfigVersion(1),
containerd.WithContainerAnnotations("cdi.k8s.io/*"),
)
require.NoError(t, err)
v1 := &containerd.ConfigV1{
Tree: config,
UseDefaultRuntimeName: true,
RuntimeType: runtimeType,
ContainerAnnotations: []string{"cdi.k8s.io/*"},
}
err = UpdateConfig(v1, o)
err = o.UpdateConfig(v1)
require.NoError(t, err)
expected, err := toml.TreeFromMap(tc.expectedConfig)
require.NoError(t, err)
require.Equal(t, expected.String(), config.String())
require.Equal(t, expected.String(), v1.String())
})
}
}
func TestUpdateV1ConfigWithRuncPresent(t *testing.T) {
logger, _ := testlog.NewNullLogger()
const runtimeDir = "/test/runtime/dir"
testCases := []struct {
runtimeClass string
runtimeName string
expectedConfig map[string]interface{}
}{
{
runtimeClass: "nvidia",
runtimeName: "nvidia",
expectedConfig: map[string]interface{}{
"version": int64(1),
"plugins": map[string]interface{}{
@@ -383,18 +291,6 @@ func TestUpdateV1ConfigWithRuncPresent(t *testing.T) {
"Runtime": "/test/runtime/dir/nvidia-container-runtime",
},
},
"nvidia-experimental": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.experimental",
"Runtime": "/test/runtime/dir/nvidia-container-runtime.experimental",
},
},
"nvidia-cdi": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
@@ -426,7 +322,7 @@ func TestUpdateV1ConfigWithRuncPresent(t *testing.T) {
},
},
{
runtimeClass: "NAME",
runtimeName: "NAME",
expectedConfig: map[string]interface{}{
"version": int64(1),
"plugins": map[string]interface{}{
@@ -455,90 +351,6 @@ func TestUpdateV1ConfigWithRuncPresent(t *testing.T) {
"Runtime": "/test/runtime/dir/nvidia-container-runtime",
},
},
"nvidia-experimental": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.experimental",
"Runtime": "/test/runtime/dir/nvidia-container-runtime.experimental",
},
},
"nvidia-cdi": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.cdi",
"Runtime": "/test/runtime/dir/nvidia-container-runtime.cdi",
},
},
"nvidia-legacy": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.legacy",
"Runtime": "/test/runtime/dir/nvidia-container-runtime.legacy",
},
},
},
},
},
},
},
},
{
runtimeClass: "nvidia-experimental",
expectedConfig: map[string]interface{}{
"version": int64(1),
"plugins": map[string]interface{}{
"cri": map[string]interface{}{
"containerd": map[string]interface{}{
"runtimes": map[string]interface{}{
"runc": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/runc-binary",
},
},
"nvidia": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/test/runtime/dir/nvidia-container-runtime",
"Runtime": "/test/runtime/dir/nvidia-container-runtime",
},
},
"nvidia-experimental": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.experimental",
"Runtime": "/test/runtime/dir/nvidia-container-runtime.experimental",
},
},
"nvidia-cdi": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
@@ -573,34 +385,78 @@ func TestUpdateV1ConfigWithRuncPresent(t *testing.T) {
for i, tc := range testCases {
t.Run(fmt.Sprintf("%d", i), func(t *testing.T) {
o := &options{
runtimeClass: tc.runtimeClass,
runtimeType: runtimeType,
runtimeDir: runtimeDir,
o := &container.Options{
RuntimeName: tc.runtimeName,
RuntimeDir: runtimeDir,
}
config, err := toml.TreeFromMap(runcConfigMapV1("/runc-binary"))
v1, err := containerd.New(
containerd.WithLogger(logger),
containerd.WithConfigSource(toml.FromMap(runcConfigMapV1("/runc-binary"))),
containerd.WithRuntimeType(runtimeType),
containerd.WithConfigVersion(1),
containerd.WithContainerAnnotations("cdi.k8s.io/*"),
)
require.NoError(t, err)
v1 := &containerd.ConfigV1{
Tree: config,
UseDefaultRuntimeName: true,
RuntimeType: runtimeType,
ContainerAnnotations: []string{"cdi.k8s.io/*"},
}
err = UpdateConfig(v1, o)
err = o.UpdateConfig(v1)
require.NoError(t, err)
expected, err := toml.TreeFromMap(tc.expectedConfig)
require.NoError(t, err)
require.Equal(t, expected.String(), config.String())
require.Equal(t, expected.String(), v1.String())
})
}
}
func TestUpdateV1EnableCDI(t *testing.T) {
logger, _ := testlog.NewNullLogger()
const runtimeDir = "/test/runtime/dir"
testCases := []struct {
enableCDI bool
expectedEnableCDIValue interface{}
}{
{},
{
enableCDI: false,
expectedEnableCDIValue: nil,
},
{
enableCDI: true,
expectedEnableCDIValue: true,
},
}
for _, tc := range testCases {
t.Run(fmt.Sprintf("%v", tc.enableCDI), func(t *testing.T) {
o := &container.Options{
EnableCDI: tc.enableCDI,
RuntimeName: "nvidia",
RuntimeDir: runtimeDir,
}
cfg, err := toml.Empty.Load()
require.NoError(t, err)
v1 := &containerd.ConfigV1{
Logger: logger,
Tree: cfg,
RuntimeType: runtimeType,
}
err = o.UpdateConfig(v1)
require.NoError(t, err)
enableCDIValue := v1.GetPath([]string{"plugins", "cri", "containerd", "enable_cdi"})
require.EqualValues(t, tc.expectedEnableCDIValue, enableCDIValue)
})
}
}
func TestRevertV1Config(t *testing.T) {
logger, _ := testlog.NewNullLogger()
testCases := []struct {
config map[string]interface {
}
@@ -619,10 +475,9 @@ func TestRevertV1Config(t *testing.T) {
"cri": map[string]interface{}{
"containerd": map[string]interface{}{
"runtimes": map[string]interface{}{
"nvidia": runtimeMapV1("/test/runtime/dir/nvidia-container-runtime"),
"nvidia-experimental": runtimeMapV1("/test/runtime/dir/nvidia-container-runtime.experimental"),
"nvidia-cdi": runtimeMapV1("/test/runtime/dir/nvidia-container-runtime.cdi"),
"nvidia-legacy": runtimeMapV1("/test/runtime/dir/nvidia-container-runtime.legacy"),
"nvidia": runtimeMapV1("/test/runtime/dir/nvidia-container-runtime"),
"nvidia-cdi": runtimeMapV1("/test/runtime/dir/nvidia-container-runtime.cdi"),
"nvidia-legacy": runtimeMapV1("/test/runtime/dir/nvidia-container-runtime.legacy"),
},
},
},
@@ -636,10 +491,9 @@ func TestRevertV1Config(t *testing.T) {
"cri": map[string]interface{}{
"containerd": map[string]interface{}{
"runtimes": map[string]interface{}{
"nvidia": runtimeMapV1("/test/runtime/dir/nvidia-container-runtime"),
"nvidia-experimental": runtimeMapV1("/test/runtime/dir/nvidia-container-runtime.experimental"),
"nvidia-cdi": runtimeMapV1("/test/runtime/dir/nvidia-container-runtime.cdi"),
"nvidia-legacy": runtimeMapV1("/test/runtime/dir/nvidia-container-runtime.legacy"),
"nvidia": runtimeMapV1("/test/runtime/dir/nvidia-container-runtime"),
"nvidia-cdi": runtimeMapV1("/test/runtime/dir/nvidia-container-runtime.cdi"),
"nvidia-legacy": runtimeMapV1("/test/runtime/dir/nvidia-container-runtime.legacy"),
},
"default_runtime": defaultRuntimeV1("/test/runtime/dir/nvidia-container-runtime"),
"default_runtime_name": "nvidia",
@@ -652,29 +506,26 @@ func TestRevertV1Config(t *testing.T) {
for i, tc := range testCases {
t.Run(fmt.Sprintf("%d", i), func(t *testing.T) {
o := &options{
runtimeClass: "nvidia",
o := &container.Options{
RuntimeName: "nvidia",
}
config, err := toml.TreeFromMap(tc.config)
require.NoError(t, err, "%d: %v", i, tc)
expected, err := toml.TreeFromMap(tc.expected)
require.NoError(t, err, "%d: %v", i, tc)
require.NoError(t, err)
v1 := &containerd.ConfigV1{
Tree: config,
UseDefaultRuntimeName: true,
RuntimeType: runtimeType,
}
v1, err := containerd.New(
containerd.WithLogger(logger),
containerd.WithConfigSource(toml.FromMap(tc.config)),
containerd.WithRuntimeType(runtimeType),
err = RevertConfig(v1, o)
require.NoError(t, err, "%d: %v", i, tc)
containerd.WithContainerAnnotations("cdi.k8s.io/*"),
)
require.NoError(t, err)
configContents, _ := toml.Marshal(config)
expectedContents, _ := toml.Marshal(expected)
err = o.RevertConfig(v1)
require.NoError(t, err)
require.Equal(t, string(expectedContents), string(configContents), "%d: %v", i, tc)
require.Equal(t, expected.String(), v1.String())
})
}
}

View File

@@ -14,15 +14,18 @@
# limitations under the License.
*/
package main
package containerd
import (
"fmt"
"testing"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config/engine/containerd"
"github.com/pelletier/go-toml"
testlog "github.com/sirupsen/logrus/hooks/test"
"github.com/stretchr/testify/require"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/container"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/engine/containerd"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/toml"
)
const (
@@ -30,81 +33,74 @@ const (
)
func TestUpdateV2ConfigDefaultRuntime(t *testing.T) {
logger, _ := testlog.NewNullLogger()
const runtimeDir = "/test/runtime/dir"
testCases := []struct {
setAsDefault bool
runtimeClass string
runtimeName string
expectedDefaultRuntimeName interface{}
}{
{},
{
setAsDefault: false,
runtimeClass: "nvidia",
runtimeName: "nvidia",
expectedDefaultRuntimeName: nil,
},
{
setAsDefault: false,
runtimeClass: "NAME",
expectedDefaultRuntimeName: nil,
},
{
setAsDefault: false,
runtimeClass: "nvidia-experimental",
runtimeName: "NAME",
expectedDefaultRuntimeName: nil,
},
{
setAsDefault: true,
runtimeClass: "nvidia",
runtimeName: "nvidia",
expectedDefaultRuntimeName: "nvidia",
},
{
setAsDefault: true,
runtimeClass: "NAME",
runtimeName: "NAME",
expectedDefaultRuntimeName: "NAME",
},
{
setAsDefault: true,
runtimeClass: "nvidia-experimental",
expectedDefaultRuntimeName: "nvidia-experimental",
},
}
for i, tc := range testCases {
t.Run(fmt.Sprintf("%d", i), func(t *testing.T) {
o := &options{
setAsDefault: tc.setAsDefault,
runtimeClass: tc.runtimeClass,
runtimeDir: runtimeDir,
o := &container.Options{
RuntimeName: tc.runtimeName,
RuntimeDir: runtimeDir,
SetAsDefault: tc.setAsDefault,
}
config, err := toml.TreeFromMap(map[string]interface{}{})
v2, err := containerd.New(
containerd.WithLogger(logger),
containerd.WithConfigSource(toml.Empty),
containerd.WithRuntimeType(runtimeType),
containerd.WithContainerAnnotations("cdi.k8s.io/*"),
)
require.NoError(t, err)
v2 := &containerd.Config{
Tree: config,
RuntimeType: runtimeType,
}
err = UpdateConfig(v2, o)
err = o.UpdateConfig(v2)
require.NoError(t, err)
defaultRuntimeName := config.GetPath([]string{"plugins", "io.containerd.grpc.v1.cri", "containerd", "default_runtime_name"})
cfg := v2.(*containerd.Config)
defaultRuntimeName := cfg.GetPath([]string{"plugins", "io.containerd.grpc.v1.cri", "containerd", "default_runtime_name"})
require.EqualValues(t, tc.expectedDefaultRuntimeName, defaultRuntimeName)
})
}
}
func TestUpdateV2Config(t *testing.T) {
logger, _ := testlog.NewNullLogger()
const runtimeDir = "/test/runtime/dir"
const expectedVersion = int64(2)
testCases := []struct {
runtimeClass string
runtimeName string
expectedConfig map[string]interface{}
}{
{
runtimeClass: "nvidia",
runtimeName: "nvidia",
expectedConfig: map[string]interface{}{
"version": int64(2),
"plugins": map[string]interface{}{
@@ -121,16 +117,6 @@ func TestUpdateV2Config(t *testing.T) {
"BinaryName": "/test/runtime/dir/nvidia-container-runtime",
},
},
"nvidia-experimental": map[string]interface{}{
"runtime_type": "runtime_type",
"runtime_root": "",
"runtime_engine": "",
"privileged_without_host_devices": false,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.experimental",
},
},
"nvidia-cdi": map[string]interface{}{
"runtime_type": "runtime_type",
"runtime_root": "",
@@ -158,7 +144,7 @@ func TestUpdateV2Config(t *testing.T) {
},
},
{
runtimeClass: "NAME",
runtimeName: "NAME",
expectedConfig: map[string]interface{}{
"version": int64(2),
"plugins": map[string]interface{}{
@@ -175,70 +161,6 @@ func TestUpdateV2Config(t *testing.T) {
"BinaryName": "/test/runtime/dir/nvidia-container-runtime",
},
},
"nvidia-experimental": map[string]interface{}{
"runtime_type": "runtime_type",
"runtime_root": "",
"runtime_engine": "",
"privileged_without_host_devices": false,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.experimental",
},
},
"nvidia-cdi": map[string]interface{}{
"runtime_type": "runtime_type",
"runtime_root": "",
"runtime_engine": "",
"privileged_without_host_devices": false,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.cdi",
},
},
"nvidia-legacy": map[string]interface{}{
"runtime_type": "runtime_type",
"runtime_root": "",
"runtime_engine": "",
"privileged_without_host_devices": false,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.legacy",
},
},
},
},
},
},
},
},
{
runtimeClass: "nvidia-experimental",
expectedConfig: map[string]interface{}{
"version": int64(2),
"plugins": map[string]interface{}{
"io.containerd.grpc.v1.cri": map[string]interface{}{
"containerd": map[string]interface{}{
"runtimes": map[string]interface{}{
"nvidia": map[string]interface{}{
"runtime_type": "runtime_type",
"runtime_root": "",
"runtime_engine": "",
"privileged_without_host_devices": false,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"BinaryName": "/test/runtime/dir/nvidia-container-runtime",
},
},
"nvidia-experimental": map[string]interface{}{
"runtime_type": "runtime_type",
"runtime_root": "",
"runtime_engine": "",
"privileged_without_host_devices": false,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.experimental",
},
},
"nvidia-cdi": map[string]interface{}{
"runtime_type": "runtime_type",
"runtime_root": "",
@@ -269,42 +191,41 @@ func TestUpdateV2Config(t *testing.T) {
for i, tc := range testCases {
t.Run(fmt.Sprintf("%d", i), func(t *testing.T) {
o := &options{
runtimeClass: tc.runtimeClass,
runtimeType: runtimeType,
runtimeDir: runtimeDir,
o := &container.Options{
RuntimeName: tc.runtimeName,
RuntimeDir: runtimeDir,
}
config, err := toml.TreeFromMap(map[string]interface{}{})
v2, err := containerd.New(
containerd.WithLogger(logger),
containerd.WithConfigSource(toml.Empty),
containerd.WithRuntimeType(runtimeType),
containerd.WithContainerAnnotations("cdi.k8s.io/*"),
)
require.NoError(t, err)
v2 := &containerd.Config{
Tree: config,
RuntimeType: runtimeType,
ContainerAnnotations: []string{"cdi.k8s.io/*"},
}
err = UpdateConfig(v2, o)
err = o.UpdateConfig(v2)
require.NoError(t, err)
expected, err := toml.TreeFromMap(tc.expectedConfig)
require.NoError(t, err)
require.Equal(t, expected.String(), config.String())
require.Equal(t, expected.String(), v2.String())
})
}
}
func TestUpdateV2ConfigWithRuncPresent(t *testing.T) {
logger, _ := testlog.NewNullLogger()
const runtimeDir = "/test/runtime/dir"
testCases := []struct {
runtimeClass string
runtimeName string
expectedConfig map[string]interface{}
}{
{
runtimeClass: "nvidia",
runtimeName: "nvidia",
expectedConfig: map[string]interface{}{
"version": int64(2),
"plugins": map[string]interface{}{
@@ -332,17 +253,6 @@ func TestUpdateV2ConfigWithRuncPresent(t *testing.T) {
"BinaryName": "/test/runtime/dir/nvidia-container-runtime",
},
},
"nvidia-experimental": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.experimental",
},
},
"nvidia-cdi": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
@@ -372,7 +282,7 @@ func TestUpdateV2ConfigWithRuncPresent(t *testing.T) {
},
},
{
runtimeClass: "NAME",
runtimeName: "NAME",
expectedConfig: map[string]interface{}{
"version": int64(2),
"plugins": map[string]interface{}{
@@ -400,85 +310,6 @@ func TestUpdateV2ConfigWithRuncPresent(t *testing.T) {
"BinaryName": "/test/runtime/dir/nvidia-container-runtime",
},
},
"nvidia-experimental": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.experimental",
},
},
"nvidia-cdi": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.cdi",
},
},
"nvidia-legacy": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.legacy",
},
},
},
},
},
},
},
},
{
runtimeClass: "nvidia-experimental",
expectedConfig: map[string]interface{}{
"version": int64(2),
"plugins": map[string]interface{}{
"io.containerd.grpc.v1.cri": map[string]interface{}{
"containerd": map[string]interface{}{
"runtimes": map[string]interface{}{
"runc": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/runc-binary",
},
},
"nvidia": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/test/runtime/dir/nvidia-container-runtime",
},
},
"nvidia-experimental": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.experimental",
},
},
"nvidia-cdi": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
@@ -511,33 +342,80 @@ func TestUpdateV2ConfigWithRuncPresent(t *testing.T) {
for i, tc := range testCases {
t.Run(fmt.Sprintf("%d", i), func(t *testing.T) {
o := &options{
runtimeClass: tc.runtimeClass,
runtimeType: runtimeType,
runtimeDir: runtimeDir,
o := &container.Options{
RuntimeName: tc.runtimeName,
RuntimeDir: runtimeDir,
}
config, err := toml.TreeFromMap(runcConfigMapV2("/runc-binary"))
v2, err := containerd.New(
containerd.WithLogger(logger),
containerd.WithConfigSource(toml.FromMap(runcConfigMapV2("/runc-binary"))),
containerd.WithRuntimeType(runtimeType),
containerd.WithContainerAnnotations("cdi.k8s.io/*"),
)
require.NoError(t, err)
v2 := &containerd.Config{
Tree: config,
RuntimeType: runtimeType,
ContainerAnnotations: []string{"cdi.k8s.io/*"},
}
err = UpdateConfig(v2, o)
err = o.UpdateConfig(v2)
require.NoError(t, err)
expected, err := toml.TreeFromMap(tc.expectedConfig)
require.NoError(t, err)
require.Equal(t, expected.String(), config.String())
require.Equal(t, expected.String(), v2.String())
})
}
}
func TestUpdateV2ConfigEnableCDI(t *testing.T) {
logger, _ := testlog.NewNullLogger()
const runtimeDir = "/test/runtime/dir"
testCases := []struct {
enableCDI bool
expectedEnableCDIValue interface{}
}{
{},
{
enableCDI: false,
expectedEnableCDIValue: nil,
},
{
enableCDI: true,
expectedEnableCDIValue: true,
},
}
for _, tc := range testCases {
t.Run(fmt.Sprintf("%v", tc.enableCDI), func(t *testing.T) {
o := &container.Options{
EnableCDI: tc.enableCDI,
RuntimeName: "nvidia",
RuntimeDir: runtimeDir,
SetAsDefault: false,
}
cfg, err := toml.LoadMap(map[string]interface{}{})
require.NoError(t, err)
v2 := &containerd.Config{
Logger: logger,
Tree: cfg,
RuntimeType: runtimeType,
CRIRuntimePluginName: "io.containerd.grpc.v1.cri",
}
err = o.UpdateConfig(v2)
require.NoError(t, err)
enableCDIValue := cfg.GetPath([]string{"plugins", "io.containerd.grpc.v1.cri", "enable_cdi"})
require.EqualValues(t, tc.expectedEnableCDIValue, enableCDIValue)
})
}
}
func TestRevertV2Config(t *testing.T) {
logger, _ := testlog.NewNullLogger()
testCases := []struct {
config map[string]interface {
}
@@ -556,8 +434,7 @@ func TestRevertV2Config(t *testing.T) {
"io.containerd.grpc.v1.cri": map[string]interface{}{
"containerd": map[string]interface{}{
"runtimes": map[string]interface{}{
"nvidia": runtimeMapV2("/test/runtime/dir/nvidia-container-runtime"),
"nvidia-experimental": runtimeMapV2("/test/runtime/dir/nvidia-container-runtime.experimental"),
"nvidia": runtimeMapV2("/test/runtime/dir/nvidia-container-runtime"),
},
},
},
@@ -571,8 +448,7 @@ func TestRevertV2Config(t *testing.T) {
"io.containerd.grpc.v1.cri": map[string]interface{}{
"containerd": map[string]interface{}{
"runtimes": map[string]interface{}{
"nvidia": runtimeMapV2("/test/runtime/dir/nvidia-container-runtime"),
"nvidia-experimental": runtimeMapV2("/test/runtime/dir/nvidia-container-runtime.experimental"),
"nvidia": runtimeMapV2("/test/runtime/dir/nvidia-container-runtime"),
},
"default_runtime_name": "nvidia",
},
@@ -584,28 +460,25 @@ func TestRevertV2Config(t *testing.T) {
for i, tc := range testCases {
t.Run(fmt.Sprintf("%d", i), func(t *testing.T) {
o := &options{
runtimeClass: "nvidia",
o := &container.Options{
RuntimeName: "nvidia",
}
config, err := toml.TreeFromMap(tc.config)
require.NoError(t, err)
expected, err := toml.TreeFromMap(tc.expected)
require.NoError(t, err)
v2 := &containerd.Config{
Tree: config,
RuntimeType: runtimeType,
}
err = RevertConfig(v2, o)
v2, err := containerd.New(
containerd.WithLogger(logger),
containerd.WithConfigSource(toml.FromMap(tc.config)),
containerd.WithRuntimeType(runtimeType),
containerd.WithContainerAnnotations("cdi.k8s.io/*"),
)
require.NoError(t, err)
configContents, _ := toml.Marshal(config)
expectedContents, _ := toml.Marshal(expected)
err = o.RevertConfig(v2)
require.NoError(t, err)
require.Equal(t, string(expectedContents), string(configContents))
require.Equal(t, expected.String(), v2.String())
})
}
}
@@ -624,6 +497,7 @@ func runtimeMapV2(binary string) map[string]interface{} {
func runcConfigMapV2(binary string) map[string]interface{} {
return map[string]interface{}{
"version": 2,
"plugins": map[string]interface{}{
"io.containerd.grpc.v1.cri": map[string]interface{}{
"containerd": map[string]interface{}{

View File

@@ -0,0 +1,184 @@
/**
# Copyright (c) 2020-2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
*/
package containerd
import (
"encoding/json"
"fmt"
log "github.com/sirupsen/logrus"
cli "github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/container"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/engine"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/engine/containerd"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/toml"
)
const (
Name = "containerd"
DefaultConfig = "/etc/containerd/config.toml"
DefaultSocket = "/run/containerd/containerd.sock"
DefaultRestartMode = "signal"
defaultRuntmeType = "io.containerd.runc.v2"
)
// Options stores the containerd-specific options
type Options struct {
useLegacyConfig bool
runtimeType string
ContainerRuntimeModesCDIAnnotationPrefixes cli.StringSlice
runtimeConfigOverrideJSON string
}
func Flags(opts *Options) []cli.Flag {
flags := []cli.Flag{
&cli.BoolFlag{
Name: "use-legacy-config",
Usage: "Specify whether a legacy (pre v1.3) config should be used. " +
"This ensures that a version 1 container config is created by default and that the " +
"containerd.runtimes.default_runtime config section is used to define the default " +
"runtime instead of container.default_runtime_name.",
Destination: &opts.useLegacyConfig,
EnvVars: []string{"CONTAINERD_USE_LEGACY_CONFIG"},
},
&cli.StringFlag{
Name: "runtime-type",
Usage: "The runtime_type to use for the configured runtime classes",
Value: defaultRuntmeType,
Destination: &opts.runtimeType,
EnvVars: []string{"CONTAINERD_RUNTIME_TYPE"},
},
&cli.StringSliceFlag{
Name: "nvidia-container-runtime-modes.cdi.annotation-prefixes",
Destination: &opts.ContainerRuntimeModesCDIAnnotationPrefixes,
EnvVars: []string{"NVIDIA_CONTAINER_RUNTIME_MODES_CDI_ANNOTATION_PREFIXES"},
},
&cli.StringFlag{
Name: "runtime-config-override",
Destination: &opts.runtimeConfigOverrideJSON,
Usage: "specify additional runtime options as a JSON string. The paths are relative to the runtime config.",
Value: "{}",
EnvVars: []string{"RUNTIME_CONFIG_OVERRIDE", "CONTAINERD_RUNTIME_CONFIG_OVERRIDE"},
},
}
return flags
}
// Setup updates a containerd configuration to include the nvidia-containerd-runtime and reloads it
func Setup(c *cli.Context, o *container.Options, co *Options) error {
log.Infof("Starting 'setup' for %v", c.App.Name)
cfg, err := getRuntimeConfig(o, co)
if err != nil {
return fmt.Errorf("unable to load config: %v", err)
}
err = o.Configure(cfg)
if err != nil {
return fmt.Errorf("unable to configure containerd: %v", err)
}
err = RestartContainerd(o)
if err != nil {
return fmt.Errorf("unable to restart containerd: %v", err)
}
log.Infof("Completed 'setup' for %v", c.App.Name)
return nil
}
// Cleanup reverts a containerd configuration to remove the nvidia-containerd-runtime and reloads it
func Cleanup(c *cli.Context, o *container.Options, co *Options) error {
log.Infof("Starting 'cleanup' for %v", c.App.Name)
cfg, err := getRuntimeConfig(o, co)
if err != nil {
return fmt.Errorf("unable to load config: %v", err)
}
err = o.Unconfigure(cfg)
if err != nil {
return fmt.Errorf("unable to unconfigure containerd: %v", err)
}
err = RestartContainerd(o)
if err != nil {
return fmt.Errorf("unable to restart containerd: %v", err)
}
log.Infof("Completed 'cleanup' for %v", c.App.Name)
return nil
}
// RestartContainerd restarts containerd depending on the value of restartModeFlag
func RestartContainerd(o *container.Options) error {
return o.Restart("containerd", SignalContainerd)
}
// containerAnnotationsFromCDIPrefixes returns the container annotations to set for the given CDI prefixes.
func (o *Options) containerAnnotationsFromCDIPrefixes() []string {
var annotations []string
for _, prefix := range o.ContainerRuntimeModesCDIAnnotationPrefixes.Value() {
annotations = append(annotations, prefix+"*")
}
return annotations
}
func (o *Options) runtimeConfigOverride() (map[string]interface{}, error) {
if o.runtimeConfigOverrideJSON == "" {
return nil, nil
}
runtimeOptions := make(map[string]interface{})
if err := json.Unmarshal([]byte(o.runtimeConfigOverrideJSON), &runtimeOptions); err != nil {
return nil, fmt.Errorf("failed to read %v as JSON: %w", o.runtimeConfigOverrideJSON, err)
}
return runtimeOptions, nil
}
func GetLowlevelRuntimePaths(o *container.Options, co *Options) ([]string, error) {
cfg, err := getRuntimeConfig(o, co)
if err != nil {
return nil, fmt.Errorf("unable to load containerd config: %w", err)
}
return engine.GetBinaryPathsForRuntimes(cfg), nil
}
func getRuntimeConfig(o *container.Options, co *Options) (engine.Interface, error) {
return containerd.New(
containerd.WithPath(o.Config),
containerd.WithConfigSource(
toml.LoadFirst(
containerd.CommandLineSource(o.HostRootMount, o.ExecutablePath),
toml.FromFile(o.Config),
),
),
containerd.WithRuntimeType(co.runtimeType),
containerd.WithUseLegacyConfig(co.useLegacyConfig),
containerd.WithContainerAnnotations(co.containerAnnotationsFromCDIPrefixes()...),
)
}

View File

@@ -0,0 +1,113 @@
/**
# Copyright 2020-2023 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package containerd
import (
"fmt"
"net"
"syscall"
"time"
log "github.com/sirupsen/logrus"
)
const (
reloadBackoff = 5 * time.Second
maxReloadAttempts = 6
socketMessageToGetPID = ""
)
// SignalContainerd sends a SIGHUP signal to the containerd daemon
func SignalContainerd(socket string) error {
log.Infof("Sending SIGHUP signal to containerd")
// Wrap the logic to perform the SIGHUP in a function so we can retry it on failure
retriable := func() error {
conn, err := net.Dial("unix", socket)
if err != nil {
return fmt.Errorf("unable to dial: %v", err)
}
defer conn.Close()
sconn, err := conn.(*net.UnixConn).SyscallConn()
if err != nil {
return fmt.Errorf("unable to get syscall connection: %v", err)
}
err1 := sconn.Control(func(fd uintptr) {
err = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_PASSCRED, 1)
})
if err1 != nil {
return fmt.Errorf("unable to issue call on socket fd: %v", err1)
}
if err != nil {
return fmt.Errorf("unable to SetsockoptInt on socket fd: %v", err)
}
_, _, err = conn.(*net.UnixConn).WriteMsgUnix([]byte(socketMessageToGetPID), nil, nil)
if err != nil {
return fmt.Errorf("unable to WriteMsgUnix on socket fd: %v", err)
}
oob := make([]byte, 1024)
_, oobn, _, _, err := conn.(*net.UnixConn).ReadMsgUnix(nil, oob)
if err != nil {
return fmt.Errorf("unable to ReadMsgUnix on socket fd: %v", err)
}
oob = oob[:oobn]
scm, err := syscall.ParseSocketControlMessage(oob)
if err != nil {
return fmt.Errorf("unable to ParseSocketControlMessage from message received on socket fd: %v", err)
}
ucred, err := syscall.ParseUnixCredentials(&scm[0])
if err != nil {
return fmt.Errorf("unable to ParseUnixCredentials from message received on socket fd: %v", err)
}
err = syscall.Kill(int(ucred.Pid), syscall.SIGHUP)
if err != nil {
return fmt.Errorf("unable to send SIGHUP to 'containerd' process: %v", err)
}
return nil
}
// Try to send a SIGHUP up to maxReloadAttempts times
var err error
for i := 0; i < maxReloadAttempts; i++ {
err = retriable()
if err == nil {
break
}
if i == maxReloadAttempts-1 {
break
}
log.Warningf("Error signaling containerd, attempt %v/%v: %v", i+1, maxReloadAttempts, err)
time.Sleep(reloadBackoff)
}
if err != nil {
log.Warningf("Max retries reached %v/%v, aborting", maxReloadAttempts, maxReloadAttempts)
return err
}
log.Infof("Successfully signaled containerd")
return nil
}

View File

@@ -0,0 +1,29 @@
//go:build !linux
// +build !linux
/**
# Copyright 2023 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package containerd
import (
"errors"
)
// SignalContainerd is unsupported on non-linux platforms.
func SignalContainerd(socket string) error {
return errors.New("SignalContainerd is unsupported on non-linux platforms")
}

View File

@@ -0,0 +1,72 @@
/**
# Copyright 2024 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package containerd
import (
"testing"
"github.com/stretchr/testify/require"
)
func TestRuntimeOptions(t *testing.T) {
testCases := []struct {
description string
options Options
expected map[string]interface{}
expectedError error
}{
{
description: "empty is nil",
},
{
description: "empty json",
options: Options{
runtimeConfigOverrideJSON: "{}",
},
expected: map[string]interface{}{},
expectedError: nil,
},
{
description: "SystemdCgroup is true",
options: Options{
runtimeConfigOverrideJSON: "{\"SystemdCgroup\": true}",
},
expected: map[string]interface{}{
"SystemdCgroup": true,
},
expectedError: nil,
},
{
description: "SystemdCgroup is false",
options: Options{
runtimeConfigOverrideJSON: "{\"SystemdCgroup\": false}",
},
expected: map[string]interface{}{
"SystemdCgroup": false,
},
expectedError: nil,
},
}
for _, tc := range testCases {
t.Run(tc.description, func(t *testing.T) {
runtimeOptions, err := tc.options.runtimeConfigOverride()
require.ErrorIs(t, tc.expectedError, err)
require.EqualValues(t, tc.expected, runtimeOptions)
})
}
}

View File

@@ -0,0 +1,210 @@
/**
# Copyright (c) 2021-2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package crio
import (
"fmt"
"os"
"path/filepath"
log "github.com/sirupsen/logrus"
cli "github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/container"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/engine"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/engine/crio"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/ocihook"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/toml"
)
const (
Name = "crio"
defaultConfigMode = "hook"
// Hook-based settings
defaultHooksDir = "/usr/share/containers/oci/hooks.d"
defaultHookFilename = "oci-nvidia-hook.json"
// Config-based settings
DefaultConfig = "/etc/crio/crio.conf"
DefaultSocket = "/var/run/crio/crio.sock"
DefaultRestartMode = "systemd"
)
// Options defines the cri-o specific options.
type Options struct {
configMode string
// hook-specific options
hooksDir string
hookFilename string
}
func Flags(opts *Options) []cli.Flag {
flags := []cli.Flag{
&cli.StringFlag{
Name: "hooks-dir",
Usage: "path to the cri-o hooks directory",
Value: defaultHooksDir,
Destination: &opts.hooksDir,
EnvVars: []string{"CRIO_HOOKS_DIR"},
DefaultText: defaultHooksDir,
},
&cli.StringFlag{
Name: "hook-filename",
Usage: "filename of the cri-o hook that will be created / removed in the hooks directory",
Value: defaultHookFilename,
Destination: &opts.hookFilename,
EnvVars: []string{"CRIO_HOOK_FILENAME"},
DefaultText: defaultHookFilename,
},
&cli.StringFlag{
Name: "config-mode",
Usage: "the configuration mode to use. One of [hook | config]",
Value: defaultConfigMode,
Destination: &opts.configMode,
EnvVars: []string{"CRIO_CONFIG_MODE"},
},
}
return flags
}
// Setup installs the prestart hook required to launch GPU-enabled containers
func Setup(c *cli.Context, o *container.Options, co *Options) error {
log.Infof("Starting 'setup' for %v", c.App.Name)
switch co.configMode {
case "hook":
return setupHook(o, co)
case "config":
return setupConfig(o)
default:
return fmt.Errorf("invalid config-mode '%v'", co.configMode)
}
}
// setupHook installs the prestart hook required to launch GPU-enabled containers
func setupHook(o *container.Options, co *Options) error {
log.Infof("Installing prestart hook")
hookPath := filepath.Join(co.hooksDir, co.hookFilename)
err := ocihook.CreateHook(hookPath, filepath.Join(o.RuntimeDir, config.NVIDIAContainerRuntimeHookExecutable))
if err != nil {
return fmt.Errorf("error creating hook: %v", err)
}
return nil
}
// setupConfig updates the cri-o config for the NVIDIA container runtime
func setupConfig(o *container.Options) error {
log.Infof("Updating config file")
cfg, err := getRuntimeConfig(o)
if err != nil {
return fmt.Errorf("unable to load config: %v", err)
}
err = o.Configure(cfg)
if err != nil {
return fmt.Errorf("unable to configure cri-o: %v", err)
}
err = RestartCrio(o)
if err != nil {
return fmt.Errorf("unable to restart crio: %v", err)
}
return nil
}
// Cleanup removes the specified prestart hook
func Cleanup(c *cli.Context, o *container.Options, co *Options) error {
log.Infof("Starting 'cleanup' for %v", c.App.Name)
switch co.configMode {
case "hook":
return cleanupHook(co)
case "config":
return cleanupConfig(o)
default:
return fmt.Errorf("invalid config-mode '%v'", co.configMode)
}
}
// cleanupHook removes the prestart hook
func cleanupHook(co *Options) error {
log.Infof("Removing prestart hook")
hookPath := filepath.Join(co.hooksDir, co.hookFilename)
err := os.Remove(hookPath)
if err != nil {
return fmt.Errorf("error removing hook '%v': %v", hookPath, err)
}
return nil
}
// cleanupConfig removes the NVIDIA container runtime from the cri-o config
func cleanupConfig(o *container.Options) error {
log.Infof("Reverting config file modifications")
cfg, err := getRuntimeConfig(o)
if err != nil {
return fmt.Errorf("unable to load config: %v", err)
}
err = o.Unconfigure(cfg)
if err != nil {
return fmt.Errorf("unable to unconfigure cri-o: %v", err)
}
err = RestartCrio(o)
if err != nil {
return fmt.Errorf("unable to restart crio: %v", err)
}
return nil
}
// RestartCrio restarts crio depending on the value of restartModeFlag
func RestartCrio(o *container.Options) error {
return o.Restart("crio", func(string) error { return fmt.Errorf("supporting crio via signal is unsupported") })
}
func GetLowlevelRuntimePaths(o *container.Options) ([]string, error) {
cfg, err := getRuntimeConfig(o)
if err != nil {
return nil, fmt.Errorf("unable to load crio config: %w", err)
}
return engine.GetBinaryPathsForRuntimes(cfg), nil
}
func getRuntimeConfig(o *container.Options) (engine.Interface, error) {
return crio.New(
crio.WithPath(o.Config),
crio.WithConfigSource(
toml.LoadFirst(
crio.CommandLineSource(o.HostRootMount, o.ExecutablePath),
toml.FromFile(o.Config),
),
),
)
}

View File

@@ -0,0 +1,111 @@
/**
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
*/
package docker
import (
"fmt"
log "github.com/sirupsen/logrus"
cli "github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/container"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/engine"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/engine/docker"
)
const (
Name = "docker"
DefaultConfig = "/etc/docker/daemon.json"
DefaultSocket = "/var/run/docker.sock"
DefaultRestartMode = "signal"
)
type Options struct{}
func Flags(opts *Options) []cli.Flag {
return nil
}
// Setup updates docker configuration to include the nvidia runtime and reloads it
func Setup(c *cli.Context, o *container.Options) error {
log.Infof("Starting 'setup' for %v", c.App.Name)
cfg, err := getRuntimeConfig(o)
if err != nil {
return fmt.Errorf("unable to load config: %v", err)
}
err = o.Configure(cfg)
if err != nil {
return fmt.Errorf("unable to configure docker: %v", err)
}
err = RestartDocker(o)
if err != nil {
return fmt.Errorf("unable to restart docker: %v", err)
}
log.Infof("Completed 'setup' for %v", c.App.Name)
return nil
}
// Cleanup reverts docker configuration to remove the nvidia runtime and reloads it
func Cleanup(c *cli.Context, o *container.Options) error {
log.Infof("Starting 'cleanup' for %v", c.App.Name)
cfg, err := getRuntimeConfig(o)
if err != nil {
return fmt.Errorf("unable to load config: %v", err)
}
err = o.Unconfigure(cfg)
if err != nil {
return fmt.Errorf("unable to unconfigure docker: %v", err)
}
err = RestartDocker(o)
if err != nil {
return fmt.Errorf("unable to signal docker: %v", err)
}
log.Infof("Completed 'cleanup' for %v", c.App.Name)
return nil
}
// RestartDocker restarts docker depending on the value of restartModeFlag
func RestartDocker(o *container.Options) error {
return o.Restart("docker", SignalDocker)
}
func GetLowlevelRuntimePaths(o *container.Options) ([]string, error) {
cfg, err := docker.New(
docker.WithPath(o.Config),
)
if err != nil {
return nil, fmt.Errorf("unable to load docker config: %w", err)
}
return engine.GetBinaryPathsForRuntimes(cfg), nil
}
func getRuntimeConfig(o *container.Options) (engine.Interface, error) {
return docker.New(
docker.WithPath(o.Config),
)
}

View File

@@ -0,0 +1,113 @@
/**
# Copyright 2021-2023 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package docker
import (
"fmt"
"net"
"syscall"
"time"
log "github.com/sirupsen/logrus"
)
const (
reloadBackoff = 5 * time.Second
maxReloadAttempts = 6
socketMessageToGetPID = "GET /info HTTP/1.0\r\n\r\n"
)
// SignalDocker sends a SIGHUP signal to docker daemon
func SignalDocker(socket string) error {
log.Infof("Sending SIGHUP signal to docker")
// Wrap the logic to perform the SIGHUP in a function so we can retry it on failure
retriable := func() error {
conn, err := net.Dial("unix", socket)
if err != nil {
return fmt.Errorf("unable to dial: %v", err)
}
defer conn.Close()
sconn, err := conn.(*net.UnixConn).SyscallConn()
if err != nil {
return fmt.Errorf("unable to get syscall connection: %v", err)
}
err1 := sconn.Control(func(fd uintptr) {
err = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_PASSCRED, 1)
})
if err1 != nil {
return fmt.Errorf("unable to issue call on socket fd: %v", err1)
}
if err != nil {
return fmt.Errorf("unable to SetsockoptInt on socket fd: %v", err)
}
_, _, err = conn.(*net.UnixConn).WriteMsgUnix([]byte(socketMessageToGetPID), nil, nil)
if err != nil {
return fmt.Errorf("unable to WriteMsgUnix on socket fd: %v", err)
}
oob := make([]byte, 1024)
_, oobn, _, _, err := conn.(*net.UnixConn).ReadMsgUnix(nil, oob)
if err != nil {
return fmt.Errorf("unable to ReadMsgUnix on socket fd: %v", err)
}
oob = oob[:oobn]
scm, err := syscall.ParseSocketControlMessage(oob)
if err != nil {
return fmt.Errorf("unable to ParseSocketControlMessage from message received on socket fd: %v", err)
}
ucred, err := syscall.ParseUnixCredentials(&scm[0])
if err != nil {
return fmt.Errorf("unable to ParseUnixCredentials from message received on socket fd: %v", err)
}
err = syscall.Kill(int(ucred.Pid), syscall.SIGHUP)
if err != nil {
return fmt.Errorf("unable to send SIGHUP to 'docker' process: %v", err)
}
return nil
}
// Try to send a SIGHUP up to maxReloadAttempts times
var err error
for i := 0; i < maxReloadAttempts; i++ {
err = retriable()
if err == nil {
break
}
if i == maxReloadAttempts-1 {
break
}
log.Warningf("Error signaling docker, attempt %v/%v: %v", i+1, maxReloadAttempts, err)
time.Sleep(reloadBackoff)
}
if err != nil {
log.Warningf("Max retries reached %v/%v, aborting", maxReloadAttempts, maxReloadAttempts)
return err
}
log.Infof("Successfully signaled docker")
return nil
}

View File

@@ -0,0 +1,29 @@
//go:build !linux
// +build !linux
/**
# Copyright 2023 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package docker
import (
"errors"
)
// SignalDocker is unsupported on non-linux platforms.
func SignalDocker(socket string) error {
return errors.New("SignalDocker is unsupported on non-linux platforms")
}

View File

@@ -14,14 +14,16 @@
# limitations under the License.
*/
package main
package docker
import (
"encoding/json"
"testing"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config/engine/docker"
"github.com/stretchr/testify/require"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/container"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/engine/docker"
)
func TestUpdateConfigDefaultRuntime(t *testing.T) {
@@ -42,11 +44,6 @@ func TestUpdateConfigDefaultRuntime(t *testing.T) {
runtimeName: "NAME",
expectedDefaultRuntimeName: "NAME",
},
{
setAsDefault: true,
runtimeName: "nvidia-experimental",
expectedDefaultRuntimeName: "nvidia-experimental",
},
{
setAsDefault: true,
runtimeName: "nvidia",
@@ -55,15 +52,15 @@ func TestUpdateConfigDefaultRuntime(t *testing.T) {
}
for i, tc := range testCases {
o := &options{
setAsDefault: tc.setAsDefault,
runtimeName: tc.runtimeName,
runtimeDir: runtimeDir,
o := &container.Options{
RuntimeName: tc.runtimeName,
RuntimeDir: runtimeDir,
SetAsDefault: tc.setAsDefault,
}
config := docker.Config(map[string]interface{}{})
err := UpdateConfig(&config, o)
err := o.UpdateConfig(&config)
require.NoError(t, err, "%d: %v", i, tc)
defaultRuntimeName := config["default-runtime"]
@@ -89,10 +86,6 @@ func TestUpdateConfig(t *testing.T) {
"path": "/test/runtime/dir/nvidia-container-runtime",
"args": []string{},
},
"nvidia-experimental": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.experimental",
"args": []string{},
},
"nvidia-cdi": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.cdi",
"args": []string{},
@@ -114,35 +107,6 @@ func TestUpdateConfig(t *testing.T) {
"path": "/test/runtime/dir/nvidia-container-runtime",
"args": []string{},
},
"nvidia-experimental": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.experimental",
"args": []string{},
},
"nvidia-cdi": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.cdi",
"args": []string{},
},
"nvidia-legacy": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.legacy",
"args": []string{},
},
},
},
},
{
config: map[string]interface{}{},
setAsDefault: false,
runtimeName: "nvidia-experimental",
expectedConfig: map[string]interface{}{
"runtimes": map[string]interface{}{
"nvidia": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime",
"args": []string{},
},
"nvidia-experimental": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.experimental",
"args": []string{},
},
"nvidia-cdi": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.cdi",
"args": []string{},
@@ -170,10 +134,6 @@ func TestUpdateConfig(t *testing.T) {
"path": "/test/runtime/dir/nvidia-container-runtime",
"args": []string{},
},
"nvidia-experimental": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.experimental",
"args": []string{},
},
"nvidia-cdi": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.cdi",
"args": []string{},
@@ -204,10 +164,6 @@ func TestUpdateConfig(t *testing.T) {
"path": "/test/runtime/dir/nvidia-container-runtime",
"args": []string{},
},
"nvidia-experimental": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.experimental",
"args": []string{},
},
"nvidia-cdi": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.cdi",
"args": []string{},
@@ -232,38 +188,6 @@ func TestUpdateConfig(t *testing.T) {
"path": "/test/runtime/dir/nvidia-container-runtime",
"args": []string{},
},
"nvidia-experimental": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.experimental",
"args": []string{},
},
"nvidia-cdi": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.cdi",
"args": []string{},
},
"nvidia-legacy": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.legacy",
"args": []string{},
},
},
},
},
{
config: map[string]interface{}{
"default-runtime": "runc",
},
setAsDefault: true,
runtimeName: "nvidia-experimental",
expectedConfig: map[string]interface{}{
"default-runtime": "nvidia-experimental",
"runtimes": map[string]interface{}{
"nvidia": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime",
"args": []string{},
},
"nvidia-experimental": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.experimental",
"args": []string{},
},
"nvidia-cdi": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.cdi",
"args": []string{},
@@ -296,10 +220,6 @@ func TestUpdateConfig(t *testing.T) {
"path": "/test/runtime/dir/nvidia-container-runtime",
"args": []string{},
},
"nvidia-experimental": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.experimental",
"args": []string{},
},
"nvidia-cdi": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.cdi",
"args": []string{},
@@ -314,13 +234,15 @@ func TestUpdateConfig(t *testing.T) {
}
for i, tc := range testCases {
options := &options{
setAsDefault: tc.setAsDefault,
runtimeName: tc.runtimeName,
runtimeDir: runtimeDir,
tc := tc
o := &container.Options{
RuntimeName: tc.runtimeName,
RuntimeDir: runtimeDir,
SetAsDefault: tc.setAsDefault,
}
err := UpdateConfig(&tc.config, options)
err := o.UpdateConfig(&tc.config)
require.NoError(t, err, "%d: %v", i, tc)
configContent, err := json.MarshalIndent(tc.config, "", " ")
@@ -356,7 +278,7 @@ func TestRevertConfig(t *testing.T) {
{
config: map[string]interface{}{
"runtimes": map[string]interface{}{
"nvidia-experimental": map[string]interface{}{
"nvidia": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime",
"args": []string{},
},
@@ -371,25 +293,6 @@ func TestRevertConfig(t *testing.T) {
"path": "/test/runtime/dir/nvidia-container-runtime",
"args": []string{},
},
"nvidia-experimental": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.experimental",
"args": []string{},
},
},
},
expectedConfig: map[string]interface{}{},
},
{
config: map[string]interface{}{
"runtimes": map[string]interface{}{
"nvidia": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime",
"args": []string{},
},
"nvidia-experimental": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.experimental",
"args": []string{},
},
"nvidia-cdi": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.cdi",
"args": []string{},
@@ -457,7 +360,9 @@ func TestRevertConfig(t *testing.T) {
}
for i, tc := range testCases {
err := RevertConfig(&tc.config, &options{})
tc := tc
o := &container.Options{}
err := o.RevertConfig(&tc.config)
require.NoError(t, err, "%d: %v", i, tc)

View File

@@ -0,0 +1,204 @@
/**
# Copyright 2024 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package runtime
import (
"fmt"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/container"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/container/runtime/containerd"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/container/runtime/crio"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/container/runtime/docker"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/toolkit"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
)
const (
defaultSetAsDefault = true
// defaultRuntimeName specifies the NVIDIA runtime to be use as the default runtime if setting the default runtime is enabled
defaultRuntimeName = "nvidia"
defaultHostRootMount = "/host"
runtimeSpecificDefault = "RUNTIME_SPECIFIC_DEFAULT"
)
type Options struct {
container.Options
containerdOptions containerd.Options
crioOptions crio.Options
}
func Flags(opts *Options) []cli.Flag {
flags := []cli.Flag{
&cli.StringFlag{
Name: "config",
Usage: "Path to the runtime config file",
Value: runtimeSpecificDefault,
Destination: &opts.Config,
EnvVars: []string{"RUNTIME_CONFIG", "CONTAINERD_CONFIG", "DOCKER_CONFIG"},
},
&cli.StringFlag{
Name: "executable-path",
Usage: "The path to the runtime executable. This is used to extract the current config",
Destination: &opts.ExecutablePath,
EnvVars: []string{"RUNTIME_EXECUTABLE_PATH"},
},
&cli.StringFlag{
Name: "socket",
Usage: "Path to the runtime socket file",
Value: runtimeSpecificDefault,
Destination: &opts.Socket,
EnvVars: []string{"RUNTIME_SOCKET", "CONTAINERD_SOCKET", "DOCKER_SOCKET"},
},
&cli.StringFlag{
Name: "restart-mode",
Usage: "Specify how the runtime should be restarted; If 'none' is selected it will not be restarted [signal | systemd | none ]",
Value: runtimeSpecificDefault,
Destination: &opts.RestartMode,
EnvVars: []string{"RUNTIME_RESTART_MODE"},
},
&cli.BoolFlag{
Name: "enable-cdi-in-runtime",
Usage: "Enable CDI in the configured runt ime",
Destination: &opts.EnableCDI,
EnvVars: []string{"RUNTIME_ENABLE_CDI"},
},
&cli.StringFlag{
Name: "host-root",
Usage: "Specify the path to the host root to be used when restarting the runtime using systemd",
Value: defaultHostRootMount,
Destination: &opts.HostRootMount,
EnvVars: []string{"HOST_ROOT_MOUNT"},
},
&cli.StringFlag{
Name: "runtime-name",
Aliases: []string{"nvidia-runtime-name", "runtime-class"},
Usage: "Specify the name of the `nvidia` runtime. If set-as-default is selected, the runtime is used as the default runtime.",
Value: defaultRuntimeName,
Destination: &opts.RuntimeName,
EnvVars: []string{"NVIDIA_RUNTIME_NAME", "CONTAINERD_RUNTIME_CLASS", "DOCKER_RUNTIME_NAME"},
},
&cli.BoolFlag{
Name: "set-as-default",
Usage: "Set the `nvidia` runtime as the default runtime.",
Value: defaultSetAsDefault,
Destination: &opts.SetAsDefault,
EnvVars: []string{"NVIDIA_RUNTIME_SET_AS_DEFAULT", "CONTAINERD_SET_AS_DEFAULT", "DOCKER_SET_AS_DEFAULT"},
Hidden: true,
},
}
flags = append(flags, containerd.Flags(&opts.containerdOptions)...)
flags = append(flags, crio.Flags(&opts.crioOptions)...)
return flags
}
// Validate checks whether the specified options are valid
func (opts *Options) Validate(logger logger.Interface, c *cli.Context, runtime string, toolkitRoot string, to *toolkit.Options) error {
// We set this option here to ensure that it is available in future calls.
opts.RuntimeDir = toolkitRoot
if !c.IsSet("enable-cdi-in-runtime") {
opts.EnableCDI = to.CDI.Enabled
}
if opts.ExecutablePath != "" && opts.RuntimeName == docker.Name {
logger.Warningf("Ignoring executable-path=%q flag for %v", opts.ExecutablePath, opts.RuntimeName)
opts.ExecutablePath = ""
}
// Apply the runtime-specific config changes.
switch runtime {
case containerd.Name:
if opts.Config == runtimeSpecificDefault {
opts.Config = containerd.DefaultConfig
}
if opts.Socket == runtimeSpecificDefault {
opts.Socket = containerd.DefaultSocket
}
if opts.RestartMode == runtimeSpecificDefault {
opts.RestartMode = containerd.DefaultRestartMode
}
case crio.Name:
if opts.Config == runtimeSpecificDefault {
opts.Config = crio.DefaultConfig
}
if opts.Socket == runtimeSpecificDefault {
opts.Socket = crio.DefaultSocket
}
if opts.RestartMode == runtimeSpecificDefault {
opts.RestartMode = crio.DefaultRestartMode
}
case docker.Name:
if opts.Config == runtimeSpecificDefault {
opts.Config = docker.DefaultConfig
}
if opts.Socket == runtimeSpecificDefault {
opts.Socket = docker.DefaultSocket
}
if opts.RestartMode == runtimeSpecificDefault {
opts.RestartMode = docker.DefaultRestartMode
}
default:
return fmt.Errorf("undefined runtime %v", runtime)
}
return nil
}
func Setup(c *cli.Context, opts *Options, runtime string) error {
switch runtime {
case containerd.Name:
return containerd.Setup(c, &opts.Options, &opts.containerdOptions)
case crio.Name:
return crio.Setup(c, &opts.Options, &opts.crioOptions)
case docker.Name:
return docker.Setup(c, &opts.Options)
default:
return fmt.Errorf("undefined runtime %v", runtime)
}
}
func Cleanup(c *cli.Context, opts *Options, runtime string) error {
switch runtime {
case containerd.Name:
return containerd.Cleanup(c, &opts.Options, &opts.containerdOptions)
case crio.Name:
return crio.Cleanup(c, &opts.Options, &opts.crioOptions)
case docker.Name:
return docker.Cleanup(c, &opts.Options)
default:
return fmt.Errorf("undefined runtime %v", runtime)
}
}
func GetLowlevelRuntimePaths(opts *Options, runtime string) ([]string, error) {
switch runtime {
case containerd.Name:
return containerd.GetLowlevelRuntimePaths(&opts.Options, &opts.containerdOptions)
case crio.Name:
return crio.GetLowlevelRuntimePaths(&opts.Options)
case docker.Name:
return docker.GetLowlevelRuntimePaths(&opts.Options)
default:
return nil, fmt.Errorf("undefined runtime %v", runtime)
}
}

View File

@@ -0,0 +1,328 @@
package main
import (
"fmt"
"os"
"os/signal"
"path/filepath"
"syscall"
"github.com/urfave/cli/v2"
"golang.org/x/sys/unix"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/container/runtime"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/toolkit"
"github.com/NVIDIA/nvidia-container-toolkit/internal/info"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup"
)
const (
toolkitPidFilename = "toolkit.pid"
defaultPidFile = "/run/nvidia/toolkit/" + toolkitPidFilename
defaultToolkitInstallDir = "/usr/local/nvidia"
toolkitSubDir = "toolkit"
defaultRuntime = "docker"
)
var availableRuntimes = map[string]struct{}{"docker": {}, "crio": {}, "containerd": {}}
var defaultLowLevelRuntimes = []string{"runc", "crun"}
var waitingForSignal = make(chan bool, 1)
var signalReceived = make(chan bool, 1)
// options stores the command line arguments
type options struct {
toolkitInstallDir string
noDaemon bool
runtime string
pidFile string
sourceRoot string
packageType string
toolkitOptions toolkit.Options
runtimeOptions runtime.Options
}
func (o options) toolkitRoot() string {
return filepath.Join(o.toolkitInstallDir, toolkitSubDir)
}
func main() {
logger := logger.New()
c := NewApp(logger)
// Run the CLI
logger.Infof("Starting %v", c.Name)
if err := c.Run(os.Args); err != nil {
logger.Errorf("error running %v: %v", c.Name, err)
os.Exit(1)
}
logger.Infof("Completed %v", c.Name)
}
// An app represents the nvidia-ctk-installer.
type app struct {
logger logger.Interface
toolkit *toolkit.Installer
}
// NewApp creates the CLI app fro the specified options.
func NewApp(logger logger.Interface) *cli.App {
a := app{
logger: logger,
}
return a.build()
}
func (a app) build() *cli.App {
options := options{
toolkitOptions: toolkit.Options{},
}
// Create the top-level CLI
c := cli.NewApp()
c.Name = "nvidia-ctk-installer"
c.Usage = "Install the NVIDIA Container Toolkit and configure the specified runtime to use the `nvidia` runtime."
c.Version = info.GetVersionString()
c.Before = func(ctx *cli.Context) error {
return a.Before(ctx, &options)
}
c.Action = func(ctx *cli.Context) error {
return a.Run(ctx, &options)
}
// Setup flags for the CLI
c.Flags = []cli.Flag{
&cli.BoolFlag{
Name: "no-daemon",
Aliases: []string{"n"},
Usage: "terminate immediately after setting up the runtime. Note that no cleanup will be performed",
Destination: &options.noDaemon,
EnvVars: []string{"NO_DAEMON"},
},
&cli.StringFlag{
Name: "runtime",
Aliases: []string{"r"},
Usage: "the runtime to setup on this node. One of {'docker', 'crio', 'containerd'}",
Value: defaultRuntime,
Destination: &options.runtime,
EnvVars: []string{"RUNTIME"},
},
&cli.StringFlag{
Name: "toolkit-install-dir",
Aliases: []string{"root"},
Usage: "The directory where the NVIDIA Container Toolkit is to be installed. " +
"The components of the toolkit will be installed to `ROOT`/toolkit. " +
"Note that in the case of a containerized installer, this is the path in the container and it is " +
"recommended that this match the path on the host.",
Value: defaultToolkitInstallDir,
Destination: &options.toolkitInstallDir,
EnvVars: []string{"TOOLKIT_INSTALL_DIR", "ROOT"},
},
&cli.StringFlag{
Name: "toolkit-source-root",
Usage: "The folder where the required toolkit artifacts can be found. If this is not specified, the path /artifacts/{{ .ToolkitPackageType }} is used where ToolkitPackageType is the resolved package type",
Destination: &options.sourceRoot,
EnvVars: []string{"TOOLKIT_SOURCE_ROOT"},
},
&cli.StringFlag{
Name: "toolkit-package-type",
Usage: "specify the package type to use for the toolkit. One of ['deb', 'rpm', 'auto', '']. If 'auto' or '' are used, the type is inferred automatically.",
Value: "auto",
Destination: &options.packageType,
EnvVars: []string{"TOOLKIT_PACKAGE_TYPE"},
},
&cli.StringFlag{
Name: "pid-file",
Value: defaultPidFile,
Usage: "the path to a toolkit.pid file to ensure that only a single configuration instance is running",
Destination: &options.pidFile,
EnvVars: []string{"TOOLKIT_PID_FILE", "PID_FILE"},
},
}
c.Flags = append(c.Flags, toolkit.Flags(&options.toolkitOptions)...)
c.Flags = append(c.Flags, runtime.Flags(&options.runtimeOptions)...)
return c
}
func (a *app) Before(c *cli.Context, o *options) error {
if o.sourceRoot == "" {
sourceRoot, err := a.resolveSourceRoot(o.runtimeOptions.HostRootMount, o.packageType)
if err != nil {
return fmt.Errorf("failed to resolve source root: %v", err)
}
a.logger.Infof("Resolved source root to %v", sourceRoot)
o.sourceRoot = sourceRoot
}
a.toolkit = toolkit.NewInstaller(
toolkit.WithLogger(a.logger),
toolkit.WithSourceRoot(o.sourceRoot),
toolkit.WithToolkitRoot(o.toolkitRoot()),
)
return a.validateFlags(c, o)
}
func (a *app) validateFlags(c *cli.Context, o *options) error {
if o.toolkitInstallDir == "" {
return fmt.Errorf("the install root must be specified")
}
if _, exists := availableRuntimes[o.runtime]; !exists {
return fmt.Errorf("unknown runtime: %v", o.runtime)
}
if filepath.Base(o.pidFile) != toolkitPidFilename {
return fmt.Errorf("invalid toolkit.pid path %v", o.pidFile)
}
if err := a.toolkit.ValidateOptions(&o.toolkitOptions); err != nil {
return err
}
if err := o.runtimeOptions.Validate(a.logger, c, o.runtime, o.toolkitRoot(), &o.toolkitOptions); err != nil {
return err
}
return nil
}
// Run installs the NVIDIA Container Toolkit and updates the requested runtime.
// If the application is run as a daemon, the application waits and unconfigures
// the runtime on termination.
func (a *app) Run(c *cli.Context, o *options) error {
err := a.initialize(o.pidFile)
if err != nil {
return fmt.Errorf("unable to initialize: %v", err)
}
defer a.shutdown(o.pidFile)
if len(o.toolkitOptions.ContainerRuntimeRuntimes.Value()) == 0 {
lowlevelRuntimePaths, err := runtime.GetLowlevelRuntimePaths(&o.runtimeOptions, o.runtime)
if err != nil {
return fmt.Errorf("unable to determine runtime options: %w", err)
}
lowlevelRuntimePaths = append(lowlevelRuntimePaths, defaultLowLevelRuntimes...)
o.toolkitOptions.ContainerRuntimeRuntimes = *cli.NewStringSlice(lowlevelRuntimePaths...)
}
err = a.toolkit.Install(c, &o.toolkitOptions)
if err != nil {
return fmt.Errorf("unable to install toolkit: %v", err)
}
err = runtime.Setup(c, &o.runtimeOptions, o.runtime)
if err != nil {
return fmt.Errorf("unable to setup runtime: %v", err)
}
if !o.noDaemon {
err = a.waitForSignal()
if err != nil {
return fmt.Errorf("unable to wait for signal: %v", err)
}
err = runtime.Cleanup(c, &o.runtimeOptions, o.runtime)
if err != nil {
return fmt.Errorf("unable to cleanup runtime: %v", err)
}
}
return nil
}
func (a *app) initialize(pidFile string) error {
a.logger.Infof("Initializing")
if dir := filepath.Dir(pidFile); dir != "" {
err := os.MkdirAll(dir, 0755)
if err != nil {
return fmt.Errorf("unable to create folder for pidfile: %w", err)
}
}
f, err := os.Create(pidFile)
if err != nil {
return fmt.Errorf("unable to create pidfile: %v", err)
}
err = unix.Flock(int(f.Fd()), unix.LOCK_EX|unix.LOCK_NB)
if err != nil {
a.logger.Warningf("Unable to get exclusive lock on '%v'", pidFile)
a.logger.Warningf("This normally means an instance of the NVIDIA toolkit Container is already running, aborting")
return fmt.Errorf("unable to get flock on pidfile: %v", err)
}
_, err = fmt.Fprintf(f, "%v\n", os.Getpid())
if err != nil {
return fmt.Errorf("unable to write PID to pidfile: %v", err)
}
sigs := make(chan os.Signal, 1)
signal.Notify(sigs, syscall.SIGHUP, syscall.SIGINT, syscall.SIGQUIT, syscall.SIGPIPE, syscall.SIGTERM)
go func() {
<-sigs
select {
case <-waitingForSignal:
signalReceived <- true
default:
a.logger.Infof("Signal received, exiting early")
a.shutdown(pidFile)
os.Exit(0)
}
}()
return nil
}
func (a *app) waitForSignal() error {
a.logger.Infof("Waiting for signal")
waitingForSignal <- true
<-signalReceived
return nil
}
func (a *app) shutdown(pidFile string) {
a.logger.Infof("Shutting Down")
err := os.Remove(pidFile)
if err != nil {
a.logger.Warningf("Unable to remove pidfile: %v", err)
}
}
func (a *app) resolveSourceRoot(hostRoot string, packageType string) (string, error) {
resolvedPackageType, err := a.resolvePackageType(hostRoot, packageType)
if err != nil {
return "", err
}
switch resolvedPackageType {
case "deb":
return "/artifacts/deb", nil
case "rpm":
return "/artifacts/rpm", nil
default:
return "", fmt.Errorf("invalid package type: %v", resolvedPackageType)
}
}
func (a *app) resolvePackageType(hostRoot string, packageType string) (rPackageTypes string, rerr error) {
if packageType != "" && packageType != "auto" {
return packageType, nil
}
locator := lookup.NewExecutableLocator(a.logger, hostRoot)
if candidates, err := locator.Locate("/usr/bin/rpm"); err == nil && len(candidates) > 0 {
return "rpm", nil
}
if candidates, err := locator.Locate("/usr/bin/dpkg"); err == nil && len(candidates) > 0 {
return "deb", nil
}
return "deb", nil
}

View File

@@ -0,0 +1,455 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package main
import (
"os"
"path/filepath"
"strings"
"testing"
testlog "github.com/sirupsen/logrus/hooks/test"
"github.com/stretchr/testify/require"
"github.com/NVIDIA/nvidia-container-toolkit/internal/test"
)
func TestApp(t *testing.T) {
t.Setenv("__NVCT_TESTING_DEVICES_ARE_FILES", "true")
logger, _ := testlog.NewNullLogger()
moduleRoot, err := test.GetModuleRoot()
require.NoError(t, err)
artifactRoot := filepath.Join(moduleRoot, "testdata", "installer", "artifacts")
hostRoot := filepath.Join(moduleRoot, "testdata", "lookup", "rootfs-1")
testCases := []struct {
description string
args []string
expectedToolkitConfig string
expectedRuntimeConfig string
}{
{
description: "no args",
expectedToolkitConfig: `accept-nvidia-visible-devices-as-volume-mounts = false
accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false
supported-driver-capabilities = "compat32,compute,display,graphics,ngx,utility,video"
swarm-resource = ""
[nvidia-container-cli]
debug = ""
environment = []
ldcache = ""
ldconfig = "@/run/nvidia/driver/sbin/ldconfig"
load-kmods = true
no-cgroups = false
path = "{{ .toolkitRoot }}/toolkit/nvidia-container-cli"
root = "/run/nvidia/driver"
user = ""
[nvidia-container-runtime]
debug = "/dev/null"
log-level = "info"
mode = "auto"
runtimes = ["runc", "crun"]
[nvidia-container-runtime.modes]
[nvidia-container-runtime.modes.cdi]
annotation-prefixes = ["cdi.k8s.io/"]
default-kind = "nvidia.com/gpu"
spec-dirs = ["/etc/cdi", "/var/run/cdi"]
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"
[nvidia-container-runtime.modes.legacy]
cuda-compat-mode = "ldconfig"
[nvidia-container-runtime-hook]
path = "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime-hook"
skip-mode-detection = true
[nvidia-ctk]
path = "{{ .toolkitRoot }}/toolkit/nvidia-ctk"
`,
expectedRuntimeConfig: `{
"default-runtime": "nvidia",
"runtimes": {
"nvidia": {
"args": [],
"path": "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime"
},
"nvidia-cdi": {
"args": [],
"path": "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime.cdi"
},
"nvidia-legacy": {
"args": [],
"path": "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime.legacy"
}
}
}`,
},
{
description: "CDI enabled enables CDI in docker",
args: []string{"--cdi-enabled", "--create-device-nodes=none"},
expectedToolkitConfig: `accept-nvidia-visible-devices-as-volume-mounts = false
accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false
supported-driver-capabilities = "compat32,compute,display,graphics,ngx,utility,video"
swarm-resource = ""
[nvidia-container-cli]
debug = ""
environment = []
ldcache = ""
ldconfig = "@/run/nvidia/driver/sbin/ldconfig"
load-kmods = true
no-cgroups = false
path = "{{ .toolkitRoot }}/toolkit/nvidia-container-cli"
root = "/run/nvidia/driver"
user = ""
[nvidia-container-runtime]
debug = "/dev/null"
log-level = "info"
mode = "auto"
runtimes = ["runc", "crun"]
[nvidia-container-runtime.modes]
[nvidia-container-runtime.modes.cdi]
annotation-prefixes = ["cdi.k8s.io/"]
default-kind = "nvidia.com/gpu"
spec-dirs = ["/etc/cdi", "/var/run/cdi"]
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"
[nvidia-container-runtime.modes.legacy]
cuda-compat-mode = "ldconfig"
[nvidia-container-runtime-hook]
path = "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime-hook"
skip-mode-detection = true
[nvidia-ctk]
path = "{{ .toolkitRoot }}/toolkit/nvidia-ctk"
`,
expectedRuntimeConfig: `{
"default-runtime": "nvidia",
"features": {
"cdi": true
},
"runtimes": {
"nvidia": {
"args": [],
"path": "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime"
},
"nvidia-cdi": {
"args": [],
"path": "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime.cdi"
},
"nvidia-legacy": {
"args": [],
"path": "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime.legacy"
}
}
}`,
},
{
description: "--enable-cdi-in-runtime=false overrides --cdi-enabled in Docker",
args: []string{"--cdi-enabled", "--create-device-nodes=none", "--enable-cdi-in-runtime=false"},
expectedToolkitConfig: `accept-nvidia-visible-devices-as-volume-mounts = false
accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false
supported-driver-capabilities = "compat32,compute,display,graphics,ngx,utility,video"
swarm-resource = ""
[nvidia-container-cli]
debug = ""
environment = []
ldcache = ""
ldconfig = "@/run/nvidia/driver/sbin/ldconfig"
load-kmods = true
no-cgroups = false
path = "{{ .toolkitRoot }}/toolkit/nvidia-container-cli"
root = "/run/nvidia/driver"
user = ""
[nvidia-container-runtime]
debug = "/dev/null"
log-level = "info"
mode = "auto"
runtimes = ["runc", "crun"]
[nvidia-container-runtime.modes]
[nvidia-container-runtime.modes.cdi]
annotation-prefixes = ["cdi.k8s.io/"]
default-kind = "nvidia.com/gpu"
spec-dirs = ["/etc/cdi", "/var/run/cdi"]
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"
[nvidia-container-runtime.modes.legacy]
cuda-compat-mode = "ldconfig"
[nvidia-container-runtime-hook]
path = "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime-hook"
skip-mode-detection = true
[nvidia-ctk]
path = "{{ .toolkitRoot }}/toolkit/nvidia-ctk"
`,
expectedRuntimeConfig: `{
"default-runtime": "nvidia",
"runtimes": {
"nvidia": {
"args": [],
"path": "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime"
},
"nvidia-cdi": {
"args": [],
"path": "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime.cdi"
},
"nvidia-legacy": {
"args": [],
"path": "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime.legacy"
}
}
}`,
},
{
description: "CDI enabled enables CDI in containerd",
args: []string{"--cdi-enabled", "--runtime=containerd"},
expectedToolkitConfig: `accept-nvidia-visible-devices-as-volume-mounts = false
accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false
supported-driver-capabilities = "compat32,compute,display,graphics,ngx,utility,video"
swarm-resource = ""
[nvidia-container-cli]
debug = ""
environment = []
ldcache = ""
ldconfig = "@/run/nvidia/driver/sbin/ldconfig"
load-kmods = true
no-cgroups = false
path = "{{ .toolkitRoot }}/toolkit/nvidia-container-cli"
root = "/run/nvidia/driver"
user = ""
[nvidia-container-runtime]
debug = "/dev/null"
log-level = "info"
mode = "auto"
runtimes = ["runc", "crun"]
[nvidia-container-runtime.modes]
[nvidia-container-runtime.modes.cdi]
annotation-prefixes = ["cdi.k8s.io/"]
default-kind = "nvidia.com/gpu"
spec-dirs = ["/etc/cdi", "/var/run/cdi"]
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"
[nvidia-container-runtime.modes.legacy]
cuda-compat-mode = "ldconfig"
[nvidia-container-runtime-hook]
path = "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime-hook"
skip-mode-detection = true
[nvidia-ctk]
path = "{{ .toolkitRoot }}/toolkit/nvidia-ctk"
`,
expectedRuntimeConfig: `version = 2
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
enable_cdi = true
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "nvidia"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
privileged_without_host_devices = false
runtime_engine = ""
runtime_root = ""
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
BinaryName = "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-cdi]
privileged_without_host_devices = false
runtime_engine = ""
runtime_root = ""
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-cdi.options]
BinaryName = "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime.cdi"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-legacy]
privileged_without_host_devices = false
runtime_engine = ""
runtime_root = ""
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-legacy.options]
BinaryName = "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime.legacy"
`,
},
{
description: "--enable-cdi-in-runtime=false overrides --cdi-enabled in containerd",
args: []string{"--cdi-enabled", "--create-device-nodes=none", "--enable-cdi-in-runtime=false", "--runtime=containerd"},
expectedToolkitConfig: `accept-nvidia-visible-devices-as-volume-mounts = false
accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false
supported-driver-capabilities = "compat32,compute,display,graphics,ngx,utility,video"
swarm-resource = ""
[nvidia-container-cli]
debug = ""
environment = []
ldcache = ""
ldconfig = "@/run/nvidia/driver/sbin/ldconfig"
load-kmods = true
no-cgroups = false
path = "{{ .toolkitRoot }}/toolkit/nvidia-container-cli"
root = "/run/nvidia/driver"
user = ""
[nvidia-container-runtime]
debug = "/dev/null"
log-level = "info"
mode = "auto"
runtimes = ["runc", "crun"]
[nvidia-container-runtime.modes]
[nvidia-container-runtime.modes.cdi]
annotation-prefixes = ["cdi.k8s.io/"]
default-kind = "nvidia.com/gpu"
spec-dirs = ["/etc/cdi", "/var/run/cdi"]
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"
[nvidia-container-runtime.modes.legacy]
cuda-compat-mode = "ldconfig"
[nvidia-container-runtime-hook]
path = "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime-hook"
skip-mode-detection = true
[nvidia-ctk]
path = "{{ .toolkitRoot }}/toolkit/nvidia-ctk"
`,
expectedRuntimeConfig: `version = 2
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "nvidia"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
privileged_without_host_devices = false
runtime_engine = ""
runtime_root = ""
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
BinaryName = "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-cdi]
privileged_without_host_devices = false
runtime_engine = ""
runtime_root = ""
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-cdi.options]
BinaryName = "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime.cdi"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-legacy]
privileged_without_host_devices = false
runtime_engine = ""
runtime_root = ""
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-legacy.options]
BinaryName = "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime.legacy"
`,
},
}
for _, tc := range testCases {
t.Run(tc.description, func(t *testing.T) {
testRoot := t.TempDir()
cdiOutputDir := filepath.Join(testRoot, "/var/run/cdi")
runtimeConfigFile := filepath.Join(testRoot, "config.file")
toolkitRoot := filepath.Join(testRoot, "toolkit-test")
toolkitConfigFile := filepath.Join(toolkitRoot, "toolkit/.config/nvidia-container-runtime/config.toml")
app := NewApp(logger)
testArgs := []string{
"nvidia-ctk-installer",
"--toolkit-install-dir=" + toolkitRoot,
"--no-daemon",
"--cdi-output-dir=" + cdiOutputDir,
"--config=" + runtimeConfigFile,
"--create-device-nodes=none",
"--driver-root-ctr-path=" + hostRoot,
"--pid-file=" + filepath.Join(testRoot, "toolkit.pid"),
"--restart-mode=none",
"--toolkit-source-root=" + filepath.Join(artifactRoot, "deb"),
}
err := app.Run(append(testArgs, tc.args...))
require.NoError(t, err)
require.FileExists(t, toolkitConfigFile)
toolkitConfigFileContents, err := os.ReadFile(toolkitConfigFile)
require.NoError(t, err)
require.EqualValues(t, strings.ReplaceAll(tc.expectedToolkitConfig, "{{ .toolkitRoot }}", toolkitRoot), string(toolkitConfigFileContents))
require.FileExists(t, runtimeConfigFile)
runtimeConfigFileContents, err := os.ReadFile(runtimeConfigFile)
require.NoError(t, err)
require.EqualValues(t, strings.ReplaceAll(tc.expectedRuntimeConfig, "{{ .toolkitRoot }}", toolkitRoot), string(runtimeConfigFileContents))
})
}
}

View File

@@ -0,0 +1,85 @@
/**
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package installer
import (
"fmt"
"path/filepath"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup"
)
// An artifactRoot is used as a source for installed artifacts.
// It is refined by a directory path, a library locator, and an executable locator.
type artifactRoot struct {
path string
libraries lookup.Locator
executables lookup.Locator
}
func newArtifactRoot(logger logger.Interface, rootDirectoryPath string) (*artifactRoot, error) {
relativeLibrarySearchPaths := []string{
"/usr/lib64",
"/usr/lib/x86_64-linux-gnu",
"/usr/lib/aarch64-linux-gnu",
}
var librarySearchPaths []string
for _, l := range relativeLibrarySearchPaths {
librarySearchPaths = append(librarySearchPaths, filepath.Join(rootDirectoryPath, l))
}
a := artifactRoot{
path: rootDirectoryPath,
libraries: lookup.NewLibraryLocator(
lookup.WithLogger(logger),
lookup.WithCount(1),
lookup.WithSearchPaths(librarySearchPaths...),
),
executables: lookup.NewExecutableLocator(
logger,
rootDirectoryPath,
),
}
return &a, nil
}
func (r *artifactRoot) findLibrary(name string) (string, error) {
candidates, err := r.libraries.Locate(name)
if err != nil {
return "", fmt.Errorf("error locating library: %w", err)
}
if len(candidates) == 0 {
return "", fmt.Errorf("library %v not found", name)
}
return candidates[0], nil
}
func (r *artifactRoot) findExecutable(name string) (string, error) {
candidates, err := r.executables.Locate(name)
if err != nil {
return "", fmt.Errorf("error locating executable: %w", err)
}
if len(candidates) == 0 {
return "", fmt.Errorf("executable %v not found", name)
}
return candidates[0], nil
}

View File

@@ -0,0 +1,47 @@
/**
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package installer
import (
"fmt"
"os"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
)
type createDirectory struct {
logger logger.Interface
}
func (t *ToolkitInstaller) createDirectory() Installer {
return &createDirectory{
logger: t.logger,
}
}
func (d *createDirectory) Install(dir string) error {
if dir == "" {
return nil
}
d.logger.Infof("Creating directory '%v'", dir)
err := os.MkdirAll(dir, 0755)
if err != nil {
return fmt.Errorf("error creating directory: %v", err)
}
return nil
}

View File

@@ -0,0 +1,179 @@
/**
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package installer
import (
"bytes"
"fmt"
"html/template"
"io"
"path/filepath"
"strings"
log "github.com/sirupsen/logrus"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/container/operator"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
)
type executable struct {
requiresKernelModule bool
path string
symlink string
env map[string]string
}
func (t *ToolkitInstaller) collectExecutables(destDir string) ([]Installer, error) {
configFilePath := t.ConfigFilePath(destDir)
executables := []executable{
{
path: "nvidia-ctk",
},
{
path: "nvidia-cdi-hook",
},
}
for _, runtime := range operator.GetRuntimes() {
e := executable{
path: runtime.Path,
requiresKernelModule: true,
env: map[string]string{
config.FilePathOverrideEnvVar: configFilePath,
},
}
executables = append(executables, e)
}
executables = append(executables,
executable{
path: "nvidia-container-cli",
env: map[string]string{"LD_LIBRARY_PATH": destDir + ":$LD_LIBRARY_PATH"},
},
)
executables = append(executables,
executable{
path: "nvidia-container-runtime-hook",
symlink: "nvidia-container-toolkit",
env: map[string]string{
config.FilePathOverrideEnvVar: configFilePath,
},
},
)
var installers []Installer
for _, executable := range executables {
executablePath, err := t.artifactRoot.findExecutable(executable.path)
if err != nil {
if t.ignoreErrors {
log.Errorf("Ignoring error: %v", err)
continue
}
return nil, err
}
wrappedExecutableFilename := filepath.Base(executablePath)
dotRealFilename := wrappedExecutableFilename + ".real"
w := &wrapper{
Source: executablePath,
WrappedExecutable: dotRealFilename,
CheckModules: executable.requiresKernelModule,
Envvars: map[string]string{
"PATH": strings.Join([]string{destDir, "$PATH"}, ":"),
},
}
for k, v := range executable.env {
w.Envvars[k] = v
}
installers = append(installers, w)
if executable.symlink == "" {
continue
}
link := symlink{
linkname: executable.symlink,
target: filepath.Base(executablePath),
}
installers = append(installers, link)
}
return installers, nil
}
type wrapper struct {
Source string
Envvars map[string]string
WrappedExecutable string
CheckModules bool
}
type render struct {
*wrapper
DestDir string
}
func (w *wrapper) Install(destDir string) error {
// Copy the executable with a .real extension.
mode, err := installFile(w.Source, filepath.Join(destDir, w.WrappedExecutable))
if err != nil {
return err
}
// Create a wrapper file.
r := render{
wrapper: w,
DestDir: destDir,
}
content, err := r.render()
if err != nil {
return fmt.Errorf("failed to render wrapper: %w", err)
}
wrapperFile := filepath.Join(destDir, filepath.Base(w.Source))
return installContent(content, wrapperFile, mode|0111)
}
func (w *render) render() (io.Reader, error) {
wrapperTemplate := `#! /bin/sh
{{- if (.CheckModules) }}
cat /proc/modules | grep -e "^nvidia " >/dev/null 2>&1
if [ "${?}" != "0" ]; then
echo "nvidia driver modules are not yet loaded, invoking runc directly"
exec runc "$@"
fi
{{- end }}
{{- range $key, $value := .Envvars }}
{{$key}}={{$value}} \
{{- end }}
{{ .DestDir }}/{{ .WrappedExecutable }} \
"$@"
`
var content bytes.Buffer
tmpl, err := template.New("wrapper").Parse(wrapperTemplate)
if err != nil {
return nil, err
}
if err := tmpl.Execute(&content, w); err != nil {
return nil, err
}
return &content, nil
}

View File

@@ -0,0 +1,91 @@
/**
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package installer
import (
"bytes"
"testing"
"github.com/stretchr/testify/require"
)
func TestWrapperRender(t *testing.T) {
testCases := []struct {
description string
w *wrapper
expected string
}{
{
description: "executable is added",
w: &wrapper{
WrappedExecutable: "some-runtime",
},
expected: `#! /bin/sh
/dest-dir/some-runtime \
"$@"
`,
},
{
description: "module check is added",
w: &wrapper{
WrappedExecutable: "some-runtime",
CheckModules: true,
},
expected: `#! /bin/sh
cat /proc/modules | grep -e "^nvidia " >/dev/null 2>&1
if [ "${?}" != "0" ]; then
echo "nvidia driver modules are not yet loaded, invoking runc directly"
exec runc "$@"
fi
/dest-dir/some-runtime \
"$@"
`,
},
{
description: "environment is added",
w: &wrapper{
WrappedExecutable: "some-runtime",
Envvars: map[string]string{
"PATH": "/foo/bar/baz",
},
},
expected: `#! /bin/sh
PATH=/foo/bar/baz \
/dest-dir/some-runtime \
"$@"
`,
},
}
for _, tc := range testCases {
t.Run(tc.description, func(t *testing.T) {
r := render{
wrapper: tc.w,
DestDir: "/dest-dir",
}
reader, err := r.render()
require.NoError(t, err)
var content bytes.Buffer
_, err = content.ReadFrom(reader)
require.NoError(t, err)
require.Equal(t, tc.expected, content.String())
})
}
}

View File

@@ -0,0 +1,188 @@
// Code generated by moq; DO NOT EDIT.
// github.com/matryer/moq
package installer
import (
"io"
"os"
"sync"
)
// Ensure, that fileInstallerMock does implement fileInstaller.
// If this is not the case, regenerate this file with moq.
var _ fileInstaller = &fileInstallerMock{}
// fileInstallerMock is a mock implementation of fileInstaller.
//
// func TestSomethingThatUsesfileInstaller(t *testing.T) {
//
// // make and configure a mocked fileInstaller
// mockedfileInstaller := &fileInstallerMock{
// installContentFunc: func(reader io.Reader, s string, v os.FileMode) error {
// panic("mock out the installContent method")
// },
// installFileFunc: func(s1 string, s2 string) (os.FileMode, error) {
// panic("mock out the installFile method")
// },
// installSymlinkFunc: func(s1 string, s2 string) error {
// panic("mock out the installSymlink method")
// },
// }
//
// // use mockedfileInstaller in code that requires fileInstaller
// // and then make assertions.
//
// }
type fileInstallerMock struct {
// installContentFunc mocks the installContent method.
installContentFunc func(reader io.Reader, s string, v os.FileMode) error
// installFileFunc mocks the installFile method.
installFileFunc func(s1 string, s2 string) (os.FileMode, error)
// installSymlinkFunc mocks the installSymlink method.
installSymlinkFunc func(s1 string, s2 string) error
// calls tracks calls to the methods.
calls struct {
// installContent holds details about calls to the installContent method.
installContent []struct {
// Reader is the reader argument value.
Reader io.Reader
// S is the s argument value.
S string
// V is the v argument value.
V os.FileMode
}
// installFile holds details about calls to the installFile method.
installFile []struct {
// S1 is the s1 argument value.
S1 string
// S2 is the s2 argument value.
S2 string
}
// installSymlink holds details about calls to the installSymlink method.
installSymlink []struct {
// S1 is the s1 argument value.
S1 string
// S2 is the s2 argument value.
S2 string
}
}
lockinstallContent sync.RWMutex
lockinstallFile sync.RWMutex
lockinstallSymlink sync.RWMutex
}
// installContent calls installContentFunc.
func (mock *fileInstallerMock) installContent(reader io.Reader, s string, v os.FileMode) error {
if mock.installContentFunc == nil {
panic("fileInstallerMock.installContentFunc: method is nil but fileInstaller.installContent was just called")
}
callInfo := struct {
Reader io.Reader
S string
V os.FileMode
}{
Reader: reader,
S: s,
V: v,
}
mock.lockinstallContent.Lock()
mock.calls.installContent = append(mock.calls.installContent, callInfo)
mock.lockinstallContent.Unlock()
return mock.installContentFunc(reader, s, v)
}
// installContentCalls gets all the calls that were made to installContent.
// Check the length with:
//
// len(mockedfileInstaller.installContentCalls())
func (mock *fileInstallerMock) installContentCalls() []struct {
Reader io.Reader
S string
V os.FileMode
} {
var calls []struct {
Reader io.Reader
S string
V os.FileMode
}
mock.lockinstallContent.RLock()
calls = mock.calls.installContent
mock.lockinstallContent.RUnlock()
return calls
}
// installFile calls installFileFunc.
func (mock *fileInstallerMock) installFile(s1 string, s2 string) (os.FileMode, error) {
if mock.installFileFunc == nil {
panic("fileInstallerMock.installFileFunc: method is nil but fileInstaller.installFile was just called")
}
callInfo := struct {
S1 string
S2 string
}{
S1: s1,
S2: s2,
}
mock.lockinstallFile.Lock()
mock.calls.installFile = append(mock.calls.installFile, callInfo)
mock.lockinstallFile.Unlock()
return mock.installFileFunc(s1, s2)
}
// installFileCalls gets all the calls that were made to installFile.
// Check the length with:
//
// len(mockedfileInstaller.installFileCalls())
func (mock *fileInstallerMock) installFileCalls() []struct {
S1 string
S2 string
} {
var calls []struct {
S1 string
S2 string
}
mock.lockinstallFile.RLock()
calls = mock.calls.installFile
mock.lockinstallFile.RUnlock()
return calls
}
// installSymlink calls installSymlinkFunc.
func (mock *fileInstallerMock) installSymlink(s1 string, s2 string) error {
if mock.installSymlinkFunc == nil {
panic("fileInstallerMock.installSymlinkFunc: method is nil but fileInstaller.installSymlink was just called")
}
callInfo := struct {
S1 string
S2 string
}{
S1: s1,
S2: s2,
}
mock.lockinstallSymlink.Lock()
mock.calls.installSymlink = append(mock.calls.installSymlink, callInfo)
mock.lockinstallSymlink.Unlock()
return mock.installSymlinkFunc(s1, s2)
}
// installSymlinkCalls gets all the calls that were made to installSymlink.
// Check the length with:
//
// len(mockedfileInstaller.installSymlinkCalls())
func (mock *fileInstallerMock) installSymlinkCalls() []struct {
S1 string
S2 string
} {
var calls []struct {
S1 string
S2 string
}
mock.lockinstallSymlink.RLock()
calls = mock.calls.installSymlink
mock.lockinstallSymlink.RUnlock()
return calls
}

View File

@@ -0,0 +1,172 @@
/**
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package installer
import (
"errors"
"fmt"
"io"
"io/fs"
"os"
"path/filepath"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
)
//go:generate moq -rm -fmt=goimports -out installer_mock.go . Installer
type Installer interface {
Install(string) error
}
type ToolkitInstaller struct {
logger logger.Interface
ignoreErrors bool
sourceRoot string
artifactRoot *artifactRoot
ensureTargetDirectory Installer
}
var _ Installer = (*ToolkitInstaller)(nil)
// New creates a toolkit installer with the specified options.
func New(opts ...Option) (*ToolkitInstaller, error) {
t := &ToolkitInstaller{
sourceRoot: "/",
}
for _, opt := range opts {
opt(t)
}
if t.logger == nil {
t.logger = logger.New()
}
if t.artifactRoot == nil {
artifactRoot, err := newArtifactRoot(t.logger, t.sourceRoot)
if err != nil {
return nil, err
}
t.artifactRoot = artifactRoot
}
if t.ensureTargetDirectory == nil {
t.ensureTargetDirectory = t.createDirectory()
}
return t, nil
}
// Install ensures that the required toolkit files are installed in the specified directory.
func (t *ToolkitInstaller) Install(destDir string) error {
var installers []Installer
installers = append(installers, t.ensureTargetDirectory)
libraries, err := t.collectLibraries()
if err != nil {
return fmt.Errorf("failed to collect libraries: %w", err)
}
installers = append(installers, libraries...)
executables, err := t.collectExecutables(destDir)
if err != nil {
return fmt.Errorf("failed to collect executables: %w", err)
}
installers = append(installers, executables...)
var errs error
for _, i := range installers {
errs = errors.Join(errs, i.Install(destDir))
}
return errs
}
func (t *ToolkitInstaller) ConfigFilePath(destDir string) string {
toolkitConfigDir := filepath.Join(destDir, ".config", "nvidia-container-runtime")
return filepath.Join(toolkitConfigDir, "config.toml")
}
type symlink struct {
linkname string
target string
}
func (s symlink) Install(destDir string) error {
symlinkPath := filepath.Join(destDir, s.linkname)
return installSymlink(s.target, symlinkPath)
}
//go:generate moq -rm -fmt=goimports -out file-installer_mock.go . fileInstaller
type fileInstaller interface {
installContent(io.Reader, string, os.FileMode) error
installFile(string, string) (os.FileMode, error)
installSymlink(string, string) error
}
var installSymlink = installSymlinkStub
func installSymlinkStub(target string, link string) error {
err := os.Symlink(target, link)
if err != nil {
return fmt.Errorf("error creating symlink '%v' => '%v': %v", link, target, err)
}
return nil
}
var installFile = installFileStub
func installFileStub(src string, dest string) (os.FileMode, error) {
sourceInfo, err := os.Stat(src)
if err != nil {
return 0, fmt.Errorf("error getting file info for '%v': %v", src, err)
}
source, err := os.Open(src)
if err != nil {
return 0, fmt.Errorf("error opening source: %w", err)
}
defer source.Close()
mode := sourceInfo.Mode()
if err := installContent(source, dest, mode); err != nil {
return 0, err
}
return mode, nil
}
var installContent = installContentStub
func installContentStub(content io.Reader, dest string, mode fs.FileMode) error {
destination, err := os.Create(dest)
if err != nil {
return fmt.Errorf("error creating destination: %w", err)
}
defer destination.Close()
_, err = io.Copy(destination, content)
if err != nil {
return fmt.Errorf("error copying file: %w", err)
}
err = os.Chmod(dest, mode)
if err != nil {
return fmt.Errorf("error setting mode for '%v': %v", dest, err)
}
return nil
}

View File

@@ -0,0 +1,74 @@
// Code generated by moq; DO NOT EDIT.
// github.com/matryer/moq
package installer
import (
"sync"
)
// Ensure, that InstallerMock does implement Installer.
// If this is not the case, regenerate this file with moq.
var _ Installer = &InstallerMock{}
// InstallerMock is a mock implementation of Installer.
//
// func TestSomethingThatUsesInstaller(t *testing.T) {
//
// // make and configure a mocked Installer
// mockedInstaller := &InstallerMock{
// InstallFunc: func(s string) error {
// panic("mock out the Install method")
// },
// }
//
// // use mockedInstaller in code that requires Installer
// // and then make assertions.
//
// }
type InstallerMock struct {
// InstallFunc mocks the Install method.
InstallFunc func(s string) error
// calls tracks calls to the methods.
calls struct {
// Install holds details about calls to the Install method.
Install []struct {
// S is the s argument value.
S string
}
}
lockInstall sync.RWMutex
}
// Install calls InstallFunc.
func (mock *InstallerMock) Install(s string) error {
if mock.InstallFunc == nil {
panic("InstallerMock.InstallFunc: method is nil but Installer.Install was just called")
}
callInfo := struct {
S string
}{
S: s,
}
mock.lockInstall.Lock()
mock.calls.Install = append(mock.calls.Install, callInfo)
mock.lockInstall.Unlock()
return mock.InstallFunc(s)
}
// InstallCalls gets all the calls that were made to Install.
// Check the length with:
//
// len(mockedInstaller.InstallCalls())
func (mock *InstallerMock) InstallCalls() []struct {
S string
} {
var calls []struct {
S string
}
mock.lockInstall.RLock()
calls = mock.calls.Install
mock.lockInstall.RUnlock()
return calls
}

View File

@@ -0,0 +1,251 @@
/**
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package installer
import (
"bytes"
"fmt"
"io"
"io/fs"
"os"
"path/filepath"
"testing"
testlog "github.com/sirupsen/logrus/hooks/test"
"github.com/stretchr/testify/require"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup"
)
func TestToolkitInstaller(t *testing.T) {
logger, _ := testlog.NewNullLogger()
type contentCall struct {
wrapper string
path string
mode fs.FileMode
}
var contentCalls []contentCall
installer := &fileInstallerMock{
installFileFunc: func(s1, s2 string) (os.FileMode, error) {
return 0666, nil
},
installContentFunc: func(reader io.Reader, s string, fileMode fs.FileMode) error {
var b bytes.Buffer
if _, err := b.ReadFrom(reader); err != nil {
return err
}
contents := contentCall{
wrapper: b.String(),
path: s,
mode: fileMode,
}
contentCalls = append(contentCalls, contents)
return nil
},
installSymlinkFunc: func(s1, s2 string) error {
return nil
},
}
installFile = installer.installFile
installContent = installer.installContent
installSymlink = installer.installSymlink
root := "/artifacts/test"
libraries := &lookup.LocatorMock{
LocateFunc: func(s string) ([]string, error) {
switch s {
case "libnvidia-container.so.1":
return []string{filepath.Join(root, "libnvidia-container.so.987.65.43")}, nil
case "libnvidia-container-go.so.1":
return []string{filepath.Join(root, "libnvidia-container-go.so.1.23.4")}, nil
}
return nil, fmt.Errorf("%v not found", s)
},
}
executables := &lookup.LocatorMock{
LocateFunc: func(s string) ([]string, error) {
switch s {
case "nvidia-container-runtime.cdi":
fallthrough
case "nvidia-container-runtime.legacy":
fallthrough
case "nvidia-container-runtime":
fallthrough
case "nvidia-ctk":
fallthrough
case "nvidia-container-cli":
fallthrough
case "nvidia-container-runtime-hook":
fallthrough
case "nvidia-cdi-hook":
return []string{filepath.Join(root, "usr/bin", s)}, nil
}
return nil, fmt.Errorf("%v not found", s)
},
}
r := &artifactRoot{
libraries: libraries,
executables: executables,
}
createDirectory := &InstallerMock{
InstallFunc: func(c string) error {
return nil
},
}
i := ToolkitInstaller{
logger: logger,
artifactRoot: r,
ensureTargetDirectory: createDirectory,
}
err := i.Install("/foo/bar/baz")
require.NoError(t, err)
require.ElementsMatch(t,
[]struct {
S string
}{
{"/foo/bar/baz"},
},
createDirectory.InstallCalls(),
)
require.ElementsMatch(t,
installer.installFileCalls(),
[]struct {
S1 string
S2 string
}{
{"/artifacts/test/libnvidia-container-go.so.1.23.4", "/foo/bar/baz/libnvidia-container-go.so.1.23.4"},
{"/artifacts/test/libnvidia-container.so.987.65.43", "/foo/bar/baz/libnvidia-container.so.987.65.43"},
{"/artifacts/test/usr/bin/nvidia-container-runtime.cdi", "/foo/bar/baz/nvidia-container-runtime.cdi.real"},
{"/artifacts/test/usr/bin/nvidia-container-runtime.legacy", "/foo/bar/baz/nvidia-container-runtime.legacy.real"},
{"/artifacts/test/usr/bin/nvidia-container-runtime", "/foo/bar/baz/nvidia-container-runtime.real"},
{"/artifacts/test/usr/bin/nvidia-ctk", "/foo/bar/baz/nvidia-ctk.real"},
{"/artifacts/test/usr/bin/nvidia-cdi-hook", "/foo/bar/baz/nvidia-cdi-hook.real"},
{"/artifacts/test/usr/bin/nvidia-container-cli", "/foo/bar/baz/nvidia-container-cli.real"},
{"/artifacts/test/usr/bin/nvidia-container-runtime-hook", "/foo/bar/baz/nvidia-container-runtime-hook.real"},
},
)
require.ElementsMatch(t,
installer.installSymlinkCalls(),
[]struct {
S1 string
S2 string
}{
{"libnvidia-container-go.so.1.23.4", "/foo/bar/baz/libnvidia-container-go.so.1"},
{"libnvidia-container.so.987.65.43", "/foo/bar/baz/libnvidia-container.so.1"},
{"nvidia-container-runtime-hook", "/foo/bar/baz/nvidia-container-toolkit"},
},
)
require.ElementsMatch(t,
contentCalls,
[]contentCall{
{
path: "/foo/bar/baz/nvidia-container-runtime",
mode: 0777,
wrapper: `#! /bin/sh
cat /proc/modules | grep -e "^nvidia " >/dev/null 2>&1
if [ "${?}" != "0" ]; then
echo "nvidia driver modules are not yet loaded, invoking runc directly"
exec runc "$@"
fi
NVIDIA_CTK_CONFIG_FILE_PATH=/foo/bar/baz/.config/nvidia-container-runtime/config.toml \
PATH=/foo/bar/baz:$PATH \
/foo/bar/baz/nvidia-container-runtime.real \
"$@"
`,
},
{
path: "/foo/bar/baz/nvidia-container-runtime.cdi",
mode: 0777,
wrapper: `#! /bin/sh
cat /proc/modules | grep -e "^nvidia " >/dev/null 2>&1
if [ "${?}" != "0" ]; then
echo "nvidia driver modules are not yet loaded, invoking runc directly"
exec runc "$@"
fi
NVIDIA_CTK_CONFIG_FILE_PATH=/foo/bar/baz/.config/nvidia-container-runtime/config.toml \
PATH=/foo/bar/baz:$PATH \
/foo/bar/baz/nvidia-container-runtime.cdi.real \
"$@"
`,
},
{
path: "/foo/bar/baz/nvidia-container-runtime.legacy",
mode: 0777,
wrapper: `#! /bin/sh
cat /proc/modules | grep -e "^nvidia " >/dev/null 2>&1
if [ "${?}" != "0" ]; then
echo "nvidia driver modules are not yet loaded, invoking runc directly"
exec runc "$@"
fi
NVIDIA_CTK_CONFIG_FILE_PATH=/foo/bar/baz/.config/nvidia-container-runtime/config.toml \
PATH=/foo/bar/baz:$PATH \
/foo/bar/baz/nvidia-container-runtime.legacy.real \
"$@"
`,
},
{
path: "/foo/bar/baz/nvidia-ctk",
mode: 0777,
wrapper: `#! /bin/sh
PATH=/foo/bar/baz:$PATH \
/foo/bar/baz/nvidia-ctk.real \
"$@"
`,
},
{
path: "/foo/bar/baz/nvidia-cdi-hook",
mode: 0777,
wrapper: `#! /bin/sh
PATH=/foo/bar/baz:$PATH \
/foo/bar/baz/nvidia-cdi-hook.real \
"$@"
`,
},
{
path: "/foo/bar/baz/nvidia-container-cli",
mode: 0777,
wrapper: `#! /bin/sh
LD_LIBRARY_PATH=/foo/bar/baz:$LD_LIBRARY_PATH \
PATH=/foo/bar/baz:$PATH \
/foo/bar/baz/nvidia-container-cli.real \
"$@"
`,
},
{
path: "/foo/bar/baz/nvidia-container-runtime-hook",
mode: 0777,
wrapper: `#! /bin/sh
NVIDIA_CTK_CONFIG_FILE_PATH=/foo/bar/baz/.config/nvidia-container-runtime/config.toml \
PATH=/foo/bar/baz:$PATH \
/foo/bar/baz/nvidia-container-runtime-hook.real \
"$@"
`,
},
},
)
}

View File

@@ -0,0 +1,73 @@
/**
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package installer
import (
"path/filepath"
log "github.com/sirupsen/logrus"
)
// collectLibraries locates and installs the libraries that are part of
// the nvidia-container-toolkit.
// A predefined set of library candidates are considered, with the first one
// resulting in success being installed to the toolkit folder. The install process
// resolves the symlink for the library and copies the versioned library itself.
func (t *ToolkitInstaller) collectLibraries() ([]Installer, error) {
requiredLibraries := []string{
"libnvidia-container.so.1",
"libnvidia-container-go.so.1",
}
var installers []Installer
for _, l := range requiredLibraries {
libraryPath, err := t.artifactRoot.findLibrary(l)
if err != nil {
if t.ignoreErrors {
log.Errorf("Ignoring error: %v", err)
continue
}
return nil, err
}
installers = append(installers, library(libraryPath))
if filepath.Base(libraryPath) == l {
continue
}
link := symlink{
linkname: l,
target: filepath.Base(libraryPath),
}
installers = append(installers, link)
}
return installers, nil
}
type library string
// Install copies the library l to the destination folder.
// The same basename is used in the destination folder.
func (l library) Install(destinationDir string) error {
dest := filepath.Join(destinationDir, filepath.Base(string(l)))
_, err := installFile(string(l), dest)
return err
}

View File

@@ -0,0 +1,47 @@
/**
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package installer
import "github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
type Option func(*ToolkitInstaller)
func WithLogger(logger logger.Interface) Option {
return func(ti *ToolkitInstaller) {
ti.logger = logger
}
}
func WithArtifactRoot(artifactRoot *artifactRoot) Option {
return func(ti *ToolkitInstaller) {
ti.artifactRoot = artifactRoot
}
}
func WithIgnoreErrors(ignoreErrors bool) Option {
return func(ti *ToolkitInstaller) {
ti.ignoreErrors = ignoreErrors
}
}
// WithSourceRoot sets the root directory for locating artifacts to be installed.
func WithSourceRoot(sourceRoot string) Option {
return func(ti *ToolkitInstaller) {
ti.sourceRoot = sourceRoot
}
}

View File

@@ -0,0 +1,40 @@
/**
# Copyright 2024 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package toolkit
import "github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
// An Option provides a mechanism to configure an Installer.
type Option func(*Installer)
func WithLogger(logger logger.Interface) Option {
return func(i *Installer) {
i.logger = logger
}
}
func WithToolkitRoot(toolkitRoot string) Option {
return func(i *Installer) {
i.toolkitRoot = toolkitRoot
}
}
func WithSourceRoot(sourceRoot string) Option {
return func(i *Installer) {
i.sourceRoot = sourceRoot
}
}

View File

@@ -0,0 +1,534 @@
/**
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
*/
package toolkit
import (
"fmt"
"os"
"path/filepath"
"strings"
"github.com/urfave/cli/v2"
"tags.cncf.io/container-device-interface/pkg/cdi"
"tags.cncf.io/container-device-interface/pkg/parser"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/toolkit/installer"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/internal/system/nvdevices"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/nvcdi"
transformroot "github.com/NVIDIA/nvidia-container-toolkit/pkg/nvcdi/transform/root"
)
const (
// DefaultNvidiaDriverRoot specifies the default NVIDIA driver run directory
DefaultNvidiaDriverRoot = "/run/nvidia/driver"
)
type cdiOptions struct {
Enabled bool
outputDir string
kind string
vendor string
class string
}
type Options struct {
DriverRoot string
DevRoot string
DriverRootCtrPath string
DevRootCtrPath string
ContainerRuntimeMode string
ContainerRuntimeDebug string
ContainerRuntimeLogLevel string
ContainerRuntimeModesCdiDefaultKind string
ContainerRuntimeModesCDIAnnotationPrefixes cli.StringSlice
ContainerRuntimeRuntimes cli.StringSlice
ContainerRuntimeHookSkipModeDetection bool
ContainerCLIDebug string
// CDI stores the CDI options for the toolkit.
CDI cdiOptions
createDeviceNodes cli.StringSlice
acceptNVIDIAVisibleDevicesWhenUnprivileged bool
acceptNVIDIAVisibleDevicesAsVolumeMounts bool
ignoreErrors bool
optInFeatures cli.StringSlice
}
func Flags(opts *Options) []cli.Flag {
flags := []cli.Flag{
&cli.StringFlag{
Name: "driver-root",
Aliases: []string{"nvidia-driver-root"},
Value: DefaultNvidiaDriverRoot,
Destination: &opts.DriverRoot,
EnvVars: []string{"NVIDIA_DRIVER_ROOT", "DRIVER_ROOT"},
},
&cli.StringFlag{
Name: "driver-root-ctr-path",
Value: DefaultNvidiaDriverRoot,
Destination: &opts.DriverRootCtrPath,
EnvVars: []string{"DRIVER_ROOT_CTR_PATH"},
},
&cli.StringFlag{
Name: "dev-root",
Usage: "Specify the root where `/dev` is located. If this is not specified, the driver-root is assumed.",
Destination: &opts.DevRoot,
EnvVars: []string{"NVIDIA_DEV_ROOT", "DEV_ROOT"},
},
&cli.StringFlag{
Name: "dev-root-ctr-path",
Usage: "Specify the root where `/dev` is located in the container. If this is not specified, the driver-root-ctr-path is assumed.",
Destination: &opts.DevRootCtrPath,
EnvVars: []string{"DEV_ROOT_CTR_PATH"},
},
&cli.StringFlag{
Name: "nvidia-container-runtime.debug",
Aliases: []string{"nvidia-container-runtime-debug"},
Usage: "Specify the location of the debug log file for the NVIDIA Container Runtime",
Destination: &opts.ContainerRuntimeDebug,
EnvVars: []string{"NVIDIA_CONTAINER_RUNTIME_DEBUG"},
},
&cli.StringFlag{
Name: "nvidia-container-runtime.log-level",
Aliases: []string{"nvidia-container-runtime-debug-log-level"},
Destination: &opts.ContainerRuntimeLogLevel,
EnvVars: []string{"NVIDIA_CONTAINER_RUNTIME_LOG_LEVEL"},
},
&cli.StringFlag{
Name: "nvidia-container-runtime.mode",
Aliases: []string{"nvidia-container-runtime-mode"},
Destination: &opts.ContainerRuntimeMode,
EnvVars: []string{"NVIDIA_CONTAINER_RUNTIME_MODE"},
},
&cli.StringFlag{
Name: "nvidia-container-runtime.modes.cdi.default-kind",
Destination: &opts.ContainerRuntimeModesCdiDefaultKind,
EnvVars: []string{"NVIDIA_CONTAINER_RUNTIME_MODES_CDI_DEFAULT_KIND"},
},
&cli.StringSliceFlag{
Name: "nvidia-container-runtime.modes.cdi.annotation-prefixes",
Destination: &opts.ContainerRuntimeModesCDIAnnotationPrefixes,
EnvVars: []string{"NVIDIA_CONTAINER_RUNTIME_MODES_CDI_ANNOTATION_PREFIXES"},
},
&cli.StringSliceFlag{
Name: "nvidia-container-runtime.runtimes",
Destination: &opts.ContainerRuntimeRuntimes,
EnvVars: []string{"NVIDIA_CONTAINER_RUNTIME_RUNTIMES"},
},
&cli.BoolFlag{
Name: "nvidia-container-runtime-hook.skip-mode-detection",
Value: true,
Destination: &opts.ContainerRuntimeHookSkipModeDetection,
EnvVars: []string{"NVIDIA_CONTAINER_RUNTIME_HOOK_SKIP_MODE_DETECTION"},
},
&cli.StringFlag{
Name: "nvidia-container-cli.debug",
Aliases: []string{"nvidia-container-cli-debug"},
Usage: "Specify the location of the debug log file for the NVIDIA Container CLI",
Destination: &opts.ContainerCLIDebug,
EnvVars: []string{"NVIDIA_CONTAINER_CLI_DEBUG"},
},
&cli.BoolFlag{
Name: "accept-nvidia-visible-devices-envvar-when-unprivileged",
Usage: "Set the accept-nvidia-visible-devices-envvar-when-unprivileged config option",
Value: true,
Destination: &opts.acceptNVIDIAVisibleDevicesWhenUnprivileged,
EnvVars: []string{"ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED"},
},
&cli.BoolFlag{
Name: "accept-nvidia-visible-devices-as-volume-mounts",
Usage: "Set the accept-nvidia-visible-devices-as-volume-mounts config option",
Destination: &opts.acceptNVIDIAVisibleDevicesAsVolumeMounts,
EnvVars: []string{"ACCEPT_NVIDIA_VISIBLE_DEVICES_AS_VOLUME_MOUNTS"},
},
&cli.BoolFlag{
Name: "cdi-enabled",
Aliases: []string{"enable-cdi"},
Usage: "enable the generation of a CDI specification",
Destination: &opts.CDI.Enabled,
EnvVars: []string{"CDI_ENABLED", "ENABLE_CDI"},
},
&cli.StringFlag{
Name: "cdi-output-dir",
Usage: "the directory where the CDI output files are to be written. If this is set to '', no CDI specification is generated.",
Value: "/var/run/cdi",
Destination: &opts.CDI.outputDir,
EnvVars: []string{"CDI_OUTPUT_DIR"},
},
&cli.StringFlag{
Name: "cdi-kind",
Usage: "the vendor string to use for the generated CDI specification",
Value: "management.nvidia.com/gpu",
Destination: &opts.CDI.kind,
EnvVars: []string{"CDI_KIND"},
},
&cli.BoolFlag{
Name: "ignore-errors",
Usage: "ignore errors when installing the NVIDIA Container toolkit. This is used for testing purposes only.",
Hidden: true,
Destination: &opts.ignoreErrors,
},
&cli.StringSliceFlag{
Name: "create-device-nodes",
Usage: "(Only applicable with --cdi-enabled) specifies which device nodes should be created. If any one of the options is set to '' or 'none', no device nodes will be created.",
Value: cli.NewStringSlice("control"),
Destination: &opts.createDeviceNodes,
EnvVars: []string{"CREATE_DEVICE_NODES"},
},
&cli.StringSliceFlag{
Name: "opt-in-features",
Hidden: true,
Destination: &opts.optInFeatures,
EnvVars: []string{"NVIDIA_CONTAINER_TOOLKIT_OPT_IN_FEATURES"},
},
}
return flags
}
// An Installer is used to install the NVIDIA Container Toolkit from the toolkit container.
type Installer struct {
logger logger.Interface
sourceRoot string
// toolkitRoot specifies the destination path at which the toolkit is installed.
toolkitRoot string
}
// NewInstaller creates an installer for the NVIDIA Container Toolkit.
func NewInstaller(opts ...Option) *Installer {
i := &Installer{}
for _, opt := range opts {
opt(i)
}
if i.logger == nil {
i.logger = logger.New()
}
return i
}
// ValidateOptions checks whether the specified options are valid
func (t *Installer) ValidateOptions(opts *Options) error {
if t == nil {
return fmt.Errorf("toolkit installer is not initilized")
}
if t.toolkitRoot == "" {
return fmt.Errorf("invalid --toolkit-root option: %v", t.toolkitRoot)
}
vendor, class := parser.ParseQualifier(opts.CDI.kind)
if err := parser.ValidateVendorName(vendor); err != nil {
return fmt.Errorf("invalid CDI vendor name: %v", err)
}
if err := parser.ValidateClassName(class); err != nil {
return fmt.Errorf("invalid CDI class name: %v", err)
}
opts.CDI.vendor = vendor
opts.CDI.class = class
if opts.CDI.Enabled && opts.CDI.outputDir == "" {
t.logger.Warning("Skipping CDI spec generation (no output directory specified)")
opts.CDI.Enabled = false
}
isDisabled := false
for _, mode := range opts.createDeviceNodes.Value() {
if mode != "" && mode != "none" && mode != "control" {
return fmt.Errorf("invalid --create-device-nodes value: %v", mode)
}
if mode == "" || mode == "none" {
isDisabled = true
break
}
}
if !opts.CDI.Enabled && !isDisabled {
t.logger.Info("disabling device node creation since --cdi-enabled=false")
isDisabled = true
}
if isDisabled {
opts.createDeviceNodes = *cli.NewStringSlice()
}
return nil
}
// Install installs the components of the NVIDIA container toolkit.
// Any existing installation is removed.
func (t *Installer) Install(cli *cli.Context, opts *Options) error {
if t == nil {
return fmt.Errorf("toolkit installer is not initilized")
}
t.logger.Infof("Installing NVIDIA container toolkit to '%v'", t.toolkitRoot)
t.logger.Infof("Removing existing NVIDIA container toolkit installation")
err := os.RemoveAll(t.toolkitRoot)
if err != nil && !opts.ignoreErrors {
return fmt.Errorf("error removing toolkit directory: %v", err)
} else if err != nil {
t.logger.Errorf("Ignoring error: %v", fmt.Errorf("error removing toolkit directory: %v", err))
}
// Create a toolkit installer to actually install the toolkit components.
toolkit, err := installer.New(
installer.WithLogger(t.logger),
installer.WithSourceRoot(t.sourceRoot),
installer.WithIgnoreErrors(opts.ignoreErrors),
)
if err != nil {
if !opts.ignoreErrors {
return fmt.Errorf("could not create toolkit installer: %w", err)
}
t.logger.Errorf("Ignoring error: %v", fmt.Errorf("could not create toolkit installer: %w", err))
}
if err := toolkit.Install(t.toolkitRoot); err != nil {
if !opts.ignoreErrors {
return fmt.Errorf("could not install toolkit components: %w", err)
}
t.logger.Errorf("Ignoring error: %v", fmt.Errorf("could not install toolkit components: %w", err))
}
err = t.installToolkitConfig(cli, opts, toolkit.ConfigFilePath(t.toolkitRoot))
if err != nil && !opts.ignoreErrors {
return fmt.Errorf("error installing NVIDIA container toolkit config: %v", err)
} else if err != nil {
t.logger.Errorf("Ignoring error: %v", fmt.Errorf("error installing NVIDIA container toolkit config: %v", err))
}
err = t.createDeviceNodes(opts)
if err != nil && !opts.ignoreErrors {
return fmt.Errorf("error creating device nodes: %v", err)
} else if err != nil {
t.logger.Errorf("Ignoring error: %v", fmt.Errorf("error creating device nodes: %v", err))
}
nvidiaCDIHookPath := filepath.Join(t.toolkitRoot, "nvidia-cdi-hook")
err = t.generateCDISpec(opts, nvidiaCDIHookPath)
if err != nil && !opts.ignoreErrors {
return fmt.Errorf("error generating CDI specification: %v", err)
} else if err != nil {
t.logger.Errorf("Ignoring error: %v", fmt.Errorf("error generating CDI specification: %v", err))
}
return nil
}
// installToolkitConfig installs the config file for the NVIDIA container toolkit ensuring
// that the settings are updated to match the desired install and nvidia driver directories.
func (t *Installer) installToolkitConfig(c *cli.Context, opts *Options, toolkitConfigPath string) error {
t.logger.Infof("Installing NVIDIA container toolkit config '%v'", toolkitConfigPath)
err := t.createDirectories(filepath.Dir(toolkitConfigPath))
if err != nil && !opts.ignoreErrors {
return fmt.Errorf("could not create required directories: %v", err)
} else if err != nil {
t.logger.Errorf("Ignoring error: %v", fmt.Errorf("could not create required directories: %v", err))
}
nvidiaContainerCliExecutablePath := filepath.Join(t.toolkitRoot, "nvidia-container-cli")
nvidiaCTKPath := filepath.Join(t.toolkitRoot, "nvidia-ctk")
nvidiaContainerRuntimeHookPath := filepath.Join(t.toolkitRoot, "nvidia-container-runtime-hook")
cfg, err := config.New()
if err != nil {
return fmt.Errorf("could not open source config file: %v", err)
}
targetConfig, err := os.Create(toolkitConfigPath)
if err != nil {
return fmt.Errorf("could not create target config file: %v", err)
}
defer targetConfig.Close()
// Read the ldconfig path from the config as this may differ per platform
// On ubuntu-based systems this ends in `.real`
ldconfigPath := fmt.Sprintf("%s", cfg.GetDefault("nvidia-container-cli.ldconfig", "/sbin/ldconfig"))
// Use the driver run root as the root:
driverLdconfigPath := config.NormalizeLDConfigPath("@" + filepath.Join(opts.DriverRoot, strings.TrimPrefix(ldconfigPath, "@/")))
configValues := map[string]interface{}{
// Set the options in the root toml table
"accept-nvidia-visible-devices-envvar-when-unprivileged": opts.acceptNVIDIAVisibleDevicesWhenUnprivileged,
"accept-nvidia-visible-devices-as-volume-mounts": opts.acceptNVIDIAVisibleDevicesAsVolumeMounts,
// Set the nvidia-container-cli options
"nvidia-container-cli.root": opts.DriverRoot,
"nvidia-container-cli.path": nvidiaContainerCliExecutablePath,
"nvidia-container-cli.ldconfig": driverLdconfigPath,
// Set nvidia-ctk options
"nvidia-ctk.path": nvidiaCTKPath,
// Set the nvidia-container-runtime-hook options
"nvidia-container-runtime-hook.path": nvidiaContainerRuntimeHookPath,
"nvidia-container-runtime-hook.skip-mode-detection": opts.ContainerRuntimeHookSkipModeDetection,
}
toolkitRuntimeList := opts.ContainerRuntimeRuntimes.Value()
if len(toolkitRuntimeList) > 0 {
configValues["nvidia-container-runtime.runtimes"] = toolkitRuntimeList
}
for _, optInFeature := range opts.optInFeatures.Value() {
configValues["features."+optInFeature] = true
}
for key, value := range configValues {
cfg.Set(key, value)
}
// Set the optional config options
optionalConfigValues := map[string]interface{}{
"nvidia-container-runtime.debug": opts.ContainerRuntimeDebug,
"nvidia-container-runtime.log-level": opts.ContainerRuntimeLogLevel,
"nvidia-container-runtime.mode": opts.ContainerRuntimeMode,
"nvidia-container-runtime.modes.cdi.annotation-prefixes": opts.ContainerRuntimeModesCDIAnnotationPrefixes,
"nvidia-container-runtime.modes.cdi.default-kind": opts.ContainerRuntimeModesCdiDefaultKind,
"nvidia-container-runtime.runtimes": opts.ContainerRuntimeRuntimes,
"nvidia-container-cli.debug": opts.ContainerCLIDebug,
}
for key, value := range optionalConfigValues {
if !c.IsSet(key) {
t.logger.Infof("Skipping unset option: %v", key)
continue
}
if value == nil {
t.logger.Infof("Skipping option with nil value: %v", key)
continue
}
switch v := value.(type) {
case string:
if v == "" {
continue
}
case cli.StringSlice:
if len(v.Value()) == 0 {
continue
}
value = v.Value()
default:
t.logger.Warningf("Unexpected type for option %v=%v: %T", key, value, v)
}
cfg.Set(key, value)
}
if _, err := cfg.WriteTo(targetConfig); err != nil {
return fmt.Errorf("error writing config: %v", err)
}
os.Stdout.WriteString("Using config:\n")
if _, err = cfg.WriteTo(os.Stdout); err != nil {
t.logger.Warningf("Failed to output config to STDOUT: %v", err)
}
return nil
}
func (t *Installer) createDirectories(dir ...string) error {
for _, d := range dir {
t.logger.Infof("Creating directory '%v'", d)
err := os.MkdirAll(d, 0755)
if err != nil {
return fmt.Errorf("error creating directory: %v", err)
}
}
return nil
}
func (t *Installer) createDeviceNodes(opts *Options) error {
modes := opts.createDeviceNodes.Value()
if len(modes) == 0 {
return nil
}
devices, err := nvdevices.New(
nvdevices.WithDevRoot(opts.DevRootCtrPath),
)
if err != nil {
return fmt.Errorf("failed to create library: %v", err)
}
for _, mode := range modes {
t.logger.Infof("Creating %v device nodes at %v", mode, opts.DevRootCtrPath)
if mode != "control" {
t.logger.Warningf("Unrecognised device mode: %v", mode)
continue
}
if err := devices.CreateNVIDIAControlDevices(); err != nil {
return fmt.Errorf("failed to create control device nodes: %v", err)
}
}
return nil
}
// generateCDISpec generates a CDI spec for use in management containers
func (t *Installer) generateCDISpec(opts *Options, nvidiaCDIHookPath string) error {
if !opts.CDI.Enabled {
return nil
}
t.logger.Info("Generating CDI spec for management containers")
cdilib, err := nvcdi.New(
nvcdi.WithLogger(t.logger),
nvcdi.WithMode(nvcdi.ModeManagement),
nvcdi.WithDriverRoot(opts.DriverRootCtrPath),
nvcdi.WithDevRoot(opts.DevRootCtrPath),
nvcdi.WithNVIDIACDIHookPath(nvidiaCDIHookPath),
nvcdi.WithVendor(opts.CDI.vendor),
nvcdi.WithClass(opts.CDI.class),
)
if err != nil {
return fmt.Errorf("failed to create CDI library for management containers: %v", err)
}
spec, err := cdilib.GetSpec()
if err != nil {
return fmt.Errorf("failed to genereate CDI spec for management containers: %v", err)
}
transformer := transformroot.NewDriverTransformer(
transformroot.WithDriverRoot(opts.DriverRootCtrPath),
transformroot.WithTargetDriverRoot(opts.DriverRoot),
transformroot.WithDevRoot(opts.DevRootCtrPath),
transformroot.WithTargetDevRoot(opts.DevRoot),
)
if err := transformer.Transform(spec.Raw()); err != nil {
return fmt.Errorf("failed to transform driver root in CDI spec: %v", err)
}
name, err := cdi.GenerateNameForSpec(spec.Raw())
if err != nil {
return fmt.Errorf("failed to generate CDI name for management containers: %v", err)
}
err = spec.Save(filepath.Join(opts.CDI.outputDir, name))
if err != nil {
return fmt.Errorf("failed to save CDI spec for management containers: %v", err)
}
return nil
}

View File

@@ -0,0 +1,232 @@
/**
# Copyright 2024 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package toolkit
import (
"fmt"
"os"
"path/filepath"
"strings"
"testing"
testlog "github.com/sirupsen/logrus/hooks/test"
"github.com/stretchr/testify/require"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup/symlinks"
"github.com/NVIDIA/nvidia-container-toolkit/internal/test"
)
func TestInstall(t *testing.T) {
t.Setenv("__NVCT_TESTING_DEVICES_ARE_FILES", "true")
logger, _ := testlog.NewNullLogger()
moduleRoot, err := test.GetModuleRoot()
require.NoError(t, err)
artifactRoot := filepath.Join(moduleRoot, "testdata", "installer", "artifacts")
testCases := []struct {
description string
hostRoot string
packageType string
cdiEnabled bool
expectedError error
expectedCdiSpec string
}{
{
hostRoot: "rootfs-empty",
packageType: "deb",
},
{
hostRoot: "rootfs-empty",
packageType: "rpm",
},
{
hostRoot: "rootfs-empty",
packageType: "deb",
cdiEnabled: true,
expectedError: fmt.Errorf("no NVIDIA device nodes found"),
},
{
hostRoot: "rootfs-1",
packageType: "deb",
cdiEnabled: true,
expectedCdiSpec: `---
cdiVersion: 0.5.0
kind: example.com/class
devices:
- name: all
containerEdits:
deviceNodes:
- path: /dev/nvidia0
hostPath: /host/driver/root/dev/nvidia0
- path: /dev/nvidiactl
hostPath: /host/driver/root/dev/nvidiactl
- path: /dev/nvidia-caps-imex-channels/channel0
hostPath: /host/driver/root/dev/nvidia-caps-imex-channels/channel0
- path: /dev/nvidia-caps-imex-channels/channel1
hostPath: /host/driver/root/dev/nvidia-caps-imex-channels/channel1
- path: /dev/nvidia-caps-imex-channels/channel2047
hostPath: /host/driver/root/dev/nvidia-caps-imex-channels/channel2047
containerEdits:
env:
- NVIDIA_CTK_LIBCUDA_DIR=/lib/x86_64-linux-gnu
- NVIDIA_VISIBLE_DEVICES=void
hooks:
- hookName: createContainer
path: {{ .toolkitRoot }}/nvidia-cdi-hook
args:
- nvidia-cdi-hook
- create-symlinks
- --link
- libcuda.so.1::/lib/x86_64-linux-gnu/libcuda.so
env:
- NVIDIA_CTK_DEBUG=false
- hookName: createContainer
path: {{ .toolkitRoot }}/nvidia-cdi-hook
args:
- nvidia-cdi-hook
- create-soname-symlinks
- --folder
- /lib/x86_64-linux-gnu
env:
- NVIDIA_CTK_DEBUG=false
- hookName: createContainer
path: {{ .toolkitRoot }}/nvidia-cdi-hook
args:
- nvidia-cdi-hook
- update-ldcache
- --folder
- /lib/x86_64-linux-gnu
env:
- NVIDIA_CTK_DEBUG=false
mounts:
- hostPath: /host/driver/root/lib/x86_64-linux-gnu/libcuda.so.999.88.77
containerPath: /lib/x86_64-linux-gnu/libcuda.so.999.88.77
options:
- ro
- nosuid
- nodev
- rbind
- rprivate
`,
},
}
for _, tc := range testCases {
// hostRoot := filepath.Join(moduleRoot, "testdata", "lookup", tc.hostRoot)
t.Run(tc.description, func(t *testing.T) {
testRoot := t.TempDir()
toolkitRoot := filepath.Join(testRoot, "toolkit-test")
cdiOutputDir := filepath.Join(moduleRoot, "toolkit-test", "/var/cdi")
sourceRoot := filepath.Join(artifactRoot, tc.packageType)
options := Options{
DriverRoot: "/host/driver/root",
DriverRootCtrPath: filepath.Join(moduleRoot, "testdata", "lookup", tc.hostRoot),
CDI: cdiOptions{
Enabled: tc.cdiEnabled,
outputDir: cdiOutputDir,
kind: "example.com/class",
},
}
ti := NewInstaller(
WithLogger(logger),
WithToolkitRoot(toolkitRoot),
WithSourceRoot(sourceRoot),
)
require.NoError(t, ti.ValidateOptions(&options))
err := ti.Install(&cli.Context{}, &options)
if tc.expectedError == nil {
require.NoError(t, err)
} else {
require.Contains(t, err.Error(), tc.expectedError.Error())
}
require.DirExists(t, toolkitRoot)
requireSymlink(t, toolkitRoot, "libnvidia-container.so.1", "libnvidia-container.so.99.88.77")
requireSymlink(t, toolkitRoot, "libnvidia-container-go.so.1", "libnvidia-container-go.so.99.88.77")
requireWrappedExecutable(t, toolkitRoot, "nvidia-cdi-hook")
requireWrappedExecutable(t, toolkitRoot, "nvidia-container-cli")
requireWrappedExecutable(t, toolkitRoot, "nvidia-container-runtime")
requireWrappedExecutable(t, toolkitRoot, "nvidia-container-runtime-hook")
requireWrappedExecutable(t, toolkitRoot, "nvidia-container-runtime.cdi")
requireWrappedExecutable(t, toolkitRoot, "nvidia-container-runtime.legacy")
requireWrappedExecutable(t, toolkitRoot, "nvidia-ctk")
requireSymlink(t, toolkitRoot, "nvidia-container-toolkit", "nvidia-container-runtime-hook")
// TODO: Add checks for wrapper contents
// grep -q -E "nvidia driver modules are not yet loaded, invoking runc directly" "${shared_dir}/usr/local/nvidia/toolkit/nvidia-container-runtime"
// grep -q -E "exec runc \".@\"" "${shared_dir}/usr/local/nvidia/toolkit/nvidia-container-runtime"
require.DirExists(t, filepath.Join(toolkitRoot, ".config"))
require.DirExists(t, filepath.Join(toolkitRoot, ".config", "nvidia-container-runtime"))
require.FileExists(t, filepath.Join(toolkitRoot, ".config", "nvidia-container-runtime", "config.toml"))
cfgToml, err := config.New(config.WithConfigFile(filepath.Join(toolkitRoot, ".config", "nvidia-container-runtime", "config.toml")))
require.NoError(t, err)
cfg, err := cfgToml.Config()
require.NoError(t, err)
// Ensure that the config file has the required contents.
// TODO: Add checks for additional config options.
require.Equal(t, "/host/driver/root", cfg.NVIDIAContainerCLIConfig.Root)
require.Equal(t, "@/host/driver/root/sbin/ldconfig", string(cfg.NVIDIAContainerCLIConfig.Ldconfig))
require.EqualValues(t, filepath.Join(toolkitRoot, "nvidia-container-cli"), cfg.NVIDIAContainerCLIConfig.Path)
require.EqualValues(t, filepath.Join(toolkitRoot, "nvidia-ctk"), cfg.NVIDIACTKConfig.Path)
if len(tc.expectedCdiSpec) > 0 {
cdiSpecFile := filepath.Join(cdiOutputDir, "example.com-class.yaml")
require.FileExists(t, cdiSpecFile)
info, err := os.Stat(cdiSpecFile)
require.NoError(t, err)
require.NotZero(t, info.Mode()&0004)
contents, err := os.ReadFile(cdiSpecFile)
require.NoError(t, err)
require.Equal(t, strings.ReplaceAll(tc.expectedCdiSpec, "{{ .toolkitRoot }}", toolkitRoot), string(contents))
}
})
}
}
func requireWrappedExecutable(t *testing.T, toolkitRoot string, expectedExecutable string) {
requireExecutable(t, toolkitRoot, expectedExecutable)
requireExecutable(t, toolkitRoot, expectedExecutable+".real")
}
func requireExecutable(t *testing.T, toolkitRoot string, expectedExecutable string) {
executable := filepath.Join(toolkitRoot, expectedExecutable)
require.FileExists(t, executable)
info, err := os.Lstat(executable)
require.NoError(t, err)
require.Zero(t, info.Mode()&os.ModeSymlink)
require.NotZero(t, info.Mode()&0111)
}
func requireSymlink(t *testing.T, toolkitRoot string, expectedLink string, expectedTarget string) {
link := filepath.Join(toolkitRoot, expectedLink)
require.FileExists(t, link)
target, err := symlinks.Resolve(link)
require.NoError(t, err)
require.Equal(t, expectedTarget, target)
}

View File

@@ -16,9 +16,34 @@ nvidia-ctk runtime configure --set-as-default
will ensure that the NVIDIA Container Runtime is added as the default runtime to the default container
engine.
## Configure the NVIDIA Container Toolkit
The `config` command of the `nvidia-ctk` CLI allows a user to display and manipulate the NVIDIA Container Toolkit
configuration.
For example, running the following command:
```bash
nvidia-ctk config default
```
will display the default config for the detected platform.
Whereas
```bash
nvidia-ctk config
```
will display the effective NVIDIA Container Toolkit config using the configured config file, and running:
Individual config options can be set by specifying these are key-value pairs to the `--set` argument:
```bash
nvidia-ctk config --set nvidia-container-cli.no-cgroups=true
```
By default, all commands output to `STDOUT`, but specifying the `--output` flag writes the config to the specified file.
### Generate CDI specifications
The [Container Device Interface (CDI)](https://github.com/container-orchestrated-devices/container-device-interface) provides
The [Container Device Interface (CDI)](https://tags.cncf.io/container-device-interface) provides
a vendor-agnostic mechanism to make arbitrary devices accessible in containerized environments. To allow NVIDIA devices to be
used in these environments, the NVIDIA Container Toolkit CLI includes functionality to generate a CDI specification for the
available NVIDIA GPUs in a system.

View File

@@ -17,18 +17,20 @@
package cdi
import (
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/cdi/generate"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/cdi/transform"
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/cdi/generate"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/cdi/list"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/cdi/transform"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
)
type command struct {
logger *logrus.Logger
logger logger.Interface
}
// NewCommand constructs an info command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
}
@@ -46,6 +48,7 @@ func (m command) build() *cli.Command {
hook.Subcommands = []*cli.Command{
generate.NewCommand(m.logger),
transform.NewCommand(m.logger),
list.NewCommand(m.logger),
}
return &hook

View File

@@ -22,14 +22,17 @@ import (
"path/filepath"
"strings"
"github.com/NVIDIA/nvidia-container-toolkit/internal/discover"
"github.com/NVIDIA/nvidia-container-toolkit/internal/edits"
"github.com/urfave/cli/v2"
cdi "tags.cncf.io/container-device-interface/pkg/parser"
"github.com/NVIDIA/go-nvml/pkg/nvml"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/internal/platform-support/tegra/csv"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/nvcdi"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/nvcdi/spec"
"github.com/container-orchestrated-devices/container-device-interface/pkg/cdi"
specs "github.com/container-orchestrated-devices/container-device-interface/specs-go"
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/nvcdi/transform"
)
const (
@@ -37,22 +40,36 @@ const (
)
type command struct {
logger *logrus.Logger
logger logger.Interface
}
type config struct {
output string
format string
deviceNameStrategy string
driverRoot string
nvidiaCTKPath string
mode string
vendor string
class string
type options struct {
output string
format string
deviceNameStrategies cli.StringSlice
driverRoot string
devRoot string
nvidiaCDIHookPath string
ldconfigPath string
mode string
vendor string
class string
configSearchPaths cli.StringSlice
librarySearchPaths cli.StringSlice
disabledHooks cli.StringSlice
csv struct {
files cli.StringSlice
ignorePatterns cli.StringSlice
}
// the following are used for dependency injection during spec generation.
nvmllib nvml.Interface
}
// NewCommand constructs a generate-cdi command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
}
@@ -61,127 +78,183 @@ func NewCommand(logger *logrus.Logger) *cli.Command {
// build creates the CLI command
func (m command) build() *cli.Command {
cfg := config{}
opts := options{}
// Create the 'generate-cdi' command
c := cli.Command{
Name: "generate",
Usage: "Generate CDI specifications for use with CDI-enabled runtimes",
Before: func(c *cli.Context) error {
return m.validateFlags(c, &cfg)
return m.validateFlags(c, &opts)
},
Action: func(c *cli.Context) error {
return m.run(c, &cfg)
return m.run(c, &opts)
},
}
c.Flags = []cli.Flag{
&cli.StringSliceFlag{
Name: "config-search-path",
Usage: "Specify the path to search for config files when discovering the entities that should be included in the CDI specification.",
Destination: &opts.configSearchPaths,
EnvVars: []string{"NVIDIA_CTK_CDI_GENERATE_CONFIG_SEARCH_PATHS"},
},
&cli.StringFlag{
Name: "output",
Usage: "Specify the file to output the generated CDI specification to. If this is '' the specification is output to STDOUT",
Destination: &cfg.output,
Destination: &opts.output,
EnvVars: []string{"NVIDIA_CTK_CDI_OUTPUT_FILE_PATH"},
},
&cli.StringFlag{
Name: "format",
Usage: "The output format for the generated spec [json | yaml]. This overrides the format defined by the output file extension (if specified).",
Value: spec.FormatYAML,
Destination: &cfg.format,
Destination: &opts.format,
EnvVars: []string{"NVIDIA_CTK_CDI_GENERATE_OUTPUT_FORMAT"},
},
&cli.StringFlag{
Name: "mode",
Aliases: []string{"discovery-mode"},
Usage: "The mode to use when discovering the available entities. One of [auto | nvml | wsl]. If mode is set to 'auto' the mode will be determined based on the system configuration.",
Value: nvcdi.ModeAuto,
Destination: &cfg.mode,
Name: "mode",
Aliases: []string{"discovery-mode"},
Usage: "The mode to use when discovering the available entities. " +
"One of [" + strings.Join(nvcdi.AllModes[string](), " | ") + "]. " +
"If mode is set to 'auto' the mode will be determined based on the system configuration.",
Value: string(nvcdi.ModeAuto),
Destination: &opts.mode,
EnvVars: []string{"NVIDIA_CTK_CDI_GENERATE_MODE"},
},
&cli.StringFlag{
Name: "dev-root",
Usage: "Specify the root where `/dev` is located. If this is not specified, the driver-root is assumed.",
Destination: &opts.devRoot,
EnvVars: []string{"NVIDIA_CTK_DEV_ROOT"},
},
&cli.StringSliceFlag{
Name: "device-name-strategy",
Usage: "Specify the strategy for generating device names. One of [index | uuid | type-index]",
Value: nvcdi.DeviceNameStrategyIndex,
Destination: &cfg.deviceNameStrategy,
Usage: "Specify the strategy for generating device names. If this is specified multiple times, the devices will be duplicated for each strategy. One of [index | uuid | type-index]",
Value: cli.NewStringSlice(nvcdi.DeviceNameStrategyIndex, nvcdi.DeviceNameStrategyUUID),
Destination: &opts.deviceNameStrategies,
EnvVars: []string{"NVIDIA_CTK_CDI_GENERATE_DEVICE_NAME_STRATEGIES"},
},
&cli.StringFlag{
Name: "driver-root",
Usage: "Specify the NVIDIA GPU driver root to use when discovering the entities that should be included in the CDI specification.",
Destination: &cfg.driverRoot,
Destination: &opts.driverRoot,
EnvVars: []string{"NVIDIA_CTK_DRIVER_ROOT"},
},
&cli.StringSliceFlag{
Name: "library-search-path",
Usage: "Specify the path to search for libraries when discovering the entities that should be included in the CDI specification.\n\tNote: This option only applies to CSV mode.",
Destination: &opts.librarySearchPaths,
EnvVars: []string{"NVIDIA_CTK_CDI_GENERATE_LIBRARY_SEARCH_PATHS"},
},
&cli.StringFlag{
Name: "nvidia-ctk-path",
Usage: "Specify the path to use for the nvidia-ctk in the generated CDI specification. If this is left empty, the path will be searched.",
Destination: &cfg.nvidiaCTKPath,
Name: "nvidia-cdi-hook-path",
Aliases: []string{"nvidia-ctk-path"},
Usage: "Specify the path to use for the nvidia-cdi-hook in the generated CDI specification. " +
"If not specified, the PATH will be searched for `nvidia-cdi-hook`. " +
"NOTE: That if this is specified as `nvidia-ctk`, the PATH will be searched for `nvidia-ctk` instead.",
Destination: &opts.nvidiaCDIHookPath,
EnvVars: []string{"NVIDIA_CTK_CDI_HOOK_PATH"},
},
&cli.StringFlag{
Name: "ldconfig-path",
Usage: "Specify the path to use for ldconfig in the generated CDI specification",
Destination: &opts.ldconfigPath,
EnvVars: []string{"NVIDIA_CTK_CDI_GENERATE_LDCONFIG_PATH"},
},
&cli.StringFlag{
Name: "vendor",
Aliases: []string{"cdi-vendor"},
Usage: "the vendor string to use for the generated CDI specification.",
Value: "nvidia.com",
Destination: &cfg.vendor,
Destination: &opts.vendor,
EnvVars: []string{"NVIDIA_CTK_CDI_GENERATE_VENDOR"},
},
&cli.StringFlag{
Name: "class",
Aliases: []string{"cdi-class"},
Usage: "the class string to use for the generated CDI specification.",
Value: "gpu",
Destination: &cfg.class,
Destination: &opts.class,
EnvVars: []string{"NVIDIA_CTK_CDI_GENERATE_CLASS"},
},
&cli.StringSliceFlag{
Name: "csv.file",
Usage: "The path to the list of CSV files to use when generating the CDI specification in CSV mode.",
Value: cli.NewStringSlice(csv.DefaultFileList()...),
Destination: &opts.csv.files,
EnvVars: []string{"NVIDIA_CTK_CDI_GENERATE_CSV_FILES"},
},
&cli.StringSliceFlag{
Name: "csv.ignore-pattern",
Usage: "specify a pattern the CSV mount specifications.",
Destination: &opts.csv.ignorePatterns,
EnvVars: []string{"NVIDIA_CTK_CDI_GENERATE_CSV_IGNORE_PATTERNS"},
},
&cli.StringSliceFlag{
Name: "disable-hook",
Aliases: []string{"disable-hooks"},
Usage: "specify a specific hook to skip when generating CDI " +
"specifications. This can be specified multiple times and the " +
"special hook name 'all' can be used ensure that the generated " +
"CDI specification does not include any hooks.",
Destination: &opts.disabledHooks,
EnvVars: []string{"NVIDIA_CTK_CDI_GENERATE_DISABLED_HOOKS"},
},
}
return &c
}
func (m command) validateFlags(c *cli.Context, cfg *config) error {
cfg.format = strings.ToLower(cfg.format)
switch cfg.format {
func (m command) validateFlags(c *cli.Context, opts *options) error {
opts.format = strings.ToLower(opts.format)
switch opts.format {
case spec.FormatJSON:
case spec.FormatYAML:
default:
return fmt.Errorf("invalid output format: %v", cfg.format)
return fmt.Errorf("invalid output format: %v", opts.format)
}
cfg.mode = strings.ToLower(cfg.mode)
switch cfg.mode {
case nvcdi.ModeAuto:
case nvcdi.ModeNvml:
case nvcdi.ModeWsl:
case nvcdi.ModeManagement:
default:
return fmt.Errorf("invalid discovery mode: %v", cfg.mode)
opts.mode = strings.ToLower(opts.mode)
if !nvcdi.IsValidMode(opts.mode) {
return fmt.Errorf("invalid discovery mode: %v", opts.mode)
}
_, err := nvcdi.NewDeviceNamer(cfg.deviceNameStrategy)
if err != nil {
return err
}
cfg.nvidiaCTKPath = discover.FindNvidiaCTK(m.logger, cfg.nvidiaCTKPath)
if outputFileFormat := formatFromFilename(cfg.output); outputFileFormat != "" {
m.logger.Debugf("Inferred output format as %q from output file name", outputFileFormat)
if !c.IsSet("format") {
cfg.format = outputFileFormat
} else if outputFileFormat != cfg.format {
m.logger.Warningf("Requested output format %q does not match format implied by output file name: %q", cfg.format, outputFileFormat)
for _, strategy := range opts.deviceNameStrategies.Value() {
_, err := nvcdi.NewDeviceNamer(strategy)
if err != nil {
return err
}
}
if err := cdi.ValidateVendorName(cfg.vendor); err != nil {
opts.nvidiaCDIHookPath = config.ResolveNVIDIACDIHookPath(m.logger, opts.nvidiaCDIHookPath)
if outputFileFormat := formatFromFilename(opts.output); outputFileFormat != "" {
m.logger.Debugf("Inferred output format as %q from output file name", outputFileFormat)
if !c.IsSet("format") {
opts.format = outputFileFormat
} else if outputFileFormat != opts.format {
m.logger.Warningf("Requested output format %q does not match format implied by output file name: %q", opts.format, outputFileFormat)
}
}
if err := cdi.ValidateVendorName(opts.vendor); err != nil {
return fmt.Errorf("invalid CDI vendor name: %v", err)
}
if err := cdi.ValidateClassName(cfg.class); err != nil {
if err := cdi.ValidateClassName(opts.class); err != nil {
return fmt.Errorf("invalid CDI class name: %v", err)
}
return nil
}
func (m command) run(c *cli.Context, cfg *config) error {
spec, err := m.generateSpec(cfg)
func (m command) run(c *cli.Context, opts *options) error {
spec, err := m.generateSpec(opts)
if err != nil {
return fmt.Errorf("failed to generate CDI spec: %v", err)
}
m.logger.Infof("Generated CDI spec with version %v", spec.Raw().Version)
if cfg.output == "" {
if opts.output == "" {
_, err := spec.WriteTo(os.Stdout)
if err != nil {
return fmt.Errorf("failed to write CDI spec to STDOUT: %v", err)
@@ -189,7 +262,7 @@ func (m command) run(c *cli.Context, cfg *config) error {
return nil
}
return spec.Save(cfg.output)
return spec.Save(opts.output)
}
func formatFromFilename(filename string) string {
@@ -204,19 +277,37 @@ func formatFromFilename(filename string) string {
return ""
}
func (m command) generateSpec(cfg *config) (spec.Interface, error) {
deviceNamer, err := nvcdi.NewDeviceNamer(cfg.deviceNameStrategy)
if err != nil {
return nil, fmt.Errorf("failed to create device namer: %v", err)
func (m command) generateSpec(opts *options) (spec.Interface, error) {
var deviceNamers []nvcdi.DeviceNamer
for _, strategy := range opts.deviceNameStrategies.Value() {
deviceNamer, err := nvcdi.NewDeviceNamer(strategy)
if err != nil {
return nil, fmt.Errorf("failed to create device namer: %v", err)
}
deviceNamers = append(deviceNamers, deviceNamer)
}
cdilib, err := nvcdi.New(
cdiOptions := []nvcdi.Option{
nvcdi.WithLogger(m.logger),
nvcdi.WithDriverRoot(cfg.driverRoot),
nvcdi.WithNVIDIACTKPath(cfg.nvidiaCTKPath),
nvcdi.WithDeviceNamer(deviceNamer),
nvcdi.WithMode(string(cfg.mode)),
)
nvcdi.WithDriverRoot(opts.driverRoot),
nvcdi.WithDevRoot(opts.devRoot),
nvcdi.WithNVIDIACDIHookPath(opts.nvidiaCDIHookPath),
nvcdi.WithLdconfigPath(opts.ldconfigPath),
nvcdi.WithDeviceNamers(deviceNamers...),
nvcdi.WithMode(opts.mode),
nvcdi.WithConfigSearchPaths(opts.configSearchPaths.Value()),
nvcdi.WithLibrarySearchPaths(opts.librarySearchPaths.Value()),
nvcdi.WithCSVFiles(opts.csv.files.Value()),
nvcdi.WithCSVIgnorePatterns(opts.csv.ignorePatterns.Value()),
// We set the following to allow for dependency injection:
nvcdi.WithNvmlLib(opts.nvmllib),
}
for _, hook := range opts.disabledHooks.Value() {
cdiOptions = append(cdiOptions, nvcdi.WithDisabledHook(hook))
}
cdilib, err := nvcdi.New(cdiOptions...)
if err != nil {
return nil, fmt.Errorf("failed to create CDI library: %v", err)
}
@@ -225,20 +316,6 @@ func (m command) generateSpec(cfg *config) (spec.Interface, error) {
if err != nil {
return nil, fmt.Errorf("failed to create device CDI specs: %v", err)
}
var hasAll bool
for _, deviceSpec := range deviceSpecs {
if deviceSpec.Name == allDeviceName {
hasAll = true
break
}
}
if !hasAll {
allDevice, err := MergeDeviceSpecs(deviceSpecs, allDeviceName)
if err != nil {
return nil, fmt.Errorf("failed to create CDI specification for %q device: %v", allDeviceName, err)
}
deviceSpecs = append(deviceSpecs, allDevice)
}
commonEdits, err := cdilib.GetCommonEdits()
if err != nil {
@@ -246,38 +323,15 @@ func (m command) generateSpec(cfg *config) (spec.Interface, error) {
}
return spec.New(
spec.WithVendor(cfg.vendor),
spec.WithClass(cfg.class),
spec.WithVendor(opts.vendor),
spec.WithClass(opts.class),
spec.WithDeviceSpecs(deviceSpecs),
spec.WithEdits(*commonEdits.ContainerEdits),
spec.WithFormat(cfg.format),
spec.WithFormat(opts.format),
spec.WithMergedDeviceOptions(
transform.WithName(allDeviceName),
transform.WithSkipIfExists(true),
),
spec.WithPermissions(0644),
)
}
// MergeDeviceSpecs creates a device with the specified name which combines the edits from the previous devices.
// If a device of the specified name already exists, an error is returned.
func MergeDeviceSpecs(deviceSpecs []specs.Device, mergedDeviceName string) (specs.Device, error) {
if err := cdi.ValidateDeviceName(mergedDeviceName); err != nil {
return specs.Device{}, fmt.Errorf("invalid device name %q: %v", mergedDeviceName, err)
}
for _, d := range deviceSpecs {
if d.Name == mergedDeviceName {
return specs.Device{}, fmt.Errorf("device %q already exists", mergedDeviceName)
}
}
mergedEdits := edits.NewContainerEdits()
for _, d := range deviceSpecs {
edit := cdi.ContainerEdits{
ContainerEdits: &d.ContainerEdits,
}
mergedEdits.Append(&edit)
}
merged := specs.Device{
Name: mergedDeviceName,
ContainerEdits: *mergedEdits.ContainerEdits,
}
return merged, nil
}

View File

@@ -1,5 +1,5 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -17,101 +17,386 @@
package generate
import (
"fmt"
"bytes"
"path/filepath"
"strings"
"testing"
"github.com/container-orchestrated-devices/container-device-interface/specs-go"
"github.com/NVIDIA/go-nvml/pkg/nvml"
"github.com/NVIDIA/go-nvml/pkg/nvml/mock/dgxa100"
testlog "github.com/sirupsen/logrus/hooks/test"
"github.com/stretchr/testify/require"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/internal/test"
)
func TestMergeDeviceSpecs(t *testing.T) {
func TestGenerateSpec(t *testing.T) {
t.Setenv("__NVCT_TESTING_DEVICES_ARE_FILES", "true")
moduleRoot, err := test.GetModuleRoot()
require.NoError(t, err)
driverRoot := filepath.Join(moduleRoot, "testdata", "lookup", "rootfs-1")
logger, _ := testlog.NewNullLogger()
testCases := []struct {
description string
deviceSpecs []specs.Device
mergedDeviceName string
expectedError error
expected specs.Device
description string
options options
expectedValidateError error
expectedOptions options
expectedError error
expectedSpec string
}{
{
description: "no devices",
mergedDeviceName: "all",
expected: specs.Device{
Name: "all",
description: "default",
options: options{
format: "yaml",
mode: "nvml",
vendor: "example.com",
class: "device",
driverRoot: driverRoot,
},
expectedOptions: options{
format: "yaml",
mode: "nvml",
vendor: "example.com",
class: "device",
nvidiaCDIHookPath: "/usr/bin/nvidia-cdi-hook",
driverRoot: driverRoot,
},
expectedSpec: `---
cdiVersion: 0.5.0
kind: example.com/device
devices:
- name: "0"
containerEdits:
deviceNodes:
- path: /dev/nvidia0
hostPath: {{ .driverRoot }}/dev/nvidia0
- name: all
containerEdits:
deviceNodes:
- path: /dev/nvidia0
hostPath: {{ .driverRoot }}/dev/nvidia0
containerEdits:
env:
- NVIDIA_CTK_LIBCUDA_DIR=/lib/x86_64-linux-gnu
- NVIDIA_VISIBLE_DEVICES=void
deviceNodes:
- path: /dev/nvidiactl
hostPath: {{ .driverRoot }}/dev/nvidiactl
hooks:
- hookName: createContainer
path: /usr/bin/nvidia-cdi-hook
args:
- nvidia-cdi-hook
- create-symlinks
- --link
- libcuda.so.1::/lib/x86_64-linux-gnu/libcuda.so
env:
- NVIDIA_CTK_DEBUG=false
- hookName: createContainer
path: /usr/bin/nvidia-cdi-hook
args:
- nvidia-cdi-hook
- enable-cuda-compat
- --host-driver-version=999.88.77
env:
- NVIDIA_CTK_DEBUG=false
- hookName: createContainer
path: /usr/bin/nvidia-cdi-hook
args:
- nvidia-cdi-hook
- create-soname-symlinks
- --folder
- /lib/x86_64-linux-gnu
env:
- NVIDIA_CTK_DEBUG=false
- hookName: createContainer
path: /usr/bin/nvidia-cdi-hook
args:
- nvidia-cdi-hook
- update-ldcache
- --folder
- /lib/x86_64-linux-gnu
env:
- NVIDIA_CTK_DEBUG=false
- hookName: createContainer
path: /usr/bin/nvidia-cdi-hook
args:
- nvidia-cdi-hook
- disable-device-node-modification
env:
- NVIDIA_CTK_DEBUG=false
mounts:
- hostPath: {{ .driverRoot }}/lib/x86_64-linux-gnu/libcuda.so.999.88.77
containerPath: /lib/x86_64-linux-gnu/libcuda.so.999.88.77
options:
- ro
- nosuid
- nodev
- rbind
- rprivate
`,
},
{
description: "one device",
mergedDeviceName: "all",
deviceSpecs: []specs.Device{
{
Name: "gpu0",
ContainerEdits: specs.ContainerEdits{
Env: []string{"GPU=0"},
},
},
description: "disableHooks1",
options: options{
format: "yaml",
mode: "nvml",
vendor: "example.com",
class: "device",
driverRoot: driverRoot,
disabledHooks: valueOf(cli.NewStringSlice("enable-cuda-compat")),
},
expected: specs.Device{
Name: "all",
ContainerEdits: specs.ContainerEdits{
Env: []string{"GPU=0"},
},
expectedOptions: options{
format: "yaml",
mode: "nvml",
vendor: "example.com",
class: "device",
nvidiaCDIHookPath: "/usr/bin/nvidia-cdi-hook",
driverRoot: driverRoot,
disabledHooks: valueOf(cli.NewStringSlice("enable-cuda-compat")),
},
expectedSpec: `---
cdiVersion: 0.5.0
kind: example.com/device
devices:
- name: "0"
containerEdits:
deviceNodes:
- path: /dev/nvidia0
hostPath: {{ .driverRoot }}/dev/nvidia0
- name: all
containerEdits:
deviceNodes:
- path: /dev/nvidia0
hostPath: {{ .driverRoot }}/dev/nvidia0
containerEdits:
env:
- NVIDIA_CTK_LIBCUDA_DIR=/lib/x86_64-linux-gnu
- NVIDIA_VISIBLE_DEVICES=void
deviceNodes:
- path: /dev/nvidiactl
hostPath: {{ .driverRoot }}/dev/nvidiactl
hooks:
- hookName: createContainer
path: /usr/bin/nvidia-cdi-hook
args:
- nvidia-cdi-hook
- create-symlinks
- --link
- libcuda.so.1::/lib/x86_64-linux-gnu/libcuda.so
env:
- NVIDIA_CTK_DEBUG=false
- hookName: createContainer
path: /usr/bin/nvidia-cdi-hook
args:
- nvidia-cdi-hook
- create-soname-symlinks
- --folder
- /lib/x86_64-linux-gnu
env:
- NVIDIA_CTK_DEBUG=false
- hookName: createContainer
path: /usr/bin/nvidia-cdi-hook
args:
- nvidia-cdi-hook
- update-ldcache
- --folder
- /lib/x86_64-linux-gnu
env:
- NVIDIA_CTK_DEBUG=false
- hookName: createContainer
path: /usr/bin/nvidia-cdi-hook
args:
- nvidia-cdi-hook
- disable-device-node-modification
env:
- NVIDIA_CTK_DEBUG=false
mounts:
- hostPath: {{ .driverRoot }}/lib/x86_64-linux-gnu/libcuda.so.999.88.77
containerPath: /lib/x86_64-linux-gnu/libcuda.so.999.88.77
options:
- ro
- nosuid
- nodev
- rbind
- rprivate
`,
},
{
description: "two devices",
mergedDeviceName: "all",
deviceSpecs: []specs.Device{
{
Name: "gpu0",
ContainerEdits: specs.ContainerEdits{
Env: []string{"GPU=0"},
},
},
{
Name: "gpu1",
ContainerEdits: specs.ContainerEdits{
Env: []string{"GPU=1"},
},
},
description: "disableHooks2",
options: options{
format: "yaml",
mode: "nvml",
vendor: "example.com",
class: "device",
driverRoot: driverRoot,
disabledHooks: valueOf(cli.NewStringSlice("enable-cuda-compat", "update-ldcache")),
},
expected: specs.Device{
Name: "all",
ContainerEdits: specs.ContainerEdits{
Env: []string{"GPU=0", "GPU=1"},
},
expectedOptions: options{
format: "yaml",
mode: "nvml",
vendor: "example.com",
class: "device",
nvidiaCDIHookPath: "/usr/bin/nvidia-cdi-hook",
driverRoot: driverRoot,
disabledHooks: valueOf(cli.NewStringSlice("enable-cuda-compat", "update-ldcache")),
},
expectedSpec: `---
cdiVersion: 0.5.0
kind: example.com/device
devices:
- name: "0"
containerEdits:
deviceNodes:
- path: /dev/nvidia0
hostPath: {{ .driverRoot }}/dev/nvidia0
- name: all
containerEdits:
deviceNodes:
- path: /dev/nvidia0
hostPath: {{ .driverRoot }}/dev/nvidia0
containerEdits:
env:
- NVIDIA_CTK_LIBCUDA_DIR=/lib/x86_64-linux-gnu
- NVIDIA_VISIBLE_DEVICES=void
deviceNodes:
- path: /dev/nvidiactl
hostPath: {{ .driverRoot }}/dev/nvidiactl
hooks:
- hookName: createContainer
path: /usr/bin/nvidia-cdi-hook
args:
- nvidia-cdi-hook
- create-symlinks
- --link
- libcuda.so.1::/lib/x86_64-linux-gnu/libcuda.so
env:
- NVIDIA_CTK_DEBUG=false
- hookName: createContainer
path: /usr/bin/nvidia-cdi-hook
args:
- nvidia-cdi-hook
- create-soname-symlinks
- --folder
- /lib/x86_64-linux-gnu
env:
- NVIDIA_CTK_DEBUG=false
- hookName: createContainer
path: /usr/bin/nvidia-cdi-hook
args:
- nvidia-cdi-hook
- disable-device-node-modification
env:
- NVIDIA_CTK_DEBUG=false
mounts:
- hostPath: {{ .driverRoot }}/lib/x86_64-linux-gnu/libcuda.so.999.88.77
containerPath: /lib/x86_64-linux-gnu/libcuda.so.999.88.77
options:
- ro
- nosuid
- nodev
- rbind
- rprivate
`,
},
{
description: "has merged device",
mergedDeviceName: "gpu0",
deviceSpecs: []specs.Device{
{
Name: "gpu0",
ContainerEdits: specs.ContainerEdits{
Env: []string{"GPU=0"},
},
},
description: "disableHooksAll",
options: options{
format: "yaml",
mode: "nvml",
vendor: "example.com",
class: "device",
driverRoot: driverRoot,
disabledHooks: valueOf(cli.NewStringSlice("all")),
},
expectedError: fmt.Errorf("device %q already exists", "gpu0"),
},
{
description: "invalid merged device name",
mergedDeviceName: ".-not-valid",
expectedError: fmt.Errorf("invalid device name %q", ".-not-valid"),
expectedOptions: options{
format: "yaml",
mode: "nvml",
vendor: "example.com",
class: "device",
nvidiaCDIHookPath: "/usr/bin/nvidia-cdi-hook",
driverRoot: driverRoot,
disabledHooks: valueOf(cli.NewStringSlice("all")),
},
expectedSpec: `---
cdiVersion: 0.5.0
kind: example.com/device
devices:
- name: "0"
containerEdits:
deviceNodes:
- path: /dev/nvidia0
hostPath: {{ .driverRoot }}/dev/nvidia0
- name: all
containerEdits:
deviceNodes:
- path: /dev/nvidia0
hostPath: {{ .driverRoot }}/dev/nvidia0
containerEdits:
env:
- NVIDIA_CTK_LIBCUDA_DIR=/lib/x86_64-linux-gnu
- NVIDIA_VISIBLE_DEVICES=void
deviceNodes:
- path: /dev/nvidiactl
hostPath: {{ .driverRoot }}/dev/nvidiactl
mounts:
- hostPath: {{ .driverRoot }}/lib/x86_64-linux-gnu/libcuda.so.999.88.77
containerPath: /lib/x86_64-linux-gnu/libcuda.so.999.88.77
options:
- ro
- nosuid
- nodev
- rbind
- rprivate
`,
},
}
for _, tc := range testCases {
t.Run(tc.description, func(t *testing.T) {
mergedDevice, err := MergeDeviceSpecs(tc.deviceSpecs, tc.mergedDeviceName)
if tc.expectedError != nil {
require.Error(t, err)
return
c := command{
logger: logger,
}
err := c.validateFlags(nil, &tc.options)
require.ErrorIs(t, err, tc.expectedValidateError)
require.EqualValues(t, tc.expectedOptions, tc.options)
// Set up a mock server, reusing the DGX A100 mock.
server := dgxa100.New()
// Override the driver version to match the version in our mock filesystem.
server.SystemGetDriverVersionFunc = func() (string, nvml.Return) {
return "999.88.77", nvml.SUCCESS
}
// Set the device count to 1 explicitly since we only have a single device node.
server.DeviceGetCountFunc = func() (int, nvml.Return) {
return 1, nvml.SUCCESS
}
for _, d := range server.Devices {
// TODO: This is not implemented in the mock.
(d.(*dgxa100.Device)).GetMaxMigDeviceCountFunc = func() (int, nvml.Return) {
return 0, nvml.SUCCESS
}
}
tc.options.nvmllib = server
spec, err := c.generateSpec(&tc.options)
require.ErrorIs(t, err, tc.expectedError)
var buf bytes.Buffer
_, err = spec.WriteTo(&buf)
require.NoError(t, err)
require.EqualValues(t, tc.expected, mergedDevice)
require.Equal(t, strings.ReplaceAll(tc.expectedSpec, "{{ .driverRoot }}", driverRoot), buf.String())
})
}
}
// valueOf returns the value of a pointer.
// Note that this does not check for a nil pointer and is only used for testing.
func valueOf[T any](v *T) T {
return *v
}

View File

@@ -0,0 +1,105 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package list
import (
"errors"
"fmt"
"github.com/urfave/cli/v2"
"tags.cncf.io/container-device-interface/pkg/cdi"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
)
type command struct {
logger logger.Interface
}
type config struct {
cdiSpecDirs cli.StringSlice
}
// NewCommand constructs a cdi list command with the specified logger
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
}
return c.build()
}
// build creates the CLI command
func (m command) build() *cli.Command {
cfg := config{}
// Create the command
c := cli.Command{
Name: "list",
Usage: "List the available CDI devices",
Before: func(c *cli.Context) error {
return m.validateFlags(c, &cfg)
},
Action: func(c *cli.Context) error {
return m.run(c, &cfg)
},
}
c.Flags = []cli.Flag{
&cli.StringSliceFlag{
Name: "spec-dir",
Usage: "specify the directories to scan for CDI specifications",
Value: cli.NewStringSlice(cdi.DefaultSpecDirs...),
Destination: &cfg.cdiSpecDirs,
EnvVars: []string{"NVIDIA_CTK_CDI_SPEC_DIRS"},
},
}
return &c
}
func (m command) validateFlags(c *cli.Context, cfg *config) error {
if len(cfg.cdiSpecDirs.Value()) == 0 {
return errors.New("at least one CDI specification directory must be specified")
}
return nil
}
func (m command) run(c *cli.Context, cfg *config) error {
registry, err := cdi.NewCache(
cdi.WithAutoRefresh(false),
cdi.WithSpecDirs(cfg.cdiSpecDirs.Value()...),
)
if err != nil {
return fmt.Errorf("failed to create CDI cache: %v", err)
}
_ = registry.Refresh()
if errors := registry.GetErrors(); len(errors) > 0 {
m.logger.Warningf("The following registry errors were reported:")
for k, err := range errors {
m.logger.Warningf("%v: %v", k, err)
}
}
devices := registry.ListDevices()
m.logger.Infof("Found %d CDI devices", len(devices))
for _, device := range devices {
fmt.Printf("%s\n", device)
}
return nil
}

View File

@@ -21,20 +21,16 @@ import (
"io"
"os"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/nvcdi/spec"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/nvcdi/transform"
"github.com/container-orchestrated-devices/container-device-interface/pkg/cdi"
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
"tags.cncf.io/container-device-interface/pkg/cdi"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/nvcdi/spec"
transformroot "github.com/NVIDIA/nvidia-container-toolkit/pkg/nvcdi/transform/root"
)
type loadSaver interface {
Load() (spec.Interface, error)
Save(spec.Interface) error
}
type command struct {
logger *logrus.Logger
logger logger.Interface
}
type transformOptions struct {
@@ -44,12 +40,13 @@ type transformOptions struct {
type options struct {
transformOptions
from string
to string
from string
to string
relativeTo string
}
// NewCommand constructs a generate-cdi command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
}
@@ -72,6 +69,11 @@ func (m command) build() *cli.Command {
}
c.Flags = []cli.Flag{
&cli.StringFlag{
Name: "from",
Usage: "specify the root to be transformed",
Destination: &opts.from,
},
&cli.StringFlag{
Name: "input",
Usage: "Specify the file to read the CDI specification from. If this is '-' the specification is read from STDIN",
@@ -84,9 +86,10 @@ func (m command) build() *cli.Command {
Destination: &opts.output,
},
&cli.StringFlag{
Name: "from",
Usage: "specify the root to be transformed",
Destination: &opts.from,
Name: "relative-to",
Usage: "specify whether the transform is relative to the host or to the container. One of [ host | container ]",
Value: "host",
Destination: &opts.relativeTo,
},
&cli.StringFlag{
Name: "to",
@@ -100,6 +103,12 @@ func (m command) build() *cli.Command {
}
func (m command) validateFlags(c *cli.Context, opts *options) error {
switch opts.relativeTo {
case "host":
case "container":
default:
return fmt.Errorf("invalid --relative-to value: %v", opts.relativeTo)
}
return nil
}
@@ -109,9 +118,10 @@ func (m command) run(c *cli.Context, opts *options) error {
return fmt.Errorf("failed to load CDI specification: %w", err)
}
err = transform.NewRootTransformer(
opts.from,
opts.to,
err = transformroot.New(
transformroot.WithRoot(opts.from),
transformroot.WithTargetRoot(opts.to),
transformroot.WithRelativeTo(opts.relativeTo),
).Transform(spec.Raw())
if err != nil {
return fmt.Errorf("failed to transform CDI specification: %w", err)

View File

@@ -17,17 +17,18 @@
package transform
import (
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/cdi/transform/root"
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/cdi/transform/root"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
)
type command struct {
logger *logrus.Logger
logger logger.Interface
}
// NewCommand constructs a command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
}

View File

@@ -0,0 +1,251 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package config
import (
"errors"
"fmt"
"reflect"
"strconv"
"strings"
"github.com/urfave/cli/v2"
createdefault "github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/config/create-default"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/config/flags"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
)
type command struct {
logger logger.Interface
}
// options stores the subcommand options
type options struct {
flags.Options
setListSeparator string
sets cli.StringSlice
}
// NewCommand constructs an config command with the specified logger
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
}
return c.build()
}
// build
func (m command) build() *cli.Command {
opts := options{}
// Create the 'config' command
c := cli.Command{
Name: "config",
Usage: "Interact with the NVIDIA Container Toolkit configuration",
Before: func(ctx *cli.Context) error {
return validateFlags(ctx, &opts)
},
Action: func(ctx *cli.Context) error {
return run(ctx, &opts)
},
}
c.Flags = []cli.Flag{
&cli.StringFlag{
Name: "config-file",
Aliases: []string{"config", "c"},
Usage: "Specify the config file to modify.",
Value: config.GetConfigFilePath(),
Destination: &opts.Config,
},
&cli.StringSliceFlag{
Name: "set",
Usage: "Set a config value using the pattern 'key[=value]'. " +
"Specifying only 'key' is equivalent to 'key=true' for boolean settings. " +
"This flag can be specified multiple times, but only the last value for a specific " +
"config option is applied. " +
"If the setting represents a list, the elements are colon-separated.",
Destination: &opts.sets,
},
&cli.StringFlag{
Name: "set-list-separator",
Usage: "Specify a separator for lists applied using the set command.",
Hidden: true,
Value: ":",
Destination: &opts.setListSeparator,
},
&cli.BoolFlag{
Name: "in-place",
Aliases: []string{"i"},
Usage: "Modify the config file in-place",
Destination: &opts.InPlace,
},
&cli.StringFlag{
Name: "output",
Aliases: []string{"o"},
Usage: "Specify the output file to write to; If not specified, the output is written to stdout",
Destination: &opts.Output,
},
}
c.Subcommands = []*cli.Command{
createdefault.NewCommand(m.logger),
}
return &c
}
func validateFlags(c *cli.Context, opts *options) error {
if opts.setListSeparator == "" {
return fmt.Errorf("set-list-separator must be set")
}
return nil
}
func run(c *cli.Context, opts *options) error {
cfgToml, err := config.New(
config.WithConfigFile(opts.Config),
)
if err != nil {
return fmt.Errorf("unable to create config: %v", err)
}
for _, set := range opts.sets.Value() {
key, value, err := setFlagToKeyValue(set, opts.setListSeparator)
if err != nil {
return fmt.Errorf("invalid --set option %v: %w", set, err)
}
if value == nil {
_ = cfgToml.Delete(key)
} else {
cfgToml.Set(key, value)
}
}
if err := opts.EnsureOutputFolder(); err != nil {
return fmt.Errorf("failed to create output directory: %v", err)
}
output, err := opts.CreateOutput()
if err != nil {
return fmt.Errorf("failed to open output file: %v", err)
}
defer output.Close()
if _, err := cfgToml.Save(output); err != nil {
return fmt.Errorf("failed to save config: %v", err)
}
return nil
}
var errInvalidConfigOption = errors.New("invalid config option")
var errUndefinedField = errors.New("undefined field")
var errInvalidFormat = errors.New("invalid format")
// setFlagToKeyValue converts a --set flag to a key-value pair.
// The set flag is of the form key[=value], with the value being optional if key refers to a
// boolean config option.
func setFlagToKeyValue(setFlag string, setListSeparator string) (string, interface{}, error) {
setParts := strings.SplitN(setFlag, "=", 2)
key := setParts[0]
field, err := getField(key)
if err != nil {
return key, nil, fmt.Errorf("%w: %w", errInvalidConfigOption, err)
}
kind := field.Kind()
if len(setParts) != 2 {
if kind == reflect.Bool || (kind == reflect.Pointer && field.Elem().Kind() == reflect.Bool) {
return key, true, nil
}
return key, nil, fmt.Errorf("%w: expected key=value; got %v", errInvalidFormat, setFlag)
}
value := setParts[1]
if kind == reflect.Pointer && value != "nil" {
kind = field.Elem().Kind()
}
switch kind {
case reflect.Pointer:
return key, nil, nil
case reflect.Bool:
b, err := strconv.ParseBool(value)
if err != nil {
return key, value, fmt.Errorf("%w: %w", errInvalidFormat, err)
}
return key, b, nil
case reflect.String:
return key, value, nil
case reflect.Slice:
valueParts := []string{value}
for _, sep := range []string{setListSeparator, ","} {
if !strings.Contains(value, sep) {
continue
}
valueParts = strings.Split(value, sep)
break
}
switch field.Elem().Kind() {
case reflect.String:
return key, valueParts, nil
case reflect.Int:
var output []int64
for _, v := range valueParts {
vi, err := strconv.ParseInt(v, 10, 0)
if err != nil {
return key, nil, fmt.Errorf("%w: %w", errInvalidFormat, err)
}
output = append(output, vi)
}
return key, output, nil
}
}
return key, nil, fmt.Errorf("unsupported type for %v (%v)", setParts, kind)
}
func getField(key string) (reflect.Type, error) {
s, err := getStruct(reflect.TypeOf(config.Config{}), strings.Split(key, ".")...)
if err != nil {
return nil, err
}
return s.Type, err
}
func getStruct(current reflect.Type, paths ...string) (reflect.StructField, error) {
if len(paths) < 1 {
return reflect.StructField{}, fmt.Errorf("%w: no fields selected", errUndefinedField)
}
tomlField := paths[0]
for i := 0; i < current.NumField(); i++ {
f := current.Field(i)
v, ok := f.Tag.Lookup("toml")
if !ok {
continue
}
if strings.SplitN(v, ",", 2)[0] != tomlField {
continue
}
if len(paths) == 1 {
return f, nil
}
return getStruct(f.Type, paths[1:]...)
}
return reflect.StructField{}, fmt.Errorf("%w: %q", errUndefinedField, tomlField)
}

View File

@@ -0,0 +1,143 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package config
import (
"testing"
"github.com/stretchr/testify/require"
)
func TestSetFlagToKeyValue(t *testing.T) {
// TODO: We need to enable this test again since switching to reflect.
testCases := []struct {
description string
setFlag string
setListSeparator string
expectedKey string
expectedValue interface{}
expectedError error
}{
{
description: "option not present returns an error",
setFlag: "undefined=new-value",
expectedKey: "undefined",
expectedError: errInvalidConfigOption,
},
{
description: "undefined nexted option returns error",
setFlag: "nvidia-container-cli.undefined",
expectedKey: "nvidia-container-cli.undefined",
expectedError: errInvalidConfigOption,
},
{
description: "boolean option assumes true",
setFlag: "disable-require",
expectedKey: "disable-require",
expectedValue: true,
},
{
description: "boolean option returns true",
setFlag: "disable-require=true",
expectedKey: "disable-require",
expectedValue: true,
},
{
description: "boolean option returns false",
setFlag: "disable-require=false",
expectedKey: "disable-require",
expectedValue: false,
},
{
description: "invalid boolean option returns error",
setFlag: "disable-require=something",
expectedKey: "disable-require",
expectedValue: "something",
expectedError: errInvalidFormat,
},
{
description: "string option requires value",
setFlag: "swarm-resource",
expectedKey: "swarm-resource",
expectedValue: nil,
expectedError: errInvalidFormat,
},
{
description: "string option returns value",
setFlag: "swarm-resource=string-value",
expectedKey: "swarm-resource",
expectedValue: "string-value",
},
{
description: "string option returns value with equals",
setFlag: "swarm-resource=string-value=more",
expectedKey: "swarm-resource",
expectedValue: "string-value=more",
},
{
description: "string option treats bool value as string",
setFlag: "swarm-resource=true",
expectedKey: "swarm-resource",
expectedValue: "true",
},
{
description: "string option treats int value as string",
setFlag: "swarm-resource=5",
expectedKey: "swarm-resource",
expectedValue: "5",
},
{
description: "[]string option returns single value",
setFlag: "nvidia-container-cli.environment=string-value",
expectedKey: "nvidia-container-cli.environment",
expectedValue: []string{"string-value"},
},
{
description: "[]string option returns multiple values",
setFlag: "nvidia-container-cli.environment=first,second",
setListSeparator: ",",
expectedKey: "nvidia-container-cli.environment",
expectedValue: []string{"first", "second"},
},
{
description: "[]string option returns values with equals",
setFlag: "nvidia-container-cli.environment=first=1,second=2",
setListSeparator: ",",
expectedKey: "nvidia-container-cli.environment",
expectedValue: []string{"first=1", "second=2"},
},
{
description: "[]string option returns multiple values semi-colon",
setFlag: "nvidia-container-cli.environment=first;second",
setListSeparator: ";",
expectedKey: "nvidia-container-cli.environment",
expectedValue: []string{"first", "second"},
},
}
for _, tc := range testCases {
t.Run(tc.description, func(t *testing.T) {
if tc.setListSeparator == "" {
tc.setListSeparator = ","
}
k, v, err := setFlagToKeyValue(tc.setFlag, tc.setListSeparator)
require.ErrorIs(t, err, tc.expectedError)
require.EqualValues(t, tc.expectedKey, k)
require.EqualValues(t, tc.expectedValue, v)
})
}
}

View File

@@ -0,0 +1,94 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package defaultsubcommand
import (
"fmt"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/config/flags"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
)
type command struct {
logger logger.Interface
}
// NewCommand constructs a default command with the specified logger
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
}
return c.build()
}
// build creates the CLI command
func (m command) build() *cli.Command {
opts := flags.Options{}
// Create the 'default' command
c := cli.Command{
Name: "default",
Aliases: []string{"create-default", "generate-default"},
Usage: "Generate the default NVIDIA Container Toolkit configuration file",
Before: func(c *cli.Context) error {
return m.validateFlags(c, &opts)
},
Action: func(c *cli.Context) error {
return m.run(c, &opts)
},
}
c.Flags = []cli.Flag{
&cli.StringFlag{
Name: "output",
Aliases: []string{"o"},
Usage: "Specify the output file to write to; If not specified, the output is written to stdout",
Destination: &opts.Output,
},
}
return &c
}
func (m command) validateFlags(c *cli.Context, opts *flags.Options) error {
return opts.Validate()
}
func (m command) run(c *cli.Context, opts *flags.Options) error {
cfgToml, err := config.New()
if err != nil {
return fmt.Errorf("unable to load or create config: %v", err)
}
if err := opts.EnsureOutputFolder(); err != nil {
return fmt.Errorf("failed to create output directory: %v", err)
}
output, err := opts.CreateOutput()
if err != nil {
return fmt.Errorf("failed to open output file: %v", err)
}
defer output.Close()
if _, err = cfgToml.Save(output); err != nil {
return fmt.Errorf("failed to write output: %v", err)
}
return nil
}

View File

@@ -0,0 +1,82 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package flags
import (
"fmt"
"io"
"os"
"path/filepath"
)
// Options stores options for the config commands
type Options struct {
Config string
Output string
InPlace bool
}
// Validate checks whether the options are valid.
func (o Options) Validate() error {
if o.InPlace && o.Output != "" {
return fmt.Errorf("cannot specify both --in-place and --output")
}
return nil
}
// GetOutput returns the effective output
func (o Options) GetOutput() string {
if o.InPlace {
return o.Config
}
return o.Output
}
// EnsureOutputFolder creates the output folder if it does not exist.
// If the output folder is not specified (i.e. output to STDOUT), it is ignored.
func (o Options) EnsureOutputFolder() error {
output := o.GetOutput()
if output == "" {
return nil
}
if dir := filepath.Dir(output); dir != "" {
return os.MkdirAll(dir, 0755)
}
return nil
}
// CreateOutput creates the writer for the output.
func (o Options) CreateOutput() (io.WriteCloser, error) {
output := o.GetOutput()
if output == "" {
return nullCloser{os.Stdout}, nil
}
return os.Create(output)
}
// nullCloser is a writer that does nothing on Close.
type nullCloser struct {
io.Writer
}
// Close is a no-op for a nullCloser.
func (d nullCloser) Close() error {
return nil
}

View File

@@ -1,229 +0,0 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package symlinks
import (
"fmt"
"os"
"path/filepath"
"strings"
"github.com/NVIDIA/nvidia-container-toolkit/internal/discover/csv"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
)
type command struct {
logger *logrus.Logger
}
type config struct {
hostRoot string
filenames cli.StringSlice
links cli.StringSlice
containerSpec string
}
// NewCommand constructs a hook command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
c := command{
logger: logger,
}
return c.build()
}
// build
func (m command) build() *cli.Command {
cfg := config{}
// Create the '' command
c := cli.Command{
Name: "create-symlinks",
Usage: "A hook to create symlinks in the container. This can be used to proces CSV mount specs",
Action: func(c *cli.Context) error {
return m.run(c, &cfg)
},
}
c.Flags = []cli.Flag{
&cli.StringFlag{
Name: "host-root",
Usage: "The root on the host filesystem to use to resolve symlinks",
Destination: &cfg.hostRoot,
},
&cli.StringSliceFlag{
Name: "csv-filename",
Usage: "Specify a (CSV) filename to process",
Destination: &cfg.filenames,
},
&cli.StringSliceFlag{
Name: "link",
Usage: "Specify a specific link to create. The link is specified as target::link",
Destination: &cfg.links,
},
&cli.StringFlag{
Name: "container-spec",
Usage: "Specify the path to the OCI container spec. If empty or '-' the spec will be read from STDIN",
Destination: &cfg.containerSpec,
},
}
return &c
}
func (m command) run(c *cli.Context, cfg *config) error {
s, err := oci.LoadContainerState(cfg.containerSpec)
if err != nil {
return fmt.Errorf("failed to load container state: %v", err)
}
containerRoot, err := s.GetContainerRoot()
if err != nil {
return fmt.Errorf("failed to determined container root: %v", err)
}
csvFiles := cfg.filenames.Value()
chainLocator := lookup.NewSymlinkChainLocator(m.logger, cfg.hostRoot)
var candidates []string
for _, file := range csvFiles {
mountSpecs, err := csv.NewCSVFileParser(m.logger, file).Parse()
if err != nil {
m.logger.Debugf("Skipping CSV file %v: %v", file, err)
continue
}
for _, ms := range mountSpecs {
if ms.Type != csv.MountSpecSym {
continue
}
targets, err := chainLocator.Locate(ms.Path)
if err != nil {
m.logger.Warnf("Failed to locate symlink %v", ms.Path)
}
candidates = append(candidates, targets...)
}
}
created := make(map[string]bool)
// candidates is a list of absolute paths to symlinks in a chain, or the final target of the chain.
for _, candidate := range candidates {
targets, err := m.Locate(candidate)
if err != nil {
m.logger.Debugf("Skipping invalid link: %v", err)
continue
} else if len(targets) != 1 {
m.logger.Debugf("Unexepected number of targets: %v", targets)
continue
} else if targets[0] == candidate {
m.logger.Debugf("%v is not a symlink", candidate)
continue
}
err = m.createLink(created, cfg.hostRoot, containerRoot, targets[0], candidate)
if err != nil {
m.logger.Warnf("Failed to create link %v: %v", []string{targets[0], candidate}, err)
}
}
links := cfg.links.Value()
for _, l := range links {
parts := strings.Split(l, "::")
if len(parts) != 2 {
m.logger.Warnf("Invalid link specification %v", l)
continue
}
err := m.createLink(created, cfg.hostRoot, containerRoot, parts[0], parts[1])
if err != nil {
m.logger.Warnf("Failed to create link %v: %v", parts, err)
}
}
return nil
}
func (m command) createLink(created map[string]bool, hostRoot string, containerRoot string, target string, link string) error {
linkPath, err := changeRoot(hostRoot, containerRoot, link)
if err != nil {
m.logger.Warnf("Failed to resolve path for link %v relative to %v: %v", link, containerRoot, err)
}
if created[linkPath] {
m.logger.Debugf("Link %v already created", linkPath)
return nil
}
targetPath, err := changeRoot(hostRoot, "/", target)
if err != nil {
m.logger.Warnf("Failed to resolve path for target %v relative to %v: %v", target, "/", err)
}
m.logger.Infof("Symlinking %v to %v", linkPath, targetPath)
err = os.MkdirAll(filepath.Dir(linkPath), 0755)
if err != nil {
return fmt.Errorf("failed to create directory: %v", err)
}
err = os.Symlink(target, linkPath)
if err != nil {
return fmt.Errorf("failed to create symlink: %v", err)
}
return nil
}
func changeRoot(current string, new string, path string) (string, error) {
if !filepath.IsAbs(path) {
return path, nil
}
relative := path
if current != "" {
r, err := filepath.Rel(current, path)
if err != nil {
return "", err
}
relative = r
}
return filepath.Join(new, relative), nil
}
// Locate returns the link target of the specified filename or an empty slice if the
// specified filename is not a symlink.
func (m command) Locate(filename string) ([]string, error) {
info, err := os.Lstat(filename)
if err != nil {
return nil, fmt.Errorf("failed to get file info: %v", info)
}
if info.Mode()&os.ModeSymlink == 0 {
m.logger.Debugf("%v is not a symlink", filename)
return nil, nil
}
target, err := os.Readlink(filename)
if err != nil {
return nil, fmt.Errorf("error checking symlink: %v", err)
}
m.logger.Debugf("Resolved link: '%v' => '%v'", filename, target)
return []string{target}, nil
}

View File

@@ -17,20 +17,18 @@
package hook
import (
chmod "github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/hook/chmod"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-cdi-hook/commands"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
symlinks "github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/hook/create-symlinks"
ldcache "github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/hook/update-ldcache"
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
)
type hookCommand struct {
logger *logrus.Logger
logger logger.Interface
}
// NewCommand constructs a hook command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
// NewCommand constructs CLI subcommand for handling CDI hooks.
func NewCommand(logger logger.Interface) *cli.Command {
c := hookCommand{
logger: logger,
}
@@ -39,17 +37,24 @@ func NewCommand(logger *logrus.Logger) *cli.Command {
// build
func (m hookCommand) build() *cli.Command {
// Create the 'hook' command
// Create the 'hook' subcommand
hook := cli.Command{
Name: "hook",
Usage: "A collection of hooks that may be injected into an OCI spec",
// We set the default action for the `hook` subcommand to issue a
// warning and exit with no error.
// This means that if an unsupported hook is run, a container will not fail
// to launch. An unsupported hook could be the result of a CDI specification
// referring to a new hook that is not yet supported by an older NVIDIA
// Container Toolkit version or a hook that has been removed in newer
// version.
Action: func(ctx *cli.Context) error {
commands.IssueUnsupportedHookWarning(m.logger, ctx)
return nil
},
}
hook.Subcommands = []*cli.Command{
ldcache.NewCommand(m.logger),
symlinks.NewCommand(m.logger),
chmod.NewCommand(m.logger),
}
hook.Subcommands = commands.New(m.logger)
return &hook
}

View File

@@ -1,129 +0,0 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package ldcache
import (
"fmt"
"os"
"path/filepath"
"syscall"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
)
type command struct {
logger *logrus.Logger
}
type config struct {
folders cli.StringSlice
containerSpec string
}
// NewCommand constructs an update-ldcache command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
c := command{
logger: logger,
}
return c.build()
}
// build the update-ldcache command
func (m command) build() *cli.Command {
cfg := config{}
// Create the 'update-ldcache' command
c := cli.Command{
Name: "update-ldcache",
Usage: "Update ldcache in a container by running ldconfig",
Action: func(c *cli.Context) error {
return m.run(c, &cfg)
},
}
c.Flags = []cli.Flag{
&cli.StringSliceFlag{
Name: "folder",
Usage: "Specifiy a folder to add to /etc/ld.so.conf before updating the ld cache",
Destination: &cfg.folders,
},
&cli.StringFlag{
Name: "container-spec",
Usage: "Specify the path to the OCI container spec. If empty or '-' the spec will be read from STDIN",
Destination: &cfg.containerSpec,
},
}
return &c
}
func (m command) run(c *cli.Context, cfg *config) error {
s, err := oci.LoadContainerState(cfg.containerSpec)
if err != nil {
return fmt.Errorf("failed to load container state: %v", err)
}
containerRoot, err := s.GetContainerRoot()
if err != nil {
return fmt.Errorf("failed to determined container root: %v", err)
}
err = m.createConfig(containerRoot, cfg.folders.Value())
if err != nil {
return fmt.Errorf("failed to update ld.so.conf: %v", err)
}
args := []string{"/sbin/ldconfig"}
if containerRoot != "" {
args = append(args, "-r", containerRoot)
}
return syscall.Exec(args[0], args, nil)
}
// createConfig creates (or updates) /etc/ld.so.conf.d/nvcr-<RANDOM_STRING>.conf in the container
// to include the required paths.
func (m command) createConfig(root string, folders []string) error {
if len(folders) == 0 {
m.logger.Debugf("No folders to add to /etc/ld.so.conf")
return nil
}
configFile, err := os.CreateTemp(filepath.Join(root, "/etc/ld.so.conf.d"), "nvcr-*.conf")
if err != nil {
return fmt.Errorf("failed to create config file: %v", err)
}
defer configFile.Close()
m.logger.Debugf("Adding folders %v to %v", folders, configFile.Name())
configured := make(map[string]bool)
for _, folder := range folders {
if configured[folder] {
continue
}
_, err = configFile.WriteString(fmt.Sprintf("%s\n", folder))
if err != nil {
return fmt.Errorf("failed to update ld.so.conf.d: %v", err)
}
configured[folder] = true
}
return nil
}

View File

@@ -17,16 +17,17 @@
package info
import (
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
)
type command struct {
logger *logrus.Logger
logger logger.Interface
}
// NewCommand constructs an info command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
}
@@ -35,13 +36,13 @@ func NewCommand(logger *logrus.Logger) *cli.Command {
// build
func (m command) build() *cli.Command {
// Create the 'hook' command
hook := cli.Command{
// Create the 'info' command
info := cli.Command{
Name: "info",
Usage: "Provide information about the system",
}
hook.Subcommands = []*cli.Command{}
info.Subcommands = []*cli.Command{}
return &hook
return &info
}

View File

@@ -19,32 +19,37 @@ package main
import (
"os"
"github.com/sirupsen/logrus"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/cdi"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/config"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/hook"
infoCLI "github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/info"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/runtime"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/system"
"github.com/NVIDIA/nvidia-container-toolkit/internal/info"
log "github.com/sirupsen/logrus"
cli "github.com/urfave/cli/v2"
)
var logger = log.New()
// config defines the options that can be set for the CLI through config files,
// options defines the options that can be set for the CLI through config files,
// environment variables, or command line flags
type config struct {
type options struct {
// Debug indicates whether the CLI is started in "debug" mode
Debug bool
// Quiet indicates whether the CLI is started in "quiet" mode
Quiet bool
}
func main() {
// Create a config struct to hold the parsed environment variables or command line flags
config := config{}
logger := logrus.New()
// Create a options struct to hold the parsed environment variables or command line flags
opts := options{}
// Create the top-level CLI
c := cli.NewApp()
c.DisableSliceFlagSeparator = true
c.Name = "NVIDIA Container Toolkit CLI"
c.UseShortOptionHandling = true
c.EnableBashCompletion = true
@@ -57,16 +62,25 @@ func main() {
Name: "debug",
Aliases: []string{"d"},
Usage: "Enable debug-level logging",
Destination: &config.Debug,
Destination: &opts.Debug,
EnvVars: []string{"NVIDIA_CTK_DEBUG"},
},
&cli.BoolFlag{
Name: "quiet",
Usage: "Suppress all output except for errors; overrides --debug",
Destination: &opts.Quiet,
EnvVars: []string{"NVIDIA_CTK_QUIET"},
},
}
// Set log-level for all subcommands
c.Before = func(c *cli.Context) error {
logLevel := log.InfoLevel
if config.Debug {
logLevel = log.DebugLevel
logLevel := logrus.InfoLevel
if opts.Debug {
logLevel = logrus.DebugLevel
}
if opts.Quiet {
logLevel = logrus.ErrorLevel
}
logger.SetLevel(logLevel)
return nil
@@ -79,12 +93,13 @@ func main() {
infoCLI.NewCommand(logger),
cdi.NewCommand(logger),
system.NewCommand(logger),
config.NewCommand(logger),
}
// Run the CLI
err := c.Run(os.Args)
if err != nil {
log.Errorf("%v", err)
log.Exit(1)
logger.Errorf("%v", err)
os.Exit(1)
}
}

View File

@@ -17,31 +17,45 @@
package configure
import (
"encoding/json"
"fmt"
"os"
"path/filepath"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/runtime/nvidia"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config/engine/crio"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config/engine/docker"
"github.com/pelletier/go-toml"
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/engine"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/engine/containerd"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/engine/crio"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/engine/docker"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/ocihook"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/toml"
)
const (
defaultRuntime = "docker"
defaultDockerConfigFilePath = "/etc/docker/daemon.json"
defaultCrioConfigFilePath = "/etc/crio/crio.conf"
// defaultNVIDIARuntimeName is the default name to use in configs for the NVIDIA Container Runtime
defaultNVIDIARuntimeName = "nvidia"
// defaultNVIDIARuntimeExecutable is the default NVIDIA Container Runtime executable file name
defaultNVIDIARuntimeExecutable = "nvidia-container-runtime"
defaultNVIDIARuntimeExpecutablePath = "/usr/bin/nvidia-container-runtime"
defaultNVIDIARuntimeHookExpecutablePath = "/usr/bin/nvidia-container-runtime-hook"
defaultContainerdConfigFilePath = "/etc/containerd/config.toml"
defaultCrioConfigFilePath = "/etc/crio/crio.conf"
defaultDockerConfigFilePath = "/etc/docker/daemon.json"
defaultConfigSource = configSourceFile
configSourceCommand = "command"
configSourceFile = "file"
)
type command struct {
logger *logrus.Logger
logger logger.Interface
}
// NewCommand constructs an configure command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
// NewCommand constructs a configure command with the specified logger
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
}
@@ -54,7 +68,22 @@ type config struct {
dryRun bool
runtime string
configFilePath string
nvidiaOptions nvidia.Options
executablePath string
configSource string
mode string
hookFilePath string
nvidiaRuntime struct {
name string
path string
hookPath string
setAsDefault bool
}
// cdi-specific options
cdi struct {
enabled bool
}
}
func (m command) build() *cli.Command {
@@ -65,6 +94,9 @@ func (m command) build() *cli.Command {
configure := cli.Command{
Name: "configure",
Usage: "Add a runtime to the specified container engine",
Before: func(c *cli.Context) error {
return m.validateFlags(c, &config)
},
Action: func(c *cli.Context) error {
return m.configureWrapper(c, &config)
},
@@ -78,7 +110,7 @@ func (m command) build() *cli.Command {
},
&cli.StringFlag{
Name: "runtime",
Usage: "the target runtime engine. One of [crio, docker]",
Usage: "the target runtime engine; one of [containerd, crio, docker]",
Value: defaultRuntime,
Destination: &config.runtime,
},
@@ -88,126 +120,240 @@ func (m command) build() *cli.Command {
Destination: &config.configFilePath,
},
&cli.StringFlag{
Name: "nvidia-runtime-name",
Usage: "specify the name of the NVIDIA runtime that will be added",
Value: nvidia.RuntimeName,
Destination: &config.nvidiaOptions.RuntimeName,
Name: "executable-path",
Usage: "The path to the runtime executable. This is used to extract the current config",
Destination: &config.executablePath,
},
&cli.StringFlag{
Name: "runtime-path",
Name: "config-mode",
Usage: "the config mode for runtimes that support multiple configuration mechanisms",
Destination: &config.mode,
},
&cli.StringFlag{
Name: "config-source",
Usage: "the source to retrieve the container runtime configuration; one of [command, file]\"",
Destination: &config.configSource,
Value: defaultConfigSource,
},
&cli.StringFlag{
Name: "oci-hook-path",
Usage: "the path to the OCI runtime hook to create if --config-mode=oci-hook is specified. If no path is specified, the generated hook is output to STDOUT.\n\tNote: The use of OCI hooks is deprecated.",
Destination: &config.hookFilePath,
},
&cli.StringFlag{
Name: "nvidia-runtime-name",
Usage: "specify the name of the NVIDIA runtime that will be added",
Value: defaultNVIDIARuntimeName,
Destination: &config.nvidiaRuntime.name,
},
&cli.StringFlag{
Name: "nvidia-runtime-path",
Aliases: []string{"runtime-path"},
Usage: "specify the path to the NVIDIA runtime executable",
Value: nvidia.RuntimeExecutable,
Destination: &config.nvidiaOptions.RuntimePath,
Value: defaultNVIDIARuntimeExecutable,
Destination: &config.nvidiaRuntime.path,
},
&cli.StringFlag{
Name: "nvidia-runtime-hook-path",
Usage: "specify the path to the NVIDIA Container Runtime hook executable",
Value: defaultNVIDIARuntimeHookExpecutablePath,
Destination: &config.nvidiaRuntime.hookPath,
},
&cli.BoolFlag{
Name: "set-as-default",
Usage: "set the specified runtime as the default runtime",
Destination: &config.nvidiaOptions.SetAsDefault,
Name: "nvidia-set-as-default",
Aliases: []string{"set-as-default"},
Usage: "set the NVIDIA runtime as the default runtime",
Destination: &config.nvidiaRuntime.setAsDefault,
},
&cli.BoolFlag{
Name: "cdi.enabled",
Aliases: []string{"cdi.enable", "enable-cdi"},
Usage: "Enable CDI in the configured runtime",
Destination: &config.cdi.enabled,
},
}
return &configure
}
func (m command) configureWrapper(c *cli.Context, config *config) error {
func (m command) validateFlags(c *cli.Context, config *config) error {
if config.mode == "oci-hook" {
if !filepath.IsAbs(config.nvidiaRuntime.hookPath) {
return fmt.Errorf("the NVIDIA runtime hook path %q is not an absolute path", config.nvidiaRuntime.hookPath)
}
return nil
}
if config.mode != "" && config.mode != "config-file" {
m.logger.Warningf("Ignoring unsupported config mode for %v: %q", config.runtime, config.mode)
}
config.mode = "config-file"
switch config.runtime {
case "containerd", "crio", "docker":
break
default:
return fmt.Errorf("unrecognized runtime '%v'", config.runtime)
}
switch config.runtime {
case "containerd", "crio":
if config.nvidiaRuntime.path == defaultNVIDIARuntimeExecutable {
config.nvidiaRuntime.path = defaultNVIDIARuntimeExpecutablePath
}
if !filepath.IsAbs(config.nvidiaRuntime.path) {
return fmt.Errorf("the NVIDIA runtime path %q is not an absolute path", config.nvidiaRuntime.path)
}
}
if config.runtime != "containerd" && config.runtime != "docker" {
if config.cdi.enabled {
m.logger.Warningf("Ignoring cdi.enabled flag for %v", config.runtime)
}
config.cdi.enabled = false
}
if config.executablePath != "" && config.runtime == "docker" {
m.logger.Warningf("Ignoring executable-path=%q flag for %v", config.executablePath, config.runtime)
config.executablePath = ""
}
switch config.configSource {
case configSourceCommand:
if config.runtime == "docker" {
m.logger.Warningf("A %v Config Source is not supported for %v; using %v", config.configSource, config.runtime, configSourceFile)
config.configSource = configSourceFile
}
case configSourceFile:
break
default:
return fmt.Errorf("unrecognized Config Source: %v", config.configSource)
}
if config.configFilePath == "" {
switch config.runtime {
case "containerd":
config.configFilePath = defaultContainerdConfigFilePath
case "crio":
config.configFilePath = defaultCrioConfigFilePath
case "docker":
config.configFilePath = defaultDockerConfigFilePath
}
}
return nil
}
// configureWrapper updates the specified container engine config to enable the NVIDIA runtime
func (m command) configureWrapper(c *cli.Context, config *config) error {
switch config.mode {
case "oci-hook":
return m.configureOCIHook(c, config)
case "config-file":
return m.configureConfigFile(c, config)
}
return fmt.Errorf("unsupported config-mode: %v", config.mode)
}
// configureConfigFile updates the specified container engine config file to enable the NVIDIA runtime.
func (m command) configureConfigFile(c *cli.Context, config *config) error {
configSource, err := config.resolveConfigSource()
if err != nil {
return err
}
var cfg engine.Interface
switch config.runtime {
case "containerd":
cfg, err = containerd.New(
containerd.WithLogger(m.logger),
containerd.WithPath(config.configFilePath),
containerd.WithConfigSource(configSource),
)
case "crio":
return m.configureCrio(c, config)
cfg, err = crio.New(
crio.WithLogger(m.logger),
crio.WithPath(config.configFilePath),
crio.WithConfigSource(configSource),
)
case "docker":
return m.configureDocker(c, config)
cfg, err = docker.New(
docker.WithLogger(m.logger),
docker.WithPath(config.configFilePath),
)
default:
err = fmt.Errorf("unrecognized runtime '%v'", config.runtime)
}
return fmt.Errorf("unrecognized runtime '%v'", config.runtime)
}
// configureDocker updates the docker config to enable the NVIDIA Container Runtime
func (m command) configureDocker(c *cli.Context, config *config) error {
configFilePath := config.configFilePath
if configFilePath == "" {
configFilePath = defaultDockerConfigFilePath
}
cfg, err := docker.New(
docker.WithPath(configFilePath),
)
if err != nil {
return fmt.Errorf("unable to load config: %v", err)
if err != nil || cfg == nil {
return fmt.Errorf("unable to load config for runtime %v: %v", config.runtime, err)
}
err = cfg.AddRuntime(
config.nvidiaOptions.RuntimeName,
config.nvidiaOptions.RuntimePath,
config.nvidiaOptions.SetAsDefault,
config.nvidiaRuntime.name,
config.nvidiaRuntime.path,
config.nvidiaRuntime.setAsDefault,
)
if err != nil {
return fmt.Errorf("unable to update config: %v", err)
}
if config.dryRun {
output, err := json.MarshalIndent(cfg, "", " ")
if err != nil {
return fmt.Errorf("unable to convert to JSON: %v", err)
}
os.Stdout.WriteString(fmt.Sprintf("%s\n", output))
return nil
if config.cdi.enabled {
cfg.EnableCDI()
}
n, err := cfg.Save(configFilePath)
outputPath := config.getOutputConfigPath()
n, err := cfg.Save(outputPath)
if err != nil {
return fmt.Errorf("unable to flush config: %v", err)
}
if n == 0 {
m.logger.Infof("Removed empty config from %v", configFilePath)
} else {
m.logger.Infof("Wrote updated config to %v", configFilePath)
}
m.logger.Infof("It is recommended that the docker daemon be restarted.")
return nil
}
// configureCrio updates the crio config to enable the NVIDIA Container Runtime
func (m command) configureCrio(c *cli.Context, config *config) error {
configFilePath := config.configFilePath
if configFilePath == "" {
configFilePath = defaultCrioConfigFilePath
}
cfg, err := crio.New(
crio.WithPath(configFilePath),
)
if err != nil {
return fmt.Errorf("unable to load config: %v", err)
}
err = cfg.AddRuntime(
config.nvidiaOptions.RuntimeName,
config.nvidiaOptions.RuntimePath,
config.nvidiaOptions.SetAsDefault,
)
if err != nil {
return fmt.Errorf("unable to update config: %v", err)
}
if config.dryRun {
output, err := toml.Marshal(cfg)
if err != nil {
return fmt.Errorf("unable to convert to TOML: %v", err)
if outputPath != "" {
if n == 0 {
m.logger.Infof("Removed empty config from %v", outputPath)
} else {
m.logger.Infof("Wrote updated config to %v", outputPath)
}
os.Stdout.WriteString(fmt.Sprintf("%s\n", output))
return nil
m.logger.Infof("It is recommended that %v daemon be restarted.", config.runtime)
}
n, err := cfg.Save(configFilePath)
if err != nil {
return fmt.Errorf("unable to flush config: %v", err)
}
if n == 0 {
m.logger.Infof("Removed empty config from %v", configFilePath)
} else {
m.logger.Infof("Wrote updated config to %v", configFilePath)
}
m.logger.Infof("It is recommended that the cri-o daemon be restarted.")
return nil
}
// resolveConfigSource returns the default config source or the user provided config source
func (c *config) resolveConfigSource() (toml.Loader, error) {
switch c.configSource {
case configSourceCommand:
return c.getCommandConfigSource(), nil
case configSourceFile:
return toml.FromFile(c.configFilePath), nil
default:
return nil, fmt.Errorf("unrecognized config source: %s", c.configSource)
}
}
// getConfigSourceCommand returns the default cli command to fetch the current runtime config
func (c *config) getCommandConfigSource() toml.Loader {
switch c.runtime {
case "containerd":
return containerd.CommandLineSource("", c.executablePath)
case "crio":
return crio.CommandLineSource("", c.executablePath)
}
return toml.Empty
}
// getOutputConfigPath returns the configured config path or "" if dry-run is enabled
func (c *config) getOutputConfigPath() string {
if c.dryRun {
return ""
}
return c.configFilePath
}
// configureOCIHook creates and configures the OCI hook for the NVIDIA runtime
func (m *command) configureOCIHook(c *cli.Context, config *config) error {
err := ocihook.CreateHook(config.hookFilePath, config.nvidiaRuntime.hookPath)
if err != nil {
return fmt.Errorf("error creating OCI hook: %v", err)
}
return nil
}

View File

@@ -1,75 +0,0 @@
/*
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
*/
package nvidia
const (
// RuntimeName is the default name to use in configs for the NVIDIA Container Runtime
RuntimeName = "nvidia"
// RuntimeExecutable is the default NVIDIA Container Runtime executable file name
RuntimeExecutable = "nvidia-container-runtime"
)
// Options specifies the options for the NVIDIA Container Runtime w.r.t a container engine such as docker.
type Options struct {
SetAsDefault bool
RuntimeName string
RuntimePath string
}
// Runtime defines an NVIDIA runtime with a name and a executable
type Runtime struct {
Name string
Path string
}
// DefaultRuntime returns the default runtime for the configured options.
// If the configuration is invalid or the default runtimes should not be set
// the empty string is returned.
func (o Options) DefaultRuntime() string {
if !o.SetAsDefault {
return ""
}
return o.RuntimeName
}
// Runtime creates a runtime struct based on the options.
func (o Options) Runtime() Runtime {
path := o.RuntimePath
if o.RuntimePath == "" {
path = RuntimeExecutable
}
r := Runtime{
Name: o.RuntimeName,
Path: path,
}
return r
}
// DockerRuntimesConfig generatest the expected docker config for the specified runtime
func (r Runtime) DockerRuntimesConfig() map[string]interface{} {
runtimes := make(map[string]interface{})
runtimes[r.Name] = map[string]interface{}{
"path": r.Path,
"args": []string{},
}
return runtimes
}

View File

@@ -17,17 +17,18 @@
package runtime
import (
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/runtime/configure"
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/runtime/configure"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
)
type runtimeCommand struct {
logger *logrus.Logger
logger logger.Interface
}
// NewCommand constructs a runtime command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
func NewCommand(logger logger.Interface) *cli.Command {
c := runtimeCommand{
logger: logger,
}

View File

@@ -20,37 +20,49 @@ import (
"fmt"
"path/filepath"
"github.com/NVIDIA/go-nvlib/pkg/nvpci"
"github.com/NVIDIA/nvidia-container-toolkit/internal/info/proc/devices"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/internal/nvcaps"
"github.com/sirupsen/logrus"
"gitlab.com/nvidia/cloud-native/go-nvlib/pkg/nvpci"
)
type allPossible struct {
logger *logrus.Logger
driverRoot string
logger logger.Interface
devRoot string
deviceMajors devices.Devices
migCaps nvcaps.MigCaps
}
// newAllPossible returns a new allPossible device node lister.
// This lister lists all possible device nodes for NVIDIA GPUs, control devices, and capability devices.
func newAllPossible(logger *logrus.Logger, driverRoot string) (nodeLister, error) {
func newAllPossible(logger logger.Interface, devRoot string) (nodeLister, error) {
deviceMajors, err := devices.GetNVIDIADevices()
if err != nil {
return nil, fmt.Errorf("failed reading device majors: %v", err)
}
var requiredMajors []devices.Name
migCaps, err := nvcaps.NewMigCaps()
if err != nil {
return nil, fmt.Errorf("failed to read MIG caps: %v", err)
}
if migCaps == nil {
migCaps = make(nvcaps.MigCaps)
} else {
requiredMajors = append(requiredMajors, devices.NVIDIACaps)
}
requiredMajors = append(requiredMajors, devices.NVIDIAGPU, devices.NVIDIAUVM)
for _, name := range requiredMajors {
if !deviceMajors.Exists(name) {
return nil, fmt.Errorf("missing required device major %s", name)
}
}
l := allPossible{
logger: logger,
driverRoot: driverRoot,
devRoot: devRoot,
deviceMajors: deviceMajors,
migCaps: migCaps,
}
@@ -60,8 +72,9 @@ func newAllPossible(logger *logrus.Logger, driverRoot string) (nodeLister, error
// DeviceNodes returns a list of all possible device nodes for NVIDIA GPUs, control devices, and capability devices.
func (m allPossible) DeviceNodes() ([]deviceNode, error) {
gpus, err := nvpci.NewFrom(
filepath.Join(m.driverRoot, nvpci.PCIDevicesRoot),
gpus, err := nvpci.New(
nvpci.WithPCIDevicesRoot(filepath.Join(m.devRoot, nvpci.PCIDevicesRoot)),
nvpci.WithLogger(m.logger),
).GetGPUs()
if err != nil {
return nil, fmt.Errorf("failed to get GPU information: %v", err)
@@ -69,7 +82,7 @@ func (m allPossible) DeviceNodes() ([]deviceNode, error) {
count := len(gpus)
if count == 0 {
m.logger.Infof("No NVIDIA devices found in %s", m.driverRoot)
m.logger.Infof("No NVIDIA devices found in %s", m.devRoot)
return nil, nil
}
@@ -168,7 +181,7 @@ func (m allPossible) newDeviceNode(deviceName devices.Name, path string, minor i
major, _ := m.deviceMajors.Get(deviceName)
return deviceNode{
path: filepath.Join(m.driverRoot, path),
path: filepath.Join(m.devRoot, path),
major: uint32(major),
minor: uint32(minor),
}

Some files were not shown because too many files have changed in this diff Show More