Compare commits

..

229 Commits

Author SHA1 Message Date
Evan Lezar
dc1240922a Merge pull request #592 from elezar/fix-release-scripts
[no-relnote] fix package release scripts
2024-07-09 15:35:21 +02:00
Evan Lezar
eda83913a8 [no-relnote] fix package release scripts
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-09 15:34:31 +02:00
Evan Lezar
b093d6a8b3 Merge pull request #575 from elezar/fix-centos7-builds
Fix centos7 builds
2024-07-01 16:55:57 +02:00
Evan Lezar
b787b46480 Bump libnvidia-container to 24b3f92
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-01 16:47:37 +02:00
Evan Lezar
4cb817ce92 [no-relnote] Use centos:7 vault repos for builds
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-01 16:47:37 +02:00
Evan Lezar
770eb72ee4 [no-relnote] Update submodule to release-1.15
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-01 16:47:31 +02:00
Evan Lezar
8b9f22b6c5 Merge pull request #540 from elezar/fix-ppcle64-builds
Fix ppcle64 builds
2024-06-11 14:54:03 +02:00
Evan Lezar
644610b16a Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-11 14:53:33 +02:00
Evan Lezar
a8ec6e5b56 Use archived package repo for centos:stream8
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-11 14:52:32 +02:00
Evan Lezar
4f4324e105 Merge pull request #535 from NVIDIA/dependabot/go_modules/release-1.15/golang.org/x/mod-0.18.0
Bump golang.org/x/mod from 0.17.0 to 0.18.0
2024-06-10 14:44:59 +02:00
dependabot[bot]
675f259bc8 Bump golang.org/x/mod from 0.17.0 to 0.18.0
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.17.0 to 0.18.0.
- [Commits](https://github.com/golang/mod/compare/v0.17.0...v0.18.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-10 11:45:06 +00:00
Evan Lezar
3fd22e54b6 Merge pull request #534 from NVIDIA/dependabot/go_modules/release-1.15/golang.org/x/sys-0.21.0
Bump golang.org/x/sys from 0.20.0 to 0.21.0
2024-06-10 13:43:57 +02:00
dependabot[bot]
cc728cf96c Bump golang.org/x/sys from 0.20.0 to 0.21.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.20.0 to 0.21.0.
- [Commits](https://github.com/golang/sys/compare/v0.20.0...v0.21.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-10 11:36:59 +00:00
Evan Lezar
ffc9824291 Merge pull request #525 from NVIDIA/dependabot/docker/deployments/container/release-1.15/nvidia/cuda-12.5.0-base-ubuntu20.04
Bump nvidia/cuda from 12.4.1-base-ubuntu20.04 to 12.5.0-base-ubuntu20.04 in /deployments/container
2024-06-10 13:36:05 +02:00
Evan Lezar
eb619869d6 Merge pull request #537 from elezar/bump-github.com/xrash/smetrics
Bump github.com/xrash/smetrics to v0.0.0-20240521201337-686a1a2994c1
2024-06-10 13:35:36 +02:00
Evan Lezar
34bae6f368 Bump github.com/xrash/smetrics to v0.0.0-20240521201337-686a1a2994c1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-10 13:32:26 +02:00
dependabot[bot]
11e50626c6 Bump nvidia/cuda in /deployments/container
Bumps nvidia/cuda from 12.4.1-base-ubuntu20.04 to 12.5.0-base-ubuntu20.04.

---
updated-dependencies:
- dependency-name: nvidia/cuda
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-03 08:23:53 +00:00
Evan Lezar
2b3599d63b Merge pull request #505 from NVIDIA/dependabot/go_modules/release-1.15/github.com/NVIDIA/go-nvlib-0.4.0
Bump github.com/NVIDIA/go-nvlib from 0.3.0 to 0.4.0
2024-05-21 13:50:53 +02:00
dependabot[bot]
aeef21029e ---
updated-dependencies:
- dependency-name: github.com/NVIDIA/go-nvlib
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-05-21 13:40:27 +02:00
Evan Lezar
99e08b572a Merge pull request #496 from NVIDIA/dependabot/go_modules/release-1.15/github.com/NVIDIA/go-nvml-0.12.0-6
Bump github.com/NVIDIA/go-nvml from 0.12.0-4 to 0.12.0-6
2024-05-15 21:12:01 +02:00
Evan Lezar
1947c7b571 Merge pull request #481 from NVIDIA/dependabot/go_modules/release-1.15/github.com/urfave/cli/v2-2.27.2
Bump github.com/urfave/cli/v2 from 2.27.1 to 2.27.2
2024-05-15 21:11:23 +02:00
dependabot[bot]
674271dff3 Bump github.com/NVIDIA/go-nvml from 0.12.0-4 to 0.12.0-6
Bumps [github.com/NVIDIA/go-nvml](https://github.com/NVIDIA/go-nvml) from 0.12.0-4 to 0.12.0-6.
- [Commits](https://github.com/NVIDIA/go-nvml/compare/v0.12.0-4...v0.12.0-6)

---
updated-dependencies:
- dependency-name: github.com/NVIDIA/go-nvml
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-05-15 12:32:17 +00:00
Evan Lezar
1538204ff7 Merge pull request #480 from NVIDIA/dependabot/go_modules/release-1.15/github.com/NVIDIA/go-nvml-0.12.0-5
Bump github.com/NVIDIA/go-nvml from 0.12.0-3 to 0.12.0-5
2024-05-15 14:31:24 +02:00
Evan Lezar
38137a2d1c Merge pull request #486 from NVIDIA/dependabot/go_modules/release-1.15/golang.org/x/sys-0.20.0
Bump golang.org/x/sys from 0.19.0 to 0.20.0
2024-05-15 14:31:08 +02:00
dependabot[bot]
8efd0f26b0 Bump github.com/urfave/cli/v2 from 2.27.1 to 2.27.2
Bumps [github.com/urfave/cli/v2](https://github.com/urfave/cli) from 2.27.1 to 2.27.2.
- [Release notes](https://github.com/urfave/cli/releases)
- [Changelog](https://github.com/urfave/cli/blob/main/docs/CHANGELOG.md)
- [Commits](https://github.com/urfave/cli/compare/v2.27.1...v2.27.2)

---
updated-dependencies:
- dependency-name: github.com/urfave/cli/v2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-05-15 14:28:20 +02:00
dependabot[bot]
7bf948938f Bump github.com/NVIDIA/go-nvml from 0.12.0-3 to 0.12.0-5
Bumps [github.com/NVIDIA/go-nvml](https://github.com/NVIDIA/go-nvml) from 0.12.0-3 to 0.12.0-5.
- [Commits](https://github.com/NVIDIA/go-nvml/compare/v0.12.0-3...v0.12.0-5)

---
updated-dependencies:
- dependency-name: github.com/NVIDIA/go-nvml
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-05-07 13:22:21 +00:00
Evan Lezar
90d27b8abd Merge pull request #468 from elezar/manual/go_modules/release-1.15/github.com/NVIDIA/go-nvml-0.12.0-4
Manual/go modules/release 1.15/GitHub.com/nvidia/go nvml 0.12.0 4
2024-05-07 15:21:40 +02:00
dependabot[bot]
29c4fec1f8 Bump golang.org/x/sys from 0.19.0 to 0.20.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.19.0 to 0.20.0.
- [Commits](https://github.com/golang/sys/compare/v0.19.0...v0.20.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-05-05 08:31:45 +00:00
Evan Lezar
28bb021935 Merge pull request #470 from NVIDIA/dependabot/go_modules/release-1.15/tags.cncf.io/container-device-interface-0.7.2
Bump tags.cncf.io/container-device-interface from 0.7.1 to 0.7.2
2024-04-22 14:55:26 +02:00
dependabot[bot]
993412c804 Bump tags.cncf.io/container-device-interface from 0.7.1 to 0.7.2
Bumps [tags.cncf.io/container-device-interface](https://github.com/cncf-tags/container-device-interface) from 0.7.1 to 0.7.2.
- [Release notes](https://github.com/cncf-tags/container-device-interface/releases)
- [Commits](https://github.com/cncf-tags/container-device-interface/compare/v0.7.1...v0.7.2)

---
updated-dependencies:
- dependency-name: tags.cncf.io/container-device-interface
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-21 08:25:18 +00:00
Evan Lezar
b6be911eaa Replace go-nvlib/pkg/nvml with go-nvml/pkg/nvml
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-18 15:01:20 +02:00
Evan Lezar
2019cd6f0a Update to go-nvlib v0.3.0 and go-nvml v0.12.0-4
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-18 15:01:03 +02:00
Evan Lezar
ddeeca392c Merge pull request #462 from elezar/bump-version-v1.15.0
Bump version to v1.15.0
2024-04-15 15:21:52 +02:00
Evan Lezar
9944feee45 Bump version to v1.15.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-15 14:37:16 +02:00
Evan Lezar
762b14b6cd Merge pull request #459 from elezar/remove-runtime-docker
Remove runtime and docker packages
2024-04-15 11:51:32 +02:00
Evan Lezar
e76e10fb36 Remove third_party package folders
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-12 14:35:03 +02:00
Evan Lezar
fcdf565586 Remove tooling to build packages
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-12 14:32:46 +02:00
Evan Lezar
7a9bc14d98 Merge pull request #425 from jmbaur/discover-use-xdg
Use XDG_DATA_DIRS instead of hardcoding /usr/share
2024-04-11 19:09:53 +02:00
Jared Baur
5788e622f4 Use XDG_DATA_DIRS instead of hardcoding /usr/share
When running nvidia-ctk on a system that uses a custom XDG_DATA_DIRS
environment variable value, the configuration files for `glvnd`,
`vulkan`, and `egl` fail to get passed through from the host to the
container. Reading from XDG_DATA_DIRS instead of hardcoding the default
value allows for finding said files so they can be mounted in the
container.

Signed-off-by: Jared Baur <jaredbaur@fastmail.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-11 17:13:02 +02:00
Evan Lezar
29c0f82ed2 Merge pull request #327 from elezar/add-driver-config
Add config search path option to driver root
2024-04-11 16:58:33 +02:00
Evan Lezar
e1417bee64 Merge pull request #456 from elezar/fix-ubi8-image-build
Remove unneeded repo manipulation
2024-04-11 16:54:31 +02:00
Evan Lezar
5f9e49705c Remove unneeded repo manipulation
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-11 16:44:42 +02:00
Evan Lezar
1d2b61ee11 Merge pull request #455 from elezar/fix-ubi8-image-build
Fix typo in dockerfile
2024-04-11 15:10:04 +02:00
Evan Lezar
271987d448 Fix typo in dockerfile
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-11 15:06:22 +02:00
Evan Lezar
6cac2f5848 Merge pull request #446 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.19.0
Bump golang.org/x/sys from 0.18.0 to 0.19.0
2024-04-11 11:41:51 +02:00
dependabot[bot]
ef4eb0d3c6 Bump golang.org/x/sys from 0.18.0 to 0.19.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.18.0 to 0.19.0.
- [Commits](https://github.com/golang/sys/compare/v0.18.0...v0.19.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-11 09:39:33 +00:00
Evan Lezar
04ab0595fa Merge pull request #449 from NVIDIA/dependabot/go_modules/main/golang.org/x/mod-0.17.0
Bump golang.org/x/mod from 0.16.0 to 0.17.0
2024-04-11 11:38:52 +02:00
Evan Lezar
9d3418d603 Merge pull request #454 from NVIDIA/dependabot/docker/deployments/container/nvidia/cuda-12.4.1-base-ubuntu20.04
Bump nvidia/cuda from 12.3.2-base-ubuntu20.04 to 12.4.1-base-ubuntu20.04 in /deployments/container
2024-04-11 11:38:18 +02:00
dependabot[bot]
57acd85fb1 Bump nvidia/cuda in /deployments/container
Bumps nvidia/cuda from 12.3.2-base-ubuntu20.04 to 12.4.1-base-ubuntu20.04.

---
updated-dependencies:
- dependency-name: nvidia/cuda
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-10 14:11:46 +00:00
Evan Lezar
6d69ca81de Merge pull request #453 from elezar/bump-cuda-version-in-containers
Add dependabot update for CUDA base images
2024-04-10 16:11:26 +02:00
Evan Lezar
be73581489 Add dependabot update for Dockefiles
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-10 16:06:27 +02:00
Evan Lezar
5682ce3149 Specify CUDA base images directly in Dockerfile
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-10 16:06:27 +02:00
dependabot[bot]
cb2b000ddc Bump golang.org/x/mod from 0.16.0 to 0.17.0
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.16.0 to 0.17.0.
- [Commits](https://github.com/golang/mod/compare/v0.16.0...v0.17.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-08 11:01:21 +00:00
Evan Lezar
cbc6ff73a4 Merge pull request #441 from elezar/bump-cdi-version
Update tags.cncf.io/container-device-interface to v0.7.1
2024-04-08 13:00:33 +02:00
Evan Lezar
4cd86caf67 Use NewCache instead of GetRegistry
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-05 17:09:17 +02:00
Evan Lezar
885313af3b Bump tags.cncf.io/container-device-interface to v0.7.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-05 17:05:40 +02:00
Evan Lezar
26e52b8013 Merge pull request #438 from elezar/refactor-driver-root
Create root.Driver instance at first usage
2024-04-03 15:11:45 +02:00
Evan Lezar
011c658945 Create root.Driver instance at first usage
This allows for testing through injection of the driver root.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-03 15:03:33 +02:00
Evan Lezar
413da20838 Merge pull request #362 from elezar/add-feature-flags
Add support for feature flags
2024-04-03 12:04:18 +02:00
Evan Lezar
09341a0934 Add support for feature flags
This change adds a features config that allows
individual features to be toggled at a global level. Each feature can (by default)
be controlled by an environment variable.

The GDS, MOFED, NVSWITCH, and GDRCOPY features are examples of such features.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-03 11:58:37 +02:00
Evan Lezar
2a9e3537ec Add config search paths option to driver root.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-02 23:03:05 +02:00
Evan Lezar
c374520b64 Merge pull request #436 from elezar/remove-verbose-from-tests
Remove verbose from tests
2024-04-02 17:46:33 +02:00
Evan Lezar
e982b9798c Remove verbose from tests
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-02 17:45:25 +02:00
Evan Lezar
7eb031919c Merge pull request #430 from lengrongfu/main
fix doc link
2024-04-02 17:41:19 +02:00
Evan Lezar
97950d6b8d Merge pull request #418 from elezar/bump-golang-version
Bump golang version to v1.22.1
2024-04-02 10:59:45 +02:00
Evan Lezar
1613f35bf5 Merge pull request #419 from elezar/rename-build-deployments
Rename build folder deployments
2024-04-02 10:57:54 +02:00
rongfu.leng
a78a7f866f fix doc
Signed-off-by: rongfu.leng <lenronfu@gmail.com>
2024-03-28 13:55:25 +08:00
Evan Lezar
643b89e539 Add driver.Config
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-25 20:18:15 +02:00
Evan Lezar
bdfa525a75 Merge pull request #427 from elezar/add-functional-options-to-driver-root
Use functional options to construct driver root
2024-03-25 20:17:54 +02:00
Evan Lezar
93763d25f0 Use functional options to construct driver root
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-25 20:13:25 +02:00
Evan Lezar
5800e55027 Rename build folder deployments
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-19 14:00:24 +02:00
Evan Lezar
c572c3b787 Remove lint-internal make target
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-19 13:54:16 +02:00
Evan Lezar
3f7ed7c8db Rename golangci-lint target to lint
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-19 13:54:16 +02:00
Evan Lezar
cc6cbd4a89 Use versions.mk GOLANG version in CI
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-19 13:54:16 +02:00
Evan Lezar
98ad835a77 Add vendor and check-vendor make targets
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-19 13:48:58 +02:00
Evan Lezar
3a1ac85020 Bump golang version to 1.22.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-19 13:46:47 +02:00
Evan Lezar
1ddc859700 Merge pull request #417 from elezar/bump-version-v1.15.0-rc.4
Bump version v1.15.0 rc.4
2024-03-19 10:47:10 +02:00
Evan Lezar
f1f629674e Bump CUDA base image to 12.3.2
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-19 10:37:36 +02:00
Evan Lezar
5a6bf02914 Bump version to v1.15.0-rc.4
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-19 10:36:02 +02:00
Evan Lezar
197cbbe0c6 Merge pull request #411 from elezar/bump-go-nvlib
Bump go-nvlib to v0.2.0  and go-nvml v0.12.0-3
2024-03-15 15:12:01 +02:00
Evan Lezar
b9abb44613 Bump go-nvlib to v0.2.0 and go-nvml v0.12.0-3
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-15 15:09:12 +02:00
Evan Lezar
c4ec4a01f8 Merge pull request #384 from tariq1890/upd-go
Fix dockerfile arch selection
2024-03-15 08:58:27 +02:00
Tariq Ibrahim
f40f4369a1 Fix dockerfile arch selection
This change includes an arm64 arch check when installing golang.

Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-15 08:54:00 +02:00
Evan Lezar
2733661125 Merge pull request #383 from elezar/fix-golangci-lint-in-shell
Add GOLANGCI_LINT_CACHE to docker make targets
2024-03-15 08:51:29 +02:00
Evan Lezar
4806f6e70d Merge pull request #395 from elezar/add-nvidia-visible-devices-void
Add NVIDIA_VISIBLE_DEVICES=void to CDI specs
2024-03-13 10:00:38 +02:00
Evan Lezar
db21f5f9a8 Merge pull request #400 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.18.0
Bump golang.org/x/sys from 0.17.0 to 0.18.0
2024-03-12 15:16:05 +02:00
Evan Lezar
07443a0e86 Merge pull request #405 from NVIDIA/dependabot/submodules/main/third_party/libnvidia-container-6c8f1df
Bump third_party/libnvidia-container from `86f1946` to `6c8f1df`
2024-03-12 15:15:32 +02:00
dependabot[bot]
675db67ebb Bump third_party/libnvidia-container from 86f1946 to 6c8f1df
Bumps [third_party/libnvidia-container](https://github.com/NVIDIA/libnvidia-container) from `86f1946` to `6c8f1df`.
- [Release notes](https://github.com/NVIDIA/libnvidia-container/releases)
- [Commits](86f19460fb...6c8f1df7fd)

---
updated-dependencies:
- dependency-name: third_party/libnvidia-container
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-12 12:10:07 +00:00
Evan Lezar
14ecacf6d1 Merge pull request #403 from elezar/add-github-submodule-update
Add dependabot config to update libnvidia-container submodule
2024-03-12 13:47:17 +02:00
Evan Lezar
9451da1e6d Add dependabot config to update libnvidia-container submodule
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-12 13:45:04 +02:00
Evan Lezar
e30ddb398f Merge pull request #397 from jbujak/enumAdapters3
Use D3DKMTEnumAdapters3 for adapter enumeration
2024-03-12 13:33:25 +02:00
dependabot[bot]
375188495e Bump golang.org/x/sys from 0.17.0 to 0.18.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.17.0 to 0.18.0.
- [Commits](https://github.com/golang/sys/compare/v0.17.0...v0.18.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-12 11:14:05 +00:00
Evan Lezar
5ff48a5a89 Merge pull request #401 from NVIDIA/dependabot/go_modules/main/golang.org/x/mod-0.16.0
Bump golang.org/x/mod from 0.15.0 to 0.16.0
2024-03-12 13:13:24 +02:00
Jakub Bujak
44ae31d101 Use D3DKMTEnumAdapters3 for adapter enumeration
D3DKMTEnumAdapters3 is required to enumerate MCDM compute-only adapters

Signed-off-by: Jakub Bujak <jbujak@nvidia.com>
2024-03-12 11:42:08 +01:00
dependabot[bot]
942e5c7224 Bump golang.org/x/mod from 0.15.0 to 0.16.0
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.15.0 to 0.16.0.
- [Commits](https://github.com/golang/mod/compare/v0.15.0...v0.16.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-10 08:26:44 +00:00
Evan Lezar
88ad42ccd1 Add NVIDIA_VISIBLE_DEVICES=void to CDI specs
This change ensures taht NVIDIA_VISIBLE_DEVICES=void is included in
generated CDI specs. This prevents the NVIDIA Container Runtime Hook
from injecting devices if NVIDIA_VISIBLE_DEVICES=all is set.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-04 16:10:06 +02:00
Evan Lezar
df9732dae4 Merge pull request #369 from NVIDIA/dependabot/go_modules/main/github.com/opencontainers/runtime-spec-1.2.0
Bump github.com/opencontainers/runtime-spec from 1.1.0 to 1.2.0
2024-03-01 22:49:29 +02:00
Evan Lezar
e66cc6a7b1 Merge pull request #392 from elezar/fix-non-fork-image-builds
Allow finer control of pushing images
2024-03-01 22:49:01 +02:00
dependabot[bot]
8f5a9a1918 Bump github.com/opencontainers/runtime-spec from 1.1.0 to 1.2.0
Bumps [github.com/opencontainers/runtime-spec](https://github.com/opencontainers/runtime-spec) from 1.1.0 to 1.2.0.
- [Release notes](https://github.com/opencontainers/runtime-spec/releases)
- [Changelog](https://github.com/opencontainers/runtime-spec/blob/main/ChangeLog)
- [Commits](https://github.com/opencontainers/runtime-spec/compare/v1.1.0...v1.2.0)

---
updated-dependencies:
- dependency-name: github.com/opencontainers/runtime-spec
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-01 22:42:29 +02:00
Evan Lezar
6b9dee5b77 Allow finer control of pushing images
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-01 22:42:13 +02:00
Evan Lezar
50bbf32cf0 Add GOLANGCI_LINT_CACHE to docker make targets
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-01 22:27:06 +02:00
Evan Lezar
413c1264ce Merge pull request #391 from elezar/add-lint-excludes-for-prestart
Add lint exclude for hooks.Prestart deprecation
2024-03-01 22:20:54 +02:00
Evan Lezar
c084756e48 Add lint exclude for hooks.Prestart deprecation
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-01 22:18:27 +02:00
Evan Lezar
6265a2d89e Merge pull request #389 from NVIDIA/dependabot/go_modules/main/github.com/stretchr/testify-1.9.0
Bump github.com/stretchr/testify from 1.8.4 to 1.9.0
2024-03-01 22:16:05 +02:00
dependabot[bot]
72778ee536 Bump github.com/stretchr/testify from 1.8.4 to 1.9.0
Bumps [github.com/stretchr/testify](https://github.com/stretchr/testify) from 1.8.4 to 1.9.0.
- [Release notes](https://github.com/stretchr/testify/releases)
- [Commits](https://github.com/stretchr/testify/compare/v1.8.4...v1.9.0)

---
updated-dependencies:
- dependency-name: github.com/stretchr/testify
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-01 15:45:26 +00:00
Evan Lezar
2f11a190bf Merge pull request #388 from elezar/add-gh-pages-dependabot
Add dependabot for gh-pages branches
2024-03-01 17:05:02 +02:00
Evan Lezar
2d394f4624 Add dependabot for gh-pages branches
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-01 16:55:14 +02:00
Evan Lezar
ea55757bc3 Merge pull request #382 from elezar/remove-centos7-image
Remove centos7 container-toolkit image
2024-02-29 12:01:26 +02:00
Evan Lezar
2a620dc845 Merge pull request #387 from elezar/update-libnvidia-container
Update libnvidia-container
2024-02-29 12:00:34 +02:00
Evan Lezar
bad5369760 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-29 11:53:51 +02:00
Evan Lezar
2623e8a707 Allow finer control of pushing images
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-29 11:45:40 +02:00
Evan Lezar
05dd438489 Remove centos7 container-toolkit image
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-28 15:45:09 +02:00
Evan Lezar
6780afbed1 Merge pull request #379 from NVIDIA/exists-fallback
[R550 driver support] add fallback logic to device.Exists(name)
2024-02-27 22:25:33 +02:00
Tariq Ibrahim
f80f4c485d [R550 driver support] add fallback logic to device.Exists(name)
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
2024-02-27 11:59:35 -08:00
Evan Lezar
ac63063362 Merge pull request #375 from klueska/add-imex-support
Add imex support
2024-02-27 13:24:07 +02:00
Kevin Klues
761a425e0d Update reference to latest libnvidia-container
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2024-02-27 10:17:32 +01:00
Kevin Klues
296d4560b0 Add support for an NVIDIA_IMEX_CHANNELS envvar
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2024-02-26 20:09:43 +01:00
Evan Lezar
0409824106 Merge pull request #370 from elezar/remove-libnvidia-container0-dependency
Remove additional libnvidia-container0 dependency
2024-02-19 13:56:02 +01:00
Evan Lezar
562addc3c6 Remove additional libnvidia-container0 dependency
This change removes the additional libnvidia-container0=0.10.0+jetpack dependency
that was introduced for Tegra-based systems. These have since been migrated to
CDI-based direct injection using the NVIDIA Container Runtime.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-19 13:12:57 +01:00
Evan Lezar
ae82b2d9b6 Merge pull request #345 from elezar/allow-skip-of-device-nodes
Add --create-device-nodes option to toolkit config
2024-02-14 20:28:15 +01:00
Evan Lezar
355997d2d6 Merge pull request #314 from elezar/CNT-4032/mulitple-naming-strategies
Allow multiple naming strategies when generating CDI specification
2024-02-13 16:46:54 +01:00
Evan Lezar
b6efd3091d Use index and uuid as default device-name-strategies
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-13 16:38:18 +01:00
Evan Lezar
52da12cf9a Allow multiple device name strategies to be specified
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-13 16:38:05 +01:00
Evan Lezar
cd7d586afa Also ignore CDI errors if required
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-13 12:37:41 +01:00
Evan Lezar
cc4c2783a3 Add --create-device-nodes option to toolkit config
This change adds a --create-device-nodes option to the toolkit config CLI.
Most noteably, this allows the creation of control devices to be skipped
when CDI spec generation is enabled.

Currently values of "", "node", and "control" are supported and can be set
via the command line flag or the CREATE_DEVICE_NODES environment variable.

The default value of CREATE_DEVICE_NODES=control will trigger the creation
of control device nodes. Setting this envvar to include the (comma-separated)
strings of "" or "none" will disable device node creation regardless of
whether other supported strings are included.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-13 12:37:41 +01:00
Evan Lezar
a8d48808d7 Merge pull request #354 from elezar/add-release-1.14-dependabot
Add dependabot for release-1.14
2024-02-12 17:34:38 +01:00
Evan Lezar
aa724f1ac6 Add dependabot for release-1.14
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-12 17:32:18 +01:00
Evan Lezar
519b9f3cc8 Merge pull request #353 from elezar/update-changelog
Update changelog for #330
2024-02-12 17:29:06 +01:00
Evan Lezar
6e1bc0d7fb Update changelog for #330
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-12 17:28:40 +01:00
Evan Lezar
a2a1a78620 Merge pull request #330 from tariq1890/nvidia-dev-maj-num-lookup
add fallback logic when retrieving major number of the nvidia control device
2024-02-12 17:22:32 +01:00
Evan Lezar
ab7693ac9f Merge pull request #347 from NVIDIA/dependabot/go_modules/main/golang.org/x/mod-0.15.0
Bump golang.org/x/mod from 0.14.0 to 0.15.0
2024-02-12 15:47:01 +01:00
dependabot[bot]
f4df5308d0 Bump golang.org/x/mod from 0.14.0 to 0.15.0
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.14.0 to 0.15.0.
- [Commits](https://github.com/golang/mod/compare/v0.14.0...v0.15.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-12 13:35:09 +00:00
Evan Lezar
8dcc57c614 Merge pull request #348 from NVIDIA/dependabot/go_modules/main/github.com/sirupsen/logrus-1.9.3
Bump github.com/sirupsen/logrus from 1.9.0 to 1.9.3
2024-02-12 14:35:07 +01:00
Evan Lezar
6594f06e9a Merge pull request #349 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.17.0
Bump golang.org/x/sys from 0.16.0 to 0.17.0
2024-02-12 14:34:27 +01:00
Evan Lezar
8a706a97a0 Merge pull request #350 from NVIDIA/dependabot/github_actions/golangci/golangci-lint-action-4
Bump golangci/golangci-lint-action from 3 to 4
2024-02-12 14:33:43 +01:00
Evan Lezar
39f0bf21ce Merge pull request #346 from elezar/consistent-driver-root
Specify DRIVER_ROOT consistently
2024-02-12 14:33:15 +01:00
dependabot[bot]
0915a12e38 Bump golangci/golangci-lint-action from 3 to 4
Bumps [golangci/golangci-lint-action](https://github.com/golangci/golangci-lint-action) from 3 to 4.
- [Release notes](https://github.com/golangci/golangci-lint-action/releases)
- [Commits](https://github.com/golangci/golangci-lint-action/compare/v3...v4)

---
updated-dependencies:
- dependency-name: golangci/golangci-lint-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-12 08:13:37 +00:00
dependabot[bot]
e6cd897cc4 Bump golang.org/x/sys from 0.16.0 to 0.17.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.16.0 to 0.17.0.
- [Commits](https://github.com/golang/sys/compare/v0.16.0...v0.17.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-11 08:49:59 +00:00
dependabot[bot]
35600910e0 Bump github.com/sirupsen/logrus from 1.9.0 to 1.9.3
Bumps [github.com/sirupsen/logrus](https://github.com/sirupsen/logrus) from 1.9.0 to 1.9.3.
- [Release notes](https://github.com/sirupsen/logrus/releases)
- [Changelog](https://github.com/sirupsen/logrus/blob/master/CHANGELOG.md)
- [Commits](https://github.com/sirupsen/logrus/compare/v1.9.0...v1.9.3)

---
updated-dependencies:
- dependency-name: github.com/sirupsen/logrus
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-11 08:49:55 +00:00
Evan Lezar
f89cef307d Specify DRIVER_ROOT consistently
This change ensures that CLI tools that require the path to the
driver root accept both the NVIDIA_DRIVER_ROOT and DRIVER_ROOT
environment variables in addition to the --driver-root command
line argument.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-09 14:28:56 +01:00
Evan Lezar
e96edb3f36 Merge pull request #342 from elezar/add-spec-dirs-to-cdi-list
Add spec-dir flag to nvidia-ctk cdi list command
2024-02-09 14:17:45 +01:00
Evan Lezar
bab4ec30af Improve error reporting for cdi list
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-08 14:58:48 +01:00
Evan Lezar
b6ab444529 Add spec-dirs argument to cdi list
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-08 14:50:14 +01:00
Evan Lezar
15d905def0 Merge pull request #309 from NVIDIA/dependabot/go_modules/main/github.com/urfave/cli/v2-2.27.1
Bump github.com/urfave/cli/v2 from 2.3.0 to 2.27.1
2024-02-07 11:01:46 +01:00
Evan Lezar
e64b723b71 Add proc.devices.New constructor
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-06 11:19:43 +01:00
Evan Lezar
f0545dd979 Merge pull request #333 from elezar/test-on-darwin
Fix build and tests targets on darwin
2024-02-06 10:16:04 +01:00
Tariq Ibrahim
f414ac2865 add fallback logic when retrieving major number of the nvidia control device
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
2024-02-05 22:55:54 -08:00
Evan Lezar
772cf77dcc Fix build and test on darwin
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-05 23:58:28 +01:00
dependabot[bot]
026055a0b7 Bump github.com/urfave/cli/v2 from 2.3.0 to 2.27.1
Bumps [github.com/urfave/cli/v2](https://github.com/urfave/cli) from 2.3.0 to 2.27.1.
- [Release notes](https://github.com/urfave/cli/releases)
- [Changelog](https://github.com/urfave/cli/blob/main/docs/CHANGELOG.md)
- [Commits](https://github.com/urfave/cli/compare/v2.3.0...v2.27.1)

---
updated-dependencies:
- dependency-name: github.com/urfave/cli/v2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-05 15:04:18 +00:00
Evan Lezar
812e6a2402 Merge pull request #308 from NVIDIA/dependabot/go_modules/main/github.com/pelletier/go-toml-1.9.5
Bump github.com/pelletier/go-toml from 1.9.4 to 1.9.5
2024-02-05 16:03:36 +01:00
Evan Lezar
b56aebb26f Merge pull request #310 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.16.0
Bump golang.org/x/sys from 0.7.0 to 0.16.0
2024-02-05 10:39:58 +01:00
dependabot[bot]
870903e03e Bump golang.org/x/sys from 0.7.0 to 0.16.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.7.0 to 0.16.0.
- [Commits](https://github.com/golang/sys/compare/v0.7.0...v0.16.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-05 09:33:36 +00:00
dependabot[bot]
b233a8b6ba Bump github.com/pelletier/go-toml from 1.9.4 to 1.9.5
Bumps [github.com/pelletier/go-toml](https://github.com/pelletier/go-toml) from 1.9.4 to 1.9.5.
- [Release notes](https://github.com/pelletier/go-toml/releases)
- [Changelog](https://github.com/pelletier/go-toml/blob/v2/.goreleaser.yaml)
- [Commits](https://github.com/pelletier/go-toml/compare/v1.9.4...v1.9.5)

---
updated-dependencies:
- dependency-name: github.com/pelletier/go-toml
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-05 09:32:35 +00:00
Evan Lezar
e96e1baed5 Merge pull request #336 from NVIDIA/dependabot/go_modules/main/github.com/fsnotify/fsnotify-1.7.0
Bump github.com/fsnotify/fsnotify from 1.5.4 to 1.7.0
2024-02-05 10:31:45 +01:00
dependabot[bot]
dce368e308 Bump github.com/fsnotify/fsnotify from 1.5.4 to 1.7.0
Bumps [github.com/fsnotify/fsnotify](https://github.com/fsnotify/fsnotify) from 1.5.4 to 1.7.0.
- [Release notes](https://github.com/fsnotify/fsnotify/releases)
- [Changelog](https://github.com/fsnotify/fsnotify/blob/main/CHANGELOG.md)
- [Commits](https://github.com/fsnotify/fsnotify/compare/v1.5.4...v1.7.0)

---
updated-dependencies:
- dependency-name: github.com/fsnotify/fsnotify
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-04 08:26:11 +00:00
Evan Lezar
15f609a52d Merge pull request #332 from elezar/filter-actions
Enable a subset of package and image builds in PRs
2024-02-02 11:34:10 +01:00
Evan Lezar
0bf08085ce Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-02 09:21:54 +01:00
Evan Lezar
da68ad393c Enable subset of image builds in PRs
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-02 09:20:05 +01:00
Evan Lezar
2f3600af9a Merge pull request #329 from elezar/migrate-libnvidia-container-to-github
Update libnvidia-container to github ref
2024-02-01 17:08:55 +01:00
Evan Lezar
0ff28aa21b Update libnvidia-container to github ref
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-01 16:36:10 +01:00
Evan Lezar
b88ff4470c Merge pull request #328 from NVIDIA/revert-323-sheckler/cuda-12.3.2
Revert "chore: Update CUDA base image to 12.3.2"
2024-02-01 16:28:59 +01:00
Evan Lezar
cfb1daee0a Revert "chore: Update CUDA base image to 12.3.2" 2024-02-01 16:27:53 +01:00
Evan Lezar
e3ab55beed Merge pull request #323 from heckler1/sheckler/cuda-12.3.2
chore: Update CUDA base image to 12.3.2
2024-02-01 14:02:33 +01:00
Evan Lezar
9530d9949f Merge pull request #325 from elezar/remove-jenkinsfile
Remove unneeded Jenkinsfile
2024-02-01 08:41:41 +01:00
Evan Lezar
6b2cd487a6 Remove unneeded Jenkinsfile
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-31 22:05:01 +01:00
Stephen Heckler
e5ec408a5c Release 1.15.0-rc4
Signed-off-by: Stephen Heckler <sheckler@cloudflare.com>
2024-01-31 13:49:31 -06:00
Stephen Heckler
301b666790 chore: Update CUDA base image to 12.3.2
Signed-off-by: Stephen Heckler <sheckler@cloudflare.com>
2024-01-31 13:48:05 -06:00
Evan Lezar
e99b519509 Merge pull request #321 from elezar/bump-libnvidia-container
Update libnvidia-container
2024-01-30 16:44:01 +01:00
Evan Lezar
d123273800 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-30 16:11:57 +01:00
Evan Lezar
07136d9ac4 Merge pull request #312 from NVIDIA/dependabot/go_modules/main/golang.org/x/mod-0.14.0
Bump golang.org/x/mod from 0.5.0 to 0.14.0
2024-01-30 16:04:48 +01:00
Evan Lezar
0ef06be477 Merge pull request #320 from elezar/remove-centos8-jobs
Remove centos8 jobs
2024-01-30 16:03:30 +01:00
Evan Lezar
5a70e75547 Use stable/rpm repo for release tests
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-30 12:40:57 +01:00
Evan Lezar
46b4cd7b03 Remove unused centos8 jobs
This change removes the centos8-x86_64 and centos8-aarch64 pipeline jobs.

These packages are no longer used since centos7 packages are used instead.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-30 12:40:16 +01:00
Evan Lezar
93e15bc641 Merge pull request #319 from elezar/bump-version-v1.15.0-rc.3
Bump version to v1.15.0-rc.3
2024-01-30 11:41:17 +01:00
Evan Lezar
07d1f48778 Bump version to v1.15.0-rc.3
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-30 11:18:17 +01:00
Carlos Eduardo Arango Gutierrez
21ed60bc46 Merge pull request #313 from elezar/fix-update-ldconfig
Fix bug in update-ldcache hook
2024-01-29 18:24:33 +01:00
dependabot[bot]
7abbd98ff0 Bump golang.org/x/mod from 0.5.0 to 0.14.0
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.5.0 to 0.14.0.
- [Commits](https://github.com/golang/mod/compare/v0.5.0...v0.14.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-01-29 14:20:35 +00:00
Evan Lezar
862f071557 Fix bug in update-ldcache hook
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-29 15:20:21 +01:00
Carlos Eduardo Arango Gutierrez
73cd63e4e5 Merge pull request #307 from ArangoGutierrez/githubactions
Add GitHub Actions
2024-01-29 15:18:32 +01:00
Carlos Eduardo Arango Gutierrez
6857f538a6 Add github actions
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-01-29 14:32:02 +01:00
Carlos Eduardo Arango Gutierrez
195e3a22b4 Drop go checks from gitlab ci
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-01-28 18:23:20 +01:00
Carlos Eduardo Arango Gutierrez
88debb8e34 Use test-infra devel image
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-01-28 18:22:51 +01:00
Christopher Desiniotis
03cbf9c6cd Merge branch 'CNT-4798/gdrcopy' into 'main'
Add a new gated modifier for GDRCopy which injects the gdrdrv device node

See merge request nvidia/container-toolkit/container-toolkit!530
2024-01-24 23:17:15 +00:00
Christopher Desiniotis
55097b3d7d Add a new gated modifier for GDRCopy which injects the gdrdrv device node
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2024-01-24 14:25:58 -08:00
Evan Lezar
738ebd83d3 Merge branch 'fix-cdi-enable-docker' into 'main'
Fix --cdi.enabled for Docker

See merge request nvidia/container-toolkit/container-toolkit!541
2024-01-23 20:16:12 +00:00
Tariq Ibrahim
9c6dd94ac8 Merge branch 'disc-typos' into 'main'
fix minor typos and rm unused logger param

See merge request nvidia/container-toolkit/container-toolkit!540
2024-01-23 18:42:21 +00:00
Evan Lezar
f936f4c0bc Add --cdi.enable as alias for --cdi.enabled
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-23 14:57:15 +01:00
Evan Lezar
ab598f004d Fix --cdi.enabled for Docker
Instead of relying only on Experimental mode, the docker daemon
config requires that CDI is an opt-in feature.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-23 14:56:08 +01:00
Tariq Ibrahim
9c1f0bb08b fix minor typos and rm unused logger param
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
2024-01-22 16:48:11 -08:00
Evan Lezar
b3519fadc4 Merge branch 'ldconfig-path' into 'main'
Allow for customizing the path to ldconfig and call ldconfig with explicit cache/config paths

See merge request nvidia/container-toolkit/container-toolkit!525
2024-01-22 12:36:51 +00:00
Jared Baur
d80657dd0a Explicitly set ldconfig cache and config file
Since the `update-ldcache` hook uses the host's `ldconfig`, the default
cache and config files configured on the host will be used. If those
defaults differ from what nvidia-ctk expects it to be (/etc/ld.so.cache
and /etc/ld.so.conf, respectively), then the hook will fail. This change
makes the call to ldconfig explicit in which cache and config files are
being used.

Signed-off-by: Jared Baur <jaredbaur@fastmail.com>
2024-01-18 02:23:27 -08:00
Jared Baur
838493b8b9 Allow for customizing the path to ldconfig
Since the `createContainer` `runc` hook runs with the environment that
the container's config.json specifies, the path to `ldconfig` may not be
easily resolvable if the host environment differs enough from the
container (e.g. on a NixOS host where all binaries are under hashed
paths in /nix/store with an Ubuntu container whose PATH contains
FHS-style paths such as /bin and /usr/bin). This change allows for
specifying exactly where ldconfig comes from.

Signed-off-by: Jared Baur <jaredbaur@fastmail.com>
2024-01-17 21:07:00 -08:00
Evan Lezar
26a4eb327c Merge branch 'add-crun-as-configured-runtime' into 'main'
Set default low-level runtimes to runc, crun

See merge request nvidia/container-toolkit/container-toolkit!536
2024-01-17 21:28:14 +00:00
Evan Lezar
f6c252cbde Add crun as a default low-level runtime.
This change adds crun as a configured low-level runtime.
Note that runc still preferred and will be used if present on the
system.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-17 11:31:07 +01:00
Evan Lezar
11692a8499 Merge branch 'bump-cuda-12.3.1' into 'main'
Bump CUDA base image to 12.3.1

See merge request nvidia/container-toolkit/container-toolkit!535
2024-01-11 14:03:32 +00:00
Evan Lezar
ba3d80e8ea Merge branch 'fix-user-group' into 'main'
Fix bug in determining CLI user on SUSE systems

See merge request nvidia/container-toolkit/container-toolkit!532
2024-01-11 13:19:25 +00:00
Evan Lezar
9c029cac72 Fix bug in determining CLI user on SUSE systems
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-11 13:54:40 +01:00
Evan Lezar
dd065fa69e Bump CUDA base image to 12.3.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-11 10:35:54 +01:00
Evan Lezar
6f3d9307bb Merge branch 'update-libnvidia-container' into 'main'
Update libnvidia-container to a4ef85eb

See merge request nvidia/container-toolkit/container-toolkit!533
2024-01-10 14:14:39 +00:00
Evan Lezar
72584cd863 Update libnvidia-container to a4ef85eb
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-10 14:29:03 +01:00
Evan Lezar
2a7bfcd36b Merge branch 'fix-mid-device-nodes' into 'main'
Use devRoot to resolve MIG device nodes

See merge request nvidia/container-toolkit/container-toolkit!526
2024-01-09 14:58:19 +00:00
Evan Lezar
21fc1f24e4 Use devRoot to resolve MIG device nodes
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-09 15:40:17 +01:00
Evan Lezar
9396858834 Merge branch 'libnvdxgdmal' into 'main'
Add libnvdxgdmal library

See merge request nvidia/container-toolkit/container-toolkit!529
2024-01-09 14:37:08 +00:00
Jakub Bujak
79acd7acff Add libnvdxgdmal library
This change adds the new libnvdxgdmal.so.1 library to the list of files copied from the DriverStore.

Signed-off-by: Jakub Bujak <jbujak@nvidia.com>
2024-01-09 15:29:55 +01:00
Evan Lezar
fab711ddf3 Merge branch 'remove-libseccomp-dependency' into 'main'
Remove libseccomp package dependency

See merge request nvidia/container-toolkit/container-toolkit!531
2024-01-09 09:44:23 +00:00
Evan Lezar
760cf93317 Remove libseccomp package dependency
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-09 10:02:06 +01:00
Evan Lezar
f4838dde9b Merge branch 'log-requested-mode' into 'main'
Log explicitly requested runtime mode

See merge request nvidia/container-toolkit/container-toolkit!527
2024-01-08 11:42:42 +00:00
Evan Lezar
c90211e070 Log explicitly requested runtime mode
For users running the nvidia-container-runtime it would be useful
to determine the runtime mode used from the logs directly instead
of relying on other log messages as signals. This change ensures
that an explicitly selected mode is also logged instead of only
when mode=auto.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-12-15 15:35:35 +01:00
Evan Lezar
a2262d00cc Merge branch 'tegra-chardev-discover' into 'main'
Use `devRoot` for discovering character devices on Tegra platforms

See merge request nvidia/container-toolkit/container-toolkit!524
2023-12-15 14:34:07 +00:00
Jared Baur
95b8ebc297 Use devRoot for discovering character devices on Tegra platforms
Signed-off-by: Jared Baur <jaredbaur@fastmail.com>
2023-12-14 11:46:21 -08:00
Evan Lezar
99b3050d20 Merge branch 'update-changelog' into 'main'
Update changelog

See merge request nvidia/container-toolkit/container-toolkit!523
2023-12-14 15:35:45 +00:00
Evan Lezar
883f7ec3d8 Update changelog
See merge request nvidia/container-toolkit/container-toolkit!522

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-12-14 13:47:38 +01:00
Evan Lezar
9dd324be9c Merge branch 'tegra-dev-root' into 'main'
Fix using `devRoot` on Tegra platforms

See merge request nvidia/container-toolkit/container-toolkit!522
2023-12-14 09:46:05 +00:00
Jared Baur
508438a0c5 Fix using devRoot on Tegra platforms
Using `WithDevRoot` on Tegra platforms was incorrectly setting
`driverRoot`, fix it so that it correctly sets `devRoot`.

Signed-off-by: Jared Baur <jaredbaur@fastmail.com>
2023-12-13 19:56:02 -08:00
Christopher Desiniotis
9baed635d1 Merge branch 'CNT-4778/bump-gonvlib' into 'main'
Update to github.com/NVIDIA/go-nvlib@f3264c8a6a7a

See merge request nvidia/container-toolkit/container-toolkit!520
2023-12-13 18:11:59 +00:00
Christopher Desiniotis
895a5ed73a Update to github.com/NVIDIA/go-nvlib@f3264c8a6a7a
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-13 10:08:14 -08:00
Christopher Desiniotis
2d7b126bc9 Merge branch 'CNT-4762/extend-runtime-cdi-device-names' into 'main'
Extend the 'runtime.nvidia.com/gpu' CDI device kind to support full-GPUs...

See merge request nvidia/container-toolkit/container-toolkit!514
2023-12-06 17:10:26 +00:00
Christopher Desiniotis
86d86395ea Update changelog for the automatic CDI spec generation added for the 'runtime.nvidia.com/gpu' CDI kind
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-06 09:09:10 -08:00
Christopher Desiniotis
32c3bd1ded Fallback to standard CDI modifier when creation of automatic CDI modifier fails
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-06 09:02:19 -08:00
Christopher Desiniotis
3158146946 Extend the 'runtime.nvidia.com/gpu' CDI device kind to support MIG devices specified by index or UUID
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-06 09:02:19 -08:00
Christopher Desiniotis
def7d09f85 Refactor how device identifiers are parsed before performing automatic CDI spec generation
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-06 09:02:19 -08:00
Christopher Desiniotis
b9ac54b922 Add GetDeviceSpecsByID() API to the nvcdi Interface
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-06 09:02:19 -08:00
Christopher Desiniotis
ae1b7e126c Extend the 'runtime.nvidia.com/gpu' CDI device kind to support full-GPUs specified by index or UUID
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-06 09:02:19 -08:00
Evan Lezar
08ef3e7969 Merge branch 'bump-version-1.15.0-rc.2' into 'main'
Bump version to v1.15.0-rc.2

See merge request nvidia/container-toolkit/container-toolkit!519
2023-12-06 16:45:29 +00:00
Evan Lezar
ea977fb43e Bump version to v1.15.0-rc.2
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-12-06 17:45:08 +01:00
Evan Lezar
7b47eee634 Merge branch 'CNT-4774/implement-set-for-crio' into 'main'
Implement Set() for the crio implementation of engine.Interface

See merge request nvidia/container-toolkit/container-toolkit!517
2023-12-05 09:30:42 +00:00
Christopher Desiniotis
d7a3d93024 Implement Set() for the crio implementation of engine.Interface
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-04 15:27:35 -08:00
Christopher Desiniotis
527248ef5b Merge branch 'CNT-4764/cleanup-engine-interface' into 'main'
Refactor the engine.Interface such that the Set() API does not return an extraneous error

See merge request nvidia/container-toolkit/container-toolkit!515
2023-12-04 23:05:30 +00:00
Christopher Desiniotis
83ad09b179 Refactor the engine.Interface such that the Set() API does not return an extraneous error
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-01 15:59:34 -08:00
663 changed files with 56489 additions and 20505 deletions

View File

@@ -19,7 +19,6 @@ default:
variables:
GIT_SUBMODULE_STRATEGY: recursive
BUILDIMAGE: "${CI_REGISTRY_IMAGE}/build:${CI_COMMIT_SHORT_SHA}"
BUILD_MULTI_ARCH_IMAGES: "true"
stages:
@@ -145,7 +144,7 @@ trigger-pipeline:
- docker login -u "${CI_REGISTRY_USER}" -p "${CI_REGISTRY_PASSWORD}" "${CI_REGISTRY}"
- docker pull "${IMAGE_NAME}:${VERSION}-${DIST}"
script:
- make -f build/container/Makefile test-${DIST}
- make -f deployments/container/Makefile test-${DIST}
# Define the test targets
test-packaging:
@@ -195,7 +194,7 @@ test-packaging:
# Since OUT_IMAGE_NAME and OUT_IMAGE_VERSION are set, this will push the CI image to the
# Target
- make -f build/container/Makefile push-${DIST}
- make -f deployments/container/Makefile push-${DIST}
# Define a staging release step that pushes an image to an internal "staging" repository
# This is triggered for all pipelines (i.e. not only tags) to test the pipeline steps
@@ -225,13 +224,6 @@ test-packaging:
OUT_IMAGE_VERSION: "${DEVEL_RELEASE_IMAGE_VERSION}"
# Define the release jobs
release:staging-centos7:
extends:
- .release:staging
- .dist-centos7
needs:
- image-centos7
release:staging-ubi8:
extends:
- .release:staging

56
.github/dependabot.yml vendored Normal file
View File

@@ -0,0 +1,56 @@
# Please see the documentation for all configuration options:
# https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates
version: 2
updates:
- package-ecosystem: "gomod"
target-branch: main
directory: "/"
schedule:
interval: "weekly"
day: "sunday"
ignore:
- dependency-name: k8s.io/*
labels:
- dependencies
- package-ecosystem: "docker"
directory: "/deployments/container"
schedule:
interval: "daily"
- package-ecosystem: "gomod"
# This defines a specific dependabot rule for the latest release-* branch.
target-branch: release-1.14
directory: "/"
schedule:
interval: "weekly"
day: "sunday"
ignore:
- dependency-name: k8s.io/*
labels:
- dependencies
- package-ecosystem: "github-actions"
directory: "/"
schedule:
interval: "daily"
- package-ecosystem: "github-actions"
target-branch: gh-pages
directory: "/"
schedule:
interval: "weekly"
day: "monday"
# Allow dependabot to update the libnvidia-container submodule.
- package-ecosystem: "gitsubmodule"
target-branch: main
directory: "/"
allow:
- dependency-name: "third_party/libnvidia-container"
schedule:
interval: "daily"
labels:
- dependencies
- libnvidia-container

View File

@@ -1,113 +0,0 @@
# Copyright (c) 2020-2023, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# A workflow to trigger ci on hybrid infra (github + self hosted runner)
name: Blossom-CI
on:
issue_comment:
types: [created]
workflow_dispatch:
inputs:
platform:
description: 'runs-on argument'
required: false
args:
description: 'argument'
required: false
jobs:
Authorization:
name: Authorization
runs-on: blossom
outputs:
args: ${{ env.args }}
# This job only runs for pull request comments
if: |
contains( '\
anstockatnv,\
rorajani,\
cdesiniotis,\
shivamerla,\
ArangoGutierrez,\
elezar,\
klueska,\
zvonkok,\
', format('{0},', github.actor)) &&
github.event.comment.body == '/blossom-ci'
steps:
- name: Check if comment is issued by authorized person
run: blossom-ci
env:
OPERATION: 'AUTH'
REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
REPO_KEY_DATA: ${{ secrets.BLOSSOM_KEY }}
Vulnerability-scan:
name: Vulnerability scan
needs: [Authorization]
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v2
with:
repository: ${{ fromJson(needs.Authorization.outputs.args).repo }}
ref: ${{ fromJson(needs.Authorization.outputs.args).ref }}
lfs: 'true'
# repo specific steps
#- name: Setup java
# uses: actions/setup-java@v1
# with:
# java-version: 1.8
# add blackduck properties https://synopsys.atlassian.net/wiki/spaces/INTDOCS/pages/631308372/Methods+for+Configuring+Analysis#Using-a-configuration-file
#- name: Setup blackduck properties
# run: |
# PROJECTS=$(mvn -am dependency:tree | grep maven-dependency-plugin | awk '{ out="com.nvidia:"$(NF-1);print out }' | grep rapids | xargs | sed -e 's/ /,/g')
# echo detect.maven.build.command="-pl=$PROJECTS -am" >> application.properties
# echo detect.maven.included.scopes=compile >> application.properties
- name: Run blossom action
uses: NVIDIA/blossom-action@main
env:
REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
REPO_KEY_DATA: ${{ secrets.BLOSSOM_KEY }}
with:
args1: ${{ fromJson(needs.Authorization.outputs.args).args1 }}
args2: ${{ fromJson(needs.Authorization.outputs.args).args2 }}
args3: ${{ fromJson(needs.Authorization.outputs.args).args3 }}
Job-trigger:
name: Start ci job
needs: [Vulnerability-scan]
runs-on: blossom
steps:
- name: Start ci job
run: blossom-ci
env:
OPERATION: 'START-CI-JOB'
CI_SERVER: ${{ secrets.CI_SERVER }}
REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
Upload-Log:
name: Upload log
runs-on: blossom
if : github.event_name == 'workflow_dispatch'
steps:
- name: Jenkins log for pull request ${{ fromJson(github.event.inputs.args).pr }} (click here)
run: blossom-ci
env:
OPERATION: 'POST-PROCESSING'
CI_SERVER: ${{ secrets.CI_SERVER }}
REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}

76
.github/workflows/golang.yaml vendored Normal file
View File

@@ -0,0 +1,76 @@
# Copyright 2024 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: Golang
on:
pull_request:
types:
- opened
- synchronize
branches:
- main
- release-*
push:
branches:
- main
- release-*
jobs:
check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
name: Checkout code
- name: Get Golang version
id: vars
run: |
GOLANG_VERSION=$( grep "GOLANG_VERSION :=" versions.mk )
echo "GOLANG_VERSION=${GOLANG_VERSION##GOLANG_VERSION := }" >> $GITHUB_ENV
- name: Install Go
uses: actions/setup-go@v5
with:
go-version: ${{ env.GOLANG_VERSION }}
- name: Lint
uses: golangci/golangci-lint-action@v4
with:
version: latest
args: -v --timeout 5m
skip-cache: true
- name: Check golang modules
run: make check-vendor
test:
name: Unit test
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Get Golang version
id: vars
run: |
GOLANG_VERSION=$( grep "GOLANG_VERSION :=" versions.mk )
echo "GOLANG_VERSION=${GOLANG_VERSION##GOLANG_VERSION := }" >> $GITHUB_ENV
- name: Install Go
uses: actions/setup-go@v5
with:
go-version: ${{ env.GOLANG_VERSION }}
- run: make test
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
name: Checkout code
- name: Build
run: make docker-build

138
.github/workflows/image.yaml vendored Normal file
View File

@@ -0,0 +1,138 @@
# Copyright 2024 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Run this workflow on pull requests
name: image
on:
pull_request:
types:
- opened
- synchronize
branches:
- main
- release-*
push:
branches:
- main
- release-*
jobs:
packages:
runs-on: ubuntu-latest
strategy:
matrix:
target:
- ubuntu18.04-arm64
- ubuntu18.04-amd64
- ubuntu18.04-ppc64le
- centos7-aarch64
- centos7-x86_64
- centos8-ppc64le
ispr:
- ${{github.event_name == 'pull_request'}}
exclude:
- ispr: true
target: ubuntu18.04-arm64
- ispr: true
target: ubuntu18.04-ppc64le
- ispr: true
target: centos7-aarch64
- ispr: true
target: centos8-ppc64le
fail-fast: false
steps:
- uses: actions/checkout@v4
name: Check out code
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: build ${{ matrix.target }} packages
run: |
sudo apt-get install -y coreutils build-essential sed git bash make
echo "Building packages"
./scripts/build-packages.sh ${{ matrix.target }}
- name: 'Upload Artifacts'
uses: actions/upload-artifact@v4
with:
compression-level: 0
name: toolkit-container-${{ matrix.target }}-${{ github.run_id }}
path: ${{ github.workspace }}/dist/*
image:
runs-on: ubuntu-latest
strategy:
matrix:
dist:
- ubuntu20.04
- ubi8
- packaging
ispr:
- ${{github.event_name == 'pull_request'}}
exclude:
- ispr: true
dist: ubi8
needs: packages
steps:
- uses: actions/checkout@v4
name: Check out code
- name: Calculate build vars
id: vars
run: |
echo "COMMIT_SHORT_SHA=${GITHUB_SHA:0:8}" >> $GITHUB_ENV
echo "LOWERCASE_REPO_OWNER=$(echo "${GITHUB_REPOSITORY_OWNER}" | awk '{print tolower($0)}')" >> $GITHUB_ENV
REPO_FULL_NAME="${{ github.event.pull_request.head.repo.full_name }}"
echo "${REPO_FULL_NAME}"
echo "LABEL_IMAGE_SOURCE=https://github.com/${REPO_FULL_NAME}" >> $GITHUB_ENV
PUSH_ON_BUILD="false"
BUILD_MULTI_ARCH_IMAGES="false"
if [[ "${{ github.event_name }}" == "pull_request" ]]; then
if [[ "${{ github.actor }}" != "dependabot[bot]" && "${{ github.event.pull_request.head.repo.full_name }}" == "${{ github.repository }}" ]]; then
# For non-fork PRs that are not created by dependabot we do push images
PUSH_ON_BUILD="true"
fi
elif [[ "${{ github.event_name }}" == "push" ]]; then
# On push events we do generate images and enable muilti-arch builds
PUSH_ON_BUILD="true"
BUILD_MULTI_ARCH_IMAGES="true"
fi
echo "PUSH_ON_BUILD=${PUSH_ON_BUILD}" >> $GITHUB_ENV
echo "BUILD_MULTI_ARCH_IMAGES=${BUILD_MULTI_ARCH_IMAGES}" >> $GITHUB_ENV
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Get built packages
uses: actions/download-artifact@v4
with:
path: ${{ github.workspace }}/dist/
pattern: toolkit-container-*-${{ github.run_id }}
merge-multiple: true
- name: Login to GitHub Container Registry
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Build image
env:
IMAGE_NAME: ghcr.io/${LOWERCASE_REPO_OWNER}/container-toolkit
VERSION: ${COMMIT_SHORT_SHA}
run: |
echo "${VERSION}"
make -f deployments/container/Makefile build-${{ matrix.dist }}

View File

@@ -1,22 +0,0 @@
name: Run pre sanity
# run this workflow for each commit
on: [pull_request]
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Build dev image
run: make .build-image
- name: Build
run: make docker-build
- name: Tests
run: make docker-coverage
- name: Checks
run: make docker-check

View File

@@ -15,40 +15,6 @@
include:
- .common-ci.yml
build-dev-image:
stage: image
script:
- apk --no-cache add make bash
- make .build-image
- docker login -u "${CI_REGISTRY_USER}" -p "${CI_REGISTRY_PASSWORD}" "${CI_REGISTRY}"
- make .push-build-image
.requires-build-image:
image: "${BUILDIMAGE}"
needs:
- job: build-dev-image
check:
extends:
- .requires-build-image
stage: go-checks
script:
- make check
go-build:
extends:
- .requires-build-image
stage: go-build
script:
- make build
unit-tests:
extends:
- .requires-build-image
stage: unit-tests
script:
- make coverage
# Define the package build helpers
.multi-arch-build:
before_script:
@@ -110,24 +76,12 @@ package-centos7-x86_64:
- .dist-centos7
- .arch-x86_64
package-centos8-aarch64:
extends:
- .package-build
- .dist-centos8
- .arch-aarch64
package-centos8-ppc64le:
extends:
- .package-build
- .dist-centos8
- .arch-ppc64le
package-centos8-x86_64:
extends:
- .package-build
- .dist-centos8
- .arch-x86_64
package-ubuntu18.04-amd64:
extends:
- .package-build
@@ -172,15 +126,7 @@ package-ubuntu18.04-ppc64le:
- 'echo "Logging in to CI registry ${CI_REGISTRY}"'
- docker login -u "${CI_REGISTRY_USER}" -p "${CI_REGISTRY_PASSWORD}" "${CI_REGISTRY}"
script:
- make -f build/container/Makefile build-${DIST}
image-centos7:
extends:
- .image-build
- .package-artifacts
- .dist-centos7
needs:
- package-centos7-x86_64
- make -f deployments/container/Makefile build-${DIST}
image-ubi8:
extends:
@@ -210,8 +156,6 @@ image-packaging:
- .package-artifacts
- .dist-packaging
needs:
- job: package-centos8-aarch64
- job: package-centos8-x86_64
- job: package-ubuntu18.04-amd64
- job: package-ubuntu18.04-arm64
- job: package-amazonlinux2-aarch64
@@ -288,4 +232,3 @@ test-docker-ubuntu20.04:
- .dist-ubuntu20.04
needs:
- image-ubuntu20.04

4
.gitmodules vendored
View File

@@ -1,4 +1,4 @@
[submodule "third_party/libnvidia-container"]
path = third_party/libnvidia-container
url = https://gitlab.com/nvidia/container-toolkit/libnvidia-container.git
branch = main
url = https://github.com/NVIDIA/libnvidia-container.git
branch = release-1.15

View File

@@ -20,6 +20,9 @@ linters-settings:
local-prefixes: github.com/NVIDIA/nvidia-container-toolkit
issues:
exclude:
# The legacy hook relies on spec.Hooks.Prestart, which is deprecated as of the v1.2.0 OCI runtime spec.
- "SA1019:(.+).Prestart is deprecated(.+)"
exclude-rules:
# Exclude the gocritic dupSubExpr issue for cgo files.
- path: internal/dxcore/dxcore.go

View File

@@ -67,12 +67,7 @@ variables:
regctl manifest get ${IN_REGISTRY}/${IN_IMAGE_NAME}:${IN_VERSION}-${DIST} --list > /dev/null && echo "${IN_REGISTRY}/${IN_IMAGE_NAME}:${IN_VERSION}-${DIST}" || ( echo "${IN_REGISTRY}/${IN_IMAGE_NAME}:${IN_VERSION}-${DIST} does not exist" && sleep infinity )
script:
- regctl registry login "${OUT_REGISTRY}" -u "${OUT_REGISTRY_USER}" -p "${OUT_REGISTRY_TOKEN}"
- make -f build/container/Makefile IMAGE=${IN_REGISTRY}/${IN_IMAGE_NAME}:${IN_VERSION}-${DIST} OUT_IMAGE=${OUT_IMAGE_NAME}:${CI_COMMIT_SHORT_SHA}-${DIST} push-${DIST}
image-centos7:
extends:
- .dist-centos7
- .image-pull
- make -f deployments/container/Makefile IMAGE=${IN_REGISTRY}/${IN_IMAGE_NAME}:${IN_VERSION}-${DIST} OUT_IMAGE=${OUT_IMAGE_NAME}:${CI_COMMIT_SHORT_SHA}-${DIST} push-${DIST}
image-ubi8:
extends:
@@ -132,14 +127,6 @@ image-packaging:
- policy_evaluation.json
# Define the scan targets
scan-centos7-amd64:
extends:
- .dist-centos7
- .platform-amd64
- .scan
needs:
- image-centos7
scan-ubuntu20.04-amd64:
extends:
- .dist-ubuntu20.04
@@ -243,11 +230,6 @@ release:staging-ubuntu20.04:
# Define the external release targets
# Release to NGC
release:ngc-centos7:
extends:
- .dist-centos7
- .release:ngc
release:ngc-ubuntu20.04:
extends:
- .dist-ubuntu20.04

View File

@@ -1,5 +1,49 @@
# NVIDIA Container Toolkit Changelog
## v1.15.0
* Remove `nvidia-container-runtime` and `nvidia-docker2` packages.
* Use `XDG_DATA_DIRS` environment variable when locating config files such as graphics config files.
* Add support for v0.7.0 Container Device Interface (CDI) specification.
* Add `--config-search-path` option to `nvidia-ctk cdi generate` command. These paths are used when locating driver files such as graphics config files.
* Use D3DKMTEnumAdapters3 to enumerate adpaters on WSL2 if available.
* Add support for v1.2.0 OCI Runtime specification.
* Explicitly set `NVIDIA_VISIBLE_DEVICES=void` in generated CDI specifications. This prevents the NVIDIA Container Runtime from making additional modifications.
* [libnvidia-container] Use D3DKMTEnumAdapters3 to enumerate adpaters on WSL2 if available.
* [toolkit-container] Bump CUDA base image version to 12.4.1
## v1.15.0-rc.4
* Add a `--spec-dir` option to the `nvidia-ctk cdi generate` command. This allows specs outside of `/etc/cdi` and `/var/run/cdi` to be processed.
* Add support for extracting device major number from `/proc/devices` if `nvidia` is used as a device name over `nvidia-frontend`.
* Allow multiple device naming strategies for `nvidia-ctk cdi generate` command. This allows a single
CDI spec to be generated that includes GPUs by index and UUID.
* Set the default `--device-name-strategy` for the `nvidia-ctk cdi generate` command to `[index, uuid]`.
* Remove `libnvidia-container0` jetpack dependency included for legacy Tegra-based systems.
* Add `NVIDIA_VISIBLE_DEVICES=void` to generated CDI specifications.
* [toolkit-container] Remove centos7 image. The ubi8 image can be used on all RPM-based platforms.
* [toolkit-container] Bump CUDA base image version to 12.3.2
## v1.15.0-rc.3
* Fix bug in `nvidia-ctk hook update-ldcache` where default `--ldconfig-path` value was not applied.
## v1.15.0-rc.2
* Extend the `runtime.nvidia.com/gpu` CDI kind to support full-GPUs and MIG devices specified by index or UUID.
* Fix bug when specifying `--dev-root` for Tegra-based systems.
* Log explicitly requested runtime mode.
* Remove package dependency on libseccomp.
* Added detection of libnvdxgdmal.so.1 on WSL2
* Use devRoot to resolve MIG device nodes.
* Fix bug in determining default nvidia-container-runtime.user config value on SUSE-based systems.
* Add `crun` to the list of configured low-level runtimes.
* Added support for `--ldconfig-path` to `nvidia-ctk cdi generate` command.
* Fix `nvidia-ctk runtime configure --cdi.enabled` for Docker.
* Add discovery of the GDRCopy device (`gdrdrv`) if the `NVIDIA_GDRCOPY` environment variable of the container is set to `enabled`
* [toolkit-container] Bump CUDA base image version to 12.3.1.
## v1.15.0-rc.1
* Skip update of ldcache in containers without ldconfig. The .so.SONAME symlinks are still created.
* Normalize ldconfig path on use. This automatically adjust the ldconfig setting applied to ldconfig.real on systems where this exists.
@@ -10,6 +54,7 @@
* Added support for `nvidia-ctk runtime configure --enable-cdi` for the `docker` runtime. Note that this requires Docker >= 25.
* Fixed bug in `nvidia-ctk config` command when using `--set`. The types of applied config options are now applied correctly.
* Add `--relative-to` option to `nvidia-ctk transform root` command. This controls whether the root transformation is applied to host or container paths.
* Added automatic CDI spec generation when the `runtime.nvidia.com/gpu=all` device is requested by a container.
* [libnvidia-container] Fix device permission check when using cgroupv2 (fixes #227)

View File

@@ -19,7 +19,7 @@ where `TARGET` is a make target that is valid for each of the sub-components.
These include:
* `ubuntu18.04-amd64`
* `centos8-x86_64`
* `centos7-x86_64`
If no `TARGET` is specified, all valid release targets are built.

142
Jenkinsfile vendored
View File

@@ -1,142 +0,0 @@
/*
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
*/
podTemplate (cloud:'sw-gpu-cloudnative',
containers: [
containerTemplate(name: 'docker', image: 'docker:dind', ttyEnabled: true, privileged: true),
containerTemplate(name: 'golang', image: 'golang:1.16.3', ttyEnabled: true)
]) {
node(POD_LABEL) {
def scmInfo
stage('checkout') {
scmInfo = checkout(scm)
}
stage('dependencies') {
container('golang') {
sh 'GO111MODULE=off go get -u github.com/client9/misspell/cmd/misspell'
sh 'GO111MODULE=off go get -u github.com/gordonklaus/ineffassign'
sh 'GO111MODULE=off go get -u golang.org/x/lint/golint'
}
container('docker') {
sh 'apk add --no-cache make bash git'
}
}
stage('check') {
parallel (
getGolangStages(["assert-fmt", "lint", "vet", "ineffassign", "misspell"])
)
}
stage('test') {
parallel (
getGolangStages(["test"])
)
}
def versionInfo
stage('version') {
container('docker') {
versionInfo = getVersionInfo(scmInfo)
println "versionInfo=${versionInfo}"
}
}
def dist = 'ubuntu20.04'
def arch = 'amd64'
def stageLabel = "${dist}-${arch}"
stage('build-one') {
container('docker') {
stage (stageLabel) {
sh "make ${dist}-${arch}"
}
}
}
stage('release') {
container('docker') {
stage (stageLabel) {
def component = 'main'
def repository = 'sw-gpu-cloudnative-debian-local/pool/main/'
def uploadSpec = """{
"files":
[ {
"pattern": "./dist/${dist}/${arch}/*.deb",
"target": "${repository}",
"props": "deb.distribution=${dist};deb.component=${component};deb.architecture=${arch}"
}
]
}"""
sh "echo starting release with versionInfo=${versionInfo}"
if (versionInfo.isTag) {
// upload to artifactory repository
def server = Artifactory.server 'sw-gpu-artifactory'
server.upload spec: uploadSpec
} else {
sh "echo skipping release for non-tagged build"
}
}
}
}
}
}
def getGolangStages(def targets) {
stages = [:]
for (t in targets) {
stages[t] = getLintClosure(t)
}
return stages
}
def getLintClosure(def target) {
return {
container('golang') {
stage(target) {
sh "make ${target}"
}
}
}
}
// getVersionInfo returns a hash of version info
def getVersionInfo(def scmInfo) {
def versionInfo = [
isTag: isTag(scmInfo.GIT_BRANCH)
]
scmInfo.each { k, v -> versionInfo[k] = v }
return versionInfo
}
def isTag(def branch) {
if (!branch.startsWith('v')) {
return false
}
def version = shOutput('git describe --all --exact-match --always')
return version == "tags/${branch}"
}
def shOuptut(def script) {
return sh(script: script, returnStdout: true).trim()
}

View File

@@ -38,8 +38,8 @@ EXAMPLE_TARGETS := $(patsubst %,example-%, $(EXAMPLES))
CMDS := $(patsubst ./cmd/%/,%,$(sort $(dir $(wildcard ./cmd/*/))))
CMD_TARGETS := $(patsubst %,cmd-%, $(CMDS))
CHECK_TARGETS := golangci-lint
MAKE_TARGETS := binaries build check fmt lint-internal test examples cmds coverage generate licenses $(CHECK_TARGETS)
CHECK_TARGETS := lint
MAKE_TARGETS := binaries build check fmt test examples cmds coverage generate licenses vendor check-vendor $(CHECK_TARGETS)
TARGETS := $(MAKE_TARGETS) $(EXAMPLE_TARGETS) $(CMD_TARGETS)
@@ -53,22 +53,26 @@ CLI_VERSION = $(VERSION)
endif
CLI_VERSION_PACKAGE = github.com/NVIDIA/nvidia-container-toolkit/internal/info
GOOS ?= linux
binaries: cmds
ifneq ($(PREFIX),)
cmd-%: COMMAND_BUILD_OPTIONS = -o $(PREFIX)/$(*)
endif
cmds: $(CMD_TARGETS)
ifneq ($(shell uname),Darwin)
EXTLDFLAGS = -Wl,--export-dynamic -Wl,--unresolved-symbols=ignore-in-object-files
else
EXTLDFLAGS = -Wl,-undefined,dynamic_lookup
endif
$(CMD_TARGETS): cmd-%:
GOOS=$(GOOS) go build -ldflags "-extldflags=-Wl,-z,lazy -s -w -X $(CLI_VERSION_PACKAGE).gitCommit=$(GIT_COMMIT) -X $(CLI_VERSION_PACKAGE).version=$(CLI_VERSION)" $(COMMAND_BUILD_OPTIONS) $(MODULE)/cmd/$(*)
go build -ldflags "-s -w '-extldflags=$(EXTLDFLAGS)' -X $(CLI_VERSION_PACKAGE).gitCommit=$(GIT_COMMIT) -X $(CLI_VERSION_PACKAGE).version=$(CLI_VERSION)" $(COMMAND_BUILD_OPTIONS) $(MODULE)/cmd/$(*)
build:
GOOS=$(GOOS) go build ./...
go build ./...
examples: $(EXAMPLE_TARGETS)
$(EXAMPLE_TARGETS): example-%:
GOOS=$(GOOS) go build ./examples/$(*)
go build ./examples/$(*)
all: check test build binary
check: $(CHECK_TARGETS)
@@ -83,15 +87,23 @@ goimports:
go list -f {{.Dir}} $(MODULE)/... \
| xargs goimports -local $(MODULE) -w
golangci-lint:
lint:
golangci-lint run ./...
vendor:
go mod tidy
go mod vendor
go mod verify
check-vendor: vendor
git diff --quiet HEAD -- go.mod go.sum vendor
licenses:
go-licenses csv $(MODULE)/...
COVERAGE_FILE := coverage.out
test: build cmds
go test -v -coverprofile=$(COVERAGE_FILE) $(MODULE)/...
go test -coverprofile=$(COVERAGE_FILE) $(MODULE)/...
coverage: test
cat $(COVERAGE_FILE) | grep -v "_mock.go" > $(COVERAGE_FILE).no-mocks
@@ -100,31 +112,13 @@ coverage: test
generate:
go generate $(MODULE)/...
# Generate an image for containerized builds
# Note: This image is local only
.PHONY: .build-image .pull-build-image .push-build-image
.build-image: docker/Dockerfile.devel
if [ x"$(SKIP_IMAGE_BUILD)" = x"" ]; then \
$(DOCKER) build \
--progress=plain \
--build-arg GOLANG_VERSION="$(GOLANG_VERSION)" \
--tag $(BUILDIMAGE) \
-f $(^) \
docker; \
fi
.pull-build-image:
$(DOCKER) pull $(BUILDIMAGE)
.push-build-image:
$(DOCKER) push $(BUILDIMAGE)
$(DOCKER_TARGETS): docker-%: .build-image
@echo "Running 'make $(*)' in docker container $(BUILDIMAGE)"
$(DOCKER_TARGETS): docker-%:
@echo "Running 'make $(*)' in container image $(BUILDIMAGE)"
$(DOCKER) run \
--rm \
-e GOCACHE=/tmp/.cache \
-e GOLANGCI_LINT_CACHE=/tmp/.cache \
-e GOCACHE=/tmp/.cache/go \
-e GOMODCACHE=/tmp/.cache/gomod \
-e GOLANGCI_LINT_CACHE=/tmp/.cache/golangci-lint \
-v $(PWD):/work \
-w /work \
--user $$(id -u):$$(id -g) \
@@ -137,8 +131,9 @@ PHONY: .shell
$(DOCKER) run \
--rm \
-ti \
-e GOCACHE=/tmp/.cache \
-e GOLANGCI_LINT_CACHE=/tmp/.cache \
-e GOCACHE=/tmp/.cache/go \
-e GOMODCACHE=/tmp/.cache/gomod \
-e GOLANGCI_LINT_CACHE=/tmp/.cache/golangci-lint \
-v $(PWD):/work \
-w /work \
--user $$(id -u):$$(id -g) \

View File

@@ -23,6 +23,7 @@ const (
envNVVisibleDevices = "NVIDIA_VISIBLE_DEVICES"
envNVMigConfigDevices = "NVIDIA_MIG_CONFIG_DEVICES"
envNVMigMonitorDevices = "NVIDIA_MIG_MONITOR_DEVICES"
envNVImexChannels = "NVIDIA_IMEX_CHANNELS"
envNVDriverCapabilities = "NVIDIA_DRIVER_CAPABILITIES"
)
@@ -38,6 +39,7 @@ type nvidiaConfig struct {
Devices string
MigConfigDevices string
MigMonitorDevices string
ImexChannels string
DriverCapabilities string
// Requirements defines the requirements DSL for the container to run.
// This is empty if no specific requirements are needed, or if requirements are
@@ -274,6 +276,14 @@ func getMigDevices(image image.CUDA, envvar string) *string {
return &devices
}
func getImexChannels(image image.CUDA) *string {
if !image.HasEnvvar(envNVImexChannels) {
return nil
}
chans := image.Getenv(envNVImexChannels)
return &chans
}
func (c *HookConfig) getDriverCapabilities(cudaImage image.CUDA, legacyImage bool) image.DriverCapabilities {
// We use the default driver capabilities by default. This is filtered to only include the
// supported capabilities
@@ -328,6 +338,11 @@ func getNvidiaConfig(hookConfig *HookConfig, image image.CUDA, mounts []Mount, p
log.Panicln("cannot set MIG_MONITOR_DEVICES in non privileged container")
}
var imexChannels string
if c := getImexChannels(image); c != nil {
imexChannels = *c
}
driverCapabilities := hookConfig.getDriverCapabilities(image, legacyImage).String()
requirements, err := image.GetRequirements()
@@ -339,6 +354,7 @@ func getNvidiaConfig(hookConfig *HookConfig, image image.CUDA, mounts []Mount, p
Devices: devices,
MigConfigDevices: migConfigDevices,
MigMonitorDevices: migMonitorDevices,
ImexChannels: imexChannels,
DriverCapabilities: driverCapabilities,
Requirements: requirements,
}

View File

@@ -126,6 +126,9 @@ func doPrestart() {
if len(nvidia.MigMonitorDevices) > 0 {
args = append(args, fmt.Sprintf("--mig-monitor=%s", nvidia.MigMonitorDevices))
}
if len(nvidia.ImexChannels) > 0 {
args = append(args, fmt.Sprintf("--imex-channel=%s", nvidia.ImexChannels))
}
for _, cap := range strings.Split(nvidia.DriverCapabilities, ",") {
if len(cap) == 0 {

View File

@@ -198,7 +198,7 @@ invoked from the command line as `runc` would. For example:
```sh
# Setup a rootfs based on Ubuntu 16.04
cd $(mktemp -d) && mkdir rootfs
curl -sS http://cdimage.ubuntu.com/ubuntu-base/releases/16.04/release/ubuntu-base-16.04-core-amd64.tar.gz | tar --exclude 'dev/*' -C rootfs -xz
curl -sS http://cdimage.ubuntu.com/ubuntu-base/releases/16.04/release/ubuntu-base-16.04.6-base-amd64.tar.gz | tar --exclude 'dev/*' -C rootfs -xz
# Create an OCI runtime spec
nvidia-container-runtime spec

View File

@@ -42,16 +42,18 @@ type command struct {
}
type options struct {
output string
format string
deviceNameStrategy string
driverRoot string
devRoot string
nvidiaCTKPath string
mode string
vendor string
class string
output string
format string
deviceNameStrategies cli.StringSlice
driverRoot string
devRoot string
nvidiaCTKPath string
ldconfigPath string
mode string
vendor string
class string
configSearchPaths cli.StringSlice
librarySearchPaths cli.StringSlice
csv struct {
@@ -85,6 +87,11 @@ func (m command) build() *cli.Command {
}
c.Flags = []cli.Flag{
&cli.StringSliceFlag{
Name: "config-search-path",
Usage: "Specify the path to search for config files when discovering the entities that should be included in the CDI specification.",
Destination: &opts.configSearchPaths,
},
&cli.StringFlag{
Name: "output",
Usage: "Specify the file to output the generated CDI specification to. If this is '' the specification is output to STDOUT",
@@ -108,11 +115,11 @@ func (m command) build() *cli.Command {
Usage: "Specify the root where `/dev` is located. If this is not specified, the driver-root is assumed.",
Destination: &opts.devRoot,
},
&cli.StringFlag{
&cli.StringSliceFlag{
Name: "device-name-strategy",
Usage: "Specify the strategy for generating device names. One of [index | uuid | type-index]",
Value: nvcdi.DeviceNameStrategyIndex,
Destination: &opts.deviceNameStrategy,
Usage: "Specify the strategy for generating device names. If this is specified multiple times, the devices will be duplicated for each strategy. One of [index | uuid | type-index]",
Value: cli.NewStringSlice(nvcdi.DeviceNameStrategyIndex, nvcdi.DeviceNameStrategyUUID),
Destination: &opts.deviceNameStrategies,
},
&cli.StringFlag{
Name: "driver-root",
@@ -129,6 +136,11 @@ func (m command) build() *cli.Command {
Usage: "Specify the path to use for the nvidia-ctk in the generated CDI specification. If this is left empty, the path will be searched.",
Destination: &opts.nvidiaCTKPath,
},
&cli.StringFlag{
Name: "ldconfig-path",
Usage: "Specify the path to use for ldconfig in the generated CDI specification",
Destination: &opts.ldconfigPath,
},
&cli.StringFlag{
Name: "vendor",
Aliases: []string{"cdi-vendor"},
@@ -179,9 +191,11 @@ func (m command) validateFlags(c *cli.Context, opts *options) error {
return fmt.Errorf("invalid discovery mode: %v", opts.mode)
}
_, err := nvcdi.NewDeviceNamer(opts.deviceNameStrategy)
if err != nil {
return err
for _, strategy := range opts.deviceNameStrategies.Value() {
_, err := nvcdi.NewDeviceNamer(strategy)
if err != nil {
return err
}
}
opts.nvidiaCTKPath = config.ResolveNVIDIACTKPath(m.logger, opts.nvidiaCTKPath)
@@ -235,9 +249,13 @@ func formatFromFilename(filename string) string {
}
func (m command) generateSpec(opts *options) (spec.Interface, error) {
deviceNamer, err := nvcdi.NewDeviceNamer(opts.deviceNameStrategy)
if err != nil {
return nil, fmt.Errorf("failed to create device namer: %v", err)
var deviceNamers []nvcdi.DeviceNamer
for _, strategy := range opts.deviceNameStrategies.Value() {
deviceNamer, err := nvcdi.NewDeviceNamer(strategy)
if err != nil {
return nil, fmt.Errorf("failed to create device namer: %v", err)
}
deviceNamers = append(deviceNamers, deviceNamer)
}
cdilib, err := nvcdi.New(
@@ -245,8 +263,10 @@ func (m command) generateSpec(opts *options) (spec.Interface, error) {
nvcdi.WithDriverRoot(opts.driverRoot),
nvcdi.WithDevRoot(opts.devRoot),
nvcdi.WithNVIDIACTKPath(opts.nvidiaCTKPath),
nvcdi.WithDeviceNamer(deviceNamer),
nvcdi.WithLdconfigPath(opts.ldconfigPath),
nvcdi.WithDeviceNamers(deviceNamers...),
nvcdi.WithMode(opts.mode),
nvcdi.WithConfigSearchPaths(opts.configSearchPaths.Value()),
nvcdi.WithLibrarySearchPaths(opts.librarySearchPaths.Value()),
nvcdi.WithCSVFiles(opts.csv.files.Value()),
nvcdi.WithCSVIgnorePatterns(opts.csv.ignorePatterns.Value()),

View File

@@ -17,6 +17,7 @@
package list
import (
"errors"
"fmt"
"github.com/urfave/cli/v2"
@@ -29,7 +30,9 @@ type command struct {
logger logger.Interface
}
type config struct{}
type config struct {
cdiSpecDirs cli.StringSlice
}
// NewCommand constructs a cdi list command with the specified logger
func NewCommand(logger logger.Interface) *cli.Command {
@@ -55,30 +58,44 @@ func (m command) build() *cli.Command {
},
}
c.Flags = []cli.Flag{}
c.Flags = []cli.Flag{
&cli.StringSliceFlag{
Name: "spec-dir",
Usage: "specify the directories to scan for CDI specifications",
Value: cli.NewStringSlice(cdi.DefaultSpecDirs...),
Destination: &cfg.cdiSpecDirs,
},
}
return &c
}
func (m command) validateFlags(c *cli.Context, cfg *config) error {
if len(cfg.cdiSpecDirs.Value()) == 0 {
return errors.New("at least one CDI specification directory must be specified")
}
return nil
}
func (m command) run(c *cli.Context, cfg *config) error {
registry, err := cdi.NewCache(
cdi.WithAutoRefresh(false),
cdi.WithSpecDirs(cdi.DefaultSpecDirs...),
cdi.WithSpecDirs(cfg.cdiSpecDirs.Value()...),
)
if err != nil {
return fmt.Errorf("failed to create CDI cache: %v", err)
}
refreshErr := registry.Refresh()
_ = registry.Refresh()
if errors := registry.GetErrors(); len(errors) > 0 {
m.logger.Warningf("The following registry errors were reported:")
for k, err := range errors {
m.logger.Warningf("%v: %v", k, err)
}
}
devices := registry.ListDevices()
m.logger.Infof("Found %d CDI devices", len(devices))
if refreshErr != nil {
m.logger.Warningf("Refreshing the CDI registry returned the following error(s): %v", refreshErr)
}
for _, device := range devices {
fmt.Printf("%s\n", device)
}

View File

@@ -109,7 +109,11 @@ func run(c *cli.Context, opts *options) error {
if err != nil {
return fmt.Errorf("invalid --set option %v: %w", set, err)
}
cfgToml.Set(key, value)
if value == nil {
_ = cfgToml.Delete(key)
} else {
cfgToml.Set(key, value)
}
}
if err := opts.EnsureOutputFolder(); err != nil {
@@ -146,20 +150,25 @@ func setFlagToKeyValue(setFlag string) (string, interface{}, error) {
kind := field.Kind()
if len(setParts) != 2 {
if kind == reflect.Bool {
if kind == reflect.Bool || (kind == reflect.Pointer && field.Elem().Kind() == reflect.Bool) {
return key, true, nil
}
return key, nil, fmt.Errorf("%w: expected key=value; got %v", errInvalidFormat, setFlag)
}
value := setParts[1]
if kind == reflect.Pointer && value != "nil" {
kind = field.Elem().Kind()
}
switch kind {
case reflect.Pointer:
return key, nil, nil
case reflect.Bool:
b, err := strconv.ParseBool(value)
if err != nil {
return key, value, fmt.Errorf("%w: %w", errInvalidFormat, err)
}
return key, b, err
return key, b, nil
case reflect.String:
return key, value, nil
case reflect.Slice:
@@ -201,7 +210,7 @@ func getStruct(current reflect.Type, paths ...string) (reflect.StructField, erro
if !ok {
continue
}
if v != tomlField {
if strings.SplitN(v, ",", 2)[0] != tomlField {
continue
}
if len(paths) == 1 {

View File

@@ -17,6 +17,7 @@
package ldcache
import (
"errors"
"fmt"
"os"
"path/filepath"
@@ -36,6 +37,7 @@ type command struct {
type options struct {
folders cli.StringSlice
ldconfigPath string
containerSpec string
}
@@ -55,6 +57,9 @@ func (m command) build() *cli.Command {
c := cli.Command{
Name: "update-ldcache",
Usage: "Update ldcache in a container by running ldconfig",
Before: func(c *cli.Context) error {
return m.validateFlags(c, &cfg)
},
Action: func(c *cli.Context) error {
return m.run(c, &cfg)
},
@@ -66,6 +71,12 @@ func (m command) build() *cli.Command {
Usage: "Specify a folder to add to /etc/ld.so.conf before updating the ld cache",
Destination: &cfg.folders,
},
&cli.StringFlag{
Name: "ldconfig-path",
Usage: "Specify the path to the ldconfig program",
Destination: &cfg.ldconfigPath,
Value: "/sbin/ldconfig",
},
&cli.StringFlag{
Name: "container-spec",
Usage: "Specify the path to the OCI container spec. If empty or '-' the spec will be read from STDIN",
@@ -76,6 +87,13 @@ func (m command) build() *cli.Command {
return &c
}
func (m command) validateFlags(c *cli.Context, cfg *options) error {
if cfg.ldconfigPath == "" {
return errors.New("ldconfig-path must be specified")
}
return nil
}
func (m command) run(c *cli.Context, cfg *options) error {
s, err := oci.LoadContainerState(cfg.containerSpec)
if err != nil {
@@ -87,27 +105,33 @@ func (m command) run(c *cli.Context, cfg *options) error {
return fmt.Errorf("failed to determined container root: %v", err)
}
ldconfigPath := m.resolveLDConfigPath("/sbin/ldconfig")
ldconfigPath := m.resolveLDConfigPath(cfg.ldconfigPath)
args := []string{filepath.Base(ldconfigPath)}
if containerRoot != "" {
args = append(args, "-r", containerRoot)
}
if !root(containerRoot).hasPath("/etc/ld.so.cache") {
if root(containerRoot).hasPath("/etc/ld.so.cache") {
args = append(args, "-C", "/etc/ld.so.cache")
} else {
m.logger.Debugf("No ld.so.cache found, skipping update")
args = append(args, "-N")
}
folders := cfg.folders.Value()
if root(containerRoot).hasPath("/etc/ld.so.conf.d") {
err = m.createConfig(containerRoot, folders)
err := m.createConfig(containerRoot, folders)
if err != nil {
return fmt.Errorf("failed to update ld.so.conf: %v", err)
return fmt.Errorf("failed to update ld.so.conf.d: %v", err)
}
} else {
args = append(args, folders...)
}
// Explicitly specify using /etc/ld.so.conf since the host's ldconfig may
// be configured to use a different config file by default.
args = append(args, "-f", "/etc/ld.so.conf")
//nolint:gosec // TODO: Can we harden this so that there is less risk of command injection
return syscall.Exec(ldconfigPath, args, nil)
}

View File

@@ -149,6 +149,7 @@ func (m command) build() *cli.Command {
},
&cli.BoolFlag{
Name: "cdi.enabled",
Aliases: []string{"cdi.enable"},
Usage: "Enable CDI in the configured runtime",
Destination: &config.cdi.enabled,
},
@@ -308,9 +309,11 @@ func enableCDI(config *config, cfg engine.Interface) error {
}
switch config.runtime {
case "containerd":
return cfg.Set("enable_cdi", true)
cfg.Set("enable_cdi", true)
case "docker":
return cfg.Set("experimental", true)
cfg.Set("features", map[string]bool{"cdi": true})
default:
return fmt.Errorf("enabling CDI in %s is not supported", config.runtime)
}
return fmt.Errorf("enabling CDI in %s is not supported", config.runtime)
return nil
}

View File

@@ -87,7 +87,7 @@ func (m command) build() *cli.Command {
Usage: "The path to the driver root. `DRIVER_ROOT`/dev is searched for NVIDIA device nodes.",
Value: "/",
Destination: &cfg.driverRoot,
EnvVars: []string{"DRIVER_ROOT"},
EnvVars: []string{"NVIDIA_DRIVER_ROOT", "DRIVER_ROOT"},
},
&cli.BoolFlag{
Name: "watch",

View File

@@ -69,7 +69,7 @@ func (m command) build() *cli.Command {
Usage: "the path to the driver root. Device nodes will be created at `DRIVER_ROOT`/dev",
Value: "/",
Destination: &opts.driverRoot,
EnvVars: []string{"DRIVER_ROOT"},
EnvVars: []string{"NVIDIA_DRIVER_ROOT", "DRIVER_ROOT"},
},
&cli.BoolFlag{
Name: "control-devices",

View File

@@ -62,7 +62,7 @@ func (m command) build() *cli.Command {
Usage: "the path to the driver root. Device nodes will be created at `DRIVER_ROOT`/dev",
Value: "/",
Destination: &opts.driverRoot,
EnvVars: []string{"DRIVER_ROOT"},
EnvVars: []string{"NVIDIA_DRIVER_ROOT", "DRIVER_ROOT"},
},
}

View File

@@ -12,11 +12,9 @@
# See the License for the specific language governing permissions and
# limitations under the License.
ARG BASE_DIST
ARG CUDA_VERSION
ARG GOLANG_VERSION=x.x.x
FROM nvidia/cuda:${CUDA_VERSION}-base-${BASE_DIST}
FROM nvidia/cuda:12.5.0-base-ubuntu20.04
ARG ARTIFACTS_ROOT
COPY ${ARTIFACTS_ROOT} /artifacts/packages/
@@ -24,7 +22,6 @@ COPY ${ARTIFACTS_ROOT} /artifacts/packages/
WORKDIR /artifacts/packages
# build-args are added to the manifest.txt file below.
ARG BASE_DIST
ARG PACKAGE_DIST
ARG PACKAGE_VERSION
ARG GIT_BRANCH

View File

@@ -12,12 +12,10 @@
# See the License for the specific language governing permissions and
# limitations under the License.
ARG BASE_DIST
ARG CUDA_VERSION
ARG GOLANG_VERSION=x.x.x
ARG VERSION="N/A"
FROM nvidia/cuda:${CUDA_VERSION}-base-${BASE_DIST} as build
FROM nvidia/cuda:12.5.0-base-ubi8 as build
RUN yum install -y \
wget make git gcc \
@@ -31,7 +29,7 @@ RUN set -eux; \
case "${arch##*-}" in \
x86_64 | amd64) ARCH='amd64' ;; \
ppc64el | ppc64le) ARCH='ppc64le' ;; \
aarch64) ARCH='arm64' ;; \
aarch64 | arm64) ARCH='arm64' ;; \
*) echo "unsupported architecture" ; exit 1 ;; \
esac; \
wget -nv -O - https://storage.googleapis.com/golang/go${GOLANG_VERSION}.linux-${ARCH}.tar.gz \
@@ -50,17 +48,7 @@ COPY . .
RUN GOPATH=/artifacts go install -ldflags="-s -w -X 'main.Version=${VERSION}'" ./tools/...
FROM nvidia/cuda:${CUDA_VERSION}-base-${BASE_DIST}
ARG BASE_DIST
# See https://www.centos.org/centos-linux-eol/
# and https://stackoverflow.com/a/70930049 for move to vault.centos.org
# and https://serverfault.com/questions/1093922/failing-to-run-yum-update-in-centos-8 for move to vault.epel.cloud
RUN [[ "${BASE_DIST}" != "centos8" ]] || \
( \
sed -i 's/mirrorlist/#mirrorlist/g' /etc/yum.repos.d/CentOS-Linux-* && \
sed -i 's|#baseurl=http://mirror.centos.org|baseurl=http://vault.epel.cloud|g' /etc/yum.repos.d/CentOS-Linux-* \
)
FROM nvidia/cuda:12.5.0-base-ubi8
ENV NVIDIA_DISABLE_REQUIRE="true"
ENV NVIDIA_VISIBLE_DEVICES=void

View File

@@ -12,12 +12,10 @@
# See the License for the specific language governing permissions and
# limitations under the License.
ARG BASE_DIST
ARG CUDA_VERSION
ARG GOLANG_VERSION=x.x.x
ARG VERSION="N/A"
FROM nvidia/cuda:${CUDA_VERSION}-base-${BASE_DIST} as build
FROM nvidia/cuda:12.5.0-base-ubuntu20.04 as build
RUN apt-get update && \
apt-get install -y wget make git gcc \
@@ -31,7 +29,7 @@ RUN set -eux; \
case "${arch##*-}" in \
x86_64 | amd64) ARCH='amd64' ;; \
ppc64el | ppc64le) ARCH='ppc64le' ;; \
aarch64) ARCH='arm64' ;; \
aarch64 | arm64) ARCH='arm64' ;; \
*) echo "unsupported architecture" ; exit 1 ;; \
esac; \
wget -nv -O - https://storage.googleapis.com/golang/go${GOLANG_VERSION}.linux-${ARCH}.tar.gz \
@@ -49,7 +47,7 @@ COPY . .
RUN GOPATH=/artifacts go install -ldflags="-s -w -X 'main.Version=${VERSION}'" ./tools/...
FROM nvcr.io/nvidia/cuda:${CUDA_VERSION}-base-${BASE_DIST}
FROM nvcr.io/nvidia/cuda:12.5.0-base-ubuntu20.04
# Remove the CUDA repository configurations to avoid issues with rotated GPG keys
RUN rm -f /etc/apt/sources.list.d/cuda.list
@@ -75,14 +73,6 @@ ARG PACKAGE_VERSION
ARG TARGETARCH
ENV PACKAGE_ARCH ${TARGETARCH}
ARG LIBNVIDIA_CONTAINER_REPO="https://nvidia.github.io/libnvidia-container/stable"
ARG LIBNVIDIA_CONTAINER0_VERSION
RUN if [ "${PACKAGE_ARCH}" = "arm64" ]; then \
curl -L ${LIBNVIDIA_CONTAINER_REPO}/${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container0_${LIBNVIDIA_CONTAINER0_VERSION}_${PACKAGE_ARCH}.deb \
--output ${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container0_${LIBNVIDIA_CONTAINER0_VERSION}_${PACKAGE_ARCH}.deb && \
dpkg -i ${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container0_${LIBNVIDIA_CONTAINER0_VERSION}_${PACKAGE_ARCH}.deb; \
fi
RUN dpkg -i \
${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container1_1.*.deb \
${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container-tools_1.*.deb \

View File

@@ -45,7 +45,7 @@ OUT_IMAGE = $(OUT_IMAGE_NAME):$(OUT_IMAGE_TAG)
##### Public rules #####
DEFAULT_PUSH_TARGET := ubuntu20.04
DISTRIBUTIONS := ubuntu20.04 ubi8 centos7
DISTRIBUTIONS := ubuntu20.04 ubi8
META_TARGETS := packaging
@@ -56,9 +56,9 @@ TEST_TARGETS := $(patsubst %,test-%,$(DISTRIBUTIONS))
.PHONY: $(DISTRIBUTIONS) $(PUSH_TARGETS) $(BUILD_TARGETS) $(TEST_TARGETS)
ifneq ($(BUILD_MULTI_ARCH_IMAGES),true)
include $(CURDIR)/build/container/native-only.mk
include $(CURDIR)/deployments/container/native-only.mk
else
include $(CURDIR)/build/container/multi-arch.mk
include $(CURDIR)/deployments/container/multi-arch.mk
endif
# For the default push target we also push a short tag equal to the version.
@@ -84,7 +84,7 @@ push-short:
build-%: DIST = $(*)
build-%: DOCKERFILE = $(CURDIR)/build/container/Dockerfile.$(DOCKERFILE_SUFFIX)
build-%: DOCKERFILE = $(CURDIR)/deployments/container/Dockerfile.$(DOCKERFILE_SUFFIX)
ARTIFACTS_ROOT ?= $(shell realpath --relative-to=$(CURDIR) $(DIST_DIR))
@@ -96,10 +96,7 @@ $(BUILD_TARGETS): build-%: $(ARTIFACTS_ROOT)
$(DOCKER_BUILD_PLATFORM_OPTIONS) \
--tag $(IMAGE) \
--build-arg ARTIFACTS_ROOT="$(ARTIFACTS_ROOT)" \
--build-arg BASE_DIST="$(BASE_DIST)" \
--build-arg CUDA_VERSION="$(CUDA_VERSION)" \
--build-arg GOLANG_VERSION="$(GOLANG_VERSION)" \
--build-arg LIBNVIDIA_CONTAINER0_VERSION="$(LIBNVIDIA_CONTAINER0_DEPENDENCY)" \
--build-arg PACKAGE_DIST="$(PACKAGE_DIST)" \
--build-arg PACKAGE_VERSION="$(PACKAGE_VERSION)" \
--build-arg VERSION="$(VERSION)" \
@@ -111,20 +108,12 @@ $(BUILD_TARGETS): build-%: $(ARTIFACTS_ROOT)
$(CURDIR)
build-ubuntu%: BASE_DIST = $(*)
build-ubuntu%: DOCKERFILE_SUFFIX := ubuntu
build-ubuntu%: PACKAGE_DIST = ubuntu18.04
build-ubuntu%: LIBNVIDIA_CONTAINER0_DEPENDENCY=$(LIBNVIDIA_CONTAINER0_VERSION)
build-ubi8: BASE_DIST := ubi8
build-ubi8: DOCKERFILE_SUFFIX := centos
build-ubi8: DOCKERFILE_SUFFIX := ubi8
build-ubi8: PACKAGE_DIST = centos7
build-centos7: BASE_DIST = $(*)
build-centos7: DOCKERFILE_SUFFIX := centos
build-centos7: PACKAGE_DIST = $(BASE_DIST)
build-packaging: BASE_DIST := ubuntu20.04
build-packaging: DOCKERFILE_SUFFIX := packaging
build-packaging: PACKAGE_ARCH := amd64
build-packaging: PACKAGE_DIST = all
@@ -145,9 +134,7 @@ test-packaging:
@echo "Testing package image contents"
@$(DOCKER) run --rm $(IMAGE) test -d "/artifacts/packages/centos7/aarch64" || echo "Missing centos7/aarch64"
@$(DOCKER) run --rm $(IMAGE) test -d "/artifacts/packages/centos7/x86_64" || echo "Missing centos7/x86_64"
@$(DOCKER) run --rm $(IMAGE) test -d "/artifacts/packages/centos8/aarch64" || echo "Missing centos8/aarch64"
@$(DOCKER) run --rm $(IMAGE) test -d "/artifacts/packages/centos8/ppc64le" || echo "Missing centos8/ppc64le"
@$(DOCKER) run --rm $(IMAGE) test -d "/artifacts/packages/centos8/x86_64" || echo "Missing centos8/x86_64"
@$(DOCKER) run --rm $(IMAGE) test -d "/artifacts/packages/ubuntu18.04/amd64" || echo "Missing ubuntu18.04/amd64"
@$(DOCKER) run --rm $(IMAGE) test -d "/artifacts/packages/ubuntu18.04/arm64" || echo "Missing ubuntu18.04/arm64"
@$(DOCKER) run --rm $(IMAGE) test -d "/artifacts/packages/ubuntu18.04/ppc64le" || echo "Missing ubuntu18.04/ppc64le"

View File

@@ -16,9 +16,6 @@ PUSH_ON_BUILD ?= false
DOCKER_BUILD_OPTIONS = --output=type=image,push=$(PUSH_ON_BUILD)
DOCKER_BUILD_PLATFORM_OPTIONS = --platform=linux/amd64,linux/arm64
# We only have x86_64 packages for centos7
build-centos7: DOCKER_BUILD_PLATFORM_OPTIONS = --platform=linux/amd64
# We only generate amd64 image for ubuntu18.04
build-ubuntu18.04: DOCKER_BUILD_PLATFORM_OPTIONS = --platform=linux/amd64

View File

@@ -22,7 +22,7 @@ RUN set -eux; \
case "${arch##*-}" in \
x86_64 | amd64) ARCH='amd64' ;; \
ppc64el | ppc64le) ARCH='ppc64le' ;; \
aarch64) ARCH='arm64' ;; \
aarch64 | arm64) ARCH='arm64' ;; \
*) echo "unsupported architecture" ; exit 1 ;; \
esac; \
wget -nv -O - https://storage.googleapis.com/golang/go${GOLANG_VERSION}.linux-${ARCH}.tar.gz \

View File

@@ -15,7 +15,7 @@ RUN set -eux; \
case "${arch##*-}" in \
x86_64 | amd64) ARCH='amd64' ;; \
ppc64el | ppc64le) ARCH='ppc64le' ;; \
aarch64) ARCH='arm64' ;; \
aarch64 | arm64) ARCH='arm64' ;; \
*) echo "unsupported architecture"; exit 1 ;; \
esac; \
wget -nv -O - https://storage.googleapis.com/golang/go${GOLANG_VERSION}.linux-${ARCH}.tar.gz \

View File

@@ -17,6 +17,13 @@
ARG BASEIMAGE
FROM ${BASEIMAGE}
# centos:stream8 is EOL.
# We switch to the vault repositories for this base image.
ARG BASEIMAGE
RUN sed -i -e "s|mirrorlist=|#mirrorlist=|g" \
-e "s|#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g" \
/etc/yum.repos.d/CentOS-*
RUN yum install -y \
ca-certificates \
gcc \
@@ -33,7 +40,7 @@ RUN set -eux; \
case "${arch##*-}" in \
x86_64 | amd64) ARCH='amd64' ;; \
ppc64el | ppc64le) ARCH='ppc64le' ;; \
aarch64) ARCH='arm64' ;; \
aarch64 | arm64) ARCH='arm64' ;; \
*) echo "unsupported architecture"; exit 1 ;; \
esac; \
wget -nv -O - https://storage.googleapis.com/golang/go${GOLANG_VERSION}.linux-${ARCH}.tar.gz \

View File

@@ -20,7 +20,7 @@ RUN set -eux; \
case "${arch##*-}" in \
x86_64 | amd64) ARCH='amd64' ;; \
ppc64el | ppc64le) ARCH='ppc64le' ;; \
aarch64) ARCH='arm64' ;; \
aarch64 | arm64) ARCH='arm64' ;; \
*) echo "unsupported architecture" ; exit 1 ;; \
esac; \
wget -nv -O - https://storage.googleapis.com/golang/go${GOLANG_VERSION}.linux-${ARCH}.tar.gz \

28
go.mod
View File

@@ -3,23 +3,24 @@ module github.com/NVIDIA/nvidia-container-toolkit
go 1.20
require (
github.com/NVIDIA/go-nvlib v0.0.0-20231116150931-9fd385bace0d
github.com/NVIDIA/go-nvml v0.12.0-1.0.20231020145430-e06766c5e74f
github.com/fsnotify/fsnotify v1.5.4
github.com/opencontainers/runtime-spec v1.1.0
github.com/pelletier/go-toml v1.9.4
github.com/sirupsen/logrus v1.9.0
github.com/stretchr/testify v1.8.4
github.com/urfave/cli/v2 v2.3.0
golang.org/x/mod v0.5.0
golang.org/x/sys v0.7.0
tags.cncf.io/container-device-interface v0.6.2
tags.cncf.io/container-device-interface/specs-go v0.6.0
github.com/NVIDIA/go-nvlib v0.4.0
github.com/NVIDIA/go-nvml v0.12.0-6
github.com/fsnotify/fsnotify v1.7.0
github.com/opencontainers/runtime-spec v1.2.0
github.com/pelletier/go-toml v1.9.5
github.com/sirupsen/logrus v1.9.3
github.com/stretchr/testify v1.9.0
github.com/urfave/cli/v2 v2.27.2
golang.org/x/mod v0.18.0
golang.org/x/sys v0.21.0
tags.cncf.io/container-device-interface v0.7.2
tags.cncf.io/container-device-interface/specs-go v0.7.0
)
require (
github.com/cpuguy83/go-md2man/v2 v2.0.2 // indirect
github.com/cpuguy83/go-md2man/v2 v2.0.4 // indirect
github.com/davecgh/go-spew v1.1.1 // indirect
github.com/google/uuid v1.6.0 // indirect
github.com/hashicorp/errwrap v1.1.0 // indirect
github.com/kr/pretty v0.3.1 // indirect
github.com/opencontainers/runtime-tools v0.9.1-0.20221107090550-2e043c6bd626 // indirect
@@ -28,6 +29,7 @@ require (
github.com/russross/blackfriday/v2 v2.1.0 // indirect
github.com/syndtr/gocapability v0.0.0-20200815063812-42c35b437635 // indirect
github.com/xeipuuv/gojsonpointer v0.0.0-20190905194746-02993c407bfb // indirect
github.com/xrash/smetrics v0.0.0-20240521201337-686a1a2994c1 // indirect
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c // indirect
gopkg.in/yaml.v2 v2.4.0 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect

66
go.sum
View File

@@ -1,20 +1,20 @@
github.com/BurntSushi/toml v0.3.1/go.mod h1:xHWCNGjB5oqiDr8zfno3MHue2Ht5sIBksp03qcyfWMU=
github.com/NVIDIA/go-nvlib v0.0.0-20231116150931-9fd385bace0d h1:XxRHS7eNkZVcPpZZmUcoT4oO8FEcoYKn06sooQh5niU=
github.com/NVIDIA/go-nvlib v0.0.0-20231116150931-9fd385bace0d/go.mod h1:HPFNPAYqQeoos58MKUboWsdZMu71EzSQrbmd+QBRD40=
github.com/NVIDIA/go-nvml v0.12.0-1.0.20231020145430-e06766c5e74f h1:FTblgO87K1vPB8tcwM5EOFpFf6UpsrlDpErPm25mFWE=
github.com/NVIDIA/go-nvml v0.12.0-1.0.20231020145430-e06766c5e74f/go.mod h1:7ruy85eOM73muOc/I37euONSwEyFqZsv5ED9AogD4G0=
github.com/NVIDIA/go-nvlib v0.4.0 h1:dvuqjjSamBODFuxttPg4H/xtNVQRZOSlwFtuNKybcGI=
github.com/NVIDIA/go-nvlib v0.4.0/go.mod h1:87z49ULPr4GWPSGfSIp3taU4XENRYN/enIg88MzcL4k=
github.com/NVIDIA/go-nvml v0.12.0-6 h1:FJYc2KrpvX+VOC/8QQvMiQMmZ/nPMRpdJO/Ik4xfcr0=
github.com/NVIDIA/go-nvml v0.12.0-6/go.mod h1:8Llmj+1Rr+9VGGwZuRer5N/aCjxGuR5nPb/9ebBiIEQ=
github.com/blang/semver/v4 v4.0.0 h1:1PFHFE6yCCTv8C1TeyNNarDzntLi7wMI5i/pzqYIsAM=
github.com/blang/semver/v4 v4.0.0/go.mod h1:IbckMUScFkM3pff0VJDNKRiT6TG/YpiHIM2yvyW5YoQ=
github.com/cpuguy83/go-md2man/v2 v2.0.0-20190314233015-f79a8a8ca69d/go.mod h1:maD7wRr/U5Z6m/iR4s+kqSMx2CaBsrgA7czyZG/E6dU=
github.com/cpuguy83/go-md2man/v2 v2.0.2 h1:p1EgwI/C7NhT0JmVkwCD2ZBK8j4aeHQX2pMHHBfMQ6w=
github.com/cpuguy83/go-md2man/v2 v2.0.2/go.mod h1:tgQtvFlXSQOSOSIRvRPT7W67SCa46tRHOmNcaadrF8o=
github.com/cpuguy83/go-md2man/v2 v2.0.4 h1:wfIWP927BUkWJb2NmU/kNDYIBTh/ziUX91+lVfRxZq4=
github.com/cpuguy83/go-md2man/v2 v2.0.4/go.mod h1:tgQtvFlXSQOSOSIRvRPT7W67SCa46tRHOmNcaadrF8o=
github.com/creack/pty v1.1.9/go.mod h1:oKZEueFk5CKHvIhNR5MUki03XCEU+Q6VDXinZuGJ33E=
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/fsnotify/fsnotify v1.5.4 h1:jRbGcIw6P2Meqdwuo0H1p6JVLbL5DHKAKlYndzMwVZI=
github.com/fsnotify/fsnotify v1.5.4/go.mod h1:OVB6XrOHzAwXMpEM7uPOzcehqUV2UqJxmVXmkdnm1bU=
github.com/fsnotify/fsnotify v1.7.0 h1:8JEhPFa5W2WU7YfeZzPNqzMP6Lwt7L2715Ggo0nosvA=
github.com/fsnotify/fsnotify v1.7.0/go.mod h1:40Bi/Hjc2AVfZrqy+aj+yEI+/bRxZnMJyTJwOpGvigM=
github.com/google/uuid v1.3.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
github.com/hashicorp/errwrap v1.0.0/go.mod h1:YH+1FKiLXxHSkmPseP+kNlulaMuP3n2brvKWEqk/Jc4=
github.com/hashicorp/errwrap v1.1.0 h1:OxrOeh75EUXMY8TBjag2fzXGZ40LB6IKw45YeGUDY2I=
github.com/hashicorp/errwrap v1.1.0/go.mod h1:YH+1FKiLXxHSkmPseP+kNlulaMuP3n2brvKWEqk/Jc4=
@@ -30,42 +30,36 @@ github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE=
github.com/mndrix/tap-go v0.0.0-20171203230836-629fa407e90b/go.mod h1:pzzDgJWZ34fGzaAZGFW22KVZDfyrYW+QABMrWnJBnSs=
github.com/mrunalp/fileutils v0.5.0/go.mod h1:M1WthSahJixYnrXQl/DFQuteStB1weuxD2QJNHXfbSQ=
github.com/opencontainers/runtime-spec v1.0.3-0.20220825212826-86290f6a00fb/go.mod h1:jwyrGlmzljRJv/Fgzds9SsS/C5hL+LL3ko9hs6T5lQ0=
github.com/opencontainers/runtime-spec v1.1.0 h1:HHUyrt9mwHUjtasSbXSMvs4cyFxh+Bll4AjJ9odEGpg=
github.com/opencontainers/runtime-spec v1.1.0/go.mod h1:jwyrGlmzljRJv/Fgzds9SsS/C5hL+LL3ko9hs6T5lQ0=
github.com/opencontainers/runtime-spec v1.2.0 h1:z97+pHb3uELt/yiAWD691HNHQIF07bE7dzrbT927iTk=
github.com/opencontainers/runtime-spec v1.2.0/go.mod h1:jwyrGlmzljRJv/Fgzds9SsS/C5hL+LL3ko9hs6T5lQ0=
github.com/opencontainers/runtime-tools v0.9.1-0.20221107090550-2e043c6bd626 h1:DmNGcqH3WDbV5k8OJ+esPWbqUOX5rMLR2PMvziDMJi0=
github.com/opencontainers/runtime-tools v0.9.1-0.20221107090550-2e043c6bd626/go.mod h1:BRHJJd0E+cx42OybVYSgUvZmU0B8P9gZuRXlZUP7TKI=
github.com/opencontainers/selinux v1.9.1/go.mod h1:2i0OySw99QjzBBQByd1Gr9gSjvuho1lHsJxIJ3gGbJI=
github.com/opencontainers/selinux v1.11.0 h1:+5Zbo97w3Lbmb3PeqQtpmTkMwsW5nRI3YaLpt7tQ7oU=
github.com/opencontainers/selinux v1.11.0/go.mod h1:E5dMC3VPuVvVHDYmi78qvhJp8+M586T4DlDRYpFkyec=
github.com/pelletier/go-toml v1.9.4 h1:tjENF6MfZAg8e4ZmZTeWaWiT2vXtsoO6+iuOjFhECwM=
github.com/pelletier/go-toml v1.9.4/go.mod h1:u1nR/EPcESfeI/szUZKdtJ0xRNbUoANCkoOuaOx1Y+c=
github.com/pelletier/go-toml v1.9.5 h1:4yBQzkHv+7BHq2PQUZF3Mx0IYxG7LsP222s7Agd3ve8=
github.com/pelletier/go-toml v1.9.5/go.mod h1:u1nR/EPcESfeI/szUZKdtJ0xRNbUoANCkoOuaOx1Y+c=
github.com/pkg/diff v0.0.0-20210226163009-20ebb0f2a09e/go.mod h1:pJLUxLENpZxwdsKMEsNbx1VGcRFpLqf3715MtcvvzbA=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/rogpeppe/go-internal v1.9.0 h1:73kH8U+JUqXU8lRuOHeVHaa/SZPifC7BkcraZVejAe8=
github.com/rogpeppe/go-internal v1.9.0/go.mod h1:WtVeX8xhTBvf0smdhujwtBcq4Qrzq/fJaraNFVN+nFs=
github.com/russross/blackfriday/v2 v2.0.1/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM=
github.com/russross/blackfriday/v2 v2.1.0 h1:JIOH55/0cWyOuilr9/qlrm0BSXldqnqwMsf35Ld67mk=
github.com/russross/blackfriday/v2 v2.1.0/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM=
github.com/shurcooL/sanitized_anchor_name v1.0.0/go.mod h1:1NzhyTcUVG4SuEtjjoZeVRXNmyL/1OwPU0+IJeTBvfc=
github.com/sirupsen/logrus v1.8.1/go.mod h1:yWOB1SBYBC5VeMP7gHvWumXLIWorT60ONWic61uBYv0=
github.com/sirupsen/logrus v1.9.0 h1:trlNQbNUG3OdDrDil03MCb1H2o9nJ1x4/5LYw7byDE0=
github.com/sirupsen/logrus v1.9.0/go.mod h1:naHLuLoDiP4jHNo9R0sCBMtWGeIprob74mVsIT4qYEQ=
github.com/sirupsen/logrus v1.9.3 h1:dueUQJ1C2q9oE3F7wvmSGAaVtTmUizReu6fjN8uqzbQ=
github.com/sirupsen/logrus v1.9.3/go.mod h1:naHLuLoDiP4jHNo9R0sCBMtWGeIprob74mVsIT4qYEQ=
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
github.com/stretchr/objx v0.4.0/go.mod h1:YvHI0jy2hoMjB+UWwv71VJQ9isScKT/TqJzVSSt89Yw=
github.com/stretchr/objx v0.5.0/go.mod h1:Yh+to48EsGEfYuaHDzXPcE3xhTkx73EhmCGUpEOglKo=
github.com/stretchr/testify v1.2.2/go.mod h1:a8OnRcib4nhh0OaRAV+Yts87kKdq0PP7pXfy6kDkUVs=
github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI=
github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
github.com/stretchr/testify v1.7.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
github.com/stretchr/testify v1.8.0/go.mod h1:yNjHg4UonilssWZ8iaSj1OCr/vHnekPRkoO+kdMU+MU=
github.com/stretchr/testify v1.8.4 h1:CcVxjf3Q8PM0mHUKJCdn+eZZtm5yQwehR5yeSVQQcUk=
github.com/stretchr/testify v1.8.4/go.mod h1:sz/lmYIOXD/1dqDmKjjqLyZ2RngseejIcXlSw2iwfAo=
github.com/stretchr/testify v1.9.0 h1:HtqpIVDClZ4nwg75+f6Lvsy/wHu+3BoSGCbBAcpTsTg=
github.com/stretchr/testify v1.9.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY=
github.com/syndtr/gocapability v0.0.0-20200815063812-42c35b437635 h1:kdXcSzyDtseVEc4yCz2qF8ZrQvIDBJLl4S1c3GCXmoI=
github.com/syndtr/gocapability v0.0.0-20200815063812-42c35b437635/go.mod h1:hkRG7XYTFWNJGYcbNJQlaLq0fg1yr4J4t/NcTQtrfww=
github.com/urfave/cli v1.19.1/go.mod h1:70zkFmudgCuE/ngEzBv17Jvp/497gISqfk5gWijbERA=
github.com/urfave/cli/v2 v2.3.0 h1:qph92Y649prgesehzOrQjdWyxFOp/QVM+6imKHad91M=
github.com/urfave/cli/v2 v2.3.0/go.mod h1:LJmUH05zAU44vOAcrfzZQKsZbVcdbOG8rtL3/XcUArI=
github.com/urfave/cli/v2 v2.27.2 h1:6e0H+AkS+zDckwPCUrZkKX38mRaau4nL2uipkJpbkcI=
github.com/urfave/cli/v2 v2.27.2/go.mod h1:g0+79LmHHATl7DAcHO99smiR/T7uGLw84w8Y42x+4eM=
github.com/xeipuuv/gojsonpointer v0.0.0-20180127040702-4e3ac2762d5f/go.mod h1:N2zxlSyiKSe5eX1tZViRH5QA0qijqEDrYZiPEAiq3wU=
github.com/xeipuuv/gojsonpointer v0.0.0-20190905194746-02993c407bfb h1:zGWFAtiMcyryUHoUjUJX0/lt1H2+i2Ka2n+D3DImSNo=
github.com/xeipuuv/gojsonpointer v0.0.0-20190905194746-02993c407bfb/go.mod h1:N2zxlSyiKSe5eX1tZViRH5QA0qijqEDrYZiPEAiq3wU=
@@ -73,18 +67,18 @@ github.com/xeipuuv/gojsonreference v0.0.0-20180127040603-bd5ef7bd5415 h1:EzJWgHo
github.com/xeipuuv/gojsonreference v0.0.0-20180127040603-bd5ef7bd5415/go.mod h1:GwrjFmJcFw6At/Gs6z4yjiIwzuJ1/+UwLxMQDVQXShQ=
github.com/xeipuuv/gojsonschema v1.2.0 h1:LhYJRs+L4fBtjZUfuSZIKGeVu0QRy8e5Xi7D17UxZ74=
github.com/xeipuuv/gojsonschema v1.2.0/go.mod h1:anYRn/JVcOK2ZgGU+IjEV4nwlhoK5sQluxsYJ78Id3Y=
golang.org/x/mod v0.5.0 h1:UG21uOlmZabA4fW5i7ZX6bjw1xELEGg/ZLgZq9auk/Q=
golang.org/x/mod v0.5.0/go.mod h1:5OXOZSfqPIIbmVBIIKWRFfZjPR0E5r58TLhUjH0a2Ro=
github.com/xrash/smetrics v0.0.0-20240521201337-686a1a2994c1 h1:gEOO8jv9F4OT7lGCjxCBTO/36wtF6j2nSip77qHd4x4=
github.com/xrash/smetrics v0.0.0-20240521201337-686a1a2994c1/go.mod h1:Ohn+xnUBiLI6FVj/9LpzZWtj1/D6lUovWYBkxHVV3aM=
golang.org/x/mod v0.18.0 h1:5+9lSbEzPSdWkH32vYPBwEpX8KwDbM52Ud9xBUvNlb0=
golang.org/x/mod v0.18.0/go.mod h1:hTbmBsO62+eylJbnUtE2MGJUyE7QWk4xUqPFrRgJ+7c=
golang.org/x/sys v0.0.0-20191026070338-33540a1f6037/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20191115151921-52ab43148777/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20220412211240-33da011f77ad/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20220715151400-c0bba94af5f8/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.7.0 h1:3jlCCIQZPdOYu1h8BkNvLz8Kgwtae2cagcG/VamtZRU=
golang.org/x/sys v0.7.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.21.0 h1:rF+pYz3DAGSQAxAu1CbC7catZg4ebC4UIeIhKxBZvws=
golang.org/x/sys v0.21.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk=
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EVd6muEfDQjcINNoR0C8j2r3qZ4Q=
gopkg.in/yaml.v2 v2.2.3/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
gopkg.in/yaml.v2 v2.4.0 h1:D8xgwECY7CYvx+Y2n4sBz93Jn9JRvxdiyyo8CTfuKaY=
gopkg.in/yaml.v2 v2.4.0/go.mod h1:RDklbk79AGWmwhnvt/jBztapEOGDOx6ZbXqjP6csGnQ=
gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
@@ -92,7 +86,7 @@ gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
sigs.k8s.io/yaml v1.3.0 h1:a2VclLzOGrwOHDiV8EfBGhvjHvP46CtW5j6POvhYGGo=
sigs.k8s.io/yaml v1.3.0/go.mod h1:GeOyir5tyXNByN85N/dRIT9es5UQNerPYEKK56eTBm8=
tags.cncf.io/container-device-interface v0.6.2 h1:dThE6dtp/93ZDGhqaED2Pu374SOeUkBfuvkLuiTdwzg=
tags.cncf.io/container-device-interface v0.6.2/go.mod h1:Shusyhjs1A5Na/kqPVLL0KqnHQHuunol9LFeUNkuGVE=
tags.cncf.io/container-device-interface/specs-go v0.6.0 h1:V+tJJN6dqu8Vym6p+Ru+K5mJ49WL6Aoc5SJFSY0RLsQ=
tags.cncf.io/container-device-interface/specs-go v0.6.0/go.mod h1:hMAwAbMZyBLdmYqWgYcKH0F/yctNpV3P35f+/088A80=
tags.cncf.io/container-device-interface v0.7.2 h1:MLqGnWfOr1wB7m08ieI4YJ3IoLKKozEnnNYBtacDPQU=
tags.cncf.io/container-device-interface v0.7.2/go.mod h1:Xb1PvXv2BhfNb3tla4r9JL129ck1Lxv9KuU6eVOfKto=
tags.cncf.io/container-device-interface/specs-go v0.7.0 h1:w/maMGVeLP6TIQJVYT5pbqTi8SCw/iHZ+n4ignuGHqg=
tags.cncf.io/container-device-interface/specs-go v0.7.0/go.mod h1:hMAwAbMZyBLdmYqWgYcKH0F/yctNpV3P35f+/088A80=

View File

@@ -63,6 +63,9 @@ type Config struct {
NVIDIACTKConfig CTKConfig `toml:"nvidia-ctk"`
NVIDIAContainerRuntimeConfig RuntimeConfig `toml:"nvidia-container-runtime"`
NVIDIAContainerRuntimeHookConfig RuntimeHookConfig `toml:"nvidia-container-runtime-hook"`
// Features allows for finer control over optional features.
Features features `toml:"features,omitempty"`
}
// GetConfigFilePath returns the path to the config file for the configured system
@@ -95,6 +98,7 @@ func GetDefault() (*Config, error) {
NVIDIAContainerCLIConfig: ContainerCLIConfig{
LoadKmods: true,
Ldconfig: getLdConfigPath(),
User: getUserGroup(),
},
NVIDIACTKConfig: CTKConfig{
Path: nvidiaCTKExecutable,
@@ -102,7 +106,7 @@ func GetDefault() (*Config, error) {
NVIDIAContainerRuntimeConfig: RuntimeConfig{
DebugFilePath: "/dev/null",
LogLevel: "info",
Runtimes: []string{"docker-runc", "runc"},
Runtimes: []string{"docker-runc", "runc", "crun"},
Mode: "auto",
Modes: modesConfig{
CSV: csvModeConfig{
@@ -126,24 +130,32 @@ func getLdConfigPath() string {
return NormalizeLDConfigPath("@/sbin/ldconfig")
}
// getCommentedUserGroup returns whether the nvidia-container-cli user and group config option should be commented.
func getCommentedUserGroup() bool {
uncommentIf := map[string]bool{
func getUserGroup() string {
if isSuse() {
return "root:video"
}
return ""
}
// isSuse returns whether a SUSE-based distribution was detected.
func isSuse() bool {
suseDists := map[string]bool{
"suse": true,
"opensuse": true,
}
idsLike := getDistIDLike()
for _, id := range idsLike {
if uncommentIf[id] {
return false
if suseDists[id] {
return true
}
}
return true
return false
}
// getDistIDLike returns the ID_LIKE field from /etc/os-release.
func getDistIDLike() []string {
// We can override this for testing.
var getDistIDLike = func() []string {
releaseFile, err := os.Open("/etc/os-release")
if err != nil {
return nil

View File

@@ -48,6 +48,7 @@ func TestGetConfig(t *testing.T) {
contents []string
expectedError error
inspectLdconfig bool
distIdsLike []string
expectedConfig *Config
}{
{
@@ -64,7 +65,7 @@ func TestGetConfig(t *testing.T) {
NVIDIAContainerRuntimeConfig: RuntimeConfig{
DebugFilePath: "/dev/null",
LogLevel: "info",
Runtimes: []string{"docker-runc", "runc"},
Runtimes: []string{"docker-runc", "runc", "crun"},
Mode: "auto",
Modes: modesConfig{
CSV: csvModeConfig{
@@ -93,6 +94,7 @@ func TestGetConfig(t *testing.T) {
"nvidia-container-cli.root = \"/bar/baz\"",
"nvidia-container-cli.load-kmods = false",
"nvidia-container-cli.ldconfig = \"/foo/bar/ldconfig\"",
"nvidia-container-cli.user = \"foo:bar\"",
"nvidia-container-runtime.debug = \"/foo/bar\"",
"nvidia-container-runtime.discover-mode = \"not-legacy\"",
"nvidia-container-runtime.log-level = \"debug\"",
@@ -112,6 +114,7 @@ func TestGetConfig(t *testing.T) {
Root: "/bar/baz",
LoadKmods: false,
Ldconfig: "/foo/bar/ldconfig",
User: "foo:bar",
},
NVIDIAContainerRuntimeConfig: RuntimeConfig{
DebugFilePath: "/foo/bar",
@@ -152,6 +155,7 @@ func TestGetConfig(t *testing.T) {
"root = \"/bar/baz\"",
"load-kmods = false",
"ldconfig = \"/foo/bar/ldconfig\"",
"user = \"foo:bar\"",
"[nvidia-container-runtime]",
"debug = \"/foo/bar\"",
"discover-mode = \"not-legacy\"",
@@ -176,6 +180,7 @@ func TestGetConfig(t *testing.T) {
Root: "/bar/baz",
LoadKmods: false,
Ldconfig: "/foo/bar/ldconfig",
User: "foo:bar",
},
NVIDIAContainerRuntimeConfig: RuntimeConfig{
DebugFilePath: "/foo/bar",
@@ -207,10 +212,88 @@ func TestGetConfig(t *testing.T) {
},
},
},
{
description: "suse config",
distIdsLike: []string{"suse", "opensuse"},
inspectLdconfig: true,
expectedConfig: &Config{
AcceptEnvvarUnprivileged: true,
SupportedDriverCapabilities: "compat32,compute,display,graphics,ngx,utility,video",
NVIDIAContainerCLIConfig: ContainerCLIConfig{
Root: "",
LoadKmods: true,
Ldconfig: "WAS_CHECKED",
User: "root:video",
},
NVIDIAContainerRuntimeConfig: RuntimeConfig{
DebugFilePath: "/dev/null",
LogLevel: "info",
Runtimes: []string{"docker-runc", "runc", "crun"},
Mode: "auto",
Modes: modesConfig{
CSV: csvModeConfig{
MountSpecPath: "/etc/nvidia-container-runtime/host-files-for-container.d",
},
CDI: cdiModeConfig{
DefaultKind: "nvidia.com/gpu",
AnnotationPrefixes: []string{"cdi.k8s.io/"},
SpecDirs: []string{"/etc/cdi", "/var/run/cdi"},
},
},
},
NVIDIAContainerRuntimeHookConfig: RuntimeHookConfig{
Path: "nvidia-container-runtime-hook",
},
NVIDIACTKConfig: CTKConfig{
Path: "nvidia-ctk",
},
},
},
{
description: "suse config overrides user",
distIdsLike: []string{"suse", "opensuse"},
inspectLdconfig: true,
contents: []string{
"nvidia-container-cli.user = \"foo:bar\"",
},
expectedConfig: &Config{
AcceptEnvvarUnprivileged: true,
SupportedDriverCapabilities: "compat32,compute,display,graphics,ngx,utility,video",
NVIDIAContainerCLIConfig: ContainerCLIConfig{
Root: "",
LoadKmods: true,
Ldconfig: "WAS_CHECKED",
User: "foo:bar",
},
NVIDIAContainerRuntimeConfig: RuntimeConfig{
DebugFilePath: "/dev/null",
LogLevel: "info",
Runtimes: []string{"docker-runc", "runc", "crun"},
Mode: "auto",
Modes: modesConfig{
CSV: csvModeConfig{
MountSpecPath: "/etc/nvidia-container-runtime/host-files-for-container.d",
},
CDI: cdiModeConfig{
DefaultKind: "nvidia.com/gpu",
AnnotationPrefixes: []string{"cdi.k8s.io/"},
SpecDirs: []string{"/etc/cdi", "/var/run/cdi"},
},
},
},
NVIDIAContainerRuntimeHookConfig: RuntimeHookConfig{
Path: "nvidia-container-runtime-hook",
},
NVIDIACTKConfig: CTKConfig{
Path: "nvidia-ctk",
},
},
},
}
for _, tc := range testCases {
t.Run(tc.description, func(t *testing.T) {
defer setGetDistIDLikeForTest(tc.distIdsLike)()
reader := strings.NewReader(strings.Join(tc.contents, "\n"))
tomlCfg, err := loadConfigTomlFrom(reader)
@@ -236,3 +319,19 @@ func TestGetConfig(t *testing.T) {
})
}
}
// setGetDistIDsLikeForTest overrides the distribution IDs that would normally be read from the /etc/os-release file.
func setGetDistIDLikeForTest(ids []string) func() {
if ids == nil {
return func() {}
}
original := getDistIDLike
getDistIDLike = func() []string {
return ids
}
return func() {
getDistIDLike = original
}
}

View File

@@ -0,0 +1,85 @@
/**
# Copyright 2024 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package config
type featureName string
const (
FeatureGDS = featureName("gds")
FeatureMOFED = featureName("mofed")
FeatureNVSWITCH = featureName("nvswitch")
FeatureGDRCopy = featureName("gdrcopy")
)
// features specifies a set of named features.
type features struct {
GDS *feature `toml:"gds,omitempty"`
MOFED *feature `toml:"mofed,omitempty"`
NVSWITCH *feature `toml:"nvswitch,omitempty"`
GDRCopy *feature `toml:"gdrcopy,omitempty"`
}
type feature bool
// IsEnabled checks whether a specified named feature is enabled.
// An optional list of environments to check for feature-specific environment
// variables can also be supplied.
func (fs features) IsEnabled(n featureName, in ...getenver) bool {
featureEnvvars := map[featureName]string{
FeatureGDS: "NVIDIA_GDS",
FeatureMOFED: "NVIDIA_MOFED",
FeatureNVSWITCH: "NVIDIA_NVSWITCH",
FeatureGDRCopy: "NVIDIA_GDRCOPY",
}
envvar := featureEnvvars[n]
switch n {
case FeatureGDS:
return fs.GDS.isEnabled(envvar, in...)
case FeatureMOFED:
return fs.MOFED.isEnabled(envvar, in...)
case FeatureNVSWITCH:
return fs.NVSWITCH.isEnabled(envvar, in...)
case FeatureGDRCopy:
return fs.GDRCopy.isEnabled(envvar, in...)
default:
return false
}
}
// isEnabled checks whether a feature is enabled.
// If the enabled value is explicitly set, this is returned, otherwise the
// associated envvar is checked in the specified getenver for the string "enabled"
// A CUDA container / image can be passed here.
func (f *feature) isEnabled(envvar string, ins ...getenver) bool {
if f != nil {
return bool(*f)
}
if envvar == "" {
return false
}
for _, in := range ins {
if in.Getenv(envvar) == "enabled" {
return true
}
}
return false
}
type getenver interface {
Getenv(string) string
}

View File

@@ -201,7 +201,7 @@ func (t *Toml) commentDefaults() *Toml {
}
func shouldComment(key string, defaultValue interface{}, setTo interface{}) bool {
if key == "nvidia-container-cli.user" && !getCommentedUserGroup() {
if key == "nvidia-container-cli.user" && defaultValue == setTo && isSuse() {
return false
}
if key == "nvidia-container-runtime.debug" && setTo == "/dev/null" {

View File

@@ -62,7 +62,7 @@ load-kmods = true
#debug = "/var/log/nvidia-container-runtime.log"
log-level = "info"
mode = "auto"
runtimes = ["docker-runc", "runc"]
runtimes = ["docker-runc", "runc", "crun"]
[nvidia-container-runtime.modes]

View File

@@ -23,7 +23,8 @@ import (
)
/*
#cgo LDFLAGS: -Wl,--unresolved-symbols=ignore-in-object-files
#cgo linux LDFLAGS: -Wl,--export-dynamic -Wl,--unresolved-symbols=ignore-in-object-files
#cgo darwin LDFLAGS: -Wl,-undefined,dynamic_lookup
#ifdef _WIN32
#define CUDAAPI __stdcall

View File

@@ -30,8 +30,8 @@ type filtered struct {
filter Filter
}
// newFilteredDisoverer creates a discoverer that applies the specified filter to the returned entities of the discoverer
func newFilteredDisoverer(logger logger.Interface, applyTo Discover, filter Filter) Discover {
// newFilteredDiscoverer creates a discoverer that applies the specified filter to the returned entities of the discoverer
func newFilteredDiscoverer(logger logger.Interface, applyTo Discover, filter Filter) Discover {
return filtered{
Discover: applyTo,
logger: logger,

View File

@@ -0,0 +1,27 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package discover
import "github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
func NewGDRCopyDiscoverer(logger logger.Interface, devRoot string) (Discover, error) {
return NewCharDeviceDiscoverer(
logger,
devRoot,
[]string{"/dev/gdrdrv"},
), nil
}

View File

@@ -31,7 +31,7 @@ import (
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup/root"
)
// NewDRMNodesDiscoverer returns a discoverrer for the DRM device nodes associated with the specified visible devices.
// NewDRMNodesDiscoverer returns a discoverer for the DRM device nodes associated with the specified visible devices.
//
// TODO: The logic for creating DRM devices should be consolidated between this
// and the logic for generating CDI specs for a single device. This is only used
@@ -61,11 +61,7 @@ func NewGraphicsMountsDiscoverer(logger logger.Interface, driver *root.Driver, n
jsonMounts := NewMounts(
logger,
lookup.NewFileLocator(
lookup.WithLogger(logger),
lookup.WithRoot(driver.Root),
lookup.WithSearchPaths("/etc", "/usr/share"),
),
driver.Configs(),
driver.Root,
[]string{
"glvnd/egl_vendor.d/10_nvidia.json",
@@ -140,7 +136,7 @@ func (d drmDevicesByPath) Hooks() ([]Hook, error) {
return []Hook{hook}, nil
}
// getSpecificLinkArgs returns the required specic links that need to be created
// getSpecificLinkArgs returns the required specific links that need to be created
func (d drmDevicesByPath) getSpecificLinkArgs(devices []Device) ([]string, error) {
selectedDevices := make(map[string]bool)
for _, d := range devices {
@@ -185,13 +181,13 @@ func newDRMDeviceDiscoverer(logger logger.Interface, devices image.VisibleDevice
},
)
filter, err := newDRMDeviceFilter(logger, devices, devRoot)
filter, err := newDRMDeviceFilter(devices, devRoot)
if err != nil {
return nil, fmt.Errorf("failed to construct DRM device filter: %v", err)
}
// We return a discoverer that applies the DRM device filter created above to all discovered DRM device nodes.
d := newFilteredDisoverer(
d := newFilteredDiscoverer(
logger,
allDevices,
filter,
@@ -201,7 +197,7 @@ func newDRMDeviceDiscoverer(logger logger.Interface, devices image.VisibleDevice
}
// newDRMDeviceFilter creates a filter that matches DRM devices nodes for the visible devices.
func newDRMDeviceFilter(logger logger.Interface, devices image.VisibleDevices, devRoot string) (Filter, error) {
func newDRMDeviceFilter(devices image.VisibleDevices, devRoot string) (Filter, error) {
gpuInformationPaths, err := proc.GetInformationFilePaths(devRoot)
if err != nil {
return nil, fmt.Errorf("failed to read GPU information: %v", err)
@@ -290,20 +286,16 @@ func newXorgDiscoverer(logger logger.Interface, driver *root.Driver, nvidiaCTKPa
nvidiaCTKPath: nvidiaCTKPath,
}
xorgConfg := NewMounts(
xorgConfig := NewMounts(
logger,
lookup.NewFileLocator(
lookup.WithLogger(logger),
lookup.WithRoot(driver.Root),
lookup.WithSearchPaths("/usr/share"),
),
driver.Configs(),
driver.Root,
[]string{"X11/xorg.conf.d/10-nvidia.conf"},
)
d := Merge(
xorgLibs,
xorgConfg,
xorgConfig,
xorgHooks,
)

View File

@@ -25,10 +25,11 @@ import (
)
// NewLDCacheUpdateHook creates a discoverer that updates the ldcache for the specified mounts. A logger can also be specified
func NewLDCacheUpdateHook(logger logger.Interface, mounts Discover, nvidiaCTKPath string) (Discover, error) {
func NewLDCacheUpdateHook(logger logger.Interface, mounts Discover, nvidiaCTKPath, ldconfigPath string) (Discover, error) {
d := ldconfig{
logger: logger,
nvidiaCTKPath: nvidiaCTKPath,
ldconfigPath: ldconfigPath,
mountsFrom: mounts,
}
@@ -39,6 +40,7 @@ type ldconfig struct {
None
logger logger.Interface
nvidiaCTKPath string
ldconfigPath string
mountsFrom Discover
}
@@ -50,14 +52,20 @@ func (d ldconfig) Hooks() ([]Hook, error) {
}
h := CreateLDCacheUpdateHook(
d.nvidiaCTKPath,
d.ldconfigPath,
getLibraryPaths(mounts),
)
return []Hook{h}, nil
}
// CreateLDCacheUpdateHook locates the NVIDIA Container Toolkit CLI and creates a hook for updating the LD Cache
func CreateLDCacheUpdateHook(executable string, libraries []string) Hook {
func CreateLDCacheUpdateHook(executable string, ldconfig string, libraries []string) Hook {
var args []string
if ldconfig != "" {
args = append(args, "--ldconfig-path", ldconfig)
}
for _, f := range uniqueFolders(libraries) {
args = append(args, "--folder", f)
}
@@ -69,7 +77,6 @@ func CreateLDCacheUpdateHook(executable string, libraries []string) Hook {
)
return hook
}
// getLibraryPaths extracts the library dirs from the specified mounts
@@ -86,7 +93,6 @@ func getLibraryPaths(mounts []Mount) []string {
// isLibName checks if the specified filename is a library (i.e. ends in `.so*`)
func isLibName(filename string) bool {
base := filepath.Base(filename)
isLib, err := filepath.Match("lib?*.so*", base)

View File

@@ -26,6 +26,7 @@ import (
const (
testNvidiaCTKPath = "/foo/bar/nvidia-ctk"
testLdconfigPath = "/bar/baz/ldconfig"
)
func TestLDCacheUpdateHook(t *testing.T) {
@@ -33,6 +34,7 @@ func TestLDCacheUpdateHook(t *testing.T) {
testCases := []struct {
description string
ldconfigPath string
mounts []Mount
mountError error
expectedError error
@@ -75,6 +77,11 @@ func TestLDCacheUpdateHook(t *testing.T) {
},
expectedArgs: []string{"nvidia-ctk", "hook", "update-ldcache", "--folder", "/usr/local/lib"},
},
{
description: "explicit ldconfig path is passed",
ldconfigPath: testLdconfigPath,
expectedArgs: []string{"nvidia-ctk", "hook", "update-ldcache", "--ldconfig-path", testLdconfigPath},
},
}
for _, tc := range testCases {
@@ -90,7 +97,7 @@ func TestLDCacheUpdateHook(t *testing.T) {
Lifecycle: "createContainer",
}
d, err := NewLDCacheUpdateHook(logger, mountMock, testNvidiaCTKPath)
d, err := NewLDCacheUpdateHook(logger, mountMock, testNvidiaCTKPath, tc.ldconfigPath)
require.NoError(t, err)
hooks, err := d.Hooks()
@@ -114,10 +121,8 @@ func TestLDCacheUpdateHook(t *testing.T) {
mounts, err := d.Mounts()
require.NoError(t, err)
require.Empty(t, mounts)
})
}
}
func TestIsLibName(t *testing.T) {

View File

@@ -41,14 +41,17 @@ static const char * const dxcore_nvidia_driver_store_components[] = {
*/
struct dxcore_enumAdapters2;
struct dxcore_enumAdapters3;
struct dxcore_queryAdapterInfo;
typedef int(*pfnDxcoreEnumAdapters2)(struct dxcore_enumAdapters2* pParams);
typedef int(*pfnDxcoreEnumAdapters3)(struct dxcore_enumAdapters3* pParams);
typedef int(*pfnDxcoreQueryAdapterInfo)(struct dxcore_queryAdapterInfo* pParams);
struct dxcore_lib {
void* hDxcoreLib;
pfnDxcoreEnumAdapters2 pDxcoreEnumAdapters2;
pfnDxcoreEnumAdapters3 pDxcoreEnumAdapters3;
pfnDxcoreQueryAdapterInfo pDxcoreQueryAdapterInfo;
};
@@ -66,6 +69,15 @@ struct dxcore_enumAdapters2
struct dxcore_adapterInfo *pAdapters;
};
#define ENUMADAPTER3_FILTER_COMPUTE_ONLY (0x0000000000000001)
struct dxcore_enumAdapters3
{
unsigned long long Filter;
unsigned int NumAdapters;
struct dxcore_adapterInfo *pAdapters;
};
enum dxcore_kmtqueryAdapterInfoType
{
DXCORE_QUERYDRIVERVERSION = 13,
@@ -239,7 +251,37 @@ static void dxcore_add_adapter(struct dxcore_context* pCtx, struct dxcore_lib* p
log_infof("Adding new adapter via dxcore hAdapter:%x luid:%llx wddm version:%d", pAdapterInfo->hAdapter, *((unsigned long long*)&pAdapterInfo->AdapterLuid), wddmVersion);
}
static void dxcore_enum_adapters(struct dxcore_context* pCtx, struct dxcore_lib* pLib)
static int dxcore_enum_adapters3(struct dxcore_context* pCtx, struct dxcore_lib* pLib)
{
struct dxcore_enumAdapters3 params = {0};
unsigned int adapterIndex = 0;
// Include compute-only in addition to display+compute adapters
params.Filter = ENUMADAPTER3_FILTER_COMPUTE_ONLY;
params.NumAdapters = 0;
params.pAdapters = NULL;
if (pLib->pDxcoreEnumAdapters3(&params)) {
log_err("Failed to enumerate adapters via enumAdapers3");
return 1;
}
params.pAdapters = malloc(sizeof(struct dxcore_adapterInfo) * params.NumAdapters);
if (pLib->pDxcoreEnumAdapters3(&params)) {
free(params.pAdapters);
log_err("Failed to enumerate adapters via enumAdapers3");
return 1;
}
for (adapterIndex = 0; adapterIndex < params.NumAdapters; adapterIndex++) {
dxcore_add_adapter(pCtx, pLib, &params.pAdapters[adapterIndex]);
}
free(params.pAdapters);
return 0;
}
static int dxcore_enum_adapters2(struct dxcore_context* pCtx, struct dxcore_lib* pLib)
{
struct dxcore_enumAdapters2 params = {0};
unsigned int adapterIndex = 0;
@@ -248,15 +290,15 @@ static void dxcore_enum_adapters(struct dxcore_context* pCtx, struct dxcore_lib*
params.pAdapters = NULL;
if (pLib->pDxcoreEnumAdapters2(&params)) {
log_err("Failed to enumerate adapters via dxcore");
return;
log_err("Failed to enumerate adapters via enumAdapters2");
return 1;
}
params.pAdapters = malloc(sizeof(struct dxcore_adapterInfo) * params.NumAdapters);
if (pLib->pDxcoreEnumAdapters2(&params)) {
free(params.pAdapters);
log_err("Failed to enumerate adapters via dxcore");
return;
log_err("Failed to enumerate adapters via enumAdapters2");
return 1;
}
for (adapterIndex = 0; adapterIndex < params.NumAdapters; adapterIndex++) {
@@ -264,6 +306,27 @@ static void dxcore_enum_adapters(struct dxcore_context* pCtx, struct dxcore_lib*
}
free(params.pAdapters);
return 0;
}
static void dxcore_enum_adapters(struct dxcore_context* pCtx, struct dxcore_lib* pLib)
{
int status;
if (pLib->pDxcoreEnumAdapters3) {
status = dxcore_enum_adapters3(pCtx, pLib);
if (status == 0) {
return;
}
}
// Fall back to EnumAdapters2 if the OS doesn't support EnumAdapters3
if (pLib->pDxcoreEnumAdapters2) {
status = dxcore_enum_adapters2(pCtx, pLib);
if (status == 0) {
return;
}
}
log_err("Failed to enumerate adapters via dxcore");
}
int dxcore_init_context(struct dxcore_context* pCtx)
@@ -280,8 +343,9 @@ int dxcore_init_context(struct dxcore_context* pCtx)
}
lib.pDxcoreEnumAdapters2 = (pfnDxcoreEnumAdapters2)dlsym(lib.hDxcoreLib, "D3DKMTEnumAdapters2");
if (!lib.pDxcoreEnumAdapters2) {
log_err("dxcore library is present but the symbol D3DKMTEnumAdapters2 is missing");
lib.pDxcoreEnumAdapters3 = (pfnDxcoreEnumAdapters3)dlsym(lib.hDxcoreLib, "D3DKMTEnumAdapters3");
if (!lib.pDxcoreEnumAdapters2 && !lib.pDxcoreEnumAdapters3) {
log_err("dxcore library is present but the symbols D3DKMTEnumAdapters2 and D3DKMTEnumAdapters3 are missing");
goto error;
}

View File

@@ -17,7 +17,9 @@
package dxcore
/*
#cgo LDFLAGS: -Wl,--unresolved-symbols=ignore-in-object-files
#cgo linux LDFLAGS: -Wl,--export-dynamic -Wl,--unresolved-symbols=ignore-in-object-files
#cgo darwin LDFLAGS: -Wl,-undefined,dynamic_lookup
#include <dxcore.h>
*/
import "C"

View File

@@ -22,7 +22,7 @@ import (
"github.com/NVIDIA/go-nvlib/pkg/nvlib/device"
"github.com/NVIDIA/go-nvlib/pkg/nvlib/info"
"github.com/NVIDIA/go-nvlib/pkg/nvml"
"github.com/NVIDIA/go-nvml/pkg/nvml"
)
// additionalInfo allows for the info.Interface to be extened to implement the infoInterface.

View File

@@ -20,7 +20,8 @@ import (
"testing"
"github.com/NVIDIA/go-nvlib/pkg/nvlib/device"
"github.com/NVIDIA/go-nvlib/pkg/nvml"
"github.com/NVIDIA/go-nvml/pkg/nvml"
"github.com/NVIDIA/go-nvml/pkg/nvml/mock"
"github.com/stretchr/testify/require"
)
@@ -32,7 +33,7 @@ func TestUsesNVGPUModule(t *testing.T) {
}{
{
description: "init failure returns false",
nvmllib: &nvml.InterfaceMock{
nvmllib: &mock.Interface{
InitFunc: func() nvml.Return {
return nvml.ERROR_LIBRARY_NOT_FOUND
},
@@ -41,7 +42,7 @@ func TestUsesNVGPUModule(t *testing.T) {
},
{
description: "no devices returns false",
nvmllib: &nvml.InterfaceMock{
nvmllib: &mock.Interface{
InitFunc: func() nvml.Return {
return nvml.SUCCESS
},
@@ -56,7 +57,7 @@ func TestUsesNVGPUModule(t *testing.T) {
},
{
description: "DeviceGetCount error returns false",
nvmllib: &nvml.InterfaceMock{
nvmllib: &mock.Interface{
InitFunc: func() nvml.Return {
return nvml.SUCCESS
},
@@ -71,7 +72,7 @@ func TestUsesNVGPUModule(t *testing.T) {
},
{
description: "Failure to get device name returns false",
nvmllib: &nvml.InterfaceMock{
nvmllib: &mock.Interface{
InitFunc: func() nvml.Return {
return nvml.SUCCESS
},
@@ -82,7 +83,7 @@ func TestUsesNVGPUModule(t *testing.T) {
return 1, nvml.SUCCESS
},
DeviceGetHandleByIndexFunc: func(index int) (nvml.Device, nvml.Return) {
device := &nvml.DeviceMock{
device := &mock.Device{
GetNameFunc: func() (string, nvml.Return) {
return "", nvml.ERROR_UNKNOWN
},
@@ -94,7 +95,7 @@ func TestUsesNVGPUModule(t *testing.T) {
},
{
description: "nested panic returns false",
nvmllib: &nvml.InterfaceMock{
nvmllib: &mock.Interface{
InitFunc: func() nvml.Return {
return nvml.SUCCESS
},
@@ -105,7 +106,7 @@ func TestUsesNVGPUModule(t *testing.T) {
return 1, nvml.SUCCESS
},
DeviceGetHandleByIndexFunc: func(index int) (nvml.Device, nvml.Return) {
device := &nvml.DeviceMock{
device := &mock.Device{
GetNameFunc: func() (string, nvml.Return) {
panic("deep panic")
},
@@ -117,7 +118,7 @@ func TestUsesNVGPUModule(t *testing.T) {
},
{
description: "Single device name with no nvgpu",
nvmllib: &nvml.InterfaceMock{
nvmllib: &mock.Interface{
InitFunc: func() nvml.Return {
return nvml.SUCCESS
},
@@ -128,7 +129,7 @@ func TestUsesNVGPUModule(t *testing.T) {
return 1, nvml.SUCCESS
},
DeviceGetHandleByIndexFunc: func(index int) (nvml.Device, nvml.Return) {
device := &nvml.DeviceMock{
device := &mock.Device{
GetNameFunc: func() (string, nvml.Return) {
return "NVIDIA A100-SXM4-40GB", nvml.SUCCESS
},
@@ -140,7 +141,7 @@ func TestUsesNVGPUModule(t *testing.T) {
},
{
description: "Single device name with nvgpu",
nvmllib: &nvml.InterfaceMock{
nvmllib: &mock.Interface{
InitFunc: func() nvml.Return {
return nvml.SUCCESS
},
@@ -151,7 +152,7 @@ func TestUsesNVGPUModule(t *testing.T) {
return 1, nvml.SUCCESS
},
DeviceGetHandleByIndexFunc: func(index int) (nvml.Device, nvml.Return) {
device := &nvml.DeviceMock{
device := &mock.Device{
GetNameFunc: func() (string, nvml.Return) {
return "Orin (nvgpu)", nvml.SUCCESS
},
@@ -163,7 +164,7 @@ func TestUsesNVGPUModule(t *testing.T) {
},
{
description: "Multiple device names with no nvgpu",
nvmllib: &nvml.InterfaceMock{
nvmllib: &mock.Interface{
InitFunc: func() nvml.Return {
return nvml.SUCCESS
},
@@ -174,7 +175,7 @@ func TestUsesNVGPUModule(t *testing.T) {
return 2, nvml.SUCCESS
},
DeviceGetHandleByIndexFunc: func(index int) (nvml.Device, nvml.Return) {
device := &nvml.DeviceMock{
device := &mock.Device{
GetNameFunc: func() (string, nvml.Return) {
return "NVIDIA A100-SXM4-40GB", nvml.SUCCESS
},
@@ -186,7 +187,7 @@ func TestUsesNVGPUModule(t *testing.T) {
},
{
description: "Multiple device names with nvgpu",
nvmllib: &nvml.InterfaceMock{
nvmllib: &mock.Interface{
InitFunc: func() nvml.Return {
return nvml.SUCCESS
},
@@ -197,7 +198,7 @@ func TestUsesNVGPUModule(t *testing.T) {
return 2, nvml.SUCCESS
},
DeviceGetHandleByIndexFunc: func(index int) (nvml.Device, nvml.Return) {
device := &nvml.DeviceMock{
device := &mock.Device{
GetNameFunc: func() (string, nvml.Return) {
return "Orin (nvgpu)", nvml.SUCCESS
},
@@ -209,7 +210,7 @@ func TestUsesNVGPUModule(t *testing.T) {
},
{
description: "Mixed device names",
nvmllib: &nvml.InterfaceMock{
nvmllib: &mock.Interface{
InitFunc: func() nvml.Return {
return nvml.SUCCESS
},
@@ -226,7 +227,7 @@ func TestUsesNVGPUModule(t *testing.T) {
} else {
deviceName = "Orin (nvgpu)"
}
device := &nvml.DeviceMock{
device := &mock.Device{
GetNameFunc: func() (string, nvml.Return) {
return deviceName, nvml.SUCCESS
},

View File

@@ -19,24 +19,15 @@ package info
import (
"github.com/NVIDIA/go-nvlib/pkg/nvlib/device"
"github.com/NVIDIA/go-nvlib/pkg/nvlib/info"
"github.com/NVIDIA/go-nvlib/pkg/nvml"
"github.com/NVIDIA/go-nvml/pkg/nvml"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config/image"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
)
// infoInterface provides an alias for mocking.
//
//go:generate moq -stub -out info-interface_mock.go . infoInterface
type infoInterface interface {
info.Interface
// UsesNVGPUModule indicates whether the system is using the nvgpu kernel module
UsesNVGPUModule() (bool, string)
}
type resolver struct {
logger logger.Interface
info infoInterface
info info.PropertyExtractor
}
// ResolveAutoMode determines the correct mode for the platform if set to "auto"
@@ -63,6 +54,7 @@ func ResolveAutoMode(logger logger.Interface, mode string, image image.CUDA) (rm
// resolveMode determines the correct mode for the platform if set to "auto"
func (r resolver) resolveMode(mode string, image image.CUDA) (rmode string) {
if mode != "auto" {
r.logger.Infof("Using requested mode '%s'", mode)
return mode
}
defer func() {
@@ -73,13 +65,13 @@ func (r resolver) resolveMode(mode string, image image.CUDA) (rmode string) {
return "cdi"
}
isTegra, reason := r.info.IsTegraSystem()
isTegra, reason := r.info.HasTegraFiles()
r.logger.Debugf("Is Tegra-based system? %v: %v", isTegra, reason)
hasNVML, reason := r.info.HasNvml()
r.logger.Debugf("Has NVML? %v: %v", hasNVML, reason)
usesNVGPUModule, reason := r.info.UsesNVGPUModule()
usesNVGPUModule, reason := r.info.UsesOnlyNVGPUModule()
r.logger.Debugf("Uses nvgpu kernel module? %v: %v", usesNVGPUModule, reason)
if (isTegra && !hasNVML) || usesNVGPUModule {

View File

@@ -19,6 +19,7 @@ package info
import (
"testing"
"github.com/NVIDIA/go-nvlib/pkg/nvlib/info"
"github.com/opencontainers/runtime-spec/specs-go"
testlog "github.com/sirupsen/logrus/hooks/test"
"github.com/stretchr/testify/require"
@@ -202,14 +203,14 @@ func TestResolveAutoMode(t *testing.T) {
for _, tc := range testCases {
t.Run(tc.description, func(t *testing.T) {
info := &infoInterfaceMock{
info := &info.PropertyExtractorMock{
HasNvmlFunc: func() (bool, string) {
return tc.info["nvml"], "nvml"
},
IsTegraSystemFunc: func() (bool, string) {
HasTegraFilesFunc: func() (bool, string) {
return tc.info["tegra"], "tegra"
},
UsesNVGPUModuleFunc: func() (bool, string) {
UsesOnlyNVGPUModuleFunc: func() (bool, string) {
return tc.info["nvgpu"], "nvgpu"
},
}

View File

@@ -1,194 +0,0 @@
// Code generated by moq; DO NOT EDIT.
// github.com/matryer/moq
package info
import (
"sync"
)
// Ensure, that infoInterfaceMock does implement infoInterface.
// If this is not the case, regenerate this file with moq.
var _ infoInterface = &infoInterfaceMock{}
// infoInterfaceMock is a mock implementation of infoInterface.
//
// func TestSomethingThatUsesinfoInterface(t *testing.T) {
//
// // make and configure a mocked infoInterface
// mockedinfoInterface := &infoInterfaceMock{
// HasDXCoreFunc: func() (bool, string) {
// panic("mock out the HasDXCore method")
// },
// HasNvmlFunc: func() (bool, string) {
// panic("mock out the HasNvml method")
// },
// IsTegraSystemFunc: func() (bool, string) {
// panic("mock out the IsTegraSystem method")
// },
// UsesNVGPUModuleFunc: func() (bool, string) {
// panic("mock out the UsesNVGPUModule method")
// },
// }
//
// // use mockedinfoInterface in code that requires infoInterface
// // and then make assertions.
//
// }
type infoInterfaceMock struct {
// HasDXCoreFunc mocks the HasDXCore method.
HasDXCoreFunc func() (bool, string)
// HasNvmlFunc mocks the HasNvml method.
HasNvmlFunc func() (bool, string)
// IsTegraSystemFunc mocks the IsTegraSystem method.
IsTegraSystemFunc func() (bool, string)
// UsesNVGPUModuleFunc mocks the UsesNVGPUModule method.
UsesNVGPUModuleFunc func() (bool, string)
// calls tracks calls to the methods.
calls struct {
// HasDXCore holds details about calls to the HasDXCore method.
HasDXCore []struct {
}
// HasNvml holds details about calls to the HasNvml method.
HasNvml []struct {
}
// IsTegraSystem holds details about calls to the IsTegraSystem method.
IsTegraSystem []struct {
}
// UsesNVGPUModule holds details about calls to the UsesNVGPUModule method.
UsesNVGPUModule []struct {
}
}
lockHasDXCore sync.RWMutex
lockHasNvml sync.RWMutex
lockIsTegraSystem sync.RWMutex
lockUsesNVGPUModule sync.RWMutex
}
// HasDXCore calls HasDXCoreFunc.
func (mock *infoInterfaceMock) HasDXCore() (bool, string) {
callInfo := struct {
}{}
mock.lockHasDXCore.Lock()
mock.calls.HasDXCore = append(mock.calls.HasDXCore, callInfo)
mock.lockHasDXCore.Unlock()
if mock.HasDXCoreFunc == nil {
var (
bOut bool
sOut string
)
return bOut, sOut
}
return mock.HasDXCoreFunc()
}
// HasDXCoreCalls gets all the calls that were made to HasDXCore.
// Check the length with:
//
// len(mockedinfoInterface.HasDXCoreCalls())
func (mock *infoInterfaceMock) HasDXCoreCalls() []struct {
} {
var calls []struct {
}
mock.lockHasDXCore.RLock()
calls = mock.calls.HasDXCore
mock.lockHasDXCore.RUnlock()
return calls
}
// HasNvml calls HasNvmlFunc.
func (mock *infoInterfaceMock) HasNvml() (bool, string) {
callInfo := struct {
}{}
mock.lockHasNvml.Lock()
mock.calls.HasNvml = append(mock.calls.HasNvml, callInfo)
mock.lockHasNvml.Unlock()
if mock.HasNvmlFunc == nil {
var (
bOut bool
sOut string
)
return bOut, sOut
}
return mock.HasNvmlFunc()
}
// HasNvmlCalls gets all the calls that were made to HasNvml.
// Check the length with:
//
// len(mockedinfoInterface.HasNvmlCalls())
func (mock *infoInterfaceMock) HasNvmlCalls() []struct {
} {
var calls []struct {
}
mock.lockHasNvml.RLock()
calls = mock.calls.HasNvml
mock.lockHasNvml.RUnlock()
return calls
}
// IsTegraSystem calls IsTegraSystemFunc.
func (mock *infoInterfaceMock) IsTegraSystem() (bool, string) {
callInfo := struct {
}{}
mock.lockIsTegraSystem.Lock()
mock.calls.IsTegraSystem = append(mock.calls.IsTegraSystem, callInfo)
mock.lockIsTegraSystem.Unlock()
if mock.IsTegraSystemFunc == nil {
var (
bOut bool
sOut string
)
return bOut, sOut
}
return mock.IsTegraSystemFunc()
}
// IsTegraSystemCalls gets all the calls that were made to IsTegraSystem.
// Check the length with:
//
// len(mockedinfoInterface.IsTegraSystemCalls())
func (mock *infoInterfaceMock) IsTegraSystemCalls() []struct {
} {
var calls []struct {
}
mock.lockIsTegraSystem.RLock()
calls = mock.calls.IsTegraSystem
mock.lockIsTegraSystem.RUnlock()
return calls
}
// UsesNVGPUModule calls UsesNVGPUModuleFunc.
func (mock *infoInterfaceMock) UsesNVGPUModule() (bool, string) {
callInfo := struct {
}{}
mock.lockUsesNVGPUModule.Lock()
mock.calls.UsesNVGPUModule = append(mock.calls.UsesNVGPUModule, callInfo)
mock.lockUsesNVGPUModule.Unlock()
if mock.UsesNVGPUModuleFunc == nil {
var (
bOut bool
sOut string
)
return bOut, sOut
}
return mock.UsesNVGPUModuleFunc()
}
// UsesNVGPUModuleCalls gets all the calls that were made to UsesNVGPUModule.
// Check the length with:
//
// len(mockedinfoInterface.UsesNVGPUModuleCalls())
func (mock *infoInterfaceMock) UsesNVGPUModuleCalls() []struct {
} {
var calls []struct {
}
mock.lockUsesNVGPUModule.RLock()
calls = mock.calls.UsesNVGPUModule
mock.lockUsesNVGPUModule.RUnlock()
return calls
}

View File

@@ -0,0 +1,62 @@
/**
# Copyright 2024 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package devices
type builder struct {
asMap devices
filter func(string) bool
}
// New creates a new devices struct with the specified options.
func New(opts ...Option) Devices {
b := &builder{}
for _, opt := range opts {
opt(b)
}
if b.filter == nil {
b.filter = func(string) bool { return false }
}
devices := make(devices)
for k, v := range b.asMap {
if b.filter(string(k)) {
continue
}
devices[k] = v
}
return devices
}
type Option func(*builder)
// WithDeviceToMajor specifies an explicit device name to major number map.
func WithDeviceToMajor(deviceToMajor map[string]int) Option {
return func(b *builder) {
b.asMap = make(devices)
for name, major := range deviceToMajor {
b.asMap[Name(name)] = Major(major)
}
}
}
// WithFilter specifies a filter to exclude devices.
func WithFilter(filter func(string) bool) Option {
return func(b *builder) {
b.filter = filter
}
}

View File

@@ -33,7 +33,7 @@ const (
NVIDIAModesetMinor = 254
NVIDIAFrontend = Name("nvidia-frontend")
NVIDIAGPU = NVIDIAFrontend
NVIDIAGPU = Name("nvidia")
NVIDIACaps = Name("nvidia-caps")
NVIDIAUVM = Name("nvidia-uvm")
@@ -53,22 +53,43 @@ type Major int
type Devices interface {
Exists(Name) bool
Get(Name) (Major, bool)
Count() int
}
type devices map[Name]Major
var _ Devices = devices(nil)
// Count returns the number of devices defined.
func (d devices) Count() int {
return len(d)
}
// Exists checks if a Device with a given name exists or not
func (d devices) Exists(name Name) bool {
_, exists := d[name]
_, exists := d.Get(name)
return exists
}
// Get a Device from Devices
// Get a Device from Devices. It also has fallback logic to ensure device name changes in /proc/devices are handled
// For e.g:- For GPU drivers 550.40.x or greater, the gpu device has been renamed from "nvidia-frontend" to "nvidia".
func (d devices) Get(name Name) (Major, bool) {
device, exists := d[name]
return device, exists
for _, n := range name.getWithFallback() {
device, exists := d[n]
if exists {
return device, true
}
}
return 0, false
}
// getWithFallback returns a prioritised list of device names for a specific name.
// This allows multiple names to be associated with a single name to support various driver versions.
func (n Name) getWithFallback() []Name {
if n == NVIDIAGPU || n == NVIDIAFrontend {
return []Name{NVIDIAGPU, NVIDIAFrontend}
}
return []Name{n}
}
// GetNVIDIADevices returns the set of NVIDIA Devices on the machine
@@ -94,27 +115,23 @@ func nvidiaDevices(devicesPath string) (Devices, error) {
var errNoNvidiaDevices = errors.New("no NVIDIA devices found")
func nvidiaDeviceFrom(reader io.Reader) (devices, error) {
func nvidiaDeviceFrom(reader io.Reader) (Devices, error) {
allDevices := devicesFrom(reader)
nvidiaDevices := make(devices)
var hasNvidiaDevices bool
for n, d := range allDevices {
if !strings.HasPrefix(string(n), nvidiaDevicePrefix) {
continue
}
nvidiaDevices[n] = d
hasNvidiaDevices = true
}
if !hasNvidiaDevices {
nvidiaDevices := New(
WithDeviceToMajor(allDevices),
WithFilter(func(n string) bool {
return !strings.HasPrefix(n, nvidiaDevicePrefix)
}),
)
if nvidiaDevices.Count() == 0 {
return nil, errNoNvidiaDevices
}
return nvidiaDevices, nil
}
func devicesFrom(reader io.Reader) devices {
allDevices := make(devices)
func devicesFrom(reader io.Reader) map[string]int {
allDevices := make(map[string]int)
scanner := bufio.NewScanner(reader)
for scanner.Scan() {
device, major, err := processProcDeviceLine(scanner.Text())
@@ -126,11 +143,11 @@ func devicesFrom(reader io.Reader) devices {
return allDevices
}
func processProcDeviceLine(line string) (Name, Major, error) {
func processProcDeviceLine(line string) (string, int, error) {
trimmed := strings.TrimSpace(line)
var name Name
var major Major
var name string
var major int
n, _ := fmt.Sscanf(trimmed, "%d %s", &major, &name)
if n == 2 {

View File

@@ -17,6 +17,9 @@ var _ Devices = &DevicesMock{}
//
// // make and configure a mocked Devices
// mockedDevices := &DevicesMock{
// CountFunc: func() int {
// panic("mock out the Count method")
// },
// ExistsFunc: func(name Name) bool {
// panic("mock out the Exists method")
// },
@@ -30,6 +33,9 @@ var _ Devices = &DevicesMock{}
//
// }
type DevicesMock struct {
// CountFunc mocks the Count method.
CountFunc func() int
// ExistsFunc mocks the Exists method.
ExistsFunc func(name Name) bool
@@ -38,6 +44,9 @@ type DevicesMock struct {
// calls tracks calls to the methods.
calls struct {
// Count holds details about calls to the Count method.
Count []struct {
}
// Exists holds details about calls to the Exists method.
Exists []struct {
// Name is the name argument value.
@@ -49,10 +58,41 @@ type DevicesMock struct {
Name Name
}
}
lockCount sync.RWMutex
lockExists sync.RWMutex
lockGet sync.RWMutex
}
// Count calls CountFunc.
func (mock *DevicesMock) Count() int {
callInfo := struct {
}{}
mock.lockCount.Lock()
mock.calls.Count = append(mock.calls.Count, callInfo)
mock.lockCount.Unlock()
if mock.CountFunc == nil {
var (
nOut int
)
return nOut
}
return mock.CountFunc()
}
// CountCalls gets all the calls that were made to Count.
// Check the length with:
//
// len(mockedDevices.CountCalls())
func (mock *DevicesMock) CountCalls() []struct {
} {
var calls []struct {
}
mock.lockCount.RLock()
calls = mock.calls.Count
mock.lockCount.RUnlock()
return calls
}
// Exists calls ExistsFunc.
func (mock *DevicesMock) Exists(name Name) bool {
callInfo := struct {

View File

@@ -25,22 +25,46 @@ import (
)
func TestNvidiaDevices(t *testing.T) {
devices := map[Name]Major{
"nvidia-frontend": 195,
"nvidia-nvlink": 234,
"nvidia-caps": 235,
"nvidia-uvm": 510,
"nvidia-nvswitch": 511,
perDriverDeviceMaps := map[string]map[string]int{
"pre550": {
"nvidia-frontend": 195,
"nvidia-nvlink": 234,
"nvidia-caps": 235,
"nvidia-uvm": 510,
"nvidia-nvswitch": 511,
},
"post550": {
"nvidia": 195,
"nvidia-nvlink": 234,
"nvidia-caps": 235,
"nvidia-uvm": 510,
"nvidia-nvswitch": 511,
},
}
nvidiaDevices := testDevices(devices)
for name, major := range devices {
device, exists := nvidiaDevices.Get(name)
require.True(t, exists, "Unexpected missing device")
require.Equal(t, device, major, "Unexpected device major")
for k, devices := range perDriverDeviceMaps {
nvidiaDevices := New(WithDeviceToMajor(devices))
t.Run(k, func(t *testing.T) {
// Each of the expected devices needs to exist.
for name, major := range devices {
device, exists := nvidiaDevices.Get(Name(name))
require.True(t, exists)
require.Equal(t, device, Major(major))
}
// An unexpected device cannot exist
_, exists := nvidiaDevices.Get("bogus")
require.False(t, exists)
// Regardles of the driver version, the nvidia and nvidia-frontend
// names are supported and have the same value.
nvidia, exists := nvidiaDevices.Get(NVIDIAGPU)
require.True(t, exists)
nvidiaFrontend, exists := nvidiaDevices.Get(NVIDIAFrontend)
require.True(t, exists)
require.Equal(t, nvidia, nvidiaFrontend)
})
}
_, exists := nvidiaDevices.Get("bogus")
require.False(t, exists, "Unexpected 'bogus' device found")
}
func TestProcessDeviceFile(t *testing.T) {
@@ -52,6 +76,7 @@ func TestProcessDeviceFile(t *testing.T) {
{lines: []string{}, expectedError: errNoNvidiaDevices},
{lines: []string{"Not a valid line:"}, expectedError: errNoNvidiaDevices},
{lines: []string{"195 nvidia-frontend"}, expected: devices{"nvidia-frontend": 195}},
{lines: []string{"195 nvidia"}, expected: devices{"nvidia": 195}},
{lines: []string{"195 nvidia-frontend", "235 nvidia-caps"}, expected: devices{"nvidia-frontend": 195, "nvidia-caps": 235}},
{lines: []string{" 195 nvidia-frontend"}, expected: devices{"nvidia-frontend": 195}},
{lines: []string{"Not a valid line:", "", "195 nvidia-frontend"}, expected: devices{"nvidia-frontend": 195}},
@@ -63,7 +88,10 @@ func TestProcessDeviceFile(t *testing.T) {
d, err := nvidiaDeviceFrom(contents)
require.ErrorIs(t, err, tc.expectedError)
require.EqualValues(t, tc.expected, d)
if tc.expectedError == nil {
require.EqualValues(t, tc.expected, d.(devices))
}
})
}
}
@@ -71,8 +99,8 @@ func TestProcessDeviceFile(t *testing.T) {
func TestProcessDeviceFileLine(t *testing.T) {
testCases := []struct {
line string
name Name
major Major
name string
major int
err bool
}{
{"", "", 0, true},
@@ -97,8 +125,3 @@ func TestProcessDeviceFileLine(t *testing.T) {
})
}
}
// testDevices creates a set of test NVIDIA devices
func testDevices(d map[Name]Major) Devices {
return devices(d)
}

View File

@@ -0,0 +1,45 @@
/**
# Copyright 2024 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package root
import "github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
type Option func(*Driver)
func WithLogger(logger logger.Interface) Option {
return func(d *Driver) {
d.logger = logger
}
}
func WithDriverRoot(root string) Option {
return func(d *Driver) {
d.Root = root
}
}
func WithLibrarySearchPaths(paths ...string) Option {
return func(d *Driver) {
d.librarySearchPaths = paths
}
}
func WithConfigSearchPaths(paths ...string) Option {
return func(d *Driver) {
d.configSearchPaths = paths
}
}

View File

@@ -17,6 +17,7 @@
package root
import (
"os"
"path/filepath"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
@@ -30,27 +31,55 @@ type Driver struct {
Root string
// librarySearchPaths specifies explicit search paths for discovering libraries.
librarySearchPaths []string
// configSearchPaths specified explicit search paths for discovering driver config files.
configSearchPaths []string
}
// New creates a new Driver root at the specified path.
// TODO: Use functional options here.
func New(logger logger.Interface, path string, librarySearchPaths []string) *Driver {
return &Driver{
logger: logger,
Root: path,
librarySearchPaths: normalizeSearchPaths(librarySearchPaths...),
// New creates a new Driver root using the specified options.
func New(opts ...Option) *Driver {
d := &Driver{}
for _, opt := range opts {
opt(d)
}
if d.logger == nil {
d.logger = logger.New()
}
return d
}
// Drivers returns a Locator for driver libraries.
// Libraries returns a Locator for driver libraries.
func (r *Driver) Libraries() lookup.Locator {
return lookup.NewLibraryLocator(
lookup.WithLogger(r.logger),
lookup.WithRoot(r.Root),
lookup.WithSearchPaths(r.librarySearchPaths...),
lookup.WithSearchPaths(normalizeSearchPaths(r.librarySearchPaths...)...),
)
}
// Configs returns a locator for driver configs.
// If configSearchPaths is specified, these paths are used as absolute paths,
// otherwise, /etc and /usr/share are searched.
func (r *Driver) Configs() lookup.Locator {
return lookup.NewFileLocator(r.configSearchOptions()...)
}
func (r *Driver) configSearchOptions() []lookup.Option {
if len(r.configSearchPaths) > 0 {
return []lookup.Option{
lookup.WithLogger(r.logger),
lookup.WithRoot("/"),
lookup.WithSearchPaths(normalizeSearchPaths(r.configSearchPaths...)...),
}
}
searchPaths := []string{"/etc"}
searchPaths = append(searchPaths, xdgDataDirs()...)
return []lookup.Option{
lookup.WithLogger(r.logger),
lookup.WithRoot(r.Root),
lookup.WithSearchPaths(searchPaths...),
}
}
// normalizeSearchPaths takes a list of paths and normalized these.
// Each of the elements in the list is expanded if it is a path list and the
// resultant list is returned.
@@ -63,3 +92,13 @@ func normalizeSearchPaths(paths ...string) []string {
}
return normalized
}
// xdgDataDirs finds the paths as specified in the environment variable XDG_DATA_DIRS.
// See https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html.
func xdgDataDirs() []string {
if dirs, exists := os.LookupEnv("XDG_DATA_DIRS"); exists && dirs != "" {
return normalizeSearchPaths(dirs)
}
return []string{"/usr/local/share", "/usr/share"}
}

View File

@@ -50,7 +50,12 @@ func NewCDIModifier(logger logger.Interface, cfg *config.Config, ociSpec oci.Spe
return nil, fmt.Errorf("requesting a CDI device with vendor 'runtime.nvidia.com' is not supported when requesting other CDI devices")
}
if len(automaticDevices) > 0 {
return newAutomaticCDISpecModifier(logger, cfg, automaticDevices)
automaticModifier, err := newAutomaticCDISpecModifier(logger, cfg, automaticDevices)
if err == nil {
return automaticModifier, nil
}
logger.Warningf("Failed to create the automatic CDI modifier: %w", err)
logger.Debugf("Falling back to the standard CDI modifier")
}
return cdi.New(
@@ -152,7 +157,8 @@ func getAnnotationDevices(prefixes []string, annotations map[string]string) ([]s
func filterAutomaticDevices(devices []string) []string {
var automatic []string
for _, device := range devices {
if device == "runtime.nvidia.com/gpu=all" {
vendor, class, _ := parser.ParseDevice(device)
if vendor == "runtime.nvidia.com" && class == "gpu" {
automatic = append(automatic, device)
}
}
@@ -176,9 +182,6 @@ func newAutomaticCDISpecModifier(logger logger.Interface, cfg *config.Config, de
return cdiModifier, nil
}
// TODO: use the requested devices when generating the CDI spec once we add
// automatic CDI generation for more than just the 'runtime.nvidia.com/gpu=all'
// device
func generateAutomaticCDISpec(logger logger.Interface, cfg *config.Config, devices []string) (spec.Interface, error) {
cdilib, err := nvcdi.New(
nvcdi.WithLogger(logger),
@@ -191,5 +194,26 @@ func generateAutomaticCDISpec(logger logger.Interface, cfg *config.Config, devic
return nil, fmt.Errorf("failed to construct CDI library: %w", err)
}
return cdilib.GetSpec()
identifiers := []string{}
for _, device := range devices {
_, _, id := parser.ParseDevice(device)
identifiers = append(identifiers, id)
}
deviceSpecs, err := cdilib.GetDeviceSpecsByID(identifiers...)
if err != nil {
return nil, fmt.Errorf("failed to get CDI device specs: %w", err)
}
commonEdits, err := cdilib.GetCommonEdits()
if err != nil {
return nil, fmt.Errorf("failed to get common CDI spec edits: %w", err)
}
return spec.New(
spec.WithDeviceSpecs(deviceSpecs),
spec.WithEdits(*commonEdits.ContainerEdits),
spec.WithVendor("runtime.nvidia.com"),
spec.WithClass("gpu"),
)
}

View File

@@ -26,18 +26,13 @@ import (
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
)
const (
nvidiaGDSEnvvar = "NVIDIA_GDS"
nvidiaMOFEDEnvvar = "NVIDIA_MOFED"
nvidiaNVSWITCHEnvvar = "NVIDIA_NVSWITCH"
)
// NewFeatureGatedModifier creates the modifiers for optional features.
// These include:
//
// NVIDIA_GDS=enabled
// NVIDIA_MOFED=enabled
// NVIDIA_NVSWITCH=enabled
// NVIDIA_GDRCOPY=enabled
//
// If not devices are selected, no changes are made.
func NewFeatureGatedModifier(logger logger.Interface, cfg *config.Config, image image.CUDA) (oci.SpecModifier, error) {
@@ -51,7 +46,7 @@ func NewFeatureGatedModifier(logger logger.Interface, cfg *config.Config, image
driverRoot := cfg.NVIDIAContainerCLIConfig.Root
devRoot := cfg.NVIDIAContainerCLIConfig.Root
if image.Getenv(nvidiaGDSEnvvar) == "enabled" {
if cfg.Features.IsEnabled(config.FeatureGDS, image) {
d, err := discover.NewGDSDiscoverer(logger, driverRoot, devRoot)
if err != nil {
return nil, fmt.Errorf("failed to construct discoverer for GDS devices: %w", err)
@@ -59,7 +54,7 @@ func NewFeatureGatedModifier(logger logger.Interface, cfg *config.Config, image
discoverers = append(discoverers, d)
}
if image.Getenv(nvidiaMOFEDEnvvar) == "enabled" {
if cfg.Features.IsEnabled(config.FeatureMOFED, image) {
d, err := discover.NewMOFEDDiscoverer(logger, devRoot)
if err != nil {
return nil, fmt.Errorf("failed to construct discoverer for MOFED devices: %w", err)
@@ -67,7 +62,7 @@ func NewFeatureGatedModifier(logger logger.Interface, cfg *config.Config, image
discoverers = append(discoverers, d)
}
if image.Getenv(nvidiaNVSWITCHEnvvar) == "enabled" {
if cfg.Features.IsEnabled(config.FeatureNVSWITCH, image) {
d, err := discover.NewNvSwitchDiscoverer(logger, devRoot)
if err != nil {
return nil, fmt.Errorf("failed to construct discoverer for NVSWITCH devices: %w", err)
@@ -75,5 +70,13 @@ func NewFeatureGatedModifier(logger logger.Interface, cfg *config.Config, image
discoverers = append(discoverers, d)
}
if cfg.Features.IsEnabled(config.FeatureGDRCopy, image) {
d, err := discover.NewGDRCopyDiscoverer(logger, devRoot)
if err != nil {
return nil, fmt.Errorf("failed to construct discoverer for GDRCopy devices: %w", err)
}
discoverers = append(discoverers, d)
}
return NewModifierFromDiscoverer(logger, discover.Merge(discoverers...))
}

View File

@@ -29,14 +29,12 @@ import (
// NewGraphicsModifier constructs a modifier that injects graphics-related modifications into an OCI runtime specification.
// The value of the NVIDIA_DRIVER_CAPABILITIES environment variable is checked to determine if this modification should be made.
func NewGraphicsModifier(logger logger.Interface, cfg *config.Config, image image.CUDA) (oci.SpecModifier, error) {
func NewGraphicsModifier(logger logger.Interface, cfg *config.Config, image image.CUDA, driver *root.Driver) (oci.SpecModifier, error) {
if required, reason := requiresGraphicsModifier(image); !required {
logger.Infof("No graphics modifier required: %v", reason)
return nil, nil
}
// TODO: We should not just pass `nil` as the search path here.
driver := root.New(logger, cfg.NVIDIAContainerCLIConfig.Root, nil)
nvidiaCTKPath := cfg.NVIDIACTKConfig.Path
mounts, err := discover.NewGraphicsMountsDiscoverer(

View File

@@ -38,7 +38,7 @@ func (o tegraOptions) newDiscovererFromCSVFiles() (discover.Discover, error) {
devices := discover.NewCharDeviceDiscoverer(
o.logger,
o.driverRoot,
o.devRoot,
targetsByType[csv.MountSpecDev],
)

View File

@@ -31,6 +31,7 @@ type tegraOptions struct {
driverRoot string
devRoot string
nvidiaCTKPath string
ldconfigPath string
librarySearchPaths []string
ignorePatterns ignoreMountSpecPatterns
@@ -79,7 +80,7 @@ func New(opts ...Option) (discover.Discover, error) {
return nil, fmt.Errorf("failed to create CSV discoverer: %v", err)
}
ldcacheUpdateHook, err := discover.NewLDCacheUpdateHook(o.logger, csvDiscoverer, o.nvidiaCTKPath)
ldcacheUpdateHook, err := discover.NewLDCacheUpdateHook(o.logger, csvDiscoverer, o.nvidiaCTKPath, o.ldconfigPath)
if err != nil {
return nil, fmt.Errorf("failed to create ldcach update hook discoverer: %v", err)
}
@@ -119,9 +120,9 @@ func WithDriverRoot(driverRoot string) Option {
// WithDevRoot sets the /dev root.
// If this is unset, the driver root is assumed.
func WithDevRoot(driverRoot string) Option {
func WithDevRoot(devRoot string) Option {
return func(o *tegraOptions) {
o.driverRoot = driverRoot
o.devRoot = devRoot
}
}
@@ -139,6 +140,13 @@ func WithNVIDIACTKPath(nvidiaCTKPath string) Option {
}
}
// WithLdconfigPath sets the path to the ldconfig program
func WithLdconfigPath(ldconfigPath string) Option {
return func(o *tegraOptions) {
o.ldconfigPath = ldconfigPath
}
}
// WithLibrarySearchPaths sets the library search paths for the discoverer.
func WithLibrarySearchPaths(librarySearchPaths ...string) Option {
return func(o *tegraOptions) {

View File

@@ -26,6 +26,7 @@ import (
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/internal/info"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup/root"
)
// Run is an entry point that allows for idiomatic handling of errors
@@ -76,8 +77,13 @@ func (r rt) Run(argv []string) (rerr error) {
r.logger.Infof("Running with config:\n%+v", cfg)
}
driver := root.New(
root.WithLogger(r.logger),
root.WithDriverRoot(cfg.NVIDIAContainerCLIConfig.Root),
)
r.logger.Debugf("Command line arguments: %v", argv)
runtime, err := newNVIDIAContainerRuntime(r.logger, cfg, argv)
runtime, err := newNVIDIAContainerRuntime(r.logger, cfg, argv, driver)
if err != nil {
return fmt.Errorf("failed to create NVIDIA Container Runtime: %v", err)
}

View File

@@ -23,12 +23,13 @@ import (
"github.com/NVIDIA/nvidia-container-toolkit/internal/config/image"
"github.com/NVIDIA/nvidia-container-toolkit/internal/info"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup/root"
"github.com/NVIDIA/nvidia-container-toolkit/internal/modifier"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
)
// newNVIDIAContainerRuntime is a factory method that constructs a runtime based on the selected configuration and specified logger
func newNVIDIAContainerRuntime(logger logger.Interface, cfg *config.Config, argv []string) (oci.Runtime, error) {
func newNVIDIAContainerRuntime(logger logger.Interface, cfg *config.Config, argv []string, driver *root.Driver) (oci.Runtime, error) {
lowLevelRuntime, err := oci.NewLowLevelRuntime(logger, cfg.NVIDIAContainerRuntimeConfig.Runtimes)
if err != nil {
return nil, fmt.Errorf("error constructing low-level runtime: %v", err)
@@ -44,7 +45,7 @@ func newNVIDIAContainerRuntime(logger logger.Interface, cfg *config.Config, argv
return nil, fmt.Errorf("error constructing OCI specification: %v", err)
}
specModifier, err := newSpecModifier(logger, cfg, ociSpec)
specModifier, err := newSpecModifier(logger, cfg, ociSpec, driver)
if err != nil {
return nil, fmt.Errorf("failed to construct OCI spec modifier: %v", err)
}
@@ -61,7 +62,7 @@ func newNVIDIAContainerRuntime(logger logger.Interface, cfg *config.Config, argv
}
// newSpecModifier is a factory method that creates constructs an OCI spec modifer based on the provided config.
func newSpecModifier(logger logger.Interface, cfg *config.Config, ociSpec oci.Spec) (oci.SpecModifier, error) {
func newSpecModifier(logger logger.Interface, cfg *config.Config, ociSpec oci.Spec, driver *root.Driver) (oci.SpecModifier, error) {
rawSpec, err := ociSpec.Load()
if err != nil {
return nil, fmt.Errorf("failed to load OCI spec: %v", err)
@@ -82,7 +83,7 @@ func newSpecModifier(logger logger.Interface, cfg *config.Config, ociSpec oci.Sp
return modeModifier, nil
}
graphicsModifier, err := modifier.NewGraphicsModifier(logger, cfg, image)
graphicsModifier, err := modifier.NewGraphicsModifier(logger, cfg, image, driver)
if err != nil {
return nil, err
}

View File

@@ -29,6 +29,7 @@ import (
"github.com/stretchr/testify/require"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup/root"
"github.com/NVIDIA/nvidia-container-toolkit/internal/test"
)
@@ -63,6 +64,9 @@ func TestMain(m *testing.M) {
func TestFactoryMethod(t *testing.T) {
logger, _ := testlog.NewNullLogger()
driver := root.New(
root.WithDriverRoot("/nvidia/driver/root"),
)
testCases := []struct {
description string
@@ -143,6 +147,7 @@ func TestFactoryMethod(t *testing.T) {
for _, tc := range testCases {
t.Run(tc.description, func(t *testing.T) {
bundleDir := t.TempDir()
specFile, err := os.Create(filepath.Join(bundleDir, "config.json"))
@@ -151,7 +156,7 @@ func TestFactoryMethod(t *testing.T) {
argv := []string{"--bundle", bundleDir, "create"}
_, err = newNVIDIAContainerRuntime(logger, tc.cfg, argv)
_, err = newNVIDIAContainerRuntime(logger, tc.cfg, argv, driver)
if tc.expectedError {
require.Error(t, err)
} else {

View File

@@ -29,15 +29,18 @@ import (
func TestCreateControlDevices(t *testing.T) {
logger, _ := testlog.NewNullLogger()
nvidiaDevices := &devices.DevicesMock{
GetFunc: func(name devices.Name) (devices.Major, bool) {
devices := map[devices.Name]devices.Major{
"nvidia-frontend": 195,
"nvidia-uvm": 243,
}
return devices[name], true
},
}
nvidiaDevices := devices.New(
devices.WithDeviceToMajor(map[string]int{
"nvidia-frontend": 195,
"nvidia-uvm": 243,
}),
)
nvidia550Devices := devices.New(
devices.WithDeviceToMajor(map[string]int{
"nvidia": 195,
"nvidia-uvm": 243,
}),
)
mknodeError := errors.New("mknode error")
@@ -54,7 +57,7 @@ func TestCreateControlDevices(t *testing.T) {
}
}{
{
description: "no root specified",
description: "no root specified; pre 550 driver",
root: "",
devices: nvidiaDevices,
mknodeError: nil,
@@ -69,6 +72,22 @@ func TestCreateControlDevices(t *testing.T) {
{"/dev/nvidia-uvm-tools", 243, 1},
},
},
{
description: "no root specified; 550 driver",
root: "",
devices: nvidia550Devices,
mknodeError: nil,
expectedCalls: []struct {
S string
N1 int
N2 int
}{
{"/dev/nvidiactl", 195, 255},
{"/dev/nvidia-modeset", 195, 254},
{"/dev/nvidia-uvm", 243, 0},
{"/dev/nvidia-uvm-tools", 243, 1},
},
},
{
description: "some root specified",
root: "/some/root",
@@ -130,5 +149,4 @@ func TestCreateControlDevices(t *testing.T) {
require.EqualValues(t, tc.expectedCalls, mknode.MknodeCalls())
})
}
}

View File

@@ -10,7 +10,7 @@ Build-Depends: debhelper (>= 9)
Package: nvidia-container-toolkit
Architecture: any
Depends: ${misc:Depends}, nvidia-container-toolkit-base (= @VERSION@), libnvidia-container-tools (>= @LIBNVIDIA_CONTAINER_TOOLS_VERSION@), libnvidia-container-tools (<< 2.0.0), libseccomp2
Depends: ${misc:Depends}, nvidia-container-toolkit-base (= @VERSION@), libnvidia-container-tools (>= @LIBNVIDIA_CONTAINER_TOOLS_VERSION@), libnvidia-container-tools (<< 2.0.0)
Breaks: nvidia-container-runtime (<= 3.5.0-1), nvidia-container-runtime-hook
Replaces: nvidia-container-runtime (<= 3.5.0-1), nvidia-container-runtime-hook
Description: NVIDIA Container toolkit

View File

@@ -23,13 +23,6 @@ Provides: nvidia-container-runtime-hook
Requires: libnvidia-container-tools >= %{libnvidia_container_tools_version}, libnvidia-container-tools < 2.0.0
Requires: nvidia-container-toolkit-base == %{version}-%{release}
%if 0%{?suse_version}
Requires: libseccomp2
Requires: libapparmor1
%else
Requires: libseccomp
%endif
%description
Provides tools and utilities to enable GPU support in containers.

View File

@@ -20,7 +20,7 @@ package engine
type Interface interface {
DefaultRuntime() string
AddRuntime(string, string, bool) error
Set(string, interface{}) error
Set(string, interface{})
RemoveRuntime(string) error
Save(string) (int64, error)
}

View File

@@ -135,11 +135,10 @@ func (c *ConfigV1) RemoveRuntime(name string) error {
}
// SetOption sets the specified containerd option.
func (c *ConfigV1) Set(key string, value interface{}) error {
func (c *ConfigV1) Set(key string, value interface{}) {
config := *c.Tree
config.SetPath([]string{"plugins", "cri", "containerd", key}, value)
*c.Tree = config
return nil
}
// Save wrotes the config to a file

View File

@@ -91,11 +91,10 @@ func (c *Config) getRuntimeAnnotations(path []string) ([]string, error) {
}
// Set sets the specified containerd option.
func (c *Config) Set(key string, value interface{}) error {
func (c *Config) Set(key string, value interface{}) {
config := *c.Tree
config.SetPath([]string{"plugins", "io.containerd.grpc.v1.cri", key}, value)
*c.Tree = config
return nil
}
// DefaultRuntime returns the default runtime for the cri-o config

View File

@@ -31,6 +31,8 @@ type Config struct {
ContainerAnnotations []string
}
var _ engine.Interface = (*Config)(nil)
// New creates a containerd config with the specified options
func New(opts ...Option) (engine.Interface, error) {
b := &builder{}

View File

@@ -27,6 +27,8 @@ import (
// Config represents the cri-o config
type Config toml.Tree
var _ engine.Interface = (*Config)(nil)
// New creates a cri-o config with the specified options
func New(opts ...Option) (engine.Interface, error) {
b := &builder{}
@@ -99,9 +101,11 @@ func (c *Config) RemoveRuntime(name string) error {
return nil
}
// Set is not supported for cri-o configs.
func (c *Config) Set(key string, value interface{}) error {
return fmt.Errorf("Set is not supported for crio configs")
// Set sets the specified cri-o option.
func (c *Config) Set(key string, value interface{}) {
config := (toml.Tree)(*c)
config.Set(key, value)
*c = (Config)(config)
}
// Save writes the config to the specified path

View File

@@ -32,6 +32,8 @@ const (
// TODO: This should not be public, but we need to access it from the tests in tools/container/docker
type Config map[string]interface{}
var _ engine.Interface = (*Config)(nil)
// New creates a docker config with the specified options
func New(opts ...Option) (engine.Interface, error) {
b := &builder{}
@@ -115,9 +117,8 @@ func (c *Config) RemoveRuntime(name string) error {
}
// Set sets the specified docker option
func (c *Config) Set(key string, value interface{}) error {
func (c *Config) Set(key string, value interface{}) {
(*c)[key] = value
return nil
}
// Save writes the config to the specified path

View File

@@ -48,7 +48,8 @@ type Interface interface {
GetCommonEdits() (*cdi.ContainerEdits, error)
GetAllDeviceSpecs() ([]specs.Device, error)
GetGPUDeviceEdits(device.Device) (*cdi.ContainerEdits, error)
GetGPUDeviceSpecs(int, device.Device) (*specs.Device, error)
GetGPUDeviceSpecs(int, device.Device) ([]specs.Device, error)
GetMIGDeviceEdits(device.Device, device.MigDevice) (*cdi.ContainerEdits, error)
GetMIGDeviceSpecs(int, device.Device, int, device.MigDevice) (*specs.Device, error)
GetMIGDeviceSpecs(int, device.Device, int, device.MigDevice) ([]specs.Device, error)
GetDeviceSpecsByID(...string) ([]specs.Device, error)
}

View File

@@ -41,7 +41,7 @@ func (l *nvmllib) newCommonNVMLDiscoverer() (discover.Discover, error) {
l.logger.Warningf("failed to create discoverer for graphics mounts: %v", err)
}
driverFiles, err := NewDriverDiscoverer(l.logger, l.driver, l.nvidiaCTKPath, l.nvmllib)
driverFiles, err := NewDriverDiscoverer(l.logger, l.driver, l.nvidiaCTKPath, l.ldconfigPath, l.nvmllib)
if err != nil {
return nil, fmt.Errorf("failed to create discoverer for driver files: %v", err)
}

View File

@@ -22,7 +22,7 @@ import (
"path/filepath"
"strings"
"github.com/NVIDIA/go-nvlib/pkg/nvml"
"github.com/NVIDIA/go-nvml/pkg/nvml"
"golang.org/x/sys/unix"
"github.com/NVIDIA/nvidia-container-toolkit/internal/discover"
@@ -34,7 +34,7 @@ import (
// NewDriverDiscoverer creates a discoverer for the libraries and binaries associated with a driver installation.
// The supplied NVML Library is used to query the expected driver version.
func NewDriverDiscoverer(logger logger.Interface, driver *root.Driver, nvidiaCTKPath string, nvmllib nvml.Interface) (discover.Discover, error) {
func NewDriverDiscoverer(logger logger.Interface, driver *root.Driver, nvidiaCTKPath string, ldconfigPath string, nvmllib nvml.Interface) (discover.Discover, error) {
if r := nvmllib.Init(); r != nvml.SUCCESS {
return nil, fmt.Errorf("failed to initialize NVML: %v", r)
}
@@ -49,11 +49,11 @@ func NewDriverDiscoverer(logger logger.Interface, driver *root.Driver, nvidiaCTK
return nil, fmt.Errorf("failed to determine driver version: %v", r)
}
return newDriverVersionDiscoverer(logger, driver, nvidiaCTKPath, version)
return newDriverVersionDiscoverer(logger, driver, nvidiaCTKPath, ldconfigPath, version)
}
func newDriverVersionDiscoverer(logger logger.Interface, driver *root.Driver, nvidiaCTKPath string, version string) (discover.Discover, error) {
libraries, err := NewDriverLibraryDiscoverer(logger, driver, nvidiaCTKPath, version)
func newDriverVersionDiscoverer(logger logger.Interface, driver *root.Driver, nvidiaCTKPath, ldconfigPath, version string) (discover.Discover, error) {
libraries, err := NewDriverLibraryDiscoverer(logger, driver, nvidiaCTKPath, ldconfigPath, version)
if err != nil {
return nil, fmt.Errorf("failed to create discoverer for driver libraries: %v", err)
}
@@ -81,7 +81,7 @@ func newDriverVersionDiscoverer(logger logger.Interface, driver *root.Driver, nv
}
// NewDriverLibraryDiscoverer creates a discoverer for the libraries associated with the specified driver version.
func NewDriverLibraryDiscoverer(logger logger.Interface, driver *root.Driver, nvidiaCTKPath string, version string) (discover.Discover, error) {
func NewDriverLibraryDiscoverer(logger logger.Interface, driver *root.Driver, nvidiaCTKPath, ldconfigPath, version string) (discover.Discover, error) {
libraryPaths, err := getVersionLibs(logger, driver, version)
if err != nil {
return nil, fmt.Errorf("failed to get libraries for driver version: %v", err)
@@ -97,7 +97,7 @@ func NewDriverLibraryDiscoverer(logger logger.Interface, driver *root.Driver, nv
libraryPaths,
)
hooks, _ := discover.NewLDCacheUpdateHook(logger, libraries, nvidiaCTKPath)
hooks, _ := discover.NewLDCacheUpdateHook(logger, libraries, nvidiaCTKPath, ldconfigPath)
d := discover.Merge(
libraries,

View File

@@ -33,12 +33,13 @@ var requiredDriverStoreFiles = []string{
"libnvidia-ml.so.1", /* Core library for nvml */
"libnvidia-ml_loader.so", /* Core library for nvml on WSL */
"libdxcore.so", /* Core library for dxcore support */
"libnvdxgdmal.so.1", /* dxgdmal library for cuda */
"nvcubins.bin", /* Binary containing GPU code for cuda */
"nvidia-smi", /* nvidia-smi binary*/
}
// newWSLDriverDiscoverer returns a Discoverer for WSL2 drivers.
func newWSLDriverDiscoverer(logger logger.Interface, driverRoot string, nvidiaCTKPath string) (discover.Discover, error) {
func newWSLDriverDiscoverer(logger logger.Interface, driverRoot string, nvidiaCTKPath, ldconfigPath string) (discover.Discover, error) {
err := dxcore.Init()
if err != nil {
return nil, fmt.Errorf("failed to initialize dxcore: %v", err)
@@ -55,11 +56,11 @@ func newWSLDriverDiscoverer(logger logger.Interface, driverRoot string, nvidiaCT
}
logger.Infof("Using WSL driver store paths: %v", driverStorePaths)
return newWSLDriverStoreDiscoverer(logger, driverRoot, nvidiaCTKPath, driverStorePaths)
return newWSLDriverStoreDiscoverer(logger, driverRoot, nvidiaCTKPath, ldconfigPath, driverStorePaths)
}
// newWSLDriverStoreDiscoverer returns a Discoverer for WSL2 drivers in the driver store associated with a dxcore adapter.
func newWSLDriverStoreDiscoverer(logger logger.Interface, driverRoot string, nvidiaCTKPath string, driverStorePaths []string) (discover.Discover, error) {
func newWSLDriverStoreDiscoverer(logger logger.Interface, driverRoot string, nvidiaCTKPath string, ldconfigPath string, driverStorePaths []string) (discover.Discover, error) {
var searchPaths []string
seen := make(map[string]bool)
for _, path := range driverStorePaths {
@@ -92,7 +93,7 @@ func newWSLDriverStoreDiscoverer(logger logger.Interface, driverRoot string, nvi
nvidiaCTKPath: nvidiaCTKPath,
}
ldcacheHook, _ := discover.NewLDCacheUpdateHook(logger, libraries, nvidiaCTKPath)
ldcacheHook, _ := discover.NewLDCacheUpdateHook(logger, libraries, nvidiaCTKPath, ldconfigPath)
d := discover.Merge(
libraries,

View File

@@ -23,7 +23,7 @@ import (
"strings"
"github.com/NVIDIA/go-nvlib/pkg/nvlib/device"
"github.com/NVIDIA/go-nvlib/pkg/nvml"
"github.com/NVIDIA/go-nvml/pkg/nvml"
"tags.cncf.io/container-device-interface/pkg/cdi"
"tags.cncf.io/container-device-interface/specs-go"
@@ -34,23 +34,26 @@ import (
)
// GetGPUDeviceSpecs returns the CDI device specs for the full GPU represented by 'device'.
func (l *nvmllib) GetGPUDeviceSpecs(i int, d device.Device) (*specs.Device, error) {
func (l *nvmllib) GetGPUDeviceSpecs(i int, d device.Device) ([]specs.Device, error) {
edits, err := l.GetGPUDeviceEdits(d)
if err != nil {
return nil, fmt.Errorf("failed to get edits for device: %v", err)
}
name, err := l.deviceNamer.GetDeviceName(i, convert{d})
var deviceSpecs []specs.Device
names, err := l.deviceNamers.GetDeviceNames(i, convert{d})
if err != nil {
return nil, fmt.Errorf("failed to get device name: %v", err)
}
spec := specs.Device{
Name: name,
ContainerEdits: *edits.ContainerEdits,
for _, name := range names {
spec := specs.Device{
Name: name,
ContainerEdits: *edits.ContainerEdits,
}
deviceSpecs = append(deviceSpecs, spec)
}
return &spec, nil
return deviceSpecs, nil
}
// GetGPUDeviceEdits returns the CDI edits for the full GPU represented by 'device'.

View File

@@ -68,7 +68,7 @@ func (l *gdslib) GetGPUDeviceEdits(device.Device) (*cdi.ContainerEdits, error) {
}
// GetGPUDeviceSpecs is unsupported for the gdslib specs
func (l *gdslib) GetGPUDeviceSpecs(int, device.Device) (*specs.Device, error) {
func (l *gdslib) GetGPUDeviceSpecs(int, device.Device) ([]specs.Device, error) {
return nil, fmt.Errorf("GetGPUDeviceSpecs is not supported")
}
@@ -78,6 +78,13 @@ func (l *gdslib) GetMIGDeviceEdits(device.Device, device.MigDevice) (*cdi.Contai
}
// GetMIGDeviceSpecs is unsupported for the gdslib specs
func (l *gdslib) GetMIGDeviceSpecs(int, device.Device, int, device.MigDevice) (*specs.Device, error) {
func (l *gdslib) GetMIGDeviceSpecs(int, device.Device, int, device.MigDevice) ([]specs.Device, error) {
return nil, fmt.Errorf("GetMIGDeviceSpecs is not supported")
}
// GetDeviceSpecsByID returns the CDI device specs for the GPU(s) represented by
// the provided identifiers, where an identifier is an index or UUID of a valid
// GPU device.
func (l *gdslib) GetDeviceSpecsByID(...string) ([]specs.Device, error) {
return nil, fmt.Errorf("GetDeviceSpecsByID is not supported")
}

View File

@@ -45,6 +45,7 @@ func (l *csvlib) GetAllDeviceSpecs() ([]specs.Device, error) {
tegra.WithDriverRoot(l.driverRoot),
tegra.WithDevRoot(l.devRoot),
tegra.WithNVIDIACTKPath(l.nvidiaCTKPath),
tegra.WithLdconfigPath(l.ldconfigPath),
tegra.WithCSVFiles(l.csvFiles),
tegra.WithLibrarySearchPaths(l.librarySearchPaths...),
tegra.WithIngorePatterns(l.csvIgnorePatterns...),
@@ -57,16 +58,20 @@ func (l *csvlib) GetAllDeviceSpecs() ([]specs.Device, error) {
return nil, fmt.Errorf("failed to create container edits for CSV files: %v", err)
}
name, err := l.deviceNamer.GetDeviceName(0, uuidUnsupported{})
names, err := l.deviceNamers.GetDeviceNames(0, uuidIgnored{})
if err != nil {
return nil, fmt.Errorf("failed to get device name: %v", err)
}
deviceSpec := specs.Device{
Name: name,
ContainerEdits: *e.ContainerEdits,
var deviceSpecs []specs.Device
for _, name := range names {
deviceSpec := specs.Device{
Name: name,
ContainerEdits: *e.ContainerEdits,
}
deviceSpecs = append(deviceSpecs, deviceSpec)
}
return []specs.Device{deviceSpec}, nil
return deviceSpecs, nil
}
// GetCommonEdits generates a CDI specification that can be used for ANY devices
@@ -81,7 +86,7 @@ func (l *csvlib) GetGPUDeviceEdits(device.Device) (*cdi.ContainerEdits, error) {
}
// GetGPUDeviceSpecs returns the CDI device specs for the full GPU represented by 'device'.
func (l *csvlib) GetGPUDeviceSpecs(i int, d device.Device) (*specs.Device, error) {
func (l *csvlib) GetGPUDeviceSpecs(i int, d device.Device) ([]specs.Device, error) {
return nil, fmt.Errorf("GetGPUDeviceSpecs is not supported for CSV files")
}
@@ -91,6 +96,13 @@ func (l *csvlib) GetMIGDeviceEdits(device.Device, device.MigDevice) (*cdi.Contai
}
// GetMIGDeviceSpecs returns the CDI device specs for the full MIG represented by 'device'.
func (l *csvlib) GetMIGDeviceSpecs(int, device.Device, int, device.MigDevice) (*specs.Device, error) {
func (l *csvlib) GetMIGDeviceSpecs(int, device.Device, int, device.MigDevice) ([]specs.Device, error) {
return nil, fmt.Errorf("GetMIGDeviceSpecs is not supported for CSV files")
}
// GetDeviceSpecsByID returns the CDI device specs for the GPU(s) represented by
// the provided identifiers, where an identifier is an index or UUID of a valid
// GPU device.
func (l *csvlib) GetDeviceSpecsByID(...string) ([]specs.Device, error) {
return nil, fmt.Errorf("GetDeviceSpecsByID is not supported for CSV files")
}

View File

@@ -18,9 +18,11 @@ package nvcdi
import (
"fmt"
"strconv"
"strings"
"github.com/NVIDIA/go-nvlib/pkg/nvlib/device"
"github.com/NVIDIA/go-nvlib/pkg/nvml"
"github.com/NVIDIA/go-nvml/pkg/nvml"
"tags.cncf.io/container-device-interface/pkg/cdi"
"tags.cncf.io/container-device-interface/specs-go"
@@ -75,14 +77,142 @@ func (l *nvmllib) GetCommonEdits() (*cdi.ContainerEdits, error) {
return edits.FromDiscoverer(common)
}
// GetDeviceSpecsByID returns the CDI device specs for the GPU(s) represented by
// the provided identifiers, where an identifier is an index or UUID of a valid
// GPU device.
func (l *nvmllib) GetDeviceSpecsByID(identifiers ...string) ([]specs.Device, error) {
for _, id := range identifiers {
if id == "all" {
return l.GetAllDeviceSpecs()
}
}
var deviceSpecs []specs.Device
if r := l.nvmllib.Init(); r != nvml.SUCCESS {
return nil, fmt.Errorf("failed to initialize NVML: %w", r)
}
defer func() {
if r := l.nvmllib.Shutdown(); r != nvml.SUCCESS {
l.logger.Warningf("failed to shutdown NVML: %w", r)
}
}()
nvmlDevices, err := l.getNVMLDevicesByID(identifiers...)
if err != nil {
return nil, fmt.Errorf("failed to get NVML device handles: %w", err)
}
for i, nvmlDevice := range nvmlDevices {
deviceEdits, err := l.getEditsForDevice(nvmlDevice)
if err != nil {
return nil, fmt.Errorf("failed to get CDI device edits for identifier %q: %w", identifiers[i], err)
}
deviceSpec := specs.Device{
Name: identifiers[i],
ContainerEdits: *deviceEdits.ContainerEdits,
}
deviceSpecs = append(deviceSpecs, deviceSpec)
}
return deviceSpecs, nil
}
// TODO: move this to go-nvlib?
func (l *nvmllib) getNVMLDevicesByID(identifiers ...string) ([]nvml.Device, error) {
var devices []nvml.Device
for _, id := range identifiers {
dev, err := l.getNVMLDeviceByID(id)
if err != nvml.SUCCESS {
return nil, fmt.Errorf("failed to get NVML device handle for identifier %q: %w", id, err)
}
devices = append(devices, dev)
}
return devices, nil
}
func (l *nvmllib) getNVMLDeviceByID(id string) (nvml.Device, error) {
var err error
devID := device.Identifier(id)
if devID.IsUUID() {
return l.nvmllib.DeviceGetHandleByUUID(id)
}
if devID.IsGpuIndex() {
if idx, err := strconv.Atoi(id); err == nil {
return l.nvmllib.DeviceGetHandleByIndex(idx)
}
return nil, fmt.Errorf("failed to convert device index to an int: %w", err)
}
if devID.IsMigIndex() {
var gpuIdx, migIdx int
var parent nvml.Device
split := strings.SplitN(id, ":", 2)
if gpuIdx, err = strconv.Atoi(split[0]); err != nil {
return nil, fmt.Errorf("failed to convert device index to an int: %w", err)
}
if migIdx, err = strconv.Atoi(split[1]); err != nil {
return nil, fmt.Errorf("failed to convert device index to an int: %w", err)
}
if parent, err = l.nvmllib.DeviceGetHandleByIndex(gpuIdx); err != nvml.SUCCESS {
return nil, fmt.Errorf("failed to get parent device handle: %w", err)
}
return parent.GetMigDeviceHandleByIndex(migIdx)
}
return nil, fmt.Errorf("identifier is not a valid UUID or index: %q", id)
}
func (l *nvmllib) getEditsForDevice(nvmlDevice nvml.Device) (*cdi.ContainerEdits, error) {
mig, err := nvmlDevice.IsMigDeviceHandle()
if err != nvml.SUCCESS {
return nil, fmt.Errorf("failed to determine if device handle is a MIG device: %w", err)
}
if mig {
return l.getEditsForMIGDevice(nvmlDevice)
}
return l.getEditsForGPUDevice(nvmlDevice)
}
func (l *nvmllib) getEditsForGPUDevice(nvmlDevice nvml.Device) (*cdi.ContainerEdits, error) {
nvlibDevice, err := l.devicelib.NewDevice(nvmlDevice)
if err != nil {
return nil, fmt.Errorf("failed to construct device: %w", err)
}
deviceEdits, err := l.GetGPUDeviceEdits(nvlibDevice)
if err != nil {
return nil, fmt.Errorf("failed to get GPU device edits: %w", err)
}
return deviceEdits, nil
}
func (l *nvmllib) getEditsForMIGDevice(nvmlDevice nvml.Device) (*cdi.ContainerEdits, error) {
nvmlParentDevice, ret := nvmlDevice.GetDeviceHandleFromMigDeviceHandle()
if ret != nvml.SUCCESS {
return nil, fmt.Errorf("failed to get parent device handle: %w", ret)
}
nvlibMigDevice, err := l.devicelib.NewMigDevice(nvmlDevice)
if err != nil {
return nil, fmt.Errorf("failed to construct device: %w", err)
}
nvlibParentDevice, err := l.devicelib.NewDevice(nvmlParentDevice)
if err != nil {
return nil, fmt.Errorf("failed to construct parent device: %w", err)
}
return l.GetMIGDeviceEdits(nvlibParentDevice, nvlibMigDevice)
}
func (l *nvmllib) getGPUDeviceSpecs() ([]specs.Device, error) {
var deviceSpecs []specs.Device
err := l.devicelib.VisitDevices(func(i int, d device.Device) error {
deviceSpec, err := l.GetGPUDeviceSpecs(i, d)
specsForDevice, err := l.GetGPUDeviceSpecs(i, d)
if err != nil {
return err
}
deviceSpecs = append(deviceSpecs, *deviceSpec)
deviceSpecs = append(deviceSpecs, specsForDevice...)
return nil
})
@@ -95,11 +225,11 @@ func (l *nvmllib) getGPUDeviceSpecs() ([]specs.Device, error) {
func (l *nvmllib) getMigDeviceSpecs() ([]specs.Device, error) {
var deviceSpecs []specs.Device
err := l.devicelib.VisitMigDevices(func(i int, d device.Device, j int, mig device.MigDevice) error {
deviceSpec, err := l.GetMIGDeviceSpecs(i, d, j, mig)
specsForDevice, err := l.GetMIGDeviceSpecs(i, d, j, mig)
if err != nil {
return err
}
deviceSpecs = append(deviceSpecs, *deviceSpec)
deviceSpecs = append(deviceSpecs, specsForDevice...)
return nil
})

View File

@@ -54,7 +54,7 @@ func (l *wsllib) GetAllDeviceSpecs() ([]specs.Device, error) {
// GetCommonEdits generates a CDI specification that can be used for ANY devices
func (l *wsllib) GetCommonEdits() (*cdi.ContainerEdits, error) {
driver, err := newWSLDriverDiscoverer(l.logger, l.driverRoot, l.nvidiaCTKPath)
driver, err := newWSLDriverDiscoverer(l.logger, l.driverRoot, l.nvidiaCTKPath, l.ldconfigPath)
if err != nil {
return nil, fmt.Errorf("failed to create discoverer for WSL driver: %v", err)
}
@@ -68,7 +68,7 @@ func (l *wsllib) GetGPUDeviceEdits(device.Device) (*cdi.ContainerEdits, error) {
}
// GetGPUDeviceSpecs returns the CDI device specs for the full GPU represented by 'device'.
func (l *wsllib) GetGPUDeviceSpecs(i int, d device.Device) (*specs.Device, error) {
func (l *wsllib) GetGPUDeviceSpecs(i int, d device.Device) ([]specs.Device, error) {
return nil, fmt.Errorf("GetGPUDeviceSpecs is not supported on WSL")
}
@@ -78,6 +78,13 @@ func (l *wsllib) GetMIGDeviceEdits(device.Device, device.MigDevice) (*cdi.Contai
}
// GetMIGDeviceSpecs returns the CDI device specs for the full MIG represented by 'device'.
func (l *wsllib) GetMIGDeviceSpecs(int, device.Device, int, device.MigDevice) (*specs.Device, error) {
func (l *wsllib) GetMIGDeviceSpecs(int, device.Device, int, device.MigDevice) ([]specs.Device, error) {
return nil, fmt.Errorf("GetMIGDeviceSpecs is not supported on WSL")
}
// GetDeviceSpecsByID returns the CDI device specs for the GPU(s) represented by
// the provided identifiers, where an identifier is an index or UUID of a valid
// GPU device.
func (l *wsllib) GetDeviceSpecsByID(...string) ([]specs.Device, error) {
return nil, fmt.Errorf("GetDeviceSpecsByID is not supported on WSL")
}

View File

@@ -21,7 +21,8 @@ import (
"github.com/NVIDIA/go-nvlib/pkg/nvlib/device"
"github.com/NVIDIA/go-nvlib/pkg/nvlib/info"
"github.com/NVIDIA/go-nvlib/pkg/nvml"
"github.com/NVIDIA/go-nvml/pkg/nvml"
"tags.cncf.io/container-device-interface/pkg/cdi"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup/root"
@@ -44,10 +45,12 @@ type nvcdilib struct {
nvmllib nvml.Interface
mode string
devicelib device.Interface
deviceNamer DeviceNamer
deviceNamers DeviceNamers
driverRoot string
devRoot string
nvidiaCTKPath string
ldconfigPath string
configSearchPaths []string
librarySearchPaths []string
csvFiles []string
@@ -74,8 +77,9 @@ func New(opts ...Option) (Interface, error) {
if l.logger == nil {
l.logger = logger.New()
}
if l.deviceNamer == nil {
l.deviceNamer, _ = NewDeviceNamer(DeviceNameStrategyIndex)
if len(l.deviceNamers) == 0 {
indexNamer, _ := NewDeviceNamer(DeviceNameStrategyIndex)
l.deviceNamers = []DeviceNamer{indexNamer}
}
if l.driverRoot == "" {
l.driverRoot = "/"
@@ -90,8 +94,11 @@ func New(opts ...Option) (Interface, error) {
l.infolib = info.New()
}
// TODO: We need to improve the construction of this driver root.
l.driver = root.New(l.logger, l.driverRoot, l.librarySearchPaths)
l.driver = root.New(
root.WithLogger(l.logger),
root.WithDriverRoot(l.driverRoot),
root.WithLibrarySearchPaths(l.librarySearchPaths...),
)
var lib Interface
switch l.resolveMode() {
@@ -160,6 +167,17 @@ func (l *wrapper) GetSpec() (spec.Interface, error) {
)
}
// GetCommonEdits returns the wrapped edits and adds additional edits on top.
func (m *wrapper) GetCommonEdits() (*cdi.ContainerEdits, error) {
edits, err := m.Interface.GetCommonEdits()
if err != nil {
return nil, err
}
edits.Env = append(edits.Env, "NVIDIA_VISIBLE_DEVICES=void")
return edits, nil
}
// resolveMode resolves the mode for CDI spec generation based on the current system.
func (l *nvcdilib) resolveMode() (rmode string) {
if l.mode != ModeAuto {
@@ -179,7 +197,7 @@ func (l *nvcdilib) resolveMode() (rmode string) {
isNvml, reason := l.infolib.HasNvml()
l.logger.Debugf("Is NVML-based system? %v: %v", isNvml, reason)
isTegra, reason := l.infolib.IsTegraSystem()
isTegra, reason := l.infolib.HasTegraFiles()
l.logger.Debugf("Is Tegra-based system? %v: %v", isTegra, reason)
if isTegra && !isNvml {

View File

@@ -20,6 +20,7 @@ import (
"fmt"
"testing"
"github.com/NVIDIA/go-nvlib/pkg/nvlib/info"
testlog "github.com/sirupsen/logrus/hooks/test"
"github.com/stretchr/testify/require"
)
@@ -87,9 +88,15 @@ func TestResolveMode(t *testing.T) {
for i, tc := range testCases {
t.Run(fmt.Sprintf("test case %d", i), func(t *testing.T) {
l := nvcdilib{
logger: logger,
mode: tc.mode,
infolib: infoMock{hasDXCore: tc.hasDXCore, isTegra: tc.isTegra, hasNVML: tc.hasNVML},
logger: logger,
mode: tc.mode,
infolib: infoMock{
PropertyExtractor: &info.PropertyExtractorMock{
HasDXCoreFunc: func() (bool, string) { return tc.hasDXCore, "" },
HasNvmlFunc: func() (bool, string) { return tc.hasNVML, "" },
HasTegraFilesFunc: func() (bool, string) { return tc.isTegra, "" },
},
},
}
require.Equal(t, tc.expected, l.resolveMode())
@@ -98,19 +105,9 @@ func TestResolveMode(t *testing.T) {
}
type infoMock struct {
hasDXCore bool
isTegra bool
hasNVML bool
info.PropertyExtractor
}
func (i infoMock) HasDXCore() (bool, string) {
return i.hasDXCore, ""
}
func (i infoMock) HasNvml() (bool, string) {
return i.hasNVML, ""
}
func (i infoMock) IsTegraSystem() (bool, string) {
return i.isTegra, ""
func (i infoMock) ResolvePlatform() info.Platform {
return "not-implemented"
}

View File

@@ -66,7 +66,7 @@ func (m *managementlib) GetCommonEdits() (*cdi.ContainerEdits, error) {
return nil, fmt.Errorf("failed to get CUDA version: %v", err)
}
driver, err := newDriverVersionDiscoverer(m.logger, m.driver, m.nvidiaCTKPath, version)
driver, err := newDriverVersionDiscoverer(m.logger, m.driver, m.nvidiaCTKPath, m.ldconfigPath, version)
if err != nil {
return nil, fmt.Errorf("failed to create driver library discoverer: %v", err)
}
@@ -175,7 +175,7 @@ func (m *managementlib) GetGPUDeviceEdits(device.Device) (*cdi.ContainerEdits, e
}
// GetGPUDeviceSpecs is unsupported for the managementlib specs
func (m *managementlib) GetGPUDeviceSpecs(int, device.Device) (*specs.Device, error) {
func (m *managementlib) GetGPUDeviceSpecs(int, device.Device) ([]specs.Device, error) {
return nil, fmt.Errorf("GetGPUDeviceSpecs is not supported")
}
@@ -185,6 +185,13 @@ func (m *managementlib) GetMIGDeviceEdits(device.Device, device.MigDevice) (*cdi
}
// GetMIGDeviceSpecs is unsupported for the managementlib specs
func (m *managementlib) GetMIGDeviceSpecs(int, device.Device, int, device.MigDevice) (*specs.Device, error) {
func (m *managementlib) GetMIGDeviceSpecs(int, device.Device, int, device.MigDevice) ([]specs.Device, error) {
return nil, fmt.Errorf("GetMIGDeviceSpecs is not supported")
}
// GetDeviceSpecsByID returns the CDI device specs for the GPU(s) represented by
// the provided identifiers, where an identifier is an index or UUID of a valid
// GPU device.
func (l *managementlib) GetDeviceSpecsByID(...string) ([]specs.Device, error) {
return nil, fmt.Errorf("GetDeviceSpecsByID is not supported")
}

View File

@@ -20,7 +20,7 @@ import (
"fmt"
"github.com/NVIDIA/go-nvlib/pkg/nvlib/device"
"github.com/NVIDIA/go-nvlib/pkg/nvml"
"github.com/NVIDIA/go-nvml/pkg/nvml"
"tags.cncf.io/container-device-interface/pkg/cdi"
"tags.cncf.io/container-device-interface/specs-go"
@@ -31,23 +31,25 @@ import (
)
// GetMIGDeviceSpecs returns the CDI device specs for the full GPU represented by 'device'.
func (l *nvmllib) GetMIGDeviceSpecs(i int, d device.Device, j int, mig device.MigDevice) (*specs.Device, error) {
func (l *nvmllib) GetMIGDeviceSpecs(i int, d device.Device, j int, mig device.MigDevice) ([]specs.Device, error) {
edits, err := l.GetMIGDeviceEdits(d, mig)
if err != nil {
return nil, fmt.Errorf("failed to get edits for device: %v", err)
}
name, err := l.deviceNamer.GetMigDeviceName(i, convert{d}, j, convert{mig})
names, err := l.deviceNamers.GetMigDeviceNames(i, convert{d}, j, convert{mig})
if err != nil {
return nil, fmt.Errorf("failed to get device name: %v", err)
}
spec := specs.Device{
Name: name,
ContainerEdits: *edits.ContainerEdits,
var deviceSpecs []specs.Device
for _, name := range names {
spec := specs.Device{
Name: name,
ContainerEdits: *edits.ContainerEdits,
}
deviceSpecs = append(deviceSpecs, spec)
}
return &spec, nil
return deviceSpecs, nil
}
// GetMIGDeviceEdits returns the CDI edits for the MIG device represented by 'mig' on 'parent'.
@@ -67,7 +69,7 @@ func (l *nvmllib) GetMIGDeviceEdits(parent device.Device, mig device.MigDevice)
return nil, fmt.Errorf("error getting Compute Instance ID: %v", ret)
}
editsForDevice, err := GetEditsForComputeInstance(l.logger, l.driverRoot, gpu, gi, ci)
editsForDevice, err := l.GetEditsForComputeInstance(gpu, gi, ci)
if err != nil {
return nil, fmt.Errorf("failed to create container edits for MIG device: %v", err)
}
@@ -76,8 +78,8 @@ func (l *nvmllib) GetMIGDeviceEdits(parent device.Device, mig device.MigDevice)
}
// GetEditsForComputeInstance returns the CDI edits for a particular compute instance defined by the (gpu, gi, ci) tuple
func GetEditsForComputeInstance(logger logger.Interface, driverRoot string, gpu int, gi int, ci int) (*cdi.ContainerEdits, error) {
computeInstance, err := newComputeInstanceDiscoverer(logger, driverRoot, gpu, gi, ci)
func (l *nvmllib) GetEditsForComputeInstance(gpu int, gi int, ci int) (*cdi.ContainerEdits, error) {
computeInstance, err := newComputeInstanceDiscoverer(l.logger, l.devRoot, gpu, gi, ci)
if err != nil {
return nil, fmt.Errorf("failed to create discoverer for Compute Instance: %v", err)
}
@@ -91,7 +93,7 @@ func GetEditsForComputeInstance(logger logger.Interface, driverRoot string, gpu
}
// newComputeInstanceDiscoverer returns a discoverer for the specified compute instance
func newComputeInstanceDiscoverer(logger logger.Interface, driverRoot string, gpu int, gi int, ci int) (discover.Discover, error) {
func newComputeInstanceDiscoverer(logger logger.Interface, devRoot string, gpu int, gi int, ci int) (discover.Discover, error) {
parentPath := fmt.Sprintf("/dev/nvidia%d", gpu)
migCaps, err := nvcaps.NewMigCaps()
@@ -113,7 +115,7 @@ func newComputeInstanceDiscoverer(logger logger.Interface, driverRoot string, gp
deviceNodes := discover.NewCharDeviceDiscoverer(
logger,
driverRoot,
devRoot,
[]string{
parentPath,
giCapDevicePath,

View File

@@ -68,7 +68,7 @@ func (l *mofedlib) GetGPUDeviceEdits(device.Device) (*cdi.ContainerEdits, error)
}
// GetGPUDeviceSpecs is unsupported for the mofedlib specs
func (l *mofedlib) GetGPUDeviceSpecs(int, device.Device) (*specs.Device, error) {
func (l *mofedlib) GetGPUDeviceSpecs(int, device.Device) ([]specs.Device, error) {
return nil, fmt.Errorf("GetGPUDeviceSpecs is not supported")
}
@@ -78,6 +78,13 @@ func (l *mofedlib) GetMIGDeviceEdits(device.Device, device.MigDevice) (*cdi.Cont
}
// GetMIGDeviceSpecs is unsupported for the mofedlib specs
func (l *mofedlib) GetMIGDeviceSpecs(int, device.Device, int, device.MigDevice) (*specs.Device, error) {
func (l *mofedlib) GetMIGDeviceSpecs(int, device.Device, int, device.MigDevice) ([]specs.Device, error) {
return nil, fmt.Errorf("GetMIGDeviceSpecs is not supported")
}
// GetDeviceSpecsByID returns the CDI device specs for the GPU(s) represented by
// the provided identifiers, where an identifier is an index or UUID of a valid
// GPU device.
func (l *mofedlib) GetDeviceSpecsByID(...string) ([]specs.Device, error) {
return nil, fmt.Errorf("GetDeviceSpecsByID is not supported")
}

View File

@@ -20,7 +20,7 @@ import (
"errors"
"fmt"
"github.com/NVIDIA/go-nvlib/pkg/nvml"
"github.com/NVIDIA/go-nvml/pkg/nvml"
)
// UUIDer is an interface for getting UUIDs.
@@ -28,6 +28,9 @@ type UUIDer interface {
GetUUID() (string, error)
}
// DeviceNamers represents a list of device namers
type DeviceNamers []DeviceNamer
// DeviceNamer is an interface for getting device names
type DeviceNamer interface {
GetDeviceName(int, UUIDer) (string, error)
@@ -102,6 +105,12 @@ type convert struct {
nvmlUUIDer
}
type uuidIgnored struct{}
func (m uuidIgnored) GetUUID() (string, error) {
return "", nil
}
type uuidUnsupported struct{}
func (m convert) GetUUID() (string, error) {
@@ -120,3 +129,39 @@ var errUUIDUnsupported = errors.New("GetUUID is not supported")
func (m uuidUnsupported) GetUUID() (string, error) {
return "", errUUIDUnsupported
}
func (l DeviceNamers) GetDeviceNames(i int, d UUIDer) ([]string, error) {
var names []string
for _, namer := range l {
name, err := namer.GetDeviceName(i, d)
if err != nil {
return nil, err
}
if name == "" {
continue
}
names = append(names, name)
}
if len(names) == 0 {
return nil, errors.New("no names defined")
}
return names, nil
}
func (l DeviceNamers) GetMigDeviceNames(i int, d UUIDer, j int, mig UUIDer) ([]string, error) {
var names []string
for _, namer := range l {
name, err := namer.GetMigDeviceName(i, d, j, mig)
if err != nil {
return nil, err
}
if name == "" {
continue
}
names = append(names, name)
}
if len(names) == 0 {
return nil, errors.New("no names defined")
}
return names, nil
}

View File

@@ -6,7 +6,7 @@ package nvcdi
import (
"sync"
"github.com/NVIDIA/go-nvlib/pkg/nvml"
"github.com/NVIDIA/go-nvml/pkg/nvml"
)
// Ensure, that nvmlUUIDerMock does implement nvmlUUIDer.

View File

@@ -19,7 +19,7 @@ package nvcdi
import (
"testing"
"github.com/NVIDIA/go-nvlib/pkg/nvml"
"github.com/NVIDIA/go-nvml/pkg/nvml"
"github.com/stretchr/testify/require"
)

View File

@@ -18,7 +18,7 @@ package nvcdi
import (
"github.com/NVIDIA/go-nvlib/pkg/nvlib/device"
"github.com/NVIDIA/go-nvlib/pkg/nvml"
"github.com/NVIDIA/go-nvml/pkg/nvml"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/nvcdi/transform"
@@ -34,10 +34,10 @@ func WithDeviceLib(devicelib device.Interface) Option {
}
}
// WithDeviceNamer sets the device namer for the library
func WithDeviceNamer(namer DeviceNamer) Option {
// WithDeviceNamers sets the device namer for the library
func WithDeviceNamers(namers ...DeviceNamer) Option {
return func(l *nvcdilib) {
l.deviceNamer = namer
l.deviceNamers = namers
}
}
@@ -69,6 +69,13 @@ func WithNVIDIACTKPath(path string) Option {
}
}
// WithLdconfigPath sets the path to the ldconfig program
func WithLdconfigPath(path string) Option {
return func(l *nvcdilib) {
l.ldconfigPath = path
}
}
// WithNvmlLib sets the nvml library for the library
func WithNvmlLib(nvmllib nvml.Interface) Option {
return func(l *nvcdilib) {
@@ -119,6 +126,13 @@ func WithCSVIgnorePatterns(csvIgnorePatterns []string) Option {
}
}
// WithConfigSearchPaths sets the search paths for config files.
func WithConfigSearchPaths(paths []string) Option {
return func(o *nvcdilib) {
o.configSearchPaths = paths
}
}
// WithLibrarySearchPaths sets the library search paths.
// This is currently only used for CSV-mode.
func WithLibrarySearchPaths(paths []string) Option {

View File

@@ -47,12 +47,11 @@ func (s *spec) Save(path string) error {
}
specDir := filepath.Dir(path)
registry := cdi.GetRegistry(
cache, _ := cdi.NewCache(
cdi.WithAutoRefresh(false),
cdi.WithSpecDirs(specDir),
)
if err := registry.SpecDB().WriteSpec(s.Raw(), filepath.Base(path)); err != nil {
if err := cache.WriteSpec(s.Raw(), filepath.Base(path)); err != nil {
return fmt.Errorf("failed to write spec: %w", err)
}

View File

@@ -1,3 +1,7 @@
FROM quay.io/centos/centos:stream8
RUN sed -i -e "s|mirrorlist=|#mirrorlist=|g" \
-e "s|#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g" \
/etc/yum.repos.d/CentOS-Stream-*
RUN yum install -y createrepo rpm-sign pinentry

View File

@@ -45,8 +45,6 @@ echo "Building ${TARGET} for all packages to ${DIST_DIR}"
: "${LIBNVIDIA_CONTAINER_ROOT:=${PROJECT_ROOT}/third_party/libnvidia-container}"
: "${NVIDIA_CONTAINER_TOOLKIT_ROOT:=${PROJECT_ROOT}}"
: "${NVIDIA_CONTAINER_RUNTIME_ROOT:=${PROJECT_ROOT}/third_party/nvidia-container-runtime}"
: "${NVIDIA_DOCKER_ROOT:=${PROJECT_ROOT}/third_party/nvidia-docker}"
"${SCRIPTS_DIR}/get-component-versions.sh"
@@ -70,57 +68,3 @@ make -C "${NVIDIA_CONTAINER_TOOLKIT_ROOT}" \
LIBNVIDIA_CONTAINER_TAG="${NVIDIA_CONTAINER_TOOLKIT_TAG}" \
"${TARGET}"
fi
# If required we also build the nvidia-container-runtime and nvidia-docker packages.
# Since these are essentially meta packages intended to allow for users to
# transition from older installation workflows, we skip these for rc builds
# (NVIDIA_CONTAINER_TOOLKIT_TAG != "") and releases with a non-zero patch
# version of 0.
if [[ -n ${FORCE_META_PACKAGES} || -z ${NVIDIA_CONTAINER_TOOLKIT_TAG} && "${NVIDIA_CONTAINER_TOOLKIT_VERSION%.0}" != "${NVIDIA_CONTAINER_TOOLKIT_VERSION}" ]]; then
package_format=$(package_type ${TARGET})
package_target=$(get_package_target ${TARGET})
# We set the TOOLKIT_VERSION, TOOLKIT_TAG for the nvidia-container-runtime and nvidia-docker targets
# The LIB_TAG is also overridden to match the TOOLKIT_TAG.
# Build nvidia-container-runtime if required
package_name="nvidia-container-runtime"
package_version=${NVIDIA_CONTAINER_RUNTIME_VERSION}${NVIDIA_CONTAINER_TOOLKIT_TAG:+~${NVIDIA_CONTAINER_TOOLKIT_TAG}}-1
package_pattern=${DIST_DIR}/${package_format}/all/${package_name}?${package_version}?*.${package_format}
package=$(ls ${package_pattern}) || echo ""
if [[ -z ${package} ]]; then
echo "${package_name} does not exist"
make -C ${NVIDIA_CONTAINER_RUNTIME_ROOT} \
LIB_VERSION="${NVIDIA_CONTAINER_RUNTIME_VERSION}" \
LIB_TAG="${NVIDIA_CONTAINER_TOOLKIT_TAG}" \
TOOLKIT_VERSION="${NVIDIA_CONTAINER_TOOLKIT_VERSION}" \
TOOLKIT_TAG="${NVIDIA_CONTAINER_TOOLKIT_TAG}" \
${TARGET}
fi
if [[ -n ${package_target} ]]; then
mkdir -p ${DIST_DIR}/${package_target}/
cp -p ${package_pattern} ${DIST_DIR}/${package_target}/
fi
# Build nvidia-docker2 if required
package_name="nvidia-docker2"
package_version=${NVIDIA_DOCKER_VERSION}${NVIDIA_CONTAINER_TOOLKIT_TAG:+~${NVIDIA_CONTAINER_TOOLKIT_TAG}}-1
package_pattern=${DIST_DIR}/${package_format}/all/${package_name}?${package_version}?*.${package_format}
package=$(ls ${package_pattern}) || echo ""
if [[ -z ${package} ]]; then
echo "${package_name} does not exist"
make -C ${NVIDIA_DOCKER_ROOT} \
LIB_VERSION="${NVIDIA_DOCKER_VERSION}" \
LIB_TAG="${NVIDIA_CONTAINER_TOOLKIT_TAG}" \
TOOLKIT_VERSION="${NVIDIA_CONTAINER_TOOLKIT_VERSION}" \
TOOLKIT_TAG="${NVIDIA_CONTAINER_TOOLKIT_TAG}" \
${TARGET}
fi
if [[ -n ${package_target} ]]; then
mkdir -p ${DIST_DIR}/${package_target}/
cp -p ${package_pattern} ${DIST_DIR}/${package_target}/
fi
else
echo "Skipping nvidia-container-runtime and nvidia-docker builds."
fi

Some files were not shown because too many files have changed in this diff Show More