Compare commits

...

1381 Commits

Author SHA1 Message Date
Evan Lezar
be25223e7a [no-relnote] Refactor driver library discovery
This change aligns the driver file discovery with device discovery
and allows other sources such as nvsandboxutils to be added.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-20 23:14:20 +02:00
Evan Lezar
de230a7e60 [no-relnote] Refactor CDI version extraction
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-20 23:14:13 +02:00
Evan Lezar
6746a412af [no-relnote] Move nvcdi wrapper to separate file
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-20 23:12:29 +02:00
Evan Lezar
5d9b27f107 Merge pull request #925 from NVIDIA/dependabot/docker/deployments/devel/main/golang-1.24.0
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Bump golang from 1.23.6 to 1.24.0 in /deployments/devel
2025-02-17 17:27:25 +01:00
Evan Lezar
767aee913d Merge pull request #922 from NVIDIA/dependabot/go_modules/deployments/devel/main/github.com/matryer/moq-0.5.3
Bump github.com/matryer/moq from 0.5.2 to 0.5.3 in /deployments/devel
2025-02-17 11:27:53 +01:00
dependabot[bot]
25bac4c62a Bump golang from 1.23.6 to 1.24.0 in /deployments/devel
Bumps golang from 1.23.6 to 1.24.0.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-15 08:33:48 +00:00
dependabot[bot]
7e371aa024 Bump github.com/matryer/moq from 0.5.2 to 0.5.3 in /deployments/devel
Bumps [github.com/matryer/moq](https://github.com/matryer/moq) from 0.5.2 to 0.5.3.
- [Release notes](https://github.com/matryer/moq/releases)
- [Changelog](https://github.com/matryer/moq/blob/main/.goreleaser.yml)
- [Commits](https://github.com/matryer/moq/compare/v0.5.2...v0.5.3)

---
updated-dependencies:
- dependency-name: github.com/matryer/moq
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-02-15 09:29:32 +01:00
Evan Lezar
bc2b937574 Merge pull request #928 from NVIDIA/dependabot/go_modules/deployments/devel/main/github.com/golangci/golangci-lint-1.64.5
Some checks failed
CI Pipeline / code-scanning (push) Has been cancelled
CI Pipeline / variables (push) Has been cancelled
CI Pipeline / golang (push) Has been cancelled
CI Pipeline / image (push) Has been cancelled
CI Pipeline / e2e-test (push) Has been cancelled
Bump github.com/golangci/golangci-lint from 1.64.2 to 1.64.5 in /deployments/devel
2025-02-14 16:18:07 +01:00
dependabot[bot]
ae8561b509 Bump github.com/golangci/golangci-lint in /deployments/devel
Bumps [github.com/golangci/golangci-lint](https://github.com/golangci/golangci-lint) from 1.64.2 to 1.64.5.
- [Release notes](https://github.com/golangci/golangci-lint/releases)
- [Changelog](https://github.com/golangci/golangci-lint/blob/master/CHANGELOG.md)
- [Commits](https://github.com/golangci/golangci-lint/compare/v1.64.2...v1.64.5)

---
updated-dependencies:
- dependency-name: github.com/golangci/golangci-lint
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-14 08:25:58 +00:00
Evan Lezar
64633903c7 Merge pull request #926 from ArangoGutierrez/slack_botv2
Some checks are pending
CI Pipeline / code-scanning (push) Waiting to run
CI Pipeline / variables (push) Waiting to run
CI Pipeline / golang (push) Waiting to run
CI Pipeline / image (push) Blocked by required conditions
CI Pipeline / e2e-test (push) Blocked by required conditions
[no-relnote] Update Slack notification action step
2025-02-13 17:33:05 +01:00
Carlos Eduardo Arango Gutierrez
1273f879cc [no-relnote] Update Slack notification action step
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-02-13 14:16:44 +01:00
Evan Lezar
8713a173fe Merge pull request #921 from NVIDIA/dependabot/go_modules/main/github.com/NVIDIA/go-nvlib-0.7.1
Some checks are pending
CI Pipeline / code-scanning (push) Waiting to run
CI Pipeline / variables (push) Waiting to run
CI Pipeline / golang (push) Waiting to run
CI Pipeline / image (push) Blocked by required conditions
CI Pipeline / e2e-test (push) Blocked by required conditions
Bump github.com/NVIDIA/go-nvlib from 0.7.0 to 0.7.1
2025-02-13 14:08:55 +01:00
Evan Lezar
e6d99934f8 Merge pull request #924 from NVIDIA/dependabot/github_actions/main/NVIDIA/holodeck-0.2.6
Bump NVIDIA/holodeck from 0.2.5 to 0.2.6
2025-02-13 14:06:59 +01:00
dependabot[bot]
0cf2d60406 Bump NVIDIA/holodeck from 0.2.5 to 0.2.6
Bumps [NVIDIA/holodeck](https://github.com/nvidia/holodeck) from 0.2.5 to 0.2.6.
- [Release notes](https://github.com/nvidia/holodeck/releases)
- [Commits](https://github.com/nvidia/holodeck/compare/v0.2.5...v0.2.6)

---
updated-dependencies:
- dependency-name: NVIDIA/holodeck
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-13 08:39:07 +00:00
dependabot[bot]
0e759d4ad8 Bump github.com/NVIDIA/go-nvlib from 0.7.0 to 0.7.1
Bumps [github.com/NVIDIA/go-nvlib](https://github.com/NVIDIA/go-nvlib) from 0.7.0 to 0.7.1.
- [Release notes](https://github.com/NVIDIA/go-nvlib/releases)
- [Commits](https://github.com/NVIDIA/go-nvlib/compare/v0.7.0...v0.7.1)

---
updated-dependencies:
- dependency-name: github.com/NVIDIA/go-nvlib
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-13 08:33:08 +00:00
Evan Lezar
9e85fb54fc Merge pull request #903 from NVIDIA/dependabot/github_actions/main/slackapi/slack-github-action-2.0.0
Some checks are pending
CI Pipeline / code-scanning (push) Waiting to run
CI Pipeline / variables (push) Waiting to run
CI Pipeline / golang (push) Waiting to run
CI Pipeline / image (push) Blocked by required conditions
CI Pipeline / e2e-test (push) Blocked by required conditions
Bump slackapi/slack-github-action from 1.27.0 to 2.0.0
2025-02-12 17:21:01 +01:00
dependabot[bot]
ce79d01cf5 Bump slackapi/slack-github-action from 1.27.0 to 2.0.0
Bumps [slackapi/slack-github-action](https://github.com/slackapi/slack-github-action) from 1.27.0 to 2.0.0.
- [Release notes](https://github.com/slackapi/slack-github-action/releases)
- [Commits](https://github.com/slackapi/slack-github-action/compare/v1.27.0...v2.0.0)

---
updated-dependencies:
- dependency-name: slackapi/slack-github-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-12 15:14:20 +00:00
Evan Lezar
5d27489d18 Merge pull request #890 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.30.0
Some checks are pending
CI Pipeline / code-scanning (push) Waiting to run
CI Pipeline / variables (push) Waiting to run
CI Pipeline / golang (push) Waiting to run
CI Pipeline / image (push) Blocked by required conditions
CI Pipeline / e2e-test (push) Blocked by required conditions
Bump golang.org/x/sys from 0.29.0 to 0.30.0
2025-02-12 13:12:50 +01:00
Evan Lezar
d80e8b9733 Merge pull request #904 from NVIDIA/dependabot/docker/deployments/devel/main/golang-1.23.6
Bump golang from 1.23.4 to 1.23.6 in /deployments/devel
2025-02-12 13:12:25 +01:00
Evan Lezar
a72b9cd12c Merge pull request #919 from NVIDIA/dependabot/go_modules/deployments/devel/main/github.com/matryer/moq-0.5.2
Bump github.com/matryer/moq from 0.5.1 to 0.5.2 in /deployments/devel
2025-02-12 13:12:03 +01:00
dependabot[bot]
c2f4ac0959 Bump github.com/matryer/moq from 0.5.1 to 0.5.2 in /deployments/devel
Bumps [github.com/matryer/moq](https://github.com/matryer/moq) from 0.5.1 to 0.5.2.
- [Release notes](https://github.com/matryer/moq/releases)
- [Changelog](https://github.com/matryer/moq/blob/main/.goreleaser.yml)
- [Commits](https://github.com/matryer/moq/compare/v0.5.1...v0.5.2)

---
updated-dependencies:
- dependency-name: github.com/matryer/moq
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-12 11:47:17 +00:00
Carlos Eduardo Arango Gutierrez
5db5ddb31b Merge pull request #915 from ArangoGutierrez/e2e_tests02
[no-relnote] Configure all GitHub actions as reusable workflow
2025-02-12 12:10:53 +01:00
dependabot[bot]
ff1897f2e4 Bump golang.org/x/sys from 0.29.0 to 0.30.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.29.0 to 0.30.0.
- [Commits](https://github.com/golang/sys/compare/v0.29.0...v0.30.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-12 10:24:31 +00:00
Evan Lezar
88f0157cae Merge pull request #920 from NVIDIA/dependabot/go_modules/deployments/devel/main/github.com/golangci/golangci-lint-1.64.2
Bump github.com/golangci/golangci-lint from 1.63.4 to 1.64.2 in /deployments/devel
2025-02-12 11:23:25 +01:00
Carlos Eduardo Arango Gutierrez
27f1738c3d [no-relnote] COnfigure all GitHub actions as reusable workflow
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-02-12 11:05:57 +01:00
Evan Lezar
f52df9522e [no-relnote] Remove toolchain from deployments/devel go.mod
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-12 10:57:20 +01:00
Evan Lezar
edf79d6e8b [no-relnote] Update vendor make target to handle submodules
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-12 10:56:56 +01:00
dependabot[bot]
6aadd6ed21 Bump github.com/golangci/golangci-lint in /deployments/devel
Bumps [github.com/golangci/golangci-lint](https://github.com/golangci/golangci-lint) from 1.63.4 to 1.64.2.
- [Release notes](https://github.com/golangci/golangci-lint/releases)
- [Changelog](https://github.com/golangci/golangci-lint/blob/master/CHANGELOG.md)
- [Commits](https://github.com/golangci/golangci-lint/compare/v1.63.4...v1.64.2)

---
updated-dependencies:
- dependency-name: github.com/golangci/golangci-lint
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-12 08:50:19 +00:00
Evan Lezar
a1d60910b6 Merge pull request #912 from NVIDIA/dependabot/go_modules/tests/main/golang.org/x/crypto-0.33.0
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos7-aarch64) (push) Has been cancelled
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos7-x86_64) (push) Has been cancelled
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Has been cancelled
image / image (ubi8, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Has been cancelled
image / image (ubuntu20.04, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Has been cancelled
image / End-to-end Tests (push) Has been cancelled
Bump golang.org/x/crypto from 0.31.0 to 0.33.0 in /tests
2025-02-10 19:28:11 +01:00
Evan Lezar
03f5a09aec Merge pull request #891 from NVIDIA/dependabot/go_modules/main/golang.org/x/mod-0.23.0
Bump golang.org/x/mod from 0.22.0 to 0.23.0
2025-02-10 19:27:33 +01:00
Evan Lezar
1f2232fc5f Merge pull request #894 from elezar/remove-nvidia-container-runtime-hook-in-cdi-mode
Allow cdi mode to work with --gpus flag
2025-02-10 19:19:33 +01:00
Evan Lezar
b19f5d8f7d Merge pull request #914 from elezar/bump-version-v1.17.4-main
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos7-aarch64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos7-x86_64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos8-ppc64le) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
image / image (ubi8, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
image / End-to-end Tests (push) Blocked by required conditions
Bump version v1.17.4 main
2025-02-10 13:44:54 +01:00
Carlos Eduardo Arango Gutierrez
fa1379c6d5 Merge pull request #913 from ArangoGutierrez/e2e_tests01
[no-relnote] Run e2e tests as reusable workflow
2025-02-10 13:44:14 +01:00
Carlos Eduardo Arango Gutierrez
4ded119c31 [no-relnote] Run e2e tests as reusable workflow
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-02-10 13:28:36 +01:00
Evan Lezar
e0813e0c4d Bump version for v1.17.4 release
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-10 13:24:41 +01:00
Evan Lezar
4add0a6bf2 Bump version for v1.17.4 release
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-10 13:24:27 +01:00
dependabot[bot]
fbcf8ef12e Bump golang.org/x/crypto from 0.31.0 to 0.33.0 in /tests
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.31.0 to 0.33.0.
- [Commits](https://github.com/golang/crypto/compare/v0.31.0...v0.33.0)

---
updated-dependencies:
- dependency-name: golang.org/x/crypto
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-10 08:28:19 +00:00
Evan Lezar
92472bd0ed Merge pull request #909 from ArangoGutierrez/reg_test08
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos7-aarch64) (push) Has been cancelled
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos7-x86_64) (push) Has been cancelled
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Has been cancelled
image / image (ubi8, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Has been cancelled
image / image (ubuntu20.04, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Has been cancelled
[no-relnote] fix host and port string generation for remote E2E
2025-02-07 12:53:58 +01:00
Carlos Eduardo Arango Gutierrez
ac236a80f8 [no-relnote] fix host and port string generation for remote E2E
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-02-07 12:00:17 +01:00
Carlos Eduardo Arango Gutierrez
b69d98d7cf Merge pull request #905 from ArangoGutierrez/reg_test07
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos7-aarch64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos7-x86_64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos8-ppc64le) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
image / image (ubi8, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
[no-relnote] fix step holodeck_public_dns_name variable name E2E
2025-02-06 14:01:28 +01:00
Evan Lezar
adbe127afd [no-relnote] Remove on push trigger
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-06 12:02:49 +01:00
Carlos Eduardo Arango Gutierrez
82ccae23b1 [no-relnote] fix step holodeck_public_dns_name variable name E2E
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-02-06 11:03:29 +01:00
dependabot[bot]
44cf1f4e3b Bump golang from 1.23.4 to 1.23.6 in /deployments/devel
Bumps golang from 1.23.4 to 1.23.6.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-06 09:07:30 +00:00
Carlos Eduardo Arango Gutierrez
1d0777ee01 Merge pull request #902 from ArangoGutierrez/reg_test06
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos7-aarch64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos7-x86_64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos8-ppc64le) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
image / image (ubi8, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
[no-relnote] set E2E_INSTALL_CTK to true during GH Actions E2E
2025-02-06 10:02:48 +01:00
Carlos Eduardo Arango Gutierrez
9f82e910f6 [no-relnote] set E2E_INSTALL_CTK to true during GH Actions E2E
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-02-06 09:52:06 +01:00
Evan Lezar
cb172ba336 Merge pull request #901 from ArangoGutierrez/reg_test05
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos7-aarch64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos7-x86_64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos8-ppc64le) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
image / image (ubi8, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
[no-relnote] Fix flag name on E2E
2025-02-05 22:02:21 +01:00
Carlos Eduardo Arango Gutierrez
bbdb13c47d [no-relnote] Fix flag name on E2E
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-02-05 20:29:27 +01:00
Evan Lezar
03152dba8d Allow cdi mode to work with --gpus flag
This changes ensures that the cdi modifier also removes the NVIDIA
Container Runtime Hook from the incoming spec. This aligns with what is
done for CSV modifications and prevents an error when starting the
container.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-05 19:01:43 +01:00
Evan Lezar
cf026dce9a [no-relnote] Remove duplicate test case
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-05 19:01:43 +01:00
Evan Lezar
073fb138d7 Merge pull request #900 from elezar/fix-make-target
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos7-aarch64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos7-x86_64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, centos8-ppc64le) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
image / image (ubi8, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}) (push) Blocked by required conditions
[no-relnote] Fix e2e test make target
2025-02-05 19:01:20 +01:00
Evan Lezar
948bc113f0 [no-relnote] Fix e2e test make target
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-05 19:00:40 +01:00
Evan Lezar
b457247a0c Merge pull request #895 from ArangoGutierrez/reg_test03
Add E2E GitHub Action for Container Toolkit
2025-02-05 17:51:18 +01:00
Evan Lezar
bae4b3ebd3 [no-relnote] Wait on image
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-05 17:28:29 +01:00
Evan Lezar
3a06066f6b [no-relnote] Simplify multi-arch checks
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-05 16:46:29 +01:00
Evan Lezar
c0292f5048 [no-relnote] Use nvcr.io/nvidia/cuda
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-05 16:26:09 +01:00
Carlos Eduardo Arango Gutierrez
cd8937bc5b Add E2E GitHub Action for Container Toolkit
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-02-05 16:22:14 +01:00
Evan Lezar
cec3445318 Merge pull request #897 from elezar/fix-copy-prs
Fix detection of PRs
2025-02-05 16:21:44 +01:00
Evan Lezar
f830653738 [no-relnote] Remove unused LABEL_IMAGE_SOURCE
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-05 16:13:16 +01:00
Evan Lezar
6e9ff446c8 [no-relnote] Use github.ref_name to detect PRs
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-05 16:13:16 +01:00
Evan Lezar
9a07de0ee8 Merge pull request #896 from elezar/onboard-pr-copy-bot
[no-relnote] Enable pr-copy-bot
2025-02-05 15:55:49 +01:00
Evan Lezar
517873e97d [no-relnote] Enable pr-copy-bot
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-02-05 15:55:12 +01:00
dependabot[bot]
49b45693ee Bump golang.org/x/mod from 0.22.0 to 0.23.0
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.22.0 to 0.23.0.
- [Commits](https://github.com/golang/mod/compare/v0.22.0...v0.23.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-05 08:21:37 +00:00
Carlos Eduardo Arango Gutierrez
78d6cdc7f7 Merge pull request #876 from ArangoGutierrez/reg_test02
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
Add remote-test option for E2E
2025-02-04 18:33:34 +01:00
Carlos Eduardo Arango Gutierrez
61640591ba Add remote-test option for E2E
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-02-04 15:05:46 +01:00
Evan Lezar
df4c87b877 Merge pull request #838 from cdesiniotis/enable-cdi-toolkit-container
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
Enable CDI in the container runtime if enabled in the toolkit
2025-02-03 16:37:49 +01:00
Tariq
d6c312956b Merge pull request #887 from NVIDIA/latest-qemu
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
[no-relnote] switch to the newer qemu artifacts image
2025-01-31 10:33:38 -05:00
Tariq Ibrahim
51f765dd71 [no-relnote] switch to the newer qemu artifacts image
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
2025-01-31 09:28:30 -05:00
Evan Lezar
d8cd5438e4 [no-relnote] Add basic cdi-enabled tests
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-29 15:37:32 +01:00
Evan Lezar
5ed25bb375 [no-relnote] Add unit test for installer command
This change adds a basic unit test for the nvidia-ckt-installer command.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-29 15:37:32 +01:00
Evan Lezar
a7786d4d41 Enable CDI in runtime if CDI_ENABLED is set
This change also enables CDI in the configured runtime when the toolkit
is installed with CDI enabled.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-29 15:37:32 +01:00
Evan Lezar
be1ac24f2a Merge pull request #882 from NVIDIA/dependabot/docker/deployments/container/main/nvidia/cuda-12.8.0-base-ubuntu20.04
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Bump nvidia/cuda from 12.6.3-base-ubuntu20.04 to 12.8.0-base-ubuntu20.04 in /deployments/container
2025-01-29 14:11:47 +01:00
dependabot[bot]
dd86f598b5 Bump nvidia/cuda in /deployments/container
Bumps nvidia/cuda from 12.6.3-base-ubuntu20.04 to 12.8.0-base-ubuntu20.04.

---
updated-dependencies:
- dependency-name: nvidia/cuda
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-01-28 08:57:31 +00:00
Evan Lezar
2b417c1a9a Fix overwriting docker feature flags
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-27 15:52:08 +01:00
Christopher Desiniotis
e89be14c86 Add option in toolkit container to enable CDI in runtime
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2025-01-27 15:52:08 +01:00
Evan Lezar
f625242ed6 Remove Set from engine config API
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-27 15:52:08 +01:00
Christopher Desiniotis
df73db7e1e Add EnableCDI() method to engine.Interface
This change adds an EnableCDI method to the container engine config files and
Updates the 'nvidia-ctk runtime configure' command to use this new method.

Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2025-01-27 15:51:23 +01:00
Evan Lezar
89f33bdf71 Merge pull request #881 from elezar/add-imex-binaries
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Add IMEX binaries to CDI discovery
2025-01-27 13:27:58 +01:00
Evan Lezar
6834d5e0a4 Merge pull request #879 from NVIDIA/dependabot/submodules/main/third_party/libnvidia-container-95d3e86
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Bump third_party/libnvidia-container from `f23e5e5` to `95d3e86`
2025-01-24 18:30:23 +01:00
Evan Lezar
c91a1b1dc8 Add IMEX binaries to CDI discovery
This change adds the nvidia-imex and nivdia-imex-ctl binaries to
the list of driver binaries that are searched when using CDI.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-24 14:46:31 +01:00
dependabot[bot]
96d91df78e Bump third_party/libnvidia-container from f23e5e5 to 95d3e86
Bumps [third_party/libnvidia-container](https://github.com/NVIDIA/libnvidia-container) from `f23e5e5` to `95d3e86`.
- [Release notes](https://github.com/NVIDIA/libnvidia-container/releases)
- [Commits](f23e5e55ea...95d3e86522)

---
updated-dependencies:
- dependency-name: third_party/libnvidia-container
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-01-24 09:05:17 +00:00
Carlos Eduardo Arango Gutierrez
a990860bfa Merge pull request #873 from ArangoGutierrez/tests_rename
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Rename test folder to tests
2025-01-23 11:54:59 +01:00
Carlos Eduardo Arango Gutierrez
bf9d618ff2 Rename test folder to tests
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-23 11:46:14 +01:00
Evan Lezar
7ae5c2901f Merge commit from fork
Disable mounting of compat libs from container by default
2025-01-23 10:56:32 +01:00
Evan Lezar
6b236746ce Bump libnvidia-container to f23e5e55
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-23 10:51:32 +01:00
Evan Lezar
ed3b52eb8d Add allow-cuda-compat-libs-from-container feature flag
This change adds an allow-cuda-compat-libs-from-container feature flag
to the NVIDIA Container Toolkit config. This allows a user to opt-in
to the previous default behaviour of overriding certain driver
libraries with CUDA compat libraries from the container.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-22 17:34:20 +01:00
Evan Lezar
1176430278 Disable mounting of compat libs from container
This change passes the --no-cntlibs argument to the nvidia-container-cli
from the nvidia-container-runtime-hook to disable overwriting host
drivers with the compat libs from a container being started.

Note that this may be a breaking change for some applications.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-22 17:34:20 +01:00
Evan Lezar
c22f3bd56c Merge pull request #765 from elezar/use-logger-in-toolkit-install
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
Simplify standalone installer
2025-01-22 14:42:50 +01:00
Evan Lezar
6375e832ff Merge pull request #874 from elezar/skip-graphics-for-csv
Skip graphics modifier in CSV mode
2025-01-22 14:37:07 +01:00
Evan Lezar
991b9c222f Skip graphics modifier in CSV mode
In CSV mode the CSV files at /etc/nvidia-container-runtime/host-files-for-container.d/
should be the source of truth for container modifications. This change skips graphics
modifications to a container. This prevents conflicts when handling files such as
vulkan icd files which are already defined in the CSV file.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-22 13:58:31 +01:00
Evan Lezar
fdad3927b4 [no-relnote] Refactor oci spec modifier list
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-22 13:58:31 +01:00
Evan Lezar
6bd292eff8 [no-relnote] Move tools/container to cmd/nvidia-ctk-installer
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-21 14:28:46 +01:00
Evan Lezar
9429ce039b [no-relnote] Rename run.go to main.go
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-21 14:28:46 +01:00
Evan Lezar
d953bbb977 Move nvidia-toolkit to nvidia-ctk-installer
This change moves the containerized installer from nvidia-toolkit to
cmd/nvidia-ctk-installer to allow for its use in CI.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-21 14:28:44 +01:00
Evan Lezar
5cbf3f82d9 [no-relnote] Move fileinstaller to separate file
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-21 14:27:21 +01:00
Evan Lezar
69375d7889 [no-relnote] Use logger in toolkit installation
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-21 14:27:21 +01:00
Evan Lezar
9753096398 [no-relnote] Add app struct for nvidia-toolkit
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-21 14:27:21 +01:00
Evan Lezar
76edd1d7ca [no-relnote] Remove unused TryDelete function
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-21 14:27:21 +01:00
Evan Lezar
a6476193c8 [no-relnote] Merge verifyFlags and validateFlags
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-21 14:27:21 +01:00
Evan Lezar
d75b1adeda Merge pull request #868 from elezar/docker-file-fixes
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
[no-relnote] Fix Dockerfile lint issues
2025-01-20 14:21:22 +01:00
Evan Lezar
9aa2f67a22 Merge pull request #866 from NVIDIA/dependabot/go_modules/main/github.com/urfave/cli/v2-2.27.5
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Bump github.com/urfave/cli/v2 from 2.27.4 to 2.27.5
2025-01-16 14:19:17 +01:00
Evan Lezar
b89e7ca849 Merge pull request #865 from NVIDIA/dependabot/go_modules/main/github.com/NVIDIA/go-nvlib-0.7.0
Bump github.com/NVIDIA/go-nvlib from 0.6.1 to 0.7.0
2025-01-16 14:18:55 +01:00
Evan Lezar
7c5ed8157b [no-relnote] Fix Dockerfile lint issues
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-16 14:17:39 +01:00
dependabot[bot]
a401e4a5a6 Bump github.com/urfave/cli/v2 from 2.27.4 to 2.27.5
Bumps [github.com/urfave/cli/v2](https://github.com/urfave/cli) from 2.27.4 to 2.27.5.
- [Release notes](https://github.com/urfave/cli/releases)
- [Changelog](https://github.com/urfave/cli/blob/main/docs/CHANGELOG.md)
- [Commits](https://github.com/urfave/cli/compare/v2.27.4...v2.27.5)

---
updated-dependencies:
- dependency-name: github.com/urfave/cli/v2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-01-16 08:25:45 +00:00
dependabot[bot]
2bc4201205 Bump github.com/NVIDIA/go-nvlib from 0.6.1 to 0.7.0
Bumps [github.com/NVIDIA/go-nvlib](https://github.com/NVIDIA/go-nvlib) from 0.6.1 to 0.7.0.
- [Release notes](https://github.com/NVIDIA/go-nvlib/releases)
- [Commits](https://github.com/NVIDIA/go-nvlib/compare/v0.6.1...v0.7.0)

---
updated-dependencies:
- dependency-name: github.com/NVIDIA/go-nvlib
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-01-16 08:25:38 +00:00
Carlos Eduardo Arango Gutierrez
cb52e779ba Merge pull request #860 from ArangoGutierrez/reg_test01
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Add basic integration tests for docker runtime
2025-01-15 15:59:47 +01:00
Carlos Eduardo Arango Gutierrez
8cc672bed9 Automated regression testing for the NVIDIA Container Toolkit
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2025-01-15 14:39:44 +01:00
Evan Lezar
be001d938c Merge pull request #863 from elezar/sort-feature-flags
[no-relnote] Sort feature flags
2025-01-15 13:33:34 +01:00
Evan Lezar
d584f9a40b [no-relnote] Sort feature flags
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-15 13:27:14 +01:00
Evan Lezar
11acbe7ca2 Merge pull request #805 from alam0rt/add-v3-containerd-config
Add support for containerd v3 configs
2025-01-15 10:46:30 +01:00
Sam Lockart
95bda3ad72 Add support for containerd version 3 config
This change adds support for containerd configs with version=3.
From the perspective of the runtime configuration the contents of the
config are the same. This means that we just have to load the new
version and ensure that this is propagated to the generated config.

Note that v3 config also requires a switch to the 'io.containerd.cri.v1.runtime'
CRI runtime plugin. See:
https://github.com/containerd/containerd/blob/v2.0.0/docs/PLUGINS.md
https://github.com/containerd/containerd/issues/10132

Note that we still use a default config of version=2 since we need to
ensure compatibility with older containerd versions (1.6.x and 1.7.x).

Signed-off-by: Sam Lockart <sam.lockart@zendesk.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2025-01-15 10:33:43 +01:00
Evan Lezar
6e4b0632c5 Merge pull request #861 from elezar/remove-fsnotify
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
Remove watch option from create-dev-char-symlinks
2025-01-14 20:26:20 +01:00
Evan Lezar
e4be26ff55 Merge pull request #853 from NVIDIA/dependabot/go_modules/main/golang.org/x/mod-0.22.0
Bump golang.org/x/mod from 0.20.0 to 0.22.0
2025-01-14 18:34:59 +01:00
Evan Lezar
8e6b57b38a Remove watch option from create-dev-char-symlinks
This removes the untested watch option from the
nvidia-ctk system create-dev-char-symlinks command.

This also removes the direct dependency on fsnotify.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-14 18:34:14 +01:00
Evan Lezar
c935779693 Merge pull request #855 from NVIDIA/dependabot/go_modules/main/github.com/NVIDIA/go-nvml-0.12.4-1
Bump github.com/NVIDIA/go-nvml from 0.12.4-0 to 0.12.4-1
2025-01-14 18:28:09 +01:00
Evan Lezar
4b43a29957 Merge pull request #844 from NVIDIA/dependabot/go_modules/deployments/devel/main/github.com/golangci/golangci-lint-1.63.4
Bump github.com/golangci/golangci-lint from 1.62.2 to 1.63.4 in /deployments/devel
2025-01-14 18:27:36 +01:00
Evan Lezar
b3b0d107de Merge pull request #851 from NVIDIA/dependabot/go_modules/main/github.com/stretchr/testify-1.10.0
Bump github.com/stretchr/testify from 1.9.0 to 1.10.0
2025-01-14 18:27:00 +01:00
Evan Lezar
d22fca51b8 Bump go version in go.mod to go 1.22.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-14 18:25:47 +01:00
dependabot[bot]
adad9b9c92 Bump golang.org/x/mod from 0.20.0 to 0.22.0
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.20.0 to 0.22.0.
- [Commits](https://github.com/golang/mod/compare/v0.20.0...v0.22.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-01-14 18:18:58 +01:00
Evan Lezar
b6d360fbbe Merge pull request #847 from mihalicyn/fix_custom_config_path_handling
Properly pass configSearchPaths to a Driver constructor
2025-01-14 16:03:40 +01:00
Evan Lezar
5699ec1cff Merge pull request #807 from elezar/add-cdi-imex-channels
Add imex mode to CDI spec generation
2025-01-14 16:03:17 +01:00
Evan Lezar
ec182705f1 Add string TOML source
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-14 10:35:29 +01:00
Evan Lezar
7598fe14bc Improve the implementation for UseLegacyConfig
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2025-01-14 10:35:29 +01:00
Evan Lezar
4fc181a418 Merge pull request #821 from NVIDIA/dependabot/docker/deployments/devel/main/golang-1.23.4
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Bump golang from 1.23.2 to 1.23.4 in /deployments/devel
2025-01-10 13:42:31 +01:00
dependabot[bot]
4bf5894833 Bump github.com/NVIDIA/go-nvml from 0.12.4-0 to 0.12.4-1
Bumps [github.com/NVIDIA/go-nvml](https://github.com/NVIDIA/go-nvml) from 0.12.4-0 to 0.12.4-1.
- [Release notes](https://github.com/NVIDIA/go-nvml/releases)
- [Commits](https://github.com/NVIDIA/go-nvml/compare/v0.12.4-0...v0.12.4-1)

---
updated-dependencies:
- dependency-name: github.com/NVIDIA/go-nvml
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-01-10 09:01:45 +00:00
dependabot[bot]
272f891204 Bump github.com/stretchr/testify from 1.9.0 to 1.10.0
Bumps [github.com/stretchr/testify](https://github.com/stretchr/testify) from 1.9.0 to 1.10.0.
- [Release notes](https://github.com/stretchr/testify/releases)
- [Commits](https://github.com/stretchr/testify/compare/v1.9.0...v1.10.0)

---
updated-dependencies:
- dependency-name: github.com/stretchr/testify
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-01-09 14:16:21 +00:00
Evan Lezar
4b352e6bac Merge pull request #845 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.29.0
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
Bump golang.org/x/sys from 0.26.0 to 0.29.0
2025-01-09 09:28:17 +01:00
Evan Lezar
178369eb8e Merge pull request #824 from elezar/post-release
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
Perform some cleanups following the v1.17.3 release
2025-01-08 11:24:09 +01:00
Alexander Mikhalitsyn
3c012499db Properly pass configSearchPaths to a Driver constructor
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
2025-01-07 19:11:01 +01:00
dependabot[bot]
6e883bf10a Bump golang.org/x/sys from 0.26.0 to 0.29.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.26.0 to 0.29.0.
- [Commits](https://github.com/golang/sys/compare/v0.26.0...v0.29.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-01-05 08:44:56 +00:00
dependabot[bot]
cc61f186a8 Bump github.com/golangci/golangci-lint in /deployments/devel
Bumps [github.com/golangci/golangci-lint](https://github.com/golangci/golangci-lint) from 1.62.2 to 1.63.4.
- [Release notes](https://github.com/golangci/golangci-lint/releases)
- [Changelog](https://github.com/golangci/golangci-lint/blob/master/CHANGELOG.md)
- [Commits](https://github.com/golangci/golangci-lint/compare/v1.62.2...v1.63.4)

---
updated-dependencies:
- dependency-name: github.com/golangci/golangci-lint
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-01-05 08:31:49 +00:00
Evan Lezar
aac0a62528 Merge pull request #826 from elezar/fix-test
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Fix create-device-node test when devices exist
2024-12-06 14:07:59 +01:00
Evan Lezar
2529aebd6c Fix create-device-node test when devices exist
This changes fixes the TestCreateControlDevices test on systems
where device nodes exist.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-12-06 14:05:51 +01:00
Evan Lezar
784917a0d9 [no-relnote] Fix archive-packages script
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-12-04 12:50:35 +01:00
Evan Lezar
1330467652 [no-relnote] Update dependabot
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-12-04 12:50:35 +01:00
Evan Lezar
22035d4561 [no-relnote] Update changelog
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-12-04 12:50:35 +01:00
Evan Lezar
9318aaf275 Bump version for v1.17.3 release
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-12-04 12:37:16 +01:00
Evan Lezar
70261d84f1 Bump version for v1.17.2 release
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-12-04 12:36:39 +01:00
Evan Lezar
f8f506fd85 Bump version for v1.17.1 release
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-12-04 12:36:33 +01:00
Evan Lezar
b0a5fb9f86 Merge pull request #822 from NVIDIA/dependabot/submodules/main/third_party/libnvidia-container-16f37fc
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Bump third_party/libnvidia-container from `63d366e` to `16f37fc`
2024-12-04 10:40:54 +01:00
dependabot[bot]
83790dbccd Bump third_party/libnvidia-container from 63d366e to 16f37fc
Bumps [third_party/libnvidia-container](https://github.com/NVIDIA/libnvidia-container) from `63d366e` to `16f37fc`.
- [Release notes](https://github.com/NVIDIA/libnvidia-container/releases)
- [Commits](63d366ee3b...16f37fcafc)

---
updated-dependencies:
- dependency-name: third_party/libnvidia-container
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-12-04 09:39:44 +00:00
dependabot[bot]
7ceae92ac1 Bump golang from 1.23.2 to 1.23.4 in /deployments/devel
Bumps golang from 1.23.2 to 1.23.4.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-12-04 08:26:02 +00:00
Evan Lezar
8dfa61338b Merge pull request #818 from NVIDIA/dependabot/go_modules/deployments/devel/main/github.com/golangci/golangci-lint-1.62.2
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
Bump github.com/golangci/golangci-lint from 1.61.0 to 1.62.2 in /deployments/devel
2024-12-03 20:19:12 +01:00
Evan Lezar
484cae273c Merge pull request #820 from NVIDIA/dependabot/docker/deployments/container/main/nvidia/cuda-12.6.3-base-ubuntu20.04
Bump nvidia/cuda from 12.6.2-base-ubuntu20.04 to 12.6.3-base-ubuntu20.04 in /deployments/container
2024-12-03 20:18:26 +01:00
dependabot[bot]
6e67bb247b Bump nvidia/cuda in /deployments/container
Bumps nvidia/cuda from 12.6.2-base-ubuntu20.04 to 12.6.3-base-ubuntu20.04.

---
updated-dependencies:
- dependency-name: nvidia/cuda
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-12-03 09:03:05 +00:00
dependabot[bot]
8bd7dd7060 Bump github.com/golangci/golangci-lint in /deployments/devel
Bumps [github.com/golangci/golangci-lint](https://github.com/golangci/golangci-lint) from 1.61.0 to 1.62.2.
- [Release notes](https://github.com/golangci/golangci-lint/releases)
- [Changelog](https://github.com/golangci/golangci-lint/blob/master/CHANGELOG.md)
- [Commits](https://github.com/golangci/golangci-lint/compare/v1.61.0...v1.62.2)

---
updated-dependencies:
- dependency-name: github.com/golangci/golangci-lint
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-11-29 16:49:15 +00:00
Evan Lezar
e25c052aa6 Merge pull request #810 from NVIDIA/dependabot/go_modules/deployments/devel/main/github.com/matryer/moq-0.5.1
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Bump github.com/matryer/moq from 0.5.0 to 0.5.1 in /deployments/devel
2024-11-29 17:47:36 +01:00
Evan Lezar
21ba9932d0 Merge pull request #778 from zhanluxianshen/fix-fsnotify-logic
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
Fix fsnotify.Remove logic function.
2024-11-29 14:17:03 +01:00
Evan Lezar
dcd65a6410 Merge pull request #690 from elezar/ignore-ldconfig-option
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Only allow host-relative LDConfig paths
2024-11-26 13:35:33 +01:00
Evan Lezar
8603d605aa Add imex mode to CDI spec generation
This change adds a imex mode to CDI spec generation. This mode detected
generates CDI specifications for existing IMEX channels. By default these
devices have the fully qualified CDI device names:

nvidia.com/imex-channel=<ID>

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-11-25 13:46:43 +01:00
Evan Lezar
e1efa28fe6 [no-relnote] Refactor CDI mode handling
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-11-25 13:46:34 +01:00
Evan Lezar
75a934e11c [no-relnote] Fix error string
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-11-25 13:46:34 +01:00
dependabot[bot]
43aca46f10 Bump github.com/matryer/moq from 0.5.0 to 0.5.1 in /deployments/devel
Bumps [github.com/matryer/moq](https://github.com/matryer/moq) from 0.5.0 to 0.5.1.
- [Release notes](https://github.com/matryer/moq/releases)
- [Changelog](https://github.com/matryer/moq/blob/main/.goreleaser.yml)
- [Commits](https://github.com/matryer/moq/compare/v0.5.0...v0.5.1)

---
updated-dependencies:
- dependency-name: github.com/matryer/moq
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-11-24 08:26:22 +00:00
Evan Lezar
00f1d5a627 Only allow host-relative LDConfig paths
This change only allows host-relative LDConfig paths.

An allow-ldconfig-from-container feature flag is added to allow for this
the default behaviour to be changed.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-11-22 14:25:17 +01:00
Evan Lezar
c0764366d9 [no-relnote] Refactor config handling for hook
This change removes indirect calls to get the default config
from the nvidia-container-runtime-hook.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-11-22 13:19:05 +01:00
Evan Lezar
1afada7de5 [no-relnote] Remove unused code
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-11-22 13:11:04 +01:00
Evan Lezar
5c3ffc2fba Merge pull request #799 from elezar/fix-legacy-nvidia-imex-channels
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Failing after 51m27s
Golang / check (push) Failing after 6m27s
Golang / Unit test (push) Failing after 2m23s
Golang / Build (push) Failing after 2m24s
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Failing after 6m40s
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Failing after 3m9s
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Failing after 2m38s
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Failing after 2m51s
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Failing after 2m32s
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Failing after 2m56s
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been skipped
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been skipped
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been skipped
Fix NVIDIA_IMEX_CHANNELS handling on legacy images
2024-11-15 10:55:45 -07:00
Evan Lezar
e188090a92 Fix NVIDIA_IMEX_CHANNELS handling on legacy images
For legacy images (images with a CUDA_VERSION set but no CUDA_REQUIRES set), the
default behaviour for device envvars is to treat non-existence as all.

This change ensures that the NVIDIA_IMEX_CHANNELS envvar is not treated in the same
way, instead returning no devices if the envvar is not set.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-11-14 13:26:30 -07:00
Evan Lezar
1995925a7d Merge pull request #783 from elezar/fix-config-file-path
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Fix bug in default config file path
2024-11-08 15:57:42 -08:00
Evan Lezar
46fc47c67a Fix bug in default config file path
This fix ensures that the default config file path for the nvidia-ctk runtime configure
command is set consistently.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-11-08 15:38:28 -08:00
Evan Lezar
b142091234 Merge pull request #777 from elezar/add-config-fallback
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
Fallback to file for runtime config
2024-11-08 09:09:18 -08:00
Evan Lezar
e9b8ad95df Fallback to file for runtime config
This change ensures that we fall back to the previous behaviour of
reading the existing config from the specified config file if extracting
the current config from the command line fails. This fixes use cases where
the containerd / crio executables are not available.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-11-08 08:43:09 -08:00
zhanluxianshen
f77a808105 Fix fsnotify.Remove logic function.
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2024-11-08 22:07:21 +08:00
Evan Lezar
832818084d Merge pull request #773 from elezar/fix-libcuda-symlink
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
Force creation of symlinks in create-symlink hook
2024-11-07 17:41:20 +01:00
Evan Lezar
ab0a4eaa19 Merge pull request #768 from elezar/add-toolkit-install-unit-tests
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
[no-relnote] Add toolkit install unit test
2024-11-06 00:41:55 +01:00
Evan Lezar
0c687be794 [no-relnote] Also validate CDI management spec
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-11-05 14:23:36 -08:00
Evan Lezar
8d869acce5 [no-relnote] Add toolkit install unit test
This change adds basic toolkit installation unit tests. This required
that the source for files be specified when installing to allow for
a testdata folder to be used.

This replaces the currently unused shell-based tests in /test/container.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-11-05 14:23:35 -08:00
Evan Lezar
324096c979 Force symlink creation in create-symlink hook
This change updates the create-symlink hook to be equivalent to
ln -f -s target link

This ensures that links are updated even if they exist in the container
being run.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-11-05 09:39:11 -08:00
Evan Lezar
5bc0315448 Merge pull request #766 from elezar/bump-release-v1.17.0
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Bump version for v1.17.0 release
2024-10-31 10:17:13 +01:00
Evan Lezar
3fb1615d26 [no-relnote] Address lint errors in test
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-31 10:11:58 +01:00
Evan Lezar
9e4696bf7d Bump version for v1.17.0 release
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-31 07:20:10 +01:00
Evan Lezar
8c9d3d8f65 Merge commit from fork
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
Check for valid paths in create-symlinks hook
2024-10-31 07:16:01 +01:00
Evan Lezar
efb18a72ad Merge pull request #762 from elezar/fix-auto-cdi-runtime-mode
Some checks are pending
CodeQL / Analyze Go code with CodeQL (push) Waiting to run
Golang / check (push) Waiting to run
Golang / Unit test (push) Waiting to run
Golang / Build (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Waiting to run
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Waiting to run
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Blocked by required conditions
Fix bug when using just-in-time CDI spec generation
2024-10-30 13:08:26 +01:00
Evan Lezar
75376d3df2 Fix bug when using just-in-time CDI spec generation
This change fixes a bug when using just-in-time CDI spec generation for the
NVIDIA Container Runtime for specific devices (i.e. not 'all').
Instead of unconditionally using the default nvsandboxutils library -- leading
to errors due to undefined symbols -- we check whether the library can be
properly initialised before continuing.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-30 12:20:36 +01:00
Christopher Desiniotis
7e0cd45b1c Check for valid paths in create-symlinks hook
This change updates the create-symlinks hook to always evaluate
link paths in the container's root filesystem. In addition the
executable is updated to return an error if a link could not
be created.

Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2024-10-29 12:16:51 -07:00
Christopher Desiniotis
a04e3ac4f7 Write failing test case for create-symlinks hook
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2024-10-29 12:16:51 -07:00
Christopher Desiniotis
92779e71b3 Handle case where symlink already exists in create-symlinks hook
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2024-10-29 12:16:51 -07:00
Christopher Desiniotis
23f1ba3e93 Add unit tests for create-symlinks hook
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-29 12:16:51 -07:00
Evan Lezar
d0d85a8c5c Always use paths relative to the container root for links
This chagne ensures that we always treat the link path as a path
relative to the container root. Without this change, relative paths
in link paths would result links being created relative to the
current working directory where the hook is executed.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-29 12:16:51 -07:00
Evan Lezar
bfea673d6a [no-relnote] Remove unused hostRoot argument
The hostRoot argument is always empty and not applicable to
how links are specified.

Links are specified by the paths in the container filesystem and as such
the only transform required to change the root is a join of the filepath.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-29 12:16:50 -07:00
Evan Lezar
6a6a3e6055 [no-relnote] Remove redundant changeRoot for link target
Since hostRoot is always the empty string and we are changing the root in the
target path to /, the call to changeRoot is redundant.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-29 12:16:50 -07:00
Evan Lezar
fa59d12973 [no-relnote] Check created outside of create loop
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-29 12:16:49 -07:00
Evan Lezar
d78868cd31 Merge pull request #760 from elezar/bump-release-v1.17.0-rc.2
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Bump version for v1.17.0-rc.2 release
2024-10-28 14:26:53 +01:00
Evan Lezar
74b1e5ea8c Bump version for v1.17.0-rc.2 release
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-28 14:09:12 +01:00
Evan Lezar
88608781b6 Merge pull request #755 from elezar/fix-libcuda-so
Some checks failed
CodeQL / Analyze Go code with CodeQL (push) Has been cancelled
Golang / check (push) Has been cancelled
Golang / Unit test (push) Has been cancelled
Golang / Build (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-aarch64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos7-x86_64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, centos8-ppc64le) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-amd64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-arm64) (push) Has been cancelled
image / packages (${{github.event_name == 'pull_request'}}, ubuntu18.04-ppc64le) (push) Has been cancelled
image / image (packaging, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubi8, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
image / image (ubuntu20.04, ${{github.event_name == 'pull_request'}}) (push) Has been cancelled
Fix bug where libcuda.so is not found in ldcache
2024-10-24 23:33:12 +02:00
Evan Lezar
fa5a4ac499 Read ldcache at construction instead of on each locate call
This change udpates the ldcache locator to read the ldcache at construction
and use these contents to perform future lookups against. Each of the cache
entries are resolved and lookups return the resolved target.

Assuming a symlink chain: libcuda.so -> libcuda.so.1 -> libcuda.so.VERSION, this
means that libcuda.so.VERION will be returned for any of the following inputs:
libcuda.so, libcuda.so.1, libcudal.so.*.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-24 23:12:58 +02:00
Evan Lezar
9f1bd62c42 [no-relnote] Add failing libcuda locate test
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-24 15:56:53 +02:00
Evan Lezar
9534249936 [no-relnote] Add test for libcuda lookup
This change adds a test for locating libcuda as a driver library.
This includes a failing test on a system where libcuda.so.1 is in
the ldcache, but not at one of the predefined library search paths.

A testdata folder with sample root filesystems is included to test
various combinations.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-24 15:56:25 +02:00
Evan Lezar
e1ea0056b9 Fix bug in sorting of symlink chain
Since we use a map to keep track of the elements of a symlink chain
the construction of the final list of located elements is not stable.
This change constructs the output as this is being discovered and as
such maintains the original ordering.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-24 15:56:25 +02:00
Evan Lezar
c802c3089c Remove unsupported print-ldcache command
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-24 15:56:25 +02:00
Tariq
771ac6b88a Merge pull request #756 from NVIDIA/cli-source-fallback
[TOML ConfigSource] add support for executing fallback CLI commands
2024-10-23 14:31:45 -07:00
Tariq Ibrahim
0f7aba9c3c [TOML ConfigSource] add support for executing fallback CLI commands
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Co-authored-by: Evan Lezar <elezar@nvidia.com>
2024-10-23 14:26:17 -07:00
Tariq
3c07ea0b17 Merge pull request #726 from NVIDIA/dependabot/docker/deployments/devel/main/golang-1.23.2
Bump golang from 1.23.1 to 1.23.2 in /deployments/devel
2024-10-21 10:11:21 -07:00
Evan Lezar
183dff9161 Merge pull request #750 from elezar/remove-csv-filenames-support
Remove csv filenames support
2024-10-21 11:10:27 +02:00
Evan Lezar
5e3e91a010 [no-relnote] Minor cleanup in create-symlinks
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-18 16:27:38 +02:00
Evan Lezar
dc0e191093 Remove csv-filename support from create-symlinks
This change removes support for specifying csv-filenames when
calling the create-symlinks hook. This is no longer required
as tegra-based systems generate hooks with `--link` arguments.

This also allows the hook to better serve as a reference implementation
for upstream projects wanting to implement a set of standard CDI hooks.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-18 16:27:27 +02:00
Evan Lezar
8a6c1944a5 Merge pull request #749 from elezar/bump-release-v1.17.0-rc.1
Bump version for v1.17.0-rc.1 release
2024-10-18 15:35:34 +02:00
Evan Lezar
5d057dce66 Bump version for v1.17.0-rc.1 release
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-18 15:33:12 +02:00
Evan Lezar
5931136879 Merge pull request #748 from elezar/fix-operator
Add aliases for runtime-specific envvars
2024-10-18 14:49:32 +02:00
Evan Lezar
1145ce2283 Add aliases for runtime-specific envvars
This change ensures that the toolkit works with older
versions of the GPU Operator where runtime-specific envvars are
used to set options such as the config file location.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-18 12:16:50 +02:00
Evan Lezar
38790c5df0 Merge pull request #747 from NVIDIA/dependabot/submodules/main/third_party/libnvidia-container-63d366e
Bump third_party/libnvidia-container from `921e2f3` to `63d366e`
2024-10-17 18:18:53 +02:00
Evan Lezar
e5175c270e Merge pull request #745 from elezar/fix-symlink-logging
Fix symlink resolution error message
2024-10-17 18:04:54 +02:00
dependabot[bot]
d18a2b6fc7 Bump third_party/libnvidia-container from 921e2f3 to 63d366e
Bumps [third_party/libnvidia-container](https://github.com/NVIDIA/libnvidia-container) from `921e2f3` to `63d366e`.
- [Release notes](https://github.com/NVIDIA/libnvidia-container/releases)
- [Commits](921e2f3197...63d366ee3b)

---
updated-dependencies:
- dependency-name: third_party/libnvidia-container
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-10-17 16:02:52 +00:00
Evan Lezar
2987c4d670 Merge pull request #740 from elezar/imex-by-volume-mount
Allow IMEX channel requests by volume mount
2024-10-17 17:56:12 +02:00
Evan Lezar
2e6712d2bc Allow IMEX channels to be requested as volume mounts
This change allows IMEX channels to be requested using the
volume mount mechanism.

A mount from /dev/null to /var/run/nvidia-container-devices/imex/{{ .ChannelID }}
is equivalent to including {{ .ChannelID }} in the NVIDIA_IMEX_CHANNELS
envvironment variables.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 16:54:29 +02:00
Evan Lezar
92df542f2f [no-relnote] Use image.CUDA to extract visible devices
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 16:53:17 +02:00
Evan Lezar
1991b3ef2a [no-relnote] Use string slice for devices in hook
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 16:53:17 +02:00
Evan Lezar
cdf39fbad3 [no-relnote] Use symlinks.Resolve in hook
This change removes duplicate logic from the create-symlinks hook
and uses symlinks.Resolve instead.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 15:47:13 +02:00
Evan Lezar
c30ca0fdc3 Fix typo in error message
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 15:46:49 +02:00
Evan Lezar
b077e2648d Merge pull request #741 from elezar/imex-default
Add disableIMEXChannelCreation feature flag
2024-10-17 15:26:21 +02:00
Evan Lezar
457d71c170 Add disable-imex-channel-creation feature flag
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 14:26:24 +02:00
Evan Lezar
bc9180b59d Expose opt-in features in toolkit-container
This change enables opt-in (off-by-default) features to be opted into.
These features can be toggled by name by specifying the (repeated)
--opt-in-features command line argument or as a comma-separated list
in the NVIDIA_CONTAINER_TOOLKIT_OPT_IN_FEATURES environment variable.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-17 14:26:24 +02:00
Evan Lezar
ec8dfaf779 Merge pull request #743 from elezar/remove-opt-in-features
Remove ability to set per-container features in the config file
2024-10-17 13:46:23 +02:00
Evan Lezar
c129122da6 Merge pull request #742 from NVIDIA/dependabot/docker/deployments/container/main/nvidia/cuda-12.6.2-base-ubuntu20.04
Bump nvidia/cuda from 12.6.1-base-ubuntu20.04 to 12.6.2-base-ubuntu20.04 in /deployments/container
2024-10-17 11:49:05 +02:00
Evan Lezar
0abf800000 Merge pull request #744 from elezar/fix-script
[no-relnote] Fix typo in script
2024-10-16 15:32:09 +02:00
Evan Lezar
1d9d0acf7d [no-relnote] Remove feature flag for per-container features
This change REMOVES the ability to set opt-in features
(e.g. GDS, MOFED, GDRCOPY) in the config file. The existing
per-container envvars are unaffected.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-16 15:30:31 +02:00
Evan Lezar
17f14278a9 [no-relnote] Fix typo in script
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-16 10:53:45 +02:00
dependabot[bot]
1fa5bbf351 Bump nvidia/cuda in /deployments/container
Bumps nvidia/cuda from 12.6.1-base-ubuntu20.04 to 12.6.2-base-ubuntu20.04.

---
updated-dependencies:
- dependency-name: nvidia/cuda
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-10-15 09:07:59 +00:00
Evan Lezar
f794d09df1 Merge pull request #729 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.26.0
Bump golang.org/x/sys from 0.25.0 to 0.26.0
2024-10-11 16:16:15 +02:00
Evan Lezar
17a2377ad5 Merge pull request #734 from NVIDIA/minor-cleanup
minor cleanup and improvements
2024-10-11 16:15:19 +02:00
Tariq Ibrahim
b90ee5d100 [no-relnote] minor cleanup and improvements
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-11 16:14:41 +02:00
Evan Lezar
1ef3f4048f Merge pull request #733 from elezar/add-imex-channels-to-management-spec
Add imex channels to management CDI spec
2024-10-11 15:28:50 +02:00
Evan Lezar
7fb31bd1dc Merge pull request #732 from elezar/add-z-lazy
Add -z,lazy to LDFLAGS
2024-10-11 15:20:30 +02:00
Evan Lezar
e2fe591535 Add -z,lazy to LDFLAGS
This fixes undefined symbol errors on platforms where -z,lazy may
not be the default.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-11 15:20:06 +02:00
Evan Lezar
adf3708d0b Add imex channels to management CDI spec
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-10 14:38:33 +02:00
Evan Lezar
a06d838b1c Merge pull request #686 from NVIDIA/get-config-from-cmdline
Fetch current container runtime config
2024-10-10 11:58:08 +02:00
Tariq Ibrahim
f477dc0df1 fetch current container runtime config through the command line
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>

add default runtime binary path to runtimes field of toolkit config toml

Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>

[no-relnote] Get low-level runtimes consistently

We ensure that we use the same low-level runtimes regardless
of the runtime engine being configured. This ensures consistent
behaviour.

Signed-off-by: Evan Lezar <elezar@nvidia.com>

Co-authored-by: Evan Lezar <elezar@nvidia.com>

address review comment

Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
2024-10-10 01:13:20 -07:00
dependabot[bot]
879bb9ffd5 Bump golang.org/x/sys from 0.25.0 to 0.26.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.25.0 to 0.26.0.
- [Commits](https://github.com/golang/sys/compare/v0.25.0...v0.26.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-10-06 08:57:11 +00:00
Tariq
4604e3b6c8 Merge pull request #725 from elezar/fix-nvsandboxutils
Ensure that nvsandboxutils is available for version
2024-10-04 04:58:03 +08:00
dependabot[bot]
a9ca6995f7 Bump golang from 1.23.1 to 1.23.2 in /deployments/devel
Bumps golang from 1.23.1 to 1.23.2.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-10-02 08:37:23 +00:00
Evan Lezar
7cd2aef0d8 Ensure that nvsandboxutils is available for version
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-02 09:37:05 +02:00
Evan Lezar
19482dac6f Merge pull request #715 from elezar/add-libcuda-so-symlink
Align driver symlinks with libnvidia-container
2024-10-01 18:16:36 +02:00
Evan Lezar
78c4ca8a12 Merge pull request #693 from NVIDIA/dependabot/go_modules/deployments/devel/main/github.com/golangci/golangci-lint-1.61.0
Bump github.com/golangci/golangci-lint from 1.60.1 to 1.61.0 in /deployments/devel
2024-10-01 11:40:31 +02:00
Evan Lezar
b12bdfc52a Merge pull request #691 from elezar/fix-codeql
Add an explicit CodeQL workflow to this repostitory
2024-10-01 11:39:34 +02:00
Evan Lezar
82ae2e615a Add creation of select driver symlinks to CDI spec
This change aligns the creation of symlinks under CDI with
the implementation in libnvidia-container. If the driver libraries
are present, the following symlinks are created:

* {{ .LibRoot }}/libcuda.so -> libcuda.so.1
* {{ .LibRoot }}/libnvidia-opticalflow.so -> libnvidia-opticalflow.so.1
* {{ .LibRoot }}/libGLX_indirect.so.0 -> libGLX_nvidia.so.{{ .Version }}

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-10-01 11:34:58 +02:00
Tariq
4f440dedda Merge pull request #722 from tariq1890/use-go-api-for-toolkit-install-rebase 2024-10-01 07:41:28 +08:00
Evan Lezar
3ee678f4f6 Convert crio to runtime package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-30 14:40:30 -07:00
Evan Lezar
103375e504 Convert containerd to runtime package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-30 14:39:52 -07:00
Evan Lezar
5bedbc2b50 Convert docker to runtime package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-30 14:36:35 -07:00
Evan Lezar
94337b7427 Add runtime package for runtime setup
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-30 14:36:35 -07:00
Evan Lezar
046a05921f Convert toolkit to go package
This change converts the toolkit installation logic to a go package
and invokes this installation over the go API instead of starting
this executable.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-30 14:36:35 -07:00
Tariq
6ca2700a17 Merge pull request #721 from NVIDIA/devel-check-modules
add go modules check for deployments/devel
2024-10-01 05:35:45 +08:00
Tariq Ibrahim
0d626cfbb7 add go modules check for deployments/devel
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
2024-09-30 12:13:56 -07:00
Tariq
10bafd1d09 Merge pull request #643 from elezar/refactor-toml-source
Refactor handling of TOML config files for runtimes
2024-10-01 00:59:52 +08:00
Evan Lezar
bf2bdfd35e Refactor Toml config handling
This change refactors the toml config file handlig for runtimes
such as containerd or crio. A toml.Loader is introduced that
encapsulates loading the required file.

This can be extended to allow other mechanisms for loading
loading the current config.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-30 14:24:18 +02:00
Evan Lezar
f126877254 Merge pull request #716 from elezar/discover-vdpau-libraries
Also search for driver libraries in vdpau
2024-09-30 11:18:49 +02:00
Evan Lezar
006aebf31e Merge pull request #717 from elezar/fix-libnvidia-allocator-so-1
Skip explicit creation of libnvidia-allocator.so.1 symlink
2024-09-30 11:06:08 +02:00
Evan Lezar
6c5f4eea63 Remove support for config overrides
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-27 13:23:35 +02:00
dependabot[bot]
b0b7c7c9ee Bump github.com/golangci/golangci-lint in /deployments/devel
Bumps [github.com/golangci/golangci-lint](https://github.com/golangci/golangci-lint) from 1.60.1 to 1.61.0.
- [Release notes](https://github.com/golangci/golangci-lint/releases)
- [Changelog](https://github.com/golangci/golangci-lint/blob/master/CHANGELOG.md)
- [Commits](https://github.com/golangci/golangci-lint/compare/v1.60.1...v1.61.0)

---
updated-dependencies:
- dependency-name: github.com/golangci/golangci-lint
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-09-26 14:45:16 +00:00
Evan Lezar
b466270a24 Merge pull request #666 from NVIDIA/dependabot/go_modules/deployments/devel/main/github.com/matryer/moq-0.5.0
Bump github.com/matryer/moq from 0.3.4 to 0.5.0 in /deployments/devel
2024-09-26 16:43:49 +02:00
Evan Lezar
d806f1045b Merge pull request #677 from NVIDIA/dependabot/docker/deployments/container/main/nvidia/cuda-12.6.1-base-ubuntu20.04
Bump nvidia/cuda from 12.6.0-base-ubuntu20.04 to 12.6.1-base-ubuntu20.04 in /deployments/container
2024-09-26 16:43:08 +02:00
Evan Lezar
35ee96ac41 Merge pull request #685 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.25.0
Bump golang.org/x/sys from 0.24.0 to 0.25.0
2024-09-26 16:42:30 +02:00
Evan Lezar
f8141aab27 Skip explicit creation of libnvidia-allocator.so.1 symlink
Since we expect .so.1 symlinks to be created by ldconfig, we don't
explicitly request these. This change removes the creation of
a libnvidia-allocator.so.1 -> libnvidia-allocator.so.RM_VERSION symlink
through the create-symlinks hook.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-26 16:34:39 +02:00
Evan Lezar
98ffe2aa67 Also search for driver libraries in vdpau
This change adds the vdpau subfolder to the paths searched
for driver libraries. This allows the libvdpau_nvidia.so.RM_VERSION
library to also be discovered.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-26 14:42:29 +02:00
Evan Lezar
79c59aeb7f Merge pull request #713 from sananya12/nvsandboxutils-sananya
Fix an incompatible pointer conversion in nvsandboxutils
2024-09-26 13:12:21 +02:00
Sananya Majumder
906531fee3 Fix incompatible pointer conversion
This change adds a safe pointer conversion to fix an
incompatible C pointer conversion, which caused build failures on some
architectures.

Signed-off-by: Sananya Majumder <sananyam@nvidia.com>
2024-09-25 16:40:43 -07:00
Evan Lezar
0e68f60c0b Merge pull request #629 from sananya12/nvsandboxutils-sananya
Add changes for usage of nvsandboxutils
2024-09-25 19:17:05 +02:00
Sananya Majumder
563db0e0be nvsandboxutils: Add usage of GetGpuResource and GetFileContent APIs
This change adds a new discoverer for Sandboxutils to report the file
system paths and associated symbolic links using GetGpuResource and
GetFileContent APIs. Both GPU and MIG devices are supported. If the
Sandboxutils discoverer fails, the NVML discoverer is used to report
the file system information.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Sananya Majumder <sananyam@nvidia.com>
2024-09-24 10:05:14 -07:00
Sananya Majumder
7b770f63c3 nvsandboxutils: Add usage of GetDriverVersion API
This change includes the usage of Sandboxutils GetDriverVersion API to
retrieve the CUDA driver version. If the library is not available on the
system or the API call fails for some other reason, it will fallback to
the NVML API to return the driver version.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Sananya Majumder <sananyam@nvidia.com>
2024-09-24 10:05:11 -07:00
Sananya Majumder
dcbf5bc81f nvsandboxutils: Add implementation for the APIs
This change adds manual wrappers around the generated bindings to make
them into more user-friendly APIs for the caller. Some helper functions
are also added.
The APIs that are currently present in the library and implemented here
are:
nvSandboxUtilsInit
nvSandboxUtilsShutdown
nvSandboxUtilsGetDriverVersion
nvSandboxUtilsGetGpuResource
nvSandboxUtilsGetFileContent

Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Sananya Majumder <sananyam@nvidia.com>
2024-09-24 10:05:09 -07:00
Sananya Majumder
978d439cf8 nvsandboxutils: Add internal bindings
This change adds the internal bindings for Sandboxutils, some of which
have been automatically generated with the help of c-for-go. The format
followed is similar to what is used in go-nvml. These would need to be
regenerated when the header file is modified and new APIs are added.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Sananya Majumder <sananyam@nvidia.com>
2024-09-24 10:05:05 -07:00
Sananya Majumder
aa946f3f59 nvsandboxutils: Add script to generate bindings
This change adds a script and related files to generate the internal
bindings for Sandboxutils library with the help of c-for-go.
This can be used to update the bindings when the header file is modified
with reference to how they are generated with the Makefile in go-nvml.

Run: ./update-bindings.sh

Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Sananya Majumder <sananyam@nvidia.com>
2024-09-24 10:04:48 -07:00
Evan Lezar
a5a5833c14 Merge pull request #710 from elezar/update-changelog
Update CHANGELOG.md for v1.16.2 release
2024-09-24 17:22:29 +02:00
Evan Lezar
8e07d90802 Update CHANGELOG.md for v1.16.2 release
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-24 14:12:24 +02:00
Evan Lezar
2aadbbf22d Merge pull request #709 from elezar/bump-version-v1.16.2
Bump version to v1.16.2
2024-09-20 20:52:54 +02:00
Evan Lezar
88f9414849 Bump version to v1.16.2
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-20 20:50:33 +02:00
Evan Lezar
53c2dc6301 Merge pull request #705 from elezar/remove-libnvidia-contianer-update-check
[no-relnote] Don't check for submodule changes
2024-09-20 20:50:17 +02:00
Evan Lezar
3121663537 Merge pull request #708 from elezar/revert-check-symlink-resolution
Revert "Merge pull request #696 from elezar/check-link-resolution"
2024-09-20 20:50:02 +02:00
Evan Lezar
4b6de8036d Merge pull request #707 from elezar/revert-no-persistenced-flag-by-default
Revert #703 and #694
2024-09-20 20:46:13 +02:00
Evan Lezar
dc2ccdd2fa Revert "Merge pull request #696 from elezar/check-link-resolution"
This reverts commit c490baab63, reversing
changes made to 32486cf1e9.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-20 20:29:42 +02:00
Evan Lezar
5145b0a4b6 Revert "Merge pull request #694 from elezar/add-opt-in-to-sockets"
This reverts commit b061446694, reversing
changes made to c490baab63.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-20 20:26:45 +02:00
Evan Lezar
a819cfdab4 Revert "Merge pull request #703 from elezar/fix-no-persistenced-flag"
This reverts commit c02b144ed4, reversing
changes made to b061446694.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-20 20:26:33 +02:00
Evan Lezar
629db79b4f [no-relnote] Don't check for submodule changes
With the move to dependabot to udpate the libnvidia-container
submodule it is no longer required to have checks in place that
ensure that this is up to date when building.

In addition when re-building old commits in CI, for example we explicitly
do NOT want to have this check.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-20 20:20:36 +02:00
Evan Lezar
8693dd6962 [no-relnote] Add CodeQL workflow
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-19 14:36:45 +02:00
dependabot[bot]
51cc619eab Bump github.com/matryer/moq from 0.3.4 to 0.5.0 in /deployments/devel
Bumps [github.com/matryer/moq](https://github.com/matryer/moq) from 0.3.4 to 0.5.0.
- [Release notes](https://github.com/matryer/moq/releases)
- [Changelog](https://github.com/matryer/moq/blob/main/.goreleaser.yml)
- [Commits](https://github.com/matryer/moq/compare/v0.3.4...v0.5.0)

---
updated-dependencies:
- dependency-name: github.com/matryer/moq
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-09-18 22:45:12 +00:00
Tariq
9b72161b63 Merge pull request #702 from NVIDIA/sync-gomod2 2024-09-19 06:43:55 +08:00
Evan Lezar
d925b6596b Merge pull request #687 from NVIDIA/dependabot/docker/deployments/devel/main/golang-1.23.1
Bump golang from 1.23.0 to 1.23.1 in /deployments/devel
2024-09-18 23:30:30 +02:00
Evan Lezar
c02b144ed4 Merge pull request #703 from elezar/fix-no-persistenced-flag
[no-relnote] Move --no-persistenced flag to after configure
2024-09-18 22:58:43 +02:00
Evan Lezar
16fdef50e6 [no-relnote] Move --no-persistenced flag to after configure
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-18 22:52:25 +02:00
Christopher Desiniotis
b061446694 Merge pull request #694 from elezar/add-opt-in-to-sockets
Skip injection of nvidia-persistenced socket by default
2024-09-18 13:39:28 -07:00
Evan Lezar
9c2476c98d Expose opt-in features in toolkit-container
This change enables opt-in (off-by-default) features to be opted into.
These features can be toggled by name by specifying the (repeated)
--opt-in-feature command line argument or as a comma-separated list
in the NVIDIA_CONTAINER_TOOLKIT_OPT_IN_FEATURES environment variable.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-18 22:30:27 +02:00
Evan Lezar
70da6cfa50 Allow inclusion of persistenced socket in CDI specification
This change adds an include-persistenced-socket flag to the
nvidia-ctk cdi generate command that ensures that a generated
specification includes the nvidia-persistenced socket if present on
the host.

Note that for mangement mode, these sockets are always included
if detected.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-18 22:30:27 +02:00
Evan Lezar
a4bfccc3fe Use include-persistenced-socket feature for CDI mode
This change ensures that the internal CDI representation includes
the persistenced socket if the include-persistenced-socket feature
flag is enabled.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-18 22:30:27 +02:00
Tariq Ibrahim
66f50d91bd sync go.mod with go mod tidy in deployments/devel
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
2024-09-18 13:17:54 -07:00
Evan Lezar
ba1ed3232f Skip injection of nvidia-persistenced socket by default
This changes skips the injection of the nvidia-persistenced socket by
default.

An include-persistenced-socket feature flag is added to allow the
injection of this socket to be explicitly requested.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-18 22:10:09 +02:00
Evan Lezar
c490baab63 Merge pull request #696 from elezar/check-link-resolution
Check for valid paths in create-symlinks hook
2024-09-18 22:05:14 +02:00
Evan Lezar
32486cf1e9 Merge pull request #701 from NVIDIA/dependabot/submodules/main/third_party/libnvidia-container-921e2f3
Bump third_party/libnvidia-container from `6b8b8e4` to `921e2f3`
2024-09-18 22:04:37 +02:00
dependabot[bot]
b9c3185d72 Bump third_party/libnvidia-container from 6b8b8e4 to 921e2f3
Bumps [third_party/libnvidia-container](https://github.com/NVIDIA/libnvidia-container) from `6b8b8e4` to `921e2f3`.
- [Release notes](https://github.com/NVIDIA/libnvidia-container/releases)
- [Commits](6b8b8e425e...921e2f3197)

---
updated-dependencies:
- dependency-name: third_party/libnvidia-container
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-09-18 20:00:15 +00:00
Evan Lezar
e383b75a4d Check for valid paths in create-symlinks hook
This change updates the create-symlinks hook to check whether
link paths resolve in the container's filesystem. In addition
the executable is updated to return an error if a link could
not be created.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-09-18 17:47:19 +02:00
Evan Lezar
5827434cae Merge pull request #699 from NVIDIA/dependabot/submodules/main/third_party/libnvidia-container-6b8b8e4
Bump third_party/libnvidia-container from `4c2494f` to `6b8b8e4`
2024-09-18 16:22:21 +02:00
dependabot[bot]
6ad699c095 Bump third_party/libnvidia-container from 4c2494f to 6b8b8e4
Bumps [third_party/libnvidia-container](https://github.com/NVIDIA/libnvidia-container) from `4c2494f` to `6b8b8e4`.
- [Release notes](https://github.com/NVIDIA/libnvidia-container/releases)
- [Commits](4c2494f165...6b8b8e425e)

---
updated-dependencies:
- dependency-name: third_party/libnvidia-container
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-09-18 09:05:59 +00:00
dependabot[bot]
2450b002a8 Bump golang from 1.23.0 to 1.23.1 in /deployments/devel
Bumps golang from 1.23.0 to 1.23.1.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-09-09 08:25:21 +00:00
dependabot[bot]
03d1acc7b0 Bump golang.org/x/sys from 0.24.0 to 0.25.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.24.0 to 0.25.0.
- [Commits](https://github.com/golang/sys/compare/v0.24.0...v0.25.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-09-08 08:25:45 +00:00
dependabot[bot]
39120d5878 Bump nvidia/cuda in /deployments/container
Bumps nvidia/cuda from 12.6.0-base-ubuntu20.04 to 12.6.1-base-ubuntu20.04.

---
updated-dependencies:
- dependency-name: nvidia/cuda
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-09-06 08:44:24 +00:00
Evan Lezar
da5e3ce8c3 Merge pull request #634 from NVIDIA/dependabot/go_modules/main/golang.org/x/mod-0.20.0
Bump golang.org/x/mod from 0.19.0 to 0.20.0
2024-09-04 12:41:04 +02:00
dependabot[bot]
b47f954b7c Bump golang.org/x/mod from 0.19.0 to 0.20.0
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.19.0 to 0.20.0.
- [Commits](https://github.com/golang/mod/compare/v0.19.0...v0.20.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-08-27 07:44:17 +00:00
Evan Lezar
f959c0daaa Merge pull request #644 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.24.0
Bump golang.org/x/sys from 0.22.0 to 0.24.0
2024-08-27 09:43:10 +02:00
Evan Lezar
c3c0cdcc89 Merge pull request #660 from elezar/fix-libnvidia-allocator-duplicate-mount
Exclude libnvidia-allocator from graphics mounts
2024-08-22 14:00:25 +02:00
Evan Lezar
e3936fd623 Merge pull request #661 from elezar/fix-golangci-lint
[no-relnote] Ignore integer overflow in golangci-lint
2024-08-22 11:43:17 +02:00
Evan Lezar
f4987580d2 [no-relnote] Ignore integer overflow in golangci-lint
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-08-22 11:33:59 +02:00
dependabot[bot]
f3fceab317 Bump golang.org/x/sys from 0.22.0 to 0.24.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.22.0 to 0.24.0.
- [Commits](https://github.com/golang/sys/compare/v0.22.0...v0.24.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-08-20 18:29:08 +00:00
Evan Lezar
ff2ba2bbc5 Merge pull request #655 from NVIDIA/dependabot/go_modules/main/github.com/urfave/cli/v2-2.27.4
Bump github.com/urfave/cli/v2 from 2.27.3 to 2.27.4
2024-08-20 20:27:53 +02:00
Evan Lezar
0c554cbf7e Merge pull request #652 from NVIDIA/dependabot/docker/deployments/devel/main/golang-1.23.0
Bump golang from 1.22.6 to 1.23.0 in /deployments/devel
2024-08-20 17:18:06 +02:00
Evan Lezar
beb921fafe Exclude libnvidia-allocator from graphics mounts
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-08-20 16:28:14 +02:00
Evan Lezar
45030d1169 Merge pull request #651 from NVIDIA/dependabot/docker/deployments/container/main/nvidia/cuda-12.6.0-base-ubuntu20.04
Bump nvidia/cuda from 12.5.1-base-ubuntu20.04 to 12.6.0-base-ubuntu20.04 in /deployments/container
2024-08-19 11:39:46 +02:00
dependabot[bot]
2f37055bc6 Bump golang from 1.22.6 to 1.23.0 in /deployments/devel
Bumps golang from 1.22.6 to 1.23.0.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-08-19 09:39:15 +00:00
Evan Lezar
5316c2a618 Merge pull request #657 from yeahdongcn/typo
Rename icp_test.go to ipc_test.go
2024-08-19 11:38:56 +02:00
dependabot[bot]
89e4ae70c8 Bump github.com/urfave/cli/v2 from 2.27.3 to 2.27.4
Bumps [github.com/urfave/cli/v2](https://github.com/urfave/cli) from 2.27.3 to 2.27.4.
- [Release notes](https://github.com/urfave/cli/releases)
- [Changelog](https://github.com/urfave/cli/blob/main/docs/CHANGELOG.md)
- [Commits](https://github.com/urfave/cli/compare/v2.27.3...v2.27.4)

---
updated-dependencies:
- dependency-name: github.com/urfave/cli/v2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-08-19 09:38:51 +00:00
Evan Lezar
c68b4be12f Merge pull request #658 from elezar/fix-default-value
Use empty string for default runtime-config-override
2024-08-19 11:34:36 +02:00
Evan Lezar
89c12c1368 Use empty string for default runtime-config-override
This change ensures that we unnecessarily print warnings for
runtimes where these configs are not applicable.

This removes the following warnings:
WARN[0000] Ignoring runtime-config-override flag for docker

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-08-19 11:33:26 +02:00
Evan Lezar
9d4294450d Merge pull request #656 from NVIDIA/dependabot/go_modules/deployments/devel/main/github.com/golangci/golangci-lint-1.60.1
Bump github.com/golangci/golangci-lint from 1.59.1 to 1.60.1 in /deployments/devel
2024-08-19 11:32:34 +02:00
dependabot[bot]
070b40d62a Bump github.com/golangci/golangci-lint in /deployments/devel
Bumps [github.com/golangci/golangci-lint](https://github.com/golangci/golangci-lint) from 1.59.1 to 1.60.1.
- [Release notes](https://github.com/golangci/golangci-lint/releases)
- [Changelog](https://github.com/golangci/golangci-lint/blob/master/CHANGELOG.md)
- [Commits](https://github.com/golangci/golangci-lint/compare/v1.59.1...v1.60.1)

---
updated-dependencies:
- dependency-name: github.com/golangci/golangci-lint
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-08-19 11:31:11 +02:00
Xiaodong Ye
13862f3c75 Rename icp_test.go to ipc_test.go
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
2024-08-19 15:39:54 +08:00
dependabot[bot]
05449807d1 Bump nvidia/cuda in /deployments/container
Bumps nvidia/cuda from 12.5.1-base-ubuntu20.04 to 12.6.0-base-ubuntu20.04.

---
updated-dependencies:
- dependency-name: nvidia/cuda
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-08-13 08:52:55 +00:00
Evan Lezar
b7948428ff Merge pull request #624 from NVIDIA/dependabot/go_modules/main/github.com/urfave/cli/v2-2.27.3
Bump github.com/urfave/cli/v2 from 2.27.2 to 2.27.3
2024-08-07 17:40:35 +02:00
Evan Lezar
d2750e86c3 Merge pull request #633 from NVIDIA/dependabot/docker/deployments/devel/main/golang-1.22.6
Bump golang from 1.22.5 to 1.22.6 in /deployments/devel
2024-08-07 17:39:35 +02:00
Evan Lezar
0e4759cf36 Merge pull request #628 from elezar/switch-release-branch
[no-relnote] Switch dependabot to release-1.16 branch
2024-08-07 17:38:37 +02:00
dependabot[bot]
d767e4cfe9 Bump golang from 1.22.5 to 1.22.6 in /deployments/devel
Bumps golang from 1.22.5 to 1.22.6.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-08-07 09:08:13 +00:00
Evan Lezar
e454caf577 [no-relnote] Switch dependabot to release-1.16 branch
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-08-01 14:48:35 +02:00
Evan Lezar
fa46c01331 Merge pull request #616 from NVIDIA/dependabot/go_modules/main/github.com/NVIDIA/go-nvlib-0.6.1
Bump github.com/NVIDIA/go-nvlib from 0.6.0 to 0.6.1
2024-08-01 11:01:10 +02:00
Evan Lezar
122136fe92 Merge pull request #627 from elezar/add-mig-cdi-test
[no-relnotes] Add initial unit test for MIG CDI spec generation
2024-08-01 11:00:07 +02:00
Evan Lezar
fa16d83494 [no-relnotes] Add initial unit test for MIG CDI spec generation
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-31 11:56:36 +02:00
dependabot[bot]
faabbd36d6 Bump github.com/urfave/cli/v2 from 2.27.2 to 2.27.3
Bumps [github.com/urfave/cli/v2](https://github.com/urfave/cli) from 2.27.2 to 2.27.3.
- [Release notes](https://github.com/urfave/cli/releases)
- [Changelog](https://github.com/urfave/cli/blob/main/docs/CHANGELOG.md)
- [Commits](https://github.com/urfave/cli/compare/v2.27.2...v2.27.3)

---
updated-dependencies:
- dependency-name: github.com/urfave/cli/v2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-07-28 08:18:49 +00:00
Christopher Desiniotis
a470818ba7 Merge pull request #620 from NVIDIA/bump-v1.16.1
Bump version for v1.16.1 release
2024-07-23 07:55:16 -07:00
Christopher Desiniotis
404a2763cb Bump version for v1.16.1 release
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2024-07-22 20:38:53 -07:00
Christopher Desiniotis
87a63a6159 Merge pull request #619 from NVIDIA/fix-mig-get-minor-number
[bugfix] Process error type instead of nvml.Return
2024-07-22 15:21:49 -07:00
Christopher Desiniotis
d8ccd50349 [bugfix] Process error type instead of nvml.Return in internal/platform-support/dgpu
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2024-07-22 13:57:14 -07:00
dependabot[bot]
95694f082b Bump github.com/NVIDIA/go-nvlib from 0.6.0 to 0.6.1
Bumps [github.com/NVIDIA/go-nvlib](https://github.com/NVIDIA/go-nvlib) from 0.6.0 to 0.6.1.
- [Release notes](https://github.com/NVIDIA/go-nvlib/releases)
- [Commits](https://github.com/NVIDIA/go-nvlib/compare/v0.6.0...v0.6.1)

---
updated-dependencies:
- dependency-name: github.com/NVIDIA/go-nvlib
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-07-21 08:24:28 +00:00
Evan Lezar
c5c124b882 Merge pull request #612 from elezar/bump-release-v1.16.0
Bump version for v1.16.0 release
2024-07-15 15:39:37 +02:00
Evan Lezar
a67c3a1bb5 Bump version for v1.16.0 release
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-15 15:07:11 +02:00
Evan Lezar
51911974fb Merge pull request #611 from elezar/fix-generate-changelog
[no-relnote] Fix generate changelog
2024-07-15 15:06:44 +02:00
Evan Lezar
73a6c4d521 [no-relnote] Fix generate changelog
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-15 15:06:24 +02:00
Evan Lezar
eb7983af41 Merge pull request #610 from elezar/fix-generate-changelog
[no-relnote] Fix generate-changelog
2024-07-15 14:48:13 +02:00
Evan Lezar
2273664328 [no-relnote] Fix generate-changelog
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-15 14:47:54 +02:00
Evan Lezar
49acaef503 [no-relnote] Fix release script
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-15 14:40:25 +02:00
Evan Lezar
fb6a767ada Merge pull request #609 from elezar/add-release-md
[no-relnote] Add RELEASE.md
2024-07-15 14:27:30 +02:00
Evan Lezar
76f419e3ba [no-relnote] Add RELEASE.md
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-15 14:25:12 +02:00
Evan Lezar
df3a7729ce Merge pull request #606 from NVIDIA/dependabot/docker/deployments/container/main/nvidia/cuda-12.5.1-base-ubuntu20.04
Bump nvidia/cuda from 12.5.0-base-ubuntu20.04 to 12.5.1-base-ubuntu20.04 in /deployments/container
2024-07-15 11:04:34 +02:00
dependabot[bot]
509993e138 Bump nvidia/cuda in /deployments/container
Bumps nvidia/cuda from 12.5.0-base-ubuntu20.04 to 12.5.1-base-ubuntu20.04.

---
updated-dependencies:
- dependency-name: nvidia/cuda
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-07-15 08:36:11 +00:00
Evan Lezar
fce56f905b Merge pull request #601 from elezar/post-release-tooling-fix
[no-relnote] Fix release tooling
2024-07-11 14:58:02 +02:00
Evan Lezar
832fcba5cd [no-relnote] fix archive script
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-11 10:34:28 +02:00
Evan Lezar
b646372998 [no-relnote] Fix release tooling
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-10 17:38:19 +02:00
Evan Lezar
d51cee6a9d Merge pull request #600 from NVIDIA/dependabot/docker/deployments/devel/main/golang-1.22.5
Bump golang from 1.22.1 to 1.22.5 in /deployments/devel
2024-07-10 13:31:09 +02:00
dependabot[bot]
2d9c45575c Bump golang from 1.22.1 to 1.22.5 in /deployments/devel
Bumps golang from 1.22.1 to 1.22.5.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-07-10 11:29:58 +00:00
Evan Lezar
49ada6ce56 Merge pull request #599 from elezar/add-build-image-tooling
Update build image tooling
2024-07-10 13:29:33 +02:00
Evan Lezar
d8edb83673 [no-relnote] Add dependabot rule for devel/Dockerfile
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-10 13:28:58 +02:00
Evan Lezar
969d6c3a4b [no-relnotes] Add golangci-lint to devel image
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-10 13:26:02 +02:00
Evan Lezar
ee4f1f4a24 Merge pull request #597 from elezar/add-build-image-tooling
[no-relnotes] Use a local development image
2024-07-10 13:25:22 +02:00
Evan Lezar
1b1b03d46e [no-relnotes] Use a local development image
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-10 13:21:46 +02:00
Evan Lezar
cf417b0308 Merge pull request #595 from elezar/manage-golang-version-dependabot
Manage golang version with dependabot
2024-07-10 13:13:14 +02:00
Evan Lezar
aedd3b8f04 [no-relnote] Define golang version in Dockefile
This change adds logic to extract the golang version from a Dockefile.
This allows us to trigger golang updates using dependabot.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-10 13:12:44 +02:00
Evan Lezar
d969c6f302 Merge pull request #593 from elezar/bump-release-v1.16.0-rc.2
Bump version for v1.16.0-rc.2 release
2024-07-10 12:01:02 +02:00
Evan Lezar
976fdae5d0 [no-relnote] Fix version bump in release script
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-10 11:56:42 +02:00
Evan Lezar
0eec445d7a Bump version for v1.16.0-rc.2 release
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-10 11:56:42 +02:00
Evan Lezar
448a3853ad Merge pull request #552 from elezar/refactor-dgpu-discovery
Refactor dGPU device discovery
2024-07-10 11:48:30 +02:00
Evan Lezar
9dd4e357b7 Merge pull request #562 from elezar/add-release-ci
[no-relnote] Add basic release workflow
2024-07-10 11:32:20 +02:00
Evan Lezar
6358c13dab [no-relnote] Add basic release workflow
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-10 11:20:40 +02:00
Evan Lezar
e4cdc48854 Merge pull request #591 from elezar/fix-release-scripts
[no-relnote] fix package release scripts
2024-07-09 15:34:13 +02:00
Evan Lezar
8e079c040a [no-relnote] fix package release scripts
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-09 15:33:41 +02:00
Evan Lezar
8b010313e5 Merge pull request #548 from elezar/additional-graphics-libs
Inject additional libraries for full X11 functionality
2024-07-09 14:57:55 +02:00
Evan Lezar
bd7295702e Merge pull request #584 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.22.0
Bump golang.org/x/sys from 0.21.0 to 0.22.0
2024-07-09 14:26:10 +02:00
dependabot[bot]
135f682a87 Bump golang.org/x/sys from 0.21.0 to 0.22.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.21.0 to 0.22.0.
- [Commits](https://github.com/golang/sys/compare/v0.21.0...v0.22.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-07-09 12:01:41 +00:00
Evan Lezar
077ad3eb25 Merge pull request #583 from NVIDIA/dependabot/go_modules/main/golang.org/x/mod-0.19.0
Bump golang.org/x/mod from 0.18.0 to 0.19.0
2024-07-09 14:00:54 +02:00
Evan Lezar
e527cc1ff5 Use relative path to locate driver libraries
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-09 13:54:07 +02:00
Evan Lezar
55ea268829 Add RelativeToRoot function to Driver
This adds a function to return a path as a path relative to
the specified driver root.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-09 13:54:07 +02:00
Seungmin Kim
aae3da88c3 Inject additional libraries for full X11 functionality
This change updates X11 library detection to ensure that this
works for a wider range of driver installations.

Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com>
Signed-off-by: tux-rampage <tuxrampage@gmail.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-09 13:53:55 +02:00
dependabot[bot]
bb2be19a6c Bump golang.org/x/mod from 0.18.0 to 0.19.0
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.18.0 to 0.19.0.
- [Commits](https://github.com/golang/mod/compare/v0.18.0...v0.19.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-07-09 11:53:13 +00:00
Evan Lezar
6fd2fc0c24 Merge pull request #587 from NVIDIA/dependabot/go_modules/main/tags.cncf.io/container-device-interface-0.8.0
Bump tags.cncf.io/container-device-interface from 0.7.2 to 0.8.0
2024-07-09 13:52:27 +02:00
Evan Lezar
7c2e624492 Merge pull request #589 from elezar/fix-dependabot
[no-relnote] Remove gh-pages action update
2024-07-09 13:52:14 +02:00
Evan Lezar
c0a3864ab4 [no-relnote] Remove gh-pages action update
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-09 13:50:09 +02:00
dependabot[bot]
0c309df7e7 Bump tags.cncf.io/container-device-interface from 0.7.2 to 0.8.0
Bumps [tags.cncf.io/container-device-interface](https://github.com/cncf-tags/container-device-interface) from 0.7.2 to 0.8.0.
- [Release notes](https://github.com/cncf-tags/container-device-interface/releases)
- [Commits](https://github.com/cncf-tags/container-device-interface/compare/v0.7.2...v0.8.0)

---
updated-dependencies:
- dependency-name: tags.cncf.io/container-device-interface
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-07-09 11:48:04 +00:00
Evan Lezar
46838b1a44 Merge pull request #576 from elezar/get-options-from-default
Extract options from default runtime if runc does not exist
2024-07-09 13:34:57 +02:00
Evan Lezar
be11cf428b [no-relnote] Add MIG discoverer to dgpu package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-09 13:34:04 +02:00
Evan Lezar
b42a5d3e3a [no-relnote] Refactor dGPU device discovery
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-09 13:34:04 +02:00
Evan Lezar
b8389283d5 Extract options from default runtime if runc does not exist
This change updates the logic to populate the options for the
nvidia runtime configs added to containerd or crio from a default runtime
if this is specified and a runc entry is not found.

This allows the default runtime values for settings such as SystemdCgroup
to be applied correctly.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-04 12:44:08 +02:00
Evan Lezar
6732f6d13b Merge pull request #577 from tariq1890/no-map-ptr
avoid using map pointers as maps are always passed by reference
2024-07-03 12:36:58 +02:00
Tariq Ibrahim
70ef0fb973 avoid using map pointers as maps are always passed by reference
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
2024-07-02 17:35:44 -07:00
Evan Lezar
15c884e99f Merge pull request #558 from NVIDIA/dependabot/go_modules/main/github.com/NVIDIA/go-nvlib-0.6.0
Bump github.com/NVIDIA/go-nvlib from 0.5.0 to 0.6.0
2024-07-02 16:21:02 +02:00
Evan Lezar
17d4d7da1f Merge pull request #574 from elezar/fix-centos7-builds
Use centos:7 vault repos for builds
2024-07-01 16:41:40 +02:00
Evan Lezar
c96ac07bf7 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-01 16:32:41 +02:00
Evan Lezar
a6a96a8d0e Merge pull request #560 from elezar/reduce-logging
Reduce logging verbosity in NVIDIA Container Runtime
2024-07-01 16:21:59 +02:00
Evan Lezar
be61ba01d9 [no-relnote] Use centos:7 vault repos for builds
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-01 16:21:20 +02:00
Evan Lezar
3aeba886d4 Reduce logging for the NVIDIA Container runtime
This change moves most of the logging for the NVIDIA Container runtime
from Info to Debug or Trace -- especially in the case where no
modifications are requried.

This removes execessive logging.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-01 12:12:38 +02:00
Evan Lezar
490c7dd599 Add Tracef to logger Interface
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-01 12:12:38 +02:00
Evan Lezar
8df5e33ef6 Add String function to oci.Runtime interface
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-01 12:12:38 +02:00
Evan Lezar
f55aac0232 Merge pull request #573 from elezar/bugfix-in-log-argument
Fix bug in argument parsing for logger creation
2024-07-01 12:12:14 +02:00
Evan Lezar
2c8431c1f8 Fix bug in argument parsing for logger creation
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-07-01 12:03:53 +02:00
Evan Lezar
f35f3903ab Merge pull request #551 from Weselton/main
[no-relnote] Fixed spelling error
2024-06-24 15:00:56 +02:00
Evan Lezar
bbc5363009 Merge pull request #561 from elezar/update-generate-changelog
[no-relnote] Update generate changelog filter
2024-06-24 13:04:55 +02:00
Evan Lezar
0b944ba274 [no-relnote] Update generate changelog filter
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-24 13:04:13 +02:00
dependabot[bot]
ca528c4f53 Bump github.com/NVIDIA/go-nvlib from 0.5.0 to 0.6.0
Bumps [github.com/NVIDIA/go-nvlib](https://github.com/NVIDIA/go-nvlib) from 0.5.0 to 0.6.0.
- [Release notes](https://github.com/NVIDIA/go-nvlib/releases)
- [Commits](https://github.com/NVIDIA/go-nvlib/compare/v0.5.0...v0.6.0)

---
updated-dependencies:
- dependency-name: github.com/NVIDIA/go-nvlib
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-23 08:42:01 +00:00
German Maltsev
692dac4cbd [no-relnote] Fixed spelling error
Signed-off-by: German Maltsev <maltsev.german@gmail.com>
2024-06-19 17:40:56 +03:00
Evan Lezar
6b78c72fec Merge pull request #550 from ArangoGutierrez/ref_name
Use ref_name on release workflow
2024-06-18 11:49:59 +02:00
Carlos Eduardo Arango Gutierrez
a3223f32e0 Use ref_name on release workflow
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-06-17 16:56:39 +02:00
Evan Lezar
f2eb4ea9ba Merge pull request #549 from elezar/bump-v1.16.0-rc.1
Bump v1.16.0 rc.1
2024-06-17 15:35:27 +02:00
Evan Lezar
4686f9499c Add basic release workflow
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-17 15:33:11 +02:00
Evan Lezar
3f481cd20a Update changelog for v1.16.0-rc.1 release
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-17 15:17:28 +02:00
Evan Lezar
cd52be86e6 Merge pull request #529 from elezar/vulkan-location
Support vulkan ICD files in a driver root
2024-06-17 15:00:29 +02:00
Evan Lezar
b5743da52f Merge pull request #526 from elezar/add-dev-root-to-create-device-nodes
Add dev-root option to create-device-nodes
2024-06-17 14:57:26 +02:00
Evan Lezar
03ccd64f33 Merge pull request #512 from NVIDIA/dependabot/go_modules/main/github.com/NVIDIA/go-nvml-0.12.4-0
Bump github.com/NVIDIA/go-nvml from 0.12.0-6 to 0.12.4-0
2024-06-17 14:23:00 +02:00
Evan Lezar
33369861fc Merge pull request #547 from NVIDIA/revert-490-main
Revert "Inject additional libraries required for full display functionality"
2024-06-17 11:47:36 +02:00
Evan Lezar
d9a1106e00 Revert "Inject additional libraries required for full display functionality"
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-17 11:47:15 +02:00
Evan Lezar
f26425d3fd Merge pull request #490 from ehfd/main
Inject additional libraries required for full display functionality
2024-06-17 11:41:38 +02:00
Evan Lezar
272585d261 Merge pull request #544 from elezar/allow-toolkit-pid-file-to-be-specified
Allow toolkit.pid path to be specified
2024-06-17 11:39:44 +02:00
Evan Lezar
fe5a44cb35 Merge pull request #527 from elezar/increase-priority-of-injected-libs
Increase priority of ld.so.conf.d config file
2024-06-17 11:34:43 +02:00
Evan Lezar
5f2be72335 Support vulkan ICD files in a driver root
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-17 11:33:29 +02:00
Evan Lezar
ae074e7ba2 Merge pull request #545 from elezar/bump-version-v1.16.0-rc.1
Bump version to v1.16.0-rc.1
2024-06-17 11:32:07 +02:00
Evan Lezar
876d479308 Allow toolkit.pid path to be specified
This change makes the following changes:
* Allows the toolkit.pid path to be specified
* Creates the toolkit.pid file at /run/nvidia/toolkit/toolkit.pid by default
* Handles failures to remove the /run/nvidia/toolkit folder

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-17 11:26:23 +02:00
dependabot[bot]
abd638add9 Bump github.com/NVIDIA/go-nvml from 0.12.0-6 to 0.12.4-0
Bumps [github.com/NVIDIA/go-nvml](https://github.com/NVIDIA/go-nvml) from 0.12.0-6 to 0.12.4-0.
- [Commits](https://github.com/NVIDIA/go-nvml/compare/v0.12.0-6...v0.12.4-0)

---
updated-dependencies:
- dependency-name: github.com/NVIDIA/go-nvml
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-14 13:27:46 +00:00
Evan Lezar
1dd59101c7 Bump version to v1.16.0-rc.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-14 15:25:21 +02:00
Evan Lezar
55630bc2c0 Merge pull request #542 from elezar/remove-provenance
Remove provenance information from image manifests
2024-06-14 15:16:06 +02:00
Evan Lezar
4f0de9f1ef Increase priority of ld.so.conf.d config file
This change ensures that the created /etc/ld.so.conf.d file
has a higher priority to ensure that the injected libraries
take precendence over non-compat libraries.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-13 13:49:14 +02:00
Evan Lezar
bced007f87 Remove provenance information from image manifests
Tools such as oc mirror do not support the provenence metadata
added to the image manifests with newer docker buildx versions.

This change disables the addition of provenance information.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-13 13:36:07 +02:00
Evan Lezar
ac90b7963d Merge pull request #519 from shivakunv/ngc_signing_job
add ngc image signing job for auto signing
2024-06-13 12:01:19 +02:00
shiva kumar
2e947edbe4 add ngc image signing job for auto signing
Signed-off-by: shiva kumar <shivaku@nvidia.com>
2024-06-12 13:20:35 +05:30
Evan Lezar
9fde4b21df Merge pull request #539 from elezar/fix-ppcle64-builds
Fix ppcle64 builds
2024-06-11 14:49:37 +02:00
Evan Lezar
84e0060fe8 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-11 14:31:31 +02:00
Evan Lezar
024dd3126d Use archived package repo for centos:stream8
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-11 14:31:06 +02:00
Evan Lezar
86b272cc7b Merge pull request #533 from NVIDIA/dependabot/go_modules/main/golang.org/x/mod-0.18.0
Bump golang.org/x/mod from 0.17.0 to 0.18.0
2024-06-10 13:43:15 +02:00
dependabot[bot]
2bc24970e0 Bump golang.org/x/mod from 0.17.0 to 0.18.0
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.17.0 to 0.18.0.
- [Commits](https://github.com/golang/mod/compare/v0.17.0...v0.18.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-10 11:35:48 +00:00
Evan Lezar
00dc0daecc Merge pull request #532 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.21.0
Bump golang.org/x/sys from 0.20.0 to 0.21.0
2024-06-10 13:34:42 +02:00
dependabot[bot]
e3120cbe64 Bump golang.org/x/sys from 0.20.0 to 0.21.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.20.0 to 0.21.0.
- [Commits](https://github.com/golang/sys/compare/v0.20.0...v0.21.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-10 11:32:41 +00:00
Evan Lezar
00d82dd540 Merge pull request #536 from elezar/bump-github.com/xrash/smetrics
Bump github.com/xrash/smetrics to v0.0.0-20240521201337-686a1a2994c1
2024-06-10 13:31:58 +02:00
Evan Lezar
8fe366683e Bump github.com/xrash/smetrics to v0.0.0-20240521201337-686a1a2994c1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-10 13:30:14 +02:00
Evan Lezar
7320fcd86d Merge pull request #524 from NVIDIA/dependabot/docker/deployments/container/main/nvidia/cuda-12.5.0-base-ubuntu20.04
Bump nvidia/cuda from 12.4.1-base-ubuntu20.04 to 12.5.0-base-ubuntu20.04 in /deployments/container
2024-06-10 13:23:25 +02:00
Evan Lezar
01f212b7a8 Merge pull request #528 from elezar/set-cdi-permissions-644
Set default CDI spec permissions to 644
2024-06-10 13:20:18 +02:00
Evan Lezar
71e0b8590f Set default CDI spec permissions to 644
Although the nvidia-ctk cdi generate command generates
specs with 644 permissions, the nvidia-ctk cdi transform
commands do not. This change sets the default permissions
to 600 instead of 644.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-05 11:27:03 +02:00
tux-rampage
e841c6256a Add driver hooks 2024-06-05 06:42:25 +02:00
tux-rampage
c2411e644e Build xorg lib search paths dynamically 2024-06-04 19:40:05 +02:00
Evan Lezar
dffce25637 Rename driver-root option to root
This change renames the nvidia-ctk system create-device-nodes
flag driver-root to root. This makes it clearer that this is
used to load the kernel modules and is not specific to the
user-mode driver installation.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-04 10:25:49 +02:00
Evan Lezar
f5a4b23041 Add dev-root option to create-device-nodes
This allows for dev nodes to be created in cases
where the driver root and the dev root do not
match.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-04 10:24:26 +02:00
Evan Lezar
dfc8e22e12 Merge pull request #360 from elezar/add-device-root-to-toolkit-container
Add dev-root option to toolkit container
2024-06-04 10:09:37 +02:00
Seungmin Kim
155fe66575 Merge branch 'NVIDIA:main' into main 2024-06-04 16:29:35 +09:00
Evan Lezar
9208159263 Add dev-root option to toolkit container
This changes adds an option to the toolkit container to allow
the dev root to be specified. This adds support for driver installations
where the driver files are at one root and the dev nodes are created
elsewhere -- most typically at /. This is the case, for example, for
GKE driver installations.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-03 20:40:30 +02:00
Evan Lezar
9b83d09f18 Merge pull request #440 from elezar/cdi-generation-with-driver-root
Find libnvidia-ml in driver root
2024-06-03 13:31:54 +02:00
Evan Lezar
c5eda7af8e Ensure that libnvidia-ml.so.1 is found in driver root
This change ensures that the driver root is used to locate libnvidia-ml.so.1
if required.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-06-03 12:01:10 +02:00
dependabot[bot]
572b0401a4 Bump nvidia/cuda in /deployments/container
Bumps nvidia/cuda from 12.4.1-base-ubuntu20.04 to 12.5.0-base-ubuntu20.04.

---
updated-dependencies:
- dependency-name: nvidia/cuda
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-03 08:21:46 +00:00
Seungmin Kim
0d70052105 Merge branch 'NVIDIA:main' into main 2024-06-02 17:42:21 +09:00
Evan Lezar
bead6f98f3 Merge pull request #518 from elezar/fix-staging-image-repo
Remove trailing slash from staging registry
2024-05-29 16:51:39 +02:00
Evan Lezar
533d7119db Remove trailing slash from staging registry
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-05-29 16:51:20 +02:00
Evan Lezar
e4b46a09a7 Merge pull request #516 from elezar/update-infolib-construction
Update infolib construction
2024-05-28 13:32:55 +02:00
Evan Lezar
8fc4b9c742 Add WithInfoLib option to CDI package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-05-28 13:30:31 +02:00
Evan Lezar
ef57c07199 Bump github.com/NVIDIA/go-nvlib to v0.5.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-05-28 13:28:28 +02:00
Evan Lezar
b407109bdf Merge pull request #515 from elezar/fix-cdi-construction
Ensure consistent construction order for libs
2024-05-28 12:33:37 +02:00
Evan Lezar
abb5abaea4 Ensure consistent construction order for libs
This change ensures that nvnllib and devicelib are constructed
before these are used to construct infolib.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-05-28 12:05:44 +02:00
Evan Lezar
e55e6abc09 Merge pull request #506 from elezar/set-version-on-transform
Set mininum spec version on save
2024-05-28 11:48:28 +02:00
Evan Lezar
17c044eef8 Set minimum version on save
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-05-24 15:36:18 +02:00
Evan Lezar
edda11d647 Merge pull request #428 from elezar/fix-cdi-mode-resolution
Fix cdi mode resolution
2024-05-21 13:22:10 +02:00
Evan Lezar
52d0383b47 Bump github.com/NVIDIA/go-nvlib to v0.4.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-05-21 12:25:54 +02:00
Evan Lezar
3defc6babb Use go-nvlib mode resolution
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-05-21 12:25:54 +02:00
Evan Lezar
7b988f15ab Merge pull request #474 from deitch/separate-hook-binary
move nvidia-ctk hook command into nvidia-cdi-hook binary
2024-05-21 12:24:56 +02:00
Avi Deitcher
179d8655f9 Move nvidia-ctk hook command into own binary
This change creates an nvidia-cdi-hook binary for implementing
CDI hooks. This allows for these hooks to be separated from the
nvidia-ctk command which may, for example, require libnvidia-ml
to support other functionality.

The nvidia-ctk hook subcommand is maintained as an alias for the
time being to allow for existing CDI specifications referring to
this path to work as expected.

Signed-off-by: Avi Deitcher <avi@deitcher.net>
2024-05-21 12:19:44 +02:00
Evan Lezar
2d7b2360d2 Merge pull request #497 from elezar/systemdcgroup
Add option to set additional containerd configs per runtime
2024-05-21 12:05:51 +02:00
Evan Lezar
a61dc148b2 Merge pull request #501 from NVIDIA/dependabot/go_modules/main/github.com/NVIDIA/go-nvml-0.12.0-6
Bump github.com/NVIDIA/go-nvml from 0.12.0-5 to 0.12.0-6
2024-05-21 11:35:58 +02:00
dependabot[bot]
3f6b916a85 Bump github.com/NVIDIA/go-nvml from 0.12.0-5 to 0.12.0-6
Bumps [github.com/NVIDIA/go-nvml](https://github.com/NVIDIA/go-nvml) from 0.12.0-5 to 0.12.0-6.
- [Commits](https://github.com/NVIDIA/go-nvml/compare/v0.12.0-5...v0.12.0-6)

---
updated-dependencies:
- dependency-name: github.com/NVIDIA/go-nvml
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-05-21 11:28:09 +02:00
Seungmin Kim
cf388e7e63 Update internal/discover/graphics.go
Co-authored-by: Evan Lezar <evanlezar@gmail.com>
Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com>
2024-05-17 23:55:36 +09:00
Evan Lezar
b435b797af Add support for adding additional containerd configs
This allow for options such as SystemdCgroup to be optionally set.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-05-17 12:58:08 +02:00
Evan Lezar
c86c3aeeaf Allow per-runtime config overrides
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-05-17 12:58:08 +02:00
Evan Lezar
f13f1bdba4 Merge pull request #484 from elezar/fix-config-set
Use : as a config --set slice separator
2024-05-13 12:32:53 +02:00
Seungmin Kim
55440f40b3 Make X11 search paths accurate
Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com>
2024-05-11 04:31:11 +09:00
Seungmin Kim
cc34996684 Inject additional libraries required for full X11 functionality and fix paths
Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com>
2024-05-11 00:33:41 +09:00
Evan Lezar
5a3eda4cba Use : as a config --set list separator
This allows settings such as:

nvidia-ctk config --set nvidia-container-runtime.runtimes=crun:runc

to be applied correctly.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-05-08 17:26:59 +02:00
Evan Lezar
973a6633b3 Merge pull request #479 from NVIDIA/dependabot/go_modules/main/github.com/urfave/cli/v2-2.27.2
Bump github.com/urfave/cli/v2 from 2.27.1 to 2.27.2
2024-05-07 22:46:49 +02:00
Evan Lezar
f4d0cfb687 Merge pull request #318 from cdesiniotis/update-func-signature
Get device specs by Identifier
2024-05-07 22:45:38 +02:00
Christopher Desiniotis
35b23c5a2c Accept device.Identifiers for requesting CDI specs
This change moves from using strings to useing device.Identifiers
as input for requesting CDI specifications for specific
devices.

Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-05-07 21:50:28 +02:00
Evan Lezar
0dc87e5d69 Merge pull request #488 from NVIDIA/dependabot/github_actions/golangci/golangci-lint-action-6
Bump golangci/golangci-lint-action from 5 to 6
2024-05-07 15:22:24 +02:00
Evan Lezar
edc50f6e49 Merge pull request #485 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.20.0
Bump golang.org/x/sys from 0.19.0 to 0.20.0
2024-05-07 15:21:58 +02:00
dependabot[bot]
7de7444b0f Bump golangci/golangci-lint-action from 5 to 6
Bumps [golangci/golangci-lint-action](https://github.com/golangci/golangci-lint-action) from 5 to 6.
- [Release notes](https://github.com/golangci/golangci-lint-action/releases)
- [Commits](https://github.com/golangci/golangci-lint-action/compare/v5...v6)

---
updated-dependencies:
- dependency-name: golangci/golangci-lint-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-05-07 08:25:32 +00:00
dependabot[bot]
8d3ffcd122 Bump github.com/urfave/cli/v2 from 2.27.1 to 2.27.2
Bumps [github.com/urfave/cli/v2](https://github.com/urfave/cli) from 2.27.1 to 2.27.2.
- [Release notes](https://github.com/urfave/cli/releases)
- [Changelog](https://github.com/urfave/cli/blob/main/docs/CHANGELOG.md)
- [Commits](https://github.com/urfave/cli/compare/v2.27.1...v2.27.2)

---
updated-dependencies:
- dependency-name: github.com/urfave/cli/v2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-05-06 11:45:55 +02:00
dependabot[bot]
d72481cbd7 Bump golang.org/x/sys from 0.19.0 to 0.20.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.19.0 to 0.20.0.
- [Commits](https://github.com/golang/sys/compare/v0.19.0...v0.20.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-05-05 08:27:16 +00:00
Evan Lezar
a442a5ed1f Merge pull request #478 from NVIDIA/dependabot/go_modules/main/github.com/NVIDIA/go-nvml-0.12.0-5
Bump github.com/NVIDIA/go-nvml from 0.12.0-4 to 0.12.0-5
2024-05-03 13:14:38 +02:00
dependabot[bot]
7de58b4af4 Bump github.com/NVIDIA/go-nvml from 0.12.0-4 to 0.12.0-5
Bumps [github.com/NVIDIA/go-nvml](https://github.com/NVIDIA/go-nvml) from 0.12.0-4 to 0.12.0-5.
- [Commits](https://github.com/NVIDIA/go-nvml/compare/v0.12.0-4...v0.12.0-5)

---
updated-dependencies:
- dependency-name: github.com/NVIDIA/go-nvml
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-28 08:29:54 +00:00
Carlos Eduardo Arango Gutierrez
fde099d25b Merge pull request #476 from NVIDIA/dependabot/github_actions/golangci/golangci-lint-action-5
Bump golangci/golangci-lint-action from 4 to 5
2024-04-25 12:31:18 +02:00
dependabot[bot]
0a3eb67df8 Bump golangci/golangci-lint-action from 4 to 5
Bumps [golangci/golangci-lint-action](https://github.com/golangci/golangci-lint-action) from 4 to 5.
- [Release notes](https://github.com/golangci/golangci-lint-action/releases)
- [Commits](https://github.com/golangci/golangci-lint-action/compare/v4...v5)

---
updated-dependencies:
- dependency-name: golangci/golangci-lint-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-25 08:31:16 +00:00
Carlos Eduardo Arango Gutierrez
78f250a6b0 Merge pull request #469 from elezar/switch-to-ghimages
Switch to ghcr.io/nvidia/container-toolkit staging images
2024-04-22 10:19:56 +02:00
Evan Lezar
0aed9a16ad Merge pull request #460 from NVIDIA/dependabot/go_modules/main/github.com/NVIDIA/go-nvml-0.12.0-4
Bump github.com/NVIDIA/go-nvml from 0.12.0-3 to 0.12.0-4
2024-04-19 11:46:20 +02:00
Evan Lezar
f46b99c2f7 Switch to ghcr.io/nvidia/container-toolkit staging images
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-18 15:04:59 +02:00
Evan Lezar
d5f6e6f868 Use nvml/mock package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-18 14:53:37 +02:00
Evan Lezar
082ce066ed Replace go-nvlib/pkg/nvml with go-nvml/pkg/nvml
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-18 14:53:37 +02:00
Evan Lezar
bbaf543537 Update github.com/NVIDIA/go-nvlib to v0.3.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-18 14:53:37 +02:00
dependabot[bot]
50dd460eaa Bump github.com/NVIDIA/go-nvml from 0.12.0-3 to 0.12.0-4
Bumps [github.com/NVIDIA/go-nvml](https://github.com/NVIDIA/go-nvml) from 0.12.0-3 to 0.12.0-4.
- [Commits](https://github.com/NVIDIA/go-nvml/compare/v0.12.0-3...v0.12.0-4)

---
updated-dependencies:
- dependency-name: github.com/NVIDIA/go-nvml
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-18 14:53:37 +02:00
Evan Lezar
b3af77166b Merge pull request #467 from NVIDIA/dependabot/go_modules/main/tags.cncf.io/container-device-interface-0.7.2
Bump tags.cncf.io/container-device-interface from 0.7.1 to 0.7.2
2024-04-18 14:50:33 +02:00
dependabot[bot]
d8cb812c8e Bump tags.cncf.io/container-device-interface from 0.7.1 to 0.7.2
Bumps [tags.cncf.io/container-device-interface](https://github.com/cncf-tags/container-device-interface) from 0.7.1 to 0.7.2.
- [Release notes](https://github.com/cncf-tags/container-device-interface/releases)
- [Commits](https://github.com/cncf-tags/container-device-interface/compare/v0.7.1...v0.7.2)

---
updated-dependencies:
- dependency-name: tags.cncf.io/container-device-interface
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-18 10:12:16 +00:00
Carlos Eduardo Arango Gutierrez
80386a7fb2 Merge pull request #463 from elezar/switch-maintenance-branch
Update maintenance dependabot rule
2024-04-18 12:11:04 +02:00
Evan Lezar
c0a5bbe7db Update maintenance dependabot rule
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-16 15:36:20 +02:00
Evan Lezar
ddeeca392c Merge pull request #462 from elezar/bump-version-v1.15.0
Bump version to v1.15.0
2024-04-15 15:21:52 +02:00
Evan Lezar
9944feee45 Bump version to v1.15.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-15 14:37:16 +02:00
Evan Lezar
762b14b6cd Merge pull request #459 from elezar/remove-runtime-docker
Remove runtime and docker packages
2024-04-15 11:51:32 +02:00
Evan Lezar
e76e10fb36 Remove third_party package folders
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-12 14:35:03 +02:00
Evan Lezar
fcdf565586 Remove tooling to build packages
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-12 14:32:46 +02:00
Evan Lezar
7a9bc14d98 Merge pull request #425 from jmbaur/discover-use-xdg
Use XDG_DATA_DIRS instead of hardcoding /usr/share
2024-04-11 19:09:53 +02:00
Jared Baur
5788e622f4 Use XDG_DATA_DIRS instead of hardcoding /usr/share
When running nvidia-ctk on a system that uses a custom XDG_DATA_DIRS
environment variable value, the configuration files for `glvnd`,
`vulkan`, and `egl` fail to get passed through from the host to the
container. Reading from XDG_DATA_DIRS instead of hardcoding the default
value allows for finding said files so they can be mounted in the
container.

Signed-off-by: Jared Baur <jaredbaur@fastmail.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-11 17:13:02 +02:00
Evan Lezar
29c0f82ed2 Merge pull request #327 from elezar/add-driver-config
Add config search path option to driver root
2024-04-11 16:58:33 +02:00
Evan Lezar
e1417bee64 Merge pull request #456 from elezar/fix-ubi8-image-build
Remove unneeded repo manipulation
2024-04-11 16:54:31 +02:00
Evan Lezar
5f9e49705c Remove unneeded repo manipulation
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-11 16:44:42 +02:00
Evan Lezar
1d2b61ee11 Merge pull request #455 from elezar/fix-ubi8-image-build
Fix typo in dockerfile
2024-04-11 15:10:04 +02:00
Evan Lezar
271987d448 Fix typo in dockerfile
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-11 15:06:22 +02:00
Evan Lezar
6cac2f5848 Merge pull request #446 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.19.0
Bump golang.org/x/sys from 0.18.0 to 0.19.0
2024-04-11 11:41:51 +02:00
dependabot[bot]
ef4eb0d3c6 Bump golang.org/x/sys from 0.18.0 to 0.19.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.18.0 to 0.19.0.
- [Commits](https://github.com/golang/sys/compare/v0.18.0...v0.19.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-11 09:39:33 +00:00
Evan Lezar
04ab0595fa Merge pull request #449 from NVIDIA/dependabot/go_modules/main/golang.org/x/mod-0.17.0
Bump golang.org/x/mod from 0.16.0 to 0.17.0
2024-04-11 11:38:52 +02:00
Evan Lezar
9d3418d603 Merge pull request #454 from NVIDIA/dependabot/docker/deployments/container/nvidia/cuda-12.4.1-base-ubuntu20.04
Bump nvidia/cuda from 12.3.2-base-ubuntu20.04 to 12.4.1-base-ubuntu20.04 in /deployments/container
2024-04-11 11:38:18 +02:00
dependabot[bot]
57acd85fb1 Bump nvidia/cuda in /deployments/container
Bumps nvidia/cuda from 12.3.2-base-ubuntu20.04 to 12.4.1-base-ubuntu20.04.

---
updated-dependencies:
- dependency-name: nvidia/cuda
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-10 14:11:46 +00:00
Evan Lezar
6d69ca81de Merge pull request #453 from elezar/bump-cuda-version-in-containers
Add dependabot update for CUDA base images
2024-04-10 16:11:26 +02:00
Evan Lezar
be73581489 Add dependabot update for Dockefiles
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-10 16:06:27 +02:00
Evan Lezar
5682ce3149 Specify CUDA base images directly in Dockerfile
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-10 16:06:27 +02:00
dependabot[bot]
cb2b000ddc Bump golang.org/x/mod from 0.16.0 to 0.17.0
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.16.0 to 0.17.0.
- [Commits](https://github.com/golang/mod/compare/v0.16.0...v0.17.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-08 11:01:21 +00:00
Evan Lezar
cbc6ff73a4 Merge pull request #441 from elezar/bump-cdi-version
Update tags.cncf.io/container-device-interface to v0.7.1
2024-04-08 13:00:33 +02:00
Evan Lezar
4cd86caf67 Use NewCache instead of GetRegistry
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-05 17:09:17 +02:00
Evan Lezar
885313af3b Bump tags.cncf.io/container-device-interface to v0.7.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-05 17:05:40 +02:00
Evan Lezar
26e52b8013 Merge pull request #438 from elezar/refactor-driver-root
Create root.Driver instance at first usage
2024-04-03 15:11:45 +02:00
Evan Lezar
011c658945 Create root.Driver instance at first usage
This allows for testing through injection of the driver root.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-03 15:03:33 +02:00
Evan Lezar
413da20838 Merge pull request #362 from elezar/add-feature-flags
Add support for feature flags
2024-04-03 12:04:18 +02:00
Evan Lezar
09341a0934 Add support for feature flags
This change adds a features config that allows
individual features to be toggled at a global level. Each feature can (by default)
be controlled by an environment variable.

The GDS, MOFED, NVSWITCH, and GDRCOPY features are examples of such features.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-03 11:58:37 +02:00
Evan Lezar
2a9e3537ec Add config search paths option to driver root.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-02 23:03:05 +02:00
Evan Lezar
c374520b64 Merge pull request #436 from elezar/remove-verbose-from-tests
Remove verbose from tests
2024-04-02 17:46:33 +02:00
Evan Lezar
e982b9798c Remove verbose from tests
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-04-02 17:45:25 +02:00
Evan Lezar
7eb031919c Merge pull request #430 from lengrongfu/main
fix doc link
2024-04-02 17:41:19 +02:00
Evan Lezar
97950d6b8d Merge pull request #418 from elezar/bump-golang-version
Bump golang version to v1.22.1
2024-04-02 10:59:45 +02:00
Evan Lezar
1613f35bf5 Merge pull request #419 from elezar/rename-build-deployments
Rename build folder deployments
2024-04-02 10:57:54 +02:00
rongfu.leng
a78a7f866f fix doc
Signed-off-by: rongfu.leng <lenronfu@gmail.com>
2024-03-28 13:55:25 +08:00
Evan Lezar
643b89e539 Add driver.Config
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-25 20:18:15 +02:00
Evan Lezar
bdfa525a75 Merge pull request #427 from elezar/add-functional-options-to-driver-root
Use functional options to construct driver root
2024-03-25 20:17:54 +02:00
Evan Lezar
93763d25f0 Use functional options to construct driver root
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-25 20:13:25 +02:00
Evan Lezar
5800e55027 Rename build folder deployments
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-19 14:00:24 +02:00
Evan Lezar
c572c3b787 Remove lint-internal make target
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-19 13:54:16 +02:00
Evan Lezar
3f7ed7c8db Rename golangci-lint target to lint
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-19 13:54:16 +02:00
Evan Lezar
cc6cbd4a89 Use versions.mk GOLANG version in CI
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-19 13:54:16 +02:00
Evan Lezar
98ad835a77 Add vendor and check-vendor make targets
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-19 13:48:58 +02:00
Evan Lezar
3a1ac85020 Bump golang version to 1.22.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-19 13:46:47 +02:00
Evan Lezar
1ddc859700 Merge pull request #417 from elezar/bump-version-v1.15.0-rc.4
Bump version v1.15.0 rc.4
2024-03-19 10:47:10 +02:00
Evan Lezar
f1f629674e Bump CUDA base image to 12.3.2
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-19 10:37:36 +02:00
Evan Lezar
5a6bf02914 Bump version to v1.15.0-rc.4
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-19 10:36:02 +02:00
Evan Lezar
197cbbe0c6 Merge pull request #411 from elezar/bump-go-nvlib
Bump go-nvlib to v0.2.0  and go-nvml v0.12.0-3
2024-03-15 15:12:01 +02:00
Evan Lezar
b9abb44613 Bump go-nvlib to v0.2.0 and go-nvml v0.12.0-3
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-15 15:09:12 +02:00
Evan Lezar
c4ec4a01f8 Merge pull request #384 from tariq1890/upd-go
Fix dockerfile arch selection
2024-03-15 08:58:27 +02:00
Tariq Ibrahim
f40f4369a1 Fix dockerfile arch selection
This change includes an arm64 arch check when installing golang.

Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-15 08:54:00 +02:00
Evan Lezar
2733661125 Merge pull request #383 from elezar/fix-golangci-lint-in-shell
Add GOLANGCI_LINT_CACHE to docker make targets
2024-03-15 08:51:29 +02:00
Evan Lezar
4806f6e70d Merge pull request #395 from elezar/add-nvidia-visible-devices-void
Add NVIDIA_VISIBLE_DEVICES=void to CDI specs
2024-03-13 10:00:38 +02:00
Evan Lezar
db21f5f9a8 Merge pull request #400 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.18.0
Bump golang.org/x/sys from 0.17.0 to 0.18.0
2024-03-12 15:16:05 +02:00
Evan Lezar
07443a0e86 Merge pull request #405 from NVIDIA/dependabot/submodules/main/third_party/libnvidia-container-6c8f1df
Bump third_party/libnvidia-container from `86f1946` to `6c8f1df`
2024-03-12 15:15:32 +02:00
dependabot[bot]
675db67ebb Bump third_party/libnvidia-container from 86f1946 to 6c8f1df
Bumps [third_party/libnvidia-container](https://github.com/NVIDIA/libnvidia-container) from `86f1946` to `6c8f1df`.
- [Release notes](https://github.com/NVIDIA/libnvidia-container/releases)
- [Commits](86f19460fb...6c8f1df7fd)

---
updated-dependencies:
- dependency-name: third_party/libnvidia-container
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-12 12:10:07 +00:00
Evan Lezar
14ecacf6d1 Merge pull request #403 from elezar/add-github-submodule-update
Add dependabot config to update libnvidia-container submodule
2024-03-12 13:47:17 +02:00
Evan Lezar
9451da1e6d Add dependabot config to update libnvidia-container submodule
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-12 13:45:04 +02:00
Evan Lezar
e30ddb398f Merge pull request #397 from jbujak/enumAdapters3
Use D3DKMTEnumAdapters3 for adapter enumeration
2024-03-12 13:33:25 +02:00
dependabot[bot]
375188495e Bump golang.org/x/sys from 0.17.0 to 0.18.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.17.0 to 0.18.0.
- [Commits](https://github.com/golang/sys/compare/v0.17.0...v0.18.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-12 11:14:05 +00:00
Evan Lezar
5ff48a5a89 Merge pull request #401 from NVIDIA/dependabot/go_modules/main/golang.org/x/mod-0.16.0
Bump golang.org/x/mod from 0.15.0 to 0.16.0
2024-03-12 13:13:24 +02:00
Jakub Bujak
44ae31d101 Use D3DKMTEnumAdapters3 for adapter enumeration
D3DKMTEnumAdapters3 is required to enumerate MCDM compute-only adapters

Signed-off-by: Jakub Bujak <jbujak@nvidia.com>
2024-03-12 11:42:08 +01:00
dependabot[bot]
942e5c7224 Bump golang.org/x/mod from 0.15.0 to 0.16.0
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.15.0 to 0.16.0.
- [Commits](https://github.com/golang/mod/compare/v0.15.0...v0.16.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-10 08:26:44 +00:00
Evan Lezar
88ad42ccd1 Add NVIDIA_VISIBLE_DEVICES=void to CDI specs
This change ensures taht NVIDIA_VISIBLE_DEVICES=void is included in
generated CDI specs. This prevents the NVIDIA Container Runtime Hook
from injecting devices if NVIDIA_VISIBLE_DEVICES=all is set.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-04 16:10:06 +02:00
Evan Lezar
df9732dae4 Merge pull request #369 from NVIDIA/dependabot/go_modules/main/github.com/opencontainers/runtime-spec-1.2.0
Bump github.com/opencontainers/runtime-spec from 1.1.0 to 1.2.0
2024-03-01 22:49:29 +02:00
Evan Lezar
e66cc6a7b1 Merge pull request #392 from elezar/fix-non-fork-image-builds
Allow finer control of pushing images
2024-03-01 22:49:01 +02:00
dependabot[bot]
8f5a9a1918 Bump github.com/opencontainers/runtime-spec from 1.1.0 to 1.2.0
Bumps [github.com/opencontainers/runtime-spec](https://github.com/opencontainers/runtime-spec) from 1.1.0 to 1.2.0.
- [Release notes](https://github.com/opencontainers/runtime-spec/releases)
- [Changelog](https://github.com/opencontainers/runtime-spec/blob/main/ChangeLog)
- [Commits](https://github.com/opencontainers/runtime-spec/compare/v1.1.0...v1.2.0)

---
updated-dependencies:
- dependency-name: github.com/opencontainers/runtime-spec
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-01 22:42:29 +02:00
Evan Lezar
6b9dee5b77 Allow finer control of pushing images
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-01 22:42:13 +02:00
Evan Lezar
50bbf32cf0 Add GOLANGCI_LINT_CACHE to docker make targets
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-01 22:27:06 +02:00
Evan Lezar
413c1264ce Merge pull request #391 from elezar/add-lint-excludes-for-prestart
Add lint exclude for hooks.Prestart deprecation
2024-03-01 22:20:54 +02:00
Evan Lezar
c084756e48 Add lint exclude for hooks.Prestart deprecation
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-01 22:18:27 +02:00
Evan Lezar
6265a2d89e Merge pull request #389 from NVIDIA/dependabot/go_modules/main/github.com/stretchr/testify-1.9.0
Bump github.com/stretchr/testify from 1.8.4 to 1.9.0
2024-03-01 22:16:05 +02:00
dependabot[bot]
72778ee536 Bump github.com/stretchr/testify from 1.8.4 to 1.9.0
Bumps [github.com/stretchr/testify](https://github.com/stretchr/testify) from 1.8.4 to 1.9.0.
- [Release notes](https://github.com/stretchr/testify/releases)
- [Commits](https://github.com/stretchr/testify/compare/v1.8.4...v1.9.0)

---
updated-dependencies:
- dependency-name: github.com/stretchr/testify
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-01 15:45:26 +00:00
Evan Lezar
2f11a190bf Merge pull request #388 from elezar/add-gh-pages-dependabot
Add dependabot for gh-pages branches
2024-03-01 17:05:02 +02:00
Evan Lezar
2d394f4624 Add dependabot for gh-pages branches
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-03-01 16:55:14 +02:00
Evan Lezar
ea55757bc3 Merge pull request #382 from elezar/remove-centos7-image
Remove centos7 container-toolkit image
2024-02-29 12:01:26 +02:00
Evan Lezar
2a620dc845 Merge pull request #387 from elezar/update-libnvidia-container
Update libnvidia-container
2024-02-29 12:00:34 +02:00
Evan Lezar
bad5369760 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-29 11:53:51 +02:00
Evan Lezar
2623e8a707 Allow finer control of pushing images
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-29 11:45:40 +02:00
Evan Lezar
05dd438489 Remove centos7 container-toolkit image
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-28 15:45:09 +02:00
Evan Lezar
6780afbed1 Merge pull request #379 from NVIDIA/exists-fallback
[R550 driver support] add fallback logic to device.Exists(name)
2024-02-27 22:25:33 +02:00
Tariq Ibrahim
f80f4c485d [R550 driver support] add fallback logic to device.Exists(name)
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
2024-02-27 11:59:35 -08:00
Evan Lezar
ac63063362 Merge pull request #375 from klueska/add-imex-support
Add imex support
2024-02-27 13:24:07 +02:00
Kevin Klues
761a425e0d Update reference to latest libnvidia-container
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2024-02-27 10:17:32 +01:00
Kevin Klues
296d4560b0 Add support for an NVIDIA_IMEX_CHANNELS envvar
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2024-02-26 20:09:43 +01:00
Evan Lezar
0409824106 Merge pull request #370 from elezar/remove-libnvidia-container0-dependency
Remove additional libnvidia-container0 dependency
2024-02-19 13:56:02 +01:00
Evan Lezar
562addc3c6 Remove additional libnvidia-container0 dependency
This change removes the additional libnvidia-container0=0.10.0+jetpack dependency
that was introduced for Tegra-based systems. These have since been migrated to
CDI-based direct injection using the NVIDIA Container Runtime.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-19 13:12:57 +01:00
Evan Lezar
ae82b2d9b6 Merge pull request #345 from elezar/allow-skip-of-device-nodes
Add --create-device-nodes option to toolkit config
2024-02-14 20:28:15 +01:00
Evan Lezar
355997d2d6 Merge pull request #314 from elezar/CNT-4032/mulitple-naming-strategies
Allow multiple naming strategies when generating CDI specification
2024-02-13 16:46:54 +01:00
Evan Lezar
b6efd3091d Use index and uuid as default device-name-strategies
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-13 16:38:18 +01:00
Evan Lezar
52da12cf9a Allow multiple device name strategies to be specified
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-13 16:38:05 +01:00
Evan Lezar
cd7d586afa Also ignore CDI errors if required
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-13 12:37:41 +01:00
Evan Lezar
cc4c2783a3 Add --create-device-nodes option to toolkit config
This change adds a --create-device-nodes option to the toolkit config CLI.
Most noteably, this allows the creation of control devices to be skipped
when CDI spec generation is enabled.

Currently values of "", "node", and "control" are supported and can be set
via the command line flag or the CREATE_DEVICE_NODES environment variable.

The default value of CREATE_DEVICE_NODES=control will trigger the creation
of control device nodes. Setting this envvar to include the (comma-separated)
strings of "" or "none" will disable device node creation regardless of
whether other supported strings are included.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-13 12:37:41 +01:00
Evan Lezar
a8d48808d7 Merge pull request #354 from elezar/add-release-1.14-dependabot
Add dependabot for release-1.14
2024-02-12 17:34:38 +01:00
Evan Lezar
aa724f1ac6 Add dependabot for release-1.14
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-12 17:32:18 +01:00
Evan Lezar
519b9f3cc8 Merge pull request #353 from elezar/update-changelog
Update changelog for #330
2024-02-12 17:29:06 +01:00
Evan Lezar
6e1bc0d7fb Update changelog for #330
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-12 17:28:40 +01:00
Evan Lezar
a2a1a78620 Merge pull request #330 from tariq1890/nvidia-dev-maj-num-lookup
add fallback logic when retrieving major number of the nvidia control device
2024-02-12 17:22:32 +01:00
Evan Lezar
ab7693ac9f Merge pull request #347 from NVIDIA/dependabot/go_modules/main/golang.org/x/mod-0.15.0
Bump golang.org/x/mod from 0.14.0 to 0.15.0
2024-02-12 15:47:01 +01:00
dependabot[bot]
f4df5308d0 Bump golang.org/x/mod from 0.14.0 to 0.15.0
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.14.0 to 0.15.0.
- [Commits](https://github.com/golang/mod/compare/v0.14.0...v0.15.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-12 13:35:09 +00:00
Evan Lezar
8dcc57c614 Merge pull request #348 from NVIDIA/dependabot/go_modules/main/github.com/sirupsen/logrus-1.9.3
Bump github.com/sirupsen/logrus from 1.9.0 to 1.9.3
2024-02-12 14:35:07 +01:00
Evan Lezar
6594f06e9a Merge pull request #349 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.17.0
Bump golang.org/x/sys from 0.16.0 to 0.17.0
2024-02-12 14:34:27 +01:00
Evan Lezar
8a706a97a0 Merge pull request #350 from NVIDIA/dependabot/github_actions/golangci/golangci-lint-action-4
Bump golangci/golangci-lint-action from 3 to 4
2024-02-12 14:33:43 +01:00
Evan Lezar
39f0bf21ce Merge pull request #346 from elezar/consistent-driver-root
Specify DRIVER_ROOT consistently
2024-02-12 14:33:15 +01:00
dependabot[bot]
0915a12e38 Bump golangci/golangci-lint-action from 3 to 4
Bumps [golangci/golangci-lint-action](https://github.com/golangci/golangci-lint-action) from 3 to 4.
- [Release notes](https://github.com/golangci/golangci-lint-action/releases)
- [Commits](https://github.com/golangci/golangci-lint-action/compare/v3...v4)

---
updated-dependencies:
- dependency-name: golangci/golangci-lint-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-12 08:13:37 +00:00
dependabot[bot]
e6cd897cc4 Bump golang.org/x/sys from 0.16.0 to 0.17.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.16.0 to 0.17.0.
- [Commits](https://github.com/golang/sys/compare/v0.16.0...v0.17.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-11 08:49:59 +00:00
dependabot[bot]
35600910e0 Bump github.com/sirupsen/logrus from 1.9.0 to 1.9.3
Bumps [github.com/sirupsen/logrus](https://github.com/sirupsen/logrus) from 1.9.0 to 1.9.3.
- [Release notes](https://github.com/sirupsen/logrus/releases)
- [Changelog](https://github.com/sirupsen/logrus/blob/master/CHANGELOG.md)
- [Commits](https://github.com/sirupsen/logrus/compare/v1.9.0...v1.9.3)

---
updated-dependencies:
- dependency-name: github.com/sirupsen/logrus
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-11 08:49:55 +00:00
Evan Lezar
f89cef307d Specify DRIVER_ROOT consistently
This change ensures that CLI tools that require the path to the
driver root accept both the NVIDIA_DRIVER_ROOT and DRIVER_ROOT
environment variables in addition to the --driver-root command
line argument.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-09 14:28:56 +01:00
Evan Lezar
e96edb3f36 Merge pull request #342 from elezar/add-spec-dirs-to-cdi-list
Add spec-dir flag to nvidia-ctk cdi list command
2024-02-09 14:17:45 +01:00
Evan Lezar
bab4ec30af Improve error reporting for cdi list
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-08 14:58:48 +01:00
Evan Lezar
b6ab444529 Add spec-dirs argument to cdi list
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-08 14:50:14 +01:00
Evan Lezar
15d905def0 Merge pull request #309 from NVIDIA/dependabot/go_modules/main/github.com/urfave/cli/v2-2.27.1
Bump github.com/urfave/cli/v2 from 2.3.0 to 2.27.1
2024-02-07 11:01:46 +01:00
Evan Lezar
e64b723b71 Add proc.devices.New constructor
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-06 11:19:43 +01:00
Evan Lezar
f0545dd979 Merge pull request #333 from elezar/test-on-darwin
Fix build and tests targets on darwin
2024-02-06 10:16:04 +01:00
Tariq Ibrahim
f414ac2865 add fallback logic when retrieving major number of the nvidia control device
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
2024-02-05 22:55:54 -08:00
Evan Lezar
772cf77dcc Fix build and test on darwin
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-05 23:58:28 +01:00
dependabot[bot]
026055a0b7 Bump github.com/urfave/cli/v2 from 2.3.0 to 2.27.1
Bumps [github.com/urfave/cli/v2](https://github.com/urfave/cli) from 2.3.0 to 2.27.1.
- [Release notes](https://github.com/urfave/cli/releases)
- [Changelog](https://github.com/urfave/cli/blob/main/docs/CHANGELOG.md)
- [Commits](https://github.com/urfave/cli/compare/v2.3.0...v2.27.1)

---
updated-dependencies:
- dependency-name: github.com/urfave/cli/v2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-05 15:04:18 +00:00
Evan Lezar
812e6a2402 Merge pull request #308 from NVIDIA/dependabot/go_modules/main/github.com/pelletier/go-toml-1.9.5
Bump github.com/pelletier/go-toml from 1.9.4 to 1.9.5
2024-02-05 16:03:36 +01:00
Evan Lezar
b56aebb26f Merge pull request #310 from NVIDIA/dependabot/go_modules/main/golang.org/x/sys-0.16.0
Bump golang.org/x/sys from 0.7.0 to 0.16.0
2024-02-05 10:39:58 +01:00
dependabot[bot]
870903e03e Bump golang.org/x/sys from 0.7.0 to 0.16.0
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.7.0 to 0.16.0.
- [Commits](https://github.com/golang/sys/compare/v0.7.0...v0.16.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-05 09:33:36 +00:00
dependabot[bot]
b233a8b6ba Bump github.com/pelletier/go-toml from 1.9.4 to 1.9.5
Bumps [github.com/pelletier/go-toml](https://github.com/pelletier/go-toml) from 1.9.4 to 1.9.5.
- [Release notes](https://github.com/pelletier/go-toml/releases)
- [Changelog](https://github.com/pelletier/go-toml/blob/v2/.goreleaser.yaml)
- [Commits](https://github.com/pelletier/go-toml/compare/v1.9.4...v1.9.5)

---
updated-dependencies:
- dependency-name: github.com/pelletier/go-toml
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-05 09:32:35 +00:00
Evan Lezar
e96e1baed5 Merge pull request #336 from NVIDIA/dependabot/go_modules/main/github.com/fsnotify/fsnotify-1.7.0
Bump github.com/fsnotify/fsnotify from 1.5.4 to 1.7.0
2024-02-05 10:31:45 +01:00
dependabot[bot]
dce368e308 Bump github.com/fsnotify/fsnotify from 1.5.4 to 1.7.0
Bumps [github.com/fsnotify/fsnotify](https://github.com/fsnotify/fsnotify) from 1.5.4 to 1.7.0.
- [Release notes](https://github.com/fsnotify/fsnotify/releases)
- [Changelog](https://github.com/fsnotify/fsnotify/blob/main/CHANGELOG.md)
- [Commits](https://github.com/fsnotify/fsnotify/compare/v1.5.4...v1.7.0)

---
updated-dependencies:
- dependency-name: github.com/fsnotify/fsnotify
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-04 08:26:11 +00:00
Evan Lezar
15f609a52d Merge pull request #332 from elezar/filter-actions
Enable a subset of package and image builds in PRs
2024-02-02 11:34:10 +01:00
Evan Lezar
0bf08085ce Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-02 09:21:54 +01:00
Evan Lezar
da68ad393c Enable subset of image builds in PRs
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-02 09:20:05 +01:00
Evan Lezar
2f3600af9a Merge pull request #329 from elezar/migrate-libnvidia-container-to-github
Update libnvidia-container to github ref
2024-02-01 17:08:55 +01:00
Evan Lezar
0ff28aa21b Update libnvidia-container to github ref
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-02-01 16:36:10 +01:00
Evan Lezar
b88ff4470c Merge pull request #328 from NVIDIA/revert-323-sheckler/cuda-12.3.2
Revert "chore: Update CUDA base image to 12.3.2"
2024-02-01 16:28:59 +01:00
Evan Lezar
cfb1daee0a Revert "chore: Update CUDA base image to 12.3.2" 2024-02-01 16:27:53 +01:00
Evan Lezar
e3ab55beed Merge pull request #323 from heckler1/sheckler/cuda-12.3.2
chore: Update CUDA base image to 12.3.2
2024-02-01 14:02:33 +01:00
Evan Lezar
9530d9949f Merge pull request #325 from elezar/remove-jenkinsfile
Remove unneeded Jenkinsfile
2024-02-01 08:41:41 +01:00
Evan Lezar
6b2cd487a6 Remove unneeded Jenkinsfile
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-31 22:05:01 +01:00
Stephen Heckler
e5ec408a5c Release 1.15.0-rc4
Signed-off-by: Stephen Heckler <sheckler@cloudflare.com>
2024-01-31 13:49:31 -06:00
Stephen Heckler
301b666790 chore: Update CUDA base image to 12.3.2
Signed-off-by: Stephen Heckler <sheckler@cloudflare.com>
2024-01-31 13:48:05 -06:00
Evan Lezar
e99b519509 Merge pull request #321 from elezar/bump-libnvidia-container
Update libnvidia-container
2024-01-30 16:44:01 +01:00
Evan Lezar
d123273800 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-30 16:11:57 +01:00
Evan Lezar
07136d9ac4 Merge pull request #312 from NVIDIA/dependabot/go_modules/main/golang.org/x/mod-0.14.0
Bump golang.org/x/mod from 0.5.0 to 0.14.0
2024-01-30 16:04:48 +01:00
Evan Lezar
0ef06be477 Merge pull request #320 from elezar/remove-centos8-jobs
Remove centos8 jobs
2024-01-30 16:03:30 +01:00
Evan Lezar
5a70e75547 Use stable/rpm repo for release tests
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-30 12:40:57 +01:00
Evan Lezar
46b4cd7b03 Remove unused centos8 jobs
This change removes the centos8-x86_64 and centos8-aarch64 pipeline jobs.

These packages are no longer used since centos7 packages are used instead.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-30 12:40:16 +01:00
Evan Lezar
93e15bc641 Merge pull request #319 from elezar/bump-version-v1.15.0-rc.3
Bump version to v1.15.0-rc.3
2024-01-30 11:41:17 +01:00
Evan Lezar
07d1f48778 Bump version to v1.15.0-rc.3
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-30 11:18:17 +01:00
Carlos Eduardo Arango Gutierrez
21ed60bc46 Merge pull request #313 from elezar/fix-update-ldconfig
Fix bug in update-ldcache hook
2024-01-29 18:24:33 +01:00
dependabot[bot]
7abbd98ff0 Bump golang.org/x/mod from 0.5.0 to 0.14.0
Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.5.0 to 0.14.0.
- [Commits](https://github.com/golang/mod/compare/v0.5.0...v0.14.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-01-29 14:20:35 +00:00
Evan Lezar
862f071557 Fix bug in update-ldcache hook
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-29 15:20:21 +01:00
Carlos Eduardo Arango Gutierrez
73cd63e4e5 Merge pull request #307 from ArangoGutierrez/githubactions
Add GitHub Actions
2024-01-29 15:18:32 +01:00
Carlos Eduardo Arango Gutierrez
6857f538a6 Add github actions
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-01-29 14:32:02 +01:00
Carlos Eduardo Arango Gutierrez
195e3a22b4 Drop go checks from gitlab ci
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-01-28 18:23:20 +01:00
Carlos Eduardo Arango Gutierrez
88debb8e34 Use test-infra devel image
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-01-28 18:22:51 +01:00
Christopher Desiniotis
03cbf9c6cd Merge branch 'CNT-4798/gdrcopy' into 'main'
Add a new gated modifier for GDRCopy which injects the gdrdrv device node

See merge request nvidia/container-toolkit/container-toolkit!530
2024-01-24 23:17:15 +00:00
Christopher Desiniotis
55097b3d7d Add a new gated modifier for GDRCopy which injects the gdrdrv device node
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2024-01-24 14:25:58 -08:00
Evan Lezar
738ebd83d3 Merge branch 'fix-cdi-enable-docker' into 'main'
Fix --cdi.enabled for Docker

See merge request nvidia/container-toolkit/container-toolkit!541
2024-01-23 20:16:12 +00:00
Tariq Ibrahim
9c6dd94ac8 Merge branch 'disc-typos' into 'main'
fix minor typos and rm unused logger param

See merge request nvidia/container-toolkit/container-toolkit!540
2024-01-23 18:42:21 +00:00
Evan Lezar
f936f4c0bc Add --cdi.enable as alias for --cdi.enabled
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-23 14:57:15 +01:00
Evan Lezar
ab598f004d Fix --cdi.enabled for Docker
Instead of relying only on Experimental mode, the docker daemon
config requires that CDI is an opt-in feature.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-23 14:56:08 +01:00
Tariq Ibrahim
9c1f0bb08b fix minor typos and rm unused logger param
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
2024-01-22 16:48:11 -08:00
Evan Lezar
b3519fadc4 Merge branch 'ldconfig-path' into 'main'
Allow for customizing the path to ldconfig and call ldconfig with explicit cache/config paths

See merge request nvidia/container-toolkit/container-toolkit!525
2024-01-22 12:36:51 +00:00
Jared Baur
d80657dd0a Explicitly set ldconfig cache and config file
Since the `update-ldcache` hook uses the host's `ldconfig`, the default
cache and config files configured on the host will be used. If those
defaults differ from what nvidia-ctk expects it to be (/etc/ld.so.cache
and /etc/ld.so.conf, respectively), then the hook will fail. This change
makes the call to ldconfig explicit in which cache and config files are
being used.

Signed-off-by: Jared Baur <jaredbaur@fastmail.com>
2024-01-18 02:23:27 -08:00
Jared Baur
838493b8b9 Allow for customizing the path to ldconfig
Since the `createContainer` `runc` hook runs with the environment that
the container's config.json specifies, the path to `ldconfig` may not be
easily resolvable if the host environment differs enough from the
container (e.g. on a NixOS host where all binaries are under hashed
paths in /nix/store with an Ubuntu container whose PATH contains
FHS-style paths such as /bin and /usr/bin). This change allows for
specifying exactly where ldconfig comes from.

Signed-off-by: Jared Baur <jaredbaur@fastmail.com>
2024-01-17 21:07:00 -08:00
Evan Lezar
26a4eb327c Merge branch 'add-crun-as-configured-runtime' into 'main'
Set default low-level runtimes to runc, crun

See merge request nvidia/container-toolkit/container-toolkit!536
2024-01-17 21:28:14 +00:00
Evan Lezar
f6c252cbde Add crun as a default low-level runtime.
This change adds crun as a configured low-level runtime.
Note that runc still preferred and will be used if present on the
system.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-17 11:31:07 +01:00
Evan Lezar
11692a8499 Merge branch 'bump-cuda-12.3.1' into 'main'
Bump CUDA base image to 12.3.1

See merge request nvidia/container-toolkit/container-toolkit!535
2024-01-11 14:03:32 +00:00
Evan Lezar
ba3d80e8ea Merge branch 'fix-user-group' into 'main'
Fix bug in determining CLI user on SUSE systems

See merge request nvidia/container-toolkit/container-toolkit!532
2024-01-11 13:19:25 +00:00
Evan Lezar
9c029cac72 Fix bug in determining CLI user on SUSE systems
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-11 13:54:40 +01:00
Evan Lezar
dd065fa69e Bump CUDA base image to 12.3.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-11 10:35:54 +01:00
Evan Lezar
6f3d9307bb Merge branch 'update-libnvidia-container' into 'main'
Update libnvidia-container to a4ef85eb

See merge request nvidia/container-toolkit/container-toolkit!533
2024-01-10 14:14:39 +00:00
Evan Lezar
72584cd863 Update libnvidia-container to a4ef85eb
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-10 14:29:03 +01:00
Evan Lezar
2a7bfcd36b Merge branch 'fix-mid-device-nodes' into 'main'
Use devRoot to resolve MIG device nodes

See merge request nvidia/container-toolkit/container-toolkit!526
2024-01-09 14:58:19 +00:00
Evan Lezar
21fc1f24e4 Use devRoot to resolve MIG device nodes
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-09 15:40:17 +01:00
Evan Lezar
9396858834 Merge branch 'libnvdxgdmal' into 'main'
Add libnvdxgdmal library

See merge request nvidia/container-toolkit/container-toolkit!529
2024-01-09 14:37:08 +00:00
Jakub Bujak
79acd7acff Add libnvdxgdmal library
This change adds the new libnvdxgdmal.so.1 library to the list of files copied from the DriverStore.

Signed-off-by: Jakub Bujak <jbujak@nvidia.com>
2024-01-09 15:29:55 +01:00
Evan Lezar
fab711ddf3 Merge branch 'remove-libseccomp-dependency' into 'main'
Remove libseccomp package dependency

See merge request nvidia/container-toolkit/container-toolkit!531
2024-01-09 09:44:23 +00:00
Evan Lezar
760cf93317 Remove libseccomp package dependency
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2024-01-09 10:02:06 +01:00
Evan Lezar
f4838dde9b Merge branch 'log-requested-mode' into 'main'
Log explicitly requested runtime mode

See merge request nvidia/container-toolkit/container-toolkit!527
2024-01-08 11:42:42 +00:00
Evan Lezar
c90211e070 Log explicitly requested runtime mode
For users running the nvidia-container-runtime it would be useful
to determine the runtime mode used from the logs directly instead
of relying on other log messages as signals. This change ensures
that an explicitly selected mode is also logged instead of only
when mode=auto.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-12-15 15:35:35 +01:00
Evan Lezar
a2262d00cc Merge branch 'tegra-chardev-discover' into 'main'
Use `devRoot` for discovering character devices on Tegra platforms

See merge request nvidia/container-toolkit/container-toolkit!524
2023-12-15 14:34:07 +00:00
Jared Baur
95b8ebc297 Use devRoot for discovering character devices on Tegra platforms
Signed-off-by: Jared Baur <jaredbaur@fastmail.com>
2023-12-14 11:46:21 -08:00
Evan Lezar
99b3050d20 Merge branch 'update-changelog' into 'main'
Update changelog

See merge request nvidia/container-toolkit/container-toolkit!523
2023-12-14 15:35:45 +00:00
Evan Lezar
883f7ec3d8 Update changelog
See merge request nvidia/container-toolkit/container-toolkit!522

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-12-14 13:47:38 +01:00
Evan Lezar
9dd324be9c Merge branch 'tegra-dev-root' into 'main'
Fix using `devRoot` on Tegra platforms

See merge request nvidia/container-toolkit/container-toolkit!522
2023-12-14 09:46:05 +00:00
Jared Baur
508438a0c5 Fix using devRoot on Tegra platforms
Using `WithDevRoot` on Tegra platforms was incorrectly setting
`driverRoot`, fix it so that it correctly sets `devRoot`.

Signed-off-by: Jared Baur <jaredbaur@fastmail.com>
2023-12-13 19:56:02 -08:00
Christopher Desiniotis
9baed635d1 Merge branch 'CNT-4778/bump-gonvlib' into 'main'
Update to github.com/NVIDIA/go-nvlib@f3264c8a6a7a

See merge request nvidia/container-toolkit/container-toolkit!520
2023-12-13 18:11:59 +00:00
Christopher Desiniotis
895a5ed73a Update to github.com/NVIDIA/go-nvlib@f3264c8a6a7a
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-13 10:08:14 -08:00
Christopher Desiniotis
2d7b126bc9 Merge branch 'CNT-4762/extend-runtime-cdi-device-names' into 'main'
Extend the 'runtime.nvidia.com/gpu' CDI device kind to support full-GPUs...

See merge request nvidia/container-toolkit/container-toolkit!514
2023-12-06 17:10:26 +00:00
Christopher Desiniotis
86d86395ea Update changelog for the automatic CDI spec generation added for the 'runtime.nvidia.com/gpu' CDI kind
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-06 09:09:10 -08:00
Christopher Desiniotis
32c3bd1ded Fallback to standard CDI modifier when creation of automatic CDI modifier fails
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-06 09:02:19 -08:00
Christopher Desiniotis
3158146946 Extend the 'runtime.nvidia.com/gpu' CDI device kind to support MIG devices specified by index or UUID
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-06 09:02:19 -08:00
Christopher Desiniotis
def7d09f85 Refactor how device identifiers are parsed before performing automatic CDI spec generation
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-06 09:02:19 -08:00
Christopher Desiniotis
b9ac54b922 Add GetDeviceSpecsByID() API to the nvcdi Interface
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-06 09:02:19 -08:00
Christopher Desiniotis
ae1b7e126c Extend the 'runtime.nvidia.com/gpu' CDI device kind to support full-GPUs specified by index or UUID
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-06 09:02:19 -08:00
Evan Lezar
08ef3e7969 Merge branch 'bump-version-1.15.0-rc.2' into 'main'
Bump version to v1.15.0-rc.2

See merge request nvidia/container-toolkit/container-toolkit!519
2023-12-06 16:45:29 +00:00
Evan Lezar
ea977fb43e Bump version to v1.15.0-rc.2
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-12-06 17:45:08 +01:00
Evan Lezar
7b47eee634 Merge branch 'CNT-4774/implement-set-for-crio' into 'main'
Implement Set() for the crio implementation of engine.Interface

See merge request nvidia/container-toolkit/container-toolkit!517
2023-12-05 09:30:42 +00:00
Christopher Desiniotis
d7a3d93024 Implement Set() for the crio implementation of engine.Interface
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-04 15:27:35 -08:00
Christopher Desiniotis
527248ef5b Merge branch 'CNT-4764/cleanup-engine-interface' into 'main'
Refactor the engine.Interface such that the Set() API does not return an extraneous error

See merge request nvidia/container-toolkit/container-toolkit!515
2023-12-04 23:05:30 +00:00
Christopher Desiniotis
83ad09b179 Refactor the engine.Interface such that the Set() API does not return an extraneous error
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-12-01 15:59:34 -08:00
Evan Lezar
ffe7ed313a Merge branch 'goimports-local' into 'main'
run goimports -local against the entire codebase

See merge request nvidia/container-toolkit/container-toolkit!512
2023-12-01 10:54:26 +00:00
Tariq Ibrahim
7627d48a5c run goimports -local against the entire codebase
Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-12-01 11:13:17 +01:00
Evan Lezar
5c78e2b7e6 Merge branch 'CNT-4659/transform-container-roots' into 'main'
Add transformer for container roots

See merge request nvidia/container-toolkit/container-toolkit!507
2023-12-01 09:38:39 +00:00
Evan Lezar
bc4e19aa48 Add --relative-to option to nvidia-ctk transform root
This change adds a --relative-to option to the nvidia-ctk transform root
command. This defaults to "host" maintaining the existing behaviour.

If --relative-to=container is specified, the root transform is applied to
container paths in the CDI specification instead of host paths.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-30 20:26:42 +01:00
Evan Lezar
879cc99aac Add transformer for container roots
This change renames the root transformer to indicate that it
operates on host paths and adds a container root transformer for
explicitly transforming container roots.

The transform.NewRootTransformer constructor still exists, but has
been marked as deprecated.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-30 20:26:42 +01:00
Evan Lezar
aa72dcde97 Merge branch 'golangci-lint-on-darwin' into 'main'
Allow make check to run on non-linux platforms

See merge request nvidia/container-toolkit/container-toolkit!509
2023-11-27 13:51:08 +00:00
Evan Lezar
a545810981 Allow make check to run on non-linux platforms
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-27 14:10:34 +01:00
Evan Lezar
cff50aa5d6 Merge branch 'bump-version-1.15.0-rc.1' into 'main'
Bump version to v1.15.0-rc.1

See merge request nvidia/container-toolkit/container-toolkit!508
2023-11-27 13:06:31 +00:00
Evan Lezar
84d857b497 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-27 13:22:52 +01:00
Evan Lezar
7840e7d650 Bump version to v1.15.0-rc.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-27 11:49:54 +01:00
Evan Lezar
c014f72ffb Merge branch 'fix-ldconfig-update' into 'main'
Fix incorrect ldconfig path

See merge request nvidia/container-toolkit/container-toolkit!505
2023-11-24 10:49:23 +00:00
Evan Lezar
893b3c1824 Fix incorrect ldconfig path
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-24 11:03:51 +01:00
Evan Lezar
097e203f1f Merge branch 'fix-config-update-command' into 'main'
Switch to reflect package for config updates

See merge request nvidia/container-toolkit/container-toolkit!500
2023-11-23 12:35:09 +00:00
Evan Lezar
671d787a42 Switch to reflect package for config updates
This change switches to using the reflect package to determine
the type of config options instead of inferring the type from the
Toml data structure.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-23 10:29:38 +01:00
Evan Lezar
fcc9922133 Merge branch 'CNT-4761/enable-cdi-in-docker' into 'main'
Add option to nvidia-ctk to enable CDI in docker

See merge request nvidia/container-toolkit/container-toolkit!499
2023-11-23 09:16:33 +00:00
Christopher Desiniotis
64fb26b086 Add option to nvidia-ctk to enable CDI in docker
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-23 10:15:58 +01:00
Evan Lezar
16a4de1a2b Merge branch 'CNT-4645/add-nvswitch-devices' into 'main'
Add support for NVIDIA_NVSWITCH envvar to inject nvidia-nvswitch device nodes

See merge request nvidia/container-toolkit/container-toolkit!502
2023-11-22 21:00:54 +00:00
Evan Lezar
efae501834 Add support for injecting NVSWITCH devices
This change adds support for an NVIDIA_NVSWITCH environment variable.
When set to `enabled` this striggers the injection of all available
/dev/nvidia-nvswitch* device nodes.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-22 21:59:39 +01:00
Evan Lezar
3045954cd9 Consolidate GDS and MOFED modifiers
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-22 21:59:17 +01:00
Evan Lezar
886c6b973e Merge branch 'library-search-path-cdi-generate' into 'main'
Locate libnvidia-egl-gbm.so.*

See merge request nvidia/container-toolkit/container-toolkit!504
2023-11-22 20:58:55 +00:00
Evan Lezar
1ab3ef0af4 Locate libnvidia-egl-gbm.so.*
Searching for a pattern allows platforms where no `.so` symlink
exists to function as expected.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-22 21:57:36 +01:00
Evan Lezar
dd9b13cb58 Merge branch 'bump-changelog' into 'main'
Add missing changelog

See merge request nvidia/container-toolkit/container-toolkit!503
2023-11-22 20:46:09 +00:00
Evan Lezar
8a7a6e8a70 Add missing changelog
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-22 20:51:12 +01:00
Evan Lezar
1909b1fe60 Merge branch 'library-search-path-cdi-generate' into 'main'
Allow search paths when locating libcuda.so

See merge request nvidia/container-toolkit/container-toolkit!462
2023-11-22 19:49:15 +00:00
Evan Lezar
881e440d22 Merge branch 'include-nvoptix' into 'main'
Update list of graphics mounts

See merge request nvidia/container-toolkit/container-toolkit!501
2023-11-22 19:47:52 +00:00
Evan Lezar
7d79b311d8 Include vulkan/icd.d/nvidia_layers.json
This change includes vulkan/icd.d/nvidia_layers.json in the list of
possible graphics mounts.
2023-11-22 13:54:12 +01:00
Evan Lezar
b46bc10c44 Include nvidia/nvoptix.bin in graphics mounts
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-22 13:53:59 +01:00
Evan Lezar
bbd9222206 Add driver root abstraction
This change adds a driver root abstraction that defines how
libraries are located relative to the root. This allows for
this driver root to be constructed once and passed to discovery
code.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-22 13:27:48 +01:00
Evan Lezar
f20ab793a2 Add support for specifying search paths
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-22 13:27:47 +01:00
Evan Lezar
e5391760e6 Remove duplicate not found error
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-21 18:09:42 +01:00
Evan Lezar
5505886655 Use options for NewLibraryLocator
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-21 18:08:53 +01:00
Evan Lezar
64f554ef41 Add builder for file locator
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-21 18:07:47 +01:00
Evan Lezar
fc8c5f82dc Merge branch 'fix-ldconfig-resolution' into 'main'
Resolve LDConfig path

See merge request nvidia/container-toolkit/container-toolkit!490
2023-11-21 16:45:21 +00:00
Evan Lezar
d792e64f38 Resolve ldconfig path in update-ldcache hook
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-21 15:31:12 +01:00
Evan Lezar
232df647c1 Resolve LDConfig path passed to nvidia-container-cli
Instead of relying solely on a static config, we resolve the path
to ldconfig. The path is checked for existence and a .real suffix is preferred.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-21 15:31:12 +01:00
Evan Lezar
adc516fd59 Merge branch 'ctk-hook-chmod-improve-eperm-handling' into 'main'
nvidia-ctk hook chmod: Improve permission error handling

See merge request nvidia/container-toolkit/container-toolkit!496
2023-11-21 11:05:03 +00:00
Evan Lezar
039d7fd324 Merge branch 'remove-config-import-from-discover' into 'main'
Remove NewGraphicsDiscoverer API simplification

See merge request nvidia/container-toolkit/container-toolkit!498
2023-11-20 22:52:02 +00:00
Christopher Desiniotis
2768023ff5 Merge branch 'gen-cdi-spec-at-runtime' into 'main'
Automatically generate CDI spec for the runtime.nvidia.com/gpu=all device

See merge request nvidia/container-toolkit/container-toolkit!492
2023-11-20 22:15:05 +00:00
Evan Lezar
255181a5ff Rename NewGraphicsDiscoverer as NewDRMNodesDiscoverer
This change renames NewGraphicsDiscoverer to NewDRMNodesDiscoverer and
instead calls NewGraphicsMountsDiscoverer explicitly when constructing
a graphics modifier.

This avoids the import of config.Config into the discover package
which leads to a transitive dependency on toml-specifics and
requires that the vendor/github.com/pelletier/ package
be vendored in to consumers.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-20 23:10:57 +01:00
Christopher Desiniotis
dc36ea76e8 Automatically generate CDI spec for the runtime.nvidia.com/gpu=all device
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-11-20 13:35:07 -08:00
Evan Lezar
e315d7d74b Merge branch 'allow-separate-dev-root' into 'main'
Add devRoot option to CDI api

See merge request nvidia/container-toolkit/container-toolkit!497
2023-11-20 21:10:12 +00:00
Evan Lezar
b4c6832828 Add additional debug
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-20 21:29:35 +01:00
Evan Lezar
3a96a00362 Simplify meta device discovery
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-20 21:29:35 +01:00
Evan Lezar
00a712d018 Add --dev-root option to CDI spec generation
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-20 21:29:35 +01:00
Evan Lezar
d4e21fdd10 Add devRoot option to CDI api
A driverRoot defines both the driver library root and the
root for device nodes. In the case of preinstalled drivers or
the driver container, these are equal, but in cases such as GKE
they do not match. In this case, drivers are extracted to a folder
and devices exist at the root /.

The changes here add a devRoot option to the nvcdi API that allows the
parent of /dev to be specified explicitly.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-20 21:29:35 +01:00
Ievgen Popovych
9085cb7dd5 nvidia-ctk hook chmod: Move file mode parsing into flag validation function
Signed-off-by: Ievgen Popovych <jmennius@gmail.com>
2023-11-20 14:49:29 +02:00
Evan Lezar
f6e3593a72 Merge branch 'update-go-nvlib' into 'main'
Update to github.com/NVIDIA/go-nvlib@9fd385bace0d2b8949cf60d9fcaab6169bde87ef

See merge request nvidia/container-toolkit/container-toolkit!495
2023-11-20 11:25:25 +00:00
Evan Lezar
f2ef7ee661 Update to github.com/NVIDIA/go-nvlib@9fd385bace0d2b8949cf60d9fcaab6169bde87ef
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-20 12:25:08 +01:00
Evan Lezar
27777f4dab Merge branch 'bump-libnvidia-container' into 'main'
Update libnvidia-container

See merge request nvidia/container-toolkit/container-toolkit!494
2023-11-20 11:11:15 +00:00
Evan Lezar
34175f15d3 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-20 10:57:35 +01:00
Ievgen Popovych
eb35d9b30a nvidia-ctk hook chmod: Ignore permission errors
In some cases we might get a permission error trying to chmod -
most likely this is due to something beyond our control
like whole `/dev` being mounted.
Do not fail container creation in this case.

Due to loosing control of the program after `exec()`-ing `chmod(1)` program
and therefore not being able to handle errors -
refactor to use `chmod(2)` syscall instead of `exec()` `chmod(1)` program.

Fixes: #143
Signed-off-by: Ievgen Popovych <jmennius@gmail.com>
2023-11-20 01:29:51 +02:00
Ievgen Popovych
f1d32f2cd3 nvidia-ctk hook chmod: Only chmod if desired permissions are different
This is to avoid any unnecessary potential errors (e.g. due to permissions).

Fixes: #143
Signed-off-by: Ievgen Popovych <jmennius@gmail.com>
2023-11-20 01:18:36 +02:00
Evan Lezar
ee713adf33 Merge branch 'fix-update-ldcache' into 'main'
Allow ldcache update in container to be skipped

See merge request nvidia/container-toolkit/container-toolkit!485
2023-11-18 11:37:56 +00:00
Christopher Desiniotis
33cb1b68df Merge branch 'deduplicate-symlinks' into 'main'
Deduplicate symlinks

See merge request nvidia/container-toolkit/container-toolkit!493
2023-11-17 17:09:18 +00:00
Evan Lezar
6dc9ee3f33 Allow ldcache update in container to be skipped
This change skips the update of ld.cache in the container if it
doesn't exist. Instead, the -N flag is used to only create the
relevant symlinks.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-17 11:56:19 +01:00
Evan Lezar
e609e41a64 Allow multiple pattern matches for symlinks
Since we allow pattern inputs for locating symlinks we could have
multiples. The error being checked is resolved by the deduplication.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-17 10:43:52 +01:00
Evan Lezar
80ecd024ee Add tests for library locator
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-17 10:43:52 +01:00
Evan Lezar
e8dbb216a5 Return empty ldcache if cache does not exist
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-17 10:14:03 +01:00
Christopher Desiniotis
f5d8d248b7 Deduplicate symlinks
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-11-16 17:57:31 -08:00
Evan Lezar
5d7ee25b37 Merge branch 'migrate-go-nvlib' into 'main'
Use github.com/NVIDIA/go-nvlib imports

See merge request nvidia/container-toolkit/container-toolkit!491
2023-11-15 20:39:35 +00:00
Evan Lezar
2ff2d84283 Update vendoring
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-15 21:38:54 +01:00
Evan Lezar
c63fb35ba8 Use github.com/NVIDIA/go-nvlib imports
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-15 21:38:26 +01:00
Evan Lezar
da0755769f Merge branch 'improve-library-lookup' into 'main'
Make CDI-based library discovery more robust

See merge request nvidia/container-toolkit/container-toolkit!488
2023-11-06 18:17:25 +00:00
Evan Lezar
04b28d116c Make library lookups more robust
These changes make library lookups more robust. The core change is that
library lookups now first look a set of predefined locations before checking
the ldcache. This also handles cases where an ldcache is not available more
gracefully.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-06 12:15:28 -06:00
Evan Lezar
65b0b2b5e0 Merge branch 'make-graphics-optional' into 'main'
Make discovery of graphics libraries optional

See merge request nvidia/container-toolkit/container-toolkit!489
2023-11-03 21:22:47 +00:00
Evan Lezar
8d52cc18ce Make discovery of graphics libraries optional
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-03 22:15:41 +01:00
Evan Lezar
c25376afa0 Merge branch 'update-cdi' into 'main'
Use tags.cncf.io for CDI imports

See merge request nvidia/container-toolkit/container-toolkit!487
2023-11-02 09:14:30 +00:00
Evan Lezar
7cc0c1f1cf Run go mod tidy
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-01 12:42:07 +01:00
Evan Lezar
e56bb09889 Use tags.cncf.io for CDI imports
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-11-01 12:40:51 +01:00
Evan Lezar
c7a7083e64 Merge branch 'add-option-to-enable-cdi' into 'main'
Add cdi.enabled config option to nvidia-ctk runtime configure command

See merge request nvidia/container-toolkit/container-toolkit!486
2023-11-01 09:54:19 +00:00
Evan Lezar
61595aa0fa Add cdi.enabled option to runtime configure
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-31 17:23:55 +01:00
Evan Lezar
b8b134f389 Ensure git works in container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-31 15:25:41 +01:00
Evan Lezar
c5a9ed6594 Merge branch 'support-cdi-mount-devices' into 'main'
Support CDI devices as mounts

See merge request nvidia/container-toolkit/container-toolkit!480
2023-10-27 20:04:46 +00:00
Evan Lezar
833254fa59 Support CDI devices as mounts
This change allows CDI devices to be requested as mounts in the
container. This enables their use in environments such as kind
where environment variables or annotations cannot be used.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-27 21:24:53 +02:00
Evan Lezar
1b1aae9c4a Merge branch 'golangci-lint' into 'main'
Use golanglint-ci to check code in NVIDIA Container Toolkit

See merge request nvidia/container-toolkit/container-toolkit!474
2023-10-24 18:59:46 +00:00
Evan Lezar
acc50969dc Fix ifElseChain lint errors
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:11:34 +02:00
Evan Lezar
48d68e4eff Add nolint for exec calls
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:11:34 +02:00
Evan Lezar
709e27bf4b Fix implicit memory aliasing in for loop
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:11:34 +02:00
Evan Lezar
1b16b341dd Fix default permissions
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:11:34 +02:00
Evan Lezar
2e1f94aedf Fix append assignments
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:11:34 +02:00
Evan Lezar
2f48ab99c3 Address singleCaseSwitch errors
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:11:34 +02:00
Evan Lezar
f8870b31be Fix filepath.Join with single arg
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:11:34 +02:00
Evan Lezar
73857eb8e3 Fix unnecessary conversion
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:11:34 +02:00
Evan Lezar
dd2f218226 Use MustCompile for static regexp
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:00:24 +02:00
Evan Lezar
8a9f367067 Check returned error values
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:00:24 +02:00
Evan Lezar
e0df157f70 Remove unnecessary assignment to the blank identifier
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:00:24 +02:00
Evan Lezar
f2c9937ca8 Use cdi parser package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:00:24 +02:00
Evan Lezar
12dc12ce09 Fix misspellings
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:00:24 +02:00
Evan Lezar
2fad708556 Address ioutil deprecation
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:00:24 +02:00
Evan Lezar
73749285d5 Remove unused loadSaver interface
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:00:24 +02:00
Evan Lezar
49dbae5c32 Use .golangci config for toml.Delete issues
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:00:24 +02:00
Evan Lezar
d8d56e18f9 Add nolint for dxcore
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:00:24 +02:00
Evan Lezar
7f610d19ed Add .golangci.yml config
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:00:24 +02:00
Evan Lezar
3eca7dfd7b Replace check targets with golangci-lint
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-24 20:00:24 +02:00
Evan Lezar
b02d5538b0 Merge branch 'update-container-runtime-readme' into 'main'
Update nvidia-container-runtime README

See merge request nvidia/container-toolkit/container-toolkit!484
2023-10-24 13:05:12 +00:00
Evan Lezar
ebff62f56b Update nvidia-container-runtime README
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-23 13:35:19 +02:00
Evan Lezar
53b24618a5 Merge branch 'bump-cuda-version' into 'main'
Add missing changelog entry

See merge request nvidia/container-toolkit/container-toolkit!483
2023-10-19 11:22:22 +00:00
Evan Lezar
867151fe25 Bump version to v1.14.3
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-19 13:20:23 +02:00
Evan Lezar
82fc309c4e Merge branch 'bump-cuda-version' into 'main'
Bump version to v1.14.3

See merge request nvidia/container-toolkit/container-toolkit!482
2023-10-19 09:32:30 +00:00
Evan Lezar
27521c0b5e Bump CUDA base image version to 12.2.2
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-19 10:47:13 +02:00
Evan Lezar
e611d4403b Bump version to v1.14.3
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-10-19 10:46:46 +02:00
Evan Lezar
807c87e057 Merge branch 'fix-symlink-discovery' into 'main'
Fix bug in creating symlinks in containers on Tegra-based systems

See merge request nvidia/container-toolkit/container-toolkit!479
2023-09-25 10:01:08 +00:00
Evan Lezar
f63ad3d9e7 Refactor symlink filter
This change refactors the use of the symlink filter to make it extendible.
A blocked filter can be set on the Tegra CSV discoverer to ensure that the correct
symlink libraries are filtered out. Here, globs can be used to select mulitple libraries,
and a **/ prefix on the globs indicates that the pattern that follows is only applied to
the filename of the symlink entry in the CSV file.

A --csv.ignore-pattern command line argument is added to the nvidia-ctk cdi generate
command that allows this to be set.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-09-22 22:04:06 +02:00
Evan Lezar
c4b4478d1a Remove default symlink filter
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-09-22 22:02:51 +02:00
Evan Lezar
963250a58f Refactor CSV discovery for testability
This change improves the testibility of the CSV discoverer.
This is done by adding injection points for mocks for library discovery and
symlink resolution.

Note that this highlights a bug in the current implementation where the
library filter causes valid symlinks to be skipped.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-09-22 22:02:30 +02:00
Evan Lezar
be570fce65 Bump version to 1.14.2
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-09-22 22:02:28 +02:00
Evan Lezar
6094effd58 Merge branch 'debug-no-cgroups' into 'main'
Update libnvidia-container

See merge request nvidia/container-toolkit/container-toolkit!478
2023-09-07 15:55:06 +00:00
Evan Lezar
7187608a36 Update libnvidia-container 2023-09-07 15:55:06 +00:00
Evan Lezar
a54d9d2118 Merge branch 'debug-no-cgroups' into 'main'
Update libnvidia-container

See merge request nvidia/container-toolkit/container-toolkit!477
2023-09-07 15:36:23 +00:00
Evan Lezar
56dd69ff1c Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-09-07 17:35:48 +02:00
Evan Lezar
89240cecae Merge branch 'debug-no-cgroups' into 'main'
Add required option to new toml config

See merge request nvidia/container-toolkit/container-toolkit!476
2023-09-07 11:16:34 +00:00
Evan Lezar
4ec9bd751e Add required option to new toml config
This change adds a "required" option to the new toml config
that controls whether a default config is returned or not.
This is useful from the NVIDIA Container Runtime Hook, where
/run/driver/nvidia/etc/nvidia-container-runtime/config.toml
is checked before the standard path.

This fixes a bug where the default config was always applied
when this config was not used.

See https://github.com/NVIDIA/nvidia-container-toolkit/issues/106

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-09-07 11:56:01 +02:00
Evan Lezar
d74f7fef4e Update libnvidia-container to fix rpm builds
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-09-07 11:55:46 +02:00
Evan Lezar
538d4020df Bump version to v1.14.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-09-06 17:51:52 +02:00
Evan Lezar
f2bd3173d4 Merge branch 'bump-version-v1.14.0' into 'main'
Bump verison to v1.14.0

See merge request nvidia/container-toolkit/container-toolkit!475
2023-08-29 14:45:37 +00:00
Evan Lezar
2bf8017516 Bump verison to v1.14.0
Note that v1.14.0-rc.3 was an internal-only release.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-29 16:03:41 +02:00
Evan Lezar
2a3afdd5d9 Merge branch 'fix-platform-detection' into 'main'
Add UsesNVGPUModule info function

See merge request nvidia/container-toolkit/container-toolkit!473
2023-08-28 15:58:41 +00:00
Evan Lezar
1dc028cdf2 Add UsesNVGPUModule info function
This change adds a UsesNVGPUModule function that checks whether the nvgpu
kernel module is used by NVML. This allows for more robust detection of
Tegra-based platforms where libnvidia-ml.so is supported to enumerate the
iGPU.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-25 11:24:34 +02:00
Evan Lezar
72c56567fe Merge branch 'CNT-4496/remove/sys/devices/soc' into 'main'
Remove /sys/devices/soc0/family from CDI spec

See merge request nvidia/container-toolkit/container-toolkit!472
2023-08-25 08:26:15 +00:00
Evan Lezar
ca1055588d Remove /sys/devices/soc0/family from CDI spec
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-25 10:25:33 +02:00
Evan Lezar
fca30d7acc Merge branch 'fix-config-file' into 'main'
Properly create output for config file

See merge request nvidia/container-toolkit/container-toolkit!471
2023-08-23 08:55:58 +00:00
Evan Lezar
5bf2209fdb Properly create output for config file
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-23 09:41:46 +02:00
Evan Lezar
f86a5abeb6 Merge branch 'CNT-4478/fix-unknown-devices' into 'main'
Update go-nvlib dependency to  v0.0.0-20230818092907-09424fdc8884

See merge request nvidia/container-toolkit/container-toolkit!470
2023-08-21 09:05:58 +00:00
Evan Lezar
9ac313f551 Instantiate nvpci.Interface with logger
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-18 11:40:17 +02:00
Evan Lezar
546f810159 Update go-nvlib dependency to v0.0.0-20230818092907-09424fdc8884
This change updates go-nvlib to include logic to skip NVIDIA PCI-E
devices where the name or class id is not known.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-18 11:40:15 +02:00
Evan Lezar
7affdafcd3 Merge branch 'CNT-4286/set-nvidia-visible-devices-to-void' into 'main'
Set NVIDIA_VISIBLE_DEVICES=void in toolkit-container

See merge request nvidia/container-toolkit/container-toolkit!469
2023-08-15 10:36:09 +00:00
Evan Lezar
7221b6b24b Set NVIDIA_VISIBLE_DEVICES=void in toolkit-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-15 11:52:56 +02:00
Tariq Ibrahim
f904ec41eb Merge branch 'log-unresolved-devices' into 'main'
add a warning statement listing unresolved CDI devices

See merge request nvidia/container-toolkit/container-toolkit!461
2023-08-14 17:22:36 +00:00
Evan Lezar
e7ae0f183c Merge branch 'rename-library-search-path' into 'main'
Add library-search-path option to cdi generate

See merge request nvidia/container-toolkit/container-toolkit!468
2023-08-14 13:49:26 +00:00
Evan Lezar
86df7c6696 Add library-search-path option to cdi generate
This change renames the csv.library-search-path option to
library-search-path so as to be more generally applicable in
future. Note that the option is still only applied in csv mode.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-14 15:04:33 +02:00
Evan Lezar
99923b57b8 Merge branch 'add-config-set-command' into 'main'
Allow config options to be set usign the nvidia-ctk config command

See merge request nvidia/container-toolkit/container-toolkit!464
2023-08-14 11:18:57 +00:00
Evan Lezar
4addb292b1 Extend nvidia-ctk config command to allow options to be set
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-14 11:33:26 +02:00
Evan Lezar
149a8d7bd8 Simplify nvidia-ctk config default command
This chagne simplifies the nvidia-ctk config default command.
By default it now outputs the default config to STDOUT, and can
optionally output this to file.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-14 11:32:54 +02:00
Evan Lezar
a69657dde7 Add config.Toml type to handle config files
This change introduced a config.Toml type that is used as the base for
config file processing and manipulation. This ensures that configs --
including commented values -- can be handled consistently.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-14 11:32:54 +02:00
Evan Lezar
c2d4de54b0 Add function to get config file path. 2023-08-14 11:32:54 +02:00
Evan Lezar
5216e89a70 Merge branch 'refactor-hook-configs' into 'main'
Migrate to internal/config.Config structs for the NVIDIA Container Runtime Hook config.

See merge request nvidia/container-toolkit/container-toolkit!463
2023-08-14 09:23:43 +00:00
Evan Lezar
96766aa719 Remove BurntSushi/toml go dependency
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-14 10:40:42 +02:00
Evan Lezar
3670e7b89e Refactor loading of hook configs
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-14 10:40:42 +02:00
Evan Lezar
b18ac09f77 Refactor handling of DriverCapabilities
This change consolidates the handling of NVIDIA_DRIVER_CAPABILITIES in the
interal/image package.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-14 10:40:42 +02:00
Evan Lezar
4dcaa61167 Use internal/config structs in hook
This change ensures that the Config structs from internal.Config
are used for the NVIDIA Container Runtime Hook config too.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-14 10:40:41 +02:00
Evan Lezar
8bf52e1dec Export config.GetDefault function
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-14 10:35:33 +02:00
Evan Lezar
e4722e9642 Merge branch 'add-runtime-hook-config' into 'main'
Add support for creating oci hook to nvidia-ctk

See merge request nvidia/container-toolkit/container-toolkit!467
2023-08-14 08:33:56 +00:00
Evan Lezar
65f6f46846 Remove installation of oci-nvidia-hook files in RPM packages
This change removes installation of the oci-nvidia-hook files.
These files conflict with CDI use in runtimes that support it.

The use of the hook should be considered deprecated on these platforms.

If a hook is required, the

nvidia-ctk runtime configure --config-mode=oci-hook

command should be used to create the hook file(s).

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-11 16:34:58 +02:00
Evan Lezar
f6a4986c15 Add support for creating oci hook to nvidia-ctk
This change extends the nvidia-ctk runtime configure command
with a --config-mode=oci-hook that creates an OCI hook json file.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-11 16:34:58 +02:00
Tariq Ibrahim
6d3b29f3ca add a warning statement listing unresolved CDI devices 2023-08-10 08:38:33 -07:00
Evan Lezar
30c0848487 Merge branch 'fix-libnvidia-container0-url' into 'main'
Use stable repo URL directly

See merge request nvidia/container-toolkit/container-toolkit!465
2023-08-10 14:31:06 +00:00
Evan Lezar
ee1b0c3e4f Use stable repo URL directly
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-10 16:30:25 +02:00
Evan Lezar
37ac294a11 Merge branch 'add-deb-and-rpm-repos' into 'main'
Publish generic deb and rpm repos.

See merge request nvidia/container-toolkit/container-toolkit!460
2023-08-10 13:35:12 +00:00
Evan Lezar
0d862efa9c Publish generic deb and rpm repos.
This change ensures that the centos7 and ubuntu18.04 packages are
published to the generic rpm and deb repos, respectively.

All other packages except the centos8-ppc64le packages are skipped
as these use cases are covered by the generic packages.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-09 17:54:41 +02:00
Evan Lezar
22d7b52a58 Merge branch 'set-libnvidia-container-version' into 'main'
Set libnvidia-container version to toolkit version

See merge request nvidia/container-toolkit/container-toolkit!459
2023-08-09 12:10:36 +00:00
Evan Lezar
9f1c9b2a31 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-09 13:24:48 +02:00
Evan Lezar
0483eebc7b Set libnvidia-container version to toolkit version
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-09 13:24:08 +02:00
Evan Lezar
54aacb4245 Merge branch 'list-shows-errors' into 'main'
Log registry refresh errors in cdi list

See merge request nvidia/container-toolkit/container-toolkit!458
2023-08-08 15:26:42 +00:00
Evan Lezar
5cb367e771 Merge branch 'sort-cdi-entities' into 'main'
Sort CDI entities in generated CDI specifications

See merge request nvidia/container-toolkit/container-toolkit!457
2023-08-08 14:11:17 +00:00
Evan Lezar
feb069a2e9 Log registry refresh errors in cdi list
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-08 16:00:36 +02:00
Evan Lezar
cbdbcd87ff Add sorter to simplifying transformer
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-08 15:27:04 +02:00
Evan Lezar
7a4d2cff67 Add merged CDI spec transformer
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-08 14:45:31 +02:00
Evan Lezar
5638f47cb0 Add sort CDI spec transoformer
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-08 14:45:31 +02:00
Evan Lezar
4c513d536b Merge branch 'improve-csv-cdi-spec-generation' into 'main'
Rework CSV file support to enable more robust CDI spec generation

See merge request nvidia/container-toolkit/container-toolkit!447
2023-08-04 16:40:15 +00:00
Evan Lezar
8553fce68a Specify library search paths for CSV CDI spec generation
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-04 16:49:30 +02:00
Evan Lezar
03a4e2f8a9 Skip symlinks to libraries
In order to properly handle systems with both iGPU and dGPU
drivers included, we skip "sym" mount specifications which
refer to .so or .so.[1-9] files.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-04 16:49:30 +02:00
Evan Lezar
918bd03488 Move tegra-specifics to new package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-04 16:49:30 +02:00
Evan Lezar
01a7f7bb8e Explicitly generate CDI spec for CSV mode
This change explicitly generates a CDI specification from
the supplied CSV files when cdi mode is detected. This
ensures consistency between the behaviour on Tegra-based
systems.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-04 16:49:30 +02:00
Evan Lezar
6b48cbd1dc Move CDI modifier to separate package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-04 16:49:30 +02:00
Evan Lezar
64a0a67eb4 Merge branch 'bump-version' into 'main'
Bump version to 1.14.0-rc.3

See merge request nvidia/container-toolkit/container-toolkit!456
2023-08-04 14:16:56 +00:00
Evan Lezar
93d9e18f04 Update libnvidia-container to 1.14.0~rc.3
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-04 15:17:00 +02:00
Evan Lezar
7c2c42b8da Bump version to 1.14.0-rc.3
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-08-04 15:12:29 +02:00
Evan Lezar
e4fee325cb Merge branch 'fix-hook' into 'main'
Handle empty root in config

See merge request nvidia/container-toolkit/container-toolkit!454
2023-07-19 12:45:49 +00:00
Evan Lezar
ec63533eb1 Ensure default config comments are consistent
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-19 14:37:49 +02:00
Evan Lezar
e51621aa7f Handle empty root in config
If the config.toml has an empty root specified, this could be
passed to the NVIDIA Container CLI through the --root flag
which causes argument parsing to fail. This change only
adds the --root flag if the config option is specified
and is non-empty.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-19 14:02:23 +02:00
Evan Lezar
80a78e60d1 Merge branch 'device-namer' into 'main'
Refactor device namer

See merge request nvidia/container-toolkit/container-toolkit!453
2023-07-18 14:16:01 +00:00
Evan Lezar
9f46c34587 Support device name strategies for Tegra devices
This change generates CDI specifications for Tegra devices
with the nvidia.com/gpu=0 name by default. The type-index
nameing strategy is also supported and will generate a device
with the name nvidia.com/gpu=gpu0.

The uuid naming strategy will raise an error if selected.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-18 16:13:38 +02:00
Evan Lezar
f07a0585fc Refactor device namer
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-18 16:13:37 +02:00
Evan Lezar
32ec10485e Merge branch 'lookup-functional-options' into 'main'
Use functional options when creating Symlink and Directory locators

See merge request nvidia/container-toolkit/container-toolkit!452
2023-07-18 13:39:23 +00:00
Evan Lezar
ce7d5f7a51 Use functional options when constructing direcory locator
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-18 15:36:03 +02:00
Evan Lezar
9b64d74f6a Use functional options when constructing Symlink locator
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-18 15:31:15 +02:00
Evan Lezar
99cc0aebd6 Merge branch 'pass-image-to-csv-constructor' into 'main'
Pass image when constructing CSV modifier

See merge request nvidia/container-toolkit/container-toolkit!451
2023-07-18 13:30:53 +00:00
Evan Lezar
cca343abb0 Pass image when constructing CSV modifier
Since the incoming OCI spec has already been parsed and used to
construct a CUDA image representation, pass this to the CSV
modifier constructor instead of re-creating an image representation.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-18 15:27:16 +02:00
Evan Lezar
f08e48e700 Merge branch 'set-cdi-spec-dirs-in-config' into 'main'
Set default spec dirs at config level

See merge request nvidia/container-toolkit/container-toolkit!450
2023-07-18 13:25:29 +00:00
Evan Lezar
e2f8d2a15f Set default spec dirs at config level
This change sets the default CDI spec dirs at a config level instead
of when a CDI runtime modifier is constructed. This makes this setting
consistent with other options such as the nvidia-ctk path.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-18 15:23:09 +02:00
Evan Lezar
2c5761d32e Merge branch 'bug-fixes' into 'main'
Minor fixes and cleanups

See merge request nvidia/container-toolkit/container-toolkit!449
2023-07-18 13:20:46 +00:00
Evan Lezar
3c9d95c62f Fix usage string in CLI
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-18 15:20:24 +02:00
Evan Lezar
481000b4ce Remove unused argument
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-18 15:20:24 +02:00
Evan Lezar
b2126722e5 Update vendoring
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-18 15:16:25 +02:00
Evan Lezar
083b789102 Use cdi parser package for IsQualiedName
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-18 15:16:25 +02:00
Evan Lezar
a564b38b7e Merge branch 'remove-centos7-aarch64-scan' into 'main'
Remove centos7-arm64 scan

See merge request nvidia/container-toolkit/container-toolkit!445
2023-07-17 14:29:17 +00:00
Evan Lezar
5427249cfc Remove centos7-arm64 scan
Since we don't publish a centos7-arm64 image, the scan does not
make sense.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-17 16:28:31 +02:00
Evan Lezar
032982ab9c Merge branch 'bump-dependencies' into 'main'
Bump dependencies

See merge request nvidia/container-toolkit/container-toolkit!444
2023-07-17 14:13:12 +00:00
Evan Lezar
96aeb9bf64 Update container-device-interface to v0.6.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-17 14:12:06 +02:00
Evan Lezar
c98f6ea395 Update containerized docker files for golang 1.20.5
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-17 14:10:05 +02:00
Evan Lezar
073f9cf120 Bump golang version to 1.20.5
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-17 14:06:48 +02:00
Evan Lezar
1fdd0c1248 Merge branch 'bump-changelog' into 'main'
Fix changelog for 1.14.0-rc.2

See merge request nvidia/container-toolkit/container-toolkit!443
2023-07-17 12:04:39 +00:00
Evan Lezar
a883c65dd6 Fix changelog for 1.14.0-rc.2 2023-07-17 12:04:38 +00:00
Evan Lezar
aac39f89cc Merge branch 'update-libnvidia-container' into 'main'
Include Shared Compiler Library (libnvidia-gpucomp.so) in the list of compute libaries.

See merge request nvidia/container-toolkit/container-toolkit!442
2023-07-13 12:57:40 +00:00
Evan Lezar
e25576d26d Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-13 14:15:33 +02:00
Evan Lezar
3626a13273 Merge branch 'fix-disable-require' into 'main'
Return empty requirements if NVIDIA_DISABLE_REQUIRE is true

See merge request nvidia/container-toolkit/container-toolkit!438
2023-07-11 11:48:35 +00:00
Evan Lezar
6750ce1667 Print invalid version on parse error
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-11 13:47:39 +02:00
Evan Lezar
1081cecea9 Return empty requirements if NVIDIA_DISABLE_REQUIRE is true
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-11 13:47:37 +02:00
Evan Lezar
7451e6eb75 Merge branch 'custom-firmware-paths' into 'main'
Add firmware search paths when generating CDI specifications

See merge request nvidia/container-toolkit/container-toolkit!439
2023-07-11 09:16:33 +00:00
Evan Lezar
81908c8cc9 Search custom firmware paths first
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-11 10:34:14 +02:00
Evan Lezar
d3d41a3e1d Simplify handling of custom firmware path
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-11 10:31:50 +02:00
Evan Lezar
0a37f8798a Add firmware search paths when generating CDI specifications
Path to locate the GSP firmware is explicitly set to /lib/firmware/nvidia.
Users may chose to install the GSP firmware in alternate locations where
the kernel would look for firmware on the root filesystem.

Add locate functionality which looks for the GSP firmware files in the
same location as the kernel would
(https://docs.kernel.org/driver-api/firmware/fw_search_path.html).

The paths searched in order are:
- path described in /sys/module/firmware_class/parameters/path
- /lib/firmware/updates/UTS_RELEASE/
- /lib/firmware/updates/
- /lib/firmware/UTS_RELEASE/
- /lib/firmware/

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-11 10:31:50 +02:00
Evan Lezar
4f89b60ab9 Merge branch 'remove-experimental-runtime' into 'main'
Remove NVIDIA experimental runtime from toolkit container

See merge request nvidia/container-toolkit/container-toolkit!238
2023-07-10 10:25:55 +00:00
Evan Lezar
0938576618 Remove NVIDIA experimental runtime from toolkit container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-10 11:44:55 +02:00
Evan Lezar
4ca8d4173a Merge branch 'revert-d5cbe48d' into 'main'
Revert "Merge branch 'bump-golang-1.20.5' into 'main'"

See merge request nvidia/container-toolkit/container-toolkit!437
2023-07-05 15:12:01 +00:00
Evan Lezar
978549dc58 Revert "Merge branch 'bump-golang-1.20.5' into 'main'"
This reverts merge request !436
2023-07-05 15:11:41 +00:00
Evan Lezar
d5cbe48d59 Merge branch 'bump-golang-1.20.5' into 'main'
Bump golang version to 1.20.5

See merge request nvidia/container-toolkit/container-toolkit!436
2023-07-05 14:07:58 +00:00
Evan Lezar
e8ec795883 Bump golang version to 1.20.5
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-05 16:07:41 +02:00
Evan Lezar
62bc6b211f Merge branch 'bump-cuda-12.2.0' into 'main'
Bump cuda base image to 12.2.0

See merge request nvidia/container-toolkit/container-toolkit!435
2023-07-05 10:11:27 +00:00
Evan Lezar
6fac6c237b Bump cuda base image to 12.2.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-05 11:28:32 +02:00
Evan Lezar
20ff4e2fb9 Merge branch 'generate-default-config-post-install' into 'main'
Ensure that default config is created on the file system as a post-install step

See merge request nvidia/container-toolkit/container-toolkit!431
2023-07-05 09:27:29 +00:00
Evan Lezar
f78d3a858f Rework default config generation to not use toml
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-05 11:26:55 +02:00
Evan Lezar
bc6ca7ff88 Generate default config post-install
The debian and rpm packages are updated to trigger the generation of
of a default config if no config exists at the expected location.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-05 11:26:53 +02:00
Evan Lezar
65ae6f1dab Fix generation of default config
This change ensures that the nvidia-ctk config default command
generates a config file that is compatible with the official documentation
to, for example, disable cgroups in the NVIDIA Container CLI.

This requires that whitespace around comments is stripped before outputing the
contets.

This also adds an option to load a config and modify it in-place instead. This can
be triggered as a post-install step, for example.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-05 11:26:04 +02:00
Evan Lezar
ba24338122 Add quiet mode to nvidia-ctk cli
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-05 11:26:04 +02:00
Evan Lezar
2299c9588d Merge branch 'create-config-folders' into 'main'
Ensure that folders exist when creating config files

See merge request nvidia/container-toolkit/container-toolkit!433
2023-07-05 09:25:28 +00:00
Evan Lezar
ba80d0318f Merge branch 'rpm-fix-missing-coreutils-during-install' into 'main'
RPM spec: Avoid scriptlet failure during initial system installation

See merge request nvidia/container-toolkit/container-toolkit!432
2023-07-05 08:43:26 +00:00
Evan Lezar
6342dae0e9 Ensure that parent directories exist for config files
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-03 15:30:31 +02:00
Evan Lezar
baf94181aa Add engine.Config to encapsulate writing
This change adds an engine.Config type to encapsulate the writing
of config files for container engines.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-03 15:26:47 +02:00
Evan Lezar
bbe9742c46 Merge branch 'switch-to-latest-dind' into 'main'
Switch to latest dind image for tests

See merge request nvidia/container-toolkit/container-toolkit!430
2023-06-30 09:46:49 +00:00
Evan Lezar
1447ef3818 Switch to latest dind image for tests
The stable-dind image is out of date and has not been updated for 3 years.
This change updates to the latest dind image.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-30 11:03:07 +02:00
Claudius Volz
5598dbf9d7 RPM spec: Only run fixup code if the package is being upgraded, to avoid a scenario where the coreutils (mkdir, cp) are not available yet during a fresh system installation.
Signed-off-by: Claudius Volz <c.volz@gmx.de>
2023-06-29 00:23:24 +02:00
Evan Lezar
8967e851c4 Merge branch 'fix-multiple-driver-roots-wsl' into 'main'
Fix bug with multiple driver store paths

See merge request nvidia/container-toolkit/container-toolkit!425
2023-06-27 14:15:38 +00:00
Evan Lezar
15378f6ced Merge branch 'fix-ordering-of-envvars' into 'main'
Ensure common envvars have higher precedence

See merge request nvidia/container-toolkit/container-toolkit!426
2023-06-27 13:26:34 +00:00
Evan Lezar
4d2e8d1913 Ensure common envvars have higher precedence
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-27 14:45:15 +02:00
Evan Lezar
4feaee0fe6 Merge branch 'bump-version' into 'main'
Bump version to v1.14.0-rc.2

See merge request nvidia/container-toolkit/container-toolkit!427
2023-06-27 12:38:05 +00:00
Evan Lezar
51984d49cf Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-27 14:37:26 +02:00
Evan Lezar
a6a8bb940c Bump version to v1.14.0-rc.2
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-27 14:02:59 +02:00
Evan Lezar
6265e34afb Fix bug with multiple driver store paths
This change uses the actual discovered path of nvidia-smi when
creating a symlink to the binary on WSL2 platforms.

This ensures that cases where multiple driver store paths are
detected are supported.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-26 21:37:14 +02:00
Evan Lezar
d08a2394b3 Merge branch 'fix-package-archive-script' into 'main'
Fix package archive script

See merge request nvidia/container-toolkit/container-toolkit!424
2023-06-26 11:46:26 +00:00
Evan Lezar
c0f1263d78 Fix package archive script
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-26 13:46:08 +02:00
Evan Lezar
a25b1c1048 Merge branch 'fix-load-kernel-modules' into 'main'
Split internal system package

See merge request nvidia/container-toolkit/container-toolkit!420
2023-06-26 08:30:51 +00:00
Evan Lezar
99859e461d Merge branch 'import-wrapper-and-runtime' into 'main'
Import NVIDIA Docker and NVIDIA Container Runtime to in-tree folders

See merge request nvidia/container-toolkit/container-toolkit!418
2023-06-21 12:33:05 +00:00
Evan Lezar
d52dbeaa7a Split internal system package
This changes splits the functionality in the internal system package
into two packages: one for dealing with devices and one for dealing
with kernel modules. This removes ambiguity around the meaning of
driver / device roots in each case.

In each case, a root can be specified where device nodes are created
or kernel modules loaded.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-15 09:01:13 +02:00
Evan Lezar
c11c7695cb Merge branch 'update-go-nvlib' into 'main'
Update go-nvlib with new constructor API

See merge request nvidia/container-toolkit/container-toolkit!422
2023-06-14 22:50:12 +00:00
Evan Lezar
c4d3b13ae2 Update go-nvlib with new constructor
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-14 17:55:33 +02:00
Evan Lezar
bcf3a70174 Update vendoring
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-14 17:55:18 +02:00
Evan Lezar
743d290577 Merge branch 'CNT-4301/resolve-auto-to-cdi' into 'main'
Resolve auto mode to cdi if all devices are cdi devices

See merge request nvidia/container-toolkit/container-toolkit!421
2023-06-13 14:48:32 +00:00
Evan Lezar
82347eb9bc Resolve auto mode as cdi for fully-qualified names
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-13 16:05:37 +02:00
Evan Lezar
84c7bf8b18 Minor refactor of mode resolver
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-13 16:04:03 +02:00
Evan Lezar
d92300506c Construct CUDA image object once
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-13 10:36:02 +02:00
Evan Lezar
2da32970b9 Merge branch 'refactor-logger' into 'main'
Refactor Logging

See merge request nvidia/container-toolkit/container-toolkit!416
2023-06-12 08:59:41 +00:00
Evan Lezar
1d0a733487 Replace logger.Warn(f) with logger.Warning(f)
This aligns better with klog used in other projects.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-12 10:48:04 +02:00
Evan Lezar
9464953924 Use logger.Interface when resolving auto mode
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-12 10:46:11 +02:00
Evan Lezar
c9b05d8fed Use logger Interface in runtime configuration
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-12 10:46:11 +02:00
Evan Lezar
a02bc27c3e Define a basic logger interface
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-12 10:46:10 +02:00
Carlos Eduardo Arango Gutierrez
6a04e97bca Merge branch 'use-same-envvars-for-runtime-config' into 'main'
Allow same envars for all runtime configs

See merge request nvidia/container-toolkit/container-toolkit!357
2023-06-12 08:32:34 +00:00
Evan Lezar
0780621024 Ensure runtime dir is set for crio cleanup
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-11 12:46:08 +02:00
Evan Lezar
2bc0f45a52 Remove unused constants and variables
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-11 11:38:22 +02:00
Evan Lezar
178eb5c5a8 Rework restart logic
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-10 12:41:53 +02:00
Evan Lezar
761fc29567 Add version info to config CLIs
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-09 18:49:17 +02:00
Evan Lezar
9f5c82420a Refactor toolking to setup and cleanup configs
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-09 18:49:15 +02:00
Evan Lezar
23041be511 Add runtimeDir as argument
Thsi change adds the --nvidia-runtime-dir as a command line
argument when configuring container runtimes in the toolkit container.
This removes the need to set it via the command line.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-09 18:48:34 +02:00
Evan Lezar
dcbf4b4f2a Allow same envars for all runtime configs
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-09 18:46:34 +02:00
Evan Lezar
652345bc4d Add nvidia-container-runtime as folder
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-07 13:17:48 +02:00
Evan Lezar
69a1a9ef7a Add nvidia-docker as folder
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-07 13:15:23 +02:00
Evan Lezar
2464181d2b Remove runtime and wrapper submodules
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-07 13:09:18 +02:00
Evan Lezar
c3c1d19a5c Merge branch 'bump-libnvidia-container' into 'main'
Update libnvidia-container

See merge request nvidia/container-toolkit/container-toolkit!415
2023-06-06 19:22:43 +00:00
Evan Lezar
75f288a6e4 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-06 21:22:25 +02:00
Evan Lezar
94259baea1 Merge branch 'CNT-4302/cdi-only' into 'main'
Skip additional modifications in CDI mode

See merge request nvidia/container-toolkit/container-toolkit!413
2023-06-06 18:29:59 +00:00
Evan Lezar
9e8ff003b6 Merge branch 'bump-libnvidia-container' into 'main'
Bump libnvidia container

See merge request nvidia/container-toolkit/container-toolkit!414
2023-06-06 15:29:35 +00:00
Evan Lezar
3dee9d9a4c Support OpenSSL 3 with the Encrypt/Decrypt library
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-06 16:47:21 +02:00
Evan Lezar
3f03a71afd Skip additional modifications in CDI mode
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-05 15:01:58 +02:00
Evan Lezar
093e93cfbf Merge branch 'bump-cuda-base-image' into 'main'
Bump CUDA baseimage version to 12.1.1

See merge request nvidia/container-toolkit/container-toolkit!412
2023-06-01 12:42:48 +00:00
Evan Lezar
78f619b1e7 Bump CUDA baseimage version to 12.1.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-06-01 14:42:30 +02:00
Evan Lezar
43c44a0f48 Merge branch 'treat-log-errors-as-non-fatal' into 'main'
Ignore errors when creating debug log file

See merge request nvidia/container-toolkit/container-toolkit!404
2023-06-01 07:44:56 +00:00
Evan Lezar
6b1e8171c8 Merge branch 'add-mod-probe' into 'main'
Add option to load NVIDIA kernel modules

See merge request nvidia/container-toolkit/container-toolkit!409
2023-05-31 18:14:45 +00:00
Evan Lezar
2e50b3da7c Merge branch 'ldcache-resolve-circular' into 'main'
Fix infinite recursion when resolving libraries in LDCache

Closes #13

See merge request nvidia/container-toolkit/container-toolkit!406
2023-05-31 17:35:27 +00:00
Evan Lezar
eca13e72bf Update CHANGELOG
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-31 19:33:31 +02:00
Evan Lezar
b64ba6ac2d Add option to create device nodes
This change adds a --create-device-nodes option to the
nvidia-ctk system create-dev-char-symlinks command to create
device nodes. The currently only creates control device nodes.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-31 19:31:38 +02:00
Evan Lezar
7b801a0ce0 Add option to load NVIDIA kernel modules
These changes add a --load-kernel-modules option to the
nvidia-ctk system commands. If specified the NVIDIA kernel modules
(nvidia, nvidia-uvm, and nvidia-modeset) are loaded before any
operations on device nodes are performed.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-31 19:31:38 +02:00
Evan Lezar
528cbbb636 Merge branch 'fix-device-symlinks' into 'main'
Fix creation of device symlinks in /dev/char

See merge request nvidia/container-toolkit/container-toolkit!399
2023-05-31 17:31:04 +00:00
Evan Lezar
fd48233c13 Merge branch 'fix-ubi-pipeline-dependency' into 'main'
Fix ui8 image job dependencies

See merge request nvidia/container-toolkit/container-toolkit!411
2023-05-31 16:39:10 +00:00
Evan Lezar
b72764af5a Fix ui8 image job dependencies
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-31 17:54:41 +02:00
Evan Lezar
7e7c45fb0f Merge branch 'switch-to-centos7' into 'main'
Use centos7 packages instead of centos8 packages

See merge request nvidia/container-toolkit/container-toolkit!410
2023-05-31 15:17:08 +00:00
Evan Lezar
61f515b3dd Use centos7 packages in kitmaker archives
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-31 16:28:54 +02:00
Evan Lezar
e05686cbe8 Use centos7 packages for ubi8 image
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-31 16:26:43 +02:00
Carlos Eduardo Arango Gutierrez
1fc8ae32bd Merge branch 'rorajani-rename-ci' into 'main'
Rename blossom ci file

See merge request nvidia/container-toolkit/container-toolkit!408
2023-05-30 11:41:33 +00:00
rorajani
e80d43f4c4 Rename blossom ci file
Signed-off-by: rorajani <rorajani@nvidia.com>
2023-05-30 16:56:32 +05:30
Evan Lezar
a6b0f45d2c Fix infinite recursion when resolving ldcache
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-30 11:03:36 +02:00
Evan Lezar
39263ea365 Add command to print ldcache
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-30 11:02:33 +02:00
Evan Lezar
9ea214d0b3 Correct typo in info command
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-30 10:58:30 +02:00
Evan Lezar
5371ff039b Merge branch 'CNT-4285/add-runtime-hook-path' into 'main'
Add nvidia-contianer-runtime-hook.path config option

See merge request nvidia/container-toolkit/container-toolkit!401
2023-05-26 08:29:52 +00:00
Evan Lezar
315f4adb8f Check for required device majors
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-26 10:24:36 +02:00
Evan Lezar
05632c0a40 Treat missing nvidia device majors as an error
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-26 10:24:36 +02:00
Evan Lezar
8df4a98d7b Merge branch 'pre-sanity-check' into 'main'
Add pre sanity check for gothub repo

See merge request nvidia/container-toolkit/container-toolkit!396
2023-05-25 14:35:43 +00:00
Evan Lezar
02656b624d Create log directory if required
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-25 15:17:00 +02:00
Evan Lezar
61af2aee8e Ignore errors when creating debug log file
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-25 14:44:00 +02:00
Evan Lezar
ddebd69128 Use installed hook path in toolkit container
This change uses the installed NVIDIA Container Runtime Hook wrapper
as the path in the applied config. This prevents conflicts with other
installations of the NVIDIA Container Toolkit.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-25 12:05:33 +02:00
Evan Lezar
ac11727ec5 Add nvidia-contianer-runtime-hook.path config option
This change adds an nvidia-container-runtime-hook.path config option
to allow the path used for the prestart hook to be overridden. This
is useful in cases where multiple NVIDIA Container Toolkit installations
are present.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-25 12:05:33 +02:00
Evan Lezar
5748d220ba Merge branch 'add-centos7-aarch64' into 'main'
Add centos7-aarch64 CI jobs

See merge request nvidia/container-toolkit/container-toolkit!403
2023-05-24 14:48:59 +00:00
Evan Lezar
3b86683843 Add centos7-aarch64 CI jobs
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-24 16:48:14 +02:00
Evan Lezar
3bd5baa3c5 Merge branch 'add-centos7-aarch64' into 'main'
Add centos7-aarch64 targets

See merge request nvidia/container-toolkit/container-toolkit!402
2023-05-24 14:42:32 +00:00
Evan Lezar
330aa16687 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-24 15:32:34 +02:00
Evan Lezar
8a4d6b5bcf Add centos7-aarch64 targets
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-24 15:32:09 +02:00
Evan Lezar
40d0a88cf9 Merge branch 'update-go-nvlib' into 'main'
Update go-nvlib to skip non-MIG devices

See merge request nvidia/container-toolkit/container-toolkit!398
2023-05-24 08:38:21 +00:00
Evan Lezar
dc6a895db8 Merge branch 'pass-single-links-instead-of-csv' into 'main'
Pass individual links in create-symlinks hook instead of CSV filename

See merge request nvidia/container-toolkit/container-toolkit!394
2023-05-23 19:56:18 +00:00
Evan Lezar
3b1b89e6c0 Merge branch 'better-support-for-skipping-update' into 'main'
Skip update of components on SKIP_UPDATE_COMPONENTS=yes

See merge request nvidia/container-toolkit/container-toolkit!400
2023-05-23 19:17:29 +00:00
Evan Lezar
013a1b413b Fix ineffectual assignment
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-23 21:14:02 +02:00
Evan Lezar
3be16d8077 Create individual links instead of processing CSV
This change switches to generating a OCI runtime hook to create
individual symlinks instead of processing a CSV file in the hook.
This allows for better reuse of the logic generating CDI
specifications, for example.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-23 20:43:36 +02:00
Evan Lezar
927ec78b6e Add symlinks package with Resolve function
This change adds a symlinks.Resolve function for resolving symlinks and
updates usages across the code to make use of it.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-23 20:42:17 +02:00
Evan Lezar
8ca606f7ac Skip update of components on SKIP_UPDATE_COMPONENTS=yes
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-23 20:34:20 +02:00
Evan Lezar
e7d2a9c212 Merge branch 'CNT-1876/cdi-specs-from-csv' into 'main'
Add csv mode to CDI spec generation

See merge request nvidia/container-toolkit/container-toolkit!393
2023-05-23 14:47:19 +00:00
Evan Lezar
fcb4e379e3 Fix mode resolution tests
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-23 16:02:07 +02:00
rorajani
cda96f2f9e Add pre sanity
Signed-off-by: rorajani <rorajani@nvidia.com>
2023-05-22 20:39:50 +05:30
Evan Lezar
e11f65e51e Update go-nvlib to skip non-MIG devices
This change updates go-nvlib to ensure that non-migcapable GPUs
are skipped when generating CDI specifications for MIG devices.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-22 15:36:55 +02:00
Evan Lezar
3ea02d13fc Merge branch 'use-major-minor-for-cuda-version' into 'main'
Use *.* pattern when locating libcuda.so

See merge request nvidia/container-toolkit/container-toolkit!397
2023-05-22 13:02:33 +00:00
Evan Lezar
e30fd0f4ad Add csv mode to nvidia-ctk cdi generate command
This chagne allows the csv mode option to specified in the
nvidia-ctk cdi generate command and adds a --csv.file option
that can be repeated to specify the CSV files to be processed.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-22 13:56:45 +02:00
Evan Lezar
418e4014e6 Resolve to csv for CDI generation on tegra systems
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-22 13:56:00 +02:00
Evan Lezar
e78a4f5eac Add csv mode to nvcdi api
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-22 13:55:58 +02:00
Evan Lezar
540dbcbc03 Move tegra system mounts to tegra-specific discoverer
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-22 13:55:22 +02:00
Evan Lezar
a8265f8846 Add tegra discoverer
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-22 13:55:22 +02:00
Evan Lezar
c120c511d5 Merge branch 'CNT-3939/generate-all-device' into 'main'
Add options to generate all device to nvcdi API

See merge request nvidia/container-toolkit/container-toolkit!348
2023-05-22 11:53:39 +00:00
Evan Lezar
424b8c9d46 Use *.* pattern when locating libcuda.so
This change ensures that libcuda.so can be located on systems
where no patch version is specified in the driver version.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-22 13:53:19 +02:00
Evan Lezar
5bc72b70b8 Merge branch 'minor-refactor' into 'main'
Include xorg discoverer with graphics mounts

See merge request nvidia/container-toolkit/container-toolkit!392
2023-05-12 13:12:06 +00:00
Evan Lezar
fe37196788 Generate all device using merged transform
The nvcid api is extended to allow for merged device options to
be specified. If any options are specified, then a merged device
is generated.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-12 13:52:58 +02:00
Evan Lezar
ba44c50f4e Add MergedDevice transform to generate all device
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-12 13:52:58 +02:00
Evan Lezar
729ca941be Merge branch 'refactor-nvidia-ctk-path' into 'main'
Refactor discover.Config to prepare for CSV CDI spec generation.

See merge request nvidia/container-toolkit/container-toolkit!391
2023-05-12 10:45:36 +00:00
Evan Lezar
0ee947dba6 Merge branch 'CNT-4257/remove-redundant-packages' into 'main'
Remove config.toml from installation

See merge request nvidia/container-toolkit/container-toolkit!388
2023-05-12 10:41:47 +00:00
Evan Lezar
d1fd0a7384 Merge branch 'CNT-4270/centos7-packages' into 'main'
Publish centos7 packages as a kitmaker branch

See merge request nvidia/container-toolkit/container-toolkit!390
2023-05-11 10:10:18 +00:00
Evan Lezar
ae2c582138 Merge branch 'clean-scan-archives' into 'main'
Remove image archives after scan

See merge request nvidia/container-toolkit/container-toolkit!389
2023-05-11 10:10:05 +00:00
Evan Lezar
b7e5cef934 Include xorg discoverer with graphics mounts
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-10 17:07:55 +02:00
Evan Lezar
9378d0cd0f Move discover.FindNvidiaCTK to config.ResolveNVIDIACTKPath
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-10 15:12:44 +02:00
Evan Lezar
f9df36c473 Rename config struct to options
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-10 15:12:00 +02:00
Evan Lezar
8bb0235c92 Remove discover.Config
These changes remove the use of discover.Config which was used
to pass the driver root and the nvidiaCTK path in some cases.

Instead, the nvidiaCTKPath is resolved at the begining of runtime
invocation to ensure that this is valid at all points where it is
used.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-10 15:03:37 +02:00
Evan Lezar
fc310e429e Publish centos7 packages as a kitmaker branch
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-10 13:44:47 +02:00
Evan Lezar
8d0ffb2fa5 Remove unneeded targets from scripts
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-10 13:13:05 +02:00
Evan Lezar
9f07cc9ab2 Remove CVE_UPDATES from dockerfiles
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-10 13:10:10 +02:00
Evan Lezar
1fff80e10d Remove unused CI variables
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-10 13:10:10 +02:00
Evan Lezar
0a57cdc6e8 Remove redundant packaging targets from CI
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-10 13:10:10 +02:00
Evan Lezar
1a86b20f7c Remove config.toml from installation
Since the default configuration is now platform specific,
there is no need to install specific versions as part of
the package installation.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-10 13:10:10 +02:00
Evan Lezar
0068750a5c Remove image archives after scan
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-10 10:52:47 +02:00
Evan Lezar
ee47f26d1c Merge branch 'cdi-list' into 'main'
Add nvidia-ctk cdi list command

See merge request nvidia/container-toolkit/container-toolkit!387
2023-05-09 18:59:56 +00:00
Evan Lezar
3945abb2f2 Add nvidia-ctk cdi list command
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-09 19:59:00 +02:00
Evan Lezar
9de4f7f4b9 Merge branch 'CNT-4262/create-release-archives' into 'main'
Create and upload release archives to artifactory

See merge request nvidia/container-toolkit/container-toolkit!386
2023-05-09 14:38:10 +00:00
Evan Lezar
3610b5073b Add package release archive
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-09 16:01:53 +02:00
Evan Lezar
1991138185 Merge branch 'CNT-4259/check-for-images-before-release' into 'main'
Skip publishing of images if these already exist

See merge request nvidia/container-toolkit/container-toolkit!385
2023-05-09 09:03:52 +00:00
Evan Lezar
8ebc21cd1f Merge branch 'CNT-4260/add-packaging-scan-and-release' into 'main'
Add scan and release steps for packaging image

See merge request nvidia/container-toolkit/container-toolkit!384
2023-05-08 16:50:09 +00:00
Evan Lezar
1c1ce2c6f7 Use version from manifest to extract packages
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-08 16:32:35 +02:00
Evan Lezar
39b0830a66 Extract manifest from packaging image
Also include manifest.txt with, for example, version
info when extracting packages from the packagin image.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-08 15:55:27 +02:00
Evan Lezar
6b367445a3 Merge branch 'CNT-4016/add-nvidia-ctk-config-default' into 'main'
Add nvidia-ctk config generate-default command to generate default config file contents

See merge request nvidia/container-toolkit/container-toolkit!338
2023-05-08 10:40:42 +00:00
Evan Lezar
37c66fc33c Ensure that the nvidia-container-cli.user option is uncommented on suse
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-08 11:26:54 +02:00
Evan Lezar
1bd5798a99 Use toml representation to get defaults
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-08 11:26:53 +02:00
Evan Lezar
90c4c4811a Fallback to ldconfig if ldconfig.real does not exist
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-08 11:26:24 +02:00
Evan Lezar
49de170652 Generate default config.toml contents
This change adds a GetDefaultConfigToml function to the config package.

This function returns the default config in the form of raw TOML
including comments. This is useful for generating a default config at
installation time, with platform-specific differences codified.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-08 11:26:22 +02:00
Evan Lezar
07c89fa975 Always publish external images
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-08 10:27:48 +02:00
Evan Lezar
7a1f23e2e4 Skip publishing of images if these already exist
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-05 15:06:32 +02:00
Evan Lezar
25165b0771 Add scan and release steps for packaging image
This ensures that the artifacts associated with a particular
release version are preserved along with the container
images that are used as operands for this version.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-05 13:56:10 +02:00
Evan Lezar
3e7acec0b4 Add nvidia-ctk config generate-default command
This change adds a CLI command to generate a default config.
This config checks the host operating system to apply specific
modifications that were previously captured in static config
files.

These include:
* select /sbin/ldconfig or /sbin/ldconfig.real depending on which exists on the host
* set the user to allow device access on SUSE-based systems

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-03 16:11:05 +02:00
Evan Lezar
4165961d31 Rename config struct options to avoid conflict
This change renames the struct for storing CLI flag values options over
config to avoid a conflict with the config package.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-03 15:59:02 +02:00
Evan Lezar
2e3a12438a Fix toml definition in cli config struct
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-03 15:59:02 +02:00
Evan Lezar
731c99b52c Merge branch 'fix-cdi-permissions' into 'main'
Properly set spec permissions

See merge request nvidia/container-toolkit/container-toolkit!383
2023-05-03 08:47:58 +00:00
Evan Lezar
470b4eebd8 Properly set spec permissions
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-03 10:45:11 +02:00
Carlos Eduardo Arango Gutierrez
6750df8e01 Merge branch 'fix-cdi-spec-permissions' into 'main'
Generate CDI specifications with 644 permissions to allow non-root clients to consume them

See merge request nvidia/container-toolkit/container-toolkit!381
2023-05-02 19:36:40 +00:00
Evan Lezar
8736d1e78f Merge branch 'fix/minor-spelling' into 'main'
chore(cmd): Fixing minor spelling error.

See merge request nvidia/container-toolkit/container-toolkit!382
2023-05-02 17:58:13 +00:00
Elliot Courant
140b1e33ef chore(cmd): Fixing minor spelling error.
Fixed a minor spelling error inside `nvidia-ctk system create-device-nodes`.

Signed-off-by: Elliot Courant <me@elliotcourant.dev>
2023-05-02 12:53:45 -05:00
Evan Lezar
3056428eda Generate spec file with 644 permissions
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-02 16:47:44 +02:00
Evan Lezar
367a30827f Allow spec file permisions to be specified
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-02 16:27:50 +02:00
Evan Lezar
fe8ef9e0bd Merge branch 'fix-ld.so.conf-permissions' into 'main'
Create ld.so.conf file with permissions 644

See merge request nvidia/container-toolkit/container-toolkit!380
2023-05-02 10:51:40 +00:00
Evan Lezar
d77f46aa09 Create ld.so.conf file with permissions 644
By default, temporary files are created with permissions 600 and
this means that the files created when updating the ldcache are
not readable in non-root containers.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-02 12:51:27 +02:00
Evan Lezar
043e283db3 Merge branch 'nvidia-docker-as-meta-package' into 'main'
Support building nvidia-docker and nvidia-container-runtime as dist-independent packages

See merge request nvidia/container-toolkit/container-toolkit!379
2023-05-02 08:25:45 +00:00
Evan Lezar
2019f1e7ea Preserve timestamps when copying meta packages
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-26 17:03:36 +02:00
Evan Lezar
22c7178561 Build meta-packages before others
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-26 17:03:36 +02:00
Evan Lezar
525aeb102f Update third_party submodules
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-26 13:52:56 +02:00
Evan Lezar
9fb5ac36ed Allow update of subcomponents to be skipped
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-26 13:14:19 +02:00
Evan Lezar
c30764b7cc Update build all components for meta packages
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-26 13:14:19 +02:00
Evan Lezar
8a2de90c28 Merge branch 'revert-kitmaker-workaround' into 'main'
Remove workaround to add libnvidia-container0 to kitmaker archive

See merge request nvidia/container-toolkit/container-toolkit!378
2023-04-26 10:10:23 +00:00
Evan Lezar
243c439bb8 Remove workaround to add libnvidia-container0 to kitmaker archive
In order to add the libnvidia-container0 packages to our ubuntu18.04
kitmaker archive, a workaround was added that downloaded these packages
before constructing the archive. Since the packages have now been
published -- and will not change -- this workaround is not longer needed.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-26 11:47:39 +02:00
Evan Lezar
060ac46bd8 Merge branch 'bump-runc' into 'main'
Bump golang version and update dependencies

See merge request nvidia/container-toolkit/container-toolkit!377
2023-04-25 10:28:07 +00:00
Evan Lezar
ae2a683929 Run go fmt
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-25 11:27:58 +02:00
Evan Lezar
2b5eeb8d24 Regenerate mocks for formatting
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-25 11:26:55 +02:00
Evan Lezar
bbb94be213 Bump golang version to 1.20.3
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-25 10:42:22 +02:00
Evan Lezar
e1c75aec6c Bump runc version and update vendoring
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-25 10:40:46 +02:00
Carlos Eduardo Arango Gutierrez
3030d281d9 Merge branch 'engine_export' into 'main'
Export pkg config/engine

See merge request nvidia/container-toolkit/container-toolkit!375
2023-04-25 05:17:20 +00:00
Carlos Eduardo Arango Gutierrez
81d8b94cdc Export pkg config/engine
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-04-25 07:16:59 +02:00
Evan Lezar
276e1960b1 Merge branch 'CNT-2350/configure-containerd' into 'main'
Add support for containerd configs to nvidia-ctk runtime configure command

See merge request nvidia/container-toolkit/container-toolkit!355
2023-04-24 17:24:24 +00:00
Evan Lezar
70920d7a04 Add support for containerd to the runtime configure CLI
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-24 18:32:28 +02:00
Evan Lezar
f1e201d368 Refactor runtime configure cli
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-24 18:32:04 +02:00
Evan Lezar
ef863f5fd4 Merge branch 'bump-version-1.14.0-rc.1' into 'main'
Bump version to 1.14.0-rc.1

See merge request nvidia/container-toolkit/container-toolkit!376
2023-04-24 16:27:05 +00:00
Evan Lezar
ce65df7d17 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-24 17:33:37 +02:00
Evan Lezar
fa9c6116a4 Bump version to 1.14.0-rc.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-24 17:33:04 +02:00
Evan Lezar
28b70663f1 Merge branch 'skip-for-point-release' into 'main'
Skip components for patch releases

See merge request nvidia/container-toolkit/container-toolkit!374
2023-04-24 12:12:36 +00:00
Evan Lezar
c0fe8f27eb Skip components for patch releases
This change ensures that the nvidia-docker2 and nvidia-container-runtime
components are not build and distributed for patch releases.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-24 14:00:10 +02:00
Evan Lezar
926ac77bc0 Merge branch 'fix-cdi-spec-generation-on-debian' into 'main'
Resolve all symlinks when finding libraries in LDCache

See merge request nvidia/container-toolkit/container-toolkit!370
2023-04-24 10:09:37 +00:00
Evan Lezar
fc7c8f7520 Resolve all symlinks in ldcache
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-21 17:28:49 +02:00
Evan Lezar
46c1c45d85 Add /usr/lib/current to search path
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-21 11:47:42 +02:00
Evan Lezar
f99e863649 Merge branch 'CNT-4142/xorg-missing-not-fatal' into 'main'
Make discovery of Xorg libraries optional

See merge request nvidia/container-toolkit/container-toolkit!368
2023-04-21 09:47:10 +00:00
Evan Lezar
dcc21ece97 Merge branch 'add-debug-output' into 'main'
Fix target folder for kitmaker

See merge request nvidia/container-toolkit/container-toolkit!373
2023-04-20 18:38:05 +00:00
Evan Lezar
a53e3604a6 Fix target folder for kitmaker
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-20 20:37:46 +02:00
Evan Lezar
cfea6c1179 Merge branch 'add-debug-output' into 'main'
Properly create target folder for kitmaker

See merge request nvidia/container-toolkit/container-toolkit!372
2023-04-20 14:53:57 +00:00
Evan Lezar
4d1daa0b6c Properly create target folder for kitmaker
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-20 16:53:32 +02:00
Evan Lezar
df925bc7fd Merge branch 'include-libnvidia-container-in-kitmaker' into 'main'
Add a workaround to publish libnvidia-container0

See merge request nvidia/container-toolkit/container-toolkit!371
2023-04-20 08:29:02 +00:00
Evan Lezar
df22e37dfd Add a workaround to publish libnvidia-container0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-18 23:23:36 +02:00
Evan Lezar
2136266d1d Make discovery of Xorg libraries optional
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-13 18:41:38 +02:00
Evan Lezar
a95232dd33 Merge branch 'CNT-4144/non-ldcache' into 'main'
Only update ldcache if it exists

See merge request nvidia/container-toolkit/container-toolkit!369
2023-04-13 16:12:22 +00:00
Evan Lezar
29c6288128 Only update ldcache if it exists
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-13 17:18:09 +02:00
Evan Lezar
cd6fcb5297 Merge branch 'bump-version-1.13.1' into 'main'
Bump version to 1.13.1

See merge request nvidia/container-toolkit/container-toolkit!367
2023-04-13 11:12:37 +00:00
Evan Lezar
36989deff7 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-13 13:11:39 +02:00
Evan Lezar
7f6c9851fe Bump version to 1.13.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-04-13 13:11:09 +02:00
Evan Lezar
b7079454b5 Merge branch 'bump-versions' into 'main'
Bump nvidia-docker and nvidia-container-runtime versions

See merge request nvidia/container-toolkit/container-toolkit!366
2023-03-31 13:00:58 +00:00
Evan Lezar
448bd45ab4 Bump nvidia-docker and nvidia-container-runtime versions
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-31 15:00:15 +02:00
Evan Lezar
dde6170df1 Merge branch 'bump-version-v1.13.0' into 'main'
Bump version to v1.13.0

See merge request nvidia/container-toolkit/container-toolkit!365
2023-03-31 10:58:18 +00:00
Evan Lezar
e4b9350e65 Update libnvidia-container to v1.13.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-31 11:38:01 +02:00
Evan Lezar
622a0649ce Bump version to v1.13.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-31 11:35:53 +02:00
Evan Lezar
f6983969ad Merge branch 'nvidia-ctk-cdi-transform' into 'main'
Add 'target-driver-root' option to 'nvidia-ctk cdi generate' to transform root...

See merge request nvidia/container-toolkit/container-toolkit!363
2023-03-28 20:05:12 +00:00
Evan Lezar
7f7fc35843 Move input and output to transform root subcommand
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-28 21:12:48 +02:00
Evan Lezar
8eef7e5406 Merge branch 'add-runtimes' into 'main'
Add nvidia-container-runtime.runtimes config option

See merge request nvidia/container-toolkit/container-toolkit!364
2023-03-28 18:58:46 +00:00
Evan Lezar
f27c33b45f Remove target-driver-root from generate
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-28 11:49:45 -07:00
Evan Lezar
6a83e2ebe5 Add nvidia-ctk cdi transform root command
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-28 11:45:58 -07:00
Christopher Desiniotis
ee5be5e3f2 Merge branch 'CNT-4056/add-cdi-annotations' into 'main'
Add nvidia-container-runtime.modes.cdi.annotation-prefixes config option.

See merge request nvidia/container-toolkit/container-toolkit!356
2023-03-28 16:47:51 +00:00
Evan Lezar
be0cc9dc6e Add nvidia-container-runtime.runtimes config option
This change adds an nvidia-container-runtime.runtimes config option.

If this is unset no changes are made to the config and the default values are used. This
allows this setting to be overridden in cases where this is required. One such example is
crio where crun is set as the default runtime.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-28 17:39:17 +02:00
Evan Lezar
7c5283bb97 Merge branch 'create-device-nodes' into 'main'
Add nvidia-ctk system create-device-nodes command

See merge request nvidia/container-toolkit/container-toolkit!362
2023-03-28 15:07:04 +00:00
Evan Lezar
4d5ba09d88 Add --ignore-errors option for testing
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-28 16:24:17 +02:00
Evan Lezar
149236b002 Configure containerd config based on specified annotation prefixes
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-28 16:22:48 +02:00
Evan Lezar
ee141f97dc Reorganise setting toolkit config options
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-28 16:22:48 +02:00
Evan Lezar
646503ff31 Set nvidia-container-runtime.modes.cdi.annotation-prefixes in toolkit-contianer
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-28 16:22:48 +02:00
Evan Lezar
cdaaf5e46f Generate device nodes when creating management spec
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-28 11:29:45 +02:00
Evan Lezar
e774c51c97 Add nvidia-ctk system create-device-nodes command
This change adds an nvidia-ctk system create-device-nodes command for
creating NVIDIA device nodes. Currently this is limited to control devices
(nvidia-uvm, nvidia-uvm-tools, nvidia-modeset, nvidiactl).

A --dry-run mode is included for outputing commands that would be executed and
the driver root can be specified.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-28 11:29:45 +02:00
Christopher Desiniotis
7f5c9abc1e Add ability to configure CDI kind with 'nvidia-ctk cdi generate'
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-03-27 23:12:00 -07:00
Christopher Desiniotis
92d82ceaee Add 'target-driver-root' option to 'nvidia-ctk cdi generate' to transform root paths in generated spec
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-03-27 22:22:36 -07:00
Evan Lezar
c46b118f37 Add nvidia-container-runtime.modes.cdi.annotation-prefixes config option.
This change adds an nvidia-container-runtime.modes.cdi.annotation-prefixes config
option that defaults to cdi.k8s.io/. This allows the annotation prefixes parsed
for CDI devices to be overridden in cases where CDI support in container engines such
as containerd or crio need to be overridden.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-27 16:36:54 +02:00
Evan Lezar
1722b07615 Merge branch 'CNT-2264/xorg-libs' into 'main'
Inject xorg libs and config in container

See merge request nvidia/container-toolkit/container-toolkit!328
2023-03-27 14:19:52 +00:00
Evan Lezar
c13c6ebadb Inject xorg libs and config in container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-26 17:04:06 +02:00
Evan Lezar
2abe679dd1 Move libcuda locator to internal/lookup package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-26 17:04:06 +02:00
Evan Lezar
9571513601 Merge branch 'update-changelog' into 'main'
Update changelog

See merge request nvidia/container-toolkit/container-toolkit!361
2023-03-26 15:03:28 +00:00
Evan Lezar
ff2767ee7b Reorder changelog
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-26 17:03:05 +02:00
Evan Lezar
56319475a6 Merge branch 'fix-changelog' into 'main'
Reorder changelog

See merge request nvidia/container-toolkit/container-toolkit!360
2023-03-26 14:52:27 +00:00
Evan Lezar
a3ee58a294 Reorder changelog
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-26 16:51:59 +02:00
Evan Lezar
7a533aeff3 Merge branch 'update-nvcdi-new-with-error' into 'main'
Allow nvcdi.Option to return an error

See merge request nvidia/container-toolkit/container-toolkit!352
2023-03-26 14:13:41 +00:00
Evan Lezar
226c54613e Also return an error from nvcdi.New
This change allows nvcdi.New to return an error in addition to the
constructed library instead of panicing.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-26 16:13:12 +02:00
Evan Lezar
1ebbebf5de Merge branch 'CNT-3932/deduplicate-entries-in-cdi-spec' into 'main'
Add transform to deduplicate entities in CDI spec

See merge request nvidia/container-toolkit/container-toolkit!345
2023-03-24 19:04:43 +00:00
Evan Lezar
33f6fe0217 Generate a simplified CDI spec by default
As simplified CDI spec has no duplicate entities in any single set of container edits.
Furthermore, contianer edits defined at a spec-level are not included in the container
edits for a device.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-24 11:01:46 +02:00
Evan Lezar
5ff206e1a9 Add transform to deduplicate entities in CDI spec
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-24 11:01:23 +02:00
Evan Lezar
df618d3cba Merge branch 'CNT-4052/fix-arm-management-containers' into 'main'
Fix generation of management CDI spec in containers

See merge request nvidia/container-toolkit/container-toolkit!354
2023-03-23 16:39:10 +00:00
Evan Lezar
9506bd9da0 Fix generation of management CDI spec in containers
Since we relied on finding libcuda.so in the LDCache to determine both the CUDA
version and the expected directory for the driver libraries, the generation of the
management CDI specifications fails in containers where the LDCache has not been updated.

This change falls back to searching a set of predefined paths instead when the lookup of
libcuda.so in the cache fails.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-23 15:59:01 +02:00
Evan Lezar
5e0684e99d Merge branch 'update-libnvidia-container' into 'main'
Update libnvidia-container

See merge request nvidia/container-toolkit/container-toolkit!353
2023-03-23 08:50:18 +00:00
Evan Lezar
09a0cb24cc Remove fedora make targets
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-23 10:35:57 +02:00
Evan Lezar
ff92f1d799 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-23 10:33:26 +02:00
Christopher Desiniotis
b87703c503 Merge branch 'fix-nil-logger-in-library-locator' into 'main'
Instantiate a logger when constructing a library Locator

See merge request nvidia/container-toolkit/container-toolkit!351
2023-03-21 21:54:14 +00:00
Christopher Desiniotis
b2aaa21b0a Instantiate a logger when constructing a library Locator
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-03-21 13:38:36 -07:00
Evan Lezar
310c15b046 Merge branch 'CNT-4026/only-init-nvml-when-required' into 'main'
Only init nvml as required when generating CDI specs

See merge request nvidia/container-toolkit/container-toolkit!344
2023-03-20 13:26:07 +00:00
Evan Lezar
685802b1ce Only init nvml as required when generating CDI specs
CDI generation modes such as management and wsl don't require
NVML. This change removes the top-level instantiation of nvmllib
and replaces it with an instanitation in the nvml CDI spec generation
code.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-20 14:24:08 +02:00
Evan Lezar
380eb8340a Merge branch 'blossom-ci' into 'main'
Add blossom-ci github action

See merge request nvidia/container-toolkit/container-toolkit!349
2023-03-20 09:56:23 +00:00
Evan Lezar
f98e1160f5 Update components with blossim-ci
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-20 11:06:44 +02:00
Evan Lezar
1962fd68df Merge branch 'locate-ipc-sockets-at-run' into 'main'
Locate persistenced and fabricmanager sockets at /run instead of /var/run

See merge request nvidia/container-toolkit/container-toolkit!347
2023-03-20 08:08:59 +00:00
Carlos Eduardo Arango Gutierrez
29813c1e14 Add blossom-ci github action
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-03-17 16:16:27 +01:00
Evan Lezar
df40fbe03e Locate persistenced and fabricmanager sockets at /run instead of /var/run
This chagne prefers (non-symlink) sockets at /run over /var/run for
nvidia-persistenced and nvidia-fabricmanager sockets.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-17 09:23:48 +02:00
Carlos Eduardo Arango Gutierrez
7000c6074e Merge branch 'ci_rules' into 'main'
Rework pipeline triggers for MRs

See merge request nvidia/container-toolkit/container-toolkit!346
2023-03-15 13:15:23 +00:00
Evan Lezar
ef1fe3ab41 Rework pipeline triggers for MRs
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-15 14:15:20 +02:00
Evan Lezar
fdd198b0e8 Merge branch 'bump-v1.13.0-rc.3' into 'main'
Bump version to v1.13.0-rc.3

See merge request nvidia/container-toolkit/container-toolkit!343
2023-03-15 07:50:50 +00:00
Evan Lezar
e37f77e02d Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-15 09:49:49 +02:00
Evan Lezar
3fcfee88be Bump version to v1.13.0-rc.3
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-15 09:26:19 +02:00
Evan Lezar
a082413d09 Merge branch 'trigger-ci-on-mrs-only' into 'main'
Add workflow rule to only trigger on MRs

See merge request nvidia/container-toolkit/container-toolkit!342
2023-03-15 07:10:30 +00:00
Evan Lezar
280f40508e Make pipeline manual on MRs
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-15 08:51:18 +02:00
Evan Lezar
e2be0e2ff0 Add workflow rule to only trigger on MRs
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-15 08:45:26 +02:00
Evan Lezar
dcff3118d9 Merge branch 'update-libnvidia-container' into 'main'
Update libnvidia-container

See merge request nvidia/container-toolkit/container-toolkit!340
2023-03-14 13:54:11 +00:00
Evan Lezar
731168ec8d Update changelog
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-14 15:05:36 +02:00
Evan Lezar
7b4435a0f8 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-14 15:05:29 +02:00
Evan Lezar
738af29724 Merge branch 'explicit-cdi-enabled-flag' into 'main'
Add --cdi-enabled option to control generating CDI spec

See merge request nvidia/container-toolkit/container-toolkit!339
2023-03-14 07:00:30 +00:00
Evan Lezar
08ef242afb Add --cdi-enabled option to control generating CDI spec
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-13 18:19:00 +02:00
Evan Lezar
92ea8be309 Merge branch 'fix-privileged-check-cdi-mode' into 'main'
Return empty list of devices for unprivileged containers when...

See merge request nvidia/container-toolkit/container-toolkit!337
2023-03-13 07:36:25 +00:00
Christopher Desiniotis
48414e97bb Return empty list of devices for unprivileged containers when 'accept-nvidia-visible-devices-envvar-unprivileged=false'
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-03-10 13:11:29 -08:00
Evan Lezar
77a2975524 Merge branch 'fix-kitmaker' into 'main'
Use component name as folder name

See merge request nvidia/container-toolkit/container-toolkit!336
2023-03-10 13:57:24 +00:00
Evan Lezar
ce9477966d Use component name as folder name
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-10 15:51:36 +02:00
Evan Lezar
fe02351c3a Merge branch 'bump-cuda-version' into 'main'
Bump CUDA base image version to 12.1.0

See merge request nvidia/container-toolkit/container-toolkit!335
2023-03-10 10:23:30 +00:00
Evan Lezar
9c2018a0dc Bump CUDA base image version to 12.1.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-10 11:31:23 +02:00
Evan Lezar
33e5b34fa1 Merge branch 'CNT-3999/legacy-cli-doesnt-work-in-cdi-mode' into 'main'
Add nvidia-container-runtime-hook.skip-mode-detection option to config

See merge request nvidia/container-toolkit/container-toolkit!330
2023-03-09 19:18:16 +00:00
Evan Lezar
ccf73f2505 Set skip-mode-detection in the toolkit-container by default
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-09 20:16:10 +02:00
Evan Lezar
3a11f6ee0a Add nvidia-container-runtime-hook.skip-mode-detection option to config
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-09 20:15:40 +02:00
Evan Lezar
8f694bbfb7 Merge branch 'set-nvidia-ctk-path' into 'main'
Set nvidia-ctk.path config option based on installed path

See merge request nvidia/container-toolkit/container-toolkit!334
2023-03-09 16:44:13 +00:00
Evan Lezar
4c2eff4865 Merge branch 'CNT-3998/cdi-accept-visible-devices-when-privileged' into 'main'
Honor accept-nvidia-visible-devices-envvar-when-unprivileged setting in CDI mode

See merge request nvidia/container-toolkit/container-toolkit!331
2023-03-09 15:59:08 +00:00
Evan Lezar
1fbdc17c40 Set nvidia-ctk.path config option based on installed path
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-09 17:53:08 +02:00
Evan Lezar
965d62f326 Merge branch 'fix-containerd-integration-tests' into 'main'
Fix integration tests failing due to CDI spec generation

See merge request nvidia/container-toolkit/container-toolkit!333
2023-03-09 14:41:52 +00:00
Evan Lezar
25ea7fa98e Remove whitespace in Makefile
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-09 15:32:07 +02:00
Evan Lezar
5ee040ba95 Disable CDI spec generation for integration tests 2023-03-09 15:32:07 +02:00
Evan Lezar
eb2aec9da8 Allow CDI options to be set by envvars
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-09 12:25:05 +02:00
Evan Lezar
973e7bda5e Check accept-nvidia-visible-devices-envvar-when-unprivileged option for CDI
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-09 11:15:53 +02:00
Evan Lezar
154cd4ecf3 Add to config struct
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-09 11:15:53 +02:00
Evan Lezar
936fad1d04 Move check for privileged images to config/image/ package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-09 11:15:53 +02:00
Evan Lezar
86dd046c7c Merge branch 'CNT-3928/allow-cdi-container-annotations' into 'main'
Add cdi.k8s.io annotations to runtimes configured in containerd

See merge request nvidia/container-toolkit/container-toolkit!315
2023-03-09 07:52:37 +00:00
Evan Lezar
510fb248fe Add cdi.k8s.io annotations to containerd config
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-08 07:23:27 +02:00
Evan Lezar
c7384c6aee Merge branch 'fix-comment' into 'main'
Fix comment

See merge request nvidia/container-toolkit/container-toolkit!329
2023-03-08 05:15:38 +00:00
Evan Lezar
1c3c9143f8 Fix comment
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-08 07:15:05 +02:00
Evan Lezar
1c696b1e39 Merge branch 'CNT-3894/configure-mode-specific-runtimes' into 'main'
Configure .cdi and .legacy executables in Toolkit Container

See merge request nvidia/container-toolkit/container-toolkit!308
2023-03-08 05:12:50 +00:00
Evan Lezar
a2adbc1133 Merge branch 'CNT-3898/improve-cdi-annotations' into 'main'
Improve handling of environment variable devices in CDI mode

See merge request nvidia/container-toolkit/container-toolkit!321
2023-03-08 04:37:41 +00:00
Evan Lezar
36576708f0 Merge branch 'CNT-3896/gds-mofed-devices' into 'main'
Add GDS and MOFED support to the NVCDI API

See merge request nvidia/container-toolkit/container-toolkit!323
2023-03-08 04:36:55 +00:00
Evan Lezar
cc7a6f166b Handle case were runtime name is set to predefined name
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-07 20:59:56 +02:00
Evan Lezar
62d88e7c95 Add cdi and legacy mode runtimes
This change adds .cdi and .legacy mode-specific runtimes the list of
runtimes supported by the operator. These are also installed as
part of the NVIDIA Container Toolkit.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-07 20:59:55 +02:00
Evan Lezar
dca8e3123f Migrate containerd config to engine.Interface
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-07 20:59:55 +02:00
Evan Lezar
3bac4fad09 Migrate cri-o config update to engine.Interface
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-07 20:59:54 +02:00
Evan Lezar
9fff19da23 Migrate docker config to engine.Interface
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-07 20:59:54 +02:00
Evan Lezar
e5bb4d2718 Move runtime config code from config to config/engine
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-07 20:59:54 +02:00
Evan Lezar
5bfb51f801 Add API for interacting with runtime engine configs
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-07 20:59:53 +02:00
Evan Lezar
ece5b29d97 Add tools/container/operator package to handle runtime naming
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-07 20:59:53 +02:00
Evan Lezar
ec8a92c17f Use nvidia-container-runtime.experimental as wrapper
This change switches to using nvidia-container-runtime.experimental as the
wrapper name over nvidia-container-runtime-experimental. This is consistent
with upcoming mode-specific binaries.

The wrapper is created at nvidia-container-runtime.experimental.real.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-07 20:59:53 +02:00
Evan Lezar
868393b7ed Add mofed mode to nvcdi API
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-07 18:47:52 +02:00
Evan Lezar
ebe18fbb7f Add gds mode to nvcdi API
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-07 18:47:52 +02:00
Evan Lezar
9435343541 Merge branch 'fix-kitmaker' into 'main'
Include = when extracting manifest information

See merge request nvidia/container-toolkit/container-toolkit!327
2023-03-07 14:44:27 +00:00
Evan Lezar
1cd20afe4f Include = when extracting manifest information
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-07 16:43:49 +02:00
Evan Lezar
1e6fe40c76 Allow nvidia-container-runtime.modes.cdi.default-kind to be set
This change allows the nvidia-container-runtime.modes.cdi.default-kind
to be set in the toolkit-container.

The NVIDIA_CONTAINER_RUNTIME_MODES_CDI_DEFAULT_KIND envvar is used.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-07 16:19:38 +02:00
Evan Lezar
6d220ed9a2 Rework selection of devices in CDI mode
The following changes are made:
* The default-cdi-kind config option is used to convert an envvar entry to a fully-qualified device name
* If annotation devices exist, these are used instead of the envvar devices.
* The `all` device is no longer treated as a special case and MUST exist in the CDI spec.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-07 16:18:53 +02:00
Evan Lezar
f00439c93e Add nvidia-container-runtime.modes.csv.default-kind config option
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-07 16:18:53 +02:00
Evan Lezar
c59696e30e Merge branch 'fix-kitmaker' into 'main'
Log source file

See merge request nvidia/container-toolkit/container-toolkit!326
2023-03-06 16:26:43 +00:00
Evan Lezar
89c18c73cd Add source and log curl command
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-06 18:26:05 +02:00
Evan Lezar
cb5006c73f Merge branch 'CNT-3897/generate-management-container-spec' into 'main'
Generate CDI specs for management containers

See merge request nvidia/container-toolkit/container-toolkit!314
2023-03-06 16:23:13 +00:00
Evan Lezar
547b71f222 Merge branch 'change-discovery-mode' into 'main'
Rename --discovery-mode to --mode

See merge request nvidia/container-toolkit/container-toolkit!318
2023-03-06 16:21:22 +00:00
Evan Lezar
ae84bfb055 Log source file
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-06 18:11:12 +02:00
Evan Lezar
9b303d5b89 Merge branch 'fix-changelist' into 'main'
Strip on tilde for kitmaker version

See merge request nvidia/container-toolkit/container-toolkit!325
2023-03-06 14:10:54 +00:00
Evan Lezar
d944f934d7 Strip on tilde for kitmaker version
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-06 16:10:25 +02:00
Evan Lezar
c37209cd09 Merge branch 'fix-changelist' into 'main'
Fix blank changelist in kitmaker properties

See merge request nvidia/container-toolkit/container-toolkit!324
2023-03-06 13:51:19 +00:00
Evan Lezar
863b569a61 Fix blank changelist in kitmaker properties
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-06 15:50:38 +02:00
Evan Lezar
f36c514f1f Merge branch 'update-kitmaker-folders' into 'main'
Update kitmaker target folder

See merge request nvidia/container-toolkit/container-toolkit!313
2023-03-06 11:16:49 +00:00
Evan Lezar
3ab28c7fa4 Merge branch 'fix-rule-for-release' into 'main'
Run full build on release- branches

See merge request nvidia/container-toolkit/container-toolkit!320
2023-03-06 10:56:58 +00:00
Evan Lezar
c03258325b Run full build on release- branches
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-06 12:54:27 +02:00
Evan Lezar
20d3bb189b Rename --discovery-mode to --mode
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-06 11:00:22 +02:00
Evan Lezar
90acec60bb Skip CDI spec generation in integration tests
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-06 10:57:40 +02:00
Evan Lezar
0565888c03 Generate CDI spec in toolkit container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-06 10:57:40 +02:00
Evan Lezar
f7e817cff6 Support management mode in nvidia-ctk cdi generate
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-06 10:53:43 +02:00
Evan Lezar
29cbbe83f9 Add management mode to CDI spec generation API
These changes add support for generating a management spec to the nvcdi API.
A management spec consists of a single CDI device (`all`) which includes all expected
NVIDIA device nodes, driver libraries, binaries, and IPC sockets.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-06 10:53:43 +02:00
Evan Lezar
64b16acb1f Also install nvidia-ctk in toolkit-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-06 10:53:43 +02:00
Evan Lezar
19c20bb422 Merge branch 'CNT-3931/add-spec-validation' into 'main'
Add nvcdi.spec for writing and validating CDI specifications

See merge request nvidia/container-toolkit/container-toolkit!306
2023-03-06 08:52:56 +00:00
Evan Lezar
28b10d2ee0 Merge branch 'fix-toolkit-ctr-envvars' into 'main'
Fix handling of envvars in toolkit container which modify the NVIDIA Container Runtime config

See merge request nvidia/container-toolkit/container-toolkit!317
2023-03-06 07:36:03 +00:00
Christopher Desiniotis
1f5123f72a Fix handling of envvars in toolkit container which modify the NVIDIA Container Runtime config
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-03-05 20:14:04 -08:00
Evan Lezar
ac5b6d097b Use kitmaker folder for releases
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-01 17:27:07 +02:00
Evan Lezar
a7bf9ddf28 Update kitmaker folder structure
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-01 17:27:07 +02:00
Evan Lezar
e27479e170 Add GIT_COMMIT_SHORT to packaging image manifest
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-01 17:27:07 +02:00
Evan Lezar
fa28e738c6 Merge branch 'fix-internal-scans' into 'main'
Fix internal scans

See merge request nvidia/container-toolkit/container-toolkit!316
2023-03-01 15:26:27 +00:00
Evan Lezar
898c5555f6 Fix internal scans
This fixes the internal scans due to the removed ubuntu18.04 images.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-01 17:25:28 +02:00
Evan Lezar
314059fcf0 Move path manipulation to spec.Save
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-01 13:49:04 +02:00
Evan Lezar
221781bd0b Use full path for output spec
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-01 13:48:28 +02:00
Evan Lezar
9f5e141437 Expose vendor and class as options
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-01 13:48:28 +02:00
Evan Lezar
8be6de177f Move formatJSON and formatYAML to nvcdi/spec package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-01 13:48:28 +02:00
Evan Lezar
890a519121 Use nvcdi.spec package to write and validate spec
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-01 13:48:28 +02:00
Evan Lezar
89321edae6 Add top-level GetSpec function to nvcdi API
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-01 13:48:28 +02:00
Evan Lezar
6d6cd56196 Return nvcdi.spec.Interface from GetSpec
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-01 12:45:30 +02:00
Evan Lezar
2e95e04359 Add nvcdi.spec package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-01 12:45:30 +02:00
Evan Lezar
accba4ead5 Merge branch 'CNT-3965/clean-up-by-path-symlinks' into 'main'
Improve handling of /dev/dri devices and nested device paths

See merge request nvidia/container-toolkit/container-toolkit!307
2023-03-01 10:25:48 +00:00
Christopher Desiniotis
1e9b7883cf Merge branch 'CNT-3937/add-target-driver-root' into 'main'
Add a driver root transformer to nvcdi

See merge request nvidia/container-toolkit/container-toolkit!300
2023-02-28 18:04:29 +00:00
Christopher Desiniotis
87e406eee6 Update root transformer tests to ensure container path is not modified
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-02-28 09:00:05 -08:00
Christopher Desiniotis
45ed3b0412 Handle hook arguments for creation of symlinks
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-02-28 09:00:02 -08:00
Christopher Desiniotis
0516fc96ca Add Transform interface and initial implemention for a root transform
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-02-28 08:56:13 -08:00
Evan Lezar
e7a435fd5b Merge branch 'update-libnvidia-container' into 'main'
Update libnvidia-container

See merge request nvidia/container-toolkit/container-toolkit!312
2023-02-27 13:41:26 +00:00
Evan Lezar
7a249d7771 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-27 15:41:02 +02:00
Evan Lezar
7986ff9cee Merge branch 'CNT-3963/deduplicate-wsl-driverstore-paths' into 'main'
Deduplicate WSL driverstore paths

See merge request nvidia/container-toolkit/container-toolkit!304
2023-02-27 13:27:31 +00:00
Evan Lezar
b74c13d75f Merge branch 'fix-rpm-postun-scriptlet' into 'main'
nvidia-container-toolkit.spec: fix syntax error in postun scriptlet

See merge request nvidia/container-toolkit/container-toolkit!309
2023-02-27 12:36:49 +00:00
Evan Lezar
de8eeb87f4 Merge branch 'remove-outdated-platforms' into 'main'
Remove outdated platforms from CI

See merge request nvidia/container-toolkit/container-toolkit!310
2023-02-27 11:48:33 +00:00
Evan Lezar
36c4174de3 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-27 13:45:44 +02:00
Evan Lezar
3497936cdf Remove ubuntu18.04 toolkit-container image
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-27 12:55:17 +02:00
Evan Lezar
81abc92743 Remove fedora35 from 'all' targets
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-27 12:31:38 +02:00
Evan Lezar
1ef8dc3137 Remove centos7-ppc64le from CI
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-27 12:30:29 +02:00
Evan Lezar
9a5c1bbe48 Remove ubuntu16.04 packages from CI
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-27 12:29:35 +02:00
Evan Lezar
30dff61376 Remove debian9 packages from CI
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-27 12:28:46 +02:00
Claudius Volz
de1bb68d19 nvidia-container-toolkit.spec: fix syntax error in postun scriptlet
Signed-off-by: Claudius Volz <c.volz@gmx.de>
2023-02-27 00:45:21 +01:00
Evan Lezar
06d8bb5019 Merge branch 'CNT-3965/dont-fail-chmod-hook' into 'main'
Skip paths with errors in chmod hook

See merge request nvidia/container-toolkit/container-toolkit!303
2023-02-22 15:20:26 +00:00
Evan Lezar
b4dc1f338d Generate nested device folder permission hooks per device
This change generates device folder permission hooks per device instead of
at a spec level. This ensures that the hook is not injected for a device that
does not have any nested device nodes.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-22 17:16:23 +02:00
Evan Lezar
181128fe73 Only include by-path-symlinks for injected device nodes
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-22 16:53:04 +02:00
Evan Lezar
252838e696 Merge branch 'bump-version-v1.13.0-rc.2' into 'main'
Bump version to v1.13.0-rc.2

See merge request nvidia/container-toolkit/container-toolkit!305
2023-02-21 13:11:00 +00:00
Evan Lezar
49f171a8b1 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-21 14:27:02 +02:00
Evan Lezar
3d12803ab3 Bump version to v1.13.0-rc.2
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-21 14:24:37 +02:00
Evan Lezar
a168091bfb Add v1.13.0-rc.1 Changelog
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-21 14:23:52 +02:00
Evan Lezar
35fc57291f Deduplicate WSL driverstore paths
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-21 11:48:56 +02:00
Evan Lezar
2542224d7b Skip paths with errors in chmod hook
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-21 11:47:11 +02:00
Evan Lezar
882fbb3209 Merge branch 'add-cdi-auto-mode' into 'main'
Add constants for CDI mode to nvcdi API

See merge request nvidia/container-toolkit/container-toolkit!302
2023-02-20 14:41:07 +00:00
Evan Lezar
2680c45811 Add mode constants to nvcdi
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-20 16:33:51 +02:00
Evan Lezar
b76808dbd5 Add tests for CDI mode resolution
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-20 16:33:33 +02:00
Evan Lezar
ba50b50a15 Merge branch 'add-cdi-auto-mode' into 'main'
Add auto mode to CDI spec generation

See merge request nvidia/container-toolkit/container-toolkit!292
2023-02-20 14:30:33 +00:00
Evan Lezar
f6d3f8d471 Merge branch 'CNT-3895/add-runtime-mode-config' into 'main'
Add nvidia-container-runtime.mode config option

See merge request nvidia/container-toolkit/container-toolkit!299
2023-02-20 12:51:18 +00:00
Evan Lezar
d9859d66bf Update go vendoring
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-20 14:49:58 +02:00
Evan Lezar
4ccb0b9a53 Add and resolve auto discovery mode for cdi generation
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-20 14:49:58 +02:00
Evan Lezar
f36c775d50 Merge branch 'wsl2-wip' into 'main'
Add CDI Spec generation on WSL2

See merge request nvidia/container-toolkit/container-toolkit!289
2023-02-20 09:36:41 +00:00
Evan Lezar
b21dc929ef Add WSL2 discovery and spec generation
These changes add a wsl discovery mode to the nvidia-ctk cdi generate command.

If wsl mode is enabled, the driver store for the available devices is used as
the source for discovered entities.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-20 10:30:13 +02:00
Evan Lezar
d226925fe7 Construct nvml-based CDI lib based on mode
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-20 10:30:13 +02:00
Evan Lezar
20d6e9af04 Add --discovery-mode to nvidia-ctk cdi generate command
This change adds --discovery-mode flag to the nvidia-ctk cdi generate
command and plumbs this through to the CDI API.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-20 10:30:13 +02:00
Evan Lezar
5103adab89 Add mode option to nvcdi API
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-20 10:30:13 +02:00
Evan Lezar
7eb435eb73 Add basic dxcore bindings
This change copies dxcore.h and dxcore.c from libnvidia-container to
allow for the driver store path to be queried. Modifications are made
to dxcore to remove the code associated with checking the components
in the driver store path.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-20 10:30:13 +02:00
Evan Lezar
5d011c1333 Add Discoverer to create a single symlink
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-20 10:30:13 +02:00
Evan Lezar
6adb792d57 Merge branch 'fix-nvidia-ctk-path' into 'main'
Ensure that generate uses a consistent nvidia-ctk path

See merge request nvidia/container-toolkit/container-toolkit!301
2023-02-20 08:29:44 +00:00
Evan Lezar
a844749791 Ensure that generate uses a consistent nvidia-ctk path
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-20 10:28:45 +02:00
Evan Lezar
dd0d43e726 Add nvidia-container-runtime.mode config option
This change allows the nvidia-container-runtime.mode option to be set
by the toolkit container.

This is controlled by the --nvidia-container-runtime-mode command line
argument and the NVIDIA_CONTAINER_RUNTIME_MODE envvar.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-17 18:04:49 +02:00
Evan Lezar
25811471fa Merge branch 'update-libnvidia-container' into 'main'
Update libnvidia-container

See merge request nvidia/container-toolkit/container-toolkit!298
2023-02-17 08:46:56 +00:00
Evan Lezar
569bc1a889 Update Changelog
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-17 10:46:21 +02:00
Evan Lezar
b1756b410a Merge branch 'fix-logging' into 'main'
Fix nvidia-container-runtime logging

See merge request nvidia/container-toolkit/container-toolkit!296
2023-02-16 15:17:24 +00:00
Evan Lezar
7789ac6331 Fix logger.Update and Reset
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-16 15:22:56 +01:00
Evan Lezar
7a3aabbbda Add logger test
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-16 15:22:56 +01:00
Evan Lezar
e486095603 Merge branch 'fix-nvidia-ctk-path' into 'main'
Fix issue with blank nvidia-ctk path

See merge request nvidia/container-toolkit/container-toolkit!297
2023-02-16 13:58:43 +00:00
Evan Lezar
bf6babe07e Fix issue with blank nvidia-ctk path
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-16 14:18:07 +01:00
Kevin Klues
d5a4d89682 Merge branch 'support-multimple-firmware-files' into 'main'
Add globbing for mounting multiple GSP firmware files

See merge request nvidia/container-toolkit/container-toolkit!295
2023-02-16 13:09:47 +00:00
Kevin Klues
5710b9e7e8 Add globbing for mounting multiple GSP firmware files
Newer drivers have split the GSP firmware into multiple files so a simple match
against gsp.bin in the firmware directory is no longer possible. This patch
adds globbing capabilitis to match any GSP firmware files of the form gsp*.bin
and mount them all into the container.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-02-16 11:53:36 +00:00
Evan Lezar
b4ab95f00c Merge branch 'fix-nvcdi-constructor' into 'main'
fix: apply options when constructing an instance of the nvcdi library

See merge request nvidia/container-toolkit/container-toolkit!294
2023-02-15 08:13:19 +00:00
Christopher Desiniotis
a52c9f0ac6 fix: apply options when constructing an instance of the nvcdi library
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-02-14 16:32:40 -08:00
Evan Lezar
b6bab4d3fd Merge branch 'expose-generate-spec' into 'main'
Implement basic CDI spec generation API

See merge request nvidia/container-toolkit/container-toolkit!257
2023-02-14 19:36:31 +00:00
Evan Lezar
5b110fba2d Add nvcdi package with basic CDI generation API
This change adds an nvcdi package that exposes a basic API for
CDI spec generation. This is used from the nvidia-ctk cdi generate
command and can be consumed by DRA implementations and the device plugin.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-14 19:52:31 +01:00
Evan Lezar
179133c8ad Merge branch 'fix-ubi8' into 'main'
Fix package version in ubi8 container builds

See merge request nvidia/container-toolkit/container-toolkit!293
2023-02-14 10:31:21 +00:00
Evan Lezar
365b6c7bc2 Fix package version in ubi8 container builds
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-14 10:50:38 +01:00
Evan Lezar
dc4887cd44 Merge branch 'cdi-executable' into 'main'
Add nvidia-container-runtime.{{MODE}} executable that overrides runtime mode

See merge request nvidia/container-toolkit/container-toolkit!288
2023-02-14 08:01:41 +00:00
Evan Lezar
c4836a576f Also skip nvidia-container-toolit-operator-extensions in release scripts
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-13 16:10:01 +01:00
Evan Lezar
98afe0d27a Generate nvidia-container-toolkit-operator-extensions package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-13 16:09:46 +01:00
Evan Lezar
fdc759f7c2 Add nvidia-container-runtime.legacy executable
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-13 16:09:46 +01:00
Evan Lezar
43448bac11 Add nvidia-container-runtime.cdi executable
This change adds an nvidia-container-runtime.cdi executable that
overrides the runtime mode from the config to "cdi".

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-13 16:09:46 +01:00
Evan Lezar
456d2864a6 Log config in JSON if possible
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-13 16:09:46 +01:00
Evan Lezar
406a5ec76f Implement runtime package for creating runtime CLI
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-13 16:09:46 +01:00
Evan Lezar
f71c419cfb Move modifying OCI runtime wrapper to oci package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-13 16:09:46 +01:00
Evan Lezar
babb73295f Update gitignore
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-13 16:09:45 +01:00
Evan Lezar
f3ec5fd329 Merge branch 'packaging-verisons' into 'main'
Align release candidate RPM version with Debian version

See merge request nvidia/container-toolkit/container-toolkit!291
2023-02-13 14:53:01 +00:00
Evan Lezar
5aca0d147d Use - as version-tag separator for libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-13 15:10:08 +01:00
Evan Lezar
f2b19b6ae9 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-13 14:39:14 +01:00
Evan Lezar
7cb9ed66be Align release candidate RPM version with Debian version
The version for RPM release candidates has the form `1.13.0-0.1.rc.1-1` whereas debian packages have the form `1.13.0~rc.1-1`.

Note that since the `~` is handled in [the same way](https://docs.fedoraproject.org/en-US/packaging-guidelines/Versioning/#_handling_non_sorting_versions_with_tilde_dot_and_caret) as for Debian packages, there does not seem to be a specific reason for this and dealing with multiple version strings in our entire pipeline adds complexity.

This change aligns the package versioning for rpm packages with Debian packages.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-13 14:31:23 +01:00
Evan Lezar
d578f4598a Remove fedora35 pipeline targets
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-13 14:31:23 +01:00
Evan Lezar
d30e6c23ab Merge branch 'update-ldflags' into 'main'
Update ldflags for cgo

See merge request nvidia/container-toolkit/container-toolkit!290
2023-02-10 14:17:53 +00:00
Evan Lezar
1c05f2fb9a Merge branch 'add-options-to-mounts' into 'main'
Add Options to mounts to refactor IPC CDI spec generation

See merge request nvidia/container-toolkit/container-toolkit!287
2023-02-10 08:04:24 +00:00
Evan Lezar
1407ace94a Update ldflags for cgo
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-08 21:54:49 +01:00
Evan Lezar
97008f2db6 Move IPC discoverer into DriverDiscoverer
This simplifies the construction of the required common edits
when constructing a CDI specification.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-08 09:06:07 +01:00
Evan Lezar
076eed7eb4 Update ipcMount to add noexec option
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-08 09:06:07 +01:00
Evan Lezar
33c7b056ea Add ipcMounts type
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-08 09:06:07 +01:00
Evan Lezar
3b8c40c3e6 Move IPC discoverer to internal/discover package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-08 09:06:07 +01:00
Evan Lezar
3f70521a63 Add Options to discover.Mount
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-08 09:06:07 +01:00
Evan Lezar
21f5895b5a Merge branch 'bump-version-1.13.0-rc.1' into 'main'
Bump version to 1.13.0-rc.1

See merge request nvidia/container-toolkit/container-toolkit!286
2023-02-07 11:38:18 +00:00
Evan Lezar
738a2e7343 Bump version to 1.13.0-rc.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-07 11:55:47 +01:00
Evan Lezar
62bd015475 Merge branch 'bump-version-v1.12.0' into 'main'
Bump version to v1.12.0

See merge request nvidia/container-toolkit/container-toolkit!285
2023-02-03 14:07:30 +00:00
Evan Lezar
ac5c62c116 Bump CUDA base images to 12.0.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-03 14:25:42 +01:00
Evan Lezar
80fe1065ad Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-03 14:23:49 +01:00
Evan Lezar
fea195cc8d Bump version to v1.12.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-03 13:58:26 +01:00
Evan Lezar
9ef314e1e3 Merge branch 'rename-root-flag' into 'main'
Rename root to driverRoot for CDI generation

See merge request nvidia/container-toolkit/container-toolkit!284
2023-02-02 16:33:24 +00:00
Evan Lezar
95f859118b Update changelog
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-02 15:58:00 +01:00
Evan Lezar
daceac9117 Rename discover.Config.Root to discover.Config.DriverRoot
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-02 15:57:15 +01:00
Evan Lezar
cfa2647260 Rename root to driverRoot for CDI generation
This makes the intent of the command line argument clearer since this
relates specifically to the root where the NVIDIA driver is installed.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-02 15:42:04 +01:00
Evan Lezar
03cdf3b5d7 Merge branch 'bump-version-v1.12.0-rc.6' into 'main'
Bump version to v1.12.0-rc.6

See merge request nvidia/container-toolkit/container-toolkit!283
2023-02-02 13:50:25 +00:00
Evan Lezar
f8f415a605 Ensure container-archive name is unique
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-02 12:03:34 +01:00
Evan Lezar
fe117d3916 Udpate libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-02 12:02:57 +01:00
Evan Lezar
069536d598 Bump version to v1.12.0-rc.6
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-02 12:00:08 +01:00
Evan Lezar
5f53ca0af5 Add missing v1.12.0-rc.5 changelog entry
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-02 11:58:44 +01:00
Evan Lezar
9a06768863 Merge branch 'fix-nvidia-ctk-path' into 'main'
Only use configured nvidia-ctk path if it is a full path

See merge request nvidia/container-toolkit/container-toolkit!281
2023-02-01 11:42:09 +00:00
Evan Lezar
0c8379f681 Fix nvidia-ctk path for update ldcache hook
This change ensures that the update-ldcache hook is created in a manner
consistent with other nvidia-ctk hooks ensuring that a full path is
used.

Without this change the update-ldcache hook on Tegra-based sytems had an
invalid path.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-01 12:00:23 +01:00
Evan Lezar
92dc0506fe Add hook path to logger output
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-01 12:00:23 +01:00
Evan Lezar
7045a223d2 Only use configured nvidia-ctk path if it is a full path
If this is not done, the default config which sets the nvidia-ctk.path
option as "nvidia-ctk" will result in an invalid OCI spec if a hook is
injected. This change ensures that the path used is always an absolute
path as required by the hook spec.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-01 12:00:23 +01:00
Evan Lezar
763e4936cd Merge branch 'fix-kitmaker' into 'main'
Add additional build args to manifest

See merge request nvidia/container-toolkit/container-toolkit!279
2023-02-01 09:48:08 +00:00
Evan Lezar
f0c7491029 Use 'main' as branch component in kitmaker archive
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-01 04:54:59 +01:00
Evan Lezar
ba5c4b2831 Use package version as version
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-01 04:54:59 +01:00
Evan Lezar
9c73438682 Add additional build args to manifest
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-01 04:54:59 +01:00
Evan Lezar
37f7337d2b Merge branch 'bump-version-1.12.0-rc.5' into 'main'
Bump version to 1.12.0-rc.5

See merge request nvidia/container-toolkit/container-toolkit!280
2023-02-01 03:18:26 +00:00
Evan Lezar
98285c27ab Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-01 04:17:42 +01:00
Evan Lezar
5750881cea Bump version to 1.12.0-rc.5
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-01 04:08:57 +01:00
Evan Lezar
95ca1c2e50 Merge branch 'fix-git-branch' into 'main'
Fix git branch shell command

See merge request nvidia/container-toolkit/container-toolkit!277
2023-01-31 14:08:35 +00:00
Evan Lezar
e4031ced39 Fix git branch shell command
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-31 15:08:12 +01:00
Evan Lezar
7f6d21c53b Merge branch 'fix-kitmaker' into 'main'
Fix GIT_BRANCH command

See merge request nvidia/container-toolkit/container-toolkit!276
2023-01-31 13:16:10 +00:00
Evan Lezar
846ac347fe Fix GIT_BRANCH command
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-31 14:15:48 +01:00
Evan Lezar
50afd443fc Merge branch 'fix-libraries-cdi' into 'main'
Fix relative link resolution for ldcache

See merge request nvidia/container-toolkit/container-toolkit!275
2023-01-31 13:07:35 +00:00
Evan Lezar
14bcebd8b7 Fix relative link resolution for ldcache
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-31 13:51:48 +01:00
Evan Lezar
d091d3c7f4 Merge branch 'fix-kitmaker' into 'main'
Ensure git is available in image build step

See merge request nvidia/container-toolkit/container-toolkit!274
2023-01-31 11:20:40 +00:00
Evan Lezar
eb0ef8ab31 Ensure git is available in image build step
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-31 12:20:11 +01:00
Evan Lezar
9c5c12a1bc Merge branch 'kitmaker-update' into 'main'
Update the process for publishing packages to kitmaker

See merge request nvidia/container-toolkit/container-toolkit!271
2023-01-30 18:37:42 +00:00
Evan Lezar
8b197b27ed Rework the upload of archives to kitmaker.
This change simplifies how Kitmaker archives are constructed.

Currently only centos8 and ubuntu18.04 packages are included.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-30 14:52:16 +01:00
Evan Lezar
8c57e55b59 Add additional information to the manifest
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-30 14:52:16 +01:00
Evan Lezar
6d1639a513 Merge branch 'set-default-to-index' into 'main'
Use device index as CDI device names by default

See merge request nvidia/container-toolkit/container-toolkit!273
2023-01-30 13:22:36 +00:00
Evan Lezar
5e6f72e8f4 Update changelog
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-30 13:40:25 +01:00
Evan Lezar
707e3479f8 Fix lint errors
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-30 13:39:57 +01:00
Evan Lezar
201232dae3 Add logging of minimum CDI version
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-30 13:39:08 +01:00
Evan Lezar
f768bb5783 Use device index as CDI device names by default
This change uses the `index` mode for the --device-name-strategy when
generating CDI specifications by default. This generates device names
such as nvidia.com/gpu=0 or nvidia.com/gpu=1:0 by default.

Note that this requires a CDI spec version of 0.5.0 and for consumers
(e.g. podman) that are only compatible with older versions one of the
other stragegies (`type-index` or `uuid`) should be used instead to
generate a v0.3.0 or v0.4.0 specification.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-30 13:36:17 +01:00
Evan Lezar
f0de3ccd9c Merge branch 'CNT-3718/allow-device-name-to-be-controlled' into 'main'
Add --device-name-strategy flag for CDI spec generation

See merge request nvidia/container-toolkit/container-toolkit!269
2023-01-30 12:28:38 +00:00
Evan Lezar
09e8d4c4f3 Merge branch 'move-dev-char-creation' into 'main'
Move `create-dev-char-symlinks` from `nvidia-ctk hook` to `nvidia-ctk system`

See merge request nvidia/container-toolkit/container-toolkit!272
2023-01-30 12:10:17 +00:00
Evan Lezar
8188400c97 Move create-dev-char-symlinks subcommand from hook to system
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-27 12:12:54 +01:00
Evan Lezar
962d38e9dd Add nvidia-ctk system subcommand
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-27 12:12:54 +01:00
Kevin Klues
9fc2c59122 Merge branch 'CNT-3845/add-dev-char-symlink' into 'main'
Add create-dev-char-symlinks hook

See merge request nvidia/container-toolkit/container-toolkit!267
2023-01-27 10:11:29 +00:00
Evan Lezar
540f4349f5 Update vendoring for nvpci
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-25 13:43:43 +01:00
Evan Lezar
1d7e419008 Add --create-all mode to creation of dev/char symlinks
This change adds a --create-all mode to the create-dev-char-symlinks hook.
This mode creates all POSSIBLE symlinks to device nodes for regular and cap
devices. With the number of GPUs inferred from the PCI device information.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-25 13:43:43 +01:00
Evan Lezar
95394e0fc8 Add internal/info/proc/devices package to read device majors
This change adds basic functionality to process the /proc/devices
file to extract device majors.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-25 13:43:43 +01:00
Evan Lezar
f9330a4c2c Add --watch option to create-dev-char-symlinks
This change adds a --watch option to the create-dev-char-symlinks hook. This
installs an fsnotify watcher that creates symlinks for ADDED device nodes under
/dev/char.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-25 13:43:43 +01:00
Evan Lezar
be0e4667a5 Add create-dev-char-symlinks hook
This change adds an nvidia-ctk hook create-dev-char-symlinks
subcommand that creates symlinks to device nodes (as required by
systemd) under /dev/char.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-25 13:43:43 +01:00
Evan Lezar
408eeae70f Allow locator to be marked as optional
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-25 10:38:11 +01:00
Evan Lezar
27c82c19ea Merge branch 'bump-version' into 'main'
Bump version to 1.12.0-rc.4

See merge request nvidia/container-toolkit/container-toolkit!270
2023-01-25 09:37:41 +00:00
Evan Lezar
937f3d0d78 Add changelog for v1.12.0-rc.3
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-25 10:37:24 +01:00
Evan Lezar
bc3cc71f90 Update libnvidia-container to 1.12.0-rc.4
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-25 10:23:54 +01:00
Evan Lezar
ad4531db1e Bump version to 1.12.0-rc.4
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-25 10:23:02 +01:00
Evan Lezar
e5d8d10d4f Merge branch 'CNT-3854/discover-first' into 'main'
Limit number of candidates for executables

See merge request nvidia/container-toolkit/container-toolkit!268
2023-01-23 17:43:27 +00:00
Evan Lezar
89bf81a9db Add --device-name-strategy flag for CDI spec generation
This change adds a --device-name-strategy flag for generating a CDI
specificaion. This allows a CDI spec to be generated with the following
names used for device:

* type-index: gpu0 and mig0:1
* index: 0 and 0:1
* uuid: GPU and MIG UUIDs

Note that the use of 'index' generates a v0.5.0 CDI specification since
this relaxes the restriction on the device names.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-20 16:17:32 +01:00
Evan Lezar
6237477ba3 Limit number of candidates for executables
This change ensures that the first match of an executable in the path
is retured instead of a list of candidates. This prevents a CDI spec,
for example, from containing multiple entries for a single executable
(e.g. nvidia-smi).

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-20 15:10:24 +01:00
Evan Lezar
6706024687 Merge branch 'fix-missing-nvidia-container-runtime-hook-1682' into 'main'
Avoid missing nvidia-container-runtime-hook during rpm update from <=1.10.0

See merge request nvidia/container-toolkit/container-toolkit!263
2023-01-19 16:32:58 +00:00
Evan Lezar
7649126248 Remove rpm-state directory instead of just single file. 2023-01-19 14:50:35 +00:00
Evan Lezar
104dca867f Merge branch 'fix-ldcache-list' into 'main'
Fix and refactor code related to reading LDCache

See merge request nvidia/container-toolkit/container-toolkit!266
2023-01-19 13:52:55 +00:00
Evan Lezar
881b1c0e08 introduce resolveSelected helper
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-19 14:10:55 +01:00
Evan Lezar
3537d76726 Further refactoring of ldcache code
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-19 14:10:36 +01:00
Evan Lezar
ccd1961c60 Ensure root is included in absolute ldcache paths
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-19 14:09:43 +01:00
Evan Lezar
f350f0c0bb Refactor resolving of links in ldcache
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-19 14:09:41 +01:00
Evan Lezar
80672d33af Continue instead of break on error when listing libraries
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-19 13:54:24 +01:00
Evan Lezar
7a1cfb48b9 Merge branch 'update-cdi' into 'main'
Determine the minumum required spec version

See merge request nvidia/container-toolkit/container-toolkit!265
2023-01-19 11:57:19 +00:00
Evan Lezar
ae3b213b0e Merge branch 'fix-cdi-library-container-path' into 'main'
Reuse mount discovery for driver libraries

See merge request nvidia/container-toolkit/container-toolkit!262
2023-01-19 11:52:21 +00:00
Evan Lezar
eaf9bdaeb4 Determine the minumum required spec version
This change uses functionality from the CDI package to determine
the minimum required CDI spec version. This allows for a spec with
the widest compatibility to be specified.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-19 12:14:00 +01:00
Evan Lezar
bc4bfb94a2 Update CDI package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-19 12:12:54 +01:00
Evan Lezar
a77331f8f0 Reuse mount discovery for driver libraries
This change implements the discovery of versioned driver libaries
by reusing the mounts and update ldcache discoverers use for, for example,
CVS file discovery. This allows the container paths to be correctly generated
without requiring specific manipulation.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-19 12:11:13 +01:00
Evan Lezar
94b7add334 Merge branch 'bugfix-nvidia-ctk' into 'main'
Fix nvidia-ctk path in spec generation

See merge request nvidia/container-toolkit/container-toolkit!264
2023-01-19 11:10:38 +00:00
Evan Lezar
9c9e6cd324 Fix nvidia-ctk path in spec generation
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-19 12:03:07 +01:00
Evan Lezar
f50efca73f Merge branch 'specify-nvidia-ctk-path' into 'main'
Make handling of nvidia-ctk path consistent

See merge request nvidia/container-toolkit/container-toolkit!261
2023-01-19 10:20:25 +00:00
Evan Lezar
19cfb2774d Use common code to construct nvidia-ctk hooks
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-19 10:37:10 +01:00
Evan Lezar
27347c98d9 Consolidate code to find nvidia-ctk
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-19 10:31:42 +01:00
Evan Lezar
ebbc47702d Remove 'Executable' from private struct member names
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-18 17:02:42 +01:00
Evan Lezar
09d42f0ad9 Remove 'Executable' from config struct member
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-18 17:02:42 +01:00
Evan Lezar
35df24d63a Make handling of nvidia-ctk path consistent
This change adds an --nvidia-ctk-path to the nvidia-ctk cdi generate
command. This ensures that the executable path for the generated
hooks can be specified consistently.

Since the NVIDIA Container Runtime already allows for the executable
path to be specified in the config the utility code to update the
LDCache and create other nvidia-ctk hooks are also updated.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-18 17:02:42 +01:00
Claudius Volz
f93b6a13f4 Preserve a temporary copy of nvidia-container-runtime-hook during rpm update,
to avoid being destroyed by when updating from <=1.10.0

Signed-off-by: Claudius Volz <c.volz@gmx.de>
2023-01-15 01:22:37 +01:00
Evan Lezar
50d7fb8f41 Merge branch 'missing-dra-devices' into 'main'
Ensure existence of DRM devices nodes is checked

See merge request nvidia/container-toolkit/container-toolkit!260
2022-12-13 12:42:04 +00:00
Evan Lezar
311e7a1feb Ensure existence of DRM devices nodes is checked
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-12-12 14:48:54 +01:00
1953 changed files with 607733 additions and 48083 deletions

View File

@@ -19,10 +19,10 @@ default:
variables:
GIT_SUBMODULE_STRATEGY: recursive
BUILDIMAGE: "${CI_REGISTRY_IMAGE}/build:${CI_COMMIT_SHORT_SHA}"
BUILD_MULTI_ARCH_IMAGES: "true"
stages:
- trigger
- image
- lint
- go-checks
@@ -33,92 +33,70 @@ stages:
- test
- scan
- release
- sign
.pipeline-trigger-rules:
rules:
# We trigger the pipeline if started manually
- if: $CI_PIPELINE_SOURCE == "web"
# We trigger the pipeline on the main branch
- if: $CI_COMMIT_BRANCH == "main"
# We trigger the pipeline on the release- branches
- if: $CI_COMMIT_BRANCH =~ /^release-.*$/
# We trigger the pipeline on tags
- if: $CI_COMMIT_TAG && $CI_COMMIT_TAG != ""
workflow:
rules:
# We trigger the pipeline on a merge request
- if: $CI_PIPELINE_SOURCE == 'merge_request_event'
# We then add all the regular triggers
- !reference [.pipeline-trigger-rules, rules]
# The main or manual job is used to filter out distributions or architectures that are not required on
# every build.
.main-or-manual:
rules:
- if: $CI_COMMIT_BRANCH == "main"
- if: $CI_COMMIT_TAG && $CI_COMMIT_TAG != ""
- !reference [.pipeline-trigger-rules, rules]
- if: $CI_PIPELINE_SOURCE == "schedule"
when: manual
# Define the distribution targets
.dist-amazonlinux2:
# The trigger-pipeline job adds a manualy triggered job to the pipeline on merge requests.
trigger-pipeline:
stage: trigger
script:
- echo "starting pipeline"
rules:
- !reference [.main-or-manual, rules]
variables:
DIST: amazonlinux2
PACKAGE_REPO_TYPE: rpm
- if: $CI_PIPELINE_SOURCE == "merge_request_event"
when: manual
allow_failure: false
- when: always
# Define the distribution targets
.dist-centos7:
rules:
- !reference [.main-or-manual, rules]
variables:
DIST: centos7
CVE_UPDATES: "cyrus-sasl-lib"
PACKAGE_REPO_TYPE: rpm
.dist-centos8:
variables:
DIST: centos8
CVE_UPDATES: "cyrus-sasl-lib"
PACKAGE_REPO_TYPE: rpm
.dist-debian10:
rules:
- !reference [.main-or-manual, rules]
variables:
DIST: debian10
PACKAGE_REPO_TYPE: debian
.dist-debian9:
rules:
- !reference [.main-or-manual, rules]
variables:
DIST: debian9
PACKAGE_REPO_TYPE: debian
.dist-fedora35:
rules:
- !reference [.main-or-manual, rules]
variables:
DIST: fedora35
PACKAGE_REPO_TYPE: rpm
.dist-opensuse-leap15.1:
rules:
- !reference [.main-or-manual, rules]
variables:
DIST: opensuse-leap15.1
PACKAGE_REPO_TYPE: rpm
.dist-ubi8:
rules:
- !reference [.main-or-manual, rules]
variables:
DIST: ubi8
CVE_UPDATES: "cyrus-sasl-lib"
PACKAGE_REPO_TYPE: rpm
.dist-ubuntu16.04:
rules:
- !reference [.main-or-manual, rules]
variables:
DIST: ubuntu16.04
PACKAGE_REPO_TYPE: debian
.dist-ubuntu18.04:
variables:
DIST: ubuntu18.04
CVE_UPDATES: "libsasl2-2 libsasl2-modules-db"
PACKAGE_REPO_TYPE: debian
.dist-ubuntu20.04:
rules:
- !reference [.main-or-manual, rules]
variables:
DIST: ubuntu20.04
CVE_UPDATES: "libsasl2-2 libsasl2-modules-db"
PACKAGE_REPO_TYPE: debian
.dist-packaging:
variables:
@@ -167,7 +145,7 @@ stages:
- docker login -u "${CI_REGISTRY_USER}" -p "${CI_REGISTRY_PASSWORD}" "${CI_REGISTRY}"
- docker pull "${IMAGE_NAME}:${VERSION}-${DIST}"
script:
- make -f build/container/Makefile test-${DIST}
- make -f deployments/container/Makefile test-${DIST}
# Define the test targets
test-packaging:
@@ -217,7 +195,7 @@ test-packaging:
# Since OUT_IMAGE_NAME and OUT_IMAGE_VERSION are set, this will push the CI image to the
# Target
- make -f build/container/Makefile push-${DIST}
- make -f deployments/container/Makefile push-${DIST}
# Define a staging release step that pushes an image to an internal "staging" repository
# This is triggered for all pipelines (i.e. not only tags) to test the pipeline steps
@@ -236,6 +214,8 @@ test-packaging:
.release:external:
extends:
- .release
variables:
FORCE_PUBLISH_IMAGES: "yes"
rules:
- if: $CI_COMMIT_TAG
variables:
@@ -245,13 +225,6 @@ test-packaging:
OUT_IMAGE_VERSION: "${DEVEL_RELEASE_IMAGE_VERSION}"
# Define the release jobs
release:staging-centos7:
extends:
- .release:staging
- .dist-centos7
needs:
- image-centos7
release:staging-ubi8:
extends:
- .release:staging
@@ -259,22 +232,15 @@ release:staging-ubi8:
needs:
- image-ubi8
release:staging-ubuntu18.04:
extends:
- .release:staging
- .dist-ubuntu18.04
needs:
- test-toolkit-ubuntu18.04
- test-containerd-ubuntu18.04
- test-crio-ubuntu18.04
- test-docker-ubuntu18.04
release:staging-ubuntu20.04:
extends:
- .release:staging
- .dist-ubuntu20.04
needs:
- image-ubuntu20.04
- test-toolkit-ubuntu20.04
- test-containerd-ubuntu20.04
- test-crio-ubuntu20.04
- test-docker-ubuntu20.04
release:staging-packaging:
extends:

3
.github/copy-pr-bot.yaml vendored Normal file
View File

@@ -0,0 +1,3 @@
# https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#configuration
enabled: true

120
.github/dependabot.yml vendored Normal file
View File

@@ -0,0 +1,120 @@
# Please see the documentation for all configuration options:
# https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates
version: 2
updates:
# main branch
- package-ecosystem: "gomod"
target-branch: main
directories:
- "/"
- "deployments/devel"
- "tests"
schedule:
interval: "daily"
labels:
- dependencies
groups:
k8sio:
patterns:
- k8s.io/*
exclude-patterns:
- k8s.io/klog/*
- package-ecosystem: "docker"
target-branch: main
directories:
# CUDA image
- "/deployments/container"
# Golang version
- "/deployments/devel"
schedule:
interval: "daily"
labels:
- dependencies
- package-ecosystem: "github-actions"
target-branch: main
directory: "/"
schedule:
interval: "daily"
labels:
- dependencies
# Allow dependabot to update the libnvidia-container submodule.
- package-ecosystem: "gitsubmodule"
target-branch: main
directory: "/"
allow:
- dependency-name: "third_party/libnvidia-container"
schedule:
interval: "daily"
labels:
- dependencies
- libnvidia-container
# The release branch(es):
- package-ecosystem: "gomod"
target-branch: release-1.17
directories:
- "/"
# We don't update development or test dependencies on release branches
# - "deployments/devel"
# - "tests"
schedule:
interval: "weekly"
day: "sunday"
labels:
- dependencies
- maintenance
ignore:
# For release branches we only consider patch updates.
- dependency-name: "*"
update-types:
- version-update:semver-major
- version-update:semver-minor
groups:
k8sio:
patterns:
- k8s.io/*
exclude-patterns:
- k8s.io/klog/*
- package-ecosystem: "docker"
target-branch: release-1.17
directories:
# CUDA image
- "/deployments/container"
# Golang version
- "/deployments/devel"
schedule:
interval: "weekly"
day: "sunday"
ignore:
# For release branches we only apply patch updates to the golang version.
- dependency-name: "*golang*"
update-types:
- version-update:semver-major
- version-update:semver-minor
labels:
- dependencies
- maintenance
- package-ecosystem: "github-actions"
target-branch: release-1.17
directory: "/"
schedule:
interval: "weekly"
day: "sunday"
labels:
- dependencies
- maintenance
# Github actions need to be gh-pages branches.
- package-ecosystem: "github-actions"
target-branch: gh-pages
directory: "/"
schedule:
interval: "daily"
labels:
- dependencies

53
.github/workflows/ci.yaml vendored Normal file
View File

@@ -0,0 +1,53 @@
# Copyright 2025 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: CI Pipeline
on:
push:
branches:
- "pull-request/[0-9]+"
- main
- release-*
jobs:
code-scanning:
uses: ./.github/workflows/code_scanning.yaml
variables:
runs-on: ubuntu-latest
outputs:
version: ${{ steps.version.outputs.version }}
steps:
- name: Generate Commit Short SHA
id: version
run: echo "version=$(echo $GITHUB_SHA | cut -c1-8)" >> "$GITHUB_OUTPUT"
golang:
uses: ./.github/workflows/golang.yaml
image:
uses: ./.github/workflows/image.yaml
needs: [variables, golang, code-scanning]
secrets: inherit
with:
version: ${{ needs.variables.outputs.version }}
build_multi_arch_images: ${{ github.ref_name == 'main' || startsWith(github.ref_name, 'release-') }}
e2e-test:
needs: [image, variables]
secrets: inherit
uses: ./.github/workflows/e2e.yaml
with:
version: ${{ needs.variables.outputs.version }}

49
.github/workflows/code_scanning.yaml vendored Normal file
View File

@@ -0,0 +1,49 @@
# Copyright 2024 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: "CodeQL"
on:
workflow_call: {}
pull_request:
types:
- opened
- synchronize
branches:
- main
- release-*
jobs:
analyze:
name: Analyze Go code with CodeQL
runs-on: ubuntu-latest
timeout-minutes: 360
permissions:
security-events: write
packages: read
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Initialize CodeQL
uses: github/codeql-action/init@v3
with:
languages: go
build-mode: manual
- shell: bash
run: |
make build
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v3
with:
category: "/language:go"

98
.github/workflows/e2e.yaml vendored Normal file
View File

@@ -0,0 +1,98 @@
# Copyright 2025 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: End-to-end Tests
on:
workflow_call:
inputs:
version:
required: true
type: string
secrets:
AWS_ACCESS_KEY_ID:
required: true
AWS_SECRET_ACCESS_KEY:
required: true
AWS_SSH_KEY:
required: true
E2E_SSH_USER:
required: true
SLACK_BOT_TOKEN:
required: true
SLACK_CHANNEL_ID:
required: true
jobs:
e2e-tests:
runs-on: linux-amd64-cpu4
steps:
- name: Check out code
uses: actions/checkout@v4
- name: Calculate build vars
id: vars
run: |
echo "COMMIT_SHORT_SHA=${GITHUB_SHA:0:8}" >> $GITHUB_ENV
echo "LOWERCASE_REPO_OWNER=$(echo "${GITHUB_REPOSITORY_OWNER}" | awk '{print tolower($0)}')" >> $GITHUB_ENV
GOLANG_VERSION=$(./hack/golang-version.sh)
echo "GOLANG_VERSION=${GOLANG_VERSION##GOLANG_VERSION := }" >> $GITHUB_ENV
- name: Install Go
uses: actions/setup-go@v5
with:
go-version: ${{ env.GOLANG_VERSION }}
- name: Set up Holodeck
uses: NVIDIA/holodeck@v0.2.6
with:
aws_access_key_id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws_secret_access_key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws_ssh_key: ${{ secrets.AWS_SSH_KEY }}
holodeck_config: "tests/e2e/infra/aws.yaml"
- name: Get public dns name
id: holodeck_public_dns_name
uses: mikefarah/yq@master
with:
cmd: yq '.status.properties[] | select(.name == "public-dns-name") | .value' /github/workspace/.cache/holodeck.yaml
- name: Run e2e tests
env:
IMAGE_NAME: ghcr.io/nvidia/container-toolkit
VERSION: ${{ inputs.version }}
SSH_KEY: ${{ secrets.AWS_SSH_KEY }}
E2E_SSH_USER: ${{ secrets.E2E_SSH_USER }}
E2E_SSH_HOST: ${{ steps.holodeck_public_dns_name.outputs.result }}
E2E_INSTALL_CTK: "true"
run: |
e2e_ssh_key=$(mktemp)
echo "$SSH_KEY" > "$e2e_ssh_key"
chmod 600 "$e2e_ssh_key"
export E2E_SSH_KEY="$e2e_ssh_key"
make -f tests/e2e/Makefile test
- name: Send Slack alert notification
if: ${{ failure() }}
uses: slackapi/slack-github-action@v2.0.0
with:
method: chat.postMessage
token: ${{ secrets.SLACK_BOT_TOKEN }}
payload: |
channel: ${{ secrets.SLACK_CHANNEL_ID }}
text: |
:x: On repository ${{ github.repository }}, the Workflow *${{ github.workflow }}* has failed.
Details: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}

83
.github/workflows/golang.yaml vendored Normal file
View File

@@ -0,0 +1,83 @@
# Copyright 2024 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: Golang
on:
workflow_call: {}
pull_request:
types:
- opened
- synchronize
branches:
- main
- release-*
jobs:
check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
name: Checkout code
- name: Get Golang version
id: vars
run: |
GOLANG_VERSION=$(./hack/golang-version.sh)
echo "GOLANG_VERSION=${GOLANG_VERSION##GOLANG_VERSION := }" >> $GITHUB_ENV
- name: Install Go
uses: actions/setup-go@v5
with:
go-version: ${{ env.GOLANG_VERSION }}
- name: Lint
uses: golangci/golangci-lint-action@v6
with:
version: latest
args: -v --timeout 5m
skip-cache: true
- name: Check golang modules
run: |
make check-vendor
make -C deployments/devel check-modules
test:
name: Unit test
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Get Golang version
id: vars
run: |
GOLANG_VERSION=$(./hack/golang-version.sh)
echo "GOLANG_VERSION=${GOLANG_VERSION##GOLANG_VERSION := }" >> $GITHUB_ENV
- name: Install Go
uses: actions/setup-go@v5
with:
go-version: ${{ env.GOLANG_VERSION }}
- run: make test
build:
name: Build
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Get Golang version
id: vars
run: |
GOLANG_VERSION=$(./hack/golang-version.sh)
echo "GOLANG_VERSION=${GOLANG_VERSION##GOLANG_VERSION ?= }" >> $GITHUB_ENV
- name: Install Go
uses: actions/setup-go@v5
with:
go-version: ${{ env.GOLANG_VERSION }}
- run: make build

126
.github/workflows/image.yaml vendored Normal file
View File

@@ -0,0 +1,126 @@
# Copyright 2024 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Run this workflow on pull requests
name: image
on:
workflow_call:
inputs:
version:
required: true
type: string
build_multi_arch_images:
required: true
type: string
jobs:
packages:
runs-on: linux-amd64-cpu4
strategy:
matrix:
target:
- ubuntu18.04-arm64
- ubuntu18.04-amd64
- ubuntu18.04-ppc64le
- centos7-aarch64
- centos7-x86_64
- centos8-ppc64le
ispr:
- ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}
exclude:
- ispr: true
target: ubuntu18.04-arm64
- ispr: true
target: ubuntu18.04-ppc64le
- ispr: true
target: centos7-aarch64
- ispr: true
target: centos8-ppc64le
fail-fast: false
steps:
- uses: actions/checkout@v4
name: Check out code
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
with:
image: tonistiigi/binfmt:master
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: build ${{ matrix.target }} packages
run: |
sudo apt-get install -y coreutils build-essential sed git bash make
echo "Building packages"
./scripts/build-packages.sh ${{ matrix.target }}
- name: 'Upload Artifacts'
uses: actions/upload-artifact@v4
with:
compression-level: 0
name: toolkit-container-${{ matrix.target }}-${{ github.run_id }}
path: ${{ github.workspace }}/dist/*
image:
runs-on: linux-amd64-cpu4
strategy:
matrix:
dist:
- ubuntu20.04
- ubi8
- packaging
ispr:
- ${{ github.ref_name != 'main' && !startsWith( github.ref_name, 'release-' ) }}
exclude:
- ispr: true
dist: ubi8
needs: packages
steps:
- uses: actions/checkout@v4
name: Check out code
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
with:
image: tonistiigi/binfmt:master
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Get built packages
uses: actions/download-artifact@v4
with:
path: ${{ github.workspace }}/dist/
pattern: toolkit-container-*-${{ github.run_id }}
merge-multiple: true
- name: Login to GitHub Container Registry
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Build image
env:
IMAGE_NAME: ghcr.io/nvidia/container-toolkit
VERSION: ${{ inputs.version }}
PUSH_ON_BUILD: "true"
BUILD_MULTI_ARCH_IMAGES: ${{ inputs.build_multi_arch_images }}
run: |
echo "${VERSION}"
make -f deployments/container/Makefile build-${{ matrix.dist }}

38
.github/workflows/release.yaml vendored Normal file
View File

@@ -0,0 +1,38 @@
# Copyright 2024 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Run this workflow on new tags
name: Release
on:
push:
tags:
- v*
jobs:
release:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
name: Check out code
- name: Prepare Artifacts
run: |
./hack/prepare-artifacts.sh ${{ github.ref_name }}
- name: Create Draft Release
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
./hack/create-release.sh ${{ github.ref_name }}

8
.gitignore vendored
View File

@@ -1,11 +1,13 @@
dist
/dist
/artifacts
*.swp
*.swo
/coverage.out*
/test/output/
/tests/output/
/nvidia-container-runtime
/nvidia-container-runtime.*
/nvidia-container-runtime-hook
/nvidia-container-toolkit
/nvidia-ctk
/shared-*
/release-*
/release-*

View File

@@ -15,68 +15,6 @@
include:
- .common-ci.yml
build-dev-image:
stage: image
script:
- apk --no-cache add make bash
- make .build-image
- docker login -u "${CI_REGISTRY_USER}" -p "${CI_REGISTRY_PASSWORD}" "${CI_REGISTRY}"
- make .push-build-image
.requires-build-image:
image: "${BUILDIMAGE}"
.go-check:
extends:
- .requires-build-image
stage: go-checks
fmt:
extends:
- .go-check
script:
- make assert-fmt
vet:
extends:
- .go-check
script:
- make vet
lint:
extends:
- .go-check
script:
- make lint
allow_failure: true
ineffassign:
extends:
- .go-check
script:
- make ineffassign
allow_failure: true
misspell:
extends:
- .go-check
script:
- make misspell
go-build:
extends:
- .requires-build-image
stage: go-build
script:
- make build
unit-tests:
extends:
- .requires-build-image
stage: unit-tests
script:
- make coverage
# Define the package build helpers
.multi-arch-build:
before_script:
@@ -102,25 +40,35 @@ unit-tests:
name: ${ARTIFACTS_NAME}
paths:
- ${ARTIFACTS_ROOT}
needs:
- job: package-meta-packages
artifacts: true
# Define the package build targets
package-amazonlinux2-aarch64:
package-meta-packages:
extends:
- .package-build
- .dist-amazonlinux2
- .arch-aarch64
- .package-artifacts
stage: package-build
variables:
SKIP_LIBNVIDIA_CONTAINER: "yes"
SKIP_NVIDIA_CONTAINER_TOOLKIT: "yes"
parallel:
matrix:
- PACKAGING: [deb, rpm]
before_script:
- apk add --no-cache coreutils build-base sed git bash make
script:
- ./scripts/build-packages.sh ${PACKAGING}
artifacts:
name: ${ARTIFACTS_NAME}
paths:
- ${ARTIFACTS_ROOT}
package-amazonlinux2-x86_64:
extends:
- .package-build
- .dist-amazonlinux2
- .arch-x86_64
package-centos7-ppc64le:
package-centos7-aarch64:
extends:
- .package-build
- .dist-centos7
- .arch-ppc64le
- .arch-aarch64
package-centos7-x86_64:
extends:
@@ -128,66 +76,12 @@ package-centos7-x86_64:
- .dist-centos7
- .arch-x86_64
package-centos8-aarch64:
extends:
- .package-build
- .dist-centos8
- .arch-aarch64
package-centos8-ppc64le:
extends:
- .package-build
- .dist-centos8
- .arch-ppc64le
package-centos8-x86_64:
extends:
- .package-build
- .dist-centos8
- .arch-x86_64
package-debian10-amd64:
extends:
- .package-build
- .dist-debian10
- .arch-amd64
package-debian9-amd64:
extends:
- .package-build
- .dist-debian9
- .arch-amd64
package-fedora35-aarch64:
extends:
- .package-build
- .dist-fedora35
- .arch-aarch64
package-fedora35-x86_64:
extends:
- .package-build
- .dist-fedora35
- .arch-x86_64
package-opensuse-leap15.1-x86_64:
extends:
- .package-build
- .dist-opensuse-leap15.1
- .arch-x86_64
package-ubuntu16.04-amd64:
extends:
- .package-build
- .dist-ubuntu16.04
- .arch-amd64
package-ubuntu16.04-ppc64le:
extends:
- .package-build
- .dist-ubuntu16.04
- .arch-ppc64le
package-ubuntu18.04-amd64:
extends:
- .package-build
@@ -228,20 +122,11 @@ package-ubuntu18.04-ppc64le:
before_script:
- !reference [.buildx-setup, before_script]
- apk add --no-cache bash make
- apk add --no-cache bash make git
- 'echo "Logging in to CI registry ${CI_REGISTRY}"'
- docker login -u "${CI_REGISTRY_USER}" -p "${CI_REGISTRY_PASSWORD}" "${CI_REGISTRY}"
script:
- make -f build/container/Makefile build-${DIST}
image-centos7:
extends:
- .image-build
- .package-artifacts
- .dist-centos7
needs:
- package-centos7-ppc64le
- package-centos7-x86_64
- make -f deployments/container/Makefile build-${DIST}
image-ubi8:
extends:
@@ -249,21 +134,9 @@ image-ubi8:
- .package-artifacts
- .dist-ubi8
needs:
# Note: The ubi8 image uses the centos8 packages
- package-centos8-aarch64
- package-centos8-x86_64
- package-centos8-ppc64le
image-ubuntu18.04:
extends:
- .image-build
- .package-artifacts
- .dist-ubuntu18.04
needs:
- package-ubuntu18.04-amd64
- package-ubuntu18.04-arm64
- job: package-ubuntu18.04-ppc64le
optional: true
# Note: The ubi8 image uses the centos7 packages
- package-centos7-aarch64
- package-centos7-x86_64
image-ubuntu20.04:
extends:
@@ -273,7 +146,8 @@ image-ubuntu20.04:
needs:
- package-ubuntu18.04-amd64
- package-ubuntu18.04-arm64
- package-ubuntu18.04-ppc64le
- job: package-ubuntu18.04-ppc64le
optional: true
# The DIST=packaging target creates an image containing all built packages
image-packaging:
@@ -282,15 +156,13 @@ image-packaging:
- .package-artifacts
- .dist-packaging
needs:
- job: package-centos8-aarch64
- job: package-centos8-x86_64
- job: package-ubuntu18.04-amd64
- job: package-ubuntu18.04-arm64
- job: package-amazonlinux2-aarch64
optional: true
- job: package-amazonlinux2-x86_64
optional: true
- job: package-centos7-ppc64le
- job: package-centos7-aarch64
optional: true
- job: package-centos7-x86_64
optional: true
@@ -298,28 +170,12 @@ image-packaging:
optional: true
- job: package-debian10-amd64
optional: true
- job: package-debian9-amd64
optional: true
- job: package-fedora35-aarch64
optional: true
- job: package-fedora35-x86_64
optional: true
- job: package-opensuse-leap15.1-x86_64
optional: true
- job: package-ubuntu16.04-amd64
optional: true
- job: package-ubuntu16.04-ppc64le
optional: true
- job: package-ubuntu18.04-ppc64le
optional: true
# Define publish test helpers
.test:toolkit:
extends:
- .integration
variables:
TEST_CASES: "toolkit"
.test:docker:
extends:
- .integration
@@ -343,31 +199,30 @@ image-packaging:
TEST_CASES: "crio"
# Define the test targets
test-toolkit-ubuntu18.04:
test-toolkit-ubuntu20.04:
extends:
- .test:toolkit
- .dist-ubuntu18.04
- .dist-ubuntu20.04
needs:
- image-ubuntu18.04
- image-ubuntu20.04
test-containerd-ubuntu18.04:
test-containerd-ubuntu20.04:
extends:
- .test:containerd
- .dist-ubuntu18.04
- .dist-ubuntu20.04
needs:
- image-ubuntu18.04
- image-ubuntu20.04
test-crio-ubuntu18.04:
test-crio-ubuntu20.04:
extends:
- .test:crio
- .dist-ubuntu18.04
- .dist-ubuntu20.04
needs:
- image-ubuntu18.04
- image-ubuntu20.04
test-docker-ubuntu18.04:
test-docker-ubuntu20.04:
extends:
- .test:docker
- .dist-ubuntu18.04
- .dist-ubuntu20.04
needs:
- image-ubuntu18.04
- image-ubuntu20.04

10
.gitmodules vendored
View File

@@ -1,12 +1,4 @@
[submodule "third_party/libnvidia-container"]
path = third_party/libnvidia-container
url = https://gitlab.com/nvidia/container-toolkit/libnvidia-container.git
branch = main
[submodule "third_party/nvidia-container-runtime"]
path = third_party/nvidia-container-runtime
url = https://gitlab.com/nvidia/container-toolkit/container-runtime.git
branch = main
[submodule "third_party/nvidia-docker"]
path = third_party/nvidia-docker
url = https://gitlab.com/nvidia/container-toolkit/nvidia-docker.git
url = https://github.com/NVIDIA/libnvidia-container.git
branch = main

43
.golangci.yml Normal file
View File

@@ -0,0 +1,43 @@
run:
timeout: 10m
linters:
enable:
- contextcheck
- gocritic
- gofmt
- goimports
- gosec
- gosimple
- govet
- ineffassign
- misspell
- staticcheck
- unconvert
linters-settings:
goimports:
local-prefixes: github.com/NVIDIA/nvidia-container-toolkit
issues:
exclude:
# The legacy hook relies on spec.Hooks.Prestart, which is deprecated as of the v1.2.0 OCI runtime spec.
- "SA1019:(.+).Prestart is deprecated(.+)"
# TODO: We should address each of the following integer overflows.
- "G115: integer overflow conversion(.+)"
exclude-rules:
# Exclude the gocritic dupSubExpr issue for cgo files.
- path: internal/dxcore/dxcore.go
linters:
- gocritic
text: dupSubExpr
# Exclude the checks for usage of returns to config.Delete(Path) in the crio and containerd config packages.
- path: pkg/config/engine/
linters:
- errcheck
text: config.Delete
# RENDERD refers to the Render Device and not the past tense of render.
- path: .*.go
linters:
- misspell
text: "`RENDERD` is a misspelling of `RENDERED`"

View File

@@ -33,11 +33,11 @@ variables:
# On the multi-arch builder we don't need the qemu setup.
SKIP_QEMU_SETUP: "1"
# Define the public staging registry
STAGING_REGISTRY: registry.gitlab.com/nvidia/container-toolkit/container-toolkit/staging
STAGING_REGISTRY: ghcr.io/nvidia
STAGING_VERSION: ${CI_COMMIT_SHORT_SHA}
ARTIFACTORY_REPO_BASE: "https://urm.nvidia.com/artifactory/sw-gpu-cloudnative"
# TODO: We should set the kitmaker release folder here once we have the end-to-end workflow set up
KITMAKER_RELEASE_FOLDER: "testing"
KITMAKER_RELEASE_FOLDER: "kitmaker"
PACKAGE_ARCHIVE_RELEASE_FOLDER: "releases"
.image-pull:
stage: image-build
@@ -67,23 +67,13 @@ variables:
regctl manifest get ${IN_REGISTRY}/${IN_IMAGE_NAME}:${IN_VERSION}-${DIST} --list > /dev/null && echo "${IN_REGISTRY}/${IN_IMAGE_NAME}:${IN_VERSION}-${DIST}" || ( echo "${IN_REGISTRY}/${IN_IMAGE_NAME}:${IN_VERSION}-${DIST} does not exist" && sleep infinity )
script:
- regctl registry login "${OUT_REGISTRY}" -u "${OUT_REGISTRY_USER}" -p "${OUT_REGISTRY_TOKEN}"
- make -f build/container/Makefile IMAGE=${IN_REGISTRY}/${IN_IMAGE_NAME}:${IN_VERSION}-${DIST} OUT_IMAGE=${OUT_IMAGE_NAME}:${CI_COMMIT_SHORT_SHA}-${DIST} push-${DIST}
image-centos7:
extends:
- .dist-centos7
- .image-pull
- make -f deployments/container/Makefile IMAGE=${IN_REGISTRY}/${IN_IMAGE_NAME}:${IN_VERSION}-${DIST} OUT_IMAGE=${OUT_IMAGE_NAME}:${CI_COMMIT_SHORT_SHA}-${DIST} push-${DIST}
image-ubi8:
extends:
- .dist-ubi8
- .image-pull
image-ubuntu18.04:
extends:
- .dist-ubuntu18.04
- .image-pull
image-ubuntu20.04:
extends:
- .dist-ubuntu20.04
@@ -95,109 +85,6 @@ image-packaging:
- .dist-packaging
- .image-pull
# Define the package release targets
release:packages:amazonlinux2-aarch64:
extends:
- .release:packages
- .dist-amazonlinux2
- .arch-aarch64
release:packages:amazonlinux2-x86_64:
extends:
- .release:packages
- .dist-amazonlinux2
- .arch-x86_64
release:packages:centos7-ppc64le:
extends:
- .release:packages
- .dist-centos7
- .arch-ppc64le
release:packages:centos7-x86_64:
extends:
- .release:packages
- .dist-centos7
- .arch-x86_64
release:packages:centos8-aarch64:
extends:
- .release:packages
- .dist-centos8
- .arch-aarch64
release:packages:centos8-ppc64le:
extends:
- .release:packages
- .dist-centos8
- .arch-ppc64le
release:packages:centos8-x86_64:
extends:
- .release:packages
- .dist-centos8
- .arch-x86_64
release:packages:debian10-amd64:
extends:
- .release:packages
- .dist-debian10
- .arch-amd64
release:packages:debian9-amd64:
extends:
- .release:packages
- .dist-debian9
- .arch-amd64
release:packages:fedora35-aarch64:
extends:
- .release:packages
- .dist-fedora35
- .arch-aarch64
release:packages:fedora35-x86_64:
extends:
- .release:packages
- .dist-fedora35
- .arch-x86_64
release:packages:opensuse-leap15.1-x86_64:
extends:
- .release:packages
- .dist-opensuse-leap15.1
- .arch-x86_64
release:packages:ubuntu16.04-amd64:
extends:
- .release:packages
- .dist-ubuntu16.04
- .arch-amd64
release:packages:ubuntu16.04-ppc64le:
extends:
- .release:packages
- .dist-ubuntu16.04
- .arch-ppc64le
release:packages:ubuntu18.04-amd64:
extends:
- .release:packages
- .dist-ubuntu18.04
- .arch-amd64
release:packages:ubuntu18.04-arm64:
extends:
- .release:packages
- .dist-ubuntu18.04
- .arch-arm64
release:packages:ubuntu18.04-ppc64le:
extends:
- .release:packages
- .dist-ubuntu18.04
- .arch-ppc64le
# We skip the integration tests for the internal CI:
.integration:
stage: test
@@ -213,7 +100,7 @@ release:packages:ubuntu18.04-ppc64le:
image: "${PULSE_IMAGE}"
variables:
IMAGE: "${CI_REGISTRY_IMAGE}/container-toolkit:${CI_COMMIT_SHORT_SHA}-${DIST}"
IMAGE_ARCHIVE: "container-toolkit.tar"
IMAGE_ARCHIVE: "container-toolkit-${DIST}-${ARCH}-${CI_JOB_ID}.tar"
rules:
- if: $SKIP_SCANS != "yes"
- when: manual
@@ -228,6 +115,7 @@ release:packages:ubuntu18.04-ppc64le:
- if [ -z "$SSA_TOKEN" ]; then exit 1; else echo "SSA_TOKEN set!"; fi
script:
- pulse-cli -n $NSPECT_ID --ssa $SSA_TOKEN scan -i $IMAGE_ARCHIVE -p $CONTAINER_POLICY -o
- rm -f "${IMAGE_ARCHIVE}"
artifacts:
when: always
expire_in: 1 week
@@ -239,31 +127,6 @@ release:packages:ubuntu18.04-ppc64le:
- policy_evaluation.json
# Define the scan targets
scan-centos7-amd64:
extends:
- .dist-centos7
- .platform-amd64
- .scan
needs:
- image-centos7
scan-centos7-arm64:
extends:
- .dist-centos7
- .platform-arm64
- .scan
needs:
- image-centos7
- scan-centos7-amd64
scan-ubuntu18.04-amd64:
extends:
- .dist-ubuntu18.04
- .platform-amd64
- .scan
needs:
- image-ubuntu18.04
scan-ubuntu20.04-amd64:
extends:
- .dist-ubuntu20.04
@@ -298,6 +161,13 @@ scan-ubi8-arm64:
- image-ubi8
- scan-ubi8-amd64
scan-packaging:
extends:
- .dist-packaging
- .scan
needs:
- image-packaging
# Define external release helpers
.release:ngc:
extends:
@@ -319,35 +189,47 @@ scan-ubi8-arm64:
PACKAGE_REGISTRY_TOKEN: "${CI_REGISTRY_PASSWORD}"
PACKAGE_IMAGE_NAME: "${CI_REGISTRY_IMAGE}/container-toolkit"
PACKAGE_IMAGE_TAG: "${CI_COMMIT_SHORT_SHA}-packaging"
PACKAGE_ARTIFACTORY_REPO: "${ARTIFACTORY_REPO_BASE}-${PACKAGE_REPO_TYPE}-local"
KITMAKER_ARTIFACTORY_REPO: "${ARTIFACTORY_REPO_BASE}-generic-local/${KITMAKER_RELEASE_FOLDER}"
ARTIFACTS_DIR: "${CI_PROJECT_DIR}/artifacts"
script:
- !reference [.regctl-setup, before_script]
- apk add --no-cache bash
- apk add --no-cache bash git
- regctl registry login "${PACKAGE_REGISTRY}" -u "${PACKAGE_REGISTRY_USER}" -p "${PACKAGE_REGISTRY_TOKEN}"
- ./scripts/extract-packages.sh "${PACKAGE_IMAGE_NAME}:${PACKAGE_IMAGE_TAG}" "${DIST}-${ARCH}"
# TODO: ./scripts/release-packages-artifactory.sh "${DIST}-${ARCH}" "${PACKAGE_ARTIFACTORY_REPO}"
- ./scripts/release-kitmaker-artifactory.sh "${DIST}-${ARCH}" "${KITMAKER_ARTIFACTORY_REPO}"
- ./scripts/extract-packages.sh "${PACKAGE_IMAGE_NAME}:${PACKAGE_IMAGE_TAG}"
- ./scripts/release-kitmaker-artifactory.sh "${KITMAKER_ARTIFACTORY_REPO}"
- rm -rf ${ARTIFACTS_DIR}
release:staging-ubuntu18.04:
# Define the package release targets
release:packages:kitmaker:
extends:
- .release:packages
release:archive:
extends:
- .release:external
needs:
- image-packaging
variables:
VERSION: "${CI_COMMIT_SHORT_SHA}"
PACKAGE_REGISTRY: "${CI_REGISTRY}"
PACKAGE_REGISTRY_USER: "${CI_REGISTRY_USER}"
PACKAGE_REGISTRY_TOKEN: "${CI_REGISTRY_PASSWORD}"
PACKAGE_IMAGE_NAME: "${CI_REGISTRY_IMAGE}/container-toolkit"
PACKAGE_IMAGE_TAG: "${CI_COMMIT_SHORT_SHA}-packaging"
PACKAGE_ARCHIVE_ARTIFACTORY_REPO: "${ARTIFACTORY_REPO_BASE}-generic-local/${PACKAGE_ARCHIVE_RELEASE_FOLDER}"
script:
- apk add --no-cache bash git
- ./scripts/archive-packages.sh "${PACKAGE_ARCHIVE_ARTIFACTORY_REPO}"
release:staging-ubuntu20.04:
extends:
- .release:staging
- .dist-ubuntu18.04
- .dist-ubuntu20.04
needs:
- image-ubuntu18.04
- image-ubuntu20.04
# Define the external release targets
# Release to NGC
release:ngc-centos7:
extends:
- .dist-centos7
- .release:ngc
release:ngc-ubuntu18.04:
extends:
- .dist-ubuntu18.04
- .release:ngc
release:ngc-ubuntu20.04:
extends:
- .dist-ubuntu20.04
@@ -357,3 +239,67 @@ release:ngc-ubi8:
extends:
- .dist-ubi8
- .release:ngc
release:ngc-packaging:
extends:
- .dist-packaging
- .release:ngc
# Define the external image signing steps for NGC
# Download the ngc cli binary for use in the sign steps
.ngccli-setup:
before_script:
- apt-get update && apt-get install -y curl unzip jq
- |
if [ -z "${NGCCLI_VERSION}" ]; then
NGC_VERSION_URL="https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions"
# Extract the latest version from the JSON data using jq
export NGCCLI_VERSION=$(curl -s $NGC_VERSION_URL | jq -r '.recipe.latestVersionIdStr')
fi
echo "NGCCLI_VERSION ${NGCCLI_VERSION}"
- curl -sSLo ngccli_linux.zip https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/${NGCCLI_VERSION}/files/ngccli_linux.zip
- unzip ngccli_linux.zip
- chmod u+x ngc-cli/ngc
# .sign forms the base of the deployment jobs which signs images in the CI registry.
# This is extended with the image name and version to be deployed.
.sign:ngc:
image: ubuntu:latest
stage: sign
rules:
- if: $CI_COMMIT_TAG
variables:
NGC_CLI_API_KEY: "${NGC_REGISTRY_TOKEN}"
IMAGE_NAME: "${NGC_REGISTRY_IMAGE}"
IMAGE_TAG: "${CI_COMMIT_TAG}-${DIST}"
retry:
max: 2
before_script:
- !reference [.ngccli-setup, before_script]
# We ensure that the IMAGE_NAME and IMAGE_TAG is set
- 'echo Image Name: ${IMAGE_NAME} && [[ -n "${IMAGE_NAME}" ]] || exit 1'
- 'echo Image Tag: ${IMAGE_TAG} && [[ -n "${IMAGE_TAG}" ]] || exit 1'
script:
- 'echo "Signing the image ${IMAGE_NAME}:${IMAGE_TAG}"'
- ngc-cli/ngc registry image publish --source ${IMAGE_NAME}:${IMAGE_TAG} ${IMAGE_NAME}:${IMAGE_TAG} --public --discoverable --allow-guest --sign --org nvidia
sign:ngc-ubuntu20.04:
extends:
- .dist-ubuntu20.04
- .sign:ngc
needs:
- release:ngc-ubuntu20.04
sign:ngc-ubi8:
extends:
- .dist-ubi8
- .sign:ngc
needs:
- release:ngc-ubi8
sign:ngc-packaging:
extends:
- .dist-packaging
- .sign:ngc
needs:
- release:ngc-packaging

View File

@@ -1,7 +1,335 @@
# NVIDIA Container Toolkit Changelog
## v1.17.4
- Disable mounting of compat libs from container by default
- Add allow-cuda-compat-libs-from-container feature flag
- Skip graphics modifier in CSV mode
- Properly pass configSearchPaths to a Driver constructor
- Add support for containerd version 3 config
- Add string TOML source
### Changes in libnvidia-container
- Add no-cntlibs CLI option to nvidia-container-cli
### Changes in the Toolkit Container
- Bump CUDA base image version to 12.6.3
## v1.17.3
- Only allow host-relative LDConfig paths by default.
### Changes in libnvidia-container
- Create virtual copy of host ldconfig binary before calling fexecve()
## v1.17.2
- Fixed a bug where legacy images would set imex channels as `all`.
## v1.17.1
- Fixed a bug where specific symlinks existing in a container image could cause a container to fail to start.
- Fixed a bug on Tegra-based systems where a container would fail to start.
- Fixed a bug where the default container runtime config path was not properly set.
### Changes in the Toolkit Container
- Fallback to using a config file if the current runtime config can not be determined from the command line.
## v1.17.0
- Promote v1.17.0-rc.2 to v1.17.0
- Fix bug when using just-in-time CDI spec generation
- Check for valid paths in create-symlinks hook
## v1.17.0-rc.2
- Fix bug in locating libcuda.so from ldcache
- Fix bug in sorting of symlink chain
- Remove unsupported print-ldcache command
- Remove csv-filename support from create-symlinks
### Changes in the Toolkit Container
- Fallback to `crio-status` if `crio status` does not work when configuring the crio runtime
## v1.17.0-rc.1
- Allow IMEX channels to be requested as volume mounts
- Fix typo in error message
- Add disable-imex-channel-creation feature flag
- Add -z,lazy to LDFLAGS
- Add imex channels to management CDI spec
- Add support to fetch current container runtime config from the command line.
- Add creation of select driver symlinks to CDI spec generation.
- Remove support for config overrides when configuring runtimes.
- Skip explicit creation of libnvidia-allocator.so.1 symlink
- Add vdpau as as a driver library search path.
- Add support for using libnvsandboxutils to generate CDI specifications.
### Changes in the Toolkit Container
- Allow opt-in features to be selected when deploying the toolkit-container.
- Bump CUDA base image version to 12.6.2
- Remove support for config overrides when configuring runtimes.
### Changes in libnvidia-container
- Add no-create-imex-channels command line option.
## v1.16.2
- Exclude libnvidia-allocator from graphics mounts. This fixes a bug that leaks mounts when a container is started with bi-directional mount propagation.
- Use empty string for default runtime-config-override. This removes a redundant warning for runtimes (e.g. Docker) where this is not applicable.
### Changes in the Toolkit Container
- Bump CUDA base image version to 12.6.0
### Changes in libnvidia-container
- Add no-gsp-firmware command line option
- Add no-fabricmanager command line option
- Add no-persistenced command line option
- Skip directories and symlinks when mounting libraries.
## v1.16.1
- Fix bug with processing errors during CDI spec generation for MIG devices
## v1.16.0
- Promote v1.16.0-rc.2 to v1.16.0
### Changes in the Toolkit Container
- Bump CUDA base image version to 12.5.1
## v1.16.0-rc.2
- Use relative path to locate driver libraries
- Add RelativeToRoot function to Driver
- Inject additional libraries for full X11 functionality
- Extract options from default runtime if runc does not exist
- Avoid using map pointers as maps are always passed by reference
- Reduce logging for the NVIDIA Container runtime
- Fix bug in argument parsing for logger creation
## v1.16.0-rc.1
- Support vulkan ICD files directly in a driver root. This allows for the discovery of vulkan files in GKE driver installations.
- Increase priority of ld.so.conf.d config file injected into container. This ensures that injected libraries are preferred over libraries present in the container.
- Set default CDI spec permissions to 644. This fixes permission issues when using the `nvidia-ctk cdi transform` functions.
- Add `dev-root` option to `nvidia-ctk system create-device-nodes` command.
- Fix location of `libnvidia-ml.so.1` when a non-standard driver root is used. This enabled CDI spec generation when using the driver container on a host.
- Recalculate minimum required CDI spec version on save.
- Move `nvidia-ctk hook` commands to a separate `nvidia-cdi-hook` binary. The same subcommands are supported.
- Use `:` as an `nvidia-ctk config --set` list separator. This fixes a bug when trying to set config options that are lists.
- [toolkit-container] Bump CUDA base image version to 12.5.0
- [toolkit-container] Allow the path to `toolkit.pid` to be specified directly.
- [toolkit-container] Remove provenance information from image manifests.
- [toolkit-container] Add `dev-root` option when configuring the toolkit. This adds support for GKE driver installations.
## v1.15.0
* Remove `nvidia-container-runtime` and `nvidia-docker2` packages.
* Use `XDG_DATA_DIRS` environment variable when locating config files such as graphics config files.
* Add support for v0.7.0 Container Device Interface (CDI) specification.
* Add `--config-search-path` option to `nvidia-ctk cdi generate` command. These paths are used when locating driver files such as graphics config files.
* Use D3DKMTEnumAdapters3 to enumerate adpaters on WSL2 if available.
* Add support for v1.2.0 OCI Runtime specification.
* Explicitly set `NVIDIA_VISIBLE_DEVICES=void` in generated CDI specifications. This prevents the NVIDIA Container Runtime from making additional modifications.
* [libnvidia-container] Use D3DKMTEnumAdapters3 to enumerate adpaters on WSL2 if available.
* [toolkit-container] Bump CUDA base image version to 12.4.1
## v1.15.0-rc.4
* Add a `--spec-dir` option to the `nvidia-ctk cdi generate` command. This allows specs outside of `/etc/cdi` and `/var/run/cdi` to be processed.
* Add support for extracting device major number from `/proc/devices` if `nvidia` is used as a device name over `nvidia-frontend`.
* Allow multiple device naming strategies for `nvidia-ctk cdi generate` command. This allows a single
CDI spec to be generated that includes GPUs by index and UUID.
* Set the default `--device-name-strategy` for the `nvidia-ctk cdi generate` command to `[index, uuid]`.
* Remove `libnvidia-container0` jetpack dependency included for legacy Tegra-based systems.
* Add `NVIDIA_VISIBLE_DEVICES=void` to generated CDI specifications.
* [toolkit-container] Remove centos7 image. The ubi8 image can be used on all RPM-based platforms.
* [toolkit-container] Bump CUDA base image version to 12.3.2
## v1.15.0-rc.3
* Fix bug in `nvidia-ctk hook update-ldcache` where default `--ldconfig-path` value was not applied.
## v1.15.0-rc.2
* Extend the `runtime.nvidia.com/gpu` CDI kind to support full-GPUs and MIG devices specified by index or UUID.
* Fix bug when specifying `--dev-root` for Tegra-based systems.
* Log explicitly requested runtime mode.
* Remove package dependency on libseccomp.
* Added detection of libnvdxgdmal.so.1 on WSL2
* Use devRoot to resolve MIG device nodes.
* Fix bug in determining default nvidia-container-runtime.user config value on SUSE-based systems.
* Add `crun` to the list of configured low-level runtimes.
* Added support for `--ldconfig-path` to `nvidia-ctk cdi generate` command.
* Fix `nvidia-ctk runtime configure --cdi.enabled` for Docker.
* Add discovery of the GDRCopy device (`gdrdrv`) if the `NVIDIA_GDRCOPY` environment variable of the container is set to `enabled`
* [toolkit-container] Bump CUDA base image version to 12.3.1.
## v1.15.0-rc.1
* Skip update of ldcache in containers without ldconfig. The .so.SONAME symlinks are still created.
* Normalize ldconfig path on use. This automatically adjust the ldconfig setting applied to ldconfig.real on systems where this exists.
* Include `nvidia/nvoptix.bin` in list of graphics mounts.
* Include `vulkan/icd.d/nvidia_layers.json` in list of graphics mounts.
* Add support for `--library-search-paths` to `nvidia-ctk cdi generate` command.
* Add support for injecting /dev/nvidia-nvswitch* devices if the NVIDIA_NVSWITCH=enabled envvar is specified.
* Added support for `nvidia-ctk runtime configure --enable-cdi` for the `docker` runtime. Note that this requires Docker >= 25.
* Fixed bug in `nvidia-ctk config` command when using `--set`. The types of applied config options are now applied correctly.
* Add `--relative-to` option to `nvidia-ctk transform root` command. This controls whether the root transformation is applied to host or container paths.
* Added automatic CDI spec generation when the `runtime.nvidia.com/gpu=all` device is requested by a container.
* [libnvidia-container] Fix device permission check when using cgroupv2 (fixes #227)
## v1.14.3
* [toolkit-container] Bump CUDA base image version to 12.2.2.
## v1.14.2
* Fix bug on Tegra-based systems where symlinks were not created in containers.
* Add --csv.ignore-pattern command line option to nvidia-ctk cdi generate command.
## v1.14.1
* Fixed bug where contents of `/etc/nvidia-container-runtime/config.toml` is ignored by the NVIDIA Container Runtime Hook.
* [libnvidia-container] Use libelf.so on RPM-based systems due to removed mageia repositories hosting pmake and bmake.
## v1.14.0
* Promote v1.14.0-rc.3 to v1.14.0
## v1.14.0-rc.3
* Added support for generating OCI hook JSON file to `nvidia-ctk runtime configure` command.
* Remove installation of OCI hook JSON from RPM package.
* Refactored config for `nvidia-container-runtime-hook`.
* Added a `nvidia-ctk config` command which supports setting config options using a `--set` flag.
* Added `--library-search-path` option to `nvidia-ctk cdi generate` command in `csv` mode. This allows folders where
libraries are located to be specified explicitly.
* Updated go-nvlib to support devices which are not present in the PCI device database. This allows the creation of dev/char symlinks on systems with such devices installed.
* Added `UsesNVGPUModule` info function for more robust platform detection. This is required on Tegra-based systems where libnvidia-ml.so is also supported.
* [toolkit-container] Set `NVIDIA_VISIBLE_DEVICES=void` to prevent injection of NVIDIA devices and drivers into the NVIDIA Container Toolkit container.
## v1.14.0-rc.2
* Fix bug causing incorrect nvidia-smi symlink to be created on WSL2 systems with multiple driver roots.
* Remove dependency on coreutils when installing package on RPM-based systems.
* Create output folders if required when running `nvidia-ctk runtime configure`
* Generate default config as post-install step.
* Added support for detecting GSP firmware at custom paths when generating CDI specifications.
* Added logic to skip the extraction of image requirements if `NVIDIA_DISABLE_REQUIRES` is set to `true`.
* [libnvidia-container] Include Shared Compiler Library (libnvidia-gpucomp.so) in the list of compute libaries.
* [toolkit-container] Ensure that common envvars have higher priority when configuring the container engines.
* [toolkit-container] Bump CUDA base image version to 12.2.0.
* [toolkit-container] Remove installation of nvidia-experimental runtime. This is superceded by the NVIDIA Container Runtime in CDI mode.
## v1.14.0-rc.1
* Add support for updating containerd configs to the `nvidia-ctk runtime configure` command.
* Create file in `etc/ld.so.conf.d` with permissions `644` to support non-root containers.
* Generate CDI specification files with `644` permissions to allow rootless applications (e.g. podman)
* Add `nvidia-ctk cdi list` command to show the known CDI devices.
* Add support for generating merged devices (e.g. `all` device) to the nvcdi API.
* Use *.* pattern to locate libcuda.so when generating a CDI specification to support platforms where a patch version is not specified.
* Update go-nvlib to skip devices that are not MIG capable when generating CDI specifications.
* Add `nvidia-container-runtime-hook.path` config option to specify NVIDIA Container Runtime Hook path explicitly.
* Fix bug in creation of `/dev/char` symlinks by failing operation if kernel modules are not loaded.
* Add option to load kernel modules when creating device nodes
* Add option to create device nodes when creating `/dev/char` symlinks
* [libnvidia-container] Support OpenSSL 3 with the Encrypt/Decrypt library
* [toolkit-container] Allow same envars for all runtime configs
## v1.13.1
* Update `update-ldcache` hook to only update ldcache if it exists.
* Update `update-ldcache` hook to create `/etc/ld.so.conf.d` folder if it doesn't exist.
* Fix failure when libcuda cannot be located during XOrg library discovery.
* Fix CDI spec generation on systems that use `/etc/alternatives` (e.g. Debian)
## v1.13.0
* Promote 1.13.0-rc.3 to 1.13.0
## v1.13.0-rc.3
* Only initialize NVML for modes that require it when runing `nvidia-ctk cdi generate`.
* Prefer /run over /var/run when locating nvidia-persistenced and nvidia-fabricmanager sockets.
* Fix the generation of CDI specifications for management containers when the driver libraries are not in the LDCache.
* Add transformers to deduplicate and simplify CDI specifications.
* Generate a simplified CDI specification by default. This means that entities in the common edits in a spec are not included in device definitions.
* Also return an error from the nvcdi.New constructor instead of panicing.
* Detect XOrg libraries for injection and CDI spec generation.
* Add `nvidia-ctk system create-device-nodes` command to create control devices.
* Add `nvidia-ctk cdi transform` command to apply transforms to CDI specifications.
* Add `--vendor` and `--class` options to `nvidia-ctk cdi generate`
* [libnvidia-container] Fix segmentation fault when RPC initialization fails.
* [libnvidia-container] Build centos variants of the NVIDIA Container Library with static libtirpc v1.3.2.
* [libnvidia-container] Remove make targets for fedora35 as the centos8 packages are compatible.
* [toolkit-container] Add `nvidia-container-runtime.modes.cdi.annotation-prefixes` config option that allows the CDI annotation prefixes that are read to be overridden.
* [toolkit-container] Create device nodes when generating CDI specification for management containers.
* [toolkit-container] Add `nvidia-container-runtime.runtimes` config option to set the low-level runtime for the NVIDIA Container Runtime
## v1.13.0-rc.2
* Don't fail chmod hook if paths are not injected
* Only create `by-path` symlinks if CDI devices are actually requested.
* Fix possible blank `nvidia-ctk` path in generated CDI specifications
* Fix error in postun scriplet on RPM-based systems
* Only check `NVIDIA_VISIBLE_DEVICES` for environment variables if no annotations are specified.
* Add `cdi.default-kind` config option for constructing fully-qualified CDI device names in CDI mode
* Add support for `accept-nvidia-visible-devices-envvar-unprivileged` config setting in CDI mode
* Add `nvidia-container-runtime-hook.skip-mode-detection` config option to bypass mode detection. This allows `legacy` and `cdi` mode, for example, to be used at the same time.
* Add support for generating CDI specifications for GDS and MOFED devices
* Ensure CDI specification is validated on save when generating a spec
* Rename `--discovery-mode` argument to `--mode` for `nvidia-ctk cdi generate`
* [libnvidia-container] Fix segfault on WSL2 systems
* [toolkit-container] Add `--cdi-enabled` flag to toolkit config
* [toolkit-container] Install `nvidia-ctk` from toolkit container
* [toolkit-container] Use installed `nvidia-ctk` path in NVIDIA Container Toolkit config
* [toolkit-container] Bump CUDA base images to 12.1.0
* [toolkit-container] Set `nvidia-ctk` path in the
* [toolkit-container] Add `cdi.k8s.io/*` to set of allowed annotations in containerd config
* [toolkit-container] Generate CDI specification for use in management containers
* [toolkit-container] Install experimental runtime as `nvidia-container-runtime.experimental` instead of `nvidia-container-runtime-experimental`
* [toolkit-container] Install and configure mode-specific runtimes for `cdi` and `legacy` modes
## v1.13.0-rc.1
* Include MIG-enabled devices as GPUs when generating CDI specification
* Fix missing NVML symbols when running `nvidia-ctk` on some platforms [#49]
* Add CDI spec generation for WSL2-based systems to `nvidia-ctk cdi generate` command
* Add `auto` mode to `nvidia-ctk cdi generate` command to automatically detect a WSL2-based system over a standard NVML-based system.
* Add mode-specific (`.cdi` and `.legacy`) NVIDIA Container Runtime binaries for use in the GPU Operator
* Discover all `gsb*.bin` GSP firmware files when generating CDI specification.
* Align `.deb` and `.rpm` release candidate package versions
* Remove `fedora35` packaging targets
* [libnvidia-container] Include all `gsp*.bin` firmware files if present
* [libnvidia-container] Align `.deb` and `.rpm` release candidate package versions
* [libnvidia-container] Remove `fedora35` packaging targets
* [toolkit-container] Install `nvidia-container-toolkit-operator-extensions` package for mode-specific executables.
* [toolkit-container] Allow `nvidia-container-runtime.mode` to be set when configuring the NVIDIA Container Toolkit
## v1.12.0
* Promote `v1.12.0-rc.5` to `v1.12.0`
* Rename `nvidia cdi generate` `--root` flag to `--driver-root` to better indicate intent
* [libnvidia-container] Add nvcubins.bin to DriverStore components under WSL2
* [toolkit-container] Bump CUDA base images to 12.0.1
## v1.12.0-rc.5
* Fix bug here the `nvidia-ctk` path was not properly resolved. This causes failures to run containers when the runtime is configured in `csv` mode or if the `NVIDIA_DRIVER_CAPABILITIES` includes `graphics` or `display` (e.g. `all`).
## v1.12.0-rc.4
* Generate a minimum CDI spec version for improved compatibility.
* Add `--device-name-strategy` options to the `nvidia-ctk cdi generate` command that can be used to control how device names are constructed.
* Set default for CDI device name generation to `index` to generate device names such as `nvidia.com/gpu=0` or `nvidia.com/gpu=1:0` by default.
## v1.12.0-rc.3
* Don't fail if by-path symlinks for DRM devices do not exist
* Replace the --json flag with a --format [json|yaml] flag for the nvidia-ctk cdi generate command
* Ensure that the CDI output folder is created if required
* When generating a CDI specification use a blank host path for devices to ensure compatibility with the v0.4.0 CDI specification
* Add injection of Wayland JSON files
* Add GSP firmware paths to generated CDI specification
* Add --root flag to nvidia-ctk cdi generate command
## v1.12.0-rc.2
* Inject Direct Rendering Manager (DRM) devices into a container using the NVIDIA Container Runtime

View File

@@ -19,7 +19,7 @@ where `TARGET` is a make target that is valid for each of the sub-components.
These include:
* `ubuntu18.04-amd64`
* `centos8-x86_64`
* `centos7-x86_64`
If no `TARGET` is specified, all valid release targets are built.
@@ -34,7 +34,7 @@ environment variables.
## Testing packages locally
The [test/release](./test/release/) folder contains documentation on how the installation of local or staged packages can be tested.
The [tests/release](./tests/release/) folder contains documentation on how the installation of local or staged packages can be tested.
## Releasing

142
Jenkinsfile vendored
View File

@@ -1,142 +0,0 @@
/*
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
*/
podTemplate (cloud:'sw-gpu-cloudnative',
containers: [
containerTemplate(name: 'docker', image: 'docker:dind', ttyEnabled: true, privileged: true),
containerTemplate(name: 'golang', image: 'golang:1.16.3', ttyEnabled: true)
]) {
node(POD_LABEL) {
def scmInfo
stage('checkout') {
scmInfo = checkout(scm)
}
stage('dependencies') {
container('golang') {
sh 'GO111MODULE=off go get -u github.com/client9/misspell/cmd/misspell'
sh 'GO111MODULE=off go get -u github.com/gordonklaus/ineffassign'
sh 'GO111MODULE=off go get -u golang.org/x/lint/golint'
}
container('docker') {
sh 'apk add --no-cache make bash git'
}
}
stage('check') {
parallel (
getGolangStages(["assert-fmt", "lint", "vet", "ineffassign", "misspell"])
)
}
stage('test') {
parallel (
getGolangStages(["test"])
)
}
def versionInfo
stage('version') {
container('docker') {
versionInfo = getVersionInfo(scmInfo)
println "versionInfo=${versionInfo}"
}
}
def dist = 'ubuntu20.04'
def arch = 'amd64'
def stageLabel = "${dist}-${arch}"
stage('build-one') {
container('docker') {
stage (stageLabel) {
sh "make ${dist}-${arch}"
}
}
}
stage('release') {
container('docker') {
stage (stageLabel) {
def component = 'main'
def repository = 'sw-gpu-cloudnative-debian-local/pool/main/'
def uploadSpec = """{
"files":
[ {
"pattern": "./dist/${dist}/${arch}/*.deb",
"target": "${repository}",
"props": "deb.distribution=${dist};deb.component=${component};deb.architecture=${arch}"
}
]
}"""
sh "echo starting release with versionInfo=${versionInfo}"
if (versionInfo.isTag) {
// upload to artifactory repository
def server = Artifactory.server 'sw-gpu-artifactory'
server.upload spec: uploadSpec
} else {
sh "echo skipping release for non-tagged build"
}
}
}
}
}
}
def getGolangStages(def targets) {
stages = [:]
for (t in targets) {
stages[t] = getLintClosure(t)
}
return stages
}
def getLintClosure(def target) {
return {
container('golang') {
stage(target) {
sh "make ${target}"
}
}
}
}
// getVersionInfo returns a hash of version info
def getVersionInfo(def scmInfo) {
def versionInfo = [
isTag: isTag(scmInfo.GIT_BRANCH)
]
scmInfo.each { k, v -> versionInfo[k] = v }
return versionInfo
}
def isTag(def branch) {
if (!branch.startsWith('v')) {
return false
}
def version = shOutput('git describe --all --exact-match --always')
return version == "tags/${branch}"
}
def shOuptut(def script) {
return sh(script: script, returnStdout: true).trim()
}

112
Makefile
View File

@@ -38,8 +38,8 @@ EXAMPLE_TARGETS := $(patsubst %,example-%, $(EXAMPLES))
CMDS := $(patsubst ./cmd/%/,%,$(sort $(dir $(wildcard ./cmd/*/))))
CMD_TARGETS := $(patsubst %,cmd-%, $(CMDS))
CHECK_TARGETS := assert-fmt vet lint ineffassign misspell
MAKE_TARGETS := binaries build check fmt lint-internal test examples cmds coverage generate licenses $(CHECK_TARGETS)
CHECK_TARGETS := lint
MAKE_TARGETS := binaries build check fmt test examples cmds coverage generate licenses vendor check-vendor $(CHECK_TARGETS)
TARGETS := $(MAKE_TARGETS) $(EXAMPLE_TARGETS) $(CMD_TARGETS)
@@ -53,22 +53,26 @@ CLI_VERSION = $(VERSION)
endif
CLI_VERSION_PACKAGE = github.com/NVIDIA/nvidia-container-toolkit/internal/info
GOOS ?= linux
binaries: cmds
ifneq ($(PREFIX),)
cmd-%: COMMAND_BUILD_OPTIONS = -o $(PREFIX)/$(*)
endif
cmds: $(CMD_TARGETS)
ifneq ($(shell uname),Darwin)
EXTLDFLAGS = -Wl,--export-dynamic -Wl,--unresolved-symbols=ignore-in-object-files -Wl,-z,lazy
else
EXTLDFLAGS = -Wl,-undefined,dynamic_lookup
endif
$(CMD_TARGETS): cmd-%:
GOOS=$(GOOS) go build -ldflags "-s -w -X $(CLI_VERSION_PACKAGE).gitCommit=$(GIT_COMMIT) -X $(CLI_VERSION_PACKAGE).version=$(CLI_VERSION)" $(COMMAND_BUILD_OPTIONS) $(MODULE)/cmd/$(*)
go build -ldflags "-s -w '-extldflags=$(EXTLDFLAGS)' -X $(CLI_VERSION_PACKAGE).gitCommit=$(GIT_COMMIT) -X $(CLI_VERSION_PACKAGE).version=$(CLI_VERSION)" $(COMMAND_BUILD_OPTIONS) $(MODULE)/cmd/$(*)
build:
GOOS=$(GOOS) go build ./...
go build ./...
examples: $(EXAMPLE_TARGETS)
$(EXAMPLE_TARGETS): example-%:
GOOS=$(GOOS) go build ./examples/$(*)
go build ./examples/$(*)
all: check test build binary
check: $(CHECK_TARGETS)
@@ -78,37 +82,47 @@ fmt:
go list -f '{{.Dir}}' $(MODULE)/... \
| xargs gofmt -s -l -w
assert-fmt:
go list -f '{{.Dir}}' $(MODULE)/... \
| xargs gofmt -s -l > fmt.out
@if [ -s fmt.out ]; then \
echo "\nERROR: The following files are not formatted:\n"; \
cat fmt.out; \
rm fmt.out; \
exit 1; \
else \
rm fmt.out; \
fi
ineffassign:
ineffassign $(MODULE)/...
# Apply goimports -local github.com/NVIDIA/container-toolkit to the codebase
goimports:
go list -f {{.Dir}} $(MODULE)/... \
| xargs goimports -local $(MODULE) -w
lint:
# We use `go list -f '{{.Dir}}' $(MODULE)/...` to skip the `vendor` folder.
go list -f '{{.Dir}}' $(MODULE)/... | xargs golint -set_exit_status
golangci-lint run ./...
misspell:
misspell $(MODULE)/...
vendor: | mod-tidy mod-vendor mod-verify
vet:
go vet $(MODULE)/...
mod-tidy:
@for mod in $$(find . -name go.mod -not -path "./testdata/*" -not -path "./third_party/*"); do \
echo "Tidying $$mod..."; ( \
cd $$(dirname $$mod) && go mod tidy \
) || exit 1; \
done
mod-vendor:
@for mod in $$(find . -name go.mod -not -path "./testdata/*" -not -path "./third_party/*" -not -path "./deployments/*"); do \
echo "Vendoring $$mod..."; ( \
cd $$(dirname $$mod) && go mod vendor \
) || exit 1; \
done
mod-verify:
@for mod in $$(find . -name go.mod -not -path "./testdata/*" -not -path "./third_party/*"); do \
echo "Verifying $$mod..."; ( \
cd $$(dirname $$mod) && go mod verify | sed 's/^/ /g' \
) || exit 1; \
done
check-vendor: vendor
git diff --quiet HEAD -- go.mod go.sum vendor
licenses:
go-licenses csv $(MODULE)/...
COVERAGE_FILE := coverage.out
test: build cmds
go test -v -coverprofile=$(COVERAGE_FILE) $(MODULE)/...
go test -coverprofile=$(COVERAGE_FILE) $(MODULE)/...
coverage: test
cat $(COVERAGE_FILE) | grep -v "_mock.go" > $(COVERAGE_FILE).no-mocks
@@ -119,30 +133,24 @@ generate:
# Generate an image for containerized builds
# Note: This image is local only
.PHONY: .build-image .pull-build-image .push-build-image
.build-image: docker/Dockerfile.devel
if [ x"$(SKIP_IMAGE_BUILD)" = x"" ]; then \
$(DOCKER) build \
--progress=plain \
--build-arg GOLANG_VERSION="$(GOLANG_VERSION)" \
--tag $(BUILDIMAGE) \
-f $(^) \
docker; \
fi
.PHONY: .build-image
.build-image:
make -f deployments/devel/Makefile .build-image
.pull-build-image:
$(DOCKER) pull $(BUILDIMAGE)
ifeq ($(BUILD_DEVEL_IMAGE),yes)
$(DOCKER_TARGETS): .build-image
.shell: .build-image
endif
.push-build-image:
$(DOCKER) push $(BUILDIMAGE)
$(DOCKER_TARGETS): docker-%: .build-image
@echo "Running 'make $(*)' in docker container $(BUILDIMAGE)"
$(DOCKER_TARGETS): docker-%:
@echo "Running 'make $(*)' in container image $(BUILDIMAGE)"
$(DOCKER) run \
--rm \
-e GOCACHE=/tmp/.cache \
-v $(PWD):$(PWD) \
-w $(PWD) \
-e GOCACHE=/tmp/.cache/go \
-e GOMODCACHE=/tmp/.cache/gomod \
-e GOLANGCI_LINT_CACHE=/tmp/.cache/golangci-lint \
-v $(PWD):/work \
-w /work \
--user $$(id -u):$$(id -g) \
$(BUILDIMAGE) \
make $(*)
@@ -153,8 +161,10 @@ PHONY: .shell
$(DOCKER) run \
--rm \
-ti \
-e GOCACHE=/tmp/.cache \
-v $(PWD):$(PWD) \
-w $(PWD) \
-e GOCACHE=/tmp/.cache/go \
-e GOMODCACHE=/tmp/.cache/gomod \
-e GOLANGCI_LINT_CACHE=/tmp/.cache/golangci-lint \
-v $(PWD):/work \
-w /work \
--user $$(id -u):$$(id -g) \
$(BUILDIMAGE)

36
RELEASE.md Normal file
View File

@@ -0,0 +1,36 @@
# Release Process
The NVIDIA Container Toolkit consists of the following artifacts:
- The NVIDIA Container Toolkit container
- Packages for debian-based systems
- Packages for rpm-based systems
# Release Process Checklist:
- [ ] Create a release PR:
- [ ] Run the `./hack/prepare-release.sh` script to update the version in all the needed files. This also creates a [release issue](https://github.com/NVIDIA/cloud-native-team/issues?q=is%3Aissue+is%3Aopen+label%3Arelease)
- [ ] Run the `./hack/generate-changelog.sh` script to generate the a draft changelog and update `CHANGELOG.md` with the changes.
- [ ] Create a PR from the created `bump-release-{{ .VERSION }}` branch.
- [ ] Merge the release PR
- [ ] Tag the release and push the tag to the `internal` mirror:
- [ ] Image release pipeline: https://gitlab-master.nvidia.com/dl/container-dev/container-toolkit/-/pipelines/16466098
- [ ] Wait for the image release to complete.
- [ ] Push the tag to the the upstream GitHub repo.
- [ ] Wait for the [`Release`](https://github.com/NVIDIA/k8s-device-plugin/actions/workflows/release.yaml) GitHub Action to complete
- [ ] Publish the [draft release](https://github.com/NVIDIA/k8s-device-plugin/releases) created by the GitHub Action
- [ ] Publish the packages to the gh-pages branch of the libnvidia-container repo
- [ ] Create a KitPick
## Troubleshooting
*Note*: This assumes that we have the release tag checked out locally.
- If the `Release` GitHub Action fails:
- Check the logs for the error first.
- Create the helm packages locally by running:
```bash
./hack/prepare-artifacts.sh {{ .VERSION }}
```
- Create the draft release by running:
```bash
./hack/create-release.sh {{ .VERSION }}
```

View File

@@ -1,97 +0,0 @@
# Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
ARG BASE_DIST
ARG CUDA_VERSION
ARG GOLANG_VERSION=x.x.x
ARG VERSION="N/A"
# NOTE: In cases where the libc version is a concern, we would have to use an
# image based on the target OS to build the golang executables here -- especially
# if cgo code is included.
FROM golang:${GOLANG_VERSION} as build
# We override the GOPATH to ensure that the binaries are installed to
# /artifacts/bin
ARG GOPATH=/artifacts
# Install the experiemental nvidia-container-runtime
# NOTE: This will be integrated into the nvidia-container-toolkit package / repo
ARG NVIDIA_CONTAINER_RUNTIME_EXPERIMENTAL_VERSION=experimental
RUN GOPATH=/artifacts go install github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-container-runtime.experimental@${NVIDIA_CONTAINER_RUNTIME_EXPERIMENTAL_VERSION}
WORKDIR /build
COPY . .
# NOTE: Until the config utilities are properly integrated into the
# nvidia-container-toolkit repository, these are built from the `tools` folder
# and not `cmd`.
RUN GOPATH=/artifacts go install -ldflags="-s -w -X 'main.Version=${VERSION}'" ./tools/...
FROM nvidia/cuda:${CUDA_VERSION}-base-${BASE_DIST}
ARG BASE_DIST
# See https://www.centos.org/centos-linux-eol/
# and https://stackoverflow.com/a/70930049 for move to vault.centos.org
# and https://serverfault.com/questions/1093922/failing-to-run-yum-update-in-centos-8 for move to vault.epel.cloud
RUN [[ "${BASE_DIST}" != "centos8" ]] || \
( \
sed -i 's/mirrorlist/#mirrorlist/g' /etc/yum.repos.d/CentOS-Linux-* && \
sed -i 's|#baseurl=http://mirror.centos.org|baseurl=http://vault.epel.cloud|g' /etc/yum.repos.d/CentOS-Linux-* \
)
ENV NVIDIA_DISABLE_REQUIRE="true"
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=utility
ARG ARTIFACTS_ROOT
ARG PACKAGE_DIST
COPY ${ARTIFACTS_ROOT}/${PACKAGE_DIST} /artifacts/packages/${PACKAGE_DIST}
WORKDIR /artifacts/packages
ARG PACKAGE_VERSION
ARG TARGETARCH
ENV PACKAGE_ARCH ${TARGETARCH}
RUN PACKAGE_ARCH=${PACKAGE_ARCH/amd64/x86_64} && PACKAGE_ARCH=${PACKAGE_ARCH/arm64/aarch64} && \
yum localinstall -y \
${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container1-1.*.rpm \
${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container-tools-1.*.rpm \
${PACKAGE_DIST}/${PACKAGE_ARCH}/nvidia-container-toolkit*-${PACKAGE_VERSION}*.rpm
WORKDIR /work
COPY --from=build /artifacts/bin /work
ENV PATH=/work:$PATH
LABEL io.k8s.display-name="NVIDIA Container Runtime Config"
LABEL name="NVIDIA Container Runtime Config"
LABEL vendor="NVIDIA"
LABEL version="${VERSION}"
LABEL release="N/A"
LABEL summary="Automatically Configure your Container Runtime for GPU support."
LABEL description="See summary"
RUN mkdir /licenses && mv /NGC-DL-CONTAINER-LICENSE /licenses/NGC-DL-CONTAINER-LICENSE
# Install / upgrade packages here that are required to resolve CVEs
ARG CVE_UPDATES
RUN if [ -n "${CVE_UPDATES}" ]; then \
yum update -y ${CVE_UPDATES} && \
rm -rf /var/cache/yum/*; \
fi
ENTRYPOINT ["/work/nvidia-toolkit"]

View File

@@ -1,105 +0,0 @@
# Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
ARG BASE_DIST
ARG CUDA_VERSION
ARG GOLANG_VERSION=x.x.x
ARG VERSION="N/A"
# NOTE: In cases where the libc version is a concern, we would have to use an
# image based on the target OS to build the golang executables here -- especially
# if cgo code is included.
FROM golang:${GOLANG_VERSION} as build
# We override the GOPATH to ensure that the binaries are installed to
# /artifacts/bin
ARG GOPATH=/artifacts
# Install the experiemental nvidia-container-runtime
# NOTE: This will be integrated into the nvidia-container-toolkit package / repo
ARG NVIDIA_CONTAINER_RUNTIME_EXPERIMENTAL_VERSION=experimental
RUN GOPATH=/artifacts go install github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-container-runtime.experimental@${NVIDIA_CONTAINER_RUNTIME_EXPERIMENTAL_VERSION}
WORKDIR /build
COPY . .
# NOTE: Until the config utilities are properly integrated into the
# nvidia-container-toolkit repository, these are built from the `tools` folder
# and not `cmd`.
RUN GOPATH=/artifacts go install -ldflags="-s -w -X 'main.Version=${VERSION}'" ./tools/...
FROM nvcr.io/nvidia/cuda:${CUDA_VERSION}-base-${BASE_DIST}
# Remove the CUDA repository configurations to avoid issues with rotated GPG keys
RUN rm -f /etc/apt/sources.list.d/cuda.list
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
libcap2 \
curl \
&& \
rm -rf /var/lib/apt/lists/*
ENV NVIDIA_DISABLE_REQUIRE="true"
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=utility
ARG ARTIFACTS_ROOT
ARG PACKAGE_DIST
COPY ${ARTIFACTS_ROOT}/${PACKAGE_DIST} /artifacts/packages/${PACKAGE_DIST}
WORKDIR /artifacts/packages
ARG PACKAGE_VERSION
ARG TARGETARCH
ENV PACKAGE_ARCH ${TARGETARCH}
ARG LIBNVIDIA_CONTAINER_REPO="https://nvidia.github.io/libnvidia-container"
ARG LIBNVIDIA_CONTAINER0_VERSION
RUN if [ "${PACKAGE_ARCH}" = "arm64" ]; then \
curl -L ${LIBNVIDIA_CONTAINER_REPO}/${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container0_${LIBNVIDIA_CONTAINER0_VERSION}_${PACKAGE_ARCH}.deb \
--output ${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container0_${LIBNVIDIA_CONTAINER0_VERSION}_${PACKAGE_ARCH}.deb && \
dpkg -i ${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container0_${LIBNVIDIA_CONTAINER0_VERSION}_${PACKAGE_ARCH}.deb; \
fi
RUN dpkg -i \
${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container1_1.*.deb \
${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container-tools_1.*.deb \
${PACKAGE_DIST}/${PACKAGE_ARCH}/nvidia-container-toolkit*_${PACKAGE_VERSION}*.deb
WORKDIR /work
COPY --from=build /artifacts/bin /work/
ENV PATH=/work:$PATH
LABEL io.k8s.display-name="NVIDIA Container Runtime Config"
LABEL name="NVIDIA Container Runtime Config"
LABEL vendor="NVIDIA"
LABEL version="${VERSION}"
LABEL release="N/A"
LABEL summary="Automatically Configure your Container Runtime for GPU support."
LABEL description="See summary"
RUN mkdir /licenses && mv /NGC-DL-CONTAINER-LICENSE /licenses/NGC-DL-CONTAINER-LICENSE
# Install / upgrade packages here that are required to resolve CVEs
ARG CVE_UPDATES
RUN if [ -n "${CVE_UPDATES}" ]; then \
apt-get update && apt-get upgrade -y ${CVE_UPDATES} && \
rm -rf /var/lib/apt/lists/*; \
fi
ENTRYPOINT ["/work/nvidia-toolkit"]

View File

@@ -0,0 +1,31 @@
# NVIDIA CDI Hook
The CLI `nvidia-cdi-hook` provides container device runtime hook capabilities when
called by a container runtime, as specific in a
[Container Device Interface](https://tags.cncf.io/container-device-interface/blob/main/SPEC.md)
file.
## Generating a CDI
The CDI itself is created for an NVIDIA-capable device using the
[`nvidia-ctk cdi generate`](../nvidia-ctk/) command.
When `nvidia-ctk cdi generate` is run, the CDI specification is generated as a yaml file.
The CDI specification provides instructions for a container runtime to set up devices, files and
other resources for the container prior to starting it. Those instructions
may include executing command-line tools to prepare the filesystem. The execution
of such command-line tools is called a hook.
`nvidia-cdi-hook` is the CLI tool that is expected to be called by the container runtime,
when specified by the CDI file.
See the [`nvidia-ctk` documentation](../nvidia-ctk/README.md) for more information
on generating a CDI file.
## Functionality
The `nvidia-cdi-hook` CLI provides the following functionality:
* `chmod` - Change the permissions of a file or directory inside the directory path to be mounted into a container.
* `create-symlinks` - Create symlinks inside the directory path to be mounted into a container.
* `update-ldcache` - Update the dynamic linker cache inside the directory path to be mounted into a container.

View File

@@ -17,29 +17,33 @@
package chmod
import (
"errors"
"fmt"
"io/fs"
"os"
"path/filepath"
"strconv"
"strings"
"syscall"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
)
type command struct {
logger *logrus.Logger
logger logger.Interface
}
type config struct {
paths cli.StringSlice
mode string
modeStr string
mode fs.FileMode
containerSpec string
}
// NewCommand constructs a chmod command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
}
@@ -65,13 +69,13 @@ func (m command) build() *cli.Command {
c.Flags = []cli.Flag{
&cli.StringSliceFlag{
Name: "path",
Usage: "Specifiy a path to apply the specified mode to",
Usage: "Specify a path to apply the specified mode to",
Destination: &cfg.paths,
},
&cli.StringFlag{
Name: "mode",
Usage: "Specify the file mode",
Destination: &cfg.mode,
Destination: &cfg.modeStr,
},
&cli.StringFlag{
Name: "container-spec",
@@ -84,10 +88,16 @@ func (m command) build() *cli.Command {
}
func validateFlags(c *cli.Context, cfg *config) error {
if strings.TrimSpace(cfg.mode) == "" {
if strings.TrimSpace(cfg.modeStr) == "" {
return fmt.Errorf("a non-empty mode must be specified")
}
modeInt, err := strconv.ParseUint(cfg.modeStr, 8, 32)
if err != nil {
return fmt.Errorf("failed to parse mode as octal: %v", err)
}
cfg.mode = fs.FileMode(modeInt)
for _, p := range cfg.paths.Value() {
if strings.TrimSpace(p) == "" {
return fmt.Errorf("paths must not be empty")
@@ -111,29 +121,39 @@ func (m command) run(c *cli.Context, cfg *config) error {
return fmt.Errorf("empty container root detected")
}
paths := m.getPaths(containerRoot, cfg.paths.Value())
paths := m.getPaths(containerRoot, cfg.paths.Value(), cfg.mode)
if len(paths) == 0 {
m.logger.Debugf("No paths specified; exiting")
return nil
}
locator := lookup.NewExecutableLocator(m.logger, "")
targets, err := locator.Locate("chmod")
if err != nil {
return fmt.Errorf("failed to locate chmod: %v", err)
for _, path := range paths {
err = os.Chmod(path, cfg.mode)
// in some cases this is not an issue (e.g. whole /dev mounted), see #143
if errors.Is(err, fs.ErrPermission) {
m.logger.Debugf("Ignoring permission error with chmod: %v", err)
err = nil
}
}
chmodPath := targets[0]
args := append([]string{filepath.Base(chmodPath), cfg.mode}, paths...)
return syscall.Exec(chmodPath, args, nil)
return err
}
// getPaths updates the specified paths relative to the root.
func (m command) getPaths(root string, paths []string) []string {
func (m command) getPaths(root string, paths []string, desiredMode fs.FileMode) []string {
var pathsInRoot []string
for _, f := range paths {
pathsInRoot = append(pathsInRoot, filepath.Join(root, f))
path := filepath.Join(root, f)
stat, err := os.Stat(path)
if err != nil {
m.logger.Debugf("Skipping path %q: %v", path, err)
continue
}
if (stat.Mode()&(fs.ModePerm|fs.ModeSetuid|fs.ModeSetgid|fs.ModeSticky))^desiredMode == 0 {
m.logger.Debugf("Skipping path %q: already desired mode", path)
continue
}
pathsInRoot = append(pathsInRoot, path)
}
return pathsInRoot

View File

@@ -0,0 +1,36 @@
/**
# Copyright 2024 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package commands
import (
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-cdi-hook/chmod"
symlinks "github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-cdi-hook/create-symlinks"
ldcache "github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-cdi-hook/update-ldcache"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
)
// New creates the commands associated with supported CDI hooks.
// These are shared by the nvidia-cdi-hook and nvidia-ctk hook commands.
func New(logger logger.Interface) []*cli.Command {
return []*cli.Command{
ldcache.NewCommand(logger),
symlinks.NewCommand(logger),
chmod.NewCommand(logger),
}
}

View File

@@ -0,0 +1,172 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package symlinks
import (
"errors"
"fmt"
"os"
"path/filepath"
"strings"
"github.com/moby/sys/symlink"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup/symlinks"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
)
type command struct {
logger logger.Interface
}
type config struct {
links cli.StringSlice
containerSpec string
}
// NewCommand constructs a hook command with the specified logger
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
}
return c.build()
}
// build creates the create-symlink command.
func (m command) build() *cli.Command {
cfg := config{}
c := cli.Command{
Name: "create-symlinks",
Usage: "A hook to create symlinks in the container.",
Action: func(c *cli.Context) error {
return m.run(c, &cfg)
},
}
c.Flags = []cli.Flag{
&cli.StringSliceFlag{
Name: "link",
Usage: "Specify a specific link to create. The link is specified as target::link. If the link exists in the container root, it is removed.",
Destination: &cfg.links,
},
// The following flags are testing-only flags.
&cli.StringFlag{
Name: "container-spec",
Usage: "Specify the path to the OCI container spec. If empty or '-' the spec will be read from STDIN. This is only intended for testing.",
Destination: &cfg.containerSpec,
Hidden: true,
},
}
return &c
}
func (m command) run(c *cli.Context, cfg *config) error {
s, err := oci.LoadContainerState(cfg.containerSpec)
if err != nil {
return fmt.Errorf("failed to load container state: %v", err)
}
containerRoot, err := s.GetContainerRoot()
if err != nil {
return fmt.Errorf("failed to determined container root: %v", err)
}
created := make(map[string]bool)
for _, l := range cfg.links.Value() {
if created[l] {
m.logger.Debugf("Link %v already processed", l)
continue
}
parts := strings.Split(l, "::")
if len(parts) != 2 {
return fmt.Errorf("invalid symlink specification %v", l)
}
err := m.createLink(containerRoot, parts[0], parts[1])
if err != nil {
return fmt.Errorf("failed to create link %v: %w", parts, err)
}
created[l] = true
}
return nil
}
// createLink creates a symbolic link in the specified container root.
// This is equivalent to:
//
// chroot {{ .containerRoot }} ln -f -s {{ .target }} {{ .link }}
//
// If the specified link already exists and points to the same target, this
// operation is a no-op.
// If a file exists at the link path or the link points to a different target
// this file is removed before creating the link.
//
// Note that if the link path resolves to an absolute path oudside of the
// specified root, this is treated as an absolute path in this root.
func (m command) createLink(containerRoot string, targetPath string, link string) error {
linkPath := filepath.Join(containerRoot, link)
exists, err := linkExists(targetPath, linkPath)
if err != nil {
return fmt.Errorf("failed to check if link exists: %w", err)
}
if exists {
m.logger.Debugf("Link %s already exists", linkPath)
return nil
}
// We resolve the parent of the symlink that we're creating in the container root.
// If we resolve the full link path, an existing link at the location itself
// is also resolved here and we are unable to force create the link.
resolvedLinkParent, err := symlink.FollowSymlinkInScope(filepath.Dir(linkPath), containerRoot)
if err != nil {
return fmt.Errorf("failed to follow path for link %v relative to %v: %w", link, containerRoot, err)
}
resolvedLinkPath := filepath.Join(resolvedLinkParent, filepath.Base(linkPath))
m.logger.Infof("Symlinking %v to %v", resolvedLinkPath, targetPath)
err = os.MkdirAll(filepath.Dir(resolvedLinkPath), 0755)
if err != nil {
return fmt.Errorf("failed to create directory: %v", err)
}
err = symlinks.ForceCreate(targetPath, resolvedLinkPath)
if err != nil {
return fmt.Errorf("failed to create symlink: %v", err)
}
return nil
}
// linkExists checks whether the specified link exists.
// A link exists if the path exists, is a symlink, and points to the specified target.
func linkExists(target string, link string) (bool, error) {
currentTarget, err := symlinks.Resolve(link)
if errors.Is(err, os.ErrNotExist) {
return false, nil
}
if err != nil {
return false, fmt.Errorf("failed to resolve existing symlink %s: %w", link, err)
}
if currentTarget == target {
return true, nil
}
return false, nil
}

View File

@@ -0,0 +1,297 @@
package symlinks
import (
"os"
"path/filepath"
"strings"
"testing"
testlog "github.com/sirupsen/logrus/hooks/test"
"github.com/stretchr/testify/require"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup/symlinks"
)
func TestLinkExist(t *testing.T) {
tmpDir := t.TempDir()
require.NoError(
t,
makeFs(tmpDir,
dirOrLink{path: "/a/b/c", target: "d"},
dirOrLink{path: "/a/b/e", target: "/a/b/f"},
),
)
exists, err := linkExists("d", filepath.Join(tmpDir, "/a/b/c"))
require.NoError(t, err)
require.True(t, exists)
exists, err = linkExists("/a/b/f", filepath.Join(tmpDir, "/a/b/e"))
require.NoError(t, err)
require.True(t, exists)
exists, err = linkExists("different-target", filepath.Join(tmpDir, "/a/b/c"))
require.NoError(t, err)
require.False(t, exists)
exists, err = linkExists("/a/b/d", filepath.Join(tmpDir, "/a/b/c"))
require.NoError(t, err)
require.False(t, exists)
exists, err = linkExists("foo", filepath.Join(tmpDir, "/a/b/does-not-exist"))
require.NoError(t, err)
require.False(t, exists)
}
func TestCreateLink(t *testing.T) {
type link struct {
path string
target string
}
type expectedLink struct {
link
err error
}
testCases := []struct {
description string
containerContents []dirOrLink
link link
expectedCreateError error
expectedLinks []expectedLink
}{
{
description: "link to / resolves to container root",
containerContents: []dirOrLink{
{path: "/lib/foo", target: "/"},
},
link: link{
path: "/lib/foo/libfoo.so",
target: "libfoo.so.1",
},
expectedLinks: []expectedLink{
{
link: link{
path: "{{ .containerRoot }}/libfoo.so",
target: "libfoo.so.1",
},
},
},
},
{
description: "link to / resolves to container root; parent relative link",
containerContents: []dirOrLink{
{path: "/lib/foo", target: "/"},
},
link: link{
path: "/lib/foo/libfoo.so",
target: "../libfoo.so.1",
},
expectedLinks: []expectedLink{
{
link: link{
path: "{{ .containerRoot }}/libfoo.so",
target: "../libfoo.so.1",
},
},
},
},
{
description: "link to / resolves to container root; absolute link",
containerContents: []dirOrLink{
{path: "/lib/foo", target: "/"},
},
link: link{
path: "/lib/foo/libfoo.so",
target: "/a-path-in-container/foo/libfoo.so.1",
},
expectedLinks: []expectedLink{
{
link: link{
path: "{{ .containerRoot }}/libfoo.so",
target: "/a-path-in-container/foo/libfoo.so.1",
},
},
{
// We also check that the target is NOT created.
link: link{
path: "{{ .containerRoot }}/a-path-in-container/foo/libfoo.so.1",
},
err: os.ErrNotExist,
},
},
},
}
for _, tc := range testCases {
t.Run(tc.description, func(t *testing.T) {
tmpDir := t.TempDir()
hostRoot := filepath.Join(tmpDir, "/host-root/")
containerRoot := filepath.Join(tmpDir, "/container-root")
require.NoError(t, makeFs(hostRoot))
require.NoError(t, makeFs(containerRoot, tc.containerContents...))
// nvidia-cdi-hook create-symlinks --link linkSpec
err := getTestCommand().createLink(containerRoot, tc.link.target, tc.link.path)
// TODO: We may be able to replace this with require.ErrorIs.
if tc.expectedCreateError != nil {
require.Error(t, err)
} else {
require.NoError(t, err)
}
for _, expectedLink := range tc.expectedLinks {
path := strings.ReplaceAll(expectedLink.path, "{{ .containerRoot }}", containerRoot)
path = strings.ReplaceAll(path, "{{ .hostRoot }}", hostRoot)
if expectedLink.target != "" {
target, err := symlinks.Resolve(path)
require.ErrorIs(t, err, expectedLink.err)
require.Equal(t, expectedLink.target, target)
} else {
_, err := os.Stat(path)
require.ErrorIs(t, err, expectedLink.err)
}
}
})
}
}
func TestCreateLinkRelativePath(t *testing.T) {
tmpDir := t.TempDir()
hostRoot := filepath.Join(tmpDir, "/host-root/")
containerRoot := filepath.Join(tmpDir, "/container-root")
require.NoError(t, makeFs(hostRoot))
require.NoError(t, makeFs(containerRoot, dirOrLink{path: "/lib/"}))
// nvidia-cdi-hook create-symlinks --link libfoo.so.1::/lib/libfoo.so
err := getTestCommand().createLink(containerRoot, "libfoo.so.1", "/lib/libfoo.so")
require.NoError(t, err)
target, err := symlinks.Resolve(filepath.Join(containerRoot, "/lib/libfoo.so"))
require.NoError(t, err)
require.Equal(t, "libfoo.so.1", target)
}
func TestCreateLinkAbsolutePath(t *testing.T) {
tmpDir := t.TempDir()
hostRoot := filepath.Join(tmpDir, "/host-root/")
containerRoot := filepath.Join(tmpDir, "/container-root")
require.NoError(t, makeFs(hostRoot))
require.NoError(t, makeFs(containerRoot, dirOrLink{path: "/lib/"}))
// nvidia-cdi-hook create-symlinks --link /lib/libfoo.so.1::/lib/libfoo.so
err := getTestCommand().createLink(containerRoot, "/lib/libfoo.so.1", "/lib/libfoo.so")
require.NoError(t, err)
target, err := symlinks.Resolve(filepath.Join(containerRoot, "/lib/libfoo.so"))
require.NoError(t, err)
require.Equal(t, "/lib/libfoo.so.1", target)
}
func TestCreateLinkAlreadyExists(t *testing.T) {
testCases := []struct {
description string
containerContents []dirOrLink
shouldExist []string
}{
{
description: "link already exists with correct target",
containerContents: []dirOrLink{{path: "/lib/libfoo.so", target: "libfoo.so.1"}},
shouldExist: []string{},
},
{
description: "link already exists with different target",
containerContents: []dirOrLink{{path: "/lib/libfoo.so", target: "different-target"}, {path: "different-target"}},
shouldExist: []string{"{{ .containerRoot }}/different-target"},
},
}
for _, tc := range testCases {
t.Run(tc.description, func(t *testing.T) {
tmpDir := t.TempDir()
hostRoot := filepath.Join(tmpDir, "/host-root/")
containerRoot := filepath.Join(tmpDir, "/container-root")
require.NoError(t, makeFs(hostRoot))
require.NoError(t, makeFs(containerRoot, tc.containerContents...))
// nvidia-cdi-hook create-symlinks --link libfoo.so.1::/lib/libfoo.so
err := getTestCommand().createLink(containerRoot, "libfoo.so.1", "/lib/libfoo.so")
require.NoError(t, err)
target, err := symlinks.Resolve(filepath.Join(containerRoot, "lib/libfoo.so"))
require.NoError(t, err)
require.Equal(t, "libfoo.so.1", target)
for _, p := range tc.shouldExist {
require.DirExists(t, strings.ReplaceAll(p, "{{ .containerRoot }}", containerRoot))
}
})
}
}
func TestCreateLinkOutOfBounds(t *testing.T) {
tmpDir := t.TempDir()
hostRoot := filepath.Join(tmpDir, "/host-root")
containerRoot := filepath.Join(tmpDir, "/container-root")
require.NoError(t,
makeFs(hostRoot,
dirOrLink{path: "libfoo.so"},
),
)
require.NoError(t,
makeFs(containerRoot,
dirOrLink{path: "/lib"},
dirOrLink{path: "/lib/foo", target: hostRoot},
),
)
path, err := symlinks.Resolve(filepath.Join(containerRoot, "/lib/foo"))
require.NoError(t, err)
require.Equal(t, hostRoot, path)
// nvidia-cdi-hook create-symlinks --link ../libfoo.so.1::/lib/foo/libfoo.so
_ = getTestCommand().createLink(containerRoot, "../libfoo.so.1", "/lib/foo/libfoo.so")
require.NoError(t, err)
target, err := symlinks.Resolve(filepath.Join(containerRoot, hostRoot, "libfoo.so"))
require.NoError(t, err)
require.Equal(t, "../libfoo.so.1", target)
require.DirExists(t, filepath.Join(hostRoot, "libfoo.so"))
}
type dirOrLink struct {
path string
target string
}
func makeFs(tmpdir string, fs ...dirOrLink) error {
if err := os.MkdirAll(tmpdir, 0o755); err != nil {
return err
}
for _, s := range fs {
s.path = filepath.Join(tmpdir, s.path)
if s.target == "" {
_ = os.MkdirAll(s.path, 0o755)
continue
}
if err := os.MkdirAll(filepath.Dir(s.path), 0o755); err != nil {
return err
}
if err := os.Symlink(s.target, s.path); err != nil && !os.IsExist(err) {
return err
}
}
return nil
}
// getTestCommand creates a command for running tests against.
func getTestCommand() *command {
logger, _ := testlog.NewNullLogger()
return &command{
logger: logger,
}
}

View File

@@ -0,0 +1,93 @@
/**
# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package main
import (
"os"
"github.com/sirupsen/logrus"
"github.com/NVIDIA/nvidia-container-toolkit/internal/info"
cli "github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-cdi-hook/commands"
)
// options defines the options that can be set for the CLI through config files,
// environment variables, or command line flags
type options struct {
// Debug indicates whether the CLI is started in "debug" mode
Debug bool
// Quiet indicates whether the CLI is started in "quiet" mode
Quiet bool
}
func main() {
logger := logrus.New()
// Create a options struct to hold the parsed environment variables or command line flags
opts := options{}
// Create the top-level CLI
c := cli.NewApp()
c.Name = "NVIDIA CDI Hook"
c.UseShortOptionHandling = true
c.EnableBashCompletion = true
c.Usage = "Command to structure files for usage inside a container, called as hooks from a container runtime, defined in a CDI yaml file"
c.Version = info.GetVersionString()
// Setup the flags for this command
c.Flags = []cli.Flag{
&cli.BoolFlag{
Name: "debug",
Aliases: []string{"d"},
Usage: "Enable debug-level logging",
Destination: &opts.Debug,
EnvVars: []string{"NVIDIA_CDI_DEBUG"},
},
&cli.BoolFlag{
Name: "quiet",
Usage: "Suppress all output except for errors; overrides --debug",
Destination: &opts.Quiet,
EnvVars: []string{"NVIDIA_CDI_QUIET"},
},
}
// Set log-level for all subcommands
c.Before = func(c *cli.Context) error {
logLevel := logrus.InfoLevel
if opts.Debug {
logLevel = logrus.DebugLevel
}
if opts.Quiet {
logLevel = logrus.ErrorLevel
}
logger.SetLevel(logLevel)
return nil
}
// Define the subcommands
c.Commands = commands.New(logger)
// Run the CLI
err := c.Run(os.Args)
if err != nil {
logger.Errorf("%v", err)
os.Exit(1)
}
}

View File

@@ -0,0 +1,197 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package ldcache
import (
"errors"
"fmt"
"os"
"path/filepath"
"strings"
"syscall"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
)
type command struct {
logger logger.Interface
}
type options struct {
folders cli.StringSlice
ldconfigPath string
containerSpec string
}
// NewCommand constructs an update-ldcache command with the specified logger
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
}
return c.build()
}
// build the update-ldcache command
func (m command) build() *cli.Command {
cfg := options{}
// Create the 'update-ldcache' command
c := cli.Command{
Name: "update-ldcache",
Usage: "Update ldcache in a container by running ldconfig",
Before: func(c *cli.Context) error {
return m.validateFlags(c, &cfg)
},
Action: func(c *cli.Context) error {
return m.run(c, &cfg)
},
}
c.Flags = []cli.Flag{
&cli.StringSliceFlag{
Name: "folder",
Usage: "Specify a folder to add to /etc/ld.so.conf before updating the ld cache",
Destination: &cfg.folders,
},
&cli.StringFlag{
Name: "ldconfig-path",
Usage: "Specify the path to the ldconfig program",
Destination: &cfg.ldconfigPath,
Value: "/sbin/ldconfig",
},
&cli.StringFlag{
Name: "container-spec",
Usage: "Specify the path to the OCI container spec. If empty or '-' the spec will be read from STDIN",
Destination: &cfg.containerSpec,
},
}
return &c
}
func (m command) validateFlags(c *cli.Context, cfg *options) error {
if cfg.ldconfigPath == "" {
return errors.New("ldconfig-path must be specified")
}
return nil
}
func (m command) run(c *cli.Context, cfg *options) error {
s, err := oci.LoadContainerState(cfg.containerSpec)
if err != nil {
return fmt.Errorf("failed to load container state: %v", err)
}
containerRoot, err := s.GetContainerRoot()
if err != nil {
return fmt.Errorf("failed to determined container root: %v", err)
}
ldconfigPath := m.resolveLDConfigPath(cfg.ldconfigPath)
args := []string{filepath.Base(ldconfigPath)}
if containerRoot != "" {
args = append(args, "-r", containerRoot)
}
if root(containerRoot).hasPath("/etc/ld.so.cache") {
args = append(args, "-C", "/etc/ld.so.cache")
} else {
m.logger.Debugf("No ld.so.cache found, skipping update")
args = append(args, "-N")
}
folders := cfg.folders.Value()
if root(containerRoot).hasPath("/etc/ld.so.conf.d") {
err := m.createConfig(containerRoot, folders)
if err != nil {
return fmt.Errorf("failed to update ld.so.conf.d: %v", err)
}
} else {
args = append(args, folders...)
}
// Explicitly specify using /etc/ld.so.conf since the host's ldconfig may
// be configured to use a different config file by default.
args = append(args, "-f", "/etc/ld.so.conf")
//nolint:gosec // TODO: Can we harden this so that there is less risk of command injection
return syscall.Exec(ldconfigPath, args, nil)
}
type root string
func (r root) hasPath(path string) bool {
_, err := os.Stat(filepath.Join(string(r), path))
if err != nil && os.IsNotExist(err) {
return false
}
return true
}
// resolveLDConfigPath determines the LDConfig path to use for the system.
// On systems such as Ubuntu where `/sbin/ldconfig` is a wrapper around
// /sbin/ldconfig.real, the latter is returned.
func (m command) resolveLDConfigPath(path string) string {
return strings.TrimPrefix(config.NormalizeLDConfigPath("@"+path), "@")
}
// createConfig creates (or updates) /etc/ld.so.conf.d/00-nvcr-<RANDOM_STRING>.conf in the container
// to include the required paths.
// Note that the 00-nvcr prefix is chosen to ensure that these libraries have
// a higher precedence than other libraries on the system but are applied AFTER
// 00-cuda-compat.conf.
func (m command) createConfig(root string, folders []string) error {
if len(folders) == 0 {
m.logger.Debugf("No folders to add to /etc/ld.so.conf")
return nil
}
if err := os.MkdirAll(filepath.Join(root, "/etc/ld.so.conf.d"), 0755); err != nil {
return fmt.Errorf("failed to create ld.so.conf.d: %v", err)
}
configFile, err := os.CreateTemp(filepath.Join(root, "/etc/ld.so.conf.d"), "00-nvcr-*.conf")
if err != nil {
return fmt.Errorf("failed to create config file: %v", err)
}
defer configFile.Close()
m.logger.Debugf("Adding folders %v to %v", folders, configFile.Name())
configured := make(map[string]bool)
for _, folder := range folders {
if configured[folder] {
continue
}
_, err = configFile.WriteString(fmt.Sprintf("%s\n", folder))
if err != nil {
return fmt.Errorf("failed to update ld.so.conf.d: %v", err)
}
configured[folder] = true
}
// The created file needs to be world readable for the cases where the container is run as a non-root user.
if err := os.Chmod(configFile.Name(), 0644); err != nil {
return fmt.Errorf("failed to chmod config file: %v", err)
}
return nil
}

View File

@@ -2,15 +2,6 @@ package main
import (
"log"
"strings"
)
const (
allDriverCapabilities = DriverCapabilities("compute,compat32,graphics,utility,video,display,ngx")
defaultDriverCapabilities = DriverCapabilities("utility,compute")
none = DriverCapabilities("")
all = DriverCapabilities("all")
)
func capabilityToCLI(cap string) string {
@@ -34,50 +25,3 @@ func capabilityToCLI(cap string) string {
}
return ""
}
// DriverCapabilities is used to process the NVIDIA_DRIVER_CAPABILITIES environment
// variable. Operations include default values, filtering, and handling meta values such as "all"
type DriverCapabilities string
// Intersection returns intersection between two sets of capabilities.
func (d DriverCapabilities) Intersection(capabilities DriverCapabilities) DriverCapabilities {
if capabilities == all {
return d
}
if d == all {
return capabilities
}
lookup := make(map[string]bool)
for _, c := range d.list() {
lookup[c] = true
}
var found []string
for _, c := range capabilities.list() {
if lookup[c] {
found = append(found, c)
}
}
intersection := DriverCapabilities(strings.Join(found, ","))
return intersection
}
// String returns the string representation of the driver capabilities
func (d DriverCapabilities) String() string {
return string(d)
}
// list returns the driver capabilities as a list
func (d DriverCapabilities) list() []string {
var caps []string
for _, c := range strings.Split(string(d), ",") {
trimmed := strings.TrimSpace(c)
if len(trimmed) == 0 {
continue
}
caps = append(caps, trimmed)
}
return caps
}

View File

@@ -1,134 +0,0 @@
/**
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package main
import (
"fmt"
"testing"
"github.com/stretchr/testify/require"
)
func TestDriverCapabilitiesIntersection(t *testing.T) {
testCases := []struct {
capabilities DriverCapabilities
supportedCapabilities DriverCapabilities
expectedIntersection DriverCapabilities
}{
{
capabilities: none,
supportedCapabilities: none,
expectedIntersection: none,
},
{
capabilities: all,
supportedCapabilities: none,
expectedIntersection: none,
},
{
capabilities: all,
supportedCapabilities: allDriverCapabilities,
expectedIntersection: allDriverCapabilities,
},
{
capabilities: allDriverCapabilities,
supportedCapabilities: all,
expectedIntersection: allDriverCapabilities,
},
{
capabilities: none,
supportedCapabilities: all,
expectedIntersection: none,
},
{
capabilities: none,
supportedCapabilities: DriverCapabilities("cap1"),
expectedIntersection: none,
},
{
capabilities: DriverCapabilities("cap0,cap1"),
supportedCapabilities: DriverCapabilities("cap1,cap0"),
expectedIntersection: DriverCapabilities("cap0,cap1"),
},
{
capabilities: defaultDriverCapabilities,
supportedCapabilities: allDriverCapabilities,
expectedIntersection: defaultDriverCapabilities,
},
{
capabilities: DriverCapabilities("compute,compat32,graphics,utility,video,display"),
supportedCapabilities: DriverCapabilities("compute,compat32,graphics,utility,video,display,ngx"),
expectedIntersection: DriverCapabilities("compute,compat32,graphics,utility,video,display"),
},
{
capabilities: DriverCapabilities("cap1"),
supportedCapabilities: none,
expectedIntersection: none,
},
{
capabilities: DriverCapabilities("compute,compat32,graphics,utility,video,display,ngx"),
supportedCapabilities: DriverCapabilities("compute,compat32,graphics,utility,video,display"),
expectedIntersection: DriverCapabilities("compute,compat32,graphics,utility,video,display"),
},
}
for i, tc := range testCases {
t.Run(fmt.Sprintf("test case %d", i), func(t *testing.T) {
intersection := tc.supportedCapabilities.Intersection(tc.capabilities)
require.EqualValues(t, tc.expectedIntersection, intersection)
})
}
}
func TestDriverCapabilitiesList(t *testing.T) {
testCases := []struct {
capabilities DriverCapabilities
expected []string
}{
{
capabilities: DriverCapabilities(""),
},
{
capabilities: DriverCapabilities(" "),
},
{
capabilities: DriverCapabilities(","),
},
{
capabilities: DriverCapabilities(",cap"),
expected: []string{"cap"},
},
{
capabilities: DriverCapabilities("cap,"),
expected: []string{"cap"},
},
{
capabilities: DriverCapabilities("cap0,,cap1"),
expected: []string{"cap0", "cap1"},
},
{
capabilities: DriverCapabilities("cap1,cap0,cap3"),
expected: []string{"cap1", "cap0", "cap3"},
},
}
for i, tc := range testCases {
t.Run(fmt.Sprintf("test case %d", i), func(t *testing.T) {
require.EqualValues(t, tc.expected, tc.capabilities.list())
})
}
}

View File

@@ -6,45 +6,33 @@ import (
"log"
"os"
"path"
"path/filepath"
"strings"
"github.com/opencontainers/runtime-spec/specs-go"
"golang.org/x/mod/semver"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config/image"
"golang.org/x/mod/semver"
)
const (
envCUDAVersion = "CUDA_VERSION"
envNVRequirePrefix = "NVIDIA_REQUIRE_"
envNVRequireCUDA = envNVRequirePrefix + "CUDA"
envNVDisableRequire = "NVIDIA_DISABLE_REQUIRE"
envNVVisibleDevices = "NVIDIA_VISIBLE_DEVICES"
envNVMigConfigDevices = "NVIDIA_MIG_CONFIG_DEVICES"
envNVMigMonitorDevices = "NVIDIA_MIG_MONITOR_DEVICES"
envNVDriverCapabilities = "NVIDIA_DRIVER_CAPABILITIES"
)
const (
capSysAdmin = "CAP_SYS_ADMIN"
)
const (
deviceListAsVolumeMountsRoot = "/var/run/nvidia-container-devices"
)
type nvidiaConfig struct {
Devices string
Devices []string
MigConfigDevices string
MigMonitorDevices string
ImexChannels []string
DriverCapabilities string
Requirements []string
DisableRequire bool
// Requirements defines the requirements DSL for the container to run.
// This is empty if no specific requirements are needed, or if requirements are
// explicitly disabled.
Requirements []string
}
type containerConfig struct {
Pid int
Rootfs string
Env map[string]string
Image image.CUDA
Nvidia *nvidiaConfig
}
@@ -71,23 +59,14 @@ type LinuxCapabilities struct {
Ambient []string `json:"ambient,omitempty" platform:"linux"`
}
// Mount from OCI runtime spec
// https://github.com/opencontainers/runtime-spec/blob/v1.0.0/specs-go/config.go#L103
type Mount struct {
Destination string `json:"destination"`
Type string `json:"type,omitempty" platform:"linux,solaris"`
Source string `json:"source,omitempty"`
Options []string `json:"options,omitempty"`
}
// Spec from OCI runtime spec
// We use pointers to structs, similarly to the latest version of runtime-spec:
// https://github.com/opencontainers/runtime-spec/blob/v1.0.0/specs-go/config.go#L5-L28
type Spec struct {
Version *string `json:"ociVersion"`
Process *Process `json:"process,omitempty"`
Root *Root `json:"root,omitempty"`
Mounts []Mount `json:"mounts,omitempty"`
Version *string `json:"ociVersion"`
Process *Process `json:"process,omitempty"`
Root *Root `json:"root,omitempty"`
Mounts []specs.Mount `json:"mounts,omitempty"`
}
// HookState holds state information about the hook
@@ -130,7 +109,7 @@ func isPrivileged(s *Spec) bool {
}
var caps []string
// If v1.1.0-rc1 <= OCI version < v1.0.0-rc5 parse s.Process.Capabilities as:
// If v1.0.0-rc1 <= OCI version < v1.0.0-rc5 parse s.Process.Capabilities as:
// github.com/opencontainers/runtime-spec/blob/v1.0.0-rc1/specs-go/config.go#L30-L54
rc1cmp := semver.Compare("v"+*s.Version, "v1.0.0-rc1")
rc5cmp := semver.Compare("v"+*s.Version, "v1.0.0-rc5")
@@ -139,106 +118,57 @@ func isPrivileged(s *Spec) bool {
if err != nil {
log.Panicln("could not decode Process.Capabilities in OCI spec:", err)
}
// Otherwise, parse s.Process.Capabilities as:
// github.com/opencontainers/runtime-spec/blob/v1.0.0/specs-go/config.go#L30-L54
} else {
var lc LinuxCapabilities
err := json.Unmarshal(*s.Process.Capabilities, &lc)
if err != nil {
log.Panicln("could not decode Process.Capabilities in OCI spec:", err)
for _, c := range caps {
if c == capSysAdmin {
return true
}
}
// We only make sure that the bounding capabibility set has
// CAP_SYS_ADMIN. This allows us to make sure that the container was
// actually started as '--privileged', but also allow non-root users to
// access the privileged NVIDIA capabilities.
caps = lc.Bounding
return false
}
for _, c := range caps {
if c == capSysAdmin {
return true
}
// Otherwise, parse s.Process.Capabilities as:
// github.com/opencontainers/runtime-spec/blob/v1.0.0/specs-go/config.go#L30-L54
process := specs.Process{
Env: s.Process.Env,
}
return false
err := json.Unmarshal(*s.Process.Capabilities, &process.Capabilities)
if err != nil {
log.Panicln("could not decode Process.Capabilities in OCI spec:", err)
}
fullSpec := specs.Spec{
Version: *s.Version,
Process: &process,
}
return image.IsPrivileged(&fullSpec)
}
func getDevicesFromEnvvar(image image.CUDA, swarmResourceEnvvars []string) *string {
func getDevicesFromEnvvar(containerImage image.CUDA, swarmResourceEnvvars []string) []string {
// We check if the image has at least one of the Swarm resource envvars defined and use this
// if specified.
var hasSwarmEnvvar bool
for _, envvar := range swarmResourceEnvvars {
if _, exists := image[envvar]; exists {
hasSwarmEnvvar = true
break
if containerImage.HasEnvvar(envvar) {
return containerImage.DevicesFromEnvvars(swarmResourceEnvvars...).List()
}
}
var devices []string
if hasSwarmEnvvar {
devices = image.DevicesFromEnvvars(swarmResourceEnvvars...).List()
} else {
devices = image.DevicesFromEnvvars(envNVVisibleDevices).List()
}
if len(devices) == 0 {
return nil
}
devicesString := strings.Join(devices, ",")
return &devicesString
return containerImage.VisibleDevicesFromEnvVar()
}
func getDevicesFromMounts(mounts []Mount) *string {
var devices []string
for _, m := range mounts {
root := filepath.Clean(deviceListAsVolumeMountsRoot)
source := filepath.Clean(m.Source)
destination := filepath.Clean(m.Destination)
// Only consider mounts who's host volume is /dev/null
if source != "/dev/null" {
continue
}
// Only consider container mount points that begin with 'root'
if len(destination) < len(root) {
continue
}
if destination[:len(root)] != root {
continue
}
// Grab the full path beyond 'root' and add it to the list of devices
device := destination[len(root):]
if len(device) > 0 && device[0] == '/' {
device = device[1:]
}
if len(device) == 0 {
continue
}
devices = append(devices, device)
}
if devices == nil {
return nil
}
ret := strings.Join(devices, ",")
return &ret
}
func getDevices(hookConfig *HookConfig, image image.CUDA, mounts []Mount, privileged bool) *string {
func (hookConfig *hookConfig) getDevices(image image.CUDA, privileged bool) []string {
// If enabled, try and get the device list from volume mounts first
if hookConfig.AcceptDeviceListAsVolumeMounts {
devices := getDevicesFromMounts(mounts)
if devices != nil {
devices := image.VisibleDevicesFromMounts()
if len(devices) > 0 {
return devices
}
}
// Fallback to reading from the environment variable if privileges are correct
devices := getDevicesFromEnvvar(image, hookConfig.getSwarmResourceEnvvars())
if devices == nil {
if len(devices) == 0 {
return nil
}
if privileged || hookConfig.AcceptEnvvarUnprivileged {
@@ -251,26 +181,51 @@ func getDevices(hookConfig *HookConfig, image image.CUDA, mounts []Mount, privil
return nil
}
func getMigConfigDevices(env map[string]string) *string {
if devices, ok := env[envNVMigConfigDevices]; ok {
return &devices
func getMigConfigDevices(i image.CUDA) *string {
return getMigDevices(i, image.EnvVarNvidiaMigConfigDevices)
}
func getMigMonitorDevices(i image.CUDA) *string {
return getMigDevices(i, image.EnvVarNvidiaMigMonitorDevices)
}
func getMigDevices(image image.CUDA, envvar string) *string {
if !image.HasEnvvar(envvar) {
return nil
}
devices := image.Getenv(envvar)
return &devices
}
func (hookConfig *hookConfig) getImexChannels(image image.CUDA, privileged bool) []string {
// If enabled, try and get the device list from volume mounts first
if hookConfig.AcceptDeviceListAsVolumeMounts {
devices := image.ImexChannelsFromMounts()
if len(devices) > 0 {
return devices
}
}
devices := image.ImexChannelsFromEnvVar()
if len(devices) == 0 {
return nil
}
if privileged || hookConfig.AcceptEnvvarUnprivileged {
return devices
}
return nil
}
func getMigMonitorDevices(env map[string]string) *string {
if devices, ok := env[envNVMigMonitorDevices]; ok {
return &devices
}
return nil
}
func getDriverCapabilities(env map[string]string, supportedDriverCapabilities DriverCapabilities, legacyImage bool) DriverCapabilities {
func (hookConfig *hookConfig) getDriverCapabilities(cudaImage image.CUDA, legacyImage bool) image.DriverCapabilities {
// We use the default driver capabilities by default. This is filtered to only include the
// supported capabilities
capabilities := supportedDriverCapabilities.Intersection(defaultDriverCapabilities)
supportedDriverCapabilities := image.NewDriverCapabilities(hookConfig.SupportedDriverCapabilities)
capsEnv, capsEnvSpecified := env[envNVDriverCapabilities]
capabilities := supportedDriverCapabilities.Intersection(image.DefaultDriverCapabilities)
capsEnvSpecified := cudaImage.HasEnvvar(image.EnvVarNvidiaDriverCapabilities)
capsEnv := cudaImage.Getenv(image.EnvVarNvidiaDriverCapabilities)
if !capsEnvSpecified && legacyImage {
// Environment variable unset with legacy image: set all capabilities.
@@ -279,9 +234,9 @@ func getDriverCapabilities(env map[string]string, supportedDriverCapabilities Dr
if capsEnvSpecified && len(capsEnv) > 0 {
// If the envvironment variable is specified and is non-empty, use the capabilities value
envCapabilities := DriverCapabilities(capsEnv)
envCapabilities := image.NewDriverCapabilities(capsEnv)
capabilities = supportedDriverCapabilities.Intersection(envCapabilities)
if envCapabilities != all && capabilities != envCapabilities {
if !envCapabilities.IsAll() && len(capabilities) != len(envCapabilities) {
log.Panicln(fmt.Errorf("unsupported capabilities found in '%v' (allowed '%v')", envCapabilities, capabilities))
}
}
@@ -289,14 +244,12 @@ func getDriverCapabilities(env map[string]string, supportedDriverCapabilities Dr
return capabilities
}
func getNvidiaConfig(hookConfig *HookConfig, image image.CUDA, mounts []Mount, privileged bool) *nvidiaConfig {
func (hookConfig *hookConfig) getNvidiaConfig(image image.CUDA, privileged bool) *nvidiaConfig {
legacyImage := image.IsLegacy()
var devices string
if d := getDevices(hookConfig, image, mounts, privileged); d != nil {
devices = *d
} else {
// 'nil' devices means this is not a GPU container.
devices := hookConfig.getDevices(image, privileged)
if len(devices) == 0 {
// empty devices means this is not a GPU container.
return nil
}
@@ -316,26 +269,26 @@ func getNvidiaConfig(hookConfig *HookConfig, image image.CUDA, mounts []Mount, p
log.Panicln("cannot set MIG_MONITOR_DEVICES in non privileged container")
}
driverCapabilities := getDriverCapabilities(image, hookConfig.SupportedDriverCapabilities, legacyImage).String()
imexChannels := hookConfig.getImexChannels(image, privileged)
driverCapabilities := hookConfig.getDriverCapabilities(image, legacyImage).String()
requirements, err := image.GetRequirements()
if err != nil {
log.Panicln("failed to get requirements", err)
}
disableRequire := image.HasDisableRequire()
return &nvidiaConfig{
Devices: devices,
MigConfigDevices: migConfigDevices,
MigMonitorDevices: migMonitorDevices,
ImexChannels: imexChannels,
DriverCapabilities: driverCapabilities,
Requirements: requirements,
DisableRequire: disableRequire,
}
}
func getContainerConfig(hook HookConfig) (config containerConfig) {
func (hookConfig *hookConfig) getContainerConfig() (config containerConfig) {
var h HookState
d := json.NewDecoder(os.Stdin)
if err := d.Decode(&h); err != nil {
@@ -349,7 +302,11 @@ func getContainerConfig(hook HookConfig) (config containerConfig) {
s := loadSpec(path.Join(b, "config.json"))
image, err := image.NewCUDAImageFromEnv(s.Process.Env)
image, err := image.New(
image.WithEnv(s.Process.Env),
image.WithMounts(s.Mounts),
image.WithDisableRequire(hookConfig.DisableRequire),
)
if err != nil {
log.Panicln(err)
}
@@ -358,7 +315,7 @@ func getContainerConfig(hook HookConfig) (config containerConfig) {
return containerConfig{
Pid: h.Pid,
Rootfs: s.Root.Path,
Env: image,
Nvidia: getNvidiaConfig(&hook, image, s.Mounts, privileged),
Image: image,
Nvidia: hookConfig.getNvidiaConfig(image, privileged),
}
}

File diff suppressed because it is too large Load Diff

View File

@@ -1,14 +1,15 @@
package main
import (
"fmt"
"log"
"os"
"path"
"reflect"
"strings"
"github.com/BurntSushi/toml"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config/image"
)
const (
@@ -16,96 +17,63 @@ const (
driverPath = "/run/nvidia/driver"
)
var defaultPaths = [...]string{
path.Join(driverPath, configPath),
configPath,
// hookConfig wraps the toolkit config.
// This allows for functions to be defined on the local type.
type hookConfig struct {
*config.Config
}
// CLIConfig : options for nvidia-container-cli.
type CLIConfig struct {
Root *string `toml:"root"`
Path *string `toml:"path"`
Environment []string `toml:"environment"`
Debug *string `toml:"debug"`
Ldcache *string `toml:"ldcache"`
LoadKmods bool `toml:"load-kmods"`
NoPivot bool `toml:"no-pivot"`
NoCgroups bool `toml:"no-cgroups"`
User *string `toml:"user"`
Ldconfig *string `toml:"ldconfig"`
}
// HookConfig : options for the nvidia-container-runtime-hook.
type HookConfig struct {
DisableRequire bool `toml:"disable-require"`
SwarmResource *string `toml:"swarm-resource"`
AcceptEnvvarUnprivileged bool `toml:"accept-nvidia-visible-devices-envvar-when-unprivileged"`
AcceptDeviceListAsVolumeMounts bool `toml:"accept-nvidia-visible-devices-as-volume-mounts"`
SupportedDriverCapabilities DriverCapabilities `toml:"supported-driver-capabilities"`
NvidiaContainerCLI CLIConfig `toml:"nvidia-container-cli"`
NVIDIAContainerRuntime config.RuntimeConfig `toml:"nvidia-container-runtime"`
}
func getDefaultHookConfig() HookConfig {
return HookConfig{
DisableRequire: false,
SwarmResource: nil,
AcceptEnvvarUnprivileged: true,
AcceptDeviceListAsVolumeMounts: false,
SupportedDriverCapabilities: allDriverCapabilities,
NvidiaContainerCLI: CLIConfig{
Root: nil,
Path: nil,
Environment: []string{},
Debug: nil,
Ldcache: nil,
LoadKmods: true,
NoPivot: false,
NoCgroups: false,
User: nil,
Ldconfig: nil,
},
NVIDIAContainerRuntime: *config.GetDefaultRuntimeConfig(),
}
}
func getHookConfig() (config HookConfig) {
var err error
if len(*configflag) > 0 {
config = getDefaultHookConfig()
_, err = toml.DecodeFile(*configflag, &config)
if err != nil {
log.Panicln("couldn't open configuration file:", err)
}
// loadConfig loads the required paths for the hook config.
func loadConfig() (*config.Config, error) {
var configPaths []string
var required bool
if len(*configflag) != 0 {
configPaths = append(configPaths, *configflag)
required = true
} else {
for _, p := range defaultPaths {
config = getDefaultHookConfig()
_, err = toml.DecodeFile(p, &config)
if err == nil {
break
} else if !os.IsNotExist(err) {
log.Panicln("couldn't open default configuration file:", err)
}
configPaths = append(configPaths, path.Join(driverPath, configPath), configPath)
}
for _, p := range configPaths {
cfg, err := config.New(
config.WithConfigFile(p),
config.WithRequired(true),
)
if err == nil {
return cfg.Config()
} else if os.IsNotExist(err) && !required {
continue
}
return nil, fmt.Errorf("couldn't open required configuration file: %v", err)
}
if config.SupportedDriverCapabilities == all {
config.SupportedDriverCapabilities = allDriverCapabilities
return config.GetDefault()
}
func getHookConfig() (*hookConfig, error) {
cfg, err := loadConfig()
if err != nil {
return nil, fmt.Errorf("failed to load config: %v", err)
}
// We ensure that the supported-driver-capabilites option is a subset of allDriverCapabilities
if intersection := allDriverCapabilities.Intersection(config.SupportedDriverCapabilities); intersection != config.SupportedDriverCapabilities {
config := &hookConfig{cfg}
allSupportedDriverCapabilities := image.SupportedDriverCapabilities
if config.SupportedDriverCapabilities == "all" {
config.SupportedDriverCapabilities = allSupportedDriverCapabilities.String()
}
configuredCapabilities := image.NewDriverCapabilities(config.SupportedDriverCapabilities)
// We ensure that the configured value is a subset of all supported capabilities
if !allSupportedDriverCapabilities.IsSuperset(configuredCapabilities) {
configName := config.getConfigOption("SupportedDriverCapabilities")
log.Panicf("Invalid value for config option '%v'; %v (supported: %v)\n", configName, config.SupportedDriverCapabilities, allDriverCapabilities)
log.Panicf("Invalid value for config option '%v'; %v (supported: %v)\n", configName, config.SupportedDriverCapabilities, allSupportedDriverCapabilities.String())
}
return config
return config, nil
}
// getConfigOption returns the toml config option associated with the
// specified struct field.
func (c HookConfig) getConfigOption(fieldName string) string {
func (c hookConfig) getConfigOption(fieldName string) string {
t := reflect.TypeOf(c)
f, ok := t.FieldByName(fieldName)
if !ok {
@@ -119,12 +87,12 @@ func (c HookConfig) getConfigOption(fieldName string) string {
}
// getSwarmResourceEnvvars returns the swarm resource envvars for the config.
func (c *HookConfig) getSwarmResourceEnvvars() []string {
if c.SwarmResource == nil {
func (c *hookConfig) getSwarmResourceEnvvars() []string {
if c.SwarmResource == "" {
return nil
}
candidates := strings.Split(*c.SwarmResource, ",")
candidates := strings.Split(c.SwarmResource, ",")
var envvars []string
for _, c := range candidates {

View File

@@ -22,22 +22,25 @@ import (
"testing"
"github.com/stretchr/testify/require"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config/image"
)
func TestGetHookConfig(t *testing.T) {
testCases := []struct {
lines []string
expectedPanic bool
expectedDriverCapabilities DriverCapabilities
expectedDriverCapabilities string
}{
{
expectedDriverCapabilities: allDriverCapabilities,
expectedDriverCapabilities: image.SupportedDriverCapabilities.String(),
},
{
lines: []string{
"supported-driver-capabilities = \"all\"",
},
expectedDriverCapabilities: allDriverCapabilities,
expectedDriverCapabilities: image.SupportedDriverCapabilities.String(),
},
{
lines: []string{
@@ -47,19 +50,19 @@ func TestGetHookConfig(t *testing.T) {
},
{
lines: []string{},
expectedDriverCapabilities: allDriverCapabilities,
expectedDriverCapabilities: image.SupportedDriverCapabilities.String(),
},
{
lines: []string{
"supported-driver-capabilities = \"\"",
},
expectedDriverCapabilities: none,
expectedDriverCapabilities: "",
},
{
lines: []string{
"supported-driver-capabilities = \"utility,compute\"",
"supported-driver-capabilities = \"compute,utility\"",
},
expectedDriverCapabilities: DriverCapabilities("utility,compute"),
expectedDriverCapabilities: "compute,utility",
},
}
@@ -87,9 +90,10 @@ func TestGetHookConfig(t *testing.T) {
}
}
var config HookConfig
var cfg hookConfig
getHookConfig := func() {
config = getHookConfig()
c, _ := getHookConfig()
cfg = *c
}
if tc.expectedPanic {
@@ -99,7 +103,7 @@ func TestGetHookConfig(t *testing.T) {
getHookConfig()
require.EqualValues(t, tc.expectedDriverCapabilities, config.SupportedDriverCapabilities)
require.EqualValues(t, tc.expectedDriverCapabilities, cfg.SupportedDriverCapabilities)
})
}
}
@@ -109,10 +113,6 @@ func TestGetSwarmResourceEnvvars(t *testing.T) {
value string
expected []string
}{
{
value: "nil",
expected: nil,
},
{
value: "",
expected: nil,
@@ -145,13 +145,10 @@ func TestGetSwarmResourceEnvvars(t *testing.T) {
for i, tc := range testCases {
t.Run(fmt.Sprintf("%d", i), func(t *testing.T) {
c := &HookConfig{
SwarmResource: func() *string {
if tc.value == "nil" {
return nil
}
return &tc.value
}(),
c := &hookConfig{
Config: &config.Config{
SwarmResource: tc.value,
},
}
envvars := c.getSwarmResourceEnvvars()

View File

@@ -13,7 +13,9 @@ import (
"strings"
"syscall"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/internal/info"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup"
)
@@ -36,16 +38,12 @@ func exit() {
os.Exit(0)
}
func getCLIPath(config CLIConfig) string {
if config.Path != nil {
return *config.Path
func getCLIPath(config config.ContainerCLIConfig) string {
if config.Path != "" {
return config.Path
}
var root string
if config.Root != nil {
root = *config.Root
}
if err := os.Setenv("PATH", lookup.GetPath(root)); err != nil {
if err := os.Setenv("PATH", lookup.GetPath(config.Root)); err != nil {
log.Panicln("couldn't set PATH variable:", err)
}
@@ -71,53 +69,62 @@ func doPrestart() {
defer exit()
log.SetFlags(0)
hook := getHookConfig()
cli := hook.NvidiaContainerCLI
if info.ResolveAutoMode(&logInterceptor{}, hook.NVIDIAContainerRuntime.Mode) != "legacy" {
log.Panicln("invoking the NVIDIA Container Runtime Hook directly (e.g. specifying the docker --gpus flag) is not supported. Please use the NVIDIA Container Runtime (e.g. specify the --runtime=nvidia flag) instead.")
hook, err := getHookConfig()
if err != nil || hook == nil {
log.Panicln("error getting hook config:", err)
}
cli := hook.NVIDIAContainerCLIConfig
container := getContainerConfig(hook)
container := hook.getContainerConfig()
nvidia := container.Nvidia
if nvidia == nil {
// Not a GPU container, nothing to do.
return
}
if !hook.NVIDIAContainerRuntimeHookConfig.SkipModeDetection && info.ResolveAutoMode(&logInterceptor{}, hook.NVIDIAContainerRuntimeConfig.Mode, container.Image) != "legacy" {
log.Panicln("invoking the NVIDIA Container Runtime Hook directly (e.g. specifying the docker --gpus flag) is not supported. Please use the NVIDIA Container Runtime (e.g. specify the --runtime=nvidia flag) instead.")
}
rootfs := getRootfsPath(container)
args := []string{getCLIPath(cli)}
if cli.Root != nil {
args = append(args, fmt.Sprintf("--root=%s", *cli.Root))
if cli.Root != "" {
args = append(args, fmt.Sprintf("--root=%s", cli.Root))
}
if cli.LoadKmods {
args = append(args, "--load-kmods")
}
if hook.Features.DisableImexChannelCreation.IsEnabled() {
args = append(args, "--no-create-imex-channels")
}
if cli.NoPivot {
args = append(args, "--no-pivot")
}
if *debugflag {
args = append(args, "--debug=/dev/stderr")
} else if cli.Debug != nil {
args = append(args, fmt.Sprintf("--debug=%s", *cli.Debug))
} else if cli.Debug != "" {
args = append(args, fmt.Sprintf("--debug=%s", cli.Debug))
}
if cli.Ldcache != nil {
args = append(args, fmt.Sprintf("--ldcache=%s", *cli.Ldcache))
if cli.Ldcache != "" {
args = append(args, fmt.Sprintf("--ldcache=%s", cli.Ldcache))
}
if cli.User != nil {
args = append(args, fmt.Sprintf("--user=%s", *cli.User))
if cli.User != "" {
args = append(args, fmt.Sprintf("--user=%s", cli.User))
}
args = append(args, "configure")
if cli.Ldconfig != nil {
args = append(args, fmt.Sprintf("--ldconfig=%s", *cli.Ldconfig))
if !hook.Features.AllowCUDACompatLibsFromContainer.IsEnabled() {
args = append(args, "--no-cntlibs")
}
if ldconfigPath := cli.NormalizeLDConfigPath(); ldconfigPath != "" {
args = append(args, fmt.Sprintf("--ldconfig=%s", ldconfigPath))
}
if cli.NoCgroups {
args = append(args, "--no-cgroups")
}
if len(nvidia.Devices) > 0 {
args = append(args, fmt.Sprintf("--device=%s", nvidia.Devices))
if devicesString := strings.Join(nvidia.Devices, ","); len(devicesString) > 0 {
args = append(args, fmt.Sprintf("--device=%s", devicesString))
}
if len(nvidia.MigConfigDevices) > 0 {
args = append(args, fmt.Sprintf("--mig-config=%s", nvidia.MigConfigDevices))
@@ -125,6 +132,9 @@ func doPrestart() {
if len(nvidia.MigMonitorDevices) > 0 {
args = append(args, fmt.Sprintf("--mig-monitor=%s", nvidia.MigMonitorDevices))
}
if imexString := strings.Join(nvidia.ImexChannels, ","); len(imexString) > 0 {
args = append(args, fmt.Sprintf("--imex-channel=%s", imexString))
}
for _, cap := range strings.Split(nvidia.DriverCapabilities, ",") {
if len(cap) == 0 {
@@ -133,16 +143,15 @@ func doPrestart() {
args = append(args, capabilityToCLI(cap))
}
if !hook.DisableRequire && !nvidia.DisableRequire {
for _, req := range nvidia.Requirements {
args = append(args, fmt.Sprintf("--require=%s", req))
}
for _, req := range nvidia.Requirements {
args = append(args, fmt.Sprintf("--require=%s", req))
}
args = append(args, fmt.Sprintf("--pid=%s", strconv.FormatUint(uint64(container.Pid), 10)))
args = append(args, rootfs)
env := append(os.Environ(), cli.Environment...)
//nolint:gosec // TODO: Can we harden this so that there is less risk of command injection?
err = syscall.Exec(args[0], args, env)
log.Panicln("exec failed:", err)
}
@@ -185,11 +194,11 @@ func main() {
}
}
// logInterceptor implements the info.Logger interface to allow for logging from this function.
type logInterceptor struct{}
// logInterceptor implements the logger.Interface to allow for logging from executable.
type logInterceptor struct {
logger.NullLogger
}
func (l *logInterceptor) Infof(format string, args ...interface{}) {
log.Printf(format, args...)
}
func (l *logInterceptor) Debugf(format string, args ...interface{}) {}

View File

@@ -0,0 +1,34 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package main
import (
"os"
"github.com/NVIDIA/nvidia-container-toolkit/internal/runtime"
)
func main() {
rt := runtime.New(
runtime.WithModeOverride("cdi"),
)
err := rt.Run(os.Args)
if err != nil {
os.Exit(1)
}
}

View File

@@ -0,0 +1,34 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package main
import (
"os"
"github.com/NVIDIA/nvidia-container-toolkit/internal/runtime"
)
func main() {
rt := runtime.New(
runtime.WithModeOverride("legacy"),
)
err := rt.Run(os.Args)
if err != nil {
os.Exit(1)
}
}

View File

@@ -85,3 +85,126 @@ Alternatively the NVIDIA Container Runtime can be set as the default runtime for
}
}
```
## Environment variables (OCI spec)
Each environment variable maps to an command-line argument for `nvidia-container-cli` from [libnvidia-container](https://github.com/NVIDIA/libnvidia-container).
These variables are already set in our [official CUDA images](https://hub.docker.com/r/nvidia/cuda/).
### `NVIDIA_VISIBLE_DEVICES`
This variable controls which GPUs will be made accessible inside the container.
#### Possible values
* `0,1,2`, `GPU-fef8089b` …: a comma-separated list of GPU UUID(s) or index(es).
* `all`: all GPUs will be accessible, this is the default value in our container images.
* `none`: no GPU will be accessible, but driver capabilities will be enabled.
* `void` or *empty* or *unset*: `nvidia-container-runtime` will have the same behavior as `runc`.
**Note**: When running on a MIG capable device, the following values will also be available:
* `0:0,0:1,1:0`, `MIG-GPU-fef8089b/0/1` …: a comma-separated list of MIG Device UUID(s) or index(es).
Where the MIG device indices have the form `<GPU Device Index>:<MIG Device Index>` as seen in the example output:
```
$ nvidia-smi -L
GPU 0: Graphics Device (UUID: GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5)
MIG Device 0: (UUID: MIG-GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5/1/0)
MIG Device 1: (UUID: MIG-GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5/1/1)
MIG Device 2: (UUID: MIG-GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5/11/0)
```
### `NVIDIA_MIG_CONFIG_DEVICES`
This variable controls which of the visible GPUs can have their MIG
configuration managed from within the container. This includes enabling and
disabling MIG mode, creating and destroying GPU Instances and Compute
Instances, etc.
#### Possible values
* `all`: Allow all MIG-capable GPUs in the visible device list to have their
MIG configurations managed.
**Note**:
* This feature is only available on MIG capable devices (e.g. the A100).
* To use this feature, the container must be started with `CAP_SYS_ADMIN` privileges.
* When not running as `root`, the container user must have read access to the
`/proc/driver/nvidia/capabilities/mig/config` file on the host.
### `NVIDIA_MIG_MONITOR_DEVICES`
This variable controls which of the visible GPUs can have aggregate information
about all of their MIG devices monitored from within the container. This
includes inspecting the aggregate memory usage, listing the aggregate running
processes, etc.
#### Possible values
* `all`: Allow all MIG-capable GPUs in the visible device list to have their
MIG devices monitored.
**Note**:
* This feature is only available on MIG capable devices (e.g. the A100).
* To use this feature, the container must be started with `CAP_SYS_ADMIN` privileges.
* When not running as `root`, the container user must have read access to the
`/proc/driver/nvidia/capabilities/mig/monitor` file on the host.
### `NVIDIA_DRIVER_CAPABILITIES`
This option controls which driver libraries/binaries will be mounted inside the container.
#### Possible values
* `compute,video`, `graphics,utility` …: a comma-separated list of driver features the container needs.
* `all`: enable all available driver capabilities.
* *empty* or *unset*: use default driver capability: `utility,compute`.
#### Supported driver capabilities
* `compute`: required for CUDA and OpenCL applications.
* `compat32`: required for running 32-bit applications.
* `graphics`: required for running OpenGL and Vulkan applications.
* `utility`: required for using `nvidia-smi` and NVML.
* `video`: required for using the Video Codec SDK.
* `display`: required for leveraging X11 display.
### `NVIDIA_REQUIRE_*`
A logical expression to define constraints on the configurations supported by the container.
#### Supported constraints
* `cuda`: constraint on the CUDA driver version.
* `driver`: constraint on the driver version.
* `arch`: constraint on the compute architectures of the selected GPUs.
* `brand`: constraint on the brand of the selected GPUs (e.g. GeForce, Tesla, GRID).
#### Expressions
Multiple constraints can be expressed in a single environment variable: space-separated constraints are ORed, comma-separated constraints are ANDed.
Multiple environment variables of the form `NVIDIA_REQUIRE_*` are ANDed together.
### `NVIDIA_DISABLE_REQUIRE`
Single switch to disable all the constraints of the form `NVIDIA_REQUIRE_*`.
### `NVIDIA_REQUIRE_CUDA`
The version of the CUDA toolkit used by the container. It is an instance of the generic `NVIDIA_REQUIRE_*` case and it is set by official CUDA images.
If the version of the NVIDIA driver is insufficient to run this version of CUDA, the container will not be started.
#### Possible values
* `cuda>=7.5`, `cuda>=8.0`, `cuda>=9.0` …: any valid CUDA version in the form `major.minor`.
### `CUDA_VERSION`
Similar to `NVIDIA_REQUIRE_CUDA`, for legacy CUDA images.
In addition, if `NVIDIA_REQUIRE_CUDA` is not set, `NVIDIA_VISIBLE_DEVICES` and `NVIDIA_DRIVER_CAPABILITIES` will default to `all`.
## Usage example
**NOTE:** The use of the `nvidia-container-runtime` as CLI replacement for `runc` is uncommon and is only provided for completeness.
Although the `nvidia-container-runtime` is typically configured as a replacement for `runc` or `crun` in various container engines, it can also be
invoked from the command line as `runc` would. For example:
```sh
# Setup a rootfs based on Ubuntu 16.04
cd $(mktemp -d) && mkdir rootfs
curl -sS http://cdimage.ubuntu.com/ubuntu-base/releases/16.04/release/ubuntu-base-16.04.6-base-amd64.tar.gz | tar --exclude 'dev/*' -C rootfs -xz
# Create an OCI runtime spec
nvidia-container-runtime spec
sed -i 's;"sh";"nvidia-smi";' config.json
sed -i 's;\("TERM=xterm"\);\1, "NVIDIA_VISIBLE_DEVICES=0";' config.json
# Run the container
sudo nvidia-container-runtime run nvidia_smi
```

View File

@@ -1,89 +1,15 @@
package main
import (
"fmt"
"os"
"strings"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/internal/info"
"github.com/opencontainers/runtime-spec/specs-go"
"github.com/NVIDIA/nvidia-container-toolkit/internal/runtime"
)
// version must be set by go build's -X main.version= option in the Makefile.
var version = "unknown"
// gitCommit will be the hash that the binary was built from
// and will be populated by the Makefile
var gitCommit = ""
var logger = NewLogger()
func main() {
err := run(os.Args)
r := runtime.New()
err := r.Run(os.Args)
if err != nil {
logger.Errorf("%v", err)
os.Exit(1)
}
}
// run is an entry point that allows for idiomatic handling of errors
// when calling from the main function.
func run(argv []string) (rerr error) {
printVersion := hasVersionFlag(argv)
if printVersion {
fmt.Printf("%v version %v\n", "NVIDIA Container Runtime", info.GetVersionString(fmt.Sprintf("spec: %v", specs.Version)))
}
cfg, err := config.GetConfig()
if err != nil {
return fmt.Errorf("error loading config: %v", err)
}
logger, err = UpdateLogger(
cfg.NVIDIAContainerRuntimeConfig.DebugFilePath,
cfg.NVIDIAContainerRuntimeConfig.LogLevel,
argv,
)
if err != nil {
return fmt.Errorf("failed to set up logger: %v", err)
}
defer func() {
if rerr != nil {
logger.Errorf("%v", rerr)
}
logger.Reset()
}()
logger.Debugf("Command line arguments: %v", argv)
runtime, err := newNVIDIAContainerRuntime(logger.Logger, cfg, argv)
if err != nil {
return fmt.Errorf("failed to create NVIDIA Container Runtime: %v", err)
}
if printVersion {
fmt.Print("\n")
}
return runtime.Exec(argv)
}
// TODO: This should be refactored / combined with parseArgs in logger.
func hasVersionFlag(args []string) bool {
for i := 0; i < len(args); i++ {
param := args[i]
parts := strings.SplitN(param, "=", 2)
trimmed := strings.TrimLeft(parts[0], "-")
// If this is not a flag we continue
if parts[0] == trimmed {
continue
}
// Check the version flag
if trimmed == "version" {
return true
}
}
return false
}

View File

@@ -3,25 +3,28 @@ package main
import (
"bytes"
"encoding/json"
"io/ioutil"
"io"
"log"
"os"
"os/exec"
"path/filepath"
"strings"
"testing"
"github.com/opencontainers/runtime-spec/specs-go"
testlog "github.com/sirupsen/logrus/hooks/test"
"github.com/stretchr/testify/require"
"github.com/NVIDIA/nvidia-container-toolkit/internal/modifier"
"github.com/NVIDIA/nvidia-container-toolkit/internal/test"
"github.com/opencontainers/runtime-spec/specs-go"
"github.com/stretchr/testify/require"
)
const (
nvidiaRuntime = "nvidia-container-runtime"
nvidiaHook = "nvidia-container-runtime-hook"
bundlePathSuffix = "test/output/bundle/"
bundlePathSuffix = "tests/output/bundle/"
specFile = "config.json"
unmodifiedSpecFileSuffix = "test/input/test_spec.json"
unmodifiedSpecFileSuffix = "tests/input/test_spec.json"
)
const (
@@ -41,10 +44,10 @@ func TestMain(m *testing.M) {
var err error
moduleRoot, err := test.GetModuleRoot()
if err != nil {
logger.Fatalf("error in test setup: could not get module root: %v", err)
log.Fatalf("error in test setup: could not get module root: %v", err)
}
testBinPath := filepath.Join(moduleRoot, "test", "bin")
testInputPath := filepath.Join(moduleRoot, "test", "input")
testBinPath := filepath.Join(moduleRoot, "tests", "bin")
testInputPath := filepath.Join(moduleRoot, "tests", "input")
// Set the environment variables for the test
os.Setenv("PATH", test.PrependToPath(testBinPath, moduleRoot))
@@ -53,11 +56,11 @@ func TestMain(m *testing.M) {
// Confirm that the environment is configured correctly
runcPath, err := exec.LookPath(runcExecutableName)
if err != nil || filepath.Join(testBinPath, runcExecutableName) != runcPath {
logger.Fatalf("error in test setup: mock runc path set incorrectly in TestMain(): %v", err)
log.Fatalf("error in test setup: mock runc path set incorrectly in TestMain(): %v", err)
}
hookPath, err := exec.LookPath(nvidiaHook)
if err != nil || filepath.Join(testBinPath, nvidiaHook) != hookPath {
logger.Fatalf("error in test setup: mock hook path set incorrectly in TestMain(): %v", err)
log.Fatalf("error in test setup: mock hook path set incorrectly in TestMain(): %v", err)
}
// Store the root and binary paths in the test Config
@@ -77,13 +80,14 @@ func TestMain(m *testing.M) {
// case 1) nvidia-container-runtime run --bundle
// case 2) nvidia-container-runtime create --bundle
// - Confirm the runtime handles bad input correctly
// - Confirm the runtime handles bad input correctly
func TestBadInput(t *testing.T) {
err := cfg.generateNewRuntimeSpec()
if err != nil {
t.Fatal(err)
}
//nolint:gosec // TODO: Can we harden this so that there is less risk of command injection
cmdCreate := exec.Command(nvidiaRuntime, "create", "--bundle")
t.Logf("executing: %s\n", strings.Join(cmdCreate.Args, " "))
err = cmdCreate.Run()
@@ -91,15 +95,17 @@ func TestBadInput(t *testing.T) {
}
// case 1) nvidia-container-runtime run --bundle <bundle-name> <ctr-name>
// - Confirm the runtime runs with no errors
// - Confirm the runtime runs with no errors
//
// case 2) nvidia-container-runtime create --bundle <bundle-name> <ctr-name>
// - Confirm the runtime inserts the NVIDIA prestart hook correctly
// - Confirm the runtime inserts the NVIDIA prestart hook correctly
func TestGoodInput(t *testing.T) {
err := cfg.generateNewRuntimeSpec()
if err != nil {
t.Fatalf("error generating runtime spec: %v", err)
}
//nolint:gosec // TODO: Can we harden this so that there is less risk of command injection
cmdRun := exec.Command(nvidiaRuntime, "run", "--bundle", cfg.bundlePath(), "testcontainer")
t.Logf("executing: %s\n", strings.Join(cmdRun.Args, " "))
output, err := cmdRun.CombinedOutput()
@@ -110,6 +116,7 @@ func TestGoodInput(t *testing.T) {
require.NoError(t, err, "should be no errors when reading and parsing spec from config.json")
require.Empty(t, spec.Hooks, "there should be no hooks in config.json")
//nolint:gosec // TODO: Can we harden this so that there is less risk of command injection
cmdCreate := exec.Command(nvidiaRuntime, "create", "--bundle", cfg.bundlePath(), "testcontainer")
t.Logf("executing: %s\n", strings.Join(cmdCreate.Args, " "))
err = cmdCreate.Run()
@@ -155,6 +162,7 @@ func TestDuplicateHook(t *testing.T) {
}
// Test how runtime handles already existing prestart hook in config.json
//nolint:gosec // TODO: Can we harden this so that there is less risk of command injection
cmdCreate := exec.Command(nvidiaRuntime, "create", "--bundle", cfg.bundlePath(), "testcontainer")
t.Logf("executing: %s\n", strings.Join(cmdCreate.Args, " "))
output, err := cmdCreate.CombinedOutput()
@@ -170,7 +178,8 @@ func TestDuplicateHook(t *testing.T) {
// addNVIDIAHook is a basic wrapper for an addHookModifier that is used for
// testing.
func addNVIDIAHook(spec *specs.Spec) error {
m := modifier.NewStableRuntimeModifier(logger.Logger)
logger, _ := testlog.NewNullLogger()
m := modifier.NewStableRuntimeModifier(logger, nvidiaHook)
return m.Modify(spec)
}
@@ -184,15 +193,16 @@ func (c testConfig) getRuntimeSpec() (specs.Spec, error) {
}
defer jsonFile.Close()
jsonContent, err := ioutil.ReadAll(jsonFile)
if err != nil {
jsonContent, err := io.ReadAll(jsonFile)
switch {
case err != nil:
return spec, err
} else if json.Valid(jsonContent) {
case json.Valid(jsonContent):
err = json.Unmarshal(jsonContent, &spec)
if err != nil {
return spec, err
}
} else {
default:
err = json.NewDecoder(bytes.NewReader(jsonContent)).Decode(&spec)
if err != nil {
return spec, err
@@ -222,6 +232,7 @@ func (c testConfig) generateNewRuntimeSpec() error {
return err
}
//nolint:gosec // TODO: Can we harden this so that there is less risk of command injection
cmd := exec.Command("cp", c.unmodifiedSpecFile(), c.specFilePath())
err = cmd.Run()
if err != nil {

View File

@@ -1,111 +0,0 @@
/*
# Copyright (c) 2021-2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
*/
package main
import (
"fmt"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/internal/info"
"github.com/NVIDIA/nvidia-container-toolkit/internal/modifier"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
"github.com/NVIDIA/nvidia-container-toolkit/internal/runtime"
"github.com/sirupsen/logrus"
)
// newNVIDIAContainerRuntime is a factory method that constructs a runtime based on the selected configuration and specified logger
func newNVIDIAContainerRuntime(logger *logrus.Logger, cfg *config.Config, argv []string) (oci.Runtime, error) {
lowLevelRuntime, err := oci.NewLowLevelRuntime(logger, cfg.NVIDIAContainerRuntimeConfig.Runtimes)
if err != nil {
return nil, fmt.Errorf("error constructing low-level runtime: %v", err)
}
if !oci.HasCreateSubcommand(argv) {
logger.Debugf("Skipping modifier for non-create subcommand")
return lowLevelRuntime, nil
}
ociSpec, err := oci.NewSpec(logger, argv)
if err != nil {
return nil, fmt.Errorf("error constructing OCI specification: %v", err)
}
specModifier, err := newSpecModifier(logger, cfg, ociSpec, argv)
if err != nil {
return nil, fmt.Errorf("failed to construct OCI spec modifier: %v", err)
}
// Create the wrapping runtime with the specified modifier
r := runtime.NewModifyingRuntimeWrapper(
logger,
lowLevelRuntime,
ociSpec,
specModifier,
)
return r, nil
}
// newSpecModifier is a factory method that creates constructs an OCI spec modifer based on the provided config.
func newSpecModifier(logger *logrus.Logger, cfg *config.Config, ociSpec oci.Spec, argv []string) (oci.SpecModifier, error) {
modeModifier, err := newModeModifier(logger, cfg, ociSpec, argv)
if err != nil {
return nil, err
}
graphicsModifier, err := modifier.NewGraphicsModifier(logger, cfg, ociSpec)
if err != nil {
return nil, err
}
gdsModifier, err := modifier.NewGDSModifier(logger, cfg, ociSpec)
if err != nil {
return nil, err
}
mofedModifier, err := modifier.NewMOFEDModifier(logger, cfg, ociSpec)
if err != nil {
return nil, err
}
tegraModifier, err := modifier.NewTegraPlatformFiles(logger)
if err != nil {
return nil, err
}
modifiers := modifier.Merge(
modeModifier,
graphicsModifier,
gdsModifier,
mofedModifier,
tegraModifier,
)
return modifiers, nil
}
func newModeModifier(logger *logrus.Logger, cfg *config.Config, ociSpec oci.Spec, argv []string) (oci.SpecModifier, error) {
switch info.ResolveAutoMode(logger, cfg.NVIDIAContainerRuntimeConfig.Mode) {
case "legacy":
return modifier.NewStableRuntimeModifier(logger), nil
case "csv":
return modifier.NewCSVModifier(logger, cfg, ociSpec)
case "cdi":
return modifier.NewCDIModifier(logger, cfg, ociSpec)
}
return nil, fmt.Errorf("invalid runtime mode: %v", cfg.NVIDIAContainerRuntimeConfig.Mode)
}

View File

@@ -1,129 +0,0 @@
/*
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
*/
package main
import (
"encoding/json"
"os"
"path/filepath"
"testing"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/opencontainers/runtime-spec/specs-go"
testlog "github.com/sirupsen/logrus/hooks/test"
"github.com/stretchr/testify/require"
)
func TestFactoryMethod(t *testing.T) {
logger, _ := testlog.NewNullLogger()
testCases := []struct {
description string
cfg *config.Config
spec *specs.Spec
expectedError bool
}{
{
description: "empty config raises error",
cfg: &config.Config{
NVIDIAContainerRuntimeConfig: config.RuntimeConfig{},
},
expectedError: true,
},
{
description: "config with runtime raises no error",
cfg: &config.Config{
NVIDIAContainerRuntimeConfig: config.RuntimeConfig{
Runtimes: []string{"runc"},
Mode: "legacy",
},
},
},
{
description: "csv mode is supported",
cfg: &config.Config{
NVIDIAContainerRuntimeConfig: config.RuntimeConfig{
Runtimes: []string{"runc"},
Mode: "csv",
},
},
spec: &specs.Spec{
Process: &specs.Process{
Env: []string{
"NVIDIA_VISIBLE_DEVICES=all",
},
},
},
},
{
description: "non-legacy discover mode raises error",
cfg: &config.Config{
NVIDIAContainerRuntimeConfig: config.RuntimeConfig{
Runtimes: []string{"runc"},
Mode: "non-legacy",
},
},
expectedError: true,
},
{
description: "legacy discover mode returns modifier",
cfg: &config.Config{
NVIDIAContainerRuntimeConfig: config.RuntimeConfig{
Runtimes: []string{"runc"},
Mode: "legacy",
},
},
},
{
description: "csv discover mode returns modifier",
cfg: &config.Config{
NVIDIAContainerRuntimeConfig: config.RuntimeConfig{
Runtimes: []string{"runc"},
Mode: "csv",
},
},
},
{
description: "empty mode raises error",
cfg: &config.Config{
NVIDIAContainerRuntimeConfig: config.RuntimeConfig{
Runtimes: []string{"runc"},
},
},
expectedError: true,
},
}
for _, tc := range testCases {
t.Run(tc.description, func(t *testing.T) {
bundleDir := t.TempDir()
specFile, err := os.Create(filepath.Join(bundleDir, "config.json"))
require.NoError(t, err)
require.NoError(t, json.NewEncoder(specFile).Encode(tc.spec))
argv := []string{"--bundle", bundleDir, "create"}
_, err = newNVIDIAContainerRuntime(logger, tc.cfg, argv)
if tc.expectedError {
require.Error(t, err)
} else {
require.NoError(t, err)
}
})
}
}

View File

@@ -15,28 +15,23 @@ docker setup \
/run/nvidia/toolkit
```
Configure the `nvidia-container-runtime` as a docker runtime named `NAME`. If the `--runtime-name` flag is not specified, this runtime would be called `nvidia`. A runtime named `nvidia-experimental` will also be configured using the `nvidia-container-runtime-experimental` OCI-compliant runtime shim.
Configure the `nvidia-container-runtime` as a docker runtime named `NAME`. If the `--runtime-name` flag is not specified, this runtime would be called `nvidia`.
Since `--set-as-default` is enabled by default, the specified runtime name will also be set as the default docker runtime. This can be disabled by explicityly specifying `--set-as-default=false`.
**Note**: If `--runtime-name` is specified as `nvidia-experimental` explicitly, the `nvidia-experimental` runtime will be configured as the default runtime, with the `nvidia` runtime still configured and available for use.
The following table describes the behaviour for different `--runtime-name` and `--set-as-default` flag combinations.
| Flags | Installed Runtimes | Default Runtime |
|-------------------------------------------------------------|:--------------------------------|:----------------------|
| **NONE SPECIFIED** | `nvidia`, `nvidia-experimental` | `nvidia` |
| `--runtime-name nvidia` | `nvidia`, `nvidia-experimental` | `nvidia` |
| `--runtime-name NAME` | `NAME`, `nvidia-experimental` | `NAME` |
| `--runtime-name nvidia-experimental` | `nvidia`, `nvidia-experimental` | `nvidia-experimental` |
| `--set-as-default` | `nvidia`, `nvidia-experimental` | `nvidia` |
| `--set-as-default --runtime-name nvidia` | `nvidia`, `nvidia-experimental` | `nvidia` |
| `--set-as-default --runtime-name NAME` | `NAME`, `nvidia-experimental` | `NAME` |
| `--set-as-default --runtime-name nvidia-experimental` | `nvidia`, `nvidia-experimental` | `nvidia-experimental` |
| `--set-as-default=false` | `nvidia`, `nvidia-experimental` | **NOT SET** |
| `--set-as-default=false --runtime-name NAME` | `NAME`, `nvidia-experimental` | **NOT SET** |
| `--set-as-default=false --runtime-name nvidia` | `nvidia`, `nvidia-experimental` | **NOT SET** |
| `--set-as-default=false --runtime-name nvidia-experimental` | `nvidia`, `nvidia-experimental` | **NOT SET** |
| **NONE SPECIFIED** | `nvidia` | `nvidia` |
| `--runtime-name nvidia` | `nvidia` | `nvidia` |
| `--runtime-name NAME` | `NAME` | `NAME` |
| `--set-as-default` | `nvidia` | `nvidia` |
| `--set-as-default --runtime-name nvidia` | `nvidia` | `nvidia` |
| `--set-as-default --runtime-name NAME` | `NAME` | `NAME` |
| `--set-as-default=false` | `nvidia` | **NOT SET** |
| `--set-as-default=false --runtime-name NAME` | `NAME` | **NOT SET** |
| `--set-as-default=false --runtime-name nvidia` | `nvidia` | **NOT SET** |
These combinations also hold for the environment variables that map to the command line flags: `DOCKER_RUNTIME_NAME`, `DOCKER_SET_AS_DEFAULT`.
@@ -48,7 +43,7 @@ containerd setup \
/run/nvidia/toolkit
```
Configure the `nvidia-container-runtime` as a runtime class named `NAME`. If the `--runtime-class` flag is not specified, this runtime would be called `nvidia`. A runtime class named `nvidia-experimental` will also be configured using the `nvidia-container-runtime-experimental` OCI-compliant runtime shim.
Configure the `nvidia-container-runtime` as a runtime class named `NAME`. If the `--runtime-class` flag is not specified, this runtime would be called `nvidia`.
Adding the `--set-as-default` flag as follows:
```bash
@@ -59,19 +54,15 @@ containerd setup \
```
will set the runtime class `NAME` (or `nvidia` if not specified) as the default runtime class.
**Note**: If `--runtime-class` is specified as `nvidia-experimental` explicitly and `--set-as-default` is specified, the `nvidia-experimental` runtime will be configured as the default runtime class, with the `nvidia` runtime class still configured and available for use.
The following table describes the behaviour for different `--runtime-class` and `--set-as-default` flag combinations.
| Flags | Installed Runtime Classes | Default Runtime Class |
|--------------------------------------------------------|:--------------------------------|:----------------------|
| **NONE SPECIFIED** | `nvidia`, `nvidia-experimental` | **NOT SET** |
| `--runtime-class NAME` | `NAME`, `nvidia-experimental` | **NOT SET** |
| `--runtime-class nvidia` | `nvidia`, `nvidia-experimental` | **NOT SET** |
| `--runtime-class nvidia-experimental` | `nvidia`, `nvidia-experimental` | **NOT SET** |
| `--set-as-default` | `nvidia`, `nvidia-experimental` | `nvidia` |
| `--set-as-default --runtime-class NAME` | `NAME`, `nvidia-experimental` | `NAME` |
| `--set-as-default --runtime-class nvidia` | `nvidia`, `nvidia-experimental` | `nvidia` |
| `--set-as-default --runtime-class nvidia-experimental` | `nvidia`, `nvidia-experimental` | `nvidia-experimental` |
| **NONE SPECIFIED** | `nvidia` | **NOT SET** |
| `--runtime-class NAME` | `NAME` | **NOT SET** |
| `--runtime-class nvidia` | `nvidia` | **NOT SET** |
| `--set-as-default` | `nvidia` | `nvidia` |
| `--set-as-default --runtime-class NAME` | `NAME` | `NAME` |
| `--set-as-default --runtime-class nvidia` | `nvidia` | `nvidia` |
These combinations also hold for the environment variables that map to the command line flags.

View File

@@ -0,0 +1,177 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package container
import (
"fmt"
"os"
"os/exec"
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/container/operator"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/engine"
)
const (
restartModeNone = "none"
restartModeSignal = "signal"
restartModeSystemd = "systemd"
)
// Options defines the shared options for the CLIs to configure containers runtimes.
type Options struct {
Config string
Socket string
// EnabledCDI indicates whether CDI should be enabled.
EnableCDI bool
RuntimeName string
RuntimeDir string
SetAsDefault bool
RestartMode string
HostRootMount string
}
// ParseArgs parses the command line arguments to the CLI
func ParseArgs(c *cli.Context, o *Options) error {
if o.RuntimeDir != "" {
logrus.Debug("Runtime directory already set; ignoring arguments")
return nil
}
args := c.Args()
logrus.Infof("Parsing arguments: %v", args.Slice())
if c.NArg() != 1 {
return fmt.Errorf("incorrect number of arguments")
}
o.RuntimeDir = args.Get(0)
logrus.Infof("Successfully parsed arguments")
return nil
}
// Configure applies the options to the specified config
func (o Options) Configure(cfg engine.Interface) error {
err := o.UpdateConfig(cfg)
if err != nil {
return fmt.Errorf("unable to update config: %v", err)
}
return o.flush(cfg)
}
// Unconfigure removes the options from the specified config
func (o Options) Unconfigure(cfg engine.Interface) error {
err := o.RevertConfig(cfg)
if err != nil {
return fmt.Errorf("unable to update config: %v", err)
}
return o.flush(cfg)
}
// flush flushes the specified config to disk
func (o Options) flush(cfg engine.Interface) error {
logrus.Infof("Flushing config to %v", o.Config)
n, err := cfg.Save(o.Config)
if err != nil {
return fmt.Errorf("unable to flush config: %v", err)
}
if n == 0 {
logrus.Infof("Config file is empty, removed")
}
return nil
}
// UpdateConfig updates the specified config to include the nvidia runtimes
func (o Options) UpdateConfig(cfg engine.Interface) error {
runtimes := operator.GetRuntimes(
operator.WithNvidiaRuntimeName(o.RuntimeName),
operator.WithSetAsDefault(o.SetAsDefault),
operator.WithRoot(o.RuntimeDir),
)
for name, runtime := range runtimes {
err := cfg.AddRuntime(name, runtime.Path, runtime.SetAsDefault)
if err != nil {
return fmt.Errorf("failed to update runtime %q: %v", name, err)
}
}
if o.EnableCDI {
cfg.EnableCDI()
}
return nil
}
// RevertConfig reverts the specified config to remove the nvidia runtimes
func (o Options) RevertConfig(cfg engine.Interface) error {
runtimes := operator.GetRuntimes(
operator.WithNvidiaRuntimeName(o.RuntimeName),
operator.WithSetAsDefault(o.SetAsDefault),
operator.WithRoot(o.RuntimeDir),
)
for name := range runtimes {
err := cfg.RemoveRuntime(name)
if err != nil {
return fmt.Errorf("failed to remove runtime %q: %v", name, err)
}
}
return nil
}
// Restart restarts the specified service
func (o Options) Restart(service string, withSignal func(string) error) error {
switch o.RestartMode {
case restartModeNone:
logrus.Warningf("Skipping restart of %v due to --restart-mode=%v", service, o.RestartMode)
return nil
case restartModeSignal:
return withSignal(o.Socket)
case restartModeSystemd:
return o.SystemdRestart(service)
}
return fmt.Errorf("invalid restart mode specified: %v", o.RestartMode)
}
// SystemdRestart restarts the specified service using systemd
func (o Options) SystemdRestart(service string) error {
var args []string
var msg string
if o.HostRootMount != "" {
msg = " on host"
args = append(args, "chroot", o.HostRootMount)
}
args = append(args, "systemctl", "restart", service)
logrus.Infof("Restarting %v%v using systemd: %v", service, msg, args)
//nolint:gosec // TODO: Can we harden this so that there is less risk of command injection
cmd := exec.Command(args[0], args[1:]...)
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
err := cmd.Run()
if err != nil {
return fmt.Errorf("error restarting %v using systemd: %v", service, err)
}
return nil
}

View File

@@ -0,0 +1,133 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package operator
import "path/filepath"
const (
defaultRuntimeName = "nvidia"
defaultRoot = "/usr/bin"
)
// Runtime defines a runtime to be configured.
// The path and whether the runtime is the default runtime can be specified
type Runtime struct {
name string
Path string
SetAsDefault bool
}
// Runtimes defines a set of runtimes to be configured for use in the GPU Operator
type Runtimes map[string]Runtime
type config struct {
root string
nvidiaRuntimeName string
setAsDefault bool
}
// GetRuntimes returns the set of runtimes to be configured for use with the GPU Operator.
func GetRuntimes(opts ...Option) Runtimes {
c := &config{}
for _, opt := range opts {
opt(c)
}
if c.root == "" {
c.root = defaultRoot
}
if c.nvidiaRuntimeName == "" {
c.nvidiaRuntimeName = defaultRuntimeName
}
runtimes := make(Runtimes)
runtimes.add(c.nvidiaRuntime())
modes := []string{"cdi", "legacy"}
for _, mode := range modes {
runtimes.add(c.modeRuntime(mode))
}
return runtimes
}
// DefaultRuntimeName returns the name of the default runtime.
func (r Runtimes) DefaultRuntimeName() string {
for _, runtime := range r {
if runtime.SetAsDefault {
return runtime.name
}
}
return ""
}
// Add a runtime to the set of runtimes.
func (r Runtimes) add(runtime Runtime) {
r[runtime.name] = runtime
}
// nvidiaRuntime creates a runtime that corresponds to the nvidia runtime.
// If name is equal to one of the predefined runtimes, `nvidia` is used as the runtime name instead.
func (c config) nvidiaRuntime() Runtime {
predefinedRuntimes := map[string]struct{}{
"nvidia-cdi": {},
"nvidia-legacy": {},
}
name := c.nvidiaRuntimeName
if _, isPredefinedRuntime := predefinedRuntimes[name]; isPredefinedRuntime {
name = defaultRuntimeName
}
return c.newRuntime(name, "nvidia-container-runtime")
}
// modeRuntime creates a runtime for the specified mode.
func (c config) modeRuntime(mode string) Runtime {
return c.newRuntime("nvidia-"+mode, "nvidia-container-runtime."+mode)
}
// newRuntime creates a runtime based on the configuration
func (c config) newRuntime(name string, binary string) Runtime {
return Runtime{
name: name,
Path: filepath.Join(c.root, binary),
SetAsDefault: c.setAsDefault && name == c.nvidiaRuntimeName,
}
}
// Option is a functional option for configuring set of runtimes.
type Option func(*config)
// WithRoot sets the root directory for the runtime binaries.
func WithRoot(root string) Option {
return func(c *config) {
c.root = root
}
}
// WithNvidiaRuntimeName sets the name of the nvidia runtime.
func WithNvidiaRuntimeName(name string) Option {
return func(c *config) {
c.nvidiaRuntimeName = name
}
}
// WithSetAsDefault sets the default runtime to the nvidia runtime.
func WithSetAsDefault(set bool) Option {
return func(c *config) {
c.setAsDefault = set
}
}

View File

@@ -0,0 +1,141 @@
/**
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
*/
package operator
import (
"fmt"
"testing"
"github.com/stretchr/testify/require"
)
func TestOptions(t *testing.T) {
testCases := []struct {
setAsDefault bool
nvidiaRuntimeName string
root string
expectedDefaultRuntime string
expectedRuntimes Runtimes
}{
{
expectedRuntimes: Runtimes{
"nvidia": Runtime{
name: "nvidia",
Path: "/usr/bin/nvidia-container-runtime",
},
"nvidia-cdi": Runtime{
name: "nvidia-cdi",
Path: "/usr/bin/nvidia-container-runtime.cdi",
},
"nvidia-legacy": Runtime{
name: "nvidia-legacy",
Path: "/usr/bin/nvidia-container-runtime.legacy",
},
},
},
{
setAsDefault: true,
expectedDefaultRuntime: "nvidia",
expectedRuntimes: Runtimes{
"nvidia": Runtime{
name: "nvidia",
Path: "/usr/bin/nvidia-container-runtime",
SetAsDefault: true,
},
"nvidia-cdi": Runtime{
name: "nvidia-cdi",
Path: "/usr/bin/nvidia-container-runtime.cdi",
},
"nvidia-legacy": Runtime{
name: "nvidia-legacy",
Path: "/usr/bin/nvidia-container-runtime.legacy",
},
},
},
{
setAsDefault: true,
nvidiaRuntimeName: "nvidia",
expectedDefaultRuntime: "nvidia",
expectedRuntimes: Runtimes{
"nvidia": Runtime{
name: "nvidia",
Path: "/usr/bin/nvidia-container-runtime",
SetAsDefault: true,
},
"nvidia-cdi": Runtime{
name: "nvidia-cdi",
Path: "/usr/bin/nvidia-container-runtime.cdi",
},
"nvidia-legacy": Runtime{
name: "nvidia-legacy",
Path: "/usr/bin/nvidia-container-runtime.legacy",
},
},
},
{
setAsDefault: true,
nvidiaRuntimeName: "NAME",
expectedDefaultRuntime: "NAME",
expectedRuntimes: Runtimes{
"NAME": Runtime{
name: "NAME",
Path: "/usr/bin/nvidia-container-runtime",
SetAsDefault: true,
},
"nvidia-cdi": Runtime{
name: "nvidia-cdi",
Path: "/usr/bin/nvidia-container-runtime.cdi",
},
"nvidia-legacy": Runtime{
name: "nvidia-legacy",
Path: "/usr/bin/nvidia-container-runtime.legacy",
},
},
},
{
setAsDefault: false,
nvidiaRuntimeName: "NAME",
expectedRuntimes: Runtimes{
"NAME": Runtime{
name: "NAME",
Path: "/usr/bin/nvidia-container-runtime",
},
"nvidia-cdi": Runtime{
name: "nvidia-cdi",
Path: "/usr/bin/nvidia-container-runtime.cdi",
},
"nvidia-legacy": Runtime{
name: "nvidia-legacy",
Path: "/usr/bin/nvidia-container-runtime.legacy",
},
},
},
}
for i, tc := range testCases {
t.Run(fmt.Sprintf("%d", i), func(t *testing.T) {
runtimes := GetRuntimes(
WithNvidiaRuntimeName(tc.nvidiaRuntimeName),
WithSetAsDefault(tc.setAsDefault),
WithRoot(tc.root),
)
require.EqualValues(t, tc.expectedRuntimes, runtimes)
require.Equal(t, tc.expectedDefaultRuntime, runtimes.DefaultRuntimeName())
})
}
}

View File

@@ -0,0 +1,584 @@
/**
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
*/
package containerd
import (
"fmt"
"testing"
testlog "github.com/sirupsen/logrus/hooks/test"
"github.com/stretchr/testify/require"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/container"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/engine/containerd"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/toml"
)
func TestUpdateV1ConfigDefaultRuntime(t *testing.T) {
logger, _ := testlog.NewNullLogger()
const runtimeDir = "/test/runtime/dir"
testCases := []struct {
legacyConfig bool
setAsDefault bool
runtimeName string
expectedDefaultRuntimeName interface{}
expectedDefaultRuntimeBinary interface{}
}{
{},
{
legacyConfig: true,
setAsDefault: false,
expectedDefaultRuntimeName: nil,
expectedDefaultRuntimeBinary: nil,
},
{
legacyConfig: true,
setAsDefault: true,
expectedDefaultRuntimeName: nil,
expectedDefaultRuntimeBinary: "/test/runtime/dir/nvidia-container-runtime",
},
{
legacyConfig: true,
setAsDefault: true,
runtimeName: "NAME",
expectedDefaultRuntimeName: nil,
expectedDefaultRuntimeBinary: "/test/runtime/dir/nvidia-container-runtime",
},
{
legacyConfig: false,
setAsDefault: false,
expectedDefaultRuntimeName: nil,
expectedDefaultRuntimeBinary: nil,
},
{
legacyConfig: false,
setAsDefault: true,
expectedDefaultRuntimeName: "nvidia",
expectedDefaultRuntimeBinary: nil,
},
{
legacyConfig: false,
setAsDefault: true,
runtimeName: "NAME",
expectedDefaultRuntimeName: "NAME",
expectedDefaultRuntimeBinary: nil,
},
}
for i, tc := range testCases {
t.Run(fmt.Sprintf("%d", i), func(t *testing.T) {
o := &container.Options{
RuntimeName: tc.runtimeName,
RuntimeDir: runtimeDir,
SetAsDefault: tc.setAsDefault,
}
v1, err := containerd.New(
containerd.WithLogger(logger),
containerd.WithConfigSource(toml.Empty),
containerd.WithRuntimeType(runtimeType),
containerd.WithUseLegacyConfig(tc.legacyConfig),
containerd.WithConfigVersion(1),
)
require.NoError(t, err)
err = o.UpdateConfig(v1)
require.NoError(t, err)
cfg := v1.(*containerd.ConfigV1)
defaultRuntimeName := cfg.GetPath([]string{"plugins", "cri", "containerd", "default_runtime_name"})
require.EqualValues(t, tc.expectedDefaultRuntimeName, defaultRuntimeName)
defaultRuntime := cfg.GetPath([]string{"plugins", "cri", "containerd", "default_runtime"})
if tc.expectedDefaultRuntimeBinary == nil {
require.Nil(t, defaultRuntime)
} else {
require.NotNil(t, defaultRuntime)
expected, err := defaultRuntimeTomlConfigV1(tc.expectedDefaultRuntimeBinary.(string))
require.NoError(t, err)
configContents, _ := toml.Marshal(defaultRuntime.(*toml.Tree))
expectedContents, _ := toml.Marshal(expected)
require.Equal(t, string(expectedContents), string(configContents), "%d: %v: %v", i, tc)
}
})
}
}
func TestUpdateV1Config(t *testing.T) {
logger, _ := testlog.NewNullLogger()
const runtimeDir = "/test/runtime/dir"
testCases := []struct {
runtimeName string
expectedConfig map[string]interface{}
}{
{
runtimeName: "nvidia",
expectedConfig: map[string]interface{}{
"version": int64(1),
"plugins": map[string]interface{}{
"cri": map[string]interface{}{
"containerd": map[string]interface{}{
"runtimes": map[string]interface{}{
"nvidia": map[string]interface{}{
"runtime_type": "runtime_type",
"runtime_root": "",
"runtime_engine": "",
"privileged_without_host_devices": false,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"BinaryName": "/test/runtime/dir/nvidia-container-runtime",
"Runtime": "/test/runtime/dir/nvidia-container-runtime",
},
},
"nvidia-cdi": map[string]interface{}{
"runtime_type": "runtime_type",
"runtime_root": "",
"runtime_engine": "",
"privileged_without_host_devices": false,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.cdi",
"Runtime": "/test/runtime/dir/nvidia-container-runtime.cdi",
},
},
"nvidia-legacy": map[string]interface{}{
"runtime_type": "runtime_type",
"runtime_root": "",
"runtime_engine": "",
"privileged_without_host_devices": false,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.legacy",
"Runtime": "/test/runtime/dir/nvidia-container-runtime.legacy",
},
},
},
},
},
},
},
},
{
runtimeName: "NAME",
expectedConfig: map[string]interface{}{
"version": int64(1),
"plugins": map[string]interface{}{
"cri": map[string]interface{}{
"containerd": map[string]interface{}{
"runtimes": map[string]interface{}{
"NAME": map[string]interface{}{
"runtime_type": "runtime_type",
"runtime_root": "",
"runtime_engine": "",
"privileged_without_host_devices": false,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"BinaryName": "/test/runtime/dir/nvidia-container-runtime",
"Runtime": "/test/runtime/dir/nvidia-container-runtime",
},
},
"nvidia-cdi": map[string]interface{}{
"runtime_type": "runtime_type",
"runtime_root": "",
"runtime_engine": "",
"privileged_without_host_devices": false,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.cdi",
"Runtime": "/test/runtime/dir/nvidia-container-runtime.cdi",
},
},
"nvidia-legacy": map[string]interface{}{
"runtime_type": "runtime_type",
"runtime_root": "",
"runtime_engine": "",
"privileged_without_host_devices": false,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.legacy",
"Runtime": "/test/runtime/dir/nvidia-container-runtime.legacy",
},
},
},
},
},
},
},
},
}
for i, tc := range testCases {
t.Run(fmt.Sprintf("%d", i), func(t *testing.T) {
o := &container.Options{
RuntimeName: tc.runtimeName,
RuntimeDir: runtimeDir,
}
v1, err := containerd.New(
containerd.WithLogger(logger),
containerd.WithConfigSource(toml.Empty),
containerd.WithRuntimeType(runtimeType),
containerd.WithConfigVersion(1),
containerd.WithContainerAnnotations("cdi.k8s.io/*"),
)
require.NoError(t, err)
err = o.UpdateConfig(v1)
require.NoError(t, err)
expected, err := toml.TreeFromMap(tc.expectedConfig)
require.NoError(t, err)
require.Equal(t, expected.String(), v1.String())
})
}
}
func TestUpdateV1ConfigWithRuncPresent(t *testing.T) {
logger, _ := testlog.NewNullLogger()
const runtimeDir = "/test/runtime/dir"
testCases := []struct {
runtimeName string
expectedConfig map[string]interface{}
}{
{
runtimeName: "nvidia",
expectedConfig: map[string]interface{}{
"version": int64(1),
"plugins": map[string]interface{}{
"cri": map[string]interface{}{
"containerd": map[string]interface{}{
"runtimes": map[string]interface{}{
"runc": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/runc-binary",
},
},
"nvidia": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/test/runtime/dir/nvidia-container-runtime",
"Runtime": "/test/runtime/dir/nvidia-container-runtime",
},
},
"nvidia-cdi": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.cdi",
"Runtime": "/test/runtime/dir/nvidia-container-runtime.cdi",
},
},
"nvidia-legacy": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.legacy",
"Runtime": "/test/runtime/dir/nvidia-container-runtime.legacy",
},
},
},
},
},
},
},
},
{
runtimeName: "NAME",
expectedConfig: map[string]interface{}{
"version": int64(1),
"plugins": map[string]interface{}{
"cri": map[string]interface{}{
"containerd": map[string]interface{}{
"runtimes": map[string]interface{}{
"runc": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/runc-binary",
},
},
"NAME": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/test/runtime/dir/nvidia-container-runtime",
"Runtime": "/test/runtime/dir/nvidia-container-runtime",
},
},
"nvidia-cdi": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.cdi",
"Runtime": "/test/runtime/dir/nvidia-container-runtime.cdi",
},
},
"nvidia-legacy": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.legacy",
"Runtime": "/test/runtime/dir/nvidia-container-runtime.legacy",
},
},
},
},
},
},
},
},
}
for i, tc := range testCases {
t.Run(fmt.Sprintf("%d", i), func(t *testing.T) {
o := &container.Options{
RuntimeName: tc.runtimeName,
RuntimeDir: runtimeDir,
}
v1, err := containerd.New(
containerd.WithLogger(logger),
containerd.WithConfigSource(toml.FromMap(runcConfigMapV1("/runc-binary"))),
containerd.WithRuntimeType(runtimeType),
containerd.WithConfigVersion(1),
containerd.WithContainerAnnotations("cdi.k8s.io/*"),
)
require.NoError(t, err)
err = o.UpdateConfig(v1)
require.NoError(t, err)
expected, err := toml.TreeFromMap(tc.expectedConfig)
require.NoError(t, err)
require.Equal(t, expected.String(), v1.String())
})
}
}
func TestUpdateV1EnableCDI(t *testing.T) {
logger, _ := testlog.NewNullLogger()
const runtimeDir = "/test/runtime/dir"
testCases := []struct {
enableCDI bool
expectedEnableCDIValue interface{}
}{
{},
{
enableCDI: false,
expectedEnableCDIValue: nil,
},
{
enableCDI: true,
expectedEnableCDIValue: true,
},
}
for _, tc := range testCases {
t.Run(fmt.Sprintf("%v", tc.enableCDI), func(t *testing.T) {
o := &container.Options{
EnableCDI: tc.enableCDI,
RuntimeName: "nvidia",
RuntimeDir: runtimeDir,
}
cfg, err := toml.Empty.Load()
require.NoError(t, err)
v1 := &containerd.ConfigV1{
Logger: logger,
Tree: cfg,
RuntimeType: runtimeType,
}
err = o.UpdateConfig(v1)
require.NoError(t, err)
enableCDIValue := v1.GetPath([]string{"plugins", "cri", "containerd", "enable_cdi"})
require.EqualValues(t, tc.expectedEnableCDIValue, enableCDIValue)
})
}
}
func TestRevertV1Config(t *testing.T) {
logger, _ := testlog.NewNullLogger()
testCases := []struct {
config map[string]interface {
}
expected map[string]interface{}
}{
{},
{
config: map[string]interface{}{
"version": int64(1),
},
},
{
config: map[string]interface{}{
"version": int64(1),
"plugins": map[string]interface{}{
"cri": map[string]interface{}{
"containerd": map[string]interface{}{
"runtimes": map[string]interface{}{
"nvidia": runtimeMapV1("/test/runtime/dir/nvidia-container-runtime"),
"nvidia-cdi": runtimeMapV1("/test/runtime/dir/nvidia-container-runtime.cdi"),
"nvidia-legacy": runtimeMapV1("/test/runtime/dir/nvidia-container-runtime.legacy"),
},
},
},
},
},
},
{
config: map[string]interface{}{
"version": int64(1),
"plugins": map[string]interface{}{
"cri": map[string]interface{}{
"containerd": map[string]interface{}{
"runtimes": map[string]interface{}{
"nvidia": runtimeMapV1("/test/runtime/dir/nvidia-container-runtime"),
"nvidia-cdi": runtimeMapV1("/test/runtime/dir/nvidia-container-runtime.cdi"),
"nvidia-legacy": runtimeMapV1("/test/runtime/dir/nvidia-container-runtime.legacy"),
},
"default_runtime": defaultRuntimeV1("/test/runtime/dir/nvidia-container-runtime"),
"default_runtime_name": "nvidia",
},
},
},
},
},
}
for i, tc := range testCases {
t.Run(fmt.Sprintf("%d", i), func(t *testing.T) {
o := &container.Options{
RuntimeName: "nvidia",
}
expected, err := toml.TreeFromMap(tc.expected)
require.NoError(t, err)
v1, err := containerd.New(
containerd.WithLogger(logger),
containerd.WithConfigSource(toml.FromMap(tc.config)),
containerd.WithRuntimeType(runtimeType),
containerd.WithContainerAnnotations("cdi.k8s.io/*"),
)
require.NoError(t, err)
err = o.RevertConfig(v1)
require.NoError(t, err)
require.Equal(t, expected.String(), v1.String())
})
}
}
func defaultRuntimeTomlConfigV1(binary string) (*toml.Tree, error) {
return toml.TreeFromMap(defaultRuntimeV1(binary))
}
func defaultRuntimeV1(binary string) map[string]interface{} {
return map[string]interface{}{
"runtime_type": runtimeType,
"runtime_root": "",
"runtime_engine": "",
"privileged_without_host_devices": false,
"options": map[string]interface{}{
"BinaryName": binary,
"Runtime": binary,
},
}
}
func runtimeMapV1(binary string) map[string]interface{} {
return map[string]interface{}{
"runtime_type": runtimeType,
"runtime_root": "",
"runtime_engine": "",
"privileged_without_host_devices": false,
"options": map[string]interface{}{
"BinaryName": binary,
"Runtime": binary,
},
}
}
func runcConfigMapV1(binary string) map[string]interface{} {
return map[string]interface{}{
"plugins": map[string]interface{}{
"cri": map[string]interface{}{
"containerd": map[string]interface{}{
"runtimes": map[string]interface{}{
"runc": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": binary,
},
},
},
},
},
},
}
}

View File

@@ -0,0 +1,520 @@
/**
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
*/
package containerd
import (
"fmt"
"testing"
testlog "github.com/sirupsen/logrus/hooks/test"
"github.com/stretchr/testify/require"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/container"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/engine/containerd"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/toml"
)
const (
runtimeType = "runtime_type"
)
func TestUpdateV2ConfigDefaultRuntime(t *testing.T) {
logger, _ := testlog.NewNullLogger()
const runtimeDir = "/test/runtime/dir"
testCases := []struct {
setAsDefault bool
runtimeName string
expectedDefaultRuntimeName interface{}
}{
{},
{
setAsDefault: false,
runtimeName: "nvidia",
expectedDefaultRuntimeName: nil,
},
{
setAsDefault: false,
runtimeName: "NAME",
expectedDefaultRuntimeName: nil,
},
{
setAsDefault: true,
runtimeName: "nvidia",
expectedDefaultRuntimeName: "nvidia",
},
{
setAsDefault: true,
runtimeName: "NAME",
expectedDefaultRuntimeName: "NAME",
},
}
for i, tc := range testCases {
t.Run(fmt.Sprintf("%d", i), func(t *testing.T) {
o := &container.Options{
RuntimeName: tc.runtimeName,
RuntimeDir: runtimeDir,
SetAsDefault: tc.setAsDefault,
}
v2, err := containerd.New(
containerd.WithLogger(logger),
containerd.WithConfigSource(toml.Empty),
containerd.WithRuntimeType(runtimeType),
containerd.WithContainerAnnotations("cdi.k8s.io/*"),
)
require.NoError(t, err)
err = o.UpdateConfig(v2)
require.NoError(t, err)
cfg := v2.(*containerd.Config)
defaultRuntimeName := cfg.GetPath([]string{"plugins", "io.containerd.grpc.v1.cri", "containerd", "default_runtime_name"})
require.EqualValues(t, tc.expectedDefaultRuntimeName, defaultRuntimeName)
})
}
}
func TestUpdateV2Config(t *testing.T) {
logger, _ := testlog.NewNullLogger()
const runtimeDir = "/test/runtime/dir"
testCases := []struct {
runtimeName string
expectedConfig map[string]interface{}
}{
{
runtimeName: "nvidia",
expectedConfig: map[string]interface{}{
"version": int64(2),
"plugins": map[string]interface{}{
"io.containerd.grpc.v1.cri": map[string]interface{}{
"containerd": map[string]interface{}{
"runtimes": map[string]interface{}{
"nvidia": map[string]interface{}{
"runtime_type": "runtime_type",
"runtime_root": "",
"runtime_engine": "",
"privileged_without_host_devices": false,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"BinaryName": "/test/runtime/dir/nvidia-container-runtime",
},
},
"nvidia-cdi": map[string]interface{}{
"runtime_type": "runtime_type",
"runtime_root": "",
"runtime_engine": "",
"privileged_without_host_devices": false,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.cdi",
},
},
"nvidia-legacy": map[string]interface{}{
"runtime_type": "runtime_type",
"runtime_root": "",
"runtime_engine": "",
"privileged_without_host_devices": false,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.legacy",
},
},
},
},
},
},
},
},
{
runtimeName: "NAME",
expectedConfig: map[string]interface{}{
"version": int64(2),
"plugins": map[string]interface{}{
"io.containerd.grpc.v1.cri": map[string]interface{}{
"containerd": map[string]interface{}{
"runtimes": map[string]interface{}{
"NAME": map[string]interface{}{
"runtime_type": "runtime_type",
"runtime_root": "",
"runtime_engine": "",
"privileged_without_host_devices": false,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"BinaryName": "/test/runtime/dir/nvidia-container-runtime",
},
},
"nvidia-cdi": map[string]interface{}{
"runtime_type": "runtime_type",
"runtime_root": "",
"runtime_engine": "",
"privileged_without_host_devices": false,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.cdi",
},
},
"nvidia-legacy": map[string]interface{}{
"runtime_type": "runtime_type",
"runtime_root": "",
"runtime_engine": "",
"privileged_without_host_devices": false,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.legacy",
},
},
},
},
},
},
},
},
}
for i, tc := range testCases {
t.Run(fmt.Sprintf("%d", i), func(t *testing.T) {
o := &container.Options{
RuntimeName: tc.runtimeName,
RuntimeDir: runtimeDir,
}
v2, err := containerd.New(
containerd.WithLogger(logger),
containerd.WithConfigSource(toml.Empty),
containerd.WithRuntimeType(runtimeType),
containerd.WithContainerAnnotations("cdi.k8s.io/*"),
)
require.NoError(t, err)
err = o.UpdateConfig(v2)
require.NoError(t, err)
expected, err := toml.TreeFromMap(tc.expectedConfig)
require.NoError(t, err)
require.Equal(t, expected.String(), v2.String())
})
}
}
func TestUpdateV2ConfigWithRuncPresent(t *testing.T) {
logger, _ := testlog.NewNullLogger()
const runtimeDir = "/test/runtime/dir"
testCases := []struct {
runtimeName string
expectedConfig map[string]interface{}
}{
{
runtimeName: "nvidia",
expectedConfig: map[string]interface{}{
"version": int64(2),
"plugins": map[string]interface{}{
"io.containerd.grpc.v1.cri": map[string]interface{}{
"containerd": map[string]interface{}{
"runtimes": map[string]interface{}{
"runc": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/runc-binary",
},
},
"nvidia": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/test/runtime/dir/nvidia-container-runtime",
},
},
"nvidia-cdi": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.cdi",
},
},
"nvidia-legacy": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.legacy",
},
},
},
},
},
},
},
},
{
runtimeName: "NAME",
expectedConfig: map[string]interface{}{
"version": int64(2),
"plugins": map[string]interface{}{
"io.containerd.grpc.v1.cri": map[string]interface{}{
"containerd": map[string]interface{}{
"runtimes": map[string]interface{}{
"runc": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/runc-binary",
},
},
"NAME": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/test/runtime/dir/nvidia-container-runtime",
},
},
"nvidia-cdi": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.cdi",
},
},
"nvidia-legacy": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"container_annotations": []string{"cdi.k8s.io/*"},
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": "/test/runtime/dir/nvidia-container-runtime.legacy",
},
},
},
},
},
},
},
},
}
for i, tc := range testCases {
t.Run(fmt.Sprintf("%d", i), func(t *testing.T) {
o := &container.Options{
RuntimeName: tc.runtimeName,
RuntimeDir: runtimeDir,
}
v2, err := containerd.New(
containerd.WithLogger(logger),
containerd.WithConfigSource(toml.FromMap(runcConfigMapV2("/runc-binary"))),
containerd.WithRuntimeType(runtimeType),
containerd.WithContainerAnnotations("cdi.k8s.io/*"),
)
require.NoError(t, err)
err = o.UpdateConfig(v2)
require.NoError(t, err)
expected, err := toml.TreeFromMap(tc.expectedConfig)
require.NoError(t, err)
require.Equal(t, expected.String(), v2.String())
})
}
}
func TestUpdateV2ConfigEnableCDI(t *testing.T) {
logger, _ := testlog.NewNullLogger()
const runtimeDir = "/test/runtime/dir"
testCases := []struct {
enableCDI bool
expectedEnableCDIValue interface{}
}{
{},
{
enableCDI: false,
expectedEnableCDIValue: nil,
},
{
enableCDI: true,
expectedEnableCDIValue: true,
},
}
for _, tc := range testCases {
t.Run(fmt.Sprintf("%v", tc.enableCDI), func(t *testing.T) {
o := &container.Options{
EnableCDI: tc.enableCDI,
RuntimeName: "nvidia",
RuntimeDir: runtimeDir,
SetAsDefault: false,
}
cfg, err := toml.LoadMap(map[string]interface{}{})
require.NoError(t, err)
v2 := &containerd.Config{
Logger: logger,
Tree: cfg,
RuntimeType: runtimeType,
CRIRuntimePluginName: "io.containerd.grpc.v1.cri",
}
err = o.UpdateConfig(v2)
require.NoError(t, err)
enableCDIValue := cfg.GetPath([]string{"plugins", "io.containerd.grpc.v1.cri", "enable_cdi"})
require.EqualValues(t, tc.expectedEnableCDIValue, enableCDIValue)
})
}
}
func TestRevertV2Config(t *testing.T) {
logger, _ := testlog.NewNullLogger()
testCases := []struct {
config map[string]interface {
}
expected map[string]interface{}
}{
{},
{
config: map[string]interface{}{
"version": int64(2),
},
},
{
config: map[string]interface{}{
"version": int64(2),
"plugins": map[string]interface{}{
"io.containerd.grpc.v1.cri": map[string]interface{}{
"containerd": map[string]interface{}{
"runtimes": map[string]interface{}{
"nvidia": runtimeMapV2("/test/runtime/dir/nvidia-container-runtime"),
},
},
},
},
},
},
{
config: map[string]interface{}{
"version": int64(2),
"plugins": map[string]interface{}{
"io.containerd.grpc.v1.cri": map[string]interface{}{
"containerd": map[string]interface{}{
"runtimes": map[string]interface{}{
"nvidia": runtimeMapV2("/test/runtime/dir/nvidia-container-runtime"),
},
"default_runtime_name": "nvidia",
},
},
},
},
},
}
for i, tc := range testCases {
t.Run(fmt.Sprintf("%d", i), func(t *testing.T) {
o := &container.Options{
RuntimeName: "nvidia",
}
expected, err := toml.TreeFromMap(tc.expected)
require.NoError(t, err)
v2, err := containerd.New(
containerd.WithLogger(logger),
containerd.WithConfigSource(toml.FromMap(tc.config)),
containerd.WithRuntimeType(runtimeType),
containerd.WithContainerAnnotations("cdi.k8s.io/*"),
)
require.NoError(t, err)
err = o.RevertConfig(v2)
require.NoError(t, err)
require.Equal(t, expected.String(), v2.String())
})
}
}
func runtimeMapV2(binary string) map[string]interface{} {
return map[string]interface{}{
"runtime_type": runtimeType,
"runtime_root": "",
"runtime_engine": "",
"privileged_without_host_devices": false,
"options": map[string]interface{}{
"BinaryName": binary,
},
}
}
func runcConfigMapV2(binary string) map[string]interface{} {
return map[string]interface{}{
"version": 2,
"plugins": map[string]interface{}{
"io.containerd.grpc.v1.cri": map[string]interface{}{
"containerd": map[string]interface{}{
"runtimes": map[string]interface{}{
"runc": map[string]interface{}{
"runtime_type": "runc_runtime_type",
"runtime_root": "runc_runtime_root",
"runtime_engine": "runc_runtime_engine",
"privileged_without_host_devices": true,
"options": map[string]interface{}{
"runc-option": "value",
"BinaryName": binary,
},
},
},
},
},
},
}
}

View File

@@ -0,0 +1,184 @@
/**
# Copyright (c) 2020-2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
*/
package containerd
import (
"encoding/json"
"fmt"
log "github.com/sirupsen/logrus"
cli "github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/container"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/engine"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/engine/containerd"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/toml"
)
const (
Name = "containerd"
DefaultConfig = "/etc/containerd/config.toml"
DefaultSocket = "/run/containerd/containerd.sock"
DefaultRestartMode = "signal"
defaultRuntmeType = "io.containerd.runc.v2"
)
// Options stores the containerd-specific options
type Options struct {
useLegacyConfig bool
runtimeType string
ContainerRuntimeModesCDIAnnotationPrefixes cli.StringSlice
runtimeConfigOverrideJSON string
}
func Flags(opts *Options) []cli.Flag {
flags := []cli.Flag{
&cli.BoolFlag{
Name: "use-legacy-config",
Usage: "Specify whether a legacy (pre v1.3) config should be used. " +
"This ensures that a version 1 container config is created by default and that the " +
"containerd.runtimes.default_runtime config section is used to define the default " +
"runtime instead of container.default_runtime_name.",
Destination: &opts.useLegacyConfig,
EnvVars: []string{"CONTAINERD_USE_LEGACY_CONFIG"},
},
&cli.StringFlag{
Name: "runtime-type",
Usage: "The runtime_type to use for the configured runtime classes",
Value: defaultRuntmeType,
Destination: &opts.runtimeType,
EnvVars: []string{"CONTAINERD_RUNTIME_TYPE"},
},
&cli.StringSliceFlag{
Name: "nvidia-container-runtime-modes.cdi.annotation-prefixes",
Destination: &opts.ContainerRuntimeModesCDIAnnotationPrefixes,
EnvVars: []string{"NVIDIA_CONTAINER_RUNTIME_MODES_CDI_ANNOTATION_PREFIXES"},
},
&cli.StringFlag{
Name: "runtime-config-override",
Destination: &opts.runtimeConfigOverrideJSON,
Usage: "specify additional runtime options as a JSON string. The paths are relative to the runtime config.",
Value: "{}",
EnvVars: []string{"RUNTIME_CONFIG_OVERRIDE", "CONTAINERD_RUNTIME_CONFIG_OVERRIDE"},
},
}
return flags
}
// Setup updates a containerd configuration to include the nvidia-containerd-runtime and reloads it
func Setup(c *cli.Context, o *container.Options, co *Options) error {
log.Infof("Starting 'setup' for %v", c.App.Name)
cfg, err := getRuntimeConfig(o, co)
if err != nil {
return fmt.Errorf("unable to load config: %v", err)
}
err = o.Configure(cfg)
if err != nil {
return fmt.Errorf("unable to configure containerd: %v", err)
}
err = RestartContainerd(o)
if err != nil {
return fmt.Errorf("unable to restart containerd: %v", err)
}
log.Infof("Completed 'setup' for %v", c.App.Name)
return nil
}
// Cleanup reverts a containerd configuration to remove the nvidia-containerd-runtime and reloads it
func Cleanup(c *cli.Context, o *container.Options, co *Options) error {
log.Infof("Starting 'cleanup' for %v", c.App.Name)
cfg, err := getRuntimeConfig(o, co)
if err != nil {
return fmt.Errorf("unable to load config: %v", err)
}
err = o.Unconfigure(cfg)
if err != nil {
return fmt.Errorf("unable to unconfigure containerd: %v", err)
}
err = RestartContainerd(o)
if err != nil {
return fmt.Errorf("unable to restart containerd: %v", err)
}
log.Infof("Completed 'cleanup' for %v", c.App.Name)
return nil
}
// RestartContainerd restarts containerd depending on the value of restartModeFlag
func RestartContainerd(o *container.Options) error {
return o.Restart("containerd", SignalContainerd)
}
// containerAnnotationsFromCDIPrefixes returns the container annotations to set for the given CDI prefixes.
func (o *Options) containerAnnotationsFromCDIPrefixes() []string {
var annotations []string
for _, prefix := range o.ContainerRuntimeModesCDIAnnotationPrefixes.Value() {
annotations = append(annotations, prefix+"*")
}
return annotations
}
func (o *Options) runtimeConfigOverride() (map[string]interface{}, error) {
if o.runtimeConfigOverrideJSON == "" {
return nil, nil
}
runtimeOptions := make(map[string]interface{})
if err := json.Unmarshal([]byte(o.runtimeConfigOverrideJSON), &runtimeOptions); err != nil {
return nil, fmt.Errorf("failed to read %v as JSON: %w", o.runtimeConfigOverrideJSON, err)
}
return runtimeOptions, nil
}
func GetLowlevelRuntimePaths(o *container.Options, co *Options) ([]string, error) {
cfg, err := getRuntimeConfig(o, co)
if err != nil {
return nil, fmt.Errorf("unable to load containerd config: %w", err)
}
return engine.GetBinaryPathsForRuntimes(cfg), nil
}
func getRuntimeConfig(o *container.Options, co *Options) (engine.Interface, error) {
return containerd.New(
containerd.WithPath(o.Config),
containerd.WithConfigSource(
toml.LoadFirst(
containerd.CommandLineSource(o.HostRootMount),
toml.FromFile(o.Config),
),
),
containerd.WithRuntimeType(co.runtimeType),
containerd.WithUseLegacyConfig(co.useLegacyConfig),
containerd.WithContainerAnnotations(co.containerAnnotationsFromCDIPrefixes()...),
)
}

View File

@@ -0,0 +1,113 @@
/**
# Copyright 2020-2023 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package containerd
import (
"fmt"
"net"
"syscall"
"time"
log "github.com/sirupsen/logrus"
)
const (
reloadBackoff = 5 * time.Second
maxReloadAttempts = 6
socketMessageToGetPID = ""
)
// SignalContainerd sends a SIGHUP signal to the containerd daemon
func SignalContainerd(socket string) error {
log.Infof("Sending SIGHUP signal to containerd")
// Wrap the logic to perform the SIGHUP in a function so we can retry it on failure
retriable := func() error {
conn, err := net.Dial("unix", socket)
if err != nil {
return fmt.Errorf("unable to dial: %v", err)
}
defer conn.Close()
sconn, err := conn.(*net.UnixConn).SyscallConn()
if err != nil {
return fmt.Errorf("unable to get syscall connection: %v", err)
}
err1 := sconn.Control(func(fd uintptr) {
err = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_PASSCRED, 1)
})
if err1 != nil {
return fmt.Errorf("unable to issue call on socket fd: %v", err1)
}
if err != nil {
return fmt.Errorf("unable to SetsockoptInt on socket fd: %v", err)
}
_, _, err = conn.(*net.UnixConn).WriteMsgUnix([]byte(socketMessageToGetPID), nil, nil)
if err != nil {
return fmt.Errorf("unable to WriteMsgUnix on socket fd: %v", err)
}
oob := make([]byte, 1024)
_, oobn, _, _, err := conn.(*net.UnixConn).ReadMsgUnix(nil, oob)
if err != nil {
return fmt.Errorf("unable to ReadMsgUnix on socket fd: %v", err)
}
oob = oob[:oobn]
scm, err := syscall.ParseSocketControlMessage(oob)
if err != nil {
return fmt.Errorf("unable to ParseSocketControlMessage from message received on socket fd: %v", err)
}
ucred, err := syscall.ParseUnixCredentials(&scm[0])
if err != nil {
return fmt.Errorf("unable to ParseUnixCredentials from message received on socket fd: %v", err)
}
err = syscall.Kill(int(ucred.Pid), syscall.SIGHUP)
if err != nil {
return fmt.Errorf("unable to send SIGHUP to 'containerd' process: %v", err)
}
return nil
}
// Try to send a SIGHUP up to maxReloadAttempts times
var err error
for i := 0; i < maxReloadAttempts; i++ {
err = retriable()
if err == nil {
break
}
if i == maxReloadAttempts-1 {
break
}
log.Warningf("Error signaling containerd, attempt %v/%v: %v", i+1, maxReloadAttempts, err)
time.Sleep(reloadBackoff)
}
if err != nil {
log.Warningf("Max retries reached %v/%v, aborting", maxReloadAttempts, maxReloadAttempts)
return err
}
log.Infof("Successfully signaled containerd")
return nil
}

View File

@@ -0,0 +1,29 @@
//go:build !linux
// +build !linux
/**
# Copyright 2023 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package containerd
import (
"errors"
)
// SignalContainerd is unsupported on non-linux platforms.
func SignalContainerd(socket string) error {
return errors.New("SignalContainerd is unsupported on non-linux platforms")
}

View File

@@ -0,0 +1,72 @@
/**
# Copyright 2024 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package containerd
import (
"testing"
"github.com/stretchr/testify/require"
)
func TestRuntimeOptions(t *testing.T) {
testCases := []struct {
description string
options Options
expected map[string]interface{}
expectedError error
}{
{
description: "empty is nil",
},
{
description: "empty json",
options: Options{
runtimeConfigOverrideJSON: "{}",
},
expected: map[string]interface{}{},
expectedError: nil,
},
{
description: "SystemdCgroup is true",
options: Options{
runtimeConfigOverrideJSON: "{\"SystemdCgroup\": true}",
},
expected: map[string]interface{}{
"SystemdCgroup": true,
},
expectedError: nil,
},
{
description: "SystemdCgroup is false",
options: Options{
runtimeConfigOverrideJSON: "{\"SystemdCgroup\": false}",
},
expected: map[string]interface{}{
"SystemdCgroup": false,
},
expectedError: nil,
},
}
for _, tc := range testCases {
t.Run(tc.description, func(t *testing.T) {
runtimeOptions, err := tc.options.runtimeConfigOverride()
require.ErrorIs(t, tc.expectedError, err)
require.EqualValues(t, tc.expected, runtimeOptions)
})
}
}

View File

@@ -0,0 +1,210 @@
/**
# Copyright (c) 2021-2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package crio
import (
"fmt"
"os"
"path/filepath"
log "github.com/sirupsen/logrus"
cli "github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/container"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/engine"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/engine/crio"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/ocihook"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/toml"
)
const (
Name = "crio"
defaultConfigMode = "hook"
// Hook-based settings
defaultHooksDir = "/usr/share/containers/oci/hooks.d"
defaultHookFilename = "oci-nvidia-hook.json"
// Config-based settings
DefaultConfig = "/etc/crio/crio.conf"
DefaultSocket = "/var/run/crio/crio.sock"
DefaultRestartMode = "systemd"
)
// Options defines the cri-o specific options.
type Options struct {
configMode string
// hook-specific options
hooksDir string
hookFilename string
}
func Flags(opts *Options) []cli.Flag {
flags := []cli.Flag{
&cli.StringFlag{
Name: "hooks-dir",
Usage: "path to the cri-o hooks directory",
Value: defaultHooksDir,
Destination: &opts.hooksDir,
EnvVars: []string{"CRIO_HOOKS_DIR"},
DefaultText: defaultHooksDir,
},
&cli.StringFlag{
Name: "hook-filename",
Usage: "filename of the cri-o hook that will be created / removed in the hooks directory",
Value: defaultHookFilename,
Destination: &opts.hookFilename,
EnvVars: []string{"CRIO_HOOK_FILENAME"},
DefaultText: defaultHookFilename,
},
&cli.StringFlag{
Name: "config-mode",
Usage: "the configuration mode to use. One of [hook | config]",
Value: defaultConfigMode,
Destination: &opts.configMode,
EnvVars: []string{"CRIO_CONFIG_MODE"},
},
}
return flags
}
// Setup installs the prestart hook required to launch GPU-enabled containers
func Setup(c *cli.Context, o *container.Options, co *Options) error {
log.Infof("Starting 'setup' for %v", c.App.Name)
switch co.configMode {
case "hook":
return setupHook(o, co)
case "config":
return setupConfig(o)
default:
return fmt.Errorf("invalid config-mode '%v'", co.configMode)
}
}
// setupHook installs the prestart hook required to launch GPU-enabled containers
func setupHook(o *container.Options, co *Options) error {
log.Infof("Installing prestart hook")
hookPath := filepath.Join(co.hooksDir, co.hookFilename)
err := ocihook.CreateHook(hookPath, filepath.Join(o.RuntimeDir, config.NVIDIAContainerRuntimeHookExecutable))
if err != nil {
return fmt.Errorf("error creating hook: %v", err)
}
return nil
}
// setupConfig updates the cri-o config for the NVIDIA container runtime
func setupConfig(o *container.Options) error {
log.Infof("Updating config file")
cfg, err := getRuntimeConfig(o)
if err != nil {
return fmt.Errorf("unable to load config: %v", err)
}
err = o.Configure(cfg)
if err != nil {
return fmt.Errorf("unable to configure cri-o: %v", err)
}
err = RestartCrio(o)
if err != nil {
return fmt.Errorf("unable to restart crio: %v", err)
}
return nil
}
// Cleanup removes the specified prestart hook
func Cleanup(c *cli.Context, o *container.Options, co *Options) error {
log.Infof("Starting 'cleanup' for %v", c.App.Name)
switch co.configMode {
case "hook":
return cleanupHook(co)
case "config":
return cleanupConfig(o)
default:
return fmt.Errorf("invalid config-mode '%v'", co.configMode)
}
}
// cleanupHook removes the prestart hook
func cleanupHook(co *Options) error {
log.Infof("Removing prestart hook")
hookPath := filepath.Join(co.hooksDir, co.hookFilename)
err := os.Remove(hookPath)
if err != nil {
return fmt.Errorf("error removing hook '%v': %v", hookPath, err)
}
return nil
}
// cleanupConfig removes the NVIDIA container runtime from the cri-o config
func cleanupConfig(o *container.Options) error {
log.Infof("Reverting config file modifications")
cfg, err := getRuntimeConfig(o)
if err != nil {
return fmt.Errorf("unable to load config: %v", err)
}
err = o.Unconfigure(cfg)
if err != nil {
return fmt.Errorf("unable to unconfigure cri-o: %v", err)
}
err = RestartCrio(o)
if err != nil {
return fmt.Errorf("unable to restart crio: %v", err)
}
return nil
}
// RestartCrio restarts crio depending on the value of restartModeFlag
func RestartCrio(o *container.Options) error {
return o.Restart("crio", func(string) error { return fmt.Errorf("supporting crio via signal is unsupported") })
}
func GetLowlevelRuntimePaths(o *container.Options) ([]string, error) {
cfg, err := getRuntimeConfig(o)
if err != nil {
return nil, fmt.Errorf("unable to load crio config: %w", err)
}
return engine.GetBinaryPathsForRuntimes(cfg), nil
}
func getRuntimeConfig(o *container.Options) (engine.Interface, error) {
return crio.New(
crio.WithPath(o.Config),
crio.WithConfigSource(
toml.LoadFirst(
crio.CommandLineSource(o.HostRootMount),
toml.FromFile(o.Config),
),
),
)
}

View File

@@ -0,0 +1,111 @@
/**
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
*/
package docker
import (
"fmt"
log "github.com/sirupsen/logrus"
cli "github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/container"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/engine"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/engine/docker"
)
const (
Name = "docker"
DefaultConfig = "/etc/docker/daemon.json"
DefaultSocket = "/var/run/docker.sock"
DefaultRestartMode = "signal"
)
type Options struct{}
func Flags(opts *Options) []cli.Flag {
return nil
}
// Setup updates docker configuration to include the nvidia runtime and reloads it
func Setup(c *cli.Context, o *container.Options) error {
log.Infof("Starting 'setup' for %v", c.App.Name)
cfg, err := getRuntimeConfig(o)
if err != nil {
return fmt.Errorf("unable to load config: %v", err)
}
err = o.Configure(cfg)
if err != nil {
return fmt.Errorf("unable to configure docker: %v", err)
}
err = RestartDocker(o)
if err != nil {
return fmt.Errorf("unable to restart docker: %v", err)
}
log.Infof("Completed 'setup' for %v", c.App.Name)
return nil
}
// Cleanup reverts docker configuration to remove the nvidia runtime and reloads it
func Cleanup(c *cli.Context, o *container.Options) error {
log.Infof("Starting 'cleanup' for %v", c.App.Name)
cfg, err := getRuntimeConfig(o)
if err != nil {
return fmt.Errorf("unable to load config: %v", err)
}
err = o.Unconfigure(cfg)
if err != nil {
return fmt.Errorf("unable to unconfigure docker: %v", err)
}
err = RestartDocker(o)
if err != nil {
return fmt.Errorf("unable to signal docker: %v", err)
}
log.Infof("Completed 'cleanup' for %v", c.App.Name)
return nil
}
// RestartDocker restarts docker depending on the value of restartModeFlag
func RestartDocker(o *container.Options) error {
return o.Restart("docker", SignalDocker)
}
func GetLowlevelRuntimePaths(o *container.Options) ([]string, error) {
cfg, err := docker.New(
docker.WithPath(o.Config),
)
if err != nil {
return nil, fmt.Errorf("unable to load docker config: %w", err)
}
return engine.GetBinaryPathsForRuntimes(cfg), nil
}
func getRuntimeConfig(o *container.Options) (engine.Interface, error) {
return docker.New(
docker.WithPath(o.Config),
)
}

View File

@@ -0,0 +1,113 @@
/**
# Copyright 2021-2023 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package docker
import (
"fmt"
"net"
"syscall"
"time"
log "github.com/sirupsen/logrus"
)
const (
reloadBackoff = 5 * time.Second
maxReloadAttempts = 6
socketMessageToGetPID = "GET /info HTTP/1.0\r\n\r\n"
)
// SignalDocker sends a SIGHUP signal to docker daemon
func SignalDocker(socket string) error {
log.Infof("Sending SIGHUP signal to docker")
// Wrap the logic to perform the SIGHUP in a function so we can retry it on failure
retriable := func() error {
conn, err := net.Dial("unix", socket)
if err != nil {
return fmt.Errorf("unable to dial: %v", err)
}
defer conn.Close()
sconn, err := conn.(*net.UnixConn).SyscallConn()
if err != nil {
return fmt.Errorf("unable to get syscall connection: %v", err)
}
err1 := sconn.Control(func(fd uintptr) {
err = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_PASSCRED, 1)
})
if err1 != nil {
return fmt.Errorf("unable to issue call on socket fd: %v", err1)
}
if err != nil {
return fmt.Errorf("unable to SetsockoptInt on socket fd: %v", err)
}
_, _, err = conn.(*net.UnixConn).WriteMsgUnix([]byte(socketMessageToGetPID), nil, nil)
if err != nil {
return fmt.Errorf("unable to WriteMsgUnix on socket fd: %v", err)
}
oob := make([]byte, 1024)
_, oobn, _, _, err := conn.(*net.UnixConn).ReadMsgUnix(nil, oob)
if err != nil {
return fmt.Errorf("unable to ReadMsgUnix on socket fd: %v", err)
}
oob = oob[:oobn]
scm, err := syscall.ParseSocketControlMessage(oob)
if err != nil {
return fmt.Errorf("unable to ParseSocketControlMessage from message received on socket fd: %v", err)
}
ucred, err := syscall.ParseUnixCredentials(&scm[0])
if err != nil {
return fmt.Errorf("unable to ParseUnixCredentials from message received on socket fd: %v", err)
}
err = syscall.Kill(int(ucred.Pid), syscall.SIGHUP)
if err != nil {
return fmt.Errorf("unable to send SIGHUP to 'docker' process: %v", err)
}
return nil
}
// Try to send a SIGHUP up to maxReloadAttempts times
var err error
for i := 0; i < maxReloadAttempts; i++ {
err = retriable()
if err == nil {
break
}
if i == maxReloadAttempts-1 {
break
}
log.Warningf("Error signaling docker, attempt %v/%v: %v", i+1, maxReloadAttempts, err)
time.Sleep(reloadBackoff)
}
if err != nil {
log.Warningf("Max retries reached %v/%v, aborting", maxReloadAttempts, maxReloadAttempts)
return err
}
log.Infof("Successfully signaled docker")
return nil
}

View File

@@ -0,0 +1,29 @@
//go:build !linux
// +build !linux
/**
# Copyright 2023 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package docker
import (
"errors"
)
// SignalDocker is unsupported on non-linux platforms.
func SignalDocker(socket string) error {
return errors.New("SignalDocker is unsupported on non-linux platforms")
}

View File

@@ -14,13 +14,16 @@
# limitations under the License.
*/
package main
package docker
import (
"encoding/json"
"testing"
"github.com/stretchr/testify/require"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/container"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/engine/docker"
)
func TestUpdateConfigDefaultRuntime(t *testing.T) {
@@ -41,11 +44,6 @@ func TestUpdateConfigDefaultRuntime(t *testing.T) {
runtimeName: "NAME",
expectedDefaultRuntimeName: "NAME",
},
{
setAsDefault: true,
runtimeName: "nvidia-experimental",
expectedDefaultRuntimeName: "nvidia-experimental",
},
{
setAsDefault: true,
runtimeName: "nvidia",
@@ -54,15 +52,15 @@ func TestUpdateConfigDefaultRuntime(t *testing.T) {
}
for i, tc := range testCases {
o := &options{
setAsDefault: tc.setAsDefault,
runtimeName: tc.runtimeName,
runtimeDir: runtimeDir,
o := &container.Options{
RuntimeName: tc.runtimeName,
RuntimeDir: runtimeDir,
SetAsDefault: tc.setAsDefault,
}
config := map[string]interface{}{}
config := docker.Config(map[string]interface{}{})
err := UpdateConfig(config, o)
err := o.UpdateConfig(&config)
require.NoError(t, err, "%d: %v", i, tc)
defaultRuntimeName := config["default-runtime"]
@@ -74,7 +72,7 @@ func TestUpdateConfig(t *testing.T) {
const runtimeDir = "/test/runtime/dir"
testCases := []struct {
config map[string]interface{}
config docker.Config
setAsDefault bool
runtimeName string
expectedConfig map[string]interface{}
@@ -88,8 +86,12 @@ func TestUpdateConfig(t *testing.T) {
"path": "/test/runtime/dir/nvidia-container-runtime",
"args": []string{},
},
"nvidia-experimental": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime-experimental",
"nvidia-cdi": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.cdi",
"args": []string{},
},
"nvidia-legacy": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.legacy",
"args": []string{},
},
},
@@ -105,25 +107,12 @@ func TestUpdateConfig(t *testing.T) {
"path": "/test/runtime/dir/nvidia-container-runtime",
"args": []string{},
},
"nvidia-experimental": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime-experimental",
"nvidia-cdi": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.cdi",
"args": []string{},
},
},
},
},
{
config: map[string]interface{}{},
setAsDefault: false,
runtimeName: "nvidia-experimental",
expectedConfig: map[string]interface{}{
"runtimes": map[string]interface{}{
"nvidia": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime",
"args": []string{},
},
"nvidia-experimental": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime-experimental",
"nvidia-legacy": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.legacy",
"args": []string{},
},
},
@@ -145,8 +134,12 @@ func TestUpdateConfig(t *testing.T) {
"path": "/test/runtime/dir/nvidia-container-runtime",
"args": []string{},
},
"nvidia-experimental": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime-experimental",
"nvidia-cdi": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.cdi",
"args": []string{},
},
"nvidia-legacy": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.legacy",
"args": []string{},
},
},
@@ -171,8 +164,12 @@ func TestUpdateConfig(t *testing.T) {
"path": "/test/runtime/dir/nvidia-container-runtime",
"args": []string{},
},
"nvidia-experimental": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime-experimental",
"nvidia-cdi": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.cdi",
"args": []string{},
},
"nvidia-legacy": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.legacy",
"args": []string{},
},
},
@@ -191,28 +188,12 @@ func TestUpdateConfig(t *testing.T) {
"path": "/test/runtime/dir/nvidia-container-runtime",
"args": []string{},
},
"nvidia-experimental": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime-experimental",
"nvidia-cdi": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.cdi",
"args": []string{},
},
},
},
},
{
config: map[string]interface{}{
"default-runtime": "runc",
},
setAsDefault: true,
runtimeName: "nvidia-experimental",
expectedConfig: map[string]interface{}{
"default-runtime": "nvidia-experimental",
"runtimes": map[string]interface{}{
"nvidia": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime",
"args": []string{},
},
"nvidia-experimental": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime-experimental",
"nvidia-legacy": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.legacy",
"args": []string{},
},
},
@@ -239,8 +220,12 @@ func TestUpdateConfig(t *testing.T) {
"path": "/test/runtime/dir/nvidia-container-runtime",
"args": []string{},
},
"nvidia-experimental": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime-experimental",
"nvidia-cdi": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.cdi",
"args": []string{},
},
"nvidia-legacy": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.legacy",
"args": []string{},
},
},
@@ -249,12 +234,15 @@ func TestUpdateConfig(t *testing.T) {
}
for i, tc := range testCases {
options := &options{
setAsDefault: tc.setAsDefault,
runtimeName: tc.runtimeName,
runtimeDir: runtimeDir,
tc := tc
o := &container.Options{
RuntimeName: tc.runtimeName,
RuntimeDir: runtimeDir,
SetAsDefault: tc.setAsDefault,
}
err := UpdateConfig(tc.config, options)
err := o.UpdateConfig(&tc.config)
require.NoError(t, err, "%d: %v", i, tc)
configContent, err := json.MarshalIndent(tc.config, "", " ")
@@ -269,7 +257,7 @@ func TestUpdateConfig(t *testing.T) {
func TestRevertConfig(t *testing.T) {
testCases := []struct {
config map[string]interface{}
config docker.Config
expectedConfig map[string]interface{}
}{
{
@@ -290,7 +278,7 @@ func TestRevertConfig(t *testing.T) {
{
config: map[string]interface{}{
"runtimes": map[string]interface{}{
"nvidia-experimental": map[string]interface{}{
"nvidia": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime",
"args": []string{},
},
@@ -305,8 +293,12 @@ func TestRevertConfig(t *testing.T) {
"path": "/test/runtime/dir/nvidia-container-runtime",
"args": []string{},
},
"nvidia-experimental": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime-experimental",
"nvidia-cdi": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.cdi",
"args": []string{},
},
"nvidia-legacy": map[string]interface{}{
"path": "/test/runtime/dir/nvidia-container-runtime.legacy",
"args": []string{},
},
},
@@ -368,7 +360,9 @@ func TestRevertConfig(t *testing.T) {
}
for i, tc := range testCases {
err := RevertConfig(tc.config)
tc := tc
o := &container.Options{}
err := o.RevertConfig(&tc.config)
require.NoError(t, err, "%d: %v", i, tc)
@@ -381,43 +375,3 @@ func TestRevertConfig(t *testing.T) {
require.EqualValues(t, string(expectedContent), string(configContent), "%d: %v", i, tc)
}
}
func TestFlagsDefaultRuntime(t *testing.T) {
testCases := []struct {
setAsDefault bool
runtimeName string
expected string
}{
{
expected: "",
},
{
runtimeName: "not-bool",
expected: "",
},
{
setAsDefault: false,
runtimeName: "nvidia",
expected: "",
},
{
setAsDefault: true,
runtimeName: "nvidia",
expected: "nvidia",
},
{
setAsDefault: true,
runtimeName: "nvidia-experimental",
expected: "nvidia-experimental",
},
}
for i, tc := range testCases {
f := options{
setAsDefault: tc.setAsDefault,
runtimeName: tc.runtimeName,
}
require.Equal(t, tc.expected, f.getDefaultRuntime(), "%d: %v", i, tc)
}
}

View File

@@ -0,0 +1,192 @@
/**
# Copyright 2024 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package runtime
import (
"fmt"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/container"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/container/runtime/containerd"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/container/runtime/crio"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/container/runtime/docker"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/container/toolkit"
)
const (
defaultSetAsDefault = true
// defaultRuntimeName specifies the NVIDIA runtime to be use as the default runtime if setting the default runtime is enabled
defaultRuntimeName = "nvidia"
defaultHostRootMount = "/host"
runtimeSpecificDefault = "RUNTIME_SPECIFIC_DEFAULT"
)
type Options struct {
container.Options
containerdOptions containerd.Options
crioOptions crio.Options
}
func Flags(opts *Options) []cli.Flag {
flags := []cli.Flag{
&cli.StringFlag{
Name: "config",
Usage: "Path to the runtime config file",
Value: runtimeSpecificDefault,
Destination: &opts.Config,
EnvVars: []string{"RUNTIME_CONFIG", "CONTAINERD_CONFIG", "DOCKER_CONFIG"},
},
&cli.StringFlag{
Name: "socket",
Usage: "Path to the runtime socket file",
Value: runtimeSpecificDefault,
Destination: &opts.Socket,
EnvVars: []string{"RUNTIME_SOCKET", "CONTAINERD_SOCKET", "DOCKER_SOCKET"},
},
&cli.StringFlag{
Name: "restart-mode",
Usage: "Specify how the runtime should be restarted; If 'none' is selected it will not be restarted [signal | systemd | none ]",
Value: runtimeSpecificDefault,
Destination: &opts.RestartMode,
EnvVars: []string{"RUNTIME_RESTART_MODE"},
},
&cli.BoolFlag{
Name: "enable-cdi-in-runtime",
Usage: "Enable CDI in the configured runt ime",
Destination: &opts.EnableCDI,
EnvVars: []string{"RUNTIME_ENABLE_CDI"},
},
&cli.StringFlag{
Name: "host-root",
Usage: "Specify the path to the host root to be used when restarting the runtime using systemd",
Value: defaultHostRootMount,
Destination: &opts.HostRootMount,
EnvVars: []string{"HOST_ROOT_MOUNT"},
},
&cli.StringFlag{
Name: "runtime-name",
Aliases: []string{"nvidia-runtime-name", "runtime-class"},
Usage: "Specify the name of the `nvidia` runtime. If set-as-default is selected, the runtime is used as the default runtime.",
Value: defaultRuntimeName,
Destination: &opts.RuntimeName,
EnvVars: []string{"NVIDIA_RUNTIME_NAME", "CONTAINERD_RUNTIME_CLASS", "DOCKER_RUNTIME_NAME"},
},
&cli.BoolFlag{
Name: "set-as-default",
Usage: "Set the `nvidia` runtime as the default runtime.",
Value: defaultSetAsDefault,
Destination: &opts.SetAsDefault,
EnvVars: []string{"NVIDIA_RUNTIME_SET_AS_DEFAULT", "CONTAINERD_SET_AS_DEFAULT", "DOCKER_SET_AS_DEFAULT"},
Hidden: true,
},
}
flags = append(flags, containerd.Flags(&opts.containerdOptions)...)
flags = append(flags, crio.Flags(&opts.crioOptions)...)
return flags
}
// ValidateOptions checks whether the specified options are valid
func ValidateOptions(c *cli.Context, opts *Options, runtime string, toolkitRoot string, to *toolkit.Options) error {
// We set this option here to ensure that it is available in future calls.
opts.RuntimeDir = toolkitRoot
if !c.IsSet("enable-cdi-in-runtime") {
opts.EnableCDI = to.CDI.Enabled
}
// Apply the runtime-specific config changes.
switch runtime {
case containerd.Name:
if opts.Config == runtimeSpecificDefault {
opts.Config = containerd.DefaultConfig
}
if opts.Socket == runtimeSpecificDefault {
opts.Socket = containerd.DefaultSocket
}
if opts.RestartMode == runtimeSpecificDefault {
opts.RestartMode = containerd.DefaultRestartMode
}
case crio.Name:
if opts.Config == runtimeSpecificDefault {
opts.Config = crio.DefaultConfig
}
if opts.Socket == runtimeSpecificDefault {
opts.Socket = crio.DefaultSocket
}
if opts.RestartMode == runtimeSpecificDefault {
opts.RestartMode = crio.DefaultRestartMode
}
case docker.Name:
if opts.Config == runtimeSpecificDefault {
opts.Config = docker.DefaultConfig
}
if opts.Socket == runtimeSpecificDefault {
opts.Socket = docker.DefaultSocket
}
if opts.RestartMode == runtimeSpecificDefault {
opts.RestartMode = docker.DefaultRestartMode
}
default:
return fmt.Errorf("undefined runtime %v", runtime)
}
return nil
}
func Setup(c *cli.Context, opts *Options, runtime string) error {
switch runtime {
case containerd.Name:
return containerd.Setup(c, &opts.Options, &opts.containerdOptions)
case crio.Name:
return crio.Setup(c, &opts.Options, &opts.crioOptions)
case docker.Name:
return docker.Setup(c, &opts.Options)
default:
return fmt.Errorf("undefined runtime %v", runtime)
}
}
func Cleanup(c *cli.Context, opts *Options, runtime string) error {
switch runtime {
case containerd.Name:
return containerd.Cleanup(c, &opts.Options, &opts.containerdOptions)
case crio.Name:
return crio.Cleanup(c, &opts.Options, &opts.crioOptions)
case docker.Name:
return docker.Cleanup(c, &opts.Options)
default:
return fmt.Errorf("undefined runtime %v", runtime)
}
}
func GetLowlevelRuntimePaths(opts *Options, runtime string) ([]string, error) {
switch runtime {
case containerd.Name:
return containerd.GetLowlevelRuntimePaths(&opts.Options, &opts.containerdOptions)
case crio.Name:
return crio.GetLowlevelRuntimePaths(&opts.Options)
case docker.Name:
return docker.GetLowlevelRuntimePaths(&opts.Options)
default:
return nil, fmt.Errorf("undefined runtime %v", runtime)
}
}

View File

@@ -14,7 +14,7 @@
# limitations under the License.
*/
package main
package toolkit
import (
"fmt"
@@ -23,8 +23,6 @@ import (
"path/filepath"
"sort"
"strings"
log "github.com/sirupsen/logrus"
)
type executableTarget struct {
@@ -33,6 +31,7 @@ type executableTarget struct {
}
type executable struct {
fileInstaller
source string
target executableTarget
env map[string]string
@@ -43,21 +42,21 @@ type executable struct {
// install installs an executable component of the NVIDIA container toolkit. The source executable
// is copied to a `.real` file and a wapper is created to set up the environment as required.
func (e executable) install(destFolder string) (string, error) {
log.Infof("Installing executable '%v' to %v", e.source, destFolder)
e.logger.Infof("Installing executable '%v' to %v", e.source, destFolder)
dotfileName := e.dotfileName()
installedDotfileName, err := installFileToFolderWithName(destFolder, dotfileName, e.source)
installedDotfileName, err := e.installFileToFolderWithName(destFolder, dotfileName, e.source)
if err != nil {
return "", fmt.Errorf("error installing file '%v' as '%v': %v", e.source, dotfileName, err)
}
log.Infof("Installed '%v'", installedDotfileName)
e.logger.Infof("Installed '%v'", installedDotfileName)
wrapperFilename, err := e.installWrapper(destFolder, installedDotfileName)
if err != nil {
return "", fmt.Errorf("error wrapping '%v': %v", installedDotfileName, err)
}
log.Infof("Installed wrapper '%v'", wrapperFilename)
e.logger.Infof("Installed wrapper '%v'", wrapperFilename)
return wrapperFilename, nil
}

View File

@@ -14,7 +14,7 @@
# limitations under the License.
*/
package main
package toolkit
import (
"bytes"
@@ -23,10 +23,13 @@ import (
"strings"
"testing"
testlog "github.com/sirupsen/logrus/hooks/test"
"github.com/stretchr/testify/require"
)
func TestWrapper(t *testing.T) {
logger, _ := testlog.NewNullLogger()
const shebang = "#! /bin/sh"
const destFolder = "/dest/folder"
const dotfileName = "source.real"
@@ -98,6 +101,8 @@ func TestWrapper(t *testing.T) {
for i, tc := range testCases {
buf := &bytes.Buffer{}
tc.e.logger = logger
err := tc.e.writeWrapperTo(buf, destFolder, dotfileName)
require.NoError(t, err)
@@ -107,6 +112,8 @@ func TestWrapper(t *testing.T) {
}
func TestInstallExecutable(t *testing.T) {
logger, _ := testlog.NewNullLogger()
inputFolder, err := os.MkdirTemp("", "")
require.NoError(t, err)
defer os.RemoveAll(inputFolder)
@@ -121,6 +128,9 @@ func TestInstallExecutable(t *testing.T) {
require.NoError(t, sourceFile.Close())
e := executable{
fileInstaller: fileInstaller{
logger: logger,
},
source: source,
target: executableTarget{
dotfileName: "input.real",

View File

@@ -0,0 +1,95 @@
/**
# Copyright 2024 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package toolkit
import (
"fmt"
"io"
"os"
"path/filepath"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
)
type fileInstaller struct {
logger logger.Interface
// sourceRoot specifies the root that is searched for the components to install.
sourceRoot string
}
// installFileToFolder copies a source file to a destination folder.
// The path of the input file is ignored.
// e.g. installFileToFolder("/some/path/file.txt", "/output/path")
// will result in a file "/output/path/file.txt" being generated
func (t *fileInstaller) installFileToFolder(destFolder string, src string) (string, error) {
name := filepath.Base(src)
return t.installFileToFolderWithName(destFolder, name, src)
}
// cp src destFolder/name
func (t *fileInstaller) installFileToFolderWithName(destFolder string, name, src string) (string, error) {
dest := filepath.Join(destFolder, name)
err := t.installFile(dest, src)
if err != nil {
return "", fmt.Errorf("error copying '%v' to '%v': %v", src, dest, err)
}
return dest, nil
}
// installFile copies a file from src to dest and maintains
// file modes
func (t *fileInstaller) installFile(dest string, src string) error {
src = filepath.Join(t.sourceRoot, src)
t.logger.Infof("Installing '%v' to '%v'", src, dest)
source, err := os.Open(src)
if err != nil {
return fmt.Errorf("error opening source: %v", err)
}
defer source.Close()
destination, err := os.Create(dest)
if err != nil {
return fmt.Errorf("error creating destination: %v", err)
}
defer destination.Close()
_, err = io.Copy(destination, source)
if err != nil {
return fmt.Errorf("error copying file: %v", err)
}
err = applyModeFromSource(dest, src)
if err != nil {
return fmt.Errorf("error setting destination file mode: %v", err)
}
return nil
}
// applyModeFromSource sets the file mode for a destination file
// to match that of a specified source file
func applyModeFromSource(dest string, src string) error {
sourceInfo, err := os.Stat(src)
if err != nil {
return fmt.Errorf("error getting file info for '%v': %v", src, err)
}
err = os.Chmod(dest, sourceInfo.Mode())
if err != nil {
return fmt.Errorf("error setting mode for '%v': %v", dest, err)
}
return nil
}

View File

@@ -0,0 +1,40 @@
/**
# Copyright 2024 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package toolkit
import "github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
// An Option provides a mechanism to configure an Installer.
type Option func(*Installer)
func WithLogger(logger logger.Interface) Option {
return func(i *Installer) {
i.logger = logger
}
}
func WithToolkitRoot(toolkitRoot string) Option {
return func(i *Installer) {
i.toolkitRoot = toolkitRoot
}
}
func WithSourceRoot(sourceRoot string) Option {
return func(i *Installer) {
i.sourceRoot = sourceRoot
}
}

View File

@@ -14,7 +14,7 @@
# limitations under the License.
*/
package main
package toolkit
import "strings"

View File

@@ -0,0 +1,85 @@
/**
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
*/
package toolkit
import (
"fmt"
"path/filepath"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/container/operator"
)
const (
nvidiaContainerRuntimeSource = "/usr/bin/nvidia-container-runtime"
)
// installContainerRuntimes sets up the NVIDIA container runtimes, copying the executables
// and implementing the required wrapper
func (t *Installer) installContainerRuntimes(toolkitDir string) error {
runtimes := operator.GetRuntimes()
for _, runtime := range runtimes {
r := t.newNvidiaContainerRuntimeInstaller(runtime.Path)
_, err := r.install(toolkitDir)
if err != nil {
return fmt.Errorf("error installing NVIDIA container runtime: %v", err)
}
}
return nil
}
// newNVidiaContainerRuntimeInstaller returns a new executable installer for the NVIDIA container runtime.
// This installer will copy the specified source executable to the toolkit directory.
// The executable is copied to a file with the same name as the source, but with a ".real" suffix and a wrapper is
// created to allow for the configuration of the runtime environment.
func (t *Installer) newNvidiaContainerRuntimeInstaller(source string) *executable {
wrapperName := filepath.Base(source)
dotfileName := wrapperName + ".real"
target := executableTarget{
dotfileName: dotfileName,
wrapperName: wrapperName,
}
return t.newRuntimeInstaller(source, target, nil)
}
func (t *Installer) newRuntimeInstaller(source string, target executableTarget, env map[string]string) *executable {
preLines := []string{
"",
"cat /proc/modules | grep -e \"^nvidia \" >/dev/null 2>&1",
"if [ \"${?}\" != \"0\" ]; then",
" echo \"nvidia driver modules are not yet loaded, invoking runc directly\"",
" exec runc \"$@\"",
"fi",
"",
}
runtimeEnv := make(map[string]string)
runtimeEnv["XDG_CONFIG_HOME"] = filepath.Join(destDirPattern, ".config")
for k, v := range env {
runtimeEnv[k] = v
}
r := executable{
fileInstaller: t.fileInstaller,
source: source,
target: target,
env: runtimeEnv,
preLines: preLines,
}
return &r
}

View File

@@ -14,18 +14,25 @@
# limitations under the License.
*/
package main
package toolkit
import (
"bytes"
"strings"
"testing"
testlog "github.com/sirupsen/logrus/hooks/test"
"github.com/stretchr/testify/require"
)
func TestNvidiaContainerRuntimeInstallerWrapper(t *testing.T) {
r := newNvidiaContainerRuntimeInstaller()
logger, _ := testlog.NewNullLogger()
i := Installer{
fileInstaller: fileInstaller{
logger: logger,
},
}
r := i.newNvidiaContainerRuntimeInstaller(nvidiaContainerRuntimeSource)
const shebang = "#! /bin/sh"
const destFolder = "/dest/folder"
@@ -55,36 +62,3 @@ func TestNvidiaContainerRuntimeInstallerWrapper(t *testing.T) {
exepectedContents := strings.Join(expectedLines, "\n")
require.Equal(t, exepectedContents, buf.String())
}
func TestExperimentalContainerRuntimeInstallerWrapper(t *testing.T) {
r := newNvidiaContainerRuntimeExperimentalInstaller("/some/root/usr/lib64")
const shebang = "#! /bin/sh"
const destFolder = "/dest/folder"
const dotfileName = "source.real"
buf := &bytes.Buffer{}
err := r.writeWrapperTo(buf, destFolder, dotfileName)
require.NoError(t, err)
expectedLines := []string{
shebang,
"",
"cat /proc/modules | grep -e \"^nvidia \" >/dev/null 2>&1",
"if [ \"${?}\" != \"0\" ]; then",
" echo \"nvidia driver modules are not yet loaded, invoking runc directly\"",
" exec runc \"$@\"",
"fi",
"",
"LD_LIBRARY_PATH=/some/root/usr/lib64:$LD_LIBRARY_PATH \\",
"PATH=/dest/folder:$PATH \\",
"XDG_CONFIG_HOME=/dest/folder/.config \\",
"source.real \\",
"\t\"$@\"",
"",
}
exepectedContents := strings.Join(expectedLines, "\n")
require.Equal(t, exepectedContents, buf.String())
}

View File

@@ -0,0 +1,748 @@
/**
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
*/
package toolkit
import (
"fmt"
"os"
"path/filepath"
"strings"
"github.com/urfave/cli/v2"
"tags.cncf.io/container-device-interface/pkg/cdi"
"tags.cncf.io/container-device-interface/pkg/parser"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/internal/system/nvdevices"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/nvcdi"
transformroot "github.com/NVIDIA/nvidia-container-toolkit/pkg/nvcdi/transform/root"
)
const (
// DefaultNvidiaDriverRoot specifies the default NVIDIA driver run directory
DefaultNvidiaDriverRoot = "/run/nvidia/driver"
nvidiaContainerCliSource = "/usr/bin/nvidia-container-cli"
nvidiaContainerRuntimeHookSource = "/usr/bin/nvidia-container-runtime-hook"
nvidiaContainerToolkitConfigSource = "/etc/nvidia-container-runtime/config.toml"
configFilename = "config.toml"
)
type cdiOptions struct {
Enabled bool
outputDir string
kind string
vendor string
class string
}
type Options struct {
DriverRoot string
DevRoot string
DriverRootCtrPath string
DevRootCtrPath string
ContainerRuntimeMode string
ContainerRuntimeDebug string
ContainerRuntimeLogLevel string
ContainerRuntimeModesCdiDefaultKind string
ContainerRuntimeModesCDIAnnotationPrefixes cli.StringSlice
ContainerRuntimeRuntimes cli.StringSlice
ContainerRuntimeHookSkipModeDetection bool
ContainerCLIDebug string
// CDI stores the CDI options for the toolkit.
CDI cdiOptions
createDeviceNodes cli.StringSlice
acceptNVIDIAVisibleDevicesWhenUnprivileged bool
acceptNVIDIAVisibleDevicesAsVolumeMounts bool
ignoreErrors bool
optInFeatures cli.StringSlice
}
func Flags(opts *Options) []cli.Flag {
flags := []cli.Flag{
&cli.StringFlag{
Name: "driver-root",
Aliases: []string{"nvidia-driver-root"},
Value: DefaultNvidiaDriverRoot,
Destination: &opts.DriverRoot,
EnvVars: []string{"NVIDIA_DRIVER_ROOT", "DRIVER_ROOT"},
},
&cli.StringFlag{
Name: "driver-root-ctr-path",
Value: DefaultNvidiaDriverRoot,
Destination: &opts.DriverRootCtrPath,
EnvVars: []string{"DRIVER_ROOT_CTR_PATH"},
},
&cli.StringFlag{
Name: "dev-root",
Usage: "Specify the root where `/dev` is located. If this is not specified, the driver-root is assumed.",
Destination: &opts.DevRoot,
EnvVars: []string{"NVIDIA_DEV_ROOT", "DEV_ROOT"},
},
&cli.StringFlag{
Name: "dev-root-ctr-path",
Usage: "Specify the root where `/dev` is located in the container. If this is not specified, the driver-root-ctr-path is assumed.",
Destination: &opts.DevRootCtrPath,
EnvVars: []string{"DEV_ROOT_CTR_PATH"},
},
&cli.StringFlag{
Name: "nvidia-container-runtime.debug",
Aliases: []string{"nvidia-container-runtime-debug"},
Usage: "Specify the location of the debug log file for the NVIDIA Container Runtime",
Destination: &opts.ContainerRuntimeDebug,
EnvVars: []string{"NVIDIA_CONTAINER_RUNTIME_DEBUG"},
},
&cli.StringFlag{
Name: "nvidia-container-runtime.log-level",
Aliases: []string{"nvidia-container-runtime-debug-log-level"},
Destination: &opts.ContainerRuntimeLogLevel,
EnvVars: []string{"NVIDIA_CONTAINER_RUNTIME_LOG_LEVEL"},
},
&cli.StringFlag{
Name: "nvidia-container-runtime.mode",
Aliases: []string{"nvidia-container-runtime-mode"},
Destination: &opts.ContainerRuntimeMode,
EnvVars: []string{"NVIDIA_CONTAINER_RUNTIME_MODE"},
},
&cli.StringFlag{
Name: "nvidia-container-runtime.modes.cdi.default-kind",
Destination: &opts.ContainerRuntimeModesCdiDefaultKind,
EnvVars: []string{"NVIDIA_CONTAINER_RUNTIME_MODES_CDI_DEFAULT_KIND"},
},
&cli.StringSliceFlag{
Name: "nvidia-container-runtime.modes.cdi.annotation-prefixes",
Destination: &opts.ContainerRuntimeModesCDIAnnotationPrefixes,
EnvVars: []string{"NVIDIA_CONTAINER_RUNTIME_MODES_CDI_ANNOTATION_PREFIXES"},
},
&cli.StringSliceFlag{
Name: "nvidia-container-runtime.runtimes",
Destination: &opts.ContainerRuntimeRuntimes,
EnvVars: []string{"NVIDIA_CONTAINER_RUNTIME_RUNTIMES"},
},
&cli.BoolFlag{
Name: "nvidia-container-runtime-hook.skip-mode-detection",
Value: true,
Destination: &opts.ContainerRuntimeHookSkipModeDetection,
EnvVars: []string{"NVIDIA_CONTAINER_RUNTIME_HOOK_SKIP_MODE_DETECTION"},
},
&cli.StringFlag{
Name: "nvidia-container-cli.debug",
Aliases: []string{"nvidia-container-cli-debug"},
Usage: "Specify the location of the debug log file for the NVIDIA Container CLI",
Destination: &opts.ContainerCLIDebug,
EnvVars: []string{"NVIDIA_CONTAINER_CLI_DEBUG"},
},
&cli.BoolFlag{
Name: "accept-nvidia-visible-devices-envvar-when-unprivileged",
Usage: "Set the accept-nvidia-visible-devices-envvar-when-unprivileged config option",
Value: true,
Destination: &opts.acceptNVIDIAVisibleDevicesWhenUnprivileged,
EnvVars: []string{"ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED"},
},
&cli.BoolFlag{
Name: "accept-nvidia-visible-devices-as-volume-mounts",
Usage: "Set the accept-nvidia-visible-devices-as-volume-mounts config option",
Destination: &opts.acceptNVIDIAVisibleDevicesAsVolumeMounts,
EnvVars: []string{"ACCEPT_NVIDIA_VISIBLE_DEVICES_AS_VOLUME_MOUNTS"},
},
&cli.BoolFlag{
Name: "cdi-enabled",
Aliases: []string{"enable-cdi"},
Usage: "enable the generation of a CDI specification",
Destination: &opts.CDI.Enabled,
EnvVars: []string{"CDI_ENABLED", "ENABLE_CDI"},
},
&cli.StringFlag{
Name: "cdi-output-dir",
Usage: "the directory where the CDI output files are to be written. If this is set to '', no CDI specification is generated.",
Value: "/var/run/cdi",
Destination: &opts.CDI.outputDir,
EnvVars: []string{"CDI_OUTPUT_DIR"},
},
&cli.StringFlag{
Name: "cdi-kind",
Usage: "the vendor string to use for the generated CDI specification",
Value: "management.nvidia.com/gpu",
Destination: &opts.CDI.kind,
EnvVars: []string{"CDI_KIND"},
},
&cli.BoolFlag{
Name: "ignore-errors",
Usage: "ignore errors when installing the NVIDIA Container toolkit. This is used for testing purposes only.",
Hidden: true,
Destination: &opts.ignoreErrors,
},
&cli.StringSliceFlag{
Name: "create-device-nodes",
Usage: "(Only applicable with --cdi-enabled) specifies which device nodes should be created. If any one of the options is set to '' or 'none', no device nodes will be created.",
Value: cli.NewStringSlice("control"),
Destination: &opts.createDeviceNodes,
EnvVars: []string{"CREATE_DEVICE_NODES"},
},
&cli.StringSliceFlag{
Name: "opt-in-features",
Hidden: true,
Destination: &opts.optInFeatures,
EnvVars: []string{"NVIDIA_CONTAINER_TOOLKIT_OPT_IN_FEATURES"},
},
}
return flags
}
// An Installer is used to install the NVIDIA Container Toolkit from the toolkit container.
type Installer struct {
fileInstaller
// toolkitRoot specifies the destination path at which the toolkit is installed.
toolkitRoot string
}
// NewInstaller creates an installer for the NVIDIA Container Toolkit.
func NewInstaller(opts ...Option) *Installer {
i := &Installer{}
for _, opt := range opts {
opt(i)
}
if i.logger == nil {
i.logger = logger.New()
}
return i
}
// ValidateOptions checks whether the specified options are valid
func (t *Installer) ValidateOptions(opts *Options) error {
if t == nil {
return fmt.Errorf("toolkit installer is not initilized")
}
if t.toolkitRoot == "" {
return fmt.Errorf("invalid --toolkit-root option: %v", t.toolkitRoot)
}
vendor, class := parser.ParseQualifier(opts.CDI.kind)
if err := parser.ValidateVendorName(vendor); err != nil {
return fmt.Errorf("invalid CDI vendor name: %v", err)
}
if err := parser.ValidateClassName(class); err != nil {
return fmt.Errorf("invalid CDI class name: %v", err)
}
opts.CDI.vendor = vendor
opts.CDI.class = class
if opts.CDI.Enabled && opts.CDI.outputDir == "" {
t.logger.Warning("Skipping CDI spec generation (no output directory specified)")
opts.CDI.Enabled = false
}
isDisabled := false
for _, mode := range opts.createDeviceNodes.Value() {
if mode != "" && mode != "none" && mode != "control" {
return fmt.Errorf("invalid --create-device-nodes value: %v", mode)
}
if mode == "" || mode == "none" {
isDisabled = true
break
}
}
if !opts.CDI.Enabled && !isDisabled {
t.logger.Info("disabling device node creation since --cdi-enabled=false")
isDisabled = true
}
if isDisabled {
opts.createDeviceNodes = *cli.NewStringSlice()
}
return nil
}
// Install installs the components of the NVIDIA container toolkit.
// Any existing installation is removed.
func (t *Installer) Install(cli *cli.Context, opts *Options) error {
if t == nil {
return fmt.Errorf("toolkit installer is not initilized")
}
t.logger.Infof("Installing NVIDIA container toolkit to '%v'", t.toolkitRoot)
t.logger.Infof("Removing existing NVIDIA container toolkit installation")
err := os.RemoveAll(t.toolkitRoot)
if err != nil && !opts.ignoreErrors {
return fmt.Errorf("error removing toolkit directory: %v", err)
} else if err != nil {
t.logger.Errorf("Ignoring error: %v", fmt.Errorf("error removing toolkit directory: %v", err))
}
toolkitConfigDir := filepath.Join(t.toolkitRoot, ".config", "nvidia-container-runtime")
toolkitConfigPath := filepath.Join(toolkitConfigDir, configFilename)
err = t.createDirectories(t.toolkitRoot, toolkitConfigDir)
if err != nil && !opts.ignoreErrors {
return fmt.Errorf("could not create required directories: %v", err)
} else if err != nil {
t.logger.Errorf("Ignoring error: %v", fmt.Errorf("could not create required directories: %v", err))
}
err = t.installContainerLibraries(t.toolkitRoot)
if err != nil && !opts.ignoreErrors {
return fmt.Errorf("error installing NVIDIA container library: %v", err)
} else if err != nil {
t.logger.Errorf("Ignoring error: %v", fmt.Errorf("error installing NVIDIA container library: %v", err))
}
err = t.installContainerRuntimes(t.toolkitRoot)
if err != nil && !opts.ignoreErrors {
return fmt.Errorf("error installing NVIDIA container runtime: %v", err)
} else if err != nil {
t.logger.Errorf("Ignoring error: %v", fmt.Errorf("error installing NVIDIA container runtime: %v", err))
}
nvidiaContainerCliExecutable, err := t.installContainerCLI(t.toolkitRoot)
if err != nil && !opts.ignoreErrors {
return fmt.Errorf("error installing NVIDIA container CLI: %v", err)
} else if err != nil {
t.logger.Errorf("Ignoring error: %v", fmt.Errorf("error installing NVIDIA container CLI: %v", err))
}
nvidiaContainerRuntimeHookPath, err := t.installRuntimeHook(t.toolkitRoot, toolkitConfigPath)
if err != nil && !opts.ignoreErrors {
return fmt.Errorf("error installing NVIDIA container runtime hook: %v", err)
} else if err != nil {
t.logger.Errorf("Ignoring error: %v", fmt.Errorf("error installing NVIDIA container runtime hook: %v", err))
}
nvidiaCTKPath, err := t.installContainerToolkitCLI(t.toolkitRoot)
if err != nil && !opts.ignoreErrors {
return fmt.Errorf("error installing NVIDIA Container Toolkit CLI: %v", err)
} else if err != nil {
t.logger.Errorf("Ignoring error: %v", fmt.Errorf("error installing NVIDIA Container Toolkit CLI: %v", err))
}
nvidiaCDIHookPath, err := t.installContainerCDIHookCLI(t.toolkitRoot)
if err != nil && !opts.ignoreErrors {
return fmt.Errorf("error installing NVIDIA Container CDI Hook CLI: %v", err)
} else if err != nil {
t.logger.Errorf("Ignoring error: %v", fmt.Errorf("error installing NVIDIA Container CDI Hook CLI: %v", err))
}
err = t.installToolkitConfig(cli, toolkitConfigPath, nvidiaContainerCliExecutable, nvidiaCTKPath, nvidiaContainerRuntimeHookPath, opts)
if err != nil && !opts.ignoreErrors {
return fmt.Errorf("error installing NVIDIA container toolkit config: %v", err)
} else if err != nil {
t.logger.Errorf("Ignoring error: %v", fmt.Errorf("error installing NVIDIA container toolkit config: %v", err))
}
err = t.createDeviceNodes(opts)
if err != nil && !opts.ignoreErrors {
return fmt.Errorf("error creating device nodes: %v", err)
} else if err != nil {
t.logger.Errorf("Ignoring error: %v", fmt.Errorf("error creating device nodes: %v", err))
}
err = t.generateCDISpec(opts, nvidiaCDIHookPath)
if err != nil && !opts.ignoreErrors {
return fmt.Errorf("error generating CDI specification: %v", err)
} else if err != nil {
t.logger.Errorf("Ignoring error: %v", fmt.Errorf("error generating CDI specification: %v", err))
}
return nil
}
// installContainerLibraries locates and installs the libraries that are part of
// the nvidia-container-toolkit.
// A predefined set of library candidates are considered, with the first one
// resulting in success being installed to the toolkit folder. The install process
// resolves the symlink for the library and copies the versioned library itself.
func (t *Installer) installContainerLibraries(toolkitRoot string) error {
t.logger.Infof("Installing NVIDIA container library to '%v'", toolkitRoot)
libs := []string{
"libnvidia-container.so.1",
"libnvidia-container-go.so.1",
}
for _, l := range libs {
err := t.installLibrary(l, toolkitRoot)
if err != nil {
return fmt.Errorf("failed to install %s: %v", l, err)
}
}
return nil
}
// installLibrary installs the specified library to the toolkit directory.
func (t *Installer) installLibrary(libName string, toolkitRoot string) error {
libraryPath, err := t.findLibrary(libName)
if err != nil {
return fmt.Errorf("error locating NVIDIA container library: %v", err)
}
installedLibPath, err := t.installFileToFolder(toolkitRoot, libraryPath)
if err != nil {
return fmt.Errorf("error installing %v to %v: %v", libraryPath, toolkitRoot, err)
}
t.logger.Infof("Installed '%v' to '%v'", libraryPath, installedLibPath)
if filepath.Base(installedLibPath) == libName {
return nil
}
err = t.installSymlink(toolkitRoot, libName, installedLibPath)
if err != nil {
return fmt.Errorf("error installing symlink for NVIDIA container library: %v", err)
}
return nil
}
// installToolkitConfig installs the config file for the NVIDIA container toolkit ensuring
// that the settings are updated to match the desired install and nvidia driver directories.
func (t *Installer) installToolkitConfig(c *cli.Context, toolkitConfigPath string, nvidiaContainerCliExecutablePath string, nvidiaCTKPath string, nvidaContainerRuntimeHookPath string, opts *Options) error {
t.logger.Infof("Installing NVIDIA container toolkit config '%v'", toolkitConfigPath)
cfg, err := config.New(
config.WithConfigFile(nvidiaContainerToolkitConfigSource),
)
if err != nil {
return fmt.Errorf("could not open source config file: %v", err)
}
targetConfig, err := os.Create(toolkitConfigPath)
if err != nil {
return fmt.Errorf("could not create target config file: %v", err)
}
defer targetConfig.Close()
// Read the ldconfig path from the config as this may differ per platform
// On ubuntu-based systems this ends in `.real`
ldconfigPath := fmt.Sprintf("%s", cfg.GetDefault("nvidia-container-cli.ldconfig", "/sbin/ldconfig"))
// Use the driver run root as the root:
driverLdconfigPath := config.NormalizeLDConfigPath("@" + filepath.Join(opts.DriverRoot, strings.TrimPrefix(ldconfigPath, "@/")))
configValues := map[string]interface{}{
// Set the options in the root toml table
"accept-nvidia-visible-devices-envvar-when-unprivileged": opts.acceptNVIDIAVisibleDevicesWhenUnprivileged,
"accept-nvidia-visible-devices-as-volume-mounts": opts.acceptNVIDIAVisibleDevicesAsVolumeMounts,
// Set the nvidia-container-cli options
"nvidia-container-cli.root": opts.DriverRoot,
"nvidia-container-cli.path": nvidiaContainerCliExecutablePath,
"nvidia-container-cli.ldconfig": driverLdconfigPath,
// Set nvidia-ctk options
"nvidia-ctk.path": nvidiaCTKPath,
// Set the nvidia-container-runtime-hook options
"nvidia-container-runtime-hook.path": nvidaContainerRuntimeHookPath,
"nvidia-container-runtime-hook.skip-mode-detection": opts.ContainerRuntimeHookSkipModeDetection,
}
toolkitRuntimeList := opts.ContainerRuntimeRuntimes.Value()
if len(toolkitRuntimeList) > 0 {
configValues["nvidia-container-runtime.runtimes"] = toolkitRuntimeList
}
for _, optInFeature := range opts.optInFeatures.Value() {
configValues["features."+optInFeature] = true
}
for key, value := range configValues {
cfg.Set(key, value)
}
// Set the optional config options
optionalConfigValues := map[string]interface{}{
"nvidia-container-runtime.debug": opts.ContainerRuntimeDebug,
"nvidia-container-runtime.log-level": opts.ContainerRuntimeLogLevel,
"nvidia-container-runtime.mode": opts.ContainerRuntimeMode,
"nvidia-container-runtime.modes.cdi.annotation-prefixes": opts.ContainerRuntimeModesCDIAnnotationPrefixes,
"nvidia-container-runtime.modes.cdi.default-kind": opts.ContainerRuntimeModesCdiDefaultKind,
"nvidia-container-runtime.runtimes": opts.ContainerRuntimeRuntimes,
"nvidia-container-cli.debug": opts.ContainerCLIDebug,
}
for key, value := range optionalConfigValues {
if !c.IsSet(key) {
t.logger.Infof("Skipping unset option: %v", key)
continue
}
if value == nil {
t.logger.Infof("Skipping option with nil value: %v", key)
continue
}
switch v := value.(type) {
case string:
if v == "" {
continue
}
case cli.StringSlice:
if len(v.Value()) == 0 {
continue
}
value = v.Value()
default:
t.logger.Warningf("Unexpected type for option %v=%v: %T", key, value, v)
}
cfg.Set(key, value)
}
if _, err := cfg.WriteTo(targetConfig); err != nil {
return fmt.Errorf("error writing config: %v", err)
}
os.Stdout.WriteString("Using config:\n")
if _, err = cfg.WriteTo(os.Stdout); err != nil {
t.logger.Warningf("Failed to output config to STDOUT: %v", err)
}
return nil
}
// installContainerToolkitCLI installs the nvidia-ctk CLI executable and wrapper.
func (t *Installer) installContainerToolkitCLI(toolkitDir string) (string, error) {
e := executable{
fileInstaller: t.fileInstaller,
source: "/usr/bin/nvidia-ctk",
target: executableTarget{
dotfileName: "nvidia-ctk.real",
wrapperName: "nvidia-ctk",
},
}
return e.install(toolkitDir)
}
// installContainerCDIHookCLI installs the nvidia-cdi-hook CLI executable and wrapper.
func (t *Installer) installContainerCDIHookCLI(toolkitDir string) (string, error) {
e := executable{
fileInstaller: t.fileInstaller,
source: "/usr/bin/nvidia-cdi-hook",
target: executableTarget{
dotfileName: "nvidia-cdi-hook.real",
wrapperName: "nvidia-cdi-hook",
},
}
return e.install(toolkitDir)
}
// installContainerCLI sets up the NVIDIA container CLI executable, copying the executable
// and implementing the required wrapper
func (t *Installer) installContainerCLI(toolkitRoot string) (string, error) {
t.logger.Infof("Installing NVIDIA container CLI from '%v'", nvidiaContainerCliSource)
env := map[string]string{
"LD_LIBRARY_PATH": toolkitRoot,
}
e := executable{
fileInstaller: t.fileInstaller,
source: nvidiaContainerCliSource,
target: executableTarget{
dotfileName: "nvidia-container-cli.real",
wrapperName: "nvidia-container-cli",
},
env: env,
}
installedPath, err := e.install(toolkitRoot)
if err != nil {
return "", fmt.Errorf("error installing NVIDIA container CLI: %v", err)
}
return installedPath, nil
}
// installRuntimeHook sets up the NVIDIA runtime hook, copying the executable
// and implementing the required wrapper
func (t *Installer) installRuntimeHook(toolkitRoot string, configFilePath string) (string, error) {
t.logger.Infof("Installing NVIDIA container runtime hook from '%v'", nvidiaContainerRuntimeHookSource)
argLines := []string{
fmt.Sprintf("-config \"%s\"", configFilePath),
}
e := executable{
fileInstaller: t.fileInstaller,
source: nvidiaContainerRuntimeHookSource,
target: executableTarget{
dotfileName: "nvidia-container-runtime-hook.real",
wrapperName: "nvidia-container-runtime-hook",
},
argLines: argLines,
}
installedPath, err := e.install(toolkitRoot)
if err != nil {
return "", fmt.Errorf("error installing NVIDIA container runtime hook: %v", err)
}
err = t.installSymlink(toolkitRoot, "nvidia-container-toolkit", installedPath)
if err != nil {
return "", fmt.Errorf("error installing symlink to NVIDIA container runtime hook: %v", err)
}
return installedPath, nil
}
// installSymlink creates a symlink in the toolkitDirectory that points to the specified target.
// Note: The target is assumed to be local to the toolkit directory
func (t *Installer) installSymlink(toolkitRoot string, link string, target string) error {
symlinkPath := filepath.Join(toolkitRoot, link)
targetPath := filepath.Base(target)
t.logger.Infof("Creating symlink '%v' -> '%v'", symlinkPath, targetPath)
err := os.Symlink(targetPath, symlinkPath)
if err != nil {
return fmt.Errorf("error creating symlink '%v' => '%v': %v", symlinkPath, targetPath, err)
}
return nil
}
// findLibrary searches a set of candidate libraries in the specified root for
// a given library name
func (t *Installer) findLibrary(libName string) (string, error) {
t.logger.Infof("Finding library %v (root=%v)", libName)
candidateDirs := []string{
"/usr/lib64",
"/usr/lib/x86_64-linux-gnu",
"/usr/lib/aarch64-linux-gnu",
}
for _, d := range candidateDirs {
l := filepath.Join(t.sourceRoot, d, libName)
t.logger.Infof("Checking library candidate '%v'", l)
libraryCandidate, err := t.resolveLink(l)
if err != nil {
t.logger.Infof("Skipping library candidate '%v': %v", l, err)
continue
}
return strings.TrimPrefix(libraryCandidate, t.sourceRoot), nil
}
return "", fmt.Errorf("error locating library '%v'", libName)
}
// resolveLink finds the target of a symlink or the file itself in the
// case of a regular file.
// This is equivalent to running `readlink -f ${l}`
func (t *Installer) resolveLink(l string) (string, error) {
resolved, err := filepath.EvalSymlinks(l)
if err != nil {
return "", fmt.Errorf("error resolving link '%v': %v", l, err)
}
if l != resolved {
t.logger.Infof("Resolved link: '%v' => '%v'", l, resolved)
}
return resolved, nil
}
func (t *Installer) createDirectories(dir ...string) error {
for _, d := range dir {
t.logger.Infof("Creating directory '%v'", d)
err := os.MkdirAll(d, 0755)
if err != nil {
return fmt.Errorf("error creating directory: %v", err)
}
}
return nil
}
func (t *Installer) createDeviceNodes(opts *Options) error {
modes := opts.createDeviceNodes.Value()
if len(modes) == 0 {
return nil
}
devices, err := nvdevices.New(
nvdevices.WithDevRoot(opts.DevRootCtrPath),
)
if err != nil {
return fmt.Errorf("failed to create library: %v", err)
}
for _, mode := range modes {
t.logger.Infof("Creating %v device nodes at %v", mode, opts.DevRootCtrPath)
if mode != "control" {
t.logger.Warningf("Unrecognised device mode: %v", mode)
continue
}
if err := devices.CreateNVIDIAControlDevices(); err != nil {
return fmt.Errorf("failed to create control device nodes: %v", err)
}
}
return nil
}
// generateCDISpec generates a CDI spec for use in management containers
func (t *Installer) generateCDISpec(opts *Options, nvidiaCDIHookPath string) error {
if !opts.CDI.Enabled {
return nil
}
t.logger.Info("Generating CDI spec for management containers")
cdilib, err := nvcdi.New(
nvcdi.WithLogger(t.logger),
nvcdi.WithMode(nvcdi.ModeManagement),
nvcdi.WithDriverRoot(opts.DriverRootCtrPath),
nvcdi.WithDevRoot(opts.DevRootCtrPath),
nvcdi.WithNVIDIACDIHookPath(nvidiaCDIHookPath),
nvcdi.WithVendor(opts.CDI.vendor),
nvcdi.WithClass(opts.CDI.class),
)
if err != nil {
return fmt.Errorf("failed to create CDI library for management containers: %v", err)
}
spec, err := cdilib.GetSpec()
if err != nil {
return fmt.Errorf("failed to genereate CDI spec for management containers: %v", err)
}
transformer := transformroot.NewDriverTransformer(
transformroot.WithDriverRoot(opts.DriverRootCtrPath),
transformroot.WithTargetDriverRoot(opts.DriverRoot),
transformroot.WithDevRoot(opts.DevRootCtrPath),
transformroot.WithTargetDevRoot(opts.DevRoot),
)
if err := transformer.Transform(spec.Raw()); err != nil {
return fmt.Errorf("failed to transform driver root in CDI spec: %v", err)
}
name, err := cdi.GenerateNameForSpec(spec.Raw())
if err != nil {
return fmt.Errorf("failed to generate CDI name for management containers: %v", err)
}
err = spec.Save(filepath.Join(opts.CDI.outputDir, name))
if err != nil {
return fmt.Errorf("failed to save CDI spec for management containers: %v", err)
}
return nil
}

View File

@@ -0,0 +1,217 @@
/**
# Copyright 2024 NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package toolkit
import (
"fmt"
"os"
"path/filepath"
"strings"
"testing"
testlog "github.com/sirupsen/logrus/hooks/test"
"github.com/stretchr/testify/require"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup/symlinks"
"github.com/NVIDIA/nvidia-container-toolkit/internal/test"
)
func TestInstall(t *testing.T) {
t.Setenv("__NVCT_TESTING_DEVICES_ARE_FILES", "true")
logger, _ := testlog.NewNullLogger()
moduleRoot, err := test.GetModuleRoot()
require.NoError(t, err)
artifactRoot := filepath.Join(moduleRoot, "testdata", "installer", "artifacts")
testCases := []struct {
description string
hostRoot string
packageType string
cdiEnabled bool
expectedError error
expectedCdiSpec string
}{
{
hostRoot: "rootfs-empty",
packageType: "deb",
},
{
hostRoot: "rootfs-empty",
packageType: "rpm",
},
{
hostRoot: "rootfs-empty",
packageType: "deb",
cdiEnabled: true,
expectedError: fmt.Errorf("no NVIDIA device nodes found"),
},
{
hostRoot: "rootfs-1",
packageType: "deb",
cdiEnabled: true,
expectedCdiSpec: `---
cdiVersion: 0.5.0
containerEdits:
env:
- NVIDIA_VISIBLE_DEVICES=void
hooks:
- args:
- nvidia-cdi-hook
- create-symlinks
- --link
- libcuda.so.1::/lib/x86_64-linux-gnu/libcuda.so
hookName: createContainer
path: {{ .toolkitRoot }}/nvidia-cdi-hook
- args:
- nvidia-cdi-hook
- update-ldcache
- --folder
- /lib/x86_64-linux-gnu
hookName: createContainer
path: {{ .toolkitRoot }}/nvidia-cdi-hook
mounts:
- containerPath: /lib/x86_64-linux-gnu/libcuda.so.999.88.77
hostPath: /host/driver/root/lib/x86_64-linux-gnu/libcuda.so.999.88.77
options:
- ro
- nosuid
- nodev
- bind
devices:
- containerEdits:
deviceNodes:
- hostPath: /host/driver/root/dev/nvidia0
path: /dev/nvidia0
- hostPath: /host/driver/root/dev/nvidiactl
path: /dev/nvidiactl
- hostPath: /host/driver/root/dev/nvidia-caps-imex-channels/channel0
path: /dev/nvidia-caps-imex-channels/channel0
- hostPath: /host/driver/root/dev/nvidia-caps-imex-channels/channel1
path: /dev/nvidia-caps-imex-channels/channel1
- hostPath: /host/driver/root/dev/nvidia-caps-imex-channels/channel2047
path: /dev/nvidia-caps-imex-channels/channel2047
name: all
kind: example.com/class
`,
},
}
for _, tc := range testCases {
// hostRoot := filepath.Join(moduleRoot, "testdata", "lookup", tc.hostRoot)
t.Run(tc.description, func(t *testing.T) {
testRoot := t.TempDir()
toolkitRoot := filepath.Join(testRoot, "toolkit-test")
cdiOutputDir := filepath.Join(moduleRoot, "toolkit-test", "/var/cdi")
sourceRoot := filepath.Join(artifactRoot, tc.packageType)
options := Options{
DriverRoot: "/host/driver/root",
DriverRootCtrPath: filepath.Join(moduleRoot, "testdata", "lookup", tc.hostRoot),
CDI: cdiOptions{
Enabled: tc.cdiEnabled,
outputDir: cdiOutputDir,
kind: "example.com/class",
},
}
ti := NewInstaller(
WithLogger(logger),
WithToolkitRoot(toolkitRoot),
WithSourceRoot(sourceRoot),
)
require.NoError(t, ti.ValidateOptions(&options))
err := ti.Install(&cli.Context{}, &options)
if tc.expectedError == nil {
require.NoError(t, err)
} else {
require.Contains(t, err.Error(), tc.expectedError.Error())
}
require.DirExists(t, toolkitRoot)
requireSymlink(t, toolkitRoot, "libnvidia-container.so.1", "libnvidia-container.so.99.88.77")
requireSymlink(t, toolkitRoot, "libnvidia-container-go.so.1", "libnvidia-container-go.so.99.88.77")
requireWrappedExecutable(t, toolkitRoot, "nvidia-cdi-hook")
requireWrappedExecutable(t, toolkitRoot, "nvidia-container-cli")
requireWrappedExecutable(t, toolkitRoot, "nvidia-container-runtime")
requireWrappedExecutable(t, toolkitRoot, "nvidia-container-runtime-hook")
requireWrappedExecutable(t, toolkitRoot, "nvidia-container-runtime.cdi")
requireWrappedExecutable(t, toolkitRoot, "nvidia-container-runtime.legacy")
requireWrappedExecutable(t, toolkitRoot, "nvidia-ctk")
requireSymlink(t, toolkitRoot, "nvidia-container-toolkit", "nvidia-container-runtime-hook")
// TODO: Add checks for wrapper contents
// grep -q -E "nvidia driver modules are not yet loaded, invoking runc directly" "${shared_dir}/usr/local/nvidia/toolkit/nvidia-container-runtime"
// grep -q -E "exec runc \".@\"" "${shared_dir}/usr/local/nvidia/toolkit/nvidia-container-runtime"
require.DirExists(t, filepath.Join(toolkitRoot, ".config"))
require.DirExists(t, filepath.Join(toolkitRoot, ".config", "nvidia-container-runtime"))
require.FileExists(t, filepath.Join(toolkitRoot, ".config", "nvidia-container-runtime", "config.toml"))
cfgToml, err := config.New(config.WithConfigFile(filepath.Join(toolkitRoot, ".config", "nvidia-container-runtime", "config.toml")))
require.NoError(t, err)
cfg, err := cfgToml.Config()
require.NoError(t, err)
// Ensure that the config file has the required contents.
// TODO: Add checks for additional config options.
require.Equal(t, "/host/driver/root", cfg.NVIDIAContainerCLIConfig.Root)
require.Equal(t, "@/host/driver/root/sbin/ldconfig", string(cfg.NVIDIAContainerCLIConfig.Ldconfig))
require.EqualValues(t, filepath.Join(toolkitRoot, "nvidia-container-cli"), cfg.NVIDIAContainerCLIConfig.Path)
require.EqualValues(t, filepath.Join(toolkitRoot, "nvidia-ctk"), cfg.NVIDIACTKConfig.Path)
if len(tc.expectedCdiSpec) > 0 {
cdiSpecFile := filepath.Join(cdiOutputDir, "example.com-class.yaml")
require.FileExists(t, cdiSpecFile)
info, err := os.Stat(cdiSpecFile)
require.NoError(t, err)
require.NotZero(t, info.Mode()&0004)
contents, err := os.ReadFile(cdiSpecFile)
require.NoError(t, err)
require.Equal(t, strings.ReplaceAll(tc.expectedCdiSpec, "{{ .toolkitRoot }}", toolkitRoot), string(contents))
}
})
}
}
func requireWrappedExecutable(t *testing.T, toolkitRoot string, expectedExecutable string) {
requireExecutable(t, toolkitRoot, expectedExecutable)
requireExecutable(t, toolkitRoot, expectedExecutable+".real")
}
func requireExecutable(t *testing.T, toolkitRoot string, expectedExecutable string) {
executable := filepath.Join(toolkitRoot, expectedExecutable)
require.FileExists(t, executable)
info, err := os.Lstat(executable)
require.NoError(t, err)
require.Zero(t, info.Mode()&os.ModeSymlink)
require.NotZero(t, info.Mode()&0111)
}
func requireSymlink(t *testing.T, toolkitRoot string, expectedLink string, expectedTarget string) {
link := filepath.Join(toolkitRoot, expectedLink)
require.FileExists(t, link)
target, err := symlinks.Resolve(link)
require.NoError(t, err)
require.Equal(t, expectedTarget, target)
}

View File

@@ -0,0 +1,327 @@
package main
import (
"fmt"
"os"
"os/signal"
"path/filepath"
"strings"
"syscall"
"github.com/urfave/cli/v2"
"golang.org/x/sys/unix"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/container/runtime"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk-installer/container/toolkit"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
)
const (
toolkitPidFilename = "toolkit.pid"
defaultPidFile = "/run/nvidia/toolkit/" + toolkitPidFilename
toolkitSubDir = "toolkit"
defaultRuntime = "docker"
defaultRuntimeArgs = ""
)
var availableRuntimes = map[string]struct{}{"docker": {}, "crio": {}, "containerd": {}}
var defaultLowLevelRuntimes = []string{"docker-runc", "runc", "crun"}
var waitingForSignal = make(chan bool, 1)
var signalReceived = make(chan bool, 1)
// options stores the command line arguments
type options struct {
noDaemon bool
runtime string
runtimeArgs string
root string
pidFile string
sourceRoot string
toolkitOptions toolkit.Options
runtimeOptions runtime.Options
}
func (o options) toolkitRoot() string {
return filepath.Join(o.root, toolkitSubDir)
}
// Version defines the CLI version. This is set at build time using LD FLAGS
var Version = "development"
func main() {
logger := logger.New()
remainingArgs, root, err := ParseArgs(logger, os.Args)
if err != nil {
logger.Errorf("Error: unable to parse arguments: %v", err)
os.Exit(1)
}
c := NewApp(logger, root)
// Run the CLI
logger.Infof("Starting %v", c.Name)
if err := c.Run(remainingArgs); err != nil {
logger.Errorf("error running nvidia-toolkit: %v", err)
os.Exit(1)
}
logger.Infof("Completed %v", c.Name)
}
// An app represents the nvidia-ctk-installer.
type app struct {
logger logger.Interface
// defaultRoot stores the root to use if the --root flag is not specified.
defaultRoot string
toolkit *toolkit.Installer
}
// NewApp creates the CLI app fro the specified options.
// defaultRoot is used as the root if not specified via the --root flag.
func NewApp(logger logger.Interface, defaultRoot string) *cli.App {
a := app{
logger: logger,
defaultRoot: defaultRoot,
}
return a.build()
}
func (a app) build() *cli.App {
options := options{
toolkitOptions: toolkit.Options{},
}
// Create the top-level CLI
c := cli.NewApp()
c.Name = "nvidia-toolkit"
c.Usage = "Install the nvidia-container-toolkit for use by a given runtime"
c.UsageText = "[DESTINATION] [-n | --no-daemon] [-r | --runtime] [-u | --runtime-args]"
c.Description = "DESTINATION points to the host path underneath which the nvidia-container-toolkit should be installed.\nIt will be installed at ${DESTINATION}/toolkit"
c.Version = Version
c.Before = func(ctx *cli.Context) error {
return a.Before(ctx, &options)
}
c.Action = func(ctx *cli.Context) error {
return a.Run(ctx, &options)
}
// Setup flags for the CLI
c.Flags = []cli.Flag{
&cli.BoolFlag{
Name: "no-daemon",
Aliases: []string{"n"},
Usage: "terminate immediately after setting up the runtime. Note that no cleanup will be performed",
Destination: &options.noDaemon,
EnvVars: []string{"NO_DAEMON"},
},
&cli.StringFlag{
Name: "runtime",
Aliases: []string{"r"},
Usage: "the runtime to setup on this node. One of {'docker', 'crio', 'containerd'}",
Value: defaultRuntime,
Destination: &options.runtime,
EnvVars: []string{"RUNTIME"},
},
// TODO: Remove runtime-args
&cli.StringFlag{
Name: "runtime-args",
Aliases: []string{"u"},
Usage: "arguments to pass to 'docker', 'crio', or 'containerd' setup command",
Value: defaultRuntimeArgs,
Destination: &options.runtimeArgs,
EnvVars: []string{"RUNTIME_ARGS"},
},
&cli.StringFlag{
Name: "root",
Value: a.defaultRoot,
Usage: "the folder where the NVIDIA Container Toolkit is to be installed. It will be installed to `ROOT`/toolkit",
Destination: &options.root,
EnvVars: []string{"ROOT"},
},
&cli.StringFlag{
Name: "source-root",
Value: "/",
Usage: "The folder where the required toolkit artifacts can be found",
Destination: &options.sourceRoot,
EnvVars: []string{"SOURCE_ROOT"},
},
&cli.StringFlag{
Name: "pid-file",
Value: defaultPidFile,
Usage: "the path to a toolkit.pid file to ensure that only a single configuration instance is running",
Destination: &options.pidFile,
EnvVars: []string{"TOOLKIT_PID_FILE", "PID_FILE"},
},
}
c.Flags = append(c.Flags, toolkit.Flags(&options.toolkitOptions)...)
c.Flags = append(c.Flags, runtime.Flags(&options.runtimeOptions)...)
return c
}
func (a *app) Before(c *cli.Context, o *options) error {
a.toolkit = toolkit.NewInstaller(
toolkit.WithLogger(a.logger),
toolkit.WithSourceRoot(o.sourceRoot),
toolkit.WithToolkitRoot(o.toolkitRoot()),
)
return a.validateFlags(c, o)
}
func (a *app) validateFlags(c *cli.Context, o *options) error {
if o.root == "" {
return fmt.Errorf("the install root must be specified")
}
if _, exists := availableRuntimes[o.runtime]; !exists {
return fmt.Errorf("unknown runtime: %v", o.runtime)
}
if filepath.Base(o.pidFile) != toolkitPidFilename {
return fmt.Errorf("invalid toolkit.pid path %v", o.pidFile)
}
if err := a.toolkit.ValidateOptions(&o.toolkitOptions); err != nil {
return err
}
if err := runtime.ValidateOptions(c, &o.runtimeOptions, o.runtime, o.toolkitRoot(), &o.toolkitOptions); err != nil {
return err
}
return nil
}
// Run installs the NVIDIA Container Toolkit and updates the requested runtime.
// If the application is run as a daemon, the application waits and unconfigures
// the runtime on termination.
func (a *app) Run(c *cli.Context, o *options) error {
err := a.initialize(o.pidFile)
if err != nil {
return fmt.Errorf("unable to initialize: %v", err)
}
defer a.shutdown(o.pidFile)
if len(o.toolkitOptions.ContainerRuntimeRuntimes.Value()) == 0 {
lowlevelRuntimePaths, err := runtime.GetLowlevelRuntimePaths(&o.runtimeOptions, o.runtime)
if err != nil {
return fmt.Errorf("unable to determine runtime options: %w", err)
}
lowlevelRuntimePaths = append(lowlevelRuntimePaths, defaultLowLevelRuntimes...)
o.toolkitOptions.ContainerRuntimeRuntimes = *cli.NewStringSlice(lowlevelRuntimePaths...)
}
err = a.toolkit.Install(c, &o.toolkitOptions)
if err != nil {
return fmt.Errorf("unable to install toolkit: %v", err)
}
err = runtime.Setup(c, &o.runtimeOptions, o.runtime)
if err != nil {
return fmt.Errorf("unable to setup runtime: %v", err)
}
if !o.noDaemon {
err = a.waitForSignal()
if err != nil {
return fmt.Errorf("unable to wait for signal: %v", err)
}
err = runtime.Cleanup(c, &o.runtimeOptions, o.runtime)
if err != nil {
return fmt.Errorf("unable to cleanup runtime: %v", err)
}
}
return nil
}
// ParseArgs checks if a single positional argument was defined and extracts this the root.
// If no positional arguments are defined, it is assumed that the root is specified as a flag.
func ParseArgs(logger logger.Interface, args []string) ([]string, string, error) {
logger.Infof("Parsing arguments")
if len(args) < 2 {
return args, "", nil
}
var lastPositionalArg int
for i, arg := range args {
if strings.HasPrefix(arg, "-") {
break
}
lastPositionalArg = i
}
if lastPositionalArg == 0 {
return args, "", nil
}
if lastPositionalArg == 1 {
return append([]string{args[0]}, args[2:]...), args[1], nil
}
return nil, "", fmt.Errorf("unexpected positional argument(s) %v", args[2:lastPositionalArg+1])
}
func (a *app) initialize(pidFile string) error {
a.logger.Infof("Initializing")
if dir := filepath.Dir(pidFile); dir != "" {
err := os.MkdirAll(dir, 0755)
if err != nil {
return fmt.Errorf("unable to create folder for pidfile: %w", err)
}
}
f, err := os.Create(pidFile)
if err != nil {
return fmt.Errorf("unable to create pidfile: %v", err)
}
err = unix.Flock(int(f.Fd()), unix.LOCK_EX|unix.LOCK_NB)
if err != nil {
a.logger.Warningf("Unable to get exclusive lock on '%v'", pidFile)
a.logger.Warningf("This normally means an instance of the NVIDIA toolkit Container is already running, aborting")
return fmt.Errorf("unable to get flock on pidfile: %v", err)
}
_, err = f.WriteString(fmt.Sprintf("%v\n", os.Getpid()))
if err != nil {
return fmt.Errorf("unable to write PID to pidfile: %v", err)
}
sigs := make(chan os.Signal, 1)
signal.Notify(sigs, syscall.SIGHUP, syscall.SIGINT, syscall.SIGQUIT, syscall.SIGPIPE, syscall.SIGTERM)
go func() {
<-sigs
select {
case <-waitingForSignal:
signalReceived <- true
default:
a.logger.Infof("Signal received, exiting early")
a.shutdown(pidFile)
os.Exit(0)
}
}()
return nil
}
func (a *app) waitForSignal() error {
a.logger.Infof("Waiting for signal")
waitingForSignal <- true
<-signalReceived
return nil
}
func (a *app) shutdown(pidFile string) {
a.logger.Infof("Shutting Down")
err := os.Remove(pidFile)
if err != nil {
a.logger.Warningf("Unable to remove pidfile: %v", err)
}
}

View File

@@ -0,0 +1,501 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package main
import (
"fmt"
"os"
"path/filepath"
"strings"
"testing"
testlog "github.com/sirupsen/logrus/hooks/test"
"github.com/stretchr/testify/require"
"github.com/NVIDIA/nvidia-container-toolkit/internal/test"
)
func TestParseArgs(t *testing.T) {
logger, _ := testlog.NewNullLogger()
testCases := []struct {
args []string
expectedRemaining []string
expectedRoot string
expectedError error
}{
{
args: []string{},
expectedRemaining: []string{},
expectedRoot: "",
expectedError: nil,
},
{
args: []string{"app"},
expectedRemaining: []string{"app"},
},
{
args: []string{"app", "root"},
expectedRemaining: []string{"app"},
expectedRoot: "root",
},
{
args: []string{"app", "--flag"},
expectedRemaining: []string{"app", "--flag"},
},
{
args: []string{"app", "root", "--flag"},
expectedRemaining: []string{"app", "--flag"},
expectedRoot: "root",
},
{
args: []string{"app", "root", "not-root", "--flag"},
expectedError: fmt.Errorf("unexpected positional argument(s) [not-root]"),
},
{
args: []string{"app", "root", "not-root"},
expectedError: fmt.Errorf("unexpected positional argument(s) [not-root]"),
},
{
args: []string{"app", "root", "not-root", "also"},
expectedError: fmt.Errorf("unexpected positional argument(s) [not-root also]"),
},
}
for i, tc := range testCases {
t.Run(fmt.Sprintf("%d", i), func(t *testing.T) {
remaining, root, err := ParseArgs(logger, tc.args)
if tc.expectedError != nil {
require.EqualError(t, err, tc.expectedError.Error())
} else {
require.NoError(t, err)
}
require.ElementsMatch(t, tc.expectedRemaining, remaining)
require.Equal(t, tc.expectedRoot, root)
})
}
}
func TestApp(t *testing.T) {
t.Setenv("__NVCT_TESTING_DEVICES_ARE_FILES", "true")
logger, _ := testlog.NewNullLogger()
moduleRoot, err := test.GetModuleRoot()
require.NoError(t, err)
artifactRoot := filepath.Join(moduleRoot, "testdata", "installer", "artifacts")
hostRoot := filepath.Join(moduleRoot, "testdata", "lookup", "rootfs-1")
testCases := []struct {
description string
args []string
expectedToolkitConfig string
expectedRuntimeConfig string
}{
{
description: "no args",
expectedToolkitConfig: `accept-nvidia-visible-devices-as-volume-mounts = false
accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false
supported-driver-capabilities = "compat32,compute,display,graphics,ngx,utility,video"
swarm-resource = ""
[nvidia-container-cli]
debug = ""
environment = []
ldcache = ""
ldconfig = "@/run/nvidia/driver/sbin/ldconfig"
load-kmods = true
no-cgroups = false
path = "{{ .toolkitRoot }}/toolkit/nvidia-container-cli"
root = "/run/nvidia/driver"
user = ""
[nvidia-container-runtime]
debug = "/dev/null"
log-level = "info"
mode = "auto"
runtimes = ["docker-runc", "runc", "crun"]
[nvidia-container-runtime.modes]
[nvidia-container-runtime.modes.cdi]
annotation-prefixes = ["cdi.k8s.io/"]
default-kind = "nvidia.com/gpu"
spec-dirs = ["/etc/cdi", "/var/run/cdi"]
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"
[nvidia-container-runtime-hook]
path = "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime-hook"
skip-mode-detection = true
[nvidia-ctk]
path = "{{ .toolkitRoot }}/toolkit/nvidia-ctk"
`,
expectedRuntimeConfig: `{
"default-runtime": "nvidia",
"runtimes": {
"nvidia": {
"args": [],
"path": "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime"
},
"nvidia-cdi": {
"args": [],
"path": "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime.cdi"
},
"nvidia-legacy": {
"args": [],
"path": "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime.legacy"
}
}
}`,
},
{
description: "CDI enabled enables CDI in docker",
args: []string{"--cdi-enabled", "--create-device-nodes=none"},
expectedToolkitConfig: `accept-nvidia-visible-devices-as-volume-mounts = false
accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false
supported-driver-capabilities = "compat32,compute,display,graphics,ngx,utility,video"
swarm-resource = ""
[nvidia-container-cli]
debug = ""
environment = []
ldcache = ""
ldconfig = "@/run/nvidia/driver/sbin/ldconfig"
load-kmods = true
no-cgroups = false
path = "{{ .toolkitRoot }}/toolkit/nvidia-container-cli"
root = "/run/nvidia/driver"
user = ""
[nvidia-container-runtime]
debug = "/dev/null"
log-level = "info"
mode = "auto"
runtimes = ["docker-runc", "runc", "crun"]
[nvidia-container-runtime.modes]
[nvidia-container-runtime.modes.cdi]
annotation-prefixes = ["cdi.k8s.io/"]
default-kind = "nvidia.com/gpu"
spec-dirs = ["/etc/cdi", "/var/run/cdi"]
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"
[nvidia-container-runtime-hook]
path = "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime-hook"
skip-mode-detection = true
[nvidia-ctk]
path = "{{ .toolkitRoot }}/toolkit/nvidia-ctk"
`,
expectedRuntimeConfig: `{
"default-runtime": "nvidia",
"features": {
"cdi": true
},
"runtimes": {
"nvidia": {
"args": [],
"path": "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime"
},
"nvidia-cdi": {
"args": [],
"path": "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime.cdi"
},
"nvidia-legacy": {
"args": [],
"path": "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime.legacy"
}
}
}`,
},
{
description: "--enable-cdi-in-runtime=false overrides --cdi-enabled in Docker",
args: []string{"--cdi-enabled", "--create-device-nodes=none", "--enable-cdi-in-runtime=false"},
expectedToolkitConfig: `accept-nvidia-visible-devices-as-volume-mounts = false
accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false
supported-driver-capabilities = "compat32,compute,display,graphics,ngx,utility,video"
swarm-resource = ""
[nvidia-container-cli]
debug = ""
environment = []
ldcache = ""
ldconfig = "@/run/nvidia/driver/sbin/ldconfig"
load-kmods = true
no-cgroups = false
path = "{{ .toolkitRoot }}/toolkit/nvidia-container-cli"
root = "/run/nvidia/driver"
user = ""
[nvidia-container-runtime]
debug = "/dev/null"
log-level = "info"
mode = "auto"
runtimes = ["docker-runc", "runc", "crun"]
[nvidia-container-runtime.modes]
[nvidia-container-runtime.modes.cdi]
annotation-prefixes = ["cdi.k8s.io/"]
default-kind = "nvidia.com/gpu"
spec-dirs = ["/etc/cdi", "/var/run/cdi"]
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"
[nvidia-container-runtime-hook]
path = "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime-hook"
skip-mode-detection = true
[nvidia-ctk]
path = "{{ .toolkitRoot }}/toolkit/nvidia-ctk"
`,
expectedRuntimeConfig: `{
"default-runtime": "nvidia",
"runtimes": {
"nvidia": {
"args": [],
"path": "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime"
},
"nvidia-cdi": {
"args": [],
"path": "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime.cdi"
},
"nvidia-legacy": {
"args": [],
"path": "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime.legacy"
}
}
}`,
},
{
description: "CDI enabled enables CDI in containerd",
args: []string{"--cdi-enabled", "--runtime=containerd"},
expectedToolkitConfig: `accept-nvidia-visible-devices-as-volume-mounts = false
accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false
supported-driver-capabilities = "compat32,compute,display,graphics,ngx,utility,video"
swarm-resource = ""
[nvidia-container-cli]
debug = ""
environment = []
ldcache = ""
ldconfig = "@/run/nvidia/driver/sbin/ldconfig"
load-kmods = true
no-cgroups = false
path = "{{ .toolkitRoot }}/toolkit/nvidia-container-cli"
root = "/run/nvidia/driver"
user = ""
[nvidia-container-runtime]
debug = "/dev/null"
log-level = "info"
mode = "auto"
runtimes = ["docker-runc", "runc", "crun"]
[nvidia-container-runtime.modes]
[nvidia-container-runtime.modes.cdi]
annotation-prefixes = ["cdi.k8s.io/"]
default-kind = "nvidia.com/gpu"
spec-dirs = ["/etc/cdi", "/var/run/cdi"]
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"
[nvidia-container-runtime-hook]
path = "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime-hook"
skip-mode-detection = true
[nvidia-ctk]
path = "{{ .toolkitRoot }}/toolkit/nvidia-ctk"
`,
expectedRuntimeConfig: `version = 2
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
enable_cdi = true
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "nvidia"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
privileged_without_host_devices = false
runtime_engine = ""
runtime_root = ""
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
BinaryName = "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-cdi]
privileged_without_host_devices = false
runtime_engine = ""
runtime_root = ""
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-cdi.options]
BinaryName = "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime.cdi"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-legacy]
privileged_without_host_devices = false
runtime_engine = ""
runtime_root = ""
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-legacy.options]
BinaryName = "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime.legacy"
`,
},
{
description: "--enable-cdi-in-runtime=false overrides --cdi-enabled in containerd",
args: []string{"--cdi-enabled", "--create-device-nodes=none", "--enable-cdi-in-runtime=false", "--runtime=containerd"},
expectedToolkitConfig: `accept-nvidia-visible-devices-as-volume-mounts = false
accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false
supported-driver-capabilities = "compat32,compute,display,graphics,ngx,utility,video"
swarm-resource = ""
[nvidia-container-cli]
debug = ""
environment = []
ldcache = ""
ldconfig = "@/run/nvidia/driver/sbin/ldconfig"
load-kmods = true
no-cgroups = false
path = "{{ .toolkitRoot }}/toolkit/nvidia-container-cli"
root = "/run/nvidia/driver"
user = ""
[nvidia-container-runtime]
debug = "/dev/null"
log-level = "info"
mode = "auto"
runtimes = ["docker-runc", "runc", "crun"]
[nvidia-container-runtime.modes]
[nvidia-container-runtime.modes.cdi]
annotation-prefixes = ["cdi.k8s.io/"]
default-kind = "nvidia.com/gpu"
spec-dirs = ["/etc/cdi", "/var/run/cdi"]
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"
[nvidia-container-runtime-hook]
path = "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime-hook"
skip-mode-detection = true
[nvidia-ctk]
path = "{{ .toolkitRoot }}/toolkit/nvidia-ctk"
`,
expectedRuntimeConfig: `version = 2
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "nvidia"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
privileged_without_host_devices = false
runtime_engine = ""
runtime_root = ""
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
BinaryName = "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-cdi]
privileged_without_host_devices = false
runtime_engine = ""
runtime_root = ""
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-cdi.options]
BinaryName = "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime.cdi"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-legacy]
privileged_without_host_devices = false
runtime_engine = ""
runtime_root = ""
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-legacy.options]
BinaryName = "{{ .toolkitRoot }}/toolkit/nvidia-container-runtime.legacy"
`,
},
}
for _, tc := range testCases {
t.Run(tc.description, func(t *testing.T) {
testRoot := t.TempDir()
cdiOutputDir := filepath.Join(testRoot, "/var/run/cdi")
runtimeConfigFile := filepath.Join(testRoot, "config.file")
toolkitRoot := filepath.Join(testRoot, "toolkit-test")
toolkitConfigFile := filepath.Join(toolkitRoot, "toolkit/.config/nvidia-container-runtime/config.toml")
app := NewApp(logger, toolkitRoot)
testArgs := []string{
"nvidia-ctk-installer",
"--no-daemon",
"--cdi-output-dir=" + cdiOutputDir,
"--config=" + runtimeConfigFile,
"--create-device-nodes=none",
"--driver-root-ctr-path=" + hostRoot,
"--pid-file=" + filepath.Join(testRoot, "toolkit.pid"),
"--restart-mode=none",
"--source-root=" + filepath.Join(artifactRoot, "deb"),
}
err := app.Run(append(testArgs, tc.args...))
require.NoError(t, err)
require.FileExists(t, toolkitConfigFile)
toolkitConfigFileContents, err := os.ReadFile(toolkitConfigFile)
require.NoError(t, err)
require.EqualValues(t, strings.ReplaceAll(tc.expectedToolkitConfig, "{{ .toolkitRoot }}", toolkitRoot), string(toolkitConfigFileContents))
require.FileExists(t, runtimeConfigFile)
runtimeConfigFileContents, err := os.ReadFile(runtimeConfigFile)
require.NoError(t, err)
require.EqualValues(t, strings.ReplaceAll(tc.expectedRuntimeConfig, "{{ .toolkitRoot }}", toolkitRoot), string(runtimeConfigFileContents))
})
}
}

View File

@@ -16,9 +16,34 @@ nvidia-ctk runtime configure --set-as-default
will ensure that the NVIDIA Container Runtime is added as the default runtime to the default container
engine.
## Configure the NVIDIA Container Toolkit
The `config` command of the `nvidia-ctk` CLI allows a user to display and manipulate the NVIDIA Container Toolkit
configuration.
For example, running the following command:
```bash
nvidia-ctk config default
```
will display the default config for the detected platform.
Whereas
```bash
nvidia-ctk config
```
will display the effective NVIDIA Container Toolkit config using the configured config file, and running:
Individual config options can be set by specifying these are key-value pairs to the `--set` argument:
```bash
nvidia-ctk config --set nvidia-container-cli.no-cgroups=true
```
By default, all commands output to `STDOUT`, but specifying the `--output` flag writes the config to the specified file.
### Generate CDI specifications
The [Container Device Interface (CDI)](https://github.com/container-orchestrated-devices/container-device-interface) provides
The [Container Device Interface (CDI)](https://tags.cncf.io/container-device-interface) provides
a vendor-agnostic mechanism to make arbitrary devices accessible in containerized environments. To allow NVIDIA devices to be
used in these environments, the NVIDIA Container Toolkit CLI includes functionality to generate a CDI specification for the
available NVIDIA GPUs in a system.

View File

@@ -17,17 +17,20 @@
package cdi
import (
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/cdi/generate"
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/cdi/generate"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/cdi/list"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/cdi/transform"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
)
type command struct {
logger *logrus.Logger
logger logger.Interface
}
// NewCommand constructs an info command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
}
@@ -44,6 +47,8 @@ func (m command) build() *cli.Command {
hook.Subcommands = []*cli.Command{
generate.NewCommand(m.logger),
transform.NewCommand(m.logger),
list.NewCommand(m.logger),
}
return &hook

View File

@@ -1,57 +0,0 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package generate
import (
"path/filepath"
"github.com/NVIDIA/nvidia-container-toolkit/internal/discover"
"github.com/sirupsen/logrus"
)
// deviceDiscoverer defines a discoverer for device nodes
type deviceDiscoverer struct {
logger *logrus.Logger
root string
deviceNodePaths []string
}
var _ discover.Discover = (*deviceDiscoverer)(nil)
// Devices returns the device nodes for the full GPU.
func (d *deviceDiscoverer) Devices() ([]discover.Device, error) {
var deviceNodes []discover.Device
for _, dn := range d.deviceNodePaths {
deviceNode := discover.Device{
HostPath: filepath.Join(d.root, dn),
Path: dn,
}
deviceNodes = append(deviceNodes, deviceNode)
}
return deviceNodes, nil
}
// Hooks returns no hooks for a device discoverer
func (d *deviceDiscoverer) Hooks() ([]discover.Hook, error) {
return nil, nil
}
// Mounts returns no mounts for a device discoverer
func (d *deviceDiscoverer) Mounts() ([]discover.Mount, error) {
return nil, nil
}

View File

@@ -1,171 +0,0 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package generate
import (
"fmt"
"path/filepath"
"strings"
"github.com/NVIDIA/nvidia-container-toolkit/internal/discover"
"github.com/NVIDIA/nvidia-container-toolkit/internal/ldcache"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup"
"github.com/sirupsen/logrus"
"gitlab.com/nvidia/cloud-native/go-nvlib/pkg/nvml"
)
type driverLibraries struct {
logger *logrus.Logger
root string
libraries []string
}
var _ discover.Discover = (*driverLibraries)(nil)
// NewDriverDiscoverer creates a discoverer for the libraries and binaries associated with a driver installation.
// The supplied NVML Library is used to query the expected driver version.
func NewDriverDiscoverer(logger *logrus.Logger, root string, nvmllib nvml.Interface) (discover.Discover, error) {
version, r := nvmllib.SystemGetDriverVersion()
if r != nvml.SUCCESS {
return nil, fmt.Errorf("failed to determine driver version: %v", r)
}
libraries, err := NewDriverLibraryDiscoverer(logger, root, version)
if err != nil {
return nil, fmt.Errorf("failed to create discoverer for driver libraries: %v", err)
}
firmwares := NewDriverFirmwareDiscoverer(logger, root, version)
binaries := NewDriverBinariesDiscoverer(logger, root)
d := discover.Merge(
libraries,
firmwares,
binaries,
)
return d, nil
}
// NewDriverLibraryDiscoverer creates a discoverer for the libraries associated with the specified driver version.
func NewDriverLibraryDiscoverer(logger *logrus.Logger, root string, version string) (discover.Discover, error) {
libraries, err := findVersionLibs(logger, root, version)
if err != nil {
return nil, fmt.Errorf("failed to get libraries for driver version: %v", err)
}
d := driverLibraries{
logger: logger,
root: root,
libraries: libraries,
}
return &d, nil
}
// NewDriverFirmwareDiscoverer creates a discoverer for GSP firmware associated with the specified driver version.
func NewDriverFirmwareDiscoverer(logger *logrus.Logger, root string, version string) discover.Discover {
gspFirmwarePath := filepath.Join("/lib/firmware/nvidia", version, "gsp.bin")
return discover.NewMounts(
logger,
lookup.NewFileLocator(
lookup.WithLogger(logger),
lookup.WithRoot(root),
),
root,
[]string{gspFirmwarePath},
)
}
// NewDriverBinariesDiscoverer creates a discoverer for GSP firmware associated with the GPU driver.
func NewDriverBinariesDiscoverer(logger *logrus.Logger, root string) discover.Discover {
return discover.NewMounts(
logger,
lookup.NewExecutableLocator(logger, root),
root,
[]string{
"nvidia-smi", /* System management interface */
"nvidia-debugdump", /* GPU coredump utility */
"nvidia-persistenced", /* Persistence mode utility */
"nvidia-cuda-mps-control", /* Multi process service CLI */
"nvidia-cuda-mps-server", /* Multi process service server */
},
)
}
// Devices are empty for this discoverer
func (d *driverLibraries) Devices() ([]discover.Device, error) {
return nil, nil
}
// Mounts returns the mounts for the driver libraries
func (d *driverLibraries) Mounts() ([]discover.Mount, error) {
var mounts []discover.Mount
for _, d := range d.libraries {
mount := discover.Mount{
HostPath: d,
Path: d,
}
mounts = append(mounts, mount)
}
return mounts, nil
}
// Hooks returns a hook that updates the LDCache for the specified driver library paths.
func (d *driverLibraries) Hooks() ([]discover.Hook, error) {
locator := lookup.NewExecutableLocator(d.logger, d.root)
hook := discover.CreateLDCacheUpdateHook(
d.logger,
locator,
nvidiaCTKExecutable,
nvidiaCTKDefaultFilePath,
d.libraries,
)
return []discover.Hook{hook}, nil
}
func findVersionLibs(logger *logrus.Logger, root string, version string) ([]string, error) {
logger.Infof("Using driver version %v", version)
cache, err := ldcache.New(logger, root)
if err != nil {
return nil, fmt.Errorf("failed to load ldcache: %v", err)
}
libs32, libs64 := cache.List()
var libs []string
for _, l := range libs64 {
if strings.HasSuffix(l, version) {
logger.Infof("found 64-bit driver lib: %v", l)
libs = append(libs, l)
}
}
for _, l := range libs32 {
if strings.HasSuffix(l, version) {
logger.Infof("found 32-bit driver lib: %v", l)
libs = append(libs, l)
}
}
return libs, nil
}

View File

@@ -1,148 +0,0 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package generate
import (
"fmt"
"os"
"path/filepath"
"strings"
"github.com/NVIDIA/nvidia-container-toolkit/internal/discover"
"github.com/NVIDIA/nvidia-container-toolkit/internal/info/drm"
"github.com/sirupsen/logrus"
"gitlab.com/nvidia/cloud-native/go-nvlib/pkg/nvlib/device"
"gitlab.com/nvidia/cloud-native/go-nvlib/pkg/nvml"
)
// fullGPUDiscoverer wraps a deviceDiscoverer and adds specifics required for discovering full GPUs
type fullGPUDiscoverer struct {
deviceDiscoverer
pciBusID string
}
var _ discover.Discover = (*fullGPUDiscoverer)(nil)
// NewFullGPUDiscoverer creates a discoverer for the full GPU defined by the specified device.
func NewFullGPUDiscoverer(logger *logrus.Logger, root string, d device.Device) (discover.Discover, error) {
// TODO: The functionality to get device paths should be integrated into the go-nvlib/pkg/device.Device interface.
// This will allow reuse here and in other code where the paths are queried such as the NVIDIA device plugin.
minor, ret := d.GetMinorNumber()
if ret != nvml.SUCCESS {
return nil, fmt.Errorf("error getting GPU device minor number: %v", ret)
}
path := fmt.Sprintf("/dev/nvidia%d", minor)
pciInfo, ret := d.GetPciInfo()
if ret != nvml.SUCCESS {
return nil, fmt.Errorf("error getting PCI info for device: %v", ret)
}
pciBusID := getBusID(pciInfo)
drmDeviceNodes, err := drm.GetDeviceNodesByBusID(pciBusID)
if err != nil {
return nil, fmt.Errorf("failed to determine DRM devices for %v: %v", pciBusID, err)
}
deviceNodePaths := append([]string{path}, drmDeviceNodes...)
device := fullGPUDiscoverer{
deviceDiscoverer: deviceDiscoverer{
logger: logger,
root: root,
deviceNodePaths: deviceNodePaths,
},
pciBusID: pciBusID,
}
return &device, nil
}
// Hooks returns the hooks for the GPU device.
// The following hooks are detected:
// 1. A hook to create /dev/dri/by-path symlinks
func (d *fullGPUDiscoverer) Hooks() ([]discover.Hook, error) {
links, err := d.deviceNodeLinks()
if err != nil {
return nil, fmt.Errorf("failed to discover DRA device links: %v", err)
}
if len(links) == 0 {
return nil, nil
}
hookPath := "nvidia-ctk"
args := []string{hookPath, "hook", "create-symlinks"}
for _, l := range links {
args = append(args, "--link", l)
}
var hooks []discover.Hook
hook := discover.Hook{
Lifecycle: "createContainer",
Path: hookPath,
Args: args,
}
hooks = append(hooks, hook)
return hooks, nil
}
// Mounts returns an empty slice for a full GPU
func (d *fullGPUDiscoverer) Mounts() ([]discover.Mount, error) {
return nil, nil
}
func (d *fullGPUDiscoverer) deviceNodeLinks() ([]string, error) {
candidates := []string{
fmt.Sprintf("/dev/dri/by-path/pci-%s-card", d.pciBusID),
fmt.Sprintf("/dev/dri/by-path/pci-%s-render", d.pciBusID),
}
var links []string
for _, c := range candidates {
linkPath := filepath.Join(d.root, c)
device, err := os.Readlink(linkPath)
if err != nil {
d.logger.Warningf("Failed to evaluate symlink %v; ignoring", linkPath)
continue
}
d.logger.Debugf("adding device symlink %v -> %v", linkPath, device)
links = append(links, fmt.Sprintf("%v::%v", device, linkPath))
}
return links, nil
}
// getBusID provides a utility function that returns the string representation of the bus ID.
func getBusID(p nvml.PciInfo) string {
var bytes []byte
for _, b := range p.BusId {
if byte(b) == '\x00' {
break
}
bytes = append(bytes, byte(b))
}
id := strings.ToLower(string(bytes))
if id != "0000" {
id = strings.TrimPrefix(id, "0000")
}
return id
}

View File

@@ -18,42 +18,52 @@ package generate
import (
"fmt"
"io"
"os"
"path/filepath"
"strings"
"github.com/NVIDIA/nvidia-container-toolkit/internal/discover"
"github.com/NVIDIA/nvidia-container-toolkit/internal/edits"
"github.com/container-orchestrated-devices/container-device-interface/pkg/cdi"
specs "github.com/container-orchestrated-devices/container-device-interface/specs-go"
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
"gitlab.com/nvidia/cloud-native/go-nvlib/pkg/nvlib/device"
"gitlab.com/nvidia/cloud-native/go-nvlib/pkg/nvml"
"sigs.k8s.io/yaml"
cdi "tags.cncf.io/container-device-interface/pkg/parser"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/internal/platform-support/tegra/csv"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/nvcdi"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/nvcdi/spec"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/nvcdi/transform"
)
const (
nvidiaCTKExecutable = "nvidia-ctk"
nvidiaCTKDefaultFilePath = "/usr/bin/" + nvidiaCTKExecutable
formatJSON = "json"
formatYAML = "yaml"
allDeviceName = "all"
)
type command struct {
logger *logrus.Logger
logger logger.Interface
}
type config struct {
output string
format string
root string
type options struct {
output string
format string
deviceNameStrategies cli.StringSlice
driverRoot string
devRoot string
nvidiaCDIHookPath string
ldconfigPath string
mode string
vendor string
class string
configSearchPaths cli.StringSlice
librarySearchPaths cli.StringSlice
csv struct {
files cli.StringSlice
ignorePatterns cli.StringSlice
}
}
// NewCommand constructs a generate-cdi command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
}
@@ -62,283 +72,228 @@ func NewCommand(logger *logrus.Logger) *cli.Command {
// build creates the CLI command
func (m command) build() *cli.Command {
cfg := config{}
opts := options{}
// Create the 'generate-cdi' command
c := cli.Command{
Name: "generate",
Usage: "Generate CDI specifications for use with CDI-enabled runtimes",
Before: func(c *cli.Context) error {
return m.validateFlags(c, &cfg)
return m.validateFlags(c, &opts)
},
Action: func(c *cli.Context) error {
return m.run(c, &cfg)
return m.run(c, &opts)
},
}
c.Flags = []cli.Flag{
&cli.StringSliceFlag{
Name: "config-search-path",
Usage: "Specify the path to search for config files when discovering the entities that should be included in the CDI specification.",
Destination: &opts.configSearchPaths,
},
&cli.StringFlag{
Name: "output",
Usage: "Specify the file to output the generated CDI specification to. If this is '' the specification is output to STDOUT",
Destination: &cfg.output,
Destination: &opts.output,
},
&cli.StringFlag{
Name: "format",
Usage: "The output format for the generated spec [json | yaml]. This overrides the format defined by the output file extension (if specified).",
Value: formatYAML,
Destination: &cfg.format,
Value: spec.FormatYAML,
Destination: &opts.format,
},
&cli.StringFlag{
Name: "root",
Usage: "Specify the root to use when discovering the entities that should be included in the CDI specification.",
Destination: &cfg.root,
Name: "mode",
Aliases: []string{"discovery-mode"},
Usage: "The mode to use when discovering the available entities. " +
"One of [" + strings.Join(nvcdi.AllModes[string](), " | ") + "]. " +
"If mode is set to 'auto' the mode will be determined based on the system configuration.",
Value: string(nvcdi.ModeAuto),
Destination: &opts.mode,
},
&cli.StringFlag{
Name: "dev-root",
Usage: "Specify the root where `/dev` is located. If this is not specified, the driver-root is assumed.",
Destination: &opts.devRoot,
},
&cli.StringSliceFlag{
Name: "device-name-strategy",
Usage: "Specify the strategy for generating device names. If this is specified multiple times, the devices will be duplicated for each strategy. One of [index | uuid | type-index]",
Value: cli.NewStringSlice(nvcdi.DeviceNameStrategyIndex, nvcdi.DeviceNameStrategyUUID),
Destination: &opts.deviceNameStrategies,
},
&cli.StringFlag{
Name: "driver-root",
Usage: "Specify the NVIDIA GPU driver root to use when discovering the entities that should be included in the CDI specification.",
Destination: &opts.driverRoot,
},
&cli.StringSliceFlag{
Name: "library-search-path",
Usage: "Specify the path to search for libraries when discovering the entities that should be included in the CDI specification.\n\tNote: This option only applies to CSV mode.",
Destination: &opts.librarySearchPaths,
},
&cli.StringFlag{
Name: "nvidia-cdi-hook-path",
Aliases: []string{"nvidia-ctk-path"},
Usage: "Specify the path to use for the nvidia-cdi-hook in the generated CDI specification. " +
"If not specified, the PATH will be searched for `nvidia-cdi-hook`. " +
"NOTE: That if this is specified as `nvidia-ctk`, the PATH will be searched for `nvidia-ctk` instead.",
Destination: &opts.nvidiaCDIHookPath,
},
&cli.StringFlag{
Name: "ldconfig-path",
Usage: "Specify the path to use for ldconfig in the generated CDI specification",
Destination: &opts.ldconfigPath,
},
&cli.StringFlag{
Name: "vendor",
Aliases: []string{"cdi-vendor"},
Usage: "the vendor string to use for the generated CDI specification.",
Value: "nvidia.com",
Destination: &opts.vendor,
},
&cli.StringFlag{
Name: "class",
Aliases: []string{"cdi-class"},
Usage: "the class string to use for the generated CDI specification.",
Value: "gpu",
Destination: &opts.class,
},
&cli.StringSliceFlag{
Name: "csv.file",
Usage: "The path to the list of CSV files to use when generating the CDI specification in CSV mode.",
Value: cli.NewStringSlice(csv.DefaultFileList()...),
Destination: &opts.csv.files,
},
&cli.StringSliceFlag{
Name: "csv.ignore-pattern",
Usage: "Specify a pattern the CSV mount specifications.",
Destination: &opts.csv.ignorePatterns,
},
}
return &c
}
func (m command) validateFlags(r *cli.Context, cfg *config) error {
cfg.format = strings.ToLower(cfg.format)
switch cfg.format {
case formatJSON:
case formatYAML:
func (m command) validateFlags(c *cli.Context, opts *options) error {
opts.format = strings.ToLower(opts.format)
switch opts.format {
case spec.FormatJSON:
case spec.FormatYAML:
default:
return fmt.Errorf("invalid output format: %v", cfg.format)
return fmt.Errorf("invalid output format: %v", opts.format)
}
opts.mode = strings.ToLower(opts.mode)
if !nvcdi.IsValidMode(opts.mode) {
return fmt.Errorf("invalid discovery mode: %v", opts.mode)
}
for _, strategy := range opts.deviceNameStrategies.Value() {
_, err := nvcdi.NewDeviceNamer(strategy)
if err != nil {
return err
}
}
opts.nvidiaCDIHookPath = config.ResolveNVIDIACDIHookPath(m.logger, opts.nvidiaCDIHookPath)
if outputFileFormat := formatFromFilename(opts.output); outputFileFormat != "" {
m.logger.Debugf("Inferred output format as %q from output file name", outputFileFormat)
if !c.IsSet("format") {
opts.format = outputFileFormat
} else if outputFileFormat != opts.format {
m.logger.Warningf("Requested output format %q does not match format implied by output file name: %q", opts.format, outputFileFormat)
}
}
if err := cdi.ValidateVendorName(opts.vendor); err != nil {
return fmt.Errorf("invalid CDI vendor name: %v", err)
}
if err := cdi.ValidateClassName(opts.class); err != nil {
return fmt.Errorf("invalid CDI class name: %v", err)
}
return nil
}
func (m command) run(c *cli.Context, cfg *config) error {
spec, err := m.generateSpec(cfg.root)
func (m command) run(c *cli.Context, opts *options) error {
spec, err := m.generateSpec(opts)
if err != nil {
return fmt.Errorf("failed to generate CDI spec: %v", err)
}
m.logger.Infof("Generated CDI spec with version %v", spec.Raw().Version)
var outputTo io.Writer
if cfg.output == "" {
outputTo = os.Stdout
} else {
err := createParentDirsIfRequired(cfg.output)
if opts.output == "" {
_, err := spec.WriteTo(os.Stdout)
if err != nil {
return fmt.Errorf("failed to create parent folders for output file: %v", err)
return fmt.Errorf("failed to write CDI spec to STDOUT: %v", err)
}
outputFile, err := os.Create(cfg.output)
if err != nil {
return fmt.Errorf("failed to create output file: %v", err)
}
defer outputFile.Close()
outputTo = outputFile
return nil
}
if outputFileFormat := formatFromFilename(cfg.output); outputFileFormat != "" {
m.logger.Debugf("Inferred output format as %q from output file name", outputFileFormat)
if !c.IsSet("format") {
cfg.format = outputFileFormat
} else if outputFileFormat != cfg.format {
m.logger.Warningf("Requested output format %q does not match format implied by output file name: %q", cfg.format, outputFileFormat)
}
}
data, err := yaml.Marshal(spec)
if err != nil {
return fmt.Errorf("failed to marshal CDI spec: %v", err)
}
if strings.ToLower(cfg.format) == formatJSON {
data, err = yaml.YAMLToJSONStrict(data)
if err != nil {
return fmt.Errorf("failed to convert CDI spec from YAML to JSON: %v", err)
}
}
err = writeToOutput(cfg.format, data, outputTo)
if err != nil {
return fmt.Errorf("failed to write output: %v", err)
}
return nil
return spec.Save(opts.output)
}
func formatFromFilename(filename string) string {
ext := filepath.Ext(filename)
switch strings.ToLower(ext) {
case ".json":
return formatJSON
case ".yaml":
return formatYAML
case ".yml":
return formatYAML
return spec.FormatJSON
case ".yaml", ".yml":
return spec.FormatYAML
}
return ""
}
func writeToOutput(format string, data []byte, output io.Writer) error {
if format == formatYAML {
_, err := output.Write([]byte("---\n"))
func (m command) generateSpec(opts *options) (spec.Interface, error) {
var deviceNamers []nvcdi.DeviceNamer
for _, strategy := range opts.deviceNameStrategies.Value() {
deviceNamer, err := nvcdi.NewDeviceNamer(strategy)
if err != nil {
return fmt.Errorf("failed to write YAML separator: %v", err)
return nil, fmt.Errorf("failed to create device namer: %v", err)
}
deviceNamers = append(deviceNamers, deviceNamer)
}
_, err := output.Write(data)
cdilib, err := nvcdi.New(
nvcdi.WithLogger(m.logger),
nvcdi.WithDriverRoot(opts.driverRoot),
nvcdi.WithDevRoot(opts.devRoot),
nvcdi.WithNVIDIACDIHookPath(opts.nvidiaCDIHookPath),
nvcdi.WithLdconfigPath(opts.ldconfigPath),
nvcdi.WithDeviceNamers(deviceNamers...),
nvcdi.WithMode(opts.mode),
nvcdi.WithConfigSearchPaths(opts.configSearchPaths.Value()),
nvcdi.WithLibrarySearchPaths(opts.librarySearchPaths.Value()),
nvcdi.WithCSVFiles(opts.csv.files.Value()),
nvcdi.WithCSVIgnorePatterns(opts.csv.ignorePatterns.Value()),
)
if err != nil {
return fmt.Errorf("failed to write data: %v", err)
return nil, fmt.Errorf("failed to create CDI library: %v", err)
}
return nil
}
func (m command) generateSpec(root string) (*specs.Spec, error) {
nvmllib := nvml.New()
if r := nvmllib.Init(); r != nvml.SUCCESS {
return nil, r
}
defer nvmllib.Shutdown()
devicelib := device.New(device.WithNvml(nvmllib))
deviceSpecs, err := m.generateDeviceSpecs(devicelib, root)
deviceSpecs, err := cdilib.GetAllDeviceSpecs()
if err != nil {
return nil, fmt.Errorf("failed to create device CDI specs: %v", err)
}
allDevice := createAllDevice(deviceSpecs)
deviceSpecs = append(deviceSpecs, allDevice)
allEdits := edits.NewContainerEdits()
ipcs, err := NewIPCDiscoverer(m.logger, root)
commonEdits, err := cdilib.GetCommonEdits()
if err != nil {
return nil, fmt.Errorf("failed to create discoverer for IPC sockets: %v", err)
return nil, fmt.Errorf("failed to create edits common for entities: %v", err)
}
ipcEdits, err := edits.FromDiscoverer(ipcs)
if err != nil {
return nil, fmt.Errorf("failed to create container edits for IPC sockets: %v", err)
}
// TODO: We should not have to update this after the fact
for _, s := range ipcEdits.Mounts {
s.Options = append(s.Options, "noexec")
}
allEdits.Append(ipcEdits)
common, err := NewCommonDiscoverer(m.logger, root, nvmllib)
if err != nil {
return nil, fmt.Errorf("failed to create discoverer for common entities: %v", err)
}
deviceFolderPermissionHooks, err := NewDeviceFolderPermissionHookDiscoverer(m.logger, root, deviceSpecs)
if err != nil {
return nil, fmt.Errorf("failed to generated permission hooks for device nodes: %v", err)
}
commonEdits, err := edits.FromDiscoverer(discover.Merge(common, deviceFolderPermissionHooks))
if err != nil {
return nil, fmt.Errorf("failed to create container edits for common entities: %v", err)
}
allEdits.Append(commonEdits)
// Construct the spec
// TODO: Use the code to determine the minimal version
spec := specs.Spec{
Version: "0.4.0",
Kind: "nvidia.com/gpu",
Devices: deviceSpecs,
ContainerEdits: *allEdits.ContainerEdits,
}
return &spec, nil
}
func (m command) generateDeviceSpecs(devicelib device.Interface, root string) ([]specs.Device, error) {
var deviceSpecs []specs.Device
err := devicelib.VisitDevices(func(i int, d device.Device) error {
isMigEnabled, err := d.IsMigEnabled()
if err != nil {
return fmt.Errorf("failed to check whether device is MIG device: %v", err)
}
if isMigEnabled {
return nil
}
device, err := NewFullGPUDiscoverer(m.logger, root, d)
if err != nil {
return fmt.Errorf("failed to create device: %v", err)
}
deviceEdits, err := edits.FromDiscoverer(device)
if err != nil {
return fmt.Errorf("failed to create container edits for device: %v", err)
}
deviceSpec := specs.Device{
Name: fmt.Sprintf("gpu%d", i),
ContainerEdits: *deviceEdits.ContainerEdits,
}
deviceSpecs = append(deviceSpecs, deviceSpec)
return nil
})
if err != nil {
return nil, fmt.Errorf("failed to generate CDI spec for GPU devices: %v", err)
}
err = devicelib.VisitMigDevices(func(i int, d device.Device, j int, mig device.MigDevice) error {
device, err := NewMigDeviceDiscoverer(m.logger, "", d, mig)
if err != nil {
return fmt.Errorf("failed to create MIG device: %v", err)
}
deviceEdits, err := edits.FromDiscoverer(device)
if err != nil {
return fmt.Errorf("failed to create container edits for MIG device: %v", err)
}
deviceSpec := specs.Device{
Name: fmt.Sprintf("mig%v:%v", i, j),
ContainerEdits: *deviceEdits.ContainerEdits,
}
deviceSpecs = append(deviceSpecs, deviceSpec)
return nil
})
if err != nil {
return nil, fmt.Errorf("falied to generate CDI spec for MIG devices: %v", err)
}
return deviceSpecs, nil
}
// createAllDevice creates an 'all' device which combines the edits from the previous devices
func createAllDevice(deviceSpecs []specs.Device) specs.Device {
edits := edits.NewContainerEdits()
for _, d := range deviceSpecs {
edit := cdi.ContainerEdits{
ContainerEdits: &d.ContainerEdits,
}
edits.Append(&edit)
}
all := specs.Device{
Name: "all",
ContainerEdits: *edits.ContainerEdits,
}
return all
}
// createParentDirsIfRequired creates the parent folders of the specified path if requried.
// Note that MkdirAll does not specifically check whether the specified path is non-empty and raises an error if it is.
// The path will be empty if filename in the current folder is specified, for example
func createParentDirsIfRequired(filename string) error {
dir := filepath.Dir(filename)
if dir == "" {
return nil
}
return os.MkdirAll(dir, 0755)
return spec.New(
spec.WithVendor(opts.vendor),
spec.WithClass(opts.class),
spec.WithDeviceSpecs(deviceSpecs),
spec.WithEdits(*commonEdits.ContainerEdits),
spec.WithFormat(opts.format),
spec.WithMergedDeviceOptions(
transform.WithName(allDeviceName),
transform.WithSkipIfExists(true),
),
spec.WithPermissions(0644),
)
}

View File

@@ -1,84 +0,0 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package generate
import (
"fmt"
"github.com/NVIDIA/nvidia-container-toolkit/internal/discover"
"github.com/NVIDIA/nvidia-container-toolkit/internal/nvcaps"
"github.com/sirupsen/logrus"
"gitlab.com/nvidia/cloud-native/go-nvlib/pkg/nvlib/device"
"gitlab.com/nvidia/cloud-native/go-nvlib/pkg/nvml"
)
// migDeviceDiscoverer wraps a deviceDiscoverer and adds specifics required for discovering MIG devices.
type migDeviceDiscoverer struct {
deviceDiscoverer
}
var _ discover.Discover = (*migDeviceDiscoverer)(nil)
// NewMigDeviceDiscoverer creates a discoverer for the specified mig device and its parent.
func NewMigDeviceDiscoverer(logger *logrus.Logger, root string, parent device.Device, d device.MigDevice) (discover.Discover, error) {
minor, ret := parent.GetMinorNumber()
if ret != nvml.SUCCESS {
return nil, fmt.Errorf("error getting GPU device minor number: %v", ret)
}
parentPath := fmt.Sprintf("/dev/nvidia%d", minor)
migCaps, err := nvcaps.NewMigCaps()
if err != nil {
return nil, fmt.Errorf("error getting MIG capability device paths: %v", err)
}
gi, ret := d.GetGpuInstanceId()
if ret != nvml.SUCCESS {
return nil, fmt.Errorf("error getting GPU Instance ID: %v", ret)
}
ci, ret := d.GetComputeInstanceId()
if ret != nvml.SUCCESS {
return nil, fmt.Errorf("error getting Compute Instance ID: %v", ret)
}
giCap := nvcaps.NewGPUInstanceCap(minor, gi)
giCapDevicePath, err := migCaps.GetCapDevicePath(giCap)
if err != nil {
return nil, fmt.Errorf("failed to get GI cap device path: %v", err)
}
ciCap := nvcaps.NewComputeInstanceCap(minor, gi, ci)
ciCapDevicePath, err := migCaps.GetCapDevicePath(ciCap)
if err != nil {
return nil, fmt.Errorf("failed to get CI cap device path: %v", err)
}
m := migDeviceDiscoverer{
deviceDiscoverer: deviceDiscoverer{
logger: logger,
root: root,
deviceNodePaths: []string{
parentPath,
giCapDevicePath,
ciCapDevicePath,
},
},
}
return &m, nil
}

View File

@@ -0,0 +1,104 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package list
import (
"errors"
"fmt"
"github.com/urfave/cli/v2"
"tags.cncf.io/container-device-interface/pkg/cdi"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
)
type command struct {
logger logger.Interface
}
type config struct {
cdiSpecDirs cli.StringSlice
}
// NewCommand constructs a cdi list command with the specified logger
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
}
return c.build()
}
// build creates the CLI command
func (m command) build() *cli.Command {
cfg := config{}
// Create the command
c := cli.Command{
Name: "list",
Usage: "List the available CDI devices",
Before: func(c *cli.Context) error {
return m.validateFlags(c, &cfg)
},
Action: func(c *cli.Context) error {
return m.run(c, &cfg)
},
}
c.Flags = []cli.Flag{
&cli.StringSliceFlag{
Name: "spec-dir",
Usage: "specify the directories to scan for CDI specifications",
Value: cli.NewStringSlice(cdi.DefaultSpecDirs...),
Destination: &cfg.cdiSpecDirs,
},
}
return &c
}
func (m command) validateFlags(c *cli.Context, cfg *config) error {
if len(cfg.cdiSpecDirs.Value()) == 0 {
return errors.New("at least one CDI specification directory must be specified")
}
return nil
}
func (m command) run(c *cli.Context, cfg *config) error {
registry, err := cdi.NewCache(
cdi.WithAutoRefresh(false),
cdi.WithSpecDirs(cfg.cdiSpecDirs.Value()...),
)
if err != nil {
return fmt.Errorf("failed to create CDI cache: %v", err)
}
_ = registry.Refresh()
if errors := registry.GetErrors(); len(errors) > 0 {
m.logger.Warningf("The following registry errors were reported:")
for k, err := range errors {
m.logger.Warningf("%v: %v", k, err)
}
}
devices := registry.ListDevices()
m.logger.Infof("Found %d CDI devices", len(devices))
for _, device := range devices {
fmt.Printf("%s\n", device)
}
return nil
}

View File

@@ -0,0 +1,169 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package root
import (
"fmt"
"io"
"os"
"github.com/urfave/cli/v2"
"tags.cncf.io/container-device-interface/pkg/cdi"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/nvcdi/spec"
transformroot "github.com/NVIDIA/nvidia-container-toolkit/pkg/nvcdi/transform/root"
)
type command struct {
logger logger.Interface
}
type transformOptions struct {
input string
output string
}
type options struct {
transformOptions
from string
to string
relativeTo string
}
// NewCommand constructs a generate-cdi command with the specified logger
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
}
return c.build()
}
// build creates the CLI command
func (m command) build() *cli.Command {
opts := options{}
c := cli.Command{
Name: "root",
Usage: "Apply a root transform to a CDI specification",
Before: func(c *cli.Context) error {
return m.validateFlags(c, &opts)
},
Action: func(c *cli.Context) error {
return m.run(c, &opts)
},
}
c.Flags = []cli.Flag{
&cli.StringFlag{
Name: "from",
Usage: "specify the root to be transformed",
Destination: &opts.from,
},
&cli.StringFlag{
Name: "input",
Usage: "Specify the file to read the CDI specification from. If this is '-' the specification is read from STDIN",
Value: "-",
Destination: &opts.input,
},
&cli.StringFlag{
Name: "output",
Usage: "Specify the file to output the generated CDI specification to. If this is '' the specification is output to STDOUT",
Destination: &opts.output,
},
&cli.StringFlag{
Name: "relative-to",
Usage: "specify whether the transform is relative to the host or to the container. One of [ host | container ]",
Value: "host",
Destination: &opts.relativeTo,
},
&cli.StringFlag{
Name: "to",
Usage: "specify the replacement root. If this is the same as the from root, the transform is a no-op.",
Value: "",
Destination: &opts.to,
},
}
return &c
}
func (m command) validateFlags(c *cli.Context, opts *options) error {
switch opts.relativeTo {
case "host":
case "container":
default:
return fmt.Errorf("invalid --relative-to value: %v", opts.relativeTo)
}
return nil
}
func (m command) run(c *cli.Context, opts *options) error {
spec, err := opts.Load()
if err != nil {
return fmt.Errorf("failed to load CDI specification: %w", err)
}
err = transformroot.New(
transformroot.WithRoot(opts.from),
transformroot.WithTargetRoot(opts.to),
transformroot.WithRelativeTo(opts.relativeTo),
).Transform(spec.Raw())
if err != nil {
return fmt.Errorf("failed to transform CDI specification: %w", err)
}
return opts.Save(spec)
}
// Load lodas the input CDI specification
func (o transformOptions) Load() (spec.Interface, error) {
contents, err := o.getContents()
if err != nil {
return nil, fmt.Errorf("failed to read spec contents: %v", err)
}
raw, err := cdi.ParseSpec(contents)
if err != nil {
return nil, fmt.Errorf("failed to parse CDI spec: %v", err)
}
return spec.New(
spec.WithRawSpec(raw),
)
}
func (o transformOptions) getContents() ([]byte, error) {
if o.input == "-" {
return io.ReadAll(os.Stdin)
}
return os.ReadFile(o.input)
}
// Save saves the CDI specification to the output file
func (o transformOptions) Save(s spec.Interface) error {
if o.output == "" {
_, err := s.WriteTo(os.Stdout)
if err != nil {
return fmt.Errorf("failed to write CDI spec to STDOUT: %v", err)
}
return nil
}
return s.Save(o.output)
}

View File

@@ -0,0 +1,52 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package transform
import (
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/cdi/transform/root"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
)
type command struct {
logger logger.Interface
}
// NewCommand constructs a command with the specified logger
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
}
return c.build()
}
// build creates the CLI command
func (m command) build() *cli.Command {
c := cli.Command{
Name: "transform",
Usage: "Apply a transform to a CDI specification",
}
c.Flags = []cli.Flag{}
c.Subcommands = []*cli.Command{
root.NewCommand(m.logger),
}
return &c
}

View File

@@ -0,0 +1,244 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package config
import (
"errors"
"fmt"
"reflect"
"strconv"
"strings"
"github.com/urfave/cli/v2"
createdefault "github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/config/create-default"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/config/flags"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
)
type command struct {
logger logger.Interface
}
// options stores the subcommand options
type options struct {
flags.Options
setListSeparator string
sets cli.StringSlice
}
// NewCommand constructs an config command with the specified logger
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
}
return c.build()
}
// build
func (m command) build() *cli.Command {
opts := options{}
// Create the 'config' command
c := cli.Command{
Name: "config",
Usage: "Interact with the NVIDIA Container Toolkit configuration",
Before: func(ctx *cli.Context) error {
return validateFlags(ctx, &opts)
},
Action: func(ctx *cli.Context) error {
return run(ctx, &opts)
},
}
c.Flags = []cli.Flag{
&cli.StringFlag{
Name: "config-file",
Aliases: []string{"config", "c"},
Usage: "Specify the config file to modify.",
Value: config.GetConfigFilePath(),
Destination: &opts.Config,
},
&cli.StringSliceFlag{
Name: "set",
Usage: "Set a config value using the pattern 'key[=value]'. " +
"Specifying only 'key' is equivalent to 'key=true' for boolean settings. " +
"This flag can be specified multiple times, but only the last value for a specific " +
"config option is applied. " +
"If the setting represents a list, the elements are colon-separated.",
Destination: &opts.sets,
},
&cli.StringFlag{
Name: "set-list-separator",
Usage: "Specify a separator for lists applied using the set command.",
Hidden: true,
Value: ":",
Destination: &opts.setListSeparator,
},
&cli.BoolFlag{
Name: "in-place",
Aliases: []string{"i"},
Usage: "Modify the config file in-place",
Destination: &opts.InPlace,
},
&cli.StringFlag{
Name: "output",
Aliases: []string{"o"},
Usage: "Specify the output file to write to; If not specified, the output is written to stdout",
Destination: &opts.Output,
},
}
c.Subcommands = []*cli.Command{
createdefault.NewCommand(m.logger),
}
return &c
}
func validateFlags(c *cli.Context, opts *options) error {
if opts.setListSeparator == "" {
return fmt.Errorf("set-list-separator must be set")
}
return nil
}
func run(c *cli.Context, opts *options) error {
cfgToml, err := config.New(
config.WithConfigFile(opts.Config),
)
if err != nil {
return fmt.Errorf("unable to create config: %v", err)
}
for _, set := range opts.sets.Value() {
key, value, err := setFlagToKeyValue(set, opts.setListSeparator)
if err != nil {
return fmt.Errorf("invalid --set option %v: %w", set, err)
}
if value == nil {
_ = cfgToml.Delete(key)
} else {
cfgToml.Set(key, value)
}
}
if err := opts.EnsureOutputFolder(); err != nil {
return fmt.Errorf("failed to create output directory: %v", err)
}
output, err := opts.CreateOutput()
if err != nil {
return fmt.Errorf("failed to open output file: %v", err)
}
defer output.Close()
if _, err := cfgToml.Save(output); err != nil {
return fmt.Errorf("failed to save config: %v", err)
}
return nil
}
var errInvalidConfigOption = errors.New("invalid config option")
var errUndefinedField = errors.New("undefined field")
var errInvalidFormat = errors.New("invalid format")
// setFlagToKeyValue converts a --set flag to a key-value pair.
// The set flag is of the form key[=value], with the value being optional if key refers to a
// boolean config option.
func setFlagToKeyValue(setFlag string, setListSeparator string) (string, interface{}, error) {
setParts := strings.SplitN(setFlag, "=", 2)
key := setParts[0]
field, err := getField(key)
if err != nil {
return key, nil, fmt.Errorf("%w: %w", errInvalidConfigOption, err)
}
kind := field.Kind()
if len(setParts) != 2 {
if kind == reflect.Bool || (kind == reflect.Pointer && field.Elem().Kind() == reflect.Bool) {
return key, true, nil
}
return key, nil, fmt.Errorf("%w: expected key=value; got %v", errInvalidFormat, setFlag)
}
value := setParts[1]
if kind == reflect.Pointer && value != "nil" {
kind = field.Elem().Kind()
}
switch kind {
case reflect.Pointer:
return key, nil, nil
case reflect.Bool:
b, err := strconv.ParseBool(value)
if err != nil {
return key, value, fmt.Errorf("%w: %w", errInvalidFormat, err)
}
return key, b, nil
case reflect.String:
return key, value, nil
case reflect.Slice:
valueParts := strings.Split(value, setListSeparator)
switch field.Elem().Kind() {
case reflect.String:
return key, valueParts, nil
case reflect.Int:
var output []int64
for _, v := range valueParts {
vi, err := strconv.ParseInt(v, 10, 0)
if err != nil {
return key, nil, fmt.Errorf("%w: %w", errInvalidFormat, err)
}
output = append(output, vi)
}
return key, output, nil
}
}
return key, nil, fmt.Errorf("unsupported type for %v (%v)", setParts, kind)
}
func getField(key string) (reflect.Type, error) {
s, err := getStruct(reflect.TypeOf(config.Config{}), strings.Split(key, ".")...)
if err != nil {
return nil, err
}
return s.Type, err
}
func getStruct(current reflect.Type, paths ...string) (reflect.StructField, error) {
if len(paths) < 1 {
return reflect.StructField{}, fmt.Errorf("%w: no fields selected", errUndefinedField)
}
tomlField := paths[0]
for i := 0; i < current.NumField(); i++ {
f := current.Field(i)
v, ok := f.Tag.Lookup("toml")
if !ok {
continue
}
if strings.SplitN(v, ",", 2)[0] != tomlField {
continue
}
if len(paths) == 1 {
return f, nil
}
return getStruct(f.Type, paths[1:]...)
}
return reflect.StructField{}, fmt.Errorf("%w: %q", errUndefinedField, tomlField)
}

View File

@@ -0,0 +1,143 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package config
import (
"testing"
"github.com/stretchr/testify/require"
)
func TestSetFlagToKeyValue(t *testing.T) {
// TODO: We need to enable this test again since switching to reflect.
testCases := []struct {
description string
setFlag string
setListSeparator string
expectedKey string
expectedValue interface{}
expectedError error
}{
{
description: "option not present returns an error",
setFlag: "undefined=new-value",
expectedKey: "undefined",
expectedError: errInvalidConfigOption,
},
{
description: "undefined nexted option returns error",
setFlag: "nvidia-container-cli.undefined",
expectedKey: "nvidia-container-cli.undefined",
expectedError: errInvalidConfigOption,
},
{
description: "boolean option assumes true",
setFlag: "disable-require",
expectedKey: "disable-require",
expectedValue: true,
},
{
description: "boolean option returns true",
setFlag: "disable-require=true",
expectedKey: "disable-require",
expectedValue: true,
},
{
description: "boolean option returns false",
setFlag: "disable-require=false",
expectedKey: "disable-require",
expectedValue: false,
},
{
description: "invalid boolean option returns error",
setFlag: "disable-require=something",
expectedKey: "disable-require",
expectedValue: "something",
expectedError: errInvalidFormat,
},
{
description: "string option requires value",
setFlag: "swarm-resource",
expectedKey: "swarm-resource",
expectedValue: nil,
expectedError: errInvalidFormat,
},
{
description: "string option returns value",
setFlag: "swarm-resource=string-value",
expectedKey: "swarm-resource",
expectedValue: "string-value",
},
{
description: "string option returns value with equals",
setFlag: "swarm-resource=string-value=more",
expectedKey: "swarm-resource",
expectedValue: "string-value=more",
},
{
description: "string option treats bool value as string",
setFlag: "swarm-resource=true",
expectedKey: "swarm-resource",
expectedValue: "true",
},
{
description: "string option treats int value as string",
setFlag: "swarm-resource=5",
expectedKey: "swarm-resource",
expectedValue: "5",
},
{
description: "[]string option returns single value",
setFlag: "nvidia-container-cli.environment=string-value",
expectedKey: "nvidia-container-cli.environment",
expectedValue: []string{"string-value"},
},
{
description: "[]string option returns multiple values",
setFlag: "nvidia-container-cli.environment=first,second",
setListSeparator: ",",
expectedKey: "nvidia-container-cli.environment",
expectedValue: []string{"first", "second"},
},
{
description: "[]string option returns values with equals",
setFlag: "nvidia-container-cli.environment=first=1,second=2",
setListSeparator: ",",
expectedKey: "nvidia-container-cli.environment",
expectedValue: []string{"first=1", "second=2"},
},
{
description: "[]string option returns multiple values semi-colon",
setFlag: "nvidia-container-cli.environment=first;second",
setListSeparator: ";",
expectedKey: "nvidia-container-cli.environment",
expectedValue: []string{"first", "second"},
},
}
for _, tc := range testCases {
t.Run(tc.description, func(t *testing.T) {
if tc.setListSeparator == "" {
tc.setListSeparator = ","
}
k, v, err := setFlagToKeyValue(tc.setFlag, tc.setListSeparator)
require.ErrorIs(t, err, tc.expectedError)
require.EqualValues(t, tc.expectedKey, k)
require.EqualValues(t, tc.expectedValue, v)
})
}
}

View File

@@ -0,0 +1,94 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package defaultsubcommand
import (
"fmt"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/config/flags"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
)
type command struct {
logger logger.Interface
}
// NewCommand constructs a default command with the specified logger
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
}
return c.build()
}
// build creates the CLI command
func (m command) build() *cli.Command {
opts := flags.Options{}
// Create the 'default' command
c := cli.Command{
Name: "default",
Aliases: []string{"create-default", "generate-default"},
Usage: "Generate the default NVIDIA Container Toolkit configuration file",
Before: func(c *cli.Context) error {
return m.validateFlags(c, &opts)
},
Action: func(c *cli.Context) error {
return m.run(c, &opts)
},
}
c.Flags = []cli.Flag{
&cli.StringFlag{
Name: "output",
Aliases: []string{"o"},
Usage: "Specify the output file to write to; If not specified, the output is written to stdout",
Destination: &opts.Output,
},
}
return &c
}
func (m command) validateFlags(c *cli.Context, opts *flags.Options) error {
return opts.Validate()
}
func (m command) run(c *cli.Context, opts *flags.Options) error {
cfgToml, err := config.New()
if err != nil {
return fmt.Errorf("unable to load or create config: %v", err)
}
if err := opts.EnsureOutputFolder(); err != nil {
return fmt.Errorf("failed to create output directory: %v", err)
}
output, err := opts.CreateOutput()
if err != nil {
return fmt.Errorf("failed to open output file: %v", err)
}
defer output.Close()
if _, err = cfgToml.Save(output); err != nil {
return fmt.Errorf("failed to write output: %v", err)
}
return nil
}

View File

@@ -0,0 +1,82 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package flags
import (
"fmt"
"io"
"os"
"path/filepath"
)
// Options stores options for the config commands
type Options struct {
Config string
Output string
InPlace bool
}
// Validate checks whether the options are valid.
func (o Options) Validate() error {
if o.InPlace && o.Output != "" {
return fmt.Errorf("cannot specify both --in-place and --output")
}
return nil
}
// GetOutput returns the effective output
func (o Options) GetOutput() string {
if o.InPlace {
return o.Config
}
return o.Output
}
// EnsureOutputFolder creates the output folder if it does not exist.
// If the output folder is not specified (i.e. output to STDOUT), it is ignored.
func (o Options) EnsureOutputFolder() error {
output := o.GetOutput()
if output == "" {
return nil
}
if dir := filepath.Dir(output); dir != "" {
return os.MkdirAll(dir, 0755)
}
return nil
}
// CreateOutput creates the writer for the output.
func (o Options) CreateOutput() (io.WriteCloser, error) {
output := o.GetOutput()
if output == "" {
return nullCloser{os.Stdout}, nil
}
return os.Create(output)
}
// nullCloser is a writer that does nothing on Close.
type nullCloser struct {
io.Writer
}
// Close is a no-op for a nullCloser.
func (d nullCloser) Close() error {
return nil
}

View File

@@ -1,229 +0,0 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package symlinks
import (
"fmt"
"os"
"path/filepath"
"strings"
"github.com/NVIDIA/nvidia-container-toolkit/internal/discover/csv"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
)
type command struct {
logger *logrus.Logger
}
type config struct {
hostRoot string
filenames cli.StringSlice
links cli.StringSlice
containerSpec string
}
// NewCommand constructs a hook command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
c := command{
logger: logger,
}
return c.build()
}
// build
func (m command) build() *cli.Command {
cfg := config{}
// Create the '' command
c := cli.Command{
Name: "create-symlinks",
Usage: "A hook to create symlinks in the container. This can be used to proces CSV mount specs",
Action: func(c *cli.Context) error {
return m.run(c, &cfg)
},
}
c.Flags = []cli.Flag{
&cli.StringFlag{
Name: "host-root",
Usage: "The root on the host filesystem to use to resolve symlinks",
Destination: &cfg.hostRoot,
},
&cli.StringSliceFlag{
Name: "csv-filename",
Usage: "Specify a (CSV) filename to process",
Destination: &cfg.filenames,
},
&cli.StringSliceFlag{
Name: "link",
Usage: "Specify a specific link to create. The link is specified as target::link",
Destination: &cfg.links,
},
&cli.StringFlag{
Name: "container-spec",
Usage: "Specify the path to the OCI container spec. If empty or '-' the spec will be read from STDIN",
Destination: &cfg.containerSpec,
},
}
return &c
}
func (m command) run(c *cli.Context, cfg *config) error {
s, err := oci.LoadContainerState(cfg.containerSpec)
if err != nil {
return fmt.Errorf("failed to load container state: %v", err)
}
containerRoot, err := s.GetContainerRoot()
if err != nil {
return fmt.Errorf("failed to determined container root: %v", err)
}
csvFiles := cfg.filenames.Value()
chainLocator := lookup.NewSymlinkChainLocator(m.logger, cfg.hostRoot)
var candidates []string
for _, file := range csvFiles {
mountSpecs, err := csv.NewCSVFileParser(m.logger, file).Parse()
if err != nil {
m.logger.Debugf("Skipping CSV file %v: %v", file, err)
continue
}
for _, ms := range mountSpecs {
if ms.Type != csv.MountSpecSym {
continue
}
targets, err := chainLocator.Locate(ms.Path)
if err != nil {
m.logger.Warnf("Failed to locate symlink %v", ms.Path)
}
candidates = append(candidates, targets...)
}
}
created := make(map[string]bool)
// candidates is a list of absolute paths to symlinks in a chain, or the final target of the chain.
for _, candidate := range candidates {
targets, err := m.Locate(candidate)
if err != nil {
m.logger.Debugf("Skipping invalid link: %v", err)
continue
} else if len(targets) != 1 {
m.logger.Debugf("Unexepected number of targets: %v", targets)
continue
} else if targets[0] == candidate {
m.logger.Debugf("%v is not a symlink", candidate)
continue
}
err = m.createLink(created, cfg.hostRoot, containerRoot, targets[0], candidate)
if err != nil {
m.logger.Warnf("Failed to create link %v: %v", []string{targets[0], candidate}, err)
}
}
links := cfg.links.Value()
for _, l := range links {
parts := strings.Split(l, "::")
if len(parts) != 2 {
m.logger.Warnf("Invalid link specification %v", l)
continue
}
err := m.createLink(created, cfg.hostRoot, containerRoot, parts[0], parts[1])
if err != nil {
m.logger.Warnf("Failed to create link %v: %v", parts, err)
}
}
return nil
}
func (m command) createLink(created map[string]bool, hostRoot string, containerRoot string, target string, link string) error {
linkPath, err := changeRoot(hostRoot, containerRoot, link)
if err != nil {
m.logger.Warnf("Failed to resolve path for link %v relative to %v: %v", link, containerRoot, err)
}
if created[linkPath] {
m.logger.Debugf("Link %v already created", linkPath)
return nil
}
targetPath, err := changeRoot(hostRoot, "/", target)
if err != nil {
m.logger.Warnf("Failed to resolve path for target %v relative to %v: %v", target, "/", err)
}
m.logger.Infof("Symlinking %v to %v", linkPath, targetPath)
err = os.MkdirAll(filepath.Dir(linkPath), 0755)
if err != nil {
return fmt.Errorf("failed to create directory: %v", err)
}
err = os.Symlink(target, linkPath)
if err != nil {
return fmt.Errorf("failed to create symlink: %v", err)
}
return nil
}
func changeRoot(current string, new string, path string) (string, error) {
if !filepath.IsAbs(path) {
return path, nil
}
relative := path
if current != "" {
r, err := filepath.Rel(current, path)
if err != nil {
return "", err
}
relative = r
}
return filepath.Join(new, relative), nil
}
// Locate returns the link target of the specified filename or an empty slice if the
// specified filename is not a symlink.
func (m command) Locate(filename string) ([]string, error) {
info, err := os.Lstat(filename)
if err != nil {
return nil, fmt.Errorf("failed to get file info: %v", info)
}
if info.Mode()&os.ModeSymlink == 0 {
m.logger.Debugf("%v is not a symlink", filename)
return nil, nil
}
target, err := os.Readlink(filename)
if err != nil {
return nil, fmt.Errorf("error checking symlink: %v", err)
}
m.logger.Debugf("Resolved link: '%v' => '%v'", filename, target)
return []string{target}, nil
}

View File

@@ -17,19 +17,18 @@
package hook
import (
chmod "github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/hook/chmod"
symlinks "github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/hook/create-symlinks"
ldcache "github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/hook/update-ldcache"
"github.com/sirupsen/logrus"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-cdi-hook/commands"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/urfave/cli/v2"
)
type hookCommand struct {
logger *logrus.Logger
logger logger.Interface
}
// NewCommand constructs a hook command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
func NewCommand(logger logger.Interface) *cli.Command {
c := hookCommand{
logger: logger,
}
@@ -44,11 +43,7 @@ func (m hookCommand) build() *cli.Command {
Usage: "A collection of hooks that may be injected into an OCI spec",
}
hook.Subcommands = []*cli.Command{
ldcache.NewCommand(m.logger),
symlinks.NewCommand(m.logger),
chmod.NewCommand(m.logger),
}
hook.Subcommands = commands.New(m.logger)
return &hook
}

View File

@@ -1,129 +0,0 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package ldcache
import (
"fmt"
"os"
"path/filepath"
"syscall"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
)
type command struct {
logger *logrus.Logger
}
type config struct {
folders cli.StringSlice
containerSpec string
}
// NewCommand constructs an update-ldcache command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
c := command{
logger: logger,
}
return c.build()
}
// build the update-ldcache command
func (m command) build() *cli.Command {
cfg := config{}
// Create the 'update-ldcache' command
c := cli.Command{
Name: "update-ldcache",
Usage: "Update ldcache in a container by running ldconfig",
Action: func(c *cli.Context) error {
return m.run(c, &cfg)
},
}
c.Flags = []cli.Flag{
&cli.StringSliceFlag{
Name: "folder",
Usage: "Specifiy a folder to add to /etc/ld.so.conf before updating the ld cache",
Destination: &cfg.folders,
},
&cli.StringFlag{
Name: "container-spec",
Usage: "Specify the path to the OCI container spec. If empty or '-' the spec will be read from STDIN",
Destination: &cfg.containerSpec,
},
}
return &c
}
func (m command) run(c *cli.Context, cfg *config) error {
s, err := oci.LoadContainerState(cfg.containerSpec)
if err != nil {
return fmt.Errorf("failed to load container state: %v", err)
}
containerRoot, err := s.GetContainerRoot()
if err != nil {
return fmt.Errorf("failed to determined container root: %v", err)
}
err = m.createConfig(containerRoot, cfg.folders.Value())
if err != nil {
return fmt.Errorf("failed to update ld.so.conf: %v", err)
}
args := []string{"/sbin/ldconfig"}
if containerRoot != "" {
args = append(args, "-r", containerRoot)
}
return syscall.Exec(args[0], args, nil)
}
// createConfig creates (or updates) /etc/ld.so.conf.d/nvcr-<RANDOM_STRING>.conf in the container
// to include the required paths.
func (m command) createConfig(root string, folders []string) error {
if len(folders) == 0 {
m.logger.Debugf("No folders to add to /etc/ld.so.conf")
return nil
}
configFile, err := os.CreateTemp(filepath.Join(root, "/etc/ld.so.conf.d"), "nvcr-*.conf")
if err != nil {
return fmt.Errorf("failed to create config file: %v", err)
}
defer configFile.Close()
m.logger.Debugf("Adding folders %v to %v", folders, configFile.Name())
configured := make(map[string]bool)
for _, folder := range folders {
if configured[folder] {
continue
}
_, err = configFile.WriteString(fmt.Sprintf("%s\n", folder))
if err != nil {
return fmt.Errorf("failed to update ld.so.conf.d: %v", err)
}
configured[folder] = true
}
return nil
}

View File

@@ -17,16 +17,17 @@
package info
import (
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
)
type command struct {
logger *logrus.Logger
logger logger.Interface
}
// NewCommand constructs an info command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
}
@@ -35,13 +36,13 @@ func NewCommand(logger *logrus.Logger) *cli.Command {
// build
func (m command) build() *cli.Command {
// Create the 'hook' command
hook := cli.Command{
// Create the 'info' command
info := cli.Command{
Name: "info",
Usage: "Provide information about the system",
}
hook.Subcommands = []*cli.Command{}
info.Subcommands = []*cli.Command{}
return &hook
return &info
}

View File

@@ -19,28 +19,33 @@ package main
import (
"os"
"github.com/sirupsen/logrus"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/cdi"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/config"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/hook"
infoCLI "github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/info"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/runtime"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/system"
"github.com/NVIDIA/nvidia-container-toolkit/internal/info"
log "github.com/sirupsen/logrus"
cli "github.com/urfave/cli/v2"
)
var logger = log.New()
// config defines the options that can be set for the CLI through config files,
// options defines the options that can be set for the CLI through config files,
// environment variables, or command line flags
type config struct {
type options struct {
// Debug indicates whether the CLI is started in "debug" mode
Debug bool
// Quiet indicates whether the CLI is started in "quiet" mode
Quiet bool
}
func main() {
// Create a config struct to hold the parsed environment variables or command line flags
config := config{}
logger := logrus.New()
// Create a options struct to hold the parsed environment variables or command line flags
opts := options{}
// Create the top-level CLI
c := cli.NewApp()
@@ -56,16 +61,25 @@ func main() {
Name: "debug",
Aliases: []string{"d"},
Usage: "Enable debug-level logging",
Destination: &config.Debug,
Destination: &opts.Debug,
EnvVars: []string{"NVIDIA_CTK_DEBUG"},
},
&cli.BoolFlag{
Name: "quiet",
Usage: "Suppress all output except for errors; overrides --debug",
Destination: &opts.Quiet,
EnvVars: []string{"NVIDIA_CTK_QUIET"},
},
}
// Set log-level for all subcommands
c.Before = func(c *cli.Context) error {
logLevel := log.InfoLevel
if config.Debug {
logLevel = log.DebugLevel
logLevel := logrus.InfoLevel
if opts.Debug {
logLevel = logrus.DebugLevel
}
if opts.Quiet {
logLevel = logrus.ErrorLevel
}
logger.SetLevel(logLevel)
return nil
@@ -77,12 +91,14 @@ func main() {
runtime.NewCommand(logger),
infoCLI.NewCommand(logger),
cdi.NewCommand(logger),
system.NewCommand(logger),
config.NewCommand(logger),
}
// Run the CLI
err := c.Run(os.Args)
if err != nil {
log.Errorf("%v", err)
log.Exit(1)
logger.Errorf("%v", err)
os.Exit(1)
}
}

View File

@@ -17,31 +17,45 @@
package configure
import (
"encoding/json"
"fmt"
"os"
"path/filepath"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/runtime/nvidia"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config/crio"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config/docker"
"github.com/pelletier/go-toml"
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/engine"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/engine/containerd"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/engine/crio"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/engine/docker"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/ocihook"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/toml"
)
const (
defaultRuntime = "docker"
defaultDockerConfigFilePath = "/etc/docker/daemon.json"
defaultCrioConfigFilePath = "/etc/crio/crio.conf"
// defaultNVIDIARuntimeName is the default name to use in configs for the NVIDIA Container Runtime
defaultNVIDIARuntimeName = "nvidia"
// defaultNVIDIARuntimeExecutable is the default NVIDIA Container Runtime executable file name
defaultNVIDIARuntimeExecutable = "nvidia-container-runtime"
defaultNVIDIARuntimeExpecutablePath = "/usr/bin/nvidia-container-runtime"
defaultNVIDIARuntimeHookExpecutablePath = "/usr/bin/nvidia-container-runtime-hook"
defaultContainerdConfigFilePath = "/etc/containerd/config.toml"
defaultCrioConfigFilePath = "/etc/crio/crio.conf"
defaultDockerConfigFilePath = "/etc/docker/daemon.json"
defaultConfigSource = configSourceFile
configSourceCommand = "command"
configSourceFile = "file"
)
type command struct {
logger *logrus.Logger
logger logger.Interface
}
// NewCommand constructs an configure command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
// NewCommand constructs a configure command with the specified logger
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
}
@@ -54,7 +68,23 @@ type config struct {
dryRun bool
runtime string
configFilePath string
nvidiaOptions nvidia.Options
configSource string
mode string
hookFilePath string
runtimeConfigOverrideJSON string
nvidiaRuntime struct {
name string
path string
hookPath string
setAsDefault bool
}
// cdi-specific options
cdi struct {
enabled bool
}
}
func (m command) build() *cli.Command {
@@ -65,6 +95,9 @@ func (m command) build() *cli.Command {
configure := cli.Command{
Name: "configure",
Usage: "Add a runtime to the specified container engine",
Before: func(c *cli.Context) error {
return m.validateFlags(c, &config)
},
Action: func(c *cli.Context) error {
return m.configureWrapper(c, &config)
},
@@ -78,7 +111,7 @@ func (m command) build() *cli.Command {
},
&cli.StringFlag{
Name: "runtime",
Usage: "the target runtime engine. One of [crio, docker]",
Usage: "the target runtime engine; one of [containerd, crio, docker]",
Value: defaultRuntime,
Destination: &config.runtime,
},
@@ -88,116 +121,235 @@ func (m command) build() *cli.Command {
Destination: &config.configFilePath,
},
&cli.StringFlag{
Name: "nvidia-runtime-name",
Usage: "specify the name of the NVIDIA runtime that will be added",
Value: nvidia.RuntimeName,
Destination: &config.nvidiaOptions.RuntimeName,
Name: "config-mode",
Usage: "the config mode for runtimes that support multiple configuration mechanisms",
Destination: &config.mode,
},
&cli.StringFlag{
Name: "runtime-path",
Name: "config-source",
Usage: "the source to retrieve the container runtime configuration; one of [command, file]\"",
Destination: &config.configSource,
Value: defaultConfigSource,
},
&cli.StringFlag{
Name: "oci-hook-path",
Usage: "the path to the OCI runtime hook to create if --config-mode=oci-hook is specified. If no path is specified, the generated hook is output to STDOUT.\n\tNote: The use of OCI hooks is deprecated.",
Destination: &config.hookFilePath,
},
&cli.StringFlag{
Name: "nvidia-runtime-name",
Usage: "specify the name of the NVIDIA runtime that will be added",
Value: defaultNVIDIARuntimeName,
Destination: &config.nvidiaRuntime.name,
},
&cli.StringFlag{
Name: "nvidia-runtime-path",
Aliases: []string{"runtime-path"},
Usage: "specify the path to the NVIDIA runtime executable",
Value: nvidia.RuntimeExecutable,
Destination: &config.nvidiaOptions.RuntimePath,
Value: defaultNVIDIARuntimeExecutable,
Destination: &config.nvidiaRuntime.path,
},
&cli.StringFlag{
Name: "nvidia-runtime-hook-path",
Usage: "specify the path to the NVIDIA Container Runtime hook executable",
Value: defaultNVIDIARuntimeHookExpecutablePath,
Destination: &config.nvidiaRuntime.hookPath,
},
&cli.BoolFlag{
Name: "set-as-default",
Usage: "set the specified runtime as the default runtime",
Destination: &config.nvidiaOptions.SetAsDefault,
Name: "nvidia-set-as-default",
Aliases: []string{"set-as-default"},
Usage: "set the NVIDIA runtime as the default runtime",
Destination: &config.nvidiaRuntime.setAsDefault,
},
&cli.BoolFlag{
Name: "cdi.enabled",
Aliases: []string{"cdi.enable", "enable-cdi"},
Usage: "Enable CDI in the configured runtime",
Destination: &config.cdi.enabled,
},
}
return &configure
}
func (m command) configureWrapper(c *cli.Context, config *config) error {
func (m command) validateFlags(c *cli.Context, config *config) error {
if config.mode == "oci-hook" {
if !filepath.IsAbs(config.nvidiaRuntime.hookPath) {
return fmt.Errorf("the NVIDIA runtime hook path %q is not an absolute path", config.nvidiaRuntime.hookPath)
}
return nil
}
if config.mode != "" && config.mode != "config-file" {
m.logger.Warningf("Ignoring unsupported config mode for %v: %q", config.runtime, config.mode)
}
config.mode = "config-file"
switch config.runtime {
case "containerd", "crio", "docker":
break
default:
return fmt.Errorf("unrecognized runtime '%v'", config.runtime)
}
switch config.runtime {
case "containerd", "crio":
if config.nvidiaRuntime.path == defaultNVIDIARuntimeExecutable {
config.nvidiaRuntime.path = defaultNVIDIARuntimeExpecutablePath
}
if !filepath.IsAbs(config.nvidiaRuntime.path) {
return fmt.Errorf("the NVIDIA runtime path %q is not an absolute path", config.nvidiaRuntime.path)
}
}
if config.runtime != "containerd" && config.runtime != "docker" {
if config.cdi.enabled {
m.logger.Warningf("Ignoring cdi.enabled flag for %v", config.runtime)
}
config.cdi.enabled = false
}
if config.runtimeConfigOverrideJSON != "" && config.runtime != "containerd" {
m.logger.Warningf("Ignoring runtime-config-override flag for %v", config.runtime)
config.runtimeConfigOverrideJSON = ""
}
switch config.configSource {
case configSourceCommand:
if config.runtime == "docker" {
m.logger.Warningf("A %v Config Source is not supported for %v; using %v", config.configSource, config.runtime, configSourceFile)
config.configSource = configSourceFile
}
case configSourceFile:
break
default:
return fmt.Errorf("unrecognized Config Source: %v", config.configSource)
}
if config.configFilePath == "" {
switch config.runtime {
case "containerd":
config.configFilePath = defaultContainerdConfigFilePath
case "crio":
config.configFilePath = defaultCrioConfigFilePath
case "docker":
config.configFilePath = defaultDockerConfigFilePath
}
}
return nil
}
// configureWrapper updates the specified container engine config to enable the NVIDIA runtime
func (m command) configureWrapper(c *cli.Context, config *config) error {
switch config.mode {
case "oci-hook":
return m.configureOCIHook(c, config)
case "config-file":
return m.configureConfigFile(c, config)
}
return fmt.Errorf("unsupported config-mode: %v", config.mode)
}
// configureConfigFile updates the specified container engine config file to enable the NVIDIA runtime.
func (m command) configureConfigFile(c *cli.Context, config *config) error {
configSource, err := config.resolveConfigSource()
if err != nil {
return err
}
var cfg engine.Interface
switch config.runtime {
case "containerd":
cfg, err = containerd.New(
containerd.WithLogger(m.logger),
containerd.WithPath(config.configFilePath),
containerd.WithConfigSource(configSource),
)
case "crio":
return m.configureCrio(c, config)
cfg, err = crio.New(
crio.WithLogger(m.logger),
crio.WithPath(config.configFilePath),
crio.WithConfigSource(configSource),
)
case "docker":
return m.configureDocker(c, config)
cfg, err = docker.New(
docker.WithLogger(m.logger),
docker.WithPath(config.configFilePath),
)
default:
err = fmt.Errorf("unrecognized runtime '%v'", config.runtime)
}
if err != nil || cfg == nil {
return fmt.Errorf("unable to load config for runtime %v: %v", config.runtime, err)
}
return fmt.Errorf("unrecognized runtime '%v'", config.runtime)
}
// configureDocker updates the docker config to enable the NVIDIA Container Runtime
func (m command) configureDocker(c *cli.Context, config *config) error {
configFilePath := config.configFilePath
if configFilePath == "" {
configFilePath = defaultDockerConfigFilePath
}
cfg, err := docker.LoadConfig(configFilePath)
if err != nil {
return fmt.Errorf("unable to load config: %v", err)
}
err = docker.UpdateConfig(
cfg,
config.nvidiaOptions.RuntimeName,
config.nvidiaOptions.RuntimePath,
config.nvidiaOptions.SetAsDefault,
err = cfg.AddRuntime(
config.nvidiaRuntime.name,
config.nvidiaRuntime.path,
config.nvidiaRuntime.setAsDefault,
)
if err != nil {
return fmt.Errorf("unable to update config: %v", err)
}
if config.dryRun {
output, err := json.MarshalIndent(cfg, "", " ")
if err != nil {
return fmt.Errorf("unable to convert to JSON: %v", err)
}
os.Stdout.WriteString(fmt.Sprintf("%s\n", output))
return nil
if config.cdi.enabled {
cfg.EnableCDI()
}
err = docker.FlushConfig(cfg, configFilePath)
outputPath := config.getOutputConfigPath()
n, err := cfg.Save(outputPath)
if err != nil {
return fmt.Errorf("unable to flush config: %v", err)
}
m.logger.Infof("Wrote updated config to %v", configFilePath)
m.logger.Infof("It is recommended that the docker daemon be restarted.")
return nil
}
// configureCrio updates the crio config to enable the NVIDIA Container Runtime
func (m command) configureCrio(c *cli.Context, config *config) error {
configFilePath := config.configFilePath
if configFilePath == "" {
configFilePath = defaultCrioConfigFilePath
}
cfg, err := crio.LoadConfig(configFilePath)
if err != nil {
return fmt.Errorf("unable to load config: %v", err)
}
err = crio.UpdateConfig(
cfg,
config.nvidiaOptions.RuntimeName,
config.nvidiaOptions.RuntimePath,
config.nvidiaOptions.SetAsDefault,
)
if err != nil {
return fmt.Errorf("unable to update config: %v", err)
}
if config.dryRun {
output, err := toml.Marshal(cfg)
if err != nil {
return fmt.Errorf("unable to convert to TOML: %v", err)
if outputPath != "" {
if n == 0 {
m.logger.Infof("Removed empty config from %v", outputPath)
} else {
m.logger.Infof("Wrote updated config to %v", outputPath)
}
os.Stdout.WriteString(fmt.Sprintf("%s\n", output))
return nil
m.logger.Infof("It is recommended that %v daemon be restarted.", config.runtime)
}
err = crio.FlushConfig(configFilePath, cfg)
if err != nil {
return fmt.Errorf("unable to flush config: %v", err)
}
m.logger.Infof("Wrote updated config to %v", configFilePath)
m.logger.Infof("It is recommended that the cri-o daemon be restarted.")
return nil
}
// resolveConfigSource returns the default config source or the user provided config source
func (c *config) resolveConfigSource() (toml.Loader, error) {
switch c.configSource {
case configSourceCommand:
return c.getCommandConfigSource(), nil
case configSourceFile:
return toml.FromFile(c.configFilePath), nil
default:
return nil, fmt.Errorf("unrecognized config source: %s", c.configSource)
}
}
// getConfigSourceCommand returns the default cli command to fetch the current runtime config
func (c *config) getCommandConfigSource() toml.Loader {
switch c.runtime {
case "containerd":
return containerd.CommandLineSource("")
case "crio":
return crio.CommandLineSource("")
}
return toml.Empty
}
// getOutputConfigPath returns the configured config path or "" if dry-run is enabled
func (c *config) getOutputConfigPath() string {
if c.dryRun {
return ""
}
return c.configFilePath
}
// configureOCIHook creates and configures the OCI hook for the NVIDIA runtime
func (m *command) configureOCIHook(c *cli.Context, config *config) error {
err := ocihook.CreateHook(config.hookFilePath, config.nvidiaRuntime.hookPath)
if err != nil {
return fmt.Errorf("error creating OCI hook: %v", err)
}
return nil
}

View File

@@ -1,75 +0,0 @@
/*
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
*/
package nvidia
const (
// RuntimeName is the default name to use in configs for the NVIDIA Container Runtime
RuntimeName = "nvidia"
// RuntimeExecutable is the default NVIDIA Container Runtime executable file name
RuntimeExecutable = "nvidia-container-runtime"
)
// Options specifies the options for the NVIDIA Container Runtime w.r.t a container engine such as docker.
type Options struct {
SetAsDefault bool
RuntimeName string
RuntimePath string
}
// Runtime defines an NVIDIA runtime with a name and a executable
type Runtime struct {
Name string
Path string
}
// DefaultRuntime returns the default runtime for the configured options.
// If the configuration is invalid or the default runtimes should not be set
// the empty string is returned.
func (o Options) DefaultRuntime() string {
if !o.SetAsDefault {
return ""
}
return o.RuntimeName
}
// Runtime creates a runtime struct based on the options.
func (o Options) Runtime() Runtime {
path := o.RuntimePath
if o.RuntimePath == "" {
path = RuntimeExecutable
}
r := Runtime{
Name: o.RuntimeName,
Path: path,
}
return r
}
// DockerRuntimesConfig generatest the expected docker config for the specified runtime
func (r Runtime) DockerRuntimesConfig() map[string]interface{} {
runtimes := make(map[string]interface{})
runtimes[r.Name] = map[string]interface{}{
"path": r.Path,
"args": []string{},
}
return runtimes
}

View File

@@ -17,17 +17,18 @@
package runtime
import (
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/runtime/configure"
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/runtime/configure"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
)
type runtimeCommand struct {
logger *logrus.Logger
logger logger.Interface
}
// NewCommand constructs a runtime command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
func NewCommand(logger logger.Interface) *cli.Command {
c := runtimeCommand{
logger: logger,
}

View File

@@ -0,0 +1,188 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package devchar
import (
"fmt"
"path/filepath"
"github.com/NVIDIA/go-nvlib/pkg/nvpci"
"github.com/NVIDIA/nvidia-container-toolkit/internal/info/proc/devices"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/internal/nvcaps"
)
type allPossible struct {
logger logger.Interface
devRoot string
deviceMajors devices.Devices
migCaps nvcaps.MigCaps
}
// newAllPossible returns a new allPossible device node lister.
// This lister lists all possible device nodes for NVIDIA GPUs, control devices, and capability devices.
func newAllPossible(logger logger.Interface, devRoot string) (nodeLister, error) {
deviceMajors, err := devices.GetNVIDIADevices()
if err != nil {
return nil, fmt.Errorf("failed reading device majors: %v", err)
}
var requiredMajors []devices.Name
migCaps, err := nvcaps.NewMigCaps()
if err != nil {
return nil, fmt.Errorf("failed to read MIG caps: %v", err)
}
if migCaps == nil {
migCaps = make(nvcaps.MigCaps)
} else {
requiredMajors = append(requiredMajors, devices.NVIDIACaps)
}
requiredMajors = append(requiredMajors, devices.NVIDIAGPU, devices.NVIDIAUVM)
for _, name := range requiredMajors {
if !deviceMajors.Exists(name) {
return nil, fmt.Errorf("missing required device major %s", name)
}
}
l := allPossible{
logger: logger,
devRoot: devRoot,
deviceMajors: deviceMajors,
migCaps: migCaps,
}
return l, nil
}
// DeviceNodes returns a list of all possible device nodes for NVIDIA GPUs, control devices, and capability devices.
func (m allPossible) DeviceNodes() ([]deviceNode, error) {
gpus, err := nvpci.New(
nvpci.WithPCIDevicesRoot(filepath.Join(m.devRoot, nvpci.PCIDevicesRoot)),
nvpci.WithLogger(m.logger),
).GetGPUs()
if err != nil {
return nil, fmt.Errorf("failed to get GPU information: %v", err)
}
count := len(gpus)
if count == 0 {
m.logger.Infof("No NVIDIA devices found in %s", m.devRoot)
return nil, nil
}
deviceNodes, err := m.getControlDeviceNodes()
if err != nil {
return nil, fmt.Errorf("failed to get control device nodes: %v", err)
}
for gpu := 0; gpu < count; gpu++ {
deviceNodes = append(deviceNodes, m.getGPUDeviceNodes(gpu)...)
deviceNodes = append(deviceNodes, m.getNVCapDeviceNodes(gpu)...)
}
return deviceNodes, nil
}
// getControlDeviceNodes generates a list of control devices
func (m allPossible) getControlDeviceNodes() ([]deviceNode, error) {
var deviceNodes []deviceNode
// Define the control devices for standard GPUs.
controlDevices := []deviceNode{
m.newDeviceNode(devices.NVIDIAGPU, "/dev/nvidia-modeset", devices.NVIDIAModesetMinor),
m.newDeviceNode(devices.NVIDIAGPU, "/dev/nvidiactl", devices.NVIDIACTLMinor),
m.newDeviceNode(devices.NVIDIAUVM, "/dev/nvidia-uvm", devices.NVIDIAUVMMinor),
m.newDeviceNode(devices.NVIDIAUVM, "/dev/nvidia-uvm-tools", devices.NVIDIAUVMToolsMinor),
}
deviceNodes = append(deviceNodes, controlDevices...)
for _, migControlDevice := range []nvcaps.MigCap{"config", "monitor"} {
migControlMinor, exist := m.migCaps[migControlDevice]
if !exist {
continue
}
d := m.newDeviceNode(
devices.NVIDIACaps,
migControlMinor.DevicePath(),
int(migControlMinor),
)
deviceNodes = append(deviceNodes, d)
}
return deviceNodes, nil
}
// getGPUDeviceNodes generates a list of device nodes for a given GPU.
func (m allPossible) getGPUDeviceNodes(gpu int) []deviceNode {
d := m.newDeviceNode(
devices.NVIDIAGPU,
fmt.Sprintf("/dev/nvidia%d", gpu),
gpu,
)
return []deviceNode{d}
}
// getNVCapDeviceNodes generates a list of cap device nodes for a given GPU.
func (m allPossible) getNVCapDeviceNodes(gpu int) []deviceNode {
var selectedCapMinors []nvcaps.MigMinor
for gi := 0; ; gi++ {
giCap := nvcaps.NewGPUInstanceCap(gpu, gi)
giMinor, exist := m.migCaps[giCap]
if !exist {
break
}
selectedCapMinors = append(selectedCapMinors, giMinor)
for ci := 0; ; ci++ {
ciCap := nvcaps.NewComputeInstanceCap(gpu, gi, ci)
ciMinor, exist := m.migCaps[ciCap]
if !exist {
break
}
selectedCapMinors = append(selectedCapMinors, ciMinor)
}
}
var deviceNodes []deviceNode
for _, capMinor := range selectedCapMinors {
d := m.newDeviceNode(
devices.NVIDIACaps,
capMinor.DevicePath(),
int(capMinor),
)
deviceNodes = append(deviceNodes, d)
}
return deviceNodes
}
// newDeviceNode creates a new device node with the specified path and major/minor numbers.
// The path is adjusted for the specified driver root.
func (m allPossible) newDeviceNode(deviceName devices.Name, path string, minor int) deviceNode {
major, _ := m.deviceMajors.Get(deviceName)
return deviceNode{
path: filepath.Join(m.devRoot, path),
major: uint32(major),
minor: uint32(minor),
}
}

View File

@@ -0,0 +1,340 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package devchar
import (
"fmt"
"os"
"path/filepath"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/internal/system/nvdevices"
"github.com/NVIDIA/nvidia-container-toolkit/internal/system/nvmodules"
)
const (
defaultDevCharPath = "/dev/char"
)
type command struct {
logger logger.Interface
}
type config struct {
devCharPath string
driverRoot string
dryRun bool
createAll bool
createDeviceNodes bool
loadKernelModules bool
}
// NewCommand constructs a command sub-command with the specified logger
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
}
return c.build()
}
// build
func (m command) build() *cli.Command {
cfg := config{}
// Create the 'create-dev-char-symlinks' command
c := cli.Command{
Name: "create-dev-char-symlinks",
Usage: "A utility to create symlinks to possible /dev/nv* devices in /dev/char",
Before: func(c *cli.Context) error {
return m.validateFlags(c, &cfg)
},
Action: func(c *cli.Context) error {
return m.run(c, &cfg)
},
}
c.Flags = []cli.Flag{
&cli.StringFlag{
Name: "dev-char-path",
Usage: "The path at which the symlinks will be created. Symlinks will be created as `DEV_CHAR`/MAJOR:MINOR where MAJOR and MINOR are the major and minor numbers of a corresponding device node.",
Value: defaultDevCharPath,
Destination: &cfg.devCharPath,
EnvVars: []string{"DEV_CHAR_PATH"},
},
&cli.StringFlag{
Name: "driver-root",
Usage: "The path to the driver root. `DRIVER_ROOT`/dev is searched for NVIDIA device nodes.",
Value: "/",
Destination: &cfg.driverRoot,
EnvVars: []string{"NVIDIA_DRIVER_ROOT", "DRIVER_ROOT"},
},
&cli.BoolFlag{
Name: "create-all",
Usage: "Create all possible /dev/char symlinks instead of limiting these to existing device nodes.",
Destination: &cfg.createAll,
EnvVars: []string{"CREATE_ALL"},
},
&cli.BoolFlag{
Name: "load-kernel-modules",
Usage: "Load the NVIDIA kernel modules before creating symlinks. This is only applicable when --create-all is set.",
Destination: &cfg.loadKernelModules,
EnvVars: []string{"LOAD_KERNEL_MODULES"},
},
&cli.BoolFlag{
Name: "create-device-nodes",
Usage: "Create the NVIDIA control device nodes in the driver root if they do not exist. This is only applicable when --create-all is set",
Destination: &cfg.createDeviceNodes,
EnvVars: []string{"CREATE_DEVICE_NODES"},
},
&cli.BoolFlag{
Name: "dry-run",
Usage: "If set, the command will not create any symlinks.",
Value: false,
Destination: &cfg.dryRun,
EnvVars: []string{"DRY_RUN"},
},
}
return &c
}
func (m command) validateFlags(r *cli.Context, cfg *config) error {
if cfg.createAll {
return fmt.Errorf("create-all and watch are mutually exclusive")
}
if cfg.loadKernelModules && !cfg.createAll {
m.logger.Warning("load-kernel-modules is only applicable when create-all is set; ignoring")
cfg.loadKernelModules = false
}
if cfg.createDeviceNodes && !cfg.createAll {
m.logger.Warning("create-device-nodes is only applicable when create-all is set; ignoring")
cfg.createDeviceNodes = false
}
return nil
}
func (m command) run(c *cli.Context, cfg *config) error {
l, err := NewSymlinkCreator(
WithLogger(m.logger),
WithDevCharPath(cfg.devCharPath),
WithDriverRoot(cfg.driverRoot),
WithDryRun(cfg.dryRun),
WithCreateAll(cfg.createAll),
WithLoadKernelModules(cfg.loadKernelModules),
WithCreateDeviceNodes(cfg.createDeviceNodes),
)
if err != nil {
return fmt.Errorf("failed to create symlink creator: %v", err)
}
err = l.CreateLinks()
if err != nil {
return fmt.Errorf("failed to create links: %v", err)
}
return nil
}
type linkCreator struct {
logger logger.Interface
lister nodeLister
driverRoot string
devRoot string
devCharPath string
dryRun bool
createAll bool
createDeviceNodes bool
loadKernelModules bool
}
// Creator is an interface for creating symlinks to /dev/nv* devices in /dev/char.
type Creator interface {
CreateLinks() error
}
// Option is a functional option for configuring the linkCreator.
type Option func(*linkCreator)
// NewSymlinkCreator creates a new linkCreator.
func NewSymlinkCreator(opts ...Option) (Creator, error) {
c := linkCreator{}
for _, opt := range opts {
opt(&c)
}
if c.logger == nil {
c.logger = logger.New()
}
if c.driverRoot == "" {
c.driverRoot = "/"
}
if c.devRoot == "" {
c.devRoot = "/"
}
if c.devCharPath == "" {
c.devCharPath = defaultDevCharPath
}
if err := c.setup(); err != nil {
return nil, err
}
if c.createAll {
lister, err := newAllPossible(c.logger, c.devRoot)
if err != nil {
return nil, fmt.Errorf("failed to create all possible device lister: %v", err)
}
c.lister = lister
} else {
c.lister = existing{c.logger, c.devRoot}
}
return c, nil
}
func (m linkCreator) setup() error {
if !m.loadKernelModules && !m.createDeviceNodes {
return nil
}
if m.loadKernelModules {
modules := nvmodules.New(
nvmodules.WithLogger(m.logger),
nvmodules.WithDryRun(m.dryRun),
nvmodules.WithRoot(m.driverRoot),
)
if err := modules.LoadAll(); err != nil {
return fmt.Errorf("failed to load NVIDIA kernel modules: %v", err)
}
}
if m.createDeviceNodes {
devices, err := nvdevices.New(
nvdevices.WithLogger(m.logger),
nvdevices.WithDryRun(m.dryRun),
nvdevices.WithDevRoot(m.devRoot),
)
if err != nil {
return err
}
if err := devices.CreateNVIDIAControlDevices(); err != nil {
return fmt.Errorf("failed to create NVIDIA device nodes: %v", err)
}
}
return nil
}
// WithDriverRoot sets the driver root path.
// This is the path in which kernel modules must be loaded.
func WithDriverRoot(root string) Option {
return func(c *linkCreator) {
c.driverRoot = root
}
}
// WithDevRoot sets the root path for the /dev directory.
func WithDevRoot(root string) Option {
return func(c *linkCreator) {
c.devRoot = root
}
}
// WithDevCharPath sets the path at which the symlinks will be created.
func WithDevCharPath(path string) Option {
return func(c *linkCreator) {
c.devCharPath = path
}
}
// WithDryRun sets the dry run flag.
func WithDryRun(dryRun bool) Option {
return func(c *linkCreator) {
c.dryRun = dryRun
}
}
// WithLogger sets the logger.
func WithLogger(logger logger.Interface) Option {
return func(c *linkCreator) {
c.logger = logger
}
}
// WithCreateAll sets the createAll flag for the linkCreator.
func WithCreateAll(createAll bool) Option {
return func(lc *linkCreator) {
lc.createAll = createAll
}
}
// WithLoadKernelModules sets the loadKernelModules flag for the linkCreator.
func WithLoadKernelModules(loadKernelModules bool) Option {
return func(lc *linkCreator) {
lc.loadKernelModules = loadKernelModules
}
}
// WithCreateDeviceNodes sets the createDeviceNodes flag for the linkCreator.
func WithCreateDeviceNodes(createDeviceNodes bool) Option {
return func(lc *linkCreator) {
lc.createDeviceNodes = createDeviceNodes
}
}
// CreateLinks creates symlinks for all NVIDIA device nodes found in the driver root.
func (m linkCreator) CreateLinks() error {
deviceNodes, err := m.lister.DeviceNodes()
if err != nil {
return fmt.Errorf("failed to get device nodes: %v", err)
}
if len(deviceNodes) != 0 && !m.dryRun {
err := os.MkdirAll(m.devCharPath, 0755)
if err != nil {
return fmt.Errorf("failed to create directory %s: %v", m.devCharPath, err)
}
}
for _, deviceNode := range deviceNodes {
target := deviceNode.path
linkPath := filepath.Join(m.devCharPath, deviceNode.devCharName())
m.logger.Infof("Creating link %s => %s", linkPath, target)
if m.dryRun {
continue
}
err = os.Symlink(target, linkPath)
if err != nil {
m.logger.Warningf("Could not create symlink: %v", err)
}
}
return nil
}
type deviceNode struct {
path string
major uint32
minor uint32
}
func (d deviceNode) devCharName() string {
return fmt.Sprintf("%d:%d", d.major, d.minor)
}

View File

@@ -0,0 +1,89 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package devchar
import (
"path/filepath"
"strings"
"golang.org/x/sys/unix"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup"
)
type nodeLister interface {
DeviceNodes() ([]deviceNode, error)
}
type existing struct {
logger logger.Interface
devRoot string
}
// DeviceNodes returns a list of NVIDIA device nodes in the specified root.
// The nvidia-nvswitch* and nvidia-nvlink devices are excluded.
func (m existing) DeviceNodes() ([]deviceNode, error) {
locator := lookup.NewCharDeviceLocator(
lookup.WithLogger(m.logger),
lookup.WithRoot(m.devRoot),
lookup.WithOptional(true),
)
devices, err := locator.Locate("/dev/nvidia*")
if err != nil {
m.logger.Warningf("Error while locating device: %v", err)
}
capDevices, err := locator.Locate("/dev/nvidia-caps/nvidia-*")
if err != nil {
m.logger.Warningf("Error while locating caps device: %v", err)
}
if len(devices) == 0 && len(capDevices) == 0 {
m.logger.Infof("No NVIDIA devices found in %s", m.devRoot)
return nil, nil
}
var deviceNodes []deviceNode
for _, d := range append(devices, capDevices...) {
if m.nodeIsBlocked(d) {
continue
}
var stat unix.Stat_t
err := unix.Stat(d, &stat)
if err != nil {
m.logger.Warningf("Could not stat device: %v", err)
continue
}
deviceNodes = append(deviceNodes, newDeviceNode(d, stat))
}
return deviceNodes, nil
}
// nodeIsBlocked returns true if the specified device node should be ignored.
func (m existing) nodeIsBlocked(path string) bool {
blockedPrefixes := []string{"nvidia-fs", "nvidia-nvswitch", "nvidia-nvlink"}
nodeName := filepath.Base(path)
for _, prefix := range blockedPrefixes {
if strings.HasPrefix(nodeName, prefix) {
return true
}
}
return false
}

View File

@@ -0,0 +1,28 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package devchar
import "golang.org/x/sys/unix"
func newDeviceNode(d string, stat unix.Stat_t) deviceNode {
deviceNode := deviceNode{
path: d,
major: unix.Major(stat.Rdev),
minor: unix.Minor(stat.Rdev),
}
return deviceNode
}

View File

@@ -0,0 +1,30 @@
//go:build !linux
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package devchar
import "golang.org/x/sys/unix"
func newDeviceNode(d string, stat unix.Stat_t) deviceNode {
deviceNode := deviceNode{
path: d,
major: unix.Major(uint64(stat.Rdev)),
minor: unix.Minor(uint64(stat.Rdev)),
}
return deviceNode
}

View File

@@ -0,0 +1,142 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package createdevicenodes
import (
"fmt"
"github.com/urfave/cli/v2"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
"github.com/NVIDIA/nvidia-container-toolkit/internal/system/nvdevices"
"github.com/NVIDIA/nvidia-container-toolkit/internal/system/nvmodules"
)
type command struct {
logger logger.Interface
}
type options struct {
root string
devRoot string
dryRun bool
control bool
loadKernelModules bool
}
// NewCommand constructs a command sub-command with the specified logger
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
}
return c.build()
}
// build
func (m command) build() *cli.Command {
opts := options{}
c := cli.Command{
Name: "create-device-nodes",
Usage: "A utility to create NVIDIA device nodes",
Before: func(c *cli.Context) error {
return m.validateFlags(c, &opts)
},
Action: func(c *cli.Context) error {
return m.run(c, &opts)
},
}
c.Flags = []cli.Flag{
&cli.StringFlag{
Name: "root",
// TODO: Remove this alias
Aliases: []string{"driver-root"},
Usage: "the path to to the root to use to load the kernel modules. This root must be a chrootable path. " +
"If device nodes to be created these will be created at `ROOT`/dev unless an alternative path is specified",
Value: "/",
Destination: &opts.root,
// TODO: Remove the NVIDIA_DRIVER_ROOT and DRIVER_ROOT envvars.
EnvVars: []string{"ROOT", "NVIDIA_DRIVER_ROOT", "DRIVER_ROOT"},
},
&cli.StringFlag{
Name: "dev-root",
Usage: "specify the root where `/dev` is located. If this is not specified, the root is assumed.",
Destination: &opts.devRoot,
EnvVars: []string{"NVIDIA_DEV_ROOT", "DEV_ROOT"},
},
&cli.BoolFlag{
Name: "control-devices",
Usage: "create all control device nodes: nvidiactl, nvidia-modeset, nvidia-uvm, nvidia-uvm-tools",
Destination: &opts.control,
},
&cli.BoolFlag{
Name: "load-kernel-modules",
Usage: "load the NVIDIA Kernel Modules before creating devices nodes",
Destination: &opts.loadKernelModules,
},
&cli.BoolFlag{
Name: "dry-run",
Usage: "if set, the command will not perform any operations",
Value: false,
Destination: &opts.dryRun,
EnvVars: []string{"DRY_RUN"},
},
}
return &c
}
func (m command) validateFlags(r *cli.Context, opts *options) error {
if opts.devRoot == "" && opts.root != "" {
m.logger.Infof("Using dev-root %q", opts.root)
opts.devRoot = opts.root
}
return nil
}
func (m command) run(c *cli.Context, opts *options) error {
if opts.loadKernelModules {
modules := nvmodules.New(
nvmodules.WithLogger(m.logger),
nvmodules.WithDryRun(opts.dryRun),
nvmodules.WithRoot(opts.root),
)
if err := modules.LoadAll(); err != nil {
return fmt.Errorf("failed to load NVIDIA kernel modules: %v", err)
}
}
if opts.control {
devices, err := nvdevices.New(
nvdevices.WithLogger(m.logger),
nvdevices.WithDryRun(opts.dryRun),
nvdevices.WithDevRoot(opts.devRoot),
)
if err != nil {
return err
}
m.logger.Infof("Creating control device nodes at %s", opts.devRoot)
if err := devices.CreateNVIDIAControlDevices(); err != nil {
return fmt.Errorf("failed to create NVIDIA control device nodes: %v", err)
}
}
return nil
}

View File

@@ -0,0 +1,52 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package system
import (
"github.com/urfave/cli/v2"
devchar "github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/system/create-dev-char-symlinks"
devicenodes "github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/system/create-device-nodes"
"github.com/NVIDIA/nvidia-container-toolkit/internal/logger"
)
type command struct {
logger logger.Interface
}
// NewCommand constructs a runtime command with the specified logger
func NewCommand(logger logger.Interface) *cli.Command {
c := command{
logger: logger,
}
return c.build()
}
func (m command) build() *cli.Command {
// Create the 'system' command
system := cli.Command{
Name: "system",
Usage: "A collection of system-related utilities for the NVIDIA Container Toolkit",
}
system.Subcommands = []*cli.Command{
devchar.NewCommand(m.logger),
devicenodes.NewCommand(m.logger),
}
return &system
}

View File

@@ -1,32 +0,0 @@
disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
#accept-nvidia-visible-devices-as-volume-mounts = false
[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
#debug = "/var/log/nvidia-container-toolkit.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
#no-cgroups = false
#user = "root:video"
ldconfig = "@/sbin/ldconfig"
[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
log-level = "info"
# Specify the runtimes to consider. This list is processed in order and the PATH
# searched for matching executables unless the entry is an absolute path.
runtimes = [
"docker-runc",
"runc",
]
mode = "auto"
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"

View File

@@ -1,32 +0,0 @@
disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
#accept-nvidia-visible-devices-as-volume-mounts = false
[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
#debug = "/var/log/nvidia-container-toolkit.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
#no-cgroups = false
user = "root:video"
ldconfig = "@/sbin/ldconfig"
[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
log-level = "info"
# Specify the runtimes to consider. This list is processed in order and the PATH
# searched for matching executables unless the entry is an absolute path.
runtimes = [
"docker-runc",
"runc",
]
mode = "auto"
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"

Some files were not shown because too many files have changed in this diff Show More