Compare commits

...

377 Commits

Author SHA1 Message Date
Evan Lezar
95ca1c2e50 Merge branch 'fix-git-branch' into 'main'
Fix git branch shell command

See merge request nvidia/container-toolkit/container-toolkit!277
2023-01-31 14:08:35 +00:00
Evan Lezar
e4031ced39 Fix git branch shell command
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-31 15:08:12 +01:00
Evan Lezar
7f6d21c53b Merge branch 'fix-kitmaker' into 'main'
Fix GIT_BRANCH command

See merge request nvidia/container-toolkit/container-toolkit!276
2023-01-31 13:16:10 +00:00
Evan Lezar
846ac347fe Fix GIT_BRANCH command
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-31 14:15:48 +01:00
Evan Lezar
50afd443fc Merge branch 'fix-libraries-cdi' into 'main'
Fix relative link resolution for ldcache

See merge request nvidia/container-toolkit/container-toolkit!275
2023-01-31 13:07:35 +00:00
Evan Lezar
14bcebd8b7 Fix relative link resolution for ldcache
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-31 13:51:48 +01:00
Evan Lezar
d091d3c7f4 Merge branch 'fix-kitmaker' into 'main'
Ensure git is available in image build step

See merge request nvidia/container-toolkit/container-toolkit!274
2023-01-31 11:20:40 +00:00
Evan Lezar
eb0ef8ab31 Ensure git is available in image build step
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-31 12:20:11 +01:00
Evan Lezar
9c5c12a1bc Merge branch 'kitmaker-update' into 'main'
Update the process for publishing packages to kitmaker

See merge request nvidia/container-toolkit/container-toolkit!271
2023-01-30 18:37:42 +00:00
Evan Lezar
8b197b27ed Rework the upload of archives to kitmaker.
This change simplifies how Kitmaker archives are constructed.

Currently only centos8 and ubuntu18.04 packages are included.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-30 14:52:16 +01:00
Evan Lezar
8c57e55b59 Add additional information to the manifest
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-30 14:52:16 +01:00
Evan Lezar
6d1639a513 Merge branch 'set-default-to-index' into 'main'
Use device index as CDI device names by default

See merge request nvidia/container-toolkit/container-toolkit!273
2023-01-30 13:22:36 +00:00
Evan Lezar
5e6f72e8f4 Update changelog
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-30 13:40:25 +01:00
Evan Lezar
707e3479f8 Fix lint errors
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-30 13:39:57 +01:00
Evan Lezar
201232dae3 Add logging of minimum CDI version
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-30 13:39:08 +01:00
Evan Lezar
f768bb5783 Use device index as CDI device names by default
This change uses the `index` mode for the --device-name-strategy when
generating CDI specifications by default. This generates device names
such as nvidia.com/gpu=0 or nvidia.com/gpu=1:0 by default.

Note that this requires a CDI spec version of 0.5.0 and for consumers
(e.g. podman) that are only compatible with older versions one of the
other stragegies (`type-index` or `uuid`) should be used instead to
generate a v0.3.0 or v0.4.0 specification.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-30 13:36:17 +01:00
Evan Lezar
f0de3ccd9c Merge branch 'CNT-3718/allow-device-name-to-be-controlled' into 'main'
Add --device-name-strategy flag for CDI spec generation

See merge request nvidia/container-toolkit/container-toolkit!269
2023-01-30 12:28:38 +00:00
Evan Lezar
09e8d4c4f3 Merge branch 'move-dev-char-creation' into 'main'
Move `create-dev-char-symlinks` from `nvidia-ctk hook` to `nvidia-ctk system`

See merge request nvidia/container-toolkit/container-toolkit!272
2023-01-30 12:10:17 +00:00
Evan Lezar
8188400c97 Move create-dev-char-symlinks subcommand from hook to system
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-27 12:12:54 +01:00
Evan Lezar
962d38e9dd Add nvidia-ctk system subcommand
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-27 12:12:54 +01:00
Kevin Klues
9fc2c59122 Merge branch 'CNT-3845/add-dev-char-symlink' into 'main'
Add create-dev-char-symlinks hook

See merge request nvidia/container-toolkit/container-toolkit!267
2023-01-27 10:11:29 +00:00
Evan Lezar
540f4349f5 Update vendoring for nvpci
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-25 13:43:43 +01:00
Evan Lezar
1d7e419008 Add --create-all mode to creation of dev/char symlinks
This change adds a --create-all mode to the create-dev-char-symlinks hook.
This mode creates all POSSIBLE symlinks to device nodes for regular and cap
devices. With the number of GPUs inferred from the PCI device information.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-25 13:43:43 +01:00
Evan Lezar
95394e0fc8 Add internal/info/proc/devices package to read device majors
This change adds basic functionality to process the /proc/devices
file to extract device majors.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-25 13:43:43 +01:00
Evan Lezar
f9330a4c2c Add --watch option to create-dev-char-symlinks
This change adds a --watch option to the create-dev-char-symlinks hook. This
installs an fsnotify watcher that creates symlinks for ADDED device nodes under
/dev/char.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-25 13:43:43 +01:00
Evan Lezar
be0e4667a5 Add create-dev-char-symlinks hook
This change adds an nvidia-ctk hook create-dev-char-symlinks
subcommand that creates symlinks to device nodes (as required by
systemd) under /dev/char.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-25 13:43:43 +01:00
Evan Lezar
408eeae70f Allow locator to be marked as optional
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-25 10:38:11 +01:00
Evan Lezar
27c82c19ea Merge branch 'bump-version' into 'main'
Bump version to 1.12.0-rc.4

See merge request nvidia/container-toolkit/container-toolkit!270
2023-01-25 09:37:41 +00:00
Evan Lezar
937f3d0d78 Add changelog for v1.12.0-rc.3
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-25 10:37:24 +01:00
Evan Lezar
bc3cc71f90 Update libnvidia-container to 1.12.0-rc.4
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-25 10:23:54 +01:00
Evan Lezar
ad4531db1e Bump version to 1.12.0-rc.4
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-25 10:23:02 +01:00
Evan Lezar
e5d8d10d4f Merge branch 'CNT-3854/discover-first' into 'main'
Limit number of candidates for executables

See merge request nvidia/container-toolkit/container-toolkit!268
2023-01-23 17:43:27 +00:00
Evan Lezar
89bf81a9db Add --device-name-strategy flag for CDI spec generation
This change adds a --device-name-strategy flag for generating a CDI
specificaion. This allows a CDI spec to be generated with the following
names used for device:

* type-index: gpu0 and mig0:1
* index: 0 and 0:1
* uuid: GPU and MIG UUIDs

Note that the use of 'index' generates a v0.5.0 CDI specification since
this relaxes the restriction on the device names.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-20 16:17:32 +01:00
Evan Lezar
6237477ba3 Limit number of candidates for executables
This change ensures that the first match of an executable in the path
is retured instead of a list of candidates. This prevents a CDI spec,
for example, from containing multiple entries for a single executable
(e.g. nvidia-smi).

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-20 15:10:24 +01:00
Evan Lezar
6706024687 Merge branch 'fix-missing-nvidia-container-runtime-hook-1682' into 'main'
Avoid missing nvidia-container-runtime-hook during rpm update from <=1.10.0

See merge request nvidia/container-toolkit/container-toolkit!263
2023-01-19 16:32:58 +00:00
Evan Lezar
7649126248 Remove rpm-state directory instead of just single file. 2023-01-19 14:50:35 +00:00
Evan Lezar
104dca867f Merge branch 'fix-ldcache-list' into 'main'
Fix and refactor code related to reading LDCache

See merge request nvidia/container-toolkit/container-toolkit!266
2023-01-19 13:52:55 +00:00
Evan Lezar
881b1c0e08 introduce resolveSelected helper
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-19 14:10:55 +01:00
Evan Lezar
3537d76726 Further refactoring of ldcache code
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-19 14:10:36 +01:00
Evan Lezar
ccd1961c60 Ensure root is included in absolute ldcache paths
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-19 14:09:43 +01:00
Evan Lezar
f350f0c0bb Refactor resolving of links in ldcache
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-19 14:09:41 +01:00
Evan Lezar
80672d33af Continue instead of break on error when listing libraries
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-19 13:54:24 +01:00
Evan Lezar
7a1cfb48b9 Merge branch 'update-cdi' into 'main'
Determine the minumum required spec version

See merge request nvidia/container-toolkit/container-toolkit!265
2023-01-19 11:57:19 +00:00
Evan Lezar
ae3b213b0e Merge branch 'fix-cdi-library-container-path' into 'main'
Reuse mount discovery for driver libraries

See merge request nvidia/container-toolkit/container-toolkit!262
2023-01-19 11:52:21 +00:00
Evan Lezar
eaf9bdaeb4 Determine the minumum required spec version
This change uses functionality from the CDI package to determine
the minimum required CDI spec version. This allows for a spec with
the widest compatibility to be specified.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-19 12:14:00 +01:00
Evan Lezar
bc4bfb94a2 Update CDI package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-19 12:12:54 +01:00
Evan Lezar
a77331f8f0 Reuse mount discovery for driver libraries
This change implements the discovery of versioned driver libaries
by reusing the mounts and update ldcache discoverers use for, for example,
CVS file discovery. This allows the container paths to be correctly generated
without requiring specific manipulation.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-19 12:11:13 +01:00
Evan Lezar
94b7add334 Merge branch 'bugfix-nvidia-ctk' into 'main'
Fix nvidia-ctk path in spec generation

See merge request nvidia/container-toolkit/container-toolkit!264
2023-01-19 11:10:38 +00:00
Evan Lezar
9c9e6cd324 Fix nvidia-ctk path in spec generation
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-19 12:03:07 +01:00
Evan Lezar
f50efca73f Merge branch 'specify-nvidia-ctk-path' into 'main'
Make handling of nvidia-ctk path consistent

See merge request nvidia/container-toolkit/container-toolkit!261
2023-01-19 10:20:25 +00:00
Evan Lezar
19cfb2774d Use common code to construct nvidia-ctk hooks
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-19 10:37:10 +01:00
Evan Lezar
27347c98d9 Consolidate code to find nvidia-ctk
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-19 10:31:42 +01:00
Evan Lezar
ebbc47702d Remove 'Executable' from private struct member names
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-18 17:02:42 +01:00
Evan Lezar
09d42f0ad9 Remove 'Executable' from config struct member
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-18 17:02:42 +01:00
Evan Lezar
35df24d63a Make handling of nvidia-ctk path consistent
This change adds an --nvidia-ctk-path to the nvidia-ctk cdi generate
command. This ensures that the executable path for the generated
hooks can be specified consistently.

Since the NVIDIA Container Runtime already allows for the executable
path to be specified in the config the utility code to update the
LDCache and create other nvidia-ctk hooks are also updated.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-01-18 17:02:42 +01:00
Claudius Volz
f93b6a13f4 Preserve a temporary copy of nvidia-container-runtime-hook during rpm update,
to avoid being destroyed by when updating from <=1.10.0

Signed-off-by: Claudius Volz <c.volz@gmx.de>
2023-01-15 01:22:37 +01:00
Evan Lezar
50d7fb8f41 Merge branch 'missing-dra-devices' into 'main'
Ensure existence of DRM devices nodes is checked

See merge request nvidia/container-toolkit/container-toolkit!260
2022-12-13 12:42:04 +00:00
Evan Lezar
311e7a1feb Ensure existence of DRM devices nodes is checked
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-12-12 14:48:54 +01:00
Evan Lezar
14e587d55f Merge branch 'update-libnvidia-container' into 'main'
Update libnvidia-container to v1.12.0-rc.3

See merge request nvidia/container-toolkit/container-toolkit!259
2022-12-09 09:38:27 +00:00
Evan Lezar
66ec967de2 Update libnvidia-container to 1.12.0-rc.3
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-12-09 10:20:36 +01:00
Evan Lezar
252693aeac Use SHA for ineffassign
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-12-09 09:58:49 +01:00
Evan Lezar
079b47ed94 Use sha instead of latest for golint
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-12-09 09:51:22 +01:00
Evan Lezar
d2952b07aa Merge branch 'fix-from-discover' into 'main'
Ensure that an empty discoverer returns valid edits

See merge request nvidia/container-toolkit/container-toolkit!258
2022-12-09 08:37:26 +00:00
Evan Lezar
41f1b93422 Use NewContainerEdits utility function for CDI generation
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-12-07 11:09:19 +01:00
Evan Lezar
3140810c95 Add NewContainerEdits utility function
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-12-07 11:03:45 +01:00
Evan Lezar
046d761f4c Ensure that an empty discoverer returns valid edits
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-12-06 14:01:35 +01:00
Evan Lezar
0a2083df72 Merge branch 'CNT-3707/add-root-flag' into 'main'
Add --root flag to nvidia-ctk cdi generate command

See merge request nvidia/container-toolkit/container-toolkit!256
2022-12-02 15:54:27 +00:00
Evan Lezar
80c810bf9e Add --root flag to CDI generate command
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-12-02 16:13:53 +01:00
Evan Lezar
82ba424212 Simplify device folder permission hook
This simplifies the device folder permission hook to only handle
/dev/dri and /dev/nvidia-caps folders.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-12-02 16:13:53 +01:00
Evan Lezar
c131b99cb3 Merge branch 'CNT-3613/add-firmware-cdi' into 'main'
Include GSP firmware path in CDI specification

See merge request nvidia/container-toolkit/container-toolkit!254
2022-12-02 14:51:53 +00:00
Evan Lezar
64a85fb832 Include GSP firmware path in CDI specification
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-12-02 14:35:22 +01:00
Evan Lezar
ebf1772068 Merge branch 'CNT-3580/inject-egl-wayland' into 'main'
Add egl_external_platform.d/10_nvidia_wayland.json to graphics mounts

See merge request nvidia/container-toolkit/container-toolkit!252
2022-12-02 13:05:18 +00:00
Evan Lezar
8604c255c4 Use Options to set FileLocator options
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-12-02 13:57:33 +01:00
Evan Lezar
bea8321205 Use prefix search for locating graphics files
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-12-02 13:55:13 +01:00
Evan Lezar
db962c4bf2 Use getSearchPrefixes for all locators
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-12-02 13:55:13 +01:00
Evan Lezar
d1a3de7671 Add test for device locator
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-12-02 13:55:13 +01:00
Evan Lezar
8da7e74408 Add tests for executable locator
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-12-02 13:55:13 +01:00
Evan Lezar
55eb898186 Add support for specifying multiple prefixes
This change allows the file Locator to be instantiated with multiple
search prefixes.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-12-02 13:55:13 +01:00
Evan Lezar
a7fc29d4bd Add tests for file locator
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-12-02 13:55:13 +01:00
Evan Lezar
fdb3e51294 Add egl_external_platform.d/10_nvidia_wayland.json to graphics mounts
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-12-02 13:55:13 +01:00
Evan Lezar
0582180cab Merge branch 'rework-cdi-generation' into 'main'
Rework CDI spec generation to use discoverers

See merge request nvidia/container-toolkit/container-toolkit!248
2022-12-02 11:32:29 +00:00
Evan Lezar
46667b5a8c Remove unused code
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-12-02 11:49:37 +01:00
Evan Lezar
e4e1de82ec Refactor nvidia-ctk cdi generate command
This change refactors the generation of CDI specifications
to use discoverers and generate the CDI specifications from these
discoverers. This allows for better reuse.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-12-02 11:49:37 +01:00
Evan Lezar
d51c8fcfa7 Add utility function to generatee nvidia-ctk OCI hook
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-12-02 10:01:22 +01:00
Evan Lezar
9b33c34a57 Allow graphics mount discoverer to be instantiated independently
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-12-02 10:01:22 +01:00
Evan Lezar
0b6cd7e90e Add FromDiscoverer function to generate container edits
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-12-02 10:01:22 +01:00
Evan Lezar
029a04c37d Use blank device hostPath if same as Path
The HostPath field was added in the v0.5.0 CDI specification.
The cdi package uses strict unmarshalling when loading specs
from file causing failures for unexpected fields.

Since the behaviour for HostPath == "" and HostPath == Path are
equivalent, we clear HostPath if it is equal to Path to ensure
compatibility with the widest range of specs.

This allows, for example, a v0.4.0 spec to be generated as required.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-12-02 10:01:22 +01:00
Evan Lezar
60c1df4e9c Remove unneeded workaround for CDI edit generation
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-12-02 10:01:22 +01:00
Evan Lezar
3e35312537 Merge branch 'fix-json-mode' into 'main'
Remove unused jsonMode and fix output

See merge request nvidia/container-toolkit/container-toolkit!255
2022-12-01 16:01:33 +00:00
Evan Lezar
932b39fd08 Remove unused jsonMode and fix output
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-12-01 16:21:50 +01:00
Evan Lezar
78cafe45d4 Merge branch 'create-cdi-output-folder' into 'main'
Ensure output folder exists for CDI spec

See merge request nvidia/container-toolkit/container-toolkit!250
2022-12-01 12:44:49 +00:00
Evan Lezar
584e792a5a Ensure output folder exists for CDI spec
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-30 19:40:58 +01:00
Evan Lezar
f0bcfa0415 Merge branch 'add-format-flag' into 'main'
Switch to string-based flag for CDI output format

See merge request nvidia/container-toolkit/container-toolkit!247
2022-11-29 16:47:40 +00:00
Evan Lezar
d45ec7bd28 Switch to string-based flag for CDI output format
This change replaces the `--json` flag of the nvidia-ctk cdi generate
command with a --format flag that accepts a string format of either
json or yaml.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-29 16:56:26 +01:00
Evan Lezar
153f2f6300 Merge branch 'fix-by-path-missing' into 'main'
Skip missing by-path symlinks instead of failing

See merge request nvidia/container-toolkit/container-toolkit!249
2022-11-25 12:11:56 +00:00
Evan Lezar
9df3975740 Merge branch 'bump-version-1.12.0-rc.3' into 'main'
Bump version to 1.12.0-rc.3

See merge request nvidia/container-toolkit/container-toolkit!246
2022-11-23 21:22:46 +00:00
Evan Lezar
5575b391ff Skip missing by-path symlinks instead of failing
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-23 22:21:58 +01:00
Evan Lezar
9faf11ddf3 Fix error message
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-23 22:21:58 +01:00
Evan Lezar
d3ed27722e Bump version to 1.12.0-rc.3
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-23 21:26:34 +01:00
Evan Lezar
07a3f3040a Merge branch 'fix-release-scripts' into 'main'
Fix array arguments for release scripts

See merge request nvidia/container-toolkit/container-toolkit!245
2022-11-22 13:00:42 +00:00
Evan Lezar
749ab2a746 Fix array arguments for release scripts
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-21 21:19:29 +01:00
Evan Lezar
217a135eb1 Merge branch 'fix-kitmaker' into 'main'
Fix kitmaker release for opensuse-leap

See merge request nvidia/container-toolkit/container-toolkit!244
2022-11-21 15:21:46 +00:00
Evan Lezar
22e65b320b Fix kitmaker release for opensuse-leap
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-21 16:19:13 +01:00
Evan Lezar
53bb940b30 Merge branch 'bump-golang-version' into 'main'
Update go version to 1.18

See merge request nvidia/container-toolkit/container-toolkit!243
2022-11-21 09:26:32 +00:00
Evan Lezar
1c1ad8098a Update go version to 1.18
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-21 09:35:03 +01:00
Evan Lezar
203db4390c Merge branch 'add-graphics-edits-to-CDI-spec' into 'main'
Include graphics devices in generated CDI specification

See merge request nvidia/container-toolkit/container-toolkit!242
2022-11-20 21:46:00 +00:00
Evan Lezar
b6d9c2c1ad Add graphics devices and libraries to CDI specification
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-14 13:55:56 +01:00
Evan Lezar
429ef4d4e9 Make NewVisibleDevices public
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-14 12:19:59 +01:00
Evan Lezar
25759ca933 Merge branch 'fix-kitmaker-scripts' into 'main'
Fix scripts and pipeline for artifactory release

See merge request nvidia/container-toolkit/container-toolkit!241
2022-11-11 12:28:35 +00:00
Evan Lezar
74abea07e2 Add top-level variable to set kitmaker folder
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-10 16:06:12 +01:00
Evan Lezar
7955bb1a84 Use short sha for kitmaker version
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-10 16:06:12 +01:00
Evan Lezar
75b11eb80a Use VERSION in kitmaker archive name
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-10 16:06:12 +01:00
Evan Lezar
c958817eef Log applied properties.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-10 16:06:12 +01:00
Evan Lezar
80f8c2a418 Correct artifactory upload URL
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-10 16:06:12 +01:00
Evan Lezar
08640a6f64 Ensure CURL is set for kitmaker upload
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-10 16:06:12 +01:00
Evan Lezar
9db31f7506 Fix number of arguments for kitmaker release script
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-10 16:06:12 +01:00
Evan Lezar
7fd40632fe Update regctl version
The regctl image copy-file command was added in v0.4.5.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-10 14:43:19 +01:00
Evan Lezar
6ef19d2925 Remove call to non-existant script
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-10 14:42:56 +01:00
Evan Lezar
83ce83239b Correct extract package image argument
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-10 14:33:48 +01:00
Evan Lezar
30fb486e44 Add basic logging
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-10 14:27:58 +01:00
Evan Lezar
0022661565 Merge branch 'add-cdi-readme' into 'main'
Add README for generating CDI specifications

See merge request nvidia/container-toolkit/container-toolkit!239
2022-11-09 13:29:36 +00:00
Jon Mayo
28e882f26f Merge branch 'cnt-2210' into 'main'
[ci] push package releases to artifactory

See merge request nvidia/container-toolkit/container-toolkit!231
2022-11-08 16:45:34 +00:00
Jon Mayo
71fbe7a812 [ci] push package releases to artifactory 2022-11-08 16:45:34 +00:00
Evan Lezar
ce3d94af1a Add README for generating CDI specifications
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-08 15:15:27 +01:00
Evan Lezar
0bc09665a8 Merge branch 'CNT-1380/add-crio-config' into 'main'
Add support for updating crio config

See merge request nvidia/container-toolkit/container-toolkit!176
2022-11-07 10:54:34 +00:00
Evan Lezar
205ba098e9 Merge branch 'multiple-docker-swarm' into 'main'
Consider all Swarm resource envvars

See merge request nvidia/container-toolkit/container-toolkit!222
2022-11-07 10:43:49 +00:00
Evan Lezar
877832da69 Consider all Swarm resource envvars
This change extends the support for multiple envvars when
specifying swarm resources to consider ALL of the specified
environment variables instead of the first match.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-04 10:01:28 +01:00
Evan Lezar
b7ba96a72e Merge branch 'update-libnvidia-container' into 'main'
Update libnvidia-container

See merge request nvidia/container-toolkit/container-toolkit!237
2022-11-03 13:38:50 +00:00
Evan Lezar
93c59f2d9c Skip nvidia-container-runtime and nvidia-docker builds for release candidates
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-03 14:38:11 +01:00
Evan Lezar
5a56b658ba Update changelog
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-03 14:30:13 +01:00
Evan Lezar
99889671b5 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-03 14:25:31 +01:00
Evan Lezar
a2fb017208 Merge branch 'rework-cdi-cli' into 'main'
Rename nvidia-ctk info generate-cdi command

See merge request nvidia/container-toolkit/container-toolkit!236
2022-11-03 09:31:26 +00:00
Evan Lezar
f7021d84b5 Merge branch 'add-dev-dri' into 'main'
Inject DRM device nodes into containers when Graphics or Display capabilities are requested

See merge request nvidia/container-toolkit/container-toolkit!235
2022-11-03 09:31:03 +00:00
Evan Lezar
c793fc27d8 Output YAML separator
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-02 15:03:18 +01:00
Evan Lezar
3d2328bdfd Rename nvidia-ctk info generate-cdi command
This change renames the nvidia-ctk info generate-cdi command as

nvidia-ctk cdi generate

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-02 14:56:26 +01:00
Evan Lezar
76b69f45de Add discovery of DRM devices
This change adds the discovery of DRM devices associated with requested
devices. This means that the /dev/dri/card* and /dev/dri/renderD*
devices associated with each requested NVIDIA GPU are injected into
the container and that the /dev/dri/by-path symlinks associated with
these devices are created in the container.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-02 14:49:08 +01:00
Evan Lezar
73e65edaa9 Also trigger graphics modifier for display capability
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-02 14:42:51 +01:00
Evan Lezar
cd7ee5a435 Add test for graphics modifier
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-02 14:42:51 +01:00
Evan Lezar
eac4faddc6 Use :: as link separator
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-02 14:42:51 +01:00
Evan Lezar
bc8a73dde4 Add a Filter interface to the discover package
This change adds support for filtering entities by specifying a filter.
This can be used, for example, to check whether a mount or device
has a particular property and removing it from the set of discovered
entities if it does not.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-02 14:42:48 +01:00
Evan Lezar
624b9d8ee6 Add internal drm package for determining DRM devices
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-02 14:39:53 +01:00
Evan Lezar
9d6e2ff1b0 Add internal proc package for processing GPU information files
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-02 14:39:53 +01:00
Evan Lezar
aca0c7bc5a Add Devices abstraction to CUDA image
This change adds a Devices abstraction to the CUDA image utilities. This
allows for checking whether a devices is selected, for example.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-02 14:39:53 +01:00
Evan Lezar
db47b58275 Add utilities for driver capabilities to image packages
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-02 14:35:42 +01:00
Evan Lezar
59bf7607ce Merge branch 'ipc-rw' into 'main'
Mount IPC sockets with noexec flag

See merge request nvidia/container-toolkit/container-toolkit!234
2022-11-02 12:15:47 +00:00
Evan Lezar
61ff3fbd7b Merge branch 'chmod-hook' into 'main'
Add nvidia-ctk hook chmod command to set permissions and ensure permissions of `/dev/nvidia-caps` is set

See merge request nvidia/container-toolkit/container-toolkit!232
2022-11-02 12:15:23 +00:00
Evan Lezar
523fc57ab4 Use an Executable Locator to lookup chmod
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-10-26 16:24:11 +02:00
Evan Lezar
ae18c5d847 Include chmod hook for device subfolders in CDI spec generation
This change generates one or more createContainer hooks for ensuring
that subfolders in /dev have the required permissions in the container.
As an example, a user requires read permissions to the /dev/nvidia-caps
in addition to including the specific caps devices under this folder.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-10-26 16:08:13 +02:00
Evan Lezar
4abdc2f35d Add nvidia-ctk hook chmod command to set permissions
This change adds an nvidia-ctk hook chmod command that can be used
to update the permissions for paths in the container.

This prepends the container root to the paths to allow these to be
updated by runtime executables.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-10-26 16:01:52 +02:00
Evan Lezar
f8748bfa9a Mount IPC sockets with noexec flag
This change ensures that the CDI spec mounts the ipc sockets with the
noexec flag to allow these to function in rootless mode with podman.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-10-21 16:44:02 +02:00
Evan Lezar
5fb0ae2c2d Merge branch 'fix-mig-caps-paths' into 'main'
Correct construction of MIG Caps

See merge request nvidia/container-toolkit/container-toolkit!230
2022-10-17 11:41:18 +00:00
Evan Lezar
899fc72014 Correct constructin of MIG Caps
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-10-13 14:06:30 +02:00
Evan Lezar
1267c1d9a2 Refactor docker config update
This change updates the docker config update for simplicitly.
This also allows for the API to match the crio update code.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-10-11 11:42:38 +02:00
Evan Lezar
9a697e340b Add support for updating crio configs
This adds support for updating crio configs (instead of installing hooks)
and adds crio support to the nvidia-ctk runtime configure command.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-10-11 11:42:38 +02:00
Evan Lezar
abe8ca71e0 Use struct to store cri-o command line flags
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-10-11 11:35:56 +02:00
Evan Lezar
9bbf7dcf96 Merge branch 'fix-hook-removal' into 'main'
Improve locating NVIDIA Container Runtime Hook

See merge request nvidia/container-toolkit/container-toolkit!215
2022-10-11 09:32:08 +00:00
Evan Lezar
ec1222b58b Merge branch 'bump-1.12.0-rc.2' into 'main'
Bump version to 1.12.0-rc.2

See merge request nvidia/container-toolkit/container-toolkit!229
2022-10-11 09:27:16 +00:00
Evan Lezar
229b46e0ca Bump version to 1.12.0-rc.2
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-10-10 17:11:53 +02:00
Evan Lezar
b6a68c4add Merge branch 'overwrite-rule' into 'main'
Reorder extends for internal pipelines

See merge request nvidia/container-toolkit/container-toolkit!228
2022-10-10 12:58:34 +00:00
Evan Lezar
e588bfac7d Reorder extends for internal pipelines
This change updates the ordering of internal pipeline dependencies to
ensure that the correct rules are applied.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-10-10 14:58:09 +02:00
Evan Lezar
224020533e Merge branch 'fix-internal-ci' into 'main'
Fix internal CI rules

See merge request nvidia/container-toolkit/container-toolkit!227
2022-10-10 11:43:32 +00:00
Evan Lezar
3736bb3aca Fix internal CI rules
This change updates the internal CI rules for the optimizations
to skip non-critical images on MRs.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-10-10 13:43:01 +02:00
Evan Lezar
1e72f92b74 Merge branch 'update-changelog' into 'main'
Update changelog for v1.12.0-rc.1

See merge request nvidia/container-toolkit/container-toolkit!226
2022-10-10 10:12:46 +00:00
Evan Lezar
896f5b2e9f Update changelog for v1.12.0-rc.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-10-10 12:12:14 +02:00
Evan Lezar
c068d4048f Merge branch 'update-cdi-spec-generation' into 'main'
Update CDI spec generation

See merge request nvidia/container-toolkit/container-toolkit!225
2022-10-10 10:07:19 +00:00
Evan Lezar
8796cd76b0 Merge branch 'streamline-cicd' into 'main'
Add rules to skip distributions when not on main

See merge request nvidia/container-toolkit/container-toolkit!224
2022-10-10 08:34:00 +00:00
Evan Lezar
1597ede2af Add all device
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-10-10 10:19:08 +02:00
Evan Lezar
3dd8020695 Include meta devices in generated CDI spec
This change includes meta devices (e.g. /dev/nvidiactl) in the
generated CDI spec. Missing device nodes are ignored.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-10-07 16:23:37 +02:00
Evan Lezar
dfa041991f Generate v0.4.0 CDI spec
This change generates a v0.4.0 CDI spec instead of a v0.5.0 spec.
This allows older versions of podman, for example, to be used.

This requires that the device names do not start on a numeric character
and that the HostPath for a device is unspecified.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-10-07 16:10:47 +02:00
Evan Lezar
568896742b Remove ubuntu 20.04 tests
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-10-07 15:49:23 +02:00
Evan Lezar
f52973217f Add rules to skip distributions when not on main
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-10-07 15:46:26 +02:00
Evan Lezar
efd29f1cec Merge branch 'update-cuda-base-image' into 'main'
Update CUDA base image to 11.8.0

See merge request nvidia/container-toolkit/container-toolkit!223
2022-10-07 12:32:25 +00:00
Evan Lezar
4b02670049 Use 40 digit sha for version string
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-10-07 14:31:49 +02:00
Evan Lezar
8550874686 Update CUDA base image to 11.8.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-10-07 14:31:10 +02:00
Evan Lezar
38513d5a53 Merge branch 'multiple-docker-swarm' into 'main'
Add support for multiple swarm resource envvars

See merge request nvidia/container-toolkit/container-toolkit!220
2022-10-04 13:03:27 +00:00
Evan Lezar
a35236a8f6 Correct test cases for NVIDIA_VISIBLE_DEVICES=void
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-10-04 14:14:44 +02:00
Evan Lezar
0c2e72b7c1 Update gitignore
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-10-04 14:11:10 +02:00
Evan Lezar
f0bdfbebe4 Add support for multiple swarm resource envvars
This change allows the swarm-resource config option to specify a
comma-separated list of environment variables instead of a single
environment variable.

The first environment variable matched is considered and other
environment variables are ignored.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-10-04 14:11:10 +02:00
Evan Lezar
a4fa61d05d Merge branch 'cdi-tooling' into 'main'
Add nvidia-ctk info generate-cdi command to generate CDI specification

See merge request nvidia/container-toolkit/container-toolkit!217
2022-10-04 12:10:07 +00:00
Evan Lezar
6e23a635c6 Merge branch 'update-libnvidia-container' into 'main'
Update libnvidia-contianer submodule

See merge request nvidia/container-toolkit/container-toolkit!218
2022-09-29 10:48:15 +00:00
Evan Lezar
4dedac6a24 Use base filename as first hook argument
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-09-29 12:14:12 +02:00
Evan Lezar
8c1b9b33c1 Use common code to construct ldconfig hook
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-09-29 12:12:42 +02:00
Evan Lezar
d37c17857e Add nvidia-ctk info generate-cdi command
This change adds functionality to generate CDI specifications
for all devices detected on the system. A specification containing
all GPUs and MIG devices is generated. All libraries on the host
ldcache that have an NVIDIA Driver Version suffix are included as
are the required binaries and IPC sockets.

A hook (based on the nvidia-ctk hook subcommand) to update the ldcache
in the container for the libraries being injected is also added to the
CDI specificiation.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-09-29 12:11:42 +02:00
Evan Lezar
a0065456d0 Add internal/nvcaps package
This change adds an internal nvcaps pacakge.

This package will be migrated to go-nvlib.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-09-29 12:11:42 +02:00
Evan Lezar
a34a571d2e Update CDI dependency to v0.5.2
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-09-29 12:11:41 +02:00
Evan Lezar
bb4cfece61 Update go module version to 1.17
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-09-29 12:11:41 +02:00
Evan Lezar
b16d263ee7 Add tests for ldcache hook
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-09-29 12:11:40 +02:00
Evan Lezar
027395bb8a Update libnvidia-contianer submodule
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-09-29 11:26:18 +02:00
Evan Lezar
3ecd790206 Merge branch 'opengl-poc' into 'main'
Add support for injecting vulkan configs and libraries

See merge request nvidia/container-toolkit/container-toolkit!196
2022-09-29 09:23:54 +00:00
Evan Lezar
52bb9e186b Add vulkan support through OCI spec modification
This change allows the NVIDIA Container Runtime to inject vulkan
loaders and libraries by modifying the OCI runtime specification.

This allows vulkan applications to run in containers without
additional modifications.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-09-28 16:51:52 +02:00
Evan Lezar
68b6d1cab1 Add a locator for libraries
This change adds a Locator that can be used to locate libraries.
If library names are specified, the ldcache is searched otherwise
symlinks are resolved.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-09-28 16:43:21 +02:00
Evan Lezar
bdb67b4fba Add package for locating libraries in LDCache
This change adds a package that reads an ldcache and allows for libraries
to be searched by prefix.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-09-28 16:43:21 +02:00
Evan Lezar
d0c39a11d5 Merge branch 'update-go-nvlib' into 'main'
Use go-nvlib nvlib/info package

See merge request nvidia/container-toolkit/container-toolkit!216
2022-09-28 12:28:43 +00:00
Evan Lezar
9de6361938 Update vendoring
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-09-28 13:40:18 +02:00
Evan Lezar
fb016dca86 Use go-nvlib nvlib/info package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-09-28 13:40:18 +02:00
Evan Lezar
8beb7b4231 Only remove nvidia-container-toolkit if it is a symlink
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-09-19 15:31:10 +02:00
Evan Lezar
2b08a79206 Ensure that errors are logged
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-09-19 15:29:29 +02:00
Evan Lezar
5885fead8f Improve locating NVIDIA Container Runtime Hook
This change ensures that a more concrete error is provided by the NVIDIA
Container Runtime if the NVIDIA Container Runtime hook cannot be
located.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-09-19 15:29:29 +02:00
Evan Lezar
a9fb7a4a88 Merge branch 'remove-positional-arguments' into 'main'
Allow install root to be set as positional argument OR flag

See merge request nvidia/container-toolkit/container-toolkit!212
2022-09-16 09:36:17 +00:00
Evan Lezar
b5dbcaeaf9 Merge branch 'bump-post-release' into 'main'
Bump versions post release

See merge request nvidia/container-toolkit/container-toolkit!214
2022-09-14 15:12:09 +00:00
Evan Lezar
80a46d4a5c Bump version to 1.12.0-rc.1
This bumps the package versions to:

* nvidia-container-toolkit 1.12.0-rc.1
* nvidia-container-runtime 3.12.0-rc.1
* nvidia-docker2 2.12.0-rc.1

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-09-14 15:42:13 +02:00
Evan Lezar
febce822d5 Fix fedora35 test container repo URL
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-09-14 14:17:46 +02:00
Evan Lezar
e8099a713c Ensure that existing packages are not re-released
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-09-14 14:17:25 +02:00
Evan Lezar
d9de4a09b8 Merge branch 'bump-version-1.11.0' into 'main'
Bump version to 1.11.0

See merge request nvidia/container-toolkit/container-toolkit!213
2022-09-06 09:12:10 +00:00
Evan Lezar
2dbcda2619 Ensure that base package is built for debian
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-09-05 17:04:49 +02:00
Evan Lezar
691b93ffb0 Update libnvidia-container submodule
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-09-05 16:33:42 +02:00
Evan Lezar
cb0c94cd40 Bump version to v1.11.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-09-05 15:57:57 +02:00
Evan Lezar
3168718563 Update git commit command
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-09-05 15:57:29 +02:00
Evan Lezar
dc8972a26a Allow install root to be set as flag
This change allows the destination / root to be set as the
first positional argument OR as a command line flag. This
allows for the GPU Operator to transition to a case where
on the flag / envvar is used.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-26 16:06:48 +02:00
Evan Lezar
0a2d8f4d22 Move destinationArg to options struct
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-26 15:04:07 +02:00
Evan Lezar
8d623967ed Move runtime flags to struct
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-26 14:59:57 +02:00
Evan Lezar
503ed96275 Merge branch 'fix-release-tooling' into 'main'
Ensure CLI versions are set correctly for RPM packages

See merge request nvidia/container-toolkit/container-toolkit!211
2022-08-24 10:45:38 +00:00
Evan Lezar
d8ba84d427 Add release tests for fedora35
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-24 11:57:20 +02:00
Evan Lezar
8e8c41a3bc Clean up repo test scripts
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-24 11:57:20 +02:00
Evan Lezar
e34fe17b45 Add fedora35 to release and signing scripts
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-24 11:57:20 +02:00
Evan Lezar
c5b0278c58 Ensure CLI versions are set correctly for RPM packages
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-24 11:57:20 +02:00
Evan Lezar
8daa257b35 Merge branch 'update-changelog' into 'main'
Add changelog for 1.11.0-rc.3

See merge request nvidia/container-toolkit/container-toolkit!210
2022-08-24 09:01:39 +00:00
Evan Lezar
6329174cfc Add changelog for 1.11.0-rc.3
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-24 10:08:23 +02:00
Evan Lezar
1ec41c1bf1 Merge branch 'update-libnvidia-container' into 'main'
Update libnvidia-container

See merge request nvidia/container-toolkit/container-toolkit!209
2022-08-23 16:52:09 +00:00
Evan Lezar
581a76de38 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-23 17:29:01 +02:00
Evan Lezar
5d52ca8909 Merge branch 'add-fedora35' into 'main'
Add fedora35 package targets

See merge request nvidia/container-toolkit/container-toolkit!205
2022-08-23 13:04:45 +00:00
Evan Lezar
ad7151d394 Update CUDA base image to 11.7.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-23 14:18:49 +02:00
Evan Lezar
3269a7b0e7 Update libnvidia-container submodule
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-23 14:18:49 +02:00
Evan Lezar
6a155cc606 Increase package build timeout to 3 hours for slow aarch64 builds
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-23 14:18:49 +02:00
Evan Lezar
a5bbf613e8 Use single config file for centos, al2, and fedora
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-23 14:18:49 +02:00
Evan Lezar
22427c1359 Add fedora35 CI targets
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-23 14:18:49 +02:00
Evan Lezar
f17121fd6c Add fedora targets to release scripts
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-23 14:18:49 +02:00
Evan Lezar
256e37eb3f Add fedora35 package targets
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-23 14:18:49 +02:00
Evan Lezar
bdfd123b9d Switch to single docker file yum-based rpm builds
This reuses the docker file for yum-based rpm distros (centos, amazonlinux)
instead of maintaining two files with the same contents.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-23 14:18:49 +02:00
Jon Mayo
3f7dce202a Merge branch 'remove-podman' into 'main'
Specify hook structure instead of importing Podman

See merge request nvidia/container-toolkit/container-toolkit!208
2022-08-22 15:25:40 +00:00
Evan Lezar
a6d21abe14 Merge branch 'add-package-with-no-libnvidia-container' into 'main'
Split nvidia-container-toolkit package

See merge request nvidia/container-toolkit/container-toolkit!195
2022-08-22 09:08:33 +00:00
Evan Lezar
d0f1fe2273 Use new packages in toolkit image
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-19 12:38:17 +02:00
Evan Lezar
8de9593209 Split nvidia-container-toolkit package
This change splits the nvidia-container-toolkit package into the top-level package and
an nvidia-container-toolkit-base package.
The nvidia-container-toolkit-base package allows the NVIDIA Container Runtime and
NVIDIA Container Toolkit CLI to be installed on systems without requiring that the
NVIDIA Container Runtine Hook and the transitive dependencies included in the NVIDIA
Container Library and NVIDIA Container CLI also be installed.

This allows the runtime to be used on systems where the CSV or CDI mode of the runtime
is used exclusively.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-19 12:38:17 +02:00
Evan Lezar
64b2b50470 Fix centos8 test image
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-19 12:36:52 +02:00
Evan Lezar
4dc1451c49 Fix indentation in makefile
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-19 12:36:52 +02:00
Evan Lezar
211081ff25 Update vendoring
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-19 10:28:00 +02:00
Evan Lezar
c1c1d5cf8e Specify hook structure instead of importing Podman
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-19 10:26:34 +02:00
Evan Lezar
e91ffef258 Merge branch 'fix-runtime-hook-rename' into 'main'
Fix cleanup of nvidia-container-toolkit link

See merge request nvidia/container-toolkit/container-toolkit!207
2022-08-18 12:51:51 +00:00
Evan Lezar
47c8aa3790 Fix cleanup of nvidia-container-toolkit link
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-18 14:06:08 +02:00
Evan Lezar
33b4e7fb0a Merge branch 'fix-containerd-tests' into 'main'
Fix image in containerd tests

See merge request nvidia/container-toolkit/container-toolkit!206
2022-08-12 13:46:24 +00:00
Evan Lezar
936da0295b Use proper cuda image for containerd tests
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-12 14:23:24 +02:00
Evan Lezar
c2205c14fb Update subcomponents
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-12 14:22:40 +02:00
Evan Lezar
56935f5743 Merge branch 'fix-mounts' into 'main'
Fix setting of toolkit config option in toolkit container

See merge request nvidia/container-toolkit/container-toolkit!204
2022-08-09 15:46:15 +00:00
Evan Lezar
1b3bae790c Update image used for containerd tests
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-09 16:55:51 +02:00
Evan Lezar
47559a8c87 Output applied config to toolkit container stdout
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-09 15:18:59 +02:00
Evan Lezar
86412ea821 Ensure that toolkit-container sets correct default value
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-09 15:18:52 +02:00
Evan Lezar
b8aa844171 Fix setting of toolkit config option in toolkit container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-09 15:18:52 +02:00
Evan Lezar
f9464c5cf9 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-09 15:18:52 +02:00
Evan Lezar
9df75e1fa3 Merge branch 'add-tegra-files-as-mounts' into 'main'
Add modifier to inject Tegra platform files

See merge request nvidia/container-toolkit/container-toolkit!203
2022-08-09 11:43:04 +00:00
Evan Lezar
0218e2ebf7 Update vendoring
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-08 17:12:47 +02:00
Evan Lezar
a9dc6550d5 Use nvinfo package from go-nvlib
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-08 17:11:42 +02:00
Evan Lezar
ffd6ec3c54 Add modifier to inject Tegra platform files
This change adds a modifier to that injects the tegra platform files
* /etc/nv_tegra_release
* /sys/devices/soc0/family

allowing these files to be used for platform detection in a containerized
context such as the GPU device plugin.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-08 16:04:20 +02:00
Evan Lezar
de3e0df96c Merge branch 'bump-version-1.11.0-rc.3' into 'main'
Bump version to 1.11.0-rc.3

See merge request nvidia/container-toolkit/container-toolkit!202
2022-08-08 13:45:59 +00:00
Evan Lezar
e5dadf34d9 Bump version to 1.11.0-rc.3
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-08 14:56:01 +02:00
Evan Lezar
52145f2d73 Merge branch 'fix-libnvidia-container-tag' into 'main'
Fix setting of LIBNVIDIA_CONTAINER_TAG

See merge request nvidia/container-toolkit/container-toolkit!201
2022-07-27 11:31:06 +00:00
Evan Lezar
90df3caf62 Fix setting of LIBNVIDIA_CONTAINER_TAG
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-27 13:30:31 +02:00
Evan Lezar
50db66a925 Merge branch 'release-1.11.0-rc.2' into 'main'
Add CHANGELOG entry for 1.11.0-rc.2

See merge request nvidia/container-toolkit/container-toolkit!200
2022-07-27 10:53:26 +00:00
Evan Lezar
8587fa05bd Add CHANGELOG entry for 1.11.0-rc.2
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-27 12:06:09 +02:00
Evan Lezar
8129dade3c Merge branch 'set-mount-devices' into 'main'
Allow accept-nvidia-visible-devices-* to be set by toolkit contianer

See merge request nvidia/container-toolkit/container-toolkit!198
2022-07-27 09:58:25 +00:00
Evan Lezar
3610fe7c33 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-27 11:12:57 +02:00
Evan Lezar
90518e0ce5 Allow accept-visible-devices config options to be set
This change allows the
* accept-nvidia-visible-devices-envvar-when-unprivileged
* accept-nvidia-visible-devices-as-volume-mounts

options to be set in the toolkit-container. These are controlled
by command line flags or the following environment variables:

* ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED
* ACCEPT_NVIDIA_VISIBLE_DEVICES_AS_VOLUME_MOUNTS

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-27 10:57:43 +02:00
Evan Lezar
9c060f06ba Remove unused TOOLKIT_ARGS / --toolkit-args
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-27 10:50:18 +02:00
Evan Lezar
e848aa7813 Set toolkit root as flag
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-27 10:50:06 +02:00
Evan Lezar
feedc912e4 Rename toolkitDir toolkitRoot
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-27 10:50:05 +02:00
Evan Lezar
ab3f05cf62 Move global toolkitDir to options struct
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-27 10:41:46 +02:00
Evan Lezar
35982e51bf Move toolkit options to struct
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-27 10:40:19 +02:00
Evan Lezar
94e650c518 Merge branch 'bump-version' into 'main'
bump version to 1.11.0-rc.2

See merge request nvidia/container-toolkit/container-toolkit!197
2022-07-26 17:57:23 +00:00
Evan Lezar
d9edc18bf8 Bump version to 1.11.0-rc.2
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-25 09:51:20 +02:00
Evan Lezar
f4d01e0a05 Add changelog entries for 1.11.0-rc.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-25 09:51:01 +02:00
Evan Lezar
648cfaba51 Merge branch 'update-error-message' into 'main'
Make error message clearer

See merge request nvidia/container-toolkit/container-toolkit!194
2022-07-21 08:49:56 +00:00
Christopher Desiniotis
3a9de13f4e Apply 1 suggestion(s) to 1 file(s) 2022-07-21 08:03:39 +00:00
Evan Lezar
629a68937e Merge branch 'fix-relative-files' into 'main'
Fix adjusting relative paths for containerised devices and mounts.

See merge request nvidia/container-toolkit/container-toolkit!193
2022-07-20 11:40:28 +00:00
Evan Lezar
34e80abdea Add root to mounts type
This change adds a root member to the mounts type that is used to
perform most of the lookups for files and devices. This allows
for consistent handling of relative paths.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-18 14:37:02 +02:00
Evan Lezar
1161b21166 Make error message clearer
This change improves the error message when invoking the NVIDIA
Runtime Hook in non-legacy mode. This should guide users to specifying
the --runtime=nvidia flag when using docker.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-18 13:09:59 +02:00
Evan Lezar
bcdef81e30 Merge branch 'fix-ordering-of-csv-hooks' into 'main'
Fix ordering of create-symlink and update-ldcache hooks

See merge request nvidia/container-toolkit/container-toolkit!192
2022-07-18 10:59:41 +00:00
Evan Lezar
acc0afbb7a Remove Relative method from Locator
The Relative method added to the Locator interface was
not correctly implemented in the file type. The root was
never set when instantiating the object.

This change removes this method from the interface and the file
type, switching to a local implementation in the mounts type
instead.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-15 16:40:27 +02:00
Evan Lezar
7584044b3c Fix bug where ldcache may not contain symlinks
Since the creation of symlinks may include other libraries / folders
the ldcache should be updated AFTER the symlinks are created.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-15 12:18:40 +02:00
Evan Lezar
02c14e981c Add tests for identifying libraries
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-15 12:17:15 +02:00
Evan Lezar
37ee972f74 Merge branch 'CNT-2349/configure-docker' into 'main'
Add nvidia-ctk runtime configure command to update docker config

See merge request nvidia/container-toolkit/container-toolkit!166
2022-07-14 08:06:27 +00:00
Evan Lezar
3809407b6a Merge branch 'rename-to-nvidia-container-hook' into 'main'
Rename -toolkit executable to -runtime-hook

See merge request nvidia/container-toolkit/container-toolkit!189
2022-07-13 11:08:53 +00:00
Evan Lezar
f9547c447a Merge branch 'fix-cdi-refresh' into 'main'
Ensure that CDI registry is refreshed

See merge request nvidia/container-toolkit/container-toolkit!191
2022-07-13 09:38:45 +00:00
Evan Lezar
eb85d45137 Merge branch 'CNT-3297/cdi-config' into 'main'
Add runtime config option for CDI spec dirs

See merge request nvidia/container-toolkit/container-toolkit!190
2022-07-13 09:36:33 +00:00
Evan Lezar
9f0060f651 Add nvidia-ctk runtime configure command
This change adds a `runtime configure` command to the nvidia-ctk CLI. This
command is currently limited to configuring the docker config on the
system by modifying the daemon.json config file associated with docker.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-13 10:33:46 +02:00
Evan Lezar
0e6dc3f7ea Move docker config handling to internal package
In preparation for adding a command to the nvidia-ctk CLI to modify
the docker config, this change refactors load, update, and flush logic
from the toolkit container docker CLI to an internal package.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-13 10:30:01 +02:00
Evan Lezar
1b4944e1de Ensure that CDI registry is refreshed
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-12 14:07:21 +02:00
Evan Lezar
83743e3613 Add runtime config option for CDI spec dirs
This change adds an nvidia-container-runtime.modes.cdi.spec-dirs
config option that allows the default spec dirs to be overridden.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-11 15:39:48 +02:00
Evan Lezar
87afcc3ef4 Reuse check for existing hook
This change reuse the code that checks for the existing NVIDIA
Container Runtime hook to ensure that both nvidia-container-toolkit
and nvidia-container-runtime-hook are detected.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-08 12:20:19 +02:00
Evan Lezar
6ed3a4e1a6 Update package descriptions and URLs
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-08 12:16:03 +02:00
Evan Lezar
8a56671d18 Update package definitions
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-08 12:16:03 +02:00
Evan Lezar
1d81db76a6 Update references to nvidia-container-runtime-hook
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-08 12:15:56 +02:00
Evan Lezar
f50aecb84e Rename -toolkit executable to -runtime-hook
This change renames the nvidia-container-toolkit executable
to nvidia-container-runtime-hook. Here nvidia-container-toolkit
is created as a symlink to nvidia-container-runtime-hook.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-08 12:09:11 +02:00
Evan Lezar
a4258277e1 Merge branch 'update-release-script' into 'main'
Update release tooling to allow for rc release that don't update all packages.

See merge request nvidia/container-toolkit/container-toolkit!188
2022-07-07 14:33:27 +00:00
Evan Lezar
18eb3c7c38 Skip packages that already exist
For rc releases we allow nvidia-container-toolkit versions
to not match libnvidia-container versions. This change ensures
that only changed packages are released.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-07 15:41:20 +02:00
Evan Lezar
a0e728b5c8 Use centos:stream8 image for signing
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-07 15:40:53 +02:00
Evan Lezar
df0176cca4 Merge branch 'support-host-device-paths' into 'main'
Support device nodes with a different root

See merge request nvidia/container-toolkit/container-toolkit!187
2022-07-07 11:35:10 +00:00
Evan Lezar
b68b3c543b Use device host path to determine properties
This mirrors what is done in cri-o and allows for devices nodes
from, for example, the driver container to be injected into a
container at /dev instead of <ROOT>/dev

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-07 12:03:23 +02:00
Evan Lezar
aea1a85bb4 Update vendored runc version
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-07 11:30:01 +02:00
Evan Lezar
98e874e750 Merge branch 'add-cdi-mode' into 'main'
Add CDI mode to NVIDIA Container Runtime

See merge request nvidia/container-toolkit/container-toolkit!172
2022-07-07 08:09:38 +00:00
Evan Lezar
eef016c27d Merge branch 'refactor-csv-discovery' into 'main'
Refactor device discovery

See merge request nvidia/container-toolkit/container-toolkit!185
2022-07-07 08:07:43 +00:00
Evan Lezar
19f89ecafd Update cdi package and run go mod vendor
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-06 16:53:38 +02:00
Evan Lezar
8817dee66c Add support for specifying devices in annotations
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-06 16:53:36 +02:00
Evan Lezar
404e266222 Add cdi mode to NVIDIA Container Runtime
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-06 16:53:05 +02:00
Evan Lezar
9b898c65fa Merge branch 'move-license-make-target' into 'main'
The licenses make target should not be a check target

See merge request nvidia/container-toolkit/container-toolkit!186
2022-07-06 13:14:43 +00:00
Evan Lezar
5c39cf4deb The licenses make target should not be a check target
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-06 14:24:11 +02:00
Evan Lezar
beff276a52 Add charDevices discoverer for devices
This change adds a charDevices discoverer and using this
for CSV, GDS, and MOFED discovery. Internally the discoverer
is a "mounts" discoverer with a charDevice locator.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-06 13:43:23 +02:00
Evan Lezar
55cb82c6c8 Create single discoverer per mount type for CSV
Instead of creating a set of discoverers per file, this change creates
a discoverer per type by first concatenating the mount specifications
from all files. This will allow all device nodes, for example, to
be treated as a single device.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-06 10:57:35 +02:00
Evan Lezar
88d1143827 Merge branch 'add-go-license' into 'main'
Add tooling to check go licenses

See merge request nvidia/container-toolkit/container-toolkit!183
2022-07-06 05:08:59 +00:00
Evan Lezar
d5162b1917 Add tooling to check go licenses
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-05 20:19:23 +02:00
Evan Lezar
ec078543a1 Merge branch 'rename-discover-merge' into 'main'
Rename discover.NewList to discover.Merge

See merge request nvidia/container-toolkit/container-toolkit!182
2022-07-05 09:37:03 +00:00
Evan Lezar
9191074666 Rename discover.NewList to discover.Merge
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-05 10:28:40 +02:00
Evan Lezar
89824849d3 Merge branch 'refactor-envvar-devices' into 'main'
Add DevicesFromEnvvars function to CUDA image abstraction

See merge request nvidia/container-toolkit/container-toolkit!178
2022-07-04 08:47:28 +00:00
Evan Lezar
877083f091 Merge branch 'CNT-3242/strip-root-from-container-mount' into 'main'
Strip root (e.g. driver root) from located mount paths in the container

See merge request nvidia/container-toolkit/container-toolkit!177
2022-07-04 08:45:38 +00:00
Evan Lezar
6467fcd0f5 Merge branch 'ensure-test-output-path-exists' into 'main'
Ensure test/output path exists

See merge request nvidia/container-toolkit/container-toolkit!180
2022-07-04 08:44:11 +00:00
Evan Lezar
fd135f1a8b Add Relative function to Locator interface
This adds a Relative function to the Locator interface and uses
this to determine the host and container paths for located files
(and devices). This ensures that the root (e.g. the nvidia driver
root) is stripped from the container path.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-01 16:23:50 +02:00
Evan Lezar
4e08ec2405 Use CUDA.DevicesFromEnvvar to check if modifications are required
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-01 16:14:36 +02:00
Evan Lezar
925c348565 Add DevicesFromEnvvars function to CUDA image
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-01 16:12:13 +02:00
Kevin Klues
25fd1aaf7e Merge branch 'CNT-3084/include-cufile.json' into 'main'
Include cufile.json in GDS discovery

See merge request nvidia/container-toolkit/container-toolkit!175
2022-07-01 13:49:02 +00:00
Kevin Klues
91e645b91b Merge branch 'gds-poc' into 'main'
Add initial GDS and MOFED discovery

See merge request nvidia/container-toolkit/container-toolkit!163
2022-07-01 13:43:20 +00:00
Evan Lezar
a1c2f07b6e Add /etc/cufile.json to list of required mounts
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-01 14:54:58 +02:00
Evan Lezar
7f7bec0668 Create GDS and MOFED modifiers
This change creates GDS and MOFED modifiers and adds them to the
modifer created for the selected runtime mode if the NVIDIA_GDS
and NVIDIA_MOFED envvars are set to "enabled", respectively.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-01 14:54:05 +02:00
Evan Lezar
cb34f7c6d1 Add discovery of GDS and MOFED devices
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-01 14:40:55 +02:00
Evan Lezar
7f47a61986 Allow globs in filenames for locators
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-01 14:30:33 +02:00
Evan Lezar
e8843c38f2 Move cmd/nvidia-container-runtime/modifier package to internal/modifier
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-01 14:28:40 +02:00
Evan Lezar
d66c00dd1d Use modifier list and discoverModifer
This change uses modifier compositioning and the discoverModifier to
refactor the existing CSV modifier.

This change adds a discoverModifier to the internal/modifier package and
refactors the CSV modifier to use this abstraction.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-01 14:25:19 +02:00
Evan Lezar
55ac8628c8 Add lists of modifiers to allow for modifier compositioning
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-01 14:25:18 +02:00
Evan Lezar
175f75b43f Ensure test/output path exists
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-01 10:07:37 +02:00
Evan Lezar
da3226745c Update vendoring
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-22 10:58:09 +02:00
Evan Lezar
b23e3ea13a Merge branch 'bump-1.11.0-rc.1' into 'main'
Bump version to 1.11.0-rc.1

See merge request nvidia/container-toolkit/container-toolkit!170
2022-06-22 07:52:19 +00:00
Evan Lezar
02f0ee08fc Update nvidia-docker and nvidia-container-runtime
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-17 11:58:06 +02:00
Evan Lezar
4b0e79be50 Update nvidia-docker and nvidia-container-runtime branches to main
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-17 11:37:53 +02:00
Evan Lezar
8b729475e2 Allow any 1.* version of libnvidia-container package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-16 14:57:30 +02:00
Evan Lezar
a1319b1786 Switch to latest docker and docker dind in CI
This change prevents errors when downloading ubuntu repos on
amd64 architectures. The `stable` images were last pushed
2 years ago.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-16 13:44:14 +02:00
Evan Lezar
278fa43303 Allow libnvidia-container1 version to be specified directly
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-15 13:37:42 +02:00
Evan Lezar
d75f364b27 Update build scripts to set libnvidia-container version
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-15 13:37:42 +02:00
Evan Lezar
52d5021b76 Bump version to 1.11.0-rc.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-15 13:37:40 +02:00
Kevin Klues
7cfd3bd510 Merge branch 'bump-v1.10.0' into 'main'
Bump version to v1.10.0

See merge request nvidia/container-toolkit/container-toolkit!169
2022-06-13 10:32:37 +00:00
Evan Lezar
05ca131858 Update libnvidia-container submodule to v1.10.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-13 11:40:18 +02:00
Evan Lezar
181ce8571d Bump version to v1.10.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-13 11:40:18 +02:00
Shiva Krishna Merla
2ab0c6abce Merge branch 'update_container_licenses' into 'main'
Update toolkit images to use NGC DL license

See merge request nvidia/container-toolkit/container-toolkit!164
2022-06-08 19:04:22 +00:00
Shiva Krishna Merla
50caf29b4e Update toolkit images to use NGC DL license 2022-06-08 19:04:21 +00:00
Evan Lezar
067f7af142 Merge branch 'update-nvidia-docker' into 'main'
Bump nvidia-docker version to 2.11.0

See merge request nvidia/container-toolkit/container-toolkit!167
2022-06-08 12:15:17 +00:00
Evan Lezar
d1449951bc Bump nvidia-docker version
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-08 13:25:27 +02:00
Evan Lezar
a05af50b0f Merge branch 'bump-cuda-version' into 'main'
Bump CUDA base image version to 11.7.0

See merge request nvidia/container-toolkit/container-toolkit!162
2022-06-07 15:22:05 +00:00
Evan Lezar
950aff269b Merge branch 'bump-version-1.10.0-rc.4' into 'main'
Update NVIDIA Container Runtime readme and installed configs

See merge request nvidia/container-toolkit/container-toolkit!160
2022-06-07 15:05:48 +00:00
Evan Lezar
e033db559f Switch default container-toolkit image target to ubuntu20.04
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-07 11:32:20 +02:00
Evan Lezar
9a24a40fd2 Merge branch 'only-bump-version' into 'main'
Bump version to 1.10.0-rc.4

See merge request nvidia/container-toolkit/container-toolkit!165
2022-06-07 09:00:38 +00:00
Evan Lezar
df391e2144 Only generate amd64 images for ubuntu18.04
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-07 10:58:15 +02:00
Evan Lezar
9146b4d4b6 Remove build and release of centos8 container-toolkit images
Note that the centos8 packages are still built.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-07 10:58:15 +02:00
Evan Lezar
068d7e085b Use ubi8 base image for centos8
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-07 10:58:15 +02:00
Evan Lezar
79510a8290 Bump CUDA base image version to 11.7.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-07 10:58:15 +02:00
Evan Lezar
50240c93bd Update config files with options and defaults
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-03 13:10:24 +02:00
Evan Lezar
7ca0e5db60 Update NVIDIA Container Runtime readme
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-03 13:10:21 +02:00
Evan Lezar
c0e6765d46 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-01 15:29:25 +02:00
Evan Lezar
7739b0e8ea Bump version to 1.10.0-rc.4
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-01 14:46:12 +02:00
Evan Lezar
ab23fc52db Merge branch 'fix-binary-name' into 'main'
Use BinaryName for v1 containerd runtime config

See merge request nvidia/container-toolkit/container-toolkit!159
2022-05-30 07:53:42 +00:00
Evan Lezar
530d66b5c7 Also set default_runtime.options.BinaryName
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-05-27 16:21:52 +02:00
Evan Lezar
dad3e855b5 Also cleanup v1 default_runtime if BinaryName is set
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-05-27 16:18:57 +02:00
Evan Lezar
15cbd54d1c Also set Runtime file v1 containerd runtime config
This ensures that older versions of containerd that may be expecting
this over options.BinaryName should continue to work.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-05-26 06:26:06 +02:00
Evan Lezar
4cd719692e Use BinaryName for v1 containerd runtime config
This fixes a bug where the runtime path for v1 containerd configs
was specified in the options.Runtime setting (which is used
for the default runtime) instead of options.BinaryName.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-05-26 06:25:09 +02:00
Evan Lezar
b940294557 Merge branch 'CNT-2979/allow-empty-config' into 'main'
Return default config if config path is not found

See merge request nvidia/container-toolkit/container-toolkit!156
2022-05-25 12:20:51 +00:00
Evan Lezar
840cdec36d Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-05-25 13:23:21 +02:00
Evan Lezar
73a5b70a02 Return default config if config path is not found
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-05-25 13:22:45 +02:00
Evan Lezar
f0cae49892 Merge branch 'fix-jetpack-require' into 'main'
Ignore NVIDIA_REQUIRE_JETPACK* for image requirements

See merge request nvidia/container-toolkit/container-toolkit!158
2022-05-25 11:19:47 +00:00
Evan Lezar
e07c7f0fa2 Ignore NVIDIA_REQUIRE_JETPACK* for image requirements
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-05-24 09:53:37 +02:00
Evan Lezar
52ce97929c Merge branch 'fix-is-tegra-check' into 'main'
Fix bug in tegra detection

See merge request nvidia/container-toolkit/container-toolkit!157
2022-05-23 08:00:09 +00:00
Evan Lezar
084eae6e0d Fix bug in tegra detection
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-05-20 14:39:36 +02:00
Evan Lezar
f656b5c887 Merge branch 'fix-char-device' into 'main'
Fix assertCharDevice matching on all files

See merge request nvidia/container-toolkit/container-toolkit!155
2022-05-20 10:32:51 +00:00
Evan Lezar
55c1d7c256 Fix assertCharDevice matching on all files
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-05-20 10:08:00 +02:00
Evan Lezar
0f2b20fffc Merge branch 'auto-generate-changelog' into 'main'
Use single  changelog.md file instead of separate package-specific changelogs

See merge request nvidia/container-toolkit/container-toolkit!154
2022-05-20 08:03:19 +00:00
Evan Lezar
bb69727148 Include git commit in changelog URL
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-05-18 16:02:14 +02:00
Evan Lezar
0b4f3aaf69 Merge branch 'bump-1.10.0-rc.3' into 'main'
Bump version to 1.10.0-rc.3

See merge request nvidia/container-toolkit/container-toolkit!153
2022-05-18 13:46:41 +00:00
Evan Lezar
e5125515f0 Automatically generate changelogs in docker builds
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-05-18 14:54:58 +02:00
Evan Lezar
033b2fd90d Add dummy entry for rpm changelog matching other components
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-05-18 14:54:58 +02:00
Evan Lezar
a0a00e38fd Format CHANGELOG.md as markdown
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-05-18 14:54:58 +02:00
Evan Lezar
77cf70b625 Move debian changelog to CHANGELOG.md
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-05-18 14:54:58 +02:00
Evan Lezar
8ab3d713bc Update libnvidia-container version
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-05-18 14:53:29 +02:00
Evan Lezar
c58d81cec5 Bump version to 1.10.0-rc.3
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-05-18 13:38:54 +02:00
570 changed files with 88408 additions and 24684 deletions

View File

@@ -1,4 +1,4 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2021-2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -12,9 +12,9 @@
# See the License for the specific language governing permissions and
# limitations under the License.
default:
image: docker:stable
image: docker
services:
- name: docker:stable-dind
- name: docker:dind
command: ["--experimental"]
variables:
@@ -34,51 +34,91 @@ stages:
- scan
- release
.main-or-manual:
rules:
- if: $CI_COMMIT_BRANCH == "main"
- if: $CI_COMMIT_TAG && $CI_COMMIT_TAG != ""
- if: $CI_PIPELINE_SOURCE == "schedule"
when: manual
# Define the distribution targets
.dist-amazonlinux2:
rules:
- !reference [.main-or-manual, rules]
variables:
DIST: amazonlinux2
PACKAGE_REPO_TYPE: rpm
.dist-centos7:
rules:
- !reference [.main-or-manual, rules]
variables:
DIST: centos7
CVE_UPDATES: "cyrus-sasl-lib"
PACKAGE_REPO_TYPE: rpm
.dist-centos8:
variables:
DIST: centos8
CVE_UPDATES: "cyrus-sasl-lib"
PACKAGE_REPO_TYPE: rpm
.dist-debian10:
rules:
- !reference [.main-or-manual, rules]
variables:
DIST: debian10
PACKAGE_REPO_TYPE: debian
.dist-debian9:
rules:
- !reference [.main-or-manual, rules]
variables:
DIST: debian9
PACKAGE_REPO_TYPE: debian
.dist-fedora35:
rules:
- !reference [.main-or-manual, rules]
variables:
DIST: fedora35
PACKAGE_REPO_TYPE: rpm
.dist-opensuse-leap15.1:
rules:
- !reference [.main-or-manual, rules]
variables:
DIST: opensuse-leap15.1
PACKAGE_REPO_TYPE: rpm
.dist-ubi8:
rules:
- !reference [.main-or-manual, rules]
variables:
DIST: ubi8
CVE_UPDATES: "cyrus-sasl-lib"
PACKAGE_REPO_TYPE: rpm
.dist-ubuntu16.04:
rules:
- !reference [.main-or-manual, rules]
variables:
DIST: ubuntu16.04
PACKAGE_REPO_TYPE: debian
.dist-ubuntu18.04:
variables:
DIST: ubuntu18.04
CVE_UPDATES: "libsasl2-2 libsasl2-modules-db"
PACKAGE_REPO_TYPE: debian
.dist-ubuntu20.04:
rules:
- !reference [.main-or-manual, rules]
variables:
DIST: ubuntu20.04
CVE_UPDATES: "libsasl2-2 libsasl2-modules-db"
PACKAGE_REPO_TYPE: debian
.dist-packaging:
variables:
@@ -98,6 +138,8 @@ stages:
ARCH: arm64
.arch-ppc64le:
rules:
- !reference [.main-or-manual, rules]
variables:
ARCH: ppc64le
@@ -138,7 +180,7 @@ test-packaging:
# Download the regctl binary for use in the release steps
.regctl-setup:
before_script:
- export REGCTL_VERSION=v0.3.10
- export REGCTL_VERSION=v0.4.5
- apk add --no-cache curl
- mkdir -p bin
- curl -sSLo bin/regctl https://github.com/regclient/regclient/releases/download/${REGCTL_VERSION}/regctl-linux-amd64
@@ -210,13 +252,6 @@ release:staging-centos7:
needs:
- image-centos7
release:staging-centos8:
extends:
- .release:staging
- .dist-centos8
needs:
- image-centos8
release:staging-ubi8:
extends:
- .release:staging
@@ -239,10 +274,7 @@ release:staging-ubuntu20.04:
- .release:staging
- .dist-ubuntu20.04
needs:
- test-toolkit-ubuntu20.04
- test-containerd-ubuntu20.04
- test-crio-ubuntu20.04
- test-docker-ubuntu20.04
- image-ubuntu20.04
release:staging-packaging:
extends:

2
.gitignore vendored
View File

@@ -4,6 +4,8 @@ dist
/coverage.out*
/test/output/
/nvidia-container-runtime
/nvidia-container-runtime-hook
/nvidia-container-toolkit
/nvidia-ctk
/shared-*
/release-*

View File

@@ -1,4 +1,4 @@
# Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -94,7 +94,7 @@ unit-tests:
- .multi-arch-build
- .package-artifacts
stage: package-build
timeout: 2h 30m
timeout: 3h
script:
- ./scripts/build-packages.sh ${DIST}-${ARCH}
@@ -158,6 +158,18 @@ package-debian9-amd64:
- .dist-debian9
- .arch-amd64
package-fedora35-aarch64:
extends:
- .package-build
- .dist-fedora35
- .arch-aarch64
package-fedora35-x86_64:
extends:
- .package-build
- .dist-fedora35
- .arch-x86_64
package-opensuse-leap15.1-x86_64:
extends:
- .package-build
@@ -216,7 +228,7 @@ package-ubuntu18.04-ppc64le:
before_script:
- !reference [.buildx-setup, before_script]
- apk add --no-cache bash make
- apk add --no-cache bash make git
- 'echo "Logging in to CI registry ${CI_REGISTRY}"'
- docker login -u "${CI_REGISTRY_USER}" -p "${CI_REGISTRY_PASSWORD}" "${CI_REGISTRY}"
script:
@@ -231,16 +243,6 @@ image-centos7:
- package-centos7-ppc64le
- package-centos7-x86_64
image-centos8:
extends:
- .image-build
- .package-artifacts
- .dist-centos8
needs:
- package-centos8-aarch64
- package-centos8-x86_64
- package-centos8-ppc64le
image-ubi8:
extends:
- .image-build
@@ -260,7 +262,8 @@ image-ubuntu18.04:
needs:
- package-ubuntu18.04-amd64
- package-ubuntu18.04-arm64
- package-ubuntu18.04-ppc64le
- job: package-ubuntu18.04-ppc64le
optional: true
image-ubuntu20.04:
extends:
@@ -279,21 +282,36 @@ image-packaging:
- .package-artifacts
- .dist-packaging
needs:
- package-amazonlinux2-aarch64
- package-amazonlinux2-x86_64
- package-centos7-ppc64le
- package-centos7-x86_64
- package-centos8-aarch64
- package-centos8-ppc64le
- package-centos8-x86_64
- package-debian10-amd64
- package-debian9-amd64
- package-opensuse-leap15.1-x86_64
- package-ubuntu16.04-amd64
- package-ubuntu16.04-ppc64le
- package-ubuntu18.04-amd64
- package-ubuntu18.04-arm64
- package-ubuntu18.04-ppc64le
- job: package-centos8-aarch64
- job: package-centos8-x86_64
- job: package-ubuntu18.04-amd64
- job: package-ubuntu18.04-arm64
- job: package-amazonlinux2-aarch64
optional: true
- job: package-amazonlinux2-x86_64
optional: true
- job: package-centos7-ppc64le
optional: true
- job: package-centos7-x86_64
optional: true
- job: package-centos8-ppc64le
optional: true
- job: package-debian10-amd64
optional: true
- job: package-debian9-amd64
optional: true
- job: package-fedora35-aarch64
optional: true
- job: package-fedora35-x86_64
optional: true
- job: package-opensuse-leap15.1-x86_64
optional: true
- job: package-ubuntu16.04-amd64
optional: true
- job: package-ubuntu16.04-ppc64le
optional: true
- job: package-ubuntu18.04-ppc64le
optional: true
# Define publish test helpers
.test:toolkit:
@@ -353,31 +371,3 @@ test-docker-ubuntu18.04:
needs:
- image-ubuntu18.04
test-toolkit-ubuntu20.04:
extends:
- .test:toolkit
- .dist-ubuntu20.04
needs:
- image-ubuntu20.04
test-containerd-ubuntu20.04:
extends:
- .test:containerd
- .dist-ubuntu20.04
needs:
- image-ubuntu20.04
test-crio-ubuntu20.04:
extends:
- .test:crio
- .dist-ubuntu20.04
needs:
- image-ubuntu20.04
test-docker-ubuntu20.04:
extends:
- .test:docker
- .dist-ubuntu20.04
needs:
- image-ubuntu20.04

2
.gitmodules vendored
View File

@@ -5,6 +5,8 @@
[submodule "third_party/nvidia-container-runtime"]
path = third_party/nvidia-container-runtime
url = https://gitlab.com/nvidia/container-toolkit/container-runtime.git
branch = main
[submodule "third_party/nvidia-docker"]
path = third_party/nvidia-docker
url = https://gitlab.com/nvidia/container-toolkit/nvidia-docker.git
branch = main

View File

@@ -1,4 +1,4 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2021-2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -35,6 +35,9 @@ variables:
# Define the public staging registry
STAGING_REGISTRY: registry.gitlab.com/nvidia/container-toolkit/container-toolkit/staging
STAGING_VERSION: ${CI_COMMIT_SHORT_SHA}
ARTIFACTORY_REPO_BASE: "https://urm.nvidia.com/artifactory/sw-gpu-cloudnative"
# TODO: We should set the kitmaker release folder here once we have the end-to-end workflow set up
KITMAKER_RELEASE_FOLDER: "testing"
.image-pull:
stage: image-build
@@ -48,8 +51,9 @@ variables:
OUT_IMAGE_NAME: "${CI_REGISTRY_IMAGE}/container-toolkit"
PUSH_MULTIPLE_TAGS: "false"
# We delay the job start to allow the public pipeline to generate the required images.
when: delayed
start_in: 30 minutes
rules:
- when: delayed
start_in: 30 minutes
timeout: 30 minutes
retry:
max: 2
@@ -67,34 +71,29 @@ variables:
image-centos7:
extends:
- .image-pull
- .dist-centos7
image-centos8:
extends:
- .image-pull
- .dist-centos8
image-ubi8:
extends:
- .image-pull
- .dist-ubi8
- .image-pull
image-ubuntu18.04:
extends:
- .image-pull
- .dist-ubuntu18.04
- .image-pull
image-ubuntu20.04:
extends:
- .image-pull
- .dist-ubuntu20.04
- .image-pull
# The DIST=packaging target creates an image containing all built packages
image-packaging:
extends:
- .image-pull
- .dist-packaging
- .image-pull
# We skip the integration tests for the internal CI:
.integration:
@@ -112,9 +111,9 @@ image-packaging:
variables:
IMAGE: "${CI_REGISTRY_IMAGE}/container-toolkit:${CI_COMMIT_SHORT_SHA}-${DIST}"
IMAGE_ARCHIVE: "container-toolkit.tar"
except:
variables:
- $SKIP_SCANS && $SKIP_SCANS == "yes"
rules:
- if: $SKIP_SCANS != "yes"
- when: manual
before_script:
- docker login -u "${CI_REGISTRY_USER}" -p "${CI_REGISTRY_PASSWORD}" "${CI_REGISTRY}"
# TODO: We should specify the architecture here and scan all architectures
@@ -139,85 +138,59 @@ image-packaging:
# Define the scan targets
scan-centos7-amd64:
extends:
- .scan
- .dist-centos7
- .platform-amd64
- .scan
needs:
- image-centos7
scan-centos7-arm64:
extends:
- .scan
- .dist-centos7
- .platform-arm64
- .scan
needs:
- image-centos7
- scan-centos7-amd64
scan-centos8-amd64:
extends:
- .scan
- .dist-centos8
- .platform-amd64
needs:
- image-centos8
scan-centos8-arm64:
extends:
- .scan
- .dist-centos8
- .platform-arm64
needs:
- image-centos8
- scan-centos8-amd64
scan-ubuntu18.04-amd64:
extends:
- .scan
- .dist-ubuntu18.04
- .platform-amd64
needs:
- image-ubuntu18.04
scan-ubuntu18.04-arm64:
extends:
- .scan
- .dist-ubuntu18.04
- .platform-arm64
needs:
- image-ubuntu18.04
- scan-ubuntu18.04-amd64
scan-ubuntu20.04-amd64:
extends:
- .scan
- .dist-ubuntu20.04
- .platform-amd64
- .scan
needs:
- image-ubuntu20.04
scan-ubuntu20.04-arm64:
extends:
- .scan
- .dist-ubuntu20.04
- .platform-arm64
- .scan
needs:
- image-ubuntu20.04
- scan-ubuntu20.04-amd64
scan-ubi8-amd64:
extends:
- .scan
- .dist-ubi8
- .platform-amd64
- .scan
needs:
- image-ubi8
scan-ubi8-arm64:
extends:
- .scan
- .dist-ubi8
- .platform-arm64
- .scan
needs:
- image-ubi8
- scan-ubi8-amd64
@@ -232,6 +205,31 @@ scan-ubi8-arm64:
OUT_REGISTRY: "${NGC_REGISTRY}"
OUT_IMAGE_NAME: "${NGC_REGISTRY_IMAGE}"
.release:packages:
stage: release
needs:
- image-packaging
variables:
VERSION: "${CI_COMMIT_SHORT_SHA}"
PACKAGE_REGISTRY: "${CI_REGISTRY}"
PACKAGE_REGISTRY_USER: "${CI_REGISTRY_USER}"
PACKAGE_REGISTRY_TOKEN: "${CI_REGISTRY_PASSWORD}"
PACKAGE_IMAGE_NAME: "${CI_REGISTRY_IMAGE}/container-toolkit"
PACKAGE_IMAGE_TAG: "${CI_COMMIT_SHORT_SHA}-packaging"
KITMAKER_ARTIFACTORY_REPO: "${ARTIFACTORY_REPO_BASE}-generic-local/${KITMAKER_RELEASE_FOLDER}"
script:
- !reference [.regctl-setup, before_script]
- apk add --no-cache bash git
- regctl registry login "${PACKAGE_REGISTRY}" -u "${PACKAGE_REGISTRY_USER}" -p "${PACKAGE_REGISTRY_TOKEN}"
- ./scripts/extract-packages.sh "${PACKAGE_IMAGE_NAME}:${PACKAGE_IMAGE_TAG}"
# TODO: ./scripts/release-packages-artifactory.sh "${DIST}-${ARCH}" "${PACKAGE_ARTIFACTORY_REPO}"
- ./scripts/release-kitmaker-artifactory.sh "${KITMAKER_ARTIFACTORY_REPO}"
# Define the package release targets
release:packages:kitmaker:
extends:
- .release:packages
release:staging-ubuntu18.04:
extends:
- .release:staging
@@ -239,36 +237,24 @@ release:staging-ubuntu18.04:
needs:
- image-ubuntu18.04
release:staging-ubuntu20.04:
extends:
- .release:staging
- .dist-ubuntu20.04
needs:
- image-ubuntu20.04
# Define the external release targets
# Release to NGC
release:ngc-centos7:
extends:
- .release:ngc
- .dist-centos7
release:ngc-centos8:
extends:
- .release:ngc
- .dist-centos8
release:ngc-ubuntu18.04:
extends:
- .release:ngc
- .dist-ubuntu18.04
- .release:ngc
release:ngc-ubuntu20.04:
extends:
- .release:ngc
- .dist-ubuntu20.04
- .release:ngc
release:ngc-ubi8:
extends:
- .release:ngc
- .dist-ubi8
- .release:ngc

226
CHANGELOG.md Normal file
View File

@@ -0,0 +1,226 @@
# NVIDIA Container Toolkit Changelog
## v1.12.0-rc.4
* Generate a minimum CDI spec version for improved compatibility.
* Add `--device-name-strategy` options to the `nvidia-ctk cdi generate` command that can be used to control how device names are constructed.
* Set default for CDI device name generation to `index` to generate device names such as `nvidia.com/gpu=0` or `nvidia.com/gpu=1:0` by default.
## v1.12.0-rc.3
* Don't fail if by-path symlinks for DRM devices do not exist
* Replace the --json flag with a --format [json|yaml] flag for the nvidia-ctk cdi generate command
* Ensure that the CDI output folder is created if required
* When generating a CDI specification use a blank host path for devices to ensure compatibility with the v0.4.0 CDI specification
* Add injection of Wayland JSON files
* Add GSP firmware paths to generated CDI specification
* Add --root flag to nvidia-ctk cdi generate command
## v1.12.0-rc.2
* Inject Direct Rendering Manager (DRM) devices into a container using the NVIDIA Container Runtime
* Improve logging of errors from the NVIDIA Container Runtime
* Improve CDI specification generation to support rootless podman
* Use `nvidia-ctk cdi generate` to generate CDI specifications instead of `nvidia-ctk info generate-cdi`
* [libnvidia-container] Skip creation of existing files when these are already mounted
## v1.12.0-rc.1
* Add support for multiple Docker Swarm resources
* Improve injection of Vulkan configurations and libraries
* Add `nvidia-ctk info generate-cdi` command to generated CDI specification for available devices
* [libnvidia-container] Include NVVM compiler library in compute libs
## v1.11.0
* Promote v1.11.0-rc.3 to v1.11.0
## v1.11.0-rc.3
* Build fedora35 packages
* Introduce an `nvidia-container-toolkit-base` package for better dependency management
* Fix removal of `nvidia-container-runtime-hook` on RPM-based systems
* Inject platform files into container on Tegra-based systems
* [toolkit container] Update CUDA base images to 11.7.1
* [libnvidia-container] Preload libgcc_s.so.1 on arm64 systems
## v1.11.0-rc.2
* Allow `accept-nvidia-visible-devices-*` config options to be set by toolkit container
* [libnvidia-container] Fix bug where LDCache was not updated when the `--no-pivot-root` option was specified
## v1.11.0-rc.1
* Add discovery of GPUDirect Storage (`nvidia-fs*`) devices if the `NVIDIA_GDS` environment variable of the container is set to `enabled`
* Add discovery of MOFED Infiniband devices if the `NVIDIA_MOFED` environment variable of the container is set to `enabled`
* Fix bug in CSV mode where libraries listed as `sym` entries in mount specification are not added to the LDCache.
* Rename `nvidia-container-toolkit` executable to `nvidia-container-runtime-hook` and create `nvidia-container-toolkit` as a symlink to `nvidia-container-runtime-hook` instead.
* Add `nvidia-ctk runtime configure` command to configure the Docker config file (e.g. `/etc/docker/daemon.json`) for use with the NVIDIA Container Runtime.
## v1.10.0
* Promote v1.10.0-rc.3 to v1.10.0
## v1.10.0-rc.3
* Use default config instead of raising an error if config file cannot be found
* Ignore NVIDIA_REQUIRE_JETPACK* environment variables for requirement checks
* Fix bug in detection of Tegra systems where `/sys/devices/soc0/family` is ignored
* Fix bug where links to devices were detected as devices
* [libnvida-container] Fix bug introduced when adding libcudadebugger.so to list of libraries
## v1.10.0-rc.2
* Add support for NVIDIA_REQUIRE_* checks for cuda version and arch to csv mode
* Switch to debug logging to reduce log verbosity
* Support logging to logs requested in command line
* Fix bug when launching containers with relative root path (e.g. using containerd)
* Allow low-level runtime path to be set explicitly as nvidia-container-runtime.runtimes option
* Fix failure to locate low-level runtime if PATH envvar is unset
* Replace experimental option for NVIDIA Container Runtime with nvidia-container-runtime.mode = csv option
* Use csv as default mode on Tegra systems without NVML
* Add --version flag to all CLIs
* [libnvidia-container] Bump libtirpc to 1.3.2
* [libnvidia-container] Fix bug when running host ldconfig using glibc compiled with a non-standard prefix
* [libnvidia-container] Add libcudadebugger.so to list of compute libraries
## v1.10.0-rc.1
* Include nvidia-ctk CLI in installed binaries
* Add experimental option to NVIDIA Container Runtime
## v1.9.0
* [libnvidia-container] Add additional check for Tegra in /sys/.../family file in CLI
* [libnvidia-container] Update jetpack-specific CLI option to only load Base CSV files by default
* [libnvidia-container] Fix bug (from 1.8.0) when mounting GSP firmware into containers without /lib to /usr/lib symlinks
* [libnvidia-container] Update nvml.h to CUDA 11.6.1 nvML_DEV 11.6.55
* [libnvidia-container] Update switch statement to include new brands from latest nvml.h
* [libnvidia-container] Process all --require flags on Jetson platforms
* [libnvidia-container] Fix long-standing issue with running ldconfig on Debian systems
## v1.8.1
* [libnvidia-container] Fix bug in determining cgroup root when running in nested containers
* [libnvidia-container] Fix permission issue when determining cgroup version
## v1.8.0
* Promote 1.8.0-rc.2-1 to 1.8.0
## v1.8.0-rc.2
* Remove support for building amazonlinux1 packages
## v1.8.0-rc.1
* [libnvidia-container] Add support for cgroupv2
* Release toolkit-container images from nvidia-container-toolkit repository
## v1.7.0
* Promote 1.7.0-rc.1-1 to 1.7.0
* Bump Golang version to 1.16.4
## v1.7.0-rc.1
* Specify containerd runtime type as string in config tools to remove dependency on containerd package
* Add supported-driver-capabilities config option to allow for a subset of all driver capabilities to be specified
## v1.6.0
* Promote 1.6.0-rc.3-1 to 1.6.0
* Fix unnecessary logging to stderr instead of configured nvidia-container-runtime log file
## v1.6.0-rc.3
* Add supported-driver-capabilities config option to the nvidia-container-toolkit
* Move OCI and command line checks for runtime to internal oci package
## v1.6.0-rc.2
* Use relative path to OCI specification file (config.json) if bundle path is not specified as an argument to the nvidia-container-runtime
## v1.6.0-rc.1
* Add AARCH64 package for Amazon Linux 2
* Include nvidia-container-runtime into nvidia-container-toolkit package
## v1.5.1
* Fix bug where Docker Swarm device selection is ignored if NVIDIA_VISIBLE_DEVICES is also set
* Improve unit testing by using require package and adding coverage reports
* Remove unneeded go dependencies by running go mod tidy
* Move contents of pkg directory to cmd for CLI tools
* Ensure make binary target explicitly sets GOOS
## v1.5.0
* Add dependence on libnvidia-container-tools >= 1.4.0
* Add golang check targets to Makefile
* Add Jenkinsfile definition for build targets
* Move docker.mk to docker folder
## v1.4.2
* Add dependence on libnvidia-container-tools >= 1.3.3
## v1.4.1
* Ignore NVIDIA_VISIBLE_DEVICES for containers with insufficent privileges
* Add dependence on libnvidia-container-tools >= 1.3.2
## v1.4.0
* Add 'compute' capability to list of defaults
* Add dependence on libnvidia-container-tools >= 1.3.1
## v1.3.0
* Promote 1.3.0-rc.2-1 to 1.3.0
* Add dependence on libnvidia-container-tools >= 1.3.0
## v1.3.0-rc.2
* 2c180947 Add more tests for new semantics with device list from volume mounts
* 7c003857 Refactor accepting device lists from volume mounts as a boolean
## v1.3.0-rc.1
* b50d86c1 Update build system to accept a TAG variable for things like rc.x
* fe65573b Add common CI tests for things like golint, gofmt, unit tests, etc.
* da6fbb34 Revert "Add ability to merge envars of the form NVIDIA_VISIBLE_DEVICES_*"
* a7fb3330 Flip build-all targets to run automatically on merge requests
* 8b248b66 Rename github.com/NVIDIA/container-toolkit to nvidia-container-toolkit
* da36874e Add new config options to pull device list from mounted files instead of ENVVAR
## v1.2.1
* 4e6e0ed4 Add 'ngx' to list of*all* driver capabilities
* 2f4af743 List config.toml as a config file in the RPM SPEC
## v1.2.0
* 8e0aab46 Fix repo listed in changelog for debian distributions
* 320bb6e4 Update dependence on libnvidia-container to 1.2.0
* 6cfc8097 Update package license to match source license
* e7dc3cbb Fix debian copyright file
* d3aee3e0 Add the 'ngx' driver capability
## v1.1.2
* c32237f3 Add support for parsing Linux Capabilities for older OCI specs
## v1.1.1
* d202aded Update dependence to libnvidia-container 1.1.1
## v1.1.0
* 4e4de762 Update build system to support multi-arch builds
* fcc1d116 Add support for MIG (Multi-Instance GPUs)
* d4ff0416 Add ability to merge envars of the form NVIDIA_VISIBLE_DEVICES_*
* 60f165ad Add no-pivot option to toolkit
## v1.0.5
* Initial release. Replaces older package nvidia-container-runtime-hook. (Closes: #XXXXXX)

View File

@@ -39,7 +39,7 @@ CMDS := $(patsubst ./cmd/%/,%,$(sort $(dir $(wildcard ./cmd/*/))))
CMD_TARGETS := $(patsubst %,cmd-%, $(CMDS))
CHECK_TARGETS := assert-fmt vet lint ineffassign misspell
MAKE_TARGETS := binaries build check fmt lint-internal test examples cmds coverage generate $(CHECK_TARGETS)
MAKE_TARGETS := binaries build check fmt lint-internal test examples cmds coverage generate licenses $(CHECK_TARGETS)
TARGETS := $(MAKE_TARGETS) $(EXAMPLE_TARGETS) $(CMD_TARGETS)
@@ -51,6 +51,7 @@ CLI_VERSION = $(LIB_VERSION)$(if $(LIB_TAG),-$(LIB_TAG))
else
CLI_VERSION = $(VERSION)
endif
CLI_VERSION_PACKAGE = github.com/NVIDIA/nvidia-container-toolkit/internal/info
GOOS ?= linux
@@ -60,7 +61,7 @@ cmd-%: COMMAND_BUILD_OPTIONS = -o $(PREFIX)/$(*)
endif
cmds: $(CMD_TARGETS)
$(CMD_TARGETS): cmd-%:
GOOS=$(GOOS) go build -ldflags "-s -w -X github.com/NVIDIA/nvidia-container-toolkit/internal/info.gitCommit=$(GIT_COMMIT) -X github.com/NVIDIA/nvidia-container-toolkit/internal/info.version=$(CLI_VERSION)" $(COMMAND_BUILD_OPTIONS) $(MODULE)/cmd/$(*)
GOOS=$(GOOS) go build -ldflags "-s -w -X $(CLI_VERSION_PACKAGE).gitCommit=$(GIT_COMMIT) -X $(CLI_VERSION_PACKAGE).version=$(CLI_VERSION)" $(COMMAND_BUILD_OPTIONS) $(MODULE)/cmd/$(*)
build:
GOOS=$(GOOS) go build ./...
@@ -102,6 +103,9 @@ misspell:
vet:
go vet $(MODULE)/...
licenses:
go-licenses csv $(MODULE)/...
COVERAGE_FILE := coverage.out
test: build cmds
go test -v -coverprofile=$(COVERAGE_FILE) $(MODULE)/...

View File

@@ -67,9 +67,9 @@ ARG TARGETARCH
ENV PACKAGE_ARCH ${TARGETARCH}
RUN PACKAGE_ARCH=${PACKAGE_ARCH/amd64/x86_64} && PACKAGE_ARCH=${PACKAGE_ARCH/arm64/aarch64} && \
yum localinstall -y \
${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container1-${PACKAGE_VERSION}*.rpm \
${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container-tools-${PACKAGE_VERSION}*.rpm \
${PACKAGE_DIST}/${PACKAGE_ARCH}/nvidia-container-toolkit-${PACKAGE_VERSION}*.rpm
${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container1-1.*.rpm \
${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container-tools-1.*.rpm \
${PACKAGE_DIST}/${PACKAGE_ARCH}/nvidia-container-toolkit*-${PACKAGE_VERSION}*.rpm
WORKDIR /work
@@ -85,7 +85,7 @@ LABEL release="N/A"
LABEL summary="Automatically Configure your Container Runtime for GPU support."
LABEL description="See summary"
COPY ./LICENSE /licenses/LICENSE
RUN mkdir /licenses && mv /NGC-DL-CONTAINER-LICENSE /licenses/NGC-DL-CONTAINER-LICENSE
# Install / upgrade packages here that are required to resolve CVEs
ARG CVE_UPDATES

View File

@@ -15,15 +15,23 @@
ARG BASE_DIST
ARG CUDA_VERSION
ARG GOLANG_VERSION=x.x.x
ARG VERSION="N/A"
FROM nvidia/cuda:${CUDA_VERSION}-base-${BASE_DIST}
ENV NVIDIA_CONTAINER_TOOLKIT_VERSION="${VERSION}"
ARG VERSION="N/A"
ARG ARTIFACTS_ROOT
COPY ${ARTIFACTS_ROOT} /artifacts/packages/
WORKDIR /artifacts/packages
COPY ./LICENSE /licenses/LICENSE
ARG GIT_BRANCH
ARG GIT_COMMIT
ARG SOURCE_DATE_EPOCH
# Create a manifest.txt file with the absolute paths of all deb and rpm packages in the container
RUN echo "IMAGE_EPOCH=$(date '+%s')" | sed 's/^/#/g' > /artifacts/manifest.txt && \
env | sed 's/^/#/g' >> /artifacts/manifest.txt && \
find /artifacts/packages -iname '*.deb' -o -iname '*.rpm' >> /artifacts/manifest.txt
RUN mkdir /licenses && mv /NGC-DL-CONTAINER-LICENSE /licenses/NGC-DL-CONTAINER-LICENSE

View File

@@ -75,9 +75,9 @@ RUN if [ "${PACKAGE_ARCH}" = "arm64" ]; then \
fi
RUN dpkg -i \
${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container1_${PACKAGE_VERSION}*.deb \
${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container-tools_${PACKAGE_VERSION}*.deb \
${PACKAGE_DIST}/${PACKAGE_ARCH}/nvidia-container-toolkit_${PACKAGE_VERSION}*.deb
${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container1_1.*.deb \
${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container-tools_1.*.deb \
${PACKAGE_DIST}/${PACKAGE_ARCH}/nvidia-container-toolkit*_${PACKAGE_VERSION}*.deb
WORKDIR /work
@@ -93,7 +93,7 @@ LABEL release="N/A"
LABEL summary="Automatically Configure your Container Runtime for GPU support."
LABEL description="See summary"
COPY ./LICENSE /licenses/LICENSE
RUN mkdir /licenses && mv /NGC-DL-CONTAINER-LICENSE /licenses/NGC-DL-CONTAINER-LICENSE
# Install / upgrade packages here that are required to resolve CVEs
ARG CVE_UPDATES

View File

@@ -43,8 +43,8 @@ OUT_IMAGE_TAG = $(OUT_IMAGE_VERSION)-$(DIST)
OUT_IMAGE = $(OUT_IMAGE_NAME):$(OUT_IMAGE_TAG)
##### Public rules #####
DEFAULT_PUSH_TARGET := ubuntu18.04
DISTRIBUTIONS := ubuntu20.04 ubuntu18.04 ubi8 centos7 centos8
DEFAULT_PUSH_TARGET := ubuntu20.04
DISTRIBUTIONS := ubuntu20.04 ubuntu18.04 ubi8 centos7
META_TARGETS := packaging
@@ -94,6 +94,9 @@ $(BUILD_TARGETS): build-%: $(ARTIFACTS_ROOT)
--build-arg PACKAGE_DIST="$(PACKAGE_DIST)" \
--build-arg PACKAGE_VERSION="$(PACKAGE_VERSION)" \
--build-arg VERSION="$(VERSION)" \
--build-arg GIT_COMMIT="$(GIT_COMMIT)" \
--build-arg GIT_BRANCH="$(GIT_BRANCH)" \
--build-arg SOURCE_DATE_EPOCH="$(SOURCE_DATE_EPOCH)" \
--build-arg CVE_UPDATES="$(CVE_UPDATES)" \
-f $(DOCKERFILE) \
$(CURDIR)
@@ -110,10 +113,10 @@ build-ubi8: DOCKERFILE_SUFFIX := centos
build-ubi8: PACKAGE_DIST = centos8
build-ubi8: PACKAGE_VERSION := $(LIB_VERSION)-$(if $(LIB_TAG),0.1.$(LIB_TAG),1)
build-centos%: BASE_DIST = $(*)
build-centos%: DOCKERFILE_SUFFIX := centos
build-centos%: PACKAGE_DIST = $(BASE_DIST)
build-centos%: PACKAGE_VERSION := $(LIB_VERSION)-$(if $(LIB_TAG),0.1.$(LIB_TAG),1)
build-centos7: BASE_DIST = $(*)
build-centos7: DOCKERFILE_SUFFIX := centos
build-centos7: PACKAGE_DIST = $(BASE_DIST)
build-centos7: PACKAGE_VERSION := $(LIB_VERSION)-$(if $(LIB_TAG),0.1.$(LIB_TAG),1)
build-packaging: BASE_DIST := ubuntu20.04
build-packaging: DOCKERFILE_SUFFIX := packaging

View File

@@ -30,5 +30,8 @@ push-short:
# We only have x86_64 packages for centos7
build-centos7: DOCKER_BUILD_PLATFORM_OPTIONS = --platform=linux/amd64
# We only generate amd64 image for ubuntu18.04
build-ubuntu18.04: DOCKER_BUILD_PLATFORM_OPTIONS = --platform=linux/amd64
# We only generate a single image for packaging targets
build-packaging: DOCKER_BUILD_PLATFORM_OPTIONS = --platform=linux/amd64

View File

@@ -13,8 +13,6 @@ import (
"golang.org/x/mod/semver"
)
var envSwarmGPU *string
const (
envCUDAVersion = "CUDA_VERSION"
envNVRequirePrefix = "NVIDIA_REQUIRE_"
@@ -165,43 +163,31 @@ func isPrivileged(s *Spec) bool {
return false
}
func getDevicesFromEnvvar(env map[string]string, legacyImage bool) *string {
// Build a list of envvars to consider.
envVars := []string{envNVVisibleDevices}
if envSwarmGPU != nil {
// The Swarm envvar has higher precedence.
envVars = append([]string{*envSwarmGPU}, envVars...)
}
// Grab a reference to devices from the first envvar
// in the list that actually exists in the environment.
var devices *string
for _, envVar := range envVars {
if devs, ok := env[envVar]; ok {
devices = &devs
func getDevicesFromEnvvar(image image.CUDA, swarmResourceEnvvars []string) *string {
// We check if the image has at least one of the Swarm resource envvars defined and use this
// if specified.
var hasSwarmEnvvar bool
for _, envvar := range swarmResourceEnvvars {
if _, exists := image[envvar]; exists {
hasSwarmEnvvar = true
break
}
}
// Environment variable unset with legacy image: default to "all".
if devices == nil && legacyImage {
all := "all"
return &all
var devices []string
if hasSwarmEnvvar {
devices = image.DevicesFromEnvvars(swarmResourceEnvvars...).List()
} else {
devices = image.DevicesFromEnvvars(envNVVisibleDevices).List()
}
// Environment variable unset or empty or "void": return nil
if devices == nil || len(*devices) == 0 || *devices == "void" {
if len(devices) == 0 {
return nil
}
// Environment variable set to "none": reset to "".
if *devices == "none" {
empty := ""
return &empty
}
devicesString := strings.Join(devices, ",")
// Any other value.
return devices
return &devicesString
}
func getDevicesFromMounts(mounts []Mount) *string {
@@ -241,7 +227,7 @@ func getDevicesFromMounts(mounts []Mount) *string {
return &ret
}
func getDevices(hookConfig *HookConfig, env map[string]string, mounts []Mount, privileged bool, legacyImage bool) *string {
func getDevices(hookConfig *HookConfig, image image.CUDA, mounts []Mount, privileged bool) *string {
// If enabled, try and get the device list from volume mounts first
if hookConfig.AcceptDeviceListAsVolumeMounts {
devices := getDevicesFromMounts(mounts)
@@ -251,7 +237,7 @@ func getDevices(hookConfig *HookConfig, env map[string]string, mounts []Mount, p
}
// Fallback to reading from the environment variable if privileges are correct
devices := getDevicesFromEnvvar(env, legacyImage)
devices := getDevicesFromEnvvar(image, hookConfig.getSwarmResourceEnvvars())
if devices == nil {
return nil
}
@@ -307,7 +293,7 @@ func getNvidiaConfig(hookConfig *HookConfig, image image.CUDA, mounts []Mount, p
legacyImage := image.IsLegacy()
var devices string
if d := getDevices(hookConfig, image, mounts, privileged, legacyImage); d != nil {
if d := getDevices(hookConfig, image, mounts, privileged); d != nil {
devices = *d
} else {
// 'nil' devices means this is not a GPU container.
@@ -369,7 +355,6 @@ func getContainerConfig(hook HookConfig) (config containerConfig) {
}
privileged := isPrivileged(s)
envSwarmGPU = hook.SwarmResource
return containerConfig{
Pid: h.Pid,
Rootfs: s.Root.Path,

View File

@@ -1,9 +1,11 @@
package main
import (
"fmt"
"path/filepath"
"testing"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config/image"
"github.com/stretchr/testify/require"
)
@@ -68,7 +70,7 @@ func TestGetNvidiaConfig(t *testing.T) {
description: "Legacy image, devices 'void', no capabilities, no requirements",
env: map[string]string{
envCUDAVersion: "9.0",
envNVVisibleDevices: "",
envNVVisibleDevices: "void",
},
privileged: false,
expectedConfig: nil,
@@ -225,7 +227,7 @@ func TestGetNvidiaConfig(t *testing.T) {
description: "Modern image, devices 'void', no capabilities, no requirements",
env: map[string]string{
envNVRequireCUDA: "cuda>=9.0",
envNVVisibleDevices: "",
envNVVisibleDevices: "void",
},
privileged: false,
expectedConfig: nil,
@@ -448,6 +450,44 @@ func TestGetNvidiaConfig(t *testing.T) {
DriverCapabilities: defaultDriverCapabilities.String(),
},
},
{
description: "Hook config set, swarmResource overrides device selection",
env: map[string]string{
envNVVisibleDevices: "all",
"DOCKER_SWARM_RESOURCE": "GPU1,GPU2",
},
privileged: true,
hookConfig: &HookConfig{
SwarmResource: func() *string {
s := "DOCKER_SWARM_RESOURCE"
return &s
}(),
SupportedDriverCapabilities: "video,display,utility,compute",
},
expectedConfig: &nvidiaConfig{
Devices: "GPU1,GPU2",
DriverCapabilities: defaultDriverCapabilities.String(),
},
},
{
description: "Hook config set, comma separated swarmResource is split and overrides device selection",
env: map[string]string{
envNVVisibleDevices: "all",
"DOCKER_SWARM_RESOURCE": "GPU1,GPU2",
},
privileged: true,
hookConfig: &HookConfig{
SwarmResource: func() *string {
s := "NOT_DOCKER_SWARM_RESOURCE,DOCKER_SWARM_RESOURCE"
return &s
}(),
SupportedDriverCapabilities: "video,display,utility,compute",
},
expectedConfig: &nvidiaConfig{
Devices: "GPU1,GPU2",
DriverCapabilities: defaultDriverCapabilities.String(),
},
},
}
for _, tc := range tests {
t.Run(tc.description, func(t *testing.T) {
@@ -671,7 +711,7 @@ func TestDeviceListSourcePriority(t *testing.T) {
hookConfig := getDefaultHookConfig()
hookConfig.AcceptEnvvarUnprivileged = tc.acceptUnprivileged
hookConfig.AcceptDeviceListAsVolumeMounts = tc.acceptMounts
devices = getDevices(&hookConfig, env, tc.mountDevices, tc.privileged, false)
devices = getDevices(&hookConfig, env, tc.mountDevices, tc.privileged)
}
// For all other tests, just grab the devices and check the results
@@ -688,13 +728,13 @@ func TestGetDevicesFromEnvvar(t *testing.T) {
envDockerResourceGPUs := "DOCKER_RESOURCE_GPUS"
gpuID := "GPU-12345"
anotherGPUID := "GPU-67890"
thirdGPUID := "MIG-12345"
var tests = []struct {
description string
envSwarmGPU *string
env map[string]string
legacyImage bool
expectedDevices *string
description string
swarmResourceEnvvars []string
env map[string]string
expectedDevices *string
}{
{
description: "empty env returns nil for non-legacy image",
@@ -729,13 +769,15 @@ func TestGetDevicesFromEnvvar(t *testing.T) {
description: "NVIDIA_VISIBLE_DEVICES set returns value for legacy image",
env: map[string]string{
envNVVisibleDevices: gpuID,
envCUDAVersion: "legacy",
},
legacyImage: true,
expectedDevices: &gpuID,
},
{
description: "empty env returns all for legacy image",
legacyImage: true,
description: "empty env returns all for legacy image",
env: map[string]string{
envCUDAVersion: "legacy",
},
expectedDevices: &all,
},
// Add the `DOCKER_RESOURCE_GPUS` envvar and ensure that this is ignored when
@@ -781,86 +823,116 @@ func TestGetDevicesFromEnvvar(t *testing.T) {
env: map[string]string{
envNVVisibleDevices: gpuID,
envDockerResourceGPUs: anotherGPUID,
envCUDAVersion: "legacy",
},
legacyImage: true,
expectedDevices: &gpuID,
},
{
description: "empty env returns all for legacy image",
env: map[string]string{
envDockerResourceGPUs: anotherGPUID,
envCUDAVersion: "legacy",
},
legacyImage: true,
expectedDevices: &all,
},
// Add the `DOCKER_RESOURCE_GPUS` envvar and ensure that this is selected when
// enabled
{
description: "empty env returns nil for non-legacy image",
envSwarmGPU: &envDockerResourceGPUs,
description: "empty env returns nil for non-legacy image",
swarmResourceEnvvars: []string{envDockerResourceGPUs},
},
{
description: "blank DOCKER_RESOURCE_GPUS returns nil for non-legacy image",
envSwarmGPU: &envDockerResourceGPUs,
description: "blank DOCKER_RESOURCE_GPUS returns nil for non-legacy image",
swarmResourceEnvvars: []string{envDockerResourceGPUs},
env: map[string]string{
envDockerResourceGPUs: "",
},
},
{
description: "'void' DOCKER_RESOURCE_GPUS returns nil for non-legacy image",
envSwarmGPU: &envDockerResourceGPUs,
description: "'void' DOCKER_RESOURCE_GPUS returns nil for non-legacy image",
swarmResourceEnvvars: []string{envDockerResourceGPUs},
env: map[string]string{
envDockerResourceGPUs: "void",
},
},
{
description: "'none' DOCKER_RESOURCE_GPUS returns empty for non-legacy image",
envSwarmGPU: &envDockerResourceGPUs,
description: "'none' DOCKER_RESOURCE_GPUS returns empty for non-legacy image",
swarmResourceEnvvars: []string{envDockerResourceGPUs},
env: map[string]string{
envDockerResourceGPUs: "none",
},
expectedDevices: &empty,
},
{
description: "DOCKER_RESOURCE_GPUS set returns value for non-legacy image",
envSwarmGPU: &envDockerResourceGPUs,
description: "DOCKER_RESOURCE_GPUS set returns value for non-legacy image",
swarmResourceEnvvars: []string{envDockerResourceGPUs},
env: map[string]string{
envDockerResourceGPUs: gpuID,
},
expectedDevices: &gpuID,
},
{
description: "DOCKER_RESOURCE_GPUS set returns value for legacy image",
envSwarmGPU: &envDockerResourceGPUs,
description: "DOCKER_RESOURCE_GPUS set returns value for legacy image",
swarmResourceEnvvars: []string{envDockerResourceGPUs},
env: map[string]string{
envDockerResourceGPUs: gpuID,
envCUDAVersion: "legacy",
},
legacyImage: true,
expectedDevices: &gpuID,
},
{
description: "DOCKER_RESOURCE_GPUS is selected if present",
envSwarmGPU: &envDockerResourceGPUs,
description: "DOCKER_RESOURCE_GPUS is selected if present",
swarmResourceEnvvars: []string{envDockerResourceGPUs},
env: map[string]string{
envDockerResourceGPUs: anotherGPUID,
},
expectedDevices: &anotherGPUID,
},
{
description: "DOCKER_RESOURCE_GPUS overrides NVIDIA_VISIBLE_DEVICES if present",
envSwarmGPU: &envDockerResourceGPUs,
description: "DOCKER_RESOURCE_GPUS overrides NVIDIA_VISIBLE_DEVICES if present",
swarmResourceEnvvars: []string{envDockerResourceGPUs},
env: map[string]string{
envNVVisibleDevices: gpuID,
envDockerResourceGPUs: anotherGPUID,
},
expectedDevices: &anotherGPUID,
},
{
description: "DOCKER_RESOURCE_GPUS_ADDITIONAL overrides NVIDIA_VISIBLE_DEVICES if present",
swarmResourceEnvvars: []string{"DOCKER_RESOURCE_GPUS_ADDITIONAL"},
env: map[string]string{
envNVVisibleDevices: gpuID,
"DOCKER_RESOURCE_GPUS_ADDITIONAL": anotherGPUID,
},
expectedDevices: &anotherGPUID,
},
{
description: "All available swarm resource envvars are selected and override NVIDIA_VISIBLE_DEVICES if present",
swarmResourceEnvvars: []string{"DOCKER_RESOURCE_GPUS", "DOCKER_RESOURCE_GPUS_ADDITIONAL"},
env: map[string]string{
envNVVisibleDevices: gpuID,
"DOCKER_RESOURCE_GPUS": thirdGPUID,
"DOCKER_RESOURCE_GPUS_ADDITIONAL": anotherGPUID,
},
expectedDevices: func() *string {
result := fmt.Sprintf("%s,%s", thirdGPUID, anotherGPUID)
return &result
}(),
},
{
description: "DOCKER_RESOURCE_GPUS_ADDITIONAL or DOCKER_RESOURCE_GPUS override NVIDIA_VISIBLE_DEVICES if present",
swarmResourceEnvvars: []string{"DOCKER_RESOURCE_GPUS", "DOCKER_RESOURCE_GPUS_ADDITIONAL"},
env: map[string]string{
envNVVisibleDevices: gpuID,
"DOCKER_RESOURCE_GPUS_ADDITIONAL": anotherGPUID,
},
expectedDevices: &anotherGPUID,
},
}
for i, tc := range tests {
t.Run(tc.description, func(t *testing.T) {
envSwarmGPU = tc.envSwarmGPU
devices := getDevicesFromEnvvar(tc.env, tc.legacyImage)
devices := getDevicesFromEnvvar(image.CUDA(tc.env), tc.swarmResourceEnvvars)
if tc.expectedDevices == nil {
require.Nil(t, devices, "%d: %v", i, tc)
return

View File

@@ -5,6 +5,7 @@ import (
"os"
"path"
"reflect"
"strings"
"github.com/BurntSushi/toml"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
@@ -34,7 +35,7 @@ type CLIConfig struct {
Ldconfig *string `toml:"ldconfig"`
}
// HookConfig : options for the nvidia-container-toolkit.
// HookConfig : options for the nvidia-container-runtime-hook.
type HookConfig struct {
DisableRequire bool `toml:"disable-require"`
SwarmResource *string `toml:"swarm-resource"`
@@ -116,3 +117,22 @@ func (c HookConfig) getConfigOption(fieldName string) string {
}
return v
}
// getSwarmResourceEnvvars returns the swarm resource envvars for the config.
func (c *HookConfig) getSwarmResourceEnvvars() []string {
if c.SwarmResource == nil {
return nil
}
candidates := strings.Split(*c.SwarmResource, ",")
var envvars []string
for _, c := range candidates {
trimmed := strings.TrimSpace(c)
if len(trimmed) > 0 {
envvars = append(envvars, trimmed)
}
}
return envvars
}

View File

@@ -103,3 +103,59 @@ func TestGetHookConfig(t *testing.T) {
})
}
}
func TestGetSwarmResourceEnvvars(t *testing.T) {
testCases := []struct {
value string
expected []string
}{
{
value: "nil",
expected: nil,
},
{
value: "",
expected: nil,
},
{
value: " ",
expected: nil,
},
{
value: "single",
expected: []string{"single"},
},
{
value: "single ",
expected: []string{"single"},
},
{
value: "one,two",
expected: []string{"one", "two"},
},
{
value: "one ,two",
expected: []string{"one", "two"},
},
{
value: "one, two",
expected: []string{"one", "two"},
},
}
for i, tc := range testCases {
t.Run(fmt.Sprintf("%d", i), func(t *testing.T) {
c := &HookConfig{
SwarmResource: func() *string {
if tc.value == "nil" {
return nil
}
return &tc.value
}(),
}
envvars := c.getSwarmResourceEnvvars()
require.EqualValues(t, tc.expected, envvars)
})
}
}

View File

@@ -75,7 +75,7 @@ func doPrestart() {
cli := hook.NvidiaContainerCLI
if info.ResolveAutoMode(&logInterceptor{}, hook.NVIDIAContainerRuntime.Mode) != "legacy" {
log.Panicln("invoking the NVIDIA Container Runtime Hook directly (e.g. specifying the docker --gpus flag) is not supported. Please use the NVIDIA Container Runtime instead.")
log.Panicln("invoking the NVIDIA Container Runtime Hook directly (e.g. specifying the docker --gpus flag) is not supported. Please use the NVIDIA Container Runtime (e.g. specify the --runtime=nvidia flag) instead.")
}
container := getContainerConfig(hook)

View File

@@ -2,24 +2,86 @@
The NVIDIA Container Runtime is a shim for OCI-compliant low-level runtimes such as [runc](https://github.com/opencontainers/runc). When a `create` command is detected, the incoming [OCI runtime specification](https://github.com/opencontainers/runtime-spec) is modified in place and the command is forwarded to the low-level runtime.
## Standard Mode
## Configuration
In the standard mode configuration, the NVIDIA Container Runtime adds a [`prestart` hook](https://github.com/opencontainers/runtime-spec/blob/master/config.md#prestart) to the incomming OCI specification that invokes the NVIDIA Container Runtime Hook for all containers created. This hook checks whether NVIDIA devices are requested and ensures GPU access is configured using the `nvidia-container-cli` from project [libnvidia-container](https://github.com/NVIDIA/libnvidia-container).
The NVIDIA Container Runtime uses file-based configuration, with the config stored in `/etc/nvidia-container-runtime/config.toml`. The `/etc` path can be overridden using the `XDG_CONFIG_HOME` environment variable with the `${XDG_CONFIG_HOME}/nvidia-container-runtime/config.toml` file used instead if this environment variable is set.
## Experimental Mode
This config file may contain options for other components of the NVIDIA container stack and for the NVIDIA Container Runtime, the relevant config section is `nvidia-container-runtime`
The NVIDIA Container Runtime can be configured in an experimental mode by setting the following options in the runtime's `config.toml` file:
### Logging
The `log-level` config option (default: `"info"`) specifies the log level to use and the `debug` option, if set, specifies a log file to which logs for the NVIDIA Container Runtime must be written.
In addition to this, the NVIDIA Container Runtime considers the value of `--log` and `--log-format` flags that may be passed to it by a container runtime such as docker or containerd. If the `--debug` flag is present the log-level specified in the config file is overridden as `"debug"`.
### Low-level Runtime Path
The `runtimes` config option allows for the low-level runtime to be specified. The first entry in this list that is an existing executable file is used as the low-level runtime. If the entry is not a path, the `PATH` is searched for a matching executable. If the entry is a path this is checked instead.
The default value for this setting is:
```toml
runtimes = [
"docker-runc",
"runc",
]
```
and if, for example, `crun` is to be used instead this can be changed to:
```toml
runtimes = [
"crun",
]
```
### Runtime Mode
The `mode` config option (default `"auto"`) controls the high-level behaviour of the runtime.
#### Auto Mode
When `mode` is set to `"auto"`, the runtime employs heuristics to determine which mode to use based on, for example, the platform where the runtime is being run.
#### Legacy Mode
When `mode` is set to `"legacy"`, the NVIDIA Container Runtime adds a [`prestart` hook](https://github.com/opencontainers/runtime-spec/blob/master/config.md#prestart) to the incomming OCI specification that invokes the NVIDIA Container Runtime Hook for all containers created. This hook checks whether NVIDIA devices are requested and ensures GPU access is configured using the `nvidia-container-cli` from the [libnvidia-container](https://github.com/NVIDIA/libnvidia-container) project.
#### CSV Mode
When `mode` is set to `"csv"`, CSV files at `/etc/nvidia-container-runtime/host-files-for-container.d` define the devices and mounts that are to be injected into a container when it is created. The search path for the files can be overridden by modifying the `nvidia-container-runtime.modes.csv.mount-spec-path` in the config as below:
```toml
[nvidia-container-runtime]
experimental = true
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"
```
When this setting is enabled, the modifications made to the OCI specification are controlled by the `nvidia-container-runtime.discover-mode` option, with the following mode supported:
* `"legacy"`: This mode mirrors the behaviour of the standard mode, inserting the NVIDIA Container Runtime Hook as a `prestart` hook into the container's OCI specification.
* `"csv"`: This mode uses CSV files at `/etc/nvidia-container-runtime/host-files-for-container.d` to define the devices and mounts that are to be injected into a container when it is created.
This mode is primarily targeted at Tegra-based systems without NVML available.
### Notes on using the docker CLI
The `docker` CLI supports the `--gpus` flag to select GPUs for inclusion in a container. Since specifying this flag inserts the same NVIDIA Container Runtime Hook into the OCI runtime specification. When experimental mode is activated, the NVIDIA Container Runtime detects the presence of the hook and raises an error. This requirement will be relaxed in the near future.
Note that only the `"legacy"` NVIDIA Container Runtime mode is directly compatible with the `--gpus` flag implemented by the `docker` CLI (assuming the NVIDIA Container Runtime is not used). The reason for this is that `docker` inserts the same NVIDIA Container Runtime Hook into the OCI runtime specification.
If a different mode is explicitly set or detected, the NVIDIA Container Runtime Hook will raise the following error when `--gpus` is set:
```
$ docker run --rm --gpus all ubuntu:18.04
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'csv'
invoking the NVIDIA Container Runtime Hook directly (e.g. specifying the docker --gpus flag) is not supported. Please use the NVIDIA Container Runtime instead.: unknown.
```
Here NVIDIA Container Runtime must be used explicitly. The recommended way to do this is to specify the `--runtime=nvidia` command line argument as part of the `docker run` commmand as follows:
```
$ docker run --rm --gpus all --runtime=nvidia ubuntu:18.04
```
Alternatively the NVIDIA Container Runtime can be set as the default runtime for docker. This can be done by modifying the `/etc/docker/daemon.json` file as follows:
```json
{
"default-runtime": "nvidia",
"runtimes": {
"nvidia": {
"path": "nvidia-container-runtime",
"runtimeArgs": []
}
}
}
```

View File

@@ -48,7 +48,12 @@ func run(argv []string) (rerr error) {
if err != nil {
return fmt.Errorf("failed to set up logger: %v", err)
}
defer logger.Reset()
defer func() {
if rerr != nil {
logger.Errorf("%v", rerr)
}
logger.Reset()
}()
logger.Debugf("Command line arguments: %v", argv)
runtime, err := newNVIDIAContainerRuntime(logger.Logger, cfg, argv)

View File

@@ -10,7 +10,7 @@ import (
"strings"
"testing"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-container-runtime/modifier"
"github.com/NVIDIA/nvidia-container-toolkit/internal/modifier"
"github.com/NVIDIA/nvidia-container-toolkit/internal/test"
"github.com/opencontainers/runtime-spec/specs-go"
"github.com/stretchr/testify/require"

View File

@@ -19,9 +19,9 @@ package main
import (
"fmt"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-container-runtime/modifier"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/internal/info"
"github.com/NVIDIA/nvidia-container-toolkit/internal/modifier"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
"github.com/NVIDIA/nvidia-container-toolkit/internal/runtime"
"github.com/sirupsen/logrus"
@@ -62,11 +62,49 @@ func newNVIDIAContainerRuntime(logger *logrus.Logger, cfg *config.Config, argv [
// newSpecModifier is a factory method that creates constructs an OCI spec modifer based on the provided config.
func newSpecModifier(logger *logrus.Logger, cfg *config.Config, ociSpec oci.Spec, argv []string) (oci.SpecModifier, error) {
modeModifier, err := newModeModifier(logger, cfg, ociSpec, argv)
if err != nil {
return nil, err
}
graphicsModifier, err := modifier.NewGraphicsModifier(logger, cfg, ociSpec)
if err != nil {
return nil, err
}
gdsModifier, err := modifier.NewGDSModifier(logger, cfg, ociSpec)
if err != nil {
return nil, err
}
mofedModifier, err := modifier.NewMOFEDModifier(logger, cfg, ociSpec)
if err != nil {
return nil, err
}
tegraModifier, err := modifier.NewTegraPlatformFiles(logger)
if err != nil {
return nil, err
}
modifiers := modifier.Merge(
modeModifier,
graphicsModifier,
gdsModifier,
mofedModifier,
tegraModifier,
)
return modifiers, nil
}
func newModeModifier(logger *logrus.Logger, cfg *config.Config, ociSpec oci.Spec, argv []string) (oci.SpecModifier, error) {
switch info.ResolveAutoMode(logger, cfg.NVIDIAContainerRuntimeConfig.Mode) {
case "legacy":
return modifier.NewStableRuntimeModifier(logger), nil
case "csv":
return modifier.NewCSVModifier(logger, cfg, ociSpec)
case "cdi":
return modifier.NewCDIModifier(logger, cfg, ociSpec)
}
return nil, fmt.Errorf("invalid runtime mode: %v", cfg.NVIDIAContainerRuntimeConfig.Mode)

View File

@@ -1,3 +1,48 @@
# NVIDIA Container Toolkit CLI
The NVIDIA Container Toolkit CLI `nvidia-ctk` provides a number of utilities that are useful for working with the NVIDIA Container Toolkit.
## Functionality
### Configure runtimes
The `runtime` command of the `nvidia-ctk` CLI provides a set of utilities to related to the configuration
and management of supported container engines.
For example, running the following command:
```bash
nvidia-ctk runtime configure --set-as-default
```
will ensure that the NVIDIA Container Runtime is added as the default runtime to the default container
engine.
### Generate CDI specifications
The [Container Device Interface (CDI)](https://github.com/container-orchestrated-devices/container-device-interface) provides
a vendor-agnostic mechanism to make arbitrary devices accessible in containerized environments. To allow NVIDIA devices to be
used in these environments, the NVIDIA Container Toolkit CLI includes functionality to generate a CDI specification for the
available NVIDIA GPUs in a system.
In order to generate the CDI specification for the available devices, run the following command:\
```bash
nvidia-ctk cdi generate
```
The default is to print the specification to STDOUT and a filename can be specified using the `--output` flag.
The specification will contain a device entries as follows (where applicable):
* An `nvidia.com/gpu=gpu{INDEX}` device for each non-MIG-enabled full GPU in the system
* An `nvidia.com/gpu=mig{GPU_INDEX}:{MIG_INDEX}` device for each MIG-device in the system
* A special device called `nvidia.com/gpu=all` which represents all available devices.
For example, to generate the CDI specification in the default location where CDI-enabled tools such as `podman`, `containerd`, `cri-o`, or the NVIDIA Container Runtime can be configured to load it, the following command can be run:
```bash
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```
(Note that `sudo` is used to ensure the correct permissions to write to the `/etc/cdi` folder)
With the specification generated, a GPU can be requested by specifying the fully-qualified CDI device name. With `podman` as an exmaple:
```bash
podman run --rm -ti --device=nvidia.com/gpu=gpu0 ubuntu nvidia-smi -L
```

50
cmd/nvidia-ctk/cdi/cdi.go Normal file
View File

@@ -0,0 +1,50 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package cdi
import (
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/cdi/generate"
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
)
type command struct {
logger *logrus.Logger
}
// NewCommand constructs an info command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
c := command{
logger: logger,
}
return c.build()
}
// build
func (m command) build() *cli.Command {
// Create the 'hook' command
hook := cli.Command{
Name: "cdi",
Usage: "Provide tools for interacting with Container Device Interface specifications",
}
hook.Subcommands = []*cli.Command{
generate.NewCommand(m.logger),
}
return &hook
}

View File

@@ -0,0 +1,63 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package generate
import (
"fmt"
"github.com/NVIDIA/nvidia-container-toolkit/internal/discover"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup"
"github.com/sirupsen/logrus"
"gitlab.com/nvidia/cloud-native/go-nvlib/pkg/nvml"
)
// NewCommonDiscoverer returns a discoverer for entities that are not associated with a specific CDI device.
// This includes driver libraries and meta devices, for example.
func NewCommonDiscoverer(logger *logrus.Logger, root string, nvidiaCTKPath string, nvmllib nvml.Interface) (discover.Discover, error) {
metaDevices := discover.NewDeviceDiscoverer(
logger,
lookup.NewCharDeviceLocator(
lookup.WithLogger(logger),
lookup.WithRoot(root),
),
root,
[]string{
"/dev/nvidia-modeset",
"/dev/nvidia-uvm-tools",
"/dev/nvidia-uvm",
"/dev/nvidiactl",
},
)
graphicsMounts, err := discover.NewGraphicsMountsDiscoverer(logger, root)
if err != nil {
return nil, fmt.Errorf("error constructing discoverer for graphics mounts: %v", err)
}
driverFiles, err := NewDriverDiscoverer(logger, root, nvidiaCTKPath, nvmllib)
if err != nil {
return nil, fmt.Errorf("failed to create discoverer for driver files: %v", err)
}
d := discover.Merge(
metaDevices,
graphicsMounts,
driverFiles,
)
return d, nil
}

View File

@@ -0,0 +1,105 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package generate
import (
"path/filepath"
"github.com/NVIDIA/nvidia-container-toolkit/internal/discover"
"github.com/container-orchestrated-devices/container-device-interface/specs-go"
"github.com/sirupsen/logrus"
)
type deviceFolderPermissions struct {
logger *logrus.Logger
root string
nvidiaCTKPath string
folders []string
}
var _ discover.Discover = (*deviceFolderPermissions)(nil)
// NewDeviceFolderPermissionHookDiscoverer creates a discoverer that can be used to update the permissions for the parent folders of nested device nodes from the specified set of device specs.
// This works around an issue with rootless podman when using crun as a low-level runtime.
// See https://github.com/containers/crun/issues/1047
// The nested devices that are applicable to the NVIDIA GPU devices are:
// - DRM devices at /dev/dri/*
// - NVIDIA Caps devices at /dev/nvidia-caps/*
func NewDeviceFolderPermissionHookDiscoverer(logger *logrus.Logger, root string, nvidiaCTKPath string, deviceSpecs []specs.Device) (discover.Discover, error) {
var folders []string
seen := make(map[string]bool)
for _, device := range deviceSpecs {
for _, dn := range device.ContainerEdits.DeviceNodes {
df := filepath.Dir(dn.Path)
if seen[df] {
continue
}
// We only consider the special case paths
if df != "/dev/dri" && df != "/dev/nvidia-caps" {
continue
}
folders = append(folders, df)
seen[df] = true
}
if len(folders) == 2 {
break
}
}
if len(folders) == 0 {
return discover.None{}, nil
}
d := &deviceFolderPermissions{
logger: logger,
root: root,
nvidiaCTKPath: nvidiaCTKPath,
folders: folders,
}
return d, nil
}
// Devices are empty for this discoverer
func (d *deviceFolderPermissions) Devices() ([]discover.Device, error) {
return nil, nil
}
// Hooks returns a set of hooks that sets the file mode to 755 of parent folders for nested device nodes.
func (d *deviceFolderPermissions) Hooks() ([]discover.Hook, error) {
if len(d.folders) == 0 {
return nil, nil
}
args := []string{"--mode", "755"}
for _, folder := range d.folders {
args = append(args, "--path", folder)
}
hook := discover.CreateNvidiaCTKHook(
d.nvidiaCTKPath,
"chmod",
args...,
)
return []discover.Hook{hook}, nil
}
// Mounts are empty for this discoverer
func (d *deviceFolderPermissions) Mounts() ([]discover.Mount, error) {
return nil, nil
}

View File

@@ -0,0 +1,156 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package generate
import (
"fmt"
"path/filepath"
"strings"
"github.com/NVIDIA/nvidia-container-toolkit/internal/discover"
"github.com/NVIDIA/nvidia-container-toolkit/internal/ldcache"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup"
"github.com/sirupsen/logrus"
"gitlab.com/nvidia/cloud-native/go-nvlib/pkg/nvml"
)
// NewDriverDiscoverer creates a discoverer for the libraries and binaries associated with a driver installation.
// The supplied NVML Library is used to query the expected driver version.
func NewDriverDiscoverer(logger *logrus.Logger, root string, nvidiaCTKPath string, nvmllib nvml.Interface) (discover.Discover, error) {
version, r := nvmllib.SystemGetDriverVersion()
if r != nvml.SUCCESS {
return nil, fmt.Errorf("failed to determine driver version: %v", r)
}
libraries, err := NewDriverLibraryDiscoverer(logger, root, nvidiaCTKPath, version)
if err != nil {
return nil, fmt.Errorf("failed to create discoverer for driver libraries: %v", err)
}
firmwares := NewDriverFirmwareDiscoverer(logger, root, version)
binaries := NewDriverBinariesDiscoverer(logger, root)
d := discover.Merge(
libraries,
firmwares,
binaries,
)
return d, nil
}
// NewDriverLibraryDiscoverer creates a discoverer for the libraries associated with the specified driver version.
func NewDriverLibraryDiscoverer(logger *logrus.Logger, root string, nvidiaCTKPath string, version string) (discover.Discover, error) {
libraryPaths, err := getVersionLibs(logger, root, version)
if err != nil {
return nil, fmt.Errorf("failed to get libraries for driver version: %v", err)
}
libraries := discover.NewMounts(
logger,
lookup.NewFileLocator(
lookup.WithLogger(logger),
lookup.WithRoot(root),
),
root,
libraryPaths,
)
cfg := &discover.Config{
Root: root,
NvidiaCTKPath: nvidiaCTKPath,
}
hooks, _ := discover.NewLDCacheUpdateHook(logger, libraries, cfg)
d := discover.Merge(
libraries,
hooks,
)
return d, nil
}
// NewDriverFirmwareDiscoverer creates a discoverer for GSP firmware associated with the specified driver version.
func NewDriverFirmwareDiscoverer(logger *logrus.Logger, root string, version string) discover.Discover {
gspFirmwarePath := filepath.Join("/lib/firmware/nvidia", version, "gsp.bin")
return discover.NewMounts(
logger,
lookup.NewFileLocator(
lookup.WithLogger(logger),
lookup.WithRoot(root),
),
root,
[]string{gspFirmwarePath},
)
}
// NewDriverBinariesDiscoverer creates a discoverer for GSP firmware associated with the GPU driver.
func NewDriverBinariesDiscoverer(logger *logrus.Logger, root string) discover.Discover {
return discover.NewMounts(
logger,
lookup.NewExecutableLocator(logger, root),
root,
[]string{
"nvidia-smi", /* System management interface */
"nvidia-debugdump", /* GPU coredump utility */
"nvidia-persistenced", /* Persistence mode utility */
"nvidia-cuda-mps-control", /* Multi process service CLI */
"nvidia-cuda-mps-server", /* Multi process service server */
},
)
}
// getVersionLibs checks the LDCache for libraries ending in the specified driver version.
// Although the ldcache at the specified root is queried, the paths are returned relative to this root.
// This allows the standard mount location logic to be used for resolving the mounts.
func getVersionLibs(logger *logrus.Logger, root string, version string) ([]string, error) {
logger.Infof("Using driver version %v", version)
cache, err := ldcache.New(logger, root)
if err != nil {
return nil, fmt.Errorf("failed to load ldcache: %v", err)
}
libs32, libs64 := cache.List()
var libs []string
for _, l := range libs64 {
if strings.HasSuffix(l, version) {
logger.Infof("found 64-bit driver lib: %v", l)
libs = append(libs, l)
}
}
for _, l := range libs32 {
if strings.HasSuffix(l, version) {
logger.Infof("found 32-bit driver lib: %v", l)
libs = append(libs, l)
}
}
if root == "/" || root == "" {
return libs, nil
}
var relative []string
for _, l := range libs {
relative = append(relative, strings.TrimPrefix(l, root))
}
return relative, nil
}

View File

@@ -0,0 +1,160 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package generate
import (
"fmt"
"os"
"path/filepath"
"strings"
"github.com/NVIDIA/nvidia-container-toolkit/internal/discover"
"github.com/NVIDIA/nvidia-container-toolkit/internal/info/drm"
"github.com/sirupsen/logrus"
"gitlab.com/nvidia/cloud-native/go-nvlib/pkg/nvlib/device"
"gitlab.com/nvidia/cloud-native/go-nvlib/pkg/nvml"
)
// byPathHookDiscoverer discovers the entities required for injecting by-path DRM device links
type byPathHookDiscoverer struct {
logger *logrus.Logger
root string
nvidiaCTKPath string
pciBusID string
}
var _ discover.Discover = (*byPathHookDiscoverer)(nil)
// NewFullGPUDiscoverer creates a discoverer for the full GPU defined by the specified device.
func NewFullGPUDiscoverer(logger *logrus.Logger, root string, nvidiaCTKPath string, d device.Device) (discover.Discover, error) {
// TODO: The functionality to get device paths should be integrated into the go-nvlib/pkg/device.Device interface.
// This will allow reuse here and in other code where the paths are queried such as the NVIDIA device plugin.
minor, ret := d.GetMinorNumber()
if ret != nvml.SUCCESS {
return nil, fmt.Errorf("error getting GPU device minor number: %v", ret)
}
path := fmt.Sprintf("/dev/nvidia%d", minor)
pciInfo, ret := d.GetPciInfo()
if ret != nvml.SUCCESS {
return nil, fmt.Errorf("error getting PCI info for device: %v", ret)
}
pciBusID := getBusID(pciInfo)
drmDeviceNodes, err := drm.GetDeviceNodesByBusID(pciBusID)
if err != nil {
return nil, fmt.Errorf("failed to determine DRM devices for %v: %v", pciBusID, err)
}
deviceNodePaths := append([]string{path}, drmDeviceNodes...)
deviceNodes := discover.NewCharDeviceDiscoverer(
logger,
deviceNodePaths,
root,
)
byPathHooks := &byPathHookDiscoverer{
logger: logger,
root: root,
nvidiaCTKPath: nvidiaCTKPath,
pciBusID: pciBusID,
}
dd := discover.Merge(
deviceNodes,
byPathHooks,
)
return dd, nil
}
// Devices returns the empty list for the by-path hook discoverer
func (d *byPathHookDiscoverer) Devices() ([]discover.Device, error) {
return nil, nil
}
// Hooks returns the hooks for the GPU device.
// The following hooks are detected:
// 1. A hook to create /dev/dri/by-path symlinks
func (d *byPathHookDiscoverer) Hooks() ([]discover.Hook, error) {
links, err := d.deviceNodeLinks()
if err != nil {
return nil, fmt.Errorf("failed to discover DRA device links: %v", err)
}
if len(links) == 0 {
return nil, nil
}
var args []string
for _, l := range links {
args = append(args, "--link", l)
}
hook := discover.CreateNvidiaCTKHook(
d.nvidiaCTKPath,
"create-symlinks",
args...,
)
return []discover.Hook{hook}, nil
}
// Mounts returns an empty slice for a full GPU
func (d *byPathHookDiscoverer) Mounts() ([]discover.Mount, error) {
return nil, nil
}
func (d *byPathHookDiscoverer) deviceNodeLinks() ([]string, error) {
candidates := []string{
fmt.Sprintf("/dev/dri/by-path/pci-%s-card", d.pciBusID),
fmt.Sprintf("/dev/dri/by-path/pci-%s-render", d.pciBusID),
}
var links []string
for _, c := range candidates {
linkPath := filepath.Join(d.root, c)
device, err := os.Readlink(linkPath)
if err != nil {
d.logger.Warningf("Failed to evaluate symlink %v; ignoring", linkPath)
continue
}
d.logger.Debugf("adding device symlink %v -> %v", linkPath, device)
links = append(links, fmt.Sprintf("%v::%v", device, linkPath))
}
return links, nil
}
// getBusID provides a utility function that returns the string representation of the bus ID.
func getBusID(p nvml.PciInfo) string {
var bytes []byte
for _, b := range p.BusId {
if byte(b) == '\x00' {
break
}
bytes = append(bytes, byte(b))
}
id := strings.ToLower(string(bytes))
if id != "0000" {
id = strings.TrimPrefix(id, "0000")
}
return id
}

View File

@@ -0,0 +1,383 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package generate
import (
"fmt"
"io"
"os"
"path/filepath"
"strings"
"github.com/NVIDIA/nvidia-container-toolkit/internal/discover"
"github.com/NVIDIA/nvidia-container-toolkit/internal/edits"
"github.com/container-orchestrated-devices/container-device-interface/pkg/cdi"
specs "github.com/container-orchestrated-devices/container-device-interface/specs-go"
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
"gitlab.com/nvidia/cloud-native/go-nvlib/pkg/nvlib/device"
"gitlab.com/nvidia/cloud-native/go-nvlib/pkg/nvml"
"sigs.k8s.io/yaml"
)
const (
formatJSON = "json"
formatYAML = "yaml"
)
type command struct {
logger *logrus.Logger
}
type config struct {
output string
format string
deviceNameStrategy string
root string
nvidiaCTKPath string
}
// NewCommand constructs a generate-cdi command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
c := command{
logger: logger,
}
return c.build()
}
// build creates the CLI command
func (m command) build() *cli.Command {
cfg := config{}
// Create the 'generate-cdi' command
c := cli.Command{
Name: "generate",
Usage: "Generate CDI specifications for use with CDI-enabled runtimes",
Before: func(c *cli.Context) error {
return m.validateFlags(c, &cfg)
},
Action: func(c *cli.Context) error {
return m.run(c, &cfg)
},
}
c.Flags = []cli.Flag{
&cli.StringFlag{
Name: "output",
Usage: "Specify the file to output the generated CDI specification to. If this is '' the specification is output to STDOUT",
Destination: &cfg.output,
},
&cli.StringFlag{
Name: "format",
Usage: "The output format for the generated spec [json | yaml]. This overrides the format defined by the output file extension (if specified).",
Value: formatYAML,
Destination: &cfg.format,
},
&cli.StringFlag{
Name: "device-name-strategy",
Usage: "Specify the strategy for generating device names. One of [index | uuid | type-index]",
Value: deviceNameStrategyIndex,
Destination: &cfg.deviceNameStrategy,
},
&cli.StringFlag{
Name: "root",
Usage: "Specify the root to use when discovering the entities that should be included in the CDI specification.",
Destination: &cfg.root,
},
&cli.StringFlag{
Name: "nvidia-ctk-path",
Usage: "Specify the path to use for the nvidia-ctk in the generated CDI specification. If this is left empty, the path will be searched.",
Destination: &cfg.nvidiaCTKPath,
},
}
return &c
}
func (m command) validateFlags(r *cli.Context, cfg *config) error {
cfg.format = strings.ToLower(cfg.format)
switch cfg.format {
case formatJSON:
case formatYAML:
default:
return fmt.Errorf("invalid output format: %v", cfg.format)
}
_, err := newDeviceNamer(cfg.deviceNameStrategy)
if err != nil {
return err
}
return nil
}
func (m command) run(c *cli.Context, cfg *config) error {
deviceNamer, err := newDeviceNamer(cfg.deviceNameStrategy)
if err != nil {
return fmt.Errorf("failed to create device namer: %v", err)
}
spec, err := m.generateSpec(
cfg.root,
discover.FindNvidiaCTK(m.logger, cfg.nvidiaCTKPath),
deviceNamer,
)
if err != nil {
return fmt.Errorf("failed to generate CDI spec: %v", err)
}
var outputTo io.Writer
if cfg.output == "" {
outputTo = os.Stdout
} else {
err := createParentDirsIfRequired(cfg.output)
if err != nil {
return fmt.Errorf("failed to create parent folders for output file: %v", err)
}
outputFile, err := os.Create(cfg.output)
if err != nil {
return fmt.Errorf("failed to create output file: %v", err)
}
defer outputFile.Close()
outputTo = outputFile
}
if outputFileFormat := formatFromFilename(cfg.output); outputFileFormat != "" {
m.logger.Debugf("Inferred output format as %q from output file name", outputFileFormat)
if !c.IsSet("format") {
cfg.format = outputFileFormat
} else if outputFileFormat != cfg.format {
m.logger.Warningf("Requested output format %q does not match format implied by output file name: %q", cfg.format, outputFileFormat)
}
}
data, err := yaml.Marshal(spec)
if err != nil {
return fmt.Errorf("failed to marshal CDI spec: %v", err)
}
if strings.ToLower(cfg.format) == formatJSON {
data, err = yaml.YAMLToJSONStrict(data)
if err != nil {
return fmt.Errorf("failed to convert CDI spec from YAML to JSON: %v", err)
}
}
err = writeToOutput(cfg.format, data, outputTo)
if err != nil {
return fmt.Errorf("failed to write output: %v", err)
}
return nil
}
func formatFromFilename(filename string) string {
ext := filepath.Ext(filename)
switch strings.ToLower(ext) {
case ".json":
return formatJSON
case ".yaml":
return formatYAML
case ".yml":
return formatYAML
}
return ""
}
func writeToOutput(format string, data []byte, output io.Writer) error {
if format == formatYAML {
_, err := output.Write([]byte("---\n"))
if err != nil {
return fmt.Errorf("failed to write YAML separator: %v", err)
}
}
_, err := output.Write(data)
if err != nil {
return fmt.Errorf("failed to write data: %v", err)
}
return nil
}
func (m command) generateSpec(root string, nvidiaCTKPath string, namer deviceNamer) (*specs.Spec, error) {
nvmllib := nvml.New()
if r := nvmllib.Init(); r != nvml.SUCCESS {
return nil, r
}
defer nvmllib.Shutdown()
devicelib := device.New(device.WithNvml(nvmllib))
deviceSpecs, err := m.generateDeviceSpecs(devicelib, root, nvidiaCTKPath, namer)
if err != nil {
return nil, fmt.Errorf("failed to create device CDI specs: %v", err)
}
allDevice := createAllDevice(deviceSpecs)
deviceSpecs = append(deviceSpecs, allDevice)
allEdits := edits.NewContainerEdits()
ipcs, err := NewIPCDiscoverer(m.logger, root)
if err != nil {
return nil, fmt.Errorf("failed to create discoverer for IPC sockets: %v", err)
}
ipcEdits, err := edits.FromDiscoverer(ipcs)
if err != nil {
return nil, fmt.Errorf("failed to create container edits for IPC sockets: %v", err)
}
// TODO: We should not have to update this after the fact
for _, s := range ipcEdits.Mounts {
s.Options = append(s.Options, "noexec")
}
allEdits.Append(ipcEdits)
common, err := NewCommonDiscoverer(m.logger, root, nvidiaCTKPath, nvmllib)
if err != nil {
return nil, fmt.Errorf("failed to create discoverer for common entities: %v", err)
}
deviceFolderPermissionHooks, err := NewDeviceFolderPermissionHookDiscoverer(m.logger, root, nvidiaCTKPath, deviceSpecs)
if err != nil {
return nil, fmt.Errorf("failed to generated permission hooks for device nodes: %v", err)
}
commonEdits, err := edits.FromDiscoverer(discover.Merge(common, deviceFolderPermissionHooks))
if err != nil {
return nil, fmt.Errorf("failed to create container edits for common entities: %v", err)
}
allEdits.Append(commonEdits)
// We construct the spec and determine the minimum required version based on the specification.
spec := specs.Spec{
Version: "NOT_SET",
Kind: "nvidia.com/gpu",
Devices: deviceSpecs,
ContainerEdits: *allEdits.ContainerEdits,
}
minVersion, err := cdi.MinimumRequiredVersion(&spec)
if err != nil {
return nil, fmt.Errorf("failed to get minumum required CDI spec version: %v", err)
}
m.logger.Infof("Using minimum required CDI spec version: %s", minVersion)
spec.Version = minVersion
return &spec, nil
}
func (m command) generateDeviceSpecs(devicelib device.Interface, root string, nvidiaCTKPath string, namer deviceNamer) ([]specs.Device, error) {
var deviceSpecs []specs.Device
err := devicelib.VisitDevices(func(i int, d device.Device) error {
isMigEnabled, err := d.IsMigEnabled()
if err != nil {
return fmt.Errorf("failed to check whether device is MIG device: %v", err)
}
if isMigEnabled {
return nil
}
device, err := NewFullGPUDiscoverer(m.logger, root, nvidiaCTKPath, d)
if err != nil {
return fmt.Errorf("failed to create device: %v", err)
}
deviceEdits, err := edits.FromDiscoverer(device)
if err != nil {
return fmt.Errorf("failed to create container edits for device: %v", err)
}
deviceName, err := namer.GetDeviceName(i, d)
if err != nil {
return fmt.Errorf("failed to get device name: %v", err)
}
deviceSpec := specs.Device{
Name: deviceName,
ContainerEdits: *deviceEdits.ContainerEdits,
}
deviceSpecs = append(deviceSpecs, deviceSpec)
return nil
})
if err != nil {
return nil, fmt.Errorf("failed to generate CDI spec for GPU devices: %v", err)
}
err = devicelib.VisitMigDevices(func(i int, d device.Device, j int, mig device.MigDevice) error {
device, err := NewMigDeviceDiscoverer(m.logger, "", d, mig)
if err != nil {
return fmt.Errorf("failed to create MIG device: %v", err)
}
deviceEdits, err := edits.FromDiscoverer(device)
if err != nil {
return fmt.Errorf("failed to create container edits for MIG device: %v", err)
}
deviceName, err := namer.GetMigDeviceName(i, j, mig)
if err != nil {
return fmt.Errorf("failed to get device name: %v", err)
}
deviceSpec := specs.Device{
Name: deviceName,
ContainerEdits: *deviceEdits.ContainerEdits,
}
deviceSpecs = append(deviceSpecs, deviceSpec)
return nil
})
if err != nil {
return nil, fmt.Errorf("falied to generate CDI spec for MIG devices: %v", err)
}
return deviceSpecs, nil
}
// createAllDevice creates an 'all' device which combines the edits from the previous devices
func createAllDevice(deviceSpecs []specs.Device) specs.Device {
edits := edits.NewContainerEdits()
for _, d := range deviceSpecs {
edit := cdi.ContainerEdits{
ContainerEdits: &d.ContainerEdits,
}
edits.Append(&edit)
}
all := specs.Device{
Name: "all",
ContainerEdits: *edits.ContainerEdits,
}
return all
}
// createParentDirsIfRequired creates the parent folders of the specified path if requried.
// Note that MkdirAll does not specifically check whether the specified path is non-empty and raises an error if it is.
// The path will be empty if filename in the current folder is specified, for example
func createParentDirsIfRequired(filename string) error {
dir := filepath.Dir(filename)
if dir == "" {
return nil
}
return os.MkdirAll(dir, 0755)
}

View File

@@ -0,0 +1,42 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package generate
import (
"github.com/NVIDIA/nvidia-container-toolkit/internal/discover"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup"
"github.com/sirupsen/logrus"
)
// NewIPCDiscoverer creats a discoverer for NVIDIA IPC sockets.
func NewIPCDiscoverer(logger *logrus.Logger, root string) (discover.Discover, error) {
d := discover.NewMounts(
logger,
lookup.NewFileLocator(
lookup.WithLogger(logger),
lookup.WithRoot(root),
),
root,
[]string{
"/var/run/nvidia-persistenced/socket",
"/var/run/nvidia-fabricmanager/socket",
"/tmp/nvidia-mps",
},
)
return d, nil
}

View File

@@ -0,0 +1,75 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package generate
import (
"fmt"
"github.com/NVIDIA/nvidia-container-toolkit/internal/discover"
"github.com/NVIDIA/nvidia-container-toolkit/internal/nvcaps"
"github.com/sirupsen/logrus"
"gitlab.com/nvidia/cloud-native/go-nvlib/pkg/nvlib/device"
"gitlab.com/nvidia/cloud-native/go-nvlib/pkg/nvml"
)
// NewMigDeviceDiscoverer creates a discoverer for the specified mig device and its parent.
func NewMigDeviceDiscoverer(logger *logrus.Logger, root string, parent device.Device, d device.MigDevice) (discover.Discover, error) {
minor, ret := parent.GetMinorNumber()
if ret != nvml.SUCCESS {
return nil, fmt.Errorf("error getting GPU device minor number: %v", ret)
}
parentPath := fmt.Sprintf("/dev/nvidia%d", minor)
migCaps, err := nvcaps.NewMigCaps()
if err != nil {
return nil, fmt.Errorf("error getting MIG capability device paths: %v", err)
}
gi, ret := d.GetGpuInstanceId()
if ret != nvml.SUCCESS {
return nil, fmt.Errorf("error getting GPU Instance ID: %v", ret)
}
ci, ret := d.GetComputeInstanceId()
if ret != nvml.SUCCESS {
return nil, fmt.Errorf("error getting Compute Instance ID: %v", ret)
}
giCap := nvcaps.NewGPUInstanceCap(minor, gi)
giCapDevicePath, err := migCaps.GetCapDevicePath(giCap)
if err != nil {
return nil, fmt.Errorf("failed to get GI cap device path: %v", err)
}
ciCap := nvcaps.NewComputeInstanceCap(minor, gi, ci)
ciCapDevicePath, err := migCaps.GetCapDevicePath(ciCap)
if err != nil {
return nil, fmt.Errorf("failed to get CI cap device path: %v", err)
}
deviceNodes := discover.NewCharDeviceDiscoverer(
logger,
[]string{
parentPath,
giCapDevicePath,
ciCapDevicePath,
},
root,
)
return deviceNodes, nil
}

View File

@@ -0,0 +1,84 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package generate
import (
"fmt"
"gitlab.com/nvidia/cloud-native/go-nvlib/pkg/nvlib/device"
"gitlab.com/nvidia/cloud-native/go-nvlib/pkg/nvml"
)
type deviceNamer interface {
GetDeviceName(int, device.Device) (string, error)
GetMigDeviceName(int, int, device.MigDevice) (string, error)
}
const (
deviceNameStrategyIndex = "index"
deviceNameStrategyTypeIndex = "type-index"
deviceNameStrategyUUID = "uuid"
)
type deviceNameIndex struct {
gpuPrefix string
migPrefix string
}
type deviceNameUUID struct{}
// newDeviceNamer creates a Device Namer based on the supplied strategy.
// This namer can be used to construct the names for MIG and GPU devices when generating the CDI spec.
func newDeviceNamer(strategy string) (deviceNamer, error) {
switch strategy {
case deviceNameStrategyIndex:
return deviceNameIndex{}, nil
case deviceNameStrategyTypeIndex:
return deviceNameIndex{gpuPrefix: "gpu", migPrefix: "mig"}, nil
case deviceNameStrategyUUID:
return deviceNameUUID{}, nil
}
return nil, fmt.Errorf("invalid device name strategy: %v", strategy)
}
// GetDeviceName returns the name for the specified device based on the naming strategy
func (s deviceNameIndex) GetDeviceName(i int, d device.Device) (string, error) {
return fmt.Sprintf("%s%d", s.gpuPrefix, i), nil
}
// GetMigDeviceName returns the name for the specified device based on the naming strategy
func (s deviceNameIndex) GetMigDeviceName(i int, j int, d device.MigDevice) (string, error) {
return fmt.Sprintf("%s%d:%d", s.migPrefix, i, j), nil
}
// GetDeviceName returns the name for the specified device based on the naming strategy
func (s deviceNameUUID) GetDeviceName(i int, d device.Device) (string, error) {
uuid, ret := d.GetUUID()
if ret != nvml.SUCCESS {
return "", fmt.Errorf("failed to get device UUID: %v", ret)
}
return uuid, nil
}
// GetMigDeviceName returns the name for the specified device based on the naming strategy
func (s deviceNameUUID) GetMigDeviceName(i int, j int, d device.MigDevice) (string, error) {
uuid, ret := d.GetUUID()
if ret != nvml.SUCCESS {
return "", fmt.Errorf("failed to get device UUID: %v", ret)
}
return uuid, nil
}

View File

@@ -0,0 +1,140 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package chmod
import (
"fmt"
"path/filepath"
"strings"
"syscall"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
)
type command struct {
logger *logrus.Logger
}
type config struct {
paths cli.StringSlice
mode string
containerSpec string
}
// NewCommand constructs a chmod command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
c := command{
logger: logger,
}
return c.build()
}
// build the chmod command
func (m command) build() *cli.Command {
cfg := config{}
// Create the 'chmod' command
c := cli.Command{
Name: "chmod",
Usage: "Set the permissions of folders in the container by running chmod. The container root is prefixed to the specified paths.",
Before: func(c *cli.Context) error {
return validateFlags(c, &cfg)
},
Action: func(c *cli.Context) error {
return m.run(c, &cfg)
},
}
c.Flags = []cli.Flag{
&cli.StringSliceFlag{
Name: "path",
Usage: "Specifiy a path to apply the specified mode to",
Destination: &cfg.paths,
},
&cli.StringFlag{
Name: "mode",
Usage: "Specify the file mode",
Destination: &cfg.mode,
},
&cli.StringFlag{
Name: "container-spec",
Usage: "Specify the path to the OCI container spec. If empty or '-' the spec will be read from STDIN",
Destination: &cfg.containerSpec,
},
}
return &c
}
func validateFlags(c *cli.Context, cfg *config) error {
if strings.TrimSpace(cfg.mode) == "" {
return fmt.Errorf("a non-empty mode must be specified")
}
for _, p := range cfg.paths.Value() {
if strings.TrimSpace(p) == "" {
return fmt.Errorf("paths must not be empty")
}
}
return nil
}
func (m command) run(c *cli.Context, cfg *config) error {
s, err := oci.LoadContainerState(cfg.containerSpec)
if err != nil {
return fmt.Errorf("failed to load container state: %v", err)
}
containerRoot, err := s.GetContainerRoot()
if err != nil {
return fmt.Errorf("failed to determined container root: %v", err)
}
if containerRoot == "" {
return fmt.Errorf("empty container root detected")
}
paths := m.getPaths(containerRoot, cfg.paths.Value())
if len(paths) == 0 {
m.logger.Debugf("No paths specified; exiting")
return nil
}
locator := lookup.NewExecutableLocator(m.logger, "")
targets, err := locator.Locate("chmod")
if err != nil {
return fmt.Errorf("failed to locate chmod: %v", err)
}
chmodPath := targets[0]
args := append([]string{filepath.Base(chmodPath), cfg.mode}, paths...)
return syscall.Exec(chmodPath, args, nil)
}
// getPaths updates the specified paths relative to the root.
func (m command) getPaths(root string, paths []string) []string {
var pathsInRoot []string
for _, f := range paths {
pathsInRoot = append(pathsInRoot, filepath.Join(root, f))
}
return pathsInRoot
}

View File

@@ -74,7 +74,7 @@ func (m command) build() *cli.Command {
},
&cli.StringSliceFlag{
Name: "link",
Usage: "Specify a specific link to create. The link is specified as source:target",
Usage: "Specify a specific link to create. The link is specified as target::link",
Destination: &cfg.links,
},
&cli.StringFlag{
@@ -145,7 +145,7 @@ func (m command) run(c *cli.Context, cfg *config) error {
links := cfg.links.Value()
for _, l := range links {
parts := strings.Split(l, ":")
parts := strings.Split(l, "::")
if len(parts) != 2 {
m.logger.Warnf("Invalid link specification %v", l)
continue

View File

@@ -17,6 +17,8 @@
package hook
import (
chmod "github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/hook/chmod"
symlinks "github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/hook/create-symlinks"
ldcache "github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/hook/update-ldcache"
"github.com/sirupsen/logrus"
@@ -46,6 +48,7 @@ func (m hookCommand) build() *cli.Command {
hook.Subcommands = []*cli.Command{
ldcache.NewCommand(m.logger),
symlinks.NewCommand(m.logger),
chmod.NewCommand(m.logger),
}
return &hook

View File

@@ -0,0 +1,47 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package info
import (
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
)
type command struct {
logger *logrus.Logger
}
// NewCommand constructs an info command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
c := command{
logger: logger,
}
return c.build()
}
// build
func (m command) build() *cli.Command {
// Create the 'hook' command
hook := cli.Command{
Name: "info",
Usage: "Provide information about the system",
}
hook.Subcommands = []*cli.Command{}
return &hook
}

View File

@@ -19,8 +19,13 @@ package main
import (
"os"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/cdi"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/hook"
infoCLI "github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/info"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/runtime"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/system"
"github.com/NVIDIA/nvidia-container-toolkit/internal/info"
log "github.com/sirupsen/logrus"
cli "github.com/urfave/cli/v2"
)
@@ -70,6 +75,10 @@ func main() {
// Define the subcommands
c.Commands = []*cli.Command{
hook.NewCommand(logger),
runtime.NewCommand(logger),
infoCLI.NewCommand(logger),
cdi.NewCommand(logger),
system.NewCommand(logger),
}
// Run the CLI

View File

@@ -0,0 +1,203 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package configure
import (
"encoding/json"
"fmt"
"os"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/runtime/nvidia"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config/crio"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config/docker"
"github.com/pelletier/go-toml"
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
)
const (
defaultRuntime = "docker"
defaultDockerConfigFilePath = "/etc/docker/daemon.json"
defaultCrioConfigFilePath = "/etc/crio/crio.conf"
)
type command struct {
logger *logrus.Logger
}
// NewCommand constructs an configure command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
c := command{
logger: logger,
}
return c.build()
}
// config defines the options that can be set for the CLI through config files,
// environment variables, or command line config
type config struct {
dryRun bool
runtime string
configFilePath string
nvidiaOptions nvidia.Options
}
func (m command) build() *cli.Command {
// Create a config struct to hold the parsed environment variables or command line flags
config := config{}
// Create the 'configure' command
configure := cli.Command{
Name: "configure",
Usage: "Add a runtime to the specified container engine",
Action: func(c *cli.Context) error {
return m.configureWrapper(c, &config)
},
}
configure.Flags = []cli.Flag{
&cli.BoolFlag{
Name: "dry-run",
Usage: "update the runtime configuration as required but don't write changes to disk",
Destination: &config.dryRun,
},
&cli.StringFlag{
Name: "runtime",
Usage: "the target runtime engine. One of [crio, docker]",
Value: defaultRuntime,
Destination: &config.runtime,
},
&cli.StringFlag{
Name: "config",
Usage: "path to the config file for the target runtime",
Destination: &config.configFilePath,
},
&cli.StringFlag{
Name: "nvidia-runtime-name",
Usage: "specify the name of the NVIDIA runtime that will be added",
Value: nvidia.RuntimeName,
Destination: &config.nvidiaOptions.RuntimeName,
},
&cli.StringFlag{
Name: "runtime-path",
Usage: "specify the path to the NVIDIA runtime executable",
Value: nvidia.RuntimeExecutable,
Destination: &config.nvidiaOptions.RuntimePath,
},
&cli.BoolFlag{
Name: "set-as-default",
Usage: "set the specified runtime as the default runtime",
Destination: &config.nvidiaOptions.SetAsDefault,
},
}
return &configure
}
func (m command) configureWrapper(c *cli.Context, config *config) error {
switch config.runtime {
case "crio":
return m.configureCrio(c, config)
case "docker":
return m.configureDocker(c, config)
}
return fmt.Errorf("unrecognized runtime '%v'", config.runtime)
}
// configureDocker updates the docker config to enable the NVIDIA Container Runtime
func (m command) configureDocker(c *cli.Context, config *config) error {
configFilePath := config.configFilePath
if configFilePath == "" {
configFilePath = defaultDockerConfigFilePath
}
cfg, err := docker.LoadConfig(configFilePath)
if err != nil {
return fmt.Errorf("unable to load config: %v", err)
}
err = docker.UpdateConfig(
cfg,
config.nvidiaOptions.RuntimeName,
config.nvidiaOptions.RuntimePath,
config.nvidiaOptions.SetAsDefault,
)
if err != nil {
return fmt.Errorf("unable to update config: %v", err)
}
if config.dryRun {
output, err := json.MarshalIndent(cfg, "", " ")
if err != nil {
return fmt.Errorf("unable to convert to JSON: %v", err)
}
os.Stdout.WriteString(fmt.Sprintf("%s\n", output))
return nil
}
err = docker.FlushConfig(cfg, configFilePath)
if err != nil {
return fmt.Errorf("unable to flush config: %v", err)
}
m.logger.Infof("Wrote updated config to %v", configFilePath)
m.logger.Infof("It is recommended that the docker daemon be restarted.")
return nil
}
// configureCrio updates the crio config to enable the NVIDIA Container Runtime
func (m command) configureCrio(c *cli.Context, config *config) error {
configFilePath := config.configFilePath
if configFilePath == "" {
configFilePath = defaultCrioConfigFilePath
}
cfg, err := crio.LoadConfig(configFilePath)
if err != nil {
return fmt.Errorf("unable to load config: %v", err)
}
err = crio.UpdateConfig(
cfg,
config.nvidiaOptions.RuntimeName,
config.nvidiaOptions.RuntimePath,
config.nvidiaOptions.SetAsDefault,
)
if err != nil {
return fmt.Errorf("unable to update config: %v", err)
}
if config.dryRun {
output, err := toml.Marshal(cfg)
if err != nil {
return fmt.Errorf("unable to convert to TOML: %v", err)
}
os.Stdout.WriteString(fmt.Sprintf("%s\n", output))
return nil
}
err = crio.FlushConfig(configFilePath, cfg)
if err != nil {
return fmt.Errorf("unable to flush config: %v", err)
}
m.logger.Infof("Wrote updated config to %v", configFilePath)
m.logger.Infof("It is recommended that the cri-o daemon be restarted.")
return nil
}

View File

@@ -0,0 +1,75 @@
/*
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
*/
package nvidia
const (
// RuntimeName is the default name to use in configs for the NVIDIA Container Runtime
RuntimeName = "nvidia"
// RuntimeExecutable is the default NVIDIA Container Runtime executable file name
RuntimeExecutable = "nvidia-container-runtime"
)
// Options specifies the options for the NVIDIA Container Runtime w.r.t a container engine such as docker.
type Options struct {
SetAsDefault bool
RuntimeName string
RuntimePath string
}
// Runtime defines an NVIDIA runtime with a name and a executable
type Runtime struct {
Name string
Path string
}
// DefaultRuntime returns the default runtime for the configured options.
// If the configuration is invalid or the default runtimes should not be set
// the empty string is returned.
func (o Options) DefaultRuntime() string {
if !o.SetAsDefault {
return ""
}
return o.RuntimeName
}
// Runtime creates a runtime struct based on the options.
func (o Options) Runtime() Runtime {
path := o.RuntimePath
if o.RuntimePath == "" {
path = RuntimeExecutable
}
r := Runtime{
Name: o.RuntimeName,
Path: path,
}
return r
}
// DockerRuntimesConfig generatest the expected docker config for the specified runtime
func (r Runtime) DockerRuntimesConfig() map[string]interface{} {
runtimes := make(map[string]interface{})
runtimes[r.Name] = map[string]interface{}{
"path": r.Path,
"args": []string{},
}
return runtimes
}

View File

@@ -0,0 +1,49 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package runtime
import (
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/runtime/configure"
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
)
type runtimeCommand struct {
logger *logrus.Logger
}
// NewCommand constructs a runtime command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
c := runtimeCommand{
logger: logger,
}
return c.build()
}
func (m runtimeCommand) build() *cli.Command {
// Create the 'runtime' command
runtime := cli.Command{
Name: "runtime",
Usage: "A collection of runtime-related utilities for the NVIDIA Container Toolkit",
}
runtime.Subcommands = []*cli.Command{
configure.NewCommand(m.logger),
}
return &runtime
}

View File

@@ -0,0 +1,175 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package devchar
import (
"fmt"
"path/filepath"
"github.com/NVIDIA/nvidia-container-toolkit/internal/info/proc/devices"
"github.com/NVIDIA/nvidia-container-toolkit/internal/nvcaps"
"github.com/sirupsen/logrus"
"gitlab.com/nvidia/cloud-native/go-nvlib/pkg/nvpci"
)
type allPossible struct {
logger *logrus.Logger
driverRoot string
deviceMajors devices.Devices
migCaps nvcaps.MigCaps
}
// newAllPossible returns a new allPossible device node lister.
// This lister lists all possible device nodes for NVIDIA GPUs, control devices, and capability devices.
func newAllPossible(logger *logrus.Logger, driverRoot string) (nodeLister, error) {
deviceMajors, err := devices.GetNVIDIADevices()
if err != nil {
return nil, fmt.Errorf("failed reading device majors: %v", err)
}
migCaps, err := nvcaps.NewMigCaps()
if err != nil {
return nil, fmt.Errorf("failed to read MIG caps: %v", err)
}
if migCaps == nil {
migCaps = make(nvcaps.MigCaps)
}
l := allPossible{
logger: logger,
driverRoot: driverRoot,
deviceMajors: deviceMajors,
migCaps: migCaps,
}
return l, nil
}
// DeviceNodes returns a list of all possible device nodes for NVIDIA GPUs, control devices, and capability devices.
func (m allPossible) DeviceNodes() ([]deviceNode, error) {
gpus, err := nvpci.NewFrom(
filepath.Join(m.driverRoot, nvpci.PCIDevicesRoot),
).GetGPUs()
if err != nil {
return nil, fmt.Errorf("failed to get GPU information: %v", err)
}
count := len(gpus)
if count == 0 {
m.logger.Infof("No NVIDIA devices found in %s", m.driverRoot)
return nil, nil
}
deviceNodes, err := m.getControlDeviceNodes()
if err != nil {
return nil, fmt.Errorf("failed to get control device nodes: %v", err)
}
for gpu := 0; gpu < count; gpu++ {
deviceNodes = append(deviceNodes, m.getGPUDeviceNodes(gpu)...)
deviceNodes = append(deviceNodes, m.getNVCapDeviceNodes(gpu)...)
}
return deviceNodes, nil
}
// getControlDeviceNodes generates a list of control devices
func (m allPossible) getControlDeviceNodes() ([]deviceNode, error) {
var deviceNodes []deviceNode
// Define the control devices for standard GPUs.
controlDevices := []deviceNode{
m.newDeviceNode(devices.NVIDIAGPU, "/dev/nvidia-modeset", devices.NVIDIAModesetMinor),
m.newDeviceNode(devices.NVIDIAGPU, "/dev/nvidiactl", devices.NVIDIACTLMinor),
m.newDeviceNode(devices.NVIDIAUVM, "/dev/nvidia-uvm", devices.NVIDIAUVMMinor),
m.newDeviceNode(devices.NVIDIAUVM, "/dev/nvidia-uvm-tools", devices.NVIDIAUVMToolsMinor),
}
deviceNodes = append(deviceNodes, controlDevices...)
for _, migControlDevice := range []nvcaps.MigCap{"config", "monitor"} {
migControlMinor, exist := m.migCaps[migControlDevice]
if !exist {
continue
}
d := m.newDeviceNode(
devices.NVIDIACaps,
migControlMinor.DevicePath(),
int(migControlMinor),
)
deviceNodes = append(deviceNodes, d)
}
return deviceNodes, nil
}
// getGPUDeviceNodes generates a list of device nodes for a given GPU.
func (m allPossible) getGPUDeviceNodes(gpu int) []deviceNode {
d := m.newDeviceNode(
devices.NVIDIAGPU,
fmt.Sprintf("/dev/nvidia%d", gpu),
gpu,
)
return []deviceNode{d}
}
// getNVCapDeviceNodes generates a list of cap device nodes for a given GPU.
func (m allPossible) getNVCapDeviceNodes(gpu int) []deviceNode {
var selectedCapMinors []nvcaps.MigMinor
for gi := 0; ; gi++ {
giCap := nvcaps.NewGPUInstanceCap(gpu, gi)
giMinor, exist := m.migCaps[giCap]
if !exist {
break
}
selectedCapMinors = append(selectedCapMinors, giMinor)
for ci := 0; ; ci++ {
ciCap := nvcaps.NewComputeInstanceCap(gpu, gi, ci)
ciMinor, exist := m.migCaps[ciCap]
if !exist {
break
}
selectedCapMinors = append(selectedCapMinors, ciMinor)
}
}
var deviceNodes []deviceNode
for _, capMinor := range selectedCapMinors {
d := m.newDeviceNode(
devices.NVIDIACaps,
capMinor.DevicePath(),
int(capMinor),
)
deviceNodes = append(deviceNodes, d)
}
return deviceNodes
}
// newDeviceNode creates a new device node with the specified path and major/minor numbers.
// The path is adjusted for the specified driver root.
func (m allPossible) newDeviceNode(deviceName devices.Name, path string, minor int) deviceNode {
major, _ := m.deviceMajors.Get(deviceName)
return deviceNode{
path: filepath.Join(m.driverRoot, path),
major: uint32(major),
minor: uint32(minor),
}
}

View File

@@ -0,0 +1,332 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package devchar
import (
"fmt"
"os"
"os/signal"
"path/filepath"
"strings"
"syscall"
"github.com/fsnotify/fsnotify"
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
)
const (
defaultDevCharPath = "/dev/char"
)
type command struct {
logger *logrus.Logger
}
type config struct {
devCharPath string
driverRoot string
dryRun bool
watch bool
createAll bool
}
// NewCommand constructs a command sub-command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
c := command{
logger: logger,
}
return c.build()
}
// build
func (m command) build() *cli.Command {
cfg := config{}
// Create the 'create-dev-char-symlinks' command
c := cli.Command{
Name: "create-dev-char-symlinks",
Usage: "A utility to create symlinks to possible /dev/nv* devices in /dev/char",
Before: func(c *cli.Context) error {
return m.validateFlags(c, &cfg)
},
Action: func(c *cli.Context) error {
return m.run(c, &cfg)
},
}
c.Flags = []cli.Flag{
&cli.StringFlag{
Name: "dev-char-path",
Usage: "The path at which the symlinks will be created. Symlinks will be created as `DEV_CHAR`/MAJOR:MINOR where MAJOR and MINOR are the major and minor numbers of a corresponding device node.",
Value: defaultDevCharPath,
Destination: &cfg.devCharPath,
EnvVars: []string{"DEV_CHAR_PATH"},
},
&cli.StringFlag{
Name: "driver-root",
Usage: "The path to the driver root. `DRIVER_ROOT`/dev is searched for NVIDIA device nodes.",
Value: "/",
Destination: &cfg.driverRoot,
EnvVars: []string{"DRIVER_ROOT"},
},
&cli.BoolFlag{
Name: "watch",
Usage: "If set, the command will watch for changes to the driver root and recreate the symlinks when changes are detected.",
Value: false,
Destination: &cfg.watch,
EnvVars: []string{"WATCH"},
},
&cli.BoolFlag{
Name: "create-all",
Usage: "Create all possible /dev/char symlinks instead of limiting these to existing device nodes.",
Destination: &cfg.createAll,
EnvVars: []string{"CREATE_ALL"},
},
&cli.BoolFlag{
Name: "dry-run",
Usage: "If set, the command will not create any symlinks.",
Value: false,
Destination: &cfg.dryRun,
EnvVars: []string{"DRY_RUN"},
},
}
return &c
}
func (m command) validateFlags(r *cli.Context, cfg *config) error {
if cfg.createAll && cfg.watch {
return fmt.Errorf("create-all and watch are mutually exclusive")
}
return nil
}
func (m command) run(c *cli.Context, cfg *config) error {
var watcher *fsnotify.Watcher
var sigs chan os.Signal
if cfg.watch {
watcher, err := newFSWatcher(filepath.Join(cfg.driverRoot, "dev"))
if err != nil {
return fmt.Errorf("failed to create FS watcher: %v", err)
}
defer watcher.Close()
sigs = newOSWatcher(syscall.SIGHUP, syscall.SIGINT, syscall.SIGTERM, syscall.SIGQUIT)
}
l, err := NewSymlinkCreator(
WithLogger(m.logger),
WithDevCharPath(cfg.devCharPath),
WithDriverRoot(cfg.driverRoot),
WithDryRun(cfg.dryRun),
WithCreateAll(cfg.createAll),
)
if err != nil {
return fmt.Errorf("failed to create symlink creator: %v", err)
}
create:
err = l.CreateLinks()
if err != nil {
return fmt.Errorf("failed to create links: %v", err)
}
if !cfg.watch {
return nil
}
for {
select {
case event := <-watcher.Events:
deviceNode := filepath.Base(event.Name)
if !strings.HasPrefix(deviceNode, "nvidia") {
continue
}
if event.Op&fsnotify.Create == fsnotify.Create {
m.logger.Infof("%s created, restarting.", event.Name)
goto create
}
if event.Op&fsnotify.Create == fsnotify.Remove {
m.logger.Infof("%s removed. Ignoring", event.Name)
}
// Watch for any other fs errors and log them.
case err := <-watcher.Errors:
m.logger.Errorf("inotify: %s", err)
// React to signals
case s := <-sigs:
switch s {
case syscall.SIGHUP:
m.logger.Infof("Received SIGHUP, recreating symlinks.")
goto create
default:
m.logger.Infof("Received signal %q, shutting down.", s)
return nil
}
}
}
}
type linkCreator struct {
logger *logrus.Logger
lister nodeLister
driverRoot string
devCharPath string
dryRun bool
createAll bool
}
// Creator is an interface for creating symlinks to /dev/nv* devices in /dev/char.
type Creator interface {
CreateLinks() error
}
// Option is a functional option for configuring the linkCreator.
type Option func(*linkCreator)
// NewSymlinkCreator creates a new linkCreator.
func NewSymlinkCreator(opts ...Option) (Creator, error) {
c := linkCreator{}
for _, opt := range opts {
opt(&c)
}
if c.logger == nil {
c.logger = logrus.StandardLogger()
}
if c.driverRoot == "" {
c.driverRoot = "/"
}
if c.devCharPath == "" {
c.devCharPath = defaultDevCharPath
}
if c.createAll {
lister, err := newAllPossible(c.logger, c.driverRoot)
if err != nil {
return nil, fmt.Errorf("failed to create all possible device lister: %v", err)
}
c.lister = lister
} else {
c.lister = existing{c.logger, c.driverRoot}
}
return c, nil
}
// WithDriverRoot sets the driver root path.
func WithDriverRoot(root string) Option {
return func(c *linkCreator) {
c.driverRoot = root
}
}
// WithDevCharPath sets the path at which the symlinks will be created.
func WithDevCharPath(path string) Option {
return func(c *linkCreator) {
c.devCharPath = path
}
}
// WithDryRun sets the dry run flag.
func WithDryRun(dryRun bool) Option {
return func(c *linkCreator) {
c.dryRun = dryRun
}
}
// WithLogger sets the logger.
func WithLogger(logger *logrus.Logger) Option {
return func(c *linkCreator) {
c.logger = logger
}
}
// WithCreateAll sets the createAll flag for the linkCreator.
func WithCreateAll(createAll bool) Option {
return func(lc *linkCreator) {
lc.createAll = createAll
}
}
// CreateLinks creates symlinks for all NVIDIA device nodes found in the driver root.
func (m linkCreator) CreateLinks() error {
deviceNodes, err := m.lister.DeviceNodes()
if err != nil {
return fmt.Errorf("failed to get device nodes: %v", err)
}
if len(deviceNodes) != 0 && !m.dryRun {
err := os.MkdirAll(m.devCharPath, 0755)
if err != nil {
return fmt.Errorf("failed to create directory %s: %v", m.devCharPath, err)
}
}
for _, deviceNode := range deviceNodes {
target := deviceNode.path
linkPath := filepath.Join(m.devCharPath, deviceNode.devCharName())
m.logger.Infof("Creating link %s => %s", linkPath, target)
if m.dryRun {
continue
}
err = os.Symlink(target, linkPath)
if err != nil {
m.logger.Warnf("Could not create symlink: %v", err)
}
}
return nil
}
type deviceNode struct {
path string
major uint32
minor uint32
}
func (d deviceNode) devCharName() string {
return fmt.Sprintf("%d:%d", d.major, d.minor)
}
func newFSWatcher(files ...string) (*fsnotify.Watcher, error) {
watcher, err := fsnotify.NewWatcher()
if err != nil {
return nil, err
}
for _, f := range files {
err = watcher.Add(f)
if err != nil {
watcher.Close()
return nil, err
}
}
return watcher, nil
}
func newOSWatcher(sigs ...os.Signal) chan os.Signal {
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, sigs...)
return sigChan
}

View File

@@ -0,0 +1,95 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package devchar
import (
"path/filepath"
"strings"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup"
"github.com/sirupsen/logrus"
"golang.org/x/sys/unix"
)
type nodeLister interface {
DeviceNodes() ([]deviceNode, error)
}
type existing struct {
logger *logrus.Logger
driverRoot string
}
// DeviceNodes returns a list of NVIDIA device nodes in the specified root.
// The nvidia-nvswitch* and nvidia-nvlink devices are excluded.
func (m existing) DeviceNodes() ([]deviceNode, error) {
locator := lookup.NewCharDeviceLocator(
lookup.WithLogger(m.logger),
lookup.WithRoot(m.driverRoot),
lookup.WithOptional(true),
)
devices, err := locator.Locate("/dev/nvidia*")
if err != nil {
m.logger.Warnf("Error while locating device: %v", err)
}
capDevices, err := locator.Locate("/dev/nvidia-caps/nvidia-*")
if err != nil {
m.logger.Warnf("Error while locating caps device: %v", err)
}
if len(devices) == 0 && len(capDevices) == 0 {
m.logger.Infof("No NVIDIA devices found in %s", m.driverRoot)
return nil, nil
}
var deviceNodes []deviceNode
for _, d := range append(devices, capDevices...) {
if m.nodeIsBlocked(d) {
continue
}
var stat unix.Stat_t
err := unix.Stat(d, &stat)
if err != nil {
m.logger.Warnf("Could not stat device: %v", err)
continue
}
deviceNode := deviceNode{
path: d,
major: unix.Major(uint64(stat.Rdev)),
minor: unix.Minor(uint64(stat.Rdev)),
}
deviceNodes = append(deviceNodes, deviceNode)
}
return deviceNodes, nil
}
// nodeIsBlocked returns true if the specified device node should be ignored.
func (m existing) nodeIsBlocked(path string) bool {
blockedPrefixes := []string{"nvidia-fs", "nvidia-nvswitch", "nvidia-nvlink"}
nodeName := filepath.Base(path)
for _, prefix := range blockedPrefixes {
if strings.HasPrefix(nodeName, prefix) {
return true
}
}
return false
}

View File

@@ -0,0 +1,49 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package system
import (
devchar "github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/system/create-dev-char-symlinks"
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
)
type command struct {
logger *logrus.Logger
}
// NewCommand constructs a runtime command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
c := command{
logger: logger,
}
return c.build()
}
func (m command) build() *cli.Command {
// Create the 'system' command
system := cli.Command{
Name: "system",
Usage: "A collection of system-related utilities for the NVIDIA Container Toolkit",
}
system.Subcommands = []*cli.Command{
devchar.NewCommand(m.logger),
}
return &system
}

View File

@@ -1,19 +0,0 @@
disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
#accept-nvidia-visible-devices-as-volume-mounts = false
[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
#debug = "/var/log/nvidia-container-toolkit.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
#no-cgroups = false
#user = "root:video"
ldconfig = "@/sbin/ldconfig"
[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
#experimental = false

View File

@@ -16,4 +16,17 @@ ldconfig = "@/sbin/ldconfig"
[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
#experimental = false
log-level = "info"
# Specify the runtimes to consider. This list is processed in order and the PATH
# searched for matching executables unless the entry is an absolute path.
runtimes = [
"docker-runc",
"runc",
]
mode = "auto"
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"

View File

@@ -16,4 +16,17 @@ ldconfig = "@/sbin/ldconfig"
[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
#experimental = false
log-level = "info"
# Specify the runtimes to consider. This list is processed in order and the PATH
# searched for matching executables unless the entry is an absolute path.
runtimes = [
"docker-runc",
"runc",
]
mode = "auto"
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"

View File

@@ -16,4 +16,17 @@ ldconfig = "@/sbin/ldconfig"
[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
#experimental = false
log-level = "info"
# Specify the runtimes to consider. This list is processed in order and the PATH
# searched for matching executables unless the entry is an absolute path.
runtimes = [
"docker-runc",
"runc",
]
mode = "auto"
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"

View File

@@ -16,7 +16,7 @@ ldconfig = "@/sbin/ldconfig.real"
[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
#experimental = false
log-level = "info"
# Specify the runtimes to consider. This list is processed in order and the PATH
# searched for matching executables unless the entry is an absolute path.
@@ -24,3 +24,9 @@ runtimes = [
"docker-runc",
"runc",
]
mode = "auto"
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"

View File

@@ -1,70 +0,0 @@
ARG BASEIMAGE
FROM ${BASEIMAGE}
RUN yum install -y \
ca-certificates \
gcc \
wget \
git \
rpm-build \
make && \
rm -rf /var/cache/yum/*
ARG GOLANG_VERSION=0.0.0
RUN set -eux; \
\
arch="$(uname -m)"; \
case "${arch##*-}" in \
x86_64 | amd64) ARCH='amd64' ;; \
ppc64el | ppc64le) ARCH='ppc64le' ;; \
aarch64) ARCH='arm64' ;; \
*) echo "unsupported architecture"; exit 1 ;; \
esac; \
wget -nv -O - https://storage.googleapis.com/golang/go${GOLANG_VERSION}.linux-${ARCH}.tar.gz \
| tar -C /usr/local -xz
ENV GOPATH /go
ENV PATH $GOPATH/bin:/usr/local/go/bin:$PATH
# packaging
ARG PKG_VERS
ARG PKG_REV
ENV VERSION $PKG_VERS
ENV RELEASE $PKG_REV
# output directory
ENV DIST_DIR=/tmp/nvidia-container-toolkit-$PKG_VERS/SOURCES
RUN mkdir -p $DIST_DIR /dist
# nvidia-container-toolkit
WORKDIR $GOPATH/src/nvidia-container-toolkit
COPY . .
ARG GIT_COMMIT
ENV GIT_COMMIT ${GIT_COMMIT}
RUN make PREFIX=${DIST_DIR} cmds
ARG CONFIG_TOML_SUFFIX
ENV CONFIG_TOML_SUFFIX ${CONFIG_TOML_SUFFIX}
COPY config/config.toml.${CONFIG_TOML_SUFFIX} $DIST_DIR/config.toml
# Hook for Project Atomic's fork of Docker: https://github.com/projectatomic/docker/tree/docker-1.13.1-rhel#add-dockerhooks-exec-custom-hooks-for-prestartpoststop-containerspatch
# This might not be useful on Amazon Linux, but it's simpler to keep the RHEL
# and Amazon Linux packages identical.
COPY oci-nvidia-hook $DIST_DIR/oci-nvidia-hook
# Hook for libpod/CRI-O: https://github.com/containers/libpod/blob/v0.8.5/pkg/hooks/docs/oci-hooks.5.md
COPY oci-nvidia-hook.json $DIST_DIR/oci-nvidia-hook.json
WORKDIR $DIST_DIR/..
COPY packaging/rpm .
CMD arch=$(uname -m) && \
rpmbuild --clean --target=$arch -bb \
-D "_topdir $PWD" \
-D "version $VERSION" \
-D "libnvidia_container_version ${VERSION}-${RELEASE}" \
-D "release $RELEASE" \
SPECS/nvidia-container-toolkit.spec && \
mv RPMS/$arch/*.rpm /dist

View File

@@ -32,6 +32,7 @@ ENV GOPATH /go
ENV PATH $GOPATH/bin:/usr/local/go/bin:$PATH
# packaging
ARG PKG_NAME
ARG PKG_VERS
ARG PKG_REV
@@ -64,12 +65,17 @@ RUN if [ "$(lsb_release -cs)" = "jessie" ]; then \
WORKDIR $DIST_DIR
COPY packaging/debian ./debian
RUN sed -i "s;@VERSION@;${REVISION};" debian/changelog && \
dch --changelog debian/changelog --append "Bump libnvidia-container dependency to ${REVISION}}" && \
dch --changelog debian/changelog -r "" && \
ARG LIBNVIDIA_CONTAINER_TOOLS_VERSION
ENV LIBNVIDIA_CONTAINER_TOOLS_VERSION ${LIBNVIDIA_CONTAINER_TOOLS_VERSION}
RUN dch --create --package="${PKG_NAME}" \
--newversion "${REVISION}" \
"See https://gitlab.com/nvidia/container-toolkit/container-toolkit/-/blob/${GIT_COMMIT}/CHANGELOG.md for the changelog" && \
dch --append "Bump libnvidia-container dependency to ${LIBNVIDIA_CONTAINER1_VERSION}" && \
dch -r "" && \
if [ "$REVISION" != "$(dpkg-parsechangelog --show-field=Version)" ]; then exit 1; fi
CMD export DISTRIB="$(lsb_release -cs)" && \
debuild -eDISTRIB -eSECTION -eLIBNVIDIA_CONTAINER_VERSION="${REVISION}" \
debuild -eDISTRIB -eSECTION -eLIBNVIDIA_CONTAINER_TOOLS_VERSION -eVERSION="${REVISION}" \
--dpkg-buildpackage-hook='sh debian/prepare' -i -us -uc -b && \
mv /tmp/nvidia-container-toolkit_*.deb /dist
mv /tmp/*.deb /dist

View File

@@ -14,7 +14,8 @@
ARG GOLANG_VERSION=x.x.x
FROM golang:${GOLANG_VERSION}
RUN go install golang.org/x/lint/golint@latest
RUN go install golang.org/x/lint/golint@6edffad5e6160f5949cdefc81710b2706fbcd4f6
RUN go install github.com/matryer/moq@latest
RUN go install github.com/gordonklaus/ineffassign@latest
RUN go install github.com/gordonklaus/ineffassign@d2c82e48359b033cde9cf1307f6d5550b8d61321
RUN go install github.com/client9/misspell/cmd/misspell@latest
RUN go install github.com/google/go-licenses@latest

View File

@@ -25,11 +25,12 @@ ENV GOPATH /go
ENV PATH $GOPATH/bin:/usr/local/go/bin:$PATH
# packaging
ARG PKG_NAME
ARG PKG_VERS
ARG PKG_REV
ENV VERSION $PKG_VERS
ENV RELEASE $PKG_REV
ENV PKG_NAME ${PKG_NAME}
ENV PKG_VERS ${PKG_VERS}
ENV PKG_REV ${PKG_REV}
# output directory
ENV DIST_DIR=/tmp/nvidia-container-toolkit-$PKG_VERS/SOURCES
@@ -56,11 +57,16 @@ COPY config/config.toml.${CONFIG_TOML_SUFFIX} $DIST_DIR/config.toml
WORKDIR $DIST_DIR/..
COPY packaging/rpm .
ARG LIBNVIDIA_CONTAINER_TOOLS_VERSION
ENV LIBNVIDIA_CONTAINER_TOOLS_VERSION ${LIBNVIDIA_CONTAINER_TOOLS_VERSION}
CMD arch=$(uname -m) && \
rpmbuild --clean --target=$arch -bb \
-D "_topdir $PWD" \
-D "version $VERSION" \
-D "libnvidia_container_version ${VERSION}-${RELEASE}" \
-D "release $RELEASE" \
-D "release_date $(date +'%a %b %d %Y')" \
-D "git_commit ${GIT_COMMIT}" \
-D "version ${PKG_VERS}" \
-D "libnvidia_container_tools_version ${LIBNVIDIA_CONTAINER_TOOLS_VERSION}" \
-D "release ${PKG_REV}" \
SPECS/nvidia-container-toolkit.spec && \
mv RPMS/$arch/*.rpm /dist

View File

@@ -1,3 +1,19 @@
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This is the dockerfile for building packages on yum-based RPM systems.
ARG BASEIMAGE
FROM ${BASEIMAGE}
@@ -27,11 +43,12 @@ ENV GOPATH /go
ENV PATH $GOPATH/bin:/usr/local/go/bin:$PATH
# packaging
ARG PKG_NAME
ARG PKG_VERS
ARG PKG_REV
ENV VERSION $PKG_VERS
ENV RELEASE $PKG_REV
ENV PKG_NAME ${PKG_NAME}
ENV PKG_VERS ${PKG_VERS}
ENV PKG_REV ${PKG_REV}
# output directory
ENV DIST_DIR=/tmp/nvidia-container-toolkit-$PKG_VERS/SOURCES
@@ -58,11 +75,16 @@ COPY oci-nvidia-hook.json $DIST_DIR/oci-nvidia-hook.json
WORKDIR $DIST_DIR/..
COPY packaging/rpm .
ARG LIBNVIDIA_CONTAINER_TOOLS_VERSION
ENV LIBNVIDIA_CONTAINER_TOOLS_VERSION ${LIBNVIDIA_CONTAINER_TOOLS_VERSION}
CMD arch=$(uname -m) && \
rpmbuild --clean --target=$arch -bb \
-D "_topdir $PWD" \
-D "version $VERSION" \
-D "libnvidia_container_version ${VERSION}-${RELEASE}" \
-D "release $RELEASE" \
-D "release_date $(date +'%a %b %d %Y')" \
-D "git_commit ${GIT_COMMIT}" \
-D "version ${PKG_VERS}" \
-D "libnvidia_container_tools_version ${LIBNVIDIA_CONTAINER_TOOLS_VERSION}" \
-D "release ${PKG_REV}" \
SPECS/nvidia-container-toolkit.spec && \
mv RPMS/$arch/*.rpm /dist

View File

@@ -30,6 +30,7 @@ ENV GOPATH /go
ENV PATH $GOPATH/bin:/usr/local/go/bin:$PATH
# packaging
ARG PKG_NAME
ARG PKG_VERS
ARG PKG_REV
@@ -57,12 +58,17 @@ COPY config/config.toml.${CONFIG_TOML_SUFFIX} $DIST_DIR/config.toml
WORKDIR $DIST_DIR
COPY packaging/debian ./debian
RUN sed -i "s;@VERSION@;${REVISION};" debian/changelog && \
dch --changelog debian/changelog --append "Bump libnvidia-container dependency to ${REVISION}}" && \
dch --changelog debian/changelog -r "" && \
ARG LIBNVIDIA_CONTAINER_TOOLS_VERSION
ENV LIBNVIDIA_CONTAINER_TOOLS_VERSION ${LIBNVIDIA_CONTAINER_TOOLS_VERSION}
RUN dch --create --package="${PKG_NAME}" \
--newversion "${REVISION}" \
"See https://gitlab.com/nvidia/container-toolkit/container-toolkit/-/blob/${GIT_COMMIT}/CHANGELOG.md for the changelog" && \
dch --append "Bump libnvidia-container dependency to ${LIBNVIDIA_CONTAINER_TOOLS_VERSION}" && \
dch -r "" && \
if [ "$REVISION" != "$(dpkg-parsechangelog --show-field=Version)" ]; then exit 1; fi
CMD export DISTRIB="$(lsb_release -cs)" && \
debuild -eDISTRIB -eSECTION -eLIBNVIDIA_CONTAINER_VERSION="${REVISION}" \
debuild -eDISTRIB -eSECTION -eLIBNVIDIA_CONTAINER_TOOLS_VERSION -eVERSION="${REVISION}" \
--dpkg-buildpackage-hook='sh debian/prepare' -i -us -uc -b && \
mv /tmp/*.deb /dist

View File

@@ -14,10 +14,10 @@
# Supported OSs by architecture
AMD64_TARGETS := ubuntu20.04 ubuntu18.04 ubuntu16.04 debian10 debian9
X86_64_TARGETS := centos7 centos8 rhel7 rhel8 amazonlinux2 opensuse-leap15.1
X86_64_TARGETS := fedora35 centos7 centos8 rhel7 rhel8 amazonlinux2 opensuse-leap15.1
PPC64LE_TARGETS := ubuntu18.04 ubuntu16.04 centos7 centos8 rhel7 rhel8
ARM64_TARGETS := ubuntu20.04 ubuntu18.04
AARCH64_TARGETS := centos8 rhel8 amazonlinux2
AARCH64_TARGETS := fedora35 centos8 rhel8 amazonlinux2
# Define top-level build targets
docker%: SHELL:=/bin/bash
@@ -85,37 +85,63 @@ docker-all: $(AMD64_TARGETS) $(X86_64_TARGETS) \
--%: docker-build-%
@
LIBNVIDIA_CONTAINER_VERSION ?= $(LIB_VERSION)
LIBNVIDIA_CONTAINER_TAG ?= $(LIB_TAG)
# private ubuntu target
--ubuntu%: OS := ubuntu
--ubuntu%: LIB_VERSION := $(LIB_VERSION)$(if $(LIB_TAG),~$(LIB_TAG))
--ubuntu%: LIBNVIDIA_CONTAINER_TOOLS_VERSION := $(LIBNVIDIA_CONTAINER_VERSION)$(if $(LIBNVIDIA_CONTAINER_TAG),~$(LIBNVIDIA_CONTAINER_TAG))-1
--ubuntu%: PKG_REV := 1
# private debian target
--debian%: OS := debian
--debian%: LIB_VERSION := $(LIB_VERSION)$(if $(LIB_TAG),~$(LIB_TAG))
--debian%: LIBNVIDIA_CONTAINER_TOOLS_VERSION := $(LIBNVIDIA_CONTAINER_VERSION)$(if $(LIBNVIDIA_CONTAINER_TAG),~$(LIBNVIDIA_CONTAINER_TAG))-1
--debian%: PKG_REV := 1
# private centos target
--centos%: OS := centos
--centos%: PKG_REV := $(if $(LIB_TAG),0.1.$(LIB_TAG),1)
--centos%: LIBNVIDIA_CONTAINER_TOOLS_VERSION := $(LIBNVIDIA_CONTAINER_VERSION)-$(if $(LIBNVIDIA_CONTAINER_TAG),0.1.$(LIBNVIDIA_CONTAINER_TAG),1)
--centos%: DOCKERFILE = $(CURDIR)/docker/Dockerfile.rpm-yum
--centos%: CONFIG_TOML_SUFFIX := rpm-yum
--centos8%: BASEIMAGE = quay.io/centos/centos:stream8
# private fedora target
--fedora%: OS := fedora
--fedora%: PKG_REV := $(if $(LIB_TAG),0.1.$(LIB_TAG),1)
--fedora%: LIBNVIDIA_CONTAINER_TOOLS_VERSION := $(LIBNVIDIA_CONTAINER_VERSION)-$(if $(LIBNVIDIA_CONTAINER_TAG),0.1.$(LIBNVIDIA_CONTAINER_TAG),1)
--fedora%: DOCKERFILE = $(CURDIR)/docker/Dockerfile.rpm-yum
--fedora%: CONFIG_TOML_SUFFIX := rpm-yum
# The fedora(35) base image has very slow performance when building aarch64 packages.
# Since our primary concern here is glibc versions, we use the older glibc version available in centos8.
--fedora35%: BASEIMAGE = quay.io/centos/centos:stream8
# private amazonlinux target
--amazonlinux%: OS := amazonlinux
--amazonlinux%: LIBNVIDIA_CONTAINER_TOOLS_VERSION := $(LIBNVIDIA_CONTAINER_VERSION)-$(if $(LIBNVIDIA_CONTAINER_TAG),0.1.$(LIBNVIDIA_CONTAINER_TAG),1)
--amazonlinux%: PKG_REV := $(if $(LIB_TAG),0.1.$(LIB_TAG),1)
--amazonlinux%: DOCKERFILE = $(CURDIR)/docker/Dockerfile.rpm-yum
--amazonlinux%: CONFIG_TOML_SUFFIX := rpm-yum
# private opensuse-leap target
--opensuse-leap%: OS = opensuse-leap
--opensuse-leap%: BASEIMAGE = opensuse/leap:$(VERSION)
--opensuse-leap%: LIBNVIDIA_CONTAINER_TOOLS_VERSION := $(LIBNVIDIA_CONTAINER_VERSION)-$(if $(LIBNVIDIA_CONTAINER_TAG),0.1.$(LIBNVIDIA_CONTAINER_TAG),1)
--opensuse-leap%: PKG_REV := $(if $(LIB_TAG),0.1.$(LIB_TAG),1)
# private rhel target (actually built on centos)
--rhel%: OS := centos
--rhel%: LIBNVIDIA_CONTAINER_TOOLS_VERSION := $(LIBNVIDIA_CONTAINER_VERSION)-$(if $(LIBNVIDIA_CONTAINER_TAG),0.1.$(LIBNVIDIA_CONTAINER_TAG),1)
--rhel%: PKG_REV := $(if $(LIB_TAG),0.1.$(LIB_TAG),1)
--rhel%: VERSION = $(patsubst rhel%-$(ARCH),%,$(TARGET_PLATFORM))
--rhel%: ARTIFACTS_DIR = $(DIST_DIR)/rhel$(VERSION)/$(ARCH)
--rhel%: DOCKERFILE = $(CURDIR)/docker/Dockerfile.rpm-yum
--rhel%: CONFIG_TOML_SUFFIX := rpm-yum
--rhel8%: BASEIMAGE = quay.io/centos/centos:stream8
# We allow the CONFIG_TOML_SUFFIX to be overridden.
CONFIG_TOML_SUFFIX ?= $(OS)
@@ -128,10 +154,12 @@ docker-build-%:
--progress=plain \
--build-arg BASEIMAGE="$(BASEIMAGE)" \
--build-arg GOLANG_VERSION="$(GOLANG_VERSION)" \
--build-arg PKG_NAME="$(LIB_NAME)" \
--build-arg PKG_VERS="$(LIB_VERSION)" \
--build-arg PKG_REV="$(PKG_REV)" \
--build-arg LIBNVIDIA_CONTAINER_TOOLS_VERSION="$(LIBNVIDIA_CONTAINER_TOOLS_VERSION)" \
--build-arg CONFIG_TOML_SUFFIX="$(CONFIG_TOML_SUFFIX)" \
--build-arg GIT_COMMIT="$(GIT_COMMIT)" \
--build-arg GIT_COMMIT="$(GIT_COMMIT)" \
--tag $(BUILDIMAGE) \
--file $(DOCKERFILE) .
$(DOCKER) run \

34
go.mod
View File

@@ -1,18 +1,36 @@
module github.com/NVIDIA/nvidia-container-toolkit
go 1.14
go 1.18
require (
github.com/BurntSushi/toml v1.0.0
github.com/NVIDIA/go-nvml v0.11.6-0
github.com/container-orchestrated-devices/container-device-interface v0.3.1-0.20220224133719-e5457123010b
github.com/containers/podman/v4 v4.0.3
github.com/opencontainers/runtime-spec v1.0.3-0.20211214071223-8958f93039ab
github.com/NVIDIA/go-nvml v0.11.6-0.0.20220823120812-7e2082095e82
github.com/container-orchestrated-devices/container-device-interface v0.5.4-0.20230111111500-5b3b5d81179a
github.com/fsnotify/fsnotify v1.5.4
github.com/opencontainers/runtime-spec v1.0.3-0.20220825212826-86290f6a00fb
github.com/pelletier/go-toml v1.9.4
github.com/sirupsen/logrus v1.8.1
github.com/sirupsen/logrus v1.9.0
github.com/stretchr/testify v1.7.0
github.com/tsaikd/KDGoLib v0.0.0-20191001134900-7f3cf518e07d
github.com/urfave/cli/v2 v2.3.0
gitlab.com/nvidia/cloud-native/go-nvlib v0.0.0-20230119114711-6fe07bb33342
golang.org/x/mod v0.5.0
golang.org/x/sys v0.0.0-20220114195835-da31bd327af9
golang.org/x/sys v0.0.0-20220927170352-d9d178bc13c6
sigs.k8s.io/yaml v1.3.0
)
require (
github.com/cpuguy83/go-md2man/v2 v2.0.1 // indirect
github.com/davecgh/go-spew v1.1.1 // indirect
github.com/hashicorp/errwrap v1.1.0 // indirect
github.com/kr/text v0.2.0 // indirect
github.com/opencontainers/runc v1.1.4 // indirect
github.com/opencontainers/runtime-tools v0.9.1-0.20221107090550-2e043c6bd626 // indirect
github.com/opencontainers/selinux v1.10.1 // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
github.com/russross/blackfriday/v2 v2.1.0 // indirect
github.com/syndtr/gocapability v0.0.0-20200815063812-42c35b437635 // indirect
github.com/xeipuuv/gojsonpointer v0.0.0-20190905194746-02993c407bfb // indirect
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c // indirect
gopkg.in/yaml.v2 v2.4.0 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
)

1912
go.sum

File diff suppressed because it is too large Load Diff

View File

@@ -61,7 +61,7 @@ func GetConfig() (*Config, error) {
tomlFile, err := os.Open(configFilePath)
if err != nil {
return nil, fmt.Errorf("failed to open config file %v: %v", configFilePath, err)
return getDefaultConfig(), nil
}
defer tomlFile.Close()

View File

@@ -0,0 +1,125 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package crio
import (
"fmt"
"os"
"github.com/pelletier/go-toml"
log "github.com/sirupsen/logrus"
)
// LoadConfig loads the cri-o config from disk
func LoadConfig(config string) (*toml.Tree, error) {
log.Infof("Loading config: %v", config)
info, err := os.Stat(config)
if os.IsExist(err) && info.IsDir() {
return nil, fmt.Errorf("config file is a directory")
}
configFile := config
if os.IsNotExist(err) {
configFile = "/dev/null"
log.Infof("Config file does not exist, creating new one")
}
cfg, err := toml.LoadFile(configFile)
if err != nil {
return nil, err
}
log.Infof("Successfully loaded config")
return cfg, nil
}
// UpdateConfig updates the cri-o config to include the NVIDIA Container Runtime
func UpdateConfig(config *toml.Tree, runtimeClass string, runtimePath string, setAsDefault bool) error {
switch runc := config.Get("crio.runtime.runtimes.runc").(type) {
case *toml.Tree:
runc, _ = toml.Load(runc.String())
config.SetPath([]string{"crio", "runtime", "runtimes", runtimeClass}, runc)
}
config.SetPath([]string{"crio", "runtime", "runtimes", runtimeClass, "runtime_path"}, runtimePath)
config.SetPath([]string{"crio", "runtime", "runtimes", runtimeClass, "runtime_type"}, "oci")
if setAsDefault {
config.SetPath([]string{"crio", "runtime", "default_runtime"}, runtimeClass)
}
return nil
}
// RevertConfig reverts the cri-o config to remove the NVIDIA Container Runtime
func RevertConfig(config *toml.Tree, runtimeClass string) error {
if runtime, ok := config.GetPath([]string{"crio", "runtime", "default_runtime"}).(string); ok {
if runtimeClass == runtime {
config.DeletePath([]string{"crio", "runtime", "default_runtime"})
}
}
runtimeClassPath := []string{"crio", "runtime", "runtimes", runtimeClass}
config.DeletePath(runtimeClassPath)
for i := 0; i < len(runtimeClassPath); i++ {
remainingPath := runtimeClassPath[:len(runtimeClassPath)-i]
if entry, ok := config.GetPath(remainingPath).(*toml.Tree); ok {
if len(entry.Keys()) != 0 {
break
}
config.DeletePath(remainingPath)
}
}
return nil
}
// FlushConfig flushes the updated/reverted config out to disk
func FlushConfig(config string, cfg *toml.Tree) error {
log.Infof("Flushing config")
output, err := cfg.ToTomlString()
if err != nil {
return fmt.Errorf("unable to convert to TOML: %v", err)
}
switch len(output) {
case 0:
err := os.Remove(config)
if err != nil {
return fmt.Errorf("unable to remove empty file: %v", err)
}
log.Infof("Config empty, removing file")
default:
f, err := os.Create(config)
if err != nil {
return fmt.Errorf("unable to open '%v' for writing: %v", config, err)
}
defer f.Close()
_, err = f.WriteString(output)
if err != nil {
return fmt.Errorf("unable to write output: %v", err)
}
}
log.Infof("Successfully flushed config")
return nil
}

View File

@@ -0,0 +1,117 @@
/**
# Copyright (c) 2021-2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
*/
package docker
import (
"bytes"
"encoding/json"
"fmt"
"io/ioutil"
"os"
log "github.com/sirupsen/logrus"
)
// LoadConfig loads the docker config from disk
func LoadConfig(configFilePath string) (map[string]interface{}, error) {
log.Infof("Loading docker config from %v", configFilePath)
info, err := os.Stat(configFilePath)
if os.IsExist(err) && info.IsDir() {
return nil, fmt.Errorf("config file is a directory")
}
cfg := make(map[string]interface{})
if os.IsNotExist(err) {
log.Infof("Config file does not exist, creating new one")
return cfg, nil
}
readBytes, err := ioutil.ReadFile(configFilePath)
if err != nil {
return nil, fmt.Errorf("unable to read config: %v", err)
}
reader := bytes.NewReader(readBytes)
if err := json.NewDecoder(reader).Decode(&cfg); err != nil {
return nil, err
}
log.Infof("Successfully loaded config")
return cfg, nil
}
// UpdateConfig updates the docker config to include the nvidia runtimes
func UpdateConfig(config map[string]interface{}, runtimeName string, runtimePath string, setAsDefault bool) error {
// Read the existing runtimes
runtimes := make(map[string]interface{})
if _, exists := config["runtimes"]; exists {
runtimes = config["runtimes"].(map[string]interface{})
}
// Add / update the runtime definitions
runtimes[runtimeName] = map[string]interface{}{
"path": runtimePath,
"args": []string{},
}
// Update the runtimes definition
if len(runtimes) > 0 {
config["runtimes"] = runtimes
}
if setAsDefault {
config["default-runtime"] = runtimeName
}
return nil
}
// FlushConfig flushes the updated/reverted config out to disk
func FlushConfig(cfg map[string]interface{}, configFilePath string) error {
log.Infof("Flushing docker config to %v", configFilePath)
output, err := json.MarshalIndent(cfg, "", " ")
if err != nil {
return fmt.Errorf("unable to convert to JSON: %v", err)
}
switch len(output) {
case 0:
err := os.Remove(configFilePath)
if err != nil {
return fmt.Errorf("unable to remove empty file: %v", err)
}
log.Infof("Config empty, removing file")
default:
f, err := os.Create(configFilePath)
if err != nil {
return fmt.Errorf("unable to open %v for writing: %v", configFilePath, err)
}
defer f.Close()
_, err = f.WriteString(string(output))
if err != nil {
return fmt.Errorf("unable to write output: %v", err)
}
}
log.Infof("Successfully flushed config")
return nil
}

View File

@@ -0,0 +1,215 @@
/**
# Copyright (c) 2021-2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
*/
package docker
import (
"encoding/json"
"fmt"
"testing"
"github.com/stretchr/testify/require"
)
func TestUpdateConfigDefaultRuntime(t *testing.T) {
testCases := []struct {
config map[string]interface{}
runtimeName string
setAsDefault bool
expectedDefaultRuntimeName interface{}
}{
{
setAsDefault: false,
expectedDefaultRuntimeName: nil,
},
{
runtimeName: "NAME",
setAsDefault: true,
expectedDefaultRuntimeName: "NAME",
},
{
config: map[string]interface{}{
"default-runtime": "ALREADY_SET",
},
runtimeName: "NAME",
setAsDefault: false,
expectedDefaultRuntimeName: "ALREADY_SET",
},
{
config: map[string]interface{}{
"default-runtime": "ALREADY_SET",
},
runtimeName: "NAME",
setAsDefault: true,
expectedDefaultRuntimeName: "NAME",
},
}
for i, tc := range testCases {
t.Run(fmt.Sprintf("test case %d", i), func(t *testing.T) {
if tc.config == nil {
tc.config = make(map[string]interface{})
}
err := UpdateConfig(tc.config, tc.runtimeName, "", tc.setAsDefault)
require.NoError(t, err)
defaultRuntimeName := tc.config["default-runtime"]
require.EqualValues(t, tc.expectedDefaultRuntimeName, defaultRuntimeName)
})
}
}
func TestUpdateConfigRuntimes(t *testing.T) {
testCases := []struct {
config map[string]interface{}
runtimes map[string]string
expectedConfig map[string]interface{}
}{
{
config: map[string]interface{}{},
runtimes: map[string]string{
"runtime1": "/test/runtime/dir/runtime1",
"runtime2": "/test/runtime/dir/runtime2",
},
expectedConfig: map[string]interface{}{
"runtimes": map[string]interface{}{
"runtime1": map[string]interface{}{
"path": "/test/runtime/dir/runtime1",
"args": []string{},
},
"runtime2": map[string]interface{}{
"path": "/test/runtime/dir/runtime2",
"args": []string{},
},
},
},
},
{
config: map[string]interface{}{
"runtimes": map[string]interface{}{
"runtime1": map[string]interface{}{
"path": "runtime1",
"args": []string{},
},
},
},
runtimes: map[string]string{
"runtime1": "/test/runtime/dir/runtime1",
"runtime2": "/test/runtime/dir/runtime2",
},
expectedConfig: map[string]interface{}{
"runtimes": map[string]interface{}{
"runtime1": map[string]interface{}{
"path": "/test/runtime/dir/runtime1",
"args": []string{},
},
"runtime2": map[string]interface{}{
"path": "/test/runtime/dir/runtime2",
"args": []string{},
},
},
},
},
{
config: map[string]interface{}{
"runtimes": map[string]interface{}{
"not-nvidia": map[string]interface{}{
"path": "some-other-path",
"args": []string{},
},
},
},
runtimes: map[string]string{
"runtime1": "/test/runtime/dir/runtime1",
},
expectedConfig: map[string]interface{}{
"runtimes": map[string]interface{}{
"not-nvidia": map[string]interface{}{
"path": "some-other-path",
"args": []string{},
},
"runtime1": map[string]interface{}{
"path": "/test/runtime/dir/runtime1",
"args": []string{},
},
},
},
},
{
config: map[string]interface{}{
"exec-opts": []string{"native.cgroupdriver=systemd"},
"log-driver": "json-file",
"log-opts": map[string]string{
"max-size": "100m",
},
"storage-driver": "overlay2",
},
runtimes: map[string]string{
"runtime1": "/test/runtime/dir/runtime1",
},
expectedConfig: map[string]interface{}{
"exec-opts": []string{"native.cgroupdriver=systemd"},
"log-driver": "json-file",
"log-opts": map[string]string{
"max-size": "100m",
},
"storage-driver": "overlay2",
"runtimes": map[string]interface{}{
"runtime1": map[string]interface{}{
"path": "/test/runtime/dir/runtime1",
"args": []string{},
},
},
},
},
{
config: map[string]interface{}{
"exec-opts": []string{"native.cgroupdriver=systemd"},
"log-driver": "json-file",
"log-opts": map[string]string{
"max-size": "100m",
},
"storage-driver": "overlay2",
},
expectedConfig: map[string]interface{}{
"exec-opts": []string{"native.cgroupdriver=systemd"},
"log-driver": "json-file",
"log-opts": map[string]string{
"max-size": "100m",
},
"storage-driver": "overlay2",
},
},
}
for i, tc := range testCases {
t.Run(fmt.Sprintf("test case %d", i), func(t *testing.T) {
for runtimeName, runtimePath := range tc.runtimes {
err := UpdateConfig(tc.config, runtimeName, runtimePath, false)
require.NoError(t, err)
}
configContent, err := json.MarshalIndent(tc.config, "", " ")
require.NoError(t, err)
expectedContent, err := json.MarshalIndent(tc.expectedConfig, "", " ")
require.NoError(t, err)
require.EqualValues(t, string(expectedContent), string(configContent))
})
}
}

View File

@@ -0,0 +1,54 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package image
// DriverCapability represents the possible values of NVIDIA_DRIVER_CAPABILITIES
type DriverCapability string
// Constants for the supported driver capabilities
const (
DriverCapabilityAll DriverCapability = "all"
DriverCapabilityCompat32 DriverCapability = "compat32"
DriverCapabilityCompute DriverCapability = "compute"
DriverCapabilityDisplay DriverCapability = "display"
DriverCapabilityGraphics DriverCapability = "graphics"
DriverCapabilityNgx DriverCapability = "ngx"
DriverCapabilityUtility DriverCapability = "utility"
DriverCapabilityVideo DriverCapability = "video"
)
// DriverCapabilities represents the NVIDIA_DRIVER_CAPABILITIES set for the specified image.
type DriverCapabilities map[DriverCapability]bool
// Has check whether the specified capability is selected.
func (c DriverCapabilities) Has(capability DriverCapability) bool {
if c[DriverCapabilityAll] {
return true
}
return c[capability]
}
// Any checks whether any of the specified capabilites are set
func (c DriverCapabilities) Any(capabilities ...DriverCapability) bool {
for _, cap := range capabilities {
if c.Has(cap) {
return true
}
}
return false
}

View File

@@ -26,10 +26,12 @@ import (
)
const (
envCUDAVersion = "CUDA_VERSION"
envNVRequirePrefix = "NVIDIA_REQUIRE_"
envNVRequireCUDA = envNVRequirePrefix + "CUDA"
envNVDisableRequire = "NVIDIA_DISABLE_REQUIRE"
envCUDAVersion = "CUDA_VERSION"
envNVRequirePrefix = "NVIDIA_REQUIRE_"
envNVRequireCUDA = envNVRequirePrefix + "CUDA"
envNVRequireJetpack = envNVRequirePrefix + "JETPACK"
envNVDisableRequire = "NVIDIA_DISABLE_REQUIRE"
envNVDriverCapabilities = "NVIDIA_DRIVER_CAPABILITIES"
)
// CUDA represents a CUDA image that can be used for GPU computing. This wraps
@@ -84,7 +86,7 @@ func (i CUDA) GetRequirements() ([]string, error) {
// All variables with the "NVIDIA_REQUIRE_" prefix are passed to nvidia-container-cli
var requirements []string
for name, value := range i {
if strings.HasPrefix(name, envNVRequirePrefix) {
if strings.HasPrefix(name, envNVRequirePrefix) && !strings.HasPrefix(name, envNVRequireJetpack) {
requirements = append(requirements, value)
}
}
@@ -111,6 +113,51 @@ func (i CUDA) HasDisableRequire() bool {
return false
}
// DevicesFromEnvvars returns the devices requested by the image through environment variables
func (i CUDA) DevicesFromEnvvars(envVars ...string) VisibleDevices {
// We concantenate all the devices from the specified envvars.
var isSet bool
var devices []string
requested := make(map[string]bool)
for _, envVar := range envVars {
if devs, ok := i[envVar]; ok {
isSet = true
for _, d := range strings.Split(devs, ",") {
trimmed := strings.TrimSpace(d)
if len(trimmed) == 0 {
continue
}
devices = append(devices, trimmed)
requested[trimmed] = true
}
}
}
// Environment variable unset with legacy image: default to "all".
if !isSet && len(devices) == 0 && i.IsLegacy() {
return NewVisibleDevices("all")
}
// Environment variable unset or empty or "void": return nil
if len(devices) == 0 || requested["void"] {
return NewVisibleDevices("void")
}
return NewVisibleDevices(devices...)
}
// GetDriverCapabilities returns the requested driver capabilities.
func (i CUDA) GetDriverCapabilities() DriverCapabilities {
env := i[envNVDriverCapabilities]
capabilites := make(DriverCapabilities)
for _, c := range strings.Split(env, ",") {
capabilites[DriverCapability(c)] = true
}
return capabilites
}
func (i CUDA) legacyVersion() (string, error) {
majorMinor, err := parseMajorMinorVersion(i[envCUDAVersion])
if err != nil {

View File

@@ -69,3 +69,55 @@ func TestParseMajorMinorVersionInvalid(t *testing.T) {
})
}
}
func TestGetRequirements(t *testing.T) {
testCases := []struct {
description string
env []string
requirements []string
}{
{
description: "NVIDIA_REQUIRE_JETPACK is ignored",
env: []string{"NVIDIA_REQUIRE_JETPACK=csv-mounts=all"},
requirements: nil,
},
{
description: "NVIDIA_REQUIRE_JETPACK_HOST_MOUNTS is ignored",
env: []string{"NVIDIA_REQUIRE_JETPACK_HOST_MOUNTS=base-only"},
requirements: nil,
},
{
description: "single requirement set",
env: []string{"NVIDIA_REQUIRE_CUDA=cuda>=11.6"},
requirements: []string{"cuda>=11.6"},
},
{
description: "requirements are concatenated requirement set",
env: []string{"NVIDIA_REQUIRE_CUDA=cuda>=11.6", "NVIDIA_REQUIRE_BRAND=brand=tesla"},
requirements: []string{"cuda>=11.6", "brand=tesla"},
},
{
description: "legacy image",
env: []string{"CUDA_VERSION=11.6"},
requirements: []string{"cuda>=11.6"},
},
{
description: "legacy image with additional requirement",
env: []string{"CUDA_VERSION=11.6", "NVIDIA_REQUIRE_BRAND=brand=tesla"},
requirements: []string{"cuda>=11.6", "brand=tesla"},
},
}
for _, tc := range testCases {
t.Run(tc.description, func(t *testing.T) {
image, err := NewCUDAImageFromEnv(tc.env)
require.NoError(t, err)
requirements, err := image.GetRequirements()
require.NoError(t, err)
require.ElementsMatch(t, tc.requirements, requirements)
})
}
}

View File

@@ -0,0 +1,127 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package image
import (
"strings"
)
// VisibleDevices represents the devices selected in a container image
// through the NVIDIA_VISIBLE_DEVICES or other environment variables
type VisibleDevices interface {
List() []string
Has(string) bool
}
var _ VisibleDevices = (*all)(nil)
var _ VisibleDevices = (*none)(nil)
var _ VisibleDevices = (*void)(nil)
var _ VisibleDevices = (*devices)(nil)
// NewVisibleDevices creates a VisibleDevices based on the value of the specified envvar.
func NewVisibleDevices(envvars ...string) VisibleDevices {
for _, envvar := range envvars {
if envvar == "all" {
return all{}
}
if envvar == "none" {
return none{}
}
if envvar == "" || envvar == "void" {
return void{}
}
}
return newDevices(envvars...)
}
type all struct{}
// List returns ["all"] for all devices
func (a all) List() []string {
return []string{"all"}
}
// Has for all devices is true for any id except the empty ID
func (a all) Has(id string) bool {
return id != ""
}
type none struct{}
// List returns [""] for the none devices
func (n none) List() []string {
return []string{""}
}
// Has for none devices is false for any id
func (n none) Has(id string) bool {
return false
}
type void struct {
none
}
// List returns nil for the void devices
func (v void) List() []string {
return nil
}
type devices struct {
len int
lookup map[string]int
}
func newDevices(idOrCommaSeparated ...string) devices {
lookup := make(map[string]int)
i := 0
for _, commaSeparated := range idOrCommaSeparated {
for _, id := range strings.Split(commaSeparated, ",") {
lookup[id] = i
i++
}
}
d := devices{
len: i,
lookup: lookup,
}
return d
}
// List returns the list of requested devices
func (d devices) List() []string {
list := make([]string, d.len)
for id, i := range d.lookup {
list[i] = id
}
return list
}
// Has checks whether the specified ID is in the set of requested devices
func (d devices) Has(id string) bool {
if id == "" {
return false
}
_, exist := d.lookup[id]
return exist
}

View File

@@ -44,6 +44,12 @@ type RuntimeConfig struct {
// modesConfig defines (optional) per-mode configs
type modesConfig struct {
CSV csvModeConfig `toml:"csv"`
CDI cdiModeConfig `toml:"cdi"`
}
type cdiModeConfig struct {
// SpecDirs allows for the default spec dirs for CDI to be overridden
SpecDirs []string `toml:"spec-dirs"`
}
type csvModeConfig struct {

View File

@@ -0,0 +1,65 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package discover
import (
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup"
"github.com/sirupsen/logrus"
)
// charDevices is a discover for a list of character devices
type charDevices mounts
var _ Discover = (*charDevices)(nil)
// NewCharDeviceDiscoverer creates a discoverer which locates the specified set of device nodes.
func NewCharDeviceDiscoverer(logger *logrus.Logger, devices []string, root string) Discover {
locator := lookup.NewCharDeviceLocator(
lookup.WithLogger(logger),
lookup.WithRoot(root),
)
return NewDeviceDiscoverer(logger, locator, root, devices)
}
// NewDeviceDiscoverer creates a discoverer which locates the specified set of device nodes using the specified locator.
func NewDeviceDiscoverer(logger *logrus.Logger, locator lookup.Locator, root string, devices []string) Discover {
m := NewMounts(logger, locator, root, devices).(*mounts)
return (*charDevices)(m)
}
// Mounts returns the discovered mounts for the charDevices.
// Since this explicitly specifies a device list, the mounts are nil.
func (d *charDevices) Mounts() ([]Mount, error) {
return nil, nil
}
// Devices returns the discovered devices for the charDevices.
// Here the device nodes are first discovered as mounts and these are converted to devices.
func (d *charDevices) Devices() ([]Device, error) {
devicesAsMounts, err := (*mounts)(d).Mounts()
if err != nil {
return nil, err
}
var devices []Device
for _, mount := range devicesAsMounts {
devices = append(devices, Device(mount))
}
return devices, nil
}

View File

@@ -0,0 +1,83 @@
/**
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package discover
import (
"fmt"
"testing"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup"
testlog "github.com/sirupsen/logrus/hooks/test"
"github.com/stretchr/testify/require"
)
func TestCharDevices(t *testing.T) {
logger, logHook := testlog.NewNullLogger()
testCases := []struct {
description string
input *charDevices
expectedMounts []Mount
expectedMountsError error
expectedDevicesError error
expectedDevices []Device
}{
{
description: "dev mounts are empty",
input: (*charDevices)(
&mounts{
lookup: &lookup.LocatorMock{
LocateFunc: func(string) ([]string, error) {
return []string{"located"}, nil
},
},
required: []string{"required"},
},
),
expectedDevices: []Device{{Path: "located", HostPath: "located"}},
},
{
description: "dev devices returns error for nil lookup",
input: &charDevices{},
expectedDevicesError: fmt.Errorf("no lookup defined"),
},
}
for _, tc := range testCases {
logHook.Reset()
t.Run(tc.description, func(t *testing.T) {
tc.input.logger = logger
mounts, err := tc.input.Mounts()
if tc.expectedMountsError != nil {
require.Error(t, err)
} else {
require.NoError(t, err)
}
require.ElementsMatch(t, tc.expectedMounts, mounts)
devices, err := tc.input.Devices()
if tc.expectedDevicesError != nil {
require.Error(t, err)
} else {
require.NoError(t, err)
}
require.ElementsMatch(t, tc.expectedDevices, devices)
})
}
}

View File

@@ -24,11 +24,6 @@ import (
"github.com/sirupsen/logrus"
)
// charDevices is a discover for a list of character devices
type charDevices mounts
var _ Discover = (*charDevices)(nil)
// NewFromCSVFiles creates a discoverer for the specified CSV files. A logger is also supplied.
// The constructed discoverer is comprised of a list, with each element in the list being associated with a
// single CSV files.
@@ -40,30 +35,28 @@ func NewFromCSVFiles(logger *logrus.Logger, files []string, root string) (Discov
symlinkLocator := lookup.NewSymlinkLocator(logger, root)
locators := map[csv.MountSpecType]lookup.Locator{
csv.MountSpecDev: lookup.NewCharDeviceLocator(logger, root),
csv.MountSpecDev: lookup.NewCharDeviceLocator(lookup.WithLogger(logger), lookup.WithRoot(root)),
csv.MountSpecDir: lookup.NewDirectoryLocator(logger, root),
// Libraries and symlinks are handled in the same way
csv.MountSpecLib: symlinkLocator,
csv.MountSpecSym: symlinkLocator,
}
var discoverers []Discover
var mountSpecs []*csv.MountSpec
for _, filename := range files {
d, err := NewFromCSVFile(logger, locators, filename)
targets, err := loadCSVFile(logger, filename)
if err != nil {
logger.Warnf("Skipping CSV file %v: %v", filename, err)
continue
}
discoverers = append(discoverers, d)
mountSpecs = append(mountSpecs, targets...)
}
return &list{discoverers: discoverers}, nil
return newFromMountSpecs(logger, locators, root, mountSpecs)
}
// NewFromCSVFile creates a discoverer for the specified CSV file. A logger is also supplied.
// The constructed discoverer is comprised of a list, with each element in the list being associated with a particular
// MountSpecType.
func NewFromCSVFile(logger *logrus.Logger, locators map[csv.MountSpecType]lookup.Locator, filename string) (Discover, error) {
// loadCSVFile loads the specified CSV file and returns the list of mount specs
func loadCSVFile(logger *logrus.Logger, filename string) ([]*csv.MountSpec, error) {
// Create a discoverer for each file-kind combination
targets, err := csv.NewCSVFileParser(logger, filename).Parse()
if err != nil {
@@ -73,12 +66,12 @@ func NewFromCSVFile(logger *logrus.Logger, locators map[csv.MountSpecType]lookup
return nil, fmt.Errorf("CSV file is empty")
}
return newFromMountSpecs(logger, locators, targets)
return targets, nil
}
// newFromMountSpecs creates a discoverer for the CSV file. A logger is also supplied.
// A list of csvDiscoverers is returned, with each being associated with a single MountSpecType.
func newFromMountSpecs(logger *logrus.Logger, locators map[csv.MountSpecType]lookup.Locator, targets []*csv.MountSpec) (Discover, error) {
func newFromMountSpecs(logger *logrus.Logger, locators map[csv.MountSpecType]lookup.Locator, root string, targets []*csv.MountSpec) (Discover, error) {
if len(targets) == 0 {
return &None{}, nil
}
@@ -99,41 +92,16 @@ func newFromMountSpecs(logger *logrus.Logger, locators map[csv.MountSpecType]loo
return nil, fmt.Errorf("no locator defined for '%v'", t)
}
m := &mounts{
logger: logger,
lookup: locator,
required: candidatesByType[t],
}
var m Discover
switch t {
case csv.MountSpecDev:
// For device mount specs, we insert a charDevices into the list of discoverers.
discoverers = append(discoverers, (*charDevices)(m))
m = NewDeviceDiscoverer(logger, locator, root, candidatesByType[t])
default:
discoverers = append(discoverers, m)
m = NewMounts(logger, locator, root, candidatesByType[t])
}
discoverers = append(discoverers, m)
}
return &list{discoverers: discoverers}, nil
}
// Mounts returns the discovered mounts for the charDevices. Since this explicitly specifies a
// device list, the mounts are nil.
func (d *charDevices) Mounts() ([]Mount, error) {
return nil, nil
}
// Devices returns the discovered devices for the charDevices. Here the device nodes are first
// discovered as mounts and these are converted to devices.
func (d *charDevices) Devices() ([]Device, error) {
devicesAsMounts, err := (*mounts)(d).Mounts()
if err != nil {
return nil, err
}
var devices []Device
for _, mount := range devicesAsMounts {
devices = append(devices, Device(mount))
}
return devices, nil
}

View File

@@ -26,63 +26,6 @@ import (
"github.com/stretchr/testify/require"
)
func TestCharDevices(t *testing.T) {
logger, logHook := testlog.NewNullLogger()
testCases := []struct {
description string
input *charDevices
expectedMounts []Mount
expectedMountsError error
expectedDevicesError error
expectedDevices []Device
}{
{
description: "dev mounts are empty",
input: (*charDevices)(
&mounts{
lookup: &lookup.LocatorMock{
LocateFunc: func(string) ([]string, error) {
return []string{"located"}, nil
},
},
required: []string{"required"},
},
),
expectedDevices: []Device{{Path: "located"}},
},
{
description: "dev devices returns error for nil lookup",
input: &charDevices{},
expectedDevicesError: fmt.Errorf("no lookup defined"),
},
}
for _, tc := range testCases {
logHook.Reset()
t.Run(tc.description, func(t *testing.T) {
tc.input.logger = logger
mounts, err := tc.input.Mounts()
if tc.expectedMountsError != nil {
require.Error(t, err)
} else {
require.NoError(t, err)
}
require.ElementsMatch(t, tc.expectedMounts, mounts)
devices, err := tc.input.Devices()
if tc.expectedDevicesError != nil {
require.Error(t, err)
} else {
require.NoError(t, err)
}
require.ElementsMatch(t, tc.expectedDevices, devices)
})
}
}
func TestNewFromMountSpec(t *testing.T) {
logger, _ := testlog.NewNullLogger()
@@ -93,6 +36,7 @@ func TestNewFromMountSpec(t *testing.T) {
testCases := []struct {
description string
root string
targets []*csv.MountSpec
expectedError error
expectedDiscoverer Discover
@@ -133,12 +77,50 @@ func TestNewFromMountSpec(t *testing.T) {
&mounts{
logger: logger,
lookup: locators["dev"],
root: "/",
required: []string{"dev0", "dev1"},
},
),
&mounts{
logger: logger,
lookup: locators["lib"],
root: "/",
required: []string{"lib0"},
},
},
},
},
{
description: "sets root",
targets: []*csv.MountSpec{
{
Type: "dev",
Path: "dev0",
},
{
Type: "lib",
Path: "lib0",
},
{
Type: "dev",
Path: "dev1",
},
},
root: "/some/root",
expectedDiscoverer: &list{
discoverers: []Discover{
(*charDevices)(
&mounts{
logger: logger,
lookup: locators["dev"],
root: "/some/root",
required: []string{"dev0", "dev1"},
},
),
&mounts{
logger: logger,
lookup: locators["lib"],
root: "/some/root",
required: []string{"lib0"},
},
},
@@ -148,7 +130,7 @@ func TestNewFromMountSpec(t *testing.T) {
for _, tc := range testCases {
t.Run(tc.description, func(t *testing.T) {
discoverer, err := newFromMountSpecs(logger, locators, tc.targets)
discoverer, err := newFromMountSpecs(logger, locators, tc.root, tc.targets)
if tc.expectedError != nil {
require.Error(t, err)
return

View File

@@ -18,18 +18,20 @@ package discover
// Config represents the configuration options for discovery
type Config struct {
Root string
NVIDIAContainerToolkitCLIExecutablePath string
Root string
NvidiaCTKPath string
}
// Device represents a discovered character device.
type Device struct {
Path string
HostPath string
Path string
}
// Mount represents a discovered mount.
type Mount struct {
Path string
HostPath string
Path string
}
// Hook represents a discovered hook.
@@ -39,8 +41,9 @@ type Hook struct {
Args []string
}
//go:generate moq -stub -out discover_mock.go . Discover
// Discover defines an interface for discovering the devices, mounts, and hooks available on a system
//
//go:generate moq -stub -out discover_mock.go . Discover
type Discover interface {
Devices() ([]Device, error)
Mounts() ([]Mount, error)

View File

@@ -0,0 +1,62 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package discover
import "github.com/sirupsen/logrus"
// Filter defines an interface for filtering discovered entities
type Filter interface {
DeviceIsSelected(device Device) bool
}
// filtered represents a filtered discoverer
type filtered struct {
Discover
logger *logrus.Logger
filter Filter
}
// newFilteredDisoverer creates a discoverer that applies the specified filter to the returned entities of the discoverer
func newFilteredDisoverer(logger *logrus.Logger, applyTo Discover, filter Filter) Discover {
return filtered{
Discover: applyTo,
logger: logger,
filter: filter,
}
}
// Devices returns a filtered list of devices based on the specified filter.
func (d filtered) Devices() ([]Device, error) {
devices, err := d.Discover.Devices()
if err != nil {
return nil, err
}
if d.filter == nil {
return devices, nil
}
var selected []Device
for _, device := range devices {
if d.filter.DeviceIsSelected(device) {
selected = append(selected, device)
}
d.logger.Debugf("skipping device %v", device)
}
return selected, nil
}

80
internal/discover/gds.go Normal file
View File

@@ -0,0 +1,80 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package discover
import (
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup"
"github.com/sirupsen/logrus"
)
type gdsDeviceDiscoverer struct {
None
logger *logrus.Logger
devices Discover
mounts Discover
}
// NewGDSDiscoverer creates a discoverer for GPUDirect Storage devices and mounts.
func NewGDSDiscoverer(logger *logrus.Logger, root string) (Discover, error) {
devices := NewCharDeviceDiscoverer(
logger,
[]string{"/dev/nvidia-fs*"},
root,
)
udev := NewMounts(
logger,
lookup.NewDirectoryLocator(logger, root),
root,
[]string{"/run/udev"},
)
cufile := NewMounts(
logger,
lookup.NewFileLocator(
lookup.WithLogger(logger),
lookup.WithRoot(root),
),
root,
[]string{"/etc/cufile.json"},
)
d := gdsDeviceDiscoverer{
logger: logger,
devices: devices,
mounts: Merge(udev, cufile),
}
return &d, nil
}
// Devices discovers the nvidia-fs device nodes for use with GPUDirect Storage
func (d *gdsDeviceDiscoverer) Devices() ([]Device, error) {
return d.devices.Devices()
}
// Mounts discovers the required mounts for GPUDirect Storage.
// If no devices are discovered the discovered mounts are empty
func (d *gdsDeviceDiscoverer) Mounts() ([]Mount, error) {
devices, err := d.Devices()
if err != nil || len(devices) == 0 {
d.logger.Debugf("No nvidia-fs devices detected; skipping detection of mounts")
return nil, nil
}
return d.mounts.Mounts()
}

View File

@@ -0,0 +1,264 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package discover
import (
"fmt"
"os"
"path/filepath"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config/image"
"github.com/NVIDIA/nvidia-container-toolkit/internal/info/drm"
"github.com/NVIDIA/nvidia-container-toolkit/internal/info/proc"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup"
"github.com/sirupsen/logrus"
)
// NewGraphicsDiscoverer returns the discoverer for graphics tools such as Vulkan.
func NewGraphicsDiscoverer(logger *logrus.Logger, devices image.VisibleDevices, cfg *Config) (Discover, error) {
root := cfg.Root
mounts, err := NewGraphicsMountsDiscoverer(logger, root)
if err != nil {
return nil, fmt.Errorf("failed to create mounts discoverer: %v", err)
}
drmDeviceNodes, err := newDRMDeviceDiscoverer(logger, devices, root)
if err != nil {
return nil, fmt.Errorf("failed to create DRM device discoverer: %v", err)
}
drmByPathSymlinks := newCreateDRMByPathSymlinks(logger, drmDeviceNodes, cfg)
discover := Merge(
Merge(drmDeviceNodes, drmByPathSymlinks),
mounts,
)
return discover, nil
}
// NewGraphicsMountsDiscoverer creates a discoverer for the mounts required by graphics tools such as vulkan.
func NewGraphicsMountsDiscoverer(logger *logrus.Logger, root string) (Discover, error) {
locator, err := lookup.NewLibraryLocator(logger, root)
if err != nil {
return nil, fmt.Errorf("failed to construct library locator: %v", err)
}
libraries := NewMounts(
logger,
locator,
root,
[]string{
"libnvidia-egl-gbm.so",
},
)
jsonMounts := NewMounts(
logger,
lookup.NewFileLocator(
lookup.WithLogger(logger),
lookup.WithRoot(root),
lookup.WithSearchPaths("/etc", "/usr/share"),
),
root,
[]string{
"glvnd/egl_vendor.d/10_nvidia.json",
"vulkan/icd.d/nvidia_icd.json",
"vulkan/implicit_layer.d/nvidia_layers.json",
"egl/egl_external_platform.d/15_nvidia_gbm.json",
"egl/egl_external_platform.d/10_nvidia_wayland.json",
},
)
discover := Merge(
libraries,
jsonMounts,
)
return discover, nil
}
type drmDevicesByPath struct {
None
logger *logrus.Logger
nvidiaCTKPath string
root string
devicesFrom Discover
}
// newCreateDRMByPathSymlinks creates a discoverer for a hook to create the by-path symlinks for DRM devices discovered by the specified devices discoverer
func newCreateDRMByPathSymlinks(logger *logrus.Logger, devices Discover, cfg *Config) Discover {
d := drmDevicesByPath{
logger: logger,
nvidiaCTKPath: FindNvidiaCTK(logger, cfg.NvidiaCTKPath),
root: cfg.Root,
devicesFrom: devices,
}
return &d
}
// Hooks returns a hook to create the symlinks from the required CSV files
func (d drmDevicesByPath) Hooks() ([]Hook, error) {
devices, err := d.devicesFrom.Devices()
if err != nil {
return nil, fmt.Errorf("failed to discover devices for by-path symlinks: %v", err)
}
if len(devices) == 0 {
return nil, nil
}
links, err := d.getSpecificLinkArgs(devices)
if err != nil {
return nil, fmt.Errorf("failed to determine specific links: %v", err)
}
if len(links) == 0 {
return nil, nil
}
var args []string
for _, l := range links {
args = append(args, "--link", l)
}
hook := CreateNvidiaCTKHook(
d.nvidiaCTKPath,
"create-symlinks",
args...,
)
return []Hook{hook}, nil
}
// getSpecificLinkArgs returns the required specic links that need to be created
func (d drmDevicesByPath) getSpecificLinkArgs(devices []Device) ([]string, error) {
selectedDevices := make(map[string]bool)
for _, d := range devices {
selectedDevices[filepath.Base(d.HostPath)] = true
}
linkLocator := lookup.NewFileLocator(
lookup.WithLogger(d.logger),
lookup.WithRoot(d.root),
)
candidates, err := linkLocator.Locate("/dev/dri/by-path/pci-*-*")
if err != nil {
d.logger.Warningf("Failed to locate by-path links: %v; ignoring", err)
return nil, nil
}
var links []string
for _, c := range candidates {
device, err := os.Readlink(c)
if err != nil {
d.logger.Warningf("Failed to evaluate symlink %v; ignoring", c)
continue
}
if selectedDevices[filepath.Base(device)] {
d.logger.Debugf("adding device symlink %v -> %v", c, device)
links = append(links, fmt.Sprintf("%v::%v", device, c))
}
}
return links, nil
}
// newDRMDeviceDiscoverer creates a discoverer for the DRM devices associated with the requested devices.
func newDRMDeviceDiscoverer(logger *logrus.Logger, devices image.VisibleDevices, root string) (Discover, error) {
allDevices := NewDeviceDiscoverer(
logger,
lookup.NewCharDeviceLocator(
lookup.WithLogger(logger),
lookup.WithRoot(root),
),
root,
[]string{
"/dev/dri/card*",
"/dev/dri/renderD*",
},
)
filter, err := newDRMDeviceFilter(logger, devices, root)
if err != nil {
return nil, fmt.Errorf("failed to construct DRM device filter: %v", err)
}
// We return a discoverer that applies the DRM device filter created above to all discovered DRM device nodes.
d := newFilteredDisoverer(
logger,
allDevices,
filter,
)
return d, err
}
// newDRMDeviceFilter creates a filter that matches DRM devices nodes for the visible devices.
func newDRMDeviceFilter(logger *logrus.Logger, devices image.VisibleDevices, root string) (Filter, error) {
gpuInformationPaths, err := proc.GetInformationFilePaths(root)
if err != nil {
return nil, fmt.Errorf("failed to read GPU information: %v", err)
}
var selectedBusIds []string
for _, f := range gpuInformationPaths {
info, err := proc.ParseGPUInformationFile(f)
if err != nil {
return nil, fmt.Errorf("failed to parse %v: %v", f, err)
}
uuid := info[proc.GPUInfoGPUUUID]
busID := info[proc.GPUInfoBusLocation]
minor := info[proc.GPUInfoDeviceMinor]
if devices.Has(minor) || devices.Has(uuid) || devices.Has(busID) {
selectedBusIds = append(selectedBusIds, busID)
}
}
filter := make(selectDeviceByPath)
for _, busID := range selectedBusIds {
drmDeviceNodes, err := drm.GetDeviceNodesByBusID(busID)
if err != nil {
return nil, fmt.Errorf("failed to determine DRM devices for %v: %v", busID, err)
}
for _, drmDeviceNode := range drmDeviceNodes {
filter[filepath.Join(drmDeviceNode)] = true
}
}
return filter, nil
}
// selectDeviceByPath is a filter that allows devices to be selected by the path
type selectDeviceByPath map[string]bool
var _ Filter = (*selectDeviceByPath)(nil)
// DeviceIsSelected determines whether the device's path has been selected
func (s selectDeviceByPath) DeviceIsSelected(device Device) bool {
return s[device.Path]
}
// MountIsSelected is always true
func (s selectDeviceByPath) MountIsSelected(Mount) bool {
return true
}
// HookIsSelected is always true
func (s selectDeviceByPath) HookIsSelected(Hook) bool {
return true
}

View File

@@ -0,0 +1,63 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package discover
import (
"path/filepath"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup"
"github.com/container-orchestrated-devices/container-device-interface/pkg/cdi"
"github.com/sirupsen/logrus"
)
const (
nvidiaCTKExecutable = "nvidia-ctk"
nvidiaCTKDefaultFilePath = "/usr/bin/nvidia-ctk"
)
// CreateNvidiaCTKHook creates a hook which invokes the NVIDIA Container CLI hook subcommand.
func CreateNvidiaCTKHook(executable string, hookName string, additionalArgs ...string) Hook {
return Hook{
Lifecycle: cdi.CreateContainerHook,
Path: executable,
Args: append([]string{filepath.Base(executable), "hook", hookName}, additionalArgs...),
}
}
// FindNvidiaCTK locates the nvidia-ctk executable to be used in hooks.
// If an override is specified, this is used instead.
func FindNvidiaCTK(logger *logrus.Logger, override string) string {
if override != "" {
logger.Debugf("Using specified NVIDIA Container Toolkit CLI path %v", override)
return override
}
lookup := lookup.NewExecutableLocator(logger, "")
hookPath := nvidiaCTKDefaultFilePath
targets, err := lookup.Locate(nvidiaCTKExecutable)
if err != nil {
logger.Warnf("Failed to locate %v: %v", nvidiaCTKExecutable, err)
} else if len(targets) == 0 {
logger.Warnf("%v not found", nvidiaCTKExecutable)
} else {
logger.Debugf("Found %v candidates: %v", nvidiaCTKExecutable, targets)
hookPath = targets[0]
}
logger.Debugf("Using NVIDIA Container Toolkit CLI path %v", hookPath)
return hookPath
}

View File

@@ -19,36 +19,30 @@ package discover
import (
"fmt"
"path/filepath"
"sort"
"strings"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup"
"github.com/container-orchestrated-devices/container-device-interface/pkg/cdi"
"github.com/sirupsen/logrus"
)
// NewLDCacheUpdateHook creates a discoverer that updates the ldcache for the specified mounts. A logger can also be specified
func NewLDCacheUpdateHook(logger *logrus.Logger, mounts Discover, cfg *Config) (Discover, error) {
d := ldconfig{
logger: logger,
mountsFrom: mounts,
lookup: lookup.NewExecutableLocator(logger, cfg.Root),
nvidiaCTKExecutablePath: cfg.NVIDIAContainerToolkitCLIExecutablePath,
logger: logger,
mountsFrom: mounts,
lookup: lookup.NewExecutableLocator(logger, cfg.Root),
nvidiaCTKPath: cfg.NvidiaCTKPath,
}
return &d, nil
}
const (
nvidiaCTKDefaultFilePath = "/usr/bin/nvidia-ctk"
)
type ldconfig struct {
None
logger *logrus.Logger
mountsFrom Discover
lookup lookup.Locator
nvidiaCTKExecutablePath string
logger *logrus.Logger
mountsFrom Discover
lookup lookup.Locator
nvidiaCTKPath string
}
// Hooks checks the required mounts for libraries and returns a hook to update the LDcache for the discovered paths.
@@ -57,70 +51,75 @@ func (d ldconfig) Hooks() ([]Hook, error) {
if err != nil {
return nil, fmt.Errorf("failed to discover mounts for ldcache update: %v", err)
}
libDirs := getLibDirs(mounts)
hookPath := nvidiaCTKDefaultFilePath
targets, err := d.lookup.Locate(d.nvidiaCTKExecutablePath)
if err != nil {
d.logger.Warnf("Failed to locate %v: %v", d.nvidiaCTKExecutablePath, err)
} else if len(targets) == 0 {
d.logger.Warnf("%v not found", d.nvidiaCTKExecutablePath)
} else {
d.logger.Debugf("Found %v candidates: %v", d.nvidiaCTKExecutablePath, targets)
hookPath = targets[0]
}
d.logger.Debugf("Using NVIDIA Container Toolkit CLI path %v", hookPath)
args := []string{hookPath, "hook", "update-ldcache"}
for _, f := range libDirs {
args = append(args, "--folder", f)
}
h := Hook{
Lifecycle: cdi.CreateContainerHook,
Path: hookPath,
Args: args,
}
h := CreateLDCacheUpdateHook(
d.nvidiaCTKPath,
getLibraryPaths(mounts),
)
return []Hook{h}, nil
}
// getLibDirs extracts the library dirs from the specified mounts
func getLibDirs(mounts []Mount) []string {
var paths []string
checked := make(map[string]bool)
for _, m := range mounts {
dir := filepath.Dir(m.Path)
if dir == "" {
continue
}
_, exists := checked[dir]
if exists {
continue
}
checked[dir] = isLibName(filepath.Base(m.Path))
if checked[dir] {
paths = append(paths, dir)
}
// CreateLDCacheUpdateHook locates the NVIDIA Container Toolkit CLI and creates a hook for updating the LD Cache
func CreateLDCacheUpdateHook(executable string, libraries []string) Hook {
var args []string
for _, f := range uniqueFolders(libraries) {
args = append(args, "--folder", f)
}
sort.Strings(paths)
hook := CreateNvidiaCTKHook(
executable,
"update-ldcache",
args...,
)
return hook
}
// getLibraryPaths extracts the library dirs from the specified mounts
func getLibraryPaths(mounts []Mount) []string {
var paths []string
for _, m := range mounts {
if !isLibName(m.Path) {
continue
}
paths = append(paths, m.Path)
}
return paths
}
// isLibName checks if the specified filename is a library (i.e. ends in `.so*`)
func isLibName(filename string) bool {
parts := strings.Split(filename, ".")
for _, p := range parts {
if p == "so" {
return true
}
base := filepath.Base(filename)
isLib, err := filepath.Match("lib?*.so*", base)
if !isLib || err != nil {
return false
}
return false
parts := strings.Split(base, ".so")
if len(parts) == 1 {
return true
}
return parts[len(parts)-1] == "" || strings.HasPrefix(parts[len(parts)-1], ".")
}
// uniqueFolders returns the unique set of folders for the specified files
func uniqueFolders(libraries []string) []string {
var paths []string
checked := make(map[string]bool)
for _, l := range libraries {
dir := filepath.Dir(l)
if dir == "" {
continue
}
if checked[dir] {
continue
}
checked[dir] = true
paths = append(paths, dir)
}
return paths
}

View File

@@ -0,0 +1,168 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package discover
import (
"fmt"
"testing"
testlog "github.com/sirupsen/logrus/hooks/test"
"github.com/stretchr/testify/require"
)
const (
testNvidiaCTKPath = "/foo/bar/nvidia-ctk"
)
func TestLDCacheUpdateHook(t *testing.T) {
logger, _ := testlog.NewNullLogger()
cfg := Config{
Root: "/",
NvidiaCTKPath: testNvidiaCTKPath,
}
testCases := []struct {
description string
mounts []Mount
mountError error
expectedError error
expectedArgs []string
}{
{
description: "empty mounts",
expectedArgs: []string{"nvidia-ctk", "hook", "update-ldcache"},
},
{
description: "mount error",
mountError: fmt.Errorf("mountError"),
expectedError: fmt.Errorf("mountError"),
},
{
description: "library folders are added to args",
mounts: []Mount{
{
Path: "/usr/local/lib/libfoo.so",
},
{
Path: "/usr/bin/notlib",
},
{
Path: "/usr/local/libother/libfoo.so",
},
{
Path: "/usr/local/lib/libbar.so",
},
},
expectedArgs: []string{"nvidia-ctk", "hook", "update-ldcache", "--folder", "/usr/local/lib", "--folder", "/usr/local/libother"},
},
{
description: "host paths are ignored",
mounts: []Mount{
{
HostPath: "/usr/local/other/libfoo.so",
Path: "/usr/local/lib/libfoo.so",
},
},
expectedArgs: []string{"nvidia-ctk", "hook", "update-ldcache", "--folder", "/usr/local/lib"},
},
}
for _, tc := range testCases {
t.Run(tc.description, func(t *testing.T) {
mountMock := &DiscoverMock{
MountsFunc: func() ([]Mount, error) {
return tc.mounts, tc.mountError
},
}
expectedHook := Hook{
Path: testNvidiaCTKPath,
Args: tc.expectedArgs,
Lifecycle: "createContainer",
}
d, err := NewLDCacheUpdateHook(logger, mountMock, &cfg)
require.NoError(t, err)
hooks, err := d.Hooks()
require.Len(t, mountMock.MountsCalls(), 1)
require.Len(t, mountMock.DevicesCalls(), 0)
require.Len(t, mountMock.HooksCalls(), 0)
if tc.expectedError != nil {
require.Error(t, err)
return
}
require.NoError(t, err)
require.Len(t, hooks, 1)
require.EqualValues(t, hooks[0], expectedHook)
devices, err := d.Devices()
require.NoError(t, err)
require.Empty(t, devices)
mounts, err := d.Mounts()
require.NoError(t, err)
require.Empty(t, mounts)
})
}
}
func TestIsLibName(t *testing.T) {
testCases := []struct {
name string
isLib bool
}{
{
name: "",
isLib: false,
},
{
name: "lib/not/.so",
isLib: false,
},
{
name: "lib.so",
isLib: false,
},
{
name: "notlibcuda.so",
isLib: false,
},
{
name: "libcuda.so",
isLib: true,
},
{
name: "libcuda.so.1",
isLib: true,
},
{
name: "libcuda.soNOT",
isLib: false,
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
require.Equal(t, tc.isLib, isLibName(tc.name))
})
}
}

View File

@@ -27,8 +27,8 @@ type list struct {
var _ Discover = (*list)(nil)
// NewList creates a discoverer that is the composite of a list of discoveres.
func NewList(d ...Discover) Discover {
// Merge creates a discoverer that is the composite of a list of discoveres.
func Merge(d ...Discover) Discover {
l := list{
discoverers: d,
}

View File

@@ -0,0 +1,35 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package discover
import (
"github.com/sirupsen/logrus"
)
// NewMOFEDDiscoverer creates a discoverer for MOFED devices.
func NewMOFEDDiscoverer(logger *logrus.Logger, root string) (Discover, error) {
devices := NewCharDeviceDiscoverer(
logger,
[]string{
"/dev/infiniband/uverbs*",
"/dev/infiniband/rdma_cm",
},
root,
)
return devices, nil
}

View File

@@ -18,6 +18,8 @@ package discover
import (
"fmt"
"path/filepath"
"strings"
"sync"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup"
@@ -31,6 +33,7 @@ type mounts struct {
None
logger *logrus.Logger
lookup lookup.Locator
root string
required []string
sync.Mutex
cache []Mount
@@ -38,6 +41,16 @@ type mounts struct {
var _ Discover = (*mounts)(nil)
// NewMounts creates a discoverer for the required mounts using the specified locator.
func NewMounts(logger *logrus.Logger, lookup lookup.Locator, root string, required []string) Discover {
return &mounts{
logger: logger,
lookup: lookup,
root: filepath.Join("/", root),
required: required,
}
}
func (d *mounts) Mounts() ([]Mount, error) {
if d.lookup == nil {
return nil, fmt.Errorf("no lookup defined")
@@ -51,7 +64,7 @@ func (d *mounts) Mounts() ([]Mount, error) {
d.Lock()
defer d.Unlock()
paths := make(map[string]bool)
uniqueMounts := make(map[string]Mount)
for _, candidate := range d.required {
d.logger.Debugf("Locating %v", candidate)
@@ -66,20 +79,39 @@ func (d *mounts) Mounts() ([]Mount, error) {
}
d.logger.Debugf("Located %v as %v", candidate, located)
for _, p := range located {
paths[p] = true
if _, ok := uniqueMounts[p]; ok {
d.logger.Debugf("Skipping duplicate mount %v", p)
continue
}
r := d.relativeTo(p)
if r == "" {
r = p
}
d.logger.Infof("Selecting %v as %v", p, r)
uniqueMounts[p] = Mount{
HostPath: p,
Path: r,
}
}
}
var mounts []Mount
for path := range paths {
d.logger.Infof("Selecting %v", path)
mount := Mount{
Path: path,
}
mounts = append(mounts, mount)
for _, m := range uniqueMounts {
mounts = append(mounts, m)
}
d.cache = mounts
return mounts, nil
return d.cache, nil
}
// relativeTo returns the path relative to the root for the file locator
func (d *mounts) relativeTo(path string) string {
if d.root == "/" {
return path
}
return strings.TrimPrefix(path, d.root)
}

View File

@@ -70,7 +70,7 @@ func TestMounts(t *testing.T) {
},
required: []string{"required"},
},
expectedMounts: []Mount{{Path: "located"}},
expectedMounts: []Mount{{Path: "located", HostPath: "located"}},
},
{
description: "mounts removes located duplicates",
@@ -83,7 +83,7 @@ func TestMounts(t *testing.T) {
},
required: []string{"required0", "required1"},
},
expectedMounts: []Mount{{Path: "located"}},
expectedMounts: []Mount{{Path: "located", HostPath: "located"}},
},
{
description: "mounts skips located errors",
@@ -98,7 +98,7 @@ func TestMounts(t *testing.T) {
},
required: []string{"required0", "error", "required1"},
},
expectedMounts: []Mount{{Path: "required0"}, {Path: "required1"}},
expectedMounts: []Mount{{Path: "required0", HostPath: "required0"}, {Path: "required1", HostPath: "required1"}},
},
{
description: "mounts skips unlocated",
@@ -113,10 +113,10 @@ func TestMounts(t *testing.T) {
},
required: []string{"required0", "empty", "required1"},
},
expectedMounts: []Mount{{Path: "required0"}, {Path: "required1"}},
expectedMounts: []Mount{{Path: "required0", HostPath: "required0"}, {Path: "required1", HostPath: "required1"}},
},
{
description: "mounts skips unlocated",
description: "mounts adds multiple",
input: &mounts{
lookup: &lookup.LocatorMock{
LocateFunc: func(s string) ([]string, error) {
@@ -129,10 +129,25 @@ func TestMounts(t *testing.T) {
required: []string{"required0", "multiple", "required1"},
},
expectedMounts: []Mount{
{Path: "required0"},
{Path: "multiple0"},
{Path: "multiple1"},
{Path: "required1"},
{Path: "required0", HostPath: "required0"},
{Path: "multiple0", HostPath: "multiple0"},
{Path: "multiple1", HostPath: "multiple1"},
{Path: "required1", HostPath: "required1"},
},
},
{
description: "mounts uses relative path",
input: &mounts{
lookup: &lookup.LocatorMock{
LocateFunc: func(s string) ([]string, error) {
return []string{"/some/root/located"}, nil
},
},
root: "/some/root",
required: []string{"required0", "multiple", "required1"},
},
expectedMounts: []Mount{
{Path: "/located", HostPath: "/some/root/located"},
},
},
}

View File

@@ -21,28 +21,24 @@ import (
"path/filepath"
"strings"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup"
"github.com/container-orchestrated-devices/container-device-interface/pkg/cdi"
"github.com/sirupsen/logrus"
)
type symlinks struct {
None
logger *logrus.Logger
lookup lookup.Locator
nvidiaCTKExecutablePath string
csvFiles []string
mountsFrom Discover
logger *logrus.Logger
nvidiaCTKPath string
csvFiles []string
mountsFrom Discover
}
// NewCreateSymlinksHook creates a discoverer for a hook that creates required symlinks in the container
func NewCreateSymlinksHook(logger *logrus.Logger, csvFiles []string, mounts Discover, cfg *Config) (Discover, error) {
d := symlinks{
logger: logger,
lookup: lookup.NewExecutableLocator(logger, cfg.Root),
nvidiaCTKExecutablePath: cfg.NVIDIAContainerToolkitCLIExecutablePath,
csvFiles: csvFiles,
mountsFrom: mounts,
logger: logger,
nvidiaCTKPath: FindNvidiaCTK(logger, cfg.NvidiaCTKPath),
csvFiles: csvFiles,
mountsFrom: mounts,
}
return &d, nil
@@ -50,19 +46,7 @@ func NewCreateSymlinksHook(logger *logrus.Logger, csvFiles []string, mounts Disc
// Hooks returns a hook to create the symlinks from the required CSV files
func (d symlinks) Hooks() ([]Hook, error) {
hookPath := nvidiaCTKDefaultFilePath
targets, err := d.lookup.Locate(d.nvidiaCTKExecutablePath)
if err != nil {
d.logger.Warnf("Failed to locate %v: %v", d.nvidiaCTKExecutablePath, err)
} else if len(targets) == 0 {
d.logger.Warnf("%v not found", d.nvidiaCTKExecutablePath)
} else {
d.logger.Debugf("Found %v candidates: %v", d.nvidiaCTKExecutablePath, targets)
hookPath = targets[0]
}
d.logger.Debugf("Using NVIDIA Container Toolkit CLI path %v", hookPath)
args := []string{hookPath, "hook", "create-symlinks"}
var args []string
for _, f := range d.csvFiles {
args = append(args, "--csv-filename", f)
}
@@ -73,13 +57,13 @@ func (d symlinks) Hooks() ([]Hook, error) {
}
args = append(args, links...)
h := Hook{
Lifecycle: cdi.CreateContainerHook,
Path: hookPath,
Args: args,
}
hook := CreateNvidiaCTKHook(
d.nvidiaCTKPath,
"create-symlinks",
args...,
)
return []Hook{h}, nil
return []Hook{hook}, nil
}
// getSpecificLinkArgs returns the required specic links that need to be created
@@ -117,7 +101,7 @@ func (d symlinks) getSpecificLinkArgs() ([]string, error) {
}
linkPath := filepath.Join(filepath.Dir(m.Path), link)
links = append(links, "--link", fmt.Sprintf("%v:%v", target, linkPath))
links = append(links, "--link", fmt.Sprintf("%v::%v", target, linkPath))
linkProcessed[link] = true
}

View File

@@ -25,21 +25,36 @@ import (
type device discover.Device
// toEdits converts a discovered device to CDI Container Edits.
func (d device) toEdits() *cdi.ContainerEdits {
func (d device) toEdits() (*cdi.ContainerEdits, error) {
deviceNode, err := d.toSpec()
if err != nil {
return nil, err
}
e := cdi.ContainerEdits{
ContainerEdits: &specs.ContainerEdits{
DeviceNodes: []*specs.DeviceNode{d.toSpec()},
DeviceNodes: []*specs.DeviceNode{deviceNode},
},
}
return &e
return &e, nil
}
// toSpec converts a discovered Device to a CDI Spec Device. Note
// that missing info is filled in when edits are applied by querying the Device node.
func (d device) toSpec() *specs.DeviceNode {
func (d device) toSpec() (*specs.DeviceNode, error) {
// The HostPath field was added in the v0.5.0 CDI specification.
// The cdi package uses strict unmarshalling when loading specs from file causing failures for
// unexpected fields.
// Since the behaviour for HostPath == "" and HostPath == Path are equivalent, we clear HostPath
// if it is equal to Path to ensure compatibility with the widest range of specs.
hostPath := d.HostPath
if hostPath == d.Path {
hostPath = ""
}
s := specs.DeviceNode{
Path: d.Path,
HostPath: hostPath,
Path: d.Path,
}
return &s
return &s, nil
}

View File

@@ -0,0 +1,69 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package edits
import (
"fmt"
"testing"
"github.com/NVIDIA/nvidia-container-toolkit/internal/discover"
"github.com/container-orchestrated-devices/container-device-interface/specs-go"
"github.com/stretchr/testify/require"
)
func TestDeviceToSpec(t *testing.T) {
testCases := []struct {
device discover.Device
expected *specs.DeviceNode
}{
{
device: discover.Device{
Path: "/foo",
},
expected: &specs.DeviceNode{
Path: "/foo",
},
},
{
device: discover.Device{
Path: "/foo",
HostPath: "/foo",
},
expected: &specs.DeviceNode{
Path: "/foo",
},
},
{
device: discover.Device{
Path: "/foo",
HostPath: "/not/foo",
},
expected: &specs.DeviceNode{
Path: "/foo",
HostPath: "/not/foo",
},
},
}
for i, tc := range testCases {
t.Run(fmt.Sprintf("%d", i), func(t *testing.T) {
spec, err := device(tc.device).toSpec()
require.NoError(t, err)
require.EqualValues(t, tc.expected, spec)
})
}
}

View File

@@ -22,6 +22,7 @@ import (
"github.com/NVIDIA/nvidia-container-toolkit/internal/discover"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
"github.com/container-orchestrated-devices/container-device-interface/pkg/cdi"
"github.com/container-orchestrated-devices/container-device-interface/specs-go"
ociSpecs "github.com/opencontainers/runtime-spec/specs-go"
"github.com/sirupsen/logrus"
)
@@ -34,6 +35,20 @@ type edits struct {
// NewSpecEdits creates a SpecModifier that defines the required OCI spec edits (as CDI ContainerEdits) from the specified
// discoverer.
func NewSpecEdits(logger *logrus.Logger, d discover.Discover) (oci.SpecModifier, error) {
c, err := FromDiscoverer(d)
if err != nil {
return nil, fmt.Errorf("error constructing container edits: %v", err)
}
e := edits{
ContainerEdits: *c,
logger: logger,
}
return &e, nil
}
// FromDiscoverer creates CDI container edits for the specified discoverer.
func FromDiscoverer(d discover.Discover) (*cdi.ContainerEdits, error) {
devices, err := d.Devices()
if err != nil {
return nil, fmt.Errorf("failed to discover devices: %v", err)
@@ -49,9 +64,13 @@ func NewSpecEdits(logger *logrus.Logger, d discover.Discover) (oci.SpecModifier,
return nil, fmt.Errorf("failed to discover hooks: %v", err)
}
c := cdi.ContainerEdits{}
c := NewContainerEdits()
for _, d := range devices {
c.Append(device(d).toEdits())
edits, err := device(d).toEdits()
if err != nil {
return nil, fmt.Errorf("failed to created container edits for device: %v", err)
}
c.Append(edits)
}
for _, m := range mounts {
@@ -62,12 +81,15 @@ func NewSpecEdits(logger *logrus.Logger, d discover.Discover) (oci.SpecModifier,
c.Append(hook(h).toEdits())
}
e := edits{
ContainerEdits: c,
logger: logger,
}
return c, nil
}
return &e, nil
// NewContainerEdits is a utility function to create a CDI ContainerEdits struct.
func NewContainerEdits() *cdi.ContainerEdits {
c := cdi.ContainerEdits{
ContainerEdits: &specs.ContainerEdits{},
}
return &c
}
// Modify applies the defined edits to the incoming OCI spec

View File

@@ -0,0 +1,31 @@
/**
# Copyright (c) NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package edits
import (
"testing"
"github.com/NVIDIA/nvidia-container-toolkit/internal/discover"
"github.com/stretchr/testify/require"
)
func TestFromDiscovererAllowsMountsToIterate(t *testing.T) {
edits, err := FromDiscoverer(discover.None{})
require.NoError(t, err)
require.Empty(t, edits.Mounts)
}

View File

@@ -38,8 +38,7 @@ func (d mount) toEdits() *cdi.ContainerEdits {
// that missing info is filled in when edits are applied by querying the Mount node.
func (d mount) toSpec() *specs.Mount {
s := specs.Mount{
HostPath: d.Path,
// TODO: We need to allow the container path to be customised
HostPath: d.HostPath,
ContainerPath: d.Path,
Options: []string{
"ro",

View File

@@ -16,6 +16,8 @@
package info
import "gitlab.com/nvidia/cloud-native/go-nvlib/pkg/nvlib/info"
// Logger is a basic interface for logging to allow these functions to be called
// from code where logrus is not used.
type Logger interface {
@@ -32,10 +34,12 @@ func ResolveAutoMode(logger Logger, mode string) (rmode string) {
logger.Infof("Auto-detected mode as '%v'", rmode)
}()
isTegra, reason := IsTegraSystem()
nvinfo := info.New()
isTegra, reason := nvinfo.IsTegraSystem()
logger.Debugf("Is Tegra-based system? %v: %v", isTegra, reason)
hasNVML, reason := HasNVML()
hasNVML, reason := nvinfo.HasNvml()
logger.Debugf("Has NVML? %v: %v", hasNVML, reason)
if isTegra && !hasNVML {

View File

@@ -0,0 +1,39 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package drm
import (
"fmt"
"path/filepath"
)
// GetDeviceNodesByBusID returns the DRM devices associated with the specified PCI bus ID
func GetDeviceNodesByBusID(busID string) ([]string, error) {
drmRoot := filepath.Join("/sys/bus/pci/devices", busID, "drm")
matches, err := filepath.Glob(fmt.Sprintf("%s/*", drmRoot))
if err != nil {
return nil, err
}
var drmDeviceNodes []string
for _, m := range matches {
drmDeviceNode := filepath.Join("/dev/dri", filepath.Base(m))
drmDeviceNodes = append(drmDeviceNodes, drmDeviceNode)
}
return drmDeviceNodes, nil
}

View File

@@ -0,0 +1,125 @@
/*
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
*/
package devices
import (
"bufio"
"fmt"
"io"
"os"
"strings"
)
// Device major numbers and device names for NVIDIA devices
const (
NVIDIAUVMMinor = 0
NVIDIAUVMToolsMinor = 1
NVIDIACTLMinor = 255
NVIDIAModesetMinor = 254
NVIDIAFrontend = Name("nvidia-frontend")
NVIDIAGPU = NVIDIAFrontend
NVIDIACaps = Name("nvidia-caps")
NVIDIAUVM = Name("nvidia-uvm")
procDevicesPath = "/proc/devices"
nvidiaDevicePrefix = "nvidia"
)
// Name represents the name of a device as specified under /proc/devices
type Name string
// Major represents a device major as specified under /proc/devices
type Major int
// Devices represents the set of devices under /proc/devices
//
//go:generate moq -stub -out devices_mock.go . Devices
type Devices interface {
Exists(Name) bool
Get(Name) (Major, bool)
}
type devices map[Name]Major
var _ Devices = devices(nil)
// Exists checks if a Device with a given name exists or not
func (d devices) Exists(name Name) bool {
_, exists := d[name]
return exists
}
// Get a Device from Devices
func (d devices) Get(name Name) (Major, bool) {
device, exists := d[name]
return device, exists
}
// GetNVIDIADevices returns the set of NVIDIA Devices on the machine
func GetNVIDIADevices() (Devices, error) {
devicesFile, err := os.Open(procDevicesPath)
if os.IsNotExist(err) {
return nil, nil
}
if err != nil {
return nil, fmt.Errorf("error opening devices file: %v", err)
}
defer devicesFile.Close()
return nvidiaDeviceFrom(devicesFile), nil
}
func nvidiaDeviceFrom(reader io.Reader) devices {
allDevices := devicesFrom(reader)
nvidiaDevices := make(devices)
for n, d := range allDevices {
if !strings.HasPrefix(string(n), nvidiaDevicePrefix) {
continue
}
nvidiaDevices[n] = d
}
return nvidiaDevices
}
func devicesFrom(reader io.Reader) devices {
allDevices := make(devices)
scanner := bufio.NewScanner(reader)
for scanner.Scan() {
device, major, err := processProcDeviceLine(scanner.Text())
if err != nil {
continue
}
allDevices[device] = major
}
return allDevices
}
func processProcDeviceLine(line string) (Name, Major, error) {
trimmed := strings.TrimSpace(line)
var name Name
var major Major
n, _ := fmt.Sscanf(trimmed, "%d %s", &major, &name)
if n == 2 {
return name, major, nil
}
return "", 0, fmt.Errorf("unparsable line: %v", line)
}

View File

@@ -0,0 +1,125 @@
// Code generated by moq; DO NOT EDIT.
// github.com/matryer/moq
package devices
import (
"sync"
)
// Ensure, that DevicesMock does implement Devices.
// If this is not the case, regenerate this file with moq.
var _ Devices = &DevicesMock{}
// DevicesMock is a mock implementation of Devices.
//
// func TestSomethingThatUsesDevices(t *testing.T) {
//
// // make and configure a mocked Devices
// mockedDevices := &DevicesMock{
// ExistsFunc: func(name Name) bool {
// panic("mock out the Exists method")
// },
// GetFunc: func(name Name) (Major, bool) {
// panic("mock out the Get method")
// },
// }
//
// // use mockedDevices in code that requires Devices
// // and then make assertions.
//
// }
type DevicesMock struct {
// ExistsFunc mocks the Exists method.
ExistsFunc func(name Name) bool
// GetFunc mocks the Get method.
GetFunc func(name Name) (Major, bool)
// calls tracks calls to the methods.
calls struct {
// Exists holds details about calls to the Exists method.
Exists []struct {
// Name is the name argument value.
Name Name
}
// Get holds details about calls to the Get method.
Get []struct {
// Name is the name argument value.
Name Name
}
}
lockExists sync.RWMutex
lockGet sync.RWMutex
}
// Exists calls ExistsFunc.
func (mock *DevicesMock) Exists(name Name) bool {
callInfo := struct {
Name Name
}{
Name: name,
}
mock.lockExists.Lock()
mock.calls.Exists = append(mock.calls.Exists, callInfo)
mock.lockExists.Unlock()
if mock.ExistsFunc == nil {
var (
bOut bool
)
return bOut
}
return mock.ExistsFunc(name)
}
// ExistsCalls gets all the calls that were made to Exists.
// Check the length with:
//
// len(mockedDevices.ExistsCalls())
func (mock *DevicesMock) ExistsCalls() []struct {
Name Name
} {
var calls []struct {
Name Name
}
mock.lockExists.RLock()
calls = mock.calls.Exists
mock.lockExists.RUnlock()
return calls
}
// Get calls GetFunc.
func (mock *DevicesMock) Get(name Name) (Major, bool) {
callInfo := struct {
Name Name
}{
Name: name,
}
mock.lockGet.Lock()
mock.calls.Get = append(mock.calls.Get, callInfo)
mock.lockGet.Unlock()
if mock.GetFunc == nil {
var (
majorOut Major
bOut bool
)
return majorOut, bOut
}
return mock.GetFunc(name)
}
// GetCalls gets all the calls that were made to Get.
// Check the length with:
//
// len(mockedDevices.GetCalls())
func (mock *DevicesMock) GetCalls() []struct {
Name Name
} {
var calls []struct {
Name Name
}
mock.lockGet.RLock()
calls = mock.calls.Get
mock.lockGet.RUnlock()
return calls
}

View File

@@ -0,0 +1,102 @@
/*
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
*/
package devices
import (
"fmt"
"strings"
"testing"
"github.com/stretchr/testify/require"
)
func TestNvidiaDevices(t *testing.T) {
devices := map[Name]Major{
"nvidia-frontend": 195,
"nvidia-nvlink": 234,
"nvidia-caps": 235,
"nvidia-uvm": 510,
"nvidia-nvswitch": 511,
}
nvidiaDevices := testDevices(devices)
for name, major := range devices {
device, exists := nvidiaDevices.Get(name)
require.True(t, exists, "Unexpected missing device")
require.Equal(t, device, major, "Unexpected device major")
}
_, exists := nvidiaDevices.Get("bogus")
require.False(t, exists, "Unexpected 'bogus' device found")
}
func TestProcessDeviceFile(t *testing.T) {
testCases := []struct {
lines []string
expected devices
}{
{[]string{}, make(devices)},
{[]string{"Not a valid line:"}, make(devices)},
{[]string{"195 nvidia-frontend"}, devices{"nvidia-frontend": 195}},
{[]string{"195 nvidia-frontend", "235 nvidia-caps"}, devices{"nvidia-frontend": 195, "nvidia-caps": 235}},
{[]string{" 195 nvidia-frontend"}, devices{"nvidia-frontend": 195}},
{[]string{"Not a valid line:", "", "195 nvidia-frontend"}, devices{"nvidia-frontend": 195}},
{[]string{"195 not-nvidia-frontend"}, make(devices)},
}
for i, tc := range testCases {
t.Run(fmt.Sprintf("testcase %d", i), func(t *testing.T) {
contents := strings.NewReader(strings.Join(tc.lines, "\n"))
d := nvidiaDeviceFrom(contents)
require.EqualValues(t, tc.expected, d)
})
}
}
func TestProcessDeviceFileLine(t *testing.T) {
testCases := []struct {
line string
name Name
major Major
err bool
}{
{"", "", 0, true},
{"0", "", 0, true},
{"notint nvidia-frontend", "", 0, true},
{"195 nvidia-frontend", "nvidia-frontend", 195, false},
{" 195 nvidia-frontend", "nvidia-frontend", 195, false},
}
for i, tc := range testCases {
t.Run(fmt.Sprintf("testcase %d", i), func(t *testing.T) {
name, major, err := processProcDeviceLine(tc.line)
require.Equal(t, tc.name, name)
require.Equal(t, tc.major, major)
if tc.err {
require.Error(t, err)
} else {
require.NoError(t, err)
}
})
}
}
// testDevices creates a set of test NVIDIA devices
func testDevices(d map[Name]Major) Devices {
return devices(d)
}

View File

@@ -0,0 +1,89 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package proc
import (
"bufio"
"fmt"
"io"
"os"
"path/filepath"
"strings"
)
// GPUInfoField represents the field name for information specified in a GPU's information file
type GPUInfoField string
// The following constants define the fields of interest from the GPU information file
const (
GPUInfoModel = GPUInfoField("Model")
GPUInfoGPUUUID = GPUInfoField("GPU UUID")
GPUInfoBusLocation = GPUInfoField("Bus Location")
GPUInfoDeviceMinor = GPUInfoField("Device Minor")
)
// GPUInfo stores the information for a GPU as determined from its associated information file
type GPUInfo map[GPUInfoField]string
// GetInformationFilePaths returns the list of information files associated with NVIDIA GPUs.
func GetInformationFilePaths(root string) ([]string, error) {
return filepath.Glob(filepath.Join(root, "/proc/driver/nvidia/gpus/*/information"))
}
// ParseGPUInformationFile parses the specified GPU information file and constructs a GPUInfo structure
func ParseGPUInformationFile(path string) (GPUInfo, error) {
infoFile, err := os.Open(path)
if err != nil {
return nil, fmt.Errorf("failed to open %v: %v", path, err)
}
defer infoFile.Close()
return gpuInfoFrom(infoFile), nil
}
// gpuInfoFrom parses a GPUInfo struct from the specified reader
// An information file has the following strucutre:
// $ cat /proc/driver/nvidia/gpus/0000\:06\:00.0/information
// Model: Tesla V100-SXM2-16GB
// IRQ: 408
// GPU UUID: GPU-edfee158-11c1-52b8-0517-92f30e7fac88
// Video BIOS: 88.00.41.00.01
// Bus Type: PCIe
// DMA Size: 47 bits
// DMA Mask: 0x7fffffffffff
// Bus Location: 0000:06:00.0
// Device Minor: 0
// GPU Excluded: No
func gpuInfoFrom(reader io.Reader) GPUInfo {
info := make(GPUInfo)
scanner := bufio.NewScanner(reader)
for scanner.Scan() {
line := scanner.Text()
parts := strings.SplitN(line, ":", 2)
if len(parts) != 2 {
continue
}
field := GPUInfoField(parts[0])
value := strings.TrimSpace(parts[1])
info[field] = value
}
return info
}

326
internal/ldcache/ldcache.go Normal file
View File

@@ -0,0 +1,326 @@
/*
# Copyright (c) 2021-2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
*/
// Adapted from https://github.com/rai-project/ldcache
package ldcache
import (
"bytes"
"encoding/binary"
"errors"
"fmt"
"os"
"path/filepath"
"strings"
"syscall"
"unsafe"
log "github.com/sirupsen/logrus"
)
const ldcachePath = "/etc/ld.so.cache"
const (
magicString1 = "ld.so-1.7.0"
magicString2 = "glibc-ld.so.cache"
magicVersion = "1.1"
)
const (
flagTypeMask = 0x00ff
flagTypeELF = 0x0001
flagArchMask = 0xff00
flagArchI386 = 0x0000
flagArchX8664 = 0x0300
flagArchX32 = 0x0800
flagArchPpc64le = 0x0500
)
var errInvalidCache = errors.New("invalid ld.so.cache file")
type header1 struct {
Magic [len(magicString1) + 1]byte // include null delimiter
NLibs uint32
}
type entry1 struct {
Flags int32
Key, Value uint32
}
type header2 struct {
Magic [len(magicString2)]byte
Version [len(magicVersion)]byte
NLibs uint32
TableSize uint32
_ [3]uint32 // unused
_ uint64 // force 8 byte alignment
}
type entry2 struct {
Flags int32
Key, Value uint32
OSVersion uint32
HWCap uint64
}
// LDCache represents the interface for performing lookups into the LDCache
type LDCache interface {
List() ([]string, []string)
Lookup(...string) ([]string, []string)
}
type ldcache struct {
*bytes.Reader
data, libs []byte
header header2
entries []entry2
root string
logger *log.Logger
}
// New creates a new LDCache with the specified logger and root.
func New(logger *log.Logger, root string) (LDCache, error) {
path := filepath.Join(root, ldcachePath)
logger.Debugf("Opening ld.conf at %v", path)
f, err := os.Open(path)
if err != nil {
return nil, err
}
defer f.Close()
fi, err := f.Stat()
if err != nil {
return nil, err
}
d, err := syscall.Mmap(int(f.Fd()), 0, int(fi.Size()),
syscall.PROT_READ, syscall.MAP_PRIVATE)
if err != nil {
return nil, err
}
cache := &ldcache{
data: d,
Reader: bytes.NewReader(d),
root: root,
logger: logger,
}
return cache, cache.parse()
}
func (c *ldcache) Close() error {
return syscall.Munmap(c.data)
}
func (c *ldcache) Magic() string {
return string(c.header.Magic[:])
}
func (c *ldcache) Version() string {
return string(c.header.Version[:])
}
func strn(b []byte, n int) string {
return string(b[:n])
}
func (c *ldcache) parse() error {
var header header1
// Check for the old format (< glibc-2.2)
if c.Len() <= int(unsafe.Sizeof(header)) {
return errInvalidCache
}
if strn(c.data, len(magicString1)) == magicString1 {
if err := binary.Read(c, binary.LittleEndian, &header); err != nil {
return err
}
n := int64(header.NLibs) * int64(unsafe.Sizeof(entry1{}))
offset, err := c.Seek(n, 1) // skip old entries
if err != nil {
return err
}
n = (-offset) & int64(unsafe.Alignof(c.header)-1)
_, err = c.Seek(n, 1) // skip padding
if err != nil {
return err
}
}
c.libs = c.data[c.Size()-int64(c.Len()):] // kv offsets start here
if err := binary.Read(c, binary.LittleEndian, &c.header); err != nil {
return err
}
if c.Magic() != magicString2 || c.Version() != magicVersion {
return errInvalidCache
}
c.entries = make([]entry2, c.header.NLibs)
if err := binary.Read(c, binary.LittleEndian, &c.entries); err != nil {
return err
}
return nil
}
type entry struct {
libname string
bits int
value string
}
// getEntries returns the entires of the ldcache in a go-friendly struct.
func (c *ldcache) getEntries(selected func(string) bool) []entry {
var entries []entry
for _, e := range c.entries {
bits := 0
if ((e.Flags & flagTypeMask) & flagTypeELF) == 0 {
continue
}
switch e.Flags & flagArchMask {
case flagArchX8664:
fallthrough
case flagArchPpc64le:
bits = 64
case flagArchX32:
fallthrough
case flagArchI386:
bits = 32
default:
continue
}
if e.Key > uint32(len(c.libs)) || e.Value > uint32(len(c.libs)) {
continue
}
lib := bytesToString(c.libs[e.Key:])
if lib == "" {
c.logger.Debugf("Skipping invalid lib")
continue
}
if !selected(lib) {
continue
}
value := bytesToString(c.libs[e.Value:])
if value == "" {
c.logger.Debugf("Skipping invalid value for lib %v", lib)
continue
}
e := entry{
libname: lib,
bits: bits,
value: value,
}
entries = append(entries, e)
}
return entries
}
// List creates a list of libraires in the ldcache.
// The 32-bit and 64-bit libraries are returned separately.
func (c *ldcache) List() ([]string, []string) {
all := func(s string) bool { return true }
return c.resolveSelected(all)
}
// Lookup searches the ldcache for the specified prefixes.
// The 32-bit and 64-bit libraries matching the prefixes are returned.
func (c *ldcache) Lookup(libPrefixes ...string) ([]string, []string) {
c.logger.Debugf("Looking up %v in cache", libPrefixes)
// We define a functor to check whether a given library name matches any of the prefixes
matchesAnyPrefix := func(s string) bool {
for _, p := range libPrefixes {
if strings.HasPrefix(s, p) {
return true
}
}
return false
}
return c.resolveSelected(matchesAnyPrefix)
}
// resolveSelected process the entries in the LDCach based on the supplied filter and returns the resolved paths.
// The paths are separated by bittage.
func (c *ldcache) resolveSelected(selected func(string) bool) ([]string, []string) {
paths := make(map[int][]string)
processed := make(map[string]bool)
for _, e := range c.getEntries(selected) {
path, err := c.resolve(e.value)
if err != nil {
c.logger.Debugf("Could not resolve entry: %v", err)
continue
}
if processed[path] {
continue
}
paths[e.bits] = append(paths[e.bits], path)
processed[path] = true
}
return paths[32], paths[64]
}
// resolve resolves the specified ldcache entry based on the value being processed.
// The input is the name of the entry in the cache.
func (c *ldcache) resolve(target string) (string, error) {
name := filepath.Join(c.root, target)
c.logger.Debugf("checking %v", string(name))
info, err := os.Lstat(name)
if err != nil {
return "", fmt.Errorf("failed to get file info: %v", info)
}
if info.Mode()&os.ModeSymlink == 0 {
c.logger.Debugf("Resolved regular file: %v", name)
return name, nil
}
link, err := os.Readlink(name)
if err != nil {
return "", fmt.Errorf("failed to resolve symlink: %v", err)
}
// We return absolute paths for all targets
if !filepath.IsAbs(link) || strings.HasPrefix(link, ".") {
link = filepath.Join(filepath.Dir(target), link)
}
// Ensure that the returned path is relative to the root.
link = filepath.Join(c.root, link)
c.logger.Debugf("Resolved link: '%v' => '%v'", name, link)
return link, nil
}
// bytesToString converts a byte slice to a string.
// This assumes that the byte slice is null-terminated
func bytesToString(value []byte) string {
n := bytes.IndexByte(value, 0)
if n < 0 {
return ""
}
return strn(value, n)
}

View File

@@ -19,9 +19,6 @@ package lookup
import (
"fmt"
"os"
"path/filepath"
"github.com/sirupsen/logrus"
)
const (
@@ -30,23 +27,23 @@ const (
// NewCharDeviceLocator creates a Locator that can be used to find char devices at the specified root. A logger is
// also specified.
func NewCharDeviceLocator(logger *logrus.Logger, root string) Locator {
l := file{
logger: logger,
prefixes: []string{root, filepath.Join(root, devRoot)},
filter: assertCharDevice,
}
return &l
func NewCharDeviceLocator(opts ...Option) Locator {
opts = append(opts,
WithSearchPaths("", devRoot),
WithFilter(assertCharDevice),
)
return NewFileLocator(
opts...,
)
}
// assertCharDevice checks whether the specified path is a char device and returns an error if this is not the case.
func assertCharDevice(filename string) error {
info, err := os.Stat(filename)
info, err := os.Lstat(filename)
if err != nil {
return fmt.Errorf("error getting info: %v", err)
}
if info.Mode()|os.ModeCharDevice == 0 {
if info.Mode()&os.ModeCharDevice == 0 {
return fmt.Errorf("%v is not a char device", filename)
}
return nil

View File

@@ -0,0 +1,58 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package lookup
import (
"fmt"
"testing"
testlog "github.com/sirupsen/logrus/hooks/test"
"github.com/stretchr/testify/require"
)
func TestCharDeviceLocator(t *testing.T) {
logger, _ := testlog.NewNullLogger()
testCases := []struct {
root string
expectedPrefixes []string
}{
{
root: "",
expectedPrefixes: []string{"", "/dev"},
},
{
root: "/",
expectedPrefixes: []string{"/", "/dev"},
},
{
root: "/some/root",
expectedPrefixes: []string{"/some/root", "/some/root/dev"},
},
}
for i, tc := range testCases {
t.Run(fmt.Sprintf("%d", i), func(t *testing.T) {
f := NewCharDeviceLocator(
WithLogger(logger),
WithRoot(tc.root),
).(*file)
require.EqualValues(t, tc.expectedPrefixes, f.prefixes)
})
}
}

View File

@@ -26,13 +26,11 @@ import (
// NewDirectoryLocator creates a Locator that can be used to find directories at the specified root. A logger
// is also specified.
func NewDirectoryLocator(logger *log.Logger, root string) Locator {
l := file{
logger: logger,
prefixes: []string{root},
filter: assertDirectory,
}
return &l
return NewFileLocator(
WithLogger(logger),
WithRoot(root),
WithFilter(assertDirectory),
)
}
// assertDirectory checks wither the specified path is a directory.

Some files were not shown because too many files have changed in this diff Show More