Compare commits

...

143 Commits

Author SHA1 Message Date
Evan Lezar
503ed96275 Merge branch 'fix-release-tooling' into 'main'
Ensure CLI versions are set correctly for RPM packages

See merge request nvidia/container-toolkit/container-toolkit!211
2022-08-24 10:45:38 +00:00
Evan Lezar
d8ba84d427 Add release tests for fedora35
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-24 11:57:20 +02:00
Evan Lezar
8e8c41a3bc Clean up repo test scripts
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-24 11:57:20 +02:00
Evan Lezar
e34fe17b45 Add fedora35 to release and signing scripts
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-24 11:57:20 +02:00
Evan Lezar
c5b0278c58 Ensure CLI versions are set correctly for RPM packages
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-24 11:57:20 +02:00
Evan Lezar
8daa257b35 Merge branch 'update-changelog' into 'main'
Add changelog for 1.11.0-rc.3

See merge request nvidia/container-toolkit/container-toolkit!210
2022-08-24 09:01:39 +00:00
Evan Lezar
6329174cfc Add changelog for 1.11.0-rc.3
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-24 10:08:23 +02:00
Evan Lezar
1ec41c1bf1 Merge branch 'update-libnvidia-container' into 'main'
Update libnvidia-container

See merge request nvidia/container-toolkit/container-toolkit!209
2022-08-23 16:52:09 +00:00
Evan Lezar
581a76de38 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-23 17:29:01 +02:00
Evan Lezar
5d52ca8909 Merge branch 'add-fedora35' into 'main'
Add fedora35 package targets

See merge request nvidia/container-toolkit/container-toolkit!205
2022-08-23 13:04:45 +00:00
Evan Lezar
ad7151d394 Update CUDA base image to 11.7.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-23 14:18:49 +02:00
Evan Lezar
3269a7b0e7 Update libnvidia-container submodule
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-23 14:18:49 +02:00
Evan Lezar
6a155cc606 Increase package build timeout to 3 hours for slow aarch64 builds
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-23 14:18:49 +02:00
Evan Lezar
a5bbf613e8 Use single config file for centos, al2, and fedora
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-23 14:18:49 +02:00
Evan Lezar
22427c1359 Add fedora35 CI targets
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-23 14:18:49 +02:00
Evan Lezar
f17121fd6c Add fedora targets to release scripts
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-23 14:18:49 +02:00
Evan Lezar
256e37eb3f Add fedora35 package targets
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-23 14:18:49 +02:00
Evan Lezar
bdfd123b9d Switch to single docker file for yum-based rpm builds
This reuses the docker file for yum-based rpm distros (centos, amazonlinux)
instead of maintaining two files with the same contents.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-23 14:18:49 +02:00
Jon Mayo
3f7dce202a Merge branch 'remove-podman' into 'main'
Specify hook structure instead of importing Podman

See merge request nvidia/container-toolkit/container-toolkit!208
2022-08-22 15:25:40 +00:00
Evan Lezar
a6d21abe14 Merge branch 'add-package-with-no-libnvidia-container' into 'main'
Split nvidia-container-toolkit package

See merge request nvidia/container-toolkit/container-toolkit!195
2022-08-22 09:08:33 +00:00
Evan Lezar
d0f1fe2273 Use new packages in toolkit image
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-19 12:38:17 +02:00
Evan Lezar
8de9593209 Split nvidia-container-toolkit package
This change splits the nvidia-container-toolkit package into the top-level package and
an nvidia-container-toolkit-base package.
The nvidia-container-toolkit-base package allows the NVIDIA Container Runtime and
NVIDIA Container Toolkit CLI to be installed on systems without requiring that the
NVIDIA Container Runtime Hook and the transitive dependencies included in the NVIDIA
Container Library and NVIDIA Container CLI also be installed.

This allows the runtime to be used on systems where the CSV or CDI mode of the runtime
is used exclusively.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-19 12:38:17 +02:00
Evan Lezar
64b2b50470 Fix centos8 test image
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-19 12:36:52 +02:00
Evan Lezar
4dc1451c49 Fix indentation in makefile
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-19 12:36:52 +02:00
Evan Lezar
211081ff25 Update vendoring
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-19 10:28:00 +02:00
Evan Lezar
c1c1d5cf8e Specify hook structure instead of importing Podman
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-19 10:26:34 +02:00
Evan Lezar
e91ffef258 Merge branch 'fix-runtime-hook-rename' into 'main'
Fix cleanup of nvidia-container-toolkit link

See merge request nvidia/container-toolkit/container-toolkit!207
2022-08-18 12:51:51 +00:00
Evan Lezar
47c8aa3790 Fix cleanup of nvidia-container-toolkit link
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-18 14:06:08 +02:00
Evan Lezar
33b4e7fb0a Merge branch 'fix-containerd-tests' into 'main'
Fix image in containerd tests

See merge request nvidia/container-toolkit/container-toolkit!206
2022-08-12 13:46:24 +00:00
Evan Lezar
936da0295b Use proper cuda image for containerd tests
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-12 14:23:24 +02:00
Evan Lezar
c2205c14fb Update subcomponents
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-12 14:22:40 +02:00
Evan Lezar
56935f5743 Merge branch 'fix-mounts' into 'main'
Fix setting of toolkit config option in toolkit container

See merge request nvidia/container-toolkit/container-toolkit!204
2022-08-09 15:46:15 +00:00
Evan Lezar
1b3bae790c Update image used for containerd tests
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-09 16:55:51 +02:00
Evan Lezar
47559a8c87 Output applied config to toolkit container stdout
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-09 15:18:59 +02:00
Evan Lezar
86412ea821 Ensure that toolkit-container sets correct default value
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-09 15:18:52 +02:00
Evan Lezar
b8aa844171 Fix setting of toolkit config option in toolkit container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-09 15:18:52 +02:00
Evan Lezar
f9464c5cf9 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-09 15:18:52 +02:00
Evan Lezar
9df75e1fa3 Merge branch 'add-tegra-files-as-mounts' into 'main'
Add modifier to inject Tegra platform files

See merge request nvidia/container-toolkit/container-toolkit!203
2022-08-09 11:43:04 +00:00
Evan Lezar
0218e2ebf7 Update vendoring
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-08 17:12:47 +02:00
Evan Lezar
a9dc6550d5 Use nvinfo package from go-nvlib
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-08 17:11:42 +02:00
Evan Lezar
ffd6ec3c54 Add modifier to inject Tegra platform files
This change adds a modifier that injects the following Tegra platform files:
* /etc/nv_tegra_release
* /sys/devices/soc0/family

allowing these files to be used for platform detection in a containerized
context such as the GPU device plugin.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-08 16:04:20 +02:00
Evan Lezar
de3e0df96c Merge branch 'bump-version-1.11.0-rc.3' into 'main'
Bump version to 1.11.0-rc.3

See merge request nvidia/container-toolkit/container-toolkit!202
2022-08-08 13:45:59 +00:00
Evan Lezar
e5dadf34d9 Bump version to 1.11.0-rc.3
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-08-08 14:56:01 +02:00
Evan Lezar
52145f2d73 Merge branch 'fix-libnvidia-container-tag' into 'main'
Fix setting of LIBNVIDIA_CONTAINER_TAG

See merge request nvidia/container-toolkit/container-toolkit!201
2022-07-27 11:31:06 +00:00
Evan Lezar
90df3caf62 Fix setting of LIBNVIDIA_CONTAINER_TAG
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-27 13:30:31 +02:00
Evan Lezar
50db66a925 Merge branch 'release-1.11.0-rc.2' into 'main'
Add CHANGELOG entry for 1.11.0-rc.2

See merge request nvidia/container-toolkit/container-toolkit!200
2022-07-27 10:53:26 +00:00
Evan Lezar
8587fa05bd Add CHANGELOG entry for 1.11.0-rc.2
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-27 12:06:09 +02:00
Evan Lezar
8129dade3c Merge branch 'set-mount-devices' into 'main'
Allow accept-nvidia-visible-devices-* to be set by toolkit container

See merge request nvidia/container-toolkit/container-toolkit!198
2022-07-27 09:58:25 +00:00
Evan Lezar
3610fe7c33 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-27 11:12:57 +02:00
Evan Lezar
90518e0ce5 Allow accept-visible-devices config options to be set
This change allows the
* accept-nvidia-visible-devices-envvar-when-unprivileged
* accept-nvidia-visible-devices-as-volume-mounts

options to be set in the toolkit-container. These are controlled
by command line flags or the following environment variables:

* ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED
* ACCEPT_NVIDIA_VISIBLE_DEVICES_AS_VOLUME_MOUNTS

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-27 10:57:43 +02:00
Evan Lezar
9c060f06ba Remove unused TOOLKIT_ARGS / --toolkit-args
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-27 10:50:18 +02:00
Evan Lezar
e848aa7813 Set toolkit root as flag
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-27 10:50:06 +02:00
Evan Lezar
feedc912e4 Rename toolkitDir toolkitRoot
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-27 10:50:05 +02:00
Evan Lezar
ab3f05cf62 Move global toolkitDir to options struct
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-27 10:41:46 +02:00
Evan Lezar
35982e51bf Move toolkit options to struct
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-27 10:40:19 +02:00
Evan Lezar
94e650c518 Merge branch 'bump-version' into 'main'
bump version to 1.11.0-rc.2

See merge request nvidia/container-toolkit/container-toolkit!197
2022-07-26 17:57:23 +00:00
Evan Lezar
d9edc18bf8 Bump version to 1.11.0-rc.2
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-25 09:51:20 +02:00
Evan Lezar
f4d01e0a05 Add changelog entries for 1.11.0-rc.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-25 09:51:01 +02:00
Evan Lezar
648cfaba51 Merge branch 'update-error-message' into 'main'
Make error message clearer

See merge request nvidia/container-toolkit/container-toolkit!194
2022-07-21 08:49:56 +00:00
Christopher Desiniotis
3a9de13f4e Apply 1 suggestion(s) to 1 file(s)
2022-07-21 08:03:39 +00:00
Evan Lezar
629a68937e Merge branch 'fix-relative-files' into 'main'
Fix adjusting relative paths for containerised devices and mounts.

See merge request nvidia/container-toolkit/container-toolkit!193
2022-07-20 11:40:28 +00:00
Evan Lezar
34e80abdea Add root to mounts type
This change adds a root member to the mounts type that is used to
perform most of the lookups for files and devices. This allows
for consistent handling of relative paths.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-18 14:37:02 +02:00
Evan Lezar
1161b21166 Make error message clearer
This change improves the error message when invoking the NVIDIA
Runtime Hook in non-legacy mode. This should guide users towards specifying
the --runtime=nvidia flag when using docker.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-18 13:09:59 +02:00
Evan Lezar
bcdef81e30 Merge branch 'fix-ordering-of-csv-hooks' into 'main'
Fix ordering of create-symlink and update-ldcache hooks

See merge request nvidia/container-toolkit/container-toolkit!192
2022-07-18 10:59:41 +00:00
Evan Lezar
acc0afbb7a Remove Relative method from Locator
The Relative method added to the Locator interface was
not correctly implemented in the file type. The root was
never set when instantiating the object.

This change removes this method from the interface and the file
type, switching to a local implementation in the mounts type
instead.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-15 16:40:27 +02:00
Evan Lezar
7584044b3c Fix bug where ldcache may not contain symlinks
Since the creation of symlinks may include other libraries / folders,
the ldcache should be updated AFTER the symlinks are created.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-15 12:18:40 +02:00
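The ordering constraint can be expressed as a stable sort that moves the ldcache update to the end. This is a minimal Go sketch that assumes a simplified `hook` type identified only by name. The hook names are illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

// hook is a simplified stand-in for an OCI hook, identified by name.
type hook struct {
	Name string
}

// orderHooks returns a copy of hooks in which every "update-ldcache" hook
// follows all other hooks. sort.SliceStable keeps the relative order
// within each of the two groups.
func orderHooks(hooks []hook) []hook {
	out := append([]hook(nil), hooks...)
	isLdcache := func(h hook) bool { return h.Name == "update-ldcache" }
	sort.SliceStable(out, func(i, j int) bool {
		return !isLdcache(out[i]) && isLdcache(out[j])
	})
	return out
}

func main() {
	for _, h := range orderHooks([]hook{{"update-ldcache"}, {"create-symlinks"}}) {
		fmt.Println(h.Name)
	}
}
```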
Evan Lezar
02c14e981c Add tests for identifying libraries
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-15 12:17:15 +02:00
Evan Lezar
37ee972f74 Merge branch 'CNT-2349/configure-docker' into 'main'
Add nvidia-ctk runtime configure command to update docker config

See merge request nvidia/container-toolkit/container-toolkit!166
2022-07-14 08:06:27 +00:00
Evan Lezar
3809407b6a Merge branch 'rename-to-nvidia-container-hook' into 'main'
Rename -toolkit executable to -runtime-hook

See merge request nvidia/container-toolkit/container-toolkit!189
2022-07-13 11:08:53 +00:00
Evan Lezar
f9547c447a Merge branch 'fix-cdi-refresh' into 'main'
Ensure that CDI registry is refreshed

See merge request nvidia/container-toolkit/container-toolkit!191
2022-07-13 09:38:45 +00:00
Evan Lezar
eb85d45137 Merge branch 'CNT-3297/cdi-config' into 'main'
Add runtime config option for CDI spec dirs

See merge request nvidia/container-toolkit/container-toolkit!190
2022-07-13 09:36:33 +00:00
Evan Lezar
9f0060f651 Add nvidia-ctk runtime configure command
This change adds a `runtime configure` command to the nvidia-ctk CLI. This
command is currently limited to configuring the docker config on the
system by modifying the daemon.json config file associated with docker.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-13 10:33:46 +02:00
Evan Lezar
0e6dc3f7ea Move docker config handling to internal package
In preparation for adding a command to the nvidia-ctk CLI to modify
the docker config, this change moves the load, update, and flush logic
from the toolkit container docker CLI to an internal package.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-13 10:30:01 +02:00
Evan Lezar
1b4944e1de Ensure that CDI registry is refreshed
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-12 14:07:21 +02:00
Evan Lezar
83743e3613 Add runtime config option for CDI spec dirs
This change adds an nvidia-container-runtime.modes.cdi.spec-dirs
config option that allows the default spec dirs to be overridden.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-11 15:39:48 +02:00
Evan Lezar
87afcc3ef4 Reuse check for existing hook
This change reuses the code that checks for the existing NVIDIA
Container Runtime hook to ensure that both nvidia-container-toolkit
and nvidia-container-runtime-hook are detected.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-08 12:20:19 +02:00
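Here is a sketch of such a shared check in Go. The hook is matched by the base name of its path, so both the new and the legacy executable names are recognized. The function name is hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// isNVIDIAContainerRuntimeHook reports whether a hook path refers to the
// NVIDIA Container Runtime Hook under its new or its legacy name.
func isNVIDIAContainerRuntimeHook(path string) bool {
	switch filepath.Base(path) {
	case "nvidia-container-runtime-hook", "nvidia-container-toolkit":
		return true
	}
	return false
}

func main() {
	for _, p := range []string{
		"/usr/bin/nvidia-container-toolkit",
		"/usr/bin/nvidia-container-runtime-hook",
		"/usr/bin/some-other-hook",
	} {
		fmt.Println(p, isNVIDIAContainerRuntimeHook(p))
	}
}
```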
Evan Lezar
6ed3a4e1a6 Update package descriptions and URLs
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-08 12:16:03 +02:00
Evan Lezar
8a56671d18 Update package definitions
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-08 12:16:03 +02:00
Evan Lezar
1d81db76a6 Update references to nvidia-container-runtime-hook
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-08 12:15:56 +02:00
Evan Lezar
f50aecb84e Rename -toolkit executable to -runtime-hook
This change renames the nvidia-container-toolkit executable
to nvidia-container-runtime-hook. The nvidia-container-toolkit name
is kept as a symlink to nvidia-container-runtime-hook.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-08 12:09:11 +02:00
Evan Lezar
a4258277e1 Merge branch 'update-release-script' into 'main'
Update release tooling to allow for rc releases that don't update all packages.

See merge request nvidia/container-toolkit/container-toolkit!188
2022-07-07 14:33:27 +00:00
Evan Lezar
18eb3c7c38 Skip packages that already exist
For rc releases we allow nvidia-container-toolkit versions
to not match libnvidia-container versions. This change ensures
that only changed packages are released.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-07 15:41:20 +02:00
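The filtering step amounts to a set difference between built and already-published packages. This is a minimal Go sketch. The release tooling itself consists of scripts, and the package file names here are hypothetical.

```go
package main

import "fmt"

// packagesToRelease returns the built packages that are not already in the
// target repository, preserving the input order.
func packagesToRelease(built []string, existing map[string]bool) []string {
	var out []string
	for _, p := range built {
		if existing[p] {
			continue // already published, e.g. an unchanged libnvidia-container package
		}
		out = append(out, p)
	}
	return out
}

func main() {
	built := []string{
		"libnvidia-container1_1.10.0-1_amd64.deb",
		"nvidia-container-toolkit_1.11.0~rc.1-1_amd64.deb",
	}
	existing := map[string]bool{"libnvidia-container1_1.10.0-1_amd64.deb": true}
	fmt.Println(packagesToRelease(built, existing))
}
```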
Evan Lezar
a0e728b5c8 Use centos:stream8 image for signing
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-07 15:40:53 +02:00
Evan Lezar
df0176cca4 Merge branch 'support-host-device-paths' into 'main'
Support device nodes with a different root

See merge request nvidia/container-toolkit/container-toolkit!187
2022-07-07 11:35:10 +00:00
Evan Lezar
b68b3c543b Use device host path to determine properties
This mirrors what is done in cri-o and allows device nodes
from, for example, the driver container to be injected into a
container at /dev instead of <ROOT>/dev.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-07 12:03:23 +02:00
Evan Lezar
aea1a85bb4 Update vendored runc version
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-07 11:30:01 +02:00
Evan Lezar
98e874e750 Merge branch 'add-cdi-mode' into 'main'
Add CDI mode to NVIDIA Container Runtime

See merge request nvidia/container-toolkit/container-toolkit!172
2022-07-07 08:09:38 +00:00
Evan Lezar
eef016c27d Merge branch 'refactor-csv-discovery' into 'main'
Refactor device discovery

See merge request nvidia/container-toolkit/container-toolkit!185
2022-07-07 08:07:43 +00:00
Evan Lezar
19f89ecafd Update cdi package and run go mod vendor
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-06 16:53:38 +02:00
Evan Lezar
8817dee66c Add support for specifying devices in annotations
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-06 16:53:36 +02:00
Evan Lezar
404e266222 Add cdi mode to NVIDIA Container Runtime
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-06 16:53:05 +02:00
Evan Lezar
9b898c65fa Merge branch 'move-license-make-target' into 'main'
The licenses make target should not be a check target

See merge request nvidia/container-toolkit/container-toolkit!186
2022-07-06 13:14:43 +00:00
Evan Lezar
5c39cf4deb The licenses make target should not be a check target
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-06 14:24:11 +02:00
Evan Lezar
beff276a52 Add charDevices discoverer for devices
This change adds a charDevices discoverer and uses it
for CSV, GDS, and MOFED discovery. Internally the discoverer
is a "mounts" discoverer with a charDevice locator.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-06 13:43:23 +02:00
Evan Lezar
55cb82c6c8 Create single discoverer per mount type for CSV
Instead of creating a set of discoverers per file, this change creates
a discoverer per type by first concatenating the mount specifications
from all files. This will allow all device nodes, for example, to
be treated as a single device.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-06 10:57:35 +02:00
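This sketch assumes the CSV mount specifications use one `type, path` entry per line, with types such as `dev`, `lib`, and `sym`. Under that assumption, the concatenation step amounts to grouping the entries of all files by type, as the Go sketch below does. The parser is illustrative, ignores blank lines and `#` comments, and the real file format may carry more detail.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseMountSpecs reads "type, path" entries from the contents of several
// CSV files and groups the paths by mount type, so that one discoverer can
// be created per type instead of per file.
func parseMountSpecs(contents ...string) map[string][]string {
	byType := make(map[string][]string)
	for _, c := range contents {
		scanner := bufio.NewScanner(strings.NewReader(c))
		for scanner.Scan() {
			line := strings.TrimSpace(scanner.Text())
			if line == "" || strings.HasPrefix(line, "#") {
				continue
			}
			parts := strings.SplitN(line, ",", 2)
			if len(parts) != 2 {
				continue // skip malformed entries
			}
			t := strings.TrimSpace(parts[0])
			byType[t] = append(byType[t], strings.TrimSpace(parts[1]))
		}
	}
	return byType
}

func main() {
	specs := parseMountSpecs(
		"dev, /dev/nvhost-ctrl\nlib, /usr/lib/aarch64-linux-gnu/tegra/libcuda.so\n",
		"dev, /dev/nvmap\nsym, /usr/lib/aarch64-linux-gnu/libcuda.so\n",
	)
	for _, t := range []string{"dev", "lib", "sym"} {
		fmt.Println(t, specs[t])
	}
}
```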
Evan Lezar
88d1143827 Merge branch 'add-go-license' into 'main'
Add tooling to check go licenses

See merge request nvidia/container-toolkit/container-toolkit!183
2022-07-06 05:08:59 +00:00
Evan Lezar
d5162b1917 Add tooling to check go licenses
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-05 20:19:23 +02:00
Evan Lezar
ec078543a1 Merge branch 'rename-discover-merge' into 'main'
Rename discover.NewList to discover.Merge

See merge request nvidia/container-toolkit/container-toolkit!182
2022-07-05 09:37:03 +00:00
Evan Lezar
9191074666 Rename discover.NewList to discover.Merge
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-05 10:28:40 +02:00
Evan Lezar
89824849d3 Merge branch 'refactor-envvar-devices' into 'main'
Add DevicesFromEnvvars function to CUDA image abstraction

See merge request nvidia/container-toolkit/container-toolkit!178
2022-07-04 08:47:28 +00:00
Evan Lezar
877083f091 Merge branch 'CNT-3242/strip-root-from-container-mount' into 'main'
Strip root (e.g. driver root) from located mount paths in the container

See merge request nvidia/container-toolkit/container-toolkit!177
2022-07-04 08:45:38 +00:00
Evan Lezar
6467fcd0f5 Merge branch 'ensure-test-output-path-exists' into 'main'
Ensure test/output path exists

See merge request nvidia/container-toolkit/container-toolkit!180
2022-07-04 08:44:11 +00:00
Evan Lezar
fd135f1a8b Add Relative function to Locator interface
This adds a Relative function to the Locator interface and uses
this to determine the host and container paths for located files
(and devices). This ensures that the root (e.g. the nvidia driver
root) is stripped from the container path.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-01 16:23:50 +02:00
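Stripping the root can be sketched with `filepath.Rel`. The path relative to the root becomes the absolute path inside the container, and paths outside the root are rejected. The function name is hypothetical, and `/run/nvidia/driver` is used only as an example driver root.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// containerPath strips root from a located host path, returning the path at
// which the file should appear inside the container.
func containerPath(root, hostPath string) (string, error) {
	rel, err := filepath.Rel(root, hostPath)
	if err != nil {
		return "", err
	}
	// filepath.Rel yields a path starting with ".." if hostPath is outside root.
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("%q is not under root %q", hostPath, root)
	}
	return filepath.Join("/", rel), nil
}

func main() {
	p, err := containerPath("/run/nvidia/driver", "/run/nvidia/driver/usr/bin/nvidia-smi")
	if err != nil {
		panic(err)
	}
	fmt.Println(p) // /usr/bin/nvidia-smi
}
```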
Evan Lezar
4e08ec2405 Use CUDA.DevicesFromEnvvar to check if modifications are required
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-01 16:14:36 +02:00
Evan Lezar
925c348565 Add DevicesFromEnvvars function to CUDA image
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-01 16:12:13 +02:00
Kevin Klues
25fd1aaf7e Merge branch 'CNT-3084/include-cufile.json' into 'main'
Include cufile.json in GDS discovery

See merge request nvidia/container-toolkit/container-toolkit!175
2022-07-01 13:49:02 +00:00
Kevin Klues
91e645b91b Merge branch 'gds-poc' into 'main'
Add initial GDS and MOFED discovery

See merge request nvidia/container-toolkit/container-toolkit!163
2022-07-01 13:43:20 +00:00
Evan Lezar
a1c2f07b6e Add /etc/cufile.json to list of required mounts
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-01 14:54:58 +02:00
Evan Lezar
7f7bec0668 Create GDS and MOFED modifiers
This change creates GDS and MOFED modifiers and adds them to the
modifier created for the selected runtime mode if the NVIDIA_GDS
and NVIDIA_MOFED envvars are set to "enabled", respectively.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-01 14:54:05 +02:00
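This is a minimal Go sketch of the enable check. It assumes the container environment is available as `KEY=VALUE` strings, as in an OCI spec's `process.env`. The function name is hypothetical. Only the exact value `enabled` turns a feature on, and in this sketch a later duplicate entry overrides an earlier one.

```go
package main

import (
	"fmt"
	"strings"
)

// requestedFeatures reports whether the GDS and MOFED modifiers should be
// added, based on a container environment in KEY=VALUE form.
func requestedFeatures(env []string) (gds bool, mofed bool) {
	values := make(map[string]string)
	for _, kv := range env {
		if key, value, ok := strings.Cut(kv, "="); ok {
			values[key] = value
		}
	}
	return values["NVIDIA_GDS"] == "enabled", values["NVIDIA_MOFED"] == "enabled"
}

func main() {
	gds, mofed := requestedFeatures([]string{"NVIDIA_GDS=enabled", "NVIDIA_MOFED=disabled"})
	fmt.Println("gds:", gds, "mofed:", mofed)
}
```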
Evan Lezar
cb34f7c6d1 Add discovery of GDS and MOFED devices
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-01 14:40:55 +02:00
Evan Lezar
7f47a61986 Allow globs in filenames for locators
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-01 14:30:33 +02:00
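With glob support, a locator pattern such as `dev/nvidia-fs*` can expand to several files. Below is a minimal Go sketch based on `filepath.Glob`. `filepath.Glob` returns matches in lexical order. For a pattern without metacharacters, it returns the path only if that path exists. The function name is hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// locate expands each pattern relative to root and returns all matches.
func locate(root string, patterns ...string) ([]string, error) {
	var found []string
	for _, pattern := range patterns {
		matches, err := filepath.Glob(filepath.Join(root, pattern))
		if err != nil {
			return nil, err // only returned for malformed patterns
		}
		found = append(found, matches...)
	}
	return found, nil
}

func main() {
	devices, err := locate("/", "dev/nvidia-fs*")
	if err != nil {
		panic(err)
	}
	fmt.Println(devices)
}
```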
Evan Lezar
e8843c38f2 Move cmd/nvidia-container-runtime/modifier package to internal/modifier
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-01 14:28:40 +02:00
Evan Lezar
d66c00dd1d Use modifier list and discoverModifier
This change adds a discoverModifier to the internal/modifier package and
uses it, together with modifier composition, to refactor the existing
CSV modifier.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-01 14:25:19 +02:00
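Modifier composition can be sketched as a list type that itself satisfies the modifier interface and applies its elements in order. The Go types below (`spec`, `Modifier`, `List`, `discoverModifier`) are simplified, hypothetical stand-ins, not the actual definitions in `internal/modifier`.

```go
package main

import "fmt"

// spec is a simplified, hypothetical stand-in for an OCI runtime spec.
type spec struct {
	Mounts []string
	Hooks  []string
}

// Modifier edits a spec in place.
type Modifier interface {
	Modify(*spec) error
}

// List composes modifiers. It applies them in order, skips nil entries,
// and stops at the first error. A List is itself a Modifier, so lists nest.
type List []Modifier

func (l List) Modify(s *spec) error {
	for _, m := range l {
		if m == nil {
			continue
		}
		if err := m.Modify(s); err != nil {
			return err
		}
	}
	return nil
}

// discoverModifier appends the mounts and hooks found by a discovery step.
type discoverModifier struct {
	mounts []string
	hooks  []string
}

func (d discoverModifier) Modify(s *spec) error {
	s.Mounts = append(s.Mounts, d.mounts...)
	s.Hooks = append(s.Hooks, d.hooks...)
	return nil
}

func main() {
	s := &spec{}
	mod := List{
		discoverModifier{mounts: []string{"/usr/lib/libcuda.so"}},
		discoverModifier{hooks: []string{"update-ldcache"}},
	}
	if err := mod.Modify(s); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", *s)
}
```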
Evan Lezar
55ac8628c8 Add lists of modifiers to allow for modifier composition
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-01 14:25:18 +02:00
Evan Lezar
175f75b43f Ensure test/output path exists
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-07-01 10:07:37 +02:00
Evan Lezar
da3226745c Update vendoring
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-22 10:58:09 +02:00
Evan Lezar
b23e3ea13a Merge branch 'bump-1.11.0-rc.1' into 'main'
Bump version to 1.11.0-rc.1

See merge request nvidia/container-toolkit/container-toolkit!170
2022-06-22 07:52:19 +00:00
Evan Lezar
02f0ee08fc Update nvidia-docker and nvidia-container-runtime
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-17 11:58:06 +02:00
Evan Lezar
4b0e79be50 Update nvidia-docker and nvidia-container-runtime branches to main
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-17 11:37:53 +02:00
Evan Lezar
8b729475e2 Allow any 1.* version of libnvidia-container package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-16 14:57:30 +02:00
Evan Lezar
a1319b1786 Switch to latest docker and docker dind in CI
This change prevents errors when downloading ubuntu repos on
amd64 architectures. The `stable` images were last pushed
2 years ago.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-16 13:44:14 +02:00
Evan Lezar
278fa43303 Allow libnvidia-container1 version to be specified directly
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-15 13:37:42 +02:00
Evan Lezar
d75f364b27 Update build scripts to set libnvidia-container version
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-15 13:37:42 +02:00
Evan Lezar
52d5021b76 Bump version to 1.11.0-rc.1
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-15 13:37:40 +02:00
Kevin Klues
7cfd3bd510 Merge branch 'bump-v1.10.0' into 'main'
Bump version to v1.10.0

See merge request nvidia/container-toolkit/container-toolkit!169
2022-06-13 10:32:37 +00:00
Evan Lezar
05ca131858 Update libnvidia-container submodule to v1.10.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-13 11:40:18 +02:00
Evan Lezar
181ce8571d Bump version to v1.10.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-13 11:40:18 +02:00
Shiva Krishna Merla
2ab0c6abce Merge branch 'update_container_licenses' into 'main'
Update toolkit images to use NGC DL license

See merge request nvidia/container-toolkit/container-toolkit!164
2022-06-08 19:04:22 +00:00
Shiva Krishna Merla
50caf29b4e Update toolkit images to use NGC DL license
2022-06-08 19:04:21 +00:00
Evan Lezar
067f7af142 Merge branch 'update-nvidia-docker' into 'main'
Bump nvidia-docker version to 2.11.0

See merge request nvidia/container-toolkit/container-toolkit!167
2022-06-08 12:15:17 +00:00
Evan Lezar
d1449951bc Bump nvidia-docker version
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-08 13:25:27 +02:00
Evan Lezar
a05af50b0f Merge branch 'bump-cuda-version' into 'main'
Bump CUDA base image version to 11.7.0

See merge request nvidia/container-toolkit/container-toolkit!162
2022-06-07 15:22:05 +00:00
Evan Lezar
950aff269b Merge branch 'bump-version-1.10.0-rc.4' into 'main'
Update NVIDIA Container Runtime readme and installed configs

See merge request nvidia/container-toolkit/container-toolkit!160
2022-06-07 15:05:48 +00:00
Evan Lezar
e033db559f Switch default container-toolkit image target to ubuntu20.04
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-07 11:32:20 +02:00
Evan Lezar
9a24a40fd2 Merge branch 'only-bump-version' into 'main'
Bump version to 1.10.0-rc.4

See merge request nvidia/container-toolkit/container-toolkit!165
2022-06-07 09:00:38 +00:00
Evan Lezar
df391e2144 Only generate amd64 images for ubuntu18.04
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-07 10:58:15 +02:00
Evan Lezar
9146b4d4b6 Remove build and release of centos8 container-toolkit images
Note that the centos8 packages are still built.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-07 10:58:15 +02:00
Evan Lezar
068d7e085b Use ubi8 base image for centos8
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-07 10:58:15 +02:00
Evan Lezar
79510a8290 Bump CUDA base image version to 11.7.0
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-07 10:58:15 +02:00
Evan Lezar
50240c93bd Update config files with options and defaults
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-03 13:10:24 +02:00
Evan Lezar
7ca0e5db60 Update NVIDIA Container Runtime readme
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-03 13:10:21 +02:00
Evan Lezar
c0e6765d46 Update libnvidia-container
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-01 15:29:25 +02:00
Evan Lezar
7739b0e8ea Bump version to 1.10.0-rc.4
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-01 14:46:12 +02:00
166 changed files with 5630 additions and 3781 deletions


@@ -12,9 +12,9 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 default:
-  image: docker:stable
+  image: docker
   services:
-    - name: docker:stable-dind
+    - name: docker:dind
       command: ["--experimental"]
 
 variables:
@@ -57,6 +57,10 @@ stages:
   variables:
     DIST: debian9
 
+.dist-fedora35:
+  variables:
+    DIST: fedora35
+
 .dist-opensuse-leap15.1:
   variables:
     DIST: opensuse-leap15.1
@@ -210,13 +214,6 @@ release:staging-centos7:
   needs:
     - image-centos7
 
-release:staging-centos8:
-  extends:
-    - .release:staging
-    - .dist-centos8
-  needs:
-    - image-centos8
-
 release:staging-ubi8:
   extends:
     - .release:staging


@@ -94,7 +94,7 @@ unit-tests:
     - .multi-arch-build
     - .package-artifacts
   stage: package-build
-  timeout: 2h 30m
+  timeout: 3h
   script:
     - ./scripts/build-packages.sh ${DIST}-${ARCH}
 
@@ -158,6 +158,18 @@ package-debian9-amd64:
     - .dist-debian9
     - .arch-amd64
 
+package-fedora35-aarch64:
+  extends:
+    - .package-build
+    - .dist-fedora35
+    - .arch-aarch64
+
+package-fedora35-x86_64:
+  extends:
+    - .package-build
+    - .dist-fedora35
+    - .arch-x86_64
+
 package-opensuse-leap15.1-x86_64:
   extends:
     - .package-build
@@ -231,16 +243,6 @@ image-centos7:
     - package-centos7-ppc64le
     - package-centos7-x86_64
 
-image-centos8:
-  extends:
-    - .image-build
-    - .package-artifacts
-    - .dist-centos8
-  needs:
-    - package-centos8-aarch64
-    - package-centos8-x86_64
-    - package-centos8-ppc64le
-
 image-ubi8:
   extends:
     - .image-build
@@ -288,6 +290,8 @@ image-packaging:
     - package-centos8-x86_64
     - package-debian10-amd64
     - package-debian9-amd64
+    - package-fedora35-aarch64
+    - package-fedora35-x86_64
     - package-opensuse-leap15.1-x86_64
     - package-ubuntu16.04-amd64
     - package-ubuntu16.04-ppc64le

.gitmodules

@@ -5,6 +5,8 @@
 [submodule "third_party/nvidia-container-runtime"]
 	path = third_party/nvidia-container-runtime
 	url = https://gitlab.com/nvidia/container-toolkit/container-runtime.git
+	branch = main
 [submodule "third_party/nvidia-docker"]
 	path = third_party/nvidia-docker
 	url = https://gitlab.com/nvidia/container-toolkit/nvidia-docker.git
+	branch = main


@@ -70,11 +70,6 @@ image-centos7:
     - .image-pull
     - .dist-centos7
 
-image-centos8:
-  extends:
-    - .image-pull
-    - .dist-centos8
-
 image-ubi8:
   extends:
     - .image-pull
@@ -154,23 +149,6 @@ scan-centos7-arm64:
     - image-centos7
     - scan-centos7-amd64
 
-scan-centos8-amd64:
-  extends:
-    - .scan
-    - .dist-centos8
-    - .platform-amd64
-  needs:
-    - image-centos8
-
-scan-centos8-arm64:
-  extends:
-    - .scan
-    - .dist-centos8
-    - .platform-arm64
-  needs:
-    - image-centos8
-    - scan-centos8-amd64
-
 scan-ubuntu18.04-amd64:
   extends:
     - .scan
@@ -179,15 +157,6 @@ scan-ubuntu18.04-amd64:
   needs:
     - image-ubuntu18.04
 
-scan-ubuntu18.04-arm64:
-  extends:
-    - .scan
-    - .dist-ubuntu18.04
-    - .platform-arm64
-  needs:
-    - image-ubuntu18.04
-    - scan-ubuntu18.04-amd64
-
 scan-ubuntu20.04-amd64:
   extends:
     - .scan
@@ -253,11 +222,6 @@ release:ngc-centos7:
     - .release:ngc
     - .dist-centos7
 
-release:ngc-centos8:
-  extends:
-    - .release:ngc
-    - .dist-centos8
-
 release:ngc-ubuntu18.04:
   extends:
     - .release:ngc


@@ -1,5 +1,31 @@
 # NVIDIA Container Toolkit Changelog
 
+## v1.11.0-rc.3
+
+* Build fedora35 packages
+* Introduce an `nvidia-container-toolkit-base` package for better dependency management
+* Fix removal of `nvidia-container-runtime-hook` on RPM-based systems
+* Inject platform files into container on Tegra-based systems
+* [toolkit container] Update CUDA base images to 11.7.1
+* [libnvidia-container] Preload libgcc_s.so.1 on arm64 systems
+
+## v1.11.0-rc.2
+
+* Allow `accept-nvidia-visible-devices-*` config options to be set by toolkit container
+* [libnvidia-container] Fix bug where LDCache was not updated when the `--no-pivot-root` option was specified
+
+## v1.11.0-rc.1
+
+* Add discovery of GPUDirect Storage (`nvidia-fs*`) devices if the `NVIDIA_GDS` environment variable of the container is set to `enabled`
+* Add discovery of MOFED Infiniband devices if the `NVIDIA_MOFED` environment variable of the container is set to `enabled`
+* Fix bug in CSV mode where libraries listed as `sym` entries in mount specification are not added to the LDCache.
+* Rename `nvidia-container-toolkit` executable to `nvidia-container-runtime-hook` and create `nvidia-container-toolkit` as a symlink to `nvidia-container-runtime-hook` instead.
+* Add `nvidia-ctk runtime configure` command to configure the Docker config file (e.g. `/etc/docker/daemon.json`) for use with the NVIDIA Container Runtime.
+
+## v1.10.0
+
+* Promote v1.10.0-rc.3 to v1.10.0
+
 ## v1.10.0-rc.3
 
 * Use default config instead of raising an error if config file cannot be found


@@ -39,7 +39,7 @@ CMDS := $(patsubst ./cmd/%/,%,$(sort $(dir $(wildcard ./cmd/*/))))
 CMD_TARGETS := $(patsubst %,cmd-%, $(CMDS))
 
 CHECK_TARGETS := assert-fmt vet lint ineffassign misspell
-MAKE_TARGETS := binaries build check fmt lint-internal test examples cmds coverage generate $(CHECK_TARGETS)
+MAKE_TARGETS := binaries build check fmt lint-internal test examples cmds coverage generate licenses $(CHECK_TARGETS)
 
 TARGETS := $(MAKE_TARGETS) $(EXAMPLE_TARGETS) $(CMD_TARGETS)
@@ -102,6 +102,9 @@ misspell:
 vet:
 	go vet $(MODULE)/...
 
+licenses:
+	go-licenses csv $(MODULE)/...
+
 COVERAGE_FILE := coverage.out
 test: build cmds
 	go test -v -coverprofile=$(COVERAGE_FILE) $(MODULE)/...

@@ -67,9 +67,9 @@ ARG TARGETARCH
ENV PACKAGE_ARCH ${TARGETARCH}
RUN PACKAGE_ARCH=${PACKAGE_ARCH/amd64/x86_64} && PACKAGE_ARCH=${PACKAGE_ARCH/arm64/aarch64} && \
yum localinstall -y \
${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container1-${PACKAGE_VERSION}*.rpm \
${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container-tools-${PACKAGE_VERSION}*.rpm \
${PACKAGE_DIST}/${PACKAGE_ARCH}/nvidia-container-toolkit-${PACKAGE_VERSION}*.rpm
${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container1-1.*.rpm \
${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container-tools-1.*.rpm \
${PACKAGE_DIST}/${PACKAGE_ARCH}/nvidia-container-toolkit*-${PACKAGE_VERSION}*.rpm
WORKDIR /work
@@ -85,7 +85,7 @@ LABEL release="N/A"
LABEL summary="Automatically Configure your Container Runtime for GPU support."
LABEL description="See summary"
COPY ./LICENSE /licenses/LICENSE
RUN mkdir /licenses && mv /NGC-DL-CONTAINER-LICENSE /licenses/NGC-DL-CONTAINER-LICENSE
# Install / upgrade packages here that are required to resolve CVEs
ARG CVE_UPDATES

@@ -26,4 +26,4 @@ COPY ${ARTIFACTS_ROOT} /artifacts/packages/
WORKDIR /artifacts/packages
COPY ./LICENSE /licenses/LICENSE
RUN mkdir /licenses && mv /NGC-DL-CONTAINER-LICENSE /licenses/NGC-DL-CONTAINER-LICENSE

@@ -75,9 +75,9 @@ RUN if [ "${PACKAGE_ARCH}" = "arm64" ]; then \
fi
RUN dpkg -i \
${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container1_${PACKAGE_VERSION}*.deb \
${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container-tools_${PACKAGE_VERSION}*.deb \
${PACKAGE_DIST}/${PACKAGE_ARCH}/nvidia-container-toolkit_${PACKAGE_VERSION}*.deb
${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container1_1.*.deb \
${PACKAGE_DIST}/${PACKAGE_ARCH}/libnvidia-container-tools_1.*.deb \
${PACKAGE_DIST}/${PACKAGE_ARCH}/nvidia-container-toolkit*_${PACKAGE_VERSION}*.deb
WORKDIR /work
@@ -93,7 +93,7 @@ LABEL release="N/A"
LABEL summary="Automatically Configure your Container Runtime for GPU support."
LABEL description="See summary"
COPY ./LICENSE /licenses/LICENSE
RUN mkdir /licenses && mv /NGC-DL-CONTAINER-LICENSE /licenses/NGC-DL-CONTAINER-LICENSE
# Install / upgrade packages here that are required to resolve CVEs
ARG CVE_UPDATES

@@ -43,8 +43,8 @@ OUT_IMAGE_TAG = $(OUT_IMAGE_VERSION)-$(DIST)
OUT_IMAGE = $(OUT_IMAGE_NAME):$(OUT_IMAGE_TAG)
##### Public rules #####
DEFAULT_PUSH_TARGET := ubuntu18.04
DISTRIBUTIONS := ubuntu20.04 ubuntu18.04 ubi8 centos7 centos8
DEFAULT_PUSH_TARGET := ubuntu20.04
DISTRIBUTIONS := ubuntu20.04 ubuntu18.04 ubi8 centos7
META_TARGETS := packaging
@@ -110,10 +110,10 @@ build-ubi8: DOCKERFILE_SUFFIX := centos
build-ubi8: PACKAGE_DIST = centos8
build-ubi8: PACKAGE_VERSION := $(LIB_VERSION)-$(if $(LIB_TAG),0.1.$(LIB_TAG),1)
build-centos%: BASE_DIST = $(*)
build-centos%: DOCKERFILE_SUFFIX := centos
build-centos%: PACKAGE_DIST = $(BASE_DIST)
build-centos%: PACKAGE_VERSION := $(LIB_VERSION)-$(if $(LIB_TAG),0.1.$(LIB_TAG),1)
build-centos7: BASE_DIST = $(*)
build-centos7: DOCKERFILE_SUFFIX := centos
build-centos7: PACKAGE_DIST = $(BASE_DIST)
build-centos7: PACKAGE_VERSION := $(LIB_VERSION)-$(if $(LIB_TAG),0.1.$(LIB_TAG),1)
build-packaging: BASE_DIST := ubuntu20.04
build-packaging: DOCKERFILE_SUFFIX := packaging

@@ -30,5 +30,8 @@ push-short:
# We only have x86_64 packages for centos7
build-centos7: DOCKER_BUILD_PLATFORM_OPTIONS = --platform=linux/amd64
# We only generate amd64 image for ubuntu18.04
build-ubuntu18.04: DOCKER_BUILD_PLATFORM_OPTIONS = --platform=linux/amd64
# We only generate a single image for packaging targets
build-packaging: DOCKER_BUILD_PLATFORM_OPTIONS = --platform=linux/amd64

@@ -165,7 +165,7 @@ func isPrivileged(s *Spec) bool {
return false
}
func getDevicesFromEnvvar(env map[string]string, legacyImage bool) *string {
func getDevicesFromEnvvar(image image.CUDA) *string {
// Build a list of envvars to consider.
envVars := []string{envNVVisibleDevices}
if envSwarmGPU != nil {
@@ -173,35 +173,14 @@ func getDevicesFromEnvvar(env map[string]string, legacyImage bool) *string {
envVars = append([]string{*envSwarmGPU}, envVars...)
}
// Grab a reference to devices from the first envvar
// in the list that actually exists in the environment.
var devices *string
for _, envVar := range envVars {
if devs, ok := env[envVar]; ok {
devices = &devs
break
}
}
// Environment variable unset with legacy image: default to "all".
if devices == nil && legacyImage {
all := "all"
return &all
}
// Environment variable unset or empty or "void": return nil
if devices == nil || len(*devices) == 0 || *devices == "void" {
devices := image.DevicesFromEnvvars(envVars...)
if len(devices) == 0 {
return nil
}
// Environment variable set to "none": reset to "".
if *devices == "none" {
empty := ""
return &empty
}
devicesString := strings.Join(devices, ",")
// Any other value.
return devices
return &devicesString
}
func getDevicesFromMounts(mounts []Mount) *string {
@@ -241,7 +220,7 @@ func getDevicesFromMounts(mounts []Mount) *string {
return &ret
}
func getDevices(hookConfig *HookConfig, env map[string]string, mounts []Mount, privileged bool, legacyImage bool) *string {
func getDevices(hookConfig *HookConfig, image image.CUDA, mounts []Mount, privileged bool) *string {
// If enabled, try and get the device list from volume mounts first
if hookConfig.AcceptDeviceListAsVolumeMounts {
devices := getDevicesFromMounts(mounts)
@@ -251,7 +230,7 @@ func getDevices(hookConfig *HookConfig, env map[string]string, mounts []Mount, p
}
// Fallback to reading from the environment variable if privileges are correct
devices := getDevicesFromEnvvar(env, legacyImage)
devices := getDevicesFromEnvvar(image)
if devices == nil {
return nil
}
@@ -307,7 +286,7 @@ func getNvidiaConfig(hookConfig *HookConfig, image image.CUDA, mounts []Mount, p
legacyImage := image.IsLegacy()
var devices string
if d := getDevices(hookConfig, image, mounts, privileged, legacyImage); d != nil {
if d := getDevices(hookConfig, image, mounts, privileged); d != nil {
devices = *d
} else {
// 'nil' devices means this is not a GPU container.

@@ -4,6 +4,7 @@ import (
"path/filepath"
"testing"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config/image"
"github.com/stretchr/testify/require"
)
@@ -671,7 +672,7 @@ func TestDeviceListSourcePriority(t *testing.T) {
hookConfig := getDefaultHookConfig()
hookConfig.AcceptEnvvarUnprivileged = tc.acceptUnprivileged
hookConfig.AcceptDeviceListAsVolumeMounts = tc.acceptMounts
devices = getDevices(&hookConfig, env, tc.mountDevices, tc.privileged, false)
devices = getDevices(&hookConfig, env, tc.mountDevices, tc.privileged)
}
// For all other tests, just grab the devices and check the results
@@ -693,7 +694,6 @@ func TestGetDevicesFromEnvvar(t *testing.T) {
description string
envSwarmGPU *string
env map[string]string
legacyImage bool
expectedDevices *string
}{
{
@@ -729,13 +729,15 @@ func TestGetDevicesFromEnvvar(t *testing.T) {
description: "NVIDIA_VISIBLE_DEVICES set returns value for legacy image",
env: map[string]string{
envNVVisibleDevices: gpuID,
envCUDAVersion: "legacy",
},
legacyImage: true,
expectedDevices: &gpuID,
},
{
description: "empty env returns all for legacy image",
legacyImage: true,
description: "empty env returns all for legacy image",
env: map[string]string{
envCUDAVersion: "legacy",
},
expectedDevices: &all,
},
// Add the `DOCKER_RESOURCE_GPUS` envvar and ensure that this is ignored when
@@ -781,16 +783,16 @@ func TestGetDevicesFromEnvvar(t *testing.T) {
env: map[string]string{
envNVVisibleDevices: gpuID,
envDockerResourceGPUs: anotherGPUID,
envCUDAVersion: "legacy",
},
legacyImage: true,
expectedDevices: &gpuID,
},
{
description: "empty env returns all for legacy image",
env: map[string]string{
envDockerResourceGPUs: anotherGPUID,
envCUDAVersion: "legacy",
},
legacyImage: true,
expectedDevices: &all,
},
// Add the `DOCKER_RESOURCE_GPUS` envvar and ensure that this is selected when
@@ -834,8 +836,8 @@ func TestGetDevicesFromEnvvar(t *testing.T) {
envSwarmGPU: &envDockerResourceGPUs,
env: map[string]string{
envDockerResourceGPUs: gpuID,
envCUDAVersion: "legacy",
},
legacyImage: true,
expectedDevices: &gpuID,
},
{
@@ -860,7 +862,7 @@ func TestGetDevicesFromEnvvar(t *testing.T) {
for i, tc := range tests {
t.Run(tc.description, func(t *testing.T) {
envSwarmGPU = tc.envSwarmGPU
devices := getDevicesFromEnvvar(tc.env, tc.legacyImage)
devices := getDevicesFromEnvvar(image.CUDA(tc.env))
if tc.expectedDevices == nil {
require.Nil(t, devices, "%d: %v", i, tc)
return

@@ -34,7 +34,7 @@ type CLIConfig struct {
Ldconfig *string `toml:"ldconfig"`
}
// HookConfig : options for the nvidia-container-toolkit.
// HookConfig : options for the nvidia-container-runtime-hook.
type HookConfig struct {
DisableRequire bool `toml:"disable-require"`
SwarmResource *string `toml:"swarm-resource"`

@@ -75,7 +75,7 @@ func doPrestart() {
cli := hook.NvidiaContainerCLI
if info.ResolveAutoMode(&logInterceptor{}, hook.NVIDIAContainerRuntime.Mode) != "legacy" {
log.Panicln("invoking the NVIDIA Container Runtime Hook directly (e.g. specifying the docker --gpus flag) is not supported. Please use the NVIDIA Container Runtime instead.")
log.Panicln("invoking the NVIDIA Container Runtime Hook directly (e.g. specifying the docker --gpus flag) is not supported. Please use the NVIDIA Container Runtime (e.g. specify the --runtime=nvidia flag) instead.")
}
container := getContainerConfig(hook)

@@ -2,24 +2,86 @@
The NVIDIA Container Runtime is a shim for OCI-compliant low-level runtimes such as [runc](https://github.com/opencontainers/runc). When a `create` command is detected, the incoming [OCI runtime specification](https://github.com/opencontainers/runtime-spec) is modified in place and the command is forwarded to the low-level runtime.
## Standard Mode
## Configuration
In the standard mode configuration, the NVIDIA Container Runtime adds a [`prestart` hook](https://github.com/opencontainers/runtime-spec/blob/master/config.md#prestart) to the incoming OCI specification that invokes the NVIDIA Container Runtime Hook for all containers created. This hook checks whether NVIDIA devices are requested and ensures GPU access is configured using the `nvidia-container-cli` from the [libnvidia-container](https://github.com/NVIDIA/libnvidia-container) project.
The NVIDIA Container Runtime uses file-based configuration, with the config stored in `/etc/nvidia-container-runtime/config.toml`. The `/etc` path can be overridden using the `XDG_CONFIG_HOME` environment variable with the `${XDG_CONFIG_HOME}/nvidia-container-runtime/config.toml` file used instead if this environment variable is set.
## Experimental Mode
This config file may contain options for other components of the NVIDIA container stack. For the NVIDIA Container Runtime, the relevant config section is `nvidia-container-runtime`.
The NVIDIA Container Runtime can be configured in an experimental mode by setting the following options in the runtime's `config.toml` file:
### Logging
The `log-level` config option (default: `"info"`) specifies the log level to use, and the `debug` option, if set, specifies a log file to which the NVIDIA Container Runtime writes its logs.
In addition to this, the NVIDIA Container Runtime considers the values of the `--log` and `--log-format` flags that may be passed to it by a container runtime such as docker or containerd. If the `--debug` flag is present, the log level specified in the config file is overridden and set to `"debug"`.
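The precedence between the config file and the `--debug` flag can be sketched as follows. This is a minimal illustration assuming the flag appears verbatim as `--debug` in the runtime's arguments; `resolveLogLevel` is a hypothetical name, and the runtime's actual flag parsing may differ.

```go
package main

import "fmt"

// resolveLogLevel is a hypothetical helper returning the effective log level.
// It starts from the `log-level` value in config.toml (falling back to the
// documented default "info") and switches to "debug" if --debug is passed.
func resolveLogLevel(configLevel string, args []string) string {
	level := configLevel
	if level == "" {
		level = "info"
	}
	for _, arg := range args {
		if arg == "--debug" {
			return "debug"
		}
	}
	return level
}

func main() {
	args := []string{"--debug", "create", "--bundle", "/path/to/bundle", "container-id"}
	fmt.Println(resolveLogLevel("warn", args))
}
```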
### Low-level Runtime Path
The `runtimes` config option allows for the low-level runtime to be specified. The first entry in this list that is an existing executable file is used as the low-level runtime. If an entry is not a path, the `PATH` is searched for a matching executable; if an entry is a path, that path is checked directly.
The default value for this setting is:
```toml
runtimes = [
"docker-runc",
"runc",
]
```
If, for example, `crun` is to be used instead, this can be changed to:
```toml
runtimes = [
"crun",
]
```
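The selection rule for the `runtimes` list can be approximated with Go's standard `os/exec.LookPath`, which searches `PATH` for bare names and checks names containing a slash directly. This is a sketch under that assumption; `findLowLevelRuntime` is a hypothetical name, not the runtime's own function.

```go
package main

import (
	"fmt"
	"os/exec"
)

// findLowLevelRuntime is a hypothetical helper that returns the first entry in
// candidates resolving to an executable file. exec.LookPath searches PATH for
// bare names and checks entries that contain a slash directly.
func findLowLevelRuntime(candidates []string) (string, error) {
	for _, candidate := range candidates {
		if path, err := exec.LookPath(candidate); err == nil {
			return path, nil
		}
	}
	return "", fmt.Errorf("no low-level runtime found among %v", candidates)
}

func main() {
	path, err := findLowLevelRuntime([]string{"docker-runc", "runc"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("using low-level runtime:", path)
}
```

Because the list is processed in order, `docker-runc` wins over `runc` whenever both are installed.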
### Runtime Mode
The `mode` config option (default `"auto"`) controls the high-level behaviour of the runtime.
#### Auto Mode
When `mode` is set to `"auto"`, the runtime employs heuristics to determine which mode to use based on, for example, the platform where the runtime is being run.
#### Legacy Mode
When `mode` is set to `"legacy"`, the NVIDIA Container Runtime adds a [`prestart` hook](https://github.com/opencontainers/runtime-spec/blob/master/config.md#prestart) to the incoming OCI specification that invokes the NVIDIA Container Runtime Hook for all containers created. This hook checks whether NVIDIA devices are requested and ensures GPU access is configured using the `nvidia-container-cli` from the [libnvidia-container](https://github.com/NVIDIA/libnvidia-container) project.
#### CSV Mode
When `mode` is set to `"csv"`, CSV files at `/etc/nvidia-container-runtime/host-files-for-container.d` define the devices and mounts that are to be injected into a container when it is created. The search path for these files can be overridden by setting the `nvidia-container-runtime.modes.csv.mount-spec-path` option in the config as follows:
```toml
[nvidia-container-runtime]
experimental = true
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"
```
When this setting is enabled, the modifications made to the OCI specification are controlled by the `nvidia-container-runtime.discover-mode` option, with the following modes supported:
* `"legacy"`: This mode mirrors the behaviour of the standard mode, inserting the NVIDIA Container Runtime Hook as a `prestart` hook into the container's OCI specification.
* `"csv"`: This mode uses CSV files at `/etc/nvidia-container-runtime/host-files-for-container.d` to define the devices and mounts that are to be injected into a container when it is created.
This mode is primarily targeted at Tegra-based systems without NVML available.
### Notes on using the docker CLI
The `docker` CLI supports the `--gpus` flag to select GPUs for inclusion in a container. Specifying this flag inserts the same NVIDIA Container Runtime Hook into the OCI runtime specification, so when experimental mode is activated, the NVIDIA Container Runtime detects the presence of the hook and raises an error. This requirement will be relaxed in the near future.
Note that only the `"legacy"` NVIDIA Container Runtime mode is directly compatible with the `--gpus` flag implemented by the `docker` CLI (assuming the NVIDIA Container Runtime is not used). The reason for this is that `docker` inserts the same NVIDIA Container Runtime Hook into the OCI runtime specification.
If a different mode is explicitly set or detected, the NVIDIA Container Runtime Hook will raise the following error when `--gpus` is set:
```
$ docker run --rm --gpus all ubuntu:18.04
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'csv'
invoking the NVIDIA Container Runtime Hook directly (e.g. specifying the docker --gpus flag) is not supported. Please use the NVIDIA Container Runtime instead.: unknown.
```
Here the NVIDIA Container Runtime must be used explicitly. The recommended way to do this is to specify the `--runtime=nvidia` command line argument as part of the `docker run` command as follows:
```
$ docker run --rm --gpus all --runtime=nvidia ubuntu:18.04
```
Alternatively, the NVIDIA Container Runtime can be set as the default runtime for docker. This can be done by modifying the `/etc/docker/daemon.json` file as follows:
```json
{
"default-runtime": "nvidia",
"runtimes": {
"nvidia": {
"path": "nvidia-container-runtime",
"runtimeArgs": []
}
}
}
```

@@ -10,7 +10,7 @@ import (
"strings"
"testing"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-container-runtime/modifier"
"github.com/NVIDIA/nvidia-container-toolkit/internal/modifier"
"github.com/NVIDIA/nvidia-container-toolkit/internal/test"
"github.com/opencontainers/runtime-spec/specs-go"
"github.com/stretchr/testify/require"

@@ -19,9 +19,9 @@ package main
import (
"fmt"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-container-runtime/modifier"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/internal/info"
"github.com/NVIDIA/nvidia-container-toolkit/internal/modifier"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
"github.com/NVIDIA/nvidia-container-toolkit/internal/runtime"
"github.com/sirupsen/logrus"
@@ -62,11 +62,43 @@ func newNVIDIAContainerRuntime(logger *logrus.Logger, cfg *config.Config, argv [
// newSpecModifier is a factory method that constructs an OCI spec modifier based on the provided config.
func newSpecModifier(logger *logrus.Logger, cfg *config.Config, ociSpec oci.Spec, argv []string) (oci.SpecModifier, error) {
modeModifier, err := newModeModifier(logger, cfg, ociSpec, argv)
if err != nil {
return nil, err
}
gdsModifier, err := modifier.NewGDSModifier(logger, cfg, ociSpec)
if err != nil {
return nil, err
}
mofedModifier, err := modifier.NewMOFEDModifier(logger, cfg, ociSpec)
if err != nil {
return nil, err
}
tegraModifier, err := modifier.NewTegraPlatformFiles(logger)
if err != nil {
return nil, err
}
modifiers := modifier.Merge(
modeModifier,
gdsModifier,
mofedModifier,
tegraModifier,
)
return modifiers, nil
}
func newModeModifier(logger *logrus.Logger, cfg *config.Config, ociSpec oci.Spec, argv []string) (oci.SpecModifier, error) {
switch info.ResolveAutoMode(logger, cfg.NVIDIAContainerRuntimeConfig.Mode) {
case "legacy":
return modifier.NewStableRuntimeModifier(logger), nil
case "csv":
return modifier.NewCSVModifier(logger, cfg, ociSpec)
case "cdi":
return modifier.NewCDIModifier(logger, cfg, ociSpec)
}
return nil, fmt.Errorf("invalid runtime mode: %v", cfg.NVIDIAContainerRuntimeConfig.Mode)

@@ -1,3 +1,17 @@
# NVIDIA Container Toolkit CLI
The NVIDIA Container Toolkit CLI `nvidia-ctk` provides a number of utilities that are useful for working with the NVIDIA Container Toolkit.
## Functionality
### Configure runtimes
The `runtime` command of the `nvidia-ctk` CLI provides a set of utilities related to the configuration
and management of supported container engines.
For example, running the following command:
```bash
nvidia-ctk runtime configure --set-as-default
```
will ensure that the NVIDIA Container Runtime is added as the default runtime to the default container
engine.

@@ -20,6 +20,7 @@ import (
"os"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/hook"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/runtime"
"github.com/NVIDIA/nvidia-container-toolkit/internal/info"
log "github.com/sirupsen/logrus"
cli "github.com/urfave/cli/v2"
@@ -70,6 +71,7 @@ func main() {
// Define the subcommands
c.Commands = []*cli.Command{
hook.NewCommand(logger),
runtime.NewCommand(logger),
}
// Run the CLI

@@ -0,0 +1,154 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package configure
import (
"encoding/json"
"fmt"
"os"
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/runtime/nvidia"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config/docker"
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
)
const (
defaultRuntime = "docker"
defaultDockerConfigFilePath = "/etc/docker/daemon.json"
)
type command struct {
logger *logrus.Logger
}
// NewCommand constructs a configure command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
c := command{
logger: logger,
}
return c.build()
}
// config defines the options that can be set for the CLI through config files,
// environment variables, or command line config
type config struct {
dryRun bool
runtime string
configFilePath string
nvidiaOptions nvidia.Options
}
func (m command) build() *cli.Command {
// Create a config struct to hold the parsed environment variables or command line flags
config := config{}
// Create the 'configure' command
configure := cli.Command{
Name: "configure",
Usage: "Add a runtime to the specified container engine",
Action: func(c *cli.Context) error {
return m.configureWrapper(c, &config)
},
}
configure.Flags = []cli.Flag{
&cli.BoolFlag{
Name: "dry-run",
Usage: "update the runtime configuration as required but don't write changes to disk",
Destination: &config.dryRun,
},
&cli.StringFlag{
Name: "runtime",
Usage: "the target runtime engine. One of [docker]",
Value: defaultRuntime,
Destination: &config.runtime,
},
&cli.StringFlag{
Name: "config",
Usage: "path to the config file for the target runtime",
Destination: &config.configFilePath,
},
&cli.StringFlag{
Name: "nvidia-runtime-name",
Usage: "specify the name of the NVIDIA runtime that will be added",
Value: nvidia.RuntimeName,
Destination: &config.nvidiaOptions.RuntimeName,
},
&cli.StringFlag{
Name: "runtime-path",
Usage: "specify the path to the NVIDIA runtime executable",
Value: nvidia.RuntimeExecutable,
Destination: &config.nvidiaOptions.RuntimePath,
},
&cli.BoolFlag{
Name: "set-as-default",
Usage: "set the specified runtime as the default runtime",
Destination: &config.nvidiaOptions.SetAsDefault,
},
}
return &configure
}
func (m command) configureWrapper(c *cli.Context, config *config) error {
switch config.runtime {
case "docker":
return m.configureDocker(c, config)
}
return fmt.Errorf("unrecognized runtime '%v'", config.runtime)
}
// configureDocker updates the docker config to enable the NVIDIA Container Runtime
func (m command) configureDocker(c *cli.Context, config *config) error {
configFilePath := config.configFilePath
if configFilePath == "" {
configFilePath = defaultDockerConfigFilePath
}
cfg, err := docker.LoadConfig(configFilePath)
if err != nil {
return fmt.Errorf("unable to load config: %v", err)
}
defaultRuntime := config.nvidiaOptions.DefaultRuntime()
runtimeConfig := config.nvidiaOptions.Runtime().DockerRuntimesConfig()
err = docker.UpdateConfig(cfg, defaultRuntime, runtimeConfig)
if err != nil {
return fmt.Errorf("unable to update config: %v", err)
}
if config.dryRun {
output, err := json.MarshalIndent(cfg, "", " ")
if err != nil {
return fmt.Errorf("unable to convert to JSON: %v", err)
}
os.Stdout.WriteString(fmt.Sprintf("%s\n", output))
return nil
}
err = docker.FlushConfig(cfg, configFilePath)
if err != nil {
return fmt.Errorf("unable to flush config: %v", err)
}
m.logger.Infof("Wrote updated config to %v", configFilePath)
m.logger.Infof("It is recommended that the docker daemon be restarted.")
return nil
}

@@ -0,0 +1,75 @@
/*
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
*/
package nvidia
const (
// RuntimeName is the default name to use in configs for the NVIDIA Container Runtime
RuntimeName = "nvidia"
// RuntimeExecutable is the default NVIDIA Container Runtime executable file name
RuntimeExecutable = "nvidia-container-runtime"
)
// Options specifies the options for the NVIDIA Container Runtime w.r.t a container engine such as docker.
type Options struct {
SetAsDefault bool
RuntimeName string
RuntimePath string
}
// Runtime defines an NVIDIA runtime with a name and an executable
type Runtime struct {
Name string
Path string
}
// DefaultRuntime returns the default runtime for the configured options.
// If the configuration is invalid or the default runtimes should not be set
// the empty string is returned.
func (o Options) DefaultRuntime() string {
if !o.SetAsDefault {
return ""
}
return o.RuntimeName
}
// Runtime creates a runtime struct based on the options.
func (o Options) Runtime() Runtime {
path := o.RuntimePath
if o.RuntimePath == "" {
path = RuntimeExecutable
}
r := Runtime{
Name: o.RuntimeName,
Path: path,
}
return r
}
// DockerRuntimesConfig generates the expected docker config for the specified runtime
func (r Runtime) DockerRuntimesConfig() map[string]interface{} {
runtimes := make(map[string]interface{})
runtimes[r.Name] = map[string]interface{}{
"path": r.Path,
"args": []string{},
}
return runtimes
}

@@ -0,0 +1,49 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package runtime
import (
"github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/runtime/configure"
"github.com/sirupsen/logrus"
"github.com/urfave/cli/v2"
)
type runtimeCommand struct {
logger *logrus.Logger
}
// NewCommand constructs a runtime command with the specified logger
func NewCommand(logger *logrus.Logger) *cli.Command {
c := runtimeCommand{
logger: logger,
}
return c.build()
}
func (m runtimeCommand) build() *cli.Command {
// Create the 'runtime' command
runtime := cli.Command{
Name: "runtime",
Usage: "A collection of runtime-related utilities for the NVIDIA Container Toolkit",
}
runtime.Subcommands = []*cli.Command{
configure.NewCommand(m.logger),
}
return &runtime
}

@@ -1,19 +0,0 @@
disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
#accept-nvidia-visible-devices-as-volume-mounts = false
[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
#debug = "/var/log/nvidia-container-toolkit.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
#no-cgroups = false
#user = "root:video"
ldconfig = "@/sbin/ldconfig"
[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
#experimental = false

@@ -16,4 +16,17 @@ ldconfig = "@/sbin/ldconfig"
[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
#experimental = false
log-level = "info"
# Specify the runtimes to consider. This list is processed in order and the PATH
# searched for matching executables unless the entry is an absolute path.
runtimes = [
"docker-runc",
"runc",
]
mode = "auto"
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"

@@ -16,4 +16,17 @@ ldconfig = "@/sbin/ldconfig"
[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
#experimental = false
log-level = "info"
# Specify the runtimes to consider. This list is processed in order and the PATH
# searched for matching executables unless the entry is an absolute path.
runtimes = [
"docker-runc",
"runc",
]
mode = "auto"
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"

@@ -16,4 +16,17 @@ ldconfig = "@/sbin/ldconfig"
[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
#experimental = false
log-level = "info"
# Specify the runtimes to consider. This list is processed in order and the PATH
# searched for matching executables unless the entry is an absolute path.
runtimes = [
"docker-runc",
"runc",
]
mode = "auto"
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"

@@ -16,7 +16,7 @@ ldconfig = "@/sbin/ldconfig.real"
[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
#experimental = false
log-level = "info"
# Specify the runtimes to consider. This list is processed in order and the PATH
# searched for matching executables unless the entry is an absolute path.
@@ -24,3 +24,9 @@ runtimes = [
"docker-runc",
"runc",
]
mode = "auto"
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"

@@ -1,73 +0,0 @@
ARG BASEIMAGE
FROM ${BASEIMAGE}
RUN yum install -y \
ca-certificates \
gcc \
wget \
git \
rpm-build \
make && \
rm -rf /var/cache/yum/*
ARG GOLANG_VERSION=0.0.0
RUN set -eux; \
\
arch="$(uname -m)"; \
case "${arch##*-}" in \
x86_64 | amd64) ARCH='amd64' ;; \
ppc64el | ppc64le) ARCH='ppc64le' ;; \
aarch64) ARCH='arm64' ;; \
*) echo "unsupported architecture"; exit 1 ;; \
esac; \
wget -nv -O - https://storage.googleapis.com/golang/go${GOLANG_VERSION}.linux-${ARCH}.tar.gz \
| tar -C /usr/local -xz
ENV GOPATH /go
ENV PATH $GOPATH/bin:/usr/local/go/bin:$PATH
# packaging
ARG PKG_NAME
ARG PKG_VERS
ARG PKG_REV
ENV VERSION $PKG_VERS
ENV RELEASE $PKG_REV
# output directory
ENV DIST_DIR=/tmp/nvidia-container-toolkit-$PKG_VERS/SOURCES
RUN mkdir -p $DIST_DIR /dist
# nvidia-container-toolkit
WORKDIR $GOPATH/src/nvidia-container-toolkit
COPY . .
ARG GIT_COMMIT
ENV GIT_COMMIT ${GIT_COMMIT}
RUN make PREFIX=${DIST_DIR} cmds
ARG CONFIG_TOML_SUFFIX
ENV CONFIG_TOML_SUFFIX ${CONFIG_TOML_SUFFIX}
COPY config/config.toml.${CONFIG_TOML_SUFFIX} $DIST_DIR/config.toml
# Hook for Project Atomic's fork of Docker: https://github.com/projectatomic/docker/tree/docker-1.13.1-rhel#add-dockerhooks-exec-custom-hooks-for-prestartpoststop-containerspatch
# This might not be useful on Amazon Linux, but it's simpler to keep the RHEL
# and Amazon Linux packages identical.
COPY oci-nvidia-hook $DIST_DIR/oci-nvidia-hook
# Hook for libpod/CRI-O: https://github.com/containers/libpod/blob/v0.8.5/pkg/hooks/docs/oci-hooks.5.md
COPY oci-nvidia-hook.json $DIST_DIR/oci-nvidia-hook.json
WORKDIR $DIST_DIR/..
COPY packaging/rpm .
CMD arch=$(uname -m) && \
rpmbuild --clean --target=$arch -bb \
-D "_topdir $PWD" \
-D "release_date $(date +'%a %b %d %Y')" \
-D "git_commit ${GIT_COMMIT}" \
-D "version $VERSION" \
-D "libnvidia_container_version ${VERSION}-${RELEASE}" \
-D "release $RELEASE" \
SPECS/nvidia-container-toolkit.spec && \
mv RPMS/$arch/*.rpm /dist

@@ -65,14 +65,17 @@ RUN if [ "$(lsb_release -cs)" = "jessie" ]; then \
WORKDIR $DIST_DIR
COPY packaging/debian ./debian
ARG LIBNVIDIA_CONTAINER_TOOLS_VERSION
ENV LIBNVIDIA_CONTAINER_TOOLS_VERSION ${LIBNVIDIA_CONTAINER_TOOLS_VERSION}
RUN dch --create --package="${PKG_NAME}" \
--newversion "${REVISION}" \
"See https://gitlab.com/nvidia/container-toolkit/container-toolkit/-/blob/${GIT_COMMIT}/CHANGELOG.md for the changelog" && \
dch --append "Bump libnvidia-container dependency to ${REVISION}" && \
dch --append "Bump libnvidia-container dependency to ${LIBNVIDIA_CONTAINER1_VERSION}" && \
dch -r "" && \
if [ "$REVISION" != "$(dpkg-parsechangelog --show-field=Version)" ]; then exit 1; fi
CMD export DISTRIB="$(lsb_release -cs)" && \
debuild -eDISTRIB -eSECTION -eLIBNVIDIA_CONTAINER_VERSION="${REVISION}" \
debuild -eDISTRIB -eSECTION -eLIBNVIDIA_CONTAINER_TOOLS_VERSION -eVERSION="${REVISION}" \
--dpkg-buildpackage-hook='sh debian/prepare' -i -us -uc -b && \
mv /tmp/nvidia-container-toolkit_*.deb /dist

@@ -18,3 +18,4 @@ RUN go install golang.org/x/lint/golint@latest
RUN go install github.com/matryer/moq@latest
RUN go install github.com/gordonklaus/ineffassign@latest
RUN go install github.com/client9/misspell/cmd/misspell@latest
RUN go install github.com/google/go-licenses@latest

@@ -28,9 +28,9 @@ ENV PATH $GOPATH/bin:/usr/local/go/bin:$PATH
ARG PKG_NAME
ARG PKG_VERS
ARG PKG_REV
ENV VERSION $PKG_VERS
ENV RELEASE $PKG_REV
ENV PKG_NAME ${PKG_NAME}
ENV PKG_VERS ${PKG_VERS}
ENV PKG_REV ${PKG_REV}
# output directory
ENV DIST_DIR=/tmp/nvidia-container-toolkit-$PKG_VERS/SOURCES
@@ -57,13 +57,16 @@ COPY config/config.toml.${CONFIG_TOML_SUFFIX} $DIST_DIR/config.toml
WORKDIR $DIST_DIR/..
COPY packaging/rpm .
ARG LIBNVIDIA_CONTAINER_TOOLS_VERSION
ENV LIBNVIDIA_CONTAINER_TOOLS_VERSION ${LIBNVIDIA_CONTAINER_TOOLS_VERSION}
CMD arch=$(uname -m) && \
rpmbuild --clean --target=$arch -bb \
-D "_topdir $PWD" \
-D "release_date $(date +'%a %b %d %Y')" \
-D "git_commit ${GIT_COMMIT}" \
-D "version $VERSION" \
-D "libnvidia_container_version ${VERSION}-${RELEASE}" \
-D "release $RELEASE" \
-D "version ${PKG_VERS}" \
-D "libnvidia_container_tools_version ${LIBNVIDIA_CONTAINER_TOOLS_VERSION}" \
-D "release ${PKG_REV}" \
SPECS/nvidia-container-toolkit.spec && \
mv RPMS/$arch/*.rpm /dist

@@ -1,3 +1,19 @@
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This is the dockerfile for building packages on yum-based RPM systems.
ARG BASEIMAGE
FROM ${BASEIMAGE}
@@ -30,9 +46,9 @@ ENV PATH $GOPATH/bin:/usr/local/go/bin:$PATH
ARG PKG_NAME
ARG PKG_VERS
ARG PKG_REV
ENV VERSION $PKG_VERS
ENV RELEASE $PKG_REV
ENV PKG_NAME ${PKG_NAME}
ENV PKG_VERS ${PKG_VERS}
ENV PKG_REV ${PKG_REV}
# output directory
ENV DIST_DIR=/tmp/nvidia-container-toolkit-$PKG_VERS/SOURCES
@@ -59,13 +75,16 @@ COPY oci-nvidia-hook.json $DIST_DIR/oci-nvidia-hook.json
WORKDIR $DIST_DIR/..
COPY packaging/rpm .
ARG LIBNVIDIA_CONTAINER_TOOLS_VERSION
ENV LIBNVIDIA_CONTAINER_TOOLS_VERSION ${LIBNVIDIA_CONTAINER_TOOLS_VERSION}
CMD arch=$(uname -m) && \
rpmbuild --clean --target=$arch -bb \
-D "_topdir $PWD" \
-D "release_date $(date +'%a %b %d %Y')" \
-D "git_commit ${GIT_COMMIT}" \
-D "version $VERSION" \
-D "libnvidia_container_version ${VERSION}-${RELEASE}" \
-D "release $RELEASE" \
-D "version ${PKG_VERS}" \
-D "libnvidia_container_tools_version ${LIBNVIDIA_CONTAINER_TOOLS_VERSION}" \
-D "release ${PKG_REV}" \
SPECS/nvidia-container-toolkit.spec && \
mv RPMS/$arch/*.rpm /dist

@@ -58,14 +58,17 @@ COPY config/config.toml.${CONFIG_TOML_SUFFIX} $DIST_DIR/config.toml
WORKDIR $DIST_DIR
COPY packaging/debian ./debian
ARG LIBNVIDIA_CONTAINER_TOOLS_VERSION
ENV LIBNVIDIA_CONTAINER_TOOLS_VERSION ${LIBNVIDIA_CONTAINER_TOOLS_VERSION}
RUN dch --create --package="${PKG_NAME}" \
--newversion "${REVISION}" \
"See https://gitlab.com/nvidia/container-toolkit/container-toolkit/-/blob/${GIT_COMMIT}/CHANGELOG.md for the changelog" && \
dch --append "Bump libnvidia-container dependency to ${REVISION}" && \
dch --append "Bump libnvidia-container dependency to ${LIBNVIDIA_CONTAINER_TOOLS_VERSION}" && \
dch -r "" && \
if [ "$REVISION" != "$(dpkg-parsechangelog --show-field=Version)" ]; then exit 1; fi
CMD export DISTRIB="$(lsb_release -cs)" && \
debuild -eDISTRIB -eSECTION -eLIBNVIDIA_CONTAINER_VERSION="${REVISION}" \
debuild -eDISTRIB -eSECTION -eLIBNVIDIA_CONTAINER_TOOLS_VERSION -eVERSION="${REVISION}" \
--dpkg-buildpackage-hook='sh debian/prepare' -i -us -uc -b && \
mv /tmp/*.deb /dist

@@ -14,10 +14,10 @@
# Supported OSs by architecture
AMD64_TARGETS := ubuntu20.04 ubuntu18.04 ubuntu16.04 debian10 debian9
X86_64_TARGETS := centos7 centos8 rhel7 rhel8 amazonlinux2 opensuse-leap15.1
X86_64_TARGETS := fedora35 centos7 centos8 rhel7 rhel8 amazonlinux2 opensuse-leap15.1
PPC64LE_TARGETS := ubuntu18.04 ubuntu16.04 centos7 centos8 rhel7 rhel8
ARM64_TARGETS := ubuntu20.04 ubuntu18.04
AARCH64_TARGETS := centos8 rhel8 amazonlinux2
AARCH64_TARGETS := fedora35 centos8 rhel8 amazonlinux2
# Define top-level build targets
docker%: SHELL:=/bin/bash
@@ -85,37 +85,63 @@ docker-all: $(AMD64_TARGETS) $(X86_64_TARGETS) \
--%: docker-build-%
@
LIBNVIDIA_CONTAINER_VERSION ?= $(LIB_VERSION)
LIBNVIDIA_CONTAINER_TAG ?= $(LIB_TAG)
# private ubuntu target
--ubuntu%: OS := ubuntu
--ubuntu%: LIB_VERSION := $(LIB_VERSION)$(if $(LIB_TAG),~$(LIB_TAG))
--ubuntu%: LIBNVIDIA_CONTAINER_TOOLS_VERSION := $(LIBNVIDIA_CONTAINER_VERSION)$(if $(LIBNVIDIA_CONTAINER_TAG),~$(LIBNVIDIA_CONTAINER_TAG))-1
--ubuntu%: PKG_REV := 1
# private debian target
--debian%: OS := debian
--debian%: LIB_VERSION := $(LIB_VERSION)$(if $(LIB_TAG),~$(LIB_TAG))
--debian%: LIBNVIDIA_CONTAINER_TOOLS_VERSION := $(LIBNVIDIA_CONTAINER_VERSION)$(if $(LIBNVIDIA_CONTAINER_TAG),~$(LIBNVIDIA_CONTAINER_TAG))-1
--debian%: PKG_REV := 1
# private centos target
--centos%: OS := centos
--centos%: PKG_REV := $(if $(LIB_TAG),0.1.$(LIB_TAG),1)
--centos%: LIBNVIDIA_CONTAINER_TOOLS_VERSION := $(LIBNVIDIA_CONTAINER_VERSION)-$(if $(LIBNVIDIA_CONTAINER_TAG),0.1.$(LIBNVIDIA_CONTAINER_TAG),1)
--centos%: DOCKERFILE = $(CURDIR)/docker/Dockerfile.rpm-yum
--centos%: CONFIG_TOML_SUFFIX := rpm-yum
--centos8%: BASEIMAGE = quay.io/centos/centos:stream8
# private fedora target
--fedora%: OS := fedora
--fedora%: PKG_REV := $(if $(LIB_TAG),0.1.$(LIB_TAG),1)
--fedora%: LIBNVIDIA_CONTAINER_TOOLS_VERSION := $(LIBNVIDIA_CONTAINER_VERSION)-$(if $(LIBNVIDIA_CONTAINER_TAG),0.1.$(LIBNVIDIA_CONTAINER_TAG),1)
--fedora%: DOCKERFILE = $(CURDIR)/docker/Dockerfile.rpm-yum
--fedora%: CONFIG_TOML_SUFFIX := rpm-yum
# The fedora(35) base image has very slow performance when building aarch64 packages.
# Since our primary concern here is glibc versions, we use the older glibc version available in centos8.
--fedora35%: BASEIMAGE = quay.io/centos/centos:stream8
# private amazonlinux target
--amazonlinux%: OS := amazonlinux
--amazonlinux%: LIBNVIDIA_CONTAINER_TOOLS_VERSION := $(LIBNVIDIA_CONTAINER_VERSION)-$(if $(LIBNVIDIA_CONTAINER_TAG),0.1.$(LIBNVIDIA_CONTAINER_TAG),1)
--amazonlinux%: PKG_REV := $(if $(LIB_TAG),0.1.$(LIB_TAG),1)
--amazonlinux%: DOCKERFILE = $(CURDIR)/docker/Dockerfile.rpm-yum
--amazonlinux%: CONFIG_TOML_SUFFIX := rpm-yum
# private opensuse-leap target
--opensuse-leap%: OS = opensuse-leap
--opensuse-leap%: BASEIMAGE = opensuse/leap:$(VERSION)
--opensuse-leap%: LIBNVIDIA_CONTAINER_TOOLS_VERSION := $(LIBNVIDIA_CONTAINER_VERSION)-$(if $(LIBNVIDIA_CONTAINER_TAG),0.1.$(LIBNVIDIA_CONTAINER_TAG),1)
--opensuse-leap%: PKG_REV := $(if $(LIB_TAG),0.1.$(LIB_TAG),1)
# private rhel target (actually built on centos)
--rhel%: OS := centos
--rhel%: LIBNVIDIA_CONTAINER_TOOLS_VERSION := $(LIBNVIDIA_CONTAINER_VERSION)-$(if $(LIBNVIDIA_CONTAINER_TAG),0.1.$(LIBNVIDIA_CONTAINER_TAG),1)
--rhel%: PKG_REV := $(if $(LIB_TAG),0.1.$(LIB_TAG),1)
--rhel%: VERSION = $(patsubst rhel%-$(ARCH),%,$(TARGET_PLATFORM))
--rhel%: ARTIFACTS_DIR = $(DIST_DIR)/rhel$(VERSION)/$(ARCH)
--rhel%: DOCKERFILE = $(CURDIR)/docker/Dockerfile.rpm-yum
--rhel%: CONFIG_TOML_SUFFIX := rpm-yum
--rhel8%: BASEIMAGE = quay.io/centos/centos:stream8
# We allow the CONFIG_TOML_SUFFIX to be overridden.
CONFIG_TOML_SUFFIX ?= $(OS)
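The private target rules above compute LIBNVIDIA_CONTAINER_TOOLS_VERSION in two styles. Debian-based targets (ubuntu, debian) append `~<tag>` and a fixed `-1` revision. RPM-based targets (centos, fedora, amazonlinux, opensuse-leap, rhel) use a release of `0.1.<tag>` for tagged builds and `1` otherwise. The Go sketch below mirrors only those two Make expressions. The function names are hypothetical, and the inputs `1.11.0` and `rc.3` are illustrative. Both schemes presumably exist so that a pre-release sorts before the final release under dpkg and rpm version ordering, respectively.

```go
package main

import "fmt"

// debToolsVersion mirrors the --ubuntu% / --debian% rule:
// $(LIBNVIDIA_CONTAINER_VERSION)$(if $(LIBNVIDIA_CONTAINER_TAG),~$(LIBNVIDIA_CONTAINER_TAG))-1
// (hypothetical helper; the build itself does this in Make).
func debToolsVersion(version, tag string) string {
	if tag != "" {
		// "~" sorts before everything in dpkg, so a tagged build precedes the final release.
		version += "~" + tag
	}
	return version + "-1"
}

// rpmToolsVersion mirrors the RPM-based rules:
// $(LIBNVIDIA_CONTAINER_VERSION)-$(if $(LIBNVIDIA_CONTAINER_TAG),0.1.$(LIBNVIDIA_CONTAINER_TAG),1)
// (hypothetical helper).
func rpmToolsVersion(version, tag string) string {
	release := "1"
	if tag != "" {
		// A release starting with "0" compares lower than the final release "1".
		release = "0.1." + tag
	}
	return version + "-" + release
}

func main() {
	fmt.Println(debToolsVersion("1.11.0", "rc.3"))
	fmt.Println(rpmToolsVersion("1.11.0", "rc.3"))
	fmt.Println(rpmToolsVersion("1.11.0", ""))
}
```

With these inputs the program prints `1.11.0~rc.3-1`, `1.11.0-0.1.rc.3` and `1.11.0-1`.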
@@ -131,8 +157,9 @@ docker-build-%:
--build-arg PKG_NAME="$(LIB_NAME)" \
--build-arg PKG_VERS="$(LIB_VERSION)" \
--build-arg PKG_REV="$(PKG_REV)" \
--build-arg LIBNVIDIA_CONTAINER_TOOLS_VERSION="$(LIBNVIDIA_CONTAINER_TOOLS_VERSION)" \
--build-arg CONFIG_TOML_SUFFIX="$(CONFIG_TOML_SUFFIX)" \
--build-arg GIT_COMMIT="$(GIT_COMMIT)" \
--build-arg GIT_COMMIT="$(GIT_COMMIT)" \
--tag $(BUILDIMAGE) \
--file $(DOCKERFILE) .
$(DOCKER) run \

go.mod

@@ -4,15 +4,21 @@ go 1.14
require (
github.com/BurntSushi/toml v1.0.0
github.com/NVIDIA/go-nvml v0.11.6-0
github.com/container-orchestrated-devices/container-device-interface v0.3.1-0.20220224133719-e5457123010b
github.com/containers/podman/v4 v4.0.3
github.com/NVIDIA/go-nvml v0.11.6-0.0.20220715143214-a79f46f2a6f7
github.com/container-orchestrated-devices/container-device-interface v0.4.1-0.20220614144320-dc973e22f674
github.com/cpuguy83/go-md2man/v2 v2.0.1 // indirect
github.com/kr/text v0.2.0 // indirect
github.com/opencontainers/runc v1.1.3
github.com/opencontainers/runtime-spec v1.0.3-0.20211214071223-8958f93039ab
github.com/opencontainers/runtime-tools v0.9.1-0.20220110225228-7e2d60f1e41f // indirect
github.com/pelletier/go-toml v1.9.4
github.com/sirupsen/logrus v1.8.1
github.com/stretchr/testify v1.7.0
github.com/tsaikd/KDGoLib v0.0.0-20191001134900-7f3cf518e07d
github.com/urfave/cli v1.22.4 // indirect
github.com/urfave/cli/v2 v2.3.0
github.com/xeipuuv/gojsonpointer v0.0.0-20190809123943-df4f5c81cb3b // indirect
gitlab.com/nvidia/cloud-native/go-nvlib v0.0.0-20220725232003-c7f47cb02a33
golang.org/x/mod v0.5.0
golang.org/x/sys v0.0.0-20220114195835-da31bd327af9
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c // indirect
)

go.sum

File diff suppressed because it is too large

@@ -0,0 +1,115 @@
/**
# Copyright (c) 2021-2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
*/
package docker
import (
"bytes"
"encoding/json"
"fmt"
"io/ioutil"
"os"
log "github.com/sirupsen/logrus"
)
// LoadConfig loads the docker config from disk
func LoadConfig(configFilePath string) (map[string]interface{}, error) {
log.Infof("Loading docker config from %v", configFilePath)
info, err := os.Stat(configFilePath)
if err == nil && info.IsDir() {
return nil, fmt.Errorf("config file is a directory")
}
cfg := make(map[string]interface{})
if os.IsNotExist(err) {
log.Infof("Config file does not exist, creating new one")
return cfg, nil
}
readBytes, err := ioutil.ReadFile(configFilePath)
if err != nil {
return nil, fmt.Errorf("unable to read config: %v", err)
}
reader := bytes.NewReader(readBytes)
if err := json.NewDecoder(reader).Decode(&cfg); err != nil {
return nil, err
}
log.Infof("Successfully loaded config")
return cfg, nil
}
// UpdateConfig updates the docker config to include the nvidia runtimes
func UpdateConfig(config map[string]interface{}, defaultRuntime string, newRuntimes map[string]interface{}) error {
if defaultRuntime != "" {
config["default-runtime"] = defaultRuntime
}
// Read the existing runtimes
runtimes := make(map[string]interface{})
if _, exists := config["runtimes"]; exists {
runtimes = config["runtimes"].(map[string]interface{})
}
// Add / update the runtime definitions
for name, rt := range newRuntimes {
runtimes[name] = rt
}
// Update the runtimes definition
if len(runtimes) > 0 {
config["runtimes"] = runtimes
}
return nil
}
// FlushConfig flushes the updated/reverted config out to disk
func FlushConfig(cfg map[string]interface{}, configFilePath string) error {
log.Infof("Flushing docker config to %v", configFilePath)
output, err := json.MarshalIndent(cfg, "", " ")
if err != nil {
return fmt.Errorf("unable to convert to JSON: %v", err)
}
switch len(output) {
case 0:
err := os.Remove(configFilePath)
if err != nil {
return fmt.Errorf("unable to remove empty file: %v", err)
}
log.Infof("Config empty, removing file")
default:
f, err := os.Create(configFilePath)
if err != nil {
return fmt.Errorf("unable to open %v for writing: %v", configFilePath, err)
}
defer f.Close()
_, err = f.WriteString(string(output))
if err != nil {
return fmt.Errorf("unable to write output: %v", err)
}
}
log.Infof("Successfully flushed config")
return nil
}
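UpdateConfig sets `default-runtime` only when a non-empty name is passed. It merges the new runtimes into any existing `runtimes` map and leaves unrelated keys such as `log-driver` untouched. The following is a stdlib-only sketch of the same merge, applied to a daemon.json document decoded with encoding/json. LoadConfig and FlushConfig are not reused here because they depend on logrus.

The sketch has one deliberate difference, also noted in the code. It checks the type of an existing `runtimes` value instead of using an unchecked type assertion, which in the original would panic on a malformed file. The `nvidia` entry and its `path` value are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// updateConfig is a sketch of the merge performed by docker.UpdateConfig:
// optionally set "default-runtime", then add or overwrite entries under
// "runtimes" while keeping every other key. Unlike the original, it returns
// an error (instead of panicking) if "runtimes" is not a JSON object.
func updateConfig(config map[string]interface{}, defaultRuntime string, newRuntimes map[string]interface{}) error {
	if defaultRuntime != "" {
		config["default-runtime"] = defaultRuntime
	}
	runtimes := make(map[string]interface{})
	if existing, ok := config["runtimes"]; ok {
		m, ok := existing.(map[string]interface{})
		if !ok {
			return fmt.Errorf("unexpected type %T for runtimes", existing)
		}
		runtimes = m
	}
	for name, rt := range newRuntimes {
		runtimes[name] = rt
	}
	// Only write the key back if there is something to record.
	if len(runtimes) > 0 {
		config["runtimes"] = runtimes
	}
	return nil
}

func main() {
	// An existing daemon.json with an unrelated setting.
	input := []byte(`{"log-driver": "json-file"}`)

	config := make(map[string]interface{})
	if err := json.Unmarshal(input, &config); err != nil {
		panic(err)
	}

	err := updateConfig(config, "nvidia", map[string]interface{}{
		"nvidia": map[string]interface{}{"path": "nvidia-container-runtime"},
	})
	if err != nil {
		panic(err)
	}

	// json.Marshal sorts map keys, so the output is deterministic.
	out, err := json.Marshal(config)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```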

@@ -0,0 +1,228 @@
/**
# Copyright (c) 2021-2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
*/
package docker
import (
"encoding/json"
"fmt"
"testing"
"github.com/stretchr/testify/require"
)
func TestUpdateConfigDefaultRuntime(t *testing.T) {
testCases := []struct {
config map[string]interface{}
defaultRuntime string
runtimeName string
expectedDefaultRuntimeName interface{}
}{
{
defaultRuntime: "",
expectedDefaultRuntimeName: nil,
},
{
defaultRuntime: "NAME",
expectedDefaultRuntimeName: "NAME",
},
{
config: map[string]interface{}{
"default-runtime": "ALREADY_SET",
},
defaultRuntime: "",
expectedDefaultRuntimeName: "ALREADY_SET",
},
{
config: map[string]interface{}{
"default-runtime": "ALREADY_SET",
},
defaultRuntime: "NAME",
expectedDefaultRuntimeName: "NAME",
},
}
for i, tc := range testCases {
t.Run(fmt.Sprintf("test case %d", i), func(t *testing.T) {
if tc.config == nil {
tc.config = make(map[string]interface{})
}
err := UpdateConfig(tc.config, tc.defaultRuntime, nil)
require.NoError(t, err)
defaultRuntimeName := tc.config["default-runtime"]
require.EqualValues(t, tc.expectedDefaultRuntimeName, defaultRuntimeName)
})
}
}
func TestUpdateConfigRuntimes(t *testing.T) {
testCases := []struct {
config map[string]interface{}
runtimes map[string]interface{}
expectedConfig map[string]interface{}
}{
{
config: map[string]interface{}{},
runtimes: map[string]interface{}{
"runtime1": map[string]interface{}{
"path": "/test/runtime/dir/runtime1",
"args": []string{},
},
"runtime2": map[string]interface{}{
"path": "/test/runtime/dir/runtime2",
"args": []string{},
},
},
expectedConfig: map[string]interface{}{
"runtimes": map[string]interface{}{
"runtime1": map[string]interface{}{
"path": "/test/runtime/dir/runtime1",
"args": []string{},
},
"runtime2": map[string]interface{}{
"path": "/test/runtime/dir/runtime2",
"args": []string{},
},
},
},
},
{
config: map[string]interface{}{
"runtimes": map[string]interface{}{
"runtime1": map[string]interface{}{
"path": "runtime1",
"args": []string{},
},
},
},
runtimes: map[string]interface{}{
"runtime1": map[string]interface{}{
"path": "/test/runtime/dir/runtime1",
"args": []string{},
},
"runtime2": map[string]interface{}{
"path": "/test/runtime/dir/runtime2",
"args": []string{},
},
},
expectedConfig: map[string]interface{}{
"runtimes": map[string]interface{}{
"runtime1": map[string]interface{}{
"path": "/test/runtime/dir/runtime1",
"args": []string{},
},
"runtime2": map[string]interface{}{
"path": "/test/runtime/dir/runtime2",
"args": []string{},
},
},
},
},
{
config: map[string]interface{}{
"runtimes": map[string]interface{}{
"not-nvidia": map[string]interface{}{
"path": "some-other-path",
"args": []string{},
},
},
},
runtimes: map[string]interface{}{
"runtime1": map[string]interface{}{
"path": "/test/runtime/dir/runtime1",
"args": []string{},
},
},
expectedConfig: map[string]interface{}{
"runtimes": map[string]interface{}{
"not-nvidia": map[string]interface{}{
"path": "some-other-path",
"args": []string{},
},
"runtime1": map[string]interface{}{
"path": "/test/runtime/dir/runtime1",
"args": []string{},
},
},
},
},
{
config: map[string]interface{}{
"exec-opts": []string{"native.cgroupdriver=systemd"},
"log-driver": "json-file",
"log-opts": map[string]string{
"max-size": "100m",
},
"storage-driver": "overlay2",
},
runtimes: map[string]interface{}{
"runtime1": map[string]interface{}{
"path": "/test/runtime/dir/runtime1",
"args": []string{},
},
},
expectedConfig: map[string]interface{}{
"exec-opts": []string{"native.cgroupdriver=systemd"},
"log-driver": "json-file",
"log-opts": map[string]string{
"max-size": "100m",
},
"storage-driver": "overlay2",
"runtimes": map[string]interface{}{
"runtime1": map[string]interface{}{
"path": "/test/runtime/dir/runtime1",
"args": []string{},
},
},
},
},
{
config: map[string]interface{}{
"exec-opts": []string{"native.cgroupdriver=systemd"},
"log-driver": "json-file",
"log-opts": map[string]string{
"max-size": "100m",
},
"storage-driver": "overlay2",
},
expectedConfig: map[string]interface{}{
"exec-opts": []string{"native.cgroupdriver=systemd"},
"log-driver": "json-file",
"log-opts": map[string]string{
"max-size": "100m",
},
"storage-driver": "overlay2",
},
},
}
for i, tc := range testCases {
t.Run(fmt.Sprintf("test case %d", i), func(t *testing.T) {
err := UpdateConfig(tc.config, "", tc.runtimes)
require.NoError(t, err)
configContent, err := json.MarshalIndent(tc.config, "", " ")
require.NoError(t, err)
expectedContent, err := json.MarshalIndent(tc.expectedConfig, "", " ")
require.NoError(t, err)
require.EqualValues(t, string(expectedContent), string(configContent))
})
}
}

@@ -112,6 +112,36 @@ func (i CUDA) HasDisableRequire() bool {
return false
}
// DevicesFromEnvvars returns the devices requested by the image through environment variables
func (i CUDA) DevicesFromEnvvars(envVars ...string) []string {
// Grab a reference to devices from the first envvar
// in the list that actually exists in the environment.
var devices *string
for _, envVar := range envVars {
if devs, ok := i[envVar]; ok {
devices = &devs
break
}
}
// Environment variable unset with legacy image: default to "all".
if devices == nil && i.IsLegacy() {
return []string{"all"}
}
// Environment variable unset or empty or "void": return nil
if devices == nil || len(*devices) == 0 || *devices == "void" {
return nil
}
// Environment variable set to "none": reset to "".
if *devices == "none" {
return []string{""}
}
return strings.Split(*devices, ",")
}
func (i CUDA) legacyVersion() (string, error) {
majorMinor, err := parseMajorMinorVersion(i[envCUDAVersion])
if err != nil {

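DevicesFromEnvvars resolves the requested devices in a fixed order of precedence:

- The first listed variable that is present wins, even if it is empty.
- An unset variable on a legacy image means `all`.
- An unset, empty, or `void` value means no devices.
- `none` yields a single empty entry.
- Anything else is split on commas.

The sketch below reproduces those rules outside the original package. The `legacy` parameter is a stand-in for `i.IsLegacy()`, whose logic is not part of this hunk. The environment is modeled as a plain `map[string]string`.

```go
package main

import (
	"fmt"
	"strings"
)

// devicesFromEnvvars is a standalone sketch of CUDA.DevicesFromEnvvars.
// env stands in for the image environment; legacy stands in for i.IsLegacy().
func devicesFromEnvvars(env map[string]string, legacy bool, envVars ...string) []string {
	// The first listed variable that is present wins, even if its value is empty.
	var devices *string
	for _, envVar := range envVars {
		if devs, ok := env[envVar]; ok {
			devices = &devs
			break
		}
	}
	// Unset on a legacy image: default to all devices.
	if devices == nil && legacy {
		return []string{"all"}
	}
	// Unset, empty, or "void": no devices requested.
	if devices == nil || len(*devices) == 0 || *devices == "void" {
		return nil
	}
	// "none": a single empty entry rather than nil.
	if *devices == "none" {
		return []string{""}
	}
	return strings.Split(*devices, ",")
}

func main() {
	env := map[string]string{"NVIDIA_VISIBLE_DEVICES": "0,1"}
	fmt.Println(devicesFromEnvvars(env, false, "NVIDIA_VISIBLE_DEVICES"))
}
```

Distinguishing `nil` (nothing requested) from `[]string{""}` (`none`) lets callers tell "do not touch devices" apart from "explicitly no devices".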
@@ -44,6 +44,12 @@ type RuntimeConfig struct {
// modesConfig defines (optional) per-mode configs
type modesConfig struct {
CSV csvModeConfig `toml:"csv"`
CDI cdiModeConfig `toml:"cdi"`
}
type cdiModeConfig struct {
// SpecDirs allows for the default spec dirs for CDI to be overridden
SpecDirs []string `toml:"spec-dirs"`
}
type csvModeConfig struct {

@@ -0,0 +1,62 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package discover
import (
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup"
"github.com/sirupsen/logrus"
)
// charDevices is a discoverer for a list of character devices
type charDevices mounts
var _ Discover = (*charDevices)(nil)
// NewCharDeviceDiscoverer creates a discoverer which locates the specified set of device nodes.
func NewCharDeviceDiscoverer(logger *logrus.Logger, devices []string, root string) Discover {
locator := lookup.NewCharDeviceLocator(logger, root)
return NewDeviceDiscoverer(logger, locator, root, devices)
}
// NewDeviceDiscoverer creates a discoverer which locates the specified set of device nodes using the specified locator.
func NewDeviceDiscoverer(logger *logrus.Logger, locator lookup.Locator, root string, devices []string) Discover {
m := NewMounts(logger, locator, root, devices).(*mounts)
return (*charDevices)(m)
}
// Mounts returns the discovered mounts for the charDevices.
// Since this explicitly specifies a device list, the mounts are nil.
func (d *charDevices) Mounts() ([]Mount, error) {
return nil, nil
}
// Devices returns the discovered devices for the charDevices.
// Here the device nodes are first discovered as mounts and these are converted to devices.
func (d *charDevices) Devices() ([]Device, error) {
devicesAsMounts, err := (*mounts)(d).Mounts()
if err != nil {
return nil, err
}
var devices []Device
for _, mount := range devicesAsMounts {
devices = append(devices, Device(mount))
}
return devices, nil
}
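charDevices is declared as a defined type over `mounts`, so it shares the underlying struct but not the method set. Devices() therefore converts the receiver back with `(*mounts)(d)` to reuse the mount lookup. It then converts each Mount to a Device, which is legal because the two structs have identical fields in the same order.

The following sketch isolates that pattern. It uses simplified, hypothetical stand-ins for `Mount`, `Device`, and `mounts`, with no locator, logger, or root.

```go
package main

import "fmt"

// Mount and Device are simplified stand-ins with identical fields,
// which makes the conversion Device(m) legal.
type Mount struct {
	HostPath string
	Path     string
}

type Device Mount

// mounts is a hypothetical stand-in that "locates" a fixed list of paths.
type mounts struct {
	paths []string
}

func (m *mounts) Mounts() ([]Mount, error) {
	var result []Mount
	for _, p := range m.paths {
		result = append(result, Mount{HostPath: p, Path: p})
	}
	return result, nil
}

// charDevices shares the mounts struct but not its methods.
type charDevices mounts

// Mounts is nil: the located paths are reported as devices instead.
func (d *charDevices) Mounts() ([]Mount, error) { return nil, nil }

// Devices reuses the mounts lookup via a pointer conversion, then
// converts each Mount to a Device.
func (d *charDevices) Devices() ([]Device, error) {
	devicesAsMounts, err := (*mounts)(d).Mounts()
	if err != nil {
		return nil, err
	}
	var devices []Device
	for _, m := range devicesAsMounts {
		devices = append(devices, Device(m))
	}
	return devices, nil
}

func main() {
	d := (*charDevices)(&mounts{paths: []string{"/dev/nvidia0"}})
	devs, err := d.Devices()
	if err != nil {
		panic(err)
	}
	fmt.Println(devs)
}
```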

@@ -0,0 +1,83 @@
/**
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package discover
import (
"fmt"
"testing"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup"
testlog "github.com/sirupsen/logrus/hooks/test"
"github.com/stretchr/testify/require"
)
func TestCharDevices(t *testing.T) {
logger, logHook := testlog.NewNullLogger()
testCases := []struct {
description string
input *charDevices
expectedMounts []Mount
expectedMountsError error
expectedDevicesError error
expectedDevices []Device
}{
{
description: "dev mounts are empty",
input: (*charDevices)(
&mounts{
lookup: &lookup.LocatorMock{
LocateFunc: func(string) ([]string, error) {
return []string{"located"}, nil
},
},
required: []string{"required"},
},
),
expectedDevices: []Device{{Path: "located", HostPath: "located"}},
},
{
description: "dev devices returns error for nil lookup",
input: &charDevices{},
expectedDevicesError: fmt.Errorf("no lookup defined"),
},
}
for _, tc := range testCases {
logHook.Reset()
t.Run(tc.description, func(t *testing.T) {
tc.input.logger = logger
mounts, err := tc.input.Mounts()
if tc.expectedMountsError != nil {
require.Error(t, err)
} else {
require.NoError(t, err)
}
require.ElementsMatch(t, tc.expectedMounts, mounts)
devices, err := tc.input.Devices()
if tc.expectedDevicesError != nil {
require.Error(t, err)
} else {
require.NoError(t, err)
}
require.ElementsMatch(t, tc.expectedDevices, devices)
})
}
}

@@ -24,11 +24,6 @@ import (
"github.com/sirupsen/logrus"
)
// charDevices is a discover for a list of character devices
type charDevices mounts
var _ Discover = (*charDevices)(nil)
// NewFromCSVFiles creates a discoverer for the specified CSV files. A logger is also supplied.
// The constructed discoverer is comprised of a list, with each element in the list being associated with a
// single CSV file.
@@ -47,23 +42,21 @@ func NewFromCSVFiles(logger *logrus.Logger, files []string, root string) (Discov
csv.MountSpecSym: symlinkLocator,
}
var discoverers []Discover
var mountSpecs []*csv.MountSpec
for _, filename := range files {
d, err := NewFromCSVFile(logger, locators, filename)
targets, err := loadCSVFile(logger, filename)
if err != nil {
logger.Warnf("Skipping CSV file %v: %v", filename, err)
continue
}
discoverers = append(discoverers, d)
mountSpecs = append(mountSpecs, targets...)
}
return &list{discoverers: discoverers}, nil
return newFromMountSpecs(logger, locators, root, mountSpecs)
}
// NewFromCSVFile creates a discoverer for the specified CSV file. A logger is also supplied.
// The constructed discoverer is comprised of a list, with each element in the list being associated with a particular
// MountSpecType.
func NewFromCSVFile(logger *logrus.Logger, locators map[csv.MountSpecType]lookup.Locator, filename string) (Discover, error) {
// loadCSVFile loads the specified CSV file and returns the list of mount specs
func loadCSVFile(logger *logrus.Logger, filename string) ([]*csv.MountSpec, error) {
// Create a discoverer for each file-kind combination
targets, err := csv.NewCSVFileParser(logger, filename).Parse()
if err != nil {
@@ -73,12 +66,12 @@ func NewFromCSVFile(logger *logrus.Logger, locators map[csv.MountSpecType]lookup
return nil, fmt.Errorf("CSV file is empty")
}
return newFromMountSpecs(logger, locators, targets)
return targets, nil
}
// newFromMountSpecs creates a discoverer for the CSV file. A logger is also supplied.
// A list of csvDiscoverers is returned, with each being associated with a single MountSpecType.
func newFromMountSpecs(logger *logrus.Logger, locators map[csv.MountSpecType]lookup.Locator, targets []*csv.MountSpec) (Discover, error) {
func newFromMountSpecs(logger *logrus.Logger, locators map[csv.MountSpecType]lookup.Locator, root string, targets []*csv.MountSpec) (Discover, error) {
if len(targets) == 0 {
return &None{}, nil
}
@@ -99,41 +92,16 @@ func newFromMountSpecs(logger *logrus.Logger, locators map[csv.MountSpecType]loo
return nil, fmt.Errorf("no locator defined for '%v'", t)
}
m := &mounts{
logger: logger,
lookup: locator,
required: candidatesByType[t],
}
var m Discover
switch t {
case csv.MountSpecDev:
// For device mount specs, we insert a charDevices into the list of discoverers.
discoverers = append(discoverers, (*charDevices)(m))
m = NewDeviceDiscoverer(logger, locator, root, candidatesByType[t])
default:
discoverers = append(discoverers, m)
m = NewMounts(logger, locator, root, candidatesByType[t])
}
discoverers = append(discoverers, m)
}
return &list{discoverers: discoverers}, nil
}
// Mounts returns the discovered mounts for the charDevices. Since this explicitly specifies a
// device list, the mounts are nil.
func (d *charDevices) Mounts() ([]Mount, error) {
return nil, nil
}
// Devices returns the discovered devices for the charDevices. Here the device nodes are first
// discovered as mounts and these are converted to devices.
func (d *charDevices) Devices() ([]Device, error) {
devicesAsMounts, err := (*mounts)(d).Mounts()
if err != nil {
return nil, err
}
var devices []Device
for _, mount := range devicesAsMounts {
devices = append(devices, Device(mount))
}
return devices, nil
}
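In the "sets root" case of TestNewFromMountSpec, the targets `dev0`, `lib0`, `dev1` are expected to produce two discoverers. The first is one charDevices discoverer holding `dev0` and `dev1`, and the second is one mounts discoverer holding `lib0`. That implies newFromMountSpecs groups paths by mount-spec type before building discoverers. The grouping itself (`candidatesByType`) sits in lines elided from this hunk.

Below is one way to do that grouping, keeping types in first-seen order and paths in input order. MountSpec here is a simplified stand-in for csv.MountSpec, and the upstream code may order types differently.

```go
package main

import "fmt"

// MountSpec is a simplified stand-in for csv.MountSpec.
type MountSpec struct {
	Type string
	Path string
}

// groupByType collects paths per mount-spec type. It returns the types in
// the order they are first seen, plus the paths for each type in input order.
func groupByType(targets []MountSpec) ([]string, map[string][]string) {
	var order []string
	byType := make(map[string][]string)
	for _, t := range targets {
		if _, seen := byType[t.Type]; !seen {
			order = append(order, t.Type)
		}
		byType[t.Type] = append(byType[t.Type], t.Path)
	}
	return order, byType
}

func main() {
	order, byType := groupByType([]MountSpec{
		{Type: "dev", Path: "dev0"},
		{Type: "lib", Path: "lib0"},
		{Type: "dev", Path: "dev1"},
	})
	for _, t := range order {
		fmt.Println(t, byType[t])
	}
}
```

Tracking the order in a separate slice matters because Go map iteration order is randomized. Iterating `byType` directly would make the resulting discoverer list, and therefore the test expectation, nondeterministic.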

@@ -26,63 +26,6 @@ import (
"github.com/stretchr/testify/require"
)
func TestCharDevices(t *testing.T) {
logger, logHook := testlog.NewNullLogger()
testCases := []struct {
description string
input *charDevices
expectedMounts []Mount
expectedMountsError error
expectedDevicesError error
expectedDevices []Device
}{
{
description: "dev mounts are empty",
input: (*charDevices)(
&mounts{
lookup: &lookup.LocatorMock{
LocateFunc: func(string) ([]string, error) {
return []string{"located"}, nil
},
},
required: []string{"required"},
},
),
expectedDevices: []Device{{Path: "located"}},
},
{
description: "dev devices returns error for nil lookup",
input: &charDevices{},
expectedDevicesError: fmt.Errorf("no lookup defined"),
},
}
for _, tc := range testCases {
logHook.Reset()
t.Run(tc.description, func(t *testing.T) {
tc.input.logger = logger
mounts, err := tc.input.Mounts()
if tc.expectedMountsError != nil {
require.Error(t, err)
} else {
require.NoError(t, err)
}
require.ElementsMatch(t, tc.expectedMounts, mounts)
devices, err := tc.input.Devices()
if tc.expectedDevicesError != nil {
require.Error(t, err)
} else {
require.NoError(t, err)
}
require.ElementsMatch(t, tc.expectedDevices, devices)
})
}
}
func TestNewFromMountSpec(t *testing.T) {
logger, _ := testlog.NewNullLogger()
@@ -93,6 +36,7 @@ func TestNewFromMountSpec(t *testing.T) {
testCases := []struct {
description string
root string
targets []*csv.MountSpec
expectedError error
expectedDiscoverer Discover
@@ -133,12 +77,50 @@ func TestNewFromMountSpec(t *testing.T) {
&mounts{
logger: logger,
lookup: locators["dev"],
root: "/",
required: []string{"dev0", "dev1"},
},
),
&mounts{
logger: logger,
lookup: locators["lib"],
root: "/",
required: []string{"lib0"},
},
},
},
},
{
description: "sets root",
targets: []*csv.MountSpec{
{
Type: "dev",
Path: "dev0",
},
{
Type: "lib",
Path: "lib0",
},
{
Type: "dev",
Path: "dev1",
},
},
root: "/some/root",
expectedDiscoverer: &list{
discoverers: []Discover{
(*charDevices)(
&mounts{
logger: logger,
lookup: locators["dev"],
root: "/some/root",
required: []string{"dev0", "dev1"},
},
),
&mounts{
logger: logger,
lookup: locators["lib"],
root: "/some/root",
required: []string{"lib0"},
},
},
@@ -148,7 +130,7 @@ func TestNewFromMountSpec(t *testing.T) {
for _, tc := range testCases {
t.Run(tc.description, func(t *testing.T) {
discoverer, err := newFromMountSpecs(logger, locators, tc.targets)
discoverer, err := newFromMountSpecs(logger, locators, tc.root, tc.targets)
if tc.expectedError != nil {
require.Error(t, err)
return

@@ -24,12 +24,14 @@ type Config struct {
// Device represents a discovered character device.
type Device struct {
Path string
HostPath string
Path string
}
// Mount represents a discovered mount.
type Mount struct {
Path string
HostPath string
Path string
}
// Hook represents a discovered hook.

internal/discover/gds.go

@@ -0,0 +1,77 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package discover
import (
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup"
"github.com/sirupsen/logrus"
)
type gdsDeviceDiscoverer struct {
None
logger *logrus.Logger
devices Discover
mounts Discover
}
// NewGDSDiscoverer creates a discoverer for GPUDirect Storage devices and mounts.
func NewGDSDiscoverer(logger *logrus.Logger, root string) (Discover, error) {
devices := NewCharDeviceDiscoverer(
logger,
[]string{"/dev/nvidia-fs*"},
root,
)
udev := NewMounts(
logger,
lookup.NewDirectoryLocator(logger, root),
root,
[]string{"/run/udev"},
)
cufile := NewMounts(
logger,
lookup.NewFileLocator(logger, root),
root,
[]string{"/etc/cufile.json"},
)
d := gdsDeviceDiscoverer{
logger: logger,
devices: devices,
mounts: Merge(udev, cufile),
}
return &d, nil
}
// Devices discovers the nvidia-fs device nodes for use with GPUDirect Storage
func (d *gdsDeviceDiscoverer) Devices() ([]Device, error) {
return d.devices.Devices()
}
// Mounts discovers the required mounts for GPUDirect Storage.
// If no devices are discovered the discovered mounts are empty
func (d *gdsDeviceDiscoverer) Mounts() ([]Mount, error) {
devices, err := d.Devices()
if err != nil || len(devices) == 0 {
d.logger.Debugf("No nvidia-fs devices detected; skipping detection of mounts")
return nil, nil
}
return d.mounts.Mounts()
}
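gdsDeviceDiscoverer.Mounts only returns the `/run/udev` and `/etc/cufile.json` mounts when at least one `/dev/nvidia-fs*` device node is found. It treats a device lookup error the same as finding no devices. Below is a minimal sketch of that gating. It uses a hypothetical `discoverer` interface over plain string slices in place of the package's Discover, Device, and Mount types.

```go
package main

import "fmt"

// discoverer is a hypothetical, simplified stand-in for the Discover interface.
type discoverer interface {
	Devices() ([]string, error)
	Mounts() ([]string, error)
}

// static returns fixed lists; it fakes device and mount lookups.
type static struct {
	devices, mounts []string
}

func (s static) Devices() ([]string, error) { return s.devices, nil }
func (s static) Mounts() ([]string, error)  { return s.mounts, nil }

// gated reports mounts only when its device discoverer finds at least one
// device, mirroring gdsDeviceDiscoverer.Mounts.
type gated struct {
	devices discoverer
	mounts  discoverer
}

func (g gated) Devices() ([]string, error) { return g.devices.Devices() }

func (g gated) Mounts() ([]string, error) {
	devices, err := g.Devices()
	if err != nil || len(devices) == 0 {
		// As in the original, a lookup error is treated as "no devices".
		return nil, nil
	}
	return g.mounts.Mounts()
}

func main() {
	mounts := static{mounts: []string{"/run/udev", "/etc/cufile.json"}}
	withFS := gated{devices: static{devices: []string{"/dev/nvidia-fs0"}}, mounts: mounts}
	withoutFS := gated{devices: static{}, mounts: mounts}

	m1, _ := withFS.Mounts()
	m2, _ := withoutFS.Mounts()
	fmt.Println(m1, m2)
}
```

Gating this way keeps hosts without the nvidia-fs driver from receiving GPUDirect Storage mounts they cannot use. The trade-off is that a failing device lookup is silently downgraded to "no mounts".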

@@ -100,7 +100,7 @@ func getLibDirs(mounts []Mount) []string {
if exists {
continue
}
checked[dir] = isLibName(filepath.Base(m.Path))
checked[dir] = isLibName(m.Path)
if checked[dir] {
paths = append(paths, dir)
@@ -114,13 +114,18 @@ func getLibDirs(mounts []Mount) []string {
// isLibName checks if the specified filename is a library (i.e. ends in `.so*`)
func isLibName(filename string) bool {
parts := strings.Split(filename, ".")
for _, p := range parts {
if p == "so" {
return true
}
base := filepath.Base(filename)
isLib, err := filepath.Match("lib?*.so*", base)
if !isLib || err != nil {
return false
}
return false
parts := strings.Split(base, ".so")
if len(parts) == 1 {
return true
}
return parts[len(parts)-1] == "" || strings.HasPrefix(parts[len(parts)-1], ".")
}

@@ -0,0 +1,65 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package discover
import (
"testing"
"github.com/stretchr/testify/require"
)
func TestIsLibName(t *testing.T) {
testCases := []struct {
name string
isLib bool
}{
{
name: "",
isLib: false,
},
{
name: "lib/not/.so",
isLib: false,
},
{
name: "lib.so",
isLib: false,
},
{
name: "notlibcuda.so",
isLib: false,
},
{
name: "libcuda.so",
isLib: true,
},
{
name: "libcuda.so.1",
isLib: true,
},
{
name: "libcuda.soNOT",
isLib: false,
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
require.Equal(t, tc.isLib, isLibName(tc.name))
})
}
}

@@ -27,8 +27,8 @@ type list struct {
var _ Discover = (*list)(nil)
// NewList creates a discoverer that is the composite of a list of discoveres.
func NewList(d ...Discover) Discover {
// Merge creates a discoverer that is the composite of a list of discoverers.
func Merge(d ...Discover) Discover {
l := list{
discoverers: d,
}

@@ -0,0 +1,35 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package discover
import (
"github.com/sirupsen/logrus"
)
// NewMOFEDDiscoverer creates a discoverer for MOFED devices.
func NewMOFEDDiscoverer(logger *logrus.Logger, root string) (Discover, error) {
devices := NewCharDeviceDiscoverer(
logger,
[]string{
"/dev/infiniband/uverbs*",
"/dev/infiniband/rdma_cm",
},
root,
)
return devices, nil
}
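`NewMOFEDDiscoverer` selects device nodes by pattern. How `NewCharDeviceDiscoverer` expands these patterns is not part of this diff. Assuming it uses the same glob syntax as the file locator's `filepath.Glob` (changed further down), the sketch below checks which names the two patterns would select. It applies `filepath.Match` to plain strings, so no real devices are needed; `matchesAny` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// mofedPatterns are the device patterns passed to NewCharDeviceDiscoverer
// in NewMOFEDDiscoverer.
var mofedPatterns = []string{
	"/dev/infiniband/uverbs*",
	"/dev/infiniband/rdma_cm",
}

// matchesAny reports whether path matches one of the patterns using
// filepath.Match semantics ("*" does not cross a "/").
func matchesAny(path string, patterns []string) bool {
	for _, p := range patterns {
		if ok, err := filepath.Match(p, path); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	for _, dev := range []string{
		"/dev/infiniband/uverbs0",
		"/dev/infiniband/uverbs12",
		"/dev/infiniband/rdma_cm",
		"/dev/infiniband/umad0",
	} {
		fmt.Println(dev, matchesAny(dev, mofedPatterns))
	}
}
```

Nodes such as `/dev/infiniband/umad0` are not covered by either pattern and would therefore not be injected.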

@@ -18,6 +18,8 @@ package discover
import (
"fmt"
"path/filepath"
"strings"
"sync"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup"
@@ -31,6 +33,7 @@ type mounts struct {
None
logger *logrus.Logger
lookup lookup.Locator
root string
required []string
sync.Mutex
cache []Mount
@@ -38,6 +41,16 @@ type mounts struct {
var _ Discover = (*mounts)(nil)
// NewMounts creates a discoverer for the required mounts using the specified locator.
func NewMounts(logger *logrus.Logger, lookup lookup.Locator, root string, required []string) Discover {
return &mounts{
logger: logger,
lookup: lookup,
root: filepath.Join("/", root),
required: required,
}
}
func (d *mounts) Mounts() ([]Mount, error) {
if d.lookup == nil {
return nil, fmt.Errorf("no lookup defined")
@@ -51,7 +64,7 @@ func (d *mounts) Mounts() ([]Mount, error) {
d.Lock()
defer d.Unlock()
paths := make(map[string]bool)
uniqueMounts := make(map[string]Mount)
for _, candidate := range d.required {
d.logger.Debugf("Locating %v", candidate)
@@ -66,20 +79,39 @@ func (d *mounts) Mounts() ([]Mount, error) {
}
d.logger.Debugf("Located %v as %v", candidate, located)
for _, p := range located {
paths[p] = true
if _, ok := uniqueMounts[p]; ok {
d.logger.Debugf("Skipping duplicate mount %v", p)
continue
}
r := d.relativeTo(p)
if r == "" {
r = p
}
d.logger.Infof("Selecting %v as %v", p, r)
uniqueMounts[p] = Mount{
HostPath: p,
Path: r,
}
}
}
var mounts []Mount
for path := range paths {
d.logger.Infof("Selecting %v", path)
mount := Mount{
Path: path,
}
mounts = append(mounts, mount)
for _, m := range uniqueMounts {
mounts = append(mounts, m)
}
d.cache = mounts
return mounts, nil
return d.cache, nil
}
// relativeTo returns the path relative to the root for the file locator
func (d *mounts) relativeTo(path string) string {
if d.root == "/" {
return path
}
return strings.TrimPrefix(path, d.root)
}
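The updated `mounts` discoverer records the located host path as `HostPath` and strips the driver root to obtain the container `Path`. The following sketch isolates that logic with a simplified `Mount` type and a hypothetical `toMounts` helper. It preserves first-seen order, whereas the diff collects results in a map, whose iteration order Go leaves unspecified.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Mount is a simplified stand-in for discover.Mount.
type Mount struct {
	HostPath string // path on the host (possibly under a driver root)
	Path     string // path inside the container
}

// mounts keeps only the root handling from the diff.
type mounts struct{ root string }

func newMounts(root string) mounts {
	// As in NewMounts: "" becomes "/" and relative roots become absolute.
	return mounts{root: filepath.Join("/", root)}
}

// relativeTo mirrors the diff. Note that strings.TrimPrefix is not
// path-aware: with root "/some/root", "/some/rootfs/x" would become "fs/x".
func (d mounts) relativeTo(path string) string {
	if d.root == "/" {
		return path
	}
	return strings.TrimPrefix(path, d.root)
}

// toMounts deduplicates located host paths and computes container paths,
// falling back to the host path when stripping the root leaves nothing.
func (d mounts) toMounts(located []string) []Mount {
	seen := make(map[string]bool)
	var result []Mount
	for _, p := range located {
		if seen[p] {
			continue
		}
		seen[p] = true
		r := d.relativeTo(p)
		if r == "" {
			r = p
		}
		result = append(result, Mount{HostPath: p, Path: r})
	}
	return result
}

func main() {
	m := newMounts("/some/root")
	fmt.Println(m.toMounts([]string{"/some/root/located", "/some/root/located"}))
	// prints [{/some/root/located /located}]
	fmt.Println(newMounts("").toMounts([]string{"/etc/cufile.json"}))
	// prints [{/etc/cufile.json /etc/cufile.json}]
}
```

This matches the new "mounts uses relative path" test case, where `/some/root/located` on the host is mounted at `/located` in the container.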

@@ -70,7 +70,7 @@ func TestMounts(t *testing.T) {
},
required: []string{"required"},
},
expectedMounts: []Mount{{Path: "located"}},
expectedMounts: []Mount{{Path: "located", HostPath: "located"}},
},
{
description: "mounts removes located duplicates",
@@ -83,7 +83,7 @@ func TestMounts(t *testing.T) {
},
required: []string{"required0", "required1"},
},
expectedMounts: []Mount{{Path: "located"}},
expectedMounts: []Mount{{Path: "located", HostPath: "located"}},
},
{
description: "mounts skips located errors",
@@ -98,7 +98,7 @@ func TestMounts(t *testing.T) {
},
required: []string{"required0", "error", "required1"},
},
expectedMounts: []Mount{{Path: "required0"}, {Path: "required1"}},
expectedMounts: []Mount{{Path: "required0", HostPath: "required0"}, {Path: "required1", HostPath: "required1"}},
},
{
description: "mounts skips unlocated",
@@ -113,10 +113,10 @@ func TestMounts(t *testing.T) {
},
required: []string{"required0", "empty", "required1"},
},
expectedMounts: []Mount{{Path: "required0"}, {Path: "required1"}},
expectedMounts: []Mount{{Path: "required0", HostPath: "required0"}, {Path: "required1", HostPath: "required1"}},
},
{
description: "mounts skips unlocated",
description: "mounts adds multiple",
input: &mounts{
lookup: &lookup.LocatorMock{
LocateFunc: func(s string) ([]string, error) {
@@ -129,10 +129,25 @@ func TestMounts(t *testing.T) {
required: []string{"required0", "multiple", "required1"},
},
expectedMounts: []Mount{
{Path: "required0"},
{Path: "multiple0"},
{Path: "multiple1"},
{Path: "required1"},
{Path: "required0", HostPath: "required0"},
{Path: "multiple0", HostPath: "multiple0"},
{Path: "multiple1", HostPath: "multiple1"},
{Path: "required1", HostPath: "required1"},
},
},
{
description: "mounts uses relative path",
input: &mounts{
lookup: &lookup.LocatorMock{
LocateFunc: func(s string) ([]string, error) {
return []string{"/some/root/located"}, nil
},
},
root: "/some/root",
required: []string{"required0", "multiple", "required1"},
},
expectedMounts: []Mount{
{Path: "/located", HostPath: "/some/root/located"},
},
},
}

@@ -17,29 +17,51 @@
package edits
import (
"fmt"
"github.com/NVIDIA/nvidia-container-toolkit/internal/discover"
"github.com/container-orchestrated-devices/container-device-interface/pkg/cdi"
"github.com/container-orchestrated-devices/container-device-interface/specs-go"
"github.com/opencontainers/runc/libcontainer/devices"
)
type device discover.Device
// toEdits converts a discovered device to CDI Container Edits.
func (d device) toEdits() *cdi.ContainerEdits {
func (d device) toEdits() (*cdi.ContainerEdits, error) {
deviceNode, err := d.toSpec()
if err != nil {
return nil, err
}
e := cdi.ContainerEdits{
ContainerEdits: &specs.ContainerEdits{
DeviceNodes: []*specs.DeviceNode{d.toSpec()},
DeviceNodes: []*specs.DeviceNode{deviceNode},
},
}
return &e
return &e, nil
}
// toSpec converts a discovered Device to a CDI Spec Device. Note
// that missing info is filled in when edits are applied by querying the Device node.
func (d device) toSpec() *specs.DeviceNode {
func (d device) toSpec() (*specs.DeviceNode, error) {
// NOTE: This mirrors what cri-o does.
// https://github.com/cri-o/cri-o/blob/ca3bb80a3dda0440659fcf8da8ed6f23211de94e/internal/config/device/device.go#L93
// This can be removed once https://github.com/container-orchestrated-devices/container-device-interface/issues/72 is addressed
dev, err := devices.DeviceFromPath(d.HostPath, "rwm")
if err != nil {
return nil, fmt.Errorf("failed to query device node %v: %v", d.HostPath, err)
}
s := specs.DeviceNode{
Path: d.Path,
Path: d.Path,
Type: string(dev.Type),
Major: dev.Major,
Minor: dev.Minor,
FileMode: &dev.FileMode,
UID: &dev.Uid,
GID: &dev.Gid,
}
return &s
return &s, nil
}

@@ -51,7 +51,11 @@ func NewSpecEdits(logger *logrus.Logger, d discover.Discover) (oci.SpecModifier,
c := cdi.ContainerEdits{}
for _, d := range devices {
c.Append(device(d).toEdits())
edits, err := device(d).toEdits()
if err != nil {
return nil, fmt.Errorf("failed to create container edits for device: %v", err)
}
c.Append(edits)
}
for _, m := range mounts {

@@ -38,8 +38,7 @@ func (d mount) toEdits() *cdi.ContainerEdits {
// that missing info is filled in when edits are applied by querying the Mount node.
func (d mount) toSpec() *specs.Mount {
s := specs.Mount{
HostPath: d.Path,
// TODO: We need to allow the container path to be customised
HostPath: d.HostPath,
ContainerPath: d.Path,
Options: []string{
"ro",

@@ -16,6 +16,8 @@
package info
import "gitlab.com/nvidia/cloud-native/go-nvlib/pkg/nvinfo"
// Logger is a basic interface for logging to allow these functions to be called
// from code where logrus is not used.
type Logger interface {
@@ -32,10 +34,10 @@ func ResolveAutoMode(logger Logger, mode string) (rmode string) {
logger.Infof("Auto-detected mode as '%v'", rmode)
}()
isTegra, reason := IsTegraSystem()
isTegra, reason := nvinfo.IsTegraSystem()
logger.Debugf("Is Tegra-based system? %v: %v", isTegra, reason)
hasNVML, reason := HasNVML()
hasNVML, reason := nvinfo.HasNVML()
logger.Debugf("Has NVML? %v: %v", hasNVML, reason)
if isTegra && !hasNVML {

@@ -49,18 +49,19 @@ func NewExecutableLocator(logger *log.Logger, root string) Locator {
var _ Locator = (*executable)(nil)
// Locate finds executable files in the path. If a relative or absolute path is specified, the prefix paths are not considered.
func (p executable) Locate(filename string) ([]string, error) {
// Locate finds executable files with the specified pattern in the path.
// If a relative or absolute path is specified, the prefix paths are not considered.
func (p executable) Locate(pattern string) ([]string, error) {
// For absolute paths we ensure that it is executable
if strings.Contains(filename, "/") {
err := assertExecutable(filename)
if strings.Contains(pattern, "/") {
err := assertExecutable(pattern)
if err != nil {
return nil, fmt.Errorf("absolute path %v is not an executable file: %v", filename, err)
return nil, fmt.Errorf("absolute path %v is not an executable file: %v", pattern, err)
}
return []string{filename}, nil
return []string{pattern}, nil
}
return p.file.Locate(filename)
return p.file.Locate(pattern)
}
// assertExecutable checks whether the specified path is an executable file.

@@ -50,22 +50,29 @@ func newFileLocator(logger *log.Logger, root string) file {
var _ Locator = (*file)(nil)
// Locate attempts to find the specified file. All prefixes are searched and any matching
// candidates are returned. If no matches are found, an error is returned.
func (p file) Locate(filename string) ([]string, error) {
// Locate attempts to find files with names matching the specified pattern.
// All prefixes are searched and any matching candidates are returned. If no matches are found, an error is returned.
func (p file) Locate(pattern string) ([]string, error) {
var filenames []string
for _, prefix := range p.prefixes {
candidate := filepath.Join(prefix, filename)
p.logger.Debugf("Checking candidate '%v'", candidate)
err := p.filter(candidate)
pathPattern := filepath.Join(prefix, pattern)
candidates, err := filepath.Glob(pathPattern)
if err != nil {
p.logger.Debugf("Candidate '%v' does not meet requirements: %v", candidate, err)
continue
p.logger.Debugf("Checking pattern '%v' failed: %v", pathPattern, err)
}
for _, candidate := range candidates {
p.logger.Debugf("Checking candidate '%v'", candidate)
err := p.filter(candidate)
if err != nil {
p.logger.Debugf("Candidate '%v' does not meet requirements: %v", candidate, err)
continue
}
filenames = append(filenames, candidate)
}
filenames = append(filenames, candidate)
}
if len(filename) == 0 {
return nil, fmt.Errorf("file %v not found", filename)
if len(filenames) == 0 {
return nil, fmt.Errorf("pattern %v not found", pattern)
}
return filenames, nil
}

@@ -20,6 +20,9 @@ var _ Locator = &LocatorMock{}
// LocateFunc: func(s string) ([]string, error) {
// panic("mock out the Locate method")
// },
// RelativeFunc: func(s string) (string, error) {
// panic("mock out the Relative method")
// },
// }
//
// // use mockedLocator in code that requires Locator
@@ -30,6 +33,9 @@ type LocatorMock struct {
// LocateFunc mocks the Locate method.
LocateFunc func(s string) ([]string, error)
// RelativeFunc mocks the Relative method.
RelativeFunc func(s string) (string, error)
// calls tracks calls to the methods.
calls struct {
// Locate holds details about calls to the Locate method.
@@ -37,8 +43,14 @@ type LocatorMock struct {
// S is the s argument value.
S string
}
// Relative holds details about calls to the Relative method.
Relative []struct {
// S is the s argument value.
S string
}
}
lockLocate sync.RWMutex
lockLocate sync.RWMutex
lockRelative sync.RWMutex
}
// Locate calls LocateFunc.
@@ -75,3 +87,38 @@ func (mock *LocatorMock) LocateCalls() []struct {
mock.lockLocate.RUnlock()
return calls
}
// Relative calls RelativeFunc.
func (mock *LocatorMock) Relative(s string) (string, error) {
callInfo := struct {
S string
}{
S: s,
}
mock.lockRelative.Lock()
mock.calls.Relative = append(mock.calls.Relative, callInfo)
mock.lockRelative.Unlock()
if mock.RelativeFunc == nil {
var (
sOut string
errOut error
)
return sOut, errOut
}
return mock.RelativeFunc(s)
}
// RelativeCalls gets all the calls that were made to Relative.
// Check the length with:
// len(mockedLocator.RelativeCalls())
func (mock *LocatorMock) RelativeCalls() []struct {
S string
} {
var calls []struct {
S string
}
mock.lockRelative.RLock()
calls = mock.calls.Relative
mock.lockRelative.RUnlock()
return calls
}

@@ -52,10 +52,10 @@ func NewSymlinkLocator(logger *logrus.Logger, root string) Locator {
return &l
}
// Locate finds the specified file at the specified root. If the file is a symlink, the link is followed and all candidates
// to the final target are returned.
func (p symlinkChain) Locate(filename string) ([]string, error) {
candidates, err := p.file.Locate(filename)
// Locate finds the specified pattern at the specified root.
// If the file is a symlink, the link is followed and all candidates to the final target are returned.
func (p symlinkChain) Locate(pattern string) ([]string, error) {
candidates, err := p.file.Locate(pattern)
if err != nil {
return nil, err
}
@@ -104,14 +104,15 @@ func (p symlinkChain) Locate(filename string) ([]string, error) {
return filenames, nil
}
// Locate finds the specified file at the specified root. If the file is a symlink, the link is resolved and the target returned.
func (p symlink) Locate(filename string) ([]string, error) {
candidates, err := p.file.Locate(filename)
// Locate finds the specified pattern at the specified root.
// If the file is a symlink, the link is resolved and the target returned.
func (p symlink) Locate(pattern string) ([]string, error) {
candidates, err := p.file.Locate(pattern)
if err != nil {
return nil, err
}
if len(candidates) != 1 {
return nil, fmt.Errorf("failed to uniquely resolve symlink %v: %v", filename, candidates)
return nil, fmt.Errorf("failed to uniquely resolve symlink %v: %v", pattern, candidates)
}
target, err := filepath.EvalSymlinks(candidates[0])

internal/modifier/cdi.go
@@ -0,0 +1,128 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package modifier
import (
"fmt"
"strings"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config/image"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
cdi "github.com/container-orchestrated-devices/container-device-interface/pkg/cdi"
"github.com/opencontainers/runtime-spec/specs-go"
"github.com/sirupsen/logrus"
)
type cdiModifier struct {
logger *logrus.Logger
specDirs []string
devices []string
}
// NewCDIModifier creates an OCI spec modifier that determines the modifications to make based on the
// CDI specifications available on the system. The NVIDIA_VISIBLE_DEVICES environment variable is
// used to select the devices to include.
func NewCDIModifier(logger *logrus.Logger, cfg *config.Config, ociSpec oci.Spec) (oci.SpecModifier, error) {
devices, err := getDevicesFromSpec(ociSpec)
if err != nil {
return nil, fmt.Errorf("failed to get required devices from OCI specification: %v", err)
}
if len(devices) == 0 {
logger.Debugf("No devices requested; no modification required.")
return nil, nil
}
specDirs := cdi.DefaultSpecDirs
if len(cfg.NVIDIAContainerRuntimeConfig.Modes.CDI.SpecDirs) > 0 {
specDirs = cfg.NVIDIAContainerRuntimeConfig.Modes.CDI.SpecDirs
}
m := cdiModifier{
logger: logger,
specDirs: specDirs,
devices: devices,
}
return m, nil
}
func getDevicesFromSpec(ociSpec oci.Spec) ([]string, error) {
rawSpec, err := ociSpec.Load()
if err != nil {
return nil, fmt.Errorf("failed to load OCI spec: %v", err)
}
image, err := image.NewCUDAImageFromSpec(rawSpec)
if err != nil {
return nil, err
}
envDevices := image.DevicesFromEnvvars(visibleDevicesEnvvar)
_, annotationDevices, err := cdi.ParseAnnotations(rawSpec.Annotations)
if err != nil {
return nil, fmt.Errorf("failed to parse container annotations: %v", err)
}
uniqueDevices := make(map[string]struct{})
for _, name := range append(envDevices, annotationDevices...) {
if !cdi.IsQualifiedName(name) {
name = cdi.QualifiedName("nvidia.com", "gpu", name)
}
uniqueDevices[name] = struct{}{}
}
var devices []string
for name := range uniqueDevices {
devices = append(devices, name)
}
return devices, nil
}
// Modify loads the CDI registry and injects the specified CDI devices into the OCI runtime specification.
func (m cdiModifier) Modify(spec *specs.Spec) error {
registry := cdi.GetRegistry(
cdi.WithSpecDirs(m.specDirs...),
cdi.WithAutoRefresh(false),
)
if err := registry.Refresh(); err != nil {
m.logger.Debugf("The following error was triggered when refreshing the CDI registry: %v", err)
}
devices := m.devices
for _, d := range devices {
if d == "nvidia.com/gpu=all" {
devices = []string{}
for _, candidate := range registry.DeviceDB().ListDevices() {
if strings.HasPrefix(candidate, "nvidia.com/gpu=") {
devices = append(devices, candidate)
}
}
break
}
}
m.logger.Debugf("Injecting devices using CDI: %v", devices)
_, err := registry.InjectDevices(spec, devices...)
if err != nil {
return fmt.Errorf("failed to inject CDI devices: %v", err)
}
return nil
}
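Together, `getDevicesFromSpec` and `Modify` turn requested names into fully qualified CDI names and expand `nvidia.com/gpu=all`. The sketch below avoids the CDI library. `isQualified` is a hypothetical and much looser stand-in for `cdi.IsQualifiedName`, and the result is sorted for determinism, which the original does not do.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

const gpuKind = "nvidia.com/gpu"

// isQualified is a simplified, hypothetical stand-in for cdi.IsQualifiedName:
// it only checks for the "vendor/class=name" shape, without the library's
// validation of each component.
func isQualified(name string) bool {
	i := strings.Index(name, "=")
	return i > 0 && strings.Contains(name[:i], "/")
}

// qualifyDevices mirrors getDevicesFromSpec: bare names are qualified as
// nvidia.com/gpu=<name> and duplicates are dropped. Sorting is added here;
// the diff returns map order, which is unspecified.
func qualifyDevices(names []string) []string {
	unique := make(map[string]struct{})
	for _, name := range names {
		if !isQualified(name) {
			name = gpuKind + "=" + name
		}
		unique[name] = struct{}{}
	}
	devices := make([]string, 0, len(unique))
	for name := range unique {
		devices = append(devices, name)
	}
	sort.Strings(devices)
	return devices
}

// expandAll mirrors cdiModifier.Modify: a request for nvidia.com/gpu=all
// is replaced by every nvidia.com/gpu device known to the registry.
func expandAll(requested, known []string) []string {
	for _, d := range requested {
		if d == gpuKind+"=all" {
			devices := []string{}
			for _, candidate := range known {
				if strings.HasPrefix(candidate, gpuKind+"=") {
					devices = append(devices, candidate)
				}
			}
			return devices
		}
	}
	return requested
}

func main() {
	fmt.Println(qualifyDevices([]string{"0", "nvidia.com/gpu=0", "1"}))
	// prints [nvidia.com/gpu=0 nvidia.com/gpu=1]
	known := []string{"nvidia.com/gpu=0", "nvidia.com/gpu=1", "example.com/net=eth0"}
	fmt.Println(expandAll(qualifyDevices([]string{"all"}), known))
	// prints [nvidia.com/gpu=0 nvidia.com/gpu=1]
}
```

A bare `NVIDIA_VISIBLE_DEVICES=all` is first qualified to `nvidia.com/gpu=all`, which is exactly the string `Modify` looks for before expanding.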

@@ -24,10 +24,8 @@ import (
"github.com/NVIDIA/nvidia-container-toolkit/internal/cuda"
"github.com/NVIDIA/nvidia-container-toolkit/internal/discover"
"github.com/NVIDIA/nvidia-container-toolkit/internal/discover/csv"
"github.com/NVIDIA/nvidia-container-toolkit/internal/edits"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
"github.com/NVIDIA/nvidia-container-toolkit/internal/requirements"
"github.com/opencontainers/runtime-spec/specs-go"
"github.com/sirupsen/logrus"
)
@@ -52,11 +50,13 @@ func NewCSVModifier(logger *logrus.Logger, cfg *config.Config, ociSpec oci.Spec)
return nil, fmt.Errorf("failed to load OCI spec: %v", err)
}
// In experimental mode, we check whether a modification is required at all and return the lowlevelRuntime directly
// if no modification is required.
visibleDevices, exists := ociSpec.LookupEnv(visibleDevicesEnvvar)
if !exists || visibleDevices == "" || visibleDevices == visibleDevicesVoid {
logger.Infof("No modification required: %v=%v (exists=%v)", visibleDevicesEnvvar, visibleDevices, exists)
image, err := image.NewCUDAImageFromSpec(rawSpec)
if err != nil {
return nil, err
}
if devices := image.DevicesFromEnvvars(visibleDevicesEnvvar); len(devices) == 0 {
logger.Infof("No modification required; no devices requested")
return nil, nil
}
logger.Infof("Constructing modifier from config: %+v", *cfg)
@@ -66,14 +66,7 @@ func NewCSVModifier(logger *logrus.Logger, cfg *config.Config, ociSpec oci.Spec)
NVIDIAContainerToolkitCLIExecutablePath: cfg.NVIDIACTKConfig.Path,
}
// TODO: Once the devices have been encapsulated in the CUDA image, this can be moved to before the
// visible devices are checked.
image, err := image.NewCUDAImageFromSpec(rawSpec)
if err != nil {
return nil, err
}
if err := checkRequirements(logger, &image); err != nil {
if err := checkRequirements(logger, image); err != nil {
return nil, fmt.Errorf("requirements not met: %v", err)
}
@@ -82,8 +75,7 @@ func NewCSVModifier(logger *logrus.Logger, cfg *config.Config, ociSpec oci.Spec)
return nil, fmt.Errorf("failed to get list of CSV files: %v", err)
}
nvidiaRequireJetpack, _ := ociSpec.LookupEnv(nvidiaRequireJetpackEnvvar)
if nvidiaRequireJetpack != "csv-mounts=all" {
if nvidiaRequireJetpack, _ := image[nvidiaRequireJetpackEnvvar]; nvidiaRequireJetpack != "csv-mounts=all" {
csvFiles = csv.BaseFilesOnly(csvFiles)
}
@@ -92,48 +84,37 @@ func NewCSVModifier(logger *logrus.Logger, cfg *config.Config, ociSpec oci.Spec)
return nil, fmt.Errorf("failed to create CSV discoverer: %v", err)
}
ldcacheUpdateHook, err := discover.NewLDCacheUpdateHook(logger, csvDiscoverer, config)
if err != nil {
return nil, fmt.Errorf("failed to create ldcach update hook discoverer: %v", err)
}
createSymlinksHook, err := discover.NewCreateSymlinksHook(logger, csvFiles, csvDiscoverer, config)
if err != nil {
return nil, fmt.Errorf("failed to create symlink hook discoverer: %v", err)
}
d := discover.NewList(csvDiscoverer, ldcacheUpdateHook, createSymlinksHook)
return newModifierFromDiscoverer(logger, d)
}
// newModifierFromDiscoverer created a modifier that aplies the discovered
// modifications to an OCI spec if require by the runtime wrapper.
func newModifierFromDiscoverer(logger *logrus.Logger, d discover.Discover) (oci.SpecModifier, error) {
m := csvMode{
logger: logger,
discoverer: d,
}
return &m, nil
}
// Modify applies the required modifications to the incomming OCI spec. These modifications
// are applied in-place.
func (m csvMode) Modify(spec *specs.Spec) error {
err := nvidiaContainerRuntimeHookRemover{m.logger}.Modify(spec)
ldcacheUpdateHook, err := discover.NewLDCacheUpdateHook(logger, csvDiscoverer, config)
if err != nil {
return fmt.Errorf("failed to remove existing hooks: %v", err)
return nil, fmt.Errorf("failed to create ldcache update hook discoverer: %v", err)
}
specEdits, err := edits.NewSpecEdits(m.logger, m.discoverer)
d := discover.Merge(
csvDiscoverer,
createSymlinksHook,
// The ldcacheUpdateHook is added last to ensure that the created symlinks are included
ldcacheUpdateHook,
)
discoverModifier, err := NewModifierFromDiscoverer(logger, d)
if err != nil {
return fmt.Errorf("failed to get required container edits: %v", err)
return nil, fmt.Errorf("failed to construct modifier: %v", err)
}
return specEdits.Modify(spec)
modifiers := Merge(
nvidiaContainerRuntimeHookRemover{logger},
discoverModifier,
)
return modifiers, nil
}
func checkRequirements(logger *logrus.Logger, image *image.CUDA) error {
func checkRequirements(logger *logrus.Logger, image image.CUDA) error {
if image.HasDisableRequire() {
// TODO: We could print the real value here instead
logger.Debugf("NVIDIA_DISABLE_REQUIRE=%v; skipping requirement checks", true)

@@ -21,7 +21,6 @@ import (
"testing"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/internal/discover"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
"github.com/opencontainers/runtime-spec/specs-go"
testlog "github.com/sirupsen/logrus/hooks/test"
@@ -95,105 +94,15 @@ func TestNewCSVModifier(t *testing.T) {
}
}
func TestExperimentalModifier(t *testing.T) {
func TestCSVModifierRemovesHook(t *testing.T) {
logger, _ := testlog.NewNullLogger()
testCases := []struct {
description string
discover *discover.DiscoverMock
spec *specs.Spec
expectedError error
expectedSpec *specs.Spec
}{
{
description: "empty discoverer does not modify spec",
discover: &discover.DiscoverMock{},
},
{
description: "failed hooks discoverer returns error",
discover: &discover.DiscoverMock{
HooksFunc: func() ([]discover.Hook, error) {
return nil, fmt.Errorf("discover.Hooks error")
},
},
expectedError: fmt.Errorf("discover.Hooks error"),
},
{
description: "discovered hooks are injected into spec",
spec: &specs.Spec{},
discover: &discover.DiscoverMock{
HooksFunc: func() ([]discover.Hook, error) {
hooks := []discover.Hook{
{
Lifecycle: "prestart",
Path: "/hook/a",
Args: []string{"/hook/a", "arga"},
},
{
Lifecycle: "createContainer",
Path: "/hook/b",
Args: []string{"/hook/b", "argb"},
},
}
return hooks, nil
},
},
expectedSpec: &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{
{
Path: "/hook/a",
Args: []string{"/hook/a", "arga"},
},
},
CreateContainer: []specs.Hook{
{
Path: "/hook/b",
Args: []string{"/hook/b", "argb"},
},
},
},
},
},
{
description: "existing hooks are maintained",
spec: &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{
{
Path: "/hook/a",
Args: []string{"/hook/a", "arga"},
},
},
},
},
discover: &discover.DiscoverMock{
HooksFunc: func() ([]discover.Hook, error) {
hooks := []discover.Hook{
{
Lifecycle: "prestart",
Path: "/hook/b",
Args: []string{"/hook/b", "argb"},
},
}
return hooks, nil
},
},
expectedSpec: &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{
{
Path: "/hook/a",
Args: []string{"/hook/a", "arga"},
},
{
Path: "/hook/b",
Args: []string{"/hook/b", "argb"},
},
},
},
},
},
{
description: "modification removes existing nvidia-container-runtime-hook",
spec: &specs.Spec{
@@ -206,26 +115,9 @@ func TestExperimentalModifier(t *testing.T) {
},
},
},
discover: &discover.DiscoverMock{
HooksFunc: func() ([]discover.Hook, error) {
hooks := []discover.Hook{
{
Lifecycle: "prestart",
Path: "/hook/b",
Args: []string{"/hook/b", "argb"},
},
}
return hooks, nil
},
},
expectedSpec: &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{
{
Path: "/hook/b",
Args: []string{"/hook/b", "argb"},
},
},
Prestart: []specs.Hook{},
},
},
},
@@ -241,26 +133,9 @@ func TestExperimentalModifier(t *testing.T) {
},
},
},
discover: &discover.DiscoverMock{
HooksFunc: func() ([]discover.Hook, error) {
hooks := []discover.Hook{
{
Lifecycle: "prestart",
Path: "/hook/b",
Args: []string{"/hook/b", "argb"},
},
}
return hooks, nil
},
},
expectedSpec: &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{
{
Path: "/hook/b",
Args: []string{"/hook/b", "argb"},
},
},
Prestart: []specs.Hook{},
},
},
},
@@ -268,17 +143,16 @@ func TestExperimentalModifier(t *testing.T) {
for _, tc := range testCases {
t.Run(tc.description, func(t *testing.T) {
m, err := newModifierFromDiscoverer(logger, tc.discover)
require.NoError(t, err)
m := nvidiaContainerRuntimeHookRemover{logger: logger}
err = m.Modify(tc.spec)
err := m.Modify(tc.spec)
if tc.expectedError != nil {
require.Error(t, err)
} else {
require.NoError(t, err)
}
require.EqualValues(t, tc.expectedSpec, tc.spec)
require.Empty(t, tc.spec.Hooks.Prestart)
})
}
}

@@ -0,0 +1,53 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package modifier
import (
"fmt"
"github.com/NVIDIA/nvidia-container-toolkit/internal/discover"
"github.com/NVIDIA/nvidia-container-toolkit/internal/edits"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
"github.com/opencontainers/runtime-spec/specs-go"
"github.com/sirupsen/logrus"
)
type discoverModifier struct {
logger *logrus.Logger
discoverer discover.Discover
}
// NewModifierFromDiscoverer creates a modifier that applies the discovered
// modifications to an OCI spec if required by the runtime wrapper.
func NewModifierFromDiscoverer(logger *logrus.Logger, d discover.Discover) (oci.SpecModifier, error) {
m := discoverModifier{
logger: logger,
discoverer: d,
}
return &m, nil
}
// Modify applies the modifications required by the discoverer to the incoming OCI spec.
// These modifications are applied in-place.
func (m discoverModifier) Modify(spec *specs.Spec) error {
specEdits, err := edits.NewSpecEdits(m.logger, m.discoverer)
if err != nil {
return fmt.Errorf("failed to get required container edits: %v", err)
}
return specEdits.Modify(spec)
}
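`NewCSVModifier` now composes `nvidiaContainerRuntimeHookRemover` with the discover-based modifier through the modifier package's `Merge`, whose implementation is not shown in this diff. The sketch below shows one plausible composite under stated assumptions: modifiers run in order, nil entries are skipped, and the first error is returned. The `Spec`/`Hook` types are simplified, and the rule of matching the hook by its base name is likewise assumed.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Hook and Spec are simplified stand-ins for the OCI runtime-spec types.
type Hook struct{ Path string }
type Spec struct{ Prestart []Hook }

// SpecModifier matches the shape of oci.SpecModifier as used in this diff.
type SpecModifier interface {
	Modify(*Spec) error
}

// list is a hypothetical composite: modifiers run in order, nil entries are
// skipped, and the first error stops processing.
type list []SpecModifier

func merge(modifiers ...SpecModifier) SpecModifier { return list(modifiers) }

func (l list) Modify(spec *Spec) error {
	for _, m := range l {
		if m == nil {
			continue
		}
		if err := m.Modify(spec); err != nil {
			return err
		}
	}
	return nil
}

// hookRemover drops prestart hooks whose executable is named
// nvidia-container-runtime-hook (assumed matching rule).
type hookRemover struct{}

func (hookRemover) Modify(spec *Spec) error {
	kept := []Hook{}
	for _, h := range spec.Prestart {
		if filepath.Base(h.Path) != "nvidia-container-runtime-hook" {
			kept = append(kept, h)
		}
	}
	spec.Prestart = kept
	return nil
}

// addHook stands in for discoverModifier by appending a discovered hook.
type addHook struct{ path string }

func (a addHook) Modify(spec *Spec) error {
	spec.Prestart = append(spec.Prestart, Hook{Path: a.path})
	return nil
}

func main() {
	spec := &Spec{Prestart: []Hook{{Path: "/usr/bin/nvidia-container-runtime-hook"}}}
	if err := merge(hookRemover{}, nil, addHook{path: "/hook/b"}).Modify(spec); err != nil {
		panic(err)
	}
	fmt.Println(spec.Prestart) // prints [{/hook/b}]
}
```

Skipping nil entries is a defensive choice: constructors in this diff such as `NewCDIModifier` return a nil modifier when no devices are requested, so a composite that may receive such a value needs a guard.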

@@ -0,0 +1,145 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package modifier
import (
"fmt"
"testing"
"github.com/NVIDIA/nvidia-container-toolkit/internal/discover"
"github.com/opencontainers/runtime-spec/specs-go"
testlog "github.com/sirupsen/logrus/hooks/test"
"github.com/stretchr/testify/require"
)
func TestDiscoverModifier(t *testing.T) {
logger, _ := testlog.NewNullLogger()
testCases := []struct {
description string
discover *discover.DiscoverMock
spec *specs.Spec
expectedError error
expectedSpec *specs.Spec
}{
{
description: "empty discoverer does not modify spec",
discover: &discover.DiscoverMock{},
},
{
description: "failed hooks discoverer returns error",
discover: &discover.DiscoverMock{
HooksFunc: func() ([]discover.Hook, error) {
return nil, fmt.Errorf("discover.Hooks error")
},
},
expectedError: fmt.Errorf("discover.Hooks error"),
},
{
description: "discovered hooks are injected into spec",
spec: &specs.Spec{},
discover: &discover.DiscoverMock{
HooksFunc: func() ([]discover.Hook, error) {
hooks := []discover.Hook{
{
Lifecycle: "prestart",
Path: "/hook/a",
Args: []string{"/hook/a", "arga"},
},
{
Lifecycle: "createContainer",
Path: "/hook/b",
Args: []string{"/hook/b", "argb"},
},
}
return hooks, nil
},
},
expectedSpec: &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{
{
Path: "/hook/a",
Args: []string{"/hook/a", "arga"},
},
},
CreateContainer: []specs.Hook{
{
Path: "/hook/b",
Args: []string{"/hook/b", "argb"},
},
},
},
},
},
{
description: "existing hooks are maintained",
spec: &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{
{
Path: "/hook/a",
Args: []string{"/hook/a", "arga"},
},
},
},
},
discover: &discover.DiscoverMock{
HooksFunc: func() ([]discover.Hook, error) {
hooks := []discover.Hook{
{
Lifecycle: "prestart",
Path: "/hook/b",
Args: []string{"/hook/b", "argb"},
},
}
return hooks, nil
},
},
expectedSpec: &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{
{
Path: "/hook/a",
Args: []string{"/hook/a", "arga"},
},
{
Path: "/hook/b",
Args: []string{"/hook/b", "argb"},
},
},
},
},
},
}
for _, tc := range testCases {
t.Run(tc.description, func(t *testing.T) {
m, err := NewModifierFromDiscoverer(logger, tc.discover)
require.NoError(t, err)
err = m.Modify(tc.spec)
if tc.expectedError != nil {
require.Error(t, err)
} else {
require.NoError(t, err)
}
require.EqualValues(t, tc.expectedSpec, tc.spec)
})
}
}
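The test cases above define how `NewModifierFromDiscoverer` should treat discovered hooks. Each hook goes into the list named by its `Lifecycle`, and hooks already in the spec stay ahead of the new ones. The following is a minimal sketch of that mapping under stated assumptions, not the toolkit's implementation:

- `Hook`, `SpecHook`, `Hooks` and `addHooks` are hypothetical, simplified stand-ins for `discover.Hook` and the `specs-go` types.
- Rejecting an unknown lifecycle is a choice made for this sketch only.

```go
package main

import "fmt"

// Hook is a simplified, hypothetical stand-in for discover.Hook.
type Hook struct {
	Lifecycle string
	Path      string
	Args      []string
}

// SpecHook and Hooks are simplified, hypothetical stand-ins for specs.Hook and specs.Hooks.
type SpecHook struct {
	Path string
	Args []string
}

type Hooks struct {
	Prestart        []SpecHook
	CreateContainer []SpecHook
}

// addHooks appends each discovered hook to the list named by its lifecycle,
// keeping hooks that are already present. With nothing discovered, the input
// is returned untouched (mirroring "empty discoverer does not modify spec").
func addHooks(existing *Hooks, discovered []Hook) (*Hooks, error) {
	if len(discovered) == 0 {
		return existing, nil
	}
	if existing == nil {
		existing = &Hooks{}
	}
	for _, h := range discovered {
		sh := SpecHook{Path: h.Path, Args: h.Args}
		switch h.Lifecycle {
		case "prestart":
			existing.Prestart = append(existing.Prestart, sh)
		case "createContainer":
			existing.CreateContainer = append(existing.CreateContainer, sh)
		default:
			// Assumption for this sketch: unknown lifecycles are an error.
			return nil, fmt.Errorf("unsupported hook lifecycle %q", h.Lifecycle)
		}
	}
	return existing, nil
}

func main() {
	hooks, err := addHooks(
		&Hooks{Prestart: []SpecHook{{Path: "/hook/a", Args: []string{"/hook/a", "arga"}}}},
		[]Hook{{Lifecycle: "prestart", Path: "/hook/b", Args: []string{"/hook/b", "argb"}}},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(hooks.Prestart), hooks.Prestart[1].Path) // 2 /hook/b
}
```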

internal/modifier/gds.go

@@ -0,0 +1,61 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package modifier
import (
"fmt"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config/image"
"github.com/NVIDIA/nvidia-container-toolkit/internal/discover"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
"github.com/sirupsen/logrus"
)
const (
nvidiaGDSEnvvar = "NVIDIA_GDS"
)
// NewGDSModifier creates the modifiers for GDS devices.
// If the spec does not contain the NVIDIA_GDS=enabled environment variable no changes are made.
func NewGDSModifier(logger *logrus.Logger, cfg *config.Config, ociSpec oci.Spec) (oci.SpecModifier, error) {
rawSpec, err := ociSpec.Load()
if err != nil {
return nil, fmt.Errorf("failed to load OCI spec: %v", err)
}
image, err := image.NewCUDAImageFromSpec(rawSpec)
if err != nil {
return nil, err
}
if devices := image.DevicesFromEnvvars(visibleDevicesEnvvar); len(devices) == 0 {
logger.Infof("No modification required; no devices requested")
return nil, nil
}
if gds, _ := image[nvidiaGDSEnvvar]; gds != "enabled" {
return nil, nil
}
d, err := discover.NewGDSDiscoverer(logger, cfg.NVIDIAContainerCLIConfig.Root)
if err != nil {
return nil, fmt.Errorf("failed to construct discoverer for GDS devices: %v", err)
}
return NewModifierFromDiscoverer(logger, d)
}
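`NewGDSModifier` builds a discoverer only when both of these hold:

- at least one device is requested;
- `NVIDIA_GDS` is exactly `enabled`.

Otherwise it returns `(nil, nil)`. `NewMOFEDModifier` follows the same pattern with `NVIDIA_MOFED`.

The sketch below isolates that gating logic over a plain environment map. It makes these assumptions:

- `visibleDevicesEnvvar` is `NVIDIA_VISIBLE_DEVICES`.
- An unset or empty value means "no devices". The real `DevicesFromEnvvars` helper may handle additional values.
- `featureRequested` is a hypothetical name.

```go
package main

import "fmt"

// Assumed value for visibleDevicesEnvvar; NVIDIA_GDS matches the const block above.
const (
	visibleDevicesEnvvar = "NVIDIA_VISIBLE_DEVICES"
	nvidiaGDSEnvvar      = "NVIDIA_GDS"
)

// featureRequested mirrors the two early returns in NewGDSModifier: a
// modification is needed only if devices are requested AND the feature
// variable equals "enabled". Simplification: unset or empty means no devices.
func featureRequested(env map[string]string, featureEnvvar string) bool {
	if env[visibleDevicesEnvvar] == "" {
		return false
	}
	return env[featureEnvvar] == "enabled"
}

func main() {
	env := map[string]string{visibleDevicesEnvvar: "all", nvidiaGDSEnvvar: "enabled"}
	fmt.Println(featureRequested(env, nvidiaGDSEnvvar)) // true
}
```

Passing the variable name as a parameter lets the same check serve both the GDS and MOFED cases.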


@@ -0,0 +1,111 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package modifier
import (
"testing"
"github.com/opencontainers/runtime-spec/specs-go"
testlog "github.com/sirupsen/logrus/hooks/test"
"github.com/stretchr/testify/require"
)
func TestHookRemover(t *testing.T) {
logger, _ := testlog.NewNullLogger()
testCases := []struct {
description string
spec *specs.Spec
expectedError error
expectedSpec *specs.Spec
}{
{
description: "existing hooks are maintained",
spec: &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{
{
Path: "/hook/a",
Args: []string{"/hook/a", "arga"},
},
},
},
},
expectedSpec: &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{
{
Path: "/hook/a",
Args: []string{"/hook/a", "arga"},
},
},
},
},
},
{
description: "modification removes existing nvidia-container-runtime-hook",
spec: &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{
{
Path: "/path/to/nvidia-container-runtime-hook",
Args: []string{"/path/to/nvidia-container-runtime-hook", "prestart"},
},
},
},
},
expectedSpec: &specs.Spec{
Hooks: &specs.Hooks{
Prestart: nil,
},
},
},
{
description: "modification removes existing nvidia-container-toolkit",
spec: &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{
{
Path: "/path/to/nvidia-container-toolkit",
Args: []string{"/path/to/nvidia-container-toolkit", "prestart"},
},
},
},
},
expectedSpec: &specs.Spec{
Hooks: &specs.Hooks{
Prestart: nil,
},
},
},
}
for _, tc := range testCases {
t.Run(tc.description, func(t *testing.T) {
m := nvidiaContainerRuntimeHookRemover{logger: logger}
err := m.Modify(tc.spec)
if tc.expectedError != nil {
require.Error(t, err)
} else {
require.NoError(t, err)
}
require.EqualValues(t, tc.expectedSpec, tc.spec)
})
}
}
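The hook-remover tests expect two behaviours:

- Prestart hooks whose executable is `nvidia-container-runtime-hook` or `nvidia-container-toolkit` are dropped.
- When nothing remains, `Prestart` is `nil` rather than an empty slice.

A minimal sketch of such a filter follows, with these caveats:

- Matching on the path's basename is an assumption.
- `Hook`, `isNVIDIAHook` and `removeNVIDIAHooks` are hypothetical stand-ins, not the package's own `isNVIDIAContainerRuntimeHook`.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Hook is a simplified, hypothetical stand-in for specs.Hook.
type Hook struct {
	Path string
	Args []string
}

// isNVIDIAHook reports whether the hook's executable basename is one of the
// two names the tests above expect to be removed (basename matching is assumed).
func isNVIDIAHook(h Hook) bool {
	switch filepath.Base(h.Path) {
	case "nvidia-container-runtime-hook", "nvidia-container-toolkit":
		return true
	}
	return false
}

// removeNVIDIAHooks filters NVIDIA hooks out of a prestart list. Starting from
// a nil slice means the result stays nil when every hook is removed, matching
// the `Prestart: nil` expectation in the tests.
func removeNVIDIAHooks(prestart []Hook) []Hook {
	var kept []Hook
	for _, h := range prestart {
		if isNVIDIAHook(h) {
			continue
		}
		kept = append(kept, h)
	}
	return kept
}

func main() {
	out := removeNVIDIAHooks([]Hook{
		{Path: "/path/to/nvidia-container-toolkit", Args: []string{"/path/to/nvidia-container-toolkit", "prestart"}},
		{Path: "/hook/a", Args: []string{"/hook/a", "arga"}},
	})
	fmt.Println(len(out), out[0].Path) // 1 /hook/a
}
```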

internal/modifier/list.go

@@ -0,0 +1,54 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package modifier
import (
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
"github.com/opencontainers/runtime-spec/specs-go"
)
type list struct {
modifiers []oci.SpecModifier
}
// Merge merges a set of OCI specification modifiers as a list.
// This can be used to compose modifiers.
func Merge(modifiers ...oci.SpecModifier) oci.SpecModifier {
var filteredModifiers []oci.SpecModifier
for _, m := range modifiers {
if m == nil {
continue
}
filteredModifiers = append(filteredModifiers, m)
}
return list{
modifiers: filteredModifiers,
}
}
// Modify applies a list of modifiers in sequence and returns on any errors encountered.
func (m list) Modify(spec *specs.Spec) error {
for _, mm := range m.modifiers {
err := mm.Modify(spec)
if err != nil {
return err
}
}
return nil
}
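`Merge` drops `nil` entries, so constructors such as `NewGDSModifier` can return `(nil, nil)` when nothing needs to change. `list.Modify` stops at the first error.

The self-contained sketch below reproduces that composition using:

- simplified stand-in types (`Spec`, `SpecModifier`, and a slice-based `list`);
- hypothetical modifiers (`envModifier`, `failing`).

One Go detail is worth keeping in mind. The `m == nil` check only catches a nil interface value, which is what `return nil, nil` produces. A typed nil pointer wrapped in the interface would pass the filter.

```go
package main

import (
	"errors"
	"fmt"
)

// Spec and SpecModifier are simplified stand-ins for specs.Spec and oci.SpecModifier.
type Spec struct{ Env []string }

type SpecModifier interface {
	Modify(*Spec) error
}

// envModifier is a hypothetical modifier that appends one environment variable.
type envModifier string

func (e envModifier) Modify(s *Spec) error {
	s.Env = append(s.Env, string(e))
	return nil
}

// failing is a hypothetical modifier that always returns an error.
type failing struct{}

func (failing) Modify(*Spec) error { return errors.New("modifier failed") }

type list []SpecModifier

// merge drops nil modifiers and composes the rest into a single modifier.
func merge(modifiers ...SpecModifier) SpecModifier {
	var filtered list
	for _, m := range modifiers {
		if m != nil {
			filtered = append(filtered, m)
		}
	}
	return filtered
}

// Modify applies the modifiers in order and returns the first error.
func (l list) Modify(s *Spec) error {
	for _, m := range l {
		if err := m.Modify(s); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	spec := &Spec{}
	if err := merge(envModifier("A=1"), nil, envModifier("B=2")).Modify(spec); err != nil {
		panic(err)
	}
	fmt.Println(spec.Env) // [A=1 B=2]
}
```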


@@ -0,0 +1,61 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package modifier
import (
"fmt"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config/image"
"github.com/NVIDIA/nvidia-container-toolkit/internal/discover"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
"github.com/sirupsen/logrus"
)
const (
nvidiaMOFEDEnvvar = "NVIDIA_MOFED"
)
// NewMOFEDModifier creates the modifiers for MOFED devices.
// If the spec does not contain the NVIDIA_MOFED=enabled environment variable no changes are made.
func NewMOFEDModifier(logger *logrus.Logger, cfg *config.Config, ociSpec oci.Spec) (oci.SpecModifier, error) {
rawSpec, err := ociSpec.Load()
if err != nil {
return nil, fmt.Errorf("failed to load OCI spec: %v", err)
}
image, err := image.NewCUDAImageFromSpec(rawSpec)
if err != nil {
return nil, err
}
if devices := image.DevicesFromEnvvars(visibleDevicesEnvvar); len(devices) == 0 {
logger.Infof("No modification required; no devices requested")
return nil, nil
}
if mofed, _ := image[nvidiaMOFEDEnvvar]; mofed != "enabled" {
return nil, nil
}
d, err := discover.NewMOFEDDiscoverer(logger, cfg.NVIDIAContainerCLIConfig.Root)
if err != nil {
return nil, fmt.Errorf("failed to construct discoverer for MOFED devices: %v", err)
}
return NewModifierFromDiscoverer(logger, d)
}


@@ -20,7 +20,6 @@ import (
"os"
"os/exec"
"path/filepath"
"strings"
"github.com/NVIDIA/nvidia-container-toolkit/internal/config"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
@@ -61,8 +60,8 @@ func (m stableRuntimeModifier) Modify(spec *specs.Spec) error {
spec.Hooks = &specs.Hooks{}
} else if len(spec.Hooks.Prestart) != 0 {
for _, hook := range spec.Hooks.Prestart {
if strings.Contains(hook.Path, config.NVIDIAContainerRuntimeHookExecutable) {
m.logger.Infof("existing nvidia prestart hook found in OCI spec")
if isNVIDIAContainerRuntimeHook(&hook) {
m.logger.Infof("Existing nvidia prestart hook (%v) found in OCI spec", hook.Path)
return nil
}
}


@@ -0,0 +1,45 @@
/**
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
**/
package modifier
import (
"github.com/NVIDIA/nvidia-container-toolkit/internal/discover"
"github.com/NVIDIA/nvidia-container-toolkit/internal/lookup"
"github.com/NVIDIA/nvidia-container-toolkit/internal/oci"
"github.com/sirupsen/logrus"
"gitlab.com/nvidia/cloud-native/go-nvlib/pkg/nvinfo"
)
// NewTegraPlatformFiles creates a modifier to inject the Tegra platform files into a container.
func NewTegraPlatformFiles(logger *logrus.Logger) (oci.SpecModifier, error) {
isTegra, _ := nvinfo.IsTegraSystem()
if !isTegra {
return nil, nil
}
tegraSystemMounts := discover.NewMounts(
logger,
lookup.NewFileLocator(logger, ""),
"",
[]string{
"/etc/nv_tegra_release",
"/sys/devices/soc0/family",
},
)
return NewModifierFromDiscoverer(logger, tegraSystemMounts)
}


@@ -1,2 +1,2 @@
#!/bin/sh
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" exec /usr/bin/nvidia-container-toolkit "$@"
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" exec /usr/bin/nvidia-container-runtime-hook "$@"


@@ -1,8 +1,8 @@
{
"version": "1.0.0",
"hook": {
"path": "/usr/bin/nvidia-container-toolkit",
"args": ["nvidia-container-toolkit", "prestart"],
"path": "/usr/bin/nvidia-container-runtime-hook",
"args": ["nvidia-container-runtime-hook", "prestart"],
"env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
]


@@ -3,15 +3,23 @@ Section: @SECTION@utils
Priority: optional
Maintainer: NVIDIA CORPORATION <cudatools@nvidia.com>
Standards-Version: 3.9.8
Homepage: https://github.com/NVIDIA/nvidia-container-runtime/wiki
Vcs-Git: https://github.com/NVIDIA/nvidia-container-runtime
Vcs-Browser: https://github.com/NVIDIA/nvidia-container-runtime
Homepage: https://github.com/NVIDIA/nvidia-container-toolkit
Vcs-Git: https://github.com/NVIDIA/nvidia-container-toolkit
Vcs-Browser: https://github.com/NVIDIA/nvidia-container-toolkit
Build-Depends: debhelper (>= 9)
Package: nvidia-container-toolkit
Architecture: any
Depends: ${misc:Depends}, libnvidia-container-tools (>= @LIBNVIDIA_CONTAINER_VERSION@), libnvidia-container-tools (<< 2.0.0), libseccomp2
Depends: ${misc:Depends}, nvidia-container-toolkit-base (= @VERSION@), libnvidia-container-tools (>= @LIBNVIDIA_CONTAINER_TOOLS_VERSION@), libnvidia-container-tools (<< 2.0.0), libseccomp2
Breaks: nvidia-container-runtime (<= 3.5.0-1), nvidia-container-runtime-hook
Replaces: nvidia-container-runtime (<= 3.5.0-1), nvidia-container-runtime-hook
Description: NVIDIA container runtime hook
Provides a OCI hook to enable GPU support in containers.
Description: NVIDIA Container toolkit
Provides tools and utilities to enable GPU support in containers.
Package: nvidia-container-toolkit-base
Architecture: any
Depends: ${misc:Depends}
Breaks: nvidia-container-runtime (<= 3.5.0-1), nvidia-container-runtime-hook, nvidia-container-toolkit (<= 1.10.0-1)
Replaces: nvidia-container-runtime (<= 3.5.0-1), nvidia-container-runtime-hook
Description: NVIDIA Container Toolkit Base
Provides tools such as the NVIDIA Container Runtime and NVIDIA Container Toolkit CLI to enable GPU support in containers.


@@ -0,0 +1,3 @@
config.toml /etc/nvidia-container-runtime
nvidia-container-runtime /usr/bin
nvidia-ctk /usr/bin


@@ -1,4 +1 @@
config.toml /etc/nvidia-container-runtime
nvidia-container-toolkit /usr/bin
nvidia-container-runtime /usr/bin
nvidia-ctk /usr/bin
nvidia-container-runtime-hook /usr/bin


@@ -7,9 +7,9 @@ NVIDIA_CONTAINER_TOOLKIT=/usr/bin/nvidia-container-toolkit
case "$1" in
configure)
if [ -f "${NVIDIA_CONTAINER_TOOLKIT}" ]; then
if [ ! -e "${NVIDIA_CONTAINER_RUNTIME_HOOK}" ]; then
ln -s ${NVIDIA_CONTAINER_TOOLKIT} ${NVIDIA_CONTAINER_RUNTIME_HOOK}
if [ -f "${NVIDIA_CONTAINER_RUNTIME_HOOK}" ]; then
if [ ! -e "${NVIDIA_CONTAINER_TOOLKIT}" ]; then
ln -s ${NVIDIA_CONTAINER_RUNTIME_HOOK} ${NVIDIA_CONTAINER_TOOLKIT}
fi
fi
;;


@@ -7,7 +7,7 @@ NVIDIA_CONTAINER_TOOLKIT=/usr/bin/nvidia-container-toolkit
case "$1" in
purge)
[ -L "${NVIDIA_CONTAINER_RUNTIME_HOOK}" ] && rm ${NVIDIA_CONTAINER_RUNTIME_HOOK}
[ -L "${NVIDIA_CONTAINER_TOOLKIT}" ] && rm ${NVIDIA_CONTAINER_TOOLKIT}
;;
upgrade|failed-upgrade|remove|abort-install|abort-upgrade|disappear)


@@ -3,7 +3,8 @@
set -e
sed -i "s;@SECTION@;${SECTION:+$SECTION/};g" debian/control
sed -i "s;@LIBNVIDIA_CONTAINER_VERSION@;${LIBNVIDIA_CONTAINER_VERSION:+$LIBNVIDIA_CONTAINER_VERSION};g" debian/control
sed -i "s;@LIBNVIDIA_CONTAINER_TOOLS_VERSION@;${LIBNVIDIA_CONTAINER_TOOLS_VERSION:+$LIBNVIDIA_CONTAINER_TOOLS_VERSION};g" debian/control
sed -i "s;@VERSION@;${VERSION:+$VERSION};g" debian/control
if [ -n "$DISTRIB" ]; then
sed -i "s;UNRELEASED;$DISTRIB;" debian/changelog


@@ -6,11 +6,11 @@ Group: Development Tools
Vendor: NVIDIA CORPORATION
Packager: NVIDIA CORPORATION <cudatools@nvidia.com>
Summary: NVIDIA container runtime hook
URL: https://github.com/NVIDIA/nvidia-container-runtime
Summary: NVIDIA Container Toolkit
URL: https://github.com/NVIDIA/nvidia-container-toolkit
License: Apache-2.0
Source0: nvidia-container-toolkit
Source0: nvidia-container-runtime-hook
Source1: nvidia-container-runtime
Source2: nvidia-ctk
Source3: config.toml
@@ -18,10 +18,11 @@ Source4: oci-nvidia-hook
Source5: oci-nvidia-hook.json
Source6: LICENSE
Obsoletes: nvidia-container-runtime <= 3.5.0-1, nvidia-container-runtime-hook
Obsoletes: nvidia-container-runtime <= 3.5.0-1, nvidia-container-runtime-hook <= 1.4.0-2
Provides: nvidia-container-runtime
Provides: nvidia-container-runtime-hook
Requires: libnvidia-container-tools >= %{libnvidia_container_version}, libnvidia-container-tools < 2.0.0
Requires: libnvidia-container-tools >= %{libnvidia_container_tools_version}, libnvidia-container-tools < 2.0.0
Requires: nvidia-container-toolkit-base == %{version}-%{release}
%if 0%{?suse_version}
Requires: libseccomp2
@@ -31,14 +32,14 @@ Requires: libseccomp
%endif
%description
Provides a OCI hook to enable GPU support in containers.
Provides tools and utilities to enable GPU support in containers.
%prep
cp %{SOURCE0} %{SOURCE1} %{SOURCE2} %{SOURCE3} %{SOURCE4} %{SOURCE5} %{SOURCE6} .
%install
mkdir -p %{buildroot}%{_bindir}
install -m 755 -t %{buildroot}%{_bindir} nvidia-container-toolkit
install -m 755 -t %{buildroot}%{_bindir} nvidia-container-runtime-hook
install -m 755 -t %{buildroot}%{_bindir} nvidia-container-runtime
install -m 755 -t %{buildroot}%{_bindir} nvidia-ctk
@@ -52,17 +53,14 @@ mkdir -p %{buildroot}/usr/share/containers/oci/hooks.d
install -m 644 -t %{buildroot}/usr/share/containers/oci/hooks.d oci-nvidia-hook.json
%posttrans
ln -sf %{_bindir}/nvidia-container-toolkit %{_bindir}/nvidia-container-runtime-hook
ln -sf %{_bindir}/nvidia-container-runtime-hook %{_bindir}/nvidia-container-toolkit
%postun
rm -f %{_bindir}/nvidia-container-runtime-hook
rm -f %{_bindir}/nvidia-container-toolkit
%files
%license LICENSE
%{_bindir}/nvidia-container-toolkit
%{_bindir}/nvidia-container-runtime
%{_bindir}/nvidia-ctk
%config /etc/nvidia-container-runtime/config.toml
%{_bindir}/nvidia-container-runtime-hook
/usr/libexec/oci/hooks.d/oci-nvidia-hook
/usr/share/containers/oci/hooks.d/oci-nvidia-hook.json
@@ -70,4 +68,23 @@ rm -f %{_bindir}/nvidia-container-runtime-hook
# As of 1.10.0-1 we generate the release information automatically
* %{release_date} NVIDIA CORPORATION <cudatools@nvidia.com> %{version}-%{release}
- See https://gitlab.com/nvidia/container-toolkit/container-toolkit/-/blob/%{git_commit}/CHANGELOG.md
- Bump libnvidia-container dependency to libnvidia-container-tools >= %{libnvidia_container_version}
- Bump libnvidia-container dependency to libnvidia-container-tools >= %{libnvidia_container_tools_version}
# The BASE package consists of the NVIDIA Container Runtime and the NVIDIA Container Toolkit CLI.
# This allows the package to be installed on systems where no NVIDIA Container CLI is available.
%package base
Summary: NVIDIA Container Toolkit Base
Obsoletes: nvidia-container-runtime <= 3.5.0-1, nvidia-container-runtime-hook <= 1.4.0-2
Provides: nvidia-container-runtime
# Since this package allows certain components of the NVIDIA Container Toolkit to be installed separately
# it conflicts with older versions of the nvidia-container-toolkit package that also provide these files.
Conflicts: nvidia-container-toolkit <= 1.10.0-1
%description base
Provides tools such as the NVIDIA Container Runtime and NVIDIA Container Toolkit CLI to enable GPU support in containers.
%files base
%license LICENSE
%config /etc/nvidia-container-runtime/config.toml
%{_bindir}/nvidia-container-runtime
%{_bindir}/nvidia-ctk


@@ -1,3 +1,3 @@
FROM centos:8
FROM quay.io/centos/centos:stream8
RUN yum install -y createrepo rpm-sign pinentry


@@ -25,7 +25,7 @@ function assert_usage() {
exit 1
}
set -e -x
set -e
SCRIPTS_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )"/../scripts && pwd )"
PROJECT_ROOT="$( cd ${SCRIPTS_DIR}/.. && pwd )"
@@ -52,13 +52,16 @@ ${SCRIPTS_DIR}/get-component-versions.sh
# Build libnvidia-container
make -C ${LIBNVIDIA_CONTAINER_ROOT} -f mk/docker.mk ${TARGET}
# Build nvidia-container-toolkit
make -C ${NVIDIA_CONTAINER_TOOLKIT_ROOT} ${TARGET}
if [[ -z ${NVIDIA_CONTAINER_TOOLKIT_VERSION} ]]; then
if [[ -z ${NVIDIA_CONTAINER_TOOLKIT_VERSION} || -z ${LIBNVIDIA_CONTAINER_VERSION} ]]; then
eval $(${SCRIPTS_DIR}/get-component-versions.sh)
fi
# Build nvidia-container-toolkit
make -C ${NVIDIA_CONTAINER_TOOLKIT_ROOT} \
LIBNVIDIA_CONTAINER_VERSION="${LIBNVIDIA_CONTAINER_VERSION}" \
LIBNVIDIA_CONTAINER_TAG="${LIBNVIDIA_CONTAINER_TAG}" \
${TARGET}
# We set the TOOLKIT_VERSION, TOOLKIT_TAG for the nvidia-container-runtime and nvidia-docker targets
# The LIB_TAG is also overridden to match the TOOLKIT_TAG.
# Build nvidia-container-runtime


@@ -19,7 +19,7 @@
# as well as the components included in the third_party folder.
# All required packages are generated in the specified dist folder.
set -e -x
set -e
SCRIPTS_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )"/../scripts && pwd )"
PROJECT_ROOT="$( cd ${SCRIPTS_DIR}/.. && pwd )"
@@ -37,6 +37,8 @@ all=(
centos8-x86_64
debian10-amd64
debian9-amd64
fedora35-aarch64
fedora35-x86_64
opensuse-leap15.1-x86_64
ubuntu16.04-amd64
ubuntu16.04-ppc64le
@@ -61,17 +63,26 @@ fi
eval $(${SCRIPTS_DIR}/get-component-versions.sh)
if [[ "${NVIDIA_CONTAINER_TOOLKIT_VERSION}${NVIDIA_CONTAINER_TOOLKIT_TAG:+~${NVIDIA_CONTAINER_TOOLKIT_TAG}}" != "${LIBNVIDIA_CONTAINER_VERSION}" ]]; then
if [[ -n ${NVIDIA_CONTAINER_TOOLKIT_TAG} ]]; then
echo "Allowing mismatched versions for release candidate "
: ${ALLOW_VERSION_MISMATCH:=true}
fi
if [[ "${NVIDIA_CONTAINER_TOOLKIT_PACKAGE_VERSION}" != "${LIBNVIDIA_CONTAINER_PACKAGE_VERSION}" ]]; then
set +x
echo "The libnvidia-container and nvidia-container-toolkit versions do not match."
echo "lib: '${LIBNVIDIA_CONTAINER_VERSION}'"
echo "toolkit: '${NVIDIA_CONTAINER_TOOLKIT_VERSION}${NVIDIA_CONTAINER_TOOLKIT_TAG:+~${NVIDIA_CONTAINER_TOOLKIT_TAG}}'"
echo "lib: '${LIBNVIDIA_CONTAINER_PACKAGE_VERSION}'"
echo "toolkit: '${NVIDIA_CONTAINER_TOOLKIT_PACKAGE_VERSION}'"
set -x
[[ ${ALLOW_VERSION_MISMATCH} == "true" ]] || exit 1
echo "Continuing with mismatched version"
fi
export NVIDIA_CONTAINER_TOOLKIT_VERSION
export NVIDIA_CONTAINER_TOOLKIT_TAG
export LIBNVIDIA_CONTAINER_VERSION
export LIBNVIDIA_CONTAINER_TAG
export NVIDIA_CONTAINER_RUNTIME_VERSION
export NVIDIA_DOCKER_VERSION
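The reworked release check compares the `*_PACKAGE_VERSION` values. For release candidates (a non-empty `NVIDIA_CONTAINER_TOOLKIT_TAG`), it defaults `ALLOW_VERSION_MISMATCH` to `true` unless the variable was already set.

Below is a sketch of that decision as a pure function. `checkVersions` is a hypothetical name, and the environment values are passed in as strings, with `""` standing for unset.

```go
package main

import "fmt"

// checkVersions restates the updated mismatch handling. allowMismatch is the
// current value of ALLOW_VERSION_MISMATCH ("" when unset).
func checkVersions(toolkitPkg, libPkg, toolkitTag, allowMismatch string) error {
	// `: ${ALLOW_VERSION_MISMATCH:=true}` only runs for release candidates and
	// only assigns when the variable is unset or empty.
	if toolkitTag != "" && allowMismatch == "" {
		allowMismatch = "true"
	}
	if toolkitPkg == libPkg {
		return nil
	}
	if allowMismatch != "true" {
		return fmt.Errorf("versions do not match: lib %q, toolkit %q", libPkg, toolkitPkg)
	}
	return nil // "Continuing with mismatched version"
}

func main() {
	fmt.Println(checkVersions("1.11.0", "1.10.0", "", "")) // versions do not match: lib "1.10.0", toolkit "1.11.0"
}
```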


@@ -36,6 +36,9 @@ NVIDIA_DOCKER_ROOT=${PROJECT_ROOT}/third_party/nvidia-docker
# Get version for libnvidia-container
libnvidia_container_version_tag=$(grep "#define NVC_VERSION" ${LIBNVIDIA_CONTAINER_ROOT}/src/nvc.h \
| sed -e 's/#define NVC_VERSION[[:space:]]"\(.*\)"/\1/')
libnvidia_container_version=${libnvidia_container_version_tag%%~*}
libnvidia_container_tag=${libnvidia_container_version_tag##${libnvidia_container_version}}
libnvidia_container_tag=${libnvidia_container_tag##\~}
versions_makefile=${NVIDIA_CONTAINER_TOOLKIT_ROOT}/versions.mk
# Get version for nvidia-container-toolkit
@@ -53,7 +56,8 @@ nvidia_docker_version=$(grep -m 1 "^NVIDIA_DOCKER_VERSION := " ${versions_makefi
nvidia_docker_tag=${nvidia_container_toolkit_tag}
nvidia_docker_version_tag="${nvidia_docker_version}${nvidia_docker_tag:+~${nvidia_docker_tag}}"
echo "LIBNVIDIA_CONTAINER_VERSION=${libnvidia_container_version_tag}"
echo "LIBNVIDIA_CONTAINER_VERSION=${libnvidia_container_version}"
echo "LIBNVIDIA_CONTAINER_TAG=${libnvidia_container_tag}"
echo "LIBNVIDIA_CONTAINER_PACKAGE_VERSION=${libnvidia_container_version_tag//\~/-}"
echo "NVIDIA_CONTAINER_TOOLKIT_VERSION=${nvidia_container_toolkit_version}"
echo "NVIDIA_CONTAINER_TOOLKIT_TAG=${nvidia_container_toolkit_tag}"
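`get-component-versions.sh` now does two things with the `NVC_VERSION` string from `nvc.h`:

- splits it into a version and a tag at the first `~`;
- derives a package version by replacing every `~` with `-`.

The sketch below restates those parameter expansions in Go. `splitVersionTag` is a hypothetical helper, and `1.11.0~rc.3` is an illustrative input rather than a value read from the header.

```go
package main

import (
	"fmt"
	"strings"
)

// splitVersionTag mirrors the shell expansions:
//
//	version=${v%%~*}                      -> everything before the first "~"
//	tag=${v##${version}}; tag=${tag##\~}  -> everything after that "~"
//	${v//\~/-}                            -> every "~" replaced by "-"
func splitVersionTag(v string) (version, tag, pkgVersion string) {
	version, tag, _ = strings.Cut(v, "~")
	pkgVersion = strings.ReplaceAll(v, "~", "-")
	return version, tag, pkgVersion
}

func main() {
	fmt.Println(splitVersionTag("1.11.0~rc.3")) // 1.11.0 rc.3 1.11.0-rc.3
}
```

For a final release without a `~`, the tag is empty and the package version equals the version.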


@@ -61,6 +61,8 @@ function sign() {
;;
debian*) pkg_type=deb
;;
fedora*) pkg_type=rpm
;;
opensuse-leap*) pkg_type=rpm
;;
ubuntu*) pkg_type=deb


@@ -94,6 +94,8 @@ function sync() {
;;
debian*) pkg_type=deb
;;
fedora*) pkg_type=rpm
;;
opensuse-leap*) pkg_type=rpm
;;
ubuntu*) pkg_type=deb
@@ -116,8 +118,19 @@ function sync() {
return
fi
mkdir -p ${dst}
cp ${src}/libnvidia-container*.${pkg_type} ${dst}
cp ${src}/nvidia-container-toolkit*.${pkg_type} ${dst}
for f in $(ls ${src}/libnvidia-container*.${pkg_type} ${src}/nvidia-container-toolkit*.${pkg_type}); do
df=${dst}/$(basename ${f})
df_stable=${df//"/experimental/"/"/stable/"}
if [[ -f "${df}" ]]; then
echo "${df} already exists; skipping"
elif [[ ${REPO} == "experimental" && -f ${df_stable} ]]; then
echo "${df_stable} already exists; skipping"
else
cp ${f} ${df}
fi
done
if [[ ${REPO} == "stable" ]]; then
cp ${src}/nvidia-container-runtime*.${pkg_type} ${dst}
cp ${src}/nvidia-docker*.${pkg_type} ${dst}
@@ -137,6 +150,8 @@ all=(
centos8-x86_64
debian10-amd64
debian9-amd64
fedora35-aarch64
fedora35-x86_64
opensuse-leap15.1-x86_64
ubuntu16.04-amd64
ubuntu16.04-ppc64le


@@ -43,8 +43,8 @@ testing::containerd::toolkit::run() {
# Ensure that we can run some non GPU containers from within dind
with_retry 3 5s testing::containerd::dind::exec " \
ctr --address=${containerd_dind_containerd_dir}/containerd.sock image pull nvcr.io/nvidia/cuda:11.1-base; \
ctr --address=${containerd_dind_containerd_dir}/containerd.sock run --rm --runtime=io.containerd.runtime.v1.linux nvcr.io/nvidia/cuda:11.1-base cuda echo foo"
ctr --address=${containerd_dind_containerd_dir}/containerd.sock image pull nvcr.io/nvidia/cuda:11.1.1-base-ubuntu20.04; \
ctr --address=${containerd_dind_containerd_dir}/containerd.sock run --rm --runtime=io.containerd.runtime.v1.linux nvcr.io/nvidia/cuda:11.1.1-base-ubuntu20.04 cuda echo foo"
# Share the volumes so that we can edit the config file and point to the new runtime
# Share the pid so that we can ask docker to reload its config
@@ -63,8 +63,8 @@ testing::containerd::toolkit::run() {
# Ensure that we haven't broken non GPU containers
with_retry 3 5s testing::containerd::dind::exec " \
ctr --address=${containerd_dind_containerd_dir}/containerd.sock image pull nvcr.io/nvidia/cuda:11.1-base; \
ctr --address=${containerd_dind_containerd_dir}/containerd.sock run --rm --runtime=io.containerd.runtime.v1.linux nvcr.io/nvidia/cuda:11.1-base cuda echo foo"
ctr --address=${containerd_dind_containerd_dir}/containerd.sock image pull nvcr.io/nvidia/cuda:11.1.1-base-ubuntu20.04; \
ctr --address=${containerd_dind_containerd_dir}/containerd.sock run --rm --runtime=io.containerd.runtime.v1.linux nvcr.io/nvidia/cuda:11.1.1-base-ubuntu20.04 cuda echo foo"
}
# This test runs containerd setup and containerd cleanup in succession to ensure that the


@@ -23,7 +23,7 @@ testing::toolkit::install() {
READLINK="greadlink"
fi
testing::docker_run::toolkit::shell 'toolkit install /usr/local/nvidia/toolkit'
testing::docker_run::toolkit::shell 'toolkit install --toolkit-root=/usr/local/nvidia/toolkit'
docker run --rm -v "${shared_dir}:/work" alpine sh -c "chown -R ${uid}:${gid} /work/"
# Ensure toolkit dir is correctly setup
@@ -35,14 +35,15 @@ testing::toolkit::install() {
test -e "$(${READLINK} -f "${shared_dir}/usr/local/nvidia/toolkit/libnvidia-container-go.so.1")"
test -e "${shared_dir}/usr/local/nvidia/toolkit/nvidia-container-cli"
test -e "${shared_dir}/usr/local/nvidia/toolkit/nvidia-container-toolkit"
test -e "${shared_dir}/usr/local/nvidia/toolkit/nvidia-container-runtime-hook"
test -L "${shared_dir}/usr/local/nvidia/toolkit/nvidia-container-toolkit"
test -e "${shared_dir}/usr/local/nvidia/toolkit/nvidia-container-runtime"
grep -q -E "nvidia driver modules are not yet loaded, invoking runc directly" "${shared_dir}/usr/local/nvidia/toolkit/nvidia-container-runtime"
grep -q -E "exec runc \".@\"" "${shared_dir}/usr/local/nvidia/toolkit/nvidia-container-runtime"
test -e "${shared_dir}/usr/local/nvidia/toolkit/nvidia-container-cli.real"
test -e "${shared_dir}/usr/local/nvidia/toolkit/nvidia-container-toolkit.real"
test -e "${shared_dir}/usr/local/nvidia/toolkit/nvidia-container-runtime-hook.real"
test -e "${shared_dir}/usr/local/nvidia/toolkit/nvidia-container-runtime.real"
test -e "${shared_dir}/usr/local/nvidia/toolkit/nvidia-container-runtime.experimental"
@@ -65,7 +66,7 @@ testing::toolkit::install() {
testing::toolkit::delete() {
testing::docker_run::toolkit::shell 'mkdir -p /usr/local/nvidia/delete-toolkit'
testing::docker_run::toolkit::shell 'touch /usr/local/nvidia/delete-toolkit/test.file'
testing::docker_run::toolkit::shell 'toolkit delete /usr/local/nvidia/delete-toolkit'
testing::docker_run::toolkit::shell 'toolkit delete --toolkit-root=/usr/local/nvidia/delete-toolkit'
test ! -z "$(ls -A "${shared_dir}/usr/local/nvidia")"
test ! -e "${shared_dir}/usr/local/nvidia/delete-toolkit"

test/output/.gitkeep

@@ -14,7 +14,7 @@
WORKFLOW ?= nvidia-docker
DISTRIBUTIONS := ubuntu18.04 centos8
DISTRIBUTIONS := ubuntu18.04 centos8 fedora35
IMAGE_TARGETS := $(patsubst %,image-%, $(DISTRIBUTIONS))
RUN_TARGETS := $(patsubst %,run-%, $(DISTRIBUTIONS))
@@ -28,7 +28,6 @@ image-%: DOCKERFILE = docker/$(*)/Dockerfile
images: $(IMAGE_TARGETS)
$(IMAGE_TARGETS): image-%: $(DOCKERFILE)
docker build ${PLATFORM_ARGS} \
--build-arg WORKFLOW="$(WORKFLOW)" \
-t nvidia-container-toolkit-repo-test:$(*) \
-f $(DOCKERFILE) \
$(shell dirname $(DOCKERFILE))
@@ -36,6 +35,7 @@ $(IMAGE_TARGETS): image-%: $(DOCKERFILE)
%-ubuntu18.04: ARCH ?= amd64
%-centos8: ARCH ?= x86_64
%-fedora35: ARCH ?= x86_64
PLATFORM_ARGS = --platform=linux/${ARCH}


@@ -1,16 +1,6 @@
ARG BASEIMAGE=centos:8
ARG BASEIMAGE=quay.io/centos/centos:stream8
FROM ${BASEIMAGE}
ARG BASEIMAGE
# See https://www.centos.org/centos-linux-eol/
# and https://stackoverflow.com/a/70930049 for move to vault.centos.org
# and https://serverfault.com/questions/1093922/failing-to-run-yum-update-in-centos-8 for move to vault.epel.cloud
RUN [[ "${BASEIMAGE}" != "centos:8" ]] || \
( \
sed -i 's/mirrorlist/#mirrorlist/g' /etc/yum.repos.d/CentOS-Linux-* && \
sed -i 's|#baseurl=http://mirror.centos.org|baseurl=http://vault.epel.cloud|g' /etc/yum.repos.d/CentOS-Linux-* \
)
RUN yum install -y \
yum-utils \
ruby-devel \
@@ -35,9 +25,8 @@ RUN fpm -s empty \
rm -f /tmp/docker.rpm
ARG WORKFLOW=nvidia-docker
RUN curl -s -L https://nvidia.github.io/${WORKFLOW}/centos8/nvidia-docker.repo \
| tee /etc/yum.repos.d/nvidia-docker.repo
RUN curl -s -L https://nvidia.github.io/libnvidia-container/centos8/libnvidia-container.repo \
| tee /etc/yum.repos.d/nvidia-container-toolkit.repo
COPY entrypoint.sh /
COPY install_repo.sh /


@@ -21,5 +21,5 @@
test_repo=$1
echo "Setting up TEST repo: ${test_repo}"
sed -i -e "s#nvidia\.github\.io/libnvidia-container#${test_repo}/libnvidia-container#g" /etc/yum.repos.d/nvidia-docker.repo
sed -i -e "s#nvidia\.github\.io/libnvidia-container#${test_repo}/libnvidia-container#g" /etc/yum.repos.d/nvidia-container-toolkit.repo
yum-config-manager --enable libnvidia-container-experimental


@@ -0,0 +1,34 @@
ARG BASEIMAGE=fedora:35
FROM ${BASEIMAGE}
RUN yum install -y \
yum-utils \
ruby-devel \
gcc \
make \
rpm-build \
rubygems \
createrepo
RUN gem install --no-document fpm
# We create and install a dummy docker package since these dependencies are out of
# scope for the tests performed here.
RUN fpm -s empty \
-t rpm \
--description "A dummy package for docker-ce_18.06.3.ce-3.el7" \
-n docker-ce --version 18.06.3.ce-3.el7 \
-p /tmp/docker.rpm \
&& \
yum localinstall -y /tmp/docker.rpm \
&& \
rm -f /tmp/docker.rpm
RUN curl -s -L https://nvidia.github.io/libnvidia-container/fedora35/nvidia-container-toolkit.repo \
| tee /etc/yum.repos.d/nvidia-container-toolkit.repo
COPY entrypoint.sh /
COPY install_repo.sh /
ENTRYPOINT [ "/entrypoint.sh" ]

Some files were not shown because too many files have changed in this diff.