Commit Graph

1888 Commits

Author SHA1 Message Date
Kevin Klues
20604621e4 Add 'compute' capability to list of defaults.
For most practical purposes, it should be fine to set
NVIDIA_DRIVER_CAPABILITIES=all nowadays.

Historically, these different capabilities exist because they were added
incrementally, with varying degrees of stability. It's fairly common to
run with GPUs in containers today, but a few years ago the driver didn't
support them very well, and it was important to make sure the libraries
being injected into the container actually worked in a containerized
environment. When they didn't, it was common to get information leaks,
crashes, or even silent failures.

In the past, whenever a new set of libraries was being vetted for
injected, a new capability was added to make sure that users had control
to explicitly include only those libraries they were comfortable having
injected into their containers.

The idea being that whoever puts together a container image for use with
GPUs should have the knowledge of what capabilities the software in that
container image requires, and can set the NVIDIA_DRIVER_CAPABILITIES
envvar in that image appropriately.

After some back and forth, we've decided it doesn't quite make sense to
set it to "all" just yet, but we should set it to "utility, compute"
instead of just "utility", so that at least the core CUDA libraries work
by default (once installed in the container).

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-12-07 12:10:23 +00:00
Kevin Klues
8cfb3c29f6 Merge branch 'upstream-bump-v1.3.0' into 'master'
Bump to version 1.3.0

See merge request nvidia/container-toolkit/container-toolkit!22
2020-09-16 13:34:37 +00:00
Kevin Klues
98e202d0d8 Bump to version 1.3.0
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-09-16 11:45:31 +00:00
Kevin Klues
26668097c4 Merge branch 'upstream-bump-1.3.0-rc.2' into 'master'
Bump to version 1.3.0 rc.2

See merge request nvidia/container-toolkit/container-toolkit!21
2020-08-10 15:33:25 +00:00
Kevin Klues
caf2792463 Update changelogs for 1.3.0-rc.2
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-08-10 13:08:17 +00:00
Kevin Klues
b2be0b08ac Bump version to 1.3.0-rc.2
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-08-10 13:03:00 +00:00
Kevin Klues
edc5041636 Merge branch 'upstream-update-devices-from-volume-mounts-semantics' into 'master'
Refactor accepting device lists from volume mounts as a boolean

See merge request nvidia/container-toolkit/container-toolkit!20
2020-08-07 18:40:56 +00:00
Kevin Klues
2c1809475c Add more tests for new semantics with device list from volume mounts
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-08-07 16:30:31 +00:00
Kevin Klues
7c00385797 Refactor accepting device lists from volume mounts as a boolean
Also hard code the "root" path where these volume mounts will be looked
for rather than making it configurable.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-08-07 16:30:19 +00:00
Kevin Klues
322006c361 Merge branch 'upstream-bump-1.3.0-rc.1' into 'master'
Bump version to 1.3.0-rc.1

See merge request nvidia/container-toolkit/container-toolkit!19
2020-07-24 20:36:38 +00:00
Kevin Klues
a25017fb8a Merge branch 'upstream-build-prerelease' into 'master'
Update build system to accept a TAG variable for things like rc.x

See merge request nvidia/container-toolkit/container-toolkit!18
2020-07-24 20:22:00 +00:00
Kevin Klues
928905ce94 Update changelogs for 1.3.0-rc.1
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-24 20:10:42 +00:00
Kevin Klues
7ed17bb9ca Bump version to 1.3.0-rc.1
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-24 20:03:48 +00:00
Kevin Klues
b50d86c174 Update build system to accept a TAG variable for things like rc.x
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-24 19:54:29 +00:00
Kevin Klues
bf342fb4c9 Merge branch 'upstream-fix-ci' into 'master'
Generalize CI variables

See merge request nvidia/container-toolkit/container-toolkit!17
2020-07-24 14:28:49 +00:00
Kevin Klues
1791372f22 Generalize CI variables
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-24 14:01:39 +00:00
Kevin Klues
4448319605 Merge branch 'upstream-add-alternate-device-list' into 'master'
Add the ability to pull the device list from mounted files instead of just Envvars

See merge request nvidia/container-toolkit/container-toolkit!15
2020-07-24 13:18:53 +00:00
Kevin Klues
2ea3150b60 Merge branch 'upstream-simplify-nvidia-config-generation' into 'master'
Simplify logic for `nvidiaConfig` generation

See merge request nvidia/container-toolkit/container-toolkit!14
2020-07-24 13:18:35 +00:00
Kevin Klues
32b4b09bc9 Add tests to verify priority of device list from mounts vs. envvar
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-24 12:50:05 +00:00
Kevin Klues
cc0a22a6d9 Consolidate logic for building nvidiaConfig into a single function
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-24 12:50:05 +00:00
Kevin Klues
e48d23d107 Add test for getDevicesFromMounts()
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-24 12:50:05 +00:00
Kevin Klues
430dda41e9 Remove getNvidiaConfigLegacy() function
A subsequent commit will add equivalent functionality back in

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-24 12:50:05 +00:00
Kevin Klues
8bcd02ee5d Add logic implementing getDevicesFromMounts()
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-24 12:50:05 +00:00
Kevin Klues
4791fab747 Simplify getMigConfigDevices() and getMigMonitorDevices()
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-24 12:50:05 +00:00
Kevin Klues
7313069d4c Update getDevices() to account for getting the devices list from mounts
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-24 12:50:05 +00:00
Kevin Klues
a24b0c8b4e Split isLegacyCUDAImage() into its own helper function
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-24 12:50:05 +00:00
Kevin Klues
f46d1861d3 Add stub implementation for getDevicesFromMounts()
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-24 12:50:05 +00:00
Kevin Klues
0a9dc3c653 Add test to make sure that getNvidiaConfig() operates as expected
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-24 12:50:05 +00:00
Kevin Klues
889ebae1fe Pull logic to get the device list from ENVVARs out to its own function
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-24 12:50:05 +00:00
Kevin Klues
e4b9318de3 Only run gofmt over go files under pkg/ in CI
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-24 12:50:05 +00:00
Kevin Klues
aec9a28bc3 Push HookConfig and privileged flags down to getDevices() call
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-24 12:50:05 +00:00
Kevin Klues
2ae7cb07cf Add ability to consider container mounts to generate nvidiaConfig
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-24 12:50:05 +00:00
Kevin Klues
da36874e91 Add new config options to pull device list from mounted files not ENVVAR
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-24 12:50:05 +00:00
Kevin Klues
b9ef2db205 Remove unnecessary files from version control
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-24 12:50:05 +00:00
Kevin Klues
da6fbb343a Revert "Add ability to merge envars of the form NVIDIA_VISIBLE_DEVICES_*"
This reverts commit 01b4381282.
2020-07-24 12:50:05 +00:00
Kevin Klues
647a805341 Merge branch 'upstream-add-ci-tests' into 'master'
Add common CI tests for things like golint, gofmt, unit tests, etc.

See merge request nvidia/container-toolkit/container-toolkit!16
2020-07-24 12:39:45 +00:00
Kevin Klues
fe65573bdf Add common CI tests for things like golint, gofmt, unit tests, etc
This commit also fixes the minor issues uncovered while running these
tests locally.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-24 12:14:26 +00:00
Kevin Klues
a7fb33301c Flip build-all targets to run automatically on merge requests
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-24 12:14:26 +00:00
Kevin Klues
8b248b6631 Rename github.com/NVIDIA/container-toolkit to nvidia-container-toolkit
The repo name on github recently changed, so all references here should
as well.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-24 11:40:45 +00:00
Kevin Klues
d10144b3b1 Merge branch 'upstream-add-ngx-all-driver-caps' into 'master'
Add 'ngx' to list of *all* driver capabilities -- Prepare patch release for 1.2.1

See merge request nvidia/container-toolkit/container-toolkit!13
2020-07-22 15:21:11 +00:00
Kevin Klues
ba9758c7ff Update changelogs for 1.2.1
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-22 13:41:44 +00:00
Kevin Klues
d467b87ef9 Bump version to 1.2.1
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-22 13:39:31 +00:00
Kevin Klues
2f4af74320 List config.toml as a config file in the RPM SPEC
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-22 13:39:22 +00:00
Kevin Klues
4e6e0ed4f1 Add 'ngx' to list of *all* driver capabilities
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-22 13:29:39 +00:00
Kevin Klues
7ec9e84369 Merge branch 'upstream-bump-v1.2.0' into 'master'
Bump to version 1.2.0

See merge request nvidia/container-toolkit/container-toolkit!12
2020-07-08 20:29:41 +00:00
Kevin Klues
023af3729f Update changelogs for 1.2.0
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-08 18:11:44 +00:00
Kevin Klues
a63bef2281 Bump version to 1.2.0
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-08 16:22:01 +00:00
Kevin Klues
320bb6e4dc Update dependence on libnvidia-container to 1.2.0
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-08 16:22:01 +00:00
Kevin Klues
8e0aab4607 Fix repo listed in changelog for debian distributions
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2020-07-08 16:22:01 +00:00
Kevin Klues
ad7d3dda83 Merge branch 'upstream-add-ngx' into 'master'
Add the 'ngx' driver capability

See merge request nvidia/container-toolkit/container-toolkit!11
2020-06-24 18:35:52 +00:00