nvidia-container-toolkit

mirror of https://github.com/NVIDIA/nvidia-container-toolkit synced 2025-06-26 18:18:24 +00:00

Author	SHA1	Message	Date
Evan Lezar	f677245d60	Merge branch 'fix-multiple-driver-roots-wsl' into 'main' Fix bug with multiple driver store paths See merge request nvidia/container-toolkit/container-toolkit!425	2023-06-27 16:59:33 +02:00
Evan Lezar	9d31bd4cc3	Merge branch 'fix-cdi-permissions' into 'main' Properly set spec permissions See merge request nvidia/container-toolkit/container-toolkit!383	2023-06-27 16:27:28 +02:00
Carlos Eduardo Arango Gutierrez	b063fa40b1	Merge branch 'fix-cdi-spec-permissions' into 'main' Generate CDI specifications with 644 permissions to allow non-root clients to consume them See merge request nvidia/container-toolkit/container-toolkit!381	2023-06-27 16:27:27 +02:00
Evan Lezar	6a83e2ebe5	Add nvidia-ctk cdi transform root command Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-03-28 11:45:58 -07:00
Evan Lezar	2abe679dd1	Move libcuda locator to internal/lookup package Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-03-26 17:04:06 +02:00
Evan Lezar	226c54613e	Also return an error from nvcdi.New This change allows nvcdi.New to return an error in addition to the constructed library instead of panicking. Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-03-26 16:13:12 +02:00
Evan Lezar	33f6fe0217	Generate a simplified CDI spec by default A simplified CDI spec has no duplicate entities in any single set of container edits. Furthermore, container edits defined at a spec-level are not included in the container edits for a device. Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-03-24 11:01:46 +02:00
Evan Lezar	5ff206e1a9	Add transform to deduplicate entities in CDI spec Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-03-24 11:01:23 +02:00
Evan Lezar	9506bd9da0	Fix generation of management CDI spec in containers Since we relied on finding libcuda.so in the LDCache to determine both the CUDA version and the expected directory for the driver libraries, the generation of the management CDI specifications fails in containers where the LDCache has not been updated. This change falls back to searching a set of predefined paths instead when the lookup of libcuda.so in the cache fails. Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-03-23 15:59:01 +02:00
Evan Lezar	685802b1ce	Only init nvml as required when generating CDI specs CDI generation modes such as management and wsl don't require NVML. This change removes the top-level instantiation of nvmllib and replaces it with an instantiation in the nvml CDI spec generation code. Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-03-20 14:24:08 +02:00
Evan Lezar	868393b7ed	Add mofed mode to nvcdi API Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-03-07 18:47:52 +02:00
Evan Lezar	ebe18fbb7f	Add gds mode to nvcdi API Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-03-07 18:47:52 +02:00
Evan Lezar	29cbbe83f9	Add management mode to CDI spec generation API These changes add support for generating a management spec to the nvcdi API. A management spec consists of a single CDI device (`all`) which includes all expected NVIDIA device nodes, driver libraries, binaries, and IPC sockets. Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-03-06 10:53:43 +02:00
Evan Lezar	314059fcf0	Move path manipulation to spec.Save Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-03-01 13:49:04 +02:00
Evan Lezar	9f5e141437	Expose vendor and class as options Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-03-01 13:48:28 +02:00
Evan Lezar	8be6de177f	Move formatJSON and formatYAML to nvcdi/spec package Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-03-01 13:48:28 +02:00
Evan Lezar	89321edae6	Add top-level GetSpec function to nvcdi API Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-03-01 13:48:28 +02:00
Evan Lezar	6d6cd56196	Return nvcdi.spec.Interface from GetSpec Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-03-01 12:45:30 +02:00
Evan Lezar	2e95e04359	Add nvcdi.spec package Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-03-01 12:45:30 +02:00
Evan Lezar	accba4ead5	Merge branch 'CNT-3965/clean-up-by-path-symlinks' into 'main' Improve handling of /dev/dri devices and nested device paths See merge request nvidia/container-toolkit/container-toolkit!307	2023-03-01 10:25:48 +00:00
Christopher Desiniotis	87e406eee6	Update root transformer tests to ensure container path is not modified Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>	2023-02-28 09:00:05 -08:00
Christopher Desiniotis	45ed3b0412	Handle hook arguments for creation of symlinks Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>	2023-02-28 09:00:02 -08:00
Christopher Desiniotis	0516fc96ca	Add Transform interface and initial implementation for a root transform Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>	2023-02-28 08:56:13 -08:00
Evan Lezar	b4dc1f338d	Generate nested device folder permission hooks per device This change generates device folder permission hooks per device instead of at a spec level. This ensures that the hook is not injected for a device that does not have any nested device nodes. Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-02-22 17:16:23 +02:00
Evan Lezar	181128fe73	Only include by-path-symlinks for injected device nodes Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-02-22 16:53:04 +02:00
Evan Lezar	2680c45811	Add mode constants to nvcdi Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-02-20 16:33:51 +02:00
Evan Lezar	b76808dbd5	Add tests for CDI mode resolution Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-02-20 16:33:33 +02:00
Evan Lezar	4ccb0b9a53	Add and resolve auto discovery mode for cdi generation Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-02-20 14:49:58 +02:00
Evan Lezar	b21dc929ef	Add WSL2 discovery and spec generation These changes add a wsl discovery mode to the nvidia-ctk cdi generate command. If wsl mode is enabled, the driver store for the available devices is used as the source for discovered entities. Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-02-20 10:30:13 +02:00
Evan Lezar	d226925fe7	Construct nvml-based CDI lib based on mode Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-02-20 10:30:13 +02:00
Evan Lezar	5103adab89	Add mode option to nvcdi API Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-02-20 10:30:13 +02:00
Kevin Klues	5710b9e7e8	Add globbing for mounting multiple GSP firmware files Newer drivers have split the GSP firmware into multiple files so a simple match against gsp.bin in the firmware directory is no longer possible. This patch adds globbing capabilities to match any GSP firmware files of the form gsp*.bin and mount them all into the container. Signed-off-by: Kevin Klues <kklues@nvidia.com>	2023-02-16 11:53:36 +00:00
Christopher Desiniotis	a52c9f0ac6	fix: apply options when constructing an instance of the nvcdi library Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>	2023-02-14 16:32:40 -08:00
Evan Lezar	5b110fba2d	Add nvcdi package with basic CDI generation API This change adds an nvcdi package that exposes a basic API for CDI spec generation. This is used from the nvidia-ctk cdi generate command and can be consumed by DRA implementations and the device plugin. Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-02-14 19:52:31 +01:00
Evan Lezar	f72b79cc2a	Move pkg to cmd/nvidia-container-toolkit This change moves the pkg folder to `cmd/nvidia-container-toolkit` to better match go best practices. This allows, for example, for the `cmd/nvidia-container-toolkit` to be go installed. The only package included in `pkg` was `main`. Signed-off-by: Evan Lezar <elezar@nvidia.com>	2021-06-08 15:20:59 +02:00
Evan Lezar	2a92d6acb7	Fix bug where docker swarm device selection is overridden by NVIDIA_VISIBLE_DEVICES This change fixes a bug where the value of NVIDIA_VISIBLE_DEVICES would be used to select devices even if the `swarm-resource` config option is specified. Note that this does not change the value of NVIDIA_VISIBLE_DEVICES in the container. Signed-off-by: Evan Lezar <elezar@nvidia.com>	2021-06-07 14:10:08 +02:00
Evan Lezar	602eaf0e60	Use require package for tests Signed-off-by: Evan Lezar <elezar@nvidia.com>	2021-06-07 13:31:41 +02:00
Evan Lezar	fc408a32c7	Add utility function to get config name from struct Signed-off-by: Evan Lezar <elezar@nvidia.com>	2021-01-22 16:08:45 +01:00
Evan Lezar	f6b1b1afad	Ignore NVIDIA_VISIBLE_DEVICES for containers with insufficient privileges This change ignores the value of NVIDIA_VISIBLE_DEVICES instead of raising an error when launching a container with insufficient permissions. This changes the behaviour under the following conditions: NVIDIA_VISIBLE_DEVICES is set and accept-nvidia-visible-devices-envvar-when-unprivileged = false (default: true) or privileged = false (default: false) This means that a user need not explicitly clear the NVIDIA_VISIBLE_DEVICES environment variable if no GPUs are to be used in unprivileged containers. Note that this envvar is set to 'all' by default in many CUDA images that are used as base images. Signed-off-by: Evan Lezar <elezar@nvidia.com>	2021-01-22 15:34:52 +01:00
Kevin Klues	20604621e4	Add 'compute' capability to list of defaults. For most practical purposes, it should be fine to set NVIDIA_DRIVER_CAPABILITIES=all nowadays. Historically, these different capabilities exist because they were added incrementally, with varying degrees of stability. It's fairly common to run with GPUs in containers today, but a few years ago the driver didn't support them very well, and it was important to make sure the libraries being injected into the container actually worked in a containerized environment. When they didn't, it was common to get information leaks, crashes, or even silent failures. In the past, whenever a new set of libraries was being vetted for injection, a new capability was added to make sure that users had control to explicitly include only those libraries they were comfortable having injected into their containers. The idea being that whoever puts together a container image for use with GPUs should have the knowledge of what capabilities the software in that container image requires, and can set the NVIDIA_DRIVER_CAPABILITIES envvar in that image appropriately. After some back and forth, we've decided it doesn't quite make sense to set it to "all" just yet, but we should set it to "utility, compute" instead of just "utility", so that at least the core CUDA libraries work by default (once installed in the container). Signed-off-by: Kevin Klues <kklues@nvidia.com>	2020-12-07 12:10:23 +00:00
Kevin Klues	2c1809475c	Add more tests for new semantics with device list from volume mounts Signed-off-by: Kevin Klues <kklues@nvidia.com>	2020-08-07 16:30:31 +00:00
Kevin Klues	7c00385797	Refactor accepting device lists from volume mounts as a boolean Also hard code the "root" path where these volume mounts will be looked for rather than making it configurable. Signed-off-by: Kevin Klues <kklues@nvidia.com>	2020-08-07 16:30:19 +00:00
Kevin Klues	32b4b09bc9	Add tests to verify priority of device list from mounts vs. envvar Signed-off-by: Kevin Klues <kklues@nvidia.com>	2020-07-24 12:50:05 +00:00
Kevin Klues	e48d23d107	Add test for getDevicesFromMounts() Signed-off-by: Kevin Klues <kklues@nvidia.com>	2020-07-24 12:50:05 +00:00
Kevin Klues	8bcd02ee5d	Add logic implementing getDevicesFromMounts() Signed-off-by: Kevin Klues <kklues@nvidia.com>	2020-07-24 12:50:05 +00:00
Kevin Klues	7313069d4c	Update getDevices() to account for getting the devices list from mounts Signed-off-by: Kevin Klues <kklues@nvidia.com>	2020-07-24 12:50:05 +00:00
Kevin Klues	f46d1861d3	Add stub implementation for getDevicesFromMounts() Signed-off-by: Kevin Klues <kklues@nvidia.com>	2020-07-24 12:50:05 +00:00
Kevin Klues	889ebae1fe	Pull logic to get the device list from ENVVARs out to its own function Signed-off-by: Kevin Klues <kklues@nvidia.com>	2020-07-24 12:50:05 +00:00
Kevin Klues	aec9a28bc3	Push HookConfig and privileged flags down to getDevices() call Signed-off-by: Kevin Klues <kklues@nvidia.com>	2020-07-24 12:50:05 +00:00
Kevin Klues	2ae7cb07cf	Add ability to consider container mounts to generate nvidiaConfig Signed-off-by: Kevin Klues <kklues@nvidia.com>	2020-07-24 12:50:05 +00:00


67 Commits
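Commits 2a92d6acb7 and f6b1b1afad together decide which environment variable supplies the device list. The sketch below is one reading of those two messages, not the toolkit's implementation: the function name `selectDeviceEnv`, its parameters, the assumption that `swarm-resource` names a list of envvars checked in order, and the example envvar `DOCKER_RESOURCE_GPU` are all hypothetical.

```go
package main

import "fmt"

// selectDeviceEnv is a hypothetical helper. It returns the device list taken
// from the environment and whether that list should be used.
func selectDeviceEnv(env map[string]string, swarmResourceEnvvars []string, privileged, acceptUnprivileged bool) (string, bool) {
	// Reading of 2a92d6acb7: when swarm-resource is configured, only its
	// envvars are consulted and NVIDIA_VISIBLE_DEVICES is not used for selection.
	candidates := []string{"NVIDIA_VISIBLE_DEVICES"}
	if len(swarmResourceEnvvars) > 0 {
		candidates = swarmResourceEnvvars
	}
	for _, name := range candidates {
		devices, ok := env[name]
		if !ok {
			continue
		}
		// Reading of f6b1b1afad: in an unprivileged container the envvar is
		// ignored (rather than raising an error) unless
		// accept-nvidia-visible-devices-envvar-when-unprivileged is enabled.
		if !privileged && !acceptUnprivileged {
			return "", false
		}
		return devices, true
	}
	return "", false
}

func main() {
	env := map[string]string{"NVIDIA_VISIBLE_DEVICES": "all", "DOCKER_RESOURCE_GPU": "0"}
	fmt.Println(selectDeviceEnv(env, []string{"DOCKER_RESOURCE_GPU"}, false, true))
	fmt.Println(selectDeviceEnv(env, nil, false, false))
}
```

Returning a "not found" result instead of an error mirrors the commit's point that users need not clear `NVIDIA_VISIBLE_DEVICES`, which many CUDA base images set to `all`.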
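Commit 20604621e4 changes the default driver capabilities from `utility` to `utility,compute`. A hedged sketch of applying such a default when `NVIDIA_DRIVER_CAPABILITIES` is not provided; the helper name `driverCapabilities` and the choice to treat an empty value like an unset one are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultDriverCapabilities mirrors the default described in commit 20604621e4.
const defaultDriverCapabilities = "utility,compute"

// driverCapabilities is a hypothetical helper: an unset or blank
// NVIDIA_DRIVER_CAPABILITIES falls back to the default; otherwise the
// comma-separated value is split and each entry trimmed.
func driverCapabilities(env map[string]string) []string {
	value := env["NVIDIA_DRIVER_CAPABILITIES"]
	if strings.TrimSpace(value) == "" {
		value = defaultDriverCapabilities
	}
	var caps []string
	for _, c := range strings.Split(value, ",") {
		if c = strings.TrimSpace(c); c != "" {
			caps = append(caps, c)
		}
	}
	return caps
}

func main() {
	fmt.Println(driverCapabilities(map[string]string{})) // [utility compute]
	fmt.Println(driverCapabilities(map[string]string{
		"NVIDIA_DRIVER_CAPABILITIES": "graphics, utility",
	})) // [graphics utility]
}
```

Trimming each entry accepts values written with spaces, such as the `"utility, compute"` form quoted in the commit message.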
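Commits 5ff206e1a9 and 33f6fe0217 describe a simplified CDI spec: no duplicate entities within any single set of container edits, and no device-level copies of edits already defined at the spec level. A generic Go sketch (Go 1.18+ for type parameters) of those two rules; `mountEntry` is a hypothetical, comparable stand-in for a CDI mount. Real CDI mounts also carry an options slice, which would need a custom equality check rather than a map key.

```go
package main

import "fmt"

// mountEntry is a hypothetical, comparable stand-in for a CDI mount.
type mountEntry struct {
	HostPath      string
	ContainerPath string
}

// dedupe drops repeated entries, keeping the first occurrence and the original order.
func dedupe[T comparable](items []T) []T {
	seen := make(map[T]struct{}, len(items))
	out := make([]T, 0, len(items))
	for _, item := range items {
		if _, dup := seen[item]; dup {
			continue
		}
		seen[item] = struct{}{}
		out = append(out, item)
	}
	return out
}

// withoutSpecLevel drops device-level entries that already appear in the spec-level edits.
func withoutSpecLevel[T comparable](deviceItems, specItems []T) []T {
	inSpec := make(map[T]struct{}, len(specItems))
	for _, item := range specItems {
		inSpec[item] = struct{}{}
	}
	out := make([]T, 0, len(deviceItems))
	for _, item := range deviceItems {
		if _, ok := inSpec[item]; !ok {
			out = append(out, item)
		}
	}
	return out
}

func main() {
	specLevel := []mountEntry{{HostPath: "/usr/bin/nvidia-smi", ContainerPath: "/usr/bin/nvidia-smi"}}
	deviceLevel := []mountEntry{
		{HostPath: "/usr/bin/nvidia-smi", ContainerPath: "/usr/bin/nvidia-smi"},
		{HostPath: "/usr/lib/libcuda.so", ContainerPath: "/usr/lib/libcuda.so"},
		{HostPath: "/usr/lib/libcuda.so", ContainerPath: "/usr/lib/libcuda.so"},
	}
	// Only the single libcuda.so entry remains for the device.
	fmt.Println(withoutSpecLevel(dedupe(deviceLevel), specLevel))
}
```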
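Commits 0516fc96ca and 87e406eee6 describe a root transform that rewrites host paths while the container path is left unchanged. A minimal sketch under that reading; `transformRoot`, `applyRoot`, `cdiMount`, and the example driver root `/run/nvidia/driver` are hypothetical, and paths outside the old root are deliberately returned as-is.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// cdiMount is a hypothetical, simplified CDI mount entry.
type cdiMount struct {
	HostPath      string
	ContainerPath string
}

// transformRoot re-roots hostPath from oldRoot to newRoot. Paths that do not
// live under oldRoot (or cannot be made relative to it) are returned unchanged.
func transformRoot(hostPath, oldRoot, newRoot string) string {
	rel, err := filepath.Rel(oldRoot, hostPath)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return hostPath
	}
	return filepath.Join(newRoot, rel)
}

// applyRoot rewrites only the host path; the container path is never modified.
func applyRoot(m cdiMount, oldRoot, newRoot string) cdiMount {
	m.HostPath = transformRoot(m.HostPath, oldRoot, newRoot)
	return m
}

func main() {
	m := cdiMount{
		HostPath:      "/run/nvidia/driver/usr/lib/libcuda.so",
		ContainerPath: "/usr/lib/libcuda.so",
	}
	fmt.Printf("%+v\n", applyRoot(m, "/run/nvidia/driver", "/"))
}
```

Using `filepath.Rel` rather than a plain string-prefix check avoids treating a sibling such as `/run/nvidia/driverX` as if it were inside `/run/nvidia/driver`.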