Commit Graph

79 Commits

Author SHA1 Message Date
Evan Lezar
e7d2a9c212 Merge branch 'CNT-1876/cdi-specs-from-csv' into 'main'
Add csv mode to CDI spec generation

See merge request nvidia/container-toolkit/container-toolkit!393
2023-05-23 14:47:19 +00:00
Evan Lezar
fcb4e379e3 Fix mode resolution tests
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-23 16:02:07 +02:00
Evan Lezar
3ea02d13fc Merge branch 'use-major-minor-for-cuda-version' into 'main'
Use *.* pattern when locating libcuda.so

See merge request nvidia/container-toolkit/container-toolkit!397
2023-05-22 13:02:33 +00:00
Evan Lezar
e30fd0f4ad Add csv mode to nvidia-ctk cdi generate command
This chagne allows the csv mode option to specified in the
nvidia-ctk cdi generate command and adds a --csv.file option
that can be repeated to specify the CSV files to be processed.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-22 13:56:45 +02:00
Evan Lezar
418e4014e6 Resolve to csv for CDI generation on tegra systems
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-22 13:56:00 +02:00
Evan Lezar
e78a4f5eac Add csv mode to nvcdi api
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-22 13:55:58 +02:00
Evan Lezar
c120c511d5 Merge branch 'CNT-3939/generate-all-device' into 'main'
Add options to generate all device to nvcdi API

See merge request nvidia/container-toolkit/container-toolkit!348
2023-05-22 11:53:39 +00:00
Evan Lezar
424b8c9d46 Use *.* pattern when locating libcuda.so
This change ensures that libcuda.so can be located on systems
where no patch version is specified in the driver version.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-22 13:53:19 +02:00
Evan Lezar
fe37196788 Generate all device using merged transform
The nvcid api is extended to allow for merged device options to
be specified. If any options are specified, then a merged device
is generated.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-12 13:52:58 +02:00
Evan Lezar
ba44c50f4e Add MergedDevice transform to generate all device
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-12 13:52:58 +02:00
Evan Lezar
b7e5cef934 Include xorg discoverer with graphics mounts
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-10 17:07:55 +02:00
Evan Lezar
8bb0235c92 Remove discover.Config
These changes remove the use of discover.Config which was used
to pass the driver root and the nvidiaCTK path in some cases.

Instead, the nvidiaCTKPath is resolved at the begining of runtime
invocation to ensure that this is valid at all points where it is
used.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-10 15:03:37 +02:00
Evan Lezar
470b4eebd8 Properly set spec permissions
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-03 10:45:11 +02:00
Evan Lezar
367a30827f Allow spec file permisions to be specified
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-05-02 16:27:50 +02:00
Carlos Eduardo Arango Gutierrez
81d8b94cdc
Export pkg config/engine
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2023-04-25 07:16:59 +02:00
Evan Lezar
6a83e2ebe5 Add nvidia-ctk cdi transform root command
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-28 11:45:58 -07:00
Evan Lezar
2abe679dd1 Move libcuda locator to internal/lookup package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-26 17:04:06 +02:00
Evan Lezar
226c54613e Also return an error from nvcdi.New
This change allows nvcdi.New to return an error in addition to the
constructed library instead of panicing.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-26 16:13:12 +02:00
Evan Lezar
33f6fe0217 Generate a simplified CDI spec by default
As simplified CDI spec has no duplicate entities in any single set of container edits.
Furthermore, contianer edits defined at a spec-level are not included in the container
edits for a device.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-24 11:01:46 +02:00
Evan Lezar
5ff206e1a9 Add transform to deduplicate entities in CDI spec
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-24 11:01:23 +02:00
Evan Lezar
9506bd9da0 Fix generation of management CDI spec in containers
Since we relied on finding libcuda.so in the LDCache to determine both the CUDA
version and the expected directory for the driver libraries, the generation of the
management CDI specifications fails in containers where the LDCache has not been updated.

This change falls back to searching a set of predefined paths instead when the lookup of
libcuda.so in the cache fails.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-23 15:59:01 +02:00
Evan Lezar
685802b1ce Only init nvml as required when generating CDI specs
CDI generation modes such as management and wsl don't require
NVML. This change removes the top-level instantiation of nvmllib
and replaces it with an instanitation in the nvml CDI spec generation
code.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-20 14:24:08 +02:00
Evan Lezar
868393b7ed Add mofed mode to nvcdi API
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-07 18:47:52 +02:00
Evan Lezar
ebe18fbb7f Add gds mode to nvcdi API
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-07 18:47:52 +02:00
Evan Lezar
29cbbe83f9 Add management mode to CDI spec generation API
These changes add support for generating a management spec to the nvcdi API.
A management spec consists of a single CDI device (`all`) which includes all expected
NVIDIA device nodes, driver libraries, binaries, and IPC sockets.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-06 10:53:43 +02:00
Evan Lezar
314059fcf0 Move path manipulation to spec.Save
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-01 13:49:04 +02:00
Evan Lezar
9f5e141437 Expose vendor and class as options
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-01 13:48:28 +02:00
Evan Lezar
8be6de177f Move formatJSON and formatYAML to nvcdi/spec package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-01 13:48:28 +02:00
Evan Lezar
89321edae6 Add top-level GetSpec function to nvcdi API
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-01 13:48:28 +02:00
Evan Lezar
6d6cd56196 Return nvcdi.spec.Interface from GetSpec
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-01 12:45:30 +02:00
Evan Lezar
2e95e04359 Add nvcdi.spec package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-03-01 12:45:30 +02:00
Evan Lezar
accba4ead5 Merge branch 'CNT-3965/clean-up-by-path-symlinks' into 'main'
Improve handling of /dev/dri devices and nested device paths

See merge request nvidia/container-toolkit/container-toolkit!307
2023-03-01 10:25:48 +00:00
Christopher Desiniotis
87e406eee6 Update root transformer tests to ensure container path is not modified
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-02-28 09:00:05 -08:00
Christopher Desiniotis
45ed3b0412 Handle hook arguments for creation of symlinks
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-02-28 09:00:02 -08:00
Christopher Desiniotis
0516fc96ca Add Transform interface and initial implemention for a root transform
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-02-28 08:56:13 -08:00
Evan Lezar
b4dc1f338d Generate nested device folder permission hooks per device
This change generates device folder permission hooks per device instead of
at a spec level. This ensures that the hook is not injected for a device that
does not have any nested device nodes.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-22 17:16:23 +02:00
Evan Lezar
181128fe73 Only include by-path-symlinks for injected device nodes
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-22 16:53:04 +02:00
Evan Lezar
2680c45811 Add mode constants to nvcdi
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-20 16:33:51 +02:00
Evan Lezar
b76808dbd5 Add tests for CDI mode resolution
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-20 16:33:33 +02:00
Evan Lezar
4ccb0b9a53 Add and resolve auto discovery mode for cdi generation
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-20 14:49:58 +02:00
Evan Lezar
b21dc929ef Add WSL2 discovery and spec generation
These changes add a wsl discovery mode to the nvidia-ctk cdi generate command.

If wsl mode is enabled, the driver store for the available devices is used as
the source for discovered entities.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-20 10:30:13 +02:00
Evan Lezar
d226925fe7 Construct nvml-based CDI lib based on mode
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-20 10:30:13 +02:00
Evan Lezar
5103adab89 Add mode option to nvcdi API
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-20 10:30:13 +02:00
Kevin Klues
5710b9e7e8 Add globbing for mounting multiple GSP firmware files
Newer drivers have split the GSP firmware into multiple files so a simple match
against gsp.bin in the firmware directory is no longer possible. This patch
adds globbing capabilitis to match any GSP firmware files of the form gsp*.bin
and mount them all into the container.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-02-16 11:53:36 +00:00
Christopher Desiniotis
a52c9f0ac6 fix: apply options when constructing an instance of the nvcdi library
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2023-02-14 16:32:40 -08:00
Evan Lezar
5b110fba2d Add nvcdi package with basic CDI generation API
This change adds an nvcdi package that exposes a basic API for
CDI spec generation. This is used from the nvidia-ctk cdi generate
command and can be consumed by DRA implementations and the device plugin.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-14 19:52:31 +01:00
Evan Lezar
f72b79cc2a Move pkg to cmd/nvidia-container-toolkit
This change moves the pkg folder to `cmd/nvidia-container-toolkit` to
better match go best practices. This allows, for example, for the
`cmd/nvidia-container-toolkit` to be go installed.

The only package included in `pkg` was `main`.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2021-06-08 15:20:59 +02:00
Evan Lezar
2a92d6acb7 Fix bug where docker swarm device selection is overriden by NVIDIA_VISIBLE_DEVICES
This change fixes a bug where the value of NVIDIA_VISIBLE_DEVICES would be used to
select devices even if the `swarm-resource` config option is specified.

Note that this does not change the value of NVIDIA_VISIBLE_DEVICES in the container.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2021-06-07 14:10:08 +02:00
Evan Lezar
602eaf0e60 Use require package for tests
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2021-06-07 13:31:41 +02:00
Evan Lezar
fc408a32c7 Add utility function to get config name from struct
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2021-01-22 16:08:45 +01:00