This change ensures that libcuda.so can be located on systems
where no patch version is specified in the driver version.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
These changes remove the use of discover.Config which was used
to pass the driver root and the nvidiaCTK path in some cases.
Instead, the nvidiaCTKPath is resolved at the begining of runtime
invocation to ensure that this is valid at all points where it is
used.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a GetDefaultConfigToml function to the config package.
This function returns the default config in the form of raw TOML
including comments. This is useful for generating a default config at
installation time, with platform-specific differences codified.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an nvidia-ctk system create-device-nodes command for
creating NVIDIA device nodes. Currently this is limited to control devices
(nvidia-uvm, nvidia-uvm-tools, nvidia-modeset, nvidiactl).
A --dry-run mode is included for outputing commands that would be executed and
the driver root can be specified.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an nvidia-container-runtime.modes.cdi.annotation-prefixes config
option that defaults to cdi.k8s.io/. This allows the annotation prefixes parsed
for CDI devices to be overridden in cases where CDI support in container engines such
as containerd or crio need to be overridden.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This chagne prefers (non-symlink) sockets at /run over /var/run for
nvidia-persistenced and nvidia-fabricmanager sockets.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
The following changes are made:
* The default-cdi-kind config option is used to convert an envvar entry to a fully-qualified device name
* If annotation devices exist, these are used instead of the envvar devices.
* The `all` device is no longer treated as a special case and MUST exist in the CDI spec.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change copies dxcore.h and dxcore.c from libnvidia-container to
allow for the driver store path to be queried. Modifications are made
to dxcore to remove the code associated with checking the components
in the driver store path.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change ensures that the update-ldcache hook is created in a manner
consistent with other nvidia-ctk hooks ensuring that a full path is
used.
Without this change the update-ldcache hook on Tegra-based sytems had an
invalid path.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
If this is not done, the default config which sets the nvidia-ctk.path
option as "nvidia-ctk" will result in an invalid OCI spec if a hook is
injected. This change ensures that the path used is always an absolute
path as required by the hook spec.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change ensures that the first match of an executable in the path
is retured instead of a list of candidates. This prevents a CDI spec,
for example, from containing multiple entries for a single executable
(e.g. nvidia-smi).
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an --nvidia-ctk-path to the nvidia-ctk cdi generate
command. This ensures that the executable path for the generated
hooks can be specified consistently.
Since the NVIDIA Container Runtime already allows for the executable
path to be specified in the config the utility code to update the
LDCache and create other nvidia-ctk hooks are also updated.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
The HostPath field was added in the v0.5.0 CDI specification.
The cdi package uses strict unmarshalling when loading specs
from file causing failures for unexpected fields.
Since the behaviour for HostPath == "" and HostPath == Path are
equivalent, we clear HostPath if it is equal to Path to ensure
compatibility with the widest range of specs.
This allows, for example, a v0.4.0 spec to be generated as required.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change extends the support for multiple envvars when
specifying swarm resources to consider ALL of the specified
environment variables instead of the first match.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds the discovery of DRM devices associated with requested
devices. This means that the /dev/dri/card* and /dev/dri/renderD*
devices associated with each requested NVIDIA GPU are injected into
the container and that the /dev/dri/by-path symlinks associated with
these devices are created in the container.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds support for filtering entities by specifying a filter.
This can be used, for example, to check whether a mount or device
has a particular property and removing it from the set of discovered
entities if it does not.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a Devices abstraction to the CUDA image utilities. This
allows for checking whether a devices is selected, for example.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change updates the docker config update for simplicitly.
This also allows for the API to match the crio update code.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This adds support for updating crio configs (instead of installing hooks)
and adds crio support to the nvidia-ctk runtime configure command.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change allows the NVIDIA Container Runtime to inject vulkan
loaders and libraries by modifying the OCI runtime specification.
This allows vulkan applications to run in containers without
additional modifications.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a Locator that can be used to locate libraries.
If library names are specified, the ldcache is searched otherwise
symlinks are resolved.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change ensures that a more concrete error is provided by the NVIDIA
Container Runtime if the NVIDIA Container Runtime hook cannot be
located.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a modifier to that injects the tegra platform files
* /etc/nv_tegra_release
* /sys/devices/soc0/family
allowing these files to be used for platform detection in a containerized
context such as the GPU device plugin.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a root member to the mounts type that is used to
perform most of the lookups for files and devices. This allows
for consistent handling of relative paths.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
The Relative method added to the Locator interface was
not correctly implemented in the file type. The root was
never set when instantiating the object.
This change removes this method from the interface and the file
type, switching to a local implementation in the mounts type
instead.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Since the creation of symlinks may include other libraries / folders
the ldcache should be updated AFTER the symlinks are created.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
In preparation for adding a command to the nvidia-ctk CLI to modify
the docker config, this change refactors load, update, and flush logic
from the toolkit container docker CLI to an internal package.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an nvidia-container-runtime.modes.cdi.spec-dirs
config option that allows the default spec dirs to be overridden.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change reuse the code that checks for the existing NVIDIA
Container Runtime hook to ensure that both nvidia-container-toolkit
and nvidia-container-runtime-hook are detected.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This mirrors what is done in cri-o and allows for devices nodes
from, for example, the driver container to be injected into a
container at /dev instead of <ROOT>/dev
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a charDevices discoverer and using this
for CSV, GDS, and MOFED discovery. Internally the discoverer
is a "mounts" discoverer with a charDevice locator.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Instead of creating a set of discoverers per file, this change creates
a discoverer per type by first concatenating the mount specifications
from all files. This will allow all device nodes, for example, to
be treated as a single device.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This adds a Relative function to the Locator interface and uses
this to determine the host and container paths for located files
(and devices). This ensures that the root (e.g. the nvidia driver
root) is stripped from the container path.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change creates GDS and MOFED modifiers and adds them to the
modifer created for the selected runtime mode if the NVIDIA_GDS
and NVIDIA_MOFED envvars are set to "enabled", respectively.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds version output to the nvidia-continer-runtime,
nvidia-container-toolkit, and nvidia-ctk CLIs. The same version
is used in all cases and includes a version string and a git
revision if set.
The construction of the version string mirrors what is done in runc.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
These changes replace the nvidia-container-runtime config options
experimental and discover-mode with a single mode config option.
Note that mode is now a string with a default value of "auto"
and a mode value of "legacy" is equivalent to experimental == false.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change updates the create-symlinks hook to also create symlinks for
libcuda.so, libGLX_indirect.so.0, and libnvidia-opticalflow.so
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a GetContainerRoot to the oci.State type to
encapsulate the logic around determining the container root.
This Fixes a bug where relative roots (e.g. as generated by contianerd)
are not supported.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a Requirements abstraction that can be used to check
an images' NVIDIA_REQUIRE_* envvars against the host properties such
as CUDA version or architecture.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a CUDA image abstraction that encapsulates
the queries performed on a container image (e.g. envvars) to
check certain CUDA properties.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a DefaultExecutableDir = /usr/bin constant that is used
to construct default paths for executables instead of specifying these
explicitly.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a cache to the mounts type. This means that if called to get
a list of folders, for example, the result is reused instead of recalculated.
This also avoids duplicate logging.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds a discovered hook for updating the ldcache as a container-create
hook. The mounts from a discoverer are inspected to determine the folders that must
be added to the cache using the nvidia-ctk hook update-ldcache command.
This is added to the "csv" discovery mode for the experimental runtime.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
This change adds an 'auto' discover mode that attempts to select the correct mode
for a given platform. This currently attempts to detect whether the platform is a
Tegra-based system in which case the 'csv' discover mode is used. The 'legacy'
discover mode is used as the fallback.
Signed-off-by: Evan Lezar <elezar@nvidia.com>