Commit Graph

55 Commits

Author SHA1 Message Date
Kevin Klues
82adde1bf4 Remove redundant tests and fix misleading tests
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-03-27 10:08:21 +00:00
Kevin Klues
18957773f2 Add function for AssertValidMigProfileFormat
This does not verify that the profile is a valid profile for the current
platform, but rather that it simply adheres to the proper formatting of a MIG
profile string.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-03-27 10:04:32 +00:00
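
A format-only check like the one described in this commit could be sketched as follows. This is not the library's implementation: the helper name `looksLikeMigProfile` and the regular expression are assumptions, based on profile strings shaped like `1g.5gb`, `1c.3g.20gb`, or `1g.5gb+me`.

```go
package main

import (
	"fmt"
	"regexp"
)

// migProfileFormat is an assumed pattern, not taken from the library:
// an optional "<N>c." prefix, then "<N>g.<N>gb", then an optional
// "+attr[,attr...]" suffix.
var migProfileFormat = regexp.MustCompile(`^(\d+c\.)?\d+g\.\d+gb(\+[a-z]+(,[a-z]+)*)?$`)

// looksLikeMigProfile (hypothetical helper) reports whether s is shaped like
// a MIG profile string. It does not check that the profile is supported on
// the current platform.
func looksLikeMigProfile(s string) bool {
	return migProfileFormat.MatchString(s)
}

func main() {
	for _, s := range []string{"1g.5gb", "1c.3g.20gb", "1g.5gb+me", "1g5gb"} {
		fmt.Printf("%-12s %v\n", s, looksLikeMigProfile(s))
	}
}
```

The check is purely syntactic, so a well-formed string may still name a profile that no device offers.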
Kevin Klues
8c50f9f18f Fix bug in heuristic for which MIG profiles to skip
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-03-25 22:01:20 +00:00
Kevin Klues
500a464b22 Cache mig profiles in devicelib, not just each device
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-03-25 18:48:18 +00:00
Kevin Klues
631bde023f Add ability to query device architecture and CUDA compute capability
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-03-24 14:24:19 +00:00
Kevin Klues
642041d1e0 Update mig-profile parsing / name generation after go-nvml v12.0 bump
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-03-23 19:29:57 +00:00
Evan Lezar
bcbaf5a0de Add HasDXCore to info package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-02-08 16:04:35 +01:00
Kevin Klues
264c5dab79 Add NewDeviceByUUID() and NewMigDeviceByUUID() calls to nvlib.device
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2022-12-08 14:53:50 +00:00
Kevin Klues
5d4be6ac55 Regenerate mocks for NVML
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2022-12-08 14:53:45 +00:00
Kevin Klues
6a4886e49e Add Placement related calls for GPUInstances in nvml wrapper
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2022-12-08 14:53:39 +00:00
Evan Lezar
7e5501f6a3 Skip DGX Display devices in addition to NVIDIA DGX Display devices
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-12-07 11:40:09 +01:00
Evan Lezar
1fc1eee392 Remove WithSelecteDeviceClasses option
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-21 15:47:44 +01:00
Evan Lezar
655eb9795c Skip display devices based on device names
This allows devices to be skipped based on device names and
skips "NVIDIA DGX Display" devices by default.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-21 15:46:15 +01:00
Evan Lezar
fa5d0408ce Ensure pci bus ID is lower case
The PCI Bus ID returned by NVML is upper case and results in the following error:

error getting PCI device class for device:
failed to construct PCI device:
unable to read PCI device vendor id for 0000:0A:00.0:
open /sys/bus/pci/devices/0000:0A:00.0/vendor:
no such file or directory

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-16 12:12:07 +01:00
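
The error above arises because sysfs names PCI devices with lower-case hex digits, while the bus ID arrives in upper case. A minimal sketch of the normalization, assuming a hypothetical helper `sysfsVendorPath` (not a function from the repository):

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// sysfsVendorPath (hypothetical helper) returns the sysfs vendor-file path
// for a PCI bus ID. The ID is lower-cased first because sysfs entries use
// lower-case hex digits.
func sysfsVendorPath(busID string) string {
	return path.Join("/sys/bus/pci/devices", strings.ToLower(busID), "vendor")
}

func main() {
	// Prints /sys/bus/pci/devices/0000:0a:00.0/vendor
	fmt.Println(sysfsVendorPath("0000:0A:00.0"))
}
```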
Evan Lezar
e37e145458 Add filtering of devices based on PCI device class
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-16 10:30:49 +01:00
Evan Lezar
f156c34310 Add private constructor for creating a device
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-15 17:42:22 +01:00
Evan Lezar
e96d9c58f1 Add GetGPUByPciBusID to nvpci.Interface
This change adds a GetGPUByPciBusID method to the nvpci Interface.
The existing NewDevice function is moved to nvmdev where it is used.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-11-15 17:42:22 +01:00
Zvonko Kaiser
f3102f8dcb Added PCI IDS support and DPU detection
2022-11-02 03:58:13 -07:00
Evan Lezar
8b5e3d224d Ensure that invalid MIG profiles are skipped
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-10-14 10:31:50 +02:00
Evan Lezar
1cb5426db8 Add functions for interacting with Events
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-09-21 15:10:06 +02:00
Kevin Klues
f933892965 Add extended APIs for top-level devices to the device package
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2022-09-16 13:34:17 +00:00
Kevin Klues
1d680a93b6 Move MIG apis to device package
We decided it makes sense to have top level device and MIG device abstractions
all under one package rather than trying to separate them. It will make it
easier to have them call between each other without package dependency loops.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2022-09-16 13:09:09 +00:00
Kevin Klues
8e749776c5 Add nvml wrappers for getting GIs and CIs by ID
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2022-09-15 17:08:00 +00:00
Kevin Klues
e95e3a5e8b Add a MIG package as a subpackage to nvlib
For now this package only has functions to work with MIG profiles. More
functionality will be added here in the future.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2022-09-15 17:08:00 +00:00
Evan Lezar
16ab19d8ae Merge branch 'add-nvlib-base' into 'main'
Add a new nvlib package and move the nvinfo package into it

See merge request nvidia/cloud-native/go-nvlib!16
2022-09-15 11:36:25 +00:00
Kevin Klues
d23f460ad3 Move the nvinfo package into pkg/nvlib/info
Also build an interface around the API so that it can more easily be mocked.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2022-09-15 11:30:34 +00:00
Evan Lezar
211a8eb973 Address minor lint error
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-09-02 15:12:23 +02:00
Evan Lezar
bf9a4d3476 Sort functions in interface alphabetically
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-09-02 15:07:39 +02:00
Evan Lezar
a404873b12 Add additional functions to Device interface
Add the following functions to the Device interface:
* GetCudaComputeCapability
* GetAttributes
* GetName

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-09-02 15:05:12 +02:00
Evan Lezar
9175bde20b Add SystemGetCudaDriverVersion to NVML interface
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-09-01 14:05:16 +02:00
Christopher Desiniotis
bccac280ca nvmdev: Add GetPhysicalFunction() for both Device and ParentDevice
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2022-08-25 09:35:11 -07:00
Christopher Desiniotis
6ff7845b92 nvpci: Add GetGPUByIndex()
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2022-08-25 09:34:49 -07:00
Christopher Desiniotis
09ae86c8e0 Merge branch 'pf-filtering' into 'main'
Detect if NvidiaPCIDevice is a PF or VF

See merge request nvidia/cloud-native/go-nvlib!13
2022-08-15 17:09:02 +00:00
Kevin Klues
2e1e2e784a Add String() and Error() functions to Return type in nvml package
There is a default implementation for these that is overwritten if the
underlying NVML library ends up being used.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2022-08-11 12:13:41 +00:00
Kevin Klues
008aa70bc6 Add an interface based wrapper around go-nvml for better mocking
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2022-08-10 15:52:16 +00:00
Christopher Desiniotis
afdf3edd99 Detect if NvidiaPCIDevice is a PF or VF
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2022-07-28 19:16:39 -07:00
Christopher Desiniotis
f52cd402a1 Detect iommu_group for PCI and mdev devices
2022-07-25 23:20:03 +00:00
Christopher Desiniotis
f281b5e581 Merge branch 'driver-detection' into 'main'
Detect driver bound to an NvidiaPCIDevice and mdev device

See merge request nvidia/cloud-native/go-nvlib!11
2022-07-14 20:39:17 +00:00
Christopher Desiniotis
8209652159 Detect driver bound to mdev devices
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2022-07-08 13:11:01 -07:00
Christopher Desiniotis
805db5afa8 Refactor how mdev's are represented internally in nvmdev.
The 'mdev' string now represents the absolute path to an
mdev device (/sys/bus/pci/devices/<addr>/<uuid>) instead
of the mdev_type directory for the mdev device
(/sys/bus/pci/devices/<addr>/mdev_supported_types/<mdev-type>).
This is more intuitive and will make it easier to get
more information about a particular mdev device -
like the driver or iommu_group it belongs to - which can
be found at /sys/bus/pci/devices/<addr>/<uuid>.

Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2022-07-08 13:00:06 -07:00
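
The path layout described in this commit can be sketched as below. The helpers `mdevPath` and `mdevAttr` are hypothetical and only build paths; the PCI address and UUID are placeholder values.

```go
package main

import (
	"fmt"
	"path"
)

const pciDevicesRoot = "/sys/bus/pci/devices"

// mdevPath (hypothetical helper) builds the absolute path of an mdev device
// from its parent PCI address and its UUID.
func mdevPath(parentAddr, uuid string) string {
	return path.Join(pciDevicesRoot, parentAddr, uuid)
}

// mdevAttr (hypothetical helper) returns the path of an entry beneath the
// mdev device directory, such as "driver" or "iommu_group".
func mdevAttr(mdev, name string) string {
	return path.Join(mdev, name)
}

func main() {
	// Placeholder address and UUID, for illustration only.
	dev := mdevPath("0000:3b:00.4", "00000000-0000-0000-0000-000000000000")
	fmt.Println(mdevAttr(dev, "driver"))
	fmt.Println(mdevAttr(dev, "iommu_group"))
}
```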
Christopher Desiniotis
09edde0a0b Detect driver bound to an NvidiaPCIDevice
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2022-07-08 08:46:02 -07:00
Christopher Desiniotis
e2d858daed Use 'os' instead of 'ioutil' which is recommended starting with Go 1.16
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
2022-07-07 13:43:11 -07:00
Evan Lezar
f13f6f0ac9 Add nvinfo package to query system state
This change adds an nvinfo package with HasNVML and IsTegraSystem
functions. These functions can be used to control behaviour on
different platforms.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2022-06-28 15:56:34 +02:00
Zvonko Kaiser
c5ed06b032 Added function to retrieve the BAR sizes
2022-06-01 03:20:10 -07:00
Christopher Desiniotis
505f83b943 Add nvmdev package for mdev (vGPU) devices
2022-05-25 16:34:29 +00:00
zvonkok
9196546dcc Add the status byte check
2022-02-16 10:45:15 +01:00
zvonkok
1f718a1568 Update the Open API to OpenRO and OpenRW
2022-02-16 10:44:32 +01:00
Zvonko Kaiser
2c175dcdbf Fix the linting errors
2022-02-16 10:43:27 +01:00
Kevin Klues
4066c09810 Add test for NUMA node addition to nvpci
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-03-22 17:19:50 +00:00
Kevin Klues
5adc7bd87c Add numa node as a standard field in the nvpci struct
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-03-22 17:18:38 +00:00