How do I discover the PCIe bus topology and slot numbers on the board?

Alex · Aug 18, 2015

For example, when I use a multi-GPU system with CUDA C/C++ and GPUDirect 2.0 P2P, and the GPUs sit behind nested PCI Express switches as shown in the picture, I need to know how many switches lie between any two GPUs, identified by their PCI Bus IDs, so that I can optimize data transfers and the distribution of computation.

Or, if I already know the hardware PCIe topology including the PCIe switches, I need to know which physical PCIe slot on the board each GPU card is plugged into.

As far as I know, even if I already know the hardware PCIe topology with its PCIe switches, the following identifiers are not hard-bound to the PCIe slots on the board, and they may change from one run of the system to the next:

  • CUDA device_id
  • nvidia-smi/nvml GPU id
  • PCI Bus ID

What is the best way to discover the PCIe bus topology, with a detailed device tree and the PCIe slot numbers on the board, on Windows and Linux?

Answer

Paebbels · Aug 19, 2015

PCI devices (endpoints) have a unique address. This address has 3 parts:

  • BusID
  • DeviceID
  • FunctionID

For example, function 3 of device 12 on bus 3 is written in BDF notation as 03:0C.3 (the fields are hexadecimal, so device 12 is 0C). An extended BDF notation adds a domain (mostly 0000) as a prefix: 0000:03:0c.3.
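To make the notation concrete, here is a minimal Python sketch that splits a BDF string into its numeric parts and formats it back. The names `PciAddress`, `parse_bdf` and `format_bdf` are made up for this illustration, and treating a missing domain as 0 is an assumption based on the "mostly 0000" remark above.

```python
import re
from typing import NamedTuple


class PciAddress(NamedTuple):
    """Hypothetical container for the parts of a PCI address."""
    domain: int
    bus: int
    device: int
    function: int


# Optional hex domain, then bus:device.function (all hexadecimal).
_BDF_RE = re.compile(
    r"^(?:(?P<domain>[0-9a-fA-F]{4,}):)?"
    r"(?P<bus>[0-9a-fA-F]{2}):"
    r"(?P<device>[0-9a-fA-F]{2})\."
    r"(?P<function>[0-7])$"
)


def parse_bdf(text: str) -> PciAddress:
    """Parse '03:0C.3' or '0000:03:0c.3' into numeric fields."""
    match = _BDF_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a BDF address: {text!r}")
    # Assumption: an address without a domain prefix belongs to domain 0.
    domain = int(match.group("domain") or "0", 16)
    bus = int(match.group("bus"), 16)
    device = int(match.group("device"), 16)
    function = int(match.group("function"), 16)
    if device > 0x1F:  # the device number is a 5-bit field
        raise ValueError(f"device number out of range: {text!r}")
    return PciAddress(domain, bus, device, function)


def format_bdf(addr: PciAddress) -> str:
    """Format an address in the extended notation used by Linux sysfs."""
    return f"{addr.domain:04x}:{addr.bus:02x}:{addr.device:02x}.{addr.function:x}"


print(parse_bdf("03:0C.3"))
print(format_bdf(parse_bdf("03:0C.3")))
```

Parsing to integers first avoids comparing addresses as strings, where `0C` and `0c` would look different.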

Linux lists these devices in /sys/bus/pci/devices:

paebbels@debian8:~$ ll /sys/bus/pci/devices/
drwxr-xr-x 2 root root 0 Aug 19 11:44 .
drwxr-xr-x 5 root root 0 Aug  5 15:14 ..
lrwxrwxrwx 1 root root 0 Aug 19 11:44 0000:00:00.0 -> ../../../devices/pci0000:00/0000:00:00.0
lrwxrwxrwx 1 root root 0 Aug 19 11:44 0000:00:01.0 -> ../../../devices/pci0000:00/0000:00:01.0
lrwxrwxrwx 1 root root 0 Aug 19 11:44 0000:00:07.0 -> ../../../devices/pci0000:00/0000:00:07.0
lrwxrwxrwx 1 root root 0 Aug 19 11:44 0000:00:07.1 -> ../../../devices/pci0000:00/0000:00:07.1
...
lrwxrwxrwx 1 root root 0 Aug 19 11:44 0000:00:18.6 -> ../../../devices/pci0000:00/0000:00:18.6
lrwxrwxrwx 1 root root 0 Aug 19 11:44 0000:00:18.7 -> ../../../devices/pci0000:00/0000:00:18.7
lrwxrwxrwx 1 root root 0 Aug 19 11:44 0000:02:00.0 -> ../../../devices/pci0000:00/0000:00:11.0/0000:02:00.0
lrwxrwxrwx 1 root root 0 Aug 19 11:44 0000:02:01.0 -> ../../../devices/pci0000:00/0000:00:11.0/0000:02:01.0
lrwxrwxrwx 1 root root 0 Aug 19 11:44 0000:02:02.0 -> ../../../devices/pci0000:00/0000:00:11.0/0000:02:02.0
lrwxrwxrwx 1 root root 0 Aug 19 11:44 0000:02:03.0 -> ../../../devices/pci0000:00/0000:00:11.0/0000:02:03.0
lrwxrwxrwx 1 root root 0 Aug 19 11:44 0000:03:00.0 -> ../../../devices/pci0000:00/0000:00:15.0/0000:03:00.0

Here you can see that sysfs lists devices 00 to 03 of bus 02 as connected to bus 00, device 11, function 0, which means they sit behind the bridge at 0000:00:11.0.
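The same reading of a symlink target can be done in code. The following is a small Python sketch, not a complete tool: `bdf_chain` and `upstream_bridge` are hypothetical helper names, and the only assumption is that link targets look like the ones in the listing above.

```python
import re
from typing import List, Optional

# Matches one extended BDF path component such as '0000:02:00.0'.
_BDF = re.compile(r"^[0-9a-fA-F]{4,}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def bdf_chain(link_target: str) -> List[str]:
    """Return the BDF components of a sysfs path, upstream first."""
    return [part for part in link_target.split("/") if _BDF.match(part)]


def upstream_bridge(link_target: str) -> Optional[str]:
    """Return the BDF of the bridge directly above the device, if any."""
    chain = bdf_chain(link_target)
    # A single component means the device hangs directly off the root bus.
    return chain[-2] if len(chain) >= 2 else None


target = "../../../devices/pci0000:00/0000:00:11.0/0000:02:00.0"
print(upstream_bridge(target))  # 0000:00:11.0
```

The `pci0000:00` component is skipped on purpose because it names the host bridge and root bus, not a device address.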

From this information, you can rebuild the complete PCI bus tree. The tree is always the same after a boot-up, unless you add or remove devices.
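For the question's use case (how far apart two GPUs are in the tree), one possible approach is to compare the sysfs paths of the two devices. The Python sketch below is an illustration under assumptions: `pci_chain`, `common_ancestor` and `read_sysfs` are invented names, "hops" here simply counts tree edges between the two devices through their deepest shared ancestor, and whether a shared ancestor is a PCIe switch port or a root port has to be checked separately.

```python
import os
from typing import Dict, List, Optional, Tuple


def pci_chain(sysfs_path: str) -> List[str]:
    """Components from the host bridge (e.g. 'pci0000:00') down to the device."""
    parts = [p for p in sysfs_path.split("/") if p and p != ".."]
    for index, part in enumerate(parts):
        if part.startswith("pci"):
            return parts[index:]
    raise ValueError(f"no PCI host bridge in path: {sysfs_path!r}")


def common_ancestor(path_a: str, path_b: str) -> Tuple[Optional[str], Optional[int]]:
    """Return (deepest shared ancestor, number of tree edges between the devices)."""
    chain_a, chain_b = pci_chain(path_a), pci_chain(path_b)
    shared = 0
    for a, b in zip(chain_a, chain_b):
        if a != b:
            break
        shared += 1
    if shared == 0:
        return None, None  # the devices sit under different host bridges
    hops = (len(chain_a) - shared) + (len(chain_b) - shared)
    return chain_a[shared - 1], hops


def read_sysfs(root: str = "/sys/bus/pci/devices") -> Dict[str, str]:
    """Map each BDF to its resolved sysfs path (Linux only)."""
    return {name: os.path.realpath(os.path.join(root, name))
            for name in sorted(os.listdir(root))}


# Paths taken from the listing above.
gpu_a = "../../../devices/pci0000:00/0000:00:11.0/0000:02:00.0"
gpu_b = "../../../devices/pci0000:00/0000:00:11.0/0000:02:01.0"
other = "../../../devices/pci0000:00/0000:00:15.0/0000:03:00.0"
print(common_ancestor(gpu_a, gpu_b))  # ('0000:00:11.0', 2)
print(common_ancestor(gpu_a, other))  # ('pci0000:00', 4)
```

On a live Linux system, `read_sysfs()` would supply the paths instead of the hard-coded strings, for example `common_ancestor(paths["0000:02:00.0"], paths["0000:03:00.0"])`.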

The Windows Device Manager offers the same information. The properties dialog shows the device type, vendor and location, e.g. PCI bus 0, device 2, function 0 for an integrated Intel HD 4600 graphics adapter.

Currently, I don't know how to get this information from a script or programming language in a Windows environment, but there are commercial and free tools on the internet that provide it. Maybe there is an API.