https://blog.csdn.net/m0_37201243/article/details/108655015
ibv_devices: list all RDMA devices currently present in the system
ibv_devices
    device                 node GUID
    ------              ----------------
    mlx5_0              98039b0300bec200
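The kernel also exposes the same device names and node GUIDs in sysfs, which is handy when the verbs utilities are not installed. Below is a minimal sketch. The function name list_ib_devices is hypothetical, and it assumes each device appears under /sys/class/infiniband with a node_guid file. The optional argument swaps in another root directory, for example during testing.

```bash
# Minimal sketch: print each RDMA device and its node GUID in the same two
# columns as ibv_devices, reading sysfs instead of calling libibverbs.
# Assumption: devices live under /sys/class/infiniband; the optional first
# argument replaces that root directory (handy for testing).
list_ib_devices() {
    local root="${1:-/sys/class/infiniband}"
    local dev guid
    for dev in "$root"/*; do
        # Skip entries without a node_guid file; this also covers the case
        # where the glob matched nothing and "$dev" is the literal pattern.
        [ -e "$dev/node_guid" ] || continue
        # sysfs stores the GUID as 9803:9b03:00be:c200; drop the colons so it
        # matches the ibv_devices format.
        guid=$(tr -d ':' < "$dev/node_guid")
        printf '%-16s %s\n' "$(basename "$dev")" "$guid"
    done
}
```

On the host above, list_ib_devices should print the same mlx5_0 / 98039b0300bec200 pair as ibv_devices.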
ibv_devinfo -d mlx5_0: show detailed information about a specific device
ibv_devinfo -d mlx5_0
hca_id: mlx5_0
    transport:          InfiniBand (0)
    fw_ver:             12.17.2052
    node_guid:          9803:9b03:00be:c200
    sys_image_guid:     9803:9b03:00be:c200
    vendor_id:          0x02c9
    vendor_part_id:     4115
    hw_ver:             0x0
    board_id:           DEL2180110032
    phys_port_cnt:      1
        port:   1
            state:          PORT_ACTIVE (4)
            max_mtu:        4096 (5)
            active_mtu:     4096 (5)
            sm_lid:         7
            port_lid:       7
            port_lmc:       0x00
            link_layer:     InfiniBand
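For a scripted health check, the port state and active_mtu fields in this output are usually what matter. The following minimal sketch uses a hypothetical function named check_ports_active. It reads ibv_devinfo output on stdin, prints one summary line per port, and exits non-zero if any port is not PORT_ACTIVE. It assumes the "port:", "state:" and "active_mtu:" labels shown above.

```bash
# Minimal sketch: read `ibv_devinfo -d <dev>` output on stdin, print one line
# per port with its state and active MTU, and exit non-zero if no port was
# found or any port is not PORT_ACTIVE.
check_ports_active() {
    awk '
        $1 == "port:"       { port = $2 }          # start of a port section
        $1 == "state:"      { state[port] = $2 }   # e.g. PORT_ACTIVE
        $1 == "active_mtu:" { mtu[port] = $2 }     # e.g. 4096
        END {
            n = 0; bad = 0
            for (p in state) {
                n++
                printf "port %s: %s, active_mtu %s\n", p, state[p], mtu[p]
                if (state[p] != "PORT_ACTIVE") bad = 1
            }
            exit (n == 0 || bad)
        }'
}
```

Running ibv_devinfo -d mlx5_0 | check_ports_active against the output above prints "port 1: PORT_ACTIVE, active_mtu 4096" and exits 0.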
ibstat: print information about the IB adapter (HCA)
ibstat
CA 'mlx5_0'
    CA type: MT4115
    Number of ports: 1
    Firmware version: 12.17.2052
    Hardware version: 0
    Node GUID: 0x98039b0300bec200
    System image GUID: 0x98039b0300bec200
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 56
        Base lid: 7
        LMC: 0
        SM lid: 3
        Capability mask: 0x2651e848
        Port GUID: 0x98039b0300bec200
        Link layer: InfiniBand
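The ibping client later needs the Base lid from this output for its -L option. The following minimal sketch uses a hypothetical helper named base_lid. It extracts the Base lid for a given port from ibstat output on stdin. It assumes the "Port N:" and "Base lid: N" layout shown above.

```bash
# Minimal sketch: print the "Base lid" of one port (default 1) from `ibstat`
# output on stdin; exit non-zero if that port is not found.
base_lid() {
    awk -v want="${1:-1}" '
        # "Port 1:" opens a port section (but "Port GUID:" does not).
        $1 == "Port" && $2 ~ /^[0-9]+:$/ { port = substr($2, 1, length($2) - 1); next }
        port == want && $1 == "Base" && $2 == "lid:" { print $3; found = 1; exit }
        END { exit !found }'
}
```

On the server, ibstat mlx5_0 | base_lid 1 would print 7 for the output above. That number is what goes after -L on the client.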
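iftop -i eth0 -n: monitor live traffic on the Ethernet interface eth0 (-n disables host name lookups)
iftop -i eth0 -n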
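Restart the IB driver stack (openibd) and start the OpenSM subnet manager (opensmd). A subnet manager must be running on at least one node of the fabric before ports can reach the Active state:
sudo /etc/init.d/openibd restart
sudo /etc/init.d/opensmd start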
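Test RDMA connectivity with a simple ping program, such as ibping from the infiniband-diags package. ibping requires root privileges and works in client/server mode. First start an ibping server on one machine. Then run ibping as a client on another machine and point it at that server.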
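Server side. The command prints nothing while it waits for clients:
  -S: run as the server
  -C: the CA name, taken from the ibstat output
  -P: the port number, taken from the ibstat output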
sudo ibping -S -C mlx5_0 -P 1
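Client side:
  -c: stop after sending 10000 packets
  -f: flood the destination
  -C: the CA name, taken from the ibstat output
  -P: the local port number on the client, taken from the client's ibstat output (port 1 here, the same as on the server)
  -L: the destination LID. This is the Base lid of the server port given with -P when the server was started; check it with ibstat on the server. Here it is 7.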
sudo ibping -c 10000 -f -C mlx5_0 -P 1 -L 7
--- master.master (Lid 7) ibping statistics ---
10000 packets transmitted, 10000 received, 0% packet loss, time 1014 ms
rtt min/avg/max = 0.004/0.101/900.009 ms
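When ibping runs inside a script, the statistics line can be checked automatically. The following minimal sketch uses a hypothetical function named ibping_loss. It prints the packet-loss percentage from statistics on stdin. It exits 1 if any packet was lost and 2 if no statistics line is present. It assumes the "N% packet loss" wording shown above.

```bash
# Minimal sketch: read ibping statistics on stdin, print the packet-loss
# percentage, and exit 1 if any packet was lost (2 if no statistics line).
ibping_loss() {
    awk '
        /packet loss/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) loss = substr($i, 1, length($i) - 1)
        }
        END {
            if (loss == "") exit 2
            print loss
            exit (loss + 0 > 0)
        }'
}
```

Assuming ibping writes its statistics to stdout, sudo ibping -c 10000 -f -C mlx5_0 -P 1 -L 7 | ibping_loss would print 0 for the run above.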
Bandwidth test with ib_write_bw. On master (the server), run it without arguments:
ib_write_bw
************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF
 Device          : mlx5_0
 Number of qps   : 1
 Transport type  : IB
 Connection type : RC
 Using SRQ       : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 CQ Moderation   : 1
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x07 QPN 0x1d5e4 PSN 0x217da0 RKey 0x014638 VAddr 0x007fb994704000
 remote address: LID 0x08 QPN 0x174d PSN 0x98825a RKey 0x1403f5 VAddr 0x007f5b83c62000
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
 65536      5000             3775.40            3775.34              0.060406
---------------------------------------------------------------------------------------
On node1 (the client), run it with master's IP address:
ib_write_bw 192.168.3.60
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF
 Device          : mlx5_0
 Number of qps   : 1
 Transport type  : IB
 Connection type : RC
 Using SRQ       : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 TX depth        : 128
 CQ Moderation   : 1
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x08 QPN 0x174d PSN 0x98825a RKey 0x1403f5 VAddr 0x007f5b83c62000
 remote address: LID 0x07 QPN 0x1d5e4 PSN 0x217da0 RKey 0x014638 VAddr 0x007fb994704000
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
 Conflicting CPU frequency values detected: 1200.836000 != 1976.390000. CPU Frequency is not max.
 65536      5000             3775.40            3775.34              0.060406
---------------------------------------------------------------------------------------
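The MsgRate column shows how to read the MB/sec unit. Dividing 3775.34 * 2^20 bytes/s by 65536 bytes per message gives about 0.0604 million messages/s, which matches MsgRate 0.060406. So 1 MB here is 2^20 bytes. Under that reading, 3775.34 MB/sec is about 31.67 Gbit/s, which can be compared with the "Rate: 56" reported by ibstat. The following minimal sketch shows the conversion. The helpers bw_average and mb_to_gbit are hypothetical, and bw_average assumes the 5-column result row shown above.

```bash
# Minimal sketch: pull "BW average[MB/sec]" (4th column) from the first
# 5-column numeric result row of ib_write_bw / ib_read_bw output on stdin.
bw_average() {
    awk 'NF == 5 && $1 ~ /^[0-9]+$/ { print $4; found = 1; exit }
         END { exit !found }'
}

# Convert a perftest MB/sec figure to Gbit/s, assuming 1 MB = 2^20 bytes
# (the reading that matches the MsgRate column).
mb_to_gbit() {
    awk -v mb="$1" 'BEGIN { printf "%.2f\n", mb * 1048576 * 8 / 1e9 }'
}
```

For the run above, ib_write_bw 192.168.3.60 | bw_average prints 3775.34, and mb_to_gbit 3775.34 prints 31.67.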
ib_read_bw works the same way. On master:
ib_read_bw
On node1:
ib_read_bw 192.168.3.60
The Linux kernel version must be 5.4.0-26-generic (check it with uname -r):
5.4.0-26-generic
If the kernel version is different, refer to https://linux.how2shout.com/how-to-install-linux-kernal-5-19-on-ubuntu-22-04-or-20-04/ to switch the kernel to 5.4.0-26-generic.
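A script can verify the kernel requirement above. The following minimal sketch uses a hypothetical function named check_kernel. The required release string is copied from the text above. The optional argument stands in for uname -r, which makes the check easy to test.

```bash
# Minimal sketch: check that the running kernel is the release this setup
# expects. The optional argument stands in for `uname -r` (for testing).
REQUIRED_KERNEL="5.4.0-26-generic"

check_kernel() {
    local current="${1:-$(uname -r)}"
    if [ "$current" = "$REQUIRED_KERNEL" ]; then
        echo "OK: kernel $current"
    else
        echo "kernel is $current, expected $REQUIRED_KERNEL" >&2
        return 1
    fi
}
```

Calling check_kernel with no argument tests the running kernel and returns non-zero on a mismatch.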