InfiniBand and RDMA Networking
Terminology
- InfiniBand: the physical link-layer protocol used by InfiniBand networks
- InfiniBand Verbs API: an implementation of RDMA (remote direct memory access)
Basic commands
ibv_devices
List all RDMA devices currently present on the system
device node GUID
------ ----------------
mlx5_0 98039b0300bec200
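For scripting, the device names can be pulled out of the `ibv_devices` output. A minimal sketch; the sample text is the output shown above, and `list_devices` is a hypothetical helper name:

```shell
# Sketch: extract device names from `ibv_devices` output.
# ibv_devices prints a two-line header, then one device per line;
# the first column is the device name.
list_devices() {
    awk 'NR > 2 && NF >= 2 { print $1 }'
}

# Sample output taken from the run above; on a real system you would
# pipe the live output instead:  ibv_devices | list_devices
sample='    device                 node GUID
    ------              ----------------
    mlx5_0              98039b0300bec200'
printf '%s\n' "$sample" | list_devices
```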
ibv_devinfo -d mlx5_0
Show detailed information for a device
hca_id: mlx5_0
transport: InfiniBand (0)
fw_ver: 12.17.2052
node_guid: 9803:9b03:00be:c200
sys_image_guid: 9803:9b03:00be:c200
vendor_id: 0x02c9
vendor_part_id: 4115
hw_ver: 0x0
board_id: DEL2180110032
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 7
port_lid: 7
port_lmc: 0x00
link_layer: InfiniBand
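When automating health checks, the port state can be parsed out of the `ibv_devinfo` output. A sketch that assumes the `state:` line format shown above; `port_state` is a hypothetical helper name:

```shell
# Sketch: read the port state from `ibv_devinfo`-style output on stdin.
port_state() {
    awk '/^[[:space:]]*state:/ { print $2; exit }'
}

# Sample lines taken from the output above; on a real system:
#   ibv_devinfo -d mlx5_0 | port_state
sample='port:   1
        state:  PORT_ACTIVE (4)
        max_mtu: 4096 (5)'
state=$(printf '%s\n' "$sample" | port_state)
echo "$state"
```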
ibstat
Print information about the IB adapter
CA 'mlx5_0'
CA type: MT4115
Number of ports: 1
Firmware version: 12.17.2052
Hardware version: 0
Node GUID: 0x98039b0300bec200
System image GUID: 0x98039b0300bec200
Port 1:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 7
LMC: 0
SM lid: 3
Capability mask: 0x2651e848
Port GUID: 0x98039b0300bec200
Link layer: InfiniBand
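The Base lid shown by ibstat is the value ibping needs as its -L argument. A sketch that parses it out, assuming the `Base lid:` line format shown above; `base_lid` is a hypothetical helper name:

```shell
# Sketch: pull the Base lid out of `ibstat` output for use with `ibping -L`.
base_lid() {
    awk -F': ' '/Base lid:/ { print $2; exit }'
}

# Sample lines taken from the output above; on a real system:
#   ibstat mlx5_0 | base_lid
sample='Port 1:
        State: Active
        Base lid: 7
        SM lid: 3'
lid=$(printf '%s\n' "$sample" | base_lid)
echo "$lid"
```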
iftop -i eth0 -n
# press L for real-time traffic monitoring
sudo /etc/init.d/openibd restart
# restart the IB service
sudo /etc/init.d/opensmd start
# start the subnet manager
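The restart step above can be wrapped in a small health-check sketch. This is a dry run that only prints the command it would execute; `maybe_restart` is a hypothetical helper name, and the init-script path is the one used in this setup:

```shell
# Sketch: decide whether the IB stack needs a restart based on the port state.
# Dry run: prints the restart command instead of executing it.
maybe_restart() {
    state="$1"
    if [ "$state" != "Active" ]; then
        echo "sudo /etc/init.d/openibd restart"
    else
        echo "port is Active, nothing to do"
    fi
}

# On a real system the state would come from something like:
#   ibstat mlx5_0 | awk -F': ' '/State:/ { print $2; exit }'
maybe_restart "Down"
maybe_restart "Active"
```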
Testing connectivity
Use a simple ping program, such as ibping from the infiniband-diags package, to test RDMA connectivity. ibping (requires root privileges) uses a client/server model: first start an ibping server on one machine, then run ibping as a client on another machine and point it at the server.
Server side
sudo ibping -S -C mlx5_0 -P 1
# produces no output
-S: run as a server
-C: the CA name, from the ibstat output
-P: the port number, from the ibstat output
Client side
sudo ibping -c 10000 -f -C mlx5_0 -P 1 -L 7
-c: stop after sending 10000 packets
-f: flood destination
-C: the CA name, from the ibstat output
-P: the port number given as -P when ibping was started on the server
-L: the Base lid of the server port selected by -P (see ibstat on the server); here it is 7
--- master.master (Lid 7) ibping statistics ---
10000 packets transmitted, 10000 received, 0% packet loss, time 1014 ms
rtt min/avg/max = 0.004/0.101/900.009 ms
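For automated connectivity tests, the packet-loss percentage can be read off the summary line ibping prints. A sketch assuming the summary-line format shown above; `packet_loss` is a hypothetical helper name:

```shell
# Sketch: extract the packet-loss percentage from an ibping summary line.
packet_loss() {
    sed -n 's/.* \([0-9][0-9]*\)% packet loss.*/\1/p'
}

# Summary line taken from the run above.
summary='10000 packets transmitted, 10000 received, 0% packet loss, time 1014 ms'
loss=$(printf '%s\n' "$summary" | packet_loss)
echo "loss=${loss}%"
if [ "$loss" -eq 0 ]; then echo "connectivity OK"; fi
```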
Testing bandwidth
master: ib_write_bw
************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
RDMA_Write BW Test
Dual-port : OFF Device : mlx5_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: ON
ibv_wr* API : ON
CQ Moderation : 1
Mtu : 4096[B]
Link type : IB
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0x07 QPN 0x1d5e4 PSN 0x217da0 RKey 0x014638 VAddr 0x007fb994704000
remote address: LID 0x08 QPN 0x174d PSN 0x98825a RKey 0x1403f5 VAddr 0x007f5b83c62000
---------------------------------------------------------------------------------------
#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
65536 5000 3775.40 3775.34 0.060406
---------------------------------------------------------------------------------------
node1: ib_write_bw 192.168.3.60
---------------------------------------------------------------------------------------
RDMA_Write BW Test
Dual-port : OFF Device : mlx5_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: ON
ibv_wr* API : ON
TX depth : 128
CQ Moderation : 1
Mtu : 4096[B]
Link type : IB
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0x08 QPN 0x174d PSN 0x98825a RKey 0x1403f5 VAddr 0x007f5b83c62000
remote address: LID 0x07 QPN 0x1d5e4 PSN 0x217da0 RKey 0x014638 VAddr 0x007fb994704000
---------------------------------------------------------------------------------------
#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
Conflicting CPU frequency values detected: 1200.836000 != 1976.390000. CPU Frequency is not max.
65536 5000 3775.40 3775.34 0.060406
---------------------------------------------------------------------------------------
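The "BW average" column of the result table above can be extracted for regression checks. A sketch; `bw_average` is a hypothetical helper name, and the parsing assumes the five-column result-row layout shown above:

```shell
# Sketch: pull "BW average[MB/sec]" (4th column of the numeric result row)
# out of ib_write_bw output.
bw_average() {
    awk '$1 ~ /^[0-9]+$/ && NF == 5 { print $4 }'
}

# Result row taken from the run above; on a real run:
#   ib_write_bw 192.168.3.60 | bw_average
result=' 65536      5000             3775.40            3775.34            0.060406'
avg=$(printf '%s\n' "$result" | bw_average)
echo "average bandwidth: ${avg} MB/sec"
```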
The same procedure applies to ib_read_bw.
Note
The Linux kernel version must be 5.4.0-26-generic.
If the kernel version does not match, refer to https://linux.how2shout.com/how-to-install-linux-kernal-5-19-on-ubuntu-22-04-or-20-04/ to change the kernel to 5.4.0-26-generic.
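A quick check that the running kernel matches can be scripted. A minimal sketch; the required version string comes from the note above, and `check_kernel` is a hypothetical helper name:

```shell
# Sketch: verify the running kernel against the required version.
REQUIRED="5.4.0-26-generic"

check_kernel() {
    if [ "$1" = "$REQUIRED" ]; then
        echo "kernel OK"
    else
        echo "kernel mismatch: $1 (need $REQUIRED)"
    fi
}

# On a real system:  check_kernel "$(uname -r)"
check_kernel "5.4.0-26-generic"
check_kernel "5.19.0-generic"
```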
NFS service management
sudo systemctl restart nfs-server
exportfs -av
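With the exports active, a client on the IB fabric can mount the share over RDMA. A dry-run sketch: the export path /export and mount point /mnt/nfs are placeholders, 192.168.3.60 is the server address used in the bandwidth test above, 20049 is the standard NFS-over-RDMA port, and the server must have the RDMA transport enabled in its NFS configuration:

```shell
# Sketch (dry run): mount an NFS export over RDMA.
# /export and /mnt/nfs are placeholder paths for this setup.
opts="rdma,port=20049"
cmd="sudo mount -t nfs -o ${opts} 192.168.3.60:/export /mnt/nfs"
echo "$cmd"   # printed instead of executed
```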