
InfiniBand and RDMA Networks

Last published: 2023-11-19 21:20:18

https://blog.csdn.net/m0_37201243/article/details/108655015

生信小木屋

Terminology

  • InfiniBand: the physical link-layer protocol of InfiniBand networks
  • InfiniBand Verbs API: an implementation of RDMA (remote direct memory access) technology

Basic commands

ibv_devices lists all RDMA devices currently present in the system

   device                 node GUID
    ------              ----------------
    mlx5_0              98039b0300bec200

ibv_devinfo -d mlx5_0 shows detailed information for one device

hca_id: mlx5_0
        transport:                      InfiniBand (0)
        fw_ver:                         12.17.2052
        node_guid:                      9803:9b03:00be:c200
        sys_image_guid:                 9803:9b03:00be:c200
        vendor_id:                      0x02c9
        vendor_part_id:                 4115
        hw_ver:                         0x0
        board_id:                       DEL2180110032
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 7
                        port_lid:               7
                        port_lmc:               0x00
                        link_layer:             InfiniBand
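
The same GUID is printed in three notations above and below: ibv_devices shows plain hex, ibv_devinfo groups it with colons, and ibstat adds a 0x prefix. When comparing values across tools, a small normalizer can help. The following is a minimal sketch; the function name normalize_guid is hypothetical and not part of any InfiniBand tool.

```python
def normalize_guid(text):
    """Reduce a GUID string to 16 lowercase hex digits.

    Accepts the three notations seen in this article:
    '98039b0300bec200', '9803:9b03:00be:c200', '0x98039b0300bec200'.
    """
    s = text.strip().lower()
    if s.startswith("0x"):
        s = s[2:]
    s = s.replace(":", "")
    int(s, 16)  # raises ValueError if the remainder is not hexadecimal
    return s.zfill(16)


# The three spellings from ibv_devices, ibv_devinfo, and ibstat.
samples = ["98039b0300bec200", "9803:9b03:00be:c200", "0x98039b0300bec200"]
print({normalize_guid(s) for s in samples})
```

All three spellings reduce to the same string, so the set printed at the end contains a single element.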

ibstat prints information about the IB adapter

CA 'mlx5_0'
        CA type: MT4115
        Number of ports: 1
        Firmware version: 12.17.2052
        Hardware version: 0
        Node GUID: 0x98039b0300bec200
        System image GUID: 0x98039b0300bec200
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 56
                Base lid: 7
                LMC: 0
                SM lid: 3
                Capability mask: 0x2651e848
                Port GUID: 0x98039b0300bec200
                Link layer: InfiniBand
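
The fields that matter most later (State, Rate, Base lid) can be pulled out of the ibstat text with a few lines of Python. This is a rough sketch that assumes the layout shown above; the function name parse_ibstat and the abridged sample string are illustrative only, and real output may contain fields this parser has not been tried against.

```python
import re

# Abridged copy of the ibstat output shown above.
IBSTAT_SAMPLE = """\
CA 'mlx5_0'
        CA type: MT4115
        Number of ports: 1
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 56
                Base lid: 7
                Port GUID: 0x98039b0300bec200
                Link layer: InfiniBand
"""


def parse_ibstat(text):
    """Return {ca_name: {port_number: {field: value}}} from ibstat-style text."""
    result = {}
    ca = port = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"CA '(.+)'$", line)
        if m:  # start of a new adapter section
            ca, port = m.group(1), None
            result[ca] = {}
            continue
        m = re.match(r"Port (\d+):$", line)
        if m and ca is not None:  # start of a port section
            port = int(m.group(1))
            result[ca][port] = {}
            continue
        if port is not None and ":" in line:  # "Key: value" inside a port
            key, _, value = line.partition(":")
            result[ca][port][key.strip()] = value.strip()
    return result


port1 = parse_ibstat(IBSTAT_SAMPLE)["mlx5_0"][1]
print(port1["State"], port1["Physical state"], port1["Rate"], port1["Base lid"])
```

The Base lid value extracted here is what the ibping client needs for its -L option.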

  • iftop -i eth0 -n, then press L: view network traffic in real time
  • sudo /etc/init.d/openibd restart: restart the IB service
  • sudo /etc/init.d/opensmd start: start the subnet manager

Testing connectivity

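Use a simple ping program, such as ibping from the infiniband-diags package, to test RDMA connectivity. ibping requires root privileges and uses a client/server model: first start the ibping server on one machine, then run ibping as a client on another machine and point it at the server.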

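Server side

sudo ibping -S -C mlx5_0 -P 1 # produces no output
-S: run in server mode
-C: the CA name, taken from the ibstat output
-P: the port number, taken from the ibstat output

Client side

sudo ibping -c 10000 -f -C mlx5_0 -P 1 -L 7
-c: stop after sending 10000 packets
-f: flood the destination
-C: the CA name, taken from the ibstat output
-P: the port number of the local CA to use, taken from the ibstat output (here it is the same as the port used on the server)
-L: the Base lid of the port the server is listening on (the server's -P port); check it with ibstat on the server. Here it is 7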

--- master.master (Lid 7) ibping statistics ---
10000 packets transmitted, 10000 received, 0% packet loss, time 1014 ms
rtt min/avg/max = 0.004/0.101/900.009 ms
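
For scripted checks, the summary above can be turned into numbers. The sketch below assumes the summary has exactly the two-line shape shown here; parse_ibping is a hypothetical helper name, not something shipped with infiniband-diags.

```python
import re

# Copy of the ibping summary shown above.
IBPING_SAMPLE = """\
--- master.master (Lid 7) ibping statistics ---
10000 packets transmitted, 10000 received, 0% packet loss, time 1014 ms
rtt min/avg/max = 0.004/0.101/900.009 ms
"""


def parse_ibping(text):
    """Extract packet counts and round-trip times (ms) from an ibping summary."""
    counts = re.search(r"(\d+) packets transmitted, (\d+) received", text)
    rtt = re.search(r"rtt min/avg/max = ([\d.]+)/([\d.]+)/([\d.]+) ms", text)
    if not counts or not rtt:
        raise ValueError("text does not look like an ibping summary")
    sent, received = int(counts.group(1)), int(counts.group(2))
    rtt_min, rtt_avg, rtt_max = (float(x) for x in rtt.groups())
    # Recompute the loss from the counts instead of trusting the rounded percentage.
    loss_pct = 100.0 * (sent - received) / sent if sent else 0.0
    return {
        "sent": sent,
        "received": received,
        "loss_pct": loss_pct,
        "rtt_min_ms": rtt_min,
        "rtt_avg_ms": rtt_avg,
        "rtt_max_ms": rtt_max,
    }


stats = parse_ibping(IBPING_SAMPLE)
print(stats)
```

In this run the loss is zero and the average round trip is about 0.1 ms; the maximum of roughly 900 ms is a single outlier worth noting when reading the average.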

Testing bandwidth

master: ib_write_bw


************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 CQ Moderation   : 1
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x07 QPN 0x1d5e4 PSN 0x217da0 RKey 0x014638 VAddr 0x007fb994704000
 remote address: LID 0x08 QPN 0x174d PSN 0x98825a RKey 0x1403f5 VAddr 0x007f5b83c62000
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
 65536      5000             3775.40            3775.34            0.060406
---------------------------------------------------------------------------------------

node1: ib_write_bw 192.168.3.60

---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 TX depth        : 128
 CQ Moderation   : 1
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x08 QPN 0x174d PSN 0x98825a RKey 0x1403f5 VAddr 0x007f5b83c62000
 remote address: LID 0x07 QPN 0x1d5e4 PSN 0x217da0 RKey 0x014638 VAddr 0x007fb994704000
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
Conflicting CPU frequency values detected: 1200.836000 != 1976.390000. CPU Frequency is not max.
 65536      5000             3775.40            3775.34            0.060406
---------------------------------------------------------------------------------------

ib_read_bw is used in the same way.
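
To compare the ib_write_bw result with the link rate reported by ibstat, the MB/sec figure has to be converted to Gb/s. The sketch below rests on two assumptions: that MB in the report means 2^20 bytes (this is consistent with the MsgRate column, since 3775.34 × 2^20 / 65536 ≈ 0.0604 million messages per second), and that the ibstat Rate of 56 is in Gb/s. The function name bw_summary is illustrative only.

```python
MIB = 2 ** 20  # assumption: "MB" in the report means 2^20 bytes


def bw_summary(bw_avg_mb_per_s, msg_bytes, link_rate_gbps):
    """Convert a MB/sec average into Gb/s, Mpps, and a fraction of the link rate."""
    bytes_per_s = bw_avg_mb_per_s * MIB
    gbps = bytes_per_s * 8 / 1e9          # payload bits per second, in Gb/s
    mpps = bytes_per_s / msg_bytes / 1e6  # messages per second, in millions
    return gbps, mpps, gbps / link_rate_gbps


# Values taken from the report above and from the ibstat Rate field.
gbps, mpps, fraction = bw_summary(3775.34, 65536, 56)
print(f"{gbps:.2f} Gb/s, {mpps:.6f} Mpps, {fraction:.1%} of the nominal rate")
```

Under these assumptions the measured average works out to about 31.7 Gb/s, a little over half of the nominal 56 Gb/s.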

Note

The Linux kernel version must be 5.4.0-26-generic

5.4.0-26-generic

If the kernel version does not match, follow https://linux.how2shout.com/how-to-install-linux-kernal-5-19-on-ubuntu-22-04-or-20-04/ to change the kernel to 5.4.0-26-generic
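
The kernel requirement can be checked before any of the IB tools are run. This is a small sketch using Python's standard platform module; the constant REQUIRED_KERNEL and the function kernel_matches are names chosen for illustration.

```python
import platform

# The version this article requires.
REQUIRED_KERNEL = "5.4.0-26-generic"


def kernel_matches(release, required=REQUIRED_KERNEL):
    """Return True when the kernel release string equals the required version."""
    return release.strip() == required


current = platform.release()  # same string that `uname -r` prints on Linux
if kernel_matches(current):
    print(f"kernel {current}: OK")
else:
    print(f"kernel {current}: expected {REQUIRED_KERNEL}")
```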