...
Description of problem:

When tested on a BCM57414 device, all mvapich2 benchmarks fail with the following error message:

[rdma-qe-24.rdma.lab.eng.rdu2.redhat.com:mpi_rank_0][post_srq_send] ./src/mpid/ch3/channels/mrail/src/gen2/ibv_send_inline.h:677: ibv_post_sr (post_send_desc): Invalid argument (22)
[rdma-qe-25.rdma.lab.eng.rdu2.redhat.com:mpi_rank_1][handle_cqe] Send desc error in msg to 0, wc_opcode=0
[rdma-qe-25.rdma.lab.eng.rdu2.redhat.com:mpi_rank_1][handle_cqe] Msg from 0: wc.status=12 (transport retry counter exceeded), wc.wr_id=0x55555daf4040, wc.opcode=0, vbuf->phead->type=0 = MPIDI_CH3_PKT_EAGER_SEND
[rdma-qe-25.rdma.lab.eng.rdu2.redhat.com:mpi_rank_1][mv2_print_wc_status_error] IBV_WC_RETRY_EXC_ERR: This event is generated when a sender is unable to receive feedback from the receiver. This means that either the receiver just never ACKs sender messages in a specified time period, or it has been disconnected or it is in a bad state which prevents it from responding. If this happens when sending the first message, usually it means that the QP connection attributes are wrong or the remote side is not in a state that it can respond to messages. If this happens after sending the first message, usually it means that the remote QP is not available anymore or that there is congestion in the network preventing the packets from reaching on time. Relevant to: RC or DC QPs.
[rdma-qe-25.rdma.lab.eng.rdu2.redhat.com:mpi_rank_1][handle_cqe] src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:499: [] Got completion with error 12, vendor code=0x0, dest rank=0 : Invalid argument (22)

Version-Release number of selected component (if applicable):

DISTRO=RHEL-8.7.0-20220424.1
+ [22-05-05 13:31:32] cat /etc/redhat-release
Red Hat Enterprise Linux release 8.7 Beta (Ootpa)
+ [22-05-05 13:31:32] uname -a
Linux rdma-qe-25.rdma.lab.eng.rdu2.redhat.com 4.18.0-384.el8.x86_64 #1 SMP Wed Apr 20 16:08:47 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux
+ [22-05-05 13:31:32] cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-384.el8.x86_64 root=/dev/mapper/rhel_rdma-qe25-root ro crashkernel=auto resume=/dev/mapper/rhel_rdmaqe-25-swap rd.lvm.lv=rhel_rdma-qe-25/root rd.lvm.lv=rhel_rdma-qe-25/swap console=ttyS0,115200n81
+ [22-05-05 13:31:32] rpm -q rdma-core linux-firmware
rdma-core-37.2-1.el8.x86_64
linux-firmware-20220210-107.git6342082c.el8.noarch

Installed:
mpitests-mvapich2-5.8-1.el8.x86_64
mvapich2-2.3.6-1.el8.x86_64

+ [22-05-05 13:31:32] tail /sys/class/infiniband/bnxt_re0/fw_ver /sys/class/infiniband/bnxt_re1/fw_ver /sys/class/infiniband/bnxt_re2/fw_ver /sys/class/infiniband/bnxt_re3/fw_ver
==> /sys/class/infiniband/bnxt_re0/fw_ver <==
20.8.30.0
==> /sys/class/infiniband/bnxt_re1/fw_ver <==
20.8.30.0
==> /sys/class/infiniband/bnxt_re2/fw_ver <==
216.0.51.0
==> /sys/class/infiniband/bnxt_re3/fw_ver <==
216.0.51.0

+ [22-05-05 13:31:32] lspci
+ [22-05-05 13:31:32] grep -i -e ethernet -e infiniband -e omni -e ConnectX
01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
1a:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)
1a:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)
5e:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01)
5e:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01)

How reproducible:
100%

Steps to Reproduce:
1. Bring up the RDMA hosts mentioned above with a RHEL 8.7 build.
2. Set up the RDMA hosts for mvapich2 benchmark tests.
3. Run one of the mvapich2 benchmarks with the "mpirun" command, as follows:
   timeout --preserve-status --kill-after=5m 3m mpirun -hostfile /root/hfile_one_core -np 2 mpitests-IMB-MPI1 PingPong -time 1.5

Actual results:

[rdma-qe-24.rdma.lab.eng.rdu2.redhat.com:mpi_rank_0][rdma_param_handle_heterogeneity] All nodes involved in the job were detected to be homogeneous in terms of processors and interconnects. Setting MV2_HOMOGENEOUS_CLUSTER=1 can improve job startup performance on such systems. The following link has more details on enhancing job startup performance. http://mvapich.cse.ohio-state.edu/performance/job-startup/.
[rdma-qe-24.rdma.lab.eng.rdu2.redhat.com:mpi_rank_0][rdma_param_handle_heterogeneity] To suppress this warning, please set MV2_SUPPRESS_JOB_STARTUP_PERFORMANCE_WARNING to 1
[src/mpid/ch3/channels/mrail/src/gen2/rdma_iba_priv.c:2118] Could not modify qp to RTR
#----------------------------------------------------------------
# Intel(R) MPI Benchmarks 2021.3, MPI-1 part
#----------------------------------------------------------------
# Date                  : Thu May 5 13:31:41 2022
# Machine               : x86_64
# System                : Linux
# Release               : 4.18.0-384.el8.x86_64
# Version               : #1 SMP Wed Apr 20 16:08:47 EDT 2022
# MPI Version           : 3.1
# MPI Thread Environment:

# Calling sequence was:
# mpitests-IMB-MPI1 PingPong -time 1.5

# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype                   : MPI_BYTE
# MPI_Datatype for reductions    : MPI_FLOAT
# MPI_Op                         : MPI_SUM
#
# List of Benchmarks to run:
# PingPong

[0 => 1]: post_srq_send(ibv_post_sr (post_send_desc)): ret=22, errno=22: failed while avail wqe is 63, rail 0
IBV_POST_SR err:: : Invalid argument
[rdma-qe-24.rdma.lab.eng.rdu2.redhat.com:mpi_rank_0][post_srq_send] ./src/mpid/ch3/channels/mrail/src/gen2/ibv_send_inline.h:677: ibv_post_sr (post_send_desc): Invalid argument (22)
[rdma-qe-25.rdma.lab.eng.rdu2.redhat.com:mpi_rank_1][handle_cqe] Send desc error in msg to 0, wc_opcode=0
[rdma-qe-25.rdma.lab.eng.rdu2.redhat.com:mpi_rank_1][handle_cqe] Msg from 0: wc.status=12 (transport retry counter exceeded), wc.wr_id=0x55d2742dc040, wc.opcode=0, vbuf->phead->type=0 = MPIDI_CH3_PKT_EAGER_SEND
[rdma-qe-25.rdma.lab.eng.rdu2.redhat.com:mpi_rank_1][mv2_print_wc_status_error] IBV_WC_RETRY_EXC_ERR: This event is generated when a sender is unable to receive feedback from the receiver. This means that either the receiver just never ACKs sender messages in a specified time period, or it has been disconnected or it is in a bad state which prevents it from responding. If this happens when sending the first message, usually it means that the QP connection attributes are wrong or the remote side is not in a state that it can respond to messages. If this happens after sending the first message, usually it means that the remote QP is not available anymore or that there is congestion in the network preventing the packets from reaching on time. Relevant to: RC or DC QPs.
[rdma-qe-25.rdma.lab.eng.rdu2.redhat.com:mpi_rank_1][handle_cqe] src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:499: [] Got completion with error 12, vendor code=0x0, dest rank=0 : Invalid argument (22)

Expected results:
Normal execution with benchmark test stats in the output.

Additional info:
The same result occurs on the RHEL 8.6 build as well.
Won't Do