...
Description of problem: During "qperf" tests, a lot of tracebacks were observed on the RDMA client host console. Also, there was no indication of any "qperf" test cases ran with data outputs. This is a regression from RHEL-8.8.0-20230228.22 build. Version-Release number of selected component (if applicable): Clients: rdma-dev-03 Servers: rdma-dev-02 DISTRO=RHEL-8.9.0-20230531.26 + [23-05-31 18:23:09] cat /etc/redhat-release Red Hat Enterprise Linux release 8.9 Beta (Ootpa) + [23-05-31 18:23:09] uname -a Linux rdma-dev-03.rdma.lab.eng.rdu2.redhat.com 4.18.0-494.el8.x86_64 #1 SMP Mon May 22 11:16:32 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux + [23-05-31 18:23:09] cat /proc/cmdline BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-494.el8.x86_64 root=UUID=f2550ab5-318d-4fc6-9720-3d753ef5b845 ro console=tty0 rd_NO_PLYMOUTH intel_iommu=on iommu=on crashkernel=auto resume=UUID=b36d9352-dc3a-4059-b5aa-6cb606ad4bce console=ttyS1,115200 + [23-05-31 18:23:09] rpm -q rdma-core linux-firmware rdma-core-44.0-2.el8.1.x86_64 linux-firmware-20230515-115.gitd1962891.el8.noarch + [23-05-31 18:23:09] tail /sys/class/infiniband/qedr0/fw_ver /sys/class/infiniband/qedr1/fw_ver ==> /sys/class/infiniband/qedr0/fw_ver <== 8. 59. 1. 0 ==> /sys/class/infiniband/qedr1/fw_ver <== 8. 59. 1. 0 + [23-05-31 18:23:09] lspci + [23-05-31 18:23:09] grep -i -e ethernet -e infiniband -e omni -e ConnectX 02:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe 02:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe 08:00.0 Ethernet controller: QLogic Corp. FastLinQ QL45000 Series 25GbE Controller (rev 10) 08:00.1 Ethernet controller: QLogic Corp. FastLinQ QL45000 Series 25GbE Controller (rev 10) How reproducible: 100% Steps to Reproduce: 1. On the RDMA server host, issue qperf 2. On the RDMA client host, issue qperf -v -i qedr1:1 -cm 1 172.31.50.102 rc_bi_bw qperf -v -i qedr1:1 -cm 1 172.31.50.102 rc_bw qperf -v -i qedr1:1 -cm 1 172.31.50.102 rc_lat qperf -v -i qedr1:1 -cm 1 172.31.50.102 rc_rdma_read_bw qperf -v -i qedr1:1 -cm 1 172.31.50.102 rc_compare_swap_mr qperf -v -i qedr1:1 -cm 1 172.31.50.102 rc_fetch_add_mr qperf -v -i qedr1:1 -cm 1 172.31.50.102 ver_rc_compare_swap qperf -v -i qedr1:1 -cm 1 172.31.50.102 ver_rc_fetch_add Actual results: The actual tracebacks are at with "======= Start of the tracebacks ==============" below. [ 3398.097186] restraintd[1961]: ** Completed Task : 161077355 [ 3398.520118] restraintd[1961]: ** Fetching task: 161077356 [/mnt/tests/gerrit-git.engineering.redhat.com/kernel-tests/infiniband/qperf] [ 3398.893610] restraintd[1961]: ** Preparing metadata [ 3399.143978] restraintd[1961]: ** Refreshing peer role hostnames: Retries 0 [ 3399.374951] restraintd[1961]: ** Updating env vars [ 3399.375063] restraintd[1961]: ** Updating external watchdog: 5400 seconds [ 3399.721661] restraintd[1961]: ** Installing dependencies [ 3401.751084] restraintd[1961]: ** Running task: 161077356 [/kernel/infiniband/qperf] [ 3413.601870] [qed_pglueb_rbc_attn_handler:319(qede_iw-0)]ZLR error - 031ff806 [Address 00000002:7d037000] [ 3413.601873] [qed_pglueb_rbc_attn_handler:319(qede_roce-0)]ZLR error - 031ff806 [Address 00000002:7d037000] [ 3413.612490] [qed_pglueb_rbc_attn_handler:335(qede_iw-0)]ILT error - Details ff001026 Details2 00e00020 [Address 00000000:00010000] [ 3413.623290] [qed_pglueb_rbc_attn_handler:335(qede_roce-0)]ILT error - Details ff001026 Details2 00e00020 [Address 00000000:00010000] [ 3413.636424] [qed_dbg_parse_attn:7812(qede_iw-0)]pglue_b (Interrupt) : master_zlr_attn [address 0x00aa0630, bit 10] [ 3413.649746] [qed_dbg_parse_attn:7812(qede_roce-0)]pglue_b (Interrupt) : pgl_pcie_attn [address 0x00aa0630, bit 19] [masked] [ 3413.661331] [qed_dbg_parse_attn:7812(qede_iw-0)]pglue_b (Interrupt) : pgl_pcie_attn [address 0x00aa0630, bit 19] [masked] [ 3413.686063] [qed_dbg_parse_attn:7812(qede_iw-0)]pglue_b (Interrupt) : vf_ilt_err [address 0x00aa0630, bit 22] [masked] [ 3430.649853] [qed_hw_err_notify:830(qede_roce-0)]Ramrod is stuck [CID ff000007 ROCE_RAMROD_MODIFY_QP:0e PROTOCOLID_ROCE:02 echo 04fa] [ 3430.663190] [qed_hw_error_occurred:2775(qede_roce-0)]HW error occurred [Ramrod Failure] [ 3430.672148] [qede_atomic_hw_err_handler:2691(qede_roce)]Generic non-sleepable HW error handling started - err_flags 0x80000007 [ 3430.684895] [qede_atomic_hw_err_handler:2700(qede_roce)]Generic non-sleepable HW error handling is done [ 3430.698248] [qede_generic_hw_err_handler:2707(qede_roce)]Generic sleepable HW error handling started - err_flags 0x80000007 [ 3430.710701] [qede_generic_hw_err_handler:2710(qede_roce)]Reporting fatal error to devlink [ 3430.720730] [qed_dbg_dump:8021(qede_roce)]Collecting a debug feature ["idle_chk"] [ 3430.736248] [ 3430.736248] fw-version: 8_59_1_1 [ 3430.736248] fw-image: main [ 3430.736248] mfw-version: 8_34_8_0 [ 3430.736248] chip-revision: b0 [ 3430.736248] tools-version: 11 [ 3430.736248] chip: bb [ 3430.736248] platform: asic [ 3430.736248] pci-func: 0 [ 3430.736248] epoch: 1685571838 [ 3430.736248] path: 0 [ 3430.736248] dump-type: idle-chk [ 3430.736248] [ 3430.736248] FW_IDLE_CHECK: [ 3430.736248] Error if no traffic: PSWRQ2: sr_cnt is not equal to sr_num_cfg. Registers: PSWRQ2_REG_SR_CNT=0x7e PSWRQ2_REG_SR_NUM_CFG=0x7f [ 3430.736248] Error if no traffic: PSWRQ2: SR_CNT_per_vq6 is not equal to MAX_SRS_vq. Registers: PSWRQ2_REG_SR_CNT_PER_VQ_6=0x7e PSWRQ2_REG_MAX_SRS_VQ6=0x7f [ 3430.736248] Error if no traffic: PSWRQ2: blk_cnt is not equal to blk_num_cfg. Registers: PSWRQ2_REG_BLK_CNT=0x120 PSWRQ2_REG_BLK_NUM_CFG=0x140 [ 3430.736248] Error if no traffic: PSWRQ2: There are more than eight used blocks. Registers: PSWRQ2_REG_BLK_CNT=0x120 PSWRQ2_REG_BLK_NUM_CFG=0x140 [ 3430.736257] ing: PBF: Packet/s were sent to BTB with 'drop' indication. Registers: PBF_REG_NUM_PKTS_SENT_WITH_DROP_TO_BTB=0xc2 [ 3430.736257] Error if no traffic: XSDM: There are outstanding PXP requests. Registers: XSDM_REG_DST_PXP_SRC_PEND_EMPTY=0x0 [ 3430.736257] Error if no traffic: XSDM: The PXP requests linked-list is not empty. Registers: XSDM_REG_DST_PXP_LINK_EMPTY=0x0 [ 3430.736257] Error if no traffic: XCM: There are messages in the XX RAM. Registers: XCM_REG_XX_FREE_CNT=0x3f [ 3430.736257] Error if no traffic: XCM: XX CAM has active lcids. Registers: XCM_REG_XX_LCID_CAM_FILL_LVL=0x1 [ 3430.736257] Error if no traffic: XCM: has locked lcids in xxLock. Registers: XCM_REG_XX_LOCK_CNT=0x1 [ 3430.736257] Error if no traffic: XSEM: Thread not freed. Registers: XSEM_REG_NUM_OF_THREADS=0x17 [ 3430.736257] Error if no traffic: QM: VOQ LINE credit on LB queues is not equal to init credit. Registers: QM_REG_VOQCRDLINE[16]=0x80d0 QM_REG_VOQINITCRDLINE[16]=0x8124 MISC_REG_PORT_MODE=0x0 [ 3430.835691] dit. Registers: QM_REG_VOQCRDLINE[0]=0x83aa QM_REG_VOQINITCRDLINE[0]=0x841a MISC_REG_PORT_MODE=0x0 [ 3430.835691] Error if no traffic: QM: VOQ BYTE credit on LB queues is not equal to init credit. Registers: QM_REG_VOQCRDBYTE[16]=0x815a1c QM_REG_VOQINITCRDBYTE[16]=0x818000 MISC_REG_PORT_MODE=0x0 [ 3430.835691] Error if no traffic: QM: VOQ BYTE credit on NW queues is not equal to init credit. Registers: QM_REG_VOQCRDBYTE[0]=0x815a1c QM_REG_VOQINITCRDBYTE[0]=0x818000 MISC_REG_PORT_MODE=0x0 [ 3430.835691] Error if no traffic: XCM: QM Active State Counter is not 0. Registers: XCM_REG_QM_ACT_ST_CNT[4]=0x2 [ 3430.835691] Error if no traffic: XCM: QM Active State Counter is not 0. Registers: XCM_REG_QM_ACT_ST_CNT[9]=0x1 [ 3430.835691] [ 3430.835691] LSI_IDLE_CHECK: [ 3430.835691] Error if no traffic: PSWRQ2: sr_cnt is not equal to sr_num_cfg. Registers: PSWRQ2_REG_SR_CNT=0x7e PSWRQ2_REG_SR_NUM_CFG=0x7f [ 3430.936354] traffic: PSWRQ2: blk_cnt is not equal to blk_num_cfg. Registers: PSWRQ2_REG_BLK_CNT=0x120 PSWRQ2_REG_BLK_NUM_CFG=0x140 [ 3430.936354] Error if no traffic: PSWRQ2: There are more than eight used blocks. Registers: PSWRQ2_REG_BLK_CNT=0x120 PSWRQ2_REG_BLK_NUM_CFG=0x140 [ 3430.936354] Error if no traffic: PSWRQ2: blk_cnt_per_vq6 is not equal to max_blks_vq. Registers: PSWRQ2_REG_BLK_CNT_PER_VQ_6=0x120 PSWRQ2_REG_MAX_BLKS_VQ6=0x140 [ 3430.936354] Error: PGLUE: PCIE error status is set. Registers: PGLUE_B_REG_PCIE_ERR_STATUS=0x4 [ 3430.936354] Warning: PBF: Packet/s were sent to BTB with 'drop' indication. Registers: PBF_REG_NUM_PKTS_SENT_WITH_DROP_TO_BTB=0xc2 [ 3430.936354] Error if no traffic: XSDM: XSDM_REG_DST_PXP_SRC_PEND_EMPTY = 0. Registers: XSDM_REG_DST_PXP_SRC_PEND_EMPTY=0x0 [ 3430.936354] Error if no traffic: XSDM: XSDM_REG_DST_PXP_LINK_EMPTY = 0. Registers: XSDM_REG_DST_PXP_LINK_EMPTY=0x0 [ 3431.029799] n CAM is not empty. Registers: XCM_REG_XX_LCID_CAM_FILL_LVL=0x1 [ 3431.029799] Error if no traffic: XCM: there is locked LCID in XX. Registers: XCM_REG_XX_LOCK_CNT=0x1 [ 3431.029799] Error if no traffic: XSEM: Thread not freed. Registers: XSEM_REG_NUM_OF_THREADS=0x17 [ 3431.029799] Error if no traffic: QM: VOQ LINE credit is not equal to init credit. Registers: QM_REG_VOQCRDLINE[16]=0x80d0 QM_REG_VOQINITCRDLINE[16]=0x8124 MISC_REG_PORT_MODE=0x0 [ 3431.029799] Error if no traffic: QM: VOQ LINE credit is not equal to init credit. Registers: QM_REG_VOQCRDLINE[0]=0x83aa QM_REG_VOQINITCRDLINE[0]=0x841a MISC_REG_PORT_MODE=0x0 [ 3431.029799] Error if no traffic: QM: VOQ BYTE credit is not equal to init credit. Registers: QM_REG_VOQCRDBYTE[16]=0x815a1c QM_REG_VOQINITCRDBYTE[16]=0x818000 MISC_REG_PORT_MODE=0x0 [ 3431.029799] Error if no traffic: QM: VOQ BYTE credit is not equal to init credit. Registers: QM_REG_VOQCRDBYTE[0]=0x815a1c QM_REG_VOQINITCRDBYTE[0]=0x818000 MISC_REG_PORT_MODE=0x0 [ 3431.123025] ot 0. Registers: XCM_REG_QM_ACT_ST_CNT[4]=0x2 [ 3431.123025] Error if no traffic: XCM: QM Active State Counter is not 0. Registers: XCM_REG_QM_ACT_ST_CNT[9]=0x1 [ 3431.123025] [ 3431.123025] Idle Check failed!!! (with 18 errors and 1 warnings) [ 3431.123025] [ 3431.123025] [ 3431.123025] [ 3431.123025] [ 3431.260135] [ 3431.262119] [qed_dbg_dump:8021(qede_roce)]Collecting a debug feature ["idle_chk"] [ 3431.277930] [ 3431.277930] fw-version: 8_59_1_1 [ 3431.277930] fw-image: main [ 3431.277930] mfw-version: 8_34_8_0 [ 3431.277930] chip-revision: b0 [ 3431.277930] tools-version: 11 [ 3431.277930] chip: bb [ 3431.277930] platform: asic [ 3431.277930] pci-func: 0 [ 3431.277930] epoch: 1685571839 [ 3431.277930] path: 0 [ 3431.277930] dump-type: idle-chk [ 3431.277930] [ 3431.277930] FW_IDLE_CHECK: [ 3431.277930] Error if no traffic: PSWRQ2: sr_cnt is not equal to sr_num_cfg. Registers: PSWRQ2_REG_SR_CNT=0x7e PSWRQ2_REG_SR_NUM_CFG=0x7f [ 3431.277930] Error if no traffic: PSWRQ2: SR_CNT_per_vq6 is not equal to MAX_SRS_vq. Registers: PSWRQ2_REG_SR_CNT_PER_VQ_6=0x7e PSWRQ2_REG_MAX_SRS_VQ6=0x7f [ 3431.277930] Error if no traffic: PSWRQ2: blk_cnt is not equal to blk_num_cfg. Registers: PSWRQ2_REG_BLK_CNT=0x120 PSWRQ2_REG_BLK_NUM_CFG=0x140 [ 3431.277930] Error if no traffic: PSWRQ2: There are more than eight used blocks. Registers: PSWRQ2_REG_BLK_CNT=0x120 PSWRQ2_REG_BLK_NUM_CFG=0x140 [ 3431.277938] r if no traffic: XSDM: There are outstanding PXP requests. Registers: XSDM_REG_DST_PXP_SRC_PEND_EMPTY=0x0 [ 3431.277938] Error if no traffic: XSDM: The PXP requests linked-list is not empty. Registers: XSDM_REG_DST_PXP_LINK_EMPTY=0x0 [ 3431.277938] Error if no traffic: XCM: There are messages in the XX RAM. Registers: XCM_REG_XX_FREE_CNT=0x3f [ 3431.277938] Error if no traffic: XCM: XX CAM has active lcids. Registers: XCM_REG_XX_LCID_CAM_FILL_LVL=0x1 [ 3431.277938] Error if no traffic: XCM: has locked lcids in xxLock. Registers: XCM_REG_XX_LOCK_CNT=0x1 [ 3431.277938] Error if no traffic: XSEM: Thread not freed. Registers: XSEM_REG_NUM_OF_THREADS=0x17 [ 3431.277938] Error if no traffic: QM: VOQ LINE credit on LB queues is not equal to init credit. Registers: QM_REG_VOQCRDLINE[16]=0x80d0 QM_REG_VOQINITCRDLINE[16]=0x8124 MISC_REG_PORT_MODE=0x0 [ 3431.381855] QM: VOQ BYTE credit on LB queues is not equal to init credit. Registers: QM_REG_VOQCRDBYTE[16]=0x815a1c QM_REG_VOQINITCRDBYTE[16]=0x818000 MISC_REG_PORT_MODE=0x0 [ 3431.381855] Error if no traffic: QM: VOQ BYTE credit on NW queues is not equal to init credit. Registers: QM_REG_VOQCRDBYTE[0]=0x815a1c QM_REG_VOQINITCRDBYTE[0]=0x818000 MISC_REG_PORT_MODE=0x0 [ 3431.381855] Error if no traffic: XCM: QM Active State Counter is not 0. Registers: XCM_REG_QM_ACT_ST_CNT[4]=0x2 [ 3431.381855] Error if no traffic: XCM: QM Active State Counter is not 0. Registers: XCM_REG_QM_ACT_ST_CNT[9]=0x1 [ 3431.381855] [ 3431.381855] LSI_IDLE_CHECK: [ 3431.381855] Error if no traffic: PSWRQ2: sr_cnt is not equal to sr_num_cfg. Registers: PSWRQ2_REG_SR_CNT=0x7e PSWRQ2_REG_SR_NUM_CFG=0x7f [ 3431.381855] Error if no traffic: PSWRQ2: SR_CNT_per_vq6 is not equal to MAX_SRS_vq. Registers: PSWRQ2_REG_SR_CNT_PER_VQ_6=0x7e PSWRQ2_REG_MAX_SRS_VQ6=0x7f [ 3431.564463] Error if no traffic: PSWRQ2: There are more than eight used blocks. Registers: PSWRQ2_REG_BLK_CNT=0x120 PSWRQ2_REG_BLK_NUM_CFG=0x140 [ 3431.564463] Error if no traffic: PSWRQ2: blk_cnt_per_vq6 is not equal to max_blks_vq. Registers: PSWRQ2_REG_BLK_CNT_PER_VQ_6=0x120 PSWRQ2_REG_MAX_BLKS_VQ6=0x140 [ 3431.564463] Error: PGLUE: PCIE error status is set. Registers: PGLUE_B_REG_PCIE_ERR_STATUS=0x4 [ 3431.564463] Error if no traffic: XSDM: XSDM_REG_DST_PXP_SRC_PEND_EMPTY = 0. Registers: XSDM_REG_DST_PXP_SRC_PEND_EMPTY=0x0 [ 3431.564463] Error if no traffic: XSDM: XSDM_REG_DST_PXP_LINK_EMPTY = 0. Registers: XSDM_REG_DST_PXP_LINK_EMPTY=0x0 [ 3431.564463] Error if no traffic: XCM: XX_FREE_CNT is different than reset value. Registers: XCM_REG_XX_FREE_CNT=0x3f [ 3431.564463] Error if no traffic: XCM: XX protection CAM is not empty. Registers: XCM_REG_XX_LCID_CAM_FILL_LVL=0x1 [ 3431.564463] Error if no traffic: XCM: there is locked LCID in XX. Registers: XCM_REG_XX_LOCK_CNT=0x1 [ 3431.664495] Error if no traffic: QM: VOQ LINE credit is not equal to init credit. Registers: QM_REG_VOQCRDLINE[16]=0x80d0 QM_REG_VOQINITCRDLINE[16]=0x8124 MISC_REG_PORT_MODE=0x0 [ 3431.664495] Error if no traffic: QM: VOQ LINE credit is not equal to init credit. Registers: QM_REG_VOQCRDLINE[0]=0x83aa QM_REG_VOQINITCRDLINE[0]=0x841a MISC_REG_PORT_MODE=0x0 [ 3431.664495] Error if no traffic: QM: VOQ BYTE credit is not equal to init credit. Registers: QM_REG_VOQCRDBYTE[16]=0x815a1c QM_REG_VOQINITCRDBYTE[16]=0x818000 MISC_REG_PORT_MODE=0x0 [ 3431.664495] Error if no traffic: QM: VOQ BYTE credit is not equal to init credit. Registers: QM_REG_VOQCRDBYTE[0]=0x815a1c QM_REG_VOQINITCRDBYTE[0]=0x818000 MISC_REG_PORT_MODE=0x0 [ 3431.664495] Error if no traffic: XCM: QM Active State Counter is not 0. Registers: XCM_REG_QM_ACT_ST_CNT[4]=0x2 [ 3431.664495] Error if no traffic: XCM: QM Active State Counter is not 0. Registers: XCM_REG_QM_ACT_ST_CNT[9]=0x1 [ 3431.664495] [ 3431.763194] [ 3431.765204] [qed_dbg_dump:8021(qede_roce)]Collecting a debug feature ["reg_fifo"] [ 3431.773935] [ 3431.773935] fw-version: 8_59_1_1 [ 3431.773935] fw-image: main [ 3431.773935] mfw-version: 8_34_8_0 [ 3431.773935] chip-revision: b0 [ 3431.773935] tools-version: 11 [ 3431.773935] chip: bb [ 3431.773935] platform: asic [ 3431.773935] pci-func: 0 [ 3431.773935] epoch: 1685571839 [ 3431.773935] path: 0 [ 3431.773935] dump-type: reg-fifo [ 3431.773935] [ 3431.773935] fifo contained 0 elements [ 3431.773935] [ 3431.821123] [ 3431.823033] [qed_dbg_dump:8021(qede_roce)]Collecting a debug feature ["igu_fifo"] [ 3431.831668] [ 3431.831668] fw-version: 8_59_1_1 [ 3431.831668] fw-image: main [ 3431.831668] mfw-version: 8_34_8_0 [ 3431.831668] chip-revision: b0 [ 3431.831668] tools-version: 11 [ 3431.831668] chip: bb [ 3431.831668] platform: asic [ 3431.831668] pci-func: 0 [ 3431.831668] epoch: 1685571839 [ 3431.831668] path: 0 [ 3431.831668] dump-type: igu-fifo [ 3431.831668] [ 3431.831668] fifo contained 0 elements [ 3431.831668] [ 3431.877837] [ 3431.879656] [qed_dbg_dump:8021(qede_roce)]Collecting a debug feature ["protection_override"] [ 3431.889332] [ 3431.889332] fw-version: 8_59_1_1 [ 3431.889332] fw-image: main [ 3431.889332] mfw-version: 8_34_8_0 [ 3431.889332] chip-revision: b0 [ 3431.889332] tools-version: 11 [ 3431.889332] chip: bb [ 3431.889332] platform: asic [ 3431.889332] pci-func: 0 [ 3431.889332] epoch: 1685571840 [ 3431.889332] path: 0 [ 3431.889332] dump-type: protection-override [ 3431.889332] [ 3431.889332] window 0, address: 0x0009060, size: 1 regs, read: 1, write: 1, read protection: override PDA, write protection: override PDA [ 3431.944158] window 1, address: 0x0e00098, size: 2 regs, read: 1, write: 1, read protection: override PDA, write protection: override PDA [ 3431.944158] window 2, address: 0x00500a4, size: 5 regs, read: 1, write: 1, read protection: override PDA, write protection: override PDA [ 3431.944158] window 3, address: 0x0540a7c, size: 5 regs, read: 1, write: 1, read protection: override PDA, write protection: override PDA [ 3431.944158] window 4, address: 0x02a8400, size: 5 regs, read: 1, write: 1, read protection: override PDA, write protection: override PDA [ 3431.944158] window 5, address: 0x0040474, size: 5 regs, read: 1, write: 1, read protection: override PDA, write protection: override PDA [ 3431.944158] window 6, address: 0x0e2a0b0, size: 3 regs, read: 1, write: 1, read protection: override VF , write protection: override VF [ 3431.944166] 88 regs, read: 1, write: 1, read protection: override VF , write protection: override VF [ 3431.944166] window 9, address: 0x0e2a100, size: 800 regs, read: 1, write: 1, read protection: override VF , write protection: override VF [ 3431.944166] window 10, address: 0x02aa3bc, size: 1 regs, read: 1, write: 1, read protection: override PDA, write protection: override PDA [ 3431.944166] window 11, address: 0x0000000, size: 8388607 regs, read: 1, write: 0, read protection: override PDA, write protection: override HV [ 3431.944166] window 12, address: 0x0008d1c, size: 1 regs, read: 1, write: 1, read protection: override VF , write protection: override VF [ 3431.944166] protection override contained 13 elements [ 3431.944166] [ 3431.944166] [ 3432.112459] [ 3432.114385] [qed_dbg_dump:8021(qede_roce)]Collecting a debug feature ["fw_asserts"] [ 3432.123438] [ 3432.123438] fw-version: 8_59_1_1 [ 3432.123438] fw-image: main [ 3432.123438] mfw-version: 8_34_8_0 [ 3432.123438] chip-revision: b0 [ 3432.123438] tools-version: 11 [ 3432.123438] chip: bb [ 3432.123438] platform: asic [ 3432.123438] pci-func: 0 [ 3432.123438] epoch: 1685571840 [ 3432.123438] path: 0 [ 3432.123438] dump-type: fw-asserts [ 3432.123438] [ 3432.123438] [ 3432.123438] TSTORM_ASSERT: size=8 [ 3432.123438] 00030004 [ 3432.123438] 87d252a9 [ 3432.123438] 4dcf5854 [ 3432.123438] ff000006 [ 3432.123438] 89000035 [ 3432.123438] ff000006 [ 3432.123438] 04f9020e [ 3432.123438] 020851df [ 3432.123438] [ 3432.123438] MSTORM_ASSERT: size=8 [ 3432.123438] 01030004 [ 3432.123438] 87d252a9 [ 3432.123438] 4703040b [ 3432.123438] ff000006 [ 3432.123438] 89000035 [ 3432.123438] ff000006 [ 3432.123438] 04f9020e [ 3432.123438] 020851df [ 3432.123438] [ 3432.123438] USTORM_ASSERT: size=8 [ 3432.123438] 02030004 [ 3432.123438] 87d252a9 [ 3432.123438] cd34d58d [ 3432.123438] ff000006 [ 3432.234059] YSTORM_ASSERT: size=8 [ 3432.234059] 04030004 [ 3432.234059] 87d252a9 [ 3432.234059] 00076fb8 [ 3432.234059] ff000006 [ 3432.234059] 89000035 [ 3432.234059] ff000006 [ 3432.234059] 04f9020e [ 3432.234059] 020851da [ 3432.234059] [ 3432.234059] PSTORM_ASSERT: size=8 [ 3432.234059] 05030004 [ 3432.234059] 87d252a9 [ 3432.234059] b69a1327 [ 3432.234059] ff000006 [ 3432.234059] 89000035 [ 3432.234059] ff000006 [ 3432.234059] 04f9020e [ 3432.234059] 020851da [ 3432.234059] [ 3432.234059] [ 3432.234059] [ 3432.288633] [ 3432.290375] [qed_dbg_dump:8021(qede_roce)]Collecting a debug feature ["ilt"] [ 3432.299667] [qed_dbg_dump:8021(qede_roce)]Collecting a debug feature ["grc"] [ 3432.594054] [qed_dbg_dump:8021(qede_roce)]Collecting a debug feature ["mcp_trace"] [ 3433.174928] [ 3433.174928] fw-version: 8_59_1_1 [ 3433.174928] fw-image: main [ 3433.174928] mfw-version: 8_34_8_0 [ 3433.174928] chip-revision: b0 [ 3433.174928] tools-version: 11 [ 3433.174928] chip: bb [ 3433.174928] platform: asic [ 3433.174928] pci-func: 0 [ 3433.174928] epoch: 1685571840 [ 3433.174928] path: 0 [ 3433.174928] dump-type: mcp-trace [ 3433.174928] [ 3433.174928] TRACE DRV_HSI : [F1] Updating PF driver with msg 0x5 [ 3433.174928] TRACE DRV_HSI : [F1] DRV_MSG_CODE_GET_OEM_UPDATES 0x1 [ 3433.174928] TRACE DRV_HSI : [F1] Updating PF driver with msg 0x7 [ 3433.174928] TRACE DRV_HSI : [F1] DRV_MSG_CODE_BW_UPDATE_ACK 0x0 [ 3433.174928] TRACE DRV_HSI : [F1] DRV_MSG_CODE_OV_UPDATE_STORM_FW_VER 0x83b0100 [ 3433.174928] TRACE DRV_HSI : [F1] DRV_MSG_CODE_OV_UPDATE_MTU 0x5dc [ 3433.174928] TRACE DRV_HSI : [F1] DRV_MSG_CODE_OV_UPDATE_DRIVER_STATE 0x4 [ 3433.174928] TRACE DRV_HSI : [F1] DRV_MSG_CODE_OV_UPDATE_ESWITCH_MODE 0x0 [ 3433.174928] TRACE DRV_HSI : [F1] INIT_PHY 0x0 [ 3433.174938] on port 1 [ 3433.174938] TRACE ETH : [P1] Port reset: value 0x0 [ 3433.174938] TRACE DRV_HSI : [P1] Updating all pfs on with msg 0x0 [ 3433.174938] TRACE ETH : [P1] Increment link-flap counter to 2 [ 3433.174938] TRACE ETH : [P1] Updating LFA status to 0x20001 [ 3433.174938] TRACE ETH : SmartAN enabled set to 0 [ 3433.174938] TRACE ETH : [P1] Transceiver card [ 3433.174938] TRACE ETH : Init now [ 3433.174938] TRACE ETH : [P1] Disable Firecode-FEC lane 2 [ 3433.174938] TRACE ETH : [P1] Disable RS-FEC [ 3433.174938] TRACE ETH : [P1] Force speed 0x61a8 [ 3433.174938] TRACE ETH : [P1] eth_port_init: req_speed: 25000Mbps adv_speed 0xa [ 3433.174938] TRACE ETH : eth_port_init: flow_ctrl 0x7 loopback 0x0 [ 3433.174938] TRACE ETH : Transceiver type is 0x16 [ 3433.174938] TRACE ETH : Transceiver init done on port 1 [ 3433.174938] TRACE ETH : [P1] phy->lane_map = 0x00000004, num_lanes = 1 [ 3433.275853] ] Setting 25G/50G OUI to 0x6a737d [ 3433.275853] ERROR ERROR : Autoneg Pause setting ignored while speed is not set to autoneg [ 3433.275853] TRACE ETH : [1]: POWER UP PHY. [ 3433.275853] TRACE ETH : [P1] Power ON PHY [ 3433.275853] TRACE ETH : [1]: ETH stat address: 0xe2ae28 [ 3433.275853] TRACE DRV_HSI : [F1] DRV_MSG_CODE_OV_UPDATE_DRIVER_STATE 0x5 [ 3433.275853] TRACE ETH : [P1] MAC fault change: 0x0->0x1 [ 3433.275853] TRACE ETH : physical_link_update: status 0x1800000 on port 1 [ 3433.275853] TRACE ETH : eth_link_update. new_link_status = 0x1800000 link_speed = 0 [ 3433.275853] TRACE ETH : [P1] MAC fault change: 0x1->0x3 [ 3433.275853] TRACE ETH : physical_link_update: status 0x3800000 on port 1 [ 3433.275853] TRACE TRACE : Repeat the last trace for 1 times [ 3433.275853] TRACE ETH : eth_link_update. new_link_status = 0x3800000 link_speed = 25000 [ 3433.275853] TRACE ETH : [P1] MAC fault change: 0x3->0x2 [ 3433.373204] 25000 [ 3433.373204] TRACE DRV_HSI : [F0] INIT_PHY 0x0 [ 3433.373204] TRACE ETH : [P0] Check LFA - First time link init [ 3433.373204] TRACE ETH : [P0] CLPORT: port disabled [ 3433.373204] TRACE ETH : [P0] Power OFF PHY [ 3433.373204] TRACE ETH : [P0] Reset Tx_Credits and soft reset MAC [ 3433.373204] TRACE ETH : [P0] Closing data path and reset serdes [ 3433.373204] TRACE ETH : physical_link_update: status 0x0 on port 0 [ 3433.373204] TRACE ETH : [P0] Port reset: value 0x0 [ 3433.373204] TRACE DRV_HSI : [P0] Updating all pfs on with msg 0x0 [ 3433.373204] TRACE ETH : [P0] Increment link-flap counter to 2 [ 3433.373204] TRACE ETH : [P0] Updating LFA status to 0x20001 [ 3433.373204] TRACE ETH : SmartAN enabled set to 0 [ 3433.373204] TRACE ETH : [P0] Transceiver card [ 3433.373204] TRACE ETH : Init now [ 3433.373204] TRACE ETH : [P0] Disable Firecode-FEC lane 0 [ 3433.373204] TRACE ETH : [P0] Disable RS-FEC [ 3433.471850] H : Transceiver init done on port 0 [ 3433.471850] TRACE ETH : [P0] phy->lane_map = 0x00000001, num_lanes = 1 [ 3433.471850] TRACE ETH : [P0] Serdes core_revision: 0x6, serdes id is 0x4ad4 [ 3433.471850] TRACE ETH : [P0] Autoneg is disabled [ 3433.471850] TRACE ETH : [P0] Clear ALLOW_PLL_CHANGE [ 3433.471850] TRACE ETH : [P0] Setting an.enable to 0 an_mode to 0x3 [ 3433.471850] TRACE ETH : [P0] Setting 25G/50G OUI to 0x6a737d [ 3433.471850] ERROR ERROR : Autoneg Pause setting ignored while speed is not set to autoneg [ 3433.471850] TRACE ETH : [0]: POWER UP PHY. [ 3433.471850] TRACE ETH : [P0] Power ON PHY [ 3433.471850] TRACE ETH : [0]: ETH stat address: 0xe2a1a8 [ 3433.471850] TRACE DRV_HSI : [F0] DRV_MSG_CODE_OV_UPDATE_DRIVER_STATE 0x5 [ 3433.471850] TRACE ETH : [P0] MAC fault change: 0x0->0x1 [ 3433.471850] TRACE ETH : physical_link_update: status 0x1800000 on port 0 [ 3433.572516] new_link_status = 0xe00011 link_speed = 25000 [ 3433.572516] TRACE ETH : [P1] eth_link_update: link_state 0x800000->0xe00011 [ 3433.572516] TRACE ETH : [P1] Link is Up in 25000Mbps (change count 1) [ 3433.572516] TRACE ETH : [P1] Reset Tx_Credits and soft reset MAC [ 3433.572516] TRACE ETH : [P1] CLMAC configured. Pause RX/TX enabled [ 3433.572516] TRACE ETH : physical_link_update: status 0xe00011 on port 1 [ 3433.572516] TRACE DBG : port 1: DCBX config 0x10 [ 3433.572516] TRACE DBG : Port 1: DCB mode set to 0. Updating the driver for operational MIB [ 3433.728827] TRACE DRV_HSI : [P1] Updating all pfs on with msg 0x5 [ 3433.728827] TRACE DRV_HSI : [F1] Updating PF driver with msg 0x5 [ 3433.728827] TRACE DBG : Configuring NIG for PFC and ETS for port 1 [ 3433.728827] TRACE ETH : Disabling PFC [ 3433.728827] TRACE ETH : [P1] CLMAC configured. Pause RX/TX enabled [ 3433.728827] TRACE DRV_HSI : [F1] LINK_RESET 0x0 [ 3433.728827] TRACE INIT : g_port 1 chan->config = 0x4b94adc6 chan->nw2bmc = 0x0 [ 3433.728827] TRACE INIT : g_spad.nvm_cfg.cfg1.port[1].features= 0x0 [ 3433.728827] TRACE ETH : [P1] CLPORT: port disabled [ 3433.728827] TRACE ETH : [P1] Power OFF PHY [ 3433.728827] TRACE ETH : [P1] Reset Tx_Credits and soft reset MAC [ 3433.728827] TRACE ETH : [P1] Closing data path and reset serdes [ 3433.728827] TRACE ETH : physical_link_update: status 0x0 on port 1 [ 3433.728827] TRACE ETH : [P1] Port reset: value 0x0 [ 3433.728827] TRACE DRV_HSI : [F1] INIT_PHY 0x0 [ 3433.827130] TRACE ETH : [P1] CLPORT: port disabled [ 3433.827130] TRACE ETH : [P1] Power OFF PHY [ 3433.827130] TRACE ETH : [P1] Reset Tx_Credits and soft reset MAC [ 3433.827130] TRACE ETH : [P1] Closing data path and reset serdes [ 3433.827130] TRACE ETH : physical_link_update: status 0x0 on port 1 [ 3433.827130] TRACE ETH : [P1] Port reset: value 0x0 [ 3433.827130] TRACE DRV_HSI : [P1] Updating all pfs on with msg 0x0 [ 3433.827130] TRACE ETH : [P1] Increment link-flap counter to 3 [ 3433.827130] TRACE ETH : [P1] Updating LFA status to 0x30001 [ 3433.827130] TRACE ETH : SmartAN enabled set to 0 [ 3433.827130] TRACE ETH : [P1] Transceiver card [ 3433.827130] TRACE ETH : Init now [ 3433.827130] TRACE ETH : [P1] Disable Firecode-FEC lane 2 [ 3433.827130] TRACE ETH : [P1] Disable RS-FEC [ 3433.827130] TRACE ETH : [P1] Force speed 0x61a8 [ 3433.827130] TRACE ETH : [P1] eth_port_init: req_speed: 25000Mbps adv_speed 0xa [ 3433.827138] nes = 1 [ 3433.827138] TRACE ETH : [P1] Serdes core_revision: 0x6, serdes id is 0x4ad4 [ 3433.827138] TRACE ETH : [P1] Autoneg is disabled [ 3433.827138] TRACE ETH : [P1] Clear ALLOW_PLL_CHANGE [ 3433.827138] TRACE ETH : [P1] Setting an.enable to 0 an_mode to 0x3 [ 3433.827138] TRACE ETH : [P1] Setting 25G/50G OUI to 0x6a737d [ 3433.827138] ERROR ERROR : Autoneg Pause setting ignored while speed is not set to autoneg [ 3433.827138] TRACE ETH : [1]: POWER UP PHY. [ 3433.827138] TRACE ETH : [P1] Power ON PHY [ 3433.827138] TRACE ETH : [1]: ETH stat address: 0xe2ae28 [ 3433.827138] TRACE DRV_HSI : [F1] DRV_MSG_CODE_OV_UPDATE_MTU 0x2328 [ 3433.827138] TRACE ETH : [P1] MAC fault change: 0x0->0x1 [ 3433.827138] TRACE ETH : physical_link_update: status 0x1800000 on port 1 [ 3433.827138] TRACE ETH : eth_link_update. new_link_status = 0x1800000 link_speed = 0 [ 3433.827138] TRACE DRV_HSI : [F1] DRV_MSG_CODE_OV_UPDATE_CURR_CFG 0x1 [ 3434.028467] TRACE ETH : [P0] MAC fault change: 0x3->0x2 [ 3434.028467] TRACE ETH : physical_link_update: status 0x2800000 on port 0 [ 3434.028467] TRACE TRACE : Repeat the last trace for 1 times [ 3434.028467] TRACE ETH : eth_link_update. new_link_status = 0x2800000 link_speed = 25000 [ 3434.028467] TRACE ETH : [P0] MAC fault change: 0x2->0x0 [ 3434.028467] TRACE ETH : physical_link_update: status 0x800000 on port 0 [ 3434.028467] TRACE ETH : eth_link_update. new_link_status = 0xe00011 link_speed = 25000 [ 3434.028467] TRACE ETH : [P0] eth_link_update: link_state 0x800000->0xe00011 [ 3434.028467] TRACE ETH : [P0] Link is Up in 25000Mbps (change count 1) [ 3434.028467] TRACE ETH : [P0] Reset Tx_Credits and soft reset MAC [ 3434.028467] TRACE ETH : [P0] CLMAC configured. Pause RX/TX enabled [ 3434.028467] TRACE ETH : physical_link_update: status 0xe00011 on port 0 [ 3434.028467] TRACE DBG : port 0: DCBX config 0x10 [ 3434.028488] r with msg 0x5 [ 3434.028488] TRACE DBG : Configuring NIG for PFC and ETS for port 0 [ 3434.028488] TRACE ETH : Disabling PFC [ 3434.028488] TRACE ETH : [P0] CLMAC configured. Pause RX/TX enabled [ 3434.028488] TRACE DRV_HSI : [F0] LINK_RESET 0x0 [ 3434.028488] TRACE INIT : g_port 0 chan->config = 0x58062e48 chan->nw2bmc = 0x1 [ 3434.028488] TRACE INIT : g_spad.nvm_cfg.cfg1.port[0].features= 0x0 [ 3434.028488] TRACE ETH : [P0] CLPORT: port disabled [ 3434.028488] TRACE ETH : [P0] Power OFF PHY [ 3434.028488] TRACE ETH : [P0] Reset Tx_Credits and soft reset MAC [ 3434.028488] TRACE ETH : [P0] Closing data path and reset serdes [ 3434.028488] TRACE ETH : physical_link_update: status 0x0 on port 0 [ 3434.028488] TRACE ETH : [P0] Port reset: value 0x0 [ 3434.028488] TRACE DRV_HSI : [F0] INIT_PHY 0x0 [ 3434.028488] TRACE ETH : [P0] Check LFA - First time link init [ 3434.028488] TRACE ETH : [P0] CLPORT: port disabled [ 3434.127153] ACE ETH : [P0] Port reset: value 0x0 [ 3434.127153] TRACE DRV_HSI : [P0] Updating all pfs on with msg 0x0 [ 3434.127153] TRACE ETH : [P0] Increment link-flap counter to 3 [ 3434.127153] TRACE ETH : [P0] Updating LFA status to 0x30001 [ 3434.127153] TRACE ETH : SmartAN enabled set to 0 [ 3434.127153] TRACE ETH : [P0] Transceiver card [ 3434.127153] TRACE ETH : Init now [ 3434.127153] TRACE ETH : [P0] Disable Firecode-FEC lane 0 [ 3434.127153] TRACE ETH : [P0] Disable RS-FEC [ 3434.127153] TRACE ETH : [P0] Force speed 0x61a8 [ 3434.127153] TRACE ETH : [P0] eth_port_init: req_speed: 25000Mbps adv_speed 0xa [ 3434.127153] TRACE ETH : eth_port_init: flow_ctrl 0x7 loopback 0x0 [ 3434.127153] TRACE ETH : Transceiver type is 0x16 [ 3434.127153] TRACE ETH : Transceiver init done on port 0 [ 3434.127153] TRACE ETH : [P0] phy->lane_map = 0x00000001, num_lanes = 1 [ 3434.227772] /50G OUI to 0x6a737d [ 3434.227772] ERROR ERROR : Autoneg Pause setting ignored while speed is not set to autoneg [ 3434.227772] TRACE ETH : [0]: POWER UP PHY. [ 3434.227772] TRACE ETH : [P0] Power ON PHY [ 3434.227772] TRACE ETH : [0]: ETH stat address: 0xe2a1a8 [ 3434.227772] TRACE DRV_HSI : [F0] DRV_MSG_CODE_OV_UPDATE_MTU 0x2328 [ 3434.227772] TRACE ETH : [P0] MAC fault change: 0x0->0x1 [ 3434.227772] TRACE ETH : physical_link_update: status 0x1800000 on port 0 [ 3434.227772] TRACE ETH : eth_link_update. new_link_status = 0x1800000 link_speed = 0 [ 3434.227772] TRACE DRV_HSI : [F0] DRV_MSG_CODE_OV_UPDATE_CURR_CFG 0x1 [ 3434.227772] TRACE ETH : [P1] MAC fault change: 0x1->0x3 [ 3434.227772] TRACE ETH : physical_link_update: status 0x3800000 on port 1 [ 3434.227772] TRACE TRACE : Repeat the last trace for 1 times [ 3434.227772] TRACE ETH : eth_link_update. new_link_status = 0x3800000 link_speed = 25000 [ 3434.323252] w_link_status = 0x2800000 link_speed = 25000 [ 3434.323252] TRACE ETH : [P1] MAC fault change: 0x2->0x0 [ 3434.323252] TRACE ETH : physical_link_update: status 0x800000 on port 1 [ 3434.323252] TRACE ETH : eth_link_update. new_link_status = 0xe00011 link_speed = 25000 [ 3434.323252] TRACE ETH : [P1] eth_link_update: link_state 0x800000->0xe00011 [ 3434.323252] TRACE ETH : [P1] Link is Up in 25000Mbps (change count 1) [ 3434.323252] TRACE ETH : [P1] Reset Tx_Credits and soft reset MAC [ 3434.323252] TRACE ETH : [P1] CLMAC configured. Pause RX/TX enabled [ 3434.323252] TRACE ETH : physical_link_update: status 0xe00011 on port 1 [ 3434.323252] TRACE DBG : port 1: DCBX config 0x10 [ 3434.323252] TRACE DBG : Port 1: DCB mode set to 0. Updating the driver for operational MIB [ 3434.323252] TRACE DRV_HSI : [P1] Updating all pfs on with msg 0x5 [ 3434.323252] TRACE DRV_HSI : [F1] Updating PF driver with msg 0x5 [ 3434.421412] ge: 0x1->0x3 [ 3434.421412] TRACE ETH : physical_link_update: status 0x3800000 on port 0 [ 3434.421412] TRACE TRACE : Repeat the last trace for 1 times [ 3434.421412] TRACE ETH : eth_link_update. new_link_status = 0x3800000 link_speed = 25000 [ 3434.421412] TRACE ETH : [P0] MAC fault change: 0x3->0x2 [ 3434.421412] TRACE ETH : physical_link_update: status 0x2800000 on port 0 [ 3434.421412] TRACE TRACE : Repeat the last trace for 1 times [ 3434.421412] TRACE ETH : eth_link_update. new_link_status = 0x2800000 link_speed = 25000 [ 3434.421412] TRACE ETH : [P0] MAC fault change: 0x2->0x0 [ 3434.421412] TRACE ETH : physical_link_update: status 0x800000 on port 0 [ 3434.421412] TRACE ETH : eth_link_update. new_link_status = 0xe00011 link_speed = 25000 [ 3434.421412] TRACE ETH : [P0] eth_link_update: link_state 0x800000->0xe00011 [ 3434.421412] TRACE ETH : [P0] Link is Up in 25000Mbps (change count 1) [ 3434.519309] G : port 0: DCBX config 0x10 [ 3434.519309] TRACE DBG : Port 0: DCB mode set to 0. Updating the driver for operational MIB [ 3434.519309] TRACE DRV_HSI : [P0] Updating all pfs on with msg 0x5 [ 3434.519309] TRACE DRV_HSI : [F0] Updating PF driver with msg 0x5 [ 3434.519309] TRACE DBG : Configuring NIG for PFC and ETS for port 0 [ 3434.519309] TRACE ETH : Disabling PFC [ 3434.519309] TRACE ETH : [P0] CLMAC configured. Pause RX/TX enabled [ 3434.519309] TRACE DRV_HSI : [F0] DRV_MSG_CODE_NIG_DRAIN 0x3e8 [ 3434.519309] TRACE DBG : [P0] Start NIG drain time=87000000 [ 3434.519309] TRACE DBG : [P0] Start NIG drain, time_to_end =1467159835 [ 3434.519309] TRACE DBG : [P0] Start TC NIG drain for TC=0. time_to_end = 1467160733 [ 3434.519309] TRACE DBG : [P0] Start TC NIG drain for TC=1. time_to_end = 1467161633 [ 3434.519309] TRACE DBG : [P0] Start TC NIG drain for TC=2. time_to_end = 1467162625 [ 3434.616868] ime_to_end = 1467166003 [ 3434.616868] TRACE DBG : [P0] Start TC NIG drain for TC=6. time_to_end = 1467167157 [ 3434.616868] TRACE DBG : [P0] Start TC NIG drain for TC=7. time_to_end = 1467168333 [ 3434.616868] TRACE DBG : [P0] Stop NIG drain, diff =59736 [ 3434.616868] TRACE DBG : [P0] Stop NIG drain on TC_bit=1, diff =60635 [ 3434.616868] TRACE DBG : [P0] Stop NIG drain on TC_bit=3, diff =61574 [ 3434.616868] TRACE DBG : [P0] Stop NIG drain on TC_bit=7, diff =62571 [ 3434.616868] TRACE DBG : [P0] Stop NIG drain on TC_bit=15, diff =63682 [ 3434.616868] TRACE DBG : [P0] Stop NIG drain on TC_bit=31, diff =64911 [ 3434.616868] TRACE DBG : [P0] Stop NIG drain on TC_bit=63, diff =66119 [ 3434.616868] TRACE DBG : [P0] Stop NIG drain on TC_bit=127, diff =67320 [ 3434.616868] TRACE DBG : [P0] Stop NIG drain on TC_bit=255, diff =68502 [ 3434.616868] ERROR ERROR : [F0]Invalid command 0xc0040000 [ 3434.616868] [ 3434.616868] [ 3434.616868] [ 3434.815735] [ 3434.817593] [qed_dbg_nvm_image:8155(qede_roce)]Collecting a debug feature ["nvram image 3"] [ 3434.875168] [qed_dbg_nvm_image:8155(qede_roce)]Collecting a debug feature ["nvram image 4"] [ 3434.932962] [qed_dbg_nvm_image:8155(qede_roce)]Collecting a debug feature ["nvram image 5"] [ 3435.404834] [qed_dbg_nvm_image:8155(qede_roce)]Collecting a debug feature ["nvram image 2"] [ 3435.964490] [qed_start_recovery_process:2659(qede_roce-0)]Triggering a recovery process [ 3435.973627] [qede_generic_hw_err_handler:2716(qede_roce)]Generic sleepable HW error handling is done [ 3435.973663] [qed_mcp_handle_process_kill:1699(qede_roce-0)]Received a process kill indication [ 3435.993752] [qed_mcp_handle_process_kill:1716(qede_roce-0)]Process kill counter: 1 [ 3436.002467] [qede_recovery_handler:2645(qede_roce)]Starting a recovery process [ 3445.817859] [qed_hw_err_notify:830(qede_roce-0)]Ramrod is stuck [CID ff000000 RDMA_RAMROD_DEREGISTER_MR:04 PROTOCOLID_ROCE:02 echo 04fb] [ 3462.026215] restraintd[1961]: *** Current Time: Wed May 31 18:24:30 2023 Localwatchdog at: Wed May 31 19:23:29 2023 [-- MARK -- Wed May 31 22:25:00 2023] [ 3522.034296] restraintd[1961]: *** Current Time: Wed May 31 18:25:30 2023 Localwatchdog at: Wed May 31 19:23:29 2023 ======= Start of the tracebacks ============== [ 3564.513986] INFO: task kworker/u96:7:91 blocked for more than 120 seconds. [ 3564.521918] Not tainted 4.18.0-494.el8.x86_64 #1 [ 3564.527832] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 3564.536754] task:kworker/u96:7 state stack: 0 pid: 91 ppid: 2 flags:0x80004000 [ 3564.546278] Workqueue: events_unbound ucma_close_id [rdma_ucm] [ 3564.552997] Call Trace: [ 3564.555909] __schedule+0x2d1/0x870 [ 3564.560003] schedule+0x55/0xf0 [ 3564.563673] schedule_timeout+0x281/0x320 [ 3564.568354] ? xas_store+0x303/0x5a0 [ 3564.572519] wait_for_completion+0x96/0x100 [ 3564.577369] cm_destroy_id+0x21c/0x490 [ib_cm] [ 3564.582509] ? _destroy_id+0x2f3/0x300 [rdma_cm] [ 3564.587840] ? kfree+0x22e/0x250 [ 3564.591600] _destroy_id+0xe8/0x300 [rdma_cm] [ 3564.596639] ucma_close_id+0x2b/0x40 [rdma_ucm] [ 3564.601859] process_one_work+0x1d3/0x390 [ 3564.606507] worker_thread+0x30/0x390 [ 3564.610766] ? process_one_work+0x390/0x390 [ 3564.615599] kthread+0x134/0x150 [ 3564.619374] ? set_kthread_struct+0x50/0x50 [ 3564.624238] ret_from_fork+0x35/0x40 [ 3564.628440] INFO: task kworker/2:1:100 blocked for more than 120 seconds. [ 3564.636254] Not tainted 4.18.0-494.el8.x86_64 #1 [ 3564.642202] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 3564.651191] task:kworker/2:1 state stack: 0 pid: 100 ppid: 2 flags:0x80004000 [ 3564.660738] Workqueue: events qede_sp_task [qede] [ 3564.666271] Call Trace: [ 3564.669244] __schedule+0x2d1/0x870 [ 3564.673390] schedule+0x55/0xf0 [ 3564.677132] schedule_timeout+0x281/0x320 [ 3564.681827] ? __queue_work+0x13d/0x3f0 [ 3564.686358] ? xas_load+0x8/0x80 [ 3564.690188] wait_for_completion+0x96/0x100 [ 3564.695100] cma_process_remove+0x227/0x250 [rdma_cm] [ 3564.701018] cma_remove_one+0x62/0xd0 [rdma_cm] [ 3564.706317] remove_client_context+0x8b/0xd0 [ib_core] [ 3564.712321] disable_device+0x8c/0x130 [ib_core] [ 3564.717746] __ib_unregister_device+0x40/0xa0 [ib_core] [ 3564.723827] ib_unregister_device+0x21/0x30 [ib_core] [ 3564.729714] qedr_remove+0xe/0xa0 [qedr] [ 3564.734346] qede_rdma_dev_remove+0x1d9/0x200 [qede] [ 3564.740171] __qede_remove+0x4d/0x1a0 [qede] [ 3564.745163] qede_sp_task+0x2ca/0x350 [qede] [ 3564.750153] process_one_work+0x1d3/0x390 [ 3564.754877] worker_thread+0x30/0x390 [ 3564.759189] ? process_one_work+0x390/0x390 [ 3564.764083] kthread+0x134/0x150 [ 3564.767882] ? set_kthread_struct+0x50/0x50 [ 3564.772820] ret_from_fork+0x35/0x40 [ 3564.777037] INFO: task kworker/2:2:101 blocked for more than 120 seconds. [ 3564.784817] Not tainted 4.18.0-494.el8.x86_64 #1 [ 3564.790784] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 3564.799738] task:kworker/2:2 state stack: 0 pid: 101 ppid: 2 flags:0x80004000 [ 3564.809345] Workqueue: events linkwatch_event [ 3564.814425] Call Trace: [ 3564.817380] __schedule+0x2d1/0x870 [ 3564.821490] schedule+0x55/0xf0 [ 3564.825256] schedule_preempt_disabled+0xa/0x10 [ 3564.830522] __mutex_lock.isra.7+0x349/0x420 [ 3564.835508] ? lock_timer_base+0x67/0x90 [ 3564.840112] linkwatch_event+0xa/0x30 [ 3564.844446] process_one_work+0x1d3/0x390 [ 3564.849151] worker_thread+0x30/0x390 [ 3564.853443] ? process_one_work+0x390/0x390 [ 3564.858338] kthread+0x134/0x150 [ 3564.862184] ? set_kthread_struct+0x50/0x50 [ 3564.867072] ret_from_fork+0x35/0x40 [ 3564.871320] INFO: task NetworkManager:919 blocked for more than 120 seconds. [ 3564.879449] Not tainted 4.18.0-494.el8.x86_64 #1 [ 3564.885412] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 3564.894391] task:NetworkManager state stack: 0 pid: 919 ppid: 1 flags:0x00000080 [ 3564.903988] Call Trace: [ 3564.906939] __schedule+0x2d1/0x870 [ 3564.911051] schedule+0x55/0xf0 [ 3564.914786] schedule_preempt_disabled+0xa/0x10 [ 3564.920080] __mutex_lock.isra.7+0x349/0x420 [ 3564.925073] ? bpf_lsm_capset+0x10/0x10 [ 3564.929550] rtnetlink_rcv_msg+0x268/0x370 [ 3564.934386] ? sock_has_perm+0x80/0xa0 [ 3564.938795] ? rtnl_calcit.isra.43+0x120/0x120 [ 3564.943991] netlink_rcv_skb+0x4c/0x130 [ 3564.948510] netlink_unicast+0x19a/0x230 [ 3564.953113] netlink_sendmsg+0x204/0x3d0 [ 3564.957695] sock_sendmsg+0x50/0x60 [ 3564.961796] ____sys_sendmsg+0x22a/0x250 [ 3564.966429] ? copy_msghdr_from_user+0x5c/0x90 [ 3564.971597] ? ____sys_recvmsg+0x91/0x150 [ 3564.976302] ___sys_sendmsg+0x7c/0xc0 [ 3564.980596] ? copy_msghdr_from_user+0x5c/0x90 [ 3564.985794] ? ___sys_recvmsg+0x89/0xc0 [ 3564.990310] ? __wake_up_common+0x7a/0x190 [ 3564.995110] __sys_sendmsg+0x57/0xa0 [ 3564.999321] do_syscall_64+0x5b/0x1b0 [ 3565.003652] entry_SYSCALL_64_after_hwframe+0x61/0xc6 [ 3565.009514] RIP: 0033:0x7f152cc29a27 [ 3565.013726] Code: Unable to access opcode bytes at RIP 0x7f152cc299fd. [ 3565.021268] RSP: 002b:00007ffdb8ad09d0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e [ 3565.029953] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f152cc29a27 [ 3565.038166] RDX: 0000000000000000 RSI: 00007ffdb8ad0a20 RDI: 000000000000000d [ 3565.046343] RBP: 00007ffdb8ad0a20 R08: 0000000000000000 R09: 0000000000000000 [ 3565.054520] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000 [ 3565.062704] R13: 0000000000000000 R14: 00007ffdb8ad0bb8 R15: 00007ffdb8ad0bb4 [ 3582.033955] restraintd[1961]: *** Current Time: Wed May 31 18:26:30 2023 Localwatchdog at: Wed May 31 19:23:29 2023 [ 3642.033659] restraintd[1961]: *** Current Time: Wed May 31 18:27:30 2023 Localwatchdog at: Wed May 31 19:23:29 2023 [ 3687.394122] INFO: task kworker/u96:7:91 blocked for more than 120 seconds. [ 3687.402026] Not tainted 4.18.0-494.el8.x86_64 #1 [ 3687.407930] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 3687.416846] task:kworker/u96:7 state stack: 0 pid: 91 ppid: 2 flags:0x80004000 [ 3687.426376] Workqueue: events_unbound ucma_close_id [rdma_ucm] [ 3687.433080] Call Trace: [ 3687.435978] __schedule+0x2d1/0x870 [ 3687.440070] schedule+0x55/0xf0 [ 3687.443745] schedule_timeout+0x281/0x320 [ 3687.448427] ? xas_store+0x303/0x5a0 [ 3687.452595] wait_for_completion+0x96/0x100 [ 3687.457461] cm_destroy_id+0x21c/0x490 [ib_cm] [ 3687.462611] ? _destroy_id+0x2f3/0x300 [rdma_cm] [ 3687.467951] ? kfree+0x22e/0x250 [ 3687.471719] _destroy_id+0xe8/0x300 [rdma_cm] [ 3687.476760] ucma_close_id+0x2b/0x40 [rdma_ucm] [ 3687.481991] process_one_work+0x1d3/0x390 [ 3687.486636] worker_thread+0x30/0x390 [ 3687.490882] ? process_one_work+0x390/0x390 [ 3687.495710] kthread+0x134/0x150 [ 3687.499497] ? set_kthread_struct+0x50/0x50 [ 3687.504337] ret_from_fork+0x35/0x40 [ 3687.508555] INFO: task kworker/2:1:100 blocked for more than 120 seconds. [ 3687.516377] Not tainted 4.18.0-494.el8.x86_64 #1 [ 3687.522319] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 3687.531314] task:kworker/2:1 state stack: 0 pid: 100 ppid: 2 flags:0x80004000 [ 3687.540858] Workqueue: events qede_sp_task [qede] [ 3687.546395] Call Trace: [ 3687.549341] __schedule+0x2d1/0x870 [ 3687.553470] schedule+0x55/0xf0 [ 3687.557221] schedule_timeout+0x281/0x320 [ 3687.561922] ? __queue_work+0x13d/0x3f0 [ 3687.566453] ? xas_load+0x8/0x80 [ 3687.570286] wait_for_completion+0x96/0x100 [ 3687.575192] cma_process_remove+0x227/0x250 [rdma_cm] [ 3687.581116] cma_remove_one+0x62/0xd0 [rdma_cm] [ 3687.586413] remove_client_context+0x8b/0xd0 [ib_core] [ 3687.592421] disable_device+0x8c/0x130 [ib_core] [ 3687.597839] __ib_unregister_device+0x40/0xa0 [ib_core] [ 3687.603917] ib_unregister_device+0x21/0x30 [ib_core] [ 3687.609801] qedr_remove+0xe/0xa0 [qedr] [ 3687.614426] qede_rdma_dev_remove+0x1d9/0x200 [qede] [ 3687.620252] __qede_remove+0x4d/0x1a0 [qede] [ 3687.625249] qede_sp_task+0x2ca/0x350 [qede] [ 3687.630250] process_one_work+0x1d3/0x390 [ 3687.634967] worker_thread+0x30/0x390 [ 3687.639285] ? process_one_work+0x390/0x390 [ 3687.644176] kthread+0x134/0x150 [ 3687.647977] ? set_kthread_struct+0x50/0x50 [ 3687.652924] ret_from_fork+0x35/0x40 [ 3687.657137] INFO: task kworker/2:2:101 blocked for more than 120 seconds. [ 3687.664930] Not tainted 4.18.0-494.el8.x86_64 #1 [ 3687.670898] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 3687.679848] task:kworker/2:2 state stack: 0 pid: 101 ppid: 2 flags:0x80004000 [ 3687.689451] Workqueue: events linkwatch_event [ 3687.694550] Call Trace: [ 3687.697499] __schedule+0x2d1/0x870 [ 3687.701615] schedule+0x55/0xf0 [ 3687.705389] schedule_preempt_disabled+0xa/0x10 [ 3687.710649] __mutex_lock.isra.7+0x349/0x420 [ 3687.715641] ? lock_timer_base+0x67/0x90 [ 3687.720254] linkwatch_event+0xa/0x30 [ 3687.724586] process_one_work+0x1d3/0x390 [ 3687.729292] worker_thread+0x30/0x390 [ 3687.733582] ? process_one_work+0x390/0x390 [ 3687.738491] kthread+0x134/0x150 [ 3687.742339] ? set_kthread_struct+0x50/0x50 [ 3687.747238] ret_from_fork+0x35/0x40 [ 3687.751481] INFO: task NetworkManager:919 blocked for more than 120 seconds. [ 3687.759595] Not tainted 4.18.0-494.el8.x86_64 #1 [ 3687.765565] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 3687.774546] task:NetworkManager state stack: 0 pid: 919 ppid: 1 flags:0x00000080 [ 3687.784157] Call Trace: [ 3687.787093] __schedule+0x2d1/0x870 [ 3687.791216] schedule+0x55/0xf0 [ 3687.794959] schedule_preempt_disabled+0xa/0x10 [ 3687.800247] __mutex_lock.isra.7+0x349/0x420 [ 3687.805244] ? bpf_lsm_capset+0x10/0x10 [ 3687.809729] rtnetlink_rcv_msg+0x268/0x370 [ 3687.814567] ? sock_has_perm+0x80/0xa0 [ 3687.818964] ? rtnl_calcit.isra.43+0x120/0x120 [ 3687.824156] netlink_rcv_skb+0x4c/0x130 [ 3687.828674] netlink_unicast+0x19a/0x230 [ 3687.833275] netlink_sendmsg+0x204/0x3d0 [ 3687.837859] sock_sendmsg+0x50/0x60 [ 3687.841975] ____sys_sendmsg+0x22a/0x250 [ 3687.846610] ? copy_msghdr_from_user+0x5c/0x90 [ 3687.851775] ? ____sys_recvmsg+0x91/0x150 [ 3687.856488] ___sys_sendmsg+0x7c/0xc0 [ 3687.860794] ? copy_msghdr_from_user+0x5c/0x90 [ 3687.865997] ? ___sys_recvmsg+0x89/0xc0 [ 3687.870512] ? __wake_up_common+0x7a/0x190 [ 3687.875305] __sys_sendmsg+0x57/0xa0 [ 3687.879510] do_syscall_64+0x5b/0x1b0 [ 3687.883841] entry_SYSCALL_64_after_hwframe+0x61/0xc6 [ 3687.889690] RIP: 0033:0x7f152cc29a27 [ 3687.893903] Code: Unable to access opcode bytes at RIP 0x7f152cc299fd. [ 3687.901465] RSP: 002b:00007ffdb8ad09d0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e [ 3687.910165] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f152cc29a27 [ 3687.918390] RDX: 0000000000000000 RSI: 00007ffdb8ad0a20 RDI: 000000000000000d [ 3687.926576] RBP: 00007ffdb8ad0a20 R08: 0000000000000000 R09: 0000000000000000 [ 3687.934786] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000 [ 3687.942952] R13: 0000000000000000 R14: 00007ffdb8ad0bb8 R15: 00007ffdb8ad0bb4 [ 3687.951155] INFO: task pool-restraintd:138302 blocked for more than 120 seconds. [ 3687.959648] Not tainted 4.18.0-494.el8.x86_64 #1 [ 3687.965601] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 3687.974593] task:pool-restraintd state stack: 0 pid:138302 ppid: 1 flags:0x00000080 [ 3687.984262] Call Trace: [ 3687.987222] __schedule+0x2d1/0x870 [ 3687.991383] schedule+0x55/0xf0 [ 3687.995126] schedule_preempt_disabled+0xa/0x10 [ 3688.000413] __mutex_lock.isra.7+0x349/0x420 [ 3688.005440] __netlink_dump_start+0x54/0x1f0 [ 3688.010457] ? rtnl_xdp_prog_skb+0x70/0x70 [ 3688.015269] rtnetlink_rcv_msg+0x230/0x370 [ 3688.020042] ? rtnl_xdp_prog_skb+0x70/0x70 [ 3688.024857] ? rtnl_calcit.isra.43+0x120/0x120 [ 3688.030033] netlink_rcv_skb+0x4c/0x130 [ 3688.034541] netlink_unicast+0x19a/0x230 [ 3688.039162] netlink_sendmsg+0x204/0x3d0 [ 3688.043765] sock_sendmsg+0x50/0x60 [ 3688.047884] __sys_sendto+0xee/0x160 [ 3688.052094] ? syscall_trace_enter+0x1ff/0x2d0 [ 3688.057260] ? __sys_socket+0x93/0xe0 [ 3688.061580] __x64_sys_sendto+0x24/0x30 [ 3688.066096] do_syscall_64+0x5b/0x1b0 [ 3688.070404] entry_SYSCALL_64_after_hwframe+0x61/0xc6 [ 3688.076270] RIP: 0033:0x7f38f2ef388a [ 3688.080479] Code: Unable to access opcode bytes at RIP 0x7f38f2ef3860. [ 3688.087961] RSP: 002b:00007f38f19a5700 EFLAGS: 00000293 ORIG_RAX: 000000000000002c [ 3688.096652] RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007f38f2ef388a [ 3688.104851] RDX: 0000000000000014 RSI: 00007f38f19a6830 RDI: 0000000000000007 [ 3688.113090] RBP: 0000000000000000 R08: 00007f38f19a67d4 R09: 000000000000000c [ 3688.121276] R10: 0000000000000000 R11: 0000000000000293 R12: 00007f38f19a6830 [ 3688.129487] R13: 0000000000000014 R14: 0000000000000000 R15: 00007f38f19a67d4 [ 3688.137701] INFO: task kworker/4:0:44833 blocked for more than 120 seconds. [ 3688.145697] Not tainted 4.18.0-494.el8.x86_64 #1 [ 3688.151674] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 3688.160657] task:kworker/4:0 state stack: 0 pid:44833 ppid: 2 flags:0x80004080 [ 3688.170265] Workqueue: ipv6_addrconf addrconf_verify_work [ 3688.176531] Call Trace: [ 3688.179516] __schedule+0x2d1/0x870 [ 3688.183650] schedule+0x55/0xf0 [ 3688.187403] schedule_preempt_disabled+0xa/0x10 [ 3688.192683] __mutex_lock.isra.7+0x349/0x420 [ 3688.197681] addrconf_verify_work+0xa/0x20 [ 3688.202527] process_one_work+0x1d3/0x390 [ 3688.207248] ? process_one_work+0x390/0x390 [ 3688.212150] worker_thread+0x30/0x390 [ 3688.216468] ? process_one_work+0x390/0x390 [ 3688.221410] kthread+0x134/0x150 [ 3688.225243] ? set_kthread_struct+0x50/0x50 [ 3688.230122] ret_from_fork+0x35/0x40 [ 3702.033492] restraintd[1961]: *** Current Time: Wed May 31 18:28:30 2023 Localwatchdog at: Wed May 31 19:23:29 2023 [ 3762.033219] restraintd[1961]: *** Current Time: Wed May 31 18:29:30 2023 Localwatchdog at: Wed May 31 19:23:29 2023 Refer to the RDMA client console log: https://beaker-archive.hosts.prod.psi.bos.redhat.com/beaker-logs/2023/05/79158/7915807/14006698/console.log Expected results: In the client log, the following output is expected without ANY TRACEBACKS or DEBUG messages from qperf : From qperf test with RHEL-8.8.0-20230228.22: [ 2808.949731] restraintd[1949]: ** Fetching task: 157018294 [/mnt/tests/gerrit-git.engineering.redhat.com/kernel-tests/infiniband/qperf] [ 2809.322187] restraintd[1949]: ** Preparing metadata [ 2809.592154] restraintd[1949]: ** Refreshing peer role hostnames: Retries 0 [ 2809.831057] restraintd[1949]: ** Updating env vars [ 2809.831227] restraintd[1949]: ** Updating external watchdog: 5400 seconds [ 2810.234395] restraintd[1949]: ** Installing dependencies [ 2811.922524] restraintd[1949]: ** Running task: 157018294 [/kernel/infiniband/qperf] [ 2861.542149] restraintd[1949]: ** Completed Task : 157018294 [ 2861.983806] restraintd[1949]: ** Fetching task: 157018295 [/mnt/tests/gerrit-git.engineering.redhat.com/kernel-tests/infiniband/iSER/iser_basic] Additional info:
Won't Do