...
Any HPE system running Linux and installed with Intel OPA HFI adapters or Intel Omni-Path Fabric Suite (IFS) and Basic Software version 10.10.2.0.44 (or earlier) may encounter kernel panics due to SDMA memory corruption.

Below are three examples of SDMA panics that have been identified:

1) GPF in hfi1_user_sdma_free_queues

    general protection fault: 0000 [#1] SMP
    ...
    RIP: 0010:[<ffffffff9701946c>] [<ffffffff9701946c>] has_cpu_slab+0x1c/0x40
    ...
    [<ffffffff96f130bf>] on_each_cpu_cond+0x9f/0x190
    [<ffffffff9701bc10>] ? __flush_cpu_slab+0x50/0x50
    [<ffffffff9701f8f5>] kmem_cache_close+0x35/0x300
    [<ffffffff97020d44>] __kmem_cache_shutdown+0x14/0x80
    [<ffffffff96fdda78>] kmem_cache_destroy+0x58/0x110
    [<ffffffffc058f130>] hfi1_user_sdma_free_queues+0xf0/0x200 [hfi1]
    [<ffffffffc0549350>] hfi1_file_close+0x70/0x1e0 [hfi1]
    [<ffffffff9704519c>] __fput+0xec/0x260
    [<ffffffff970453fe>] ____fput+0xe/0x10
    [<ffffffff96ebfd1b>] task_work_run+0xbb/0xe0
    [<ffffffff96e2bc65>] do_notify_resume+0xa5/0xc0
    [<ffffffff97579134>] int_signal+0x12/0x17
    ...

2) list_add panic in SDMA code path

    list_add corruption. next->prev should be prev (ffff9b9d55a7a9c0), but was ffff9b8dbb60c830. (next=ffff9b8dbb60c830).
    ...
    [<ffffffff89a98bff>] warn_slowpath_fmt+0x5f/0x80
    [<ffffffff89d97025>] __list_add+0x65/0xc0
    [<ffffffffc03d26a5>] defer_packet_queue+0x145/0x1a0 [hfi1]
    [<ffffffffc03b2987>] sdma_check_progress+0x67/0xa0 [hfi1]
    [<ffffffffc03b79d2>] sdma_send_txlist+0x432/0x550 [hfi1]
    [<ffffffff89c20009>] ? kmem_cache_alloc+0x179/0x1f0
    [<ffffffffc03d2973>] ? user_sdma_send_pkts+0xc3/0x1990 [hfi1]
    [<ffffffffc03d3e3a>] user_sdma_send_pkts+0x158a/0x1990 [hfi1]
    [<ffffffff89aab65e>] ? try_to_del_timer_sync+0x5e/0x90
    [<ffffffff89c3fe1a>] ? __check_object_size+0x1ca/0x250
    [<ffffffffc03d558b>] hfi1_user_sdma_process_request+0xdab/0x1280 [hfi1]
    ...

3) list_del panic in SDMA code path

    WARNING: CPU: 8 PID: 0 at lib/list_debug.c:62 __list_del_entry+0x82/0xd0
    list_del corruption. next->prev should be ffff939cabb52230, but was ffff93a941edf230
    ...
    [<ffffffff91898bff>] warn_slowpath_fmt+0x5f/0x80
    [<ffffffff91a1caa9>] ? __slab_free+0x79/0x260
    [<ffffffff91b97102>] __list_del_entry+0x82/0xd0
    [<ffffffffc0372c55>] sdma_desc_avail+0xf5/0x2c0 [hfi1]
    [<ffffffff91a1cd96>] ? kfree+0x106/0x140
    [<ffffffffc0392497>] ? user_sdma_free_request+0x107/0x130 [hfi1]
    [<ffffffffc03927da>] ? user_sdma_txreq_cb+0xda/0x1b0 [hfi1]
    [<ffffffffc0392700>] ? defer_packet_queue+0x1a0/0x1a0 [hfi1]
    [<ffffffffc0374c20>] sdma_make_progress+0x2e0/0x420 [hfi1]
    [<ffffffffc03769cc>] sdma_engine_interrupt+0x8c/0x100 [hfi1]
    [<ffffffffc033f381>] sdma_interrupt+0x61/0xd0 [hfi1]
    [<ffffffff9194bc04>] __handle_irq_event_percpu+0x44/0x1c0
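As a rough aid for triage, the three panic signatures listed above can be searched for in a saved kernel log or crash dump excerpt. The following is a minimal sketch only: the signature names and the function match_signatures are hypothetical, and the matching rule (plain substring checks on the fault message plus an hfi1 frame) is an assumption based on the example traces in this advisory, not an Intel or HPE diagnostic method. A match suggests, but does not prove, that a panic is the issue described here.

```python
# Heuristic substring signatures taken from the three example traces in this
# advisory. Names on the left are illustrative labels, not official identifiers.
SIGNATURES = {
    # Example 1: general protection fault while freeing user SDMA queues
    "gpf_free_queues": ("general protection fault", "hfi1_user_sdma_free_queues"),
    # Example 2: list_add corruption with at least one hfi1 module frame
    "list_add": ("list_add corruption", "[hfi1]"),
    # Example 3: list_del corruption with at least one hfi1 module frame
    "list_del": ("list_del corruption", "[hfi1]"),
}


def match_signatures(log_text):
    """Return the labels of all signatures whose substrings all occur in log_text."""
    return [
        name
        for name, needles in SIGNATURES.items()
        if all(needle in log_text for needle in needles)
    ]
```

To use it, read the text of a saved kernel log into a string and pass it to match_signatures; an empty list means none of the three example patterns were found.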
Any HPE system running Linux configured as follows:

The following adapters are currently affected:

  - HPE 100Gb 1-Port OP101 QSFP28 x8 OPA Adapter (HPE Part Number: 829334-B21)
  - HPE 100Gb 1-Port OP101 QSFP28 x16 OPA Adapter (HPE Part Number: 829335-B21)
  - HPE 100Gb 1p OPA 860z FIO Adapter (HPE Part Number: 851226-B21)

The following Software versions are affected:

  - Intel Omni-Path Fabric Suite (IFS) Software version 10.10.2.0.44 and lower.
  - Intel Omni-Path Basic Software version 10.10.2.0.44 and lower.
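The version boundaries in this advisory can be expressed as a simple comparison of dotted version numbers. The sketch below is illustrative only: the function names parse_version and classify are hypothetical, the installed version string is assumed to be supplied by the administrator, and versions that fall between 10.10.2.0.44 and 10.10.2.2.1 are reported as "unknown" because this advisory does not state their status.

```python
# Version boundaries as stated in this advisory.
AFFECTED_MAX = (10, 10, 2, 0, 44)  # this version and lower are affected
FIXED_MIN = (10, 10, 2, 2, 1)      # this version and later contain the fix


def parse_version(text):
    """Convert a dotted version string such as '10.10.2.0.44' to a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def classify(version_text):
    """Return 'affected', 'fixed', or 'unknown' for an IFS/Basic Software version."""
    version = parse_version(version_text)
    if version <= AFFECTED_MAX:
        return "affected"
    if version >= FIXED_MIN:
        return "fixed"
    # Between the two stated boundaries: the advisory gives no information.
    return "unknown"
```

For example, classify("10.10.2.0.44") returns "affected" and classify("10.10.2.2.1") returns "fixed".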
The issue is fixed with Intel Omni-Path Fabric Suite (IFS) and Basic Software version 10.10.2.2.1 (and later).

To download Intel Omni-Path Fabric Suite (IFS) and Basic Software version 10.10.2.2.1 (and later), perform the following steps:

  1. Click the following link: Hewlett Packard Enterprise Support Center
  2. Enter a product name (e.g., "QSFP28") in the text search field and wait for a list of Suggested Products to display.
  3. From the Suggested Products list displayed, identify the desired product and select it. The page should refresh to display the "DRIVERS AND SOFTWARE" tab and the components that support the selected product.
  4. From the "DRIVERS AND SOFTWARE" expandable filter menus on the left side of the page: for further filtering if needed, select the specific Operating System from the Operating Environment.
  5. Locate and select the Intel Omni-Path Fabric Suite (IFS) and Basic Software version 10.10.2.2.1 (and later). Click the Revision History tab to locate the latest version. For more important information, review the Release Notes tab.
  6. Click the Download button.

RECEIVE PROACTIVE UPDATES: Receive support alerts (such as Customer Advisories), as well as updates on drivers, software, firmware, and customer replaceable components, proactively via e-mail through HPE Subscriber's Choice. Sign up for Subscriber's Choice at the following URL: Proactive Updates Subscription Form.

NAVIGATION TIP: For hints on navigating HPE.com to locate the latest drivers, patches, and other support software downloads for HPE systems and Options, refer to the Navigation Tips document.

SEARCH TIP: For hints on locating similar documents on HPE.com, refer to the Search Tips Document. To search for additional advisories related to Linux, use the following search string:

    +Advisory +ProLiant -"Software and Drivers" +Linux