Scope

Affected Versions
- Platform: PowerStore 500T
- Version: PowerStoreOS 3.2.1 and later

Symptoms
- A node becomes nonresponsive at both the PowerStoreOS and hardware levels.
- The CMI link between nodes is disabled.
- The affected node reboots (is fenced).
- In rare cases, the remaining peer node may also panic and reboot, resulting in a short, autorecovered data unavailability (DU).

Cause
- PowerStore 500T uses RDMA over PCIe (CMI NTB links) for internode memory access.
- Historical issue: a CPU IERR could propagate across nodes, creating a risk of dual failure and data loss. See article 000213516, "PowerStore 500T: Hardware failure may propagate to peer node leading to service disruption".
- Fix in 3.2.1+: introduces a BMC watchdog timer for this condition. If the watchdog expires (~5 sec):
  - The CMI link is cut.
  - The faulted node is rebooted intentionally.
- Side effect: the surviving node may experience resource starvation, depending on the state and usage of remote memory. This can lead to a secondary panic of the remaining peer node and result in a short disruption.

Design Behavior
- Cutting the CMI link is a containment mechanism that prevents error propagation between nodes.
- Tradeoff: the design accepts a possible short service disruption in order to avoid data loss due to dual-node failure and cache loss.
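
The watchdog containment described under Cause can be illustrated with a small simulation. This is a conceptual sketch only, not Dell firmware or any PowerStore API: the `Node` class, the `watchdog_check` function, and the heartbeat model are hypothetical names introduced for illustration. The 5-second timeout is the approximate value quoted in this article.

```python
from dataclasses import dataclass

# Approximate expiry quoted in the article (~5 sec); the real value is an assumption here.
WATCHDOG_TIMEOUT_S = 5.0


@dataclass
class Node:
    """Hypothetical model of one node as seen by its watchdog."""
    name: str
    last_heartbeat: float = 0.0  # time of the last sign of life, in seconds
    cmi_link_up: bool = True     # state of the internode CMI link
    reboots: int = 0             # number of intentional (fencing) reboots

    def heartbeat(self, now: float) -> None:
        """Record that the node was responsive at time `now`."""
        self.last_heartbeat = now


def watchdog_check(node: Node, now: float, timeout: float = WATCHDOG_TIMEOUT_S) -> bool:
    """Apply containment if the node has been silent for `timeout` seconds.

    Returns True only when the watchdog expired on this call.
    """
    if not node.cmi_link_up:
        return False  # link already cut; nothing more to contain
    if now - node.last_heartbeat < timeout:
        return False  # node is still considered responsive
    node.cmi_link_up = False  # cut the CMI link to stop error propagation
    node.reboots += 1         # reboot (fence) the faulted node
    return True
```

In this model the link is cut before anything else happens to the peer, which mirrors the stated tradeoff: the surviving node loses access to remote memory (and may briefly starve), but the fault cannot propagate to it.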

Behavior Clarification
- This behavior is intentional and protective by design.
- The trigger for the CMI link teardown is the initially nonresponsive node.

Replacement Criteria
If a node stops responding and triggers a CMI teardown more than once during its lifespan, that node is considered unstable and should be replaced.
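
The replacement criterion above amounts to a simple count per node. The following is a minimal sketch, assuming you keep your own record of which node triggered each CMI teardown; the function name `nodes_to_replace` and the node labels are hypothetical, and this does not query any PowerStore interface.

```python
from collections import Counter
from typing import Iterable, List


def nodes_to_replace(teardown_events: Iterable[str], threshold: int = 1) -> List[str]:
    """Return nodes that triggered a CMI teardown more than `threshold` times.

    `teardown_events` holds one node identifier per recorded teardown
    (hypothetical labels such as "appliance1-nodeA").
    """
    counts = Counter(teardown_events)
    return sorted(node for node, n in counts.items() if n > threshold)
```

For example, with recorded events `["nodeA", "nodeB", "nodeA"]`, only `nodeA` exceeds the single-occurrence allowance and would be flagged as a replacement candidate.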