...
An internal error in the Mellanox 40 Gb or InfiniBand card may cause the card to fail. When the failure occurs, the interface no longer responds to commands such as ifconfig or pciconfig. In addition, when the card is configured for an external network, Flexnet and SmartConnect are unable to assign IP addresses to the interface.

Footprints of the failure are seen in the messages file and include the following.

Errors indicating that the driver can no longer post commands (note the driver number, mlx4_core1):

mlx4_core1: mlx4_core1: mlx4_cmd_post:cmd_pending failed

Or an indication that an internal error was detected (note the driver number, mlx4_core1):

2018-12-26T16:31:34-08:00 isilon-1 /boot/kernel.amd64/kernel: mlx4_core1: Internal error detected:
2018-12-26T16:31:34-08:00 isilon-1 /boot/kernel.amd64/kernel: mlx4_core1: buf[00]: ffffffff ..
2018-12-26T16:31:34-08:00 isilon-1 /boot/kernel.amd64/kernel: mlx4_en mlx4_core1: Internal error detected, restarting device
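The log footprints described above can be searched for mechanically. The following is a minimal sketch, not a vendor tool: the function name find_mlx4_failures and both regular expressions are assumptions for illustration, derived only from the sample messages shown in this article. It takes lines of text (for example, lines read from the node's messages file) and reports which mlx4_core device logged which failure signature.

```python
import re

# Both patterns are assumptions modeled on the sample log lines in this article;
# they are not an official or exhaustive list of failure signatures.
CMD_POST_RE = re.compile(r"(mlx4_core\d+): mlx4_cmd_post:\s*cmd_pending failed")
INTERNAL_ERR_RE = re.compile(r"(mlx4_core\d+): Internal error detected")


def find_mlx4_failures(lines):
    """Return a sorted list of (device, kind) pairs found in the log lines."""
    hits = set()
    for line in lines:
        match = CMD_POST_RE.search(line)
        if match:
            hits.add((match.group(1), "cmd_pending failed"))
        match = INTERNAL_ERR_RE.search(line)
        if match:
            hits.add((match.group(1), "internal error"))
    return sorted(hits)


if __name__ == "__main__":
    # Sample lines copied from the log excerpts in this article.
    sample = [
        "mlx4_core1: mlx4_core1: mlx4_cmd_post:cmd_pending failed",
        "2018-12-26T16:31:34-08:00 isilon-1 /boot/kernel.amd64/kernel: "
        "mlx4_core1: Internal error detected:",
        "2018-12-26T16:31:34-08:00 isilon-1 /boot/kernel.amd64/kernel: "
        "mlx4_core1: buf[00]: ffffffff ..",
        "2018-12-26T16:31:34-08:00 isilon-1 /boot/kernel.amd64/kernel: "
        "mlx4_en mlx4_core1: Internal error detected, restarting device",
    ]
    for device, kind in find_mlx4_failures(sample):
        print(device, kind)
```

The function accepts any iterable of strings, so an open file handle can be passed directly. The device name in each result corresponds to the driver number called out in the log messages.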
This issue occurs when Cisco BiDi QSFP+ optics are used with this card. The optic can produce up to 3.5 W of power, while the NIC can accept a maximum of only 1.5 W. Because this difference is more than the input rail can handle, the NIC stops functioning, which causes the node to panic.
Workaround: Use non-BiDi optical cables to avoid excessive power use.
Solution: Shut down the node and replace the NIC. Replacement NICs are available with a larger fuse and power capacity.