During a rolling reboot or rolling upgrade, some clients on the same subnet as the PowerScale cluster may lose connectivity to PowerScale dynamic IP addresses. Only clients on the same subnet as the Isilon cluster are affected, and they cannot even ping the affected dynamic IPs. Nodes in the same Isilon cluster can be affected as well: some nodes cannot ping any dynamic IPs hosted on other nodes.

Checking the ARP table on a client that cannot ping a dynamic IP shows an invalid entry: the table still holds the old entry that maps the dynamic IP to the wrong MAC address.

For example, node 11 rebooted and its dynamic IP 10.x.x.43 was moved to node 10 to avoid downtime. Node 1 then started failing to ping the IP. Its ARP table showed that 10.x.x.43 was still mapped to node 11's MAC address, ec:0d:xx:xx:c5:00:

    node-1# arp -a
    ? (10.x.x.43) at ec:0d:xx:xx:c5:00 on mlxen1 expires in 232 seconds [ethernet]

The MAC address for node 11 is ec:0d:xx:xx:c5:00:

    node-11: mlxen0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    node-11: options=d07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,LINKSTATE>
    node-11: ether ec:0d:xx:xx:c5:00
    node-11: inet 10.x.x.43 netmask 0xffffff00 broadcast 10.x.x.255 zone 1
    node-11: nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
    node-11: media: Ethernet autoselect (40Gbase-CR4 <full-duplex,rxpause,txpause>)
    node-11: status: active

When node 11 was rebooted, IP 10.x.x.43 was moved to node 10:

    2018-11-15T16:06:45+09:00 <3.6> node-1 isi_smartconnect[5222]: Assigned unused IP 10.x.x.43 to { key=10,40gige-1 addr_idx=0 lni=40gige-1 nic=mlxen0[Up] vlan_nic=<NULL> addrs={ 10.x.x.43 } } .
    2018-11-15T16:06:45+09:00 <3.6> node-1 isi_smartconnect[5222]: FLXAPI: OP: FLXAPI_OP_CURRENT_STATE Pool[2:1:1:1]: subnet0 zones: filer25.xxx.com IP[18]: 10.x.x.21:up IP[18]: 10.x.x.54:up IP[17]: 10.x.x.32:up IP[17]: 10.x.x.56:up IP[17]: 10.x.x.30:up IP[16]: 10.x.x.37:up IP[16]: 10.x.x.39:up IP[16]: 10.x.x.45:up IP[15]: 10.x.x.29:up IP[15]: 10.x.x.33:up IP[15]: 10.x.x.49:up IP[14]: 10.x.x.31:up IP[14]: 10.x.x.34:up IP[13]: 10.x.x.38:up IP[13]: 10.x.x.40:up IP[13]: 10.x.x.46:up IP[12]: 10.x.x.41:up IP[12]: 10.x.x.36:up IP[10]: 10.x.x.53:up IP[10]: 10.x.x.43:up IP[9]: 10.x.x.44:up IP[9]: 10.x.x.28:up IP[8]: 10.x.x.51:up IP[8]: 10.x.x.26:up IP[7]: 10.x.x.55:up IP[7]: 10.x.x.35:up IP[6]: 10.x.x.42:up IP[6]: 10.x.x.24:up IP[5]: 10.x.x.52:up IP[5]: 10.x.x.25:up IP[4]: 10.x.x.48:up IP[4]: 10.x.x.50:up IP[3]: 10.x.x.22:up IP[3]: 10.x.x.27:up IP[2]: 10.x.x.47:up IP[2]: 10.x.x.23:up

The MAC address for node 10 is ec:0d:xx:xx:c0:80:

    node-10: mlxen0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    node-10: options=d07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,LINKSTATE>
    node-10: ether ec:0d:xx:xx:c0:80
    node-10: inet 10.x.x.43 netmask 0xffffff00 broadcast 10.x.x.255 zone 1
    node-10: nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
    node-10: media: Ethernet autoselect (40Gbase-CR4 <full-duplex,rxpause,txpause>)
    node-10: status: active

The ARP entry on node 1 therefore mapped the IP to an invalid (old) MAC address. Until that entry is corrected, any client or node holding it cannot connect to the IP address.
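To confirm this symptom on a host, the cached ARP entry can be compared with the MAC address of the node that currently holds the IP. The Python sketch below does that by parsing `arp -a` and `ifconfig` output in the FreeBSD format shown above. The helper names (`parse_arp`, `parse_ifconfig`, `stale_entries`) are hypothetical and not part of OneFS, and the sketch assumes the output matches the excerpts in this article.

```python
import re

# Matches FreeBSD-style `arp -a` lines, e.g.
#   ? (10.x.x.43) at ec:0d:xx:xx:c5:00 on mlxen1 expires in 232 seconds [ethernet]
ARP_LINE = re.compile(r"\((?P<ip>[^)]+)\) at (?P<mac>\S+) on (?P<iface>\S+)")


def parse_arp(output):
    """Return {ip: mac} for every entry found in `arp -a` output."""
    table = {}
    for line in output.splitlines():
        match = ARP_LINE.search(line)
        if match:
            table[match.group("ip")] = match.group("mac").lower()
    return table


def parse_ifconfig(output):
    """Return {ip: mac} for the inet addresses in one interface's ifconfig output.

    Assumes the `ether` line precedes the `inet` lines, as in the excerpts above.
    """
    mac = None
    owned = {}
    for line in output.splitlines():
        fields = line.split()
        if "ether" in fields:
            mac = fields[fields.index("ether") + 1].lower()
        elif "inet" in fields and mac is not None:
            owned[fields[fields.index("inet") + 1]] = mac
    return owned


def stale_entries(arp_table, owners):
    """Return (ip, cached_mac, current_mac) for each IP whose cached MAC is wrong."""
    return [
        (ip, arp_table[ip], mac)
        for ip, mac in owners.items()
        if ip in arp_table and arp_table[ip] != mac
    ]


if __name__ == "__main__":
    node1_arp = "? (10.x.x.43) at ec:0d:xx:xx:c5:00 on mlxen1 expires in 232 seconds [ethernet]"
    node10_ifconfig = (
        "node-10: ether ec:0d:xx:xx:c0:80\n"
        "node-10: inet 10.x.x.43 netmask 0xffffff00 broadcast 10.x.x.255 zone 1\n"
    )
    print(stale_entries(parse_arp(node1_arp), parse_ifconfig(node10_ifconfig)))
    # [('10.x.x.43', 'ec:0d:xx:xx:c5:00', 'ec:0d:xx:xx:c0:80')]
```

An empty result means the host's cache already agrees with the node that owns the IP.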
According to "PowerScale Network Design Considerations" (https://infohub.delltechnologies.com/es-es/t/dell-powerscale-network-design-considerations/): "A SmartConnect zone with dynamic allocation for IP addresses immediately hot-moves the one IP address on the failed node to one of the other three nodes in the cluster. It sends out several gratuitous address resolution protocols (ARP) requests to the connected switch, so that client I/O continues uninterrupted."

In this case, the hosts on the same subnet did not receive Gratuitous ARP (GARP) packets from node 10 after the IP address was assigned to it. As a result, their ARP entries for the IP were not updated, which caused the network connection problem. The cause is that ARP broadcasts are dropped or blocked at the network level. Misconfiguration of Cisco Application Centric Infrastructure (ACI) has contributed to these issues.
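For reference, a gratuitous ARP is an ordinary ARP packet in which the sender and target IP addresses are both the moved IP, sent to the Ethernet broadcast address so every host on the segment can refresh its cache. This is the broadcast that must reach the hosts. The Python sketch below only builds such a frame as bytes, following the ARP layout from RFC 826 and the announcement form from RFC 5227; it sends nothing and does not claim to reproduce the exact packets OneFS emits. The MAC and IP values are hypothetical placeholders, because the addresses in this article are redacted.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806


def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes(int(octet, 16) for octet in mac.split(":"))


def gratuitous_arp_frame(mac, ip, opcode=1):
    """Build an Ethernet frame carrying a gratuitous ARP for ip -> mac.

    opcode 1 is an ARP request (the RFC 5227 "announcement" form),
    opcode 2 an ARP reply; both forms are commonly used as gratuitous ARP.
    """
    sender_mac = mac_to_bytes(mac)
    ip_bytes = socket.inet_aton(ip)
    ethernet = BROADCAST_MAC + sender_mac + struct.pack("!H", ETHERTYPE_ARP)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6,            # hardware address length
        4,            # protocol address length
        opcode,
        sender_mac,   # sender hardware address: the node that now owns the IP
        ip_bytes,     # sender protocol address: the moved IP
        b"\x00" * 6,  # target hardware address: zero, ignored by receivers
        ip_bytes,     # target protocol address: the same IP, which makes it gratuitous
    )
    return ethernet + arp  # 42 bytes before minimum-size padding on the wire


# Hypothetical placeholders: 192.0.2.0/24 is a documentation range (RFC 5737).
frame = gratuitous_arp_frame("02:00:00:00:c0:80", "192.0.2.43")
print(len(frame), frame.hex())
```

Because the destination is the broadcast MAC, any switch or fabric policy that suppresses or fails to flood ARP broadcasts prevents hosts from ever seeing the new IP-to-MAC mapping.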
Solution: As a long-term solution, "Gratuitous ARP Flooding" must be enabled on the switch side. The following knowledge article describes cumulative recommendations for Cisco ACI:
[000028116] Clients disconnect after IP address moves and Cisco ACI is in use

Workaround: As a workaround, the obsolete ARP entry can be deleted with the "arp -d" command on the affected hosts. The next time the hosts contact the IP, they broadcast a new ARP request for it and update their ARP tables with the current MAC address.
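Where several hosts or IPs are affected, the workaround can be scripted. The Python sketch below is hypothetical: the function names and the `dry_run` flag are illustrative, not part of any Dell tool. It compares a host's cached IP-to-MAC mapping with the current owners and builds one `arp -d <ip>` command per stale entry, executing them only when `dry_run=False`. Deleting ARP entries normally requires root privileges, and `arp -d` syntax can differ slightly between operating systems.

```python
import subprocess


def stale_ips(cached, current):
    """Return the IPs whose cached MAC differs from the MAC that now owns them."""
    return sorted(
        ip
        for ip, mac in current.items()
        if ip in cached and cached[ip].lower() != mac.lower()
    )


def clear_stale_arp(cached, current, dry_run=True):
    """Build (and, unless dry_run, execute) one `arp -d <ip>` per stale entry.

    After the entry is removed, the host sends a fresh ARP request the next
    time it contacts the IP and learns the current MAC address.
    """
    commands = [["arp", "-d", ip] for ip in stale_ips(cached, current)]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # usually needs root
    return commands


cached = {"10.x.x.43": "ec:0d:xx:xx:c5:00"}   # what node 1 still had cached
current = {"10.x.x.43": "ec:0d:xx:xx:c0:80"}  # node 10 now owns the IP
print(clear_stale_arp(cached, current))
# [['arp', '-d', '10.x.x.43']]
```

Defaulting to a dry run lets an administrator review the commands before deleting anything; the real fix remains enabling GARP flooding so the caches update on their own.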