...
ACI leaf reloads intermittently with the following reset reason:

# show system reset-reason
*************** module reset reason (1) *************
0) At 2016-11-21T15:40:54.245-08:00
   Reason: reset-requested-due-to-fatal-module-error
   Service:Service on linecard had a hap-reset
   Version: 12.0(2f)
1) At 2016-11-17T08:14:49.842-08:00
   Reason: reset-requested-due-to-fatal-module-error
   Service:Service on linecard had a hap-reset
   Version: 12.0(2f)
2) At 2016-11-01T06:23:52.264-08:00
   Reason: reset-requested-due-to-fatal-module-error
   Service:Service on linecard had a hap-reset
   Version: 12.0(2f)
3) At 2016-08-31T11:39:40.536-08:00
   Reason: reset-by-installer
   Service:Upgrade
   Version: 12.0(1o)

At the same time, the leaf generates cores against ipfib.
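When many leaves are involved, it can help to pull the fatal-module-error resets out of saved `show system reset-reason` output. The following is a minimal sketch that assumes the output has been copied into a string; the regular expression is inferred only from the sample above and the function name is hypothetical.

```python
import re
from typing import List

# Pairs each "At <timestamp>" with the "Reason: <reason>" that follows it.
# Pattern inferred from the sample output above only (assumption).
_ENTRY = re.compile(r"At\s+(\S+)\s+Reason:\s*(\S+)")

FATAL_REASON = "reset-requested-due-to-fatal-module-error"


def fatal_module_resets(output: str) -> List[str]:
    """Return timestamps of resets caused by a fatal module error."""
    return [ts for ts, reason in _ENTRY.findall(output) if reason == FATAL_REASON]
```

Resets with other reasons, such as `reset-by-installer` during an upgrade, are ignored, so only the crash-related reloads are reported.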
Seen on ACI leaf switches running 2.0(2f). A high number of IPv6 adjacency events can be seen under VRF overlay-1, even though the BDs in use do not have unicast routing enabled and have no L3 interface:

Leaf# pwd
/var/log/dme/log
Leaf# cat am-trace.txt | egrep "Added neighbor entry for vrf overlay-1" | more
23:07:58.857754: [ALL]Added neighbor entry for vrf overlay-1 addr fe80::402f:23bc:46a9:c313 if 0x1801000a mac 6c88.149f.aa6c to kernel
23:08:07.651756: [ALL]Added neighbor entry for vrf overlay-1 addr fe80::402f:23bc:46a9:c313 if 0x1801000a mac 6c88.149f.aa6c to kernel
23:08:16.442891: [ALL]Added neighbor entry for vrf overlay-1 addr fe80::402f:23bc:46a9:c313 if 0x1801000a mac 6c88.149f.aa6c to kernel
23:08:25.235921: [ALL]Added neighbor entry for vrf overlay-1 addr fe80::402f:23bc:46a9:c313 if 0x1801000a mac 6c88.149f.aa6c to kernel
23:08:34.028785: [ALL]Added neighbor entry for vrf overlay-1 addr fe80::402f:23bc:46a9:c313 if 0x1801000a mac 6c88.149f.aa6c to kernel
...

IPv6 link-local NA/NS messages should not be received on L2-only BDs.
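To see which hosts are driving these events, the matching am-trace.txt lines can be tallied per link-local address and MAC. This is a minimal sketch, assuming a copy of am-trace.txt has been taken off the leaf; the line format is inferred from the sample above and the helper name is hypothetical.

```python
import re
from collections import Counter

# Matches the am-trace.txt lines shown above; format inferred from that sample only.
_NEIGHBOR_ADD = re.compile(
    r"Added neighbor entry for vrf overlay-1 addr (?P<addr>\S+) "
    r"if (?P<ifindex>\S+) mac (?P<mac>\S+)"
)


def count_overlay_neighbor_adds(lines):
    """Count 'Added neighbor entry for vrf overlay-1' events per (address, MAC) pair."""
    counts = Counter()
    for line in lines:
        match = _NEIGHBOR_ADD.search(line)
        if match:
            counts[(match.group("addr"), match.group("mac"))] += 1
    return counts
```

For example, `with open("am-trace.txt") as f: print(count_overlay_neighbor_adds(f).most_common(5))` lists the five busiest sources. A host that appears repeatedly every few seconds, as in the sample, points to the BD that needs the workaround.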
Enable Unicast Routing on the affected BDs; a subnet is not required. Additionally, "Enforced subnet check" can be enabled so that IP addresses are still not learned on these BDs once unicast routing is enabled, if IP learning is not needed.
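When many BDs are affected, the Unicast Routing change can also be scripted. The sketch below only builds the request and does not send it. It assumes the APIC REST API object model, in which a BD is the `fvBD` class at DN `uni/tn-<tenant>/BD-<bd>` and unicast routing is its `unicastRoute` attribute; verify this against the object model of your APIC release. The tenant, BD, and APIC host names are hypothetical. Enforced subnet check is not covered here.

```python
import json


def bd_unicast_routing_request(tenant: str, bd: str, apic: str = "apic.example.com"):
    """Build (url, body) to enable unicast routing on one BD via the APIC REST API.

    Assumes the fvBD class and its unicastRoute attribute (verify for your release).
    No subnet (fvSubnet) is created, matching the workaround.
    """
    dn = f"uni/tn-{tenant}/BD-{bd}"
    url = f"https://{apic}/api/mo/{dn}.json"
    body = {"fvBD": {"attributes": {"dn": dn, "unicastRoute": "yes"}}}
    return url, json.dumps(body)
```

The returned body would be POSTed to the returned URL within an authenticated APIC session. Only the single attribute is changed, so the rest of the BD configuration is left as-is.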
When a host on an L2-only BD sends an ICMPv6 frame, the frame may be incorrectly processed in the overlay VRF. Adjacency manager attempts to allocate an IPv6 adjacency for this host and fails (as expected, since IPv6 is not enabled on the overlay VRF). During this event, although no IPv6 route is pushed to hardware, an adjacency index is allocated in software but not correctly released (a fix to the iStack component corrects this behavior starting in the 12.2.x switch release). As these leaked indexes accumulate, UFIB eventually sees all adjacencies as allocated. When a genuine adjacency change then occurs (such as an IPv4/IPv6 adjacency change on an L3Out), UFIB has no free index left and crashes.

This bug addresses the UFIB crash. The workaround above prevents adjacency manager from incorrectly processing ICMPv6 messages on overlay-1, which stops further leaks of UFIB adjacency indexes. However, the workaround does not free indexes that have already leaked. It is therefore possible that adjacency indexes are already exhausted when the workaround is applied, in which case the next valid adjacency event may still result in a crash.

There is no command to display the internal adjacency bitmap used by UFIB. However, each event in which adjacency manager tries and fails to build an IPv6 adjacency on overlay-1 increments an error counter.
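The failure mode can be illustrated with a toy fixed-size index pool. This is a hypothetical model written only to show the mechanism described above; it is not UFIB or iStack code, and the class and method names are invented for illustration.

```python
class AdjacencyIndexPool:
    """Toy fixed-size adjacency index pool (hypothetical; not UFIB's actual code)."""

    def __init__(self, size: int) -> None:
        self.free = list(range(size))
        self.leaked = 0  # indexes taken by failed attempts and never returned

    def allocate(self) -> int:
        if not self.free:
            # Stands in for the ipfib crash when no index is left.
            raise RuntimeError("no adjacency index available")
        return self.free.pop()

    def release(self, index: int) -> None:
        self.free.append(index)

    def failed_overlay_ipv6_event(self, release_on_failure: bool) -> None:
        """One ICMPv6 event on overlay-1: an index is allocated, then programming fails."""
        index = self.allocate()
        if release_on_failure:
            self.release(index)  # corrected behavior: index goes back to the pool
        else:
            self.leaked += 1     # buggy behavior: index is never returned
```

With `release_on_failure=False`, each failed event permanently removes one index, so after `size` events the next `allocate()` call, for example one triggered by a real L3Out adjacency change, raises. With `release_on_failure=True`, the pool never shrinks. Stopping new events (the workaround) does not return indexes already counted in `leaked`; in this model only a new pool, the analogue of a reload, does.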
This counter can be viewed with the following command:

fab3-leaf103# vsh_lc -c 'show forwarding internal error counts'
Error-Description                 Count   Error-Description                 Count   Error-Description                 Count
--------------------------------  --------------------------------  --------------------------------
PD Errors:
--------------------------------------------------------------------------------------------------
Error-Description                 Count   Error-Description                 Count
--------------------------------------------------------------------------------------------------
E_PD_ADJ_INF_DEL                  7310    E_PD_SDK_ADJ_GEN_DEL              7310    E_PD_SDK_ADJ_INF_DEL              7310

Note the E_PD_SDK_ADJ_GEN_DEL counter. The adjacency table holds 8192 (8K) entries in total, so the number of valid adjacencies plus the number of failed events counted in E_PD_SDK_ADJ_GEN_DEL should not exceed 8K. The only way to free the indexes allocated by these failed events is to reload the leaf. The recommendation is to apply the workaround for this bug and then reload any leaf whose E_PD_SDK_ADJ_GEN_DEL counter is greater than 6K.
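The 8K budget and the 6K reload threshold can be checked from saved command output. This is a minimal sketch with stated assumptions: the number of valid adjacencies is supplied by the operator (this note names no command for it), "6K" is read as 6000, and the function names are hypothetical.

```python
import re

TOTAL_ADJACENCIES = 8192  # adjacency indexes available to UFIB, per the description above
RELOAD_THRESHOLD = 6000   # "greater than 6K" read as 6000 (assumption)

_GEN_DEL = re.compile(r"\bE_PD_SDK_ADJ_GEN_DEL\s+(\d+)")


def leaked_adjacencies(output: str) -> int:
    """Return the E_PD_SDK_ADJ_GEN_DEL count from 'show forwarding internal error counts'."""
    match = _GEN_DEL.search(output)
    return int(match.group(1)) if match else 0  # absent counter: no failed events seen


def assess_leaf(output: str, valid_adjacencies: int) -> dict:
    """Estimate remaining adjacency indexes and whether a reload is recommended."""
    leaked = leaked_adjacencies(output)
    remaining = TOTAL_ADJACENCIES - valid_adjacencies - leaked
    return {
        "leaked": leaked,
        "remaining": remaining,
        "exhausted": remaining <= 0,  # the next valid adjacency event has no free index
        "reload_recommended": leaked > RELOAD_THRESHOLD,
    }
```

For the sample output above with, say, 500 valid adjacencies, `assess_leaf` reports 7310 leaked indexes, 8192 - 500 - 7310 = 382 remaining, and recommends a reload because 7310 exceeds the threshold.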