Symptom
Leaf switch crash:
show system reset-reason
*************** module reset reason (1) *************
0) At 2022-03-21T14:23:50.582+03:00
Reason: reset-triggered-due-to-ha-policy-of-reset
Service:nfm hap reset
Version: 14.2(7l)
Conditions
Seen on N9K-C9336C-FX2 running ACI 14.2(7l) but:
- It is a day-1 behavior
- It is platform independent
Workaround
Disabled Netflow or reduce the scale of netflow deployment.
Further Problem Description
The issue is a day-1 issue and is caused by a combination of:
1) netflow deployment with very large-scale flows probably (>500K flows)
2) kernel being busy and unable to empty netflow socket buffers within some interval and subsequently blocking netflow process (leading to nfm hap reset)