Symptom
Nexus switch reloads unexpectedly due to an "afm hap reset".
Conditions
AFM polls the ASICs for statistics. This polling happens automatically, and there is no particular trigger for this issue.
Workaround
Reduce ASIC load.
In most of the cases seen so far, data for multiple SFPs was being polled at the same time.
Instead of collecting data for 10-20 SFPs in one shot, collect data for 5 SFPs in the first minute, the next 5 in the following minute, and so on.
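The staggered collection described above can be planned with a small helper. The sketch below assumes an external monitoring script that issues one per-port SFP/transceiver query. It only builds the schedule of batch start offsets and does not run any commands. The function name `staggered_batches` and the interface names are hypothetical, and the batch size of 5 and 60-second interval simply mirror the workaround text.

```python
from typing import List, Sequence, Tuple


def staggered_batches(
    interfaces: Sequence[str], batch_size: int = 5, interval_s: int = 60
) -> List[Tuple[int, List[str]]]:
    """Split interfaces into batches and assign each batch a start offset.

    Batch 0 starts at 0 s, batch 1 at interval_s, batch 2 at 2 * interval_s,
    and so on, so only batch_size ports are polled in any one window.
    (Hypothetical helper; not part of NX-OS.)
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    schedule = []
    for start in range(0, len(interfaces), batch_size):
        offset = (start // batch_size) * interval_s
        schedule.append((offset, list(interfaces[start:start + batch_size])))
    return schedule


if __name__ == "__main__":
    # Hypothetical names for 12 SFP ports on one switch.
    ports = [f"Ethernet1/{n}" for n in range(1, 13)]
    for offset, batch in staggered_batches(ports):
        print(f"t+{offset:>3}s: {', '.join(batch)}")
```

With 12 ports this yields three windows (0 s, 60 s, 120 s) of 5, 5 and 2 ports. A real collector would sleep until each offset before querying that batch.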
Further Problem Description
This issue is currently under investigation.
The reset reason in `show version` shows "afm hap reset":
`show version`
Reason: Reset triggered due to HA policy of Reset
System version: 7.3(7)N1(1b)
Service: afm hap reset <<<-------------------------------------------
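When many switches need to be checked, this reset reason can be spotted in saved `show version` output. Here is a minimal sketch, assuming the output has already been captured as text. The function name `is_afm_hap_reset` is hypothetical. The pattern only looks for a `Service: afm hap reset` line, as in the excerpt above, and ignores the arrow annotations added in this write-up.

```python
import re

# Match a reset-reason line such as "Service: afm hap reset", optionally
# followed by other text (e.g. the "<<<---" annotations used in this write-up).
AFM_HAP_RESET = re.compile(r"^[ \t]*Service:[ \t]*afm hap reset\b", re.MULTILINE)


def is_afm_hap_reset(show_version_text: str) -> bool:
    """Return True if captured 'show version' text reports an afm hap reset."""
    return AFM_HAP_RESET.search(show_version_text) is not None
```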
The process logs provide more detail about the issue. In particular, they show a heartbeat failure:
`show processes log details`
Service: afm <<<------------------------------------------------------
Description: Acl manager Daemon <<<-----------------------------------
Executable: /isan/bin/afm
Started at
Stopped at
Uptime:
Start type: SRV_OPTION_RESTART_STATELESS (23)
Death reason: SYSMGR_DEATH_REASON_FAILURE_HEARTBEAT (9) <<<-----------
Last heartbeat secs ago
RLIMIT_AS: 530007667
System image name: n5000-uk9.7.3.7.N1.1b.bin
System image version: 7.3(7)N1(1b) S0
PID:
Exit code: signal 6 (core dumped) <<<---------------------------------
cgroup: 184:devices,memory,cpuacct,cpu:/1
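The process-log fields that identify this issue are `Service: afm`, a `Death reason` of `SYSMGR_DEATH_REASON_FAILURE_HEARTBEAT`, and `Exit code: signal 6`. The sketch below extracts them from captured `show processes log details` text. It assumes that each process entry begins with a `Service:` line, as in the excerpt above. It keeps only lines whose key is letters, underscores and spaces, so timestamp lines containing colons are skipped. Both function names are hypothetical.

```python
import re
from typing import Dict, List

# "Key: value" where the key is letters, underscores and spaces only.
FIELD = re.compile(r"^([A-Za-z_ ]+):\s*(.*)$")


def parse_process_log(text: str) -> List[Dict[str, str]]:
    """Split captured process-log text into one dict per 'Service:' entry."""
    records: List[Dict[str, str]] = []
    for raw in text.splitlines():
        # Drop the "<<<---" highlight arrows used in this write-up.
        line = raw.split("<<<", 1)[0].strip()
        match = FIELD.match(line)
        if not match:
            continue  # e.g. "Started at" lines with no key/value pair
        key, value = match.group(1).strip(), match.group(2).strip()
        if key == "Service":
            records.append({})  # a new process entry starts here
        if records:
            records[-1][key] = value
    return records


def afm_heartbeat_crashes(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep entries where afm was killed for a missed heartbeat."""
    return [
        r for r in records
        if r.get("Service") == "afm"
        and r.get("Death reason", "").startswith("SYSMGR_DEATH_REASON_FAILURE_HEARTBEAT")
    ]
```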
The following syslog is seen when the issue occurs:
%SYSMGR-2-SERVICE_CRASHED: Service "afm" (PID XXXX) hasn't caught signal 6 (core will be saved).
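If syslog is forwarded to a collector, this message can be matched as it arrives. The following is a minimal sketch that assumes one syslog message per string, possibly with a timestamp prefix. The PID is matched loosely because it is masked as `XXXX` above. The pattern and the function name `afm_crash_signal` are hypothetical.

```python
import re
from typing import Optional

SERVICE_CRASHED = re.compile(
    r'%SYSMGR-2-SERVICE_CRASHED: Service "(?P<service>[^"]+)" \(PID (?P<pid>\w+)\) '
    r"hasn't caught signal (?P<signal>\d+)"
)


def afm_crash_signal(message: str) -> Optional[int]:
    """Return the signal number if the message reports an afm crash, else None."""
    match = SERVICE_CRASHED.search(message)
    if match and match.group("service") == "afm":
        return int(match.group("signal"))
    return None
```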
Although the syslog and process logs indicate that a core was dumped, no cores are present when checked from NX-OS:
# show core vdc-all
VDC Module Instance Process-name PID Date(Year-Month-Day Time)
--- ------ -------- --------------- -------- -------------------------
Please open a service request with Cisco TAC if you encounter this issue.