Symptom
The active supervisor or switch can suddenly crash or fail over, logging messages such as:
%PMAN-3-RPSWITCH: RP switch initiated. Critical process linux_iosd-image has failed (rc 0)
%PMAN-3-PROCHOLDDOWN: The process linux_iosd-image has been helddown (rc 134)
%PMAN-0-PROCFAILCRIT: A critical process linux_iosd_image has failed (rc 134)
Before the crash, the following errors will start appearing in the logs:
%IOSXE_INFRA-6-PROCPATH_CLIENT_HOG: IOS shim client 'iosd-spa' has taken 3514 msec (runtime: 0 msec) to process a 'ngmod_macsec_sa_sc_res' message
These errors continue to appear for 5 minutes before the crash occurs.
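Because the hog messages precede the crash, scanning collected logs for them can give early warning. The following is a minimal sketch, not a Cisco tool: the regular expression is derived only from the sample message above, and the function name find_macsec_hogs and the sample list are hypothetical names introduced for illustration. Message layout in other releases may differ.

```python
import re

# Pattern derived from the single sample message in this bug description.
# Treat the exact field layout as an assumption for other releases.
HOG_RE = re.compile(
    r"%IOSXE_INFRA-6-PROCPATH_CLIENT_HOG: IOS shim client '(?P<client>[^']+)' "
    r"has taken (?P<elapsed>\d+) msec \(runtime: (?P<runtime>\d+) msec\) "
    r"to process a '(?P<message>[^']+)' message"
)


def find_macsec_hogs(lines, message="ngmod_macsec_sa_sc_res"):
    """Return (client, elapsed_msec) for each hog line about the given message."""
    hits = []
    for line in lines:
        m = HOG_RE.search(line)
        if m and m.group("message") == message:
            hits.append((m.group("client"), int(m.group("elapsed"))))
    return hits


# Sample input: the two log lines are copied from this bug description.
sample = [
    "%IOSXE_INFRA-6-PROCPATH_CLIENT_HOG: IOS shim client 'iosd-spa' has taken "
    "3514 msec (runtime: 0 msec) to process a 'ngmod_macsec_sa_sc_res' message",
    "%PMAN-3-PROCHOLDDOWN: The process linux_iosd-image has been helddown (rc 134)",
]

print(find_macsec_hogs(sample))  # prints [('iosd-spa', 3514)]
```

Any non-empty result on an affected release suggests the switch may be approaching the crash described above.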
Conditions
Seen on Catalyst 9000 switches running IOS XE 17.3.x and 17.6.x with MACsec running with endpoints / AnyConnect NAM.
Crashes are commonly seen after an ISE migration or a new ISE implementation.
Workaround
The issue is not seen on 17.9.x releases.
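The release information above can be turned into a quick inventory check. This is a sketch under stated assumptions: it uses only the release trains named in this bug description (17.3.x and 17.6.x as affected, 17.9.x as not seen) and reports every other train as unknown rather than guessing. The function name classify_release is hypothetical.

```python
import re

# Release trains taken from this bug description only.
AFFECTED_TRAINS = {(17, 3), (17, 6)}   # listed under Conditions
NOT_SEEN_TRAINS = {(17, 9)}            # listed under Workaround


def classify_release(version):
    """Classify a version string such as '17.6.4' by its major.minor train."""
    m = re.match(r"^(\d+)\.(\d+)\.", version.strip())
    if not m:
        raise ValueError(f"unrecognized version string: {version!r}")
    train = (int(m.group(1)), int(m.group(2)))
    if train in AFFECTED_TRAINS:
        return "affected"
    if train in NOT_SEEN_TRAINS:
        return "not seen"
    # The bug description says nothing about other trains.
    return "unknown"


for v in ("17.3.4", "17.6.5a", "17.9.1", "17.12.1"):
    print(v, classify_release(v))
```

Returning "unknown" for unlisted trains is deliberate, since the description does not state their status.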
Further Problem Description
The crash signature itself is generic: the switch crashes because messages or processes take too long to complete.