Symptom
The logs show the time taken by the IOSd client 'smd reader mqipc' to process an 'eedge_epm_send_msg' message increasing steadily until the IOSd chasfs task detects the lockup (after about 5 minutes) and crashes IOSd.
In fact, the message is not being processed at all: the reported runtime is 0 msec.
%IOSXE_INFRA-6-PROCPATH_CLIENT_HOG: IOS shim client 'smd reader mqipc' has taken 298513 msec (runtime: 0 msec) to process a 'eedge_epm_send_msg' message
%IOSXE_INFRA-6-PROCPATH_CLIENT_HOG: IOS shim client 'smd reader mqipc' has taken 303513 msec (runtime: 0 msec) to process a 'eedge_epm_send_msg' message
%Software-forced reload
Exception to IOS Thread:
Frame pointer 0x7F2FB84C4458, PC = 0x7F30498F22F2
UNIX-EXT-SIGNAL: Aborted(6), Process = IOSD chasfs tas
Conditions
The issue was observed when the entire stack was booting up and all switches were initializing in parallel.
Workaround
Configuring 'terminal length 0' on the console and VTY lines was observed to mitigate the crash. This setting disables output paging.
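A minimal sketch of a persistent form of this setting, assuming the line-configuration command 'length 0' (the per-line counterpart of the exec command 'terminal length 0') is what is intended. The VTY range 0 15 is an assumption; adjust it to the lines defined on the device.

```
configure terminal
 line con 0
  length 0
 line vty 0 15
  length 0
 end
```

With a length of 0, output on these lines is not paused at a --More-- prompt, so no keypress is awaited.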
Alternatively, boot the switches in the stack one at a time.
Once the Active switch is up, boot the Standby. Wait until the Standby election completes, then boot the next stack member.
Further Problem Description
IPC messages were not being processed because the receiving process was blocked waiting for keyboard input. With no keyboard input, the process remained stuck until a watchdog crash was eventually triggered.