...
A router may reload unexpectedly, generating a system-report, core, or other crash-related file. The crashes indicate that an IOSXE-WATCHDOG event took place, although the process reported can differ (IOSXE-RP Punt Service, IPAM Manager, MFIB_mrib_write, etc.). Cisco TAC can decode the backtrace of functions within the system-report to help confirm whether this crash is relevant.
The trigger for this issue is high control plane or punted traffic. This leads to high load on the packet thread. When IOSd gets scheduled, it can be starved out of CPU resources, leading to the WATCHDOG timeout. There is no single configuration, feature, or scenario that results in this behavior. Rather, anything that leads to excessive control plane or punted traffic can result in a similar crash. Excessive mDNS traffic and multiple tests running on a ThousandEyes Agent are two example scenarios that can lead to this problematic state.
Users can utilize the following command to check the load on the packet thread:

Router# show platform software infrastructure thread packet
Syspage index for the Packet thread: 4
Statistics for Packet thread activities:
    0 minimum packet received, 2048 maximum packet received
    0 minimum message sent, 0 maximum message sent
    0 msec minimum clock runtime, 78241 msec maximum clock runtime
    0 msec minimum cpu runtime, 78241 msec maximum cpu runtime
    707420791 pkt thread invocation, 9 epoll timeout, 0 epoll intr
    2 pkt thread triggered by IOS thread, 2 wakeup
    20763433 IOS triggered by packet thread
    217080 IOS triggered by fastpath thread
    20975128 IOS scheduler thread wakeup
    mstr_efd 14, pkt thread_wakeup_fd 9
    2 wakeup_efd_ready
    707420780 punt_fd ready (fd 54)
    0 punt_fd ready (fd 106)
    2629544740 rx messages processed
    0 memory allocation failures, 0 read paused, 0 read pause cleared
    Current state: read paused: no
    Clock/CPU utilization with 5 seconds 99%/99%, 1 min 24%/24%, 5 min 19%/19%   <<<----- This line displays the packet thread CPU utilization.
    Maximum mutex acquire time: 3082 msec at *Dec 16 12:22:56.654

Cisco TAC can also extract the above data from an IOSd core file using internal tools.

The following commands can be used to monitor the amount of control plane and punted traffic:

show platform software infrastructure punt
show platform software punt-policer
show platform software punt-policer drop-only
show platform hardware qfp active infrastructure punt statistics type punt-drop
show platform hardware qfp active infrastructure punt statistics type per-cause
show platform hardware qfp active infrastructure punt statistics type global-drop
show platform hardware qfp active infrastructure punt statistics type inject-drop
show platform hardware qfp active statistics drop
show platform software infrastructure thread packet

Enhancements to interrupt-related processing in lsmpi-rx were integrated via defect CSCwf65540.
Please refer to that defect for information regarding versions of code with the software changes.
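
For repeated checks, the Clock/CPU utilization line from the show platform software infrastructure thread packet output can be extracted with a short script. The following is a minimal sketch, not an official Cisco tool. It assumes the line format matches the sample output in this note, which may differ between releases. The function names and the 90% threshold are hypothetical, chosen only for illustration. How the command output is collected (SSH, console log, and so on) is left to the reader.

```python
import re

# Pattern for the utilization line as it appears in the sample output.
# Assumption: the wording of this line is the same on your release.
UTIL_RE = re.compile(
    r"Clock/CPU utilization with 5 seconds (\d+)%/(\d+)%, "
    r"1 min (\d+)%/(\d+)%, 5 min (\d+)%/(\d+)%"
)


def parse_packet_thread_utilization(output):
    """Return (clock, cpu) percentages per interval, or None if not found."""
    match = UTIL_RE.search(output)
    if match is None:
        return None
    values = [int(v) for v in match.groups()]
    return {
        "5sec": (values[0], values[1]),
        "1min": (values[2], values[3]),
        "5min": (values[4], values[5]),
    }


def is_packet_thread_busy(util, threshold=90):
    """True if any interval's CPU figure meets the hypothetical threshold."""
    return any(cpu >= threshold for _clock, cpu in util.values())


# The utilization line copied from the sample output in this note.
sample = (
    "Clock/CPU utilization with 5 seconds 99%/99%, "
    "1 min 24%/24%, 5 min 19%/19%"
)
util = parse_packet_thread_utilization(sample)
print(util)                         # {'5sec': (99, 99), '1min': (24, 24), '5min': (19, 19)}
print(is_packet_thread_busy(util))  # True
```

A sustained high value across the 1 min and 5 min intervals is more telling than a single 5 second spike, so the threshold check is best applied to several samples collected over time.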