...
The crash happens due to a problem with processing a message exchanged between IOS XE processes, reported as "%IOSXE_INFRA-6-PROCPATH_CLIENT_HOG: IOS shim client 'iosd <-- pubd'", meaning that communication between the IOSd SHIM and pubd took longer than expected. The IOSd SHIM is the API that IOSd uses to communicate with other processes, in this case pubd.
You can see these messages with increasing processing times:

Nov 24 2023 15:04:33.829 CET: %IOSXE_INFRA-6-PROCPATH_CLIENT_HOG: IOS shim client 'iosd <-- pubd' has taken 6395 msec (runtime: 0 msec) to process a 'unknown' message
...
Nov 24 2023 15:09:23.833 CET: %IOSXE_INFRA-6-PROCPATH_CLIENT_HOG: IOS shim client 'iosd <-- pubd' has taken 296399 msec (runtime: 0 msec) to process a 'unknown' message
Nov 24 2023 15:09:28.833 CET: %IOSXE_INFRA-6-PROCPATH_CLIENT_HOG: IOS shim client 'iosd <-- pubd' has taken 301399 msec (runtime: 0 msec) to process a 'unknown' message

Eventually, after 5 minutes (300000 msec), the process "IOSD chasfs task" triggered a reboot:

UNIX-EXT-SIGNAL: Aborted(6), Process = IOSD chasfs task
-Traceback= 1#1537dd7e2cde051cde20167531ce2117 c:7F476CE3E000+166D2 c:7F476CE3E000+56B :5623B31A1000+6045A0B :5623B31A1000+3C25AF9 :5623B31A1000+4D94F78

A few seconds earlier, the crashinfo file logs show:

Nov 24 2023 15:04:13.425 CET: %PKI-3-CRL_FETCH_FAIL: CRL fetch for trustpoint DNAC-CA failed Reason : Enrollment URL not configured.
Nov 24 2023 15:04:27.408 CET: %PKI-3-CRL_FETCH_FAIL: CRL fetch for trustpoint DNAC-CA failed Reason : Enrollment URL not configured.

At the moment of the crash, the sessmgrd_rp_0 max_diff_calls value was extremely high.
------------------ show process memory platform accounting ------------------
Hourly Stats

process         callsite_ID(bytes)  max_diff_bytes  callsite_ID(calls)  max_diff_calls  tracekey                            timestamp(UTC)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
sessmgrd_rp_0   AA3580F1347EC002    6428784         5BF8C149AEA80000    12269           1#6a3f64509dcc42f461a5891414c89c61  2023-11-24 15:14  <<<<<
cmcc_cc_6       6795565C7BD7C000    4144            6795565C7BD7C000    1               1#422371b32438abde5910480fe3165b31  2023-11-24 14:24
cmcc_cc_3       6795565C7BD7C000    4144            6795565C7BD7C000    1               1#422371b32438abde5910480fe3165b31  2023-11-24 14:24
cmcc_cc_5       6795565C7BD7C000    4144            6795565C7BD7C000    1               1#422371b32438abde5910480fe3165b31  2023-11-24 14:24

PUBd Tracelogs:

2023/11/24 14:09:50.546295941 {pubd_R0-0}{1}: [mdt-ctrl] [1801]: UUID: 0, ra: 0 (note): **** Callback Entry: VRF resolution result for '' in connection '160.103.180.191:25103:65535:172.23.255.37': 225.
2023/11/24 14:09:50.546303723 {pubd_R0-0}{1}: [bso] [1801]: UUID: 0, ra: 0 (ERR): 0x7f85ddabb640: IPC request timeout vrf_name app_ctx:0x560dae4f20b8 status:225
2023/11/24 14:09:50.546352386 {pubd_R0-0}{1}: [mdt-ctrl] [1801]: UUID: 0, ra: 0 (note): **** Callback Entry: VRF resolution result for '' in connection '160.103.180.191:25103:65535:172.23.255.37': 225.
2023/11/24 14:09:50.546360815 {pubd_R0-0}{1}: [bso] [1801]: UUID: 0, ra: 0 (ERR): 0x7f85ddabb640: IPC request timeout vrf_name app_ctx:0x560dae4c0548 status:225
2023/11/24 14:09:50.546363254 {pubd_R0-0}{1}: [mdt-ctrl] [1801]: UUID: 0, ra: 0 (note): **** Callback Entry: VRF resolution result for '' in connection '160.103.180.191:25103:65535:172.23.255.37': 225.
2023/11/24 14:09:55.542570710 {pubd_R0-0}{1}: [mdt-ctrl] [1801]: UUID: 0, ra: 0 (note): **** Event Entry: VRF resolution callback for subscription 602 VRF name ''
2023/11/24 14:09:55.547151580 {pubd_R0-0}{1}: [mdt-ctrl] [1801]: UUID: 0, ra: 0 (note): **** Event Entry: VRF resolution callback for subscription 553 VRF name ''
2023/11/24 14:09:55.547172008 {pubd_R0-0}{1}: [mdt-ctrl] [1801]: UUID: 0, ra: 0 (note): **** Event Entry: VRF resolution callback for subscription 503 VRF name ''
2023/11/24 14:09:58.378895427 {pubd_R0-0}{1}: [bso] [1801]: UUID: 0, ra: 0 (ERR): 0x7f85ddabb640: IPC receive failed due to channel disconnect: rc 104
2023/11/24 14:09:58.378897069 {pubd_R0-0}{1}: [bso] [1801]: UUID: 0, ra: 0 (ERR): 0x7f85ddabb640: IPC fd closed unexpectedly, clean up the connection
2023/11/24 14:09:58.379206633 {pubd_R0-0}{1}: [bipc] [1801]: UUID: 0, ra: 0 (note): Successfuly connected to server /tmp/rp/lipc/ios_srvs_query_socket-b0
2023/11/24 14:09:58.379225776 {pubd_R0-0}{1}: [bso] [1801]: UUID: 0, ra: 0 (ERR): 0x7f85ddabb640: Error in processing query replies from IOSd, rc 104
2023/11/24 14:09:58.379235605 {pubd_R0-0}{1}: [tps-client] [1801]: UUID: 0, ra: 0 (ERR): IPC receive failed due to channel disconnect: 32
2023/11/24 14:09:58.379236049 {pubd_R0-0}{1}: [tps-client] [1801]: UUID: 0, ra: 0 (ERR): IPC fd closed unexpectedly, clean up the connection
2023/11/24 14:09:58.379322009 {pubd_R0-0}{1}: [bipc] [1801]: UUID: 0, ra: 0 (ERR): Unable to connect to domain socket iosd_tps_socket-b0: Connection refused

It also looks like the device experienced high resource utilization long before the crash:

2023/11/24 13:39:17.520896882 {hman_R0-0}{1}: [hman] [20921]: UUID: 0, ra: 0 (ERR): Insufficient resources to read cpu memory information
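As a hedged aid to identifying the culprit, the subscription IDs that appear in the pubd tracelogs (602, 553, 503 above) can be cross-referenced with the device's configured telemetry subscriptions. Assuming these are configured (not dynamic) subscriptions, something like the following show commands can list them and their update policies; subscription 602 is used here only because it appears in the logs:

```
show telemetry ietf subscription all
show telemetry ietf subscription 602 detail
```

The detailed output should reveal the XPath filter and update policy of each subscription, which helps match a subscription ID to the queried table driving the load.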
The problematic subscription must first be identified before the workaround can be applied. Collect a core file and follow the steps under "Q: Is my customer affected?" in the FAQ enclosure.

Option 1: Remove the problematic subscription completely.

Option 2: Reduce the frequency of the problematic subscription by increasing the value of its "update-policy" field, or change it to "update-policy on-change" so that updates trigger only when there are changes to the queried table. Note that this makes the deadlock less likely but, unlike Option 1, does not eliminate it completely.
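As a sketch of the two options, assuming the problematic subscription turned out to be a configured subscription with ID 602 (one of the IDs seen in the pubd tracelogs; substitute the ID actually identified from the core analysis), the configuration change could look roughly like this. Note that the periodic interval for "update-policy periodic" is expressed in centiseconds, so 6000 corresponds to 60 seconds:

```
! Option 1: remove the problematic subscription entirely
configure terminal
 no telemetry ietf subscription 602
 end

! Option 2: reduce the cadence, or switch to on-change updates
configure terminal
 telemetry ietf subscription 602
  update-policy periodic 6000
  ! alternatively: update-policy on-change
 end
```

Option 2 only lengthens the window in which the IOSd/pubd exchange can back up; Option 1 is the only change that fully removes the trigger.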