...
- ifmgr process crash following an RP/RSP failover when the active RP/RSP fails unexpectedly:
  RP/0/RSP1/CPU0:Feb 20 11:02:57.294 UTC: processmgr[51]: ifmgr(1) (jid 454) (pid 7180) (fail_count 1) abnormally terminated, restart scheduled
  RP/0/RSP1/CPU0:Feb 20 11:02:26.775 UTC: processmgr[51]: %OS-SYSMGR-5-NOTICE : This standby node is going active at Mon Feb 20 11:02:26 2023
  RP/0/RSP1/CPU0:Feb 20 11:02:26.775 UTC: processmgr[51]: %OS-SYSMGR-3-ERROR : sysmgr_check_rmf_ctx: rmf_set_not_ready failed: Invalid argument
  RP/0/RSP1/CPU0:Feb 20 11:02:26.810 UTC: processmgr[51]: %OS-SYSMGR-5-NOTICE : This node is active now at Mon Feb 20 11:02:26 2023
  RP/0/RSP1/CPU0:Feb 20 11:02:26.810 UTC: processmgr[51]: %OS-SYSMGR-5-NOTICE : Critical failover elapsed time 0.035 seconds (0.000% idle)
  RP/0/RSP1/CPU0:Feb 20 11:02:31.793 UTC: processmgr[51]: %OS-SYSMGR-7-DEBUG : kim(1) (jid 141) did not signal end of initialization
  RP/0/RSP1/CPU0:Feb 20 11:02:31.805 UTC: processmgr[51]: %OS-SYSMGR-7-DEBUG : ztp_edm(1) (jid 422) did not signal end of initialization
  RP/0/RSP1/CPU0:Feb 20 11:02:55.901 UTC: dumper[67185]: %OS-SYSLOG-6-LOG_INFO : Dumping core /misc/scratch/core/ifmgr_7180.by.11.20230220-110255.xr-vm_node0_RSP1_CPU0.38d9f.core.gz
  RP/0/RSP1/CPU0:Feb 20 11:02:57.294 UTC: processmgr[51]: %OS-SYSMGR-3-ERROR : ifmgr(1) (jid 454) exited, will be respawned with a delay (slow-restart)
  RP/0/RSP1/CPU0:Feb 20 11:02:57.294 UTC: processmgr[51]: %OS-SYSMGR-3-ERROR : ifmgr(454) (fail count 1) will be respawned in 10 seconds
  RP/0/RSP1/CPU0:Feb 20 11:02:58.991 UTC: icpe_satfabmgr[242]: %SYSDB-LIBSYSDB-7-APPLY : 4 out of 4 apply callbacks failed (first error: 'ifmgr' detected the 'warning' condition 'The client passed an invalid connection, probably because its connection to Interface Manager is not open.'), system may now be in an inconsistent state.
  RP/0/RSP1/CPU0:Feb 20 11:03:09.782 UTC: dumper[172]: %OS-COREHELPER-6-CORE_COPIED : Copied core ifmgr_7180.by.11.20230220-110255.xr-vm_node0_RSP1_CPU0.38d9f.core.gz to 0/RSP1/CPU0:harddisk:
  RP/0/RSP1/CPU0:Feb 20 11:03:09.986 UTC: dumper[172]: %OS-COREHELPER-6-DELETE_CORE : Deleted core file ifmgr_7180.by.11.20230220-110255.xr-vm_node0_RSP1_CPU0.38d9f.core.gz.
  RP/0/RSP1/CPU0:Feb 20 11:03:35.423 UTC: logger[67805]: %OS-SYSLOG-4-LOG_WARNING : PAM detected crash for ifmgr on 0_RSP1_CPU0. All necessary files for debug have been collected and saved at 0/RSP1/CPU0 : harddisk:/cisco_support/PAM-crash-xr_0_RSP1_CPU0-ifmgr-2023Feb20-110334.tgz (Please copy tgz file out of the router and send to Cisco support. This tgz file will be removed after 14 days.)
  RP/0/RSP1/CPU0:Feb 20 11:03:37.298 UTC: processmgr[51]: %OS-SYSMGR-7-DEBUG : ifmgr(1) (jid 454) did not signal end of initialization
- This may result in interfaces going down.
- RP/RSP failover caused by a failure of the active RP/RSP (where communication with the previously active RP is broken).
- No workaround.
- The ifmgr process restarts and recovers on its own.
- However, this may cause satellite connections to drop and not recover. To restore them, restart the icpe_satfabmgr process: "process restart icpe_satfabmgr".
  Logging observed during the troubled state:
  RP/0/RSP0/CPU0:Feb 20 11:33:59.708 UTC: icpe_satfabmgr[130]: %LIBRARY-REPLICATOR-3-IDT_FAIL : Failed to complete IDT after several retries: rc 0x0 (Success)
  RP/0/RSP1/CPU0:Feb 20 11:34:00.493 UTC: icpe_satfabmgr[242]: %PKT_INFRA-UTIL_RETRY_HANDLER-6-EXTENDED_RETRY_PERIOD : Retries have been occurring for an extended period of time (1792 seconds) for Communications resync
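
To check a saved syslog capture for the signatures described above, a minimal Python sketch along the following lines could be used. It is an illustration only, not a Cisco tool: the names TROUBLE_PATTERNS, scan_syslog and satellite_restart_suggested are hypothetical, and the regular expressions are assumptions derived solely from the log excerpts in this note, so they may need adjusting for other releases.

```python
import re

# Signatures copied from the log excerpts in this note (assumed stable;
# adjust if the message text differs on your release).
TROUBLE_PATTERNS = {
    # processmgr reporting that ifmgr terminated abnormally
    "ifmgr_crash": re.compile(
        r"processmgr\[\d+\]: ifmgr\(\d+\) .*abnormally terminated"
    ),
    # icpe_satfabmgr failing to complete IDT
    "idt_fail": re.compile(
        r"icpe_satfabmgr\[\d+\]: %LIBRARY-REPLICATOR-3-IDT_FAIL"
    ),
    # icpe_satfabmgr retrying for an extended period
    "extended_retry": re.compile(
        r"icpe_satfabmgr\[\d+\]: "
        r"%PKT_INFRA-UTIL_RETRY_HANDLER-6-EXTENDED_RETRY_PERIOD"
    ),
}


def scan_syslog(lines):
    """Return a dict mapping each signature name to the matching log lines."""
    hits = {name: [] for name in TROUBLE_PATTERNS}
    for line in lines:
        for name, pattern in TROUBLE_PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.strip())
    return hits


def satellite_restart_suggested(hits):
    """True when an icpe_satfabmgr trouble-state message was seen."""
    return bool(hits["idt_fail"] or hits["extended_retry"])
```

For example, with the output of "show logging" saved to a local text file (the file name is up to you), pass the opened file to scan_syslog; if satellite_restart_suggested returns True for the result, the icpe_satfabmgr restart described above is the step to consider.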