...
The following CallHome alert is occasionally reported in syslog:
    %CALLHOME-2-EVENT: SW_SYSTEM_INCONSISTENT
Around the same time, the following messages may be logged as well:
    %UFDM-3-FIB_IPv4_ADJ_CONSISTENCY_CHECKER_PASS: FIB IPv4 adjacency consistency checker PASSED on slot 1
    %UFDM-3-FIB_IPv4_ROUTE_CONSISTENCY_CHECKER_PASS: FIB IPv4 route consistency checker PASSED on slot 1
At this point, no related business impact has been observed.
* CallHome enabled/used
* NX-API configuration can cause this issue more often
* none at this point
* Checking the timestamp of the log:
    2021 Apr 5 16:13:51 NIX4-C3 %CALLHOME-2-EVENT: SW_SYSTEM_INCONSISTENT
* At the same time, callhome received an alert from another process:
    show system internal callhome event-history | no-more
    2021 Apr 05 16:13:51.803759: E_MTS_RX [REQ] Opc:MTS_OPC_CALLHOME_ALERT(2151), Id:0X3659DD1D, Ret:SUCCESS << causes syslog generation, message received from SAP 1550
    Src:0x00000101/1550, Dst:0x00000101/66, Flags:None
    HA_SEQNO:0X00000000, RRtoken:0x3659DD1D, Sync:UNKNOWN, Payloadsize:183
    Payload:
    0x0000: 53 57 5f 53 59 53 54 45 4d 5f 49 4e 43 4f 4e 53
* Checking which process triggered the alert:
    show system internal mts sup sap 1550 description << SAP 1550 is MTS_MGR
    mts_mgr
* Checking why the process triggered the alert:
    show system internal mts-mgr event-history errors | no-more
    ...
    2021 Apr 05 16:13:51.803721: E_DEBUG callhome_from_here: send callhome message sap 284 (TCPUDP process client MTS queue) has unprocessed msg over 1440 minute << MTS_MGR reports an MTS message stuck for over 1440 min (24h)
* Checking for stuck MTS messages:
    show system internal mts buffers detail
    ...
    Node/Sap/queue Age(ms) SrcNode SrcSAP DstNode DstSAP OPC MsgId MsgSize RRToken Offset
    sup/284/pers 147814819 0x101 5275 0x101 284 86017 0x2cfd3848 4596 0x2cfd3848 0xfaa8004 << only a single MTS message has been in the buffer for more than 24h (a single one for DST SAP 284 is expected)
* Checking which process the source SAP belongs to:
    show system internal mts sup sap 5275 description
    nginx_1_fe

In this case the CallHome alert appears to be a false positive, as there are no unexpected messages stuck in the buffers. If more MTS messages than the single default one are stuck in the buffers for more than 24h, the CallHome alert is most likely legitimate (this defect does not apply) and further investigation is required.

Additional information: It seems that certain NX-API calls may cause the AGE timer to reset under MTS buffers.
In that case, each time the message age exceeds 24h (converted to msec), a similar CallHome alert may be triggered.
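
The buffer check described above can be sketched offline against saved output of `show system internal mts buffers detail`. This is only an illustrative sketch, not a Cisco tool: the function and class names are hypothetical, the parser assumes the 11-column layout shown in this note, and the 24h threshold is simply the 1440 minutes from the mts-mgr log converted to milliseconds.

```python
from dataclasses import dataclass

# 1440 minutes (24h) expressed in milliseconds, the unit of the Age(ms) column.
DAY_MS = 24 * 60 * 60 * 1000


@dataclass
class MtsBuffer:
    queue: str    # Node/Sap/queue column, e.g. "sup/284/pers"
    age_ms: int   # Age(ms) column
    src_sap: int  # SrcSAP column
    dst_sap: int  # DstSAP column
    opc: int      # OPC column


def parse_mts_buffers(text):
    """Parse data rows, assuming the 11-column layout quoted in this note."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip anything that is not an 11-column row keyed by node/sap/queue.
        if len(fields) != 11 or fields[0].count("/") != 2:
            continue
        # The header row has "Age(ms)" here, so a numeric check filters it out.
        if not fields[1].isdigit():
            continue
        rows.append(MtsBuffer(fields[0], int(fields[1]), int(fields[3]),
                              int(fields[5]), int(fields[6])))
    return rows


def stale_buffers(rows, threshold_ms=DAY_MS):
    """Return the buffered messages older than the threshold."""
    return [r for r in rows if r.age_ms > threshold_ms]


# Sample taken from the output quoted in this note.
SAMPLE = """\
Node/Sap/queue Age(ms) SrcNode SrcSAP DstNode DstSAP OPC MsgId MsgSize RRToken Offset
sup/284/pers 147814819 0x101 5275 0x101 284 86017 0x2cfd3848 4596 0x2cfd3848 0xfaa8004
"""

if __name__ == "__main__":
    for msg in stale_buffers(parse_mts_buffers(SAMPLE)):
        hours = msg.age_ms / 3_600_000
        print(f"{msg.queue}: {hours:.1f}h old, SrcSAP {msg.src_sap} -> DstSAP {msg.dst_sap}")
```

With the sample row, the single message for DST SAP 284 is reported as roughly 41 hours old, which matches the false-positive pattern described above. More than one stale entry would point to a legitimate alert instead.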