...
Simple Network Management Protocol (SNMP) queries for the boot-up date or time, the firmware install date or time, and the boot PROM installation date or time can stop responding due to contention with other snmpd threads. FOS continues to retry these queries until the deadlock condition causes SCN-1001 queue overflow alerts to be logged, which leads to an snmpd termination.

SNMP terminates during swBootPromLastUpdated, swFlashLastUpdated, or swBootProminstallDate queries due to a stuck RPM call, as seen in the errdump and ps exfcl output below.

/fabos/cliexec/errdump -a:
2023/12/01-21:16:29, [SCN-1001], 86271, SLOT 1 | FFDC | CHASSIS, CRITICAL, Dell_Brcd_X6-4, SCN queue overflow for process snmpd.
2023/12/01-21:16:29, [RAS-1001], 86272, SLOT 1 | CHASSIS, INFO, Dell_Brcd_X6-4, First failure data capture (FFDC) event occurred.
2023/12/01-21:16:29, [SCN-1001], 86273, SLOT 1 | FFDC | CHASSIS, CRITICAL, Dell_Brcd_X6-4, SCN queue overflow for process snmpd.
2023/12/01-21:16:37, [LOG-1000], 86280, SLOT 1 | CHASSIS, INFO, Dell_Brcd_X6-4, Previous message repeated 7 time(s).
2023/12/01-21:16:37, [SCN-1001], 86281, SLOT 1 | FFDC | CHASSIS, CRITICAL, Dell_Brcd_X6-4, SCN queue overflow for process snmpd.
2023/12/01-21:16:38, [LOG-1000], 86282, SLOT 1 | CHASSIS, INFO, Dell_Brcd_X6-4, Previous message repeated 1 time(s).
2023/12/01-21:16:38, [KSWD-1002], 6908, FFDC | CHASSIS, WARNING, Dell_Brcd_X6-4, Detected termination of process snmpd:2648.

NOTE: If this occurs on a Director, the control processors (CPs) may lose sync.

/fabos/cliexec/hadump:
---------------------------------------
TIME_STAMP: Dec 1 22:43:03.131548
---------------------------------------
Local CP (Slot 2, CP1): Active, Warm Recovered
Remote CP (Slot 1, CP0): Standby, Healthy
HA enabled, Heartbeat Up, HA State not in sync

The output of the ps exfcl command below indicates a stuck RPM thread.

/bin/ps exfcl:
0 0 29270 2413 20 0 0 0 exit Z ? 0:00 \_ snmpd
0 0 23760 1 20 0 5144 3304 - R ? 5531:10 rpm. <<<< stuck RPM thread called by snmpd
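The errdump signature above can be checked for mechanically. The following is a minimal Python sketch, not a Brocade or Dell tool: the field layout (timestamp, message ID, sequence number, flags, severity, switch name, message) is inferred only from the sample lines in this article, and the function names `parse_errdump` and `snmpd_symptoms` are hypothetical.

```python
import re
from collections import Counter

# Field layout inferred from the sample lines in this article only (assumption):
# timestamp, [MSG-ID], sequence, flags, severity, switch name, message
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}), "
    r"\[(?P<msg_id>[A-Z0-9]+-\d+)\], "
    r"(?P<seq>\d+), "
    r"(?P<flags>[^,]+), "
    r"(?P<severity>[A-Z]+), "
    r"(?P<switch>[^,]+), "
    r"(?P<message>.*)$"
)

def parse_errdump(text):
    """Return one dict per line that matches the assumed errdump layout."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            events.append(match.groupdict())
    return events

def snmpd_symptoms(events):
    """Count the two snmpd-related messages described in this article."""
    counts = Counter()
    for event in events:
        if "process snmpd" not in event["message"]:
            continue
        if event["msg_id"] == "SCN-1001":
            counts["queue_overflow"] += 1
        elif event["msg_id"] == "KSWD-1002":
            counts["termination"] += 1
    return counts

# Sample lines copied from the errdump output in this article.
SAMPLE = """\
2023/12/01-21:16:29, [SCN-1001], 86271, SLOT 1 | FFDC | CHASSIS, CRITICAL, Dell_Brcd_X6-4, SCN queue overflow for process snmpd.
2023/12/01-21:16:29, [RAS-1001], 86272, SLOT 1 | CHASSIS, INFO, Dell_Brcd_X6-4, First failure data capture (FFDC) event occurred.
2023/12/01-21:16:29, [SCN-1001], 86273, SLOT 1 | FFDC | CHASSIS, CRITICAL, Dell_Brcd_X6-4, SCN queue overflow for process snmpd.
2023/12/01-21:16:37, [LOG-1000], 86280, SLOT 1 | CHASSIS, INFO, Dell_Brcd_X6-4, Previous message repeated 7 time(s).
2023/12/01-21:16:37, [SCN-1001], 86281, SLOT 1 | FFDC | CHASSIS, CRITICAL, Dell_Brcd_X6-4, SCN queue overflow for process snmpd.
2023/12/01-21:16:38, [LOG-1000], 86282, SLOT 1 | CHASSIS, INFO, Dell_Brcd_X6-4, Previous message repeated 1 time(s).
2023/12/01-21:16:38, [KSWD-1002], 6908, FFDC | CHASSIS, WARNING, Dell_Brcd_X6-4, Detected termination of process snmpd:2648.
"""

if __name__ == "__main__":
    print(dict(snmpd_symptoms(parse_errdump(SAMPLE))))
```

Seeing SCN-1001 overflow messages for snmpd followed by a KSWD-1002 termination of snmpd matches the pattern described here, but it does not by itself confirm this defect; the stuck rpm process in the ps output remains the distinguishing evidence.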
This issue is identified as FOS defect FOS-851141 under FOS release v9.1.1c. The swBootDate value is retrieved with an Application Programming Interface (API) that uses file operations. Similarly, the swFlashLastUpdated and swBootProminstallDate values are retrieved with another API that uses RPM queries. These I/O operations typically take time. When an SNMP GET request for these parameters is processed while the SNMP agent is handling many requests simultaneously, the query is retried. These retries add overhead for the agent, which creates a queue overflow condition that leads to an snmpd termination.
Workaround: Avoid issuing SNMP queries.

Resolution: The SNMP code in v9.1.1c was optimized to cache data such as the boot-up date or time and the firmware install date or time. An additional enhancement checked into v9.1.1d also caches the boot PROM installation date or time during SNMP activation. The cached data are used for these queries, which prevents contention between threads within snmpd.

Brocade DEFECT FOS-851141
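The caching approach described in the resolution can be illustrated with a small Python sketch. This is not the FOS implementation: the class `StartupCache`, the helper `make_counting_loader`, and the placeholder values are all hypothetical, and the loaders merely stand in for the slow file and RPM lookups. The idea shown is that each slow lookup runs once at activation, and every later query from any thread reads the stored value.

```python
import threading

class StartupCache:
    """Run each slow lookup once at construction, then serve reads from memory."""

    def __init__(self, loaders):
        # loaders maps a name to a zero-argument callable (hypothetical
        # stand-ins for the file and RPM lookups described in the article).
        self._values = {name: load() for name, load in loaders.items()}

    def get(self, name):
        # Values never change after construction, so concurrent reads
        # need no lock and never trigger another slow lookup.
        return self._values[name]

def make_counting_loader(value, counter, key):
    """Build a fake loader that records how many times it was invoked."""
    def load():
        counter[key] = counter.get(key, 0) + 1
        return value
    return load

def run_demo(num_threads=8, queries_per_thread=100):
    counter = {}
    names = ("swBootDate", "swFlashLastUpdated", "swBootProminstallDate")
    cache = StartupCache({
        name: make_counting_loader(f"<{name} placeholder>", counter, name)
        for name in names
    })

    results = []
    results_lock = threading.Lock()

    def worker():
        # Each simulated query thread reads every cached value repeatedly.
        local = [cache.get(n) for _ in range(queries_per_thread) for n in names]
        with results_lock:
            results.extend(local)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter, results

if __name__ == "__main__":
    lookups, answers = run_demo()
    print(lookups, len(answers))
```

In this sketch the number of slow lookups stays at one per value regardless of how many threads query, which is the property that removes the retry pressure on the agent's queue.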