
OPERATIONAL DEFECT DATABASE
...

...
The PFxM UI shows intermittent failed health checks and/or 'unreachable' against random components. The failures can come and go on their own.

From the PFxM CLI, the Nagios service is shown in a failed state:

    [delladmin@pfxm_3 init.d]$ systemctl status nagios
    ● nagios.service - Nagios Core 4.4.9
       Loaded: loaded (/usr/lib/systemd/system/nagios.service; enabled; vendor preset: disabled)
       Active: failed (Result: signal) since Thu 2024-03-07 01:48:07 EST; 1 months 15 days ago
         Docs: https://www.nagios.org/documentation
      Process: 27158 ExecStopPost=/usr/bin/rm -f /var/spool/nagios/cmd/nagios.cmd (code=exited, status=0/SUCCESS)
      Process: 27140 ExecReload=/usr/bin/kill -s HUP ${MAINPID} (code=exited, status=0/SUCCESS)
      Process: 27138 ExecReload=/usr/sbin/nagios -v /etc/nagios/nagios.cfg (code=exited, status=0/SUCCESS)
     Main PID: 2109 (code=killed, signal=ABRT)

    Apr 22 12:06:05 pfxm_3.8.6 systemd[1]: Unit nagios.service cannot be reloaded because it is inactive.
    Apr 22 12:08:05 pfxm_3.8.6 systemd[1]: Unit nagios.service cannot be reloaded because it is inactive.
    Apr 22 12:10:05 pfxm_3.8.6 systemd[1]: Unit nagios.service cannot be reloaded because it is inactive.
    Apr 22 12:12:04 pfxm_3.8.6 systemd[1]: Unit nagios.service cannot be reloaded because it is inactive.
    Apr 22 12:14:04 pfxm_3.8.6 systemd[1]: Unit nagios.service cannot be reloaded because it is inactive.

/var/log/messages and/or journalctl -u nagios show the Nagios service being shut down by SIGTERM:

    Apr 22 12:06:05 pfxm_3.8.6 systemd[1]: nagios.service: main process exited, code=killed, status=6/ABRT
    Apr 22 12:08:05 pfxm_3.8.6 nagios[15242]: Caught SIGTERM, shutting down...
    Apr 22 12:10:05 pfxm_3.8.6 systemd[1]: Unit nagios.service entered failed state.
    Apr 22 12:12:04 pfxm_3.8.6 systemd[1]: nagios.service failed.
    Apr 22 12:14:04 pfxm_3.8.6 systemd[1]: Unit nagios.service cannot be reloaded because it is inactive.
    Apr 22 12:16:04 pfxm_3.8.6 systemd[1]: Unit nagios.service cannot be reloaded because it is inactive.
    Apr 22 12:18:04 pfxm_3.8.6 systemd[1]: Unit nagios.service cannot be reloaded because it is inactive.

The service stays in the failed state until it is manually started with systemctl start nagios (or systemctl restart nagios), and it may crash again after some time.

Impact
Health check monitoring by PFxM (Nagios) will be impacted.
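As an illustration of how the log signatures above could be picked out of a longer journal, here is a minimal Ruby sketch. The constant FAILURE_PATTERNS and the method nagios_failure_lines are hypothetical names introduced for this example, not part of PFxM or Nagios. The patterns are copied from the excerpts in this article and may not cover every way the failure appears.

```ruby
# Hypothetical helper, not part of PFxM: filters journal lines down to the
# failure signatures quoted in this article.
FAILURE_PATTERNS = [
  /nagios\.service: main process exited, code=killed/,
  /Unit nagios\.service entered failed state/,
  /Unit nagios\.service cannot be reloaded because it is inactive/
].freeze

# Returns only the lines that match one of the known failure signatures.
def nagios_failure_lines(lines)
  lines.select { |line| FAILURE_PATTERNS.any? { |re| line.match?(re) } }
end

# Sample input taken from the journal excerpt in this article.
sample = [
  "Apr 22 12:06:05 pfxm_3.8.6 systemd[1]: nagios.service: main process exited, code=killed, status=6/ABRT",
  "Apr 22 12:08:05 pfxm_3.8.6 nagios[15242]: Caught SIGTERM, shutting down...",
  "Apr 22 12:10:05 pfxm_3.8.6 systemd[1]: Unit nagios.service entered failed state."
]

# Prints the two systemd lines; the "Caught SIGTERM" line is skipped because
# Nagios also logs it on an ordinary stop.
puts nagios_failure_lines(sample)
```

In practice the input could be the saved output of journalctl -u nagios, read in as an array of lines.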
The cause of the SIGTERM crash is not currently known, but a bug in Nagios or in an associated library is suspected. PFxM 3.8.6 runs Nagios version 4.4.9-1.
An immediate workaround is to modify the file /opt/asm-deployer/nagios/nagios-export.rb to include a line that automatically starts the Nagios service if it is not running. Back up the nagios-export.rb file before making any changes.

    unless new_config == old_config
      STDERR.puts("\nDetected changes in the configuration. The configuration file will be re-written and nagios will reload.")
      File.open("/etc/nagios/objects/asm.cfg", "w") do |f|
        f.puts new_config
      end
      unless system("/sbin/service nagios reload 2>&1 >/dev/null")
        system("/sbin/service nagios start 2>&1 >/dev/null")  # <----- add this line
      end
    end

After making the above change, start the Nagios service:

    systemctl start nagios (or systemctl restart nagios)

A longer-term fix is to move to a later PFxM version that includes a newer Nagios version.

Impacted Versions
PFxM 3.8.6

Fixed In Version
PFxM 3.8.8
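The reload-then-start fallback that the workaround adds to nagios-export.rb can be read in isolation as the sketch below. This is an illustration under stated assumptions, not the vendor's code: the method name reload_or_start_nagios, its runner parameter, and the returned symbols are hypothetical and were introduced so the logic can be exercised without touching a real service. The two command strings are the ones shown in the workaround.

```ruby
# Hypothetical helper (not from nagios-export.rb): try a reload first and
# fall back to a start when the reload fails, e.g. because the unit is inactive.
# `runner` is any callable that takes a command string and returns a truthy
# value on success; by default it delegates to Kernel#system.
def reload_or_start_nagios(runner = ->(cmd) { system(cmd) })
  return :reloaded if runner.call("/sbin/service nagios reload 2>&1 >/dev/null")
  return :started  if runner.call("/sbin/service nagios start 2>&1 >/dev/null")
  :failed
end

# Dry run with a stand-in runner that simulates an inactive service:
# the reload fails, the start succeeds.
attempted = []
simulated = lambda do |cmd|
  attempted << cmd
  cmd.include?("nagios start")
end

result = reload_or_start_nagios(simulated)
puts result            # prints "started"
puts attempted.size    # prints "2"
```

Passing the command runner in as a parameter is a design choice made only for this sketch; it lets the fallback order be checked on a machine where Nagios is not installed.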