General
The PAM workflow design needs a refresh.
Symptom
pam_manager repeatedly spawns additional instances of the following processes every 2 minutes (this can happen on XR or Sysadmin):
- pam_manager.sh
- start_pam.sh
- monitor_cpu.pl
As a result, the total number of processes keeps increasing, and once the system limit is reached, the OS/Host triggers a kernel crash to recover.
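The growth described above can be spotted by counting how many instances of each listed script are running. The following is a minimal Python sketch, not part of PAM: the helper names `count_instances` and `snapshot` are hypothetical, and it assumes a procps-style `ps` and a Python interpreter are available in the shell being checked.

```python
import subprocess

# Script names taken from the symptom list above.
WATCHED = ("pam_manager.sh", "start_pam.sh", "monitor_cpu.pl")


def count_instances(ps_args_output, names=WATCHED):
    """Count the command lines that mention each watched script name."""
    counts = {name: 0 for name in names}
    for line in ps_args_output.splitlines():
        for name in names:
            if name in line:
                counts[name] += 1
    return counts


def snapshot():
    """Return current instance counts from the local process table."""
    # Assumption: `ps -eo args` (procps) prints one full command line per process.
    out = subprocess.run(
        ["ps", "-eo", "args"], capture_output=True, text=True, check=True
    ).stdout
    return count_instances(out)


if __name__ == "__main__":
    try:
        print(snapshot())
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"could not list processes: {exc}")
```

Running it twice a few minutes apart gives two snapshots to compare. Counts that rise between runs are consistent with the symptom, though they do not by themselves confirm this bug.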
Conditions
If rpm_db becomes corrupted, the rpm db file can get locked, after which no further rpm queries are allowed.
rpm_db corruption is very rare. One example is a yum operation that was left incomplete or was terminated, corrupting the rpm db.
If pam_manager or any of its child processes (monitor_cpu.pl, monitor_show_logging.pl, monitor_crash.pl or pam_cli_agent) is killed or restarted while rpm queries are locked, this bug may be hit.
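One way to check whether rpm queries are still being served is to run a query with a time limit. This is a rough sketch under stated assumptions: it assumes a locked rpm db makes `rpm -qa` either hang or exit with an error (the exact failure mode is not stated here), and the function name `query_responds` and the 30-second default are illustrative choices.

```python
import subprocess


def query_responds(cmd=("rpm", "-qa"), timeout_s=30.0):
    """Return True if the command exits with status 0 within timeout_s seconds."""
    try:
        result = subprocess.run(
            list(cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The query did not finish in time, which would fit a locked db.
        return False
    except OSError:
        # The command could not be started at all (for example, not installed).
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("rpm queries responding:", query_responds())
```

A `False` result only shows that the query did not complete cleanly. It does not prove that the rpm db is corrupted.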
Workaround
There is no workaround.
Further Problem Description
PAM (IOS XR Platform Automated Monitoring) will be improved to identify these types of process issues.
Impacts:
6.6.3, 6.6.4, 7.0.14, 6.7.1, 7.0.2, 7.1.1, 7.1.15, 7.1.2, 7.2.12
Does not impact:
7.3.1, 7.2.2, 7.1.3, 6.7.3