...
Increasing file size of monitor_crash.pl.log on sysadmin VM
==================================
sysadmin-vm:0_RP0# dir harddisk:* location 0/RP1
// snip //
129036 -rw-r--r--. 1 1000895628 Oct 28 03:27 monitor_crash.pl.log  <<<<<< more than 1 GB and increasing day by day
129028 -rw-r--r--. 1          8 Oct 28 03:03 calvados-0_RP1_VM1-uptime.txt
==================================
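To spot this condition from the sysadmin VM shell (reached with "run"), one option is to list large .log files in the PAM log directory. This is a hedged sketch, not a Cisco-provided tool. The function name list_big_logs and the 100 MiB default threshold are assumptions. The default directory is taken from the recovery steps. It uses only POSIX shell plus wc.

```shell
#!/bin/sh
# Hypothetical helper (not a Cisco tool): list *.log files in a directory
# whose size exceeds a byte threshold, printing "<bytes> <path>".
# The default directory comes from the recovery steps; the 100 MiB default
# threshold is an arbitrary assumption.
list_big_logs() {
    dir=${1:-/misc/disk1/cisco_support}
    limit=${2:-104857600}
    for f in "$dir"/*.log; do
        [ -f "$f" ] || continue                 # skip if the glob matched nothing
        size=$(wc -c < "$f" | tr -d ' ')        # byte count; tr strips padding some wc versions add
        if [ "$size" -gt "$limit" ]; then
            echo "$size $f"
        fi
    done
}
```

For example, "list_big_logs /misc/disk1/cisco_support 1000000000" would have reported the 1000895628-byte monitor_crash.pl.log shown in the listing above.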
No special conditions are required; the issue is seen after bootup without any trigger.
No workaround is available, but the condition can be recovered with one of the following options. A chassis reload also recovers it. This may be excessive, but some customers may prefer it. If a chassis reload is used, it must be done with the "admin hw-mod location all reload" command. It should not be done with "reload location all" from the XR VM, because that reloads only the XR VMs and does not touch the sysadmin VM, where the problem exists.

Recovery Option #1
---------------------------
Delete the file, then restart the process on the RP where it is running. The process can run on either RP0 or RP1, so restarting it without first checking where it runs will not help. To find the RP where the process is active, check the following first:

    admin) show process pam_monitor location 0/RP0
    admin) show process pam_monitor location 0/RP1

In the following example, the process was running on RP1, while the confd_helper process was running on RP0. The sysadmin session therefore opened on the RP0 prompt by default, and it was necessary to attach to 0/RP1 to delete the file:

    attach location 0/RP1
    rm /misc/disk1/cisco_support/monitor_crash.pl.log
    process restart pam_manager location 0/RP1

Recovery Option #2
--------------------------
Log rotation

In the sysadmin VM, copy the "pam_logrotate.conf" file from "/opt/cisco/calvados/pam" to the "/etc/logrotate.d/" directory:

    sysadmin-vm:0_RP0# run
    Thu Jan 12 20:56:45.981 UTC+00:00
    [sysadmin-vm:0_RP0:~]$
    [sysadmin-vm:0_RP0:~/cisco_support]$cd /opt/cisco/calvados/pam/
    [sysadmin-vm:0_RP0:/opt/cisco/calvados/pam]$pwd
    /opt/cisco/calvados/pam
    [sysadmin-vm:0_RP0:/opt/cisco/calvados/pam]$ls
    collect_ltrace.pl            monitor_show_logging.pl  parse_metadata.pl
    edcd_cli.py                  pam-functions.sh         release_memory.sh
    get_cpu_memory_snapshots.pl  pam.pm                   start_pam.sh
    get_cpu_snapshots.pl         pam_logrotate.conf       stop_pam.sh
    get_memory_snapshots.pl      pam_logrotate.sh         vxr_calv_start_pam.sh
    get_pid_cmdlines.pl          pam_ltrace.pm            vxr_host_start_pam.sh
    monitor_cpu.pl               pam_manager.sh           vxr_mount_iso.sh
    monitor_crash.pl             pam_perf.pm
    [sysadmin-vm:0_RP0:/opt/cisco/calvados/pam]$cp pam_logrotate.conf /etc/logrotate.d/

Update the log file path in "/etc/logrotate.d/pam_logrotate.conf" (I used vi to edit it). Only the line with the log file path needs to be updated. In the output below, the modified line is "/misc/disk1/cisco_support/*.log {".

    [sysadmin-vm:0_RP0:/etc/logrotate.d]$more pam_logrotate.conf
    # xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    # log rotation conf for PAM
    #
    # Aug 2016, Jieming Wang
    #
    # Copyright (c) 2016-2017 by Cisco Systems, Inc.
    # All rights reserved.
    # xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    #
    # see "man logrotate" for details
    # It is size based. Default size is 1M. Unless it is specified in
    # each template.
    # keep last 2 logs
    rotate 2
    # Truncate the original log file in place after creating a copy
    copytruncate
    # uncomment this if you want your log files compressed
    compress
    nomail
    noolddir
    notifempty
    missingok
    /misc/disk1/cisco_support/*.log {
        size 1M
        rotate 1
    }
    [sysadmin-vm:0_RP0:/etc/logrotate.d]$

The system will now rotate the log files automatically at regular intervals. I am not sure how often log rotation is triggered. During my test, the PAM log files were rotated in less than 1 hour.

Repeat the above steps on the secondary RPs.
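To make the stanza above more concrete, the following sketch roughly emulates a single rotation pass over one file. It uses the same options: size 1M and rotate 1 from the stanza, which override the global rotate 2, plus copytruncate, compress, notifempty and missingok. It illustrates the semantics under stated assumptions and does not replace logrotate. It has no state file, no locking and no glob handling. The function name rotate_once is hypothetical. Treating 1M as 1048576 bytes and using a strict "greater than" comparison are assumptions about how logrotate checks size.

```shell
#!/bin/sh
# Rough emulation (illustration only) of one logrotate pass over a single
# file with the options from pam_logrotate.conf:
#   size 1M, rotate 1, copytruncate, compress, notifempty, missingok
# Hypothetical helper; no state file, locking, or glob handling.
rotate_once() {
    log=$1
    threshold=${2:-1048576}                   # "size 1M" assumed to mean 1024*1024 bytes
    [ -f "$log" ] || return 0                 # missingok: an absent file is not an error
    size=$(wc -c < "$log" | tr -d ' ')
    [ "$size" -gt 0 ] || return 0             # notifempty
    [ "$size" -gt "$threshold" ] || return 0  # size-based trigger (boundary is an assumption)
    rm -f "$log.1.gz"                         # rotate 1: keep only one old copy
    cp -p "$log" "$log.1"                     # copytruncate, step 1: copy the live file
    : > "$log"                                # step 2: truncate in place; writers keep their fd
    gzip -f "$log.1"                          # compress: produces $log.1.gz
}
```

The copy-then-truncate order is what lets the PAM script keep writing without a restart, because its open descriptor still refers to the same file. The logrotate documentation notes that lines written between the copy and the truncation may be lost. On the router, if the installed logrotate supports it, "logrotate -d /etc/logrotate.d/pam_logrotate.conf" performs a dry run that reports what would be rotated without changing anything.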
Recovery Option #3
-------------------------
The continuous log messages are generated by a Perl script, which can be modified to remove the line of code that generates them. In the sysadmin VM, edit (I used vi) the file "/opt/cisco/calvados/pam/monitor_crash.pl":

    sysadmin-vm:0_RP0# run
    Thu Jan 12 20:56:45.981 UTC+00:00
    [sysadmin-vm:0_RP0:~]$
    [sysadmin-vm:0_RP0:~/cisco_support]$cd /opt/cisco/calvados/pam/
    [sysadmin-vm:0_RP0:/opt/cisco/calvados/pam]$pwd
    /opt/cisco/calvados/pam
    [sysadmin-vm:0_RP0:/opt/cisco/calvados/pam]$ls
    collect_ltrace.pl            monitor_show_logging.pl  parse_metadata.pl
    edcd_cli.py                  pam-functions.sh         release_memory.sh
    get_cpu_memory_snapshots.pl  pam.pm                   start_pam.sh
    get_cpu_snapshots.pl         pam_logrotate.conf       stop_pam.sh
    get_memory_snapshots.pl      pam_logrotate.sh         vxr_calv_start_pam.sh
    get_pid_cmdlines.pl          pam_ltrace.pm            vxr_host_start_pam.sh
    monitor_cpu.pl               pam_manager.sh           vxr_mount_iso.sh
    monitor_crash.pl             pam_perf.pm

Look for the following line (line 285) and remove it:

    print "New file closes - processing ....\n";

I made a backup of the file just in case:

    [sysadmin-vm:0_RP0:/opt/cisco/calvados/pam]$diff monitor_crash.pl monitor_crash.pl.bak
    284a285
    > print "New file closes - processing ....\n";
    [sysadmin-vm:0_RP0:/opt/cisco/calvados/pam]$

If needed, remove or archive the bloated log file "/misc/disk1/cisco_support/monitor_crash.pl.log".

Restart the pam_manager process on sysadmin:

    sysadmin-vm:0_RP0# process restart pam_manager location 0/RP0

After a few minutes, all PAM services will have restarted. Check "/misc/disk1/cisco_support/monitor_crash.pl.log" to confirm that the continuous logs are no longer seen.

Repeat the above steps on all secondary RPs. The log directory still needs to be monitored over time, because log rotation is still not working with this option alone.
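The manual vi edit above can also be scripted. This sketch is hypothetical: drop_noisy_print is not an existing tool. It matches the print statement by its text rather than by line 285, on the assumption that line numbers may differ between releases. It refuses to edit unless exactly one line matches, and it keeps a .bak copy as the original procedure did.

```shell
#!/bin/sh
# Hypothetical helper: back up a Perl script, then remove the single line
# that prints "New file closes - processing ....". Matching is by fixed
# string (grep -F), not by line number.
drop_noisy_print() {
    script=$1
    needle='print "New file closes - processing'
    [ -f "$script" ] || { echo "no such file: $script" >&2; return 1; }
    count=$(grep -c -F "$needle" "$script" || true)   # grep -c prints 0 (and exits 1) on no match
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one matching line, found $count; not editing" >&2
        return 1
    fi
    cp -p "$script" "$script.bak"                    # backup, as in the manual procedure
    grep -v -F "$needle" "$script.bak" > "$script"   # rewrite in place; file mode is kept
}
```

After running "drop_noisy_print /opt/cisco/calvados/pam/monitor_crash.pl", a diff against monitor_crash.pl.bak should show only the removed print line. The pam_manager restart and the rest of the steps stay the same.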
This issue is seen in the sysadmin VM on eXR platforms (NCS560, NCS5500, ASR9K). It does not apply to thinXR platforms (spitfire), which do not have a sysadmin VM.

There are two parts to this issue:
1. Continuous logs are written to the monitor_crash.pl.log file in sysadmin (seen on NCS560).
2. PAM log files in sysadmin are not being rotated.

Issue #1: The continuous logs are generated when a non-PAM process (component: ncs560-sys-ctrldrv) performs continuous file operations on the hard disk (file: /misc/disk1/toggle_pci_link.log). This is seen on NCS560 but not on NCS5500 or ASR9K. It is not yet known whether other eXR platforms will see this issue, or whether the continuous file operation is a day-1 behavior or something new.

Issue #2 will be seen on all eXR platforms: the PAM log files in the sysadmin VM slowly increase in size over time.

On routers with both issue #1 and issue #2 (NCS560), this is not a minor issue. The log file can grow very large over time and fill up the hard disk, which could cause problems that impact traffic. On routers with issue #2 only (NCS5500, ASR9K), it is a minor issue, because the log file is unlikely to grow too large.

Please read the eng-notes attachment for further details.
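One rough way to check whether a router shows issue #1 is to sample the two files named above from the sysadmin VM shell and see whether they change over a short interval. This is an assumption, not a procedure from the original notes. The sketch assumes a stat utility that supports -c with %s (size) and %Y (modification time in epoch seconds), as GNU coreutils and BusyBox do. The function name changes_within is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether a file's size or mtime changes over a
# sampling interval. Returns 0 if changed, 1 if unchanged, 2 if stat fails.
changes_within() {
    file=$1
    interval=${2:-10}                                # seconds between the two samples
    before=$(stat -c '%s %Y' "$file") || return 2    # "<size> <mtime>"
    sleep "$interval"
    after=$(stat -c '%s %Y' "$file") || return 2
    if [ "$before" != "$after" ]; then
        echo "changed: $file ($before -> $after)"
        return 0
    fi
    echo "unchanged: $file"
    return 1
}
```

For example, you might run "changes_within /misc/disk1/toggle_pci_link.log 30" and "changes_within /misc/disk1/cisco_support/monitor_crash.pl.log 30". If both files change on repeated samples, that would be consistent with issue #1. Slow growth of the PAM logs alone matches issue #2.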