...
Uncontrollable Splitter: KDriver is stuck during startup and is therefore killed after 150 seconds by its watchdog mechanism.

Symptoms found in the logs:

ESX logs: /scratch/log/kdriver.log.startup

    Wed Feb 20 18:20:55 UTC 2019) launch_kdriver_watchdog: KDriver stopped / died / killed
    Wed Feb 20 18:20:55 UTC 2019) launch_kdriver_watchdog: Sleeping for 10 seconds
    Wed Feb 20 18:21:05 UTC 2019) launch_kdriver_watchdog: Next action = open
    Wed Feb 20 18:21:05 UTC 2019) launch_kdriver_watchdog: Launching kdriver_heartbeats.sh
    Wed Feb 20 18:21:05 UTC 2019) kdriver_heartbeats: deleting old /scratch/log/kdriver-heartbeats
    Wed Feb 20 18:21:05 UTC 2019) launch_kdriver_watchdog: kdriver_options_manager.py already running
    Wed Feb 20 18:21:05 UTC 2019) launch_kdriver_watchdog: Launching run_kdriver.sh
    Wed Feb 20 18:21:05 UTC 2019) run_kdriver: run_kdriver.sh - starting
    Wed Feb 20 18:21:05 UTC 2019) run_kdriver: Launching KDriver
    Wed Feb 20 18:21:05 UTC 2019) run_kdriver: running kdriver_generate_spl_uuid.sh --startup-and-run-periodic /etc/config/emc/rp/kdriver/kdriver_generated_uuid RP_SPLITTER_GENERATED_UUID /scratch/log/kdriver.log.startup
    Wed Feb 20 18:21:05 UTC 2019) /opt/emc/rp/kdriver/bin/kdriver_generate_spl_uuid.sh: === Entered with args: --startup-and-run-periodic /etc/config/emc/rp/kdriver/kdriver_generated_uuid RP_SPLITTER_GENERATED_UUID /scratch/log/kdriver.log.startup ===
    Wed Feb 20 18:21:06 UTC 2019) /opt/emc/rp/kdriver/bin/kdriver_generate_spl_uuid.sh: Finished refreshing the advanced options
    Wed Feb 20 18:21:06 UTC 2019) /opt/emc/rp/kdriver/bin/kdriver_generate_spl_uuid.sh: Checks if /UserVars/RP_SPLITTER_GENERATED_UUID is valid
    Wed Feb 20 18:21:06 UTC 2019) /opt/emc/rp/kdriver/bin/kdriver_generate_spl_uuid.sh: Property RP_SPLITTER_GENERATED_UUID contains a valid UUID = 8CA32DD0-55DF-1458-4F1A-64BC8F9188F6
    Wed Feb 20 18:21:06 UTC 2019) /opt/emc/rp/kdriver/bin/kdriver_generate_spl_uuid.sh: Property RP_SPLITTER_GENERATED_UUID is 8CA32DD0-55DF-1458-4F1A-64BC8F9188F6 . Writing it to file /etc/config/emc/rp/kdriver/kdriver_generated_uuid
    Wed Feb 20 18:21:06 UTC 2019) /opt/emc/rp/kdriver/bin/kdriver_generate_spl_uuid.sh: Recovering file /etc/config/emc/rp/kdriver/kdriver_generated_uuid to UUID = 8CA32DD0-55DF-1458-4F1A-64BC8F9188F6
    Wed Feb 20 18:21:06 UTC 2019) /opt/emc/rp/kdriver/bin/kdriver_generate_spl_uuid.sh: Checks if file /etc/config/emc/rp/kdriver/kdriver_generated_uuid is valid
    Wed Feb 20 18:21:06 UTC 2019) /opt/emc/rp/kdriver/bin/kdriver_generate_spl_uuid.sh: File /etc/config/emc/rp/kdriver/kdriver_generated_uuid contains a valid UUID = 8CA32DD0-55DF-1458-4F1A-64BC8F9188F6
    Wed Feb 20 18:21:06 UTC 2019) /opt/emc/rp/kdriver/bin/kdriver_generate_spl_uuid.sh: Activating periodic scan each 30 seconds
    Wed Feb 20 18:21:06 UTC 2019) /opt/emc/rp/kdriver/bin/kdriver_generate_spl_uuid.sh: === Entered with args: --periodic-scan /etc/config/emc/rp/kdriver/kdriver_generated_uuid RP_SPLITTER_GENERATED_UUID /scratch/log/kdriver.log.startup ===
    Wed Feb 20 18:21:06 UTC 2019) /opt/emc/rp/kdriver/bin/kdriver_generate_spl_uuid.sh: Found already running processes of this script with the same args!
    Wed Feb 20 18:21:06 UTC 2019) /opt/emc/rp/kdriver/bin/kdriver_generate_spl_uuid.sh: Going to kill processes: 35087
    Wed Feb 20 18:21:08 UTC 2019) /opt/emc/rp/kdriver/bin/kdriver_generate_spl_uuid.sh: Successfully killed other processes
    Wed Feb 20 18:21:08 UTC 2019) /opt/emc/rp/kdriver/bin/kdriver_generate_spl_uuid.sh: Begin periodic scan
    Wed Feb 20 18:23:37 UTC 2019) kdriver_heartbeats: didn't get heartbeat from the kdriver for the last 150 seconds, killing the kdriver

The last line above is printed by the watchdog mechanism:

    Wed Feb 20 18:23:37 UTC 2019) kdriver_heartbeats: didn't get heartbeat from the kdriver for the last 150 seconds, killing the kdriver

Also, running ps -c | grep kdriver returns only a short list of KDriver processes:

    7277456 7277456 sh /usr/bin/sh ./kdriver_heartbeats.sh
    7277472 7277472 sh /usr/bin/sh ./run_kdriver.sh
    7277515 7277515 sh /usr/bin/sh /opt/emc/rp/kdriver/bin/kdriver_generate_spl_uuid.sh --periodic-scan /etc/config/emc/rp/kdriver/kdriver_generated_uuid RP_SPLITTER_GENERATED_UUID /scratch/log/kdriver.log.startup
    7277536 7277536 kdriver /opt/emc/rp/kdriver/bin/kdriver
    7262309 7262309 python /usr/bin/python kdriver_options_manager.py
    7262549 7262549 sh /usr/bin/sh /opt/emc/rp/kdriver/bin/launch_kdriver_watchdog.sh
    7404571 7404571 grep grep kdriver

In a stable environment the results should look like this:

    [root@A13T7090:/vmfs/volumes/59531dba-1e561f1d-43ed-484d7e72c8fb/log] ps -c | grep kdriver
    7277456 7277456 sh /usr/bin/sh ./kdriver_heartbeats.sh
    7277472 7277472 sh /usr/bin/sh ./run_kdriver.sh
    7277515 7277515 sh /usr/bin/sh /opt/emc/rp/kdriver/bin/kdriver_generate_spl_uuid.sh --periodic-scan /etc/config/emc/rp/kdriver/kdriver_generated_uuid RP_SPLITTER_GENERATED_UUID /scratch/log/kdriver.log.startup
    7277536 7277536 kdriver /opt/emc/rp/kdriver/bin/kdriver
    7277540 7277536 kdriver /opt/emc/rp/kdriver/bin/kdriver
    7277541 7277536 kdriver /opt/emc/rp/kdriver/bin/kdriver
    7277542 7277536 kdriver /opt/emc/rp/kdriver/bin/kdriver
    7277543 7277536 kdriver /opt/emc/rp/kdriver/bin/kdriver
    7277544 7277536 kdriver /opt/emc/rp/kdriver/bin/kdriver
    7277545 7277536 kdriver /opt/emc/rp/kdriver/bin/kdriver
    7277546 7277536 kdriver /opt/emc/rp/kdriver/bin/kdriver
    6261739 7277536 kdriver /opt/emc/rp/kdriver/bin/kdriver
    7277548 7277536 kdriver /opt/emc/rp/kdriver/bin/kdriver
    7277566 7277536 kdriver /opt/emc/rp/kdriver/bin/kdriver
    7277567 7277536 kdriver /opt/emc/rp/kdriver/bin/kdriver
    7277568 7277536 kdriver /opt/emc/rp/kdriver/bin/kdriver
    7277569 7277536 kdriver /opt/emc/rp/kdriver/bin/kdriver
    7277570 7277536 kdriver /opt/emc/rp/kdriver/bin/kdriver
    7277571 7277536 kdriver /opt/emc/rp/kdriver/bin/kdriver
    7277572 7277536 kdriver /opt/emc/rp/kdriver/bin/kdriver
    7277573 7277536 kdriver /opt/emc/rp/kdriver/bin/kdriver
    7277586 7277536 kdriver /opt/emc/rp/kdriver/bin/kdriver
    6261779 7277536 kdriver /opt/emc/rp/kdriver/bin/kdriver
    7277588 7277536 kdriver /opt/emc/rp/kdriver/bin/kdriver
    7277589 7277536 kdriver /opt/emc/rp/kdriver/bin/kdriver
    7277590 7277536 kdriver /opt/emc/rp/kdriver/bin/kdriver
    7277591 7277536 kdriver /opt/emc/rp/kdriver/bin/kdriver
    7262309 7262309 python /usr/bin/python kdriver_options_manager.py
    7262549 7262549 sh /usr/bin/sh /opt/emc/rp/kdriver/bin/launch_kdriver_watchdog.sh
    7404571 7404571 grep grep kdriver
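The two symptoms above (the watchdog kill message and the short process list) could be checked with a small script. The following is a minimal Python sketch, not part of the product: the function names are hypothetical, and it assumes that the third column of the ps -c output is the process name, as it appears to be in the listings above.

```python
import re

# Matches the watchdog message quoted in this article and captures the timeout.
KILL_PATTERN = re.compile(
    r"kdriver_heartbeats: didn't get heartbeat from the kdriver "
    r"for the last (\d+) seconds, killing the kdriver"
)


def watchdog_kills(log_text):
    """Return the timeout in seconds reported by each watchdog kill line."""
    return [int(m.group(1)) for m in KILL_PATTERN.finditer(log_text)]


def count_kdriver_rows(ps_text):
    """Count ps rows whose name column (assumed: third field) is 'kdriver'."""
    count = 0
    for line in ps_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "kdriver":
            count += 1
    return count


if __name__ == "__main__":
    sample_log = (
        "Wed Feb 20 18:23:37 UTC 2019) kdriver_heartbeats: didn't get "
        "heartbeat from the kdriver for the last 150 seconds, "
        "killing the kdriver\n"
    )
    sample_ps = (
        "7277472 7277472 sh /usr/bin/sh ./run_kdriver.sh\n"
        "7277536 7277536 kdriver /opt/emc/rp/kdriver/bin/kdriver\n"
        "7404571 7404571 grep grep kdriver\n"
    )
    print("watchdog kills:", watchdog_kills(sample_log))
    print("kdriver rows:", count_kdriver_rows(sample_ps))
```

A single kdriver row together with repeated watchdog kill lines would be consistent with the stuck state described here; a stable host shows many kdriver rows under the same parent ID.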
An old log file forces the KDriver to iterate over a large number of files and issue a syscall for each of more than 1 million files, which takes long enough for the 150-second watchdog timeout to expire.

Affected versions: All versions of RP4VM.
Workaround: Delete the old KDriver log files under /scratch/log/ whose filename number is smaller than that of the latest KDriver log file by at least 1000, using rm -r.

How to identify the old unzipped KDriver log file:

Under /scratch/log there is an old KDriver log file (no zipme extension):

    esx-hostname -2019-02-23--03.57/scratch/log: ls -altr | grep kdriver
    - 1 root root 10624529 Aug 19 2018 kdriver.log.003342690

This is the oldest filename. The newest filename is:

    - 1 root root 10207623 Feb 22 15:16 kdriver.log.004347269
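The selection rule in the workaround can be sketched in Python as follows. This is an illustrative helper only: the function name and the min_gap parameter are hypothetical, and it assumes that unzipped log files are named kdriver.log.<digits> with nothing after the number. It only lists candidates and deletes nothing, so the list can be reviewed before any rm is run.

```python
import re

# Assumed naming: "kdriver.log." followed by digits only (no extension).
LOG_NAME = re.compile(r"^kdriver\.log\.(\d+)$")


def stale_kdriver_logs(filenames, min_gap=1000):
    """Return names whose number trails the newest log's number by >= min_gap."""
    numbered = []
    for name in filenames:
        match = LOG_NAME.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    if not numbered:
        return []
    newest = max(number for number, _ in numbered)
    return sorted(name for number, name in numbered if newest - number >= min_gap)


if __name__ == "__main__":
    names = [
        "kdriver.log.003342690",  # oldest file from the listing above
        "kdriver.log.004347269",  # newest file from the listing above
        "kdriver.log.startup",    # not numbered, so it is ignored
    ]
    print(stale_kdriver_logs(names))
```

For the two files shown above, the gap is 4347269 - 3342690 = 1004579, which agrees with the "more than 1 million files" figure given as the cause.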