...
Trash directory HealthCheck trash_dir_pq_limit failure.
The trash directory HealthCheck fails when the trash directory PQ accumulates more than 10,240 entries. The failure can point to several problems with the health of the trash directory service, including but not limited to:
(1) The trash directory service has stopped working (crashed, hung, or disabled), or the service failed to clean up stale trash directory PQ entries.
(2) Backbone OneFS services such as isi_job_d or isi_papi_d, or the TreeDelete job, have crashed or hung, or have been disabled by a privileged user.
(3) Trash directory consumer services are trashing directories aggressively, or currently running jobs are blocking deletion of trashed directories.
This KB will be updated once a patch is available.
Workaround:
The following workarounds correspond to the three cause categories listed above.
(1) MCP should restart the trash directory service when it crashes. Please file bugs for hung trash directory service issues. If the service is hung, it can be killed and restarted as a temporary resolution. The HealthCheck failure can be ignored if the trash directory service isi_trash_d has been disabled by a privileged user. However, the service should be enabled when appropriate, so that it can finish deleting trashed directories:
    isi services -a isi_trash_d enable
If the trash directory service fails to clean up stale entries in the PQ, run the following command to clean them up:
    /usr/libexec/isilon/isi_trash_pq_clean --cleanup [--debug]
In that case, also verify that the job state query interval for TreeDelete jobs queued by the service is not larger than 30 seconds:
    isi_gconfig -t trash-config job_query_interval=30
(2) If backbone OneFS services or the TreeDelete job are hung or have been disabled by a privileged user, the HealthCheck failure can be ignored. However, the backbone OneFS services or the TreeDelete job should be enabled when appropriate, so that the trash directory service can finish deleting trashed directories.
(3) The HealthCheck failure can be ignored if consumer services (Lhotse data mover or Writable snapshot) are trashing directories aggressively, or if currently running jobs are blocking deletion of trashed directories. The backbone OneFS services are the bottleneck in this case. If long-running TreeDelete jobs queued by the trash directory service are blocking the processing of a substantial number of recently trashed directories, the job timeout limit and job query interval for TreeDelete jobs queued by the service can be tuned to smaller values.
Minimum recommended values for these tunables are the following:
    isi_gconfig -t trash-config job_timeout_limit=3600
    isi_gconfig -t trash-config job_query_interval=30
Please note that the trash directory service cancels any TreeDelete job it queued that is still running or paused after the job timeout limit. So, if the service cancels the majority of the TreeDelete jobs it queued, the job timeout limit can be increased to a higher value.
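As a guard against tuning below the recommended minimums, the check can be sketched as a small shell helper. This is only an illustration: the function name trash_tunables_cmds and the two MIN_ variables are hypothetical and not part of OneFS. The helper does not change any configuration; it only prints the isi_gconfig commands quoted in this KB so that they can be reviewed before being run. It checks the lower bounds only, not the 30-second upper bound on the query interval mentioned under workaround (1).

```shell
#!/bin/sh
# Minimum recommended values quoted in this KB, in seconds.
MIN_JOB_TIMEOUT_LIMIT=3600
MIN_JOB_QUERY_INTERVAL=30

# Hypothetical helper (not shipped with OneFS): validate proposed values
# and print the matching isi_gconfig commands without executing them.
# Usage: trash_tunables_cmds <job_timeout_limit> <job_query_interval>
trash_tunables_cmds() {
    timeout_limit=$1
    query_interval=$2

    # Reject empty or non-numeric arguments.
    for value in "$timeout_limit" "$query_interval"; do
        case "$value" in
            ''|*[!0-9]*)
                echo "error: expected two whole numbers of seconds" >&2
                return 2
                ;;
        esac
    done

    # Refuse values below the recommended minimums.
    if [ "$timeout_limit" -lt "$MIN_JOB_TIMEOUT_LIMIT" ]; then
        echo "error: job_timeout_limit is below $MIN_JOB_TIMEOUT_LIMIT" >&2
        return 1
    fi
    if [ "$query_interval" -lt "$MIN_JOB_QUERY_INTERVAL" ]; then
        echo "error: job_query_interval is below $MIN_JOB_QUERY_INTERVAL" >&2
        return 1
    fi

    # Print the commands for review; run them manually once confirmed.
    echo "isi_gconfig -t trash-config job_timeout_limit=$timeout_limit"
    echo "isi_gconfig -t trash-config job_query_interval=$query_interval"
}
```

Calling trash_tunables_cmds 3600 30 prints the two commands shown above, while a call such as trash_tunables_cmds 1800 30 prints an error and returns a non-zero status.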