0. Verify the OneFS version:
- For OneFS 9.5.0.0 to 9.5.0.5, check whether an isi_stats_d core is observed. If yes, increase virtual memory first (first workaround below). If isi_stats_d failures continue, apply the second workaround to disable isi_pp_d.
- For OneFS 9.5.0.6 and above, apply all steps in this KB.

1. In the events list, "Query to isi_stats_d failed" messages are observed, with each event group being resolved shortly afterwards:

On a live cluster:
# isi event events list | grep -i -A1 isi_stats_d
3.30117 04/23 23:11 W 3 17469 Query to isi_stats_d failed
3.30119 04/23 23:11 I 3 17469 Resolving event group
--
3.30126 04/23 23:14 W 3 17490 Query to isi_stats_d failed
3.30127 04/23 23:14 I 3 17490 Resolving event group
--
3.30132 04/23 23:48 W 3 17503 Query to isi_stats_d failed
3.30133 04/23 23:48 I 3 17503 Resolving event group

In a log set:
# cat local/isi_event | grep -i -A1 isi_stats_d
1.529723 05/09 08:52 W 1 305263 Query to isi_stats_d failed
1.529724 05/09 08:52 I 1 305263 Resolving event group
1.529726 05/09 08:57 W 1 305266 Query to isi_stats_d failed
1.529727 05/09 08:57 I 1 305266 Resolving event group
1.529728 05/09 09:02 W 1 305268 Query to isi_stats_d failed
1.529729 05/09 09:02 I 1 305268 Resolving event group

2. In the event groups list, SW_PP_STATS_QUERY_FAILED messages are observed:

On a live cluster:
# isi event groups list | grep -i sw_pp
17469 04/23 23:11 04/23 23:11 SW_PP_STATS_QUERY_FAILED 3 2 warning
17490 04/23 23:14 04/23 23:14 SW_PP_STATS_QUERY_FAILED 3 2 warning
17503 04/23 23:48 04/23 23:48 SW_PP_STATS_QUERY_FAILED 3 2 warning

In a log set:
# cat local/isi_event_groups | grep -i "sw_pp"
305263 05/09 08:52 05/09 08:52 SW_PP_STATS_QUERY_FAILED 1 2 warning
305266 05/09 08:57 05/09 08:57 SW_PP_STATS_QUERY_FAILED 1 2 warning
305268 05/09 09:02 05/09 09:02 SW_PP_STATS_QUERY_FAILED 1 2 warning

3.
In the timeframe of the alert, isi_pp_d.log shows remote communication errors:

2024-05-09T08:52:06.587136-07:00 <3.6> cluster-1(id1) isi_pp_d[7876]: [leader]: call exception: Remote query: remote communication error: connection to remote node closed: LNN 13
2024-05-09T08:52:06.612912-07:00 <3.6> cluster-1(id1) isi_pp_d[7876]: [leader]: stats query exception: Remote query: remote communication error: connection to remote node closed: LNN 13
2024-05-09T08:52:06.612936-07:00 <3.3> cluster-1(id1) isi_pp_d[7876]: [leader]: Remote query: remote communication error: connection to remote node closed: LNN 13
2024-05-09T08:57:07.658365-07:00 <3.6> cluster-1(id1) isi_pp_d[7876]: [leader]: call exception: Remote query: remote communication error: connection to remote node closed: LNN 13
2024-05-09T08:57:07.658459-07:00 <3.6> cluster-1(id1) isi_pp_d[7876]: [leader]: stats query exception: Remote query: remote communication error: connection to remote node closed: LNN 13
2024-05-09T08:57:07.658472-07:00 <3.3> cluster-1(id1) isi_pp_d[7876]: [leader]: Remote query: remote communication error: connection to remote node closed: LNN 13

4. In the timeframe of the alert, isi_stats_d.log on the affected node shows timeout errors:

2024-05-09T08:52:02.474147-07:00 <3.3> cluster-13(id13) isi_stats_d[40798]: timeout on key 208, devid 13
2024-05-09T08:52:02.474212-07:00 <3.3> cluster-13(id13) isi_stats_d[40798]: timeout in main loop
2024-05-09T08:57:03.174010-07:00 <3.3> cluster-13(id13) isi_stats_d[40798]: timeout on key 208, devid 13
2024-05-09T08:57:03.174070-07:00 <3.3> cluster-13(id13) isi_stats_d[40798]: timeout in main loop
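The event-group triage above can be scripted. The sketch below is a minimal, hypothetical helper that tallies SW_PP_STATS_QUERY_FAILED occurrences per day from a collected log set; the sample file mirrors the local/isi_event_groups excerpt above, and on a real log set you would point grep at local/isi_event_groups instead.

```shell
# Build a sample file from the excerpt above (hypothetical path /tmp/isi_event_groups.sample).
cat <<'EOF' > /tmp/isi_event_groups.sample
305263 05/09 08:52 05/09 08:52 SW_PP_STATS_QUERY_FAILED 1 2 warning
305266 05/09 08:57 05/09 08:57 SW_PP_STATS_QUERY_FAILED 1 2 warning
305268 05/09 09:02 05/09 09:02 SW_PP_STATS_QUERY_FAILED 1 2 warning
EOF
# Field 2 is the event start date (MM/DD); count occurrences per date.
grep 'SW_PP_STATS_QUERY_FAILED' /tmp/isi_event_groups.sample |
  awk '{print $2}' | sort | uniq -c
# In this sample, prints: 3 05/09
```

A high daily count indicates the warning is recurring rather than a one-off, which helps justify applying one of the workarounds below.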
The warning occurs when the isi_pp_d process (Partition Performance daemon) is unable to retrieve information from the isi_stats_d process.
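Because the isi_pp_d leader names the unreachable node in its error lines, the failing isi_stats_d instance can be located mechanically. This is a hedged sketch, not a supported tool: the sample lines mirror the isi_pp_d.log excerpt above, and the /tmp path is illustrative only.

```shell
# Build a sample file from the isi_pp_d.log excerpt (hypothetical path).
cat <<'EOF' > /tmp/isi_pp_d.sample
2024-05-09T08:52:06.612936-07:00 <3.3> cluster-1(id1) isi_pp_d[7876]: [leader]: Remote query: remote communication error: connection to remote node closed: LNN 13
2024-05-09T08:57:07.658472-07:00 <3.3> cluster-1(id1) isi_pp_d[7876]: [leader]: Remote query: remote communication error: connection to remote node closed: LNN 13
EOF
# Extract the "LNN <n>" suffix and count failures per remote node,
# most frequent first; that node's isi_stats_d.log is the one to
# inspect for "timeout in main loop" errors.
grep -o 'LNN [0-9]*' /tmp/isi_pp_d.sample | sort | uniq -c | sort -rn
# In this sample, prints: 2 LNN 13
```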
This issue has been fixed in OneFS 9.5.1.3 / 9.7.1.5 / 9.10.0.1. Two workarounds are listed below; apply only one of them.

First workaround: Increase virtual memory for isi_stats_d

1) Check the current virtual memory limit:
# isi_for_array -s 'limits -P $(pgrep isi_stats_d) -B | grep vmemoryuse'

2) Set the virtual memory limit to 2 GB (the default is 1 GB):
# isi_for_array -s 'limits -P $(pgrep isi_stats_d) -v 2g'

3) Confirm that the change took effect:
# isi_for_array -s 'limits -P $(pgrep isi_stats_d) -B | grep vmemoryuse'

Note: This change does not negatively impact cluster operations.

Second workaround: Disable the isi_pp_d service

# isi services -a isi_pp_d disable

Note: Disabling the isi_pp_d service affects SmartQoS, which is used for performance statistics and monitoring. If the customer has never configured custom SmartQoS datasets, disabling isi_pp_d has no effect on them. The cluster's SmartQoS configuration can be reviewed with the 'isi performance' command shown below.

Note 2: Advise the customer to re-enable the isi_pp_d service after upgrading to a fixed OneFS release (9.5.1.3 / 9.7.1.5 / 9.10.0.1).

To check whether SmartQoS performance datasets are configured:

# isi performance datasets list
ID Name   Metrics     Filters Statkey                       Creation Time
-----------------------------------------------------------------------------
0  System job_type    -       cluster.performance.dataset.0 Never
          system_name
-----------------------------------------------------------------------------

Note: In the example above, the default ID 0 'System' performance dataset with a Creation Time of 'Never' is normal and expected. Any custom datasets would be listed below it. If only the default is listed, the cluster is not configured to use SmartQoS.
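The "is SmartQoS in use?" check can be sketched as a small filter over the tabular output. This is a minimal sketch under stated assumptions: it reads a sample file that mirrors the default 'isi performance datasets list' output shown above; on a live cluster you would pipe the real command's output into the same awk filter instead.

```shell
# Sample of the default dataset listing (hypothetical path /tmp/datasets.sample).
cat <<'EOF' > /tmp/datasets.sample
ID Name   Metrics     Filters Statkey                       Creation Time
-----------------------------------------------------------------------------
0  System job_type    -       cluster.performance.dataset.0 Never
          system_name
-----------------------------------------------------------------------------
EOF
# Count dataset rows whose leading ID field is a number other than the
# built-in 0 (header, separator, and continuation lines are skipped).
custom=$(awk '$1 ~ /^[0-9]+$/ && $1 != 0' /tmp/datasets.sample | wc -l | tr -d ' ')
if [ "$custom" -eq 0 ]; then
  echo "Only the default System dataset found: disabling isi_pp_d is safe"
else
  echo "$custom custom dataset(s) found: disabling isi_pp_d affects SmartQoS"
fi
```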
Further information about SmartQoS is available at the links below:

Information about the isi_pp_d service
https://www.dell.com/support/manuals/en-us/isilon-onefs/ifs_pub_9.4.0.0_administration_guide_gui/partitioned-performance-monitoring?guid=guid-9d3d1c75-6b07-4709-97c1-45a18232409f&lang=en-us

OneFS SmartQoS
http://www.unstructureddatatips.com/onefs-smartqos/
https://infohub.delltechnologies.com/en-us/p/onefs-smartqos/

OneFS SmartQoS Configuration and Setup
https://infohub.delltechnologies.com/en-us/p/onefs-smartqos-configuration-and-setup/
http://www.unstructureddatatips.com/onefs-smartqos-configuration-and-setup/

Third option (permanent fix): Upgrade to OneFS 9.5.1.3 / 9.7.1.5 / 9.10.0.1 or later, where this issue is fixed.