...
EVENT

You received an event notification indicating that one or more nodes recovered from a panic. Information about the panic is recorded in a file located under /var/tmp/ on the affected nodes.

Example:

    4.3394 03/12 18:02 W 4 53125 Node 4 has recovered from a panic. Info about panic is recorded in file: /var/tmp/panic.1615590175
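When many notifications arrive at once, it can help to pull the node number and panic file path out of each message programmatically. The following is a minimal sketch, not a supported tool. The regular expression is derived only from the single example message above. The names `parse_panic_event` and `PanicEvent` are hypothetical. The idea that the numeric file suffix is a Unix epoch timestamp is an inference from this one example, not documented behavior.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple, Optional

# Pattern inferred from the one example message in this article (assumption).
PANIC_EVENT = re.compile(
    r"Node (?P<node>\d+) has recovered from a panic\. "
    r"Info about panic is recorded in file: (?P<path>/var/tmp/\S+)"
)


class PanicEvent(NamedTuple):
    node: int                         # logical node number from the message
    path: str                         # panic file path on that node
    recorded_utc: Optional[datetime]  # only set if the suffix is all digits


def parse_panic_event(line: str) -> Optional[PanicEvent]:
    """Return the node number and panic file path, or None if no match."""
    match = PANIC_EVENT.search(line)
    if match is None:
        return None
    path = match.group("path")
    suffix = path.rsplit(".", 1)[-1]
    recorded = None
    if suffix.isdigit():
        # Assumption: the suffix is seconds since the Unix epoch.
        recorded = datetime.fromtimestamp(int(suffix), tz=timezone.utc)
    return PanicEvent(int(match.group("node")), path, recorded)


EXAMPLE = (
    "4.3394 03/12 18:02 W 4 53125 Node 4 has recovered from a panic. "
    "Info about panic is recorded in file: /var/tmp/panic.1615590175"
)

event = parse_panic_event(EXAMPLE)
print(event.node, event.path, event.recorded_utc)
```

For the example message, the sketch reports node 4 and the path /var/tmp/panic.1615590175. Read as an epoch value, the suffix corresponds to 2021-03-12 23:02:55 UTC. That is consistent with the 03/12 18:02 event time if the cluster clock is five hours behind UTC, but treat the interpretation as unconfirmed.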
The exact cause of a node panic can vary, but typical causes include:

* Hardware failure
* Software code failure
* Misconfiguration

Analysis of the cluster logs must be performed with PowerScale Support to determine the exact cause of the panic.
To begin troubleshooting the issue, first confirm that the node has recovered from the panic event and is not down or offline.

* To troubleshoot, open an SSH connection to the node and log in using the "root" account.

Run the following command to confirm that the node rejoined the cluster:

    # isi status

The isi status command returns output similar to the following. If the node successfully rejoined the cluster, the Health column does not display "D" (down):

                       Health Throughput (bps)   HDD Storage       SSD Storage
    ID |IP Address     |DASR |   In   Out Total| Used / Size     |Used / Size
    ---+---------------+-----+-----+-----+-----+-----------------+-----------------
      1|10.16.141.226  | OK  | 553M| 3.2M| 557M|61.9T/ 106T( 59%)|  L3: 1.5T
      2|10.16.141.227  | OK  | 481M| 96.0| 481M|62.2T/ 106T( 59%)|  L3: 1.5T
      3|10.16.141.228  | OK  | 372k| 332k| 704k|62.3T/ 106T( 59%)|  L3: 1.5T
      4|10.16.141.229  | OK  |10.8M| 941k|11.7M|62.6T/ 106T( 59%)|  L3: 1.5T
      5|10.16.141.230  | OK  | 9.4M| 393k| 9.8M|62.6T/ 106T( 59%)|  L3: 1.5T
      6|10.16.141.231  | OK  | 7.3M|256.0| 7.3M|63.4T/ 106T( 60%)|  L3: 1.5T
    ---+---------------+-----+-----+-----+-----+-----------------+-----------------
    Cluster Totals:          | 1.1G| 4.9M| 1.1G| 375T/ 634T( 59%)|  L3: 8.7T

    Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only

Gather logs by running the following command, and provide the log set to Isilon Technical Support for analysis of the panic:

    # isi_gather_info -f /var/tmp/

Note: Panic data in /var/tmp/ is not collected in a default log gather. You must use isi_gather_info -f /var/tmp/ to collect the panic information.

Once the logs are received, Technical Support reviews and analyzes the panic stack details to determine whether the panic stack corresponds to any known issue or Knowledge Base article. If the panic stack details do not match a known issue or existing KB article, the issue is escalated for further assessment.
Technical Support determines what actions are needed, such as a hardware replacement, code fix, firmware update, or other mitigation.

* If the node is still down, additional troubleshooting must be performed to bring the node back online. Contact Isilon Technical Support if assistance is needed. For more information, see article 55936: Isilon OneFS: Event notification: Node Offline - Event ID: 200010001, 300010003, 399990001, 900160001, 910100006, 400150007
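The isi status check described above can also be applied to saved command output, for example when reviewing several clusters. The sketch below is illustrative only and makes these assumptions:

* The row layout (node ID, IP address, and health flags separated by "|") matches the sample output in this article.
* The function names `node_health` and `down_nodes` are hypothetical.
* The "D---" row is a hand-edited, hypothetical example. The real rendering of a down node may differ.

```python
import re
from typing import Dict, List

# Node rows start with a numeric ID, then "|IP|health flags|" (layout assumed
# from the sample output in this article).
NODE_ROW = re.compile(r"^\s*(?P<id>\d+)\|(?P<ip>[^|]*)\|(?P<health>[^|]*)\|")


def node_health(status_output: str) -> Dict[int, str]:
    """Map each node ID to its health field, e.g. 'OK'."""
    health = {}
    for line in status_output.splitlines():
        match = NODE_ROW.match(line)
        if match:  # header, separator, and totals lines do not match
            health[int(match.group("id"))] = match.group("health").strip()
    return health


def down_nodes(status_output: str) -> List[int]:
    """Return IDs of nodes whose health field contains the 'D' (down) flag."""
    return sorted(
        node for node, flags in node_health(status_output).items() if "D" in flags
    )


# Two rows copied from the sample output in this article.
SAMPLE = """\
ID |IP Address     |DASR |   In   Out Total| Used / Size     |Used / Size
---+---------------+-----+-----+-----+-----+-----------------+-----------------
  3|10.16.141.228  | OK  | 372k| 332k| 704k|62.3T/ 106T( 59%)|  L3: 1.5T
  4|10.16.141.229  | OK  |10.8M| 941k|11.7M|62.6T/ 106T( 59%)|  L3: 1.5T
"""

# Hypothetical: health field of node 4 edited by hand to show a down flag.
HYPOTHETICAL_DOWN = SAMPLE.replace("| OK  |10.8M", "|D--- |10.8M")

print(down_nodes(SAMPLE))
print(down_nodes(HYPOTHETICAL_DOWN))
```

With the unmodified sample rows, no node is reported as down. With the hand-edited row, node 4 is reported. An empty result only means that no "D" flag was found in the parsed rows; it does not replace reviewing the full isi status output.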