Event
You receive a "Node Offline" event notification.
Event ID: 200010001
"Node Offline" events are generated when a node is reported offline by the other nodes in the cluster. This event can also be generated when the internal link is lost on any node.
NOTE: If the node is not turned on, perform the procedure in ‘How to power cycle and drain an Isilon node’.

Details
One of the following conditions is true:
- One or more nodes rebooted.
- One or more nodes are powered off.
- A node lacks back-end network (InfiniBand (IB)) connectivity. Back-end connectivity refers to a node's ability to communicate with other nodes.
- A node cannot join the group.

Response
Before you begin troubleshooting, confirm that the event is not related to maintenance on the cluster. After confirming that no maintenance is in progress, proceed with the following steps.

If the node rebooted
Open an SSH connection to the node and log on using the "root" account.
Run the following command to confirm that the node rejoined the cluster:
    isi status
The isi status command returns output similar to the following. If the node successfully rejoined the cluster, the Health column does not display D (down):
                        Health  Throughput (bps)  HDD Storage      SSD Storage
    ID |IP Address     |DASR |  In   Out  Total| Used / Size     |Used / Size
    -------------------+-----+-----+-----+-----+-----------------+-----------------
      1|10.111.183.10  | OK  | 115K| 220K| 335K| 531M/  10T(< 1%)|    (No SSDs)
      2|10.111.183.11  | OK  |    0|    0|    0| 519M/  10T(< 1%)|    (No SSDs)
      3|10.111.183.12  | OK  |    0|  26K|  26K| 521M/  10T(< 1%)|    (No SSDs)
    -------------------+-----+-----+-----+-----+-----------------+-----------------
    Cluster Totals:          | 115K| 246K| 361K| 1.5G/  31T(< 1%)|    (No SSDs)

         Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only
Run the following command to confirm the uptime duration:
    uptime
Output similar to the following appears:
    8:41PM up 10 mins, 1 user, load averages: 0.08, 0.18, 0.14
If the node rebooted recently, the uptime duration is short, typically measured in minutes.
Gather logs by running the following command and send them to Isilon Technical Support for analysis:
    isi_gather_info

If you can ping the external IP address of the down node
Confirm the status of the node:
Open an SSH connection to the node and log on using the "root" account.
Run the following command:
    ifconfig | grep -A4 ib1
The ifconfig command should return output similar to the following, where "status: active" indicates that the internal interface is active:
    ib1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 2004
            lladdr 0.15.1b.0.10.bd.4c.77
            inet 172.10.111.200 netmask 0xffffff00 broadcast 172.10.111.255 zone 1
            media: Infiniband autoselect
            status: active
If the status is inactive, check the following:
- Are the activity lights for the ports on the IB card on or off? If the lights are off, continue with the cable check in the next item.
- Are the IB cables firmly attached to the node and the IB switch? If not, reseat the cables on the node and the switch.
- Is the IB switch powered on? If not, power it on.
Visually inspect the node to verify that the power light is on.

If the node is turned off
Attempt to turn on the node.
NOTE: If possible, establish serial access to the node so that you can monitor it as it boots and capture any information that might assist in troubleshooting. For more information, see ‘Isilon: How to connect to the management port of a node’.
If the node turns on, confirm whether it rejoined the cluster:
Open a secure shell (SSH) connection to a different node in the cluster and log on using the "root" account.
Run the following command to determine whether the node has rejoined the cluster:
    isi status
The isi status command returns output similar to the following.
If the node successfully rejoined the cluster, the Health column does not display D (down):
                        Health  Throughput (bps)  HDD Storage      SSD Storage
    ID |IP Address     |DASR |  In   Out  Total| Used / Size     |Used / Size
    -------------------+-----+-----+-----+-----+-----------------+-----------------
      1|10.111.183.10  | OK  | 115K| 220K| 335K| 531M/  10T(< 1%)|    (No SSDs)
      2|10.111.183.11  | OK  |    0|    0|    0| 519M/  10T(< 1%)|    (No SSDs)
      3|10.111.183.12  | OK  |    0|  26K|  26K| 521M/  10T(< 1%)|    (No SSDs)
    -------------------+-----+-----+-----+-----+-----------------+-----------------
    Cluster Totals:          | 115K| 246K| 361K| 1.5G/  31T(< 1%)|    (No SSDs)

         Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only
If the node rejoins the cluster, gather logs by running the following command and send them to Isilon Technical Support for analysis:
    isi_gather_info
If the node does not rejoin the cluster, proceed to the section "If the node is powered on but did not rejoin the cluster".
If the node does not turn on, ensure that the circuit breakers are operational and that the power outlets are active. If the node is not receiving power, resolve the power supply issue. If the node is off and is receiving power, contact Isilon Technical Support for help troubleshooting the issue.

If the node is powered on but did not rejoin the cluster
Attempt to establish remote access through a secure shell (SSH) session.
If the SSH session fails, attempt to establish remote access through the serial console.
If neither the SSH session nor the serial console is responsive, press CTRL+T either within the SSH session or on the serial console.
If pressing CTRL+T produces output, record the output, and then contact Isilon Technical Support for failure analysis.
If the node is unresponsive, press the power button three times and then wait five minutes for the node to power off. If the node does not power down, press and hold the power button until the node powers off.
Press the power button again to power on the node.
If the node powers up and returns a login prompt, log on using the "root" account.
Gather logs by running the following command and send them to Isilon Technical Support for analysis:
    isi_gather_info
If the node does not rejoin the cluster, contact Isilon Technical Support for help troubleshooting the issue.
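
The command checks in the Response section (reading the Health column of isi status, the status line of the ib1 interface, and the uptime duration) can be sketched as a small Python script that parses captured command output. This is a minimal sketch under stated assumptions, not a Dell or OneFS tool: the function names down_nodes, ib1_is_active, and uptime_minutes are hypothetical; the script does not run any commands itself; and it assumes the output layouts match the samples in this article, with uptime printed in FreeBSD style. Output can differ between OneFS versions, so treat a result as a hint rather than a diagnosis.

```python
import re

# Hypothetical triage helpers (not part of OneFS or any Dell tool). They only
# parse text already captured from `isi status`, `ifconfig | grep -A4 ib1`,
# and `uptime`, assuming layouts like the samples in this article.

# A node row looks like: "  1|10.111.183.10  | OK  | 115K| ..."
NODE_ROW = re.compile(r"^\s*(\d+)\|([^|]*)\|([^|]*)\|")


def down_nodes(isi_status_text):
    """Return the IDs of nodes whose Health (DASR) field contains 'D'."""
    down = []
    for line in isi_status_text.splitlines():
        m = NODE_ROW.match(line)  # header, separator, and totals rows do not match
        if m and "D" in m.group(3).strip():
            down.append(int(m.group(1)))
    return down


def ib1_is_active(ifconfig_text):
    """Return True if the captured ib1 block contains 'status: active'."""
    pattern = r"^\s*status:\s*active\s*$"  # does not match 'status: inactive'
    return re.search(pattern, ifconfig_text, re.MULTILINE) is not None


def uptime_minutes(uptime_text):
    """Approximate minutes since boot from FreeBSD-style `uptime` output."""
    # Capture the text between "up" and ", N user(s)", e.g. "3 days, 2:05".
    m = re.search(r"\bup\s+(.*?),\s+\d+\s+users?", uptime_text)
    if not m:
        raise ValueError("unrecognized uptime output")
    total = 0
    for part in m.group(1).split(","):
        part = part.strip()
        if mm := re.fullmatch(r"(\d+)\s+days?", part):
            total += int(mm.group(1)) * 24 * 60
        elif mm := re.fullmatch(r"(\d+):(\d+)", part):
            total += int(mm.group(1)) * 60 + int(mm.group(2))
        elif mm := re.fullmatch(r"(\d+)\s+hrs?", part):
            total += int(mm.group(1)) * 60
        elif mm := re.fullmatch(r"(\d+)\s+mins?", part):
            total += int(mm.group(1))
        # "N secs" (under one minute) adds nothing to the total.
    return total


if __name__ == "__main__":
    # Rows adapted from the sample above; node 2 is hypothetically marked D.
    status = (
        "  1|10.111.183.10  | OK  | 115K| 220K| 335K| 531M/  10T(< 1%)|    (No SSDs)\n"
        "  2|10.111.183.11  | D   |    0|    0|    0| 519M/  10T(< 1%)|    (No SSDs)\n"
    )
    print(down_nodes(status))  # [2]
    print(ib1_is_active("        status: active\n"))  # True
    print(uptime_minutes("8:41PM up 10 mins, 1 user, load averages: 0.08, 0.18, 0.14"))  # 10
```

Parsing captured text instead of invoking the commands keeps the sketch testable away from the cluster; on a node you would paste or redirect the real output into these functions. It does not replace running isi_gather_info or contacting Isilon Technical Support.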