...
LUN expansion performed with XtremIO.

Microsoft Failover Cluster resources fail or go offline during a volume expansion on the XtremIO array, before the expansion is completed on the host side. Microsoft cluster control holds the SCSI reservation but goes offline. XtremIO receives the SCSI reservation release later and acknowledges the release immediately once it is received. Communication errors are seen, causing the node to be removed from the cluster after it sees the LUNs being expanded.

Operating System Version: Windows Server 2012 Standard
Dell EMC PowerPath Version 6.1 (build 295).
All paths are active and alive with no errors.

System event log:
10/16/2017 11:34:48 PM Error XXXXXXX-.activ 7034 Service Control Manager The SQL Server (XXXXX) service terminated unexpectedly. It has done this 1 time(s).
10/16/2017 11:34:22 PM Information XXXXXXX-..activ 7036 Service Control Manager The SQL Server Agent ( XXXXX) service entered the stopped state.
10/16/2017 11:34:20 PM Error XXXXXXX-..activ 1069 Microsoft-Windows-FailoverCluste Cluster resource 'XXXXX' of type 'Physical Disk' in clustered role 'SQL Server (XXXXX)' failed. Based on the
10/16/2017 11:34:20 PM Error XXXXXXX-..activ 1038 Microsoft-Windows-FailoverCluste Ownership of cluster disk 'XXXXX' has been unexpectedly lost by this node. Run the Validate a Configuration wizard
10/16/2017 11:34:17 PM Warning XXXXXXX-..activ 140 Microsoft-Windows-Ntfs The system failed to flush data to the transaction log. Corruption may occur in VolumeId: O:, DeviceName: \Device\HarddiskVolu
10/16/2017 11:34:17 PM Warning XXXXXXX-..activ 151 disk The capacity of Disk 1 has changed.
10/16/2017 11:34:17 PM Warning XXXXXXX-..activ 151 disk The capacity of Disk 1 has changed.
10/16/2017 11:34:17 PM Warning XXXXXXX-..activ 151 disk The capacity of Disk 1 has changed.
10/16/2017 11:34:16 PM Warning XXXXXXX-..activ 151 disk The capacity of Disk 1 has changed.
10/16/2017 11:34:16 PM Warning XXXXXXX-..activ 151 disk The capacity of Disk 1 has changed.
10/16/2017 11:34:15 PM Warning XXXXXXX-..activ 151 disk The capacity of Disk 1 has changed.
10/16/2017 11:27:30 PM Information XXXXXXX-..activ 7036 Service Control Manager The Windows Modules Installer service entered the stopped state.

System event log:
11/13/2017 10:41:22 PM Error XXXXXXX-..activ 1069 Microsoft-Windows-FailoverCluste Cluster resource 'Cluster Disk 1' of type 'Physical Disk' in clustered role 'StorageTest' failed. Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
11/13/2017 10:41:22 PM Error XXXXXXX-..activ 1038 Microsoft-Windows-FailoverCluste Ownership of cluster disk 'Cluster Disk 1' has been unexpectedly lost by this node. Run the Validate a Configuration wizard to check your storage configuration.
11/13/2017 10:41:22 PM Warning XXXXXXX-..activ 140 Microsoft-Windows-Ntfs The system failed to flush data to the transaction log. Corruption may occur in VolumeId: X:, DeviceName: \Device\HarddiskVolume17. (A device which does not exist was specified.)
11/13/2017 10:41:22 PM Warning XXXXXXX-..activ 151 disk The capacity of Disk 17 has changed.
11/13/2017 10:41:22 PM Warning XXXXXXX-..activ 151 disk The capacity of Disk 17 has changed.
11/13/2017 10:41:22 PM Warning XXXXXXX-..activ 151 disk The capacity of Disk 17 has changed.
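
The signature in both System event log excerpts is a burst of event 151 (disk capacity changed) followed within seconds by event 1038 (cluster disk ownership lost). A minimal sketch of checking an exported event list for that pattern is shown here. It assumes the events have already been reduced to (timestamp, event ID) pairs; the function names and the 10-second window are illustrative assumptions, not part of any Microsoft or Dell EMC tool.

```python
from datetime import datetime, timedelta

# Event IDs taken from the System event log excerpts above.
CAPACITY_CHANGED = 151   # disk: "The capacity of Disk N has changed."
OWNERSHIP_LOST = 1038    # FailoverClustering: ownership of cluster disk lost


def parse_ts(text):
    """Parse a timestamp in the format used in the excerpts (hypothetical helper)."""
    return datetime.strptime(text, "%m/%d/%Y %I:%M:%S %p")


def find_expansion_failures(events, window_seconds=10):
    """Return (capacity_time, failure_time) pairs where a 1038 event follows
    the most recent 151 event within window_seconds.

    events: iterable of (datetime, int event_id) tuples, in any order.
    The window size is an assumption chosen for illustration.
    """
    window = timedelta(seconds=window_seconds)
    hits = []
    last_capacity = None
    # Sorting by (timestamp, event_id) puts 151 before 1038 on equal timestamps.
    for ts, event_id in sorted(events):
        if event_id == CAPACITY_CHANGED:
            last_capacity = ts
        elif event_id == OWNERSHIP_LOST and last_capacity is not None:
            if ts - last_capacity <= window:
                hits.append((last_capacity, ts))
    return hits


# A few records copied from the 10/16/2017 excerpt.
sample_events = [
    (parse_ts("10/16/2017 11:34:15 PM"), 151),
    (parse_ts("10/16/2017 11:34:17 PM"), 151),
    (parse_ts("10/16/2017 11:34:20 PM"), 1038),
    (parse_ts("10/16/2017 11:34:48 PM"), 7034),
]

for capacity_time, failure_time in find_expansion_failures(sample_events):
    print("capacity change at", capacity_time, "then ownership lost at", failure_time)
```

A match only shows that the two events occurred close together; it does not by itself confirm the PowerPath defect described in this article.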
Cluster.log:
00001364.000013f0::2017/11/13-22:41:22.599 INFO [RES] Physical Disk: PNPDEBUG: reset notification handle 0x35b73da0
00001364.000013f0::2017/11/13-22:41:22.599 INFO [RES] Physical Disk: PNPDEBUG: UnregisterDeviceNotification handle 0000001D35B73DA0
00001364.000013f0::2017/11/13-22:41:22.599 INFO [RES] Physical Disk: PNP: \\?\STORAGE#Volume#{7b3fcfa4-c894-11e7-93ff-0025b505a19f}#0000000000100000#{53f5630d-b6bf-11d0-94f2-00a0c91efb8b} volume disappeared
00001364.000013f0::2017/11/13-22:41:22.599 INFO [RES] Physical Disk: PnpRemoveVolume: Removing volume \\?\STORAGE#Volume#{7b3fcfa4-c894-11e7-93ff-0025b505a19f}#0000000000100000#{53f5630d-b6bf-11d0-94f2-00a0c91efb8b}
00001364.00002404::2017/11/13-22:41:22.600 INFO [RES] Physical Disk : PNP: HardDiskpSetPnpUpdateTimePropertyWorker: status 0
000008ac.000036d0::2017/11/13-22:41:22.600 INFO [GEM] Node 3: Sending 1 messages as a batched GEM message with gid 7190
00001364.00005034::2017/11/13-22:41:22.600 INFO [RES] Physical Disk: HarddiskpIsDiskCsv: IOCTL_DISK_GET_CLUSTER_INFO: device \Device\Harddisk17\Partition0, IsClustered 1 IsCsv 0 InMaintenance 0
00001364.00005034::2017/11/13-22:41:22.600 ERR [RES] Physical Disk: Failed to open device \Device\Harddisk17\ClusterPartition1, status 0xc0000034
00001364.00005034::2017/11/13-22:41:22.600 ERR [RES] Physical Disk: HarddiskpIsPartitionHidden: failed to open device \Device\Harddisk17\ClusterPartition1, status 2
00001364.00005034::2017/11/13-22:41:22.600 ERR [RES] Physical Disk : HardDiskpGetVolumeInfo: Cant tell if disk 17 partition 1 is hidden 2
00001364.00005034::2017/11/13-22:41:22.600 ERR [RES] Physical Disk : PnpUpdateDiskConfigThread: Failed to get volume info status 2
000008ac.000023e8::2017/11/13-22:41:22.602 INFO [NM] Received request from client address fe80::6d90:4572:db28:58f7.
00001364.000024d8::2017/11/13-22:41:22.946 ERR [RES] Physical Disk : IsAlive sanity check failed!, pending IO completed with status 1117.
00001364.000024d8::2017/11/13-22:41:22.946 ERR [RES] Physical Disk : IsAlive sanity check failed!, pending IO completed with status 1117.
00001364.000024d8::2017/11/13-22:41:22.946 WARN [RHS] Resource Cluster Disk 1 IsAlive has indicated failure.
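
To pick the relevant entries out of a larger Cluster.log, the ERR and WARN lines from the [RES] and [RHS] components can be filtered. The following is a minimal sketch that assumes every line follows the layout seen in the excerpt above (process.thread::timestamp LEVEL [COMPONENT] message). The regular expression and function names are illustrative assumptions and may need adjusting for other Cluster.log variants.

```python
import re

# Layout inferred from the excerpt above:
#   <pid>.<tid>::<yyyy/mm/dd-hh:mm:ss.mmm> <LEVEL> [<COMPONENT>] <message>
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-fA-F]+)\.(?P<tid>[0-9a-fA-F]+)::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>INFO|WARN|ERR)\s+"
    r"\[(?P<component>[^\]]+)\]\s+"
    r"(?P<message>.*)$"
)


def parse_cluster_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def disk_failures(lines):
    """Keep only ERR/WARN entries from the resource components [RES] and [RHS]."""
    found = []
    for line in lines:
        entry = parse_cluster_log_line(line)
        if entry and entry["level"] in ("ERR", "WARN") and entry["component"] in ("RES", "RHS"):
            found.append(entry)
    return found


# Three lines copied from the excerpt above.
sample_lines = [
    "000008ac.000036d0::2017/11/13-22:41:22.600 INFO [GEM] Node 3: Sending 1 messages as a batched GEM message with gid 7190",
    # Status 1117 is the Win32 code ERROR_IO_DEVICE (I/O device error).
    "00001364.000024d8::2017/11/13-22:41:22.946 ERR [RES] Physical Disk : IsAlive sanity check failed!, pending IO completed with status 1117.",
    "00001364.000024d8::2017/11/13-22:41:22.946 WARN [RHS] Resource Cluster Disk 1 IsAlive has indicated failure.",
]

for entry in disk_failures(sample_lines):
    print(entry["ts"], entry["level"], entry["component"], entry["message"])
```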
An intermittent defect in the PowerPath 6.1 code causes this issue. A change in the Persistent Reservation IN (PRI) load balancing leads to cluster disk failure during the LUN expansion.
To resolve this issue, upgrade to PowerPath 6.3 or later. PowerPath for Windows can be downloaded from the following web link: https://support.emc.com/products/1781

Workaround:
If upgrading is not possible, you can reduce the occurrence of the issue by performing the LUN expansion with the active paths reduced to half of the HBAs. There is still a risk.
1. Run 'powermt display bus' from the command prompt. This shows the HBA# and target relationship.
2. Run the 'powermt set mode=standby hba= dev=' command. This sends I/O to the specific device over only half of the paths, leaving one initiator HBA active and the other in standby mode (assuming only two HBAs are in use).
3. Expand the LUN as planned.
4. Run the 'powermt set mode=active hba= dev=' command to restore the default state.
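
The workaround sequence can be written out as an ordered checklist before a maintenance window. The sketch below only builds the command strings and does not run them. The article leaves the hba= and dev= values blank, so they are passed in by the caller here; the function names and the values used in the usage lines are hypothetical and must be replaced with values read from 'powermt display bus' on the actual host.

```python
def powermt_mode_command(mode, hba, dev):
    """Build a 'powermt set mode=...' command string (not executed here).

    hba and dev are caller-supplied; the article does not state their values.
    """
    if mode not in ("standby", "active"):
        raise ValueError("mode must be 'standby' or 'active'")
    return f"powermt set mode={mode} hba={hba} dev={dev}"


def expansion_workaround_plan(hba, dev):
    """Return the workaround steps from the article, in order."""
    return [
        "powermt display bus",                      # review HBA# and target relationship
        powermt_mode_command("standby", hba, dev),  # place half of the paths in standby
        "(expand the LUN as planned)",              # manual step on the array and host
        powermt_mode_command("active", hba, dev),   # restore the default state
    ]


# Hypothetical values for illustration only.
for step in expansion_workaround_plan(hba="1", dev="all"):
    print(step)
```

Keeping the standby and active commands generated from the same arguments reduces the chance of restoring a different HBA or device than the one that was placed in standby.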