Symptoms
Attempt to fail over or run an SRM test failover.
SRM failover or test failover fails due to volumes/VMFS not found.
Example errors in SRM logs:
2017-05-21T14:20:53.757+03:00 [53540 error 'AbrRecoveryEngine' opID=773f5494:dd26:ffe2] Dr::Providers::Abr::AbrRecoveryEngine::RecoverVmfsDatastore: Cannot find VMFS volume for datastore 'ds:///vmfs/volumes/57c9acfd-5ead583e-ada6-0017a477xxxx/': (dr.storageProvider.VmfsRecoverySpec) {
2017-05-21T14:20:53.761+03:00 [193968 verbose 'AbrRecoveryEngine' opID=773f5494:dd26:ffe2] FailDeviceGroups: Failing group
2017-05-21T14:20:53.761+03:00 [193968 verbose 'AbrRecoveryEngine' opID=773f5494:dd26:ffe2] GroupCallbackHandler::FailGroup: Failing group 'vm-protection-group-xxxxxx'
2017-05-21T14:20:53.761+03:00 [193968 verbose 'Replication' opID=773f5494:dd26:ffe2] Dr::Replication::EntityOperationJoinerBase,void>::EntityFailed: Received a failure update for protection group Id=[dr.replication.VmProtectionGroup:bcf4f9b1-2063-4d9a-b773-93be06xxxxxx:vm-protection-group-xxxxxx], error=
--> (dr.storageProvider.fault.DatastoreRecoveryFailed) {
--> faultCause = (dr.storageProvider.fault.RecoveryVmfsVolumeNotFound) {
--> faultCause = (dr.storageProvider.fault.RecoveryDeviceNotFound) {
--> faultCause = (vmodl.MethodFault) null,
--> device = "60:06:01:60:0C:F0:3D:00:0E:29:21:28:xx:xx:xx:xx",
--> msg = ""
--> },
--> device = (string) [
--> "60:06:01:60:0C:F0:3D:00:0E:29:21:28:x:xx:xx:x"
--> ],
--> msg = ""
--> },
--> protectedName = "General",
--> protectedUrl = "ds:///vmfs/volumes/57c9acfd-5ead583e-ada6-0017a4770xxx/",
--> msg = ""
--> }
Manually mounting the target devices without SRM appears to work.
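When checking whether an SRM log shows this symptom, it can help to pull out the fault types, device IDs, and datastore URLs in one pass. The following is a minimal Python sketch, not an SRM tool: the function name summarize_srm_log and the regular expressions are assumptions based only on the log excerpt shown in this article, and real logs may be formatted differently.

```python
import re

# Patterns are assumptions derived from the example log excerpt in this article.
# Fault class names, e.g. (dr.storageProvider.fault.RecoveryDeviceNotFound)
FAULT_RE = re.compile(r"\(dr\.storageProvider\.fault\.(\w+)\)")
# Quoted device IDs: 16 colon-separated groups; 'x' is allowed because the
# examples in the article are masked.
DEVICE_RE = re.compile(r'"((?:[0-9A-Fa-fx]{1,2}:){15}[0-9A-Fa-fx]{1,2})"')
# Datastore URLs such as ds:///vmfs/volumes/<uuid>/
DATASTORE_RE = re.compile(r"ds:///vmfs/volumes/[^/'\"]+/")


def summarize_srm_log(text):
    """Return the unique faults, devices, and datastore URLs found in log text."""
    return {
        "faults": sorted(set(FAULT_RE.findall(text))),
        "devices": sorted(set(DEVICE_RE.findall(text))),
        "datastores": sorted(set(DATASTORE_RE.findall(text))),
    }


if __name__ == "__main__":
    import sys

    # Usage: python summarize_srm_log.py < srm_log_file
    summary = summarize_srm_log(sys.stdin.read())
    for key, values in summary.items():
        print(key)
        for value in values:
            print("  " + value)
```

If the output lists RecoveryVmfsVolumeNotFound or RecoveryDeviceNotFound together with a device that can be mounted manually, the timing issue described in this article is a plausible candidate.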
Cause
In some scenarios, the devices take longer to become visible on the ESX hosts than the default timeout allows, or they require a second rescan or a delay before the rescan.
Resolution
Change the ESX/SRM parameters for storage discovery. The right values depend on the environment. Generally, larger environments with more LUNs and ESX hosts require longer timeouts, more retries, or longer delays.
1. Open the vSphere Web Client.
2. Right-click the SRM site that you want to modify and select Advanced Settings.
3. Click StorageProvider.
4. Check/edit the following parameters:
storageProvider.hostRescanDelaySec - Default is 0. Try changing to a number between 20 and 180.
storageProvider.hostRescanRepeatCnt - Default is 1. Try changing to a number between 2 and 3.
storage.commandTimeout - Default is 300. Try increasing to 600 or 900.
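Before applying changes, the planned values can be compared against the defaults and suggestions listed above. The following is a minimal Python sketch for that comparison only; it does not talk to SRM or vCenter. The function name review_settings is hypothetical, and treating 600 to 900 as a range for storage.commandTimeout is an assumption (the article suggests 600 or 900).

```python
# Defaults and suggested values copied from the parameter list in this article.
# The 600-900 range for storage.commandTimeout is an assumption.
SUGGESTED = {
    "storageProvider.hostRescanDelaySec": {"default": 0, "min": 20, "max": 180},
    "storageProvider.hostRescanRepeatCnt": {"default": 1, "min": 2, "max": 3},
    "storage.commandTimeout": {"default": 300, "min": 600, "max": 900},
}


def review_settings(proposed):
    """Return notes for settings left at default or outside the suggested range."""
    notes = []
    for name, rng in SUGGESTED.items():
        # A setting that is not listed in 'proposed' is assumed to stay at default.
        value = proposed.get(name, rng["default"])
        if value == rng["default"]:
            notes.append(f"{name}: still at default {value}")
        elif not rng["min"] <= value <= rng["max"]:
            notes.append(
                f"{name}: {value} is outside the suggested {rng['min']}-{rng['max']}"
            )
    return notes


if __name__ == "__main__":
    planned = {
        "storageProvider.hostRescanDelaySec": 60,
        "storageProvider.hostRescanRepeatCnt": 2,
    }
    for note in review_settings(planned):
        print(note)
```

Values outside the suggested ranges are not necessarily wrong; the notes are only a prompt to double-check them for the environment in question.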
Similar information can be found in VMware KB articles, for example KB 1008283.