
Impacted VPLEX Hardware:
EMC Hardware: VPLEX Series
EMC Hardware: VPLEX VS2
EMC Hardware: VPLEX-Metro

Impacted VPLEX GeoSynchrony code versions:
See the article Metadata for affected GeoSynchrony versions.

Note: GeoSynchrony code levels 4.x through 5.3 Patch 4 are End of Service Life (EOSL) and no longer supported. If your VPLEX runs one of these EOSL code versions, contact your local field representative to plan an upgrade to at least the target code 5.5 SP2 P4, which includes many fixes and enhancements not available in the EOSL code versions.

GeoSynchrony 5.4.x is also at End of Life (EOL) and goes EOSL on April 30, 2019. If you are running any version of 5.4.x, you have until April 30, 2019 to plan and upgrade the VPLEX to at least target code 5.5 SP2 P4.

Impacted software:
EMC Software: VPLEX Site Replication Adapter (SRA) for VMware vCenter Site Recovery Manager (SRM) 6.1

This article covers three issues:

Issue 1: When you run a planned migration on stretched storage with static site bias, the operation may fail during the storage sync step.

Issue 2: Certain commands sent by VPLEX SRA to the VPlexcli (through the REST API) do not see updated values for up to 10 minutes, which causes tests to fail.

Reason for failure: The failure appears to be reported because, when SRA queried the detach rule on both clusters, the clusters returned different values.
From cluster-1: DetachRule Value: (winner cluster-2 after 5s)
From cluster-2: DetachRule Value: (winner cluster-1 after 5s)

The VMware SRM Recovery Plan fails in a VMware stretched cluster configuration involving a VPLEX Metro. During the Synchronize storage step in the Recovery Steps in VMware SRM, errors are produced, the Plan status becomes "Incomplete recovery", and the Recovery Plan is halted.
Example errors from VMware SRM Recovery Steps (in logs or from the UI):
Error - Failed to promote replica devices. Timed out (300 seconds) while waiting for SRA to complete 'failover' command.
Error: failed to promote replica devices. SRA command 'failover' didn't return a response.
Failed to promote replica devices. SRA command failover failed. The SRA was unable to find consistency group from group identifier.

Example VPLEX SRA logs report:
sra_queryStrings_01-20-2017_01-28-17.268.log: <String id="REPLICATION.fail_over_waiting_timeout.hint">Check log for detail information.</String>
sra_queryStrings_01-20-2017_02-18-30.656.log: <String id="REPLICATION.fail_over_waiting_timeout.desc">Fail over taking too long time and waiting process timeout.</String>

Increasing the timeout value, storage.commandTimeout, in VMware SRM Advanced Settings does not help.

Issue 3: When the failover completes, extra virtual-volumes are added to the storage views on the cluster-2 side.
Cause: During the storage synchronization steps in a Recovery Plan, certain commands sent by VPLEX SRA to the VPlexcli (through the REST API) do not see updated VPlexcli context values for up to 10 minutes. The REST API calls were set with a timeout of 600 seconds for reading cached data, so a query issued shortly after a change can return the cached value, which causes tests to fail.

The command flow is as follows: VMware SRM sends commands to VPLEX SRA, VPLEX SRA sends REST API calls to the VPLEX, and the VPLEX sends its response back to VPLEX SRA. VPLEX SRA then sends a response back to SRM. If the cached value does not match the actual value, the test fails.
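The effect of a 600-second cache read timeout can be illustrated with a small simulation. This is a minimal sketch, not VPLEX or VPLEX SRA code: the class `CachedContext`, its methods, and the choice of a detach-rule string as the cached value are hypothetical and used only for illustration. Only the 600-second figure comes from the description above.

```python
class CachedContext:
    """Toy model of a context value served from a cache with a fixed TTL (hypothetical)."""

    def __init__(self, actual_value, ttl_seconds=600):
        self.actual_value = actual_value
        self.ttl_seconds = ttl_seconds
        self._cached_value = actual_value
        self._cached_at = 0.0

    def update(self, new_value):
        # The change is applied on the back end, but the cache is not refreshed.
        self.actual_value = new_value

    def read(self, now):
        # Serve the cached copy until it is at least ttl_seconds old, then refresh.
        if now - self._cached_at >= self.ttl_seconds:
            self._cached_value = self.actual_value
            self._cached_at = now
        return self._cached_value


ctx = CachedContext("winner cluster-1 after 5s")
ctx.update("winner cluster-2 after 5s")  # change applied at t = 0
stale = ctx.read(now=30)                  # read 30 s later: cached (old) value
fresh = ctx.read(now=600)                 # read once the TTL has elapsed: new value
print(stale)
print(fresh)
```

In this model, a caller that verifies a change within the cache window sees the old value and would conclude that the change failed, which mirrors the mismatch between cached and actual values described above.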
Workarounds:

Issue 1 workaround: After the planned migration fails on the first attempt, manually run 'discover devices' and rerun the operation. From the SRM UI, at Site A, open the Monitor tab, click "SRAs," click the "Rescan SRAs" button, then repeat the same steps for Site B. These steps trigger the discoverDevices call.
Fix: Upgrade either VMware SRM or VPLEX SRA to v6.1.0.100 or later, where this issue has been fixed.

Issue 2 workaround:
Reason for failure: The above error appears to be reported because, when SRA queried the detach rule on both clusters, the clusters returned different values.
From cluster-1: DetachRule Value: (winner cluster-2 after 5s)
From cluster-2: DetachRule Value: (winner cluster-1 after 5s)
Analysis: Dell VPLEX Engineering reviewed the VPLEX logs (client and REST logs) and found no slowness that would delay the VPlexcli context refresh and so cause stale values to be returned to VPLEX SRA commands that request the updated values after changes are applied.
Fix: Upgrade both clusters to VPLEX SRA v6.1.0.100 or later, which contains the fix for the REST command (sent to VPlexcli) time-out issue.

Issue 3 workaround:
Analysis: According to the VPLEX SRA team's analysis, this issue occurs because all the VPLEX virtual-volumes are part of one TargetGroup (an input given by VMware SRM to VPLEX SRA). In VPLEX, all the virtual-volumes are part of one VPLEX CG.
Workaround: In theory, based on VPLEX Engineering's understanding of VMware SRM and VPLEX SRA functionality, separating the VPLEX virtual-volumes into two VPLEX CGs and having VMware SRM protect them in different TargetGroups (an input supplied by VMware SRM to VPLEX SRA) should address this issue.
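The detach-rule mismatch described under Issue 2 can be checked mechanically once the two reported values are in hand. The following is a minimal sketch, assuming the rule strings have the form quoted in this article ("winner cluster-N after Ns"). The function names `parse_detach_rule` and `rules_agree` are hypothetical and are not part of VPLEX SRA or the VPlexcli.

```python
import re

# Matches the rule text quoted in this article, e.g. "winner cluster-2 after 5s".
RULE_PATTERN = re.compile(r"winner\s+(cluster-\d+)\s+after\s+(\d+)s")


def parse_detach_rule(text):
    """Return (winner, delay_seconds) from a string like '(winner cluster-2 after 5s)'."""
    found = RULE_PATTERN.search(text)
    if found is None:
        raise ValueError(f"unrecognized detach rule: {text!r}")
    return found.group(1), int(found.group(2))


def rules_agree(rule_from_cluster_1, rule_from_cluster_2):
    """True when both clusters report the same winner and the same delay."""
    return parse_detach_rule(rule_from_cluster_1) == parse_detach_rule(rule_from_cluster_2)


# Values quoted in this article: the two clusters disagree on the winner.
mismatch_case = rules_agree("(winner cluster-2 after 5s)", "(winner cluster-1 after 5s)")
# For comparison, two identical values.
consistent_case = rules_agree("(winner cluster-2 after 5s)", "(winner cluster-2 after 5s)")
print(mismatch_case, consistent_case)
```

A check like this only detects the disagreement. It does not say which value is stale, so it is a diagnostic aid rather than a substitute for the SRA upgrade listed as the fix.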