Symptoms
On a managed LPAR, message CGRP135E was issued because the CONGROUP validate found a non-R1 device: 'CGRP135E Bad device list - has R21 device'. On the primary C system running the script, message CGRP387W was issued because a lock was still held by a member LPAR: 'CGRP387W Command not allowed. ALL-CONGROUPS lock held by ECGMN00 MSTC'.
Cause
In step 6 of the PA21 script running on the primary C system, all CONGROUP members are asked to DASD swap to R2 for swap group CGNAME. When the post-swap cleanup completes on the primary C system, the script moves on to several steps that issue RDF commands to reconfigure SRDF so that the DC2 DASD becomes the R1 site. Normally, by the time the script reaches step 15, where a CONGROUP REFRESH is issued so that the new CONGROUP manages synchronous mode from DC2 to DC1, all member systems have completed their post-swap cleanup. In this case, the cleanup ran long on a managed host, and its validate failed because the RDF reconfiguration that had already occurred left non-R1 devices in the old CONGROUP. The failed validate left a lock behind, which caused the CONGROUP REFRESH command issued by the script in step 15 to fail.
Resolution
Workaround: The script can be restarted at the failing step, since the managed LPARs that had not completed their post-swap cleanup need only a few more minutes to do so.
Permanent Fix:
For Geographically Dispersed Disaster Restart (GDDR) 5.3, PTF GD53062 is available to address the issue. It is available for download from Dell Technologies Online Support.
For Geographically Dispersed Disaster Restart (GDDR) 5.2, PTF GD52132 is available to address the issue. It is available for download from Dell Technologies Online Support.
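
As an illustration only, the workaround described above amounts to a bounded retry: wait a short time for the member LPARs to finish their cleanup, then attempt the failing step again. The Python sketch below shows that general pattern. It is not part of GDDR and does not call any GDDR or CONGROUP interface; `retry_step`, `run_step`, `attempts`, and `wait_seconds` are hypothetical names chosen for this example, and `run_step` stands in for whatever action reruns the failing step.

```python
import time
from typing import Callable


def retry_step(
    run_step: Callable[[], bool],
    attempts: int = 5,
    wait_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call run_step until it returns True, waiting between tries.

    Returns the number of the attempt that succeeded. Raises
    RuntimeError if every attempt fails. All names here are
    hypothetical; nothing in this function is a GDDR API.
    """
    for attempt in range(1, attempts + 1):
        if run_step():
            return attempt
        # Do not wait after the final failed attempt.
        if attempt < attempts:
            sleep(wait_seconds)
    raise RuntimeError(f"step still failing after {attempts} attempts")


if __name__ == "__main__":
    # Simulated step: fails twice (as if a lock were still held),
    # then succeeds once the simulated cleanup has finished.
    state = {"calls": 0}

    def simulated_step() -> bool:
        state["calls"] += 1
        return state["calls"] >= 3

    used = retry_step(simulated_step, wait_seconds=0.0)
    print(f"succeeded on attempt {used}")
```

The attempt limit keeps the retry from looping indefinitely if the lock is never released, in which case the failure should be investigated rather than retried.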