All the components are upgraded, but the Finalize upgrade step fails.

GW:

1. The last step, the Finalize step, starts to run and fails.
   a. The finalize step starts:
   b. Step fails:
   c. Error details:

2. Operation.log shows that 'finalize upgrade' failed on the MDM Management IP and that the performance profile was aborted due to the 'finalize upgrade' state:

    2020-06-17 05:04:18,577 [scheduler-1] INFO operations - Adding storage pool [com.emc.s3g.scaleio.domain.installation.configuration.ProtectionDomain@737d6e2b, com.emc.s3g.scaleio.domain.installation.configuration.ProtectionDomain@64633c67, com.emc.s3g.scaleio.domain.installation.configuration.ProtectionDomain@6423e4f4, com.emc.s3g.scaleio.domain.installation.configuration.ProtectionDomain@3b2205e4, com.emc.s3g.scaleio.domain.installation.configuration.ProtectionDomain@23b6289d, com.emc.s3g.scaleio.domain.installation.configuration.ProtectionDomain@5e4ba888, com.emc.s3g.scaleio.domain.installation.configuration.ProtectionDomain@25b4e3f1, com.emc.s3g.scaleio.domain.installation.configuration.ProtectionDomain@e12df86] on protection domain 10.0.183.70,10.0.182.6,10.0.182.25 on MDM {2}
    2020-06-17 05:04:18,677 [scheduler-1] INFO operations - alignStoragePoolMediaTypesEnd[com.emc.s3g.scaleio.domain.installation.configuration.ProtectionDomain@737d6e2b, com.emc.s3g.scaleio.domain.installation.configuration.ProtectionDomain@64633c67, com.emc.s3g.scaleio.domain.installation.configuration.ProtectionDomain@6423e4f4, com.emc.s3g.scaleio.domain.installation.configuration.ProtectionDomain@3b2205e4, com.emc.s3g.scaleio.domain.installation.configuration.ProtectionDomain@23b6289d, com.emc.s3g.scaleio.domain.installation.configuration.ProtectionDomain@5e4ba888, com.emc.s3g.scaleio.domain.installation.configuration.ProtectionDomain@25b4e3f1, com.emc.s3g.scaleio.domain.installation.configuration.ProtectionDomain@e12df86]10.0.183.70,10.0.182.6,10.0.182.25{2}{3}{4}{5}{6}{7}{8}{9}{10}{11}{12}{13}{14}{15}{16}{17}{18}{19}{20}{21}
    2020-06-17 05:04:18,677 [scheduler-1] INFO operations - Requesting to finalize upgrade on 10.0.183.70,10.0.182.6,10.0.182.25
    2020-06-17 05:05:14,777 [scheduler-1] ERROR operations - finalize upgrade request failed on 10.0.183.70,10.0.182.6,10.0.182.25
    2020-06-17 05:05:14,877 [scheduler-1] WARN operations - Aborted waiting for cluster to resume to normal on MDM 10.0.183.70,10.0.182.6,10.0.182.25 due to: Stopped due to previously failed command
    2020-06-17 05:05:14,977 [scheduler-1] INFO operations - set of global (high) performance profile (for MDM true for All SDS true and for all SDC true) has ended with result of aborted

3. Scaleio.log shows that the finalize upgrade step failed with WRONG_UPGRADE_STATE:

    2020-06-17 05:05:14,669 [https-jsse-nio-443-exec-8] INFO c.e.s.s.s.ConnectionServiceImpl - Got rc SUCCESS
    2020-06-17 05:05:14,709 [executor-2] INFO c.e.s.s.d.i.c.FinalizeUpgradeCommand - running pre-executor for .FinalizeUpgradeCommand
    2020-06-17 05:05:14,709 [executor-2] INFO c.e.s.s.d.i.c.FinalizeUpgradeCommand - running pre-executor for .FinalizeUpgradeCommand complete in 0 ms
    2020-06-17 05:05:14,710 [executor-2] INFO c.e.e.c.service.CommandService - finalizeUpgrade called on 10.0.183.70,10.0.182.6,10.0.182.25 with force-or-fail-flag value of false
    2020-06-17 05:05:14,710 [executor-2] INFO c.e.e.c.service.CommandService - finalizeUpgrade return value of WRONG_UPGRADE_STATE
    2020-06-17 05:05:14,710 [executor-2] ERROR c.e.e.c.service.CommandService - finalizeUpgrade on 10.0.183.70,10.0.182.6,10.0.182.25 failed.
    2020-06-17 05:05:14,711 [executor-2] ERROR c.e.s.s.d.i.c.FinalizeUpgradeCommand - Error Could not finalize upgrade on 10.0.183.70,10.0.182.6,10.0.182.25 due to: The command cannot be executed in the current upgrade state executing command .FinalizeUpgradeCommand (abort) :
        com.emc.s3g.scaleio.im.services.installation.configurators.CommandServiceMdmConnection.finalizeUpgrade(CommandServiceMdmConnection.java:753)

4. ScaleIO trace logs show that the finalizeUpgrade function returns a value of WRONG_UPGRADE_STATE:

    2020-06-17 05:05:14,709 [executor-2] INFO c.e.s.s.d.i.c.FinalizeUpgradeCommand - running pre-executor for .FinalizeUpgradeCommand
    2020-06-17 05:05:14,709 [executor-2] INFO c.e.s.s.d.i.c.FinalizeUpgradeCommand - running pre-executor for .FinalizeUpgradeCommand complete in 0 ms
    2020-06-17 05:05:14,709 [executor-2] DEBUG c.e.s.s.services.GeneralServiceImp - property upgrade.mdm.data.ips is missing in gatewayUser.properties - either upgrade is not active or upgrade flawed upgrade persistence
    2020-06-17 05:05:14,709 [executor-2] DEBUG c.e.s.s.services.GeneralServiceImp - property upgrade.mdm.mgmt.ips is missing in gatewayUser.properties - either upgrade is not active or upgrade flawed upgrade persistence
    2020-06-17 05:05:14,709 [executor-2] DEBUG c.e.s.s.services.GeneralServiceImp - property upgrade.mdm.actor.port is missing in gatewayUser.properties - either upgrade is not active or upgrade flawed upgrade persistence
    2020-06-17 05:05:14,709 [executor-2] DEBUG c.e.s.s.services.GeneralServiceImp - property upgrade.mdm.role is missing in gatewayUser.properties - either upgrade is not active or upgrade flawed upgrade persistence
    2020-06-17 05:05:14,709 [executor-2] DEBUG c.e.s.s.services.GeneralServiceImp - property upgrade.mdm.version.orig is missing in gatewayUser.properties - either upgrade is not active or upgrade flawed upgrade persistence
    2020-06-17 05:05:14,709 [executor-2] DEBUG c.e.s.s.services.GeneralServiceImp - property upgrade.mdm.version.target is missing in gatewayUser.properties - either upgrade is not active or upgrade flawed upgrade persistence
    2020-06-17 05:05:14,709 [executor-2] DEBUG o.a.c.c.PropertiesConfiguration - FileName set to /opt/emc/scaleio/gateway/webapps/ROOT/WEB-INF/classes/gatewayUser.properties
    2020-06-17 05:05:14,709 [executor-2] DEBUG o.a.c.c.ConfigurationUtils - ConfigurationUtils.locate(): base is null, name is /opt/emc/scaleio/gateway/webapps/ROOT/WEB-INF/classes/gatewayUser.properties
    2020-06-17 05:05:14,709 [executor-2] DEBUG o.a.c.c.DefaultFileSystem - Could not locate file /opt/emc/scaleio/gateway/webapps/ROOT/WEB-INF/classes/gatewayUser.properties at null: no protocol: /opt/emc/scaleio/gateway/webapps/ROOT/WEB-INF/classes/gatewayUser.properties
    2020-06-17 05:05:14,709 [executor-2] DEBUG o.a.c.c.ConfigurationUtils - Loading configuration from the absolute path /opt/emc/scaleio/gateway/webapps/ROOT/WEB-INF/classes/gatewayUser.properties
    2020-06-17 05:05:14,709 [executor-2] DEBUG o.a.c.c.PropertiesConfiguration - Base path set to file:///opt/emc/scaleio/gateway/webapps/ROOT/WEB-INF/classes/gatewayUser.properties
    2020-06-17 05:05:14,710 [executor-2] DEBUG c.e.s.s.services.GeneralServiceImp - (Update): upgrade.mdm.actor.port () is same in the configuration file - no need to update.
    2020-06-17 05:05:14,710 [executor-2] DEBUG c.e.s.s.services.GeneralServiceImp - (Update): upgrade.mdm.version.orig () is same in the configuration file - no need to update.
    2020-06-17 05:05:14,710 [executor-2] DEBUG c.e.s.s.services.GeneralServiceImp - (Update): upgrade.mdm.role () is same in the configuration file - no need to update.
    2020-06-17 05:05:14,710 [executor-2] DEBUG c.e.s.s.services.GeneralServiceImp - (Update): upgrade.mdm.data.ips () is same in the configuration file - no need to update.
    2020-06-17 05:05:14,710 [executor-2] DEBUG c.e.s.s.services.GeneralServiceImp - (Update): upgrade.sds.mm_list () is same in the configuration file - no need to update.
    2020-06-17 05:05:14,710 [executor-2] DEBUG c.e.s.s.services.GeneralServiceImp - (Update): upgrade.mdm.status () is same in the configuration file - no need to update.
    2020-06-17 05:05:14,710 [executor-2] DEBUG c.e.s.s.services.GeneralServiceImp - (Update): upgrade.mdm.version.target () is same in the configuration file - no need to update.
    2020-06-17 05:05:14,710 [executor-2] DEBUG c.e.s.s.services.GeneralServiceImp - (Update): upgrade.mdm.mgmt.ips () is same in the configuration file - no need to update.
    2020-06-17 05:05:14,710 [executor-2] DEBUG c.e.s.s.services.GeneralServiceImp - Gateway configuration file, the following keys were updated.
    2020-06-17 05:05:14,710 [executor-2] INFO c.e.e.c.service.CommandService - finalizeUpgrade called on 10.0.183.70,10.0.182.6,10.0.182.25 with force-or-fail-flag value of false
    2020-06-17 05:05:14,710 [executor-2] DEBUG c.e.e.c.service.CommandService - finalizeDuringUpgrade()
    2020-06-17 05:05:14,710 [executor-2] INFO c.e.e.c.service.CommandService - finalizeUpgrade return value of WRONG_UPGRADE_STATE
    2020-06-17 05:05:14,710 [executor-2] ERROR c.e.e.c.service.CommandService - finalizeUpgrade on 10.0.183.70,10.0.182.6,10.0.182.25 failed.
    2020-06-17 05:05:14,711 [executor-2] ERROR c.e.s.s.d.i.c.FinalizeUpgradeCommand - Error Could not finalize upgrade on 10.0.183.70,10.0.182.6,10.0.182.25 due to: The command cannot be executed in the current upgrade state executing command .FinalizeUpgradeCommand (abort) :
        com.emc.s3g.scaleio.im.services.installation.configurators.CommandServiceMdmConnection.finalizeUpgrade(CommandServiceMdmConnection.java:753)
        com.emc.s3g.scaleio.domain.installation.commands.FinalizeUpgradeCommand.executeMdmCommand(FinalizeUpgradeCommand.java:111)
        com.emc.s3g.scaleio.domain.installation.commands.FinalizeUpgradeCommand.executeMdmCommand(FinalizeUpgradeCommand.java:36)
        com.emc.s3g.scaleio.domain.installation.commands.MdmCommand.executeCommand(MdmCommand.java:112)
        com.emc.s3g.scaleio.domain.installation.commands.BaseCommand.call(BaseCommand.java:576)

5. MDM logs show the same error we get when we try to manually finalize the upgrade using SCLI (see below):

    18/06 00:48:24.519598 0x7ff9e1d5cdb8:netCon_UpdateChanNums:05259: :: CONNECTED SERVER con 0x7ff93efe0b30 conId(76e496fe) hCon e1000000df ownerType CLI Updated channel 0x7ff93efe0e60, recv: 0, send: 0
    18/06 00:48:24.519717 0x7ff9de5ebdb8:mosEventLog_PostInternal:00608: New event added. Message: "Command finalize_upgrade received, User: ''. [588]". Additional info: "" Severity: Info
    18/06 00:48:24.519742 0x7ff9de5ebdb8:mosEventLog_PostInternal:00608: New event added. Message: "Command finalize_upgrade was not successful. Error code: The command cannot be executed in the current upgrade state [588]". Additional info: "" Severity: Warning

SCLI:

1. Running finalize upgrade from SCLI fails with the below error:

    Message: "Command finalize_upgrade was not successful. Error code: The command cannot be executed in the current upgrade state."

2. Query_upgrade:

    scli --query_upgrade
    Upgrade State: MDM Upgrade in Progress
    Upgrade Start Version: 3.0.200
    List of Slave MDMs not yet upgraded:
    0 Slave MDMs need to be upgraded in total
    List of Tie-Breakers not yet upgraded:
    0 Tie-Breakers need to be upgraded in total
    List of SDSs not yet upgraded:
    Protection Domain dc035b9b00000000 Name: PD2
    0 SDSs need to be upgraded in this Protection Domain
    Protection Domain dc035b9c00000001 Name: PD1
    0 SDSs need to be upgraded in this Protection Domain
    Protection Domain dc03a9b600000002 Name: PD4
    0 SDSs need to be upgraded in this Protection Domain
    Protection Domain dc03a9b700000003 Name: PD3
    0 SDSs need to be upgraded in this Protection Domain
    Protection Domain dc03a9b800000004 Name: PD5
    0 SDSs need to be upgraded in this Protection Domain
    Protection Domain dc03d0c500000005 Name: PD7

Impact

The upgrade state is not completed, which might affect the system behavior.
The main reason is that, in order to proceed with the finalize step, the code looks for a state that is one of the following, and the current state is none of these:

    UPGRADE_POST_PROCESSING_DONE
    UPGRADE_POST_PROCESSING_IN_PROGRESS
    NONE

Since the last node did not have any of the states mentioned above, the GW/Primary MDM could not proceed with finalizing the NDU. Once we tell the GW/Primary MDM to stop at that point and start fresh (by running scli --abort_upgrade) and the MDM database is repopulated with the correct states, the NDU process has nothing left to do from that point (this can be seen by running scli --query_upgrade).

The state in the MDM trc logs shows the "real" state:

    cat trc.* | egrep 'Upgrade state:|New Upgrade state:'
    18/06 00:47:45.459392 0x7ff9de2dcdb8:mdmObj_MoveUpgradeStateAfterReconstruct:04958: New Upgrade state: MDM_CLUSTER_UPGRADE_IN_PROGRESS
    18/06 00:47:45.459385 0x7ff9de2dcdb8:mdmObj_MoveUpgradeStateAfterReconstruct:04911: Upgrade state: MDM_CLUSTER_UPGRADE_IN_PROGRESS
    18/06 00:47:45.459392 0x7ff9de2dcdb8:mdmObj_MoveUpgradeStateAfterReconstruct:04958: New Upgrade state: MDM_CLUSTER_UPGRADE_IN_PROGRESS
    12/06 00:30:06.080142 0x7ff684605eb0:mdmObj_MoveUpgradeStateAfterReconstruct:04178: Upgrade state: NONE
    12/06 01:43:54.389752 0x7f0194605eb0:mdmObj_MoveUpgradeStateAfterReconstruct:04178: Upgrade state: NONE
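The state check described above can be sketched offline against collected trc text. This is a minimal illustration, not the product's own logic: the function names `upgrade_states` and `blocking_states` are hypothetical, the set of acceptable states is copied from the list in this article, and the regular expression assumes the trc entries look like the excerpts shown here. It reports states in the order they appear in the text and makes no attempt to sort entries by timestamp.

```python
import re

# States this article lists as acceptable for the finalize step.
FINALIZE_OK_STATES = {
    "UPGRADE_POST_PROCESSING_DONE",
    "UPGRADE_POST_PROCESSING_IN_PROGRESS",
    "NONE",
}

# Matches both "Upgrade state: X" and "New Upgrade state: X" entries,
# assuming the trc format shown in the excerpts above.
STATE_RE = re.compile(r"(?:New )?Upgrade state: (\w+)")


def upgrade_states(trc_text):
    """Return every upgrade state found in the trc text, in order of appearance."""
    return STATE_RE.findall(trc_text)


def blocking_states(trc_text):
    """Return the distinct states that are not in the acceptable set."""
    return sorted(set(upgrade_states(trc_text)) - FINALIZE_OK_STATES)


if __name__ == "__main__":
    sample = (
        "18/06 00:47:45.459385 0x7ff9de2dcdb8:mdmObj_MoveUpgradeStateAfterReconstruct:04911: "
        "Upgrade state: MDM_CLUSTER_UPGRADE_IN_PROGRESS\n"
        "12/06 00:30:06.080142 0x7ff684605eb0:mdmObj_MoveUpgradeStateAfterReconstruct:04178: "
        "Upgrade state: NONE\n"
    )
    print(blocking_states(sample))
```

A non-empty result means at least one logged state falls outside the three states listed above, which is consistent with the WRONG_UPGRADE_STATE failure seen in this case.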
Go through the following workarounds one at a time, clicking Retry after each one; move on to the next workaround only if the step fails again.

1) If a Storage Pool has no volumes, add one volume for the upgrade; it can be removed afterwards.

    scli --query_all output:
    Protection Domain dc03f7d500000007 Name: PD_test
    Storage Pool 3206795700000006 Name: SP_test
    <No volumes defined>

2) Kill the MDM process and verify the state after about 10 seconds. If the Upgrade State is "No Upgrade", run the finalize command:

    pkill mdm
    scli --query_upgrade

    Expected output:
    Upgrade State: No Upgrade
    Upgrade Start Version: 3.0.200
    List of Slave MDMs not yet upgraded:
    0 Slave MDMs need to be upgraded in total
    List of Tie-Breakers not yet upgraded:
    0 Tie-Breakers need to be upgraded in total
    List of SDSs not yet upgraded:
    Protection Domain dc035b9b00000000 Name: PD2
    0 SDSs need to be upgraded in this Protection Domain
    Protection Domain dc035b9c00000001 Name: PD1
    0 SDSs need to be upgraded in this Protection Domain
    Protection Domain dc03a9b600000002 Name: PD4
    0 SDSs need to be upgraded in this Protection Domain
    Protection Domain dc03a9b700000003 Name: PD3
    0 SDSs need to be upgraded in this Protection Domain
    Protection Domain dc03a9b800000004 Name: PD5
    0 SDSs need to be upgraded in this Protection Domain
    Protection Domain dc03d0c500000005 Name: PD7

    scli --finalize_upgrade

3) Try to abort the upgrade on the Master by running the following SCLI command:

    scli --abort_upgrade
    This command will abort the upgrade process. Press 'y' and then Enter to confirm: y
    Successfully aborted the upgrade process

   Then try to finalize the upgrade on the Master by running the following SCLI command:

    scli --finalize_upgrade

Impacted Versions

3.0.0.x

Fixed In Version

3.0.x and up
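The check in workaround 2 (confirm that the Upgrade State is "No Upgrade" before running the finalize command) can be sketched as a small parser over saved scli --query_upgrade output. This is only an illustration under stated assumptions: the helper names `parse_query_upgrade` and `ready_to_finalize` are hypothetical, the parsing assumes the output layout shown in this article, and the extra requirement that no components are still pending is a conservative assumption added here, not something the article states.

```python
import re


def parse_query_upgrade(output):
    """Return (upgrade_state, pending_total) from query_upgrade text.

    Assumes the layout shown in this article: an "Upgrade State:" line and
    count lines such as "0 SDSs need to be upgraded in this Protection Domain".
    """
    state = None
    pending = 0
    for raw_line in output.splitlines():
        line = raw_line.strip()
        if line.startswith("Upgrade State:"):
            state = line.split(":", 1)[1].strip()
            continue
        match = re.match(r"(\d+) .+ need to be upgraded", line)
        if match:
            pending += int(match.group(1))
    return state, pending


def ready_to_finalize(output):
    """True when the state is "No Upgrade" and nothing is reported as pending.

    The pending == 0 condition is an added, conservative assumption.
    """
    state, pending = parse_query_upgrade(output)
    return state == "No Upgrade" and pending == 0


if __name__ == "__main__":
    sample = (
        "Upgrade State: MDM Upgrade in Progress\n"
        "Upgrade Start Version: 3.0.200\n"
        "List of Slave MDMs not yet upgraded:\n"
        "0 Slave MDMs need to be upgraded in total\n"
    )
    print(parse_query_upgrade(sample), ready_to_finalize(sample))
```

The sketch only reads text that has already been captured; it does not run scli itself, so it can be tried safely away from the cluster.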