...
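OBJSTEPUPD-4011 CRITICAL alerts are generated when an upgrade fails during the infrastructure upgrade step or during any infrastructure node step. OBJSTEPUPD-4011 NORMAL alerts are generated when the objectscale-lcm infrastructure upgrade starts or completes successfully.

The Alert and Log tabs in the ObjectScale UI display the OBJSTEPUPD-4011 alerts listed in the table below.

Symptom ID | Component | Resource ID | Severity | Message
OBJSTEPUPD-4011 | objectscale-lcm | Infrastructure-Upgrade | NORMAL | Infrastructure upgrade start
OBJSTEPUPD-4011 | objectscale-lcm | Infrastructure-Upgrade | NORMAL | Infrastructure upgrade complete
OBJSTEPUPD-4011 | objectscale-lcm | Infrastructure-Upgrade | CRITICAL | Failed to read manifest:
OBJSTEPUPD-4011 | objectscale-lcm | Infrastructure-Upgrade | CRITICAL | Ipre upgrade failed:
OBJSTEPUPD-4011 | objectscale-lcm | Infrastructure-Upgrade | CRITICAL | Lcmupdate failed:
OBJSTEPUPD-4011 | objectscale-lcm | Infrastructure-Upgrade | CRITICAL | CMUpdate failed.
OBJSTEPUPD-4011 | objectscale-lcm | Infrastructure-Upgrade | CRITICAL | Node steps failed:
OBJSTEPUPD-4011 | objectscale-lcm | Infrastructure-Upgrade | CRITICAL | node upgrade:
OBJSTEPUPD-4011 | objectscale-lcm | Infrastructure-Upgrade | CRITICAL | Failed lcmupdate: failed phase :
OBJSTEPUPD-4011 | objectscale-lcm | Infrastructure-Upgrade | CRITICAL | Failed lcmupdate: Failed during node upgrade: :
OBJSTEPUPD-4011 | objectscale-lcm | Infrastructure-Upgrade | CRITICAL | Post check failed:
OBJSTEPUPD-401 | objectscale-lcm | Node/ | NORMAL | Infrastructure upgrade on start
OBJSTEPUPD-401 | objectscale-lcm | Node/ | CRITICAL | Upgrade node failed to retrieve cmo/LCMUpdate
OBJSTEPUPD-401 | objectscale-lcm | Node/ | CRITICAL | Failed node upgrade on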
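The CRITICAL/NORMAL distinction can be applied as a simple filter when triaging a list of alerts. The sketch below is a minimal illustration, assuming a hypothetical `Alert` record whose fields mirror the table columns; it is not an ObjectScale API type, and how alerts are exported from the UI is not covered here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Alert:
    # Hypothetical record mirroring the table columns; not an ObjectScale API type.
    symptom_id: str
    component: str
    resource_id: str
    severity: str
    message: str


def lcm_upgrade_failures(alerts: Iterable[Alert]) -> List[Alert]:
    """Keep only CRITICAL OBJSTEPUPD-4011 alerts raised by objectscale-lcm.

    NORMAL alerts only mark the start or completion of the upgrade,
    so they are dropped; CRITICAL alerts indicate a failed step.
    """
    return [
        a for a in alerts
        if a.symptom_id == "OBJSTEPUPD-4011"
        and a.component == "objectscale-lcm"
        and a.severity == "CRITICAL"
    ]
```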
There are three classifications, or high-level root causes, associated with CRITICAL alerts:

1. Errors during bundle or upgrade processing as part of CMO's LCMUpdate Custom Resource (CR) processing. This is unlikely, because the infrastructure upgrade runs after earlier upgrades and manifest processing have completed. However, bundle packaging can cause errors in services or pods associated with CMO (docker registry, helm registry, http-share), and a defect in CMO's software stack or in objectscale-lcm's logic for monitoring LCMUpdate CRs can also cause this failure.

2. Failure of the Upgrade CR to manage its child LCMUpdate CR instances. The manifest version upgrade is performed by creating a child instance of the cmo/LCMUpdate CR and then monitoring it. The upgrade is marked FAILED if any of the following occurs:
   - The LCMUpdate CR cannot be created.
   - The Upgrade CR's Status block references the created LCMUpdate CR, but the CR cannot be retrieved or found in the system.
   - The Upgrade CR's Status block does not reference the created LCMUpdate CR, and the search used to locate it by Kubernetes labels returns more than one LCMUpdate CR.
   Note: Exactly one LCMUpdate CR is expected to exist for this step of this upgrade attempt. Finding more than one indicates that something is wrong in the cluster, or that the current attempt is getting crossed with a prior upgrade attempt. Continuing the upgrade without first investigating the existing LCMUpdate CRs in the system is too risky.

3. Infrastructure failure during the upgrade. This covers the underlying components of the infrastructure upgrade, as implemented by CMO after the LCMUpdate CR is submitted. Scanning, pre-check and post-check jobs, and network instability can all cause the infrastructure upgrade to fail.
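The child-CR tracking rules in the second classification can be sketched as a small reconcile function. This is a hypothetical illustration, not objectscale-lcm's actual code: `LCMUpdate`, `UpgradeStatus`, `ensure_child`, the in-memory `cluster` dictionary standing in for the Kubernetes API, and the label `selector` are all assumptions. In particular, creating a new CR when no reference exists and no labels match is assumed behavior; the source only states that the upgrade fails when creation fails.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


class UpgradeFailed(Exception):
    """Raised when the Upgrade CR cannot safely track its child LCMUpdate CR."""


@dataclass
class LCMUpdate:
    # Hypothetical stand-in for a cmo/LCMUpdate custom resource.
    name: str
    labels: Dict[str, str] = field(default_factory=dict)


@dataclass
class UpgradeStatus:
    # Hypothetical stand-in for the Upgrade CR's Status block.
    lcm_update_ref: Optional[str] = None


def ensure_child(
    status: UpgradeStatus,
    cluster: Dict[str, LCMUpdate],
    selector: Dict[str, str],
    create: Callable[[Dict[str, str]], LCMUpdate],
) -> LCMUpdate:
    """Return the single LCMUpdate CR for this upgrade step, or raise UpgradeFailed."""
    # The Status block references a child CR, so that CR must still exist.
    if status.lcm_update_ref is not None:
        cr = cluster.get(status.lcm_update_ref)
        if cr is None:
            raise UpgradeFailed(f"referenced LCMUpdate {status.lcm_update_ref!r} not found")
        return cr

    # No reference: search by labels; more than one match is unsafe.
    matches = [
        cr for cr in cluster.values()
        if all(cr.labels.get(k) == v for k, v in selector.items())
    ]
    if len(matches) > 1:
        raise UpgradeFailed(f"{len(matches)} LCMUpdate CRs match {selector}; expected one")

    if matches:
        cr = matches[0]
    else:
        # Assumed behavior: nothing found, so create the child CR.
        try:
            cr = create(selector)
        except Exception as exc:
            raise UpgradeFailed(f"LCMUpdate CR cannot be created: {exc}") from exc

    # Record the reference so later passes take the first branch.
    status.lcm_update_ref = cr.name
    return cr
```

Raising on multiple label matches, rather than picking one, follows the rationale in the note: with several candidate CRs there is no safe way to tell which one belongs to the current upgrade attempt.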
Contact Dell Support for assistance in resolving OBJSTEPUPD-4011 CRITICAL alert failures.