BugZero | Hewlett Packard Enterprise BugID a00114843en_us - Advisory: (Revision) HPE Serviceguard for Linux

Hewlett Packard Enterprise - Defect ID: a00114843en_us

Advisory: (Revision) HPE Serviceguard for Linux - Upgrades May Take Ten Minutes to Complete and if the Live Application Detach Feature Is Used, Packages May Fail When Re-Attached After an Upgrade Is Completed

Last updated on 11/16/2022

Vendor details

Priority: Customer Advisory

Info

Document Version   Release Date        Details
2                  November 14, 2022   Updated the Resolution with the permanent fix, HPE Serviceguard for Linux version 12.80 (or later).
1                  May 21, 2021        Original Document Release.

When upgrading HPE Serviceguard for Linux, delays of up to ten minutes may occur during RPM script execution. Below is an example using rpm directly.

Note: The output may vary depending on the tool used to perform the upgrade (yum, zypper, rpm, or other).

    [root@node1~]# rpm -Uvh /tmp/serviceguard-license-A.12.60.00-0.rhel7.x86_64.rpm
    ...
    <delay occurs at the "Creating sg.slice..." step below>
    NOTE: Serviceguard binary config file is in SG 11.08 or later format. No conversion is required.
    Restarting cmproxyd.
    Creating sg.slice ...
    Job for cmcluster.init.service failed because the control process exited with error code.
    See "systemctl status cmcluster.init.service" and "journalctl -xe" for details.
    ...

The upgrade does eventually complete if allowed to run.

If a Serviceguard cluster or node is halted with "cmhaltcl -d" or "cmhaltnode -d" and the systemd service cmcluster.init.service is in a FAILED state (as shown by "systemctl status cmcluster.init.service") during a Serviceguard core RPM upgrade, the node may attempt a cmrunnode because the RPM script issues "systemctl start cmcluster.init.service". As a result, "rpm -Uvh <serviceguard rpm>" may appear to be unresponsive for up to AUTO_START_TIMEOUT seconds (default 600) while waiting for the cmrunnode to time out. This occurs only if all three of the following are true: the service is in the "failed" state, AUTOSTART_CMCLD=1 is set in the $SGCONF/cmcluster.rc file, and the cluster is halted.

A second symptom is that configured Serviceguard package services (as opposed to the systemd service discussed in this advisory) may fail later, during the package re-attach phase, when the node rejoins the cluster. This second symptom can occur regardless of whether AUTOSTART_CMCLD is set to 0 or 1.
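The three conditions for the long delay can be combined into a single check. The following is a minimal sketch and is not part of the HPE advisory: the function name upgrade_delay_risk and its three arguments are hypothetical, and the caller is assumed to have already determined the service state, the AUTOSTART_CMCLD value, and whether the cluster is halted.

```shell
# Hypothetical helper (not from the advisory): reports whether the
# long-delay symptom could occur during an upgrade.
#   $1 = state of cmcluster.init.service (for example "active" or "failed")
#   $2 = value of AUTOSTART_CMCLD in $SGCONF/cmcluster.rc ("0" or "1")
#   $3 = "yes" if the cluster is halted, anything else otherwise
upgrade_delay_risk() {
    if [ "$1" = "failed" ] && [ "$2" = "1" ] && [ "$3" = "yes" ]; then
        echo "at-risk"
    else
        echo "ok"
    fi
}
```

For example, "upgrade_delay_risk failed 1 yes" prints "at-risk". This sketch covers only the delay symptom; the package re-attach failure can occur regardless of the AUTOSTART_CMCLD value, so an "ok" result does not rule it out.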

Scope

Any HPE system on which HPE Serviceguard for Linux is being upgraded to version 12.70.00 (or earlier).
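The scope boundary can be expressed as a version comparison against 12.80, the release that carries the fix. This is a minimal sketch under stated assumptions: the function name is_affected_version is hypothetical, the version string is assumed to look like "12.70.00" or "A.12.60.00" (the form seen in the RPM file name), and GNU sort with the -V option is assumed to be available.

```shell
# Hypothetical helper (not from the advisory): succeeds (exit 0) when the
# given target version is older than 12.80, the release that carries the fix.
# Assumes GNU sort with -V (version sort).
is_affected_version() {
    version="${1#A.}"    # drop the "A." prefix used in RPM file names
    fixed="12.80"
    [ "$version" = "$fixed" ] && return 1
    # The lower of the two versions sorts first.
    lowest=$(printf '%s\n%s\n' "$version" "$fixed" | sort -V | head -n 1)
    [ "$lowest" = "$version" ]
}
```

A possible use is "is_affected_version A.12.60.00 && echo 'apply the workaround first'".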

Resolution

To prevent this issue, upgrade to HPE Serviceguard for Linux version 12.80 (or later). To download HPE Serviceguard for Linux version 12.80 (or later), perform the following steps:

1. Click the following link: Hewlett Packard Enterprise Support Center
2. Enter a product name (e.g., "HPE Serviceguard for Linux") in the text search field and wait for a list of Suggested Products to display.
3. From the Suggested Products list displayed, identify the desired product and select it. The page should refresh to display the "DRIVERS AND SOFTWARE" tab and the components that support the selected product.
4. From the "DRIVERS AND SOFTWARE" expandable filter menus at the top of the page, locate and select the appropriate HPE Serviceguard for Linux edition (Base, Advanced, or Enterprise) and version (12.80 or later).
   Note: To ensure that you have selected the latest version of the firmware/driver, click the Revision History tab to check if a new version of the firmware/driver is available. For more important information, review the Release Notes tab.
5. Click the Download button.

If upgrading to HPE Serviceguard for Linux version 12.80 (or later) is not an option, use the workaround below; however, HPE recommends upgrading to HPE Serviceguard for Linux version 12.80 (or later).

Perform both of the following steps to avoid both symptoms of this issue. These steps should be performed BEFORE the node or cluster is halted prior to performing the upgrade.

Step 1 - Before performing the upgrade, open the $SGCONF/cmcluster.rc file in an editor and set AUTOSTART_CMCLD=0 if it is currently set to AUTOSTART_CMCLD=1. After the upgrade is complete, the value may be set back to "1." This step must be performed individually on each node of the cluster that will be updated.

Step 2 - Check the current status of cmcluster.init.service using the "systemctl status cmcluster.init.service" command and verify the state is "active." Below is an example of a good state.
Note: Some details may differ from what is shown; however, verify that line 3 displays "Active: active."

    cmcluster.init.service - Serviceguard cluster startup script
       Loaded: loaded (/usr/lib/systemd/system/cmcluster.init.service; enabled; vendor preset: disabled)
       Active: active (exited) since Thu 2021-05-06 15:40:48 EDT; 5s ago
      Process: 2678 ExecStart=/opt/cmcluster/conf/cmcluster_service start (code=exited, status=0/SUCCESS)
     Main PID: 2678 (code=exited, status=0/SUCCESS)
        Tasks: 0 (limit: 512)
       CGroup: /system.slice/cmcluster.init.service

    May 06 15:40:47 node1 systemd[1]: Starting Serviceguard cluster startup script...
    May 06 15:40:48 node1 cmcluster_service[2678]: AUTOSTART_CMCLD not set to 1 in /opt/cmcluster/conf/cmcluster.rc, exiting
    May 06 15:40:48 node1 systemd[1]: Started Serviceguard cluster startup script.

If the state is something other than active (for example, "failed" or "activating"), do not proceed with the upgrade until the issue is fixed. Below are two examples of bad states, again from "systemctl status cmcluster.init.service" command output:

    cmcluster.init.service - Serviceguard cluster startup script
       Loaded: loaded (/usr/lib/systemd/system/cmcluster.init.service; enabled; vendor preset: disabled)
       Active: activating (start) since Thu 2021-05-06 12:07:44 EDT; 55s ago
     Main PID: 2557 (cmcluster_servi)
        Tasks: 3 (limit: 512)
       CGroup: /system.slice/cmcluster.init.service
               ├─2557 /bin/sh /opt/cmcluster/conf/cmcluster_service start
               ├─2677 /bin/sh /opt/cmcluster/conf/cmcluster_service start
               └─2678 /opt/cmcluster/bin/cmrunnode -v

    May 06 12:07:44 node1 systemd[1]: Starting Serviceguard cluster startup script...
    May 06 12:07:44 node1 cmcluster_service[2557]: tcp 0 0 127.0.0.1:35170 127.0.0.1:5302 TIME_WAIT
    May 06 12:07:44 node1 cmcluster_service[2557]: tcp 0 0 127.0.0.1:35168 127.0.0.1:5302 ESTABLISHED
    May 06 12:07:44 node1 cmcluster_service[2557]: tcp 0 5540 127.0.0.1:5302 127.0.0.1:35168 ESTABLISHED
    May 06 12:07:44 node1 cmcluster_service[2557]: tcp 0 0 :::5302 :::* LISTEN
    May 06 12:07:44 node1 cmcluster_service[2557]: udp 0 0 :::5302 :::*
    May 06 12:07:44 node1 cmrunnode[2678]: /opt/cmcluster/bin/cmrunnode -v

OR

    cmcluster.init.service - Serviceguard cluster startup script
       Loaded: loaded (/usr/lib/systemd/system/cmcluster.init.service; enabled; vendor preset: disabled)
       Active: failed (Result: exit-code) since Thu 2021-05-06 12:17:54 EDT; 3min 39s ago
      Process: 2557 ExecStart=/opt/cmcluster/conf/cmcluster_service start (code=exited, status=1/FAILURE)
     Main PID: 2557 (code=exited, status=1/FAILURE)

    May 06 12:17:54 node1 cmcluster_service[2557]: Waiting for cluster to form ....................................................... .................................................... timed out
    May 06 12:17:54 node1 cmcluster_service[2557]: No packages are re-attached.
    May 06 12:17:54 node1 cmcluster_service[2557]: Check the syslog files for information.
    May 06 12:17:54 node1 cmcluster_service[2557]: cmrunnode failed: timed out waiting for cluster to form
    May 06 12:17:54 node1 cmcluster_service[2557]: ERROR: Ran out of time while attempting to join the cluster
    May 06 12:17:54 node1 cmcluster_service[2557]: ERROR: Unable to join cluster
    May 06 12:17:54 node1 systemd[1]: cmcluster.init.service: Main process exited, code=exited, status=1/FAILURE
    May 06 12:17:54 node1 systemd[1]: Failed to start Serviceguard cluster startup script.
    May 06 12:17:54 node1 systemd[1]: cmcluster.init.service: Unit entered failed state.
    May 06 12:17:54 node1 systemd[1]: cmcluster.init.service: Failed with result 'exit-code'.
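The state check in Step 2 can also be scripted rather than read by eye. The following is a minimal sketch, not an HPE-provided tool: the function names service_active_state and ready_for_upgrade are hypothetical, and the input is assumed to be "systemctl status" output in the format of the examples above.

```shell
# Hypothetical helper (not from the advisory): reads "systemctl status"
# output on stdin and prints the word that follows "Active:".
service_active_state() {
    awk '$1 == "Active:" { print $2; exit }'
}

# Succeeds only for the "active" state; "failed", "activating", and
# anything else are treated as not ready for the upgrade.
ready_for_upgrade() {
    [ "$(service_active_state)" = "active" ]
}
```

A possible use is "systemctl status cmcluster.init.service | ready_for_upgrade || echo 'Do not proceed with the upgrade'". The command "systemctl is-active cmcluster.init.service" prints the same state word directly and may be simpler where the full status text is not needed.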
Note: "Active: activating" should be a transient state, observed only when a node is attempting to join a cluster while the cluster is halted. In that case the node will attempt to rejoin for up to AUTO_START_TIMEOUT seconds (default 600). The much more common state is "Active: failed".

If cmcluster.init.service is in a failed state but the cluster is up and healthy as shown by the cmviewcl command, simply start the service. It should detect the active cluster and set the service state to active, as shown below:

    # cmviewcl

    CLUSTER        STATUS
    sg_cluster     up

      SITE_NAME    SiteA
      NODE         STATUS       STATE
      node1        up           running

        PACKAGE    STATUS       STATE        AUTO_RUN     NODE
        hdbpUH1    up           running      enabled      node1

      SITE_NAME    SiteB
      NODE         STATUS       STATE
      node2        up           running

        PACKAGE    STATUS       STATE        AUTO_RUN     NODE
        hdbsUH1    up           running      enabled      node2

    # systemctl start cmcluster.init.service
    # systemctl status cmcluster.init.service
    cmcluster.init.service - Serviceguard cluster startup script
       Loaded: loaded (/usr/lib/systemd/system/cmcluster.init.service; enabled; vendor preset: disabled)
       Active: active (exited) since Thu 2021-05-06 12:26:29 EDT; 33s ago
      Process: 17535 ExecStart=/opt/cmcluster/conf/cmcluster_service start (code=exited, status=0/SUCCESS)
     Main PID: 17535 (code=exited, status=0/SUCCESS)

    May 06 12:26:29 node1 systemd[1]: Starting Serviceguard cluster startup script...
    May 06 12:26:29 node1 cmcluster_service[17535]: cmcld already running, using pid: 7064
    May 06 12:26:29 node1 systemd[1]: Started Serviceguard cluster startup script.

Once the systemctl service status displays "Active: active" and AUTOSTART_CMCLD=0 is set in the $SGCONF/cmcluster.rc file, the upgrade should proceed quickly. If the Live Application Detach feature is being used for the upgrade, no packages should fail when the node is re-attached afterward.

If cmcluster.init.service cannot be brought to the "active" state, do not proceed with the upgrade and contact HPE support.
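Step 1 of the workaround can be sketched as a small script instead of a manual edit. This is an illustration only, not an HPE-provided procedure: the function names disable_cmcld_autostart and show_cmcld_autostart are hypothetical, the setting is assumed to appear on its own line as AUTOSTART_CMCLD=1, and sed with in-place editing (-i with a backup suffix) is assumed to be available.

```shell
# Hypothetical helper (not from the advisory): sets AUTOSTART_CMCLD=0 in the
# given cmcluster.rc file, keeping a backup copy with a ".bak" suffix.
# Assumes the setting appears as a line of the form AUTOSTART_CMCLD=1.
disable_cmcld_autostart() {
    rc_file="$1"
    [ -f "$rc_file" ] || { echo "not found: $rc_file" >&2; return 1; }
    sed -i.bak 's/^\([[:space:]]*AUTOSTART_CMCLD\)=1/\1=0/' "$rc_file"
}

# Prints the current AUTOSTART_CMCLD value (0 or 1) found in the given file.
show_cmcld_autostart() {
    sed -n 's/^[[:space:]]*AUTOSTART_CMCLD=\([01]\).*/\1/p' "$1"
}
```

A possible use, run on each node before it is halted, is: disable_cmcld_autostart "$SGCONF/cmcluster.rc" followed by show_cmcld_autostart "$SGCONF/cmcluster.rc" to confirm the value is 0. After the upgrade is complete, the value may be set back to 1, for example by restoring the ".bak" copy if nothing else in the file has changed.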
