
OPERATIONAL DEFECT DATABASE
The following error appears in the Avamar UI, or a Dial Home SR may be generated containing the following error:

    Sep 12 09:41:19 avamar-node MR_MONITOR[8156]: <MRMON195> Controller ID: 0 BBU disabled; changing WB logical drives to WT, Forced WB VDs are not affected
The RAID controller firmware has detected a problem with the Battery Backup Unit (BBU). As a result, the RAID controller's write cache has been disabled to avoid data corruption in the event of a sudden power loss. This significantly affects the performance of the I/O subsystem.

The BBU protects and maintains the cached data held on the server's RAID controller. It is essentially a data fail-safe: it allows the RAID card to retain what has not yet been synced to disk. The BBU can provide enough backup power to preserve the data for up to 72 hours without power. When the machine powers back up, the cache contents are written to disk.

The BBU is composed of a Lithium-Ion (Li-Ion) battery and electronic control circuitry. A distinctive property of a Li-Ion battery is that its life span depends on aging (shelf life). From the time of manufacture, regardless of whether it was charged and regardless of the number of charge or discharge cycles, the battery declines slowly and predictably in capacity. An older battery therefore does not last as long as a new one, solely because of its age. This is the main reason why the "Relative State of Charge" of the BBU does not equal the "Absolute State of Charge": by design, batteries are consumable goods and degrade over time.

Before a BBU can be used, it has to be calibrated. The controller does not use the BBU until the calibration is done. Until then, it disables the Write-Back cache on every logical drive for data integrity reasons, which temporarily reduces performance. Write-Through mode slows disk I/O under load, and this is most likely to be noticed as maintenance activities such as Garbage Collection or hfscheck taking longer. The controller reports this condition during Power-On Self-Test (POST) with an error message.
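The gap between the two state-of-charge figures mentioned above can be illustrated with a little arithmetic. The sketch below assumes the usual smart-battery definitions, which this article does not spell out: relative state of charge is the remaining capacity as a percentage of the current full-charge capacity, and absolute state of charge is the remaining capacity as a percentage of the original design capacity. The capacity values and the `soc_percent` helper are hypothetical and are not taken from any real BBU.

```shell
#!/bin/sh
# Hypothetical capacities in mAh (illustration only, not real BBU data).
design_capacity=1500        # capacity when the battery was new
full_charge_capacity=1200   # capacity an aged battery can still reach
remaining_capacity=1200     # battery is currently fully charged

# Assumed definition: integer percentage of a reference capacity.
soc_percent() {
    echo $(( $1 * 100 / $2 ))
}

relative=$(soc_percent "$remaining_capacity" "$full_charge_capacity")
absolute=$(soc_percent "$remaining_capacity" "$design_capacity")

echo "Relative State of Charge: ${relative}%"   # prints 100%
echo "Absolute State of Charge: ${absolute}%"   # prints 80%
```

Under these assumptions a fully charged but aged battery reports 100% relative and only 80% absolute, which is the kind of divergence the paragraph above describes.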
The calibration (Autolearn mode) is a process whereby the controller records the battery discharge curve in order to know the battery autonomy, in addition to the maximum and minimum voltages. It is split into three steps:

1. Begin calibration. The controller charges the BBU to maximum capacity.
2. The controller discharges the BBU.
3. The controller recharges the BBU. When maximum capacity is reached, the process is finished.

Note: If either Step 2 or Step 3 is interrupted, the learning process stops and does not restart automatically.
1. Log in to the Avamar Utility Node as admin, switch to root, and load the SSH keys. See "Avamar: How to Log in to an Avamar Server and Load Various Keys" for instructions on loading keys.

   Note: The rest of this article assumes that the admin keys are loaded.

   a. Using the information from the UI event or the Dial Home Service Request, determine the node that produced the error message.
   b. Connect to the node as root (where 0.n is the node number):

      ssn --user=root 0.n

2. Verify the errors in the log:

      grep -h "MRMON151\|MRMON195\|MRMON153\|MRMON194" /var/log/messages | sort

   Sample output:

      Oct 28 13:50:41 MR_MONITOR[29442]: Controller ID: 0 Battery relearn started
      Oct 28 13:51:49 MR_MONITOR[29442]: Controller ID: 0 Battery relearn completed
      Oct 28 13:52:51 MR_MONITOR[29442]: Controller ID: 0 BBU enabled; changing WT logical drives to WB
      Oct 28 14:23:11 MR_MONITOR[29442]: Controller ID: 0 BBU disabled; changing WB logical drives to WT, Forced WB VDs are not affected
      ...
      Oct 28 14:23:11 MR_MONITOR[29442]: Controller ID: 0 Policy change on VD: 0 Current = Current Write Policy: Write Back;

3. Check the status of the write policy for each of the virtual disks. See APPENDIX A for sample output:

      CmdTool2 -ldpdinfo -a0 -nolog | grep 'Cache Policy:'

4. Attempt to initiate the learning cycle:

      CmdTool2 -AdpBbuCmd -BbuLearn -a0 -nolog

   Sample output:

      Adapter 0: BBU Learn Succeeded.
      Exit Code: 0x00

   If the command returns an error, create a Service Request with Dell Technologies Avamar Support.

5. Review the /var/log/messages file to check whether the learning was successful and the cache was switched back to WriteBack:

      grep -h -i "changing\|battery\|learn\|charge" /var/log/messages* | sort

   Expected output:

      Controller ID: 0 Battery relearn started
      Controller ID: 0 Battery relearn in progress
      Controller ID: 0 Battery relearn completed
      Controller ID: 0 BBU enabled; changing WT logical drives to WB
      Controller ID: 0 BBU disabled; changing WB logical drives to WT, Forced WB VDs are not affected
      Controller ID: 0 BBU enabled; changing WT logical drives to WB
      Controller ID: 0 BBU disabled; changing WB logical drives to WT, Forced WB VDs are not affected
      Controller ID: 0 BBU enabled; changing WT logical drives to WB
      Controller ID: 0 Battery relearn started

   If the learning cycle does not complete within about 30 minutes, or if it completes but the cache has not been changed back to WriteBack mode, create a Service Request with Dell Technologies Avamar Support.

6. Monitor for further MRMON195 messages for several days. If the messages persist, create a Service Request with Dell Technologies Avamar Support.
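When the log contains several enable and disable transitions, as in the sample output above, it can be hard to see at a glance which state the controller ended in. The following is a minimal sketch, not a supported tool: `bbu_cache_state` is a hypothetical helper name, it matches only the two message texts shown in this article, and it assumes the lines arrive on standard input in chronological order.

```shell
#!/bin/sh
# Hypothetical helper: reads MR_MONITOR log lines on stdin and prints the
# write policy implied by the most recent BBU transition (WB, WT or unknown).
bbu_cache_state() {
    awk '
        /BBU enabled; changing WT logical drives to WB/  { state = "WB" }
        /BBU disabled; changing WB logical drives to WT/ { state = "WT" }
        END { print (state == "" ? "unknown" : state) }
    '
}

# Demonstration using two lines from the sample output in this article.
bbu_cache_state <<'EOF'
Oct 28 13:52:51 MR_MONITOR[29442]: Controller ID: 0 BBU enabled; changing WT logical drives to WB
Oct 28 14:23:11 MR_MONITOR[29442]: Controller ID: 0 BBU disabled; changing WB logical drives to WT, Forced WB VDs are not affected
EOF
# prints: WT
```

On a node, the function could be fed directly from the log, for example `bbu_cache_state < /var/log/messages`. A result of WT after a completed relearn corresponds to the case in step 5 where the cache has not returned to WriteBack mode.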