The RecoverPoint system pauses data transfer due to high load events.

Example event:

Time: Thu Mar 13 15:59:49 2023
Topic: RPA
Scope: NORMAL
Level: ERROR
Event ID: 12009
Cluster: CL01
Global links: None
Groups: [CG01, Prod]
Links: [CG01, Prod->Replica_at_CL02]
Summary: Link entered high load
More information: Due to heavy I/O activity a link entered a high load state in order to prevent I/O failures on that link. The following are among the possible causes of the high load:
- RPA is unable to handle the large volume of incoming data. (RPA performance statistics are presented in the Release Notes that accompany each RecoverPoint product release.)
- Journal reaches capacity, because the rate of the distribution process consistently lags behind the rate of incoming data to the copy journal.
- WAN is too slow to handle the data rate.
- Compression for WAN optimization is too high, such that the RPA is unable to handle the volume of incoming data.
Peak I/O activity in the SAN causes a temporary bottleneck in the environment. This can be considered normal behavior, and does not necessarily require user action.
User action: If high load persists, consider running the balance_load command and applying the load balancing recommendation, or manually modifying the preferred RPA of each group according to the recommendation. Check for scheduled activities in your environment. If relevant, consider enabling fast first-time initialization. (For details, see the RecoverPoint Administrator's Guide).
Service Request info: N/A
High loads are normal in RecoverPoint asynchronous replication, because RecoverPoint allows production applications to write faster than it can replicate.

A high load means that the memory buffer on the local RecoverPoint Appliance (RPA) is full, so the system temporarily pauses data transfer. The buffer fills when the incoming I/O rate exceeds the rate at which the RPA can push data to the target. This may be caused by a bottleneck downstream of the source side: the WAN connection, the remote storage, or the CPU cycles available on the RecoverPoint appliance.

While data transfer is paused, the system switches to marking mode, and all new writes are registered in the Dirty Region Log (DRL). The DRL is a bitmap marking "dirty blocks." After the I/O rate returns to a level that RecoverPoint can support, the system resynchronizes the "dirty blocks" that were registered in the DRL.

When implementing the RecoverPoint solution, ensure that the environmental resources can support the average I/O.
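The buffer, marking mode, and resynchronization behavior described above can be pictured with a small toy model. This is only an illustrative sketch and not RecoverPoint's actual implementation: the class ToyLink, its fields buffer_capacity and drain_per_tick, and the use of a Python set in place of the DRL bitmap are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLink:
    """Toy model of one replication link (hypothetical, for illustration only)."""
    buffer_capacity: int   # assumed: max blocks the RPA memory buffer can hold
    drain_per_tick: int    # assumed: blocks the link can send per time step
    buffered: list = field(default_factory=list)
    dirty: set = field(default_factory=set)  # stands in for the DRL bitmap
    high_load: bool = False

    def write(self, block: int) -> None:
        """Register one production write to the given block number."""
        if not self.high_load and len(self.buffered) >= self.buffer_capacity:
            # Buffer is full: enter high load and switch to marking mode.
            self.high_load = True
        if self.high_load:
            # Marking mode: only remember which block changed, not the data.
            self.dirty.add(block)
        else:
            self.buffered.append(block)

    def tick(self) -> list:
        """Send up to drain_per_tick blocks and return the ones sent."""
        sent = self.buffered[:self.drain_per_tick]
        del self.buffered[:self.drain_per_tick]
        if self.high_load:
            # Resynchronize: move dirty blocks back into the freed buffer space.
            room = self.buffer_capacity - len(self.buffered)
            resync = sorted(self.dirty)[:room]
            self.buffered.extend(resync)
            self.dirty.difference_update(resync)
            if not self.dirty:
                # All marked blocks are queued again, so leave high load.
                self.high_load = False
        return sent
```

In this sketch, writing more blocks than buffer_capacity between ticks flips high_load to True, and repeated writes to the same block are recorded only once, which is the property that makes a bitmap-style log cheap to maintain during a burst.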
Resolution: High loads are normal in RecoverPoint asynchronous replication and cannot be prevented completely. Improving RecoverPoint replication performance may help to reduce the number of high load events or the time it takes to recover from them.

Check variables that affect replication, such as:
- WAN bandwidth, packet loss, and latency
- Incoming writes for a RecoverPoint Appliance (RPA)
- CPU utilization (replication and compression utilization)
- Storage congestion or errors reading from or writing to disks

Consider adding resources for virtual RecoverPoint Appliances:
- RecoverPoint Appliances
- CPU
- Memory (must be either 16 GB, 32 GB, or 64 GB, and must be fully reserved)
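As a rough first check of the WAN bandwidth item above, the average incoming write rate can be compared with the usable link capacity. The sketch below is a back-of-the-envelope calculation under stated assumptions, not a Dell sizing tool: the function names, the compression_ratio parameter, and the choice to ignore protocol overhead, packet loss, and latency are all hypothetical simplifications.

```python
def required_wan_mbps(avg_write_mb_per_s: float, compression_ratio: float = 1.0) -> float:
    """Return the WAN rate in Mbit/s needed to carry the average write rate.

    avg_write_mb_per_s: average incoming writes in megabytes per second.
    compression_ratio: assumed data reduction (2.0 means data shrinks to half).
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    # 1 megabyte = 8 megabits; compression reduces what must cross the WAN.
    return avg_write_mb_per_s * 8 / compression_ratio


def wan_can_sustain(avg_write_mb_per_s: float, wan_mbps: float,
                    compression_ratio: float = 1.0) -> bool:
    """True if the WAN rate covers the average write rate (overhead ignored)."""
    return wan_mbps >= required_wan_mbps(avg_write_mb_per_s, compression_ratio)
```

For example, an average of 50 MB/s of writes with an assumed 2:1 compression ratio needs about 200 Mbit/s, so a 150 Mbit/s link would be expected to fall behind even before bursts are considered.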