...
RecoverPoint for Virtual Machines (RP4VMs) requires constant access to both its Repository LUNs and the corresponding journal LUNs that are created for each Consistency Group (CG). The vRPAs access these volumes through the JIRAF module (JAM) running on the ESXi host they reside on. An issue can occur where intermittent access to the Repository and Journal LUNs causes Data Replication Unavailability (DRU) and leaves virtual machines unprotected during this time.

The following sequence is seen in both the Control and Replication logs of the site control virtual RecoverPoint Appliance (vRPA) when access to these LUNs is impacted: I/Os to the JIRAF module (running on each ESXi host that the vRPAs reside on) time out on the vRPA side, followed by the vRPA itself timing out while trying to read responses from the JIRAF module.

From the control logs:

2019/08/18 05:34:51.318 - #1 - 3960/3909 - SocketInfoJIRAF::isFDReady: poll timeout errno = 0 a_expireTimeUsecs = 852155604014 ( m_lr=(0xXXXXXXXXXX,0xXXXXXXXXXe_JIRAF) m_handle=0 m_openCount=1 m_status=e_OK m_cidPort = 2:5050 m_afVMCI = 40 m_sockFD = 149)

2019/08/18 05:36:53.933 - #1 - 3953/3909 - SocketInfoJIRAF::isFDReady: poll timeout errno = 0 a_expireTimeUsecs = 852278255722 ( m_lr=(0xXXXXXXXXXX,0xXXXXXXXXXe_JIRAF) m_handle=0 m_openCount=1 m_status=e_OK m_cidPort = 2:5050 m_afVMCI = 40 m_sockFD = 149)

Partial messages are sent to the JIRAF module on the ESXi host, and the vRPA fails when trying to send more data to this module.

From the control logs:

2019/08/15 01:02:43.410 - #1 - 4773/4734 - SocketInfoJIRAF::sendData: send byte count mismatch( m_lr=(0xXXXXXXXXXX,0xXXXXXXXXXe_JIRAF) m_handle=0 m_openCount=6 m_status=e_OK m_cidPort = 2:5050 m_afVMCI = 40 m_sockFD = 60) bytes_sent = 261883 a_num_bytes = 1048576

2019/08/15 04:29:42.574 - #1 - 9455/9402 - SocketInfoJIRAF::sendData: send byte count mismatch( m_lr=(0xXXXXXXXXXX,0xXXXXXXXXXe_JIRAF) m_handle=0 m_openCount=6 m_status=e_OK m_cidPort = 2:5050 m_afVMCI = 40 m_sockFD = 64) bytes_sent = 19759 a_num_bytes = 225792

The following is seen in the JIRAF logs, located on each ESXi host under /scratch/log/iofilterd-emcjiraf.log, showing the JIRAF module reading partial messages:

2019-08-03T02:03:59Z iofilterd-emcjiraf[2308573]: jiraf_receive_msg: unknown cmd type

The following is also seen in the ESXi splitter logs, located under /scratch/log/kdriver.log.xxxxxxxx:

2019-07-30T17:36:32Z iofilterd-emcjiraf[2099635]: IoStats_s_printStats: total 2 IOs over 90 seconds. average time to start 6us, pending 555us, processing 18us

This "IOs over X seconds" message should be printed every 60 seconds. If the interval reported here is not 60 seconds, the issue is being encountered.
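As an illustration only, and not a Dell-provided tool, a small script along the following lines could scan a copy of the splitter log for IoStats_s_printStats entries and flag any reporting interval other than the expected 60 seconds. The script name, and the regular expression derived from the sample message above, are assumptions.

    import re
    import sys

    # Matches the interval in lines such as:
    # "IoStats_s_printStats: total 2 IOs over 90 seconds. average time to start 6us, ..."
    PATTERN = re.compile(r"IoStats_s_printStats: total \d+ IOs over (\d+) seconds")

    def check_log(path):
        """Return the log lines whose stats interval is not 60 seconds."""
        suspect = []
        with open(path, "r", errors="replace") as log:
            for line in log:
                match = PATTERN.search(line)
                if match and int(match.group(1)) != 60:
                    suspect.append(line.rstrip())
        return suspect

    if __name__ == "__main__":
        # Usage (hypothetical): python check_jiraf_interval.py kdriver.log.00000000
        hits = check_log(sys.argv[1])
        if hits:
            print("Stats interval deviates from 60 seconds; the issue may be present:")
            for entry in hits:
                print("  " + entry)
        else:
            print("All IoStats_s_printStats entries report the expected 60-second interval.")

Running this against each /scratch/log/kdriver.log.xxxxxxxx file copied off the ESXi hosts gives a quick way to confirm whether the symptom described above is occurring.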
The JAM (emcjiraf) module is responsible for maintaining repository and journal access. As part of its normal operation, it performs an RPVS discovery process to keep track of which datastores, storage, and VMDKs are available to it. Network operations run in tandem with this ongoing discovery process. If a discovery pass takes too long to complete, the network operations may not run frequently enough, resulting in the loss of access to the repository volume, the journal volumes, or both. A conceptual sketch of this starvation pattern follows.
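The sketch below is purely illustrative and is not the actual JAM implementation; all names and durations are hypothetical. It only shows how a long-running discovery pass that shares a loop with periodic network work can delay that work past its deadline.

    import time

    DISCOVERY_DURATION = 90   # hypothetical: seconds a slow RPVS discovery pass takes
    KEEPALIVE_INTERVAL = 60   # hypothetical: how often the network operations must run

    def run_discovery(duration):
        # Stands in for a discovery pass that blocks until it finishes.
        time.sleep(duration)

    def run_network_operations():
        print("network operations ran at", time.strftime("%H:%M:%S"))

    def main_loop(cycles=3):
        for _ in range(cycles):
            run_discovery(DISCOVERY_DURATION)
            # Because discovery blocks for longer than KEEPALIVE_INTERVAL,
            # the network operations run late each cycle; in the real module
            # this kind of delay is what can cost access to the repository
            # and journal volumes.
            run_network_operations()

    if __name__ == "__main__":
        main_loop()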
This issue is addressed in RecoverPoint for Virtual Machines version 5.2.2.1 and later.