...
This issue is believed to affect any version of NFS. The performance of VMs on NFS datastores decreases within a couple of days after a host reboot. IOmeter tests show increasing latencies and a significant drop in IOPS after a week, which worsens after two to three weeks. A host reboot fixes the issue temporarily.

The ESXi host may also crash with a PSOD similar to the following:

2019-10-23T02:00:01.778Z cpu30:65681)@BlueScreen: Spin count exceeded - possible deadlock with PCPU 5
2019-10-23T02:00:01.778Z cpu30:65681)Code start: 0x418017a00000 VMK uptime: 50:01:19:48.273
2019-10-23T02:00:01.778Z cpu30:65681)Saved backtrace from: pcpu 5 SpinLock spin out NMI
2019-10-23T02:00:01.778Z cpu30:65681)0x43928bd9b3e0:[0x4180189c6da1]NFSSchedQFree@(nfsclient)#+0x81 stack: 0x4309a6be3188
2019-10-23T02:00:01.778Z cpu30:65681)0x43928bd9b400:[0x4180189c7fac]NFSSched_FreeMPE@(nfsclient)#+0x7c stack: 0x43928bd9b4a0
2019-10-23T02:00:01.778Z cpu30:65681)0x43928bd9b430:[0x4180189c375c]NFSVolumeFreeMPE@(nfsclient)#+0x24 stack: 0x430400000000
2019-10-23T02:00:01.778Z cpu30:65681)0x43928bd9b440:[0x4180189c494b]NFSVolumeUmountInt@(nfsclient)#+0x1e7 stack: 0x4309a6bd7000
2019-10-23T02:00:01.778Z cpu30:65681)0x43928bd9b570:[0x4180189c56b0]NFSVolume_Umount@(nfsclient)#+0xcc stack: 0x48
2019-10-23T02:00:01.778Z cpu30:65681)0x43928bd9b5a0:[0x4180189c1c76]NFSVsi_UmountSet@(nfsclient)#+0xba stack: 0x1
2019-10-23T02:00:01.778Z cpu30:65681)0x43928bd9b5d0:[0x418017a02034]VSI_SetInfo@vmkernel#nover+0x408 stack: 0x43928bd9b6b0
2019-10-23T02:00:01.778Z cpu30:65681)0x43928bd9b650:[0x41801811cf64]UWVMKSyscallUnpackVSI_Set@(user)#+0x308 stack: 0x0
2019-10-23T02:00:01.778Z cpu30:65681)0x43928bd9bef0:[0x4180181103e0]User_UWVMKSyscallHandler@(user)#+0xa4 stack: 0x0
2019-10-23T02:00:01.778Z cpu30:65681)0x43928bd9bf20:[0x418017b0fd05]User_UWVMKSyscallHandler@vmkernel#nover+0x1d stack: 0x0
2019-10-23T02:00:01.778Z cpu30:65681)0x43928bd9bf30:[0x418017b3d067]gate_entry_@vmkernel#nover+0x0 stack: 0x0
2019-10-23T02:00:01.778Z cpu30:65681)base fs=0x0 gs=0x418047800000 Kgs=0x0
2019-10-23T02:00:01.778Z cpu5:47364475)NMI: 663: NMI IPI: We Halt. RIPOFF(base):RBP:CS [0xfc6da1(0x418017a00000):0x1c19b75:0x4010] (Src 0x4, CPU5)
2019-10-23T02:00:01.778Z cpu30:65681)NMI: 689: NMI IPI: RIPOFF(base):RBP:CS [0xf2b13(0x418017a00000):0xeb2d5bd1:0x4010] (Src 0x1, CPU30)

Before the PSOD, you will see messages similar to:

2019-10-23T02:00:01.778Z cpu15:34361800)WARNING: NFS: 5603: Failed to convert sgArr to NFSIoInfo: Out of memory

The ESXi host may also crash with a PSOD while collecting logs from the vSphere Client, with a backtrace similar to:

2019-10-23T02:00:01.778Z cpu11:28559871)0x451aac69b550:[0x418028f775e8]NFSVolume_GetMPEWorldList@(nfsclient)#<None>+0x8d stack: 0x43186847f630
2019-10-23T02:00:01.778Z cpu11:28559871)0x451aac69b5b0:[0x418028f72a5a]NFSVsi_MPEWorldList@(nfsclient)#<None>+0x2f stack: 0x0
2019-10-23T02:00:01.778Z cpu11:28559871)0x451aac69b5d0:[0x4180278027be]VSI_GetListInfo@vmkernel#nover+0x2c3 stack: 0x451aac69b690
2019-10-23T02:00:01.778Z cpu11:28559871)0x451aac69b650:[0x418027fb6935]UW64VMKSyscallUnpackVSI_GetList@(user)#<None>+0x21e stack: 0x451aac69b690
2019-10-23T02:00:01.778Z cpu11:28559871)0x451aac69bed0:[0x418027f46660]User_UWVMK64SyscallHandler@(user)#<None>+0x249 stack: 0x0
2019-10-23T02:00:01.778Z cpu11:28559871)0x451aac69bf30:[0x418027961f78]SyscallUWVMK64@vmkernel#nover+0x90 stack: 0x0
2019-10-23T02:00:01.778Z cpu11:28559871)0x451aac69bf38:[0x418027963066]gate_entry@vmkernel#nover+0x67 stack: 0x0
2019-10-23T02:00:02.778Z cpu36:2098074)WARNING: Heartbeat: 760: PCPU 52 didn't have a heartbeat for 7 seconds; *may* be locked up.
2019-10-23T02:00:02.778Z cpu52:28559919)ALERT: NMI: 696: NMI IPI: RIPOFF(base):RBP:CS [0x1118eb(0x418027800000):0x451ac841b470:0xfc8] (Src 0x1, CPU52)
2019-10-23T02:00:02.778Z cpu52:28559919)0x451ac841b420:[0x4180279118ea]MCSWaitForUnblock@vmkernel#nover+0xdb stack: 0x1ae00000000
2019-10-23T02:00:02.778Z cpu52:28559919)0x451ac841b470:[0x418027912009]MCS_LockWait@vmkernel#nover+0xae stack: 0x0
2019-10-23T02:00:02.778Z cpu52:28559919)0x451ac841b540:[0x41802791252a]MCSLockWithFlagsWork@vmkernel#nover+0x23 stack: 0x0
2019-10-23T02:00:02.778Z cpu52:28559919)0x451ac841b550:[0x418028f775b4]NFSVolume_GetMPEWorldList@(nfsclient)#<None>+0x59 stack: 0x451ac841b560
2019-10-23T02:00:02.778Z cpu52:28559919)0x451ac841b5b0:[0x418028f72a5a]NFSVsi_MPEWorldList@(nfsclient)#<None>+0x2f stack: 0x0
2019-10-23T02:00:02.778Z cpu52:28559919)0x451ac841b5d0:[0x4180278027be]VSI_GetListInfo@vmkernel#nover+0x2c3 stack: 0x451ac841b690
2019-10-23T02:00:02.778Z cpu52:28559919)0x451ac841b650:[0x418027fb6935]UW64VMKSyscallUnpackVSI_GetList@(user)#<None>+0x21e stack: 0x451ac841b690
2019-10-23T02:00:02.778Z cpu52:28559919)0x451ac841bed0:[0x418027f46660]User_UWVMK64SyscallHandler@(user)#<None>+0x249 stack: 0x0
2019-10-23T02:00:02.778Z cpu52:28559919)0x451ac841bf30:[0x418027961f78]SyscallUWVMK64@vmkernel#nover+0x90 stack: 0x0
2019-10-23T02:00:02.778Z cpu52:28559919)0x451ac841bf38:[0x418027963066]gate_entry@vmkernel#nover+0x67 stack: 0x0
2019-10-23T02:00:04.329Z cpu6:2103178)WARNING: Lock: 1701: (held by 38: Spin count exceeded 1 time(s) - possible deadlock.
2019-10-23T02:00:04.329Z cpu6:2103178)0x451ac621af18:[0x41802782e3e5]Lock_CheckSpinCount@vmkernel#nover+0x261 stack: 0x43086b5924

The NFS heap and NFS slabs run out of memory, with log messages similar to:

2018-12-21T09:36:27.671Z cpu37:67667)WARNING: NFSLock: 311: out of memory
2018-12-21T09:36:27.671Z cpu37:67667)NFSLock: 1779: Failed to create lock info for lock file .lck-ca1b000000000000 : Out of memory
2018-12-21T09:36:27.703Z cpu37:67667)WARNING: NFS: 2189: Unable to allocate fileData
2018-12-21T09:36:27.703Z cpu37:67667)NFS: 2333: Failed to get object (0x43914299b386) 40 28f0e377 a2c9ae 7ed2 23243005 20 d87e0000 26e3af68 >272c752 d0ee3700000040 0 0 0 :Out of memory
2018-12-21T09:36:27.717Z cpu37:67667)WARNING: NFS: 2079: no memory for result

Note: The preceding log excerpts are only examples. Dates, times, and environment-specific values may vary depending on your environment.
This issue is resolved in VMware ESXi 6.5 U3 and VMware ESXi 6.7 U3, available at VMware Downloads.

To work around this issue:

1. Increase the following advanced host parameter to 32 or 64 (the default is 4). The value should be larger than the maximum number of shares mounted from the same IP address. Example:

esxcfg-advcfg -s 32 /SunRPC/MaxConnPerIP

2. Reboot the host after changing the value.

Note: There is no negative impact from implementing the above workaround. The workaround is applicable only if the total number of mounted shares is less than or equal to 128.
SunRPC.MaxConnPerIP defines the maximum number of unique TCP connections that can be opened to a given IP address. VMware sets this advanced setting to 4 by default, although a maximum of 128 TCP connections is supported.
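As a sketch of the workaround above, the following commands can be run from the ESXi shell to inspect and raise SunRPC.MaxConnPerIP. The specific value of 32 is an example; choose a value larger than the number of shares mounted from any single NFS server IP in your environment.

```shell
# Check the current value of the advanced setting (default is 4).
esxcfg-advcfg -g /SunRPC/MaxConnPerIP

# List mounted NFS datastores to count how many shares come from
# the same NFS server IP address.
esxcli storage nfs list

# Raise the limit to 32 (esxcli equivalent:
#   esxcli system settings advanced set -o /SunRPC/MaxConnPerIP -i 32)
esxcfg-advcfg -s 32 /SunRPC/MaxConnPerIP

# A reboot of the host is required for the change to take effect.
```

These commands apply host configuration and only run in the ESXi shell, so they cannot be verified outside a host; confirm the new value with the `-g` query after the reboot.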