...
A VMware ESX host fails with a purple diagnostic screen. The screen shows this message:

    COS Panic: Lost heartbeat @esxsc_panic+0x43/0x4f

The vmkernel log extracted from the zdump contains messages similar to:

    cpu0:4096)VMNIX: WARNING: VmkDev: 3102: Can't find serial# nnnnnnnnn6 from virt handle 2000
    cpu0:4096)VMNIX: WARNING: VmkDev: 3102: Can't find serial# nnnnnnnnn7 from virt handle 2000
    cpu0:4096)VMNIX: WARNING: VmkDev: 3102: Can't find serial# nnnnnnnnn8 from virt handle 2000
    ...
    cpu0:4096)ALERT: VMNIX: ALERT: HB: 362: Lost heartbeat (comm=ProcessName pid=x t=29 to=30 clt=1).
    cpu0:4096)VMNIX: VmkDev: 2747: a/r=2 cmd=0xnn sn=nnnnnnnnn9 dsk=vsaN:N:N reqbuf=nnnnnnn (sg=n)

The service console logs in /var/log/messages or on the console contain messages similar to:

    <3>[uptime] sd n:n:n:n: still retrying nnnnnnnnn1 after 360s
    <3>[uptime] sd n:n:n:n: still retrying nnnnnnnnn2 after 360s
    <3>[uptime] sd n:n:n:n: still retrying nnnnnnnnn3 after 360s
    <3>[uptime] sd n:n:n:n: still retrying nnnnnnnnn4 after 360s
    <3>[uptime] sd n:n:n:n: still retrying nnnnnnnnn5 after 360s

The SCSI command serial numbers logged are greater than 4294967295.
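To confirm that logged serial numbers are above 4294967295, the retry messages can be filtered by value. The following is a minimal sketch, not a VMware-supplied tool. It assumes the service console messages follow the "still retrying <serial> after" pattern shown above, and the function name find_wrapped_serials is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper (not part of ESX): print every SCSI command serial
# number from "still retrying <serial> after ..." messages whose value is
# greater than 4294967295. Reads the named files, or stdin if none are given.
find_wrapped_serials() {
    awk '
        /still retrying [0-9]+ after/ {
            # Locate the word "retrying"; the serial number is the next field.
            for (i = 1; i < NF; i++) {
                if ($i == "retrying" && $(i + 1) + 0 > 4294967295) {
                    print $(i + 1)
                }
            }
        }
    ' "$@"
}
```

For example, running find_wrapped_serials /var/log/messages on the service console lists only the serial numbers that are past the limit. No output means that no matching retry message carries a serial number above 4294967295.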
This issue occurs when the SCSI command serial numbers exceed 4294967295 (2^32 - 1, the maximum 32-bit unsigned value) and completed SCSI commands do not return values to the service console.
This issue is resolved in ESX 4.1 Update 3. For more information, see:

VMware ESX 4.1 Update 3 Release Notes
VMware ESX 4.1, Patch ESX410-201208201-UG: Updates the VMware ESX 4.1 Core and CIM components (2020337)

To download ESX 4.1 Update 3, see the VMware Download Center.

Note: This issue does not affect ESXi, because ESXi does not have a console operating system.

To work around this issue when you do not want to upgrade:

Monitor the value of the current SCSI command serial number counter, and ensure that it does not reach the maximum value, 4294967295. To check how close the counter is to that value, use this script:

    # Find the device that holds the root file system, without its partition number
    bootDev=$(df | grep " /$" | awk '{print $1}' | sed -e 's!/dev/!!' -e 's/[0-9]*$//')
    # Completed reads (field 4) and completed writes (field 8) for that device
    devRead=$(grep " ${bootDev} " /proc/diskstats | awk '{print $4}')
    devWrite=$(grep " ${bootDev} " /proc/diskstats | awk '{print $8}')
    (( devIO = devRead + devWrite ))
    # Thousandths of a percent of 4294967295
    (( microFull = devIO / 42950 ))
    # Zero-pad to at least four digits, then insert the decimal point
    percentFull=$(echo ${microFull} | awk '{printf "%04d",$1}' | sed 's/\(...\)$/.\1/')
    echo "Percent Full: ${percentFull}%"

The output is the percentage of the maximum value that the counter has reached. Example:

    Percent Full: 69.848%

This indicates that the counter is at approximately 2999948756 (69.848% of 4294967295).

Schedule a reboot of the host before the counter fills completely.
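To make the monitoring step repeatable, the same arithmetic can be wrapped in a threshold check. The following is a minimal sketch under stated assumptions, not a VMware-supplied tool. The function names percent_full and check_counter and the default 90 percent threshold are hypothetical. The divisor 42950 is taken from the script above. The sketch formats the result with bash arithmetic and printf instead of awk and sed.

```shell
#!/bin/bash
# Hypothetical helper: convert an I/O count into a percentage of 4294967295,
# using the same divisor (42950) as the workaround script.
percent_full() {
    local count=$1
    # Integer result in thousandths of a percent.
    local micro=$(( count / 42950 ))
    # Split into whole percent and three decimal places.
    printf '%d.%03d\n' $(( micro / 1000 )) $(( micro % 1000 ))
}

# Hypothetical helper: warn and return 1 when the count is at or above the
# given percentage threshold (assumed default: 90), otherwise return 0.
check_counter() {
    local count=$1
    local limit_pct=${2:-90}
    local micro=$(( count / 42950 ))
    if (( micro >= limit_pct * 1000 )); then
        echo "WARNING: counter at $(percent_full "$count")% of maximum; schedule a reboot"
        return 1
    fi
    echo "OK: counter at $(percent_full "$count")% of maximum"
    return 0
}
```

For example, check_counter "${devIO}" 90 could be run after the devIO value is computed as in the script above. A nonzero exit status makes the check easy to call from a scheduled job that sends a notification.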
Interpreting an ESX/ESXi host purple diagnostic screen
Understanding a "Lost Heartbeat" purple diagnostic screen
VMware ESX 4.1, Patch ESX410-201208201-UG: Updates the VMware ESX 4.1 Core and CIM components
ESX host stops with a lost heartbeat purple diagnostic screen that shows "VmkDev: 3102: Can't find serial#" (Japanese version)
ESX host fails with a "Lost Heartbeat" purple diagnostic screen that lists the message "VmkDev: 3102: Can't find serial#" (Chinese version)