...
What were you trying to do that didn't work?

Brief description:
Nested VMs scenario: a RHEL 9.2 host, a RHEL 9.2 L1 VM on it, and 10 CirrOS L2 VMs inside the L1 VM. The 10 L2 VMs are set to autostart when the L1 VM starts.
If we restart the L1 VM, then with ~90% probability one of the 10 L2 VMs ends up paused. The following complaints appear in /var/log/libvirt/qemu/VM_NAME.log (on the L1 level):

ERROR cluster 597 refcount=0 reference=1
ERROR cluster 601 refcount=0 reference=1
Rebuilding refcount structure
Repairing cluster 600 refcount=1 reference=0
Repairing cluster 602 refcount=1 reference=0
2023-10-23T10:25:42.465618Z qemu-kvm: warning: Machine type 'pc-i440fx-rhel7.6.0' is deprecated: machine types for previous major releases are deprecated
KVM: entry failed, hardware error 0x80000021

If you're running a guest on an Intel machine without unrestricted mode
support, the failure can be most likely due to the guest entering an invalid
state for Intel VT. For example, the guest maybe running in big real mode
which is not supported on less recent Intel processors.
EAX=febc0001 EBX=00000030 ECX=febc0001 EDX=00000cfc
ESI=00000000 EDI=00000000 EBP=1efeb3f0 ESP=00006d8c
EIP=000ec1fc EFL=00000086 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 00000000 00008000 DPL=0 Reserved
CS =0000 00000000 00000000 00c09b00 DPL=0 CS32 [-RA]
SS =0000 00000000 00000000 00c09300 DPL=0 DS [-WA]
DS =0000 00000000 00000000 00008000 DPL=0 Reserved
FS =0000 00000000 00000000 00008000 DPL=0 Reserved
GS =0000 00000000 00000000 00008000 DPL=0 Reserved
LDT=0000 00000000 00000000 00008000 DPL=0 Reserved
TR =0000 00000000 00000000 00008000 DPL=0 Reserved
GDT=     00000000 00000000
IDT=     00000000 00000000
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=d8 0d 00 00 00 80 ba f8 0c 00 00 ef ba fc 0c 00 00 89 c8 ef <5b> 5e c3 56 53 89 d3 8b 15 f8 54 0f 00 85 d2 0f b7 c0 74 0c 01 da c1 e0 0c 01 c2 66 89 0a

Please provide the package NVR for which bug is seen:
kernel-5.14.0-284.30.1.el9_2.x86_64

How reproducible:
100% if you try several times; about 90% on the very first boot.

Steps to reproduce:
1. In the L1 VM, run:
   # echo b > /proc/sysrq-trigger
2. Wait until the L1 VM restarts and the L2 VMs are started.
3. Run "virsh list" in the L1 VM and look for a VM in the "paused" state.

Expected results:
All L2 VMs are running.

Actual results:
1 of the 10 L2 VMs is paused.
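Step 3 of the reproduction can be scripted so that each reboot attempt gives a quick pass/fail. The sketch below is a hypothetical helper and not part of the original report. `paused_domains` parses the text printed by `virsh list`, and `report_paused` runs that command through Python's `subprocess`. It assumes the default three-column layout (Id, Name, State) seen in the outputs in this report.

```python
import subprocess


def paused_domains(virsh_list_output: str) -> list[str]:
    """Return the names of domains whose State column reads 'paused'.

    Assumes the default `virsh list` table: a header row ("Id Name State"),
    a dashed separator line, then one row per domain.
    """
    paused = []
    for line in virsh_list_output.splitlines():
        fields = line.split()
        # Skip blank lines, the dashed separator (a single field) and the header.
        if len(fields) < 3 or fields[0] == "Id":
            continue
        # Columns are Id, Name, State; State may be two words (e.g. "shut off").
        name, state = fields[1], " ".join(fields[2:])
        if state == "paused":
            paused.append(name)
    return paused


def report_paused() -> list[str]:
    """Run `virsh list` (on the L1 VM) and return the paused domains."""
    result = subprocess.run(
        ["virsh", "list"], capture_output=True, text=True, check=True
    )
    return paused_domains(result.stdout)
```

Applied to the L1 `virsh list` output quoted in the detailed description, `paused_domains` returns `['test1']`. Calling `report_paused()` on the L1 VM once autostart has finished after each sysrq reboot records whether that attempt hit the problem.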
Detailed description:

Configuration:
host (L0): RHEL 9.2, one L1 VM is running
L1 VM: RHEL 9.2, 10 L2 VMs are running
L2 VMs: Guest OS: cirros-0.4.0-x86_64
$ uname -a
Linux cirros 4.4.0-28-generic #47-Ubuntu SMP Fri Jun 24 10:09:13 UTC 2016 x86_64 GNU/Linux

L0 (host):
[root@rhel9test ~]# cat /etc/redhat-release
Red Hat Enterprise Linux release 9.2 (Plow)
[root@rhel9test ~]# uname -a
Linux rhel9test.aci 5.14.0-284.30.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Aug 25 09:13:12 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux
CPU: model name : Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz (20 threads)
[root@rhel9test ~]# virsh list
 Id   Name       State
--------------------------
 1    nestedrh   running

L1:
VM config: CPUs: 4, MEM: 32 GB
[root@localhost ~]# cat /etc/redhat-release
Red Hat Enterprise Linux release 9.2 (Plow)
[root@localhost ~]# uname -a
Linux localhost.localdomain 5.14.0-284.30.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Aug 25 09:13:12 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost ~]# virsh list
 Id   Name     State
------------------------
 1    test1    paused
 2    test9    running
 3    test4    running
 4    test3    running
 5    test8    running
 6    test5    running
 7    test6    running
 8    test2    running
 9    test7    running
 10   test10   running

L2:
VM config: CPUs: 2, MEM: 512 MB
Guest OS: cirros-0.4.0-x86_64
$ cat /etc/os-release
NAME=Buildroot
VERSION=2015.05-g31af4e3-dirty
ID=buildroot
VERSION_ID=2015.05
PRETTY_NAME="Buildroot 2015.05"
$ uname -a
Linux cirros 4.4.0-28-generic #47-Ubuntu SMP Fri Jun 24 10:09:13 UTC 2016 x86_64 GNU/Linux

How do I reproduce it (easy!):
[L1 VM]# echo b > /proc/sysrq-trigger

After the L1 VM restarts, there is about a 90% chance that one of the L2 VMs is in the "paused" state. The logs then show the following complaints:

[root@localhost ~]# virsh list
 Id   Name    State
------------------------
 1    test3   paused
...

L1 dmesg:
[ 5.509169] virbr0: port 1(vnet0) entered listening state
[ 5.902969] set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.
L1 journalctl:
Oct 23 13:25:42 localhost.localdomain systemd[1]: Started Virtual Machine qemu-1-test3.
Oct 23 13:25:42 localhost.localdomain virtqemud[1810]: 2023-10-23 10:25:42.442+0000: 1810: info : libvirt version: 9.0.0, package: 10.3.el9_2 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2023-08-24-06:08:50, )
Oct 23 13:25:42 localhost.localdomain virtqemud[1810]: 2023-10-23 10:25:42.442+0000: 1810: info : hostname: localhost.localdomain
Oct 23 13:25:42 localhost.localdomain virtqemud[1810]: 2023-10-23 10:25:42.442+0000: 1810: warning : virSecurityValidateTimestamp:205 : Invalid XATTR timestamp detected on /var/lib/libvirt/images/test3.qcow2 secdriver=dac
Oct 23 13:25:42 localhost.localdomain kernel: set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.
Oct 23 13:25:43 localhost.localdomain virtqemud[1855]: 2023-10-23 10:25:43.329+0000: 1855: info : libvirt version: 9.0.0, package: 10.3.el9_2 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2023-08-24-06:08:50, )

L1 /var/log/libvirt/qemu/test3.log:
char device redirected to /dev/pts/0 (label charserial0)
ERROR cluster 597 refcount=0 reference=1
ERROR cluster 601 refcount=0 reference=1
Rebuilding refcount structure
Repairing cluster 600 refcount=1 reference=0
Repairing cluster 602 refcount=1 reference=0
2023-10-23T10:25:42.465618Z qemu-kvm: warning: Machine type 'pc-i440fx-rhel7.6.0' is deprecated: machine types for previous major releases are deprecated
KVM: entry failed, hardware error 0x80000021

If you're running a guest on an Intel machine without unrestricted mode
support, the failure can be most likely due to the guest entering an invalid
state for Intel VT. For example, the guest maybe running in big real mode
which is not supported on less recent Intel processors.
EAX=febc0001 EBX=00000030 ECX=febc0001 EDX=00000cfc
ESI=00000000 EDI=00000000 EBP=1efeb3f0 ESP=00006d8c
EIP=000ec1fc EFL=00000086 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 00000000 00008000 DPL=0 Reserved
CS =0000 00000000 00000000 00c09b00 DPL=0 CS32 [-RA]
SS =0000 00000000 00000000 00c09300 DPL=0 DS [-WA]
DS =0000 00000000 00000000 00008000 DPL=0 Reserved
FS =0000 00000000 00000000 00008000 DPL=0 Reserved
GS =0000 00000000 00000000 00008000 DPL=0 Reserved
LDT=0000 00000000 00000000 00008000 DPL=0 Reserved
TR =0000 00000000 00000000 00008000 DPL=0 Reserved
GDT=     00000000 00000000
IDT=     00000000 00000000
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=d8 0d 00 00 00 80 ba f8 0c 00 00 ef ba fc 0c 00 00 89 c8 ef <5b> 5e c3 56 53 89 d3 8b 15 f8 54 0f 00 85 d2 0f b7 c0 74 0c 01 da c1 e0 0c 01 c2 66 89 0a
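Two values in this log can be decoded by hand: the "hardware error" number and CR0. The sketch below assumes, as in KVM's VMX code, that the number after "hardware error" is the raw VMX exit-reason field. In that field, per the Intel SDM (Vol. 3), bits 15:0 hold the basic exit reason and bit 31 is set when VM entry itself failed. CR0 flags are decoded using their architectural bit positions. The helper and table names are hypothetical, and the exit-reason table is deliberately partial.

```python
# Hypothetical helpers for reading the KVM/QEMU error output above.

# VMX exit-reason field: bits 15:0 = basic exit reason,
# bit 31 = set when the exit was caused by a failed VM entry.
VM_ENTRY_FAILURE_BIT = 1 << 31
BASIC_REASON_MASK = 0xFFFF

# Partial table: basic exit reasons that report a VM-entry failure.
ENTRY_FAILURE_REASONS = {
    33: "VM-entry failure due to invalid guest state",
    34: "VM-entry failure due to MSR loading",
    41: "VM-entry failure due to machine-check event",
}

# Architectural CR0 bit positions.
CR0_BITS = {
    0: "PE", 1: "MP", 2: "EM", 3: "TS", 4: "ET", 5: "NE",
    16: "WP", 18: "AM", 29: "NW", 30: "CD", 31: "PG",
}


def decode_vmx_exit_reason(value: int) -> tuple[bool, int, str]:
    """Split a raw exit-reason value into (entry_failed, basic_reason, description)."""
    entry_failed = bool(value & VM_ENTRY_FAILURE_BIT)
    basic = value & BASIC_REASON_MASK
    return entry_failed, basic, ENTRY_FAILURE_REASONS.get(basic, "not in this table")


def decode_cr0(value: int) -> list[str]:
    """Return the names of the CR0 flags that are set, lowest bit first."""
    return [name for bit, name in sorted(CR0_BITS.items()) if value & (1 << bit)]


# Values copied from the log above.
print(decode_vmx_exit_reason(0x80000021))
print(decode_cr0(0x00000011))
```

For this report, `0x80000021` decodes to a failed VM entry with basic reason 33 (invalid guest state). `CR0=00000011` decodes to PE and ET, meaning protected mode is on and paging is off. That matches the `CS32` code segment in the register dump.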
Done