...
What were you trying to do that didn't work?

On an OSP 16.2.3 (RHEL 8.4) compute host, the tenant observes CPU steal from within an instance.

We made sure that isolcpus was specified correctly [1]. We also made sure that each instance's "emulatorpin cpuset" uses CPUs outside of isolcpus [2], and that there was no overlap in vCPU pin assignment between instances (XML not shown here for that).

In nova.conf, vcpu_pin_set [3] matches isolcpus from the cmdline. For some reason the customer has also specified values for cpu_shared_set, which we understand are overridden by vcpu_pin_set since cpu_dedicated_set is not specified.

File /etc/sysconfig/irqbalance shows:
IRQBALANCE_BANNED_CPUS=fffffcff,fffcffff,fcfffffc
which is the bitwise complement of the tuned.non_isolcpus mask in [1], so irqbalance is banned from the isolated CPUs and left with the non-isolcpus set.

Tuned on the compute host is set to cpu-partitioning [4]. When the tuned-adm verify command is run, several IRQs fail the SMP affinity check [5], which seems to suggest that the CPU steal could be caused by something originating in tuned's handling of interrupts.

[1] $ cat ./proc/cmdline
BOOT_IMAGE=(hd0,gpt3)/boot/vmlinuz-4.18.0-305.49.1.el8_4.x86_64 root=UUID=d34238e8-4842-40b3-9634-0d75169ead87 ro console=ttyS0 console=ttyS0,115200n81 no_timer_check crashkernel=auto rhgb quiet default_hugepagesz=1GB hugepagesz=1G hugepages=900 intel_iommu=on iommu=pt transparent_hugepage=never vt.handoff=1 nmi_watchdog=0 numa_balancing=disable intel_idle.max_cstate=0 nosoftlockup rcu_nocbs=2-23,26-47,50-71,74-95 nohz_full=2-23,26-47,50-71,74-95 isolcpus=2-23,26-47,50-71,74-95 tsx=off skew_tick=1 nohz=on nohz_full=2-23,26-47,50-71,74-95 rcu_nocbs=2-23,26-47,50-71,74-95 tuned.non_isolcpus=00000300,00030000,03000003 intel_pstate=disable nosoftlockup

[2] grep "emulatorpin cpuset" ./etc/libvirt/qemu/*
./etc/libvirt/qemu/instance-000001b6.xml: <emulatorpin cpuset='0-1,24-25,48-49,72-73'/>
./etc/libvirt/qemu/instance-000001b9.xml: <emulatorpin cpuset='0-1,24-25,48-49,72-73'/>

[3] from nova.conf:
vcpu_pin_set=2-23,26-47,50-71,74-95
#cpu_dedicated_set=<None>
cpu_shared_set=0-1,24-25,48-49,72-73

[4] $ cat ./etc/tuned/active_profile
cpu-partitioning

[5] $ grep ERROR ./var/log/tuned/tuned.log
... output truncated - many more errors are showing ...
2023-09-20 12:45:08,064 ERROR tuned.plugins.plugin_scheduler: verify: failed: 'SMP affinity of IRQ 0' = '[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95]', expected '[0, 1, 72, 73, 48, 49, 24, 25]'
...
2023-09-20 12:45:08,069 ERROR tuned.plugins.plugin_scheduler: verify: failed: 'SMP affinity of IRQ 322' = '[21]', expected '[0, 1, 72, 73, 48, 49, 24, 25]'
2023-09-20 12:45:08,069 ERROR tuned.plugins.plugin_scheduler: verify: failed: 'SMP affinity of IRQ 323' = '[22]', expected '[0, 1, 72, 73, 48, 49, 24, 25]'
2023-09-20 12:45:08,069 ERROR tuned.plugins.plugin_scheduler: verify: failed: 'SMP affinity of IRQ 324' = '[23]', expected '[0, 1, 72, 73, 48, 49, 24, 25]'
2023-09-20 12:45:08,069 ERROR tuned.plugins.plugin_scheduler: verify: failed: 'SMP affinity of IRQ 325' = '[64]', expected '[0, 1, 72, 73, 48, 49, 24, 25]'
2023-09-20 12:45:08,069 ERROR tuned.plugins.plugin_scheduler: verify: failed: 'SMP affinity of IRQ 326' = '[65]', expected '[0, 1, 72, 73, 48, 49, 24, 25]'
2023-09-20 12:45:08,069 ERROR tuned.plugins.plugin_scheduler: verify: failed: 'SMP affinity of IRQ 327' = '[66]', expected '[0, 1, 72, 73, 48, 49, 24, 25]'
2023-09-20 12:45:08,070 ERROR tuned.plugins.plugin_scheduler: verify: failed: 'SMP affinity of IRQ 328' = '[67]', expected '[0, 1, 72, 73, 48, 49, 24, 25]'
2023-09-20 12:45:08,070 ERROR tuned.plugins.plugin_scheduler: verify: failed: 'SMP affinity of IRQ 329' = '[68]', expected '[0, 1, 72, 73, 48, 49, 24, 25]'
2023-09-20 12:45:08,070 ERROR tuned.plugins.plugin_scheduler: verify: failed: 'SMP affinity of IRQ 330' = '[69]', expected '[0, 1, 72, 73, 48, 49, 24, 25]'
2023-09-20 12:45:08,070 ERROR tuned.plugins.plugin_scheduler: verify: failed: 'SMP affinity of IRQ 331' = '[70]', expected '[0, 1, 72, 73, 48, 49, 24, 25]'
2023-09-20 12:45:08,070 ERROR tuned.plugins.plugin_scheduler: verify: failed: 'SMP affinity of IRQ 332' = '[71]', expected '[0, 1, 72, 73, 48, 49, 24, 25]'

Please provide the package NVR for which bug is seen:

How reproducible:
Customer said this happens more often under high load.

Steps to reproduce
Can be reproduced on customer setup

Expected results
No CPU steal should be observed.

Actual results
CPU steal observed in tenant instance.
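The mask and affinity values quoted in the description can be cross-checked offline. The following is a minimal Python sketch, not part of tuned, irqbalance, or nova; the helper names (parse_cpu_list, parse_cpu_mask, stray_irqs) are hypothetical, and it assumes the hex masks are comma-separated 32-bit words with the most significant word first, as in the kernel's smp_affinity format. It decodes the two masks, compares them against the CPU lists from [1] and [3], and reports which CPUs in a tuned verify error fall outside the expected housekeeping set.

```python
import re


def parse_cpu_list(spec):
    """Expand a kernel-style CPU list such as '2-23,26-47' into a set of ints."""
    cpus = set()
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def parse_cpu_mask(mask):
    """Decode a comma-separated hex mask (assumed: 32-bit words, MSW first)."""
    value = int(mask.replace(",", ""), 16)
    return {bit for bit in range(value.bit_length()) if (value >> bit) & 1}


# Pattern for the tuned "verify: failed" lines shown in [5].
VERIFY_RE = re.compile(
    r"'SMP affinity of IRQ (\d+)' = '\[([\d, ]*)\]', expected '\[([\d, ]*)\]'"
)


def stray_irqs(log_text):
    """Map IRQ number to the sorted CPUs it may run on outside the expected set."""
    result = {}
    for match in VERIFY_RE.finditer(log_text):
        actual = {int(x) for x in match.group(2).split(",") if x.strip()}
        expected = {int(x) for x in match.group(3).split(",") if x.strip()}
        result[int(match.group(1))] = sorted(actual - expected)
    return result


# Values copied from the report.
isolcpus = parse_cpu_list("2-23,26-47,50-71,74-95")
housekeeping = parse_cpu_mask("00000300,00030000,03000003")  # tuned.non_isolcpus
banned = parse_cpu_mask("fffffcff,fffcffff,fcfffffc")  # IRQBALANCE_BANNED_CPUS

sample_log = (
    "verify: failed: 'SMP affinity of IRQ 322' = '[21]', "
    "expected '[0, 1, 72, 73, 48, 49, 24, 25]'\n"
    "verify: failed: 'SMP affinity of IRQ 325' = '[64]', "
    "expected '[0, 1, 72, 73, 48, 49, 24, 25]'\n"
)

if __name__ == "__main__":
    print("housekeeping CPUs:", sorted(housekeeping))
    print("banned mask equals isolcpus:", banned == isolcpus)
    for irq, cpus in stray_irqs(sample_log).items():
        on_isolated = [c for c in cpus if c in isolcpus]
        print(f"IRQ {irq}: CPUs outside expected set {cpus}, isolated {on_isolated}")
```

Run against the full tuned.log, stray_irqs would list every IRQ whose affinity still includes an isolated CPU, which is the set worth comparing against /proc/interrupts counters on the pinned vCPU cores.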
Not a Bug