Description of problem:
The domain with a vhost-user interface + iommu throws a "Call Trace" when running the netperf tests.

Version-Release number of selected component (if applicable):
qemu-kvm-8.0.0-4.el9.x86_64
5.14.0-323.el9.x86_64
dpdk-22.11-3.el9_2.x86_64
openvswitch3.1-3.1.0-28.el9fdp.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Set up the host kernel options: CPU isolation, hugepages, IOMMU.
# grubby --args="iommu=pt intel_iommu=on default_hugepagesz=1G" --update-kernel=`grubby --default-kernel`
# echo "isolated_cores=2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,31,29,27,25,23,21,19,17,15,13,11" >> /etc/tuned/cpu-partitioning-variables.conf
# tuned-adm profile cpu-partitioning
# reboot
2. Start OVS-DPDK on the host.
# echo 20 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
# echo 20 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
# modprobe vfio
# modprobe vfio-pci
# dpdk-devbind.py --bind=vfio-pci 0000:5e:00.0
# dpdk-devbind.py --bind=vfio-pci 0000:5e:00.1
...
# ovs-vsctl get Open_vSwitch . other_config
{dpdk-init="true", dpdk-lcore-mask="0x2", dpdk-socket-mem="1024,1024", pmd-cpu-mask="0x15554", vhost-iommu-support="true"}
# ovs-vsctl show
1e271d29-308d-4201-be11-d898617cc592
    Bridge ovsbr0
        datapath_type: netdev
        Port ovsbr0
            Interface ovsbr0
                type: internal
        Port dpdk0
            Interface dpdk0
                type: dpdk
                options: {dpdk-devargs="0000:5e:00.0", n_rxq="2", n_txq="2"}
        Port vhost-user0
            Interface vhost-user0
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhostuser0.sock"}
    Bridge ovsbr1
        datapath_type: netdev
        Port ovsbr1
            Interface ovsbr1
                type: internal
        Port dpdk1
            Interface dpdk1
                type: dpdk
                options: {dpdk-devargs="0000:5e:00.1", n_rxq="2", n_txq="2"}
        Port vhost-user1
            Interface vhost-user1
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhostuser1.sock"}
3.
Start an NFV virt domain with an iommu device and vhost-user interfaces.
<interface type='vhostuser'>
  <mac address='18:66:da:5f:dd:22'/>
  <source type='unix' path='/tmp/vhostuser0.sock' mode='server'/>
  <target dev='vhost-user0'/>
  <model type='virtio'/>
  <driver name='vhost' queues='2' rx_queue_size='1024' iommu='on' ats='on'/>
  <alias name='net1'/>
</interface>
<iommu model='intel'>
  <driver intremap='on' caching_mode='on' iotlb='on'/>
</iommu>
4. Set up the kernel options in the domain.
# grubby --args="iommu=pt intel_iommu=on default_hugepagesz=1G" --update-kernel=`grubby --default-kernel`
# echo "isolated_cores=1,2,3,4,5" >> /etc/tuned/cpu-partitioning-variables.conf
# tuned-adm profile cpu-partitioning
# reboot
5. Run the netperf tests between the domain client and the host server.
(5.1) The host is the netperf server:
# ip addr add 192.168.1.3/24 dev ens3f1
# netserver
Starting netserver with host 'IN(6)ADDR_ANY' port '12865' and family AF_UNSPEC
(5.2) The domain is the netperf client:
# ip addr add 192.168.1.2/24 dev enp6s0   <-- the domain can ping 192.168.1.3 successfully, but with some packet loss
# netperf -H 192.168.1.3
6.
Check the domain dmesg.
# dmesg
[ 4802.234530] ------------[ cut here ]------------
[ 4802.234532] NETDEV WATCHDOG: enp6s0 (virtio_net): transmit queue 0 timed out
[ 4802.234549] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x1f9/0x200
[ 4802.236690] Modules linked in: intel_rapl_msr intel_rapl_common isst_if_common nfit libnvdimm kvm_intel kvm nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge ip_set stp llc iTCO_wdt rfkill iTCO_vendor_support nf_tables irqbypass nfnetlink rapl virtio_balloon i2c_i801 i2c_smbus lpc_ich qrtr pcspkr vfat fat drm fuse xfs libcrc32c ahci libahci nvme_tcp nvme_fabrics nvme libata nvme_core nvme_common t10_pi crct10dif_pclmul crc32_pclmul crc32c_intel virtio_net ghash_clmulni_intel net_failover virtio_blk failover serio_raw sunrpc dm_mirror dm_region_hash dm_log dm_mod
[ 4802.243011] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 5.14.0-323.el9.x86_64 #1
[ 4802.243900] Hardware name: Red Hat KVM/RHEL, BIOS edk2-20230301gitf80f052277c8-5.el9 03/01/2023
[ 4802.244809] RIP: 0010:dev_watchdog+0x1f9/0x200
[ 4802.245284] Code: 00 e9 40 ff ff ff 48 89 ef c6 05 03 af 7a 01 01 e8 3c c5 fa ff 44 89 e9 48 89 ee 48 c7 c7 a0 b1 6d 97 48 89 c2 e8 17 82 77 ff <0f> 0b e9 22 ff ff ff 0f 1f 44 00 00 55 53 48 89 fb 48 8b 6f 18 0f
[ 4802.247210] RSP: 0018:ffffb32980003eb0 EFLAGS: 00010286
[ 4802.247766] RAX: 0000000000000000 RBX: ffff99428b8ff488 RCX: 0000000000000027
[ 4802.248511] RDX: 0000000000000027 RSI: ffffffff97e67460 RDI: ffff994337c1f8c8
[ 4802.249262] RBP: ffff99428b8ff000 R08: ffff994337c1f8c0 R09: 0000000000000000
[ 4802.250016] R10: ffffffffffffffff R11: ffffffff98b6f070 R12: ffff99428b8ff3dc
[ 4802.250766] R13: 0000000000000000 R14: ffffffff96b7e5b0 R15: ffffb32980003f08
[ 4802.251516] FS:  0000000000000000(0000) GS:ffff994337c00000(0000) knlGS:0000000000000000
[ 4802.252364] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4802.252977] CR2: 00007ffe7a749000 CR3: 0000000101d54004 CR4: 0000000000770ef0
[ 4802.253732] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4802.254479] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4802.255230] PKRU: 55555554
[ 4802.255532] Call Trace:
[ 4802.255805]  <IRQ>
[ 4802.256031]  ? pfifo_fast_change_tx_queue_len+0x70/0x70
[ 4802.256586]  call_timer_fn+0x24/0x130
[ 4802.256986]  __run_timers.part.0+0x1ee/0x280
[ 4802.257444]  ? enqueue_hrtimer+0x2f/0x80
[ 4802.257870]  ? __hrtimer_run_queues+0x159/0x2c0
[ 4802.258358]  run_timer_softirq+0x26/0x50
[ 4802.258785]  __do_softirq+0xc7/0x2ac
[ 4802.259173]  __irq_exit_rcu+0xb9/0xf0
[ 4802.259573]  sysvec_apic_timer_interrupt+0x72/0x90
[ 4802.260084]  </IRQ>
[ 4802.260318]  <TASK>
[ 4802.260559]  asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 4802.261103] RIP: 0010:default_idle+0x10/0x20
[ 4802.261571] Code: 8b 04 25 40 ef 01 00 f0 80 60 02 df c3 cc cc cc cc 0f ae 38 eb bb 0f 1f 40 00 0f 1f 44 00 00 66 90 0f 00 2d be da 47 00 fb f4 <c3> cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 65
[ 4802.263496] RSP: 0018:ffffffff97e03ea8 EFLAGS: 00000252
[ 4802.264050] RAX: ffffffff96d8d320 RBX: ffffffff97e1a940 RCX: 0000000000000000
[ 4802.264803] RDX: 4000000000000000 RSI: ffff994337c22b20 RDI: 000000000497eebc
[ 4802.265554] RBP: 0000000000000000 R08: 0000045e163d1cbb R09: ffff9941d6202400
[ 4802.266301] R10: 0000000000020604 R11: 0000000000000000 R12: 0000000000000000
[ 4802.267054] R13: 000000006dc53d18 R14: 000000006d3c47a8 R15: 000000006d3c47b0
[ 4802.267810]  ? mwait_idle+0x70/0x70
[ 4802.268189]  default_idle_call+0x33/0xe0
[ 4802.268615]  cpuidle_idle_call+0x125/0x160
[ 4802.269051]  ? kvm_sched_clock_read+0x14/0x30
[ 4802.269519]  do_idle+0x78/0xe0
[ 4802.269891]  cpu_startup_entry+0x19/0x20
[ 4802.270311]  rest_init+0xca/0xd0
[ 4802.270671]  arch_call_rest_init+0xa/0x14
[ 4802.271099]  start_kernel+0x4a3/0x4c2
[ 4802.271495]  secondary_startup_64_no_verify+0xe5/0xeb
[ 4802.272037]  </TASK>
[ 4802.272279] ---[ end trace 87fb221169225dfd ]---

Besides the "Call Trace" above, the domain keeps logging messages like:
virtio_net virtio3 enp6s0: TX timeout on queue: 0, sq: output.0, vq: 0x1, name: output.0, 7820000 usecs ago

7. Run the ping tests.
# ping 192.168.1.3
PING 192.168.1.3 (192.168.1.3) 56(84) bytes of data.
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
...

Actual results:
The domain with a vhost-user interface + iommu throws a "Call Trace" when running the netperf tests.

Expected results:
No Call Trace.

Additional info:
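For reviewers sizing the reservation in step 2: the hugepage counts there (20 x 1 GiB per node) must cover the OVS-DPDK per-socket memory plus the guest RAM. A minimal sketch of that arithmetic; the helper name and the example memory figures are illustrative assumptions, not values taken from this report.

```shell
# Hypothetical helper: compute how many 1 GiB hugepages are needed to back
# the OVS-DPDK per-socket memory plus the guest RAM (both given in MiB).
# Rounds up so a partially used page is still reserved.
hugepages_needed() {
    socket_mem_mib=$1
    guest_mem_mib=$2
    page_mib=1024    # matches default_hugepagesz=1G from the grubby line
    echo $(( (socket_mem_mib + guest_mem_mib + page_mib - 1) / page_mib ))
}

# Example: dpdk-socket-mem=1024 per node plus an 8 GiB guest -> 9 pages.
hugepages_needed 1024 8192
```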
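The isolated_cores values in steps 1 and 4 are hand-written enumerations, which are easy to mistype. A hedged sketch of generating such a list from an inclusive core range; the function name is made up for illustration.

```shell
# Hypothetical helper: emit a comma-separated core list suitable for the
# isolated_cores= line in /etc/tuned/cpu-partitioning-variables.conf.
isolated_cores_list() {
    first=$1
    last=$2
    out=""
    c=$first
    while [ "$c" -le "$last" ]; do
        out="${out:+$out,}$c"   # append with a comma after the first core
        c=$((c + 1))
    done
    echo "$out"
}

# The guest in step 4 isolates cores 1-5:
isolated_cores_list 1 5
```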
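When re-running the reproducer, the failure in step 6 can be detected without reading the whole log by matching the watchdog signature. A sketch over a saved dmesg capture; the function name is hypothetical.

```shell
# Hypothetical check: succeed if a saved dmesg capture contains the
# "NETDEV WATCHDOG ... (virtio_net): transmit queue N timed out" signature
# reported above.
has_tx_timeout() {
    grep -qE 'NETDEV WATCHDOG: .+ \(virtio_net\): transmit queue [0-9]+ timed out' "$1"
}

# Example against a capture containing the line from this report:
log=$(mktemp)
printf '%s\n' '[ 4802.234532] NETDEV WATCHDOG: enp6s0 (virtio_net): transmit queue 0 timed out' > "$log"
if has_tx_timeout "$log"; then echo "TX timeout found"; fi
rm -f "$log"
```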