Symptoms
Deploying a management or workload cluster with the following infrastructure and configuration may fail or result in restricted traffic between pods if those pods are on different ESXi hosts:
NSX-T versions: vSphere with NSX-T v3.1.3 with Enhanced Data Path enabled, vSphere with NSX-T v3.1.x earlier than v3.1.3, NSX-T v3.0.x earlier than the v3.0.2 hot patch, or NSX-T v2.x
Base images: Photon 3 or Ubuntu with Linux kernel 5.8
This combination exposes a checksum issue between older versions of NSX-T and Antrea CNI.
Resolution
There are two options to resolve this issue:
Upgrade to NSX-T v3.0.2 Hot Patch, v3.1.3, or later. If Enhanced Data Path is enabled, upgrade to NSX-T v3.2.1.
Use an Ubuntu base image with Linux Kernel v5.9 or later.
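To check whether a node already runs a kernel that meets the 5.9 minimum, a version comparison along the following lines may help. This is a minimal sketch, not part of the official procedure: the `kernel_at_least` helper is a hypothetical name, and it assumes GNU `sort` with `-V` support is available on the node.

```shell
#!/usr/bin/env bash
# Sketch: report whether this node's kernel meets the 5.9 minimum
# named in the Resolution section. kernel_at_least is a hypothetical helper.

# Succeeds when version $2 is greater than or equal to version $1.
kernel_at_least() {
  local required="$1" current="$2"
  # sort -V (GNU coreutils) orders version strings; the lower one comes first.
  [ "$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n1)" = "$required" ]
}

current="$(uname -r)"
if kernel_at_least 5.9 "$current"; then
  echo "Kernel ${current} is 5.9 or later."
else
  echo "Kernel ${current} is older than 5.9; this node may be affected."
fi
```

Run it on a control plane or worker node; a kernel older than 5.9 combined with one of the listed NSX-T versions matches the affected configuration.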
Workaround
For TKG 1.5+, you can set `ANTREA_DISABLE_UDP_TUNNEL_OFFLOAD` to `true` when creating the cluster.
For TKG 1.4.2+ (but not 1.5), you can set `DISABLE_CHECKSUM_OFFLOAD` to `true` when creating the cluster.
In some cases, the management cluster deploys successfully but traffic is dropped. To work around this issue, SSH into every control plane and worker VM and run the following command on each node:
`ethtool -K eth0 tx-udp_tnl-segmentation off && ethtool -K eth0 tx-udp_tnl-csum-segmentation off`
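The per-node step can be scripted when there are many nodes. The following is a minimal sketch under stated assumptions: `NODES`, `SSH_USER`, and `IFACE` are hypothetical variables, the `capv` login and the use of `sudo` are assumptions about the node image, and `offload_cmd` is an illustrative helper that only builds the two `ethtool` commands given in this article.

```shell
#!/usr/bin/env bash
# Sketch: disable UDP tunnel offload on every node over SSH.
# Assumptions (not from the article): NODES is a space-separated list of
# node IPs, SSH_USER is the login user, and sudo is available on the nodes.

NODES="${NODES:-}"              # e.g. NODES="10.0.0.11 10.0.0.12" (placeholder IPs)
SSH_USER="${SSH_USER:-capv}"    # assumed default login user
IFACE="${IFACE:-eth0}"          # interface named in the article

# Print the two ethtool commands from the article for the given interface.
offload_cmd() {
  local iface="$1"
  printf 'ethtool -K %s tx-udp_tnl-segmentation off && ethtool -K %s tx-udp_tnl-csum-segmentation off' \
    "$iface" "$iface"
}

# With NODES empty the loop body never runs, so the script is a no-op.
for node in $NODES; do
  echo "Disabling UDP tunnel offload on ${node}"
  ssh "${SSH_USER}@${node}" "sudo sh -c '$(offload_cmd "$IFACE")'"
done
```

To confirm the change on a node, `ethtool -k eth0 | grep udp_tnl` lists the current state of the tunnel offload features. Settings applied with `ethtool -K` do not survive a reboot or node re-creation, so the cluster-level variables are preferable where the TKG version supports them.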