...
Upgrading a Tanzu Kubernetes cluster via TMC, Tanzu CLI, or yaml edit results in no new control plane nodes.
Scaling a Tanzu Kubernetes Cluster control plane count results in no new nodes.
Machine Health Check of control plane nodes is not replacing a broken control plane node.
Checking etcd status on the existing nodes shows that the etcd cluster is healthy and status is running.
VMware Carbon Black Cloud Container Operator or another 3rd party security policy controller is deployed to the cluster and is not allowing port forwarding in the kube-system namespace.

Logs from the capi kubeadm control plane manager pod show the message below.

Command to get logs from vSphere with Tanzu Supervisor pods:
    kubectl logs -n vmware-system-capw capi-kubeadm-control-plane-controller-manager-XXXXXXX -c manager

Command to get logs from TKG Management cluster pods:
    kubectl logs -n capi-kubeadm-control-plane-system capi-kubeadm-control-plane-controller-manager-XXXXX manager

Message:
    controllers/KubeadmControlPlane "msg"="Waiting for control plane to pass preflight checks" "cluster"="foo-prod" "kubeadmControlPlane"="foo-prod-control-plane" "namespace"="default" "failures"="machine foo-prod-control-plane-g7s2c reports EtcdMemberHealthy condition is unknown (Failed to connect to the etcd pod on the foo-prod-control-plane-g7s2c node: unable to create etcd client: endpoints: [etcd-foo-prod-control-plane-g7s2c], proxy.KubeConfig.Host: https://[2001:1900:2200:5f75::aba2]:6443: context deadline exceeded)"

Execute `etcdctl --cluster=true endpoint health --write-out=table` on the guest cluster. The output shows that the etcd status of each member is healthy.

Guest cluster control plane logs show entries similar to the following.

The apiserver pod logs might report security policy violations related to port forwarding (the error below is presented if the CarbonBlack PortBlock security policy is applied to the kube-system namespace):
    W0312 08:15:35.048653 1 dispatcher.go:161] rejected by webhook "resources.validating-webhook.cbcontainers": &errors.StatusError{ErrStatus:v1.Status{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ListMeta:v1.ListMeta{SelfLink:"", ResourceVersion:"", Continue: "", RemainingItemCount: (*int64) (nil)}, Status: "Failure", Message: "admission webhook \"resources.validating-webhook.cbcontainers\" denied the request: Blocked by Kubernetes security policy \"Kube-system\".\nViolated rule(s): \n Port forward\n", Reason:"", Details: (*v1.StatusDetails) (nil), Code:400}}

On the control plane node, journalctl -xeu containerd logs show:
    failure attempting to dial 127.0.0.1:2379 failed to execute portforward in network namespace "host": failed to dial 2379: dial tcp4 127.0.0.1:2379: connect: connection refused
Tanzu Kubernetes Grid and vSphere with Tanzu use an underlying open source component called ClusterAPI (CAPI). On the Management cluster or Supervisor cluster there is a controller pod called capi-kubeadm-control-plane-controller-manager. This controller requires permission on the workload/guest cluster to port forward to the etcd pods so that it can check etcd cluster health before adding or updating a control plane node. If the controller cannot get the etcd status, it does not proceed, and reconciliation of the control plane is stalled indefinitely.
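To confirm that a stalled reconcile matches this cause, the two log signatures quoted in the symptoms can be searched for mechanically. The following is a minimal sketch rather than a supported tool: the function names are hypothetical, and the matched strings are taken only from the sample log lines in this article, so other CAPI or webhook versions may word them differently.

```shell
#!/bin/sh
# Sketch: classify log text read from stdin against the signatures in this article.
# Function names are hypothetical; the patterns come from the sample logs only.

# Succeeds if a CAPI controller log line shows the control plane waiting on
# preflight checks because etcd member health is unknown.
kcp_preflight_stalled() {
  grep -q 'Waiting for control plane to pass preflight checks.*EtcdMemberHealthy condition is unknown'
}

# Succeeds if an apiserver log line shows an admission webhook denying port forwarding.
webhook_blocks_port_forward() {
  grep -q 'denied the request.*Port forward'
}

# Example use against a Supervisor cluster (pod name suffix elided as in the article):
#   kubectl logs -n vmware-system-capw \
#     capi-kubeadm-control-plane-controller-manager-XXXXXXX -c manager \
#     | kcp_preflight_stalled && echo "control plane reconcile is stalled on etcd health"
```

If both checks succeed, the controller is stalled on etcd health and an admission webhook is denying port forwarding, which is the combination described above.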
Ensure that VMware Carbon Black Cloud Container Operator, or the 3rd party security policy controller in use, does not block port forwarding in the guest cluster's kube-system namespace. Without port forwarding, the ClusterAPI infrastructure on the Management/Supervisor cluster cannot validate etcd cluster health or function.
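One way to check whether port forwarding into kube-system is currently permitted is to attempt it by hand from a workstation with access to the guest cluster. The sketch below rests on several assumptions that are not from this article: kubectl is pointed at the guest cluster, GNU `timeout` is installed, the function name and the local port 12379 are arbitrary choices, and a denied request is assumed to make `kubectl port-forward` exit right away while a permitted one keeps running until stopped.

```shell
#!/bin/sh
# Sketch: probe whether port forwarding to an etcd pod in kube-system is allowed.
# Assumptions: kubectl targets the guest cluster; GNU `timeout` is available;
# the function name and default local port are hypothetical.
KUBECTL="${KUBECTL:-kubectl}"

check_etcd_port_forward() {
  pod="$1"                  # e.g. etcd-foo-prod-control-plane-g7s2c
  local_port="${2:-12379}"  # arbitrary free local port
  rc=0
  # port-forward blocks while the session is open, so stop it after 5 seconds.
  timeout 5 "$KUBECTL" -n kube-system port-forward "pod/$pod" \
    "$local_port:2379" >/dev/null 2>&1 || rc=$?
  if [ "$rc" -eq 124 ]; then
    # timeout had to stop kubectl, so the port-forward session was still open.
    echo "allowed"
    return 0
  fi
  echo "blocked or failed (exit $rc)"
  return 1
}

# Example: check_etcd_port_forward etcd-foo-prod-control-plane-g7s2c
```

A "blocked or failed" result does not identify the cause on its own; the apiserver log for a webhook rejection remains the authoritative evidence.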
ClusterAPI Docs: https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20191017-kubeadm-based-control-plane.md?plain=1#L587
Similar discussion and behavior, but found during an IPv6 deployment: https://github.com/kubernetes-sigs/cluster-api/issues/4253