Symptoms
In the instance where this was observed, all PPDM Kubernetes backups started to fail after PPDM was recovered from its server disaster recovery backup. The issue may also occur in other situations.

Kubernetes backups fail with the error 'controller pod is not running'. The following errors can be observed in the logs:

2021-07-21T03:49:48.340Z ERROR [] [task-5011a057-340f-40fb-8cd8-12414685d058] [][][][TRACE_ID:a66ce529604914ad;JOB_ID:a9b8915af1637407][] [K8sHelperApi.isDone(90)] - Failed to wait on job com.emc.dpsg.ecdm.baseresourceservice.exception.ValidationServiceException: controller pod is not running
2021-07-21T03:50:14.065Z WARN [] [dsSource-plpd-testcluster] [][][][][] [c.e.b.c.s.p.K8sHealthMonitor.checkPodHealth(200)] - Controller Pod is down, cluster: , age=PT153H49M43.065S

Listing the pods on that Kubernetes cluster (for example, with kubectl get pods -A) shows the powerprotect-controller pod stuck in ImagePullBackOff:

powerprotect   powerprotect-controller-666ffccbbf-p5rwh   0/1   ImagePullBackOff   0   6d12h
velero-ppdm    backup-driver-587cfcdf59-2mc8p             1/1   Running            0   49d
velero-ppdm    velero-5df5fcd896-p68rw                    1/1   Running            0   49d
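On a cluster with many pods, the failing pod can be picked out of the listing mechanically. The following is a minimal sketch, assuming the standard whitespace-separated table printed by kubectl get pods -A (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE). The function name find_image_pull_failures is hypothetical, and the sample reuses the pod rows above with the usual header row added.

```python
from typing import List, Tuple

# Pod statuses that indicate the container image could not be pulled.
IMAGE_PULL_FAILURES = {"ImagePullBackOff", "ErrImagePull"}


def find_image_pull_failures(pods_output: str) -> List[Tuple[str, str, str]]:
    """Return (namespace, pod, status) for every pod stuck on an image pull.

    Assumes the table layout printed by `kubectl get pods -A`:
    NAMESPACE NAME READY STATUS RESTARTS AGE
    """
    failures = []
    for line in pods_output.strip().splitlines():
        fields = line.split()
        # Skip the header row and any line too short to contain a STATUS column.
        if len(fields) < 4 or fields[0] == "NAMESPACE":
            continue
        namespace, name, _ready, status = fields[:4]
        if status in IMAGE_PULL_FAILURES:
            failures.append((namespace, name, status))
    return failures


sample = """\
NAMESPACE     NAME                                       READY   STATUS             RESTARTS   AGE
powerprotect  powerprotect-controller-666ffccbbf-p5rwh   0/1     ImagePullBackOff   0          6d12h
velero-ppdm   backup-driver-587cfcdf59-2mc8p             1/1     Running            0          49d
velero-ppdm   velero-5df5fcd896-p68rw                    1/1     Running            0          49d
"""

for namespace, name, status in find_image_pull_failures(sample):
    print(f"{namespace}/{name}: {status}")
```

Only the first four columns are read, so newer kubectl output that appends text such as "(5m ago)" to the RESTARTS column does not affect the result.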
Cause
The powerprotect-controller pod is unable to pull the required image from the internet.
Resolution
1. Check whether the Kubernetes cluster can access Docker Hub at https://hub.docker.com/ and Quay at https://quay.io/ to pull the required images.
2. If the Kubernetes cluster cannot access these sites because of a firewall or other restrictions, pull these images to a local registry that the cluster can access, and then follow the procedure below.
   1) On the PowerProtect Data Manager appliance, create the file /usr/local/brs/lib/cndm/config/application.properties with the following contents:
      k8s.docker.registry=fqdn:port
      (for example, k8s.docker.registry=artifacts.example.com:8446)
      k8s.image.pullsecrets=secret resource name
      (specify this entry only if an image pull secret is required)
   2) Run cndm restart to apply the properties.
   Note: See the PPDM Administration and User Guide for more details.
3. Because the Kubernetes cluster has already been added as an asset source in the PPDM GUI, run a manual discovery of the Kubernetes cluster after step 1 or 2 has been checked or performed.
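The file contents described in step 2 can be generated rather than typed by hand, which avoids a malformed registry value. This is a minimal sketch, assuming the two keys shown above are the only entries needed; the helper name build_cndm_properties and the secret name ppdm-pull-secret are hypothetical. Copying the result to /usr/local/brs/lib/cndm/config/application.properties on the appliance and running cndm restart remain manual steps.

```python
from typing import Optional


def build_cndm_properties(registry: str, pull_secret: Optional[str] = None) -> str:
    """Build the contents for the cndm application.properties file (hypothetical helper).

    registry:    local registry given as fqdn:port, e.g. artifacts.example.com:8446
    pull_secret: name of the image pull secret resource; omit if none is required
    """
    host, sep, port = registry.rpartition(":")
    # The value is documented as fqdn:port, so require a host and a numeric port.
    if not sep or not host or not port.isdigit():
        raise ValueError(f"registry must be fqdn:port, got {registry!r}")
    lines = [f"k8s.docker.registry={registry}"]
    # Add the pull secret entry only when one is actually needed.
    if pull_secret:
        lines.append(f"k8s.image.pullsecrets={pull_secret}")
    return "\n".join(lines) + "\n"


print(build_cndm_properties("artifacts.example.com:8446"), end="")
print(build_cndm_properties("artifacts.example.com:8446", "ppdm-pull-secret"), end="")
```

Rejecting a value without a port up front is a design choice for the sketch: it surfaces a typo before cndm restart is run, instead of after the next backup fails.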