Symptoms
No output when the command kubectl get pods -o wide --all-namespaces | grep <IP Address of appliance> is run.
You see errors in the logs similar to:
ERROR 37 --- [or-http-epoll-3] c.v.c.service.impl.NodeServiceImpl : Error forwarding to callback service connection timed out
ERROR 37 --- [or-http-epoll-3] c.v.c.service.impl.ExecutionServiceImpl : Unable to queue execution due to connection timed out
Cause
This error is expected if one of the Code Stream pods was destroyed and recreated by Kubernetes. When the pod is recreated, it is assigned a different IP address, but the database record associated with this pod and node is not updated with the new IP. Requests are therefore still sent to the old address and fail with a connection timeout.
Resolution
This issue is resolved in Cumulative Update for vRealize Automation 8.0.1 Patch 3.
Workaround
Clear the nodes table and restart all Code Stream pods
1. Log in to one of the vRealize Automation appliances as root.
2. Run the following command to reset the number of Code Stream pods to zero:
kubectl -n prelude scale deploy codestream-app --replicas=0
3. Wait a couple of minutes to allow the pod deletion to complete.
4. Run the following command to reset the number of Code Stream pods to their original value:
Clustered
kubectl -n prelude scale deploy codestream-app --replicas=3
Single
kubectl -n prelude scale deploy codestream-app --replicas=1
5. Validate that there are now 1 (single) or 3 (clustered) running Code Stream pods with the following command:
kubectl -n prelude get pods | grep codestream
codestream-*********-2v5x9 1/1 Running 0 **
codestream-*********-9b8px 1/1 Running 0 **
codestream-*********-ndfrw 1/1 Running 0 **
Note: The number of entries should match the number of replica nodes provided in Step #4 based on cluster size.
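The validation step can also be scripted. The following is a minimal sketch, not part of the official workaround: the function names count_running_codestream and check_codestream_pods are hypothetical, and the sketch assumes the default kubectl get pods column layout (NAME, READY, STATUS, ...) and pod names that begin with codestream-. It counts the Code Stream pods that are fully ready and in the Running state, then compares that count with the expected replica count.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not part of the product.

# Reads `kubectl -n prelude get pods` output on stdin and prints the number
# of pods whose name starts with "codestream-", whose STATUS is Running,
# and whose READY column shows all containers ready (for example 1/1).
count_running_codestream() {
  awk '$1 ~ /^codestream-/ && $3 == "Running" {
         split($2, ready, "/")
         if (ready[1] == ready[2]) n++
       }
       END { print n + 0 }'
}

# Compares the live count with the expected replica count
# (3 for a clustered deployment, 1 for a single node).
check_codestream_pods() {
  expected="$1"
  actual="$(kubectl -n prelude get pods | count_running_codestream)"
  if [ "$actual" -eq "$expected" ]; then
    echo "OK: $actual of $expected Code Stream pods running"
  else
    echo "MISMATCH: $actual of $expected Code Stream pods running" >&2
    return 1
  fi
}
```

For example, on a clustered deployment you might run check_codestream_pods 3 after scaling the deployment back up. If it reports a mismatch, wait a little longer and run it again, since pods can take some time to reach the Running state.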