...
When adding a VxRail host to expand a WLD cluster in SDDC Manager:

- The task is stuck fetching information.
- Errors are shown for the workflow execution.
- The task never completes in the UI and is reported as failed in the SDDC Manager database:

domainmanager=# select * from task where id ='a4ff8635-976f-402f-a6e6-9bcd389e356c';
id                      | a4ff8635-976f-402f-a6e6-9bcd389e356c
resource_id             | 93303e5d-60ae-414c-b9b5-fcf4d20ca662
resource_type           | ESX_HOST
state                   | COMPLETED_WITH_FAILURE
description             | Adding new host(s) to vxrail cluster
errors                  | [{"messageBundle":"com.vmware.evo.sddc.common.core.error.messages","errorCode":"VCF_ERROR_INTERNAL_SERVER_ERROR","arguments":[], "message":"A problem has occurred on the server. Please retry or contact the service provider and provide the reference token.","cause": [{"type":"com.vmware.evo.sddc.common.services.error.SddcManagerServicesIsException","message":"Error in getting workflow options for addition of host to cluster. Check logs"}, {"type":"com.vmware.evo.sddc.common.vxrail.error.VxRailManagerException","message":"Unable to fetch details for port groups managed by VxRail Manager vxrm.gsslabs.com"}],"referenceToken":"ONPAQ3"}]
timestamp               | 1674144689474
completion_timestamp    |
localizable_description | null

domainmanager.log shows the specific VxRail API that is timing out:

2023-01-19T20:52:50.118+0000 DEBUG [vcf_dm,63bb90392797462f,03ea] [c.v.v.secure.http.HttpClientService,dm-exec-5] Making request: GET https://vxrm.gsslabs.com:443/rest/vxm/v1/system/cluster-portgroups/esx07.gsslabs.com
...
...
2023-01-19T20:52:51.695+0000 ERROR [vcf_dm,2f2578c538a84ba1,559a] [c.v.v.v.h.w.VxRailHostWorkflowInitiator,dm-exec-6] Failed to start workflow for add host task a4ff8635-976f-402f-a6e6-9bcd389e356c
com.vmware.evo.sddc.common.services.error.SddcManagerServicesIsException: Error in getting workflow options for addition of host to cluster. Check logs
    at com.vmware.evo.sddc.common.services.adapters.workflow.options.WorkflowOptionsAdapterImpl.getWorkflowOptionsForAddHostToVxRailCluster(WorkflowOptionsAdapterImpl.java:269)
    at com.vmware.vxrail.vcf.hostmanager.workflows.VxRailHostWorkflowInitiator.startWorkFlow(VxRailHostWorkflowInitiator.java:151)
    at com.vmware.vxrail.vcf.hostmanager.workflows.VxRailHostWorkflowInitiator$$FastClassBySpringCGLIB$$13eaaa4f.invoke(<generated>)
...
...
Caused by: com.vmware.evo.sddc.common.vxrail.error.VxRailManagerException: Unable to fetch details for port groups managed by VxRail Manager vxrm.gsslabs.com
    at com.vmware.evo.sddc.common.vxrail.VxRailManagerService.getVxRailSystemTrafficPortGroups(VxRailManagerService.java:1213)
    at com.vmware.evo.sddc.common.vxrail.VxRailManagerService.getVxRailSystemTrafficPortGroups(VxRailManagerService.java:1277)
...
...
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
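To locate these entries quickly, the log can be filtered for the strings that appear in the excerpt above. The following is a minimal sketch, not an official procedure: the function name find_portgroup_timeouts is hypothetical, and the log location mentioned in the usage note is an assumption that should be confirmed on the appliance.

```shell
#!/bin/sh
# Sketch only: list log lines related to the cluster-portgroups timeout.
# The log file path is supplied by the caller (its location is an assumption).

# find_portgroup_timeouts LOGFILE
# Prints matching lines prefixed with their line numbers.
# Returns non-zero when nothing matches (standard grep behaviour).
find_portgroup_timeouts() {
    grep -n \
        -e 'cluster-portgroups' \
        -e 'SocketTimeoutException' \
        -e 'Unable to fetch details for port groups' \
        "$1"
}
```

For example, assuming the log is at /var/log/vmware/vcf/domainmanager/domainmanager.log, running find_portgroup_timeouts against that path would print the GET request line and the related exception lines.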
The purpose of this KB is to work around the issue described above and successfully add a VxRail host to the cluster in SDDC Manager.
This issue occurs when the VxRail Manager takes more than 1.5 minutes to respond to the API call that returns the cluster-portgroups. The API in question (as reported in the logs above) is:

curl -k -X GET --user 'administrator@vsphere.local:<sso_password>' https://<VxRail_Manager>/rest/vxm/v1/system/cluster-portgroups/<VxRail_Host_Name>

For example:

curl -k -X GET --user 'administrator@vsphere.local:$ecretPa55' https://vxrm.gsslabs.com:443/rest/vxm/v1/system/cluster-portgroups/esx07.gsslabs.com

The timeout value configured in the domainmanager service is 1.5 minutes, so if the API takes longer than that to respond, the task fails with the errors reported above.
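To confirm that the response time is the problem, the same request can be timed with curl's --write-out option. This is a minimal sketch under stated assumptions: the function names and the VXRM_HOST, ESX_HOST and SSO_PASSWORD variables are hypothetical placeholders, and the 90-second default mirrors the 1.5-minute timeout described above.

```shell
#!/bin/sh
# Sketch only: time the cluster-portgroups call and compare it with the timeout.
# VXRM_HOST, ESX_HOST and SSO_PASSWORD are placeholders to be set by the caller.

# exceeds_timeout SECONDS [LIMIT]
# Returns 0 (true) when SECONDS is greater than LIMIT (default 90 seconds).
exceeds_timeout() {
    awk -v t="$1" -v limit="${2:-90}" 'BEGIN { exit !((t + 0) > (limit + 0)) }'
}

# time_portgroups_call
# Discards the response body and prints the total request time in seconds.
time_portgroups_call() {
    curl -k -s -o /dev/null -w '%{time_total}\n' \
        --user "administrator@vsphere.local:${SSO_PASSWORD}" \
        "https://${VXRM_HOST}/rest/vxm/v1/system/cluster-portgroups/${ESX_HOST}"
}
```

A typical use would be elapsed=$(time_portgroups_call) followed by exceeds_timeout "$elapsed"; a true result suggests the request is slower than the default domainmanager timeout.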
MINIMAL: The workaround increases the timeout value for the domainmanager service. Because this changes a configuration on the SDDC Manager, taking a snapshot of the SDDC Manager VM beforehand is recommended.
To resolve the issue, we need to address why the VxRail Manager is taking an extended amount of time to respond to the GET API call that returns the cluster-portgroups. On the SDDC Manager, we can work around this temporarily by increasing the timeout value for the domainmanager service. The steps are provided below.
0. Take a snapshot of the SDDC Manager VM.
1. SSH to the SDDC Manager as the vcf user, then su to root.
2. Edit the file /etc/vmware/vcf/domainmanager/application-prod.properties:
   vi /etc/vmware/vcf/domainmanager/application-prod.properties
3. Add the following entry to set the timeout value to 300000 ms (i.e. 5 minutes):
   http.client.timeout.milis=300000
   Note: The default value is 90000 ms (i.e. 1.5 minutes).
4. Save the file and quit: press ESC and type :wq!
5. Restart the domainmanager service using the command:
   systemctl restart domainmanager
6. Wait for the service to come up.
7. Retry adding the VxRail host to the cluster.
   Reference Document: Add the VxRail Hosts to the Cluster in VMware Cloud Foundation

This time the task should progress, and the status of the task, with its sub-tasks and additional details, should be visible in the SDDC Manager UI.
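Steps 2 through 6 can also be scripted. The sketch below is illustrative only and assumes it is run as root after the snapshot has been taken: the function names set_timeout and restart_and_wait are hypothetical, and the polling interval and retry count are arbitrary choices. Note that systemd reporting the unit as active does not guarantee the application has finished initializing.

```shell
#!/bin/sh
# Sketch only: set http.client.timeout.milis in a properties file, then restart.
# Take a VM snapshot and back up the file before using this on a real system.

PROP_KEY="http.client.timeout.milis"

# set_timeout FILE MILLISECONDS
# Replaces an existing entry for the key, or appends one if none exists.
set_timeout() {
    file="$1"
    value="$2"
    if grep -q "^${PROP_KEY}=" "$file"; then
        # Write back with cat so the original file keeps its owner and mode.
        sed "s/^${PROP_KEY}=.*/${PROP_KEY}=${value}/" "$file" > "${file}.tmp" &&
            cat "${file}.tmp" > "$file" &&
            rm -f "${file}.tmp"
    else
        # Add a newline first if the file does not already end with one.
        if [ -n "$(tail -c 1 "$file")" ]; then
            echo >> "$file"
        fi
        printf '%s=%s\n' "$PROP_KEY" "$value" >> "$file"
    fi
}

# restart_and_wait
# Restarts domainmanager and polls for up to about 5 minutes until it is active.
restart_and_wait() {
    systemctl restart domainmanager
    i=0
    while [ "$i" -lt 60 ]; do
        systemctl is-active --quiet domainmanager && return 0
        sleep 5
        i=$((i + 1))
    done
    return 1
}
```

For example, set_timeout /etc/vmware/vcf/domainmanager/application-prod.properties 300000 followed by restart_and_wait corresponds to the manual steps above. Running set_timeout again with 90000 restores the default value.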