An update to vCenter failed and a new vCenter instance was deployed. The existing SVMs were added to the new vCenter and then to the newly created DVS (distributed virtual switch). Once this was completed, one SDS continued to enter a decoupled/disconnected state.

The SDS is showing as Disconnected, Join-Pending in the scli --query_all_sds output:

SDS ID: b7g3bd870000000a Name: SIO-ESX1 State: Disconnected, Join-Pending IP: 172.21.55.110 Port: 7072 Version: 2.0.13000

The MDM events log shows the SDS continuously decoupling, entering cool down mode, and then reconnecting:

2017-10-09 12:35:53.155 SDS_DECOUPLED ERROR SDS: SIO-ESX1 (id b7g3bd870000000a) decoupled.
2017-10-09 12:35:53.156 SDS_IN_COOL_DOWN WARNING SDS: SIO-ESX1 (ID b7g3bd870000000a) will disconnect from MDM for 15 seconds failed to reconnect multiple times
2017-10-09 12:36:08.155 SDS_RECONNECTED INFO SDS: SIO-ESX1 (ID b7g3bd870000000a) reconnected
2017-10-09 12:36:13.161 SDS_RECONNECTED INFO SDS: SIO-ESX1 (ID b7g3bd870000000a) reconnected
2017-10-09 12:36:18.255 SDS_RECONNECTED INFO SDS: SIO-ESX1 (ID b7g3bd870000000a) reconnected
2017-10-09 12:36:23.259 SDS_RECONNECTED INFO SDS: SIO-ESX1 (ID b7g3bd870000000a) reconnected
2017-10-09 12:36:28.255 SDS_RECONNECTED INFO SDS: SIO-ESX1 (ID b7g3bd870000000a) reconnected
...
2017-10-09 12:38:41.153 SDS_DECOUPLED ERROR SDS: SIO-ESX1 (id b7g3bd870000000a) decoupled.
2017-10-09 12:38:41.155 SDS_IN_COOL_DOWN WARNING SDS: SIO-ESX1 (ID b7g3bd870000000a) will disconnect from MDM for 15 seconds failed to reconnect multiple times
2017-10-09 12:38:56.159 SDS_RECONNECTED INFO SDS: SIO-ESX1 (ID b7g3bd870000000a) reconnected
2017-10-09 12:39:01.164 SDS_RECONNECTED INFO SDS: SIO-ESX1 (ID b7g3bd870000000a) reconnected
2017-10-09 12:39:06.260 SDS_RECONNECTED INFO SDS: SIO-ESX1 (ID b7g3bd870000000a) reconnected
2017-10-09 12:39:11.265 SDS_RECONNECTED INFO SDS: SIO-ESX1 (ID b7g3bd870000000a) reconnected

The MDM trace logs show the SDS having issues communicating.
The SDS is connecting then disconnecting:

09/10 12:39:11.261955 0x7f66b2a32eb0:tgtMgr_CheckMsgReturnedRC:01728: Message RC is TIMEOUT, RC is COMMUNICATION_ERROR
09/10 12:39:11.261972 0x7f66b2a32eb0:tgtMgr_ConfigureTgt:05190: TGT_CONFIG Failed to update new TGT b9f3bd870000000a on all other TGTs RC: COMMUNICATION_ERROR
09/10 12:39:11.261981 0x7f66b2a32eb0:tgtMgr_ProcessComplete:07188: Tgt: b7g3bd870000000a RC=COMMUNICATION_ERROR type: UP upDownState: DOWN processState: UP_INPROGRESS mdmTgtConGen: 51897 upPend: 1 downPend: 0 devErrPend: 0 wasDown: 1 createIP: 0 reconstructIP: 0
09/10 12:39:11.261994 0x7f66b2a32eb0:keepalive_SetTgtFenced:00331: TGT: b7g3bd870000000a
09/10 12:39:11.262015 0x7f66b0902eb0:repExtent_IO:02976: Writing the repository to disk and cluster. Extent: 0 Page: 186 Offset: 761856 Size: 65536
09/10 12:39:11.263420 0x7f66b2a32eb0:tgtMgr_InitiateProcessUnlocked:07027: Tgt: b7g3bd870000000a processType: UP upDownState: DOWN processState: IDLE mdmTgtConGen: 51897 upPending: 0 downPending: 0 wasDown: 1 createIP: 0 reconstructIP: 0 aboutToBeRemoved: 0
09/10 12:39:11.263472 0x7f66b2a71eb0:tgtMgr_HandleWorkReq:04511: TgtId: b7g3bd870000000a starting Up process
09/10 12:39:11.263488 0x7f66b0926eb0: repExtent_IO:02976: Writing the repository to disk and cluster. Extent: 0 Page: 186 Offset: 761856 Size: 65536
09/10 12:39:11.264680 0x7f66b2a71eb0:keepalive_SetTgtListenOnly:00377: TGT: b7g3bd870000000a
09/10 12:39:11.264689 0x7f66b2a71eb0:mdmTgtMsg_SendSyncAddMdm:00505 : TGT_CONFIG TgtId: b7g3bd870000000a MdmId: 47e9330a1b66a52e mdmTgtConnectionGenNum: 51898 bForceClean: 0 bQuiesce: 0 quiesceGen: 1
09/10 12:39:11.265081 0x7f66b2a71eb0:mosEventLog_PostInternal:00590: New event added. Message: "SDS: SIO-ESX (ID b7g3bd870000000a) reconnected". Additional info: "" Severity: Info
09/10 12:39:11.265091 0x7f66b2a71eb0:mdmTgtMsg_SendReconfStart:00571: TGT_CONFIG TgtId: b7g3bd870000000a
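The decouple/cool-down/reconnect pattern in the MDM events log can be spotted without reading every line. The following is a minimal Python sketch, not part of any ScaleIO tooling: it assumes event lines in the format shown in this article, and the function names `count_sds_events` and `is_flapping`, as well as the threshold of two decouples, are hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Matches MDM event lines in the format shown in this article, e.g.
# "2017-10-09 12:35:53.155 SDS_DECOUPLED ERROR SDS: SIO-ESX1 (id b7g3bd870000000a) decoupled."
EVENT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<event>SDS_[A-Z_]+) (?P<severity>[A-Z]+) "
    r"SDS: (?P<name>\S+) \((?:id|ID) (?P<sds_id>\w+)\)"
)


def count_sds_events(lines):
    """Count events per (SDS name, event type); non-matching lines are skipped."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match:
            counts[(match.group("name"), match.group("event"))] += 1
    return counts


def is_flapping(counts, name, min_decouples=2):
    """Hypothetical heuristic: repeated decouples that are each followed by reconnects."""
    return (
        counts[(name, "SDS_DECOUPLED")] >= min_decouples
        and counts[(name, "SDS_RECONNECTED")] > 0
    )


# A few lines copied from the event log above.
SAMPLE = [
    "2017-10-09 12:35:53.155 SDS_DECOUPLED ERROR SDS: SIO-ESX1 (id b7g3bd870000000a) decoupled.",
    "2017-10-09 12:35:53.156 SDS_IN_COOL_DOWN WARNING SDS: SIO-ESX1 (ID b7g3bd870000000a) will disconnect from MDM for 15 seconds failed to reconnect multiple times",
    "2017-10-09 12:36:08.155 SDS_RECONNECTED INFO SDS: SIO-ESX1 (ID b7g3bd870000000a) reconnected",
    "2017-10-09 12:38:41.153 SDS_DECOUPLED ERROR SDS: SIO-ESX1 (id b7g3bd870000000a) decoupled.",
    "2017-10-09 12:38:56.159 SDS_RECONNECTED INFO SDS: SIO-ESX1 (ID b7g3bd870000000a) reconnected",
]

if __name__ == "__main__":
    counts = count_sds_events(SAMPLE)
    for (name, event), n in sorted(counts.items()):
        print(name, event, n)
    print("flapping:", is_flapping(counts, "SIO-ESX1"))
```

An SDS that shows several SDS_DECOUPLED events, each followed by SDS_RECONNECTED events, is a reasonable candidate for a network-path check rather than a disk or service check.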
One of the uplinks on the DVS (distributed virtual switch) was set to 1500 MTU, while the remaining network components were set to 9000 MTU. This was causing communication issues between the SDS and MDM.
This is not a ScaleIO issue; it is an ESXi network configuration issue. Edit the DVS (distributed virtual switch) uplink and set the MTU to 9000 so that it is consistent with the MTU settings on the rest of the network. Then verify that all components in the network path are set to 9000 MTU.
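One common way to verify the path is a don't-fragment ping sized to the full MTU. The sketch below is illustrative only: it assumes IPv4 without IP options (20-byte IP header plus 8-byte ICMP header), the component names in the inventory are hypothetical, and the IP address is the SDS address from the scli output above. It only builds the command strings (`vmkping -d -s` on ESXi, `ping -M do -s` on a Linux SVM); it does not run them.

```python
IPV4_HEADER_BYTES = 20  # assumes no IP options
ICMP_HEADER_BYTES = 8


def df_ping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet of this MTU."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def mismatched_mtus(component_mtus, expected=9000):
    """Return the components whose configured MTU differs from the expected value."""
    return {name: mtu for name, mtu in component_mtus.items() if mtu != expected}


def probe_commands(target_ip, mtu=9000):
    """Build don't-fragment ping commands for an ESXi host and a Linux SVM."""
    size = df_ping_payload(mtu)
    return {
        "esxi": f"vmkping -d -s {size} {target_ip}",
        "linux": f"ping -M do -c 3 -s {size} {target_ip}",
    }


if __name__ == "__main__":
    # Hypothetical inventory; the names are placeholders, not taken from the case.
    inventory = {
        "dvs-uplink-1": 9000,
        "dvs-uplink-2": 1500,
        "vmkernel-port": 9000,
        "svm-data-nic": 9000,
    }
    print("Mismatched:", mismatched_mtus(inventory))
    for platform, command in probe_commands("172.21.55.110").items():
        print(platform, "->", command)
```

If the full-size probe fails while a probe sized for a 1500-byte MTU (payload 1472) succeeds, a component in the path is still limited to the smaller MTU.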