...
Common symptoms of an SMS issue:

- DD-CLI in 'Limited Session'
- Unable to authenticate or interact over the PowerProtect DD System Manager UI (DDSM).
- DD-CLI commands reporting:
    *** Error connecting to management service at "localhost"
- SMS has generated core dumps and is unresponsive
- Invalid or expired license (Locking ID) (see KB 000050243):
    NOTICE: Elicense refresh error: DD_DDBOOST license: **** Invalid locking id of DD_DDBOOST..
    NOTICE: Elicense refresh error: DD_REPLICATION license: **** Invalid locking id of DD_REPLICATION.
SMS issues occur when the SMS Service Queue is overwhelmed by unresponsive or timed-out commands. These time-outs in the Service Queue can have various causes, for example:

- Underlying storage or network issues
- Certificate or registry issues
- Driver or firmware timeouts
- An unresponsive service or daemon, for example due to a memory leak
- An unresponsive platform monitoring stack (for example iDRAC, PTAgent)
- Low capacity in /ddvar

Here is an example in which SMS restarted because the service queue became full and no jobs progressed for 2 hours. In the 'sms.log' file:

    06/15 17:48:42.745 (tid 0x3ab4400): Service Queue ----------- 8 jobs
    06/15 17:48:42.745 (tid 0x3ab4400): job: 2421162, completed: NO, start_time: 1371328844356, end_time: 0, duration: 0 msec, operation: sms_enclosure_get_fans_status
    ...
    06/15 19:51:42.823 (tid 0x3ab4400): INFO: Event posted: 341: EVT-SMS-00001: System management server restarted due to no progress for 120 minutes.

Here the oldest running job was 'sms_enclosure_get_fans_status', with all other service queue slots consumed by other jobs. As a result:

- sms_enclosure_get_fans_status was spawned and passed to lower layers before taking locks
- sms_enclosure_get_fans_status was not able to complete (evidenced by 'completed: NO')
- Other jobs require access to the locks held by sms_enclosure_get_fans_status and therefore cannot run
- Because sms_enclosure_get_fans_status cannot complete, the system is deadlocked until SMS initiates a restart after 2 hours

In this example, the underlying BMC module (also known as iDRAC) had become unresponsive, so SMS was unable to complete the commands it had issued.
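When reviewing a copy of 'sms.log' from a support bundle, the job entries in the excerpt above can be filtered to find the oldest incomplete job. The following is a minimal sketch only: the regular expression is inferred from the single sample line shown in this article, the function and variable names are hypothetical, and the interpretation of start_time as milliseconds since the Unix epoch is an assumption. It is meant to run on a workstation against an extracted log, not on the DD system itself.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from the one "job:" line quoted in this article;
# other DDOS versions may log a different layout (assumption).
JOB_RE = re.compile(
    r"job:\s*(?P<job>\d+),\s*completed:\s*(?P<completed>\w+),\s*"
    r"start_time:\s*(?P<start>\d+),\s*end_time:\s*(?P<end>\d+),\s*"
    r"duration:\s*(?P<duration>\d+)\s*msec,\s*operation:\s*(?P<op>\S+)"
)


def incomplete_jobs(lines):
    """Return jobs logged with 'completed: NO', oldest start_time first."""
    jobs = {}
    for line in lines:
        m = JOB_RE.search(line)
        if m and m.group("completed") == "NO":
            job_id = int(m.group("job"))
            # The same job can appear in several queue dumps; keep one entry.
            jobs[job_id] = {
                "job": job_id,
                "start_ms": int(m.group("start")),
                "operation": m.group("op"),
            }
    return sorted(jobs.values(), key=lambda j: j["start_ms"])


def start_as_utc(job):
    # Assumption: start_time is milliseconds since the Unix epoch.
    return datetime.fromtimestamp(job["start_ms"] / 1000, tz=timezone.utc)


# Sample lines copied from the excerpt in this article.
sample = [
    "06/15 17:48:42.745 (tid 0x3ab4400): Service Queue ----------- 8 jobs",
    "06/15 17:48:42.745 (tid 0x3ab4400): job: 2421162, completed: NO, "
    "start_time: 1371328844356, end_time: 0, duration: 0 msec, "
    "operation: sms_enclosure_get_fans_status",
]

stuck = incomplete_jobs(sample)
if stuck:
    oldest = stuck[0]
    print(oldest["job"], oldest["operation"], start_as_utc(oldest).isoformat())
```

To use it on a real log, pass an open file object instead of the sample list, for example incomplete_jobs(open("sms.log", errors="replace")). The first entry returned is the job most likely to be holding up the queue.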
Check whether SMS is running on the system. If it is not, the UI and command line are unavailable, resulting in errors such as 'cannot contact management service' or commands hanging indefinitely.

Restart the SMS service using the DD-CLI: sms restart

Note: In some DDOS versions, this command is limited to 'SE Mode'. If that is the case, contact Dell Support to investigate and run the command.

SMS timeouts can result in core dumps or unexpected reboots. If so, generate a Support Bundle, gather the relevant core dump, and contact your Technical Support provider to investigate the cause.

Note: Usually, an issue with SMS does not affect backup/restore functionality, as SMS has no bearing on the operation of the DDFS file system.