...
-Memory depletion may be observed over time in the 'ncsshd' process on an XE SDWAN router.
-When the issue occurs, the following can be seen in the cEdge's "show log" output:

cEdge#show log
*Jan 19 11:40:21.740: %DMI-3-NETCONF_SSH_ERROR: R0/0: ncsshd_bp: NETCONF/SSH: error: fork: Cannot allocate memory
*Jan 19 11:41:35.140: %PLATFORM-4-ELEMENT_WARNING: R0/0: smand: RP/0: Used Memory value 89% exceeds warning level 88%.

-On the vManage side, the cEdge may still have a control connection.
-Statistics polling may fail.
-Pushing templates to the cEdge may fail.
-The failure prevents vManage from reaching the cEdge over netconf (port 830).
-The following errors may be observed on vManage:

vManage# vshell
vManage# cd /var/log/nms/
vManage:/var/log/nms# cat vmanage-server.log | grep
22-Jan-2021 00:26:18,107 UTC ERROR [vManage] [NetConfClient] (device-data-collection-31) || Failed to connect to device : XXX.XXX.XXX.XXX Port: 830 user : vmanage-admin error : Connection failed
22-Jan-2021 00:26:18,107 UTC INFO [vManage] [DeviceDataCollectionWorker] (device-data-collection-31) || Failed to collect data from device [XXX.XXX.XXX.XXX] Netconf client error [com.viptela.vmanage.server.device.common.NetConfClientException: java.net.SocketException: Connection reset]
22-Jan-2021 00:30:00,087 UTC ERROR [vManage] [NetConfClient] (device-statistics-collection-queue-0-63) || Failed to connect to device : XXX.XXX.XXX.XXX Port: 830 user : vmanage-admin error : Connection failed

vManage:/var/log/nms# cat vmanage-server-statistics* | grep
22-Jan-2021 13:30:00,089 UTC ERROR [vManage] [StatisticsCollector] (device-statistics-collection-queue-0-85) Failed to get vnf statistics data for device XXX.XXX.XXX.XXX connection failed : java.net.SocketException: Connection reset
22-Jan-2021 13:30:00,094 UTC ERROR [vManage] [StatisticsCollector] (device-statistics-collection-queue-0-85) Failed to get vnf_interfaces statistics data for device XXX.XXX.XXX.XXX connection failed : java.net.SocketException: Connection reset
22-Jan-2021 13:30:00,097 UTC ERROR [vManage] [StatisticsCollector] (device-statistics-collection-queue-0-85) Failed to get System statistics data for device XXX.XXX.XXX.XXX connection failed : java.net.SocketException: Connection reset
22-Jan-2021 13:30:00,109 UTC ERROR [vManage] [StatisticsCollector] (device-statistics-collection-queue-0-85) Failed to get Intf statistics data for device XXX.XXX.XXX.XXX connection failed : java.net.SocketException: Connection reset
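When many devices report to the same vManage, it can help to tally the NetConfClient errors quoted above per device. The following is a minimal sketch, assuming the log has been copied somewhere Python 3 is available; the function name and the regular expression are illustrative and are derived only from the sample lines in this note, not from any documented log format.

```python
import re
from collections import Counter

# Pattern derived from the sample NetConfClient lines in this note (an assumption,
# not a documented format): "Failed to connect to device : <addr> Port: 830 ..."
FAILURE_RE = re.compile(r"Failed to connect to device\s*:\s*(\S+)\s+Port:\s*830\b")


def count_netconf_failures(lines):
    """Return a Counter mapping device address to port-830 connect failures."""
    counts = Counter()
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

For example, passing an open file handle for a copy of vmanage-server.log to count_netconf_failures and printing the result of most_common() lists the devices with the most failed NETCONF connections first. A high count only indicates that port 830 was unreachable; the memory outputs on the cEdge are still needed to confirm this particular issue.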
-The issue may be seen on hardware or software cEdges.
-The router is connected to an SDWAN overlay with normal control connections.
-The following outputs can be observed, which help verify the issue:

cEdge#show platform software status control-processor brief
Memory (kB)
Slot  Status   Total    Used (Pct)     Free (Pct)    Committed (Pct)
RP0   Warning  3783984  3388696 (90%)  395288 (10%)  5263820 (139%)  <<<<=== (Committed usage is too high)

cEdge#show platform software process memory r0 all sorted
Pid    RSS     PSS     Heap    Shared  Private  Name
--------------------------------------------------------------------------
28007  597256  592212  52      5488    591768   ncsshd  <<<=====
27079  575072  484702  224     108592  466480   linux_iosd-imag
20491  174020  155040  4       26680   147340   ucode_pkt_PQF0
21505  146268  66569   84      90260   56008    cpp_cp_svr
23391  140372  139932  110280  692     139680   confd
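The per-process check above can be automated once the command output has been captured as text. The sketch below assumes the column layout shown in this note (six integer columns followed by the process name); the function names are hypothetical, and the layout may differ between releases.

```python
def parse_process_memory(output):
    """Parse rows of 'show platform software process memory r0 all sorted'.

    Assumes the layout shown in this note: Pid, RSS, PSS, Heap, Shared,
    Private (all integers) followed by the process name.
    """
    fields = ("pid", "rss", "pss", "heap", "shared", "private")
    rows = []
    for line in output.splitlines():
        parts = line.split()
        # Data rows start with six integers; headers and rulers do not.
        if len(parts) >= 7 and all(p.isdigit() for p in parts[:6]):
            row = dict(zip(fields, map(int, parts[:6])))
            row["name"] = parts[6]
            rows.append(row)
    return rows


def ncsshd_is_top_consumer(rows):
    """Return True if ncsshd has the largest RSS among the parsed rows."""
    if not rows:
        return False
    return max(rows, key=lambda r: r["rss"])["name"] == "ncsshd"
```

Comparing by RSS rather than relying on row order keeps the check valid even if the captured output was not sorted. What counts as abnormal growth for ncsshd is not stated in this note, so the sketch only reports whether it is the largest consumer.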
Reload the affected cEdge.
-The issue can be verified from vManage's vshell by attempting to connect to the cEdge over netconf on port 830 (may require consent token access):

vManage# vshell
vManage:~# ssh admin@ -p 830 -s netconf
ssh_exchange_identification: read: Connection reset by peer

Or

vManage# vshell
vManage:~# ssh -i /etc/viptela/.ssh/id_dsa -vv -l vmanage-admin -p 830 -s netconf
OpenSSH_7.6p1, CiscoSSL 1.0.2q.6.2.323-fips
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 20: Applying options for *
debug2: resolving "XXX.XXX.XXX.XXX" port 830
debug2: ssh_connect_direct: needpriv 0
debug1: Connecting to XXX.XXX.XXX.XXX [XXX.XXX.XXX.XXX] port 830.
debug1: Connection established.
debug1: permanently_set_uid: 0/0
debug1: identity file /etc/viptela/.ssh/id_dsa type 0
debug1: key_load_public: No such file or directory
debug1: identity file /etc/viptela/.ssh/id_dsa-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_7.6
ssh_exchange_identification: read: Connection reset by peer <<<<======
vManage:~#

-On the cEdge, the netconf session may still show as established with vManage:

cEdge#show netconf-yang sessions
R: Global-lock on running datastore
C: Global-lock on candidate datastore
S: Global-lock on startup datastore

Number of sessions : 1

session-id transport username source-host global-lock
--------------------------------------------------------------------------------
139631 netconf-ssh vmanage-admin x.x.x.x None

cEdge#show platform software yang-management process
confd : Running
nesd : Running
syncfd : Running
ncsshd : Running
dmiauthd : Running
nginx : Running
ndbmand : Running
pubd : Running
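The ssh output above shows the TCP connection being established and then reset before the peer sends its SSH identification string. That same behaviour can be probed without credentials from any host that can reach the cEdge on port 830. This is a minimal sketch using only the Python standard library; the function name and return labels are hypothetical, and it assumes Python 3 is available on the probing host, which may not be the case in vshell.

```python
import socket


def probe_ssh_banner(host, port=830, timeout=5.0):
    """Classify how an SSH-based service answers a plain TCP connection.

    Returns one of:
      "banner"      - peer sent an SSH identification string
      "reset"       - connection was reset before a banner arrived
      "closed"      - peer closed the connection without sending data
      "unreachable" - connect failed, timed out, or unexpected data arrived
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            data = sock.recv(255)
    except ConnectionResetError:
        return "reset"
    except OSError:
        # Covers refused connections, timeouts, and routing failures.
        return "unreachable"
    if data.startswith(b"SSH-"):
        return "banner"
    return "closed" if data == b"" else "unreachable"
```

A "reset" or "closed" result while the control connection is up is consistent with the symptom described here, but it is not proof by itself; the memory outputs on the cEdge remain the primary evidence.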