...
Errors related to networking or host connectivity, including but not limited to:
Failures of backups which appear to have completed actual data transfer
Generalized exhaustion of resources or communications collapse

GSS warning Session information (number hex:hex) registered by user for nsrexecd has expired because a NetWorker daemon had not requested it after 120 minutes
GSS error Session information (number hex:hex) was requested by nsrmmd but the session has expired
RPC severe Unable to query NSR database for list of configured devices: RPC receive operation failed; peer = ip_addr:port, errno = Connection timed out
RPC severe Unable to query NSR database for list of configured devices: RPC send operation failed; peer = ip_addr:port, errno = Broken pipe
NSR notice Chunking ssid ssid failed, because saveset was aborted
ddp_open_file_ext() failed for File: //mtree/vol_dir/nn/nn/long_ssid, Err: 5004-nfs lookup failed (nfs: No such file or directory) ).
NSR critical Connectivity check request is failed for: SN_CONN_REPORT_DD type data_domain device
RPC error RPC client handle: No route to host.
RPC error RPC client handle: Connection refused.
RPC error Unable to create the connection with 'portmapper' to host 'hostname' with address 'ip_addr' at port number 7938.
RPC critical Aborting client connection from ip_addr: Connection timed out.
RPC critical Check whether the firewall is blocking the client ports on the host 'hostname'.
RPC critical Check whether the client services are running on the host 'hostname'.
NetWorker is an application which creates many sockets, both locally and to remote hosts, during regular operations. While the server and storage nodes generally create more, the client configuration can also affect job success.

Keepalives: Every socket is created by a NetWorker caller process to connect to a listener daemon process, and these sockets can be interrupted if left idle for too long by network devices attempting to reclaim resources. Generally, this requires that keepalives are enabled by default for the NetWorker server and storage nodes, and for any clients experiencing issues. NetWorker has its own internal keepalive handling for some (but not all) binaries. The Operating System also has keepalives, which should be engaged by default.

Port availability: Each socket NetWorker sets out to establish requires a port in the ephemeral range to communicate from, but this range is limited by default on all Operating Systems, and should be opened to the maximum extent possible so as not to artificially limit communications. With nsrauth enabled by default, the number of ports required for a single wanted socket will be at least 3, and each failure may be retried rapidly, leaving ports in TIME_WAIT until the connection succeeds. For this reason, the maximum available number of ports should be raised, with TIME_WAIT states ideally lowered.

Other long-running sockets may also be fortified with specific internal software variables which enable higher resiliency or improve buffering.
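The arithmetic behind the keepalive and port availability points can be sketched as follows. This is a minimal illustration, not a NetWorker tool: the function names are hypothetical, the 270/30/10 keepalive values and the 10000-64000 port range are the Linux recommendations given in this article, the 300 second idle timer is the common router default the article mentions, and three ports per socket is the stated nsrauth minimum.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the keepalive and port arithmetic.

# Succeeds when the first keepalive probe (sent after $1 idle seconds)
# goes out before a network device's idle timeout of $2 seconds.
first_probe_ok() {
    keepidle=$1; device_idle_timeout=$2
    [ "$keepidle" -lt "$device_idle_timeout" ]
}

# Approximate seconds before an unresponsive peer is declared dead:
# idle time + (probe interval x probe count).
dead_peer_detection_secs() {
    echo $(( $1 + $2 * $3 ))
}

# Number of ports in an inclusive ephemeral range (low, high).
port_count() {
    echo $(( $2 - $1 + 1 ))
}

# Upper bound on concurrent sockets when each one needs $2 ports.
max_sockets() {
    echo $(( $1 / $2 ))
}

if first_probe_ok 270 300; then
    echo "first probe at 270s precedes the 300s idle timer"
fi
echo "dead peer detected after $(dead_peer_detection_secs 270 30 10)s"
ports=$(port_count 10000 64000)
echo "ephemeral ports available: $ports"
echo "sockets at 3 ports each: $(max_sockets "$ports" 3)"
```

With these inputs the sketch reports 570 seconds to detect a dead peer, 54001 available ports, and at most 18000 concurrent sockets. That last figure is an upper bound only, since ports held in TIME_WAIT reduce it further.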
The following are the usual recommended settings by Operating System and host class, along with their implementation commands. Applicability always varies; settings considered universally desirable are uncommented, while those with more variable suitability are commented, but available for use at need. These settings are provided in good faith as general recommendations but should be reviewed by Operating System administrators before implementation. They are considered the default best practice in all cases for Servers and Storage Nodes. Client suitability may vary, based on configuration and role in any given environment, and should be considered carefully before use, since different application server roles may conflict with the recommended settings. In these cases, settings required by the role should take priority.

Linux: All appropriate settings should be entered in the /nsr/nsrrc file, which must have global read/execute permissions (755) in order to be executed at service startup. The default standard entries are uncommented, with non-standard or circumstantial options commented. Enable or disable settings by removing or adding the # prefix on the relevant lines. Trim the file as relevant for NetWorker clients, nodes or servers, depending on where you will deploy the file. A service restart is required after changes are made.
### LINUX - For all NetWorker hosts - Clients, Nodes and Server
NSR_KEEPALIVE_WAIT=10
export NSR_KEEPALIVE_WAIT
NSR_EXEC_MAX_AUTH_THREADS=50
export NSR_EXEC_MAX_AUTH_THREADS
# NSR_SOCK_BUF_SIZE=65536 # (262144 for 10 Gb ETH NICs)
# export NSR_SOCK_BUF_SIZE

# NetWorker internal keepalive settings for some, but not all binaries - 4.5 minutes to ensure keepalives are passed before the increasingly common 5 minute router idle socket kill timer
NW_TCP_KEEPIDLE_SECS=270
export NW_TCP_KEEPIDLE_SECS
NW_TCP_KEEPINTVL_SECS=30
export NW_TCP_KEEPINTVL_SECS
NW_TCP_KEEPCNT=10
export NW_TCP_KEEPCNT

# OS-level keepalive values - also set to 4.5 minutes for the same reason
sysctl -w "net.ipv4.tcp_keepalive_intvl=30"
sysctl -w "net.ipv4.tcp_keepalive_probes=10"
sysctl -w "net.ipv4.tcp_keepalive_time=270"

# Set soft limits for open file descriptors and to ensure core dump generation
ulimit -Sn 262144
ulimit -Sc unlimited

### For NetWorker Storage Nodes and Server
# Set kernel limits to provide maximum file descriptor availability
ulimit -Hn 262144
ulimit -Hc unlimited

# Globally disable IPv6, if it is not necessary for operation:
# sysctl -w "net.ipv6.conf.all.disable_ipv6=1"

# Disable dynamic TCP window scaling - requires compatible equipment in the data path, as well as ECN
sysctl -w "net.ipv4.tcp_window_scaling=0"
sysctl -w "net.ipv4.tcp_ecn=0"

# Raise connection backlog (hash tables) to the maximum value allowed if desired
# sysctl -w "net.ipv4.tcp_max_syn_backlog=8192"
# sysctl -w "net.core.netdev_max_backlog=8192" # (For 10 Gb Eth use the value = 30000)

# Raise memory size available for TCP buffers as needed
# sysctl -w "net.core.rmem_default=262144"
# sysctl -w "net.core.wmem_default=262144"
# sysctl -w "net.core.rmem_max=16777216"
# sysctl -w "net.core.wmem_max=16777216"
# sysctl -w "net.ipv4.tcp_rmem=8192 524288 16777216"
# sysctl -w "net.ipv4.tcp_wmem=8192 524288 16777216"

# Increase shared memory pool if required - particularly for immediate mode on Storage Nodes
# sysctl -w "kernel.shmmax=2147483648" # - e.g. 2 GB, in bytes
# sysctl -w "kernel.shmall=524288" # - e.g. 2 GB, in 4 KiB pages (shmall is counted in pages, not bytes)

# Available TCP client ephemeral port range increase from default:
sysctl -w "net.ipv4.ip_local_port_range=10000 64000"

# Enable TCP Time Wait Reuse for very high load servers and nodes to increase socket reuse availability
# (tcp_tw_recycle was removed in Linux kernel 4.12; omit that line where the key does not exist)
# (on recent kernels, tcp_tw_reuse=1 applies to all outgoing connections and 2 to loopback traffic only)
sysctl -w "net.ipv4.tcp_tw_recycle=0"
sysctl -w "net.ipv4.tcp_tw_reuse=2"

# Lower the FIN-WAIT-2 timeout to close connections more quickly. This may not be necessary in concert with tw_reuse.
# sysctl -w "net.ipv4.tcp_fin_timeout=30"

# NFS I/O concurrency:
sysctl -w "sunrpc.tcp_slot_table_entries=128"
sysctl -w "sunrpc.udp_slot_table_entries=128"

### For NetWorker Server only
# Settings to increase device resilience for cloud operations or other potentially high-latency devices
# NSR_DEVOP_TIMEOUT=3600
# export NSR_DEVOP_TIMEOUT
# NSR_DEVOP_POLLING_INTERVAL=600
# export NSR_DEVOP_POLLING_INTERVAL
# NSR_DEVOP_INQUIRY_TIMEOUT=900
# export NSR_DEVOP_INQUIRY_TIMEOUT

### Media database tunables
# NSR_TCP_READ_LONG_WAIT=Y
# export NSR_TCP_READ_LONG_WAIT
# NSR_MAX_MEDIADB_RETRY=10
# export NSR_MAX_MEDIADB_RETRY
# MMDB_SQLITE_CONFIGURE_MEMORY=1
# export MMDB_SQLITE_CONFIGURE_MEMORY
# MMDB_SQLITE_PAGECACHE_SIZE=65536
# export MMDB_SQLITE_PAGECACHE_SIZE
# MMDB_SQLITE_PAGE_COUNT=65536
# export MMDB_SQLITE_PAGE_COUNT
# MMDB_SQLITE_HEAP_SIZE=1073741824
# export MMDB_SQLITE_HEAP_SIZE
# MDB_SQLITE_HEAP_MIN_ALLOC_SIZE=128
# export MDB_SQLITE_HEAP_MIN_ALLOC_SIZE

Windows: Since the /nsr/nsrrc file does not currently exist for Windows, changes must be applied using a batch file, e.g. nsrrc.bat, or another deployment method. Commands are provided here where a command-driven option exists. These changes are global and will not need to be run repeatedly. As with the Linux nsrrc file, the default standard entries are uncommented, with non-standard or circumstantial options commented. Enable or disable settings by removing or adding the REM prefix on the relevant lines.
Trim the file as relevant for NetWorker clients, nodes or servers, depending on where you will deploy the file. A service restart is required after changes are made.

REM ### WINDOWS - For all NetWorker hosts - Clients, Nodes and Server
REM # TCP window size tuning - greater throughput / Data Domain
REM reg add HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AFD\Parameters /v DefaultSendWindow /t REG_DWORD /d 262144 /f
REM reg add HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AFD\Parameters /v DefaultReceiveWindow /t REG_DWORD /d 262144 /f
REM reg add HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v GlobalMaxTcpWindowSize /t REG_DWORD /d 262144 /f
REM reg add HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v TcpWindowSize /t REG_DWORD /d 262144 /f

REM # Global keepalive registry settings - 270s to fall below common idle socket timer kills of 300s
reg add HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v KeepAliveTime /t REG_DWORD /d 270000 /f
reg add HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v KeepAliveInterval /t REG_DWORD /d 10000 /f
reg add HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v TcpMaxDataRetransmissions /t REG_DWORD /d 10 /f

REM # Global NetWorker keepalive and connectivity variables
setx /m NW_TCP_KEEPIDLE_SECS 270
setx /m NW_TCP_KEEPINTVL_SECS 30
setx /m NW_TCP_KEEPCNT 10
setx /m NSR_KEEPALIVE_WAIT 10
setx /m NSR_EXEC_MAX_AUTH_THREADS 50
REM # Use 262144 instead of 65536 for 10 Gb Eth NICs
REM setx /m NSR_SOCK_BUF_SIZE 65536

REM ### For NetWorker Storage Nodes and Server
REM # Standard TCP features - disable in case of disconnections
REM netsh interface tcp set global rss=disabled
REM netsh interface tcp set global autotuning=disabled
REM netsh interface tcp set global ecncapability=disabled
REM netsh interface tcp set global timestamps=default

REM # Port range availability for TCP client callers
netsh int ipv4 set dynamicport tcp start=10000 num=54000
netsh int ipv4 set dynamicport udp start=10000 num=54000
netsh int ipv6 set dynamicport tcp start=10000 num=54000
netsh int ipv6 set dynamicport udp start=10000 num=54000

REM # Global port maximum (deprecated) and TIME_WAIT window
REM reg add HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v MaxUserPort /t REG_DWORD /d 65535 /f
reg add HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v TcpTimedWaitDelay /t REG_DWORD /d 30 /f

REM # Disable IPv6 if not required
REM reg add HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip6\Parameters /v DisabledComponents /t REG_DWORD /d 0x000000ff /f

REM ### For NetWorker Server only
REM # Settings to increase device resilience for cloud operations or other potentially high-latency devices
REM setx /m NSR_DEVOP_TIMEOUT 3600
REM setx /m NSR_DEVOP_POLLING_INTERVAL 600
REM setx /m NSR_DEVOP_INQUIRY_TIMEOUT 900

REM ### Settings for media database tuning
REM setx /m NSR_TCP_READ_LONG_WAIT Y
REM setx /m NSR_MAX_MEDIADB_RETRY 10
REM setx /m MDB_SQLITE_HEAP_MIN_ALLOC_SIZE 128
REM setx /m MMDB_SQLITE_CONFIGURE_MEMORY 1
REM setx /m MMDB_SQLITE_HEAP_SIZE 1073741824
REM setx /m MMDB_SQLITE_PAGE_COUNT 65536
REM setx /m MMDB_SQLITE_PAGECACHE_COUNT 65536
REM setx /m MMDB_SQLITE_TMP path_to_temp_dir
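
On Linux hosts, whether the recommended kernel values are actually in effect after a service restart can be checked without changing anything. The following is a minimal read-only sketch, assuming the standard /proc/sys interface. The check_sysctl helper and the SYSCTL_ROOT variable are hypothetical names introduced for illustration, and the expected values are the uncommented Linux recommendations from this article.

```shell
#!/bin/sh
# Read-only comparison of kernel settings against expected values.
# check_sysctl is a hypothetical helper, not a NetWorker command.
# SYSCTL_ROOT (also hypothetical) defaults to /proc/sys and can be
# pointed at another directory for testing.
SYSCTL_ROOT=${SYSCTL_ROOT:-/proc/sys}

check_sysctl() {
    key=$1; expected=$2
    # A sysctl key maps to a file path: dots become slashes.
    path="$SYSCTL_ROOT/$(echo "$key" | tr . /)"
    if [ ! -r "$path" ]; then
        echo "SKIP $key (not present on this kernel)"
        return 0
    fi
    # Normalize tabs and repeated spaces so multi-value keys compare cleanly.
    actual=$(awk '{$1=$1; print}' "$path")
    if [ "$actual" = "$expected" ]; then
        echo "OK   $key = $actual"
    else
        echo "DIFF $key = $actual (expected $expected)"
    fi
}

check_sysctl net.ipv4.tcp_keepalive_time 270
check_sysctl net.ipv4.tcp_keepalive_intvl 30
check_sysctl net.ipv4.tcp_keepalive_probes 10
check_sysctl net.ipv4.ip_local_port_range "10000 64000"
check_sysctl net.ipv4.tcp_window_scaling 0
check_sysctl net.ipv4.tcp_ecn 0
```

A key reported as SKIP is absent on the running kernel, which may be the case for settings that depend on a kernel version or a loaded module. A DIFF line indicates that the nsrrc entry was not applied or was overridden afterwards.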