The NetWorker server service (nsrd) takes a long time to start. This can look as though the NetWorker server is unresponsive, unavailable, or not coming up. The NetWorker service has been started; however, the server is not accessible through NetWorker interfaces such as the NetWorker Management Console (NMC), NetWorker Web User Interface (NWUI), or the nsradmin command-line utility.

The server's daemon.raw reports that it is at "step 3 of 5" during service startup. This step is "Checking resource types in the RAP database."

Linux: /nsr/logs/daemon.raw
Windows: <Install Drive>:\Program Files\EMC NetWorker\nsr\logs\daemon.raw

NOTE: The .raw file must be rendered to be analyzed properly. See: NetWorker: How to use nsr_render_log to render .raw log files

For example:

    83273 MM/DD/YYYY 03:51:16 PM nsrd NSR notice Startup in process (step 3 of 5); checking resource types in the RAP database...
    83278 MM/DD/YYYY 03:52:00 PM nsrd NSR notice Checking resource types in the RAP database (122 resources completed)...
    83278 MM/DD/YYYY 03:52:40 PM nsrd NSR notice Checking resource types in the RAP database (124 resources completed)...
    83278 MM/DD/YYYY 03:55:29 PM nsrd NSR notice Checking resource types in the RAP database (341 resources completed)...
    83278 MM/DD/YYYY 03:56:11 PM nsrd NSR notice Checking resource types in the RAP database (372 resources completed)...
    83278 MM/DD/YYYY 03:56:52 PM nsrd NSR notice Checking resource types in the RAP database (392 resources completed)...
    83278 MM/DD/YYYY 03:57:33 PM nsrd NSR notice Checking resource types in the RAP database (417 resources completed)...
    83278 MM/DD/YYYY 03:58:13 PM nsrd NSR notice Checking resource types in the RAP database (449 resources completed)...
    83278 MM/DD/YYYY 03:58:54 PM nsrd NSR notice Checking resource types in the RAP database (457 resources completed)...
    83278 MM/DD/YYYY 03:59:39 PM nsrd NSR notice Checking resource types in the RAP database (602 resources completed)...
    83278 MM/DD/YYYY 04:00:20 PM nsrd NSR notice Checking resource types in the RAP database (612 resources completed)...
    83278 MM/DD/YYYY 04:01:01 PM nsrd NSR notice Checking resource types in the RAP database (658 resources completed)...
    83278 MM/DD/YYYY 04:01:42 PM nsrd NSR notice Checking resource types in the RAP database (660 resources completed)...
    83278 MM/DD/YYYY 04:02:23 PM nsrd NSR notice Checking resource types in the RAP database (683 resources completed)...
    83278 MM/DD/YYYY 04:03:04 PM nsrd NSR notice Checking resource types in the RAP database (686 resources completed)...

NOTE: This output has been edited to show only the "Checking resource types" lines.

The resource check is slow. In some instances, it takes several minutes to check a couple of resources. The amount of time can vary depending on the size of the environment and the scope of the issue. For example, in an environment with thousands of resources in the RAP database, it could take 60+ minutes for NetWorker to start. In most instances, the nsrd service eventually starts (if no other issues are present); however, the RAP resource check should complete much faster than this.
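The progress shown in the excerpt above can be quantified by measuring the time between consecutive "resources completed" lines. The following is a minimal sketch, not a NetWorker tool: the function name rap_check_rate is hypothetical, and it assumes the rendered log uses the field layout shown in the excerpt (message ID, date, 12-hour time, AM/PM) and that all lines fall on the same day.

```shell
#!/bin/sh
# rap_check_rate (hypothetical helper): for each "resources completed" line in a
# rendered daemon log, print how many resources were checked since the previous
# such line and how many seconds that took.
# Assumes field 3 is a 12-hour HH:MM:SS time and field 4 is AM or PM.
# Does not handle a startup that crosses midnight.
rap_check_rate() {
    awk '
    /resources completed\)/ {
        split($3, t, ":")
        h = t[1] % 12                      # 12 AM -> 0, 12 PM -> 12 after the next line
        if ($4 == "PM") h += 12
        now = h * 3600 + t[2] * 60 + t[3]  # seconds since midnight
        match($0, /\([0-9]+ resources completed\)/)
        count = substr($0, RSTART + 1, RLENGTH - 1) + 0   # leading digits only
        if (seen)
            printf "%s %s: +%d resources in %d s\n", $3, $4, count - prev_count, now - prev_time
        prev_time = now; prev_count = count; seen = 1
    }' "$1"
}

# Example usage (path to the rendered log is an assumption):
#   rap_check_rate /nsr/logs/daemon.log
```

Intervals where only a few resources complete in tens of seconds point to the time window worth examining once nsrd debug is enabled.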
Enable nsrd debug:

    dbgcommand -n nsrd Debug=9

With nsrd debug enabled, the following errors are observed in the rendered daemon.raw:

    0 MM/DD/YYYY 04:07:24 PM nsrd NSR notice 12/12/24 16:07:24.829153 nsrd-D5 lg_inet_getaddrinfo(): pass-through ... calling external libc getaddrinfo() ...
    0 MM/DD/YYYY 04:07:25 PM nsrd NSR notice 12/12/24 16:07:25.360793 nsrd-D5 lg_inet_getaddrinfo(): EXIT rc=-2 output cannonname = null output addr = retval error
    0 MM/DD/YYYY 04:07:25 PM nsrd NSR notice 12/12/24 16:07:25.360904 nsrd-D7 lookup_name of host CLIENT_1 (in microsecond) took 20099719, CR 2, NF 1, getaddrinfo 20099716
    0 MM/DD/YYYY 04:07:25 PM nsrd NSR notice 12/12/24 16:07:25.360947 nsrd-D5 lg_inet_getaddrinfo(): ENTER input host= CLIENT_1 input service=NULL input hints-flags=0x0002 AI_PASSIVE=0 AI_NUMERICHOST=0 AI_NUMERICSERV=0 AI_CANONNAME=1
    0 MM/DD/YYYY 04:07:25 PM nsrd NSR notice 12/12/24 16:07:25.361070 nsrd-D5 lg_inet_getaddrinfo(): pass-through ... calling external libc getaddrinfo() ...
    0 MM/DD/YYYY 04:07:25 PM nsrd NSR notice 12/12/24 16:07:25.542699 nsrd-D5 lg_inet_getaddrinfo(): EXIT rc=-2 output cannonname = null output addr = retval error
    0 MM/DD/YYYY 04:07:25 PM nsrd NSR notice 12/12/24 16:07:25.542799 nsrd-D5 lg_inet_getaddrinfo(): ENTER input host= CLIENT_2 input service=NULL input hints-flags=0x0002 AI_PASSIVE=0 AI_NUMERICHOST=0 AI_NUMERICSERV=0 AI_CANONNAME=1
    0 MM/DD/YYYY 04:07:25 PM nsrd NSR notice 12/12/24 16:07:25.542835 nsrd-D5 lg_inet_getaddrinfo(): pass-through ... calling external libc getaddrinfo() ...
    0 MM/DD/YYYY 04:07:25 PM nsrd NSR notice 12/12/24 16:07:25.543600 nsrd-D5 lg_inet_getaddrinfo(): EXIT rc=-2 output cannonname = null output addr = retval error
    0 MM/DD/YYYY 04:07:25 PM nsrd NSR notice 12/12/24 16:07:25.543638 nsrd-D7 lookup_name of host CLIENT_2 (in microsecond) took 182701 , CR 3, NF 1, getaddrinfo 182697

NOTE: The log reports "retval error," indicating that name resolution failed to return an address for the host referenced by the RAP resource.

nsrd debug can be disabled with:

    dbgcommand -n nsrd Debug=0

The causes for this symptom can typically be attributed to:

Name resolution: Many clients do not resolve properly in the Domain Name System (DNS) by Fully Qualified Domain Name (FQDN), short name, or reverse lookup (IP). Alternatively, there may be no DNS and system hosts files are used, and the system hosts file contains an incorrect IP address for a host or is formatted incorrectly.

Decommissioned clients: Many decommissioned clients still exist in the NetWorker server's media database (mm). This means that backups of the clients still exist even if the client no longer exists in the server configuration (nsrdb).
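To see which hosts contribute most to the delay, the "lookup_name of host" debug lines can be ranked by how long each lookup took. This is a rough sketch under stated assumptions: the function name slow_lookups is hypothetical, and the parsing relies only on the words "host" and "took" appearing as they do in the excerpt above; other NetWorker versions may format these lines differently.

```shell
#!/bin/sh
# slow_lookups (hypothetical helper): list "<microseconds> <host>" for each
# lookup_name debug line in a rendered daemon log, slowest first.
#   $1 = rendered log file
#   $2 = optional minimum duration in microseconds (default 0)
slow_lookups() {
    awk -v min="${2:-0}" '
    /lookup_name of host/ {
        host = ""; us = ""
        for (i = 1; i < NF; i++) {
            if ($i == "host") host = $(i + 1)
            if ($i == "took") { us = $(i + 1); sub(/,$/, "", us) }  # drop trailing comma
        }
        if (host != "" && us + 0 >= min + 0) print us + 0, host
    }' "$1" | sort -rn
}

# Example usage (path is an assumption): lookups that took one second or longer
#   slow_lookups /nsr/logs/daemon.log 1000000
```

In the excerpt, CLIENT_1 took about 20 seconds for a single lookup; hosts of that kind are the first candidates for DNS or hosts file correction.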
Correct any name resolution issues, such as:

- The system hosts file contains a large number of addresses.
    Linux: /etc/hosts
    Windows: C:\Windows\System32\Drivers\etc\hosts
- Incorrect hosts file entries (an IP address pointing to the wrong hostname or aliases).
- The same IP address specified on multiple lines.
- Entries for clients which have been decommissioned from NetWorker.
- DNS issues: NetWorker resources which are not in the system hosts file and also do not resolve completely (FQDN, short name, IP) in DNS.

1. Stop NetWorker server services from an Administrator PowerShell prompt or root shell.
    Linux: nsr_shutdown
    Windows: net stop nsrexecd /y

2. Ensure all NetWorker services have stopped:
    Linux: ps -ef | grep nsr
    Windows: tasklist | findstr nsr

3. Rename the daemon.raw file.
    Linux Path: /nsr/logs/daemon.raw
    Windows Path: <Install Drive>:\Program Files\EMC NetWorker\nsr\logs\daemon.raw

4. Start the NetWorker server service.
    Linux: systemctl start networker
    Windows: net start nsrd

5. Put nsrd in debug level 9.
    dbgcommand -n nsrd Debug=9
   NOTE: This command is included with NetWorker and is not OS-specific. The same command can be used on Windows and Linux.

6. Wait for NetWorker services to start. Once they have started, render the NetWorker server's daemon.raw. See: NetWorker: How to use nsr_render_log to render .raw log files

7. Look for all hosts that returned "lg_inet_getaddrinfo .... output addr = retval error". For example, on a Linux server you can use grep to return each retval error line, followed by the line which identifies the resource that reported the error. The output can be redirected to a file for further review.
    cat /nsr/logs/daemon.log | grep nsrd | grep -A1 "retval error" > /tmp/nsr_retvalerr.out
   Windows hosts require other tools or scripting (outside the scope of NetWorker support) to format the data.

8. Using the information from the daemon.log, identify which hosts are having name resolution issues. See: NetWorker: Name Resolution Troubleshooting Best Practices.
   - Fix the DNS configuration so that the reported clients resolve correctly. Ensure that DNS records exist for the systems reporting retval error. Ensure the NetWorker server is configured to use the DNS servers where the DNS records exist.
   - Check the NetWorker server hosts file. If the hosts reporting a "retval error" during nsrd startup exist in the hosts file, they must have the correct IP addresses and aliases.
   - If the hosts reporting a "retval error" are not in the hosts file and cannot be resolved through DNS, hosts file entries must be created for the IP address and hostname aliases used by the host.
   - If there are hosts file entries for clients which have been decommissioned (no longer online), false entries can be created using fake IP addresses (for example, 1.1.1.1, and so forth).
   - The hosts file must be formatted correctly: the same IP does not appear on multiple different lines, and the same host is not referenced by multiple IPs.

Decommissioned Deleted Clients:

CAUTION: The following action plan must be performed by the backup administrator with careful review. Performing these steps incorrectly can result in data loss. Contact Dell NetWorker support for assistance in reviewing the procedure and outputs. The NetWorker backup administrator must validate the data and perform any actions necessary to complete this procedure.

Decommissioned or deleted clients appear when the client host no longer exists. The client was deleted from NetWorker, and all backups have expired; however, there is still a NetWorker clientid in the media database for the client. NetWorker still tries to check this client during the RAP consistency check and is unable to resolve or connect to it. The client reports the "retval error" described earlier.

1. Use the nsrclientfix command-line utility to check for clients which have a clientid registered in the media database but are not in the nsrdb and have no backups:
    nsrclientfix -a nsrclientfix1.out -p
   This command returns a list of clients where a clientid issue is observed. Any line reporting only a single name (no comma-separated names) is for a client which still has a clientid in the media database but for which no NetWorker client exists and no backups exist.

2. Edit the file so that it only includes lines that contain a single hostname (no comma-separated names). This can be done in a text editor such as Notepad or using commands. For example, on a Linux host run:
    grep -v ",\|#" nsrclientfix1.out > nsrclientfix1.in
   This creates a new file called nsrclientfix1.in which only contains the single-host lines from the nsrclientfix1.out file.
   CAUTION: You must delete any line that includes comma-separated values. The file should only include lines which list a single client name. When multiple entries exist on the same line, this indicates that one or more NetWorker clients share the same clientid. These can be merged; however, this should only be done if the clients represent the same system. Further review and validation must be done before leaving them in the file.

3. Before rerunning nsrclientfix, verify that no save sets exist for these clients with:
    mminfo -avot -q client=CLIENT_NAME
   The expectation is that if nsrclientfix reported a single client host, no backups exist.
   NOTE: If save sets are found and must be kept, remove the client from the nsrclientfix1.in file. Ensure the NetWorker server has a hosts file entry for the client. Confirm the NetWorker client exists, even if the client host no longer does. See: NetWorker: How to do a file level restore for a deleted/decommissioned client. If save sets are found but are not needed, the host can be left in the file. Proceeding with the next steps removes anything related to the client from NetWorker.

4. After carefully reviewing the nsrclientfix output, remove the decommissioned or deleted clientids from the media database:
    nsrclientfix -u nsrclientfix1.in

5. Run nsrclientfix -a again and confirm that no single-host lines are returned:
    nsrclientfix -a nsrclientfix2.out -p
   A line with multiple names occurs when save sets exist under each name and the names appear to match the same host. This requires further review to confirm which client actually exists and which name the save sets should be merged into. See: NetWorker: How to use the nsrclientfix tool

6. When running mminfo, you may observe many "cannot get client ID map for client ID '######-#####-######-####'" messages:
    mminfo: Cannot get client ID map for client ID '687d2265-00000004-5f75cac3-5f7d87c7-918c5000-277baf56'.

7. These save sets must be removed from NetWorker. Run the following command to collect the output:
    Linux: mminfo -avot | grep index
    Windows: mminfo -avot | findstr index
   The deletion procedure is outlined in NetWorker: How to delete Multiple or Individual SSIDs
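The hosts file formatting rules from the name resolution steps (the same IP must not appear on multiple lines, and the same host must not be referenced by multiple IPs) can be checked mechanically on a Linux NetWorker server. The following is a minimal sketch, not a NetWorker utility: the function name check_hosts_file is hypothetical, and it assumes the standard hosts file layout of an IP address followed by one or more names, with "#" starting a comment.

```shell
#!/bin/sh
# check_hosts_file (hypothetical helper): report IP addresses that appear on
# more than one line and names that are mapped to more than one IP address.
#   $1 = hosts file to check (defaults to /etc/hosts)
check_hosts_file() {
    awk '
    { sub(/#.*/, "") }      # strip comments; awk re-splits the fields afterwards
    NF < 2 { next }         # skip blank lines and lines without a name
    {
        ip = $1
        if (ip in ip_line)
            printf "duplicate IP %s (lines %d and %d)\n", ip, ip_line[ip], NR
        else
            ip_line[ip] = NR
        for (i = 2; i <= NF; i++) {
            name = tolower($i)          # hostnames are case-insensitive
            if ((name in name_ip) && name_ip[name] != ip)
                printf "name %s maps to %s and %s\n", name, name_ip[name], ip
            else
                name_ip[name] = ip
        }
    }' "${1:-/etc/hosts}"
}

# Example usage:
#   check_hosts_file /etc/hosts
```

A name such as localhost that is listed for both an IPv4 and an IPv6 loopback address is also reported by this sketch; such entries are normally intentional and need review rather than removal.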