

NetWorker savesets marked Expired but not removed
Space recovery messages appear in logs more than once per day
Data Domain speed and load impacts
General server performance impacts

Volumes eligible for space recovery are being read during the Expiration action (staging, cloning, or recovering)
Space recovery runs by default after every staging operation on any given volume
Space recovery checks each file in a volume's directory structure when it runs
Server operations and responsiveness may slow down during the space recovery phase

NetWorker's space recovery phase runs once a day as one of the final phases of the Expiration action in the Server backup workflow. It is intended to delete saveset file objects within a volume after the server has assessed, expired, and deleted the saveset records it calculates are safe to remove according to their configuration. Several factors can adversely affect Data Domain or NetWorker server responsiveness. Enable whichever of the options below suit the requirements of the datazone in question.

Before considering testing with the debug keyfiles below: Disable the daily Server Protection > Server backup > Expiration action to disable all recover space and media database calculations for one or more days. This confirms whether the performance issues encountered are related to space recovery and/or expiration activities.

If disabling Expiration confirms that the issue is related to daily maintenance, the following features can be disabled for troubleshooting by creating an empty file of the same name (without an extension) on the NetWorker server or storage node, in the debug subdirectory of the main nsr directory. None of these flag files requires a restart; they take effect for recover space jobs launched while they are present.

Linux Location: /nsr/debug
Windows Location: C:\Program Files\EMC NetWorker\nsr\debug (or corresponding nsr installation path)

NOTE: Not all of the tunables listed here are available in NetWorker versions earlier than 19.8.0.4.

The file names and their functions are detailed below:

skip_recover_space_for_stage
Storage nodes. This flag causes NetWorker to skip the recover space phase of a staging operation (cloning followed by source deletion). If your environment uses staging, particularly staging from the same source volumes repeatedly, this flag is recommended because it removes the possibility of spawning multiple recover space operations for the same volumes. When this flag is in place, the recover space operation is deferred entirely, and the files are deleted when the daily Expiration action runs or when the nsrim command is run manually.

recover_space_anytime
Server only. This flag allows recover space to expire and remove savesets on volumes which are actively being read. By default, this is deferred, so for volumes with long-running clone jobs, expiry and space recovery can be deferred repeatedly whenever the Expiration action, nsrim, or a staging job (see previous) runs. This in turn can lead to large space recovery backlogs, gradual free space depletion, and a larger space recovery job when it is finally allowed to run.

skip_disk_usage
Storage nodes. As part of space recovery and disk volume file system checking, individual files are by default recursively checked and counted to produce a precise aggregate of data for the volume. While some may consider this precision essential, skipping this step relies instead on NetWorker's media database records for the file and byte totals, which can usually be expected to be accurate enough for most uses. On a heavily loaded Data Domain, especially one where many recover space operations run repeatedly for the same volumes, the check can be considered a needless expense and safely disabled.

skip_consistency_check_in_recover_space
Storage nodes. During space recovery for a volume, the volume file system is checked file by file to ensure consistency with the media database; this can also introduce latency. Adding this keyfile to a node prevents that node from deleting saveset files where a corresponding record does not exist in the media database, and from marking media database records as 'suspect' where no file is found. Note that this prevents the normal cleaning operations. Use it to help qualify latency related to recover space operations, and do not leave it in place long term.
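
The flag-file mechanism described above can be sketched as a few shell helpers. This is a minimal illustration, not a NetWorker utility: the function names and the NSR_DEBUG_DIR variable are hypothetical, and the default path assumes the Linux location given in this article.

```shell
#!/bin/sh
# Sketch only: helper names and NSR_DEBUG_DIR are illustrative, not NetWorker interfaces.
# Default to the Linux debug directory named in the article; override for other install paths.
NSR_DEBUG_DIR="${NSR_DEBUG_DIR:-/nsr/debug}"

# Create an empty, extensionless flag file (enables the tunable for later recover space jobs).
enable_flag() {
    touch "${NSR_DEBUG_DIR}/$1"
}

# Remove the flag file (restores the default behavior for later recover space jobs).
disable_flag() {
    rm -f "${NSR_DEBUG_DIR}/$1"
}

# Print the name of each flag from this article that is currently present.
list_flags() {
    for f in skip_recover_space_for_stage recover_space_anytime \
             skip_disk_usage skip_consistency_check_in_recover_space; do
        [ -e "${NSR_DEBUG_DIR}/$f" ] && echo "$f"
    done
    return 0
}
```

For example, running enable_flag skip_recover_space_for_stage on a storage node creates the keyfile, and disable_flag skip_recover_space_for_stage removes it once troubleshooting is finished. Keeping the directory in one variable makes it straightforward to point the helpers at a non-default installation path.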

More verbose logging has been introduced by default, causing entire saveset paths to be logged in the data_audit logs on the NetWorker server. Where there is already heavy load or many large space recovery jobs, this can contribute to unresponsiveness, particularly from storage nodes, which return the information remotely to the NetWorker server. To disable this, raise the logging threshold for these logs on the NetWorker server:

# nsradmin
nsradmin> show name; auditlog severity
nsradmin> print type: nsr auditlog

If wanted, restrict this change to the data audit log only by refining the query to that specific instance by name. Skip this refinement to apply the setting to every audit log resource:

nsradmin> print type: nsr auditlog; name: servername_data_audit.raw

Change the threshold of one or both to 'Error' to stop logging the individual deletes; deletions are still logged in the server's daemon.raw.

nsradmin> update auditlog severity: Error
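
The nsradmin steps above can be collected into a command file for review before they are applied. The sketch below only writes the commands quoted in this article to a file; write_auditlog_cmds is a hypothetical helper, and the optional resource name is whatever the print query shows in your datazone.

```shell
#!/bin/sh
# Sketch only: write_auditlog_cmds is a hypothetical helper, not a NetWorker tool.
# $1 = output file
# $2 = optional auditlog resource name (for example servername_data_audit.raw);
#      when omitted, the query matches every nsr auditlog resource.
write_auditlog_cmds() {
    out="$1"
    name="${2:-}"
    query="print type: nsr auditlog"
    if [ -n "$name" ]; then
        query="${query}; name: ${name}"
    fi
    {
        echo "show name; auditlog severity"      # limit displayed attributes
        echo "$query"                            # select the resource(s) to change
        echo "update auditlog severity: Error"   # raise the logging threshold
    } > "$out"
}
```

The resulting file can be read through and its lines entered in an interactive nsradmin session. Feeding the file to nsradmin non-interactively (for instance through an input-file option) is an assumption to check against the nsradmin documentation for the installed version.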