...
This article focuses on:
  Avamar clients that back up file systems or databases to an Avamar server or Data Domain back end.
  L1 backups, where the initial backup has completed and a full backup is present on the Avamar server.

Why optimize client backup performance?
  To ensure that individual backups can reliably complete within the backup window.
  To minimize unnecessary load on an Avamar client's hardware resources.
  To make efficient use of backup sessions and reduce backup queueing.
  To avoid overlap with maintenance. When backups overlap with maintenance activities, ALL activities run slower.
  To provide a period of quiet time for the hash-referenced bit maps to reset. (GC cannot clean up certain hashes because the Avamar v7.x "Hash Referenced Bit Maps" are not resetting, leading to skipped hashes.)

Typical symptoms of slow backup performance:
  The backup fails to complete within the scheduled window. The activity monitor reports "Client time out - end".
  The backup does not get a chance to start before the scheduled window ends. The activity monitor reports "Client time out - start".
  Garbage collection regularly fails with MSG_ERR_BACKUPSINPROGRESS or MSG_ERR_TRYAGAINLATER.

Understanding what happens during an Avamar backup from a performance perspective:
  A detailed explanation of what happens in the background to influence Avamar client backup performance and behavior can be found in:
  Avamar client backup performance - behavior and theory
See Resolution for a list of causes.
Gather information:
  Gather detailed information about the issue. See Information required by Support when investigating Avamar client backup performance issues.

Determine which part of the backup chain has the most severe bottleneck:
  The following schematic shows the main components in a backup system.
  Bottlenecks ALWAYS exist, but we should work to understand where they are. If we can identify and mitigate a bottleneck, performance should improve.
  Once a bottleneck is mitigated, another bottleneck may become apparent. The end goal is to reach a situation where backup duration is acceptable.

Avamar server-side bottlenecks:
  If ALL backups to an Avamar server are slow, consider the possibility of a server-side issue.
  If ALL backups to an Avamar server are slow during certain times of day, consider server-side contention or a network bottleneck.
  If there is a performance issue with one or a few backup clients, focus on each client by itself.

Server Health:
  A healthy Avamar server is unlikely to be a bottleneck for backups. Check the health of the backup server.
  See Avamar: How to Run the proactive_check.pl health check script on an Avamar Server
  If backups are being sent to Data Domain, check the DD Autosupport information or engage Data Domain support to verify that it is healthy.
  Avamar restricts client connections to preserve acceptable levels of performance.
  See Avamar: How many simultaneous client sessions can be made to the Avamar server? (versions 6.1 and later)

Server Contention:
  If there are times of day when backup performance is poor, this might indicate contention.
  The sched.sh script can give a visual representation of activities which were running in parallel with the slow backup.
  See Avamar: How to use the sched.sh script to check historical backup, replication and maintenance activity on an Avamar Server.
  Check for in-progress maintenance tasks by running status.dpn.
  Check how many client sessions are active:
    admin@utilitynode:~/>: avmaint session | grep path | wc -l
  Arrange maintenance and backup schedules so they do not overlap.
  Review the output of the status.dpn and top commands to check the load on the data nodes.
  Run mapall 'iostat -x' on the data nodes. Check %iowait, %idle, and %util to see if the I/O bandwidth of any disk is saturated.
  To isolate a particular client's performance, test the backup when the Avamar server is not performing maintenance tasks, other backups, or replication.

Data Domain backup ingestion performance:
  Log in to the Dell Support portal and review:
  Data Domain: DDPCONNCHK How to troubleshoot Data Domain DDBoost connectivity and performance

Network side bottlenecks:
  The network may be a bottleneck if a client is backed up over a WAN.

Network latency:
  Latency affects the rate at which clients can check whether hashes are present on the Avamar server.
  Run ping from the client to the Avamar server and check the network's packet loss and latency.

Network bandwidth:
  During a backup, new data must be sent over the network to the Avamar server.
  See the log for a completed backup to learn the amount of data being sent:
2014-11-20 04:45:30 avtar Info : Backup #1180 timestamp 2014-11-20 04:45:28, 23 files, 5 folders, 291.7 GB (23 files, 4.316 GB, 1.48% new)

  If client and server are separated by a WAN, can the link transmit the necessary data within the backup window?
  In this case, the data that needs to be transmitted is 4.316 GB.
  These values are all interrelated:
    Amount of new backup data
    Time available for backup
    Effective network bandwidth
  Greater amounts of new data require more network bandwidth or a longer backup time.
  These factors have practical limits but can be controlled to some degree by the user. Consider whether any of them can be adjusted to accommodate a timely backup.

If a network bottleneck or server communication problem is suspected:
  Confirm network throughput between the client and the backup device.
  See Avamar - How to run iperf between two Avamar systems to test network throughput performance
  Enable avtar comstats logging to facilitate troubleshooting.
  See Avamar: How to enable and interpret avtar COMSTATS logging to diagnose communication issues

Client-side bottlenecks:
  View the avtar backup log in a capable text editor such as Notepad++.

Ensure this is not the client's initial backup to the server:
  First-time backups are expected to be slow.
  See Avamar Backup failed with "Time Out - End" because it is still performing an initial backup
  If this is a mature client, check if the backup configuration has recently changed.

Ensure that the backup was not prematurely canceled:
  Search the backup log for 'canceled'. Below is an example where an impatient user canceled an L1 backup.
2013-11-05 12:15:29 avtar Info : PARTIAL Backup #14 timestamp 2011-11-05 12:13:36, 2,030 files, 562 folders, 397.3 MB (691 files, 17.44 MB, 4.39% new)
2013-11-05 12:15:29 avtar Info : Label "MOD-xxxxxxxxxx", scheduled to expire 11/12/11, none backup
2013-11-05 12:15:29 avtar Info : Backed-up 397.3 MB in 1.36 minutes: 17 GB/hour (89,593 files/hour)
2013-11-05 12:15:29 avtar Info : Finished at 2011-11-05 12:15:29 GMT Standard Time, Elapsed time: 0000h:01m:21s
2013-11-05 12:15:29 avtar Info : Sending wrapup message to parent
2013-11-05 12:15:29 avtar Info : Command failed (exit code 10013: Externally canceled)

  In cases such as this, where a backup terminates gracefully, the data is retained as a 'PARTIAL' backup.
  See Avamar: Partial backup functionality and best practices for initial backups
  Although partial backup logs give an indication of backup performance, proper analysis requires the log from a completed backup.

Check the log for file cache or hash cache sizing issues:
  See Avamar - How to check the Avamar client log for file cache or hash cache sizing issues

Check if throttling flags are passed to avtar:
  Avtar CPU or network throttling greatly reduces backup performance.
  See Avamar : How to throttle an Avamar client's consumption of system resources (CPU, network, I/O & memory).
  Throttling can be detected in the backup log:

2013-09-06 14:22:13 avtar Info : Network bandwidth throttling is enabled, limiting to approx. 0.512 Mbps (62.50 KB/sec)
2013-09-06 14:22:13 avtar Info : CPU throttling is enabled, limiting CPU usage to approx. 70%

Is there an Avamar client CPU or memory bottleneck?
  An Avamar backup runs as fast as the hardware allows and competes with other services for resources.
  Be mindful of the client's "day job" and when it is busy.
  Monitor the client using Task Manager or Process Explorer (on Windows) or the 'top' command (UNIX or Linux). These can reveal if CPU saturation occurs during the backup.
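The log checks described above (a canceled backup, a PARTIAL backup, and CPU or network throttling) can be sketched as a small script. This is an illustrative sketch only, not a Dell tool: the names CHECKS, SAMPLE_LOG, and scan_log are hypothetical, and the patterns are based solely on the log excerpts quoted in this article, so they may need adjusting for other Avamar client versions.

```python
import re

# Patterns taken from the log excerpts in this article (assumed wording;
# other avtar versions may phrase these messages differently).
CHECKS = {
    "partial": re.compile(r"PARTIAL Backup #\d+"),
    "canceled": re.compile(r"canceled", re.IGNORECASE),
    "network_throttle": re.compile(r"Network bandwidth throttling is enabled"),
    "cpu_throttle": re.compile(r"CPU throttling is enabled"),
}

# Sample lines copied from the examples in this article.
SAMPLE_LOG = """\
2013-11-05 12:15:29 avtar Info : PARTIAL Backup #14 timestamp 2011-11-05 12:13:36, 2,030 files, 562 folders, 397.3 MB (691 files, 17.44 MB, 4.39% new)
2013-11-05 12:15:29 avtar Info : Command failed (exit code 10013: Externally canceled)
2013-09-06 14:22:13 avtar Info : Network bandwidth throttling is enabled, limiting to approx. 0.512 Mbps (62.50 KB/sec)
2013-09-06 14:22:13 avtar Info : CPU throttling is enabled, limiting CPU usage to approx. 70%
"""


def scan_log(lines):
    """Return a dict mapping each check name to the log lines that match it."""
    findings = {name: [] for name in CHECKS}
    for line in lines:
        for name, pattern in CHECKS.items():
            if pattern.search(line):
                findings[name].append(line.strip())
    return findings


if __name__ == "__main__":
    for name, hits in scan_log(SAMPLE_LOG.splitlines()).items():
        for hit in hits:
            print(f"[{name}] {hit}")
```

To check a real log, pass the open file instead of the sample, for example scan_log(open(path, errors="replace")), where path is the location of the avtar backup log on the client.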
  Dell has an internal "LogAnalyzer" tool which charts resource consumption and performance over time. Work with Support to use this.
  Cache files are loaded into memory during the backup. Check the client's memory usage for page faults or other clues that the client is short of RAM.
  This is less of an issue for Avamar v7.x clients backing up to Data Domain, which use the 'paging cache' (f_cache2.dat). The paging cache reduces the memory footprint on a client compared with the traditional 'monolithic' avtar cache.

Check for a client-side I/O bottleneck:
  After client cache sizing, the next factor determining backup performance is the storage system which hosts the backup data and feeds it to avtar.

Ensure that the target storage is healthy:
  Ensure that there are no problems with the target storage device preventing optimum performance.

Ensure that third-party software is not competing with avtar for I/O:
  Are any applications on the client competing with the Avamar client for storage I/O?
  Real-time or on-access scanning by antivirus software drastically impacts Avamar client performance.
  See Avamar client backup performance is greatly impacted by antivirus software real-time scanning of files

Can the file scan be configured to run in parallel?
  Sometimes, backup data is hosted across multiple volumes serviced by separate read heads. In these scenarios, it may be possible to configure volume parallelism so that Avamar scans multiple volumes simultaneously.
  See Using volume parallelism to improve Avamar backup performance

Ensure that the client is not backing up data using CIFS or NFS:
  Backup of CIFS or NFS data is only supported through an NDMP accelerator.
  See Avamar and supportability for CIFS (SMB) & NFS mapped network shares.

Check if storage compression or encryption is in use:
  Backup performance may be lower than expected if the target data resides on storage where data is compressed or encrypted at the file system level.
Analyzing Windows client resource bottlenecks with Perfmon:
  The following article helps create performance graphs to understand if the client is waiting on any particular resource at a certain moment in time. Consider using it together with graphs produced by the LogAnalyzer tool.
  See Avamar - Using Microsoft Windows perfmon for performance monitoring of Avamar clients

Backup of Outlook archive .pst files:
  A backup with many or large .pst files may perform slowly.
  See Performance considerations when using Avamar to back up Outlook archive .pst files

Benchmarking storage performance:
  Check the performance of the storage device where the target data is hosted.
  See Disk performance testing for Avamar Linux clients

Poor backup performance due to the data being backed up:
  The most common cause of slow backups is the characteristics of the data being backed up.

Check if there is a lot of new or changed data:
  See Avamar - A backup is slow or fails with "Time out - end" due to a lot of new or changed data
  A few large new or modified files may cause an otherwise fast backup to overrun the backup window. To identify those files, see:
  Avamar: How to use the client logs to identify which files are new or changed since the previous backup
  How to identify which files took a long time to be processed during an Avamar backup

Windows clients:
  Avamar backup of dataset containing many symbolic links is running very slowly
  Avamar client performance and Windows NTFS compression

Linux and UNIX clients:
  Check if the client's dataset contains any large, sparse files.
  Avamar and sparse files
  The backup size of an Avamar Linux client may be misleading due to '/var/log/lastlog' and Avamar sparse file handling behavior

Check the backup summary lines to understand the backup scope and identify outlier values:
  Search the backup log for the strings "Backup #" or "Backed-up".
2017-06-07 20:21:38 avtar Info : Backup #441 timestamp 2017-06-07 20:21:38, 2,653,523 files, 255,181 folders, 1,566 GB (10,777 files, 668.4 MB, 0.04% new)
2017-06-07 20:21:38 avtar Info : Backed-up 1,566 GB in 1281.60 minutes: 73 GB/hour (124,228 files/hour)

  These lines can save a lot of time when investigating backup performance. For the output above, consider:
    Whether this is an initial or level 1 backup. (An initial backup is unlikely, since the backup label is #441.)
    Whether the number of files in the backup is reasonable. (About 2.65 million files is reasonable.)
    The file to folder ratio. (It is about 10:1, which is typical.)
    The total amount of data in the dataset. (~1.5 TB)
    The number of files to be processed and the proportion of the total number of files. (~11 K out of about 2.65 million files is reasonable.)
    The total size of all files to be processed. (This can only be an estimate.)
    The amount of changed data to be sent to the Avamar server. (668 MB)
    Whether the change rate is reasonable. Higher change rates can be tolerated for smaller datasets. (0.04% is reasonable.)
    Whether the performance per hour, given the overall size and scope of the backup, is reasonable. (124 K files/hour would be considered slow given the other figures.)
  Frequently, these details provide enough data to understand the cause of poor backup performance.
  If necessary, review the status line messages that are generated while the backup runs.
  Determine if any of the values in these two log lines are outliers. In other words, are they larger or smaller than is typical?
  If you are familiar with the backup behavior, it is easier to detect anomalies.

File to folder ratio:
  Most customer datasets have a file to folder ratio of approximately 10:1, and avtar is tuned to reflect this.
  If a dataset has a low file to folder ratio, as in the example below, the backup may not run as efficiently without minor tuning.
2015-11-18 00:34:32 avtar Info : Backup #75 timestamp 2015-11-18 00:24:43, 4,007,032 files, 1,974,043 folders, 1,589 GB (2,680 files, 419.4 MB, 0.03% new)

  See Avamar client backup performance tuning for datasets with low ratio of files to folders.

Performance analysis using avtar log Status information messages:
  Using Notepad++ or similar, filter the log for avtar Info lines which contain Status messages.
  These lines may be filtered by their message code, which differs depending on the version of the Avamar client.
  These lines are periodic status messages reported by avtar.
  See Avamar - How to interpret avtar backup log status lines.

Check for third-party applications unexpectedly updating file metadata:
  Some applications may change file metadata. If this happens, Avamar will back up the entire file.
  See Avamar backup runs slow due to 3rd party software modifying metadata for otherwise unchanged files

Review the use of include and exclude flags. Avoid 'include' statements:
  The Operational Best Practices guide discusses Include and Exclude lists.
  Avamar must compare every file in the backup dataset with both lists to determine whether to back up the file. This comparison adds overhead and can increase backup runtime.
  Check the client's avs\var directory for the presence of an avtar.cmd file.
  Check if that file contains any active --exclude or --exclude-from-file statements.
  If a directory or file system is excluded but include flags are also used, avtar still scans it for items which it has been told to 'include'.

Check if the dataset contains reparse points or stub files:
  Be wary if a dataset contains stub files or pointers to data stored on another device.
  Backup performance suffers if avtar has to wait for the remote file to be recalled.
  Examples of such software are Enterprise Vault Archiver, Moonwalk, and DiskXtender.
  See Windows backup performance is slow due to reparse point handling (v7.1, v7.2).
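The summary-line checks discussed earlier (file to folder ratio, change rate, and whether the new data fits through the network link within the backup window) can be sketched as follows. This is a minimal illustration, not an Avamar utility: the names SUMMARY_RE, UNIT_BYTES, parse_summary, and required_mbps are hypothetical, the pattern is derived only from the log lines quoted in this article, the use of binary size units is an assumption, and the 8-hour window in the usage example is an assumed figure.

```python
import re

# Assumption: sizes are treated as binary units (1 GB = 1024**3 bytes).
# Adjust this table if your client reports decimal units.
UNIT_BYTES = {"KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

# Pattern for the "Backup #" summary line, based on the examples in this article.
SUMMARY_RE = re.compile(
    r"Backup #(?P<label>\d+) timestamp .*?, "
    r"(?P<files>[\d,]+) files, (?P<folders>[\d,]+) folders, "
    r"(?P<total>[\d,.]+) (?P<total_unit>[KMGT]B) "
    r"\((?P<proc_files>[\d,]+) files, (?P<new>[\d,.]+) (?P<new_unit>[KMGT]B), "
    r"(?P<pct_new>[\d.]+)% new\)"
)


def _num(text):
    """Convert a number with thousands separators, such as '2,653,523', to float."""
    return float(text.replace(",", ""))


def parse_summary(line):
    """Parse one summary line into a dict, or return None if it does not match."""
    m = SUMMARY_RE.search(line)
    if m is None:
        return None
    files = int(_num(m["files"]))
    folders = int(_num(m["folders"]))
    return {
        "label": int(m["label"]),
        "files": files,
        "folders": folders,
        "total_bytes": _num(m["total"]) * UNIT_BYTES[m["total_unit"]],
        "processed_files": int(_num(m["proc_files"])),
        "new_bytes": _num(m["new"]) * UNIT_BYTES[m["new_unit"]],
        "pct_new": float(m["pct_new"]),
        "file_folder_ratio": files / folders if folders else float("inf"),
    }


def required_mbps(new_bytes, window_hours):
    """Average bandwidth (megabits per second) needed to send new_bytes in the window."""
    return new_bytes * 8 / (window_hours * 3600) / 1e6


if __name__ == "__main__":
    line = ("2014-11-20 04:45:30 avtar Info : Backup #1180 timestamp 2014-11-20 04:45:28, "
            "23 files, 5 folders, 291.7 GB (23 files, 4.316 GB, 1.48% new)")
    summary = parse_summary(line)
    print(f"file to folder ratio: {summary['file_folder_ratio']:.1f}")
    # The 8-hour window is an assumed value for illustration only.
    print(f"bandwidth needed for an 8 h window: {required_mbps(summary['new_bytes'], 8):.2f} Mbps")
```

The bandwidth figure is an average for the new data alone; it ignores protocol overhead and hash lookups, so treat it as a lower bound when judging whether a WAN link is adequate.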
Backups of virtual clients with an Avamar guest installation:
  Avamar guest backup of Virtual Machine runs slowly and times out due to a hardware resource bottleneck
  Avamar VM client guest backup experiencing slow performance due to VMware vShield Endpoint Trend Micro Deep Security

Known backup performance-related issues from v7.2 due to file scanning behavior change:
  Avamar: Poor backup performance in winclusterfs plugin backup after upgrading to 7.2 from 7.0