1. Advanced Veritas NetBackup Performance Tuning Dave Little – Sr. Distinguished System Engineer
3. Most Common Questions Asked According to SEs, tech support engineers, and consultants, which of the following questions are most commonly asked?
Are you taking us golfing this weekend?
Where is the “Any” Key?
What was up with Britney?
Seriously, what was up with Britney??
I just added 6 new tape drives so why am I still not meeting my backup window?
I just replaced my robot with a VTL so why am I still not meeting my backup window?
4. Backup System Infrastructure When we talk about “tuning” today, we mean properly matching the components in the environment AND properly configuring those components, in addition to actual “tuning”
Some of the most commonly asked questions posed to NetBackup technical support relate to performance
Most performance issues can be traced to hardware / environmental issues
Basic understanding of the entire backup data path is important in determining maximum obtainable performance
Poor performance is usually the result of unrealistic expectations and/or poor planning
The Bottom Line – It’s all about Bandwidth
5. Why Tuning Is So Critical Every hardware component has a different throughput
“Matching” the throughput will help to avoid bottlenecks
The ultimate goal is to make the final storage point the bottleneck (i.e. the tape drive)
Tuning provides higher ROI
This reduces the need to buy more hardware
Improves scaling
Helps provide understanding of how everything works together
As you tune, you become more familiar with your environment
A data protection solution has a lot of “moving parts”
Each must be balanced from end to end
Reduces management
If things are running smoothly, less time is needed to babysit backups
6. Hardware Capacity Planning – Overview Never exceed 70% of the rated capacity of any component
Manufacturer throughput and performance specifications are based on a theoretical environment – seldom (if ever) achieved in the real world
Trust us on this one
The 70% rule applies to nearly every component in the environment
Disk
CPU
Internal Bus
Response time increases significantly once the 70% utilization threshold is exceeded (see the sketch below)
Tape drives are an exception to this rule – they perform best when kept streaming at full rated speed
7. Throughput vs. Response Time
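The original slide presented this relationship as a chart. As a minimal, hedged illustration of why response time climbs sharply past the 70% threshold, here is a sketch using the textbook M/M/1 queueing approximation; the model and the 5ms service time are assumptions for illustration, not figures from the deck.

```python
# Illustrative only: a textbook M/M/1 queueing approximation of how
# response time grows with utilization. Assumed model and service time,
# not figures from the deck.
service_time_ms = 5.0  # hypothetical per-I/O service time

for utilization in (0.10, 0.30, 0.50, 0.70, 0.80, 0.90, 0.95):
    # M/M/1: average response time = service time / (1 - utilization)
    response_ms = service_time_ms / (1.0 - utilization)
    print(f"{utilization:4.0%} busy -> {response_ms:6.1f} ms average response")
```

Under these assumptions, response time roughly triples between 10% and 70% utilization, then doubles again by 90% – which is why the 70% rule is a useful planning ceiling.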
8. The NetBackup Servers So … what servers should I purchase?
If we had a dollar for every time we had been asked this, we wouldn’t be here presenting to you
This question is like the “How long is a piece of string” question
It all depends on your requirements, what type of shop you are in (Windows, Unix, or both), and your goals/needs
Some guidelines:
Master Servers – High CPU requirement, lower I/O
As a general guideline, any Master Server should have multiple CPUs and lots of RAM
Media Servers – Lower CPU requirement, High I/O
As a general guideline, any Media Server should have a PCI Express bus
Both Servers – Expandability can reduce hardware expenditure later
This means don’t buy something that will be maxed out tomorrow
9. The NetBackup Servers – Media Servers What kind of I/O do I need in my Media Servers?
PCI Express is your friend
Newer servers with PCI Express make I/O bottlenecks less of an issue than they were even 18 months ago (most servers now have PCIe)
A Media Server with PCI Express means the I/O bus is no longer the bottleneck it was in PCI days (two years ago)
Server Example – SUN T5220 ($20 – 25k)
4GB/sec Bus throughput when reading and writing at the same time
10GbE optional
With a dual-port PCIe 4Gb/sec FC-HBA it can move 400MB/sec if properly configured and tuned
This equates to over 17TB in a 12-hour backup window for the server itself (see the sketch after this list)
Obviously the rest of the environment would need to be matched for this type of speed
Brings us back to why tuning/matching is important
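The 17TB figure above falls straight out of the arithmetic; a quick sanity check of the slide's numbers:

```python
# Sanity-check the slide's arithmetic: sustained MB/sec over a backup window.
throughput_mb_per_sec = 400      # the dual-port 4Gb FC-HBA figure from the slide
window_hours = 12

mb_moved = throughput_mb_per_sec * window_hours * 3600
tb_moved = mb_moved / 1_000_000  # decimal TB
print(f"{tb_moved:.2f} TB in a {window_hours}-hour window")  # -> 17.28 TB
```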
10. The NetBackup Servers – Media Servers How many CPU’s do I need in my Media Servers?
I/O Bus is more important than CPU in Media Servers
Experiments on SUN systems have shown that a useful, conservative estimate is 5MHz of CPU capacity per 1MB/sec of data movement in AND out of the Media Server
Example (worked out in the sketch after this list)
A LAN Media Server backing up 20 Clients at 5MB/sec each to a tape drive would need 1000MHz of available CPU power
500MHz to receive the data across the LAN
500MHz to send the data to the tape device
Depending on the Media Server, other applications and the OS may use CPU cycles
You can see why, with modern servers, the CPU is not as important as the I/O
This is one reason we don’t like to see Master/Media Server Combos
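Applying the 5MHz-per-MB/sec rule of thumb from the slide to the example above:

```python
# The slide's 5 MHz-per-MB/sec rule of thumb applied to the example above.
MHZ_PER_MB_SEC = 5              # conservative estimate from Sun experiments

clients = 20
mb_per_sec_each = 5
stream_mb_per_sec = clients * mb_per_sec_each          # 100 MB/sec total
cpu_receive = stream_mb_per_sec * MHZ_PER_MB_SEC       # 500 MHz across the LAN
cpu_send = stream_mb_per_sec * MHZ_PER_MB_SEC          # 500 MHz to the tape device
print(f"~{cpu_receive + cpu_send} MHz of CPU needed")  # -> 1000 MHz
```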
11. The NetBackup Servers – Media Servers How much RAM do I need in my Media Servers?
More is always better
Server prices are coming down, so it makes little sense to skimp on a robust system
Other apps on the Media Server and OS use memory
NetBackup uses shared memory for local backups
Buffer tuning on the Media Server can increase throughput dramatically
Buffer tuning (covered later) is a requirement – out of the box, NBU defaults are geared toward low-end systems rather than modern hardware
Buffers use shared memory – a finite resource
To determine how much memory will be used, use this formula (computed in the sketch after this list)
(buffer_size * number_buffers) * number_of_drives * MPX
Defaults
Size = 65536 (64 * 1024)
Number = 30
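The slide's formula in runnable form; the buffer size and count defaults are from the slide, while the drive and MPX counts (and the "tuned" values) are example figures for illustration only:

```python
# Shared-memory footprint of the data buffers, per the slide's formula:
# (buffer_size * number_buffers) * number_of_drives * MPX
def buffer_memory_bytes(buffer_size, number_buffers, drives, mpx):
    return buffer_size * number_buffers * drives * mpx

# Defaults from the slide (64 KB buffers, 30 of them); drive and MPX
# counts are example values for illustration.
default = buffer_memory_bytes(65536, 30, drives=4, mpx=4)
tuned = buffer_memory_bytes(262144, 16, drives=4, mpx=4)  # hypothetical tuned values
print(f"default: {default / 2**20:.0f} MiB, tuned: {tuned / 2**20:.0f} MiB")
```

Because this memory is shared and finite, larger buffers usually mean you should run fewer of them.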
12. The NetBackup Servers - Master How much CPU and RAM do I need on my Master?
CPU is more important than I/O on the Master
More horsepower is always better than less
If you can’t fully configure it now ($$), room for growth is always good
Provided the technology will still be available when it is time to upgrade
There is not a single “correct” Master Server or configuration
Back to the “how long is a piece of string” question
Master sizing is typically based on the number of Clients, the amount of data being backed up, the number of drives and Media Servers, and the number of jobs per day
The Java GUI consumes memory, so consider this when sizing RAM
This is all important, but modern hardware has come a long way and most of it can do the job quite well
Bottom line – Look at Tuning Guide, do the math and decide what is right for you
13. The NetBackup Servers – Master How much disk space do I need on my Master?
Assume 120 bytes of catalog space per backed-up file and use a 1.5 multiplier for growth and error (worked through in the sketch after this list)
Example
100 systems with 100,000 files each, backed up FULL daily with 30 backups retained
100 systems * 100,000 files * 30 backups = 300,000,000 files total
300,000,000 * 120bytes = 36,000,000,000 bytes = 36GB to store this much in the catalog
Combine this with your retention settings to determine long-term catalog needs, applying the 1.5 multiplier
Use the 2% rule
Total data tracked * 2%
If the total data tracked across fulls, incrementals, and retention is 3TB, then the catalog space needed is 60GB
Using a volume manager so you can grow the catalog file system on the fly is very important
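Both sizing methods from this slide, computed end to end:

```python
# Catalog sizing per the slide: 120 bytes per file per backup, a 1.5x
# multiplier for growth/error, cross-checked against the 2% rule.
systems = 100
files_per_system = 100_000
retained_backups = 30

raw_bytes = systems * files_per_system * retained_backups * 120
print(f"raw: {raw_bytes / 1e9:.0f} GB, "
      f"with 1.5x headroom: {raw_bytes * 1.5 / 1e9:.0f} GB")  # 36 GB -> 54 GB

data_tracked_tb = 3                                        # total data tracked
print(f"2% rule: {data_tracked_tb * 1000 * 0.02:.0f} GB")  # -> 60 GB
```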
14. The NetBackup Servers – Master Master Example
We always recommend a dedicated Master
Most modern systems will work very well
V490 (SUN)
T5220 (SUN)
DL580 (Compaq)
Similar system from your preferred Vendor
4 x CPU
16GB RAM
200GB Disk to start
Catalog
EMM database
No real difference (except management) between Windows and Unix in Master Performance
15. Disk vs. Tape Discussion
16. Disk & Tape Performance Comparison
17. Disk & Tape Performance Comparison
18. Some Disk & Tape Comparisons Disk
Disk price is at historical lows
Disk offers performance advantages tape cannot provide
Tape
Tape offers TCO advantages
Tape can easily and cheaply be sent offsite
Overall
Disk staging can take advantage of specific strengths of disk and tape while avoiding weakness of each
Think you know disk pretty well? Let’s take a small quiz…
19. Other Disk / Array Considerations Quiz: Disk drives made in which year are the fastest?
20. Other Disk / Array Considerations
1993: Number of IOPS per GB in the 1,000s
IOPS per disk = 50
I/O spread across lots of disks by necessity
2007: Number of IOPS per GB in the 10s
IOPS per disk = 250
Density of data per disk MUCH higher => fewer spindles!
21. Other Disk / Array Considerations An important number for performance is IOPS/GB (see the sketch after this list)
With larger disk capacities, I/O tends to be spread across fewer spindles
Place data over as many spindles as possible
High-capacity disks make this easy to overlook
Test your disk subsystem: Iometer (www.iometer.org)
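A sketch of the trend behind the quiz: a fixed dataset on bigger, denser disks ends up on far fewer spindles and so gets far fewer aggregate IOPS. The per-disk IOPS figures are from the slides; the per-disk capacities are assumed typical values added only for illustration.

```python
# Why fewer, larger spindles hurt: aggregate IOPS available to a fixed
# dataset. Per-disk IOPS come from the slides; the capacities are assumed
# typical values, added here only to illustrate the trend.
import math

dataset_gb = 1000
eras = {"1993": (50, 2), "2007": (250, 500)}  # (IOPS per disk, GB per disk)

for era, (iops_per_disk, gb_per_disk) in eras.items():
    spindles = math.ceil(dataset_gb / gb_per_disk)
    total_iops = spindles * iops_per_disk
    print(f"{era}: {spindles:3d} spindles -> {total_iops:6d} IOPS "
          f"({total_iops / dataset_gb:.1f} IOPS/GB)")
```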
22. Other Disk / Array Considerations
23. Enhanced Disk Capabilities in NBU 6.5
24. Disk vs. Tape Discussion
25. Creating a New Solution? Updating Your Existing Solution? … Recommended Configuration
26. NetBackup Configuration Tuning Main NetBackup tuning settings
NetBackup buffers
Tape Multiplexing
Client Multi-Streaming
Use exclude lists – not include lists
This guarantees you will pick up newly added drives
Do you really need thousands of copies of the “WINDOWS” folder?
Have you looked at Deduplication?
Perform fewer full backups
Most RTOs are not strict enough to warrant frequent Full backups
Change your backup paradigm
Synthetic backups can help here
27. NetBackup Configuration Tuning Main NetBackup tuning settings
NetBackup buffers – Very critical
SIZE_DATA_BUFFERS, NUMBER_DATA_BUFFERS, NET_BUFFER_SZ
Think of buffers as “buckets”. If you are trying to drain a pool of water, the number of buckets is important, as is the size
A higher number of larger buckets will move more water
Take care when tuning, as buffers that are too big or too numerous can decrease performance
Requires testing to determine the correct settings for your environment (a minimal setup sketch follows this slide)
Check out the tuning guide listed at the end for detailed tuning steps – Document Number 281842
Tape Multiplexing (MPX)
Modern tapes can write a great deal more than Clients can send
MPX = Multiple streams of data interleaved onto the same tape
Restore performance issues? MPX used to be a big concern for restores, but not so much anymore
Client Multi-Streaming
Clients can send more than one stream if on GbE; however, this can easily overload the Client CPU, so be careful
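On UNIX Media Servers, the buffer settings named above are set via touch files under /usr/openv/netbackup/db/config. A minimal sketch follows; the values shown are common starting points rather than recommendations, so test per the tuning guide (Document 281842).

```python
# A minimal sketch of setting the UNIX Media Server buffer touch files.
# The values are common starting points, NOT universal recommendations –
# test per the tuning guide (Document 281842). Run as root on the Media Server.
from pathlib import Path

CONFIG_DIR = Path("/usr/openv/netbackup/db/config")  # standard UNIX location

settings = {
    "SIZE_DATA_BUFFERS": 262144,   # 256 KB buffers (the default is 65536)
    "NUMBER_DATA_BUFFERS": 64,     # the default is 30
}
for name, value in settings.items():
    (CONFIG_DIR / name).write_text(f"{value}\n")
    print(f"set {name} = {value}")
```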
28. NetBackup Configuration Consider advanced technologies
Many people are not aware of all the capabilities in base NetBackup
Snapshot Client has many advanced backup technologies
Offload the backup from the Client to a Media Server
Deduplication
Stop backing up the same file over and over
Flexible Disk
Create pools of disk to back up to
SAN Client
Stop using your LAN for high volume data transfer
Quite a few others to choose from
29. NetBackup Capacity Planning Some prep work is required to properly size the solution
How much data will be backed up?
What is the amount of daily change?
What types of data will be backed up – text, graphics, databases, etc.?
How many files will be backed up?
What are your SLAs?
Do you plan to use Tape, Disk, Snaps, Dedupe, all of the above?
What are your recovery requirements?
Can you use chargeback to help pay for a higher end solution?
There is more to a properly planned strategy than simply buying a couple of servers and some tape drives or disk
30. Scaling – Can You Continue to Grow?
31. Special Backup Problems Millions of files – FlashBackup
Recommended with >200,000 files
6.5 now supports FlashBackup on all Unix and Linux platforms
Very Large Databases
Use built-in APIs effectively – stagger incremental backups
Snapshot client has many solutions here
VMware
NetBackup for VMware in 6.5
Enhances VCB based backup technologies
Don’t see your problem listed here?
Check with your Symantec account team!
32. Tuning Is Critical - Summary Most hardware, out of the box, is not set up to perform optimally
Adding new hardware without thinking of the other components that will be affected can reduce overall performance
By matching hardware performance from end to end, higher ROI can be achieved
NetBackup Media Servers require tuning for optimal throughput
Increased throughput means
Shorter backup windows
Reduced infrastructure needs
Reduced management requirement
Increased scaling
Increased ROI
Happier CEOs (which makes your lives much better!)
33. Analyzing Drive Utilization & Performance
34. Analyzing utilization – The devil is in the details Drive Composite Average across 24 hours. Sorted by utilization to easily identify underutilized drives
Configurable % utilization ranges (you determine the shades of blue)
“All Drives Average” - The ultimate utilization number across entire drive inventory
Easily visualize utilization across backup windows by composite average for each hour of day
Rich set of filters enabling analysis by Policy, Policy Type, Level and Job Type
35. Analyze across Library, Media Server, Drive Type Aggregate at Library level for cross-library analysis
Aggregate at Media Server to understand which ones are over/underutilized
Aggregate at Drive Type
36. Perspective across time frames Time frame of analysis is day of week. Is utilization by day consistent?
Aggregation is by logical drive (Library/Media Server/Drive). Visualize drives that are shared across media servers
37. Analyzing Performance Configurable throughput (Kbytes/sec) ranges
Hair-splitting aggregation and averaging (sketched after this list)
- Aggregation begins at the Image/Copy/Fragment level
- Weighted averaging
- Throughput reflects the time delta between the begin-write and end-write times for each fragment
Observe drives with fluctuating performance. What is behind this?
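A sketch of the weighted averaging described above: the drive average is total bytes over total write time, not a simple mean of per-fragment rates. The record structure and field names here are illustrative, not the product's actual schema.

```python
# Weighted-average throughput rolled up from per-fragment records.
# Field names are illustrative only, not the product's actual schema.
fragments = [
    {"kbytes": 4_000_000, "write_secs": 80},   # hypothetical fragment records
    {"kbytes": 1_000_000, "write_secs": 50},
]
total_kb = sum(f["kbytes"] for f in fragments)
total_secs = sum(f["write_secs"] for f in fragments)
# Total bytes / total write time weights big fragments more heavily
print(f"weighted average: {total_kb / total_secs:,.0f} KB/sec")  # ~38,462
```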
38. Where is the bottleneck? Media Server? Client? Analyze throughput by drive type – Is performance consistent with manufacturers’ specs?
Aggregate at Media Server – Is the bottleneck here? Is the Media Server overwhelmed and impacting performance?
39. Can we meet the Recovery Time Objective (RTO)? Analyze throughput for restores only – Which drives are being used? What is the throughput? Can it meet the RTO?
See the hours of day in which restores take place. Compare the same drives’ performance for backup jobs.
Supporting tabular report provides job-level details – Which Master/Media Servers and Clients? What size of restore?
40. Developing Your DP Solution Some great publications out there to help you out
“Backup & Recovery”
By W. Curtis Preston
Available on amazon.com
“Implementing Backup and Recovery: The Readiness Guide for the Enterprise”
By David B. Little and David A. Chapa
Available on amazon.com
“Veritas NetBackup™ Backup Planning and Performance Tuning Guide”
Available for download at support.veritas.com
http://seer.entsupport.symantec.com/docs/281842.htm