EMC Disk Library • Two new EDL engines: DL5100 & DL5200 • Based on the CLARiiON CX4 array • Up to 10.2 TB per hour • New software revision: 4.0 • Similar to 3.3; the only new feature is CX4 support
DataDomain • 3 new appliances: • DD670: single quad-core / 3 optional PCI cards • DD860: DD Archiver enables data movement between tiered storage based on time • DD890: TBD • DDOS 5.0 scheduled for Q1 2011 • Support for up to 96GB of RAM instead of 64GB on DD880/DD890 • DD Archiver for externalization • Support for iSeries (AS/400) through BRMS • LACP & IP-aliasing support • 50% of the CLI commands changed compared to 4.x • IPv6 is not supported
DataDomain: Replication • CIFS / NFS replication of files starts after 10 min of inactivity • VTL replication needs the virtual tape to be unmounted before replication can start • Replication can be encrypted if needed • Files due to be replicated will not be affected by GC (cleaning) • If replication is slow, the box can fill up
DataDomain: Best Practices • DDBoost device parallelism: • “Target Sessions” default is 1 • Optimal 4, maximum 10 for performance reasons • Multiple devices can be configured per storage node (SN) • DDBoost 2.2.2.0 library is used by NetWorker 7.6 SP1 • AIX and HP-UX are currently not supported by DD Boost; support is planned for NW 7.6 SP2 • For a DD VTL device, “Target Sessions” and “Maximum Sessions” need to be set to 1 to avoid multiplexing, which causes poor de-duplication ratios • Hashing is optimized for Intel architectures (performance will be better than on SPARC)
DataDomain: Best Practices cont. • Number of sessions / memory available • Optimized cloning is counted as a replication stream
Storage Node Implementation • [Diagram labels: save, File systems, Virtual, MMD (storage node media daemon), libddboost wrapper, Proprietary file systems via NDMP, dasv, DSA, Applications, Snapshots, CDP, and CRR, Data Domain]
Configuration • Basic Constructs • Savestreams • Movement of data • Generates a saveset on the target device • Can represent a system disk, file system, or directory (system disk shown) • Data Domain Device type • Logical construct • Each Data Domain Device type uses a unique instance of Boost • Storage unit • Logical construct that Boost uses for target • Maximum number of storage units is model dependent • [Diagram labels: /C: and /D: savestreams, Data Domain Device Type, Boost, Storage Unit, Savesets]
Configuration • Design considerations • Multiple Data Domain Devices per storage node • Cannot be shared between multiple storage nodes • Each generates a new Boost footprint • Practical limits • Number of devices per storage node • Available memory • Multiple logical storage units per Data Domain system • Each creates a new folder • [Diagram labels: /C: and /D: savestreams, Data Domain Device Type, Boost, Storage Unit]
Configuration • Design considerations • Best practices for savestream multiplexing • Boost is optimized for handling a single stream • Open, read/write, close, move on to next file • Target sessions • ‘Optimal’ setting for multiplexed savestreams • Default is 4; NetWorker will exceed this if the workload demands more resources • No benefit in reducing to <4 • Max sessions • ‘Hard limit’ setting • Default is 10; NetWorker will not exceed this • Allocates memory for 10 sessions • Reduce max sessions to less than 10 to reduce memory allocation • [Diagram: 8 savestreams (sessions) shown from /C: and /D: to a storage node with Boost]
Configuration • Design considerations • Best practices for pools • Add devices if savestreams exceed max sessions and system resources are available • Maximize available DD system bandwidth and de-duplication efficiency • Optionally add devices & reduce max sessions • 2 devices with MAX SESSIONS = 4 are better than 1 device with MAX SESSIONS = 8 • Build pools across multiple systems as a last option • Only if sessions/bandwidth to the first Data Domain system are maximized • Lose some global de-duplication efficiency • Do not mix Data Domain Device types and any other device type in the same pool • Impacts Clone Controlled Replication operation • [Diagram labels: Boost, Storage Node w/ Boost]
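The "add devices and reduce max sessions" guidance can be sketched as a small sizing helper. This is only an illustration: the function name `devices_needed` is hypothetical and not part of NetWorker, and it assumes that each device accepts at most its max sessions value and that savestreams spread evenly across devices.

```python
import math


def devices_needed(savestreams: int, max_sessions: int) -> int:
    """Return the smallest device count whose combined max sessions
    covers the given number of concurrent savestreams.

    Hypothetical helper for illustration; not a NetWorker API.
    """
    if savestreams < 0:
        raise ValueError("savestreams must be non-negative")
    if max_sessions < 1:
        raise ValueError("max_sessions must be at least 1")
    return math.ceil(savestreams / max_sessions)


if __name__ == "__main__":
    # The slide's comparison: 8 savestreams on 1 device (max sessions 8)
    # versus 2 devices (max sessions 4 each).
    print(devices_needed(8, 8))
    print(devices_needed(8, 4))
```

Lowering max sessions per device raises the device count for the same workload, which is the trade-off the slide recommends when memory and system resources allow it.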
Configuration • Design considerations • Best practices for design • Configure a Data Domain Device type to a single storage node • Cannot share a Data Domain Device type between storage nodes • Map multiple storage nodes to a single Data Domain system to maximize system bandwidth • Multiple storage nodes per Data Domain system help drive available DD bandwidth to saturation • [Diagram labels: Storage Node A w/ Boost, Storage Node B w/ Boost]
Configuration • Design considerations • Best practices for design • Configure a Data Domain Storage Unit to a single NetWorker Data Zone • No sharing of storage units across NetWorker Data Zones • [Diagram labels: Data Zone A w/ Boost, Data Zone B w/ Boost]
Configuration • Design considerations • Best practices for design • Do not exceed the maximum sessions specification for the Data Domain model • E.g. DD880 = 180 sessions max • 10 Data Domain Devices, each with max sessions @ 10 = 100 potential sessions • [Diagram: 5 storage nodes w/ Boost at max sessions = 50 sessions]
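The session arithmetic on this slide can be checked with a short sketch. The 180-session figure for the DD880 comes from the slide; the function names are hypothetical, and limits for other models are not given in the slides and would have to be supplied by the reader.

```python
DD880_MAX_SESSIONS = 180  # figure quoted on the slide


def potential_sessions(devices: int, max_sessions_per_device: int) -> int:
    """Worst-case concurrent sessions if every device reaches its max sessions."""
    return devices * max_sessions_per_device


def within_model_limit(devices: int, max_sessions_per_device: int,
                       model_limit: int) -> bool:
    """True when the worst case stays at or below the model's session limit."""
    return potential_sessions(devices, max_sessions_per_device) <= model_limit


if __name__ == "__main__":
    # Slide example: 10 devices with max sessions 10 against a DD880.
    print(potential_sessions(10, 10))
    print(within_model_limit(10, 10, DD880_MAX_SESSIONS))
```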
Clone Controlled Replication • Immediate Cloning • Clones begin as soon as the savegroup backup has finished • Pro: Reduces the gap in time between a secure backup and the completed copy • Con: Other savegroups may still be running, creating resource contention • Scheduled cloning • Two approaches with NW 7.6 SP1: NMC and Scripts/scheduler • Objective: Postpone clone process to reduce resource contention with backups • Pro: allows backups to complete as quickly as possible • Con: increases the gap in time between the secure backup and the completed copy
Clone Controlled Replication • Comparison of replication types • Directory Replication • Used by existing Data Domain users without backup application control • Replication begins even as the backup is in process • Pro: Reduces the gap in time between a secure backup and the completed copy • Con: The replica is not kept in the backup application's catalog • File replication • Used by customers deploying the NW/DD Boost integration • Replication process is initiated by NetWorker after the backup is completed • Pro: The replica is cataloged by NetWorker • Con: Increases the gap in time between the secure backup and the completed copy • [Timeline diagrams, T0 to T6, showing Backup and Replication for each replication type]
Clone Controlled Replication • Best Practices • Reduce the gap in time between a secure backup and the completed copy • Use immediate cloning • Increase granularity of the backup and increase concurrency • Reduce savegroup size • Use saveset cloning • Reduce saveset size • [Timeline diagrams, T0 to T6: Savegroup 1 and Savegroup 2 with Replication 1 and Replication 2, compared with a single Savegroup 1 and Replication 1]
Clone Controlled Replication • [Diagram labels: NMC, NW Server, Manages NW Saveset, Use Clone ID or Clone pool, Use remote storage node, Remote Site, Data Domain Replication, Storage Node, Clone 1, Clone 2]
Clone Controlled Replication • Remote Clone to tape • Remember to clone from the clone ID/clone pool and not the original saveset (backup) • Remember to use a storage node attached to the tape device(s) at the remote site • Each clone is independently scheduled • First clone is based upon backup • Use immediate or scheduled • Remote clone is created from the first clone • Must be scheduled • Clone of a clone is a separate policy • Data Domain system stream bandwidth is shared • Backups, recoveries, replications • E.g. DD880 – 180 connections max • # backup savestreams + # recoveries + # replications cannot exceed 180
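The shared stream budget in the last bullets can be sketched as a headroom calculation. The 180-connection limit for the DD880 is taken from the slide; the function name and the sample workload numbers are hypothetical and chosen only to illustrate the check.

```python
DD880_MAX_CONNECTIONS = 180  # figure quoted on the slide


def stream_headroom(limit: int, backups: int, recoveries: int,
                    replications: int) -> int:
    """Connections still available after backups, recoveries, and
    replications are counted against the shared limit.

    A negative result means the planned workload exceeds the limit.
    """
    return limit - (backups + recoveries + replications)


if __name__ == "__main__":
    # Hypothetical workload: 100 backup savestreams, 10 recoveries,
    # 50 replication (clone) streams on a DD880.
    print(stream_headroom(DD880_MAX_CONNECTIONS, 100, 10, 50))
```

Because clone-controlled replication draws from the same pool as backups and recoveries, scheduled clones that overlap the backup window reduce the streams available to savegroups.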
Resource Planning • EMC NetWorker with EMC Data Domain Boost Best Practices Planning (in draft) • Storage Node Memory – Boost • Default and minimum memory allocation is 64MB, supporting 4 sessions • Changing target sessions to <4 still allocates 64MB • Each additional session allocates 16MB • Back of napkin calculation for Boost • m = n * (64*s) • m = memory in MB • n = number of Data Domain Devices • s = sessions • Storage Node Memory – Data Domain Device type • Allocates between 200MB and 250MB of memory per device • Includes memory used by the RO device • Design recommendation (not minimums) • 8 Data Domain Devices per storage node, no more than 16 max • 4 streams per device, no more than 10 max • 8GB RAM
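A rough sketch of the storage node memory figures follows. It uses the per-session bullets (64MB covering the first 4 sessions, 16MB for each additional session) rather than the napkin formula, which yields an estimate at least as large. It also assumes every device runs the same number of sessions. The function names are hypothetical, and the Boost and device-type figures are reported separately because the slide does not say whether they add up or overlap.

```python
from typing import Tuple

BOOST_BASE_MB = 64               # default and minimum allocation per device
BOOST_BASE_SESSIONS = 4          # sessions covered by the base allocation
BOOST_EXTRA_MB_PER_SESSION = 16  # each additional session
DEVICE_MB_RANGE = (200, 250)     # per Data Domain Device, includes the RO device


def boost_memory_mb(devices: int, sessions: int) -> int:
    """Boost memory in MB for `devices` devices running `sessions` sessions each."""
    extra_sessions = max(0, sessions - BOOST_BASE_SESSIONS)
    per_device = BOOST_BASE_MB + BOOST_EXTRA_MB_PER_SESSION * extra_sessions
    return devices * per_device


def device_memory_range_mb(devices: int) -> Tuple[int, int]:
    """Low and high device-type memory estimates in MB for `devices` devices."""
    low, high = DEVICE_MB_RANGE
    return devices * low, devices * high


if __name__ == "__main__":
    # Design recommendation from the slide: 8 devices, 4 streams per device.
    print(boost_memory_mb(8, 4))
    print(device_memory_range_mb(8))
```

Under these assumptions the recommended layout stays well below the 8GB RAM recommendation, leaving room for the operating system and other NetWorker processes.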
Resource Planning • EMC® NetWorker® Data Domain® Deduplication Devices Integration Guide • Page 16: Memory and network considerations • Each read/write device (active nsrmmd process) that takes four save streams requires about 96 MB of RAM on the storage node. • Each read-only device requires about 20 MB, regardless of the number of save streams. • A fully loaded Data Domain system that is running four save streams per device would require about (96MB x 16 devices) + (20MB x 16 devices) = 2.3 GB of physical memory on the storage node. • The recommended minimum memory requirement for a storage node is 4 GB of RAM. • Preliminary, subject to update in the next revision of the document • 4GB minimum also reflects purchasing options (increments) • Subject to update; for reference only
Resource Planning • Storage Node Processor • Distributed Segment Processing increases processor utilization on the first backup • Subsequent backups will benefit from • Reduced CPU utilization • Reduced LAN traffic • Server • Encryption (encryptasm) not supported with the Data Domain Device type • Compression (compressasm) not supported with the Data Domain Device type • CheckPoint Restart not supported with the Data Domain Device type • Data Domain retention locks are not supported with this release of NetWorker
Distributed Segment Processing • Customer Benefits • Higher aggregate backup throughput • Backup windows shrink considerably • Enables faster DR readiness • Lower CPU usage on the Data Domain system • CPU can be used for other tasks, such as replication and cleaning • Reduced CPU usage on the media server • 20-40% lower overhead on the media server • No need to upgrade the media server hardware • Leverage existing 1GbE backup infrastructure • Achieve 10GbE throughput with 1GbE networks • Avoid the need to upgrade media server and network hardware • Failed backups go much faster on retries • Data that is already sent to the Data Domain system need not be sent again • Enables faster backups for retried backups
Distributed Segment Processing – Good Fit • Good fit situations • GDA (mandatory) • Network (1 GbE) constrained connectivity to Data Domain system • High stream counts (>8 streams) • DD model dependent; lower-end models see benefits at fewer stream counts
Other Considerations • Boost license applied to the DD system • Distributed Segment Processing ‘on’ • Replicator license applied to the DD system • NetWorker enabler for Data Domain Device type applied • Enables backups to be directed to Boost • Enables Clone Controlled Replication
Installation & Setup • Configure DataDomain system • Install DD BOOST license • Install REPLICATION license (optional) • Enable DD BOOST protocol • Create DD BOOST user/password • Create DD BOOST Storage Units (at least one for each storage node) • Enable DD BOOST distributed segment processing • Define DD BOOST Interface Group (optional) • Enable/Disable DD BOOST low bandwidth optimization (optional) • Configure SNMP for NMC monitoring (optional) • Configure NetWorker system • Install NetWorker 7.6.1 • Install DD Device Type Enabler (based on the DD raw disk capacity) • DD system and NetWorker storage node must be IP connected • Create a new DataDomain system • Add credentials for the DD system (user/password) • Create a new NetWorker device (DataDomain device type) • Configure media pool • Configure SNMP monitoring options (optional) • Configure backup group, cloning policy, etc.