Advanced Storage Management -Data Protection- HOSHINO Takashi, Kitsuregawa Lab., IIS, University of Tokyo, Nov 21 2003.
Outline • Storage management • Data protection • Approaches • Remote mirroring • Distributed storage • Summary and conclusion
Increasing Data and Cost • IT managers: 9.5% increase • Storage capacity: over 75% increase • Capacity per manager: 60% increase (source: IDC Japan 2002)
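The three IDC figures fit together: capacity per manager grows by the ratio of the capacity growth factor to the manager growth factor. A quick Python check of that arithmetic, taking "over 75%" as exactly 75%:

```python
# Growth factors from the slide (IDC Japan 2002)
capacity_growth = 1.75   # storage capacity: +75% ("over 75%" on the slide)
manager_growth = 1.095   # IT managers: +9.5%

# Capacity handled by each manager grows by the ratio of the two factors
per_manager_growth = capacity_growth / manager_growth
print(f"capacity per manager: +{per_manager_growth - 1:.0%}")
```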
Jobs in Storage Management (from the Seneca paper) • Data protection • Performance tuning • Planning and deployment • Monitoring and record-keeping • Diagnosis and repair
Storage Management Problem • Data volume explosion • Data is very important • It’s all about data, it’s all about information. • Data protection
Data Protection from What? • Faults • Human error • Disaster
Data Protection Methods: negative but necessary • Backup • Tape, optical media • Replication • RAID (1, 1/0, 5, etc.) • Remote mirroring • Distributed storage • Goal: reduce downtime after any kind of damage
Remote Mirroring • Latest approaches • SnapMirror (NetApp), Seneca (HP) • Volume Replicator (Veritas), PPRC (IBM), SRDF (EMC), PPRC (Hitachi), etc. • Online mirroring (figure): a service's primary site in Tokyo is mirrored to a secondary site in Osaka
Synchronous vs Asynchronous • Synchronous: every I/O is sent to the secondary, and the primary waits for its ACK; much overhead, but no data loss • Asynchronous: I/Os are buffered at the primary and sent to the secondary once per period; better performance, but data may be lost
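A toy Python model makes the trade-off concrete. It is a sketch, not any product's design: the `Mirror` class, the one-dict-per-site block model, and counting one round trip per synchronous write and one per asynchronous period are all assumptions for illustration. In both runs the primary fails right after writing block C.

```python
class Mirror:
    """Toy primary/secondary pair; each block is one dict entry."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = {}
        self.secondary = {}
        self.buffer = []       # writes waiting for the next period (async only)
        self.round_trips = 0   # transfers to the secondary that need an ACK

    def write(self, block, data):
        self.primary[block] = data
        if self.synchronous:
            # Send this I/O now and wait for the secondary's ACK
            self.secondary[block] = data
            self.round_trips += 1
        else:
            # Complete locally; the data leaves at the end of the period
            self.buffer.append((block, data))

    def end_of_period(self):
        if self.buffer:
            for block, data in self.buffer:
                self.secondary[block] = data
            self.buffer.clear()
            self.round_trips += 1   # one transfer per period


def run(synchronous):
    m = Mirror(synchronous)
    m.write("A", "a1")
    m.write("B", "b1")
    m.end_of_period()
    m.write("C", "c1")   # the primary fails before the next period ends
    return m


for mode in (True, False):
    m = run(mode)
    lost = sorted(set(m.primary) - set(m.secondary))
    print("sync" if mode else "async", "round trips:", m.round_trips, "lost:", lost)
```

In the synchronous run the secondary matches the primary but pays one round trip per write; the asynchronous run needs fewer transfers, yet block C, still in the buffer, is lost.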
Methods for Buffering • In-order: buffer every write and send all of them in order; no reduction in data transfer (both writes to block A are sent) • Out-of-order: buffer writes and send only the latest version of each block, in any order (e.g., D C A B); data transfer is reduced
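The difference between the two buffering methods can be sketched in a few lines of Python. The write stream and the helper names `in_order` and `out_of_order` are hypothetical; out-of-order buffering is modelled as keeping only the last write to each block.

```python
def in_order(writes):
    """Send every buffered write in arrival order: no reduction."""
    return list(writes)


def out_of_order(writes):
    """Send only the latest data for each block (write-coalescing).

    Arrival order is not preserved; the result is sorted by block name
    only to make the output deterministic.
    """
    latest = {}
    for block, data in writes:
        latest[block] = data   # a later write to the same block replaces it
    return sorted(latest.items())


# Hypothetical write stream in which block A is written twice
writes = [("A", "a1"), ("B", "b1"), ("C", "c1"), ("A", "a2"), ("D", "d1")]
print(len(in_order(writes)), "writes sent in order")
print(len(out_of_order(writes)), "writes sent after coalescing:", out_of_order(writes))
```

Five writes go out in order, but only four after coalescing, because the first write to A is superseded by a2.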
Problem with Out-of-Order Write-Coalescing (from the Seneca paper) • Simple write-coalescing reduces data transfer • But consistency may be lost: if the primary goes down partway through a transfer, the secondary is left with an inconsistent image
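The following sketch shows how a partially delivered coalesced batch leaves the secondary in a state the primary never passed through. Here "consistent" is taken to mean equal to the primary's image after some prefix of its writes (a crash-consistent image); the trace is hypothetical and the send order D C A B is borrowed from the buffering slide.

```python
def prefix_states(writes):
    """Every image the primary passed through, one per prefix of its writes."""
    states, image = [{}], {}
    for block, data in writes:
        image = {**image, block: data}
        states.append(image)
    return states


# Hypothetical write stream: block A is written twice
writes = [("A", "a1"), ("B", "b1"), ("C", "c1"), ("A", "a2"), ("D", "d1")]

# Coalesced batch (latest data per block) sent out of order: D C A B
latest = dict(writes)
send_order = ["D", "C", "A", "B"]

# The primary goes down after only the first two blocks have arrived
secondary = {block: latest[block] for block in send_order[:2]}

consistent = secondary in prefix_states(writes)
print("secondary:", secondary, "consistent:", consistent)
```

The secondary holds C and D but not A or B, a combination the primary never had, so it fails the check even though every individual block it holds is correct.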
Advanced Approaches in Asynchronous Remote Mirroring • Propagation order • In-order (more data transferred, consistent) • Out-of-order (less data transferred, possibly inconsistent) • Advanced approaches • Write-coalescing batches with atomic update [SnapMirror2002] • Overwrite log with atomic update [Seneca2003]
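Atomic update is what turns write-coalescing from a consistency risk into a safe optimization. A minimal Python sketch, assuming a hypothetical `Secondary` class that stages an incoming batch and makes it visible in a single step; the dict assignment stands in for something like switching to a new snapshot root and is not SnapMirror's or Seneca's actual code.

```python
class Secondary:
    """Hypothetical secondary that applies each coalesced batch atomically."""

    def __init__(self):
        self.image = {}     # the visible, consistent image
        self.staging = {}   # batch being received, not yet visible

    def receive(self, block, data):
        self.staging[block] = data

    def commit(self):
        # Switch to the new image in one step (one assignment here)
        self.image = {**self.image, **self.staging}
        self.staging = {}

    def abort(self):
        # The primary failed mid-batch: discard the partial batch
        self.staging = {}


sec = Secondary()
for block, data in {"A": "a1", "B": "b1", "C": "c1"}.items():
    sec.receive(block, data)
sec.commit()

# Second batch would carry A=a2 and D=d1, but the primary fails after one block
sec.receive("D", "d1")
sec.abort()
print(sec.image)   # still the image from the end of the first batch
```

Because the partial second batch is discarded, the secondary still holds an image the primary really did pass through.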
SnapMirror: File System Based Asynchronous Mirroring for Disaster Recovery. Hugo Patterson, Stephan Manley, Mike Federwisch, Dave Hitz, Steve Kleiman, Shane Owara. Network Appliance Inc. In Proc. of FAST 2002, Jan 2002.
SnapMirror • Uses NetApp’s WAFL • No-overwrite file system [Hitz94] • Copy-on-write-based snapshots • Mirroring at intervals as short as 1 minute • Reduces data transfer by 30%-80% • Extracts the differences between snapshots • Sends only the differences from primary to secondary
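SnapMirror’s Technology (figure) • Primary and secondary both start with the image C A D B • I/Os on the primary change C to C’, delete A, and add E, so the primary’s image becomes C’ D B E • The changes (C’, A, E) are sent to the secondary as a write-coalescing batch • The secondary applies the batch with an atomic update, switching from C A D B to C’ D B E in one step • The secondary’s image is consistent at any time!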
Copy-on-Write-Based Snapshot (figure) • Base reference snapshot: C A D B • File system changes in the active file system: C’ added, C deleted, A deleted, D unchanged, B unchanged, E added; remaining blocks unused
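SnapMirror finds what to transfer by comparing snapshots rather than scanning files. The sketch below assumes a simplified model in which each snapshot is described by a per-block allocation bitmap; the 8-block layout and the placement of C’ and E are made up to match the figure.

```python
# Hypothetical 8-block volume. With a no-overwrite file system, the modified
# block C' is written to a new location instead of over C.
# Block numbers:     0     1     2    3    4     5    6     7
base_contents = ["C", "A", "D", "B", None, None, None, None]
active_contents = [None, None, "D", "B", "C'", "E", None, None]


def allocation_bitmap(contents):
    """1 if the snapshot references the block, else 0."""
    return [0 if c is None else 1 for c in contents]


base = allocation_bitmap(base_contents)
new = allocation_bitmap(active_contents)

# Blocks to send: allocated in the new snapshot but not in the base
to_send = [i for i, (b, n) in enumerate(zip(base, new)) if n and not b]
# Blocks no longer referenced; no data is sent for them in this sketch
freed = [i for i, (b, n) in enumerate(zip(base, new)) if b and not n]

print("send:", [(i, active_contents[i]) for i in to_send])
print("freed:", freed)
```

Only blocks 4 and 5 (C’ and E) are sent; D and B are unchanged, and the blocks that held C and A are simply no longer referenced.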
Tracing Result (chart: data transfer for the Users1, Users2, Users3, Pubs, Source, and Bug traces) • SnapMirror reduces data transfer!
SnapMirror vs the dump Command • SnapMirror is more efficient • Data scan: a sequential scan of the block reference table (SnapMirror) vs. a scan of all directories (dump) • Transfer: only added blocks (SnapMirror) vs. all changed files (dump) • Values shown on the slide: 2.1GB, 4.0GB, 15.3GB, 25.2GB, 63~65GB, 135~150GB
Seneca: remote mirroring done write. Minwen Ji, Alistair Veitch, John Wilkes. HP Laboratories, Palo Alto, CA. In Proc. of the USENIX Annual Technical Conference (San Antonio, TX), pages 253-268, June 2003.
Seneca’s Overview • Block-device-level mirroring • Asynchronous remote mirroring • Write-coalescing • In-order delivery
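Seneca’s Batching (figure) • The primary appends each write record to a sequential log (e.g., A D A B C, then A’) • Send barriers divide the log into send batches, and writes are coalesced within each batch • The secondary receives the batches in order (e.g., A D B C, then A’) • Result: block write order is preserved across batches while writes are still coalesced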
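Seneca’s combination of write-coalescing and in-order delivery can be sketched as splitting the sequential log at send barriers, coalescing within each batch, and applying the batches in order, one atomic step each. The log contents, the `BARRIER` marker, and the rule of keeping each block at the position of its first write in the batch are assumptions for illustration, not details taken from the paper.

```python
BARRIER = None   # hypothetical marker the primary appends to its log

# Sequential log of (block, version) write records with send barriers
log = [("A", 1), ("D", 1), ("A", 2), ("B", 1), ("C", 1), BARRIER,
       ("A", 3), BARRIER]


def send_batches(log):
    """Split the log at barriers and coalesce writes inside each batch."""
    batches, current = [], {}
    for record in log:
        if record is BARRIER:
            if current:
                batches.append(list(current.items()))
            current = {}
        else:
            block, version = record
            # Keep only the last write to each block; a dict remembers the
            # position where the block first appeared in this batch
            current[block] = version
    if current:
        batches.append(list(current.items()))
    return batches


def apply_in_order(batches):
    """Apply batches one at a time, each as a single atomic step."""
    image, history = {}, []
    for batch in batches:
        image = {**image, **dict(batch)}   # new image replaces the old at once
        history.append(image)
    return history


batches = send_batches(log)
print(batches)
print(apply_in_order(batches)[-1])
```

Six write records become five transfers, and the secondary only ever holds the image as of a barrier.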
Experimental Test Bed • Workloads from real services • Cello2002: file system for researchers • SAP: customer utility bills on a database, batch jobs • RDW: data warehouse system • OpenMail: email server • Simulated the data transfer reduction and observed application-specific behavior
Performance Result (1) (chart: mean transmission fraction vs. send batch duration (s) for SAP, OpenMail, RDW, and cello; 1 minute marked)
Performance Result (2) (chart: largest transmission fraction vs. send batch duration (s) for cello, RDW, OpenMail, and SAP)
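The charts plot how much of the write traffic survives coalescing as the send batch gets longer. A Python sketch of one way to compute such a transmission fraction, assuming it is defined as writes actually sent divided by writes issued, with writes grouped into fixed-length batches; the trace is synthetic, not one of the workloads above.

```python
from collections import defaultdict


def transmission_fraction(trace, batch_seconds):
    """Fraction of writes still sent after coalescing within fixed batches.

    trace: list of (time_in_seconds, block_id) write records.
    """
    batches = defaultdict(set)
    for t, block in trace:
        batches[int(t // batch_seconds)].add(block)   # one send per block per batch
    sent = sum(len(blocks) for blocks in batches.values())
    return sent / len(trace)


# Synthetic trace: one write per second, cycling over three hot blocks
trace = [(t, t % 3) for t in range(12)]

for duration in (1, 3, 6, 12):
    print(f"{duration:>2} s batches -> fraction {transmission_fraction(trace, duration):.2f}")
```

With three hot blocks rewritten once per second, batches no longer than one rewrite cycle gain nothing, while longer batches cut the traffic to 0.5 and then 0.25 of the original. How quickly the fraction falls depends on how often the application overwrites the same blocks.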
Application-Specific Behavior • Batch size depends on the application • Write-coalescing reduces batch size
Summary of Remote Mirroring • Synchronous vs asynchronous • Asynchronous approaches • SnapMirror • Seneca • Key points of the asynchronous approach • Consistency of the secondary image • Trade-off between the amount of data loss and performance (cost)
Distributed Storage • A more flexible and feasible architecture • High reliability (availability) • Scalability • Cost • Today’s approaches • FAB (HP Labs 2003) • Petal & Frangipani (Compaq SRC 1996-) • Self-* (CMU 2003) • OceanStore • …
FAB: enterprise storage systems on a shoestring. Svend Frolund, Arif Merchant, Yasushi Saito, Susan Spence and Alistair Veitch. Storage Systems Department, Hewlett-Packard Laboratories, Palo Alto, CA. In Proceedings of the 9th Workshop on Hot Topics in Operating Systems (HotOS IX), May 200
FAB’s Approach • Built from many commodity bricks • Completely decentralized system • Redundancy by replication • Consistency among replicas via a lazy majority voting protocol
FAB’s Overview (figure) • Many bricks • Any brick can act as a front end, exporting logical units (LUs)
FAB’s Availability Analysis • Analysis conditions • Component failure rates from [Asami2000] • 256 bricks • 12 SATA disks per brick (128 data segments) • 3-way replication • Mean unavailability • 3×10^-6 % (about 1 second/year) • MTTDL (mean time to data loss) • 1.3 million years
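The two unavailability figures on the slide agree: 3×10^-6 % is a fraction of 3×10^-8, and multiplying by the number of seconds in a year gives just under one second. As a Python check, assuming a 365.25-day year:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # about 3.16e7 s

unavailability_percent = 3e-6           # 3x10^-6 %, from the slide
unavailable_fraction = unavailability_percent / 100

downtime = unavailable_fraction * SECONDS_PER_YEAR
print(f"expected downtime: {downtime:.2f} s/year")
```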
Replica Management Protocol • Basic techniques • Two- (or three-) phase commit protocol • Majority voting protocol [Thomas1979, Gifford1979, …]
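Majority Voting Protocol (1) • R + W > N for consistency • R: read quorum size • W: write quorum size • N: number of replicas • Example (figure): R=2, W=2, N=3 • Initially X=A on all three replicas, each tagged with timestamp t1 (A,t1) • Read(X) returns A • Write(X)=B updates two replicas to (B,t2) • The next Read(X) contacts two replicas; at least one holds (B,t2), so choosing the newest timestamp returns B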
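The quorum rule and the example above can be replayed in Python. This sketch keeps each replica as a (value, timestamp) pair with integer timestamps (1 for t1, 2 for t2) and ignores locking and failures; the function names are hypothetical. Because R + W > N, any read quorum and any write quorum share at least R + W - N ≥ 1 replica.

```python
from itertools import combinations

N, R, W = 3, 2, 2
assert R + W > N   # every read quorum overlaps every write quorum

# Each replica stores (value, timestamp); initially X = A at t1
replicas = [("A", 1)] * N


def write(replicas, quorum, value, ts):
    """Store (value, ts) on the replicas in the write quorum."""
    return [(value, ts) if i in quorum else r for i, r in enumerate(replicas)]


def read(replicas, quorum):
    """Return the value with the newest timestamp among the read quorum."""
    return max((replicas[i] for i in quorum), key=lambda r: r[1])[0]


# Write(X) = B at t2 reaches replicas 0 and 1 only (W = 2)
replicas = write(replicas, {0, 1}, "B", 2)

# Every read quorum of size R = 2 sees at least one (B, t2)
results = {q: read(replicas, q) for q in combinations(range(N), R)}
print(results)

# With R = 1 (so R + W = N), reading replica 2 alone returns stale data
stale = read(replicas, (2,))
print(stale)
```

Every two-replica read quorum returns B. Dropping to R = 1 breaks the rule, and a read that happens to contact only replica 2 returns the stale A.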
Majority Voting Protocol (2) • Required features • A global timestamp per operation • Exclusive reads and writes • Atomic writes to the replicas • The majority voting protocol is inefficient • Each read (and write) request needs more than one phase of access to the replicas
FAB’s Improvement • Two techniques in FAB • Optimistic read (only 1 phase) • Read only timestamps from the quorum • FAB is a disk-I/O-bound system • Timestamp management costs far less than disk I/O
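The optimistic read can be sketched as follows. This illustrates the idea on the slide and is not FAB’s published algorithm: data is fetched from one replica while only timestamps come from the quorum, and the read finishes in one phase when all timestamps match. The function name and the `None` fallback signal are assumptions.

```python
def optimistic_read(replicas, quorum, data_replica):
    """One-phase read sketch.

    replicas: list of (value, timestamp) pairs.
    The data itself is read from one replica (one disk I/O); only the
    timestamps are collected from the rest of the quorum.
    Returns the value, or None when the read must fall back to the slower
    multi-phase protocol (not modelled here).
    """
    value, ts = replicas[data_replica]
    timestamps = [replicas[i][1] for i in quorum]
    if all(t == ts for t in timestamps):
        return value     # common case: finished in a single phase
    return None          # replicas disagree: take the slow path


clean = [("B", 2), ("B", 2), ("B", 2)]
partial = [("B", 2), ("A", 1), ("A", 1)]   # an interrupted write of B

print(optimistic_read(clean, quorum=[0, 1], data_replica=0))
print(optimistic_read(partial, quorum=[0, 1], data_replica=0))
```

In the common case the read costs one data I/O plus timestamp checks, which is cheap in a disk-I/O-bound system; only reads that hit an interrupted write pay for the slower path.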
FAB’s Lazy Majority Voting Protocol • Recovery from a coordinator failure • After step (5) of the protocol (figure), the replicas are repaired using the majority data (step 6) • Recovery happens not when the brick restarts, but when the data is next read
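Lazy repair can be sketched as a read that notices disagreement and fixes stale replicas on the spot. The sketch assumes the repair rule is "the (value, timestamp) pair held by a majority wins", which rolls back a write that reached only one replica; the function name and the scenario are hypothetical.

```python
from collections import Counter


def read_with_repair(replicas):
    """Read every replica; if they disagree, repair the minority lazily.

    replicas: list of (value, timestamp), updated in place during repair.
    """
    winner, votes = Counter(replicas).most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise RuntimeError("no majority; needs full recovery")
    for i, r in enumerate(replicas):
        if r != winner:
            replicas[i] = winner   # the repair happens now, at read time
    return winner[0]


# The coordinator crashed mid-write: only replica 0 received (B, t2)
replicas = [("B", 2), ("A", 1), ("A", 1)]

# Nothing is repaired when the brick restarts; the next read does it
value = read_with_repair(replicas)
print(value, replicas)
```

The brick holding (B, t2) is left alone when it restarts; the repair happens only when X is read, which then returns A and leaves all three replicas agreeing.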
Summary and Conclusion • Storage management: data protection • Remote mirroring: SnapMirror, Seneca • Atomic update with write-coalescing batches (SnapMirror) and an in-order overwrite log (Seneca) • Distributed storage: FAB • Efficient protocol for high availability • Issues • Higher-level data protection • Application-oriented interface