
Global Copy, Metro Mirror, Global Mirror, and Metro/Global Mirror Overview

Global Copy, Metro Mirror, Global Mirror, and Metro/Global Mirror Overview. Charlie Burger Storage Systems Advanced Technical Support. Table of Contents. Background Consistency and dependent writes Peer-to-Peer Remote Copy - PPRC PPRC Considerations Establish Path Considerations


Presentation Transcript


  1. Global Copy, Metro Mirror, Global Mirror, and Metro/Global Mirror Overview Charlie Burger Storage Systems Advanced Technical Support

  2. Table of Contents • Background • Consistency and dependent writes • Peer-to-Peer Remote Copy - PPRC • PPRC Considerations • Establish Path Considerations • FCP vs ESCON Links • Metro Mirror • Global Copy • Global Mirror • Metro/Global Mirror • Failover • Failback • Addendum

  3. Background • Dependent writes • The start of one write operation is dependent upon the completion of a previous write to a disk in either the same subsystem frame or a different subsystem frame • Basis for providing consistent data for copy operations • Consistency • Preserves the order of dependent writes • For databases, consistent data provides the capability to perform a data base restart rather than a data base recovery • Restart can be measured in minutes while recovery could be hours or even days • Asynchronous processing • The separation of data transmission from the signaling of I/O complete • Distance between primary and secondary has little impact upon the response time of the primary volume • Helps minimize impact to application performance

  4. Database RESTART or RECOVER? [Diagram: the elements involved in a recovery: operating system, applications, data, operations staff, network staff, applications staff, management control, telecom network, and physical facilities, across S/390, AS/400, RS/6000, and other UNIX/NT platforms] Consistency provides: • "DB Restart" - To start a database application following an outage without having to restore the database. This is a process measured in minutes. • And avoids "DB Recover" - Restore the last set of database image copy tapes and apply log changes to bring the database up to the point of failure. This is a process measured in hours or even days

  5. Dependent Writes [Diagram: a database application issues three writes in order, each acknowledged with OK: 1. Log Update, 2. Update database, 3. Mark Log Complete] • Many examples where the start of one write operation is time dependent on the completion of a previous write on a different disk group or even a different disk frame • Database and log, for example • Synchronous copy ensures data integrity

  6. Dependent Write Consistency (1) [Diagram: primary and secondary volumes; the 'Update database' write is marked X at the secondary] • Scenario: 'Update Database' doesn't get propagated to 2nd site • Synchronous copy and dependent writes mean 'Mark Log Complete' will never be issued by the application • Result: • Database is consistent

  7. Dependent Write Consistency (2) [Diagram: primary and secondary volumes; the 'Mark Log Complete' write is marked X at the secondary] • Scenario: 'Mark Log Complete' doesn't get propagated to 2nd site • Result: • Secondary site logs say update was not completed • Backout of valid data will be done upon restart at secondary site • But database is consistent

  8. Consistency • To achieve consistency at a remote mirror location you must maintain the order of dependent writes • You cannot have a write to one volume mirrored and then have a dependent write to another volume not mirrored • The remote mirror functions each maintain consistency in their own way • Metro Mirror uses ELB (Extended Long Busy) for CKD volumes and I/O Queue Full for FB LUNs • Global Mirror holds write I/Os while building an alternate bitmap before draining the OOS (out of sync) bitmap when creating a consistency group • z/OS Global Mirror (XRC) uses timestamps to create consistency groups • Global Copy requires procedures to create consistency
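The rule on this slide (a dependent write must never be mirrored when an earlier write in the chain is not) can be sketched as a small check. This is only an illustration: the function name `is_consistent` and the write labels are hypothetical, borrowed from the log/database example on the earlier slides.

```python
from typing import Sequence, Set


def is_consistent(write_order: Sequence[str], mirrored: Set[str]) -> bool:
    """Return True if the mirrored writes respect dependent-write order.

    write_order lists the dependent writes in the order the application
    issued them; mirrored is the set of writes that reached the secondary.
    The secondary is consistent only if no later write arrived while an
    earlier one in the chain is missing.
    """
    seen_missing = False
    for write in write_order:
        if write not in mirrored:
            seen_missing = True
        elif seen_missing:
            # A later dependent write arrived although an earlier one did not.
            return False
    return True


chain = ["log_update", "update_database", "mark_log_complete"]
print(is_consistent(chain, {"log_update"}))                       # True
print(is_consistent(chain, {"log_update", "update_database"}))    # True
print(is_consistent(chain, {"log_update", "mark_log_complete"}))  # False
```

The last case is the one the slide warns about: the log says the update completed, but the database update itself never reached the remote site.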

  9. Peer-to-Peer Remote Copy - PPRC • Metro Mirror – Synchronous PPRC • Synchronous mirroring with consistency at the remote site • RPO of 0 • Global Copy – PPRC Extended Distance (XD) • Asynchronous mirroring without consistency at the remote site • Consistency manually created by user • RPO determined by how often user is willing to create consistent data at the remote • Global Mirror • Asynchronous mirroring with consistency at the remote site • RPO can be somewhere between 3-5 seconds • Metro/Global Mirror • Three site mirroring solution using Metro Mirror between site 1 and site 2 and Global Mirror between site 2 and site 3 • Consistency maintained at sites 2 and 3 • RPO at site 2 near 0 • RPO at site 3 near 0 if site 1 is lost • RPO at site 3 somewhere between 3-5 seconds if site 2 is lost

  10. PPRC Considerations • PPRC secondary volume must be off-line to all attached systems • One-to-one PPRC volume relationship • Only IBM (DS8000/DS6000/ESS) to IBM is supported • Logical paths have to be established between Logical Subsystems • FCP links are bidirectional • FCP links can also be used for server data transfer • Up to 8 links per LSS • More than 8 links per physical subsystem is allowed • Up to 4 LSS secondaries can be connected to a primary LSS • A secondary LSS can be connected to as many primary LSS systems as links are available • Distance • ESCON links - 103 KM • FCP links - 300 KM

  11. Establish Path Considerations • If paths have been established, issuing another path establish will overlay the existing established paths • For example: • 2 paths are established using this command: mkpprcpath -dev IBM.2107-75FA120 -remotedev IBM.2107-75FA150 -srclss 01 -tgtlss 01 -remotewwnn 12341234000A000F I1A10:I2A20 I1A11:I2A21 • I wish to add another path so this command is issued: mkpprcpath -dev IBM.2107-75FA120 -remotedev IBM.2107-75FA150 -srclss 01 -tgtlss 01 -remotewwnn 12341234000A000F I0100:I0100 • The result will be the loss of the 2 previously established paths and only having the new path established • To add the path, the following should be issued: mkpprcpath -dev IBM.2107-75FA120 -remotedev IBM.2107-75FA150 -srclss 01 -tgtlss 01 -remotewwnn 12341234000A000F I0100:I0100 I1A10:I2A20 I1A11:I2A21
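Because a new path establish replaces the existing path list, one way to avoid dropping paths is to always generate the command from the full set of port pairs. The helper below is a hypothetical sketch (the function name and parameters are not part of DS CLI); it only assembles a command string using the flags shown on this slide.

```python
from typing import Iterable, List


def mkpprcpath_command(dev: str, remotedev: str, srclss: str, tgtlss: str,
                       remotewwnn: str, existing_paths: Iterable[str],
                       new_paths: Iterable[str]) -> str:
    """Build one mkpprcpath command that lists new AND existing port pairs.

    Listing every port pair matters because the establish overlays
    whatever paths were defined before.
    """
    merged: List[str] = []
    for pair in list(new_paths) + list(existing_paths):
        if pair not in merged:  # skip duplicates, keep first occurrence
            merged.append(pair)
    return (f"mkpprcpath -dev {dev} -remotedev {remotedev} "
            f"-srclss {srclss} -tgtlss {tgtlss} -remotewwnn {remotewwnn} "
            + " ".join(merged))


print(mkpprcpath_command(
    "IBM.2107-75FA120", "IBM.2107-75FA150", "01", "01", "12341234000A000F",
    existing_paths=["I1A10:I2A20", "I1A11:I2A21"],
    new_paths=["I0100:I0100"],
))
```

With the values from the slide, this prints the third command shown there, with all three port pairs on one line.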

  12. FCP vs ESCON • ESCON links run at 17 MB/sec with a sustained rate of approximately 12-14 MB/sec • Fibre links can run 100-400 MB/sec depending upon the adapter with a sustained rate of approximately 80-320 MB/sec • DS8000 and DS6000 only support PPRC FCP links • ESS 800 supports both PPRC ESCON and FCP links
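To get a feel for what these sustained rates mean for an initial copy, here is a back-of-the-envelope sketch. It assumes decimal units (1 GB = 1000 MB), a single link unless stated, and no protocol or distance overhead; the function name `transfer_hours` and the 1000 GB volume size are made up for illustration.

```python
def transfer_hours(data_gb: float, sustained_mb_per_sec: float, links: int = 1) -> float:
    """Hours needed to move data_gb at the given sustained rate per link."""
    total_mb = data_gb * 1000  # assumption: decimal gigabytes
    seconds = total_mb / (sustained_mb_per_sec * links)
    return seconds / 3600


# Sustained rates taken from the slide: ESCON about 12 MB/sec, FCP 80-320 MB/sec.
print(round(transfer_hours(1000, 12), 1))   # 23.1
print(round(transfer_hours(1000, 80), 1))   # 3.5
print(round(transfer_hours(1000, 320), 1))  # 0.9
```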

  13. What is Metro Mirror? • Disaster protection for all IBM supported platforms • Other potential uses: • Data migration/movement between devices • Data workload migration to alternate site • Hardware and LIC solution • Synchronous copy, mirroring (RAID 1) to another DS8000/DS6000/ESS • Application independent • Some performance impact on application I/Os • Established at a disk level • A 2 site solution

  14. Profile of a PPRC Synchronous Write 1. Write (channel end) 2. Write to secondary 3. Write acknowledged by secondary 4. Acknowledgement (device end, I/O complete)
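Since the host does not see I/O complete until the secondary has acknowledged the write, distance adds directly to write response time. The sketch below estimates only the propagation part of that penalty. It assumes light travels through fibre at roughly 5 microseconds per kilometre and that one round trip is needed per write; real links add protocol and equipment overhead, so treat the numbers as a lower bound. The constant and function names are hypothetical.

```python
FIBRE_US_PER_KM = 5.0  # assumption: about 5 microseconds per km, one way


def sync_write_penalty_ms(distance_km: float, round_trips: int = 1) -> float:
    """Propagation delay added to each synchronous write, in milliseconds."""
    one_way_us = distance_km * FIBRE_US_PER_KM
    return 2 * one_way_us * round_trips / 1000.0


for km in (10, 103, 300):
    print(km, "km adds about", round(sync_write_penalty_ms(km), 2), "ms per write")
```

At the 300 KM FCP limit this comes to about 3 ms per write under these assumptions, which is why synchronous mirroring is bounded by distance.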

  15. Maintaining Consistency – Metro Mirror • Consistency group on the establish path commands • Loss of communication between primary and secondary sites will cause an extended long busy (ELB) for zSeries and I/O queue full for open systems to be returned to any write issued to a volume in the LSS that lost communication • Automation can issue FREEZE commands to all of the LSSs that have dependent data with the LSS that lost communication • Returning ELB or I/O queue full causes the next dependent write to NOT be issued maintaining the order of the dependent writes • After all of the FREEZE commands have been issued, RUN commands can be issued to the LSSs to resume, otherwise the ELB or I/O queue full will be returned for a default of 2 minutes • Automation is required when dependent data spans across multiple physical disk subsystems and is HIGHLY recommended when all primaries are within a single subsystem

  16. Consistency – Metro Mirror: One Physical Subsystem, No Automation [Diagram: dependent writes 1-4, failure (x) after write 2] • All paths lost means that no updates are transmitted to secondaries and consistency is maintained

  17. Consistency – Metro Mirror: One Physical Subsystem, No Automation [Diagram: dependent writes 1-4, failure (x) after write 1] • One pair suspends, others still mirror, lose consistency

  18. Consistency – Metro Mirror: Multiple Physical Subsystems, No Automation [Diagram: dependent writes 1-4, failure (x) after write 2] • Without automation, order of dependent writes not maintained and consistency is lost

  19. Consistency – Metro Mirror: Multiple Physical Subsystems, Automation [Diagram: dependent writes 1-4, failure (x) after write 2] • Automation ensures that the order of dependent writes is maintained
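The difference between the "no automation" and "automation" cases can be illustrated with a toy model. It is not how the microcode or any automation product works: it simply assumes that the first write to the LSS that lost its paths suspends that LSS, and that automation, if present, freezes every LSS before the next dependent write is issued. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Write = Tuple[str, str]  # (write name, LSS holding the target volume)


def writes_reaching_secondary(writes: Sequence[Write], failed_lss: str,
                              automation: bool) -> List[str]:
    """Return the names of the writes that are mirrored to the secondary."""
    suspended = set()
    reached = []
    for name, lss in writes:
        if lss == failed_lss and lss not in suspended:
            suspended.add(lss)  # this LSS stops mirroring
            if automation:
                # FREEZE every LSS that holds dependent data.
                suspended.update(other for _, other in writes)
        if lss not in suspended:
            reached.append(name)
    return reached


def order_preserved(writes: Sequence[Write], reached: List[str]) -> bool:
    """True if the mirrored writes are a prefix of the dependent-write chain."""
    names = [name for name, _ in writes]
    return reached == names[:len(reached)]


chain = [("log_update", "LSS_A"), ("update_database", "LSS_B"),
         ("mark_log_complete", "LSS_A")]
for automation in (False, True):
    reached = writes_reaching_secondary(chain, "LSS_B", automation)
    print(automation, reached, order_preserved(chain, reached))
```

Without automation the secondary receives the log update and the log-complete mark but not the database update; with the freeze, mirroring stops everywhere and the secondary stays consistent.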

  20. When to use Metro Mirror • Recovery system required to be current with the primary application system • Can accept some performance impact to application write I/O operations at the primary location • Recovery is on a disk-by-disk basis • Distance within maximum limits • 103 KM for ESCON links and 300 KM for FCP links • RPQ for greater distances

  21. Metro Mirror Sequential Write Data Rate – Turbo R2

  22. Pre-Turbo DS8000 Metro Mirror 4 KB Write Service Time Comparisons

  23. Pre-Turbo DS8100 Metro Mirror Sequential Write Throughput

  24. Pre-Turbo DS8100 Metro Mirror Sequential Write Throughput

  25. What is Global Copy? • Global Copy uses an additional PPRC mode designed for high performance data copy at long distances • TSO OPTION(XD) Extended Distance or DS CLI -type gcp • Disk level option • Asynchronous transfer of application primary writes to secondary allows mirroring over long distances with minimal impact to host performance • Writes to primary disk receive immediate completion status while in XD mode • Writes can be out of sequence on secondary disk • Develop procedures to create a point in time consistency • A 2 site solution

  26. Profile of an Asynchronous Write • Asynchronous write: 1. Write 2. Write acknowledgement (channel end / device end) 3. Write to secondary 4. Write acknowledged by secondary • Synchronous write: 1. Write (channel end) 2. Write to secondary 3. Write acknowledged by secondary 4. Acknowledgement (device end, I/O complete)

  27. Global Copy – How it works (1) • Synchronous PPRC establish (initial copy) is done in two phases: • Phase I - Copy all tracks in the volume starting at zero and going to the end of the volume. Use a bitmap to keep track of which tracks need to be copied. Do not transfer any host updates - just set the bit in the bitmap for new host updates. • Phase II - Go back through the bitmap to copy any host updates received while in Phase I. Any host updates received during this phase, and for the remainder of the PPRC pair life, will be sent synchronously to the remote volume. • Extended Distance PPRC • Stay in Phase I forever • No impact to host write response time • Copy at remote site is "fuzzy" - updates are not sent in order or in time consistent groups

  28. Global Copy – How it works (2) • Establish PPRC pairs with Extended Distance option • Writes to primary receive immediate completion status • Primary records updated tracks in a bitmap • Incremental copy of changed tracks or records periodically sent to secondary • To create a point in time consistency: • Transition to PPRC synchronous until full duplex state is reached • Usually a matter of seconds • Alternatively, quiesce of I/O and flushing of buffers on primary host will result in consistent secondary disk

  29. Global Copy – How it works (4) • Agents process a volume using the Out-of-Sync (OOS) bitmap to determine which tracks to transmit • Not all volumes are processed at the same time • As the volume is processed, tracks updated behind the active track being transmitted are recorded in the OOS bitmap and will be processed the next time [Diagram: successive passes over primary and secondary volumes]
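The pass over the OOS bitmap can be sketched as a simple loop. This is a simplified model, not the actual agent logic: the bitmap is a list of booleans, and `host_updates` is a made-up way to say "while the agent is at track N, the host writes these tracks".

```python
from typing import Dict, List, Tuple


def global_copy_pass(oos: List[bool],
                     host_updates: Dict[int, List[int]]) -> Tuple[List[int], List[bool]]:
    """Run one pass over a volume's out-of-sync (OOS) bitmap.

    Returns the tracks transmitted during this pass and the bitmap as it
    stands afterwards. Tracks the host updates behind the cursor stay
    marked and are picked up on the next pass.
    """
    oos = list(oos)  # work on a copy
    sent = []
    for cursor in range(len(oos)):
        for track in host_updates.get(cursor, []):
            oos[track] = True  # host write marks the track out of sync
        if oos[cursor]:
            sent.append(cursor)  # transmit this track to the secondary
            oos[cursor] = False
    return sent, oos


sent, remaining = global_copy_pass([True, False, True, False], {2: [0, 3]})
print(sent)       # [0, 2, 3]
print(remaining)  # [True, False, False, False]
```

In the example, the host rewrites tracks 0 and 3 while the agent is at track 2. Track 3 is still ahead of the cursor and goes out in the same pass; track 0 is behind it and waits for the next one.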

  30. PPRC State Changes • Transition to simplex means PPRC is withdrawn • XD is established at the volume/LUN level

  31. Volume State Transitioning (to transition from → to: use the following command) • SYNC → SIMPLEX: CDELPAIR • SIMPLEX → SYNC: CESTPAIR OPTION(SYNC) • SYNC → SUSP: CSUSPEND • SUSP → SYNC: CESTPAIR MODE(RESYNC) • XD → SUSP: CSUSPEND • SUSP → XD: CESTPAIR MODE(RESYNC) OPTION(XD) • XD → SIMPLEX: CDELPAIR • SIMPLEX → XD: CESTPAIR OPTION(XD) • SUSP → SIMPLEX: CDELPAIR • XD → SYNC: CESTPAIR OPTION(SYNC)
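The transition list on this slide maps naturally onto a lookup table. The sketch below encodes exactly the ten transitions listed; the names `TRANSITIONS` and `command_for` are hypothetical, and a pair that is not on the slide simply returns `None`.

```python
from typing import Dict, Optional, Tuple

# (current state, target state) -> TSO command, as listed on the slide.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("SYNC", "SIMPLEX"): "CDELPAIR",
    ("SIMPLEX", "SYNC"): "CESTPAIR OPTION(SYNC)",
    ("SYNC", "SUSP"): "CSUSPEND",
    ("SUSP", "SYNC"): "CESTPAIR MODE(RESYNC)",
    ("XD", "SUSP"): "CSUSPEND",
    ("SUSP", "XD"): "CESTPAIR MODE(RESYNC) OPTION(XD)",
    ("XD", "SIMPLEX"): "CDELPAIR",
    ("SIMPLEX", "XD"): "CESTPAIR OPTION(XD)",
    ("SUSP", "SIMPLEX"): "CDELPAIR",
    ("XD", "SYNC"): "CESTPAIR OPTION(SYNC)",
}


def command_for(current: str, target: str) -> Optional[str]:
    """Return the command for a listed transition, or None if not listed."""
    return TRANSITIONS.get((current.upper(), target.upper()))


print(command_for("XD", "SYNC"))   # CESTPAIR OPTION(SYNC)
print(command_for("susp", "xd"))   # CESTPAIR MODE(RESYNC) OPTION(XD)
print(command_for("SYNC", "XD"))   # None
```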

  32. Maintaining Consistency – Global Copy • Consistency group is NOT specified on the establish path command • Data on Global Copy secondaries is not consistent so there is no need to maintain the order of dependent writes • Consistent data is created by the user • Quiesce I/O • Suspend the pairs • FREEZE can be used and ELB will not be returned to the server since consistency group was NOT specified on the establish path • FlashCopy secondary to tertiary • Tertiary will have consistent data • Reestablish paths (if necessary) • RESYNC (resumepprc) Global Copy

  33. When to use Global Copy • Recovery system does not need to be current with the primary application system • RPO in the range of hours or days • User creates consistent copy of recovery data • Minor impact to application write I/O operations at the primary location • Recovery uses copies of data created by the user on tertiary volumes • Distance beyond ESCON or FCP limits • 103 KM for ESCON links and 300 KM for FCP links • RPQ for greater distances • A great tool for migrating data

  34. What is Global Mirror? • A Disaster Recovery (DR) data replication solution • Reduced (less than peak bandwidth) network bandwidth requirements (duplicate writes not sent) • A 2 site solution • Asynchronous data transfer • No impact to the production write I/Os • Peer-to-peer (no, outside the box, server MIPS) • Microcode controlled • Peer-to-peer data copy mechanism is Global Copy • Consistency Group formation mechanism is FlashCopy • 3 copies (A → B → C) • Or 4 copies (if test/practice copy (D copy) & DR is to be continued) • Unlimited distance • Very little data loss (Recovery Point Objective (RPO)) • Single digit seconds (goal was/is 3-5 seconds) • Scalable • Up to 8 primary and secondary physical subsystems • More with an RPQ

  35. Global Mirror: Basic concept • Concept • Asynchronous long distance copy (Global Copy), i.e., little to no impact to application writes • Momentarily pause application writes (fraction of millisecond to few milliseconds) • Create point in time consistency group across all primary subsystems (in OOS bitmap) • New updates saved in Change Recording bitmap • Restart application writes and complete write (drain) of point in time consistent data to remote site • Stop drain of data from primary (after all consistent data has been copied to secondary) • Logically FlashCopy all data (i.e., secondary is consistent, now make tertiary look like secondary) • Restart Global Copy writes from primary • Automatic repeat of sequence every few seconds to minutes to hours (selectable and can be immediate) • Intended benefit • Long distance, no application impact (adjusts to peak workloads automatically), small RPO, remote copy solution for zSeries and Open Systems data, and consistency across multiple subsystems [Diagram: host I/O to the Primary at the local site; asynchronous Global Copy (PPRC-XD) over long distance to the Secondary at the remote site, FCP links only, could require channel extenders; FlashCopy (record, nocopy, persistent, inhibit target write) from Secondary to Tertiary]

  36. Consistency Group Formation [Timeline diagram] • Coordination Time: coordinate local units; record new writes in bitmaps but do not copy to remote • Drain Time: let CG data drain to remote • FlashCopy relationships being established, then all FlashCopy relationships established • CG Interval Time: Global Copy continually cycles through volume bitmaps copying changed data to remote mirror volumes
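The consistency group cycle (coordinate, record new writes, drain, FlashCopy, resume) can be walked through with a toy model of a single A → B → C volume set. This is a sketch under heavy simplification, not the real implementation: the class and method names are hypothetical, the continuous Global Copy transfer between consistency groups is left out, and the model keeps a copy of the consistent track images at coordination time so that writes arriving during the drain cannot alter them.

```python
from typing import Dict, Iterable, Set, Tuple


class GlobalMirrorModel:
    """Toy model of one Global Mirror volume set: A (primary), B, C."""

    def __init__(self) -> None:
        self.a: Dict[int, str] = {}  # primary volume: track -> data
        self.b: Dict[int, str] = {}  # secondary (Global Copy target)
        self.c: Dict[int, str] = {}  # tertiary (FlashCopy target)
        self.oos: Set[int] = set()   # out-of-sync bitmap
        self.cr: Set[int] = set()    # change-recording bitmap

    def host_write(self, track: int, data: str) -> None:
        """Normal running: update A and mark the track out of sync."""
        self.a[track] = data
        self.oos.add(track)

    def form_consistency_group(
            self, writes_during_drain: Iterable[Tuple[int, str]] = ()) -> None:
        # 1. Coordination: writes are held briefly; the OOS bitmap now
        #    describes the consistency group (model keeps the track images).
        group = {track: self.a[track] for track in self.oos}
        # 2. Writes resume; new updates are recorded in CR, not sent yet.
        for track, data in writes_during_drain:
            self.a[track] = data
            self.cr.add(track)
        # 3. Drain the consistency group to the secondary.
        self.b.update(group)
        self.oos.clear()
        # 4. FlashCopy B -> C: the tertiary now holds the consistent image.
        self.c = dict(self.b)
        # 5. Resume Global Copy: recorded changes become the new OOS bitmap.
        self.oos, self.cr = self.cr, set()


gm = GlobalMirrorModel()
gm.host_write(0, "v1")
gm.host_write(1, "v1")
gm.form_consistency_group(writes_during_drain=[(1, "v2")])
print(gm.c)    # {0: 'v1', 1: 'v1'}
print(gm.oos)  # {1}
```

The write that arrived during the drain is on the primary and flagged in the bitmap, but the tertiary still shows the earlier, consistent point in time; it is picked up by the next consistency group.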

  37. Tuneables (input parameters) • Maximum Coordination Time • Maximum allowed pause of production write updates for the Consistency Group coordination action • I.e., when the Master coordinates the formation of the Consistency Group with all Subordinates • When coordination is completed, writes are allowed to continue • Default = 50 milliseconds {Range: 0 to 65535 ms (65+ seconds)} • If the “coordination time” is exceeded, coordination is stopped and all writes are allowed to continue • Design point is 2-3 ms • Maximum Drain Time • Maximum CG drain time in seconds before failing (terminating) current drain activity • Default = 30 seconds {Range: 0 to 65535 seconds (just over 18 hours)} • After 5 failures, drain time is infinite, i.e., until a consistency group is formed, i.e., completely drained • Consistency Group Interval Time • Time to wait before again starting the next consistency group formation process • Default = 0 seconds {Range = 0 to 65535 seconds (just over 18 hours)}
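The three tuneables, their defaults, and their ranges can be captured in a small validated structure. The class and field names below are hypothetical and are not DS CLI parameter names; the slide does not say whether the 5 drain failures must be consecutive, so the sketch just takes a failure count.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GlobalMirrorTuneables:
    """Defaults and ranges as stated on the slide."""
    max_coordination_ms: int = 50  # range 0 to 65535 milliseconds
    max_drain_s: int = 30          # range 0 to 65535 seconds
    cg_interval_s: int = 0         # range 0 to 65535 seconds

    def __post_init__(self) -> None:
        for name in ("max_coordination_ms", "max_drain_s", "cg_interval_s"):
            value = getattr(self, name)
            if not 0 <= value <= 65535:
                raise ValueError(f"{name}={value} is outside 0 to 65535")

    def drain_limit_s(self, drain_failures: int) -> Optional[int]:
        """Drain limit for the next attempt; None means no limit.

        After 5 failed drains the drain runs until the consistency
        group has been completely drained.
        """
        return None if drain_failures >= 5 else self.max_drain_s


defaults = GlobalMirrorTuneables()
print(defaults.drain_limit_s(0))  # 30
print(defaults.drain_limit_s(5))  # None
print(65535 / 3600)               # about 18.2 hours, "just over 18 hours"
```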

  38. Typical Global Mirror Configuration • Multiple primary to multiple secondary subsystems • Consistency across all primary subsystems • One Master, multiple Subordinates • Master communicates with Subordinates to form consistency groups • When forming a Consistency Group, PPRC-XD continues transmitting / draining the consistent data to the secondary site • Once the consistency group is formed, the new update data will be transmitted as in a PPRC-XD environment without Asynchronous PPRC • Once the A volumes have been drained to the B volumes, the B volumes will be FlashCopied to the C volumes • Note: The Master performs the same operations on volumes in the consistency group in its box when it directs the Subordinates to perform operations. [Diagram: host I/O to LSSs in a Master and Subordinate subsystems at the local site, mirrored to LSSs at the remote site]

  39. Global Mirror Initialization Process 1. Establish Global Copy paths 2. Establish Global Copy pairs 3. Establish FlashCopy pairs (wait until the Global Copy pairs have completed their first pass copy, then establish the FlashCopy pairs) 4. Define Global Mirror session and add volumes to the session 5. Establish control paths between Master and Subordinates (Note: these paths could be created earlier) 6. Start Global Mirror with Start command sent to Master [Diagram: Master and Subordinates connected through a SAN, with FlashCopy relationships at the remote site]

  40. Maintaining Consistency – Global Mirror • Consistency group is NOT specified on the establish path command • Loss of communication will NOT cause ELB to be returned for writes • FREEZE command can be used to suspend pairs after Global Mirror session is paused but ELB will NOT be returned for writes to LSSs • Consistency is maintained by not returning CE/DE or I/O complete during the coordination phase when forming a Consistency Group • Not returning CE/DE or I/O complete causes the next dependent write to NOT be issued maintaining the order of the dependent writes

  41. When to use Global Mirror? • RPO can be greater than 0 but still needs to be very current • In the single digit second range • Limited impact to application write I/O operations at the primary location • Asynchronous data transfer • Recovery is on a disk-by-disk basis • Distance exceeds maximum limits for synchronous data transfer • 300 KM for FCP links • Global Mirror only supports FCP links

  42. Global Mirror at 1000 mi DS8300 vs ESS 800 (both w/ 128 x 10k RPM disk)

  43. What is Metro/Global Mirror • A 3 site Disaster Recovery (DR) data replication solution • Metro Mirror from local (A) to intermediate (B) and Global Mirror from intermediate to remote (C) • The Metro Mirror secondary is cascaded to the remote site • 4 copies of data (A → B → C → D) • C and D are Global Mirror secondary and FlashCopy volumes • Or 5 copies (if test/practice copy (D copy) & DR is to be continued) • Unlimited distance between intermediate site and remote • RPO of 0 for “A” site failure • Zero RPO implies automation to ensure no production updates if mirroring stops • Potential RPO of 3-5 seconds for “A” and “B” twin site failure • Depends on workload and bandwidth between B and C

  44. When to use Metro/Global Mirror • When two recovery sites are required

  45. Remote Mirror Comparisons

  46. Failover Processing (1) • The secondary volume to which the command was issued becomes a suspended PPRC primary • The targeted volume gets a Change Recording bitmap • Used to track changes that make it different from its partner • Establishes a new relationship between the volume the command was issued to and its PPRC primary volume • Valid for both Metro Mirror and Global Copy [Diagram: before failover, a GC or MM pair with the primary suspended or active; after failover, the former secondary is a suspended primary with a CR bitmap]

  47. Failover Processing (2) • No communication occurs between the two volumes • Typically, failover is used when the relationship between the volumes is suspended • Consider a path failure – Primary goes suspended, Secondary does not know anything is wrong, is not suspended • If the relationship is NOT suspended when the command is issued: • The secondary volume WILL become a suspended primary • The primary volume will BECOME a suspended primary when host I/O is targeted to the volume – or a suspend command is issued to the primary volume • If neither I/O nor suspend occur, problems may arise during failback

  48. Failback Processing (1) • The primary volume to which this command is issued has its PPRC partner converted, if necessary, to a PPRC secondary • A path (or paths) must exist between the pairs • The volume to which the command was issued • Combines the partner bitmaps to get the total “difference” • Begins to resync to its partner, which becomes a PPRC secondary volume, and data begins to transfer [Diagram: Example 1, failback to the original primary, showing the CR and OOS bitmaps before, during (GC), and after (GC or MM) the failback]

  49. Failback Processing (2) • Similar to all PPRC establish operations, the resync begins processing in “Global Copy” mode (First Pass) keeping track of updates received during the resync (CR bitmap) • The pairs will return to their original mode (Global Copy or Metro Mirror) at the conclusion of the resync operation • Failover and Failback are applicable to all PPRC relationships, not just Global Mirror, as we will see in later lectures and labs [Diagram: Example 2, failback to the original target, showing the CR and OOS bitmaps before, during (GC), and after (GC or MM) the failback]
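The step "combines the partner bitmaps to get the total difference" amounts to a bitwise OR of the two volumes' change bitmaps: a track must be resynchronized if either side changed it while the pair was split. The sketch below assumes bitmaps are plain lists of 0/1 with one entry per track; the function name is hypothetical.

```python
from typing import List, Sequence


def tracks_to_resync(local_bitmap: Sequence[int],
                     partner_bitmap: Sequence[int]) -> List[int]:
    """Return the track numbers changed on either volume (bitwise OR)."""
    if len(local_bitmap) != len(partner_bitmap):
        raise ValueError("bitmaps must cover the same number of tracks")
    return [track
            for track, (mine, theirs) in enumerate(zip(local_bitmap, partner_bitmap))
            if mine or theirs]


# Track 1 changed on the volume receiving the failback, track 2 on its partner.
print(tracks_to_resync([0, 1, 0, 0], [0, 0, 1, 0]))  # [1, 2]
```

Only the tracks in the combined list need to be copied, which is why failback is much cheaper than a full establish.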

  50. Failover and Failback Command Parameters • For both failover and failback, the Primary and Secondary parameters must reflect the “new direction” of the copy operation [Diagram: volume A on serial 85551 and volume B on serial ABC2A] • To issue FAILBACK to A: CESTPAIR FB DEVN(A) PRI(85551) SEC(ABC2A) (DS CLI: failbackpprc -dev 85551 -remotedev abc2a a:b) • To issue FAILOVER to B: CESTPAIR FO DEVN(B) PRI(ABC2A) SEC(85551) (DS CLI: failoverpprc -dev abc2a -remotedev 85551 b:a)
