
NoCOUG Fall Conference 2005

NoCOUG Fall Conference 2005. High Availability & Disaster Recovery: A 360° View, by Alok Pareek. Agenda: Introduction; High Availability - 2005; Industry Shift from MTTF to MTTR; Understanding transaction failure models; Evaluating HA technologies; Real-World HA Examples; About GoldenGate.


Presentation Transcript


  1. NoCOUG Fall Conference 2005 • High Availability & Disaster Recovery: A 360° View • Alok Pareek

  2. Agenda • Introduction • High Availability - 2005 • Industry Shift from MTTF to MTTR • Understanding transaction failure models • Evaluating HA technologies • Real-World HA Examples • About GoldenGate • Questions & Answers

  3. Speaker Introduction/Background • GoldenGate Software Architect • Transactional Data Management and High Availability Solutions • 10 years in Oracle kernel development • Data Storage Technologies • Recovery and High Availability Group • High Speed Data Movement • Key Features • Cross-platform/heterogeneous transportable tablespaces (10g) • Multi-threaded redo generation (9.2) • Implementation of multiple block size caches (9.0) • Tablespace point-in-time recovery (TSPITR) (8.0) • Owner of LGWR process (redo generation) algorithms (8i to 10.2)

  4. High Availability, Disaster Recovery • Definition • Ratio of system uptime to sum of uptime and downtime Availability = MTTF/(MTTF+MTTR) • Challenges • Addressing Performance vs. Reliability in computer systems • Hardware Faults, Software Bugs, Human errors are realities in any complex system deployment • Enterprise applications need to function 24x7 • Disasters are no longer a distant threat • Inadequate planning to handle outages • Shift in industry (and academic research) focus • From fault tolerance to reducing MTTR
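As a worked illustration of the availability ratio on slide 4, the following Python sketch computes availability from MTTF and MTTR. The function names and the sample figures (1000 hours between failures, 1 hour to repair) are assumptions chosen for illustration, not numbers from the talk.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Availability = MTTF / (MTTF + MTTR), as defined on the slide."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def annual_downtime_hours(avail: float) -> float:
    """Expected hours of downtime per year at a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


# Hypothetical system: fails every 1000 hours, takes 1 hour to repair.
baseline = availability(1000, 1.0)
double_mttf = availability(2000, 1.0)   # fail half as often
halve_mttr = availability(1000, 0.5)    # recover twice as fast

for label, a in [("baseline", baseline),
                 ("2x MTTF", double_mttf),
                 ("MTTR / 2", halve_mttr)]:
    print(f"{label:>10}: {a:.6f} available, "
          f"{annual_downtime_hours(a):.2f} h/year down")
```

Halving MTTR yields the same availability as doubling MTTF (2000/2001 equals 1000/1000.5), which is one way to read the slide's point about the shift from fault tolerance to reducing MTTR.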

  5. HA/DR – Systematic View • Database states: (1) Active, (2) Unplanned outage, (3) Planned outage • Unplanned outage • System Failure: node death, power failure • Data Failure: physical media, logical corruption • Planned outage • System Changes: upgrades, migrations • Data Changes: maintenance

  6. 1) High Availability Concerns – No Outage (Database state 1: Active) • Performance • DSS vs. OLTP • Conflicting requirements • Mixed workload • Data validation • Data Transformation

  7. 2) Availability Concerns – Unplanned (Database state 2: Unplanned outage) • Causes: System failure, Data failure • Common Approaches • Database Restore/Recovery • RAID • Shared Disk Clusters • Standby database

  8. Understanding Database Failures • Failure points • Statement • Process • Instance • Database • Site • Failure types • Physical (Media, corruption, inconsistency amongst redundant copies) • Logical (Incorrect DML, out-of-synch, accidental table drop) • Failure Handling • Automatic • Manual

  9. 3) Availability Concerns – Planned Outage (Database state 3: Planned outage) • Causes: System Changes, Data Changes • Selected windows of downtime • Phased approach to maintenance

  10. Key HA Evaluation Criteria

  11. Differentiating HA Technologies • Conventional Backup/Recovery • RAID: multiple hard disks behaving as a single large fast drive (mirrors/stripes/duplexing/parity) • Snapshots • Block Level Database Replication • Change Level Database Replication • Remote Mirroring Solutions • Transactional Data Management • Diagram labels: Roll Forward / File Protection; High Availability and Disaster Recovery

  12. HA Technologies & Tradeoffs • Block based database replication • Standby kept in constant recovery (inactive) mode • Useful for strict disaster recovery only, not HA • Cannot be used for reporting in recovery mode • No write access for distributed load balancing • Application response times suffer after failover • Cannot address availability across heterogeneous systems • Change based database replication • Trigger or log based • Not optimized for real time performance • Intrusive, Complex • Cannot address availability across heterogeneous systems

  13. HA Technologies & Tradeoffs • Remote mirroring solutions • Volume managers maintain mirrors of local writes on a set of remote volumes • Useful for file protection • Physical distance to remote volumes is a critical limitation • No protection from logical corruption, or storage stack corruption • Message based • Logical writes sent by primary host over IP to remote hosts (synchronously/asynchronously) • Write ordering must be maintained by primary host • Remote volumes are standby-only, applications cannot access them • No protection from logical corruption • Hardware based • Storage arrays propagate IOs to storage arrays at a secondary site • Secondary arrays are inaccessible during replication • No protection from logical corruption • Only useful for block availability during DR

  14. HA Technologies & Tradeoffs • Transactional Data Management • Addresses low latency in HA hybrid computing environments (built on 1-safe) • Management of transactional streams -- captures, transforms, routes, delivers and verifies data transactions in real time • Real time, heterogeneous, data integrity, low impact • Use cases for HA, DR, data integration, distributed computing • Not for file-level replication • Diagram labels: Source; Log-based data capture (can be selective); Route; Transform (if needed); Apply; Target; Sub-second latency

  15. HA/DR – Solution Examples • Database states: (1) Active, (2) Unplanned outage, (3) Planned outage • Unplanned outage • System Failure: node death, power failure • Data Failure: physical media, logical corruption • Planned outage • System Changes: upgrades, migrations • Data Changes: maintenance

  16. Real-World HA Configuration (Active) • Uni/Bi-directional configuration – if one system fails, the other is immediately available and ready to take over. • For… • Highest Availability • Maximized ROI on hardware (transaction balancing) • Data Centers with Active/Passive • Example areas: • 24x7 (ATMs, Online Banking) • Online Retail • Real-time Analytics, Warehousing • Diagram labels: Primary “Master” (Active); Secondary “Master” (Live/Active)

  17. Real-World HA Configuration (Transaction Off-Loading) • Improve availability and performance of transaction processing by offloading query load to lower-cost DBs/platforms • For… • Horizontal scalability • Improved performance • Example areas: • Online Reservations • Online Lookups • Diagram labels: Processing Commit Transactions (Active); Lookup Query Handling (Live/Active)

  18. Real-World DR Configuration (Failover) • An HA implementation that captures and applies data to a failover system in real time. • For… • Fast failover (No restore) • Do root-cause analysis later! • Surgical Repair (Dynamic, Selective undo) • Example areas: • 24x7/mission-critical applications • Strict SLA requirements • Diagram labels: Active Database; Unplanned outage (System failure, Data failure); Failover Database

  19. Real-World Configuration (Switchover) • Zero-Downtime Migrations • Rolling Upgrades • Zero-Downtime Maintenance • For… • 24x7 availability • Reduced windows for system maintenance • Example areas: • Can’t afford downtime to do in-place upgrade • Diagram labels: Current Database; Planned Outage (System Changes, Data Changes); “Switchover” Database

  20. About GoldenGate Software • GoldenGate Software is a privately held software company that offers Transactional Data Management solutions. • 250 customers, 1500+ solutions implemented, in 35 countries • Established, Loyal Customer Base • Leading Industry Solutions • 18,000-node ATM network with 24/7 availability • Achieving paperless enterprise for this visionary healthcare provider • Saving $ millions with real-time DW and zero-downtime migrations • Database tiering handles an average of 300,000 updates/hour, with peaks of 800,000/hour

  21. GoldenGate & Transactional Data Management • Provides guaranteed capture, routing, transformation, delivery, and verification of data transactions across heterogeneous environments in real time. • TDM must be: • Real time: moves with sub-second latency • Heterogeneous: moves transactions across different databases and platforms • Transactional: maintains transaction integrity • GoldenGate differentiates on: • Performance: handles thousands of transactions per second with very low impact on IT systems • Extensibility & Flexibility: open architecture to meet demanding customer needs and data environments • Reliability: supports continuous operations and availability

  22. How GoldenGate Works • Capture: Committed changes are captured as they occur by reading the transaction logs. • Trail files: Stage and queue data for routing. • Route: Data is compressed and encrypted for routing to targets. • Delivery: Applies transactional data with guaranteed integrity. • GoldenGate Veridata™: Reports on data discrepancies between in-use DBs.
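The capture, trail, and delivery steps on slide 22 can be illustrated with a toy in-memory pipeline. This is a minimal sketch under stated assumptions, not GoldenGate's implementation or API: the `Change` record, the tuple-based log format, and the `capture` and `deliver` functions are all hypothetical, and compression, encryption, and verification are omitted.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Change:
    """One logged row change (hypothetical record layout)."""
    txn_id: int
    table: str
    key: int
    value: str


def capture(log: Iterable[tuple], tables: set[str]) -> Iterator[list[Change]]:
    """Read the log in order and emit a transaction only once it commits."""
    pending: dict[int, list[Change]] = defaultdict(list)
    for kind, payload in log:
        if kind == "change":
            if payload.table in tables:        # selective capture
                pending[payload.txn_id].append(payload)
        elif kind == "commit":
            changes = pending.pop(payload, [])
            if changes:
                yield changes                   # emitted in commit order
        elif kind == "rollback":
            pending.pop(payload, None)          # uncommitted work is dropped


def deliver(trail: Iterable[list[Change]], target: dict) -> None:
    """Apply each staged transaction to the target, in commit order."""
    for txn in trail:
        for change in txn:
            target[(change.table, change.key)] = change.value


# Hypothetical source log: transaction 1 commits, transaction 2 rolls back.
log = [
    ("change", Change(1, "accounts", 10, "balance=100")),
    ("change", Change(2, "accounts", 11, "balance=50")),
    ("change", Change(1, "audit", 1, "not replicated")),
    ("rollback", 2),
    ("commit", 1),
]

trail = list(capture(log, tables={"accounts"}))  # staged, like a trail file
target: dict = {}
deliver(trail, target)
print(target)
```

Buffering changes per transaction until the commit record is seen is what keeps rolled-back work from ever reaching the target; a real log reader would also have to handle restarts and very large transactions.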

  23. TDM & HA Evaluation Criteria

  24. Questions & Answers Contact Information: Alok Pareek www.goldengate.com (415) 777-0200 apareek@goldengate.com 301 Howard Street, Suite 2100, San Francisco CA 94105

  25. Lag points address Logical failures • One-to-One: Source; Target • One-to-Many: Source; Target (current); Target (1 hour lag); Target (1 day lag)
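One way to read slide 25: a target that applies changes with a deliberate delay has not yet applied a logical error (for example, an accidental table drop) if the error is more recent than the delay. The sketch below picks the least-lagged target that is still clean. The function names, the target names, and the timestamps are assumptions for illustration only.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def clean_targets(targets: dict[str, timedelta],
                  error_time: datetime,
                  now: datetime) -> dict[str, timedelta]:
    """Targets whose applied position (now - lag) is still before the error."""
    return {name: lag for name, lag in targets.items()
            if now - lag < error_time}


def best_recovery_target(targets: dict[str, timedelta],
                         error_time: datetime,
                         now: datetime) -> str | None:
    """Pick the clean target with the smallest lag, or None if all are hit."""
    clean = clean_targets(targets, error_time, now)
    if not clean:
        return None
    return min(clean, key=clean.get)


# The one-to-many layout from the slide.
targets = {
    "current": timedelta(0),
    "lag_1h": timedelta(hours=1),
    "lag_1d": timedelta(days=1),
}

now = datetime(2005, 1, 1, 12, 0)               # arbitrary example time
recent_error = now - timedelta(minutes=20)      # bad DML 20 minutes ago
older_error = now - timedelta(hours=3)          # bad DML 3 hours ago
ancient_error = now - timedelta(days=2)         # bad DML 2 days ago

for label, when in [("20 min ago", recent_error),
                    ("3 h ago", older_error),
                    ("2 days ago", ancient_error)]:
    print(label, "->", best_recovery_target(targets, when, now))
```

The current target has always applied the error already, which is why a one-to-one configuration with no lag offers no protection against logical failures in this model.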
