
Enterprise IT Resiliency

Paul Massiglia, Technical Director, Engineering, VERITAS Software Corporation; Board of Directors Member, Storage Networking Industry Association. It Can Happen to You: “When the 23rd floor collapsed into the 22nd floor, the non-stop system stopped”


Presentation Transcript


  1. Enterprise IT Resiliency. Paul Massiglia, Technical Director, Engineering, VERITAS Software Corporation; Board of Directors Member, Storage Networking Industry Association

  2. It Can Happen to You • “When the 23rd floor collapsed into the 22nd floor, the non-stop system stopped” • —Bob Passmore, Gartner Group • (formerly Digital Equipment Corporation) • A disaster can be worse than your worst nightmare • What can you do about it?

  3. Information Technology Resiliency • “We’re doing the right things” • Regular backup • RAID • Maybe even clustering • Is that enough? • To survive, you must exploit IT resources to the max

  4. What Can Happen? Types of Disaster • Predictable • Hurricane, flood, planned power outage, … • Unpredictable • Local (e.g., fire), environmental (e.g., civil event) • “Rolling” • Virus, application bug, bank failure, … Different Disasters; Different Recoveries

  5. Information Technology in a Disaster Recovery Context • People • Are they safe? • Can they be motivated to work? • Premises • Are they accessible? • Are they equipped? • Information resources • Are they operational? • Are they current? • Information services • Is data available? • Are services recoverable? This is where IT fits

  6. Recovering from Predictable Disaster (e.g., storm) • Shut down information systems • Move IT operations to recovery site • Data • People • Client access • Restart services at recovery site

  7. Recovering from Unpredictable Disaster (e.g., fire) • Account for people • Move IT operations to recovery site • Data • People • Client access • Restore data integrity • Restart services

  8. Recovering from Unpredictable Disaster (e.g., flood, civil disturbance, …) • Account for people • Analyze environmental resources • Safety, sustenance, … • Power, HVAC, communications, … • Perform IT application triage • Move IT operations to recovery site • Data • People • Client access • Restore data integrity • Restart critical services

  9. Recovering from Rolling Disaster (e.g., virus) • Contain destruction • Roll data back to a known good point • Move IT operations to alternate systems (maybe) • Data • People • Client access • Restore data integrity • Restart services
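
The four recovery sequences on slides 6 through 9 differ mainly in which steps come before and after the move to the recovery site. As a rough sketch only, they can be written down as ordered runbooks keyed by disaster type. The key names, the `runbook` and `needs_integrity_restore` helpers, and the condensed step wording are illustrative assumptions, not part of the presentation.

```python
# Illustrative sketch: step lists paraphrase slides 6-9.
# All identifiers below are hypothetical names chosen for this example.
MOVE_TO_RECOVERY_SITE = (
    "Move IT operations (data, people, client access) to recovery site"
)

RUNBOOKS = {
    "predictable": [                      # e.g., storm
        "Shut down information systems",
        MOVE_TO_RECOVERY_SITE,
        "Restart services at recovery site",
    ],
    "unpredictable_local": [              # e.g., fire
        "Account for people",
        MOVE_TO_RECOVERY_SITE,
        "Restore data integrity",
        "Restart services",
    ],
    "unpredictable_environmental": [      # e.g., flood, civil disturbance
        "Account for people",
        "Analyze environmental resources",
        "Perform IT application triage",
        MOVE_TO_RECOVERY_SITE,
        "Restore data integrity",
        "Restart critical services",
    ],
    "rolling": [                          # e.g., virus
        "Contain destruction",
        "Roll data back to a known good point",
        "Move IT operations to alternate systems (maybe)",
        "Restore data integrity",
        "Restart services",
    ],
}


def runbook(disaster_type: str) -> list[str]:
    """Return a copy of the ordered recovery steps for a disaster type."""
    try:
        return list(RUNBOOKS[disaster_type])
    except KeyError:
        raise ValueError(f"unknown disaster type: {disaster_type!r}") from None


def needs_integrity_restore(disaster_type: str) -> bool:
    """True when the runbook includes an explicit data-integrity step."""
    return "Restore data integrity" in runbook(disaster_type)
```

Only the predictable case skips the integrity step, because systems are shut down cleanly before the move and no data is caught in flight.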

  10. Exploiting IT Resources To The Max

  11. Key Phrases • “Restore data integrity” • “Restart services” • “Roll data back to a known good point” …and one HUGE assumption • Data is available at a recovery site

  12. Do You Follow 1960s American Movies? Mr. McGuire (Walter Brooke), to Benjamin Braddock (Dustin Hoffman): "I just want to say one word to you...just one word." Ben: "Yes, sir." Mr. McGuire: "Are you listening?" Ben: "Yes, sir. I am." Mr. McGuire: "Plastics." (The Graduate, 1967)

  13. I just want to say one word to you: Storage Networks • What • Connect your data storage to a network (instead of to your servers) • Why • Return on investment • Better resource utilization • Lower management cost • But most of all • Increase your chances of organizational survival

  14. Storage Networks [Diagram: servers (Sun) and storage (EMC arrays, enterprise RAID, tape library) attached to a network, with a remote SAN. Callouts: “Heterogeneity” (different server and storage types); different interconnects; bridging to legacy storage; long distances between devices.]

  15. And…The Most Important Thing about Storage Networks [Diagram: the storage network from slide 14. Callouts: “Heterogeneity” (different server types); different interconnects; bridging to legacy storage; long distances between devices; total connectivity of servers to data.]

  16. Why Connectivity Is Important • Data can be “passed” from server to server • Therefore: more of your servers can do more of your jobs • Therefore: you are less likely to “lose” information services when a disaster happens

  17. The Huge Assumption: (almost) Up-to-date Data at the Recovery Site • Backup • Regular • Managed • Data replication • Maintaining up-to-date copies of operational data at remote locations • Sounds like mirroring …but • Different assumptions • Remote sites take longer to reach • Long distance communications are less reliable

  18. Data Replication [Diagram: a primary site and two secondary sites, each with the stack Application / Database / File System / Volume Manager.] • Several variations • What gets replicated? • Where is the work done? • Key issues • Data currency vs. application performance • Restoring data integrity at recovery site • VERITAS advice: volume replication for disaster recovery
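
The "data currency vs. application performance" issue on this slide can be made concrete with a toy model. The sketch below is an assumption-laden illustration, not VERITAS Volume Replicator behavior: the `ReplicatedVolume` class, its parameters, and the latency formula (local write time plus one round trip for synchronous mode) are all hypothetical simplifications.

```python
from collections import deque


class ReplicatedVolume:
    """Toy model of one primary volume replicated to one secondary.

    Hypothetical simplification: a synchronous write waits for one round
    trip to the secondary; an asynchronous write returns after the local
    write and is shipped later.
    """

    def __init__(self, synchronous: bool, link_delay_ms: float,
                 local_write_ms: float = 1.0):
        self.synchronous = synchronous
        self.link_delay_ms = link_delay_ms    # one-way delay to remote site
        self.local_write_ms = local_write_ms
        self.primary = []                     # blocks written locally
        self.secondary = []                   # blocks present at remote site
        self.pending = deque()                # written locally, not yet remote

    def write(self, block) -> float:
        """Apply a write; return the latency the application sees, in ms."""
        self.primary.append(block)
        if self.synchronous:
            self.secondary.append(block)
            return self.local_write_ms + 2 * self.link_delay_ms
        self.pending.append(block)
        return self.local_write_ms

    def drain(self, max_blocks=None) -> int:
        """Ship queued writes to the secondary in order; return the count."""
        count = len(self.pending)
        if max_blocks is not None:
            count = min(count, max_blocks)
        for _ in range(count):
            self.secondary.append(self.pending.popleft())
        return count

    def blocks_at_risk(self) -> int:
        """Writes that would be lost if the primary site vanished right now."""
        return len(self.pending)
```

In this model a synchronous volume never has blocks at risk but every write pays the link delay, while an asynchronous volume keeps writes fast at the cost of a backlog that a disaster could destroy.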

  19. Key Phrase: “Restore Data Integrity” • What: disasters occur while data is “in flight” • Result: debits don’t match credits • Result: shipping, billing, etc. instructions not cut • Issue: recovery site data integrity • Techniques • Restore backup & roll database forward from logs • “Mount” replicated volumes and file system check • “Mount” replicated volumes and roll database forward from replicated log • Multi-dimensional tradeoff: timeliness vs. cost
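
To illustrate why rolling forward from a log restores integrity, here is a minimal sketch under stated assumptions: the log record shapes (`"begin"`, `"update"`, `"commit"` tuples), the `roll_forward` function, and the account balances are invented for this example, and real databases use their own, far more elaborate recovery mechanisms. The idea shown is only that updates from transactions without a commit record are discarded, so debits and credits stay matched.

```python
def roll_forward(baseline: dict[str, int], log: list[tuple]) -> dict[str, int]:
    """Replay only committed transactions from `log` onto a copy of `baseline`.

    Hypothetical record shapes:
      ("begin", txid)
      ("update", txid, account, delta)
      ("commit", txid)
    Updates belonging to a transaction with no commit record were still
    "in flight" when the disaster struck and are skipped.
    """
    committed = {record[1] for record in log if record[0] == "commit"}
    balances = dict(baseline)
    for record in log:
        if record[0] == "update" and record[1] in committed:
            _, _, account, delta = record
            balances[account] = balances.get(account, 0) + delta
    return balances


# Transaction t1 completed; t2 debited account A but the disaster hit
# before the matching credit and the commit were logged.
baseline = {"A": 100, "B": 50}
log = [
    ("begin", "t1"),
    ("update", "t1", "A", -30),
    ("update", "t1", "B", +30),
    ("commit", "t1"),
    ("begin", "t2"),
    ("update", "t2", "A", -10),
]
recovered = roll_forward(baseline, log)
```

Here `recovered` reflects the completed transfer only, and the total across both accounts is unchanged from the baseline.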

  20. Key Phrase: “Restart Services” • Service = Data + Application • “Restart data” = Baseline + restored integrity • Restore a backup • Mount a replica • Then play back logs • “Application restart” = cluster-enabled “failover”

  21. SAN Clustering in 2 Minutes • Two (or more) servers interconnected • By storage • By clients • Application “failover” model • What resources in what order • Result • Information service recovery …within a data center
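
The failover model's "what resources in what order" is essentially a dependency ordering. A minimal sketch, assuming a hypothetical resource graph (the resource names below are illustrative and do not come from any VERITAS configuration), using the standard-library `graphlib.TopologicalSorter` available in Python 3.9 and later:

```python
from graphlib import TopologicalSorter

# Hypothetical resource graph: each resource maps to the resources
# that must already be online before it can start.
DEPENDS_ON = {
    "disk_group": set(),
    "volume": {"disk_group"},
    "file_system": {"volume"},
    "database": {"file_system"},
    "virtual_ip": set(),
    "application": {"database", "virtual_ip"},
}


def start_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Order in which the surviving server brings resources online."""
    return list(TopologicalSorter(depends_on).static_order())


def stop_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Reverse order, for taking resources offline cleanly."""
    return list(reversed(start_order(depends_on)))
```

Several valid orders can exist for the same graph (for example, `virtual_ip` can come up at any point before `application`); the only guarantee is that every resource starts after everything it depends on.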

  22. Alas, Disasters Don’t Stay within the Data Center [Diagram: clusters 1 through 7 spread across Singapore, Tokyo, and Sydney sites, labeled as slaves, site masters, and one site & global master.] • Needed: “Global Clusters” • Challenges • Getting the data there? • Busy or broken? • Solution: “clusters of clusters” • Replicated data • Recovery hierarchy • Human control • Implementation flexibility

  23. Key Phrase: “Roll Back To A Known Good Point” • Some disasters corrupt data • Recovery: “turn back the clock” • Technology options • Backups with offsite vaulting • Data snapshots
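
"Turning back the clock" comes down to choosing the most recent copy taken before the corruption began. The following is a sketch only: the `last_good_snapshot` function and the example timestamps are hypothetical, and it assumes snapshot times are kept sorted and that the moment of corruption is known, which in practice is often the hard part.

```python
from bisect import bisect_left
from datetime import datetime
from typing import Optional


def last_good_snapshot(snapshot_times: list[datetime],
                       corrupted_at: datetime) -> Optional[datetime]:
    """Return the latest snapshot taken strictly before `corrupted_at`.

    `snapshot_times` must be sorted in ascending order. Returns None when
    every snapshot was taken at or after the corruption began.
    """
    index = bisect_left(snapshot_times, corrupted_at)
    return snapshot_times[index - 1] if index > 0 else None


# Hypothetical schedule: a snapshot every six hours.
snapshots = [
    datetime(2002, 1, 1, 0, 0),
    datetime(2002, 1, 1, 6, 0),
    datetime(2002, 1, 1, 12, 0),
]
known_good = last_good_snapshot(snapshots, datetime(2002, 1, 1, 9, 0))
```

A snapshot taken at exactly the moment of corruption is treated as suspect, which is the conservative choice; everything written after the chosen snapshot is lost or must be recovered another way.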

  24. Replication & Snapshots [Diagram: an application server with live data and a recovery server with a replica; the labels service, data manager, snapshots, and roll back each appear twice, apparently once per server.] Recovery can be both remote and restorative

  25. Information Service Continuity Planning • Key questions • What services do I need to run what’s left of my enterprise? • What are the dependencies? How much can I afford to lose? • Data • Hours of service • Answers determine recovery priorities • Recovery plan must be evaluated in light of actual circumstances
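
The key questions on this slide (which services, what dependencies, how much data and how many hours of service can be lost) can be captured in a small planning table. This is a sketch under assumptions: the `Service` fields, the example services and hour figures, and the rule "most urgent first, each preceded by its dependencies" are illustrative choices, not a method stated in the presentation. It also assumes the dependency graph has no cycles.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    max_outage_hours: float       # hours of service we can afford to lose
    max_data_loss_hours: float    # hours of data we can afford to lose
    depends_on: list[str] = field(default_factory=list)


def recovery_order(services: list[Service]) -> list[str]:
    """Most urgent services first, each preceded by its dependencies."""
    by_name = {service.name: service for service in services}
    ordered: list[str] = []
    seen: set[str] = set()

    def visit(service: Service) -> None:
        if service.name in seen:
            return
        seen.add(service.name)
        for dependency in service.depends_on:
            visit(by_name[dependency])
        ordered.append(service.name)

    for service in sorted(services, key=lambda s: s.max_outage_hours):
        visit(service)
    return ordered


def backups_suffice(service: Service, backup_interval_hours: float) -> bool:
    """False when tolerable data loss is shorter than the backup interval,
    i.e. some more current copy (such as a replica) would be needed."""
    return service.max_data_loss_hours >= backup_interval_hours


# Hypothetical enterprise: names and figures are invented for illustration.
plan = [
    Service("payroll", 72, 24, depends_on=["database"]),
    Service("email", 48, 24),
    Service("order_entry", 4, 0.25, depends_on=["database"]),
    Service("database", 24, 1),
]
priorities = recovery_order(plan)
```

With these invented figures, the database is recovered first even though its own tolerance is a full day, because order entry cannot restart without it. As the slide notes, any such plan still has to be re-evaluated against the actual circumstances of the disaster.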

  26. Planning for IT Resiliency • Self-educate • Technology • Other resources • Declare vendor independence • Assess threats • Plan • Information service priorities • Recovery site procurement & provisioning • Teams & processes • Only then… • Hand off to IT for implementation

  27. Use IT Expertise Throughout The Process • Does a storage network solve our problems? • Cost-performance • Technology options • Which protection for which data? • Backups? • Mirrors? • Replicas? • To cluster or not to cluster? • In the data center? • Between sites?

  28. The Mandatory VERITAS Commercial [Chart: investment vs. availability, with low-level, medium-level, and high-level SLA bands. Capabilities shown, from lowest to highest: data redundancy and journaled file system (Foundation Suite); hot & cold backups and block-level incremental backups (NetBackup); storage checkpoints (Editions); asynchronous and synchronous replication (Volume Replicator); LAN, WAN, and global clustering (Cluster Server); traffic management; business continuance.]

  29. Using Our Own Products [Diagram components: application servers, backup server, SAN directors, #1 production database, #2 report database & failover, tape library. Product labels: Global Cluster Manager (GCM)™, Volume Replicator (VVR)™, VERITAS Foundation Suite™, VERITAS DB Edition for Oracle™, VCS™, SANPoint Control™, VERITAS NetBackup™ & NetBackupPro™, VERITAS Volume Manager™.]

  30. VERITAS Recovery Site [Diagram components: database server, application server, backup server, SAN director, disk array, tape library. Product labels: Global Cluster Manager (GCM)™, VERITAS Foundation Suite™, VERITAS DB Edition for Oracle™, Volume Replicator (VVR)™, SANPoint Control™, VERITAS NetBackup™, ServPoint SAN Appliance™.]

  31. The Resilient Enterprise • 9/11-motivated, but long overdue • Group project • 9 authors + 4 “contributors” • 2 CEOs, 4 CTOs, one HP Fellow • Compressed schedule • 8 weeks to write; 10 weeks to produce • Inspiring story: The New York Board of Trade • If you read nothing else, read chapter 1

  32. Summary • Information service recovery is only part of enterprise resiliency • Different disasters require different recoveries • The “silver bullet”: storage networks • Maximize ability to react to the unforeseen • Exploit IT resources to the maximum • Storage networks enable; software realizes • Base IT recovery strategy on business needs

  33. Thank you for your attention Electronic copy on demand from VERITAS (paul.schmidt@veritas.com)
