Internet2 Fall 2007 San Diego, CA. Getting Back to Business in Higher Education. Paul Schopis Jim Gerrity. Ohio Supercomputer Center. Disasters can be very bad. Outages can affect large regions. The increasing reach of service affecting events. Agenda.
Internet2 Fall 2007 San Diego, CA Getting Back to Business in Higher Education Paul Schopis Jim Gerrity Ohio Supercomputer Center
Outages can affect large regions • The increasing reach of service-affecting events
Agenda Part 1: Business Resumption Planning • Identifying the Need for Business Continuity Planning • What to Plan For • Identifying Key Function RTO & RPO • The Key Planning Function • An Example Part 2: Off-site Backup & Recovery Considerations and Strategies • Recovery Strategy Prerequisites • Recovery technology options and RPO & RTO metrics • Overview of D2D replication/mirroring solutions • Q & A
Identifying the Need for Business Continuity Planning • Student Services • Grants and Endowments • General Administration and Finance • Distance Learning
What to Plan For • Risk of Common Outages • Power loss • Cooling (water) • Network loss • Risk of Disaster Impact • Reduce the likelihood that the DR site is hit by the same disaster • Risk of Terrorism • Proximity to possible targets
What to Plan For • Availability of Staff (Pandemic) • Ability of staff to get to DR location • Technology Considerations • Data replication • Asynchronous or Synchronous • Data Locations • Tapes off-site • Delivery to DR location • Cost Considerations
Identifying Key Function RTO & RPO A Good Risk Analysis is Important • Identifies Key Function • Provides Recovery Timeframes • Provides Recovery Point Objectives • Identifies “cost” of downtime or importance of recovery
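A minimal sketch of how the output of such a risk analysis might be recorded and ranked. The function areas come from the earlier slide on the need for continuity planning; every number below is a hypothetical placeholder, and the ranking rule (shortest RTO first, then highest downtime cost) is an assumption, not a method stated in the deck.

```python
from dataclasses import dataclass


@dataclass
class KeyFunction:
    name: str
    rto_hours: float               # recovery timeframe: how soon it must be back
    rpo_hours: float               # recovery point: how much recent data may be lost
    downtime_cost_per_hour: float  # hypothetical "cost" of downtime


def recovery_order(functions):
    """Most urgent first: shortest RTO, ties broken by highest hourly cost."""
    return sorted(functions, key=lambda f: (f.rto_hours, -f.downtime_cost_per_hour))


# Hypothetical figures, for illustration only.
functions = [
    KeyFunction("General Administration and Finance", 24, 4, 5_000),
    KeyFunction("Student Services", 8, 1, 8_000),
    KeyFunction("Distance Learning", 8, 2, 3_000),
    KeyFunction("Grants and Endowments", 72, 24, 1_000),
]

for f in recovery_order(functions):
    print(f"{f.name}: RTO {f.rto_hours} h, RPO {f.rpo_hours} h")
```

A real analysis would likely weigh legal and social impact as well, not only an hourly cost figure.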
The Key Planning Function Disaster Recovery is: • A flexible response to a crisis • A Place to recover (location/equipment/network) • A communications Plan • A defined recovery set • Reliable backups • Test / maintain / test • Service continuity
Disaster Recovery is NOT: • Recovery of all services • A business continuity plan
Some Key DR Planning Mistakes • One recovery plan for all scenarios • Modules that fit a broader business continuity plan • Planning and testing with IT personnel only • Adopt an integrated approach to planning and testing • Perform a business impact analysis • Further away is better • Conduct a risk impact analysis • Invest in infrastructure that ensures availability of resources that are beyond your control • Power, telecommunications
Some Key DR Planning Mistakes • One copy of mirrored data at the recovery site is appropriate • What happens on resync • The planned telecommunications bandwidth should exceed the peak data transfer requirements • Only needed for synchronous remote copy • Not Planning for transfer back
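The bandwidth points above can be put into rough numbers. This is a sketch under stated assumptions: the write rates, the 20% protocol overhead, and the 2 TB data set are all hypothetical. It reads the slide as saying that only synchronous copy must be sized for peak write rate, while asynchronous copy needs at least the average rate (otherwise the backlog grows without bound), and that the time to resync or transfer data back should be estimated in advance.

```python
def mbps_needed(write_mb_per_s: float, overhead: float = 0.2) -> float:
    """Link rate in Mbit/s needed to carry a write rate given in MB/s."""
    return write_mb_per_s * 8 * (1 + overhead)


def transfer_hours(data_gb: float, link_mbps: float, overhead: float = 0.2) -> float:
    """Hours to copy data_gb over the link, e.g. for a resync or transfer back."""
    usable_mb_per_s = link_mbps / 8 / (1 + overhead)
    return data_gb * 1000 / usable_mb_per_s / 3600


# Hypothetical workload: 40 MB/s peak writes, 5 MB/s average.
peak_mb_s, avg_mb_s = 40.0, 5.0
sync_link = mbps_needed(peak_mb_s)   # synchronous copy sized for the peak
async_link = mbps_needed(avg_mb_s)   # asynchronous copy sized for the average

print(f"sync link:  {sync_link:.0f} Mbit/s")
print(f"async link: {async_link:.0f} Mbit/s")
print(f"transfer back of 2 TB over the async link: "
      f"{transfer_hours(2000, async_link):.0f} hours")
```

With these placeholder numbers, a link that is adequate for day-to-day asynchronous copy takes days to move the full data set back, which is the planning gap the last bullet points at.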
Establish a Foundation for Business Resumption • Identify Facilities Required • Ensure Telecommunications Needs are Met • Cost Effectiveness
University DR Planning • Universities should consider a centralized state or regional DR facility • Already geographically dispersed • Limited impact from a common event • Reduced costs • Common Network Access
The Internet as a Key Component of a DR Plan • Ability to transport key data securely • Reduced storage / recovery costs • VOIP • Staff location
An Example • Statewide Disaster Recovery for Ohio Higher Education Institutions • Ohio State University and University of Cincinnati
Introduction The Ohio State University and the University of Cincinnati have collaborated to provide reciprocal disaster preparedness resources to our respective institutions and have subsequently expanded the capability and now offer similar capabilities to other institutions in the state of Ohio.
How The Relationship Began • OSU and UC happened to sit across from each other at a Microsoft briefing asking each other “what do you do for DR for your mainframe?” • This led to: • What size is your mainframe? • Do you have spare capacity? • What’s your storage environment look like? • What kind of staff do you have? • What skills do they have? • What are you doing for open systems?
What We Had Going For Us • Both data center facilities meet Tier 3 standards for DR operability • Each has sufficient space to accommodate additional systems in the event of a disaster • Our facilities are 105 miles apart; neither is in a flood plain or earthquake zone • Columbus and Cincinnati have separate utility, transportation and telecommunications infrastructures • We were fairly close technically & determined we could put something basic in place without too much difficulty or expense
Other Factors That We Considered The ‘state’ of Ohio in June 2003 • Only 2 of 15 public universities had a disaster recovery plan or option in place • Institutions paid for third party DR options at a cost of over $300,000 per year • None of the universities shared services – even if they were just down the road Our schools represent combined assets of: • $8 billion total revenues • 320,000 students • $625 million in net assets
The Common Needs Were Obvious • Both organizations had a need for a functioning DR capability • Neither had unlimited funding so cost was a major consideration • We saw the opportunity to be able to start basic and improve our capabilities over time
Result - A Decision to Collaborate • OSU and UC determined that enough motivation and synergies existed to make a mutual DRP endeavor practical and desirable. • There was also a reasonable expectation that other institutions in the state would be interested in playing in this space. In addition, we believed that between us we would have the capability to support these institutions.
Our Initial Strategy Embark on a phased approach that targeted: 1st Meeting the short term goal to put a working solution in place, then add sophistication while addressing the long term needs of our institutions. 2nd Develop a flexible solution that could be made available to other institutions in the state
Our Goals - Mainframe • Target Mission Critical mainframe systems using a tape based recovery approach. • Be capable of having a recovery environment operational and ready to accept application recovery efforts within 4 hours of an emergency being declared. • Implement an electronic data exchange so that data could be copied in near real time, virtually reducing data loss to zero by the end of 2006.
Our Goals – Non-mainframe • Support a drop ship, tape recovery strategy • Allow for hosting skeleton infrastructure • Allow for hosting cold, warm or hot systems • Allow for real time data synchronization
Part 1 / Part 2 Presentation Break
Internet2 Fall 2007 San Diego, CA Off-site Backup & Recovery Considerations and Strategies Leveraging the WDM infrastructure for Business Resumption
Developing Recovery Strategies: Prerequisites • Executive level sponsorship • Business Impact Analysis (BIA) • Quantifies risk levels – acceptable downtime parameters and financial, legal, social impact for business and academic functions • Personnel • Processes • Technology • Findings include two key metrics : RPO & RTO
RPO and RTO • Recovery Point Objective (RPO): Point in time to which data must be restored after an outage • Recovery Time Objective (RTO): Period of time within which systems, applications, functions must be recovered after an outage [Figure: timeline from the last data backup, through the moment disaster strikes, to the application being back online; RPO is measured back from the disaster (seconds, minutes, hours) and RTO forward from it (seconds, minutes, hours, days)]
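The two metrics can be checked against a recovery test with simple timestamp arithmetic. The sketch below is illustrative only: the dates are invented, and the 24-hour RPO is a hypothetical objective (the 4-hour RTO echoes the mainframe goal stated in Part 1).

```python
from datetime import datetime, timedelta


def achieved_rpo(last_backup: datetime, disaster: datetime) -> timedelta:
    """Data written after the last backup is lost: that window is the achieved RPO."""
    return disaster - last_backup


def achieved_rto(disaster: datetime, back_online: datetime) -> timedelta:
    """Elapsed time from the outage until the application is back online."""
    return back_online - disaster


def meets_objectives(last_backup, disaster, back_online, rpo, rto) -> bool:
    return (achieved_rpo(last_backup, disaster) <= rpo
            and achieved_rto(disaster, back_online) <= rto)


# Invented timestamps for a single recovery test.
last_backup = datetime(2007, 10, 8, 23, 0)
disaster = datetime(2007, 10, 9, 14, 30)
back_online = datetime(2007, 10, 9, 18, 0)

print("achieved RPO:", achieved_rpo(last_backup, disaster))   # 15:30:00
print("achieved RTO:", achieved_rto(disaster, back_online))   # 3:30:00
print(meets_objectives(last_backup, disaster, back_online,
                       rpo=timedelta(hours=24), rto=timedelta(hours=4)))
```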
Recovery Strategy components • Back Office Resources • Facility, hardware, network, software, data, staff • Establishing an Alternate Site(s) • Backup Hardware • Technologies
Sample Recovery Strategies and RTO/RPO Considerations • Strategies compared: Traditional Recovery, Standby OS, Electronic Vaulting, Remote Journaling, Replication/Mirroring, Clustering • Recovery phases shown: IPL & Network, System Restore, Database Restore, Transaction Recreation, Transactions Not Captured, Declaration, Data Retrieval, Transit [Figure: chart of each strategy on a time axis marked from -24 to 84 hours; Hours of Lost Transactions (RPO) and Hours Required to Resume Business (RTO)] • Sources: BIA, GIAC
…..additional considerations • Along with RTO/RPO, must factor in backup windows • Consider the recovery process carefully; what’s involved in restoration…and who can initiate the process • Security elements • Optimum recovery solution is a function of ‘Cost of Impact’ Vs. ‘Cost of Recovery’
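The last bullet, ‘Cost of Impact’ vs. ‘Cost of Recovery’, can be sketched as a small comparison. Everything numeric here is a hypothetical placeholder (annual costs, RTO/RPO per strategy, outage frequency), and the additive cost model is an assumption chosen for illustration, not a formula from the presentation.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    annual_cost: float  # cost of recovery capability per year (hypothetical)
    rto_hours: float    # hours to resume business with this strategy
    rpo_hours: float    # hours of transactions lost with this strategy


def expected_annual_total(s, downtime_cost_per_hour, data_loss_cost_per_hour,
                          outages_per_year):
    """Cost of recovery plus expected cost of impact, per year."""
    impact_per_outage = (s.rto_hours * downtime_cost_per_hour
                         + s.rpo_hours * data_loss_cost_per_hour)
    return s.annual_cost + outages_per_year * impact_per_outage


def cheapest(strategies, **costs):
    return min(strategies, key=lambda s: expected_annual_total(s, **costs))


# Hypothetical strategies and figures.
strategies = [
    Strategy("Traditional tape recovery", 50_000, 72, 24),
    Strategy("Electronic vaulting", 120_000, 24, 12),
    Strategy("Replication/mirroring", 300_000, 4, 0.1),
]

modest = dict(downtime_cost_per_hour=10_000, data_loss_cost_per_hour=5_000,
              outages_per_year=0.2)
severe = dict(downtime_cost_per_hour=100_000, data_loss_cost_per_hour=50_000,
              outages_per_year=0.2)

print(cheapest(strategies, **modest).name)
print(cheapest(strategies, **severe).name)
```

The point of the exercise is that the answer moves as the cost of impact changes, so the same institution may reasonably pick different strategies for different applications.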
Assessing Business Impact and Technology Options for Off-site DR
Disk-to-Disk (D2D) backup replacing tape….. • Tape most widely deployed, but D2D rapidly gaining ground • Tape still ‘key’ for archiving, ‘D2D2T’ • Tape roughly 50% less expensive than ‘Tier 1’ disk-based solutions • Tier 1 Disk $$ are decreasing • Tier 2 SATA RAID-6, high capacity platforms available and proven • D2D (including Virtual Tape Libraries (VTL) ) remedy for Tape reliability and performance issues • VTL – disk-based but emulate tape libraries • Resides between tape libraries and disk on the RPO/RTO continuum • Preserves investment in existing tape backup software & systems • Can use as part of tiered disk and tape backup strategy • Data Replication/Mirroring most popular D2D remote backup solution for critical data, applications • Replication/Mirroring has several flavors……
Disk Mirroring/Replication • Many choices….and combinations…… • Synchronous • Asynchronous • Point-in-Time Copies • Snap Shot Copy • Array-based • Host-based • Fabric-based • In-band • Virtualization • CDP • Data Deduplication
Disk Mirror - Sync operation • Synchronous operation: Local transaction will only complete when remote transaction completes • Servers not required at Site B [Figure: Data Center Site-A and Site-B, each shown with servers/mainframes, connected over fiber through a ChannelDirector; source disk sync-mirrored to target disk at up to 200 km; NMS and tape vault also shown]
Disk mirror – Sync operation • Provides ‘real-time’ data copy…..file level protection • Transparent to systems being mirrored • S/W, H/W often vendor proprietary • Subject to distance limitations because of response time objectives; up to 200 km • Must have enough FC ‘buffer credits’ in switch and/or WDM • Performance depends on the number of I/Os and on bandwidth • May configure for multiple, concurrent I/Os to multiple volumes • WDM addresses bandwidth
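The distance limit and the buffer-credit requirement both follow from propagation delay. The sketch below is a back-of-envelope estimate under stated assumptions: light travels roughly 200,000 km/s in fiber, every frame is a full-size Fibre Channel frame (2148 bytes including framing), the link uses 8b/10b encoding (10 line bits per byte), and idle time between frames and equipment latency are ignored. Vendor sizing guidance should take precedence over these figures.

```python
import math

FIBER_KM_PER_S = 200_000.0  # approximate speed of light in glass
FC_FRAME_BYTES = 2148       # full-size frame: payload, header, CRC, SOF, EOF


def round_trip_ms(distance_km: float) -> float:
    """Propagation delay added to each synchronous write, in milliseconds."""
    return 2 * distance_km / FIBER_KM_PER_S * 1000


def buffer_credits(distance_km: float, line_rate_gbaud: float = 1.0625) -> int:
    """Frames that must be in flight to keep the link full over the round trip.

    Each unacknowledged frame consumes one buffer-to-buffer credit, so the
    sender needs about this many credits to avoid stalling.
    """
    frame_time_s = FC_FRAME_BYTES * 10 / (line_rate_gbaud * 1e9)
    round_trip_s = 2 * distance_km / FIBER_KM_PER_S
    return math.ceil(round_trip_s / frame_time_s)


for km in (10, 100, 200):
    print(f"{km:>4} km: +{round_trip_ms(km):.1f} ms per write, "
          f"~{buffer_credits(km)} credits at 1GFC, "
          f"~{buffer_credits(km, 2.125)} at 2GFC")
```

Smaller frames occupy less fiber each, so a workload dominated by small frames would need more credits than this estimate suggests.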
Disk Mirror - Async operation • Asynchronous operation: No specific link between completion of a local and remote transaction • Servers not required at Site B [Figure: Data Center Site-A and Site-B, each shown with servers/mainframes, connected over fiber through a ChannelDirector; source disk async-mirrored to target disk at up to 1000s of km; NMS and tape vault also shown]
Disk mirror – Async operation • Provides ‘near real-time’ data copy…..file level protection • Some data loss may occur • “Point-in-Time” Async addresses “file level” issue….but adds to RPO • Less expensive than Sync • Transparent to systems being mirrored • S/W, H/W often vendor proprietary • Not subject to Sync distance limitations • Like Sync, still must have enough FC ‘buffer credits’ in switch and/or WDM • Performance; supports multiple, concurrent I/O’s
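The difference between the two modes, and why async can lose data, can be shown with a toy model. This is a conceptual sketch only: the class and method names are invented for illustration and do not correspond to any vendor's replication product.

```python
from collections import deque


class MirroredVolume:
    """Toy model of a source disk mirrored to a remote target."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.local = []         # blocks on the source disk
        self.remote = []        # blocks on the target disk
        self.pending = deque()  # written locally, not yet at the remote site

    def write(self, block):
        self.local.append(block)
        if self.synchronous:
            # Sync: the write completes only once the remote copy exists.
            self.remote.append(block)
        else:
            # Async: the write completes now; the copy is shipped later.
            self.pending.append(block)

    def drain(self, n=None):
        """Ship up to n pending blocks (all if n is None), preserving order."""
        count = len(self.pending) if n is None else min(n, len(self.pending))
        for _ in range(count):
            self.remote.append(self.pending.popleft())

    def lost_on_site_failure(self):
        """Blocks that exist only at the primary site if it fails right now."""
        return list(self.pending)


sync_vol = MirroredVolume(synchronous=True)
async_vol = MirroredVolume(synchronous=False)
for block in ("a", "b", "c"):
    sync_vol.write(block)
    async_vol.write(block)
async_vol.drain(2)  # the link has only caught up on two of three writes

print(sync_vol.lost_on_site_failure())   # []
print(async_vol.lost_on_site_failure())  # ['c']
```

The backlog in `pending` is what turns replication lag into a non-zero RPO for async mirroring.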
Combined Sync/Async operation and Tier 1 and Tier 2 storage…..and ILM • Supports DR, reduces costs, enables Information Life Cycle Management [Figure: Data center Site-A sync-mirrored over fiber to Intermediate Site-B (0-200 km) and async-mirrored to DR Site-C (0-1000s km); three disk copies (first, second, third), tape, servers/mainframes, ChannelDirectors and NMS shown; storage labeled Tier 1 ‘FC’ or ‘SCSI’ disk and Tier 2 ‘SATA’ disk]
Host-based replication/mirroring • Storage platform agnostic • Servers required at all DR sites • Software-based; consumes host resources…can affect production application performance • Operating System dependent • More complex installing, implementing and trouble-shooting problems • Management complexity increases as backup data increases
New, emerging technologies….. …that complement replication/mirroring to evaluate: • Continuous Data Protection (CDP) • Virtualization • Data Deduplication
WDM benefits for remote storage networking • Enterprise Elasticity • Platform, protocol and bit rate agnostic • Support for multiple interfaces and networks • Low latency; required for most storage networking applications • Capacity and performance • Centralized management, distributed GMPLS control plane - Lower TCO by doing more with less Reliable, Future proof, scalable, flexible, cost-effective
Summary • No DR/BC strategy will work without Sr. Executive support and a comprehensive Business Impact Analysis (BIA), including an understanding of RPO/RTO of applications and data • No single backup & restore solution fits an organization’s overall DR/BC plan • Ensure your D2D investments are compatible with and complementary to new and emerging replication technologies • Look to utilize lower cost SATA disk and VTL technology where applicable (RPO/RTO) • Regardless of the strategy…..backed up data should still be copied to offline media and rotated to off-site storage
Thank You Jim Gerrity Director, Enterprise Vertical Markets Development and Storage Solutions + 203 483 4313 jgerrity@advaoptical.com