Business Continuity & Disaster Recovery • Daniel Griggs, Solutions Architect, Ohio Valley • September 30, 2008
Agenda • Disaster defined/Types of disasters • Who is impacted? • What should we do? • Where should we recover? • When should we test? • How will we keep our costs down? • Your Partner for Business Continuity Solutions • How can we help? • Thank You!
Disaster Recovery • Disaster defined • An adverse, unfortunate and unforeseen event! • Being down • Being unable to service/support customers • What is the largest enemy in a disaster? • Having an untested plan • Time!
Types of Disasters • Natural (fire, flood, wind, earthquake, etc.) • Malicious intent (virus, burglary, vandalism, etc.) • Localized outages: • Hardware • Power • Telecom • Software • Data Corruption
Who is Impacted? • Customers? • Staff? • The Business could be at risk! • Disaster Examples • How many businesses never reopened after Katrina? • Over 80% of companies affected went out of business within 18 months as a consequence • Source: Survive, 2007
Top Disaster Recovery Concerns • Planning for the disaster • Resources to build and test BCP and DR plans • Communication of the plan • Inherent infrastructure problems • Backup challenges • Archival strategies • Replication strategies
What should we do? • Conduct a Business Impact Analysis (BIA) • Create a Business Continuity Planning Office (BCPO) • Establish an Incident Management Team (IMT) • Establish a Life Safety – Emergency Response Team (ERT) • Define the DR plan owner • Define the DR strategy
[Organization chart labels: Incident Management Team (IMT) • Executives • Business Continuity Plan (BCP) • Life Safety – Emergency Response Team (ERT) • IS Recovery (DRP) • Business Continuity Planning Office (BCPO) – Plan Integration]
Disaster Recovery • Lost Revenue Per Hour [chart not reproduced] • Source: IT Performance Engineering & Measurement Strategies: Quantifying Performance Loss, Meta
Disaster Recovery • Where should we recover? • Cold site • Hot site • Production site and Dev/QA site • When should we test? • Test as often as possible (at least twice per year) • Involve the business in testing • Increase the complexity of each test
Why should we worry? • Risk + Probability of failure • Local failure (fault tolerance). Most likely scenario. • Disk • HBA • SAN Switch • SAN • Core Switch • Proximity - location increases risk of incident • Highway • Airport (Memphis) • Water
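To illustrate why the local chain (disk, HBA, SAN switch, SAN, core switch) is the most likely place to fail, here is a minimal sketch of the standard series/parallel availability arithmetic. The per-component figure of 99.9% is a hypothetical value chosen for illustration, not a number from the presentation, and the formulas assume components fail independently.

```python
from math import prod

def series_availability(availabilities):
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)

def redundant_availability(availability, copies=2):
    """Availability of `copies` independent redundant units (any one suffices)."""
    return 1 - (1 - availability) ** copies

# Hypothetical per-component availabilities (assumed, not from the slides).
components = {
    "disk": 0.999,
    "hba": 0.999,
    "san_switch": 0.999,
    "san": 0.999,
    "core_switch": 0.999,
}

# Single path: one of each component, all required.
single_path = series_availability(components.values())

# Fault-tolerant path: each component duplicated.
dual_path = series_availability(
    redundant_availability(a) for a in components.values()
)

print(f"single path: {single_path:.6f}")
print(f"dual path:   {dual_path:.6f}")
```

Under these assumptions a chain of five single components is less available than any one of them, while duplicating each component pushes the chain back above the individual figure.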
DR Requirements • Match your internal fault tolerance and DR capabilities to: • The overall availability requirements of your company • Do you need five 9s (99.999%)? • Recovery Time Objective (RTO) • Determined via BIA • Base your plan on the lowest RTO • Data loss tolerance: Recovery Point Objective (RPO) • Determined via BIA • Base your plan on the lowest RPO
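As a rough aid for the availability question above, the sketch below converts an availability percentage into allowed downtime per year and picks the lowest RTO and RPO from a set of BIA results. The application names and their RTO/RPO values are hypothetical placeholders, and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years

def allowed_downtime_minutes(availability_percent):
    """Minutes of downtime per year permitted by an availability target."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR * 60

# Hypothetical BIA output: one RTO/RPO pair (in hours) per application.
bia_results = {
    "order_entry": {"rto_hours": 4, "rpo_hours": 0},
    "email": {"rto_hours": 24, "rpo_hours": 4},
    "file_shares": {"rto_hours": 72, "rpo_hours": 24},
}

# The plan is driven by the most demanding (lowest) objectives.
plan_rto_hours = min(app["rto_hours"] for app in bia_results.values())
plan_rpo_hours = min(app["rpo_hours"] for app in bia_results.values())

for target in (99.5, 99.9, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} min/year")
print(f"plan RTO = {plan_rto_hours} h, plan RPO = {plan_rpo_hours} h")
```

With these assumptions, five 9s allows roughly 5.3 minutes of downtime per year, while 99.5% allows roughly 43.8 hours.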
Examples • RTO = 4 hrs, RPO = 0, Availability = 99.999% • All local infrastructure fault tolerant • Critical applications are clustered • Synchronous replication to hot site • Tape backup plan for recovery; tapes sent off-site every day • RTO = 72 hrs, RPO = 24 hrs, Availability = 99.5% • SAN fault tolerant; core fault tolerant • Little to no clustering • Tape backup plan for recovery at a cold site; tapes sent off-site every day
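The two examples above can be read as a simple check: a recovery strategy fits only if its worst-case data loss is within the RPO and its recovery time is within the RTO. The sketch below encodes that check. The `Strategy` type and the recovery-time estimates (2 hours for a hot site, 72 hours for a cold site) are assumptions made for illustration; the daily tape cycle implying up to 24 hours of data loss follows from the slide.

```python
from dataclasses import dataclass

@dataclass
class Strategy:
    name: str
    worst_case_data_loss_hours: float  # age of the newest recoverable copy
    estimated_recovery_hours: float    # assumed time to resume service

def meets_objectives(strategy, rto_hours, rpo_hours):
    """True if the strategy satisfies both the RTO and the RPO."""
    return (
        strategy.estimated_recovery_hours <= rto_hours
        and strategy.worst_case_data_loss_hours <= rpo_hours
    )

# Hypothetical strategies modeled on the two examples.
sync_hot_site = Strategy("synchronous replication to hot site", 0, 2)
daily_tape_cold_site = Strategy("daily off-site tape, cold site", 24, 72)

for rto, rpo in ((4, 0), (72, 24)):
    for s in (sync_hot_site, daily_tape_cold_site):
        verdict = "meets" if meets_objectives(s, rto, rpo) else "misses"
        print(f"RTO={rto}h RPO={rpo}h: {s.name} {verdict} the objectives")
```

Under these assumed figures, daily tape alone cannot satisfy an RPO of 0, which is why the first example pairs it with synchronous replication.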
How will we keep our costs down? • Virtualize • Virtualization greatly simplifies DR • Virtualization reduces the cost of DR • Reducing Backup Pain • De-duplication • 20x data reduction • Extend disk backup • Backup to disk (VTL) • Eliminate tapes in remote sites • Enable fast backup AND recovery • Use tapes for long-term archival only
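To make the 20x de-duplication figure concrete, here is a small arithmetic sketch. The 100 TB backup volume is a hypothetical input, and the 20x ratio is taken from the slide as given; actual ratios depend on the data and the product.

```python
def deduplicated_size_tb(logical_tb, reduction_ratio=20.0):
    """Physical capacity needed after de-duplication at the given ratio."""
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return logical_tb / reduction_ratio

def space_saved_percent(reduction_ratio=20.0):
    """Percentage of capacity saved at the given reduction ratio."""
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return (1 - 1 / reduction_ratio) * 100

logical_backup_tb = 100  # hypothetical amount of backup data before dedup
physical_tb = deduplicated_size_tb(logical_backup_tb)
print(f"{logical_backup_tb} TB of backups -> {physical_tb:.1f} TB on disk")
print(f"space saved: {space_saved_percent():.0f}%")
```

At a 20x ratio, 100 TB of backup data would occupy about 5 TB of disk, a 95% saving, which is what makes disk a practical replacement for tape at remote sites.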
How will we keep our costs down? • Improving efficiency of SAN • Document Management Policy/Practice • Archiving (based on policy) • Save $1000s on tapes while still protecting your data • Archiving will allow you to quickly restore business-critical data • By using a tiered storage solution, you will have already separated your business-critical data from the rest • Improve TCO of SAN • Use DR site for Dev/QA • Production replicates in real time to the DR site • Dev/QA replicates at a reduced interval back to the SAN at the production site
Archiving • Store more intelligently • Classify and tier • Archive inactive data • Eliminate redundant data • Streamline backups • Utilize snaps for incremental changes • Virtualize servers • [Diagram labels: Remote Volumes, Clones, Tier 1, Production Data, Tier 2, Snaps, Tier 3, Backup Data, Archive Data]
Archiving • iSCSI • Least expensive connectivity • Easy to replicate • Pay as you grow technology • Fast deployment
In the Face of a Disaster – Case #1 • What if everything is lost? • CDW’s Enterprise Configuration Center can be your DR Site • In your time of need, you HAVE to have fast response • Detroit-area customer • Fire on Friday • Weekend re-build and re-image • Delivery
In the Face of a Disaster – Case #1 • CDW drop-shipped • Imaged desktops and notebooks • Fully-configured • Routers • Switches • Firewalls • Wireless APs • Installed server racks
In the Face of a Disaster – Case #2 • What’s your backup plan? • Environmental consulting firm • 10-ft. under water • Had developed plan with CDW
In the Face of a Disaster – Case #2 • BC/DR Plan • Hot site in Mississippi • Relocation within 48 hours • Asynchronous replication – SAN • MPLS (IP-VPN) • Meshed environment • “My CDW team is like an extension of my IT department”
Your Partner for BC Solutions • We Implement • CDW Services • Custom Onsite Solutions • CDW Technology Architect Team • Rack configuration services • Custom Imaging services • Asset tagging • We Support • 24x7x365 tech support • Priority vendor support • Multiple options – phone, chat, email • Knowledgeable • Responsive • We Assess • CDW Specialists • Server • Storage • Networking • Power/Cooling • Software • Onsite Virtualization partners • Assessments
Questions? • Glen Coleman, Enterprise Architect, Security Officer, Ohio Department of Health • Daniel Griggs, Solutions Architect, CDW