Copyright 2007 The William Travis Group, Inc. Who is WTG? Disaster Recovery and Business Continuity Consultants.
1. DR and BC “Mythconceptions”…there is a better way
2. Copyright 2007 The William Travis Group, Inc. 2
3. WTG Credentials
25 years of industry-leading experience
Built and managed largest commercial hotsites
Developed today’s virtualized hotsite standards
Developed much of the industry’s methodology
Completed the industry’s largest projects
Designed first integrated DR/BCP planning tool
Pioneered business continuity planning
Authors of the only NextGen™ methodology
4. Today’s Objectives
To stimulate your thinking
To challenge the standard industry drone
To provide insights to alternative approaches
To validate what others have done and what you can do
To remind you of what you really already know
5. Obvious similarities, or subtle differences?
6. Shattering Some Popular and Some Not So Popular…
7. Mythconception #1 “Successful DR/BC requires senior management’s commitment”
actually... commitment of senior management comes from successful DR/BC planning
NextGen Alternative
revisit your costs—if you haven’t zero-based your architecture in the last two years, you are probably paying 30 - 50% too much
shorten contract terms to realize better pricing
re-architect your solution to incorporate the improved price and performance of new technology
evaluate new vendors, products/services to leverage market pressures
reduce the cost and improve the performance of DR/BC and watch senior management commitment blossom
8. Mythconception #2 “You can’t prevent a disaster”
with today’s tools you often can!
NextGen Alternative
maximize use of existing current locations and assets
decentralize business operations to insulate from regional and targeted risks, infrastructure failures and wide-scale unavailability of staff – certain geographies are simply unacceptable for housing core processes
decentralize large, monolithic IT shops to reduce disaster impact
leverage production initiatives to produce disaster resilience—convert HA to CA, CA to active/active
build disaster resilience into the production organization
9. Mythconception #3 “Disaster recovery is a business problem”
it’s not a business problem… it’s an enterprise problem and the solution always requires a technical foundation
NextGen Alternative
facilitate coordination ATOD (at time of disaster): departmental ownership creates an unrealistically complex recovery model
centralize DR/BC ownership to eliminate “islands of recoverability” but…do not over-commit centralized resources
centralize control of all remote Open Systems and data backups to simplify recoverability and cross-application synchronization
10. Mythconception #4 “DR/BC is a program, not a project”
10 to 20 year programs and still...no end to end test—no comprehensive application recovery—no unrehearsed data synchronization—no surprise tests—no substantial cross-platform recovery—no achieving RTOs—no appropriate backup policies—no critical workforce recovery—no business unit commitment—etc., etc., etc.
NextGen Alternative
ensure core requirements are recoverable within 12 months
make pragmatic assumptions--accept reality and do not sugar coat a self-fulfilling prophecy
use today’s tools to ensure a pragmatic recovery architecture
plan vertically, not horizontally - all of something is much better than some of everything
11. Mythconception #5 “There has never been an unsuccessful recovery”
do you count dramatically missing your RTOs—most RTCs are 3 to 5 times the planned RTOs
how about massive data loss—most companies cannot achieve even a 24 hour RPO
or no “business” recovery—less than 25% of companies have adequate work area recovery
NextGen Alternative
realize that your business is more resilient than you think
accept that the “fail first then recover” model will never meet the RTOs or RPOs of most large shops
do not assume that the “business” can make-up for the shortfalls of the DR/BC plan—utilize “advanced” technologies to create an inherently disaster resistant environment
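One way to make the first question on this slide concrete is to compare planned RTOs with the recovery times actually achieved in a test or a real event. The following is a minimal Python sketch under stated assumptions: the function name `rto_report`, the application names, and all hour figures are hypothetical and are not data from this presentation.

```python
def rto_report(results):
    """Compare planned RTOs with achieved recovery times.

    `results` maps an application name to a pair of
    (planned_rto_hours, actual_recovery_hours).
    Returns a dict mapping each name to (overrun_ratio, met_rto).
    """
    report = {}
    for app, (planned, actual) in results.items():
        if planned <= 0:
            raise ValueError(f"planned RTO for {app} must be positive")
        report[app] = (actual / planned, actual <= planned)
    return report


# Hypothetical test results, in hours (illustrative values only).
sample = {
    "order_entry": (24, 96),  # recovered in 4x the planned RTO
    "payroll": (72, 60),      # recovered inside the planned RTO
}

for app, (ratio, met) in rto_report(sample).items():
    status = "met" if met else "missed"
    print(f"{app}: {ratio:.1f}x planned RTO ({status})")
```

Tracking the ratio rather than a simple pass/fail makes it visible when achieved recovery times run at several multiples of the plan.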
12. Mythconception #6 “All disaster recovery planning should start with a BIA”
only if you have a lot of time, money and patience
the fundamental BIA process is critically flawed—impacts are not additive—risks are not manageable
NextGen Alternative
recast or eliminate the traditional BIA process - change focus from potential loss to certain dependencies
focus on applications as enablers of business functionality
seriously question the practicality of manual alternatives
do not neglect your upstream/downstream and industry responsibilities
13. Mythconception #7 “There is no such thing as a failed test”
actually 90%+ of all tests should be considered a failure
traditional testing is an artificial feel-good exercise
excluding the first few tests, the extent of preparation effort is a geometrically inverse indicator of actual recoverability!
NextGen Alternative
if you cannot prepare for and complete a successful end-to-end test in less than 96 hours your recovery capability is probably inadequate
establish an active-active model whenever possible
focus now on production data backup and restoration
expand unit testing to maximize integrated testing
match vertical DR/BC planning with vertical testing
focus on test results not testing activity
14. Mythconception #8 “Business proponents must be the ones to determine recovery requirements”
business proponents are typically not qualified to determine application requirements
the complex interaction of business process, applications and platforms largely invalidates departmental input
NextGen alternative
define minimum standardized recovery levels
mandate DR/BC participation and hold accountable—minimal compliance is not a departmental decision
determine criteria for optimal compliance based on up/downstream dependencies, not simple departmental impact
15. Mythconception #9 “Planning scope should address complete facility loss”
9/11, the NE Power Outage, Katrina and other massive disasters have forever changed the rules of the game
planning for total site loss alone is no longer adequate
personal priorities trump business requirements
NextGen Alternative
deeper planning is required—longer disasters, tertiary site
broader planning is required—eliminate dependency on national infrastructure… regional utilities, communications and transportation are unreliable
wider planning is required—upstream and downstream dependencies must be addressed
16. Mythconception #10 “BC, not DR, is the real objective”
business continuity is the goal and business processes are the drivers but technical recovery is the solution
NextGen alternative
achieve simpler Business Continuity by focusing more on Disaster Recovery
eliminate manual bridging, lost data re-entry and business catch-up: the more technology recovered, the less the business users must accommodate ATOD
replace unrealistic, high-overhead manual processes with the applications that you use every day—incremental capabilities are “cheap”
17. Mythconception #11 “Our nightly tape backups will ensure a 24 hour RPO”
few large shops will achieve the recovery they anticipate by relying on their “normal” tape backups
actual backups are never what they are expected to be
technology cannot eliminate the need for understanding interdependencies and synchronization requirements
NextGen alternative
conduct a detailed application data analysis immediately
assign ownership of disaster data recovery to applications
conduct a zero-based analysis of your production data backup policies and modify them to facilitate disaster data restores
use advanced disk technology to achieve your real backup requirements
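The 24-hour RPO claim on this slide can be checked with simple arithmetic. The sketch below uses a deliberately simplified model that is an assumption, not something stated in the presentation: backups run at a fixed interval, and a backup only counts toward recovery once its tape has reached offsite storage. Under that model, the worst-case data loss is the backup interval plus the offsite lag. The function names and sample figures are hypothetical.

```python
def worst_case_data_loss_hours(backup_interval_hours, offsite_lag_hours):
    """Worst-case age of the newest offsite backup, in hours.

    Assumed model: a backup is taken every `backup_interval_hours`, and
    each tape becomes usable for disaster recovery `offsite_lag_hours`
    after the backup is taken. A disaster just before the newest tape
    arrives offsite forces a restore from the previous one.
    """
    if backup_interval_hours <= 0 or offsite_lag_hours < 0:
        raise ValueError("interval must be positive and lag non-negative")
    return backup_interval_hours + offsite_lag_hours


def meets_rpo(rpo_hours, backup_interval_hours, offsite_lag_hours):
    """True if the worst-case data loss stays within the stated RPO."""
    loss = worst_case_data_loss_hours(backup_interval_hours, offsite_lag_hours)
    return loss <= rpo_hours


# Hypothetical figures: nightly backups, tapes collected 12 hours later.
print(worst_case_data_loss_hours(24, 12))  # 36
print(meets_rpo(24, 24, 12))               # False
```

Even with this optimistic model, which ignores failed jobs and unsynchronized application data, a nightly cycle alone does not guarantee a 24-hour RPO.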
18. Mythconception #12 “The best recovery plans consist of team to-do lists”
most plans are far too simplistic and are not useable at time of disaster—remember, most employees have never done this before
most plans are largely unreadable due to their disorganized mix of everything DR
Simpler is better, but simplicity comes from clarity which comes from detail and organization
NextGen alternative
100% action oriented plans – understand the difference between filler, methodology and recovery tasks
explicitly define the recovery timeline from first step to the return home including escalation and de-escalation
one size does not fit all – DR Master Site Plan, BC Master Site Plan, Regional Office Plan, Small Office Plan, Home Office Plan, S.O.A.P.s
19. Mythconception #13 “Shorter RTOs require ‘advanced’ recovery techniques”
reduce RTO through planning not spending
focus on meeting your RTOs before you focus on shortening them
NextGen alternative
improve notification and communication processes
shorten disaster assessment to achieve faster recovery
mobilize and deploy preemptively to speed recovery
better define recovery tasks and interdependencies
often, days can be shaved off the recovery timeline just by tightening the process
20. Mythconception #14 “Data synchronization is the user’s responsibility”
synchronization problems can completely invalidate a recovery capability
realistically consider the impact of lost data
few if any business departments can still recover lost data manually
NextGen alternative
use technology to solve the RPO problem—advanced data availability technologies solve the unsolvable
eliminate decades of futility with a one time capital expense
enjoy new price-performance with new “second tier” technologies
shave years off the development effort and man-years off the maintenance effort with data mirroring technologies
21. Mythconception #15 “The RTO and RPO are completely different issues”
only in the most literal interpretation
the largest part of RTO is dependent on the RPO
most RPO solutions are needed in order to shorten RTOs
NextGen alternative
include core infrastructure in your RTO calculations and pre-stage it whenever possible
remember to consider dependency groups
eliminate tape recovery for all except stand-alone applications
when using tapes, optimize for restoration not backup
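To illustrate why dependency groups and core infrastructure belong in the RTO calculation, the sketch below computes the earliest time each component could be recovered when it cannot start until everything it depends on is back. It is an illustration only: the component names and durations are hypothetical, and it assumes enough staff that independent recoveries run in parallel. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def recovery_finish_times(durations, depends_on):
    """Earliest finish time (hours) for each component.

    `durations` maps a component name to its own restore time in hours.
    `depends_on` maps a component name to the set of components that
    must be fully recovered before its restore can begin.
    """
    graph = {name: set(depends_on.get(name, ())) for name in durations}
    finish = {}
    # static_order() yields every component after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        start = max((finish[dep] for dep in graph[name]), default=0.0)
        finish[name] = start + durations[name]
    return finish


# Hypothetical environment: core infrastructure first, then applications.
durations = {"network": 4, "storage": 6, "database": 8, "order_entry": 3}
depends_on = {
    "storage": {"network"},
    "database": {"storage"},
    "order_entry": {"database", "network"},
}

finish = recovery_finish_times(durations, depends_on)
# order_entry needs only 3 hours of its own work, but cannot finish
# before hour 21 because of the chain network -> storage -> database.
print(finish["order_entry"])
```

In this toy example, pre-staging the network and storage layers would remove 10 of the 21 hours from the application's effective recovery time.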
22. Mythconception #16 “Communicating ATOD is the most important aspect of a successful recovery”
communications, not communicating, is the most critical factor, and it doesn’t happen naturally!
NextGen alternative
differentiate operational communicating from strategic communications
pre-define messages, audiences, vehicles for in-bound and out-bound
frame all communications and don’t forget the “little timeline”
pre-develop all communications messages
err towards over-communication
use an automated system and all possible vehicles – redundancy saves the day
23. Mythconception #17 “Working from home is the least expensive and most effective work area replacement”
casual telecommuting misleads us to believe that working from home ATOD is a fully viable solution—it usually isn’t
unless specifically pre-planned, telecommuting is usually a non-starter
NextGen alternative
in typical client-server environments, discount home computers as viable workstations
existing RAS capabilities often don’t meet required recovery capacities
do not underestimate the physical proximity requirements of many business processes
maximize shift work and account for the increasing/decreasing needs as the recovery unfolds
24. Mythconception #18 “Disaster Recovery is becoming too complicated… it’s really a fairly simple process”
anyone who really believes that DR/BC is a simple process obviously doesn’t understand the problem
NextGen alternative
the only way to simplify the issue is to proactively design and document the complexity out of the process
step-by-step recovery plans with explicit instructions
choreographed mobilization, deployment, workarea usage
pre-drafted communications with “multi flavor” messages
detailed alternative procedures per scenario, particularly on the business side
25. Mythconception #19 “An organization needs both a Disaster Recovery Plan and a Business Continuity Plan to meet Best Practice requirements”
there is a new sheriff in town: All-Risk Incident Management trumps Business Continuity
today’s planning requirements are much broader than simple Business Continuity
NextGen alternative
implement a holistic sub-plan approach to deal with any and all risks – pandemic operations, succession planning, supply chain planning, crisis communications, product liability, etc.
implement a single common communications structure between sub-plans
employ milestone management at the senior management top level
26. Mythconception #20 “7 out of 10 businesses that experience a disaster without a DR plan are out of business within five years”
or… “Of the companies experiencing disasters, 43% never reopen, and 29% close within two years”
or… “Some 40% of companies that experience a devastating loss to their data systems never reopen their doors”
or… “Of 350 businesses in the World Trade Center before the bombing, 150 were out of business a year later”
NextGen alternative
implement a pragmatic recovery capability NOW!
realistically address data backup, synchronization and restoration
develop a scenario-based, multi-threaded plan that will really work
change untenable recovery architectures and get production performance from your DR investment
27. Why NextGen?
Plan for new risks - targeted attacks, wide scale unavailability of staff
Apply new technology and replace tape recovery for critical functions or large environments
Build independence from over-allocated shared commercial facilities
Implement active site model to ensure continuity through actual use versus testing
New breadth of recoverability… city or even metro-wide
New depth of recoverability… length of disaster (tertiary site)
Address worst case volumes not average case
Re-evaluate DR/BC ownership model
Leverage DR/BC industry confusion and weakness
Re-purpose existing assets and resources
Simplify and reduce traditional documentation
Define pragmatic limits
29. Ask the Hard Questions
Is your recovery solution as “right” as it was just 1 or 2 years ago?
Do you understand what won’t be recovered?
How many employees will be out-of-work after a disaster?
Are you prepared to permanently lose the amount of data your current backup model risks?
How much of your business is dependent on paper records?
Can your critical functions wait while systems return to normal?
How many skilled technicians will it take to recover 100s of servers?
Do you have enough technical staff to cover 3 or more sites ATOD?
Can you really synchronize thousands of files to a single point in time?
Are you certain that you are not overpaying for your recovery capability?
30. A New Checklist for Effective DR/BC …NextGen Axioms
Implement a more pragmatic recovery architecture
Define and mandate minimum standardized recovery levels
Establish a CCO position
Recast the BIA process
Centralize control of Open Systems
Decentralize business operations
Eliminate the “fail first then recover” model
Centralize ownership of DR/BC
Eliminate tape-based recovery
Implement a new, more effective plan model
Achieve Business Continuity by focusing on Disaster Recovery
Reduce RTO through planning not spending
Plan vertically, not horizontally
Use technology to solve the RPO problem
Pursue an Active-Active model and restructure testing
Leverage industry weakness while it lasts
Implement a holistic approach to DR-BC-CM
Leverage production initiatives for recovery purposes
Eliminate dependency on national infrastructure
31. Mythconception #21 “A shared hotsite is the most cost-effective recovery solution”
mainframe-centric—inverse benefits of shared cost model
the traditional fail-first-then-recover model is too complicated for large, multi-platform shops
commercial site “virtualization” and changing risk management limits further complicate recoverability
NextGen alternatives
proactively evaluate the new price/performance of in-house recovery and leverage production initiatives for disaster resilience
leverage the inherent redundancy of your open system environment
leverage server consolidation and repurpose existing assets for simplified and cost-effective recovery
maximize hybrid solutions and multi-tiered architectures
explore the possibilities of limited risk consortiums—again!
32. Mythconception #22 “An automated planning package is the best way to develop and maintain your DR/BC plan”
less is more when it comes to documentation tools
few organizations need the overhead of an automated tool
remember the old saw “do things in recovery just as you would in production”
NextGen alternative
simplify the recovery plan by improving the recovery architecture
automate processes and eliminate documents
simplify with built-to-purpose documents—differentiate recovery procedures from operating procedures
centralize maintenance to reduce effort and improve results
use your common toolset – web sites, version control, collaboration
33. Mythconception #23 “Quickship is the preferred solution to recover non-critical systems”
many Quickship “offerings” are now backed-up by manufacturer and/or distributor agreements vs. physical inventories
consider the recovery site and connectivity speed
NextGen alternative
hotsite hardware is now “mobile”
installed hotsite hardware can sometimes be priced as Quickship
manufacturer maintenance services can provide inexpensive replacements along with a tech, often in the same timeframe
you usually can’t recover 100s of systems in 2-3 days anyway
34. Mythconception #24 “As the relative cost of hardware decreases, disaster recovery planning becomes much more cost effective”
whatever happened to Moore’s law?
most expensive aspect of DR is not hardware
upgrade costs vary inversely to hardware costs
HA breaks the traditional model
NextGen alternative
leverage industry weakness while it lasts
shorter term contracts - aggressive T&Cs
upgrade concessions - understand pricing units
understand physical vs. subscription configs
understand the shell game - where is the gear?
35. Mythconception #25 “Network (LAN) recovery is the easiest part of DR”
the complexity of the LAN is usually underestimated and its recovery is under-orchestrated
NextGen alternative
don’t waste time on rediscovery—backup and store device configurations
realize the impact DNS changes have on most organizations
subnets, VLANs, AD, Single Sign-ons, Firewalls, Proxy Servers, etc. develop a life of their own over time which is nearly impossible to re-develop ATOD ad hoc
production bandwidth is seldom replicated in recovery mode—understand the impact to operations
36. Mythconception #26 “A well-documented shop already has much of what it needs for disaster recovery”
the differences between production procedures and recovery procedures are subtle but significant: they are not interchangeable
inventories are easily re-purposed, procedures are not
“To-Do List” plans are not worth the effort to develop
NextGen alternative
develop “timeline based” plans
focus on single-purpose procedures
maximize use of self-documenting resources – config files, third parties, PBX directories, etc.
reference documentation from original sources – HR, vendors, etc.
37. Mythconception #27 “The planning process is essentially the same for open systems as for mainframe systems”
less location constrained, less costly environmentals and support, more existing redundancy, shorter RTO and RPO but simpler data availability
more unique configurations, greater quantities, more dependent tiers
NextGen Alternative
differentiate production, test, development, pre-release, etc. environments and maximize the leverage of repurposing
separate existing assets and leverage inherent modularity
understand the breakage associated with recovering dozens or hundreds of servers and develop intelligent and pragmatic solutions
38. Mythconception #28 “Functional alignment of DR/BC should fall under security (or operations, or risk management, or…)”
none of these departments can effectively cross all necessary business units to meet the need
each prevents a holistic approach to DR, BC, CM and Security
absent a cross-discipline authorization, board review can only offer lip service
NextGen alternative
reframe DR/BC governance and establish CCO position
raise DR/BC to a legitimate board level concern with legitimate ownership