.99999
Dan Oberst, Princeton University
Some Definitions
• Reliability Metrics: Percent Uptime
Reliability Gotchas
• One 2-hour outage in 1 year
  • Requires about 23 years of otherwise 100% uptime to average .99999
• 99% Availability (88 hours/year of downtime)
  • One 3+ day outage
  • One ~7 hour outage every month
  • One ~1½ hour outage every week
• Reliability isn’t the whole story
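
The figures on this slide follow from simple arithmetic on the hours in a year. The sketch below reproduces them, assuming a 365-day year; the function names are illustrative and not from the original slides.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

def downtime_hours_per_year(availability: float) -> float:
    """Downtime budget per year for a given availability fraction."""
    return (1.0 - availability) * HOURS_PER_YEAR

def years_to_absorb(outage_hours: float, availability: float) -> float:
    """Total years of service over which one outage averages out to the target."""
    return outage_hours / downtime_hours_per_year(availability)

five_nines_minutes = downtime_hours_per_year(0.99999) * 60  # roughly 5.3 minutes/year
two_nines_hours = downtime_hours_per_year(0.99)             # roughly 87.6 hours/year
years_for_two_hour_outage = years_to_absorb(2, 0.99999)     # roughly 22.8 years

# The same 99% budget spread over different outage patterns.
per_month_hours = two_nines_hours / 12  # about 7.3 hours every month
per_week_hours = two_nines_hours / 52   # about 1.7 hours every week
```

At five nines the yearly budget is only a few minutes, which is why a single 2-hour outage takes decades of clean operation to average away.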
The Weakest Link
• No system can be more reliable than its least reliable component
• System reliability is the product of the component reliabilities (when every component must work)
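
A minimal sketch of the product rule, assuming the components are in series (all must be up) and fail independently. The component names and availability values are hypothetical, chosen only for illustration.

```python
from math import prod

def series_availability(components) -> float:
    """Availability of a chain in which every component must be up."""
    return prod(components)

# Hypothetical chain of four components, each at roughly three nines.
chain = {
    "network": 0.999,
    "server": 0.9995,
    "database": 0.999,
    "application": 0.998,
}

system = series_availability(chain.values())
weakest = min(chain.values())
```

Here `system` comes out near 0.9955, lower than the weakest single component (0.998): every extra link in the chain costs availability.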
Beyond Uptime
• Scheduled Uptime
  • How much can you afford to be down?
  • = How much do you need to plan to be up?
  • 24x7, 24x6.75, 18x7, etc.
• RTO (Recovery Time Objective)
  • How long before the system is back?
  • How long can you afford to be without it?
• RPO (Recovery Point Objective)
  • How much lost work?
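
One way to make these three terms concrete is sketched below. It reads "24x6.75" as hours per day times days per week, and it assumes periodic backups are the only protection, so the worst-case lost work equals the backup interval. All function names and sample numbers are hypothetical.

```python
def scheduled_hours_per_week(hours_per_day: float, days_per_week: float) -> float:
    """Hours per week the service is planned to be up, e.g. 24x6.75."""
    return hours_per_day * days_per_week

def scheduled_availability(scheduled_hours: float, unplanned_down_hours: float) -> float:
    """Fraction of the scheduled time the service was actually up."""
    return (scheduled_hours - unplanned_down_hours) / scheduled_hours

def meets_objectives(recovery_hours: float, rto_hours: float,
                     backup_interval_hours: float, rpo_hours: float) -> bool:
    """Assumption: worst-case lost work equals the interval between backups."""
    return recovery_hours <= rto_hours and backup_interval_hours <= rpo_hours

windows = {
    "24x7": scheduled_hours_per_week(24, 7),
    "24x6.75": scheduled_hours_per_week(24, 6.75),
    "18x7": scheduled_hours_per_week(18, 7),
}

# Hypothetical week: 1.5 unplanned hours down against an 18x7 schedule.
week_availability = scheduled_availability(windows["18x7"], 1.5)

# Hypothetical service: 3 h restore vs. a 4 h RTO, nightly backup vs. an 8 h RPO.
ok = meets_objectives(3, 4, 24, 8)
```

In this example the restore time fits the RTO, but a nightly backup cannot meet an 8-hour RPO, so `ok` is `False`. Uptime alone would not have revealed that gap.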
Example Service Levels
How’re We Doin’?
• Gartner CIO Poll
  • How would you rank your most critical applications in unplanned downtime in the past year?
How’re We Doin’? (cont.)
• How would you rank your most-critical application in planned downtime during the past year?
Getting to .99999
• Enhanced Availability
  • Redundancy
  • RAID
• High Availability
  • Clustering
  • Remote mirroring
• Fault-Tolerant
  • All resources (including application) replicated
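
The reason redundancy adds nines can be sketched with the standard formula for parallel components: the group is down only when every copy is down. This assumes independent failures and instantaneous, perfect failover, which real clusters and mirrors only approximate. The sample values are hypothetical.

```python
def redundant_availability(availability: float, copies: int) -> float:
    """Availability of `copies` identical components when any one suffices.

    Assumes failures are independent and failover is perfect.
    """
    return 1.0 - (1.0 - availability) ** copies

single = 0.99                                # one two-nines component
mirrored = redundant_availability(0.99, 2)   # about 0.9999
triple = redundant_availability(0.99, 3)     # about 0.999999
```

Under these assumptions each extra copy of a two-nines component adds two more nines. Shared power, shared software bugs, or failover delays reduce the real gain.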
Five Nines
• It’s hard, it’s expensive.
• Match the reliability to the service.
• Improve the component with the fewest nines.
• Find the cheapest nines in the chain.
• Review assumptions.
• Practice³!!
• Moore’s Law is your friend.
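
"Improve the component with the fewest nines" can be illustrated by counting nines as -log10(1 - availability) and picking the minimum. The sketch assumes a series chain with independent failures; the component names and values are hypothetical.

```python
from math import log10, prod

def nines(availability: float) -> float:
    """Number of nines, e.g. 0.999 gives about 3."""
    return -log10(1.0 - availability)

# Hypothetical chain: every component must be up for the service to be up.
chain = {"storage": 0.9999, "network": 0.999, "application": 0.99}

weakest = min(chain, key=chain.get)      # component with the fewest nines
before = prod(chain.values())

# Raise only the weakest component by one nine and recompute the chain.
improved = dict(chain, **{weakest: 0.999})
after = prod(improved.values())
```

In this example, one extra nine on the application lifts the chain from about 0.989 to about 0.998. Adding a nine to the already strong storage layer would barely move the result.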
Resources
• CIO Update: Poll Shows Application Availability Levels Have Increased, D. Scott, Gartner Article G00120892, 12 May 2004.
• Real-Time Enterprise: Business Continuity and Availability, D. Scott, J. Krischer, Gartner Research Note SPA-18-1683, 24 September 2002.
• Performance Tuning Active Call Center for Enterprise Applications, Sunny Beach Technology, Inc. White Paper, 7 January 2001, http://www.sunny-beach.net.