
J. Gray, Dependability in the Internet Era (acknowledgement: slides from J. Gray, E. Brewer)


Presentation Transcript


  1. J. Gray, Dependability in the Internet Era • (acknowledgement: slides from J. Gray, E. Brewer)

  2. The Last 10 Years: Availability Dark Ages. Ready for a Renaissance? • Things got better, then things got a lot worse! • [Chart: availability (axis ticks 9%, 99%, 99.9%, 99.99%, 99.999%) versus year (1950 to 2010) for Telephone Systems, Computer Systems, Internet, and Cell phones]

  3. DEPENDABILITY: The 3 ITIES • RELIABILITY / INTEGRITY: Does the right thing (also MTTF >> 1). • AVAILABILITY: Does it now (also 1 >> MTTR / (MTTF + MTTR)). • System Availability: if 90% of terminals are up and 99% of the DB is up, then about 89% of transactions are serviced on time. • Holistic vs. Reductionist view • [Diagram labels: Security, Integrity, Reliability, Availability]
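The 89% figure on the slide above can be reproduced with a short calculation. This is a minimal sketch, not taken from the slides. It assumes the terminals and the DB fail independently and that a transaction needs both, so the availabilities multiply. The function names are hypothetical. The `availability` helper uses the standard definition MTTF / (MTTF + MTTR), which is the complement of the MTTR / (MTTF + MTTR) ratio on the slide.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a module is up: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def series_availability(*parts: float) -> float:
    """Availability of a chain of components that must all be up.

    Assumes independent failures, so the individual availabilities multiply.
    """
    result = 1.0
    for part in parts:
        result *= part
    return result


# Slide example: 90% of terminals up, 99% of the DB up.
end_to_end = series_availability(0.90, 0.99)
print(f"{end_to_end:.3f}")  # 0.891, i.e. about 89% of transactions
```

Under this assumption the weakest component dominates: raising the DB from 99% to 99.999% would barely change the end-to-end figure while the terminals stay at 90%.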

  4. Fail-Fast is Good, Repair is Needed • Lifecycle of a module: fail-fast gives short fault latency. • High Availability is low UN-Availability: Unavailability ~ MTTR / MTTF. • Improving either MTTR or MTTF gives benefit.
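A small sketch of the approximation on the slide above, showing why either lever helps. The MTTF and MTTR values are hypothetical, chosen only for illustration, and the function names are not from the slides.

```python
def exact_unavailability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a module is down: MTTR / (MTTF + MTTR)."""
    return mttr_hours / (mttf_hours + mttr_hours)


def approx_unavailability(mttf_hours: float, mttr_hours: float) -> float:
    """Slide approximation MTTR / MTTF, reasonable when MTTF >> MTTR."""
    return mttr_hours / mttf_hours


# Hypothetical module: fails every 1000 hours, takes 1 hour to repair.
baseline = approx_unavailability(1000.0, 1.0)

# Halving the repair time and doubling the time to failure
# give the same approximate unavailability.
halved_repair = approx_unavailability(1000.0, 0.5)
doubled_mttf = approx_unavailability(2000.0, 1.0)

print(baseline, halved_repair, doubled_mttf)
```

In this sketch the two improvements are interchangeable, which is one way to read "improving either MTTR or MTTF gives benefit".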

  5. Disks (RAID): the BIG Success Story • Duplex or parity: masks faults. • Disks @ 1M hours MTTF (~100 years). • But controllers fail, and installations have 1,000s of disks. • Duplexing or parity, plus dual paths, gives “perfect disks”. • Wal-Mart never lost a byte (thousands of disks, hundreds of failures). • Only software/operations mistakes are left.
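The disk numbers on the slide above explain why redundancy is still needed despite a 100-year MTTF. The following is a rough sketch. It assumes a constant, independent failure rate per disk and a population of exactly 1,000 disks, which is a hypothetical stand-in for the slide's "1,000s of disks".

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def mttf_in_years(mttf_hours: float) -> float:
    """Convert an MTTF given in hours to years."""
    return mttf_hours / HOURS_PER_YEAR


def expected_failures_per_year(disk_count: int, mttf_hours: float) -> float:
    """Expected disk failures per year across a population.

    Assumes each disk fails independently at a constant rate of 1 / MTTF.
    """
    return disk_count * HOURS_PER_YEAR / mttf_hours


print(f"{mttf_in_years(1_000_000):.0f} years")  # 114 years, the slide's "~100 years"
print(f"{expected_failures_per_year(1000, 1_000_000):.2f} failures/year")  # 8.76
```

So a single disk almost never fails, but a site with a thousand of them should expect a failure every month or two. Duplexing or parity is what masks those failures.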

  6. Fault Tolerance vs Disaster Tolerance • Fault Tolerance: masks local faults • RAID disks • Uninterruptible Power Supplies • Cluster Failover • Disaster Tolerance: masks site failures • Protects against fire, flood, sabotage, ... • Also software changes, site moves, ... • Redundant system and service at remote site.

  7. Availability [Figure: availability levels on a scale of 9s] • Un-managed • Well-managed nodes: mask some hardware failures • Well-managed packs & clones: mask hardware failures and operations tasks (e.g. software upgrades); mask some software failures • Well-managed GeoPlex: masks site failures (power, network, fire, move, ...); masks some operations failures
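To make the scale of 9s concrete, the sketch below converts a number of nines into allowed downtime per year. The slide does not say how many nines each management level reaches, so no such mapping is assumed here. The function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525600, ignoring leap years


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of `nines` nines.

    For example, 3 nines means 99.9% availability, i.e. unavailability 10**-3.
    """
    unavailability = 10.0 ** -nines
    return MINUTES_PER_YEAR * unavailability


for nines in range(1, 6):
    minutes = downtime_minutes_per_year(nines)
    print(f"{nines} nine(s): {minutes:,.1f} minutes of downtime per year")
```

Each extra nine cuts the yearly downtime budget by a factor of ten, from roughly 36 days at one nine to about five minutes at five nines.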

  8. Case Studies - Tandem Trends • MTTF improved. • Shift from Hardware & Maintenance (from 50% to 10%) to Software (62%) & Operations (15%). • NOTE: Systematic under-reporting of Environment, Operations errors, and Application Software.

  9. Dependability Status circa 1995 • ~4-year MTTF • 5 9s for well-managed systems. Fault Tolerance Works. • Hardware is GREAT (maintenance and MTTF). • Software masks most hardware faults. • Many hidden software outages in operations: • New Software. • Utilities. • Need to make all hardware/software changes ONLINE.

  10. Progress? • [Chart series: Telephone Systems, Computer Systems, Internet, Cell phones] • MTTF improved from 1950 to 1995. • MTTR: incremental improvements (1970: failover). • Hardware and software online change (pNp) is now standard. • Then the Internet arrived: • No project can take more than 3 months. • Time to market is everything. • Change is good.

  11. The Internet Changed Expectations • 1990: Phones delivered 99.999%. ATMs delivered 99.99%. Failures were front-page news. Few hackers. Outages last an “hour”. • 2005: Cell phones deliver 90%. Web sites deliver 99%. Failures are business-page news. Many hackers. Outages last a “day”. • This is progress?

  12. 2006

  13. Eric Brewer said it best: ACID vs BASE, the internet litmus test • ACID (Atomicity, Consistency, Isolation, Durability): Strong consistency. Isolation. Focus on commit. Availability? Conservative (pessimistic). Difficult evolution (e.g. schema). Nested transactions. • BASE (Basic Availability, Soft State, Eventual Consistency): Availability FIRST. Weak consistency: stale data is OK. Approximate answers OK. Best effort. Aggressive (optimistic). Easier evolution. Simpler! Faster. • I think it is a spectrum.
