Availability in Globally Distributed Storage Systems Derek Weitzel
Failures in the System • Two major components in a Node Applications System
Failures in the System • Similar systems at Nebraska • Application: Bigtable (Google), Cluster Scheduler (Nebraska) • File Systems: GFS (Google), Hadoop (Nebraska) • System: Hard Drive (both) • Failure will cause unavailability • Could cause data loss
Unavailability: Defined • Data on a node is unreachable • Detection: • Periodic heartbeats are missing • Correction: • Lasts until node comes back • System recreates the data
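The detection rule above (missing periodic heartbeats) can be sketched as follows. This is a minimal illustration, not the monitoring code of any system discussed here: the class name `HeartbeatMonitor`, the `timeout_s` parameter, and the timestamps in the usage note are all hypothetical.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat per node and reports nodes that went quiet.

    Hypothetical sketch: timestamps are passed in explicitly (seconds) so the
    logic is deterministic and easy to test.
    """

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}  # node name -> time of most recent heartbeat

    def record(self, node, now):
        # A heartbeat arrived from `node` at time `now`.
        self.last_seen[node] = now

    def unavailable(self, now):
        # A node counts as unavailable once its last heartbeat is older
        # than the timeout. It becomes available again as soon as a new
        # heartbeat is recorded.
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )
```

For example, with a 15-second timeout, a node last heard from at t = 0 is reported as unavailable at t = 20, while a node heard from at t = 10 is not.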
Unavailability: Measured Question: After replication starts, why does it take so long to recover? Replication Starts
Node Availability Storage Software Restart Software is fast to restart
Node Availability: Time Node updates (planned reboots) cause the most downtime. Planned Reboots
MTTF for Components • Even though disk failure can cause data loss, node failures are much more frequent • Conclusion: Node failure is more important to system availability
Correlated Failures • A large number of nodes failing in a burst can reduce the effectiveness of replication and encoding schemes • Losing nodes before replication can start can make data unavailable
Correlated Failures Rolling Reboots of cluster
Correlated Failures Oh s*!t, datacenter on fire! (maybe not that bad)
Coping with Failure Encoding Replication 27,000 Years 27.3 M Years 3 replicas is standard in large clusters
Coping with Failure Cell Replication (Datacenter Replication)
Cell Replication Cell 1 Cell 2 Block A Block A Block A Block A
Modeling Failures We’ve seen the data, now let’s model the behavior.
Modeling Failures • A chunk of data can be in one of many states. • Consider when Replication = 3 3 2 1 0 Lose a replica, but still 2 available
Modeling Failures • A chunk of data can be in one of many states. • Consider when Replication = 3 Recovery 3 2 1 0 0 replicas = service unavailable
Modeling Failures • Each loss of a replica has a probability • The recovery rate is also known Recovery 3 2 1 0 0 replicas = service unavailable
Markov Model • ρ = recovery rate • λ = failure rate • s = block replications • r = minimum replication
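A minimal sketch of how such a chain yields a mean time to failure, using the symbols ρ, λ, s, and r. It assumes each live replica fails independently at rate λ (so the state with i replicas loses one at rate i·λ) and that a missing replica is recreated at a constant rate ρ. These rate assignments are simplifying assumptions for illustration, not necessarily the exact model in the paper, and the function name `chunk_mttf` is hypothetical.

```python
def chunk_mttf(s, r, failure_rate, recovery_rate):
    """Expected time for a chunk to drop from s replicas to fewer than r.

    Birth-death chain on the number of live replicas i:
      i -> i - 1 at rate i * failure_rate   (assumed: independent failures)
      i -> i + 1 at rate recovery_rate      (assumed: constant, only for i < s)
    Rates and the returned time share the same time unit.
    """
    if not 1 <= r <= s:
        raise ValueError("need 1 <= r <= s")
    if failure_rate <= 0 or recovery_rate < 0:
        raise ValueError("failure_rate must be > 0 and recovery_rate >= 0")

    total = 0.0
    step_down = 0.0  # expected time to go from state i + 1 down to state i
    for i in range(s, r - 1, -1):
        # h_i = (1 + rho * h_{i+1}) / (i * lambda), with h_{s+1} = 0 because
        # the fully replicated state has no recovery transition.
        step_down = (1.0 + recovery_rate * step_down) / (i * failure_rate)
        total += step_down
    return total
```

As a sanity check, with s = 2, r = 1, λ = 1, and ρ = 1 the function returns 2.0 time units. With no recovery (ρ = 0) and s = 3 it reduces to 1/3 + 1/2 + 1 = 11/6. Raising ρ relative to λ makes each step toward data loss much less likely, which is why fast recovery lengthens the mean time to failure so sharply.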
Modeling Failures • Using Markov models, we can find: 402 Years Nebraska
Modeling Failures • For Multi-Cell Implementations
Paper Conclusions • Given the enormous amount of data from Google, we can say: • Failures are typically short • Node failures can happen in bursts and are not independent • In modern distributed file systems, disk failure is the same as node failure. • Built a Markov model for failures that accurately reasons about past and future availability.
My Conclusions • This paper contributed greatly by showing data from very large scale distributed file systems. • If Reed-Solomon striping is so much more efficient, why isn’t it used by Google? Hadoop? Facebook? • Complicated code? • Complicated administration?
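To make the efficiency question concrete, here is a small sketch of the storage arithmetic. It assumes a code that stores k data blocks plus m parity blocks and survives the loss of any m of them, which is the property Reed-Solomon codes provide. The function names and the (6, 3) parameters in the usage note are illustrative choices, not taken from the slides.

```python
def storage_overhead(data_blocks, parity_blocks):
    """Raw blocks stored per block of user data for a (k data, m parity) code."""
    if data_blocks < 1 or parity_blocks < 0:
        raise ValueError("need data_blocks >= 1 and parity_blocks >= 0")
    return (data_blocks + parity_blocks) / data_blocks


def replication_overhead(replicas):
    """Replication is the special case of 1 data block plus (replicas - 1) copies."""
    return storage_overhead(1, replicas - 1)


def tolerated_losses(parity_blocks):
    """An MDS code such as Reed-Solomon survives the loss of any m blocks."""
    return parity_blocks
```

Three-way replication stores 3.0 blocks per block of data and survives 2 losses, while a 6 data + 3 parity stripe stores 1.5 blocks per block of data and survives 3. The cost that this arithmetic leaves out is recovery: rebuilding one lost block of the stripe means reading 6 surviving blocks instead of copying 1.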