StarFish: highly-available block storage Eran Gabber Jeff Fellin Michael Flaster Fengrui Gu Bruce Hillyer Wee Teck Ng Banu Özden Elizabeth Shriver 2003 USENIX Annual Technical Conference Presenter: D00922019 林敬棋
Introduction
• Important data needs to be protected by making replicas.
• Replication on remote sites
  • Reduces the amount of data lost in a failure.
  • Decreases the time required to recover from a catastrophic site failure.
StarFish
• A highly-available, geographically-dispersed block storage system.
• Does not require expensive dedicated communication lines to all replicas to achieve high availability.
• Achieves good performance even during recovery from a replica failure.
• Single-owner access semantics.
Architecture
• StarFish consists of
  • One Host Element (HE)
    • Provides storage virtualization and a read cache.
  • N Storage Elements (SEs)
• Q: write quorum size.
• Updates go synchronously to a quorum of Q SEs and asynchronously to the rest, as sketched below.
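A minimal sketch of this write path, assuming a hypothetical SE proxy object with a blocking write(block_id, data) call; it illustrates the quorum idea only and is not the paper's implementation:

    # Sketch of StarFish-style quorum writes (illustrative, not the paper's code).
    # Assumes each SE proxy exposes a hypothetical write(block_id, data) call
    # that returns once the SE has committed the update.
    import threading

    class HostElement:
        def __init__(self, storage_elements, quorum_size):
            self.ses = storage_elements   # list of SE proxies (hypothetical API)
            self.q = quorum_size          # Q: writes are acknowledged after Q commits

        def write(self, block_id, data):
            # Synchronously update the first Q SEs (e.g. the closest ones) ...
            for se in self.ses[:self.q]:
                se.write(block_id, data)
            # ... and propagate to the remaining SEs asynchronously.
            for se in self.ses[self.q:]:
                threading.Thread(target=se.write, args=(block_id, data)).start()
            return "ack"                  # the host sees the write complete after Q commits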
Recommended Setup: N = 3, Q = 2. MAN: Metropolitan Area Network; WAN: Wide Area Network.
SE Recovery
• Write log
  • The HE keeps a circular buffer of recent writes.
  • Each SE maintains a circular buffer of recent writes on a log disk (see the sketch below).
• Three types of recovery
  • Quick recovery
  • Replay recovery
  • Full recovery
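A small sketch of such a circular write log, assuming sequence-numbered write records; the record layout and the replay_since helper are illustrative assumptions, not the paper's on-disk format:

    # Circular write log sketch (assumed record format: (seq_no, block_id, data)).
    from collections import deque

    class WriteLog:
        def __init__(self, capacity):
            self.entries = deque(maxlen=capacity)   # oldest entries are overwritten

        def append(self, seq_no, block_id, data):
            self.entries.append((seq_no, block_id, data))

        def replay_since(self, last_seq_no):
            # Entries a recovering SE missed; if the gap exceeds what the log
            # still holds, the SE must fall back to a full copy (full recovery).
            missed = [e for e in self.entries if e[0] > last_seq_no]
            if missed and missed[0][0] != last_seq_no + 1:
                return None                         # log wrapped: full recovery needed
            return missed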
Availability and Reliability
• Assume that the failure and recovery processes of the network links and SEs are i.i.d. Poisson processes with combined mean failure and recovery rates of λ and μ per second.
• Similarly, the HE fails and recovers as Poisson processes with rates λhe and μhe.
Availability • The steady-state probability that at least Q SEs are available. • Derived from the standard machine repairman model.
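A simplified form of this probability, under the extra assumption that SEs fail and recover independently (the paper's machine repairman derivation may differ in detail): each SE is up with steady-state probability a = μ/(λ+μ), so

    \[
      A(N, Q) \;=\; \sum_{k=Q}^{N} \binom{N}{k}\, a^{k} (1 - a)^{N-k},
      \qquad a = \frac{\mu}{\lambda + \mu}
    \]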
Availability (cont.)
• X★9: the number of 9s in an availability measure.
• A much higher availability is achieved when N = 2Q + 1.
• For fixed N, availability decreases with a larger quorum size.
• Increasing the quorum size trades off availability for reliability.
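A common way to count the nines of an availability value A, which is presumably what X★9 denotes (an assumption, since the slide does not give the formula):

    \[
      X\!\star\!9(A) \;=\; \bigl\lfloor -\log_{10}(1 - A) \bigr\rfloor,
      \qquad \text{e.g. } A = 0.9999 \;\Rightarrow\; 4\ \text{nines}
    \]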
Reliability
• The probability of no data loss.
• Reliability increases with larger Q.
• Two approaches
  • Require Q > floor(N/2) and that at least Q SEs are available (see the quorum-overlap argument below).
    • Reduces availability and performance.
  • Read-only consistency
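The standard quorum-overlap argument behind the Q > floor(N/2) condition, stated here for intuition (the paper's reliability analysis is more detailed): any two write quorums of size Q must share at least one SE, so every acknowledged write survives on some SE.

    \[
      Q > \lfloor N/2 \rfloor \;\Rightarrow\; 2Q > N
      \;\Rightarrow\; |Q_1 \cap Q_2| \;\ge\; 2Q - N \;\ge\; 1
    \]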
Read-only Consistency
• The system remains available in read-only mode during a failure.
• Read-only mode obviates the need for Q SEs to be available to handle updates.
• Increases availability.
Observations
• If ρhe = 0, availability is independent of Q.
  • The system can always recover from the HE.
• If ρhe increases, availability increases with Q.
  • The largest increase occurs from Q = 1 to Q = 2, and is bounded by 3/16 when ρ = 1.
  • Diminishing gains after Q = 2.
• This suggests Q = 2 in practical systems.
Performance Measurements • Compares with a direct-attached RAID unit.
Settings
• Different network delays
  • 1, 2, 4, 8, 23, 36, 65 ms
• Different bandwidth limitations
  • 31, 51, 62, 93, 124 Mb/s
• Benchmarks:
  • Micro-benchmarks
    • Read hit
    • Read miss
    • Write
  • PostMark
Effects of network delays and HE cache size • Near SE delay: 4 ms; far SE delay: 8 ms • No cache misses if the HE cache size is 400 MB
Observation
• A large HE cache improves performance.
  • The HE can respond to more read requests without communicating with an SE.
  • It does not change write requests.
  • It is especially beneficial when the local SE has significant delays.
• With Q = 2 and a 400 MB cache, performance is not influenced by the delay to the local SE.
  • It depends on the near SE instead.
Normal Operation and placement of the far SE • Delay configurations: 1-8 = 1, 2, 4, 8 ms; 4-12 = 4, 8, 12 ms; 23-65 = 23, 36, 65 ms • Bandwidth configuration: 31-124 = 31, 51, 62, 93, 124 Mbps • Local SE delay: 0 ms; N = 3
Normal Operation and placement of the far SE (cont.) • N = 3, 8 threads
Observation
• Performance is influenced mostly by two parameters:
  • the write quorum size
  • the delay to the SE
• StarFish can provide adequate performance when one of the SEs is placed in a remote location.
  • At least 85% of the performance of a direct-attached RAID.
Recovery • Performance degrades more during full recovery.
Conclusion
• The StarFish system reveals significant benefits from a third copy of the data at an intermediate distance.
• A StarFish system with 3 replicas, a write quorum size of 2, and read-only consistency yields better than 99.9999% availability, assuming an individual Storage Element availability of 99% (a back-of-the-envelope check follows below).
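A back-of-the-envelope check of that figure, assuming independent SE failures and ignoring HE and network outages: with read-only consistency the data remains readable as long as at least one of the three SEs is up, so

    \[
      A \;\approx\; 1 - (1 - 0.99)^{3} \;=\; 1 - 10^{-6} \;=\; 0.999999
    \]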