StarFish: highly-available block storage
李益昌 B00902051, 何柏勳 B00902097 (Dept. of Computer Science, 3rd year)
Introduction
• Data protection: disk failure vs. catastrophic site failure
• Low price of disk drives and high-speed networking infrastructure
• StarFish
• Survives catastrophic site failure
• Uses IP networks: (1) geographically dispersed, (2) inexpensive
• Good performance
• Operates at the block level
Architecture
• One Host Element (HE): provides storage virtualization and a read cache
• N Storage Elements (SEs)
• Q: write quorum size
• Updates go synchronously to a quorum of Q SEs and asynchronously to the rest
• Components communicate by TCP/IP over a high-speed network
Architecture
• Recommended configuration: N = 3, Q = 2
Architecture
• Another configuration
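The synchronous-quorum write path described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `write_block`, `FakeSE`, and the `store` interface are hypothetical names; the point is that the client is acknowledged as soon as Q of the N SEs confirm the write, while the remaining SEs finish asynchronously.

```python
import concurrent.futures

class FakeSE:
    """Toy in-memory Storage Element, for illustration only."""
    def __init__(self):
        self.blocks = {}
    def store(self, block_id, data):
        self.blocks[block_id] = data
        return True  # acknowledge the write

def write_block(ses, q, block_id, data):
    """Send the block to every SE; return once q of them acknowledge."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(ses))
    futures = [pool.submit(se.store, block_id, data) for se in ses]
    acks, ok = 0, False
    for fut in concurrent.futures.as_completed(futures):
        if fut.result():
            acks += 1
            if acks >= q:      # quorum reached: safe to acknowledge the client
                ok = True
                break
    pool.shutdown(wait=False)  # remaining SEs complete asynchronously
    return ok
```

With N = 3 and Q = 2, a write returns after two acknowledgements even if the third SE (say, a distant replica) is slow, which is what lets StarFish place an SE far away without paying its full round-trip latency on every write.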
Data consistency and SE recovery
• Log: each update carries a sequence number; log metadata is kept in NVRAM
• Data consistency is preserved across failures
• Failure modes: a RAID or a network connection fails
• SE recovery modes: quick recovery, replay recovery, full recovery
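The distinction between replay and full recovery can be illustrated with sequence numbers. The sketch below is an assumption-laden toy, not the paper's code: a recovering SE replays only the log entries newer than its last applied sequence number, and falls back to copying the full image when the log no longer reaches back far enough.

```python
def recover(se_state, log, full_image):
    """Bring a stale SE up to date.

    se_state:   dict with 'last_seq' and 'blocks' (the SE's local state)
    log:        list of (seq, block_id, data), oldest first
    full_image: up-to-date dict with 'last_seq' and 'blocks'
    Returns which recovery mode was used: 'replay' or 'full'.
    """
    oldest = log[0][0] if log else None
    if oldest is not None and se_state["last_seq"] + 1 >= oldest:
        # Replay recovery: apply only the missed updates, in order.
        for seq, block_id, data in log:
            if seq > se_state["last_seq"]:
                se_state["blocks"][block_id] = data
                se_state["last_seq"] = seq
        return "replay"
    # Full recovery: the log was truncated past this SE's position,
    # so the entire data image must be copied.
    se_state["blocks"] = dict(full_image["blocks"])
    se_state["last_seq"] = full_image["last_seq"]
    return "full"
```

This matches the observation later in the slides that full recovery is the most expensive mode: it transfers the whole image rather than just the missed tail of the log.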
Availability and reliability analysis
• Parameters
• SE failure rate: λ
• SE recovery rate: μ
• Number of SEs: N
• Quorum size: Q
• Model
• SE failures are i.i.d. Poisson processes with mean rate λ
• SE recoveries are i.i.d. Poisson processes with mean rate μ
• HE failures form a Poisson process with mean rate λ_H
• HE recoveries form a Poisson process with mean rate μ_H
Availability
• An HE or SE is available if it can serve data
• Availability of StarFish, A(Q, N): the steady-state probability that at least Q SEs are available
• ρ = λ/μ is called the load
• Repairman model
Availability (cont.)
• SE availability = 1 − λ/(λ+μ) = 1/(1+ρ)
• X9: the number of 9s in an availability measure (e.g., 0.999 has three 9s)
• For fixed N, availability decreases as Q grows
• Trade off availability for reliability
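Under the slides' i.i.d. model, if each SE is independently available with probability a, then A(Q, N) is simply a binomial tail: the probability that at least Q of the N SEs are up. A short sketch (the function name is my own):

```python
from math import comb

def quorum_availability(a, n, q):
    """P(at least q of n independent SEs, each available w.p. a, are up)."""
    return sum(comb(n, k) * a**k * (1 - a) ** (n - k) for k in range(q, n + 1))

# Recommended configuration N=3, Q=2, with 99%-available SEs:
#   quorum_availability(0.99, 3, 2) = 3*0.99^2*0.01 + 0.99^3 = 0.999702
# i.e., about three 9s for full read-write availability.
```

This also makes the trade-off on this slide concrete: raising Q to 3 drops availability to 0.99^3 ≈ 0.9703, while lowering it to 1 raises availability at the cost of reliability.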
Reliability
• Probability of data loss: the HE and Q SEs fail
• Reliability increases with larger Q
• Two approaches
• Require Q > floor(N/2) and at least Q SEs available (reduces availability)
• Read-only consistency
Read-only consistency
• The system remains available in read-only mode during failures
• Read-only mode obviates the need for Q SEs to be available, since no updates are handled
• Increases availability
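Read-only consistency changes the availability calculation: reads can be served as long as at least one SE is up, so the binomial quorum tail is replaced by the complement of all SEs being down. A sketch under the same independence assumption (function name is mine):

```python
def read_only_availability(a, n):
    """P(at least one of n independent SEs, each available w.p. a, is up)."""
    return 1 - (1 - a) ** n

# With N=3 SEs at 99% individual availability:
#   read_only_availability(0.99, 3) = 1 - 0.01^3 = 0.999999
# i.e., six 9s, versus about three 9s for the Q=2 read-write quorum.
```

This is where the conclusion's "better than 99.9999% availability" figure comes from: with read-only consistency, the system becomes unavailable only when all three SEs are down simultaneously.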
Setting
• Gigabit Ethernet (GbE), with dummynet controlling delays and bandwidth limits to model Internet links
• Network delays tested: 1, 2, 4, 8, 23, 36, 65 ms
• Bandwidth limits tested: 31, 51, 62, 93, 124 Mb/s
• Benchmarks: micro-benchmark, PostMark
Effects of network delays and HE cache size
• A larger cache improves performance
• A larger cache does not change the response time of write requests
Observation
• Performance is affected by two parameters: the write quorum size Q and the delay to the SEs
• StarFish performs adequately when one of the SEs is placed in a remote location
• It retains at least 85% of the performance of a direct-attached RAID
Recovery
• Performance degrades more during full recovery
Conclusion
• The StarFish system shows significant benefit from keeping a third copy of the data at an intermediate distance
• A StarFish system with 3 replicas, a write quorum size of 2, and read-only consistency yields better than 99.9999% availability, assuming individual Storage Element availability of 99%