Efficient Replica Maintenance for Distributed Storage Systems
B-G Chun, F. Dabek, A. Haeberlen, E. Sit, H. Weatherspoon, M. Kaashoek, J. Kubiatowicz, and R. Morris. In Proc. of NSDI, May 2006.
Presenter: Fabián E. Bustamante, Fall 2005
Replication in Wide-Area Storage
• Applications put & get objects in/from the wide-area storage system
• Objects are replicated for
  • Availability: a get on an object will return promptly
  • Durability: objects put by the app are not lost due to disk failures
• An object may be durably stored but not immediately available
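A minimal sketch (hypothetical interface and names, not the paper's code) of the put/get model above: copies live on several nodes, a get returns any reachable copy, and an object can remain durable on disk even while no copy is currently reachable.

```python
import random

class ReplicatedStore:
    def __init__(self, nodes, replication_factor=3):
        self.nodes = nodes                    # node_id -> {key: value} stored on that node
        self.replication_factor = replication_factor
        self.placement = {}                   # key -> list of node_ids holding a copy

    def put(self, key, value):
        # Write the object to replication_factor distinct nodes.
        replicas = random.sample(list(self.nodes), self.replication_factor)
        for n in replicas:
            self.nodes[n][key] = value
        self.placement[key] = replicas

    def get(self, key, reachable):
        # Availability: return the first copy held by a currently reachable node.
        for n in self.placement.get(key, []):
            if n in reachable and key in self.nodes[n]:
                return self.nodes[n][key]
        # Durability without availability: copies may still exist on unreachable nodes.
        return None

store = ReplicatedStore({n: {} for n in range(10)})
store.put("obj1", b"data")
print(store.get("obj1", reachable=set(range(10))))
```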
Goal: durability at low bandwidth cost
• Durability is a more practical & useful goal
• Threat to durability: losing the last copy of an object
  • So, create copies faster than they are destroyed
• Challenges
  • Replication can eat your bandwidth
  • Hard to distinguish between transient & permanent failures
  • After recovery, some replicas may be on nodes the lookup algorithm does not check
• Paper presents Carbonite – an efficient wide-area replication technique for durability
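As a baseline for the bandwidth challenge above, a sketch (assumed helper names, not from the paper) of the naive policy: immediately re-replicate whenever the number of reachable copies drops below the target. Transient failures then trigger copies that turn out to be unnecessary once the "failed" node comes back, which is the traffic Carbonite tries to avoid.

```python
def naive_repair(objects, reachable_copies, target, make_copy):
    """objects: iterable of object ids.
    reachable_copies(obj): number of copies currently reachable.
    make_copy(obj): create one new replica and return the bytes transferred."""
    bytes_sent = 0
    for obj in objects:
        missing = target - reachable_copies(obj)
        for _ in range(missing):
            # Wasted work if the copy was only unreachable, not destroyed.
            bytes_sent += make_copy(obj)
    return bytes_sent
```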
System Environment
• Use PlanetLab (PL) as representative
  • >600 nodes distributed world-wide
  • History traces collected by the CoMon project (every 5 minutes)
  • Disk failures from event logs of PlanetLab Central
• Synthetic traces
  • 632 nodes, as in PL
  • Failure inter-arrival times from an exponential distribution (mean session time and downtime as in PL)
  • Two years instead of one, and avg node lifetime of 1 year
• Simulation
  • Trace-driven, event-based simulator
  • Assumptions
    • Network paths are independent
    • All nodes reachable from all other nodes
    • Each node with the same link capacity
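A sketch of how a synthetic trace like the one described above could be generated; the mean session and downtime values here are placeholders, not the paper's measured PlanetLab parameters.

```python
import random

def synthetic_trace(num_nodes=632, duration=2 * 365 * 24 * 3600,
                    mean_session=3 * 24 * 3600, mean_downtime=1 * 24 * 3600):
    """Return (node, start, end, state) intervals, alternating 'up'/'down' periods
    drawn from exponential distributions with the given means (in seconds)."""
    events = []
    for node in range(num_nodes):
        t, up = 0.0, True
        while t < duration:
            length = random.expovariate(1.0 / (mean_session if up else mean_downtime))
            events.append((node, t, min(t + length, duration), 'up' if up else 'down'))
            t += length
            up = not up
    return events
```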
Understanding durability
• To handle some avg. rate of failure – create new replicas faster than they are destroyed
  • A function of the per-node access link, number of nodes, and amount of data stored per node
• An infeasible system – one unable to keep pace w/ the avg. failure rate – will eventually adapt by discarding objects (which ones?)
• If the creation rate is just above the failure rate, a failure burst may be a problem
• Target number of replicas to maintain – rL
  • Durability does not increase continuously with rL
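A back-of-the-envelope sketch of the feasibility point above, with made-up numbers: on average, the system can keep pace only if the bytes lost to disk failures per unit time can be re-created within each node's access-link bandwidth.

```python
def feasible(data_per_node_bytes, mean_time_to_failure_s, link_bandwidth_Bps):
    # Average repair load per node (bytes/second) caused by disk failures.
    loss_rate = data_per_node_bytes / mean_time_to_failure_s
    return loss_rate <= link_bandwidth_Bps

# Example (made-up numbers): 500 GB per node, one disk failure per year,
# 1.5 Mbit/s uplink -> average loss rate ~16 kB/s, well under the link capacity.
print(feasible(500e9, 365 * 24 * 3600, 1.5e6 / 8))  # True
# But this is only the average: a burst of near-simultaneous failures can still
# outrun repair, which is why the target replication level rL matters.
```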
Improving repair time
• Scope – set of other nodes that can hold copies of the objects a node is responsible for
• Small scope
  • Easier to keep track of copies
  • Effort of creating copies falls on a small set of nodes
  • Addition of nodes may result in needless copying of objects (when combined w/ consistent hashing)
• Large scope
  • Spreads work among more nodes
  • Network traffic sources/destinations are spread
  • Temporary failures will be noticed by more nodes
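A sketch (hypothetical helper, assuming placement on a consistent-hashing ring) of how scope constrains where copies and repairs can go: with scope s, the copies of an object live only on the s nodes that follow the object's key on the ring, so a larger scope spreads repair work and traffic over more nodes.

```python
import hashlib

def ring_position(name):
    # Map a node id or object key to a position on the identifier ring.
    return int(hashlib.sha1(name.encode()).hexdigest(), 16)

def scope_nodes(obj_key, nodes, scope):
    """Return the `scope` nodes succeeding the object's ring position."""
    ordered = sorted(nodes, key=ring_position)
    positions = [ring_position(n) for n in ordered]
    key_pos = ring_position(obj_key)
    # Index of the first node at or after the key, wrapping around the ring.
    start = next((i for i, p in enumerate(positions) if p >= key_pos), 0)
    return [ordered[(start + i) % len(ordered)] for i in range(min(scope, len(ordered)))]
```

A small scope corresponds to a short successor list; setting scope to the full membership gives the large-scope extreme.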
Reducing transient costs
• Impossible to distinguish transient/permanent failures
• To minimize network traffic due to transient failures: reintegrate replicas
• Carbonite
  • Select a suitable value for rL
  • Respond to detected failures by creating new replicas
  • Reintegrate replicas
(Figure: bytes sent by different maintenance algorithms)
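A minimal sketch of the three steps listed above (simplified, with names of my own choosing rather than the paper's code): repair only when fewer than rL copies of an object are reachable, and keep counting copies on nodes that come back, so earlier "extra" repairs offset future failures instead of being thrown away.

```python
def maintain(objects, replica_sites, is_up, r_L, make_copy):
    """replica_sites: obj -> set of nodes believed to hold a copy (never shrinks
    on a transient failure); is_up(node): current reachability;
    make_copy(obj): create a copy on a new node and return that node."""
    for obj in objects:
        # Reintegration: nodes that were down stay in replica_sites and count
        # again as soon as they come back, so their copies are never forgotten.
        reachable = {n for n in replica_sites[obj] if is_up(n)}
        while len(reachable) < r_L:
            new_node = make_copy(obj)
            replica_sites[obj].add(new_node)
            reachable.add(new_node)
```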
Reducing transient costs
(Figure: bytes sent w/ and w/o reintegration)
(Figure: impact of timeouts on bandwidth and durability)
Assumptions
• The PlanetLab testbed can be seen as representative of something
• Immutable data
• Relatively stable system membership & data loss driven by disk failures
• Disk failures are uncorrelated
• Simulation
  • Network paths are independent
  • All nodes reachable from all other nodes
  • Each node with the same link capacity