
SPANStore: Cost-Effective Geo-Replicated Storage Spanning Multiple Cloud Services

Zhe Wu, Michael Butkiewicz, Dorian Perkins, Ethan Katz-Bassett, Harsha V. Madhyastha. UC Riverside and USC.


Presentation Transcript


  1. SPANStore: Cost-Effective Geo-Replicated Storage Spanning Multiple Cloud Services Zhe Wu, Michael Butkiewicz, Dorian Perkins, Ethan Katz-Bassett, Harsha V. Madhyastha UC Riverside and USC

  2. Geo-distributed Services for Low Latency

  3. Cloud Services Simplify Geo-distribution

  4. Need for Geo-Replication • Data uploaded by a user may be viewed/edited by users in other locations • Social networking (Facebook, Twitter) • File sharing (Dropbox, Google Docs) → Geo-replication of data is necessary • Isolated storage service in each cloud data center → Application needs to handle replication itself

  5. Geo-replication on Cloud Services • Lots of recent work on enabling geo-replication • Walter (SOSP’11), COPS (SOSP’11), Spanner (OSDI’12), Gemini (OSDI’12), Eiger (NSDI’13)… • Faster performance or stronger consistency • Added consideration on cloud services: minimizing cost

  6. Outline • Problem and motivation • SPANStore overview • Techniques for reducing cost • Evaluation

  7. SPANStore • Key value store (GET/PUT interface) spanning cloud storage services • Main objective: minimize cost • Satisfy application requirements • Latency SLOs • Consistency (Eventual vs. sequential consistency) • Fault-tolerance

  8. SPANStore Overview [Figure: application (App) using the SPANStore library across data centers A, B, C and D. Labels: request; Metadata lookups; Read/write data based on optimal replication policy; Return data/ACK]

  9. SPANStore Overview [Figure: Placement Manager and SPANStore instances with Apps in data centers A, B, C and D. Labels: Application Input (latency, consistency and fault tolerance requirements); SPANStore Characterization (inter-DC latencies, pricing policies); workload; Replication policy]

  10. Outline • Problem and motivation • SPANStore overview • Techniques for reducing cost • Evaluation

  11. Questions to be addressed for every object: • Where to store replicas • How to execute PUTs and GETs

  12. Cloud Storage Service Cost • Storage service cost = Storage cost (the amount of data stored) + Request cost (the number of PUT and GET requests issued) + Data transfer cost (the amount of data transferred out of data center)
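The three-part cost on this slide can be written out as a small function. This is only a sketch: the `Pricing` class, its field names, and every price below are hypothetical placeholders, not figures from the talk or from any provider's price list.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Hypothetical price sheet for one data center (all values made up)."""
    storage_per_gb: float   # $ per GB stored for the billing period
    put_per_request: float  # $ per PUT request
    get_per_request: float  # $ per GET request
    egress_per_gb: float    # $ per GB transferred out of the data center


def storage_service_cost(p, stored_gb, num_puts, num_gets, egress_gb):
    """Storage cost + request cost + data transfer cost, as on the slide."""
    storage = p.storage_per_gb * stored_gb
    requests = p.put_per_request * num_puts + p.get_per_request * num_gets
    transfer = p.egress_per_gb * egress_gb
    return storage + requests + transfer


# Illustrative numbers only.
prices = Pricing(storage_per_gb=0.10, put_per_request=0.00001,
                 get_per_request=0.000001, egress_per_gb=0.12)
total = storage_service_cost(prices, stored_gb=10, num_puts=1000,
                             num_gets=10000, egress_gb=5)
print(f"total cost: ${total:.2f}")
```

With these made-up inputs the data transfer term dominates, which is the kind of imbalance the later techniques try to exploit.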

  13. Low Latency SLO Requires High Replication in Single Cloud Deployment [Figure: map of AWS regions with replicas (R) placed to meet latency bound = 100 ms]

  14. Technique 1: Harness Multiple Clouds [Figure: map with replicas (R); latency bound = 100 ms; legend: AWS regions]
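One way to see why more candidate data centers can help under a latency bound is to search for the smallest replica set that keeps every client within the bound. The sketch below uses exhaustive search over subsets, which is an illustrative stand-in and not SPANStore's actual placement logic; the data center names and latencies are invented.

```python
from itertools import combinations


def min_replicas(latency_ms, clients, candidates, bound_ms):
    """Smallest set of candidate data centers such that every client has
    at least one replica within bound_ms. Returns None if impossible."""
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            if all(any(latency_ms[c][r] <= bound_ms for r in subset)
                   for c in clients):
                return set(subset)
    return None


# Hypothetical latencies (ms) from three client locations to data centers.
# X1..X3 belong to one cloud; Y1 belongs to a second cloud.
latency_ms = {
    "A": {"X1": 50, "X2": 150, "X3": 200, "Y1": 80},
    "B": {"X1": 150, "X2": 40, "X3": 180, "Y1": 90},
    "C": {"X1": 200, "X2": 180, "X3": 60, "Y1": 95},
}
clients = ["A", "B", "C"]

single_cloud = min_replicas(latency_ms, clients, ["X1", "X2", "X3"], 100)
multi_cloud = min_replicas(latency_ms, clients, ["X1", "X2", "X3", "Y1"], 100)
print(single_cloud, multi_cloud)
```

In this toy input the single cloud needs a replica in all three of its data centers, while adding the second cloud's data center lets one replica satisfy the 100 ms bound.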

  15. Price Discrepancies across Clouds • Leveraging discrepancies judiciously can reduce cost

  16. Range of Candidate Replication Policies • Strategy 1: single replica in cheapest storage cloud • High latencies

  17. Range of Candidate Replication Policies • Strategy 2: few replicas to reduce latencies • High data transfer cost

  18. Range of Candidate Replication Policies • Strategy 3: replicated everywhere • High storage cost • High latencies & cost of PUTs • Optimal replication policy depends on: 1. application requirements 2. workload properties

  19. High Variability of Individual Objects • Analyze predictability of Twitter workload • Estimate workload based on same hour in previous week • Error can be as high as 1000% • 60% of hours have error higher than 50% • 20% of hours have error higher than 100%

  20. Technique 2: Aggregate Workload Prediction per Access Set • Observation: stability in aggregate workload • Diurnal and weekly patterns • Classify objects by access set: • Set of data centers from which object is accessed • Leverage application knowledge of sharing pattern • Dropbox/Google Docs know users that share a file • Facebook controls every user’s news feed

  21. Technique 2: Aggregate Workload Prediction per Access Set • Estimate workload based on same hour in previous week • Aggregate workload is more stable and predictable
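The idea of grouping objects by access set and predicting each hour from the same hour one week earlier can be sketched as follows. The data layout (dicts keyed by object name and hour index), the function names, and the request counts are all assumptions made for illustration.

```python
from collections import defaultdict

HOURS_PER_WEEK = 24 * 7


def aggregate_by_access_set(access_sets, hourly_counts):
    """Sum per-object hourly request counts over objects that share the
    same access set (the set of data centers that access the object)."""
    totals = defaultdict(lambda: defaultdict(int))
    for obj, counts in hourly_counts.items():
        key = frozenset(access_sets[obj])
        for hour, n in counts.items():
            totals[key][hour] += n
    return totals


def predict(series, hour):
    """Estimate the workload at `hour` as the value one week earlier."""
    return series.get(hour - HOURS_PER_WEEK, 0)


def relative_error(predicted, actual):
    if actual == 0:
        return 0.0 if predicted == 0 else float("inf")
    return abs(predicted - actual) / actual


# Made-up counts: two objects accessed from the same pair of data centers.
access_sets = {"obj1": {"us-east", "eu-west"}, "obj2": {"eu-west", "us-east"}}
hourly_counts = {"obj1": {0: 10, 168: 1}, "obj2": {0: 0, 168: 9}}

per_object_error = relative_error(predict(hourly_counts["obj1"], 168),
                                  hourly_counts["obj1"][168])
aggregate = aggregate_by_access_set(access_sets, hourly_counts)[
    frozenset({"us-east", "eu-west"})]
aggregate_error = relative_error(predict(aggregate, 168), aggregate[168])
print(per_object_error, aggregate_error)
```

In this contrived input the per-object estimate is off by 900% while the aggregate estimate is exact, mirroring the slide's point that individual objects vary far more than the aggregate.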

  22. Optimizing Cost for GETs and PUTs • Use cheap (request + data transfer) data centers [Figure: GET served from one of several replicas (R)]
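A minimal sketch of serving a GET from a cheap data center: among replicas that meet the latency SLO for a client, pick the one with the lowest request plus data transfer cost. The function name, parameters, and prices are hypothetical.

```python
def pick_get_replica(client, replicas, latency_ms, get_price, egress_price,
                     size_gb, slo_ms):
    """Return the cheapest replica (request + data transfer cost) that is
    within slo_ms of the client, or None if no replica meets the SLO."""
    eligible = [r for r in replicas if latency_ms[client][r] <= slo_ms]
    if not eligible:
        return None
    return min(eligible,
               key=lambda r: get_price[r] + egress_price[r] * size_gb)


# Made-up latencies (ms) and prices ($ per request, $ per GB).
latency_ms = {"C": {"R1": 20, "R2": 80, "R3": 150}}
get_price = {"R1": 0.000004, "R2": 0.000004, "R3": 0.000001}
egress_price = {"R1": 0.20, "R2": 0.12, "R3": 0.09}

choice = pick_get_replica("C", ["R1", "R2", "R3"], latency_ms,
                          get_price, egress_price, size_gb=0.001, slo_ms=100)
print(choice)
```

Here R3 is the cheapest overall but violates the 100 ms SLO, so the slower-but-cheaper R2 is preferred over the nearest replica R1.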

  23. Technique 3: Relay Propagation • Asynchronous propagation (no latency constraint) [Figure: PUT propagated to replicas (R); data transfer prices shown: $0.20/GB, $0.12/GB, $0.19/GB, $0.19/GB, $0.25/GB]

  24. Technique 3: Relay Propagation • Asynchronous propagation (no latency constraint) • Synchronous propagation (bounded by latency SLO) [Figure: PUT propagated to replicas (R); data transfer prices shown: $0.20/GB, $0.12/GB, $0.19/GB, $0.19/GB, $0.25/GB; one relay path marked "Violate SLO"]
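Relay propagation can be sketched as a cost comparison: either the source data center sends the update to every replica directly, or it sends one copy to a relay replica with cheaper outbound transfer, which forwards it to the rest. The sketch below is an assumption-laden illustration: the topology, the latencies, and the mapping of the slide's per-GB prices to data centers are invented, and the direct path is assumed to meet the SLO.

```python
def propagation_cost(src, replicas, egress, size_gb, relay=None):
    """Data transfer cost of propagating one update of size_gb."""
    if relay is None:
        # Source sends one copy to every replica.
        return egress[src] * size_gb * len(replicas)
    others = [r for r in replicas if r != relay]
    # Source sends one copy to the relay; the relay forwards to the rest.
    return egress[src] * size_gb + egress[relay] * size_gb * len(others)


def best_plan(src, replicas, egress, size_gb, latency_ms=None, slo_ms=None):
    """Return (cost, relay); relay is None for direct propagation.
    With slo_ms set (synchronous case), a relay is allowed only if every
    replica is reached within the SLO via that relay."""
    best = (propagation_cost(src, replicas, egress, size_gb), None)
    for relay in replicas:
        if slo_ms is not None:
            hops = [latency_ms[src][relay]] + [
                latency_ms[src][relay] + latency_ms[relay][r]
                for r in replicas if r != relay]
            if max(hops) > slo_ms:
                continue
        cost = propagation_cost(src, replicas, egress, size_gb, relay)
        if cost < best[0]:
            best = (cost, relay)
    return best


egress = {"src": 0.25, "R1": 0.12, "R2": 0.19, "R3": 0.20}  # $ per GB
replicas = ["R1", "R2", "R3"]
latency_ms = {
    "src": {"R1": 80, "R2": 30, "R3": 40},
    "R1": {"R2": 90, "R3": 90},
    "R2": {"R1": 50, "R3": 60},
    "R3": {"R1": 90, "R2": 70},
}

async_plan = best_plan("src", replicas, egress, size_gb=1.0)
sync_plan = best_plan("src", replicas, egress, size_gb=1.0,
                      latency_ms=latency_ms, slo_ms=100)
print(async_plan, sync_plan)
```

Without a latency constraint the cheapest relay (R1) wins; under a 100 ms SLO the path through R1 is too slow, so a more expensive relay (R2) that still beats direct propagation is chosen.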

  25. Summary • Insights to reduce cost • Multi-cloud deployment • Use aggregate workload per access set • Relay propagation • Placement manager uses ILP to combine insights • Other techniques • Metadata management • Two-phase locking protocol • Asymmetric quorum set

  26. Outline • Problem and motivation • SPANStore overview • Techniques for reducing cost • Evaluation

  27. Evaluation • Scenario • Application is deployed on EC2 • SPANStore is deployed across S3, Azure and GCS • Simulations to evaluate cost savings • Deployment to verify application requirements • Retwis • ShareJS

  28. Simulation Settings • Compare SPANStore against • Replicate everywhere • Single replica • Single cloud deployment • Application requirements • Sequential consistency • PUT SLO: minimum SLO that the replicate-everywhere policy satisfies • GET SLO: minimum SLO that the single-replica policy satisfies

  29. SPANStore Enables Cost Savings across Disparate Workloads • Workloads: #1: big objects, more GETs (lots of data transfers from replicas); #2: big objects, more PUTs (lots of data transfers to replicas); #3: small objects, more GETs (lots of GET requests); #4: small objects, more PUTs (lots of PUT requests) • Sources of savings shown: reducing data transfer; relay propagation; price discrepancy of GET requests; price discrepancy of PUT requests

  30. Deployment Settings • Retwis • Scale down Twitter workload • GET: read timeline • PUT: make post • Insert: read follower’s timeline and append post to it • Requirements: • Eventual consistency • 90%ile PUT/GET SLO = 100ms
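Checking a "90%ile PUT/GET SLO = 100ms" requirement against measured latencies can be sketched with a nearest-rank percentile. The nearest-rank definition is a choice made for this example (the talk does not say how percentiles were computed), and the latency samples are invented.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_slo(samples, slo_ms=100, pct=90):
    """True if the pct-th percentile latency is within the SLO."""
    return percentile(samples, pct) <= slo_ms


# Made-up request latencies in milliseconds.
latencies_ms = [20, 30, 40, 50, 60, 70, 80, 90, 95, 400]
print(percentile(latencies_ms, 90), meets_slo(latencies_ms))
```

The single 400 ms outlier does not break the SLO here because only the 90th percentile is constrained, not the maximum.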

  31. SPANStore Meets SLOs [Figure: measured latencies against the SLOs; labels: SLO, Insert SLO, 90%ile]

  32. Conclusions • SPANStore • Minimize cost while satisfying latency, consistency and fault-tolerance requirements • Use multiple cloud providers for greater data center density and pricing discrepancies • Judiciously determine replication policy based on workload properties and application needs
