
Edelweiss: Automatic Storage Reclamation for Distributed Programming

Edelweiss: Automatic Storage Reclamation for Distributed Programming. Neil Conway, Peter Alvaro, Emily Andrews, Joseph M. Hellerstein. University of California, Berkeley. Mutable shared state is a frequent source of bugs and hard to scale. Event logging instead accumulates and exchanges sets of immutable events.


Presentation Transcript


  1. Edelweiss: Automatic Storage Reclamation for Distributed Programming. Neil Conway, Peter Alvaro, Emily Andrews, Joseph M. Hellerstein. University of California, Berkeley

  2. Mutable shared state • Frequent source of bugs • Hard to scale

  3. Event Logging: accumulate & exchange sets of immutable events • No mutation/deletion • To delete: add new event • “Event X should be ignored” • Current state: query over event log

  4. Example: Key-Value Store
     Mutable State
         tbl = Hash.new
         Insert(k, v): tbl[k] = v
         Delete(k):    tbl.delete(k)
         View():       tbl
     Event Logging
         i_log = Set.new
         d_log = Set.new
         Insert(k, v): i_log << [k,v]
         Delete(k):    d_log << k
         View():       i_log.notin(d_log, :k => :k)
     Callouts: Update-in-place • Set union • Deletion • Compute “live” keys
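The two designs on this slide can be sketched in plain Ruby. This is a minimal illustration, not the authors' code: the class names `MutableKVS` and `LoggedKVS` are hypothetical, and since `notin` is not part of Ruby's standard `Set`, the view uses an ordinary filter as a stand-in for it.

```ruby
require 'set'

# Mutable-state version: every operation updates the table in place.
class MutableKVS
  def initialize
    @tbl = {}
  end

  def insert(k, v)
    @tbl[k] = v
  end

  def delete(k)
    @tbl.delete(k)
  end

  def view
    @tbl
  end
end

# Event-logging version: both logs only ever grow (set union).
class LoggedKVS
  attr_reader :i_log, :d_log

  def initialize
    @i_log = Set.new # [key, value] insertion events
    @d_log = Set.new # keys named by deletion events
  end

  def insert(k, v)
    @i_log << [k, v]
  end

  def delete(k)
    @d_log << k
  end

  # Stand-in for i_log.notin(d_log, :k => :k): keep the insertions
  # whose key has no matching entry in d_log.
  def view
    @i_log.reject { |k, _v| @d_log.include?(k) }.to_set
  end
end
```

In this sketch a deletion never removes anything from `i_log`; it only changes what the view query returns, which is why the logs grow without bound unless something reclaims them.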

  5. Benefits of Event Logging • Concurrency • Replication • Undo/redo • Point-in-time query, audit trails (Sometimes: performance!)

  6. Example Applications • Multi-version concurrency control (MVCC) • Write-ahead logging (WAL) • Stream processing • Log-structured file systems Also: CRDTs, tombstones, purely functional data structures, accounting ledgers.

  7. Observation: Logs consume unbounded storage. Solution: Discard log entries that are “no longer useful” (garbage collection)

  8. Observation: Logs consume unbounded storage. Challenge: Discard log entries that are “no longer useful” (garbage collection)

  9. Traditional Approach “No longer useful” defined by application semantics • No framework support • Every system requires custom GC logic • Reinvented many times • >25 papers propose ~same scheme!

  10. Engineering Challenges • Difficult to implement correctly • Too aggressive: destroy live data • Too conservative: storage leak • Ongoing maintenance burden • GC scheme and application code must be updated together

  11. Our Approach • New language: Edelweiss • Based on Datalog • No constructs for deletion or mutation! • Automatically generate safe, application-specific distributed GC protocols • Present several in-depth case studies • Reliable unicast/broadcast, key-value store, causal consistency, atomic registers

  12. Base Data (“Event Logs”) → Query → Derived Data (“Live View”)

  13. A log entry is useful iff it might contribute to the view. The queries define how log entries contribute to the view. Goal: Find log entries that will never contribute to the view in the future.

  14. Semantics of Base Data • Accumulate and broadcast to other nodes • Datalog: monotonic • Set union: grows over time • CALM Theorem [CIDR’11]: event log guaranteed to be eventually consistent
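Why a grow-only log converges can be shown with a small sketch. The `Replica` class and the event tuples below are hypothetical; the point is only that set union is insensitive to delivery order and to duplicates, so two replicas that eventually receive the same events end up with the same log.

```ruby
require 'set'

# Hypothetical replica holding a grow-only event log.
class Replica
  attr_reader :log

  def initialize
    @log = Set.new
  end

  # Merging is set union: reordered or duplicated deliveries are harmless.
  def receive(events)
    @log.merge(events)
    self
  end
end

events = [[:insert, :a, 1], [:insert, :b, 2], [:delete, :a]]

in_order = Replica.new
events.each { |e| in_order.receive([e]) }

# Same events, delivered in reverse order and twice each.
shuffled = Replica.new
events.reverse.each { |e| 2.times { shuffled.receive([e]) } }

puts in_order.log == shuffled.log
```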

  15. Semantics of Derived Data • Grows and shrinks over time • e.g., KVS keys added and removed • Hence, not monotonic

  16. Common Pattern Live View = set difference between growing sets

  17. Semantics of Set Difference X = Y – Z • Z grows: X shrinks • If t appears in Z, t will never again appear in X • “Anti-monotone with respect to Z”
      i_log = Set.new
      d_log = Set.new
      Insert(k, v): i_log << [k,v]
      Delete(k):    d_log << k
      View():       i_log.notin(d_log, :k => :k)
      Can reclaim from i_log upon match in d_log
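The reclamation rule on this slide can be sketched as follows. The helper names `live_view` and `reclaim_inserts!` are hypothetical, and the filter again stands in for `notin`. Because `d_log` never shrinks, an `i_log` entry whose key already appears in `d_log` can never re-enter the view, so dropping it leaves the view unchanged now and after any later deletion.

```ruby
require 'set'

# View = i_log minus the entries whose key appears in d_log.
def live_view(i_log, d_log)
  i_log.reject { |k, _v| d_log.include?(k) }.to_set
end

# Physically discard i_log entries that already have a match in d_log.
def reclaim_inserts!(i_log, d_log)
  i_log.delete_if { |k, _v| d_log.include?(k) }
end

i_log = Set[[:a, 1], [:b, 2], [:c, 3]]
d_log = Set[:a]

before = live_view(i_log, d_log)
reclaim_inserts!(i_log, d_log)

puts live_view(i_log, d_log) == before # view is unaffected by reclamation
puts i_log.size                        # one entry fewer is stored
```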

  18. Other Analysis Techniques • Reclaim from negative notin input • Often called “tombstones” • E.g., how to reclaim from d_log in the KVS • Reclaim from join input tables • Disseminate GC metadata automatically • Exploit user knowledge for better GC • Punctuations [Tucker & Maier ’03]

  19. Whole Program Analysis • For each query q, find condition when input t will never contribute to q’s output • “Reclamation condition” (RC) • For each tuple t, find the conjunction of the RCs for t over all queries • When all consumers no longer need t: safe to reclaim
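The conjunction step can be illustrated with a toy example. Everything here is an assumption for illustration: reclamation conditions are modeled as Ruby lambdas, and the second consumer (a replication query that needs a tuple until every replica has acknowledged its key) is hypothetical rather than taken from the slides.

```ruby
require 'set'

# A tuple is safe to reclaim only when every query's reclamation
# condition (RC) holds for it, i.e. the conjunction of all RCs.
def reclaimable?(tuple, rcs)
  rcs.all? { |rc| rc.call(tuple) }
end

d_log = Set[:a]     # keys deleted so far
acked = Set[:a, :b] # keys acknowledged by every replica (assumed)

rcs = [
  ->(t) { d_log.include?(t[0]) }, # RC for the view query: key was deleted
  ->(t) { acked.include?(t[0]) }  # RC for the hypothetical replication query
]

puts reclaimable?([:a, 1], rcs) # both consumers are done with it
puts reclaimable?([:b, 2], rcs) # the view query still needs it
```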

  20. Edelweiss Input Program (“positive” program: no deletion or state mutation) → Source-to-Source Rewriter (compute RCs, add deletion rules) → Datalog Output Program (input program + deletion rules) → Datalog Evaluator

  21. Comparison of Program Size • Only 19 rules!

  22. Takeaways • No storage management code! • Similar to malloc/free vs. GC • Programs are concise and declarative • Developer: just compute current view • Log entries removed automatically • Reclamation logic and application code always in sync

  23. Conclusions • Event logging: powerful design pattern • Problem: need for hand-written distributed storage reclamation code • Datalog: natural fit for event logging • Storage reclamation as a compiler rewrite? Results: • Automatic, safe GC synthesis! • High-level, declarative programs • No storage management code • Focus on solving domain problem

  24. Thank You!

  25. Future Work: Checkpoints • Closely related to simple event logging • Summarize many log entries with a single “checkpoint” record • View = last checkpoint + Query(ΔLogs) • General goal: reclaim space by structural transformation, not just discarding data
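One way to read the checkpoint formula is with an accounting ledger, one of the example applications listed earlier. The `Ledger` class below is a hypothetical sketch, not part of Edelweiss: the checkpoint is a single record summarizing all folded entries, the query is a sum, and the view combines the checkpoint with the entries logged since it was taken.

```ruby
# One record that summarizes every log entry folded into it.
Checkpoint = Struct.new(:balance)

class Ledger
  attr_reader :checkpoint, :delta

  def initialize
    @checkpoint = Checkpoint.new(0)
    @delta = [] # log entries recorded since the last checkpoint
  end

  def record(amount)
    @delta << amount
  end

  # View = last checkpoint + query over the entries logged since then.
  def balance
    @checkpoint.balance + @delta.sum
  end

  # Structural transformation: many entries become one record,
  # and the folded entries no longer need to be stored.
  def checkpoint!
    @checkpoint = Checkpoint.new(balance)
    @delta = []
  end
end
```

Taking a checkpoint leaves `balance` unchanged while emptying `delta`, which is the sense in which space is reclaimed without discarding information the view needs.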

  26. Future Work: Theory • Current analysis is somewhat ad hoc • If program does not reclaim storage, two possibilities: • Program is “not reclaimable” in principle • (Possible program bug!) • Our analysis is not complete • (Possible analysis bug!) How to characterize the class of “not reclaimable” programs?

  27. Reclaiming KVS Deletions • Good question • X.notin(Y): how to reclaim from Y? • Y is a dense ordered set; compress it. • Prove that each Y tuple matches exactly one X tuple
      i_log = Set.new
      d_log = Set.new
      Insert(k, v): i_log << [k,v]
      Delete(k):    d_log << k
      View():       i_log.notin(d_log, :k => :k)
      k is a key of i_log
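The slide does not say how the dense ordered set is compressed. One plausible reading, assuming the keys are integers, is to store runs of consecutive keys as ranges; the helpers `compress` and `covered?` below are hypothetical names for that sketch.

```ruby
require 'set'

# Collapse a set of integer keys into sorted, inclusive ranges.
def compress(keys)
  keys.sort
      .slice_when { |a, b| b != a + 1 } # split wherever the run breaks
      .map { |run| (run.first..run.last) }
end

# Membership test against the compressed form.
def covered?(ranges, k)
  ranges.any? { |r| r.cover?(k) }
end

d_log = Set[1, 2, 3, 5, 7, 8]
ranges = compress(d_log)
p ranges # => [1..3, 5..5, 7..8]
```

The denser the deleted keys, the fewer ranges are needed, so a long run of deletions costs a constant amount of storage instead of one tombstone per key.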
