
Actor Model: A Scalable Solution for Concurrent and Distributed Programming

Explore the benefits of the Actor model for concurrent and distributed programming, including lightweight processes, asynchronous message passing, and coordination-free consistency.


Presentation Transcript


  1. Anna: A KVS For Any Scale. Chenggang Wu, Jose M. Faleiro, Yihan Lin, Joseph M. Hellerstein (UC Berkeley, Yale University, Columbia University). Presented by Cheng Li, Advanced Data Systems Lab, USTC, 2018-3-23

  2. Different scales

  3. Concurrent programming • Programs are designed as a collection of interacting computational processes that may be executed in parallel (Wikipedia) • Inter-dependent processes • executing simultaneously • affecting each other's work • must exchange information to do so

  4. Threads & Shared Memory • The smallest executable unit is a thread • Communication is implemented by sharing variables (see the sketch below)
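
To make the style concrete, here is a minimal shared-memory sketch in Python (the language is this transcript's choice; the slides name none, and the counter example is hypothetical): communication happens through a shared variable, and correctness hinges on every access remembering to take the lock.

```python
import threading

counter = 0                    # shared state: the communication channel
lock = threading.Lock()        # forgetting this guard is the classic bug

def worker(n: int) -> None:
    global counter
    for _ in range(n):
        with lock:             # every access must be guarded by hand
            counter += 1

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000, but only because of the lock
```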

  5. Threads & Shared Memory

  6. Threads & Shared Memory • Cons: • Complicated & error-prone client code • Not extendable to distributed programming • Threads are heavyweight, so not very scalable • Examples: • Standard concurrency libraries in Java, C#, etc.

  7. Distributed programming (figure: a client opens a nested transaction T that spawns sub-transactions T1–T4 on servers X, Y, Z: a.withdraw(10); b.withdraw(20); c.deposit(10); d.deposit(20); then closeTransaction)

  8. Distributed programming • Distributed transactions across machines • Strong assumptions: ACID properties • High cost for coordinating on conflicts and tolerating faults

  9. Solutions • Wait-free execution • Replacing Threads + shared memory with Actor model • Coordination-free consistency • From strong ACID to weak ACI properties with replication

  10. Wait-free execution

  11. Preliminaries 1: Actor Model • The smallest executable unit is an actor • An actor is a concurrency primitive that does not share any resources with other actors • Communication is implemented by actors sending each other messages

  12. Workflow: event-driven + asynchronous

  13. Actors and event-driven programming • No “sleeping” or “waking up” • Actors become passive when there are no more messages of interest to process • Passive actors activate immediately when an interesting message arrives

  14. Asynchronous message passing • Each actor has a “mailbox” • If a message arrives while the actor is busy, it is stored in its mailbox • When the actor reaches a point where it waits for new messages to continue, it picks up the first suitable message and proceeds (see the sketch below)
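
A minimal mailbox-based actor sketch in Python asyncio (an illustration chosen for this transcript, not the paper's implementation): the actor sits passive on its queue, activates when a message arrives, and the sender never blocks. Because coroutines are cheap, one OS thread can host many inactive actors, which is the point of the next slide.

```python
import asyncio

class Printer:
    """A tiny actor: private state, a mailbox, no shared variables."""

    def __init__(self) -> None:
        self.mailbox: asyncio.Queue = asyncio.Queue()

    async def run(self) -> None:
        while True:
            msg = await self.mailbox.get()   # passive until a message arrives
            if msg is None:                  # conventional stop message
                return
            print("got:", msg)

async def main() -> None:
    actor = Printer()
    task = asyncio.create_task(actor.run())
    await actor.mailbox.put("hello")         # asynchronous send: no waiting
    await actor.mailbox.put("world")
    await actor.mailbox.put(None)
    await task

asyncio.run(main())
```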

  15. Benefits of Actor model • Light-weight processes • With threads, 1 code-level thread = 1 operating system thread • With actors, 1 operating system thread = 1 active actor + an unlimited number of inactive actors • Natural extension to distributed environments • Local and remote actors “look the same” • Only additional lookup and identification are required

  16. Coordination-free consistency

  17. Preliminaries 2: ACI properties • ACID: Atomicity, Consistency, Isolation, Durability • ACI: Associative, Commutative, Idempotent

  18. Preliminaries 2: ACI properties • Replicate data at many nodes • Performance: local reads • Fault-tolerance: no data loss unless all replicas fail or become unreachable • Availability: data still available unless all replicas fail or become unreachable • Scalability: load balance across nodes for reads • Updates • Push to all replicas • Consistency: expensive!

  19. Conflicts • Updating replicas independently may lead to different results, i.e., inconsistent data (figure: replicas s1 and s2 apply updates in the order 3, 7, 5 while s3 applies them as 5, 7, 3, ending in different states)

  20. Strong Consistency • All replicas execute updates in the same total order • Deterministic updates: the same update on the same objects yields the same result (figure: replicas s1, s2, s3 coordinate to apply updates 3, 7, 5 in one agreed order)

  21. Strong Consistency • All replicas execute updates in the same total order • Deterministic updates: the same update on the same objects → the same result • Requires coordination and consensus to decide on the total order of operations • N-way agreement basically serializes updates → very expensive!

  22. Eventual Consistency • If no new updates are made to an object, all replicas will eventually converge to the same value • Update locally and propagate • No consensus in the background → scales well for both reads and writes • Exposes intermediate state • Assumes eventual, reliable delivery • On conflict: arbitrate & roll back

  23. Eventual Consistency • If no new updates are made to an object, all replicas will eventually converge to the same value • Moves consensus to the background • However: • High complexity • Unclear semantics if the application reads data that is later rolled back!

  24. Strong Eventual Consistency • Like eventual consistency but with deterministic outcomes of concurrent updates • No need for background consensus • No need to rollback • Available, fault-tolerant, scalable • But not general; works only for a subset of updates

  25. CRDTs Conflict-free Replicated Data Types • Data types whose operations are • Associative: A • (B • C) = (A • B) • C • Commutative: A • B = B • A (see the sketch below)
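
As a concrete instance, here is a Python sketch of a grow-only counter (G-Counter), a standard CRDT not specific to this paper: merge is element-wise max, which is associative, commutative, and idempotent, so replicas converge no matter how often or in what order they merge.

```python
class GCounter:
    """Grow-only counter: one monotonically increasing slot per replica."""

    def __init__(self, replica_id: str) -> None:
        self.id = replica_id
        self.counts: dict[str, int] = {}

    def increment(self) -> None:
        self.counts[self.id] = self.counts.get(self.id, 0) + 1

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # element-wise max: merge(a, b) == merge(b, a), and merging
        # the same state twice changes nothing (idempotence)
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

a, b = GCounter("a"), GCounter("b")
a.increment(); a.increment(); b.increment()
a.merge(b); b.merge(a)
assert a.value() == b.value() == 3   # both replicas converge
```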

  26. Portfolio of CRDTs • Counter: unlimited, non-negative, … • Register: last-writer-wins, … • Set: grow-only, 2P, observed-remove, … • Map: set of registers, …

  27. Observed-Remove Set • Sequential specification: • {true} add(e) {e ∈ S} • {true} remove(e) {e ∉ S} • {true} add(e) || remove(e) {????} • linearizable? • add wins? • remove wins? • last writer wins? • error state?

  28.–35. Observed-Remove Set (figure-only slides walking step by step through an example)

  36. Observed-Remove Set • add(e) = A ≔ A ∪ {(e, α)}, where α is a fresh unique tag • remove(e) removes only the tags it has observed: R ≔ R ∪ {(e, –) ∈ A} • lookup(e) = ∃ (e, –) ∈ A \ R • merge(S, S') = (A ∪ A', R ∪ R') • Hence {true} add(e) || remove(e) {e ∈ S}: the add wins (see the sketch below)
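
The specification above translates almost line for line into code. A Python sketch (illustrative, not the paper's code) showing that a concurrent add(e) || remove(e) resolves with the add winning, because the remove covers only the tags it observed:

```python
import uuid

class ORSet:
    """Observed-Remove Set: A holds added (element, tag) pairs, R removed ones."""

    def __init__(self) -> None:
        self.A: set[tuple] = set()
        self.R: set[tuple] = set()

    def add(self, e) -> None:
        self.A.add((e, uuid.uuid4()))               # fresh unique tag α

    def remove(self, e) -> None:
        self.R |= {p for p in self.A if p[0] == e}  # only observed tags

    def lookup(self, e) -> bool:
        return any(p[0] == e for p in self.A - self.R)

    def merge(self, other: "ORSet") -> None:
        self.A |= other.A
        self.R |= other.R

# concurrent add(e) || remove(e) on two replicas: the add wins
s1, s2 = ORSet(), ORSet()
s1.add("x"); s2.merge(s1)   # both replicas have observed x once
s1.add("x")                 # s1 re-adds x with a new, unseen tag ...
s2.remove("x")              # ... while s2 removes only the tag it saw
s1.merge(s2); s2.merge(s1)
assert s1.lookup("x") and s2.lookup("x")   # x survives on both replicas
```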

  37. Summary • Anna is motivated by • Threads + shared memory being costly and inapplicable to distributed environments • ACID properties being too strong • Demand for high performance at various scales • Anna is an efficient KVS relying on: • The Actor model, a nice abstraction for scaling concurrent and distributed programs • CRDTs, which eliminate coordination by restricting the types of objects supported for concurrent updates

  38. Open questions? • Do you think this work is novel? • Theory innovations? • Engineering innovations? • Is it sufficient to offer multi-key transactions with only atomicity and durability?

  39. Thanks for listening! Cheng Li 2018-3-23

  40. Outline • Background • Introduction to CRDTs • A case for scaling relational databases • Summary

  41. Two-tier Application Model • Observation: side effects are encapsulated into a sequence of DB statements (figure: several app servers in front of one database)

  42. Two-tier Application Model • Observation: side effects are encapsulated into a sequence of DB statements • Insight: we can model the database using commutative replicated data types (CRDTs) (figure: the app servers now sit in front of a CRDT database)

  43. Leveraging CRDTs • Transform each DB statement into one or more CRDT operations • Programmers only annotate the schema with CRDTs: a DB table maps to a Set, a DB field to a Counter or a rewritable value • Transaction: [CRDT_OP1; CRDT_OP2; CRDT_OP3; …] (see the sketch below)
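
A hypothetical sketch of the annotation idea in Python (the names, types, and helper functions are invented for illustration; the slides do not show the system's actual interface): a field annotated as a counter turns an update like "qty = qty - 10" into a commutative delta, and a transaction becomes a list of CRDT operations that replicas can apply in any order.

```python
from dataclasses import dataclass

@dataclass
class CrdtOp:
    """One CRDT operation derived from a DB statement (illustrative)."""
    table: str
    key: str
    field: str
    delta: int          # counter fields commute under addition

def translate_counter_update(table: str, key: str, field: str, delta: int) -> CrdtOp:
    """Rewrite one SQL-style counter update as a commutative CRDT op."""
    return CrdtOp(table, key, field, delta)

# Transaction: [CRDT_OP1; CRDT_OP2; ...] -- because each op commutes,
# replicas may apply the list without agreeing on a global order.
txn = [
    translate_counter_update("stock", "item-42", "qty", -10),
    translate_counter_update("orders", "cust-7", "order_count", +1),
]
for op in txn:
    print(op)
```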

  44.–45. CRDT Annotation Example (figure-only slides)

  46. Experimental Setting • Local cluster • Maximum of 10 nodes • Clients spread across 5 nodes • TPC-C benchmark • Gold standard for database transaction processing • TPC-C standard + read-dominant workloads • Baselines • MySQL Cluster (sharding) • Galera Cluster (full replication)

  47. Performance Evaluation (figure: 5 replicas, TPC-C standard workload)

  48. Scalability Evaluation (figure: TPC-C standard workload)
