Wide Area Events Using DDS
by Ramakrishna Gummadi and Barbara Hohlt
We have a model that can efficiently support a family of applications: Publish-Subscribe-Notify. To realize this model, we implemented Distributed Data Structures [GBHC2000] with B-trees. The B-tree DDS, with FSMs and explicit scheduling, gives better concurrency than traditional multithreaded straight-line code.
Example
User query: //product[price/msrp<300]/name
Unlike the current implementation of XFilter [AF2000], with B-tree DDS support we need to retrieve only those profiles where price < 300.
Data (from [AF2000]):
<?xml version="1.0"?>
<catalog>
  <product id="Kd-245">
    <name>Color Monitor</name>
    <price currency="USD">
      <msrp>310.40</msrp>
      <wholesale>257.80</wholesale>
    </price>
    <notes>Hottest Product</notes>
  </product>
</catalog>
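To make the query semantics concrete, the following self-contained Java sketch evaluates the slide's query against the sample catalog using the standard javax.xml.xpath API. It only illustrates which nodes the query selects; it is not the XFilter or B-tree DDS machinery, and the class name CatalogQuery is ours.

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import java.io.ByteArrayInputStream;

public class CatalogQuery {
    public static void main(String[] args) throws Exception {
        String xml =
            "<catalog>" +
            "  <product id=\"Kd-245\">" +
            "    <name>Color Monitor</name>" +
            "    <price currency=\"USD\">" +
            "      <msrp>310.40</msrp>" +
            "      <wholesale>257.80</wholesale>" +
            "    </price>" +
            "    <notes>Hottest Product</notes>" +
            "  </product>" +
            "</catalog>";

        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));

        // The user query from the slide: names of products whose MSRP is below 300.
        String query = "//product[price/msrp<300]/name";
        NodeList names = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(query, doc, XPathConstants.NODESET);

        // This sample document has no matches (msrp is 310.40), so nothing is
        // printed; a profile with msrp < 300 would be selected and returned.
        for (int i = 0; i < names.getLength(); i++) {
            System.out.println(names.item(i).getTextContent());
        }
    }
}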
Key Ideas
• FSMs + queues = reduced locking. The use of control-flow-driven concurrency control and explicit scheduling permits us to remove locks.
• FSMs + prefetching = reduced blocking. "Locking without blocking on I/O" can be enforced more easily by prefetching data and dispatching requests in the order that their data becomes available.
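A minimal sketch of the first idea, under assumed names (State, Request, and Dispatcher are ours, not the paper's code): every request is a small finite state machine, a single dispatcher thread drains a ready queue and runs one non-blocking step at a time, and a request that still needs data is re-queued rather than blocking, so shared structures never need locks.

import java.util.ArrayDeque;
import java.util.Queue;

enum State { NEED_PAGE, HAVE_PAGE, DONE }

class Request {
    State state = State.NEED_PAGE;

    // Advance by one non-blocking step; return true when the request is finished.
    boolean step() {
        switch (state) {
            case NEED_PAGE:               // issue an asynchronous page fetch (instant in this toy)
                state = State.HAVE_PAGE;
                return false;             // not done: go back on the queue until the data is in
            case HAVE_PAGE:               // operate on the page: no I/O, no locks
                state = State.DONE;
                return true;
            default:
                return true;
        }
    }
}

public class Dispatcher {
    public static void main(String[] args) {
        Queue<Request> ready = new ArrayDeque<>();
        ready.add(new Request());
        ready.add(new Request());
        // One thread, explicit scheduling: a request runs to its next state and is
        // re-queued if it still needs something, so no locks are ever taken.
        while (!ready.isEmpty()) {
            Request r = ready.poll();
            if (!r.step()) ready.add(r);
        }
        System.out.println("all requests completed without locks");
    }
}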
Architecture
[Architecture diagram: clients on the WAN push events to and pull events from any service "front-end" (worker); each worker links the DDS library and talks over a SAN to the storage "bricks".]
• Clients interact with any service front-end; all persistent state is in the DDS and is consistent across the cluster.
• A "brick" is a durable single-node B-tree or hash table plus RPC skeletons for network access. A distributed DDS partition is stored, in the example shown, on 3 replica bricks in a group.
• A service interacts with the DDS via a library: the library is the 2PC coordinator, handles partitioning, replication, etc., and exports the B-tree + HT API.
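As an illustration only, the interface below sketches what the DDS library might export to a worker; the method names and signatures are assumptions, not the actual Ninja DDS API. The point is that the caller sees an ordinary B-tree while the library runs 2PC, partitioning, and replication across the bricks behind it.

public interface DistributedBTree {
    // Read: may be served by any replica in the partition's group.
    byte[] search(byte[] key);

    // Writes run as two-phase commits across the partition's replicas.
    void insert(byte[] key, byte[] value);
    void remove(byte[] key);

    // Range scan is what the B-tree DDS adds over the hash-table DDS.
    Iterable<byte[]> scan(byte[] lowKey, byte[] highKey);
}

A hash-table DDS already supports exact-match lookups; a range operation of this kind is what lets the earlier example query touch only the profiles with price < 300.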
PSN Building Blocks
[Diagram: events (e) are posted via RPC (work) to a Channel; subscribers attach Filters and receive events by Push or Pull (Subscribe/Unsubscribe); Sinks consume events; a Transducer connects two services.]
Notes:
1) Reliability and delivery semantics are embedded within events.
2) Events are strongly or weakly typed and locally sequenced.
3) The Channel controls access to posts; subscribers use capabilities to unsubscribe.
4) As an example, a Channel might notify subscribers when free food is available. The Transducer receives mail events from a mail service and posts food events to the Channel.
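The sketch below is a hypothetical Java rendering of these building blocks (all interface and class names are ours, not the Ninja code): a channel with capability-based unsubscription and a transducer that turns mail events into food postings, as in note 4.

interface Event {
    long sequence();                  // events are locally sequenced; payload typing omitted
}

interface Capability { }              // handed out on subscribe, required to unsubscribe

interface Subscriber {
    void push(Event e);               // push-mode delivery
}

interface Channel {
    void post(Event e);               // posting is access-controlled by the channel
    Capability subscribe(Subscriber s);
    void unsubscribe(Capability c);
    Event pull();                     // pull-mode delivery for polling consumers
}

// Example transducer from note 4: receives mail events from a mail service
// (as a subscriber) and posts food events to the food channel.
class FoodTransducer implements Subscriber {
    private final Channel foodChannel;

    FoodTransducer(Channel foodChannel) {
        this.foodChannel = foodChannel;
    }

    @Override
    public void push(Event mailEvent) {
        if (looksLikeFoodAnnouncement(mailEvent)) {
            foodChannel.post(mailEvent);          // re-post (or translate) the event
        }
    }

    private boolean looksLikeFoodAnnouncement(Event e) {
        return true;                              // placeholder filter
    }
}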
Component Layers
[Layer diagram, top to bottom: Application → Distributed B-trees → Single-Node B-trees → Buffer Cache → asynchronous I/O core ("sinks and sources"), with queued requests flowing down and queued completions flowing up; the core sits on the TCP network, VIA network, file-system storage, and raw-disk storage.]
The application layer makes "search" and "insert" requests to a B-tree instance. The B-tree determines what data blocks it needs and fetches them from the global buffer cache. If the cache does not have the needed blocks, it fetches them from the global I/O core; this is transparent to the B-tree instance.
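To illustrate the layering, here is a small sketch in which the single-node B-tree asks the buffer cache for a block and a miss is forwarded to the I/O core asynchronously. All class names are ours, and Java's CompletableFuture stands in for the core's queued requests and completions; the real system dispatches FSMs from completion queues rather than chaining futures.

import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

class Block {
    final byte[] data = new byte[4096];
}

class IOCore {
    // Asynchronous read: returns immediately, the completion arrives later
    // (stand-in for the core's queued requests over raw disk or VIA).
    CompletableFuture<Block> read(long blockId) {
        return CompletableFuture.supplyAsync(Block::new);
    }
}

class BufferCache {
    private final Map<Long, Block> cache = new ConcurrentHashMap<>();
    private final IOCore io = new IOCore();

    CompletableFuture<Block> get(long blockId) {
        Block hit = cache.get(blockId);
        if (hit != null) return CompletableFuture.completedFuture(hit);   // cache hit
        return io.read(blockId)                                           // miss: go to the I/O core
                 .thenApply(b -> { cache.put(blockId, b); return b; });
    }
}

class SingleNodeBTree {
    private final BufferCache cache = new BufferCache();

    // A real search would walk root-to-leaf; fetching one block stands in for that here.
    CompletableFuture<byte[]> search(byte[] key) {
        return cache.get(0L).thenApply(block -> block.data);
    }
}

public class LayersDemo {
    public static void main(String[] args) throws Exception {
        byte[] data = new SingleNodeBTree().search("some-key".getBytes()).get();
        System.out.println("search returned a " + data.length
            + "-byte block via the buffer cache and I/O core");
    }
}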
Salient Features of the Implementation
• Completely lock-free implementation:
  • Possible because of non-blocking FSMs
  • Isolation guaranteed by using queues
• Invariant: if one request modifies a page ahead of another request, it continues to do so for any other pages that the two happen to access (the second writer always gets the updates of the first).
  • Sufficient for us because we guarantee A&D only for an action. What happens with transactions? (Delay other requests in queues?)
• Useful property enforced easily with FSMs and queues: avoid blocking on I/O while holding a lock.
  • Enforced because we can do explicit scheduling with queues: check whether you have what you (mostly) need before you start the FSM; otherwise, pipeline-prefetch (by not blocking) what you may need before you start (see the sketch below).
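A sketch of the last point's scheduling discipline, again with illustrative names and a future-based stand-in for the completion queues: the scheduler issues the prefetches a request expects to need and enqueues its FSM to run only after they have all arrived, so the FSM itself never blocks on I/O.

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;

class PrefetchScheduler {
    private final ConcurrentLinkedQueue<Runnable> ready = new ConcurrentLinkedQueue<>();

    // Pipeline-prefetch: issue the reads the request expects to need, and enqueue
    // its FSM only after they have all completed, so the FSM never blocks on I/O.
    CompletableFuture<Void> submit(List<? extends CompletableFuture<?>> prefetches, Runnable fsm) {
        return CompletableFuture
            .allOf(prefetches.toArray(new CompletableFuture[0]))
            .thenRun(() -> ready.add(fsm));
    }

    // The dispatcher drains FSMs in the order their data became available.
    void runReady() {
        Runnable fsm;
        while ((fsm = ready.poll()) != null) fsm.run();
    }
}

public class PrefetchDemo {
    public static void main(String[] args) {
        PrefetchScheduler sched = new PrefetchScheduler();
        CompletableFuture<byte[]> page =
            CompletableFuture.supplyAsync(() -> new byte[4096]);   // stand-in for an async block read
        sched.submit(List.of(page),
                     () -> System.out.println("FSM ran after its page arrived"))
             .join();                                              // demo only: wait for the prefetch
        sched.runReady();
    }
}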
Plan for Success
• Complete the single-node B-tree implementation
• Implement the distributed B-tree
• Combine Prof. Franklin's work with ours to leverage cluster properties
• Implement a real-world application…
Conclusions
• B-trees are good to have as part of DDS for efficiently supporting scalable Publish-Subscribe-Notify (Selective Dissemination of Information) applications
• Non-blocking FSMs are good for highly concurrent, scalable systems: FSMs give scalability, non-blocking APIs give concurrency
• The DDS API is a useful abstraction on which to easily build cluster applications:
  • As proof of concept, we built a "Food in the Woz" application in less than a week
  • The "Food in the Woz" app makes use of NinjaMail, suggesting it is possible to develop reusable components using Ninja