Content-Based Publish-Subscribe Over Structured P2P Networks • Peter Triantafillou and Ioannis Aekaterinidis • Research Academic Computer Technology Institute and Department of Computer Engineering and Informatics, University of Patras, Greece
Agenda • Introduction/Goal • Publish-Subscribe Systems • Publish-Subscribe over Chord • Processing Subscriptions • Processing Events • Improving Performance • Conclusion
Introduction • Publish-subscribe systems are becoming very popular for building large-scale distributed systems and applications • Publishers and subscribers remain anonymous to each other • Centralized: • Adv: Global image of the system makes the matching algorithm easy to implement • Dis: Poor scalability • Decentralized: • Adv: Scalability • Challenge: developing an efficient distributed matching algorithm
Goal • Chord was chosen for its: • Simplicity • Popularity • Scalability • Self-organization • Good performance • Challenge: to develop a strategy for using DHTs to provide good support for range predicates • Range predicates are popular when specifying subscription attributes
Publish-Subscribe Systems • Asynchronous messaging paradigm • Senders (publishers) of messages are not programmed to send their messages to specific receivers (subscribers) • Published messages are characterized into classes (without knowledge of what subscribers there may be) • Subscribers express interest in one or more classes and only receive messages that are of interest (without knowledge of what publishers there are) • This Decoupling of publishers and subscribers allows for greater scalability and a more dynamic network topology
Pub-Sub Message Filtering • Subscribers receive only a subset of the total messages published • 2 Main Types • Topic Based • Content Based • Hybrid • Coupling of topic and content based systems
Topic Based Pub/Sub Systems • Much like newsgroups • Users join a group (topic) • All messages related to that topic are broadcast to all users participating in the specific group
Content Based Pub/Sub Systems • Preferable to topic-based systems • Give users the ability to express their interest by specifying predicates over the values of a number of well-defined attributes • Matching of publications (events) to subscriptions (interests) is done based on the content (values of attributes)
Hybrid System • Publishers post messages to a topic while subscribers register content-based subscriptions to one or more topics • Publications and subscriptions are automatically classified in topics (using an application-specific schema) • Drawbacks: • The design of the domain schema plays a fundamental role in the system’s performance • Many false positives are likely to occur
Event Schema • Set of typed attributes • Each attribute ai consists of: • Type: belongs to a predefined set of primitive data types • Name: a string • Value: v(ai) • Range: defined by the minimum and maximum values (vmin(ai), vmax(ai)) along with the attribute’s precision vpr(ai)
Subscription Schema • Contains all interesting subscription-attribute data types (integers, strings, etc.) and all common operators (=, ≠, <, >, etc.) • Event matches subscription iff all the subscription’s attribute predicates/constraints are satisfied • Can have two or more constraints for the same attribute
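The matching rule above (an event matches iff every constraint holds) can be illustrated with a minimal Python sketch. The tuple encoding of constraints, the `OPS` table, and the sample values are assumptions made for illustration; they are not taken from the slides.

```python
import operator

# Hypothetical encoding: a subscription is a list of
# (attribute, operator, value) constraints.
OPS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, ">": operator.gt,
    "<=": operator.le, ">=": operator.ge,
}

def matches(event, subscription):
    """Return True iff every constraint of the subscription is satisfied."""
    for attr, op, value in subscription:
        # An attribute missing from the event cannot satisfy a constraint.
        if attr not in event or not OPS[op](event[attr], value):
            return False
    return True

event = {"Exchange": "NYSE", "Symbol": "OTE", "Price": 8.40}
# Two constraints on the same attribute (Price) express a range.
subscription = [
    ("Exchange", "=", "NYSE"),
    ("Symbol", "=", "OTE"),
    ("Price", ">", 8.30),
    ("Price", "<", 8.70),
]
print(matches(event, subscription))  # True
```

This centralized check is only a reference point; the distributed scheme in the following slides reaches the same decision by counting subID lists instead of evaluating predicates in one place.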
Subscription Identifier • Concatenation of 3 parts: • C1: id of the node receiving the subscription • Size: m bits in a Chord ring with an m-bit address space • C2: id of the subscription itself • Size: bits equal to the rounded-up base-2 logarithm of the maximum number of outstanding subscriptions a node can have • C3: number of attributes on which constraints are declared • Size: max value = total number of attributes supported by the system
Subscription ID Example • Assume a Chord ring with a 3-bit identifier address space • Each node can support 8 outstanding subscriptions, and the attribute schema includes 7 attributes • The example subID identifies subscription 3 (C2=3), belonging to node 4 (C1=4), with constraints on 5 attributes (C3=5)
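As a rough sketch of the concatenation, the three fields can be packed into one integer with bit shifts. The function names are hypothetical, and the width chosen for C3 (enough bits to hold any count from 0 to the number of attributes) is an assumption; the slides only state its maximum value.

```python
def field_widths(m, max_subs, num_attrs):
    """Bit widths of (C1, C2, C3)."""
    c1 = m                               # node id: m bits
    c2 = (max_subs - 1).bit_length()     # ceil(log2(max_subs))
    c3 = num_attrs.bit_length()          # assumption: holds 0..num_attrs
    return c1, c2, c3

def pack_sub_id(node_id, sub_no, n_constraints, widths):
    """Concatenate C1 | C2 | C3 into a single integer."""
    _, c2, c3 = widths
    return (node_id << (c2 + c3)) | (sub_no << c3) | n_constraints

def unpack_sub_id(sub_id, widths):
    """Recover (C1, C2, C3) from a packed subID."""
    _, c2, c3 = widths
    n_constraints = sub_id & ((1 << c3) - 1)
    sub_no = (sub_id >> c3) & ((1 << c2) - 1)
    node_id = sub_id >> (c2 + c3)
    return node_id, sub_no, n_constraints

widths = field_widths(m=3, max_subs=8, num_attrs=7)   # (3, 3, 3)
sub_id = pack_sub_id(node_id=4, sub_no=3, n_constraints=5, widths=widths)
print(format(sub_id, "09b"))  # 100011101
```

Reading the 9 bits in groups of three gives 100 (node 4), 011 (subscription 3), and 101 (5 attributes).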
Storing Subscriptions • Done using the hash function provided by Chord (SHA-1) • Returns an identifier uniformly distributed in the address space • k=h(v(ai)) • Following the Chord API, the subID is placed at node: successor(k)
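The placement rule k = h(v(ai)), node = successor(k) can be sketched in Python as follows. This is a local simulation under stated assumptions: the ring is a sorted list of node ids, the SHA-1 digest is reduced modulo 2^m, and the node ids in the usage lines are made up.

```python
import hashlib
from bisect import bisect_left

def chord_hash(value, m):
    """Map an attribute value to an m-bit key using SHA-1."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    # Assumption: reduce the 160-bit digest modulo 2^m.
    return int.from_bytes(digest, "big") % (2 ** m)

def successor(key, node_ids):
    """First node id >= key on the ring, wrapping around to the smallest id.

    node_ids must be sorted in ascending order.
    """
    i = bisect_left(node_ids, key)
    return node_ids[i % len(node_ids)]

nodes = [0, 2, 5]                 # hypothetical node ids on a 3-bit ring
k = chord_hash("NYSE", m=3)
print(successor(k, nodes))        # the node that stores subIDs for "NYSE"
```

A real Chord deployment resolves `successor` through finger-table routing rather than a global sorted list; the list here only stands in for that lookup.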
Storing Subscriptions • Procedure:
Storing Example • Attributes are processed one at a time • The subscription ID is stored at: • successor(h(“NYSE”)), in the list dedicated to attribute Exchange • successor(h(“OTE”)), in the list dedicated to attribute Symbol • Since the Price constraint is the range 8.30 < Price < 8.70 with a precision of 0.01, the subscription ID is also stored at: • successor(h(v)) for each value v = 8.31, 8.32, …, 8.69, in the list dedicated to attribute Price
Updating Subscriptions • Updating an attribute that has an equality constraint affects only 2 nodes: • Delete the subscription ID from node: • nodeID = successor(h(vstale_value(ai))) • Add the subscription ID to node (in the appropriate list): • nodeID = successor(h(vupdated_value(ai)))
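The storing example can be simulated in a few lines of Python. Everything about the ring here (6-bit identifier space, node ids, the in-memory `store` dictionary) is a hypothetical stand-in for a real Chord network; `Decimal` is used so that stepping by 0.01 stays exact.

```python
import hashlib
from bisect import bisect_left
from collections import defaultdict
from decimal import Decimal

M = 6                                # hypothetical: 6-bit identifier space
NODES = [1, 9, 20, 33, 47, 58]       # hypothetical node ids, sorted

def chord_hash(value):
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

def successor(key):
    return NODES[bisect_left(NODES, key) % len(NODES)]

# store[node_id][attribute] -> set of subIDs (one list per attribute)
store = defaultdict(lambda: defaultdict(set))

def store_value(sub_id, attr, value):
    """Place sub_id in the attribute's list at successor(h(value))."""
    store[successor(chord_hash(value))][attr].add(sub_id)

def range_values(low, high, precision):
    """All values strictly between low and high on the precision grid."""
    step = Decimal(precision)
    values, v = [], Decimal(low) + step
    while v < Decimal(high):
        values.append(v)
        v += step
    return values

def store_range(sub_id, attr, low, high, precision):
    """Store sub_id once per value in the range; return the value count r."""
    values = range_values(low, high, precision)
    for v in values:
        store_value(sub_id, attr, v)
    return len(values)

store_value("subID1", "Exchange", "NYSE")
store_value("subID1", "Symbol", "OTE")
r = store_range("subID1", "Price", "8.30", "8.70", "0.01")
print(r)  # 39
```

The range expands to 39 separate values, each needing its own lookup, which is the cost the order-preserving variant later in the deck tries to reduce.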
Updating Subscriptions • The procedure for updating a range value depends on the new values of the range bounds (vlow_NEW(ai) and vhigh_NEW(ai)) compared to the old values • If vlow_NEW(ai) < vlow(ai) store the subID to the nodes that cover [vlow_NEW(ai), vlow(ai)) range • If vhigh_NEW(ai) > vhigh(ai) store the subID to the nodes that cover (vhigh(ai), vhigh_NEW(ai)] range
Updating Subscriptions • If vlow_NEW(ai) > vlow(ai) delete the subID from the nodes that cover [vlow(ai), vlow_NEW(ai)) range. • If vhigh_NEW(ai) < vhigh(ai) delete the subID from the nodes that cover (vhigh_NEW(ai), vhigh(ai)] range
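The four range-update rules can be written out as a short Python sketch. Two simplifying assumptions are made here: bounds are integers in units of the attribute's precision (so 8.31 becomes 831), and the old and new ranges overlap, which the rules implicitly require.

```python
def range_update(old_low, old_high, new_low, new_high):
    """Return (values_to_add, values_to_delete) for a range update.

    Ranges are closed integer intervals in units of the precision.
    Assumes the old and new ranges overlap.
    """
    to_add, to_delete = [], []
    if new_low < old_low:
        to_add += range(new_low, old_low)               # [new_low, old_low)
    if new_high > old_high:
        to_add += range(old_high + 1, new_high + 1)     # (old_high, new_high]
    if new_low > old_low:
        to_delete += range(old_low, new_low)            # [old_low, new_low)
    if new_high < old_high:
        to_delete += range(new_high + 1, old_high + 1)  # (new_high, old_high]
    return to_add, to_delete

# Old range 8.31..8.69, new range 8.25..8.60 (illustrative values).
add, delete = range_update(831, 869, 825, 860)
print(len(add), len(delete))  # 6 9
```

Only the values that differ between the two ranges are touched; the subID stays in place at every node covering the shared part.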
Matching Events with Subscriptions Example • Suppose we have Subscriptions 1 and 2 generated by two clients connected to a Chord node and Event 1 • First, the algorithm collects all the subID lists for which the values of the event attributes satisfy the corresponding constraints of the subscriptions
Matching Events with Subscriptions Example (continued) • The algorithm starts with attribute Exchange = “NYSE” and retrieves the subID list (LExchange) from node successor(h(“NYSE”)) • This list contains only the subID1 • LExchange -> subID1
Matching Events with Subscriptions Example (continued) • Next attribute Symbol = “OTE”; subID list (LSymbol) from node successor(h(“OTE”)) is retrieved • LSymbol -> subID1, subID2 • Both subscriptions appear because the event satisfies the Symbol constraint of each
Matching Events with Subscriptions Example (continued) • Next attribute Price = 8.40; subID list (LPrice) from node successor(h(8.40)) is retrieved • LPrice -> subID1 • Only subscription 1 appears because only its Price range contains 8.40
Matching Events with Subscriptions Example (continued) • Lastly attribute Low = 8.22; subID list (LLow) from node successor(h(8.22)) is retrieved • LLow -> subID2 • Only subscription 2 appears because only it declares a constraint on attribute Low
Matching Events with Subscriptions Example (continued) • After this phase of the matching process the collected subscription ID lists are: • LExchange -> subID1 • LSymbol -> subID1, subID2 • LPrice -> subID1 • LLow -> subID2 • Subscription1 was found in 3 lists while subscription2 was found in 2 • By processing the subIDs of the subscriptions (c3 part) we can find out that both subscriptions have constraints over 3 attributes.
Matching Events with Subscriptions Example (continued) • Since subscription 1 was found in 3 lists, which equals its number of constrained attributes, a match is implied and its subID is kept in order to inform the node that generated the subscription about the matched event • That node holds the metadata for subID1 needed to locate the IP address of the client that generated the subscription • The node storing the subscription is contacted (using the nodeID equal to the c1 field of subID1) and the event is delivered to the interested client
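The counting step of this example can be reproduced with a small Python sketch. The list contents are the ones from the example; the `c3` dictionary is a stand-in for decoding the c3 field out of each subID, and the function name is hypothetical.

```python
from collections import Counter

def matched_sub_ids(collected_lists, c3):
    """Return subIDs found in exactly as many lists as they have constraints.

    collected_lists: attribute name -> list of subIDs retrieved for the event
    c3: subID -> number of constrained attributes (the c3 field)
    """
    counts = Counter()
    for sub_ids in collected_lists.values():
        counts.update(set(sub_ids))   # count each subID once per list
    return sorted(s for s, n in counts.items() if n == c3[s])

collected = {
    "Exchange": ["subID1"],
    "Symbol": ["subID1", "subID2"],
    "Price": ["subID1"],
    "Low": ["subID2"],
}
c3 = {"subID1": 3, "subID2": 3}   # both subscriptions constrain 3 attributes
print(matched_sub_ids(collected, c3))  # ['subID1']
```

subID2 appears in only 2 of its 3 required lists, so it is not reported as a match.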
Expected Performance • Subscription Storage Procedure: • The average number of hops needed to store a subID depends on the type of constraints over the attributes • Equality: ½ log(N) hops • The subID is stored at a single node • Range constraint: r nodes are affected, which leads to r * ½ log(N) hops on average to store the subID
Expected Performance • Update/Deletion of Subscription • Again, depends on the type of constraints over the attributes • Equality: the update takes log(N) hops on average, since 2 nodes are contacted at ½ log(N) hops each • Ranges: k * log(N) hops on average • k depends on whether the new range is smaller or wider than the old range
Expected Performance • Event-Processing (matching) • Involves contacting nodes to collect the subscription ID lists • Reminder: in a Chord network with N nodes and an m-bit address space, finding a successor takes on average: • ½ log(N) hops • Because each event attribute requires only one such lookup, event matching is fast and scalable
Improving Performance • Currently, storing a subscription over the Chord ring takes r * ½ log(N) hops on average for every range-constrained attribute • r depends on the precision and on the high and low values of the range • Using an order-preserving hash function, this can be reduced to r + ½ log(N) hops
Order Preserving Chord • Using an order-preserving hash function over the 2^m identifier space • Expected performance: • ½ log(N) hops to locate the node storing the minimum value of the range (vlow(ai)) • Then, r more hops to store the remaining values in the range, for r + ½ log(N) total hops
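To get a feel for the difference between the two cost formulas, the sketch below plugs in numbers. It assumes base-2 logarithms, a hypothetical network of N = 1024 nodes, and r = 39 values (the size of the Price range 8.31 to 8.69 at precision 0.01).

```python
from math import log2

def hops_plain(r, n):
    """Average hops to store a range with plain Chord hashing: r * 1/2 log(N)."""
    return r * 0.5 * log2(n)

def hops_ophf(r, n):
    """Average hops with an order-preserving hash: r + 1/2 log(N)."""
    return r + 0.5 * log2(n)

N = 1024   # hypothetical network size
r = 39     # values in the range 8.31 .. 8.69 at precision 0.01
print(hops_plain(r, N))  # 195.0
print(hops_ophf(r, N))   # 44.0
```

Under these assumptions the order-preserving scheme needs roughly a quarter of the hops, and the gap widens as N grows because the log(N) term is paid once rather than r times.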
Order Preserving Hash Function • Suppose every attribute is characterized by • vmin(ai): minimum value ai can take • vmax(ai): maximum value ai can take • vpr(ai): precision of ai • vj(ai) is any value in [vlow(ai), vhigh(ai)] • OPHF is:
Subscription and Event Processing with OPHF • Example: Storing a Subscription • Consider a Chord ring with 3-bit ids and 8 nodes • A subscription on a single integer attribute a arrives at node 3 with constraint 0<v(a)<4 • Using plain Chord hashing requires O(r*log(N)) hops to store the subID at three nodes
Subscription and Event Processing with OPHF • Using the OPHF with Chord: • Perform O(log(N)) hops only once to reach the first node (node 6) • Storing the subID at nodes 7 and 0 requires 2 more hops
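The OPHF formula itself is not reproduced in this transcript, so the following Python sketch uses a stand-in: a linear mapping of the attribute domain onto the ring, shifted by an offset. The function, the domain [0, 7], and the offset of 5 are all assumptions, chosen only so that the output lines up with nodes 6, 7, and 0 in the example; this is not the authors' definition.

```python
def ophf(value, vmin, vmax, precision, m, offset=0):
    """Hypothetical order-preserving hash (not the formula from the slides).

    Maps the precision grid of [vmin, vmax] linearly onto the 2^m ring ids,
    shifted by `offset`, so consecutive values land on consecutive ids.
    """
    steps = round((value - vmin) / precision)       # index on the grid
    total = round((vmax - vmin) / precision) + 1    # number of grid values
    return (offset + steps * (2 ** m) // total) % (2 ** m)

# Constraint 0 < v(a) < 4 on an integer attribute: values 1, 2, 3.
ids = [ophf(v, vmin=0, vmax=7, precision=1, m=3, offset=5) for v in (1, 2, 3)]
print(ids)  # [6, 7, 0]
```

Because consecutive values map to consecutive ids (modulo the ring size), the subID can be passed from node 6 to its successors one hop at a time after a single lookup.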
Conclusion • The approach supports equality and range predicates while leveraging Chord to build a scalable, self-organizing, well-performing content-based publish-subscribe system • Not addressed: • Load balancing • Small-domain problem