1. Analysis and Algorithms for Content-based Event Matching Satyen Kale et al.
Princeton University
2. Definition Content-based matching: publish-subscribe system with attributes associated with each event
Example:
stock events with attributes:
issuer
price
volume
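As a minimal sketch of the definition above, an event can be represented as a set of attribute values and a subscription as a predicate over them. The concrete values and thresholds below are made up for illustration and do not come from the slides.

```python
# Hypothetical stock event carrying the three attributes from the slide.
event = {"issuer": "IBM", "price": 119.5, "volume": 2000}

# A content-based subscription is a predicate over event attributes.
# The issuer name and thresholds are illustrative assumptions.
def subscription(e):
    return e["issuer"] == "IBM" and e["price"] < 120 and e["volume"] >= 1000

# The event is delivered to the subscriber only if the predicate holds.
matched = subscription(event)
```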
3. Background Mathematical model for content-based pub/sub systems: the paper by M.K. Aguilera et al., "Matching events in a content-based subscription system"
Given a publish/subscribe system using content-based matching, we have the following questions:
How can subscribers and events be matched without much a priori information about the events?
How well does the matching algorithm perform?
How can the proposed algorithm be proved correct?
Which algorithm should be used?
4. Previous Algorithms Two popular matching algorithms
Counting-based algorithm
This algorithm maintains a counter for each subscription that records the number of its predicates satisfied by the current event
Tree-based algorithm
This algorithm organizes subscriptions into a rooted search tree
We mainly discuss the tree-based algorithm, which is the foundation of this work
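For comparison, the counting-based idea can be sketched as follows. This is an illustrative sketch only: it assumes every predicate is an equality test on one attribute, and the names build_index and counting_match are hypothetical, not taken from the slides.

```python
from collections import defaultdict

def build_index(subscriptions):
    """Map each (attribute, value) predicate to the subscriptions that contain it."""
    index = defaultdict(list)
    for sub_id, preds in subscriptions.items():
        for attr, value in preds.items():
            index[(attr, value)].append(sub_id)
    return index

def counting_match(subscriptions, index, event):
    """Count satisfied predicates per subscription; a full count means a match."""
    counters = defaultdict(int)
    for attr, value in event.items():
        for sub_id in index.get((attr, value), ()):
            counters[sub_id] += 1
    return sorted(s for s, preds in subscriptions.items()
                  if counters[s] == len(preds))

# Assumed example data: three subscriptions made of equality predicates.
subs = {
    "s1": {"issuer": "IBM", "price": 120},
    "s2": {"issuer": "IBM"},
    "s3": {"issuer": "HP", "volume": 1000},
}
event = {"issuer": "IBM", "price": 120, "volume": 1000}
result = counting_match(subs, build_index(subs), event)
```

Here s3 satisfies only one of its two predicates, so its counter never reaches the required total and it is not reported.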
5. Primitive Algorithm Before Tree Given:
attributes A = {a1, a2, ..., ak}
event e
subscription sub
The primitive algorithm decides whether the event matches the subscription, i.e. whether sub(e) = true, by testing all k attributes
Time complexity for N subscriptions: O(kN)
Space complexity for N subscriptions: O(kN)
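The O(kN) bound can be read directly off a brute-force loop. The sketch below assumes equality predicates stored as attribute-to-value mappings; naive_match is a hypothetical name used only for illustration.

```python
def naive_match(subscriptions, event):
    """Test every subscription against the event: N subscriptions, up to k tests each."""
    matched = []
    for sub_id, preds in subscriptions.items():          # N iterations
        if all(event.get(attr) == value                   # at most k tests
               for attr, value in preds.items()):
            matched.append(sub_id)
    return matched

# Assumed example data.
subs = {
    "s1": {"issuer": "IBM", "price": 120},
    "s2": {"issuer": "IBM"},
    "s3": {"issuer": "HP", "volume": 1000},
}
event = {"issuer": "IBM", "price": 120, "volume": 1000}
result = naive_match(subs, event)
```

No pre-processing is done, so every event pays the full cost of scanning all subscriptions.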
6. Tree Based Algorithm This algorithm allows pre-processing of the subscriptions before matching an event
pre_process(Sub) takes the set of subscriptions Sub and outputs an internal representation of the subscription
match(pre_processed_data,event) takes the internal representation and an event, and outputs subscriptions that match the event
It aims to reduce the matching time complexity to sub-linear while keeping the space complexity linear
If subscriptions consist only of equality tests
Time complexity is reduced to O(K N^(1-lambda)), where N is the number of subscriptions and K is a constant
lambda depends on the number and type of attributes, 0 < lambda < 1
7. Pre-Processing Algorithm This algorithm is used to build the matching tree that will be used in the subsequent event-matching process
Components
subscription Sub
set of paths T, made of triples (v, r, v')
v: an arbitrary node
v': an adjacent node
r: an edge connecting v to v'
10. Pre-Processing Algorithm for Equality Test This algorithm is used when subscriptions consist only of equality tests on the attributes
In pseudo code:
11. Pre-Processing Algorithm for Equality Test
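The pseudocode from the original slides is not reproduced here. As a rough sketch of the idea, assuming each subscription is a tuple of k values where "*" means "don't care", a matching tree can be built with one level per attribute. The nested-dict layout and the name pre_process applied this way are illustrative assumptions, not the authors' exact algorithm.

```python
STAR = "*"  # edge label meaning "don't care" (assumed never to be a real attribute value)

def pre_process(subscriptions, k):
    """Build a matching tree: depth i branches on the value required for attribute i.

    Internal nodes are dicts keyed by edge label; leaves are lists of subscription ids.
    Assumes k >= 1 and that every subscription tuple has exactly k entries.
    """
    root = {}
    for sub_id, values in subscriptions.items():
        node = root
        for i in range(k - 1):
            node = node.setdefault(values[i], {})      # follow or create the edge
        node.setdefault(values[k - 1], []).append(sub_id)
    return root

# Assumed example: two attributes (issuer, price).
subs = {
    "s1": ("IBM", 120),
    "s2": ("IBM", STAR),
    "s3": (STAR, 120),
}
tree = pre_process(subs, k=2)
```

Subscriptions that share a prefix of required values share the corresponding path, which is what later lets a single traversal serve many subscriptions at once.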
12. Tree Matching Algorithm Each subscription is a conjunction of elementary predicates
An elementary predicate represents one possible result of an elementary test
An elementary test is a simple operation on one or more attributes of an event e
13. Tree Matching Algorithm Example of tree matching
14. Tree Matching Algorithm Example of tree matching with *-edge
*-edge: subscriptions through the edge do not care about the result of a test
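Since the example figures are not included, the traversal with *-edges can be sketched on a small hand-built tree. The tree layout (nested dicts with lists of subscription ids at the leaves) and the function name match are assumptions made for this illustration.

```python
def match(node, event, depth=0):
    """Return ids of all subscriptions reachable by edges consistent with the event.

    At each node, follow the edge labelled with the event's value for this
    attribute (if present) and also the "*" edge (if present).
    """
    if isinstance(node, list):          # leaf: subscriptions matched along this path
        return list(node)
    result = []
    value = event[depth]
    if value in node:
        result += match(node[value], event, depth + 1)
    if "*" in node:
        result += match(node["*"], event, depth + 1)
    return result

# Hand-built tree over two attributes (issuer, price); "*" marks a don't-care edge.
tree = {
    "IBM": {120: ["s1"], "*": ["s2"]},
    "*": {120: ["s3"]},
}
hits = match(tree, ("IBM", 120))
```

An event may therefore follow more than one path: the value edge for subscriptions that test the attribute and the *-edge for those that ignore it.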
15. General Matching Algorithm Matching algorithm in pseudocode
16. Matching Equality Tests This variant is specialized for subscriptions that only compare event attributes against values for equality
All nodes at the same depth i in the tree test the same attribute i
17. Matching Equality Tests Measuring the complexity:
Pre-processing time complexity: O(NK), K is a constant -> linear
Space complexity: O(NK), K is a constant -> linear
Matching time complexity: O(K N^(1-lambda)) -> sub-linear
19. Referred Publish/Subscribe Model There exists a publisher with various topics/events
Subscribers can be bound to 3 different states:
1 indicates interest
0 indicates disinterest
* indicates don't care
Each predicate of a subscription (called an attribute in the previous section) takes one of these three states; together they describe the event content a subscriber is interested in
20. Matching problem Given a set of subscriptions S
over k predicates p1, p2, ..., pk and an event e
The matching problem is to identify all subscriptions in S that e matches
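A brute-force statement of the matching problem can be sketched as below. It assumes the usual reading of the three states: an event, given as a string of 0s and 1s, matches a subscription if every predicate is either "*" or equal to the event's value for that predicate. The function names are hypothetical.

```python
def matches(subscription, event):
    """True if each predicate state is '*' or agrees with the event's bit."""
    return all(s == "*" or s == e for s, e in zip(subscription, event))

def match_all(S, event):
    """Return every subscription in S that the event matches (linear scan)."""
    return [s for s in S if matches(s, event)]

# Assumed example with k = 3 predicates.
S = ["1*0", "0**", "*1*"]
e = "110"
matched = match_all(S, e)
```

This linear scan is the baseline that the faster matching algorithms aim to beat.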
21. RAPID Match Key concept:
In real-world pub/sub applications, many events only have a few relevant properties
Simply put, the 0s far outnumber the 1s among the properties of a typical event
Definition:
c-light event: an event which has at most c 1s in it
exactly c-light event: an event which has exactly c 1s in it
The algorithm exploits the observation that for most events c << k (k = total number of attributes in a subscription); this is modeled with a Zipf distribution
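The two definitions translate directly into code. The sketch below assumes an event is represented as a list of 0/1 values, one per property; the function names are illustrative.

```python
def is_c_light(event, c):
    """An event is c-light if it contains at most c ones."""
    return sum(event) <= c

def is_exactly_c_light(event, c):
    """An event is exactly c-light if it contains exactly c ones."""
    return sum(event) == c

# Assumed example: k = 8 properties, two of them set.
event = [0, 1, 0, 0, 1, 0, 0, 0]
light = is_c_light(event, 3)
exact = is_exactly_c_light(event, 2)
```

Every exactly c-light event is also c-light, but not the other way round.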
22. RAPID Match Light Query Model:
For parameters (c, alpha)
For all t < c, the probability that an event is an exactly-t-light event is proportional to t^(-alpha) for some constant alpha >= 1. For any t, all exactly-t-light events are equally likely to appear
Partitioning the content-space
The k properties are partitioned into t+1 blocks, each of width k/(t+1)
23. RAPID Match Partitioning example
for an exactly-5-light event (t=5)
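The figure for this example is not reproduced. As a sketch, with an assumed k = 12 properties and t = 5, the partition gives 6 blocks of width 2. Because only 5 ones are spread over 6 blocks, at least one block must contain no 1 at all (pigeonhole), which is presumably what makes the partition useful for matching. The value of k, the helper names, and the assumption that k is divisible by t+1 are all illustrative.

```python
def partition(k, t):
    """Split indices 0..k-1 into t+1 consecutive blocks of width k/(t+1).

    Assumes k is divisible by t+1.
    """
    width = k // (t + 1)
    return [range(b * width, (b + 1) * width) for b in range(t + 1)]

def zero_blocks(event, t):
    """Indices of the blocks in which the event has no 1."""
    return [b for b, block in enumerate(partition(len(event), t))
            if all(event[i] == 0 for i in block)]

# Exactly-5-light event over k = 12 properties (ones at positions 0, 3, 6, 8, 9).
event = [1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0]
blocks = [list(b) for b in partition(12, 5)]
empty = zero_blocks(event, 5)
```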
24. RAPID Match Data Structure
25. RAPID Match Matching algorithm
26. Implementation Comparing tree-based vs RAPID Match
Analyzing the output for arbitrarily created events
Improvements?