Management of Uncertainty in Publish/Subscribe Systems
Haifeng Liu
Department of Computer Science
University of Toronto
Publish/Subscribe Model
[Figure: publishers (stock markets: TSX, NYSE, NASDAQ) send publications such as AMGN=58, IBM=84, ORCL=12, JNJ=58, HON=24, INTC=19, MSFT=27 into a broker network. Subscribers register subscriptions (IBM > 85, ORCL < 10, JNJ > 60) with the broker network and receive notifications.]
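As a minimal sketch of the model on this slide, the quotes and subscriptions from the figure can be evaluated with exact (crisp) comparisons. The function name `notifications` and the tuple layout are illustrative assumptions, not part of the original slides. With crisp predicates, none of the quoted prices produce a notification, even though IBM=84 and JNJ=58 come close to their thresholds.

```python
import operator

# Comparison operators used by the example subscriptions.
OPS = {">": operator.gt, "<": operator.lt}

# Values taken from the slide.
publications = {"AMGN": 58, "IBM": 84, "ORCL": 12, "JNJ": 58,
                "HON": 24, "INTC": 19, "MSFT": 27}
subscriptions = [("IBM", ">", 85), ("ORCL", "<", 10), ("JNJ", ">", 60)]


def notifications(pubs, subs):
    """Return (symbol, op, bound, price) for every subscription that a quote satisfies."""
    return [
        (symbol, op, bound, pubs[symbol])
        for symbol, op, bound in subs
        if symbol in pubs and OPS[op](pubs[symbol], bound)
    ]


print(notifications(publications, subscriptions))  # []
```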
Applications Enabled by Publish/Subscribe
• Selective information dissemination
• Information filtering on the Internet
• Location-based services
• Workflow management
• Intra-enterprise process automation
• Logistics and supply chain management
• Enterprise application integration
• Network monitoring and (distributed) system management
Types of Uncertainties
• Lack of information
  • Buy a cheap car
• Imprecision
  • Sensor data: temperature 15~20 ºC
  • Location: (x, y) at time t, (x′, y′) at time t+1
• Semantics
  • Synonyms: vehicle vs. automobile
  • Class taxonomy: CD player vs. electronics
  • Different expression: 5 years of experience vs. graduated in 2001
Problem: manage uncertainties, imprecision and semantics in publish/subscribe systems
Agenda
• Distributed Publish/Subscribe Model and Content-based Routing
• Uncertainties in Publish/Subscribe
• Research Challenges
• Approximate P/S Model
• Graph-structured Model
• Current Status
• Research Plan
Publish/Subscribe Messages
• Advertisement (ad)
  • Publication patterns used by publishers to announce the set of publications they are going to publish
  • E.g. { (stock, any), (price, any) }
• Subscription (sub)
  • User interest specification
  • E.g. (stock = “yahoo”) & (price ≤ $35)
• Publication (pub)
  • Information, data, event
  • E.g. { (stock, “yahoo”), (price, $32.79) }
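The three message types can be sketched as plain data structures. This is an illustration under assumptions: the names `Predicate`, `matches` and `conforms` are hypothetical, a subscription is treated as a conjunction of attribute predicates, and a publication is taken to conform to an advertisement when it only uses advertised attributes.

```python
from dataclasses import dataclass
import operator

OPS = {"=": operator.eq, "<=": operator.le, ">=": operator.ge,
       "<": operator.lt, ">": operator.gt}


@dataclass(frozen=True)
class Predicate:
    attribute: str
    op: str
    value: object


def matches(subscription, publication):
    """A subscription is a conjunction: every predicate must hold on the publication."""
    return all(
        p.attribute in publication and OPS[p.op](publication[p.attribute], p.value)
        for p in subscription
    )


def conforms(advertisement, publication):
    """Assumption: a publication conforms if it only uses advertised attributes."""
    return set(publication) <= set(advertisement)


# The examples from the slide.
ad = {"stock": "any", "price": "any"}
sub = [Predicate("stock", "=", "yahoo"), Predicate("price", "<=", 35.0)]
pub = {"stock": "yahoo", "price": 32.79}

print(conforms(ad, pub), matches(sub, pub))  # True True
```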
Content-based Routing: Advertising
[Figure: an advertisement in the distributed overlay broker network]
*Adopted from SIENA, Gryphon, REBECA and Hermes
Content-based Routing: Subscribing
[Figure: a subscription in the distributed overlay broker network]
*Adopted from SIENA, Gryphon, REBECA and Hermes
Content-based Routing: Publishing
[Figure: a publication in the distributed overlay broker network]
*Adopted from SIENA, Gryphon, REBECA and Hermes
Subscription Forwarding I: Covering optimization
S1: (car = Honda) & (price ≤ $30K)
S2: (car = Honda) & (price ≤ $25K)
S1 covers S2
P: {(car, Honda), (price, $20K)}
[Figure: S1 and S2 in the distributed overlay broker network]
*Adopted from SIENA, Gryphon, REBECA and Hermes
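The covering relation on this slide can be checked mechanically: S1 covers S2 when every publication that satisfies S2 also satisfies S1. The sketch below is limited to the two operators used in the example ("=" and "<="), and the dictionary encoding and function names are assumptions made for illustration.

```python
def implies(narrow, wide):
    """True if every value satisfying predicate `narrow` also satisfies `wide`.

    Both predicates are (op, value) pairs on the same attribute.
    Only "=" and "<=" are handled in this sketch.
    """
    op_n, v_n = narrow
    op_w, v_w = wide
    if op_w == "=":
        return op_n == "=" and v_n == v_w
    if op_w == "<=":
        return op_n in ("=", "<=") and v_n <= v_w
    raise ValueError(f"unsupported operator: {op_w}")


def covers(s1, s2):
    """S1 covers S2 if each constraint of S1 is implied by S2's constraint on that attribute."""
    return all(attr in s2 and implies(s2[attr], s1[attr]) for attr in s1)


s1 = {"car": ("=", "Honda"), "price": ("<=", 30_000)}
s2 = {"car": ("=", "Honda"), "price": ("<=", 25_000)}

print(covers(s1, s2), covers(s2, s1))  # True False
```

Because S1 covers S2, a broker that has already forwarded S1 does not need to forward S2 in the same direction.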
Subscription Forwarding II: Merging optimization
S1: (car = Honda) & (price ≤ $30K)
S2: (car = Toyota) & (price ≤ $25K)
S’: (car = any) & (price ≤ $30K)
P: {(car, Honda), (price, $20K)}
[Figure: S1 and S2 merged into S’ in the distributed overlay broker network]
*Adopted from SIENA, Gryphon, REBECA and Hermes
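A minimal sketch of the merge on this slide, under assumptions: subscriptions are dictionaries of (op, value) pairs, "car = any" is represented by omitting the attribute, and only "=" and "<=" are handled. The merged subscription S’ covers both inputs but can also admit publications that neither original accepts, which the last lines illustrate.

```python
def matches(sub, pub):
    """Conjunctive match for "=" and "<=" predicates."""
    for attr, (op, bound) in sub.items():
        if attr not in pub:
            return False
        if op == "=" and pub[attr] != bound:
            return False
        if op == "<=" and not pub[attr] <= bound:
            return False
    return True


def merge(s1, s2):
    """Build one subscription that covers both s1 and s2."""
    merged = {}
    for attr in s1.keys() & s2.keys():
        (op1, v1), (op2, v2) = s1[attr], s2[attr]
        if op1 == op2 == "<=":
            merged[attr] = ("<=", max(v1, v2))  # keep the looser upper bound
        elif (op1, v1) == (op2, v2):
            merged[attr] = (op1, v1)            # identical constraint survives
        # otherwise the constraint is dropped, i.e. attribute = any
    return merged


s1 = {"car": ("=", "Honda"), "price": ("<=", 30_000)}
s2 = {"car": ("=", "Toyota"), "price": ("<=", 25_000)}
s_merged = merge(s1, s2)

p = {"car": "Honda", "price": 20_000}
extra = {"car": "Toyota", "price": 28_000}  # matches S' but neither S1 nor S2

print(s_merged)                  # {'price': ('<=', 30000)}
print(matches(s_merged, p))      # True
print(matches(s_merged, extra), matches(s1, extra), matches(s2, extra))  # True False False
```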
Publish/Subscribe Router
• Forwarding of advertisements
  • Via flooding
• Forwarding of subscriptions
  • Forward along reverse ad path
  • Matching of ad and sub (intersecting)
  • Optimizations
    • Covering/merging of subs
• Forwarding of publications
  • Forward along reverse sub path
  • Matching of sub and pub
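The three forwarding rules above can be sketched as a toy in-memory broker. This is an illustration under assumptions, not the routing code of any of the cited systems: the `Broker` class, the `intersects` test (an ad intersects a sub when it announces every attribute the sub constrains) and the three-broker line topology are all hypothetical, and covering/merging are left out.

```python
from collections import defaultdict


def intersects(ad_attrs, sub):
    """Assumed intersection test: the ad announces every attribute the sub constrains."""
    return set(sub) <= set(ad_attrs)


def matches(sub, pub):
    """Conjunctive match for "=" and "<=" predicates."""
    for attr, (op, bound) in sub.items():
        if attr not in pub:
            return False
        if op == "=" and pub[attr] != bound:
            return False
        if op == "<=" and not pub[attr] <= bound:
            return False
    return True


class Broker:
    def __init__(self, name):
        self.name = name
        self.neighbors = []
        self.ads = {}                        # ad id -> (attributes, hop it arrived from)
        self.sub_routes = defaultdict(list)  # hop (None = local client) -> subscriptions
        self.delivered = []                  # publications handed to local subscribers

    def advertise(self, ad_id, attrs, came_from=None):
        """Flood the advertisement to all neighbors except the sender."""
        if ad_id in self.ads:
            return
        self.ads[ad_id] = (attrs, came_from)
        for n in self.neighbors:
            if n is not came_from:
                n.advertise(ad_id, attrs, self)

    def subscribe(self, sub, came_from=None):
        """Forward the subscription along the reverse path of intersecting ads."""
        self.sub_routes[came_from].append(sub)
        hops = {hop for attrs, hop in self.ads.values()
                if hop is not None and hop is not came_from and intersects(attrs, sub)}
        for hop in hops:
            hop.subscribe(sub, self)

    def publish(self, pub, came_from=None):
        """Forward the publication along the reverse path of matching subscriptions."""
        for hop, subs in self.sub_routes.items():
            if hop is not None and hop is came_from:
                continue
            if any(matches(s, pub) for s in subs):
                if hop is None:
                    self.delivered.append(pub)
                else:
                    hop.publish(pub, self)


def link(x, y):
    x.neighbors.append(y)
    y.neighbors.append(x)


# Line topology A - B - C: the publisher attaches to A, the subscriber to C.
a, b, c = Broker("A"), Broker("B"), Broker("C")
link(a, b)
link(b, c)

a.advertise("ad1", {"car", "price"})
c.subscribe({"car": ("=", "Honda"), "price": ("<=", 30_000)})
a.publish({"car": "Honda", "price": 20_000})   # reaches C
a.publish({"car": "Honda", "price": 40_000})   # filtered at A

print(c.delivered)  # [{'car': 'Honda', 'price': 20000}]
```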
Uncertainties in Distributed Publish/Subscribe System
• Messages (representation: modeling)
  • Uncertain subscription
  • Uncertain publication
• Relations (computation: matching, covering, merging)
  • Between sub and pub
  • Between sub and sub
• Result (aggregation: ranking)
  • Return top K matches
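Returning the top K matches amounts to ranking matches by their degree of match and keeping the best K. A minimal sketch, assuming each match already carries a score in [0, 1] (the scores and subscription names below are made up for illustration):

```python
import heapq


def top_k(scored_matches, k):
    """Return the k (name, score) pairs with the highest score, best first."""
    return heapq.nlargest(k, scored_matches, key=lambda pair: pair[1])


scored = [("sub1", 0.4), ("sub2", 0.9), ("sub3", 0.7)]
print(top_k(scored, 2))  # [('sub2', 0.9), ('sub3', 0.7)]
```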
Research Challenges
• Develop a publish/subscribe model to express uncertainties/semantics in publications and subscriptions
• Model approximate matching and semantic matching
• Model approximate covering/merging and semantic covering/merging
• Scale to a large number of subscribers and a high publishing rate
Approximate Matching Model
• Model
  • Sub: fuzzy set
  • Pub: possibility distribution
• Matching
  • Possibility measure
  • Necessity measure
• Ranking
  • “min” or “product” for conjunction
  • “max” or “plus” for disjunction
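The slide does not give formulas, so the sketch below uses the usual definitions from possibility theory: for a subscription with membership function μ and a publication with possibility distribution π, the possibility of a match is sup min(μ(x), π(x)) and the necessity is inf max(μ(x), 1 − π(x)). Everything concrete here is an assumption for illustration: the trapezoidal shapes, the "cheap" and "about 12" examples (prices in thousands of dollars), the discretized domain, and reading "plus" as the probabilistic sum a + b − ab.

```python
def trapezoid(a, b, c, d):
    """Membership function: 1 on [b, c], 0 outside (a, d), linear in between."""
    def mu(x):
        if b <= x <= c:
            return 1.0
        if x <= a or x >= d:
            return 0.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)
    return mu


def possibility(mu_sub, pi_pub, domain):
    """Degree to which the publication possibly satisfies the subscription."""
    return max(min(mu_sub(x), pi_pub(x)) for x in domain)


def necessity(mu_sub, pi_pub, domain):
    """Degree to which the publication necessarily satisfies the subscription."""
    return min(max(mu_sub(x), 1.0 - pi_pub(x)) for x in domain)


def conjunction(degrees, mode="min"):
    """Combine per-predicate degrees with "min" or "product"."""
    if mode == "min":
        return min(degrees)
    result = 1.0
    for d in degrees:
        result *= d
    return result


def disjunction(degrees, mode="max"):
    """Combine with "max" or, as an assumed reading of "plus", a + b - a*b."""
    if mode == "max":
        return max(degrees)
    result = 0.0
    for d in degrees:
        result = result + d - result * d
    return result


# Hypothetical example, prices in thousands of dollars.
domain = [i / 10 for i in range(0, 201)]   # 0.0 .. 20.0 in steps of 0.1
cheap = trapezoid(0, 0, 10, 15)            # subscription: "price is cheap"
about_12 = trapezoid(10, 12, 12, 14)       # publication: "price is about 12"

poss = possibility(cheap, about_12, domain)
nec = necessity(cheap, about_12, domain)
print(round(poss, 2), round(nec, 2))
```

The necessity never exceeds the possibility for a normalized publication distribution, so the pair can be read as a lower and an upper bound on the degree of match.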
Graph-structured Model
[Figure: example publication graph and ontology. Labels: PAPER17; AUTHOR, CONFERENCE, YEAR, LOCATION; “Arno Jacobsen”, “California”, “2001”, SIGMOD; Academic Publication, Publication, Jacobsen’s Publications, Proceedings, Report, WWW, VLDB]
• Model
  • Pub: directed graph
  • Sub: directed graph pattern
  • Semantic: ontology
• Matching
  • Pattern graph maps to data graph if the topology (structure) of the two graphs matches and all variable constraints (literal and ontology) are satisfied
• Ranking
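The matching rule can be sketched with a small backtracking matcher over labelled edges. This is an illustration under assumptions rather than the algorithm from the talk: the graph is stored as (subject, edge, object) triples, variables start with "?", the pairing of edges and values in the example graph is guessed from the figure labels, and the class hierarchy (SIGMOD and VLDB under Proceedings, Proceedings under Academic Publication) is hypothetical.

```python
def is_a(ontology, term, cls):
    """True if term equals cls or reaches it by following parent links."""
    seen = set()
    while term is not None and term not in seen:
        if term == cls:
            return True
        seen.add(term)
        term = ontology.get(term)
    return False


def bind(term, value, binding, constraints, ontology):
    """Unify one pattern term with one graph node, updating `binding` in place."""
    if not term.startswith("?"):
        return term == value                 # literal constraint
    if term in binding:
        return binding[term] == value        # variable already bound
    check = constraints.get(term)
    if check is not None and not check(value, ontology):
        return False                         # variable (ontology) constraint failed
    binding[term] = value
    return True


def match(pattern, graph, constraints, ontology, binding=None):
    """Return a variable binding mapping the pattern onto the graph, or None."""
    binding = dict(binding or {})
    if not pattern:
        return binding
    (s, p, o), rest = pattern[0], pattern[1:]
    for gs, gp, go in graph:
        if gp != p:
            continue
        trial = dict(binding)
        if (bind(s, gs, trial, constraints, ontology)
                and bind(o, go, trial, constraints, ontology)):
            result = match(rest, graph, constraints, ontology, trial)
            if result is not None:
                return result
    return None


# Hypothetical publication graph and ontology (child -> parent).
graph = [
    ("PAPER17", "AUTHOR", "Arno Jacobsen"),
    ("PAPER17", "CONFERENCE", "SIGMOD"),
    ("PAPER17", "YEAR", "2001"),
]
ontology = {"SIGMOD": "Proceedings", "VLDB": "Proceedings",
            "Proceedings": "Academic Publication"}

# Subscription: papers by Arno Jacobsen published in any kind of Proceedings.
pattern = [("?paper", "AUTHOR", "Arno Jacobsen"),
           ("?paper", "CONFERENCE", "?venue")]
constraints = {"?venue": lambda v, onto: is_a(onto, v, "Proceedings")}

print(match(pattern, graph, constraints, ontology))
# {'?paper': 'PAPER17', '?venue': 'SIGMOD'}
```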
Current Status
• Work to date
  • Developed an approximate p/s model to express uncertainties and an efficient algorithm for approximate matching
  • Developed covering and merging optimizations for approximate content-based routing
  • Developed a graph-based p/s architecture applied to the dissemination of RDF metadata (including RSS)
  • Developed two novel algorithms (covering and merging) for creating a distributed content-based routing network for graph-structured data
Comments from Previous Meeting
• Probability model
• Qualitative similarity measure
• Validate our results
• Real data set
• Interactive evaluation
Research Plan I
• Membership Function Mining
  • Get a real data set
  • “Learn” the membership function
    • Clustering: K-means, DBSCAN
    • Regression: neural network
• Semantic Matching and Routing Computation
  • Matching on ontology
  • Covering on ontology
  • Merging on ontology
Research Plan II
• Design an experiment to validate the mining results
• Design a method to combine the possibility measure and the necessity measure for ranking
• Push thresholds down the matching plan to increase the efficiency of the matching algorithm
• Use probabilities as an alternative way to model uncertainties and imprecision