Routing of XML and XPath Queries in Data Dissemination Networks
Guoli Li, Shuang Hou, Hans-Arno Jacobsen
Middleware Systems Research Group, University of Toronto
ICDCS 2008 @ Beijing China

Agenda
• Motivation
• Advertisement-based routing
• Covering
• Evaluation
• Conclusions

Motivation
• Data sources: publish XML data
• Data users: register XPath queries
• The data dissemination network: deliver matching results to a large and dynamically changing group of users
[Figure: content-based data dissemination; XML data and queries enter the network, results are returned to users]

Publish/Subscribe
• Matching of XMLs and XPaths [ICDE’06]
• Matching of Advertisements and XPaths
• Exploring relations among XPaths
[Figure: a publisher and two subscribers; labels: Advertisement (DTD), Publication (XML), Subscription (XPath)]

Covering-based Routing
[Figure: network diagram with nodes numbered 1 to 6]

Language Model
• Advertisement: generated from DTDs
  • Non-recursive advertisement
    • e.g., A = /t1/t2/t3…/tn-1/tn
  • Recursive advertisement
    • Simple: A = A1(A2)+A3
    • Series: A = A1(A2)+A3(A4)+A5
    • Embedded: A = A1(A2(A3)+A4)+A5
DTD:
  <?xml encoding="UTF-8"?>
  <!ELEMENT personnel (person)+>
  <!ELEMENT person (name,email*,url*,link?)>
  <!ATTLIST person id ID #REQUIRED>
  <!ELEMENT name ((family,given)|(given,family))>
  <!ELEMENT family (#PCDATA)>
  <!ELEMENT given (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
  <!ELEMENT url EMPTY>
  <!ATTLIST url href CDATA 'http://'>
  <!ELEMENT link EMPTY>
  <!ATTLIST link manager IDREF #IMPLIED>
  … …
Advertisements:
  /personnel/person
  /personnel/person/name
  /personnel/person/name/family
  /personnel/person/name/given
  /personnel/person/email
  /personnel/person/url
  /personnel/person/link

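The step from the DTD to the advertisement list can be illustrated with a small sketch. It assumes the DTD has already been reduced to a parent-to-children map (the `children` dictionary and the function name `enumerate_paths` are hypothetical, not from the talk) and that the DTD is non-recursive; recursive DTDs would need the (A2)+ forms listed on the slide.

```python
def enumerate_paths(children, root):
    """Return every root-to-element path of a non-recursive DTD, in document order."""
    paths = []
    stack = [(root, "/" + root)]
    while stack:
        element, path = stack.pop()
        paths.append(path)
        # Push children in reverse so the first child is visited first.
        for child in reversed(children.get(element, [])):
            stack.append((child, path + "/" + child))
    return paths


# Hand-written child map for the personnel DTD on the slide (an assumption:
# occurrence indicators such as * and ? are ignored here).
children = {
    "personnel": ["person"],
    "person": ["name", "email", "url", "link"],
    "name": ["family", "given"],
}

paths = enumerate_paths(children, "personnel")
```

Dropping the first entry (`/personnel`) leaves the seven paths listed on the slide, in the same order.
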
Language Model
• Subscription: XPaths
  • Absolute
    • e.g., /c/d/*/e
  • Relative
    • e.g., c/d/*/e
  • Descendant operators
    • e.g., c//e/*/c
[Figure: tree diagram with nodes labeled a, b, c, d, e, and *]

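To make the three subscription forms concrete, here is a minimal sketch that splits an XPath expression into (operator, name) steps and reports whether it is absolute. The function name `parse_xpe` and the tuple layout are illustrative choices, and predicates or attributes are out of scope.

```python
import re


def parse_xpe(xpe):
    """Split an XPE into (operator, name) steps; operator is '/' or '//'."""
    absolute = xpe.startswith("/")
    # A relative XPE has no operator before its first step; add '/' so that
    # every step can be read uniformly. The flag records the difference.
    text = xpe if absolute else "/" + xpe
    steps = re.findall(r"(//|/)([^/]+)", text)
    return absolute, steps


absolute_example = parse_xpe("/c/d/*/e")
relative_example = parse_xpe("c/d/*/e")
descendant_example = parse_xpe("c//e/*/c")
```
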
Advertisement-based Routing
[Figure: a broker holding advertisements A1: /a/b/*/e, A2: /b/e, A3: /a/b/d, A4: /a/b/e, … receives a subscription (S); diagrams labeled P(A) and P(S)]

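The idea of forwarding a subscription only toward advertisements it overlaps can be sketched as follows. This is an illustration under simplifying assumptions, not the authors' implementation: XPEs are absolute and free of `//`, the overlap test is a position-wise comparison, and the next-hop names `B1` to `B3` are hypothetical.

```python
def overlaps(adv, sub):
    """Position-wise overlap test for absolute XPEs without '//' (assumption)."""
    a = adv.strip("/").split("/")
    s = sub.strip("/").split("/")
    # The subscription may be shorter than the advertisement, not longer.
    return len(s) <= len(a) and all(x == y or "*" in (x, y) for x, y in zip(a, s))


def route_subscription(adv_table, sub):
    """Return the next hops whose advertisements overlap the subscription."""
    return {hop for adv, hop in adv_table if overlaps(adv, sub)}


# Advertisements A1..A4 from the slide, paired with made-up next hops.
adv_table = [
    ("/a/b/*/e", "B1"),
    ("/b/e", "B2"),
    ("/a/b/d", "B3"),
    ("/a/b/e", "B1"),
]

hops_for_abe = route_subscription(adv_table, "/a/b/e")
hops_for_wild = route_subscription(adv_table, "/a/*/d")
hops_for_none = route_subscription(adv_table, "/c")
```

A subscription that overlaps no advertisement, such as `/c` here, yields an empty set and is not forwarded.
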
Overlapping Algorithms
• Basic case
• Other cases, e.g., S = /a/b//c/*/b//e
A = /a/b/c/*/b/c/*/b/e
S = /a/b/c/*/b/e
[Figure: S aligned against A at successive positions, with a "Next Table"]

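A brute-force version of the advertisement/subscription overlap test, including the `//` operator, might look like the sketch below. It is not the next-table algorithm from the slide; it simply tries every placement with memoization. Assumptions: the advertisement is a non-recursive path, a relative subscription may start at any position (treated like a leading `//`), and a wildcard on either side is compatible with any name.

```python
import re
from functools import lru_cache


def parse(xpe):
    # Assumption: a relative XPE behaves like one with a leading '//'.
    text = xpe if xpe.startswith("/") else "//" + xpe
    return tuple(re.findall(r"(//|/)([^/]+)", text))


def overlaps(adv, sub):
    """True if some path described by `adv` can satisfy subscription `sub`."""
    a = tuple(adv.strip("/").split("/"))
    s = parse(sub)

    @lru_cache(maxsize=None)
    def place(i, j):
        # Place step i of the subscription at position j ('/') or at j or later ('//').
        if i == len(s):
            return True
        op, name = s[i]
        last = len(a) if op == "//" else min(j + 1, len(a))
        return any(
            (name == a[k] or "*" in (name, a[k])) and place(i + 1, k + 1)
            for k in range(j, last)
        )

    return place(0, 0)


A = "/a/b/c/*/b/c/*/b/e"
with_descendants = overlaps(A, "/a/b//c/*/b//e")
as_relative = overlaps(A, "a/b/c/*/b/e")
as_absolute = overlaps(A, "/a/b/c/*/b/e")
```

With the slide's A, the second S overlaps only when read as a relative expression (it aligns at an offset of three steps), which is consistent with the shifted alignments shown in the figure.
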
Subscription Tree
• Subscriptions are maintained in a hierarchical tree
• A child has more than one parent
• Siblings may intersect
• If a publication does not match a node, it does not match any of the descendants
[Figure: subscription tree under ROOT with nodes /a, /*/b, /b, d/a, /a/c, /a/*/d, /a/b, /b/e, /b/d, /a/c/d, /a/b/d, /b/e/c/f, /b/d/a, and a link labeled "pointer"]

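The pruning property in the last bullet can be sketched with a small tree walk. The node layout below uses a subset of the XPEs in the figure, arranged by hand (the exact shape of the authors' tree is not recoverable from the slide), and `matches` handles only absolute XPEs without `//`.

```python
class Node:
    def __init__(self, xpe):
        self.xpe = xpe
        self.children = []

    def add(self, xpe):
        child = Node(xpe)
        self.children.append(child)
        return child


def matches(xpe, path):
    """Absolute XPE without '//' against a publication path (assumption)."""
    steps = xpe.strip("/").split("/")
    elems = path.strip("/").split("/")
    return len(steps) <= len(elems) and all(s in ("*", e) for s, e in zip(steps, elems))


def collect(node, path, matched, visited):
    """Descend only into children that match; non-matching subtrees are skipped."""
    for child in node.children:
        visited.append(child.xpe)
        if matches(child.xpe, path):
            matched.append(child.xpe)
            collect(child, path, matched, visited)


root = Node("ROOT")
a, b = root.add("/a"), root.add("/b")
a.add("/a/c").add("/a/c/d")
a.add("/a/b").add("/a/b/d")
b.add("/b/e").add("/b/e/c/f")
b.add("/b/d").add("/b/d/a")

matched, visited = [], []
collect(root, "/a/b/d", matched, visited)
```

For the publication path `/a/b/d`, only five of the ten nodes are examined, because the subtrees under `/a/c` and `/b` are never entered.
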
Tree Maintenance
• Insert
• Delete

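The slide names only the two operations. As a generic illustration of how a covering tree is usually maintained (not necessarily the authors' procedure), the sketch below inserts a new XPE under the deepest node that covers it and re-parents any siblings the new XPE covers; deletion hands the children back to the parent. The `covers` test is restricted to absolute XPEs without `//`, and all names are hypothetical.

```python
class Node:
    def __init__(self, xpe):
        self.xpe, self.parent, self.children = xpe, None, []


def covers(s1, s2):
    """s1 covers s2 for absolute XPEs without '//' (simplifying assumption)."""
    a, b = s1.strip("/").split("/"), s2.strip("/").split("/")
    return len(a) <= len(b) and all(x in ("*", y) for x, y in zip(a, b))


def insert(root, xpe):
    # Walk down while some child still covers the new XPE.
    parent = root
    while True:
        nxt = next((c for c in parent.children if covers(c.xpe, xpe)), None)
        if nxt is None:
            break
        parent = nxt
    node = Node(xpe)
    # Children of `parent` that the new XPE covers move underneath it.
    for child in [c for c in parent.children if covers(xpe, c.xpe)]:
        parent.children.remove(child)
        child.parent = node
        node.children.append(child)
    node.parent = parent
    parent.children.append(node)
    return node


def delete(node):
    # Reattach the removed node's children to its parent.
    parent = node.parent
    parent.children.remove(node)
    for child in node.children:
        child.parent = parent
        parent.children.append(child)


root = Node("ROOT")
insert(root, "/a/b/d")
insert(root, "/a/c")
a = insert(root, "/a")
ab = insert(root, "/a/b")
```
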
Covering Algorithms
• Similar to Adv-Sub overlapping algorithms
• Absolute simple XPEs
• Relative simple XPEs
• XPEs with // operator
  • e.g., S1 = /*/a//e/c, S2 = /a/a/*//c/e/c/d
[Figure: S1 aligned step by step against S2]

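One way to test covering between two XPEs, including `//`, is to look for a mapping of the steps of S1 onto the steps of S2 that preserves names and operators. The sketch below does this by exhaustive search with memoization. It is a sufficient check only (a mapping proves covering, but its absence does not disprove it when `*` and `//` interact) and is offered as an illustration rather than the algorithm from the talk.

```python
import re
from functools import lru_cache


def parse(xpe):
    # Assumption: a relative XPE behaves like one with a leading '//'.
    text = xpe if xpe.startswith("/") else "//" + xpe
    return tuple(re.findall(r"(//|/)([^/]+)", text))


def covers(s1, s2):
    """True if a step mapping from s1 into s2 exists (sufficient condition)."""
    p, q = parse(s1), parse(s2)

    @lru_cache(maxsize=None)
    def place(i, j):
        if i == len(p):
            return True
        op, name = p[i]
        if op == "/":
            # A child step must land on the very next step of s2,
            # and that step must itself be a child step.
            positions = [j] if j < len(q) and q[j][0] == "/" else []
        else:
            # A descendant step may land on any later step of s2.
            positions = range(j, len(q))
        return any(name in ("*", q[k][1]) and place(i + 1, k + 1) for k in positions)

    return place(0, 0)


slide_example = covers("/*/a//e/c", "/a/a/*//c/e/c/d")
```

For the slide's pair, the mapping sends `*` and `a` to the first two steps of S2 and `//e/c` to its `e/c` steps, so S1 is found to cover S2.
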
Merging Rules
• Rules
  • XPEs with one difference (e.g., element, op)
    • e.g., S1 = /a/*/c/d, S2 = /a/*/c/e, S = /a/*/c/*
  • XPEs with different sub-XPEs
    • e.g., S1 = … … XPE1 … …, S2 = … … XPE2 … …, S = … … // … …
• Merge degree
[Figure: sets P(S), P(S1), and P(S2)]

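The first rule can be sketched for the element case: two absolute XPEs of equal length that differ in exactly one step are merged by replacing that step with a wildcard. The function names are hypothetical, and differences in operators and the sub-XPE rule are not covered.

```python
def merge_one_difference(s1, s2):
    """Merge two absolute XPEs that differ in exactly one element, else return None."""
    a, b = s1.strip("/").split("/"), s2.strip("/").split("/")
    if len(a) != len(b):
        return None
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) != 1:
        return None
    merged = list(a)
    merged[diff[0]] = "*"
    return "/" + "/".join(merged)


def matches(xpe, path):
    """Absolute XPE without '//' against a publication path (assumption)."""
    steps, elems = xpe.strip("/").split("/"), path.strip("/").split("/")
    return len(steps) <= len(elems) and all(s in ("*", e) for s, e in zip(steps, elems))


merged = merge_one_difference("/a/*/c/d", "/a/*/c/e")
```

The merged expression is more general than its inputs: a path such as `/a/x/c/f` matches `/a/*/c/*` but neither original, which is the kind of false positive that merging trades for a smaller routing table.
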
Evaluation
• Setup
  • Implemented in C++
  • Overlay with 127 content-based routers
  • Cluster (each node: 1.86GHz, 4G) vs. PlanetLab
  • Workloads are generated from two DTDs: NITF and PSD
• Metrics
  • Number of subscriptions per router
  • Network traffic
  • XPE processing time
  • Notification delay

Routing Table Size
Network Traffic
Process Time
Notification Delay (PSD)
Notification Delay (NITF)
Related Work
• Locating data sources in large distributed systems [Galanis et al. 2003]
  • DHT based approach
  • Data summary
• Query aggregation for scalable data dissemination [Chan et al. 2002]
  • Equivalence between the original query set and the aggregated set
• ONYX [Diao et al. 2004]
  • Deliver part of the XML documents
  • Share common prefixes among queries using NFA
• XTreeNet [Fenner et al. 2005]
  • Unify the pub/sub model and the query/response model
  • Avoid repeatedly matching at each hop

Conclusions
• Investigate advertisement-based routing for XML data dissemination networks
• Propose a novel data structure to maintain covering & merging relationships among XPEs
• Perform experimental evaluation on a 127-broker overlay to demonstrate the approach
  • Reduce routing table by up to 90%
  • Improve routing latency by roughly 85%
• Future work
  • Extend to tree patterns
  • Share common prefixes among XPEs in overlapping and covering algorithms

Q & A
Thank You!
• Contact
  • gli@cs.toronto.edu
  • jacobsen@eecg.toronto.edu
  • Middleware Systems Research Group, University of Toronto
  • www.msrg.eecg.toronto.edu

Process Time
[Figure: time (ms), 0 to 140, versus number of subscriptions, 500 to 5000]
Notification Delay (PSD)
[Figure: notification delay (ms), 0 to 16, versus number of hops, 2 to 6]
False Positives
Conclusions
• Investigate advertisement-based routing for XML data dissemination networks
• Present algorithms to determine the covering relations among arbitrary XPEs
• Propose a novel data structure to maintain covering & merging relationships among XPEs
• Explore rules to merge similar XPEs in order to further reduce the routing table size
• Perform experimental evaluation on a 127-broker overlay to demonstrate the approach
  • Reduce routing table by up to 90%
  • Improve routing latency by roughly 85%