Routing of XML and XPath Queries in Data Dissemination Networks

Guoli Li, Shuang Hou, Hans-Arno Jacobsen. Middleware Systems Research Group, University of Toronto.


Presentation Transcript


  1. Routing of XML and XPath Queries in Data Dissemination Networks. Guoli Li, Shuang Hou, Hans-Arno Jacobsen. Middleware Systems Research Group, University of Toronto. ICDCS 2008, Beijing, China.

  2. Agenda
     • Motivation
     • Advertisement-based routing
     • Covering
     • Evaluation
     • Conclusions

  3. Motivation
     • Data sources: publish XML data
     • Data users: register XPath queries
     • The data dissemination network: delivers matching results to a large and dynamically changing group of users
     [Figure: Content-based Data Dissemination; labels: XML, Queries, Results]

  4. Publish/Subscribe
     • Matching of XMLs and XPaths [ICDE’06]
     • Matching of Advertisements and XPaths
     • Exploring relations among XPaths
     [Figure labels: Publisher, Subscriber, Advertisement (DTD), Publication (XML), Subscription (XPath)]

  5. Covering-based Routing
     [Figure: diagram with labels 1 to 6]

  6. Language Model
     • Advertisement: generated from DTDs
     • Non-recursive advertisement
       • e.g., A = /t1/t2/t3…/tn-1/tn
     • Recursive advertisement
       • Simple: A = A1(A2)+A3
       • Series: A = A1(A2)+A3(A4)+A5
       • Embedded: A = A1(A2(A3)+A4)+A5
     DTD:
       <?xml encoding="UTF-8"?>
       <!ELEMENT personnel (person)+>
       <!ELEMENT person (name,email*,url*,link?)>
       <!ATTLIST person id ID #REQUIRED>
       <!ELEMENT name ((family,given)|(given,family))>
       <!ELEMENT family (#PCDATA)>
       <!ELEMENT given (#PCDATA)>
       <!ELEMENT email (#PCDATA)>
       <!ELEMENT url EMPTY>
       <!ATTLIST url href CDATA 'http://'>
       <!ELEMENT link EMPTY>
       <!ATTLIST link manager IDREF #IMPLIED>
       … …
     Advertisements:
       /personnel/person
       /personnel/person/name
       /personnel/person/name/family
       /personnel/person/name/given
       /personnel/person/email
       /personnel/person/url
       /personnel/person/link
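As a rough illustration of how non-recursive advertisements could be derived from a DTD, the sketch below walks a simplified parent-to-children map and emits every root-to-element path. The map `PERSONNEL` and the function name `advertisement_paths` are hypothetical and written by hand for this example; cardinality markers and attributes are ignored, and recursive DTDs are rejected rather than turned into the A1(A2)+A3 forms.

```python
def advertisement_paths(children, root):
    """Enumerate root-to-element paths of a non-recursive DTD.

    `children` maps an element name to the child elements its content
    model allows (cardinality and attributes are ignored in this sketch).
    """
    paths = []

    def walk(path):
        paths.append("/" + "/".join(path))
        for child in children.get(path[-1], []):
            if child in path:
                # A repeated element means the DTD is recursive; that case
                # needs the recursive advertisement forms and is out of scope.
                raise ValueError(f"recursive element: {child}")
            walk(path + [child])

    walk([root])
    return paths


# Hand-written summary of the personnel DTD from the slide (an assumption:
# a real system would derive this map by parsing the DTD).
PERSONNEL = {
    "personnel": ["person"],
    "person": ["name", "email", "url", "link"],
    "name": ["family", "given"],
}

if __name__ == "__main__":
    for path in advertisement_paths(PERSONNEL, "personnel"):
        print(path)
```

The sketch also emits the root path `/personnel`, which the slide's list (shown only in part) does not include.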

  7. Language Model
     • Subscription: XPaths
     • Absolute, e.g., /c/d/*/e
     • Relative, e.g., c/d/*/e
     • Descendant operators, e.g., c//e/*/c
     [Figure: tree diagram; node labels c, e, d, b, *, *, e, c, a]
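The three kinds of XPath expression (XPE) above differ only in whether the expression is anchored at the root and in which axis precedes each step. A minimal sketch of a parser that makes this explicit follows; the function name `parse_xpe` and the `(axis, name)` step representation are assumptions for illustration, not the authors' data structures.

```python
import re


def parse_xpe(xpe):
    """Split an XPE into (is_absolute, steps).

    Each step is an (axis, name) tuple where axis is "/" (child) or
    "//" (descendant) and name is an element name or "*".
    """
    xpe = "".join(xpe.split())      # the slides sometimes write "/a /b //c"
    is_absolute = xpe.startswith("/")
    if not is_absolute:
        xpe = "/" + xpe             # give the first step an explicit child axis
    steps = re.findall(r"(//|/)([^/]+)", xpe)
    return is_absolute, steps


if __name__ == "__main__":
    for example in ("/c/d/*/e", "c/d/*/e", "c//e/*/c"):
        print(example, parse_xpe(example))
```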

  8. Advertisement-based Routing
     Example advertisements:
       A1: /a/b/*/e
       A2: /b/e
       A3: /a/b/d
       A4: /a/b/e
       … …
     [Figure labels: Broker, Subscription (S), P(A), P(S)]

  9. Overlapping Algorithms
     • Basic case
     • Other cases
     Example: A = /a/b/c/*/b/c/*/b/e, S = /a/b//c/*/b//e, S = /a/b/c/*/b/e
     [Figure: animation frames aligning S against A, with a "Next Table"]
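One way to picture the advertisement-subscription overlap test is as an alignment problem: each subscription step must be placed on a position of the advertisement path. The sketch below is a simple memoized backtracking search, not the table-driven procedure the slide's "Next Table" hints at. It assumes a non-recursive advertisement, assumes that a subscription may match a prefix or an inner part of the advertisement path, and treats `*` on either side as compatible with any element. The names `overlaps` and `parse_xpe` are hypothetical.

```python
import re
from functools import lru_cache


def parse_xpe(xpe):
    """Return (is_absolute, [(axis, name), ...]) for an XPE string."""
    xpe = "".join(xpe.split())
    is_absolute = xpe.startswith("/")
    if not is_absolute:
        xpe = "/" + xpe
    return is_absolute, re.findall(r"(//|/)([^/]+)", xpe)


def overlaps(advertisement, subscription):
    """True if the subscription can be aligned with the advertisement path."""
    adv = "".join(advertisement.split()).strip("/").split("/")
    absolute, steps = parse_xpe(subscription)
    if not steps:
        return False

    def tag_ok(a, s):
        return a == "*" or s == "*" or a == s

    @lru_cache(maxsize=None)
    def place(i, pos):
        # Try to put subscription step i on advertisement position pos.
        if pos >= len(adv) or not tag_ok(adv[pos], steps[i][1]):
            return False
        if i + 1 == len(steps):
            return True
        if steps[i + 1][0] == "/":          # child axis: next position only
            return place(i + 1, pos + 1)
        # descendant axis: any later position
        return any(place(i + 1, p) for p in range(pos + 1, len(adv)))

    anchored = absolute and steps[0][0] == "/"
    starts = [0] if anchored else range(len(adv))
    return any(place(0, p) for p in starts)


if __name__ == "__main__":
    print(overlaps("/a/b/c/*/b/c/*/b/e", "/a/b//c/*/b//e"))
    print(overlaps("/a/b/d", "/a/b/e"))
```

A broker could use such a test to forward a subscription only toward publishers whose advertisements it overlaps.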

  10. Subscription Tree
      • Subscriptions are maintained in a hierarchical tree
      • A child may have more than one parent
      • Siblings may intersect
      • If a publication does not match a node, it does not match any of the node's descendants
      [Figure: subscription tree under ROOT with nodes /a, /*/b, /b, d/a, /a/c, /a/*/d, /a/b, /b/e, /b/d, /a/c/d, /a/b/d, /b/e/c/f, /b/d/a; additional parent links are drawn as pointers]
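The pruning property in the last bullet can be sketched as a traversal that descends into a node's children only when the node itself matches. The example below is a simplification under stated assumptions: it handles only absolute simple XPEs, treats an XPE as matching a publication path when its steps match a prefix of that path, and uses a hypothetical arrangement of a few XPEs from the slide (the exact edges of the original figure are not recoverable). `Node`, `matches`, and `matching_subscriptions` are illustrative names.

```python
class Node:
    """A subscription-tree node; children are covered by their parent."""

    def __init__(self, xpe):
        self.xpe = xpe
        self.children = []


def matches(xpe, path):
    """Absolute simple XPE vs. publication path, prefix semantics (assumed)."""
    s = xpe.strip("/").split("/")
    p = path.strip("/").split("/")
    return len(s) <= len(p) and all(a in ("*", b) for a, b in zip(s, p))


def matching_subscriptions(root, path):
    """Collect matching XPEs, skipping the subtree of any non-matching node."""
    found, seen = [], set()
    stack = list(root.children)
    while stack:
        node = stack.pop()
        if id(node) in seen:        # a node may be reachable via two parents
            continue
        seen.add(id(node))
        if matches(node.xpe, path):
            found.append(node.xpe)
            stack.extend(node.children)
    return found


def build_example():
    root, a, b = Node("ROOT"), Node("/a"), Node("/b")
    ac, ab, acd, be = Node("/a/c"), Node("/a/b"), Node("/a/c/d"), Node("/b/e")
    root.children = [a, b]
    a.children = [ac, ab]
    ac.children = [acd]
    b.children = [be]
    return root


if __name__ == "__main__":
    print(sorted(matching_subscriptions(build_example(), "/a/c/d/x")))
```

For the path `/a/c/d/x`, the node `/b` fails to match, so `/b/e` is never evaluated.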

  11. Tree Maintenance
      • Insert
      • Delete

  12. Covering Algorithms
      • Similar to Adv-Sub overlapping algorithms
      • Absolute simple XPEs
      • Relative simple XPEs
      • XPEs with // operator, e.g., S1 = /*/a//e/c, S2 = /a/a/*//c/e/c/d
      [Figure: animation frames aligning S1 against S2]
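For the simplest case, absolute simple XPEs, the covering test reduces to a position-by-position comparison. The sketch below assumes prefix-matching semantics (an XPE matches any path whose leading steps it matches), under which S1 covers S2 when S1 is no longer than S2 and every step of S1 is either `*` or equal to the corresponding step of S2. It does not handle relative XPEs or the `//` operator, and `covers` is a hypothetical name.

```python
def covers(s1, s2):
    """True if absolute simple XPE s1 covers absolute simple XPE s2.

    Assumes prefix-matching semantics; relative XPEs and the //
    operator are not supported in this sketch.
    """
    for xpe in (s1, s2):
        if not xpe.startswith("/") or "//" in xpe:
            raise ValueError(f"not an absolute simple XPE: {xpe}")
    a = s1.strip("/").split("/")
    b = s2.strip("/").split("/")
    if len(a) > len(b):
        return False
    # A wildcard in s1 covers anything; a concrete name covers only itself.
    return all(x == "*" or x == y for x, y in zip(a, b))


if __name__ == "__main__":
    print(covers("/a/*", "/a/b/c"))
    print(covers("/a/b", "/a/*"))
```

When S1 covers S2, a broker that has already forwarded S1 does not need to forward S2, which is what keeps routing tables small.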

  13. Merging Rules
      • Rules
        • XPEs with one difference (e.g., element, op)
          • e.g., S1 = /a/*/c/d, S2 = /a/*/c/e, merged S = /a/*/c/*
        • XPEs with different sub-XPEs
          • e.g., S1 = … … XPE1 … …, S2 = … … XPE2 … …, merged S = … … // … …
      • Merge degree
      [Figure labels: P(S), P(S1), P(S2)]
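The first merging rule can be sketched for the element case: two XPEs of equal length that differ in exactly one element are replaced by a single XPE with a wildcard at that position. The function below is an illustration limited to XPEs that use only the child axis; the name `merge_one_difference` is hypothetical, and the operator case and the sub-XPE rule are not covered.

```python
def merge_one_difference(s1, s2):
    """Merge two XPEs that differ in exactly one element, else return None.

    Only the child axis is supported in this sketch. Identical XPEs are
    returned unchanged.
    """
    if "//" in s1 or "//" in s2:
        return None
    if s1.startswith("/") != s2.startswith("/"):
        return None
    a = s1.strip("/").split("/")
    b = s2.strip("/").split("/")
    if len(a) != len(b):
        return None
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) > 1:
        return None
    merged = list(a)
    if diff:
        merged[diff[0]] = "*"       # generalize the single differing element
    prefix = "/" if s1.startswith("/") else ""
    return prefix + "/".join(merged)


if __name__ == "__main__":
    print(merge_one_difference("/a/*/c/d", "/a/*/c/e"))
```

The merged XPE matches everything S1 and S2 match but may also match paths that neither does, so merging trades routing table size against possible false positives.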

  14. Evaluation
      • Setup
        • Implemented in C++
        • Overlay with 127 content-based routers
        • Cluster (each node: 1.86 GHz, 4 GB) vs. PlanetLab
        • Workloads are generated from two DTDs: NITF and PSD
      • Metrics
        • Number of subscriptions per router
        • Network traffic
        • XPE processing time
        • Notification delay

  15. Routing Table Size

  16. Routing Table Size

  17. Network Traffic

  18. Process Time

  19. Notification Delay (PSD)

  20. Notification Delay (NITF)

  21. Related Work
      • Locating data sources in large distributed systems [Galanis et al. 2003]
        • DHT-based approach
        • Data summary
      • Query aggregation for scalable data dissemination [Chan et al. 2002]
        • Equivalence between the original query set and the aggregated set
      • ONYX [Diao et al. 2004]
        • Deliver part of the XML documents
        • Share common prefixes among queries using NFA
      • XTreeNet [Fenner et al. 2005]
        • Unify the pub/sub model and the query/response model
        • Avoid repeatedly matching at each hop

  22. Conclusions
      • Investigate advertisement-based routing for XML data dissemination networks
      • Propose a novel data structure to maintain covering & merging relationships among XPEs
      • Perform experimental evaluation on a 127-broker overlay to demonstrate the approach
        • Reduce routing table by up to 90%
        • Improve routing latency by roughly 85%
      • Future work
        • Extend to tree patterns
        • Share common prefixes among XPEs in overlapping and covering algorithms

  23. Q & A
      Thank You!
      • Contact
        • gli@cs.toronto.edu
        • jacobsen@eecg.toronto.edu
        • Middleware Systems Research Group, University of Toronto
        • www.msrg.eecg.toronto.edu

  24. Process Time
      [Plot: time (ms), axis 0 to 140, vs. number of subscriptions, axis 500 to 5000]

  25. Notification Delay (NITF)

  26. Notification Delay (PSD)
      [Plot: notification delay (ms), axis 0 to 16, vs. number of hops, axis 2 to 6]

  27. False Positives

  28. Conclusions
      • Investigate advertisement-based routing for XML data dissemination networks
      • Present algorithms to determine the covering relations among arbitrary XPEs
      • Propose a novel data structure to maintain covering & merging relationships among XPEs
      • Explore rules to merge similar XPEs in order to further reduce the routing table size
      • Perform experimental evaluation on a 127-broker overlay to demonstrate the approach
        • Reduce routing table by up to 90%
        • Improve routing latency by roughly 85%
