This publication discusses the challenges of XML dissemination and proposes an automata-based approach for efficient data dissemination on structured overlay networks.
XML Data Dissemination using Automata on top of Structured Overlay Networks. Iris Miliaraki, Zoi Kaoudi, Manolis Koubarakis. Department of Informatics and Telecommunications, National and Kapodistrian University of Athens
Outline • XML Dissemination scenario • Problems • Background: DHTs • Our approach • Experiments • Future work
Dissemination scenario (figure): publishers (e.g. publication monitoring, news monitoring) produce XML documents, and subscribers register XPath/XQuery queries with an XML dissemination system. Centralized systems: YFilter, XTrie, Index-Filter, FiST, XPush. Distributed (parallel/hierarchical) systems: ONYX, Snoeren et al. [SOSP 2001], Gong et al. [ICDE05].
Dissemination: Broker-based architecture • Systems like ONYX and the work of Gong et al. [ICDE05] • Mesh or tree-based overlays (figure: publishers send XML documents through a network of brokers to subscribers holding XPath/XQuery queries)
Problems • Load imbalances • Centralized control: single point of failure and bottleneck • Scalability (size of routing tables)
Background: DHTs • Structured overlay networks • Solve the item location problem in a distributed and dynamic network of N nodes in O(log N) hops: given some data item x, find x • Distributed version of the hash table data structure • id = Hash(K) • Main operations: • Put: given a key (for a data item), map the key onto a node • Get: find the location of the data item with a given key • The successor peer of id = Hash(K) is the peer responsible for K
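The O(log N) bound comes from Chord's finger tables (Chord is the overlay used in the experiments): peer n keeps pointers to successor(n + 2^i) and forwards a request to the farthest pointer that still precedes the target id, roughly halving the remaining distance per hop. The sketch below is a minimal, static illustration only, assuming SHA-1 truncated to a 16-bit ring (`M` is an illustrative parameter), no joins, departures or stabilization, and hypothetical names (`ChordRing`, `lookup`, `put`, `get`).

```python
import hashlib

M = 16            # identifier bits (assumption; Chord itself uses 160-bit SHA-1 ids)
RING = 2 ** M

def chord_id(key: str) -> int:
    """id = Hash(K): place a key or a peer name on the identifier ring."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest(), "big") % RING

def in_open(x, a, b):
    """x lies in the ring interval (a, b)."""
    return a < x < b if a < b else (x > a or x < b)

def in_half_open(x, a, b):
    """x lies in the ring interval (a, b]."""
    return a < x <= b if a < b else (x > a or x <= b)

class ChordRing:
    """Static Chord ring with finger tables (hypothetical helper, no churn handling)."""

    def __init__(self, peer_ids):
        self.ids = sorted(set(peer_ids))
        self.pred = {n: self.ids[i - 1] for i, n in enumerate(self.ids)}
        # finger[n][i] = successor(n + 2^i); finger[n][0] is n's immediate successor
        self.finger = {n: [self.successor((n + 2 ** i) % RING) for i in range(M)]
                       for n in self.ids}
        self.store = {n: {} for n in self.ids}

    def successor(self, ident):
        """Responsible peer: first peer id >= ident, wrapping around the ring."""
        return next((n for n in self.ids if n >= ident), self.ids[0])

    def lookup(self, start, ident):
        """Route from `start` to successor(ident) using fingers; returns (peer, hops)."""
        node, hops = start, 0
        while not in_half_open(ident, self.pred[node], node):
            succ = self.finger[node][0]
            if in_half_open(ident, node, succ):
                return succ, hops + 1
            # jump to the farthest finger that still precedes ident
            node = next((f for f in reversed(self.finger[node])
                         if in_open(f, node, ident)), succ)
            hops += 1
        return node, hops

    def put(self, start, key, value):
        peer, hops = self.lookup(start, chord_id(key))
        self.store[peer][key] = value
        return peer, hops

    def get(self, start, key):
        peer, hops = self.lookup(start, chord_id(key))
        return self.store[peer].get(key), hops

peers = [chord_id(f"P{i}") for i in range(1, 101)]   # 100 hypothetical peers
ring = ChordRing(peers)
ring.put(ring.ids[0], "Q1", "/dblp/phdthesis/year")   # Put issued by one peer...
value, hops = ring.get(ring.ids[42], "Q1")           # ...Get issued by another
print(value, hops)
```

Every Get for the same key ends at the same successor peer, whichever peer issues it, which is what lets a DHT replace a centralized index.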
Dissemination revisited: Structured overlay network architecture (figure: publishers and subscribers holding XPath/XQuery queries, connected through a structured overlay network)
Problems revisited • Load imbalances • Centralized control: single point of failure and bottleneck • Scalability (size of routing tables)
Automata-based approaches • XFilter and YFilter, ONYX, XTrie, Index-Filter, FiST, etc. • Main idea: • Construct an automaton from a set of XPath/XQuery queries • Use it as a matching engine against the incoming XML documents
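YFilter – NFA Construction (figure: a single NFA shared by all queries) • Q1: /dblp/phdthesis/year = ‘2008’ • Q2: /dblp/proceedings/school = ‘Univ. of Athens’ • Q3: /dblp/proceedings/title = ‘XML Dissemination’ • Q4: /dblp/*/author = ‘John Doe’ • Q5: //*/cite = [12743] • Transitions: (0, dblp) → 1; (1, phdthesis) → 2; (2, year) → 3, accepting Q1; (1, proceedings) → 4; (4, school) → 5, accepting Q2; (4, title) → 6, accepting Q3; (1, *) → 7; (7, author) → 8, accepting Q4; (0, ε) → 9, with a self-loop (9, *) → 9; (9, *) → 10; (10, cite) → 11, accepting Q5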
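The construction shares common prefixes: a query reuses an existing transition when one exists and only creates new states for its unshared suffix, while a "//" step becomes an ε-transition into a state that loops on `*`. The following Python sketch is a minimal illustration under stated assumptions: it handles only linear paths of child (`/`) and descendant (`//`) steps, omits the value predicates, and its names (`SharedNFA`, `add_query`) are hypothetical, not YFilter's code.

```python
from dataclasses import dataclass, field

EPS = "ε"  # label of the ε-transition that encodes "//"

@dataclass
class SharedNFA:
    """YFilter-style NFA sketch: queries with a common prefix share its states."""
    transitions: dict = field(default_factory=lambda: {0: {}})  # state -> {label: next}
    self_loops: set = field(default_factory=set)                # states looping on "*"
    accepting: dict = field(default_factory=dict)               # state -> [query ids]

    def _follow(self, state, label):
        """Reuse an existing transition if present, otherwise create a fresh state."""
        if label not in self.transitions[state]:
            new_state = len(self.transitions)
            self.transitions[state][label] = new_state
            self.transitions[new_state] = {}
        return self.transitions[state][label]

    def add_query(self, qid, path):
        """Insert a linear path such as /dblp/*/author or //*/cite (no predicates)."""
        state = 0
        # "//*/cite".split("/")[1:] -> ["", "*", "cite"]; an empty step marks "//"
        for step in path.split("/")[1:]:
            if step == "":
                state = self._follow(state, EPS)
                self.self_loops.add(state)   # stays active on any element
            else:
                state = self._follow(state, step)
        self.accepting.setdefault(state, []).append(qid)

nfa = SharedNFA()
for qid, path in [("Q1", "/dblp/phdthesis/year"), ("Q2", "/dblp/proceedings/school"),
                  ("Q3", "/dblp/proceedings/title"), ("Q4", "/dblp/*/author"),
                  ("Q5", "//*/cite")]:
    nfa.add_query(qid, path)
print(nfa.transitions)
print(nfa.accepting)
```

Inserting Q1 to Q5 in this order reproduces the state numbering 0 to 11 of the figure; Q2 and Q3, for example, share states 0, 1 and 4.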
Main idea • Use a distributed version of YFilter, a state-of-the-art approach • Instead of a centralized NFA, distribute the NFA over the DHT
Distributing the NFA on top of the DHT (figure: NFA states spread over peers P1 to P10 of the overlay; annotations ℓ=0 and ℓ=1)
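One way to picture the distribution is to give every NFA state a key and store the state at successor(Hash(key)), exactly like any other DHT item. The sketch below rests on assumptions not stated on the slides: each state is keyed by the label path that reaches it (`STATE_KEYS` is a hypothetical naming scheme, not necessarily the one the protocol uses), peers are named P1 to P10 as in the figure, and identifiers are SHA-1 hashes truncated to 16 bits.

```python
import hashlib
from bisect import bisect_left
from collections import defaultdict

M = 16                      # identifier bits (assumption for the sketch)
RING = 2 ** M

def ring_id(key: str) -> int:
    """id = Hash(K), truncated to the M-bit identifier space."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest(), "big") % RING

def responsible_peer(sorted_ids, key):
    """successor(Hash(key)): first peer id >= the key id, wrapping past zero."""
    i = bisect_left(sorted_ids, ring_id(key))
    return sorted_ids[i % len(sorted_ids)]

# Hypothetical naming scheme: each NFA state is keyed by the label path reaching it.
STATE_KEYS = {
    0: "/", 1: "/dblp", 2: "/dblp/phdthesis", 3: "/dblp/phdthesis/year",
    4: "/dblp/proceedings", 5: "/dblp/proceedings/school",
    6: "/dblp/proceedings/title", 7: "/dblp/*", 8: "/dblp/*/author",
    9: "//", 10: "//*", 11: "//*/cite",
}

peer_ids = {ring_id(f"P{i}"): f"P{i}" for i in range(1, 11)}  # ten peers, as in the figure
sorted_ids = sorted(peer_ids)

placement = defaultdict(list)   # peer name -> NFA states it is responsible for
for state, key in STATE_KEYS.items():
    placement[peer_ids[responsible_peer(sorted_ids, key)]].append(state)

for peer, states in sorted(placement.items()):
    print(peer, states)
```

Because placement depends only on the key, any peer can compute where the next state of a run lives without a central directory.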
YFilter – NFA Execution • Incoming XML document: <dblp> <proceedings> <school> Univ. of Athens </school> <title> XML and DHTs </title> </proceedings> </dblp> • The runtime stack holds one set of active states per open element: start of document {0}; <dblp> {1, 9, 10}; <proceedings> {4, 7, 9, 10}; <school> {5, 9, 10}, popped at </school>; <title> {6, 9, 10}; the stack is unwound by the end of the document • These paths can be executed in parallel!
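The runtime stack can be reproduced with a small event-driven sketch: on every start tag, each active state is advanced independently (which is why the paths can run in parallel) and the new set is pushed; on every end tag the top set is popped. This is a hypothetical Python illustration, not YFilter's code: it hard-codes the Q1 to Q5 NFA with the figure's numbering, ignores value predicates, and computes the start set as the ε-closure {0, 9}, where the slide shows only state 0.

```python
import xml.etree.ElementTree as ET

# NFA for Q1-Q5 as numbered on the construction slide (predicates omitted).
TRANSITIONS = {
    0: {"dblp": 1, "ε": 9},
    1: {"phdthesis": 2, "proceedings": 4, "*": 7},
    2: {"year": 3},
    4: {"school": 5, "title": 6},
    7: {"author": 8},
    9: {"*": 10},
    10: {"cite": 11},
}
SELF_LOOPS = {9}                                   # "//" state loops on any element
ACCEPTING = {3: "Q1", 5: "Q2", 6: "Q3", 8: "Q4", 11: "Q5"}

def eps_closure(states):
    """Add every state reachable through ε-transitions."""
    result, todo = set(states), list(states)
    while todo:
        nxt = TRANSITIONS.get(todo.pop(), {}).get("ε")
        if nxt is not None and nxt not in result:
            result.add(nxt)
            todo.append(nxt)
    return result

def step(states, tag):
    """Active states after a start tag; every state is advanced independently."""
    nxt = set()
    for s in states:
        out = TRANSITIONS.get(s, {})
        nxt.update(out[label] for label in (tag, "*") if label in out)
        if s in SELF_LOOPS:
            nxt.add(s)
    return eps_closure(nxt)

def run(document):
    """Drive the NFA with start/end events, one state set per open element."""
    stack = [eps_closure({0})]
    trace, matches = [], []
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed(document)
    parser.close()
    for event, elem in parser.read_events():
        if event == "start":
            stack.append(step(stack[-1], elem.tag))     # push on start tag
            trace.append((elem.tag, sorted(stack[-1])))
            matches += [ACCEPTING[s] for s in sorted(stack[-1]) if s in ACCEPTING]
        else:
            stack.pop()                                  # pop on end tag
    return trace, matches

doc = ("<dblp><proceedings><school>Univ. of Athens</school>"
       "<title>XML and DHTs</title></proceedings></dblp>")
trace, matches = run(doc)
for tag, states in trace:
    print(tag, states)
print(matches)
```

The structural matches are Q2 and Q3; since predicates are not evaluated here, Q3 is reported even though ‘XML and DHTs’ would fail its value predicate ‘XML Dissemination’.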
Distributed NFA execution – Iterative (figure: the same incoming XML document is processed over peers P1 to P10; the publisher itself contacts the peers responsible for each set of active states, {0}, {1, 9, 10}, {4, 7, 9, 10}, {5, 9, 10} and {6, 9, 10}, from the start to the end of the document) • The publisher becomes overloaded!
Distributed NFA execution – Recursive (figure: the publisher hands the same XML document only to the peer responsible for the start state 0; from then on, each peer processes the next element and forwards the document, together with the stack of states visited so far such as 1 0 or 4 1 0, directly to the peers responsible for the next states, until the end of the document)
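The difference between the two strategies can be made concrete with a toy cost model. The sketch below is an assumption-laden illustration, not the protocol's measured cost: it follows one root-to-leaf element path, places every NFA state on a different peer, charges one request and one reply per active state per level in the iterative mode, and one forwarded message per transition in the recursive mode. All function names are hypothetical.

```python
# Q1-Q5 NFA from the construction slide; START is the ε-closure {0, 9} of state 0.
TRANSITIONS = {
    0: {"dblp": 1}, 1: {"phdthesis": 2, "proceedings": 4, "*": 7}, 2: {"year": 3},
    4: {"school": 5, "title": 6}, 7: {"author": 8}, 9: {"*": 10}, 10: {"cite": 11},
}
SELF_LOOPS = {9}
START = {0, 9}

def successors(state, tag):
    """Next states of one state on one start tag (child, wildcard, self-loop)."""
    out = TRANSITIONS.get(state, {})
    nxt = {out[label] for label in (tag, "*") if label in out}
    return (nxt | {state}) if state in SELF_LOOPS else nxt

def frontiers(path):
    """Active state sets F_0 .. F_n along one root-to-leaf element path."""
    sets = [set(START)]
    for tag in path:
        sets.append({t for s in sets[-1] for t in successors(s, tag)})
    return sets

def iterative_cost(path):
    """Publisher asks the peer of every active state and waits for its reply."""
    fs = frontiers(path)
    received = sum(len(f) for f in fs)          # one reply per active state per level
    return {"total": 2 * received, "at_publisher": received,
            "hops": 2 * len(fs)}                 # one round trip per level

def recursive_cost(path):
    """Publisher contacts the start peers once; peers then forward directly."""
    fs = frontiers(path)
    forwards = sum(len(successors(s, tag)) for f, tag in zip(fs, path) for s in f)
    return {"total": len(fs[0]) + forwards, "at_publisher": 0, "hops": len(fs)}

path = ["dblp", "proceedings", "school"]
print("iterative:", iterative_cost(path))
print("recursive:", recursive_cost(path))
```

Under these assumptions every reply in the iterative mode lands at the publisher, while the recursive mode leaves the publisher idle after its first messages and halves the chain of hops.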
Experimental evaluation • Chord simulator • Two document workloads: Aggregated (including DBLP, NITF, ebXML, Auction from XMark) and NITF • Two kinds of query sets: Random and Distinct
Metrics • Network traffic: total number of messages • Latency: longest chain of hops • Filtering load: number of messages received during execution
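The three metrics can all be read off a message log if every message records which earlier message triggered it. The sketch below assumes such a log format (`Message` with a `cause` field is hypothetical, as is the small log used to exercise it): traffic is the number of messages, latency is the longest causal chain, and filtering load is the count of messages each peer receives.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass
class Message:
    msg_id: int
    sender: str
    receiver: str
    cause: Optional[int] = None   # msg_id of the message whose processing triggered this one

def hop_chain(msg, by_id):
    """Length of the causal chain of messages ending at msg."""
    length = 1
    while msg.cause is not None:
        msg = by_id[msg.cause]
        length += 1
    return length

def metrics(log):
    by_id = {m.msg_id: m for m in log}
    return {
        "network_traffic": len(log),                               # total messages
        "latency": max(hop_chain(m, by_id) for m in log),          # longest hop chain
        "filtering_load": Counter(m.receiver for m in log),        # messages per peer
    }

# Hypothetical log of a recursive execution (peer names are illustrative only).
log = [
    Message(1, "publisher", "P1"),
    Message(2, "P1", "P4", cause=1),
    Message(3, "P4", "P7", cause=2),
    Message(4, "P4", "P2", cause=2),
    Message(5, "P7", "P2", cause=3),
]
result = metrics(log)
print(result)
```

The peer with the largest filtering load is the natural candidate for the load-balancing techniques that follow.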
Load balancing • Virtual peers: originally proposed in Chord; multiple virtual peers are mapped to each real peer • Load shedding: replicate on demand
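With virtual peers, each real peer joins the ring under several identifiers, so its share of the key space is the sum of several small arcs instead of one arc of random size, which typically evens out the number of keys (here, NFA states) per real peer. The sketch below is a hypothetical illustration: the `"{peer}#{i}"` naming of virtual ids, the 32-bit ring, and the synthetic keys are assumptions, and load shedding by replication is not shown.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

RING = 2 ** 32   # identifier space for the sketch (assumption)

def ring_id(key: str) -> int:
    return int.from_bytes(hashlib.sha1(key.encode()).digest(), "big") % RING

def build_ring(real_peers, virtual_per_peer):
    """Every real peer joins under several ids; returns sorted (id, real peer) pairs."""
    return sorted((ring_id(f"{peer}#{i}"), peer)
                  for peer in real_peers for i in range(virtual_per_peer))

def owner(ring, ids, key):
    """Real peer hosting the virtual peer that is successor(Hash(key))."""
    return ring[bisect_left(ids, ring_id(key)) % len(ring)][1]

def load(real_peers, virtual_per_peer, keys):
    """Number of keys each real peer ends up responsible for."""
    ring = build_ring(real_peers, virtual_per_peer)
    ids = [vid for vid, _ in ring]
    return Counter(owner(ring, ids, key) for key in keys)

peers = [f"P{i}" for i in range(1, 11)]
keys = [f"state-{j}" for j in range(5000)]   # hypothetical NFA state keys
for v in (1, 16):
    counts = load(peers, v, keys)
    print(v, "virtual peer(s) per real peer: max load", max(counts.values()),
          "vs. mean", len(keys) // len(peers))
```

Comparing the maximum load for one versus sixteen virtual peers per real peer gives a quick feel for the effect; the Chord design suggests on the order of log N virtual peers per real peer.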
Conclusions • DHT-based protocols that overcome the weaknesses of broker-based architectures • Utilize a distributed YFilter engine • Exploit the inherent parallelism of an automaton • Experimental evaluation
Future Work • Implementation and experiments on an Internet-scale testbed such as PlanetLab • More sophisticated methods for predicate evaluation
Thank you for your attention. Questions?