CMS-ToPSS: Efficient Dissemination of RSS Documents
Milenko Petrovic, Haifeng Liu, Hans-Arno Jacobsen
University of Toronto
VLDB 2005
Information Dissemination
• Easy-to-use web publishing tools (blogs, wikis) are fueling the increase in the number of web publishers
• RSS is frequently used to disseminate updates to interested users
  • CNN.com, Yahoo! News, Amazon.com, MSN search (beta)
Problem: Polling-based architecture
[Figure: RSS publishers, RSS aggregator, RSS readers]
VLDB05
Solution!
[Figure: current RSS dissemination architecture compared with the G-ToPSS RSS dissemination architecture]
Interaction Model: Publish/Subscribe
[Figure: publishers, broker, and subscribers; flows labeled "RSS feeds", "Queries over all RSS", and "Matching RSS feeds"]
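The publish/subscribe interaction on this slide can be sketched in a few lines. This is a toy illustration only, not the G-ToPSS broker: the `Broker` class, the dict-based feed items, the predicate-style queries, and the example URLs are all assumptions made for the example, and the linear scan over subscriptions is exactly what an efficient matching algorithm is meant to avoid.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration: a feed item is a plain dict,
# and a subscription query is a predicate over feed items.
FeedItem = Dict[str, str]
Query = Callable[[FeedItem], bool]
Deliver = Callable[[FeedItem], None]


class Broker:
    """Toy broker: stores subscriber queries and pushes matching items."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[Query, Deliver]] = []

    def subscribe(self, query: Query, deliver: Deliver) -> None:
        # Subscribers register a query once instead of polling publishers.
        self._subscriptions.append((query, deliver))

    def publish(self, item: FeedItem) -> int:
        # Naive linear scan over all subscriptions (assumption for brevity).
        delivered = 0
        for query, deliver in self._subscriptions:
            if query(item):
                deliver(item)
                delivered += 1
        return delivered


inbox: List[FeedItem] = []
broker = Broker()
broker.subscribe(lambda item: "VLDB" in item["title"], inbox.append)
broker.publish({"title": "VLDB 2005 program announced", "link": "http://example.org/1"})
broker.publish({"title": "Weather update", "link": "http://example.org/2"})
```

After the two `publish` calls, only the first item has been pushed to the subscriber's `inbox`; the subscriber never polls.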
Research challenges
• Need a subscription (query) language suitable for filtering RSS documents
• Need an efficient matching algorithm based on a graph representation
  • Structural matching
  • Constraint matching
• Scalability to a large number of subscriptions and a high publishing rate
Subscription Scalability
Memory Scalability
Matching Semantics
• Publication (data graph): node PAPER17; edges AUTHOR → "Arno Jacobsen", CONFERENCE → SIGMOD, YEAR → "2001", LOCATION → "California"
• Subscription (graph pattern): node ?y (?y <= Publication); edges AUTHOR → "Arno Jacobsen", CONFERENCE → SIGMOD, YEAR → ?z (?z > 2000)
Data Model (RSS Documents)
• Publications are represented as directed graphs with node and edge labels
• Node labels are typed
  • Literal value
  • Class
• Edge labels are typed
  • Class
• Classes can be related using a multiple-inheritance ontology
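One way to picture this data model is the sketch below. It is an illustration under assumptions, not the G-ToPSS implementation: the `Ontology`, `Node`, and `Edge` names, the convention that `cls=None` marks a literal, and the class names `PeerReviewed` and `ConferencePaper` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Ontology:
    """Class hierarchy in which a class may have several parents."""
    parents: Dict[str, Set[str]] = field(default_factory=dict)

    def add_class(self, name: str, *parents: str) -> None:
        self.parents.setdefault(name, set()).update(parents)
        for parent in parents:
            self.parents.setdefault(parent, set())

    def is_a(self, cls: str, ancestor: str) -> bool:
        """True if cls equals ancestor or inherits from it along any path."""
        stack, seen = [cls], set()
        while stack:
            current = stack.pop()
            if current == ancestor:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents.get(current, ()))
        return False


@dataclass(frozen=True)
class Node:
    label: str
    cls: Optional[str] = None  # assumption: None marks a literal value


# An edge is (source node, edge class, target node).
Edge = Tuple[Node, str, Node]

ontology = Ontology()
ontology.add_class("Publication")
ontology.add_class("PeerReviewed")  # hypothetical class
ontology.add_class("ConferencePaper", "Publication", "PeerReviewed")  # two parents

paper = Node("PAPER17", cls="ConferencePaper")
publication: List[Edge] = [
    (paper, "AUTHOR", Node("Arno Jacobsen")),
    (paper, "YEAR", Node("2001")),
]
```

With multiple inheritance, `ontology.is_a("ConferencePaper", ...)` holds for both `"Publication"` and `"PeerReviewed"`, so a constraint on either class would accept `PAPER17`.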
Query Language (GQL)
• Queries are represented as directed graph patterns with node and edge labels
• Node labels are variables
• Variables can be constrained by
  • Classes
  • Class instances and literal values
• Edge labels are class instances
• Mapping (matching) semantics
  • A pattern graph maps to a data graph if the topology (structure) of the two graphs matches and all variable constraints are satisfied
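The mapping semantics can be illustrated with a brute-force backtracking matcher over labeled triples. This is a sketch under stated assumptions and not the G-ToPSS matching algorithm: the triple encoding, the `match` and `bind` helpers, the `PARENTS` and `INSTANCE_OF` tables, and the choice to attach every edge to PAPER17 are all made up for the example, loosely following the Matching Semantics slide.

```python
from typing import Callable, Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]          # (source, edge label, target)
Binding = Dict[str, str]               # variable name -> bound value
Constraint = Callable[[str], bool]


def is_var(term: str) -> bool:
    return term.startswith("?")


def bind(term: str, value: str, binding: Binding,
         constraints: Dict[str, Constraint]) -> bool:
    """Try to map one pattern term onto one data value."""
    if not is_var(term):
        return term == value                 # constants must match exactly
    if term in binding:
        return binding[term] == value        # variable already bound
    check = constraints.get(term)
    if check is not None and not check(value):
        return False                         # variable constraint violated
    binding[term] = value
    return True


def match(pattern: List[Triple], data: List[Triple],
          constraints: Dict[str, Constraint],
          binding: Optional[Binding] = None) -> Optional[Binding]:
    """Return a binding that maps every pattern edge to a data edge, or None."""
    binding = dict(binding or {})
    if not pattern:
        return binding
    (ps, pe, po), rest = pattern[0], pattern[1:]
    for ds, de, do in data:
        if pe != de:
            continue
        trial = dict(binding)
        if bind(ps, ds, trial, constraints) and bind(po, do, trial, constraints):
            result = match(rest, data, constraints, trial)
            if result is not None:
                return result
    return None


# Hypothetical ontology and instance tables.
PARENTS = {"Paper": ["Publication"], "Publication": []}
INSTANCE_OF = {"PAPER17": "Paper"}


def is_a(cls: str, ancestor: str) -> bool:
    return cls == ancestor or any(is_a(p, ancestor) for p in PARENTS.get(cls, []))


data = [
    ("PAPER17", "AUTHOR", "Arno Jacobsen"),
    ("PAPER17", "CONFERENCE", "SIGMOD"),
    ("PAPER17", "YEAR", "2001"),
    ("PAPER17", "LOCATION", "California"),
]
pattern = [
    ("?y", "AUTHOR", "Arno Jacobsen"),
    ("?y", "CONFERENCE", "SIGMOD"),
    ("?y", "YEAR", "?z"),
]
constraints = {
    "?y": lambda v: v in INSTANCE_OF and is_a(INSTANCE_OF[v], "Publication"),
    "?z": lambda v: v.isdigit() and int(v) > 2000,
}
result = match(pattern, data, constraints)
```

Here `result` binds `?y` to `PAPER17` and `?z` to `2001`; tightening the year constraint to `> 2005` makes `match` return `None`, because the structure still matches but a variable constraint fails.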
Conclusion and Future Work
• Proposed a prototype for graph-based metadata filtering
• G-ToPSS supports a high matching rate for an expressive subscription language
• Extend G-ToPSS with full RDF language features
• Optimize constraint processing during matching