Towards an Internet-Scale XML Dissemination Service

This paper discusses the design and implementation of an Internet-scale XML dissemination service, focusing on core techniques and applications in news feeds, mobile services, stock tickers, and network monitoring. It also explores the use of YFilter for efficient XML filtering and transformation.


Presentation Transcript


  1. Towards an Internet-Scale XML Dissemination Service. Yanlei Diao, Shariq Rizvi, Michael J. Franklin. EECS, U.C. Berkeley

  2. Outline • XML dissemination services • System model • Core techniques • Status and conclusions

  3. Applications of XML Dissemination
     • News feeds via RSS (Really Simple Syndication)
       • My Yahoo!: updated headlines from BBC, CNet, NPR.
     • Mobile services
       • Mobile operators: connect content providers with millions of clients running a multitude of operating systems.
     • Stock tickers
       • QuoteMedia: fast access to real-time and historical stock data.
     • Online auctions
       • freebidingtools.com: create your own feed for your favorite eBay search.
     • Network monitoring
       • Ganglia: a distributed monitoring system for clusters and grids.

  4. YFilter: An XML Dissemination Service
     [Figure labels: Data Source (three instances), XML streams, YFilter, user queries, query results]
     • User queries: specifications of data interests, written in an XML query language.
     • Data sources: continuously publish XML data items.
     • The service: delivers to each user the XML data items that match her data interests; the delivered results are presented in a customized format.

  5. ONYX: Large-Scale XML Dissemination
     [Figure labels: Data Source (three instances), Broker (six instances), users U1 to U5]
     • ONYX: Operator Network using YFilter for XML Dissemination
     • An overlay network of information brokers running YFilter.
     • Underlying infrastructures:
       • A dedicated network
       • Peer-to-peer
       • Collaboration among administrative domains

  6. Design Space: Expressiveness
     Expressiveness: the data model + query language a service supports.
     • Subject-based
       • Messages: a subject label
       • Queries: a specific label or a wildcard
     • Predicate-based
       • Messages: attribute-value pairs
       • Queries: a set of predicates
     • XML filtering
       • Messages: XML
       • Queries: subset of XPath 1.0
     • XML filtering and transformation
       • Messages: XML
       • Queries: subset of XQuery
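The first three levels of expressiveness can be contrasted with a minimal Python sketch. The function names and sample messages are hypothetical, and the XML case uses the limited XPath subset of the standard library's ElementTree as a stand-in, not YFilter itself. The fourth level (XQuery-based transformation) is not covered here.

```python
import operator
import xml.etree.ElementTree as ET
from fnmatch import fnmatchcase


def subject_match(subject, pattern):
    """Subject-based: the query is a specific label or a wildcard pattern."""
    return fnmatchcase(subject, pattern)


def predicate_match(message, predicates):
    """Predicate-based: every (attribute, comparison, constant) must hold."""
    return all(attr in message and compare(message[attr], const)
               for attr, compare, const in predicates)


def xml_match(xml_text, path):
    """XML filtering: the query is a path expression over the message tree."""
    return ET.fromstring(xml_text).find(path) is not None


# Hypothetical messages and queries, one per expressiveness level.
subject_ok = subject_match("sports.tennis", "sports.*")
predicate_ok = predicate_match(
    {"symbol": "IBM", "price": 120.0},
    [("symbol", operator.eq, "IBM"), ("price", operator.gt, 100)],
)
doc = '<nitf><head><pubdata edition.area="SF"/></head></nitf>'
# ElementTree paths are relative to the root element, so "/nitf" is implied.
xml_ok = xml_match(doc, "./head/pubdata[@edition.area='SF']")
print(subject_ok, predicate_ok, xml_ok)
```

Each step up the list lets a query inspect more of the message: a label, then flat attributes, then nested structure plus attribute values.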

  7. Design Space: Why Distributed Processing?
     • Privacy
       • Regulations: e.g., CA Senate Bill No. 1386.
       • Policies: e.g., customers’ data stay behind the firewall.
     • Locality of data interests
       • Disseminate regional data directly to local subscribers.
     • Scalability
       • Data volume: up to thousands of messages per second, with message sizes from 1 KB to 20 KB.
       • Query population: up to millions.
       • Frequency of query updates: from a daily basis to every few minutes.
       • Result volume: can amplify the input data volume by a large factor.

  8. Related Systems

  9. Content of the Paper
     • Content-driven routing
       • Need to handle both structural and value-based constraints.
       • Leverage YFilter: NFA-based operator networks, distributed construction.
     • Filtering power of routing (i.e., fraction of messages filtered)
       • Filtering power can be inherently limited.
       • Use query partitioning (if possible) to improve it.
     • Distributed transformation
       • Currently either at the publishers’ side or at the edge brokers.
       • Perform cascading message transformation during routing.
     • Efficient XML transmission
       • Verbosity of XML, and XML parsing at each routing step.
       • Investigate different XML formats for XML transmission.
     • Detailed architectural design
     • Other optimization techniques…

  10. Outline • XML dissemination services • System model • Core techniques • Status and conclusions

  11. Operations on Data/Query flows
      [Figure: a data source feeding a tree of brokers (Broker 1 to Broker 6), with arrows for the data flow and the query flow.]
      • Routing table entries shown in the figure (output link: routing query):
        • Broker 2: /nitf/head/pubdata[@edition.area="NY"]
        • Broker 4: /nitf/head/pubdata[@edition.area="SF"]
        • Broker 2: …
        • Broker 5: /nitf//tobject.subject[@tobject.subject.type="Stock"] or /nitf//tobject.subject[@tobject.subject.matter="fishing"]
        • Broker 6: /nitf//tobject.subject[@tobject.subject.type="Sports"]
      • User queries shown in the figure:
        • Q1: /nitf[head/pubdata[@edition.area="SF"]][.//tobject.subject[@tobject.subject.type="Stock"]]
        • Q2: /nitf[head/pubdata[@edition.area="SF"]][.//tobject.subject[@tobject.subject.matter="fishing"]]
        • Q3: /nitf[head/pubdata[@edition.area="SF"]][.//series[@series.name="Tide Forecasts"]]
        • Q4: …
        • Q5: …
      • [a transformation query*]

  12. System Tasks on Data/Query Planes
      Processing planes: query plane and data plane.
      • Query plane: build routing tables; build transformation plans; build query plans.
      • Data plane: look up in routing tables; execute transformation plans; execute query plans.

  13. Outline • XML dissemination services • System model • Core techniques • Status and conclusions

  14. Routing Table Design
      • A routing table: a mapping from output links to routing queries.
        • A routing query: the data interests of the queries downstream of an output link.
        • Data interest of a query: XPath expressions, or the for and where clauses of FLWOR expressions.
      • Routing table design:
        • a canonical form of routing queries;
        • a representation of routing tables; and
        • an algorithm for constructing them from a distributed query population.
      • Two (conflicting) goals:
        • High filtering power of routing: the fraction of messages filtered in routing.
        • High routing efficiency: the number of messages routed per second.
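A routing table in this sense can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `RoutingTable` class and its methods are hypothetical names, each routing query is stored as a list of OR-ed path expressions, and matching uses ElementTree's XPath subset (paths written relative to the `nitf` root) instead of YFilter's operator network.

```python
import xml.etree.ElementTree as ET


class RoutingTable:
    """Maps each output link to a routing query (a list of OR-ed paths)."""

    def __init__(self):
        self.entries = {}  # link name -> list of path expressions

    def add(self, link, path):
        self.entries.setdefault(link, []).append(path)

    def lookup(self, message):
        """Return the output links whose routing query matches the message."""
        root = ET.fromstring(message)
        return [link for link, paths in self.entries.items()
                if any(root.find(p) is not None for p in paths)]


table = RoutingTable()
table.add("Broker 2", "./head/pubdata[@edition.area='NY']")
table.add("Broker 4", "./head/pubdata[@edition.area='SF']")

sf_msg = '<nitf><head><pubdata edition.area="SF"/></head></nitf>'
la_msg = '<nitf><head><pubdata edition.area="LA"/></head></nitf>'
print(table.lookup(sf_msg))  # forwarded on the Broker 4 link only
print(table.lookup(la_msg))  # matches no link, so it is filtered here
```

Evaluating every path separately, as done here, is the simple but slow option; the slides' point is that a shared representation is needed to keep routing efficiency high as the table grows.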

  15. YFilter Basics
      • An XML filtering and transformation engine that processes multiple queries in a shared fashion.
      • A Non-Deterministic Finite Automaton (NFA)-based operator network.
      • Benefits for routing:
        • Fast structure matching.
        • A small maintenance cost for query updates.
        • Extensibility for supporting new operators.
      [Figure: the NFA built for Q1 and Q2, with numbered states and transitions labeled nitf, head, *, pubdata, and tobject.subject.]
      Q1: /nitf[head/pubdata[@edition.area="SF"]][.//tobject.subject[@tobject.subject.type="Stock"]]
      Q2: /nitf[head/pubdata[@edition.area="SF"]][.//tobject.subject[@tobject.subject.matter="fishing"]]
      • Y. Diao and M.J. Franklin. Query Processing for High-Volume XML Message Brokering. VLDB 2003.
      • Y. Diao, et al. Path Sharing and Predicate Evaluation for High-Performance XML Filtering. TODS, Dec. 2003.
      • YFilter v1.0 release: Coming later this month!
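The idea of processing many path queries in one shared NFA can be illustrated with a small Python sketch. It is a simplification, not YFilter's implementation: the `State` and `PathNFA` names are hypothetical, only the structural steps `/`, `//`, and `*` are handled, and value predicates such as `[@edition.area="SF"]` are left out.

```python
import re
import xml.etree.ElementTree as ET


class State:
    def __init__(self, self_loop=False):
        self.next = {}              # element name or "*" -> State
        self.desc = None            # epsilon target used for "//" steps
        self.self_loop = self_loop  # True for the state that absorbs "//"
        self.accepts = set()        # ids of queries accepted here


class PathNFA:
    def __init__(self):
        self.start = State()
        self.num_states = 1

    def _new(self, self_loop=False):
        self.num_states += 1
        return State(self_loop)

    def add(self, qid, path):
        """Insert a path; common prefixes reuse existing states."""
        state = self.start
        for axis, name in re.findall(r"(//|/)([^/]+)", path):
            if axis == "//":
                if state.desc is None:
                    state.desc = self._new(self_loop=True)
                state = state.desc
            if name not in state.next:
                state.next[name] = self._new()
            state = state.next[name]
        state.accepts.add(qid)

    @staticmethod
    def _closure(states):
        # Follow the epsilon edges introduced by "//" steps.
        return states | {s.desc for s in states if s.desc is not None}

    def match(self, root):
        """Return the ids of all queries matched by the document."""
        matched = set()

        def visit(elem, active):
            nxt = set()
            for s in active:
                for key in (elem.tag, "*"):
                    if key in s.next:
                        nxt.add(s.next[key])
                if s.self_loop:
                    nxt.add(s)
            nxt = self._closure(nxt)
            for s in nxt:
                matched.update(s.accepts)
            for child in elem:
                visit(child, nxt)

        visit(root, self._closure({self.start}))
        return matched


nfa = PathNFA()
nfa.add("Qa", "/nitf/head/pubdata")
nfa.add("Qb", "/nitf//tobject.subject")
nfa.add("Qc", "/nitf/*/pubdata")
doc = ET.fromstring(
    "<nitf><head><pubdata/></head><body><tobject.subject/></body></nitf>")
print(sorted(nfa.match(doc)), nfa.num_states)
```

All three queries share the state reached on `nitf`, so adding or removing a query touches only the states on its own path, which is the "small maintenance cost for query updates" benefit listed above.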

  16. Our Solution
      • Routing queries are a disjunction of path expressions.
        • Each XPath expression (the equivalent of the for and where clauses of FLWOR expressions) is a routing query.
        • Multiple routing queries can be connected by or.
      • Routing table representation
        • Merge routing queries into a single combined operator network.
      • Construction algorithm
        • Map(): a user query → a routing query in the canonical form.
        • Collect(): routing queries sent from child brokers → a routing table.
        • Aggregate(): all the routing queries (at a node) → a new routing query.
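The three construction steps can be sketched as plain Python functions. This is a deliberately simplified sketch: a routing query is modeled as a `frozenset` of path strings (a disjunction), each user query is assumed to have already been reduced to a single path expression (the slides do not show how that reduction works), and the function names `map_query`, `collect`, and `aggregate` are hypothetical stand-ins for Map(), Collect(), and Aggregate().

```python
def map_query(path):
    """Map(): wrap one path expression as a routing query (one-term disjunction)."""
    return frozenset([path])


def collect(from_children):
    """Collect(): routing queries sent by child brokers -> a routing table."""
    return dict(from_children)


def aggregate(routing_queries):
    """Aggregate(): OR together all routing queries at a node, dropping duplicates."""
    return frozenset().union(*routing_queries)


# Hypothetical path expressions hosted at two leaf brokers.
p_stock = '/nitf//tobject.subject[@tobject.subject.type="Stock"]'
p_fish = '/nitf//tobject.subject[@tobject.subject.matter="fishing"]'
p_tide = '/nitf//series[@series.name="Tide Forecasts"]'

from_broker5 = aggregate([map_query(p_stock), map_query(p_fish)])
from_broker6 = aggregate([map_query(p_tide)])

# The parent broker builds its routing table, then summarizes it upstream.
table_broker4 = collect({"Broker 5": from_broker5, "Broker 6": from_broker6})
to_parent = aggregate(table_broker4.values())
print(sorted(to_parent))
```

In the approach described on the slide, the merged result would be a combined operator network, not a set of strings; the set is used here only to make the data flow between the three steps visible.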

  17. An Example Scenario
      • Broker 5: (5a) Map( ) turns user queries into routing queries; (5b) Aggregate( ) produces a new routing query.
      • Broker 6: (6a) Map( ) turns user queries into routing queries; (6b) Aggregate( ) produces a new routing query.
      • Broker 4: (4c) Collect( ) builds the routing table (a mapping from links to routing queries) from the routing queries received from Broker 5 and from Broker 6; (4d) Aggregate( ) produces a new routing query.
      • User queries:
        • Q1: /nitf[head/pubdata/@edition.area="SF"][.//tobject.subject/@tobject.subject.type="Stock"]
        • Q2: /nitf[head/pubdata/@edition.area="SF"][.//tobject.subject/@tobject.subject.matter="fishing"]
        • Q3: /nitf[head/pubdata/@edition.area="SF"][.//series/@series.name="Tide Forecasts"]

  18. Example (continued)
      [Figure: NFA-based operator networks for the example. Map( ) & Aggregate( ) at Broker 5 and Broker 6 turn the user queries Q1, Q2, and Q3 (5a, 6a) into routing queries (5b, 6b); Collect( ) merges (5b) and (6b) into a combined network (4c′) whose accepting states are labeled Broker5 or Broker6. Transition labels: nitf, head, *, pubdata, tobject.subject, series.]

  19. Sharing and Short-cut Evaluation
      [Figure: the separate routing queries (5b) for Broker 5 and (6b) for Broker 6 in table (4c), next to the combined operator network (4c′) with accepting states labeled Broker5 or Broker6.]
      • A problem with sharing
        • Separate routing query representations: short-cut evaluation.
        • Combined one: sharing may sacrifice the short-cut evaluation strategy.
      • Solution: dynamic pruning of the operator network at runtime
        • Each operator/NFA state has a static set of broker ids that it can reach.
        • System keeps a dynamic set of broker ids that have been reached.
        • YFilter execution is extended to prune the operator network using these sets.
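The pruning rule can be sketched in Python under simplifying assumptions: the combined network is modeled as a tree of hypothetical `Op` nodes, the `matches` callback stands in for real operator evaluation, and an operator is skipped whenever every broker it could reach has already been reached.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    label: str
    brokers: frozenset = frozenset()    # brokers notified if this operator matches
    children: list = field(default_factory=list)
    reachable: frozenset = frozenset()  # static set, filled in by annotate()


def annotate(op):
    """Compute, once, the set of broker ids reachable from each operator."""
    reach = set(op.brokers)
    for child in op.children:
        reach |= annotate(child)
    op.reachable = frozenset(reach)
    return op.reachable


def route(op, matches, reached, evaluated):
    """Evaluate the network, pruning operators that cannot add a new broker."""
    if op.reachable <= reached:
        return  # everything below is already covered: short-cut
    evaluated.append(op.label)
    if not matches(op.label):
        return
    reached.update(op.brokers)
    for child in op.children:
        route(child, matches, reached, evaluated)


network = Op("nitf", children=[
    Op("pubdata", frozenset({"Broker5"})),
    Op("tobject.subject", frozenset({"Broker5"})),
    Op("series", frozenset({"Broker6"})),
])
annotate(network)

reached, evaluated = set(), []
route(network, lambda label: True, reached, evaluated)
print(evaluated)        # operators that were actually evaluated
print(sorted(reached))  # brokers the message will be forwarded to
```

In this run the `tobject.subject` operator is never evaluated, because Broker5 was already reached through `pubdata`; that recovers the short-cut behavior of separate routing queries while keeping the shared network.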

  20. Other Routing Considerations
      • Content Generalization
        • Large routing tables can be a problem.
        • Introduce content generalization as an additional step in Collect( ) or Aggregate( ).
        • Generalization methods.
        • Trade off filtering power for routing (space) efficiency.
      • Filtering Power of Routing
        • Fraction of messages filtered by routing.
        • Selectivity of the union of the user queries at the node.
        • Loss in precision in the routing queries representing this node.
        • If inherently low, partition the query population to improve it.
        • An Exclusiveness Pattern: e.g., "/a/b[@id=?]"
        • Identify a set of such patterns, and partition queries using them.
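Why partitioning on an exclusiveness pattern helps can be shown with a toy Python calculation. Everything here is an assumption made for illustration: the regular expression for the `/a/b[@id=?]` pattern, the sample queries, the premise that each message carries exactly one `id` value, and the uniform message mix.

```python
import re
from collections import defaultdict

# Hypothetical exclusiveness pattern /a/b[@id=?]: a message carries a single id,
# so queries with different ids can never match the same message.
PATTERN = re.compile(r'^/a/b\[@id="([^"]*)"\]')


def partition(queries):
    """Group queries by the id value bound to the exclusiveness pattern."""
    groups = defaultdict(list)
    for q in queries:
        m = PATTERN.match(q)
        groups[m.group(1) if m else None].append(q)
    return dict(groups)


def filtering_power(hosted_ids, message_ids):
    """Fraction of messages whose id matches no query hosted at the node."""
    dropped = sum(1 for mid in message_ids if mid not in hosted_ids)
    return dropped / len(message_ids)


queries = ['/a/b[@id="1"]/c', '/a/b[@id="2"]/c', '/a/b[@id="1"]//d', '/a/b[@id="3"]']
groups = partition(queries)

messages = ["1", "2", "3", "4"]  # ids carried by four sample messages
mixed = filtering_power({"1", "2", "3"}, messages)  # node hosting all queries
single = filtering_power({"1"}, messages)           # node hosting one partition
print(groups)
print(mixed, single)
```

A node that hosts a mix of ids has to accept most messages, while a node that hosts a single partition can drop everything carrying another id, so its filtering power is higher.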

  21. Status and Conclusions
      • Queries bring intelligence to the network routing fabric.
      • We present a detailed architectural design of ONYX.
      • We address fundamental issues.
        • YFilter’s NFA-based operator networks are good for routing!
        • Locality of data interests is key to filtering power!
      • Status: YFilter release, XML transmission, other implementation underway.
      • This is an area full of opportunities for optimization.
        • Improving routing efficiency.
        • Improving filtering power of routing.
        • Incremental message transformation.
        • Sharing among different processing tasks.
        • Schema-based optimization…

  22. Questions ONYX
