
The Data Ring: Community Content Sharing



Presentation Transcript


  1. The Data Ring: Community Content Sharing • Serge Abiteboul (INRIA) • Alkis Polyzotis (UC Santa Cruz)

  2. Data Sharing Communities • Data sharing community: a group of users that share and query information within some domain • Examples: UCSC genome browser, SwissProt, Flickr • Interesting data management problem • Shared information is heterogeneous • Data is distributed and dynamic • Lack of central administration • Users are not database savvy

  3. The Data Ring • P2P middleware system that provides: • Monitoring • Querying • …and other database-like services over the distributed information • Main goal: simplicity of use

  4. Data abstraction in the data ring • Topological layer • Physical layer • External layer

  5. Data abstraction in the data ring: Topological layer • Declarative query services • Data and query model based on XML

  6. Data abstraction in the data ring: Physical layer • Basic service is distributed query evaluation • Comprises the overlay network (DHT), physical access structures (indices, replicas, views), and the catalog
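The slide names a DHT overlay but does not say which one or how the catalog is keyed. As a rough illustration only, the sketch below uses generic consistent hashing to decide which peer is responsible for a catalog entry. The peer names, the key `genes.xml`, and the `Ring` class are all hypothetical and are not taken from the presentation.

```python
import hashlib
from bisect import bisect_right


def ring_position(name):
    """Map a peer name or catalog key to a point on the hash ring."""
    return int(hashlib.sha1(name.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy consistent-hashing ring (an assumption, not the Data Ring's DHT)."""

    def __init__(self, peers):
        # Each peer owns the arc of the ring that ends at its own position.
        self.points = sorted((ring_position(p), p) for p in peers)

    def owner(self, key):
        positions = [pos for pos, _ in self.points]
        # First peer clockwise from the key; wrap around at the end of the ring.
        i = bisect_right(positions, ring_position(key)) % len(self.points)
        return self.points[i][1]


ring = Ring(["peerA", "peerB", "peerC"])
catalog_peer = ring.owner("genes.xml")  # peer that would hold this catalog entry
```

Every peer that knows the same membership list computes the same owner for a key, which is what lets a catalog be looked up without a central administrator.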

  7. Data abstraction in the data ring: External layer • Provides semantically richer data models

  8. Data abstraction in the data ring • Our focus is on the topological and physical layers • External layer is equally important and an active research area

  9. Thesis #1: formalism for distributed XML data and queries

  10. Distributed XML data and queries • What made the relational model successful: • A logic for describing tables • An algebra for query optimization • We need the equivalent for trees in a distributed context: • A logic for describing distributed XML data • An algebra for optimizing distributed XML queries

  11. Desiderata for description logic • Seamless transition between data and services • Important for loose data integration • Support for XML streams • Streams are essential for subscription services • They are also necessary to support recursion

  12. Starting point: AXML • AXML: XML tree with embedded web service calls • Seamless transition between intensional and extensional data • Provides a simple mechanism for loose data integration • Core concept: XML streams • A web service call returns a stream of elements • Support for both push and pull semantics
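To make the idea of an XML tree with embedded service calls concrete, here is a minimal sketch in Python. It is not the AXML system's API: the `<sc>` element name, the `service` and `arg` attributes, and the local `SERVICES` registry that stands in for remote web services are all assumptions made for illustration. Only pull semantics is shown, using a generator as the stream.

```python
import xml.etree.ElementTree as ET


def get_photos(tag):
    """Hypothetical service: yields a stream of <photo> elements on demand (pull)."""
    for i in range(2):
        photo = ET.Element("photo")
        photo.text = f"{tag}-{i}.jpg"
        yield photo


# Stand-in for remote web services, keyed by an assumed service name.
SERVICES = {"getPhotos": get_photos}


def materialize(node):
    """Replace every embedded <sc> call under node with the elements it returns."""
    new_children = []
    for child in list(node):
        if child.tag == "sc":
            service = SERVICES[child.get("service")]
            new_children.extend(service(child.get("arg")))  # intensional -> extensional
        else:
            materialize(child)
            new_children.append(child)
    for child in list(node):
        node.remove(child)
    node.extend(new_children)
    return node


doc = ET.fromstring(
    '<album><title>Santa Cruz</title><sc service="getPhotos" arg="beach"/></album>'
)
materialize(doc)
```

Before `materialize` runs, the album holds the photos only intensionally, as a call. Afterwards the same tree holds them extensionally, as ordinary elements.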

  13. Desiderata for algebra • Be amenable to rewrites • Capture the topology of distributed computation • Allow seamless transition between logical and physical state • Plans may need to be re-optimized in mid-flight • It may be necessary to perform partial optimization • Error recovery

  14. A proposal based on AXML • A distributed plan is a workflow of web services … which is exactly an AXML tree • Components: • An encoding of distributed plans in AXML • Rewrite rules • A nice bonus: plans can be readily exchanged between nodes
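The slides do not list the actual rewrite rules, so the following is only a sketch of what one rule over a tree-encoded plan might look like. It assumes a hypothetical plan vocabulary (`select`, `send`, `scan`) and applies a textbook rule: push a selection below a data transfer so that filtering runs at the peer that holds the data.

```python
import xml.etree.ElementTree as ET


def push_select_below_send(plan):
    """Rewrite select(send(x)) into send(select(x)), bottom-up over the plan tree."""
    # Rewrite the children first and splice any rewritten subtree back in place.
    for i, child in enumerate(list(plan)):
        new_child = push_select_below_send(child)
        if new_child is not child:
            plan.remove(child)
            plan.insert(i, new_child)
    if plan.tag == "select" and len(plan) == 1 and plan[0].tag == "send":
        send = plan[0]
        select = ET.Element("select", dict(plan.attrib))
        select.extend(list(send))          # the selection now wraps the send's input
        new_send = ET.Element("send", dict(send.attrib))
        new_send.append(select)            # only filtered data crosses the network
        return new_send
    return plan


# Hypothetical plan: peerA filters data that peerB ships in full.
plan = ET.fromstring(
    '<select pred="species=human">'
    '<send from="peerB" to="peerA"><scan doc="genes.xml"/></send>'
    '</select>'
)
optimized = push_select_below_send(plan)
```

Because the plan is itself a tree, the rewritten plan can be serialized with `ET.tostring(optimized)` and shipped to another node, which is the exchange property the slide mentions.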

  15. Disclaimer • AXML is a starting point, not a panacea • Bottom line: we need formalisms for distributed XML queries

  16. Thesis #2: autonomic administration

  17. Autonomic administration • Users are not database experts • Typically, scientists with computer experience • Users are averse to too many “knobs” • No central authority that is responsible for administration • Autonomic administration is a necessity -- not a gadget

  18. Facets of autonomy • Self-monitoring • Self-tuning • Self-healing

  19. Some issues • System integration • Distribution • On-line tuning • Pro-active tuning

  20. Distributed vs. local tuning • Distributed tuning • Based on the global workload • Catalog organization, replication • Local tuning • Based on local workload • Physical design tuning

  21. Data activation for files • A large portion of the data is expected to be in files • We need to develop query processors for data residing in files • File activation: optimize access to the file based on the local workload • E.g., instantiate an index on file contents or materialize a relational view • Local tuning is essential in this context
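One way to picture file activation is a wrapper that watches the local workload and builds an index only once a field is queried often enough. The sketch below assumes a small CSV file held in memory, a made-up `ActivatedFile` class, and an arbitrary threshold. It illustrates the idea and is not the mechanism used by the Data Ring.

```python
import csv
import io
from collections import Counter, defaultdict


class ActivatedFile:
    """Toy file wrapper that indexes a field after it has been queried often enough."""

    def __init__(self, text, threshold=3):
        self.rows = list(csv.DictReader(io.StringIO(text)))
        self.threshold = threshold   # assumed tuning knob, chosen arbitrarily
        self.workload = Counter()    # number of lookups seen per field
        self.indexes = {}            # field -> {value -> matching rows}

    def lookup(self, field, value):
        self.workload[field] += 1
        if field not in self.indexes and self.workload[field] >= self.threshold:
            index = defaultdict(list)
            for row in self.rows:
                index[row[field]].append(row)
            self.indexes[field] = index          # "activate" the file for this field
        if field in self.indexes:
            return self.indexes[field].get(value, [])
        # Cold path: full scan of the file contents.
        return [row for row in self.rows if row[field] == value]


data = "gene,species\nBRCA1,human\nTP53,human\nPax6,mouse\n"
genes = ActivatedFile(data, threshold=2)
first = genes.lookup("species", "human")    # answered by a scan
second = genes.lookup("species", "human")   # threshold reached, answered from the index
```

The decision is driven purely by the local workload, which matches the slide's point that local tuning is essential for data residing in files.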
