The Data Ring: Community Content Sharing Serge Abiteboul (INRIA) Alkis Polyzotis (UC Santa Cruz)
Data Sharing Communities Data sharing community: a group of users that share and query information within some domain • Examples: UCSC genome browser, SwissProt, Flickr • An interesting data management problem: • Shared information is heterogeneous • Data is distributed and dynamic • Lack of central administration • Users are not database-savvy
The Data Ring • P2P middleware system that provides: • Monitoring • Querying • …and other database-like services over the distributed information • Main goal: simplicity of use
Data abstraction in the data ring • Topological layer • Physical layer • External layer
Data abstraction in the data ring: Topological Layer • Declarative query services • Data and query model based on XML
Data abstraction in the data ring: Physical Layer • Basic service is distributed query evaluation • Comprises the overlay network (DHT), physical access structures (indices, replicas, views), and the catalog
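The slides name a DHT as the overlay network but do not say which one. The following is a minimal sketch, assuming Chord-style consistent hashing (our choice, not the slides'). It shows how a catalog entry could be assigned to a peer: each key is owned by the first peer whose position on the hash ring is at or after the key's position. `ring_position`, `HashRing`, `owner`, and the `catalog:` key format are hypothetical names used only for illustration.

```python
import bisect
import hashlib
from typing import Iterable


def ring_position(key: str) -> int:
    """Map a string to a point on a 160-bit identifier ring (SHA-1, Chord-style)."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hashing ring: a key is owned by its successor peer,
    i.e. the first peer whose ring position is >= the key's position."""

    def __init__(self, peers: Iterable[str]):
        self._points = sorted((ring_position(p), p) for p in peers)
        if not self._points:
            raise ValueError("a ring needs at least one peer")
        self._positions = [pos for pos, _ in self._points]

    def owner(self, key: str) -> str:
        i = bisect.bisect_left(self._positions, ring_position(key))
        # Wrap around to the first peer if the key lies past the last one.
        return self._points[i % len(self._points)][1]


ring = HashRing(["peer-a", "peer-b", "peer-c"])
# The (hypothetical) catalog entry describing a document would live at this peer.
catalog_peer = ring.owner("catalog:/genomes/chr21.xml")
```

When a peer joins or leaves, only the keys between it and its predecessor change owner. This property is why DHTs favor consistent hashing in a community where peers come and go.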
Data abstraction in the data ring: External Layer • Provides semantically richer data models
Data abstraction in the data ring: Topological and Physical Layers • Our focus is on the topological and physical layers • The external layer is equally important and an active research area
Distributed XML data and queries • What made the relational model successful: • A logic for describing tables • An algebra for query optimization • We need the equivalent for trees in a distributed context: • A logic for describing distributed XML data • An algebra for optimizing distributed XML queries
Desiderata for description logic • Seamless transition between data and services • Important for loose data integration • Support for XML streams • Streams are essential for subscription services • They are also necessary to support recursion
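Recursion needs streams because a recursive query produces its answers in rounds. A stream lets a consumer, such as a subscriber, use each answer as soon as it is derived instead of waiting for the fixpoint. The following is a minimal sketch using semi-naive evaluation of reachability over a plain edge list. It is an illustration of the idea, not the Data Ring's evaluator, and `reachable_stream` is a hypothetical name.

```python
from typing import Iterable, Iterator, Set, Tuple

Edge = Tuple[str, str]


def reachable_stream(edges: Iterable[Edge]) -> Iterator[Edge]:
    """Stream the transitive closure of `edges`, one new pair at a time.

    Semi-naive evaluation: each round joins only the pairs derived in the
    previous round (the delta) with the base edges, so every pair is emitted
    exactly once and the loop stops when a round derives nothing new.
    """
    base = set(edges)
    seen: Set[Edge] = set()
    delta: Set[Edge] = set()
    for edge in base:
        seen.add(edge)
        delta.add(edge)
        yield edge
    while delta:
        new: Set[Edge] = set()
        for (x, y) in delta:
            for (y2, z) in base:
                if y == y2 and (x, z) not in seen:
                    seen.add((x, z))
                    new.add((x, z))
                    yield (x, z)
        delta = new


links = [("a", "b"), ("b", "c"), ("c", "d")]
stream = reachable_stream(links)
first = next(stream)  # available before the fixpoint is reached
closure = {first, *stream}
```

Because the closure is consumed through a generator, the same code serves both a one-shot query (drain the stream) and a subscription (act on each pair as it arrives). The `seen` set also guarantees termination on cyclic inputs.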
Starting point: AXML • AXML (Active XML): an XML tree with embedded web service calls • Seamless transition between intensional and extensional data • Provides a simple mechanism for loose data integration • Core concept: XML streams • A web service call returns a stream of elements • Support for both push and pull semantics
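The slides describe AXML only at this level. The following is a minimal sketch of pull-style activation. It assumes a hypothetical `<sc>` element with a `service` attribute for an embedded call, and a local Python registry standing in for real web services. `get_photos`, `SERVICES`, and `activate` are illustrative names, not the AXML system's API. In this sketch, the results are inserted as siblings right after the call, and the call itself is kept.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator

# Hypothetical registry: service name -> function returning a stream of elements.
Service = Callable[[], Iterator[ET.Element]]


def get_photos() -> Iterator[ET.Element]:
    """Stand-in for a remote web service; yields results one element at a time."""
    for title in ("sunset", "harbor"):
        photo = ET.Element("photo")
        photo.text = title
        yield photo


SERVICES: Dict[str, Service] = {"getPhotos": get_photos}


def activate(root: ET.Element, services: Dict[str, Service]) -> None:
    """Pull each <sc> call's stream and insert the results right after the call.

    The <sc> element is kept, so the (intensional) call and its (extensional)
    results sit side by side and the call can be activated again later.
    """
    # Collect the calls first so the tree is not mutated while it is traversed.
    calls = [(parent, child) for parent in root.iter() for child in parent if child.tag == "sc"]
    for parent, call in calls:
        pos = list(parent).index(call) + 1
        for item in services[call.get("service")]():
            parent.insert(pos, item)
            pos += 1


doc = ET.fromstring('<album owner="alice"><sc service="getPhotos"/></album>')
activate(doc, SERVICES)
print(ET.tostring(doc, encoding="unicode"))
```

Calls that appear inside returned results are not activated in the same pass. Push semantics would instead have the service invoke a callback for each element it produces.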
Desiderata for algebra • Be amenable to rewrites • Capture the topology of distributed computation • Allow seamless transition between logical and physical state • Plans may need to be re-optimized in mid-flight • It may be necessary to perform partial optimization • Error recovery
A proposal based on AXML • A distributed plan is a workflow of web services … which is exactly an AXML tree • Components: • An encoding of distributed plans in AXML • Rewrite rules • A nice bonus: plans can be readily exchanged between nodes
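The slides leave the plan encoding and the rule set open. The following is a minimal sketch with these assumptions:

- The encoding is hypothetical. Operators (`select`, `union`) are XML elements, and remote subplans are `sc` service-call elements with a `peer` attribute.
- The single rewrite shown is the standard algebraic rule σ_p(A ∪ B) = σ_p(A) ∪ σ_p(B). It pushes a selection below a union so that each peer can filter before shipping data. It was chosen for illustration and is not a rule taken from the slides.
- Serializing with ElementTree stands in for exchanging the plan between nodes.

```python
import xml.etree.ElementTree as ET


def push_select(plan: ET.Element) -> ET.Element:
    """Return a rewritten copy of `plan` where select(union(a, b, ...)) becomes
    union(select(a), select(b), ...), applied until no such pattern remains."""
    kids = [push_select(child) for child in plan]
    if plan.tag == "select" and len(kids) == 1 and kids[0].tag == "union":
        union = ET.Element("union", kids[0].attrib)
        for branch in kids[0]:
            sel = ET.Element("select", plan.attrib)
            sel.append(branch)
            # The branch may itself be a union, so apply the rule again.
            union.append(push_select(sel))
        return union
    node = ET.Element(plan.tag, plan.attrib)
    node.extend(kids)
    return node


plan = ET.fromstring(
    '<select pred="year &gt; 2000"><union>'
    '<sc peer="p1" service="scan" source="photos"/>'
    '<sc peer="p2" service="scan" source="photos"/>'
    "</union></select>"
)
rewritten = push_select(plan)
wire = ET.tostring(rewritten, encoding="unicode")  # what the optimizing peer ships
received = ET.fromstring(wire)  # what another peer parses, then executes or re-optimizes
```

Because the plan is an ordinary XML tree, the optimized plan, or a partially optimized one, can be handed to another node and rewritten further there. This matches the slides' point about mid-flight re-optimization.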
Disclaimer • AXML is a starting point, not a panacea • Bottom line: we need formalisms for distributed XML queries
Autonomic administration • Users are not database experts • Typically, scientists with computer experience • Users are averse to too many “knobs” • No central authority that is responsible for administration • Autonomic administration is a necessity, not a gadget
Facets of autonomy • Self-monitoring • Self-tuning • Self-healing
Some issues • System integration • Distribution • On-line tuning • Pro-active tuning
Distributed vs. local tuning • Distributed tuning • Based on the global workload • Catalog organization, replication • Local tuning • Based on local workload • Physical design tuning
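The following is a minimal sketch of the contrast this slide draws. It assumes that each peer's workload is simply a log of the paths its queries accessed. Local tuning picks indexes from one peer's own counts, while distributed tuning merges all peers' counts to pick items to replicate. The function names, thresholds, and paths are hypothetical. In a real P2P setting, the global counts would themselves have to be aggregated over the network; this sketch sidesteps that step by merging the logs in memory.

```python
from collections import Counter
from typing import Dict, List, Set


def local_index_choices(local_log: List[str], min_hits: int) -> Set[str]:
    """Local tuning: a peer indexes the paths its own queries hit often."""
    counts = Counter(local_log)
    return {path for path, n in counts.items() if n >= min_hits}


def global_replication_choices(logs: Dict[str, List[str]], top_k: int) -> List[str]:
    """Distributed tuning: aggregate every peer's log and replicate the hottest items."""
    total: Counter = Counter()
    for log in logs.values():
        total.update(log)
    return [item for item, _ in total.most_common(top_k)]


logs = {
    "peer-a": ["/genes", "/genes", "/genes", "/photos"],
    "peer-b": ["/photos", "/photos", "/proteins"],
    "peer-c": ["/photos", "/proteins"],
}
replicate = global_replication_choices(logs, top_k=2)
indexes = {peer: local_index_choices(log, min_hits=2) for peer, log in logs.items()}
```

The two decisions diverge even on this tiny workload. peer-a would index `/genes` for its own queries, while the community-wide hotspot selected for replication is `/photos`.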
Data activation for files • A large portion of the data is expected to be in files • We need to develop query processors for data residing in files • File activation: optimize access to the file based on the local workload • E.g., instantiate an index on file contents or materialize a relational view • Local tuning is essential in this context
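The slide mentions two forms of file activation: instantiating an index on file contents or materializing a relational view. The following is a minimal sketch of the first form, with these assumptions:

- The file is a CSV file.
- Queries are equality lookups on one column.
- The local-workload policy is a simple count threshold.

`ActivatedFile`, `threshold`, and the sample data are hypothetical.

```python
import csv
import os
import tempfile
from collections import Counter, defaultdict
from typing import Dict, Iterator, List


class ActivatedFile:
    """Answer equality lookups over a CSV file, activating an index on demand.

    Hypothetical policy: the `threshold`-th lookup on a column builds an
    in-memory hash index on that column; later lookups use the index instead
    of rescanning the file. The index is not refreshed if the file changes.
    """

    def __init__(self, path: str, threshold: int = 3):
        self.path = path
        self.threshold = threshold
        self.lookups: Counter = Counter()
        self.indexes: Dict[str, Dict[str, List[dict]]] = {}

    def _rows(self) -> Iterator[dict]:
        with open(self.path, newline="") as f:
            yield from csv.DictReader(f)

    def lookup(self, column: str, value: str) -> List[dict]:
        self.lookups[column] += 1
        if column not in self.indexes and self.lookups[column] >= self.threshold:
            index: Dict[str, List[dict]] = defaultdict(list)
            for row in self._rows():
                index[row[column]].append(row)
            self.indexes[column] = index
        if column in self.indexes:
            return list(self.indexes[column].get(value, []))
        # Column not activated yet: fall back to a full scan of the file.
        return [row for row in self._rows() if row[column] == value]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "genes.csv")
    with open(path, "w", newline="") as f:
        f.write("gene,chrom\nBRCA1,17\nTP53,17\nAPOE,19\n")
    af = ActivatedFile(path, threshold=2)
    first = af.lookup("chrom", "17")   # full scan
    second = af.lookup("chrom", "17")  # builds the index, then answers from it
    third = af.lookup("chrom", "19")   # answered from the index
```

The decision depends only on this peer's own lookups, which is the sense in which local tuning matters here. A column that is rarely queried never pays the cost of an index.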