A Generic Framework for Querying and Updating Secondary XML Index Structures

Katharina Grün

Research Methodology

Motivation
• Widespread use of XML
• XML databases for efficient query and update processing
• Require index structures on content and structure of documents
  • primary index structure
    • default index
    • on whole document
    • not optimized for specific queries
  • secondary index structures
    • created on demand
    • on specific document fragments
    • adapted to query workload
• Framework for querying and updating secondary XML index structures (SCIENS)
[Become aware of problem]

Running example
path: projects/project[1]/@name
labelpath: projects/project/@name
• //resource[@date>='2005-01-01']
• //project[@name='sciens']/milestone[@id=2]/resource[@date>='2007-01-01']
• //element(resource, Report)[author='Smith']
[Become aware of problem]
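The relation between a path and its labelpath in the running example can be sketched in a few lines of Python. This is only an illustration, assuming a labelpath is obtained by dropping positional predicates such as [1]; the function name `label_path` is hypothetical and not part of SCIENS.

```python
import re


def label_path(path: str) -> str:
    """Drop positional predicates such as [1] to obtain the label path.

    Assumption: positions are written as [<digits>] directly after a step.
    """
    return re.sub(r"\[\d+\]", "", path)


print(label_path("projects/project[1]/@name"))  # projects/project/@name
```

Several paths (project[1], project[2], ...) collapse to the same labelpath, which is why a labelpath key is coarser than a path key.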

Challenges
• Which secondary index structures are necessary?
  • each kind of query is best supported by different index structure
  • not possible to provide one index structure for each possible query
• How to integrate them into a common framework?
  • each secondary index can index arbitrary properties of arbitrary fragments
  • query and update processing must not depend on specific indices defined
• How to update them when documents change?
  • document updates must be propagated to affected index structures
  • incremental index maintenance algorithm
[Become aware of problem]

Related work (1)
• XML databases
  • limited support for secondary index structures
• XML index structures
  • structure and/or content
  • mostly primary index structure
  • based on different models, proprietary structures
• Object-oriented index structures
  • proprietary structures to support queries on path navigation and/or inheritance hierarchies
• Multidimensional index structures
  • support several value dimensions
  • do not consider structure
[Become aware of problem]

Related work (2)
• Extensible indexing
  • object-relational databases
  • adapt index structures to different data types
• Indexing tasks
  • Maintain secondary indices when documents are updated (KeyX [1])
  • Select optimal index for specific query (XML Access Modules [2])
  • Suggest set of indices for query workload (KeyX [1])
• currently no integrated approach for processing secondary index structures in an XML database
[1] B. C. Hammerschmidt: KeyX: Selective Key-Oriented Indexing in Native XML Databases. PhD thesis, University of Lübeck, 2005.
[2] A. Arion, V. Benzaken and I. Manolescu: XML Access Modules: Towards Physical Data Independence in XML Databases. XIME-P workshop, 2005.
[Become aware of problem]

SCIENS - Ideas
• Structure and Content Indexing with Extensible, Nestable Structures
• Which secondary index structures are necessary?
  • select a small set of index structures and adapt them to various properties
  • nest index structures to reflect hierarchical queries
• How to integrate them into a common framework?
  • provide an index model
  • common index interface to query and update indices
• How to update them when documents change?
  • index maintenance algorithm that determines updates for arbitrary indices
  • based on update fragments and index definitions
[Suggest solution]

Index structures – one dimension (1)
• Value indexing
  • hashtable or B+-tree on value
  • @date>='2005-01-01'
• Structure indexing
  • hashtable or B+-tree on path/labelpath/type
  • //resource
  • /project[1]//resource
  • /project[2]/milestone[2]/resource
[Construct solution]
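The two one-dimensional cases above can be illustrated with a small Python sketch. It is not the SCIENS implementation: a sorted list searched with `bisect` stands in for a B+-tree on values, a dictionary stands in for a hash table on labelpaths, and the class names and node labels (1.1.1, 1.2.1, ...) are made up for the example.

```python
from bisect import bisect_left, insort
from collections import defaultdict


class SortedValueIndex:
    """Stand-in for a B+-tree on values: keeps (value, node) pairs sorted."""

    def __init__(self):
        self._entries = []  # sorted list of (value, node_id) tuples

    def insert(self, value, node_id):
        insort(self._entries, (value, node_id))

    def at_least(self, low):
        """Return the nodes whose value is >= low, in value order."""
        # (low,) sorts before every (low, node_id), so equal values are included.
        start = bisect_left(self._entries, (low,))
        return [node for _, node in self._entries[start:]]


class HashStructureIndex:
    """Stand-in for a hash table on labelpaths: exact-match lookups only."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def insert(self, label_path, node_id):
        self._buckets[label_path].append(node_id)

    def lookup(self, label_path):
        return list(self._buckets.get(label_path, []))


# Value indexing: @date>='2005-01-01' (ISO dates compare correctly as strings).
dates = SortedValueIndex()
dates.insert("2007-03-15", "2.1.1")
dates.insert("2004-06-30", "1.1.1")
dates.insert("2005-01-01", "1.2.1")
print(dates.at_least("2005-01-01"))  # ['1.2.1', '2.1.1']

# Structure indexing: look up all nodes reached by one labelpath.
paths = HashStructureIndex()
paths.insert("projects/project/milestone/resource", "1.2.1")
paths.insert("projects/project/milestone/resource", "2.1.1")
print(paths.lookup("projects/project/milestone/resource"))  # ['1.2.1', '2.1.1']
```

The sorted structure answers range predicates, while the hash structure only answers equality lookups, which is one reason the choice of structure depends on the query.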

Index structures – one dimension (2)
[Construct solution]

Index structures – multiple dimensions (1)
[1] J. Robinson: The KDB-tree: A Search Structure for Large Multidimensional Dynamic Indexes. SIGMOD, ACM Press, 1981.
[Construct solution]

Index structures – multiple dimensions (2)
[Construct solution]

Comparison
• queries and indices on milestone hierarchy and date
  • e.g. //project[1]//resource[@date>'2005-01-01']
• define index that best matches query workload
[Evaluate solution]

Index framework (1)
• index
  • search function consisting of a set of index entries
  • provides interface to update and retrieve index entries
• index entry
  • maps index keys (value, type, path, …) -> returned nodes
  • TechnicalReport, Smith -> 3.2.1, 4.3.1, ...
• index definition
  • selects nodes to be indexed
  • //element(resource, $V1)[author=$V2]
  • represented as unordered tree pattern with index variables
• index structure
  • specific data structure (hash table, prefix B+-tree, kdb-tree)
  • one index can use several index structures (index nesting)
[Construct solution]
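One way to picture the common index interface is an abstract class with update and retrieval operations, implemented here by a hash-based index. This is a minimal sketch under assumptions: the method names `insert`, `delete` and `lookup` and the class names are hypothetical and are not taken from SCIENS; only the example entry (TechnicalReport, Smith -> 3.2.1, 4.3.1) comes from the slide.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class Index(ABC):
    """Hypothetical common interface: callers never see the concrete structure."""

    @abstractmethod
    def insert(self, key, node): ...

    @abstractmethod
    def delete(self, key, node): ...

    @abstractmethod
    def lookup(self, key): ...


class HashIndex(Index):
    """Index entries stored in a hash table: key tuple -> set of node labels."""

    def __init__(self):
        self._entries = defaultdict(set)

    def insert(self, key, node):
        self._entries[key].add(node)

    def delete(self, key, node):
        nodes = self._entries.get(key)
        if nodes is None:
            return
        nodes.discard(node)
        if not nodes:  # drop the entry once no node is left
            del self._entries[key]

    def lookup(self, key):
        return sorted(self._entries.get(key, ()))


index = HashIndex()
index.insert(("TechnicalReport", "Smith"), "3.2.1")
index.insert(("TechnicalReport", "Smith"), "4.3.1")
print(index.lookup(("TechnicalReport", "Smith")))  # ['3.2.1', '4.3.1']

index.delete(("TechnicalReport", "Smith"), "3.2.1")
print(index.lookup(("TechnicalReport", "Smith")))  # ['4.3.1']
```

Because query and update processing would only call the interface, a different structure could replace `HashIndex` without changing the callers.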

Index framework (2)
• index configuration
  • provides mapping from index to specific index structure
  • associates with each index variable the index structure to be used
  • $T1, $E2: kdb-tree
  • $E2: hash table, $T1: B+-tree
• search configuration
  • used to access index
  • associates index key to be searched with each index variable
  • generated by index selection tool
  • $T1= Report, $E2= 'Smith'
[Construct solution]
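The interplay of index configuration and search configuration can be sketched as follows. The sketch assumes that nesting means one lookup level per index variable, in the order the configuration lists them; the class `NestedIndex` is hypothetical, and plain dictionaries stand in for both the B+-tree and the hash table, so the structure names are kept only as labels.

```python
class NestedIndex:
    """Nests one dictionary level per index variable, in configuration order."""

    def __init__(self, configuration):
        # configuration: variable -> name of the index structure (label only here)
        self.configuration = dict(configuration)
        self.variables = list(configuration)
        self._root = {}

    def insert(self, keys, node):
        """keys maps every index variable to the key value of this entry."""
        level = self._root
        for variable in self.variables[:-1]:
            level = level.setdefault(keys[variable], {})
        level.setdefault(keys[self.variables[-1]], []).append(node)

    def search(self, search_configuration):
        """search_configuration maps every index variable to the key to look up."""
        level = self._root
        for variable in self.variables:
            level = level.get(search_configuration[variable])
            if level is None:
                return []
        return list(level)


# Index configuration: outer level on $T1, inner level on $E2.
index = NestedIndex({"$T1": "B+-tree", "$E2": "hash table"})
index.insert({"$T1": "Report", "$E2": "Smith"}, "3.2.1")
index.insert({"$T1": "TechnicalReport", "$E2": "Tim"}, "4.3.1")

# Search configuration, as an index selection tool might generate it.
print(index.search({"$T1": "Report", "$E2": "Smith"}))  # ['3.2.1']
```

A kdb-tree configuration ($T1, $E2: kdb-tree) would instead index both variables in a single multidimensional structure, which this sketch does not model.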

Index maintenance
• propagate document updates to affected indices
• steps
  • find embeddings of index patterns in update fragments
  • execute queries
  • generate index entries
    [(TechnicalReport, 'Smith') -> resource]
    [(TechnicalReport, 'Tim') -> resource]
• up to 9 times faster than existing approach (KeyX)
[Construct / evaluate solution]
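The last step, generating index entries from an update fragment, can be illustrated with Python's standard `xml.etree.ElementTree`. This is a simplified sketch, not the maintenance algorithm of the talk: it hard-codes the pattern //element(resource, $V1)[author=$V2], it does not query the rest of the document, and it assumes the schema type of a resource is available in a hypothetical `type` attribute. The function name and the sample fragment are made up for the example.

```python
import xml.etree.ElementTree as ET


def entries_for_fragment(fragment_xml):
    """Return (key, element) pairs for resource elements found in the fragment.

    Assumption: the type bound to $V1 is read from a hypothetical 'type'
    attribute; a real system would take it from schema validation.
    """
    root = ET.fromstring(fragment_xml)
    entries = []
    for resource in root.iter("resource"):  # candidate embeddings, document order
        type_name = resource.get("type")
        if type_name is None:
            continue
        for author in resource.findall("author"):  # child bound to $V2
            if author.text:
                entries.append(((type_name, author.text.strip()), resource))
    return entries


fragment = """
<milestone id="2">
  <resource type="TechnicalReport"><author>Smith</author></resource>
  <resource type="TechnicalReport"><author>Tim</author></resource>
</milestone>
"""

for key, element in entries_for_fragment(fragment):
    print(key, "->", element.tag)
# ('TechnicalReport', 'Smith') -> resource
# ('TechnicalReport', 'Tim') -> resource
```

Only the inserted fragment is scanned here; embeddings that span the fragment and the existing document would need the query step listed above.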

Conclusion
• select secondary index structures for XML
  • extensible: various properties and operations on these properties
  • nestable: adapt indices to hierarchical queries
• integrate index structures into framework
  • hides indexing tasks from query and update processing tasks
  • provides index model (common index interface)
• index maintenance algorithm
  • propagate updates to index structures
• flexibility to define indices that match the query workload
