A Generic Framework for Querying and Updating Secondary XML Index Structures
Katharina Grün
Motivation
• Widespread use of XML
• XML databases aim at efficient query and update processing
• This requires index structures on the content and structure of documents:
  • primary index structure
    • default index
    • built on the whole document
    • not optimized for specific queries
  • secondary index structures
    • created on demand
    • built on specific document fragments
    • adapted to the query workload
• Framework for querying and updating secondary XML index structures (SCIENS)
[Become aware of problem]
Running example
• path (steps with positions): projects/project[1]/@name
• labelpath (names only): projects/project/@name
• Example queries:
  • //resource[@date >= '2005-01-01']
  • //project[@name='sciens']/milestone[@id=2]/resource[@date >= '2007-01-01']
  • //element(resource, Report)[author='Smith']
[Become aware of problem]
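A path and a label path in the running example differ only in their positional predicates. The following is a minimal Python sketch of deriving one from the other. It assumes positions are always written as numeric predicates such as `[1]`. The helpers `label_path` and `group_by_label_path` are hypothetical and not part of SCIENS.

```python
import re
from typing import Dict, List

# Assumption: positions appear only as numeric predicates such as "[1]".
_POSITION = re.compile(r"\[\d+\]")


def label_path(path: str) -> str:
    """Drop positional predicates, keeping only element/attribute names."""
    return _POSITION.sub("", path)


def group_by_label_path(paths: List[str]) -> Dict[str, List[str]]:
    """Group positional paths under the label path they share."""
    groups: Dict[str, List[str]] = {}
    for p in paths:
        groups.setdefault(label_path(p), []).append(p)
    return groups


example = label_path("projects/project[1]/@name")  # "projects/project/@name"
groups = group_by_label_path([
    "projects/project[1]/@name",
    "projects/project[2]/@name",
])
```

Several positional paths collapse onto one label path. An index keyed by label path therefore cannot tell `project[1]` from `project[2]`, whereas an index keyed by path can.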
Challenges
• Which secondary index structures are necessary?
  • each kind of query is best supported by a different index structure
  • it is not possible to provide one index structure for every possible query
• How to integrate them into a common framework?
  • each secondary index can index arbitrary properties of arbitrary fragments
  • query and update processing must not depend on the specific indices defined
• How to update them when documents change?
  • document updates must be propagated to the affected index structures
  • this requires an incremental index maintenance algorithm
[Become aware of problem]
Related work (1)
• XML databases
  • limited support for secondary index structures
• XML index structures
  • on structure and/or content
  • mostly primary index structures
  • based on different models and proprietary structures
• Object-oriented index structures
  • proprietary structures to support queries on path navigation and/or inheritance hierarchies
• Multidimensional index structures
  • support several value dimensions
  • do not consider structure
[Become aware of problem]
Related work (2)
• Extensible indexing
  • object-relational databases
  • adapt index structures to different data types
• Indexing tasks
  • maintain secondary indices when documents are updated (KeyX¹)
  • select the optimal index for a specific query (XML Access Modules²)
  • suggest a set of indices for a query workload (KeyX¹)
• Currently there is no integrated approach for processing secondary index structures in an XML database
1) B. C. Hammerschmidt: KeyX: Selective Key-Oriented Indexing in Native XML Databases. PhD thesis, University of Lübeck, 2005.
2) A. Arion, V. Benzaken, I. Manolescu: XML Access Modules: Towards Physical Data Independence in XML Databases. XIMEP Workshop, 2005.
[Become aware of problem]
SCIENS - Ideas
• Structure and Content Indexing with Extensible, Nestable Structures
• Which secondary index structures are necessary?
  • select a small set of index structures and adapt them to various properties
  • nest index structures to reflect hierarchical queries
• How to integrate them into a common framework?
  • provide an index model
  • common index interface to query and update indices
• How to update them when documents change?
  • index maintenance algorithm that determines the updates for arbitrary indices
  • based on update fragments and index definitions
[Suggest solution]
Index structures – one dimension (1)
• Value indexing
  • hash table or B+-tree on the value
  • e.g. @date >= '2005-01-01'
• Structure indexing
  • hash table or B+-tree on path/labelpath/type
  • e.g. //resource
  • e.g. /project[1]//resource
  • e.g. /project[2]/milestone[2]/resource
[Construct solution]
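The two one-dimensional index kinds can be sketched in Python, with a sorted list standing in for a B+-tree. Both support ordered range scans; the sketch ignores paging and cost. The class names, node ids and sample data are hypothetical. The range query relies on ISO dates comparing correctly as strings, and the structure query on the fact that all paths below a prefix are contiguous in sorted order.

```python
from bisect import bisect_left, insort
from typing import List, Tuple


class ValueIndex:
    """Ordered index on values; a sorted list stands in for a B+-tree."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, str]] = []  # sorted (value, node id)

    def insert(self, value: str, node_id: str) -> None:
        insort(self._entries, (value, node_id))

    def at_least(self, low: str) -> List[str]:
        """Node ids with value >= low, e.g. @date >= '2005-01-01'.
        Zero-padded ISO dates compare correctly as strings."""
        i = bisect_left(self._entries, (low,))
        return [node for _, node in self._entries[i:]]


class PathIndex:
    """Structure index on positional paths, kept sorted so that all paths
    below a prefix such as 'projects/project[1]/' are contiguous."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, str]] = []  # sorted (path, node id)

    def insert(self, path: str, node_id: str) -> None:
        insort(self._entries, (path, node_id))

    def descendants_named(self, prefix: str, label: str) -> List[str]:
        """Nodes below `prefix` whose last step is `label`, e.g.
        /project[1]//resource -> ('projects/project[1]/', 'resource')."""
        i = bisect_left(self._entries, (prefix,))
        hits = []
        while i < len(self._entries) and self._entries[i][0].startswith(prefix):
            path, node = self._entries[i]
            last_step = path.rsplit("/", 1)[-1].split("[", 1)[0]
            if last_step == label:
                hits.append(node)
            i += 1
        return hits


dates = ValueIndex()
for value, node in [("2004-06-30", "1.1.1"), ("2005-03-01", "1.2.1"), ("2007-02-15", "2.1.1")]:
    dates.insert(value, node)
recent = dates.at_least("2005-01-01")  # ["1.2.1", "2.1.1"]

paths = PathIndex()
for path, node in [
    ("projects/project[1]/@name", "1.0"),
    ("projects/project[1]/milestone[1]/resource[1]", "1.1.1"),
    ("projects/project[1]/milestone[2]/resource[1]", "1.2.1"),
    ("projects/project[2]/milestone[1]/resource[1]", "2.1.1"),
]:
    paths.insert(path, node)
in_project1 = paths.descendants_named("projects/project[1]/", "resource")  # ["1.1.1", "1.2.1"]
```

A hash table (a Python `dict`) is enough when queries only ask for exact keys such as a full label path. The ordered variant also answers range and prefix queries.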
Index structures – one dimension (2)
[Construct solution]
Index structures – multiple dimensions (1)
1) J. Robinson: The KDB-tree: A Search Structure for Large Multidimensional Dynamic Indexes. SIGMOD, ACM Press, 1981.
[Construct solution]
Index structures – multiple dimensions (2)
[Construct solution]
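The KDB-tree is a paged, balanced structure. As a simpler stand-in that shows the same multidimensional idea, here is an in-memory k-d tree with a closed-box range search in Python. This is a swapped-in technique, not the KDB-tree itself. The two dimensions (project position, resource date), the node ids and the function names are hypothetical choices that mirror a query like //project[1]//resource[@date > '2005-01-01'].

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Sequence, Tuple

Point = Tuple[Any, ...]


@dataclass
class KDNode:
    point: Point
    payload: str                       # e.g. a node id
    axis: int                          # dimension used to split at this node
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None


def build(items: Sequence[Tuple[Point, str]], depth: int = 0) -> Optional[KDNode]:
    """Build a k-d tree by splitting on the median, cycling through dimensions."""
    if not items:
        return None
    dims = len(items[0][0])
    axis = depth % dims
    ordered = sorted(items, key=lambda item: item[0][axis])
    mid = len(ordered) // 2
    point, payload = ordered[mid]
    return KDNode(point, payload, axis,
                  build(ordered[:mid], depth + 1),
                  build(ordered[mid + 1:], depth + 1))


def range_search(node: Optional[KDNode], low: Point, high: Point,
                 out: Optional[List[str]] = None) -> List[str]:
    """Collect payloads whose point p satisfies low[i] <= p[i] <= high[i] for all i."""
    if out is None:
        out = []
    if node is None:
        return out
    if all(lo <= x <= hi for x, lo, hi in zip(node.point, low, high)):
        out.append(node.payload)
    a = node.axis
    # The left subtree only holds values <= the split value on this axis and
    # the right subtree only values >= it, so whole subtrees can be pruned.
    if low[a] <= node.point[a]:
        range_search(node.left, low, high, out)
    if node.point[a] <= high[a]:
        range_search(node.right, low, high, out)
    return out


# (project position, resource date) -> resource node id (hypothetical data)
tree = build([
    ((1, "2004-05-01"), "1.1.1"),
    ((1, "2006-01-10"), "1.2.1"),
    ((1, "2007-03-03"), "1.3.1"),
    ((2, "2005-07-07"), "2.1.1"),
])
hits = range_search(tree, (1, "2005-01-01"), (1, "9999-12-31"))
```

A KDB-tree adds what this sketch omits: disk pages, balancing under insertions and deletions, and region pages that partition the key space.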
Comparison
• queries and indices on the milestone hierarchy and date
  • e.g. //project[1]//resource[@date > '2005-01-01']
• define the index that best matches the query workload
[Evaluate solution]
Index framework (1)
• index
  • search function consisting of a set of index entries
  • provides an interface to update and retrieve index entries
• index entry
  • maps index keys (value, type, path,…) to the returned nodes
  • e.g. TechnicalReport, Smith -> 3.2.1, 4.3.1,...
• index definition
  • selects the nodes to be indexed
  • e.g. //element(resource, $V1)[author=$V2]
  • represented as an unordered tree pattern with index variables
• index structure
  • specific data structure (hash table, prefix B+-tree, kdb-tree)
  • one index can use several index structures (index nesting)
[Construct solution]
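The index, index entry and index definition can be pictured as a small data model. This minimal Python sketch assumes that an index entry maps a tuple of variable values to node ids and that a plain hash map backs the index. `PatternNode`, `IndexDefinition` and `Index` with `insert`/`delete`/`lookup` are hypothetical names, not the SCIENS interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class PatternNode:
    """Node of an unordered tree pattern. `var` optionally names an index
    variable bound here (assumption: on `resource` it binds the type
    annotation, on `author` the text value)."""
    label: str
    var: Optional[str] = None
    descendant: bool = False          # reached via '//' instead of '/'
    children: List["PatternNode"] = field(default_factory=list)


def variables_of(node: PatternNode) -> List[str]:
    """All index variables occurring in the pattern rooted at `node`."""
    found = [node.var] if node.var else []
    for child in node.children:
        found.extend(variables_of(child))
    return found


@dataclass
class IndexDefinition:
    pattern: PatternNode
    key_vars: Tuple[str, ...]         # order of the components of an index key

    def __post_init__(self) -> None:
        if set(self.key_vars) != set(variables_of(self.pattern)):
            raise ValueError("key variables must match the pattern's variables")


class Index:
    """Common interface: index keys (one value per variable) -> node ids."""

    def __init__(self, definition: IndexDefinition) -> None:
        self.definition = definition
        self._entries: Dict[Tuple[str, ...], List[str]] = {}

    def insert(self, key: Tuple[str, ...], node_id: str) -> None:
        if len(key) != len(self.definition.key_vars):
            raise ValueError("key arity does not match the index definition")
        self._entries.setdefault(key, []).append(node_id)

    def delete(self, key: Tuple[str, ...], node_id: str) -> None:
        nodes = self._entries.get(key, [])
        if node_id in nodes:
            nodes.remove(node_id)
            if not nodes:
                del self._entries[key]

    def lookup(self, key: Tuple[str, ...]) -> List[str]:
        return list(self._entries.get(key, []))


# //element(resource, $V1)[author=$V2]
definition = IndexDefinition(
    PatternNode("resource", var="$V1", descendant=True,
                children=[PatternNode("author", var="$V2")]),
    key_vars=("$V1", "$V2"),
)
index = Index(definition)
index.insert(("TechnicalReport", "Smith"), "3.2.1")
index.insert(("TechnicalReport", "Smith"), "4.3.1")
```

If query and update code talk only to `insert`, `delete` and `lookup`, the concrete index structure behind them (hash table, prefix B+-tree, kdb-tree) can change without affecting that code.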
Index framework (2)
• index configuration
  • provides the mapping from an index to specific index structures
  • associates each index variable with the index structure to be used
  • e.g. $T1, $E2: kdb-tree
  • e.g. $E2: hash table, $T1: B+-tree
• search configuration
  • used to access an index
  • associates each index variable with the index key to be searched
  • generated by an index selection tool
  • e.g. $T1 = Report, $E2 = 'Smith'
[Construct solution]
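The second index configuration nests two index structures. Below is a minimal Python sketch of that nesting and of evaluating the search configuration. It rests on three hypothetical assumptions: the first-listed variable forms the outer level, TechnicalReport is derived from Report, and types are stored as hierarchy paths. The last assumption lets a search for `Report` also find `TechnicalReport`, in the same way `element(resource, Report)` matches derived types in XQuery. `NestedIndex` and its methods are illustrative names only.

```python
from bisect import bisect_left, insort
from typing import Dict, List, Tuple


class NestedIndex:
    """Sketch of the configuration '$E2: hash table, $T1: B+-tree'.
    Outer level: dict (hash table) on $E2 = author value.
    Inner level: sorted list standing in for a B+-tree on $T1 = type,
    each type stored as its hierarchy path, e.g. 'Report/TechnicalReport'."""

    def __init__(self) -> None:
        self._by_author: Dict[str, List[Tuple[str, str]]] = {}

    def insert(self, author: str, type_path: str, node_id: str) -> None:
        insort(self._by_author.setdefault(author, []), (type_path, node_id))

    def search(self, config: Dict[str, str]) -> List[str]:
        """Evaluate a search configuration such as {'$T1': 'Report', '$E2': 'Smith'}.
        $T1 is a type path; it matches itself and all of its subtypes."""
        inner = self._by_author.get(config["$E2"], [])  # hash lookup on $E2
        wanted = config["$T1"]
        i = bisect_left(inner, (wanted,))
        hits = []
        # All keys beginning with `wanted` are contiguous in sorted order.
        while i < len(inner) and inner[i][0].startswith(wanted):
            type_path, node_id = inner[i]
            if type_path == wanted or type_path.startswith(wanted + "/"):
                hits.append(node_id)
            i += 1
        return hits


index = NestedIndex()
index.insert("Smith", "Report/TechnicalReport", "3.2.1")
index.insert("Smith", "Report/TechnicalReport", "4.3.1")
index.insert("Smith", "ReportDraft", "6.1.1")   # hypothetical unrelated type
index.insert("Tim", "Report/TechnicalReport", "5.1.1")
smith_reports = index.search({"$T1": "Report", "$E2": "Smith"})  # ["3.2.1", "4.3.1"]
```

The other configuration, `$T1, $E2: kdb-tree`, would instead store both keys as one point in a single multidimensional structure.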
Index maintenance
• propagate document updates to the affected indices
• steps
  • find embeddings of index patterns in update fragments
  • execute queries
  • generate index entries, e.g. [(TechnicalReport, 'Smith') resource] and [(TechnicalReport, 'Tim') resource]
• up to 9 times faster than the existing approach (KeyX)
[Construct / evaluate solution]
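The maintenance steps can be sketched for inserted fragments. This minimal Python sketch assumes that every pattern embedding lies entirely inside the update fragment, so it omits the "execute queries" step, which would evaluate the parts of a pattern that reach into the existing document. It also assumes the index is a plain dict. All names are hypothetical. A deletion would remove the same entries instead of adding them.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Dict, Iterator, List, Optional, Tuple

Binding = Dict[str, str]


@dataclass
class DocNode:
    node_id: str                       # Dewey-style id, e.g. "3.2.1"
    label: str
    type_name: str = "untyped"
    text: str = ""
    children: List["DocNode"] = field(default_factory=list)


@dataclass
class Pattern:
    label: str
    type_var: Optional[str] = None     # binds the node's type, e.g. $T1
    text_var: Optional[str] = None     # binds the node's text value, e.g. $E2
    children: List["Pattern"] = field(default_factory=list)


def embeddings(p: Pattern, d: DocNode) -> List[Binding]:
    """All variable bindings for mapping pattern node p onto document node d.
    Pattern children are matched against document children (child axis only)."""
    if p.label != d.label:
        return []
    own: Binding = {}
    if p.type_var:
        own[p.type_var] = d.type_name
    if p.text_var:
        own[p.text_var] = d.text
    per_child: List[List[Binding]] = []
    for pc in p.children:
        options = [b for dc in d.children for b in embeddings(pc, dc)]
        if not options:
            return []                  # a required pattern child has no match
        per_child.append(options)
    results = []
    for combo in product(*per_child):  # one binding per pattern child
        merged = dict(own)
        for b in combo:
            merged.update(b)
        results.append(merged)
    return results


def iter_nodes(d: DocNode) -> Iterator[DocNode]:
    yield d
    for c in d.children:
        yield from iter_nodes(c)


def maintain_insert(index: Dict[Tuple[str, ...], List[str]], pattern: Pattern,
                    key_vars: Tuple[str, ...],
                    fragment: DocNode) -> List[Tuple[Tuple[str, ...], str]]:
    """Add index entries for an inserted fragment. The pattern root may match
    at any fragment node ('//'); each embedding yields one entry."""
    added = []
    for d in iter_nodes(fragment):
        for b in embeddings(pattern, d):
            key = tuple(b[v] for v in key_vars)
            index.setdefault(key, []).append(d.node_id)
            added.append((key, d.node_id))
    return added


# Inserted fragment: one TechnicalReport resource with two authors.
fragment = DocNode("3.2.1", "resource", "TechnicalReport", children=[
    DocNode("3.2.1.1", "author", text="Smith"),
    DocNode("3.2.1.2", "author", text="Tim"),
])
pattern = Pattern("resource", type_var="$T1",
                  children=[Pattern("author", text_var="$E2")])
index: Dict[Tuple[str, ...], List[str]] = {}
added = maintain_insert(index, pattern, ("$T1", "$E2"), fragment)
```

The single `resource` node has two `author` children, so the fragment yields two entries. They correspond to the pair [(TechnicalReport, 'Smith') resource] and [(TechnicalReport, 'Tim') resource] on the slide.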
Conclusion
• select secondary index structures for XML
  • extensible: various properties and operations on these properties
  • nestable: adapt indices to hierarchical queries
• integrate index structures into a framework
  • hides indexing tasks from query and update processing
  • provides an index model (common index interface)
• index maintenance algorithm
  • propagates updates to the index structures
• flexibility to define indices that match the query workload