This workshop at IIT Bombay explores the need for XML data indexing and the different types of indexes that can be used. Topics include queries and indexes in traditional DBMS, querying in XML, path and value indexes, and performance improvements with indexing.
Indexing of XML Data Raghuraman Rangarajan KReSIT, IIT Bombay. XML Workshop, IIT Bombay
Plan of Talk
• Why is indexing needed?
• Queries and Indexes in Traditional DBMS
• Querying in XML
• Indexes: Path, Value
• Conclusion

Why is Indexing Needed?
• An index allows fast access to data by replicating portions of the data in special-purpose structures.
• Despite their additional cost (storage, maintenance and complexity), indexes have been shown to be useful in evaluating queries.

Queries and Indexes in Traditional DBMS

An XML Fragment
[Figure: XML tree with element labels part, subpart, supplier, name and address; leaf values omitted]

Queries in XML
1. SELECT X FROM part._*.supplier.name X
2. SELECT X FROM part._*.supplier: {name X, address: “Mumbai”}

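Without any index, both queries can be answered by walking the whole tree and testing each root-to-node label path against the path expression. The following is a minimal sketch of that baseline, assuming `_` stands for any single label and `_*` for any sequence of zero or more labels. The `Node` class, the helper names and the sample values ("engine", "piston", "ABC", "XYZ", "Pune") are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    value: Optional[str] = None              # only leaves carry a value
    children: List["Node"] = field(default_factory=list)


def matches(steps, labels):
    """True if the label sequence satisfies the path expression steps."""
    if not steps:
        return not labels
    head, rest = steps[0], steps[1:]
    if head == "_*":
        # Let the wildcard consume 0, 1, 2, ... labels.
        return any(matches(rest, labels[i:]) for i in range(len(labels) + 1))
    if not labels:
        return False
    return (head == "_" or head == labels[0]) and matches(rest, labels[1:])


def select(root, expr):
    """Return every node whose root-to-node label path matches expr."""
    steps = expr.split(".")
    found = []

    def walk(node, path):
        path = path + [node.label]
        if matches(steps, path):
            found.append(node)
        for child in node.children:
            walk(child, path)

    walk(root, [])
    return found


def supplier_names_in(root, city):
    """Query 2: names of suppliers whose address equals city."""
    names = []
    for sup in select(root, "part._*.supplier"):
        kids = sup.children
        if any(k.label == "address" and k.value == city for k in kids):
            names += [k.value for k in kids if k.label == "name"]
    return names


def make_supplier(name, address):
    return Node("supplier", children=[Node("name", name), Node("address", address)])


# Hypothetical sample document.
doc = Node("part", children=[
    Node("name", "engine"),
    make_supplier("ABC", "Mumbai"),
    Node("subpart", children=[Node("name", "piston"), make_supplier("XYZ", "Pune")]),
])

query1 = [n.value for n in select(doc, "part._*.supplier.name")]
query2 = supplier_names_in(doc, "Mumbai")
print(query1, query2)
```

Every query visits every node here, which is the cost an index is meant to avoid.
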
Indexes for XML
• Path indexes: for evaluating regular path expressions
• Value indexes: for locating atomic objects

Building a Path Index
[Figure: the XML fragment (labels part, subpart, supplier, name, address) shown together with its path index, whose nodes are numbered h1 to h7]

Path Index
[Figure: path index with nodes h1 to h7 and edges labelled part, subpart, supplier, name and address]
• The index summarises the path information in the data
• Each index entry holds a list of pointers to data nodes

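One way to picture such an index for tree-shaped data is a trie over label paths in which every trie node keeps the list of data nodes reached by that path. The sketch below assumes this simplified structure; it is not the exact construction used by Lore or XSet. The `(label, children)` tuple encoding, the preorder node ids and the function names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IndexNode:
    extent: List[int] = field(default_factory=list)            # pointers (ids) to data nodes
    children: Dict[str, "IndexNode"] = field(default_factory=dict)


def build_path_index(root):
    """Build a trie of label paths; data nodes get ids in preorder."""
    index_root = IndexNode()          # virtual node above the document root
    counter = 0

    def visit(node, index_parent):
        nonlocal counter
        label, kids = node
        # All data nodes sharing a label path share one index entry.
        entry = index_parent.children.setdefault(label, IndexNode())
        entry.extent.append(counter)
        counter += 1
        for kid in kids:
            visit(kid, entry)

    visit(root, index_root)
    return index_root


def lookup(index, path):
    """Follow a simple dotted path through the index, no data access needed."""
    node = index
    for label in path.split("."):
        node = node.children.get(label)
        if node is None:
            return []
    return node.extent


# Hypothetical data tree as (label, children) pairs, leaf values omitted.
doc = ("part", [
    ("name", []),
    ("supplier", [("name", []), ("address", [])]),
    ("supplier", [("name", []), ("address", [])]),
    ("subpart", [("name", []),
                 ("supplier", [("name", []), ("address", [])])]),
])

index = build_path_index(doc)
print(lookup(index, "part.supplier.name"))
print(lookup(index, "part.subpart.supplier.name"))
```

The data has thirteen nodes but only ten distinct label paths, so the index is smaller than the data it summarises; the saving grows as the data becomes more regular.
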
Using Path Index for Regular Path Expressions
[Figure: path index with nodes h1 to h7]
(R1) part.name
(R2) part.supplier.name
(R3) _*.supplier.name
(R4) part._*.subpart.name

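Expressions like R1 to R4 can be evaluated against the index alone: each expression is matched against the distinct label paths stored in the index, and the pointer lists of the matching entries are merged. The sketch below assumes a flattened index (a dictionary from label path to node ids) with made-up ids, and assumes `_` means any single label and `_*` any sequence of zero or more labels. `PATH_INDEX`, `compile_path` and `evaluate` are hypothetical names.

```python
import re

# Hypothetical flattened path index: label path -> ids of matching data nodes.
PATH_INDEX = {
    "part": [0],
    "part.name": [1],
    "part.supplier": [2, 5],
    "part.supplier.name": [3, 6],
    "part.supplier.address": [4, 7],
    "part.subpart": [8],
    "part.subpart.name": [9],
    "part.subpart.supplier": [10],
    "part.subpart.supplier.name": [11],
    "part.subpart.supplier.address": [12],
}


def compile_path(expr):
    """Translate a regular path expression into a regex over 'a.b.c.' strings."""
    parts = []
    for step in expr.split("."):
        if step == "_*":
            parts.append(r"(?:[^.]+\.)*")      # zero or more labels
        elif step == "_":
            parts.append(r"[^.]+\.")           # exactly one label
        else:
            parts.append(re.escape(step) + r"\.")
    return re.compile("".join(parts))


def evaluate(expr, index=PATH_INDEX):
    """Match expr against index paths only and merge the pointer lists."""
    pattern = compile_path(expr)
    hits = []
    for path, extent in index.items():
        if pattern.fullmatch(path + "."):
            hits.extend(extent)
    return sorted(hits)


for expr in ("part.name", "part.supplier.name",
             "_*.supplier.name", "part._*.subpart.name"):
    print(expr, evaluate(expr))
```

The work done per query depends on the number of distinct paths in the index rather than on the number of data nodes.
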
Path Indexes
• XSet project (Berkeley)
• DataGuides (Lore, Stanford)

Value Index
• Useful for comparisons (=, <, etc.)
• Example: find the supplier whose name is “XYZ”
[Figure: VIndex(name) pointing into the part/subpart/supplier tree, with leaf values “XYZ” and “ABC”]

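A value index over `name` can be sketched as a sorted list of (value, node id) pairs searched with binary search, which serves both equality and range comparisons. This is only an illustration under stated assumptions: the `ValueIndex` class, the node ids, the `PARENT` map and the value "PQR" are hypothetical, and a real system would more likely use a B-tree or hash structure.

```python
import bisect


class ValueIndex:
    """Sorted (value, node_id) pairs for one element label, e.g. name."""

    def __init__(self, pairs):
        self._entries = sorted(pairs)
        self._keys = [value for value, _ in self._entries]

    def equal(self, value):
        lo = bisect.bisect_left(self._keys, value)
        hi = bisect.bisect_right(self._keys, value)
        return [node_id for _, node_id in self._entries[lo:hi]]

    def less_than(self, value):
        hi = bisect.bisect_left(self._keys, value)
        return [node_id for _, node_id in self._entries[:hi]]


# Hypothetical data: name nodes 3, 6 and 11 belong to suppliers 2, 5 and 10.
vindex_name = ValueIndex([("XYZ", 3), ("ABC", 6), ("PQR", 11)])
PARENT = {3: 2, 6: 5, 11: 10}

# Find the supplier whose name is "XYZ": look up the value, then step to the parent.
suppliers_xyz = [PARENT[n] for n in vindex_name.equal("XYZ")]
names_before_pqr = vindex_name.less_than("PQR")
print(suppliers_xyz, names_before_pqr)
```

The index returns the atomic `name` nodes; reaching the enclosing supplier still needs a parent pointer or a path check, which is why value and path indexes are typically used together.
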
Other Indexes
• Text indexes: information-retrieval-style keyword search.
• Example: find the suppliers in Mumbai (“address”).
• Also support search operators such as AND, OR and NEAR.

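A text index of this kind is commonly built as an inverted index from words to the nodes (and word positions) where they occur. The sketch below is a minimal illustration, assuming whitespace tokenisation and lower-casing; the address strings, node ids and function names are hypothetical, and NEAR is interpreted here as "within a given number of word positions".

```python
from collections import defaultdict


def build_text_index(texts):
    """texts: node_id -> text. Returns word -> {node_id: [word positions]}."""
    index = defaultdict(dict)
    for node_id, text in texts.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(node_id, []).append(pos)
    return index


def search(index, word):
    return sorted(index.get(word, {}))


def search_and(index, a, b):
    return sorted(index.get(a, {}).keys() & index.get(b, {}).keys())


def search_or(index, a, b):
    return sorted(index.get(a, {}).keys() | index.get(b, {}).keys())


def search_near(index, a, b, distance=1):
    """Nodes where a and b occur within `distance` word positions."""
    hits = []
    for node_id in search_and(index, a, b):
        pos_a, pos_b = index[a][node_id], index[b][node_id]
        if any(abs(i - j) <= distance for i in pos_a for j in pos_b):
            hits.append(node_id)
    return hits


# Hypothetical address values keyed by the id of the address node.
addresses = {4: "Powai Mumbai", 7: "Andheri East Mumbai", 12: "Shivajinagar Pune"}
text_index = build_text_index(addresses)

print(search(text_index, "mumbai"))
print(search_and(text_index, "andheri", "mumbai"))
print(search_or(text_index, "powai", "pune"))
print(search_near(text_index, "andheri", "mumbai", distance=2))
```
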
Conclusion
• Performance improves significantly when indexing is used for query processing (Lore).
• Performance of the path indexes depends on the type of queries.

References
• The Lore Project (www-db.stanford.edu/lore)
• Work done by Dan Suciu (www.research.att.com/~suciu/)
• Serge Abiteboul et al., Data on the Web