
Indexing method

Indexing method. Data Warehousing Lab. M.S. 3 HyunSuk Jung 2003.9.30. Contents: Index in Lore, DataGuides, Index Fabric, ToXin.


Presentation Transcript


  1. Indexing method Data Warehousing Lab. M.S. 3 HyunSuk Jung 2003.9.30

  2. Contents • Index in Lore • DataGuides • Index Fabric • ToXin

  3. Index in Lore • Value Index, Vindex • locates atomic objects with a certain value • Text Index, Tindex • locates string atomic values containing specific words or groups of words • Link Index, Lindex • locates the parents of a specific object • Path Index, Pindex • locates objects reachable via a given labeled path

  4. 1. Vindex • Satisfies basic comparisons, e.g., =, < • Query takes a triple (l, op, v): label, operator, value • Returns one object or a set of objects • Example • Suppose: we have a Vindex for label section • Query: values > 15.00 with incoming edge section • Result: {&3, &4}
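The (l, op, v) query on the Vindex slide can be sketched as follows. This is a minimal illustration, not Lore's implementation: the edge list, the OIDs other than &3 and &4, and the function names are hypothetical, and a linear scan over a sorted list stands in for whatever index structure a real system would use.

```python
import operator
from collections import defaultdict

# Hypothetical toy data: (parent_oid, label, child_oid, atomic_value).
EDGES = [
    ("&2", "section", "&5", 12.50),
    ("&2", "section", "&3", 16.00),
    ("&2", "section", "&4", 21.75),
]

# Comparison operators a value index is expected to answer.
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt,
       "<=": operator.le, ">=": operator.ge}


def build_vindex(edges):
    """Map each label to a list of (value, oid) pairs sorted by value."""
    index = defaultdict(list)
    for _parent, label, oid, value in edges:
        index[label].append((value, oid))
    for pairs in index.values():
        pairs.sort()
    return index


def vindex_lookup(index, label, op, value):
    """Answer a (l, op, v) query by scanning the entries stored for label l."""
    compare = OPS[op]
    return {oid for v, oid in index.get(label, []) if compare(v, value)}


vindex = build_vindex(EDGES)
result = vindex_lookup(vindex, "section", ">", 15.00)
print(sorted(result))
```

With this toy data the query returns the two objects whose value exceeds 15.00, mirroring the result set on the slide.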

  5. 2. Tindex • For keyword search • Query takes two values (w, l): word, label • Returns OIDs with postings: <o, n> • Example • Suppose: we have a Tindex for label section • Query: select objects containing the word “index” with incoming edge section • Result: {<&3, 1>, <&4, 2>}
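A Tindex behaves like an inverted index keyed by (word, label). The sketch below assumes that n in a posting <o, n> is the 1-based position of the word inside the string; the object texts and helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical string-valued objects: oid -> (incoming label, text).
OBJECTS = {
    "&3": ("section", "index structures for graphs"),
    "&4": ("section", "path index"),
    "&5": ("title", "index methods"),
}


def build_tindex(objects):
    """Inverted index: (word, label) -> list of (oid, word position) postings."""
    index = defaultdict(list)
    for oid, (label, text) in objects.items():
        for position, word in enumerate(text.lower().split(), start=1):
            index[(word, label)].append((oid, position))
    return index


tindex = build_tindex(OBJECTS)
postings = tindex[("index", "section")]
print(postings)
```

Only objects reached through a section edge are returned, so the title object &5 is excluded even though it also contains the word.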

  6. 3. Lindex • Retrieves the parents of an object via a given label • Query takes a “child” object c and a label l • Returns all parents p such that there is an l-labeled edge from p to c • Example • Suppose: all objects containing “index” have been located via the Tindex • Query: select the parent objects via incoming edge section • Result: {&2}
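The parent lookup can be sketched as a reverse-edge map. The edge list and function names are hypothetical, and the set of child objects is assumed to come from an earlier Tindex lookup.

```python
from collections import defaultdict

# Hypothetical edges: (parent_oid, label, child_oid).
EDGES = [
    ("&1", "chapter", "&2"),
    ("&2", "section", "&3"),
    ("&2", "section", "&4"),
    ("&1", "title", "&5"),
]


def build_lindex(edges):
    """Reverse-edge index: (child_oid, label) -> set of parent oids."""
    index = defaultdict(set)
    for parent, label, child in edges:
        index[(child, label)].add(parent)
    return index


def parents_of(index, children, label):
    """Collect every parent p with an edge labeled `label` from p to some child."""
    found = set()
    for child in children:
        found |= index.get((child, label), set())
    return found


lindex = build_lindex(EDGES)
# Children assumed to be the objects a Tindex lookup for "index" returned.
parents = parents_of(lindex, {"&3", "&4"}, "section")
print(parents)
```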

  7. 4. Pindex • Finds all objects reachable via a path P • DataGuide: a dynamic structural summary of all possible paths • Stores OIDs and statistics • Example • Query: select book.chapter.section • Result: {&3, &4}

  8. DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases Roy Goldman, Jennifer Widom VLDB 1997

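  9. Foundations (1/2) • Definitions • Label path: a sequence of one or more dot-separated labels, l1.l2…ln, that can be traversed starting from an object. ex) label paths of object 1: Restaurant.name, Bar • Data path: l1.o1.l2.o2…ln.on, alternating labels and object IDs. ex) a data path of object 1: Restaurant.2.name.5 • Target set: t = {o | l1.o1.l2.o2…ln.o is a data path instance of the label path}, the set of all objects reached by traversing a given label path. ex) target set of Restaurant.Entrée = {6, 10, 11}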
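The three definitions can be made concrete with a small traversal sketch. The graph below is hypothetical; it is only shaped so that the examples quoted on the slide (Restaurant.2.name.5 and the target set {6, 10, 11}) come out the same way.

```python
# Hypothetical OEM-style graph: oid -> list of (label, child_oid).
GRAPH = {
    1: [("Restaurant", 2), ("Restaurant", 3), ("Bar", 4)],
    2: [("name", 5), ("Entrée", 6)],
    3: [("name", 7), ("Entrée", 10), ("Entrée", 11)],
}


def target_set(graph, root, label_path):
    """Objects reached from `root` by following the dot-separated label path."""
    current = {root}
    for label in label_path.split("."):
        current = {child
                   for oid in current
                   for edge_label, child in graph.get(oid, [])
                   if edge_label == label}
    return current


def data_paths(graph, root, label_path):
    """All data paths l1.o1.l2.o2...ln.on that are instances of the label path."""
    paths = [((), root)]
    for label in label_path.split("."):
        paths = [(prefix + (label, child), child)
                 for prefix, oid in paths
                 for edge_label, child in graph.get(oid, [])
                 if edge_label == label]
    return [".".join(str(part) for part in prefix) for prefix, _ in paths]


print(target_set(GRAPH, 1, "Restaurant.Entrée"))
print(data_paths(GRAPH, 1, "Restaurant.name"))
```

The target set is exactly the set of final objects over all data path instances, which is why the two functions share the same traversal loop.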

  10. Foundations(2/2) • Database tree vs. schema tree

  11. Concept of a DataGuide • Summary of label paths from the root (= simple paths) • Concise: describes every unique simple path exactly once, regardless of the number of times it appears • Accurate: does not contain label paths that do not appear in the data • Convenient: can be stored and accessed using techniques similar to those available for processing semistructured data

  12. Notice • A DataGuide contains no atomic values. • Since a DataGuide is intended to reflect the structure of a database, atomic values are unnecessary. • Every target set in a DataGuide is a singleton set. • Since any DataGuide label path has just one data path instance, the target set contains only one object.

  13. Existence of Multiple DataGuides • Minimal DataGuides • (c): the smallest possible DataGuide • A minimal DataGuide is not always the best choice. • Incremental maintenance problem • Annotation problem <Figure 3. A source and two DataGuides>

  14. Strong DataGuide <Figure: two sources and their strong DataGuides> • If the sets of nodes reachable by two simple paths are equal, the simple paths are represented by a single node. • Linear time and linear space for tree-structured data • Exponential time and exponential space for graph-structured data
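One way to picture the construction is as a subset construction, similar to converting an NFA to a DFA: each DataGuide node stands for one target set, and label paths with equal target sets share a node. The sketch below rests on that assumption; the source graph is a hypothetical example, and the dictionary representation is illustrative rather than the authors' data structure.

```python
from collections import defaultdict

# Hypothetical source graph: oid -> list of (label, child_oid).
SOURCE = {
    1: [("A", 2), ("A", 4), ("B", 6)],
    2: [("C", 3)],
    4: [("C", 5)],
    6: [("C", 5)],
}


def strong_dataguide(source, root):
    """Return {target_set: {label: target_set}}, keyed by frozensets of oids."""
    guide = {}
    pending = [frozenset([root])]
    while pending:
        targets = pending.pop()
        if targets in guide:
            continue  # this target set already has a DataGuide node
        by_label = defaultdict(set)
        for oid in targets:
            for label, child in source.get(oid, []):
                by_label[label].add(child)
        guide[targets] = {label: frozenset(children)
                          for label, children in by_label.items()}
        pending.extend(guide[targets].values())
    return guide


guide = strong_dataguide(SOURCE, 1)
for targets, edges in guide.items():
    print(sorted(targets), {label: sorted(t) for label, t in edges.items()})
```

In this example the two A edges collapse into one node for {2, 4}, while A.C and B.C keep separate nodes because their target sets ({3, 5} and {5}) differ. The number of distinct target sets can grow exponentially on graph-shaped data, which is the cost the slide mentions.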

  15. Incremental maintenance • How to update • DataGuide before adding the dashed B edge: (b) • DataGuide after adding the dashed B edge: (c) • Since the target set of the B edge is also {2,3}, node 10 of (b) disappears and, as shown in (c), the B edge now also points to node 9. <Insertion of an edge; strong DataGuide>

  16. A Fast Index for Semistructured Data Brian F. Cooper, Neal Sample, Michael J. Franklin, Gísli R. Hjaltason, Moshe Shadmon VLDB 2001

  17. Index Fabric • The Index Fabric indexes both paths and content of tree databases in a balanced hierarchy of Patricia tries. • Trie & Patricia trie <Trie> <Patricia Trie> • A Patricia trie enhances the conventional trie through string compression. • Lossy compression: risk of false matches. Ex) inbox -> solved by the annotated Index Fabric

  18. Index Fabric • Tree Structured Data • Conceptually similar to a strong DataGuide • Layered structure • Uses Patricia tries to index a large number of search keys • The simple path of an element that has a data value is encoded as a special character sequence • Stores keys that combine the encoded sequence with the data value.

  19. Indexing XML with the Index Fabric • Designator: a unique character assigned to each tag. Ex) <invoice> <buyer><name> ABC Corp </name></buyer> </invoice> is encoded as “IBNABC Corp” • Raw paths • Index the hierarchical structure of XML by encoding each root-to-leaf path as a string. Ex) <A>alpha<B>beta<C>gamma</C></B></A> • The three root-to-leaf paths: <A>alpha, <A><B>beta, <A><B><C>gamma
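The raw-path encoding can be sketched as a walk over the XML tree that emits one key per element with text. The designator table and function name are hypothetical; only the resulting key “IBNABC Corp” and the three A/B/C paths come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical designator table: one character per tag name.
DESIGNATORS = {"invoice": "I", "buyer": "B", "name": "N",
               "A": "A", "B": "B", "C": "C"}


def raw_path_keys(xml_text, designators):
    """Encode every root-to-value path as designator string + data value."""
    keys = []

    def walk(element, prefix):
        prefix = prefix + designators[element.tag]
        text = (element.text or "").strip()
        if text:
            keys.append(prefix + text)
        for child in element:
            walk(child, prefix)

    walk(ET.fromstring(xml_text), "")
    return keys


invoice_keys = raw_path_keys(
    "<invoice><buyer><name> ABC Corp </name></buyer></invoice>", DESIGNATORS)
abc_keys = raw_path_keys("<A>alpha<B>beta<C>gamma</C></B></A>", DESIGNATORS)
print(invoice_keys, abc_keys)
```

Because each key is an ordinary string, structure and content can be searched with a single string lookup, which is the property the Index Fabric relies on.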

  20. Indexing XML with the Index Fabric • Refined paths • Specialized paths through the XML that optimize frequently occurring access patterns. • Ex) “Find the invoices that company X sold to company Y” 1. Assign a designator such as “Z”. 2. Encode the information to be indexed. If “Acme Inc” sold goods to “ABC Corp”, the following key is created: “Z ABC Corp Acme Inc” 3. Insert the created key into the fabric. <Sample XML>
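The three steps for a refined path can be sketched as below. The sample invoices, the <seller> element, and the helper names are assumptions of this sketch, and a sorted Python list stands in for the fabric, which is really a layered Patricia trie.

```python
import bisect
import xml.etree.ElementTree as ET

# Hypothetical invoices; the <seller> element is an assumption of this sketch.
INVOICES = [
    "<invoice><buyer><name>XYZ Ltd</name></buyer>"
    "<seller><name>Acme Inc</name></seller></invoice>",
    "<invoice><buyer><name>ABC Corp</name></buyer>"
    "<seller><name>Acme Inc</name></seller></invoice>",
]


def refined_key(invoice_xml, designator="Z"):
    """Steps 1 and 2: build 'designator buyer seller' for one invoice."""
    invoice = ET.fromstring(invoice_xml)
    buyer = invoice.findtext("buyer/name").strip()
    seller = invoice.findtext("seller/name").strip()
    return f"{designator} {buyer} {seller}"


# Step 3: insert the keys; a sorted list is a stand-in for the fabric.
fabric = []
for xml_text in INVOICES:
    bisect.insort(fabric, refined_key(xml_text))


def contains(sorted_keys, key):
    """Single key lookup by binary search."""
    position = bisect.bisect_left(sorted_keys, key)
    return position < len(sorted_keys) and sorted_keys[position] == key


print(fabric)
print(contains(fabric, "Z ABC Corp Acme Inc"))
```

The query "invoices sold by Acme Inc to ABC Corp" becomes one lookup of a precomputed key instead of a join over buyer and seller paths.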

  21. Index Fabric vs. strong DataGuides • (d): the resulting index is too restricted, lacking references to part of the database. • (c): compresses atomic database content but not structure, and also stores node IDs in inner index nodes.

  22. Features • Data representation • Patricia trie indexing of combined label and character strings. • Navigation • Top-down • Since no secondary content index is used, there is only a single combined look-up for structure and content. • Path templates • The pre-evaluated hits are inserted as refined paths into the same Index Fabric as the non-privileged raw paths. • They can be compared to materialized views on the document node references

  23. Experimental result • Comparison of the RDBMS's native indexing method and the Index Fabric on the same RDBMS <Query> <Experimental results>

  24. Experimental result • Comparison of the results for queries B and D <Query B: find conference papers by author> <Query D: find publications by co-authors>

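  25. Conclusion • As the experimental results show, the Index Fabric performs better than the RDBMS's native index on long, complex strings. • The Index Fabric accommodates a large number of keys well and is not sensitive to key length or complexity.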

  26. Indexing XML Data with ToXin Flavio Rizzolo, Alberto Mendelzon WebDB2001

  27. ToXin • [Rizzolo, Mendelzon: WebDB 01] • Tree Structured Data • Conceptually similar to a strong DataGuide (not a minimal DataGuide) • Supports both forward and backward navigation • Path Tree (= strong DataGuide) • Each node of the Path Tree has an Index Table or a Value Table • Index Table (IT): parent-child relationships • Value Table (VT): owner-value relationships

  28. ToXin <Figure: path tree with nodes LibraryDB:IT, book:IT, paper:IT, title:VT, author:VT, chapter, section> • Since ToXin keeps parent-child relationships, ToXin supports path expressions with value predicates • ex) /libraryDB/book[author = author1] • Index Tables, each with columns (parent, child): LibraryDB, LibraryDB.book, LibraryDB.paper • Value Tables, each with columns (parent, value): LibraryDB.book.author <Table rows are not recoverable from the transcript>
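How index tables and value tables combine to answer a path expression with a value predicate can be sketched as follows. All rows are hypothetical, and the sketch assumes that the parent column of a value table holds the ID of the node that owns the value (here, the book).

```python
# Hypothetical ToXin-style tables, keyed by the label path of a path-tree node.
INDEX_TABLES = {  # rows are (parent, child)
    "LibraryDB.book": [(1, 2), (1, 3)],
    "LibraryDB.paper": [(1, 6)],
}
VALUE_TABLES = {  # rows are (parent, value); parent assumed to be the owner
    "LibraryDB.book.author": [(2, "author1"), (3, "author2")],
    "LibraryDB.book.title": [(2, "title1"), (3, "title2")],
}


def books_by_author(author):
    """Evaluate /libraryDB/book[author = <author>]."""
    # Backward step: start from the value and find the owning book nodes.
    owners = {parent
              for parent, value in VALUE_TABLES["LibraryDB.book.author"]
              if value == author}
    # Keep only nodes that really are children on the LibraryDB.book path.
    return [child
            for _parent, child in INDEX_TABLES["LibraryDB.book"]
            if child in owners]


def titles_of(book_ids):
    """Forward step: follow book -> title through the value table."""
    wanted = set(book_ids)
    return [value
            for parent, value in VALUE_TABLES["LibraryDB.book.title"]
            if parent in wanted]


matches = books_by_author("author1")
print(matches, titles_of(matches))
```

The predicate is resolved by moving backward from the value table to the book nodes, and the result can then be extended forward to other children, which illustrates why both navigation directions are useful.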
