§6 B+ Trees

[Figure: A B+ tree of order 4 (2-3-4 tree). Root keys: 21, 48, 72. Second-level keys: 12, 15 | 25, 31, 41 | 59 | 84, 91. Leaves, left to right: 1,4,8,11 | 12,13 | 15,18,19 | 21,24 | 25,26 | 31,38 | 41,43,46 | 48,49,50 | 59,68 | 72,78 | 84,88 | 91,92,99.]

【Definition】 A B+ tree of order M is a tree with the following structural properties:
(1) The root is either a leaf or has between 2 and M children.
(2) All nonleaf nodes (except the root) have between ⌈M/2⌉ and M children.
(3) All leaves are at the same depth.
Assume each nonroot leaf also holds between ⌈M/2⌉ and M keys.

Each interior node contains M pointers to its children, together with M − 1 keys: the smallest key value found in each subtree except the first. All the actual data are stored at the leaves.
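§6 B+ Trees

A B+ tree of order 3 (2-3 tree)

[Figure: animation of operations on a 2-3 tree. The recoverable states are listed below.]
Initial tree: root 22: ; interior nodes 16: and 41:58 ; leaves 8,11,12 | 16,17 | 22,23,31 | 41,52 | 58,59,61.
Find: 52
Insert: 18   The leaf 16,17 becomes 16,17,18.
Insert: 1    The leaf 1,8,11,12 overflows and splits into 1,8 and 11,12; the parent becomes 11:16.
Insert: 19   The leaf 16,17,18,19 splits into 16,17 and 18,19; the parent now has four children and splits into 11: and 18: ; the root becomes 16:22.
             Result: root 16:22 ; interior nodes 11: , 18: , 41:58 ; leaves 1,8 | 11,12 | 16,17 | 18,19 | 22,23,31 | 41,52 | 58,59,61.
Insert: 28   The leaf 22,23,31 overflows, and the splits propagate up to the root.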
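§6 B+ Trees

[Figure: the 2-3 tree after inserting 28. Root 22: ; second level 16: and 41: ; third level 11: , 18: , 28: , 58: ; leaves 1,8 | 11,12 | 16,17 | 18,19 | 22,23 | 28,31 | 41,52 | 58,59,61.]

Insert: 70
Before splitting an overflowing node, first look for a sibling with only 2 keys and adjust by moving a key over to it. This keeps more nodes full.

Deletion is similar to insertion, except that the root is removed when it loses its second child (that is, when it is left with a single child).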
§6 B+ Trees

For a general B+ tree of order M:

Btree  Insert ( ElementType X,  Btree T )
{
    Search from root to leaf for X and find the proper leaf node;
    Insert X;
    while ( this node has M+1 keys ) {
        split it into 2 nodes with ⌈(M+1)/2⌉ and ⌊(M+1)/2⌋ keys, respectively;
        if ( this node is the root )
            create a new root with two children;
        check its parent;
    }
}

Homework: p.138 4.36 Access a 2-3 tree

Discussion 7: Depth(M, N) = ?  T_insert = ?  T_find = ?
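The Insert pseudocode above can be sketched in runnable form as follows. This is a minimal illustration, not the course's reference implementation. The names `Node`, `insert`, `find`, and `leaves` are hypothetical, duplicates are assumed to be ignored, and an overflowing node is always split (the sibling adjustment shown on the "Insert: 70" slide is not attempted). Recursion replaces the "check its parent" loop: each call reports a split to its caller.

```python
from bisect import bisect_right, insort


class Node:
    """Leaf: keys are the data. Interior: keys[i] is the smallest key under children[i + 1]."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children          # None marks a leaf

    def is_leaf(self):
        return self.children is None


def _split(node, order):
    """Split an overflowing node in two; return (separator key, new right sibling)."""
    half = (order + 2) // 2               # ceil((M+1)/2) entries stay in the left node
    if node.is_leaf():
        right = Node(node.keys[half:])
        node.keys = node.keys[:half]
        return right.keys[0], right       # smallest key of the right leaf goes up
    separator = node.keys[half - 1]       # smallest key under children[half]
    right = Node(node.keys[half:], node.children[half:])
    node.keys = node.keys[:half - 1]
    node.children = node.children[:half]
    return separator, right


def _insert(node, x, order):
    """Insert x under node; return (separator, right sibling) if node split, else None."""
    if node.is_leaf():
        if x in node.keys:                # assumption: duplicates are ignored
            return None
        insort(node.keys, x)
        overflow = len(node.keys) > order
    else:
        i = bisect_right(node.keys, x)    # index of the child whose range covers x
        promoted = _insert(node.children[i], x, order)
        if promoted is not None:          # the child split: adopt its new sibling
            separator, right = promoted
            node.keys.insert(i, separator)
            node.children.insert(i + 1, right)
        overflow = len(node.children) > order
    return _split(node, order) if overflow else None


def insert(root, x, order):
    """Insert x and return the (possibly new) root."""
    promoted = _insert(root, x, order)
    if promoted is None:
        return root
    separator, right = promoted
    return Node([separator], [root, right])   # the root split: grow one level


def find(root, x):
    """Walk from the root to the proper leaf and test membership there."""
    node = root
    while not node.is_leaf():
        node = node.children[bisect_right(node.keys, x)]
    return x in node.keys


def leaves(root):
    """Leaf key lists from left to right."""
    if root.is_leaf():
        return [root.keys]
    return [keys for child in root.children for keys in leaves(child)]
```

For example, starting from the order-3 tree of the earlier slide (root 22, interior nodes 16 and 41:58) and calling `insert(root, x, 3)` for x = 18, 1, 19, 28 in turn yields a root with the single key 22 and the leaves 1,8 | 11,12 | 16,17 | 18,19 | 22,23 | 28,31 | 41,52 | 58,59,61.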
Research Project 3: Family of B Trees (23)

In computer science, there is a family of B trees: B-trees, B+ trees, B* trees, B# trees, and Bx-trees. They are tree data structures that keep data sorted and allow searches, insertions, and deletions in logarithmic (amortized) time. In this project, you are supposed to introduce B-trees and compare them with B+ trees. Detailed requirements can be downloaded from http://acm.zju.edu.cn/dsaa/
Research Project 4: Tries (23)

A trie is an index structure that is particularly useful when the keys vary in length. It is also called a prefix tree, and is used to store an associative array where the keys are usually strings. In this project, you are supposed to introduce tries and compare them with ordinary binary search trees. Detailed requirements can be downloaded from http://acm.zju.edu.cn/dsaa/
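To make the idea of a prefix tree concrete, here is a minimal sketch of a trie over string keys. It is only an illustration under simple assumptions: the class name `Trie` and its methods are hypothetical, each node keeps its children in a dictionary keyed by character, and a flag marks the nodes where a stored key ends.

```python
class Trie:
    """One trie node; the root node represents the empty prefix."""

    def __init__(self):
        self.children = {}        # character -> child node
        self.is_key = False       # True if a stored key ends at this node

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_key = True

    def _walk(self, prefix):
        """Return the node reached by following prefix, or None if the path is missing."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._walk(word)
        return node is not None and node.is_key

    def with_prefix(self, prefix):
        """All stored keys that start with prefix, in sorted order."""
        start = self._walk(prefix)
        if start is None:
            return []
        found, stack = [], [(start, prefix)]
        while stack:
            node, text = stack.pop()
            if node.is_key:
                found.append(text)
            for ch, child in node.children.items():
                stack.append((child, text + ch))
        return sorted(found)
```

A lookup costs time proportional to the length of the key rather than to the number of stored keys, which is one point of comparison with a binary search tree.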
Inverted File Index

How can I find which of the retrieved web pages include "Computer Science"?
Inverted File Index

Solution 1: Scan each page for the string "Computer Science".
Wait till your next life!
How did Google do it?
Inverted File Index

Solution 2: Inverted File Index

【Definition】 An index is a mechanism for locating a given term in a text.

【Definition】 An inverted file contains, for each term, a list of pointers (e.g. page numbers) to all occurrences of that term in the text.

〖Example〗 Document sets
[Table: example document set; only the fragment "silver truck" survives.]
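As a rough illustration of Solution 2, the sketch below builds an inverted file for a small document set and answers a multi-term query by intersecting posting lists. The function names and the sample documents in the usage note are hypothetical (the slide's own example table is not reproduced here). Recording word positions as well as document ids is an assumed design choice.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: [word positions]} for docs given as {doc_id: text}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(position)
    return index


def search_all(index, terms):
    """Sorted ids of the documents that contain every query term."""
    postings = [set(index.get(term.lower(), {})) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []
```

With the made-up documents `{1: "computer science is fun", 2: "science of computer networks", 3: "silver truck"}`, the query `search_all(index, ["Computer", "Science"])` returns `[1, 2]` without rescanning any page.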
Inverted File Index

Discussion 8: How can we easily print the sentences that contain the query words, with those words highlighted?
Inverted File Index

Word Stemming
Process a word so that only its stem or root form is left.
〖Example〗 process, processing, processes, processed → process; says, said, saying → say

Stop Words
Some words are so common that almost every document contains them, such as "a", "the", and "it". It is useless to index them. They are called stop words. We can eliminate them from the original documents.
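The two preprocessing steps can be sketched as below. This is a deliberately naive illustration, not a real stemming algorithm: the suffix list, the minimum stem length of 3, and the small table of irregular forms are all assumptions chosen so that the slide's examples come out right. Production stemmers use far more elaborate rules.

```python
STOP_WORDS = {"a", "the", "it"}          # the three examples named on the slide
IRREGULAR = {"said": "say"}              # assumed lookup table for irregular forms
SUFFIXES = ("ing", "es", "ed", "s")      # assumed suffixes, tried in this order


def stem(word):
    """Reduce a word to a crude stem by stripping one known suffix."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in SUFFIXES:
        if suffix == "s" and word.endswith("ss"):
            continue                     # keep words such as "process" intact
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def index_terms(text):
    """Lowercase the text, drop stop words, and stem the remaining words."""
    return [stem(word) for word in text.lower().split() if word not in STOP_WORDS]
```

Under these assumptions, `index_terms("The process said it")` yields `["process", "say"]`, so the terms stored in the index are stems rather than surface forms.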
Inverted File Index

Access Methods
Solution 1: Search trees (B-trees, B+ trees, tries, ...)
Solution 2: Hashing

Discussion 9: What are the pros and cons of using hashing?
Discussion 10: How to improve the quality of search results?
Research Project 5: Roll Your Own Mini Search Engine

In this project, you are supposed to create your own mini search engine which can handle 1 million inquiries over 100 files in 1 second. You may download the functions for handling stop words and stemming from the Internet. Detailed requirements can be downloaded from http://acm.zju.edu.cn/dsaa/