1 / 24

Understanding Binary Search Trees (BST) and Hash Tables

Explore the concepts of Binary Search Trees and Hash Tables, including traversal methods, deletion cases, and performance analysis for searching. Learn about hash functions, collision resolution strategies, and factors affecting search performance.

laurij
Download Presentation

Understanding Binary Search Trees (BST) and Hash Tables

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. the BSTree<TE, KF> class • BSTreeNode has same structure as binary tree nodes • elements stored in a BSTree are a key-value pair • must be a class (or a struct) which has • a data member for the value • a data member for the key • a method with the signature: KF key( ) const; where KF is the type of the key

  2. an example struct treeItem { int id; // key string data; // value int key( ) const { return id; } }; BSTree<treeItem, int> myBSTree;

  3. basic BST search algorithm void search (bstree, searchKey) { if (bstree is empty) //base case: item not found // take needed action else if (key in bstree's root == search Key) // base case: item found // take needed action else if (searchKey < key in bstree's root ) search (leftSubtree, searchKey); else search (rightSubtree, searchKey); }

  4. deletion cases • item to be deleted is in a leaf node • pointer to its node (in parent) must be changed to NULL • item to be deleted is in a node with one empty subtree • pointer to its node (in parent) must be changed to the non-empty subtree • item to be deleted is in a node with two non-empty subtrees

  5. 36 20 42 45 12 24 39 21 40 the easy cases

  6. 36 20 42 45 12 24 39 21 40 the “hard” case

  7. 36 20 42 45 12 24 39 21 40 the “hard” case replace with smallest in right subtree (inorder successor) replace with largest in left subtree (inorder predecessor)

  8. traversing a binary search tree • can use any of the binary tree traversal orders – preorder, inorder, postorder • base case is reaching an empty tree • inorder traversal visits the elements in order of their key values • how would you visit the elements in descending order of key values?

  9. big Oh of BST operations • measured by length of the search path • depends on the height of the BST • height determined by order of insertion • height of a BST containing n items is • minimum: floor (log2 n) • maximum: n - 1 • average: ?

  10. faster searching • "balanced" search trees guarantee O(log2 n) search path by controlling height of the search tree • AVL tree • 2-3-4 tree • red-black tree (used by STL associative container classes) • hash table allows for O(1) search performance • search time does not increase as n increases

  11. Hash Table • a hash table is an array of size Tsize • has index positions 0 .. Tsize-1 • two types of hash tables (Nyhoff – Ch.9.3) • open hash table • array element type is a <key, value> pair • all items stored in the array • chained hash table • element type is a pointer to a linked list of nodes containing <key, value> pairs • items are stored in the linked list nodes • keys are used to generate an array index • home address (0 .. Tsize-1)

  12. Considerations • How big an array? • load factor of a hash table is n/Tsize • Hash function to use? • int hash(KeyType key) -> 0 .. Tsize-1 • Collision resolution strategy? • hash function is many-to-one

  13. Hash Function • a hash function is used to map a key to an array index (home address) • search starts from here • insert, retrieve, update, delete all start by applying the hash function to the key

  14. Some hash functions • if KeyType is int - key % TSize • if KeyType is a string - convert to an integer and then % Tsize • goals for a hash function • fast to compute • even distribution • cannot guarantee no collisions unless all key values are known in advance

  15. An Open Hash Table Hash (key) produces an index in the range 0 to 6. That index is the “home address” 0 1 2 3 4 5 6 K3 K3info K1 K1info Some insertions: K1 --> 3 K2 --> 5 K3 --> 2 K2 K2info key value

  16. Handling Collisions 0 1 2 3 4 5 6 K6 K6info Some more insertions: K4 --> 3 K5 --> 2 K6 --> 4 K3 K3info K1 K1info K4 K4info K2 K2info Linear probing collision resolution strategy K5 K5info

  17. Search Performance Average number of probes needed to retrieve the value with key K? 0 1 2 3 4 5 6 K6 K6info K hash(K) #probes K1 3 1 K2 5 1 K3 2 1 K4 3 2 K5 2 5 K6 4 4 K3 K3info K1 K1info K4 K4info K2 K2info 14/6 = 2.33 (successful) K5 K5info unsuccessful search?

  18. 0 1 2 3 4 5 6 K3 K3info K5 K5info K1 K1info K4 K4info K6 K6info K2 K2info linked lists of synonyms A Chained Hash Table insert keys: K1 --> 3 K2 --> 5 K3 --> 2 K4 --> 3 K5 --> 2 K6 --> 4

  19. 0 1 2 3 4 5 6 K3 K3info K5 K5info K1 K1info K4 K4info K6 K6info K2 K2info Search Performance Average number of probes needed to retrieve the value with key K? K hash(K) #probes K1 3 1 K2 5 1 K3 2 1 K4 3 2 K5 2 2 K6 4 1 8/6 = 1.33 (successful) unsuccessful search?

  20. successful search performance open addressing open addressing chaining (linear probing) (double hashing) load factor 0.5 1.50 1.39 1.25 0.7 2.17 1.72 1.35 0.9 5.50 2.56 1.45 1.0 ---- ---- 1.50 2.0 ---- ---- 2.00

  21. Factors affecting Search Performance • quality of hash function • how uniform? • depends on actual data • collision resolution strategy used • load factor of the HashTable • N/Tsize • the lower the load factor the better the search performance

  22. Traversal • Visit each item in the hash table • Open hash table • O(Tsize) to visit all n items • Tsize is larger than n • Chained hash table • O(Tsize + n) to visit all n items • Items are not visited in order of key value

  23. Deletions? • search for item to be deleted • chained hash table • find node and delete it • open hash table • must mark vacated spot as “deleted” • is different than “never used”

  24. Hash Table Summary • search speed depends on load factor and quality of hash function • should be less than .75 for open addressing • can be more than 1 for chaining • items not kept sorted by key • very good for fast access to unordered data with known upper bound • to pick a good TSize

More Related