
CS 225 Data Structures & Software Principles


frey

Presentation Transcript


  1. CS 225 Data Structures & Software Principles • Section 7: Binary Search Trees and Tries

  2. Discussion Topics • Binary Search Trees • Binary Tree and BST properties • Sample Code: find, remove • Practice Problems • Tries • Regular Tries • Patricia Trees • De La Briandais Trees • Practice Problems • Exams

  3. Binary Search Trees • Definition: • A value is associated with each node • The values have a linear order • Every node's value is greater than every value in its left subtree and less than every value in its right subtree • Abbreviated as BST • [Figure: example BST holding the values 1, 4, 5, 6, 7, 10, 12, rooted at 6]

  4. Binary Search Trees • The height of a complete binary tree with n nodes is exactly ⌊lg n⌋ • The maximum-height binary tree with n nodes has height n-1 • The minimum-height binary tree with n nodes has height ⌊lg n⌋
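The two height bounds can be checked numerically. The following is a minimal sketch, assuming height is counted in edges (a single node has height 0, which matches the n-1 bound for a chain); `minHeight` and `maxHeight` are illustrative names, not part of the course library. When n is not a power of two, lg n is not an integer, so the minimum is taken as floor(lg n).

```cpp
#include <cassert>

// Minimum possible height (in edges) of a binary tree with n >= 1 nodes:
// floor(lg n), computed by repeated halving.
int minHeight(int n) {
    assert(n >= 1);
    int h = 0;
    while (n > 1) {
        n /= 2;
        ++h;
    }
    return h;
}

// Maximum possible height: a degenerate chain of n nodes has n - 1 edges.
int maxHeight(int n) {
    assert(n >= 1);
    return n - 1;
}
```

For example, 7 nodes fit in a tree of height 2, while 8 nodes need height 3.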

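  5. Binary Search Trees • The worst-case search time • over all possible search trees with n nodes is O(n) • for the best (balanced) search tree with n nodes is O(lg n) • [Figure: a balanced BST rooted at 4 (children 2 and 6; leaves 1, 3, 5, 7) vs. a degenerate chain 1, 2, 3, … in which every left pointer is NULL]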

  6. One Implementation: use Tree Nodes
     template <typename Etype>
     class BSTree {
        // public stuff…
     private:
        class TreeNode {
        public:
           TreeNode() : element(), left(NULL), right(NULL) {}
           TreeNode(Etype elmt, TreeNode* leftPtr = NULL, TreeNode* rightPtr = NULL)
              : element(elmt), left(leftPtr), right(rightPtr) {}
           Etype element;    // element of node
           TreeNode* left;   // pointer to left subtree
           TreeNode* right;  // pointer to right subtree
        };
        typename BSTree<Etype>::TreeNode* root;  // root node of tree
        int size;                                // # nodes in tree
     };

  7. Basic BST Operations • Find • Recursive implementation • Iterative implementation • Insert • Remove • The entire code is available in ~cs225/src/library/07-bst/_latestBST

  8. Basic BST Operations: Find • Recursive Find Algorithm (pseudo-code)
     int Find(treePtr P, key K) {
        if (P == NULL) return 0
        else if (K == P->key) return 1
        else if (K < P->key) return Find(P->LeftChild, K)
        else return Find(P->RightChild, K)
     }

  9. Basic BST Operations: Find • Iterative Find Algorithm (pseudo-code)
     int Find(treePtr P, key K) {
        while (P != NULL) {
           if (K == P->key) return 1
           else if (K < P->key) P = P->LeftChild
           else P = P->RightChild
        }
        return 0
     }
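The two Find variants translate almost directly into C++. This is a sketch under simplifying assumptions: `Node` is a hypothetical minimal node type with `int` keys, not the course's `TreeNode`, and the functions return `bool` rather than 0/1.

```cpp
// Hypothetical minimal node type for illustration (not the course's TreeNode).
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Recursive find: true if k occurs in the subtree rooted at p.
bool findRecursive(const Node* p, int k) {
    if (p == nullptr) return false;   // fell off the tree: not present
    if (k == p->key) return true;
    return (k < p->key) ? findRecursive(p->left, k)
                        : findRecursive(p->right, k);
}

// Iterative find: same comparisons, but walks down with a loop.
bool findIterative(const Node* p, int k) {
    while (p != nullptr) {
        if (k == p->key) return true;
        p = (k < p->key) ? p->left : p->right;
    }
    return false;
}
```

Both versions visit one node per level, so the cost is proportional to the height of the tree; the iterative form simply avoids the call stack.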

  10. Basic BST Operations: Insert • Insertion • Must ensure that the tree remains a binary search tree after insertion • Determine where the element would have been if it were already in the BST, and insert it there • Compare Insert() vs. Find()
     void Insert(typename BSTree<Etype>::TreeNode * & ptr, Etype const & insElem);
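Insertion follows the same path as Find and attaches the new node where the search falls off the tree. The sketch below assumes `int` keys and a hypothetical `Node` type, and it silently ignores duplicates; the course's `Insert` may handle duplicates differently. Passing the pointer by reference, as the slide's signature does, lets the function overwrite the parent's child pointer (or the root pointer) in place.

```cpp
// Hypothetical minimal node type for illustration (not the course's TreeNode).
struct Node {
    int key;
    Node* left;
    Node* right;
    explicit Node(int k) : key(k), left(nullptr), right(nullptr) {}
};

// ptr refers to the parent's child pointer (or to the root pointer itself),
// so assigning to it links the new node into the tree.
void insert(Node*& ptr, int k) {
    if (ptr == nullptr)
        ptr = new Node(k);        // the spot where Find would have failed
    else if (k < ptr->key)
        insert(ptr->left, k);
    else if (k > ptr->key)
        insert(ptr->right, k);
    // k == ptr->key: duplicate; this sketch leaves the tree unchanged.
}

// Frees every node of the subtree rooted at p.
void destroy(Node* p) {
    if (p == nullptr) return;
    destroy(p->left);
    destroy(p->right);
    delete p;
}
```

Inserting 6, 4, 10, 1, 5, 7, 12 in that order produces a tree rooted at 6 with 4 and 10 as its children.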

  11. Basic BST Operations: Remove • Remove • Trickier than insertion • First find the node with the element to remove • Split into three cases • Node to be deleted is a leaf • Node to be deleted has one child • Node to be deleted has two children
     void Remove(typename BSTree<Etype>::TreeNode * & ptr, Etype const & remElem);

  12. Terminology for Remove • Consider the root node (6) • In-order predecessor: greatest (right-most) element in the left subtree (here, 5) • In-order successor: smallest (left-most) element in the right subtree (here, 7) • [Figure: example BST holding the values 1, 4, 5, 6, 7, 10, 12, rooted at 6]
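Both definitions amount to one step sideways followed by a walk in the opposite direction. A minimal sketch, assuming a hypothetical `Node` type with `int` keys; it covers only the case used by Remove, where the relevant subtree exists, and returns `nullptr` otherwise.

```cpp
// Hypothetical minimal node type for illustration (not the course's TreeNode).
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Right-most node of p's left subtree, or nullptr if there is no left subtree.
const Node* inorderPredecessor(const Node* p) {
    if (p == nullptr || p->left == nullptr) return nullptr;
    const Node* cur = p->left;
    while (cur->right != nullptr) cur = cur->right;
    return cur;
}

// Left-most node of p's right subtree, or nullptr if there is no right subtree.
const Node* inorderSuccessor(const Node* p) {
    if (p == nullptr || p->right == nullptr) return nullptr;
    const Node* cur = p->right;
    while (cur->left != nullptr) cur = cur->left;
    return cur;
}
```

A node without the relevant subtree can still have an in-order neighbor higher up in the tree; finding it needs parent links or a search from the root, which this sketch does not attempt.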

  13. Basic BST Operations: Remove • Leaf case • Simply delete the node • One-child case • Just connect the node’s child to its parent • Two-child case • Replace the node's element with that of its in-order successor, then delete the in-order successor • Alternatively, we could also use the in-order predecessor

  14. Basic BST Operations: Remove • Two-child case • Replace node with in-order successor (predecessor) and delete the in-order successor (predecessor)
     typename BSTree<Etype>::TreeNode* tempPtr;
     if ((ptr->left != NULL) && (ptr->right != NULL)) {
        // Replace with smallest in right subtree
        tempPtr = ptr->right;
        while (tempPtr->left != NULL)
           tempPtr = tempPtr->left;
        ptr->element = tempPtr->element;
        Remove(ptr->right, ptr->element);
     }

  15. Basic BST Operations: Remove • Leaf case • Simply delete the node
     else if ((ptr->left == NULL) && (ptr->right == NULL)) {
        delete ptr;
        ptr = NULL;
     }

  16. Basic BST Operations: Remove • One-child case • Just connect the node’s child to its parent
     else {
        tempPtr = ptr;
        if (ptr->left == NULL)   // only a right child
           ptr = ptr->right;
        else                     // ptr->right == NULL: only a left child
           ptr = ptr->left;
        delete tempPtr;
     }
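The three cases can be assembled into one function. This is a sketch, not the course's `Remove`: it assumes `int` keys and a hypothetical `Node` type, adds the search step that precedes the cases, and folds the leaf case into the one-child case (for a leaf, the "child" that replaces the node is simply a null pointer).

```cpp
// Hypothetical minimal node type for illustration (not the course's TreeNode).
struct Node {
    int key;
    Node* left;
    Node* right;
    explicit Node(int k) : key(k), left(nullptr), right(nullptr) {}
};

// ptr refers to the parent's child pointer (or the root pointer).
void removeKey(Node*& ptr, int k) {
    if (ptr == nullptr) return;                  // key not in the tree
    if (k < ptr->key) {
        removeKey(ptr->left, k);
    } else if (k > ptr->key) {
        removeKey(ptr->right, k);
    } else if (ptr->left != nullptr && ptr->right != nullptr) {
        // Two children: copy the in-order successor's key here,
        // then remove the successor from the right subtree.
        Node* succ = ptr->right;
        while (succ->left != nullptr) succ = succ->left;
        ptr->key = succ->key;
        removeKey(ptr->right, ptr->key);
    } else {
        // Leaf or one child: splice the node out of the tree.
        Node* old = ptr;
        ptr = (ptr->left != nullptr) ? ptr->left : ptr->right;
        delete old;
    }
}
```

The successor has no left child by construction, so the recursive call in the two-child case always ends in the splice branch.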

  17. Practice Problem • Write an algorithm for a level-order traversal of a binary tree.
     void levelOrdered(TreeNode* root);
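One common approach to this problem uses a FIFO queue: visit a node, then enqueue its children. The sketch below is one possible solution, not the official one. It assumes a hypothetical `Node` type with `int` keys, and it returns the visited keys in a `std::vector` instead of printing them (a deviation from the `void` signature on the slide) so the result is easy to inspect.

```cpp
#include <queue>
#include <vector>

// Hypothetical minimal node type for illustration (not the course's TreeNode).
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Returns the keys in level order: top to bottom, left to right.
std::vector<int> levelOrder(const Node* root) {
    std::vector<int> out;
    if (root == nullptr) return out;

    std::queue<const Node*> pending;   // nodes discovered but not yet visited
    pending.push(root);
    while (!pending.empty()) {
        const Node* cur = pending.front();
        pending.pop();
        out.push_back(cur->key);
        if (cur->left != nullptr) pending.push(cur->left);
        if (cur->right != nullptr) pending.push(cur->right);
    }
    return out;
}
```

Each node is enqueued and dequeued once, so the traversal takes O(n) time; the queue holds at most one level's worth of nodes plus part of the next.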

  18. Discussion Topics • Tries • Basics • Jason’s Code • Patricia Trees • De La Briandais Trees • Hybrid Trees

  19. Tries • Data structure optimized for lookups on a key that can be decomposed into characters • Represented using a tree of arrays • For a character set of size k, the corresponding Trie structure is a (k+1)-ary tree

  20. Tries • The i-th character (starting at 0) of the key corresponds to a node at depth i • Need a mapping from characters to array indices • The extra cell in the array represents the “null character” ('\0') • Points to a leaf • Ideally, no need to store the key in a leaf, since it is completely determined by the path followed • Info stored at the leaf • Spend only constant time at each level

  21. Trie Example • Words in trie: raft, star, start, stir • [Figure: trie with array nodes (cells a … z) at levels 0 to 5 and leaves for raft, star, start, stir]

  22. Tries • Running time of the Find operation: O(L), where L is the length of the string we are looking for • Unique trie for any set of search keys • Advantage: NOT dependent on the number of strings stored in the trie • Disadvantage: memory waste • Strings need a 27-cell array in every node: one cell per letter, plus the null character • Space: (k+1) * #nodes * sizeof(pointer)
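The O(L) lookup and the per-node space cost are both visible in a small array-based trie. This is a simplified sketch, not Jason's code: it assumes keys made only of letters, treats upper and lower case alike, uses cell 0 as the end-of-key ("null character") cell, and stores no associated info. All names here are illustrative.

```cpp
#include <array>
#include <memory>
#include <string>

// One array node: 26 letter cells plus cell 0 for the end-of-key marker.
struct TrieNode {
    std::array<std::unique_ptr<TrieNode>, 27> child;
};

// Maps a letter to a cell in 1..26; returns -1 for any other character.
int cellIndex(char c) {
    if (c >= 'a' && c <= 'z') return c - 'a' + 1;
    if (c >= 'A' && c <= 'Z') return c - 'A' + 1;
    return -1;
}

// Inserts key; returns false (and changes nothing) if key has a non-letter.
bool insert(TrieNode& root, const std::string& key) {
    for (char c : key)
        if (cellIndex(c) < 0) return false;

    TrieNode* cur = &root;
    for (char c : key) {
        std::unique_ptr<TrieNode>& cell = cur->child[cellIndex(c)];
        if (!cell) cell.reset(new TrieNode());
        cur = cell.get();
    }
    if (!cur->child[0]) cur->child[0].reset(new TrieNode());  // mark end of key
    return true;
}

// One array lookup per character: O(L) regardless of how many keys are stored.
bool contains(const TrieNode& root, const std::string& key) {
    const TrieNode* cur = &root;
    for (char c : key) {
        int i = cellIndex(c);
        if (i < 0 || !cur->child[i]) return false;
        cur = cur->child[i].get();
    }
    return cur->child[0] != nullptr;
}
```

Every node carries 27 pointers whether or not they are used, which is the memory waste the slide refers to. A prefix such as "sta" is not reported as present unless it was inserted itself, because its end-of-key cell is empty.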

  23. Jason’s Code: TrieNode Data
     class TrieNode {
        int nodeLevel;               // level of the node
        bool isLeaf;                 // true for leaf, false for interior
        Array<TrieNode*> subtries;   // array is indexed starting at 1!
        String key;                  // string key in leaf nodes
        Etype storedInfo;
     };
     • Available on the EWS network at: ~cs225/src/library/11-trie/

  24. Code Review: Trie Find
     template <typename Etype>
     pair<bool, Etype> Trie<Etype>::find(String const & searchKey,
           typename Trie<Etype>::TrieNode const * nodePtr) const
     {
        if (nodePtr == NULL)
           return pair<bool, Etype>(false, Etype());
        else if (nodePtr->isLeaf == true)
        {
           if (searchKey == nodePtr->key)
              return pair<bool, Etype>(true, nodePtr->storedInfo);
           else
              return pair<bool, Etype>(false, Etype());
        }
        else // nodePtr->isLeaf == false
        {
           int index = ascIndex(searchKey[nodePtr->nodeLevel]);
           return find(searchKey, (nodePtr->subtries)[index]);
        }
     }

  25. Code Review: ascIndex
     template <typename Etype>
     int Trie<Etype>::ascIndex(char indexChar) const
     {
        if ((indexChar >= 65) && (indexChar <= 90))          // uppercase letter
           return indexChar - 64;
        else if ((indexChar >= 97) && (indexChar <= 122))    // lowercase letter
           return (indexChar - 96);
        else if (indexChar == 0)                             // null character
           return 0;
        else
           Assert("Bizarre character in string!");
     }

  26. Code Review: Trie Insert
     template <typename Etype>
     void Trie<Etype>::insert(String const & insKey, Etype const & insInfo,
           typename Trie<Etype>::TrieNode * & nodePtr, int prevLevel)
     {
        if (nodePtr == NULL)
        {
           if (prevLevel == insKey.length())
           {
              nodePtr = new TrieNode(insKey, insInfo);
              nodePtr->nodeLevel = prevLevel + 1;
           }
           else
           {
              nodePtr = new TrieNode();
              nodePtr->nodeLevel = prevLevel + 1;
              insert(insKey, insInfo,
                     (nodePtr->subtries)[ascIndex(insKey[nodePtr->nodeLevel])],
                     nodePtr->nodeLevel);
           }
        } // more…

  27. …Trie Insert
        else if (nodePtr->isLeaf == true) // leaf case
        {
           cout << "This key already exists in the trie!" << endl;
           return;
        }
        else // nodePtr->isLeaf == false, array node case
           insert(insKey, insInfo,
                  (nodePtr->subtries)[ascIndex(insKey[nodePtr->nodeLevel])],
                  nodePtr->nodeLevel);
     }

  28. Patricia Trees • Acronym: Practical Algorithm To Retrieve Information Coded In Alphanumeric • Trick: only allocate arrays that make a “decision” • Do not store nodes with only one non-NULL cell • Store in each node the index of the character position on which it discriminates • Tradeoff: Less space required, but more work for Insert and Remove • Key no longer uniquely determined by path • Now we must store keys in the leaf

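  29. Patricia Tree Example • Words in trie: raft, star, start, stir • [Figure: the full trie for these words (array nodes at levels 0 to 5) beside the equivalent Patricia tree: root at level 0 (skip: “”) with r → leaf raft and s → node at level 2 (skip: “t”); that node has i → leaf stir and a → node at level 4 (skip: “r”), whose children are the leaves star and start] • What if I wanted to find “spam”?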
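The "spam" question shows why a Patricia tree must store keys in its leaves: the search inspects only the characters at the discriminating positions, so it can arrive at a leaf for a different word. The sketch below is an illustration under stated assumptions, not the course code: `PatNode` is a hypothetical type, a `std::map` stands in for the (k+1)-cell array, and `'\0'` is used as the end-of-key cell.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical Patricia node for illustration.
struct PatNode {
    bool isLeaf;                      // true for leaf, false for interior
    std::string key;                  // full key, meaningful only in leaves
    std::size_t level;                // character position an interior node tests
    std::map<char, PatNode*> child;   // sparse stand-in for the array of cells
};

bool contains(const PatNode* node, const std::string& key) {
    // Descend, looking only at the character each interior node tests.
    while (node != nullptr && !node->isLeaf) {
        char c = (node->level < key.size()) ? key[node->level] : '\0';
        std::map<char, PatNode*>::const_iterator it = node->child.find(c);
        node = (it == node->child.end()) ? nullptr : it->second;
    }
    // Skipped characters were never checked, so compare the whole key.
    return node != nullptr && node->key == key;
}
```

With the four example words, a search for "spam" tests position 0 ('s'), then position 2 ('a'), then finds the end of the key at position 4 and lands on the leaf for "star"; only the final whole-key comparison rejects it.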

  30. De La Briandais Trees • Trick: convert arrays in Trie to sparse arrays • Allocate space only for used cells in the arrays • Each node now has a linked list • Array cells are now nodes that not only point down, but to the next used character on that level • Advantage: can save much space; good when the linked lists are not long • Disadvantage: search is now dependent on k (alphabet size)
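The linked-list layout can be sketched with nodes that carry a `next` pointer (same level) and a `down` pointer (next level). This is a minimal illustration, not the course implementation: `DlbNode` and the function names are hypothetical, new cells are prepended rather than kept sorted, keys are assumed not to contain `'\0'`, and a `'\0'` cell marks the end of a key.

```cpp
#include <cstddef>
#include <string>

// One list cell: a used character plus links sideways and downward.
struct DlbNode {
    char ch;         // character in this cell; '\0' marks the end of a key
    DlbNode* next;   // next used character on the same level
    DlbNode* down;   // first cell of the next level
    explicit DlbNode(char c) : ch(c), next(nullptr), down(nullptr) {}
};

// Linear scan of one level's list: this is where the dependence on k appears.
DlbNode* findInList(DlbNode* head, char c) {
    while (head != nullptr && head->ch != c) head = head->next;
    return head;
}

// Inserts key below the list whose head pointer is `level`.
void insert(DlbNode*& level, const std::string& key) {
    DlbNode** head = &level;
    for (std::size_t i = 0; i <= key.size(); ++i) {
        char c = (i < key.size()) ? key[i] : '\0';
        DlbNode* cell = findInList(*head, c);
        if (cell == nullptr) {           // allocate only the cells actually used
            cell = new DlbNode(c);
            cell->next = *head;
            *head = cell;
        }
        head = &cell->down;
    }
}

bool contains(DlbNode* level, const std::string& key) {
    for (std::size_t i = 0; i <= key.size(); ++i) {
        char c = (i < key.size()) ? key[i] : '\0';
        DlbNode* cell = findInList(level, c);
        if (cell == nullptr) return false;
        level = cell->down;
    }
    return true;
}

// Frees a level's list and everything below it.
void destroy(DlbNode* level) {
    while (level != nullptr) {
        DlbNode* nextCell = level->next;
        destroy(level->down);
        delete level;
        level = nextCell;
    }
}
```

Each level of a lookup may scan up to k+1 cells, so Find costs O(L * k) in the worst case instead of O(L); in exchange, no memory is spent on unused cells.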

  31. de la Briandais Tree Example • Words in trie: raft, star, start, stir • [Figure: the array-based trie for these words (levels 0 to 5) beside the equivalent de la Briandais tree, whose root list holds only the used cells r and s]

  32. Hybrid Structures • Patricia/de la Briandais • Uses both optimizations • Eliminate all one-node linked lists in the de la Briandais tree • Trie/Patricia/de la Briandais • Highly optimized data structure • Upper levels use arrays, lower levels use linked lists

  33. Tries: Practice Problem • Write a function that, given a Patricia tree, calculates the largest skipped gap in the tree. You should also consider the gap, if any, between a leaf and its non-leaf parent.
     class TrieNode {
        int nodeLevel;               // level of the node
        bool isLeaf;                 // true for leaf, false for interior
        Array<TrieNode*> subtries;   // array is indexed starting at 1!
        String key;                  // string key in leaf nodes
        Etype storedInfo;
     };
