CS 225 Data Structures & Software Principles. Section 7 Binary Search Trees and Tries. Discussion Topics. Binary Search Trees Binary Tree and BST properties Sample Code: find, remove Practice Problems Tries Regular Tries Patricia Trees De la Briandais Trees Practice Problems Exams.
CS 225 Data Structures & Software Principles Section 7 Binary Search Trees and Tries
Discussion Topics • Binary Search Trees • Binary Tree and BST properties • Sample Code: find, remove • Practice Problems • Tries • Regular Tries • Patricia Trees • De la Briandais Trees • Practice Problems • Exams
Binary Search Trees • Definition: • Have a value associated with each node • The values have a linear order • Every node has a value greater than any value in its left sub-tree and less than any value in its right sub-tree • Abbreviated as BST • [Figure: example BST with root 6, whose left child 4 has children 1 and 5 and whose right child 10 has children 7 and 12]
Binary Search Trees • Height of a complete binary tree with n nodes is exactly ⌊lg n⌋ (height counted in edges) • The maximum-height binary tree with n nodes has height n-1 • The minimum-height binary tree with n nodes has height ⌊lg n⌋
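Binary Search Trees • The worst-case search time • for all possible search trees with n nodes is O(n) • for the best search tree with n nodes is O(lg n) • [Figure: a balanced BST with root 4, children 2 and 6, and leaves 1, 3, 5, 7, vs. a degenerate BST 1, 2, 3, … in which every left pointer is NULL]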
One Implementation: Use Tree Nodes
template <typename Etype>
class BSTree {
  // public stuff…
private:
  class TreeNode {
  public:
    TreeNode() : element(), left(NULL), right(NULL) {}
    TreeNode(Etype elmt, TreeNode* leftPtr = NULL, TreeNode* rightPtr = NULL)
      : element(elmt), left(leftPtr), right(rightPtr) {}
    Etype element;    // element of node
    TreeNode* left;   // pointer to left subtree
    TreeNode* right;  // pointer to right subtree
  };
  typename BSTree<Etype>::TreeNode* root;  // root node of tree
  int size;                                // # nodes in tree
};
Basic BST Operations • Find • Recursive implementation • Iterative implementation • Insert • Remove • The entire code is available in ~cs225/src/library/07-bst/_latestBST
Basic BST Operations: Find • Recursive Find Algorithm (pseudo-code)
int Find(treePtr P, key K) {
  if ( P == NULL ) return 0
  else if ( K == P->key ) return 1
  else if ( K < P->key ) return Find(P->LeftChild, K)
  else return Find(P->RightChild, K)
}
Basic BST Operations: Find • Iterative Find Algorithm (pseudo-code)
int Find(treePtr P, key K) {
  while ( P != NULL ) {
    if ( K == P->key ) return 1
    else if ( K < P->key ) P = P->LeftChild
    else P = P->RightChild
  }
  return 0
}
Basic BST Operations: Insert • Insertion • Must ensure that tree remains a binary search tree after insertion • Determine where the element would have been if it were actually in the BST. Insert there. • Compare Insert() vs. Find() void Insert(typename BSTree<Etype>::TreeNode * & ptr, Etype const & insElem);
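The comparison between Insert() and Find() can be made concrete with a minimal sketch. It assumes a stand-alone `Node` struct (hypothetical; the course code nests TreeNode inside BSTree<Etype>) and assumes duplicates are rejected, which the slide does not specify. The walk down the tree is the same as in Find(); the NULL pointer where the search fails is where the new node is attached.

```cpp
#include <cstddef>

// Hypothetical stand-alone node type used only for this sketch.
template <typename Etype>
struct Node {
    Etype element;
    Node* left;
    Node* right;
    explicit Node(Etype const& e) : element(e), left(nullptr), right(nullptr) {}
};

// Follows the same path Find() would take. Because ptr is a reference to
// the parent's child pointer, assigning to it links the new node in.
// Returns true if a node was added, false if the element was already there
// (rejecting duplicates is an assumption of this sketch).
template <typename Etype>
bool insert(Node<Etype>*& ptr, Etype const& insElem) {
    if (ptr == nullptr) {                 // search failed here: insert here
        ptr = new Node<Etype>(insElem);
        return true;
    }
    if (insElem < ptr->element)
        return insert(ptr->left, insElem);
    if (ptr->element < insElem)
        return insert(ptr->right, insElem);
    return false;                         // equal element already present
}
```

Passing the pointer by reference, as in the slide's signature, is what lets the recursive call update the parent's left or right field without tracking the parent explicitly.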
Basic BST Operations: Remove • Remove • Trickier than insertion • First find the node with the element to remove • Split into three cases • Node to be deleted is a leaf • Node to be deleted has one child • Node to be deleted has two children void Remove(typename BSTree<Etype>::TreeNode * & ptr, Etype const & remElem);
Terminology for Remove • Consider root node (6) • In-order predecessor: greatest (right-most) element in left subtree (5 for the root) • In-order successor: smallest (left-most) element in right subtree (7 for the root) • [Figure: example BST with root 6, whose left child 4 has children 1 and 5 and whose right child 10 has children 7 and 12]
Basic BST Operations: Remove • Leaf case • Simply delete the node • One-child case • Just connect the node’s child to its parent • Two-child case • Replace the node by its in-order successor and delete the in-order successor • Alternatively, we could also use the in-order predecessor
Basic BST Operations: Remove • Two-child case • Replace node with in-order successor (predecessor) and delete the in-order successor (predecessor)
typename BSTree<Etype>::TreeNode* tempPtr;
if ((ptr->left != NULL) && (ptr->right != NULL)) {
  // Replace with smallest in right subtree
  tempPtr = ptr->right;
  while (tempPtr->left != NULL)
    tempPtr = tempPtr->left;
  ptr->element = tempPtr->element;
  Remove(ptr->right, ptr->element);
}
Basic BST Operations: Remove • Leaf case • Simply delete the node
else if ((ptr->left == NULL) && (ptr->right == NULL)) {
  delete ptr;
  ptr = NULL;
}
Basic BST Operations: Remove • One-child case • Just connect the node’s child to its parent
else {
  tempPtr = ptr;
  if (ptr->left == NULL)  // only a right child
    ptr = ptr->right;
  else                    // ptr->right == NULL, only a left child
    ptr = ptr->left;
  delete tempPtr;
}
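The three cases on the Remove slides can be assembled into one function. The following is a sketch, not the course library code: the `Node` struct and the name `removeElem` are hypothetical, and the search step that the slides describe only in words ("first find node with element to remove") is filled in here as an ordinary recursive descent. The leaf and one-child cases are merged, since a leaf is simply a node whose replacement child is NULL.

```cpp
#include <cstddef>

// Hypothetical stand-alone node type used only for this sketch.
template <typename Etype>
struct Node {
    Etype element;
    Node* left;
    Node* right;
    explicit Node(Etype const& e) : element(e), left(nullptr), right(nullptr) {}
};

template <typename Etype>
void removeElem(Node<Etype>*& ptr, Etype const& remElem) {
    if (ptr == nullptr)
        return;                                   // element not in the tree
    if (remElem < ptr->element) {
        removeElem(ptr->left, remElem);           // keep searching left
    } else if (ptr->element < remElem) {
        removeElem(ptr->right, remElem);          // keep searching right
    } else if (ptr->left != nullptr && ptr->right != nullptr) {
        // Two-child case: copy the in-order successor's value here,
        // then remove the successor from the right subtree.
        Node<Etype>* succ = ptr->right;
        while (succ->left != nullptr)
            succ = succ->left;
        ptr->element = succ->element;
        removeElem(ptr->right, ptr->element);
    } else {
        // Leaf or one-child case: splice in the only child (or nullptr).
        Node<Etype>* old = ptr;
        ptr = (ptr->left != nullptr) ? ptr->left : ptr->right;
        delete old;
    }
}
```

The recursive call in the two-child case always ends in the last branch, because the in-order successor never has a left child.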
Practice Problem • Write an algorithm for a level-order traversal in a binary tree. void levelOrdered(TreeNode* root);
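One possible approach to this problem uses a FIFO queue: visit a node, then enqueue its children, so that nodes come out in order of depth. The sketch below assumes a simplified `TreeNode` with an `int` element, and it returns the visited values in a vector instead of printing them (so the function is named `levelOrder` here, not the `levelOrdered` of the slide); both choices are for illustration only.

```cpp
#include <queue>
#include <vector>

// Simplified node type assumed for this sketch.
struct TreeNode {
    int element;
    TreeNode* left;
    TreeNode* right;
};

// Returns the elements of the tree level by level, left to right.
std::vector<int> levelOrder(TreeNode const* root) {
    std::vector<int> visited;
    if (root == nullptr)
        return visited;

    std::queue<TreeNode const*> pending;   // nodes discovered but not yet visited
    pending.push(root);
    while (!pending.empty()) {
        TreeNode const* cur = pending.front();
        pending.pop();
        visited.push_back(cur->element);
        if (cur->left != nullptr)  pending.push(cur->left);
        if (cur->right != nullptr) pending.push(cur->right);
    }
    return visited;
}
```

Each node is enqueued and dequeued once, so the traversal takes O(n) time for a tree with n nodes.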
Discussion Topics • Tries • Basics • Jason’s Code • Patricia Trees • De La Briandais Trees • Hybrid Trees
Tries • Data structure optimized for lookups on a key that can be decomposed into characters • Represented using a tree of arrays • For a character set of size k, the corresponding Trie structure is a (k+1)-ary tree
Tries • i-th character (starting at 0) in the key corresponds to a node at depth i • Need a mapping of character to an array index • The extra cell in the array represents the “null character” ('\0') • Points to a leaf • Ideally, no need to store key in a leaf, since it is completely determined by path followed • Info stored at the leaf • Spend only constant time at each level
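Trie Example • Words in trie: raft, star, start, stir • [Figure: array-based trie with cells a … z per node; interior nodes at levels 0 through 4 along the paths for r-a-f-t, s-t-a-r(-t) and s-t-i-r, ending in the leaves raft, star, start and stir]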
Tries • Running time of Find operation: O(L) where L is the length of the string we are looking for • Unique trie for any set of search keys • Advantage: NOT dependent on the number of strings we have in the Trie structure • Disadvantage: memory waste • 27 cell array, one per character needed for Strings • Space: (k+1) * #nodes * sizeof(pointer)
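To make the O(L) bound and the 27-cell cost concrete, here is a minimal sketch of an array-based trie for lowercase keys. It is a simplification and not the course's TrieNode: the names `cellIndex`, `trieInsert` and `trieFind` are hypothetical, no info is stored at the leaves, and cell 0 plays the role of the null-character cell.

```cpp
#include <array>
#include <string>

struct TrieNode {
    bool isLeaf = false;
    std::array<TrieNode*, 27> subtries{};   // all cells start as nullptr
};

// Maps 'a'..'z' to cells 1..26; any other character is unsupported (-1).
int cellIndex(char c) {
    return (c >= 'a' && c <= 'z') ? (c - 'a' + 1) : -1;
}

// Inserts key; returns false if it is already present or has an
// unsupported character. One array step per character of the key.
bool trieInsert(TrieNode& root, std::string const& key) {
    for (char c : key)
        if (cellIndex(c) < 0) return false;  // validate before modifying

    TrieNode* cur = &root;
    for (char c : key) {
        TrieNode*& next = cur->subtries[cellIndex(c)];
        if (next == nullptr) next = new TrieNode();
        cur = next;
    }
    TrieNode*& leaf = cur->subtries[0];      // null-character cell
    if (leaf != nullptr) return false;       // key already stored
    leaf = new TrieNode();
    leaf->isLeaf = true;
    return true;
}

// Follows one cell per character, then checks the null-character cell,
// so the work depends on the key length and not on how many keys exist.
bool trieFind(TrieNode const& root, std::string const& key) {
    TrieNode const* cur = &root;
    for (char c : key) {
        int i = cellIndex(c);
        if (i < 0) return false;
        cur = cur->subtries[i];
        if (cur == nullptr) return false;
    }
    return cur->subtries[0] != nullptr;
}
```

Every node allocates all 27 pointers even when only one or two are used, which is the memory waste the slide refers to.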
Jason’s Code: TrieNode Data
class TrieNode {
  int nodeLevel;                // level of the node
  bool isLeaf;                  // true for leaf, false for interior
  Array<TrieNode*> subtries;    // array is indexed starting at 1!
  String key;                   // string key in leaf nodes
  Etype storedInfo;
};
• Available on the EWS network at: ~cs225/src/library/11-trie/
Code Review: Trie Find
template <typename Etype>
pair<bool, Etype> Trie<Etype>::find(String const & searchKey,
    typename Trie<Etype>::TrieNode const * nodePtr) const
{
  if (nodePtr == NULL)
    return pair<bool, Etype>(false, Etype());
  else if (nodePtr->isLeaf == true)
  {
    if (searchKey == nodePtr->key)
      return pair<bool, Etype>(true, nodePtr->storedInfo);
    else
      return pair<bool, Etype>(false, Etype());
  }
  else // nodePtr->isLeaf == false
  {
    int index = ascIndex(searchKey[nodePtr->nodeLevel]);
    return find(searchKey, (nodePtr->subtries)[index]);
  }
}
Code Review: ascIndex
template <typename Etype>
int Trie<Etype>::ascIndex(char indexChar) const
{
  if ((indexChar >= 65) && (indexChar <= 90)) // uppercase letter
    return indexChar - 64;
  else if ((indexChar >= 97) && (indexChar <= 122)) // lowercase letter
    return (indexChar - 96);
  else if (indexChar == 0) // null character
    return 0;
  else
  {
    Assert("Bizarre character in string!");
    return -1; // not reached if Assert aborts (assumption)
  }
}
Code Review: Trie Insert
template <typename Etype>
void Trie<Etype>::insert(String const & insKey, Etype const & insInfo,
    typename Trie<Etype>::TrieNode * & nodePtr, int prevLevel)
{
  if (nodePtr == NULL)
  {
    if (prevLevel == insKey.length())
    {
      nodePtr = new TrieNode(insKey, insInfo);
      nodePtr->nodeLevel = prevLevel + 1;
    }
    else
    {
      nodePtr = new TrieNode();
      nodePtr->nodeLevel = prevLevel + 1;
      insert(insKey, insInfo,
             (nodePtr->subtries)[ascIndex(insKey[nodePtr->nodeLevel])],
             nodePtr->nodeLevel);
    }
  } // more…
…Trie Insert
  else if (nodePtr->isLeaf == true) // leaf case
  {
    cout << "This key already exists in the trie!" << endl;
    return;
  }
  else // nodePtr->isLeaf == false, array node case
    insert(insKey, insInfo,
           (nodePtr->subtries)[ascIndex(insKey[nodePtr->nodeLevel])],
           nodePtr->nodeLevel);
}
Patricia Trees • Acronym: Practical Algorithm To Retrieve Information Coded In Alphanumeric • Trick: only allocate arrays that make a “decision” • Do not store nodes with only one non-NULL cell • Store in each node the index of the character position on which it discriminates • Tradeoff: Less space required, but more work for Insert and Remove • Key no longer uniquely determined by path • Now we must store keys in the leaf
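Patricia Tree Example • Words in trie: raft, star, start, stir • [Figure: the regular trie for these words next to the equivalent Patricia tree. In the Patricia tree the root (level 0, skip: “”) points through r to the leaf raft and through s to a node at level 2 (skip: “t”); that node points through i to the leaf stir and through a to a node at level 4 (skip: “r”), which holds the leaves star and start] • What if I wanted to find “spam”?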
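A lookup for a missing key such as “spam” shows why the leaves must store full keys. The sketch below assumes a simplified node (`PatNode`, hypothetical, with no stored info) in which each interior node records the character position it discriminates on. Skipped characters are never examined on the way down, so only the final comparison at the leaf can reject the key.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Simplified Patricia node assumed for this sketch.
struct PatNode {
    std::size_t nodeLevel = 0;             // character position examined here
    bool isLeaf = false;
    std::array<PatNode*, 27> subtries{};   // cell 0: null character, 1..26: 'a'..'z'
    std::string key;                       // full key, used in leaves only
};

// Cell for the character at position pos; positions past the end of the
// key map to the null-character cell. Returns -1 for unsupported characters.
int cellIndex(std::string const& key, std::size_t pos) {
    if (pos >= key.size()) return 0;
    char c = key[pos];
    return (c >= 'a' && c <= 'z') ? (c - 'a' + 1) : -1;
}

bool patriciaFind(PatNode const* node, std::string const& searchKey) {
    // Interior nodes: look only at the discriminating position.
    while (node != nullptr && !node->isLeaf) {
        int i = cellIndex(searchKey, node->nodeLevel);
        if (i < 0) return false;
        node = node->subtries[i];
    }
    // The path no longer determines the key, so compare against the leaf.
    return node != nullptr && node->key == searchKey;
}
```

In a tree shaped like the one on the slide, “spam” follows s at level 0, a at level 2, and the null-character cell at level 4, arriving at the leaf for “star”; the mismatch is detected only by the key comparison.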
De La Briandais Trees • Trick: convert arrays in Trie to sparse arrays • Allocate space only for used cells in the arrays • Each node now has a linked list • Array cells are now nodes that not only point down, but to the next used character on that level • Advantage: can save much space; good when the linked lists are not long • Disadvantage: search is now dependent on k (alphabet size)
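The linked-list layout can be sketched as follows. This is an illustrative simplification, not the course code: the names `DlbNode`, `findOrAdd`, `dlbInsert` and `dlbFind` are hypothetical, no info is stored, keys are assumed to contain no embedded null characters, and the end of a key is marked by a node holding '\0'.

```cpp
#include <cstddef>
#include <string>

struct DlbNode {
    char ch;                     // character this cell stands for
    DlbNode* next = nullptr;     // next used character on the same level
    DlbNode* child = nullptr;    // first cell of the level below
    explicit DlbNode(char c) : ch(c) {}
};

// Returns the cell for c in the list starting at head, prepending a new
// cell if the character is not used on this level yet.
DlbNode* findOrAdd(DlbNode*& head, char c) {
    for (DlbNode* n = head; n != nullptr; n = n->next)
        if (n->ch == c) return n;
    DlbNode* added = new DlbNode(c);
    added->next = head;
    head = added;
    return added;
}

void dlbInsert(DlbNode*& root, std::string const& key) {
    DlbNode** level = &root;
    for (char c : key)
        level = &findOrAdd(*level, c)->child;
    findOrAdd(*level, '\0');     // end-of-key marker
}

bool dlbFind(DlbNode const* root, std::string const& key) {
    DlbNode const* level = root;
    for (std::size_t i = 0; i <= key.size(); ++i) {
        char c = (i < key.size()) ? key[i] : '\0';
        DlbNode const* n = level;
        while (n != nullptr && n->ch != c)   // up to k steps per level
            n = n->next;
        if (n == nullptr) return false;
        level = n->child;
    }
    return true;
}
```

The inner `while` loop is where the dependence on the alphabet size k comes from: each level may scan a list of up to k cells instead of indexing an array in constant time.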
de la Briandais Tree Example • Words in trie: raft, star, start, stir • [Figure: the array-based trie for these words next to the equivalent de la Briandais tree, whose root list holds only r and s and whose lower levels hold only the characters actually used, ending in the leaves raft, star, start and stir]
Hybrid Structures • Patricia/de la Briandais • Uses both optimizations • Eliminate all one-node linked lists in the de la Briandais tree • Trie/Patricia/de la Briandais • Highly optimized data structure • Upper levels use arrays, lower levels use linked lists
Tries: Practice Problem • Write a function that, given a Patricia tree, calculates the largest skipped gap in the tree. Also consider the gap, if any, between a leaf and its non-leaf parent.
class TrieNode {
  int nodeLevel;                // level of the node
  bool isLeaf;                  // true for leaf, false for interior
  Array<TrieNode*> subtries;    // array is indexed starting at 1!
  String key;                   // string key in leaf nodes
  Etype storedInfo;
};