CS 225 Data Structures & Software Principles

Section 7: Binary Search Trees and Tries

Discussion Topics
• Binary Search Trees
  • Binary Tree and BST properties
  • Sample Code: find, remove
  • Practice Problems
• Tries
  • Regular Tries
  • Patricia Trees
  • De la Briandais Trees
  • Practice Problems
• Exams

Binary Search Trees
• Definition:
  • Have a value associated with each node
  • The values have a linear order
  • Every node has a value greater than any value in its left sub-tree and less than any value in its right sub-tree
• Abbreviated as BST
[Figure: example BST with root 6; left child 4 (children 1 and 5) and right child 10 (children 7 and 12)]

Binary Search Trees
• Height is counted in edges, so a single-node tree has height 0
• Height of a complete binary tree with n nodes is exactly ⌊lg n⌋
• The maximum-height binary tree with n nodes has height n-1
• The minimum-height binary tree with n nodes has height ⌊lg n⌋

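Binary Search Trees
• The worst-case search time
  • over all possible search trees with n nodes is O(n)
  • for the best search tree with n nodes is O(lg n)
[Figure: a balanced BST (root 4; children 2 and 6; leaves 1, 3, 5, 7) vs. a degenerate BST (1, 2, 3, … chained through right children, with every left pointer NULL)]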

One Implementation: use Tree Nodes
    template <typename Etype>
    class BSTree {
       // public stuff…
    private:
       class TreeNode {
       public:
          TreeNode() : element(), left(NULL), right(NULL) {}
          TreeNode(Etype elmt, TreeNode* leftPtr = NULL,
                   TreeNode* rightPtr = NULL)
             : element(elmt), left(leftPtr), right(rightPtr) {}
          Etype element;     // element of node
          TreeNode* left;    // pointer to left subtree
          TreeNode* right;   // pointer to right subtree
       };
       typename BSTree<Etype>::TreeNode* root;   // root node of tree
       int size;                                 // # nodes in tree
    };

Basic BST Operations
• Find
  • Recursive implementation
  • Iterative implementation
• Insert
• Remove
• The entire code is available in ~cs225/src/library/07-bst/_latestBST

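Basic BST Operations: Find
• Recursive Find Algorithm (pseudo-code)
    int Find(treePtr P, key K) {
       if (P == NULL)
          return 0;                       // fell off the tree: K is not present
       else if (K == P->key)
          return 1;                       // found K
       else if (K < P->key)
          return Find(P->LeftChild, K);   // K can only be in the left subtree
       else
          return Find(P->RightChild, K);  // K can only be in the right subtree
    }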

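Basic BST Operations: Find
• Iterative Find Algorithm (pseudo-code)
    int Find(treePtr P, key K) {
       while (P != NULL) {
          if (K == P->key)
             return 1;            // found K
          else if (K < P->key)
             P = P->LeftChild;    // continue in the left subtree
          else
             P = P->RightChild;   // continue in the right subtree
       }
       return 0;                  // fell off the tree: K is not present
    }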

Basic BST Operations: Insert
• Insertion
  • Must ensure that the tree remains a binary search tree after insertion
  • Determine where the element would have been if it were actually in the BST, and insert it there
  • Compare Insert() vs. Find()
    void Insert(typename BSTree<Etype>::TreeNode * & ptr, Etype const & insElem);
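To make the Insert() vs. Find() comparison concrete, here is a minimal sketch, not the course library code. It assumes int elements instead of the templated Etype, and the names Node, bstFind, bstInsert, and bstClear are hypothetical. Both operations walk the same path; insertion simply attaches a node at the NULL pointer where the search falls off the tree.

```cpp
#include <cassert>

// Hypothetical stand-in for the slides' TreeNode, fixed to int elements.
struct Node {
    int element;
    Node* left;
    Node* right;
    explicit Node(int e) : element(e), left(nullptr), right(nullptr) {}
};

// Iterative find: go left or right at each node until the key is found
// or the search falls off the tree.
bool bstFind(const Node* p, int key) {
    while (p != nullptr) {
        if (key == p->element) return true;
        p = (key < p->element) ? p->left : p->right;
    }
    return false;
}

// Insert follows the same path as find. Because ptr is a reference to the
// parent's child pointer, assigning to it links the new node into the tree.
// Returns false (and changes nothing) if the element is already present.
bool bstInsert(Node*& ptr, int elem) {
    if (ptr == nullptr) {
        ptr = new Node(elem);
        return true;
    }
    if (elem == ptr->element) return false;
    return bstInsert(elem < ptr->element ? ptr->left : ptr->right, elem);
}

// Free every node in post-order and reset the pointer.
void bstClear(Node*& ptr) {
    if (ptr == nullptr) return;
    bstClear(ptr->left);
    bstClear(ptr->right);
    delete ptr;
    ptr = nullptr;
}
```

Passing the pointer by reference, as the slide's Insert signature does, avoids tracking the parent separately.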

Basic BST Operations: Remove
• Remove
  • Trickier than insertion
  • First find the node with the element to remove
  • Split into three cases
    • Node to be deleted is a leaf
    • Node to be deleted has one child
    • Node to be deleted has two children
    void Remove(typename BSTree<Etype>::TreeNode * & ptr, Etype const & remElem);

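Terminology for Remove
• Consider root node (6)
• In-order predecessor: greatest (right-most) element in the left subtree (5 in the example tree)
• In-order successor: smallest (left-most) element in the right subtree (7 in the example tree)
[Figure: example BST with root 6; left child 4 (children 1 and 5) and right child 10 (children 7 and 12)]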

Basic BST Operations: Remove
• Leaf case
  • Simply delete the node
• One-child case
  • Just connect the node's child to its parent
• Two-child case
  • Replace the node's element with its in-order successor and delete the in-order successor
  • Alternatively, we could also use the in-order predecessor

Basic BST Operations: Remove
• Two-child case
  • Replace node with in-order successor (predecessor) and delete the in-order successor (predecessor)
    typename BSTree<Etype>::TreeNode* tempPtr;
    if ((ptr->left != NULL) && (ptr->right != NULL))
    {
       // Replace with smallest in right subtree
       tempPtr = ptr->right;
       while (tempPtr->left != NULL)
          tempPtr = tempPtr->left;
       ptr->element = tempPtr->element;
       // The copied value now appears twice; remove it from the right subtree
       Remove(ptr->right, ptr->element);
    }

Basic BST Operations: Remove
• Leaf case
  • Simply delete the node
    else if ((ptr->left == NULL) && (ptr->right == NULL))
    {
       delete ptr;
       ptr = NULL;   // ptr is a reference, so this clears the parent's pointer
    }

Basic BST Operations: Remove
• One-child case
  • Just connect the node's child to its parent
    else
    {
       tempPtr = ptr;
       if (ptr->left == NULL)   // only a right child
          ptr = ptr->right;
       else                     // ptr->right == NULL, only a left child
          ptr = ptr->left;
       delete tempPtr;
    }
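The three cases above can be put together in a single function. The following is a minimal sketch under stated assumptions, not the library version: elements are ints, the names Node, bstInsert, bstRemove, and inOrder are hypothetical, and the leaf and one-child cases are merged because splicing in the only child also covers the case where that child is NULL.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the slides' TreeNode, fixed to int elements.
struct Node {
    int element;
    Node* left;
    Node* right;
    explicit Node(int e) : element(e), left(nullptr), right(nullptr) {}
};

// Plain BST insertion; duplicates are ignored.
void bstInsert(Node*& ptr, int elem) {
    if (ptr == nullptr) ptr = new Node(elem);
    else if (elem < ptr->element) bstInsert(ptr->left, elem);
    else if (elem > ptr->element) bstInsert(ptr->right, elem);
}

// Remove remElem from the subtree rooted at ptr, if it is present.
void bstRemove(Node*& ptr, int remElem) {
    if (ptr == nullptr) return;  // element not in the tree
    if (remElem < ptr->element) { bstRemove(ptr->left, remElem); return; }
    if (remElem > ptr->element) { bstRemove(ptr->right, remElem); return; }

    if (ptr->left != nullptr && ptr->right != nullptr) {
        // Two children: copy the in-order successor's value into this node,
        // then remove that value from the right subtree.
        Node* succ = ptr->right;
        while (succ->left != nullptr) succ = succ->left;
        ptr->element = succ->element;
        bstRemove(ptr->right, ptr->element);
    } else {
        // Leaf or one child: replace the node by its only child,
        // which is nullptr when the node is a leaf.
        Node* old = ptr;
        ptr = (ptr->left != nullptr) ? ptr->left : ptr->right;
        delete old;
    }
}

// Collect the elements in sorted (in-order) sequence.
void inOrder(const Node* p, std::vector<int>& out) {
    if (p == nullptr) return;
    inOrder(p->left, out);
    out.push_back(p->element);
    inOrder(p->right, out);
}
```

With the example tree 6, 4, 10, 1, 5, 7, 12, removing the root 6 exercises the two-child case: the successor 7 is copied into the root and the old 7 node is then removed as a leaf.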

Practice Problem
• Write an algorithm for a level-order traversal of a binary tree.
    void levelOrdered(TreeNode* root);
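One possible approach to this problem, offered as a sketch rather than the expected solution, uses a FIFO queue: visit a node, then enqueue its children, so that every node of one level is visited before any node of the next. The code below assumes int elements and returns the visit order as a vector (instead of the void signature above) so the result is easy to check. The names levelOrder, pending, and visited are hypothetical.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical minimal node type with int elements.
struct TreeNode {
    int element;
    TreeNode* left;
    TreeNode* right;
};

// Returns the elements in level order: the root first, then each level
// from left to right.
std::vector<int> levelOrder(const TreeNode* root) {
    std::vector<int> visited;
    std::queue<const TreeNode*> pending;  // nodes discovered but not yet visited
    if (root != nullptr) pending.push(root);
    while (!pending.empty()) {
        const TreeNode* node = pending.front();
        pending.pop();
        visited.push_back(node->element);
        // Children are enqueued behind everything already waiting, so they
        // are visited only after the current level is finished.
        if (node->left != nullptr) pending.push(node->left);
        if (node->right != nullptr) pending.push(node->right);
    }
    return visited;
}
```

Each node is pushed and popped exactly once, so the traversal takes O(n) time for a tree with n nodes.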

Discussion Topics
• Tries
  • Basics
  • Jason's Code
  • Patricia Trees
  • De La Briandais Trees
  • Hybrid Trees

Tries
• Data structure optimized for lookups on a key that can be decomposed into characters
• Represented using a tree of arrays
• For a character set of size k, the corresponding Trie structure is a (k+1)-ary tree

Tries
• The i-th character (starting at 0) in the key corresponds to a node at depth i
• Need a mapping from character to array index
• The extra cell in the array represents the “null character” ('\0')
  • Points to a leaf
• Ideally, no need to store the key in a leaf, since it is completely determined by the path followed
• Info is stored at the leaf
• Spend only constant time at each level

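Trie Example
• Words in Trie: raft, star, start, stir
[Figure: trie for these words. The root array (level 0, cells a b c … z) uses cells r and s. The paths r-a-f-t, s-t-a-r, s-t-a-r-t, and s-t-i-r descend through array nodes labeled with levels 1 to 5 and end in the leaves raft, star, start, and stir.]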

Tries
• Running time of the Find operation: O(L), where L is the length of the string we are looking for
• Unique trie for any set of search keys
• Advantage: NOT dependent on the number of strings we have in the Trie structure
• Disadvantage: memory waste
  • 27-cell array needed at every node for strings: one cell per letter plus the null-character cell
  • Space: (k+1) * #nodes * sizeof(pointer)
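The O(L) bound and the 27-cell layout can be illustrated with a small sketch. This is a simplified, hypothetical trie (the names TrieNode, cellIndex, trieInsert, and trieFind are illustrative only) that assumes keys contain only lowercase letters and stores no info at the leaves. Cell 0 plays the role of the null-character cell, and cells 1 to 26 stand for 'a' to 'z'.

```cpp
#include <array>
#include <cassert>
#include <memory>
#include <string>

// Hypothetical simplified trie node: one pointer per cell, 27 cells per node.
struct TrieNode {
    std::array<std::unique_ptr<TrieNode>, 27> cells;
};

// Map a character to its array cell; assumes lowercase letters only.
int cellIndex(char c) {
    assert(c >= 'a' && c <= 'z');
    return c - 'a' + 1;
}

// Insert walks one level per character, creating array nodes as needed,
// then marks the end of the key through the null-character cell.
void trieInsert(TrieNode& root, const std::string& key) {
    TrieNode* node = &root;
    for (char c : key) {
        std::unique_ptr<TrieNode>& next = node->cells[cellIndex(c)];
        if (!next) next = std::make_unique<TrieNode>();
        node = next.get();
    }
    if (!node->cells[0]) node->cells[0] = std::make_unique<TrieNode>();
}

// Find does constant work per character, so it takes O(L) steps for a key
// of length L no matter how many keys are stored.
bool trieFind(const TrieNode& root, const std::string& key) {
    const TrieNode* node = &root;
    for (char c : key) {
        node = node->cells[cellIndex(c)].get();
        if (node == nullptr) return false;
    }
    return node->cells[0] != nullptr;
}
```

The memory cost is visible in the node type: every node carries 27 pointers even when only one or two cells are in use.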

Jason's Code: TrieNode Data
    class TrieNode {
       int nodeLevel;               // level of the node
       bool isLeaf;                 // true for leaf, false for interior
       Array<TrieNode*> subtries;   // array is indexed
                                    // starting at 1!
       String key;                  // string key in leaf nodes
       Etype storedInfo;
    };
• Available on the EWS network at: ~cs225/src/library/11-trie/

Code Review: Trie Find
    template <typename Etype>
    pair<bool, Etype> Trie<Etype>::find(String const & searchKey,
          typename Trie<Etype>::TrieNode const * nodePtr) const
    {
       if (nodePtr == NULL)
          return pair<bool, Etype>(false, Etype());
       else if (nodePtr->isLeaf == true)
       {
          if (searchKey == nodePtr->key)
             return pair<bool, Etype>(true, nodePtr->storedInfo);
          else
             return pair<bool, Etype>(false, Etype());
       }
       else  // nodePtr->isLeaf == false
       {
          int index = ascIndex(searchKey[nodePtr->nodeLevel]);
          return find(searchKey, (nodePtr->subtries)[index]);
       }
    }

Code Review: ascIndex
    template <typename Etype>
    int Trie<Etype>::ascIndex(char indexChar) const
    {
       if ((indexChar >= 'A') && (indexChar <= 'Z'))        // uppercase letter
          return indexChar - 'A' + 1;
       else if ((indexChar >= 'a') && (indexChar <= 'z'))   // lowercase letter
          return indexChar - 'a' + 1;
       else if (indexChar == 0)                             // null character
          return 0;
       else
       {
          Assert("Bizarre character in string!");
          return -1;   // keeps every path returning a value
       }
    }

Code Review: Trie Insert
    template <typename Etype>
    void Trie<Etype>::insert(String const & insKey, Etype const & insInfo,
          typename Trie<Etype>::TrieNode * & nodePtr, int prevLevel)
    {
       if (nodePtr == NULL)
       {
          if (prevLevel == insKey.length())
          {
             nodePtr = new TrieNode(insKey, insInfo);
             nodePtr->nodeLevel = prevLevel + 1;
          }
          else
          {
             nodePtr = new TrieNode();
             nodePtr->nodeLevel = prevLevel + 1;
             insert(insKey, insInfo,
                    (nodePtr->subtries)[ascIndex(insKey[nodePtr->nodeLevel])],
                    nodePtr->nodeLevel);
          }
       } // more…

…Trie Insert
       else if (nodePtr->isLeaf == true)  // leaf case
       {
          cout << "This key already exists in the trie!" << endl;
          return;
       }
       else  // nodePtr->isLeaf == false, array node case
          insert(insKey, insInfo,
                 (nodePtr->subtries)[ascIndex(insKey[nodePtr->nodeLevel])],
                 nodePtr->nodeLevel);
    }

Patricia Trees
• Acronym: Practical Algorithm To Retrieve Information Coded In Alphanumeric
• Trick: only allocate arrays that make a “decision”
  • Do not store nodes with only one non-NULL cell
  • Store in each node the index of the character position on which it discriminates
• Tradeoff: less space required, but more work for Insert and Remove
  • Key is no longer uniquely determined by the path
  • Now we must store keys in the leaf

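Patricia Tree Example
• Words in Trie: raft, star, start, stir
[Figure: the regular trie for these words (array nodes at levels 0 to 4, leaves raft, star, start, stir) next to the equivalent Patricia tree. In the Patricia tree the root array is at level 0 with skip “”; its r cell points to the leaf raft, and its s cell points to an array at level 2 with skip “t”. That array's i cell points to the leaf stir, and its a cell points to an array at level 4 with skip “r”, whose null-character cell points to the leaf star and whose t cell points to the leaf start.]
• What if I wanted to find “spam”?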
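The “spam” question can be traced with a small sketch. The code below is hypothetical (PatNode, cellIndex, patFind, and buildExample are illustrative names, not the library's) and assumes lowercase keys. It hand-builds the Patricia tree described in the example and shows why the final comparison against the key stored in the leaf is required: the search for “spam” inspects only positions 0, 2, and 4, lands on the leaf for “star”, and is rejected only by that comparison.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical Patricia node. Interior nodes record which character position
// they discriminate on; leaves store the full key.
struct PatNode {
    bool isLeaf = false;
    std::size_t level = 0;                           // interior nodes only
    std::string key;                                 // leaves only
    std::array<std::unique_ptr<PatNode>, 27> cells;  // cell 0 = null character
};

// Cell for the character at position pos; positions past the end of the key
// map to the null-character cell. Assumes lowercase letters.
std::size_t cellIndex(const std::string& key, std::size_t pos) {
    if (pos >= key.size()) return 0;
    assert(key[pos] >= 'a' && key[pos] <= 'z');
    return static_cast<std::size_t>(key[pos] - 'a') + 1;
}

// Follow only the discriminating positions, then confirm with a full key
// comparison, because the skipped characters were never checked.
bool patFind(const PatNode* node, const std::string& key) {
    while (node != nullptr && !node->isLeaf)
        node = node->cells[cellIndex(key, node->level)].get();
    return node != nullptr && node->key == key;
}

std::unique_ptr<PatNode> makeLeaf(const std::string& key) {
    auto leaf = std::make_unique<PatNode>();
    leaf->isLeaf = true;
    leaf->key = key;
    return leaf;
}

std::unique_ptr<PatNode> makeInterior(std::size_t level) {
    auto node = std::make_unique<PatNode>();
    node->level = level;
    return node;
}

// Hand-built tree for raft, star, start, stir with arrays at levels 0, 2, 4.
std::unique_ptr<PatNode> buildExample() {
    auto root = makeInterior(0);
    auto level2 = makeInterior(2);
    auto level4 = makeInterior(4);
    level4->cells[cellIndex("star", 4)] = makeLeaf("star");    // null cell
    level4->cells[cellIndex("start", 4)] = makeLeaf("start");  // t cell
    level2->cells[cellIndex("star", 2)] = std::move(level4);   // a cell
    level2->cells[cellIndex("stir", 2)] = makeLeaf("stir");    // i cell
    root->cells[cellIndex("raft", 0)] = makeLeaf("raft");      // r cell
    root->cells[cellIndex("star", 0)] = std::move(level2);     // s cell
    return root;
}
```

Without the stored key, “spam” and “star” would be indistinguishable here, since they agree at every position the tree actually examines.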

De La Briandais Trees
• Trick: convert arrays in the Trie to sparse arrays
  • Allocate space only for used cells in the arrays
  • Each node now has a linked list
  • Array cells are now nodes that point not only down, but also to the next used character on that level
• Advantage: can save much space; good when the linked lists are not long
• Disadvantage: search is now dependent on k (alphabet size)

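de la Briandais Tree Example
• Words in Trie: raft, star, start, stir
[Figure: the regular trie for these words (array nodes at levels 0 to 4, leaves raft, star, start, stir) next to the equivalent de la Briandais tree. In the de la Briandais tree the root points to a level-0 linked list holding only r and s; each lower level likewise holds only the characters in use (a and t at level 1; f, a, and i at level 2; and so on), ending in the same four leaves.]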
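A de la Briandais node can be sketched as a linked-list cell with two links: one to the next used character on the same level and one down to the next level. The code below is a hypothetical illustration, not course code. The names DlbNode, findSlot, dlbInsert, and dlbFind are assumptions, the end of a key is marked by a node holding '\0', and sibling lists are kept in insertion order rather than sorted.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical de la Briandais node: one list cell per used character.
struct DlbNode {
    char ch;                         // '\0' marks the end of a key
    std::unique_ptr<DlbNode> next;   // next used character on this level
    std::unique_ptr<DlbNode> child;  // first cell of the level below
    explicit DlbNode(char c) : ch(c) {}
};

// Scan one level's list for c. Returns the link that holds the matching
// cell, or the empty link at the end of the list if c is not present.
std::unique_ptr<DlbNode>& findSlot(std::unique_ptr<DlbNode>& head, char c) {
    std::unique_ptr<DlbNode>* slot = &head;
    while (*slot && (*slot)->ch != c) slot = &(*slot)->next;
    return *slot;
}

// Insert one cell per character, plus a final '\0' cell as the end marker.
void dlbInsert(std::unique_ptr<DlbNode>& root, const std::string& key) {
    std::unique_ptr<DlbNode>* level = &root;
    for (std::size_t i = 0; i <= key.size(); ++i) {
        char c = (i < key.size()) ? key[i] : '\0';
        std::unique_ptr<DlbNode>& slot = findSlot(*level, c);
        if (!slot) slot = std::make_unique<DlbNode>(c);
        level = &slot->child;
    }
}

// Each level costs a list scan of up to k + 1 cells instead of one array
// index, which is where the dependence on the alphabet size comes from.
bool dlbFind(const std::unique_ptr<DlbNode>& root, const std::string& key) {
    const DlbNode* node = root.get();
    for (std::size_t i = 0; i <= key.size(); ++i) {
        char c = (i < key.size()) ? key[i] : '\0';
        while (node != nullptr && node->ch != c) node = node->next.get();
        if (node == nullptr) return false;
        node = node->child.get();
    }
    return true;
}
```

For the four example words the level-0 list holds just r and s, compared with 27 cells in the array-based root.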

Hybrid Structures
• Patricia/de la Briandais
  • Uses both optimizations
  • Eliminate all one-node linked lists in the de la Briandais tree
• Trie/Patricia/de la Briandais
  • Highly optimized data structure
  • Upper levels use arrays, lower levels use linked lists

Tries: Practice Problem
• Write a function that, given a Patricia tree, calculates the largest skipped gap in the tree. Also consider the gap, if any, between a leaf and its non-leaf parent.
    class TrieNode {
       int nodeLevel;               // level of the node
       bool isLeaf;                 // true for leaf, false for interior
       Array<TrieNode*> subtries;   // array is indexed
                                    // starting at 1!
       String key;                  // string key in leaf nodes
       Etype storedInfo;
    };
