Section 7: BSTs and Tries CS 225: Data Structures & Software Principles
Binary Search Trees • A Binary Search Tree is a binary tree with the following properties: • Values associated with nodes have a linear order (i.e. we can define "less-than" on node values) • Every node's value is greater than any value in its left sub-tree and less than any value in its right sub-tree • Abbreviated as BST [Figure: example BST with node values 6, 10, 4, 7, 12, 1, 5]
Binary Search Trees • Tree height (counted in edges, so a single node has height 0) • The height of a complete binary tree with n nodes is exactly ⌊log₂ n⌋ • The maximum height of a binary tree with n nodes is n-1 • The minimum height of a binary tree with n nodes is ⌊log₂ n⌋ • Why do we care?
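The height bounds above can be derived by counting nodes per level. The following is a short worked sketch, assuming height is counted in edges (consistent with the maximum of n-1 stated on the slide); the symbol h for height is introduced here for illustration.

```latex
% A binary tree of height h has at most 2^i nodes on level i, so
n \;\le\; \sum_{i=0}^{h} 2^{i} \;=\; 2^{h+1} - 1
\quad\Longrightarrow\quad
h \;\ge\; \lceil \log_2 (n+1) \rceil - 1 \;=\; \lfloor \log_2 n \rfloor .

% A complete tree of height h fills levels 0..h-1 and has at least one node on level h:
2^{h} \;\le\; n \;\le\; 2^{h+1} - 1
\quad\Longrightarrow\quad
h \;=\; \lfloor \log_2 n \rfloor .

% At the other extreme, a chain (every node has one child) adds one level per node:
h_{\max} \;=\; n - 1 .

% Example with n = 7 nodes:
h_{\min} = \lfloor \log_2 7 \rfloor = 2, \qquad h_{\max} = 7 - 1 = 6 .
```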
Binary Search Trees • Searching in a BST requires looking at one node per tree level (in the worst case) • Worst-case search time • for all possible search trees with n nodes: O(n) • for the best search tree with n nodes: O(log n) vs. …
BST Implementation • We can build BSTs from last week's BinaryTree code – no implementation tricks are needed. (A BST is just a normal binary tree, used in a special way.) • Functions we might like: • Find (search for an item) • Insert (add a new item) • Remove (delete an item)
Basic BST Operations: Find • Basic algorithm: • If we're searching in an empty tree, the value we're looking for isn't here. • Is the value we're looking for at the root? • If so, we're done – we found it • Otherwise, compare the value at the root to the one we're looking for, to figure out which subtree it should be in
Basic BST Operations: Find template <typename Etype> bool BinarySearchTree<Etype>::find(Etype const & searchElem, typename BinarySearchTree<Etype>::TreeNode const * treePtr) const { // what goes here? }
Basic BST Operations: Find template <typename Etype> bool BinarySearchTree<Etype>::find(Etype const & searchElem, typename BinarySearchTree<Etype>::TreeNode const * treePtr) const { if (treePtr == NULL) return false; else if (searchElem == treePtr->element) return true; else if (searchElem < treePtr->element) return find(searchElem, treePtr->left); else // searchElem > treePtr->element return find(searchElem, treePtr->right); }
Basic BST Operations: Find Recursion incurs extra overhead; let's make this iterative. template <typename Etype> bool BinarySearchTree<Etype>::find(Etype const & searchElem, typename BinarySearchTree<Etype>::TreeNode const * treePtr) const { // what now? }
Basic BST Operations: Find Recursion incurs extra overhead; let's make this iterative. template <typename Etype> bool BinarySearchTree<Etype>::find(Etype const & searchElem, typename BinarySearchTree<Etype>::TreeNode const * treePtr) const { while (treePtr != NULL) { if (searchElem == treePtr->element) return true; else if (searchElem < treePtr->element) treePtr = treePtr->left; else treePtr = treePtr->right; } return false; }
Basic BST Operations: Insert • Insert • Must ensure that tree remains a binary search tree after insertion • Determine where the element would have been if it were actually in the BST; insert there • What does this mean implementation-wise? • Compare Insert() vs Find()
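One way to read "insert where the element would have been" is that Insert performs the same walk as Find and, on reaching a NULL link, overwrites that link with a new node. The following is a minimal sketch under assumptions: the `TreeNode` struct, the `int` element type, and the free functions `insert` and `find` are hypothetical stand-ins, not the course's BinaryTree code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the course's TreeNode (names are assumptions).
struct TreeNode {
    int element;
    TreeNode* left;
    TreeNode* right;
    explicit TreeNode(int e) : element(e), left(NULL), right(NULL) {}
};

// Same walk as find(), but the pointer is passed by reference so that the
// NULL link where the search "falls off" the tree can be overwritten.
// Returns false (and changes nothing) if the element is already present.
bool insert(int insElem, TreeNode*& treePtr) {
    if (treePtr == NULL) {
        treePtr = new TreeNode(insElem);   // this is where it would have been
        return true;
    } else if (insElem == treePtr->element) {
        return false;                      // duplicate: nothing to do
    } else if (insElem < treePtr->element) {
        return insert(insElem, treePtr->left);
    } else {
        return insert(insElem, treePtr->right);
    }
}

// Iterative find, for comparison: identical comparisons, no modification.
bool find(int searchElem, TreeNode const* treePtr) {
    while (treePtr != NULL) {
        if (searchElem == treePtr->element) return true;
        treePtr = (searchElem < treePtr->element) ? treePtr->left
                                                  : treePtr->right;
    }
    return false;
}
```

The only structural difference from Find is the `TreeNode*&` parameter: Find can work on a copy of the pointer, while Insert has to change the parent's link itself.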
Basic BST Operations: Remove • Remove • Not as easy • Start by finding the node we want to remove • Next, there are three cases to consider: • The node is a leaf • The node has one child • The node has two children
Terminology for Remove • Consider root node (6) • In-order successor: smallest (left-most) element in right subtree (here, 7) • In-order predecessor: greatest (right-most) element in left subtree (here, 5) [Figure: example BST with node values 6, 10, 4, 7, 12, 1, 5]
Basic BST Operations: Remove • If the node we're removing has… • No children: • Just delete it! • One child: • Attach the node's parent to its child • Then delete it! • Two children: • Find the in-order successor (IOS) • Swap the values between the node and its IOS • Remove the "old" value from the right subtree (how do we know this removal will be "easy"?)
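The three cases can be sketched as follows. This is an illustration under assumptions, not the course's implementation: `TreeNode`, the `int` element type, and the free functions `insert` and `remove` are hypothetical. It also copies the in-order successor's value into the node rather than swapping, which has the same effect once the successor is deleted.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the course's TreeNode (names are assumptions).
struct TreeNode {
    int element;
    TreeNode* left;
    TreeNode* right;
    explicit TreeNode(int e) : element(e), left(NULL), right(NULL) {}
};

// Plain BST insert, used here only to build a tree to remove from.
void insert(int e, TreeNode*& t) {
    if (t == NULL) t = new TreeNode(e);
    else if (e < t->element) insert(e, t->left);
    else if (e > t->element) insert(e, t->right);
}

// Removes remElem from the subtree rooted at treePtr.
// Returns true if the element was found and removed.
bool remove(int remElem, TreeNode*& treePtr) {
    if (treePtr == NULL) return false;                  // not in the tree
    if (remElem < treePtr->element) return remove(remElem, treePtr->left);
    if (remElem > treePtr->element) return remove(remElem, treePtr->right);

    if (treePtr->left != NULL && treePtr->right != NULL) {
        // Two children: find the in-order successor (left-most node of the
        // right subtree), copy its value up, then remove it from below.
        TreeNode* ios = treePtr->right;
        while (ios->left != NULL) ios = ios->left;
        treePtr->element = ios->element;
        // The successor has no left child, so this recursive call always
        // ends in the zero-or-one-child case below.
        return remove(ios->element, treePtr->right);
    }

    // Zero or one child: link the parent to the child (possibly NULL),
    // then delete the node.
    TreeNode* old = treePtr;
    treePtr = (treePtr->left != NULL) ? treePtr->left : treePtr->right;
    delete old;
    return true;
}
```

Because `treePtr` is a reference to the parent's link, "attach the node's parent to its child" is a single assignment, and the leaf case falls out of the same code with a NULL child.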
Intermission • Midterm 1 is graded! • Average: ~65 (out of 90); standard deviation: ~17 • To request a regrade, write up a list of the problems you think we should look at, and why we should look at them. Give it to a TA. • If you want (or might want) a regrade, don't take your exam home today (leave it with me).
Tries • Data structure optimized for lookups on a key that can be decomposed into characters • Represented using a tree of arrays • For a character set of size k, the corresponding Trie structure is a (k+1)-ary tree
Tries • i-th character (starting at 0) in the data corresponds to a node at depth i • Need a mapping of character to an array index • The extra cell in the array represents the “null character” ('\0') • Represents the end of a word • Points to a leaf • Ideally, no need to store key in a leaf, since it is completely determined by path followed • Info stored at the leaf • Spend only constant time at each level
Trie Example • Words in Trie: raft, star, start, stir [Figure: trie containing these four words, with array cells a … z at each node and levels 0 to 5]
Tries • Running time of Find operation: O(L) where L is the length of the string we are looking for • Advantage: NOT dependent on the number of strings we have in the Trie structure • Disadvantage: memory waste • 27 cell array, one per character needed for Strings • Space: (k+1) * #nodes * sizeof(pointer) • Although, could be better for a large number of short strings
Jason’s Code: TrieNode Data
TrieNode {
  int nodeLevel;              // level of the node
  bool isLeaf;                // is this a leaf?
  Array<TrieNode*> subtries;  // array nodes
  String key;                 // string key in leaf nodes
  Etype storedInfo;           // associated info in leaf nodes
}
Code Review: Trie Search • template <class Etype> • pair<bool, Etype> Trie<Etype>::find(String const & searchKey, typename Trie<Etype>::TrieNode const * nodePtr) { • if (nodePtr==NULL) • return pair<bool, Etype>(false, Etype()); • else if (nodePtr->isLeaf == true) { // found a leaf • if (searchKey == nodePtr->key) • return pair<bool, Etype>(true, nodePtr->storedInfo); • else • return pair<bool, Etype>(false, Etype()); • } • else { // not a leaf • int index = ascIndex(searchKey[nodePtr->nodeLevel]); • return find(searchKey, (nodePtr->subtries)[index]); • } • }
Code Review: Trie Insert • template <class Etype> • void Trie<Etype>::insert(String insKey, Etype insInfo, typename Trie<Etype>::TrieNode * & nodePtr, int prevLevel) { • if (nodePtr == NULL) { // NULL case • if (prevLevel == insKey.length()) { // make leaf node • nodePtr = new TrieNode(insKey, insInfo); • nodePtr->nodeLevel = prevLevel+1; • } • else { // make internal node • nodePtr = new TrieNode(); • nodePtr->nodeLevel = prevLevel+1; • insert(insKey, insInfo, • (nodePtr->subtries)[ ascIndex( • insKey[nodePtr->nodeLevel]) ], nodePtr->nodeLevel); • } • } // more…
…Trie Insert • else if (nodePtr->isLeaf == true) { // leaf case • cout << "This key already exists in the trie!" << endl; • return; • } • else // nodePtr->isLeaf == false, array node case • insert(insKey, insInfo, • (nodePtr->subtries)[ascIndex( • insKey[nodePtr->nodeLevel])], nodePtr->nodeLevel); • }
Trie Remove • What's the basic idea? • Search for the key • If it's not there, give up • Otherwise, remember the leaf that corresponds to it • Delete the leaf • Figure out if this makes any of the internal nodes "empty"; if so, delete them too • When we actually implement this, we'll execute multiple steps at the same time…
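The steps above can be sketched with a recursive remove that deletes the leaf on the way down and prunes empty internal nodes on the way back up, which is one way of "executing multiple steps at the same time". This is a simplified illustration under assumptions, not the course's Trie class: keys are lowercase `std::string`s over 'a'..'z', leaves store no key or info, and the names `K`, `END`, `cellFor`, and `isEmpty` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

const int K = 26;     // alphabet size, assuming keys use only 'a'..'z'
const int END = K;    // index of the extra "null character" cell

struct TrieNode {
    bool isLeaf;
    TrieNode* subtries[K + 1];
    explicit TrieNode(bool leaf) : isLeaf(leaf) {
        for (int i = 0; i <= K; ++i) subtries[i] = NULL;
    }
};

// Array cell for the character at position `level`; END once the key is used up.
int cellFor(std::string const& key, std::size_t level) {
    return level < key.size() ? key[level] - 'a' : END;
}

void insert(std::string const& key, TrieNode*& node, std::size_t level = 0) {
    if (node == NULL) node = new TrieNode(false);       // new internal node
    int i = cellFor(key, level);
    if (i == END) {
        if (node->subtries[END] == NULL)
            node->subtries[END] = new TrieNode(true);    // leaf marks end of word
    } else {
        insert(key, node->subtries[i], level + 1);
    }
}

bool find(std::string const& key, TrieNode const* node) {
    for (std::size_t level = 0; node != NULL && !node->isLeaf; ++level)
        node = node->subtries[cellFor(key, level)];
    return node != NULL;                                 // non-NULL means a leaf
}

bool isEmpty(TrieNode const* node) {
    for (int i = 0; i <= K; ++i)
        if (node->subtries[i] != NULL) return false;
    return true;
}

// Returns true if the key was present and has been removed.
bool remove(std::string const& key, TrieNode*& node, std::size_t level = 0) {
    if (node == NULL) return false;          // key is not there: give up
    if (node->isLeaf) {                      // the leaf for this key: delete it
        delete node;
        node = NULL;
        return true;
    }
    bool removed = remove(key, node->subtries[cellFor(key, level)], level + 1);
    if (removed && isEmpty(node)) {          // internal node became empty
        delete node;
        node = NULL;
    }
    return removed;
}
```

Pruning stops at the first ancestor that still has another non-NULL cell, so removing "star" leaves the path shared with "start" intact.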