CS503: Eighth Lecture, Fall 2008 Binary Trees

CS503: Eighth Lecture, Fall 2008Binary Trees Michael Barnathan

Project Ideas • General idea: something that won’t take more than a couple of weeks but that you can use in a portfolio. These are just my ideas; feel free to come up with your own. • Web crawler. • Sockets, Trees, Recursion. • Caveats: dead links, status codes, redirects. • Fast file indexer based on frequent words. • File I/O, Trees, Analysis of Algorithms. • Compression (storing the index). • Works well because of Zipf/power law distributions. • Caveats: binary files, errors opening files. Make sure you open read-only. • Trend analysis tool using regression. • Sorting, Recursion, Analysis. • Predictors: I can help you with these. • AI opponent for a simple game (such as checkers or Reversi) • Trees, Storage, Recursion, Heuristic Search, Analysis of Algorithms. • Distributed command server (to order around a bunch of machines). • Sockets, File I/O, Priority Queues (especially for synchronization), Analysis of Algorithms.

Grading • Not having exams is going to require adjusting the percentages a bit. • Assignments 40%, Project 30%, Labs 20%, Participation 10%?

Here’s what we’ll be learning: • Data structures: • Binary Trees. • Binary Search Trees. • Theory: • Tree traversals. • Preorder. • Inorder. • Postorder. • Balanced and complete trees. • Recursion on Binary Trees.

Linear Structures • Arrays, Linked Lists, Stacks, and Queues are linear data structures. • Even circularly linked lists. • One element follows another: there is always just one “next” element. • As we mentioned, these usually yield recurrences of the form T(n) = T(n-1) + f(n). • What if we violate this assumption? What if a structure had two next elements? • In a way, we started with the most restrictive structure (arrays), which we are progressively relaxing.

Frankenstein’s Data Structures It hardly even makes sense to talk about a node having more than one successor in an array. What would data[2] be here? It’s not clear. 4 2 1 5 Linked lists make more sense, but what is 1->next now? We need more information. 6 3 7

Binary Trees • There are now two next nodes, not one. • Clearly, we need two pointers to model them. • So let’s rotate that list… • Let’s call the pointers “left” and “right”. • This structure is known as a binary tree. • Binary: branches into two nodes. • Higher-order trees exist too; we’ll talk about these later. • Why we call it a tree should be obvious. 1 Left Right 3 2 6 7 4 5

Nomenclature • Some from botany, some from genealogy. • Root: The “highest” node in the tree; that is, the one without a parent. • Almost all tree algorithms start at the root. • Child/Subtree: A node one level below the current node. Traversing the left or right pointers will bring you to a node’s “left” or “right” child. • Leaf: A node at the “bottom” of the tree; i.e. one without children (or really with two null children). • Parent: The node one level above. • Siblings: Nodes with the same parent. • “Complete” tree: a tree where every node has either 0 or 2 children and all leaves are at the same level. (Basically, it’s fully “filled in”).

Recursive Definition • Binary trees have a nice recursive definition: • A binary tree is a value, a left binary tree, and a right binary tree. • Thus, each individual node is itself a tree. • Base case: the empty tree. • Leaves’ left and right children are both empty. • We usually represent this with nulls.

Node Access • All you must store is a reference to the root. • You can get to the rest of the nodes by traversing the tree. • Example: Accessing 5. • There’s a problem. • Anyone see it? Root 1 3 2 6 7 4 5

Traversal • We have no way of knowing where 5 is. • In order to find it, we need to check every node in the tree. • So what’s the complexity of access? • “Check every” should ring alarm bells by now. • This is a nonlinear data structure, so we have more than one way to traverse, however. • There are three common tree traversals: • Preorder, inorder, and postorder. • There are some more exotic ones, too: traversals based on pointer inversion, threaded traversal, Robson traversal… • Since binary trees are recursive structures, tree algorithms are usually recursive. Traversals are no exception. • Remember how we reversed the output of printTo(n) by moving the output above or below the recursive call? • It turns out you can change the order of the traversal in the same way.

Traversals and “Visiting” • You can do anything to a node inside of a tree traversal algorithm! • You certainly can search for a value. • But you can also output its value, modify the its value, insert a node there, etc. • This generic action is simply called “visiting” the node when discussing traversals.

Preorder Traversal • If I gave you a binary tree and asked you to search for an element, how would you go about it? • You would check the value. • You would search the left subtree. • You would search the right subtree. • This is how preorder traversal works. • We check/output/do something with the node value. • We recurse on the left subtree. • We recurse on the right subtree. • We stop once we run out of nodes. • This is also called depth-first traversal, because it first focuses on traversing down specific nodes before broadly visiting others. • Like taking a lot of CS courses first before satisfying a core curriculum. • Preorder traversal is a special case of depth-first search on data structures called graphs, which we will discuss soon.

Preorder Traversal BinaryTree preorder(BinaryTree node, inttargetvalue) { if (node == null) //Base case. return null; else if (node.value == targetvalue) //Found it. return node; BinaryTree lchild = preorder(node.left); //Traverse left. if (lchild != null) return lchild; //Found on the left. return preorder(node.right); //Traverse right. }

Preorder Traversal: Illustration Root 1 3 2 6 7 4 5 Order: [1 2 4 5 3 6 7] The root is always the first node to be visited.

Inorder Traversal • We have two recursive calls in the preorder traversal: left and right. • In preorder, we checked the node before calling either of them. • In an inorder traversal, we check in-between the two calls. • We dive down all the way on the left before outputting, then we visit the right. • To use recursive stack language, we output after popping on the left but before pushing on the right. • There is no inherent advantage to choosing one traversal over another on a regular binary tree unless you deliberately want a certain ordering. • However, inorder traversal is important on a variation of the binary tree. More on that in just a moment.

Inorder Traversal BinaryTree inorder(BinaryTree node, inttargetvalue) { if (node == null) //Base case. return null; //All we did was swap the order of these two lines. BinaryTree lchild = inorder(node.left); //Traverse left. if (node.value == targetvalue) //Found it. return node; if (lchild != null) return lchild; //Found on the left. return inorder(node.right); //Traverse right. }

Inorder Traversal: Illustration Root 1 3 2 6 7 4 5 Order: [4 2 5 1 6 3 7] The root is always the middle node.

Postorder Traversal • The obvious next step: output after both recursive calls. • This causes the algorithm to dive down to the bottom of the tree and output/visit the node when going back up. • Similar to what we did in printTo(), actually. • We are outputting on the pop.

Postorder Traversal BinaryTree postorder(BinaryTree node, inttargetvalue) { if (node == null) //Base case. return null; BinaryTree lchild = postorder(node.left); //Traverse left. BinaryTree rchild = postorder(node.right); //Traverse right. if (node.value == targetvalue) //Found it. return node; if (lchild != null) return lchild; //Found on the left. //Found on the right or not at all. return rchild; }

Postorder Traversal: Illustration Root 1 3 2 6 7 4 5 Order: [4 5 2 6 7 3 1] The root is always the last node to be visited.

CRUD: Binary Trees. • Insertion: ? • Access: ? • Updating an element: ? • Deleting an element: ? • Search/Traversal: O(n). • All three traversals are linear: They visit every node in sequence. • They each just follow different sequences. • You can search by traversing, so search is also O(n). • How long would it take to access a node, though? • If I knew I wanted the left child’s left child, how many pointers would I need to follow to get to it?

Tree Height. • To analyze worst-case access, we need to talk about tree height. • The height of a tree is the number of vertical levels it contains, not including the root level. • Or you can think of it as the number of times you’d have to traverse down the tree to get from the root to the lowest leaf node. • Nodes in the tree are said to have a depth, based on how many vertical levels they are down from the root. • The root itself has a depth of 0. • The root’s children have a depth of 1. • Their children have a depth of 2… • Etc. • The height is thus also the depth of the lowest node.

Height Root 1 Depth = 0 3 2 Depth = 1 6 7 4 5 Depth = 2 Height = 2 Remember, don’t count the root level.

Height Balance • A tree is considered balanced (or height-balanced) if the depth of the highest and lowest leaves differs by no more than 1. • This turns out to be an important property because it forms a lower bound on the access time of the tree and lets us find the height. • Question: If we have n nodes in a balanced binary tree, what is the height of the tree? • floor(log2 n) • Note that we had 7 nodes in the previous tree, but a height of 2. The tree was full; adding an 8th node would take the height to 3. • The time to access a node depends on the height, thus we know it is O(log n) on a balanced tree.

Degeneracy • Trees with only left or right pointers degenerate into linked lists. • Which gives you another perspective on why Quicksort became quadratic with everything on one side of the pivot. • Access on linked lists is O(n). • Performance gets worse even as we approach this condition, so we want to keep trees balanced. 1 1 2 2 3 3

CRUD: Balanced Binary Trees. • Insertion: ? • Access: O(log n). • Updating an element: ? • Deleting an element: ? • Search/Traversal: O(n). • Once we know where to insert, insertion is simple. • Just add a new leaf there: O(1). • However, discovering where to insert is a bit trickier. • Anywhere that a null child used to be will work. • We don’t want to upset the balance of the tree. • A good strategy is to traverse down the tree based on the value of each node. This creates a partitioning at each level.

Binary Tree Insertion void insert(BinaryTree root, BinaryTree newtree) { //This can only happen now if the user passes in an empty tree. if (root == null) root = newtree; //Empty. Insert the root. else if (newtree.value < root.value) { //Go left if <. if (root.left == null) //Found a place to insert. root.left = newtree; else insert(root.left, newtree); //Keep traversing. } else { //Go right if >=. if (root.right == null) root.right = newtree; //Found a place to insert. else insert(root.right, newtree); //Keep traversing. } }

Insertion Analysis • This is similar to a traversal, but guided by the value of the node. • We choose left or right based on whether the node is < or >=. • We split into one subproblem of size n/2 each time we traverse. • What recurrence would we have for this? • What would be the solution?

CRUD: Balanced Binary Trees. • Insertion: O(log n). • Access: O(log n). • Updating an element: O(1). • Deleting an element: ? • Search/Traversal: O(n). • If we’re already at the element we need to update, we can just change the value, thus O(1). • Note that we can say the same for insertion, but finding a place to put the node is usually considered part of it. • Deletion is quite complex, on the other hand. • If there are no children, just remove the node – O(1). • If there is one child, just replace the node with its child – O(1). • If there are two… well, that’s the tricky case.

Deletion • If we need to delete a node with two children, we need to find a suitable node to replace it with. • One good choice is the inorder successor of the node, which will be the leftmost leaf of the right child we’re deleting from. • Inorder successor meaning the next node in an inorder traversal. • So our course is clear: inorder traverse, stop at the next node, swap.

Deletion void deleteWithTwoSubtrees(BinaryTreetargetnode) { if (targetnode == null) //Deleting a null is a no-op. return; //Find the inorder successor and its parent. BinaryTree inorder_succ; BinaryTree inorder_parent = targetnode; for (inorder_succ = targetnode.right; inorder_succ.left != null; inorder_succ = inorder_succ.left) inorder_parent = inorder_succ; //Keep track of the parent. //Set the value of the parent to that of the inorder successor… targetnode.value = inorder_succ.value; //Delete the inorder successor (here’s why we needed the parent): inorder_parent.left = null; }

CRUD: Balanced Binary Trees. • Insertion: O(log n). • Access: O(log n). • Updating an element: O(1). • Deleting an element: O(log n). • Search/Traversal: O(n). • Finding the inorder successor requires time proportional to the height of the tree. If the tree is balanced, this is O(log n).

CRUD: Unbalanced Binary Trees. • Insertion: O(n). • Access: O(n). • Updating an element: O(1). • Deleting an element: O(1). • Search/Traversal: O(n). • The worst sort of unbalanced tree is just a linked list. • The deletion algorithm would always hit the second case (only one child), so we’d never experience O(log n) behavior… • But the insertion algorithm is not as efficient as that of a linked list. • Unless we check for this condition explicitly, in which case we get O(1).

Binary Search Trees • Binary Search Trees (BSTs) capture the notion of “splitting into two”. • Or, to use the Quicksort term, partitioning. • The value of a node is the pivot. • The left tree contains elements < the pivot. • The right tree contains elements >= the pivot. • They are simply binary trees that are kept sorted in the manner stated above.

BSTs: What do they entail? • Like priority queues and sorted arrays, binary search trees are inherently sorted containers. • This means inserting a sequence of elements and then reading them back will get them out in sorted order. • Ah, but this time we have three ways to read them back out. All three can’t give us the same order. • The elements of an inorder traversal are sorted in binary search trees. • It also means that we’ll have to do some extra work to ensure that this guarantee is true. • But will this work influence the asymptotic performance?

A Binary Search Tree Root 4 6 2 5 7 1 3 Inorder traversal: [1 2 3 4 5 6 7] < on the left, >= on the right.

CRUD: Binary Search Trees. • Insertion: O(log n). • Access: O(log n). • Updating an element: ? • Deleting an element: O(log n). • Search: O(log n). • Traversal: O(n). • Search and traversal are no longer the same operation! • Traversal is analogous to linear search: look at every element, one at a time, and try to find the target. • Search on a BST is analogous to binary search: the data is sorted around the value of the node we’re at, so it guides us to eliminate half of the remaining elements at each step. • Just like other unsorted containers, we have to traverse to search a standard binary tree. And like other sorted containers, a BST lets us do a binary search. • Remember, BSTs are sorted in an inorder traversal. • Therefore, the deletion algorithm we previously specified will preserve the ordering.

Access on a BST • Use the same strategy we used in binary search: • Compare the node. • If the target value is less than the node’s value, go left (eliminates the right subtree). • If the target value is greater than the node’s value, go right (eliminates the left subtree). • If it’s equal, we’ve found the target. • If we hit a NULL, the target isn’t in the tree. • This exhibits the same performance: O(log n). • If the tree is balanced. In the degenerate case, we are binary searching a linked list, which is O(n).

Insertion on a BST • The algorithm I gave you for insertion was actually the BST insertion algorithm as well. • That was one of the reasons why I chose that strategy, although it does result in a fairly balanced tree if the data distribution is uniform. • In order to keep elements partitioned around the pivot, we need to traverse left when the new element has a value < the pivot and right when it’s >=. • It was O(log n) before, and it still is.

Deletion on a BST • I also gave you the BST deletion algorithm. • As the inorder traversal is in sorted order, the inorder successor is the next element after the one we’re deleting in sorted order. • If we replace the element we’re deleting with the next element in the sequence, the sequence is still sorted. • e.g., [ 1 3 5 8 13 ] after deleting 3 -> [ 1 5 8 13]. • It was O(log n) before, and it still is.

Updating a BST • Ah, here’s something different. • Updating unsorted containers is usually a constant-time operation, while updating sorted containers usually takes longer. • When we change the value of a node in a BST, we may be required to change the node’s position in the tree to preserve the ordering. • This is why updating sorted containers is usually a slow operation. • No one seems to want to deal with updating these, so most sources (including your textbook) just define it as “delete and reinsert”. • Which fully works and is very simple to do. • Don’t be afraid to do “quick and dirty” things if they don’t harm your performance. • So does this harm performance? • Insertion is O(log n). • Deletion is O(log n). • Unless we can update in O(1) on a BST (we can’t), then no.

CRUD: Binary Search Trees. • Insertion: O(log n). • Access: O(log n). • Updating an element: O(log n). • Deleting an element: O(log n). • Search: O(log n). • Traversal: O(n). • This is the ultimate compromise data structure. • Arrays, Lists, Stacks, and Queues all did some things in constant time and other things in linear time. • This does everything (except traversal, which is inherently a linear operation) in logarithmic time. • But remember, logarithmic time isn’t much worse than constant. • So these are pretty good data structures. • As usual, there’s a catch…

The Important of Balance • Every operation on a tree begins to degenerate when balance is lost. • And in the worst case, you end up with a less efficient linked list. • Keeping the tree balanced is thus important. • There is one who is prophesized to bring balance to the Force, but I don’t think that includes your trees. • So the burden falls on you, my young padawan. • Since BSTs are the structural analogue of Quicksort, you may have an idea of what insertion sequence will produce the worst case. • Yep, sorted or inverse-sorted, just as in Quicksort. • Most data is not arranged like this already, and on average, BSTs stay fairly well balanced. • But this is enough of a problem where various self-balancing structures have been invented. We will discuss these next week.

A General Note • Although I put numbers in most of my examples, any sort of data can go in these. • Strings, Objects, Employees. • Caveat: When using Java’s sorted containers, make sure your class implements Comparable. • Java doesn’t give you a BinaryTree class outright, but it does give you TreeSet and TreeMap. • TreeMap in particular is very neat; check it out. • We’ll do some things with these in Thursday’s lab.

“In all my life I’ll never see / a thing so beautiful as a tree.” • The study of trees goes very deep. • We’ve just scratched the surface. • We’ll come back to self-balancing trees, heaps, and perhaps splay trees. • The lesson: • Ideas are universal. They can come from your study. They can come from outside of your study. They can come from nature. They can come from anywhere. • Next class: Linear-time sorting, B+ trees, lab.

CS503: Eighth Lecture, Fall 2008 Binary Trees

CS503: Eighth Lecture, Fall 2008 Binary Trees

Presentation Transcript

Trees, Binary Trees, and Binary Search Trees

Analysis Of Stripped Binary Code

Lecture 14: Scientific Visualization Information Visualization CPSC 533C, Fall 2006

Additive Models and Trees

Tournament Trees

Mid-Term Review

Algorithms for Port of Entry Inspection: Finding Optimal Binary Decision Trees

From Gene Trees to Species Trees

From Gene Trees to Species Trees

Binary heaps

CSC 211 Data Structures Lecture 18

Chapter 10

Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications

Binary phase diagrams

MI-Access Fall 2008 Webcast

Discrete Maths

AVL-Trees (Part 1)

BINARY TREES

269202 Algorithms for iSNE

CS235102 Data Structures

Computer Organization and Architecture

Binary Search Trees