Introduction to Computer Science 2 Trees

Prof. Neeraj Suri, Abdelmajid Khelil

Definition: Tree
• Trees are particularly relevant for:
  • sorting algorithms
  • searching algorithms
  • decision trees in decision theory
  • representing expressions in compilers
• Trees are abstractions of hierarchies:
  • family trees
  • chains of command in the armed forces (or in companies)
• Trees form an important subclass of graphs:
  • A tree is a connected, acyclic, undirected graph
  • A tree with n nodes has exactly n - 1 edges
• For trees, one usually says "node" instead of "vertex"
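The graph characterization can be checked mechanically. The following Java sketch is only an illustration: it assumes nodes are numbered 0 to n - 1 and edges are given as index pairs, and the class `TreeCheck` with its method `isTree` are hypothetical names. It uses union-find: a graph with exactly n - 1 edges and no cycle is also connected, hence a tree.

```java
/** Hypothetical helper: checks whether an undirected graph (nodes 0..n-1) is a tree. */
class TreeCheck {
    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]]; // path halving keeps the chains short
            x = parent[x];
        }
        return x;
    }

    /** A graph with n >= 1 nodes is a tree iff it has n - 1 edges and no cycle. */
    static boolean isTree(int n, int[][] edges) {
        if (n < 1 || edges.length != n - 1) return false;
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i; // every node starts as its own component
        for (int[] e : edges) {
            int a = find(parent, e[0]);
            int b = find(parent, e[1]);
            if (a == b) return false; // edge inside one component would close a cycle
            parent[a] = b;            // merge the two components
        }
        return true; // n - 1 merges without a cycle leave exactly one component
    }

    public static void main(String[] args) {
        System.out.println(isTree(4, new int[][] {{0, 1}, {1, 2}, {1, 3}})); // a tree
        System.out.println(isTree(4, new int[][] {{0, 1}, {1, 2}, {2, 0}})); // cycle, node 3 isolated
    }
}
```

The edge count alone is not enough (the second graph also has n - 1 edges); the cycle test is what rules it out.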

Oriented Trees
• Now consider directed (oriented) trees:
  • One distinguished node is called the root
  • The root (implicitly) defines the orientation of the edges
  • The edges are either all directed towards the root or all directed away from it
[Figure: a tree with its root marked, shown once with implicit and once with explicit edge orientation]

Recursive definition of Trees
• Trees are sets with a hierarchical (recursive) structure
• Definition: An oriented rooted tree (or simply: oriented tree) is a finite set T of objects that is either empty or has the following properties:
  • There is an element r in T (the root of T) that has no ancestors
  • The elements of T \ { r } can be partitioned into disjoint subsets T1, T2, ..., Tm, where each Ti is itself an oriented tree
• In other words:
  • The empty set { } and the singleton set { r } are both trees
  • If T1, ..., Tm are pairwise disjoint trees not containing r, then $T = \{r\} \cup T_1 \cup T_2 \cup \dots \cup T_m$ is also a tree
  • T1, T2, ..., Tm are called the subtrees of T
• Operations on trees are therefore often easy to define recursively

Representation of trees
• Trees are first of all defined independently of graphs
• However, for each tree there is an "isomorphic" graph
• Other representations:
  • Bracketed terms: { A, { E, { F }, { G, { H }, { I } } }, { J, { K }, { M }, { L } }, { B, { C, { D } } } }
  • Representation as nested sets
[Figure: the example tree with nodes A to M drawn as nested sets]

Representations
• Representation by indentation:
  A
    E
      F
      G
        H
        I
    J
      K
      L
      M
    B
      C
        D
• Representation as a graph is often the most suitable:
  • The root is always at the top; edges are oriented from the root to the leaves
  • The arrow heads can therefore be omitted
[Figure: the example tree drawn as a graph with root A]
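Both the bracketed term and the indentation are simple recursive walks over the tree. A minimal Java sketch, assuming a hypothetical class `GTree` whose children are kept in a list (the list order only fixes the order of the output, since the set representation itself is unordered); the example is the tree from the slides.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical general-tree node; children are listed only to fix the output order. */
class GTree {
    final String label;
    final List<GTree> children;

    GTree(String label, GTree... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }

    /** Bracketed term: a leaf is "{ r }", any other node is "{ r, T1, ..., Tm }". */
    String bracketed() {
        StringBuilder sb = new StringBuilder("{ ").append(label);
        for (GTree c : children) sb.append(", ").append(c.bracketed());
        return sb.append(" }").toString();
    }

    /** Indentation: one node per line, indented by two spaces per level. */
    void indented(int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) out.append("  ");
        out.append(label).append('\n');
        for (GTree c : children) c.indented(depth + 1, out);
    }

    static GTree example() {
        return new GTree("A",
            new GTree("E", new GTree("F"), new GTree("G", new GTree("H"), new GTree("I"))),
            new GTree("J", new GTree("K"), new GTree("L"), new GTree("M")),
            new GTree("B", new GTree("C", new GTree("D"))));
    }

    public static void main(String[] args) {
        GTree t = example();
        System.out.println(t.bracketed());
        StringBuilder sb = new StringBuilder();
        t.indented(0, sb);
        System.out.print(sb);
    }
}
```

For the example, `bracketed()` returns `{ A, { E, { F }, { G, { H }, { I } } }, { J, { K }, { L }, { M } }, { B, { C, { D } } } }`, the same term as above with K, L, M listed in a different (equally valid) order.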

Definitions
• A is the root
• J is the root of the subtree {J, {K}, {L}, {M}}
• A is the parent node (or father) of B, J and E; a parent is a special case of an ancestor
• B is a child (or son) of A; a child is a special case of a descendant
• Nodes that have the same parent node are called siblings, e.g. K, L and M

Definitions
• The degree of a node is the number of its children
• Nodes with degree 0 are called leaves
• All other nodes are called internal nodes
• Each node except the root has exactly one parent node
• The degree of a tree is the maximum degree of all its nodes (sometimes called fan-out)
• The descendants of a node v are all nodes that belong to the subtrees of the tree rooted at v

Definitions (3)
• The depth (or level) of a node v is the length of the path from the root to v:
  • Depth of the root = 0
  • Depth of a node v = 1 + depth of its parent node
• The height of a node is the number of nodes on the longest simple downward path from the node to a leaf
• The height of a tree is the height of its root
• Note: there is also another definition of the height of a tree, namely the number of edges on the longest path from the root to a leaf (one less than the node count). We stick to the node-counting definition above for exercises and the exam.
• We will come to the weight of a tree later …

Example
[Figure: the example tree with its depths: 0 (A), 1 (B, J, E), 2 (C, K, L, M, F, G), 3 (D, H, I)]
• Internal nodes: A, B, C, J, E, G
• Leaves: D, K, L, M, F, H, I
• Depths are indicated on the right side of the figure
• Height: 4
• Descendants of E: F, G, H, I
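To connect the definitions with this example, the following Java sketch computes height, depths, leaves, internal nodes and descendants recursively. It is an illustration under assumptions: the class `TreeMetrics` is hypothetical, children are stored in a list, and height uses the node-counting definition adopted above (a leaf has height 1).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper computing the quantities defined above for a general tree. */
class TreeMetrics {
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    /** Height = number of nodes on the longest downward path to a leaf (a leaf has height 1). */
    static int height(Node v) {
        int max = 0;
        for (Node c : v.children) max = Math.max(max, height(c));
        return 1 + max;
    }

    /** Depth: root = 0, every child = 1 + depth of its parent. */
    static void depths(Node v, int depth, Map<String, Integer> out) {
        out.put(v.label, depth);
        for (Node c : v.children) depths(c, depth + 1, out);
    }

    /** Collects the leaves (wantLeaves = true) or the internal nodes, in pre-order. */
    static void collect(Node v, boolean wantLeaves, List<String> out) {
        if (v.children.isEmpty() == wantLeaves) out.add(v.label);
        for (Node c : v.children) collect(c, wantLeaves, out);
    }

    /** Descendants of v: all nodes in the subtrees of v (v itself is not included). */
    static List<String> descendants(Node v) {
        List<String> out = new ArrayList<>();
        for (Node c : v.children) {
            out.add(c.label);
            out.addAll(descendants(c));
        }
        return out;
    }

    static Node example() {
        return new Node("A",
            new Node("E", new Node("F"), new Node("G", new Node("H"), new Node("I"))),
            new Node("J", new Node("K"), new Node("L"), new Node("M")),
            new Node("B", new Node("C", new Node("D"))));
    }

    public static void main(String[] args) {
        Node a = example();
        Map<String, Integer> depth = new LinkedHashMap<>();
        depths(a, 0, depth);
        List<String> leaves = new ArrayList<>();
        List<String> internal = new ArrayList<>();
        collect(a, true, leaves);
        collect(a, false, internal);
        System.out.println("height = " + height(a) + ", depths = " + depth);
        System.out.println("leaves = " + leaves + ", internal = " + internal);
        System.out.println("descendants of E = " + descendants(a.children.get(0)));
    }
}
```

For the example tree this reports height 4, depth 3 for D, H and I, and the same leaves, internal nodes and descendants of E as listed above (in pre-order).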

Ordered Trees
• If the order of the subtrees matters, one can define an ordered (rooted) tree
• Definition: In an ordered tree, the subtrees Ti of each node form an ordered set (a sequence). The children of a node are designated as its first, second, ... child
• The order cannot be expressed in the representation as a set (which is therefore unsuitable for ordered trees)
• In the other representations, the order is expressed by the sequence in which the subtrees and nodes are written

Ordered Trees
• Example use of ordered trees: syntax trees
  • internal nodes are operators
  • leaves contain the operands of the operators
  • the order matters here, because some operators are not commutative
• Which order is useful for syntax trees?
[Figure: syntax tree of (a + (b * c^2)) - (d / e): root -, left subtree + with children a and * (b, ^ (c, 2)), right subtree / with children d and e]

Binary trees
• Most important special case: binary tree = ordered tree of degree (at most) 2
• Definition: A binary tree T is a finite set of elements that is either empty or contains a distinguished element r (the root) and has the following properties:
  • the remaining elements are divided into two disjoint subsets
  • each subset is itself a binary tree, called the left and the right subtree of T
• Every binary tree (apart from the empty tree) is ordered
• Not every ordered tree is a binary tree

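Examples
[Figure: two binary trees over the nodes A, B, C, D that differ only in whether a subtree is placed on the left or on the right]
• Whether a subtree is the left or the right one matters!
• These are two different binary trees that represent the same tree {a, {b, {d}}, {c}}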

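Similarity and equivalence
[Figure: two binary trees of the same shape, one with nodes A to F, the other with nodes x, y, z, u, v, w]
• Two binary trees are similar if they have the same structure (shape)
• Two binary trees are equivalent if they are similar and their corresponding nodes contain the same information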
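Both notions can be decided by a simultaneous recursive walk over the two trees. A minimal Java sketch; the class `BinaryTreeCompare` and its helpers are hypothetical, and "same information" is assumed to mean equal `info` strings.

```java
import java.util.Objects;

/** Hypothetical binary tree node used to illustrate similarity and equivalence. */
class BinaryTreeCompare {
    static final class Node {
        final String info;
        final Node left, right; // null = empty subtree

        Node(String info, Node left, Node right) {
            this.info = info;
            this.left = left;
            this.right = right;
        }
    }

    static Node node(String info, Node left, Node right) { return new Node(info, left, right); }

    static Node leaf(String info) { return new Node(info, null, null); }

    /** Similar: both empty, or both non-empty with similar left and similar right subtrees. */
    static boolean similar(Node s, Node t) {
        if (s == null || t == null) return s == t; // true only if both are empty
        return similar(s.left, t.left) && similar(s.right, t.right);
    }

    /** Equivalent: similar, and corresponding nodes contain the same information. */
    static boolean equivalent(Node s, Node t) {
        if (s == null || t == null) return s == t;
        return Objects.equals(s.info, t.info)
            && equivalent(s.left, t.left)
            && equivalent(s.right, t.right);
    }
}
```

The left/right distinction from the previous examples shows up directly: a tree in which D is the left child of B is not similar to one in which D is the right child of B, although both represent the same unordered tree.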

Complete Binary Trees
• A complete binary tree contains the maximum number of nodes for its height
• More precise definition: A complete binary tree of height h has the following properties:
  • each node at depth h-1 is a leaf
  • each node at depth i < h-1 has non-empty left and right subtrees
• Example (complete binary tree whose leaves are at depth 2, i.e. of height 3):
[Figure: complete binary tree with root x, children z and y, and leaves r, s, u, v]

Theorem on binary trees
• Theorem: In a binary tree, the maximum number of nodes is
  • at level i (i ≥ 0) equal to $2^i$, and
  • in the entire tree, for a tree of height h, equal to $2^h - 1$ (attained by the complete binary tree).
• Proof of the first statement by induction over i:
  • Base case i = 0: The root is the only node at level 0, so the maximum is $2^0 = 1$. ∎
  • Inductive hypothesis: The maximum number of nodes at level j, $0 \le j \le k$, is $2^j$.
  • Inductive step: To prove: the maximum number of nodes at level k + 1 is $2^{k+1}$.
    Proof: By the inductive hypothesis, the maximum number of nodes at level k is $2^k$. Each node has at most two children, so level k + 1 contains at most $2 \cdot 2^k = 2^{k+1}$ nodes. ∎

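Theorem on binary trees (2)
[Figure: complete binary tree with the maximum number of nodes per level: level 0: 1 node (x), level 1: 2 nodes (z, y), level 2: 4 nodes (r, s, u, v)]
• Proof of the second statement (a binary tree of height h has at most $2^h - 1$ nodes):
  • A tree of height h has h levels, numbered 0 to h - 1
  • By the first statement, the maximum number of nodes at level i is $2^i$
  • Summing the maximum number of nodes over all levels gives the geometric series
    $\sum_{i=0}^{h-1} 2^i = 2^h - 1$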
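The theorem also gives a compact completeness test: a binary tree of height h is complete exactly when it has $2^h - 1$ nodes, because reaching the maximum forces every level to be full. A Java sketch under assumptions: the class `CompleteBinaryTree` is hypothetical, and height is counted in nodes as defined in this chapter (the empty tree has height 0).

```java
/** Hypothetical sketch relating the definition of complete binary trees to the theorem. */
class CompleteBinaryTree {
    static final class Node {
        Node left, right; // null = empty subtree

        Node(Node left, Node right) {
            this.left = left;
            this.right = right;
        }
    }

    /** Height in nodes; the empty tree has height 0. */
    static int height(Node t) {
        return t == null ? 0 : 1 + Math.max(height(t.left), height(t.right));
    }

    static int size(Node t) {
        return t == null ? 0 : 1 + size(t.left) + size(t.right);
    }

    /** Number of nodes at level i (root = level 0); at most 2^i by the first statement. */
    static int countAtLevel(Node t, int i) {
        if (t == null) return 0;
        return i == 0 ? 1 : countAtLevel(t.left, i - 1) + countAtLevel(t.right, i - 1);
    }

    /** Complete iff the maximum 2^h - 1 is reached: then every level 0..h-1 is full. */
    static boolean isComplete(Node t) {
        return size(t) == (1 << height(t)) - 1;
    }

    /** Builds the complete binary tree of height h >= 0. */
    static Node build(int h) {
        return h == 0 ? null : new Node(build(h - 1), build(h - 1));
    }
}
```

For example, `build(3)` has 7 nodes with 4 of them at level 2; removing any leaf leaves 6 < 7 nodes at the same height, so the tree is no longer complete.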

Strict binary trees
• Definition: In a strict binary tree, each internal node has degree 2
• All complete binary trees are strict, but not vice versa
[Figure: two strict binary trees, one with nodes a to g and one with nodes x, z, y, r, s, u, v]

Nearly complete binary trees
• Definition: A nearly complete binary tree is a binary tree with the following property: there is an integer k ≥ 0 such that
  • each leaf in the tree is at level k or k + 1
  • if an internal node has a right descendant at level k + 1, then its left subtree is complete with leaves at level k + 1
  • each node at a level lower than k has degree 2
[Figure: example nearly complete binary tree with nodes x, z, y, r, s, u, v, p, q; levels k and k + 1 are marked]

Balanced binary trees
• Definition: A binary tree is balanced if there is an integer k ≥ 0 such that each leaf is at level k or k + 1 and each node at a level lower than k has degree 2
• Nearly complete binary trees are balanced too, but not vice versa
[Figure: example balanced binary tree with nodes x, z, y, r, s, u, v, p, q, m, n]
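The strict and balanced definitions can be tested literally. A Java sketch with hypothetical names (`BinaryTreeShapes`, `isStrict`, `isBalanced`): for "balanced" it simply tries every candidate k, which is quadratic in the worst case but follows the definition word for word. Treating the empty tree as balanced is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical checks for the "strict" and "balanced" shape definitions. */
class BinaryTreeShapes {
    static final class Node {
        final Node left, right; // null = empty subtree

        Node(Node left, Node right) {
            this.left = left;
            this.right = right;
        }

        int degree() { return (left != null ? 1 : 0) + (right != null ? 1 : 0); }
    }

    static Node leaf() { return new Node(null, null); }

    /** Strict: every internal node has degree 2, i.e. no node has exactly one child. */
    static boolean isStrict(Node t) {
        if (t == null) return true;
        return t.degree() != 1 && isStrict(t.left) && isStrict(t.right);
    }

    /** Records {level, degree} for every node, root at level 0. */
    private static void collect(Node t, int level, List<int[]> out) {
        if (t == null) return;
        out.add(new int[] {level, t.degree()});
        collect(t.left, level + 1, out);
        collect(t.right, level + 1, out);
    }

    /** Balanced: some k >= 0 exists such that every leaf is at level k or k + 1
        and every node at a level lower than k has degree 2. */
    static boolean isBalanced(Node t) {
        if (t == null) return true; // assumption: the empty tree counts as balanced
        List<int[]> nodes = new ArrayList<>();
        collect(t, 0, nodes);
        int maxLevel = 0;
        for (int[] n : nodes) maxLevel = Math.max(maxLevel, n[0]);
        for (int k = 0; k <= maxLevel; k++) {
            boolean ok = true;
            for (int[] n : nodes) {
                int level = n[0], degree = n[1];
                if (degree == 0 && level != k && level != k + 1) ok = false; // leaf off levels k, k+1
                if (level < k && degree != 2) ok = false;                    // missing child above k
            }
            if (ok) return true;
        }
        return false;
    }
}
```

A lopsided tree whose right spine keeps branching (leaves at levels 1, 2, 3, 3) is strict but not balanced, while a root with a single leaf child is balanced (with k = 0) but not strict.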

Representation of trees
• In principle (as for graphs) there are two possibilities:
  • static representation (using arrays)
  • dynamic representation (using references)
• There is a tradeoff between flexibility, required memory and performance
• Running example: the term (a + b * c^2) - d / e

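Array Tree Representation
[Figure: the syntax tree stored in array slots: - in slot 3, + in 4, / in 1, a in 5, * in 6, d in 12, e in 13, b in 8, ^ in 9, c in 10, 2 in 11]
• Simulates a dynamic structure using a (limited) static structure
• The tree is stored as an array of triples (Info, Left, Right), where Left and Right are array indices
• A root pointer and a free-list pointer are needed; in the example: Root = 3, free list = 2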
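A minimal Java sketch of the array tree, under assumptions that are not from the slides: slot 0 is reserved to play the role of null, free slots are chained through their Left field, and the class `ArrayTree` with its methods is hypothetical.

```java
/** Hypothetical sketch: a tree simulated in arrays of (Info, Left, Right) triples. */
class ArrayTree {
    static final int NIL = 0; // slot 0 is never used, so index 0 plays the role of null

    final char[] info;
    final int[] left, right;  // indices of the left/right subtree, NIL if empty
    int root = NIL;
    int free;                 // head of the free list; free slots are chained through 'left'

    ArrayTree(int capacity) {
        info = new char[capacity + 1];
        left = new int[capacity + 1];
        right = new int[capacity + 1];
        for (int i = 1; i < capacity; i++) left[i] = i + 1; // initially every slot is free
        free = capacity >= 1 ? 1 : NIL;                     // left[capacity] stays NIL
    }

    /** Takes a slot from the free list and stores the triple (c, l, r) in it. */
    int allocate(char c, int l, int r) {
        if (free == NIL) throw new IllegalStateException("no free slot left");
        int i = free;
        free = left[i];
        info[i] = c;
        left[i] = l;
        right[i] = r;
        return i;
    }

    /** Puts slot i back on the free list; the caller must unlink it from the tree first. */
    void release(int i) {
        left[i] = free;
        right[i] = NIL;
        free = i;
    }

    /** In-order traversal that follows indices instead of references. */
    String inOrder(int i) {
        if (i == NIL) return "";
        return inOrder(left[i]) + info[i] + inOrder(right[i]);
    }
}
```

The program, not the runtime system, manages memory here: `allocate` and `release` are the array-based counterparts of `new` and garbage collection, and the capacity is fixed in advance.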

Parent Node Representation
• Same representation as the array tree, but each node stores only the index of its parent node
• Saves storage space, but is inflexible: the tree can only be traversed from the leaves towards the root
• The order among siblings is lost!
[Figure: the syntax tree stored with parent indices; Root = 1, free list = 4]
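A short Java sketch of the parent representation. Assumptions: 0-based indices with -1 marking the root (instead of the slides' 1-based slots), no free list, and a hypothetical class `ParentArrayTree`. It can only walk upwards, and nothing in it records whether d or e is the left child of /.

```java
/** Hypothetical sketch of the parent-node representation (0-based, -1 = no parent). */
class ParentArrayTree {
    final char[] info;
    final int[] parent; // parent[i] = index of the parent of node i; -1 marks the root

    ParentArrayTree(char[] info, int[] parent) {
        this.info = info;
        this.parent = parent;
    }

    /** Labels on the path from node i up to the root: the only direction supported. */
    String pathToRoot(int i) {
        StringBuilder sb = new StringBuilder();
        for (int v = i; v != -1; v = parent[v]) sb.append(info[v]);
        return sb.toString();
    }

    /** Depth of node i = number of parent links followed until the root is reached. */
    int depth(int i) {
        int d = 0;
        for (int v = parent[i]; v != -1; v = parent[v]) d++;
        return d;
    }

    /** Syntax tree of (a + b * c^2) - d / e; siblings such as d and e are unordered here. */
    static ParentArrayTree example() {
        char[] info = "-+/a*deb^c2".toCharArray();
        int[] parent = {-1, 0, 0, 1, 1, 2, 2, 4, 4, 8, 8};
        return new ParentArrayTree(info, parent);
    }
}
```

In the example, `pathToRoot(10)` walks from the leaf 2 up to the root and yields `2^*+-`.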

(Semi-) Sequential representations
• Goal: save as many references as possible
• Method: relations between nodes are expressed by physical neighborhood in a table
• Both variants use an array (and manage the free memory space themselves)
• Semi-sequential representation:
  • One of the children is the direct physical neighbor
  • The other child is referenced explicitly
  • An extra indicator is needed for leaves

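(Semi-) Sequential representations
[Figure: semi-sequential storage of the syntax tree in pre-order: - in slot 1, + in 2, a in 3, * in 4, b in 5, ^ in 6, c in 7, 2 in 8, / in 9, d in 10, e in 11]
• Problem: changes to the structure of the tree are very costly (whole blocks may have to be shifted)
• The nodes are stored in pre-order, so a pre-order traversal is simply a sequential pass over the array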

Sequential representation
• The structure of the tree is exploited to compute where a node is placed in the array
• Nodes are numbered consecutively, starting at the root and moving level by level from left to right (like BFS)
• Computing the numbers assigned to the nodes (n = size of the array):
  • The root has number 1
  • L(i) has number 2i, if $2i \le n$
  • R(i) has number 2i + 1, if $2i + 1 \le n$
  • The parent node of node i has number $\lfloor i / 2 \rfloor$, for i > 1
[Figure: example numbering of the nodes of a complete binary tree with 7 nodes, 1 to 7]

Sequential representation (2)
• Optimal for complete, balanced or nearly complete binary trees
• The more the tree deviates from a complete binary tree, the more memory is wasted
• Worst case: a linear list of k nodes (each node having only a right child) occupies only k out of $2^k - 1$ slots in the array
[Figure: a binary tree with nodes a to g numbered level by level]

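Sequential representation (3)
• Example of the sequential representation of a non-complete binary tree: the syntax tree occupies slots 1 to 7, 10, 11, 22 and 23
[Figure: - in slot 1, + in 2, / in 3, a in 4, * in 5, d in 6, e in 7, b in 10, ^ in 11, c in 22, 2 in 23]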
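The index formulas are all that is needed to navigate the sequential representation. A minimal Java sketch, assuming `'\0'` marks an empty slot and a hypothetical class `SequentialTree`; the example uses the slot numbers of the non-complete syntax tree above.

```java
/** Hypothetical sketch of the sequential representation (1-based array, slot 0 unused). */
class SequentialTree {
    final char[] slot; // slot[i] holds node number i; '\0' marks an empty slot

    SequentialTree(char[] slot) {
        this.slot = slot;
    }

    static int left(int i)   { return 2 * i; }
    static int right(int i)  { return 2 * i + 1; }
    static int parent(int i) { return i / 2; } // integer division = floor(i / 2), for i > 1

    boolean exists(int i) {
        return i >= 1 && i < slot.length && slot[i] != '\0';
    }

    /** In-order traversal using only index arithmetic, no stored references. */
    String inOrder(int i) {
        if (!exists(i)) return "";
        return inOrder(left(i)) + slot[i] + inOrder(right(i));
    }

    /** The non-complete syntax tree above: 11 nodes spread over slots 1 to 23. */
    static SequentialTree example() {
        char[] s = new char[24];
        s[1] = '-'; s[2] = '+'; s[3] = '/';
        s[4] = 'a'; s[5] = '*'; s[6] = 'd'; s[7] = 'e';
        s[10] = 'b'; s[11] = '^';
        s[22] = 'c'; s[23] = '2';
        return new SequentialTree(s);
    }
}
```

The in-order walk from slot 1 yields `a+b*c^2-d/e`; 11 nodes occupy 23 slots, 12 of which stay empty, which illustrates the memory waste for trees that are far from complete.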

Linked representation
• Highest flexibility (important for dynamic changes)
• Memory management is done by the system
• Each element has two references (left and right subtree)
• Can easily be implemented with objects in Java as well
[Figure: the syntax tree of (a + b * c^2) - d / e as linked nodes]

Representing General Trees With Binary Trees
• Binary trees are easy to implement because they are regular
• General trees can have nodes with any number of children, which requires either:
  • a dynamic data structure for the children of each node, or
  • a limit to a maximum number m of children (too static, wastes memory!)
• Can we somehow express general trees as binary trees (such that the transformation is unique)?
• If yes: transform general trees into binary trees, store them as binary trees, and transform them back when necessary

Representing General Trees With Binary Trees
• Observation: at the sibling level, a node has at most one right neighbor, and it has at most one leftmost child
• Transformation:
  • Connect the siblings from left to right with edges, and delete all edges from a parent to its children except the edge to the leftmost child
  • Rotate the resulting graph by 45° to be able to distinguish between left and right subtrees
• The transformation is reversible!

Transforming the Tree
[Figure: a general tree with nodes a to l; Step 1 (move the edges): siblings linked left to right, all but the leftmost-child edges removed; Step 2 (rotation): the result rotated by 45° into a binary tree]
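In memory, the rotation is only a matter of drawing: the left pointer of a binary node stores the leftmost child and the right pointer stores the next sibling. A Java sketch of both directions of the transformation; the classes `FirstChildNextSibling`, `GNode` and `BNode` are hypothetical, and the example is the tree A to M from the definition slides rather than the tree in the figure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: general tree <-> binary tree (left = leftmost child, right = next sibling). */
class FirstChildNextSibling {
    static final class GNode {
        final String label;
        final List<GNode> children;

        GNode(String label, GNode... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    static final class BNode {
        final String label;
        BNode left, right; // left = leftmost child, right = next sibling to the right

        BNode(String label) { this.label = label; }
    }

    /** Keep only the edge to the leftmost child and chain the siblings from left to right. */
    static BNode toBinary(GNode g) {
        BNode b = new BNode(g.label);
        BNode prev = null;
        for (GNode c : g.children) {
            BNode cb = toBinary(c);
            if (prev == null) b.left = cb; else prev.right = cb;
            prev = cb;
        }
        return b;
    }

    /** Reverse direction: the children of b are b.left and its chain of right siblings. */
    static GNode toGeneral(BNode b) {
        List<GNode> kids = new ArrayList<>();
        for (BNode c = b.left; c != null; c = c.right) kids.add(toGeneral(c));
        return new GNode(b.label, kids.toArray(new GNode[0]));
    }

    static String preOrder(GNode g) {
        StringBuilder sb = new StringBuilder(g.label);
        for (GNode c : g.children) sb.append(preOrder(c));
        return sb.toString();
    }

    static String preOrder(BNode b) {
        return b == null ? "" : b.label + preOrder(b.left) + preOrder(b.right);
    }

    static GNode example() {
        return new GNode("A",
            new GNode("E", new GNode("F"), new GNode("G", new GNode("H"), new GNode("I"))),
            new GNode("J", new GNode("K"), new GNode("L"), new GNode("M")),
            new GNode("B", new GNode("C", new GNode("D"))));
    }
}
```

The root of the binary tree never has a right subtree (it has no siblings), the binary tree's pre-order equals the general tree's pre-order, and `toGeneral` rebuilds the original children lists, which illustrates that the transformation is reversible.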

Recap: Traversal of Trees
• A traversal produces a linear order of the nodes
• "There are only three possibilities":
  • The three steps are:
    • Processing the root V
    • Visiting the left subtree L
    • Visiting the right subtree R
  • Permutations of the three steps:
    • V L R, L V R, L R V
    • V R L, R V L, R L V
  • Convention: L before R, so only V L R, L V R and L R V remain
• These three traversal orders are called:
  • V L R: pre-order
  • L V R: in-order
  • L R V: post-order

Example
• pre-order: root, left subtree, right subtree
• in-order: left subtree, root, right subtree
• post-order: left subtree, right subtree, root
[Figure: syntax tree of (a + b * c^2) - d / e]
• pre-order: - + a * b ^ c 2 / d e
• in-order: a + b * c ^ 2 - d / e
• post-order: a b c 2 ^ * + d e / -

Properties of the traversal types
• Post-order preserves the precedence of the operators of a syntax tree (some pocket calculators require terms to be entered in post-order)
• In in-order, the precedence of the operators is lost unless parentheses are added (what about pre-order?)
• In-order on binary search trees (see later) yields the natural sorted order
• The subtrees can be visited recursively or iteratively
• In the iterative case, the developer has to manage the stack explicitly
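The pocket-calculator remark can be made concrete: post-order is exactly the input of a stack-based (RPN) evaluator, in which operands are pushed and each operator pops its two operands. A minimal Java sketch under assumptions: tokens are separated by spaces, only the binary operators + - * / ^ occur, variable values come from a map, and the class `PostfixCalculator` is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

/** Hypothetical stack-based evaluator for expressions written in post-order (RPN). */
class PostfixCalculator {
    static double eval(String postfix, Map<String, Double> env) {
        Deque<Double> stack = new ArrayDeque<>();
        for (String tok : postfix.trim().split("\\s+")) {
            switch (tok) {
                case "+": case "-": case "*": case "/": case "^": {
                    double right = stack.pop(); // the right operand was pushed last
                    double left = stack.pop();
                    stack.push(apply(tok, left, right));
                    break;
                }
                default: {
                    Double value = env.get(tok); // a variable, or else a numeric literal
                    stack.push(value != null ? value : Double.parseDouble(tok));
                }
            }
        }
        return stack.pop(); // a well-formed expression leaves exactly one value
    }

    private static double apply(String op, double left, double right) {
        switch (op) {
            case "+": return left + right;
            case "-": return left - right;
            case "*": return left * right;
            case "/": return left / right;
            default:  return Math.pow(left, right); // "^"
        }
    }
}
```

For the post-order sequence of the example, `a b c 2 ^ * + d e / -`, with a = 1, b = 2, c = 3, d = 8 and e = 4, the result is 1 + 2 · 9 - 8 / 4 = 17. No parentheses or precedence rules are needed, because the order of the tokens already encodes them.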

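Recursive Method
    enum TraversalOrder { PreOrder, InOrder, PostOrder }

    class TreeNode<T> {                      /* tree implementation */
        T info;
        TreeNode<T> L, R;                    // left and right subtree (null = empty)

        void visit() { System.out.print(info + " "); }   // process the node

        static <E> void traverse(TreeNode<E> t, TraversalOrder x) {
            if (t != null) {                 // an empty subtree needs no work
                if (x == TraversalOrder.PreOrder) {          // V L R
                    t.visit(); traverse(t.L, x); traverse(t.R, x);
                } else if (x == TraversalOrder.InOrder) {    // L V R
                    traverse(t.L, x); t.visit(); traverse(t.R, x);
                } else {                                     // post-order: L R V
                    traverse(t.L, x); traverse(t.R, x); t.visit();
                }
            }
        }
    }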

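Iterative preorder traversal
    // static method of TreeNode<T>; requires: import java.util.Stack;
    static <E> void preOrderTraversal(TreeNode<E> t) {
        Stack<TreeNode<E>> s = new Stack<>();   // java.util.Stack accepts null entries
        s.push(t);
        while (!s.empty()) {
            TreeNode<E> n = s.pop();
            if (n != null) {                    // empty subtrees are skipped here
                n.visit();
                s.push(n.R);                    /* first R !!! so that L is popped first */
                s.push(n.L);
            }
        }
    }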
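In-order is less direct to make iterative than pre-order, because a node may only be visited after its whole left subtree. A minimal Java sketch (hypothetical class `InOrderIterative`) that descends to the left while remembering the path on an explicit stack; it uses `ArrayDeque`, so unlike the version above it never pushes null entries.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: in-order traversal with an explicitly managed stack. */
class InOrderIterative {
    static final class Node {
        final char info;
        final Node left, right;

        Node(char info, Node left, Node right) {
            this.info = info;
            this.left = left;
            this.right = right;
        }
    }

    static String inOrder(Node root) {
        StringBuilder out = new StringBuilder();
        Deque<Node> stack = new ArrayDeque<>(); // nodes whose left subtree is still in progress
        Node cur = root;
        while (cur != null || !stack.isEmpty()) {
            while (cur != null) {  // descend as far left as possible, remembering the path
                stack.push(cur);
                cur = cur.left;
            }
            cur = stack.pop();     // left subtree finished: visit the node (V) ...
            out.append(cur.info);
            cur = cur.right;       // ... then continue with its right subtree (R)
        }
        return out.toString();
    }

    static Node leaf(char c) { return new Node(c, null, null); }

    /** Syntax tree of (a + b * c^2) - d / e. */
    static Node example() {
        return new Node('-',
            new Node('+', leaf('a'), new Node('*', leaf('b'), new Node('^', leaf('c'), leaf('2')))),
            new Node('/', leaf('d'), leaf('e')));
    }
}
```

For the example syntax tree, `inOrder(example())` returns `a+b*c^2-d/e`, the in-order sequence listed earlier.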

Threads
• Recursive methods are elegant, but costly in practice (call overhead, stack space)
• Iterative solutions with explicit stacks are possible, but complex
• Improvement: add pointers to the next node, called threads

Threaded Binary Trees
• A thread for a traversal order (pre-, in- or post-order) is an additional pointer to the next node in that order
• Two types:
  • Right thread: points to the successor
  • Left thread: points to the predecessor
• Threads link the nodes into a linear list

Example: Right Thread for pre-order Traversal
[Figure: the syntax tree with thread pointers from each node to its pre-order successor, starting at the root -]

Right vs. Left Threads
[Figure: syntax tree of (a + b * c^2) - d / e]
• Right threads connect the nodes in pre-, in- or post-order
• Left threads can be used to connect the nodes in the "mirrored" pre-, in- and post-order:
  • Pre-order: VLR → VRL
  • In-order: LVR → RVL
  • Post-order: LRV → RLV
• "Mirrored" versions for the example:
  • VRL: - / e d + * ^ 2 c b a
  • RVL: e / d - 2 ^ c * b + a
  • RLV: e d / 2 c ^ b * a + -
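The mirrored orders only swap the roles of L and R. A short Java sketch (hypothetical class `MirroredTraversals`) reproduces the three sequences above. It also shows that each mirrored order is the reverse of a standard order: VRL reversed is LRV, RVL reversed is LVR, and RLV reversed is VLR. Stepping forward in a mirrored order is therefore the same as stepping backward in the corresponding standard order.

```java
/** Hypothetical sketch of the standard and "mirrored" traversal orders. */
class MirroredTraversals {
    static final class Node {
        final char info;
        final Node left, right;

        Node(char info, Node left, Node right) {
            this.info = info;
            this.left = left;
            this.right = right;
        }
    }

    // Standard orders (L before R)
    static String vlr(Node t) { return t == null ? "" : t.info + vlr(t.left) + vlr(t.right); }
    static String lvr(Node t) { return t == null ? "" : lvr(t.left) + t.info + lvr(t.right); }
    static String lrv(Node t) { return t == null ? "" : lrv(t.left) + lrv(t.right) + t.info; }

    // Mirrored orders (R before L)
    static String vrl(Node t) { return t == null ? "" : t.info + vrl(t.right) + vrl(t.left); }
    static String rvl(Node t) { return t == null ? "" : rvl(t.right) + t.info + rvl(t.left); }
    static String rlv(Node t) { return t == null ? "" : rlv(t.right) + rlv(t.left) + t.info; }

    static String reverse(String s) { return new StringBuilder(s).reverse().toString(); }

    static Node leaf(char c) { return new Node(c, null, null); }

    /** Syntax tree of (a + b * c^2) - d / e. */
    static Node example() {
        return new Node('-',
            new Node('+', leaf('a'), new Node('*', leaf('b'), new Node('^', leaf('c'), leaf('2')))),
            new Node('/', leaf('d'), leaf('e')));
    }
}
```

For the example, `vrl` returns `-/ed+*^2cba`, and reversing it gives the post-order `abc2^*+de/-`.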

Example: Left Thread for pre-order Traversal
• Left threads follow the "mirrored" pre-order traversal VRL
[Figure: the syntax tree with left thread pointers in VRL order, starting at the root -]

Analysis of Threads
• A binary tree with n nodes has 2n child pointers
• However, only n - 1 of them are used: each node except the root has exactly one parent
• So there are n + 1 null pointers, which we can use for storing threads!
• In pre-order, thread pointers are redundant in internal nodes (they coincide with regular pointers of the tree)
• Pre-order threading can therefore be realized without any additional pointers
• However, an extra bit per node is needed to distinguish between tree pointers and thread pointers in the leaves

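Example: Right Thread for pre-order Traversal
• Pre-order: V L R
• Start at the root and first follow the regular left pointers
• A bit is needed to discriminate between tree pointers and thread pointers
[Figure: the syntax tree with pre-order right threads stored in the null pointers of the leaves]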
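A minimal Java sketch of pre-order threading under stated assumptions: the class `PreorderThreadedTree` and its methods are hypothetical, threads are stored only in the right pointers of leaves, and the last leaf in pre-order keeps a null right pointer to mark the end. The walk itself needs only the rule "go left if possible, otherwise go right". The extra bit is what lets ordinary tree algorithms (here, counting leaves) tell a thread apart from a real child.

```java
/** Hypothetical sketch: pre-order threads stored in the leaves' null right pointers. */
class PreorderThreadedTree {
    static final class Node {
        final char info;
        Node left, right;
        boolean rightIsThread; // the extra bit: true if 'right' is a thread, not a child

        Node(char info, Node left, Node right) {
            this.info = info;
            this.left = left;
            this.right = right;
        }
    }

    /** Counts null child pointers; for n nodes this is n + 1 (call before threading). */
    static int countNulls(Node t) {
        return t == null ? 1 : countNulls(t.left) + countNulls(t.right);
    }

    /** Points the right pointer of every leaf (except the last) to its pre-order successor. */
    static void thread(Node root) {
        threadRec(root, null);
    }

    /** 'pending' is the leaf still waiting for its successor; returns the leaf that is
        still waiting after this subtree has been processed (or null). */
    private static Node threadRec(Node n, Node pending) {
        if (n == null) return pending;
        if (pending != null) {                // n is the pre-order successor of that leaf
            pending.right = n;
            pending.rightIsThread = true;
        }
        if (n.left == null && n.right == null) return n; // a leaf: its successor comes later
        return threadRec(n.right, threadRec(n.left, null));
    }

    /** Pre-order walk without recursion or stack: go left if possible, otherwise follow
        'right' (a real child in internal nodes, a thread in leaves). */
    static String preOrder(Node root) {
        StringBuilder out = new StringBuilder();
        for (Node n = root; n != null; n = (n.left != null) ? n.left : n.right) {
            out.append(n.info);
        }
        return out.toString();
    }

    /** Here the bit is essential: a thread must not be mistaken for a subtree. */
    static int countLeaves(Node n) {
        if (n == null) return 0;
        Node r = n.rightIsThread ? null : n.right;
        if (n.left == null && r == null) return 1;
        return countLeaves(n.left) + countLeaves(r);
    }

    static Node leaf(char c) { return new Node(c, null, null); }

    /** Syntax tree of (a + b * c^2) - d / e. */
    static Node example() {
        return new Node('-',
            new Node('+', leaf('a'), new Node('*', leaf('b'), new Node('^', leaf('c'), leaf('2')))),
            new Node('/', leaf('d'), leaf('e')));
    }
}
```

For the 11-node example, `countNulls` reports 12 = n + 1 before threading, and after `thread` the stack-free walk yields `-+a*b^c2/de`.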

Comments
• Still, half of the leaves' pointers (the left ones) remain unused
• Left and right threads can also be used at the same time
• Typically, only one pointer is used (the thread pointer)
