Binary Trees
CS 1037 Fundamentals of Computer Science II
What is a “Tree”?
• A tree is a connected graph with no cycles
• A rooted tree is a tree with one node r designated the root
• The choice of root defines ancestry relationships
[Figure: three example graphs labelled “a tree”, “not a tree”, and “two trees (a ‘forest’)”; the rooted examples mark the root r]
Tree Properties
• Any two nodes are connected by exactly one path
• A tree with n nodes has n−1 edges
• If any edge is removed, the graph is no longer connected and the tree becomes a forest
• If any edge is added, the graph has a cycle and is no longer a tree
Rooted Tree Terminology
• Node y is an ancestor of x if it appears on the path r→x
• Node x is a descendant of y if y is an ancestor of x
• The subtree rooted at y is the tree of y and its descendants
• Node y is the parent of x if it is the immediate ancestor of x
• Node x is a child of y if the parent of x is y
[Figure: rooted trees with root r showing an ancestor y of node x, and the subtree rooted at y]
Rooted Tree Terminology
• A node with at least one child is internal
• A node with no children is a leaf
• The depth of node x is the number of edges on the path r→x
• The height of a tree is the largest depth of any node
[Figure: rooted tree with root r and a node x with depth(x)=2]
Binary Tree Terminology
• A tree is k-ary if all its nodes have ≤ k children
• A tree is binary if it is 2-ary
• In a binary tree, a child node is called either the left child or the right child
• A k-ary tree is full if all internal nodes have k children
• A tree is complete if it is full and all leaves have the same depth
[Figure: a 3-ary tree, and a binary tree with children labelled left and right]
Why Study Trees?
• Abstract representation of hierarchy
• Hierarchies are natural in many applications
  • networks, taxonomies, decision making, graphics
• Even when not natural, hierarchies are crucial for performance (binary search trees)
[Figures: example hierarchies, including a directory tree rooted at C:, a domain-name tree (com, ca, edu), and a “shader tree” for 3D rendering]
Binary Tree Data Structures
• A linked data structure (like a linked list)
• Each node has up to two successors: its left child and right child

    struct node {
        node* left;   // pointer to root of left subtree
        node* right;  // pointer to root of right subtree
        ...           // application-specific stuff
    };

[Figure: memory diagram of a root pointer and six nodes, each with left, right, and data fields]
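To connect the struct with the earlier terminology, here is a minimal sketch of counting nodes and computing height on a linked binary tree. The `int item` field stands in for the "application-specific stuff", and `make_node`, `count_nodes`, `height`, and `destroy` are hypothetical helper names, not part of the slides.

```cpp
#include <algorithm>

// Same shape as the slide's node; the int item is an assumption for this sketch.
struct node {
    node* left;   // pointer to root of left subtree
    node* right;  // pointer to root of right subtree
    int   item;   // stand-in for the application-specific data
};

// Hypothetical helper: allocate a node with the given item and children.
node* make_node(int item, node* left = nullptr, node* right = nullptr) {
    return new node{left, right, item};
}

// Number of nodes in the subtree rooted at n (0 for an empty subtree).
int count_nodes(const node* n) {
    if (!n) return 0;
    return 1 + count_nodes(n->left) + count_nodes(n->right);
}

// Height = largest depth of any node. An empty subtree is given height -1
// so that a single node has height 0.
int height(const node* n) {
    if (!n) return -1;
    return 1 + std::max(height(n->left), height(n->right));
}

// Free every node in the subtree (children first, then the node itself).
void destroy(node* n) {
    if (!n) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
}
```

For example, a tree built as `make_node(3, make_node(2), make_node(7, make_node(4), make_node(9)))` has 5 nodes and height 2. Giving the empty tree height -1 is one common convention, chosen here so the recursion needs no special case for leaves.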
Binary Search Trees (BSTs)
• A BST is a binary tree whose nodes each contain an item and are ordered in a special way
• Main goal: a data structure that supports fast insert, erase, and search
• Array-based binary search does not support fast insert/erase!
height ≈ lg n *
* assuming uniformly distributed items (completely random)
The Binary Search Tree Property
• A tree is a BST if it satisfies the binary search tree property:
  • Let x be a node in a BST.
  • If y is a node in the left subtree of x, then y.item <= x.item.
  • If y is a node in the right subtree of x, then y.item >= x.item.
[Figure: three example trees built from the items 2, 3, 4, 7, 9]
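The property constrains whole subtrees, not just a node and its children, so a checker has to carry bounds down the tree. A minimal sketch, assuming the `node` layout with an `int item` used on the following slides; `is_bst` is a hypothetical name:

```cpp
struct node {
    node* left;
    node* right;
    int   item;
};

// Returns true if the subtree rooted at n satisfies the BST property and
// every item lies between *lo and *hi inclusive.
// A null bound means "no bound on that side".
bool is_bst(const node* n, const int* lo = nullptr, const int* hi = nullptr) {
    if (!n) return true;                        // empty subtree is trivially a BST
    if (lo && n->item < *lo) return false;      // too small for some ancestor
    if (hi && n->item > *hi) return false;      // too large for some ancestor
    return is_bst(n->left,  lo, &n->item)       // left items must be <= n->item
        && is_bst(n->right, &n->item, hi);      // right items must be >= n->item
}
```

Comparing each node only with its own children is not enough: an item of 1 stored as the left child of 7 looks fine locally, but it violates the property if 7 sits in the right subtree of 3.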
BST Search (iterative)

    struct node {
        node* left;   // left subtree has values <= item
        node* right;  // right subtree has values >= item
        int item;     // item for this node
    };

    node* root = ...;

    node* search(int x) {
        node* n = root;
        while (n) {
            if (x < n->item)
                n = n->left;     // look in left subtree
            else if (x > n->item)
                n = n->right;    // look in right subtree
            else
                break;           // found match!
        }
        return n;                // matching node, or null if there is no match
    }
BST Search (recursive)
• Follows one path down the tree
• At most 2(h+1) tests, where h is the height of the BST

    node* search(node* n, int x) {
        if (!n) return 0;                              // fell off bottom of tree; no match
        if (x < n->item) return search(n->left, x);    // search left subtree
        if (n->item < x) return search(n->right, x);   // search right subtree
        return n;
    }

[Figure: example BST with items 3, 2, 7, 4, 9]

    node* result = search(root, 4);
    cout << result->item;   // prints "4"
Running Time of BST Search
• If the tree is well-balanced, search takes at most c·lg n time
• If items are added in random order, the tree is well-balanced on average
• If the tree is highly skewed, search takes up to c·n time
• If items are added in sorted order, the tree will be completely skewed
BST Insert (recursive)
• At most h+1 tests, where h is the height of the BST
• May increase the height of the tree!

    void insert(node*& n, int item) {
        if (!n) {
            n = new node;              // we hit bottom,
            n->item = item;            // so insert here
            n->left = n->right = 0;    // (no children yet)
        } else if (item < n->item)
            insert(n->left, item);     // item belongs to left
        else
            insert(n->right, item);    // item belongs to right
    }

    insert(root, 5);

[Figure: the example BST with items 3, 2, 7, 4, 9 after insert(root,5) adds the new node 5]
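The effect of insertion order on height can be checked with a small experiment. This is a sketch under assumptions: `height`, `destroy`, `height_after_inserting`, `sorted_items`, and `shuffled_items` are hypothetical helpers, and the fixed seed is an arbitrary choice made only so that runs are repeatable.

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

struct node {
    node* left;
    node* right;
    int   item;
};

// Recursive insert, following the slide.
void insert(node*& n, int item) {
    if (!n)
        n = new node{nullptr, nullptr, item};   // we hit bottom, so insert here
    else if (item < n->item)
        insert(n->left, item);                  // item belongs to left
    else
        insert(n->right, item);                 // item belongs to right
}

// Height = largest depth of any node; -1 for an empty tree.
int height(const node* n) {
    if (!n) return -1;
    return 1 + std::max(height(n->left), height(n->right));
}

void destroy(node* n) {
    if (!n) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
}

// Build a BST by inserting the items in the given order; return its height.
int height_after_inserting(const std::vector<int>& items) {
    node* root = nullptr;
    for (int x : items) insert(root, x);
    int h = height(root);
    destroy(root);
    return h;
}

// The items 1..n in sorted order.
std::vector<int> sorted_items(int n) {
    std::vector<int> v(n);
    std::iota(v.begin(), v.end(), 1);
    return v;
}

// The items 1..n in a pseudo-random order.
std::vector<int> shuffled_items(int n, unsigned seed = 1037) {
    std::vector<int> v = sorted_items(n);
    std::mt19937 rng(seed);
    std::shuffle(v.begin(), v.end(), rng);
    return v;
}
```

Inserting 1..n in sorted order always sends the new item to the right, so `height_after_inserting(sorted_items(n))` is n−1: the tree degenerates into a linked list. With `shuffled_items(n)` the height is usually a small multiple of lg n, though the exact value depends on the shuffle.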
BST Erase Examples
erase must maintain the binary search tree property!
erase(root,5)  case 1: node is a leaf (trivial: delete 5)
erase(root,4)  case 2: node has only one child (easy: unlink, then delete 4)
erase(root,3)  case 3: node has two children (hard: can’t just unlink 3)
[Figure: before-and-after trees for each of the three erase calls on the example BST with items 3, 2, 7, 4, 9, 5]
BST Erase Examples (Memory Diagram)
[Figure: memory diagrams (root pointer, left and right fields) showing how erase case 2 and case 3 change the example tree; in case 3 the root item 3 is replaced by 4]
BST Erase (simple version)
(you don’t want to see the “efficient” version!)

    void erase(node*& n, int item) {
        if (!n) return;                        // no match, ignore erase
        if (item < n->item)
            erase(n->left, item);              // match must be on left
        else if (n->item < item)
            erase(n->right, item);             // match must be on right
        else if (!n->right) {
            node* temp = n;                    // case 1 or 2:
            n = n->left;                       // bypass n to left subtree
            delete temp;                       // (possibly NULL)
        } else if (!n->left) {
            node* temp = n;                    // case 2:
            n = n->right;                      // bypass n to right subtree
            delete temp;
        } else {
            node* successor = n->right;        // case 3: get smallest
            while (successor->left)            // value in right subtree
                successor = successor->left;   // by descending leftward;
            n->item = successor->item;         // copy its value to n and
            erase(n->right, successor->item);  // delete the easy node instead
        }
    }
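One way to convince yourself that erase preserves the BST property is to list the items in-order before and after: an in-order traversal of a BST visits items in sorted order. The sketch below repeats the slide's `insert` and `erase` so it compiles on its own; `inorder`, `items_of`, `build`, and `destroy` are hypothetical helpers.

```cpp
#include <vector>

struct node {
    node* left;
    node* right;
    int   item;
};

void insert(node*& n, int item) {
    if (!n)                  n = new node{nullptr, nullptr, item};
    else if (item < n->item) insert(n->left, item);
    else                     insert(n->right, item);
}

// Simple erase, following the slide.
void erase(node*& n, int item) {
    if (!n) return;                                       // no match
    if (item < n->item)      erase(n->left, item);
    else if (n->item < item) erase(n->right, item);
    else if (!n->right) { node* temp = n; n = n->left;  delete temp; }  // case 1 or 2
    else if (!n->left)  { node* temp = n; n = n->right; delete temp; }  // case 2
    else {                                                // case 3
        node* successor = n->right;
        while (successor->left) successor = successor->left;
        n->item = successor->item;                        // copy smallest item on the right
        erase(n->right, successor->item);                 // then erase that easier node
    }
}

// In-order traversal: left subtree, then the node, then right subtree.
void inorder(const node* n, std::vector<int>& out) {
    if (!n) return;
    inorder(n->left, out);
    out.push_back(n->item);
    inorder(n->right, out);
}

std::vector<int> items_of(const node* root) {
    std::vector<int> out;
    inorder(root, out);
    return out;
}

// Build a BST by inserting the items in the given order.
node* build(const std::vector<int>& items) {
    node* root = nullptr;
    for (int x : items) insert(root, x);
    return root;
}

void destroy(node* n) {
    if (!n) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
}
```

Building the tree with `build({3, 2, 7, 4, 9, 5})` and then calling `erase(root, 3)` exercises case 3: the root's item becomes 4 (the smallest item in the right subtree), node 5 takes the place of the old node 4, and `items_of(root)` is 2, 4, 5, 7, 9.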
EXERCISE IN VISUAL STUDIO See Snippet #1
Binary Tree Exercise
39. [6 marks] Write a function to print the items of a binary tree in level-order (all items at depth 0, then all items at depth 1, then all items at depth 2…). Hint: use a queue!

    void print_levelorder(node* root) {
        queue<node*> q;
        if (root)
            q.push_back(root);
        while (!q.empty()) {
            node* n = q.front();
            q.pop_front();
            cout << n->item;
            if (n->left)  q.push_back(n->left);
            if (n->right) q.push_back(n->right);
        }
    }

[Figure: example tree with G at depth 0, D and Y at depth 1, and A, W, Z at depth 2; the level-order output is GDYAWZ]
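The same traversal can be sketched with the standard library. Note the naming difference: `std::queue` uses `push` and `pop`, whereas the slide's `queue` uses `push_back` and `pop_front` (presumably the course's own class). Returning a string instead of printing is a choice made here so the result is easy to check; `levelorder` is a hypothetical name, and `item` is assumed to be a `char`.

```cpp
#include <queue>
#include <string>

struct node {
    node* left;
    node* right;
    char  item;   // assumption: single-character items, as in the slide's figure
};

// Returns the items in level-order: depth 0, then depth 1, then depth 2, ...
std::string levelorder(const node* root) {
    std::string out;
    std::queue<const node*> q;          // nodes discovered but not yet visited
    if (root) q.push(root);
    while (!q.empty()) {
        const node* n = q.front();      // oldest node = shallowest unvisited node
        q.pop();
        out += n->item;
        if (n->left)  q.push(n->left);  // children wait behind everything
        if (n->right) q.push(n->right); // already queued at the current depth
    }
    return out;
}
```

The queue's first-in, first-out order is what produces the level-by-level output: every node at depth d is queued before any node at depth d+1.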
Huffman Trees: Binary Trees for Compression
A Totally Different Application/Interpretation of Binary Trees
[Figure: a Huffman tree with edges labelled 0 and 1 and leaves a, b, c]
Compression Problem
Q: Given a list of symbols {a,b,c,...} of size n, what is the shortest string of {0,1} bits that uniquely identifies the string baabac?
Easy Answer: Use a fixed-length binary code of ⌈lg n⌉ bits per symbol
{a,b,c}, n=3
binary code: a:00 b:01 c:10
baabac → 01·00·00·01·00·10 (12 bits)
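The fixed-length scheme can be sketched in a few lines. The assumptions here are that the i-th symbol of the alphabet gets the binary representation of i, which matches a:00 b:01 c:10 above, and that bits are returned as a string of '0' and '1' characters; `bits_per_symbol` and `encode_fixed` are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Bits per symbol for a fixed-length code over n symbols: the smallest
// number of bits b with 2^b >= n, i.e. ceil(lg n).
int bits_per_symbol(int n) {
    int bits = 0;
    while ((1 << bits) < n) ++bits;
    return bits;
}

// Encode s so that the i-th symbol of `alphabet` becomes i written in binary,
// padded with leading zeros to the fixed width.
std::string encode_fixed(const std::string& s, const std::string& alphabet) {
    const int bits = bits_per_symbol(static_cast<int>(alphabet.size()));
    std::string out;
    for (char ch : s) {
        std::size_t index = alphabet.find(ch);
        assert(index != std::string::npos);      // every character must be a known symbol
        for (int b = bits - 1; b >= 0; --b)      // most significant bit first
            out += ((index >> b) & 1) ? '1' : '0';
    }
    return out;
}
```

With this sketch, `encode_fixed("baabac", "abc")` gives `010000010010`, the same 12 bits as on the slide.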
Compression Problem
Smart Answer: Use variable-length codes...
Frequent symbols should have shorter binary codes than infrequent symbols
Need an estimate of symbol frequencies! a: 3 times, b: 2 times, c: 1 time
good prefix code: a:0 b:10 c:11    baabac → 10·0·0·10·0·11 (9 bits)
bad prefix code:  a:11 b:10 c:0    baabac → 10·11·11·10·11·0 (11 bits)
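Both codes above can be checked mechanically. A minimal sketch, assuming a code is stored as a `std::map` from symbol to bit string; `code_table`, `encode`, and `is_prefix_free` are hypothetical names.

```cpp
#include <map>
#include <string>

using code_table = std::map<char, std::string>;

// Concatenate the codeword of each character of s.
// at() throws std::out_of_range if a character has no codeword.
std::string encode(const std::string& s, const code_table& code) {
    std::string bits;
    for (char ch : s) bits += code.at(ch);
    return bits;
}

// A prefix code is one in which no codeword is a prefix of another codeword;
// that is what lets a decoder split the bit string without separators.
bool is_prefix_free(const code_table& code) {
    for (const auto& a : code)
        for (const auto& b : code)
            if (a.first != b.first &&
                b.second.compare(0, a.second.size(), a.second) == 0)
                return false;   // a's codeword is a prefix of b's codeword
    return true;
}
```

With the good code, `encode("baabac", ...)` has 9 bits; with the bad code it has 11. Both pass `is_prefix_free`, so the bad code is still decodable; it is only longer because the most frequent symbol got the longest codeword.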
Optimal Binary Code Problem
• Given symbols S={a,b,c,...} and expected frequencies f(x), which binary code achieves the best expected compression?
• Answer discovered in 1951 by MIT student David A. Huffman
• Build a special binary tree:
frequencies f(a)=3 f(b)=2 f(c)=1 ⇒ Huffman tree ⇒ optimal code a:0 b:10 c:11
[Figure: Huffman tree with edges labelled 0 and 1 and leaves a, b, c; photo of David Huffman, 1991]
Huffman Codes
• Observation: 1-to-1 correspondence between possibly optimal codes and full binary trees
• Huffman’s algorithm uses f(x) to build an optimal binary tree (a Huffman tree), and thereby an optimal binary code!
[Figure: three full binary trees with leaves a, b, c, d and edges labelled 0 and 1, with their codes: a:0 b:10 c:110 d:111; a:00 b:01 c:10 d:11; a:00 b:010 c:011 d:1]
Optimal Binary Code Problem (Formal)
• Input: set of symbols S={a,b,c,...}, and frequencies f(x) for each x ∈ S
• Output: binary codes c(x) such that, for a string s[0..n-1], its compressed size $\sum_{i=0}^{n-1} |c(s[i])|$ is minimized, where |y| means the length of code y=c(s[i]) for string character s[i]
e.g. recall min size(baabac) = 9 bits
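As a worked instance of this objective, assume the code a:0 b:10 c:11 from the earlier slide. The size of baabac can be summed character by character or, equivalently, symbol by symbol weighted by the count f(x):

```latex
\begin{align*}
\text{size}(\texttt{baabac})
  &= |c(b)| + |c(a)| + |c(a)| + |c(b)| + |c(a)| + |c(c)| \\
  &= 2 + 1 + 1 + 2 + 1 + 2 = 9 \text{ bits} \\
  &= \sum_{x \in S} f(x)\,|c(x)| = 3 \cdot 1 + 2 \cdot 2 + 1 \cdot 2 = 9 \text{ bits}
\end{align*}
```

The second form shows why only the frequencies matter, not the order of characters in the string.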
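Huffman’s Algorithm
1. Start with a list of single-node trees, one per symbol x with frequency f(x)
2. Take the two roots i and j with smallest f and make them the children of a new root with f = f_i + f_j
3. While not a single tree, repeat step 2
Example: s=bbaaddddcaddd
x:f(x)  a:3 b:2 c:1 d:7
[Figure: the forest after each merge: b:2 and c:1 join under a root with f=3; that root and a:3 join under a root with f=6; that root and d:7 join under the final root with f=13]
resulting code: a:00 b:010 c:011 d:1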
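The three steps can be sketched with a priority queue that always hands back the root with the smallest frequency. Assumptions in this sketch: internal nodes store '\0' as a placeholder symbol, left edges are labelled 0 and right edges 1, and ties are broken however `std::priority_queue` happens to break them; `higher_frequency`, `collect_codes`, and `destroy` are hypothetical names.

```cpp
#include <map>
#include <queue>
#include <string>
#include <vector>

struct node {
    node*  left;
    node*  right;
    char   symbol;     // meaningful only at leaves
    double frequency;  // total frequency of symbols in this subtree
};

// Comparison that puts the root with the smallest frequency on top.
struct higher_frequency {
    bool operator()(const node* a, const node* b) const {
        return a->frequency > b->frequency;
    }
};

// Huffman's algorithm; returns null if f is empty.
node* build(const std::map<char, double>& f) {
    std::priority_queue<node*, std::vector<node*>, higher_frequency> roots;
    for (const auto& entry : f)                          // step 1: single-node trees
        roots.push(new node{nullptr, nullptr, entry.first, entry.second});
    while (roots.size() > 1) {                           // step 3: until one tree remains
        node* i = roots.top(); roots.pop();              // step 2: two smallest roots
        node* j = roots.top(); roots.pop();
        roots.push(new node{i, j, '\0', i->frequency + j->frequency});
    }
    return roots.empty() ? nullptr : roots.top();
}

// Read the code off the tree: append 0 going left and 1 going right.
void collect_codes(const node* n, const std::string& prefix,
                   std::map<char, std::string>& codes) {
    if (!n) return;
    if (!n->left && !n->right) { codes[n->symbol] = prefix; return; }  // leaf
    collect_codes(n->left,  prefix + "0", codes);
    collect_codes(n->right, prefix + "1", codes);
}

void destroy(node* n) {
    if (!n) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
}
```

For the frequencies a:3 b:2 c:1 d:7 the code lengths come out as d:1, a:2, b:3, c:3, matching the slide. The exact bits may differ from a:00 b:010 c:011 d:1, since swapping the left and right child of any merge gives an equally good code. With a single symbol the root is itself a leaf and its code is the empty string, a corner case this sketch does not handle specially.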
Huffman Tree in C++

    struct node {
        node* left;        // ptr to root of left subtree
        node* right;       // ptr to root of right subtree
        char symbol;       // symbol represented by this node
        double frequency;  // total frequency of symbols in
    };                     // subtree rooted at this node

[Figure: a Huffman tree for b:0.4, a:0.3, c:0.3 and its memory diagram with fields left, right, sym, freq; leaf nodes store 'b' 0.4, 'a' 0.3, 'c' 0.3, and internal nodes (no symbol) store -1 with frequencies 0.6 and, at the root, 1.0]
Huffman Tree Operations
• build(map<char,double> f)
  • build optimal tree with each symbol c and its frequency estimate f[c]
• string encode(string s)
  • build string of binary codes from symbols s[i]
  • "baabac" → 10·0·0·10·0·11
• string decode(string b)
  • build string of symbols from binary string b
  • 10·0·0·10·0·11 → "baabac"
std::map is an STL data structure
Huffman Code Summary
• Optimal symbol-by-symbol code when each symbol is independently sampled from distribution f
  • however, most real data is not independent!
  • in English, is any particular letter likely to be 'u'? what if you knew the preceding letter was 'q'? ...
• Used everywhere in compression:
  • image and file compression (JPEG, PNG, ZIP), networking, text compression (English compressed to ~40% of size)
• Totally different from BST, yet still a binary tree!
• http://en.wikipedia.org/wiki/Huffman_coding