Explore the main ideas of DNA LinkStrand and the importance of splicing in low level links. Discover the basics of cut and splice and the benefits of using linked lists. Examine the complexity of string concatenation and the advantages of using StringBuilder. Learn how the JVM can optimize your code and when to optimize.
Compsci 201: Trees, Tries, Tradeoffs
Owen Astrachan, Jeff Forbes
November 1, 2017
Compsci 201, Fall 2017, Tree, Tries, Tradeoffs
R is for …
• R
  • Programming language of choice in Stats
• Random
  • From Monte-Carlo to [0,1)
• Recursion
  • Base case of wonderful stuff
• Refactoring
  • Better not different
Plan for the Day
• Tree Review
• From Theory to Practice
• From Recurrences to Code
• What are the main ideas in DNA LinkStrand?
• Why Splicing matters with low level links
• Trees, Tries, Tradeoffs
From Links to …
• What is the DNA/LinkStrand assignment about?
• Why do we study linked lists?
• How do you work in a group?
Basics of cutAndSplice
• Find enzyme like ‘gat’
• Replace with splicee like ‘gggtttaaa’
• Strings and StringBuilder for creating new strings
  • Complexity of “hello” + “world”, or A+B
  • String: A + B, StringBuilder: B
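The cut-and-splice idea above can be sketched with a StringBuilder. This is only an illustration, assuming every occurrence of the enzyme is replaced by the splicee; the class name CutAndSpliceSketch and the method signature are hypothetical and are not the assignment's API.

```java
class CutAndSpliceSketch {
    // Hypothetical helper: replace every occurrence of enzyme in dna with splicee.
    static String cutAndSplice(String dna, String enzyme, String splicee) {
        if (enzyme.isEmpty()) {
            return dna;                       // nothing to cut on
        }
        StringBuilder ret = new StringBuilder();
        int start = 0;
        int pos = dna.indexOf(enzyme, start);
        while (pos >= 0) {
            ret.append(dna, start, pos);      // copy the text before the enzyme
            ret.append(splicee);              // splice in the replacement
            start = pos + enzyme.length();    // skip past the enzyme
            pos = dna.indexOf(enzyme, start);
        }
        ret.append(dna, start, dna.length()); // copy whatever follows the last cut
        return ret.toString();
    }

    public static void main(String[] args) {
        System.out.println(cutAndSplice("aagatccgatt", "gat", "gggtttaaa"));
    }
}
```

Each append copies only the characters being added, which is the "StringBuilder: B" cost on the slide.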
What do linked lists get us?
• Faster run-time, much better use of memory
• We splice in constant time? Re-use strings
String Concatenation Examined
• https://coursework.cs.duke.edu/201fall17/d9-linked-trees/blob/master/src/StringPlay.java
• Runtime of stringConcat(“hello”,N)
  • Depends on size of ret: 5, 10, 15, …, 5*N
  • Total is 5(1 + 2 + … + N), which is O(N^2)
    public String stringConcat(String s, int reps) {
        String ret = "";
        for (int k = 0; k < reps; k++) {
            ret += s;
        }
        return ret;
    }
StringBuilder Examined
• https://coursework.cs.duke.edu/201fall17/d9-linked-trees/blob/master/src/StringPlay.java
• Runtime of builderConcat(“hello”,N)
  • 5 + 5 + 5 + … + 5 a total of N times, which is O(N)
    public String builderConcat(String s, int reps) {
        StringBuilder ret = new StringBuilder();
        for (int k = 0; k < reps; k++) {
            ret.append(s);
        }
        return ret.toString();
    }
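To make the two running times concrete, here is a small sketch that counts characters copied under a simplified cost model. The model is an assumption: each `ret += s` is charged for the full length of the new string, each `append` is charged only for the characters appended, and StringBuilder's internal array growth and any JVM optimizations are ignored. The class and method names are hypothetical.

```java
class ConcatCost {
    // Characters copied by repeated ret += s under the assumed model:
    // step k builds a brand-new string of length len * (k + 1).
    static long stringConcatCopies(int len, int reps) {
        long total = 0;
        long size = 0;
        for (int k = 0; k < reps; k++) {
            size += len;      // length of ret after this step
            total += size;    // the whole new string is copied
        }
        return total;
    }

    // Characters copied by repeated append under the assumed model:
    // each step copies only the len characters being added.
    static long builderCopies(int len, int reps) {
        return (long) len * reps;
    }

    public static void main(String[] args) {
        for (int n = 1000; n <= 8000; n *= 2) {
            System.out.println(n + "\t" + stringConcatCopies(5, n) + "\t" + builderCopies(5, n));
        }
    }
}
```

Doubling N roughly quadruples the first count and doubles the second, which matches the O(N^2) versus O(N) analysis.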
Theory and Practice
• The JVM can sometimes optimize your code
• Don’t optimize what you don’t have to …
  • http://www.pellegrino.link/2015/08/22/string-concatenation-with-java-8.html
• WOTO http://bit.ly/201fall17-nov1-strings
danah boyd
Dr. danah boyd is a Senior Researcher at Microsoft Research, … a Visiting Researcher at Harvard Law School, … Her work examines everyday practices involving social media, with specific attention to youth engagement, privacy, and risky behaviors. She recently co-authored Hanging Out, Messing Around, and Geeking Out: Kids Living and Learning with New Media.
"From day one, Mark Zuckerberg wanted Facebook to become a social utility. He succeeded. Facebook is now a utility for many. The problem with utilities is that they get regulated." http://bit.ly/ySwjyl
Trees from Bottom to Top
• Trees!! Trying to get the best of a few worlds:
  • efficient lookup: like sorted array
  • efficient add: like linked list
  • range-queries: see java.util.NavigableSet
• Reminder: hashing is really, really fast
  • O(1) add, search, delete independent of N
  • BUT! No order info, worst case can be bad
Binary Trees
• Binary trees: not just for searching, used in many contexts
  • Game trees, collisions, …
  • Cladistics, genomics, quad trees, …
• Search is O(log n) like sorted array
  • Average case. Note: worst case can also be O(log n), e.g., use a balanced tree
• Insertion/deletion O(1), once location found
[Figure: search tree with nodes “llama”, “giraffe”, “tiger”, “jaguar”, “monkey”, “elephant”, “leopard”, “pig”, “hippo”; “koala” is the word being inserted]
How do search trees work?
• Change doubly-linked lists, no longer linear
• Similar to binary search, everything less goes left, everything greater goes right
• How do we search?
• How do we insert?
  • Insert: “koala”
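One way to answer "How do we search?" is the loop below: compare with the current node, then go left or right. It is a sketch, not course code; the nested TreeNode class mirrors the fields used elsewhere in these slides, while contains and sampleTree are hypothetical names, and the sample tree includes only some of the words from the figure.

```java
class SearchSketch {
    static class TreeNode {
        String info;
        TreeNode left;
        TreeNode right;
        TreeNode(String s, TreeNode llink, TreeNode rlink) {
            info = s;
            left = llink;
            right = rlink;
        }
    }

    // Walk down from the root: smaller values are to the left, larger to the right.
    static boolean contains(TreeNode t, String target) {
        while (t != null) {
            int cmp = target.compareTo(t.info);
            if (cmp == 0) {
                return true;
            }
            t = (cmp < 0) ? t.left : t.right;
        }
        return false;   // fell off the tree: target is not present
    }

    // Small search tree built by hand from some of the slide's words.
    static TreeNode sampleTree() {
        TreeNode giraffe = new TreeNode("giraffe",
                new TreeNode("elephant", null, null),
                new TreeNode("jaguar", null, null));
        TreeNode tiger = new TreeNode("tiger",
                new TreeNode("monkey", null, null), null);
        return new TreeNode("llama", giraffe, tiger);
    }

    public static void main(String[] args) {
        System.out.println(contains(sampleTree(), "jaguar"));
        System.out.println(contains(sampleTree(), "koala"));
    }
}
```

The number of comparisons is bounded by the height of the tree, which is why the shape of the tree matters so much later in the lecture.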
[Figure: binary tree with nodes A through G]
Review: tree terminology
• Binary tree is a structure:
  • empty
  • root node with left and right subtrees
• Tree Terminology
  • parent and child: A is parent of B, E is child of B
  • leaf node has no children, internal node has 1 or 2 children
  • path is sequence of nodes (edges), N_1, N_2, …, N_k
    • N_i is parent of N_{i+1}
  • depth (level) of node: length of root-to-node path
    • level of root is 1 (measured in nodes)
  • height of node: length of longest node-to-leaf path
    • height of tree is height of root
[Figure: tree nodes “llama”, “giraffe”, “tiger”]
A TreeNode by any other name…
• What does this look like? Doubly linked list?
    public class TreeNode {
        TreeNode left;
        TreeNode right;
        String info;
        TreeNode(String s, TreeNode llink, TreeNode rlink) {
            info = s;
            left = llink;
            right = rlink;
        }
    }
Tree function: Tree height
• Compute tree height (longest root-to-leaf path)
    int height(TreeNode root) {
        if (root == null) return 0;
        else {
            return 1 + Math.max(height(root.left), height(root.right));
        }
    }
• Find height of left subtree, height of right subtree
• Use results to determine height of tree
Tree function: Leaf Count
• Calculate number of leaf nodes
    int leafCount(TreeNode root) {
        if (root == null) return 0;
        if (root.left == null && root.right == null) return 1;
        return leafCount(root.left) + leafCount(root.right);
    }
• Similar to height, but has two base cases
• Use results of recursive calls to determine # leaves
Tree functions Analyzed
    int height(TreeNode root) {
        if (root == null) return 0;
        else {
            return 1 + Math.max(height(root.left), height(root.right));
        }
    }
• Let T(n) be time for height to run on n-node tree
  • T(n) = 2T(n/2) + O(1) - roughly balanced
  • T(n) = T(n-1) + T(1) + O(1) = T(n-1) + O(1) - unbalanced
Good Search Trees and Bad Trees http://www.9wy.net/onlinebook/CPrimerPlus5/ch17lev1sec7.html
Bad Trees and Good Trees
Don’t do this at home?
• Let T(n) be time for height to execute (n-node tree)
• T(n) = T(n/2) + T(n/2) + O(1)
• T(n) = 2T(n/2) + 1
• T(n) = 2[2T(n/4) + 1] + 1
• T(n) = 4T(n/4) + 2 + 1
• T(n) = 8T(n/8) + 4 + 2 + 1, eureka!
• T(n) = 2^k T(n/2^k) + 2^k - 1, why true?
• T(n) = nT(1) + O(n) is O(n) if we let n = 2^k
• Different recurrence, same solution if unbalanced
Recurrence Table
• Develop recurrence, look up solution
• Remember: goal is big-Oh, recurrence helps
Balanced Trees and Complexity
• A tree is height-balanced if
  • Left and right subtrees are height-balanced
  • Left and right heights differ by at most one
    boolean isBalanced(TreeNode root) {
        if (root == null) return true;
        return isBalanced(root.left) && isBalanced(root.right) &&
               Math.abs(height(root.left) - height(root.right)) <= 1;
    }
What is complexity?
• Assume trees “balanced” in analyzing complexity
  • Roughly half the nodes in each subtree
  • Leads to easier analysis
• How to develop recurrence relation?
  • What is T(n)? Time func executes on n-node tree
  • What other work? Express recurrence, solve it
• How to solve recurrence relation
  • Plug, expand, plug, expand, find pattern
  • Proof requires induction to verify correctness
Recurrence relation
• T(n): time for isBalanced to execute (n-node tree)
• T(n) = T(n/2) + T(n/2) + O(n)
• T(n) = 2T(n/2) + n
• T(n) = 2[2T(n/4) + n/2] + n
• T(n) = 4T(n/4) + n + n = 4T(n/4) + 2n
• T(n) = 8T(n/8) + 3n, eureka!
• T(n) = 2^k T(n/2^k) + kn, why true?
• T(n) = nT(1) + n log(n), let n = 2^k, so k = log n
• So, solution for T(n) = 2T(n/2) + O(n) is
  • O(n log n) -- base 2, but base doesn't matter
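The O(n) term in this recurrence comes from calling height at every node. A common alternative, sketched here and not taken from the slides, computes the height and the balance check in one pass so the recurrence becomes T(n) = 2T(n/2) + O(1), which is O(n). The names balancedHeight and chain are hypothetical, and -1 is an assumed sentinel meaning "not balanced".

```java
class BalanceSketch {
    static class TreeNode {
        String info;
        TreeNode left;
        TreeNode right;
        TreeNode(String s, TreeNode llink, TreeNode rlink) {
            info = s;
            left = llink;
            right = rlink;
        }
    }

    // Returns the height of t if t is height-balanced, otherwise -1.
    static int balancedHeight(TreeNode t) {
        if (t == null) return 0;
        int lh = balancedHeight(t.left);
        if (lh < 0) return -1;                 // left subtree already unbalanced
        int rh = balancedHeight(t.right);
        if (rh < 0) return -1;                 // right subtree already unbalanced
        if (Math.abs(lh - rh) > 1) return -1;  // heights differ by more than one
        return 1 + Math.max(lh, rh);
    }

    static boolean isBalanced(TreeNode t) {
        return balancedHeight(t) >= 0;
    }

    // Hypothetical helper: a chain of n nodes, each with only a right child.
    static TreeNode chain(int n) {
        TreeNode t = null;
        for (int k = 0; k < n; k++) {
            t = new TreeNode("n" + k, null, t);
        }
        return t;
    }

    public static void main(String[] args) {
        System.out.println(isBalanced(chain(2)));
        System.out.println(isBalanced(chain(3)));
    }
}
```

Each node is visited once and does constant work beyond its recursive calls, so the total is linear in the number of nodes.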
[Figure: search tree with nodes “llama”, “giraffe”, “tiger”, “jaguar”, “monkey”, “elephant”, “leopard”, “pig”, “hippo”]
Printing a search tree in order
• When is root printed?
  • After left subtree, before right subtree.
    void visit(TreeNode t) {
        if (t != null) {
            visit(t.left);
            System.out.println(t.info);
            visit(t.right);
        }
    }
• Inorder traversal
[Figure: search tree with nodes “llama”, “giraffe”, “tiger”, “jaguar”, “monkey”, “elephant”]
Tree traversals
• Inorder visits search tree in order
  • Visit left-subtree, process root, visit right-subtree
  • elephant, giraffe, jaguar, llama, monkey, tiger
• Navigate following nodes/links?
  • Visit on passing under
  • Second time by node
[Figure: search tree with nodes “llama”, “giraffe”, “tiger”, “jaguar”, “monkey”, “elephant”]
Tree traversals
• Preorder good for reading/writing trees
  • Process root, then visit left-subtree, visit right-subtree
  • llama, giraffe, elephant, jaguar, tiger, monkey
• Navigate following nodes/links?
  • Visit on passing-by to left
  • First time by node
[Figure: search tree with nodes “llama”, “giraffe”, “tiger”, “jaguar”, “monkey”, “elephant”]
Tree traversals
• Post order good for deleting tree
  • Visit left-subtree, visit right-subtree, process root
  • elephant, jaguar, giraffe, monkey, tiger, llama
• Navigate following nodes/links?
  • Visit on passing up
  • Third time by node
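The three traversals differ only in where the root is processed relative to the two recursive calls. The sketch below collects values into a list instead of printing; the class name, the list parameter, and sampleTree are assumptions made for illustration, and the tree is the six-word tree from these slides.

```java
import java.util.ArrayList;
import java.util.List;

class TraversalSketch {
    static class TreeNode {
        String info;
        TreeNode left;
        TreeNode right;
        TreeNode(String s, TreeNode llink, TreeNode rlink) {
            info = s;
            left = llink;
            right = rlink;
        }
    }

    // Left subtree, then root, then right subtree.
    static void inorder(TreeNode t, List<String> out) {
        if (t == null) return;
        inorder(t.left, out);
        out.add(t.info);
        inorder(t.right, out);
    }

    // Root first, then left subtree, then right subtree.
    static void preorder(TreeNode t, List<String> out) {
        if (t == null) return;
        out.add(t.info);
        preorder(t.left, out);
        preorder(t.right, out);
    }

    // Left subtree, then right subtree, then root last.
    static void postorder(TreeNode t, List<String> out) {
        if (t == null) return;
        postorder(t.left, out);
        postorder(t.right, out);
        out.add(t.info);
    }

    // The six-word tree: llama at the root, giraffe and tiger below it.
    static TreeNode sampleTree() {
        TreeNode giraffe = new TreeNode("giraffe",
                new TreeNode("elephant", null, null),
                new TreeNode("jaguar", null, null));
        TreeNode tiger = new TreeNode("tiger",
                new TreeNode("monkey", null, null), null);
        return new TreeNode("llama", giraffe, tiger);
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        postorder(sampleTree(), out);
        System.out.println(out);
    }
}
```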
Tree WOTO
• http://bit.ly/201f17-nov1-trees
• Pride in social group? Urban dictionary?
What does insertion look like?
• Simple recursive insertion into tree (accessed by root)
    root = insert(root, "foo");
    TreeNode insert(TreeNode t, String s) {
        if (t == null) t = new TreeNode(s, null, null);
        else if (s.compareTo(t.info) <= 0) t.left = insert(t.left, s);
        else t.right = insert(t.right, s);
        return t;
    }
Notes on tree insert and search
• In each recursive insert call
  • Tree parameter in the call is either the left or right field of some node in the original tree
  • Will be assignment to a .left or .right field!
  • Idiom t = treeMethod(t,…) used
• Good trees go bad, what happens and why?
  • Insert alpha, beta, gamma, delta, epsilon, …
  • https://coursework.cs.duke.edu/201fall17/d9-linked-trees/blob/master/src/TreePlay.java
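To see a good tree go bad, the sketch below inserts the five words from the slide in the listed order and measures the height. It is not the linked TreePlay.java; the class name and the build helper are hypothetical, while insert and height follow the recursive versions shown in these slides.

```java
class InsertSketch {
    static class TreeNode {
        String info;
        TreeNode left;
        TreeNode right;
        TreeNode(String s, TreeNode llink, TreeNode rlink) {
            info = s;
            left = llink;
            right = rlink;
        }
    }

    // Recursive insert using the t = treeMethod(t, ...) idiom.
    static TreeNode insert(TreeNode t, String s) {
        if (t == null) return new TreeNode(s, null, null);
        if (s.compareTo(t.info) <= 0) t.left = insert(t.left, s);
        else t.right = insert(t.right, s);
        return t;
    }

    static int height(TreeNode t) {
        if (t == null) return 0;
        return 1 + Math.max(height(t.left), height(t.right));
    }

    // Hypothetical helper: insert the words one at a time, in the order given.
    static TreeNode build(String... words) {
        TreeNode root = null;
        for (String w : words) {
            root = insert(root, w);
        }
        return root;
    }

    public static void main(String[] args) {
        // Slide order: every node ends up with a single child.
        System.out.println(height(build("alpha", "beta", "gamma", "delta", "epsilon")));
        // Same words, different order: a bushier tree.
        System.out.println(height(build("delta", "beta", "gamma", "alpha", "epsilon")));
    }
}
```

With the slide's order the five nodes form a single path of height 5, so search degrades to a linear scan; the second order gives height 3 for the same five words.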
Insert and Removal
• For insertion we can use iteration (see BSTSet)
  • Traverse left or right and “look ahead” to add
• Removal is tricky, depends on number of children
  • Straightforward when zero or one child
  • Complicated when two children, find successor
• See set code for complete cases
  • If right child, straightforward
  • Otherwise find node that’s left child of its parent (why?)
Wordladder Story
• Ladder from ‘white’ to ‘house’
  • white, while, whale, shale, …
• I can do that… optimally
  • My brother was an English major
  • My ladder is 16, his is 15, how?
• There's a ladder that's 14 words!
  • The key is ‘sough’
• Guarantee optimality!
  • QUEUE
Compsci 201, Fall 2017, Linked Lists & More
Queue for shortest path
    // uses java.util.Queue, LinkedList, Set, HashSet
    public boolean findLadder(String[] words, String first, String last) {
        Queue<String> qu = new LinkedList<>();
        Set<String> set = new HashSet<>();
        qu.add(first);
        while (qu.size() > 0) {
            String current = qu.remove();
            if (oneAway(current, last)) return true;
            for (String s : words) {
                if (!set.contains(s) && oneAway(current, s)) {
                    qu.add(s);
                    set.add(s);
                }
            }
        }
        return false;
    }
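The findLadder code relies on a oneAway helper that the slide does not show. A plausible sketch, assuming "one away" means the two words have the same length and differ in exactly one position:

```java
class OneAwaySketch {
    // Assumed definition: same length and exactly one differing character.
    static boolean oneAway(String a, String b) {
        if (a.length() != b.length()) return false;
        int diff = 0;
        for (int k = 0; k < a.length(); k++) {
            if (a.charAt(k) != b.charAt(k)) {
                diff++;
            }
        }
        return diff == 1;
    }

    public static void main(String[] args) {
        System.out.println(oneAway("white", "while"));
        System.out.println(oneAway("white", "house"));
    }
}
```

Under this definition a word is not one away from itself, so a word never re-enqueues itself as its own neighbor.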
Shortest Path reprised
• How does Queue ensure we find shortest path?
  • Where are words one away from first?
  • Where are words two away from first?
• Why do we need to avoid revisiting a word, when?
  • Why do we use a set for this? Why a HashSet?
  • Alternatives?
• If we want the ladder, not just whether it exists
  • What's path from white to house? We know there is one.
Shortest path proof
• All words one away from start on queue first iteration
  • What is first value of current when loop entered?
• All one-away words dequeued before two-away
  • See previous assertion, property of queues
  • Two-away before 3-away, …
• Each 2-away word is one away from a 1-away word
  • So all enqueued after one-away, before three-away
• Any w seen/dequeued that's n-away is:
  • Seen before every (n+k)-away word for k >= 1!
Keeping track of ladder
• Find w, a one-away word from current
• Enqueue w if not seen
  • Call map.put(w,current)
  • Remember keys are unique!
  • Put word on queue once!
• map.put("lot", "hot")
• map.put("dot", "hot")
• map.put("hat", "hot")
Reconstructing Word Ladder
• Run WordLaddersFull
  • https://coursework.cs.duke.edu/201fall17/wordladders/blob/master/src/WordLaddersFull.java
• See map and call to map.put(word,current)
• What about when returning the ladder, why is the returned ladder in reverse order?
• What do we know about code when statement adding (key,value) to map runs?
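The map idea can be sketched as follows. This is not the linked WordLaddersFull code: the class name, the prev map, and the oneAway definition are assumptions, and the sketch expects last to appear in words. Following the map from last back to first visits the ladder in reverse, so each word is added at the front of the result.

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Queue;

class LadderSketch {
    // Assumed definition: same length and exactly one differing character.
    static boolean oneAway(String a, String b) {
        if (a.length() != b.length()) return false;
        int diff = 0;
        for (int k = 0; k < a.length(); k++) {
            if (a.charAt(k) != b.charAt(k)) diff++;
        }
        return diff == 1;
    }

    // Returns a shortest ladder from first to last, or an empty list if none exists.
    static List<String> findLadder(String[] words, String first, String last) {
        Queue<String> qu = new LinkedList<>();
        Map<String, String> prev = new HashMap<>();  // word -> word it was reached from
        qu.add(first);
        prev.put(first, null);                       // first has no predecessor
        while (!qu.isEmpty()) {
            String current = qu.remove();
            if (current.equals(last)) {
                LinkedList<String> ladder = new LinkedList<>();
                // Walk back through the map; addFirst undoes the reversal.
                for (String w = current; w != null; w = prev.get(w)) {
                    ladder.addFirst(w);
                }
                return ladder;
            }
            for (String s : words) {
                // Keys are unique, so each word is enqueued at most once.
                if (!prev.containsKey(s) && oneAway(current, s)) {
                    prev.put(s, current);
                    qu.add(s);
                }
            }
        }
        return new LinkedList<>();
    }

    public static void main(String[] args) {
        String[] words = {"hot", "dot", "lot", "dog", "log", "cog"};
        System.out.println(findLadder(words, "hit", "cog"));
    }
}
```

The map doubles as the "seen" set from the earlier version: a word with a key in the map has already been enqueued, and its value records the first (and therefore shortest) way it was reached.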