CS 361 – Chapter 2 • 2.1 – 2.2 Linear data structures • Desired operations • Implement as an array or linked list • Complexity of operations may depend on underlying representation • Later we’ll look at nonlinear d.s. (e.g. trees)
Linear • There are several linear data structures • Each has desired ADT operations • Can be implemented in terms of (simpler) linear d.s. • 2 most common implementations are array & linked list • Common linear d.s. we can create (note: terminology not universal) • Stack • Queue • Vector • List • Sequence • …
Implementation • Array implementation • Already defined in programming language • Fast operations, easy to code • Drawbacks? • Linked list implementation • We define a head and a tail node • Each node has prev and next pointers, so there are no orphans • Space efficient, but trickier to implement • Need to allocate/deallocate memory often, which may have unpredictable execution time in practice • Other implementations possible, but unusual • Array for LL, queue for stack, etc.
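The linked list described above, as a minimal Java sketch (class and field names are assumed): sentinel head and tail nodes, and prev/next pointers so no node is orphaned.

public class DoublyLinkedList<E> {
    static class Node<E> {
        E item;
        Node<E> prev, next;   // every node knows both neighbors
    }

    private final Node<E> head = new Node<>();  // sentinel before first item
    private final Node<E> tail = new Node<>();  // sentinel after last item

    public DoublyLinkedList() {
        head.next = tail;
        tail.prev = head;
    }

    // Insert in O(1) given the node to insert after -- the LL advantage,
    // at the cost of one allocation per call
    public Node<E> insertAfter(Node<E> p, E item) {
        Node<E> n = new Node<>();
        n.item = item;
        n.prev = p;
        n.next = p.next;
        p.next.prev = n;
        p.next = n;
        return n;
    }
}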
Vector • Each item in the collection has a rank: how many items come “before” me • Rank is essentially an index, but an array implementation is free to put items anywhere (e.g. starting at 1 instead of 0) • Some useful operations we’d like (names may vary) • get(rank) • set(rank, item) • insert(rank, item) • remove(rank) • See p. 66 for the meaning of insert/remove • Which of these operations require(s) a loop? (See the sketch below.)
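A sketch of the rank-based operations on an array-backed vector, showing why insert(rank, item) needs a loop: every element at or after the rank must shift right. Names are illustrative, and ranks are assumed to start at 0.

import java.util.Arrays;

public class ArrayVector {
    private Object[] data = new Object[8];
    private int size = 0;

    public Object get(int rank)            { return data[rank]; }  // O(1)
    public void set(int rank, Object item) { data[rank] = item; }  // O(1)

    public void insert(int rank, Object item) {                    // O(n)
        if (size == data.length)
            data = Arrays.copyOf(data, 2 * size);   // grow when full
        for (int i = size; i > rank; i--)
            data[i] = data[i - 1];                  // shift right, one slot each
        data[rank] = item;
        size++;
    }
}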
List ADT • Not concerned with index/rank • Position of an item is at the beginning or end of the list, or before/after some other item • The ketchup is next to the peanut butter… but you first need to know where the peanut butter is • Some useful operations: • getFirst() and getLast() • prev(p) and next(p) • replace(p, newItem) • swap(p, q) • Inserting an item at either end of the list, or before/after an existing one • remove(p) • Which operations inherently require a loop?
Compare implementations • We can compare array vs. LL implementations based on an analysis of how they perform d.s. operations. • Primarily determined by their representation • “Sequence” • Combine functionality of vector and list • Again: terminology (vector vs. sequence) not universal. • Sometimes we want to exploit array feature or LL feature • Table on p. 73 compares operation complexity • Always O(1): size, prev, next, replace, swap • O(1) for array only: retrieve/replace at specific rank • O(1) for list only: insert/remove at a given node • Always O(n): insert/remove at rank • What about searching for an element?
Trees • Read section 2.3 • Terminology • Desired operations • How to traverse, find depth & height • Binary trees • Binary tree properties • Traversals • Implementation
Definitions • Tree = connected acyclic graph • Rooted tree, as opposed to a free tree • More useful for us • Nodes arranged in a hierarchy, by level starting with the root node • Other terms related to rooted trees: • Relationships between nodes much richer than a LL: parent, child, sibling, subtree, ancestor, descendant • 2 types of nodes: • Internal • External, a.k.a. Leaf
Definitions (2) Continuing with rooted trees from now on… • Ordered tree = children of a node are ranked 1st, 2nd, 3rd, etc. • Binary tree = each node has at most 2 children, called the left and right child • Not the same as an ordered tree with 2 children. If a node has only 1 child, we still need to tell if it’s the left or right child. • (More on binary trees later) • Different kinds of trees make it difficult to implement a silver-bullet tree d.s. for all occasions
Why trees? • Many applications require information stored hierarchically. • Many classification systems • Document structure • File system • Computer program • Mathematical expression • Others? • We mean the data is hierarchical in a logical sense. The low-level rep’n of the data may still be linear. That will be the programmer’s secret.
Desired tree ops • getRoot() • findParent(v) • findChildren(v) – returns list or iterator • An iterator is an object of a special class having methods next() and hasNext() • isLeaf(v) • isRoot(v) And then some operations not so tree specific: • swapValuesAt(v1, v2) • getValueAt(v) • setValueAt(v)
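The desired operations as a Java interface sketch. The method names follow the slide; the generic element type and the opaque Node handle are assumptions.

import java.util.Iterator;

public interface Tree<E> {
    interface Node<T> { }                       // opaque handle to a tree node

    Node<E> getRoot();
    Node<E> findParent(Node<E> v);
    Iterator<Node<E>> findChildren(Node<E> v);  // iterator: next() / hasNext()
    boolean isLeaf(Node<E> v);
    boolean isRoot(Node<E> v);

    // And the not-so-tree-specific operations:
    E getValueAt(Node<E> v);
    void setValueAt(Node<E> v, E value);
    void swapValuesAt(Node<E> v1, Node<E> v2);
}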
Desiderata (2) • findDepth(v) – distance to root • findHeight() – max depth of all nodes • preorderTraversal(v) • Initially call with root • Recursive function • Can be done as iterator • postorderTraversal(v) • analogous • Pseudocode:

preorder(v):
    process v
    for each child c of v:
        preorder(c)

postorder(v):
    for each child c of v:
        postorder(c)
    process v

See why they are called pre and post? Try an example tree.
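The same pseudocode as runnable Java, on a bare rooted-tree node holding a list of children (Node and the example tree are illustrative):

import java.util.ArrayList;
import java.util.List;

public class Traversals {
    static class Node {
        String value;
        List<Node> children = new ArrayList<>();
        Node(String value) { this.value = value; }
    }

    static void preorder(Node v) {
        System.out.print(v.value + " ");          // process v first...
        for (Node c : v.children) preorder(c);    // ...then its subtrees
    }

    static void postorder(Node v) {
        for (Node c : v.children) postorder(c);   // subtrees first...
        System.out.print(v.value + " ");          // ...then v
    }

    public static void main(String[] args) {
        Node root = new Node("A");
        Node b = new Node("B"), c = new Node("C");
        root.children.add(b);
        root.children.add(c);
        b.children.add(new Node("D"));
        preorder(root);                           // A B D C
        System.out.println();
        postorder(root);                          // D B C A
    }
}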
Binary trees • Each node has at most 2 children. • Very useful for CS applications • Special cases • Full binary tree = each node has 0 or 2 children. Suitable for arithmetic expressions. Also called “proper” binary tree. • Complete binary tree = taking “full” one step further: all leaves have the same depth from the root. As a consequence, all other nodes have 2 children. • Generalizations • Positional tree (as opposed to ordered tree) = children have a positional number. E.g. a node may have three children at positions 1, 3 and 6. • K-ary tree = positional tree where no child has a position higher than k
Binary tree ops • findLeftChild(v) • findRightChild(v) • findSibling(v) – how would this work? • preorder & postorder traversals can be simplified a little, since we know we have at most 2 children • A 3rd traversal! inorder:

inorder(v):
    inorder(v.left)
    process v
    inorder(v.right)

• For modeling a mathematical expression, these traversals give rise to prefix, infix and postfix notation!
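A runnable sketch of inorder, with the null checks the pseudocode leaves implicit; the two-node example shows the connection to infix notation:

public class Inorder {
    static class Node {
        String value;
        Node left, right;
        Node(String value) { this.value = value; }
    }

    static void inorder(Node v) {
        if (v == null) return;            // base case the pseudocode omits
        inorder(v.left);
        System.out.print(v.value + " ");
        inorder(v.right);
    }

    public static void main(String[] args) {
        // The expression tree for 2 + 3: inorder prints the infix "2 + 3"
        Node root = new Node("+");
        root.left = new Node("2");
        root.right = new Node("3");
        inorder(root);
    }
}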
Binary tree properties • Suppose we have a full binary tree • n = total number of nodes, h = height of tree • Think about why these are true… • h + 1 ≤ # leaves ≤ 2^h • h ≤ # internal nodes ≤ 2^h – 1 • log2(n + 1) – 1 ≤ h ≤ (n – 1) / 2
Expression as tree • Arithmetic expression is inherently hierarchical • We also have linear/text representations. • Infix, prefix, postfix • Note: prefix and postfix do not need grouping symbols • Postfix expression can be easily evaluated using a stack • Example: (25 – 5) * (6 + 7) + 9 into a tree • Which is the last operator performed? This is the root. And we can deduce where left and right subtrees are. • Next, for the subtree: (25 – 5) * (6 + 7), last op is the *, so this is the “root” of this subtree. • Notes: • Resulting binary tree is “full.” • Numbers are leaves; operators are internal. This is why the tree drawing is straightforward.
Postfix eval • Our postfix expression is: 25 5 – 6 7 + * 9 + • When you see a number… push. • When you see an operator… pop 2, evaluate, push. • When no more input, pop answer.
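A sketch of the stack-based evaluation in Java. Note the pop order: for “-” and “/” the right operand comes off the stack first.

import java.util.ArrayDeque;
import java.util.Deque;

public class PostfixEval {
    public static void main(String[] args) {
        String[] tokens = {"25", "5", "-", "6", "7", "+", "*", "9", "+"};
        Deque<Integer> stack = new ArrayDeque<>();
        for (String t : tokens) {
            if (t.matches("\\d+")) {
                stack.push(Integer.parseInt(t));   // number: push
            } else {
                int right = stack.pop();           // operator: pop 2,
                int left = stack.pop();            // evaluate, push
                switch (t) {
                    case "+": stack.push(left + right); break;
                    case "-": stack.push(left - right); break;
                    case "*": stack.push(left * right); break;
                    case "/": stack.push(left / right); break;
                }
            }
        }
        System.out.println(stack.pop());  // (25-5)*(6+7)+9 = 269
    }
}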
Tree & traversal • Given a (binary) tree, we can find its traversals. √ • How about the other way? • Mathematical expression had enough context information that 1 traversal would be enough. • But in general, we need 2 traversals, one of them being inorder. • Example: Draw the binary tree having these traversals. Postorder: S C X H R J Q T Inorder: S R C H X T J Q • Hint: End of the postorder is the root of the tree. Find where the root lies in the inorder. This will show you the 2 subtrees. Continue with each subtree, finding its root and subtrees, etc. • Exercise: Find 2 distinct binary trees t1 and t2 where preorder(t1) = preorder(t2) and postorder(t1) = postorder(t2).
Euler tour traversal • General way to encompass all 3 traversals. • Text p. 88 shows “shrink wrap” image of tree • We visit each node on its left, underneath, and on its right. • Pseudocode:

eulerTour(v):
    do v’s left side action      // west
    eulerTour(v.left)            // southwest
    do v’s under action          // south
    eulerTour(v.right)           // southeast
    do v’s right side action     // east
Applications • Can adapt eulerTour( ): • Preorder traversal: “below” and “right” actions are null • Inorder traversal: “left” and “right” actions are null • Postorder traversal: “left” and “below” actions are null • Elegant way to print a fully parenthesized expression: • Left action: print “(“ • Under action: print node contents • Right action: print “)”
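A sketch of that parenthesized-expression printer, reusing the expression tree from two slides back (Node is an assumed name; leaves are printed bare so only internal nodes get parentheses):

public class EulerPrint {
    static class Node {
        String value;
        Node left, right;
        Node(String v) { value = v; }
        Node(String v, Node l, Node r) { value = v; left = l; right = r; }
    }

    static void eulerTour(Node v) {
        if (v == null) return;
        boolean internal = (v.left != null);  // full tree: 0 or 2 children
        if (internal) System.out.print("(");  // left side action
        eulerTour(v.left);
        System.out.print(v.value);            // under action
        eulerTour(v.right);
        if (internal) System.out.print(")");  // right side action
    }

    public static void main(String[] args) {
        // Tree for (25 - 5) * (6 + 7) + 9, from the earlier slide
        Node expr = new Node("+",
            new Node("*",
                new Node("-", new Node("25"), new Node("5")),
                new Node("+", new Node("6"), new Node("7"))),
            new Node("9"));
        eulerTour(expr);   // prints (((25-5)*(6+7))+9)
    }
}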
Tree implementation • Binary trees: internal representation may be array or links • General trees: array too unwieldy, just do links • Array-based representation • Assign each node in the tree an index • Root = 1 • If a node’s index is p, left child = 2p and right child = 2p + 1 • Array operations are quick • Space inefficient. In the worst case, n nodes would require index values up to 2^n – 1. (How would this happen? A chain of right children: indices 1, 3, 7, 15, ….) Exponential space complexity is bad.
Implement as links • For binary tree • Each node needs: • Contents • Pointers to left child, right child, parent • Tree overall needs a root node to start with. • For general rooted tree • Each node needs: • Contents • List of pointers to children; pointer to parent • Tree overall needs a root node to start with.
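Both linked representations as Java sketches (field names assumed):

public class LinkedTrees {
    // Binary tree node: contents + left, right, parent pointers
    static class BinaryNode<E> {
        E contents;
        BinaryNode<E> left, right, parent;
    }

    // General rooted tree node: contents + list of children + parent pointer
    static class TreeNode<E> {
        E contents;
        java.util.List<TreeNode<E>> children = new java.util.ArrayList<>();
        TreeNode<E> parent;
    }

    // The tree overall just needs a root node to start with
    BinaryNode<String> binaryRoot = new BinaryNode<>();
    TreeNode<String> generalRoot = new TreeNode<>();
}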
PQ & heap • Section 2.4 • Priority Queue ADT • Heap data structure • Commitment: • Please read about: heap sort; hash tables.
Priority Queue • An ADT where each item has a special value called its “key” or “priority” (in addition to its contents) • Not really a queue • It’s important to be able to find/extract smallest element • Could just as easily be defined with “largest” • Application: • Scheduling a set of tasks. We could use “Earliest Deadline First” or “Shortest Job Next”. At all times, we need to know the winner. • Desired operations: • insert (element, keyValue) • removeNext ( ) • findNext ( )
Implementation • One approach is a sorted list (array). • Insert: O(n), to find the place in the list to insert, and possibly shift over other elements • Remove and find smallest: O(1), since it’s at the front • Can we do better than O(n) insertion? • Heap implementation • This d.s. is a special type of binary tree • Complete or almost complete: the lowest level may have a gap along the right side, but nowhere else • Heap property: for all nodes i in the heap, value(parent(i)) ≤ value(i), with the exception of the root, which has no parent.
Why a heap? • Designed so that the PQ insert function can run in time proportional to the height of the tree: O(log(n)), rather than O(n). • Because of the heap property, finding the min element is O(1). • On the other hand, searching the heap for an arbitrary value is not a priority. That would still take O(n). In chapter 3, we’ll look at solving that problem.
Insert & heapify up • To insert a node, make it the last child. • But now, we have probably violated the heap property, which we can restore by doing a “heapify up”. Heapify up: starting with c, the child we just inserted: • If value(c) ≥ value(parent(c)), we’re done. • Else: swap c with its parent, and continue up the heap at the new location of c. • Example p. 103 • What is the complexity?
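Heapify up on a min-heap stored in a 1-indexed array, as a sketch (the array layout is justified on the “Array, take 2” slide below; heap[c/2] is the parent of heap[c]):

public class HeapifyUp {
    // heap[1..size] holds the elements; heap[0] is unused
    static void heapifyUp(int[] heap, int c) {
        while (c > 1 && heap[c] < heap[c / 2]) {       // child < parent?
            int tmp = heap[c]; heap[c] = heap[c / 2]; heap[c / 2] = tmp;
            c = c / 2;                                 // continue at parent's slot
        }
    }

    public static void main(String[] args) {
        int[] heap = {0, 2, 5, 3, 9, 7, 6, 1};   // last child 1 just inserted
        heapifyUp(heap, 7);                       // restores the heap property
        System.out.println(java.util.Arrays.toString(heap));
    }
}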
Delete & heapify down How to remove the smallest element, which is at the root: • Remove the root, and immediately replace it with the last child. • But, we may have just violated the heap property, so… Heapify down: starting with the root node r: • If value(r) ≤ the values of both children, we’re done. • Else: swap r with its smaller child, and continue down the heap at the new location of r. • (Why swap with the smaller child; does it matter?)
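A matching sketch of heapify down on the same 1-indexed array layout. (On the question above: swapping with the smaller child matters, because the child that moves up becomes the other child’s parent, so it must be the smaller of the two.)

public class HeapifyDown {
    static void heapifyDown(int[] heap, int size, int r) {
        while (2 * r <= size) {                   // while r has a left child
            int smaller = 2 * r;                  // assume left child is smaller
            if (smaller + 1 <= size && heap[smaller + 1] < heap[smaller])
                smaller++;                        // right child is smaller
            if (heap[r] <= heap[smaller]) break;  // heap property holds
            int tmp = heap[r]; heap[r] = heap[smaller]; heap[smaller] = tmp;
            r = smaller;                          // continue at new location of r
        }
    }

    public static void main(String[] args) {
        // root removed; last child 9 moved up to replace it
        int[] heap = {0, 9, 2, 3, 5, 7, 6};
        heapifyDown(heap, 6, 1);
        System.out.println(java.util.Arrays.toString(heap));
    }
}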
Array, take 2 • Earlier we saw that we didn’t want a sorted array representation. • No need to keep all elements sorted to maintain heap property. • Why is array attractive? • Insert/remove operations need to quickly find last child, which would be at end of array. O(1) • “Almost complete” binary tree: all elements in array contiguous from A[1..n]. • Although internally represented as 1-D array, we still conceive of heap logically as a tree structure.
More on heaps • Finish heap d.s. • Heap sort • How to build heap • Commitment: • Please read through p.124. Take a look at questions on pp. 131-132.
Heap sort • The desired ops for a PQ are enough to allow us to sort some list of items. • insert(item) • removeMin() • How to sort: • Insert items one at a time • Remove items one by one • Analysis: (n inserts) + (n removeMin’s) • And we know that insert & remove both take O(log n) time. • (Recall that just finding the min element is O(1), but removing it requires a heapify down.) • Total time is O(n log n). • More on sorting in Chapter 4.
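The sort described above, sketched with java.util.PriorityQueue standing in for the heap-based PQ:

import java.util.PriorityQueue;

public class PQSort {
    public static void main(String[] args) {
        int[] items = {5, 4, 7, 3, 2, 6, 1};
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        for (int x : items) pq.add(x);        // n inserts, O(log n) each
        for (int i = 0; i < items.length; i++)
            items[i] = pq.poll();             // n removeMins, O(log n) each
        System.out.println(java.util.Arrays.toString(items)); // sorted ascending
    }
}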
Optimizing heap sort • Doesn’t mean we’re perfecting it. • Actually, it will still be O(n log n). • The major improvement is in the insertion: we can bring this part down to O(n). • Algorithm p. 109: bottomUpHeap(S) • Given a “sequence” of values, create a heap. • If size(S) < 2, return trivial heap. • Remove S[0] • H1 = bottomUpHeap (firstHalf(S)) • H2 = bottomUpHeap (secondHalf(S)) • H = tree with S[0] at root, and subtrees H1 and H2. • Heapify down H starting at S[0], and return H.
Build heap example • Use the algorithm to build a heap out of: 5,4,7,3,2,6,1 • Not a base case • Remove S[0] = 5 • H1 = bottomUpHeap(4,7,3) • Eventually creates a heap with 3 nodes. • H2 = bottomUpHeap(2,6,1) • Eventually creates a heap with 3 nodes. • H = tree with 7 nodes with 5 at the root. • Need to heapify down. • Try another example. Sometimes the 2 “halves” that we use in recursive call are not exactly the same size. No big deal.
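For contrast, here is the same O(n) bottom-up idea in its common array form (Floyd’s method: heapify down every internal node, last to first), rather than the book’s recursive bottomUpHeap pseudocode, run on the slide’s example sequence:

public class BuildHeap {
    static void heapifyDown(int[] heap, int size, int r) {
        while (2 * r <= size) {
            int smaller = 2 * r;
            if (smaller + 1 <= size && heap[smaller + 1] < heap[smaller])
                smaller++;
            if (heap[r] <= heap[smaller]) break;
            int tmp = heap[r]; heap[r] = heap[smaller]; heap[smaller] = tmp;
            r = smaller;
        }
    }

    public static void main(String[] args) {
        int[] heap = {0, 5, 4, 7, 3, 2, 6, 1};  // the slide's example, 1-indexed
        int size = 7;
        for (int i = size / 2; i >= 1; i--)      // internal nodes only
            heapifyDown(heap, size, i);
        System.out.println(java.util.Arrays.toString(heap));  // a valid min-heap
    }
}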
Analysis of bUH • Background: the classic way of creating a heap is to insert n elements one by one: O(n log n). We hope to show bottomUpHeap can do it in O(n). • In the best case, no heapify down is needed. We create n heaps. Creating 1 takes constant time (just an assignment stmt and function call). n · const ⇒ Ω(n) • What is different about the worst case? • Need to heapify down. Count the number of swaps. • Consider the case of 15 nodes. (height = 3) • 1 node needs 3 swaps • 2 nodes each need 2 swaps • 4 nodes each need 1 swap • Consider the case of 31 nodes. (height = 4)
Analysis (2) • If we have n nodes, h = floor(log n). • Total maximum # swap operations depends on h. • The formula is: the sum from i = 1 to h of (2^(h–i) nodes × i swaps per node) • Let’s work out the sum from i = 1 to h: Σ i·2^(h–i) = 2^h Σ i·2^(–i) = 2^h Σ (i / 2^i) • What is this summation? Consider this: Let S = 1/2^1 + 2/2^2 + 3/2^3 + 4/2^4 + … Then, S/2 = 1/2^2 + 2/2^3 + 3/2^4 + … Subtracting, we obtain S/2 = 1/2 + 1/4 + 1/8 + 1/16 + … = 1. Therefore, S = 2. • Then, the # of swaps is at most 2^h · 2 = O(2^h) = O(n).
Finish chapter 2 • Dictionary ADT and Hash tables • Commitment: • Please read section 3.1
Dictionary ADT • Each item in some aggregation is assigned a key value. Look up the item by means of the key. • Sounds like an array, except the key value can be anything convenient for us, rather than restricting us to indices 0,1,2,… • Desired operations • findElement (key) • insertItem (item, key) • remove (key) • Finding and removing could fail if the key value is not found in the dictionary.
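The three operations, demonstrated with java.util.HashMap (a built-in hash table, which the next slides build up from scratch); note how a failed find or remove reports the missing key by returning null:

import java.util.HashMap;
import java.util.Map;

public class DictionaryDemo {
    public static void main(String[] args) {
        Map<String, Integer> dict = new HashMap<>();
        dict.put("alice", 42);                 // insertItem(item, key)
        dict.put("bob", 17);
        Integer found = dict.get("alice");     // findElement(key) -> 42
        Integer gone = dict.remove("carol");   // remove(key): null, key absent
        System.out.println(found + " " + gone);
    }
}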
Implementation • Simple approach: ArrayList of (element, key) pairs. Called a “log file” d.s. • How would we implement the operations? • Inserting O(1) • Finding / removing O(n) • We would hope there’s a lot of inserting, to make this d.s. worthwhile! • More efficient approach: hash table • Array of “buckets” • Hash function to assign element to a bucket
Hash table • Hash code: In a collection of objects, it’s desirable to assign each object a unique number. • Mathematically determined from its key. • There are good and bad ways to compute hash codes. We’d like these codes to be unique. • Compression: Since the hash code may be a big number, scale it down by performing a “mod” operation. • The result is the array index to insert / find / remove. • Collision: Sometimes a 3rd step is needed, in case 2 items map to the same bucket.
Hashing example • Many objects have composite values, as in a string, list or several attributes per object. • Give them numerical values (e.g. ASCII code) and combine (a0, a1, a2, … an–1) into a hash code. • We could add them all up:

hashCode = 0
for i = 0 to n-1:
    hashCode += a[i]

• When would this be a good / bad hash function?
Example 2 • To ensure more unique hash codes, we can use a polynomial approach: hash code = a0·c^0 + a1·c^1 + a2·c^2 + … + a(n–1)·c^(n–1), where c is some constant, e.g. 7 • To avoid computing powers of c, we can rewrite the formula (Horner’s rule): a0 + c(a1 + c(a2 + c(a3 + … + c·a(n–1))…))

hashCode = a[n-1]
for i = n-2 down to 0:
    hashCode = c * hashCode + a[i]
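The Horner-style loop in Java, including the compression step from the previous slide. The constant c = 7 follows the slide’s example; Java’s own String.hashCode uses 31 with the opposite term order, so this is an illustrative sketch, not Java’s actual function.

public class PolyHash {
    static int hashCode(String s, int c) {
        int n = s.length();
        if (n == 0) return 0;
        int h = s.charAt(n - 1);          // a[n-1]
        for (int i = n - 2; i >= 0; i--)
            h = c * h + s.charAt(i);      // h = c*h + a[i]; overflow is fine here
        return h;
    }

    public static void main(String[] args) {
        int code = hashCode("stop", 7);
        // Compression: scale the (possibly large) code down to a bucket index
        int buckets = 101;
        int index = Math.abs(code % buckets);
        System.out.println(code + " -> bucket " + index);
    }
}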
Collisions • As we insert objects into hash table, collisions are possible. Various ways to handle collision, such as: • Chaining: maintain a list at each bucket. HashSet does this. • Open addressing: look for another “open” cell. • Practice with Q 19-22 on page 132. • A hash table must be larger than # elements anticipated • We can set up a specific “load factor” of 0.75. If the ratio of elements to max size exceeds this factor, allocate a bigger hash table. • Design issues can be resolved with experimentation on your collection of data.
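A minimal chained table sketch, one list per bucket, as in the chaining bullet above (no resizing or load-factor handling, which a real table like HashSet adds):

import java.util.LinkedList;

public class ChainedTable {
    private final LinkedList<String>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedTable(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) buckets[i] = new LinkedList<>();
    }

    private int bucketOf(String key) {
        return Math.abs(key.hashCode() % buckets.length);  // compression
    }

    void insert(String key) {
        buckets[bucketOf(key)].add(key);              // collisions share a list
    }

    boolean find(String key) {
        return buckets[bucketOf(key)].contains(key);  // scan only one chain
    }
}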