Introduction
Outline • Review of some data structures • Array • Linked List • Sorted Array • New stuff • 3 of the most important data structures in OI (and your own programming) • Binary Search Tree • Heap (Priority Queue) • Hash Table
Review • How to measure the merits of a data structure? • Time complexity of common operations • Function Find(T : DataType) : Element • Function Find_Min() : Element • Procedure Add(T : DataType) • Procedure Remove(E : Element) • Procedure Remove_Min()
Review - Array • Here Element is simply the integer index of the array cell • Find(T) • Must scan the whole array, O(N) • Find_Min() • Also need to scan the whole array, O(N) • Add(T) • Simply add it to the end of the array, O(1) • Remove(E) • Deleting an element creates a hole • Copy the last element to fill the hole, O(1) • Remove_Min() • Need to Find_Min() then Remove(), O(N)
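The array operations above can be sketched in a few lines of Python. This is only an illustration: the function names `find`, `remove_at` and `remove_min` are assumptions chosen to mirror the slide, and a Python list stands in for the array.

```python
def find(arr, value):
    """Linear scan: return the index of value, or -1 if it is absent. O(N)."""
    for i, x in enumerate(arr):
        if x == value:
            return i
    return -1


def remove_at(arr, index):
    """Copy the last element into the hole, then drop the last cell. O(1)."""
    arr[index] = arr[-1]
    arr.pop()


def remove_min(arr):
    """Find_Min() by scanning, followed by Remove(). O(N) overall."""
    i = min(range(len(arr)), key=arr.__getitem__)
    smallest = arr[i]
    remove_at(arr, i)
    return smallest
```

Note that filling the hole with the last element does not preserve the order of the remaining elements, which is acceptable here because the array is unsorted anyway.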
Review - Linked List • Element is a pointer to the object • Find(T) • Scan the whole list, O(N) • Find_Min() • Scan the whole list, O(N) • Add(T) • Just add it to a convenient position (e.g. head), O(1) • Remove(E) • With suitable implementation, O(1) • Remove_Min() • Need to Find_Min() then Remove(), O(N)
Review - Sorted Array • Like array, Element is the integer index of the cell • Find(T) • We can use binary search, O(logN) • Find_Min() • The first element must be the minimum, O(1) • Add(T) • First we need to find the correct place, O(logN) • Then we need to shift the rest of the array by 1 cell, O(N) • Remove(E) • Deleting an element creates a hole • Need to shift the rest of the array by 1 cell, O(N) • Remove_Min() • Can be O(1) or O(N) depending on choice of implementation
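A minimal sketch of the sorted-array operations, assuming Python's standard `bisect` module for the binary search. The function names are illustrative, not part of the original slides.

```python
import bisect


def find(arr, value):
    """Binary search: return the index of value, or -1 if absent. O(log N)."""
    i = bisect.bisect_left(arr, value)
    return i if i < len(arr) and arr[i] == value else -1


def add(arr, value):
    """Locate the slot in O(log N), then shift the tail by one cell, O(N)."""
    bisect.insort(arr, value)


def remove_at(arr, index):
    """Deleting a cell shifts every later element down by one. O(N)."""
    del arr[index]
```

The O(N) shift in `add` and `remove_at` is what dominates, even though finding the position is fast.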
Review - Summary • If we are going to perform a lot of these operations (e.g. N=100000), none of these is fast enough!
What is a Binary Search Tree? • Use a binary tree to store the data • Maintain this property • Left Subtree < Node < Right Subtree
Binary Search Tree - Implementation • Definition of a Node: Node = Record Left, Right : ^Node; Value : Integer; End; • To search for a value (pseudocode) Node Find(Node N, Value V) :- If (N.Value = V) Return N; Else If (V < N.Value) and (N.Left != NULL) Return Find(N.Left, V); Else If (V > N.Value) and (N.Right != NULL) Return Find(N.Right, V); Else Return NULL; // not found
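The search can be sketched in runnable Python as follows. The `Node` class mirrors the record above. The `add` helper is an assumption added only so that a tree can be built for the example, and it silently ignores duplicate values.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def find(node, value):
    """Walk down from node; return the matching Node, or None if not found."""
    while node is not None and node.value != value:
        node = node.left if value < node.value else node.right
    return node


def add(node, value):
    """Insert value below node and return the root of this subtree."""
    if node is None:
        return Node(value)
    if value < node.value:
        node.left = add(node.left, value)
    elif value > node.value:
        node.right = add(node.right, value)
    return node  # equal values are ignored (an assumption of this sketch)
```

A tree is built by starting from `root = None` and repeatedly assigning `root = add(root, v)`.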
Binary Search Tree - Remove • Case I : Removing a leaf node • Easy • Case II : Removing a node with a single child • Replace the removed node with its child • Case III : Removing a node with 2 children • Replace the removed node with the minimum element in the right subtree (or maximum element in the left subtree) • This may create a hole again • Apply Case I or II • Sometimes you can avoid this by using “Lazy Deletion” • Mark a node as removed instead of actually removing it • Less coding, performance hit not big if you are not doing this frequently (may even save time)
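The three removal cases can be sketched as below. This is one possible implementation, assuming distinct values and using the minimum of the right subtree for Case III. The helpers `insert` and `inorder` exist only to make the example self-contained.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def insert(node, value):
    if node is None:
        return Node(value)
    if value < node.value:
        node.left = insert(node.left, value)
    elif value > node.value:
        node.right = insert(node.right, value)
    return node


def remove(node, value):
    """Remove value from the subtree rooted at node; return the new subtree root."""
    if node is None:
        return None
    if value < node.value:
        node.left = remove(node.left, value)
    elif value > node.value:
        node.right = remove(node.right, value)
    else:
        # Case I (leaf) and Case II (single child): the child, possibly None,
        # takes the place of the removed node.
        if node.left is None:
            return node.right
        if node.right is None:
            return node.left
        # Case III: copy the minimum of the right subtree into this node,
        # then remove that minimum, which falls under Case I or II.
        successor = node.right
        while successor.left is not None:
            successor = successor.left
        node.value = successor.value
        node.right = remove(node.right, successor.value)
    return node


def inorder(node):
    """Return the values in sorted order (used here to check the tree)."""
    if node is None:
        return []
    return inorder(node.left) + [node.value] + inorder(node.right)
```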
Binary Search Tree - Summary • Add() is similar to Find() • Find_Min() • Just walk to the left, easy • Remove_Min() • Equivalent to Find_Min() then Remove() • Summary • Find() : O(logN) • Find_Min() : O(logN) • Remove_Min() : O(logN) • Add() : O(logN) • Remove() : O(logN) • The BST is “supposed” to behave like that
Binary Search Tree - Problems • In reality… • All these operations are O(logN) only if the tree is balanced • Inserting a sorted sequence makes the tree degenerate into a linked list • The real upper bounds • Find() : O(N) • Find_Min() : O(N) • Remove_Min() : O(N) • Add() : O(N) • Remove() : O(N) • Solution • AVL Tree, Red Black Tree • Use “rotations” to maintain balance • Both are difficult to implement and therefore rarely used in contests
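The degenerate case is easy to observe by measuring the height of a plain (unbalanced) BST. The sketch below is an illustration only: `insert`, `height` and `build` are hypothetical helper names, and the size of 100 values is arbitrary.

```python
import random


class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def insert(root, value):
    """Iterative insertion without any rebalancing."""
    if root is None:
        return Node(value)
    node = root
    while True:
        if value < node.value:
            if node.left is None:
                node.left = Node(value)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(value)
                return root
            node = node.right


def height(node):
    """Number of nodes on the longest path from node down to a leaf."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def build(values):
    root = None
    for v in values:
        root = insert(root, v)
    return root


sorted_height = height(build(range(100)))  # every node has only a right child

shuffled = list(range(100))
random.Random(0).shuffle(shuffled)
shuffled_height = height(build(shuffled))  # much shorter for a random order
```

With sorted input the height equals the number of values, so every operation walks the whole chain.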
What is a Heap? • A (usually) complete binary tree used to implement a Priority Queue • Enqueue = Add • Dequeue = Find_Min and Remove_Min • Heap Property • Every node’s value is less than or equal to those of its descendants, so the minimum is always at the root
Heap - Implementation • Usually we use an array to simulate a heap • Assume nodes are indexed 1, 2, 3, ... • Parent = floor(Node / 2) • Left Child = Node*2 • Right Child = Node*2 + 1
Heap - Add • Append the new element at the end • Shift it up until the heap property is restored • Why does this always work?
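A minimal sketch of Add, assuming a min-heap (smallest value at the root, as Find_Min requires) stored in a Python list whose cell 0 is unused padding, so that nodes are indexed from 1 and the parent of node i is i // 2. The name `heap_add` is illustrative.

```python
def heap_add(heap, value):
    """Append value, then swap it with its parent while it is smaller."""
    heap.append(value)
    i = len(heap) - 1  # index of the newly added node
    while i > 1 and heap[i] < heap[i // 2]:
        heap[i], heap[i // 2] = heap[i // 2], heap[i]
        i //= 2
```

An empty heap in this layout is `[None]`. Each swap only changes values on the path from the new node to the root, which is why at most O(logN) steps are needed.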
Heap - Remove_Min • Replace the root with the last element • Shift it down until the heap property is restored • Again, why does it always work?
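Remove_Min can be sketched the same way, again assuming a min-heap in a Python list with cell 0 unused and nodes indexed from 1. The name `heap_remove_min` is an assumption, and the function expects a non-empty heap.

```python
def heap_remove_min(heap):
    """Remove and return the root of a non-empty min-heap."""
    smallest = heap[1]
    last = heap.pop()
    n = len(heap) - 1  # number of elements that remain
    if n >= 1:
        heap[1] = last  # the last element replaces the root
        i = 1
        while 2 * i <= n:
            child = 2 * i
            # Pick the smaller of the two children, if a right child exists.
            if child + 1 <= n and heap[child + 1] < heap[child]:
                child += 1
            if heap[i] <= heap[child]:
                break  # heap property restored
            heap[i], heap[child] = heap[child], heap[i]
            i = child
    return smallest
```

Swapping with the smaller child is what keeps the property intact: the value moved up is no larger than its sibling.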
Heap - Build_Heap • There is a special operation called Build_Heap • Transform an ordinary array into a heap without using extra memory • The Remove_Min operation has two steps • Replace the root with a leaf node • Restore the heap structure by shifting the node down • This is called “Heapify” • If we apply the Heapify step to ALL internal nodes, from the bottom up, we get a heap
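A sketch of Build_Heap under the same assumptions as before (min-heap, Python list with cell 0 unused, nodes indexed from 1). The helper name `sift_down` is this sketch's name for the Heapify step.

```python
def sift_down(heap, i, n):
    """Heapify step: move heap[i] down until both children are no smaller."""
    while 2 * i <= n:
        child = 2 * i
        if child + 1 <= n and heap[child + 1] < heap[child]:
            child += 1
        if heap[i] <= heap[child]:
            break
        heap[i], heap[child] = heap[child], heap[i]
        i = child


def build_heap(heap):
    """Rearrange heap[1..n] in place into a min-heap."""
    n = len(heap) - 1
    # Nodes n // 2 + 1 .. n are leaves, so only internal nodes need the step.
    for i in range(n // 2, 0, -1):
        sift_down(heap, i, n)
```

Processing the internal nodes from the last one back to the root guarantees that both subtrees of a node are already heaps when the step is applied to it.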
Heap - Summary • Find() is usually not supported by a heap • You may scan the whole tree / array if you really want • Remove() is equivalent to applying Remove_Min() on a subtree • Remember that any subtree of a heap is also a heap • Summary • Find() : O(N) // We usually don’t use Heap for this • Find_Min() : O(1) • Remove_Min() : O(logN) • Add() : O(logN) • Remove() : O(logN)
What is a Hash Table? • Question • We have a Mark Six result (6 integers in the range 1..49) • We want to check if our bet matches it • What is the most efficient way? • Answer • Use a boolean array with 49 cells • Checking a number is O(1) • Problem • What if the range of number is very large? • What if we need to store strings? • Solution • Use a “Hash Function” to compress the range of values
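The boolean-array answer can be sketched as follows. The draw result and the bet used in the comments are made-up numbers, and `make_lookup` and `count_matches` are hypothetical names.

```python
def make_lookup(result, max_number=49):
    """Build a boolean array where drawn[n] is True if n was drawn."""
    drawn = [False] * (max_number + 1)  # cell 0 is unused
    for n in result:
        drawn[n] = True
    return drawn


def count_matches(drawn, bet):
    """Each number in the bet is checked with a single O(1) array access."""
    return sum(1 for n in bet if drawn[n])
```

For example, with a draw of 3, 11, 19, 27, 38, 45 and a bet of 3, 11, 20, 27, 40, 49, three numbers match.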
Hash Table • Suppose we need to store values between 0 and 99, but only have an array with 10 cells • We can map the values [0,99] to [0,9] by taking modulo 10. The result is the “Hash Value” • Adding, finding and removing an element are O(1) • It is even possible to map the strings to integers, e.g. “ATE” to (1*26*26+20*26+5) mod 10
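The string example can be checked with a short sketch. It assumes uppercase words only, with A = 1 through Z = 26 as in the slide. The name `string_hash` and the default table size are illustrative.

```python
def string_hash(word, table_size=10):
    """Treat the word as a base-26 number (A=1 .. Z=26), then take the modulo."""
    value = 0
    for ch in word:
        value = value * 26 + (ord(ch) - ord("A") + 1)
    return value % table_size
```

For "ATE" the intermediate value is 1*26*26 + 20*26 + 5 = 1201, so the hash value is 1.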
Hash Table - Collision • But this approach has an inherent problem • What happens if two data items have the same hash value? • Two major methods to deal with this • Chaining (Also called Open Hashing) • Open Addressing (Also called Closed Hashing)
Hash Table - Chaining • Keep a linked list at each hash table cell • On average, Add / Find / Remove is O(1+a) • a = Load Factor = # of stored elements / # of cells • If the hash function is “random” enough, the average case is what we usually get in practice
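A minimal sketch of chaining for integer keys, assuming the modulo hash function from the earlier slide. For brevity each cell holds a Python list rather than a hand-written linked list, and the class and method names are illustrative.

```python
class ChainedHashTable:
    def __init__(self, cells=10):
        self.cells = [[] for _ in range(cells)]  # one chain per cell
        self.count = 0

    def _chain(self, key):
        return self.cells[key % len(self.cells)]

    def add(self, key):
        chain = self._chain(key)
        if key not in chain:  # scans only this one chain
            chain.append(key)
            self.count += 1

    def find(self, key):
        return key in self._chain(key)

    def remove(self, key):
        chain = self._chain(key)
        if key in chain:
            chain.remove(key)
            self.count -= 1

    def load_factor(self):
        """a = number of stored elements / number of cells."""
        return self.count / len(self.cells)
```

Keys such as 7, 17 and 27 all land in the same cell of a 10-cell table, so they share one chain, and the cost of an operation grows with that chain's length.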
Hash Table - Open Addressing • If you don’t want to implement a linked list… • An alternative is to skip a cell if it is occupied • The following diagram illustrates “Linear Probing”
Hash Table - Open Addressing • Find() must continue until a blank cell is reached • Remove() must use Lazy Deletion, otherwise further operations may fail
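Linear probing with Lazy Deletion can be sketched as below for integer keys. The `EMPTY` and `DELETED` markers and the class name are assumptions of this sketch. A removed cell is marked `DELETED` rather than `EMPTY`, so that a later Find() does not stop too early.

```python
EMPTY = object()    # cell has never been used
DELETED = object()  # cell held a key that was lazily deleted


class LinearProbingTable:
    def __init__(self, cells=10):
        self.cells = [EMPTY] * cells

    def _probe(self, key):
        """Yield the cell indices to try: the hash value, then the next cells."""
        start = key % len(self.cells)
        for step in range(len(self.cells)):
            yield (start + step) % len(self.cells)

    def find(self, key):
        for i in self._probe(key):
            cell = self.cells[i]
            if cell is EMPTY:
                return False  # a blank cell ends the search
            if cell is not DELETED and cell == key:
                return True
        return False

    def add(self, key):
        if self.find(key):
            return
        for i in self._probe(key):
            if self.cells[i] is EMPTY or self.cells[i] is DELETED:
                self.cells[i] = key
                return
        raise OverflowError("hash table is full")

    def remove(self, key):
        for i in self._probe(key):
            cell = self.cells[i]
            if cell is EMPTY:
                return
            if cell is not DELETED and cell == key:
                self.cells[i] = DELETED  # leave a marker, not a blank cell
                return
```

If `remove` wrote `EMPTY` instead, a key stored further along the same probe sequence would become unreachable.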
Hash Table - Summary • Find_Min() and Remove_Min() are usually not supported in a Hash Table • You may scan the whole table if you really want • For Chaining • Find() : O(1+a) • Add() : O(1+a) • Remove() : O(1+a) • For Open Addressing • Find() : O(1/(1-a)) • Add() : O(1/(1-a)) • Remove() : O(ln(1/(1-a))/a + 1/a) • Both are close to O(1) if a is kept small (< 50%)
Miscellaneous Stuff • Judge problems • 1020 – Left Join • 1021 – Inner Join • 1019 – Addition II • Past contest problems • NOI2004 Day 1 – Cashier • Any more? • Good place to find related information - Wikipedia • http://en.wikipedia.org/wiki/Binary_search_tree • http://en.wikipedia.org/wiki/Binary_heap • http://en.wikipedia.org/wiki/Hash_table