Data Structures Vishnu Kotrajaras, PhD.
Introduction
• Why study data structures?
  • We can understand more code.
  • We can choose the right data structure for a task.
Example: storing 5 numbers
[Figure: the numbers 1 to 5 stored two ways, as a linked list and as a tree (binary search tree).]
Choosing how to store
[Figure: a heap holding the values 5, 4, 3, 2, 1.]
• Heap: if we always want to retrieve the maximum value, a heap is the best choice.
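As a minimal sketch of the heap idea, assuming Java's standard `java.util.PriorityQueue` (a binary heap) and a hypothetical helper name `drainLargestFirst`, the following stores five numbers and retrieves them largest first.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.PriorityQueue;

class MaxHeapDemo {
    // Returns the values in the order a max-heap hands them out.
    static int[] drainLargestFirst(int[] values) {
        // PriorityQueue is a min-heap by default; reverseOrder() makes it a max-heap.
        PriorityQueue<Integer> heap = new PriorityQueue<>(Collections.reverseOrder());
        for (int v : values) {
            heap.add(v);                // O(log n) per insertion
        }
        int[] out = new int[values.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = heap.poll();       // removes the current maximum, O(log n)
        }
        return out;
    }

    public static void main(String[] args) {
        int[] result = drainLargestFirst(new int[] {2, 1, 4, 3, 5});
        System.out.println(Arrays.toString(result)); // prints [5, 4, 3, 2, 1]
    }
}
```

Reading only the maximum (`peek()`) takes constant time, which is why a heap suits "always retrieve the maximum" workloads.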
Estimating the program speed
• Big O: T(N) = O(f(N)) if T(N) <= c*f(N),
• where c and N0 are positive constants and N >= N0.
• This tells us how the running time of the program grows as N grows.
Big O example
• If T(N) = 339N and f(N) = N*N
• Let N0 = 339 and c = 1.
• Then 339N <= 1*(N*N) for every N >= N0, so T(N) = O(N*N).
• -> There are other possible answers.
• If we let f(N) = 340N, we have
• T(N) <= 1*(340N) = c*f(N) with c = 1 -> this also fits the definition.
• Therefore T(N) = O(340N) is also correct.
Big O example (cont.)
• Therefore T(N) = O(N) is also correct (the constant factor 340 is absorbed into c).
• Which one should we use as the answer?
• Normally we choose the smallest one. Therefore O(N) is our answer.
• How does this connect to program speed? Please read on.
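The definition can be spot-checked numerically. The sketch below is illustrative only: the class and method names are hypothetical, and testing a finite range of N is evidence for the bound, not a proof.

```java
class BigOCheck {
    // T(N) = 339N, the running-time function from the example.
    static long t(long n) {
        return 339 * n;
    }

    // Returns true if T(N) <= c * N * N for every N in [n0, limit].
    static boolean quadraticBoundHolds(long c, long n0, long limit) {
        for (long n = n0; n <= limit; n++) {
            if (t(n) > c * n * n) {
                return false;
            }
        }
        return true;
    }

    // Returns true if T(N) <= c * N for every N in [n0, limit].
    static boolean linearBoundHolds(long c, long n0, long limit) {
        for (long n = n0; n <= limit; n++) {
            if (t(n) > c * n) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(quadraticBoundHolds(1, 339, 100_000)); // prints true
        System.out.println(quadraticBoundHolds(1, 1, 100_000));   // prints false (fails at N = 1)
        System.out.println(linearBoundHolds(340, 1, 100_000));    // prints true
    }
}
```

Both bounds hold, which matches the point that several f(N) satisfy the definition and the smallest one is the useful answer.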
Find the speed of the following code
    int sigmaOfSquare(int n) {          // calculate the sum of i*i for i = 1..n
        int tempSum;                    // 1 unit (declaration only)
        tempSum = 0;                    // 1 unit (assignment)
        for (int i = 1; i <= n; i++)    // 1 unit (init) + n+1 units (test) + n units (increment)
            tempSum += i * i;           // multiply, add, and assign, each n times: 3n units
        return tempSum;                 // 1 unit (return)
    }
Total time is 5n+5 units.
But counting in this much detail is unreasonable
• It is better to use an approximate time, that is, Big O.
• In the example, the time can be estimated from the loop alone (the other running times become insignificant).
• The loop body is performed n times.
• Therefore Big O = O(n).
• The detailed time is 5n+5, which matches O(n) because 5n+5 <= 6n for n >= 5.
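To make the unit counting concrete, here is a small sketch that tallies units with the same cost model (one unit per declaration, assignment, test, increment, arithmetic operation, and return). The class and method names are hypothetical, and the `for` loop is unrolled into a `while` loop only so that each part can be counted separately.

```java
class StepCount {
    // Counts time units for sigmaOfSquare(n) using the cost model described above.
    static long countUnits(int n) {
        long units = 0;
        int tempSum;
        units += 1;                 // declaration
        tempSum = 0;
        units += 1;                 // assignment
        int i = 1;
        units += 1;                 // loop initialisation
        while (true) {
            units += 1;             // loop test, evaluated n + 1 times
            if (!(i <= n)) {
                break;
            }
            tempSum += i * i;
            units += 3;             // multiply, add, assign
            i++;
            units += 1;             // increment
        }
        units += 1;                 // return
        return units;
    }

    public static void main(String[] args) {
        System.out.println(countUnits(10));   // prints 55, which is 5*10 + 5
    }
}
```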
Finding Big O from various loops
• For loop -> its Big O is the number of repetitions (times the cost of the loop body).
• Nested loop
    for (i = 1; i <= n; i++)            // n times
        for (j = 1; j <= n; j++)        // n times
            statements;
• Big O is O(n^2).
Finding Big O from various loops (cont.)
• Here is the Big O rule for nested loops:
• If T1(N) = O(f(N)) and T2(N) = O(g(N)), then
• T1(N) * T2(N) = O(f(N) * g(N))
• From the last page -> f(n) = g(n) = n
• Therefore they multiply to O(n^2).
Finding Big O from various loops (cont 2.)
• Consecutive statements
    for (i = 0; i <= n; i++)            // O(n)
        statement1;
    for (j = 0; j <= n; j++)            // O(n^2) for the nested pair
        for (k = 0; k <= n; k++)
            statement2;
• The answer is their max -> O(n^2)
Finding Big O from various loops (cont 3.)
• Big O rule for consecutive statements:
• If T1(N) = O(f(N)) and T2(N) = O(g(N)), then
• T1(N) + T2(N) = max(O(f(N)), O(g(N)))
• From the last page -> f(n) = n, g(n) = n^2
• The answer is therefore O(n^2).
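The rule for consecutive statements can be seen by counting how often each statement runs. This is a sketch with hypothetical names; the counters stand in for `statement1` and `statement2`.

```java
class LoopCount {
    // Returns {runs of statement1, runs of statement2} for the consecutive loops.
    static long[] countRuns(int n) {
        long first = 0;
        long second = 0;
        for (int i = 0; i <= n; i++) {
            first++;                    // stands in for statement1
        }
        for (int j = 0; j <= n; j++) {
            for (int k = 0; k <= n; k++) {
                second++;               // stands in for statement2
            }
        }
        return new long[] {first, second};
    }

    public static void main(String[] args) {
        long[] runs = countRuns(9);
        System.out.println(runs[0] + " " + runs[1]);   // prints 10 100
    }
}
```

The loops run n+1 times because they start at 0 and include n, so the total is (n+1) + (n+1)^2, and the quadratic term dominates.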
Finding Big O from various loops (cont 4.)
• Conditional statement
    if (condition)
        statement1      // O(f(n))
    else
        statement2      // O(g(n))
• Use the max -> max(O(f(n)), O(g(n))), plus the time needed to evaluate the condition.
Finding Big O from recursion
    int mymethod(int n) {
        if (n == 1) {
            return 1;
        } else {
            return 2 * mymethod(n - 1) + 1;
        }
    }
The method is called n times in total, so Big O = O(n).
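A quick way to confirm the call count is to instrument the method. In this sketch the static counter `calls` and the helper `callsFor` are hypothetical additions, and the return type is widened to `long` only to avoid overflow for larger n.

```java
class RecursionCount {
    static int calls = 0;   // hypothetical counter, not part of the original method

    static long mymethod(int n) {
        calls++;
        if (n == 1) {
            return 1;
        } else {
            return 2 * mymethod(n - 1) + 1;
        }
    }

    // Resets the counter, runs mymethod(n), and reports how many calls were made.
    static int callsFor(int n) {
        calls = 0;
        mymethod(n);
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(mymethod(5));    // prints 31
        System.out.println(callsFor(5));    // prints 5
    }
}
```

Each call makes exactly one recursive call until n reaches 1, so the number of calls grows linearly even though the returned value (2^n - 1) grows exponentially.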
Maximum Subsequence Sum, choosing the best Big O
• The maximum subsequence sum problem:
• For integers A_1, A_2, …, A_n,
• the maximum subsequence sum is the sum A_i + A_(i+1) + … + A_j that gives the maximum value. The subsequence must be consecutive.
• Example: -2, 11, -6, 16, -5, 7
• The sum of 11, -6, 16 is 21. But the maximum subsequence is 11, -6, 16, -5, 7 -> the sum is 23.
• 23 is the maximum subsequence sum.
Solving max sub sum: 1st method
    int maxSubSum01(int[] a) {
        int maxSum = 0;
        for (int i = 0; i < a.length; i++) {            // first index
            for (int j = i; j < a.length; j++) {        // last index
                int theSum = 0;
                for (int k = i; k <= j; k++) {          // sum from first to last
                    theSum += a[k];
                }
                if (theSum > maxSum) {                  // keep the maximum value
                    maxSum = theSum;
                }
            }
        }
        return maxSum;
    }
Solving max sub sum: 1st method (cont.)
• This first method has Big O = O(n^3).
• Not good enough: too many redundant calculations.
• If we have already added the elements from index 0 to 2, then to add the elements from index 0 to 3 we should not start the addition from scratch.
Solving max sub sum: 2nd method
    int maxSubSum02(int[] a) {
        int maxSum = 0;
        for (int i = 0; i < a.length; i++) {            // starting position
            int theSum = 0;
            for (int j = i; j < a.length; j++) {
                theSum += a[j];                         // extend the sum from the starting position
                if (theSum > maxSum) {
                    maxSum = theSum;                    // collect the best result so far
                }
            }
        }
        return maxSum;
    }
Big O = O(n^2)
Solving max sub sum: 2nd method (cont.)
Array: -2, 11, -6, 4
• When i=0, j=0: theSum = -2; maxSum = 0.
• When i=0, j=1: theSum = -2 + 11 = 9; maxSum becomes 9.
• When i=0, j=2: theSum = 9 + (-6) = 3; maxSum is still 9.
• When i=0, j=3: theSum = 3 + 4 = 7; maxSum is still 9.
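The trace can be reproduced in code. The sketch below repeats the 2nd method and adds a hypothetical helper, `runningSums`, that records `theSum` for one fixed starting position.

```java
import java.util.Arrays;

class MaxSubSumQuadratic {
    // The 2nd method: extend the sum from each starting position.
    static int maxSubSum02(int[] a) {
        int maxSum = 0;
        for (int i = 0; i < a.length; i++) {
            int theSum = 0;
            for (int j = i; j < a.length; j++) {
                theSum += a[j];
                if (theSum > maxSum) {
                    maxSum = theSum;
                }
            }
        }
        return maxSum;
    }

    // Hypothetical helper: the successive values of theSum for starting position i.
    static int[] runningSums(int[] a, int i) {
        int[] sums = new int[a.length - i];
        int theSum = 0;
        for (int j = i; j < a.length; j++) {
            theSum += a[j];
            sums[j - i] = theSum;
        }
        return sums;
    }

    public static void main(String[] args) {
        int[] a = {-2, 11, -6, 4};
        System.out.println(Arrays.toString(runningSums(a, 0))); // prints [-2, 9, 3, 7]
        System.out.println(maxSubSum02(a));                     // prints 11
    }
}
```

The overall answer for this array is 11, found later when i = 1 and j = 1; the trace for i = 0 only reaches 9.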
Solving max sub sum: 3rd method
• Use divide and conquer.
• The resulting sequence may be
  • in the left half of the array, or
  • in the right half, or
  • across both halves (the sequence contains the last element of the left half and the first element of the right half).
Solving max sub sum: 3rd method (cont.)
[Figure: an example array split into a left half that ends with -6 and a right half that starts with 2.]
• Max sub sum within the left half is 7.
• Max sub sum within the right half is 10.
• Max sub sum on the left that includes its last element (-6) is 1.
• Max sub sum on the right that includes its first element (2) is 10.
• Max sub sum that spans the left side and the right side is therefore 1 + 10 = 11 (this is the final answer).
Solving max sub sum: 3rd method (cont 2.)
    int maxSumDivideConquer(int[] array, int leftindex, int rightindex) {    // T(n)
        // assume that the array can be divided evenly.
        if (leftindex == rightindex) {      // base case
            if (array[leftindex] > 0)
                return array[leftindex];
            else
                return 0;                   // minimum value of maxSubSum
        }
        int centerindex = (leftindex + rightindex) / 2;
        int maxsumleft = maxSumDivideConquer(array, leftindex, centerindex);          // T(n/2)
        int maxsumright = maxSumDivideConquer(array, centerindex + 1, rightindex);    // T(n/2)
Solving max sub sum: 3rd method (cont 3.)
        int maxlefthalfSum = 0, lefthalfSum = 0;
        // max sum from the last element of the left side back to its first element.
        for (int i = centerindex; i >= leftindex; i--) {        // O(n/2)
            lefthalfSum = lefthalfSum + array[i];
            if (lefthalfSum > maxlefthalfSum) {
                maxlefthalfSum = lefthalfSum;
            }
        }
Solving max sub sum: 3rd method (cont 4.)
        int maxrighthalfSum = 0, righthalfSum = 0;
        // max sum from the first element of the right side to its last element.
        for (int i = centerindex + 1; i <= rightindex; i++) {   // O(n/2)
            righthalfSum = righthalfSum + array[i];
            if (righthalfSum > maxrighthalfSum) {
                maxrighthalfSum = righthalfSum;
            }
        }
Solving max sub sum: 3rd method (cont 5.)
        // finally, find the max of the three.
        // This part takes constant time, so we can ignore it.
        return max3(maxsumleft, maxsumright, maxlefthalfSum + maxrighthalfSum);
    }
Therefore the total time is T(n) = 2T(n/2) + 2O(n/2)
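Because the method is spread over several slides, it may help to see it assembled in one piece. This is a sketch rather than the author's exact code: the names are shortened, `Math.max` is swapped in for the `max3` helper, and the one-argument wrapper is a hypothetical convenience.

```java
class MaxSubSumDivideConquer {
    static int maxSubSum(int[] a, int left, int right) {
        if (left == right) {                        // base case: one element
            return Math.max(a[left], 0);
        }
        int center = (left + right) / 2;
        int maxLeft = maxSubSum(a, left, center);           // T(n/2)
        int maxRight = maxSubSum(a, center + 1, right);     // T(n/2)

        // Best sum that ends at the last element of the left half.
        int bestLeftBorder = 0;
        int sum = 0;
        for (int i = center; i >= left; i--) {
            sum += a[i];
            bestLeftBorder = Math.max(bestLeftBorder, sum);
        }

        // Best sum that starts at the first element of the right half.
        int bestRightBorder = 0;
        sum = 0;
        for (int i = center + 1; i <= right; i++) {
            sum += a[i];
            bestRightBorder = Math.max(bestRightBorder, sum);
        }

        // The answer is the largest of the three candidates.
        return Math.max(Math.max(maxLeft, maxRight), bestLeftBorder + bestRightBorder);
    }

    // Hypothetical wrapper so callers do not pass indices; an empty array gives 0.
    static int maxSubSum(int[] a) {
        return a.length == 0 ? 0 : maxSubSum(a, 0, a.length - 1);
    }

    public static void main(String[] args) {
        System.out.println(maxSubSum(new int[] {-2, 11, -6, 16, -5, 7})); // prints 23
    }
}
```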
Solving max sub sum: 3rd method (cont 6.)
• We find the total Big O:
    T(n) = 2T(n/2) + 2O(n/2)
         = 2T(n/2) + O(n)
         = 2T(n/2) + cn             (O(n) <= c*n according to the definition)
• Dividing everything by n, we get:
    T(n)/n = T(n/2)/(n/2) + c       (1)
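Solving max sub sum: 3rd method (cont 7.)
• We can create a series of equations by replacing n with n/2, n/4, and so on down to 2:
    T(n/2)/(n/2) = T(n/4)/(n/4) + c     (2)
    T(n/4)/(n/4) = T(n/8)/(n/8) + c     (3)
    ...
    T(2)/2 = T(1)/1 + c                 (x)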
Solving max sub sum: 3rd method (cont 8.)
• Adding (1) + (2) + (3) + ... + (x), we get:
    T(n)/n = T(1)/1 + c*log2 n
• The matching terms on the left- and right-hand sides cancel each other out, and c is added log2 n times.
• Multiplying both sides by n, we get:
    T(n) = n*T(1) + c*n*log2 n
• Because T(1) is constant, we can conclude that
• Big O = O(n log n)
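The result of the derivation can be checked by evaluating the recurrence directly. The sketch below assumes c = 1, T(1) = 1, and n a power of two; under those assumptions the closed form n*T(1) + c*n*log2 n becomes n + n*log2 n. All names are hypothetical.

```java
class RecurrenceCheck {
    // T(1) = 1 and T(n) = 2T(n/2) + n, for n a power of two (c = 1 assumed).
    static long t(long n) {
        return n == 1 ? 1 : 2 * t(n / 2) + n;
    }

    // Exact log2 for a power of two.
    static int log2(long n) {
        return 63 - Long.numberOfLeadingZeros(n);
    }

    // Closed form n*T(1) + c*n*log2(n) with T(1) = 1 and c = 1.
    static long closedForm(long n) {
        return n + n * log2(n);
    }

    public static void main(String[] args) {
        System.out.println(t(8) + " " + closedForm(8));   // prints 32 32
    }
}
```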
Solving max sub sum: 4th method
• We improve on the 2nd method, with two points to note:
• First, the first element of a maximum subsequence cannot be negative.
• For example: 3, -5, 1, 4, 7, -4
• -5 cannot be the first element of our result. It can only make the total smaller: the same subsequence without it has a larger sum.
Solving max sub sum: 4th method (cont.)
• Second, a subsequence whose sum is negative cannot be the beginning of a maximum subsequence.
• Suppose we are in the middle of the loop. Let i be the index of the first element of a subsequence and j be the index of the last element of that subsequence.
• Let the last element, a[j], be the one that makes the sum of this subsequence negative.
• Let p be any index between i+1 and j.
[Figure: an array with the positions i, p, and j marked from left to right.]
Solving max sub sum: 4th method (cont 2.)
• The next step of this loop -> increment j by one.
• If the new a[j] is negative, the sum only gets smaller, so the max sub sum value will not change.
• If the new a[j] is positive, a[i]+…+a[j] will be greater than a[i]+…+a[j-1]. However, because a[i]+…+a[j-1] is negative, the new sum cannot even match a[j] alone, so it can never be the maximum.
• Therefore, once we have a negative subsequence, we should not keep moving j. We should move i instead.
Solving max sub sum: 4th method (cont 3.)
• Should we increment i by only one, or by more?
• By our assumption, a[j] is the first element that makes a[i]+…+a[j] negative, so every shorter sum a[i]+…+a[p-1] is non-negative. Starting at p instead of i removes a non-negative part, so a[p]+…+a[j] is no larger than a[i]+…+a[j] and is also negative (p is any index between i+1 and j).
• If we want to get a larger max sub sum, we must start our subsequence from position j+1. Therefore i should be moved to j+1.
[Figure: an array with the positions i, p, and j marked from left to right.]
Solving max sub sum: 4th method (cont 4.)
    int maxsubsumOptimum(int[] array) {
        int maxSum = 0, theSum = 0;
        for (int j = 0; j < array.length; j++) {
            theSum = theSum + array[j];
            if (theSum > maxSum) {
                maxSum = theSum;
            } else if (theSum < 0) {    // if array[j] makes the sequence negative,
                theSum = 0;             // start again from position j+1.
            }
        }
        return maxSum;
    }
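One way to gain confidence in the linear method is to compare it with the quadratic method on many inputs. The sketch below does that with random arrays; the class name, the helper `agreeOnRandomArrays`, the seed, and the value range are all arbitrary choices for illustration.

```java
import java.util.Random;

class MaxSubSumCrossCheck {
    // The 4th method: one pass, resetting the running sum when it turns negative.
    static int linear(int[] a) {
        int maxSum = 0, theSum = 0;
        for (int j = 0; j < a.length; j++) {
            theSum += a[j];
            if (theSum > maxSum) {
                maxSum = theSum;
            } else if (theSum < 0) {
                theSum = 0;
            }
        }
        return maxSum;
    }

    // The 2nd method, used here as the reference answer.
    static int quadratic(int[] a) {
        int maxSum = 0;
        for (int i = 0; i < a.length; i++) {
            int theSum = 0;
            for (int j = i; j < a.length; j++) {
                theSum += a[j];
                maxSum = Math.max(maxSum, theSum);
            }
        }
        return maxSum;
    }

    // Returns true if both methods agree on every randomly generated array.
    static boolean agreeOnRandomArrays(int trials, long seed) {
        Random rnd = new Random(seed);
        for (int t = 0; t < trials; t++) {
            int[] a = new int[rnd.nextInt(20)];     // length 0..19
            for (int i = 0; i < a.length; i++) {
                a[i] = rnd.nextInt(21) - 10;        // values -10..10
            }
            if (linear(a) != quadratic(a)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(linear(new int[] {-2, 11, -6, 16, -5, 7})); // prints 23
        System.out.println(agreeOnRandomArrays(1000, 42L));            // prints true
    }
}
```

The single loop visits each element once, so this method runs in O(n).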
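Logarithm in big O
• If an algorithm spends constant time (O(1)) to cut the problem size by a constant fraction (usually by half) and then continues with only one of the parts, it has Big O = O(log n).
• Dividing into halves is the same idea used in the 3rd method of the maximum subsequence sum problem, but that method solves both halves and does O(n) extra work at each level, so it is O(n log n) rather than O(log n).
• Usually we assume that all the data is already in the system. Otherwise, reading the data in takes O(n).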
Example: O(log n)
• Finding 5 in a sorted array.
• If we start from the first array member, it takes O(n) to find a number.
• But we know that the array is sorted:
• So we can look at the middle element of the array and continue the search in the left or the right half, depending on the value of that middle element.
• We keep searching by looking at the middle element of the subarray we are looking at, and so on.
• This is called -> binary search.
    int binarySearch(int[] a, int x) {
        int left = 0, right = a.length - 1;
        while (left <= right) {
            int mid = (left + right) / 2;
            if (a[mid] < x) {
                left = mid + 1;
            } else if (a[mid] > x) {
                right = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;      // reaching this point means -> not found.
    }
Big O = O(log2 n)
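To see the logarithmic bound in practice, the sketch below counts how many middle elements the search inspects. The static counter `probes` is a hypothetical addition for measurement only.

```java
class BinarySearchProbes {
    static int probes;   // hypothetical counter: middle elements inspected by the last search

    static int binarySearch(int[] a, int x) {
        probes = 0;
        int left = 0, right = a.length - 1;
        while (left <= right) {
            probes++;
            int mid = (left + right) / 2;
            if (a[mid] < x) {
                left = mid + 1;
            } else if (a[mid] > x) {
                right = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;      // not found
    }

    public static void main(String[] args) {
        int[] a = {1, 3, 5, 7, 9};
        System.out.println(binarySearch(a, 5) + " " + probes);   // prints 2 1
    }
}
```

For an array of 1024 elements, no search needs more than 11 probes (log2 1024 + 1), compared with up to 1024 for a scan from the first element.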
Example: O(log n) (cont.)
• Greatest common divisor
    long gcd(long m, long n) {
        while (n != 0) {
            long rem = m % n;
            m = n;
            n = rem;
        }
        return m;
    }
• How do we find its Big O? The rate at which the remainder shrinks tells us the Big O.
• In this program, the remainder decreases without any obvious pattern.
Big O of gcd
• We use the following property:
• If M > N, then M mod N < M/2
• Proof:
  • If N <= M/2: the remainder of M mod N must be less than N, so it must also be less than M/2.
  • If N > M/2: N goes into M exactly once, so the remainder is M - N. Because N > M/2, M - N < M - M/2 = M/2.
• If we look at the code for gcd:
• The remainder from the xth iteration is used as m in the (x+2)th iteration.
• Therefore the remainder from the (x+2)th iteration must be less than half the remainder from the xth iteration.
• Meaning -> after every 2 iterations, the remainder is reduced by at least half, so the number of iterations is O(log n).
Example: gcd(2564, 1988)
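The sequence of remainders for this example can be generated with a small variation of the gcd loop. The helper name `remainders` is hypothetical; it records each remainder instead of returning only the final answer.

```java
import java.util.ArrayList;
import java.util.List;

class GcdTrace {
    // Same loop as gcd, but collects the remainder produced by each iteration.
    static List<Long> remainders(long m, long n) {
        List<Long> rems = new ArrayList<>();
        while (n != 0) {
            long rem = m % n;
            rems.add(rem);
            m = n;
            n = rem;
        }
        return rems;
    }

    public static void main(String[] args) {
        System.out.println(remainders(2564, 1988));
        // prints [576, 260, 56, 36, 20, 16, 4, 0]
    }
}
```

The last non-zero remainder, 4, is the gcd. Comparing every second entry (576 -> 56 -> 20 -> 4, and 260 -> 36 -> 16 -> 0) shows each value falling below half of the one two iterations earlier.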
Example: O(log n) (cont 2.)
• Calculate x^n by divide and conquer.
    long power(long x, int n) {
        if (n == 0) return 1;
        if (isEven(n))
            return power(x * x, n / 2);
        else
            return power(x * x, n / 2) * x;
    }
• Big O = O(log2 n)
• The original problem is divided in half by each method call.
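A runnable sketch of the same idea follows. It assumes `isEven(n)` simply tests `n % 2 == 0` and adds a hypothetical call counter; overflow of `long` for large results is not handled.

```java
class FastPower {
    static int calls;   // hypothetical counter of method calls

    static long power(long x, int n) {
        calls++;
        if (n == 0) {
            return 1;
        }
        long half = power(x * x, n / 2);        // problem size is halved
        return (n % 2 == 0) ? half : half * x;  // odd n needs one extra factor of x
    }

    // Resets the counter, computes x^n, and reports the number of calls made.
    static int callsFor(long x, int n) {
        calls = 0;
        power(x, n);
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(power(2, 10));       // prints 1024
        System.out.println(callsFor(2, 10));    // prints 5
    }
}
```

For n = 10 the exponents visited are 10, 5, 2, 1, 0, so five calls replace ten multiplications in a simple loop.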
O(log n) definition
• log^k n = O(n) when k is a constant.
• This tells us that a logarithmic function has a small growth rate.
• f(n) = log_a n has Big O = O(log_b n), where a and b are constants greater than 1.
• Any two logarithmic functions have the same growth rate.
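Any two logarithmic functions have the same growth rate: a proof
• Let x = log_a n and y = log_b n.
• Then a^x = n = b^y.
• Taking log_b of both sides gives x * log_b a = y, so log_a n = (1 / log_b a) * log_b n.
• Because 1 / log_b a is a positive constant, log_a n = O(log_b n).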
Runtime: small (top) to large (bottom)
• c
• log n
• log^k n
• n
• n log n
• n^2
• n^3
• 2^n
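As a rough illustration of this ordering, the sketch below evaluates each function at a single value of n. It assumes c = 1, k = 2, and base-2 logarithms; the names are hypothetical. The ordering is asymptotic, so it is only guaranteed once n is large enough.

```java
class GrowthTable {
    // Values of the listed functions at n, in the same order as the list.
    static double[] valuesAt(double n) {
        double log2n = Math.log(n) / Math.log(2);
        return new double[] {
            1,                      // c (constant; 1 assumed)
            log2n,                  // log n
            Math.pow(log2n, 2),     // log^k n with k = 2 assumed
            n,
            n * log2n,
            n * n,
            n * n * n,
            Math.pow(2, n)          // 2^n
        };
    }

    // Returns true if every value is larger than the one before it.
    static boolean isStrictlyIncreasing(double[] v) {
        for (int i = 1; i < v.length; i++) {
            if (!(v[i] > v[i - 1])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isStrictlyIncreasing(valuesAt(64)));   // prints true
    }
}
```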
Definitions other than big O
• Big Omega (Ω)
• T(N) = Ω(g(N)) if there exist constants C and N0 such that
• T(N) >= C*g(N), where N >= N0
• From the definition, if f(N) = Ω(N^2), then f(N) = Ω(N) = Ω(N^(1/2))
• We should choose the most realistic answer (the largest lower bound we can show).
Definitions other than big O (cont.)
• Big Theta (Θ)
• T(N) = Θ(h(N)) if T(N) = O(h(N)) and T(N) = Ω(h(N))
• There exist c1, c2, N0 such that c1*h(N) <= T(N) <= c2*h(N), where N >= N0
Definitions other than big O (cont 2.)
• Small o
• T(N) = o(p(N)) if T(N) = O(p(N)) but T(N) ≠ Θ(p(N))
Notes from the definitions
• T(N) = O(f(N)) has the same meaning as f(N) = Ω(T(N))
• We can say f(N) is an "upper bound" of T(N), and T(N) is a "lower bound" of f(N).
• f(N) = N^2 and g(N) = 2N^2 have the same Big O and Big Ω. That is, f(N) = Θ(g(N))
• f(N) = N^2 can have several Big O -> (O(N^3), O(N^4)), but the best value is O(N^2).
• We can use f(N) = Θ(N^2) to say that this value is the best Big O.
Thus, we have the latest definition:
• If T(N) is a polynomial of degree k, then T(N) = Θ(N^k)
• From here,
• if T(N) = 5N^4 + 4N^3 + N, we know that T(N) = Θ(N^4)
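The Big Theta claim for this polynomial can be spot-checked against the two-sided definition. The constants c1 = 5, c2 = 10, and N0 = 1 below are one possible choice made for this sketch (they work because 4N^3 + N <= 5N^4 for N >= 1); the names are hypothetical and a finite range is checked, not all N.

```java
class ThetaCheck {
    // T(N) = 5N^4 + 4N^3 + N
    static long t(long n) {
        return 5 * n * n * n * n + 4 * n * n * n + n;
    }

    // Returns true if c1*N^4 <= T(N) <= c2*N^4 for every N in [n0, limit].
    static boolean thetaBoundsHold(long c1, long c2, long n0, long limit) {
        for (long n = n0; n <= limit; n++) {
            long h = n * n * n * n;
            if (c1 * h > t(n) || t(n) > c2 * h) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(thetaBoundsHold(5, 10, 1, 10_000));   // prints true
    }
}
```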
Best case, Worst case, Average case
• Worst case = the maximum possible running time.
• Best case = the minimum possible running time.
• Average case?
• For each input, see how long the program runs.
• Average case running time = total time over every input divided by the number of inputs.
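As an illustration of the three cases, the sketch below uses a simple linear search (an example chosen here, not taken from the slides) and measures comparisons. The set of inputs is assumed to be one successful search for each of the n elements, each equally likely.

```java
class AverageCase {
    // Number of comparisons a linear search makes before finding x (or giving up).
    static int comparisons(int[] a, int x) {
        int count = 0;
        for (int value : a) {
            count++;
            if (value == x) {
                break;
            }
        }
        return count;
    }

    // Average comparisons over all n successful searches in the array 0..n-1.
    static double averageOverAllHits(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
        }
        long total = 0;
        for (int x = 0; x < n; x++) {
            total += comparisons(a, x);     // time for this input
        }
        return (double) total / n;          // total time / number of inputs
    }

    public static void main(String[] args) {
        System.out.println(averageOverAllHits(9));   // prints 5.0
    }
}
```

Here the best case is 1 comparison (target is first), the worst case is n comparisons (target is last), and the average is (n+1)/2.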