
Data Structures Review Session



  1. Data Structures Review Session Ramakrishna, PhD student. Grading Assistant for this course

  2. Quick Sort Review PARTITIONING • A key step in the Quick sort algorithm is partitioning the array • We choose some (any) number p in the array to use as a pivot • We partition the array into three parts: the pivot p, the numbers less than p, and the numbers greater than or equal to p
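The slides give no code, so the following is an illustrative Python sketch of the partitioning step and the surrounding recursion, using the last element of each subarray as the pivot (as slide 10 later assumes):

```python
def partition(a, lo, hi):
    """Partition a[lo..hi] around the pivot p = a[hi] (the last element).

    Afterwards the elements less than p precede the pivot and the elements
    greater than or equal to p follow it; returns the pivot's final index.
    """
    p = a[hi]                      # choose the last element as the pivot
    i = lo                         # boundary of the "less than p" region
    for j in range(lo, hi):
        if a[j] < p:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]      # place the pivot between the two parts
    return i

def quicksort(a, lo=0, hi=None):
    """Sort the list a in place."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        m = partition(a, lo, hi)
        quicksort(a, lo, m - 1)    # numbers less than p
        quicksort(a, m + 1, hi)    # numbers greater than or equal to p
```

Each call to `partition` does work linear in the size of its subarray, which is the fact the best-case analysis on the next slides relies on.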

  3. Best case Partitioning at various levels

  4. Analysis of Quick Sort: Best Case • Suppose each partition operation divides the array almost exactly in half • Then the depth of the recursion is log2n • Because that's how many times we can halve n • However, there are many recursive calls! • How can we figure this out? • We note that • Each partition is linear over its subarray • All the partitions at one level cover the array

  5. Best case II • So the depth of the recursion is log2n • At each level of the recursion, all the partitions at that level do work that is linear in n • O(log2n) * O(n) = O(n log2n) • Hence in the best (and average) case, quicksort has time complexity O(n log2n) • What about the worst case?

  6. Worst case partitioning

  7. Worst case • In the worst case, partitioning always divides the size-n array into these three parts: • A length-one part, containing the pivot itself • A length-zero part, and • A length-(n-1) part, containing everything else • We don't recur on the zero-length part • Recurring on the length-(n-1) part requires (in the worst case) recurring to depth n-1

  8. Worst case for quick sort • In the worst case, recursion may be n levels deep (for an array of size n) • But the partitioning work done at each level is still O(n) • O(n) * O(n) = O(n2) • So the worst case for Quick sort is O(n2) • When does this happen? • When the array is sorted to begin with!

  9. Typical case for quick sort • If the array is sorted to begin with, Quick sort is terrible: O(n2) • It is possible to construct other bad cases • However, Quick sort is usually O(n log2n) • The constants are so good that Quick sort is generally the fastest algorithm known • Most real-world sorting is done by Quick sort

  10. Problems on Quick Sort (1) • What is the running time of QUICKSORT when a. All elements of array A have the same value? b. The array A contains distinct elements and is sorted in descending order? Assume that you always use the last element in the subarray as the pivot. ANSWER?

  11. Answer (1) A) Whatever pivot you choose in each subarray, it results in WORST-CASE PARTITIONING, and hence the running time is O(n2). B) The same holds here. Since the pivot you pick is always an extreme element of the subarray, each partition you do is a worst-case partition, and hence the running time is O(n2) again!

  12. Merge Sort • Approach • Partition list of elements into 2 lists • Recursively sort both lists • Given 2 sorted lists, merge into 1 sorted list • Examine head of both lists • Move smaller to end of new list • Performance • O( n log(n) ) average / worst case

  13. Merge Example [figure: step-by-step merge of two sorted sublists into the sorted list {2, 4, 5, 7, 8}]

  14. Merge Sort Example [figure: merge sort of {7, 2, 8, 5, 4}: the split phase followed by the merge phase, producing {2, 4, 5, 7, 8}]
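The slides describe merge sort in outline only; a minimal Python sketch of the approach (partition into two lists, recursively sort both, then merge by repeatedly moving the smaller head to the output) might look like:

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:        # move the smaller head to the output
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])               # at most one of these is non-empty
    out.extend(right[j:])
    return out

def merge_sort(a):
    """Return a sorted copy of a; O(n log n) in the average and worst case."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2                  # partition the list into two halves
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))
```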

  15. Problems on Merge Sort (1) 2) Let S be a sequence of n elements. An inversion in S is a pair of elements x and y such that x appears before y in S but x > y. Describe an algorithm running in O(n log n) time for determining the number of inversions in S. Solution?

  16. Simple Naïve Algorithm Pseudo Code: • For I = 1 to n // n elements in A • For J = I+1 to n • Compare A[I] and A[J]; if A[I] > A[J] then inversions++ Time Complexity: O(n2)

  17. Smart Algorithm • Hint: Modify the MERGE sub-procedure to solve this problem efficiently! • Original Merge Sort Approach • Partition list of elements into 2 lists • Recursively sort both lists • Given 2 sorted lists, merge into 1 sorted list • Examine head of both lists • Move smaller to end of new list • Performance • O(n log n) average / worst case

  18. Merge Example [figure: repeated from slide 13: step-by-step merge of two sorted sublists into {2, 4, 5, 7, 8}]

  19. Merge Sort Example [figure: repeated from slide 14: merge sort of {7, 2, 8, 5, 4}]

  20. Modified Merge Sort • Modified Merge Sort Approach • Partition list of elements into 2 lists • Recursively sort both lists, counting the inversions within each half • Given 2 sorted lists, merge into 1 sorted list • Examine the head of the left list and the right list • If the head of the left list is greater than the head of the right list, the right head forms an inversion with every element still remaining in the left list, so add that remaining count to the inversion total • Performance • O(n log n) average / worst case
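Following the slide's hint, one way the modified merge sort could be written (an illustrative Python sketch, not code from the slides) counts inversions during the merge:

```python
def count_inversions(a):
    """Return (sorted copy of a, number of inversions in a) in O(n log n)."""
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    left, inv_left = count_inversions(a[:mid])     # sort and count each half
    right, inv_right = count_inversions(a[mid:])
    merged, i, j = [], 0, 0
    inv = inv_left + inv_right                     # inversions within halves
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] comes after every remaining left element in the
            # original order relation: each remaining one is an inversion.
            inv += len(left) - i
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inv
```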

  21. Problems on Merge Sort (2) 3) Given a set A with n elements and a set B with m elements, describe an efficient algorithm for computing A XOR B, which is the set of elements that are in A or B, but not in both. Solution ?

  22. Naïve Algorithm Pseudo Code: • For each element x in A // n elements in A • For each element y in B // m elements in B • Compare x and y; if they are the same, mark them. • Go through A and B and find the elements that are unmarked. These are the elements in the XOR set. Time Complexity: O(n*m) + O(n) + O(m) = O(n*m)

  23. Smart Algorithm Pseudo Code: • Sort array A using an O(n log n) sorting algorithm. // n elements in A • For each element x in B // m elements in B • Perform a binary search for x in A. If a match is found, mark the matching element in A; else add x to the XOR set. • Go through A and copy the unmarked elements to the XOR set. Time Complexity: O(n log n) + O(m log n) + O(n) = O((m+n) log n)
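A sketch of this pseudocode in Python (the helper name `xor_set` is my own, and it assumes A and B each contain distinct elements, as befits sets; `bisect` supplies the binary search):

```python
import bisect

def xor_set(A, B):
    """Elements in A or B but not both, via sort plus binary search."""
    A = sorted(A)                       # O(n log n)
    matched = [False] * len(A)          # "mark" slots for elements of A
    result = []
    for x in B:                         # m binary searches: O(m log n)
        i = bisect.bisect_left(A, x)
        if i < len(A) and A[i] == x:
            matched[i] = True           # x is in both sets: mark it in A
        else:
            result.append(x)            # x is only in B
    # unmarked elements of A are only in A: one O(n) pass
    result.extend(a for a, m in zip(A, matched) if not m)
    return result
```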

  24. Problems on Sorting 4) Describe and analyze an efficient method for removing all duplicates from a collection A of n elements. Solution ?

  25. Smart Algorithm Pseudo Code: • Sort array A using an O(n log n) sorting algorithm. // n elements in A • Now, since the array is sorted, any duplicates will be adjacent to one another. • Let B be the resulting array without duplicates. • B[1] = A[1] • For I = 2 to n • If A[I] is the same as the most recently added element of B, skip it; else append A[I] to B Time Complexity: O(n log n) + O(n) = O(n log n)
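The same sort-then-scan idea as a short Python sketch (illustrative only; the function name is my own):

```python
def remove_duplicates(A):
    """Return A without duplicates: sort, then keep each run's first element."""
    A = sorted(A)                 # O(n log n); duplicates become adjacent
    B = []
    for x in A:                   # single O(n) scan
        if not B or x != B[-1]:   # skip x if it equals the last kept element
            B.append(x)
    return B
```

Note the result comes back sorted, which this approach cannot avoid; if the original order must be preserved, a different technique is needed.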

  26. Problems on Sorting 5) What are the worst-case and average-case running times for insertion sort, merge sort and quick sort? Answer: Insertion sort is O(n2) in both cases. Merge sort is O(n log n) in both cases. Quick sort is O(n log n) on average, O(n2) in the worst case. • In instances where the worst case and average case differ, give an example of how the worst case occurs. • Answer: In Quick sort, the worst case occurs when the partition is repeatedly degenerate.

  27. Problems on Sorting • Which of the following take linear execution time ? • Insertion sort • Merge sort • Quick sort • Quick select • None of the above

  28. Problems on Sorting • When all elements are equal, what is the running time of: • Insertion sort: O(n) (best case) • Merge sort: O(n log n) • Quick sort: O(n2) (worst case) • When the input has been sorted, what is the running time of: • Insertion sort: O(n) (best case) • Merge sort: O(n log n) • Quick sort: O(n2) (worst case)

  29. Problems on Sorting • When the input has been reverse sorted, what is the running time of: • Insertion sort: O(n2) (worst case) • Merge sort: O(n log n) • Quick sort: O(n2) (worst case)

  30. Searching Problems 6) Write down the time complexities (best, worst and average cases) of performing Linear-Search and Binary-Search in a sorted array of n elements. Solution ?

  31. Linear Search (1) • Method: Scan the input list and compare the input element with every element in the list. If a match is found, return. • Worst case Time Complexity: The obvious worst case is that the input element is not in the list. In this case we go through the entire list, and hence the worst case time complexity is O(n). • So this search doesn't really exploit the fact that the list is sorted!

  32. Linear Search (2) • Average case Time Complexity: Given that each element is equally likely to be the one searched for, and that the element searched for is present in the array, a linear search will on average have to examine half the elements: half the time the wanted element is in the first half and half the time in the second half. Its time complexity is therefore Θ(n/2), which is Θ(n), the same as the worst case once we ignore the constant. • Best Case Time Complexity: The obvious best case is that the first element of the array equals the input element, so the search terminates immediately. Hence the best case is Θ(1).

  33. Binary Search (1) • Binary search can only be performed on a sorted array. • In binary search we first compare the input element to the middle element; if they are equal, the search is over. If the input is greater than the middle element, we search the right half recursively; otherwise we search the left half recursively, until the middle element equals the input element or the range becomes empty. • Algorithm complexity is O(log2n). • Worst and average case complexities are Θ(log2n); the best case (the first middle element examined is the match) is Θ(1).
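The procedure the slide describes, written iteratively as a Python sketch (the slides show no code):

```python
def binary_search(a, x):
    """Return an index i with a[i] == x in the sorted list a, or -1."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == x:
            return mid            # equal to the middle element: done
        elif x > a[mid]:
            lo = mid + 1          # greater: search the right half
        else:
            hi = mid - 1          # smaller: search the left half
    return -1                     # range empty: x is not in the list
```

Each iteration halves the search range, which is where the O(log2n) bound comes from.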

  34. Problem on Complexity 7) Let f(n), g(n) be asymptotically nonnegative. Show that max(f(n), g(n)) = Θ(f(n) + g(n)). SOLUTION: • According to the definition, h(n) = Θ(g(n)) if and only if there exist positive constants c1, c2 and n0 such that 0 <= c1 g(n) <= h(n) <= c2 g(n) for all n >= n0. • So to prove this, we need to find positive constants c1, c2 and n0 such that 0 <= c1(f(n) + g(n)) <= max(f(n), g(n)) <= c2(f(n) + g(n)) for all n >= n0. • Selecting c2 = 1 establishes the third inequality, since the maximum is at most the sum. c1 should be selected as 1/2, since the maximum is always at least the average of f(n) and g(n). We can also select n0 = 1.

  35. Master Theorem
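Slide 35 evidently presents the theorem as an image; for reference, the standard statement in the simplified form the next two slides use is:

```latex
% Master theorem for recurrences T(n) = a T(n/b) + f(n), with a >= 1, b > 1:
\[
T(n) = a\,T\!\left(\tfrac{n}{b}\right) + f(n)
\quad\Longrightarrow\quad
T(n) =
\begin{cases}
\Theta\!\left(n^{\log_b a}\right)
  & \text{if } f(n) = O\!\left(n^{\log_b a - \epsilon}\right)
    \text{ for some } \epsilon > 0,\\[2pt]
\Theta\!\left(n^{\log_b a}\log n\right)
  & \text{if } f(n) = \Theta\!\left(n^{\log_b a}\right),\\[2pt]
\Theta\!\left(f(n)\right)
  & \text{if } f(n) = \Omega\!\left(n^{\log_b a + \epsilon}\right)
    \text{ for some } \epsilon > 0 \text{ and }
    a\,f(n/b) \le c\,f(n) \text{ for some } c < 1.
\end{cases}
\]
```

The worked example on slides 36 and 37 falls under the second case.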

  36. Solving Recurrence relations 8) Solve the recurrence equation: T(n) = 16T(n/4) + n^2. • We solve this recurrence using the master theorem. • Comparing it to the general form T(n) = aT(n/b) + f(n), we get: • a = 16, b = 4 and f(n) = n^2, which is a quadratic function of n. • Before applying the master theorem to this problem, we have to compare the function f(n) with the function n^(log_b a). Intuitively, the solution of the recurrence is determined by the larger of the two functions. Here n^(log_b a) = n^(log_4 16) = n^2 (since 16 = 4^2 and log_4 4 = 1).

  37. Continued.. • Here we can see that n^(log_b a) = f(n) = n^2. • According to the master theorem: when n^(log_b a) is larger, the solution is T(n) = Θ(n^(log_b a)); when f(n) is larger, the solution is T(n) = Θ(f(n)); and when both are equal (which is the case here), we multiply by a logarithmic factor and the solution is T(n) = Θ(n^(log_b a) log n) = Θ(f(n) log n). • We can verify that f(n) = Θ(n^(log_b a)) as follows. By definition, f(n) = Θ(g(n)) when there exist positive constants c1, c2 and n0 such that 0 <= c1 g(n) <= f(n) <= c2 g(n) for all n >= n0. Here g(n) = n^(log_b a) = n^2, so taking c1 = c2 = 1 gives 0 <= n^2 <= n^2 <= n^2, which is true for all n >= 1. Hence with c1 = c2 = 1 and n0 = 1 the inequality holds, and f(n) = Θ(n^(log_b a)). In general, when f(n) = g(n) we can certainly say that f(n) = Θ(g(n)). • Going back to the master theorem, since we have proved the above, case 2 applies and T(n) = Θ(n^(log_b a) log n). So the solution is T(n) = Θ(n^2 log n).
