Algorithms Complexity: Parallel Computing. Yair Toaff 027481498, Gil Ben Artzi 025010679, Orly Margalit 037616638
Parallel computing - MST The problem: given a graph G = (V, E) with edge weights, find a spanning tree of minimum total weight (a minimum spanning tree, MST).
Parallel computing - MST Kruskal's algorithm • Sort the graph's edges by weight. • In each step add the lightest edge that does not close a cycle.
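As an illustration (not part of the original slides), here is a minimal sequential sketch of Kruskal's algorithm in Python; the (weight, u, v) edge format and the plain pointer-chasing union-find used for the cycle check are my own choices.

def kruskal(n, edges):
    # edges: list of (weight, u, v) tuples over vertices 0..n-1
    parent = list(range(n))

    def find(x):
        # follow parent pointers to the representative of x's component
        while parent[x] != x:
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):       # step 1: sort the edges by weight
        ru, rv = find(u), find(v)
        if ru != rv:                    # step 2: add the edge only if it closes no cycle
            parent[ru] = rv             # merge the two components
            mst.append((w, u, v))
    return mst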
Parallel computing - MST Complexity, single processor: sorting takes O(m log m) = O(n² log n); each step takes O(1) and there are O(n²) steps; total O(n² log n).
Parallel computing - MST O(m) processors: sorting takes O(log² m); each step takes O(1); total O(n²).
Parallel computing - MST Prim's algorithm • Choose an arbitrary vertex to initialize the tree. • In every step choose the edge with minimal weight from a vertex in the tree to a vertex not in the tree.
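A hedged sequential sketch of Prim's algorithm (again not from the slides): it assumes an adjacency-matrix input weight[u][v] with None where there is no edge, a connected graph, and it starts the tree from vertex 0 instead of an arbitrary vertex.

def prim(n, weight):
    INF = float('inf')
    in_tree = [False] * n
    in_tree[0] = True                   # initialize the tree with a single vertex
    mst = []
    for _ in range(n - 1):
        best = (INF, None, None)
        # scan every edge from a tree vertex to a non-tree vertex, keep the lightest
        for u in range(n):
            if in_tree[u]:
                for v in range(n):
                    if not in_tree[v] and weight[u][v] is not None:
                        best = min(best, (weight[u][v], u, v))
        w, u, v = best
        in_tree[v] = True               # grow the tree along the chosen edge
        mst.append((w, u, v))
    return mst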
Parallel computing - MST Complexity, single processor: finding the edge in step i takes O(n · i); total n + 2n + … + n² = O(n³).
Parallel computing - MST O(n) processors: there is a processor for each vertex, so every step takes O(n); total O(n²).
Parallel computing - MST O(m) processors: in each step there are more processors than edges, so finding the minimum takes O(log n); total O(n log n).
Parallel computing - MST O(m²) processors: in each step finding the minimum takes O(1); total O(n).
Parallel computing - MST Sollin's algorithm (Borůvka) • Treat every vertex as a tree. • In each step choose a tree (arbitrarily), find the edge with the minimal weight from a vertex in the tree to a vertex not in the tree, and merge the two trees it connects.
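For concreteness, here is a sketch of the phase-based variant (in each phase all remaining trees pick their lightest outgoing edge at once, which is what the parallel analysis below assumes); the (weight, u, v) edge format is mine, and distinct edge weights and a connected graph are assumed.

def sollin(n, edges):
    comp = list(range(n))               # comp[v] points toward v's current tree
    def find(x):
        while comp[x] != x:
            x = comp[x]
        return x

    mst = []
    while len(mst) < n - 1:
        cheapest = {}                   # tree id -> lightest edge leaving that tree
        for w, u, v in edges:
            ru, rv = find(u), find(v)
            if ru != rv:
                for r in (ru, rv):
                    if r not in cheapest or w < cheapest[r][0]:
                        cheapest[r] = (w, u, v)
        # merge every tree with the tree across its chosen edge
        for w, u, v in cheapest.values():
            ru, rv = find(u), find(v)
            if ru != rv:                # skip edges whose trees were already merged this phase
                comp[ru] = rv
                mst.append((w, u, v))
    return mst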
Parallel computing - MST Complexity, single processor: the same as Kruskal's algorithm.
Parallel computing - MST O(n) processors: there is a processor for every vertex, so finding the minimum takes O(n). In each phase at most half of the trees remain, so there are O(log n) phases. Total O(n log n).
Parallel computing - MST O(n²) processors: there are n processors for every vertex, so finding the minimum takes O(log n). Total O(log² n).
Parallel computing - MST O(n³) processors: there are n² processors for every vertex, so finding the minimum takes O(1). Total O(log n).
Merge Sort
MS(p, q, c) - p, q are indexes, c is the array
if (p < q) {
    mid = (p + q) / 2
    MS(p, mid, c)
    MS(mid + 1, q, c)
    merge(p, mid, q, c)
}
Merge Sort Single processor: at every level of the recursion the merging takes O(n) in total, and there are O(log n) levels. Total O(n log n).
Merge Sort O(n) processors: the two recursive calls run in parallel, so time(MS(n)) = time(MS(n/2)) + time(merge(n)). Using the regular (sequential) merge, the merge times along the recursion add up to O(1 + 2 + 4 + … + n) = O(2n) = O(n).
Merge Sort Parallel merge The problem: given 2 sorted arrays A, B, each of size n/2, we need to merge them efficiently into one sorted array.
Merge Sort Let us define two sub-arrays: ODD A = [a1, a3, a5, …] and EVEN A = [a0, a2, a4, …].
Merge Sort And two functions: Combine(A, B) = [a0, b0, a1, b1, …]. Sort-combined(A) - for each pair (a2i, a2i+1): if they are already in the right order do nothing, otherwise swap them.
Merge Sort
Parallel-merge(A, B) {
    C = Parallel-merge(ODD A, EVEN B)
    D = Parallel-merge(ODD B, EVEN A)
    L = Combine(C, D)
    return Sort-combined(L)
}
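A sequential Python transcription of this recursion (a sketch; it assumes both inputs are sorted and have the same power-of-two length):

def parallel_merge(a, b):
    if len(a) == 1:
        return [min(a[0], b[0]), max(a[0], b[0])]
    c = parallel_merge(a[1::2], b[0::2])    # merge ODD A with EVEN B
    d = parallel_merge(b[1::2], a[0::2])    # merge ODD B with EVEN A
    l = [0] * (len(c) + len(d))             # Combine: interleave C and D
    l[0::2] = c
    l[1::2] = d
    for i in range(0, len(l), 2):           # Sort-combined: fix each adjacent pair
        if l[i] > l[i + 1]:
            l[i], l[i + 1] = l[i + 1], l[i]
    return l

For example, parallel_merge([5, 6, 7, 8], [1, 2, 3, 4]) returns [1, 2, 3, 4, 5, 6, 7, 8]. On a PRAM the two recursive calls and all the pair fixes run in parallel, which gives the O(log n) bound on the next slide.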
Merge Sort Complexity: time(parallel-merge(n)) = time(parallel-merge(n/2)) + O(1) = O(log n).
Merge Sort What is left is to prove the algorithm correct. Theorem (the 0-1 principle): if an oblivious compare-exchange algorithm sorts every array of 0s and 1s, it sorts every array.
Merge Sort Let us denote the number of '1's in A by 1a and in B by 1b. The numbers of '1's in ODD A and in EVEN A are then each 1a/2 up to rounding, so they differ by at most one.
Merge Sort As a result, the numbers of '1's in C and in D differ by 0 or 1. Therefore L is sorted except possibly at one adjacent pair, where the '0's and '1's meet, and Sort-combined fixes it with at most one swap.
Merge Sort Complexity of merge sort using parallel merge: log 1 + log 2 + log 4 + log 8 + … + log n = 0 + 1 + 2 + 3 + … + log n = O(log² n).
Sum • Input : Array of n elements of type integer. • Output : Sum of elements. • One processor - O(n) operations. • Two processors - Still O(n) operations.
Sum • What could we do if we have O(n) processors? • Parallel algorithm: in each phase, until only one element remains, each processor adds two elements together, leaving n/2 new elements. • Complexity: we have done more operations, so what have we gained? Since each phase halves the number of elements, we can view the computation as a binary tree whose levels are the successive sets of elements; the depth is O(log n) levels and each level takes O(1), for a total of O(log n) time (see the sketch below).
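A sequential simulation of this tree reduction (a sketch, not the slides' code); each iteration of the while loop corresponds to one parallel phase in which all the pairwise additions happen simultaneously.

def parallel_sum(values):
    level = list(values)
    while len(level) > 1:
        if len(level) % 2:              # odd count: pad with a neutral element
            level.append(0)
        # one phase: n/2 processors each add one pair of elements
        level = [level[2 * i] + level[2 * i + 1] for i in range(len(level) // 2)]
    return level[0] if level else 0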
Max1 – Max2 • Input: an array of n integers. • Output: the first and second maximum elements of the array. • One processor: 2n operations. • Two processors: each insertion takes 3 operations (comparing with each of the other candidate elements), 2n/3 operations.
Max1 – Max2 • Parallel algorithm - recursive solution • Divide the array into 2 groups (G1, G2). • Find the maximum of each group (LocalM1, LocalM2). • If LocalM1 > LocalM2: • create a new group G3 := (LocalM2 + G1). • Max2 must be in G3, since no element of G2 is bigger than LocalM2.
Max1 – Max2 • Example • End of recursion: M1[10] | M1[7] | M1[1] | M1[3] | M1[100] | M1[8] | M1[55] | M1[6] • Up one phase: M1[10], M2[7] | M1[3], M2[1] | M1[100], M2[8] | M1[55], M2[6] • Up one phase: M1[10], M2[7,3] | M1[100], M2[8,55] • The result: M1[100], M2[10,8,55]
Max1 – Max2 • Complexity • 1 processor: n comparisons in the tournament tree for Max1, then log n comparisons among its direct losers for Max2, total n + log n. • O(n) processors: we could find Max1 and rerun the algorithm to find Max2, each in log n time, for a total of 2 log n. • However, we can use the previous algorithm and build G3 in parallel, getting log n for finding Max1 and log log n for finding Max2.
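A sequential sketch of the tournament (my own transcription): every node carries its current maximum together with the elements that lost a comparison directly to it, so Max2 is taken from the champion's loser list, exactly as in the example above.

def max1_max2(values):
    nodes = [(v, []) for v in values]       # (current max, elements that lost to it)
    while len(nodes) > 1:
        nxt = []
        # one tournament level; on a PRAM these comparisons run in parallel
        for i in range(0, len(nodes) - 1, 2):
            (m1, l1), (m2, l2) = nodes[i], nodes[i + 1]
            if m1 >= m2:
                nxt.append((m1, l1 + [m2]))
            else:
                nxt.append((m2, l2 + [m1]))
        if len(nodes) % 2:                  # odd count: the last node advances unchallenged
            nxt.append(nodes[-1])
        nodes = nxt
    champion, losers = nodes[0]
    return champion, max(losers)            # Max2 is the best element that lost to Max1

On the example above, max1_max2([10, 7, 1, 3, 100, 8, 55, 6]) returns (100, 55), with the loser list {8, 55, 10}.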
Max & Min groups • Input: 2 sorted groups (G1, G2). • Output: 2 groups (G1', G2') such that every element of one group is bigger than every element of the other. • One processor: insert each group into a stack and repeatedly compare the two stack heads; the smaller one goes into the Min group. • Complexity: O(n) operations.
Max & Min groups • There is a major subtlety in the previous algorithm when trying to make it parallel: an element may have to be compared repeatedly until we find an element bigger than it. • We would like a method that compares each element with the others as few times as possible, ideally only once. • Any member of the Min group is necessarily smaller than at least half of the elements. • If we can conclude this about an element, we can classify it into the right group immediately. • Any suggestions?
Max & Min groups • Parallel algorithm • Insert all elements of G1 into a list L1 in reverse order, and all elements of G2 into a list L2 in regular order. • Element i of L1 is bigger than the n-i-1 elements of its own list that follow it; element i of L2 is bigger than the i elements of its own list that precede it. • So, by comparing element i of both lists we get: if L1[i] > L2[i], then L1[i] is bigger than n-i-1 elements of L1 and i+1 elements of L2 (including L2[i]), a total of n elements; and L2[i] is smaller than n-i-1 elements of L2 and i+1 elements of L1, a total of n elements. • And vice versa. • We can now place each element immediately into its group.
Max & Min groups • Example • Groups: G1 = 7, 10, 100, 101; G2 = 1, 11, 18, 99 • Lists: L1 = 101, 100, 10, 7; L2 = 1, 11, 18, 99 • Comparing: (101,1), (100,11), (10,18), (7,99) • Result: G1' = 101, 100, 18, 99; G2' = 1, 11, 10, 7
Max & Min groups • Complexity • We compare element i of L1 with element i of L2. • Each element takes part in exactly one comparison. • O(n) processors, O(1) time! • Can we do better with one processor now?
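A sketch of the split (a sequential simulation; on a PRAM each loop iteration is a single comparison done by its own processor). It assumes G1 and G2 are given sorted in ascending order, as in the example above.

def min_max_split(g1, g2):
    l1 = list(reversed(g1))             # G1 in reverse order
    l2 = list(g2)                       # G2 in regular order
    min_group, max_group = [], []
    for a, b in zip(l1, l2):            # exactly one comparison per element pair
        if a > b:
            max_group.append(a)
            min_group.append(b)
        else:
            max_group.append(b)
            min_group.append(a)
    return min_group, max_group

On the example above, min_max_split([7, 10, 100, 101], [1, 11, 18, 99]) returns ([1, 11, 10, 7], [101, 100, 18, 99]).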
Signed elements • Input: an array of elements, some of which are signed (marked). • Output: 2 arrays of elements, one containing the signed elements and the other the unsigned ones, keeping the order between the elements. • One processor: make one pass and drop each element into the correct array, O(n) operations. • Since we need to maintain the order between the elements, we must know, for each element, how many elements should come before it. • How could we improve the algorithm by adding more processors?
Signed elements array • Parallel algorithm • Create another array (A2) that holds 1 at every location of a signed element and 0 at every location of an unsigned element. • Now run the parallel prefix algorithm to obtain each signed element's position in the destination array. • We can do the same for the unsigned elements.
Signed elements array • Example • Input: [x1, x2, x3', x4, x5', x6, x7', x8', x9] • A2: [0, 0, 1, 0, 1, 0, 1, 1, 0] • Prefix: [0, 0, 1, 1, 2, 2, 3, 4, 4] • Result: x3' goes to position 1, x5' to position 2, x7' to position 3, x8' to position 4 • Complexity • O(n) processors, O(log n) time!
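A sketch combining a doubling-style parallel prefix (simulated sequentially) with the placement step; the helper names are mine, and is_signed is an assumed predicate that tells whether an element is marked.

def prefix_sums(a):
    out = list(a)
    step = 1
    while step < len(out):              # O(log n) phases
        # in one phase, every position i >= step adds the value step places back
        out = [out[i] + (out[i - step] if i >= step else 0) for i in range(len(out))]
        step *= 2
    return out

def split_signed(items, is_signed):
    flags = [1 if is_signed(x) else 0 for x in items]
    pos = prefix_sums(flags)            # pos[i] = 1-based slot of item i among the signed
    signed = [None] * (pos[-1] if pos else 0)
    unsigned = [None] * (len(items) - len(signed))
    for i, x in enumerate(items):       # every placement is independent (parallel on a PRAM)
        if flags[i]:
            signed[pos[i] - 1] = x
        else:
            unsigned[i - pos[i]] = x    # of the i+1 items so far, pos[i] are signed
    return signed, unsigned

On the example above, prefix_sums of A2 gives [0, 0, 1, 1, 2, 2, 3, 4, 4], so x3', x5', x7', x8' land in positions 1-4 of the signed array.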
Scheduling • Input: an array of jobs, each with its execution time and its deadline. • Output: is there a schedule satisfying all the deadlines? • Parallel algorithm • Sort the jobs by deadline. • Compute the prefix sums of the execution times. • A schedule exists iff PrefixExecTime(i) ≤ DeadLine[i] for every i. • Complexity with O(n) processors • O(log² n) to sort, O(log n) to do the prefix, O(1) to compare.
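A sequential sketch of the feasibility test (a simulation of the sort / prefix / compare pipeline above); jobs are assumed to be (execution time, deadline) pairs.

def schedule_feasible(jobs):
    ordered = sorted(jobs, key=lambda j: j[1])  # sort the jobs by deadline
    total = 0                                   # running prefix of execution times
    for exec_time, deadline in ordered:
        total += exec_time
        if total > deadline:                    # PrefixExecTime(i) must not exceed DeadLine[i]
            return False
    return True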
CAG - Clique • Input: a CAG. • Output: the size of a maximum clique. • Reminder • Clique: a set of vertices in which every two vertices are connected by an edge. • CAG: Circular Arc Graph, a graph whose vertices are arcs on a circle; there is an edge between two vertices iff their arcs share a segment of the circle.
CAG – Clique • Examples (figure): a clique on [v1, v2, v3], and a circular-arc graph with arcs v1, v2, v3, v4.
CAG - Clique • Parallel algorithm • Loop through the list of arc boundaries twice. • If the element is the start of a vertex's arc: BoundriesArray[i] = +1. • If the element is the end of a vertex's arc and we have already passed that vertex's start: BoundriesArray[i] = -1. • PrefixArray := Prefix(BoundriesArray) • MaxClique := Max(PrefixArray)
CAG - Clique • Example, the CAG from the previous slide • BoundriesArray: [(v1,+), (v2,+), (v1,-), (v4,+), (v3,-), (v4,-), (v2,+), (v1,+), (v3,+), (v2,-), (v1,-)] • PrefixArray: [1, 2, 1, 2, 1, 0, 1, 2, 3, 2, 1] • MaxClique is 3! • Note: we need to loop twice through the list of boundaries, since we only count the end of an arc once its start has already been passed.
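A sequential sketch of the two-pass boundary sweep above (the (start, end) angular representation of an arc is my own; end < start means the arc wraps around the circle). It builds BoundriesArray, takes its prefix sums, and returns the maximum.

def cag_max_clique(arcs):
    events = []
    for idx, (start, end) in enumerate(arcs):
        events.append((start, '+', idx))        # arc start boundary
        events.append((end, '-', idx))          # arc end boundary
    events.sort()                               # boundaries in circular order
    boundaries, seen_start = [], set()
    for _ in range(2):                          # loop through the boundary list twice
        for _, kind, idx in events:
            if kind == '+':
                seen_start.add(idx)
                boundaries.append(+1)
            elif idx in seen_start:             # count an end only after its start was passed
                boundaries.append(-1)
            else:
                boundaries.append(0)
    best = count = 0
    for b in boundaries:                        # prefix sums; their maximum is the answer
        count += b
        best = max(best, count)
    return best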
CAG – Clique • Complexity • One processor: O(n) • O(n) processors: log n + log n • O(n²) processors: log n + O(1)
Exclusive Read & Exclusive Write • EREW • The simplest machine model. • Only one processor can read from or write to a certain memory block at a time.
Concurrent Read & Exclusive Write • CREW • Only one processor can write to a certain memory block at a time. • Multiple processors can simultaneously read from a common memory block.
Exclusive Read & Concurrent Write • ERCW • Only one processor can read a certain memory block at a time. • Multiple processors can simultaneously write to a common memory block.