240 likes | 432 Views
PERMUTATION CIRCUITS. Presented by Wooyoung Kim, 1/28/2009 CSc 8530 Parallel Algorithms , Spring 2009 Dr. Sushil K. Prasad. Outline. Introduction: Problem Definition, Terminology. L ower bound. Permutation Circuit Design Example Constructive Proof Analysis Application. Definition.
E N D
PERMUTATION CIRCUITS Presented by Wooyoung Kim, 1/28/2009 CSc 8530 Parallel Algorithms, Spring 2009 Dr. Sushil K. Prasad
Outline • Introduction: Problem Definition, Terminology. • Lower bound. • Permutation Circuit Design • Example • Constructive Proof • Analysis • Application
Definition A permutation circuit is a combinational circuit that applies a given permutation ψn to its input x1,x2,… xn to get an output y1, y2, … , yn such that: y1, y2, …yn = ψn (x1, x2, …, xn) An example: Ψ8 = 1 2 3 4 5 6 7 8 4 8 3 2 1 7 6 5 • This means: • Input Output • 4 • 8 ….. Hence, y1 = 5, y2 = 4, y3 = 3, y4=1, …
Circuit component : Switch • A switch as the name suggests is a simple component that can do the following: • OFF state – Inputs are sent to output in the same order. • ON state – Inputs are switched or interchanged at the output. OFF ON
Depth Width Some basic terminology Sizeof a circuit – Number of components in the circuit. Depth of a circuit – Max number of stages in the path. Widthof a circuit – Max number of components in a stage. Hence,
Lower Bounds Let us say that for an input size n we need s switches. Each switch 2 stages (ON/OFF) s switches 2s stages To satisfy any permutation, 2s >= n! s >= n log n . LB on size is Ω (n log n)
Lower Bounds Therefore, , since there are n inputs and n outputs • , since • each input line must have a path to each output line. • each switch has only two inputs and two outputs.
Permutation Vs. Sorting The order in which the inputs to a Sorting Circuit appear at the output, depends on the values of the input. Hence, by having inputs in the form of a pair (i, j) (which implies input i is sent to output j )we can perform permutations by using a sorting circuit and sorting by the j values.
Permutation Vs. Sorting Sorting circuits are self-routing. That is, each comparator makes its decision as to which way the data it receives are to be directed; this decision is made when they reach the comparator and is based on their values. In permutation circuits, switches are to be set ahead of time.
Circuit Design Once again we use a recursive design based on smaller permutation circuits. The basic idea is to design the circuit in 3 layers: Stage 1 – The first layer decides which of the 2 Stage 2 circuits the Input goes to. Stage 2 – Permutes the input at one scale less. Stage 3 – Decides where Output of Stage 2 goes in the final output sequence.
Description • We need to show that any permutation can be performed for the given input. • If for some output yl we trace back to input x2k then select its neighbor in switch Ik (x2k-1) and set the switches from there to its correct output. If neighbor is already selected, select any other i/p. • If for some input xloutput y2k is reached, select its neighbor in switch Ok (y2k-1) and set switches from there to correct input. • Ping – Pong Technique
An Example Let us construct the circuit for the example shown earlier. It is given below. Ψ8 = 1 2 3 4 5 6 7 8 4 8 3 2 1 7 6 5 We shall consider it step by step. Our basic building blocks are a based on the following: n=1 No switches needed. n=2 One Switch sufficient. n > 2 Input fed into switches I that direct them towards two n/2 permutation circuits.
Constructive Proof [Waksman67] Consider a network like the one above with no links. We are given any arbitrary permutation. The upper n/2 circuit is called Pa and the lower Pb. Start with y1 and establish a link through Pa to some x through its corresponding I. Switch I is set if u is even. Proceed next with the second u associated with this I and establish a link through Pb to its y through the O associated with it. Set this O if y is even.
Repeat the process until all input-output pairs have been matched. Now, since by construction Pa and Pb, are each associated with exactly N/2 inputs and N/2 outputs, and since by assumption Pa and Pb are permutation networks the assignment is complete and the link pattern is as in the figure.
Analysis • Depth • d(n) = d (n/2) + 2 • = [ d(n/4) + 2 ] + 2 • = d( n/ 2k) + 2k ( d(2)=1) • n/ 2k = 2 • log n = k+1; k=log n -1 • d(n) = 2 log n – 1
Analysis (contd.) 2. Width : n/2 3. Size -> p(n) : p(1) = 0 , p(2) =1 p(n) = 2 p (n/2) + n – 1 Hence, p(n) = n log n –n +1
Applications Fast Parallel Permutation Algorithms[Hagerup 95] • Investigate the problem of permuting n data items on an EREW PRAM with pprocessors using little additional storage. • Present a simple algorithm with run timeO(n/p logn) and an improved algorithm with run time O(n/p+ lognlog log(n/p)). • Bothalgorithms require n additional global bits and O local storage per processor. • If prexsummation is supported at the instruction level the run time of the improved algorithm isO(n/p) • The algorithms can be used to rehash the address space of a PRAM emulation
Applications Sequential Algorithm • Permute along the cycle until you reach it again. • Mark all positions that visited. • Continue until all positions have been visited. • O(n) to move all items and O(n) t search for unvisited positions.
Applications Basic Algorithm • EREW PRAM with p processors. • Each processor P takes care of one block of B=n/p positions. • P starts with x in its block and follows the cycle until it meets a position y that is already marked as visited. • P is one of three states: searching, working on a cycle, terminated. • Time: O((n/p)logn)
Applications Improved Algorithm • Basic algorithm is not optimal because many processors could terminate early- unbalanced. • The array of items is dynamically partitioned into active and passive blocks. • passive: all positions have been visited. • active: split into smaller ones as the algorithm proceeds. • Time: O(n/p + logn log log (n/p))
References [Akl97] Selim G Akl, Parallel Computation, Prentice Hall, New Jersey, 1997. [Waksman68] A permutation network. Journal of the ACM, Vol. 15, 1968, pp. 159-163. [Hagerup 95] Fast Parallel Permutation Algorithms, Journal of Parallel Processing Letters, Vol. 5, No. 2, 995, pp. 139-148.