CS137: Electronic Design Automation • Day 13: February 8, 2006 • NC
Things we’ve seen • Add two N-bit numbers in O(log(N)) time on O(N) processors (gates) • Sort N elements in O(log(N)) time on O(N) processors • Evaluate an FSM on N inputs in O(log(N)) time on O(N) processors • Find the I’th element in a collection of N items in $O(\log^2(N))$ time on O(N) processors • Compute issuable instructions in O(log(N)) time with O(N) hardware
Complexity Class • What are the complexity classes for parallelism? • Suggested not all tasks have perfect area-time tradeoffs • How well can we parallelize problems? • Differentiate things which parallelize well • …things that don’t parallelize so well
If we use enough space… • Exponential space: P=NP • NTM runs in time f(N) • Use $2^{f(N)}$ PEs • Each evaluates with a different choice sequence • Prefix on completion • Solve problem in f(N) time • Of course, ignores 3-space wire delays
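The exponential-hardware idea can be sketched in a few lines. This is an illustration only, under assumptions not on the slide: subset-sum stands in for the NTM computation, each binary choice sequence of length N plays the role of one PE, and the function names `accepts_with_choices` and `simulate_ntm` are hypothetical.

```python
from itertools import product


def accepts_with_choices(weights, target, choices):
    """One PE: follow a single fixed choice sequence (take or skip each weight)."""
    return sum(w for w, c in zip(weights, choices) if c) == target


def simulate_ntm(weights, target):
    """Try every choice sequence; in hardware each one would be its own PE."""
    n = len(weights)
    results = [accepts_with_choices(weights, target, choices)
               for choices in product((0, 1), repeat=n)]
    # OR-reduce over all PE outputs (a log-depth tree in hardware).
    return any(results), len(results)


print(simulate_ntm([3, 5, 7], 12))  # (True, 8): 5 + 7 = 12, using 2**3 PEs
print(simulate_ntm([3, 5, 7], 11))  # (False, 8): no subset sums to 11
```

Each PE does only the f(N) work of one deterministic path; the cost has moved entirely into the number of PEs, which doubles with every extra choice.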
So what we really want to know is how fast something can be run with a “reasonable” number of processors (amount of hardware)
NC • Class of problems that can be: • Computed in polylogarithmic time • Polynomial in log(N), i.e. $O(\log^k(N))$ for some constant k • E.g. $3\log^2(N)+2\log(N)+234$ • Using polynomial hardware • NC for Nick’s Class • Named after Nick Pippenger
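Written out as a formula, the definition above is usually stratified by the exponent on the logarithm. The superscript notation $\mathrm{NC}^k$ is the standard one (for bounded fan-in circuits); it does not appear on the slides and is introduced here only for illustration.

```latex
\[
  \mathrm{NC}^k = \bigl\{\, \text{problems solvable in depth (time) } O(\log^k N)
                  \text{ with } N^{O(1)} \text{ gates} \,\bigr\},
  \qquad
  \mathrm{NC} = \bigcup_{k \ge 1} \mathrm{NC}^k
\]
\[
  3\log^2 N + 2\log N + 234 = O(\log^2 N)
  \quad\text{(a running time of this form fits within } \mathrm{NC}^2\text{)}
\]
```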
All in NC • Can do • Add two N-bit numbers in O(log(N)) time on O(N) processors (gates) • Sort N elements in O(log(N)) time on O(N) processors • Evaluate an FSM on N inputs in O(log(N)) time on O(N) processors • Find the I’th element in a collection of N items in $O(\log^2(N))$ time on O(N) processors • Compute issuable instructions in O(log(N)) time with O(N) hardware
Open Question • NC ?= P • Are all Polynomial Time algorithms computable in parallel • Polylog time • Polynomial processors • Suspected they are not • More at end
Transitive Closure • Given a Graph: G=(V,E) • Compute G*=(V,E*) • E* contains an edge $e=(V_i,V_j)$ • Iff there is a path from $V_i$ to $V_j$ in G • Transitive Closure ∈ NC
Basic Sequential Algorithm • N=|V| • Think of M as the N×N connectivity matrix for G • $M^2=G^2$ • $M^2[i,j]=\mathrm{OR}_{k}(M[i,k] \,\&\, M[k,j])$ • $M^{2n}[i,j]=\mathrm{OR}_{k}(M^n[i,k] \,\&\, M^n[k,j])$ • $M^N$ represents $G^N=G^*$ • Compute in log(N) squaring steps • $O(N^3\log(N))$
Parallel Algorithm • Use $N^3$ processors • N processors per element $M^n[i,j]$ • $N^2$ such groups to compute all elements of $M^n$ • Group of N processors for $M^n[i,j]$ performs an associative reduce in O(log(N)) time • Still takes log(N) steps to compute $M^N$ • $O(\log^2(N))$ time with $N^3$ processors, so in NC • [this construct may be weak?]
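The repeated-squaring scheme can be sketched sequentially as follows. This is a minimal sketch, not the course's code: the names `bool_square` and `transitive_closure` are hypothetical, and it assumes self-loops are added to M first so that each squaring accumulates all paths of length up to the current power of two (which makes the resulting closure reflexive).

```python
def bool_square(m):
    """One squaring step: out[i][j] = OR over k of (m[i][k] AND m[k][j])."""
    n = len(m)
    # In the parallel version, N processors per (i, j) entry would perform
    # this OR as an associative reduce in O(log N) time.
    return [[any(m[i][k] and m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(adj):
    """Return (closure matrix, number of squaring steps used)."""
    n = len(adj)
    # Assumption: add self-loops so squaring covers paths of length <= 2^t.
    m = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    steps, length = 0, 1
    while length < n:          # ceil(log2(N)) squarings suffice
        m = bool_square(m)
        length *= 2
        steps += 1
    return m, steps


# Chain 0 -> 1 -> 2 -> 3
chain = [[0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1],
         [0, 0, 0, 0]]
closure, steps = transitive_closure(chain)
print(closure[0][3], closure[3][0], steps)  # True False 2
```

Sequentially each squaring costs N³ operations, which is where the N³·log(N) total comes from; the parallel version spends the same N³ as processors instead of as time.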
All Pairs Shortest Paths • Given a Graph: G=(V,E) • Edge weight on each edge e∈E • Compute G’=(V,E’) • E’ contains an edge $e'=(V_i,V_j)$ • Iff there is a path from $V_i$ to $V_j$ in G • Edge weight is shortest path from $V_i$ to $V_j$ in G • All Pairs Shortest Path ∈ NC • Slight modification on transitive closure
Basic Sequential Algorithm • As before • N=|V| • Think of M as the N×N connectivity matrix for G • $M^2=G^2$ • Change • OR to MIN • & to + • So • $M^2[i,j]=\mathrm{OR}_{k}(M[i,k] \,\&\, M[k,j])$ • Becomes: $M^2[i,j]=\mathrm{MIN}_{k}(M[i,k] + M[k,j])$ • $M^N$ represents $G^N=G'$
(Same) Parallel Algorithm • Use $N^3$ processors • N processors per element $M^n[i,j]$ • $N^2$ such groups to compute all elements of $M^n$ • Group of N processors for $M^n[i,j]$ performs an associative reduce in O(log(N)) time • Still takes log(N) steps to compute $M^N$ • $O(\log^2(N))$ time with $N^3$ processors, so in NC • [this construct may be weak?]
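Swapping OR for MIN and & for + gives the following sketch of the shortest-path variant. The function names are hypothetical, and the sketch assumes missing edges are encoded as infinity, the diagonal is 0, and edge weights are non-negative.

```python
import math

INF = math.inf


def min_plus_square(m):
    """One squaring step: out[i][j] = MIN over k of (m[i][k] + m[k][j])."""
    n = len(m)
    # Same structure as the boolean case; MIN is associative, so a group of
    # N processors can reduce each entry in O(log N) time.
    return [[min(m[i][k] + m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def all_pairs_shortest_paths(w):
    """w[i][j] is the edge weight from i to j, or INF if there is no edge."""
    n = len(w)
    # Assumption: zero-cost self-loops, so shorter paths survive each squaring.
    m = [[0 if i == j else w[i][j] for j in range(n)] for i in range(n)]
    length = 1
    while length < n - 1:      # shortest paths use at most N-1 edges
        m = min_plus_square(m)
        length *= 2
    return m


w = [[0, 4, 1],
     [INF, 0, INF],
     [INF, 2, 0]]
dist = all_pairs_shortest_paths(w)
print(dist[0][1])  # 3: path 0 -> 2 -> 1 beats the direct edge of weight 4
print(dist[1][0])  # inf: vertex 0 is unreachable from vertex 1
```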
NL • Complexity class • Computations that can be performed using logarithmic space on a Non-Deterministic Turing Machine • Similarly L • logspace on Deterministic TM • Addition ∈ L • Certainly: L ⊆ NL
NL ⊆ NC • Theorem from Borodin: • If A is accepted by a NDTM using space $S(n) \ge \log_2(n)$, • then there is a d>0 such that: $\mathrm{DEPTH}_A(n) \le d \times S(n)^2$. • [Depth here = circuit depth = time] • For NL • $S(n)=\log_2(n) \Rightarrow \mathrm{Depth}(n) \le d \times \log^2(n)$
Borodin Construction (Idea) • State is bounded • Can construct the graph of all states • This will only take polynomial hardware • Compute transitive closure on graph • $O(\log^2(N))$ • Use associative reduce to extract solution • O(log(N))
Borodin States • What states can the NDTM be in? • At most $s^{S(N)}$ values on tape • s=size of symbol set • Head of TM at most S(N) positions • q states for FSM • N locations for input tape head • Total: states $= N \times q \times S(N) \times s^{S(N)}$ • For S(N)=log(N) • $N \times q \times \log(N) \times s^{\log(N)} = q N^{(\log(s)+1)} \log(N)$ • Number of states polynomial in N
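The step that makes the state count polynomial is the identity $s^{\log N} = N^{\log s}$. Spelled out, taking all logarithms base 2 (an assumption; the slide does not state the base):

```latex
\begin{align*}
  s^{\log N} &= \bigl(2^{\log s}\bigr)^{\log N} = \bigl(2^{\log N}\bigr)^{\log s} = N^{\log s} \\
  N \times q \times \log N \times s^{\log N}
    &= q \times N \times N^{\log s} \times \log N
     = q\, N^{\log(s)+1} \log N
\end{align*}
```

Since s and q are constants of the machine, the exponent $\log(s)+1$ is a constant and the count is polynomial in N.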
Build Graph • Construct graph with |V| = # states • M[i,j]=1 iff the machine can move from configuration i to configuration j • If $V_i$ is a state that corresponds to the input head being on square k • A move from i to j that is taken only when the kth input is 1: M[i,j] is “enabled” iff input k is 1 • A move from i to j that is taken only when the kth input is 0: M[i,j] is “enabled” iff input k is 0 • Can just be a set of ANDs setting up the initial connectivity matrix M
Transitive Closure • Transitive Closure with $O(|V|^3)$ PEs • Still polynomial in N • $|V| = N \times q \times \log(N) \times s^{\log(N)} = q N^{(\log(s)+1)} \log(N)$ • $O(|V|^3) \le O(N^{3(\log(s)+2)})$ • In $\log^2(N)$ time • $O(\log^2(|V|)) \le O([\log(N^{(\log(s)+2)})]^2)$ • $O([\log(s)+2]^2 \times \log^2(N)) = O(\log^2(N))$
Extract Result • OR reduce on Reachable states • Can reach an accepting state for TM? • Therefore: NL ⊆ NC
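The three stages (build the configuration graph from the input, take the transitive closure, OR-reduce over accepting configurations) can be illustrated on a toy machine. Everything specific here is an assumption made for the sketch: the machine has no work tape, so a configuration is just (FSM state, input head position); the transition table `DELTA` is a hypothetical machine that nondeterministically guesses where two adjacent 1s occur; and the input is assumed non-empty.

```python
def build_config_graph(delta, states, x):
    """M[i][j] is True iff configuration i can move to j on input x."""
    n = len(x)
    configs = [(q, h) for q in states for h in range(n)]
    index = {c: i for i, c in enumerate(configs)}
    # Self-loops let repeated squaring accumulate all shorter paths.
    m = [[i == j for j in range(len(configs))] for i in range(len(configs))]
    for (q, h), i in index.items():
        # The edge is "enabled" only by the input bit under the head.
        for q2, move in delta.get((q, x[h]), []):
            if 0 <= h + move < n:
                m[i][index[(q2, h + move)]] = True
    return configs, index, m


def closure(m):
    """Transitive closure by repeated boolean squaring."""
    n, length = len(m), 1
    while length < n:
        m = [[any(m[i][k] and m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        length *= 2
    return m


def accepts(delta, states, start, accepting, x):
    configs, index, m = build_config_graph(delta, states, x)
    row = closure(m)[index[(start, 0)]]
    # OR-reduce over every reachable accepting configuration.
    return any(row[index[c]] for c in configs if c[0] in accepting)


# Hypothetical machine: accept iff the input contains two adjacent 1s.
DELTA = {
    ("scan", 0): [("scan", +1)],
    ("scan", 1): [("scan", +1), ("saw1", +1)],  # nondeterministic guess
    ("saw1", 1): [("acc", 0)],
}
STATES = ["scan", "saw1", "acc"]
print(accepts(DELTA, STATES, "scan", {"acc"}, [0, 1, 1, 0]))  # True
print(accepts(DELTA, STATES, "scan", {"acc"}, [1, 0, 1, 0]))  # False
```

A machine with a logarithmic work tape would add the tape contents and tape head position to each configuration, which is what drives the state count on the earlier slide.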
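Converse Holds • Borodin • If A is in DEPTH(S(n)) for S(n) ≥ log(n) • Then A is in DSPACE(S(n)) • Recursive evaluation of gate value • w/ compact stack representation • Specialized for S(n)=log(n), i.e. circuits of depth O(log(n)), the class $NC^1$ • If A is in $NC^1$, then A is in L • $NC^1$ ⊆ L • Know L ⊆ NL … just showed NL ⊆ NC (in depth $O(\log^2(N))$) • So $NC^1$ ⊆ L ⊆ NL ⊆ NC • For depth $\log^k(n)$ the converse only gives DSPACE($\log^k(n)$), so this does not establish NL = NC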
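The recursive evaluation of a gate value can be sketched as a depth-first walk whose stack never grows beyond the circuit depth. The circuit encoding below (a dict from gate name to a tuple) and the function name `eval_gate` are hypothetical; the sketch also stores gate names on the stack, whereas the compact representation mentioned above would keep only a left/right bit per level.

```python
def eval_gate(circuit, inputs, gate, depth=0):
    """Return (value of gate, deepest recursion level reached below it)."""
    node = circuit[gate]
    if node[0] == "in":
        return bool(inputs[node[1]]), depth
    op, left, right = node
    # Nothing is memoized: a shared subcircuit is recomputed on each use,
    # trading time for space proportional to the depth.
    lv, ld = eval_gate(circuit, inputs, left, depth + 1)
    rv, rd = eval_gate(circuit, inputs, right, depth + 1)
    value = (lv and rv) if op == "and" else (lv or rv)
    return value, max(ld, rd)


# (x0 AND x1) OR (x2 AND x3): a depth-2 circuit
CIRCUIT = {
    "x0": ("in", 0), "x1": ("in", 1), "x2": ("in", 2), "x3": ("in", 3),
    "a": ("and", "x0", "x1"),
    "b": ("and", "x2", "x3"),
    "out": ("or", "a", "b"),
}
print(eval_gate(CIRCUIT, [1, 0, 1, 1], "out"))  # (True, 2)
print(eval_gate(CIRCUIT, [1, 0, 0, 1], "out"))  # (False, 2)
```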
Context Free Languages • Can recognize all context free languages in NC • PDA ⊆ NC
P-Complete • There are languages that are P-Complete • i.e. if we could show any of these were in NC • Then we would show NC=P • E.g. TM simulation
Complexity Roundup • In NC: FA, PDA, L, NL • Unknown: P=NC (P=NL)
Physical Realism • All rely on reductions in log(N) time • With 3D space, speed of light • …there are no log(N) time reductions • Maybe notion of 3-space parallelizable? • Run in $O(N^{1/3})$ time • O(N) processors • Cannot talk to more than O(N) in $O(N^{1/3})$ time
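The $N^{1/3}$ bound follows from a volume argument. As a sketch, assume each processor occupies at least some fixed volume v and signals travel at speed at most c (both symbols are introduced here for illustration):

```latex
\[
  \#\{\text{processors reachable in time } t\}
  \;\le\; \frac{\tfrac{4}{3}\pi (c\,t)^3}{v} \;=\; O(t^3)
  \qquad\Longrightarrow\qquad
  \text{combining data from } N \text{ processors needs } t = \Omega\bigl(N^{1/3}\bigr)
\]
```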
Admin • Friday/Monday:?? • Q: requests – what’s missing? • Project: two things due end of next week • Sequential implementation • Proposed plan of attack