
CS137: Electronic Design Automation


Presentation Transcript


  1. CS137: Electronic Design Automation • Day 13: February 8, 2006 • NC

  2. Things we’ve seen • Add two N-bit numbers in O(log(N)) time on O(N) processors (gates) • Sort N elements in O(log(N)) time on O(N) processors • Evaluate an FSM on N inputs in O(log(N)) time on O(N) processors • Find the i-th element in a collection of N items in O(log^2(N)) time on O(N) processors • Compute issuable instructions in O(log(N)) time with O(N) hardware • (a sketch of the parallel-prefix pattern behind several of these follows below)
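Not from the slides, but a minimal Python sketch (with illustrative names) of the parallel-prefix/scan pattern that underlies several of the bounds above; the parallel rounds are simulated sequentially here.

```python
# Hillis-Steele style parallel prefix, simulated sequentially: combine
# neighbours at doubling strides, so all N prefixes exist after O(log N)
# rounds when each round's combines run in parallel.
def parallel_prefix(values, combine):
    vals = list(values)
    stride = 1
    while stride < len(vals):
        # in hardware, every combine in this round happens simultaneously
        vals = [combine(vals[i - stride], vals[i]) if i >= stride else vals[i]
                for i in range(len(vals))]
        stride *= 2
    return vals

# e.g. a running OR over a bit vector (the shape of carry propagation)
print(parallel_prefix([0, 0, 1, 0, 0, 1], lambda a, b: a | b))
# -> [0, 0, 1, 1, 1, 1]
```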

  3. Complexity Class • What are the complexity classes for parallelism? • Suggested not all tasks have perfect area-time tradeoffs • How well can we parallelize problems? • Differentiate things which parallelize well… • …from things that don’t parallelize so well

  4. If we use enough space… • Exponential space: P=NP • NTM runs in time f(N) • Use 2^f(N) PEs • Each evaluates the machine with a different choice sequence • Prefix (reduce) over the outcomes on completion • Solve problem in f(N) time • Of course, ignores 3-space wire delays • (a toy simulation of this brute-force idea follows below)
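A toy sequential simulation of that idea, not from the slides: `parallel_ntm`, `accepts`, and the little clause set are all illustrative stand-ins, with one conceptual PE per choice sequence.

```python
from itertools import product

# Brute-force "nondeterminism": conceptually, each of the 2^f(N) choice
# sequences goes to its own PE; here the PEs are simulated one after another,
# and the final OR over their answers stands in for the reduce on completion.
def parallel_ntm(accepts, num_choices):
    return any(accepts(choices) for choices in product((0, 1), repeat=num_choices))

# toy use: "guess" a satisfying assignment for (x0 or x1) and (not x0 or x2)
clauses = [[(0, True), (1, True)], [(0, False), (2, True)]]
sat = lambda a: all(any(bool(a[i]) == want for i, want in clause) for clause in clauses)
print(parallel_ntm(sat, 3))   # -> True
```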

  5. So, we really want to know how fast something can be run with a “reasonable” number of processors (amount of hardware)

  6. NC • Class of problems that can be: • Computed in polylogarithmic time • Polynomial in log(N): O(log^k(N)) • E.g. 3log^2(N)+2log(N)+234 • Using polynomial hardware • NC for Nick’s Class • Named after Nick Pippenger • (a compact definition is written out below)
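For reference (my rendering, not on the slide), the class is usually written as a union over the polylog exponent k:

```latex
\[
  \mathrm{NC} \;=\; \bigcup_{k \ge 1} \mathrm{NC}^k, \qquad
  \mathrm{NC}^k \;=\; \bigl\{\, L : L \text{ is decided by circuits of size }
  N^{O(1)} \text{ and depth } O(\log^k N) \,\bigr\}
\]
```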

  7. All in NC • Can do: • Add two N-bit numbers in O(log(N)) time on O(N) processors (gates) • Sort N elements in O(log(N)) time on O(N) processors • Evaluate an FSM on N inputs in O(log(N)) time on O(N) processors • Find the i-th element in a collection of N items in O(log^2(N)) time on O(N) processors • Compute issuable instructions in O(log(N)) time with O(N) hardware

  8. Open Question • NC ?= P • Are all polynomial-time algorithms computable in parallel? • Polylog time • Polynomial processors • Suspected they are not • More at end

  9. Transitive Closure • Given a graph G=(V,E) • Compute G*=(V,E*) • E* contains an edge e=(Vi,Vj) • iff there is a path from Vi to Vj in G • Transitive Closure ∈ NC

  10. Basic Sequential Algorithm • N=|V| • Think of M = the N×N connectivity matrix for G • M^2 corresponds to G^2 • M^2[i,j]=OR(over all k)(M[i,k] & M[k,j]) • M^(2n)[i,j]=OR(over all k)(M^n[i,k] & M^n[k,j]) • M^N represents G^N=G* • Compute in log(N) squaring steps • O(N^3 log(N)) total • (a sequential sketch follows below)
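A minimal Python sketch of this sequential algorithm (not from the slides; names are illustrative): add self-loops so shorter paths survive squaring, then square the boolean matrix about log2(N) times.

```python
# Repeated boolean squaring: each squaring is O(N^3), and ~log2(N) squarings
# reach M^N, so the total work is O(N^3 log N) as on the slide.
def transitive_closure(M):
    n = len(M)
    M = [row[:] for row in M]
    for i in range(n):
        M[i][i] = 1                              # every vertex reaches itself
    steps = 1
    while steps < n:                             # until the power is >= N
        M = [[int(any(M[i][k] and M[k][j] for k in range(n)))
              for j in range(n)]
             for i in range(n)]
        steps *= 2
    return M

# toy use: edges 0->1 and 1->2, so 0 reaches 2
G = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]
print(transitive_closure(G)[0][2])               # -> 1
```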

  11. Parallel Algorithm • Use N^3 processors • N processors per element M^n[i,j] • N^2 processors to compute all elements of M^n • Group of N processors for M^n[i,j] performs an associative reduce ⇒ O(log(N)) time • Still takes log(N) steps to compute M^N • O(log^2(N)) with N^3 processors ⇒ in NC • [this construct may be weak?] • (a sketch of one squaring step follows below)
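A sequential simulation of one squaring step from this slide (illustrative names, not the lecture's code): the N conceptual processors for entry (i,j) form the N products and OR them in a log-depth tree.

```python
# Log-depth OR tree: what a group of N processors does for one matrix entry.
def or_tree(bits):
    bits = list(bits)
    while len(bits) > 1:                         # O(log N) parallel rounds
        nxt = [bits[t] | bits[t + 1] for t in range(0, len(bits) - 1, 2)]
        if len(bits) % 2:
            nxt.append(bits[-1])                 # odd element rides along
        bits = nxt
    return bits[0]

# One squaring step; conceptually one processor per (i, j, k) product.
def square_step(M):
    n = len(M)
    return [[or_tree([M[i][k] & M[k][j] for k in range(n)]) for j in range(n)]
            for i in range(n)]
```

Applying square_step about log2(N) times (with self-loops added first) gives the closure; with the trees for all N^2 entries running concurrently, that is the O(log^2(N)) bound with N^3 processors.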

  12. All Pairs Shortest Paths • Given a graph G=(V,E) • Edge weight on each edge e∈E • Compute G’=(V,E’) • E’ contains an edge e’=(Vi,Vj) • iff there is a path from Vi to Vj in G • Edge weight is the length of the shortest path from Vi to Vj in G • All Pairs Shortest Paths ∈ NC • Slight modification of transitive closure

  13. Basic Sequential Algorithm • As before • N=|V| • Think of M = the N×N connectivity matrix for G • M^2 corresponds to G^2 • Change • OR to MIN • & to + • So • M^2[i,j]=OR(over all k)(M[i,k] & M[k,j]) • Becomes: M^2[i,j]=MIN(over all k)(M[i,k] + M[k,j]) • M^N represents G^N=G’ • (a sketch follows below)
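The same squaring skeleton with OR→MIN and &→+, as a hedged Python sketch (missing edges are weight infinity, and the diagonal is 0):

```python
import math

# Min-plus repeated squaring: after ~log2(N) squarings, D[i][j] is the
# shortest-path weight from i to j.
def all_pairs_shortest_paths(W):
    n = len(W)
    D = [[0 if i == j else W[i][j] for j in range(n)] for i in range(n)]
    steps = 1
    while steps < n:
        D = [[min(D[i][k] + D[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        steps *= 2
    return D

# toy use: 0 -> 1 costs 3, 1 -> 2 costs 4, so the 0 -> 2 shortest path is 7
INF = math.inf
W = [[0, 3, INF],
     [INF, 0, 4],
     [INF, INF, 0]]
print(all_pairs_shortest_paths(W)[0][2])         # -> 7
```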

  14. (Same) Parallel Algorithm • Use N^3 processors • N processors per element M^n[i,j] • N^2 processors to compute all elements of M^n • Group of N processors for M^n[i,j] performs an associative reduce ⇒ O(log(N)) time • Still takes log(N) steps to compute M^N • O(log^2(N)) with N^3 processors ⇒ in NC • [this construct may be weak?]

  15. NL • Complexity class • Problems that can be decided using logarithmic space on a Non-Deterministic Turing Machine • Similarly L • logspace on a Deterministic TM • Addition ∈ L • Certainly: L ⊆ NL

  16. NL ⊆ NC • Theorem from Borodin: • If A is accepted by an NDTM using space S(n) ≥ log₂(n), • then there is a d>0 such that: DEPTH_A(n) ≤ d×S(n)^2 • [Depth here = circuit depth = time] • For NL • S(n)=log₂(n) ⇒ Depth(n) ≤ d×log^2(n)

  17. Borodin Construction (Idea) • The number of machine configurations (states) is bounded • Can construct the graph of all states • This will only take polynomial hardware • Compute transitive closure on the graph • O(log^2(N)) • Use an associative reduce to extract the solution • O(log(N))

  18. Borodin States • What states can the NDTM be in? • At most s^S(N) possible tape contents • s = size of symbol set • Work-tape head in at most S(N) positions • q states for the FSM • N locations for the input-tape head • Total: states = N×q×S(N)×s^S(N) • For S(N)=log(N) • N×q×log(N)×s^log(N) = q×N^(log(s)+1)×log(N) • Number of states polynomial in N • (the key arithmetic step is spelled out below)
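The step that makes this work, why s^log(N) is only polynomial in N, written out (my rendering, not on the slide):

```latex
\[
  s^{\log_2 N} \;=\; \bigl(2^{\log_2 s}\bigr)^{\log_2 N}
             \;=\; \bigl(2^{\log_2 N}\bigr)^{\log_2 s}
             \;=\; N^{\log_2 s},
\]
\[
  \text{so}\quad
  \#\text{states} \;=\; N \cdot q \cdot \log_2 N \cdot s^{\log_2 N}
                 \;=\; q \, N^{\log_2 s + 1} \log_2 N
                 \;=\; N^{O(1)} .
\]
```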

  19. Build Graph • Construct graph with |V| = # states • M[i,j]=1 iff the machine can move from configuration i to configuration j • If Vi is a state that corresponds to the input head being on square k • M[i,j] is “enabled” iff the move from i to j requires input bit k to be 1 and input bit k is 1 • M[i,j] is “enabled” iff the move from i to j requires input bit k to be 0 and input bit k is 0 • Setting up the initial connectivity matrix M is just a set of ANDs • (a sketch of this setup follows below)
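A hedged sketch of this setup (none of these names are from the lecture): `configs` is the enumerated configuration list, `moves(ci, cj)` is a hypothetical helper returning the input bit a one-step move from ci to cj requires (or None if no such move exists), `head_pos` gives the input-head square, and x is the input word.

```python
# Build the initial connectivity matrix M: an edge is a set of ANDs between
# "this move is legal for some input bit value" and "the actual input bit at
# the head position has that value".
def build_config_matrix(configs, moves, head_pos, x):
    n = len(configs)
    M = [[0] * n for _ in range(n)]
    for i in range(n):
        k = head_pos(configs[i])                 # input square configuration i reads
        for j in range(n):
            need = moves(configs[i], configs[j])
            if need is not None and x[k] == need:
                M[i][j] = 1                      # edge enabled by input bit k
    return M
```

Transitive closure of M, followed by an OR-reduce over the accepting configurations, then decides acceptance, exactly as the next two slides describe.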

  20. Transitive Closure • Transitive closure with O(|V|^3) PEs • Still polynomial in N • |V| = N×q×log(N)×s^log(N) = q×N^(log(s)+1)×log(N) • O(|V|^3) ≤ O(N^(3(log(s)+2))) • In log^2 time • O(log^2(|V|)) ≤ O([log(N^(log(s)+2))]^2) • = O([log(s)+2]^2 × log^2(N)) = O(log^2(N))

  21. Extract Result • OR-reduce over the reachable states • Can the TM reach an accepting state? • Therefore: NL ⊆ NC

  22. Converse Holds • Borodin • If A is in DEPTH(S(n)) for S(n) ≥ log(n) • Then A is in DSPACE(S(n)) • Recursive evaluation of gate values • w/ compact stack representation • Specialized for S(n)=log(n) • If A is in NC, then A is in L • NC ⊆ L • Know L ⊆ NL … just showed NL ⊆ NC • NL = NC

  23. Context Free Languages • Can recognize all context-free languages in NC • PDA ∈ NC

  24. P-Complete • There are languages that are P-Complete • i.e. if we could show these were in NC • then we would show NC=P • E.g. TM simulation

  25. Complexity Roundup • In NC: FA, PDA, L, NL • Unknown: P=NC? (P=NL?)

  26. Physical Realism • All rely on reductions in log(N) time • With 3D space and the speed of light • …there are no log(N)-time reductions • Maybe need a notion of 3-space parallelizable? • Run in O(N^(1/3)) time • O(N) processors • Cannot talk to more than O(N) processors in O(N^(1/3)) time

  27. Admin • Friday/Monday:?? • Q: requests – what’s missing? • Project: two things due end of next week • Sequential implementation • Proposed plan of attack
