1 / 69

CS184a: Computer Architecture (Structure and Organization)

CS184a: Computer Architecture (Structure and Organization). Day 19: February 23, 2005 Retime 1: Transformations. Previously. Reviewed Pipelining basic assignments on Saw spatial designs efficient when reuse logic at maximum frequency Interconnect is dominant delay and dominant area

fcabell
Download Presentation

CS184a: Computer Architecture (Structure and Organization)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS184a:Computer Architecture(Structure and Organization) Day 19: February 23, 2005 Retime 1: Transformations

  2. Previously • Reviewed Pipelining • basic assignments on • Saw spatial designs efficient • when reuse logic at maximum frequency • Interconnect is dominant delay • and dominant area • heavy call to reuse to use efficiently

  3. Today • Systematic transformation for retiming • preserve semantics (meaning)

  4. Motivation

  5. Motivation • FPGAs (spatial computing) • run efficiently when all resources reused rapidly • cycle time minimized • “Everything in the right place at the right time.”

  6. Motivating Questions • Can I build a fixed-frequency (fixed clock) programmable architecture? • Can I always make a design run at maximum clock rate? • How do we systematically transform any computation to • Operate on fixed-frequency array? • Coordinate around mandatory registers in design?

  7. Long Paths Slow Could limit cycle Add registers to long distance interconnect At each switch? In the middle of long wires? How justify these registers? Interconnect Retiming

  8. Day 3 Spatial Quadratic • How do we pipeline a design?

  9. Day 3 Pipelined Spatial Quadratic

  10. How do you use?

  11. How do you use? • To compute A*B+C*D+E

  12. Compute • A*B+C*D+E

  13. How Compute? • Yi=Yi-1 xor Xi • With pipelined nand2 gates?

  14. want have

  15. Retiming Algorithm

  16. Task • Move registers to: • Preserve semantics • Minimize path length between registers • i.e. Make path length 1 for maximum throughput or reuse • …while minimizing number of registers required

  17. Simple Example Path Length (L) = 4 Can we do better?

  18. Legal Register Moves • Retiming Lag/Lead

  19. Canonical Graph Representation Separate arc for each path Weight edges by number of registers (weight nodes by delay through node)

  20. Critical Path Length Critical Path: Length of longest path of zero weight nodes Compute in O(|E|) time by levelizing network: Topological sort, push path lengths forward until find register.

  21. Retiming Lag/Lead Retiming: Assign a lag to every vertex weight(e) = weight(e) + lag(head(e))-lag(tail(e))

  22. Valid Retiming • Retiming is valid as long as: • e in graph • weight(e) = weight(e) + lag(head(e))-lag(tail(e))  0 • Assuming original circuit was a valid synchronous circuit, this guarantees: • non-negative register weights on all edges • no travel backward in time :-) • all cycles have strictly positive register counts • propagation delay on each vertex is non-negative (assumed 1 for today)

  23. Retiming Task • Move registers  assign lags to nodes • lags define all locally legal moves • Preserving non-negative edge weights • (previous slide) • guarantees collection of lags remains consistent globally

  24. Retiming Transformation • N.B.: unchanged by retiming • number of registers around a cycle • delay along a cycle • Cycle of length P must have • at least P/c registers on it • to be retimeable to cycle c

  25. Optimal Retiming • There is a retiming of • graph G • w/ clock cycle c • iff G-1/c has no cycles with negative edge weights • G-  subtract a from each edge weight

  26. 1/c Intuition • Want to place a register every c delay units • Each register adds one • Each delay subtracts 1/c • As long as remains more positives than negatives around all cycles • can move registers to accommodate • Captures the regs=P/c constraints

  27. G-1/c

  28. Compute Retiming • Lag(v) = shortest path to I/O in G-1/c • Compute shortest paths in O(|V||E|) • Bellman-Ford • also use to detect negative weight cycles when c too small

  29. Bellman Ford • For I0 to N • ui  (except ui=0 for IO) • For k0 to N • for ei,jE • ui min(ui ,uj+w(ei,j)) • For ei,jE //still updatenegative cycle • if ui >uj+w(ei,j) • cycles detected

  30. Apply to Example

  31. Try c=1

  32. Apply: Find Lags Negative weight cycles? Shortest paths?

  33. Apply: Lags

  34. Apply: Move Registers 1 1 1 1 1 weight(e) = weight(e) + lag(head(e))-lag(tail(e))

  35. Apply: Retimed

  36. Apply: Retimed Design

  37. Revise Example (fanout delay)

  38. Revised: Graph

  39. Revised: Graph

  40. Revised: C=1?

  41. Revised: C=2?

  42. Revised: Lag

  43. 0 -1 0 Revised: Lag Take ceiling to convert to integer lags:

  44. 0 -1 0 Revised: Apply Lag -1

  45. 0 -1 0 Revised: Apply Lag -1 1 1 0 1 1 0 0 1 0 1 1 1 0

  46. Revised: Retimed 1 1 0 1 1 0 1 0 0 1 0 1 1

  47. Pipelining • We can use this retiming to pipeline • Assume we have enough (infinite supply) registers at edge of circuit • Retime them into circuit

  48. C>1 ==> Pipeline

More Related