1 / 57

CS184a: Computer Architecture (Structure and Organization)

CS184a: Computer Architecture (Structure and Organization). Day 17: February 19, 2003 Retime 1: Transformations. Previously. Reviewed Pipelining basic assignments on Saw spatial designs efficient when reuse logic at maximum frequency Interconnect is dominant delay and dominant area

sfrye
Download Presentation

CS184a: Computer Architecture (Structure and Organization)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS184a:Computer Architecture(Structure and Organization) Day 17: February 19, 2003 Retime 1: Transformations

  2. Previously • Reviewed Pipelining • basic assignments on • Saw spatial designs efficient • when reuse logic at maximum frequency • Interconnect is dominant delay • and dominant area • heavy call to reuse to use efficiently

  3. Today • Systematic transformation for retiming • preserve semantics (meaning)

  4. Motivation • FPGAs (spatial computing) • run efficiently when all resources reused rapidly • cycle time minimized • “Everything in the right place at the right time.”

  5. Motivating Questions • Can I build fixed-frequency (fixed clock) programmable architecture? • Can I always make design run at maximum clock rate? • How do we systematically transform any computation to • Operate on fixed-frequency array? • Coordinate around mandatory registers in design?

  6. Task • Move registers to: • Preserve semantics • Minimize path length between registers • i.e. Make path length 1 for maximum throughput or reuse • …while minimizing number of registers required

  7. Simple Example Path Length (L) = 4 Can we do better?

  8. Legal Register Moves • Retiming Lag/Lead

  9. Canonical Graph Representation Separate arc for each path Weight edges by number of registers (weight nodes by delay through node)

  10. Critical Path Length Critical Path: Length of longest path of zero weight nodes Compute in O(|E|) time by levelizing network: Topological sort, push path lengths forward until find register.

  11. Retiming Lag/Lead Retiming: Assign a lag to every vertex weight(e) = weight(e) + lag(head(e))-lag(tail(e))

  12. Valid Retiming • Retiming is valid as long as: • e in graph • weight(e) = weight(e) + lag(head(e))-lag(tail(e))  0 • Assuming original circuit was a valid synchronous circuit, this guarantees: • non-negative register weights on all edges • no travel backward in time :-) • all cycles have strictly positive register counts • propagation delay on each vertex is non-negative (assumed 1 for today)

  13. Retiming Task • Move registers  assign lags to nodes • lags define all locally legal moves • Preserving non-negative edge weights • (previous slide) • guarantees collection of lags remains consistent globally

  14. Retiming Transformation • N.B.: unchanged by retiming • number of registers around a cycle • delay along a cycle • Cycle of length P must have • at least P/c registers on it • to be retimeable to cycle c

  15. Optimal Retiming • There is a retiming of • graph G • w/ clock cycle c • iff G-1/c has no cycles with negative edge weights • G-  subtract a from each edge weight

  16. 1/c Intuition • Want to place a register every c delay units • Each register adds one • Each delay subtracts 1/c • As long as remains more positives than negatives around all cycles • can move registers to accommodate • Captures the regs=P/c constraints

  17. G-1/c

  18. Compute Retiming • Lag(v) = shortest path to I/O in G-1/c • Compute shortest paths in O(|V||E|) • Bellman-Ford • also use to detect negative weight cycles when c too small

  19. Bellman Ford • For I0 to N • ui  (except ui=0 for IO) • For k0 to N • for ei,jE • ui min(ui ,uj+w(ei,j)) • For ei,jE //still updatenegative cycle • if ui >uj+w(ei,j) • cycles detected

  20. Apply to Example

  21. Try c=1

  22. Apply: Find Lags Negative weight cycles? Shortest paths?

  23. Apply: Lags

  24. Apply: Move Registers 1 1 1 1 1 weight(e) = weight(e) + lag(head(e))-lag(tail(e))

  25. Apply: Retimed

  26. Apply: Retimed Design

  27. Revise Example (fanout delay)

  28. Revised: Graph

  29. Revised: Graph

  30. Revised: C=1?

  31. Revised: C=2?

  32. Revised: Lag

  33. 0 -1 0 Revised: Lag Take ceiling to convert to integer lags:

  34. 0 -1 0 Revised: Apply Lag -1

  35. 0 -1 0 Revised: Apply Lag -1 1 1 0 1 1 0 0 1 0 1 1 1 0

  36. Revised: Retimed 1 1 0 1 1 0 1 0 0 1 0 1 1

  37. Pipelining • We can use this retiming to pipeline • Assume we have enough (infinite supply) registers at edge of circuit • Retime them into circuit

  38. C>1 ==> Pipeline

  39. Add Registers

  40. Pipeline Retiming: Lag

  41. Pipelined Retimed

  42. Real Cycle

  43. Real Cycle

  44. Cycle C=1?

  45. Cycle C=2?

  46. Cycle: C-slow Cycle=c  C-slow network has Cycle=1

  47. 2-slow Cycle  C=1

  48. 2-Slow Lags

  49. 2-Slow Retime

  50. Retimed 2-Slow Cycle

More Related