
CS137: Electronic Design Automation

CS137: Electronic Design Automation. Day 8: January 27, 2006 Cellular Placement. Problem Parallelism Cellular Automata Idea Details Avoid Local Minima Update locations Results Directions. Primary Sources Wrighton&DeHon FPGA2003 Wrighton MS Thesis 2003. Today. Placement.


Presentation Transcript


  1. CS137: Electronic Design Automation • Day 8: January 27, 2006 • Cellular Placement

  2. Today • Problem • Parallelism • Cellular Automata • Idea • Details • Avoid Local Minima • Update locations • Results • Directions • Primary Sources • Wrighton & DeHon, FPGA 2003 • Wrighton, MS Thesis 2003

  3. Placement • Problem: Pick locations for all building blocks • minimizing energy, delay, area • really: • minimize wire length • minimize channel density • surrogates: • Minimizing squared wire length • Minimize bounding box
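The two surrogate costs named on this slide can be sketched in a few lines. This is a minimal illustration, not the lecture's code: the net representation (a list of pin names with the source first), the `loc` dictionary, and both function names are assumptions made for the example.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]

def squared_wire_length(net: List[str], loc: Dict[str, Point]) -> int:
    """Sum of squared distances from the net's source (first pin) to each sink."""
    sx, sy = loc[net[0]]
    return sum((sx - loc[p][0]) ** 2 + (sy - loc[p][1]) ** 2 for p in net[1:])

def half_perimeter(net: List[str], loc: Dict[str, Point]) -> int:
    """Half-perimeter of the bounding box that encloses every pin of the net."""
    xs = [loc[p][0] for p in net]
    ys = [loc[p][1] for p in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

# Hypothetical three-pin net: source "a", sinks "b" and "c".
loc = {"a": (0, 0), "b": (3, 1), "c": (1, 4)}
net = ["a", "b", "c"]
print(squared_wire_length(net, loc), half_perimeter(net, loc))
```

Both functions are cheap to evaluate per net, which is what makes them usable as stand-ins for the harder wire-length and channel-density objectives.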

  4. Parallelism • What parallelism exists in placement? • Evaluate costs of prospective moves • One set to many prospective locations • Many moves each to single location • Perform moves

  5. Cellular Automata • Basic idea: regular array of identical cells with nearest-neighbor communication

  6. CA Model • On each cycle: • Each cell exchanges values with neighbors • Updates state/value based on own state and that of neighbors • E.g. Conway’s LIFE
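As a concrete instance of the cycle described here, the following sketch performs one synchronous step of Conway's LIFE. The wrap-around (torus) boundary is an assumption chosen to keep the example short; the slide does not specify boundary handling.

```python
from typing import List

def life_step(grid: List[List[int]]) -> List[List[int]]:
    """One synchronous update: every cell reads its 8 neighbors, then all update."""
    rows, cols = len(grid), len(grid[0])

    def live_neighbors(r: int, c: int) -> int:
        # Assumed torus boundary: indices wrap around the array edges.
        return sum(grid[(r + dr) % rows][(c + dc) % cols]
                   for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0))

    nxt = []
    for r in range(rows):
        row = []
        for c in range(cols):
            n = live_neighbors(r, c)
            # Birth on exactly 3 neighbors; survival on 2 or 3.
            row.append(1 if n == 3 or (grid[r][c] == 1 and n == 2) else 0)
        nxt.append(row)
    return nxt

# A vertical "blinker" turns horizontal after one step and back after two.
blinker = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
```

Each new value depends only on the old values of the cell and its neighbors, which is the property the placement scheme relies on.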

  7. Cellular Automata • Physical Advantage: • No long wires • Area linear in number of nodes • Minimum delay → small cycle time • Good scaling properties

  8. System Architecture Taxonomy (Subject to continuing refinement and embellishment)

  9. CA Placement • Can we perform placement in a CA?

  10. Mapping • Each cell is a physical placement location • State is a logical node assigned to the cell • Assume: • Cell knows own location • State knows location of connected nodes

  11. Costs • Assume: • Cell knows own location • State knows location of connected nodes • Cell computes: its cost at that location

  12. Moves • Two adjacent cells can exchange graph nodes

  13. Moves • Evaluate goodness of proposed swap • Each cell considers impact of its graph node being in the other cell • Keep if swap reduces cost

  14. Move Costs • Only really need to evaluate delta cost • Cost: (src.x - sink.x)^2 • Moving sink: d/d(sink.x) = -2 (src.x - sink.x) • Delta move cost is linear in distance
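A small check of the delta-cost claim, assuming a one-dimensional squared-distance cost between a source and a single sink. The function names are illustrative only. Expanding (d - step)^2 - d^2 with d = src.x - sink.x gives -2·d·step + step^2, which is linear in the distance d for a fixed unit step.

```python
def sq_cost(src_x: int, sink_x: int) -> int:
    """Squared wire length along x between a source and one sink."""
    return (src_x - sink_x) ** 2

def delta_move_sink(src_x: int, sink_x: int, step: int) -> int:
    """Exact cost change when the sink moves by `step` cells (+1 or -1 for a neighbor swap)."""
    d = src_x - sink_x
    # (d - step)^2 - d^2 = -2*d*step + step^2
    return -2 * d * step + step * step

# The closed form agrees with recomputing the cost from scratch.
print(delta_move_sink(5, 2, 1), sq_cost(5, 3) - sq_cost(5, 2))
```

A cell therefore needs only the signed distance to each connected node, not the full cost, to judge a one-step move.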

  15. Parallel Swaps • Pair up and perform N/2 swaps in parallel

  16. Movement • Alternate pairings with N, S, E, W neighbors → can move in any direction

  17. Basic Idea • Pair up PEs • Compute impact of swaps in parallel • Perform swaps in parallel • Repeat until converge
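The loop on this slide can be sketched in software as follows. This is a sequential simulation under stated assumptions, not the hardware design: every cell holds exactly one node, nets are two-pin wires listed in `nbrs`, cost is squared wire length, and the four pairing phases (east/west even, east/west odd, north/south even, north/south odd) are one plausible way to alternate neighbors. All names are hypothetical.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]

def pairings(rows: int, cols: int, phase: int) -> List[Tuple[Cell, Cell]]:
    """Disjoint adjacent cell pairs for phase 0..3 (E/W even, E/W odd, N/S even, N/S odd)."""
    offset = phase % 2
    if phase < 2:  # pair each selected cell with its east neighbor
        return [((r, c), (r, c + 1)) for r in range(rows) for c in range(offset, cols - 1, 2)]
    return [((r, c), (r + 1, c)) for r in range(offset, rows - 1, 2) for c in range(cols)]

def swap_delta(na, a, nb, b, pos, nbrs) -> int:
    """Change in squared wire length if node na (at a) and node nb (at b) trade cells."""
    def cost(node, cell, partner):
        # The na-nb wire keeps the same length under a swap, so it is skipped.
        return sum((cell[0] - pos[m][0]) ** 2 + (cell[1] - pos[m][1]) ** 2
                   for m in nbrs.get(node, ()) if m != partner)
    return cost(na, b, nb) + cost(nb, a, na) - cost(na, a, nb) - cost(nb, b, na)

def swap_phase(grid: Dict[Cell, str], nbrs, rows: int, cols: int, phase: int) -> Dict[Cell, str]:
    """Evaluate every pair against one snapshot, then apply the accepted swaps together."""
    pos = {node: cell for cell, node in grid.items()}
    new_grid = dict(grid)
    for a, b in pairings(rows, cols, phase):
        if swap_delta(grid[a], a, grid[b], b, pos, nbrs) < 0:  # greedy: keep only improvements
            new_grid[a], new_grid[b] = grid[b], grid[a]
    return new_grid

# 1x4 example: wires a-b and c-d, initial order a c b d.
grid = {(0, 0): "a", (0, 1): "c", (0, 2): "b", (0, 3): "d"}
nbrs = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
after = swap_phase(grid, nbrs, rows=1, cols=4, phase=1)
print([after[(0, c)] for c in range(4)])
```

Because all pairs decide from the same snapshot, two simultaneous swaps that each look beneficial can interact; the sketch does not guard against that.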

  18. Problems/Details • Greedy swaps → local minima? • How to update locations of neighbors? • …they are moving, too

  19. Avoid Greedy • Insert randomness in swaps • → Simulated Annealing • Shake up system to get out of local minima • Swap if • Randomly decide to swap • OR beneficial to swap • Change swap thresholds over time
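The swap rule on this slide can be written as a two-line predicate plus a schedule. The linear schedule and its default endpoints below are assumptions for illustration; the slide only says the thresholds change over time.

```python
import random

def should_swap(delta: float, p_random: float, rng: random.Random) -> bool:
    """Swap if it reduces cost, OR if a random draw falls under the current threshold."""
    return delta < 0 or rng.random() < p_random

def swap_probability(step: int, total_steps: int,
                     p_start: float = 0.5, p_end: float = 0.0) -> float:
    """Assumed linear schedule: random-swap probability falls from p_start to p_end."""
    return p_start + (p_end - p_start) * step / total_steps

rng = random.Random(0)
# Early on, a cost-increasing swap is sometimes accepted; at the end it never is.
early = should_swap(delta=3, p_random=swap_probability(0, 100), rng=rng)
late = should_swap(delta=3, p_random=swap_probability(100, 100), rng=rng)
```

With `p_random` at zero the rule reduces to the greedy swap, so the schedule controls how much the system is shaken up.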

  20. Swap?

  21. Impact of Randomness

  22. Range Limiting • Eguro, Hauck, & Sharma, DAC 2005

  23. Local Swaps Only • Assume there’s an ideal location • Each node takes a biased Random Walk away from minimum cost location • Gives node a distribution function around the minimum cost location • If wander into a better “minimum cost” home, then wanders around new centerpoint • Decreasing temperature restricts effective radius of walk

  24. Local Swap Random Walk • Decreasing temperature restricts effective radius of walk

  25. How update locations? • Broadcast? • Pipelined Ring? • Send to neighbors? • Routing network? • Tree? • For whom? • Everyone? Only things moved? Only things moved a lot?

  26. Simple Solution: Ring • Drop value in ring • Shift around entire array • Everyone listens for updates

  27. Simple Solution: Ring • Weakness? • Serial • N cycles to complete • N/2 swaps in O(1) • Then O(N) to update?

  28. Simple Solution: Ring • Linear update bad • Idea: allow staleness • Things move slowly • Estimate of position not that bad… • …and continued operation will correct…
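A toy model of the stale-update idea, under a strong simplification: one node's location becomes visible to all listeners per cycle in round-robin order, rather than shifting cell by cell around a physical ring. The class and its fields are hypothetical. It shows only that any cached position is at most N cycles old.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]

class RingUpdater:
    """Round-robin broadcast: one node's location becomes visible per cycle."""

    def __init__(self, true_pos: Dict[str, Cell]) -> None:
        self.order = sorted(true_pos)   # fixed slot order around the ring
        self.cache = dict(true_pos)     # what every listener currently believes
        self.slot = 0

    def tick(self, true_pos: Dict[str, Cell]) -> None:
        """Advance one cycle: refresh the cached location of the node owning this slot."""
        node = self.order[self.slot]
        self.cache[node] = true_pos[node]
        self.slot = (self.slot + 1) % len(self.order)

true_pos = {"a": (0, 0), "b": (0, 1), "c": (1, 0), "d": (1, 1)}
ring = RingUpdater(true_pos)
true_pos["c"], true_pos["d"] = true_pos["d"], true_pos["c"]  # a swap happens
ring.tick(true_pos)        # slot "a": listeners still hold the old location of "c"
stale = ring.cache["c"]
for _ in range(3):
    ring.tick(true_pos)    # after N = 4 cycles in total, every entry is fresh again
```

Swaps can keep running between ticks; cost estimates then use `ring.cache`, which lags the true positions by a bounded amount.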

  29. Algorithm

  30. Algorithm Update Locations

  31. Algorithm Try Moves

  32. Quality vs. Parameters

  33. Iso-Quality • Pick point on iso-quality curve that minimizes time

  34. FPGA Implementation • Virtex E (180nm) • 10ns cycle (100MHz) • 150 cycles for 4-phase swap • (~40 cycles/swap) • 400 LUTs / Placement Engine • Comparing • 2.2GHz Intel Xeon (L2 512KB)

  35. Results

  36. Tuning Quality

  37. Scaling • Processor cycles: O(N^(4/3)) • VPR • Systolic cycles: • O(N^(1/2)) – assume geometric refinement; O(N^(1/2)) update • O(N^(5/6)) – mesh sort, same number of swaps as VPR (N^(4/3) / N^(1/2))
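The exponent arithmetic behind the mesh-sort entry, spelled out. This only checks the algebra of the quotient quoted on the slide; it does not derive the underlying swap or update counts.

```latex
\frac{N^{4/3}}{N^{1/2}}
  = N^{\frac{4}{3}-\frac{1}{2}}
  = N^{\frac{8}{6}-\frac{3}{6}}
  = N^{5/6}
```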

  38. Scaling • Also includes technology scaling

  39. Variations • Update Schemes • Cost Functions • Larger bins than PEs

  40. Update Scheme: Tree • Build Reduce Tree (H-Tree) • Route to root in O(N^(1/2)) time • Route from root to leaves in O(N^(1/2)) time • Pipeline • Same bandwidth as Ring (1/cycle) • But less staleness (only O(N^(1/2)))

  41. Reducing Broadcast (Idea 1) • Don’t update things that haven’t moved (much) • …or things that move and move back before broadcast • Keep track of staleness • How far moved from last broadcast • Give priority to stalest data • Max staleness wins at each tree stage • Break ties with randomness
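The priority rule on this slide amounts to a pairwise tournament up the reduce tree. The sketch below assumes each candidate is a `(staleness, node)` tuple, where staleness is how far the node has moved since its last broadcast; the function name and data layout are illustrative, and the candidate list is assumed non-empty.

```python
import random
from typing import List, Tuple

Candidate = Tuple[int, str]  # (distance moved since last broadcast, node name)

def reduce_stalest(candidates: List[Candidate], rng: random.Random) -> Candidate:
    """Pairwise tournament: at each tree stage the staler entry wins, ties broken randomly."""
    level = list(candidates)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            a, b = level[i], level[i + 1]
            if a[0] != b[0]:
                nxt.append(a if a[0] > b[0] else b)
            else:
                nxt.append(rng.choice((a, b)))
        if len(level) % 2:          # an unpaired entry advances to the next stage
            nxt.append(level[-1])
        level = nxt
    return level[0]

rng = random.Random(0)
winner = reduce_stalest([(0, "a"), (5, "b"), (1, "c")], rng)
print(winner)
```

Nodes that moved and then moved back report zero staleness, so they lose every stage to anything that has actually drifted.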

  42. Reducing Broadcast (Idea 2) • Update locally • Don’t need to know if someone far away moved by 1 square • …but need to know if near neighbor did • Multigrid/multiscale scheme • Only alert nodes in same subtree • When change subtrees at a level, alert all nodes underneath

  43. Update Scheme: Mesh Route • Can route a permutation in O(N^(1/2)) time on a mesh • Build mesh switching • Make O(N) swaps • Then take O(N^(1/2)) time moving/updating • Becomes full simulated annealing • i.e. not just local swaps

  44. Cost Functions

  45. Cost Functions • Bounding Box → 2-phase update • Phase 1: alert source to location of all sinks • Phase 2: source communicates bbox extents to all sinks

  46. Timing • Linear Update: • Topological ordering of netlist • Use tree to distribute updates • Send updates in netlist order • get delay in one pass • Mesh: • Compute directly with dataflow-style spreading activation • Wait for all inputs; then send output
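The one-pass delay computation under the linear-update scheme can be sketched with the standard-library `graphlib` module (Python 3.9 or later). The netlist encoding (`preds` maps each node to its fan-in set) and the per-node `delay` table are assumptions for the example, and wire delay is ignored.

```python
from graphlib import TopologicalSorter
from typing import Dict, Set

def arrival_times(preds: Dict[str, Set[str]], delay: Dict[str, int]) -> Dict[str, int]:
    """Visit nodes in topological order so every fan-in is final before it is read."""
    arrival: Dict[str, int] = {}
    for node in TopologicalSorter(preds).static_order():
        latest_input = max((arrival[p] for p in preds.get(node, ())), default=0)
        arrival[node] = latest_input + delay[node]
    return arrival

# Hypothetical netlist: a and b feed c, which feeds d.
preds = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"c"}}
delay = {"a": 1, "b": 2, "c": 3, "d": 1}
print(arrival_times(preds, delay))
```

The dataflow variant on the mesh computes the same values; a node simply fires once all of its inputs have arrived instead of following a precomputed order.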

  47. Bins

  48. Node Bins • Keep more than one graph node per PE • Local swap of one node from each PE node set each step • One with largest benefit? • Randomly select based on cost/benefit? • Like rejectionless annealing

  49. Admin • Parallel Prefix familiarity? • Due today: literature review • There is class on Monday
