
QUIZ

QUIZ. 1) According to the study on “Simultaneous Timing Driven Clustering and Placement for FPGAs”, what is a fragment level move and which drawbacks of the traditional FPGA CAD flow are targeted with the fragment level moves? BSPlace: A BLE Swapping technique for placement.


Presentation Transcript


  1. QUIZ

  2. 1) According to the study on “Simultaneous Timing Driven Clustering and Placement for FPGAs”, what is a fragment level move and which drawbacks of the traditional FPGA CAD flow are targeted with the fragment level moves?  Question

  3. BSPlace: A BLE Swapping technique for placement 04.11.2014 Minsik Hong, George Hwang, Hemayamini Kurra, Minjun Seo

  4. SCPlace • Introduction • Algorithm flowchart • Net Counting Algorithm • Results • BSPlace • Algorithm • Demo • Backup Slides • If you guys ask minimal questions we can cover more • Net Weighting • VPR Datastructures Outline

  5. Rajavel, Senthilkumar Thoravi, and Ali Akoglu. "MO-Pack: Many-objective clustering for FPGA CAD." Proceedings of the 48th Design Automation Conference. ACM, 2011.

  6. Simultaneous timing driven clustering and placement for FPGAs. Chen, Gang, and Jason Cong. Field Programmable Logic and Application. Springer Berlin Heidelberg, 2004. 158-167.

  7. Fragment level move • BLE to a new CLB • Check for valid CLB configuration • Feasibility (number of BLEs and input pins) • Update the cost function • Block level move • CLB to CLB Key concept
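A minimal sketch of the feasibility check behind a fragment level move, assuming a simplified cluster model: a CLB holds at most N BLEs and has I input pins, and a net only needs an input pin when no BLE inside the cluster drives it. The class names, field names, and helper functions are hypothetical and are not VPR data structures; clocks and other special pins are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class BLE:
    name: str
    inputs: frozenset  # names of the nets feeding this BLE
    output: str        # name of the net this BLE drives


@dataclass
class CLB:
    name: str
    bles: list = field(default_factory=list)


def external_inputs(bles):
    """Nets the cluster must bring in through its input pins."""
    produced = {b.output for b in bles}
    used = set().union(*(b.inputs for b in bles))
    return used - produced


def can_accept(clb, ble, N, I):
    """Feasibility: BLE count within N and external inputs within I."""
    candidate = clb.bles + [ble]
    return len(candidate) <= N and len(external_inputs(candidate)) <= I


def move_fragment(src, dst, ble, N, I):
    """Move one BLE from src to dst if the result is a valid CLB."""
    if ble not in src.bles or not can_accept(dst, ble, N, I):
        return False
    src.bles.remove(ble)
    dst.bles.append(ble)
    return True


# Hypothetical netlist, with N = 2 BLEs and I = 3 input pins per CLB.
N, I = 2, 3
b1 = BLE("b1", frozenset({"n1", "n2"}), "n5")
b2 = BLE("b2", frozenset({"n3", "n5"}), "n6")
b3 = BLE("b3", frozenset({"n4", "n7"}), "n8")
clb0, clb1, clb2 = CLB("clb0", [b1]), CLB("clb1", [b2]), CLB("clb2", [b3])

moved = move_fragment(clb1, clb0, b2, N, I)     # fits: 2 BLEs, 3 external inputs
rejected = move_fragment(clb2, clb0, b3, N, I)  # would exceed N, so it is refused
```

In a placer, the cost function update listed on the slide would follow a successful move; that step is left out here.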

  8. Advantages • Fix packing issues during simulated annealing • Better congestion mitigation • Better routability • Disadvantages • Speed • Complexity BLE Level Swapping

  9. SCPlace Algorithm

  10. Additional feature of Journal version SCPlace

  11. Use Novel net weighting

  12. A novel net weighting algorithm for timing-driven placement Kong, Tim Tianming. Proceedings of the 2002 IEEE/ACM international conference on Computer-aided design. ACM, 2002.

  13. Accurate All Path Counting

  14. Calculate F(t) (a = 2; T: the longest path delay = 13) • Edge delays: a→c = 7, b→c = 5, c→d = 1, d→e = 5, d→f = 3 • ARR/REQ: a = 0/0, b = 0/2, c = 7/7, d = 8/8, e = 13/13, f = 11/13 • Fs(a, c) = 7 – 0 – 7 = 0 • Fs(b, c) = 7 – 0 – 5 = 2 • D{Fs(a, c), T} = D{0,13} = 1 • D{Fs(b, c), T} = D{2,13} = 0.88 • D{Fs(c, d), T} = D{0,13} = 1 • D{Fs(d, e), T} = D{0,13} = 1 • D{Fs(d, f), T} = D{0,13} = 1 • F(c) = F(c) + D{Fs(a, c), T} x F(a) + D{Fs(b, c), T} x F(b) = 0 + 1x1 + 0.88x1 = 1.88 • Result: F(a) = 1, F(b) = 1, F(c) = F(d) = F(e) = F(f) = 1.88

  15. Calculate B(s) (a = 2; T: the longest path delay = 13) • Edge delays: a→c = 7, b→c = 5, c→d = 1, d→e = 5, d→f = 3 • ARR/REQ: a = 0/0, b = 0/2, c = 7/7, d = 8/8, e = 13/13, f = 11/13 • Bs(d, e) = 13 – 5 – 8 = 0 • Bs(d, f) = 13 – 3 – 8 = 2 • D{Bs(a, c), T} = D{0,13} = 1 • D{Bs(b, c), T} = D{0,13} = 1 • D{Bs(c, d), T} = D{0,13} = 1 • D{Bs(d, e), T} = D{0,13} = 1 • D{Bs(d, f), T} = D{2,13} = 0.88 • B(d) = B(d) + D{Bs(d, e), T} x B(e) + D{Bs(d, f), T} x B(f) = 0 + 1x1 + 0.88x1 = 1.88 • Result: B(e) = 1, B(f) = 1, B(a) = B(b) = B(c) = B(d) = 1.88

  16. Calculate AP(s, t) (a = 2) • F(s)/B(t) per node: a = 1/1.88, b = 1/1.88, c = 1.88/1.88, d = 1.88/1.88, e = 1.88/1, f = 1.88/1 • D{slack(a, c), T} = D{0,13} = 1 • D{slack(b, c), T} = D{2,13} = 0.88 • D{slack(c, d), T} = D{0,13} = 1 • D{slack(d, e), T} = D{0,13} = 1 • D{slack(d, f), T} = D{2,13} = 0.88 • AP(a, c) = F(a) x B(c) x D{slack(a, c), T} = 1 x 1.88 x 1 = 1.88 • AP(b, c) = F(b) x B(c) x D{slack(b, c), T} = 1 x 1.88 x 0.88 = 1.65 • Result: AP(a, c) = 1.88, AP(b, c) = 1.65, AP(c, d) = 3.53, AP(d, e) = 1.88, AP(d, f) = 1.65
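The F(t), B(s), and AP(s, t) calculations on the example circuit can be reproduced with a short sketch. The closed form of the discount function D is not given in this transcript, so the sketch only looks up the two values the slides state for a = 2 (D{0,13} = 1 and D{2,13} = 0.88). The function and variable names are illustrative, and ARR, REQ, and the edge delays are copied from the example.

```python
# Example circuit from the slides: (source, sink) -> edge delay.
EDGES = {("a", "c"): 7, ("b", "c"): 5, ("c", "d"): 1, ("d", "e"): 5, ("d", "f"): 3}
ARR = {"a": 0, "b": 0, "c": 7, "d": 8, "e": 13, "f": 11}
REQ = {"a": 0, "b": 2, "c": 7, "d": 8, "e": 13, "f": 13}
ORDER = ["a", "b", "c", "d", "e", "f"]  # a topological order
T = 13                                   # longest path delay

# Assumption: only the discount values quoted on the slides (a = 2) are known.
DISCOUNT_TABLE = {(0, 13): 1.0, (2, 13): 0.88}


def discount(x, y):
    return DISCOUNT_TABLE[(x, y)]


def all_path_counts(edges, arr, req, order, T, D):
    has_pred = {t for _, t in edges}
    has_succ = {s for s, _ in edges}
    F = {v: 0.0 if v in has_pred else 1.0 for v in order}  # primary inputs start at 1
    B = {v: 0.0 if v in has_succ else 1.0 for v in order}  # primary outputs start at 1
    for t in order:  # forward pass, discounted by Fs(s, t)
        for (s, sink), d in edges.items():
            if sink == t:
                F[t] += D(arr[t] - arr[s] - d, T) * F[s]
    for s in reversed(order):  # backward pass, discounted by Bs(s, t)
        for (src, t), d in edges.items():
            if src == s:
                B[s] += D(req[t] - req[s] - d, T) * B[t]
    # AP(s, t) = F(s) x B(t) x D{slack(s, t), T}
    AP = {(s, t): F[s] * B[t] * D(req[t] - arr[s] - d, T)
          for (s, t), d in edges.items()}
    return F, B, AP


F, B, AP = all_path_counts(EDGES, ARR, REQ, ORDER, T, discount)
for edge, value in AP.items():
    print(edge, round(value, 2))
```

Passing D in as a parameter keeps the traversal independent of the discount function, so the actual function from the paper could be substituted without touching the rest.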

  17. Results (Only use BLE swapping) CLB = 4

  18. Results (Only use BLE swapping)

  19. Results (BLE + CLB swapping) where 0 ≤ α ≤ 1 The number of CLB moves: The number of BLE moves:

  20. Results (BLE + CLB swapping) T-Vpack+VPR vs SCPlace (α=0.5)

  21. BSPlace

  22. BLE Level Swapping within Simulated Annealing with Rent’s Rule • Advantages • Fix packing issues as they occur. • Potentially better routability. • Potentially better congestion due to combination of placement and packing. • Disadvantages • Execution time – We need to do memory allocation and deallocation for every BLE swap. • Code Complexity – VPR is complex. We spend a lot of time on debugging and testing instead of on algorithms. BSPlace

  23. Calculate the k value to get the threshold • Enter the simulated annealing process • Outer loop process • Inner loop process • Choose a random CLB to move from its current position to another position • Check the Rent’s Rule threshold • If the swap gives a better result • Queue a BLE swap • Otherwise • Do a CLB swap: use T-VPlace • Loop through the queued BLE swaps • Do each BLE swap after checking whether it overlaps with a previous swap • Reallocate memory and return to the outer loop Rent’s Rule Threshold Value
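The step that drains the queue of BLE swaps can be sketched as follows. This is only an illustration under stated assumptions: a swap is taken to "overlap" a previous one when it touches a BLE that an earlier swap in the same batch already moved, and the placement is modelled as a plain dictionary from BLE name to CLB name. Neither the function name nor the data layout comes from the BSPlace code or from VPR.

```python
def apply_queued_swaps(ble_to_clb, queue):
    """Apply queued BLE swaps in order, skipping overlapping ones.

    ble_to_clb: dict mapping BLE name -> CLB name (modified in place).
    queue: list of (ble_a, ble_b) pairs collected in the inner loop.
    Returns the list of swaps that were actually applied.
    """
    touched = set()  # BLEs already moved by an earlier swap in this batch
    applied = []
    for a, b in queue:
        if a in touched or b in touched:
            continue  # assumed meaning of "overlaps with previous swap"
        if ble_to_clb[a] == ble_to_clb[b]:
            continue  # both BLEs sit in the same CLB, nothing to exchange
        ble_to_clb[a], ble_to_clb[b] = ble_to_clb[b], ble_to_clb[a]
        touched.update((a, b))
        applied.append((a, b))
    return applied


# Hypothetical batch: the second swap reuses b1, so it is skipped.
placement = {"b1": "clb0", "b2": "clb1", "b3": "clb2", "b4": "clb3"}
done = apply_queued_swaps(placement, [("b1", "b2"), ("b1", "b3"), ("b3", "b4")])
```

Deferring the swaps to the end of the inner loop, as the slide describes, means the memory reallocation happens once per batch instead of once per accepted move.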

  24. Code • Created our own BLE swapping mechanism using VPR data structures. • We have a whole suite of test fixtures to test the code. • Testing is still continuing, but we are finding minimal issues. • We have done a swap within placement. • We have started to integrate our cost function. • Validation • We intend to run VPR benchmarks. Our BLE swapping solution should be better than or the same as T-VPlace. • Our VPR benchmarks should also be comparable to IRAC. Current Status

  25. Demo The circuit below abstracts the MUX, switchboxes, and connection boxes. The connections represent the direct connections between BLEs in CLBs. Optimize this circuit by performing one BLE swap. Explain why your optimization will result in better performance. Architecture Parameter K = 2 I = 3 N = 2 Measurement Critical Path Delay = 1.182 ns

  26. http://www.screenr.com/gJdN Demo

  27. Demo

  28. Thanks.

  29. Backup Slides

  30. Impact of duplication on placement Delay = 2 Delay = 1

  31. A novel net weighting algorithm for timing-driven placement Kong, Tim Tianming. Proceedings of the 2002 IEEE/ACM international conference on Computer-aided design. ACM, 2002.

  32. Accurate path counting algorithm • The first known accurate path counting algorithm that considers all paths • Due to the exponential number of paths present in a circuit, accurate all path counting has been considered very difficult. • Significant performance improvement • Little loss in total wirelength • No runtime overhead A Novel Net Weighting Algorithm

  33. Consider the path sharing effect • If two critical paths share a common segment, the edges in the common segment should receive higher weights. • Define two variables • Forward path F(p) – the number of different critical paths starting from PI elements, terminating at p. • Backward path B(p) – the number of different critical paths starting from PO elements, terminating at p, if we reverse all signal flow directions. A Novel Net Weighting Algorithm

  34. Background

  35. Background

  36. Example Timing of a circuit • Edge delays: a→c = 7, b→c = 5, c→d = 1, d→e = 5, d→f = 3 • The longest path delay (T) = 13 • ARR(t): a = 0, b = 0, c = 7, d = 8, e = 13, f = 11 • REQ(s): a = 0, b = 2, c = 7, d = 8, e = 13, f = 13

  37. Example • ARR/REQ: a = 0/0, b = 0/2, c = 7/7, d = 8/8, e = 13/13, f = 11/13 • Slack(s, t): (a, c) = 0, (b, c) = 2, (c, d) = 0, (d, e) = 0, (d, f) = 2

  38. Example • Path a→c→d→e: d(π) = 13, slack(π) = 0 • Path b→c→d→f: d(π) = 9, slack(π) = 4 • Path a→c→d→f: d(π) = 11, slack(π) = 2 • Path b→c→d→e: d(π) = 11, slack(π) = 2
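The ARR, REQ, and slack values of the example circuit follow from a forward and a backward pass over the graph. The sketch below recomputes them from the edge delays alone, using the usual edge slack definition slack(s, t) = REQ(t) – ARR(s) – d(s, t); the function name and the dictionary layout are illustrative choices.

```python
# Edge delays of the example circuit: (source, sink) -> delay.
EDGES = {("a", "c"): 7, ("b", "c"): 5, ("c", "d"): 1, ("d", "e"): 5, ("d", "f"): 3}
ORDER = ["a", "b", "c", "d", "e", "f"]  # a topological order of the nodes


def analyze_timing(edges, order):
    """Return (ARR, REQ, T, slack) for a DAG given in topological order."""
    arr = {v: 0 for v in order}
    for t in order:  # forward pass: latest arrival time at each node
        for (s, sink), delay in edges.items():
            if sink == t:
                arr[t] = max(arr[t], arr[s] + delay)
    T = max(arr.values())  # longest path delay
    req = {v: T for v in order}
    for s in reversed(order):  # backward pass: earliest required time
        for (src, t), delay in edges.items():
            if src == s:
                req[s] = min(req[s], req[t] - delay)
    slack = {(s, t): req[t] - arr[s] - d for (s, t), d in edges.items()}
    return arr, req, T, slack


arr, req, T, slack = analyze_timing(EDGES, ORDER)
print("T =", T)
print("ARR =", arr)
print("REQ =", req)
print("slack =", slack)
```

Only the path a→c→d→e has zero slack on every edge, which matches it being the single path with d(π) = T = 13.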

  39. Critical Path counting

  40. Calculate F(p) [Figure: example circuit annotated with edge delays, slacks, and the forward path count F(p) at each node]

  41. Calculate B(p) [Figure: example circuit annotated with edge delays, slacks, and the backward path count B(p) at each node]

  42. Calculate GP(s,t) • GP(a, c) = 2 • GP(b, c) = 2 • GP(c, d) = 4 • GP(d, e) = 2 • GP(d, f) = 2
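A minimal sketch of undiscounted path counting on the example circuit, assuming GP(s, t) = F(s) x B(t) with every PI-to-PO path counted equally. That assumption is not spelled out in the transcript, but it reproduces the edge values 2, 2, 4, 2, 2 reported for this circuit. Names are illustrative.

```python
# Edges of the example circuit (delays are not needed for plain counting).
EDGES = [("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("d", "f")]
ORDER = ["a", "b", "c", "d", "e", "f"]  # a topological order


def count_paths(edges, order):
    has_pred = {t for _, t in edges}
    has_succ = {s for s, _ in edges}
    # F(p): number of paths from a primary input to p.
    F = {v: 0 if v in has_pred else 1 for v in order}
    # B(p): number of paths from p to a primary output.
    B = {v: 0 if v in has_succ else 1 for v in order}
    for t in order:
        for s, sink in edges:
            if sink == t:
                F[t] += F[s]
    for s in reversed(order):
        for src, t in edges:
            if src == s:
                B[s] += B[t]
    # Number of paths through edge (s, t).
    GP = {(s, t): F[s] * B[t] for s, t in edges}
    return F, B, GP


F, B, GP = count_paths(EDGES, ORDER)
print(GP)
```

Edge (c, d) lies on all four input-to-output paths, so it gets the largest count regardless of how much slack each of those paths has.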

  43. Use discount function D{x, y} to get accurate counting result • ‘a’ is a positive constant number • x is Fs(s,t) or Bs(s,t) • Fs(s,t) = ARR(t) – ARR(s) – d(s,t) • Bs(s,t) = REQ(t) – REQ(s) – d(s,t) • y is the longest path delay (T) Accurate All Path Counting

  44. Accurate All Path Counting

  45. Ex. Calculate F(t) (a=2) • ARR/REQ: a = 0/0, b = 0/2, c = 7/7, d = 8/8, e = 13/13, f = 11/13 • D{Fs(a, c), T} = D{0,13} = 1 • D{Fs(b, c), T} = D{2,13} = 0.88 • D{Fs(c, d), T} = D{0,13} = 1 • D{Fs(d, e), T} = D{0,13} = 1 • D{Fs(d, f), T} = D{0,13} = 1 • F(a) = 1, F(b) = 1, F(c) = 1+0.88 = 1.88, F(d) = 1.88, F(e) = 1.88, F(f) = 1.88

  46. Ex. Calculate B(s) (a=2) • ARR/REQ: a = 0/0, b = 0/2, c = 7/7, d = 8/8, e = 13/13, f = 11/13 • D{Bs(a, c), T} = D{0,13} = 1 • D{Bs(b, c), T} = D{0,13} = 1 • D{Bs(c, d), T} = D{0,13} = 1 • D{Bs(d, e), T} = D{0,13} = 1 • D{Bs(d, f), T} = D{2,13} = 0.88 • B(e) = 1, B(f) = 1, B(d) = 1+0.88 = 1.88, B(c) = 1.88, B(a) = 1.88, B(b) = 1.88

  47. Ex. Calculate AP(s,t) (a=2) • D{slack(a, c), T} = D{0,13} = 1 • D{slack(b, c), T} = D{2,13} = 0.88 • D{slack(c, d), T} = D{0,13} = 1 • D{slack(d, e), T} = D{0,13} = 1 • D{slack(d, f), T} = D{2,13} = 0.88 • AP(a, c) = 1*1.88*1 = 1.88 • AP(b, c) = 1*1.88*0.88 = 1.65 • AP(c, d) = 1.88*1.88*1 = 3.53 • AP(d, e) = 1.88*1*1 = 1.88 • AP(d, f) = 1.88*1*0.88 = 1.65

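  48. Compare results • Critical path counting (GPATH): GP(a, c) = 2, GP(b, c) = 2, GP(c, d) = 4, GP(d, e) = 2, GP(d, f) = 2 • Proposed algorithm (a=2): AP(a, c) = 1.88, AP(b, c) = 1.65, AP(c, d) = 3.53, AP(d, e) = 1.88, AP(d, f) = 1.65 • With the critical path counting method (GPATH), it is difficult to get an accurate result. With the proposed algorithm, we can get a more accurate result.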

  49. Resource Routing Graph • Physical Block Graph • Netlist • Global CLB Netlist • Global Atom Netlist • Blocks VPR Datastructures
