1) According to the study on “Simultaneous Timing Driven Clustering and Placement for FPGAs”, what is a fragment level move and which drawbacks of the traditional FPGA CAD flow are targeted with the fragment level moves? Question
BSPlace: A BLE Swapping Technique for Placement • 04.11.2014 • Minsik Hong, George Hwang, Hemayamini Kurra, Minjun Seo
Outline • SCPlace: Introduction, Algorithm flowchart, Net Counting Algorithm, Results • BSPlace: Algorithm, Demo • Backup slides (we can cover more of these if questions are kept to a minimum): Net Weighting, VPR Data Structures
Rajavel, Senthilkumar Thoravi, and Ali Akoglu. "MO-Pack: Many-objective clustering for FPGA CAD." Proceedings of the 48th Design Automation Conference. ACM, 2011.
Chen, Gang, and Jason Cong. "Simultaneous timing driven clustering and placement for FPGAs." Field Programmable Logic and Application. Springer Berlin Heidelberg, 2004. 158-167.
Fragment level move: move a BLE to a new CLB • Check for a valid CLB configuration • Feasibility (number of BLEs and input pins) • Update the cost function • Block level move: move a CLB to another CLB location Key concept
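The feasibility check for a fragment level move can be sketched as follows. This is a minimal illustration, not the code from the cited paper or from VPR: the `BLE` and `CLB` classes and the `can_accept` helper are hypothetical names, and the sketch assumes that a net driven inside a CLB is fed back locally and so does not use one of the CLB's input pins.

```python
from dataclasses import dataclass, field


@dataclass
class BLE:
    name: str
    inputs: frozenset  # nets read by this BLE
    output: str        # net driven by this BLE


@dataclass
class CLB:
    bles: list = field(default_factory=list)


def external_inputs(bles):
    """Nets the cluster must bring in through its input pins.

    Assumption: nets driven by a BLE inside the cluster are routed
    locally and do not consume a CLB input pin.
    """
    driven = {b.output for b in bles}
    used = set().union(*(b.inputs for b in bles))
    return used - driven


def can_accept(clb, ble, n_max, i_max):
    """Feasibility of moving `ble` into `clb`: BLE count and input pins."""
    candidate = clb.bles + [ble]
    return len(candidate) <= n_max and len(external_inputs(candidate)) <= i_max


def move_ble(src, dst, ble, n_max, i_max):
    """Perform the fragment level move only if the target stays valid."""
    if not can_accept(dst, ble, n_max, i_max):
        return False
    src.bles.remove(ble)
    dst.bles.append(ble)
    return True
```

With N = 2 BLEs and I = 3 input pins per CLB, a CLB holding one BLE that reads nets {a, b} can accept a BLE that reads {x, c} when x is driven inside the CLB, but not one that reads two further external nets.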
Advantages • Fixes packing issues during simulated annealing • Better congestion mitigation • Better routability • Disadvantages • Speed • Complexity BLE Level Swapping
Use novel net weighting
Kong, Tim Tianming. "A novel net weighting algorithm for timing-driven placement." Proceedings of the 2002 IEEE/ACM international conference on Computer-aided design. ACM, 2002.
Calculate F(t) • Discount constant a = 2, T: the longest path delay (T = 13) • [Figure: example circuit annotated with ARR/REQ: a 0/0, b 0/2, c 7/7, d 8/8, e 13/13, f 11/13; edge delays a→c 7, b→c 5, c→d 1, d→e 5, d→f 3] • Fs(a, c) = 7 - 0 - 7 = 0 • Fs(b, c) = 7 - 0 - 5 = 2 • D{Fs(a, c), T} = D{0, 13} = 1 • D{Fs(b, c), T} = D{2, 13} = 0.88 • D{Fs(c, d), T} = D{0, 13} = 1 • D{Fs(d, e), T} = D{0, 13} = 1 • D{Fs(d, f), T} = D{0, 13} = 1 • F(c) = F(c) + D{Fs(a, c), T} x F(a) + D{Fs(b, c), T} x F(b) = 0 + 1 x 1 + 0.88 x 1 = 1.88 • Result: F(a) = F(b) = 1, F(c) = F(d) = F(e) = F(f) = 1.88
Calculate B(s) • Discount constant a = 2, T: the longest path delay (T = 13) • [Figure: example circuit annotated with ARR/REQ: a 0/0, b 0/2, c 7/7, d 8/8, e 13/13, f 11/13; edge delays a→c 7, b→c 5, c→d 1, d→e 5, d→f 3] • Bs(d, e) = 13 - 8 - 5 = 0 • Bs(d, f) = 13 - 8 - 3 = 2 • D{Bs(a, c), T} = D{0, 13} = 1 • D{Bs(b, c), T} = D{0, 13} = 1 • D{Bs(c, d), T} = D{0, 13} = 1 • D{Bs(d, e), T} = D{0, 13} = 1 • D{Bs(d, f), T} = D{2, 13} = 0.88 • B(d) = B(d) + D{Bs(d, e), T} x B(e) + D{Bs(d, f), T} x B(f) = 0 + 1 x 1 + 0.88 x 1 = 1.88 • Result: B(e) = B(f) = 1, B(a) = B(b) = B(c) = B(d) = 1.88
Calculate AP(s, t) (a = 2) • [Figure: example circuit annotated with F(s)/B(t) and slack per edge] • D{slack(a, c), T} = D{0, 13} = 1 • D{slack(b, c), T} = D{2, 13} = 0.88 • D{slack(c, d), T} = D{0, 13} = 1 • D{slack(d, e), T} = D{0, 13} = 1 • D{slack(d, f), T} = D{2, 13} = 0.88 • AP(a, c) = F(a) x B(c) x D{slack(a, c), T} = 1 x 1.88 x 1 = 1.88 • AP(b, c) = F(b) x B(c) x D{slack(b, c), T} = 1 x 1.88 x 0.88 = 1.65 • Result: AP(a, c) = 1.88, AP(b, c) = 1.65, AP(c, d) = 3.53, AP(d, e) = 1.88, AP(d, f) = 1.65
Results (Only use BLE swapping) CLB = 4
Results (BLE + CLB swapping) where 0 ≤ α ≤ 1 The number of CLB moves: The number of BLE moves:
Results (BLE + CLB swapping) T-Vpack+VPR vs SCPlace (α=0.5)
BLE Level Swapping within Simulated Annealing with Rent’s Rule • Advantages • Fixes packing issues as they occur. • Potentially better routability. • Potentially less congestion, because placement and packing are combined. • Disadvantages • Execution time: every BLE swap requires memory allocation and deallocation. • Code complexity: VPR is complex, so we spend much of our time on debugging and testing rather than on algorithms. BSPlace
Calculate the k value to get the threshold • Enter the simulated annealing process • Outer loop • Inner loop • Choose a random CLB to move from its current position to another position • Check the Rent’s Rule threshold • If the swap gives a better result: queue a BLE swap • Otherwise: do a CLB swap using T-VPlace • Loop through the queued BLE swaps • Do each BLE swap after checking that it does not overlap with a previous swap • Reallocate memory and return to the outer loop Rent’s Rule Threshold Value
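The queued BLE swap step above can be sketched as follows. This is a hedged illustration rather than the BSPlace code: `rent_terminals` states Rent's rule in its usual form T = k · N^p (the slide does not say how BSPlace turns k into a threshold, and the exponent `p` is an assumed parameter), and `apply_queued_swaps` assumes that two swaps "overlap" when they share a BLE. Both function names are hypothetical.

```python
def rent_terminals(k, n_blocks, p):
    """Rent's rule estimate of external terminals: T = k * N**p."""
    return k * n_blocks ** p


def apply_queued_swaps(placement, queue):
    """Apply queued BLE swaps, skipping any that overlap an earlier one.

    placement: dict mapping BLE name -> CLB name (modified in place).
    queue: list of (ble_a, ble_b) pairs collected during the inner loop.
    Assumption: a swap overlaps a previous swap if it shares a BLE with it.
    """
    touched = set()
    applied = []
    for a, b in queue:
        if a in touched or b in touched:
            continue  # overlaps a swap already applied in this batch
        placement[a], placement[b] = placement[b], placement[a]
        touched.update((a, b))
        applied.append((a, b))
    return applied
```

Deferring the swaps to the end of the inner loop keeps the netlist stable while moves are being evaluated, which is one plausible reason for the queue, given that each BLE swap requires memory reallocation.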
Code • We created our own BLE swapping mechanism using the VPR data structures. • We have a whole suite of test fixtures to test the code. • Testing is still in progress, but we are finding only minor issues. • We have done a swap within placement. • We have started to integrate our cost function. • Validation • We intend to run the VPR benchmarks. Our BLE swapping solution should be better than or the same as T-VPlace. • Our VPR benchmark results should also be comparable to IRAC. Current Status
Demo • The circuit below abstracts away the MUXes, switch boxes, and connection boxes. The connections represent the direct connections between BLEs in CLBs. • Optimize this circuit by performing one BLE swap. Explain why your optimization will result in better performance. • Architecture parameters: K = 2, I = 3, N = 2 • Measurement: critical path delay = 1.182 ns
Impact of duplication on placement Delay = 2 Delay = 1
Accurate path counting algorithm • The first known accurate path counting algorithm that considers all paths • Because of the exponential number of paths present in a circuit, accurate all-path counting has been considered very difficult. • Significant performance improvement • Little loss in total wirelength • No runtime overhead A Novel Net Weighting Algorithm
Consider the path sharing effect • If two critical paths share a common segment, the edges in the common segment should receive higher weights. • Define two variables • Forward path F(p): the number of different critical paths starting from PI elements and terminating at p. • Backward path B(p): the number of different critical paths starting from PO elements and terminating at p, if we reverse all signal flow directions. A Novel Net Weighting Algorithm
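Example • Timing of a circuit • [Figure: circuit with primary inputs a, b, internal nodes c, d, and primary outputs e, f; edge delays a→c 7, b→c 5, c→d 1, d→e 5, d→f 3] • ARR(t): a 0, b 0, c 7, d 8, e 13, f 11 • REQ(s): a 0, b 2, c 7, d 8, e 13, f 13 • The longest path delay (T) = 13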
Example • ARR/REQ: a 0/0, b 0/2, c 7/7, d 8/8, e 13/13, f 11/13 • Slack(s, t): a→c 0, b→c 2, c→d 0, d→e 0, d→f 2
Example • Path a→c→d→e: d(π) = 13, slack(π) = 0 • Path b→c→d→f: d(π) = 9, slack(π) = 4 • Path a→c→d→f: d(π) = 11, slack(π) = 2 • Path b→c→d→e: d(π) = 11, slack(π) = 2
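The arrival times, required times, and slacks of the example can be reproduced with a short sketch. The edge delays below are read off the example figure (a→c 7, b→c 5, c→d 1, d→e 5, d→f 3), and the function names are illustrative only; the path slack is computed as T - d(π), which matches the numbers on the slide.

```python
# Edge delays of the example circuit, as read from the figure.
EDGES = {("a", "c"): 7, ("b", "c"): 5, ("c", "d"): 1,
         ("d", "e"): 5, ("d", "f"): 3}
ORDER = ["a", "b", "c", "d", "e", "f"]  # a topological order of the nodes


def timing(edges, order):
    """Return arrival times, required times, T, and per-edge slack."""
    arr = {n: 0 for n in order}
    for n in order:  # forward pass: latest arrival at each node
        for (s, t), d in edges.items():
            if t == n:
                arr[n] = max(arr[n], arr[s] + d)
    longest = max(arr.values())  # T, the longest path delay
    req = {n: longest for n in order}
    for n in reversed(order):  # backward pass: earliest required time
        for (s, t), d in edges.items():
            if s == n:
                req[n] = min(req[n], req[t] - d)
    slack = {(s, t): req[t] - arr[s] - d for (s, t), d in edges.items()}
    return arr, req, longest, slack


def path_slack(path, edges, longest):
    """Slack of a full input-to-output path: T minus its total delay."""
    delay = sum(edges[(s, t)] for s, t in zip(path, path[1:]))
    return longest - delay
```

For instance, `path_slack(["b", "c", "d", "f"], EDGES, 13)` gives 4, the slack of the shortest path in the example.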
Calculate F(p) • [Figure: example circuit, forward path counts] • F(a) = F(b) = 1, F(c) = F(d) = F(e) = F(f) = 2
Calculate B(p) • [Figure: example circuit, backward path counts] • B(e) = B(f) = 1, B(a) = B(b) = B(c) = B(d) = 2
Calculate GP(s, t) • [Figure: example circuit, path count per edge] • GP(s, t) = F(s) x B(t) • GP(a, c) = 2, GP(b, c) = 2, GP(c, d) = 4, GP(d, e) = 2, GP(d, f) = 2
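The F(p), B(p), and GP(s, t) counts for the example circuit can be reproduced with the sketch below. It assumes that all four input-to-output paths of the example are counted, which is consistent with the edge counts 2, 2, 4, 2, 2 in the figure; the function name and data layout are illustrative, not taken from the paper.

```python
# Edges of the example circuit (delays are not needed for plain counting).
EDGES = [("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("d", "f")]
ORDER = ["a", "b", "c", "d", "e", "f"]  # a topological order of the nodes


def count_paths(edges, order):
    """Return forward counts F, backward counts B, and GP per edge."""
    has_pred = {t for _, t in edges}
    has_succ = {s for s, _ in edges}
    # Primary inputs start with F = 1, primary outputs with B = 1.
    fwd = {n: 0 if n in has_pred else 1 for n in order}
    bwd = {n: 0 if n in has_succ else 1 for n in order}
    for n in order:  # accumulate paths arriving from the inputs
        for s, t in edges:
            if t == n:
                fwd[n] += fwd[s]
    for n in reversed(order):  # accumulate paths arriving from the outputs
        for s, t in edges:
            if s == n:
                bwd[n] += bwd[t]
    # Number of paths through edge (s, t): F(s) * B(t).
    gp = {(s, t): fwd[s] * bwd[t] for s, t in edges}
    return fwd, bwd, gp
```

Both passes visit each edge a fixed number of times per node, so the paths are never enumerated explicitly even though their number can grow exponentially with circuit size.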
Use a discount function D{x, y} to get an accurate counting result • ‘a’ is a positive constant • x is a slack value such as Fs(s, t) = ARR(t) - ARR(s) - d(s, t) or Bs(s, t) = REQ(t) - REQ(s) - d(s, t) • y is the longest path delay (T) Accurate All Path Counting
Ex. Calculate F(t) (a = 2) • [Figure: example circuit annotated with ARR/REQ and the resulting F values] • D{Fs(a, c), T} = D{0, 13} = 1 • D{Fs(b, c), T} = D{2, 13} = 0.88 • D{Fs(c, d), T} = D{0, 13} = 1 • D{Fs(d, e), T} = D{0, 13} = 1 • D{Fs(d, f), T} = D{0, 13} = 1 • F(a) = F(b) = 1, F(c) = 1 + 0.88 = 1.88, F(d) = F(e) = F(f) = 1.88
Ex. Calculate B(s) (a = 2) • [Figure: example circuit annotated with ARR/REQ and the resulting B values] • D{Bs(a, c), T} = D{0, 13} = 1 • D{Bs(b, c), T} = D{0, 13} = 1 • D{Bs(c, d), T} = D{0, 13} = 1 • D{Bs(d, e), T} = D{0, 13} = 1 • D{Bs(d, f), T} = D{2, 13} = 0.88 • B(e) = B(f) = 1, B(d) = 1 + 0.88 = 1.88, B(a) = B(b) = B(c) = 1.88
Ex. Calculate AP(s, t) (a = 2) • [Figure: example circuit annotated with F and B values and the resulting AP per edge] • D{slack(a, c), T} = D{0, 13} = 1 • D{slack(b, c), T} = D{2, 13} = 0.88 • D{slack(c, d), T} = D{0, 13} = 1 • D{slack(d, e), T} = D{0, 13} = 1 • D{slack(d, f), T} = D{2, 13} = 0.88 • AP(a, c) = 1 x 1.88 x 1 = 1.88 • AP(b, c) = 1 x 1.88 x 0.88 = 1.65 • AP(c, d) = 1.88 x 1.88 x 1 = 3.53 • AP(d, e) = 1.88 x 1 x 1 = 1.88 • AP(d, f) = 1.88 x 1 x 0.88 = 1.65
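The worked example can be checked end to end with the sketch below. The closed form of the discount function is not reproduced here; instead, `D_TABLE` simply holds the two values quoted on the slides for a = 2 (D{0, 13} = 1 and D{2, 13} = 0.88), which is an assumption made to keep the sketch grounded in the example. All names are illustrative.

```python
EDGES = {("a", "c"): 7, ("b", "c"): 5, ("c", "d"): 1,
         ("d", "e"): 5, ("d", "f"): 3}
ORDER = ["a", "b", "c", "d", "e", "f"]  # a topological order of the nodes
ARR = {"a": 0, "b": 0, "c": 7, "d": 8, "e": 13, "f": 11}
REQ = {"a": 0, "b": 2, "c": 7, "d": 8, "e": 13, "f": 13}

# Discount values quoted on the slides for a = 2 and T = 13 (lookup only).
D_TABLE = {0: 1.0, 2: 0.88}


def all_path_counts():
    """Return discounted F, B, and AP for the example circuit."""
    fs = {(s, t): ARR[t] - ARR[s] - d for (s, t), d in EDGES.items()}
    bs = {(s, t): REQ[t] - REQ[s] - d for (s, t), d in EDGES.items()}
    slack = {(s, t): REQ[t] - ARR[s] - d for (s, t), d in EDGES.items()}
    has_pred = {t for _, t in EDGES}
    has_succ = {s for s, _ in EDGES}
    fwd = {n: 0.0 if n in has_pred else 1.0 for n in ORDER}
    bwd = {n: 0.0 if n in has_succ else 1.0 for n in ORDER}
    for n in ORDER:  # F(t) += D{Fs(s, t), T} * F(s)
        for (s, t) in EDGES:
            if t == n:
                fwd[n] += D_TABLE[fs[(s, t)]] * fwd[s]
    for n in reversed(ORDER):  # B(s) += D{Bs(s, t), T} * B(t)
        for (s, t) in EDGES:
            if s == n:
                bwd[n] += D_TABLE[bs[(s, t)]] * bwd[t]
    # AP(s, t) = F(s) * B(t) * D{slack(s, t), T}
    ap = {e: fwd[e[0]] * bwd[e[1]] * D_TABLE[slack[e]] for e in EDGES}
    return fwd, bwd, ap
```

Rounded to two decimals, the resulting AP values agree with the slide, with the shared edge c→d receiving the largest weight.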
Compare results • Critical path counting (GPATH): a→c 2, b→c 2, c→d 4, d→e 2, d→f 2 • Proposed all-path counting: a→c 1.88, b→c 1.65, c→d 3.53, d→e 1.88, d→f 1.65 • With the critical path counting method (GPATH), it is difficult to get an accurate result. The proposed algorithm gives a more accurate result.
Routing Resource Graph • Physical Block Graph • Netlist • Global CLB Netlist • Global Atom Netlist • Blocks VPR Data Structures