
Floorplanning Optimization with Trajectory Piecewise-Linear Model for Pipelined Interconnects

C. Long, L. J. Simonson, W. Liao and L. He, EDA Lab, EE Dept., UCLA. DAC 2004. Outline: Motivation, Background, Trajectory piecewise-linear CPI model, CPI-aware floorplanning, Experiment results.


Presentation Transcript


  1. Floorplanning Optimization with Trajectory Piecewise-Linear Model for Pipelined Interconnects • C. Long, L. J. Simonson, W. Liao and L. He • EDA Lab, EE Dept., UCLA • DAC 2004

  2. Outline • Motivation • Background • Trajectory piecewise-linear CPI model • CPI-aware floorplanning • Experiment results • Conclusion and discussions

  3. Motivation • Traditional design flow (Figure: flow with blocks Architecture optimization, Floorplanning optimization, ISA, Configuration, Performance evaluation) • Architecture optimization: minimize CPI • Floorplanning optimization: maximize clock frequency • Architectural optimization is separated from physical optimization under the assumption that layout does NOT change CPI.

  4. Traditional Flow • A few years ago: • Clock rates were much lower • More time for a signal to reach its destination • Inductance was less of a factor in delay • Interconnect delay was smaller • Less resistance • Lower aspect ratio meant less capacitance • Inter-module communication took less than one cycle • Interconnect length was used to determine the clock period (just clock it faster until it doesn’t work) • Floorplanning had no impact on the cycle-by-cycle operation (CPI) of the processor

  5. A New Interconnect-Centric Reality • Now: • Clock rates have increased by an order of magnitude • My P2 from 1998 runs at 400MHz; the Prescott P4 will reach 4.0GHz by the fourth quarter of ’04 and has 31 pipeline stages for integer operations, some of which exist exclusively for interconnect pipelining • Interconnects have longer delay due to higher aspect ratios • Die size stays the same • A signal can take up to ten clock cycles to travel from one corner of a chip to the opposite corner in 90nm technology • Inter-module communication is therefore likely to take more than one cycle • Clock period is now a constraint, not an objective • An interconnect is pipelined when it cannot meet the constraint • A pipelined interconnect delays the cycle in which a signal arrives • This changes the cycle-by-cycle behavior (CPI) of the system • Which interconnects are pipelined is determined by floorplanning

  6. How to solve this problem? • Evaluate performance during floorplanning optimization • Efficiency of the evaluation is the key • Cycle-accurate simulation is too slow for this purpose (Figure: flow with blocks Architecture optimization, Floorplanning optimization, ISA, Configuration, Performance evaluation)

  7. Contributions of our work • We have pointed out that interconnect latency has a significant impact on architecture performance and that it is critical to consider it during floorplanning • We have developed an efficient table-based cycles-per-instruction (CPI) model • Called the trajectory piecewise-linear (TPWL) model, with error less than 3.0% • We have integrated the TPWL CPI model with floorplan optimization • This reduces CPI by up to 28.57% with a small area overhead of 5.72%

  8. Background • Architecture and partitioning • A superscalar implementation of the MIPS instruction set • Similar to the Alpha 21264 • Twelve blocks

  9. Bus Latency Vectors • Interface between the physical level and the architecture level • Twelve buses • Bus latency vector (B) • E.g., B = {3, 4, 7, …} • Characterizes a floorplan as a vector containing the latency (in clock cycles) of each interconnect
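As a rough illustration of how a floorplan could be reduced to a bus latency vector, here is a minimal sketch assuming a hypothetical linear wire model: a signal covers at most `reach_per_cycle_mm` of wire per clock cycle, so a bus of length L needs ceil(L / reach) cycles, and every bus takes at least one cycle. The function name, the parameter, and the wire lengths are illustrative assumptions, not values from the slides.

```python
import math
from typing import List, Sequence


def bus_latency_vector(wire_lengths_mm: Sequence[float],
                       reach_per_cycle_mm: float) -> List[int]:
    """Map each bus's wire length to a latency in clock cycles.

    Hypothetical model: a signal travels at most `reach_per_cycle_mm`
    per cycle, so latency = ceil(length / reach), with a minimum of 1.
    """
    if reach_per_cycle_mm <= 0:
        raise ValueError("reach_per_cycle_mm must be positive")
    return [max(1, math.ceil(length / reach_per_cycle_mm))
            for length in wire_lengths_mm]


# Made-up lengths for three of the twelve buses.
B = bus_latency_vector([2.5, 7.0, 12.0], reach_per_cycle_mm=3.0)
print(B)  # [1, 3, 4]
```

Under this assumption, moving two communicating blocks closer together shortens their bus and can lower its entry in B by whole cycles.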

  10. Miss Events and Performance Loss • Types of miss events • Data cache miss • Instruction cache miss • TLB miss • Branch misprediction • Other sources of performance loss • Data dependencies • Resource contention

  11. Measuring Performance • No hardware to measure • Need a model of the hardware • Simulate the execution of the machine • Two types of simulation • Trace-driven simulation • E.g., Shade to generate instruction and address traces, Dinero to model caches • Fast: tens of instructions on the host machine per instruction on the target machine • Inaccurate • good for measuring I-cache performance loss • bad for measuring D-cache performance loss • poor for branch misprediction performance loss • very bad for data dependency performance loss • Execution-driven simulation • The state of the target hardware is maintained and updated in memory as each instruction is processed • Slow: ~1000s of instructions on the host machine per instruction on the target machine • Cycle-accurate: true to the cycle-by-cycle behavior of the hardware

  12. Cycle-Accurate Simulation • Given B, compute CPI • Modify the architecture according to B • Change the configuration file • Insert buffers between modules • Measure CPI on a subset of the SPEC2000 benchmark suite • Floating-point benchmarks: equake and mesa • Integer benchmarks: gzip, vortex and mcf • Take the arithmetic mean over these benchmarks as the CPI for B
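The averaging step can be sketched as follows. The benchmark list comes from the slide; `simulate` is a hypothetical hook standing in for a reconfigured cycle-accurate simulator run (changing the configuration file and inserting buffers for B), and the CPI values in the usage line are made up.

```python
from statistics import mean
from typing import Callable, Sequence

# The five SPEC2000 benchmarks named on the slide.
BENCHMARKS = ("equake", "mesa", "gzip", "vortex", "mcf")


def cpi_for_bus_vector(B: Sequence[int],
                       simulate: Callable[[str, Sequence[int]], float],
                       benchmarks: Sequence[str] = BENCHMARKS) -> float:
    """Arithmetic-mean CPI of the benchmark subset for latency vector B.

    `simulate(benchmark, B)` is assumed to configure the simulator for
    the bus latencies in B and return the measured CPI of one benchmark.
    """
    return mean(simulate(name, B) for name in benchmarks)


# Toy stand-in for the simulator: fixed, made-up CPIs per benchmark.
fake = {"equake": 1.0, "mesa": 2.0, "gzip": 3.0, "vortex": 4.0, "mcf": 5.0}
print(cpi_for_bus_vector([3, 4, 7], lambda name, B: fake[name]))  # 3.0
```

Because every call to `simulate` is a full cycle-accurate run, each evaluation of B costs five simulations, which is why this cannot be called inside the annealing loop directly.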

  13. CPI Models • A CPI model estimates CPI as a function of parameters of interest, such as interconnect latency, architecture configuration, etc. • CPI models in the literature • Statistical simulation [Nussbaum’01] • Based on a single detailed simulation • Generates a synthetic instruction trace • Takes advantage of cache and branch prediction statistics • Statistical sampling of cycle-accurate simulation • Sampling instead of truncating: selectively measure in detail only an appropriate subset of the benchmark • Configure a systematic sampling simulation run to achieve a desired confidence in the estimates • More efficient than cycle-accurate simulation but still slow; none of them considers interconnect latency

  14. Traditional floorplanning • Optimize the floorplan with a simulated annealing (SA) algorithm • Objective function: area and total wire length • Moves • Change the position or shape of blocks • Cooling schedule • Initial temperature • Constant cooling rate
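The SA loop on this slide can be sketched generically. This is a minimal illustration, not the authors' floorplanner: the solution representation, the move set, and the parameter values (`t_initial`, `cooling_rate`, `moves_per_temp`, `t_final`) are assumptions; only the structure (random moves, probabilistic acceptance of worse moves, constant cooling rate) follows the slide.

```python
import math
import random
from typing import Callable, TypeVar

S = TypeVar("S")


def simulated_annealing(initial: S,
                        cost: Callable[[S], float],
                        move: Callable[[S, random.Random], S],
                        t_initial: float = 10.0,
                        cooling_rate: float = 0.95,
                        moves_per_temp: int = 100,
                        t_final: float = 1e-3,
                        seed: int = 0) -> S:
    """Generic SA with a constant (geometric) cooling rate.

    `move` perturbs a solution (for floorplanning: change a block's
    position or shape); `cost` is the objective to minimize.
    Returns the best solution visited.
    """
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t_initial
    while t > t_final:
        for _ in range(moves_per_temp):
            candidate = move(current, rng)
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Always accept improvements; accept worse moves with prob exp(-delta/T).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= cooling_rate  # constant cooling rate
    return best


# Toy problem in place of a floorplan: minimize (x - 3)^2 over integers.
best = simulated_annealing(50, lambda x: (x - 3) ** 2,
                           lambda x, rng: x + rng.choice((-1, 1)))
print(best)  # 3
```

For the CPI-aware variant on slide 15, only `cost` would change, for example to a weighted sum of area, total wire length and an estimated CPI; such weights are hypothetical tuning parameters not given in the slides.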

  15. Floorplanning considering CPI • Based on simulated annealing • Objective function: area, total wire length and CPI • Extended from the traditional floorplanning framework • The key is to estimate CPI efficiently • Moves and cooling schedule remain the same

  16. Trajectory of SA • The path that SA follows during optimization is a trajectory in the solution space • We only need to estimate CPI accurately in the region where the trajectory travels • The trajectory of SA with the objective of area, wire length and CPI is close to that with area and wire length only (Figure: the two trajectories plotted over Bus1 and Bus2 latencies)

  17. Trajectory Piecewise-linear CPI Model • Build a piecewise-linear model for a small solution region around the SA trajectories • Three phases: sampling, collecting and simulating • An example for a 2-dimensional bus vector (Figure: simulation points over Latency (bus1) vs. Latency (bus2))

  18. TPWL: Sampling • Sample a complete simulated annealing run with the objective of area and total wire length to obtain a set of bus latency vectors (points in n-dimensional space) (Figure: Latency (bus1) vs. Latency (bus2))

  19. TPWL: Collecting • Cover all the points obtained in the sampling phase with as few “balls” as possible (the TPC problem) (Figure: Latency (bus1) vs. Latency (bus2))
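The slides do not say how the TPC problem is solved. As a stand-in, here is a simple greedy heuristic, assuming fixed-radius Euclidean balls centered on sampled points, so every center is itself a valid bus latency vector that can be simulated. It is not guaranteed to use the minimum number of balls, and `radius` is a hypothetical tuning parameter.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[int, ...]


def greedy_ball_cover(points: Sequence[Point], radius: float) -> List[Point]:
    """Greedily cover sampled bus latency vectors with radius-r balls.

    Repeatedly takes the first uncovered point as a new center and marks
    every point within `radius` (Euclidean) of it as covered.
    """
    uncovered = list(dict.fromkeys(points))  # drop duplicates, keep order
    centers: List[Point] = []
    while uncovered:
        center = uncovered[0]
        centers.append(center)
        uncovered = [p for p in uncovered if math.dist(p, center) > radius]
    return centers


samples = [(0, 0), (1, 0), (5, 5), (5, 6), (10, 0)]
print(greedy_ball_cover(samples, radius=1.5))  # [(0, 0), (5, 5), (10, 0)]
```

A larger radius means fewer centers, hence fewer cycle-accurate simulations, at the cost of expanding each table entry over a wider region.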

  20. TPWL: Simulating • Obtain the CPI of each “ball” center by cycle-accurate simulation • Build a CPI table indexed by these center points (Figure: simulation points over Latency (bus1) vs. Latency (bus2))

  21. CPI estimation under TPWL model • Based on each entry, the CPI of a target B can be estimated by first-order expansion • For each entry, a weight is calculated from the distance between the target B and the entry in the CPI table • The final estimate is the weighted sum of the per-entry estimates (Figure: target B and its distances d1 to d5 to table entries B1 to B5)
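Here is a minimal sketch of the estimation step, under assumptions the slides leave open: each table entry stores, besides its simulated CPI, a gradient (per-bus CPI sensitivity, e.g. obtained by finite differences), and the weights are inverse distances normalized to sum to one. Both choices, and the names `CpiEntry`, `first_order` and `estimate_cpi`, are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CpiEntry:
    center: Sequence[float]    # bus latency vector of a ball center
    cpi: float                 # CPI from cycle-accurate simulation
    gradient: Sequence[float]  # assumed dCPI/dlatency for each bus


def first_order(entry: CpiEntry, B: Sequence[float]) -> float:
    """First-order expansion of CPI around one table entry."""
    return entry.cpi + sum(g * (b - c)
                           for g, b, c in zip(entry.gradient, B, entry.center))


def estimate_cpi(table: Sequence[CpiEntry], B: Sequence[float],
                 eps: float = 1e-12) -> float:
    """Distance-weighted sum of the per-entry first-order estimates."""
    dists = [math.dist(B, e.center) for e in table]
    for d, e in zip(dists, table):
        if d < eps:
            return e.cpi  # B coincides with a simulated center
    weights = [1.0 / d for d in dists]  # closer entries weigh more
    total = sum(weights)
    return sum(w * first_order(e, B) for w, e in zip(weights, table)) / total


table = [CpiEntry((0, 0), 1.0, (0.1, 0.1)), CpiEntry((4, 0), 2.0, (0.1, 0.0))]
print(estimate_cpi(table, (2, 0)))  # 1.5
```

Each estimate costs only a table scan, which is what makes calling it inside the annealing loop affordable compared with a cycle-accurate run.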

  22. CPI-aware Floorplanning: Overview • Integrate the TPWL CPI model with a traditional floorplanning tool • Flow: Start → Floorplanning → Sampling trajectory → Solve the TPC problem → “Balls” to cover the trajectory → Cycle-accurate simulation → CPI table → Integrate into floorplanning → Floorplanning considering CPI

  23. Iterative TPWL model • When the trajectory with the objective of area and total wire length differs significantly from the trajectory with the objective of area, total wire length and CPI, an iterative TPWL model is needed (Figure: trajectories over Bus1 and Bus2 for “area and wire length” and “area, wire length and CPI”, iterations 1 and 2)

  24. Iterative TPWL Model • Iteratively expand the CPI table to build an iterative TPWL (iTPWL) model • Based on the TPWL model, but from the second iteration on, the objective of SA is area, total wire length and CPI • Improves the accuracy of CPI estimation and the quality of the final floorplan (Figure: same flow as slide 22)
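The iterative table expansion can be outlined with callbacks. All three hooks are hypothetical: `run_sa(use_cpi, table)` is assumed to return the sampled trajectory (bus latency vectors) of one SA run, `cover` solves the TPC problem, and `simulate` runs a cycle-accurate simulation. The fixed iteration count is also an assumption, since the slides show iterations 1 and 2 but no stopping rule.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vec = Tuple[int, ...]


def build_itpwl_table(
    run_sa: Callable[[bool, Dict[Vec, float]], List[Vec]],
    cover: Callable[[Sequence[Vec]], List[Vec]],
    simulate: Callable[[Vec], float],
    iterations: int = 2,
) -> Dict[Vec, float]:
    """Iteratively expand a CPI table indexed by ball centers."""
    table: Dict[Vec, float] = {}
    for it in range(1, iterations + 1):
        # Iteration 1: SA objective is area + wire length only.
        # Later iterations: area + wire length + CPI, with CPI
        # estimated from the table built so far.
        use_cpi = it > 1
        trajectory = run_sa(use_cpi, table)
        for center in cover(trajectory):
            if center not in table:  # simulate only new centers
                table[center] = simulate(center)
    return table
```

Reusing existing entries means each extra iteration only pays for simulations in regions the CPI-aware trajectory newly visits.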

  25. Summary of the TPWL CPI Model • Originally proposed for modeling nonlinear systems [Rewienski’03] • Outperforms other techniques based on quadratic reduction • The TPWL model is suitable for floorplanning optimization • The trajectory of SA with the objective of area, total wire length and CPI is close to that with area and total wire length only • When these two trajectories are not close, the iTPWL model is employed to improve accuracy • Contributions of this paper to the TPWL model • Introduce the TPC problem • Extend the TPWL model to the iTPWL model

  26. Experiment results • Verification of CPI models • Error of the TPWL model: 2.62%; error of the iTPWL model: 1.66%
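The slides report these error figures without defining the metric. One common choice, assumed here, is the mean absolute relative error between estimated and simulated CPI over a set of test bus vectors; the numbers in the usage line are made up.

```python
from typing import Sequence


def mean_relative_error(estimated: Sequence[float],
                        simulated: Sequence[float]) -> float:
    """Mean of |estimate - simulated| / simulated over paired CPI values."""
    if len(estimated) != len(simulated) or not simulated:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(e - s) / s
               for e, s in zip(estimated, simulated)) / len(simulated)


print(f"{mean_relative_error([1.1, 2.0], [1.0, 2.0]):.2%}")  # 5.00%
```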

  27. Impact of models on final floorplans • Comparison of the floorplans obtained with the access ratio, sensitivity rate, TPWL and iTPWL models, with the objective of area, total wire length and CPI • Access ratio: uses the access ratio of interconnects to represent their impact on system performance • Sensitivity rate: estimates CPI by first-order expansion around the original point
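Read literally, the sensitivity-rate baseline is a single first-order expansion around the original point B0, whereas TPWL combines expansions around many table entries. Assuming per-bus sensitivities s_i over the twelve buses (notation introduced here, not taken from the slides), the baseline can be written as:

```latex
\widehat{\mathrm{CPI}}(B) = \mathrm{CPI}\bigl(B^{(0)}\bigr)
  + \sum_{i=1}^{12} s_i \bigl(b_i - b_i^{(0)}\bigr),
\qquad
s_i = \left.\frac{\partial\,\mathrm{CPI}}{\partial b_i}\right|_{B = B^{(0)}}
```

This single expansion is accurate only near B0, which is consistent with the slides' motivation for expanding around points along the SA trajectory instead.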

  28. Floorplanning with iTPWL Model • Comparison between floorplans obtained by different objectives

  29. Running time • SimpleScalar simulation time needed to build the TPWL and iTPWL models

  30. Conclusion and discussion • Proposed an accurate CPI model with less than 3.0% error • The CPI-aware floorplanner reduces CPI by up to 28.57% with a small area overhead of 5.72% • Extended the TPWL model to the iTPWL model and improved the accuracy of estimation • The accuracy of the iTPWL model leads to high-quality floorplanning solutions and enables us to develop good heuristics, such as access ratio, to minimize CPI without explicit CPI calculation • Plan to apply this model to architecture changes
