
OPTIMAL FSMD PARTITIONING FOR LOW POWER

Nainesh Agarwal and Nikitas Dimopoulos, Electrical and Computer Engineering, University of Victoria. Topics: power and energy; power gating; partitioning as a means to achieve optimal power gating; what next.


Presentation Transcript


  1. OPTIMAL FSMD PARTITIONING FOR LOW POWER • Nainesh Agarwal and Nikitas Dimopoulos • Electrical and Computer Engineering, University of Victoria

  2. Summary • Power and energy • Power gating • Partitioning as a means to achieve optimal power gating • What next

  3. Computation Power and Energy • What is the minimum energy a computation can expend? • Are we there yet?

  4. Computation Power and Energy cont’d • Feynman gives a relation between free energy and computation rate for reversible computation • E = kT log r • where r is the computation rate • This means that in the limit we may expend zero energy (when r = 1), but then the computation will take infinitely long.

  5. Computation Power and Energy cont’d • For irreversible computation, • E = kT b log 2 • where b is the number of bits involved in the computation (entropy)

  6. Computation Power and Energy cont’d • In both cases, these quantities are exceptionally small. • k = 1.3806504×10^-23 J/K • At T = 300 K, kT = 4.14×10^-21 J • A 50 W, 3 GHz processor consumes about 1.67×10^-8 J in one cycle
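The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the "log 2" of the irreversible bound is assumed to be the natural logarithm, as in the usual statement of Landauer's limit.

```python
import math

# Boltzmann constant as quoted on the slide, in J/K.
BOLTZMANN_J_PER_K = 1.3806504e-23


def thermal_energy(temperature_k: float) -> float:
    """Return kT in joules for a temperature given in kelvin."""
    return BOLTZMANN_J_PER_K * temperature_k


def irreversible_bound(bits: int, temperature_k: float) -> float:
    """Return kT * b * ln 2, assuming 'log' on the slide means natural log."""
    return thermal_energy(temperature_k) * bits * math.log(2)


def energy_per_cycle(power_w: float, clock_hz: float) -> float:
    """Return the energy in joules spent in one clock cycle."""
    return power_w / clock_hz


if __name__ == "__main__":
    kt = thermal_energy(300.0)
    cycle = energy_per_cycle(50.0, 3e9)
    print(f"kT at 300 K:          {kt:.3e} J")
    print(f"one bit erased:       {irreversible_bound(1, 300.0):.3e} J")
    print(f"energy per cycle:     {cycle:.3e} J")
    print(f"orders above kT:      {math.log10(cycle / kt):.1f}")
```

The ratio of the per-cycle energy to kT comes out between 10^12 and 10^13, which is the gap the later slides refer to.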

  7. Computation Power and Energy cont’d • DSPstone benchmarks synthesized in 180 nm and 90 nm technologies

  8. DSPstone dynamic energy

  9. DSPstone total energy

  10. Computation Power and Energy cont’d • Computational energy is far above the theoretical minimum (by more than 10 orders of magnitude) • Technological drive reduces total energy (an order of magnitude per generation) • Leakage power has become an issue • Power gating may provide efficiencies to further scale the technology

  11. Partitioning • Controller and datapath are considered together • Problem is formulated as • Integer Linear Programming • Non-linear programming solved using simulated annealing

  12. Notation • s_i represents a state of an FSMD • v_k represents a variable associated with one or more states • A variable v_k is shared between two states s_i and s_j if it is read and/or written in both states • T_ij is the total number of bits of all variables shared by states s_i and s_j • E_ij is 1 if there is a transition between states s_i and s_j; otherwise it is 0.
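To make the notation concrete, here is a minimal sketch that builds T_ij and E_ij for a toy FSMD. The variable names, bit widths, state contents and transitions are all invented for illustration; they do not come from the talk. E is treated as symmetric, reading "a transition between s_i and s_j" as direction-independent, which is an assumption.

```python
from itertools import combinations

# Hypothetical toy FSMD: bit width of each variable.
VAR_BITS = {"a": 8, "b": 8, "acc": 16, "i": 4}

# Hypothetical: variables read and/or written in each state (states 0..3).
STATE_VARS = {
    0: {"a", "i"},
    1: {"a", "b", "acc"},
    2: {"acc", "i"},
    3: {"b"},
}

# Hypothetical state transitions.
TRANSITIONS = [(0, 1), (1, 2), (2, 0), (2, 3)]


def shared_bits(state_vars, var_bits):
    """T[i][j] = total bits of all variables used in both state i and state j."""
    n = len(state_vars)
    t = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        bits = sum(var_bits[v] for v in state_vars[i] & state_vars[j])
        t[i][j] = t[j][i] = bits
    return t


def transition_matrix(n, transitions):
    """E[i][j] = 1 if there is a transition between states i and j, else 0."""
    e = [[0] * n for _ in range(n)]
    for i, j in transitions:
        e[i][j] = e[j][i] = 1
    return e


T = shared_bits(STATE_VARS, VAR_BITS)
E = transition_matrix(len(STATE_VARS), TRANSITIONS)

if __name__ == "__main__":
    for row in T:
        print(row)
```

In this toy example states 1 and 2 share only `acc`, so T[1][2] is 16, while states 0 and 3 share nothing.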

  13. ILP formulation • Minimizes the number of bits shared between the partitions and the number of times that control could pass between the partitions • s_ij is 1 if states s_i and s_j are in the same partition; otherwise it is 0.

  14. ILP formulation - complete
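The complete ILP is not reproduced in this transcript. As a stand-in, the sketch below finds the best two-way split of a small FSMD by exhaustive enumeration, minimizing the shared bits plus the transition edges that cross the partition boundary. It is not the authors' ILP: the weights `w_bits` and `w_edges`, the only constraint used (both partitions non-empty) and the toy matrices are assumptions made for illustration.

```python
from itertools import product

# Hypothetical toy data: T[i][j] = shared bits, E[i][j] = 1 if a transition exists.
T = [
    [0, 8, 4, 0],
    [8, 0, 16, 8],
    [4, 16, 0, 0],
    [0, 8, 0, 0],
]
E = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]


def cut_cost(assign, t, e, w_bits=1.0, w_edges=1.0):
    """Weighted count of shared bits and transition edges crossing the cut."""
    n = len(assign)
    crossing = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if assign[i] != assign[j]
    ]
    bits = sum(t[i][j] for i, j in crossing)
    edges = sum(e[i][j] for i, j in crossing)
    return w_bits * bits + w_edges * edges


def best_two_way_partition(t, e, w_bits=1.0, w_edges=1.0):
    """Try every split with both sides non-empty; return (cost, assignment)."""
    n = len(t)
    best = None
    # State 0 is pinned to partition 0 so mirror-image splits are not repeated.
    for rest in product((0, 1), repeat=n - 1):
        assign = (0,) + rest
        if 1 not in assign:
            continue  # partition 1 would be empty
        cost = cut_cost(assign, t, e, w_bits, w_edges)
        if best is None or cost < best[0]:
            best = (cost, assign)
    return best


if __name__ == "__main__":
    print(best_two_way_partition(T, E))
```

Enumeration grows as 2^(n-1), so it is only useful as a reference for small machines. With no further constraints the cheapest cut tends to isolate a single state, which suggests that a practical formulation needs additional constraints.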

  15. Simulated Annealing formulation • x_i is -1 if state s_i is in the left partition, and 1 if s_i is in the right partition • These quantities count the number of variable bits and transition edges shared between the two partitions

  16. Simulated Annealing formulation • Simplification steps • Observe that one of the terms is constant (the total number of variable bits)

  17. Simulated Annealing formulation • Minimizes both the shared bits and the transition edges.
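A minimal simulated-annealing sketch of this idea follows. It uses the ±1 encoding and the fact that (1 - x_i·x_j)/2 is 1 exactly when states i and j sit in different partitions, so the cost is the sum of W_ij·(1 - x_i·x_j)/2 over state pairs, with W_ij = T_ij + E_ij. The equal weighting of bits and edges, the cooling schedule, the rule that neither partition may become empty and the toy matrix are all assumptions, and the code is plain Python rather than the MATLAB implementation mentioned in the talk.

```python
import math
import random

# Hypothetical toy weights W[i][j] = T[i][j] + E[i][j] for a four-state FSMD.
W = [
    [0, 9, 5, 0],
    [9, 0, 17, 8],
    [5, 17, 0, 1],
    [0, 8, 1, 0],
]


def cut_cost(x, w):
    """Sum of w[i][j] over pairs in different partitions, with x[i] in {-1, +1}."""
    n = len(x)
    return sum(
        w[i][j] * (1 - x[i] * x[j]) / 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def anneal(w, steps=5000, t_start=10.0, cooling=0.999, seed=0):
    """Return (best assignment, best cost) found by single-state flips."""
    rng = random.Random(seed)
    n = len(w)
    # Alternate states so that both partitions start non-empty.
    x = [-1 if i % 2 == 0 else 1 for i in range(n)]
    cost = cut_cost(x, w)
    best_x, best_cost = x[:], cost
    temp = t_start
    for _ in range(steps):
        temp *= cooling  # geometric cooling
        i = rng.randrange(n)
        if sum(1 for v in x if v == x[i]) == 1:
            continue  # flipping state i would empty its partition
        x[i] = -x[i]
        new_cost = cut_cost(x, w)
        delta = new_cost - cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
            if cost < best_cost:
                best_x, best_cost = x[:], cost
        else:
            x[i] = -x[i]  # undo the rejected flip
    return best_x, best_cost


if __name__ == "__main__":
    print(anneal(W))
```

Recomputing the full cost after each flip keeps the sketch short; for larger machines, only the terms involving the flipped state need to be updated.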

  18. Evaluation • Implemented four integer algorithms • 8-bit counter • 5/3 wavelet transform using lifting • Multiplierless approximation to the eight-point Discrete Cosine Transform (DCT) • Integer transform from the H.264 standard • Used CoDeL to implement the designs • Trace data were obtained from simulations using Synopsys • The ILP model was solved using the CPLEX solver included in the AIMMS modeling environment • Simulated annealing was run in MATLAB

  19. Evaluation cont’d • Power savings were estimated (no partitioned design has been implemented yet) • The static power savings depend on the size of the sequential logic and the portion of time spent in each partition. • The dynamic power savings depend on the number of bits that are not clocked while a partition is powered down, offset by the data-communication overhead incurred when the active partition changes.
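The savings formulas themselves are not captured in this transcript. The sketch below illustrates the stated dependence for static power only, under a deliberately simple model that is an assumption and not the authors' estimate: leakage is taken to be proportional to the number of sequential bits in a partition, a partition is fully gated whenever it is inactive, and gating overhead is ignored.

```python
def static_saving_fraction(bits, active_fraction):
    """Fraction of static power saved under the assumed leakage model.

    bits[p]            -- sequential bits in partition p
    active_fraction[p] -- fraction of run time during which partition p is active
    """
    total_bits = sum(bits)
    if total_bits == 0:
        raise ValueError("design has no sequential bits")
    if abs(sum(active_fraction) - 1.0) > 1e-9:
        raise ValueError("active fractions must sum to 1")
    # A partition leaks only while active, so it saves bits * (1 - active time).
    gated = sum(b * (1.0 - f) for b, f in zip(bits, active_fraction))
    return gated / total_bits


if __name__ == "__main__":
    # Two equal partitions, each active half of the time.
    print(static_saving_fraction([32, 32], [0.5, 0.5]))
```

Under this model, two equally sized partitions that are each active half of the time save half of the static power. A design that spends nearly all of its time in one large partition saves very little.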

  20. Evaluation (Static Power savings)

  21. Evaluation (Dynamic Power Savings)

  22. Results (ILP)

  23. Results (Simulated Annealing)

  24. Discussion • Results show that partitioning the controller and datapath together could save up to 50% of static power • Some circuits could not be partitioned (the DWT includes one tight loop in which it spends more than 90% of the time) • For the circuits that could be partitioned, simulated annealing and ILP give identical results. • Simulated annealing is much faster.

  25. Future • Extend methodology to more than 2 partitions • Implement the partitioned FSMD machines and confirm the realized power savings • Lower energy!
