
Hardware/Software Mechanisms for Cross-Layer Power Proportionality “Power Prop”

Hardware/Software Mechanisms for Cross-Layer Power Proportionality “Power Prop”. Alex Yakovlev, Andrey Mokhov, Sascha Romanovsky, Max Rykunov, Alexei Iliasov and Danil Sokolov, Schools of EEE and CS, Newcastle University. Power Prop: “The more you get, the more you give!”

Presentation Transcript


  1. Hardware/Software Mechanisms for Cross-Layer Power Proportionality “Power Prop”. Alex Yakovlev, Andrey Mokhov, Sascha Romanovsky, Max Rykunov, Alexei Iliasov and Danil Sokolov, Schools of EEE and CS, Newcastle University. Power Prop: “The more you get, the more you give!”

  2. Moore’s law and Power trends

  3. Part I: Power Proportionality

  4. Power Proportionality
     Issues reported in literature:
     • The performance-power tradeoff for commodity systems is linear; the best strategy is “race to sleep”; additional “run” power states are of little use; changes in existing commodity operating systems have little influence
     • The focus should be on the time to transition to and from sleep!
     • For new types of systems such as WSNs there is a non-linear region; the slogan is: learn how to run CMOS slowly and exploit scheduling optimizations
     Figure: Core i7 power drawn at different frequencies. Source: S. Dawson-Haggerty et al., Power Optimization – Reality Check, UC Berkeley, 2009
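The “race to sleep” argument on slide 4 can be illustrated with a toy energy model. The sketch below assumes active power is linear in clock frequency (a fixed idle term plus a term proportional to frequency) and that the system drops to a low sleep power once the work is done. The function name, parameters and all numbers are hypothetical; they are not taken from the Berkeley measurements cited on the slide.

```python
def energy_joules(freq_hz, cycles, deadline_s,
                  p_idle_w=10.0, k_w_per_hz=5e-9, p_sleep_w=1.0):
    """Energy to finish `cycles` of work before `deadline_s`, then sleep.

    Assumed model (illustrative only): P_active(f) = p_idle_w + k_w_per_hz * f.
    """
    run_time = cycles / freq_hz
    if run_time > deadline_s:
        raise ValueError("frequency too low to meet the deadline")
    active_power = p_idle_w + k_w_per_hz * freq_hz
    sleep_time = deadline_s - run_time
    return active_power * run_time + p_sleep_w * sleep_time


work = 2e9  # cycles of work, hypothetical
slow = energy_joules(1e9, work, deadline_s=4.0)  # run slowly, sleep briefly
fast = energy_joules(3e9, work, deadline_s=4.0)  # race, then sleep longer
print(f"slow: {slow:.1f} J, fast: {fast:.1f} J")
```

Under this linear model the frequency-dependent energy is the same for both runs, so the faster run wins simply because it spends less time paying the idle power. This is also why the time to enter and leave sleep, which the model ignores, becomes the quantity to optimise.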

  5. Power proportionality Service-modulated processing Energy-modulated processing

  6. Part II: Reconfigurable Processors

  7. Achieving Power Proportionality
     • Support for wide range of voltages
       • Asynchronous design
       • Unstable voltage supply (energy harvesting)
     • Components optimised for different modes
       • Survival mode (power)
       • Mission mode (energy efficiency)
       • Emergency mode (performance)
     • Reconfigurable instructions
       • Altering instruction behaviour at runtime

  8. Pathway from a high-level specification to a low-level MCU implementation CS + EE Chip design CS + EE Chip tapeout CS + EE Chip Testing EE

  9. Reconfigurable Instructions DP3(x, y) = x1y1 + x2y2 + x3y3

  10. Resource-level refinement
      • Functionality: DP3(x, y) = x1y1 + x2y2 + x3y3
      • Abstract specification:
        • Initialisation: c := 0
        • Invariant: (c = 1) => (res = x1y1 + x2y2 + x3y3)
        • Event: if (c = 0) then (res := x1y1 + x2y2 + x3y3 & c := 1)
      • Open the black box and show what is inside:
        - Perform multiplications by 2-input fast multipliers
        - Perform addition by 3-input adder
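The abstract specification on slide 10 can be mimicked as a small executable model. This is only a sketch: the class and the helpers `mul2` and `add3` are hypothetical stand-ins for the 2-input multipliers and the 3-input adder, and the check is a plain runtime assertion, not the formal refinement proof the slide refers to.

```python
from dataclasses import dataclass


def mul2(a, b):
    """Stand-in for a 2-input fast multiplier (hypothetical helper)."""
    return a * b


def add3(a, b, c):
    """Stand-in for a 3-input adder (hypothetical helper)."""
    return a + b + c


@dataclass
class DP3Machine:
    x: tuple
    y: tuple
    c: int = 0    # Initialisation: c := 0
    res: int = 0

    def spec(self):
        # Functionality: DP3(x, y) = x1y1 + x2y2 + x3y3
        return sum(a * b for a, b in zip(self.x, self.y))

    def invariant(self):
        # Invariant: (c = 1) => (res = x1y1 + x2y2 + x3y3)
        return self.c != 1 or self.res == self.spec()

    def event(self):
        # Event, with the black box opened: three multiplications feed one adder.
        if self.c == 0:
            products = [mul2(a, b) for a, b in zip(self.x, self.y)]
            self.res = add3(*products)
            self.c = 1


machine = DP3Machine(x=(1, 2, 3), y=(4, 5, 6))
assert machine.invariant()   # holds trivially while c = 0
machine.event()
assert machine.invariant()   # must still hold after the event fires
print(machine.res)
```

The invariant is checked against `spec`, which never mentions the functional units, so the same check could be reused for any other decomposition of the instruction.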

  11. Reconfigurable Instructions [Figure: implementation options labelled “2 multipliers”, “Balanced”, “Fastest”, “Dedicated component” and “Least peak power”; configuration codes 101, 111, 001, 000, 011]

  12. Reconfigurable Instructions

  13. Reconfigurable Instructions x=1 y=0 z=1

  14. Part III: Intel 8051

  15. Final remarks
      • Towards power proportionality
        • Voltage range: 0.2V – 1.5V
        • Performance range: 2.7K – 67M instructions/sec
      • Survival of components
        • Full capability mode: 0.89V – 1.5V
        • RAM fails at 0.89V
        • Program counter unreliable below 0.74V
        • Asynchronous control survives until 0.2V

  16. PCB board for evaluation: PCB board with FPGA

  17. Project outcomes:
      • Conference and journal papers:
        • Towards Reconfigurable Processors for Power-Proportional Computing, A. Mokhov, M. Rykunov, D. Sokolov and A. Yakovlev, Proceedings of the 12th IEEE Low Voltage Low Power Conference (FTFC), Paris, France, 2013.
        • Design-for-Adaptivity of Microarchitectures, M. Rykunov, A. Mokhov, D. Sokolov, A. Yakovlev and A. Koelmans, Proceedings of the 24th IEEE International Conference on Application-specific Systems, Architectures and Processors, Washington D.C., USA, 2013.
        • Synthesis of processor instruction sets from high-level ISA specifications, A. Mokhov, M. Rykunov, D. Sokolov, A. Yakovlev, A. Iliasov and A. Romanovsky, IEEE Transactions on Computers, 2013.
        • Design of Processors with Reconfigurable Microarchitecture, A. Mokhov, M. Rykunov, D. Sokolov and A. Yakovlev, Journal of Low Power Electronics and Applications, 2013. (Under review).

  18. Project outcomes (cont.):
      • Several MSc projects
      • PhD thesis: “Design of Asynchronous Microprocessor for Power Proportionality” (Nov. 2013).
      • The PowerProp project established several important industrial connections, e.g. Maxeler Technologies, IBM Research, etc.
      • Some PowerProp theory, tool support and software ideas have moved to a new Programme Grant, PRiME (EP/K034448/1).
      • CPU design ideas will be used in the SAVVIE project (EP/K012908/1).
      • Helped to promote joint CS+EE developments in Workcraft (graph-based EDA environment), used in several EPSRC projects.

  19. Thank you!

  20. Parameterised Graphs for formal specification of Multi-modal systems
      DP3 instruction computes dot product x·y = x1·y1 + x2·y2 + x3·y3.
      – declaration of the functional units
        a = unit "2-input adder"
        b = unit "3-input adder"
        c = unit "2-input multiplier"
        d = unit "fast 2-input multiplier"
        e = unit "dedicated DP3 unit"
      – specification of each instruction
        inst_a = (d1 + d2 + d3) -> b
        inst_b = c1 -> c1 1 -> c1 -> b
        inst_c = e
        inst_d = (c2 + c1) -> a + c1 -> c1 -> a
        inst_e = d1 -> d1 -> (a + c1) -> a
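The instruction expressions on slide 20 can be read as graph-building formulas. The sketch below assumes the usual reading in the algebra of parameterised graphs: `+` overlays two graphs (no ordering between them) and `->` connects every vertex on the left to every vertex on the right (a dependency). The `Graph` class and the use of Python's `>>` for `->` are illustrative choices, not the project's tooling, and `d1`, `d2`, `d3` are assumed to be three instances of unit `d`.

```python
class Graph:
    """A tiny directed graph: a set of vertices and a set of (src, dst) edges."""

    def __init__(self, vertices=(), edges=()):
        self.vertices = frozenset(vertices)
        self.edges = frozenset(edges)

    def __add__(self, other):
        # Overlay: union of vertices and edges, no new dependencies.
        return Graph(self.vertices | other.vertices, self.edges | other.edges)

    def __rshift__(self, other):
        # Connect: overlay plus an edge from every left vertex to every right vertex.
        cross = {(u, v) for u in self.vertices for v in other.vertices}
        return Graph(self.vertices | other.vertices,
                     self.edges | other.edges | cross)


def unit(name):
    """A single functional-unit instance as a one-vertex graph."""
    return Graph([name])


d1, d2, d3 = unit("d1"), unit("d2"), unit("d3")  # fast 2-input multipliers
b = unit("b")                                    # 3-input adder
e = unit("e")                                    # dedicated DP3 unit

inst_a = (d1 + d2 + d3) >> b  # three multiplications in parallel, then the adder
inst_c = e                    # the dedicated unit does everything

print(sorted(inst_a.edges))
```

Only `inst_a` and `inst_c` are encoded here because their structure is unambiguous on the slide. Note that Python gives `+` higher precedence than `>>`, so the longer expressions would need explicit parentheses to match whatever precedence the slide's notation intends.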

  21. Parameterised Graphs for formal specification of Multi-modal systems

  22. Intel 8051 Instruction Set

  23. CJNE Instruction

  24. CJNE Instruction

  25. CJNE Instruction Branch taken Branch not taken

  26. Measurements: Current & Latency

  27. Measurements: Power

  28. Measurements: Energy Efficiency

  29. Some measurements…
      • 0.89V to 1.5V: full capability mode.
      • 0.74V to 0.89V: at 0.89V the RAM starts to fail, so the chip can only operate using internal registers.
      • 0.22V to 0.74V: at 0.74V the program counter starts to fail; however, the control logic synthesised using the CPOG model continues to operate correctly down to 0.22V.
      • 67 MIPS at 1.2V.
      • ~2700 instructions per second at 0.25V.
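The voltage bands on slide 29 amount to a lookup from supply voltage to the capabilities that survive. A minimal sketch follows; the function name and the returned labels are hypothetical, and the treatment of the exact boundary values (a component counts as failed at its threshold) is an assumption, since the slide only says where failures start.

```python
RAM_FAIL_V = 0.89       # RAM starts to fail here (from the slide)
PC_FAIL_V = 0.74        # program counter starts to fail here (from the slide)
CONTROL_MIN_V = 0.22    # control logic reported working down to here
MAX_MEASURED_V = 1.5    # top of the reported range


def operating_mode(vdd):
    """Classify a supply voltage into the bands reported in the measurements."""
    if vdd > MAX_MEASURED_V or vdd < CONTROL_MIN_V:
        return "outside measured range"
    if vdd > RAM_FAIL_V:
        return "full capability"
    if vdd > PC_FAIL_V:
        return "internal registers only"
    return "asynchronous control only"


for v in (1.2, 0.8, 0.25):
    print(f"{v:.2f} V -> {operating_mode(v)}")
```

The two reported performance points, 67 MIPS at 1.2V and about 2700 instructions per second at 0.25V, fall in the first and last of these bands respectively.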
