
Jason Law Byeong Kil Lee



Presentation Transcript


  1. TM5400/5600, TM5500/5800, TM6000. Jason Law, Byeong Kil Lee

  2. Outline
     • Crusoe technology
     • Crusoe processors / architecture
     • Code morphing software
     • Crusoe hardware support for code morphing
     • LongRun power management
     • Performance comparison
     • Conclusion

  3. Crusoe Technology
     • Crusoe processor = software + hardware
     • Code Morphing software
       - Dynamically translates x86 instructions into VLIW instructions
       - Provides x86 compatibility
       - Optimization and scheduling are done by software
     • VLIW hardware
       - 128-bit very long instruction word (VLIW) processor
       - Simple and fast
       - Fewer transistors
     • Diagram labels: 3/4 (software side), 1/4 (hardware side); low power, x86 compatibility, PC performance

  4. Crusoe VLIW

  5. Crusoe Processors
     • L1 cache: 128 KB
     • DDR-SDRAM (100 to 133 MHz)
     • SDRAM (66 to 133 MHz)

  6. Features
     • Lighter
     • Longer battery life
     • Cooler
     • x86 compatibility (Windows / Linux)
     • Upgradeable (by software)
     • Lower cost
     • MMX support (no support for SSE / 3DNow!)
     • Targets: ultra-light mobile notebooks, Internet appliances, high-density servers, embedded devices
     • Products: Sony, Fujitsu, NEC, RLX Technologies, …

  7. Crusoe Architecture TM5800

  8. Crusoe Architecture (cont.)
     • VLIW CPU: executes up to 4 operations per cycle
     • Molecule: a 128-bit long instruction word
     • All atoms within a molecule execute in parallel; molecules execute in order
     • 2 ALUs, 1 FP unit, 1 load/store unit, 1 branch unit
     • In-order 7-stage integer / 10-stage FP pipeline
     • 64 integer registers, 32 FP registers
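As a rough illustration of the issue limits on this slide, the following Python sketch checks whether a group of atoms could share one molecule. The unit names, the `UNIT_LIMITS` table, and `fits_in_molecule` are hypothetical; the slide gives only the unit counts and the four-operation limit, not the real slot encoding.

```python
from collections import Counter

# Unit counts and issue width come from the slide; the names are illustrative.
UNIT_LIMITS = {"alu": 2, "fp": 1, "load_store": 1, "branch": 1}
MAX_ATOMS_PER_MOLECULE = 4


def fits_in_molecule(atom_units):
    """Return True if atoms needing the given units could issue in one molecule."""
    if len(atom_units) > MAX_ATOMS_PER_MOLECULE:
        return False
    counts = Counter(atom_units)
    # Every requested unit must exist and must not be oversubscribed.
    return all(
        unit in UNIT_LIMITS and needed <= UNIT_LIMITS[unit]
        for unit, needed in counts.items()
    )
```

For example, `fits_in_molecule(["alu", "alu", "load_store", "branch"])` is accepted, while three ALU atoms in one molecule are rejected because only two ALUs are available.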

  9. Crusoe vs. x86
     • In the figure, the blue area is silicon and the yellow area is software
     • Crusoe's blue part is smaller
     • All of that hardware was moved off the die and into software

  10. Code Morphing Software
      A dynamic translation system that resides in ROM and is the first program to start executing at boot.
      • Drawing the line between hardware and software
        - Software: decodes x86 instructions and generates parallel molecules
        - Hardware: executes them on a simple, high-speed VLIW engine
      • Decoding and scheduling
      • Translation cache: CMS translates instructions once and saves the resulting translation for reuse → the translation step is skipped the next time

  11. Code Morphing Software: Caching
      • Translation cache
        - Resides in a separate memory space
        - The size can be set at boot time, or the OS can make the size adjustable
      • Crusoe's CMS monitors actual execution
        - Keeps track of which blocks of code execute most often → optimizes them accordingly
        - Keeps track of which branches are most often taken → annotates the code accordingly
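The caching and monitoring behaviour described on this slide can be sketched in a few lines of Python. This is a toy model, not Transmeta's implementation: the `TranslationCache` class, the `translate` callback, and the per-block counters are all assumed names introduced for illustration.

```python
class TranslationCache:
    """Toy translation cache keyed by the x86 address of a code block."""

    def __init__(self, translate):
        self._translate = translate   # hypothetical translator: address -> translation
        self._cache = {}              # x86 block address -> saved translation
        self.exec_counts = {}         # x86 block address -> times executed
        self.translations_done = 0

    def run_block(self, addr):
        """Return the translation for addr, translating only on the first visit."""
        self.exec_counts[addr] = self.exec_counts.get(addr, 0) + 1
        if addr not in self._cache:
            self._cache[addr] = self._translate(addr)
            self.translations_done += 1
        return self._cache[addr]

    def hottest(self, n=1):
        """Addresses of the n most frequently executed blocks."""
        return sorted(self.exec_counts, key=self.exec_counts.get, reverse=True)[:n]
```

A block that runs three times is translated once and then served from the cache; `hottest()` identifies the blocks that a monitor of this kind would hand to the optimizer first.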

  12. Code Morphing Software: Filtering & Prediction
      • Filtering: a wide choice of execution modes for x86 code
        - Interpretation (no translation overhead)
        - Translation
        - Highly optimized code (takes longest to generate, but runs faster once translated)
      • Prediction
        - Highly biased branch: follow the frequently taken path
        - Otherwise: execute both paths and select later
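One way to picture the filtering step is as a tier selector driven by how often a block has executed. The sketch below assumes that policy; the slide does not say how CMS chooses a mode, and the thresholds `TRANSLATE_AFTER` and `OPTIMIZE_AFTER` are made-up values.

```python
# Hypothetical thresholds; the slide gives no numbers.
TRANSLATE_AFTER = 10
OPTIMIZE_AFTER = 100


def choose_mode(exec_count):
    """Pick an execution mode for a block from its execution count so far."""
    if exec_count < TRANSLATE_AFTER:
        return "interpret"    # rarely run: avoid translation overhead
    if exec_count < OPTIMIZE_AFTER:
        return "translate"    # warm: quick translation
    return "optimize"         # hot: worth the longest translation time
```

The design point is that expensive optimization is only paid for on code that runs often enough to earn it back.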

  13. Code Morphing Software: Translation Process
      • 1st pass (frontend)
        - Translates the x86 instructions into simple sequences of atoms (using temporary registers)
      • 2nd pass (optimizer)
        - Applies well-known compiler optimizations: common subexpression elimination, loop-invariant removal, dead code elimination
      • 3rd pass (scheduler)
        - Reorders the optimized atoms and groups them into individual molecules
        - Because scheduling is done in software, it can use more effective scheduling algorithms and consider a larger window of instructions
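The optimizer and scheduler passes can be illustrated with a small Python sketch. It is a simplification under stated assumptions: atoms are side-effect-free, every temporary register is written once (so only read-after-write dependencies matter), and functional-unit limits are ignored. The `Atom` type and both function names are hypothetical.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    dest: str      # register written
    op: str        # operation name (illustrative only)
    srcs: tuple    # registers read


def eliminate_dead_code(atoms, live_out):
    """Optimizer pass: drop atoms whose result is never used (backward scan)."""
    live = set(live_out)
    kept = []
    for atom in reversed(atoms):
        if atom.dest in live:
            kept.append(atom)
            live.discard(atom.dest)
            live.update(atom.srcs)
    kept.reverse()
    return kept


def schedule(atoms, width=4):
    """Scheduler pass: greedily pack atoms into molecules of at most `width` atoms."""
    molecules = []
    ready_at = {}  # register -> index of the first molecule allowed to read it
    for atom in atoms:
        slot = max((ready_at.get(s, 0) for s in atom.srcs), default=0)
        while slot < len(molecules) and len(molecules[slot]) >= width:
            slot += 1                      # molecule is full, try the next one
        while slot >= len(molecules):
            molecules.append([])
        molecules[slot].append(atom)
        ready_at[atom.dest] = slot + 1     # consumers must wait for a later molecule
    return molecules
```

Given a load, a dependent add, an unused multiply, an independent subtract, and a final OR of the add and subtract results, the first pass removes the multiply and the second hoists the subtract into the same molecule as the load.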

  14. Advantages of the Code Morphing Software

  15. Crusoe Hardware Support for Code Morphing
      Crusoe hardware was designed specifically with dynamic translation in mind.
      • Crusoe's solution for exceptions
        - All registers holding x86 state are shadowed (two copies of each register: a working copy and a shadow copy)
        - Normal atoms update only the working copy of a register
          i) No exception encountered: a "commit" operation copies all working registers into the shadow registers
          ii) Exception occurs: a "rollback" operation copies the shadow register values back into the working registers
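The commit/rollback scheme above can be modelled in software as follows. This is only a sketch of the idea: the real mechanism is in hardware, and `ShadowedRegisters` and `run_translation` are hypothetical names.

```python
class ShadowedRegisters:
    """Each register has a working copy and a shadow copy."""

    def __init__(self, names):
        self.working = {name: 0 for name in names}
        self.shadow = dict(self.working)

    def write(self, name, value):
        self.working[name] = value          # normal atoms touch only the working copy

    def commit(self):
        self.shadow = dict(self.working)    # working -> shadow

    def rollback(self):
        self.working = dict(self.shadow)    # shadow -> working


def run_translation(regs, atoms):
    """Run atoms (callables taking regs); commit on success, roll back on exception."""
    try:
        for atom in atoms:
            atom(regs)
    except Exception:
        regs.rollback()
        return False
    regs.commit()
    return True
```

After a rollback the register state is exactly what it was at the last commit, so the faulting x86 code can be re-executed more conservatively from a known-good point.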

  16. Crusoe Hardware Support for Code Morphing (cont.)
      • Store operations are handled by holding store data in a "gated store buffer"
        - Stores are released to the memory system only at the time of a commit
        - On a rollback, stores not yet committed are dropped from the store buffer
      • Safe reordering of loads ahead of stores (alias hardware)
        - The load → a "load-and-protect" (records the data, its address, and its size)
        - The store → a "store-under-alias-mask" (checks for protected regions)
        - If a store would overwrite previously loaded data, the processor raises an exception and the runtime system can take corrective action
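Both mechanisms on this slide can be approximated in Python. The sketch below is a byte-addressed toy model with a dictionary standing in for memory; the class names, method signatures, and the `AliasFault` exception are assumptions for illustration, not the actual hardware interface.

```python
class AliasFault(Exception):
    """Raised when a store overlaps a region protected by an earlier load."""


class GatedStoreBuffer:
    """Holds stores until commit."""

    def __init__(self, memory):
        self.memory = memory   # dict of address -> value, standing in for memory
        self.pending = []      # (address, value) stores not yet committed

    def store(self, addr, value):
        self.pending.append((addr, value))

    def commit(self):
        for addr, value in self.pending:   # release held stores in program order
            self.memory[addr] = value
        self.pending.clear()

    def rollback(self):
        self.pending.clear()               # uncommitted stores are dropped


class AliasTable:
    """Tracks regions recorded by load-and-protect."""

    def __init__(self):
        self.protected = []    # (start address, size in bytes)

    def load_and_protect(self, memory, addr, size):
        self.protected.append((addr, size))
        return [memory.get(addr + i, 0) for i in range(size)]

    def store_under_alias_mask(self, buffer, addr, data):
        end = addr + len(data)
        for start, size in self.protected:
            if addr < start + size and start < end:   # address ranges overlap
                raise AliasFault(f"store at {addr:#x} overlaps load at {start:#x}")
        for i, byte in enumerate(data):
            buffer.store(addr + i, byte)
```

In this model a caller that catches `AliasFault` would roll back and fall back to a translation that keeps the load and store in their original order.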

  17. Sample Translation Code
      [Listing: x86 instructions alongside the translated VLIW molecules]
      • They use 2 integer ALU atoms in a molecule

  18. LongRun Power Management
      • Crusoe was designed for good performance at very low power
      • Power: $P = \frac{1}{2} C V^{2} F$ (C: capacitance, V: voltage, F: frequency)
      • Reduce transistor count to decrease capacitance
      • Scale voltage and frequency dynamically to give just enough performance for the current workload

  19. LongRun Power Management: Dynamic Power Management
      • Frequency changes in steps of 33 MHz
      • Voltage changes in steps of 25 mV
      • Supports up to 200 frequency/voltage changes per second
      • Can give cubic reductions in power consumption
        - Reduces both V² and F
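The "cubic reduction" follows from the power formula on the previous slide: power scales with V² times F, so lowering voltage and frequency by the same factor s scales power by s³. A small Python check, with hypothetical function names:

```python
def dynamic_power(c, v, f):
    """Dynamic power from the slide's formula: P = 1/2 * C * V^2 * F."""
    return 0.5 * c * v ** 2 * f


def relative_power(v_scale, f_scale):
    """Power relative to the baseline when V and F are scaled by the given factors."""
    return v_scale ** 2 * f_scale
```

For instance, `relative_power(0.5, 0.5)` is 0.125: halving both voltage and frequency cuts dynamic power to one eighth, while halving frequency alone would only halve it.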

  20. LongRun Power Management: Conventional Power Profile

  21. LongRun Power Management: LongRun Power Profile

  22. LongRun Power Management: ACPI Standard
      • ACPI: Advanced Configuration and Power Interface
        - Joint standard of Microsoft, Intel, and Toshiba
        - System-level technique to reduce power
      • Allows three low-power states that can be alternated
        - AutoHALT: the processor executes the HLT instruction; the processor stops its internal clock
        - QuickStart: the southbridge sends the processor the STPCLK signal; the processor maintains cache coherency
        - Deep Sleep: the southbridge disables the processor CLK input; the southbridge maintains cache coherency

  23. LongRun Power Management: ACPI vs. LongRun

  24. LongRun Power Management: Intel SpeedStep
      • Statically lowers voltage/frequency settings at startup
      • Two operating points:
        - AC power: full performance
        - DC (battery) power: slightly lower performance
      • Coarse granularity misses opportunities for power savings

  25. LongRun Power Management: How LongRun Compares

  26. Performance
      "The 700 MHz TM5400 was quoted as having comparable performance to a 500-550 MHz Pentium III. Transmeta didn't offer any conventional benchmarks. Rather, it compared the power utilized on a mobile Pentium III to the power utilized on a Crusoe when completing various tasks. It appears that Transmeta would like to dictate to the mobile industry that power is what it's all about, not speed. That is Transmeta's strong suit, but some normal benchmarks would have been nice. Why not show them? If Crusoe did well in those benchmarks, do you think Transmeta wouldn't show them? I'm convinced that the Crusoe is not performing as well as mobile AMD or Intel chips. For the markets it's aimed at, that's not too big a deal, but I'd like to know."
      - From an article by Rob Hughes, Jan 20, 2000

  27. Relative Performance While Mobile (on Batteries): TM5800 vs. Pentium III ULV
      [Chart: relative performance, axis from 0 to 1.0; 2001]

  28. CPUmark99 v1.1 Comparison: CPU + Core Logic Power
      [Chart: power in watts, axis from 0 to 8.0]

  29. Business Graphics Winmark v1.1 Comparison: CPU + Core Logic Power
      [Chart: power in watts, axis from 0 to 8.0]

  30. Conclusion
      • Combination of hardware and software
      • Using software
        - To decompose complex instructions into simple atoms
        - To schedule and optimize the atoms for parallel execution
        → Saves millions of logic transistors
        → Cuts power consumption (60 to 70%)
        → Enables aggressive code optimization techniques
      • LongRun power management
        → Cuts power consumption by a factor of 2 to 10
