TM5400/5600, TM5500/5800, TM6000 • Jason Law, Byeong Kil Lee
Outline • Crusoe technology • Crusoe processors / architecture • Code morphing software • Crusoe hardware support for code morphing • LongRun power management • Performance comparison • Conclusion
Crusoe Technology • Crusoe processor = software + hardware • Code Morphing software (3/4) • Dynamically translates x86 instructions into VLIW instructions • Provides x86 compatibility • Optimization and scheduling are done in software • VLIW hardware (1/4) • 128-bit very long instruction word (VLIW) processor • Simple and fast • Fewer transistors • Together: low power, x86 compatibility, PC performance
Crusoe Processors (table) • L1 cache: 128 KB • DDR SDRAM (100 to 133 MHz) • SDRAM (66 to 133 MHz)
Features • Lighter • Longer battery life • Cooler • x86 compatibility (Windows / Linux) • Upgradeable (via software) • Lower cost • MMX support (no SSE / 3DNow! support) • Target: ultra-light mobile notebooks, internet appliances, high-density servers, embedded devices • Products: Sony, Fujitsu, NEC, RLX Technologies, …
Crusoe Architecture TM5800
Cont. • VLIW CPU: executes up to 4 operations each cycle • Molecule: a 128-bit long instruction word made up of atoms • Molecules execute in order; all atoms within a molecule execute in parallel • Functional units: 2 integer ALUs, 1 FP unit, 1 load/store unit, 1 branch unit • In-order pipeline: 7 stages for integer, 10 stages for FP • 64 integer registers, 32 FP registers
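The sketch below shows how up to four atoms could share one 128-bit molecule. It is illustrative only. The 32-bit atom width, the slot order, and the zero "empty slot" encoding are assumptions. The slide gives only the 128-bit molecule size and the four-operation limit, not Transmeta's actual encoding.

```python
ATOM_BITS = 32                      # assumed width of one atom (hypothetical)
SLOTS = 4                           # up to 4 operations per molecule
MOLECULE_BITS = ATOM_BITS * SLOTS   # 128-bit molecule
NOP = 0                             # assumed encoding of an empty slot

def pack_molecule(atoms):
    """Pack up to four 32-bit atom encodings into one 128-bit integer.
    Slot 0 occupies the lowest 32 bits; unused slots are filled with NOP."""
    if len(atoms) > SLOTS:
        raise ValueError("a molecule holds at most 4 atoms")
    molecule = 0
    for slot, atom in enumerate(list(atoms) + [NOP] * (SLOTS - len(atoms))):
        if not 0 <= atom < (1 << ATOM_BITS):
            raise ValueError("atom does not fit in 32 bits")
        molecule |= atom << (slot * ATOM_BITS)
    return molecule

def unpack_molecule(molecule):
    """Split a 128-bit molecule back into its four atom slots."""
    mask = (1 << ATOM_BITS) - 1
    return [(molecule >> (slot * ATOM_BITS)) & mask for slot in range(SLOTS)]

if __name__ == "__main__":
    m = pack_molecule([0x11111111, 0x22222222])
    print(hex(m), [hex(a) for a in unpack_molecule(m)])
```

Every atom in a molecule is issued together, so the hardware only needs fixed slot positions and no dependency checks.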
Crusoe vs. x86 • In the figure, blue represents silicon and yellow represents software • Crusoe's blue (silicon) portion is smaller • The hardware removed from the die was moved into software
Code Morphing Software (CMS): a dynamic translation system that resides in ROM and is the first program to execute at boot • Drawing the hardware/software line • Software: decodes and schedules x86 instructions and generates parallel molecules • Hardware: executes the molecules on a simple, high-speed VLIW engine • Translation cache: CMS translates instructions once and saves the resulting translation for reuse, so the translation step is skipped the next time the code runs
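The translate-once, reuse-later behavior of the translation cache can be sketched as a dictionary keyed by x86 block address. This is a minimal sketch, not Transmeta's implementation. `TranslationCache` and `fake_translate` are hypothetical names, and the "molecules" are placeholder strings.

```python
from typing import Callable, Dict, List

class TranslationCache:
    """Sketch of a CMS-style translation cache: x86 block address -> molecules.
    The translator and the string 'molecules' are hypothetical placeholders."""

    def __init__(self, translate: Callable[[int], List[str]]):
        self._translate = translate
        self._cache: Dict[int, List[str]] = {}
        self.translations = 0  # how many times the translator actually ran

    def lookup(self, x86_addr: int) -> List[str]:
        # Hit: reuse the saved translation and skip the translation step.
        if x86_addr in self._cache:
            return self._cache[x86_addr]
        # Miss: translate once and keep the result for later executions.
        molecules = self._translate(x86_addr)
        self._cache[x86_addr] = molecules
        self.translations += 1
        return molecules

def fake_translate(addr: int) -> List[str]:
    # Hypothetical translator: pretends each block becomes one molecule.
    return [f"molecule@{addr:#x}"]

if __name__ == "__main__":
    tc = TranslationCache(fake_translate)
    for _ in range(3):
        tc.lookup(0x401000)
    print(tc.translations)  # 1
```

Running the same block three times invokes the translator only once. This is where the translation overhead is amortized.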
Code Morphing Software Caching • Translation cache: • Resides in a separate memory space • Its size can be set at boot time, or the OS can make it adjustable • Crusoe's CMS monitors actual execution: • Keeps track of which blocks of code execute most often and optimizes them accordingly • Keeps track of which branches are most often taken and annotates the code accordingly
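The runtime bookkeeping described above can be modeled with two counters. One counts executions per block. The other records taken/not-taken outcomes per branch, and its taken ratio serves as the branch annotation. The class name and `HOT_THRESHOLD` are hypothetical; the slides do not say how CMS stores its profile.

```python
from collections import Counter, defaultdict

HOT_THRESHOLD = 50  # hypothetical: executions before a block counts as "hot"

class ExecutionProfile:
    """Sketch of CMS-style execution monitoring (names and threshold are assumptions)."""

    def __init__(self):
        self.block_counts = Counter()                     # block address -> executions
        self.branch_stats = defaultdict(lambda: [0, 0])   # branch address -> [taken, not taken]

    def record_block(self, addr):
        self.block_counts[addr] += 1

    def record_branch(self, addr, taken):
        self.branch_stats[addr][0 if taken else 1] += 1

    def hot_blocks(self):
        # Most frequently executed blocks: candidates for heavier optimization.
        return [a for a, n in self.block_counts.most_common() if n >= HOT_THRESHOLD]

    def taken_ratio(self, addr):
        # Annotation for a branch: fraction of executions in which it was taken.
        taken, not_taken = self.branch_stats.get(addr, (0, 0))
        total = taken + not_taken
        return taken / total if total else None
```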
Code Morphing Software Filtering & Prediction • Filtering: CMS chooses among several execution modes for x86 code • Interpretation (no translation overhead) • Translation • Highly optimized translation (takes longest to generate, but runs fastest once translated) • Prediction • Highly biased branch: optimize along the frequently taken path • Otherwise: execute both paths and select the result later
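Filtering and prediction amount to two small policies. The first picks an execution mode from how often code has run. The second picks a branch strategy from how biased the branch is. The thresholds below are hypothetical, since the slides do not give CMS's real values. The sketch only illustrates the shape of the decision.

```python
from enum import Enum

class Mode(Enum):
    INTERPRET = "interpret"   # no translation overhead, slowest execution
    TRANSLATE = "translate"   # quick translation, moderate speed
    OPTIMIZE = "optimize"     # slowest to generate, fastest once translated

# Hypothetical thresholds (not from the slides).
TRANSLATE_AFTER = 10
OPTIMIZE_AFTER = 1000
BIAS_THRESHOLD = 0.9

def choose_mode(exec_count: int) -> Mode:
    """Spend more translation effort only on code that has proven to run often."""
    if exec_count < TRANSLATE_AFTER:
        return Mode.INTERPRET
    if exec_count < OPTIMIZE_AFTER:
        return Mode.TRANSLATE
    return Mode.OPTIMIZE

def branch_strategy(taken_ratio: float) -> str:
    # Highly biased branch: optimize along the frequently followed path.
    if taken_ratio >= BIAS_THRESHOLD:
        return "follow taken path"
    if taken_ratio <= 1 - BIAS_THRESHOLD:
        return "follow fall-through path"
    # Otherwise: compute both paths and select the result once the condition is known.
    return "execute both, select later"
```

Cold code is interpreted, so the cost of optimized translation is paid only where it is likely to be recovered.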
Code Morphing Software Translation Process • 1st pass (frontend): translates the x86 instructions into simple sequences of atoms (using temporary registers) • 2nd pass (optimizer): applies well-known compiler optimizations such as common subexpression elimination, loop-invariant removal, and dead code elimination • 3rd pass (scheduler): reorders the optimized atoms and groups them into individual molecules (scheduling in software allows more effective scheduling algorithms and a larger window of instructions)
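Two of the optimizer's passes, common subexpression elimination and dead code elimination, can be sketched over a straight-line block of atoms. Loop-invariant removal is left out. The `(dest, op, src1, src2)` atom format is hypothetical. The sketch also assumes that every destination is written exactly once and that every op is free of side effects. Those assumptions are what make both passes safe without further analysis.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical atom format: (dest, op, src1, src2); "" marks an unused source.
Atom = Tuple[str, str, str, str]

def cse(atoms: List[Atom], live_out: Set[str]) -> List[Atom]:
    """Common subexpression elimination on a straight-line block.
    Assumes each destination is written once and every op is side-effect free."""
    seen: Dict[Tuple[str, str, str], str] = {}
    rename: Dict[str, str] = {}
    out: List[Atom] = []
    for dest, op, a, b in atoms:
        a, b = rename.get(a, a), rename.get(b, b)
        key = (op, a, b)
        if key in seen:
            if dest in live_out:
                # The value must still land in this register: emit a copy.
                out.append((dest, "mov", seen[key], ""))
            else:
                # Later uses of dest read the earlier result instead.
                rename[dest] = seen[key]
            continue
        seen[key] = dest
        out.append((dest, op, a, b))
    return out

def dce(atoms: List[Atom], live_out: Set[str]) -> List[Atom]:
    """Dead code elimination: walk backwards and keep only atoms whose
    result is used later in the block or is live on exit."""
    live = set(live_out)
    kept: List[Atom] = []
    for dest, op, a, b in reversed(atoms):
        if dest in live:
            kept.append((dest, op, a, b))
            live.discard(dest)
            live.update(s for s in (a, b) if s)
    return list(reversed(kept))

if __name__ == "__main__":
    block = [
        ("t1", "add", "eax", "ebx"),
        ("t2", "add", "eax", "ebx"),   # same expression as t1
        ("t3", "mul", "t2", "ecx"),
        ("t4", "sub", "t1", "ecx"),    # result never used
        ("edx", "mov", "t3", ""),
    ]
    for atom in dce(cse(block, {"edx"}), {"edx"}):
        print(atom)
```

In the example, `t2` is folded into `t1` and the unused `t4` is dropped, leaving three atoms for the scheduler.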
Crusoe Hardware Support for Code Morphing: Crusoe hardware was designed specifically with dynamic translation in mind. • Crusoe's handling of exceptions • All registers holding x86 state are shadowed (two copies of each register: a working copy and a shadow copy) • Normal atoms update only the working copy of a register: i) if the translation completes without an exception, a "commit" operation copies all working registers into the shadow registers; ii) if an exception occurs, a "rollback" operation copies the shadow register values back into the working registers.
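The commit/rollback protocol for shadowed registers can be modeled in a few lines. It is a software model of hardware behavior, and the class, the register names, and the callable "atoms" are hypothetical. Atoms write only the working copy. A clean finish commits, and any exception rolls the working copy back to the last committed x86 state.

```python
class ShadowedRegisters:
    """Sketch of working/shadow register pairs for x86 state (names hypothetical)."""

    def __init__(self, names):
        self.working = {n: 0 for n in names}
        self.shadow = dict(self.working)   # last committed x86 state

    def write(self, name, value):
        # Normal atoms update only the working copy.
        self.working[name] = value

    def commit(self):
        # Translation finished without an exception: working -> shadow.
        self.shadow = dict(self.working)

    def rollback(self):
        # Exception: restore working registers from the committed shadow copy.
        self.working = dict(self.shadow)

def run_translation(regs, atoms):
    """Run a translated block; commit on success, roll back on any exception.
    Each atom is a hypothetical callable that receives the register file."""
    try:
        for atom in atoms:
            atom(regs)
    except Exception:
        regs.rollback()
        return False
    regs.commit()
    return True
```

Rollback makes speculative, reordered atoms safe: a fault mid-block leaves the x86 state exactly as it was at the last commit. What CMS does after a rollback is not modeled here.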
Cont. • Stores are handled by holding store data in a "gated store buffer" • Store data is released to the memory system only at commit time • On a rollback, stores not yet committed are dropped from the store buffer • Alias hardware allows loads to be safely reordered ahead of stores • The load becomes a "load-and-protect", which loads the data and records its address and size • The store becomes a "store-under-alias-mask", which checks for protected regions * If the store overwrites previously loaded data, the processor raises an exception and the runtime system can take corrective action.
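Both mechanisms on this slide can be sketched together. A store buffer holds stores until commit and drops them on rollback. An alias unit records the address ranges of hoisted loads and faults when a later store hits one. The method names mirror the slide's terms, but the logic is an assumed model, not Transmeta's hardware. Forwarding pending stores to later loads is an additional assumption.

```python
class GatedStoreBuffer:
    """Holds stores until commit; a rollback drops them (sketch)."""

    def __init__(self, memory):
        self.memory = memory   # dict: address -> value
        self.pending = []      # uncommitted (address, value) stores, oldest first

    def store(self, addr, value):
        self.pending.append((addr, value))

    def load(self, addr):
        # Assumed forwarding: later atoms see their own pending stores.
        for a, v in reversed(self.pending):
            if a == addr:
                return v
        return self.memory.get(addr, 0)

    def commit(self):
        # Release buffered stores to the memory system in program order.
        for addr, value in self.pending:
            self.memory[addr] = value
        self.pending.clear()

    def rollback(self):
        # Uncommitted stores never reach memory.
        self.pending.clear()


class AliasError(Exception):
    """A store hit a range protected by a load that was moved above it."""


class AliasUnit:
    """Sketch of load-and-protect / store-under-alias-mask checking."""

    def __init__(self):
        self.protected = []    # (start address, size) of each hoisted load

    def load_and_protect(self, buf, addr, size=1):
        # Load the data and remember the address range it came from.
        self.protected.append((addr, size))
        return tuple(buf.load(a) for a in range(addr, addr + size))

    def store_under_alias_mask(self, buf, addr, value):
        # A store into a protected range means the reordering was unsafe.
        for start, size in self.protected:
            if start <= addr < start + size:
                raise AliasError(f"store to {addr:#x} overlaps a protected load")
        buf.store(addr, value)

    def release(self):
        # Protection is only needed until the translation commits.
        self.protected.clear()
```

In the full system, an `AliasError` would trigger a rollback, and the runtime would then re-execute the code without the unsafe reordering.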
Sample Translation Code (figure: x86 instructions and the translated VLIW molecules, which use 2 integer ALU atoms per molecule)
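The scheduler pass that produces molecules like those in the sample can be sketched as greedy list scheduling. Each atom is placed in the earliest molecule where its sources are already computed and a matching functional unit is free, using the unit mix from the architecture slide. The atom format, the one-cycle latency, and the write-once destinations are assumptions. Memory dependencies are ignored here, since the alias hardware is what guards such reordering.

```python
from typing import List, Tuple

# Hypothetical atom: (dest, unit, sources); unit is one of "alu", "fp", "mem", "br".
Atom = Tuple[str, str, Tuple[str, ...]]

UNIT_LIMITS = {"alu": 2, "fp": 1, "mem": 1, "br": 1}  # per molecule
MAX_ATOMS = 4                                          # up to 4 operations per molecule

def schedule(atoms: List[Atom]) -> List[List[Atom]]:
    """Greedy list scheduling into molecules. An atom is ready once every
    source produced inside the block came from an earlier molecule.
    Assumes one-cycle latency and write-once destinations."""
    produced_by = {dest: i for i, (dest, _, _) in enumerate(atoms)}
    remaining = list(range(len(atoms)))
    done = set()          # indices already placed in earlier molecules
    molecules = []
    while remaining:
        used = {u: 0 for u in UNIT_LIMITS}
        molecule = []
        for i in list(remaining):
            _, unit, srcs = atoms[i]
            ready = all(produced_by[s] in done for s in srcs if s in produced_by)
            if ready and used[unit] < UNIT_LIMITS[unit] and len(molecule) < MAX_ATOMS:
                molecule.append(i)
                used[unit] += 1
                remaining.remove(i)
        if not molecule:
            raise ValueError("no atom could be scheduled (cyclic dependency?)")
        done.update(molecule)
        molecules.append([atoms[i] for i in molecule])
    return molecules

if __name__ == "__main__":
    block = [
        ("t1", "alu", ("eax", "ebx")),
        ("t2", "alu", ("ecx", "edx")),
        ("t3", "alu", ("t1", "t2")),
        ("t4", "mem", ("esi",)),
        ("t5", "alu", ("t4",)),
    ]
    for m in schedule(block):
        print([a[0] for a in m])
```

In the example, the independent load (`t4`) moves up beside the two ALU atoms in the first molecule. This is the kind of reordering a larger software scheduling window allows.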
LongRun Power Management • Crusoe was designed for good performance at very low power • Power: $P = \tfrac{1}{2} C V^2 f$ • Reduce transistor count to decrease capacitance • Scale voltage and frequency dynamically to give just enough performance for the current workload
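A short worked example, using the slide's formula, shows why scaling voltage together with frequency pays off. It assumes voltage can be lowered in proportion to frequency by the same factor $k$. That is an idealization: real chips have a minimum operating voltage, so actual savings are smaller.

```latex
P = \tfrac{1}{2} C V^2 f
% Scale both V and f by the same factor k (0 < k \le 1):
P' = \tfrac{1}{2} C (kV)^2 (kf) = k^3 \cdot \tfrac{1}{2} C V^2 f = k^3 P
% Example: half the frequency and half the voltage (k = 1/2)
P' = \left(\tfrac{1}{2}\right)^3 P = \tfrac{1}{8} P
% A fixed task takes t/k instead of t, so energy per task scales as k^2:
E' = P' \cdot \tfrac{t}{k} = k^3 P \cdot \tfrac{t}{k} = k^2 E
```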
LongRun Power Management: Dynamic Power Management • Frequency changes in steps of 33 MHz • Voltage changes in steps of 25 mV • Supports up to 200 frequency/voltage changes per second • Can give cubic reductions in power consumption • Reduces both V² and f
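A minimal sketch of a LongRun-style operating-point choice follows. Only the 33 MHz and 25 mV step sizes come from the slide. The frequency range, the 1.1 to 1.6 V voltage range, and the linear voltage/frequency relation are hypothetical. Given a performance demand, it picks the slowest frequency step that meets the demand and the matching voltage step, then reports the power relative to full speed.

```python
import math

FREQ_STEP_MHZ = 33   # frequency granularity (from the slide)
VOLT_STEP_MV = 25    # voltage granularity (from the slide)

# Hypothetical operating range; real Crusoe parts use their own tables.
F_MIN_MHZ = 300
F_MAX_MHZ = F_MIN_MHZ + 12 * FREQ_STEP_MHZ   # 696 MHz
V_MIN_MV, V_MAX_MV = 1100, 1600

def pick_frequency(demand_mhz: float) -> int:
    """Lowest frequency on the 33 MHz grid that meets the demand, capped at F_MAX."""
    steps = math.ceil(max(demand_mhz - F_MIN_MHZ, 0) / FREQ_STEP_MHZ)
    return min(F_MIN_MHZ + steps * FREQ_STEP_MHZ, F_MAX_MHZ)

def voltage_for(freq_mhz: int) -> int:
    """Assumed linear V-f relation, rounded up to the next 25 mV step."""
    num = (freq_mhz - F_MIN_MHZ) * (V_MAX_MV - V_MIN_MV)
    den = (F_MAX_MHZ - F_MIN_MHZ) * VOLT_STEP_MV
    steps = -(-num // den)   # integer ceiling division
    return V_MIN_MV + steps * VOLT_STEP_MV

def relative_power(freq_mhz: int, volt_mv: int) -> float:
    """Dynamic power relative to the top operating point (P = 1/2 C V^2 f, C fixed)."""
    return (volt_mv / V_MAX_MV) ** 2 * (freq_mhz / F_MAX_MHZ)

if __name__ == "__main__":
    for demand in (150, 450, 700):
        f = pick_frequency(demand)
        v = voltage_for(f)
        print(f"demand {demand} MHz -> {f} MHz @ {v} mV, power x{relative_power(f, v):.2f}")
```

A real governor would re-run this choice periodically as the workload changes. The slide quotes up to 200 changes per second.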
LongRun Power Management Conventional Power Profile
LongRun Power Management LongRun Power Profile
LongRun Power Management ACPI Standard • ACPI: Advanced Configuration and Power Interface • Joint standard of Microsoft, Intel, and Toshiba • System-level technique to reduce power • Defines three processor low-power states that the system can switch between • AutoHALT: the processor executes a HLT instruction and stops its internal clock • QuickStart: the southbridge asserts the processor's STPCLK signal; the processor still maintains cache coherency • Deep Sleep: the southbridge disables the processor's CLK input; the southbridge maintains cache coherency
LongRun Power Management ACPI vs. LongRun
LongRun Power Management Intel SpeedStep • Statically lowers voltage/frequency settings at startup • Two operating points: • AC power: full performance • DC power: slightly lower performance • Coarse granularity (only two settings) misses opportunities for power savings
LongRun Power Management How LongRun Compares
Performance The 700 MHz TM5400 was quoted as having comparable performance to a 500-550 MHz Pentium III. Transmeta didn't offer any conventional benchmarks. Rather, it compared the power utilized on a mobile Pentium III to the power utilized on a Crusoe when completing various tasks. It appears that Transmeta would like to dictate to the mobile industry that power is what it's all about, not speed. That is Transmeta's strong suit, but some normal benchmarks would have been nice. Why not show them? If Crusoe did well in those benchmarks, do you think Transmeta wouldn't show them? I'm convinced that the Crusoe is not performing as well as mobile AMD or Intel chips. For the markets it's aimed at, that's not too big a deal, but I'd like to know. - From an article by Rob Hughes, Jan 20, 2000
Relative Performance While Mobile (on Batteries): TM5800 vs. Pentium III ULV (chart, 2001)
CPUmark99 v1.1 Comparison: CPU + core logic power in watts (chart)
Business Graphics Winmark v1.1 Comparison: CPU + core logic power in watts (chart)
Conclusion • A combination of hardware and software • Using software to decompose complex instructions into simple atoms and to schedule and optimize the atoms for parallel execution: - saves millions of logic transistors - cuts power consumption (by 60 to 70%) - enables aggressive code optimization techniques • LongRun power management: cuts power consumption by a factor of 2 to 10