DIXIE: Binary Translation and Optimization for Multiple ISAs. Computer Architecture Department, Universitat Politècnica de Catalunya-Barcelona. www.ac.upc.es/dixie
UPC people involved • Roger Espasa • Agustín Fernández • Manel Fernández • Victor Moya • Juan Lopez • Silvia Cernuda • Antonio Parada • Albert Ribé • Álex Ramírez
Dixie • Static binary translator • Accepts multiple ISAs (Alpha, x86, PPC, Mips, Convex) • Translates to a common IR (Dixie ISA) • Static binary instrumentation • Works on common IR but reflects source ISA • Static binary optimizer • Optimizes the common IR • Generates native code from common IR • Multiple targets supported also (Alpha, Mips) • Dixie Virtual Machine • Can run binaries specified in the common IR • Also runs binaries with mixture of common/native code
Dixie overview [Figure: block diagram. Components: Jango (driven by a user specification), Dixiec, Speedy, DVM (Dixie Virtual Machine), user simulator. Labels: native ISAs (Alpha, Convex, Mips, PowerPC, x86), Dixie binary, target ISAs, target binaries.]
Outline • Motivation • DIXIE Architecture • Debugging Tools • Performance • Summary
Binary Translation • For embedded processors • Embedded market is • Rapidly moving • Changes processors frequently • Software (development, porting) is a major cost issue • Binary translation is cheaper than retargeting gcc • Goals • Retargeting must be FAST and EASY • Support different ISAs • Provide good debugging tools • To ease writing ISA description • To verify correctness of translations • Techniques • Static Translation (as much as possible) • Some Dynamic Translation (only if necessary)
Binary Optimization • Inevitably, binary translation introduces overheads • Use static and dynamic optimization to • Adapt better to new chip • Offset overheads of static binary translation • Goals • Eliminate overheads due to • Manual translation process • Intermediate ISA lack of expressiveness • Incremental development of the optimizer • Techniques • Static optimization (as much as possible) • Dynamic optimization (only if necessary) • Optimized blocks still run within Virtual Machine
Instrumentation • Instrumentation of program binaries • For computer architecture research • Due to lack of access to ‘exotic’ machines • Historical origin of Dixie… • Many classes of tools, but... • Different tools for different machines • Porting tools is difficult • Few tools allow research on vector machines or new ISAs • Lack of wrong-path information • Dixie goals • Cross-platform instrumentation • Research on multiple & discontinued ISAs • Full architecture coverage • Wrong-path information
Outline • Motivation • DIXIE Architecture • Debugging Tools • Performance • Summary
Dixie compiler [Figure: the Dixie overview block diagram, repeated under this slide title.]
Jango [Figure: the Dixie overview block diagram, repeated under this slide title.]
Breakpoints: trace
• Source code: mov a0,a1 ; ld.w @8(a1),a2 ; sub.w #8,a2
• After DIXIEC: MOV.lo.32 r11,r10 ; LOAD.lo.32 r500,r11,#8 ; LOAD.lo.32 r12,r500,#0 ; SUB.c2.32 r12,r12,#8
• After JANGO: MOV.lo.32 r11,r10 ; TRACE vpc,r11,#8 ; LOAD.lo.32 r500,r11,#8 ; TRACE vpc,r500,#0 ; LOAD.lo.32 r12,r500,#0 ; SUB.c2.32 r12,r12,#8
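The trace insertion shown on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `insert_load_traces` is hypothetical, instructions are handled as plain strings, and the rule "put a TRACE before every LOAD" is an assumption read off the slide's example, not Jango's actual user-specification mechanism.

```python
def insert_load_traces(instructions):
    """Return a new list with a TRACE placed before every LOAD.

    Assumption: each instruction is "OPCODE.suffixes op1,op2,op3" and a
    LOAD has the operands (destination, base register, offset).
    """
    out = []
    for text in instructions:
        opcode, _, operands = text.partition(" ")
        if opcode.split(".")[0] == "LOAD":
            _dest, base, offset = [o.strip() for o in operands.split(",")]
            # Report the address operands of the load to the simulator.
            out.append(f"TRACE vpc,{base},{offset}")
        out.append(text)
    return out


# The Dixie code from the slide, before instrumentation.
dixie_code = [
    "MOV.lo.32 r11,r10",
    "LOAD.lo.32 r500,r11,#8",
    "LOAD.lo.32 r12,r500,#0",
    "SUB.c2.32 r12,r12,#8",
]
instrumented = insert_load_traces(dixie_code)
```

Applied to the four Dixie instructions above, the sketch yields the six-instruction sequence from the slide, with one TRACE ahead of each LOAD.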
Speedy & DVM [Figure: the Dixie overview block diagram, repeated under this slide title.]
Speedy & DVM • Dixie binary is optimized by Speedy • Optimizations at basic block (BB) level • Translate Dixie BBs into native code • Generates .speedy sections • Dixie binary is runnable on top of the DVM • Emulates the behavior of each Dixie instruction • Interpreting each Dixie instruction • Jumping into sequences of “Speedy” BBs • Interacts with the user simulator • Through trace instructions inserted by Jango • Maps target system calls into host system calls • Through DixOS
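A minimal sketch of how a virtual machine can mix interpretation with jumps into pre-translated blocks, as this slide describes. Everything here is hypothetical: the opcodes `MOV`, `ADDI` and `SUBI`, the tuple encoding, and the `speedy_blocks` table (a Python callable standing in for native code) are illustrative assumptions, not the real Dixie ISA or DVM interfaces.

```python
def run(program, speedy_blocks, regs, pc=0):
    """Interpret `program`, jumping into a pre-translated block when one exists.

    `speedy_blocks` maps a program counter to a callable that applies the
    effect of a whole basic block and returns the next program counter.
    """
    while pc < len(program):
        if pc in speedy_blocks:
            pc = speedy_blocks[pc](regs)  # run the "native" block
            continue
        op, dst, src, imm = program[pc]
        if op == "MOV":
            regs[dst] = regs[src]
        elif op == "ADDI":
            regs[dst] = regs[src] + imm
        elif op == "SUBI":
            regs[dst] = regs[src] - imm
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1
    return regs


program = [
    ("MOV", "r11", "r10", None),
    ("ADDI", "r12", "r11", 8),
    ("SUBI", "r12", "r12", 8),
]


def optimized_block(regs):
    # Stands in for instructions 1 and 2: adding then subtracting 8 cancels.
    regs["r12"] = regs["r11"]
    return 3  # continue after the block


start = {"r10": 5, "r11": 0, "r12": 0}
interpreted = run(program, {}, dict(start))
mixed = run(program, {1: optimized_block}, dict(start))
```

Both runs should leave the registers in the same state, which is the property a mixed common/native binary has to preserve.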
DVM Portability • DVM runs on all major hardware combinations: • 32 bits, big endian: Power2/AIX, Sparc/SUNOS • 32 bits, little endian: x86/LINUX • 64 bits, big endian: MIPS/IRIX • 64 bits, little endian: Alpha/OSF1, IA64/LINUX
Speedy Architecture • Front End: Understands Dixie ISA • Optimizes Dixie Code (NOP, VPC, CSE) • Lowers Representation • Load Virtual Registers into physical registers • Local register allocation • Load large constants into registers • Back End: Translates Dixie ISA into target ISA • Instruction translation • Opcode selection • Big/Little endian memory access • Alignment issues • Peephole Optimizer • Recognize instruction sequences • Remove redundant loads • Remove redundant branches
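One of the peephole steps listed above, removing redundant loads, could look roughly like the following sketch. The tuple encoding `(op, dst, src, imm)` and the opcode names are assumptions made for illustration, and the pass is deliberately conservative; it is not a description of Speedy's actual implementation.

```python
def remove_redundant_loads(block):
    """Drop a LOAD whose value is already in its destination register.

    `block` is one basic block, given as (op, dst, src, imm) tuples. For
    LOAD, `src` is the base register and `imm` the offset. For STORE, the
    `dst` field holds the register being stored and is not written.
    """
    available = {}  # (base, offset) -> register currently holding that value
    out = []
    for op, dst, src, imm in block:
        if op == "LOAD" and available.get((src, imm)) == dst:
            continue  # the same value is already in dst
        out.append((op, dst, src, imm))
        if op == "STORE":
            available.clear()  # memory may have changed: forget everything
            continue
        # dst is overwritten, so drop every fact that mentions it.
        available = {
            key: reg
            for key, reg in available.items()
            if reg != dst and key[0] != dst
        }
        if op == "LOAD" and src != dst:
            available[(src, imm)] = dst
    return out


block = [
    ("LOAD", "r12", "r11", 8),
    ("LOAD", "r12", "r11", 8),  # redundant: nothing changed in between
    ("SUB", "r12", "r12", 8),
    ("LOAD", "r12", "r11", 8),  # needed again: r12 was overwritten
]
optimized = remove_redundant_loads(block)
```

Clearing all facts on a store trades missed optimizations for safety, since the sketch does no alias analysis.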
Outline • Motivation • DIXIE Architecture • Debugging Tools • Performance • Summary
Debugging • Porting to a new ISA is not easy • Many “cut-and-paste” bugs • A trivial bug may take weeks to be found without appropriate tools • We would like developers to • “Test-as-you-go” every instruction description • Test each instruction almost in isolation • Quickly compare DVM and native results
• Example instruction description for andiu. ra, rs, ui:
MOV.lo.32 r(TMP0),ui
SHL.lo.32 r(TMP0),r(TMP0),32
AND.lo.32 (ra),r(rs),r(TMP0)
CMPLT.c2.32 r(ICR(POSCRI(0))),r(ra),0
CMPGT.c2.32 r(ICR(POSCRI(1))),r(ra),0
CMPEQ.lo.32 r(ICR(POSCRI(2))),r(ra),0
AND.lo.32 r(TMP0),r(XER),0x80000000
CMPNE.lo.32 r(ICR(POSCRI(3))),r(TMP0),0
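The "test-as-you-go" idea of comparing DVM and native results for one instruction at a time can be sketched as a small differential test. The instruction here (a 32-bit subtract), the function names, and the planted bug are all hypothetical; they are not taken from the Dixie tools or from the description on this slide.

```python
MASK32 = 0xFFFFFFFF


def native_sub32(a, b):
    """Reference behaviour: 32-bit subtraction with wraparound."""
    return (a - b) & MASK32


def described_sub32(a, b):
    """Instruction description under test, with a planted bug."""
    return a - b  # bug: forgets to wrap the result to 32 bits


def compare(reference, candidate, inputs):
    """Return (args, expected, got) for every input where the two differ."""
    mismatches = []
    for args in inputs:
        expected, got = reference(*args), candidate(*args)
        if expected != got:
            mismatches.append((args, expected, got))
    return mismatches


mismatches = compare(native_sub32, described_sub32, [(8, 8), (9, 1), (0, 1)])
```

Only the input that underflows exposes the bug, which is why testing each instruction in isolation with edge-case operands tends to pay off.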
Outline • Motivation • DIXIE Architecture • Debugging Tools • Performance • Summary
Performance • Benchmark suite • SPECint95 • Environment • DEC Alpha AXP-21264 running at 625 MHz • OSF/1 v4.0 • Two versions of the Dixie binaries • DVM: “pure” Dixie binaries • Speedy: Dixie binaries optimized using Speedy
DVM slowdown: Alpha on Alpha [Figure: chart; its data is not recoverable from the extracted text.]
Outline • Motivation • DIXIE Architecture • Debugging Tools • Performance • Summary
Summary • Binary translation & optimization • Are becoming important tools in the embedded market • Promise lower development costs • When changing architectures • Are also of interest to major computer manufacturers • IA-64 emulation • Transmeta • FX!32 (now obsolete) • DIXIE • Robust tool that meets most translation demands • Multi-ISA, Multi-platform