ISAMAP: Instruction Mapping Driven by Dynamic Binary Translation. Maxwell Souza, Daniel Nicácio and Guido Araujo. Motivation. Architecture diversity is increasing There is a need for legacy code to use new architecture features Code portability between architectures is also desirable
ISAMAP: Instruction Mapping Driven by Dynamic Binary Translation Maxwell Souza, Daniel Nicácio and Guido Araujo
Motivation • Architecture diversity is increasing • There is a need for legacy code to use new architecture features • Code portability between architectures is also desirable • Dynamic Binary Translation (DBT) enables it
ArchC • Processor description language • SystemC compatible • 8 researchers for the last 5 years • Features: • Fast interpreted/compiling simulation • Linux OS syscall emulation • Runs code directly from GCC (allows gdb support) • Processors: • MIPS, SPARC, PPC, 8051, ARM, OR10K, etc. • Runs Mediabench, Mibench and SPEC CInt • Simulation speed: from 100 KIPS to 570 MIPS
Instruction Set Architecture (AC_ISA)
AC_ISA(mips1){
  ac_format Type_R = "%op:6 %rs:5 %rt:5 %rd:5 0x00:5 %func:6";
  ac_format Type_I = "%op:6 %rs:5 %rt:5 %imm:16";
  ac_instr<Type_R> add;
  ac_instr<Type_I> load;
  ISA_CTOR(mips1) {
    add.set_asm("add %reg, %reg, %reg", rd, rs, rt);
    add.set_decoder(op=0x00, func=0x20);
    load.set_asm("lw %reg, %imm(%reg)", rt, imm, rs);
    load.set_decoder(op=0x23);
  };
};
• Slide callouts: binary field, instruction declaration, decoding order
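To make the ac_format idea concrete, here is a minimal Python sketch of how a format string could drive field extraction. It is not ArchC code: `parse_format` and `decode` are hypothetical helpers, and the sketch assumes fields are listed from the most significant bit down.

```python
def parse_format(fmt):
    """Turn an ac_format-style string into a list of (name, width) pairs."""
    fields = []
    for token in fmt.split():
        name, width = token.split(":")
        # "%op" names a field; a bare literal such as "0x00" is kept under its own text.
        fields.append((name.lstrip("%"), int(width)))
    return fields


def decode(word, fmt):
    """Extract every field of `word`, assuming the first field is the most significant."""
    fields = parse_format(fmt)
    shift = sum(width for _, width in fields)
    values = {}
    for name, width in fields:
        shift -= width
        values[name] = (word >> shift) & ((1 << width) - 1)
    return values


TYPE_R = "%op:6 %rs:5 %rt:5 %rd:5 0x00:5 %func:6"

# 0x00221820 is the MIPS encoding of: add $3, $1, $2
fields = decode(0x00221820, TYPE_R)

# Same test as add.set_decoder(op=0x00, func=0x20) on the slide.
is_add = fields["op"] == 0x00 and fields["func"] == 0x20
```

Matching the decoded fields against the values given to set_decoder is enough to pick the instruction in this toy; a real decoder generator would build a lookup structure instead of testing instructions one by one.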
Architecture Resources (AC_ARCH)
AC_ARCH(mips1){
  ac_mem MEM:256K;
  ac_regbank RB:32;
  ac_reg lo, hi;
  ac_pipe PIPE = {IF, ID, EX, MEM, WB};
  ac_format Fmt_EX_MEM = "%alures:32 %wdata:32 %rdest:5 %regwrite:1 %memread:1 %memwrite:1";
  ac_reg<Fmt_EX_MEM> EX_MEM;
  ac_wordsize 32;
  ARCH_CTOR(mips1) {
    ac_isa("mips1_isa.ac");
    . . . .
  };
};
Instruction Behavior (ac_behavior)
void ac_behavior( Type_R, int stage ){
  switch(stage){
    case IF:
    case ID:
      /* Checking forwarding for the rs register */
      if ( (EX_MEM.regwrite == 1) && (EX_MEM.rdest != 0) && (EX_MEM.rdest == ID_EX.rs) )
        operand1 = EX_MEM.alures.read();
      else if ( (MEM_WB.regwrite == 1) && (MEM_WB.rdest != 0) && (MEM_WB.rdest == ID_EX.rs) )
        operand1 = MEM_WB.wbdata.read();
      else
        operand1 = RB.read(rs);
      ...
    default:
      break;
  }
}
Jump and Branches Semantics
• Additional information
• jump() : target computation
• delay() : conditional
call.set_decoder(op=0x01);
call.jump(ac_pc+(disp30<<2));
call.delay(1, true);
call.behavior(writeReg(15, ac_pc));
be.set_decoder(op=0x00, cond=0x01, op2=0x02);
be.branch(ac_pc+(disp22<<2));
be.cond(PSR_icc_z);
be.delay(1, PSR_icc_z || !an);
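The target expressions and the delay condition above can be sketched in Python as follows. This is only an illustration under stated assumptions: the displacement fields are treated as signed word offsets (as in SPARC-style call and branch encodings), addresses wrap at 32 bits, and all function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def call_target(pc, disp30):
    """ac_pc + (disp30 << 2), assuming disp30 is a signed word offset."""
    return (pc + (sign_extend(disp30, 30) << 2)) & MASK32


def branch_target(pc, disp22):
    """ac_pc + (disp22 << 2), assuming disp22 is a signed word offset."""
    return (pc + (sign_extend(disp22, 22) << 2)) & MASK32


def delay_slot_executes(taken, annul):
    """Mirror of be.delay(1, PSR_icc_z || !an): run the slot if taken or not annulled."""
    return taken or not annul
```

For example, `branch_target(0x1000, 0x3FFFFF)` decodes the displacement as -1 word and yields `0xFFC`, the instruction just before the branch.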
ArchC Overview • ArchC Description → ArchC Pre-processor (acpp) → ArchC IR • ArchC IR feeds: Assembler Generator, Simulator Generator, Linker Generator, Back-end Generator, ISAMAP
ISAMAP • Instruction Mapping Description Driven by DBT • Descriptions use ArchC language ISA models • Source architecture ISA • Target architecture ISA • Mapping between source and target • Low-level ISA mapping
Instruction Set Architecture (AC_ISA)
ISA(powerpc) {
  isa_format XO1 = "%opcd:6 %rt:5 %ra:5 %rb:5 %oe:1 %xos:9 %rc:1";
  isa_instr <XO1> add, subf;
  isa_regbank r:32 = [0..31];
  ISA_CTOR(powerpc) {
    add.set_asm("add %reg %reg %reg", rt, ra, rb);
    add.set_decoder(opcd=31, oe=0, xos=266, rc=0);
    subf.set_asm("subf %reg %reg %reg", rt, ra, rb);
    subf.set_decoder(opcd=31, oe=0, xos=40, rc=0);
  }
};
Instruction Set Architecture (AC_ISA)
ISA(x86) {
  isa_format op1b_r32 = "%op1b:8 %mod:2 %regop:3 %rm:3";
  isa_instr <op1b_r32> add_r32_r32, mov_r32_r32;
  isa_reg eax = 0;
  isa_reg ecx = 1;
  ...
  isa_reg edi = 7;
  ISA_CTOR(x86) {
    add_r32_r32.set_operands("add %reg %reg", rm, regop);
    add_r32_r32.set_encoder(op1b=0x01, mod=0x3);
    mov_r32_r32.set_operands("mov %reg %reg", rm, regop);
    mov_r32_r32.set_encoder(op1b=0x89, mod=0x3);
  }
};
ISA Mapping
(add)
isamap_instrs {
  add %reg %reg %reg;
  /*  $0   $1   $2  */
} = {
  mov_r32_r32 edi $1;
  add_r32_r32 edi $2;
  mov_r32_r32 $0 edi;
};
(subf)
isamap_instrs {
  subf %reg %reg %reg;
  /*   $0   $1   $2  */
} = {
  mov_r32_r32 edi $2;
  sub_r32_r32 edi $1;
  mov_r32_r32 $0 edi;
};
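The positional operands $0, $1, $2 suggest a simple template expansion. The Python sketch below illustrates that idea only; the `MAPPINGS` table and the `expand` function are hypothetical and do not reflect how ISAMAP is actually implemented.

```python
# Target-instruction templates copied from the add and subf mappings on the slide.
MAPPINGS = {
    "add": ["mov_r32_r32 edi $1", "add_r32_r32 edi $2", "mov_r32_r32 $0 edi"],
    "subf": ["mov_r32_r32 edi $2", "sub_r32_r32 edi $1", "mov_r32_r32 $0 edi"],
}


def expand(mnemonic, operands):
    """Replace each $n in the templates with the n-th source operand."""
    expanded = []
    for template in MAPPINGS[mnemonic]:
        parts = [
            operands[int(part[1:])] if part.startswith("$") else part
            for part in template.split()
        ]
        expanded.append(" ".join(parts))
    return expanded


# Source instruction: subf r3, r4, r5 (r3 = r5 - r4 on PowerPC)
target = expand("subf", ["r3", "r4", "r5"])
```

Here `target` holds the three x86-style instructions with r5 loaded into edi first and r4 subtracted from it, which matches the reversed operand order of subf. The names r3, r4 and r5 stand in for wherever the translator keeps the source registers.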
ISAMAP Flow • [Figure: flow diagram with the boxes Source ISA, Target ISA, ISA Mapping, acpp, ArchC, DBT Source, DBT Libraries, Compiler, Host Code, ISAMAP]
Overall ISAMAP Structure • Standard DBT implementation • 16MB Code Cache • Block linkage (at first touch) • No traces • Syscall mapping • In addition it provides mapping support • Instruction semantics (load, store, branch, fp) • Register read/write status • Conditional mapping
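As a rough illustration of a code cache with block linkage at first touch, consider the toy Python model below. Everything in it is an assumption made for the sketch: the `Block` and `Translator` classes, the `program` table that maps a source address to its single successor, and the counters. Real translated blocks contain host code and can have several exits.

```python
class Block:
    def __init__(self, pc, next_pc):
        self.pc = pc            # source address of the block
        self.next_pc = next_pc  # source address it exits to (None = program end)
        self.link = None        # direct reference to the successor, once linked


class Translator:
    def __init__(self, program):
        self.program = program  # hypothetical: source pc -> successor pc
        self.cache = {}         # code cache: source pc -> translated Block
        self.lookups = 0
        self.translations = 0

    def lookup(self, pc):
        """Return the cached block for `pc`, translating it on a miss."""
        self.lookups += 1
        if pc not in self.cache:
            self.translations += 1
            self.cache[pc] = Block(pc, self.program[pc])
        return self.cache[pc]

    def run(self, entry):
        """Follow blocks from `entry`, linking each exit the first time it is taken."""
        executed = []
        block = self.lookup(entry)
        while True:
            executed.append(block.pc)
            if block.next_pc is None:
                return executed
            if block.link is None:
                block.link = self.lookup(block.next_pc)  # first touch: link the blocks
            block = block.link                           # later runs skip the lookup


dbt = Translator({0x100: 0x200, 0x200: 0x300, 0x300: None})
first = dbt.run(0x100)
lookups_after_first = dbt.lookups
second = dbt.run(0x100)
```

The first run needs one cache lookup per block. On the second run only the entry block is looked up, because the other transitions follow the links installed earlier.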
Register Read Semantics
• Avoids unnecessary register reads/writes
add_r32_r32.set_asm("add %reg, %reg", rm, regop);
add_r32_r32.set_encoder(op1b=0x01, mod=0x3);
add_r32_r32.set_read(regop);
mov_r32_r32.set_asm("mov %reg %reg", rm, regop);
mov_r32_r32.set_encoder(op1b=0x89, mod=0x3);
mov_r32_r32.set_write(rm);
Conditional Mappings
isamap_instrs {
  or %reg %reg %reg;
} = {
  if ($1 = $2) {
    mov_r32_m32disp edi $1;
    mov_m32disp_r32 $0 edi;
  }
  else {
    mov_r32_m32disp edi $1;
    or_r32_m32disp edi $2;
    mov_m32disp_r32 $0 edi;
  }
}
Conditional Mapping (cont.)
isamap_instrs {
  rlwinm %reg %reg %imm %imm %imm;
} = {
  if ($2 = 0) {
    mov_r32_m32disp edi $1;
    and_r32_imm32 edi mask32($3, $4);
    mov_m32disp_r32 $0 edi;
  }
  else {
    mov_r32_m32disp edi $1;
    rol_r32_imm8 edi $2;
    and_r32_imm32 edi mask32($3, $4);
    mov_m32disp_r32 $0 edi;
  }
};
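The rlwinm mapping can be checked against a small reference model. The Python sketch below assumes that mask32 follows the PowerPC mask definition (ones from bit MB through bit ME, with bit 0 as the most significant bit, wrapping around when MB > ME). The slides do not show the body of mask32, so this definition is an assumption.

```python
MASK32 = 0xFFFFFFFF


def mask32(mb, me):
    """Ones from bit mb through bit me, bit 0 being the most significant bit."""
    begin = MASK32 >> mb
    end = (MASK32 << (31 - me)) & MASK32
    # A wrapped mask (mb > me) is the union of the two runs, otherwise the overlap.
    return begin & end if mb <= me else begin | end


def rotl32(value, sh):
    """Rotate a 32-bit value left by sh bits."""
    sh &= 31
    return ((value << sh) | (value >> (32 - sh))) & MASK32


def rlwinm(rs, sh, mb, me):
    """Rotate left then AND with mask, skipping the rotate when sh is 0."""
    value = rs & MASK32
    if sh != 0:  # mirrors the if/else in the conditional mapping
        value = rotl32(value, sh)
    return value & mask32(mb, me)
```

With these definitions, `rlwinm(0x12345678, 8, 24, 31)` rotates the top byte into the low byte and masks it, giving `0x12`. The `sh == 0` case returns the same value with or without the rotate, which is why the mapping can drop the rol instruction there.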
Mapping PPC Instruction cmp • [Figure: the condition register (CR) drawn as 8 groups, numbered 1 to 8, each 4 bits wide: <, >, =, ov] • Which group out of 8?
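What a cmp mapping has to reproduce can be sketched in Python as below. The sketch assumes the usual PowerPC layout: CR fields numbered 0 to 7 from the most significant end (the slide labels them 1 to 8), with bit order less-than, greater-than, equal, overflow summary. The function names are hypothetical.

```python
def cr_field(a, b, so=0):
    """4-bit result of comparing signed integers a and b: LT, GT, EQ, then SO."""
    if a < b:
        bits = 0b1000
    elif a > b:
        bits = 0b0100
    else:
        bits = 0b0010
    return bits | (so & 1)


def write_cr(cr, field, bits):
    """Replace 4-bit group `field` (0 = most significant) of the 32-bit CR."""
    shift = 4 * (7 - field)
    cleared = cr & ~(0xF << shift) & 0xFFFFFFFF
    return cleared | (bits << shift)


# cmp into field 0 with a < b sets only the top bit of CR.
cr = write_cr(0, 0, cr_field(1, 2))
```

The shift computed from the field number is the part that answers "which group out of 8". When the field number is a constant in the source instruction, that shift can be computed once at translation time.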
Mapping PPC Instruction cmp (cont.) • Careful analysis pays off….
At the end: Optimization Steps • Local register allocation • Copy propagation • Dead-code elimination
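Of the three steps, dead-code elimination is the easiest to show in a few lines. The Python sketch below runs a backward liveness pass over one straight-line block of two-operand instructions. The tuple representation and the `live_out` set are assumptions for illustration, and register allocation and copy propagation are not shown.

```python
def eliminate_dead_code(block, live_out):
    """Drop instructions whose destination is never read afterwards.

    `block` is a list of (op, dst, src) tuples. "mov" overwrites dst with src;
    any other op is treated as dst = dst <op> src, so it also reads dst.
    """
    live = set(live_out)
    kept = []
    for op, dst, src in reversed(block):
        if dst not in live:
            continue           # result is never used: remove the instruction
        if op == "mov":
            live.discard(dst)  # a mov fully redefines dst
        live.add(src)          # the source must be live before this point
        kept.append((op, dst, src))
    kept.reverse()
    return kept


# Two mapped adds in a row that both write r3; only the second result survives.
block = [
    ("mov", "edi", "r4"), ("add", "edi", "r5"), ("mov", "r3", "edi"),
    ("mov", "edi", "r6"), ("add", "edi", "r7"), ("mov", "r3", "edi"),
]
optimized = eliminate_dead_code(block, live_out={"r3"})
```

Because the first value of r3 is overwritten before anyone reads it, all three instructions of the first mapping are removed and `optimized` contains only the last three.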
ISAMAP vs. QEMU (Int) • Speed-ups ranging from 1.12 to 3.01
ISAMAP vs. QEMU (FP) • Not fair, as QEMU was not using SSE
ISAMAP Good Side • Allows for a fast implementation • Isolates the translator issues from mapping • Lets the focus be on the mapping • Can reuse simulator descriptions
ISAMAP Bad Side • Does not allow high-level C descriptions • Still needs to go through asm details • But on the other hand…. • 1 PhD in one year for the tool • 4-6 months for both descriptions and the mapping (no previous experience)
Related Work • Dynamo • ADORE • Aries • Digital FX!32 • UQDBT • Yirr-Ma • DAISY • QEMU • IA-32 EL
Future Work • Additional issues • Self-modifying code • Cover more SPEC programs • Measure mapping vs. tool speedup contribution • Evaluate the translation overhead • From C to x86 • From C to PPC to x86 • Mappings to embedded engines
The End • Work supported by FAPESP and CNPq • Thanks for the feedback!