Emulation: Interpretation and Binary Translation. Haiyu Li lihaiyu@capp.snu.ac.kr
Content
• Emulation
• Interpreter
• Decode-and-Dispatch Interpreter
• Threaded Interpretation
• Direct Threaded Interpretation with Predecoding
• Comparison
• Interpreting a Complex Instruction Set (CISC)
Emulation
[Figure: a guest built for the source ISA runs through an emulator on a host that implements the target ISA. Example: an IA-32 program run by the FX!32 emulator on an Alpha ISA host.]
Emulation
• Emulation is carried out by interpretation or by binary translation.
• Interpretation: fetch a source instruction, analyze it, perform the required operation, then move on to the next instruction.
Interpreter Overview
[Figure: the interpreter code operates on the source memory state and the source context block.]
Decode-and-dispatch interpreter • decodes an instruction • dispatches it to an interpretation routine based on the type of instruction.
Decode-and-dispatch interpreter
Code for interpreting the PowerPC instruction set architecture:
while (!halt && !interrupt) {
    inst = code[PC];
    opcode = extract(inst, 31, 6);
    switch (opcode) {
        case LoadWordAndZero: LoadWordAndZero(inst); break;
        case ALU: ALU(inst); break;
        case Branch: Branch(inst); break;
        · · ·
    }
}
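The interpreter above relies on an extract helper that the slides do not define. The following is a minimal sketch in C, assuming that extract(inst, hi, len) returns the len-bit field whose most significant bit is bit hi, with bit 0 as the least significant bit. That convention is inferred from the calls on the slide, not stated there.

```c
#include <stdint.h>

/* Assumed convention: return the `len`-bit field of `inst` whose most
   significant bit is bit `hi`, counting bit 0 as the least significant. */
static uint32_t extract(uint32_t inst, unsigned hi, unsigned len)
{
    /* Avoid shifting a 32-bit value by 32, which is undefined in C. */
    uint32_t mask = (len >= 32) ? 0xFFFFFFFFu : ((1u << len) - 1u);
    return (inst >> (hi + 1u - len)) & mask;
}
```

With the word 0x80220008, extract(inst, 31, 6) yields 32, extract(inst, 25, 5) yields 1, extract(inst, 20, 5) yields 2, and extract(inst, 15, 16) yields 8.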
Decode-and-dispatch interpreter
Instruction function list:
LoadWordAndZero(inst) {
    RT = extract(inst, 25, 5);
    RA = extract(inst, 20, 5);
    displacement = extract(inst, 15, 16);  /* remaining 16-bit field; lost on the slide, inferred from the layout */
    source = regs[RA];
    address = source + displacement;
    regs[RT] = (data[address] << 32) >> 32;  /* clear the upper 32 bits */
    PC = PC + 4;
}
ALU(inst) {
    RT = extract(inst, 25, 5);
    RA = extract(inst, 20, 5);
    RB = extract(inst, 15, 5);
    source1 = regs[RA];
    source2 = regs[RB];
    extended_opcode = extract(inst, 10, 10);
    switch (extended_opcode) {
        case Add: Add(inst); break;
        case AddCarrying: · · ·
        · · ·
    }
    PC = PC + 4;
}
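To see the decode-and-dispatch structure run end to end, here is a small sketch in C. It is not a PowerPC interpreter: the three opcode values, the 16-word data memory, and the enc helper are hypothetical, and only the field positions follow the slides.

```c
#include <stdint.h>

enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2 };   /* hypothetical opcode values */

static uint32_t regs[32];
static uint32_t data[16];                        /* toy word-addressed data memory */

static uint32_t extract(uint32_t inst, unsigned hi, unsigned len)
{
    return (inst >> (hi + 1u - len)) & ((1u << len) - 1u);   /* len < 32 here */
}

/* Build one instruction word from its fields (hypothetical helper). */
static uint32_t enc(uint32_t op, uint32_t rt, uint32_t ra, uint32_t low16)
{
    return (op << 26) | (rt << 21) | (ra << 16) | (low16 & 0xFFFFu);
}

/* Run `code` until a halt instruction; returns the number of
   instructions interpreted, including the halt. */
static int interpret(const uint32_t *code)
{
    uint32_t pc = 0;          /* byte address, advanced by 4 as on the slide */
    int halt = 0, count = 0;

    while (!halt) {
        uint32_t inst = code[pc / 4];            /* fetch */
        uint32_t rt = extract(inst, 25, 5);      /* decode */
        uint32_t ra = extract(inst, 20, 5);
        switch (extract(inst, 31, 6)) {          /* dispatch */
        case OP_LOAD:
            regs[rt] = data[(regs[ra] + extract(inst, 15, 16)) % 16];
            break;
        case OP_ADD:
            regs[rt] = regs[ra] + regs[extract(inst, 15, 5)];
            break;
        default:                                 /* OP_HALT or unknown opcode */
            halt = 1;
            break;
        }
        pc += 4;
        count++;
    }
    return count;
}
```

Every instruction is re-parsed on each visit and control returns to the central switch each time, which is the cost the later slides set out to reduce.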
Decode-and-dispatch interpreter
• Advantages:
  • Low memory requirements
  • Zero start-up time
• Disadvantages:
  • Steady-state performance is slow
  • A source instruction must be parsed each time it is emulated
  • Puts a lot of pressure on the cache
  • Executes several branches for every interpreted instruction
Decode-and-dispatch interpreter: branches executed per interpreted instruction
while (!halt && !interrupt) { switch (opcode) { case ALU: ALU(inst); · · · } }
1. switch statement -> case: indirect branch
2. Call to ALU(inst): direct branch
3. Return from the routine: indirect branch
4. Loop back-edge: direct branch
[Figure: switch(opcode) -> case -> return]
Threaded Interpretation
• Put the dispatch code at the end of each instruction interpretation routine.
Instruction function list:
Add:
    RT = extract(inst, 25, 5);
    RA = extract(inst, 20, 5);
    RB = extract(inst, 15, 5);
    source1 = regs[RA];
    source2 = regs[RB];
    sum = source1 + source2;
    regs[RT] = sum;
    PC = PC + 4;
    if (halt || interrupt) goto exit;
    inst = code[PC];
    opcode = extract(inst, 31, 6);
    extended_opcode = extract(inst, 10, 10);
    routine = dispatch[opcode][extended_opcode];
    goto *routine;
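The goto *routine on the slide can be written with the labels-as-values extension of GCC and Clang; it is not standard C. The sketch below assumes a toy instruction set of one-byte opcodes acting on a single accumulator (the opcode names are hypothetical) so that the repeated dispatch code stands out.

```c
#include <stdint.h>

enum { OP_HALT = 0, OP_INC = 1, OP_DOUBLE = 2 };   /* hypothetical opcodes */

/* Interpret `code`, which must end with OP_HALT, starting from
   accumulator `acc`; returns the final accumulator value. */
static long run_threaded(const uint8_t *code, long acc)
{
    /* Dispatch table indexed by opcode; &&label is a GNU C extension. */
    static void *dispatch[] = { &&do_halt, &&do_inc, &&do_double };
    unsigned pc = 0;

    goto *dispatch[code[pc]];          /* initial dispatch */

do_inc:
    acc += 1;
    pc += 1;
    goto *dispatch[code[pc]];          /* dispatch code repeated in each routine */

do_double:
    acc *= 2;
    pc += 1;
    goto *dispatch[code[pc]];

do_halt:
    return acc;
}
```

There is no central loop: each routine fetches the next opcode and jumps straight to its routine through the table, so the only branch left per instruction is the indirect one.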
Threaded Interpretation
• Advantages:
  • Low memory requirements (though more than decode-and-dispatch interpretation)
  • Start-up time is zero
• Disadvantages:
  • Steady-state performance is slow (though better than decode-and-dispatch interpretation)
  • Each routine still ends with an indirect branch
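Predecoding
• Parse each instruction once and store it in an intermediate form.
LoadWordAndZero(inst), fields extracted at interpretation time:
    RT = extract(inst, 25, 5);
    RA = extract(inst, 20, 5);
    RB = extract(inst, 15, 5);
With predecoding, fields are read from the intermediate code:
    RT = code[TPC].dest;
    RA = code[TPC].src1;
    displacement = code[TPC].src2;
Source code example:
    lwz r1, 8(r2)    ; load word and zero
    add r3, r3, r1   ; r3 = r3 + r1
    stw r3, 0(r4)    ; store word
[Figure: intermediate form of the three instructions]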
Predecoding
struct instruction {
    unsigned long op;
    unsigned char dest;
    unsigned char src1;
    unsigned int src2;
} code[CODE_SIZE];
• Interpretation is fast, but memory requirements are high.
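A predecoder that fills this structure could look like the sketch below. It assumes every source word uses the register-immediate layout of lwz (6-bit opcode, two 5-bit register fields, 16-bit immediate); a real predecoder would choose the field layout per opcode. The predecode function and its parameters are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

struct instruction {
    unsigned long op;
    unsigned char dest;
    unsigned char src1;
    unsigned int  src2;
};

static uint32_t extract(uint32_t inst, unsigned hi, unsigned len)
{
    return (inst >> (hi + 1u - len)) & ((1u << len) - 1u);   /* len < 32 here */
}

/* Parse each of the `n` source words once and store the fields in `code`.
   Assumption: all words use the register-immediate layout. */
static void predecode(const uint32_t *src, struct instruction *code, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        code[i].op   = extract(src[i], 31, 6);
        code[i].dest = (unsigned char)extract(src[i], 25, 5);
        code[i].src1 = (unsigned char)extract(src[i], 20, 5);
        code[i].src2 = extract(src[i], 15, 16);
    }
}
```

The memory cost is visible here: each 4-byte source instruction becomes a structure several times that size.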
Direct Threaded Interpretation
• The instruction codes in the intermediate code can be replaced with the actual addresses of the interpreter routines.
Dispatch through a table (indirect threading):
    if (halt || interrupt) goto exit;
    opcode = code[TPC].op;
    routine = dispatch[opcode];
    goto *routine;
Dispatch with the address stored in the intermediate code (direct threading):
    if (halt || interrupt) goto exit;
    routine = code[TPC].op;
    goto *routine;
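A runnable sketch of the direct threaded variant follows, again using the GCC/Clang labels-as-values extension. The toy opcodes, the source_inst type, and the MAX_CODE limit are hypothetical. The intermediate form here stores a routine address in place of the opcode, and predecoding runs inside the interpreter function because label addresses are only visible there.

```c
#include <stddef.h>

enum { OP_HALT = 0, OP_ADDI = 1, OP_MULI = 2 };        /* hypothetical opcodes */

struct source_inst { int opcode; long operand; };      /* toy source form */
struct instruction { void *routine; long operand; };   /* intermediate form */

#define MAX_CODE 64

/* Run `src` (n instructions, the last one OP_HALT) on accumulator `acc`. */
static long run_direct_threaded(const struct source_inst *src, size_t n, long acc)
{
    static void *labels[] = { &&do_halt, &&do_addi, &&do_muli };
    struct instruction code[MAX_CODE];
    size_t tpc = 0;

    if (n == 0 || n > MAX_CODE)
        return acc;

    /* Predecode: replace each opcode with the address of its routine. */
    for (size_t i = 0; i < n; i++) {
        code[i].routine = labels[src[i].opcode];
        code[i].operand = src[i].operand;
    }

    goto *code[tpc].routine;

do_addi:
    acc += code[tpc].operand;
    tpc++;
    goto *code[tpc].routine;       /* no table lookup: the address is in the code */

do_muli:
    acc *= code[tpc].operand;
    tpc++;
    goto *code[tpc].routine;

do_halt:
    return acc;
}
```

Compared with the table-based version, each dispatch saves one memory access, at the price of intermediate code that is tied to the addresses of one particular interpreter build.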
Comparison
[Figure with panels (a), (b), (c). Labels: source code; dispatch loop; interpreter routines.]
[Figure, panel (d). Labels: source code; predecoder; intermediate code; interpreter routines.]
RISC ISA & CISC ISA
• RISC ISA (PowerPC): 32-bit registers, 32-bit instruction length.
[Figure: instruction formats register-register, register-immediate, and jump/call, with field boundaries marked at bits 31, 25, 20, 15, and 10.]
Interpreting a Complex Instruction Set
• CISC instruction sets have a wide variety of formats, variable instruction lengths, and even variable field lengths.
IA-32 Instruction Format (fields in order):
    Prefixes               up to four prefixes of 1 byte each (optional)
    Opcode                 1-, 2-, or 3-byte opcode
    ModR/M                 1 byte (if required); bits 7-6 Mod, bits 5-3 Reg, bits 2-0 R/M
    SIB                    1 byte (if required); bits 7-6 SS, bits 5-3 Index, bits 2-0 Base
    Address displacement   1, 2, or 4 bytes, or none
    Immediate data         1, 2, or 4 bytes, or none
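Because the prefixes are optional, an interpreter cannot know where the opcode starts until it has scanned past them. The sketch below counts leading prefix bytes. The byte values are the IA-32 legacy prefixes as documented in Intel's manuals, not taken from the slides, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if `b` is one of the IA-32 legacy prefix bytes. */
static int is_prefix(uint8_t b)
{
    switch (b) {
    case 0xF0: case 0xF2: case 0xF3:               /* LOCK, REPNE, REP */
    case 0x2E: case 0x36: case 0x3E: case 0x26:    /* CS, SS, DS, ES overrides */
    case 0x64: case 0x65:                          /* FS, GS overrides */
    case 0x66: case 0x67:                          /* operand-size, address-size */
        return 1;
    default:
        return 0;
    }
}

/* Count the leading prefix bytes of an instruction (at most four). */
static size_t count_prefixes(const uint8_t *bytes, size_t n)
{
    size_t k = 0;
    while (k < n && k < 4 && is_prefix(bytes[k]))
        k++;
    return k;
}
```

For the byte sequence 66 03 46 08, count_prefixes returns 1, so the opcode is the second byte.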
IA-32 Instruction set
• ModR/M byte, e.g. Mod=10, REG=001, R/M=000
• ModR/M = 10 001 000 = 88H
• Effective address = [EAX] + disp32
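Splitting a ModR/M byte into its three fields is a matter of shifts and masks. A minimal sketch, with a hypothetical struct and function name:

```c
#include <stdint.h>

struct modrm {
    uint8_t mod;   /* bits 7-6 */
    uint8_t reg;   /* bits 5-3 */
    uint8_t rm;    /* bits 2-0 */
};

/* Split a ModR/M byte into its Mod, REG, and R/M fields. */
static struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m;
    m.mod = (uint8_t)((byte >> 6) & 0x3u);
    m.reg = (uint8_t)((byte >> 3) & 0x7u);
    m.rm  = (uint8_t)(byte & 0x7u);
    return m;
}
```

Decoding 88H gives Mod=2 (binary 10), REG=1, and R/M=0, matching the example.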
IA-32 Instruction set
• SIB byte, e.g. SS=01 (scale factor 2); Index=000; Base=001
• SIB value = 01 000 001 = 41H; scaled index = [EAX*2]
• Effective address by Mod bits when Base=101:
    Mod   Effective address
    00    [scaled index] + disp32
    01    [scaled index] + disp8 + [EBP]
    10    [scaled index] + disp32 + [EBP]
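The SIB byte can be decoded the same way. In this sketch the scale factor is computed as 1 << SS, and Index=100 is treated as "no index register", which is how Intel's manuals define that encoding. The struct and function names are hypothetical.

```c
#include <stdint.h>

struct sib {
    uint8_t scale;   /* 1, 2, 4, or 8, from SS (bits 7-6) */
    uint8_t index;   /* bits 5-3 */
    uint8_t base;    /* bits 2-0 */
};

/* Split an SIB byte into scale factor, index, and base fields. */
static struct sib decode_sib(uint8_t byte)
{
    struct sib s;
    s.scale = (uint8_t)(1u << ((byte >> 6) & 0x3u));
    s.index = (uint8_t)((byte >> 3) & 0x7u);
    s.base  = (uint8_t)(byte & 0x7u);
    return s;
}

/* Scaled index term; Index=100 encodes "no index register". */
static uint32_t scaled_index(struct sib s, const uint32_t regs[8])
{
    return (s.index == 4) ? 0u : regs[s.index] * s.scale;
}
```

Decoding 41H gives scale 2, index 0 (EAX), and base 1, so the scaled index term is EAX*2.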
IA-32 Instruction set
• Example: ADD AX, 8[ESI]
• Effective address = ESI + disp, with disp = 8
• AX = AX + mem[8 + [ESI]]
• Mod=01; Reg=000 (AX); R/M=110
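The field values in this example can be checked by assembling the ModR/M byte and computing the effective address for the Mod=01 case. The sketch below handles only base register plus sign-extended disp8; the R/M=100 case, where an SIB byte follows, is left out, and the function names are hypothetical.

```c
#include <stdint.h>

enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };   /* IA-32 register numbers */

/* Assemble a ModR/M byte from its three fields. */
static uint8_t make_modrm(unsigned mod, unsigned reg, unsigned rm)
{
    return (uint8_t)(((mod & 3u) << 6) | ((reg & 7u) << 3) | (rm & 7u));
}

/* Effective address for Mod=01: base register plus sign-extended disp8.
   R/M=100 (SIB byte follows) is not handled in this sketch. */
static uint32_t ea_mod01(const uint32_t regs[8], uint8_t modrm, int8_t disp8)
{
    return regs[modrm & 7u] + (uint32_t)(int32_t)disp8;
}
```

With Mod=01, Reg=000, and R/M=110 the ModR/M byte is 46H, and with ESI = 1000H the effective address is 1008H.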
Interpreting a Complex Instruction Set
• Program flow for a basic CISC ISA interpreter (Figure 2.12, code)
[Figure: General Decode (fill in instruction structure) -> Dispatch -> one specialized routine per instruction (Inst. 1 specialized routine, ...)]
Interpreting a Complex Instruction Set
• Interpreter based on optimization of common cases
[Figure: Dispatch on first byte -> Simple Inst. 1 specialized routine ... Simple Inst. m specialized routine; Complex Inst. m+1 specialized routine ...; Prefix: set flags. The complex routines use shared routines.]
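Dispatching on the first byte can be sketched with a 256-entry table of handlers. The example below is a toy under stated assumptions: it covers only NOP (90H), INC EAX (40H in 32-bit mode), and the operand-size prefix (66H), and it simply stops at any other byte where a full interpreter would fall back to the general decode path. The cpu structure and handler names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct cpu {
    uint32_t eax;
    int opsize16;    /* set by the 66H prefix, cleared after each instruction */
    size_t pc;
};

typedef void (*handler)(struct cpu *);

static void op_nop(struct cpu *c)
{
    c->pc += 1;
    c->opsize16 = 0;
}

static void op_inc_eax(struct cpu *c)
{
    if (c->opsize16)   /* 16-bit operand: only AX changes */
        c->eax = (c->eax & 0xFFFF0000u) | ((c->eax + 1u) & 0xFFFFu);
    else
        c->eax += 1u;
    c->pc += 1;
    c->opsize16 = 0;
}

/* Prefix handler: set a flag, then let the next byte dispatch as usual. */
static void prefix_66(struct cpu *c)
{
    c->opsize16 = 1;
    c->pc += 1;
}

/* Interpret up to `n` bytes of `mem`, dispatching on the first byte of
   each instruction; stops at a byte that has no handler. */
static void run(struct cpu *c, const uint8_t *mem, size_t n)
{
    static handler table[256];   /* zero-initialized: NULL means unhandled */
    table[0x90] = op_nop;
    table[0x40] = op_inc_eax;
    table[0x66] = prefix_66;

    while (c->pc < n && table[mem[c->pc]] != NULL)
        table[mem[c->pc]](c);
}
```

The simple instructions never pay for a general decode step; the prefix handler only records state, and the instruction that follows it goes through the same table.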
Interpreting a Complex Instruction Set
• Threaded interpreter for a CISC ISA
[Figure: each simple instruction specialized routine ends with its own simple decode/dispatch code; instructions that the simple path cannot handle go to the complex decode/dispatch.]