Virtual Machines: An Architecture Perspective
November 2004
J. E. Smith

Introduction
• Why are virtual machines interesting?
  • They involve computer architecture in a pure sense
  • They allow transcending of interfaces (which often seem to be an obstacle to innovation)
  • They enable innovation in flexible, adaptive hardware, security, fault tolerance, support for network computing (and others)
VMs (c) 2004, J. E. Smith

Performance Isn’t Everything
• The BIG ideas are all at least 20 years old, and they have been very thoroughly explored
• Focus research on other important areas:
  • Power efficiency
  • Performance efficiency
  • Security
  • Ease of design
  • Software compatibility / interoperability
• Virtual machines can be important enablers for all of the above

Outline
• Virtualization
• The Family of Virtual Machines
• Process VMs and Code Caching
• High Level Language VMs
• Co-Designed VMs
• Research in Co-Designed VMs

Abstraction
• Computer systems are built on levels of abstraction
  • Instruction Set Architecture (ISA)
    • Major division between hardware and software
  • Application Binary Interface (ABI)
    • Observed by user processes
    • User ISA + OS calls
• Higher levels of abstraction hide details at lower levels
  • Example: files are an abstraction of a disk
[Figure: layers of a computer system (application programs; libraries; operating system with drivers, memory manager, and scheduler; execution hardware, memory translation, and system interconnect (bus); controllers; I/O devices and networking; main memory), with the interfaces between them numbered.]

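Virtualization
• An isomorphism V from guest to host
  • Map guest state to host state: V(S_i) = S'_i
  • Implement “equivalent” functions: each guest operation e is implemented by a host operation sequence e'
$$e(S_i) = S_j,\qquad e'(S'_i) = S'_j,\qquad V(S_i) = S'_i,\qquad V(S_j) = S'_j \quad\Longrightarrow\quad V\big(e(S_i)\big) = e'\big(V(S_i)\big)$$
[Figure: guest states S_i and S_j linked by e; host states S'_i and S'_j linked by e'; V maps each guest state to its host state.]
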
Virtualization
• Similar to abstraction, except
  • Details are not necessarily hidden
• Construct virtual disks
  • As files on a larger disk
  • Map state
  • Implement functions
• Now do the same thing with the whole “machine”
[Figure: virtual disks implemented as files on a larger physical disk.]

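The virtual-disk construction and the isomorphism condition V(e(S_i)) = e'(V(S_i)) can be made concrete with a toy model. The Python sketch below assumes a host “disk” modeled as a bytearray and 4-byte blocks; `HostDisk`, `VirtualDisk`, and `guest_write` are hypothetical names for illustration, not any real storage API. The state mapping V places guest blocks in a contiguous region of the host disk, the host write plays the role of e', and the final check confirms that applying e' on the host yields the state that corresponds to e(S_i).

```python
BLOCK_SIZE = 4  # bytes per block (tiny, for illustration only)


class HostDisk:
    """The larger physical disk, modeled as a flat byte array."""

    def __init__(self, num_blocks: int) -> None:
        self.data = bytearray(num_blocks * BLOCK_SIZE)


class VirtualDisk:
    """A guest disk stored as a contiguous region ("file") of the host disk."""

    def __init__(self, host: HostDisk, first_block: int, num_blocks: int) -> None:
        self.host = host
        self.first_block = first_block
        self.num_blocks = num_blocks

    def _host_offset(self, block: int) -> int:
        # State mapping V: guest block number -> byte offset on the host disk
        if not 0 <= block < self.num_blocks:
            raise IndexError("guest block out of range")
        return (self.first_block + block) * BLOCK_SIZE

    def write(self, block: int, payload: bytes) -> None:
        # e': the host operation that implements the guest's block write
        if len(payload) != BLOCK_SIZE:
            raise ValueError("payload must be exactly one block")
        off = self._host_offset(block)
        self.host.data[off:off + BLOCK_SIZE] = payload

    def read(self, block: int) -> bytes:
        off = self._host_offset(block)
        return bytes(self.host.data[off:off + BLOCK_SIZE])

    def guest_view(self) -> list:
        # The guest-visible disk contents, read back through the mapping
        return [self.read(b) for b in range(self.num_blocks)]


def guest_write(state: list, block: int, payload: bytes) -> list:
    """e: the same write as the guest sees it, on a plain list of blocks."""
    new_state = list(state)
    new_state[block] = payload
    return new_state


host = HostDisk(num_blocks=16)
vdisk = VirtualDisk(host, first_block=8, num_blocks=4)
s_i = vdisk.guest_view()              # guest state S_i (all-zero blocks)
s_j = guest_write(s_i, 2, b"ABCD")    # e(S_i)
vdisk.write(2, b"ABCD")               # e' applied to the host state V(S_i)
assert vdisk.guest_view() == s_j      # host result corresponds to e(S_i)
```

Keeping the guest region contiguous makes V a simple offset; a real virtual disk file could map blocks sparsely, but the commuting condition being checked is the same.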
The Family of Virtual Machines
• Lots of things are called “virtual machines”: IBM VM/370, Java, VMware
• Some things not called “virtual machines” are virtual machines: IA-32 EL, Dynamo, Transmeta Crusoe

System Virtual Machines
• Provide a system environment
• Constructed at ISA level
• Persistent
• Examples: IBM VM/370, VMware, Transmeta Crusoe
[Figure: guest processes running on guest OSes, each guest OS supported by a VMM on a host platform; the virtual machines communicate over a virtual network.]

System Virtual Machines
• Native VM system
  • VMM runs in privileged mode
  • Guest OS runs in user mode
  • Example: classic IBM VMs
• User-mode hosted VM
  • VMM runs as a user application
• Dual-mode hosted VM
  • Parts of the VMM are privileged, parts non-privileged
  • Example: VMware
[Figure: native VM (VMM directly on hardware), user-mode hosted VM (VMM on a host OS), and dual-mode hosted VM (VMM split between privileged and non-privileged modes alongside a host OS).]

Process Virtual Machines
• Constructed at ABI level
• Runtime manages guest process
• Guest processes may intermingle with host processes
• Not persistent
• As a practical matter, guest and host OSes are often the same
• Dynamic optimizers are a special case
• Examples: IA-32 EL, FX!32, Dynamo
[Figure: guest processes, each managed by its own runtime, running alongside host processes on the host OS; all processes share files on disk and network communication.]

The Virtual Machine Space
• Process VMs
  • Same ISA: multiprogrammed systems, dynamic binary optimizers
  • Different ISA: dynamic translators, HLL VMs
• System VMs
  • Same ISA: classic OS VMs, hosted VMs
  • Different ISA: whole-system VMs, co-designed VMs

Architecture Issues: System VMs
• Why system VMs are of interest today
  • Security & fault tolerance (isolation)
  • Platform consolidation
  • Application/environment portability
• “Efficiently virtualizable” instruction sets
  • Popek and Goldberg (1974) should still be required reading (an architecture paper with theorems and proofs!)
• Virtual machine assists
  • Compensate for inefficiencies due to privilege-level “compression”
  • Fast emulation of system functions
  • Many developed for IBM mainframe VMs

System Virtualization
• Traps and interrupts (& system calls)
  • Transfer to VMM
  • VMM determines the appropriate guest OS
  • VMM transfers to guest OS
• Guest performs privileged operation
  • Trap to VMM
  • VMM reads/modifies guest state
  • May modify shadow state
  • Returns to guest
• Guest OS “return” to user application
  • Transfer to VMM
  • VMM bounces return back to guest application
[Figure: a system call/trap from the application enters the VMM, which transfers to the guest OS's virtual vector location; a guest privileged operation traps to the VMM, which checks privileges, performs the operation, and returns to the guest's next instruction.]

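The trap-and-emulate flow above can be sketched as a small dispatcher. This is a minimal Python sketch under stated assumptions: `Guest`, `ToyVMM`, the trap kinds, and the `timer` register are all hypothetical, and “privileged operation” is reduced to writing one virtual register. It shows the three paths on the slide: reflecting a system call to the guest OS, emulating a guest OS privileged operation on virtual (shadow) state, and bouncing the guest OS's return back to its application.

```python
class Guest:
    """One guest VM. Its OS runs deprivileged, so its 'mode' is virtual."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.virtual_mode = "user"   # mode the guest believes it is in
        self.virtual_regs = {}       # virtual (shadow) copy of privileged state
        self.log = []


class ToyVMM:
    """Every trap, interrupt, and system call enters the VMM first."""

    def __init__(self) -> None:
        self.guests = {}
        self.running = None

    def add_guest(self, guest: Guest) -> None:
        self.guests[guest.name] = guest

    def trap(self, kind: str, payload=None) -> None:
        guest = self.guests[self.running]  # VMM determines the appropriate guest OS
        if kind == "syscall":
            # Reflect the system call to the guest OS's handler
            guest.virtual_mode = "kernel"
            guest.log.append("syscall -> guest OS")
        elif kind == "privileged":
            reg, value = payload
            if guest.virtual_mode != "kernel":
                # Guest *application* tried a privileged op: deliver a fault to the guest OS
                guest.virtual_mode = "kernel"
                guest.log.append("fault -> guest OS")
                return
            # Guest OS tried a privileged op: privilege check passed, emulate it
            guest.virtual_regs[reg] = value
            guest.log.append(f"emulated {reg}={value}")
        elif kind == "return":
            # The guest OS's return to user mode also traps; bounce it to the app
            guest.virtual_mode = "user"
            guest.log.append("return -> guest app")
        else:
            raise ValueError(f"unknown trap kind: {kind}")


vmm = ToyVMM()
vmm.add_guest(Guest("guest0"))
vmm.running = "guest0"
vmm.trap("syscall")                      # app makes a system call
vmm.trap("privileged", ("timer", 100))   # guest OS writes a privileged register
vmm.trap("return")                       # guest OS returns to the app
```

The guest OS never touches real privileged state here; every effect lands in `virtual_regs`, which is the essence of running the guest OS deprivileged.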
Popek and Goldberg (in brief)
• Control-sensitive instructions
  • All instructions that change hardware resource allocation (or mapping)
  • Example: write TLB
• Behavior-sensitive instructions
  • All instructions whose outcome depends on hardware resource allocation
  • Example: read processor mode
• Theorem (paraphrase)
  • A machine is efficiently virtualizable if all sensitive instructions trap in user mode (i.e., the sensitive instructions are a subset of the privileged instructions)

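The paraphrased theorem reduces to a set check: every sensitive instruction must also be privileged. The Python sketch below encodes that check for two hypothetical ISAs; the instruction names `ADD`, `WRTLB`, and `RDMODE` and their properties are invented for illustration and do not describe any real instruction set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    name: str
    control_sensitive: bool   # changes resource allocation or mapping
    behavior_sensitive: bool  # outcome depends on resource allocation / mode
    privileged: bool          # traps when executed in user mode

    @property
    def sensitive(self) -> bool:
        return self.control_sensitive or self.behavior_sensitive


def non_trapping_sensitive(isa):
    """Sensitive instructions that do NOT trap in user mode (the obstacles)."""
    return [i.name for i in isa if i.sensitive and not i.privileged]


def efficiently_virtualizable(isa) -> bool:
    # Paraphrased condition: every sensitive instruction is privileged
    return not non_trapping_sensitive(isa)


# Two hypothetical ISAs that differ only in how RDMODE behaves in user mode
isa_a = [
    Insn("ADD", False, False, False),
    Insn("WRTLB", True, False, True),
    Insn("RDMODE", False, True, True),
]
isa_b = [
    Insn("ADD", False, False, False),
    Insn("WRTLB", True, False, True),
    Insn("RDMODE", False, True, False),  # reads the real mode instead of trapping
]
```

In `isa_b`, a deprivileged guest OS executing `RDMODE` would see “user” rather than the mode it believes it is in, and the VMM never gets a chance to intervene.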
System VM Research
• Architecture challenge:
  • Make IA-32 efficiently virtualizable
• Virtual machine assists
  • Compensate for inefficiencies due to privilege-level “compression”
  • Fast emulation of system functions
  • Many developed for IBM mainframe VMs
• Applications to chip multiprocessors
  • Technology changes often require innovation and “re-invention”

Architecture Issues: Process VMs
• Generally to allow application migration
  • Or to run popular software on a less popular platform
• Goal is generally to minimize performance loss
• Same-ISA dynamic optimizers are a special case
  • HP Dynamo
• Architecture problems
  • Efficient code caching
  • Indirect jump problem
  • Protecting runtime from guest process

Staged Emulation with Code Caching
• Start interpreting
• Profile to find “hot” code regions
  • An important part of many VM implementations
• Translate, optimize & cache frequent code sequences
[Figure: within the runtime, the interpreter executes the binary memory image and gathers profile data; the translator/optimizer places translated code in the code cache.]

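The staged approach (interpret, profile, then translate hot code into a code cache) can be modeled in a few lines. This Python sketch rests on several assumptions: a guest “program” is a dict of blocks whose only operations are add-immediates, a block's successor is computed by a Python function, `HOT_THRESHOLD` is an arbitrary value, and “translation” is simply constant-folding a block into one closure. None of these names come from a real VM.

```python
from collections import Counter

HOT_THRESHOLD = 3  # hypothetical: executions before a block counts as "hot"


class StagedEmulator:
    """Interpret first, profile, then translate hot blocks into a code cache.

    program: block_pc -> (list of add-immediates, function(acc) -> next pc or None)
    """

    def __init__(self, program):
        self.program = program
        self.profile = Counter()   # execution counts per block (profile data)
        self.code_cache = {}       # block_pc -> translated function
        self.stats = Counter()

    def _interpret(self, pc, acc):
        deltas, _ = self.program[pc]
        for d in deltas:           # decode and execute one guest op at a time
            acc += d
        return acc

    def _translate(self, pc):
        deltas, _ = self.program[pc]
        folded = sum(deltas)       # "optimize": fold the block's adds into one
        return lambda acc: acc + folded

    def run(self, pc=0, acc=0):
        while pc is not None:
            if pc in self.code_cache:
                acc = self.code_cache[pc](acc)
                self.stats["translated"] += 1
            else:
                acc = self._interpret(pc, acc)
                self.stats["interpreted"] += 1
                self.profile[pc] += 1
                if self.profile[pc] >= HOT_THRESHOLD:
                    self.code_cache[pc] = self._translate(pc)
            pc = self.program[pc][1](acc)
        return acc


# Block 0 loops, adding 2 per iteration until acc reaches 20; block 1 exits
program = {
    0: ([1, 1], lambda acc: 0 if acc < 20 else 1),
    1: ([100], lambda acc: None),
}
emu = StagedEmulator(program)
result = emu.run()
```

With a threshold of 3, the loop block is interpreted three times and then runs from the code cache for its remaining seven iterations, while the cold exit block is only ever interpreted.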
Superblocks
• Based on “hot” paths
• One entry, multiple exits
• May contain redundant blocks (tail duplication)
[Figure: a control flow graph of blocks A to G and the superblocks formed from it, with some blocks (such as G) duplicated across superblocks.]

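A superblock can be grown from a hot entry point by following frequently taken edges. The Python sketch below uses a simple “most frequent successor” heuristic over an assumed edge profile; it is a stand-in for illustration, not the Dynamo MRET method named later in the deck, and the block names and counts are hypothetical. Because a superblock allows only one entry, any block on the path that can also be reached from off the path must be copied into the superblock, which is the tail duplication mentioned above.

```python
def form_superblock(start, edge_counts, max_blocks=8):
    """Follow the most frequently taken successor from `start` (assumed heuristic).

    edge_counts: {(src, dst): times taken}, e.g. gathered while interpreting.
    """
    path = [start]
    while len(path) < max_blocks:
        succs = {dst: n for (src, dst), n in edge_counts.items() if src == path[-1]}
        if not succs:
            break
        nxt = max(succs, key=succs.get)
        if nxt in path:   # back edge: end the superblock with an exit to it
            break
        path.append(nxt)
    return path


def duplicated_tails(path, edge_counts):
    """Blocks after the entry that also have predecessors off the path.

    A superblock has a single entrance, so these blocks become copies
    (tail duplication) rather than side entrances.
    """
    dup = []
    for prev, blk in zip(path, path[1:]):
        preds = {src for (src, dst) in edge_counts if dst == blk}
        if preds - {prev}:
            dup.append(blk)
    return dup


# Hypothetical loop: A -> B -> D is hot, A -> C -> D is cold, D loops back to A
edges = {("A", "B"): 90, ("A", "C"): 10, ("B", "D"): 90, ("C", "D"): 10, ("D", "A"): 100}
sb = form_superblock("A", edges)
dups = duplicated_tails(sb, edges)
```

Here the superblock is A, B, D with exits at A (to C) and at the end (back to A); D must be duplicated because the cold path through C still needs its own copy of D.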
Binary Translation Example

x86 binary:
    4FD0: addl %edx,(%eax)   ;load and accumulate sum
          movl (%eax),%edx   ;store to memory
          sub  %ebx,1        ;decrement loop count
          jz   51C8          ;branch if at loop end
    4FDC: add  %eax,4        ;increment %eax
          jmp  4FD0          ;jump to loop top
    51C8: movl (%ecx),%edx   ;store last value of %edx
          xorl %edx,%edx     ;clear %edx
          jmp  6200          ;jump elsewhere

PowerPC translation:
    9AC0: lwz   r16,0(r4)    ;load value from memory
          add   r7,r7,r16    ;accumulate sum
          stw   0(r5),r7     ;store to memory
          subi. r5,r5,1      ;decrement loop count, set cr0
          bez   cr0,pc+12    ;branch if loop exit
          bl    F000         ;branch & link to EM
          4FDC               ;save source PC in link register
    9AE4: bl    F000         ;branch & link to EM
          51C8               ;save source PC in link register
    9C08: stw   0(r6),r7     ;store last value of %edx
          subi  r7,r7,r7     ;clear %edx
          bl    F000         ;branch & link to EM
          6200               ;save source PC in link register

Code Caches
• Contain
  • Basic blocks
  • Superblocks (one entrance, multiple exits)
  • Optimized superblocks
• A base technology for many VMs
  • Dynamic binary translators: Intel IA-32 EL, Compaq FX!32
  • Dynamic binary optimizers: Dynamo family
  • Co-designed virtual machines: Transmeta, IBM DAISY
  • High performance Java virtual machines
  • System VMs with “inefficiently virtualizable” ISAs
  • “Sandboxing” secure VMs (x86 DynamoRIO)

Indirect Jumps
• Translated code cache PC (TPC) differs from source binary PC (SPC)
• Need branch/jump target address translation
• (Direct) branches are easier; target address is fixed
  • Chaining can be used
[Figure: without chaining, every superblock exit returns to the dispatch table lookup code; with chaining, superblocks branch directly to one another.]

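The benefit of chaining for direct branches can be seen by counting trips through the dispatch code. In this Python sketch, assumed for illustration only, each SPC stands for a superblock that is already translated, `successors` gives the fixed target of its exit branch, and chaining is modeled as recording a link the first time an exit is resolved so that later trips bypass the dispatch lookup. `ToyCodeCache` is a hypothetical name.

```python
class ToyCodeCache:
    """Counts trips through dispatch with and without chaining.

    successors: SPC -> SPC of the direct branch target at the superblock's exit.
    Each SPC stands for a translated superblock already in the code cache.
    """

    def __init__(self, successors, chaining):
        self.successors = successors
        self.chaining = chaining
        self.links = {}            # exit of superblock SPC -> chained successor
        self.dispatch_lookups = 0

    def run(self, spc, steps):
        for _ in range(steps):
            if spc in self.links:
                spc = self.links[spc]       # chained: jump straight to next translation
                continue
            target = self.successors[spc]
            self.dispatch_lookups += 1      # unchained: return to dispatch, map SPC -> TPC
            if self.chaining:
                self.links[spc] = target    # patch the exit so later trips skip dispatch
            spc = target
        return spc


loop = {"A": "B", "B": "C", "C": "A"}   # three superblocks forming a loop
unchained = ToyCodeCache(loop, chaining=False)
chained = ToyCodeCache(loop, chaining=True)
end_unchained = unchained.run("A", 30)
end_chained = chained.run("A", 30)
```

Chaining only works because a direct branch always goes to the same SPC; an indirect jump's target can change from one execution to the next, which is why it needs the separate treatment on the next slide.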
The Indirect Jump Problem
• Target addresses (SPCs) can change
  • The SPC needs to be translated at run time, not translation time
• Conventional solution: software prediction at superblock construction time (aka inline caching)
    If Rx == #addr_1 goto #target_1
    Else if Rx == #addr_2 goto #target_2
    Else dispatch_table_lookup(Rx); do it the slow way
• The biggest source of overhead in code caches
  • Compare-and-branch: 6 instructions
  • Hash table lookup: 15 instructions in Dynamo x86

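The inline-caching pseudocode above can be expressed as a stub generator. This Python sketch assumes the predicted (SPC, TPC) pairs were chosen at superblock construction time, for example from profile data, and that `dispatch_table` is the runtime's full SPC-to-TPC map; `make_indirect_jump` and all addresses are hypothetical. An SPC that has no translation yet would raise `KeyError` here, where a real runtime would fall back to translating it.

```python
def make_indirect_jump(predicted, dispatch_table):
    """Build the stub emitted for one indirect jump when the superblock is formed.

    predicted: (SPC, TPC) pairs chosen at translation time, e.g. from a profile.
    dispatch_table: the runtime's full SPC -> TPC map (the slow path).
    """
    def jump(rx):
        # If Rx == #addr_1 goto #target_1; else if Rx == #addr_2 goto #target_2 ...
        for spc, tpc in predicted:
            if rx == spc:
                return tpc, "inline hit"
        # Else dispatch_table_lookup(Rx): do it the slow way
        return dispatch_table[rx], "dispatch"
    return jump


# Hypothetical source (SPC) and translated (TPC) addresses
dispatch_table = {0x4000: 0x9A00, 0x4100: 0x9B00, 0x4200: 0x9C00}
jump = make_indirect_jump([(0x4000, 0x9A00), (0x4100, 0x9B00)], dispatch_table)
```

The cost asymmetry on the slide shows up directly: a hit costs a short compare chain, while a miss pays for the full table lookup.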
Protecting the Runtime
• The runtime shares process memory space with the application
  • Must protect runtime from application
• Expensive memory protection changes on switches between runtime and code cache
• If guest registers are mapped to host memory
  • How are memory-mapped registers protected?
[Figure: memory regions (runtime data, runtime code, code cache, guest data, guest code) with their access permissions (N, R, R/W, Ex) in runtime mode and in emulation mode.]

Process VM Research
• Same-ISA dynamic binary optimizers are probably not a winning proposition
  • Indirect jumps lead to performance losses on modern processors
  • (Optimizers with patching are better)
• Complete (intrinsic) compatibility is extremely difficult
  • May have to rely on extrinsic assurances
  • Topic of architecture research similar to Popek and Goldberg
• For general process VMs, some primitive support in the ISA will be useful / necessary
  • Indirect jumps (more later)
  • Code caching
  • Protection

Computer Architecture Innovation
• HLL VMs: software people invent ISAs to solve SW problems
• Co-designed VMs: hardware people invent ISAs to solve HW problems
• These two are the most interesting VMs from an architecture perspective and provide the biggest opportunities.

High Level Language Virtual Machines
• Raise the “ABI” level of abstraction
  • Use a higher-level virtual ISA
  • OS abstracted as standard libraries
• A form of process VM
[Figure: traditional flow (HLL program, compiler front-end, intermediate code, compiler back-end, object code (ISA), loader, memory image) versus HLL VM flow (HLL program, compiler, portable code (virtual ISA), VM loader, virtual memory image, VM interpreter/translator, host instructions).]

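The “portable code in a virtual ISA” idea can be illustrated with a tiny stack-based interpreter, the simplest form of the VM interpreter/translator stage. The opcode names below are borrowed in spirit from Java bytecode but form a hypothetical toy ISA, not the JVM or MSIL instruction set; the program encodes `return (a + b) * 2` with `a` and `b` in local variables 0 and 1.

```python
def interpret(code, local_vars):
    """Execute a list of (opcode, *operands) tuples on an operand stack."""
    stack = []
    pc = 0
    while True:
        op, *args = code[pc]
        pc += 1
        if op == "iconst":            # push a constant
            stack.append(args[0])
        elif op == "iload":           # push a local variable
            stack.append(local_vars[args[0]])
        elif op == "istore":          # pop into a local variable
            local_vars[args[0]] = stack.pop()
        elif op in ("iadd", "imul"):  # pop two operands, push the result
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "iadd" else a * b)
        elif op == "ireturn":         # pop and return the top of stack
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op!r}")


# Portable code for: return (a + b) * 2, with a and b in locals 0 and 1
code = [("iload", 0), ("iload", 1), ("iadd",), ("iconst", 2), ("imul",), ("ireturn",)]
result = interpret(code, [3, 4])
```

Because operands are implicit on the stack, the code says nothing about the host's register count, which is part of what makes it portable; a translator can later map stack slots onto host registers.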
Architecture Issues: High Level VMs
• Examples:
  • Sun Java
  • Microsoft .NET Framework and MSIL
• Why are HLL VMs important?
  • Microsoft says so.
  • It’s a good idea.
  • Combines object-oriented programming and network computing

HLL VMs: Architecture Perspective
• Here, architects were deprived (or let themselves be deprived) of some interesting architecture work
• Don’t look at it bottom-up, i.e.
  • Take existing software for supporting HLL VMs,
  • Generate traces for standard ISAs,
  • Analyze traces,
  • Conclude it’s “just like C”… problem solved!
• Look top-down: start with features of MSIL and look for computer architecture opportunities
  • Will require a mix of hardware and software innovation (else just continue to ignore real architecture in favor of implementation)

HLL VM Research
• Metadata: an interesting concept
  • Data set architecture
  • Don’t have to discover data structures (compare with C programs)
[Figure: in the virtual machine implementation, the loader reads a machine-independent program file (metadata and code) into internal data structures used by the interpreter and the native code translator.]

HLL VM Research
• Precise trap model
• Problems in conventional processors:
  • All state precise
  • Many instructions can trap
  • Traps are enabled/disabled “remotely” and at any time
• HLL VMs
  • Not all state must be precise
    • PC: not needed
    • Operand stack: never
    • Local variables: only if the trap is handled locally
  • Trap enable is explicit and locally specified

HLL VM Research
• Stack tracking
  • At any given point, the operand stack must have the same number of elements and types regardless of control flow path
  • This property could simplify exploitation of control independence

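The stack-tracking property can be checked statically with a worklist pass over the code, much as a bytecode verifier does. This Python sketch is simplified under stated assumptions: it uses a hypothetical toy opcode set, tracks only stack depth (not element types), and treats `ifeq` as “pop a value and branch if it is zero”. It records the depth before each instruction and rejects code where two control flow paths reach the same instruction with different depths.

```python
# (pops, pushes) for each toy opcode
STACK_EFFECT = {
    "iconst": (0, 1), "iload": (0, 1), "iadd": (2, 1), "istore": (1, 0),
    "ifeq": (1, 0), "goto": (0, 0), "ireturn": (1, 0),
}


def stack_depths(code):
    """Return {pc: stack depth before pc}; raise if control flow paths disagree."""
    depth_at = {0: 0}
    work = [0]
    while work:
        pc = work.pop()
        op, *args = code[pc]
        pops, pushes = STACK_EFFECT[op]
        if depth_at[pc] < pops:
            raise ValueError(f"stack underflow at {pc}")
        depth = depth_at[pc] - pops + pushes
        if op == "ireturn":
            succs = []
        elif op == "goto":
            succs = [args[0]]
        elif op == "ifeq":                 # fall through or branch to target
            succs = [pc + 1, args[0]]
        else:
            succs = [pc + 1]
        for s in succs:
            if s not in depth_at:
                depth_at[s] = depth
                work.append(s)
            elif depth_at[s] != depth:
                raise ValueError(
                    f"inconsistent stack depth at {s}: {depth_at[s]} vs {depth}")
    return depth_at


# Both paths reach the ireturn at index 5 with one value on the stack
ok = [("iload", 0), ("ifeq", 4), ("iconst", 1), ("goto", 5), ("iconst", 2), ("ireturn",)]
# The branch-taken path pushes two values, the fall-through path only one
bad = [("iload", 0), ("ifeq", 4), ("iconst", 1), ("goto", 6),
       ("iconst", 2), ("iconst", 3), ("ireturn",)]
```

Because the depth at every instruction is a single known number, each stack slot can be named statically, which is what makes this property useful to a translator or to hardware that wants to reason across control flow.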
HLL VMs Summary
• Claim: slowdowns are due to OO programming, probably not dynamic compilation, and not the stack-based ISA
• Research opportunities abound
  • For VM implementation
  • For speeding up OO programs (look beyond C/C++)
  • Use co-designed HW/SW
  • Base the design on MSIL/Java and implement the conventional ISA as the uncommon case

Co-Designed Virtual Machines
• Separate the hardware/software interface from the ISA level of abstraction
• Restore the ISA to its “natural” place as an implementation ISA (I-ISA) that reflects actual hardware
• Support existing ISAs as a virtual ISA (V-ISA)
• Let processor designers use both hardware and software
• A form of system VM
[Figure: a conventional system (user applications, libraries, OS, ISA, hardware) compared with a co-designed VM (user applications, libraries, OS, V-ISA, VM software, I-ISA, hardware).]

Co-Designed VMs
• Should be of interest to both architects and micro-architects
• Offer opportunities for performance, power saving, fault tolerance, and other implementation-dependent features
• Allow transcending conventional ISAs
• Don’t confuse them with VLIW!

Architecture Issues: Concealed Memory
• VM software resides in memory concealed from all conventional software
[Figure: the processor core's instruction and data cache hierarchies are fed from concealed memory (code cache, VM code, VM data) and from conventional memory (source ISA code, source ISA data).]

Another Way of Doing Things
[Figure: a conventional implementation (main memory, cache hierarchy, hardware translation unit that forms uops, processor pipeline, functional units) compared with a dynamic translation implementation that adds a software translator and a code cache.]

Jump Target-address Lookup Table (JTLT)
• A hardware cache of dispatch table entries
• Similar to a software-managed TLB in virtual memory
[Figure: for a jump instruction, the BTB supplies the predicted next-fetch TPC while the JTLT is looked up with the jump target SPC from the register file. On a JTLT hit whose TPC matches the BTB prediction, the prediction is correct; on a hit without a match, the BTB mispredicted and fetch is redirected to the jump target TPC from the JTLT. On a JTLT miss, fetch is redirected to the dispatch code.]

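The JTLT flow can be modeled as a small direct-mapped cache in front of the software dispatch table. In this Python sketch the table size, the index function `(spc >> 2) % entries`, the outcome strings, and all addresses are assumptions for illustration. As in a software-managed TLB, a miss hands control to software (the dispatch code), which looks up the full table and refills the entry.

```python
class JTLT:
    """Direct-mapped cache of SPC -> TPC dispatch-table entries (toy model)."""

    def __init__(self, entries, dispatch_table):
        self.entries = entries
        self.tags = [None] * entries      # cached SPCs
        self.tpcs = [None] * entries      # corresponding TPCs
        self.dispatch_table = dispatch_table
        self.refills = 0

    def _index(self, spc):
        return (spc >> 2) % self.entries  # assumed index function

    def resolve(self, target_spc, btb_predicted_tpc):
        """Return (TPC that execution continues at, outcome) for one indirect jump."""
        i = self._index(target_spc)
        if self.tags[i] != target_spc:
            # JTLT miss: fetch goes to the dispatch code, which looks up the
            # software table, refills this entry, and continues at the TPC
            tpc = self.dispatch_table[target_spc]
            self.tags[i], self.tpcs[i] = target_spc, tpc
            self.refills += 1
            return tpc, "miss: dispatch code"
        if self.tpcs[i] == btb_predicted_tpc:
            return self.tpcs[i], "btb correct"
        return self.tpcs[i], "btb mispredict: redirect to JTLT TPC"


# Hypothetical SPC -> TPC dispatch table
table = {0x4000: 0x9A00, 0x4100: 0x9B00}
jtlt = JTLT(8, table)
first = jtlt.resolve(0x4000, None)        # cold: goes through dispatch code
second = jtlt.resolve(0x4000, 0x9A00)     # hit, BTB agrees
third = jtlt.resolve(0x4000, 0x9B00)      # hit, BTB guessed wrong
```

The common case (hit with a correct BTB prediction) never leaves hardware, which is where the savings over the software compare-and-branch or hash lookup come from.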
Dual-address RAS
• Problem: the function call instruction saves the return SPC, not the TPC
  • Conventional software-based chaining cannot utilize a RAS (return address stack)
• Solution: save both SPC and TPC
[Figure: a push-dual-address-RAS instruction pushes an (SPC, TPC) pair onto the dual-address RAS; the JTLT likewise holds SPC/TPC pairs.]

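A dual-address return address stack can be sketched as a RAS whose entries are (SPC, TPC) pairs. This Python model is one plausible reading of the slide, not the published design: the translated call pushes both addresses, a return pops the pair to predict the TPC, and the prediction is treated as correct only if the popped SPC matches the SPC the guest code actually returns to. The depth, the overflow policy, and the names `DualAddressRAS` and `verify` are assumptions.

```python
class DualAddressRAS:
    """Return address stack holding (return SPC, return TPC) pairs."""

    def __init__(self, depth=16):
        self.depth = depth
        self.stack = []

    def push(self, return_spc, return_tpc):
        # Executed by the translated call (a push-dual-address-RAS instruction)
        if len(self.stack) == self.depth:
            self.stack.pop(0)            # overflow: drop the oldest entry
        self.stack.append((return_spc, return_tpc))

    def predict(self):
        """At a return: pop the pair whose TPC is used to fetch (None if empty)."""
        return self.stack.pop() if self.stack else None


def verify(prediction, actual_return_spc):
    """The predicted TPC is correct only if the popped SPC matches the SPC
    the guest code actually returns to."""
    return prediction is not None and prediction[0] == actual_return_spc


# Hypothetical return addresses: SPC 0x4010 translated at TPC 0x9A40
ras = DualAddressRAS(depth=2)
ras.push(0x4010, 0x9A40)
p = ras.predict()
```

Keeping the SPC alongside the TPC is what allows the check: the guest's return address is a source-binary value, so only the SPC half can be compared against it, while the TPC half is what fetch actually needs.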
IPC Performance
[Figure: IPC (axis from 0.8 to 2.4) for the SPEC CPU2000 integer benchmarks 164.gzip through 300.twolf and their harmonic mean, comparing the configurations original, sw_pred.sw_pred, sw_pred.sw_pred (private dispatch), sw_pred.ras, and jtlt.ras.]
• “Translate” Alpha to Alpha; start with highly optimized code
• Conventional method (à la Dynamo) results in 14% IPC loss
• Dual-address RAS provides the most benefit
• Using both JTLT & RAS, 7.7% IPC improvement
  • Due to superblock re-layout

Research: Efficient Microarchitectures
• Wide pipelines are at odds with fast pipelines
  • Fast pipeline => low complexity per stage
  • More instructions per stage => high complexity per stage
• Process larger atomic units in pipeline stages
  • Narrower “effective” width
• Reduce decoding stages
  • Do more in software
• Pipeline the issue stage

Fused Instruction Set
• Co-designed VM x86 implementation
• Shorten and simplify the pipeline front end
• Combine pairs of dependent instructions
  • Form a single “unit” for pipeline processing
• Use VM software to
  • “Crack” x86 instructions into RISC-ops
  • Re-order RISC-ops
  • Reassemble into (new) fused pairs
• Related: Pentium-M fuses in the front end
  • Using original x86 instructions

Conventional Issue Logic
• Select and issue instructions free of data dependences
• Based on the selection, clear dependences
  • And “wake up” newly independent instructions
• Single-cycle select-wakeup is important for good performance
[Figure: issue buffer entries such as (OP R1 R2 Imm.) and a dependent (OP R6 R1 R7); select logic picks ready entries, and fanout/wakeup logic notifies dependent entries.]

Pipelined Issue Logic
• Fuse dependent instructions into a single slot
• Fused instructions traverse the entire pipeline
• Make a single issue decision for the pair

Instruction Set
32-bit formats:
• F | 10b opcode | 21-bit immediate/displacement
  e.g. call 0x080af30e (21-bit disp); jcc 0x080115a0; jmp 0x080C0988
• F | 10b opcode | 16-bit immediate/disp | 5b Rds
  e.g. LIMM.lo Redx, LO(0x0810a7de); LIMM.hi Redx, HI(0x0810a7de)
• F | 10b opcode | 11b immd/disp | 5b Rsr | 5b Rds
  e.g. CMP.cc Reax, 0x4000; LD Reax, mem[Resp + F8]; ST Reax, mem[Rebp + 4C]; ADD Reax, Rebx, 4c
• F | 16-bit opcode | 5b Rsr | 5b Rsr | 5b Rds
  e.g. ADD Reax, Redx, Rebx; Fmac Facc, Fmp1, Fmp2; LD Reax, mem[Rebx + Rebp]
16-bit formats:
• F | 7b op | 4b Rs | 4b Rd
  e.g. mov esp, ebp → MOV Resp, Rebp; mov eax,[esp] → LD Reax, mem[Resp]; add eax, edx → ADD Reax, Redx
• F | 7b op | 4b I | 4b Rd
  e.g. sub ecx, 4 → SUB Recx, 4; shr esi, 2 → SHR Resi, 2; inc ecx → INC Recx, 1
• F | 7b op | 8b immd/disp
  e.g. jcc 3e (such as jnz 3e)

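The field widths in these formats add up to exactly 32 or 16 bits, which a small encoder can check. The Python sketch below assumes fields are packed left to right, most significant bit first, in the order listed, with unsigned values only; the format names, field names, opcode value, and register numbers are hypothetical, and the meaning of the F bit is not modeled.

```python
FORMATS = {
    # 32-bit formats
    "disp21": [("F", 1), ("opcode", 10), ("disp", 21)],
    "imm16":  [("F", 1), ("opcode", 10), ("imm", 16), ("rds", 5)],
    "imm11":  [("F", 1), ("opcode", 10), ("imm", 11), ("rsr", 5), ("rds", 5)],
    "reg3":   [("F", 1), ("opcode", 16), ("rsr1", 5), ("rsr2", 5), ("rds", 5)],
    # 16-bit formats
    "reg2":   [("F", 1), ("op", 7), ("rs", 4), ("rd", 4)],
    "imm4":   [("F", 1), ("op", 7), ("imm", 4), ("rd", 4)],
    "disp8":  [("F", 1), ("op", 7), ("disp", 8)],
}


def width(fmt):
    return sum(w for _, w in FORMATS[fmt])


def encode(fmt, **fields):
    """Pack fields left to right, most significant bit first (an assumption)."""
    names = {name for name, _ in FORMATS[fmt]}
    unknown = set(fields) - names
    if unknown:
        raise ValueError(f"unknown fields for {fmt}: {sorted(unknown)}")
    word = 0
    for name, w in FORMATS[fmt]:
        value = fields.get(name, 0)
        if not 0 <= value < (1 << w):
            raise ValueError(f"{name}={value} does not fit in {w} bits")
        word = (word << w) | value
    return word


def decode(fmt, word):
    fields = {}
    shift = width(fmt)
    for name, w in FORMATS[fmt]:
        shift -= w
        fields[name] = (word >> shift) & ((1 << w) - 1)
    return fields


# Hypothetical 16-bit two-register op: opcode 0x05, rs = 3, rd = 0
word = encode("reg2", F=0, op=0x05, rs=3, rd=0)
```

The 4-bit register fields in the 16-bit formats can name only 16 registers, which is why the short forms in the examples use x86-mapped registers such as Reax and Resp, while the 32-bit formats have 5-bit fields.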
Translation Algorithm
Two-pass algorithm:
1. Form superblocks using the Dynamo MRET method
2. Crack x86 instructions into RISC-like micro-ops
3. Attempt to fuse ALU ops only
4. Fuse LD/ST instructions as tails and ALU ops as heads

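Steps 3 and 4 can be sketched as two greedy pairing passes over already-cracked micro-ops. This Python sketch covers only the fusing passes (not MRET superblock formation or cracking) and makes simplifying assumptions that are not from the slides: every `"alu"` micro-op is single-cycle and may head a pair, only the nearest consumer of the head's result is considered as its tail, and the reordering-legality checks a real translator would need are omitted. `Uop`, `fuse`, and the example micro-ops are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Uop:
    text: str
    kind: str               # "alu" (single-cycle), "ld", "st", or "br"
    dest: Optional[str]
    srcs: Tuple[str, ...]


def _fuse_pass(uops, fused, pairs, tail_kinds):
    for h, head in enumerate(uops):
        if h in fused or head.kind != "alu" or head.dest is None:
            continue
        for t in range(h + 1, len(uops)):
            cand = uops[t]
            if head.dest in cand.srcs:
                # Only the nearest consumer is considered as a tail
                if t not in fused and cand.kind in tail_kinds:
                    pairs[h] = t
                    fused.update((h, t))
                break
            if cand.dest == head.dest:
                break  # value overwritten before any use: nothing to fuse


def fuse(uops):
    """Return {head index: tail index} for the fused pairs."""
    fused, pairs = set(), {}
    _fuse_pass(uops, fused, pairs, {"alu"})       # pass 1: ALU head, ALU tail
    _fuse_pass(uops, fused, pairs, {"ld", "st"})  # pass 2: ALU head, LD/ST tail
    return pairs


uops = [
    Uop("ADD Reax, Redx", "alu", "eax", ("eax", "edx")),
    Uop("ST Reax, mem[Rebp + 4C]", "st", None, ("eax", "ebp")),
    Uop("SUB Recx, 4", "alu", "ecx", ("ecx",)),
    Uop("SHR Recx, 2", "alu", "ecx", ("ecx",)),
    Uop("LD Rebx, mem[Resp]", "ld", "ebx", ("esp",)),
]
pairs = fuse(uops)
```

Running the ALU-only pass first gives ALU tails priority; the ADD/ST pair is picked up only in the second pass, and the independent load is left unfused.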
Fusing Profile
[Figure: stacked bars of the percentage of dynamic instructions that are fused, LD, ST, BR, FP or NOPs, and ALU, for each SPEC CPU2000 integer benchmark (164.gzip through 300.twolf) and the average.]
• About 50% of operations are fused
• Only 5-10% of non-fused operations are single-cycle ALU ops