Windows Virtualization Best Practices And Future Hardware Directions. Benjamin Armstrong, Program Manager, Virtualization, Microsoft Corporation. David Wooten, Hardware Architect, System Integrity Group, Microsoft Corporation.
Goals • After this session, you will • Better understand how a Microsoft Windows virtualization virtual machine (VM) environment differs from a physical machine • Know what to do to ensure that your software works well within a VM
Agenda • Virtual machine hardware • Virtualization impacts on • Processor • Storage • Networking • Video • Understanding isolation • Development opportunities
Hardware Equivalency • Virtual machines (VMs) should aim to achieve a high level of hardware equivalency • Most software solutions ‘just work’ • Not always possible to have 100% equivalency • Awareness of differences in the VM environment can help you to deliver a better solution for your customers on a virtual platform
VSPs And VSCs • Windows virtualization will provide a set of core VSPs and VSCs plus emulated hardware on supported platforms • Core VSP/VSCs will be included for storage, networking, input and video • You cannot modify the core VSP/VSCs
Emulated Hardware • The initial release of Windows virtualization will always expose a limited set of emulated hardware • S3 Trio 64 Video card • DEC 21140 Network card • Etc. • It is possible to reconfigure the emulated hardware • It is not possible to change the type of hardware being emulated
Processor Topology • Changing processor type and topology inside of VMs under Windows virtualization is possible • Processor changes require a cold boot of the VM • Do not make assumptions that • The number of processors won’t change • The core-to-processor ratio won’t change • The processor type won’t change • Hot add of virtual processors is planned for Windows Server 2003 and Windows Server codenamed “Longhorn” guest operating systems • Each VM is a single NUMA node
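One way to avoid baking in a processor count is to ask for it every time the software starts and to size pools from the answer. The following is a minimal sketch using Python's standard `os.cpu_count()`; the helper name `worker_count` and the `cap` parameter are hypothetical and not part of Windows virtualization.

```python
import os


def worker_count(cap=64):
    """Return a worker-pool size based on the processors visible right now.

    os.cpu_count() can return None when the count cannot be determined,
    so fall back to a single worker in that case.
    """
    visible = os.cpu_count() or 1
    # Never return fewer than one worker or more than the configured cap.
    return max(1, min(visible, cap))


if __name__ == "__main__":
    print("workers:", worker_count())
```

Because the value is recomputed at each start rather than stored in a configuration file, a VM that comes back from a cold boot with a different topology is picked up without any further change.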
Processor Scheduling • Each virtual processor “believes” that it has 100% of its physical processor resources and that time is accurate • This is not always true • Physical processors can be oversubscribed • Resource limits can be configured • Hypervisor is responsible for scheduling of virtual processors • High-precision timing inside of VMs is usually, but not always, accurate
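Since a virtual processor can be descheduled in the middle of a measurement, a single timing sample may be badly inflated. A minimal sketch of one mitigation, assuming that repeating the measurement is acceptable: take several samples with the operating system's monotonic clock and report the median. The function names are hypothetical.

```python
import statistics
import time


def timed_samples(fn, repeats=5):
    """Run fn several times and return the elapsed seconds of each run."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic clock provided by the OS
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def robust_elapsed(fn, repeats=5):
    """Median of several runs, so one interrupted run cannot dominate."""
    return statistics.median(timed_samples(fn, repeats))


if __name__ == "__main__":
    print(robust_elapsed(lambda: sum(range(100_000))))
```

The median is a design choice here; taking the minimum instead would estimate the best case rather than the typical case.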
Processor • User-mode code • Mostly, no noticeable change to user-mode code • Use CPUID to determine what is available • Processor features might be subset of physical machine • Do not assume all processors are always running at the same time • Affects parallel execution code
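The point about parallel execution code can be illustrated with a hand-off between two threads. A spin loop that polls a flag assumes the other thread is running on another processor at the same moment; a blocking primitive does not. The following is a minimal sketch using Python's standard `threading.Event`; the function name and the values passed between the threads are hypothetical.

```python
import threading


def run_producer_consumer():
    """Hand a value from one thread to another without busy-waiting."""
    ready = threading.Event()
    result = []

    def producer():
        result.append(42)  # publish the value
        ready.set()        # then wake any waiter

    def consumer():
        # Blocks in the OS instead of spinning on a flag, so no processor
        # time is burned while the producer's processor is not running.
        ready.wait()
        result.append(result[0] + 1)

    t_consumer = threading.Thread(target=consumer)
    t_producer = threading.Thread(target=producer)
    t_consumer.start()
    t_producer.start()
    t_producer.join()
    t_consumer.join()
    return result


if __name__ == "__main__":
    print(run_producer_consumer())
```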
Processor • Kernel-mode code • Don’t access processor structures directly (CRs, DRs, MSRs, PMC) • This is very expensive • Don’t use CPUID as a synchronizing instruction • Use fences instead • Don’t assume CLI/STI gives accurate timing • Interrupts will still happen • Don’t use RDTSC accesses for timing • This is highly volatile • Don't rely on processor performance counters • Counters don't work outside of the parent partition
Storage • Storage is completely encapsulated and the VM is not aware of this • Unless you are using pass-through storage • Do not assume performance characteristics of storage devices • Do not assume that CDs are slow (ISOs are fast) • Do not assume that hard disks are fast (might be on a network) • Do not assume that floppy disks are slow • Emulated storage controllers are • Intel 440BX controller • AIC 7870 SCSI controller
Persistency Of Storage • Technologies like differencing disks and snapshots mean that traditionally persistent storage might not be persistent any more • Your software may find itself arbitrarily moved back to an older point in time • Patches may be applied and then ‘undone’ • Changes to storage persistency are always user initiated
Networking • Routing through host network adapter performed at OSI Layer 2 • Host network security software provides no protection • Unless the host is manually configured to route the VM’s network traffic at a higher OSI Layer • Windows virtualization will only support 802.3 networking devices
Networking • Each virtual network card has its own separate MAC address • This will be changed in the event of a MAC address conflict • MAC addresses can be configured to be static, but default to dynamic • Emulated network controller is • DEC/Intel 21140 Network controller • Performance not limited to 100 Mbit
Video • In Windows Server virtualization video capabilities will be targeted at server scenarios • 2D video support only • All video will be remoted over RDP • Emulated video controller • S3 Trio 64 Video controller • VGA and Text Mode performance is not optimized • Non-planar video modes perform best
Isolation • By default, VMs are isolated entities • Child partitions are not able to access memory in any other partitions • Child partitions are not able to crash any other partitions • Only methods for inter-virtual machine communication are • Traditional networking • Hypercalls
Integration Components • Integration components operate over VMBus to provide basic integration features • Time synchronization • Operating System (OS) shutdown • Registry updating • OS heartbeat • OS identification
Development Opportunities • VM neutral development • Software that is not dependent on specific hardware will continue to function inside of VMs • External VM management • Software can utilize WMI interfaces to control and monitor VMs • Integrated VM solutions • VM-aware solutions can be developed that provide enhanced features for users of VMs
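Rather than inferring speed from the device type, software can measure what it actually gets. The following is a rough sketch, assuming a writable directory on the storage of interest; the function names and the scratch-file size are hypothetical. The read-back is likely to be served from the operating system's cache, so the figure is only a coarse indication.

```python
import os
import tempfile
import time


def measure_read_throughput(path, block_size=1 << 20):
    """Return the observed read throughput of path in bytes per second."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / elapsed if elapsed > 0 else float("inf")


def probe_directory(directory, size=4 << 20):
    """Write a scratch file in directory, time reading it back, clean up."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(os.urandom(size))
        return measure_read_throughput(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(probe_directory(tempfile.gettempdir()), "bytes/s")
```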
Virtualization Hardware Futures. David Wooten, Hardware Architect, System Integrity Group, Microsoft. david.wooten @ microsoft.com
Future Technologies • The topics in this presentation relate to hardware to support possible features in version 2 of the Windows hypervisor (HV2) • The hardware “requirements” discussed are expected to be needed to support the features of HV2 but future events may change these requirements
Topics • “Execution Environment” and why it needs protection • Protections in Root Complex with DMA Remapping • Protections in Fabric to Regulate Routing • Roots of Trust and the SMM Conundrum
The Environment • The software running on a computer has control of the hardware on which it is running – its “execution environment” • If that software is running on a virtual computer, it is important to preserve the illusion of control over the virtualized execution environment • Prevents unexpected behavior • Preserves meaning of local attestation used for sealing
Preservation Of Environment • The preservation of the apparent execution environment of a virtual computer in a partition is the responsibility of the hypervisor • The hypervisor must be able to enforce isolation between partitions to ensure adequate fidelity of the virtualization • The main isolation tool for the hypervisor is memory management • Memory virtualization by the MMU (and associated registers) can prevent inappropriate changes to the memory of another partition through direct access by the CPU • Memory virtualization extensions are needed in IO hardware to complete the memory protections
The IO Problem • [Figure: diagram showing Partition 1, Partition 2, the Hypervisor, the MMU, Memory and two IO devices, with example addresses 100, 4200 and FA00; legend: Address, Control]
Evolution Of IO Protection • In the initial implementation of Windows virtualization, the IO mapping problem is finessed by • “Assigning” all IO devices to the Parent partition • Giving the Parent partition a special mapping of Guest Physical = System Physical • Placing a lot of “trust” in the Parent • In HV2, the Parent may not have special rights to see into other partitions • Other partitions may be “peers” of the Parent • In HV2, devices may be assigned to partitions other than the Parent • Partitions doing IO may not have the same level of assumed “trust” as the V1 Parent
Mechanisms For IO Protection • Main new mechanism is DMA remapping (DMAr) • Adds address translation to DMA • Lets hypervisor limit device access to memory • PCI Routing Control and ID Checking • Restrict peer-to-peer (P2P) access so that devices can’t do P2P with un-translated address • Check ID of requester in switches
DMA Remapping • [Figure: diagram showing Partition 1, Partition 2, the Hypervisor, the MMU, the DMAr unit, Memory and two IO devices, with example addresses 100, 4200 and FA00]
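The idea of DMA remapping can be sketched as a per-device translation table that the hypervisor fills in. This is a toy model only, not the IOMMU or VT-d programming interface: the class name, the 4 KiB page size, the requester strings and the addresses are all assumptions chosen for illustration.

```python
PAGE_SIZE = 0x1000  # assumed 4 KiB pages


class DmaRemapper:
    """Toy model: per-device tables mapping device pages to system pages."""

    def __init__(self):
        # requester id -> {device page number: system page number}
        self.tables = {}

    def map_page(self, requester_id, device_addr, system_addr):
        """Hypervisor side: allow one device page to reach one system page."""
        table = self.tables.setdefault(requester_id, {})
        table[device_addr // PAGE_SIZE] = system_addr // PAGE_SIZE

    def translate(self, requester_id, device_addr):
        """Return the system physical address, or raise if not permitted."""
        table = self.tables.get(requester_id, {})
        page = device_addr // PAGE_SIZE
        if page not in table:
            raise PermissionError(
                f"DMA fault: {requester_id} at {device_addr:#x}")
        return table[page] * PAGE_SIZE + device_addr % PAGE_SIZE


if __name__ == "__main__":
    dmar = DmaRemapper()
    # Two devices use the same device address but land in different memory.
    dmar.map_page("00:03.0", 0x100000, 0x4200000)
    dmar.map_page("00:04.0", 0x100000, 0xFA00000)
    print(hex(dmar.translate("00:03.0", 0x100010)))
    print(hex(dmar.translate("00:04.0", 0x100010)))
```

Any address without a table entry faults, which is how the hypervisor limits a device to the memory of the partition it is assigned to.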
HV2 DMAr Requirements • Chipset must support either IOMMU (AMD) or VT-d (Intel) • DMAr is not processor specific so IOMMU can be used with Intel processor and VT-d can be used with AMD • All IO devices must access memory through DMAr • Chipset may have more than one DMAr unit but they must use the same type of programming interface
PCI Routing Control • PCI devices are accessed using System Physical Addresses (SPA) • Drivers will program devices with Device Physical Addresses (DPA) – DPA may be equal to Guest Physical Address (GPA) or be device specific • To prevent a DPA from accessing a PCI device, switches must not route based on DPA
PCI Routing Control • Microsoft is working through the PCI-SIG to define a modification to switches and Functions so that DPA-based routing between PCI Functions can be disabled • Devices that must do P2P can get SPA from RC by using Address Translation Services (ATS) • With ATS, Function can ask DMAr in RC for the SPA corresponding to DPA and then use that SPA to directly access another device
Requester ID Checking • DMAr hardware uses the Requester ID (Bus-Dev-Func or “BDF”) to choose a translation table • A device could write to wrong memory address if the BDF is wrong • A switch can check the Requester ID and prevent errors of this sort • Bus number of requester must be >= the secondary bus number and <= the subordinate bus number of a switch port • Microsoft is working with the PCI-SIG to have this checking capability added to switches
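The bus-range rule above can be written down directly. The following is a minimal sketch, assuming the usual 16-bit Requester ID layout of an 8-bit bus number, 5-bit device number and 3-bit function number; the function names are hypothetical and this is a model of the check, not switch firmware.

```python
def split_requester_id(rid):
    """Split a 16-bit Requester ID into (bus, device, function)."""
    return (rid >> 8) & 0xFF, (rid >> 3) & 0x1F, rid & 0x7


def requester_allowed(rid, secondary_bus, subordinate_bus):
    """Accept a request only if its bus number lies behind this port.

    The port's secondary and subordinate bus numbers bound the range of
    buses reachable through it, so any other bus number is rejected.
    """
    bus, _device, _function = split_requester_id(rid)
    return secondary_bus <= bus <= subordinate_bus


if __name__ == "__main__":
    rid = (5 << 8) | (2 << 3) | 1  # bus 5, device 2, function 1
    print(split_requester_id(rid))
    print(requester_allowed(rid, secondary_bus=4, subordinate_bus=7))
    print(requester_allowed(rid, secondary_bus=6, subordinate_bus=7))
```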
Static And Dynamic Roots Of Trust • Static Root of Trust Measurement (SRTM) and Dynamic Root of Trust Measurement (DRTM) are different ways to start a chain of trust • To start a chain of trust, the CPU must be in a known state, running known code, and the system must be in a state in which the code can “defend” itself • From this initial condition, we can measure each of the state changes and be able to make assertions about the state of the computer
Static Root Of Trust Measurement • This is a chain of trust that is started by computer system reset – puts CPU in a known state • The first code executed (The Core Root of Trust for Measurement – CRTM) measures the next thing to be executed – CRTM is known code • Hardware is reset and peripheral access to memory is not allowed – CRTM can “defend” itself • Significant issue with SRTM is that, once trust is lost (e.g., unknown code executed), only way to get it back is to reboot the system
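A chain of measurements is commonly accumulated with an "extend" operation: the register's new value is a hash of its old value concatenated with the hash of the component being measured. The following is a minimal sketch of that idea, assuming SHA-1 and a 20-byte register that starts at zero; the component names are hypothetical and no real TPM is involved.

```python
import hashlib

PCR_SIZE = 20  # length of a SHA-1 digest in bytes


def extend(pcr, component):
    """Fold the measurement of one component into the register value."""
    measurement = hashlib.sha1(component).digest()
    return hashlib.sha1(pcr + measurement).digest()


def measure_chain(components):
    """Start from an all-zero register and extend it once per component."""
    pcr = bytes(PCR_SIZE)
    for component in components:
        pcr = extend(pcr, component)
    return pcr


if __name__ == "__main__":
    boot = [b"CRTM", b"BIOS", b"boot loader"]  # hypothetical components
    print(measure_chain(boot).hex())
```

Because each value depends on everything measured before it, the final value changes if any component changes or if the order changes, which is why an unknown piece of code in the chain cannot be undone without a reset.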
Dynamic Root Of Trust Measurement • Uses new CPU instructions to put the CPU in a known state • Code to be executed is sent to TPM to be “measured” into a special Platform Configuration Register (PCR) • This PCR is accessible only when in the DRTM initialization state and only by CPU • Initial, measured DRTM code is protected by hardware – method varies by vendor • With DMAr, hypervisor can “defend” itself from IO devices • With DRTM, if trust is lost, can restart chain of trust without rebooting
Secure Launch • “Secure Launch” refers to the act of starting the hypervisor using the DRTM • A Secure Launch allows the hypervisor to come up in a trusted state, with control of the system, regardless of what code has run previously • Allows arbitrary initialization code to run without affecting the trust state of the system • Major benefit of DRTM is that attestation of the platform can exclude lots of meaningless information that can’t be ignored by SRTM • Add-in cards • BIOS updates • Driver code used to boot hypervisor
DRTM And Trust State • The attestation of a partition must include the partition state and anything that can affect the execution of that partition • Would like attestation only to include software that is loaded after the DRTM • This allows sealing to exclude pre-launch actions • Can maintain chain of trust when code is updated • Bring system up in trusted state, verify that changes are within policy, then make changes and update sealed blobs
Isolation And SMM • SMM can be more privileged than the hypervisor • SMM can access any memory location without mediation by the hypervisor • The privilege level of SMM means that SMM code may have to be included in the seal-to state • Because SMM loads before the DRTM is initiated, almost all of the code update problems related to the SRTM are reinserted into the attestation/sealing process • SRTM problems arise because of changes to BIOS code which is not vetted by the OS/hypervisor • When OS/hypervisor loads, the changed BIOS means that PCRs no longer match, which means that blobs can’t be unsealed
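The effect of a BIOS change on sealed data can be shown with a toy model. This is a sketch only: it records the expected register value next to the secret and compares on unseal, with no encryption at all, whereas a real TPM protects the blob itself. The function names, component names and the SHA-1 extend rule are assumptions for illustration.

```python
import hashlib


def extend(pcr, component):
    """New register value = hash(old value + hash(component))."""
    return hashlib.sha1(pcr + hashlib.sha1(component).digest()).digest()


def boot(components):
    """Measure components in order, starting from an all-zero register."""
    pcr = bytes(20)
    for component in components:
        pcr = extend(pcr, component)
    return pcr


def seal(secret, pcr):
    """Toy seal: remember which register value the secret is bound to."""
    return {"pcr": pcr, "secret": secret}


def unseal(blob, current_pcr):
    """Release the secret only if the platform state matches the seal."""
    if blob["pcr"] != current_pcr:
        raise PermissionError("PCR mismatch: blob cannot be unsealed")
    return blob["secret"]


if __name__ == "__main__":
    blob = seal(b"disk key", boot([b"BIOS v1", b"hypervisor"]))
    print(unseal(blob, boot([b"BIOS v1", b"hypervisor"])))
    try:
        unseal(blob, boot([b"BIOS v2", b"hypervisor"]))
    except PermissionError as err:
        print(err)
```

If SMM code has to be part of the sealed state, any update to it behaves like the BIOS update in this example.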
What To Do About SMM? • One approach to dealing with SMM is to make it run in a “container” that is controlled by the hypervisor • Hypervisor can prevent SMM from accessing anything that it shouldn’t • Issue that OEMs have with this approach is that it could allow the hypervisor to prevent SMM from accessing the parts of the hardware that it must access • Will CPU melt if hypervisor is broken or rogue? • OEMs consider SMM to be part of the hardware and just as “trustworthy” as the hardware • Trust isn’t the issue, the attestation and security evaluation of SMM is the issue • “SMM is hardware” position raises the question of whether this applies equally to SMM “applications”
SMM In HV2 • Microsoft does not yet have a complete solution for dealing with SMM privilege in HV2 • Likely to have to evolve the solution by working with processor, chipset, BIOS, and computer system vendors
Call To Action • Chipset vendors: Start planning DMAr deployment • Switch vendors: Look to PCI-SIG for ECRs to implement access controls • Device vendors: Consider impact of DMAr and evaluate need for ATS • BIOS, CPU, system vendors: Help with SMM problem • Attend other virtualization presentations, especially VIR046 – HyperCall APIs Explained
© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.