02/09/2010
Industrial Project Course (234313)
Virtualization-aware database engine
Final Presentation
• Students: Filimonov Dennis, Maor Dahan
• Supervisor: Abel Gordon
Introduction
What is Machine Virtualization?
It is the ability to run multiple operating systems simultaneously on the same physical machine.
• Virtualization can provide server consolidation, legacy environments on new platforms, and simplified management.
• Systems running on virtual machines suffer performance penalties.
Goals
• Analyze how virtualization technologies affect application performance and present alternative methods for reducing virtualization overhead.
• Analyze and measure the performance of the MySQL open source DB engine, used as the test subject, running on a virtual machine hypervisor.
• Identify the critical sources of virtualization overhead.
• Prototype an approach that reduces virtualization overhead by making the application aware of the virtualized environment.
Methodology
• KVM (Kernel-based Virtual Machine): an open source full virtualization solution for Linux on x86 hardware with virtualization extensions.
• Virtio: an I/O virtualization framework for Linux.
  • We developed Virtio-SQL, which enables communication between the guest machine and the host machine.
  • We integrated a char device into the Virtio-SQL frontend driver to enable user space communication with the MySQL server running in the guest machine.
  • The Virtio-SQL backend driver we developed sits in the hypervisor.
Methodology (Cont.)
• MySQL: MySQL's pluggable Storage Engine Architecture gives users the flexibility to choose from a variety of purpose-built storage engines that are optimized for specific application demands.
  • We isolated an engine from the rest of the MySQL stack so that it can run in the host machine.
  • The isolated engine is compiled into a dynamic library, which the Virtio-SQL backend opens in order to call the engine's functions.
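Opening a dynamic library at run time and calling a function in it can be sketched as follows. This is only an illustration in Python, not the project's backend code: the system C library and its `strlen` function stand in for the isolated engine library and one of its entry points, whose real names do not appear in these slides. The helper names `open_library` and `call_stand_in` are hypothetical, and a Linux host is assumed.

```python
import ctypes
import ctypes.util


def open_library(name):
    """Open a shared library by its short name, similar to dlopen()."""
    path = ctypes.util.find_library(name)
    # If no path is found, CDLL(None) on Linux exposes the symbols already
    # loaded into the current process, which include the C library.
    return ctypes.CDLL(path)


def call_stand_in(lib, payload):
    """Look up a symbol (similar to dlsym()) and call it with a byte string.

    strlen is only a stand-in for an engine function.
    """
    func = lib.strlen
    func.argtypes = [ctypes.c_char_p]   # declare the C signature explicitly
    func.restype = ctypes.c_size_t
    return func(payload)
```

Calling `call_stand_in(open_library("c"), b"INSERT")` returns the length of the byte string. A real backend would look up the engine's own entry points in the same way.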
Design
[Architecture diagram. Guest machine: MySQL server, storage engine frontend, char device, Virtio-SQL frontend (kernel module), Virtio frontend. Host machine: QEMU-KVM virtualizer, Virtio backend, Virtio-SQL backend, storage engine backend (run time library).]
Design (Cont.)
• The frontend storage engine receives SQL function calls from a remote user and forwards them to the char device integrated in the Virtio-SQL frontend driver. The char device provides user space communication between the MySQL server and the Virtio-SQL frontend, which is part of the kernel.
• The Virtio-SQL frontend driver communicates with the Virtio-SQL backend driver located in the KVM hypervisor, which receives the function call and delivers it to the storage engine backend for execution.
• The query result is propagated back along the same path.
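The call path described above (the frontend packs a function call, a channel carries it, the backend dispatches it, and the result travels back) can be sketched in user space. This is a simplified model, not the Virtio-SQL wire format: the header layout, the `OP_INSERT` opcode, and the `Channel` and `BackendEngine` classes are assumptions made for illustration.

```python
import struct
from collections import deque

# Hypothetical opcode; the real Virtio-SQL message layout is not in the slides.
OP_INSERT = 1

# Assumed header: opcode (uint16) followed by payload length (uint32).
HEADER = struct.Struct("<HI")


def encode_call(opcode, payload):
    """Serialize a function call into a flat byte message."""
    return HEADER.pack(opcode, len(payload)) + payload


def decode_call(message):
    """Recover (opcode, payload) from a byte message."""
    opcode, length = HEADER.unpack_from(message)
    payload = message[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated message")
    return opcode, payload


class BackendEngine:
    """Stand-in for the storage engine backend running on the host."""

    def __init__(self):
        self.rows = []

    def handle(self, message):
        opcode, payload = decode_call(message)
        if opcode == OP_INSERT:
            self.rows.append(payload)
            return b"OK"
        return b"ERR unsupported"


class Channel:
    """Stand-in for the char device plus virtio queue: a FIFO of messages."""

    def __init__(self, backend):
        self.backend = backend
        self.requests = deque()

    def send(self, message):
        self.requests.append(message)

    def process_one(self):
        # The backend consumes the oldest request and produces the reply.
        return self.backend.handle(self.requests.popleft())


def frontend_insert(channel, row):
    """Frontend side: forward an insert and return the propagated result."""
    channel.send(encode_call(OP_INSERT, row))
    return channel.process_one()
```

With `backend = BackendEngine()` and `channel = Channel(backend)`, a call to `frontend_insert(channel, b"row-1")` returns `b"OK"` and leaves the row stored in `backend.rows`.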
Measurements
• The /proc file system is a virtual file system that allows communication between the Linux kernel and user space. We used it to retrieve measurements recorded in the kernel.
• We modified the KVM kernel module, injecting code that measures and records the cycles spent in guest mode and root mode for each VM exit. The data is then retrieved through the /proc file system.
• We wrote scripts to automate the measurements.
• We wrote code that generates random SQL INSERT queries.
• We used SSH to run remote scripts on the guest machine from the host machine.
• We wrote scripts for data extraction and analysis.
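A generator of random INSERT queries with a fixed record size might look like the sketch below. It is not the project's generator: the table name `bench`, the column name `data`, and the function names are hypothetical, and the record size is taken here to be the length of a single string value.

```python
import random
import string


def random_insert(record_size, rng, table="bench"):
    """Build one INSERT statement whose string value has record_size characters."""
    # Letters and digits only, so the value never needs SQL escaping.
    alphabet = string.ascii_letters + string.digits
    value = "".join(rng.choices(alphabet, k=record_size))
    return f"INSERT INTO {table} (data) VALUES ('{value}');"


def generate_queries(count, record_size, seed=0):
    """Return count random INSERT statements; a fixed seed makes runs repeatable."""
    rng = random.Random(seed)
    return [random_insert(record_size, rng) for _ in range(count)]
```

Seeding the generator keeps the query stream identical across the compared configurations, so that differences in timing are not caused by differences in the data.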
Benchmarking
• Test specification: we compared three test configurations.
  • A default engine (MyISAM) running on a virtual machine.
  • The new approach engine running on a virtual machine.
  • The new approach engine running on a virtual machine, with queries sent in batches of 8.
• We ran 7 tests on each configuration: INSERT queries with record sizes of 1k, 2k, 4k, 8k, 16k, 32k and 64k.
• Each test was run 10 times on each system and the average was calculated.
• The number of queries ranged from 500,000 for the 1k test to 16,000 for the 64k test.
• Each test ran for one and a half minutes.
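The arithmetic behind these figures can be checked with a few helper functions. The sketch assumes that "1k" means 1 KiB per record and that the quoted query counts refer to one 90-second test run; neither is stated explicitly in the slides, and the function names are illustrative.

```python
from statistics import mean

TEST_SECONDS = 90  # one and a half minutes per test


def queries_per_second(total_queries, seconds=TEST_SECONDS):
    """Average query rate over one test run."""
    return total_queries / seconds


def data_volume_mib(total_queries, record_kib):
    """Total inserted data in MiB, assuming record sizes are in KiB."""
    return total_queries * record_kib / 1024


def average_of_runs(run_results):
    """Average of repeated runs of one test (10 runs in the slides)."""
    return mean(run_results)
```

Under these assumptions the 1k test corresponds to roughly 5,556 queries per second and about 488 MiB of data, while the 64k test corresponds to roughly 178 queries per second and 1000 MiB of data.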
Results
• Tests with insert queries larger than 2k are more than 20% faster, and up to 32% faster for 64k insert queries.
• For smaller insert queries, batching the inserts can increase performance, but the new approach is still slower than the regular run.
• For bigger insert queries, batching does not show a significant improvement.
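A "percent faster" figure can be computed in two common ways, and the slides do not say which one was used. The sketch below shows both definitions so the numbers can be interpreted either way; the function names are illustrative.

```python
def percent_faster_by_throughput(baseline_qps, new_qps):
    """Relative gain in queries per second over the baseline, in percent."""
    return 100.0 * (new_qps - baseline_qps) / baseline_qps


def percent_faster_by_time(baseline_seconds, new_seconds):
    """Relative reduction in elapsed time against the baseline, in percent."""
    return 100.0 * (baseline_seconds - new_seconds) / baseline_seconds
```

The two definitions do not agree. As a made-up example, finishing the same work in 80 seconds instead of 100 is a 20% reduction in time, but it corresponds to a 25% gain in throughput.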
So where does the difference come from?
Reminder: exit reasons are the reasons the hypervisor had to exit from the guest and switch to the host machine.
• EXCEPTION_NMI: there was an exception or a non-maskable interrupt. Most of these are caused by page faults handled by the host to maintain the shadow page tables (the machine does not have EPT support).
• EXTERNAL_INTERRUPT: there was an interrupt caused by the real hardware while the guest was running.
• PENDING_INTERRUPT: if the guest disables interrupts, the host cannot inject any interrupt. In this case, the host can write a value to the VMCS telling the processor to exit when the guest enables interrupts. Thus, right after the guest enables interrupts, the host can inject them.
• CR_ACCESS: the guest read or wrote a control register in the processor.
• IO_INSTRUCTION: the guest executed an I/O instruction.
• APIC_ACCESS: Advanced Programmable Interrupt Controller access. Most of these are caused when the guest accesses the APIC page, probably to acknowledge interrupts.
• HALT: the processor is idling.
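Per-exit cycle records can be aggregated by exit reason to see which exits dominate. The sketch below assumes a simple text format of one `<EXIT_REASON> <guest_cycles> <host_cycles>` record per line. That format and the function names are hypothetical, since the slides do not show the layout of the data exported through /proc.

```python
from collections import defaultdict


def parse_exit_log(text):
    """Parse lines of '<EXIT_REASON> <guest_cycles> <host_cycles>'."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        reason, guest, host = line.split()
        records.append((reason, int(guest), int(host)))
    return records


def host_cycle_share(records):
    """Fraction of all host-mode cycles attributed to each exit reason."""
    totals = defaultdict(int)
    for reason, _guest, host in records:
        totals[reason] += host
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {reason: cycles / grand_total for reason, cycles in totals.items()}
```

Comparing the resulting shares across record sizes and configurations is one way to see how the exit distribution shifts.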
So what did we learn?
• There is a strong relation between the exit distribution and the insert record length.
• Some exits are very dominant while others are negligible.
• Writing to the real disk is more efficient than writing to the virtual disk, so the HALT overhead is reduced.
• The total time spent in the host machine is smaller for the new approach, while the time spent in the guest machine remains similar; therefore the virtualization overhead is reduced.
• Transferring a small but significant part of the code to the host machine yielded a significant improvement.
• For the new approach, small insert queries degrade performance.
Conclusions
• Isolating a MySQL storage engine to work independently from the MySQL stack is extremely difficult:
  • Dependence on global variables.
  • Usage of external functions.
• Virtualization overhead can be reduced by changes at the application layer.
• Measuring virtualization overhead is long and tedious, so scripts should be used to automate measurements.
• Analyzing virtualization overhead is very complex because many variables, and the relations between them, need to be considered.
• Virtio alternatives should be considered to improve performance further.
Deliverables
• Documentation
  • User's Manual
  • Developer's Manual
  • Project Internet site
• Code
  • Virtio-SQL driver code.
  • KVM code changes.
  • Measurement scripts.
  • New approach MySQL engine (supports only INSERT queries).
Thank you