
02/09/2010

02/09/2010. Industrial Project Course (234313) Virtualization-aware database engine Final Presentation . Students: Filimonov Dennis, Maor Dahan Supervisor: Abel Gordon. Introduction. What is Machine Virtualization?


Presentation Transcript


  1. 02/09/2010 Industrial Project Course (234313) Virtualization-aware database engine Final Presentation • Students: Filimonov Dennis, Maor Dahan • Supervisor: Abel Gordon

  2. Introduction What is Machine Virtualization? It is the ability to run multiple operating systems simultaneously on the same physical machine. • Virtualization can provide server consolidation, a legacy environment on new platforms, and simplified management. • Systems running on virtual machines suffer performance penalties.

  3. Goals • Analyze how virtualization technologies affect application performance and present alternative methods for reducing virtualization overhead. • Analyze and measure the performance of the MySQL open source DB engine, used as a test subject, running on a virtual machine hypervisor. • Identify critical virtualization overhead. • Prototype an approach to reduce virtualization overhead by making the application aware of the virtualized environment.

  4. Methodology • KVM (Kernel-based Virtual Machine): an open source full virtualization solution for Linux on x86 hardware containing virtualization extensions. • Virtio: an I/O virtualization framework for Linux. We developed Virtio-SQL, which enables communication between the guest machine and the host machine. We integrated a char device in the Virtio-SQL frontend driver to enable user space communication with the MySQL server running in the guest machine. The Virtio-SQL backend driver we developed sits in the hypervisor.

  5. Methodology (Cont.) • MySQL: MySQL's pluggable Storage Engine Architecture gives users the flexibility to choose from a variety of purpose-built storage engines that are optimized for specific application demands. We isolated an engine from the whole MySQL stack to be able to run it in the host machine. The engine is isolated and compiled into a dynamic library, which can be opened from the Virtio-SQL backend to activate function calls.

  6. Design [Diagram] • Guest machine: MySQL server, storage engine frontend, kernel module (Virtio-SQL frontend with char device), Virtio frontend. • Host machine: QEMU-KVM virtualizer, Virtio backend, Virtio-SQL backend, runtime library, storage engine backend.

  7. Design (Cont.) • The frontend storage engine receives SQL function calls for execution from a remote user, then forwards them to the char device integrated in the Virtio-SQL frontend driver (for user space communication between the server and the Virtio-SQL frontend, which is part of the kernel). • The Virtio-SQL frontend driver communicates with the Virtio-SQL backend driver located in the KVM hypervisor, which receives the function call and delivers it to the storage engine backend for execution. • The query result is propagated back the same way.
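The forwarding path described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the wire format (a 2-byte opcode and a 4-byte length followed by the payload), the `OP_INSERT` opcode, and the handler table are hypothetical, and an in-memory buffer stands in for the char device and the Virtio queue. It is not the project's actual Virtio-SQL protocol.

```python
import io
import struct
from typing import Callable, Dict, Tuple

# Hypothetical wire format: 2-byte opcode, 4-byte payload length, then payload.
HEADER = struct.Struct("<HI")
OP_INSERT = 1  # hypothetical opcode for a storage-engine "write row" call


def encode_call(opcode: int, payload: bytes) -> bytes:
    """Guest side: frame one storage-engine call for the channel."""
    return HEADER.pack(opcode, len(payload)) + payload


def decode_call(stream) -> Tuple[int, bytes]:
    """Read one framed call from a file-like object."""
    header = stream.read(HEADER.size)
    if len(header) != HEADER.size:
        raise EOFError("truncated header")
    opcode, length = HEADER.unpack(header)
    payload = stream.read(length)
    if len(payload) != length:
        raise EOFError("truncated payload")
    return opcode, payload


def backend_dispatch(stream, handlers: Dict[int, Callable[[bytes], bytes]]) -> bytes:
    """Host side: decode a call and run the matching engine function."""
    opcode, payload = decode_call(stream)
    return handlers[opcode](payload)


# An in-memory buffer stands in for the char device and the Virtio queue.
channel = io.BytesIO(encode_call(OP_INSERT, b"row-bytes"))
result = backend_dispatch(channel, {OP_INSERT: lambda row: b"OK:%d" % len(row)})
print(result)
```

The length-prefixed framing is one simple way to let the backend know where a call ends; the returned bytes play the role of the query result that travels back along the same path.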

  8. Measurements • The /proc file system is a virtual file system that permits a novel approach for communication between the Linux kernel and user space. It was used to measure and record information from the kernel. • We modified the KVM kernel module and injected code that measures and records the cycles in guest and root mode for each VM exit; the data can then be retrieved through the /proc file system. • We wrote scripts to automate measurements. • We wrote code which generates random SQL insert queries. • We used SSH to activate remote scripts on the guest machine from the host machine. • We wrote data analysis and extraction scripts.
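As an illustration of the random insert generation mentioned above, here is a minimal sketch. The table name `bench`, the single column `data`, and the alphanumeric payload are assumptions chosen for the example, not the schema used in the project.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits  # no quotes, so no escaping needed


def random_insert(table: str, record_size: int, rng: random.Random) -> str:
    """Build one INSERT statement whose payload is record_size characters long."""
    payload = "".join(rng.choices(ALPHABET, k=record_size))
    return f"INSERT INTO {table} (data) VALUES ('{payload}');"


rng = random.Random(42)  # fixed seed so a measurement run can be repeated
query = random_insert("bench", 1024, rng)
print(len(query))
```

Seeding the generator makes it possible to replay the same query stream against each test configuration, which keeps the runs comparable.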

  9. Benchmarking • Test specification: we compared three test configurations. • A default engine (MyISAM) running on a virtual machine. • The new approach engine running on a virtual machine. • The new approach engine running on a virtual machine while using batches of 8 queries at a time. • We ran 7 tests on each of the configurations: insert queries with record sizes of 1k, 2k, 4k, 8k, 16k, 32k, and 64k. • Each test was run 10 times on each system and an average was calculated. • The number of queries ranged from 500,000 for the 1k test to 16,000 for the 64k test. • Each test ran for one and a half minutes.
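The third configuration groups queries into batches of 8. How the project implemented batching is not described in the slides; the sketch below only shows the general idea of chunking a query stream, with the helper name `batches` being hypothetical.

```python
from typing import Iterable, Iterator, List


def batches(queries: Iterable[str], size: int = 8) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` queries."""
    batch: List[str] = []
    for query in queries:
        batch.append(query)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final partial batch, if the total is not a multiple of size
        yield batch


for group in batches([f"query-{i}" for i in range(20)]):
    print(len(group))
```

Sending one group per guest-to-host transition instead of one query per transition is what would let batching amortize the cost of each crossing.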

  10. Results • Tests with insert queries over 2k in record size are more than 20% faster, and up to 32% faster for 64k insert queries! • For smaller insert queries, batching the inserts can increase performance, but it is still slower than the regular run. • For bigger insert queries, batching doesn't show significant improvement.

  11. So where does the difference come from? Reminder: exit reasons are the reasons the hypervisor had to exit from the guest and switch to the host machine. • EXCEPTION_NMI: there was an exception or non-maskable interrupt. Most of them are caused by page faults handled by the host to maintain the shadow page tables (the machine does not have EPT support). • EXTERNAL_INTERRUPT: there was an interrupt caused by the real hardware while the guest was running. • PENDING_INTERRUPT: if the guest disables interrupts, the host cannot inject any interrupt. In this case, the host can write to the VMCS a value telling the processor to exit when the guest enables interrupts. Thus, right after the guest enables interrupts, the host can inject them. • CR_ACCESS: the guest read from or wrote to a control register in the processor. • IO_INSTRUCTION: I/O instruction. • APIC_ACCESS: advanced programmable interrupt controller access. Most of them are caused when the guest accesses the APIC page, probably to acknowledge interrupts. • HALT: the processor is idling.
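A per-exit-reason breakdown like the one listed above can be computed from the recorded cycle counts with a short script. The record format below (one line per VM exit: reason, guest cycles, root-mode cycles) and the numbers in `SAMPLE` are made up for illustration; the actual output format of the modified KVM module is not shown in the slides.

```python
from collections import defaultdict

# Hypothetical records, one per VM exit: reason, guest cycles, root-mode cycles.
SAMPLE = """\
EXCEPTION_NMI 1200 3400
HALT 500 9000
EXCEPTION_NMI 800 2600
IO_INSTRUCTION 300 4000
"""


def summarize(text):
    """Sum exit counts and cycles per exit reason."""
    totals = defaultdict(lambda: {"exits": 0, "guest": 0, "root": 0})
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        reason, guest, root = line.split()
        entry = totals[reason]
        entry["exits"] += 1
        entry["guest"] += int(guest)
        entry["root"] += int(root)
    return dict(totals)


def root_share(summary):
    """Fraction of all root-mode cycles attributed to each exit reason."""
    total_root = sum(entry["root"] for entry in summary.values())
    return {reason: entry["root"] / total_root for reason, entry in summary.items()}


summary = summarize(SAMPLE)
for reason, share in sorted(root_share(summary).items(), key=lambda item: -item[1]):
    print(f"{reason}: {share:.1%}")
```

Sorting by share makes the dominant exit reasons stand out, which is the kind of view needed to tell the dominant exits from the negligible ones.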

  12. Total time distribution between Guest and Host machine

  13. And what about the other exits?

  14. So what did we learn? • There is a strong relation between the exit distribution and the insert record length. • Some exits are very dominant while others are negligible. • Writing to the real disk is more efficient than writing to the virtual disk; therefore the HALT overhead is reduced. • The total time spent in the host machine for the new approach is smaller, and the time spent in the guest machine remains similar; therefore we reduce virtualization overhead. • By transferring a small but significant part of the code to the host machine, we gained a significant improvement. • We observe that, for the new approach, small insert queries degrade performance.

  15. Conclusions • Isolating a MySQL storage engine to work independently from the MySQL stack is extremely difficult: • Dependence on global variables. • Usage of external functions. • Virtualization overhead can be reduced by changes at the application layer. • Measuring virtualization overhead is long and tedious, so scripts should be used to automate measurements. • Analyzing virtualization overhead is very complex because many variables, and the relations between them, need to be considered. • Virtio alternatives should be considered to improve performance further.

  16. Deliverables • Documentation • User's Manual • Developer's Manual • Project Internet site • Code • Virtio-SQL driver code. • KVM code changes. • Measurement scripts. • New approach MySQL engine (supports only INSERT queries). Thank you
