Live Migration Performance Modelling for Virtual Machines with Resizable Memory
Cho-Chin Lin, Zong-De Jian and Shyi-Tsong Wu
Email: cclin@niu.edu.tw
Department of Electronic Engineering, National Ilan University, Taiwan
High Performance Computing Lab
3CM Cloud Computing for Critical Mission
Outline
• Background (2)
• Motivation and Goal (1)
• Related Works (2)
• Model of Live Migration (4)
• Live Migration on Xen (3)
• Assessment of Live Migration Performance (2)
• Performance Bounds on Live Migration (5)
• Performance under Stressed Memory Space (1)
• Performance under Various Resident Sets (1)
• Concluding Remarks and Future Direction (1)
Background (1/2)
• Virtualization enables consolidated system performance and flexible resource utilization.
• Virtualization mechanisms are classified into three categories:
  • Full virtualization (non-modified kernels)
  • Para-virtualization (modified kernels)
  • Hardware-assisted virtualization
• A virtual machine hypervisor sits at the layer below the virtual machines and manages resource sharing among them.
• Live migration moves a virtual machine across hosts with a small service downtime.
Background (2/2)
• Stages of live migration (6 stages): Initialization, Preservation, Iterative copy, Stop-and-copy, Commitment, Activation
• Strunk (2012) and Akoush et al. (2010) proposed two performance metrics:
  • Total migration time, Tmgt
  • Service downtime, Tdt
• Ta and Tb are the overheads before the iterative copy stage and after the stop-and-copy stage, respectively.
• tcopy(i) is the time needed to duplicate pages in the i-th iteration.
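The two metrics can plausibly be written in terms of the quantities defined on this slide. The expressions below are a reconstruction under assumptions, not the slide's own formulas: n (the index of the final, stop-and-copy iteration) is a symbol introduced here, and the downtime is assumed to cover the last copy round plus the trailing overhead Tb.

```latex
T_{mgt} = T_a + \sum_{i=1}^{n} t_{copy}(i) + T_b,
\qquad
T_{dt} = t_{copy}(n) + T_b
```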
Motivation and Goal
• Critical applications running on virtual machines need an appropriate strategy for achieving a small total migration time and a small service downtime.
• To study the performance of live migration, a model of live migration is presented. Under this model, the bounds on the migration time and the proposed strategy for optimizing the migration process are analyzed.
• Our analysis is based on experiments in which virtual machines run tasks supervised by Xen. The experimental results show that the downtime can be significantly reduced by dynamically resizing the available memory according to the working sets of the running tasks.
Related Works (1/2)
• Strategies of iterative copy in live migration
  • Pre-copy (Clark et al., 2005): iteratively duplicates pages from the source host to the target host before entering the stop-and-copy stage.
  • Post-copy (Hines et al., 2009): after the stop-and-copy stage has migrated the "core" pages, the target host pulls pages from the source host on page faults.
• Reducing the duplication of frequently updated pages
  • Page marking (Ma et al., 2010)
  • Second chance (Lin et al., 2012)
Related Works (2/2)
• Reducing the amount of duplicated pages
  • Identifying unused pages and encoding duplicated pages with the RLE algorithm (Ma et al., 2012)
• Analyzing the performance of live migration
  • Characterizing the parameters affecting live migration on Xen (Akoush, 2010)
  • Investigating the factors behind the total migration time and service downtime of live migration (Salfner et al., 2011)
  • Summarizing, classifying and evaluating important research on live migration (Strunk, 2012)
Model of Live Migration (1/4)
• Live migration of a virtual machine V is characterized by three parameters: V(M, P, C).
• The 3-tuple M = (Mmax, Mavl, Mtsk) is the memory configuration.
  • Mmax: the maximum memory size of a deployed virtual machine.
  • Mavl: the memory size available to the guest OS and running tasks.
  • Mtsk: the memory size needed by the guest OS and running tasks.
• The 2-tuple P = (ħ, D) is the duplication protocol.
  • ħ: history observation window.
  • D: a set of patterns used to trigger a page duplication activity.
• C is a set of terminating conditions
  • for terminating iterative page duplication, and
  • for starting cross-host memory consistency.
Model of Live Migration (2/4)
Example: M = (Mmax, Mavl, Mtsk) = (38, 30, 21)
• Mmax: the maximum memory size of virtual machine V.
• Mavl: the memory size available to the guest OS and running tasks.
• Mtsk: the memory size needed by the guest OS and running tasks.
Model of Live Migration (3/4)
P = (ħ, D)
• ħ: history observation window.
• D: the set of patterns used to trigger a page duplication activity.
• If the page under consideration is modified at iteration k, then hk = 1; otherwise, hk = 0.
Example: ħ = 3, D = {100, 001}, history of some page h1h2h3h4h5h6h7h8 = 1 1 0 0 0 1 0 1
• At the 4th iteration the window h2h3h4 = 100 is in D: the page is sent.
• At the 5th iteration the window h3h4h5 = 000 is not in D: the page is not sent.
• At the 6th iteration the window h4h5h6 = 001 is in D: the page is sent.
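The pattern-matching rule of P = (ħ, D) can be sketched in a few lines of Python. This is only an illustration of the slide's example: the function name `should_send`, the string encoding of the history, and the choice to send nothing before ħ iterations of history exist are all assumptions, not part of the model as presented.

```python
def should_send(history: str, k: int, window: int, patterns: set) -> bool:
    """Decide whether a page is duplicated at iteration k (1-based).

    history  -- string of '0'/'1' flags, history[i-1] is h_i
    window   -- the history observation window (h-bar)
    patterns -- the set D of trigger patterns, each of length `window`
    """
    if k < window:
        # Assumption: with fewer than `window` observations nothing matches.
        return False
    recent = history[k - window:k]  # flags h_{k-window+1} .. h_k
    return recent in patterns


# The example from the slide: h-bar = 3, D = {100, 001}.
history = "11000101"
sent = [k for k in range(1, len(history) + 1)
        if should_send(history, k, 3, {"100", "001"})]
print(sent)  # [4, 6]
```

With these inputs the page is sent at the 4th and 6th iterations and skipped at the 5th, matching the example.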
Model of Live Migration (4/4)
C is a set of terminating conditions
• for terminating iterative page duplication, and
• for starting cross-host memory consistency.
Example:
• C1: the number of iterations has reached 100.
• C2: the total number of duplicated pages exceeds twice the maximum memory space.
• C = { C1, C2 }
Live Migration on Xen (1/3)
• The available memory of a virtual machine can be adjusted using the balloon driver.
• Resizing the memory reduces the number of free pages, as shown in Fig. (b).
• The available memory can be further reduced by swapping out no-access pages, as shown in Fig. (c). In this case, secondary storage provides the extra space needed to accommodate the running task.
Figure panels:
• (a) Mavl > Mtsk: M = (38, 30, 21)
• (b) Mavl = Mtsk: M = (38, 21, 21)
• (c) Mavl < Mtsk: M = (38, 12, 21)
Live Migration on Xen (2/3)
Xen uses bitmaps to define its duplication protocol.
• Bitmap to_skip records which frames have been updated by the running task in the current iteration.
  • to_skip[j] = 1 indicates that the j-th frame has been updated.
• Bitmap to_send records which frames were updated by the running task in the previous iteration.
  • to_send[j] = 1 indicates that the j-th frame was updated.
• Xen's duplication protocol is P = (2, {10}).
  • If to_send[j] = 1 and to_skip[j] = 0, the page in the j-th frame is sent in the current iteration.
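The rule P = (2, {10}) amounts to a per-frame test on the two bitmaps. The sketch below models the bitmaps as plain Python lists of 0/1 values; it is not Xen's code (Xen implements this in C), and the function name `frames_to_send` is hypothetical.

```python
def frames_to_send(to_send: list, to_skip: list) -> list:
    """Return the indices j of frames sent in the current iteration.

    A frame is sent when it was updated in the previous iteration
    (to_send[j] == 1) and has not been updated again in the current
    one (to_skip[j] == 0), i.e. its two-bit history is "10".
    """
    return [j for j, (send_bit, skip_bit) in enumerate(zip(to_send, to_skip))
            if send_bit == 1 and skip_bit == 0]


# Hypothetical four-frame example.
print(frames_to_send([1, 1, 0, 0], [0, 1, 0, 1]))  # [0]
```

Frame 1 is skipped because it was dirtied again in the current iteration, so sending it now would likely be wasted work.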
Live Migration on Xen (3/3)
• Xen provides four terminating conditions to avoid an endless pre-copy process.
  • C1: the number of updated frames in an iteration is less than fifty.
  • C2: the number of iterations has reached twenty-nine.
  • C3: the total number of duplicated pages exceeds three times the maximum memory space.
  • C4: the number of duplicated pages in this iteration is larger than that in the last iteration, and the measured network bandwidth has reached its maximal value.
• C = { C1, C2, C3, C4 }; C4 is false by default.
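The four conditions can be read as predicates over per-iteration statistics. The following is a minimal sketch, assuming the pre-copy loop stops as soon as any one condition holds; the `IterationStats` record, its field names, and the `enable_c4` switch are hypothetical and do not correspond to Xen's actual data structures.

```python
from dataclasses import dataclass


@dataclass
class IterationStats:
    iteration: int          # number of pre-copy iterations completed so far
    dirty_frames: int       # frames updated during the latest iteration
    sent_this_iter: int     # pages duplicated in the latest iteration
    sent_last_iter: int     # pages duplicated in the iteration before it
    total_sent: int         # pages duplicated since migration started
    bandwidth_at_max: bool  # True if measured bandwidth reached its maximum


def should_stop(stats: IterationStats, max_pages: int,
                enable_c4: bool = False) -> bool:
    """Return True when pre-copy should end (assumed: any condition suffices)."""
    c1 = stats.dirty_frames < 50
    c2 = stats.iteration >= 29
    c3 = stats.total_sent > 3 * max_pages
    # C4 is off by default, mirroring "C4 is false by default" on the slide.
    c4 = (enable_c4
          and stats.sent_this_iter > stats.sent_last_iter
          and stats.bandwidth_at_max)
    return c1 or c2 or c3 or c4


MAX_PAGES = 2048 * 1024 // 4  # e.g. a 2048 MB virtual machine with 4 KB pages
```

For example, `should_stop(IterationStats(3, 40, 40, 900, 5000, False), MAX_PAGES)` is true because fewer than fifty frames were updated (C1).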
Live Migration Assessment (1/2)
Three experiments are conducted:
• Assessing performance bounds on live migration
  • A task of 890 MB (1000 MB including the guest OS) periodically writes to pages, tailored to various dirty rates (up to 2105), in an access range of 890 MB.
  • In this experiment, (Mtsk, Mavl) = (1000 MB, m), where m = 512 MB, 1024 MB and 2048 MB.
• Assessing live migration performance under stressed memory space
  • A task of 890 MB (1000 MB including the guest OS) periodically reads pages, with no delay between two consecutive reads, in an access range of 890 MB.
  • In this experiment, (Mtsk, Mavl) = (1000 MB, m), where 256 MB ≤ m ≤ 2048 MB.
• Assessing live migration performance for various resident sets
  • A task of 890 MB periodically reads (or writes) pages, with no delay between two consecutive accesses, in an access range of 390 MB (working set = 390 MB).
  • In this experiment, (Mtsk, Mavl) = (1000 MB, m), where m = 512 MB, 1024 MB and 2048 MB.
Live Migration Assessment (2/2)
• Environment of experiment
  • Hypervisor: Xen 4.0.1
  • Virtual machine: Mmax = 2048 MB, CPU = 1 core, Disk = 4096 MB (Swap = 2048 MB)
  • Live migration of the VM and its task from the source host to the target host
  • Bandwidth = 100 Mbps
  • NFS client, NFS server
• Task access pattern
  • Memory space of the guest OS: about 110 MB
  • Size of task = 890 MB
  • Mtsk: about 1000 MB
  • Page size = 4 KB
  • Access range (marked in the figure)
Assessment I – Performance Bounds (1/5)
• Akoush et al. (2010) proposed bounds on Tmgt and Tdt for Xen, expressed in terms of the following quantities.
• To = Ta + Tb
• Let M be the size of the memory space given to a virtual machine and m be the memory frame size.
• B is the network bandwidth.
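For orientation, the bounds attributed to Akoush et al. (2010) are usually quoted in roughly the following form, using the symbols defined on this slide. This is a reconstruction from memory of that paper rather than the slide's own expression, so the exact terms, in particular the factor 5 and the downtime bounds, are assumptions to be checked against the original.

```latex
T_o + \frac{M}{B} \;\le\; T_{mgt} \;\le\; T_o + \frac{5M - m}{B},
\qquad
T_b \;\le\; T_{dt} \;\le\; T_b + \frac{M}{B}
```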
Assessment I – Performance Bounds (2/5)
• Although the expressions for Tmgt and Tdt intuitively capture the costs of live migration for a virtual machine, more precise bounds can be found by classifying the memory space into groups according to usage.
• We modify the bounds on Tmgt and Tdt using the memory configuration parameter of our proposed model. The performance bounds on live migration are studied by varying the sizes of the memory groups and the page-access patterns.
Assessment I – Performance Bounds (3/5)
• Range of low dirty rate: Tmgt(V512, V1024, V2048) increases in proportion to the rise of the dirty rate.
  • A higher dirty rate triggers more dirty frames to be duplicated.
  • Tmgt(V1024, V2048) dominates Tmgt(V512), since page-swapping in V512 reduces the actual dirty rate.
  • Nit(V1024, V2048) takes fewer than 30 iterations to enter the stop-and-copy stage, due to the large number of pages duplicated in each iteration. Nit(V512) takes 30 iterations, due to the small available memory.
• Range of high dirty rate: Tmgt(V512, V1024, V2048) decreases in proportion to the rise of the dirty rate.
  • A higher dirty rate begins to inhibit the duplication of frames that are updated multiple times.
  • Tmgt(V512) dominates Tmgt(V1024, V2048), since page-swapping in V512 reduces the actual dirty rate.
  • Nit(V512, V1024, V2048) takes 30 iterations, since only small numbers of pages are duplicated in each iteration.
Figures: total migration time; total iteration number (network bandwidth ≒ 3000 pages/sec).
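The figure of about 3000 pages/sec is consistent with the experimental setup (100 Mbps link, 4 KB pages). A quick check, assuming 1 Mbps = 10^6 bit/s and 4 KB = 4096 bytes and ignoring protocol overhead:

```latex
\frac{100 \times 10^{6}\ \text{bit/s}}{4096\ \text{B/page} \times 8\ \text{bit/B}}
  = \frac{10^{8}}{32768}\ \text{pages/s}
  \approx 3052\ \text{pages/s}
```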
Assessment I – Performance Bounds (4/5)
Let M1 and M2 denote the memory sizes used in the lower and upper bounds, respectively.
• If M1 = M2 = Mmax, the expression cannot capture the curves in our experiment, because it overestimates the lower bound for V512.
• If M1 = M2 = Mavl, the expression cannot capture the curves in our experiment, because it underestimates the upper bounds for V1024 and V2048.
• We conclude that the bounds need to be refined by replacing M1 and M2 with Mavl and Mmax, respectively.
Figure: total duplication amount, with the Mmax and Mavl levels marked.
Assessment I – Performance Bounds (5/5)
• Only the page frames in the available memory space may be duplicated.
• Only the page frames in the task access region may be duplicated.
• Based on the above, we have M3 = min(Mavl, Mtsk).
Figure: duplication amount at the last iteration, with the min(Mavl, Mtsk) level marked.
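To make M3 concrete for the experimental settings, the sketch below evaluates min(Mavl, Mtsk) and the time needed to push that much memory over the link. The helper names are hypothetical, and the unit conventions (1 MB = 2^20 bytes, 1 Mbps = 10^6 bit/s, no protocol overhead) are assumptions; the transfer time is only an idealized estimate, not a measured downtime.

```python
def last_iteration_bound_mb(m_avl_mb: float, m_tsk_mb: float) -> float:
    """M3 = min(Mavl, Mtsk): memory that may be duplicated in the last iteration."""
    return min(m_avl_mb, m_tsk_mb)


def transfer_seconds(size_mb: float, bandwidth_mbps: float) -> float:
    """Idealized time to send size_mb over a link of bandwidth_mbps."""
    bits = size_mb * 1024 * 1024 * 8          # assumes 1 MB = 2**20 bytes
    return bits / (bandwidth_mbps * 1_000_000)  # assumes 1 Mbps = 10**6 bit/s


M_TSK_MB = 1000        # Mtsk from the experiments
BANDWIDTH_MBPS = 100   # link speed from the experimental environment

for m_avl in (512, 1024, 2048):
    m3 = last_iteration_bound_mb(m_avl, M_TSK_MB)
    print(m_avl, m3, round(transfer_seconds(m3, BANDWIDTH_MBPS), 1))
```

Under these assumptions M3 is 512 MB for V512 and 1000 MB for V1024 and V2048, which illustrates why shrinking Mavl below Mtsk lowers the ceiling on the last-iteration transfer.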
Assessment II – Stressed Space
• Mavl < Mtsk: a frame accommodating a recently swapped page is considered a dirty frame and will be duplicated later.
  • Tmgt increases in proportion to the size of the available memory.
  • The size of the available memory limits the number of pages duplicated in an iteration.
• Mavl ≧ Mtsk: the number of free pages increases in proportion to the size of the available memory, which increases the number of pages duplicated in the first iteration.
  • Tmgt increases in proportion to the size of the available memory, at a much slower rate than in the case of Mavl < Mtsk.
  • The number of iterations is close to 5, because there are no dirty pages and no page-swapping events.
  • Almost no pages are duplicated across hosts in the last iteration.
Figures: total iteration number and duplication amount at the last iteration under reads; total migration time and total duplication amount under reads.
Assessment III – Resident Sets
• There are no free pages in V1024. There are no no-access pages in V504.
• If the no-access data can be paged out, the number of pages duplicated in the iterative pre-copy stage should be minimized without affecting the number of pages duplicated in the stop-and-copy stage.
• The number of pages duplicated in the stop-and-copy stage is the same for the three cases, since the size of the working set is fixed.
Figure: memory resizing effect for reading/writing.
Concluding Remarks and Future Direction
• In this paper, we have presented a performance model of live migration for virtual machines. Under this model, new performance bounds are developed.
• Under this model, experiments have been conducted using various parameter settings. They show that
  • the service downtime is strongly related to the size of the available memory, and
  • the time of live migration can be reduced significantly if the memory space is resized dynamically based on the set of no-access pages.
• In the future, an algorithm will be developed for predicting the number of duplicated pages in the last iteration. Based on the prediction, the available memory area can be adjusted to minimize the downtime for real-time applications.