Virtual Machine Universe in Condor
What is VM universe?
• A user can submit a virtual machine to Condor as a job.
• Condor runs the virtual machine and sends the resulting virtual machine back.
• Supports VMware Server and Xen.
Big picture
[Diagram: the submit machine runs the Schedd and the Shadow; the execute machine runs the Startd, the Starter, and the VM GAHP, which runs the virtual machine.]
Benefits of VM universe
• Platform independence
• Environment independent of the host machine
• Checkpointing
• Networking inside a virtual machine
• Snapshot disks
• Input CDROM images
Snapshot disk
• All modified data is stored in snapshot disks; the original VM disk files are not changed.
• VM disk files in a shared file system can be safely shared among multiple jobs.
• Reduces the disk space needed for results and checkpoints.
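The copy-on-write behaviour described above can be illustrated with a toy model. This is only a sketch: `SnapshotDisk`, its block dictionary, and the block contents are hypothetical names chosen for illustration, and they say nothing about how VMware actually stores snapshot files.

```python
class SnapshotDisk:
    """Toy copy-on-write overlay on a shared base disk (hypothetical model)."""

    def __init__(self, base_blocks):
        self._base = base_blocks   # shared original disk; never written to
        self._delta = {}           # blocks modified by this job only

    def read(self, block_no):
        # Prefer the job's own modified copy, fall back to the shared base.
        if block_no in self._delta:
            return self._delta[block_no]
        return self._base[block_no]

    def write(self, block_no, data):
        # Writes land in the snapshot; the base disk is left untouched.
        self._delta[block_no] = data

    def modified_blocks(self):
        # Only these blocks would need to be kept as the job's result.
        return dict(self._delta)


base = {0: b"boot", 1: b"os", 2: b"data"}   # stands in for the shared VM disk
job1 = SnapshotDisk(base)
job2 = SnapshotDisk(base)

job1.write(2, b"job1 result")
job2.write(1, b"job2 patch")

print(job1.read(2))            # job 1 sees its own write
print(job2.read(2))            # job 2 still sees the original block
print(base[2])                 # the shared disk is unchanged
print(job1.modified_blocks())  # the only data job 1 has to store
```

Because each job keeps only its own delta, two jobs can read the same base disk at once, and the stored result is proportional to what a job changed rather than to the size of the whole disk.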
Submit description file with shared file system
• universe = vm
• executable = WindowsXP
• vm_type = vmware
• vm_memory = 256
• vm_checkpoint = TRUE
• vm_networking = TRUE
• vm_networking_type = dhcp
• vmware_dir = /shared/windows_vm
• vmware_should_transfer_files = FALSE
• vmware_snapshot_disk = TRUE
• initialdir = /result1
• Queue
• initialdir = /result2
• Queue
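Snapshot disk with shared file system
[Diagram: the VM disk files in /windows_vm on the shared file system are used by Job 1 on execute machine 1 and by Job 2 on execute machine 2. Each job has its own snapshot disk, and results go to /result1 and /result2 on the submit machine.]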
Submit description file without shared file system
• universe = vm
• executable = WindowsXP
• vm_type = vmware
• vm_memory = 256
• vm_checkpoint = TRUE
• vm_networking = TRUE
• vm_networking_type = dhcp
• vmware_dir = /windows_vm
• vmware_should_transfer_files = TRUE
• initialdir = /result1
• vmware_snapshot_disk = TRUE
• Queue
• initialdir = /result2
• vmware_snapshot_disk = FALSE
• Queue
Snapshot disk without shared file system
[Diagram: /windows_vm resides on the submit machine. Job 1 (vmware_snapshot_disk = TRUE, initialdir = /result1) runs on execute machine 1 with a snapshot disk; Job 2 (vmware_snapshot_disk = FALSE, initialdir = /result2) runs on execute machine 2. Results are returned to /result1 and /result2 on the submit machine.]
Input CDROM image
• Unlike other universes, the VM universe cannot use the input or argument parameters of a submit description file.
• With input CDROM images, a user can run the same VM several times on different input data sets.
Submit description file with input CDROM image
• universe = vm
• executable = WindowsXP
• vm_type = vmware
• vm_memory = 256
• vm_checkpoint = TRUE
• vm_networking = TRUE
• vm_networking_type = dhcp
• vmware_dir = /windows_vm
• vmware_should_transfer_files = FALSE
• vmware_snapshot_disk = TRUE
• initialdir = /result1
• vmware_cdrom_files = a.iso
• Queue
• initialdir = /result2
• vmware_cdrom_files = a.txt, b.txt
• Queue
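Running the same VM on many input data sets means repeating the initialdir / vmware_cdrom_files / Queue triple once per job. A minimal sketch of generating such a file follows. The helper `make_submit_description` is hypothetical, and the command names are copied from the slide above, not checked against any Condor release.

```python
def make_submit_description(vmware_dir, jobs):
    """Build a VM universe submit description with one Queue per job.

    `jobs` is a list of (initialdir, cdrom_files) pairs. Command names are
    taken verbatim from the slide; this helper is only an illustration.
    """
    lines = [
        "universe = vm",
        "executable = WindowsXP",
        "vm_type = vmware",
        "vm_memory = 256",
        "vm_checkpoint = TRUE",
        "vm_networking = TRUE",
        "vm_networking_type = dhcp",
        f"vmware_dir = {vmware_dir}",
        "vmware_should_transfer_files = FALSE",
        "vmware_snapshot_disk = TRUE",
    ]
    for initialdir, cdrom_files in jobs:
        # Per-job settings precede each Queue statement.
        lines.append(f"initialdir = {initialdir}")
        lines.append("vmware_cdrom_files = " + ", ".join(cdrom_files))
        lines.append("Queue")
    return "\n".join(lines) + "\n"


text = make_submit_description(
    "/windows_vm",
    [("/result1", ["a.iso"]), ("/result2", ["a.txt", "b.txt"])],
)
print(text)
```

The common settings are written once, and only the per-job lines vary, which mirrors how the slide places the two Queue statements after a shared header.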
Input CDROM image
[Diagram: on the submit machine, Job 1 specifies vmware_cdrom_files = a.iso and Job 2 specifies vmware_cdrom_files = a.txt, b.txt. The files are provided to the VMs on execute machine 1 and execute machine 2, respectively.]
VMware VM universe
• Snapshot disk
• Input CDROM image
• Can be used on either a Linux host or a Windows host
Xen VM universe
• No snapshot disk support
• A VM disk file in a shared file system cannot be shared among multiple jobs unless it is read-only.
• Input CDROM image
• Can be used only on a Linux host
Checkpoint
• Periodic checkpoint and vacate checkpoint
• All modified VM disk files and a file holding the VM memory are transferred back to the submit machine.
• When snapshot disks are used, the snapshot disk files and the VM memory file are transferred.
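The two checkpoint cases above can be written as a small selection rule. This is a sketch under stated assumptions: `checkpoint_files` and every file name below are made up for illustration and do not reflect the names Condor or VMware actually use.

```python
def checkpoint_files(modified_disk_files, snapshot_files, memory_file,
                     use_snapshot_disk):
    """Return the files sent back to the submit machine at checkpoint time.

    With snapshot disks: the snapshot disk files plus the VM memory file.
    Without: all modified VM disk files plus the VM memory file.
    """
    disks = snapshot_files if use_snapshot_disk else modified_disk_files
    return sorted(disks) + [memory_file]


# Hypothetical file names, used only to exercise the rule.
modified = ["disk1.img", "disk2.img"]
snapshots = ["disk1-snapshot.img"]
memory = "vm-memory.state"

with_snapshot = checkpoint_files(modified, snapshots, memory, True)
without_snapshot = checkpoint_files(modified, snapshots, memory, False)
print(with_snapshot)
print(without_snapshot)
```

In both cases the memory file is always included; what changes is whether the disk state is represented by the full modified disks or by the snapshot files.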
Suspend
• Hard suspend: the memory used by the VM is saved to a file and then released.
• Soft suspend: the memory used by the VM is not released; the VM is simply paused, as with SIGSTOP.
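For readers unfamiliar with the SIGSTOP comparison, the sketch below pauses and resumes an ordinary child process on a POSIX system. It only illustrates the process-level analogy for soft suspend: the process stays in memory while stopped. It is not how Condor or the VM GAHP suspends a virtual machine, and `soft_suspend_demo` is a hypothetical name.

```python
import os
import signal
import subprocess
import sys


def soft_suspend_demo():
    """Pause and resume a child process with SIGSTOP/SIGCONT (POSIX only)."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    try:
        # Stop the child; it keeps its memory but gets no CPU time.
        os.kill(child.pid, signal.SIGSTOP)
        # WUNTRACED makes waitpid report a stopped (not exited) child.
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        was_stopped = os.WIFSTOPPED(status)

        # Resume the child; it continues where it left off.
        os.kill(child.pid, signal.SIGCONT)
        still_alive = child.poll() is None
    finally:
        # Clean up the helper process regardless of the outcome.
        child.kill()
        child.wait()
    return was_stopped, still_alive


result = soft_suspend_demo()
print(result)
```

A hard suspend has no such one-signal analogue, since it also writes the memory image to a file and frees the memory.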
Networking issues when restarting from checkpoint
• The MAC and IP addresses of a VM are preserved when it is checkpointed.
• When the checkpointed VM is restarted, its MAC and IP addresses do not change.
• If NAT is used for VM networking, the MAC and IP addresses of the NAT gateway may differ between execute machines.
• In VMware, if VMware Tools is installed inside the VM, it automatically performs a DHCP renew when the VM is restarted.
Future work
• Support snapshot disks in the Xen VM universe.
• For results, retrieve only the output files from the VM instead of all VM files.
• Support other virtual machine programs (e.g. QEMU).
Summary
• We are testing the VM universe.
• We hope the VM universe will be included in Condor 6.9.x.
Questions?
Case Study 1: Hierarchical Snapshot
Shared file system:
• /windows: -r--r--r-- root:root 10GB
• /windows_with_matlab: -rw-rw-- Todd:Todd 400M, snapshot disk whose parent disk is /windows
• /windows_with_matlab_and_excel: -rw-rw-- Todd:Todd 200M, snapshot disk whose parent disk is /windows_with_matlab
Submit description file for Case Study 1
• universe = vm
• executable = WindowsXP
• vm_type = vmware
• vm_memory = 256
• vm_checkpoint = TRUE
• vm_networking = TRUE
• vm_networking_type = dhcp
• vmware_dir = /windows_with_matlab_and_excel
• vmware_should_transfer_files = FALSE
• vmware_snapshot_disk = TRUE
• Queue
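The hierarchy in Case Study 1 is a chain of parent links ending at the read-only base disk. The sketch below walks that chain and adds up the sizes shown in the diagram. The comparison with three independent full images is a rough assumption (a full image is taken to be the sum of its layers, and 10GB is taken as 10 * 1024 MB), and the names `PARENT`, `SIZE_MB`, and `disk_chain` are hypothetical.

```python
# Parent links and sizes taken from the Case Study 1 diagram (sizes in MB).
PARENT = {
    "/windows": None,
    "/windows_with_matlab": "/windows",
    "/windows_with_matlab_and_excel": "/windows_with_matlab",
}
SIZE_MB = {
    "/windows": 10 * 1024,
    "/windows_with_matlab": 400,
    "/windows_with_matlab_and_excel": 200,
}


def disk_chain(disk):
    """Return the disks a VM reads from: newest snapshot first, base last."""
    chain = []
    while disk is not None:
        chain.append(disk)
        disk = PARENT[disk]
    return chain


chain = disk_chain("/windows_with_matlab_and_excel")
print(chain)

# Space used when the three layers share their parents.
shared_total = sum(SIZE_MB[d] for d in PARENT)

# Rough estimate if each variant were a separate full image (assumption:
# a full image is as large as the sum of the layers in its chain).
separate_total = sum(
    sum(SIZE_MB[d] for d in disk_chain(top)) for top in PARENT
)
print(shared_total, separate_total)
```

Under these assumptions the layered layout needs about a third of the space of three separate images, which is the point of keeping large base disks read-only and stacking small snapshots on top.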
Case Study 2: Vanilla Universe with platformVM
• universe = vanilla
• platformvm = /redhat_linux
• executable = /tmp/test.sh
• arguments = a.txt
• log = vanilla.log
• error = vanilla.err
• output = vanilla.out
• transfer_input_files = /tmp/a.txt
• Queue
Convert Vanilla Universe with platformVM into VM Universe
• universe = vm
• executable = vanillaUniv
• vm_type = vmware
• vm_memory = 128
• vm_checkpoint = TRUE
• vm_networking = TRUE
• vm_networking_type = dhcp
• vmware_dir = /redhat_linux
• vmware_should_transfer_files = FALSE
• vmware_snapshot_disk = TRUE
• vmware_cdrom_files = /tmp/test.sh, /tmp/a.txt, submitfile.txt
• Queue
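The conversion shown on the two slides above amounts to a fixed mapping: the platformvm path becomes vmware_dir, and the executable, the input files, and the submit file itself are placed on the input CDROM. A minimal sketch of that mapping follows. `vanilla_to_vm` is a hypothetical helper, the fixed values are copied from the slide, and nothing here is an actual Condor feature.

```python
def vanilla_to_vm(vanilla, submit_file_name="submitfile.txt", vm_memory=128):
    """Map a vanilla description with platformvm onto VM universe commands.

    `vanilla` is a dict of submit commands. The mapping mirrors the slides;
    the function itself is only an illustration.
    """
    inputs = vanilla.get("transfer_input_files", "")
    cdrom = [vanilla["executable"]]
    cdrom += [f.strip() for f in inputs.split(",") if f.strip()]
    cdrom.append(submit_file_name)  # the slide also lists the submit file
    return {
        "universe": "vm",
        "executable": "vanillaUniv",
        "vm_type": "vmware",
        "vm_memory": str(vm_memory),
        "vm_checkpoint": "TRUE",
        "vm_networking": "TRUE",
        "vm_networking_type": "dhcp",
        "vmware_dir": vanilla["platformvm"],
        "vmware_should_transfer_files": "FALSE",
        "vmware_snapshot_disk": "TRUE",
        "vmware_cdrom_files": ", ".join(cdrom),
    }


vanilla_job = {
    "universe": "vanilla",
    "platformvm": "/redhat_linux",
    "executable": "/tmp/test.sh",
    "transfer_input_files": "/tmp/a.txt",
}
vm_job = vanilla_to_vm(vanilla_job)
for key, value in vm_job.items():
    print(f"{key} = {value}")
print("Queue")
```

Passing the original submit file on the CDROM presumably lets software inside the pre-created platform VM learn what to run, though the slides do not spell out that mechanism.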
Pre-created Platform VMs
Shared file system (each VM has Condor installed):
• /windows: -r--r--r-- root:root 10GB
• /freebsd: -r--r--r-- root:root 4GB
• /redhat_linux: -r--r--r-- root:root 8GB