640 likes | 743 Views
VMTorrent : Scalable P2P Virtual Machine Streaming. Joshua Reich , Oren Laadan , Eli Brosh , Alex Sherman, Vishal Misra , Jason Nieh , and Dan Rubenstein. VM Basics. VM: software implementation of computer Implementation stored in VM image VM runs on VMM Virtualizes HW
E N D
VMTorrent: Scalable P2P Virtual Machine Streaming Joshua Reich, Oren Laadan, Eli Brosh, Alex Sherman, Vishal Misra, Jason Nieh, and Dan Rubenstein
VM Basics • VM: software implementation of computer • Implementation stored in VM image • VM runs on VMM • Virtualizes HW • Accesses image VM VM Image VMM
Where is Image Stored? VM VM Image VMM
Traditionally: Local Storage Local Storage VM VMM
IaaS Cloud: on Network Storage Network Storage VM VMM VM Image
Can Be Primary Network Storage VM VMM NFS/iSCSI VM Image • e.g., OpenStack Glance • Amazon EC2/S3 • vSphere network storage
Or Secondary Network Storage Local Storage VM VMM VM Image • e.g., Amazon EC2/EBS • vSphere local storage
Either Way, No Problem Here Network Storage VM VMM VM Image
Here? Network Storage VM Image Bottleneck!
Lots of Unique VM Images on EC2 alone 54784 unique images* Network Storage *http://thecloudmarket.com/stats#/totals , 06 Dec 2012
Unpredictable Demand • Lots of customers • Spot-pricing • Cloud-bursting Network Storage
Don’t Just Take My Word • “The challenge for IT teams will be finding way to deal with the bandwidth strain during peak demand - for instance when hundreds or thousands of users log on to a virtual desktop at the start of the day - while staying within an acceptable budget” 1 • “scale limits are due to simultaneous loading rather than total number of nodes” 2 • Developer proposals to replace or supplement VM launch architecture for greater scalability3 http://www.zdnet.com/why-so-many-businesses-arent-ready-for-virtual-desktops-7000008229/?s_cid=e539 http://www.openstack.org/blog/2011/12/openstack-deployments-abound-at-austin-meetup-129 https://blueprints.launchpad.net/nova/+spec/xenserver-bittorrent-images
Challenge: VM Launch in IaaS • Minimize delay in VM execution • Starting from time launch request arrives • For lots of instances (scale!)
Naive Scaling Approaches • Multicast • Setup, configuration, maintenance, etc.1 • ACK implosion • “multicast traffic saturated the CPU on [Etsy] core switches causing all of Etsy to be unreachable“ 2 • [El-Sayed et al., 2003; Hosseini et al., 2007] • http://codeascraft.etsy.com/2012/01/23/solr-bittorrent-index-replication
Naive Scaling Approaches • P2P bulk data download (e.g., Bit-Torrent) • Files are big (waste bandwidth) • Must wait until whole file available (waste time) • Network primary? Must store GB image in RAM!
Both Miss Big Opportunity VM image access • Sparse • Gradual • Most of image doesn’t need to be transferred • Can start w/ just a couple of blocks VM image streaming
VMTorrent Contributions • Architecture • Make (scalable) streaming possible: Decouple data delivery from presentation • Make scalable streaming effective: Profile-based image streaming techniques • Understanding / Validation • Modeling for VM image streaming • Prototype & evaluation not highly optimized
Talk • Make (scalable) streaming possible: Decouple data delivery from presentation • Make scalable streaming effective: Profile-based image streaming techniques • VMTorrentPrototype & Evaluation (Modeling along the way)
Decoupling Data Delivery from Presentation(Making Streaming Possible)
Generic Virtualization Architecture • Virtual Machine Monitor virtualizes hardware • Conducts I/O to image through file system VM VM Image Host VMM FS Hardware
Cloud Virtualization Architecture Network backend used • Either to download image • Or to access via remote FS Network Backend VM VM Image VMM FS Hardware
VMTorrent Virtualization Architecture • Introduce custom file system • Divide image into pieces • But provide appearance • of complete image to VMM Custom FS Network Backend VM VM Image VMM FS Hardware
Decoupling Delivery from Presentation VMM attempts to read piece 1 Piece 1 is present, read completes Custom FS Network Backend VM 0 1 2 3 4 5 VMM 6 7 8 Hardware
Decoupling Delivery from Presentation VMM attempts to read piece 0 Piece 0 isn’t local, read stalls VMM waits for I/O to complete VM stalls Custom FS Network Backend VM 0 1 2 3 4 5 VMM 6 7 8 Hardware
Decoupling Delivery from Presentation FS requests piece from backend Backend requests from network Custom FS Network Backend VM 0 1 2 3 4 5 VMM 6 7 8 Hardware
Decoupling Delivery from Presentation Later, network delivers piece 0 Custom FS receives, updates piece Read completes VMM resumes VM’s execution Custom FS Network Backend VM 0 0 1 2 3 4 5 VMM 6 7 8 Hardware
Decoupling Improves Performance Primary Storage No waiting for image download to complete Custom FS Network Backend VM 1 2 0 3 4 5 VMM 6 7 8 Hardware
Decoupling Improves Performance Secondary Storage No more writes or re-reads over network w/ remote FS X Custom FS Network Backend VM 1 2 0 3 4 5 X VMM 6 7 8 Hardware
But Doesn’t Scale Assuming a single server, the time to download a single piece is t = W + S / (rnet / n) • W:wait time for first bit • rnet: network speed • S: piece size • n : # of clients Transfer time, each client gets rnet/ n of server BW
Read Time Grows Linearly w/ n Assuming a single server, the time to download a single piece is t = W + n * S / rnet • W:wait time for first bit • rnet: network speed • S: piece size • n : # of clients Transfer time linear w/ n
This Scenario csd Custom FS Network Backend VM 1 2 0 3 4 5 VMM 6 7 8 Hardware
Decoupling Enables P2P Backend Alleviate network storage bottleneck Swarm • Exchange pieces w/ swarm P2P copy must remain pristine Custom FS Network Backend P2P Manager VM 1 2 0 1 2 0 3 4 5 3 4 5 VMM 6 7 8 6 7 8 Hardware
Space Efficient FS uses pointers to P2P image Swarm FS does copy-on-write Custom FS P2P Manager VM 1 2 0 1 2 0 3 4 5 3 4 5 VMM 6 7 8 6 7 6 7 8 Hardware
Minimizing Stall Time Non-local piece accesses Swarm Trigger high priority requests 4! Custom FS P2P Manager VM 1 2 0 1 2 0 4? 3 4 5 3 4 5 4? VMM 6 7 8 6 7 6 7 8 Hardware
P2P Helps Now, the time to download a single piece is t = W(d)+ S / rnet • W(d) :wait time for first bit as function of • d : piece diversity • rnet: network speed • S: piece size • n : # of peers Transfer time independent of n Wait is function of diversity
Low Diversity Little Benefit Nothing to share
P2P Helps, But Not Enough All peers request same pieces at same time t = W(d)+ S / rnet • Low piece diversity • Long wait (gets worse as ngrows) • Long download times
This Scenario p2pd Swarm Custom FS P2P Manager VM 1 2 0 1 2 0 3 4 5 3 4 5 VMM 6 7 8 6 7 6 7 8 Hardware
Profile-based Image Streaming Techniques(Making Streaming Effective)
How to Increase Diversity? Need to fetch pieces that are • Rare: not yet demanded by many peers • Useful: likely to be used by some peer
Profiling • Need useful pieces • But only small % of VM image accessed • We need to know which pieces accessed • Also, when (need later for piece selection)
Build Profile • One profile for each VM/workload • Ran one or more times (even online) • Use FS to track • Which pieces accessed • When pieces accessed • Entries w/ average appearance time, piece index, and frequency
Piece Selection • Want pieces not yet demanded by many • Don’t know piece distribution in swarm • Guess others like self • Gives estimate when pieces likely needed
Piece Selection Heuristic • Randomly (rarest first) pick one of first k pieces in predicted playback window • fetch w/ medium priority (demand wins)
Profile-based Prefetching • Increases diversity • Helps even w/ no peers (when ideal access exceeds network rate)
Obtain Full P2P Benefit Profile-based window-randomized prefetch t = W(d)+ S / rnet • High piece diversity • Short wait (shouldn’t grow much w/ n) • Quick piece download
Full VMTorrent Architecture p2pp Swarm Custom FS P2P Manager VM 1 2 0 1 2 0 3 4 5 3 4 5 VMM 6 7 8 6 7 6 7 8 profile Hardware
VMTorrent Prototype BT Swarm Custom C++ & Libtorrent Custom C Using FUSE Custom FS P2P Manager VM 1 2 0 1 2 0 3 4 5 3 4 5 6 7 8 6 7 6 7 8 profile Hardware