Towards High-Availability for IP Telephony using Virtual Machines Devdutt Patnaik, Ashish Bijlani and Vishal K Singh
Outline • Virtualization • High Availability (HA) in Virtualized Platforms • Xen and Remus (HA solution for Xen) • Remus applied to IP Telephony (IPT) applications • Scalability and Reliability of IPT applications using Virtualization • Experimental Results • Conclusion
Virtualization and its Benefits • Abstraction layer (hypervisor) between the physical hardware and the OS • A single physical machine can host multiple virtual machines, each running a different OS and application stack • VMMs • Xen, VMware, Microsoft Hyper-V • Benefits • Server consolidation • Green computing • Cost savings (space and power) • High-availability and reliability solutions, and easier upgrades with near-zero downtime
Virtualized hosting for IP Telephony • Virtualized hosting for IP Telephony is already available • Avaya, Cisco, Asterisk, etc. • IP Telephony in the Cloud • Scalability: the ability to elastically add and remove servers while supporting High Availability for all servers • Reliability: protection against hardware and software failures • HA features in virtualization platforms • Memory-state checkpointing
Virtualization and High Availability • Seamless failover: efficient and transparent migration of a VM to another physical machine • Live migration with very small downtimes • Minimal or no impact on client nodes • Asynchronous checkpointing • Continuously syncs state between the primary and secondary hosts • We use • Remus: A High Availability Solution for Xen
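Asynchronous checkpointing can be pictured as a loop of short epochs: the primary runs, and at the end of each epoch only the memory pages it dirtied are copied to the backup. The sketch below is a toy model, assuming guest memory is a Python dict of page number to bytes; `ToyVM` and `checkpoint` are hypothetical names, not Xen or Remus code. In Remus the primary is paused only long enough to capture the dirty state and then resumes while that state is transmitted; this toy collapses the two steps into one synchronous call.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVM:
    """Toy model of guest memory (hypothetical): page number -> contents."""
    pages: dict[int, bytes] = field(default_factory=dict)
    dirty: set[int] = field(default_factory=set)

    def write(self, page: int, data: bytes) -> None:
        # Each write marks the page dirty, mimicking hypervisor dirty-page logging.
        self.pages[page] = data
        self.dirty.add(page)


def checkpoint(primary: ToyVM, replica: dict[int, bytes]) -> int:
    """End one epoch: ship only the pages dirtied since the last checkpoint.

    Returns the number of pages sent. Remus overlaps the transfer with
    continued execution; this toy does it synchronously.
    """
    delta = {p: primary.pages[p] for p in primary.dirty}
    primary.dirty.clear()   # the next epoch starts with no dirty pages
    replica.update(delta)   # the paused backup applies the delta
    return len(delta)


vm = ToyVM()
backup: dict[int, bytes] = {}
vm.write(0, b"registrar state")
vm.write(1, b"call 1")
print(checkpoint(vm, backup))  # 2: both pages are new
vm.write(1, b"call 1 updated")
print(checkpoint(vm, backup))  # 1: only the re-dirtied page is sent
```

Sending only the delta is what keeps per-epoch cost proportional to how much state the guest changes, which is why write-heavy workloads raise checkpointing overhead.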
Remus on Xen • Remus is a High Availability solution available on the Xen VMM • Remus uses continuous checkpointing and keeps a consistent client view of network state • The secondary machine hosts a paused replica of the primary VM • Uses a heartbeat mechanism • Failure to receive a periodic heartbeat on the secondary un-pauses the backup VM • The heartbeat timeout is configurable Fig. 1 Image: http://osnet.cs.nchu.edu.tw/powpoint/seminar/2008/Remus.pdf
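The heartbeat rule on the secondary (resume the backup once heartbeats stop for longer than the configured timeout) can be sketched as a small failure detector. This is a minimal illustration, not Remus's implementation: `HeartbeatMonitor`, the `resume_backup` callback and the 0.5 s timeout are all hypothetical, and a fake clock is injected so the example is deterministic.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Runs on the secondary: resumes the paused backup if heartbeats stop.

    `resume_backup` is a hypothetical callback standing in for un-pausing
    the replica; it is not a Xen or Remus API.
    """

    def __init__(self, timeout_s: float, resume_backup: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s          # the configurable heartbeat timeout
        self.resume_backup = resume_backup
        self.clock = clock
        self.last_beat = clock()
        self.failed_over = False

    def on_heartbeat(self) -> None:
        # Any message from the primary counts as proof of life.
        self.last_beat = self.clock()

    def poll(self) -> bool:
        """Trigger failover at most once; return True once failed over."""
        silent_for = self.clock() - self.last_beat
        if not self.failed_over and silent_for > self.timeout_s:
            self.failed_over = True
            self.resume_backup()
        return self.failed_over


# Usage with a fake clock so the example is deterministic.
now = [0.0]
events: list[str] = []
mon = HeartbeatMonitor(0.5, lambda: events.append("resume"), clock=lambda: now[0])
now[0] = 0.3
mon.on_heartbeat()
now[0] = 0.7
print(mon.poll())  # False: 0.4 s of silence is within the timeout
now[0] = 0.9
print(mon.poll())  # True: 0.6 s of silence exceeds the timeout
```

The timeout is the tuning knob: a short value detects failures quickly but risks resuming the backup during a transient network stall.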
Remus on Xen (contd.) • Remus modes of operation • Net Mode: highly reliable • No-Net Mode: better performance, with negligible packet loss in case of failure • Tunable for reliability vs. performance Fig. 2 Disk writes and network writes • Net Mode: buffers outgoing network packets until the execution state is synced with the backup VM (on the secondary host) • Reliability at the cost of performance Image: http://osnet.cs.nchu.edu.tw/powpoint/seminar/2008/Remus.pdf
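The Net Mode trade-off comes from output commit: a packet produced during an epoch may leave the host only after the backup has acknowledged that epoch's checkpoint, so every packet can wait up to one checkpoint interval. The toy model below assumes hypothetical names (`OutputBuffer`, `end_epoch`, `on_ack`) and is not Remus code. Setting `net_mode=False` sends immediately, which loosely mirrors No-Net Mode.

```python
from collections import deque
from typing import Callable


class OutputBuffer:
    """Toy model of Net-mode output commit (hypothetical names, not Remus code).

    In Net mode, packets the guest sends during epoch k are held until the
    backup acknowledges checkpoint k. With net_mode=False (like No-Net mode)
    packets leave immediately: lower latency, weaker guarantees on failover.
    """

    def __init__(self, send: Callable[[bytes], None], net_mode: bool = True) -> None:
        self.send = send            # actually puts a packet on the wire
        self.net_mode = net_mode
        self.epoch = 0              # epoch whose state is not yet checkpointed
        self.pending: deque[tuple[int, bytes]] = deque()

    def transmit(self, packet: bytes) -> None:
        if self.net_mode:
            self.pending.append((self.epoch, packet))  # hold until committed
        else:
            self.send(packet)

    def end_epoch(self) -> int:
        """Start a checkpoint of the current epoch; later output joins the next one."""
        self.epoch += 1
        return self.epoch - 1

    def on_ack(self, epoch: int) -> None:
        # The backup now holds the state for `epoch`: release output up to it.
        while self.pending and self.pending[0][0] <= epoch:
            self.send(self.pending.popleft()[1])


wire: list[bytes] = []
buf = OutputBuffer(wire.append)
buf.transmit(b"SIP/2.0 200 OK")
e = buf.end_epoch()
buf.transmit(b"RTP packet 1")
print(wire)   # []: nothing leaves before the checkpoint is acknowledged
buf.on_ack(e)
print(wire)   # [b'SIP/2.0 200 OK']: the RTP packet waits for the next ack
```

For a SIP transaction this added delay is usually tolerable, but for a steady RTP stream it turns every packet into a delayed burst, which is consistent with media performing poorly in Net Mode.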
Remus applied to IP Telephony: Scale with Reliability • Our work using HA in Xen extends the “architecture for fail-over and load sharing for IP Telephony” proposed by Kundan Singh et al. • Challenges: • Overhead of virtualization on IP Telephony performance • A co-hosted/co-located media server causes interference because of its heavy I/O workload
Reliability and Scalability using Virtual Machines • Scalability using a load balancer (LB) • The LB can elastically add more VMs as demand grows • Reliability using Remus in Xen (Figures: Stateless load balancer; Reliability architecture using virtual machines) • For every primary VM there is a backup VM in the paused state • Because the backup VM is paused, other running VMs can be placed on the same physical machine • Provides an N-to-M elastic backup model (M backups for N primaries)
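The N-to-M model can be illustrated with a simple placement rule: spread the paused replicas round-robin over the M backup-capable hosts, but never put a replica on the same host as its primary. This is a hypothetical sketch, not the placement scheme used in this work; the function name, VM names and host names are all made up for illustration.

```python
from itertools import cycle


def place_backups(primaries: dict[str, str], backup_hosts: list[str]) -> dict[str, str]:
    """Map each primary VM to a host for its paused replica (hypothetical).

    primaries: VM name -> physical host running the primary.
    backup_hosts: the M hosts that may hold paused replicas.
    Replicas are spread round-robin; a replica never shares its primary's
    host, otherwise one hardware failure would take out both copies.
    """
    if not backup_hosts:
        raise ValueError("need at least one backup host")
    placement: dict[str, str] = {}
    ring = cycle(backup_hosts)
    for vm, primary_host in sorted(primaries.items()):
        for _ in range(len(backup_hosts)):
            host = next(ring)
            if host != primary_host:
                placement[vm] = host
                break
        else:
            raise ValueError(f"no backup host distinct from {primary_host} for {vm}")
    return placement


primaries = {"sip1": "hostA", "sip2": "hostA", "sip3": "hostB", "rtp1": "hostB"}
print(place_backups(primaries, ["hostB", "hostC"]))
# {'rtp1': 'hostC', 'sip1': 'hostB', 'sip2': 'hostC', 'sip3': 'hostC'}
```

In this example hostB both runs active primaries and holds a paused replica of `sip1`; that sharing is only reasonable because a paused replica consumes memory but almost no CPU until failover.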
Reliability and Scalability using Virtual Machines (contd.) • Reliability • Provided by Xen + Remus • Failure of the primary starts execution of the secondary with IP address takeover • Clients continue to execute unaffected • Signaling and media server: • Co-located on the same VM • allows better utilization • no overhead of inter-VM communication • Placed on different VMs • allows elastic scaling of media and signaling VMs
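The slides do not say how the resumed backup announces its IP address to the network. A common mechanism for IP address takeover is a gratuitous ARP, an ARP request whose sender and target IP are both the address being claimed, so switches and neighbors update their tables. The sketch below only builds such a frame under that assumption; the MAC and IP values are hypothetical, and actually sending it needs a raw packet socket (for example `AF_PACKET` on Linux) with root privileges, which is left out.

```python
import socket
import struct


def gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build a gratuitous ARP request frame announcing ip -> mac.

    Sender and target protocol addresses are both `ip`, so switches and
    neighbours learn that this address now lives behind `mac`'s port.
    """
    mac_b = bytes.fromhex(mac.replace(":", ""))
    ip_b = socket.inet_aton(ip)
    broadcast = b"\xff" * 6
    eth = struct.pack("!6s6sH", broadcast, mac_b, 0x0806)  # EtherType: ARP
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                  # hardware type: Ethernet
        0x0800,             # protocol type: IPv4
        6, 4,               # hardware / protocol address lengths
        1,                  # opcode: request
        mac_b, ip_b,        # sender MAC / sender IP
        b"\x00" * 6, ip_b,  # target MAC (unused) / target IP == sender IP
    )
    return eth + arp


frame = gratuitous_arp("00:16:3e:00:00:01", "10.0.0.5")
print(len(frame))  # 42: 14-byte Ethernet header + 28-byte ARP payload
```

Because the backup VM resumes with the primary's addresses, clients keep talking to the same IP; only the network's view of where that IP lives has to change.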
Studying Performance Implications • Experimental setup • Primary/backup servers • Intel Core 2 Quad processors, 2.5 GHz, 8 GB RAM, 4 MB L2 cache • Hypervisor: Xen 3.2.1 + Remus • Default Credit Scheduler configuration • Guest OS: paravirtualized Linux 2.6.18 • IP Telephony workload • Modeled our workload using SIPStone • Measured the percentage of successful registrations during failover • Used UDP and TCP as transports for registrations • Used OpenSIPS as the SIP server • RTPProxy as the media server • SIPp for generating signaling and media traffic
Analysis and Results: Signaling • The guest VM and Domain 0 both have high CPU utilization with tcp_n (a new TCP connection for each REGISTER) • UDP and tcp_1 (one TCP connection for all REGISTERs) have similar overhead (Figures: CPU utilization in the guest VM and Dom0; registration overhead with Remus Net Mode. Legend: udp = UDP transport; tcp_1 = the same TCP connection for all requests; tcp_n = a new TCP connection for each request)
Analysis and Results: Signaling (contd.) • CPU overhead increases proportionately with signaling load • Dom0 incurs significant overhead due to checkpointing • Net Mode gives good results for signaling • A failure was induced at 1400 registrations/sec • 100% of registrations completed via failover to the backup
Analysis and Results: Media • Media loads give poor results with Net Mode • Media with No-Net Mode performs well even with 400 streams, with only 2% loss • This loss could be reduced further by tuning scheduler parameters • 100% failover of all calls in progress during the media experiments (Figures: Net Mode and No-Net Mode with 100, 200, 400, 600 and 800 streams)
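The slides report media loss as a percentage without saying how it was computed. One common way to estimate loss for a single RTP stream, following the idea in RFC 3550 (expected = highest extended sequence number minus first sequence number plus one, lost = expected minus received), is sketched below. The function name and sample sequences are hypothetical, and duplicates and reordering are deliberately not handled.

```python
def rtp_loss_percent(seqs: list[int]) -> float:
    """Estimate packet loss for one RTP stream from received sequence numbers.

    Follows the RFC 3550 idea: expected = highest extended seq - first seq + 1,
    lost = expected - received. 16-bit wraparound is handled by adding a cycle
    count to each sequence number. Duplicates and reordering are ignored for
    simplicity (a real receiver tracks them).
    """
    if not seqs:
        return 0.0
    cycles = 0
    prev = seqs[0]
    base = ext_max = seqs[0]
    for s in seqs[1:]:
        if s < prev and prev - s > 0x8000:  # sequence number wrapped past 65535
            cycles += 1 << 16
        prev = s
        ext_max = max(ext_max, cycles + s)
    expected = ext_max - base + 1
    lost = expected - len(seqs)
    return 100.0 * lost / expected


received = [s for s in range(200) if s not in (5, 50, 150)]
print(rtp_loss_percent(received))                  # 1.5
print(rtp_loss_percent([65534, 65535, 0, 2]))      # 20.0: seq 1 missing after wrap
```

Measuring per stream and then averaging across streams would show whether loss is spread evenly or concentrated in a few streams as the stream count grows.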
Conclusion • Using No-Net Mode for media streams balances performance (loss and delay) and reliability (failover) while still migrating 100% of calls in progress (using TCP), which is a significant result • Net Mode for signaling is a good configuration, with 100% registration completion on failover • No-Net Mode for the media server deployment significantly improves performance by reducing loss and delay • While the No-Net configuration performs better for media, it may not guarantee call completion during failover for signaling • Migration of user registration and call setup operations was 100% successful
Contributions • Extended a load-sharing and failover architecture using virtualization • Proposed using the high-availability features of virtualized platforms to achieve reliability in IP Telephony • Proposed a placement scheme for signaling and media applications for scale (elasticity) and efficiency (utilization) • Systematically evaluated the overheads of using virtualization for IP Telephony applications • Demonstrated that High Availability using virtual machines can be deployed for medium-scale IP Telephony infrastructure
Future Work • More detailed analysis of overheads • Overhead due to checkpointing in the virtualization platform • Overhead due to I/O in Domain 0 • Propose solutions to improve performance • Improve I/O handling in the Xen VMM • Propose a better VM placement algorithm for IP Telephony applications • Use fine-grained overhead measurements for resource allocation • Consider I/O (media) vs. memory (signaling state replication) optimizations • Elasticity with co-location of media and signaling servers on the same VM
Questions • vs2140@columbia.edu