QoS Measurement and Management for VoIP Wenyu Jiang IRT Lab March 5, 2003
Introduction to VoIP & IP Telephony • Transport of voice packets over IP networks • Cost savings • Consolidates voice and data networks • Avoids leased lines, long-distance toll calls • Smart and new services • Call management (filtering, TOD forwarding): CPL • Better than PSTN quality: wide-band codecs • Protocols and Standards • Signaling: SIP (IETF), H.323 (ITU-T) • Transport: RTP/RTCP (IETF)
Practical Issues in VoIP • Quality of Service (QoS) • Internet is a best-effort network • Loss, delay and jitter • Users expect at least PSTN quality for VoIP! • Ease of deployment • Requires seamless integration with legacy networks (PSTN/PBX) • Security is a must • High yardstick of service availability • Can your network achieve 99.999% uptime?
Outline • QoS measurement • Objective vs. subjective metrics • Automated measurement of subjective quality • QoS management: improving your quality • End-to-End: FEC, LBR, PLC • Network provisioning: voice traffic aggregation • Reality check • Performance of end-points (IP phones, …) • Deployment issues in VoIP • Evaluation of VoIP service availability through Internet measurement
Workings of a VoIP Client • Audio is packetized, encoded and transmitted • Forward error correction (FEC) may be used to recover lost packets • Playout control smoothes out jitter to minimize late losses; coupled with FEC • Packet loss concealment (PLC) • Last line of “defense” after FEC and playout
LBR: An Alternative to FEC • An (n,k) block FEC code can recover up to n-k losses per block of n packets • Low Bit-rate Redundancy (LBR) • Transmit a lower bit-rate version of original audio • No notion of “blocks” • Not bit-exact recovery
Objective QoS Metrics: Loss • Internet packet loss is often bursty • May degrade voice quality more than random (Bernoulli) loss • Characterization of packet loss • 2-state Markov (Gilbert) model: conditional loss prob. • More detailed models, but more states! • Extended Gilbert model, nth order Markov model • Hidden Markov model, Gilbert-Elliott model, inter-loss distance • More states → larger test set, loss of big picture • Adaptive applications can trade off model accuracy for fast feedback • Gilbert model provides an acceptable compromise
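The 2-state Gilbert model above can be fitted to a loss trace by counting transitions. The following is a minimal sketch, not the author's tooling: the function name `fit_gilbert`, the 0/1 trace encoding, and the returned keys are assumptions made for illustration.

```python
def fit_gilbert(trace):
    """Fit a 2-state Gilbert model to a loss trace.

    trace: sequence of 0 (packet received) and 1 (packet lost).
    Encoding and names are illustrative assumptions.
    """
    n0 = n1 = n01 = n11 = 0
    for prev, cur in zip(trace, trace[1:]):
        if prev == 0:
            n0 += 1
            n01 += cur          # received -> lost transition
        else:
            n1 += 1
            n11 += cur          # lost -> lost transition
    p = n01 / n0 if n0 else 0.0       # P[loss | previous packet received]
    p_c = n11 / n1 if n1 else 0.0     # conditional loss prob. P[loss | previous lost]
    denom = p + (1.0 - p_c)
    p_u = p / denom if denom > 0 else 0.0   # stationary (unconditional) loss prob.
    mean_burst = 1.0 / (1.0 - p_c) if p_c < 1.0 else float("inf")
    return {"p": p, "p_c": p_c, "p_u": p_u, "mean_burst": mean_burst}


if __name__ == "__main__":
    print(fit_gilbert([0, 0, 1, 1, 0, 0, 0, 1, 0, 0]))
```

When `p_c` is noticeably larger than `p_u`, losses are burstier than a Bernoulli process with the same average loss rate.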
Effect of Gilbert Loss Model • Loss burst distribution of a packet trace • Roughly, though not exactly, exponential • Loss burstiness on FEC performance • FEC less efficient under bursty loss • [Figure: number of occurrences (log scale) vs. loss burst length, packet trace compared with Gilbert model]
Objective QoS Metrics: Delay • Complementary Conditional CDF (C3DF) • More descriptive than auto-correlation function (ACF) • Delay correlation rises rapidly beyond a threshold • Approximates conditional late loss probability
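As a rough illustration of the C3DF idea, the sketch below estimates the probability that a packet's delay exceeds a threshold given that an earlier packet's delay did. The exact definition used in the talk is not given on the slide, so the form P[d(i+lag) > t | d(i) > t], the function name `c3df`, and the `lag` parameter are assumptions.

```python
def c3df(delays, threshold, lag=1):
    """Estimate P[d[i+lag] > threshold | d[i] > threshold] from a delay trace.

    delays: per-packet one-way delays (any consistent unit).
    Returns None when no packet exceeds the threshold.
    This conditional form is an assumed reading of the slide.
    """
    if lag < 1:
        raise ValueError("lag must be >= 1")
    conditioned = [b for a, b in zip(delays, delays[lag:]) if a > threshold]
    if not conditioned:
        return None
    return sum(1 for b in conditioned if b > threshold) / len(conditioned)


if __name__ == "__main__":
    trace_ms = [10, 50, 60, 10, 70, 80, 90, 10]
    print(c3df(trace_ms, threshold=40))
```

Reading the threshold as a playout deadline, the result approximates the conditional late-loss probability mentioned in the last bullet.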
Subjective QoS Metrics • Perceived quality • Mean Opinion Score (MOS) • ITU-T P.800/830 • Obtained via listening tests • MOS variations • DMOS (Degradation) • CMOS (Comparison) • MOSc (Conversational): considers delay • A/B preference • Pros: more meaningful to end users • Cons: time consuming, labor intensive
Effect of Loss Model on Perceived Quality • Codec: G.729 (8kb/s ITU std) • Random (Bernoulli) vs. bursty (Gilbert) loss • Bursty → lower MOS • True even when FEC or LBR is used
Going Further: Bridging Objective and Subjective Metrics • The E-model (ITU-T G.107/108) • Originally for telephone network planning • Considers various impairments • Reduces to delay and loss impairment when adapted for VoIP • Objective quality estimation algorithms • Suitable when network stats are not available, e.g., phone-to-phone service with IP in between • Speech recognition performance may be used as a quality predictor, by comparing with original text
The E-model • Map from loss and delay to impairment scores (Ie, Id) • Compute a gross score (R value) and map to MOSc • Limited number of codec loss impairment mappings
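The last step of the E-model, mapping an R value to a MOS estimate, can be sketched as follows. The cubic R-to-MOS mapping is the one published in ITU-T G.107; the simplified form R = R0 - Id - Ie, the default `r0=93.2`, and the function names are assumptions for illustration, and the impairment values Id and Ie must come from codec- and delay-specific mappings that are not reproduced here.

```python
def r_to_mos(r):
    """Map an E-model R value to an estimated MOS (ITU-T G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + 7e-6 * r * (r - 60.0) * (100.0 - r)


def e_model_mos(i_d, i_e, r0=93.2):
    """Simplified VoIP form: R = R0 - Id - Ie (assumed reduction).

    i_d: delay impairment, i_e: loss/codec impairment.
    r0 = 93.2 is assumed here as the rating with no impairments.
    """
    return r_to_mos(r0 - i_d - i_e)


if __name__ == "__main__":
    print(round(e_model_mos(0.0, 0.0), 2))
    print(round(e_model_mos(10.0, 20.0), 2))
```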
Using Speech Recognition to Predict MOS • Evaluation of automatic speech recognition (ASR) based MOS prediction • IBM ViaVoice Linux version • Codec used: G.729 • Performance metric • absolute word recognition ratio • relative word recognition ratio
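Recognition Ratio vs. MOS • Both MOS and the absolute word recognition ratio (R_abs) decrease as the loss probability p increases • Eliminating the intermediate variable p gives a direct mapping from recognition ratio to MOS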
Speaker Dependency • Absolute performance is speaker-dependent • But relative word recognition ratio is not • Suitable for MOS prediction
Summary of QoS Measurement • Loss burstiness: • Affects (generally worsens) perceived quality as well as FEC performance • May be described with, e.g., a Gilbert model • Delay correlation: • Increases rapidly beyond a threshold, revealed through Complementary Conditional CDF (C3DF) • Late losses are also bursty • Perceived quality (MOS) estimation • Analytical: the E-model • If network statistics N/A: relative word recognition ratio can provide speaker-independent MOS prediction
Outline • QoS measurement • Objective vs. subjective metrics • Automated measurement of subjective quality • QoS management: improving your quality • End-to-End: FEC, LBR, PLC • Network provisioning: voice traffic aggregation • Reality check • Performance of VoIP end-points (IP phones, …) • Deployment issues in VoIP • Evaluation of VoIP service availability through Internet measurement
Quality of FEC vs. LBR • FEC is substantially and consistently better • At comparable bandwidth overhead • Across all codec configurations tested • [Figure panels: AMR LBR; G.729+G.723.1 LBR]
Quality of FEC under Bursty Loss • Packet interval T has a stronger effect on MOS with FEC than without FEC
FEC MOS Optimization Considering Delay Effect • Larger T → higher FEC efficiency, but higher delay • Optimizing T with the E-model • Calculate final loss probability after FEC, apply delay impairment of FEC, map to MOSc • Prediction close to FEC MOS test results • Suitable for analytical perceived quality prediction
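The "final loss probability after FEC" step can be illustrated for the simplest case. This sketch assumes independent (Bernoulli) packet loss and an ideal (n,k) block code that recovers a block whenever at most n-k of its n packets are lost. The slides stress that real loss is bursty, so this is an optimistic baseline rather than the model used in the talk, and `residual_loss` is a hypothetical name.

```python
from math import comb


def residual_loss(p, n, k):
    """Residual loss probability of a data packet after (n,k) block FEC.

    Assumes independent loss with probability p per packet.
    A packet stays lost if it is lost itself AND at least n-k of the
    other n-1 packets in its block are lost (more than n-k losses in total).
    """
    r = n - k
    tail = sum(
        comb(n - 1, j) * p**j * (1.0 - p) ** (n - 1 - j)
        for j in range(r, n)
    )
    return p * tail


if __name__ == "__main__":
    for n, k in [(2, 2), (3, 2), (4, 2)]:
        print((n, k), residual_loss(0.1, n, k))
```

The residual loss would then be fed into a loss impairment mapping, and the extra buffering delay of the FEC block into the delay impairment, before mapping to MOSc.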
Trade-off Analysis between Codec Robustness and FEC • 3 loss repair options • FEC, LBR, PLC • Loss-resilient codec • Better PLC • iLBC (IETF) • But higher bit rate • Better than FEC?
Observations and Results • When considering delay: • iLBC is usually preferred in low loss conditions • G.729 or G.723.1 + FEC better for high loss • Example: max bandwidth 14 kb/s • Consider delay impairment (use MOSc)
Effect of Max Bandwidth on Achievable Quality • 14 to 21 kb/s: significant improvement in MOSc • From 21 to 28 kb/s: marginal change due to increasing delay impairment by FEC
Provisioning a VoIP Network • Silence detection/suppression • Transmit only during On period, saves bandwidth • Allows traffic aggregation through statistical multiplexing • Characteristics of On/Off patterns in VoIP • Traditionally found to be exponentially distributed • Modern silence detectors (G.729B VAD, NeVoT SD) produce different patterns
Traffic Aggregation Simulation • Token bucket filter with N sources, R: reserved to peak BW ratio • CDF model resembles trace model in most cases • Exponential (traditional) model • Under-predicts out-of-profile packet probability • Under-prediction ratio increases as token buffer size B increases • Similar results for NeVoT SD
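A token bucket check of aggregated voice traffic can be sketched as below. This is only an illustration of the mechanism, not the simulator behind the results above: the function names, the one-token-per-packet accounting, and the exponential On/Off source are assumptions.

```python
import random


def on_off_arrivals(mean_on, mean_off, packet_interval, duration, rng):
    """Packet send times of one source with exponential On/Off periods."""
    t, times = 0.0, []
    while t < duration:
        on_end = t + rng.expovariate(1.0 / mean_on)
        while t < min(on_end, duration):
            times.append(t)            # one packet per interval while On
            t += packet_interval
        t = max(t, on_end) + rng.expovariate(1.0 / mean_off)
    return times


def out_of_profile_fraction(arrivals, rate, depth):
    """Fraction of packets that find no token.

    arrivals: sorted arrival times (s); rate: tokens/s;
    depth: bucket size B in packets. One token per packet.
    """
    if not arrivals:
        return 0.0
    tokens, last, excess = float(depth), arrivals[0], 0
    for t in arrivals:
        tokens = min(depth, tokens + (t - last) * rate)
        last = t
        if tokens >= 1.0:
            tokens -= 1.0
        else:
            excess += 1
    return excess / len(arrivals)


if __name__ == "__main__":
    rng = random.Random(1)
    n_sources, interval, ratio = 10, 0.02, 0.6   # ratio plays the role of R
    merged = sorted(
        t for _ in range(n_sources)
        for t in on_off_arrivals(1.0, 1.5, interval, 60.0, rng)
    )
    rate = ratio * n_sources / interval          # reserved rate, packets/s
    print(out_of_profile_fraction(merged, rate, depth=20))
```

Replacing `on_off_arrivals` with On/Off durations drawn from a measured CDF would allow the comparison between the exponential and trace-based models described above.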
Summary of QoS Management • End-to-End • FEC is superior in quality to LBR • Codec robustness is better than FEC in low loss conditions • Combining both schemes brings the best of both sides • Network provisioning • Observation: New silence detectors (G.729B, NeVoT SD) → non-exponential voice On/Off patterns • Result: performance of voice traffic aggregation under new On/Off patterns • Important in traffic engineering and Service Level Agreement (SLA) validation
Outline • QoS measurement • Objective vs. subjective metrics • Automated measurement of subjective quality • QoS management: improving your quality • End-to-End: FEC, LBR, PLC • Network provisioning: voice traffic aggregation • Reality check • Performance of end-points (IP phones, …) • Deployment issues in VoIP • Assessment of VoIP service availability through Internet measurement
Mouth-to-ear Delay of VoIP End-points • All receivers can adjust M2E delay adaptively whenever it is too low or too high • M2E delay depends mainly on receiver (esp. RAT) • HW phones have relatively low delay (~45-90ms)
But Adaptiveness ≠ Perfection • Symptom of playout buffer underflow • Waveforms are dropped • Occurred at point of delay adjustment • Bugs in software? • LAN perfect quality?
Major Observations • Overall: end-points matter a lot! • HW IP phones: 45-90ms average M2E delay • SW clients: • Messenger 2000 lowest (68ms), XP (96-120ms) • cf. GSM-PSTN: 110ms either direction • NetMeeting very bad (> 400ms) • PLC robustness • Acceptable in all 3 IP phones tested, Cisco phone more robust • Silence detection/suppression • Works for speech input • Often fails for non-speech (e.g., music) input • Generates many unnatural gaps • Not good for customer support center (on-hold music)! • Acoustic echo cancellation (AEC): • Good on most IP phones (Echo Return Loss > 40 dB) • But some do not implement AEC at all
Reality Check #2: IP Telephony Deployment • Localized deployment at Columbia Univ. • [Architecture diagram; labels: regular phone, telephone switch/PBX, T1/E1, SIP/PSTN gateway, RTP/SIP, IP phones, core server, sipd (SIP proxy, redirect server), SQL database, conference server, voicemail server, web server, web-based configuration, server status monitoring]
Issues and Lessons Learned • PSTN/PBX integration • Requires full understanding of legacy networks • Lower layer (e.g., T1 line configuration) • Parameters must match on both PSTN/PBX and gateway! • PBX access configurations • To ensure calls go through in both directions • Address translation (dial-plan) in both directions • Previous lessons/experiences can help greatly • E.g., second gateway installed in weeks instead of months • Security • Issue: SIP/PSTN gateway has no authentication feature • Solution: • Use gateway’s access control lists to block direct calls • SIP proxy server handles authentication using record-route
Reality Check #3: VoIP Service Availability • Focus on availability rather than traditional QoS • Delay is a minor issue; FEC recovers most isolated losses • Ability to make a call is vital, especially in emergency • Internet measurement sites: • 14 nodes worldwide, not just Internet2 and the like • Definitions: • Availability = MTBF / (MTBF + MTTR) • Availability = successful calls / first call attempts • Equipment availability: 99.999% (“5 nines”) → about 5 minutes of downtime per year • AT&T: 99.98% availability (1997) • IP frame relay SLA: 99.9% • UK mobile phone survey: 97.1-98.8%
First Look at Availability • Call success probability: • 62,027 calls succeeded, 292 failed → 99.53% availability • Roughly constant across I2, I2+, commercial ISPs: 99.39-99.58% • Overall network loss • PSTN: once connected, call usually of good quality • exception: mobile phones • Compute % time below loss threshold • 5% loss causes degradation for many codecs • others acceptable until 20%
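The availability arithmetic in the two slides above can be checked with a few lines. The formulas and call counts are taken from the slides; the function names are illustrative assumptions, and a 365-day year is assumed.

```python
def availability_from_mtbf(mtbf, mttr):
    """Availability = MTBF / (MTBF + MTTR), both in the same time unit."""
    return mtbf / (mtbf + mttr)


def availability_from_calls(succeeded, failed):
    """Availability = successful calls / first call attempts."""
    return succeeded / (succeeded + failed)


def downtime_minutes_per_year(availability):
    """Expected downtime per 365-day year for a given availability."""
    return (1.0 - availability) * 365 * 24 * 60


if __name__ == "__main__":
    print(round(100 * availability_from_calls(62027, 292), 2))   # percent
    print(round(downtime_minutes_per_year(0.99999), 2))          # five nines
    print(round(downtime_minutes_per_year(0.9998), 1))           # 99.98%
```

By the same arithmetic, the measured 99.53% call success rate corresponds to roughly 41 hours of unavailability per year, which is why the talk describes it as far from five nines.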
Network Outages • Sustained packet losses • arbitrarily defined at 8 packets • far beyond recoverable (FEC, interpolation) • 23% of packet losses are outages • Make up a significant part of the 0.25% unavailability • Symmetric: A→B ⇔ B→A • Spatially correlated: A→B ⇒ A→X • Not correlated across networks (e.g., I2 and commercial) • Mostly short (a few seconds), but some are very long (100’s of seconds) and make up the majority of outage time
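Outage detection as defined above amounts to finding long runs of consecutive losses. The sketch below assumes that "defined at 8 packets" means a run of at least 8 consecutive lost packets. The function names and the 0/1 trace encoding are illustrative assumptions.

```python
def find_outages(trace, min_len=8):
    """Return (start_index, length) for each run of >= min_len losses.

    trace: sequence of 0 (received) and 1 (lost).
    """
    outages, run_start = [], None
    for i, lost in enumerate(trace):
        if lost:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                outages.append((run_start, i - run_start))
            run_start = None
    # Close a run that extends to the end of the trace.
    if run_start is not None and len(trace) - run_start >= min_len:
        outages.append((run_start, len(trace) - run_start))
    return outages


def outage_loss_share(trace, min_len=8):
    """Fraction of all lost packets that fall inside outages."""
    total_lost = sum(1 for x in trace if x)
    in_outages = sum(length for _, length in find_outages(trace, min_len))
    return in_outages / total_lost if total_lost else 0.0


if __name__ == "__main__":
    demo = [0, 1, 1, 0] + [1] * 8 + [0] + [1] * 10
    print(find_outages(demo), outage_loss_share(demo))
```

Multiplying each run length by the packet interval gives the outage duration in seconds.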
Outage-induced Call Abortion Probability • Long interruption → user likely to abandon call • from E.855 survey: $P[\text{holding}] = e^{-t/17.26}$ (t in seconds) • half the users will abandon call after 12s • 2,566 calls have at least one outage • 946 of 2,566 expected to be dropped → 1.53% of all calls
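The holding-time model above can be turned into an expected number of abandoned calls. The exponential form and the 17.26 s constant come from the slide. Applying it to one outage duration per call, along with the function names, is an assumption made for this sketch.

```python
import math

TAU = 17.26  # seconds, constant from the slide's holding model


def p_holding(t):
    """Probability that a user is still holding after t seconds of outage."""
    return math.exp(-t / TAU)


def half_abandon_time():
    """Outage duration after which half the users have abandoned the call."""
    return TAU * math.log(2)


def expected_dropped(outage_durations):
    """Expected number of abandoned calls, one outage duration (s) per call."""
    return sum(1.0 - p_holding(t) for t in outage_durations)


if __name__ == "__main__":
    print(round(half_abandon_time(), 2))
    print(round(expected_dropped([2.0, 5.0, 30.0, 120.0]), 2))
```

The half-abandon time works out to about 12 s, matching the slide.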
Summary of Service Availability • Through several metrics, one can translate from network loss to VoIP service availability (no Internet dial-tone) • Current results show availability far below five 9’s, but comparable to mobile telephony • Outage statistics are similar in research and ISP networks • Working on identifying fault sources and locations • Additional measurement sites are welcome
Conclusions • Measuring QoS • Loss burstiness and delay correlation affect (generally worsen) perceived quality • Bridging objective and subjective metrics: the E-model, or speech recognition based MOS prediction • Performance of real products: IP phones and soft clients • Ensuring/improving QoS • Network provisioning (voice traffic aggregation) • Efficient, but may be expensive to deploy and manage • End-to-End (FEC > LBR, PLC) • Easier to deploy, but must control overhead of FEC • Reality Check • Good implementation at the end-point (e.g., IP phones) is vital • VoIP deployment requires PSTN integration and security • Service availability is crucial for VoIP, but still far from 99.999% over the Internet
Ongoing and Future Work • Sampling Internet performance • Where do the problems reside? • Access networks (Cable, DSL), or • International paths? • How can we solve these problems? • Can adaptive FEC react fast enough to changes in network conditions? • Playout delay behaviors of VoIP end-points • How well do they react to jitter, delay spikes?