Impact of Packet Loss Location on Perceived Speech Quality
Lingfen Sun, Graham Wade, Benn Lines, Emmanuel Ifeachor
University of Plymouth, U.K.
{L.F.Sun@jack.see.plym.ac.uk} {j.wade,B.Lines,E.Ifeachor@plym.ac.uk}
Outline
• Introduction
• Codec's internal concealment and convergence time
• Perceptual speech quality measurement
• Simulation system
• Loss location with perceived quality
• Loss location with convergence time
• Conclusions and future work
Introduction
[Diagram: SCN, Gateway, IP Network, Gateway, SCN]
• End-to-end speech transmission quality
  • IP network performance (e.g. packet loss and jitter)
  • Gateway/terminal (codec + loss/jitter compensation)
• Impact of packet loss on perceived speech quality
  • Loss pattern (e.g. burst/random)
  • Loss location (codec's concealment)
Introduction (cont.)
• Previous research on loss location
  • Concealment performance depends on speech content (e.g. voiced/unvoiced)
  • Analysis was based on MSE or SNR, for a limited set of codecs
  • Perceptual objective methods were used only to assess overall quality under stochastic loss simulations
• Questions:
  • How does packet loss location affect perceived speech quality?
  • How does packet loss location affect the codec's convergence time (for the loss constraint)?
Codec's internal concealment
• What is a codec's concealment?
  • When a loss occurs, the decoder interpolates the parameters for the lost frame from the parameters of previous frames.
• Which codecs have a concealment algorithm?
  • G.729/G.723.1/AMR (main VoIP codecs)
  • CELP analysis-by-synthesis
• What are the limitations of concealment algorithms?
  • During unvoiced (u) or voiced (v)
  • During u/v
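The idea of filling a lost frame from earlier frames can be sketched in a few lines. This is a simplified stand-in, not the G.729, G.723.1 or AMR procedure: it repeats the last received parameter vector and attenuates it for each consecutive loss. The function name `conceal_lost_frames`, the `attenuation` factor and the use of `None` to mark a lost frame are assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def conceal_lost_frames(frames: Sequence[Optional[List[float]]],
                        attenuation: float = 0.9) -> List[List[float]]:
    """Fill lost frames (None) from the most recent parameters.

    Hypothetical repeat-and-attenuate scheme: each lost frame reuses the
    previous frame's parameters scaled by `attenuation`, so a burst of
    losses fades out gradually.
    """
    output: List[List[float]] = []
    last: Optional[List[float]] = None
    for frame in frames:
        if frame is not None:
            # Frame received: use it and remember it for later concealment.
            last = list(frame)
        elif last is None:
            raise ValueError("cannot conceal a loss before any frame is received")
        else:
            # Frame lost: derive substitute parameters from the previous frame.
            last = [attenuation * value for value in last]
        output.append(list(last))
    return output
```

For example, `conceal_lost_frames([[1.0, 2.0], None, None, [4.0, 4.0]], attenuation=0.5)` fills the two lost frames with progressively attenuated copies of the first frame and leaves the received frames unchanged.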
Codec's convergence time
• What is convergence time?
  • The time taken by the decoder to resynchronize its state with the encoder after a loss occurs. It is also called resynchronization time.
  • It is used to set up the loss constraint distance between two consecutive losses for new packet loss metrics.
• What is the relationship between convergence time and loss location, codec type and packet size?
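One possible way to estimate convergence time is to compare the decoder output with a loss against the output without it, frame by frame, and find the last frame after the loss where the two still differ noticeably. The sketch below assumes this criterion; the relative-energy threshold `rel_threshold`, the function name and its parameters are illustrative choices, not taken from the slides.

```python
from typing import Sequence


def convergence_time_ms(clean: Sequence[float],
                        lossy: Sequence[float],
                        loss_frame: int,
                        frame_len: int,
                        frame_ms: float,
                        rel_threshold: float = 0.01) -> float:
    """Estimate the time after a lost frame until both outputs agree again.

    `clean` is the decoded signal without loss, `lossy` the decoded signal
    with a single lost frame at index `loss_frame`. A later frame counts as
    "not yet converged" when the squared error exceeds `rel_threshold`
    times the clean frame's energy (an assumed criterion).
    """
    n_frames = min(len(clean), len(lossy)) // frame_len
    last_bad = loss_frame
    for k in range(loss_frame + 1, n_frames):
        a = clean[k * frame_len:(k + 1) * frame_len]
        b = lossy[k * frame_len:(k + 1) * frame_len]
        error = sum((x - y) ** 2 for x, y in zip(a, b))
        energy = sum(x * x for x in a)
        if error > rel_threshold * energy:
            # Track the last differing frame so brief agreement is not
            # mistaken for full resynchronization.
            last_bad = k
    return (last_bad - loss_frame) * frame_ms
```

Using the last differing frame rather than the first matching one is a conservative choice: the estimate does not stop early if the two signals happen to agree for a single frame.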
Perceptual quality measurement
[Diagram: a reference signal passes through the system/network under test to give a degraded signal; an objective perceptual quality test compares the two and outputs an objective MOS]
• Transform the signals into a psychophysical representation approximating human perception
• Calculate their perceptual difference
• Map the difference to an objective MOS (Mean Opinion Score)
• Algorithms: PSQM/PSQM+/MNB/EMBSD/PESQ
Simulation System
[Diagram: reference speech → encoder → bitstream → decoder → degraded speech without loss; bitstream → loss simulation → decoder → degraded speech with loss; convergence time analysis; perceptual quality measure (with reference speech)]
• Perceptual speech quality analysis with loss location
• Convergence time analysis with loss location
Speech test sentence
[Figure: waveform of the first talkspurt]
• The speech test sentence is about 6 seconds long.
• The first talkspurt (about 1.34 seconds, shown in the waveform) is used for loss location analysis.
• It contains four voiced segments, V(1) to V(4), which can be identified from the pitch delay in the G.729 codec.
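The slides do not say how the voiced segments are read off the pitch delay. One plausible heuristic, assumed here, is that the pitch delay changes smoothly from frame to frame during voiced speech and jumps erratically otherwise. The sketch below finds runs of frames with small frame-to-frame changes; `max_jump` and `min_frames` are hypothetical tuning values.

```python
from typing import List, Sequence, Tuple


def voiced_segments(pitch_delays: Sequence[float],
                    max_jump: float = 5.0,
                    min_frames: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, with a smooth pitch delay.

    Assumed heuristic: a run of at least `min_frames` frames whose
    consecutive pitch delays differ by at most `max_jump` is treated as
    one voiced segment.
    """
    segments: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(pitch_delays) + 1):
        at_end = i == len(pitch_delays)
        if at_end or abs(pitch_delays[i] - pitch_delays[i - 1]) > max_jump:
            # The current run ends just before frame i.
            if i - start >= min_frames:
                segments.append((start, i))
            start = i
    return segments
```

Applied to a per-frame pitch-delay track exported from a decoder, the returned ranges could be compared with segments marked by hand before relying on the thresholds.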
[Figure: Pitch delay from G.729 codec, with voiced segments V(1) to V(4) marked]
Loss location with perceived quality
• Only one packet loss is created in each run
• The loss position moves from left to right, one frame at a time
• Overall perceptual quality is measured with PSQM/PSQM+, MNB and EMBSD
• Packet size: 1 to 4 frames/packet
• Codec: G.729/G.723.1/AMR
• How does loss location affect perceived speech quality?
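The sweep described above can be sketched as a generator of single-loss patterns. The helper names `single_loss_patterns` and `apply_loss` are hypothetical, and the sketch assumes that a multi-frame packet also slides by one frame per run, which the slides do not state explicitly.

```python
from typing import Iterator, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def single_loss_patterns(n_frames: int,
                         frames_per_packet: int) -> Iterator[Tuple[int, List[int]]]:
    """Yield (start_frame, lost_frame_indices) for every single-packet loss.

    The lost packet covers `frames_per_packet` consecutive frames and its
    start position moves from left to right one frame at a time.
    """
    for start in range(n_frames - frames_per_packet + 1):
        yield start, list(range(start, start + frames_per_packet))


def apply_loss(frames: Sequence[T], lost: Sequence[int]) -> List[Optional[T]]:
    """Return a copy of `frames` with the lost positions replaced by None."""
    lost_set = set(lost)
    return [None if i in lost_set else f for i, f in enumerate(frames)]
```

Each pattern would then be passed through the decoder and scored against the reference speech, repeating the sweep for packet sizes 1 to 4 and for each codec.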
[Figures: Loss position with quality (1) to (4). Each panel shows the reference speech, the degraded speech, the loss position, and the PSQM and PSQM+ scores.]
Loss location with perceived quality
• Loss location affects perceived quality.
• A loss in an unvoiced speech segment has no obvious impact on perceived quality.
• A loss at the beginning of a voiced segment has the most severe impact on perceived quality.
• PSQM+ yields the most detailed results compared with MNB/EMBSD.
Loss location with convergence time
• Convergence time is almost the same for different packet sizes
• Convergence time for a loss in unvoiced segments appears stable
• Convergence time shows a good linear relationship with loss position for losses in voiced segments
  • Maximum at the beginning of the segment
  • Decreases linearly
  • Upper-bounded by the end of the voiced segment
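One way to read the voiced-segment result is as a simple bound. The symbols are introduced here for illustration and do not appear in the slides: $t_\ell$ is the loss position, $t_{\text{start}}$ and $t_{\text{end}}$ are the start and end of the voiced segment containing it, and $T_c$ is the convergence time.

```latex
T_c(t_\ell) \le t_{\text{end}} - t_\ell,
\qquad t_{\text{start}} \le t_\ell \le t_{\text{end}}
```

If the bound is close to tight, $T_c$ is largest for a loss at $t_{\text{start}}$ and falls linearly towards zero as the loss approaches $t_{\text{end}}$, which matches the pattern listed above. How tight the bound is in practice is not stated in the slides.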
Conclusions and future work
• Investigated the impact of loss location on perceived speech quality
• Investigated the impact of loss location on convergence time
• The results will help in developing a perceptually relevant packet loss metric.
• Future work will focus on a more extensive analysis of the impact of packet loss in relation to speech content