Vincent Barriac, Jean-Yves Le Saout, Catherine Lockwood
France Telecom R&D, Lannion, France; Teamlog, Lannion, France
vincent.barriac@francetelecom.com, jeanyves.lesaout@francetelecom.com
Discussion on unified objective methodologies for the comparison of voice quality of narrowband and wideband scenarios
Workshop on Wideband Speech Quality in Terminals and Networks: Assessment and Prediction
Context
• Emergence of new services using wideband speech communications
• Need to track the performance of communication channels mixing wideband and narrowband conditions (for example, scalable…)
How to evaluate the speech quality?
Subjective tests
• Subjective tests for narrowband conditions
• Subjective tests for wideband conditions
• Subjective tests for mixed narrowband and wideband conditions?
Perceptual Evaluation of Speech Quality (PESQ)
• PESQ for narrowband conditions
• PESQ for wideband conditions?
• PESQ for mixed narrowband and wideband conditions?
Open Issues
Questions about subjective tests
• Is it possible to merge the narrowband and wideband subjective scales?
• To adapt existing MOS scores for narrowband systems to such a common scale, should wideband references be introduced in all subjective tests?
• Or can we find a mapping function that adapts narrowband subjective MOS values to wideband-equivalent values?
Questions about PESQ
• Would wideband PESQ be adequate for measuring both wideband and narrowband codecs?
• Is the mapping function of P.862.1 also applicable to wideband scenarios?
• Finally, how can wideband PESQ values be compared with narrowband values?
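For reference, the P.862.1 mapping mentioned in the questions above converts a raw P.862 PESQ score into a MOS-LQO value with a logistic function. The sketch below uses the coefficients published in ITU-T P.862.1 for narrowband PESQ; the function name is an illustrative choice, and nothing here says whether the same curve suits wideband material, which is exactly the open question.

```python
import math


def pesq_raw_to_mos_lqo(raw_score: float) -> float:
    """Map a raw narrowband P.862 score to MOS-LQO (ITU-T P.862.1 logistic mapping)."""
    return 0.999 + 4.0 / (1.0 + math.exp(-1.4945 * raw_score + 4.6607))


if __name__ == "__main__":
    # Raw P.862 scores lie between -0.5 and 4.5.
    for raw in (-0.5, 1.0, 2.5, 4.5):
        print(f"raw PESQ {raw:+.1f} -> MOS-LQO {pesq_raw_to_mos_lqo(raw):.2f}")
```

The mapping is monotonic, so it changes the spacing of scores but not the rank order of conditions.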
First results on mixed narrowband and wideband subjective tests
Description of the subjective tests (1/4)
• The assessment uses the ACR (Absolute Category Rating) method described in ITU-T Recommendation P.800. Each judgement was collected on a 5-point quality scale, and scores were assigned according to the classic ACR methodology:
  5 Excellent
  4 Good
  3 Fair
  2 Poor
  1 Bad
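A minimal sketch of how ACR votes on this 5-point scale are usually reduced to a MOS with a confidence interval. The function name, the normal-approximation factor of 1.96 and the example votes are assumptions for illustration; the slides do not state how the confidence intervals were computed.

```python
import math
import statistics

ACR_LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def mos_with_ci(votes, z=1.96):
    """Return (MOS, half-width of the confidence interval) for a list of ACR votes.

    z=1.96 assumes a normal approximation for a 95% interval (an assumption,
    not something stated in the slides).
    """
    if not votes:
        raise ValueError("at least one vote is required")
    if any(v not in ACR_LABELS for v in votes):
        raise ValueError("ACR votes must be integers from 1 to 5")
    mean = statistics.fmean(votes)
    if len(votes) < 2:
        return mean, 0.0
    half_width = z * statistics.stdev(votes) / math.sqrt(len(votes))
    return mean, half_width


if __name__ == "__main__":
    # Hypothetical votes from one group of 8 listeners for a single condition.
    example_votes = [4, 3, 4, 5, 3, 4, 4, 2]
    mos, ci = mos_with_ci(example_votes)
    print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```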
Description of the subjective tests (2/4)
• All conditions are level-adjusted to -26 dB with the P.56 algorithm.
• Headphones are set to a constant nominal level of 79 dB SPL.
• PESQ values are computed on the two sets of conditions in order to calibrate the judgement scale.
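The level adjustment can be sketched roughly as follows. Note that this is a plain RMS normalization, not the P.56 active speech level algorithm used in the tests: P.56 excludes pauses when measuring the level, which this simplified stand-in does not. The function name and the interpretation of -26 dB as relative to a full scale of 1.0 are assumptions.

```python
import math


def normalize_rms_level(samples, target_db=-26.0):
    """Scale float samples (full scale = 1.0) so that their RMS level equals target_db.

    Simplified stand-in for a P.56-based adjustment: the level is measured over
    the whole signal, including pauses.
    """
    if not samples:
        raise ValueError("empty signal")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        raise ValueError("cannot normalize a silent signal")
    gain = 10.0 ** (target_db / 20.0) / rms
    return [s * gain for s in samples]


def rms_level_db(samples):
    """RMS level in dB relative to a full scale of 1.0."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms)


if __name__ == "__main__":
    # Synthetic 440 Hz tone at 8 kHz, used only to exercise the function.
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
    print(f"before: {rms_level_db(tone):.1f} dB")
    print(f"after:  {rms_level_db(normalize_rms_level(tone)):.1f} dB")
```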
Description of the subjective tests (3/4)
• Two tests:
  • one narrowband test,
  • a second, mixed narrowband and wideband test containing the same narrowband conditions as the first one
• For each ACR test, 3 different groups of 8 listeners. Each listening session is divided into 2 sub-sessions.
Description of the subjective tests (4/4)
• First test: 22 narrowband test conditions including:
  • 4 standard codecs (alone or in tandem, at different bit rates)
  • 3 wideband codecs at different bit rates, with output signals down-sampled to 8 kHz
  • a clear channel reference down-sampled to 8 kHz
• Second test: 30 test conditions including:
  • the 22 previous conditions
  • 3 wideband codecs at 16 kHz, at different bit rates
  • a 16 kHz clear channel reference
Results of mixed narrowband/wideband subjective test
• Improvement due to the increase of frequency range
Impact of mixing narrowband and wideband conditions
• Narrowband conditions obtain lower MOS values in a "mixed narrowband/wideband" test than in a "narrowband only" test
Impact of mixing narrowband and wideband conditions
• Wideband conditions obtain the same MOS values in a mixed narrowband/wideband test as in a "wideband only" test
[Figure axes: Subjective MOS (mixed NB/WB test), Subjective MOS (wideband-only test)]
Conclusion on subjective tests
• No need to systematically introduce wideband references in narrowband subjective tests
• Better definition of the scale, with complete use of the MOS scale
• A transfer function adapts narrowband MOS scores to the mixed narrowband/wideband MOS scale
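One way such a transfer function could be derived is a least-squares fit between the MOS values the same narrowband conditions obtained in the two tests. The sketch below assumes a first-order (linear) form, which the slides do not specify; the function names are hypothetical and the data points are synthetic placeholders, not results from the tests.

```python
def fit_linear_transfer(nb_only_mos, mixed_mos):
    """Least-squares fit of mixed_mos ~ slope * nb_only_mos + intercept."""
    n = len(nb_only_mos)
    if n != len(mixed_mos) or n < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    mean_x = sum(nb_only_mos) / n
    mean_y = sum(mixed_mos) / n
    sxx = sum((x - mean_x) ** 2 for x in nb_only_mos)
    if sxx == 0.0:
        raise ValueError("narrowband-only MOS values are all identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(nb_only_mos, mixed_mos))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def apply_transfer(mos, slope, intercept):
    """Map a narrowband-only MOS onto the mixed scale, clamped to the 1..5 range."""
    return min(5.0, max(1.0, slope * mos + intercept))


if __name__ == "__main__":
    # Synthetic placeholder values chosen only to exercise the code.
    nb_only = [1.0, 2.0, 3.0, 4.0]
    mixed = [1.0, 1.8, 2.6, 3.4]
    slope, intercept = fit_linear_transfer(nb_only, mixed)
    print(f"mixed ~ {slope:.2f} * nb_only + {intercept:.2f}")
    print(f"NB-only MOS 3.5 maps to {apply_transfer(3.5, slope, intercept):.2f}")
```

A higher-order polynomial or a logistic curve could be substituted if the measured points are clearly non-linear.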
Extension of this result to PESQ
Adaptation & validation of PESQ for wideband conditions
• Modification of the input filter
• Use of a mapping function
Equation of the mapping function:
PESQ results for wideband conditions
• Good matching between MOS scores and PESQ values
• Mapping function well adapted on the test set
• Results to be confirmed on more test material
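The match between subjective MOS and mapped PESQ values is commonly quantified per condition with the Pearson correlation coefficient and the root-mean-square error. The sketch below shows both; the slides do not say which figures of merit were used, so treat this choice and the function names as assumptions.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


def rmse(xs, ys):
    """Root-mean-square error between two equally long sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two equally long, non-empty lists")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))
```

Both functions expect per-condition values, for example `pearson_correlation(subjective_mos, mapped_pesq)`, where the two list names are placeholders for the measured data.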
Conclusion on objective measures
• Narrowband PESQ and wideband PESQ can be merged on a unified scale using the same transfer function as for the subjective tests.
• Two objective PESQ measures:
  • PESQ with the P.862.1 mapping function for narrowband studies
  • PESQ (including the input filter modification) with a new mapping function (?) for wideband studies
• Transfer function between the two scales.
Conclusion
Perspectives
Possible Applications
• Tool to evaluate the best compromise between bit rate and frequency range for scalable codecs.
• Tool to calibrate the MOS scale coverage for subjective tests.
• Extension of the E-model to wideband applications, with the determination of new equipment impairment factors Ie according to the usual procedure based on auditory listening test results.
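To illustrate how an equipment impairment factor Ie feeds into the E-model, the sketch below converts a transmission rating R into an estimated MOS using the narrowband conversion formula of ITU-T G.107. The helper that subtracts Ie from a base rating is a deliberate simplification (all other impairments are folded into `base_rating`), and both function names and the example values are hypothetical. How this should be extended to wideband is the open point raised above.

```python
def rating_to_mos(r: float) -> float:
    """Convert an E-model rating R to an estimated MOS (ITU-T G.107, narrowband)."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


def rating_with_codec(base_rating: float, ie: float) -> float:
    """Simplified view: subtract the equipment impairment factor Ie from a base rating.

    base_rating is assumed to already include every other impairment term.
    """
    return base_rating - ie


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for ie in (0.0, 10.0, 20.0):
        r = rating_with_codec(90.0, ie)
        print(f"Ie = {ie:4.1f} -> R = {r:5.1f} -> MOS = {rating_to_mos(r):.2f}")
```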
Questions?