Speaker-variance ratios in forensically-realistic vowel formant data: Normalising for consonantal context
Frantz Clermont
J.P. French Associates & University of York
Forensic Speech & Acoustics Laboratory
United Kingdom
Annual Conference of the International Association for Forensic Phonetics & Acoustics
Vienna, Austria, 24-28 July 2011
Looking Back: 2008 research aim & findings
QUESTION: Which formant(s) have the greatest speaker-discrimination value?
VOWEL CORPUS (DyViS):
• 25 male speakers
• Standard Southern British English
• 5 short vowels: KIT, DRESS, TRAP, STRUT, LOT
• vowel tokens selected from unscripted conversation, in varying consonantal contexts
• F1, F2, F3 measured at the most stationary location in the vowel
[Figure labels: between-speaker variance; within-vowel variance]
MAIN FINDINGS
F3
• carries the bulk of speaker information
• phonetic spread < speaker spread
F2, F1
• carry little speaker information
• phonetic spread > speaker spread
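The speaker-variance ratio behind these findings can be sketched as a one-way ANOVA F-ratio. This is only an illustration under stated assumptions: speakers are treated as the groups, the tokens of a single vowel and formant are the observations, and the function name `f_ratio` and the sample values are hypothetical. The slide does not spell out the exact variance definitions used in the 2008 study.

```python
from statistics import mean

def f_ratio(groups):
    """One-way ANOVA F-ratio: between-group mean square / within-group mean square.

    ASSUMPTION: each inner list holds one speaker's formant values (Hz)
    for a single vowel; this layout is illustrative, not the study's own.
    """
    k = len(groups)                              # number of speakers
    n_total = sum(len(g) for g in groups)        # number of tokens overall
    grand = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical F3 values (Hz) for three speakers, one vowel.
speakers = [[2400, 2450, 2500], [2600, 2650, 2700], [2800, 2850, 2900]]
print(f_ratio(speakers))
```

A ratio well above 1 means the speakers' means differ by more than the token-to-token spread within each speaker, which is the sense in which "speaker spread" exceeds "phonetic spread".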
A pending question: Might context effects be blurring speaker information in F2, F1?
F-ratio observed (F2, F1): phonetic spread larger than speaker spread
• tokens come from wide-ranging contexts
• wide token-to-token phonetic variations
We approach the above question by:
• employing a phonologically-coded version of the same DyViS corpus (Simpson, 2008)
• adapting a novel method of vowel-formant “de-Contexting” (Broad & Clermont, 2002; Clermont, 2010)
• re-visiting our ANOVA analysis before & after de-Contexting
DyViS (2008 corpus): A Phonologically-Coded Version
5 SHORT VOWELS: KIT, DRESS, TRAP, STRUT, LOT
3 CONTEXT CLASSES (places of articulation) (Simpson, 2008):
• Code “B”: Labial /p, b, m, f, v/
• Code “C”: Coronal /t, d, θ, s, ʃ, ɹ, l/
• Code “D”: Dorsal /ŋ, k, g, j/
INITIAL POSITION: SCARCITY of “B” (LOT) and “D” (TRAP); 14 speakers ineligible
FINAL POSITION: ONLY 2 EMPTY CELLS; 2 speakers ineligible
• 23 speakers retained from the original set of 25
• Same 5 vowels & same set of F1, F2, F3 values
• 3 context-dependent sets of tokens per vowel/speaker
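The coding scheme on this slide can be sketched as a simple lookup. The consonant groupings are taken from the slide; the helper names `context_code` and `is_eligible`, and the eligibility rule itself (a speaker needs at least one token in every vowel-by-context cell), are assumptions made for illustration.

```python
# Context classes as listed on the slide (Simpson, 2008).
CONTEXT_CLASSES = {
    "B": ["p", "b", "m", "f", "v"],                # labial
    "C": ["t", "d", "θ", "s", "ʃ", "ɹ", "l"],      # coronal
    "D": ["ŋ", "k", "g", "j"],                     # dorsal
}
VOWELS = ["KIT", "DRESS", "TRAP", "STRUT", "LOT"]

# Reverse lookup: consonant -> context code.
CODE_OF = {c: code for code, cons in CONTEXT_CLASSES.items() for c in cons}

def context_code(consonant):
    """Return "B", "C" or "D", or None for a consonant outside the 16 coded ones."""
    return CODE_OF.get(consonant)

def is_eligible(tokens):
    """ASSUMED rule: every vowel x context cell holds at least one token.

    `tokens` is a list of (vowel, consonant) pairs for one speaker.
    """
    filled = {(vowel, context_code(cons)) for vowel, cons in tokens}
    return all((v, code) in filled for v in VOWELS for code in CONTEXT_CLASSES)

print(context_code("k"))   # dorsal class
```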
Characterising Systematic Context Effects: Principle & Method (per formant & per speaker)
Vowel-Formant Ensemble (VFE)
HYPOTHESIS: VFEs are geometrically SIMILAR
• Effect #1: VFE translation
• Effect #2: VFE scaling relative to mean VFE
Finding VFE Scales: Linear regression of each VFE on mean VFE
• VFE’s scale = line slope
Illustrations based on SPEAKER #10’s data
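The scale-finding step can be sketched as an ordinary least-squares line fit, with the slope read off as the VFE's scale. The layout is an assumption: each VFE is taken to be the five vowel values of one formant in one context for one speaker, and the mean VFE is their average across the three contexts. The function name `fit_line` and all numbers are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical F2 values (Hz) for KIT, DRESS, TRAP, STRUT, LOT in each context.
vfes = {
    "B": [1700, 1600, 1450, 1250, 1050],
    "C": [1850, 1750, 1550, 1350, 1150],
    "D": [1900, 1800, 1600, 1380, 1160],
}
# Mean VFE: vowel-by-vowel average across the three contexts.
mean_vfe = [mean(vals) for vals in zip(*vfes.values())]

for code, vfe in vfes.items():
    slope, intercept = fit_line(mean_vfe, vfe)
    print(code, round(slope, 3), round(intercept, 1))
```

Under the similarity hypothesis, a slope below 1 indicates a VFE that is compressed relative to the mean VFE, and a slope above 1 indicates one that is expanded.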
“De-Contexting” by INVERSE (VFE) Scaling: Principle & Method (per formant & per speaker)
VFE-Scales:
• VFE-to-VFE similarity
• sizes relative to mean VFE
INVERSE VFE-Scales:
• VFE-to-VFE size standardisation
• comparable to size of mean VFE
Still SPEAKER #10’s data
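One way to picture inverse scaling is to divide each VFE's deviations from its own centroid by its fitted scale, so that its size matches that of the mean VFE. The sketch below assumes this particular form; the slide does not give the formula, and the published method (Broad & Clermont, 2002; Clermont, 2010) may differ in detail. The function name `inverse_scale` and the numbers are hypothetical.

```python
from statistics import mean

def inverse_scale(values, slope):
    """Rescale values about their own centroid by 1/slope.

    ASSUMPTION: `slope` is the VFE's scale from regressing it on the mean VFE.
    The centroid (translation) is left untouched; only the size changes.
    """
    centroid = mean(values)
    return [centroid + (v - centroid) / slope for v in values]

# Hypothetical mean VFE and a context VFE that is a compressed copy of it.
mean_vfe = [1800.0, 1700.0, 1500.0, 1300.0, 1100.0]
vfe = [100.0 + 0.8 * m for m in mean_vfe]      # scale 0.8, shifted by 100 Hz

restored = inverse_scale(vfe, 0.8)
print(max(restored) - min(restored))           # spread now matches the mean VFE
```

With this choice, the de-Contexted VFE keeps its own position but takes on the spread of the mean VFE, in line with "size standardisation".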
On the Non-deContextability of F3 (illustration based on speaker #10’s data)
SPEAKER SEPARATION DIAGNOSTIC: F-ratio (23 speakers) before & after de-Contexting
F1
• Overall:
  • F-ratios: 4 of 5 vowels statistically sig.
  • de-Contexted TRAP: stands out
  • TRAP & LOT: main contenders
• de-Contexted F1 and raw F3:
  • competitively closer for TRAP!
F2
• Overall:
  • F-ratios: all 5 vowels statistically sig.
  • KIT, DRESS, TRAP: stand out
  • de-Contexting causes major improvements for all 5 vowels
• de-Contexted F2 and raw F3:
  • competitively closer for all vowels
Conclusions
Might context effects be blurring speaker information in F2 and F1?
Affirmative … notably for F2 (all 5 vowels):
• de-contexting brings F2 on par with F3 results of 2008!
• particularly significant owing to:
  • vowel centres … distant from zones of major context influence
  • broad context classes … 3 places of artic. span 16 consonants
  • initial-context effects … left uncontrolled
Towards more robust forensic practice:
• F3 … remains a major carrier of speaker information
• F2 (de-contexted) … carries nearly as much power
• FRONT vowels … hold greater discrimination value
THANKS
• Dr David J. Broad (U.S.A.)
• Dr J. Peter French (U.K.)
• Dr2B Philip T. Harrison (U.K.)
References
Clermont, F. (2010). “A linear-scaling method for normalising vowels in various consonantal contexts”, Abstract Proc. 13th Australasian Int. Conf. on Speech Science and Technology (SST), Melbourne, 14-16 December.
Clermont, F., French, P., Harrison, P. and Simpson, S. (2008). “Population data for English spoken in England: A modest first step”, Abstract Proc. Annual Conf. of the International Association for Forensic Phonetics and Acoustics (IAFPA), Lausanne, 21-23 July.
Simpson, S. (2008). “Testing the speaker discrimination ability of formant measurements in forensic speaker comparison cases”, Unpublished MSc Thesis, University of York, UK.
Broad, D.J. and Clermont, F. (2002). “Linear scaling of vowel-formant ensembles (VFEs) in consonantal contexts”, Speech Communication, vol. 37, pp. 175-195.