
Frantz Clermont J.P. French Associates & University of York

Speaker-variance ratios in forensically-realistic vowel formant data: Normalising for consonantal context. Frantz Clermont J.P. French Associates & University of York Forensic Speech & Acoustics Laboratory United Kingdom. Annual Conference of


Presentation Transcript


  1. Speaker-variance ratios in forensically-realistic vowel formant data: Normalising for consonantal context
     Frantz Clermont
     J.P. French Associates & University of York, Forensic Speech & Acoustics Laboratory, United Kingdom
     Annual Conference of The International Association for Forensic Phonetics & Acoustics, Vienna, Austria, 24-28 July 2011

  2. Looking Back: 2008 research aim & findings
     QUESTION: Which formant(s) have the greatest speaker-discrimination value?
     VOWEL CORPUS (DyViS):
     • 25 male speakers
     • Standard Southern British English
     • 5 short vowels: KIT, DRESS, TRAP, STRUT, LOT
     • vowel tokens selected from:
       • unscripted conversation
       • varying consonantal contexts
     • F1, F2, F3 measured at the most stationary location in the vowel
     [Figure labels: between-speaker variance; within-vowel variance]
     MAIN FINDINGS
     • F3 carries the bulk of speaker information (phonetic spread < speaker spread)
     • F2, F1 carry little speaker information (phonetic spread > speaker spread)
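The speaker-separation measure behind these findings can be illustrated with a one-way ANOVA F-ratio. The sketch below is a minimal illustration, assuming the ratio is computed per vowel and per formant as the between-speaker mean square divided by the within-speaker mean square; the function name `f_ratio` and the input layout are hypothetical and are not taken from the slides.

```python
from statistics import mean


def f_ratio(tokens_by_speaker):
    """One-way ANOVA F-ratio for one vowel and one formant.

    tokens_by_speaker maps a speaker label to a list of formant values (Hz).
    Assumption: each speaker is one ANOVA group, so the result is the
    between-speaker mean square over the within-speaker mean square.
    """
    groups = [values for values in tokens_by_speaker.values() if values]
    k = len(groups)                      # number of speakers
    n = sum(len(g) for g in groups)      # total number of tokens
    if k < 2 or n <= k:
        raise ValueError("need at least 2 speakers and some repeated tokens")

    grand_mean = mean(v for g in groups for v in g)
    group_means = [mean(g) for g in groups]

    # Spread of speaker means around the grand mean (speaker spread).
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Spread of tokens around their own speaker's mean (phonetic spread).
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within
```

A ratio well above 1 means the speaker means are spread more widely than the tokens within each speaker, which is the pattern reported for F3. A ratio near or below 1 corresponds to the F2 and F1 pattern, where phonetic spread exceeds speaker spread.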

  3. A pending question: Might context effects be blurring speaker information in F2, F1?
     • F-ratio observed (F2, F1): phonetic spread larger than speaker spread
     • wide token-to-token phonetic variations
     • tokens come from wide-ranging contexts
     We approach the above question by:
     • employing a phonologically-coded version of the same DyViS corpus (Simpson, 2008)
     • adapting a novel method of vowel-formant “de-Contexting” (Broad & Clermont, 2002; Clermont, 2010)
     • re-visiting our ANOVA analysis before & after de-Contexting

  4. DyViS (2008 corpus): A Phonologically-Coded Version
     3 CONTEXT CLASSES (places of articulation) (Simpson, 2008):
     • Code “B”: Labial /p, b, m, f, v/
     • Code “C”: Coronal /t, d, θ, s, ʃ, ɹ, l/
     • Code “D”: Dorsal /ŋ, k, g, j/
     5 SHORT VOWELS: KIT, DRESS, TRAP, STRUT, LOT
     INITIAL POSITION: SCARCITY of “B” (LOT), “D” (TRAP); 14 speakers ineligible
     FINAL POSITION: ONLY 2 EMPTY CELLS; 2 speakers ineligible
     • 23 speakers retained from the original set of 25
     • Same 5 vowels & same set of F1, F2, F3 values
     • 3 context-dependent sets of tokens per vowel/speaker

  5. Characterising Systematic Context Effects: Principle & Method (per formant & per speaker)
     Vowel-Formant Ensemble (VFE)
     HYPOTHESIS: VFEs are geometrically SIMILAR
     • Effect #1: VFE translation
     • Effect #2: VFE scaling relative to mean VFE
     Finding VFE scales: linear regression of each VFE on the mean VFE
     • VFE’s scale = line slope
     Illustrations based on SPEAKER #10’s data
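The regression step on this slide can be sketched as follows. This is a minimal illustration for one speaker and one formant, assuming each VFE is the list of per-vowel mean formant values for one context class (“B”, “C” or “D”) and that the mean VFE is the vowel-by-vowel average across classes. The names `fit_line` and `vfe_scales` are hypothetical, and whether the original work estimates the line in exactly this way is not stated on the slide.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def vfe_scales(vfes):
    """Regress each context's VFE on the mean VFE.

    vfes maps a context code to a list of per-vowel formant means (Hz),
    with the vowels in the same order for every context.
    Returns (mean_vfe, fits), where fits[code] = (slope, intercept);
    the slope plays the role of the VFE's scale.
    """
    codes = list(vfes)
    n_vowels = len(vfes[codes[0]])
    mean_vfe = [sum(vfes[c][i] for c in codes) / len(codes)
                for i in range(n_vowels)]
    fits = {c: fit_line(mean_vfe, vfes[c]) for c in codes}
    return mean_vfe, fits
```

Because every VFE is regressed on the average of all VFEs, the fitted slopes necessarily average to 1. A slope below 1 marks a context class whose vowel pattern is compressed relative to the mean VFE, and a slope above 1 marks an expanded one.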

  6. “De-Contexting” by INVERSE (VFE) Scaling: Principle & Method (per formant & per speaker)
     VFE scales:
     • VFE-to-VFE similarity
     • sizes relative to mean VFE
     INVERSE VFE scales:
     • VFE-to-VFE size standardisation
     • comparable to size of mean VFE
     Still SPEAKER #10’s data
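One plausible reading of inverse scaling is to undo the line fitted for a token's context class, so that tokens from every class are mapped onto the scale of the mean VFE. The sketch below assumes that both the slope (scale) and the intercept (translation) of the fitted line are inverted; the slide does not spell out how the translation is handled, and the names `decontext` and `decontext_tokens` are hypothetical.

```python
def decontext(value_hz, slope, intercept):
    """Map one formant value back onto the mean-VFE scale.

    Assumption: the context effect is modelled as
    value = slope * mean_value + intercept, so the inverse is
    (value - intercept) / slope.
    """
    if slope == 0:
        raise ValueError("a zero slope cannot be inverted")
    return (value_hz - intercept) / slope


def decontext_tokens(tokens, fits):
    """De-context a list of (context_code, value_hz) tokens.

    fits maps each context code to the (slope, intercept) pair obtained
    for that speaker and formant.
    """
    return [(code, decontext(value, *fits[code])) for code, value in tokens]
```

With this reading, a token from a compressed context class (slope below 1) is stretched and a token from an expanded class is shrunk, so the three context-dependent ensembles end up comparable in size to the mean VFE before the ANOVA is repeated.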

  7. On the Non-deContextability of F3 (illustration based on speaker #10’s data)

  8. SPEAKER SEPARATION DIAGNOSTIC: F-ratio (23 speakers) before & after de-Contexting
     F1
     • Overall:
       • F-ratios: 4 of 5 vowels statistically sig.
       • de-Contexted TRAP: stands out
       • TRAP & LOT: main contenders
     • de-Contexted F1 and raw F3:
       • competitively closer for TRAP!
     F2
     • Overall:
       • F-ratios: all 5 vowels statistically sig.
       • KIT, DRESS, TRAP: stand out
       • de-Contexting causes major improvements for all 5 vowels
     • de-Contexted F2 and raw F3:
       • competitively closer for all vowels

  9. Conclusions
     Might context effects be blurring speaker information in F2 and F1?
     • Affirmative … notably for F2 (all 5 vowels):
       • de-Contexting brings F2 on par with the F3 results of 2008!
       • particularly significant owing to:
         • vowel centres … distant from zones of major context influence
         • broad context classes … 3 places of articulation span 16 consonants
         • initial-context effects … left uncontrolled
     Towards more robust forensic practice:
     • F3 … remains a major carrier of speaker information
     • F2 (de-Contexted) … carries nearly as much power
     • FRONT vowels … hold greater discrimination value

  10. THANKS
     • Dr David J. Broad (U.S.A.)
     • Dr J. Peter French (U.K.)
     • Dr2B Philip T. Harrison (U.K.)

  11. References
     • Clermont, F. (2010). “A linear-scaling method for normalising vowels in various consonantal contexts”, Abstract Proc. 13th Australasian Int. Conf. on Speech Science and Technology (SST), Melbourne, 14-16 December.
     • Clermont, F., French, P., Harrison, P. and Simpson, S. (2008). “Population data for English spoken in England: A modest first step”, Abstract Proc. Annual Conf. of the International Association for Forensic Phonetics and Acoustics (IAFPA), Lausanne, 21-23 July.
     • Simpson, S. (2008). “Testing the speaker discrimination ability of formant measurements in forensic speaker comparison cases”, Unpublished MSc Thesis, University of York, UK.
     • Broad, D.J. and Clermont, F. (2002). “Linear scaling of vowel-formant ensembles (VFEs) in consonantal contexts”, Speech Communication, vol. 37, pp. 175-195.
