The Reliability of Formant Measurements in High Quality Audio Data: The Effect of Agreeing Measurement Procedures. Martin Duckworth, Kirsty McDougall, Gea de Jong, Linda Shockey. DyViS.
Introduction • Formant measurement is implicitly a legal requirement in UK speaker comparison cases • Measurements on analogue spectrograms had to be made by hand and eye • Measurements on digital spectrograms can be assisted by formant trackers; LPC tracking is common
Introduction • How replicable are measurements by eye on digital spectrograms? • If LPC tracking is used, what can lead to variability? • Software settings • Point at which data is extracted
Study Aims • What is required in order to make measurements more replicable? • If software (but not method) is held constant and the data is high quality, can different laboratories make the same F1-F3 measurements? • If the method of analysis is the same, does this lead to statistically improved reliability between laboratories?
Aims continued • We are aiming to find a reliable means of obtaining formant values • We are examining reliability, not validity
Data • Read speech from the Cambridge DyViS database • Male speakers • Standard Southern British English • Aged 18-25 • 40 speakers: Set 1 (20 speakers) and Set 2 (20 speakers)
Data • 6 monophthongs: / iː, æ, ɑː, ɔː, ʊ, uː / • 6 repetitions per vowel per speaker • elicited in hVd contexts in sentences: It’s a warning we’d better HEED today. It’s only one loaf, but it’s all Peter HAD today. We worked rather HARD today. We built up quite a HOARD today. He insisted on wearing a HOOD today. He hates contracting words, but he said a WHO’D today.
Measurements • Analysts from 3 labs: Cambridge, Plymouth, Reading • Task: to measure F1, F2, F3 for each vowel token using Praat • Set 1: measured using individual, but constrained, methods • Set 2: measured after a meeting at which a single method was agreed
Set 1 Methods • Measure the formants at a relatively early point in the vowel • Measure formants over no more than 5 glottal pulses • Use either: • LPC tracking checked against the spectrogram or • hand/eye measures
Set 2 Method • Measure towards the start of the vowel • Measure in a relatively steady early part of the vowel • Measure around the vowel's maximum intensity • Use a single time slice
Set 2 Method (continued) • Use the LPC formant tracker adjusted for best visual fit • When values generated by Praat are judged by visual inspection to be incorrect, replace them by correct values from a time-slice immediately preceding or following the slice being measured.
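The Set 2 procedure can be sketched roughly in code. This is only an illustration under stated assumptions, not the analysts' actual Praat workflow: the function names `pick_slice` and `measure`, the `early_fraction` cut-off, and the `rejected` set (standing in for the analyst's visual judgement that a tracked value is wrong) are all hypothetical, and the frame values in the usage lines are invented.

```python
from typing import FrozenSet, List, Tuple

Formants = Tuple[float, float, float]  # (F1, F2, F3) in Hz for one time slice


def pick_slice(intensity: List[float], early_fraction: float = 0.5) -> int:
    """Index of the maximum-intensity frame within the early part of the vowel.

    early_fraction is an assumed cut-off; the slides only say "towards the
    start" and "around the vowel's maximum intensity".
    """
    if not intensity:
        raise ValueError("vowel has no frames")
    n_early = max(1, int(len(intensity) * early_fraction))
    return max(range(n_early), key=lambda i: intensity[i])


def measure(
    formants: List[Formants],
    intensity: List[float],
    rejected: FrozenSet[int] = frozenset(),
    early_fraction: float = 0.5,
) -> Tuple[int, Formants]:
    """Return (frame index, formants) from a single time slice.

    If the chosen slice was judged incorrect (its index is in `rejected`),
    fall back to the immediately preceding, then the following, slice.
    """
    target = pick_slice(intensity, early_fraction)
    for idx in (target, target - 1, target + 1):
        if 0 <= idx < len(formants) and idx not in rejected:
            return idx, formants[idx]
    raise ValueError("no acceptable slice next to the target frame")


# Invented frames for illustration only (not data from the study).
demo_intensity = [60.0, 65.0, 70.0, 72.0, 71.0, 69.0, 66.0, 60.0]
demo_formants = [(700.0 + i, 1700.0 + i, 2500.0 + i) for i in range(8)]
print(measure(demo_formants, demo_intensity))                # uses frame 3
print(measure(demo_formants, demo_intensity, rejected={3}))  # falls back to frame 2
```

Fixing the fallback order (preceding slice first) is a design choice made here for determinism; the slides allow either neighbour.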
Statistical Analysis • 3 formants × 6 vowels × 2 datasets = 36 tests • Two-way ANOVA: repeated measures on the factor Lab (3), between-groups factor Speaker (20) • If Lab is significant at p < 0.05: pairwise comparisons with Šidák correction
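As a worked illustration of the arithmetic on this slide, the sketch below counts the tests and applies the standard Šidák formulas. It assumes the correction is applied across the 3 pairwise lab comparisons within each test (the slides do not state the family size), and the helper names `sidak_alpha` and `sidak_adjust` are hypothetical.

```python
from itertools import combinations

N_FORMANTS, N_VOWELS, N_DATASETS = 3, 6, 2
n_tests = N_FORMANTS * N_VOWELS * N_DATASETS  # 36 separate ANOVAs

labs = ["Lab1", "Lab2", "Lab3"]
lab_pairs = list(combinations(labs, 2))  # 3 pairwise comparisons per test


def sidak_alpha(alpha: float, m: int) -> float:
    """Per-comparison threshold keeping the family-wise error rate at alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def sidak_adjust(p: float, m: int) -> float:
    """Šidák-adjusted p-value for one of m comparisons."""
    return 1.0 - (1.0 - p) ** m


print(n_tests, len(lab_pairs))
# Threshold for each lab pair, assuming a family of 3 comparisons.
print(round(sidak_alpha(0.05, len(lab_pairs)), 4))
```

Comparing an adjusted p-value against 0.05 and comparing the raw p-value against the adjusted threshold are equivalent decisions.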
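Results: HAD, F1 • [Figure: F1 measurements by Lab1, Lab2 and Lab3 for Set 1 and Set 2] • Pairwise comparisons between labs: p = 0.001, 0.000 and 0.000 for three comparisons, NS for the other three • Main effect of Lab: significant in one set; significant, but with pairwise comparisons NS, in the other
Results: HAD, F2 • [Figure: F2 measurements by Lab1, Lab2 and Lab3 for Set 1 and Set 2] • Pairwise comparisons between labs: NS for all pairs in both sets • Main effect of Lab: not significant in either set
Results: HAD, F3 • [Figure: F3 measurements by Lab1, Lab2 and Lab3 for Set 1 and Set 2] • Pairwise comparisons between labs: p = 0.000 for two comparisons, NS for the other four • Main effect of Lab: significant in one set, not significant in the other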
Summary - HAD • [Table: main effect and pairwise comparisons, Set 1 vs Set 2] • Improvement from Set 1 to Set 2 • Set 2: good news
Effect of Lab - 6 vowels Set 1 Set 2
Influence of Speaker • Interaction Lab × Speaker significant (p < 0.05) for F1-F3 of all 6 vowels for both Set 1 and Set 2 • Certain speakers lead to measurement differences among labs, for example…