Formant-Pattern Estimation Guided by Cepstral Compatibility
Frantz Clermont, Philip Harrison, Peter French
J.P. French Associates & University of York, United Kingdom
IAFPA Conference, Plymouth, UK, 23-25 July 2007
A Long-Standing Problem: How “reliable” is the measured formant-pattern?
• Central to Acoustic Phonetics
• Crucial to Forensic Phonetics
• Central to a Major Debate in Speech & Speaker Recognition
• SHORT ANSWER:
  • STILL NO OBJECTIVE WAY OF ENSURING/CHECKING RELIABILITY OF MEASURED F-PATTERNS
  • APPRECIABLE VARIABILITY AMONGST SOFTWARE PACKAGES (HARRISON, 2004)
• WAYS FORWARD – WHAT ARE WE TO DO?
  • THROW OUR HANDS IN DESPAIR & SIMPLY USE AVAILABLE TOOLS?
  • STILL HOPE FOR SOME VIABLE SOLUTIONS?
A New Approach: Particulars & Aims
• Objectivity & Reliability of F-pattern Estimation
  • Compatibility with the observed spectrum
  • A “Smart” Measure of Compatibility
• We use a related representation: The Cepstrum
  • Efficient approximation to the exact spectrum
  • More readily available from speech signal
  • Contains vocal-tract resonance information
  • Robustness in speech and speaker recognition
• We propose a Cepstral Analysis-by-Synthesis Method
  • To generate Candidate Cepstra
  • To determine most compatible candidate w.r.t. observed cepstrum
Pathway to Speech Spectrum
[Diagram: Source-Filter MODEL; Linear Prediction (LP); ALL-POLE FILTER; FILTER ORDER M]
N.B. “OPTIMUM” M UNRESOLVED ISSUE!
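The LP step on this slide can be sketched as follows. This is a minimal sketch, assuming the autocorrelation method with the Levinson-Durbin recursion; the slides do not say which LP variant was used. The function name `lp_coefficients` and the Hamming taper are illustrative choices, not taken from the talk.

```python
import numpy as np

def lp_coefficients(frame, order):
    """Autocorrelation-method LP via the Levinson-Durbin recursion.

    Returns (a, err), where a = [1, a_1, ..., a_M] defines the all-pole
    filter 1 / A(z) with A(z) = 1 + sum_k a_k z^-k, and err is the final
    prediction-error energy. (Hypothetical helper, for illustration only.)
    """
    x = np.asarray(frame, dtype=float)
    x = x * np.hamming(len(x))  # assumed taper; any analysis window could be used
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

The filter order M is passed in as `order`; the sketch does nothing to choose it, which is exactly the unresolved issue the slide flags.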
Pathways to Formant Patterns: Prominences of Spectral Shapes
[Figure panels: EXACT RAW; EXACT LP]
Pathways to Formant Patterns (cont’d): LP-derived Cepstrum (order M)
• A Vocal-Tract Parameter
• A Fourier-Series Model of EXACT LP-Spectrum
• M÷2 POLES {BROAD & NARROW BANDWIDTHS}
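The relationship between LP coefficients, poles and the LP-derived cepstrum can be illustrated with the standard recursion and the equivalent pole-sum formula. This is a sketch under the usual convention A(z) = 1 + Σ a_k z^-k; the function names, and the formula converting pole radius to bandwidth, are conventional textbook choices rather than details given in the slides.

```python
import numpy as np

def lp_to_cepstrum(a, n_ceps):
    """Cepstrum c_1..c_n of 1/A(z), with a = [1, a_1, ..., a_M] (gain term c_0 omitted)."""
    a = np.asarray(a, dtype=float)
    M = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = sum(k * c[k] * a[n - k] for k in range(max(1, n - M), n))
        c[n] = -(a[n] if n <= M else 0.0) - acc / n
    return c[1:]

def poles_to_cepstrum(poles, n_ceps):
    """Same cepstrum from the full pole set: c_n = (1/n) * sum_k z_k**n."""
    z = np.asarray(poles, dtype=complex)
    n = np.arange(1, n_ceps + 1)
    return np.real(np.sum(z[:, None] ** n, axis=0)) / n

def poles_to_formants(poles, fs):
    """Frequency and bandwidth (Hz) of each pole with positive imaginary part."""
    z = np.asarray(poles, dtype=complex)
    z = z[z.imag > 0]
    freqs = np.angle(z) * fs / (2 * np.pi)
    bws = -np.log(np.abs(z)) * fs / np.pi
    order = np.argsort(freqs)
    return freqs[order], bws[order]
```

Because the cepstrum is a sum over poles, each conjugate pole pair (one candidate formant, broad or narrow) contributes its own additive term, which is what makes pole subsets convertible into candidate cepstra.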
The Cepstral Distance (Euclidean): Un-Weighted versus Index-Weighted
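The two distance variants named on this slide can be sketched as below. The slide gives no formula, so "index-weighted" is assumed here to mean that each coefficient difference is multiplied by its cepstral index n before the Euclidean norm is taken; the function name is hypothetical.

```python
import numpy as np

def cepstral_distance(c_obs, c_cand, index_weighted=False):
    """Euclidean distance between cepstra c_1..c_N.

    With index_weighted=True, the n-th difference is scaled by n
    (an assumed reading of "index-weighted").
    """
    diff = np.asarray(c_obs, dtype=float) - np.asarray(c_cand, dtype=float)
    if index_weighted:
        diff = diff * np.arange(1, len(diff) + 1)
    return float(np.sqrt(np.sum(diff ** 2)))
```

Under this reading, the weighted form gives higher-index coefficients more influence, so the same pair of cepstra can rank differently under the two measures.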
Cepstral Analysis-by-Synthesis (CAbS)
[Flow diagram boxes: ACOUSTIC SIGNAL FRAMES; LP CEPSTRUM (order M = 12); OBSERVED CEPSTRUM; LP CEPSTRUM TO POLES (M÷2 = 6); GENERATE ALL UNIQUE QUARTETS, BINOMIAL (6,4); CEPSTRUM CONVERSION; CANDIDATE CEPSTRA; CEPSTRAL DISTANCE]
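One way to read the CAbS diagram is: take the 6 pole pairs of an order-12 LP model, form all 15 unique quartets, convert each quartet to a candidate cepstrum, and keep the quartet closest to the observed cepstrum. The sketch below follows that reading. The function names, the use of the unweighted distance by default, and the choice of M cepstral coefficients are assumptions made for illustration, not the authors' implementation.

```python
from itertools import combinations
import numpy as np

def pole_cepstrum(upper_poles, n_ceps):
    """Cepstrum of conjugate pole pairs, given one pole per pair:
    c_n = (2/n) * sum_k |z_k|**n * cos(n * angle(z_k))."""
    z = np.asarray(upper_poles, dtype=complex)
    n = np.arange(1, n_ceps + 1)
    return 2.0 * np.real(np.sum(z[:, None] ** n, axis=0)) / n

def cabs_select(a, fs, subset_size=4, index_weighted=False):
    """Pick the pole subset whose cepstrum best matches the observed LP cepstrum.

    a = [1, a_1, ..., a_M] are LP coefficients. Returns (freqs_hz, bws_hz, dist).
    """
    a = np.asarray(a, dtype=float)
    n_ceps = len(a) - 1                     # assumed: as many coefficients as LP order
    n = np.arange(1, n_ceps + 1)
    roots = np.roots(a)
    observed = np.real(np.sum(roots[:, None] ** n, axis=0)) / n
    upper = roots[roots.imag > 1e-9]        # one pole per conjugate pair
    if len(upper) < subset_size:
        raise ValueError("not enough complex pole pairs for the requested subset size")
    weights = n if index_weighted else np.ones(n_ceps)
    best = None
    for idx in combinations(range(len(upper)), subset_size):
        cand = pole_cepstrum(upper[list(idx)], n_ceps)
        dist = float(np.sqrt(np.sum((weights * (observed - cand)) ** 2)))
        if best is None or dist < best[0]:
            best = (dist, idx)
    dist, idx = best
    z = upper[list(idx)]
    freqs = np.angle(z) * fs / (2 * np.pi)
    bws = -np.log(np.abs(z)) * fs / np.pi
    order = np.argsort(freqs)
    return freqs[order], bws[order], dist
```

With M = 12 and all poles complex, the loop visits the 15 quartets counted by the binomial coefficient (6,4). Since the observed cepstrum is the sum of all pole-pair terms, the distance for a quartet is the norm of the two omitted pairs' contribution, so pairs with broad bandwidths tend to be the ones dropped.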
Some Vowel Data to Test the Approach
[Panels: MICROPHONE; TELEPHONE]
Harrison (2004)
• 2 adult-male, native speakers
• 3 contemporaneous tokens
THIS STUDY (1 speaker)
Implementation Methodology
[Grid of analysis settings: Candidate Values of M (10, 11, ... 18) × Candidate Values of Upper Bound (2.5, 2.6, ... 4.5 kHz)]
[Figure: POLE-GRAM (LP-ORDER M = 10) + CAbS RESULTS SUPERIMPOSED (Upper Bound = 3.5 kHz)]
CALCULATE AVERAGE INTRA-VOWEL DISPERSION: CLUSTERING QUALITY (CQ)
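The slide names the clustering-quality figure but does not define it. The sketch below is one plausible reading only: for each vowel, take the mean Euclidean distance (in Hz) of its tokens' F-patterns from the vowel centroid, then average over vowels. The function name, the input layout and the dispersion formula are all assumptions.

```python
import numpy as np

def clustering_quality(tokens_by_vowel):
    """Average intra-vowel dispersion in Hz (assumed definition).

    tokens_by_vowel maps a vowel label to an array-like of shape
    (n_tokens, n_formants) holding estimated formant frequencies.
    """
    per_vowel = []
    for tokens in tokens_by_vowel.values():
        t = np.asarray(tokens, dtype=float)
        centroid = t.mean(axis=0)
        # Mean distance of each token from its vowel's centroid
        per_vowel.append(np.linalg.norm(t - centroid, axis=1).mean())
    return float(np.mean(per_vowel))
```

Under this reading, CQ would be computed once per candidate pair of LP order and upper bound, with a smaller value taken to indicate tighter vowel clusters.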
Results 1: Optimisation of Analysis Parameters (LP order & Spectral Range)
MICROPHONE: Opt. M = 13, [3300 – 4200] Hz, CQ = 27.05 Hz
TELEPHONE: Opt. M = 10, [3300 – 3700] Hz, CQ = 27.80 Hz
Results 2a (MICROPHONE DATA): Min. Distance vs Min. Mean Bandwidth
Results 2b (TELEPHONE DATA): Min. Distance vs Min. Mean Bandwidth
Concluding Summary
• Introduced a New Approach to F-pattern Estimation
  • Two major schools of parameterisation – the formant & the cepstrum
  • F-patterns estimated by requiring compatibility w/ observed cepstra
  • Compatibility is assessed using a “smart” measure
    • sensitivity to spectral peaks that provide the best spectral match
    • flexibility to select any spectral range
• Current Results are Promising
  • Compatibility Measure exhibits Robustness
  • Estimated F-patterns are Consistent with Phonetic Expectations
  • LP-Order and Spectral Range are Concurrently Optimised
• Proposed Approach – A Serious First Step towards
  • Objectivity
  • Reliability
  • A Very Useful Tool
Looking towards the Future
• More Challenging Data
  • Beyond Steady-State Vowels
  • Wide Range of Speakers
  • Variety of Voice Qualities
  • Differing Recording Conditions
• Speculations
  • The All-Pole LP-Model is unlikely to be sufficient for all cases
  • The Approach of Cepstral Compatibility holds the potential of a long survival!