Applications EARs, 22-23 August 2011 Stanley Wenndt, PhD AFRL/RIGC Rome Research Site
Familiar Speaker Recognition • Two motivations • Finish MS Neuroscience degree • Needed 700-level NEU course, Independent Study only option • Speech Power versus Speech Intelligibility • Gerber 1974 • What about speaker identification (SID)?
Audio Data • In-House Database • Longitudinal study (20 sessions over 3 years) • 65 subjects • 25 (20 males, 5 females) connected to the Audio Group • Read, Digits, Short Sentences, Conversations • 10 Short Sentences • Two intonations • Let’s go skiing today. • Visual and audible cue • Natural elicitation • Shortfalls (hindsight) • Unequal Sentences • Different degrees of familiarity between listeners/speakers
Listening Experiments
• Session 1: Pure Tone Test
• Session 2: Familiarization with Test Set-up
• Session 3: Clean
• Session 4: 0-1 kHz, -20 dB, Speech shaped, add WGN
• Session 5: 1-2 kHz, -20 dB, Speech shaped, add WGN
• Session 6: 2-3 kHz, -20 dB, Speech shaped, add WGN
• Session 7: 3-4 kHz, -20 dB, Speech shaped, add WGN
• Session 8: 0-4 kHz, 0 dB, Speech shaped, add WGN
• Session 9: Clean
• Session 10: Whispered
• Session 11: Time-reversed
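The noisy and time-reversed conditions in the session list can be sketched roughly as follows. This is only an illustration under stated assumptions: the slides do not say how the noise was generated, so the sketch reads the "-20 dB" and "0 dB" figures as signal-to-noise ratios, approximates band-limited noise as a sum of random-phase sinusoids, and does not apply any speech shaping. All function names are hypothetical, and the "speech" is a stand-in tone, not real audio.

```python
import math
import random


def band_limited_noise(n_samples, sample_rate, f_lo, f_hi, n_tones=200, seed=0):
    """Approximate noise confined to [f_lo, f_hi] Hz as a sum of random-phase tones.

    Assumption: this is a flat-band stand-in, not the speech-shaped noise
    named on the slide.
    """
    rng = random.Random(seed)
    tones = [(rng.uniform(f_lo, f_hi), rng.uniform(0.0, 2.0 * math.pi))
             for _ in range(n_tones)]
    return [
        sum(math.cos(2.0 * math.pi * f * n / sample_rate + phase)
            for f, phase in tones)
        for n in range(n_samples)
    ]


def power(x):
    """Mean squared value of a signal."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that speech power / noise power equals snr_db, then add it."""
    gain = math.sqrt(power(speech) / (power(noise) * 10.0 ** (snr_db / 10.0)))
    mixed = [s + gain * v for s, v in zip(speech, noise)]
    return mixed, gain


def time_reverse(x):
    """Play the samples backwards (the Session 11 condition)."""
    return x[::-1]


# Stand-in signal: a 300 Hz tone at 8 kHz sampling, NOT real speech.
sample_rate = 8000
speech = [math.sin(2.0 * math.pi * 300.0 * n / sample_rate) for n in range(800)]

# Session 6 style condition under the assumptions above: 2-3 kHz noise at -20 dB SNR.
noise = band_limited_noise(len(speech), sample_rate, 2000.0, 3000.0, n_tones=50)
mixed, gain = mix_at_snr(speech, noise, snr_db=-20.0)
reversed_speech = time_reverse(speech)
```

If "-20 dB" instead denotes an absolute noise level, only the gain computation in `mix_at_snr` would need to change.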
Listening Experiments • Results reported in 2 groups • Normal Hearing • Hearing Deficit • Hard to draw conclusions from 2nd group • Don’t know severity of hearing loss • Experiments are a rough 1st pass • 10 SID Listening Sessions • Analyze data • Learn from mistakes
Current Research • Data Analysis • Difficult to compare between sessions • Is the performance statistically different? • Between group, within group? • Current data analysis is focused on individual sentences • Let’s go skiing today. • Same phonetic content • Same noise (or lack thereof) • Same intonation • Same session • Main variable is the speaker • Formants, shimmer, jitter, energy, etc.
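For the within-group question (the same listeners hearing the same items in two sessions), one possible way to ask whether performance is statistically different is an exact McNemar test on paired correct/incorrect outcomes. This is a sketch of a standard test, not the analysis used in this work; the function names are hypothetical and the inputs are assumed to be per-trial booleans aligned between the two sessions.

```python
from math import comb


def discordant_counts(session_a, session_b):
    """Count trials answered correctly in one session but not the other.

    Both arguments are lists of booleans (True = correct identification),
    aligned so that index i refers to the same listener/speaker/sentence trial.
    """
    only_a = sum(1 for a_ok, b_ok in zip(session_a, session_b) if a_ok and not b_ok)
    only_b = sum(1 for a_ok, b_ok in zip(session_a, session_b) if b_ok and not a_ok)
    return only_a, only_b


def mcnemar_exact(only_a_correct, only_b_correct):
    """Two-sided exact McNemar p-value from the two discordant counts.

    Under the null hypothesis, each discordant trial is equally likely to
    favor either session, so the smaller count follows Binomial(n, 0.5).
    """
    n = only_a_correct + only_b_correct
    if n == 0:
        return 1.0  # no discordant trials: no evidence of a difference
    k = min(only_a_correct, only_b_correct)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def compare_sessions(session_a, session_b):
    """Convenience wrapper: p-value for two aligned lists of trial outcomes."""
    return mcnemar_exact(*discordant_counts(session_a, session_b))
```

The between-group comparison (normal hearing versus hearing deficit) involves different listeners, so it would need an unpaired test instead; that case is not covered here.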
“Male 8” • Most easily recognized voice • Except for Session 6 • 2-3 kHz noise • Currently, we build models the same • Good or bad? • Can we figure out what is unique or not unique about an individual’s voice?
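One starting point for the uniqueness question is to compute per-speaker values of the measures named under Current Research. The sketch below uses the common "local" definitions of jitter and shimmer: the mean absolute difference between consecutive cycles divided by the mean value. It assumes the glottal cycle periods and per-cycle peak amplitudes have already been extracted by some pitch tracker; that step, and the function names, are not from the slides.

```python
def local_perturbation(values):
    """Mean absolute difference between consecutive values, relative to their mean."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


def local_jitter(periods_s):
    """Cycle-to-cycle variation in pitch period (input: periods in seconds)."""
    return local_perturbation(periods_s)


def local_shimmer(peak_amplitudes):
    """Cycle-to-cycle variation in peak amplitude (input: one peak per cycle)."""
    return local_perturbation(peak_amplitudes)
```

Computed on the same sentence, session, and intonation for each speaker, these values could be compared across speakers, with the speaker as the main variable.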