Computational Audition at AFRL/HE: Past, Present, and Future

Dr. Timothy R. Anderson • Human Effectiveness Directorate • Air Force Research Laboratory

Biologically Based Signal Processing • Research, development, and applications of: • Biologically based algorithms • Perceptually relevant features • Human-centered metrics and models • to improve the robustness of speech processing systems [Figure labels: AWACS, Sensor-decision maker-shooter, Future JAOC, Command & Control, Speech Technologies, JAOC, Chem-bio Defense Environment, Combat Plans]

Why Is This Area Important? • Present signal processing systems (e.g., speech and speaker recognition, speech coding) are not robust in adverse military environments. • Biological principles offer the potential to improve performance in military environments.

Biologically Based Signal Processing • Approach • Develop psychoacoustic testing procedures • Characterize key features and processes • Develop human-centered models and metrics • Implement computationally efficient algorithms • Provide support to operational tests and warfighting exercises to evaluate system utility • Technical Challenges • Identification and modeling of features and processes used by biological systems • Incorporation of those key features and processes into computationally efficient algorithms and structures

Research Areas • Cockpit Speech Recognition • Robust Speech Recognition • Monaural Speech Recognition • Binaural Speech Recognition • Auditory Model Front-ends • Speaker Recognition/Verification • Biologically Based Speaker ID • Channel Robustness • Speaker Recognizability Test

Phoneme Classification • Kohonen Self-Organizing Feature Map • 16 × 16 map • 10-speaker database (TIMIT) • 10 sentences/speaker • Leave-one-out method (per speaker) • Features calculated with • 16 ms window • 5 ms frame step
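
The framing and map setup on this slide can be sketched as follows. This is a minimal illustration, not the code used in the original experiments: the function and class names, the random initialization, the Gaussian neighborhood, and the learning-rate and radius values are all assumptions. Only the 16 × 16 map size, the 16 ms window, and the 5 ms step come from the slide.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def frame_signal(samples, sample_rate, window_ms=16.0, step_ms=5.0):
    """Split a signal into overlapping frames (16 ms window, 5 ms step)."""
    win = int(round(sample_rate * window_ms / 1000.0))
    step = int(round(sample_rate * step_ms / 1000.0))
    return [samples[i:i + win] for i in range(0, len(samples) - win + 1, step)]


class SOM:
    """Tiny Kohonen self-organizing map on a rows x cols grid (hypothetical helper)."""

    def __init__(self, rows=16, cols=16, dim=3, seed=0):
        rng = random.Random(seed)
        self.rows, self.cols = rows, cols
        # Assumption: weights start as uniform random values in [0, 1).
        self.weights = [[[rng.random() for _ in range(dim)]
                         for _ in range(cols)] for _ in range(rows)]

    def best_matching_unit(self, x):
        """Return (row, col) of the node whose weight vector is closest to x."""
        best, best_d = (0, 0), float("inf")
        for r in range(self.rows):
            for c in range(self.cols):
                d = sq_dist(self.weights[r][c], x)
                if d < best_d:
                    best, best_d = (r, c), d
        return best

    def update(self, x, lr, radius):
        """Pull the winning node and its grid neighbors toward x."""
        br, bc = self.best_matching_unit(x)
        for r in range(self.rows):
            for c in range(self.cols):
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                # Assumption: Gaussian neighborhood on the grid.
                h = math.exp(-grid_d2 / (2.0 * radius ** 2))
                w = self.weights[r][c]
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
        return br, bc
```

At TIMIT's 16 kHz sampling rate, `frame_signal` yields 256-sample windows advanced by 80 samples. After training, each map node would still need a phoneme label (for example, by majority vote over the training frames it wins), which this sketch leaves out.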

TRADITIONAL VS. AUDITORY MONAURAL

Binaural Speech Recognition • Past • Present • Future

Binaural Speech Recognition • Stereausis • Cocktail Party Processor • BAIM • BINAP

EXPERIMENT SETUP [Figure: positions of the sound source (X) and the noise source (X)]

MONAURAL VS. BINAURAL COCKTAIL PARTY PROCESSOR

MONAURAL VS. BINAURAL AUDITORY IMAGE MODEL

BINAURAL

MONAURAL

BAIM VS. CPP-AIM

COINCIDENCE

MONAURAL, BINAURAL AND TRADITIONAL

Binaural Speech Recognition • Task • Speech phoneme recognition • Low to high SNR • Results • Binaural auditory model provides a better representation than traditional techniques • 7-12 dB binaural advantage

Binaural Speech Recognition • Past • Present • No Current Work • Future

Binaural Speech Recognition • Past • Present • Future • Implement binaural ASR system • Investigate further binaural fusion mechanisms • Meeting room data • Implement binaural system using AIM chips

Auditory Model Front Ends • Past • Present • Future

Auditory Model Front Ends • Tanner Research “Analog Speech Recognition” • Implementation of AIM • 56-channel analog filter bank • Single SBUS board • 1.5× real-time

Auditory Model Front Ends • AFIT • Designed digital implementation • Middle ear, BMM, adaptive thresholding • 32 channels per chip • 300 Hz – 7 kHz • 44.1 kHz sampling rate • 2 chips provide 64 channels in real time
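
To give a feel for what a 64-channel filter bank over 300 Hz to 7 kHz might look like, the sketch below computes channel center frequencies. The slide does not say how the AFIT design spaces its channels. Equal spacing on the ERB-rate scale (the Glasberg and Moore formula) is an assumption made here because it is a common choice in auditory models, and the function names are hypothetical.

```python
import math


def hz_to_erb_rate(f_hz):
    """ERB-rate (number of ERBs below f_hz), Glasberg and Moore formula."""
    return 21.4 * math.log10(4.37e-3 * f_hz + 1.0)


def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) / 4.37e-3


def center_frequencies(n_channels=64, f_low=300.0, f_high=7000.0):
    """Center frequencies equally spaced on the ERB-rate scale.

    Assumption: ERB-rate spacing. The slide gives only the channel
    count and the 300 Hz to 7 kHz range.
    """
    lo, hi = hz_to_erb_rate(f_low), hz_to_erb_rate(f_high)
    step = (hi - lo) / (n_channels - 1)
    return [erb_rate_to_hz(lo + i * step) for i in range(n_channels)]
```

Calling `center_frequencies()` returns 64 increasing values from 300 Hz to 7 kHz, packed more densely at low frequencies. The 44.1 kHz sampling rate puts the Nyquist frequency at 22.05 kHz, well above the top channel.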

Auditory Model Front Ends • Past • Present • Single board system designed and prototyped - USB • Current chip design undergoing debug • Second fabrication run this fall • Future

Auditory Model Front Ends • Past • Present • Future • Debug and verify chip fabrication • Debug PC based real-time auditory model front end • Implement complete end-to-end auditory ASR • Investigate feedback mechanisms in auditory model for ASR

Biologically Based SID • Past • Present • Future

Biologically Based SID • Auditory Models Investigated • Payton’s Auditory Model (PAM) • Auditory Image Model (AIM) • VQ codebook used to model each speaker • 37 speakers from TIMIT (dr1 and dr2; 12 F, 25 M) • Identification accuracy • MFCC 94% • PAM 67% • AIM 91%
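
Speaker identification with one VQ codebook per speaker can be sketched as below. This is an illustration only: the slide does not give the codebook size, the training algorithm, or the distortion measure, so plain k-means with squared Euclidean distortion is assumed, and all function names are hypothetical. The feature vectors could come from any front end (MFCC, PAM, or AIM).

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(vectors, size, iters=20, seed=0):
    """Build a VQ codebook for one speaker with plain k-means (assumed)."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iters):
        clusters = [[] for _ in codebook]
        # Assign every training vector to its nearest codeword.
        for v in vectors:
            idx = min(range(size), key=lambda i: sq_dist(v, codebook[i]))
            clusters[idx].append(v)
        # Move each codeword to the mean of its cluster (skip empty clusters).
        for i, members in enumerate(clusters):
            if members:
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def avg_distortion(vectors, codebook):
    """Mean distance from each test vector to its nearest codeword."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in vectors) / len(vectors)


def identify_speaker_vq(vectors, codebooks):
    """Return the speaker whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda name: avg_distortion(vectors, codebooks[name]))
```

Here `codebooks` is a dictionary mapping a speaker label to that speaker's trained codebook. A test utterance is scored against every codebook, and the closest one wins.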

Biologically Based SID • Using perceptual features • Formants, formant bandwidths, and pitch • Voiced Frames • Using GMM classifier • Conducting experiments on larger databases • Switchboard
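
Scoring voiced frames against per-speaker GMMs can be sketched as follows. This is a hedged illustration: the slide names only the features, the voiced-frame restriction, and the GMM classifier. Diagonal covariances, average log-likelihood scoring, the `(weight, mean, variance)` model layout, and the function names are assumptions, and model training (typically EM) is not shown.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a GMM given as [(weight, mean, var), ...]."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)
    # Log-sum-exp keeps the mixture sum numerically stable.
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def score_utterance(frames, voiced, gmm):
    """Average log-likelihood over voiced frames only."""
    used = [f for f, is_voiced in zip(frames, voiced) if is_voiced]
    return sum(gmm_log_likelihood(f, gmm) for f in used) / len(used)


def identify_speaker_gmm(frames, voiced, models):
    """Return the speaker whose GMM gives the highest average score."""
    return max(models, key=lambda name: score_utterance(frames, voiced, models[name]))
```

Each frame would hold the perceptual features for one analysis window (formants, formant bandwidths, and pitch), and `voiced` is a parallel list of booleans from a voicing detector.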

Biologically Based SID [Figure: results plot; legend entries: MFCCs, no deltas, no CMS; MFCCs, no CMS; F0 Base]

Biologically Based SID • Performance isn’t the best, but this feature set… • Uses only 9 features versus 19–38 for MFCCs • Hasn’t been as heavily researched as MFCCs

Biologically Based SID • Determine reasons for performance differences between various databases • Channel & score normalizations • Pitch-synchronous features • Closed-phase analysis • Glottal model features

Biologically Based SID

Biologically Based SID • Investigate other auditory based features • Vocal agitation • Formants, formant bandwidths, and pitch calculated from the auditory model • Auditory model features • Conduct experiments on other databases • Broadcast news • Military training exercises

Speaker Recognizability Test • Past • Present • Future

Speaker Recognizability Test • Dynastat: “The Development of a Method for Evaluating and Predicting Speaker Recognizability in Voice Communication Systems” • Determined perceptually relevant features • Perceptual voice traits (PVT) • 21 traits currently identified • Developed methodology to measure these traits • Human listeners • Developed measure to determine loss due to channel • Diagnostic Speaker Recognizability Test (DSRT)

Speaker Recognizability Test • Use perceptual voice traits to identify groups of similar and distinctive speakers • Determine if current SID systems have difficulty with these similar speakers • Implementing in-house • Web-based listening test for • PVT rating • DSRT

Speaker Recognizability Test • Obtain PVT ratings for larger database • Switchboard • Determine acoustic correlates of perceptually relevant features • Use as features for speaker recognition • Utilize DSRT for communication system testing

Summary • Computational audition offers potential for improved performance in adverse military environments • Much research remains to be done • Fidelity of model • Model feedback pathways • Computation is no longer the limiting factor in performing meaningful experiments

Questions?
