Computational Audition at AFRL/HE: Past, Present, and Future Dr. Timothy R. Anderson Human Effectiveness Directorate Air Force Research Laboratory
Biologically Based Signal Processing • Research, development, and applications of: • Biologically based algorithms • Perceptually relevant features • Human-centered metrics and models • to improve robustness of speech processing systems [Diagram labels: AWACS, Sensor-decision maker-shooter, Future JAOC, Command & Control, Speech Technologies, JAOC, Chem-bio Defense, Environment, Combat Plans]
Why Is This Area Important? • Present signal processing systems (e.g., speech and speaker recognition, speech coding) are not robust in adverse military environments. • Biological principles offer the potential for improved performance in military environments.
Biologically Based Signal Processing • Approach • Develop psychoacoustic testing procedures • Characterize key features and processes • Develop human-centered models and metrics • Implement computationally efficient algorithms • Provide support to operational test and warfighting exercises to evaluate system utility • Technical Challenges • Identification and modeling of features and processes used by biological systems • Incorporation of those key features and processes into computationally efficient algorithms and structures
Research Areas • Cockpit Speech Recognition • Robust Speech Recognition • Monaural Speech Recognition • Binaural Speech Recognition • Auditory Model Front-ends • Speaker Recognition/Verification • Biologically Based Speaker ID • Channel Robustness • Speaker Recognizability Test
Phoneme Classification • Kohonen Self-Organizing Feature Map • 16 × 16 • 10 Speaker Database (TIMIT) • 10 sentences/speaker • Leave-one-out method (per speaker) • Features calculated with • 16 ms window • 5 ms frame step
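The framing and map parameters on this slide can be sketched in a few lines of Python. This is only an illustration and not the original system. The 16 kHz sample rate is the standard TIMIT rate. The helper names, the 3-dimensional toy feature vector, the learning rate, and the neighbourhood radius are assumptions made for the example.

```python
import math
import random

def frame_signal(samples, sample_rate=16000, window_ms=16.0, step_ms=5.0):
    """Split a sample sequence into overlapping frames (no padding)."""
    window = int(round(sample_rate * window_ms / 1000.0))  # 256 samples at 16 kHz
    step = int(round(sample_rate * step_ms / 1000.0))      # 80 samples at 16 kHz
    frames = []
    start = 0
    while start + window <= len(samples):
        frames.append(samples[start:start + window])
        start += step
    return frames

def make_som(rows=16, cols=16, dim=3, seed=0):
    """Create a rows x cols map with random weight vectors (dim is hypothetical)."""
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(dim)] for _ in range(cols)]
            for _ in range(rows)]

def best_matching_unit(som, vector):
    """Return (row, col) of the node whose weights are closest to vector."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(som):
        for c, weights in enumerate(row):
            dist = sum((w - v) ** 2 for w, v in zip(weights, vector))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best

def som_update(som, vector, learning_rate=0.1, radius=2.0):
    """One Kohonen step: pull the winner and its grid neighbours toward vector."""
    win_r, win_c = best_matching_unit(som, vector)
    for r, row in enumerate(som):
        for c, weights in enumerate(row):
            grid_dist_sq = (r - win_r) ** 2 + (c - win_c) ** 2
            influence = math.exp(-grid_dist_sq / (2.0 * radius ** 2))
            for i, v in enumerate(vector):
                weights[i] += learning_rate * influence * (v - weights[i])
    return win_r, win_c
```

In practice each frame would first be converted to a feature vector (traditional or auditory-model based) before being presented to the map. That step is omitted here.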
TRADITIONAL VS. AUDITORY MONAURAL
Binaural Speech Recognition • Past • Present • Future
Binaural Speech Recognition • Stereausis • Cocktail Party Processor • BAIM • BINAP
Experiment Setup [Diagram: positions of the sound source and the noise source, each marked X]
Binaural Speech Recognition • Task • Speech phoneme recognition • Low to high SNR • Results • Binaural auditory model provides better representation than traditional techniques • 7-12 dB binaural advantage
Binaural Speech Recognition • Past • Present • No Current Work • Future
Binaural Speech Recognition • Past • Present • Future • Implement binaural ASR system • Investigate further binaural fusion mechanisms • Meeting room data • Implement binaural system using AIM chips
Auditory Model Front Ends • Past • Present • Future
Auditory Model Front Ends • Tanner Research “Analog Speech Recognition” • Implementation of AIM • 56-channel analog filter bank • Single SBUS board • 1.5 × real-time
Auditory Model Front Ends • AFIT • Designed Digital Implementation • Middle ear, BMM, adaptive thresholding • 32 channels per chip • 300 Hz – 7 kHz • 44.1 kHz sampling rate • 2 chips provide 64 channels in real-time
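To give a feel for how 64 channels might cover 300 Hz to 7 kHz, here is a minimal sketch that spaces channel center frequencies evenly on the ERB-rate scale of Glasberg and Moore. The slides do not state how the AFIT chip spaces its channels, so the ERB spacing and all function names are assumptions for illustration only.

```python
import math

def hz_to_erb_rate(freq_hz):
    """Glasberg-Moore ERB-rate scale (number of ERBs below freq_hz)."""
    return 21.4 * math.log10(4.37 * freq_hz / 1000.0 + 1.0)

def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) * 1000.0 / 4.37

def center_frequencies(num_channels=64, low_hz=300.0, high_hz=7000.0):
    """Center frequencies spaced evenly in ERB-rate (assumed spacing)."""
    lo, hi = hz_to_erb_rate(low_hz), hz_to_erb_rate(high_hz)
    step = (hi - lo) / (num_channels - 1)
    return [erb_rate_to_hz(lo + i * step) for i in range(num_channels)]

# Two hypothetical 32-channel chips could each take half of the list.
channels = center_frequencies()
chip_a, chip_b = channels[:32], channels[32:]
```

The 7 kHz upper edge lies well below the 22.05 kHz Nyquist limit implied by the 44.1 kHz sampling rate.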
Auditory Model Front Ends • Past • Present • Single board system designed and prototyped - USB • Current chip design undergoing debug • Second fabrication run this fall • Future
Auditory Model Front Ends • Past • Present • Future • Debug and verify chip fabrication • Debug PC based real-time auditory model front end • Implement complete end-to-end auditory ASR • Investigate feedback mechanisms in auditory model for ASR
Biologically Based SID • Past • Present • Future
Biologically Based SID • Auditory Models Investigated • Payton’s Auditory Model (PAM) • Auditory Image Model (AIM) • VQ Codebook used to model speaker • 37 Speakers from TIMIT (dr1,2 12F 25M) • MFCC 94% • PAM 67% • AIM 91%
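The VQ codebook approach to modeling a speaker can be sketched as follows: train one small codebook per speaker with k-means, then assign a test utterance to the speaker whose codebook gives the lowest average distortion. This is a generic sketch and not the experiment reported on the slide. The codebook size, iteration count, and function names are hypothetical, and the feature vectors could come from MFCCs, PAM, or AIM.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_codebook(vectors, size=4, iterations=10, seed=0):
    """Plain k-means: returns `size` codewords for one speaker."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iterations):
        clusters = [[] for _ in codebook]
        for v in vectors:
            nearest = min(range(size), key=lambda k: sq_dist(codebook[k], v))
            clusters[nearest].append(v)
        for k, members in enumerate(clusters):
            if members:  # keep the old codeword if a cluster is empty
                dim = len(members[0])
                codebook[k] = [sum(m[i] for m in members) / len(members)
                               for i in range(dim)]
    return codebook

def average_distortion(codebook, vectors):
    """Mean distance from each vector to its nearest codeword."""
    return sum(min(sq_dist(c, v) for c in codebook) for v in vectors) / len(vectors)

def identify(codebooks, vectors):
    """Pick the speaker whose codebook quantizes the utterance best."""
    return min(codebooks, key=lambda name: average_distortion(codebooks[name], vectors))
```

Here `codebooks` is a dictionary mapping a speaker label to that speaker's trained codebook.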
Biologically Based SID • Past • Present • Future
Biologically Based SID • Using perceptual features • Formants, formant bandwidths, and pitch • Voiced Frames • Using GMM classifier • Conducting experiments on larger databases • Switchboard
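A GMM classifier over frame-level features (for example formants, formant bandwidths, and pitch from voiced frames) scores an utterance by its average log-likelihood under each speaker's model. The sketch below shows only the scoring side with diagonal covariances. The model parameters would normally be trained with EM, which is not shown, and the data layout and function names are assumptions for this example.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(math.log(2.0 * math.pi * s) + (xi - m) ** 2 / s
                      for xi, m, s in zip(x, mean, var))

def gmm_log_likelihood(x, gmm):
    """gmm is a list of (weight, mean, var) tuples; uses log-sum-exp for stability."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def score_utterance(frames, gmm):
    """Average per-frame log-likelihood of an utterance under one speaker model."""
    return sum(gmm_log_likelihood(f, gmm) for f in frames) / len(frames)

def identify_speaker(frames, speaker_gmms):
    """Return the label of the speaker model with the highest score."""
    return max(speaker_gmms, key=lambda name: score_utterance(frames, speaker_gmms[name]))
```

Working in the log domain avoids numerical underflow when a frame lies far from every mixture component.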
Biologically Based SID [Results figures on four slides; legend entries: MFCCs, no Deltas, no CMS; MFCCs, no CMS; F0 Base]
Biologically Based SID • Performance isn’t the best, but this feature set… • Uses only 9 features versus 19–38 for MFCCs • Hasn’t been as heavily researched as MFCCs
Biologically Based SID • Determine reasons for performance differences between various databases • Channel & score normalizations • Pitch-synchronous features • Closed-phase analysis • Glottal model features
Biologically Based SID • Past • Present • Future
Biologically Based SID • Investigate other auditory based features • Vocal agitation • Formants, formant bandwidths, and pitch calculated from the auditory model • Auditory model features • Conduct experiments on other databases • Broadcast news • Military training exercises
Speaker Recognizability Test • Past • Present • Future
Speaker Recognizability Test • Dynastat “The Development of a Method for Evaluating and Predicting Speaker Recognizability in Voice Communication Systems” • Determined perceptually relevant features • Perceptual voice traits (PVT) • 21 traits currently identified • Developed methodology to measure these traits • Human listeners • Developed measure to determine loss due to channel • Diagnostic Speaker Recognizability Test (DSRT)
Speaker Recognizability Test • Past • Present • Future
Speaker Recognizability Test • Use perceptual voice traits to identify groups of similar and distinctive speakers • Determine if current SID systems have difficulty with these similar speakers • Implementing in-house • Web-based listening test for • PVT rating • DSRT
Speaker Recognizability Test • Past • Present • Future
Speaker Recognizability Test • Obtain PVT ratings for larger database • Switchboard • Determine acoustic correlates of perceptually relevant features • Use as features for speaker recognition • Utilize DSRT for communication system testing
Summary • Computational Audition offers potential for improved performance in adverse military environments • Much research remains to be done • Fidelity of model • Model feedback pathways • Computational issues are no longer the limiting factor in performing meaningful experiments