
Computational Audition at AFRL/HE: Past, Present, and Future


Presentation Transcript


  1. Computational Audition at AFRL/HE: Past, Present, and Future Dr. Timothy R. Anderson Human Effectiveness Directorate Air Force Research Laboratory

  2. Biologically Based Signal Processing [Diagram: application contexts including AWACS, the sensor/decision maker/shooter chain, future JAOC command & control, combat plans, and chem-bio defense environments] • Research, development, and application of: • Biologically based algorithms • Perceptually relevant features • Human-centered metrics and models • to improve the robustness of speech processing systems

  3. Why Is This Area Important? • Present signal processing systems (i.e. speech and speaker recognition, speech coding, etc.) are not robust in adverse military environments. • Biological principles offer potential to provide improved performance in military environments.

  4. Biologically Based Signal Processing • Approach • Develop psychoacoustic testing procedures • Characterize key features and processes • Develop human-centered models and metrics • Implement computationally efficient algorithms • Provide support to operational tests and warfighting exercises to evaluate system utility • Technical Challenges • Identification and modeling of features and processes used by biological systems • Incorporation of those key features and processes into computationally efficient algorithms and structures

  5. Research Areas • Cockpit Speech Recognition • Robust Speech Recognition • Monaural Speech Recognition • Binaural Speech Recognition • Auditory Model Front-ends • Speaker Recognition/Verification • Biologically Based Speaker ID • Channel Robustness • Speaker Recognizability Test

  6. Phoneme Classification • Kohonen Self-Organizing Feature Map • 16 × 16 • 10-speaker database (TIMIT) • 10 sentences/speaker • Leave-one-out method (per speaker) • Features calculated with • 16 ms window • 5 ms frame step
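A minimal sketch in Python (NumPy only) of the framing described on the slide above (16 ms analysis window, 5 ms frame step) and of training a 16 × 16 Kohonen self-organizing feature map on the resulting feature vectors. The feature extraction, learning-rate schedule, and TIMIT handling are illustrative assumptions, not the original AFRL implementation.

```python
import numpy as np

def frame_signal(x, fs, win_ms=16.0, step_ms=5.0):
    """Split a waveform into overlapping frames (16 ms window, 5 ms step)."""
    win = int(fs * win_ms / 1000)
    step = int(fs * step_ms / 1000)
    n_frames = 1 + max(0, (len(x) - win) // step)
    return np.stack([x[i * step:i * step + win] for i in range(n_frames)])

def train_som(features, rows=16, cols=16, epochs=20, lr0=0.5, sigma0=8.0):
    """Train a rows x cols Kohonen self-organizing feature map."""
    rng = np.random.default_rng(0)
    dim = features.shape[1]
    weights = rng.normal(size=(rows * cols, dim))
    # Grid coordinates of each map unit, used for the neighborhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    n_steps = epochs * len(features)
    t = 0
    for _ in range(epochs):
        for v in rng.permutation(features):
            lr = lr0 * (1.0 - t / n_steps)                # decaying learning rate
            sigma = sigma0 * (1.0 - t / n_steps) + 1e-3   # shrinking neighborhood
            bmu = np.argmin(np.linalg.norm(weights - v, axis=1))  # best-matching unit
            d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))          # Gaussian neighborhood
            weights += lr * h[:, None] * (v - weights)
            t += 1
    return weights.reshape(rows, cols, dim)
```

Each map unit can then be labeled with the majority phoneme class of the training frames it wins, with leave-one-speaker-out evaluation as described on the slide.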

  7. TRADITIONAL VS. AUDITORY MONAURAL

  8. Binaural Speech Recognition • Past • Present • Future

  9. Binaural Speech Recognition • Stereausis • Cocktail Party Processor • BAIM • BINAP
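The binaural models listed above (stereausis, the cocktail party processor, BAIM, BINAP) all build on some form of coincidence detection or cross-correlation between the left-ear and right-ear signals. The sketch below is a generic, hypothetical interaural cross-correlation for a single analysis frame, shown only to illustrate the idea of estimating interaural time difference; it is not an implementation of any of the named models, which apply such processing per auditory filter channel.

```python
import numpy as np

def interaural_cross_correlation(left, right, fs, max_itd_ms=1.0):
    """Cross-correlate left- and right-ear frames over physiological ITD lags."""
    max_lag = int(fs * max_itd_ms / 1000)
    lags = np.arange(-max_lag, max_lag + 1)
    # Normalized (circular) cross-correlation at each candidate lag.
    ccf = np.array([np.sum(left * np.roll(right, lag)) for lag in lags])
    ccf /= np.sqrt(np.sum(left ** 2) * np.sum(right ** 2)) + 1e-12
    best = lags[np.argmax(ccf)]
    return lags / fs, ccf, best / fs  # lag axis (s), correlation, ITD estimate (s)
```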

  10. EXPERIMENT SETUP [Figure: placement of the speech sound source and the noise source relative to the listener]
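The recognition experiments summarized on the following slides mix the speech source with the noise source at controlled signal-to-noise ratios. Below is a minimal sketch, assuming single-channel NumPy waveforms at the same sampling rate, of scaling a noise signal so that the mixture reaches a target SNR; this is generic test-material preparation, not the original experiment code.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so the speech-to-noise power ratio equals snr_db, then mix."""
    noise = noise[:len(speech)]            # assumes noise is at least as long as speech
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
    return speech + scale * noise
```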

  11. MONAURAL VS. BINAURAL COCKTAIL PARTY PROCESSOR

  12. MONAURAL VS. BINAURAL AUDITORY IMAGE MODEL

  13. BINAURAL

  14. MONAURAL

  15. BAIM VS. CPP-AIM

  16. COINCIDENCE

  17. MONAURAL, BINAURAL AND TRADITIONAL

  18. Binaural Speech Recognition: Results • Task: speech phoneme recognition at low to high SNR • Result: the binaural auditory model provides a better representation than traditional techniques • 7–12 dB binaural advantage

  19. Binaural Speech Recognition • Past • Present • No Current Work • Future

  20. Binaural Speech Recognition • Past • Present • Future • Implement binaural ASR system • Investigate further binaural fusion mechanisms • Meeting room data • Implement binaural system using AIM chips

  21. Auditory Model Front Ends • Past • Present • Future

  22. Auditory Model Front Ends • Tanner Research “Analog Speech Recognition” • Implementation of AIM • 56-channel analog filter bank • Single SBus board • 1.5× real time

  23. Auditory Model Front Ends • AFIT • Designed digital implementation • Middle ear, BMM, adaptive thresholding • 32 channels per chip • 300 Hz – 7 kHz • 44.1 kHz sampling rate • 2 chips provide 64 channels in real time
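A minimal sketch of how 32 channel center frequencies between 300 Hz and 7 kHz could be laid out on the ERB-rate scale of Glasberg and Moore, a common spacing for auditory filterbanks such as AIM's gammatone stage. Whether the AFIT chip uses exactly this spacing is not stated on the slide, so treat the sketch as an illustrative assumption.

```python
import numpy as np

def erb_rate(f_hz):
    """Glasberg & Moore ERB-rate (ERB number) for a frequency in Hz."""
    return 21.4 * np.log10(1.0 + 0.00437 * f_hz)

def inverse_erb_rate(erb):
    """Frequency in Hz corresponding to an ERB-rate value."""
    return (10 ** (erb / 21.4) - 1.0) / 0.00437

def erb_center_frequencies(n_channels=32, f_lo=300.0, f_hi=7000.0):
    """Center frequencies of n_channels filters spaced uniformly on the ERB scale."""
    erbs = np.linspace(erb_rate(f_lo), erb_rate(f_hi), n_channels)
    return inverse_erb_rate(erbs)
```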

  24. Auditory Model Front Ends • Past • Present • Single board system designed and prototyped - USB • Current chip design undergoing debug • Second fabrication run this fall • Future

  25. Auditory Model Front Ends • Past • Present • Future • Debug and verify chip fabrication • Debug PC based real-time auditory model front end • Implement complete end-to-end auditory ASR • Investigate feedback mechanisms in auditory model for ASR

  26. Biologically Based SID • Past • Present • Future

  27. Biologically Based SID • Auditory models investigated • Payton’s Auditory Model (PAM) • Auditory Image Model (AIM) • VQ codebook used to model each speaker • 37 speakers from TIMIT (dialect regions 1–2; 12 female, 25 male) • Identification accuracy: MFCC 94%, PAM 67%, AIM 91%
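A minimal sketch, using SciPy's k-means routines, of the VQ-codebook speaker modeling described above: one codebook is trained per speaker, and a test utterance is assigned to the speaker whose codebook yields the lowest average quantization distortion. The codebook size and the choice of frame-level features (MFCC, PAM, or AIM vectors) are placeholders.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

def train_codebooks(speaker_features, codebook_size=64):
    """Train one VQ codebook per speaker from that speaker's feature frames."""
    return {spk: kmeans(feats.astype(float), codebook_size)[0]
            for spk, feats in speaker_features.items()}

def identify_speaker(test_features, codebooks):
    """Pick the speaker whose codebook gives the lowest mean quantization distortion."""
    def distortion(codebook):
        _, dists = vq(test_features.astype(float), codebook)
        return dists.mean()
    return min(codebooks, key=lambda spk: distortion(codebooks[spk]))
```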

  28. Biologically Based SID • Past • Present • Future

  29. Biologically Based SID • Using perceptual features • Formants, formant bandwidths, and pitch • Voiced Frames • Using GMM classifier • Conducting experiments on larger databases • Switchboard
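A minimal sketch, assuming scikit-learn, of the GMM classifier described above: one Gaussian mixture is fit per speaker on voiced-frame perceptual feature vectors (formants, formant bandwidths, and pitch), and identification picks the speaker whose model gives the highest log-likelihood. The mixture order and the feature extraction itself are illustrative assumptions.

```python
from sklearn.mixture import GaussianMixture

def train_speaker_gmms(speaker_features, n_components=8):
    """Fit one GMM per speaker on voiced-frame perceptual feature vectors."""
    gmms = {}
    for spk, feats in speaker_features.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag", random_state=0)
        gmms[spk] = gmm.fit(feats)
    return gmms

def identify_speaker(test_features, gmms):
    """Return the speaker whose GMM gives the highest average log-likelihood."""
    return max(gmms, key=lambda spk: gmms[spk].score(test_features))
```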

  30.–33. Biologically Based SID [Four results plots comparing feature sets: MFCCs with no deltas and no CMS, MFCCs with no CMS, F0, and the baseline]

  34. Biologically Based SID • Performance isn’t the best, but this feature set… • Uses only 9 features versus 19–38 for MFCCs • Hasn’t been as heavily researched as MFCCs

  35. Biologically Based SID • Determine reasons for performance differences between various databases • Channel & score normalizations • Pitch-synchronous features • Closed-phase analysis • Glottal model features

  36. Biologically Based SID

  37. Biologically Based SID • Past • Present • Future

  38. Biologically Based SID • Investigate other auditory based features • Vocal agitation • Formants, formant bandwidths, and pitch calculated from the auditory model • Auditory model features • Conduct experiments on other databases • Broadcast news • Military training exercises

  39. Speaker Recognizability Test • Past • Present • Future

  40. Speaker Recognizability Test • Dynastat “The Development of a Method for Evaluating and Predicting Speaker Recognizability in Voice Communication Systems” • Determined perceptually relevant features • Perceptual voice traits (PVTs) • 21 traits currently identified • Developed a methodology to measure these traits • Human listeners • Developed a measure to determine the loss due to the channel • Diagnostic Speaker Recognizability Test (DSRT)

  41. Speaker Recognizability Test • Past • Present • Future

  42. Speaker Recognizability Test • Use perceptual voice traits to identify groups of similar and distinctive speakers • Determine whether current SID systems have difficulty with these similar speakers • Implementing an in-house web-based listening test for • PVT rating • DSRT

  43. Speaker Recognizability Test • Past • Present • Future

  44. Speaker Recognizability Test • Obtain PVT ratings for larger database • Switchboard • Determine acoustic correlates of perceptually relevant features • Use as features for speaker recognition • Utilize DSRT for communication system testing

  45. Summary • Computational audition offers the potential for improved performance in adverse military environments • Much research remains to be accomplished • Fidelity of the model • Model feedback pathways • Computational issues are no longer the limiting factor in performing meaningful experiments

  46. Questions?
