INTRODUCTION METHODS RESULTS CONCLUSION Noise Robust Speech Recognition Group SB740
Standard feature extraction • Processing chain: speech → Framing → FFT → Filter Bank → Cepstrum Coefficients → features
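The chain on this slide can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the group's implementation: the function names, the frame length, the hop size, and the rectangular filter weights are all assumptions made for the example, no window function is applied, and a naive DFT stands in for the FFT.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Framing: split the signal into overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def magnitude_spectrum(frame):
    """Naive O(N^2) DFT magnitude for bins 0..N/2 (an FFT yields the same values faster)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def apply_filter_bank(spectrum, filters):
    """Filter bank: one weighted sum of spectrum bins per filter."""
    return [sum(w * s for w, s in zip(weights, spectrum)) for weights in filters]


def cepstrum(band_energies, n_coeffs):
    """Cepstrum coefficients: DCT-II of the log band energies."""
    n = len(band_energies)
    logs = [math.log(max(e, 1e-10)) for e in band_energies]  # clamp to avoid log(0)
    return [sum(l * math.cos(math.pi * c * (j + 0.5) / n)
                for j, l in enumerate(logs))
            for c in range(n_coeffs)]


def extract_features(samples, frame_len, hop, filters, n_coeffs):
    """speech -> framing -> spectrum -> filter bank -> cepstrum -> features."""
    return [cepstrum(apply_filter_bank(magnitude_spectrum(f), filters), n_coeffs)
            for f in frame_signal(samples, frame_len, hop)]


# Toy input (assumed values): a sinusoid that falls exactly on DFT bin 4.
signal = [math.sin(2 * math.pi * 4 * t / 16) for t in range(32)]
frames = frame_signal(signal, 16, 8)
spectrum = magnitude_spectrum(frames[0])
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)

# Two rectangular "filters" over the 9 spectrum bins, purely for illustration.
filters = [[1.0] * 4 + [0.0] * 5, [0.0] * 4 + [1.0] * 5]
features = extract_features(signal, 16, 8, filters, 2)
```

Each entry of `features` is the cepstral vector for one frame; a real front-end would use triangular filters and more coefficients.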
Improved feature extraction • Input: framed FFT spectrum • Blocks: Pre-Processing, Filter Bank, Cepstrum Coefficients, Post-Processing • Output: features
Pre-Processing: Quantile Based Noise Estimation for spectral subtraction (QBNE) • Assumes that each frequency band contains only noise for a fraction of the time, even during speech • For each frequency band, the frames are sorted by amplitude • A fixed q-value is used for all frequency bands • The intersection between the vertical line at q and each sorted frequency band is the noise estimate • Problem with mismatched training and test conditions
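The bullets above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the q-value of 0.5, the spectral floor, and the toy magnitudes are illustrative and not taken from the project.

```python
def quantile_noise_estimate(magnitudes, q=0.5):
    """magnitudes: list of frames, each a list of per-band magnitudes.

    For every frequency band, sort the values over all frames and read off
    the value at the fixed quantile q. Returns one noise estimate per band.
    """
    n_frames = len(magnitudes)
    n_bands = len(magnitudes[0])
    idx = min(n_frames - 1, int(q * n_frames))  # position of the q-quantile
    noise = []
    for band in range(n_bands):
        sorted_band = sorted(frame[band] for frame in magnitudes)
        noise.append(sorted_band[idx])
    return noise


def spectral_subtraction(magnitudes, noise, floor=0.01):
    """Subtract the noise estimate from every frame.

    The floor (an assumed value) keeps a small fraction of the original
    magnitude so that the result never becomes negative.
    """
    return [[max(m - n, floor * m) for m, n in zip(frame, noise)]
            for frame in magnitudes]


# Toy utterance: 4 frames, 2 frequency bands (assumed values).
frames = [
    [1.0, 5.0],
    [1.2, 0.5],
    [0.9, 0.4],
    [4.0, 0.6],
]
noise = quantile_noise_estimate(frames, q=0.5)
clean = spectral_subtraction(frames, noise)
```

Because the whole utterance is sorted before the quantile is read, this form needs every frame up front.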
Pre-Processing: Adaptive Quantile Based Noise Estimation for spectral subtraction (AQBNE) • Goal is to improve performance when training with low noise and testing with high noise • Adapts to the utterance and noise levels • Adjusts the q-value for each frequency band • Result is a q-estimation curve as opposed to a fixed value • High and low noise situations converge to similar representations
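The difference from the fixed-q case can be illustrated with one q-value per frequency band. The slides do not state how the q-estimation curve is derived, so `toy_q_curve` below is a purely hypothetical placeholder (it raises q for bands whose median lies close to their peak); only the idea of a per-band q comes from the slide.

```python
def quantile(values, q):
    """Value at quantile q of a list (sorted copy, index truncated)."""
    ordered = sorted(values)
    idx = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[idx]


def toy_q_curve(magnitudes, q_min=0.3, q_max=0.7):
    """HYPOTHETICAL q-estimation curve, not the method from the project.

    Bands with little dynamics (median close to peak) get a higher q,
    bands with strong peaks get a lower q. q_min and q_max are assumed.
    """
    n_bands = len(magnitudes[0])
    curve = []
    for band in range(n_bands):
        values = [frame[band] for frame in magnitudes]
        peak = max(values)
        flatness = quantile(values, 0.5) / peak if peak > 0 else 0.0
        curve.append(q_min + (q_max - q_min) * flatness)
    return curve


def adaptive_noise_estimate(magnitudes, q_curve):
    """Per-band noise estimate using a separate q-value for each band."""
    return [quantile([frame[band] for frame in magnitudes], q)
            for band, q in enumerate(q_curve)]


# Toy utterance: band 0 is flat, band 1 has one strong peak (assumed values).
frames = [
    [1.0, 5.0],
    [1.0, 0.5],
    [1.0, 0.4],
    [1.0, 0.6],
]
q_curve = toy_q_curve(frames)
noise = adaptive_noise_estimate(frames, q_curve)
```

Any other rule for choosing the curve can be plugged in by passing a different `q_curve` list to `adaptive_noise_estimate`.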
Filter Bank: Speech Band Emphasizing Filter Bank (SBE) • Mel Frequency Cepstrum Coefficients (MFCC) • Motivated by human perception and critical bands • Mel Frequency Filter Bank • Triangular filters • Highest resolution at low frequencies • Resulting Importance Function • Speech Band Emphasizing Filter Bank • Emphasizes the primary speech band • Highest resolution at 1500 Hz
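The standard Mel filter bank mentioned above can be sketched as below, using the common mel mapping mel = 2595 · log10(1 + f/700). The number of filters and the frequency range are assumed example values, and the function names are illustrative. The SBE filter distribution itself is not specified in the slides and is not reproduced here.

```python
import math


def hz_to_mel(f):
    """Common mel-scale mapping from Hz to mel."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_edges(n_filters, f_low, f_high):
    """Return n_filters + 2 edge frequencies in Hz, equally spaced in mel.

    Filter i spans edges[i] .. edges[i + 2] with its peak at edges[i + 1].
    """
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_filters + 1)
    return [mel_to_hz(m_low + i * step) for i in range(n_filters + 2)]


def triangular_weight(f, left, center, right):
    """Weight of a triangular filter at frequency f (1.0 at the center)."""
    if left < f <= center:
        return (f - left) / (center - left)
    if center < f < right:
        return (right - f) / (right - center)
    return 0.0


# Assumed example values: 23 filters between 64 Hz and 4000 Hz.
edges = mel_edges(n_filters=23, f_low=64.0, f_high=4000.0)
spacings = [b - a for a, b in zip(edges, edges[1:])]
```

The entries of `spacings` grow with frequency, which is the "highest resolution at low frequencies" property; a filter bank with its narrowest filters around 1500 Hz would need a different edge distribution.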
Results • QBNE with Mel Frequency Filter Bank showed an improvement of 15% • AQBNE with SBE Filter Bank showed an improvement of 28% • Under highly mismatched conditions, AQBNE with SBE Filter Bank showed an 80% improvement, compared to 21% for QBNE with Mel Frequency Filter Bank
Conclusion • AQBNE avoids describing speech signals during training to a level of detail which is unattainable during testing under noisy conditions • The suggested SBE Filter Bank, though empirically chosen, indicates that filter distributions other than the standard Mel-scale may attain improved performance in noisy conditions
Presentation of Abstract • Agenda: • Purpose of the abstract. • Structure of the abstract. • Content of the abstract.
Purpose of abstract • Announcement for the 17th 7th-semester conference on the 21st of December 2004. • Appetizer to attract the right audience. • The abstract is written with its audience in mind: other 7th-semester students from the Institute of Electronic Systems in Aalborg and Esbjerg.
Structure of the abstract • Title: • Topic: The long title gives a detailed description of the content: • "Noise Robust Automatic Speech Recognition with Adaptive Quantile Based Noise Estimation and Speech Band Emphasizing Filter Bank" • Nature: Noise estimation. • Scope: Automatic speech recognition. • The text follows the IMRaD structure.
Structure of the abstract • Throughout the text important keywords are used: • ASR, Noise Estimation, Feature Extraction. • Known methods are presented before new methods to create continuity. • Complexity increases through the abstract.
Content of the abstract • Introduction: • Contains information on the initial problem, the proposals made in the paper, and the field of operation. • This is the shortest section in the abstract, but contains a lot of keywords.
Content of the abstract • Methods: • This section is the longest of the abstract; it contains references to known methods and introduces new methods and solutions. • The first sentence in this section is linked to the introduction by the phrase "feature extraction". • This section ends with an advertisement for the results.
Content of the abstract • Results: • The methods that have improved the recognition performance are presented first. • The best result is mentioned with the exact result compared to known methods. • The proposed solutions that have not improved the recognition are mentioned last in the section.
Content of the abstract • Discussion: • First the method that did not improve the recognition performance is explained. • Secondly the methods that have improved the recognition performance are described. • The abstract is concluded by the recommendations based on the results achieved in this project.
Structure of Paper • IMRaD model • Introduction - Introduction • Methods - Methods (PP, QBNE, AQBNE, SBE) • Results - Experimental framework - Experimental results • Discussion - Conclusion
Introduction • Problem definition • Noise in speech signals has a dramatic effect on ASR. • Analysis • Analysis of known methods. • Interesting known methods (PP, QBNE, MFCC). • Results: Develop new methods and combine different methods.
Methods • Known methods • PP – Short presentation of method and implementation. • QBNE – Short presentation of method and thorough description of implementation. • New methods • AQBNE and SBE – Motivation (Why is this a good method?) – Implementation (Compared to QBNE and MFCC)
Results • Description of measurement instrument (HTK) and SpeechDat-Car database. • Results in tables
Results • Discussion of results in text. • Chosen results in graph.
Conclusion • Contains a summary of the important results, so it can be read and understood right after reading the abstract.
Worksheets • Agenda: • Structure and organization • Brief presentation of worksheets
Structure and organization • The worksheets are the basis for the paper and the implementation of our system • Direct information about methods • Necessary background knowledge • Give the group members the necessary knowledge to understand a subject • Written in English • The topic of the project was completely new to us • Impossible to plan work for a long time period • Discuss subjects, study, discuss new subjects • Writing procedure: • The group discusses which subjects need to be investigated • 1-2 persons work together and write a worksheet • The group reads it and gives feedback • 1 person finishes it
Brief presentation of worksheets • 1. Introduction • State the aim of the project and our initial problem • 2. Speech production • Human speech characteristics • 3. Hidden Markov Model • Often used in speech recognition systems • 4. Unwanted noise and effects • Noise and effects that can affect our system • 5. Java execution speed test • Consideration of implementation language • 6. Java processor blocks • Documents the implementation of our system • 7. Matlab related • How to read sound files from SpeechDat-Car database
Brief presentation of worksheets • 8. Frontend Interfaces • Input: SpeechDat-Car audio wave format, Output: HTK format • 9. The standard frontend • Transformation of the sampled audio data into feature vectors • 10. Post-Processing • 11. The Mel filterbank • 12. Quantile Based Noise Estimation • 13. Spectral subtraction • 14. Experimental framework • How we have tested the methods' influence on the speech recognition • 15. Experimental results • Describes our baseline and refers to App. A • 16. Structure of abstract and paper • Overview of the important elements • App. A: Raw results
Causality • Causal: • Post-Processing • Speech Band Emphasizing Filter Bank • Non-causal: • (Adaptive) Quantile Based Noise Estimation
Ordinary (non-causal) QBNE • One discrete frequency (w) • Entire utterance is used for noise estimate
Causal QBNE • One discrete frequency (w) • Noise estimate updated for each new frame
Causal QBNE • [Figure: noise estimate shown at frames n=0, n=1, n=2]
Causality • PP and SBE are inherently causal • QBNE and AQBNE can be made causal by using a buffer for the quantile • Additional computational cost • Reduced storage requirement
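A buffer-based causal variant can be sketched as follows. This is one possible reading of "using a buffer for the quantile", not the group's implementation: the class name, the optional `max_frames` limit (assumed to be at least 1 when given), and the use of a sorted list per band are assumptions made for the example.

```python
import bisect
from collections import deque


class CausalQuantileEstimator:
    """Updates the per-band quantile noise estimate one frame at a time."""

    def __init__(self, n_bands, q=0.5, max_frames=None):
        self.q = q
        self.max_frames = max_frames  # None keeps every past frame
        self.history = deque()        # frames in arrival order
        self.sorted_bands = [[] for _ in range(n_bands)]

    def update(self, frame):
        """Add one frame and return the noise estimate from frames seen so far."""
        # Drop the oldest frame once the buffer is full.
        if self.max_frames is not None and len(self.history) >= self.max_frames:
            oldest = self.history.popleft()
            for band, value in enumerate(oldest):
                values = self.sorted_bands[band]
                del values[bisect.bisect_left(values, value)]
        self.history.append(list(frame))

        noise = []
        for band, value in enumerate(frame):
            values = self.sorted_bands[band]
            bisect.insort(values, value)  # keep the band buffer sorted
            idx = min(len(values) - 1, int(self.q * len(values)))
            noise.append(values[idx])
        return noise


# Unbounded buffer: the estimate only uses frames up to the current one.
estimator = CausalQuantileEstimator(n_bands=1, q=0.5)
unbounded = [estimator.update(f) for f in ([3.0], [1.0], [2.0])]

# Bounded buffer of 2 frames: older frames are forgotten.
windowed = CausalQuantileEstimator(n_bands=1, q=0.5, max_frames=2)
bounded = [windowed.update(f) for f in ([3.0], [1.0], [2.0], [0.5])]
```

The estimate is recomputed for every incoming frame, which is where the additional computational cost comes from; limiting the buffer length bounds the storage.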
Closure • Agenda: • Future work • Project working process
Future work (1/2) • Implement causal AQBNE • Find optimal q-estimation curve etc.
Future work (2/2) • Combine AQBNE and SBE with advanced front-end (WI008) • [Figure: advanced front-end block diagram with AQBNE and SBE Filter Bank marked. Source: ETSI ES 202 050 V1.1.3 (2003-11)]
Project working process • Project reporting form • No 3 weeks final report correction • Worksheets easier to write than report chapters • Difficult to parallelize tasks • Few tasks • Large groups • Information gathering • State of the art knowledge from scientific papers • No textbooks with up to date information exist