Seminar • Speech Recognition Projects • E.M. Bakker • LIACS Media Lab • Leiden University
Project Outline • Implementation • Project Modules • Speech Database • Speech Signal Analysis • Hidden Markov Models + Training • Language Models + Training • Recognition Algorithms • Evaluation
Implementation • A Safe C++ Programming Style • Not to be used in C++ • Syntax and Programming Style • Conventions • Basic Design Rules • Program Services • Memory Services • Diagnostics • Important Topics • Portability • Testing • Reliability
Implementation: A Safe C++ Programming Style • Features to be avoided, or not to be used in C++ • C inherited features • if (c = 0), ?:, the comma operator (,), goto, break, continue, union, struct, bit-wise, (&& || !), int, short, double, unsigned, ++, --, explicit constant numbers, cast, variable argument lists • Preprocessor features • macros for constants, macros for functions, #pragma, compiler/platform specific directives • Object Oriented • global data, global non-member functions, public data, friend, operator overloading (operator@, ++, ...) • Memory and pointer-related • pointers, new, delete, malloc, free(), pointers to functions, ->, ->*, .*, const char*, NULL, type &ref - t, type count[], type *count, type *count[], type (*count)[], type (&count) • printf, scanf, assembly language, objects passed by value and temporary objects
Implementation: Syntax and Programming Style • Programs in plain English • Meaningful names • One statement per line • const: for data and methods whenever possible • variables: local whenever possible • private/protected data members whenever possible • do not use confusing syntax like • if (a) • for (I=0;I++<4;) • always include a default case in switch statements • use assert at all critical points
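The style rules on this slide can be sketched in a few lines. This is only an illustration: the names StyleExample, Sum_Energies, Window_Name, WindowType and FRAME_COUNT are hypothetical and do not come from the seminar code. It shows a loop with explicit clauses in place of for (I=0;I++<4;), const parameters, local variables, a default case in the switch, and assert at the critical points.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

const std::size_t FRAME_COUNT = 4;  // named constant instead of an explicit number

enum class WindowType { RECTANGULAR, HAMMING };

class StyleExample
{
public:
    // The counter is tested and advanced in separate, readable clauses.
    static double Sum_Energies(const std::vector<double>& frame_energy)
    {
        assert(frame_energy.size() == FRAME_COUNT);
        double total_energy = 0.0;
        for (std::size_t frame_index = 0; frame_index < FRAME_COUNT; frame_index += 1)
        {
            total_energy += frame_energy[frame_index];
        }
        return total_energy;
    }

    // The switch has a default case even though all values seem covered.
    static std::string Window_Name(const WindowType window_type)
    {
        std::string window_name = "unknown";
        switch (window_type)
        {
            case WindowType::RECTANGULAR:
                window_name = "rectangular";
                break;
            case WindowType::HAMMING:
                window_name = "hamming";
                break;
            default:
                assert(false);
                break;
        }
        return window_name;
    }
};
```

A caller would write, for example, StyleExample::Window_Name(WindowType::HAMMING).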
Implementation: Conventions • Functions and methods: My_Example_Function() • Variables: my_example_var • Classes: MyExampleClass • Constants: MY_EXAMPLE_CONSTANT • In general: meaningful names, except for indices • Comments: • file description • version history (bugs, new functionality) • user information (user guide) • implementation information (reference guide) • code comments
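A small header that follows these conventions might look as follows. The file name, the class FrameCounter and the default values are assumptions made for the example, not part of the project code.

```cpp
// File:        frame_counter.h (hypothetical file name)
// Description: counts the analysis windows that fit in a recording.
// Version:     1.0 initial version, no known bugs
// User guide:  construct with a window length and shift, call Count_Frames().
// Reference:   a trailing partial window is dropped.

#include <cassert>
#include <cstddef>

const std::size_t DEFAULT_WINDOW_LENGTH = 400;  // constant: MY_EXAMPLE_CONSTANT
const std::size_t DEFAULT_WINDOW_SHIFT = 160;   // illustrative value only

class FrameCounter  // class: MyExampleClass
{
public:
    FrameCounter(const std::size_t length, const std::size_t shift)
        : window_length(length), window_shift(shift)
    {
        assert(window_length > 0);
        assert(window_shift > 0);
    }

    // method: My_Example_Function
    std::size_t Count_Frames(const std::size_t sample_count) const
    {
        std::size_t frame_count = 0;  // variable: my_example_var
        if (sample_count >= window_length)
        {
            frame_count = 1 + (sample_count - window_length) / window_shift;
        }
        return frame_count;
    }

private:
    const std::size_t window_length;
    const std::size_t window_shift;
};
```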
Implementation: Basic Design Rules • Project modularity achieved through classes. • Structure the program by Classes only (only methods are allowed, no separate functions) • Project is decomposed into modules with as little cross-dependence as possible • One module per class • Classes should have minimal interfaces • Modules should have minimal dependencies • Implementation issues hidden from clients (information hiding) • Inheritance should be extensively used • Advantages: • Improved readability • Reduced maintenance work • Improved robustness
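As a sketch of these design rules, the fragment below hides an implementation behind a minimal abstract interface and lets a client depend on that interface only. FeatureBlock, MeanSubtraction and BlockRunner are hypothetical names chosen for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal interface: one method, no data visible to clients.
class FeatureBlock
{
public:
    virtual ~FeatureBlock() {}
    virtual std::vector<double> Process(const std::vector<double>& input) const = 0;
};

// One module per class; the way the mean is computed stays private.
class MeanSubtraction : public FeatureBlock
{
public:
    std::vector<double> Process(const std::vector<double>& input) const override
    {
        assert(!input.empty());
        const double mean = Compute_Mean(input);
        std::vector<double> output(input.size());
        for (std::size_t index = 0; index < input.size(); index += 1)
        {
            output[index] = input[index] - mean;
        }
        return output;
    }

private:
    static double Compute_Mean(const std::vector<double>& values)
    {
        double total = 0.0;
        for (std::size_t index = 0; index < values.size(); index += 1)
        {
            total += values[index];
        }
        return total / static_cast<double>(values.size());
    }
};

// The client knows FeatureBlock only, so blocks can be exchanged freely.
class BlockRunner
{
public:
    static std::vector<double> Run(const FeatureBlock& block,
                                   const std::vector<double>& input)
    {
        return block.Process(input);
    }
};
```

Passing the block by reference keeps the client free of pointers, in line with the safe style described earlier.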
Implementation: Program Services • Safe memory management • memory service • dynamic memory management: C++ without pointers • Diagnostics • decide which data must be checked when, and define the actions • File management, user interfaces • User program configuration management • Text data management • Mathematical data management
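One possible reading of "C++ without pointers" combined with diagnostics is a container that owns its storage and checks every access. The class SafeArray below is a hypothetical sketch, not the memory service of the project; it relies on std::vector for allocation and is meant for numeric element types.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

template <typename ElementType>
class SafeArray
{
public:
    // Elements are value-initialized (0 for numeric types).
    explicit SafeArray(const std::size_t length) : elements(length) {}

    std::size_t Length() const
    {
        return elements.size();
    }

    bool Is_Valid_Index(const std::size_t index) const
    {
        return index < elements.size();
    }

    const ElementType& Get(const std::size_t index) const
    {
        Check_Index(index);
        return elements[index];
    }

    void Set(const std::size_t index, const ElementType& value)
    {
        Check_Index(index);
        elements[index] = value;
    }

    // Growth and shrinkage without new or delete in client code.
    void Resize(const std::size_t new_length)
    {
        elements.resize(new_length);
    }

private:
    // Diagnostic: the checked data is the index, the action is an exception.
    void Check_Index(const std::size_t index) const
    {
        if (!Is_Valid_Index(index))
        {
            throw std::out_of_range("SafeArray: index out of range");
        }
    }

    std::vector<ElementType> elements;
};
```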
Implementation: Memory Services & Diagnostics • Memory Services • Diagnostics
Some Important Topics • Portability • portability and defined options in files: compatib.h, defopt.h, Boolean.h • Testing • test routines and version history • Reliability • readability • maintainability
RES General Specification • RES (Recognition Experimental System) is an HMM-based experimental tool for continuous multispeaker speech recognition. The system works on recorded speech files and basically includes: • batch modules for acoustic model initialization and training • grammar model training • phoneme/word recognition • performance evaluation. • RES is state of the art in speaker-independent phonetic recognition: • 69.2% correct on all TIMIT test data using context-independent phonetic models. • It yields 87.83% correct in speaker-independent word recognition on ATIS using context-independent phonetic models not optimally tuned on this database.
RES General Specification • How to build an ASR system for a different language? • we need many segmented speech recordings to feed the training programs and obtain good HMM models of our voices. • use a freeware program like Snack 1.4 (search on the Internet) to prepare the data. • search for a Dutch multispeaker phonetic database. • design and feed the right language model. • Speech samples to train and test the RES system? • You can download speech samples from the Linguistic Data Consortium (LDC) after you have obtained a user account.
General Specification • Required C++ custom libraries: • none • Portability: • Linux • Windows 3.x, Windows 95, NT • DOS with DjGpp • Compilers: • Ms Visual C++ >4.0 • DjGpp version 2.8.1 or • GNU Linux Gpp version 2.8.1 or newer
Speech Database • Speech data retrieval • Speech files: • NIST1A (ATIS x, TIMIT), • MS WAV • custom, adding software drivers • Label File: • ATIS • TIMIT • various subsets, custom label alphabets included in a file, custom label handling by supplying a driver. • Other options: • overlap • window length • file buffering
Speech Signal Analysis • Feature Extraction • Signal processing: • Any concatenation of processing blocks is allowed. Each block performs a class of processing and the actual processing is specified by the options. • Available processing blocks: • Preemphasis_and_Hamming • Mean_Subtraction • FFT • MFCC with Log/non Log Energy • differences of any order • Other blocks can be added by supplying proper drivers.
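To make the first block concrete, here is a minimal sketch of pre-emphasis followed by a Hamming window on one frame. It uses the textbook formulas y[n] = x[n] - a·x[n-1] and w[n] = 0.54 - 0.46·cos(2πn/(N-1)); the class name, the Process method and the assumption that the sample before the frame is zero are illustrative and may differ from the RES block of the same name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

class PreemphasisAndHamming
{
public:
    explicit PreemphasisAndHamming(const double coefficient)
        : preemphasis_coefficient(coefficient)
    {
        assert(coefficient >= 0.0);
        assert(coefficient < 1.0);
    }

    std::vector<double> Process(const std::vector<double>& frame) const
    {
        const std::size_t length = frame.size();
        assert(length > 1);
        const double TWO_PI = 6.283185307179586;
        std::vector<double> output(length);
        double previous_sample = 0.0;  // assumption: silence before the frame
        for (std::size_t n = 0; n < length; n += 1)
        {
            // Pre-emphasis: first-order high-pass filter.
            const double emphasized = frame[n] - preemphasis_coefficient * previous_sample;
            // Hamming window value for position n.
            const double window = 0.54 - 0.46 * std::cos(TWO_PI * n / (length - 1));
            output[n] = emphasized * window;
            previous_sample = frame[n];
        }
        return output;
    }

private:
    const double preemphasis_coefficient;
};
```

A coefficient close to 1 (0.97 is a common choice in the literature) would be passed to the constructor; the value used by RES is not stated in the slides.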
Hidden Markov Models • HMM model Initialization • HMM topology: • 4 predefined types with configurable number of states. • Acoustic Units: • as allowed by the available database • emission densities: • Untied Gaussian mixtures • full or diagonal covariance matrix • number of mixtures configurable for each acoustic unit • Initialization method: • maximum distortion splitting on segmented database
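The emission density for the diagonal-covariance case can be sketched as below. The log density of one component is -½ Σ_d [log(2πσ_d²) + (x_d - μ_d)²/σ_d²], and the mixture is evaluated with the log-sum-exp trick for numerical stability. Class and method names are hypothetical; full covariance matrices and the initialization by splitting are not covered.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

class DiagonalGaussian
{
public:
    DiagonalGaussian(const std::vector<double>& mean_vector,
                     const std::vector<double>& variance_vector)
        : mean(mean_vector), variance(variance_vector)
    {
        assert(mean.size() == variance.size());
    }

    double Log_Density(const std::vector<double>& observation) const
    {
        assert(observation.size() == mean.size());
        const double TWO_PI = 6.283185307179586;
        double log_density = 0.0;
        for (std::size_t d = 0; d < mean.size(); d += 1)
        {
            assert(variance[d] > 0.0);
            const double difference = observation[d] - mean[d];
            log_density -= 0.5 * (std::log(TWO_PI * variance[d])
                                  + difference * difference / variance[d]);
        }
        return log_density;
    }

private:
    std::vector<double> mean;
    std::vector<double> variance;
};

class GaussianMixture
{
public:
    void Add_Component(const double weight, const DiagonalGaussian& component)
    {
        assert(weight > 0.0);
        weights.push_back(weight);
        components.push_back(component);
    }

    // log( sum_k w_k N_k(x) ), computed relative to the largest term.
    double Log_Density(const std::vector<double>& observation) const
    {
        assert(!components.empty());
        std::vector<double> terms(components.size());
        double largest = 0.0;
        for (std::size_t k = 0; k < components.size(); k += 1)
        {
            terms[k] = std::log(weights[k]) + components[k].Log_Density(observation);
            if (k == 0 || terms[k] > largest)
            {
                largest = terms[k];
            }
        }
        double scaled_sum = 0.0;
        for (std::size_t k = 0; k < terms.size(); k += 1)
        {
            scaled_sum += std::exp(terms[k] - largest);
        }
        return largest + std::log(scaled_sum);
    }

private:
    std::vector<double> weights;
    std::vector<DiagonalGaussian> components;
};
```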
Hidden Markov Models Training • Training algorithm: • Single and Simultaneous Model Re-estimation Baum-Welch. • parameter re-estimation: selective by configuration.
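The core of a Baum-Welch step can be illustrated on a deliberately simplified model. The sketch below assumes a single HMM with discrete emissions (RES uses Gaussian mixtures), works with unscaled probabilities (so it is only suitable for short sequences), and re-estimates the transition matrix only. DiscreteHmm and its methods are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<double> Row;
typedef std::vector<Row> Matrix;
typedef std::vector<std::size_t> Symbols;

class DiscreteHmm
{
public:
    DiscreteHmm(const Row& initial_probability, const Matrix& transition_probability,
                const Matrix& emission_probability)
        : initial(initial_probability), transition(transition_probability),
          emission(emission_probability)
    {
        assert(transition.size() == initial.size());
        assert(emission.size() == initial.size());
    }

    // alpha[t][j] = P(o_0 .. o_t, state at time t is j)
    Matrix Forward(const Symbols& observations) const
    {
        const std::size_t state_count = initial.size();
        Matrix alpha(observations.size(), Row(state_count, 0.0));
        for (std::size_t t = 0; t < observations.size(); t += 1)
        {
            for (std::size_t j = 0; j < state_count; j += 1)
            {
                double incoming = initial[j];
                if (t > 0)
                {
                    incoming = 0.0;
                    for (std::size_t i = 0; i < state_count; i += 1)
                        incoming += alpha[t - 1][i] * transition[i][j];
                }
                alpha[t][j] = incoming * emission[j][observations[t]];
            }
        }
        return alpha;
    }

    // beta[t][i] = P(o_{t+1} .. o_{T-1} | state at time t is i)
    Matrix Backward(const Symbols& observations) const
    {
        const std::size_t state_count = initial.size();
        const std::size_t frame_count = observations.size();
        Matrix beta(frame_count, Row(state_count, 1.0));
        for (std::size_t step = 1; step < frame_count; step += 1)
        {
            const std::size_t t = frame_count - 1 - step;
            for (std::size_t i = 0; i < state_count; i += 1)
            {
                beta[t][i] = 0.0;
                for (std::size_t j = 0; j < state_count; j += 1)
                    beta[t][i] += transition[i][j] * emission[j][observations[t + 1]]
                                  * beta[t + 1][j];
            }
        }
        return beta;
    }

    // New a_ij = expected i->j transitions / expected transitions out of i.
    // The common factor 1/P(O) cancels in the ratio and is omitted.
    Matrix Reestimate_Transitions(const Symbols& observations) const
    {
        assert(observations.size() > 1);
        const std::size_t state_count = initial.size();
        const Matrix alpha = Forward(observations);
        const Matrix beta = Backward(observations);
        Matrix updated(state_count, Row(state_count, 0.0));
        for (std::size_t i = 0; i < state_count; i += 1)
        {
            double row_total = 0.0;
            for (std::size_t j = 0; j < state_count; j += 1)
            {
                for (std::size_t t = 0; t + 1 < observations.size(); t += 1)
                    updated[i][j] += alpha[t][i] * transition[i][j]
                                     * emission[j][observations[t + 1]] * beta[t + 1][j];
                row_total += updated[i][j];
            }
            assert(row_total > 0.0);  // state i must be reachable
            for (std::size_t j = 0; j < state_count; j += 1)
                updated[i][j] /= row_total;
        }
        return updated;
    }

private:
    Row initial;
    Matrix transition;
    Matrix emission;
};
```

Summing the last row of Forward() gives the likelihood of the observation sequence, which should not decrease from one re-estimation to the next.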
Language Models • Language Model: • unigram and bigram on words and phonemes • Smoothing techniques • Good-Turing, non-linear and linear interpolation model • Word Clustering: minimum mean square error on transition probability • Perplexity • word and phoneme based computation
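A bigram model with linear interpolation and its perplexity can be sketched as follows: P(w | v) = λ·c(v,w)/c(v) + (1-λ)·c(w)/N, and perplexity = exp(-(1/M) Σ log P(w_i | w_{i-1})). The class BigramModel is hypothetical. It falls back to the unigram estimate for an unseen context and does not implement Good-Turing or non-linear interpolation, so a word never seen in training still gets probability zero.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

class BigramModel
{
public:
    explicit BigramModel(const double bigram_weight)
        : lambda(bigram_weight), total_words(0.0)
    {
        assert(bigram_weight >= 0.0);
        assert(bigram_weight <= 1.0);
    }

    void Train(const std::vector<std::string>& words)
    {
        for (std::size_t i = 0; i < words.size(); i += 1)
        {
            unigram_count[words[i]] += 1.0;
            total_words += 1.0;
            if (i > 0)
            {
                context_count[words[i - 1]] += 1.0;
                bigram_count[std::make_pair(words[i - 1], words[i])] += 1.0;
            }
        }
    }

    // Linear interpolation of the bigram and unigram relative frequencies.
    double Probability(const std::string& previous, const std::string& word) const
    {
        assert(total_words > 0.0);
        const double unigram = Count_Of(unigram_count, word) / total_words;
        const double context = Count_Of(context_count, previous);
        double probability = unigram;  // fallback for an unseen context
        if (context > 0.0)
        {
            double pair_count = 0.0;
            const auto found = bigram_count.find(std::make_pair(previous, word));
            if (found != bigram_count.end())
            {
                pair_count = found->second;
            }
            probability = lambda * pair_count / context + (1.0 - lambda) * unigram;
        }
        return probability;
    }

    // Perplexity over the predicted words words[1] .. words[M].
    double Perplexity(const std::vector<std::string>& words) const
    {
        assert(words.size() > 1);
        double log_sum = 0.0;
        for (std::size_t i = 1; i < words.size(); i += 1)
        {
            const double probability = Probability(words[i - 1], words[i]);
            assert(probability > 0.0);
            log_sum += std::log(probability);
        }
        return std::exp(-log_sum / static_cast<double>(words.size() - 1));
    }

private:
    static double Count_Of(const std::map<std::string, double>& counts,
                           const std::string& key)
    {
        double count = 0.0;
        const auto found = counts.find(key);
        if (found != counts.end())
        {
            count = found->second;
        }
        return count;
    }

    const double lambda;
    double total_words;
    std::map<std::string, double> unigram_count;
    std::map<std::string, double> context_count;
    std::map<std::pair<std::string, std::string>, double> bigram_count;
};
```

The same class works on phonemes if each phoneme label is passed as a string.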
Recognition • Recognition • Recognition Unit: • acoustic units, words • Algorithm Type • Viterbi with Beam search and Window search pruning strategies
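The Viterbi search with beam pruning can be sketched on a small discrete HMM. Scores are kept in the log domain; after each frame, every state whose score lies more than beam_width below the best score of that frame is discarded. ViterbiDecoder is a hypothetical class, the window search strategy is not shown, and RES itself decodes with continuous densities and a language model.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

class ViterbiDecoder
{
public:
    ViterbiDecoder(const std::vector<double>& initial_probability,
                   const std::vector<std::vector<double>>& transition_probability,
                   const std::vector<std::vector<double>>& emission_probability,
                   const double beam)
        : initial(initial_probability), transition(transition_probability),
          emission(emission_probability), beam_width(beam)
    {
        assert(beam_width > 0.0);
    }

    // Returns the most likely state sequence that survives pruning.
    std::vector<std::size_t> Decode(const std::vector<std::size_t>& observations) const
    {
        const std::size_t state_count = initial.size();
        const std::size_t frame_count = observations.size();
        assert(frame_count > 0);
        std::vector<double> score(state_count);
        std::vector<std::vector<std::size_t>> back_pointer(
            frame_count, std::vector<std::size_t>(state_count, 0));
        for (std::size_t j = 0; j < state_count; j += 1)
        {
            score[j] = std::log(initial[j]) + std::log(emission[j][observations[0]]);
        }
        Prune(score);
        for (std::size_t t = 1; t < frame_count; t += 1)
        {
            std::vector<double> next_score(state_count, Minus_Infinity());
            for (std::size_t j = 0; j < state_count; j += 1)
            {
                for (std::size_t i = 0; i < state_count; i += 1)
                {
                    const double candidate = score[i] + std::log(transition[i][j]);
                    if (candidate > next_score[j])
                    {
                        next_score[j] = candidate;
                        back_pointer[t][j] = i;
                    }
                }
                next_score[j] += std::log(emission[j][observations[t]]);
            }
            Prune(next_score);
            score = next_score;
        }
        // Backtrace from the best final state.
        std::size_t best_state = 0;
        for (std::size_t j = 1; j < state_count; j += 1)
        {
            if (score[j] > score[best_state])
            {
                best_state = j;
            }
        }
        std::vector<std::size_t> path(frame_count, best_state);
        for (std::size_t step = 1; step < frame_count; step += 1)
        {
            const std::size_t t = frame_count - step;
            path[t - 1] = back_pointer[t][path[t]];
        }
        return path;
    }

private:
    static double Minus_Infinity()
    {
        return -std::numeric_limits<double>::infinity();
    }

    // Beam pruning: drop states too far below the best score of the frame.
    void Prune(std::vector<double>& score) const
    {
        const double best = *std::max_element(score.begin(), score.end());
        for (std::size_t j = 0; j < score.size(); j += 1)
        {
            if (score[j] < best - beam_width)
            {
                score[j] = Minus_Infinity();
            }
        }
    }

    std::vector<double> initial;
    std::vector<std::vector<double>> transition;
    std::vector<std::vector<double>> emission;
    const double beam_width;
};
```

A narrow beam saves work but can discard the path that would have won later, so the result may differ from the exact Viterbi path.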
Evaluation • Evaluation: Wagner-Fischer algorithm
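The Wagner-Fischer algorithm fills a dynamic-programming table with the minimum number of substitutions, deletions and insertions needed to turn the reference word sequence into the recognized one. The sketch below uses unit costs for all three operations and reports the distance divided by the reference length; the class name WordAlignment and the unit costs are assumptions, and the exact scoring used by RES may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class WordAlignment
{
public:
    static std::size_t Edit_Distance(const std::vector<std::string>& reference,
                                     const std::vector<std::string>& hypothesis)
    {
        const std::size_t rows = reference.size() + 1;
        const std::size_t columns = hypothesis.size() + 1;
        // distance[i][j]: cost of aligning the first i reference words
        // with the first j hypothesis words.
        std::vector<std::vector<std::size_t>> distance(
            rows, std::vector<std::size_t>(columns, 0));
        for (std::size_t i = 0; i < rows; i += 1)
        {
            distance[i][0] = i;  // i deletions
        }
        for (std::size_t j = 0; j < columns; j += 1)
        {
            distance[0][j] = j;  // j insertions
        }
        for (std::size_t i = 1; i < rows; i += 1)
        {
            for (std::size_t j = 1; j < columns; j += 1)
            {
                std::size_t substitution_cost = 1;
                if (reference[i - 1] == hypothesis[j - 1])
                {
                    substitution_cost = 0;
                }
                const std::size_t substitution = distance[i - 1][j - 1] + substitution_cost;
                const std::size_t deletion = distance[i - 1][j] + 1;
                const std::size_t insertion = distance[i][j - 1] + 1;
                distance[i][j] = std::min(substitution, std::min(deletion, insertion));
            }
        }
        return distance[rows - 1][columns - 1];
    }

    // Errors per reference word; can exceed 1 when there are many insertions.
    static double Word_Error_Rate(const std::vector<std::string>& reference,
                                  const std::vector<std::string>& hypothesis)
    {
        assert(!reference.empty());
        return static_cast<double>(Edit_Distance(reference, hypothesis))
               / static_cast<double>(reference.size());
    }
};
```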
Projects 1. Dutch Speech Corpus + Database Interface (2 groups) • in an early phase some example classes should be available, like counting, etc. • maybe use tools like Praat (for labeling WAV files with phonetics), etc. 2. Signal Analysis and Feature Extraction (2 groups) 3. HMM Initialization + HMM Dutch Phonetic Training (2 groups) 4. Dutch Language Model + Word Class Training (2 groups) • in an early phase some examples should be available 5. Recognition (2 groups) • Evaluation (All)
Project Designs The design of the project should contain the following: • The implementation goals • The underlying technique and theory • A functional description of the starting code and tools • The design of new code and functionality • Implementation goals and a time schedule • NB: if it proves difficult to reach all the goals within the current time frame, team up with the other team • Interfacing • Define the module interfaces • Define the time path for the essential module inputs • Define a realistic time path for the (partial) outputs of the module.