AI: Neural Networks Lecture 4
Tony Allen
School of Computing & Informatics
Nottingham Trent University
MLP speech-enabled systems applications • M. Nakamura, K. Maruyama, T. Kawabata & K. Shikano; “Neural Network Approach to Word Category Prediction for English Texts”; Proceedings of the International Conference on Computational Linguistics (COLING), Helsinki, 1990, pp. 213–218. • H. Schmid; “Part-of-Speech Tagging with Neural Networks”; Proceedings of the International Conference on Computational Linguistics (COLING), 1994, pp. 172–176. • N. Poh & J. Korczak; “Hybrid Biometric Person Authentication Using Face and Voice Features”; Proceedings of the 3rd International Conference on Audio- and Video-Based Biometric Person Authentication (AVBPA 2001), Sweden, June 2001, pp. 348–353.
Part-of-Speech Tag Prediction • In the bigram neural network predictor (NETgram), the input vector is a 1-of-89-bit encoding of the POS tag of the previous word; the output vector is a 1-of-89-bit encoding of the predicted POS tag of the next word.
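A minimal sketch of this bigram prediction step, keeping only the 1-of-89 tag encoding from the slide; the hidden-layer size, the random weights and the softmax output are illustrative assumptions, not the trained NETgram itself.

```python
import numpy as np

NUM_TAGS = 89  # 1-of-89-bit POS tag encoding, as on the slide

def one_hot(tag_index, size=NUM_TAGS):
    """Encode a POS tag index as a 1-of-N bit vector."""
    v = np.zeros(size)
    v[tag_index] = 1.0
    return v

def predict_next_tag_probs(prev_tag_index, W_in, W_out):
    """Forward pass of a small MLP bigram predictor:
    previous tag (1-of-89) -> hidden layer -> distribution over next tags."""
    x = one_hot(prev_tag_index)
    h = np.tanh(W_in @ x)            # hidden layer (size is an assumption)
    logits = W_out @ h
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()           # softmax over the 89 possible next tags

# Example with random, untrained weights; hidden size 16 is purely illustrative.
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(16, NUM_TAGS))
W_out = rng.normal(scale=0.1, size=(NUM_TAGS, 16))
probs = predict_next_tag_probs(prev_tag_index=5, W_in=W_in, W_out=W_out)
print(probs.argmax(), probs.max())
```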
Part-of-Speech Tag Prediction: Results • Prediction accuracy increases as more of the top-ranked output neurons are included in the output classification, i.e. when a prediction counts as correct if the true tag appears among the top-N output activations.
Part-of-Speech Tag Prediction: Application • The neural network predictor was used to improve speech recognition results by 6%. • The HMM recogniser produces 10 potential word/tag candidates for each recognition event; the NETgram predictor output is used to select the best word/tag candidate.
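A sketch of how the predictor output might be used to re-rank the recogniser's candidate list. The combination rule shown here (scaling each HMM candidate score by the predicted probability of its tag) is an assumption for illustration; the slide does not specify the exact selection rule.

```python
import numpy as np

def select_best_candidate(candidates, tag_probs):
    """Pick the best word/tag candidate from an HMM N-best list.

    candidates: list of (word, tag_index, hmm_score) from the recogniser
    tag_probs:  length-89 array of next-tag probabilities from the predictor
    Combination rule (an assumption): rescore = hmm_score * predicted tag probability.
    """
    return max(candidates, key=lambda c: c[2] * tag_probs[c[1]])

# Placeholder predictor output and a hypothetical (shortened) 10-best list.
tag_probs = np.full(89, 1.0 / 89)
tag_probs[21] = 0.2                 # pretend the predictor strongly favours tag 21
n_best = [("record", 21, 0.31), ("record", 34, 0.29), ("wreck", 12, 0.05)]
print(select_best_candidate(n_best, tag_probs))
```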
Part-of-Speech Tag Disambiguation • Each output node corresponds to one of the tags in the tagset. • The input vector comprises the POS probabilities of the current word and the 2 following words, plus the disambiguated tags of the 3 preceding words. • The POS probabilities are obtained from a look-up table.
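A minimal sketch of how such an input vector could be assembled. The tagset size, the contents of the look-up table and the one-hot encoding of the preceding tags are placeholder assumptions; only the overall layout (current word + 2 future words + 3 preceding tags) comes from the slide.

```python
import numpy as np

N_TAGS = 45  # tagset size (placeholder value; one output node per tag)

# Hypothetical look-up table: word -> prior POS probability vector (length N_TAGS)
lookup = {
    "the":  np.eye(N_TAGS)[0],                                   # e.g. determiner
    "runs": 0.5 * np.eye(N_TAGS)[1] + 0.5 * np.eye(N_TAGS)[2],   # noun or verb
}
uniform = np.full(N_TAGS, 1.0 / N_TAGS)   # fallback for unknown words

def one_hot(tag_index):
    v = np.zeros(N_TAGS)
    v[tag_index] = 1.0
    return v

def build_input(current_word, future_words, previous_tags):
    """Input vector = POS probabilities of the current word and 2 future words
    + one-hot encodings of the disambiguated tags of the 3 preceding words."""
    parts = [lookup.get(current_word, uniform)]
    parts += [lookup.get(w, uniform) for w in future_words[:2]]
    parts += [one_hot(t) for t in previous_tags[-3:]]
    return np.concatenate(parts)          # length 6 * N_TAGS

x = build_input("runs", ["fast", "today"], [0, 3, 1])
print(x.shape)   # (270,) with N_TAGS = 45
```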
Part-of-Speech Tag Disambiguation: Results • The network was trained on a 2-million-word subpart of the Penn Treebank corpus over 4 million epochs. • The network was tested on a 100,000-word subpart that was not part of the training set.
Biometric user authentication • 10 moments are extracted from HSI colour eye images; the eyes are automatically located in the face images using histogram analysis, round-mask convolution and a peak-searching algorithm. • 64 Morlet wavelet coefficients are extracted from 3 seconds' worth of speech input.
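A sketch of one way to obtain 10 moments from a located eye region: raw geometric moments up to third order happen to give exactly 10 values. That specific choice, and the use of a single intensity channel, are assumptions; the slide does not state which moments the paper uses.

```python
import numpy as np

def raw_moments_up_to_order3(region):
    """Compute the 10 raw geometric moments m_pq (p + q <= 3) of a 2-D
    intensity image (e.g. one channel of an HSI eye region).
    Using raw moments up to third order is an assumption, chosen here
    simply because it yields exactly 10 values."""
    h, w = region.shape
    y, x = np.mgrid[0:h, 0:w]
    moments = []
    for p in range(4):
        for q in range(4 - p):
            moments.append(np.sum((x ** p) * (y ** q) * region))
    return np.array(moments)   # [m00, m01, m02, m03, m10, m11, m12, m20, m21, m30]

eye_region = np.random.default_rng(0).random((32, 48))   # placeholder eye patch
print(raw_moments_up_to_order3(eye_region).shape)        # (10,)
```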
Biometric user authentication: Results • 2 MLPs (one for face and one for voice) are trained for each of 30 people. • The outputs of each pair of MLPs are ANDed together to give the verification output for each person.
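A sketch of the AND fusion step, assuming each MLP emits a score in [0, 1] that is thresholded before the logical AND; the threshold values are placeholders, not those used in the paper.

```python
def verify(face_score, voice_score, face_threshold=0.5, voice_threshold=0.5):
    """Accept the claimed identity only if BOTH the face MLP and the voice MLP
    accept it (logical AND of the two thresholded outputs).
    Thresholds are placeholder values, not taken from the paper."""
    return (face_score >= face_threshold) and (voice_score >= voice_threshold)

# One (face, voice) MLP pair exists per enrolled person; e.g. for one person:
print(verify(face_score=0.81, voice_score=0.64))   # True  -> identity verified
print(verify(face_score=0.92, voice_score=0.30))   # False -> rejected
```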