Project 1: Eigen-Faces Applied to Speech Style Classification • Brad Keserich, Senior, Computer Engineering, College of Engineering and Applied Science, University of Cincinnati, Cincinnati, Ohio • Suryadip Chakraborty, School of Computing Sciences and Informatics • Dr. Dharma Agrawal, Professor, School of Computing Sciences and Informatics • Sponsored by the National Science Foundation, Grant ID No.: DUE-0756921
Introduction • Speech recognition • Voice disorders • Stuttering • Pausing • Other lesser-known forms • Research group focuses on Parkinson’s patients
Techniques • Previous work • Good results from neural network classifiers that use fuzzy values • Wavelet transformations are effective • For this project • Eigen-faces method adapted to audio
Goals • Investigate the usefulness of the eigen-faces method for speech classification
Objectives • Acquire data • Extract salient features • Analyze Eigen-faces effectiveness
Eigen-faces for audio • [Figure: recording along time axis t divided into windows w1 ... w5; each window wi gives a column vector vi = (f1, f2, ..., fr)]
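One way to read the figure above: the recording is cut into windows wi along the time axis t, and each window yields a vector vi with r entries f1 ... fr. The sketch below assumes the entries are power-spectrum values, which the slides mention as a feature but do not confirm as the content of vi. The function name, the Hann taper, and the window and hop sizes are all illustrative choices, not the authors' settings.

```python
import numpy as np

def window_power_spectra(signal, window_len, hop):
    """Cut a 1-D signal into overlapping windows and return one
    power-spectrum vector per window (rows = windows w_i)."""
    signal = np.asarray(signal, dtype=float)
    n_windows = 1 + (len(signal) - window_len) // hop
    taper = np.hanning(window_len)  # assumed taper; reduces spectral leakage
    features = []
    for i in range(n_windows):
        frame = signal[i * hop : i * hop + window_len] * taper
        spectrum = np.fft.rfft(frame)           # one-sided FFT of a real frame
        features.append(np.abs(spectrum) ** 2)  # power per frequency bin
    return np.array(features)

# Hypothetical test signal: one second of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)
V = window_power_spectra(tone, window_len=256, hop=128)
print(V.shape)  # (number of windows, r = window_len // 2 + 1)
```

Each row of V plays the role of one vi, so the stacked rows form the kind of feature matrix that a principal component analysis can be run on.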
Classifiers using Abstract Features • Training • Training set of feature vectors • Convert to zero-mean truth set • Top k principal components (using principal component analysis (PCA)) • Classifying • Project new vectors onto eigenbasis • Residuals indicate closeness to a class
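The training and classifying steps above can be sketched as follows. This is a minimal illustration under one assumption the slide does not state: a separate eigenbasis is fitted per class, and a new vector is assigned to the class whose basis leaves the smallest residual. The function names, the class labels "fluent" and "stutter", and the synthetic data are hypothetical.

```python
import numpy as np

def fit_eigenbasis(X, k):
    """Return (mean, basis); basis columns are the top-k principal
    directions of the rows of X."""
    mean = X.mean(axis=0)
    centered = X - mean  # zero-mean training set
    # Rows of Vt are principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return mean, Vt[:k].T

def residual(x, mean, basis):
    """Distance between x and its projection onto the eigenbasis."""
    centered = x - mean
    projection = basis @ (basis.T @ centered)
    return np.linalg.norm(centered - projection)

def classify(x, models):
    """Pick the label whose eigenbasis reconstructs x with least residual."""
    return min(models, key=lambda label: residual(x, *models[label]))

# Synthetic stand-in data: each class varies mostly along one axis.
rng = np.random.default_rng(0)
A = np.outer(rng.normal(size=50), [1.0, 0.0, 0.0]) + 0.01 * rng.normal(size=(50, 3))
B = np.outer(rng.normal(size=50), [0.0, 1.0, 0.0]) + 0.01 * rng.normal(size=(50, 3))
models = {"fluent": fit_eigenbasis(A, k=1), "stutter": fit_eigenbasis(B, k=1)}

print(classify(np.array([2.0, 0.0, 0.0]), models))
```

A single shared eigenbasis with per-class distances in the projected space would be an equally plausible reading of the slide; the per-class variant is used here only because it makes the role of the residual explicit.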
Data • Recorded word: “Ta-Be-Mo-No” • Consonant + vowel sounds • Easy to do segmentation • Use “Ta” portion only • Use voice acting for data collection • Same person • Vary the way the word is spoken • Variance of speaking style • Stuttering • Pausing • Pace • Pitch inflections
Pipeline • [Diagram; stage labels: Voice Acting, Audio Recording Software, Segmentation, Ground Truthing, Abstract Features, Stutter Detection, Signal Duration, Power Spectrum, Eigen-faces, Speech Detection]
Segmentation and Labeling • Automation • Works well for slow, clear cases • Works less well for more realistic cases • Results on slow cases are close to hand segmentation • By Hand • More reliable segmentation at this point • Done with sample counts in Logic 8 • Label each segment with the correct sound
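The slides do not say how the automated segmentation was implemented. As a rough illustration of why slow, clearly separated syllables are the easy case, here is a sketch of a short-time-energy segmenter that returns segment boundaries as sample counts. The function name, frame length, and threshold are assumptions made for the example.

```python
import numpy as np

def segment_by_energy(signal, frame_len, threshold):
    """Return (start, end) sample indices of runs of frames whose
    mean energy exceeds the threshold; end is exclusive."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)
    active = energy > threshold
    # Pad with False so runs touching either end still produce two edges.
    padded = np.concatenate(([False], active, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = edges[0::2], edges[1::2]
    return [(int(s) * frame_len, int(e) * frame_len) for s, e in zip(starts, ends)]

# Hypothetical recording: two tone bursts separated by silence.
burst = lambda n: np.sin(2 * np.pi * 400 * np.arange(n) / 8000)
recording = np.concatenate(
    [np.zeros(1000), burst(2000), np.zeros(1000), burst(1000), np.zeros(500)]
)
print(segment_by_energy(recording, frame_len=100, threshold=0.01))
# [(1000, 3000), (4000, 5000)]
```

With fast or slurred speech the silent gaps shrink or vanish, so a fixed threshold like this one merges neighbouring sounds, which is consistent with hand segmentation being more reliable for the realistic cases.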
Modifications • Use additional features in the Eigen-faces method • Stutter detection • Pauses and spacing within the spoken word • Pitch inflections • Utilize Mel-Cepstrum to pick up features • Substitute Laplacian Eigenmap for PCA
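To make the last modification concrete, the sketch below shows one common formulation of a Laplacian eigenmap (k-nearest-neighbour graph, Gaussian edge weights, generalized eigenproblem L y = λ D y). It is an illustration only; the parameter names and values are assumptions, the random input stands in for real feature vectors, and the code assumes the neighbour graph is connected so that only the first eigenvector is trivial.

```python
import numpy as np

def laplacian_eigenmap(X, n_components, n_neighbors, sigma):
    """Embed the rows of X into n_components dimensions."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    # k-nearest-neighbour graph (column 0 of the sort is the point itself).
    nbrs = np.argsort(sq, axis=1)[:, 1 : n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = nbrs.ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-sq[rows, cols] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)  # symmetrise the adjacency
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalised Laplacian: I - D^(-1/2) W D^(-1/2).
    L_sym = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    # Map back to solutions of L y = lambda D y via y = D^(-1/2) u.
    Y = vecs * d_inv_sqrt[:, None]
    # Drop the first (constant) solution, keep the next n_components.
    return Y[:, 1 : n_components + 1]

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))  # stand-in for 40 feature vectors
Y = laplacian_eigenmap(X, n_components=2, n_neighbors=8, sigma=2.0)
print(Y.shape)
```

Unlike PCA, this embedding has no projection matrix for unseen vectors, so using it for classification would need an out-of-sample extension or a re-fit that includes the new points.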
Results • Features performing well • Blatant stutter detection • Long durations • Spectrum analysis • Good class separability
Conclusions • Eigen-faces work for spoken audio data • More tweaking required • Further research • Mel-Cepstrum features • Laplacian Eigenmapping to replace PCA • May be useful as a front end to Fuzzy-Neuro classifiers
References • Wu, H., Siegel, M., & Khosla, P. (1999). Vehicle sound signature recognition by frequency vector principal component analysis. IEEE Transactions on Instrumentation and Measurement, 48(5). doi: http://dx.doi.org/10.1109/19.799662 • Belkin, M., & Niyogi, P. (2002). Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. • Prahallad, K. Speech Technology: A Practical Introduction. Topic: Spectrogram, Cepstrum and Mel-Frequency Analysis. http://www.speech.cs.cmu.edu/11-492/slides/03_mfcc.pdf