A3: Incremental Specification in Context
Cooperation: LSS, IMS
Grzegorz Dogil, Bin Yang, Wolfgang Wokurek, Stefan Uhlich, Andreas Madsack
Content
• Current research topics:
  • Subglottal resonances
  • Robust speech representation
  • Landmark/Phoneme detection
• Future research direction
Subglottal Resonances
• Topic:
  • Measurement of subglottal resonances
  • Relationship to vowel chart
• Results (in cooperation with Steven Lulich, MIT):
  • Recording of 20 speakers with sensors
  • Analysis of Swabian diphthongs
• Future (until 2010):
  • Measure 50+ speakers with new sensor
  • Calculate the vowel space for these speakers with the first two subglottal (sg) resonances
[Figure labels: G, SG, VT]
Robust Speech Representation
• How can we identify features that are important?
• So far: measuring relevance with the mean squared error
  • A feature is more important if it allows a good reconstruction in the mean-squared-error sense
  • Mathematically more tractable
• General question for the next project phase:
  • How do we measure perceptual relevance?
[Figure: features s1,...,sN → random erasure process → subset of features y1,...,yM → reconstruction → reconstructed features ŝ1,...,ŝN]
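The idea of scoring a feature by how badly its loss hurts reconstruction can be sketched in a few lines. This is only an illustration under assumptions that are not on the slide: the reconstruction is a linear least-squares estimator, exactly one feature is erased at a time, and the names `reconstruction_mse` and `relevance_by_erasure` are hypothetical.

```python
import numpy as np

def reconstruction_mse(S, kept):
    """MSE of reconstructing all features in S (samples x N) from the kept columns.

    Assumption: a linear least-squares map from the kept features to all features.
    """
    S = np.asarray(S, dtype=float)
    Sc = S - S.mean(axis=0)                      # center each feature
    Y = Sc[:, kept]                              # surviving subset y1..yM
    W, *_ = np.linalg.lstsq(Y, Sc, rcond=None)   # least-squares map Y -> S
    S_hat = Y @ W                                # reconstructed features
    return float(np.mean((Sc - S_hat) ** 2))

def relevance_by_erasure(S):
    """Relevance of feature i = reconstruction MSE when only feature i is erased."""
    n = np.asarray(S).shape[1]
    return np.array([
        reconstruction_mse(S, [j for j in range(n) if j != i])
        for i in range(n)
    ])

# Toy data (synthetic, for illustration only): the second feature is a scaled
# copy of the first, the third is independent of both.
rng = np.random.default_rng(0)
a = rng.standard_normal(200)
c = rng.standard_normal(200)
S = np.column_stack([a, 2.0 * a, c])

relevance = relevance_by_erasure(S)
print(relevance)
```

In this toy setup the two redundant features can be recovered from each other, so erasing either costs almost nothing, while erasing the independent feature leaves an error that the remaining features cannot remove.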
Landmark/Phoneme Detection
• Topic:
  • Find relevant features for landmarks/phonemes using statistical evaluation methods
  • Identify the characteristic temporal contour of relevant features
• Features used (only a subset is selected):
  • En. envelopes (A2), Liu bands
  • LPC (LSS), VQP (Wokurek), f0, MFCC, ...
• Results (for different tasks):
  • Segment-wise detection
  • Evaluation of the performance difference between fixed and phoneme-based segmentation
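The difference between fixed and phoneme-based segmentation can be made concrete with a small sketch. It assumes that segment-wise features are plain averages of frame-level features, which the slide does not state; the helper names `fixed_boundaries` and `segment_means` and the phoneme boundaries below are hypothetical.

```python
import numpy as np

def fixed_boundaries(n_frames, seg_len):
    """Segment edges every seg_len frames; the last segment may be shorter."""
    edges = list(range(0, n_frames, seg_len))
    edges.append(n_frames)
    return edges

def segment_means(frames, boundaries):
    """Average frame-level features (frames x features) within each [start, end) segment."""
    frames = np.asarray(frames, dtype=float)
    return np.stack([
        frames[start:end].mean(axis=0)
        for start, end in zip(boundaries[:-1], boundaries[1:])
    ])

# Toy input: 10 frames with 2 features each (synthetic values).
frames = np.arange(20).reshape(10, 2)

fixed_edges = fixed_boundaries(len(frames), seg_len=4)   # [0, 4, 8, 10]
phoneme_edges = [0, 3, 7, 10]                            # hypothetical phoneme boundaries

fixed_feats = segment_means(frames, fixed_edges)
phoneme_feats = segment_means(frames, phoneme_edges)
print(fixed_feats.shape, phoneme_feats.shape)
```

Both segmentations feed the same downstream detector; only the boundaries change, which is what makes the performance comparison between the two possible.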
Future Research Direction (I)
• Identify perceptually relevant regions of speech
Future Research Direction (II)
• Example: /ae/ of handbag
• Exemplars: different versions of perceptually relevant regions for the same phoneme
[Figure labels: Set of all /ae/'s in corpus and corresponding feature values | Feature selection, e.g. find best five features | Statistical classifier | not covered, i.e. 20 % | New exemplar, i.e. coverage of 80 %]
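One way to read the pipeline on this slide (select a few features, classify, check how many tokens are covered) is sketched below. Everything concrete here is an assumption for illustration: the Fisher score as selection criterion, the nearest-centroid classifier, the synthetic data, and the names `fisher_scores`, `select_best` and `coverage`. The slide does not specify any of these.

```python
import numpy as np

def fisher_scores(X, y):
    """Per-feature Fisher score for binary labels y (True = target phoneme)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y).astype(bool)
    pos, neg = X[y], X[~y]
    between = (pos.mean(axis=0) - neg.mean(axis=0)) ** 2
    within = pos.var(axis=0) + neg.var(axis=0) + 1e-12   # avoid division by zero
    return between / within

def select_best(X, y, k=5):
    """Indices of the k features with the highest Fisher score."""
    return np.argsort(fisher_scores(X, y))[::-1][:k]

def coverage(X, y, selected):
    """Fraction of target tokens that a nearest-centroid classifier assigns to
    the target class, plus the row indices of the tokens that are not covered."""
    X = np.asarray(X, dtype=float)[:, selected]
    y = np.asarray(y).astype(bool)
    c_pos, c_neg = X[y].mean(axis=0), X[~y].mean(axis=0)
    d_pos = np.linalg.norm(X[y] - c_pos, axis=1)
    d_neg = np.linalg.norm(X[y] - c_neg, axis=1)
    covered = d_pos < d_neg
    return float(covered.mean()), np.flatnonzero(y)[~covered]

# Synthetic corpus: 300 tokens, 12 features; only features 0..4 separate the
# target phoneme from the rest (illustrative data, not measurements).
rng = np.random.default_rng(1)
y = np.arange(300) < 150
X = rng.standard_normal((300, 12))
X[y, :5] += 3.0

selected = select_best(X, y, k=5)
cov, uncovered = coverage(X, y, selected)
print(sorted(selected.tolist()), cov, len(uncovered))
```

Under this reading, the tokens returned in `uncovered` would be the candidates from which a new exemplar is formed.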
Future Research Direction (III)
• Work packages:
  • Regions: How to identify perceptually relevant regions in the (t,f)-plane?
  • Feature extraction: IMS, LSS (part already done), robustness
  • Feature selection: IMS (phonetically motivated), LSS (statistically motivated) + combination of both + memory decay
  • Evaluation: Are the identified regions relevant?
    • ... for speech representation in context? (IMS)
    • ... for usage-induced context information? (LSS)
  • Transition to higher levels (pitch-accents (A1), syllables (A2), words (A4))
References
• Subglottal Resonances
  • W. Wokurek and A. Madsack (2008), Messung subglottaler Resonanzen mit Beschleunigungssensoren, Fortschritte der Akustik - DAGA 2008 (Dresden), pp. 125-126
  • A. Madsack, S. Lulich, W. Wokurek and G. Dogil (2008), Subglottal Resonances and Vowel Formant Variability: A Case Study of High German Monophthongs and Swabian Diphthongs, LabPhon11, Wellington
• Robust Speech Representation, Incremental Specification
  • M. Lugger and B. Yang (2007), An incremental analysis of different feature groups in speaker independent emotion recognition, Proc. ICPhS 2007
  • S. Uhlich and B. Yang (2008), A generalized optimal correlating transform for multiple description coding and its theoretical analysis, Proc. IEEE ICASSP 2008
  • R. Blind, S. Uhlich, B. Yang and F. Allgöwer, Robustification and Optimization of a Kalman Filter with Measurement Loss using Linear Precoding, submitted to Proc. ACC 2009
  • M. Lugger and B. Yang, Psychological Motivated Multi-Stage Emotion Classification Exploiting Voice Quality Features, to be published in: Speech Recognition, I-Tech Education and Publishing, Vienna, Austria
• Landmark/Phoneme Detection
  • A. Madsack, G. Dogil, S. Uhlich, Y. Zeng and B. Yang, On the Importance of Timing Information in Plosive Detection, submitted to Proc. ICASSP 2009