
A3: Incremental Specification in Context Cooperation: LSS, IMS

A3: Incremental Specification in Context. Cooperation: LSS, IMS. Grzegorz Dogil, Bin Yang, Wolfgang Wokurek, Stefan Uhlich, Andreas Madsack. Content: current research topics (subglottal resonances, robust speech representation, landmark/phoneme detection) and future research direction.


Presentation Transcript


  1. A3: Incremental Specification in Context
     Cooperation: LSS, IMS
     Grzegorz Dogil, Bin Yang, Wolfgang Wokurek, Stefan Uhlich, Andreas Madsack

  2. Content
     • Current research topics:
       • Subglottal resonances
       • Robust speech representation
       • Landmark/Phoneme detection
     • Future research direction

  3. Subglottal Resonances
     • Topic:
       • Measurement of subglottal resonances
       • Relationship to vowel chart
     • Results (in cooperation with Steven Lulich, MIT):
       • Recordings of 20 speakers with sensors
       • Analysis of Swabian diphthongs
     • Future (until 2010):
       • Measure 50+ speakers with new sensor
       • Calculate the vowel space for these speakers with the first two subglottal (sg) resonances
     [Figure labels: G, SG, VT]

  4. Robust Speech Representation
     • How can we identify features that are important?
     • So far: Measuring relevance with mean squared error
       • A feature is more important if it allows a good reconstruction in the mean-squared-error sense
       • Mathematically more tractable
     • General question for the next project phase:
       • How do we measure perceptual relevance?
     [Figure: Features s1,...,sN; Random Erasure Process; Subset of Features y1,...,yM; Reconstruction; Reconstructed Features ŝ1,...,ŝN]
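The mean-squared-error notion of relevance on this slide can be illustrated with a small sketch. It is not the authors' method: it assumes the simplest possible case, where all features but one are erased and every feature is then reconstructed from the single kept feature with the best affine (linear MMSE) estimator. The function names `lmmse_mse` and `relevance_ranking` are hypothetical and exist only for this illustration.

```python
from statistics import fmean


def lmmse_mse(target, observed):
    """MSE of the best affine estimate of `target` from one `observed` feature.

    Uses the textbook result var(t) - cov(t, o)^2 / var(o),
    with population (1/n) moments estimated from the samples.
    """
    mt, mo = fmean(target), fmean(observed)
    var_t = fmean((t - mt) ** 2 for t in target)
    var_o = fmean((o - mo) ** 2 for o in observed)
    cov = fmean((t - mt) * (o - mo) for t, o in zip(target, observed))
    if var_o == 0.0:
        return var_t  # a constant feature carries no information
    return var_t - cov * cov / var_o


def relevance_ranking(features):
    """Rank features by how well each one alone reconstructs all features.

    `features` maps a feature name to a list of samples (equal lengths).
    Assumption for this sketch: the erasure keeps exactly one feature.
    Returns (names sorted from most to least relevant, total MSE per name).
    """
    scores = {
        kept: sum(lmmse_mse(target, features[kept]) for target in features.values())
        for kept in features
    }
    return sorted(scores, key=scores.get), scores
```

A lower total reconstruction error means the kept feature explains more of the full feature set, so it ranks as more relevant in the MSE sense. Handling a kept subset of M > 1 features would need a multivariate estimator, which this sketch leaves out.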

  5. Landmark/Phoneme Detection
     • Topic:
       • Find relevant features for landmarks/phonemes using statistical evaluation methods
       • Identify the characteristic temporal contour of relevant features
     • Used features (only a subset is selected):
       • En. envelopes (A2), Liu bands
       • LPC (LSS), VQP (Wokurek), f0, MFCC, ...
     • Results (for different tasks):
       • Segment-wise detection
       • Evaluation of the performance difference between fixed and phoneme-based segmentation
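To make the difference between fixed and phoneme-based segmentation concrete, the following sketch averages a per-frame feature contour over both kinds of segments. The slides do not say how segments were formed or summarised, so the helper names, the use of a plain mean, and the example boundaries are all assumptions made for illustration.

```python
def fixed_segments(n_frames, seg_len):
    """Consecutive (start, end) frame ranges of seg_len frames; the last may be shorter."""
    return [(start, min(start + seg_len, n_frames))
            for start in range(0, n_frames, seg_len)]


def segment_means(frames, segments):
    """Average a per-frame feature contour over each (start, end) segment."""
    return [sum(frames[a:b]) / (b - a) for a, b in segments]


# Hypothetical per-frame feature contour (e.g. one band energy per frame).
contour = [0.0, 0.0, 1.0, 1.0, 1.0, 4.0]

# Fixed segmentation: every segment has the same length, regardless of content.
fixed = fixed_segments(len(contour), seg_len=4)

# Phoneme-based segmentation: boundaries come from a (hypothetical) phoneme alignment.
phoneme_based = [(0, 2), (2, 5), (5, 6)]

fixed_summary = segment_means(contour, fixed)
phoneme_summary = segment_means(contour, phoneme_based)
```

With fixed segments, one segment can straddle a phoneme boundary and blur the contour, whereas phoneme-based segments keep each summary within a single phoneme. This is one plausible reason the two segmentations could perform differently in segment-wise detection.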

  6. Future Research Direction (I)
     • Identify perceptually relevant regions of speech

  7. Future Research Direction (I)
     • Identify perceptually relevant regions of speech

  8. Future Research Direction (II)
     • Example: /ae/ of handbag
     • Exemplars: Different versions of perceptually relevant regions for the same phoneme
     [Figure labels: Set of all /ae/'s in corpus and corresponding feature values; Feature Selection, e.g. find best five features; Statistical Classifier; New Exemplar, i.e. coverage of 80 %; not covered, i.e. 20 %]
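The "find best five features" step can be sketched as a simple filter-style feature selection. The slides do not name a selection criterion, so the Fisher score used here is a swapped-in, commonly used choice. The names `fisher_score` and `select_best` are hypothetical, as is the split into tokens of the target phoneme and all other tokens.

```python
from statistics import fmean, pvariance


def fisher_score(pos, neg):
    """Squared distance between class means divided by the summed class variances."""
    gap = (fmean(pos) - fmean(neg)) ** 2
    spread = pvariance(pos) + pvariance(neg)
    if spread == 0.0:
        # Both classes are constant: perfectly separable if the means differ.
        return float("inf") if gap > 0.0 else 0.0
    return gap / spread


def select_best(features_pos, features_neg, k=5):
    """Return the k feature names with the highest Fisher score.

    `features_pos` / `features_neg` map a feature name to its values for
    tokens of the target phoneme (e.g. /ae/) and for all other tokens.
    Both dicts are assumed to share the same keys.
    """
    scores = {name: fisher_score(features_pos[name], features_neg[name])
              for name in features_pos}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

The selected features would then be passed to the statistical classifier. Tokens that the classifier does not cover could seed a new exemplar, as the figure suggests, but that loop is not part of this sketch.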

  9. Future Research Direction (III)
     • Work packages:
       • Regions: How to identify perceptually relevant regions in the (t,f)-plane?
       • Feature extraction: IMS, LSS (part already done), robustness
       • Feature selection: IMS (phonetically motivated), LSS (statistically motivated) + combination of both + memory decay
       • Evaluation: Are the identified regions relevant?
         • ... for speech representation in context? (IMS)
         • ... for usage-induced context information? (LSS)
       • Transition to higher levels (pitch-accents (A1), syllables (A2), words (A4))

  10. References
     • Subglottal Resonances
       • W. Wokurek and A. Madsack (2008), Messung subglottaler Resonanzen mit Beschleunigungssensoren, Fortschritte der Akustik - DAGA 2008 (Dresden), pp. 125-126
       • A. Madsack, S. Lulich, W. Wokurek and G. Dogil (2008), Subglottal Resonances and Vowel Formant Variability: A Case Study of High German Monophthongs and Swabian Diphthongs, LabPhon11, Wellington
     • Robust Speech Representation, Incremental Specification
       • M. Lugger and B. Yang (2007), An incremental analysis of different feature groups in speaker independent emotion recognition, Proc. ICPhS 2007
       • S. Uhlich and B. Yang (2008), A generalized optimal correlating transform for multiple description coding and its theoretical analysis, Proc. IEEE ICASSP 2008
       • R. Blind, S. Uhlich, B. Yang and F. Allgöwer, Robustification and Optimization of a Kalman Filter with Measurement Loss using Linear Precoding, submitted to Proc. ACC 2009
       • M. Lugger and B. Yang, Psychological Motivated Multi-Stage Emotion Classification Exploiting Voice Quality Features, to be published in: Speech Recognition, I-Tech Education and Publishing, Vienna, Austria
     • Landmark/Phoneme Detection
       • A. Madsack, G. Dogil, S. Uhlich, Y. Zeng and B. Yang, On the Importance of Timing Information in Plosive Detection, submitted to Proc. ICASSP 2009
