
SPEECH VARIATION AND THE USE OF DISTANCE METRICS ON THE ARTICULATORY FEATURE SPACE Louis ten Bosch

Presentation Transcript


  1. SPEECH VARIATION AND THE USE OF DISTANCE METRICS ON THE ARTICULATORY FEATURE SPACE • Louis ten Bosch

  2. Contents • Introduction • Objectives • Articulatory Features • Speech Material • Experimental details • set-up • Results • Questions, future plans

  3. Introduction • Speech is usually represented as sequences drawn from a limited set of phone-like symbols (ASR, synthesis, annotation) • ‘Beads-on-a-string’ paradigm (Ostendorf, 1999; etc.) • Powerful as a meta-description • Weak at describing articulatory variation and pronunciation variation • Research on new descriptions & models of speech • Many proposals for new signal representations (continuity-preserving, auditorily inspired) and new models (neural models, long-span models, parallel models) • Here: articulatory features (AF)

  4. Objectives • To obtain alternative representations that intrinsically better model variation in speech • Focus on articulatory/pronunciation variation • To investigate the relation between better representations and decoding

  5. Articulatory Features (AFs) • AF advantages are twofold: • Allow feature asynchrony • Deal with ‘incompleteness’: incomplete nasalization, voicing • Intrinsically better modelling of continuous processes • Assumed to better model fine phonetic details (FPD) • FPD mediate human speech processing (lexical access) • [together with indexical information]

  6. Distance Metric in AF Space • Each utterance is a path in AF space • A distance metric in AF space defines the ‘speed’ along the path • Compare with delta-features in ASR • Speed peak detection imposes an intrinsic temporal structure • Which distances to use? • Three types (L1, L2, cosine) • How does this ‘intrinsic’ temporal structure relate to external temporal structure, e.g. phone boundaries?
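
The idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the setup used in the talk: the AF path is taken to be a list of frame-level vectors (for instance classifier outputs), the ‘speed’ is taken to be the distance between consecutive frames, and a peak is taken to be a simple local maximum. The toy path and the names `speed` and `peaks` are hypothetical.

```python
import math

def l1(u, v):
    """City-block (L1) distance between two AF vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def l2(u, v):
    """Euclidean (L2) distance between two AF vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine(u, v):
    """Cosine distance, 1 - cos(angle); defined as 0.0 if a vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return 1.0 - dot / (norm_u * norm_v)

def speed(path, dist):
    """Distance between consecutive frames: one value per frame transition."""
    return [dist(path[t], path[t + 1]) for t in range(len(path) - 1)]

def peaks(values, threshold=0.0):
    """Indices of interior local maxima above a threshold (assumed peak definition)."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i] > threshold
        and values[i] > values[i - 1]
        and values[i] >= values[i + 1]
    ]

# Hypothetical 3-dimensional AF path with one abrupt change between frames 2 and 3.
toy_path = [
    [0.9, 0.1, 0.0],
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.2, 0.8, 0.0],
    [0.1, 0.9, 0.0],
    [0.1, 0.9, 0.0],
]

for name, dist in (("L1", l1), ("L2", l2), ("cosine", cosine)):
    print(name, peaks(speed(toy_path, dist)))
```

On this toy path all three metrics place the single speed peak at the same transition; on real AF trajectories the peak locations need not coincide.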

  7. Articulatory Features and Their Values

  8. Speech Material • IFA corpus (Dutch, read + prepared, 8 speakers, 6 used for training and development, 2 for test) • Many different rich annotation levels

  9. AF Classification Results by ANNs

  10. AF-Based Events and Segment Boundaries

  11. Alignment Results • Number of hits (detected -> observed) versus time window size: Wesenick & Kipp ‘96
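
A hit count of this kind can be sketched as follows. The counting rule is an assumption, since the slide does not spell it out: a detected event counts as a hit if any observed (manual) boundary lies within plus or minus `window` frames, without enforcing one-to-one matching. The frame indices and the name `count_hits` are hypothetical.

```python
def count_hits(detected, observed, window):
    """Number of detected events with an observed boundary within +/- window frames."""
    return sum(
        1 for d in detected if any(abs(d - o) <= window for o in observed)
    )

# Hypothetical frame indices of AF-based events and manual boundaries.
detected_events = [10, 25, 41, 60]
manual_boundaries = [12, 24, 40, 75]

# Hit count as a function of the tolerance window.
for window in (1, 2, 15):
    print(window, count_hits(detected_events, manual_boundaries, window))
```

Widening the window can only keep or increase the number of hits, so the curve of hits versus window size is non-decreasing.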

  12. Asynchrony and Phonetic Classes • Average (in number of frames) and standard deviation of the difference (diff.) between cosine-peak location and manual boundary. Only the transitions with extreme negative and positive distances are shown.
      Manner transition       avg. (st.dev.)
      Fricative-fricative     -0.57 (1.6)
      Vowel-vowel             -0.31 (1.8)
      ….
      Silence-approximant      0.49 (1.8)
      Approximant-stop         0.63 (1.6)
      Vowel-silence            0.64 (2.1)
      Nasal-approximant        0.66 (1.0)

  13. Open questions 1 • To what extent does the type of distance (L1, L2, cosine) affect the fine detail of the alignment with manual segmentation? • For distances close to 0, all metrics will provide about the same result • The metrics deviate for larger distances, thereby putting more weight on different types of distinctions • This means that event parsing along the AF trajectory may result in essentially different segmentations for different metrics.
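
A small numerical illustration of how metrics can weight distinctions differently, using hypothetical AF vectors that are not from the talk: one change is concentrated in a single feature, the other is spread evenly over three features. L1 ranks the spread change as larger, L2 ranks the concentrated change as larger.

```python
import math

def l1(u, v):
    """City-block (L1) distance."""
    return sum(abs(a - b) for a, b in zip(u, v))

def l2(u, v):
    """Euclidean (L2) distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical AF vectors (three feature values per frame).
start = [0.2, 0.2, 0.2]
concentrated = [0.8, 0.2, 0.2]   # one feature changes by 0.6
spread = [0.45, 0.45, 0.45]      # every feature changes by 0.25

print("L1:", l1(start, concentrated), l1(start, spread))
print("L2:", l2(start, concentrated), l2(start, spread))
```

If segment boundaries are placed at the largest frame-to-frame distances, such rank reversals are one way in which different metrics could yield different segmentations of the same trajectory.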

  14. Open questions 2 • What about cue trading (by using weights)? • Difficult, depends on the phone • What about the precise quantification of asynchrony? • The variation of observed AF vectors around a canonical AF vector = feature asynchrony + the variation in the classifier output

  15. Near-future plans • Exploit the phenomena described here as design principles for alternative procedures for data-driven annotation and unit selection • Design a word recognition framework based on an AF representation of speech • Study usability for memory-prediction models

  16. Thank you for your attention
