
Acoustic Landmarks and Speech Information

This talk explores where speech information is located according to acoustic phonetics and psychophysics, and what that implies for speech recognition. It discusses a typology of acoustic landmarks and the encoding of phonemes using distinctive features. Infograms and two-point infograms are introduced as tools for measuring how densely speech information is distributed around landmarks. The talk concludes with an overview of the landmark-synchronous Baum-Welch algorithm and its potential advantages.

croucha




Presentation Transcript


  1. Speech Information at Acoustic Landmarks
     Mark Hasegawa-Johnson
     Electrical and Computer Engineering, UIUC

  2. Outline of this Talk
     • Where is Speech Information?
     • A Typology of Acoustic Landmarks
     • Phoneme Encoding with Distinctive Features
     • Infograms
     • Two-point Infograms
     • Entropy and Average Classification
     • Landmark-Synchronous Baum-Welch

  3. Where is Speech Information?

  4. Where is Speech Information?
     • According to acoustic phonetics:
        • Vowel information is in the STEADY STATE.
        • Consonant information is near TRANSITIONS.
     • According to speech psychophysics:
        • Speech without the steady state is intelligible.
        • Speech without transitions is unintelligible.
     • According to speech recognition:
        • Transition-anchored observations outperform segment-anchored observations.

  5. Typology of Acoustic Landmarks
     • ACOUSTIC LANDMARK = a perceptually salient acoustic event near which phonetic information is dense.
     • Examples:
        • Consonant release: “ga, ma, sa”
        • Consonant closure: “egg, em, ess”
        • Manner change: “agfa, anfa, asfa, asna”
     • Counter-examples (non-landmarks):
        • Place change: “agda, aftha, amna, this ship”

  6. Information about a GLIDE is most dense at the point of maximum constriction:
     • Middle of an intersyllabic glide: “aya, a letter”
     • Start of a syllable-initial glide: “tra”
     • End of a syllable-final glide: “art”

  7. Information about a VOWEL is dense near:
     • 1. The on-glide and off-glide (GLIDE landmarks!)
     • 2. The VOWEL LANDMARK: a reference time picked near the center of the “steady state”

  8. Acoustic Landmarks Chosen for the Infogram Experiments
     • Releases, closures, manner changes: as marked in TIMIT
     • Glide, flap, or /h/ “pivot landmark”:
        • Syllable-initial: at the START segment boundary
        • Syllable-final: at the END segment boundary
        • Intersyllabic: halfway through the segment
     • Vowel “pivot landmark”:
        • 33% of the distance from START to END
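The placement rules on this slide can be stated as a function of a segment's START and END boundaries. The sketch below is a minimal illustration, not the code used in the experiments: the function name `pivot_landmark` and its `kind`/`position` arguments are hypothetical, and it assumes the syllable position of each glide, flap, or /h/ is already known (the slide does not say how that was determined).

```python
def pivot_landmark(kind, start, end, position=None):
    """Return the pivot landmark time, in the same units as start/end.

    kind: "glide" (also used here for flaps and /h/) or "vowel".
    position: for glides, "initial", "final", or "intersyllabic"
              (hypothetical labels; syllable position is assumed known).
    """
    if end < start:
        raise ValueError("segment END precedes START")
    if kind == "vowel":
        # Vowel pivot: one third of the way from START to END.
        return start + (end - start) / 3.0
    if kind == "glide":
        if position == "initial":
            return start                  # syllable-initial: START boundary
        if position == "final":
            return end                    # syllable-final: END boundary
        if position == "intersyllabic":
            return (start + end) / 2.0    # halfway through the segment
        raise ValueError(f"unknown glide position: {position!r}")
    raise ValueError(f"unknown segment kind: {kind!r}")


print(pivot_landmark("vowel", 1200, 2700))
```

For a vowel spanning samples 1200 to 2700, the pivot landmark falls at sample 1700.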

  9. Phoneme Encoding for Infogram Experiments
     • Encode with binary distinctive features, e.g.
        /s/ = [+consonantal, -sonorant, +continuant, +strident, +blade, +anterior, -distributed, +stiffvocalfolds, -grave, +fricative]
     • Feature hierarchy: the infogram for a feature is based only on syllables for which that feature is salient.
     • Redundant features: a partial solution to the problem of context-dependent acoustic implementation.
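A binary feature vector with d ∈ {-1, +1} is a convenient machine form of the bracket notation. The sketch below parses the /s/ specification from the slide into such a vector. The names `FEATURES`, `parse`, and `encode`, the feature ordering, and the use of 0 for an unspecified feature are all assumptions made for illustration.

```python
# Feature order for the vector encoding (the ordering is an assumption).
FEATURES = ["consonantal", "sonorant", "continuant", "strident", "blade",
            "anterior", "distributed", "stiffvocalfolds", "grave", "fricative"]


def parse(spec_text):
    """Parse '[+consonantal, -sonorant, ...]' into {name: +1 or -1}."""
    spec = {}
    for item in spec_text.strip("[] ").split(","):
        item = item.strip()
        if not item:
            continue
        if item[0] not in "+-":
            raise ValueError(f"feature value must start with + or -: {item!r}")
        spec[item[1:]] = 1 if item[0] == "+" else -1
    return spec


def encode(spec, features=FEATURES):
    """Map {name: +1/-1} to a vector in FEATURES order.

    0 marks a feature the segment leaves unspecified (our assumption for
    features that are not salient under the feature hierarchy).
    """
    return [spec.get(name, 0) for name in features]


S = parse("[+consonantal, -sonorant, +continuant, +strident, +blade, "
          "+anterior, -distributed, +stiffvocalfolds, -grave, +fricative]")
print(encode(S))
```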

  10. Distinctive Features in TIMIT
     • The features of a segment are determined from the left, center, and right phones.
     • Once determined, the features are attached to every landmark caused by the center phoneme.
     (Figure: example landmarks labeled [-sonorant], [-continuant], [+strident].)
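The bookkeeping on this slide (features decided per segment, then copied onto every landmark that segment caused) can be sketched as follows. The `Landmark` class, `attach_features`, and the toy rule are hypothetical. In particular the toy rule ignores the left and right context, whereas the slide's rule uses it, and its three feature values are simply copied from the /s/ entry on slide 9; the landmark time is an arbitrary illustrative value.

```python
from dataclasses import dataclass, field


@dataclass
class Landmark:
    time: float        # landmark time, e.g. in TIMIT samples
    kind: str          # e.g. "release", "closure", "pivot"
    phone_index: int   # index of the phone (center phoneme) that caused it
    features: dict = field(default_factory=dict)


def attach_features(phones, landmarks, feature_rule):
    """Copy the center phoneme's features onto each landmark it caused.

    feature_rule(left, center, right) -> {feature: +1 or -1} stands in for
    the context-dependent feature lookup; left/right are None at the edges.
    """
    for lm in landmarks:
        i = lm.phone_index
        left = phones[i - 1] if i > 0 else None
        right = phones[i + 1] if i + 1 < len(phones) else None
        lm.features = dict(feature_rule(left, phones[i], right))
    return landmarks


# Toy, context-free rule (hypothetical; a real rule would use left/right).
TOY_TABLE = {"s": {"sonorant": -1, "continuant": +1, "strident": +1}}


def toy_rule(left, center, right):
    return TOY_TABLE.get(center, {})


phones = ["eh", "s"]                                   # "ess"
landmarks = [Landmark(time=2400, kind="closure", phone_index=1)]
attach_features(phones, landmarks, toy_rule)
print(landmarks[0].features)
```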

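  11. Infograms
     • The joint probability p(x,d) is estimated from the TIMIT TRAIN corpus.
     • The feature takes value d (-1 or +1).
     • The spectrogram X(t,f), with t = 0 at the landmark, takes value x, quantized to 23 levels.
     • The infogram is the mutual information between d and x:
       $$I_D(t,f) = \sum_{d \in \{-1,+1\}} \sum_{x} p(x,d)\,\log_2 \frac{p(x,d)}{p(x)\,p(d)}$$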
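A plug-in estimate of the infogram counts joint occurrences of the quantized spectral value and the feature value at each time-frequency cell, then evaluates the mutual-information sum. The sketch below assumes uniform quantization over each cell's observed range (the slide specifies only 23 levels) and takes tokens already cut from the spectrogram and aligned so that a given index t is the same offset from the landmark in every token. `quantize`, `mutual_information`, and `infogram` are hypothetical names.

```python
import math
from collections import Counter


def quantize(values, levels=23):
    """Uniformly quantize values into `levels` bins over their observed range
    (the binning scheme is an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / levels
    return [min(int((v - lo) / width), levels - 1) for v in values]


def mutual_information(xs, ds):
    """I(D; X) in bits from paired discrete samples (empirical joint PMF)."""
    n = len(xs)
    joint = Counter(zip(xs, ds))
    px, pd = Counter(xs), Counter(ds)
    mi = 0.0
    for (x, d), count in joint.items():
        p_xd = count / n
        mi += p_xd * math.log2(p_xd / ((px[x] / n) * (pd[d] / n)))
    return mi


def infogram(tokens, ds, levels=23):
    """Return I_D(t, f) for every cell.

    tokens: one 2-D list X[t][f] per landmark token, all landmark-aligned;
    ds: the +1/-1 feature value of each token.
    """
    n_t, n_f = len(tokens[0]), len(tokens[0][0])
    out = [[0.0] * n_f for _ in range(n_t)]
    for t in range(n_t):
        for f in range(n_f):
            xs = quantize([X[t][f] for X in tokens], levels)
            out[t][f] = mutual_information(xs, ds)
    return out
```

With 23 levels and two feature values, each cell's estimate rests on a 46-bin histogram, so plug-in estimates for rare features will be noisy and biased upward; this is a general property of such estimators rather than something the slides discuss.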

  12. Infograms: Manner Features

  13. Infograms: Place Features

  14. Infograms: Vowel Features

  15. Two-Point Infogram
     • Find the point (t1,f1) that maximizes $I_D(t_1,f_1)$.
     • Find the joint PMF of d, x, and $X(t_2,f_2) = y$.
     • Calculate the two-point infogram:
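The greedy second step can be sketched by treating the pair (x, y) as one discrete variable and choosing the (t2, f2) that maximizes the joint information I(D; X(t1,f1), X(t2,f2)). By the chain rule this equals I_D(t1,f1) plus the conditional information I(D; X(t2,f2) | X(t1,f1)), so maximizing either quantity over (t2, f2) selects the same point. The slide's exact two-point formula is not reproduced here, so the joint form is an assumption; `best_second_point` and its arguments are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ds):
    """I(D; X) in bits from paired discrete samples (empirical joint PMF)."""
    n = len(xs)
    joint = Counter(zip(xs, ds))
    px, pd = Counter(xs), Counter(ds)
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (pd[d] / n)))
               for (x, d), c in joint.items())


def best_second_point(x1, candidates, ds):
    """Greedy step: given quantized values x1 at the best one-point location,
    return the candidate key whose values y maximize I(D; X1, Y).

    candidates: {(t2, f2) or any label: list of quantized values y}.
    """
    best_key, best_mi = None, -1.0
    for key, ys in candidates.items():
        mi = mutual_information(list(zip(x1, ys)), ds)  # (x, y) as one symbol
        if mi > best_mi:
            best_key, best_mi = key, mi
    return best_key, best_mi


print(best_second_point([0, 0, 1, 1],
                        {"a": [0, 0, 0, 0], "b": [0, 1, 0, 1]},
                        [-1, 1, 1, -1]))
```

In this toy example d = -1 exactly when x and y agree, so neither point is informative on its own, yet the pair determines d completely; this is the kind of interaction a two-point infogram can reveal and a one-point infogram cannot.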

  16. Infograms: One-Point, Two-Point

  17. Conditional Entropy and Average Classification
     • Infogram: $I_D(t,f) = H_D - H_{D|X(t,f)}$
     • A priori entropy $H_D$:
        • $\min_d p(d) = f(H_D)$ is the classification error probability given NO OBSERVATIONS, where $f$ inverts the binary entropy function on $[0, 0.5]$.
     • Conditional entropy $H_{D|X(t,f)}$:
        • $f(H_{D|X(t,f)})$ is similar to the log-average error probability given ONE OBSERVATION at time t, frequency f.

  18. “Average” Classification Error vs. Entropy
     • [strident] a priori: 0.89 bits, p(error) = 0.31
     • [strident] given one measurement: 0.29 bits, p(error) = 0.05
     • [strident] given two measurements: 0.22 bits, p(error) = 0.04
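For a binary feature, the map f on slide 17 is the inverse of the binary entropy function h restricted to error probabilities in [0, 0.5]: with no observation the best guess errs with probability min_d p(d), and the entropy of a binary variable equals h(min_d p(d)). The sketch below inverts h by bisection and applies it to the three [strident] entropies. Applied to a conditional entropy, f gives the "average" error of slide 17 rather than an exact error rate. The function names are hypothetical.

```python
import math


def binary_entropy(p):
    """h(p) in bits for a binary variable with P(d = +1) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def error_from_entropy(h, tol=1e-10):
    """f(h): the error probability p in [0, 0.5] with binary_entropy(p) = h.

    binary_entropy is increasing on [0, 0.5], so bisection finds the root.
    """
    if not 0.0 <= h <= 1.0:
        raise ValueError("binary entropy must lie in [0, 1] bits")
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_entropy(mid) < h:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for h in (0.89, 0.29, 0.22):
    print(f"{h:.2f} bits -> p(error) = {error_from_entropy(h):.2f}")
```

By our arithmetic the three printed values round to 0.31, 0.05 and 0.04, matching the slide.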

  19. From Classification to Recognition
     • The histogram gives a classification probability $b_{t_i}(d_i) = p(d_i \mid L_i, t_i, X_i(\tau,f))$, where:
        • $d_i$ = vector of feature values, e.g. $d_i$ = [+consonantal, -sonorant, +continuant, ...]
        • $L_i$ = landmark type, e.g. release, closure, pivot
        • $t_i$ = landmark time
        • $X_i(\tau,f)$ = spectrogram, with $\tau = t - t_i$
     • Duration probabilities define a transition $a(t_i,t_k) = p(L_k \text{ at } t_k \mid L_i \text{ at } t_i)$.
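The slide says only that duration probabilities define a(t_i, t_k). One minimal way to obtain them, assumed here for illustration, is a relative-frequency histogram of the time between consecutive landmarks, kept separately for each pair of landmark types. `train_durations` and `transition` are hypothetical names, and no smoothing is applied, so durations never seen in training get probability 0.

```python
from collections import Counter, defaultdict


def train_durations(examples, frame=1):
    """Histogram of inter-landmark durations for each (L_i, L_k) type pair.

    examples: iterable of (L_i, t_i, L_k, t_k) for consecutive landmarks.
    Durations are counted in units of `frame` (e.g. samples or 10 ms frames).
    """
    hist = defaultdict(Counter)
    for li, ti, lk, tk in examples:
        hist[(li, lk)][round((tk - ti) / frame)] += 1
    return hist


def transition(hist, li, ti, lk, tk, frame=1):
    """a(t_i, t_k) = p(L_k at t_k | L_i at t_i), read from the histogram."""
    counts = hist.get((li, lk))
    if not counts:
        return 0.0
    return counts[round((tk - ti) / frame)] / sum(counts.values())


hist = train_durations([("closure", 0, "release", 5),
                        ("closure", 10, "release", 15),
                        ("closure", 0, "release", 7)])
print(transition(hist, "closure", 100, "release", 105))
```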

  20. Landmark-Synchronous Baum-Welch Algorithm
     • Probability of the transcription $[L_1, d_1, L_2, d_2, \ldots]$ given an observation matrix $X$:
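Assuming this probability factors into the classification terms b and the duration terms a from slide 19, summed over every possible set of landmark times, the sum can be computed by a forward recursion whose outer loop runs over landmarks rather than frames, mirroring the contrast drawn on slide 21. The sketch below is that assumed recursion, not the author's implementation; `landmark_forward`, `initial`, and the indexing of `b` and `a` are hypothetical.

```python
def landmark_forward(b, a, initial):
    """Sum over increasing landmark times of
    initial(t_1) * prod_i b_i(t_i) * prod_{i>1} a(t_{i-1}, t_i).

    b[i][t]   : classification term for landmark i placed at frame t
    a(i, s, t): duration term for landmark i at t given landmark i-1 at s
    initial[t]: probability that the first landmark is at frame t
    Returns (total probability, final forward vector).
    """
    n_frames = len(b[0])
    # Landmark index is the independent variable; time is the hidden one.
    alpha = [initial[t] * b[0][t] for t in range(n_frames)]
    for i in range(1, len(b)):
        new = [0.0] * n_frames
        for t in range(n_frames):
            # Landmark i must follow landmark i-1, so only s < t is allowed.
            total = sum(alpha[s] * a(i, s, t) for s in range(t))
            new[t] = b[i][t] * total
        alpha = new
    return sum(alpha), alpha


b = [[0.5, 0.2, 0.1], [0.1, 0.6, 0.3]]
a = lambda i, s, t: 0.5 if t - s == 1 else 0.25   # toy duration model
total, alpha = landmark_forward(b, a, [1 / 3] * 3)
print(total)
```

The nested loops cost O(N·T²) for N landmarks and T candidate frames; limiting s to a window of plausible durations before t would reduce this, at the price of assuming a maximum duration.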

  21. What’s Different About the LM-Synchronous Baum-Welch?
     • Traditional:
        • Time is the independent variable: t = 1, 2, 3, ...
        • The phonetic state is the dependent variable, governed by transition probabilities $a_{ik}$.
     • Landmark-synchronous:
        • The phonetic landmark is the independent variable: $L_i = L_1, L_2, L_3, \ldots$
        • Time is the dependent variable, governed by transition probabilities $a(t_i,t_k)$.

  22. Potential Advantages of LM-Synchronous Baum-Welch
     • Detailed modeling of the spectrogram near landmarks:
        • possibly better ACOUSTIC MODELING.
     • Explicit timing models:
        • explicit, efficient, fully integrated recognition of PROSODY.
     • $b_t(d)$ estimated from a long window:
        • possible use of asynchronous cues, e.g. AUDIO-VISUAL integration.

  23. Conclusions
     • Speech information is dense near landmarks.
     • The infogram displays the distribution of information:
        • one-point spectral information;
        • two-point information, using a greedy algorithm.
     • Recognition may be possible using landmark-synchronous Baum-Welch.
