Recognition of Prosodic Factors and Detection of Landmarks for Improvements to Continuous Speech Recognition Sarah Borys
Motivation • This research attempts to address some of the problems in modern speech recognition through the use of prosody and acoustic landmarks.
Speech Recognition problems associated with the proposal
Word and Phone Transcriptions
Word level (start, end, word):
5800000 6800000 seventy
6800000 13000000 six
Phone level (start, end, phone):
5800000 6800000 s
6800000 7600000 eh
7600000 7900000 v
7900000 8300000 en
8300000 8600000 t
8600000 9300000 iy
9300000 10700000 s
10700000 11900000 ih
11900000 12200000 k
12200000 13000000 s
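As a minimal sketch of how such a transcription could be read, the snippet below parses the phone-level lines into timed segments. It assumes the start and end values are in 100 ns units, as in HTK label files; the slide does not state the unit, and the names `parse_label_lines` and `UNITS_PER_SECOND` are hypothetical.

```python
# Assumption: times are in 100 ns units (HTK label convention); the slide does not say.
UNITS_PER_SECOND = 10_000_000


def parse_label_lines(text):
    """Parse 'start end label' lines into (start_s, end_s, label) tuples."""
    segments = []
    for line in text.strip().splitlines():
        start, end, label = line.split()
        segments.append(
            (int(start) / UNITS_PER_SECOND, int(end) / UNITS_PER_SECOND, label)
        )
    return segments


# Phone-level transcription copied from the slide.
phone_lines = """
5800000 6800000 s
6800000 7600000 eh
7600000 7900000 v
7900000 8300000 en
8300000 8600000 t
8600000 9300000 iy
9300000 10700000 s
10700000 11900000 ih
11900000 12200000 k
12200000 13000000 s
"""

phones = parse_label_lines(phone_lines)
for start_s, end_s, label in phones:
    # Print each phone with its start, end, and duration in seconds.
    print(f"{label:>2}  {start_s:.2f} to {end_s:.2f} s  ({end_s - start_s:.2f} s)")
```

Under the 100 ns assumption the ten phones span roughly 0.58 s to 1.30 s of the recording.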
Phonemes • Vowels aa ae iy ow ey uw • Consonants p t k b d g s sh z th n l
Prosody • Changes word meaning. • Changes the way phones sound.
Prosody • Wanted. Chief justice of the Massachusetts supreme court • Intonation - [Wanted].[Chief justice of the Massachusetts supreme court]. • Pitch Accent - Wanted. Chief justice of the Massachusetts supreme court. • Function - Wanted. Chief justice of the Massachusetts supreme court. • Co-Articulation - Wanted. Chief justice of the Massachusetts supreme court.
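Hidden Markov Models • [Figure: three-state HMM with states 1, 2, 3; self-transitions a11, a22, a33; forward transitions a12, a23; skip transition a13]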
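The three-state topology on this slide (self-loops a11, a22, a33, forward transitions a12 and a23, and the skip a13) can be sketched as a small transition table. The probability values below are hypothetical placeholders, since the slide only names the transitions, and `path_log_prob` is an illustrative helper, not part of any toolkit.

```python
import math

# Hypothetical values: the slide names the transitions but gives no numbers.
# State 3 has only its self-loop here because the figure shows no exit arc.
TRANSITIONS = {
    (1, 1): 0.6, (1, 2): 0.3, (1, 3): 0.1,  # a11, a12, a13
    (2, 2): 0.7, (2, 3): 0.3,               # a22, a23
    (3, 3): 1.0,                            # a33
}


def path_log_prob(path):
    """Log probability of a state path under the left-to-right topology."""
    log_p = 0.0
    for src, dst in zip(path, path[1:]):
        p = TRANSITIONS.get((src, dst), 0.0)
        if p == 0.0:
            # Backward moves such as 2 -> 1 are not allowed in this topology.
            return float("-inf")
        log_p += math.log(p)
    return log_p


print(path_log_prob([1, 1, 2, 3]))  # uses a11, a12, a23
print(path_log_prob([1, 3]))        # uses the skip transition a13
print(path_log_prob([2, 1]))        # disallowed transition
```

Because there is no backward arc, any path that moves to a lower-numbered state receives zero probability.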
Experiments • Phone splitting • Landmark detection
Phone Splitting • Determine the log likelihoods of phonetic and prosodic splits • Compare the log likelihoods
Prosodic Contexts • Independent • Phrase final • Phrase initial • Accent • Function
Phonetic Contexts • Left fricative • Right fricative • Left stop • Right stop • Left nasal • Right nasal • Left liquid • Right liquid • Left vowel • Right vowel • Example: in "f ow n", the phone "ow" has a left fricative (f) and a right nasal (n)
Log Likelihood • Independent log likelihood: $\log p(O \mid \lambda_1)$ • $O = [o_1, o_2, \ldots, o_N]$ (short-time spectra) • $\lambda$ = model parameters • Context log likelihood: $\log \left[ p([o_1 \ldots o_M] \mid \lambda_2) \, p([o_{M+1} \ldots o_N] \mid \lambda_3) \right]$
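To illustrate the comparison between the independent and context log likelihoods, here is a minimal sketch that swaps the HMMs for single one-dimensional Gaussians fitted by maximum likelihood. That substitution, the toy observations, and the function name `gaussian_loglik` are all assumptions made for illustration; the slides do not describe the models at this level.

```python
import math


def gaussian_loglik(samples):
    """Log likelihood of samples under a Gaussian fitted to those same samples."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n  # ML variance estimate
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (s - mean) ** 2 / var)
        for s in samples
    )


# Toy observations for one phone, grouped by a hypothetical context.
context_a = [1.0, 1.2, 0.8, 1.1]  # plays the role of o_1 ... o_M
context_b = [3.0, 3.3, 2.9, 3.1]  # plays the role of o_{M+1} ... o_N

# Independent model: one set of parameters (lambda_1) for all observations.
independent_ll = gaussian_loglik(context_a + context_b)

# Context models: separate parameters (lambda_2, lambda_3) per group.
# The log of the product of the two likelihoods is the sum of their logs.
context_ll = gaussian_loglik(context_a) + gaussian_loglik(context_b)

print(f"independent: {independent_ll:.3f}, context: {context_ll:.3f}")
```

With maximum-likelihood fits evaluated on the training data, two separate models can never score lower than one pooled model, which is worth keeping in mind when interpreting raw log likelihood gains from a split.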
Post Processing • Example: The phone “ow”
Post Processing • Calculate the weighted average as follows: $WA = \frac{98}{115}(-832.69507) + \frac{17}{115}(-1315.40808)$
Post Processing • If WA > LL, then split into prosodic allophones. • If WA ≤ LL, then do not split into prosodic allophones. • Example: WA = -904.05264 > LL = -986.11096, so the phone is split.
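The weighted-average rule can be sketched in a few lines using the numbers from the slides. Reading 98 and 17 as the number of tokens in each split (115 in total) is an assumption, and the function name `weighted_average` is hypothetical.

```python
def weighted_average(counts, logliks):
    """Average the per-split log likelihoods, weighted by split size."""
    total = sum(counts)
    return sum(c * ll for c, ll in zip(counts, logliks)) / total


# Values taken from the slides; interpreting the counts as token counts
# for the two prosodic splits is an assumption.
counts = [98, 17]
split_logliks = [-832.69507, -1315.40808]
independent_ll = -986.11096  # LL of the unsplit (independent) model

wa = weighted_average(counts, split_logliks)

# Decision rule from the slide: split only if WA is strictly greater than LL.
should_split = wa > independent_ll
print(f"WA = {wa:.5f}, LL = {independent_ll:.5f}, split: {should_split}")
```

For these values the weighted average is about -904.05, which exceeds the independent log likelihood, so the rule favors splitting.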
Results • Almost all phonetic context splits result in log likelihood improvements for all phones. • Prosodic splits also cause the log likelihood to improve.
Experiments • Phone splitting • Landmark detection
Distinctive Features • b [-sonorant, -continuant, +lips] • s [-sonorant, +continuant, +blade] • m [+sonorant, -continuant, +lips] • y [+sonorant, +continuant, +blade] • aa [+sonorant, +continuant, +syllabic, +low, -front]
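One way to work with these feature bundles is to store each phone as a mapping from feature name to sign and compare phones feature by feature. This is only an illustrative sketch: the dictionary layout and the helper `differing_features` are assumptions, and a feature listed for one phone but not the other is treated as a difference.

```python
# Feature bundles copied from the slide; True means "+", False means "-".
FEATURES = {
    "b":  {"sonorant": False, "continuant": False, "lips": True},
    "s":  {"sonorant": False, "continuant": True,  "blade": True},
    "m":  {"sonorant": True,  "continuant": False, "lips": True},
    "y":  {"sonorant": True,  "continuant": True,  "blade": True},
    "aa": {"sonorant": True,  "continuant": True,  "syllabic": True,
           "low": True, "front": False},
}


def differing_features(phone_a, phone_b):
    """Return the features whose value differs or is specified for only one phone."""
    a, b = FEATURES[phone_a], FEATURES[phone_b]
    return {name for name in a.keys() | b.keys() if a.get(name) != b.get(name)}


print(differing_features("b", "m"))          # b and m differ only in sonorant
print(sorted(differing_features("b", "s")))  # continuant plus place of articulation
```

Pairs such as b/m and s/y differ in a single manner feature, which is the kind of contrast a manner-feature classifier has to detect.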
Distinctive Features and Landmarks • In order to correctly detect and recognize speech, the different manner features must be correctly recognized and classified. • In order to classify manner features correctly, landmarks need to be correctly identified.
Landmarks problems associated with the proposal
Distinctive Features and Landmarks • Speech • Consonantal • Continuant • Sonorant • Syllabic
Support Vector Machines • Given a set of N observations $y_1, \ldots, y_N$ • Each $y_i$ is associated with some $x_i \in \mathbb{R}^n$ for $1 \le i \le N$ • Observation pairs $(x_i, y_i)$ are drawn from an unknown probability distribution $P(x_i, y_i)$ • Is there an optimal function $F$ that can learn the mapping $x_i \mapsto y_i$?
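As a rough illustration of learning such a mapping, the sketch below trains a linear SVM by subgradient descent on the regularized hinge loss. The slides do not say which kernel, solver, or features were used, so the training procedure, the toy data, the labels in {-1, +1}, and the names `train_linear_svm` and `predict` are all assumptions.

```python
def train_linear_svm(xs, ys, lam=0.01, lr=0.1, epochs=200):
    """Fit F(x) = sign(w . x + b) by subgradient descent on the hinge loss."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Gradient step for the L2 regularizer (shrinks the weights).
            w = [wi - lr * lam * wi for wi in w]
            if margin < 1:
                # Point is inside the margin or misclassified: hinge-loss step.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Classify x as +1 or -1 using the learned hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable data: x_i in R^2 with labels y_i in {-1, +1}.
xs = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
ys = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(xs, ys)
print([predict(w, b, x) for x in xs])
```

In a landmark-detection setting each x_i would be an acoustic feature vector and each y_i a binary feature decision, but that mapping is not spelled out on the slide.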
Conclusions • The incorporation of prosody into the phoneme model leads to significant improvements in speech recognition rates. • Landmarks are detectable with high accuracy and have the potential to improve recognition rates.
Future Work • Prosody and conversational speech • Prosody and landmark detection • Landmark detection and continuous speech recognition