Proposal: Continue exploring factored tandem models and prosody

Arthur Kantor

Factored tandem observations
[Figure: two observation models, each conditioned on phoneState. Tandem: PLPs and the log MLP outputs, concatenated + KLT. Factored tandem: PLPs and the log outputs of separate MLPs (dg1, pl1, rd, . . .).]
• Goals:
  • Find appropriate weights for dg1, pl1, rd, PLP … to optimize word error rate
  • Find some clustering to optimize word error rate

Weight tuning in detail
• Explore the interaction between observation stream weights and the language model
  • As the stream weights increase, the language model weight should also be increased
  • The increases are not proportional
• As the observation becomes more factored, there are more mixture weights, but there is less ability to represent correlation
  • Separate the effects of factoring the observations from increasing the number of weight parameters
  • Can be tested by keeping the mixture weights constant in all the factors
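The interaction between stream weights and the language model weight can be illustrated with a log-linear score. This is a minimal sketch, assuming each stream contributes a weighted log-likelihood and the language model contributes a weighted log-probability. The function name, the dictionary layout, and the numbers are hypothetical and are not taken from the workshop system.

```python
def combined_log_score(stream_loglik, stream_weight, lm_logprob, lm_weight):
    """Weighted sum of per-stream log-likelihoods plus a scaled LM log-probability.

    stream_loglik and stream_weight are dicts keyed by stream name
    (for example "PLP", "dg1"); all names here are illustrative assumptions.
    """
    acoustic = sum(stream_weight[name] * ll for name, ll in stream_loglik.items())
    return acoustic + lm_weight * lm_logprob


# Hypothetical per-frame values, chosen only to make the arithmetic easy to follow.
loglik = {"PLP": -4.0, "dg1": -2.0}
weights = {"PLP": 1.0, "dg1": 0.5}
score = combined_log_score(loglik, weights, lm_logprob=-3.0, lm_weight=2.0)
```

Raising the stream weights enlarges the acoustic term relative to the language model term, which is why the language model weight has to be retuned alongside them.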

Observation factoring in detail
• There is a range of partially factored observation models
[Figure: three observation models, each conditioned on phoneState, ordered from unfactored to fully factored. Unfactored: PLPs and the log MLP outputs, concatenated + KLT. Semi factored: PLPs, dg1+pl1, rd, . . . Fully factored: PLPs and the log outputs of separate MLPs (dg1, pl1, rd, . . .).]
• How to cluster?
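One way to see the size of the range between unfactored and fully factored models is to enumerate every grouping of the MLP output streams. This is a minimal sketch that lists all set partitions of a stream list. It assumes only the three streams named on the slide (dg1, pl1, rd), whereas the slide indicates there are more, and the function name is hypothetical.

```python
def partitions(items):
    """Yield every way to split a list of stream names into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Option 1: the first stream forms its own factor.
        yield [[first]] + part
        # Option 2: the first stream joins one of the existing factors.
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


streams = ["dg1", "pl1", "rd"]  # assumed subset of the MLP output streams
clusterings = list(partitions(streams))
for grouping in clusterings:
    print(" | ".join("+".join(group) for group in grouping))
```

The enumeration includes the fully factored grouping, the single unfactored group, and the semi factored dg1+pl1, rd grouping. The count grows quickly with the number of streams, so exhaustive search is only practical for a handful of them.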

Add a PROSODY feature to feature-based models
• PROSODY feature takes on 4 values:
  • Onset
  • Reduced nucleus
  • Regular nucleus
  • Coda
• Prosody combined with Lips, Tongue and Glottis uniquely specifies all of the phones used in our phone-based models
  • This allows for a fairer comparison with the phone-based features
• Goals:
  • Repeat the workshop experiments with the added PROSODY feature
  • Explore higher-level prosodic structure, such as phrasal stress and prosodic phrase boundaries
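The uniqueness claim can be checked mechanically once a phone-to-feature table is available. This is a minimal sketch, assuming a table that maps each phone to a tuple of feature values. The phone names and the feature values "L0", "T0", "G0" are placeholders and do not come from the actual feature set.

```python
from enum import Enum


class Prosody(Enum):
    ONSET = "onset"
    REDUCED_NUCLEUS = "reduced nucleus"
    REGULAR_NUCLEUS = "regular nucleus"
    CODA = "coda"


def ambiguous_groups(table):
    """Return the groups of phones that share an identical feature tuple."""
    by_features = {}
    for phone, features in table.items():
        by_features.setdefault(features, []).append(phone)
    return [sorted(phones) for phones in by_features.values() if len(phones) > 1]


# Placeholder entries: two hypothetical phones with the same Lips/Tongue/Glottis values.
without_prosody = {
    "phone_a": ("L0", "T0", "G0"),
    "phone_b": ("L0", "T0", "G0"),
}
with_prosody = {
    "phone_a": ("L0", "T0", "G0", Prosody.ONSET),
    "phone_b": ("L0", "T0", "G0", Prosody.CODA),
}

print(ambiguous_groups(without_prosody))  # the two placeholder phones collide
print(ambiguous_groups(with_prosody))     # no collisions once PROSODY is added
```

An empty result on the full table would confirm that the four features together specify every phone uniquely.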

Questions

Proposal
• Continue to explore
  • Tandem observation factoring
  • Feature substitution in the pronunciation model
  • Prosody

Feature substitution in the pronunciation model
• Feature-based pronunciation modeling is promising
• Goal: Make use of this in a speech recognizer

Asynchrony between underlying (dictionary) feature values
[Figure: graphical model shown over two frames, with variables word; ind1, ind2, ind3; U1, U2, U3; sync1,2 and sync2,3; and Obs.]

Asynchrony with feature substitution
[Figure: graphical model shown over two frames, with variables word; ind1, ind2, ind3; U1, U2, U3; S1, S2, S3; sync1,2 and sync2,3; and Obs.]
