From Word Spotting to OOV Modeling
(Figure: the title phrase as recognizer output at three stages: OOV OOV spotting OOV OOV OOV → [f r ah m] OOV spotting [t uw] OOV OOV → [f r ah m] [w er d] spotting [t uw] OOV [m aa d el ih ng])
• Goal
  • To automatically extract a filler vocabulary for word spotting
• Why?
  • So the language model has something to work with
  • May improve recognition accuracy on keywords
  • Gives an earlier payoff in domain-specific training
• Scenario
  • Start with a small lexicon (e.g. 5-50 words)
  • Start with a weak language model
  • Bootstrap by clustering a filler vocabulary from a large collection of untranscribed data
Paul Fitzpatrick (6345g11)
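The Scenario's bootstrap step clusters OOV fragments into filler words, but the slides do not say how fragments are compared. The sketch below is a minimal illustration under stated assumptions: fragments are lists of ARPAbet-style phone labels, two fragments join the same cluster when their phone-level Levenshtein distance is at most a hypothetical `max_distance`, and each cluster is represented by its most frequent member. Greedy edit-distance clustering is a swapped-in technique, not necessarily the author's method, and `cluster_fragments` is a hypothetical name.

```python
from collections import Counter


def phone_edit_distance(a, b):
    """Levenshtein distance between two phone sequences (lists of phone labels)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # delete pa
                            curr[j - 1] + 1,      # insert pb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def cluster_fragments(fragments, max_distance=1):
    """Greedily group phone fragments (hypothetical clustering criterion).

    Fragments are visited from most to least frequent; each one joins the first
    existing cluster whose baseform is within max_distance edits, otherwise it
    starts a new cluster with itself as the baseform.
    Returns (baseform, total_count) pairs sorted by count, highest first.
    """
    counts = Counter(tuple(f) for f in fragments)
    clusters = []  # each entry: [baseform tuple, total count]
    for frag, n in counts.most_common():
        for cluster in clusters:
            if phone_edit_distance(frag, cluster[0]) <= max_distance:
                cluster[1] += n
                break
        else:
            clusters.append([frag, n])
    clusters.sort(key=lambda c: c[1], reverse=True)
    return [(" ".join(base), n) for base, n in clusters]
```

With invented input such as three copies of `n ah m b er`, one `n ah m b ax`, two `w ah t ih z` and one `p l iy z`, the variant `n ah m b ax` is absorbed into the more frequent `n ah m b er`, giving a frequency-ranked list in the spirit of the Results slide.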
Methodology
• Run recognizer (produces a hypothesized transcript and N-best hypotheses)
• Extract OOV fragments → Add to lexicon
• Identify rarely-used additions → Remove from lexicon
• Identify competition → Remove from lexicon
• Update lexicon, baseforms
• Update Language Model
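One pass of the add/remove steps in the methodology might look roughly like this. It is a sketch under assumptions that are not in the slides: hypotheses are token lists in which OOV fragments appear as parenthesised phone strings (as in the Results slide's example), a fragment is added once it occurs at least `min_new` times, and an automatic addition is dropped if it is used fewer than `min_use` times. The function names and thresholds are hypothetical, and the "Identify competition" step is left out because the slides do not say how competition is measured.

```python
from collections import Counter


def is_fragment(token):
    """Hypothetical convention: OOV fragments look like "(n ah m b er)"."""
    return token.startswith("(") and token.endswith(")")


def update_lexicon(lexicon, additions, hypotheses, min_new=2, min_use=1):
    """One hypothetical lexicon-update pass.

    lexicon:    set of entries (seed keywords plus earlier automatic additions)
    additions:  the subset of lexicon that was added automatically
    hypotheses: list of hypothesized transcripts, each a list of tokens
    Returns (new_lexicon, new_additions).
    """
    usage = Counter(tok for hyp in hypotheses for tok in hyp)

    # Extract OOV fragments: fragment tokens not yet in the lexicon, seen often enough.
    new_entries = {tok for tok, n in usage.items()
                   if is_fragment(tok) and tok not in lexicon and n >= min_new}

    # Identify rarely-used additions: earlier automatic additions the recognizer barely used.
    rarely_used = {entry for entry in additions if usage[entry] < min_use}

    # Only automatic additions can be removed; the seed keywords always stay.
    new_lexicon = (set(lexicon) | new_entries) - rarely_used
    new_additions = (set(additions) | new_entries) - rarely_used
    return new_lexicon, new_additions
```

Keeping `additions` separate from the seed keywords is a design choice of this sketch: it lets pruning discard filler entries freely without ever deleting a word the application actually needs to spot.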
Results
• Initial lexicon
  • email, phone, room, office, address
• Top 10 OOV clusters found (ranked by frequency)
  1. n ah m b er
  2. w eh r ih z
  3. w ah t ih z
  4. t eh l m iy
  5. k ix n y uw
  6. p l iy z
  7. ae ng k y uw
  8. n ow
  9. hh aw ax b aw
  10. g r uw p
• Example sentence hypothesis
  • (w ah t ih z) (ih t er z uw) room (n ah m b er)
  • What is Victor Zue’s room number?
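The example hypothesis suggests how fragments give the language model something to work with: the keyword "room" is surrounded by fragment tokens that can be counted like words. A minimal sketch, assuming a maximum-likelihood bigram model with `<s>`/`</s>` sentence markers (the slides do not specify the language model type, and `train_bigram` is a hypothetical name), shows how such transcripts could feed the "Update Language Model" step.

```python
from collections import Counter, defaultdict


def train_bigram(transcripts):
    """Maximum-likelihood bigram model P(next | prev) over words and fragment tokens.

    Each transcript is a list of tokens; OOV fragments such as "(n ah m b er)"
    are treated exactly like ordinary words.
    """
    pair_counts = defaultdict(Counter)
    for tokens in transcripts:
        # Pad with sentence-boundary markers so start/end context is modelled too.
        padded = ["<s>"] + list(tokens) + ["</s>"]
        for prev, nxt in zip(padded, padded[1:]):
            pair_counts[prev][nxt] += 1

    # Normalise each row of counts into conditional probabilities.
    model = {}
    for prev, followers in pair_counts.items():
        total = sum(followers.values())
        model[prev] = {nxt: n / total for nxt, n in followers.items()}
    return model


if __name__ == "__main__":
    transcripts = [
        # Hypothesis from the Results slide.
        ["(w ah t ih z)", "(ih t er z uw)", "room", "(n ah m b er)"],
        # Hypothetical second hypothesis, for illustration only.
        ["(w ah t ih z)", "(ih t er z uw)", "phone", "(n ah m b er)"],
    ]
    model = train_bigram(transcripts)
    print(model["(ih t er z uw)"])  # {'room': 0.5, 'phone': 0.5}
```

A real system would smooth these estimates, since unsmoothed maximum-likelihood bigrams assign zero probability to any pair not seen in the bootstrap data.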