The Hidden Vector State Language Model

Presentation Transcript


  1. The Hidden Vector State Language Model Vidura Seneviratne, Steve Young, Cambridge University Engineering Department

  2. References • Young, S. J., “The Hidden Vector State Language Model”, Tech. Report CUED/F-INFENG/TR.467, Cambridge University Engineering Department, 2003. • He, Y. and Young, S. J., “Hidden Vector State Model for Hierarchical Semantic Parsing”, in Proc. ICASSP, Hong Kong, 2003. • Fine, S., Singer, Y. and Tishby, N., “The Hierarchical Hidden Markov Model: Analysis and Applications”, Machine Learning, 32(1):41-62, 1998.

  3. Outline • Introduction • HVS Model • Experiments • Conclusion

  4. Introduction • Standard n-gram language models suffer from data sparseness and cannot capture long-distance dependencies or the nested structural information in a sentence • Class-based language models exploit POS tag (word class) information • Structured language models exploit syntactic information

  5. Hierarchical Hidden Markov Model • The HHMM is a structured multi-level stochastic process • Each state is itself an HHMM • Internal states: hidden states that do not emit observable symbols directly • Production states: leaf states that emit the observations • The states of a flat HMM correspond to the production states of an HHMM

  6. HHMM (cont.) • Parameters of HHMM:

  7. HHMM (cont.) • Transition probability: horizontal, within a level • Initial probability: vertical, for entering a child state • Observation probability: emission at a production state
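In the notation of Fine et al. (1998), the parameter set can be sketched as

    \lambda = \{ \{a^{d}_{q q'}\}, \{\pi^{d}_{q}(q')\}, \{b_{q}(\sigma)\} \}

where a^{d}_{q q'} is the horizontal transition probability between states q and q' at level d, \pi^{d}_{q}(q') the vertical probability of internal state q activating child q', and b_{q}(\sigma) the probability of production state q emitting symbol \sigma.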

  8. HHMM (cont.) • Current node is root: • Choose child according to initial probability • Child is production state: • Produce an observation • Transit within the same level • When it reaches end-state, back to parent of end-state • Child is internal state: • Choose child • Wait until control is back from children • Transit within the same level • When it reaches end-state, back to parent of end-state
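A minimal Python sketch of this generative recursion; the dictionaries `init`, `horizontal`, `emission`, `children` and the "END" marker are illustrative assumptions about how the distributions might be stored, not part of the original model:

    import random

    def sample(dist):
        """Draw one outcome from a {outcome: probability} dictionary."""
        r, acc = random.random(), 0.0
        for outcome, p in dist.items():
            acc += p
            if r <= acc:
                return outcome
        return outcome  # guard against rounding error

    def generate(state, init, horizontal, emission, children, out):
        """Expand `state`: production states emit a symbol, internal states recurse."""
        child = sample(init[state])            # vertical step: enter the sub-HHMM
        while child != "END":                  # loop until this level's end-state
            if child in children:              # internal state: hand control downwards
                generate(child, init, horizontal, emission, children, out)
            else:                              # production state: emit one observation
                out.append(sample(emission[child]))
            child = sample(horizontal[child])  # horizontal transition within the level
        # reaching the end-state returns control to the parent of this level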

  9. HHMM (cont.)

  10. HHMM (cont.) • Other applications: modelling the trend of stocks (IDEAL 2004)

  11. Hidden Vector State Model

  12. Hidden Vector State Model (cont.) • The semantic information relating to any single word can be stored as a vector of semantic tag names.
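For instance, in an ATIS query such as "show flights arriving in Boston", the word "Boston" might carry the vector (CITY, TOLOC, FLIGHT, SS): its own preterminal tag plus every dominating concept up to the sentence-start marker; the exact tag inventory shown here is illustrative.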

  13. Hidden Vector State Model (cont.) • If state transitions were unconstrained, this would be a fully general HHMM • Instead, transitions between states are factored into a stack shift with two stages: a pop followed by a push • The stack size is limited and the number of new concepts pushed is limited to one • This makes the model more efficient (see the sketch below)
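A minimal sketch of one such constrained transition; the label names and the depth bound are illustrative assumptions:

    from typing import List

    def stack_shift(stack: List[str], n_pop: int, new_label: str, depth: int = 4) -> List[str]:
        """Pop n_pop labels off the vector state, then push exactly one new label."""
        assert 0 <= n_pop <= len(stack), "cannot pop more labels than the stack holds"
        shifted = [new_label] + stack[n_pop:]   # index 0 holds the most recent concept
        assert len(shifted) <= depth, "stack size is bounded by the model depth D"
        return shifted

    # e.g. stack_shift(["CITY", "TOLOC", "FLIGHT", "SS"], n_pop=2, new_label="TIME")
    # -> ["TIME", "FLIGHT", "SS"]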

  14. Hidden Vector State Model (cont.) • The joint probability is defined:
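Following He and Young (2003), the joint probability of the vector state sequence C, the pop counts N and the word sequence W factorizes over word positions as (a hedged reconstruction):

    P(N, C, W) = \prod_{t=1}^{T} P(n_t \mid W_{1}^{t-1}, C_{1}^{t-1}) \; P(c_t[1] \mid W_{1}^{t-1}, C_{1}^{t-1}, n_t) \; P(w_t \mid W_{1}^{t-1}, C_{1}^{t})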

  15. Hidden Vector State Model (cont.) • Approximation (assumption): • So,
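The assumption is that each factor depends only on the current or previous vector state, so that (again following He and Young, 2003):

    P(n_t \mid W_{1}^{t-1}, C_{1}^{t-1}) \approx P(n_t \mid c_{t-1}), \quad
    P(c_t[1] \mid \cdot) \approx P(c_t[1] \mid c_t[2..D_t]), \quad
    P(w_t \mid \cdot) \approx P(w_t \mid c_t)

    P(N, C, W) \approx \prod_{t=1}^{T} P(n_t \mid c_{t-1}) \; P(c_t[1] \mid c_t[2..D_t]) \; P(w_t \mid c_t)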

  16. Hidden Vector State Model (cont.) • The generative process associated with this constrained version of the HVS model consists of three steps for each word position t: 1. Choose a value for nt, the number of labels to pop 2. Select the preterminal concept tag ct[1] 3. Select a word wt
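A minimal sketch of one generative step under these constraints; the way the three conditional distributions (`pop_dist`, `concept_dist`, `word_dist`) are stored and looked up here is an illustrative assumption:

    import random

    def hvs_step(prev_stack, pop_dist, concept_dist, word_dist):
        """One time step: pop nt labels, push ct[1], then emit the word wt."""
        # 1. choose nt, conditioned on the previous vector state
        n_vals, n_probs = zip(*pop_dist[tuple(prev_stack)].items())
        n_t = random.choices(n_vals, weights=n_probs)[0]
        rest = list(prev_stack)[n_t:]
        # 2. select the preterminal concept tag ct[1], conditioned on what remains after the pop
        c_vals, c_probs = zip(*concept_dist[tuple(rest)].items())
        stack = [random.choices(c_vals, weights=c_probs)[0]] + rest
        # 3. select the word wt, conditioned on the full vector state ct
        w_vals, w_probs = zip(*word_dist[tuple(stack)].items())
        return stack, random.choices(w_vals, weights=w_probs)[0]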

  17. Hidden Vector State Model (cont.) • It is reasonable to ask an application designer to provide examples of utterances which would yield each type of semantic schema. • It is not reasonable to require utterances with manually transcribed parse trees. • Assume abstract semantic annotations and availability of a set of domain specific lexical classes.

  18. Hidden Vector State Model (cont.) Abstract semantic annotations: • show me flights arriving in X at T. • List flights arriving around T in X. • Which flight reaches X before T. = FLIGHT(TOLOC(CITY(X),TIME_RELATIVE(TIME(T)))) Class set: CITY: Boston, New York, Denver…

  19. Experiments Experimental setup • Training set: ATIS-2, ATIS-3 • Test sets: ATIS-3 NOV93, DEC94 • Baseline: FST (Finite Semantic Tagger) • Smoothing: Good-Turing (GT) for the FST, Witten-Bell for the HVS model • Example: “Show me flights from Boston to New York” Goal: FLIGHT Slots: FROMLOC.CITY = Boston, TOLOC.CITY = New York

  20. Experiments

  21. Experiments Dashed line: goal detection accuracy; solid line: F-measure

  22. Conclusion • The key features of the HVS model: • its ability to represent hierarchical information in a constrained way • its ability to be trained directly from target semantics without explicit word-level annotation.

  23. HVS Language Model • The basic HVS model is a regular HMM in which each state encodes history in a fixed-dimension, stack-like structure • Each state consists of a stack where each element is a label drawn from a finite set of cardinality M+1: C = {c1, …, cM, c#} • A depth-D HVS model state can be characterized by a vector of dimension D, with the most recently pushed element at index 1 and the oldest at index D
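Written out, a vector state at time t is

    c_t = (c_t[1], c_t[2], \ldots, c_t[D]), \qquad c_t[d] \in C

with c_t[1] the newest label and c_t[D] the oldest; this is the quantity conditioned on in the distributions that follow.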

  24. HVS Language Model (cont.)

  25. HVS Language Model (cont.) • Each HVS model state transition is restricted: (i) exactly nt class labels are popped off the stack, (ii) exactly one new class label ct is pushed onto the stack • The number of elements to pop, nt, and the choice of the new class label to push, ct, are determined by:
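Consistent with the description on slide 27, the two decisions can be sketched as being drawn from

    n_t \sim P(n_t \mid c_{t-1}[1..D]), \qquad c_t[1] \sim P(c_t[1] \mid c_t[2..D])

that is, the pop count sees the whole previous stack, while the new label sees only what remains after the pop.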

  26. HVS Language Model (cont.)

  27. HVS Language Model (cont.) • nt is conditioned on all the class labels that are in the stack at t-1, whereas ct is conditioned only on the class labels that remain on the stack after the pop operation • The former distribution can encode embedded (nested) structure, whereas the latter focuses on modelling long-range dependencies.

  28. HVS Language Model (cont.) • Joint probability: • Assumption:
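Under the assumption that each factor depends only on the current or previous vector state, the joint probability takes the same form as for the parser (a hedged reconstruction):

    P(W, C, N) = \prod_{t=1}^{T} P(n_t \mid c_{t-1}) \; P(c_t[1] \mid c_t[2..D]) \; P(w_t \mid c_t)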

  29. HVS Language Model (cont.) • Training: EM algorithm • C,N: latent data, W: observed data • E-step:
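In standard EM terms, the E-step evaluates the posterior over the latent pop counts and vector states given the current parameters \lambda^{(k)}, presumably via a forward-backward style pass over the constrained state space:

    P(C, N \mid W, \lambda^{(k)}) = \frac{P(W, C, N \mid \lambda^{(k)})}{\sum_{C', N'} P(W, C', N' \mid \lambda^{(k)})}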

  30. HVS Language Model (cont.) • M-Step: • Q function (auxiliary): • Substituting P(W,C,N|λ)
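With the factorization from slide 28 substituted in, the standard auxiliary function would read

    Q(\lambda, \lambda^{(k)}) = \sum_{C, N} P(C, N \mid W, \lambda^{(k)}) \log P(W, C, N \mid \lambda)
                              = \sum_{C, N} P(C, N \mid W, \lambda^{(k)}) \sum_{t} \big[ \log P(n_t \mid c_{t-1}) + \log P(c_t[1] \mid c_t[2..D]) + \log P(w_t \mid c_t) \big]

so each of the three distributions can be maximized independently in the M-step.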

  31. HVS Language Model (cont.) • The three probability distributions are re-estimated separately.

  32. HVS Language Model (cont.) • State space S, if fully populated: |S| = M^D states • For M = 100+ and D = 3 to 4, this is on the order of 10^6 to 10^8 states • Due to data sparseness, backoff is needed.

  33. HVS Language Model (cont.) • Backoff weight: • Modified version of absolute discounting
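A generic sketch of backoff with absolute discounting, shown here for the word distribution (the exact discount and backoff path used in the report may differ):

    P(w_t \mid c_t) = \frac{\max\big(N(c_t, w_t) - b,\ 0\big)}{N(c_t)} + \beta(c_t)\, P(w_t \mid c_t')

where N(\cdot) are training counts, b the discount, c_t' a reduced context, and \beta(c_t) the backoff weight chosen so the distribution sums to one.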

  34. Experiments • Training set: • ATIS-3, 276K words, 23K sentences. • Development set: • ATIS-3 Nov93 • Test set: • ATIS-3 Dec94, 10K words, 1K sentences. • OOV words were removed • k = 850

  35. Experiments (cont.)

  36. Experiments (cont.)

  37. Conclusion • The HVS language model is able to make better use of context than standard class n-gram models. • The HVS model is trainable using EM.

  38. Class tree for implementation

  39. Iteration number vs. perplexity
