Latent Semantic Mapping: Dimensionality Reduction via Globally Optimal Continuous Parameter Modeling Jerome R. Bellegarda
Outline • Introduction • LSM • Applications • Conclusions
Introduction • LSA in IR: • Relates the words of queries to the words of documents • Improves both recall and precision • Assumption: there is some underlying latent semantic structure in the data • The latent structure is conveyed by word-document correlation patterns • Documents: bag-of-words model • LSA improves separability among different topics
Introduction • Success of LSA: • Word clustering • Document clustering • Language modeling • Automated call routing • Semantic Inference for spoken interface control • These solutions all leverage LSA’s ability to expose global relationships in context and meaning
Introduction • Three unique factors underlie LSA: • The mapping of discrete entities into a continuous space • The dimensionality reduction • The intrinsically global outlook • The terminology is changed to latent semantic mapping (LSM) to convey increased reliance on these general properties, beyond the original IR setting
Latent Semantic Mapping • LSM defines a mapping between two discrete sets and a continuous vector space • M: an inventory of M individual units, such as words • N: a collection of N meaningful compositions of units, such as documents • L: a continuous vector space • ri: unit in M • cj: composition in N
Feature Extraction • Construction of a matrix W of co-occurrences between units and compositions • The cell of W: wij = (1 - εi) cij / nj, where cij is the count of unit ri in composition cj, nj is the total number of units in cj, and εi is the normalized entropy of ri
Feature Extraction • The normalized entropy of ri: εi = -(1 / log N) Σj (cij / ti) log(cij / ti), with ti = Σj cij • A value of εi close to 0 means that the unit is present only in a few specific compositions • The global weight 1 - εi is therefore a measure of the indexing power of the unit ri
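A minimal sketch of this weighting in Python with numpy; the function and variable names (build_weighted_matrix, counts) are illustrative, and the input is assumed to be the raw M x N matrix of counts cij:

```python
import numpy as np

def build_weighted_matrix(counts):
    """Entropy-weighted unit-composition matrix W.

    counts : (M, N) array, counts[i, j] = number of times unit ri occurs
             in composition cj (assumed raw counts).
    Returns W with cells wij = (1 - eps_i) * cij / nj, plus the entropies.
    """
    counts = np.asarray(counts, dtype=float)
    M, N = counts.shape

    t = counts.sum(axis=1, keepdims=True)          # total count of each unit
    p = np.divide(counts, t, out=np.zeros_like(counts), where=t > 0)

    # Normalized entropy of each unit across compositions:
    # close to 0 when the unit is concentrated in a few compositions.
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)
    eps = -plogp.sum(axis=1) / np.log(N)

    n = counts.sum(axis=0, keepdims=True)          # length of each composition
    W = (1.0 - eps)[:, None] * np.divide(counts, n,
                                         out=np.zeros_like(counts), where=n > 0)
    return W, eps
```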
Singular Value Decomposition • The MxN unit-composition matrix W defines two vector representations for the units and the compositions • ri: a row vector of dimension N • cj: a column vector of dimension M • This is impractical: • M, N can be extremely large • The vectors ri, cj are typically very sparse • The two spaces are distinct from each other
Singular Value Decomposition • Employ the truncated SVD: W ≈ Ŵ = U S Vᵀ • U: MxR left singular matrix with row vectors ui • S: RxR diagonal matrix of singular values • V: NxR right singular matrix with row vectors vj • U, V are column-orthonormal: UᵀU = VᵀV = IR • R << min(M, N)
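A short sketch of the rank-R truncated SVD with numpy, under the assumption that W comes from the weighting step above; the name lsm_svd is illustrative:

```python
import numpy as np

def lsm_svd(W, R):
    """Rank-R decomposition W ~ U S V^T defining the LSM space L.

    Returns U (MxR), S (RxR diagonal), V (NxR); the rows of U and V are the
    unit and composition vectors ui and vj.
    """
    U_full, s_full, Vt_full = np.linalg.svd(W, full_matrices=False)
    U = U_full[:, :R]                  # left singular vectors
    S = np.diag(s_full[:R])            # R largest singular values
    V = Vt_full[:R, :].T               # right singular vectors (as rows of V)
    return U, S, V
```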
Singular Value Decomposition • The rank-R approximation Ŵ captures the major structural associations in W and ignores higher-order effects • The closeness of vectors in L supports three comparisons: • Unit-unit comparison • Composition-composition comparison • Unit-composition comparison
Closeness Measure • WWᵀ: co-occurrences between units • WᵀW: co-occurrences between compositions • ri, rj are close when the units have a similar pattern of occurrence across the compositions • ci, cj are close when the compositions have a similar pattern of occurrence across the units
Closeness Measure • Unit-Unit Comparisons: • Cosine measure: K(ri, rj) = cos(ui S, uj S) = ui S² ujᵀ / (‖ui S‖ ‖uj S‖) • Associated distance (arccos of the measure): [0, π]
Closeness Measure • Composition-Composition Comparisons: • Cosine measure: K(ci, cj) = cos(vi S, vj S) = vi S² vjᵀ / (‖vi S‖ ‖vj S‖) • Associated distance (arccos of the measure): [0, π]
Closeness Measure • Unit-Composition Comparisons: • Cosine measure: K(ri, cj) = cos(ui S^1/2, vj S^1/2) = ui S vjᵀ / (‖ui S^1/2‖ ‖vj S^1/2‖) • Associated distance (arccos of the measure): [0, π]
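The three comparisons can be computed directly from the scaled singular vectors; a minimal sketch reusing U, S, V from the SVD sketch above (the function names are mine, not from the slides):

```python
import numpy as np

def _cos(x, y):
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def unit_unit(U, S, i, j):
    # K(ri, rj) = cos(ui S, uj S): similar occurrence patterns across compositions.
    return _cos(U[i] @ S, U[j] @ S)

def comp_comp(V, S, i, j):
    # K(ci, cj) = cos(vi S, vj S): similar occurrence patterns across units.
    return _cos(V[i] @ S, V[j] @ S)

def unit_comp(U, V, S, i, j):
    # K(ri, cj) = cos(ui S^1/2, vj S^1/2): unit-to-composition closeness.
    S_half = np.sqrt(S)
    return _cos(U[i] @ S_half, V[j] @ S_half)
```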
LSM Framework Extension • Observe a new composition c̃p, p > N; the tilde reflects the fact that the composition was not part of the original N compositions • c̃p, a column vector of dimension M, can be thought of as an additional column of the matrix W • U and S do not change; the new composition is folded in as ṽp = c̃pᵀ U S⁻¹
LSM Framework Extension • c̃p: pseudo-composition • ṽp: pseudo-composition vector • If the addition of c̃p causes the major structural associations in W to shift in some substantial manner, the singular vectors will become inadequate
LSM Framework Extension • In that case it would be necessary to re-compute the SVD to find a proper representation for c̃p
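Folding a new composition into the existing space reuses U and S; a sketch assuming c_new is the M-dimensional weighted count vector of the unseen composition:

```python
import numpy as np

def fold_in(c_new, U, S):
    """Pseudo-composition vector for a new column of W.

    c_new : (M,) weighted count vector of the unseen composition c~p.
    Returns the (R,) vector v~p = c~p^T U S^-1 in the existing LSM space.
    If the new data shifts the global structure substantially, the SVD
    has to be recomputed instead.
    """
    return c_new @ U @ np.linalg.inv(S)
```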
Salient Characteristics of LSM • A single vector embedding for both units and compositions in the same continuous vector space L • A relatively low dimensionality, which makes operations such as clustering meaningful and practical • An underlying structure reflecting globally meaningful relationships, with natural similarity metrics to measure the distance between units, between compositions, or between units and compositions in L
Applications • Semantic classification • Multi-span language modeling • Junk e-mail filtering • Pronunciation modeling • TTS Unit Selection
Semantic Classification • Semantic classification refers to determining which one of several predefined topics a given document is most closely aligned with • The centroid of each cluster can be viewed as the semantic representation of that topic in the LSM space • Semantic anchor • A newly observed word sequence is classified by computing the distance between its document vector and each semantic anchor, and picking the minimum
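A minimal sketch of anchor-based classification, assuming each training document has already been folded into the LSM space; the helper names are illustrative:

```python
import numpy as np

def semantic_anchors(doc_vectors, labels):
    """Centroid (semantic anchor) of the LSM vectors for each topic label."""
    anchors = {}
    for lab in set(labels):
        vecs = np.array([v for v, l in zip(doc_vectors, labels) if l == lab])
        anchors[lab] = vecs.mean(axis=0)
    return anchors

def classify(doc_vector, anchors):
    """Pick the topic whose anchor is closest (largest cosine) to the document."""
    def cos(x, y):
        return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return max(anchors, key=lambda lab: cos(doc_vector, anchors[lab]))
```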
Semantic Classification • Domain knowledge is automatically encapsulated in the LSM space in a data-driven fashion • For desktop interface control: semantic inference (mapping a spoken command onto the intended action)
Multi-Span Language Modeling • In a standard n-gram, the history is the string wq-1 wq-2 ... wq-n+1 • In LSM language modeling, the history is the current document up to word wq-1 • Pseudo-document: d̃q-1 • Continually updated as q increases
Multi-Span Language Modeling • An integrated n-gram + LSM formulation for the overall language model probability: P(wq | Hq-1) = P(wq | wq-1 ... wq-n+1) P(d̃q-1 | wq) / Σwi∈V P(wi | wq-1 ... wq-n+1) P(d̃q-1 | wi) • Different syntactic constructs can be used to carry the same meaning (content words)
Multi-Span Language Modeling • This assumes that the probability of the document history given the current word is not affected by the immediate n-gram context preceding it
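A sketch of the integrated probability, assuming hypothetical helpers ngram_prob(w, history) and lsm_prob(d_vec, w) are supplied (the latter would typically be derived from the unit-composition closeness between the word and the pseudo-document):

```python
def multispan_prob(w, ngram_history, d_vec, vocab, ngram_prob, lsm_prob):
    """Integrated n-gram + LSM probability:

        P(wq | Hq-1) = P(wq | ngram history) * P(d~q-1 | wq)
                       / sum over w' of P(w' | ngram history) * P(d~q-1 | w')

    assuming the pseudo-document history is conditionally independent of the
    immediate n-gram context given the current word.
    """
    num = ngram_prob(w, ngram_history) * lsm_prob(d_vec, w)
    den = sum(ngram_prob(wi, ngram_history) * lsm_prob(d_vec, wi) for wi in vocab)
    return num / den
```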
Junk E-mail Filtering • It can be viewed as a degenerate case of semantic classification (two categories) • Legitimate • Junk • M: an inventory of words, symbols • N: a binary collection of email messages • Two semantic anchors
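With only two classes, filtering reduces to comparing a folded-in message vector against the two anchors; a self-contained toy illustration in which all numbers are invented purely to show the mechanics:

```python
import numpy as np

# Two semantic anchors (one per class) in a 3-dimensional LSM space;
# the values are made up for illustration only.
anchors = {"legitimate": np.array([0.9, 0.1, 0.3]),
           "junk":       np.array([0.1, 0.8, 0.5])}
message_vec = np.array([0.2, 0.7, 0.4])   # folded-in vector of a new message

def cos(x, y):
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

print(max(anchors, key=lambda lab: cos(message_vec, anchors[lab])))   # -> junk
```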
Pronunciation Modeling • Also called grapheme-to-phoneme conversion (GPC) • Orthographic anchors (one for each in-vocabulary word) • Orthographic neighborhood: the in-vocabulary words with the highest closeness to a given out-of-vocabulary word, used to derive its pronunciation
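A rough sketch of finding an orthographic neighborhood, under the assumption that the LSM space was trained on a letter-n-gram x word matrix; the callable letter_ngram_counts and all names here are hypothetical:

```python
import numpy as np

def orthographic_neighborhood(oov_word, anchors, U, S, letter_ngram_counts, k=5):
    """Return the k in-vocabulary words closest to an out-of-vocabulary spelling.

    anchors            : dict word -> orthographic anchor vector in the LSM space
    letter_ngram_counts: callable mapping a spelling to its (M,) count vector
                         over letter n-grams (hypothetical helper).
    The phoneme sequences of these neighbors can then be combined to propose
    a pronunciation for the unseen word.
    """
    v = letter_ngram_counts(oov_word) @ U @ np.linalg.inv(S)   # fold in the spelling

    def cos(x, y):
        return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

    ranked = sorted(anchors.items(), key=lambda kv: cos(v, kv[1]), reverse=True)
    return [w for w, _ in ranked[:k]]
```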
Conclusions • Descriptive power: forgoing local constraints is not acceptable in some situations • Domain sensitivity: performance depends on the quality of the training data; polysemy remains problematic • Updating the LSM space: re-computing the SVD on the fly is not practical • The success of LSM stems from its three characteristics: discrete-to-continuous mapping, dimensionality reduction, and a global outlook