Independent Components in Text. Paper by Thomas Kolenda, Lars Kai Hansen and Sigurdur Sigurdsson. Presented by Yuan Zhijian
Vector Space Representations • Indexing: forming the term set of all words occurring in the database. -- Form the term set -- Represent each document as a term histogram -- Collect the histograms into a term-document matrix
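The indexing step above can be sketched in a few lines; the toy corpus is hypothetical and serves only to illustrate forming the term set and the term-document matrix:

```python
# Hypothetical toy corpus of three short "documents".
docs = [
    "heart attack risk factors",
    "heart disease treatment",
    "aircraft wing design",
]

# Indexing: form the term set of all words occurring in the database.
terms = sorted({w for d in docs for w in d.split()})

# Term-document matrix: entry [i][j] counts occurrences of term i in document j.
X = [[d.split().count(t) for d in docs] for t in terms]
```

Each column of `X` is the term histogram of one document, which is the representation the rest of the pipeline operates on.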
Vector Space Representations • Weighting: determine the values of the term weights • Similarity measure: based on the inner product of weight vectors, or other metrics
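A minimal sketch of the inner-product-based similarity measure mentioned above, using the standard cosine similarity (the function name `cosine` is my own, not from the paper):

```python
import math

def cosine(u, v):
    # Inner product of the weight vectors, normalized by their lengths,
    # so identical directions score 1.0 and orthogonal vectors score 0.0.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)
```

Applied to two term-histogram columns, this gives a document-document similarity independent of document length.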
LSI-PCA Model • The main objective is to uncover hidden linear relations between term histograms by rotating the vector space basis. • Simplify by keeping only the k largest singular values
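Keeping the k largest singular values amounts to a truncated SVD of the term-document matrix; a minimal numpy sketch on random stand-in data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 10))  # stand-in for a term-document matrix

# Full SVD, then keep only the k largest singular values (LSI approximation).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 3
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # rank-k reconstruction
```

By the Eckart-Young theorem, the squared Frobenius error of this truncation equals the sum of the squared discarded singular values, which is why keeping the largest ones is optimal.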
ICA—Noisy Separation • Model: X = AS + U • Assumptions: -- i.i.d. sources -- i.i.d. Gaussian noise with variance σ² -- a prescribed source prior distribution
ICA—Noisy Separation (cont.) • Known mixing parameters, e.g. A and σ² -- Bayes' formula: P(S|X) ∝ P(X|S)P(S) -- Maximize w.r.t. S (MAP estimate) -- Solution: the MAP source estimate -- For low noise levels this reduces to approximately inverting the mixing
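A minimal sketch of the low-noise regime, under the assumption (mine, for illustration) that the known mixing matrix A is square and well-conditioned, so the MAP estimate is well approximated by the pseudo-inverse:

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.laplace(size=(2, 500))            # sparse i.i.d. sources
A = np.array([[1.0, 0.5], [0.3, 1.0]])    # known mixing matrix (hypothetical values)
X = A @ S + 0.001 * rng.standard_normal((2, 500))  # X = AS + U, low noise

# Low-noise limit: inverting the known mixing recovers the sources.
S_hat = np.linalg.pinv(A) @ X
```

With higher noise levels the pseudo-inverse is no longer adequate and the full MAP estimate, which weighs the likelihood against the source prior, must be used.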
ICA (cont.) • Text representations in the LSI space • Document classification • Key words -- back projection of documents to the original term-histogram space
ICA (cont.) • Generalisation error -- principal tool for model selection • Bias-variance dilemma: -- too few components lead to high error (bias) -- too many components lead to overfitting
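The bias-variance trade-off above can be illustrated with a held-out reconstruction error on synthetic low-rank data (rank 3 plus noise, my own toy setup); the error should be large for too few components and grow again as extra components fit the training noise:

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic data with true rank 3, observed with independent noise twice.
signal = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 25))
train = signal + 0.1 * rng.standard_normal(signal.shape)
test = signal + 0.1 * rng.standard_normal(signal.shape)

# Fit on train, score each model order k on held-out data.
U, s, Vt = np.linalg.svd(train, full_matrices=False)
errors = []
for k in range(1, 10):
    recon = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    errors.append(np.linalg.norm(test - recon))  # generalization-error proxy
best_k = 1 + int(np.argmin(errors))
```

The minimum of the held-out error curve is the model-selection criterion: components beyond the true rank only reproduce training noise, so the test error turns back up.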
Examples • MED data set -- 124 abstracts, 5 groups, 1159 terms • Results: -- ICA is successful in recognizing and "explaining" the group structure.
Examples • CRAN data set -- 5 classes, 138 documents, 1115 terms • Results: -- ICA identified some group structure but not as convincingly as in the MED data
Conclusion • ICA is an effective unsupervised tool for finding group structure in text • Independence of the sources may or may not be well aligned with a manual labeling