Knowledge-Free Induction of Morphology Using Latent Semantic Analysis (Patrick Schone and Daniel Jurafsky) Danny Shacham Yehoyariv Louck
Presentation Outline • The problem • Previous solutions • The proposed approach • Advantages • The Technique • Evaluation Criteria • The Results
The Problem • The main problem this research addresses: how to automatically induce morphological relationships between words. • The problem matters for morphological analyzers, where there is a growing need to build them without human-supplied knowledge.
Previous Solutions • Existing induction approaches rely on statistics of hypothesized stems and affixes to choose which affixes are legitimate. • Relying on statistics rather than on semantic knowledge may lead to induction errors, such as applying a valid affix inappropriately ("ally" stemming to "all"). • The three main prior algorithms are: • Déjean (1998) • Goldsmith (1997) • Gaussier (1999)
The proposed approach - Advantages • The paper introduces a semantics-based algorithm that proposes an affix only when the stem and the stem-plus-affix are sufficiently similar semantically. • Using semantic similarity may resolve some of the problems of the purely statistical approaches described earlier. • The proposed solution is knowledge-free. • The proposed solution could be applied to any inflectional language.
The proposed approach – The Technique • The algorithm consists of 4 stages: • Identifying potential affixes • Finding pairs of words that are possibly morphological variants • Developing semantic vectors for each word • Selecting variants that have similar semantic vectors (similar semantic meaning)
The Technique – Stage 1 • Candidate affixes are selected using the p-similarity technique (as in Gaussier). • The method inserts words into a trie and extracts affixes by looking at the nodes in the trie where there are branches. • Only the k most frequent affixes are selected (k is usually 200).
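The trie step in Stage 1 can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the authors' implementation: it handles suffixes only, the `END` marker, the `min_stem` threshold and all function names are hypothetical, and the empty string stands for the bare stem.

```python
from collections import Counter

END = "$"  # hypothetical end-of-word marker (assumption, not from the paper)


def build_trie(words):
    """Insert every word into a nested-dict trie, character by character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root


def suffixes_below(node):
    """Yield every string that completes a word from this node ('' if a word ends here)."""
    for ch, child in node.items():
        if ch == END:
            yield ""
        else:
            for rest in suffixes_below(child):
                yield ch + rest


def candidate_suffixes(words, k=200, min_stem=3):
    """Count the endings found below branching nodes and keep the k most frequent.

    min_stem is an illustrative threshold that skips branches very close to the root.
    """
    counts = Counter()
    stack = [(build_trie(words), 0)]
    while stack:
        node, depth = stack.pop()
        if depth >= min_stem and len(node) > 1:  # a branch: several continuations
            counts.update(suffixes_below(node))
        for ch, child in node.items():
            if ch != END:
                stack.append((child, depth + 1))
    return [suffix for suffix, _ in counts.most_common(k)]
```

On a toy vocabulary such as "walk, walks, walked, walking, talk, talks, talked, talking", the only branching nodes deep enough are the ones for "walk" and "talk", so the candidates are the empty suffix, "s", "ed" and "ing".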
The Technique – Stage 2 • Identifying rules: a rule is a pair of candidate affixes that descend from a common ancestor node. • Defining a PPMV (pair of potential morphological variants): two words sharing the same root and the same affix rule. • Defining a ruleset: the ruleset of a given rule is the set of all PPMVs that have the rule in common. • Building a ruleset for every rule extracted from the data.
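The definitions in Stage 2 can be made concrete with a short Python sketch. It assumes suffix-only rules and treats the shared stem as the common ancestor node; `build_rulesets` and its arguments are hypothetical names introduced for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_rulesets(words, suffixes):
    """Map each rule (suffix1, suffix2) to the set of PPMVs that share it."""
    vocab = set(words)

    # Map each hypothesized stem to the candidate suffixes it occurs with.
    stem_to_suffixes = defaultdict(set)
    for word in vocab:
        for suffix in suffixes:
            if word.endswith(suffix) and len(word) > len(suffix):
                stem = word[: len(word) - len(suffix)]
                stem_to_suffixes[stem].add(suffix)

    # Every pair of suffixes seen on the same stem is a rule; the two words
    # it relates form a PPMV, and all PPMVs of one rule form its ruleset.
    rulesets = defaultdict(set)
    for stem, stem_suffixes in stem_to_suffixes.items():
        for s1, s2 in combinations(sorted(stem_suffixes), 2):
            rulesets[(s1, s2)].add((stem + s1, stem + s2))
    return rulesets
```

For example, with the suffixes "", "s", "ed" and "ing", the rule ("", "s") collects the PPMVs (walk, walks) and (talk, talks). At this stage nothing checks whether the pairs are semantically related; that is left to the later stages.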
The Technique – Stage 3 • Building a term-term matrix (of size N×2N) that captures local semantic information. • Applying SVD (singular value decomposition) to the term-term matrix. • Using the SVD results (U, D, V) to build a semantic vector for each word.
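In standard LSA notation, Stage 3 can be written as follows. This is a sketch under assumptions: $M$ denotes the N×2N term-term matrix, $k$ is the number of singular values retained, and $\Omega_w$ is the semantic vector of word $w$; the slides name only U, D and V, so the truncation rank and the exact form of the vector are assumed here rather than taken from them.

```latex
M \approx U_k D_k V_k^{\top}, \qquad \Omega_w = U_k[w,:]\, D_k
```

Here $U_k[w,:]$ is the row of $U_k$ that corresponds to word $w$, so each word is represented by a $k$-dimensional vector scaled by the retained singular values.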
The Technique – Stage 4 • For each pair of words we wish to check, we take the two words' semantic vectors and compute their NCS (normalized cosine score). • By considering the NCS of all word pairs under a particular rule, we determine which PPMVs are legitimate.
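The slide does not spell out how the cosine is normalized. The Python sketch below assumes one plausible reading: the raw cosine of the pair is standardized against the mean and standard deviation of each word's cosines with a sample of other words, and the smaller of the two standardized values is kept. The function names, the `background` sample and the toy vectors are hypothetical.

```python
import math
from statistics import mean, pstdev


def cosine(u, v):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def ncs(w1, w2, vectors, background):
    """Normalized cosine score (assumed form): min over both words of (cos - mu) / sigma."""
    raw = cosine(vectors[w1], vectors[w2])
    scores = []
    for w in (w1, w2):
        # Cosines between w and the background words give its "typical" similarity.
        sample = [cosine(vectors[w], vectors[b]) for b in background if b not in (w1, w2)]
        mu, sigma = mean(sample), pstdev(sample)
        scores.append((raw - mu) / sigma if sigma > 0 else 0.0)
    return min(scores)


# Toy 2-dimensional vectors, made up for illustration only.
vectors = {
    "car": [1.0, 0.1],
    "cars": [1.0, 0.2],
    "x": [0.0, 1.0],
    "y": [-1.0, 0.5],
    "z": [0.3, -1.0],
}
score = ncs("car", "cars", vectors, ["x", "y", "z"])
```

A pair whose score is well above zero is more similar than either word typically is to unrelated words, which is the kind of evidence Stage 4 uses to accept a PPMV.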
The proposed approach - Evaluation Criteria • The algorithm is compared to Goldsmith's Linguistica (2000), using CELEX as the reference and a conflation-set scoring mechanism. • The scoring mechanism sums the correct, inserted and deleted words in the algorithm's conflation sets, measured against the CELEX conflation sets.
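A conflation-set score of this kind could be computed as in the Python sketch below. The per-word weighting by the size of the reference set and the precision and recall definitions are assumptions made for illustration; the slide only states that correct, inserted and deleted words are summed. `score_conflation_sets` and the toy sets are hypothetical.

```python
def score_conflation_sets(predicted, gold):
    """Sum correct (C), inserted (I) and deleted (D) fractions over all gold words.

    predicted and gold map a word to its conflation set (a set of words).
    A word missing from predicted is treated as conflating only with itself.
    """
    correct = inserted = deleted = 0.0
    for word, gold_set in gold.items():
        pred_set = predicted.get(word, {word})
        common = pred_set & gold_set
        correct += len(common) / len(gold_set)
        deleted += len(gold_set - common) / len(gold_set)
        inserted += len(pred_set - common) / len(gold_set)
    precision = correct / (correct + inserted) if correct + inserted else 0.0
    recall = correct / (correct + deleted) if correct + deleted else 0.0
    return {"C": correct, "I": inserted, "D": deleted,
            "precision": precision, "recall": recall}


# Toy example: one reference set and one predicted set, made up for illustration.
gold = {"walk": {"walk", "walks", "walked"}}
predicted = {"walk": {"walk", "walks", "wall"}}
result = score_conflation_sets(predicted, gold)
```

In the toy example two of the three reference words are recovered, one is missed ("walked") and one is wrongly added ("wall"), so precision and recall both come out at 2/3.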
The proposed approach – The Results • The results suggest that semantics and LSA can play a key part in knowledge-free morphology induction. • The results show that the semantics-only approach presented in the paper rivals a current state-of-the-art system.