Automated Suggestions for Miscollocations
Anne Li-E Liu, David Wible, Nai-lung Tsao
Overview
• Introduction
• Methodology
• Experimental Results
• Conclusion
Introduction
• Our study focuses on how to find suggestions for miscollocations automatically.
• In this paper, only verb-noun collocations and miscollocations are considered.
Introduction
• Howarth’s (1998) investigation of collocations in L1 and L2 writers’ writing.
• Granger’s (1998) analysis of adverb-adjective collocations.
• Liu’s (2002) lexical semantic analysis of verb-noun miscollocations in the English Taiwan Learner Corpus.
Introduction
Projects using learner corpora to analyze and categorize learner errors:
• NICT JLE (Japanese Learner English) Corpus
• The Chinese Learner English Corpus (CLEC)
• English Taiwan Learner Corpus, or TLC (Wible et al., 2003)
An example
• She tries to improve her students’ problems.
[Figure: V collocates from Collocation Explorer, including "reduce"]
Method
• Three features of collocate candidates are used:
  1. Word association strength
  2. Semantic similarity
  3. Intercollocability (Cowie and Howarth, 1996)
Resource
• 84 VN miscollocations in TLC (Liu, 2002)
  Training data: 42
  Testing data: 42
• Two knowledge resources: BNC, WordNet
• Two human evaluators
Word Association Strength
• Mutual Information (Church et al., 1991)
• Two purposes:
  • All suggested correct collocations have to be identified as collocations.
  • The higher the word association strength of a candidate, the more likely the candidate is a correct substitute for the wrong collocate.
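As a rough illustration of the word association feature, the sketch below computes pointwise mutual information for a verb-noun pair from raw counts. The function name, the way counts are normalised, and the numbers are assumptions for illustration; the slides do not say how counts were extracted from the BNC.

```python
import math

def pointwise_mutual_information(pair_count, verb_count, noun_count, total_pairs):
    """PMI = log2( P(v, n) / (P(v) * P(n)) ), estimated from raw counts."""
    if min(pair_count, verb_count, noun_count) <= 0 or total_pairs <= 0:
        raise ValueError("all counts must be positive")
    p_pair = pair_count / total_pairs
    p_verb = verb_count / total_pairs
    p_noun = noun_count / total_pairs
    return math.log2(p_pair / (p_verb * p_noun))

# Hypothetical counts, not taken from the BNC.
mi = pointwise_mutual_information(
    pair_count=50, verb_count=1000, noun_count=2000, total_pairs=1_000_000
)
print(round(mi, 2))
```

A pair that co-occurs more often than chance predicts gets a positive score, which is what lets the score serve both purposes listed above: filtering out non-collocations and ranking the remaining candidates.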
Semantic Similarity
• A semantic relation holds between a miscollocate and its correct counterpart (Gitsaki et al., 2000; Liu, 2002).
• The synsets of WordNet are taken as nodes in a graph, and graph-theoretic distance is measured.
[Figure: examples of a synonymous relation and a hypernymy relation, using *say a story, tell a story, think of a story]
Semantic Similarity
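The graph-distance idea can be sketched with a breadth-first search over a small hand-made graph. Everything here is an assumption for illustration: the toy edges stand in for WordNet relations and are not real WordNet data, and the mapping from distance to a 0-to-1 score is just one simple choice, since the slides do not give the actual formula.

```python
from collections import deque

# Toy undirected graph standing in for WordNet synsets (hypothetical edges).
TOY_GRAPH = {
    "say": {"tell", "express"},
    "tell": {"say", "narrate"},
    "narrate": {"tell"},
    "express": {"say"},
    "eat": set(),
}

def graph_distance(graph, start, goal):
    """Number of edges on the shortest path, or None if no path exists."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

def similarity(graph, a, b):
    """Assumed scoring: 1 / (1 + distance), and 0.0 for unconnected nodes."""
    dist = graph_distance(graph, a, b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)

print(similarity(TOY_GRAPH, "say", "tell"))
```

Closer nodes score higher, so a candidate verb that is a near neighbour of the miscollocate is preferred over a distant one.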
Intercollocability
• Cowie and Howarth (1996) propose that certain collocations form clusters on the basis of shared meaning.
[Figure: cluster of convey message, get across point, express concern, communicate feeling; recombinations shown: convey point, get across the message, communicate concern, convey feeling, express concern]
Intercollocability
• Collocations in a cluster show a certain degree of intercollocability.
[Figure: cluster of convey message, get across point, express concern, communicate feeling; also shown: express one’s concern, express / communicate + concern, feeling, and "? condolences"]
Intercollocability
She tries to *improve her students’ problems.
• Starting point: *improve problem
• problem: 86 verb collocates (including resolve and reduce)
• improve: 52 noun collocates
• Do any of the 86 verbs co-occur with the 52 nouns?
  reduce / improve + quality, efficiency, effectiveness
  resolve / improve + situation, matter, way
Intercollocability
• The cluster is partially created, and the link between improve, resolve and reduce is developed by virtue of the overlapping noun collocates.
[Figure: improve, resolve and reduce linked through noun collocates: situation, matter, problem, way, quality, efficiency, effectiveness]
Intercollocability
• Quantify intercollocability: the number of shared collocates
• shared collocate (resolve, improve) = 3
• shared collocate (reduce, improve) = 3
• The more shared collocates a verb has with the wrong verb, the more likely this verb is a good candidate.
[Figure: improve, resolve and reduce with noun collocates: situation, matter, problem, way, quality, efficiency, effectiveness]
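Counting shared collocates amounts to a set intersection. The sketch below uses only the nouns visible on the slides, so the sets are small illustrative subsets rather than the full collocate lists (improve has 52 noun collocates in the original data); the function and variable names are hypothetical.

```python
# Noun collocates as shown on the slides (illustrative subsets only).
NOUN_COLLOCATES = {
    "improve": {"quality", "efficiency", "effectiveness",
                "situation", "matter", "way"},
    "reduce": {"problem", "quality", "efficiency", "effectiveness"},
    "resolve": {"problem", "situation", "matter", "way"},
}

def shared_collocates(verb_a, verb_b, collocates=NOUN_COLLOCATES):
    """Number of noun collocates the two verbs have in common."""
    return len(collocates[verb_a] & collocates[verb_b])

def rank_candidates(wrong_verb, candidates, collocates=NOUN_COLLOCATES):
    """Sort candidate verbs by how many collocates they share with the wrong verb."""
    scored = [(v, shared_collocates(v, wrong_verb, collocates)) for v in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)

print(rank_candidates("improve", ["reduce", "resolve"]))
```

With these subsets both candidates share three collocates with improve, matching the counts on the slide.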
Integrate the 3 features
• The probabilistic model
Training
• Probability distribution of word association strength
• MI value binned into 5 levels (<1.5, 1.5~3.0, 3.0~4.5, 4.5~6, >6)
• P(MI level)
• P(MI level | Sc)
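The binning and the two distributions can be sketched as follows. This assumes that Sc denotes the event that a candidate is a correct suggestion (the slides do not expand the symbol), that a value falling exactly on a boundary goes to the higher level, and that the distributions are plain relative frequencies; the level lists are invented.

```python
from bisect import bisect_right
from collections import Counter

MI_BOUNDARIES = [1.5, 3.0, 4.5, 6.0]  # cut points between the 5 levels

def mi_level(mi_value):
    """Return a level 0..4; boundary values go to the higher level (assumption)."""
    return bisect_right(MI_BOUNDARIES, mi_value)

def level_distribution(levels, num_levels=5):
    """Relative frequency of each level in a non-empty list of levels."""
    if not levels:
        raise ValueError("need at least one observation")
    counts = Counter(levels)
    return [counts[k] / len(levels) for k in range(num_levels)]

# Hypothetical training data: MI values of all candidates vs. correct ones only.
all_levels = [mi_level(v) for v in [0.4, 1.5, 2.2, 3.9, 5.0, 6.3, 7.1, 2.9]]
correct_levels = [mi_level(v) for v in [5.0, 6.3, 7.1, 3.9]]

p_level = level_distribution(all_levels)                 # P(MI level)
p_level_given_sc = level_distribution(correct_levels)    # P(MI level | Sc)
print(p_level, p_level_given_sc)
```

The same two helpers would apply to the other two features once their scores are mapped to five levels.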
Training
• Probability distribution of semantic similarity
• Similarity score binned into 5 levels (0.0~0.2, 0.2~0.4, 0.4~0.6, 0.6~0.8 and 0.8~1.0)
• P(SS level)
• P(SS level | Sc)
Training
• Probability distribution of intercollocability
• Normalized number of shared collocates binned into 5 levels (0.0~0.2, 0.2~0.4, 0.4~0.6, 0.6~0.8 and 0.8~1.0)
• P(SC level)
• P(SC level | Sc)
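One plausible way to combine the trained distributions is a naive Bayes style score: start from a prior P(Sc) and multiply by P(level | Sc) / P(level) for each feature. This is an assumed reading, since the model formula itself did not survive in the slides, and the probability tables below are invented. Under the independence assumption the score is best treated as a ranking value rather than a calibrated probability.

```python
def naive_bayes_score(levels, prior_sc, cond_tables, marginal_tables):
    """Score = P(Sc) * product over features of P(level | Sc) / P(level)."""
    score = prior_sc
    for feature, level in levels.items():
        marginal = marginal_tables[feature][level]
        if marginal == 0.0:
            return 0.0  # level never observed in training
        score *= cond_tables[feature][level] / marginal
    return score

# Hypothetical tables, indexed by level 0..4 for each feature.
COND = {
    "MI": [0.0, 0.1, 0.2, 0.3, 0.4],
    "SS": [0.1, 0.1, 0.2, 0.3, 0.3],
    "SC": [0.1, 0.2, 0.2, 0.2, 0.3],
}
MARGINAL = {name: [0.2] * 5 for name in COND}

strong = naive_bayes_score({"MI": 4, "SS": 4, "SC": 4}, 0.1, COND, MARGINAL)
weak = naive_bayes_score({"MI": 0, "SS": 1, "SC": 0}, 0.1, COND, MARGINAL)
print(strong, weak)
```

Leaving a feature out of the `levels` dictionary scores a candidate on the remaining features only, which is one way to try different feature combinations.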
Experiments
• Different combinations of the three features.
Results
Results (cont.)
Conclusion
• A probabilistic model to integrate features.
• The early experimental results show the potential of this research.
Future work
• Applying such mechanisms to other types of miscollocations.
• Miscollocation detection will be one of the main points of this research.
• A larger number of miscollocations should be included in order to verify our approach.
Thank you! Q & A
Anne Li-E Liu lel29@cam.ac.uk
David Wible wible45@yahoo.com
Nai-Lung Tsao beaktsao@gmail.com