CSC2012
Chinese analogy search considering multi-relations
Zhao Lu
Department of Computer Science and Technology, East China Normal University, Shanghai, China
Our problem
• Latent Relation Search is a recently proposed query-by-example technique for queries in which the user specifies a triplet of terms (A, B, C) and asks a search engine for a fourth term D whose relationship with C is analogous to the relationship between A and B.
• For example, Huo Qigang (A) and Guo Jingjing (B) are a couple. Given the name Yao Ming (C), we can find Yao Ming's wife, Ye Li (D).
• The relation between Yao Ming and Ye Li is highly similar to the relation between Huo Qigang and Guo Jingjing.
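A minimal sketch of how such a query could be answered, assuming each term pair is represented only by a bag of context words taken from sentences that mention both terms, and that candidate terms D are ranked by the cosine similarity between the (C, D) vector and the (A, B) vector. This is a simplification, not the method in these slides (which also uses lexical patterns and k-means clustering); the toy corpus and the names relation_vector, cosine and analogy_search are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each sentence is already word-segmented into tokens.
SAMPLE_SENTENCES = [
    ["霍启刚", "妻子", "郭晶晶"],
    ["霍启刚", "和", "郭晶晶", "结婚"],
    ["姚明", "妻子", "叶莉"],
    ["姚明", "和", "叶莉", "结婚"],
    ["姚明", "队友", "王治郅"],
]


def relation_vector(x, y, sentences):
    """Bag of context words from sentences that contain both x and y."""
    vec = Counter()
    for tokens in sentences:
        if x in tokens and y in tokens:
            vec.update(t for t in tokens if t not in (x, y))
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def analogy_search(a, b, c, candidates, sentences):
    """Rank candidate D terms by how similar rel(C, D) is to rel(A, B)."""
    target = relation_vector(a, b, sentences)
    scored = [(cosine(target, relation_vector(c, d, sentences)), d) for d in candidates]
    # Highest similarity first; sorted() is stable, so ties keep candidate order.
    return [d for _, d in sorted(scored, key=lambda s: -s[0])]
```

With this toy corpus, analogy_search("霍启刚", "郭晶晶", "姚明", ["王治郅", "叶莉"], SAMPLE_SENTENCES) ranks 叶莉 (Ye Li) first, because the pair (姚明, 叶莉) shares the context words 妻子, 和 and 结婚 with the query pair, while (姚明, 王治郅) shares none.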
Contribution
• We propose a hybrid method that represents the relations between word pairs using both bag of words and lexical patterns.
• We count the frequency and weight of each word.
• A k-means clustering method is used to extract all the relation words that represent the different relationships between a word pair (A, B).
Three Kinds of Relation Mapping
OTO, MR, OTM
Extracting Relation Words: Preprocessing Module
1. Extract the complete sentences that contain both A and B.
2. Perform word segmentation and POS tagging.
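Step 1 can be sketched as follows, assuming raw text in which sentences end with 。, ！ or ？ (or their ASCII counterparts). The name extract_sentences is hypothetical. Step 2 would then pass each kept sentence to a Chinese word segmenter and POS tagger (for example the jieba library); that step is not shown here.

```python
import re

# A sentence is a run of non-terminal characters optionally followed by one
# sentence-final mark (Chinese or ASCII).
SENTENCE_RE = re.compile(r"[^。！？!?]+[。！？!?]?")


def extract_sentences(text, a, b):
    """Return the sentences of `text` that contain both term a and term b."""
    return [s.strip() for s in SENTENCE_RE.findall(text) if a in s and b in s]
```

For example, on "姚明和叶莉结婚了。姚明是篮球运动员。叶莉也打篮球！" with A = 姚明 and B = 叶莉, only the first sentence is kept.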
Extracting Relation Words by Lexical Patterns
• We count the frequency and weight of each word.
• The weight of a word is the number of times it occurs in sentences that match a lexical pattern.
Table 1: Lexical patterns
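A minimal sketch of the two counts, assuming sentences are already segmented into token lists and that frequency counts every occurrence of a word across the extracted sentences. The actual patterns of Table 1 are not reproduced in these slides, so the two regular expressions below ("A … 的 … B" and "A … 和 … B") are hypothetical placeholders, as is the name count_frequency_and_weight.

```python
import re
from collections import Counter

# Hypothetical lexical patterns; {A} and {B} are filled in with the query terms.
PATTERNS = [r"{A}.*的.*{B}", r"{A}.*和.*{B}"]


def count_frequency_and_weight(segmented_sentences, a, b, patterns=PATTERNS):
    """Return (frequency, weight) Counters for the words around a and b.

    frequency: occurrences in all sentences.
    weight:    occurrences in sentences that match at least one pattern.
    """
    compiled = [
        re.compile(p.format(A=re.escape(a), B=re.escape(b))) for p in patterns
    ]
    frequency, weight = Counter(), Counter()
    for tokens in segmented_sentences:
        words = [t for t in tokens if t not in (a, b)]  # drop the query terms
        frequency.update(words)
        text = "".join(tokens)  # match patterns on the unsegmented sentence
        if any(c.search(text) for c in compiled):
            weight.update(words)
    return frequency, weight
```

This sketch counts every token, including function words such as 的 and 和; the POS tags from preprocessing would normally be used to keep only content words.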
Clustering Using k-means
• To distinguish words that stand for different relations, we use k-means clustering to classify the words into different clusters.
• After clustering, we select the word with the highest frequency and weight in each cluster as the relation-representing word.
• Target words are extracted in the same way.
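A minimal sketch of this step under stated assumptions: the slides do not say which features are clustered, so each candidate word is given a hypothetical numeric feature vector, and "highest frequency and weight" is interpreted as the largest sum of the two counts. The k-means here is plain Lloyd's algorithm with the first k points as initial centroids; the names kmeans and relation_words are hypothetical.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's k-means. Centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def relation_words(words, vectors, freq, weight, k):
    """Cluster candidate words, then keep the top-scoring word of each cluster.

    Score = frequency + weight is a hypothetical way of combining the counts.
    """
    labels = kmeans(vectors, k)
    best = {}  # cluster id -> (word, score)
    for word, lab in zip(words, labels):
        score = freq.get(word, 0) + weight.get(word, 0)
        if lab not in best or score > best[lab][1]:
            best[lab] = (word, score)
    return [best[c][0] for c in sorted(best)]
```

The fixed initialisation keeps the example reproducible; in practice k-means is usually run with random or k-means++ initialisation and several restarts, and k has to be chosen in advance.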
Experimental Evaluation
Experimental Results
Fig. 3: Relation-word ranks for the test cases
Fig. 2: Percentage of questions whose target words appear at various ranks
MRR and Percentage of Target Words at Different Ranks
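MRR here is taken to be the standard mean reciprocal rank: for each question, take 1/rank of the correct target word (0 if it is not returned) and average over all questions. The percentage at rank k is assumed to mean the share of questions whose target word appears at rank k or better. A minimal sketch with hypothetical function names and made-up ranks, not the results reported in these slides:

```python
def mean_reciprocal_rank(ranks):
    """ranks: 1-based rank of the correct target per question, or None if missed."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


def percentage_at_rank(ranks, k):
    """Percentage of questions whose correct target is ranked k or better."""
    return 100.0 * sum(1 for r in ranks if r is not None and r <= k) / len(ranks)
```

For ranks [1, 2, 1, None], MRR = (1 + 0.5 + 1 + 0) / 4 = 0.625, and 50% of the questions have their target at rank 1.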
Conclusion
• A Chinese analogy search method is proposed.
• Different relationships between entities are distinguished by k-means clustering.
• Our approach achieves an MRR of 0.773, which is higher than that of existing methods.
Future work
• In the future, we will focus on distinguishing the three kinds of relation mapping automatically.
• Methods such as SVM will be applied to improve the accuracy of relation-word extraction.