180 likes | 299 Views
Ranking Definitions with Supervised Learning Methods. J.Xu, Y.Cao, H.Li and M.Zhao WWW 2005 Presenter: Baoning Wu. Motivation. People may need to find definitions of terms from Web. Traditional information retrieval is designed to search for relevant document, not suitable for this.
E N D
Ranking Definitions with Supervised Learning Methods J.Xu, Y.Cao, H.Li and M.Zhao WWW 2005 Presenter: Baoning Wu
Motivation • People may need to find definitions of terms from Web. • Traditional information retrieval is designed to search for relevant document, not suitable for this. • Google’s definition search may suffer from relying on glossary pages and ranking in alphabetic order.
Task for definition search • Receive a query term, usually a noun. • Extract definition candidates from the document collection. • Rank the candidates according to the degree to which each one is good. • Output the result.
Three categories of definitions • Good: must contain the general notion of the term and several important properties. • Bad: neither describes the general notion nor the properties of the term. • Indifferent: between good and bad.
First step: collecting candidates • Parse all sentences with a Base NP (base noun phrase) parser and identify <term> with • <term> is the first Base NP of the first sentence. • Two Base NPs separated by “of” or “for” are considered as <term> • Extract definition candidates with patterns: • <term> is a|an|the * • <term>, *, a,|an|the * • <term> is one of *
Second step: Ranking candidates • Ranking based on Ordinal Regression (ordinal classification). • Ranking SVM is used. • Ranking based on classification • SVM is used.
Ranking based on Ordinal Regression • Ordinal regression is a problem in which the classifiers classifies instances into a number of ordered categories. • Ranking SVM is used as the model. • For each candidate x, • U(x)=wTx, where w represents a vector of weights. • The higher of U(x), the better x is as a definition
Ranking based on Classification • Only good and bad definitions are used. It is a binary classification. • SVM is used as the model. • F(x)= wTx+b
Removing redundant candidates • After ranking, duplicate definition may exist. • Use Edit distance to remove the one with a lower ranking score.
Conclusions • Address the issue of searching for definitions by definition ranking. • Results are better than traditional IR. • Enterprise search system has been developed. • Not limited to search of definitions.