This research proposes a solution to the problem of search engines returning too many citations for person-name queries. The solution groups the citations by person using three facets: attributes, links, and page similarity. Each facet yields a confidence matrix, and the matrices are combined into a final confidence matrix using the Stanford Certainty Measure; the grouping is determined from that final matrix. Precision and recall, along with split and merge measurements, are used to evaluate the solution.
Grouping Search-Engine Returned Citations for Person Name Queries • Reema Al-Kamha • Research Supported by NSF
The Problem • Search engines return too many citations • Example: “Christopher Young” • Google returns around 26,500 citations • Many people named “Christopher Young” • It would help to group the citations by person. • How do we group them?
Our Solution • Three facets • Attributes • Links • Page Similarity • Confidence matrix for each facet • Final confidence matrix
Attributes Email Address, Phone, City, State, Zip Code.
Confidence Matrix for Attributes Facet D1&D5 have the same State. D1&D9 have the same State. D4&D9 have the same City.
Links • Returned citations that share the same host: www.cs.byu.edu/info/dwembley.html and www.cs.byu.edu/info/directory.php • One returned citation links to another returned citation.
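The same-host test from the Links slide can be sketched as follows. This is a minimal illustration, not the author's implementation: the helper names `host_of` and `same_host` are hypothetical, and comparing lower-cased host names is an assumption about what "same host" means.

```python
from urllib.parse import urlparse


def host_of(url):
    """Return the lower-cased host part of a URL (hypothetical helper)."""
    # The URLs on the slide carry no scheme; prefixing "//" lets
    # urlparse treat the leading component as the network location.
    if "://" not in url:
        url = "//" + url
    return urlparse(url).netloc.lower()


def same_host(url_a, url_b):
    """True when both citations are served from the same host."""
    return host_of(url_a) == host_of(url_b)


print(same_host("www.cs.byu.edu/info/dwembley.html",
                "www.cs.byu.edu/info/directory.php"))
```

The slides do not say what confidence value a same-host match contributes, so the sketch stops at the boolean test.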
Confidence Matrix for Links Facet D1 D0 D5 D0
Page Similarity • Similarity between two documents to which the two returned citations link • The number of shared pairs of adjacent capitalized words
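Counting shared pairs of adjacent capitalized words might look like the sketch below. The tokenization is an assumption made for illustration: text is split on whitespace, surrounding punctuation is stripped, and a word counts as capitalized when it is one uppercase letter followed by lowercase letters. The function names are hypothetical.

```python
import re

# Assumed definition of a capitalized word, e.g. "Christopher".
CAP_WORD = re.compile(r"[A-Z][a-z]+")
PUNCTUATION = ".,;:!?()\"'"


def capitalized_pairs(text):
    """Collect the set of adjacent capitalized-word pairs in a text."""
    words = [w.strip(PUNCTUATION) for w in text.split()]
    pairs = set()
    for first, second in zip(words, words[1:]):
        if CAP_WORD.fullmatch(first) and CAP_WORD.fullmatch(second):
            pairs.add((first, second))
    return pairs


def shared_pair_count(text_a, text_b):
    """Number of capitalized-word pairs that occur in both texts."""
    return len(capitalized_pairs(text_a) & capitalized_pairs(text_b))


page_a = "Christopher Young teaches at Brigham Young University in Provo."
page_b = "Contact Christopher Young, Brigham Young University alumni."
print(shared_pair_count(page_a, page_b))  # 3
```

How the count is converted into a confidence value is not stated on the slide, so the sketch returns only the raw count.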
Final Matrix • Combine the confidence matrices using the Stanford Certainty Measure. • For example: D1, D5 • Confidence value for the attribute facet is 0.49 • Confidence value for the link facet is 0 • Confidence value for the page similarity facet is 0.95 • Confidence value between D1 and D5 is 0.49 + 0.95 - 0.49*0.95 = 0.97 (the link facet's 0 contributes nothing)
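The combination rule used in the D1, D5 example can be written as a small function. This is a sketch of the pairwise formula a + b - a*b shown on the slide, folded over the three facet values from the example; the function names are hypothetical.

```python
from functools import reduce


def combine(cf_a, cf_b):
    """Combine two confidence values as on the slide: a + b - a*b."""
    return cf_a + cf_b - cf_a * cf_b


def combine_all(values):
    """Fold the pairwise rule over any number of facet values."""
    # Starting from 0.0 is safe because combine(0, x) == x.
    return reduce(combine, values, 0.0)


# The three facet values given for D1 and D5 on the slide.
final = combine_all([0.49, 0.0, 0.95])
print(round(final, 2))  # 0.97
```

A facet value of 0 leaves the running result unchanged, which is why the example's arithmetic only involves 0.49 and 0.95.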
Final Matrix and Grouping Method • Document pairs: {D0,D1}, {D0,D5}, {D1,D4}, {D1,D5}, {D1,D8}, {D1,D9}, {D4,D5}, {D4,D8}, {D4,D9}, {D5,D8}, {D5,D9}, {D8,D9} • Resulting groups: {D0,D1,D4,D5,D8,D9}, {D2}, {D3}, {D6}, {D7}
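The groups on this slide are what one obtains by treating the listed pairs as edges and taking connected components: D0 and D4 end up together even though {D0,D4} is not a listed pair. The sketch below reproduces that step with a small union-find. It assumes the documents are D0 through D9 and takes the pair list as given; how those pairs are selected from the final matrix is not stated on the slide.

```python
def group_by_pairs(doc_ids, pairs):
    """Group documents into connected components of the pair graph."""
    parent = {d: d for d in doc_ids}

    def find(d):
        # Follow parent links to the root, compressing the path.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for d in doc_ids:
        groups.setdefault(find(d), []).append(d)
    return sorted(groups.values())


docs = [f"D{i}" for i in range(10)]  # assumed document set D0..D9
pairs = [("D0", "D1"), ("D0", "D5"), ("D1", "D4"), ("D1", "D5"),
         ("D1", "D8"), ("D1", "D9"), ("D4", "D5"), ("D4", "D8"),
         ("D4", "D9"), ("D5", "D8"), ("D5", "D9"), ("D8", "D9")]
print(group_by_pairs(docs, pairs))
```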
Recall and Precision • Assume we get: {0,1,3} {2,4} {5} • The correct grouping is: {0,1,2,3} {4,5} • Our grouping gives the pairs: (0,1) (0,3) (1,3) (2,4) • The correct grouping gives the pairs: (0,1) (0,2) (0,3) (1,2) (1,3) (2,3) (4,5) • Three of our four pairs are correct, so R = 3/7, P = 3/(3+1)
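The pair-based recall and precision in this example can be checked with a short script. It expands each grouping into its set of within-group pairs and compares the two sets; the function names are hypothetical, and the sketch assumes both groupings contain at least one pair.

```python
from fractions import Fraction
from itertools import combinations


def pairs_of(grouping):
    """All unordered within-group pairs of a grouping."""
    return {pair
            for group in grouping
            for pair in combinations(sorted(group), 2)}


def recall_precision(ours, correct):
    """Pairwise recall and precision of `ours` against `correct`."""
    our_pairs = pairs_of(ours)
    correct_pairs = pairs_of(correct)
    hits = len(our_pairs & correct_pairs)
    return Fraction(hits, len(correct_pairs)), Fraction(hits, len(our_pairs))


ours = [{0, 1, 3}, {2, 4}, {5}]
correct = [{0, 1, 2, 3}, {4, 5}]
recall, precision = recall_precision(ours, correct)
print(recall, precision)  # 3/7 3/4
```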
Split and Merge • Assume we get: {0,1,3} {2,7,4} {5} {6} • The correct grouping is: {0,1,3,5,6} {2,7} {4} • Merge: 1/8 + 1/8 = 2/8 • Split: 1/8
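The slide does not define the merge and split measures, so the following is only one reading that reproduces its numbers: count the merges and splits needed to turn our grouping into the correct one, and weight each operation by 1/n, where n is the number of documents (8 here). Both the counting rule and the 1/n weight are assumptions inferred from the example.

```python
from fractions import Fraction


def merge_split(ours, correct):
    """Weighted merge and split counts (assumed interpretation)."""
    n = sum(len(group) for group in correct)
    # A correct group spread over k of our groups needs k - 1 merges.
    merges = sum(sum(1 for o in ours if o & c) - 1 for c in correct)
    # One of our groups spanning k correct groups needs k - 1 splits.
    splits = sum(sum(1 for c in correct if o & c) - 1 for o in ours)
    return Fraction(merges, n), Fraction(splits, n)


ours = [{0, 1, 3}, {2, 7, 4}, {5}, {6}]
correct = [{0, 1, 3, 5, 6}, {2, 7}, {4}]
merge, split = merge_split(ours, correct)
print(merge, split)  # 1/4 1/8
```

Here {0,1,3}, {5} and {6} must be merged (two merges, 2/8 = 1/4) and {2,7,4} must be split once (1/8), matching the values on the slide.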
Measurements • Precision and Recall • R = 89%, P = 96.6% • Weighted Merge and Split • M = 0.036, S = 0.008
Contributions • Grouped person-name queries by person • Provided an additional tool for search engine queries