Partitioning Search-Engine Returned Citations for Proper-Noun Queries
Reema Al-Kamha
Supported by NSF
The Problem
• Search engines return too many citations
  • Example: “Bonnie Lake”
  • Google returns around 800 citations
  • Citations ranked best first
  • Many refer to the same object
• Can we partition by same object?
• Proper-Noun Queries
  • Discard citations not of the right kind
  • Partition the rest by same object
  • Retain the best-first ranking
Solution
• Classification
  • Group 1: those of the chosen kind
  • Group 2: those not of the chosen kind
• Partition
  • Three facets
    • Attributes
    • Links
    • Page Similarity
  • Sub-facets for each facet
  • Confidence Matrix for each sub-facet
  • (Weighted) Mean for each facet
  • Final Confidence Matrix
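The combination step above can be sketched as follows. This is a minimal illustration, not the author's implementation: the function name `weighted_mean`, the matrix values, and the weights are all hypothetical, and the slides do not say how the facet matrices are merged into the final matrix, so a plain mean is assumed here.

```python
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def weighted_mean(matrices: Sequence[Matrix],
                  weights: Optional[Sequence[float]] = None) -> Matrix:
    """Element-wise (weighted) mean of equally sized square matrices."""
    if not matrices:
        raise ValueError("need at least one matrix")
    if weights is None:
        weights = [1.0] * len(matrices)  # fall back to a plain mean
    if len(weights) != len(matrices):
        raise ValueError("one weight per matrix is required")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n = len(matrices[0])
    return [
        [sum(w * m[i][j] for w, m in zip(weights, matrices)) / total
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical sub-facet confidence matrices for three citations.
# Entry [i][j] is the confidence that citations i and j refer to the
# same object; all values are invented for illustration.
linked_together = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]]
common_prefix = [[1.0, 0.4, 0.0], [0.4, 1.0, 0.2], [0.0, 0.2, 1.0]]

# Facet matrix: weighted mean of that facet's sub-facet matrices.
links_facet = weighted_mean([linked_together, common_prefix], weights=[3, 1])

# Final matrix: assumed to be a mean over the facet matrices.
# Only the Links facet is shown, so the result equals links_facet.
final_matrix = weighted_mean([links_facet])
```

With all three facets available, the last call would receive the Attributes, Links, and Page Similarity matrices, optionally with per-facet weights.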
Attributes
• Attribute(s) (One-to-One): latitude and longitude
• Single Attribute (Functional Determination): province with a lake’s name
• Multiple Attributes (Functional Determination): campground name and highway with a lake’s name
• Attributes (Nonfunctional Determination): country with a lake’s name
• Distinguishing Attribute: state for a lake
Links
• Returned citations that link together
• Returned citations that have a common URL prefix: same host, same file name, and same URL
  Example of same host:
    http://www.cs.byu.edu/info/dwembley.html
    http://www.cs.byu.edu/info/directory.php
  Example of same file name:
    http://sunsite.unc.edu/javafaq/oldnews.html
    http://helios.oit.unc.edu/javafaq/oldnews.html
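One way to detect the URL relationships listed above is to compare parsed URL components. The sketch below is an assumption about how such a check could look: the function name `url_relation` and the returned labels are hypothetical, and "same file name" is interpreted as an identical path on a different host, which matches the slide's example but may not be the author's exact rule.

```python
import posixpath
from urllib.parse import urlsplit


def url_relation(url_a: str, url_b: str) -> str:
    """Classify how two citation URLs are related (hypothetical labels)."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    same_host = a.netloc.lower() == b.netloc.lower()
    same_path = a.path == b.path and a.query == b.query
    # Only count a path match as "same file" when the path names a file.
    names_file = posixpath.basename(a.path) != ""

    if same_host and same_path:
        return "same URL"
    if same_path and names_file:
        return "same file"
    if same_host:
        return "same host"
    return "unrelated"


# The two example pairs from the slide.
host_pair = url_relation("http://www.cs.byu.edu/info/dwembley.html",
                         "http://www.cs.byu.edu/info/directory.php")
file_pair = url_relation("http://sunsite.unc.edu/javafaq/oldnews.html",
                         "http://helios.oit.unc.edu/javafaq/oldnews.html")
```

Each label could then be mapped to a confidence value in the corresponding sub-facet matrix; the slides do not give those values.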
Confidence Matrix for Returned Citations that Link Together
[Matrix figure not recovered; surviving labels: 1, 4]
Page Similarity
• Similarity between each pair of returned citations
• Similarity between each pair of citation-referenced documents
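The slides do not name the similarity measure. As one common choice, the sketch below uses cosine similarity over term counts; the measure, the tokenizer, and the function names are all assumptions made for illustration. The same function could be applied either to citation text (title and snippet) or to the text of the referenced pages.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between the two term-count vectors (0.0 to 1.0)."""
    a, b = term_counts(text_a), term_counts(text_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with any other text
    return dot / (norm_a * norm_b)


# Invented snippets: two about the same lake, one unrelated.
s1 = "Bonnie Lake campground fishing trail"
s2 = "Fishing and camping near Bonnie Lake trail"
s3 = "Quarterly earnings report"
```

Here `cosine_similarity(s1, s2)` is well above zero because the snippets share several terms, while `cosine_similarity(s1, s3)` is zero because they share none.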
Confidence Matrix for Similarity between two Citation-Referenced Documents
Modified Confidence Matrix for Similarity between two Citation-Referenced Documents
Final Matrix
• Citation pairs marked as the same object: (1,4), (3,5), (5,8), (7,8)
• Resulting partition: {1,4} {3,5,7,8} {2} {6}
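The grouping on this slide is consistent with merging citations transitively: 3 goes with 5, 5 with 8, and 7 with 8, so all four land in one group. A minimal union-find sketch of that step follows. The function name `partition` is hypothetical, and how the pairs are read off the final confidence matrix (for example, by a threshold) is not stated in the slides.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def partition(citations: Sequence[int],
              same_object_pairs: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Group citations connected, directly or transitively, by the pairs."""
    parent: Dict[int, int] = {c: c for c in citations}

    def find(c: int) -> int:
        # Walk up to the root, shortening the path along the way.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in same_object_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller citation number as the group root.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    groups: Dict[int, List[int]] = {}
    for c in citations:  # iterating in rank order keeps groups rank-ordered
        groups.setdefault(find(c), []).append(c)
    return list(groups.values())


# Citations 1..8 in best-first order, with the pairs from the slide.
result = partition(range(1, 9), [(1, 4), (3, 5), (5, 8), (7, 8)])
# result == [[1, 4], [2], [3, 5, 7, 8], [6]]
```

The groups match the slide's partition. They are listed by their best-ranked member, which is one way to retain the best-first ranking.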
Measurements
• Classification (percent correctly classified)
• Number of Partitions (precision and recall)
• Each Partition (precision and recall)
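The slides do not define how precision and recall are computed for partitions. One standard option, assumed here purely for illustration, is the pairwise form: a pair of citations counts as correct when both the computed and the hand-built partition place the two citations in the same group. The function names and the hand-built partition below are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Partition = List[List[int]]


def same_group_pairs(groups: Partition) -> Set[FrozenSet[int]]:
    """All unordered citation pairs that share a group."""
    return {frozenset(pair)
            for group in groups
            for pair in combinations(group, 2)}


def pairwise_precision_recall(predicted: Partition,
                              expected: Partition) -> Tuple[float, float]:
    """Pairwise precision and recall of a predicted partition."""
    pred = same_group_pairs(predicted)
    gold = same_group_pairs(expected)
    hits = len(pred & gold)
    precision = hits / len(pred) if pred else 1.0
    recall = hits / len(gold) if gold else 1.0
    return precision, recall


# Predicted partition from the Final Matrix slide; the expected
# partition is invented to show a case with errors in both directions.
predicted = [[1, 4], [3, 5, 7, 8], [2], [6]]
expected = [[1, 4], [3, 5, 8], [7], [2, 6]]
precision, recall = pairwise_precision_recall(predicted, expected)
```

In this invented case the prediction makes 7 same-group pairs, of which 4 are correct, and the expected partition has 5 such pairs, so precision is 4/7 and recall is 4/5.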
Current Implementation Status
• Interface
• Google connection
• Citation retrieval
• Page retrieval
Contribution
• Solve one type of object-identity problem
• Provide an additional tool for search engine queries