Using ODP Metadata to Personalize Search Presented by Lan Nie 09/21/2005, Lehigh University
Introduction • ODP metadata • 4 million sites, 590,000 categories • Tree structure • Categories: inner nodes • Pages: leaf nodes, high quality, representative • Using ODP metadata to personalize search • 4 billion vs. 4 million • Using ODP metadata for personalized search • Is biasing possible in the ODP context? • Extend ODP classifications from the current 4 million entries to the 4-billion-page Web automatically by biasing
Using ODP Metadata For Personalized Search • User profile: several ODP topics selected by the user • Personalized search • Send query Q to a search engine S (e.g., Google, ODP Search) • Res = URLs returned by S • For i = 1 to size(Res): Dist[i] = Distance(Res[i], Prof) • Re-sort Res based on Dist • Representation • Both the user profile and a URL (50% in Google Directory) can be represented as a set of nodes in the directory tree • Distance(Profile, URL) • Minimum distance between the two sets of nodes
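The re-ranking loop on this slide can be sketched in Python. This is a minimal illustration, not the presenter's implementation. It assumes that topics are written as slash-separated paths under an implicit root (for example `Arts/Music/Jazz`), that the node-to-node distance is the plain tree distance, and that URLs without an ODP annotation sort last. The names `tree_distance`, `profile_distance`, and `personalize` are hypothetical.

```python
from math import inf


def tree_distance(a, b):
    """Number of edges on the tree path between two topic paths.

    Topics are slash-separated paths below an implicit root (assumption).
    """
    pa, pb = a.split("/"), b.split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    # Climb from each node up to the deepest common ancestor.
    return (len(pa) - common) + (len(pb) - common)


def profile_distance(profile, url_topics):
    """Minimum distance between the two sets of nodes."""
    if not profile or not url_topics:
        return inf  # assumption: unannotated URLs are treated as farthest
    return min(tree_distance(p, t) for p in profile for t in url_topics)


def personalize(results, annotations, profile):
    """Re-sort search results by distance to the user profile.

    sorted() is stable, so ties keep the search engine's original order.
    """
    return sorted(
        results,
        key=lambda url: profile_distance(profile, annotations.get(url, ())),
    )
```

Because the sort is stable, results at equal distance (including all unannotated ones) stay in the order the search engine returned them.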
Naïve Distances • Minimum tree distance • Intra-topic links • Subsumer • Graph shortest path • Inter-topic links • Complex distance: the deeper the subsumer, the more related the nodes • Combining with Google PageRank: some Google results are not annotated
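The graph shortest-path variant can be illustrated with a breadth-first search over the directory once inter-topic links are added to the tree edges. This is a sketch under assumptions: all links are treated as undirected and unweighted, and the function name `shortest_path_length` and the edge-list input format are hypothetical.

```python
from collections import deque


def shortest_path_length(edges, source, target):
    """Fewest hops between two topics, or None if they are disconnected.

    `edges` mixes tree (parent/child) links and inter-topic links;
    every link is treated as undirected (assumption).
    """
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None
```

With tree edges only, this reduces to the minimum tree distance; an inter-topic link can shorten the path between topics that sit in different branches.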
Extending ODP Annotations To The Web • Manual annotation of the whole Web is impossible • Biasing is an implicit way of extending annotations to the Web • Is biasing possible in the ODP context? Are ODP entries good biasing sets for obtaining relevant results, i.e., do they generate rankings that differ enough from the non-biased ranking? • When does biasing make a difference? Find the characteristics a biasing set has to exhibit in order to obtain relevant results
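For readers unfamiliar with biasing, the sketch below shows the standard personalized (biased) PageRank power iteration, in which the random jump lands only on pages of the biasing set. It is an assumption that this is the formulation used in the talk. The damping factor of 0.85, the fixed iteration count, and the function name `biased_pagerank` are illustrative choices.

```python
def biased_pagerank(links, bias_set, damping=0.85, iterations=50):
    """Power iteration for PageRank with the random jump restricted to bias_set.

    `links` maps each page to the list of pages it links to.
    Passing every page as bias_set gives ordinary (non-biased) PageRank.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    bias = set(bias_set) & nodes
    if not bias:
        raise ValueError("bias_set must contain at least one known page")

    # Jump distribution: uniform over the biasing set, zero elsewhere.
    jump = {n: (1.0 / len(bias) if n in bias else 0.0) for n in nodes}
    rank = dict(jump)
    for _ in range(iterations):
        new = {n: (1.0 - damping) * jump[n] for n in nodes}
        for n in nodes:
            out = links.get(n, ())
            if out:
                share = damping * rank[n] / len(out)
                for t in out:
                    new[t] += share
            else:
                # Dangling page: send its score back through the jump vector.
                for m in bias:
                    new[m] += damping * rank[n] * jump[m]
        rank = new
    return rank
```

Comparing the top results of `biased_pagerank(links, odp_entries)` with those of `biased_pagerank(links, all_pages)` is the kind of biased versus non-biased comparison the slide asks about; `odp_entries` and `all_pages` are placeholder names.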
Experimental Setup • Compare the similarity between the top 100 non-biased PageRank results and the biased results • Similarity measures • OSim: degree of overlap between the top n elements of two ranked lists • KSim: degree of agreement on ordering between the two ranked lists
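The two similarity measures can be sketched as follows. The slide only describes them informally, so the exact formulas here are assumptions based on the commonly cited definitions: OSim is the size of the overlap of the two top-n sets divided by n, and KSim is the fraction of pairs from the union of both lists on which the two rankings agree, where items missing from a list are placed (tied) after all of its entries and a tie counts as a disagreement.

```python
from itertools import combinations


def osim(list1, list2, n):
    """Overlap of the top-n elements of two ranked lists, in [0, 1]."""
    return len(set(list1[:n]) & set(list2[:n])) / n


def ksim(list1, list2):
    """Fraction of pairs whose relative order is the same in both lists.

    Assumes neither list contains duplicates. Items absent from a list
    share the rank just past its end, so they are tied with each other.
    """
    union = list(set(list1) | set(list2))
    if len(union) < 2:
        return 1.0
    pos1 = {u: i for i, u in enumerate(list1)}
    pos2 = {u: i for i, u in enumerate(list2)}
    agree = 0
    total = 0
    for u, v in combinations(union, 2):
        d1 = pos1.get(u, len(list1)) - pos1.get(v, len(list1))
        d2 = pos2.get(u, len(list2)) - pos2.get(v, len(list2))
        total += 1
        if d1 * d2 > 0:  # same strict order in both lists
            agree += 1
    return agree / total
```

In the setup on this slide, `list1` would be the top 100 non-biased results and `list2` the top 100 biased results, with `n = 100`.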
Choice of Biasing Sets • Top [0-10]% PageRank pages • Top [0-2]% PageRank pages • Randomly selected pages • Low-PageRank pages • The sum of scores within the set was varied between 0.000005% and 10% of the total sum over all pages (TOT) • Experiments were run on a crawl of 3 million pages and then applied to the Stanford WebBase crawl
According to the random model of biasing, every set with TOT below 0.015% is good for biasing. • Results are not influenced by the crawl size (3 million page crawl vs. 120 million page WebBase crawl) • Entries in ODP have TOT below 0.015%, thus biasing is possible in the ODP context
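The TOT criterion can be checked with a few lines of Python. This is a sketch assuming that TOT is the sum of non-biased PageRank scores inside the biasing set, expressed as a percentage of the sum over all pages. The function names and the `scores` dictionary format are hypothetical; the 0.015% threshold is the one quoted on the slide.

```python
def tot_percent(scores, biasing_set):
    """Share of the total PageRank mass held by the biasing set, in percent."""
    total = sum(scores.values())
    inside = sum(scores[p] for p in set(biasing_set) if p in scores)
    return 100.0 * inside / total


def is_good_biasing_set(scores, biasing_set, threshold_percent=0.015):
    """True when the set's TOT is below the threshold reported in the talk."""
    return tot_percent(scores, biasing_set) < threshold_percent
```

A set made of a handful of low-scoring pages passes the check, while a set that already holds a large share of the total score does not.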
Conclusions • A personalized search algorithm that ranks URLs based on the distance between the user profile and the URL in the ODP taxonomy • Biasing on ODP entries takes effect, so it is feasible to extend the manual ODP classification to the Web