240 likes | 461 Views
Personalized Search Based on User Search Histories. Presented By Ananth Ram Reddy Pesaru. Outline. Introduction Goals Personalization Sources of User Information User Profile Creation Conceptual Rank Final Rank System Architecture Experiments Summary Conclusion References.
E N D
Personalized Search Based on User Search Histories Presented By Ananth Ram Reddy Pesaru
Outline • Introduction • Goals • Personalization • Sources of User Information • User Profile Creation • Conceptual Rank • Final Rank • System Architecture • Experiments • Summary • Conclusion • References
Introduction • Search engines are utilized as referrals to web sites, compared to direct navigation and web links. • User profiles, descriptions of user interests, can be used by search engines to provide personalized search results. • According to an analysis conducted by OneStat(a provider of real-time web analytics, website monitoring and website statistics) most common query length submitted to a search engine (32.6%) is two words long.
Introduction (2) • 77.2% of all queries are three words long or less. This shows how small information is supplied for an ordinary search. • User profiles are built based on activity at the search site itself and the use of these profiles to provide personalized search results are studied. • Common Problems of Search Engines • Ambiguity • Retrieved results are mostly based on web popularity rather than user’s interests
Goals • To show that user profiles can be implicitly created out of short phrases such as queries and snippets collected by the search engine itself. • Improve search accuracy by • Concept based retrieval • Matching user interests
Personalization • Personalization is the process of presenting the right information to the right user at the right moment. • Information can be collected from users in two ways • Explicit Method – Used by commercial systems such as Yahoo! • Examples: Asking users for feedback such as preferences or ratings Disadvantages: - Users often provide inconsistent or incorrect information - The profile built is static whereas the user’s interests may change over time - The construction of the profile places a burden on the user that they may not wish to accept. • Implicit Method - Complex to explore • Examples: Observing user behaviors such as the time spent reading an online document, click through, page scrolling
Personalization (2) • Users can personalize the search explicitly by selecting specific Web sites, the number of Web pages to collect, and the noun phrases used in the final map of results. • Search can make use of personalization in two different ways • Providing tools that help users organizing their past searches, preferences and visited URLs. • Creating and maintaining sets of user’s interests (stored in profiles). • To provide better results, these sets can be used by search engine’s retrieval process
Sources of user Information • Browsing histories of Users • Collect information through desktop robot or have user connect to Internet via a proxy • User Desktops • Contextual retrieval • User Search Histories • The information is made available only to the search engine
User Profile creation User profiles are created by • Collecting information about the interests of users • Categorizing representative texts into concept hierarchies • Utilize Open Directory Project for concepts • Train Classifier (on Training Pages) • Compare representative texts to training texts • Using concept weights to represent user interests
Conceptual rank • Users submit queries to Search Engine. • Top ten results are re-ranked based on original rank and conceptual similarity to the user’s profile. • Search result titles and summaries are classified to create a document profile in the same format as the user profile. • Conceptual Match: • Calculated based on similarity between each result and user profile by using the cosine similarity function • Re-rank results based on the calculated concept match. The rank order produced is called -›Conceptual Rank wtik = weight of conceptk in UserProfilej wtjk = weight of conceptjin DocumentProfilej
Final Rank • The final rank of the document is calculated by combining the conceptual rank with Google’s original rank using the following weighting scheme FinalRank = α * ConceptualRank + (1-α) * GoogleRank • α has a value between 0 and 1. • If α = 0, • conceptual rank is not given any weight, and it is equivalent to the original rank assigned by Google. • If α = 1, • the search engine ranking is ignored and pure conceptual rank is considered. • The conceptual and search engine based rankings can be blended in different proportions by varying the value of α.
System architecture • Google Wrapper • A wrapper built around google using Google’s APIs • Monitors users by maintaining a log of • Submitted queries for which at least one results was visited • Top Ten results retrieved • User selected snippets from retrieved results • Classifier – a conceptual search engine • Used to classify queries and snippets for each user as well as search engine results. • A set of scripts are used to process the log files and evaluate the per-userand overall performance. • The log file is split between users and, for each user, further divided into training and testing sets.
Experiments • No. of volunteers monitored = 6 • No. of queries collected = 609 • Duplicated queries were deleted for each user • Total No. of queries = 282 (47 queries per user) distributed into the following sets • 240 queries were used to train 2 user profiles – Query and snippet based • 30 queries were used for testing personalized search parameters • 12 queries were used for validating the selected parameters
Experiment 1 Profile built by classifying and combining each query (4 concepts from each classified query) Varied No. of queries used to create profile No. of concepts for the user profile Average Google Rank = 4.4 Best Average Conceptual Rank = 2.9 (Using 30 queries to create the profile and 4 concepts from user profile)
Experiment 2 Profile built by classifying and combining each snippet (5 concepts from each classified snippet) Varied No. of snippets used to create profile No. of concepts for the user profile Average Google Rank = 4.4 Best Average Conceptual Rank = 2.9 (Using 30 snippets to create the profile and 20 concepts from user profile)
Experiment 3 Final Rank: Combination of original rank with conceptual rank (Query-based profile) Varied α from 0.0 – 1.0 (In steps of 0.1) Parameters that gave best improvements in Experiment – 1 were considered. Best value when α is 1.0 (Only conceptual rank is applied)
Experiment 4 Final Rank: Combination of original rank with conceptual rank (Snippet-based profile) Varied α from 0.0 – 1.0 (In steps of 0.1) Parameters that gave best improvements in Experiment – 2 were considered. Best value when α is 1.0 (Only conceptual rank is applied)
Comparison of Average Rank for Validation Queries 12 Testing Queries – 2 Per User Query Based Profile - (30 Concepts and 4 Concepts per query) Snippet Based Profile - (30 Snippets and 20 Concepts per snippet)
Summary • User profiles were built based on submitted queries and snippets of user-selected results. • The information available was sufficient to build user profiles that were able to significantly improve personalized rankings. • Query-Based Profile produced an improvement of 37%. • Snippet-Based Profile produced an improvement of 34%.
Conclusion • In order to create personalized search, search engines capture information submitted to their site. • Users do not need to install proxy servers or desktop bots. • Personalized service poses some privacy issues. • There is a need to look at combination of short-term, long term user interests with current task focus.
References • M. Speretta, & S. Gauch,. Personalized search based on user search histories. In Web Intelligence, The 2005 IEEE/WIC/ACM International Conference on (pp. 622-628), Sept 2005. • P.K. Chan, A Non-Invasive Learning Approach to Building Web User Profiles, KDD-99 Workshop on web usage analysis and user profiling, pp. 7 – 12, 1999. • S. Gauch, D. Ravindran, S. Induri, J. Madrid, and S. Chadalavada, Internal Technical Report ITTC-FY2004-TR-8646-37, Information and Telecommunication Technology Center, University of Kansas. • D. Billsus, M.J. Pazzani. A hybrid user model for news story classification. In proceedings of the seventh international conference on User modeling, Banff, Canada, pp. 99 – 108, 1999. • Google APIs. http://www.google.com/apis.