NLP Course Seminar WEB PERSONALIZATION Group 14 Vishaal Jatav (04d05013) Varun Garg (04d05015)
Roadmap Motivation Introduction The Personalization Process Personalization Approaches Personalization Techniques Issues Conclusion
Motivation Some Facts Overwhelming amount of information on the web Not all documents are relevant to the user Users cannot fully convey their information needs Users never find any document 100% relevant Users expect more personal behavior "I don't want results for Delhi when I am in Bombay." "I was looking for a crane (the bird), not a crane (the machine)."
Introduction Personalization React differently to different users System reacts in the way the users want it to Ultimately bring the user back to the system Web Personalization Apply machine learning and data mining Build models of user behavior (called profiles) Predict users' needs and expectations Adaptively estimate better models
The Personalization Process Consider the following pieces of information Geographical location Age, gender, ethnicity, religion, etc. Interests Previous reviews on products ... How could these pieces of information help? How do we collect this information?
The Personalization Process (Contd.) Collect lots of information on the user's behavior Information must be attributable to a single user Decide on a user model Featuring user needs, lifestyle, situations, etc. Create a user profile for each user of the system The profile captures the individuality of the user Habits, browsing behavior, lifestyle, etc. With every interaction, modify the user profile
The Personalization Process More Formally The web is a collection of n items I = {i1, i2, ..., in} Users come from a set U = {u1, u2, ..., um} Each user uk has a rating function ruk : I → [0,1] ∪ {⊥}, where ruk(ij) = ⊥ means ij has not been rated by uk Ik(u) is the set of items not yet rated by user uk Ik(r) is the set of items rated by user uk GOAL: recommend to the active user ua items ij from Ia(u) that might be of interest to him
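A minimal sketch of this setup, assuming Python and made-up users, items, and ratings (nothing here comes from the slides beyond the notation):

    # ratings[u][i] lies in [0, 1]; a missing key plays the role of ⊥ ("not rated")
    ratings = {
        "u1": {"i1": 0.9, "i2": 0.4},
        "u2": {"i1": 0.8, "i3": 1.0},
    }
    items = {"i1", "i2", "i3"}

    def rated(u):                    # Ik(r): items rated by user u
        return set(ratings.get(u, {}))

    def unrated(u):                  # Ik(u): items not yet rated by user u
        return items - rated(u)

    print(unrated("u1"))             # candidate items to recommend to u1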
Classification of Personalization Approaches Individual Vs Collaborative Reactive Vs Proactive User Vs Item Information
Classification of Personalization Approaches: Individual Vs Collaborative Individual approach (Google Personalized Search) Use only individual user's data Generate user profile by analyzing User's browsing behavior User's active feedback on the system Advantage Can be implemented on the client side – no privacy violation Disadvantage Based only on past interactions – lack of serendipity
Classification of Personalization Approaches: Individual Vs Collaborative (Contd.) Collaborative approach (Amazon recommendations) Find the neighborhood of the active user React according to an assumption If A is like B, then B likes the same things as A likes Disadvantages New item rating problem New user problem Advantage Better than the individual approach once the two problems are solved
Classification of Personalization Approaches: Reactive Vs Proactive Reactive approach Explicitly ask the user for preferences Either in the form of a query or feedback Proactive approach Learn user preferences from user behavior No explicit preference demand from the user Behavior is extracted from Click-through rates Navigational patterns
Classification of Personalization Approaches: User Vs Item Information User Information Geographic location (from IP address) Age, gender, marital status, etc. (explicit query) Lifestyle, etc. (inferred from past behavior) Item Information Content of topics – movie genre, etc. Product/domain ontology
Personalization Techniques Content-Based Filtering Collaborative Filtering Model Based Personalization Rule based Graph theoretic Language Model
Content-Based Filtering Syskill & Webert uses explicit feedback Individual, Reactive, Item-information Uses naïve Bayes to distinguish likes from dislikes Initial probabilities are updated with new interactions Uses the 128 most informative words from each item Letizia uses implicit feedback Individual, Proactive, Item-information Finds likes/dislikes based on tf-idf similarity Other systems use nearest-neighbor methods for similarity
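A rough sketch of the tf-idf similarity idea (not the actual Syskill & Webert or Letizia code; scikit-learn and the example pages are assumptions):

    # Content-based filtering via tf-idf: score unseen pages against liked pages.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    liked = ["crane bird migration routes", "bird watching field guide"]
    candidates = ["crane rental for construction sites",
                  "sandhill crane migration season"]

    vec = TfidfVectorizer()
    X = vec.fit_transform(liked + candidates)
    profile = np.asarray(X[:len(liked)].mean(axis=0))   # user profile = mean liked vector
    scores = cosine_similarity(X[len(liked):], profile).ravel()
    for doc, s in sorted(zip(candidates, scores), key=lambda t: -t[1]):
        print(round(float(s), 3), doc)                  # bird page should rank first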
Collaborative Filtering Has proven successful in recommender systems General Technique For every user, a user neighborhood is computed The neighborhood contains users who have rated several items similarly Get candidate items for recommendation Items seen by the neighborhood but not by the active user ua Data is stored in the form of a rating matrix Items as rows and users as columns
Collaborative Filtering (Contd.) The system must provide the following algorithms Measuring similarity between users For creation of the neighborhood Pearson and Spearman correlation, cosine similarity, etc. Predicting the rating of an item not rated by the user To decide the order in which these items will be presented Weighted sum of ratings – most common Selecting the neighborhood subset for prediction To reduce the large amount of computation Threshold on the similarity value – most common
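A minimal Python sketch of user-based collaborative filtering as outlined above: Pearson similarity between users, a similarity threshold to form the neighborhood, and a weighted sum of neighbors' ratings as the prediction (the threshold value and data layout are illustrative assumptions):

    from math import sqrt

    def pearson(ra, rb):
        # Pearson correlation over the items both users have rated.
        common = set(ra) & set(rb)
        if len(common) < 2:
            return 0.0
        ma = sum(ra[i] for i in common) / len(common)
        mb = sum(rb[i] for i in common) / len(common)
        num = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
        den = sqrt(sum((ra[i] - ma) ** 2 for i in common)) * \
              sqrt(sum((rb[i] - mb) ** 2 for i in common))
        return num / den if den else 0.0

    def predict(ratings, active, item, threshold=0.3):
        # Weighted sum of neighbors' ratings, neighbors chosen by threshold.
        num = den = 0.0
        for user, r in ratings.items():
            if user == active or item not in r:
                continue
            sim = pearson(ratings[active], r)
            if sim > threshold:               # neighborhood selection
                num += sim * r[item]
                den += abs(sim)
        return num / den if den else None     # None: no neighbor rated the item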
Model Based Personalization Approaches Executed in two stages Offline process – to create the actual model Online process – using the model during interaction Common data used for model generation Web usage data (web history, click-through rates, etc.) Item structure and content data Examples Rule-Based Models Graph-Theoretic Models Language Models
Model Based Personalization: Rule Based Models Association rule-based Item ia is in unordered association with ib If user considers ib, then ia is a good recommendation Sequence rule-based Item ia is in sequential association with ib If user considers ia, then ib is a good recommendation Association between items can be stored as a dependency graph
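An illustrative sketch, with hypothetical rules and items, of how mined association rules could drive recommendations:

    # Each rule: if the user has considered all items in the antecedent,
    # the consequent becomes a candidate recommendation, ranked by confidence.
    rules = [                                  # (antecedent, consequent, confidence)
        ({"printer"}, "toner", 0.8),
        ({"laptop", "mouse"}, "laptop bag", 0.6),
    ]

    def recommend(seen_items, rules):
        candidates = {}
        for antecedent, consequent, conf in rules:
            if antecedent <= seen_items and consequent not in seen_items:
                candidates[consequent] = max(conf, candidates.get(consequent, 0.0))
        return sorted(candidates, key=candidates.get, reverse=True)

    print(recommend({"printer", "laptop", "mouse"}, rules))   # ['toner', 'laptop bag']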
Model Based Personalization: Graph Theoretic Model Ratings data is transformed into a directed graph Nodes are users An edge from ui to uj means that ui predicts uj Weights on edges represent the predictability To predict if an item ik will be of interest to ui Calculate the shortest path from ui to any user ur who has rated ik The predicted rating is calculated as a function of the path between ui and ur
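The slides do not fix the exact "function of the path", so the following is only one plausible sketch: follow the strongest chain of prediction edges and damp the found rating by the product of edge weights (this aggregation rule is an assumption):

    import heapq

    def predict_rating(graph, ratings, source, item):
        # graph[u] = {v: predictability weight in (0, 1]}; ratings[u] = {item: rating}
        best = {source: 1.0}
        heap = [(-1.0, source)]                     # max-heap on path strength
        while heap:
            strength, u = heapq.heappop(heap)
            strength = -strength
            if item in ratings.get(u, {}):
                return strength * ratings[u][item]  # path-discounted rating
            for v, w in graph.get(u, {}).items():
                s = strength * w
                if s > best.get(v, 0.0):            # keep only the strongest path to v
                    best[v] = s
                    heapq.heappush(heap, (-s, v))
        return None                                 # no user on any path rated the item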
Model Based Personalization: Language Modeling Approaches Without using user's relevance feedback Simple language modeling Using user's relevance feedback N-gram based methods Noisy channel model based method
Language Model Approach: Simple Language Modeling Without using user's feedback History consists of all the words in the past queries Learn the user profile as {(w1, P(w1)), ..., (wn, P(wn))}, where P(wi) is the relative frequency of wi in the user's query history (a maximum-likelihood unigram estimate)
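A small sketch of building that profile from a made-up query history:

    from collections import Counter

    history = ["crane bird migration", "bird watching gear", "heron vs crane"]  # past queries
    counts = Counter(w for q in history for w in q.lower().split())
    total = sum(counts.values())
    profile = {w: c / total for w, c in counts.items()}       # {w: P(w)}
    print(sorted(profile.items(), key=lambda t: -t[1])[:3])   # most probable profile words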
Language Model Approach: Simple Language Modeling Sample user profile
Language Model Approach: Simple Language Modeling Re-ranking of unpersonalized results Re-ranking is done according to P(Q|D,u) = Π over qi in Q of [ α·P(qi|D) + (1−α)·P(qi|UP) ] α is a weighting parameter between 0 and 1 UP is the user profile
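A sketch of the re-ranking step; alpha, the document model doc_lm, and the way the pieces are wired together are assumptions, not the slides' implementation:

    def rerank(query, docs, doc_lm, profile, alpha=0.7):
        # doc_lm(d, q) -> P(q|D); profile -> {word: P(word|UP)} from the user profile.
        def score(d):
            s = 1.0
            for q in query.lower().split():
                s *= alpha * doc_lm(d, q) + (1 - alpha) * profile.get(q, 1e-6)
            return s
        return sorted(docs, key=score, reverse=True)   # best-scoring documents first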
Language Model Approach: N-gram based approach Using user's relevance feedback Learn the user profile Let Hu represent the search history of user u Hu = {(q1, rf1), (q2, rf2), (q3, rf3), ..., (qn, rfn)} Unigram: the user profile consists of {(w1, P(w1)), (w2, P(w2)), (w3, P(w3)), ..., (wn, P(wn))}
Language Model Approach: N-gram based approach Sample unigram user profile
Language Model Approach: N-gram based approach Bigram: the user profile consists of {(w1w2, P(w2|w1)), (w2w3, P(w3|w2)), ..., (wn-1wn, P(wn|wn-1))}
Language Model Approach: N-gram based approach Sample bigram user profile
Language Model Approach: N-gram based approach Re-ranking unpersonalized results Based on unigrams (α = weighting parameter) Q = q1 q2 q3 ... qn P(q1 q2 q3 ... qn) = P(q1) P(q2) P(q3) ... P(qn)
Language Model Approach: N-gram based approach Based on bigrams Q = q1 q2 q3 ... qn P(q1 q2 q3 ... qn) = P(q1) P(q2|q1) P(q3|q2) ... P(qn|qn-1)
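A small sketch of scoring a query under the bigram profile (the smoothing floor for unseen n-grams is an assumption):

    def bigram_query_prob(query, unigram_profile, bigram_profile, floor=1e-6):
        # Chain rule: P(q1) * P(q2|q1) * ... * P(qn|q_{n-1})
        words = query.lower().split()
        p = unigram_profile.get(words[0], floor)              # P(q1)
        for prev, cur in zip(words, words[1:]):
            p *= bigram_profile.get((prev, cur), floor)       # P(q_i | q_{i-1})
        return p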
Language Model Approach: Noisy Channel based approach Using the user's implicit feedback The user history is represented as Hu = {(Q1,D1), (Q2,D2), ..., (QN,DN)} Di is the document visited for Qi Each Di consists of words w1, w2, ..., wm Basic Idea – Statistical Machine Translation Given parallel text of languages S and T We get P(ti|si) ∀ si ∈ S and ti ∈ T Using EM we get the optimized model P(T|S)
Language Model Approach: Noisy Channel based approach Similarly T = past queries Q1, Q2, ..., QK S = text of the relevant documents for the queries in T We learn the model P(Q|D), or more precisely P(qi|wj) Assumption: translate the ideal [information containing] document into a query Document – a verbose language Query – a compact language The user profile is stored as tuples <qi, wj, P(qi|wj)>
Language Model Approach: Noisy Channel based approach Sample noisy channel user profile
Language Model Approach: Noisy Channel based approach Re-ranking Re-rank the documents using P(Q|D,u) = Π over qi in Q of [ α·P(qi|GE) + (1−α)·Σ over wj in D of P(qi|wj)·P(wj|D) ] α = weighting parameter P(qi|GE) is the lexical probability of qi under the general English (background) model
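A hedged sketch of this re-ranking step; the translation table trans (assumed to have been learned with EM from query/clicked-document pairs) and the background model are illustrative assumptions:

    def translation_score(query, doc_words, trans, background, alpha=0.5):
        # doc_words: list of words in the document; trans[(q, w)] = P(q|w);
        # background[q] = P(q|GE), the general-English lexical probability.
        doc_lm = {w: doc_words.count(w) / len(doc_words) for w in set(doc_words)}
        score = 1.0
        for q in query.lower().split():
            p_trans = sum(trans.get((q, w), 0.0) * pw for w, pw in doc_lm.items())
            score *= alpha * background.get(q, 1e-6) + (1 - alpha) * p_trans
        return score    # higher score = ranked earlier in the personalized list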
Issues in Personalization Cold Start Problem (new user problem) Latency Problem (new item problem) Data sparseness Scalability Privacy Recommendation List Diversity Robustness
Conclusion Web personalization is the need of the hour for e-businesses It is a relatively new research topic Several issues are yet to be solved effectively Data should be collected without invading user privacy Creating user models effectively and scaling them to a large number of users/items is at the core of personalization
Bibliography Rohini U, Vamshi Ambati and Vasudeva Varma. Statistical Machine Translation Models for Personalized Search. In Proceedings of the 3rd International Joint Conference on Natural Language Processing (IJCNLP 2008), January 7-12, 2008, Hyderabad, India. Sarabjot S. Anand and Bamshad Mobasher. Intelligent Techniques for Web Personalization. In Intelligent Techniques for Web Personalization, pages 1-36. Springer, 2005. Vasudeva Varma. Personalization in Information Retrieval, Extraction and Access. In Workshop on Ontology, NLP, Personalization and IE/IR, IIT Bombay, Mumbai, 15-17 July 2008. http://en.wikipedia.org/wiki/Personalisation Snapshots from Google Inc.