Contextual Ranking of Keywords Using Click Data. Utku Irmak, Vadim von Brzeski, Reiner Kraft, Yahoo! Inc. ICDE '09, Data mining session. 2010. 04. 09. Summarized by Park, Sung Eun, IDS Lab., Seoul National University. Presented by Park, Sung Eun, IDS Lab., Seoul National University.
Contents • Introduction • Contextual Shortcuts • Concept Ranking Method • Feature Space • Interestingness and Relevance of a Concept • Evaluation • Cross Validation Approach, Editorial Evaluation, Real World Results • Conclusion
Introduction • Determining and ranking the key concepts in a document • Goal • Given the candidate set of entities, learn a ranking function which orders the entities by their interestingness and relevance • Applications • Contextual advertising systems • Text summarization • User-centric entity detection systems • Detect entities and concepts within text • Transform the detected entities into actionable items such as “intelligent hyperlinks”
Contextual Shortcuts • A concept vector • Concept: a piece of text that refers to an abstract thought or idea. Ex) car insurance, justice • Generating the concept vector • Term vector: TF/IDF weights computed from documents in Yahoo! Search • Unit vector: all units found in the document • Units are constructed from query logs by an iterative statistical approach that uses the frequencies of the distinct queries • Concept vector: the term vector and the unit vector are merged
Previous Concept Ranking Method • AG(TF, Unit): the term vector and the unit vector of a document are aggregated • A term appears in the term vector, but not in the unit vector • punish its term vector weight • A term appears in the unit vector, but not in the term vector • add this term to the concept vector with its unit weight • A term appears in both vectors • sum its term vector and unit vector weights
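The merge rules on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `merge_vectors`, the dictionary representation, and the `penalty` factor of 0.5 are assumptions, since the slide does not say how strongly a term-only weight is punished. The "sum" rule is read here as applying to terms found in both vectors.

```python
def merge_vectors(term_vec, unit_vec, penalty=0.5):
    """Merge a term vector and a unit vector into a concept vector.

    term_vec, unit_vec: dicts mapping a term to its weight.
    penalty: hypothetical multiplier (< 1) applied to term-only weights.
    """
    concept_vec = {}
    for term, weight in term_vec.items():
        if term in unit_vec:
            # Term appears in both vectors: sum the two weights.
            concept_vec[term] = weight + unit_vec[term]
        else:
            # Term appears only in the term vector: punish its weight.
            concept_vec[term] = weight * penalty
    for term, weight in unit_vec.items():
        if term not in term_vec:
            # Term appears only in the unit vector: keep its unit weight.
            concept_vec[term] = weight
    return concept_vec


merged = merge_vectors(
    {"car": 0.4, "the": 0.2},
    {"car": 0.5, "car insurance": 0.9},
)
```

With these inputs, "car" gets the summed weight, "the" is down-weighted, and "car insurance" enters the concept vector with its unit weight.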
Proposed Concept Ranking Method [Figure: the features of each term are fed into SVMlight] • Ranking function: SVM (Support Vector Machine) • SVMlight: an open source library for ranking SVM • Interestingness: 9 features of a concept • Relevance: pre-mined terms of the concept
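As a rough illustration of how training data for a ranking SVM might be prepared, the sketch below writes concepts in the SVMlight input format (`<target> qid:<id> <feature>:<value> ...`), where concepts from the same document share a `qid`. Using the click-through rate as the ranking target is an assumption made here for illustration, and the function name and the feature values are hypothetical; the slides do not list the actual feature encoding.

```python
def to_svmlight_rank_lines(doc_id, concepts):
    """Format the concepts of one document as SVMlight ranking examples.

    doc_id: positive integer used as the query id (qid).
    concepts: list of (target, features) pairs, where target is the
        value to rank by (assumed here: click-through rate) and
        features is a list of numeric feature values.
    """
    lines = []
    for target, features in concepts:
        # SVMlight feature indices start at 1 and must be increasing.
        feats = " ".join(f"{i}:{v:g}" for i, v in enumerate(features, start=1))
        lines.append(f"{target:g} qid:{doc_id} {feats}")
    return lines


for line in to_svmlight_rank_lines(1, [(0.05, [1.0, 0.5]), (0.01, [0.0, 0.2])]):
    print(line)
```

Only the relative order of targets within one `qid` matters in ranking mode, which is why concepts are grouped per document.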
Relevance of a Concept in a Context • A mining approach to obtain a good relevance scoring mechanism • Use pre-mined keywords (relevant terms) for each concept • The relevance of the concept can be computed from the co-occurrence of the pre-mined keywords in the context.
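One simple way to turn co-occurrence into a score is to add up the weights of the pre-mined keywords that also occur in the context. The sketch below assumes exactly that; the function name, the single-word keywords, and the additive scoring are illustrative assumptions, not the scoring formula of the paper.

```python
import re


def relevance_score(context_text, relevant_terms):
    """Score a concept by the pre-mined keywords found in the context.

    context_text: the text surrounding the concept.
    relevant_terms: dict mapping a pre-mined keyword (single word,
        lowercase) to its weight for this concept.
    """
    tokens = set(re.findall(r"[a-z0-9]+", context_text.lower()))
    # Assumed scoring: sum the weights of keywords present in the context.
    return sum(w for term, w in relevant_terms.items() if term in tokens)


score = relevance_score(
    "Cheap car insurance quotes",
    {"insurance": 2.0, "quotes": 1.0, "premium": 3.0},
)
```

A concept whose keywords never appear in the context scores 0 under this scheme, which matches the intuition that it is not relevant there.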
Relevance of a Concept in a Context • Relevant term scoring • Search engine snippets • Using the Yahoo! Developer Network API • Treat the returned snippets as a single document and compute score = tf*idf • Keep the top m=100 terms based on the score • Prisma query refinement tool • Prisma is a tool that helps users augment or replace their queries by providing feedback terms. It considers the top 50 documents in a large collection, based on factors such as the count and position of the terms, document rank, and occurrence of query terms within the input phrase. • Construct a single document from the concepts returned by Prisma for concept ci and compute the score based on the tf*idf values
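The snippet-based scoring can be sketched as follows. The snippets are concatenated into one document, each term is scored by tf*idf, and the top m terms are kept. The document-frequency table `doc_freq`, the corpus size `num_docs`, and the smoothed idf form `log(N / (1 + df))` are assumptions made for the example; the slides do not say which idf variant was used, and the snippet strings stand in for results from a search API.

```python
import math
import re
from collections import Counter


def top_relevant_terms(snippets, doc_freq, num_docs, m=100):
    """Return the m highest tf*idf terms of the concatenated snippets.

    snippets: list of snippet strings returned for a concept.
    doc_freq: dict mapping a term to the number of documents containing it.
    num_docs: total number of documents in the reference collection.
    """
    tokens = re.findall(r"[a-z0-9]+", " ".join(snippets).lower())
    tf = Counter(tokens)
    scores = {
        term: count * math.log(num_docs / (1 + doc_freq.get(term, 0)))
        for term, count in tf.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:m]


terms = top_relevant_terms(
    ["car insurance quotes", "car insurance rates"],
    {"car": 500, "insurance": 50, "quotes": 100, "rates": 100},
    num_docs=1000,
    m=2,
)
```

The same routine could be reused for the Prisma case by passing the returned concepts as the input strings.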
Relevance of a Concept in a Context [Figure: three sources of relevant terms: snippets, Prisma, query suggestions] • Relevant term scoring • Related query suggestions • Using the Yahoo! Developer Network API • 300 suggestions and the query frequencies of the suggestions • Let k be the number of terms that appear in the suggestion lists
Evaluation • Cross Validation Approach • Data • Randomly sampled news stories that were annotated by Contextual Shortcuts • The number of times these stories were viewed and the number of clicks received by each concept that was detected in the stories • 870 stories and 6420 concepts, with 16549 sample clicks • Weighted Error Rate • where click-through rate = (the number of clicks) / (the number of views)
Evaluation • NDCG (normalized discounted cumulative gain) • A valuable metric for applications that require high precision at the top ranks • Score for a sorted list of k concepts on document i • where score(j)=bucketNo(CTR(j)/100), and bucketNo() returns a bucket number between 0 and 1000 considering all the CTR values observed in the system in increasing order.
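The NDCG formula itself did not survive on this slide, so the sketch below uses the widely used textbook definition: the gain at rank j is (2^score(j) - 1) / log2(1 + j), and the sum is normalized by the same sum for the ideal ordering. Treat it as an assumption about the exact variant; the `gains` list stands in for the score(j) values.

```python
import math


def dcg(gains):
    """Discounted cumulative gain of gains listed in ranked order."""
    return sum(
        (2 ** g - 1) / math.log2(1 + j) for j, g in enumerate(gains, start=1)
    )


def ndcg(gains):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


perfect = ndcg([3, 2, 1])   # ranking already in ideal order
reversed_ = ndcg([1, 2, 3])  # best concept ranked last
```

Because of the logarithmic discount, mistakes near the top of the list cost more than mistakes further down, which is why the metric suits applications that show only a few concepts.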
Evaluation Interestingness features
Evaluation Relevance score
Evaluation Interestingness Features and Relevance Score
Evaluation • Editorial Evaluation • A processed set of documents is presented to the judges • A judge is asked to select a document from the pool • The judge is asked to read the document and rate each entity or concept highlighted in it in terms of its interestingness and relevance
Contributions • We propose to use implicit user feedback in the form of click data to determine the most interesting and relevant concepts in a context via a machine learning approach. • We describe a feature space pertinent to the interestingness of a concept, and present algorithms to identify the relevance of a concept in a given context. • We evaluate the proposed techniques extensively using click data, an editorial study, and an analysis of a production system. The results show significant improvements. • We provide a detailed description of a framework that enables efficient implementation of the proposed techniques in a production system.
Discussion • No theoretical basis for their feature selection assumptions • No references or underlying theory at all • Depends on technology already developed in previous studies • Huge advantage in having a valuable dataset
Q&A Thank you