Semantic News Recommendation Using WordNet and Bing Similarities 28th Symposium On Applied Computing 2013 (SAC 2013)
Introduction (1)
• Recommender systems help users plough through a massive and increasing amount of information
• Recommender systems:
  • Content-based
  • Collaborative filtering
  • Hybrid
• Content-based systems often make term-based comparisons between user profiles and items
• Common measure: Term Frequency – Inverse Document Frequency (TF-IDF), as proposed by Salton and Buckley [1988]
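The term-based comparison above can be sketched as follows: build a TF-IDF vector per document and compare the user profile against an unread item with cosine similarity. This is a minimal illustrative sketch, not the framework's implementation; the helper names are hypothetical.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) for tokenized documents."""
    n = len(docs)
    df = Counter()                       # document frequency per term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine of the angle between two sparse vectors (dicts)."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0
```

A profile vector built from read items can then be compared against each unread item's vector, recommending those above a cut-off.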
Introduction (2)
• One could take semantics into account:
  • Concepts instead of terms → Concept Frequency – Inverse Document Frequency (CF-IDF):
    • Measures (cosine) similarity using item and profile concept scores
    • Reduces noise caused by non-meaningful terms
    • Yields fewer terms to evaluate
    • Allows for semantic features, e.g., synonyms
    • Relies on a domain ontology
  • Synsets instead of concepts → Synset Frequency – Inverse Document Frequency (SF-IDF):
    • Similar to CF-IDF
    • Measures (cosine) similarity using item and profile synset scores
    • Does not rely on a domain ontology
    • Relies on a large semantic lexicon: WordNet
Introduction (3)
• One could take semantics into account:
  • Semantic Similarity (SS) recommender:
    • Measures similarity between item and profile synsets
    • Various similarity measures: Jiang & Conrath [1997], Leacock & Chodorow [1998], Lin [1998], Resnik [1995], Wu & Palmer [1994]
    • Outperforms TF-IDF, CF-IDF, and SF-IDF
• SS recommenders seem to be a good choice, but:
  • No support for named entities (persons, companies, …)
  • Many of these are used in texts, e.g., news
Introduction (4)
• Hence, we propose the BingSS recommender:
  • Bing:
    • Identifies similarities of named entities
    • Uses Bing page counts
    • Bing offered a free API at the time of writing
  • SS:
    • Identifies similarities of known synsets
    • Uses WordNet synsets
    • Wu & Palmer similarity
• Implementation in Ceryx (as a plug-in for the Hermes news processing framework [Frasincar et al., 2009])
Framework: User Profile
• User profile consists of all read news items
• Implicit preference for specific topics
Framework: Preprocessing
• Before recommendations can be made, each news item is parsed:
  • Tokenizer
  • Sentence splitter
  • Lemmatizer
  • Part-of-Speech tagger
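The parsing pipeline above can be sketched with toy stand-ins; the real framework uses dedicated components (a Stanford POS tagger and a JAWS-based lemmatizer, per the Ceryx slide), so the regex splitter and naive suffix-stripping "lemmatizer" below are illustrative assumptions only.

```python
import re

def preprocess(text):
    """Toy parsing pipeline: sentence splitting, tokenization, crude lemmatization.
    Returns one list of lemmas per sentence."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())   # sentence splitter
    result = []
    for sentence in sentences:
        tokens = re.findall(r"[A-Za-z]+", sentence.lower())  # tokenizer
        # Naive stand-in for a lemmatizer: strip a plural "s" from longer tokens.
        lemmas = [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]
        result.append(lemmas)
    return result
```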
Framework: Synsets
• We make use of the WordNet dictionary and word sense disambiguation (WSD)
• Each word has a set of senses, and each sense has a set of semantically equivalent synonyms (a synset):
  • Turkey:
    • turkey, Meleagris gallopavo (animal)
    • Turkey, Republic of Turkey (country)
    • joker, turkey (annoying person)
    • turkey, bomb, dud (failure)
  • Fly:
    • fly, aviate, pilot (operate airplane)
    • flee, fly, take flight (run away)
• Synsets are linked using semantic pointers
  • Hypernym, hyponym, …
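Choosing among the senses of "turkey" is what the WSD step does. A minimal sketch of simplified Lesk disambiguation (the Ceryx slide names a Lesk WSD component) over a toy sense inventory; the glosses below are invented for illustration, not real WordNet data.

```python
# Toy sense inventory mirroring the "turkey" example above (illustrative only).
LEXICON = {
    "turkey": [
        {"synset": {"turkey", "meleagris gallopavo"},
         "gloss": "large bird animal farm poultry"},
        {"synset": {"turkey", "republic of turkey"},
         "gloss": "country nation state anatolia"},
        {"synset": {"joker", "turkey"},
         "gloss": "annoying foolish person"},
        {"synset": {"turkey", "bomb", "dud"},
         "gloss": "failure flop event"},
    ],
}

def simplified_lesk(word, context_tokens):
    """Pick the sense whose gloss overlaps most with the context words."""
    context = set(context_tokens)
    best = max(LEXICON[word], key=lambda s: len(context & set(s["gloss"].split())))
    return best["synset"]
```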
Framework: Bing
• The Bing similarity score is calculated by computing the pair-wise similarities between all named entities u and r in an unread document U and the user profile R
• V is a vector with all combinations of named entities from U and R, and simPMI(u, r) is the Point-wise Mutual Information (PMI) co-occurrence measure for u and r
• We only consider the top βBing entity pairs with the highest similarity in V
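A sketch of the above, with PMI derived from page counts: PMI(u, r) = log(p(u, r) / (p(u) · p(r))), where each probability is a page count divided by the total number of pages. The `counts` lookup is a stand-in for calls to the Bing API, and averaging the top-β pair scores into one document score is an assumption of this sketch (the slide only says the top βBing pairs are considered).

```python
import math
from itertools import product

def sim_pmi(count_u, count_r, count_ur, total_pages):
    """Point-wise Mutual Information from page counts."""
    if count_u == 0 or count_r == 0 or count_ur == 0:
        return 0.0
    p_u, p_r, p_ur = (c / total_pages for c in (count_u, count_r, count_ur))
    return math.log(p_ur / (p_u * p_r))

def bing_similarity(item_entities, profile_entities, counts, total_pages, beta):
    """Score all entity pairs from U x R, keep the top-beta, average them.
    `counts` maps an entity, or a frozenset entity pair, to a page count."""
    sims = [
        sim_pmi(counts[u], counts[r], counts.get(frozenset((u, r)), 0), total_pages)
        for u, r in product(item_entities, profile_entities)
    ]
    top = sorted(sims, reverse=True)[:beta]
    return sum(top) / len(top) if top else 0.0
```

Entities that co-occur on far more pages than chance would predict get a positive PMI; entities that rarely co-occur get a negative one.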
Framework: SS (1)
• TF-IDF, CF-IDF, and SF-IDF use cosine similarity:
  • Two vectors:
    • User profile item scores
    • News message item scores
  • Measures the cosine of the angle between the vectors
• Semantic Similarity (SS):
  • Two vectors:
    • User profile synsets
    • News message synsets
  • Jiang & Conrath [1997], Resnik [1995], and Lin [1998]: information content of synsets
  • Leacock & Chodorow [1998] and Wu & Palmer [1994]: path length between synsets
Framework: SS (2)
• The SS similarity score is calculated by computing the pair-wise similarities between all synsets u and r in an unread document U and the user profile R
• W is a vector with all combinations of synsets from U and R that have a common Part-of-Speech, and sim(u, r) is any of the mentioned SS measures
• We only consider the top βSS synset pairs with the highest similarity in W
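The Wu & Palmer [1994] measure used by BingSS can be sketched over a toy hypernym hierarchy: wup(u, r) = 2 · depth(LCS) / (depth(u) + depth(r)), where LCS is the least common subsumer (deepest shared hypernym). The taxonomy below is an invented stand-in for WordNet's hypernym pointers.

```python
# Toy hypernym taxonomy (child -> parent), standing in for WordNet pointers.
HYPERNYM = {
    "turkey_bird": "bird",
    "bird": "animal",
    "animal": "entity",
    "dog": "animal",
    "car": "artifact",
    "artifact": "entity",
}

def path_to_root(synset):
    """Hypernym chain from a synset up to the root, e.g. dog -> animal -> entity."""
    path = [synset]
    while synset in HYPERNYM:
        synset = HYPERNYM[synset]
        path.append(synset)
    return path

def depth(synset):
    return len(path_to_root(synset))  # the root has depth 1

def wu_palmer(u, r):
    """Wu & Palmer similarity: 2 * depth(LCS) / (depth(u) + depth(r))."""
    ancestors_u = set(path_to_root(u))
    lcs = next(s for s in path_to_root(r) if s in ancestors_u)
    return 2 * depth(lcs) / (depth(u) + depth(r))
```

Identical synsets score 1; pairs whose shared ancestor is closer to the root score lower, so "dog" is nearer to "turkey_bird" (shared subsumer: animal) than to "car" (shared subsumer: entity).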
Framework: BingSS
• We take the weighted average of the Bing page-count score simBing and the SS score simSS:
  simBingSS = α · simBing + (1 − α) · simSS
• Weight α is optimized during training
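The combination is a one-liner; the default values below are taken from the results slide (α = 0.72, BingSS cut-off 0.63), though using them as fixed defaults rather than training outputs is an assumption of this sketch.

```python
def bing_ss(sim_bing, sim_ss, alpha=0.72):
    """BingSS score: weighted average of the Bing and SS similarities."""
    return alpha * sim_bing + (1 - alpha) * sim_ss

def recommend(sim_bing, sim_ss, alpha=0.72, cutoff=0.63):
    """Recommend a news item when its combined score reaches the cut-off."""
    return bing_ss(sim_bing, sim_ss, alpha) >= cutoff
```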
Implementation: Hermes
• The Hermes framework is utilized for building a news personalization service for RSS
• Its implementation is the Hermes News Portal (HNP):
  • Programmed in Java
  • Uses OWL / SPARQL / Jena / GATE / WordNet
Implementation: Ceryx
• Ceryx is a plug-in for HNP
• Uses WordNet / Stanford POS Tagger / JAWS Lemmatizer / Lesk WSD / Alias-i LingPipe 4.1.0 Named Entity Recognizer / Bing API 2.0
• Main focus is on recommendation support:
  • User profiles are constructed
  • Computes SS and BingSS
Evaluation (1)
• Experiment:
  • We evaluate 100 news items on their correspondence with 8 topics (USA, Microsoft or competitors, Google or competitors, financial markets, …)
  • User profile: all articles that are related to each of the topics
  • Ceryx computes SS and BingSS with various cut-offs
• Measurements:
  • Accuracy
  • Precision
  • Recall
  • Specificity
  • F1-measure
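The five measurements above all derive from a binary confusion matrix (recommended vs. not, relevant vs. not); a minimal sketch:

```python
def metrics(tp, fp, fn, tn):
    """Standard binary-classification measures from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # a.k.a. sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}
```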
Evaluation (2)
Evaluation (3)
• Results:
  • Optimized cut-off values are 0.49 (SS) and 0.63 (BingSS)
  • BingSS recommendation outperforms SS recommendation on accuracy, precision, specificity, and F1
  • This comes at the cost of a reduced recall
  • For BingSS, named entity similarities are more important than synset similarities (α = 0.72 for named entities vs. 0.28 for synsets)
Conclusions
• Semantics-based recommendation can be performed by means of synsets from a semantic lexicon (SS)
• Named entities are not included, but can be considered through search page counts (BingSS)
• BingSS outperforms SS, and named entities are considered to be more important than synsets
• Future work:
  • Also include page counts for synsets
  • Apply named entity page counts to other methods, e.g., TF-IDF, CF-IDF, or SF-IDF
Questions