TWinner: Understanding News Queries with Geo-content using Twitter • Satyen Abrol, Latifur Khan • University of Texas at Dallas, Department of Computer Science • GIR ’10 • 29 April, 2011 • Sengyu Rim
Outline • Introduction • Related Work • Twitter as News-wire • Determining News Intent • Assigning Weights to Tweets • Experiments and Results • Conclusion
Introduction • Motivation • Users find news through search engines • The results of common search engines differ from what the user expects • Non-critical information • Unorganized content • Search engines need to understand the intent of the user query
Introduction • Motivation • E.g., what event in Korea attracted the most attention in 2002? • A naive user searches for news with the keyword “korea” on 2002-06-18 • Possible results: Food: Kimchi • Map: Korea • News: Korea vs. Italy 2:1 • Wiki: Korea
Introduction • Analyze the content of a popular social networking site, Twitter, to infer the intent of the user query • Twitter provides popular news topics • Twitter provides keywords that may enhance the user query • TWinner makes two novel contributions to the field of geographic information retrieval • Identifying the intent of the user query • Adding keywords to the query
Introduction • The architecture of the news intent system TWinner
Related Work • To identify and disambiguate the locations of users • Natural Language Processing • Data Mining • To establish the relationship between the location of the news and news content • A model using NLP techniques
Twitter as News-wire • Twitter • Free social networking • Micro-blogging service • Medium for news updates
Determining News Intent • Identification of Location • Geo-tags the query to a location with a certain confidence • Frequency-Population Ratio (FPR) • In the absence of a news-making event, FPR remains constant irrespective of the location • Used to assign a news intent confidence to the query • FPR = (α + β) * Nt • α: the population density factor • β: the location type constant • Nt: the number of tweets per minute at that instant
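As a minimal sketch of the FPR formula on this slide: the slides give no concrete values for α and β, so the numbers below are purely illustrative, and the function name `fpr` is hypothetical. Comparing the current FPR against a quiet-period baseline is also an assumption about how the ratio might be turned into a news-intent signal, not something the slide specifies.

```python
def fpr(alpha: float, beta: float, tweets_per_minute: float) -> float:
    """Frequency-Population Ratio as stated on the slide: (alpha + beta) * Nt."""
    return (alpha + beta) * tweets_per_minute


# Hypothetical factors for a single location (not taken from the slides).
ALPHA = 0.5   # population density factor
BETA = 0.25   # location type constant

baseline = fpr(ALPHA, BETA, tweets_per_minute=4.0)   # quiet period
spike = fpr(ALPHA, BETA, tweets_per_minute=40.0)     # during a possible news event

# Assumption: a large jump over the baseline hints at news intent.
jump = spike / baseline
print(baseline, spike, jump)
```

Because FPR is expected to stay constant without a news-making event, a jump well above 1 is the kind of deviation the system could use when assigning a news intent confidence.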
Determining News Intent • Experiments on determining the effect of geo-type and population density
Determining News Intent • The drawback of FPR • Fails to take into account the geographical relatedness of features • Modified FPR • FPR = Σi δi (αi + βi) * Nt • δi: the degree to which geo-location i is related to the primary search query
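The modified FPR can be sketched the same way. This is an illustration under assumptions: the tuple layout `(delta, alpha, beta)`, the function name `modified_fpr`, and every numeric value are hypothetical, and Nt is treated as a single rate shared by all related locations.

```python
from typing import Iterable, Tuple


def modified_fpr(
    locations: Iterable[Tuple[float, float, float]],
    tweets_per_minute: float,
) -> float:
    """Modified FPR from the slide: sum_i delta_i * (alpha_i + beta_i) * Nt.

    Each entry of `locations` is (delta, alpha, beta) for one geo-location.
    """
    weighted = sum(delta * (alpha + beta) for delta, alpha, beta in locations)
    return weighted * tweets_per_minute


# Hypothetical values: the queried location itself (delta = 1.0)
# and one related location that is only half as relevant (delta = 0.5).
related_locations = [
    (1.0, 0.5, 0.25),
    (0.5, 0.25, 0.25),
]
score = modified_fpr(related_locations, tweets_per_minute=8.0)
print(score)
```

With a single location and δ = 1, this reduces to the original FPR formula.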
Assigning Weights to Tweets • Detecting Spam Messages • Spam messages carry little or no relevant information • Nature of spam messages • A formula assigns each message a confidence that it is spam, using the following quantities • Np: the number of followers • Nq: the number of people the user is following • μ: an arbitrary constant • Nr: the ratio of the number of tweets containing a reply to the total number of tweets
Assigning Weights to Tweets • On the basis of user location • An experiment was conducted to understand the relation between Twitter messages and the location of the user
Assigning Weights to Tweets • Using Hyperlinks Mentioned in Tweets • 30-50% of general Twitter messages contain a hyperlink to an external website • For news-related Twitter messages, this share rises to 70-80% • We also use this indicator to assign weights to tweets
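One way the hyperlink signal could feed into a tweet's weight is sketched below. The slides do not say how the weight is adjusted, so the multiplicative `link_boost`, the helper names, and the simple URL pattern are all assumptions made for illustration.

```python
import re

# Simplified pattern: anything starting with http:// or https:// up to whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def has_hyperlink(tweet: str) -> bool:
    """Return True if the tweet text contains at least one http(s) link."""
    return URL_PATTERN.search(tweet) is not None


def link_weighted(base_weight: float, tweet: str, link_boost: float = 1.5) -> float:
    """Scale the weight of tweets that carry a hyperlink (hypothetical scheme)."""
    return base_weight * link_boost if has_hyperlink(tweet) else base_weight


with_link = link_weighted(1.0, "Quake update http://example.com/a")
without_link = link_weighted(1.0, "good morning")
print(with_link, without_link)
```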
Assigning Weights to Tweets • Semantic Similarity • Summarize the Twitter messages into a couple of keywords • A naïve approach picks k keywords, ignoring semantic similarity • Semantic similarity is defined in terms of • M: the total number of articles searched in the New York Times corpus • f(x): the number of articles for term x • f(y): the number of articles for term y
Assigning Weights to Tweets • Reassigns the weight of every keyword on the basis of the following formula • Wi* = Wi + Σj Sij * Wj • Wi*: the new weight of keyword i • Wi: the weight without semantic similarity • Sij: the semantic similarity between keywords i and j, derived from the semantic similarity formula • Wj: the initial weight of each other word being considered • Identifies k keywords that are semantically dissimilar but together contribute the maximum weight • Spq < Sthreshold: the similarity between any two words p and q in the set of k is less than a threshold • W1 + W2 + W3 + … + Wk is maximal over all groups satisfying the condition above
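The reweighting formula and the keyword selection rule can be sketched as follows. This is a minimal illustration, not the authors' implementation: the keywords, weights, similarity values, and threshold are invented, missing pairs are assumed to have similarity 0, the selection is an exhaustive search that is only practical for small vocabularies, and whether the sum is taken over the original or the reassigned weights is left to the caller because the slide is ambiguous on that point.

```python
from itertools import combinations


def reweight(weights, sim):
    """Apply Wi* = Wi + sum_j Sij * Wj over all other keywords j."""
    return {
        i: w_i + sum(sim.get(frozenset((i, j)), 0.0) * w_j
                     for j, w_j in weights.items() if j != i)
        for i, w_i in weights.items()
    }


def pick_keywords(weights, sim, k, threshold):
    """Pick k keywords with pairwise similarity below threshold and maximal total weight."""
    best, best_total = None, float("-inf")
    for group in combinations(weights, k):
        dissimilar = all(sim.get(frozenset(pair), 0.0) < threshold
                         for pair in combinations(group, 2))
        if dissimilar:
            total = sum(weights[w] for w in group)
            if total > best_total:
                best, best_total = group, total
    return best


# Hypothetical initial weights and pairwise similarities.
initial = {"haiti": 4.0, "earthquake": 3.0, "quake": 2.0, "relief": 1.0}
similarity = {
    frozenset(("earthquake", "quake")): 0.5,
    frozenset(("haiti", "relief")): 0.25,
}

new_weights = reweight(initial, similarity)
chosen = pick_keywords(new_weights, similarity, k=3, threshold=0.4)
print(new_weights)
print(chosen)
```

In this toy data, "earthquake" and "quake" reinforce each other's weight, but the threshold keeps them from both appearing in the final set, which is the behaviour the two conditions on the slide aim for.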
Experiments and Results • Experiments to test the validity of the hypothesis • First: a naïve user is looking for the latest happenings in the context of the Fort Hood incident on 12th November 2009 • Second: a naïve user is looking for the latest happenings in the context of ‘Russia’ on 5th December 2009 • Third: a naïve user is looking for the latest happenings in the context of ‘Haiti’ on 18th January 2010
Experiments and Results • Results
Experiments and Results • Results show the contrast between the search results produced by the original query and those produced after adding the keywords obtained by TWinner
Conclusion • We present a system to predict a user’s news intent • Takes the location mentioned and the time of the query into consideration • Makes use of the social networking site Twitter to understand the relationship between geo-information and the news intent of the query • Future work • Understanding the content of social media messages • Sentiment conveyed by the messages • Enhancing the accuracy of the system