Detecting Newsworthy Topics in Twitter
Steven Van Canneyt and Matthias Feys
April 8th, 2014
Methodology (for each time interval i)
Input → News publisher detection
Topic detection → Topic ranking: 1. topic 3, 2. topic 2, 3. topic 1
Topic enrichment: 1. topic 3: headline 3, tweets, pictures… 2. topic 2: headline 2, tweets, pictures… 3. topic 1: headline 1, tweets, pictures…
News publisher detection • Bayesian network classifier • Estimates the probability that a user is a 'news publisher' • Only tweets from users with probability > α (0.04) are used • Training set: 10,000 manually annotated users • Which tweets contain newsworthy content? Which tweets are posted by users who mostly talk about newsworthy stories?
News publisher detection • Features
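The slides name a Bayesian network classifier, but neither its structure nor its feature list survives here. As a minimal sketch, assuming the simplest Bayesian network (naive Bayes, where the 'news publisher' class is the only parent of every feature) and hypothetical binary user features, the filtering step with α = 0.04 could look like this. The function names and the `user_features` mapping are illustrative, not the authors' code.

```python
from math import exp, log

ALPHA = 0.04  # publisher-probability threshold from the slides


def train_naive_bayes(X, y, smoothing=1.0):
    """Fit a Bernoulli naive Bayes model (a stand-in for the unspecified Bayesian network).

    X: list of binary feature tuples, one per user (features are hypothetical);
    y: 1 = news publisher, 0 = not. Laplace smoothing keeps likelihoods in (0, 1).
    """
    n_features = len(X[0])
    class_totals = {0: 0, 1: 0}
    feature_counts = {0: [0] * n_features, 1: [0] * n_features}
    for x, label in zip(X, y):
        class_totals[label] += 1
        for j, value in enumerate(x):
            feature_counts[label][j] += value
    priors = {c: class_totals[c] / len(y) for c in (0, 1)}
    likelihoods = {
        c: [(feature_counts[c][j] + smoothing) / (class_totals[c] + 2 * smoothing)
            for j in range(n_features)]
        for c in (0, 1)
    }
    return priors, likelihoods


def publisher_probability(model, x):
    """P(news publisher | features x), computed in log space for numerical stability."""
    priors, likelihoods = model
    log_scores = {}
    for c in (0, 1):
        score = log(priors[c])
        for j, value in enumerate(x):
            p = likelihoods[c][j]
            score += log(p if value else 1.0 - p)
        log_scores[c] = score
    top = max(log_scores.values())
    normaliser = sum(exp(s - top) for s in log_scores.values())
    return exp(log_scores[1] - top) / normaliser


def filter_tweets(tweets, user_features, model, alpha=ALPHA):
    """Keep only tweets whose author has publisher probability > alpha."""
    return [t for t in tweets
            if publisher_probability(model, user_features[t["user"]]) > alpha]
```

Because α is so low, a filter like this would presumably discard only users the model considers very unlikely to be news publishers.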
Topic detection • DBSCAN clustering algorithm • Cosine similarity • Boosted tf-idf representation of the tweets
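Topic detection clusters tweet vectors with DBSCAN under cosine similarity. The sketch below is a plain-Python DBSCAN that uses 1 - cosine similarity as the distance between sparse `{word: weight}` vectors, which are assumed to be the tweets' (boosted) tf-idf representations. The slides do not give DBSCAN's `eps` and `min_pts` values, so they are left as parameters. Label -1 marks noise, i.e. tweets that join no topic.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors given as {word: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def dbscan(vectors, eps, min_pts):
    """Return one cluster label per vector (-1 = noise); distance = 1 - cosine similarity."""
    n = len(vectors)
    labels = [None] * n

    def neighbours(i):
        # Includes i itself (distance 0), as in the usual DBSCAN definition.
        return [j for j in range(n)
                if 1 - cosine_similarity(vectors[i], vectors[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # noise for now; may become a border point later
            continue
        cluster += 1                # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue             # already assigned; border points are not expanded
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also a core point: keep expanding
    return labels
```

scikit-learn's `DBSCAN` class accepts `metric="cosine"` and could replace this hand-written loop; the explicit version just makes the core, border and noise logic visible.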
Topic detection • Boosted tf-idf weight of word w, combining: • the standard term frequency-inverse document frequency (tf-idf) of word w • a boost for proper nouns and verbs (factor 1.5) • a boost for bursty words during time interval i
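The slide lists the three ingredients of the boosted weight but not how they are combined or how burstiness is measured. A minimal sketch, assuming the factors are multiplied, that proper nouns and verbs are recognised by hypothetical `PROPN`/`VERB` tags from some POS tagger, and that burstiness is a hypothetical smoothed ratio of a word's count in interval i to its count in the previous interval:

```python
from collections import Counter
from math import log

POS_BOOST = 1.5                    # from the slides: proper nouns and verbs
BOOSTED_TAGS = {"PROPN", "VERB"}   # hypothetical tag names (Universal POS style)


def burst_boost(word, counts_now, counts_before):
    """Hypothetical burstiness factor from two Counters of word counts.

    Ratio of the word's count in interval i to its count in the previous
    interval, add-one smoothed and floored at 1 so it never penalises a word.
    """
    return max(1.0, (counts_now[word] + 1) / (counts_before[word] + 1))


def boosted_tfidf(tweet_tokens, corpus, pos_tags, burst):
    """Return a sparse vector {word: weight} for one tweet.

    corpus: list of token lists used for document frequencies (should include the tweet);
    pos_tags: {word: tag}; burst: {word: burst factor for interval i}.
    Multiplying the three factors is an assumption, not stated on the slide.
    """
    n_docs = len(corpus)
    doc_freq = Counter(w for doc in corpus for w in set(doc))
    vector = {}
    for word, tf in Counter(tweet_tokens).items():
        idf = log(n_docs / max(doc_freq[word], 1))
        pos_factor = POS_BOOST if pos_tags.get(word) in BOOSTED_TAGS else 1.0
        vector[word] = tf * idf * pos_factor * burst.get(word, 1.0)
    return vector
```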
Topic ranking • SVM Classifier • Estimates probability that a topic is ‘newsworthy’ • Only use topics with probability > β (0.5) • Training set: 116 manually annotated topics retrieved from the ‘2012 US elections’ training dataset
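The slides say the SVM "estimates the probability" that a topic is newsworthy. An SVM natively outputs a decision value f, and one standard way to turn it into a probability is Platt scaling, which fits a sigmoid P = 1 / (1 + exp(A·f + B)). Whether the authors used exactly this is an assumption; the sketch also assumes a linear SVM whose weights, bias and Platt parameters A, B were learned beforehand (the defaults below are placeholders).

```python
from math import exp

BETA = 0.5  # newsworthiness threshold from the slides


def decision_value(weights, bias, features):
    """Linear SVM decision value w·x + b (weights and bias assumed already trained)."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def platt_probability(decision, a=-1.0, b=0.0):
    """Platt scaling: P(newsworthy | f) = 1 / (1 + exp(A*f + B)).

    A and B are normally fit on held-out data; these defaults are placeholders.
    """
    return 1.0 / (1.0 + exp(a * decision + b))


def rank_topics(topic_features, weights, bias, beta=BETA, a=-1.0, b=0.0):
    """Keep topics with P(newsworthy) > beta, most probable first."""
    ranked = []
    for topic, features in topic_features.items():
        p = platt_probability(decision_value(weights, bias, features), a, b)
        if p > beta:
            ranked.append((topic, p))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```

With a = -1 and b = 0 the sigmoid reduces to the logistic function, so β = 0.5 keeps exactly the topics with a positive decision value.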
Topic ranking • Features • Tweet features • e.g. number of tweets • User features • e.g. % of users with 'news publisher' probability > 0.9 • Topical coherence • e.g. % of tweets containing the most informative word in the cluster • Non-duplicate features • e.g. highest cosine similarity between the cluster and the newsworthy topics detected in previous time intervals
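One example feature from each group on the slide can be computed as follows. The sketch assumes tweets carry sparse `{word: weight}` vectors, counts distinct users for the user feature, and takes the "most informative word" to be the word with the highest weight in the cluster center; the slide does not define that term, so this choice is hypothetical. It returns fractions rather than percentages.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {word: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def topic_features(cluster, publisher_prob, centroid, previous_centroids):
    """One example feature per group from the slide.

    cluster: tweets as dicts with 'user' and 'vector' ({word: weight});
    publisher_prob: {user: news-publisher probability};
    centroid: center of the cluster;
    previous_centroids: centers of newsworthy topics from earlier intervals.
    """
    n_tweets = len(cluster)                                   # tweet feature
    users = {t["user"] for t in cluster}                      # distinct users (assumption)
    frac_publishers = sum(                                    # user feature
        1 for u in users if publisher_prob.get(u, 0.0) > 0.9) / len(users)
    # Hypothetical: "most informative word" = highest weight in the centroid.
    top_word = max(centroid, key=centroid.get)
    frac_top_word = sum(                                      # topical coherence
        1 for t in cluster if top_word in t["vector"]) / n_tweets
    max_prev_sim = max(                                       # non-duplicate feature
        (cosine(centroid, c) for c in previous_centroids), default=0.0)
    return [n_tweets, frac_publishers, frac_top_word, max_prev_sim]
```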
Topic enrichment • Objective: enrich each detected newsworthy topic s • Headline • Split the tweets in cluster s into sentences • Select the sentence with the highest cosine similarity to the center of the cluster • Clean that sentence with a rule-based approach • e.g. removing URLs, emoticons, and the '#' and '@' symbols
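A sketch of the headline step, assuming plain term-count vectors instead of the boosted tf-idf vectors, a naive sentence splitter (break after '.', '!' or '?' followed by whitespace), and a small, hypothetical emoticon pattern. Only the '#' and '@' symbols are removed, so hashtag and mention words stay in the headline.

```python
import re
from collections import Counter
from math import sqrt

URL_RE = re.compile(r"https?://\S+")
EMOTICON_RE = re.compile(r"[:;=][-']?[)(DPp/\\|]")  # hypothetical, deliberately small pattern
SENTENCE_RE = re.compile(r"(?<=[.!?])\s+")           # split after . ! or ? followed by space


def bag_of_words(text):
    """Term-count vector; a stand-in (assumption) for the boosted tf-idf vectors."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Mean of the tweet vectors, i.e. the center of the cluster."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}


def clean_sentence(sentence):
    """Rule-based cleanup: drop URLs and emoticons, strip '#' and '@' symbols."""
    s = URL_RE.sub("", sentence)   # URLs first, so "http:/" is not read as an emoticon
    s = EMOTICON_RE.sub("", s)
    s = s.replace("#", "").replace("@", "")
    return " ".join(s.split())


def select_headline(tweets):
    """Pick the sentence closest to the cluster center, then clean it."""
    center = centroid([bag_of_words(t) for t in tweets])
    sentences = [s for t in tweets for s in SENTENCE_RE.split(t) if s.strip()]
    best = max(sentences, key=lambda s: cosine(bag_of_words(s), center))
    return clean_sentence(best)
```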
Topic enrichment • Keywords • Words in the headline that are among the top 50% most important words of topic s
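The keyword rule is a simple set intersection once the topic's words are ranked. The slides do not say how word importance is measured, so the sketch takes any `{word: score}` mapping (for example, the cluster-center weights) as an assumption, and rounds the top 50% up.

```python
import re
from math import ceil


def keywords(headline, word_importance, fraction=0.5):
    """Headline words that are among the top `fraction` most important words of the topic.

    word_importance: {word: score}; the scoring scheme is an assumption.
    Keeps headline order and the headline's capitalisation, without duplicates.
    """
    ranked = sorted(word_importance, key=word_importance.get, reverse=True)
    top = set(ranked[:ceil(len(ranked) * fraction)])
    result, seen = [], set()
    for word in re.findall(r"[A-Za-z0-9]+", headline):
        key = word.lower()
        if key in top and key not in seen:
            seen.add(key)
            result.append(word)
    return result
```

Splitting on non-alphanumeric characters means a possessive such as "monument's" contributes the token "monument".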
Topic enrichment • Representative tweets • Select all tweets posted during time interval i • Including tweets not posted by news publishers • Discard tweets whose cosine similarity to the center of s is < λ (0.6) • Sort the remaining tweets by their relevance to s • relevance = cosine similarity between the tweet and the topic center, multiplied by user_factor • user_factor = 1.5 if the user is a 'news publisher', 1 otherwise • Remove near-duplicates • two tweets are 'near-duplicates' if their cosine similarity > μ (0.7)
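The representative-tweet steps map onto a filter, a sort and a greedy de-duplication pass. The sketch assumes each tweet is a dict with a hypothetical `id`, a sparse `vector` and a `publisher` flag; λ = 0.6, μ = 0.7 and the 1.5 user factor come from the slide. Comparing each candidate only against tweets already kept, in relevance order, is one reasonable reading of "remove near-duplicates", not necessarily the authors' exact procedure.

```python
from math import sqrt

LAMBDA = 0.6            # minimum similarity to the topic center (slides)
MU = 0.7                # near-duplicate threshold (slides)
PUBLISHER_FACTOR = 1.5  # user_factor for news publishers (slides)


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {word: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def representative_tweets(tweets, center, lam=LAMBDA, mu=MU):
    """Return (id, relevance) pairs, most relevant first, without near-duplicates.

    tweets: all tweets of interval i (not only those by news publishers), as dicts
    with hypothetical keys 'id', 'vector' ({word: weight}) and 'publisher' (bool).
    """
    candidates = []
    for t in tweets:
        sim = cosine(t["vector"], center)
        if sim < lam:
            continue                                    # too far from the topic
        user_factor = PUBLISHER_FACTOR if t["publisher"] else 1.0
        candidates.append((sim * user_factor, t))
    candidates.sort(key=lambda c: c[0], reverse=True)   # most relevant first
    kept = []
    for relevance, t in candidates:
        # Greedy de-duplication: skip t if it is too similar to a tweet already kept.
        if all(cosine(t["vector"], k["vector"]) <= mu for _, k in kept):
            kept.append((relevance, t))
    return [(t["id"], relevance) for relevance, t in kept]
```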
Topic enrichment • Representative pictures • Select all tweets posted during time interval i • Including tweets not posted by news publishers • Discard tweets whose cosine similarity to the center of s is < λ (0.6) • Extract picture URLs from the remaining tweets • Sort picture URLs by the sum of the relevance values of their associated tweets
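The picture step reuses the representative-tweet relevance score (cosine similarity to the topic center times user_factor) but sums it per picture URL. The sketch assumes a hypothetical `picture_urls` list on each tweet, for example filled from the tweet's media links, and counts each URL once per tweet.

```python
from collections import defaultdict
from math import sqrt

LAMBDA = 0.6            # minimum similarity to the topic center (slides)
PUBLISHER_FACTOR = 1.5  # user_factor for news publishers (slides)


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {word: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def representative_pictures(tweets, center, lam=LAMBDA):
    """Rank picture URLs by the summed relevance of the tweets that contain them.

    tweets: dicts with 'vector', 'publisher' (bool) and the hypothetical 'picture_urls'.
    """
    score = defaultdict(float)
    for t in tweets:
        sim = cosine(t["vector"], center)
        if sim < lam:
            continue                                  # too far from the topic
        relevance = sim * (PUBLISHER_FACTOR if t["publisher"] else 1.0)
        for url in set(t["picture_urls"]):            # each URL once per tweet
            score[url] += relevance                   # sum over associated tweets
    return sorted(score, key=score.get, reverse=True)
```

A picture shared by several moderately relevant tweets can therefore outrank one attached to a single highly relevant tweet.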
Tags: Sofia, monument, makeover, provokes • Tweets: • Pro-Ukraine paint job - Sofia monument's latest makeover provokes protest from Russia http://bbc.in/1frf9UN • Kijow w Sofii [Kyiv in Sofia]. RT: @BBCWorld Pro-Ukraine paint job in Sofia provokes protest from Russia http://bbc.in/1frf9UN • Pro-#Ukraine paint job-Sofia monument's latest makeover provokes #protest from R http://bbc.in/1frf9UN via @BBCWorld
Tags: Jubilant, protesters, driving, vehicle, Museum, Parliament • Tweets: • Jubilant protesters driving military vehicle from a Kiev Museum around Parliament building #Kiev #Ukraine • Another #Russia|n armored vehicles spotted in #Sevastopol in #Crimea. #Ukraine http://qn.quotidiano.net/esteri/2014...
Steven.VanCanneyt@intec.ugent.be Questions?