Ranking Tweets Considering Trust and Relevance. Srijith Ravikumar, Raju Balakrishnan, and Subbarao Kambhampati. Arizona State University.
Ranking Tweets Considering Trust and Relevance • Srijith Ravikumar, Raju Balakrishnan, and Subbarao Kambhampati • Arizona State University 1
One of the most prominent micro-blogging services. • Twitter has over 140 million active users, generates over 340 million tweets daily, and handles over 1.6 billion search queries per day. • Users access tweets by following other users and by using the search function. 2
Twitter Search Results for the Query: “Britney Spears” • Sorted in reverse chronological order. • Selects the single most-retweeted tweet as the top tweet. • Does not apply any relevance metrics. • Contains spam and untrustworthy tweets. 3
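TweetRank • [Diagram: the user's query passes through TweetRank to Twitter; Twitter returns the top K results to TweetRank, which returns the top N results to the user] • Acts as a mediator between the user and Twitter. • K is much larger than N, which lets TweetRank eliminate untrustworthy results. 4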
Need for Relevance and Trust • The spread of false facts on Twitter has become an everyday event. • Retweets and users can be bought. • Relying on them as signals of trustworthiness therefore does not work. 5
Getting Relevant & Trustworthy Results • Manual curation is out of the question (unless you are the Government of China :-) ). • How many people would it take to clean up a micro-blog with 140 million active users? • Automated analysis? • PageRank uses the explicit links between web pages to evaluate trust and relevance. But what are the links between tweets? 6
Links in Twitter Space • [Diagram: tweets connected by Agreement and Retweet links] • Retweet: explicit links between tweets. • Agreement: implicit links between tweets that contain the same fact. 7
Agreement • Agreement between two tweets is defined as the amount of similarity in their content. • Retweets are not counted as agreement because they are unverified endorsements. • How does agreement capture relevance and trust? • A tweet that a large number of other tweets agree with is likely to be popular, and popular tweets are more likely to be relevant. • Since agreement excludes retweets, the most agreed-upon tweet has the largest number of independent users asserting the same fact, and is hence more trustworthy. 8
Agreement Computation • For efficient computation of agreement we need to understand the meaning of each tweet. This needs Natural Language Processing. • As a preliminary idea, we compute agreement using Soft TF-IDF with Jaro-Winkler similarity. • Soft TF-IDF is similar to TF-IDF, except that it also counts similar tokens in the two compared document vectors, in addition to exactly matching terms. 9
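To make the Soft TF-IDF idea concrete, here is a minimal Python sketch. It is not the authors' implementation: the whitespace-free token lists, the smoothed IDF `log(1 + N/df)`, the Winkler prefix scale `p = 0.1`, and the closeness threshold `theta = 0.9` are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def jaro(s: str, t: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit = [False] * len(s)
    t_hit = [False] * len(t)
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                break
    m = sum(s_hit)
    if m == 0:
        return 0.0
    # Matched characters, in order of appearance in each string.
    s_seq = [c for c, hit in zip(s, s_hit) if hit]
    t_seq = [c for c, hit in zip(t, t_hit) if hit]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (m / len(s) + m / len(t) + (m - transpositions) / m) / 3


def jaro_winkler(s: str, t: str, p: float = 0.1) -> float:
    """Jaro similarity boosted by the length of the common prefix (max 4)."""
    j = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Unit-length TF-IDF vectors; the smoothed IDF is an assumption."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        raw = {t: math.log(c + 1) * math.log(1 + n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0
        vectors.append({t: w / norm for t, w in raw.items()})
    return vectors


def soft_tfidf(a: Dict[str, float], b: Dict[str, float], theta: float = 0.9) -> float:
    """Like cosine similarity, but each token in `a` is paired with its
    closest token in `b` when their Jaro-Winkler similarity exceeds theta."""
    score = 0.0
    for w, weight_a in a.items():
        best, sim = max(
            ((v, jaro_winkler(w, v)) for v in b),
            key=lambda pair: pair[1],
            default=("", 0.0),
        )
        if sim > theta:
            score += weight_a * b[best] * sim
    return score
```

With this sketch, two tweets that differ only by a misspelled token (for example "obama" versus "obamma") score higher than under plain TF-IDF cosine, because the near-match contributes to the sum. Note that the measure as written is not guaranteed to be symmetric in its two arguments.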
Computing Ranked Results • A simple voting technique is used to compute the ranked results. • The agreement score of a tweet is the sum of its agreement with all other tweets. • The tweets are sorted by agreement score and the top-N results are sent to the user. • [Figure: example agreement graph over three tweets (1, 2, 3); numeric labels shown: 1.3, 1.0, .6, .7, .4, 0.0] 10
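The voting step can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' code: the token-set Jaccard similarity stands in for the Soft TF-IDF agreement measure, the similarity is assumed symmetric, and `rank_by_agreement` is a hypothetical name.

```python
from typing import Callable, List, Sequence, Tuple


def jaccard(a: str, b: str) -> float:
    """Stand-in agreement measure (assumption): token-set Jaccard overlap."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def rank_by_agreement(
    tweets: Sequence[str],
    n: int,
    sim: Callable[[str, str], float] = jaccard,
) -> List[Tuple[str, float]]:
    """Score each tweet by its summed agreement with all other tweets,
    then return the n highest-scoring tweets with their scores."""
    scores = [0.0] * len(tweets)
    for i in range(len(tweets)):
        for j in range(i + 1, len(tweets)):
            s = sim(tweets[i], tweets[j])  # assumed symmetric
            scores[i] += s
            scores[j] += s
    order = sorted(range(len(tweets)), key=lambda i: scores[i], reverse=True)
    return [(tweets[i], scores[i]) for i in order[:n]]
```

In a full pipeline the input would be the top K tweets returned by the search, with retweets removed beforehand, and `n` would be much smaller than K.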
Evaluation - Relevance • Top N results were manually labelled as follows: 12
Evaluation - Trust • Top N results were manually labelled as follows: 13
Ranking Cost • Ranking time increases quadratically with the number of tweets. • Since agreement is computed pairwise, it can be easily parallelized using MapReduce. 14
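The pairwise structure can be illustrated with a small map/reduce-style sketch in Python. It is only an analogy for a MapReduce job, run in a single process: the Jaccard similarity is a stand-in for the agreement measure, and `map_pair`, `reduce_scores`, and `agreement_scores` are hypothetical names. For n tweets there are n(n-1)/2 pairs, which is where the quadratic cost comes from.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

IndexedTweet = Tuple[int, str]


def jaccard(a: str, b: str) -> float:
    """Stand-in agreement measure (assumption): token-set Jaccard overlap."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta and tb else 0.0


def map_pair(pair: Tuple[IndexedTweet, IndexedTweet]) -> List[Tuple[int, float]]:
    """Map step: one independent task per tweet pair, emitting the
    agreement once for each of the two tweets involved."""
    (i, a), (j, b) = pair
    s = jaccard(a, b)
    return [(i, s), (j, s)]


def reduce_scores(emitted: Iterable[List[Tuple[int, float]]]) -> Dict[int, float]:
    """Reduce step: sum the emitted agreements per tweet index."""
    totals: Dict[int, float] = defaultdict(float)
    for records in emitted:
        for index, s in records:
            totals[index] += s
    return dict(totals)


def agreement_scores(tweets: Sequence[str], map_fn: Callable = map) -> Dict[int, float]:
    """Total agreement per tweet; map_fn can be swapped for a parallel map."""
    pairs = combinations(enumerate(tweets), 2)  # n * (n - 1) / 2 pairs
    return reduce_scores(map_fn(map_pair, pairs))


if __name__ == "__main__":
    sample = ["storm hits coast", "storm hits the coast", "new phone released"]
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(agreement_scores(sample, map_fn=pool.map))
```

Because each pair is scored independently, the map step has no shared state, and only the per-tweet summation needs to bring results together. The thread pool here only demonstrates that the map is pluggable; a real deployment would distribute the pairs across machines.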
Twitter Eco-System • [Diagram labels: Tweeted URL, Tweeted By, Followers, Hyperlinks] 15
Summary • Micro-blog spamming is becoming increasingly lucrative and problematic. • We are working on a ranking that is sensitive to the trustworthiness and relevance of micro-blogs. • We model the tweet space as a tri-layer graph containing a tweet layer, a user layer, and a web-page layer. • Ranking is derived from users, tweets, and the prestige of the referenced web pages. 16
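One way to picture the tri-layer graph is as a small container holding the three node layers and the edge types named on the slides (tweeted by, tweeted URL, followers, hyperlinks). The Python sketch below is purely illustrative: the class, its field layout, and its method names are assumptions, and it does not implement the ranking itself, which the slides do not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class TriLayerGraph:
    """Hypothetical container for the tweet, user, and web-page layers."""
    tweeted_by: Dict[str, str] = field(default_factory=dict)        # tweet id -> user id
    tweeted_url: Dict[str, Set[str]] = field(default_factory=dict)  # tweet id -> URLs it cites
    followers: Dict[str, Set[str]] = field(default_factory=dict)    # user id -> follower ids
    hyperlinks: Dict[str, Set[str]] = field(default_factory=dict)   # page URL -> pages it links to

    def add_tweet(self, tweet_id: str, user_id: str, urls: Iterable[str] = ()) -> None:
        """Connect a tweet to its author (user layer) and cited pages (web layer)."""
        self.tweeted_by[tweet_id] = user_id
        self.tweeted_url[tweet_id] = set(urls)

    def add_follow(self, follower: str, followee: str) -> None:
        """Record a follower edge within the user layer."""
        self.followers.setdefault(followee, set()).add(follower)

    def add_hyperlink(self, source: str, target: str) -> None:
        """Record a hyperlink edge within the web-page layer."""
        self.hyperlinks.setdefault(source, set()).add(target)

    def tweets_of(self, user_id: str) -> List[str]:
        """All tweet ids authored by the given user."""
        return [t for t, u in self.tweeted_by.items() if u == user_id]

    def inlinks(self, url: str) -> Set[str]:
        """Pages that link to the given URL."""
        return {src for src, targets in self.hyperlinks.items() if url in targets}
```

A ranking over such a structure could then combine a score per layer, for example follower-based prestige for users and link-based prestige for pages, but how the layers are weighted is left open here.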