Topical search in Twitter • Complex Network Research Group, Department of CSE, IIT Kharagpur
Topical search on Twitter • Twitter has emerged as an important source of information & real-time news • Most common search in Twitter: search for trending topics and breaking news • Topical search • Identifying topical attributes / expertise of users • Searching for topical experts • Searching for information on specific topics
Prior approaches to find topic experts • Research studies • Pal et al. (WSDM 2011) use 15 features from tweets and the network to identify topical experts • Weng et al. (WSDM 2010) use an ML approach • Application systems • Twitter Who To Follow (WTF), Wefollow, … • Methodology not fully public, but reported to utilize several features
Prior approaches use features extracted from • User profiles: screen-name, bio, … • Tweets posted by a user: hashtags, others retweeting a given user, … • Social graph of a user: #followers, PageRank, …
Problems with prior approaches • User profiles (screen-name, bio, …) • Bio often does not give meaningful information • Information in user profiles is mostly unvetted • Tweets posted by a user • Tweets mostly contain day-to-day conversation • Social graph of a user (#followers, PageRank) • Does not provide topical information
We propose … • Use a different way to infer topics of expertise for an individual Twitter user • Utilize social annotations • How does the Twitter crowd describe a user? • Social annotations obtained through Twitter Lists • Approach essentially relies on crowdsourcing
Twitter Lists • A feature used to organize the people one is following on Twitter • Create a named list, add an optional List description • Add related users to the List • Tweets posted by these users will be grouped together as a separate stream
Using Lists to infer topics for users • If U is an expert / authority in a certain topic • U likely to be included in several Lists • List names / descriptions provide valuable semantic cues to the topics of expertise of U
Dataset • Collected Lists of 55 million Twitter users who joined before or in 2009 • 88 million Lists collected in total • All studies consider 1.3 million users who are included in 10 or more Lists • Most List names / descriptions in English, but significant fraction also in French, Portuguese, …
Mining Lists to infer expertise • Collect Lists containing a given user U • List names / descriptions collected into a ‘document’ for the given user • Identify U’s topics from the document • Handle CamelCase words, case-folding • Ignore domain-specific stopwords • Identify nouns and adjectives • Unify similar words based on edit-distance, e.g., journalists and jornalistas, politicians and politicos (not unified by stemming)
Mining Lists to infer expertise • Unigrams and bigrams considered as topics • Result: Topics for U along with their frequencies in the document
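The mining steps on the two slides above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the stopword set and the function names `tokenize` and `infer_topics` are hypothetical, and the part-of-speech filtering and edit-distance unification steps are left out for brevity.

```python
import re
from collections import Counter

# Hypothetical stopword set; the slides mention domain-specific stopwords
# but do not list them.
STOPWORDS = {"twitter", "list", "people", "follow", "my", "the", "and", "of"}


def tokenize(text):
    """Split CamelCase words, case-fold, and return alphabetic tokens."""
    # Insert a space at every lowercase-to-uppercase boundary: "TechNews" -> "Tech News".
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)
    return re.findall(r"[a-z]+", spaced.lower())


def infer_topics(list_texts):
    """Build the 'document' for one user and count unigram and bigram topics."""
    counts = Counter()
    for text in list_texts:
        tokens = tokenize(text)
        # Unigrams, ignoring stopwords.
        counts.update(t for t in tokens if t not in STOPWORDS)
        # Bigrams of adjacent tokens, skipping any pair that contains a stopword.
        counts.update(
            f"{a} {b}"
            for a, b in zip(tokens, tokens[1:])
            if a not in STOPWORDS and b not in STOPWORDS
        )
    return counts


# Names / descriptions of Lists that include a (made-up) user U.
topics = infer_topics(["TechNews", "Linux people", "tech and linux", "OpenSource Developers"])
print(topics.most_common(5))
```

The result maps each candidate topic to its frequency in the user's document, which is the form the ranking step expects.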
Topics inferred from Lists • politics, senator, congress, government, republicans, Iowa, gop, conservative • politics, senate, government, congress, democrats, Missouri, progressive, women • celebs, actors, famous, movies, comedy, funny, music, hollywood, pop culture • linux, tech, open, software, libre, gnu, computer, developer, ubuntu, unix
Lists vs. other features • Profile bio • Most common words from tweets: love, daily, people, time, GUI, movie, video, life, happy, game, cool • Most common words from Lists: celeb, actor, famous, movie, stars, comedy, music, Hollywood, pop culture
Lists vs. other features • Profile bio • Most common words from tweets: Fallon, happy, love, fun, video, song, game, hope, #fjoln, #fallonmono • Most common words from Lists: celeb, funny, humor, music, movies, laugh, comics, television, entertainers
Who-is-who service • Developed a Who-is-Who service for Twitter • Shows word-cloud for major topics for a user • http://twitter-app.mpi-sws.org/who-is-who/ • Inferring Who-is-who in the Twitter Social Network, WOSN 2012 (Highest rated paper in workshop)
Topical experts in Twitter • 400 million tweets posted daily • Quality of tweets posted by different users varies widely • News, pointless babble, conversational tweets, spam, … • Challenge: to find topical experts • Sources of authoritative information on specific topics
Basic methodology • Given a query (topic) • Identify experts on the topic using Lists • Discussed earlier • Rank identified experts w.r.t. given topic • Need ranking algorithm • Additional challenge: keeping the system up-to-date in face of thousands of users joining Twitter daily
Ranking experts • Used a ranking scheme solely based on Lists • Two components of ranking user U w.r.t. query Q • Relevance of user to query – cover density ranking between topic document T_U of user and Q • Popularity of user – number of Lists including the user • Cover Density ranking preferred for short queries • Ranking score: Topic relevance(T_U, Q) × log(#Lists including U)
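The ranking score on this slide can be illustrated with a short sketch. Note the assumption: cover density ranking itself is not reproduced here. The `relevance` function below is a simplified stand-in (fraction of query terms found in the topic document), and all names and sample data are hypothetical.

```python
import math


def relevance(topic_doc, query):
    """Stand-in for cover density ranking: fraction of query terms present in T_U."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    return sum(1 for t in terms if topic_doc.get(t, 0) > 0) / len(terms)


def score(topic_doc, num_lists, query):
    """Topic relevance(T_U, Q) x log(#Lists including U)."""
    if num_lists <= 0:
        return 0.0
    return relevance(topic_doc, query) * math.log(num_lists)


def rank_experts(users, query):
    """users maps a user name to (topic document, number of Lists including the user)."""
    scored = [(score(doc, n, query), name) for name, (doc, n) in users.items()]
    # Highest score first; users with zero relevance are dropped.
    return [name for s, name in sorted(scored, reverse=True) if s > 0]


users = {
    "alice": ({"stem": 5, "cell": 4}, 1000),
    "bob": ({"stem": 1}, 5000),
    "carol": ({"music": 9}, 100000),
}
print(rank_experts(users, "stem cell"))
```

The logarithm damps popularity, so a heavily listed but off-topic account (carol) cannot outrank a well-matched one.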
Cognos • Search system for topical experts in Twitter • Publicly deployed at http://twitter-app.mpi-sws.org/whom-to-follow/ • Cognos: Crowdsourcing Search for Topic Experts in Microblogs, ACM SIGIR 2012
Cognos results for “stem cell”
Evaluation of Cognos - 1 • Competes favorably with prior research attempts to identify topical experts (Pal et al. [WSDM 2011])
Evaluation of Cognos – 2 • Cognos compared with Twitter WTF • Evaluator shown top 10 results by both systems • Result-sets anonymized • Evaluator judges which is better / both good / both bad • Queries chosen by evaluators themselves • 27 distinct queries were asked at least twice • In total, asked 93 times • Judgment by majority voting
Cognos vs Twitter WTF • Cognos judged better on 12 queries • Computer science, Linux, mac, Apple, ipad, India, internet, windows phone, photography, political journalist • Twitter WTF judged better on 11 queries • Music, Sachin Tendulkar, Angelina Jolie, Harry Potter, metallica, cloud computing, IIT Kharagpur • Mostly names of individuals or organizations • Tie on 4 queries • Microsoft, Dell, Kolkata, Sanskrit as an official language
Cognos vs Twitter WTF • Low overlap between top 10 results … • In spite of same topic being inferred for 83% of experts • Major differences are due to List-based ranking • Top Twitter WTF results – mostly business accounts • Top Cognos results – mostly personal accounts
Keeping system up-to-date • Any search / recommendation system on OSN platform needs to be kept up-to-date • Thousands of new users join every day • Need efficient way of discovering topical experts • Can brute force approach be used? • Periodically crawl data (profile, Lists) of all users
Scalability problem • 200 million new users joined Twitter during 9 months in 2011 • 740K new users join daily • Lower-bound estimate: 1480K API calls per day required to crawl their profiles and Lists • Twitter allows only 3.6K API calls per day per IP • 480K API calls per day from whitelisted IP • Plus, 465 million users already
How many experts in Twitter? • Only 1% listed 10 or more times • Only 0.12% listed 100 or more times • If experts can be identified efficiently, possible to crawl their Lists
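The arithmetic behind these figures can be checked with a few lines. Two assumptions are made here that the slide does not state explicitly: nine months is taken as 270 days, and the lower bound is read as two API calls per new user (one for the profile, one for the Lists), which is consistent with 1480K being twice 740K.

```python
import math

NEW_USERS = 200_000_000          # new users during 9 months in 2011 (from the slide)
DAYS = 9 * 30                    # assumption: 9 months taken as 270 days
CALLS_PER_USER = 2               # assumption: one call for the profile, one for the Lists
LIMIT_PER_IP = 3_600             # API calls per day per ordinary IP (from the slide)
LIMIT_WHITELISTED_IP = 480_000   # API calls per day per whitelisted IP (from the slide)

new_users_per_day = NEW_USERS / DAYS            # roughly 740K, as quoted on the slide
calls_per_day = 740_000 * CALLS_PER_USER        # the 1480K lower bound

# Number of IPs a brute-force crawler would need just to keep up with new users.
ips_needed = math.ceil(calls_per_day / LIMIT_PER_IP)
whitelisted_ips_needed = math.ceil(calls_per_day / LIMIT_WHITELISTED_IP)

print(f"{new_users_per_day:,.0f} new users per day")
print(f"{calls_per_day:,} API calls per day")
print(f"{ips_needed} ordinary IPs or {whitelisted_ips_needed} whitelisted IPs")
```

Even before counting the 465 million existing users, keeping up with new accounts alone would take several hundred ordinary IPs, which is why a brute-force crawl is impractical.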
Identifying experts efficiently • Hubs – users who follow many experts and add them to Lists • Identified top hubs in social network using HITS • Crawled Lists created by top 1 million hubs • Top 1M hubs listed 4.1M users • 2.06M users included in 10 or more Lists (50%) • Discovered 65% of the estimated number of experts listed 100 or more times
Identifying experts efficiently • More than 42% of the users listed by top hubs have joined Twitter after 2009 • Discovered several popular experts who joined within the duration of the crawl • All experts reported by Pal et al. discovered • Discovered all Twitter WTF top 20 results for 50% of the queries, 15 or more for 80% of the queries
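As a rough illustration of the hub-finding step, the sketch below runs the standard HITS power iteration on a toy follow graph. It is a generic HITS implementation, not the authors' pipeline. The edge format `(follower, followee)`, the function names, and the sample graph are all assumptions.

```python
def hits(edges, iterations=50):
    """Plain HITS power iteration over directed edges (follower, followee)."""
    nodes = {n for edge in edges for n in edge}
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority score: sum of hub scores of the users pointing at this node.
        auth = dict.fromkeys(nodes, 0.0)
        for src, dst in edges:
            auth[dst] += hub[src]
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub score: sum of authority scores of the users this node points at.
        new_hub = dict.fromkeys(nodes, 0.0)
        for src, dst in edges:
            new_hub[src] += auth[dst]
        norm = sum(v * v for v in new_hub.values()) ** 0.5 or 1.0
        hub = {n: v / norm for n, v in new_hub.items()}
    return hub, auth


def top_hubs(edges, k):
    """Return the k nodes with the highest hub scores."""
    hub, _ = hits(edges)
    return sorted(hub, key=hub.get, reverse=True)[:k]


# Toy graph: h1 follows three experts, h2 follows two, u follows one.
edges = [("h1", "e1"), ("h1", "e2"), ("h1", "e3"),
         ("h2", "e1"), ("h2", "e2"), ("u", "e3")]
hub, auth = hits(edges)
print(top_hubs(edges, 2))
```

Once the top hubs are known, only the Lists they created need to be crawled, which is far cheaper than crawling every account.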
Looking for Tweets by Topic • Services today are limited to keyword search • Knowing which keywords to search for is itself an issue • Keyword search is not context aware • Tweets are too short to deduce topics • Topic analysis of 400M tweets/day is a challenge
Challenges • Some tweets are more important than others • Millions of tweets are posted on popular topics • Only some are relevant to the context intended • Tweets may contain wrong or misleading info • Twitter has a large population of spammers • Twitter is also a potent source of rumors • Some tweets are outright malicious
Our Approach to the Issues • Scalability • We only look at tweets from a small subset of users who are experts on different topics • Topic deduction • We map user expertise topics to tweets/hashtags, instead of the other way round • Trustworthiness • Our source of tweets is a small subset of users • It is practical to vet their expertise and reputation
Advantages of list-based methodology • 600K experts on 36K distinct topics
Topical diversity of expert sample (CSCW’14)
Challenges in Used Approach • We assign topics to tweets/hashtags • Inferring tweet topics from tweeter expertise • Experts can have multiple topics of expertise • Experts do tweet about topics beyond their expertise • Solution: If multiple experts on a subject tweet about something, it is most likely related to the topic.
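The proposed solution can be sketched as a simple voting rule: a hashtag receives a topic only when enough distinct experts on that topic have used it. The threshold `min_experts`, the function name, and the sample data below are hypothetical; the slides do not give the actual rule or cut-off.

```python
from collections import defaultdict


def infer_hashtag_topics(expert_topics, tweets, min_experts=2):
    """Assign a topic to a hashtag when at least min_experts experts on it used the hashtag.

    expert_topics maps an expert to a set of topics of expertise;
    tweets is an iterable of (expert, hashtag) pairs.
    """
    supporters = defaultdict(set)  # (hashtag, topic) -> distinct experts
    for expert, hashtag in tweets:
        for topic in expert_topics.get(expert, ()):
            supporters[(hashtag, topic)].add(expert)

    result = defaultdict(set)
    for (hashtag, topic), experts in supporters.items():
        if len(experts) >= min_experts:
            result[hashtag].add(topic)
    return dict(result)


expert_topics = {"a": {"politics"}, "b": {"politics", "music"}, "c": {"music"}}
tweets = [("a", "#election"), ("b", "#election"),
          ("b", "#grammys"), ("c", "#grammys"),
          ("a", "#lunch")]
print(infer_hashtag_topics(expert_topics, tweets))
```

In this toy data, "#lunch" gets no topic because only one expert used it, which is how off-topic tweets from a single expert are filtered out.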
Sampling Tweets from Experts • We capture all tweets from 585K topical experts • This is a set we obtained from our previous study • This is about 0.1% of the whole Twitter population • The experts generate 1.46 million tweets per day • This is 0.268% of all tweets on Twitter • Expertise in diverse topics (36K) • Our topics of expertise are crowdsourced • We will have more topics as more users show interest
Methodology at a Glance • Given a topic, we gather tweets from experts • We use hashtags to represent subjects • Clustering tweets by similar hashtags • A cluster represents information on related subjects • Ranking clusters by popularity • Number of unique experts tweeting on the subject • Number of unique tweets on the subject • Ranking tweets by authority • Tweets from the highest ranked user are shown first
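A minimal sketch of this pipeline follows, under several assumptions: tweets are grouped by exact (case-folded) hashtag rather than by hashtag similarity, clusters are ordered by the pair (unique experts, unique tweets), and the authority scores are supplied by the caller. The data layout and all function names are hypothetical.

```python
from collections import defaultdict


def cluster_by_hashtag(tweets):
    """Group tweets by case-folded hashtag (a simplification of 'similar hashtags')."""
    clusters = defaultdict(list)
    for tweet in tweets:
        for tag in tweet["hashtags"]:
            clusters[tag.lower()].append(tweet)
    return clusters


def rank_clusters(clusters):
    """Order hashtags by number of unique experts, then number of unique tweets."""
    def popularity(item):
        _, members = item
        experts = {t["user"] for t in members}
        texts = {t["text"] for t in members}
        return (len(experts), len(texts))

    return [tag for tag, _ in sorted(clusters.items(), key=popularity, reverse=True)]


def representative(members, authority):
    """Pick the tweet whose author has the highest (caller-supplied) authority score."""
    return max(members, key=lambda t: authority.get(t["user"], 0.0))


tweets = [
    {"user": "a", "text": "t1", "hashtags": ["#Election"]},
    {"user": "b", "text": "t2", "hashtags": ["#election"]},
    {"user": "a", "text": "t3", "hashtags": ["#budget"]},
]
clusters = cluster_by_hashtag(tweets)
ranking = rank_clusters(clusters)
print(ranking)
print(representative(clusters["#election"], {"a": 1.0, "b": 2.0}))
```

Counting unique experts before unique tweets keeps one prolific account from pushing its own hashtag to the top.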
What-is-happening on Twitter • twitter-app.mpi-sws.org/what-is-happening/ • Topical search in Microblogs with Cognoscenti, Or: The Wisdom of Crowdsourced Experts
Results for the last week on Politics (a popular topic)
Related tweets are grouped together by common hashtags. The number of experts tweeting on the subject and the number of tweets on the subject decide the ranking. The most popular tweet from the most authoritative user represents the group.