Expertise networks in online communities: structure and algorithms

Expertise networks in online communities: structure and algorithms Jun Zhang, Mark Ackerman, Lada Adamic School of Information, University of Michigan International Symposium on Self-Organizing Online Communities March 31st, 2007

motivation • lots of people are turning to question-answer forums for help • automatically infer the expertise of participants • expertise could be used to rank answers, or recommend posts one could reply to methods • empirical evaluation of ranking algorithms • social network analysis • simulation • understand underlying dynamics • predict performance of ranking algorithms in communities with yet-unobserved dynamics

related work • Netscan (Marc Smith & co) • Robert Kraut: commitment & online community • Virtual communities (Barry Wellman) • using link-based ranking algorithms to evaluate expertise in email networks (Dom et al.) (image credit: Danyel Fisher)

Can we automatically infer expertise? • We use PageRank, HITS, ask/reply ratios, etc. to try to automatically infer the expertise of the users • Human raters read the posts made by users • In the online JavaForum, ask/reply ratio outperforms PageRank… • Develop simulations: • distribution of expertise (skewed) • who asks questions most often? (novices) • who answers questions? • 1. best expert most likely • 2. someone a bit more expert

Constructing a community expertise network • Thread 1: "Large Data, binary search or hashtable?" posted by user A, with replies from user B and user C • Thread 2: "Binary file with ASCII data" posted by user A, with a reply from user C • Edge weighting options for the edge from A to C: • unweighted: 1 • weighted by # threads: 2 • weighted by shared credit: 1 + 1/2 • weighted with backflow: 0.9 forward, 0.1 back [Figure: the A, B, C network drawn under each weighting scheme]
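The weighting schemes on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_network`, the scheme labels, and the rule that shared credit splits one unit equally among the distinct repliers of a thread are my reading of the slide, not the authors' code.

```python
from collections import defaultdict

# The two threads from the slide, as (asker, [repliers]).
threads = [
    ("A", ["B", "C"]),  # Thread 1: Large Data, binary search or hashtable?
    ("A", ["C"]),       # Thread 2: Binary file with ASCII data
]

def build_network(threads, scheme="unweighted", backflow=0.1):
    """Return {(asker, replier): weight}. Edges point from asker to replier.

    Scheme names and the backflow default are hypothetical labels chosen
    to mirror the slide.
    """
    edges = defaultdict(float)
    for asker, repliers in threads:
        unique = sorted(set(repliers) - {asker})  # ignore self-replies
        for replier in unique:
            if scheme == "unweighted":
                edges[(asker, replier)] = 1.0
            elif scheme == "threads":
                # one unit per thread in which the replier helped the asker
                edges[(asker, replier)] += 1.0
            elif scheme == "shared":
                # assumption: one unit of credit per thread, split equally
                edges[(asker, replier)] += 1.0 / len(unique)
            elif scheme == "backflow":
                # most weight flows to the replier, a little flows back
                fwd, back = (asker, replier), (replier, asker)
                edges[fwd] = max(edges[fwd], 1.0 - backflow)
                edges[back] = max(edges[back], backflow)
            else:
                raise ValueError(f"unknown scheme: {scheme}")
    return dict(edges)

for name in ("unweighted", "threads", "shared", "backflow"):
    print(name, build_network(threads, name))
```

For the edge from A to C this gives 1 (unweighted), 2 (by threads), and 1 + 1/2 (shared credit), matching the numbers on the slide.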

JavaForum • 87 sub-forums • 1,438,053 messages • community expertise network constructed: • 196,191 users • 796,270 edges • Observations • More than 55% of users usually only ask questions, while about 25% of users answer questions. • Many questions are answered by a few advanced users, while the majority of users answer only a few. • Top repliers answer questions for everyone. • However, less expert users tend to answer questions from others with lower expertise levels.

Uneven participation • ‘answer people’ may reply to thousands of others • ‘question people’ are also uneven in the number of repliers to their posts, but to a lesser extent [Figure: cumulative probability vs. degree (k) on log-log axes, for the number of people one replied to and the number of people one received replies from; fit α = 1.87, R² = 0.9730]
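A cumulative degree distribution like the one plotted on this slide can be computed directly from a list of degrees. The sketch below is a generic illustration; the helper name `cumulative_distribution` and the sample degrees are made up, and no curve fitting is attempted.

```python
from collections import Counter

def cumulative_distribution(degrees):
    """Return sorted (k, P(K >= k)) pairs for the observed degrees."""
    counts = Counter(degrees)
    total = len(degrees)
    remaining = total  # number of users with degree >= current k
    points = []
    for k in sorted(counts):
        points.append((k, remaining / total))
        remaining -= counts[k]
    return points

# Hypothetical degrees: most users helped one person, one helped many.
sample = [1, 1, 2, 4]
print(cumulative_distribution(sample))
```

Plotting these pairs on log-log axes gives the kind of curve described above; a heavy tail shows up as a roughly straight line.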

Not everyone asks/replies • The Web is a bow tie • The Java Forum network is an uneven bow tie • Core: a strongly connected component in which everyone asks and answers • IN: mostly askers • OUT: mostly helpers
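A bow-tie decomposition of an asker-to-replier network can be sketched as follows. This is a minimal illustration that assumes edges point from asker to replier and that the core is the largest strongly connected component; the function names and the toy edge list are hypothetical, and the simple reachability approach is meant for small graphs only.

```python
from collections import defaultdict

def reachable(start, adj):
    """Return the set of nodes reachable from start (including start)."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def bow_tie(edges):
    """Split nodes into core, in, out, and other for (asker, replier) edges."""
    fwd, bwd = defaultdict(set), defaultdict(set)
    nodes = set()
    for asker, replier in edges:
        fwd[asker].add(replier)
        bwd[replier].add(asker)
        nodes.update((asker, replier))
    # Largest strongly connected component: nodes that reach each other.
    core, unassigned = set(), set(nodes)
    while unassigned:
        node = unassigned.pop()
        scc = reachable(node, fwd) & reachable(node, bwd)
        unassigned -= scc
        if len(scc) > len(core):
            core = scc
    if not core:
        return {"core": set(), "in": set(), "out": set(), "other": set()}
    seed = next(iter(core))
    out_part = reachable(seed, fwd) - core  # helpers reached from the core
    in_part = reachable(seed, bwd) - core   # askers who reach the core
    other = nodes - core - in_part - out_part
    return {"core": core, "in": in_part, "out": out_part, "other": other}

toy_edges = [("n1", "a"), ("a", "b"), ("b", "a"), ("b", "e1"), ("x", "y")]
print(bow_tie(toy_edges))
```

With this edge direction, IN collects users who only ask and OUT collects users who only answer, which matches the labels on the slide.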

relating network structure to Java expertise • Human-rated expertise levels • 2 raters • 135 JavaForum users with >= 10 posts • inter-rater agreement (τ = 0.74, ρ = 0.83) • for evaluation of algorithms, omit users where raters disagreed by more than 1 level (τ = 0.80, ρ = 0.83)
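The two agreement statistics used in this deck, Kendall's τ and Spearman's ρ, can be computed with the standard library alone. The sketch below is an assumption-laden illustration: it uses the simple tau-a form (tied pairs count as neither concordant nor discordant) and Spearman's ρ as a Pearson correlation on average ranks. The slides do not say which tie handling was used, and the rater data here is invented.

```python
from itertools import combinations
from math import sqrt

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(x, y)), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

def average_ranks(values):
    """Rank values from 1, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)

# Hypothetical expertise levels assigned by two raters to five users.
rater_1 = [1, 2, 3, 4, 5]
rater_2 = [1, 3, 2, 4, 5]
print(kendall_tau_a(rater_1, rater_2), spearman_rho(rater_1, rater_2))
```

The same two functions can compare an automated ranking against the human ratings, which is how the metrics are evaluated later in the deck.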

Structural Info Based Expertise Ranking Metrics • # replies posted (# answers): experts can answer many questions • # people replied to (indegree): experts can answer questions from many different people • z-score for the 2 above, (observed − μ)/σ: experts are above the mean in the above two metrics • PageRank (replying to people who reply to people): higher level experts can answer mid-level experts • HITS (experts answer questions by people whose questions other experts have answered): hubs point to good authorities
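The metrics listed on this slide can be sketched in plain Python on a list of (asker, replier) reply events. Everything here is illustrative: the function names, the toy data, the damping factor of 0.85, the uniform treatment of dangling nodes, and the fixed iteration counts are textbook defaults I am assuming, not settings taken from the talk.

```python
from math import sqrt

def reply_counts(replies):
    """Return (# answers, indegree) per user; indegree counts distinct askers."""
    users = sorted({u for pair in replies for u in pair})
    answers = {u: 0 for u in users}
    askers = {u: set() for u in users}
    for asker, replier in replies:
        answers[replier] += 1
        askers[replier].add(asker)
    return answers, {u: len(askers[u]) for u in users}

def z_scores(values):
    """Map each user to (observed - mean) / standard deviation."""
    mean = sum(values.values()) / len(values)
    std = sqrt(sum((v - mean) ** 2 for v in values.values()) / len(values))
    return {u: (v - mean) / std if std else 0.0 for u, v in values.items()}

def pagerank(replies, damping=0.85, iterations=100):
    """Power-iteration PageRank on asker -> replier edges."""
    nodes = sorted({u for pair in replies for u in pair})
    out = {n: set() for n in nodes}
    for asker, replier in replies:
        out[asker].add(replier)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # rank held by users with no out-edges is spread uniformly
        dangling = sum(rank[n] for n in nodes if not out[n])
        base = (1 - damping + damping * dangling) / len(nodes)
        new = {n: base for n in nodes}
        for n in nodes:
            for m in out[n]:
                new[m] += damping * rank[n] / len(out[n])
        rank = new
    return rank

def hits_authority(replies, iterations=100):
    """Authority scores from the HITS hub/authority iteration."""
    edges = set(replies)
    nodes = sorted({u for pair in edges for u in pair})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        auth = {n: 0.0 for n in nodes}
        for asker, replier in edges:
            auth[replier] += hub[asker]
        norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        hub = {n: 0.0 for n in nodes}
        for asker, replier in edges:
            hub[asker] += auth[replier]
        norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return auth

# Hypothetical forum: novices n1..n3, a mid-level user m, an expert e.
replies = [("n1", "m"), ("n2", "m"), ("m", "e"),
           ("n1", "e"), ("n2", "e"), ("n3", "e")]
answers, indegree = reply_counts(replies)
z_answers = z_scores(answers)
pr = pagerank(replies)
auth = hits_authority(replies)
for user in sorted(answers):
    print(user, answers[user], indegree[user], round(z_answers[user], 2),
          round(pr[user], 3), round(auth[user], 3))
```

In this toy network every metric ranks e above m above the novices; the interesting cases in the talk are the ones where the local counts and the network-wide scores disagree.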

automated vs. human ratings [Figure: automated ranking plotted against human rating, one panel each for # answers, indegree, z # answers, z indegree, PageRank, and HITS authority]

JavaForum empirical evaluation of ranking algorithms • simple local measures do as well as (and better than) measures incorporating the wider network topology [Figure: Kendall’s τ, Spearman’s ρ, and Top K scores on a 0 to 0.9 scale for # answers, z-score # answers, indegree, z-score indegree, PageRank, and HITS authority]

Modeling community structure to explain algorithm performance

simulating probability of expertise pairing • suppose: expertise is uniformly distributed • probability of posing a question is inversely proportional to expertise • p_ij = probability that a user with expertise j replies to a user with expertise i • 2 models (j > i): ‘best’ preferred and ‘just better’ preferred
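The two reply models can be explored with a small simulation. The slide's formulas for p_ij did not survive in this transcript, so the weights below are placeholders I am assuming purely for illustration: exp(j − i) for ‘best’ preferred and exp(−(j − i)) for ‘just better’, both restricted to j > i. The function names, the five expertise levels, and the population sizes are also hypothetical.

```python
import random
from math import exp

def simulate(model, n_users=200, n_questions=2000, levels=5, seed=0):
    """Return (expertise list, [(asker, replier), ...]) for one model."""
    rng = random.Random(seed)
    users = list(range(n_users))
    # expertise is uniformly distributed over the levels
    expertise = [rng.randint(1, levels) for _ in users]
    # probability of posing a question is inversely proportional to expertise
    ask_weights = [1.0 / e for e in expertise]
    edges = []
    for _ in range(n_questions):
        asker = rng.choices(users, weights=ask_weights)[0]
        i = expertise[asker]
        candidates = [u for u in users if expertise[u] > i]  # require j > i
        if not candidates:
            continue  # nobody is more expert, so the question goes unanswered
        if model == "best":
            weights = [exp(expertise[u] - i) for u in candidates]
        elif model == "just_better":
            weights = [exp(-(expertise[u] - i)) for u in candidates]
        else:
            raise ValueError(f"unknown model: {model}")
        replier = rng.choices(candidates, weights=weights)[0]
        edges.append((asker, replier))
    return expertise, edges

def mean_gap(expertise, edges):
    """Average expertise difference between replier and asker."""
    return sum(expertise[r] - expertise[a] for a, r in edges) / len(edges)

for model in ("best", "just_better"):
    levels_by_user, pairs = simulate(model)
    print(model, round(mean_gap(levels_by_user, pairs), 2))
```

Under these assumed weights, the ‘best’ model produces a larger average expertise gap between asker and replier than the ‘just better’ model; the resulting edge lists can be fed to any of the ranking metrics to compare how they behave under each dynamic.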

visualization [Figure: simulated networks for the ‘best preferred’ and ‘just better’ models]

degree correlation profiles • degree-degree correlations between asker and helper [Figure: three panels plotted against asker indegree, including best preferred (simulation) and just better (simulation)]

It can tell us when to use which algorithms [Figure: panels for preferred helper ‘best available’ and preferred helper ‘just better’]

Different ranking algorithms perform differently In the ‘just better’ model, a node is correctly ranked by PageRank but not by HITS

simplest models do not capture all ‘local’ interactions

Summary • Expertise networks have interesting characteristics • A set of useful metrics • Simulation as an analysis tool • There are rich design opportunities • Find experts with the help of structural information (and content analysis) • Predict good answers • Re-order questions/answers to match expertise: questions posed by experts wait an average of 9 hours for the first reply, while novice questions are answered in 40 minutes • working paper: “Expertise-Level based Interface Personalization for Online Help-seeking Communities”

Future Work • Looking at diverse sets of question-answer forums (Yahoo Answers) • Expertise across different topics • Using explicit ratings for evaluation of automated expertise identification & incorporation into algorithms (battling spam) • Changes in users’ expertise over time • Developing applications, e.g. recommender engines for questions [Figure labels: beauty & style, cars & transportation, hair, maintenance & repairs]

for more info • ExpertiseRank algorithms and evaluations Zhang, J., Ackerman, M.S., Adamic, L., Expertise Networks in Online Communities: Structure and Algorithms, WWW’07 • Simulations of expertise networks Zhang, J., Ackerman, M.S., Adamic, L., CommunityNetSimulator: Using Simulations to Study Online Community Network Formation and Implications, C&T2007 Jun Zhang junzh@umich.edu http://www-personal.si.umich.edu/~junzh Mark Ackerman ackerm@eecs.umich.edu http://www.eecs.umich.edu/~ackerm/ Lada Adamic ladamic@umich.edu http://www-personal.umich.edu/~ladamic
