You Are Who You Know: Inferring User Profiles in Online Social Networks. Third ACM International Conference on WSDM 2010. Alan Mislove, Bimal Viswanath, Krishna P. Gummadi, Peter Druschel. MPI-SWS, Rice University, Northeastern University
You Are Who You Know: Inferring User Profiles in Online Social Networks. Third ACM International Conference on WSDM 2010. Alan Mislove, Bimal Viswanath, Krishna P. Gummadi, Peter Druschel. MPI-SWS (Saarbrücken, Germany), Rice University (Houston, TX, USA), Northeastern University (Boston, MA, USA)
Outline • Introduction • Approach • Result • My thought
Facebook and personal data • Users upload information to sites like Facebook: profile information, status updates, photos, videos
Inferring User Profiles in Online Social Networks • Given attributes for some users, we can infer the attributes of the remaining users. • For example, school: NCCU.
Facebook and personal data (cont.) • Privacy model for data: choose what to reveal and what to keep private • When reasoning about privacy, we don't often consider implicit data: what our friends reveal about ourselves • What is implicit data? Exploiting homophily: people associate with others who share common attributes
Roadmap • Idea: Use communities to infer attributes • Collect fine-grained community data • Do attribute-based communities exist? • How well can we infer user attributes?
Idea: Use communities • Observation: users are more likely to be friends with users who have similar attributes, and groups of users with common attributes often form dense subgraphs • Look for groupings of users, called communities, that potentially share attributes
Roadmap • Idea: Use communities to infer attributes • Collect fine-grained community data • Do attribute-based communities exist? • How well can we infer user attributes?
Social network data • Crawled two Facebook networks: Rice University (university) and New Orleans (regional) • Picked a known seed user; crawled all of his friends and added new users to the list; effectively performed a BFS of the graph
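The crawl described on this slide can be sketched as a plain breadth-first search. This is only an illustration: `bfs_crawl`, `get_friends`, and `toy_graph` are hypothetical names, and the in-memory dictionary stands in for whatever mechanism the authors used to fetch friend lists, which the slides do not describe.

```python
from collections import deque


def bfs_crawl(seed, get_friends):
    """Visit every user reachable from `seed`; return users in discovery order."""
    seen = {seed}          # users already added to the list
    queue = deque([seed])  # users whose friend lists are still to be fetched
    order = []
    while queue:
        user = queue.popleft()
        order.append(user)
        for friend in get_friends(user):
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return order


# Hypothetical toy friendship graph (assumption: not real crawl data).
toy_graph = {
    "seed": ["a", "b"],
    "a": ["seed", "c"],
    "b": ["seed", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
    "isolated": [],
}

crawled = bfs_crawl("seed", lambda user: toy_graph.get(user, []))
print(crawled)  # ['seed', 'a', 'b', 'c', 'd']
```

A user with no path to the seed (here `"isolated"`) is never reached, which is a general limitation of crawling from a single seed.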
Collecting attributes • Rice University: obtained authoritative information by querying the student directory (college (dormitory), major(s), year); could not collect Facebook profiles • New Orleans: collected Facebook profiles; collected all attributes, e.g., high school, groups, gender, habits
Roadmap • Idea: Use communities to infer attributes • Collect fine-grained community data • Do attribute-based communities exist? • How well can we infer user attributes?
Do attributes define communities? • Put users into groups based on attributes; determine if these are communities • How good is a community division? • Need a metric to rate communities • Modularity rates community strength: range [-1,1]; 0 represents what is expected in a random graph; ≥0.25 represents community structure
Partition a network into k distinct communities (e.g., $C_1$ to $C_3$). • $E_{ii}$: the set of intra-community edges in $C_i$; $E_{ij}$: the set of inter-community edges connecting $C_i$ and $C_j$ ($i \neq j$); $|E|$: the number of edges in the social network. • $e_{ii} = |E_{ii}|/|E|$, $e_{ij} = |E_{ij}|/(2|E|)$, $a_i = \sum_j e_{ij}$ for community $C_i$.
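Partition a social network into 3 communities: $C_1$, $C_2$, $C_3$. • $|E| = 16$, $|E_{11}| = 5$, $|E_{12}| = 1$, $|E_{13}| = 1$ • $e_{11} = |E_{11}|/|E| = 5/16$, $e_{12} = |E_{12}|/(2|E|) = 1/32$, $e_{13} = |E_{13}|/(2|E|) = 1/32$,
$a_1 = \sum_{1 \le j \le 3} e_{1j} = 5/16 + 1/32 + 1/32 = 3/8$. • $e_{22} = 5/16$, $e_{33} = 3/16$, $a_2 = 3/8$, $a_3 = 1/4$. Modularity $Q = \sum_{1 \le i \le 3} (e_{ii} - a_i^2) = 15/32$. • $\mathrm{Tr}\, e = \sum_i e_{ii}$; $\|e\|$ is the sum of all the elements.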
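The worked example can be checked with a short script. It is a sketch using the slide's definitions of $e_{ii}$, $e_{ij}$ and $a_i$; the function name `modularity` and the dictionary layout are illustrative choices. The slide does not state $|E_{23}|$ directly; the value 1 used here is derived from the given $a_2 = 3/8$ and from $|E| = 16$.

```python
from fractions import Fraction


def modularity(intra, inter, total_edges):
    """Q = sum_i (e_ii - a_i^2).

    intra: {i: |E_ii|}; inter: {(i, j): |E_ij|} with each pair listed once.
    """
    q = Fraction(0)
    for i in intra:
        e_ii = Fraction(intra[i], total_edges)
        # a_i = e_ii plus e_ij for every other community j
        a_i = e_ii
        for (u, v), count in inter.items():
            if i in (u, v):
                a_i += Fraction(count, 2 * total_edges)
        q += e_ii - a_i ** 2
    return q


# Values from the slide; |E_23| = 1 is inferred (assumption stated above).
intra = {1: 5, 2: 5, 3: 3}
inter = {(1, 2): 1, (1, 3): 1, (2, 3): 1}
q = modularity(intra, inter, total_edges=16)
print(q)  # 15/32
```

Exact fractions are used so the result can be compared directly with the 15/32 on the slide.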
Attribute communities for Rice ugrads • Rice ugrads don't form communities based on the major attribute
Roadmap • Idea: Use communities to infer attributes • Collect fine-grained community data • Do attribute-based communities exist? • How well can we infer user attributes?
Using communities to infer attributes • Can we detect a single attribute community, given a few users in the community? • Propose a new algorithm to detect a specific community. Problem: how to evaluate community strength?
Algorithm • Given seed users, find a community by adding users and stopping at some point • At each step, add the user who increases normalized conductance by the most • Stop when no user increases normalized conductance
Normalized Conductance • How strong is community A? • Metric: normalized conductance C, the fraction of A's links within A, relative to a random graph • Range is [-1,1]; 0 represents no stronger than random
Let $G = (V, E)$ denote a graph, let $A \subset V$ be a subset of the vertices that forms a community, and let $B = V \setminus A$. • Define $e_{AB}$ to be the number of edges between A and B and $e_{AA}$ as the number of edges within A.
$e_A = e_{AA} + e_{AB}$: the number of edges that reach vertices within A. • $e_B = e_{AB} + e_{BB}$: the number of edges that reach vertices within B. • In a random graph, we expect that $e_{AB} = e_A e_B$.
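The greedy procedure and the metric above can be sketched together. The slides do not give the closed form of normalized conductance, so the formula below is an assumption: it takes the fraction $e_{AA}/(e_{AA}+e_{AB})$ and subtracts the same fraction with the random-graph expectations $e_A e_A$ and $e_A e_B$ substituted in, which simplifies to $e_A/(e_A+e_B)$. Restricting candidates to neighbors of the current community is also an assumption, and all function names are hypothetical.

```python
def edge_counts(adj, community):
    """Return (e_AA, e_AB, e_BB) for an undirected graph given as {node: neighbors}."""
    e_aa = e_ab = e_bb = 0
    for u in adj:
        for v in adj[u]:
            if u < v:  # count each undirected edge once
                inside = (u in community) + (v in community)
                if inside == 2:
                    e_aa += 1
                elif inside == 1:
                    e_ab += 1
                else:
                    e_bb += 1
    return e_aa, e_ab, e_bb


def normalized_conductance(adj, community):
    """Assumed form: e_AA / (e_AA + e_AB) - e_A / (e_A + e_B)."""
    e_aa, e_ab, e_bb = edge_counts(adj, community)
    e_a, e_b = e_aa + e_ab, e_ab + e_bb
    if e_a == 0:
        return 0.0
    return e_aa / e_a - e_a / (e_a + e_b)


def grow_community(adj, seeds):
    """Greedily add the user with the largest gain; stop when no user improves the score."""
    community = set(seeds)
    score = normalized_conductance(adj, community)
    while True:
        # Assumption: only neighbors of the current community are candidates.
        candidates = {v for u in community for v in adj[u]} - community
        best, best_score = None, score
        for v in sorted(candidates):
            s = normalized_conductance(adj, community | {v})
            if s > best_score:
                best, best_score = v, s
        if best is None:
            return community
        community.add(best)
        score = best_score


# Toy graph (assumption): two 4-cliques {0..3} and {4..7} joined by the edge 3-4.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7), (3, 4)]
adj = {n: set() for n in range(8)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

found = grow_community(adj, seeds={0, 1})
print(sorted(found))  # [0, 1, 2, 3]
```

On this toy graph the expansion stops at the first clique: adding node 4 would pull in three edges that leave the community and lower the score.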
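How to evaluate? • Evaluate performance using precision and recall • Ideally want a precision and recall of 1.0 • A: users the algorithm detects as having the attribute • B: users who actually have the attribute • C: users the algorithm detects who really do have the attribute (the overlap of A and B) • Precision = C/A • Recall = C/B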
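A: users the algorithm detects as having the attribute • B: users who actually have the attribute • C: users the algorithm detects who really do have the attribute • When A = C, Precision = C/A = 1 • But this alone does not mean the detection is accurate, so the recall value (Recall = C/B) must also be considered.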
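The point about checking both metrics can be illustrated with sets. This is a minimal sketch: `precision_recall` and the user IDs are made up for the example and are not data from the study.

```python
def precision_recall(detected, actual):
    """detected = A (flagged by the algorithm), actual = B (truly have the attribute)."""
    detected, actual = set(detected), set(actual)
    correct = detected & actual  # C: detected and truly has the attribute
    precision = len(correct) / len(detected) if detected else 0.0
    recall = len(correct) / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical case where A = C: the single detected user is correct,
# but three users with the attribute are missed.
actual = {"u1", "u2", "u3", "u4"}
detected = {"u1"}
precision, recall = precision_recall(detected, actual)
print(precision, recall)  # 1.0 0.25
```

Precision is perfect here even though three quarters of the attribute holders go undetected, which is why recall is reported alongside it.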
Recall and precision for matriculation year community detection for Rice ugrads
Summary • Good interpretation: can reduce the burden on users, who don't have to fill in their entire profile • Bad interpretation: can figure out attributes users don't reveal; privacy is a function of what friends reveal
My thought • The ongoing online social network privacy debate focuses mainly on explicitly provided attributes • The paper demonstrated that many attributes can be inferred, even if the user didn't provide them