You Are What You Link. Lada Adamic, Eytan Adar. WWW 10 – May, 2001.
Outline
• Graph structures of social networks
• How person-to-person links on the web create observable social networks
• Understanding and predicting links
• Additional online info (text, links, email subscriptions) gives context to social links
• Predict social links even where there is no explicit hyperlink
• Understanding communities through links
[Figure: two example homepages, Julie's and Becky's. Each page carries personal text (what they study, where they live, favorite books, photos, friends, favorite links). Julie's page links to "my roomie Becky"; Becky's page links to "my best friend Julie".]
Becky and Julie aren’t the only ones to link to each other
Graph Structure of Social Networks
Differences in cohesiveness of communities
[Figure: homepage link graphs, panels: Stanford, MIT]
The number of links per person is uneven
Interesting social network analysis
Largest connected component
MIT: 86%
Stanford: 58%
Shortest path from one person to another (average)
MIT: 6.4 hops
Stanford: 9.2 hops
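The two graph statistics above (share of people in the largest connected component, and hops between people) can be computed with a breadth-first search. The following is a minimal sketch, not the authors' code: it assumes links are treated as undirected, it averages hops only over pairs inside the largest component, and the toy graph and function names are hypothetical.

```python
from collections import deque

def bfs_distances(graph, source):
    """Hop counts from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def largest_component(graph):
    """Node set of the largest connected component."""
    seen, best = set(), set()
    for node in graph:
        if node not in seen:
            comp = set(bfs_distances(graph, node))
            seen |= comp
            if len(comp) > len(best):
                best = comp
    return best

def average_shortest_path(graph, nodes):
    """Mean hop count over all ordered pairs of distinct nodes in `nodes`."""
    total, pairs = 0, 0
    for source in nodes:
        for target, d in bfs_distances(graph, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

# Hypothetical toy graph (undirected, stored as adjacency sets):
# a chain a-b-c-d plus a separate pair e-f.
toy = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"},
    "e": {"f"}, "f": {"e"},
}
giant = largest_component(toy)
giant_share = len(giant) / len(toy)          # fraction of people in the component
avg_hops = average_shortest_path(toy, giant)  # average hops within it
```

Restricting the average to the largest component avoids infinite distances between people who are not connected at all.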
Clustering Coefficient
C = (# of links among neighbors) / (max # of links among neighbors)
Example: a node with 4 neighbors and 3 links among them has C = 3 / (4*3/2) = 1/2
MIT: 0.22
Stanford: 0.21
70x that of a random graph!
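The clustering coefficient definition can be checked on a small example. This is an illustrative sketch assuming an undirected graph stored as adjacency sets; the node names are hypothetical, and the example reproduces the 3 / (4*3/2) case from the slide.

```python
from itertools import combinations

def clustering_coefficient(graph, node):
    """Links among a node's neighbors divided by the maximum possible."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbors: no neighbor pairs to link
    links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
    return links / (k * (k - 1) / 2)

# Hypothetical example: "me" has 4 neighbors with 3 links among them
# (n1-n2, n2-n3, n3-n4).
graph = {
    "me": {"n1", "n2", "n3", "n4"},
    "n1": {"me", "n2"},
    "n2": {"me", "n1", "n3"},
    "n3": {"me", "n2", "n4"},
    "n4": {"me", "n3"},
}
c_me = clustering_coefficient(graph, "me")
```

A network-wide figure such as the ones reported for MIT and Stanford would presumably average this value over all people; the slide does not say which averaging was used.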
Information available online
[Figure: two users connected through shared outlinks, inlinks, email list subscriptions, and common text]
How information was collected
• Users’ web directories were crawled
• Outlinks were extracted
• Text was passed through ThingFinder to extract things like people, places, companies
• Mailing list subscriptions were obtained from the mailing list servers (95% public for Stanford, internal to MIT)
• Inlinks were obtained by querying search engines: Google for Stanford, AltaVista for MIT (equivalent URLs)
Comparison with traditional means of gathering information on social networks
Advantages
• Easily and automatically gathered (no phone, live, or mail surveys)
• Data sets are orders of magnitude larger
• Information is already public
Disadvantages
• Data sets are incomplete, i.e. you don’t get to ask the questions, just take down the answers
So can we guess who’s friends with whom from the information gathered online?
• Choose person A
• Rank everybody else according to their likeness to that person
• See how “friends” (people who are linked to A) were ranked
• Evaluate for text, outlinks, inlinks, mailing lists separately
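The ranking procedure above can be sketched as follows. The slides do not state the likeness function, so this sketch assumes one: each shared item contributes 1 / log(number of users who have it), so that rare shared items count more than common ones. The user names Julie and Becky come from the slides; the other users, the mailing-list names, and all function names are hypothetical.

```python
import math
from collections import Counter

def item_frequencies(profiles):
    """How many users have each item (a phrase, link, or mailing list)."""
    return Counter(item for items in profiles.values() for item in items)

def likeness(a, b, profiles, freq):
    """Assumed score: shared items weighted by inverse log frequency."""
    shared = profiles[a] & profiles[b]
    # A shared item has frequency >= 2, so log(freq) is always positive.
    return sum(1.0 / math.log(freq[item]) for item in shared)

def rank_others(a, profiles):
    """Everybody except `a`, most similar first."""
    freq = item_frequencies(profiles)
    others = [u for u in profiles if u != a]
    return sorted(others, key=lambda u: likeness(a, u, profiles, freq),
                  reverse=True)

def friend_ranks(a, profiles, friends):
    """Rank (1 = best match) of each person linked to `a`."""
    ranking = rank_others(a, profiles)
    return {f: ranking.index(f) + 1 for f in friends[a]}

# Hypothetical data for one information type (mailing lists).
profiles = {
    "julie": {"campus-announce", "dorm-3-west", "ultimate-frisbee"},
    "becky": {"campus-announce", "dorm-3-west"},
    "carl": {"campus-announce", "chess-club"},
    "dana": {"campus-announce"},
}
friends = {"julie": {"becky"}}
ranks = friend_ranks("julie", profiles, friends)
```

Running the same functions once per information type (text, outlinks, inlinks, mailing lists) gives the separate evaluations listed in the last bullet.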
Example, top matches for a particular user annaken: Clifford Hsiang Chao
Coverage in ability to predict user-user links, i.e. the share of friend pairs that had at least one item in common
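Coverage in this sense can be computed directly from the link list and the per-user item sets. The sketch below is an illustration under assumptions: links are given as user pairs, each user has a set of items for one information type, and all names and data are hypothetical.

```python
def coverage(links, profiles):
    """Fraction of linked pairs that share at least one item."""
    if not links:
        return 0.0
    covered = sum(
        1 for a, b in links
        if profiles.get(a, set()) & profiles.get(b, set())
    )
    return covered / len(links)

# Hypothetical data: three user-user links, one information type.
profiles = {
    "julie": {"x", "y"},
    "becky": {"x"},
    "carl": {"z"},
    "dana": set(),
}
links = [("julie", "becky"), ("julie", "carl"), ("becky", "dana")]
cov = coverage(links, profiles)  # only julie-becky share an item
```

Pairs with nothing in common cannot be ranked by that information type at all, which is why coverage bounds how many links each type can predict.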
Performance of friend matching algorithm
[Figure: panels: Stanford, MIT]
The most common ranking for a friend is #1
We don’t have that much in common with our friend’s friend’s friends
[Figure: Stanford]
What are good and bad link predictors?
• What you would expect…
• Very unique things are only relevant to individuals
• Very general things (“MIT”, “Stanford”) are relevant to everyone
• Some top 10 lists…
Text Based Predictors • Bad phrases: general organizations, cities (Oakland, Cambridge, etc.), departments (CS)
Out-link Based Predictors • Worst ranked sites are search engines and portals (Altavista, Lycos, Yahoo, etc.), and top level homepages such as www.mit.edu and www.stanford.edu.
In-link Based Predictors • The top predictors are almost exclusively individual home pages pointing to lists of friends • Poor predictors: Long lists (all homepages, department listings)
Mailing List Based Predictors • Bad lists: General announcement lists at MIT, non-housing based activities (theater), job lists
Future Work
• Use other pieces of available information
  • demographic information (where people live, department, year, etc.)
  • combine information
• Label structures (Flake et al. 2000)
  • Given structures determined by graph algorithms
  • Label them using extracted information
Summary
• Homepage graph structure varies depending on community
• Possible to predict (to some degree) where links will exist
• Good predictors seem unique to communities