
Creating Models of Real-World Communities with ReferralWeb

Henry Kautz (University of Washington) and Bart Selman (Cornell University)


Presentation Transcript


  1. Creating Models of Real-World Communities with ReferralWeb • Henry Kautz • University of Washington • Bart Selman • Cornell University

  2. Recommender Systems • New category of software: programs that make personalized recommendations of goods, services, and people • Amazon.com - books • Jango.com - stores • Whowhere.com - friends • Current methods: • Content-based: find things similar to ones you like • Collaborative filtering: find things liked by people who are similar to you • Explosive growth • Viewed as crucial for e-commerce sites • Excite: 100,000,000+ recommendations per day!
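As a rough illustration of the collaborative-filtering method named in slide 2, here is a minimal Python sketch: it finds the other users whose ratings are most similar (cosine similarity over sparse rating vectors) and suggests items they rated that you have not. The function names, the cosine measure, and the toy ratings are assumptions for illustration only, not the method used by any of the services listed.

```python
from math import sqrt


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse rating vectors (item -> rating)."""
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(user: str, ratings: dict[str, dict[str, float]], k: int = 1) -> list[str]:
    """User-based collaborative filtering: score unseen items by the
    similarity-weighted ratings of the k most similar other users."""
    mine = ratings[user]
    neighbors = sorted(
        (u for u in ratings if u != user),
        key=lambda u: cosine(mine, ratings[u]),
        reverse=True,
    )[:k]
    scores: dict[str, float] = {}
    for u in neighbors:
        w = cosine(mine, ratings[u])
        for item, r in ratings[u].items():
            if item not in mine:
                scores[item] = scores.get(item, 0.0) + w * r
    return sorted(scores, key=scores.get, reverse=True)


# Toy, hypothetical ratings.
ratings = {"ann": {"a": 5, "b": 4}, "bob": {"a": 5, "b": 4, "c": 5}, "cy": {"d": 5}}
print(recommend("ann", ratings))  # ['c']
```

The result says nothing about whose opinion produced it; the sources are aggregated away, which is the anonymity problem slide 3 raises.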

  3. Anonymous Opinions • Most recommender systems hide the identity of the sources of the recommendations • E-communities: fictitious identities • Matchmaker systems: deliberately hide true identities • Collaborative filtering: aggregation - no one to trust (or blame!) • Result: anonymous opinions • Okay for choosing a movie or CD • But would you bet your job on that “recommendation”?! • Gee, boss, the project failed, but somebody on the net, I don't know who, said it was a good approach...

  4. Trusted Recommendations • For serious life / business decisions, you want the opinion of a trusted expert • If the expert is not personally known, you want to find a referral to one via a chain of friends and colleagues • A referral-chain provides: • A way to judge the quality of the expert's advice • A reason for the expert to respond in a trustworthy manner • Finding good referral-chains is slow and time-consuming, but vital • cf. business gurus on “networking”

  5. Example Tasks • You are an associate editor for JAIR. Find a reviewer for a paper that claims new results on “expander graphs”. • You are considering transferring to a different division of your company. Is that division head a good guy to work for? • You are putting together a project team to launch a new internet service. Who in your company should you tap for expertise on image compression?

  6. ReferralWeb • Set of all possible referral-chains = a social network • System for modeling, visualizing, and searching social networks • in a company • in an e-community • in the WWW as a whole • Integrates IR search with a model of personal connections

  7. Social Networks • Social network model specifies: • Who knows who • Who knows what • How to create? • Ask users to register with system and provide lists of contacts and interests • sixdegrees, 6DOS, Firefly, Whowhere? • High start-up cost • Incomplete, out-of-date, untrustworthy information • The best experts will actively avoid such systems • a network of the lonely and disenfranchised?

  8. Mining Social Networks • Alternative: automatically generate network models from pre-existing data • Email logs (not) • Bibliographic databases • Corporate records of organizational structure, project teams, in-house documents • Arbitrary web pages • personal web pages more accurate / up to date than official corporate records! • Can extract evidence for both relationships and expertise

  9. Discovering Names • Proper name extraction • Can accurately identify names in arbitrary documents • Frequency of co-occurrence of names can be quickly determined using IR search engines • Canonicalizing names • John Zack, J. C. Zack, Jim Zack • Match names / initials / nicknames as long as the match is unambiguous • closed world assumption • Improvement: use context • “Henry A. Kautz” matches “Harry A. Kautz” if both are strongly linked to “Bart Selman”
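The basic matching rule in slide 9 (accept a name, initial, or nickname match only when exactly one known person fits, under a closed world assumption) can be sketched in Python as follows. The parsing heuristic and the tiny NICKNAMES table are hypothetical, and the context-based improvement (using strongly linked co-occurring names) is deliberately not implemented.

```python
from typing import Optional

# Hypothetical, deliberately tiny nickname table.
NICKNAMES = {"jim": "james", "bill": "william", "bob": "robert"}


def parse(name: str) -> tuple[str, list[str]]:
    """Split 'J. C. Zack' into ('zack', ['j', 'c']): surname plus given-name tokens."""
    parts = name.replace(".", " ").lower().split()
    return parts[-1], parts[:-1]


def compatible(variant: str, full: str) -> bool:
    """True if every given-name token of variant agrees with the full name,
    where a single letter matches as an initial and nicknames are expanded."""
    v_sur, v_given = parse(variant)
    f_sur, f_given = parse(full)
    if v_sur != f_sur or len(v_given) > len(f_given):
        return False
    for v, f in zip(v_given, f_given):
        v = NICKNAMES.get(v, v)
        if len(v) == 1:
            if not f.startswith(v):
                return False
        elif v != f:
            return False
    return True


def canonicalize(variant: str, known: list[str]) -> Optional[str]:
    """Closed world: map variant to a known name only if the match is unique."""
    matches = [k for k in known if compatible(variant, k)]
    return matches[0] if len(matches) == 1 else None


known = ["Henry A. Kautz", "Bart Selman", "John C. Zack", "Jane Zack"]
print(canonicalize("J. C. Zack", known))  # John C. Zack
print(canonicalize("J. Zack", known))     # None (ambiguous)
```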

  10. Disambiguating Names • Problem: different individuals with the same name • Observation: even within large organizations, the vast majority (90%+) of full names are unique • 3,000 employees in R&D at AT&T • 10,000 research scientists in AI, NL, and theory • For medium-size networks, name collisions can be treated as noise • Key interface issue: ability to explain each link in a path to users • Further scaling: name + additional context

  11. User Profiles • Manually entered profiles are incomplete and impossible to maintain • impossible in principle to create a complete a priori list of kinds of expertise • Many services today create highly specialized profiles • your book-buying habits • Simple, robust profile: “bag of words” of all documents in which your name appears • standard IR vector-space model to match queries to people
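The profile in slide 11 (the bag of words from every document mentioning a person, matched to queries with the vector-space model) can be sketched in Python as below. TF-IDF weighting with a +1 smoothed IDF and cosine similarity are assumed choices, and the toy documents are invented for illustration.

```python
import math
from collections import Counter


def build_profiles(docs: list[tuple[list[str], str]]) -> dict[str, Counter]:
    """Each person's profile is the bag of words of all documents naming them.
    docs: (names appearing in the document, document text) pairs."""
    profiles: dict[str, Counter] = {}
    for names, text in docs:
        words = text.lower().split()
        for name in names:
            profiles.setdefault(name, Counter()).update(words)
    return profiles


def rank_experts(query: str, profiles: dict[str, Counter]) -> list[tuple[str, float]]:
    """Rank people by TF-IDF cosine similarity between the query and their profile."""
    n = len(profiles)
    df = Counter(w for bag in profiles.values() for w in bag)  # profiles containing w
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}

    def weigh(bag: Counter) -> dict[str, float]:
        return {w: c * idf.get(w, 0.0) for w, c in bag.items()}

    q = weigh(Counter(query.lower().split()))
    q_norm = math.sqrt(sum(v * v for v in q.values())) or 1.0
    scored = []
    for person, bag in profiles.items():
        p = weigh(bag)
        p_norm = math.sqrt(sum(v * v for v in p.values())) or 1.0
        dot = sum(qv * p.get(w, 0.0) for w, qv in q.items())
        scored.append((person, dot / (q_norm * p_norm)))
    return sorted(scored, key=lambda s: s[1], reverse=True)


docs = [
    (["Frieze", "Blum"], "expander graphs random walks"),
    (["Kautz"], "planning satisfiability"),
    (["Blum"], "machine learning"),
]
print(rank_experts("expander graphs", build_profiles(docs))[0][0])  # Frieze
```

No expertise list is entered by hand: a person's profile simply accumulates the vocabulary of the documents in which their name appears.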

  12. Test Networks • 1. Proof of concept: 1,000-node network • Created by a combination of web crawling and AltaVista queries, centered on a professor at M.I.T. • Test group of users could usually find experts on given topics • but the small size of the network led to distant referrals • 2. 10,000 Researchers in AI, Theory, and NL • Based on 30,000 bibliography entries from high-quality conferences • AAAI, STOC, FOCS, ACL... • links between co-authors (not citations) • http://www.research.att.com/kautz/referralweb • “paper-reviewer finder”
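Slide 12 says the second test network links co-authors rather than citations. Below is a minimal Python sketch of that construction. It assumes each bibliography entry has already been reduced to a list of canonical author names; weighting each edge by the number of joint papers is an added assumption, and the entries are illustrative.

```python
from itertools import combinations


def coauthor_graph(entries: list[list[str]]) -> dict[str, dict[str, int]]:
    """Undirected co-author graph: node = author, edge weight = joint papers.
    Citations are ignored; only co-authorship creates a link."""
    g: dict[str, dict[str, int]] = {}
    for authors in entries:
        names = sorted(set(authors))
        for a in names:
            g.setdefault(a, {})  # single-author papers still create a node
        for a, b in combinations(names, 2):
            g[a][b] = g[a].get(b, 0) + 1
            g[b][a] = g[b].get(a, 0) + 1
    return g


entries = [["Kautz", "Selman"], ["Kautz", "Selman", "Shah"], ["Frieze"]]
g = coauthor_graph(entries)
print(g["Kautz"])  # {'Selman': 2, 'Shah': 1}
```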

  13. Exploring the Network

  14. Who can I ask to review a paper on “expander graphs”?

  15. Experts on Expander Graphs

  16. Paths to Experts

  17. Request Details on Frieze

  18. Frieze Home Page

  19. Observations • Quickly found short chains to experts • Could not be found using IR search alone • User can select chain that is most likely to succeed • Do not want to bother busiest, most famous experts with every request • Chains cross disciplines • Kautz - AI • Kearns - AI, Machine Learning • Blum - Machine Learning, Theory • Frieze - Theory, Mathematics • Useful tool for strengthening ties both within and between communities
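Slides 16 and 19 show short chains from the user to an expert. The talk does not say how these chains are computed. One plausible approach, assumed here, is breadth-first search over the acquaintance graph, which returns a chain with the fewest hops. The adjacency list is a toy that mirrors the Kautz, Kearns, Blum, Frieze chain named above. It is not ReferralWeb data.

```python
from collections import deque
from typing import Optional


def referral_chain(graph: dict[str, list[str]], source: str, target: str) -> Optional[list[str]]:
    """Breadth-first search for a shortest chain of acquaintances from source to target."""
    if source == target:
        return [source]
    parent: dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            if v not in parent:
                parent[v] = u
                if v == target:
                    # Walk parent pointers back to the source, then reverse.
                    path = [v]
                    while parent[path[-1]] is not None:
                        path.append(parent[path[-1]])
                    return path[::-1]
                queue.append(v)
    return None  # no chain exists


graph = {
    "Kautz": ["Selman", "Kearns"],
    "Selman": ["Kautz"],
    "Kearns": ["Kautz", "Blum"],
    "Blum": ["Kearns", "Frieze"],
    "Frieze": ["Blum"],
}
print(referral_chain(graph, "Kautz", "Frieze"))  # ['Kautz', 'Kearns', 'Blum', 'Frieze']
```

BFS returns only one shortest chain. To let the user choose the chain most likely to succeed, as slide 19 suggests, a real system would need to enumerate several candidate chains.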

  20. Why Does it Work? • The Small-World Phenomenon: Milgram (1967) - any two individuals in the U.S.A. are linked by a chain of 6 or fewer first-name acquaintances • “6 degrees of separation” • Erdős numbers • “6 degrees of Kevin Bacon” • But • No formal model to explain short paths! • Due to high average degree? • True for acquaintances or co-stars, but false for our computer science co-author database! • Average degree: 100s (acquaintances) versus 61 (film co-stars) versus 4.28 (our co-authors)!

  21. Small-world Networks • Due to randomness? • Random graphs have short average path lengths • But social networks are not random • nodes are highly clustered (many cliques) • the random graph model predicts that high clustering corresponds to long average paths! • Better model: small-world networks • Idea: a highly structured (clustered) network with just a few random links (Watts & Strogatz, 1998) • Result: high clustering + short paths! • Random edges correspond to shortcuts • direct relationships between people who primarily participate in different sub-communities
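The small-world idea in slide 21 can be checked numerically with a short Python sketch. It starts from a ring lattice of 200 nodes, each linked to its 2 nearest neighbors on each side. By direct counting, the average path length of this lattice is 5050/199, or about 25.4 hops. The sketch then rewires each lattice edge with probability 0.1 in the spirit of Watts & Strogatz and measures the average path length again. The sizes, the seed, and the rewiring details are illustrative assumptions.

```python
import random
from collections import deque


def ring_lattice(n: int, k: int) -> dict[int, set[int]]:
    """Ring of n nodes, each linked to its k nearest neighbors on each side."""
    g: dict[int, set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k + 1):
            g[i].add((i + j) % n)
            g[(i + j) % n].add(i)
    return g


def rewire(g: dict[int, set[int]], k: int, p: float, rng: random.Random) -> dict[int, set[int]]:
    """Each lattice edge (i, i+j) keeps endpoint i and, with probability p,
    moves its other end to a uniformly random non-neighbor (a 'shortcut')."""
    n = len(g)
    for j in range(1, k + 1):
        for i in range(n):
            old = (i + j) % n
            if old in g[i] and rng.random() < p:
                candidates = [v for v in range(n) if v != i and v not in g[i]]
                if candidates:
                    new = rng.choice(candidates)
                    g[i].discard(old)
                    g[old].discard(i)
                    g[i].add(new)
                    g[new].add(i)
    return g


def avg_path_length(g: dict[int, set[int]]) -> float:
    """Mean BFS hop count over all ordered pairs of mutually reachable nodes."""
    total = pairs = 0
    for s in g:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in g[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


lattice = avg_path_length(ring_lattice(200, 2))
shortcut = avg_path_length(rewire(ring_lattice(200, 2), 2, 0.1, random.Random(42)))
print(f"lattice: {lattice:.2f} hops, with ~10% rewired edges: {shortcut:.2f} hops")
```

Only about 40 of the 400 edges are rewired. The remaining edges keep the lattice's local clique-like structure, yet the few shortcuts sharply reduce the typical path length.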

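  22. Small-world vs. Random Networks • Clustering coefficient = average value of C(n) over all nodes, where $C(n) = \frac{e_n}{k_n(k_n - 1)/2}$, with $k_n$ the number of neighbors of n and $e_n$ the number of links among those neighbors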
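The clustering coefficient on slide 22 is the average, over all nodes, of the fraction of pairs of a node's neighbors that are directly linked to each other. A minimal Python sketch follows. It assigns C(n) = 0 to nodes with fewer than two neighbors. That convention is common, but it is an assumption because the slide does not address it.

```python
from itertools import combinations


def local_clustering(g: dict[str, set[str]], n: str) -> float:
    """C(n): linked neighbor pairs divided by all k(k-1)/2 neighbor pairs."""
    nbrs = g[n]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention for degree 0 or 1
    links = sum(1 for a, b in combinations(nbrs, 2) if b in g[a])
    return links / (k * (k - 1) / 2)


def clustering_coefficient(g: dict[str, set[str]]) -> float:
    """Average of C(n) over all nodes of an undirected graph."""
    return sum(local_clustering(g, n) for n in g) / len(g)


# A triangle a-b-c plus a pendant node d attached to c.
g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(clustering_coefficient(g))  # (1 + 1 + 1/3 + 0) / 4 = 7/12
```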

  23. Corporate Communities • Finding good internal experts is a strategic business problem • “intellectual assets” are worthless if not consulted! • AT&T: 170,000 employees, 3,000 in the R&D community • How to build a project team? • What R&D people to consult for a new business venture? • What business people to contact about a new technological breakthrough? • In practice: successful projects are based on grassroots cross-organizational networking

  24. Modeling the AT&T Corporate Network • Model integrates information from • Official organizational charts (online) • Personal web pages (+ crawling) • External publication databases • Internal technical document databases • Informal structure will prove vital for • finding shorter paths to experts • finding people who can reliably evaluate experts • synergy between official and unofficial channels

  25. Who can tell me about the Director of Speech Processing research at AT&T?

  26. Paths With All Link Types

  27. Filtering link types

  28. Paths With Only Organizational Links

  29. Paths With Only Web/Article Links

  30. Observations • Official company hierarchy is only a sparse subset of the corporate social network • Shortest (and often best) paths involve a combination of official and unofficial links • Conditions for trust and evaluation may greatly differ • The global social network is the union of many different kinds of sub-networks • Search is greatly aided when the user can choose different views of the network: • types of edge • strength of edge
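The "views" in slides 27-30 amount to searching a subgraph that keeps only selected edge types. The Python sketch below shows this with a breadth-first search that ignores all edges whose type is not in an allowed set. The people, the edge types ("org", "web", "article"), and the data layout are hypothetical.

```python
from collections import deque
from typing import Optional


def typed_chain(edges: list[tuple[str, str, str]], source: str, target: str,
                allowed: set[str]) -> Optional[list[str]]:
    """Shortest chain from source to target using only edges whose type is allowed."""
    adj: dict[str, list[str]] = {}
    for a, b, kind in edges:
        if kind in allowed:
            adj.setdefault(a, []).append(b)
            adj.setdefault(b, []).append(a)
    parent: dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            path: list[str] = []
            node: Optional[str] = u
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for v in adj.get(u, []):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None


# Hypothetical typed edges: an org-chart route and an informal shortcut.
edges = [
    ("me", "manager", "org"), ("manager", "vp", "org"), ("vp", "director", "org"),
    ("me", "colleague", "web"), ("colleague", "director", "article"),
]
print(typed_chain(edges, "me", "director", {"org", "web", "article"}))  # ['me', 'colleague', 'director']
print(typed_chain(edges, "me", "director", {"org"}))  # ['me', 'manager', 'vp', 'director']
```

In this toy graph the mixed view gives the shorter path, which mirrors the slide's observation that the best paths often combine official and unofficial links.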

  31. Who can help out my project with some great image compression software?

  32. A Note on Believability • Observation: the recommendations made by (any) recommender system tend to be either astonishingly accurate, or absolutely ridiculous • true for any AI-complete problem • How can a recommender system be trusted enough for “serious” use? • Make system transparent: able to explain its reasoning • indicate to user where the data is ambiguous • Any link or node can be explained by viewing the data on which it is based
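Slide 32 requires that every link can be explained by the data it is based on. One simple way to meet this requirement, assumed here, is to store provenance with every edge. In the Python sketch below, each undirected link keeps the list of source documents that produced it. The class name and the document labels are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class EvidenceGraph:
    """Social network in which every undirected link remembers its sources."""
    evidence: dict[frozenset, list[str]] = field(default_factory=lambda: defaultdict(list))

    def add_link(self, a: str, b: str, source: str) -> None:
        """Record that document `source` is evidence for a link between a and b."""
        self.evidence[frozenset((a, b))].append(source)

    def explain(self, a: str, b: str) -> list[str]:
        """Return the documents behind the a-b link (empty if there is no link)."""
        return list(self.evidence.get(frozenset((a, b)), []))


g = EvidenceGraph()
g.add_link("Kautz", "Selman", "paper-17")  # hypothetical co-authored paper
g.add_link("Selman", "Kautz", "homepage-3")  # hypothetical web page mentioning both
print(g.explain("Kautz", "Selman"))  # ['paper-17', 'homepage-3']
```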

  33. Checking the Expert’s Expertise

  34. Checking the Reason for an Edge

  35. Verifying the Edge Context

  36. Summary • Many uses of recommender system require connecting people to people, not just providing “oracular” advice • Find people, not just documents - access to information that may not even be online! • Help users evaluate quality of information • Need to automatically model existing, real-world communities • Cannot require everyone to sign up in advance! • Can improve and strengthen the “weak ties” that are crucial for effective organizations • ReferralWeb: a tool for generating and searching social networks

  37. Status and Future Work • ReferralWeb • Version 2.0 for the Computer Science research community http://www.research.att.com/~kautz/referralweb • Corporate version undergoing trials in AT&T Labs • Current research topics • Automatic clustering - discovery of sub-communities • Combining uncertain information • Scale-up to WWW-size communities • Analysis of more accurate formal models of small-world networks • accurately predict search performance

  38. Bibliography • Kautz, H., Selman, B. & Shah, M. 1997. The Hidden Web. AI Magazine 18(2): 27-36. • Milgram, S. 1967. The Small-World Problem. Psychology Today 1(1): 60-76. • Resnick, P. & Varian, H. R., eds. 1997. Special Section on Recommender Systems. Communications of the ACM 40(3). • Watts, D. & Strogatz, S. 1998. Collective dynamics of ‘small-world’ networks. Nature 393: 440-442.
