Creating Models of Real-World Communities with ReferralWeb • Henry Kautz • University of Washington • Bart Selman • Cornell University
Recommender Systems • New category of software: programs that make personalized recommendations of goods, services, and people • Amazon.com - books • Jango.com - stores • Whowhere.com - friends • Current methods • Content-based: find things similar to ones you like • Collaborative-filtering: find things liked by people who are similar to you • Explosive growth • Viewed as crucial for e-commerce sites • Excite: 100,000,000+ recommendations per day!
Anonymous Opinions • Most recommender systems hide the identity of the sources of the recommendations • E-communities: fictitious identities • Matchmaker systems: deliberately hide true identities • Collaborative filtering: aggregation - no one to trust (or blame!) • Result: anonymous opinions • Okay for choosing a movie or CD • But would you bet your job on that “recommendation”?! • “Gee, boss, the project failed, but somebody on the net, I don't know who, said it was a good approach...”
Trusted Recommendations • For serious life / business decisions, you want the opinion of a trusted expert • If the expert is not personally known, then you want a referral to one via a chain of friends and colleagues • Referral-chain provides: • Way to judge quality of expert's advice • Reason for the expert to respond in a trustworthy manner • Finding good referral-chains is slow and time-consuming, but vital • business gurus on “networking”
Example Tasks • You are an associate editor for JAIR. Find a reviewer for a paper that claims new results on “expander graphs”. • You are considering transferring to a different division of your company. Is that division head a good guy to work for? • You are putting together a project team to launch a new internet service. Who in your company should you tap for expertise on image compression?
ReferralWeb • Set of all possible referral-chains = a social network • System for modeling, visualizing, and searching social networks • in a company • in an e-community • in the WWW as a whole • Integrates IR search with a model of personal connections
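Searching a social network for a referral chain can be pictured as a shortest-path search over a who-knows-who graph. The following is a minimal sketch under that assumption, not ReferralWeb's actual implementation; the people, the `knows` map, and the function name `find_referral_chain` are invented for illustration.

```python
from collections import deque

def find_referral_chain(knows, source, target):
    """Breadth-first search: return a shortest chain from source to target, or None."""
    if source == target:
        return [source]
    parent = {source: None}          # remembers who introduced whom
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for contact in knows.get(person, ()):
            if contact in parent:    # already reached by a chain at least as short
                continue
            parent[contact] = person
            if contact == target:
                # Walk the parent links back to the source, then reverse.
                chain = [contact]
                while parent[chain[-1]] is not None:
                    chain.append(parent[chain[-1]])
                return chain[::-1]
            queue.append(contact)
    return None

# Hypothetical who-knows-who data (symmetric acquaintance links).
knows = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Alice", "Dave"],
    "Carol": ["Alice"],
    "Dave": ["Bob", "Erin"],
    "Erin": ["Dave"],
}
chain = find_referral_chain(knows, "Alice", "Erin")
print(" -> ".join(chain))
```

Breadth-first search is used here because it returns a chain with the fewest intermediaries; a real system would also weigh which chain is most likely to succeed.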
Social Networks • Social network model specifies: • Who knows who • Who knows what • How to create? • Ask users to register with system and provide lists of contacts and interests • sixdegrees, 6DOS, Firefly, Whowhere? • High startup cost • Incomplete, out of date, untrustworthy information • Best experts will actively avoid • a network of the lonely and disenfranchised?
Mining Social Networks • Alternative: automatically generate network models from pre-existing data • Email logs (not) • Bibliographic databases • Corporate records of organizational structure, project teams, in-house documents • Arbitrary web pages • personal web pages more accurate / up to date than official corporate records! • Can extract evidence for both relationships and expertise
Discovering Names • Proper name extraction • Can accurately identify names in arbitrary documents • Frequency of co-occurrence of names can be quickly determined using IR search engines • Canonizing names • John Zack, J. C. Zack, Jim Zack • Match names / initials / nicknames as long as unambiguous • closed world assumption • Improvement: use context • “Henry A. Kautz” matches “Harry A. Kautz” if both strongly linked to “Bart Selman”
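The "match as long as unambiguous" rule under a closed world assumption can be sketched as follows. This is an illustrative simplification, not the system's matcher: the `NICKNAMES` table, the helper names, and the sample name lists are assumptions, and the context-based improvement (linking through a shared strong contact) is not modeled.

```python
# Hypothetical nickname table; a real one would be far larger.
NICKNAMES = {"jim": "james", "bill": "william"}

def tokens(name):
    """Lower-case name parts with trailing periods removed: 'J. C. Zack' -> ['j', 'c', 'zack']."""
    return [t.strip(".").lower() for t in name.split() if t.strip(".")]

def compatible(a, b):
    """Given-name tokens agree if equal after nickname expansion, or if one is the other's initial."""
    a, b = NICKNAMES.get(a, a), NICKNAMES.get(b, b)
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return a == b

def matches(mention, known):
    m, k = tokens(mention), tokens(known)
    if m[-1] != k[-1]:               # surnames must agree exactly
        return False
    # Compare given names position by position; a missing middle name is tolerated.
    return all(compatible(a, b) for a, b in zip(m[:-1], k[:-1]))

def canonicalize(mention, known_names):
    """Closed world: accept a match only if exactly one known name fits."""
    candidates = [k for k in known_names if matches(mention, k)]
    return candidates[0] if len(candidates) == 1 else None

print(canonicalize("J. C. Zack", ["John Zack", "Mary Zack"]))   # one candidate
print(canonicalize("J. C. Zack", ["John Zack", "Jim Zack"]))    # ambiguous
```

Returning nothing for an ambiguous mention trades recall for precision, which fits a setting where each link must later be explained to the user.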
Disambiguating Names • Problem: different individuals with the same name • Observation: Within even large organizations the vast majority (90%+) of full names are unique • 3,000 employees in R&D at AT&T • 10,000 research scientists in AI, NL, and theory • For medium size networks - considered as noise • Key interface issue: ability to explain each link in path to users • Further scaling: name + additional context
User Profiles • Manually-entered profiles incomplete, impossible to maintain • impossible in principle to create a complete a priori list of kinds of expertise • Many services today create highly specialized profiles • your book buying habits • Simple, robust profile: “bag of words” of all documents in which your name appears • standard IR vector space model to match queries, people
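A bag-of-words profile matched with the vector space model can be sketched like this. It is a minimal illustration using raw term counts and cosine similarity; the slides do not say which term weighting the system used, and the people and documents below are invented.

```python
import math
from collections import Counter

def bag_of_words(documents):
    """Merge all documents mentioning a person into one term-count vector."""
    words = Counter()
    for doc in documents:
        words.update(doc.lower().split())
    return words

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_people(query, profiles):
    """Order people by how closely their profile matches the query."""
    q = bag_of_words([query])
    return sorted(profiles, key=lambda person: cosine(q, profiles[person]), reverse=True)

# Hypothetical people and documents.
profiles = {
    "Ann": bag_of_words(["expander graphs and random walks", "spectral graph theory"]),
    "Raj": bag_of_words(["image compression with wavelets"]),
}
ranking = rank_people("image compression", profiles)
print(ranking)
```

A production system would normally add stop-word removal and a weighting such as TF-IDF; those are omitted to keep the sketch short.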
Test Networks • 1. Proof of concept: 1,000 node network • Created by combination of web crawling and AltaVista queries, centered on a professor at M.I.T. • Test group of users could usually find experts on given topics • but small size of network led to distant referrals • 2. 10,000 Researchers in AI, Theory, and NL • Based on 30,000 bibliography entries from high-quality conferences • AAAI, STOC, FOCS, ACL... • links between co-authors (not citations) • http://www.research.att.com/kautz/referralweb • “paper-reviewer finder”
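Building a network from bibliography entries by linking co-authors can be sketched as below. The entry format (a list of author names per paper) and the placeholder author names are assumptions for illustration; real entries would first need the name extraction and canonicalization steps.

```python
from collections import defaultdict
from itertools import combinations

def coauthor_graph(entries):
    """Link every pair of authors who share at least one paper (no citation links)."""
    graph = defaultdict(set)
    for authors in entries:
        unique = sorted(set(authors))
        for a in unique:
            graph.setdefault(a, set())      # single-author papers still create a node
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def average_degree(graph):
    """Mean number of distinct co-authors per person."""
    return sum(len(nbrs) for nbrs in graph.values()) / len(graph) if graph else 0.0

# Placeholder bibliography: each entry is the author list of one paper.
entries = [["A", "B", "C"], ["C", "D"], ["E"]]
graph = coauthor_graph(entries)
print(sorted(graph["C"]), average_degree(graph))
```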
Observations • Quickly found short chains to experts • Could not be found using IR search alone • User can select chain that is most likely to succeed • Do not want to bother busiest, most famous experts with every request • Chains cross disciplines • Kautz - AI • Kearns - AI, Machine Learning • Blum - Machine Learning, Theory • Frieze - Theory, Mathematics • Useful tool for strengthening ties both within and between communities
Why Does it Work? • The Small World Phenomenon • Milgram (1967) - any two individuals in the U.S.A. are linked by a chain of 6 or fewer first-name acquaintances • “6 degrees of separation” • Erdős numbers • “6 degrees of Kevin Bacon” • But • No formal model to explain short paths! • Due to high average degree? • True for acquaintances or co-stars, but false for our computer science co-author database! • Average degree: 100’s (acquaintances) versus 61 (co-stars) versus 4.28 (co-authors)!
Small-world Networks • Due to randomness? • Random graphs have short average path lengths • But social networks are not random • nodes are highly clustered (many cliques) • random graph model predicts that high clustering corresponds to long average paths! • Better model: Small-world networks • Idea: a highly structured (clustered) network with just a few random links (Watts & Strogatz, 1998) • Result: high clustering + short paths! • Random edges correspond to shortcuts • direct relationships between people who primarily participate in different sub-communities
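The effect of a few random shortcuts can be checked numerically. The sketch below is a simplified variant of the Watts & Strogatz construction (a ring lattice whose edges are each rewired with probability p), written in plain Python; the sizes n = 200, k = 4, p = 0.05 and the seed are arbitrary choices for illustration, not values from the talk.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Each node is linked to its k nearest neighbours on a ring (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

def rewire(adj, p, rng):
    """With probability p, replace each lattice edge (i, j) by a random shortcut (i, m)."""
    n = len(adj)
    new = {i: set(nbrs) for i, nbrs in adj.items()}
    for i in range(n):
        for j in sorted(adj[i]):
            if i < j and rng.random() < p:      # visit each undirected edge once
                choices = [m for m in range(n) if m != i and m not in new[i]]
                if not choices:
                    continue
                m = rng.choice(choices)
                new[i].discard(j)
                new[j].discard(i)
                new[i].add(m)
                new[m].add(i)
    return new

def clustering_coefficient(adj):
    """Average over nodes of (links among a node's neighbours) / (possible links)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue                            # such nodes contribute 0
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)

def average_path_length(adj):
    """Mean BFS distance over all reachable ordered pairs of distinct nodes."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

rng = random.Random(0)
lattice = ring_lattice(200, 4)
small_world = rewire(lattice, 0.05, rng)
for name, g in (("lattice", lattice), ("small world", small_world)):
    print(name, round(clustering_coefficient(g), 3), round(average_path_length(g), 2))
```

With these settings the rewired graph should keep most of the lattice's clustering while its average path length drops sharply, which is the qualitative behavior the slide describes.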
Small-world vs. Random Networks • Clustering Coefficient = Average value of C(n) over all nodes, where
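The definition of C(n) did not survive extraction from the slide. For reference, the standard definition from Watts & Strogatz (1998) is given below; that the slide used exactly this form is an assumption. Here N(n) is the set of neighbors of node n, k_n = |N(n)| is its degree, E is the edge set, and V is the node set (symbols introduced here, not taken from the slide).

```latex
C(n) = \frac{\bigl|\{\, \{u, v\} \in E : u, v \in N(n) \,\}\bigr|}{k_n (k_n - 1) / 2},
\qquad
C = \frac{1}{|V|} \sum_{n \in V} C(n)
```

In words, C(n) is the fraction of possible links among the neighbors of n that actually exist, so C(n) = 1 when the neighbors of n form a clique.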
Corporate Communities • Finding good internal experts a strategic business problem • “intellectual assets” worthless if not consulted! • AT&T: 170,000 employees, 3,000 in the R&D community • How to build a project team? • What R&D people to consult for a new business venture? • What business people to contact about a new technological breakthrough? • In practice: successful projects based on grassroots cross-organizational networking
Modeling the AT&T Corporate Network • Model integrates information from • Official organizational charts (online) • Personal web pages (+ crawling) • External publication databases • Internal technical document databases • Informal structure will prove vital for • finding shorter paths to experts • finding people who can reliably evaluate experts • synergy between official and unofficial channels
Who can tell me about the Director of Speech Processing research at AT&T?
Observations • Official company hierarchy only a sparse subset of the corporate social network • Shortest (and often best) paths involve a combination of official and unofficial links • Conditions for trust and evaluation may greatly differ • Global social network is the union of many different kinds of sub-networks • Search greatly aided when user can choose different views of the network • types of edge • strength of edge
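Choosing a view by edge type and strength can be sketched as filtering a typed, weighted edge list before searching. Everything in this example is invented for illustration: the people, the edge type labels, the strength scale, and the function names.

```python
from collections import deque

# (person, person, edge type, strength in [0, 1]); hypothetical sample data.
EDGES = [
    ("Ann", "Bo", "org_chart", 1.0),
    ("Bo", "Cy", "org_chart", 1.0),
    ("Cy", "Di", "org_chart", 1.0),
    ("Ann", "Di", "coauthor", 0.6),
    ("Ann", "Cy", "web_page", 0.2),
]

def view(edges, types=None, min_strength=0.0):
    """Build an adjacency map containing only the selected kinds of edge."""
    adj = {}
    for a, b, kind, strength in edges:
        if (types is None or kind in types) and strength >= min_strength:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj

def chain_length(adj, source, target):
    """Number of links on a shortest chain, or None if target is unreachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None

official_only = chain_length(view(EDGES, types={"org_chart"}), "Ann", "Di")
all_links = chain_length(view(EDGES), "Ann", "Di")
print(official_only, all_links)
```

In this toy data the official hierarchy alone gives a three-link chain, while adding the unofficial co-author link shortens it to one, mirroring the observation that the best paths mix official and unofficial links.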
Who can help out my project with some great image compression software?
A Note on Believability • Observation: the recommendations made by (any) recommender system tend to be either astonishingly accurate, or absolutely ridiculous • true for any AI-complete problem • How can a recommender system be trusted enough for “serious” use? • Make system transparent: able to explain its reasoning • indicate to user where the data is ambiguous • Any link or node can be explained by viewing the data on which it is based
Summary • Many uses of recommender systems require connecting people to people, not just providing “oracular” advice • Find people, not just documents - access to information that may not even be online! • Help users evaluate quality of information • Need to automatically model existing, real-world communities • Cannot require everyone to sign up in advance! • Can improve and strengthen the “weak ties” that are crucial for effective organizations • ReferralWeb: a tool for generating and searching social networks
Status and Future Work • ReferralWeb • Version 2.0 for the Computer Science research community http://www.research.att.com/~kautz/referralweb • Corporate version undergoing trials in AT&T Labs • Current research topics • Automatic clustering - discovery of sub-communities • Combining uncertain information • Scale-up to WWW-size communities • Analysis of more accurate formal models of small-world networks • accurately predict search performance
Bibliography • Kautz, H., Selman, B. & Shah, M. 1997. The Hidden Web. AI Magazine 18(2): 27-36. • Milgram, S. 1967. The Small-World Problem. Psychology Today 1(1): 60-76. • Resnick, P., ed. 1997. Special Section on Recommender Systems. Communications of the ACM 40(3). • Watts, D. & Strogatz, S. 1998. Collective dynamics of ‘small-world’ networks. Nature 393: 440-442.