Topic and Role Discovery in Social Networks
Andrew McCallum, Andre Corrada-Emmanuel, Xuerui Wang
Computer Science Department, University of Massachusetts Amherst
Also including joint work with Natasha Mohanty
The #1 computer application: Email.
Managing and Understanding Connections of People in our Email World
Workplace effectiveness ~ Ability to leverage network of acquaintances
But filling Contacts DB by hand is tedious, and incomplete.
Diagram: Contacts DB filled automatically from Email Inbox and WWW
System Overview
Diagram components: Email, WWW, CRF, names, Person Name Extraction, Name Coreference, Homepage Retrieval, Contact Info and Person Name Extraction, Keyword Extraction, Social Network Analysis
An Example To: “Andrew McCallum” mccallum@cs.umass.edu Subject ... Search for new people
Example keywords extracted
Summary of Results: Contact info and name extraction performance (25 fields)
Expert Finding: When solving some task, find friends-of-friends with relevant expertise. Avoid “stove-piping” in large orgs by automatically suggesting collaborators. Given a task, automatically suggest the right team for the job. (Hiring aid!)
Social Network Analysis: Understand the social structure of your organization. Suggest structural changes for improved efficiency.
Outline
• Email, motivation
• ART Graphical Model
• Experimental Results
• Enron Email (corpus)
• Academic Email (one person)
• RART: Roles for ART
• Group-Topic Model
• Experiments on voting data
• Voting data from U.S. Senate and the U.N.
Clustering words into topics with Latent Dirichlet Allocation [Blei, Ng, Jordan 2003]
Generative Process:
• For each document: sample a distribution over topics
• For each word in doc: sample a topic, z, then sample a word from the topic, w
Example: document distribution 70% Iraq war, 30% US election; sampled topic “Iraq war”; sampled word “bombing”
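The generative process above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the six-word vocabulary, the two topic-word distributions, the symmetric Dirichlet prior, and the function name `generate_document` are all assumptions made up for the example.

```python
import numpy as np

def generate_document(topic_prior, topic_word, vocab, n_words, rng):
    """Sample one document following the LDA generative story."""
    # Sample this document's distribution over topics from a Dirichlet prior.
    theta = rng.dirichlet(topic_prior)
    topics, words = [], []
    for _ in range(n_words):
        z = int(rng.choice(len(theta), p=theta))          # sample a topic, z
        w = int(rng.choice(len(vocab), p=topic_word[z]))  # sample a word from topic z
        topics.append(z)
        words.append(vocab[w])
    return theta, topics, words

# Hypothetical toy vocabulary and topics (assumed for illustration only).
vocab = ["bombing", "troops", "baghdad", "ballot", "campaign", "voters"]
topic_word = np.array([
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # topic 0: "Iraq war"
    [0.0, 0.0, 0.0, 0.3, 0.4, 0.3],  # topic 1: "US election"
])
rng = np.random.default_rng(0)
theta, topics, words = generate_document(np.ones(2), topic_word, vocab, 8, rng)
print(theta, list(zip(topics, words)))
```

Because the two toy topics have disjoint vocabularies, every sampled word can be traced back to the topic that produced it, which makes the two-stage sampling easy to inspect.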
Example topics induced from a large collection of text [Tenenbaum et al]
• JOB WORK JOBS CAREER EXPERIENCE EMPLOYMENT OPPORTUNITIES WORKING TRAINING SKILLS CAREERS POSITIONS FIND POSITION FIELD OCCUPATIONS REQUIRE OPPORTUNITY EARN ABLE
• SCIENCE STUDY SCIENTISTS SCIENTIFIC KNOWLEDGE WORK RESEARCH CHEMISTRY TECHNOLOGY MANY MATHEMATICS BIOLOGY FIELD PHYSICS LABORATORY STUDIES WORLD SCIENTIST STUDYING SCIENCES
• BALL GAME TEAM FOOTBALL BASEBALL PLAYERS PLAY FIELD PLAYER BASKETBALL COACH PLAYED PLAYING HIT TENNIS TEAMS GAMES SPORTS BAT TERRY
• FIELD MAGNETIC MAGNET WIRE NEEDLE CURRENT COIL POLES IRON COMPASS LINES CORE ELECTRIC DIRECTION FORCE MAGNETS BE MAGNETISM POLE INDUCED
• STORY STORIES TELL CHARACTER CHARACTERS AUTHOR READ TOLD SETTING TALES PLOT TELLING SHORT FICTION ACTION TRUE EVENTS TELLS TALE NOVEL
• MIND WORLD DREAM DREAMS THOUGHT IMAGINATION MOMENT THOUGHTS OWN REAL LIFE IMAGINE SENSE CONSCIOUSNESS STRANGE FEELING WHOLE BEING MIGHT HOPE
• DISEASE BACTERIA DISEASES GERMS FEVER CAUSE CAUSED SPREAD VIRUSES INFECTION VIRUS MICROORGANISMS PERSON INFECTIOUS COMMON CAUSING SMALLPOX BODY INFECTIONS CERTAIN
• WATER FISH SEA SWIM SWIMMING POOL LIKE SHELL SHARK TANK SHELLS SHARKS DIVING DOLPHINS SWAM LONG SEAL DIVE DOLPHIN UNDERWATER
Inference and Estimation
• Gibbs Sampling:
• Easy to implement
• Reasonably fast
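To give a sense of why Gibbs sampling is easy to implement, here is a minimal sketch of a collapsed Gibbs sampler for plain LDA. It is an illustration under stated assumptions, not the sampler used in the talk: the slides do not give the update equations for ART, so this shows the standard LDA update only, and the hyperparameters `alpha` and `beta`, the function name `lda_gibbs`, and the toy corpus are all made up for the example.

```python
import numpy as np

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iters=100, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of integer word ids."""
    rng = np.random.default_rng(seed)
    doc_topic = np.zeros((len(docs), n_topics))     # tokens in doc d assigned to topic k
    topic_word = np.zeros((n_topics, vocab_size))   # times word w is assigned to topic k
    topic_total = np.zeros(n_topics)                # total tokens assigned to topic k
    assignments = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = rng.integers(n_topics, size=len(doc))
        assignments.append(z_d)
        for w, z in zip(doc, z_d):
            doc_topic[d, z] += 1
            topic_word[z, w] += 1
            topic_total[z] += 1
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d, z] -= 1
                topic_word[z, w] -= 1
                topic_total[z] -= 1
                # Conditional distribution over topics given all other assignments.
                p = (doc_topic[d] + alpha) * (topic_word[:, w] + beta) \
                    / (topic_total + vocab_size * beta)
                z = int(rng.choice(n_topics, p=p / p.sum()))
                # Record the new assignment.
                assignments[d][i] = z
                doc_topic[d, z] += 1
                topic_word[z, w] += 1
                topic_total[z] += 1
    return assignments, doc_topic, topic_word

# Hypothetical toy corpus: word ids 0-2 and 3-5 tend to co-occur.
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 1], [5, 4, 3, 3]]
assignments, doc_topic, topic_word = lda_gibbs(docs, n_topics=2, vocab_size=6)
print(doc_topic)
```

The whole sampler is three count tables and one resampling step per token, which is the sense in which it is simple to implement; the count tables are also what one would normalize to estimate per-document topic proportions and per-topic word distributions.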
Enron Email Corpus
• 250k email messages
• 23k people
Date: Wed, 11 Apr 2001 06:56:00 -0700 (PDT)
From: debra.perlingiere@enron.com
To: steve.hooser@enron.com
Subject: Enron/TransAltaContract dated Jan 1, 2001
Please see below. Katalin Kiss of TransAlta has requested an electronic copy of our final draft? Are you OK with this? If so, the only version I have is the original draft without revisions.
DP
Debra Perlingiere
Enron North America Corp.
Legal Department
1400 Smith Street, EB 3885
Houston, Texas 77002
dperlin@enron.com
Topics, and prominent senders / receivers discovered by ART
(Topic names assigned by hand)
Topics, and prominent senders / receivers discovered by ART
• Beck = “Chief Operations Officer”
• Dasovich = “Government Relations Executive”
• Shapiro = “Vice President of Regulatory Affairs”
• Steffes = “Vice President of Government Affairs”
Comparing Role Discovery Traditional SNA ART Author-Topic connection strength (A,B) = distribution over recipients distribution over authored topics distribution over authored topics
Comparing Role Discovery: Tracy Geaconne and Dan McCarty
Traditional SNA ART Author-Topic Different roles Different roles Similar roles
Geaconne = “Secretary”; McCarty = “Vice President”
Comparing Role Discovery: Tracy Geaconne and Rod Hayslett
Traditional SNA ART Author-Topic Not very similar Very similar Different roles
Geaconne = “Secretary”; Hayslett = “Vice President & CTO”
Comparing Role Discovery: Lynn Blair and Kimberly Watson
Traditional SNA ART Author-Topic Very similar Very different Different roles
Blair = “Gas pipeline logistics”; Watson = “Pipeline facilities planning”
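Each comparison above comes down to asking how similar two people's distributions are (over recipients, or over topics). The slides do not say which similarity measure was used; the sketch below assumes Jensen-Shannon divergence, a common symmetric choice, and the two example topic distributions are invented for illustration.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    Returns 0.0 for identical distributions and 1.0 for distributions with
    disjoint support. The choice of this measure is an assumption.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # Only sum over entries where a > 0; b = m is positive there.
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical per-person topic distributions (made up for the example).
person_a = [0.70, 0.20, 0.10]
person_b = [0.10, 0.20, 0.70]
print(js_divergence(person_a, person_b))
```

A low divergence would be read as "similar roles" and a high one as "different roles"; the same function applies whether the distributions are over recipients or over topics.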
McCallum Email Corpus 2004
• January - October 2004
• 23k email messages
• 825 people
From: kate@cs.umass.edu
Subject: NIPS and ....
Date: June 14, 2004 2:27:41 PM EDT
To: mccallum@cs.umass.edu
There is pertinent stuff on the first yellow folder that is completed either travel or other things, so please sign that first folder anyway. Then, here is the reminder of the things I'm still waiting for: NIPS registration receipt. CALO registration receipt. Thanks, Kate
Results with RART: People in “Role #3” in Academic Email
• olc: lead Linux sysadmin
• gauthier: sysadmin for CIIR group
• irsystem: mailing list for CIIR sysadmins
• system: mailing list for dept. sysadmins
• allan: Prof., chair of “computing committee”
• valerie: second Linux sysadmin
• tech: mailing list for dept. hardware
• steve: head of dept. I.T. support
Roles for allan (James Allan)
• Role #3: I.T. support
• Role #2: Natural Language researcher
Roles for pereira (Fernando Pereira)
• Role #2: Natural Language researcher
• Role #4: SRI CALO project participant
• Role #6: Grant proposal writer
• Role #10: Grant proposal coordinator
• Role #8: Guests at McCallum’s house
ART & RART: Roles but not Groups Traditional SNA ART Author-Topic Not Not Block structured Enron TransWestern Division
Group-Topic Model [Wang, Mohanty, McCallum 2005]
U.S. Senate Data Set
• 3426 bills from 16 years of voting records from the U.S. Senate
• Yea / Nay / Abstain (absent)
• Each bill comes with an abstract (text describing the contents of the bill).
Topics Discovered: Traditional “Mixture of Unigrams” vs. Group-Topic Model
Groups Discovered: groups from topic Education + Domestic
Agreement Index
Senators Who Change Coalition Dependent on Topic
e.g. Senator Shelby (D-AL) votes
• with the Republicans on Economic
• with the Democrats on Education + Domestic
• with a small group of maverick Republicans on Social Security + Medicaid
U.N. Data Set • 931 U.N. Resolutions, voted on by 192 countries, from 1990-2003. • Yes / No / Abstain votes • List of keywords summarizes the content of the resolution. • Also experiments later with resolutions from 1960-2003
Topics Discovered: Traditional “Mixture of Unigrams” vs. Group-Topic Model
Summary
• Traditionally, SNA examines links, but not the language content on those links.
• Presented ART, a Bayesian network for messages sent in a social network: captures topics and role-similarity.
• RART explicitly represents roles.
• Additional work: Group-Topic model discovers groups and clusters attributes of relations. [Wang, Mohanty, McCallum, LinkKDD 2005]
Outline
• Assume you already understand Graphical Models & CRFs.
• Intro to the importance of joint inference
• Review of previous examples
• Joint segmentation & coreference for citations
• Inference: Sparse Belief Propagation
• Learning: Piecewise Training
• Social Network Analysis in Email
• Author-Recipient-Topic Model
• Enron and Academic Email
• Group-Topic Model
• Voting data from U.S. Senate and the U.N.
• Demo of New Research Paper Search Engine
Previous Systems
Diagram labels: Research Paper, Cites
More Entities and Relations
Diagram labels: Research Paper, Cites, Person, Expertise, Grant, Venue, University, Groups