Use graph walks to derive extended similarity measures between email and meeting entries, allowing for retrieval of relevant messages, attendees, and email aliases from a joint graph representation. Preliminary results are promising.
An Email and Meeting Assistant Using Graph Walks • Einat Minkov, William W. Cohen • CEAS-2006
Documents and Links • PageRank (Brin and Page, 98), HITS (Kleinberg, 98) • Co-training (Blum and Mitchell) • Documents are not isolated objects: they are connected to other documents via hyperlinks • Document similarity/relatedness via random graph walk
Structured Documents • In structured data, documents are inter-connected via other common objects. • Email and meeting entries are examples of structured data: text + meta-data • Framework: • Represent email and meetings as a joint graph • Derive extended similarity measures between graph objects using lazy graph walks. • Questions we can ask: • Show me recent messages relevant to this message • What is the full name of ‘Danny’, who is mentioned in this message?
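Email as a Graph [Figure: an example email file (file1) drawn as a graph. Nodes shown: Chris.germany@enron.com, Chris, Mgermany@ch2m.com, Melissa Germany, the date 1.22.00, and the terms work, where, yo, I’m, you. Edge labels shown: alias, sent_from, sent_from_email, sent_to, sent_to_email, On_date, has_subj_term, has_term.]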
Email as a Graph • A directed graph • A node carries an entity type • An edge carries a relation type • Edges are bi-directional: every edge has an inverse, so the graph is cyclic • Nodes inter-connect via linked entities.
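The properties above can be sketched as a small data structure. This is only an illustration, not the authors' implementation: the class name `TypedGraph`, the `_inv` suffix for inverse edges, and the node names (`msg1`, `Alice`, `alice@example.com`) are all assumptions made for the example. The edge labels `sent_from`, `sent_from_email`, and `alias` are taken from the example slide.

```python
from collections import defaultdict


class TypedGraph:
    """Directed graph with typed nodes and labeled edges (illustrative sketch)."""

    def __init__(self):
        self.node_type = {}  # node -> entity type
        # node -> edge label -> set of neighbor nodes
        self.out_edges = defaultdict(lambda: defaultdict(set))

    def add_node(self, node, entity_type):
        self.node_type[node] = entity_type

    def add_edge(self, src, label, dst):
        # Store the edge and its inverse so every link can be walked both ways.
        # The "_inv" naming is an assumption, not taken from the slides.
        self.out_edges[src][label].add(dst)
        self.out_edges[dst][label + "_inv"].add(src)


# Hypothetical nodes; only the edge labels come from the slides.
g = TypedGraph()
g.add_node("msg1", "file")
g.add_node("Alice", "person")
g.add_node("alice@example.com", "email-address")
g.add_edge("msg1", "sent_from", "Alice")
g.add_edge("msg1", "sent_from_email", "alice@example.com")
g.add_edge("Alice", "alias", "alice@example.com")
```

Because each call to `add_edge` also records the inverse edge, a walk starting at a person node can reach the files that person sent, and from there the other entities in those files.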
Meetings • Like Email messages, Meeting entries are structured. • Share entities with Email: TIME, TEXT, PERSONS • Email and meetings can be naturally represented as a joint graph.
The Joint Graph [Figure: the joint email and meeting graph, with regions labeled Shared content, Social network, and Timeline.]
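Edge Weights • Graph G: - nodes x, y, z - node types T(x), T(y), T(z) - edge labels - parameters • Edge weight x → y: the probability of moving from x to y, given by a two-step probability distribution: a. Pick an outgoing edge label b. Pick node y uniformly among the nodes reached from x via that label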
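The two-step choice can be sketched as follows. The slides do not preserve the exact formula, so this is a minimal sketch under stated assumptions: each edge label has one weight (called `theta` here, a hypothetical name), a label is picked with probability proportional to its weight among the labels actually leaving x, and the function name `transition_probs` and the toy nodes are invented for illustration.

```python
def transition_probs(out_edges, theta, x):
    """Return {y: P(x -> y)} for one node x.

    out_edges: {node: {label: set of neighbors}}
    theta:     {label: weight}  (assumed per-label parameters)
    """
    labels = out_edges[x]
    total = sum(theta[label] for label in labels)
    probs = {}
    for label, neighbors in labels.items():
        p_label = theta[label] / total  # step a: pick an outgoing edge label
        for y in neighbors:             # step b: pick y uniformly under that label
            probs[y] = probs.get(y, 0.0) + p_label / len(neighbors)
    return probs


# Hypothetical toy node with two outgoing labels and uniform label weights.
out_edges = {"msg1": {"has_term": {"work", "meeting"}, "sent_to": {"Alice"}}}
theta = {"has_term": 1.0, "sent_to": 1.0}
p = transition_probs(out_edges, theta, "msg1")
```

With uniform label weights, the single `sent_to` neighbor receives as much probability as all `has_term` neighbors combined, so rare relation types are not drowned out by frequent ones.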
Graph Similarity • Defined by lazy graph walks over k steps. • Given: a stay probability (larger values favor shorter paths), a transition matrix, and an initial node distribution • Output: a node distribution • We use this platform to perform SEARCH of related items in the graph: a query is an initial distribution Vq over nodes and a desired output type Tout
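A lazy walk and the search built on it can be sketched as below. The update rule is one common formulation and an assumption here, since the slides do not preserve the formula: at each step a node keeps a fraction `gamma` of its probability mass and spreads the rest along its outgoing edges. The names `lazy_walk`, `search`, `gamma`, and the toy graph are hypothetical.

```python
def lazy_walk(M, v0, gamma, k):
    """Run k lazy steps. M: {x: {y: P(x -> y)}}, v0: {node: probability}."""
    v = dict(v0)
    for _ in range(k):
        nxt = {}
        for x, mass in v.items():
            nxt[x] = nxt.get(x, 0.0) + gamma * mass  # stay at x
            for y, p in M.get(x, {}).items():        # or move to a neighbor
                nxt[y] = nxt.get(y, 0.0) + (1.0 - gamma) * mass * p
        v = nxt
    return v


def search(M, node_type, v_q, t_out, gamma=0.5, k=3):
    """Rank nodes of type t_out by their probability after the walk."""
    v = lazy_walk(M, v_q, gamma, k)
    hits = [(n, p) for n, p in v.items() if node_type.get(n) == t_out]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Toy graph: a meeting linked via terms and files to two addresses.
M = {
    "m": {"t1": 0.5, "t2": 0.5},
    "t1": {"m": 0.5, "f1": 0.5},
    "t2": {"m": 1 / 3, "f1": 1 / 3, "f2": 1 / 3},
    "f1": {"t1": 1 / 3, "t2": 1 / 3, "a@example.com": 1 / 3},
    "f2": {"t2": 0.5, "b@example.com": 0.5},
    "a@example.com": {"f1": 1.0},
    "b@example.com": {"f2": 1.0},
}
node_type = {"m": "meeting", "t1": "term", "t2": "term", "f1": "file",
             "f2": "file", "a@example.com": "email-address",
             "b@example.com": "email-address"}
ranked = search(M, node_type, {"m": 1.0}, "email-address")
```

In this toy graph `a@example.com` is reachable from the meeting through two term paths and `b@example.com` through one, so the first address ranks higher. The sketch assumes every node has at least one outgoing edge, which holds when all edges have inverses.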
Evaluation • Many tasks/applications can be phrased as search queries in this framework. • TASK I: Find Meeting Attendees. Given: a meeting (text & date). Retrieve: a ranked list of relevant email-addresses (potential attendees) • TASK II: Find Email Aliases. Given: a person’s name. Retrieve: a ranked list of his/her email-addresses
Methods • Baseline: string matching. Uses a distance metric (Jaro-Winkler) to find email-addresses similar to the personal/project names mentioned. • Graph walk: 3 steps, uniform weights. Corpus • 346 email files (‘Meetings’ folder) • 334 meeting entries (‘Palm’) • Both over the same time span (about 6 months) • The joint graph includes 3,680 nodes
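The string-matching baseline can be illustrated with a plain implementation of the standard Jaro-Winkler similarity. How the authors applied the metric is not stated on the slide, so the helper `rank_addresses`, which compares a name to the part of each address before the `@`, is a hypothetical choice, as are the example names and addresses.

```python
def jaro(s1, s2):
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        # Look for an unmatched equal character within the match window.
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3.0


def jaro_winkler(s1, s2, prefix_scale=0.1, max_prefix=4):
    """Jaro similarity boosted for a shared prefix of up to max_prefix chars."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1.0 - score)


def rank_addresses(name, addresses):
    """Hypothetical helper: rank addresses by similarity of local part to name."""
    key = name.lower().replace(" ", "")
    scored = [(jaro_winkler(key, addr.split("@")[0].lower()), addr)
              for addr in addresses]
    return [addr for _, addr in sorted(scored, reverse=True)]


best = rank_addresses("Alice Smith",
                      ["bob.jones@example.com", "alice.smith@example.com"])
```

A baseline of this kind can only find addresses that resemble the name as a string, which is the limitation the graph walk is meant to address.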
Results: Find Meeting Attendees • 11-point precision-recall curves, averaged over 13 examples: A. All email addresses B. One address per person [Figure: precision-recall plots, with a walk-path diagram over the node types meeting, term, date, file, e-address.]
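An 11-point precision-recall curve can be computed as sketched below. The slide does not say which variant was used; this sketch assumes the standard interpolated form, where precision at each recall level 0.0, 0.1, ..., 1.0 is the best precision achieved at that recall or higher. The function names and the toy ranking are hypothetical.

```python
def eleven_point_precision(ranked, relevant):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    relevant = set(relevant)
    points = []  # (recall, precision) measured at each relevant hit
    hits = 0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    curve = []
    for i in range(11):
        level = i / 10
        # Best precision at any recall >= level (small tolerance for floats).
        candidates = [p for r, p in points if r >= level - 1e-12]
        curve.append(max(candidates, default=0.0))
    return curve


def average_curves(curves):
    """Average several 11-point curves level by level."""
    return [sum(col) / len(col) for col in zip(*curves)]


# Toy ranking: relevant items "a" and "b" retrieved at ranks 1 and 3.
curve = eleven_point_precision(["a", "x", "b"], {"a", "b"})
mean_curve = average_curves([curve, curve])
```

Averaging the per-query curves level by level gives a single curve per method, which is the kind of summary the slide reports over its 13 examples.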
Results: Find Email Aliases • 14 examples (2 to 5 email aliases each): A. By first name B. By full name [Figure: result plots, with walk-path diagrams over the node types term, file, person, e-address.]
Summary • A joint representation of email and meetings: • Denser links • Augments social network information • Supports meeting management applications • Preliminary results are promising. • Application of learning, and more results for email-related tasks, are available in: “Contextual Search and Name Disambiguation in Email Using Graphs”, Einat Minkov, William W. Cohen, Andrew Y. Ng, in SIGIR 2006