
Expert Search

Explore the implementation and research goals of an expert search system utilizing the W3C corpus for richer candidate representation and email discussion list modeling. Discover innovative approaches for forming associations and conducting evaluations.

Presentation Transcript


  1. Expert Search Project group 3

  2. Introduction • Expert Search Task • Enterprise Track at TREC • W3C corpus (about 300,000 documents) • Goals • Implement an expert search system • Conduct research • Richer candidate representation • Effect of e-mail discussion lists

  3. Modeling expert search • Approach • Build language model for each document • Find a number of relevant documents (Topicality) • Different levels of associations • Motivation • How a user normally searches for an expert • Google search • Find relevant documents • Check names in documents
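The approach on this slide (score documents with a language model, keep the most topical ones, then credit the people associated with them) can be sketched as follows. This is a minimal illustration in the spirit of document-based expert finding, not the group's actual system: the function name `score_candidates`, the smoothing weight `lam`, the cut-off `top_k`, and the shape of the `associations` mapping are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def score_candidates(query, docs, associations, lam=0.5, top_k=2):
    """Rank candidates by summing, over the top_k most topical documents,
    the smoothed query likelihood times the document-candidate weight.

    docs:          {doc_id: text}
    associations:  {doc_id: {candidate: weight}}  (hypothetical structure)
    """
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    collection = Counter(t for toks in tokenized.values() for t in toks)
    total = sum(collection.values())
    q_terms = query.lower().split()

    def doc_likelihood(toks):
        # Query likelihood with linear interpolation between the document
        # model and the collection model (assumed smoothing scheme).
        counts = Counter(toks)
        p = 1.0
        for t in q_terms:
            p_doc = counts[t] / len(toks) if toks else 0.0
            p_col = collection[t] / total if total else 0.0
            p *= (1 - lam) * p_doc + lam * p_col
        return p

    likelihood = {d: doc_likelihood(toks) for d, toks in tokenized.items()}
    # Topicality step: keep only the top_k most relevant documents.
    top_docs = sorted(likelihood, key=likelihood.get, reverse=True)[:top_k]

    scores = defaultdict(float)
    for d in top_docs:
        for cand, weight in associations.get(d, {}).items():
            scores[cand] += likelihood[d] * weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


docs = {
    "d1": "semantic web ontology rdf",
    "d2": "css layout style sheets",
    "d3": "rdf semantic web query",
}
associations = {
    "d1": {"alice": 1.0},
    "d2": {"bob": 1.0},
    "d3": {"alice": 0.5, "carol": 0.5},
}
ranking = score_candidates("semantic web", docs, associations)
```

Here `alice` ranks first because she is associated with both topical documents, while `bob` receives no score because his only document falls outside the top-k cut.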

  4. Initial search system • Minimal system • 2005 TREC entry • Formal models paper • Model 2

  5. Initial associations • Forming associations: M0..M3 • M0: EXACT_MATCH • M1: NAME_MATCH • M2: LASTNAME_MATCH • M3: E-MAIL_MATCH • Example text: "Expert search is part of the larger field of enterprise search (David Hawking, 2004). According to D. Hawking, enterprise search includes searching through all possible sources of textual information within an organization ... Hawking performed experiments which show that anchor text proved to be a good starting point for enterprise search. For more information e-mail to: Hawking@wc3.org..."
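The four association levels can be illustrated with a small matcher. The slide only names the levels, so the exact rules below are assumptions: M0 fires on the full name as written, M1 on the last name preceded by the first name or its initial, M2 on the last name alone, and M3 on the candidate's e-mail address. The function name `match_levels` is hypothetical.

```python
import re


def match_levels(text, first, last, email):
    """Return the set of association labels (M0..M3) that fire for one
    candidate in one document. Matching rules are assumed, not taken
    from the slides."""
    low = text.lower()
    f, l = re.escape(first.lower()), re.escape(last.lower())
    initial = re.escape(first.lower()[0])
    found = set()
    # M0 EXACT_MATCH: full name exactly as written.
    if re.search(rf"\b{f} {l}\b", low):
        found.add("M0")
    # M1 NAME_MATCH: last name preceded by the first name or its initial.
    if re.search(rf"\b(?:{f}|{initial}\.?)\s+{l}\b", low):
        found.add("M1")
    # M2 LASTNAME_MATCH: last name anywhere in the text.
    if re.search(rf"\b{l}\b", low):
        found.add("M2")
    # M3 E-MAIL_MATCH: the candidate's address appears verbatim.
    if email.lower() in low:
        found.add("M3")
    return found


levels = match_levels(
    "According to D. Hawking, anchor text helps.",
    "David", "Hawking", "hawking@wc3.org",
)
```

Under these rules the abbreviated mention "D. Hawking" produces the weaker associations M1 and M2 but not M0, which is the kind of distinction the different levels are meant to capture.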

  6. Added associations • New associations (M4..M7) • M4: FIRST_LAST_MATCH • M5: MAIL_HEADER_FROM • M6: MAIL_HEADER_TO • M7: MAIL_HEADER_CC • Example e-mail:
     To: Dan Connolly
     Cc: Maria Fernandez
     From: David Hawking
     Subject: Conferences
     Date: 21 Oct 2003 12:07
     I’d like to send Smith to ADC2004. She’s entitled under section whatever on p.27 of the corporate manual. Jones wants to go but she already went on that junket to Maui.
     David L. Hawking
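The header-based associations M5..M7 can be sketched with Python's standard `email` package. This is only an illustration of the idea: the mapping of headers to labels follows the slide, but the function name `header_associations` and the `example.org` addresses are made up for the example.

```python
from email import message_from_string
from email.utils import getaddresses

# Header-to-label mapping as named on the slide.
HEADER_LABELS = {"From": "M5", "To": "M6", "Cc": "M7"}


def header_associations(raw_message):
    """Return (label, person) pairs for every address found in the
    From, To and Cc headers of one raw e-mail message."""
    msg = message_from_string(raw_message)
    pairs = []
    for header, label in HEADER_LABELS.items():
        for name, addr in getaddresses(msg.get_all(header, [])):
            # Prefer the display name; fall back to the bare address.
            pairs.append((label, name or addr))
    return pairs


raw = (
    "To: Dan Connolly <dan@example.org>\n"       # hypothetical address
    "Cc: Maria Fernandez <maria@example.org>\n"  # hypothetical address
    "From: David Hawking <david@example.org>\n"  # hypothetical address
    "Subject: Conferences\n"
    "\n"
    "I'd like to send Smith to ADC2004.\n"
)
pairs = header_associations(raw)
```

Keeping the label with each pair lets a later scoring step weight senders, recipients, and copied recipients differently.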

  7. System additions • User Interface (GUI) • TREC evaluation framework • Special treatment of discussion lists • Cleaning the W3C Corpus • Signature detection • Quotation detection • Forming associations ( M4 .. M7 ) • Richer candidate representation • Detect multiple names and e-mail addresses • Entry page detection • Statistics on e-mail usage • Signatures
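Signature and quotation detection can be approximated with two common mailing-list conventions: quoted lines start with ">" and a signature follows a line containing only "--". The slides do not say how the group detected either, so this is a simple heuristic sketch under those assumptions; `clean_email_body` is a hypothetical name.

```python
def clean_email_body(body):
    """Split an e-mail body into (new_text, signature).

    Assumed heuristics: lines starting with '>' are quotations and are
    dropped; everything after a line that is just '--' is the signature.
    """
    lines = body.splitlines()
    # Quotation detection: drop lines quoted from earlier messages.
    kept = [ln for ln in lines if not ln.lstrip().startswith(">")]

    signature = []
    for i, ln in enumerate(kept):
        # Signature detection: conventional '-- ' delimiter line.
        if ln.rstrip() == "--":
            signature = kept[i + 1:]
            kept = kept[:i]
            break
    return "\n".join(kept).strip(), "\n".join(signature).strip()


body = (
    "I agree.\n"
    "> old text\n"
    "> more old text\n"
    "See you there.\n"
    "-- \n"
    "David L. Hawking\n"
)
text, signature = clean_email_body(body)
```

Separating the two parts means quoted text is not credited to the wrong author, and the signature remains available for extracting names and e-mail addresses.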

  8. Evaluation • Improvement overview on P5
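Assuming "P5" here denotes precision at rank 5 (the fraction of the first five returned candidates that are judged relevant), the measure can be computed as in this short sketch. The function name and the toy data are illustrative only.

```python
def precision_at_k(ranked, relevant, k=5):
    """Fraction of the top-k ranked candidates that are relevant experts."""
    top = ranked[:k]
    return sum(1 for candidate in top if candidate in relevant) / k


# Toy example: two of the first five candidates are relevant.
p5 = precision_at_k(["a", "b", "c", "d", "e", "f"], {"a", "c", "f"})
```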

  9. Evaluation (2) • Effect of topicality

  10. Evaluation (3) • Signature detection • Only relevant signatures • Entry page • Remove non-relevant (antivirus, disclaimers, yahoo) • Statistics • extracted: 28,000 signatures • distinct: 3,788 signatures • candidates associated: 1,179 candidates • distinct candidates: 208 candidates

  11. System demonstration • 2 searches on basic interface • TREC interface

  12. For further research • Reranking: explore the options for successful reranking and try to implement it in a fast and effective way (this is a hard task) • Structure e-mail discussion lists • Extract more from signatures

  13. Questions • Are there any questions?
