A language modeling framework for expert finding. Presenter: Lin, Shu-Han. Authors: Krisztian Balog, Leif Azzopardi, Maarten de Rijke. Information Processing and Management (IPM) 45 (2009) 1–19. Outline: Motivation, Objective, Methodology, Experiments, Conclusion, Comments.
A language modeling framework for expert finding
Presenter: Lin, Shu-Han
Authors: Krisztian Balog, Leif Azzopardi, Maarten de Rijke
Information Processing and Management (IPM) 45 (2009) 1–19

Outline
• Motivation
• Objective
• Methodology
• Experiments
• Conclusion
• Comments

Motivation
• Expert finding: finding experts given a topic.
• Yellow Pages:
  • Profiles: employees self-assess their skills.
  • Keywords, e.g., marketing
• Problems:
  • Information: outdated
  • Keywords: restricted

Objectives
• Within the organization…
  • Mine published intranet documents.
  • Search all kinds of expertise.
• ‘Who are the experts on topic “Internet marketing and internet advertising” in my organization?’

Methodology – Overview
• To capture the association between a candidate expert and an area of expertise, ask: “What is the probability of a candidate ca being an expert given the query topic q?”
• Bayes’ Theorem: p(ca|q) = p(q|ca) · p(ca) / p(q), where p(q) is constant for a given query and p(ca) is assumed uniform, so candidates can be ranked by p(q|ca).
• Model 1: candidate-based (query-independent) approach
  • Idea: build a profile of each candidate expert, then rank the candidates against the query.
• Model 2: document-based (query-dependent) approach
  • Idea: find the query-relevant documents, then associate them with experts.
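
The ranking argument behind the Bayes' Theorem step can be illustrated with a minimal sketch. The function name `rank_candidates` and the likelihood values are hypothetical; the only point is that, with a constant p(q) and a uniform p(ca), sorting by p(q|ca) · p(ca) gives the same order as sorting by p(q|ca).

```python
def rank_candidates(likelihoods, priors=None):
    """Rank candidates by p(q|ca) * p(ca); p(q) is dropped because it is
    the same constant for every candidate."""
    if priors is None:
        # Assumption: uniform prior over candidates.
        priors = {ca: 1.0 / len(likelihoods) for ca in likelihoods}
    scores = {ca: likelihoods[ca] * priors[ca] for ca in likelihoods}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical p(q|ca) values for three made-up candidates.
likelihoods = {"alice": 0.02, "bob": 0.05, "carol": 0.01}
ranking = rank_candidates(likelihoods)
print(ranking)
```

A non-uniform prior could be passed through `priors` to favor some candidates, which is an extension of this sketch rather than something the slides describe.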

Methodology – Model 1
• Build a textual representation (model) θ_ca of a person’s knowledge according to his or her documents.
• Then estimate the probability of the query given the candidate’s model.
• The candidate model is smoothed, and the documents that contribute to it are weighted.
• p(Internet Marketing|θ_ca) = p(“Internet”|θ_ca) · p(“Marketing”|θ_ca)
• e.g., p(Internet marketing and internet advertising|θ_ca) = p(“Internet”|θ_ca)^2 · p(“Marketing”|θ_ca) · p(“and”|θ_ca) · p(“Advertising”|θ_ca)
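
A minimal sketch of the candidate-based scoring may help here. It assumes a linear interpolation with a collection model for the smoothing step and uses hypothetical document-candidate weights; the helper names, the `lam` value, and the toy documents are illustrative, not taken from the slides.

```python
from collections import Counter


def term_prob_in_doc(term, doc_tokens):
    """Maximum-likelihood estimate of p(t|d)."""
    return doc_tokens.count(term) / len(doc_tokens) if doc_tokens else 0.0


def model1_score(query_tokens, docs, assoc, collection_tokens, lam=0.5):
    """Product over query terms of p(t|theta_ca) ** n(t, q).

    docs:  {doc_id: [tokens]}
    assoc: {doc_id: weight} for ONE candidate (hypothetical weights)
    lam:   weight of the collection model (assumed smoothing parameter)
    """
    coll = Counter(collection_tokens)
    total = len(collection_tokens)
    score = 1.0
    for term, n in Counter(query_tokens).items():
        # Candidate model: weighted mix of the candidate's documents.
        p_t_ca = sum(term_prob_in_doc(term, docs[d]) * w for d, w in assoc.items())
        p_t = coll[term] / total  # background (collection) probability
        score *= ((1 - lam) * p_t_ca + lam * p_t) ** n
    return score


docs = {
    "d1": ["internet", "marketing", "plan"],
    "d2": ["network", "security", "audit"],
}
collection = [tok for toks in docs.values() for tok in toks]
query = ["internet", "marketing"]
john = model1_score(query, docs, {"d1": 1.0}, collection)
mary = model1_score(query, docs, {"d2": 1.0}, collection)
print(john, mary)
```

Because a repeated query term raises its factor to the power n(t, q), the query "Internet marketing and internet advertising" would square the "internet" factor, as in the example on the slide.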
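
Methodology – Model 1B
• Estimate p(t|d,ca) (weighted)
  • Candidate identifier
  • Window size (w)
• e.g., p(“Internet”|“Mail.No.43”, “John”)
  • Original text: … John (john@gmail.com) is a major in marketing. …
  • With candidate identifiers: … <731842> (<731842>) is a major in marketing. …
• P.S. The closer a term is to the candidate identifier, the stronger the evidence.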
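
One way to picture the window-based estimate of p(t|d,ca) is the sketch below. It assumes a symmetric window of w tokens on each side of every candidate identifier and counts each token position once; the function name and the sample sentence are hypothetical.

```python
def window_term_prob(term, tokens, candidate_id, w):
    """Estimate p(t|d,ca) from the tokens that fall within w positions
    of any occurrence of the candidate identifier."""
    positions = [i for i, tok in enumerate(tokens) if tok == candidate_id]
    in_window = set()
    for pos in positions:
        lo, hi = max(0, pos - w), min(len(tokens), pos + w + 1)
        # The identifier itself is not counted as a term.
        in_window.update(i for i in range(lo, hi) if tokens[i] != candidate_id)
    if not in_window:
        return 0.0
    hits = sum(1 for i in in_window if tokens[i] == term)
    return hits / len(in_window)


tokens = ("<731842> is a major in marketing while the "
          "internet team works elsewhere").split()
near = window_term_prob("marketing", tokens, "<731842>", w=5)
far = window_term_prob("internet", tokens, "<731842>", w=5)
print(near, far)
```

With w=5 the term "marketing" falls inside the window and "internet" does not, so only the nearby term contributes to this candidate's profile.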

Methodology – Model 2
• (Smoothed)
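
The document-based idea (find query-relevant documents, then associate them with experts) can be sketched as follows. This assumes a smoothed query likelihood per document, interpolated with a collection model, and sums it over the candidate's documents using hypothetical association weights; all names and values are illustrative.

```python
from collections import Counter


def query_likelihood(query_tokens, doc_tokens, collection_tokens, lam=0.5):
    """Smoothed p(q|d): product over query terms of
    (1 - lam) * p(t|d) + lam * p(t), with lam an assumed parameter."""
    doc = Counter(doc_tokens)
    coll = Counter(collection_tokens)
    score = 1.0
    for term in query_tokens:
        p_t_d = doc[term] / len(doc_tokens) if doc_tokens else 0.0
        p_t = coll[term] / len(collection_tokens)
        score *= (1 - lam) * p_t_d + lam * p_t
    return score


def model2_score(query_tokens, docs, assoc, collection_tokens, lam=0.5):
    """Sum of p(q|d) over documents, weighted by the document-candidate
    association for one candidate (hypothetical weights)."""
    return sum(
        query_likelihood(query_tokens, docs[d], collection_tokens, lam) * w
        for d, w in assoc.items()
    )


docs = {
    "d1": ["internet", "marketing", "plan"],
    "d2": ["network", "security", "audit"],
}
collection = [tok for toks in docs.values() for tok in toks]
query = ["internet", "marketing"]
john = model2_score(query, docs, {"d1": 1.0}, collection)
mary = model2_score(query, docs, {"d2": 1.0}, collection)
print(john, mary)
```

Since the per-document scores are what an ordinary retrieval engine already computes, this variant needs nothing beyond a regular document index plus the association weights.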

Methodology – Model 2B
• Model 2
• Model 2B

Methodology – Document-candidate associations
• (Document importance)
• (Senior member of organization)
• Boolean model
• TF-IDF
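
The difference between the Boolean and TF-IDF associations can be sketched as below. The TF-IDF form used here (the candidate's share of mentions in a document times the log inverse document frequency of the candidate) is one common variant and is an assumption; the mention counts are made up.

```python
import math


def boolean_assoc(mentions, doc_id, ca):
    """1.0 if the candidate is mentioned in the document, else 0.0."""
    return 1.0 if mentions.get(doc_id, {}).get(ca, 0) > 0 else 0.0


def tfidf_assoc(mentions, doc_id, ca):
    """Treat the candidate as a 'term': tf is the candidate's share of all
    candidate mentions in the document, idf is log(N / df)."""
    counts = mentions.get(doc_id, {})
    total = sum(counts.values())
    if total == 0 or counts.get(ca, 0) == 0:
        return 0.0
    tf = counts[ca] / total
    df = sum(1 for c in mentions.values() if c.get(ca, 0) > 0)
    idf = math.log(len(mentions) / df)
    return tf * idf


# Hypothetical mention counts: {doc_id: {candidate: count}}.
mentions = {
    "d1": {"john": 3, "mary": 1},
    "d2": {"mary": 2},
    "d3": {"mary": 1},
}
print(boolean_assoc(mentions, "d1", "john"), boolean_assoc(mentions, "d1", "mary"))
print(tfidf_assoc(mentions, "d1", "john"), tfidf_assoc(mentions, "d1", "mary"))
```

Under the Boolean model both candidates are tied for d1, while the frequency-based weight separates the candidate who dominates the document from one who is mentioned in every document.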

Experiments
• Evaluation measures:
  • MAP (mean average precision)
  • MRR (mean reciprocal rank): e.g., (1/3 + 1/2 + 1)/3 = 11/18
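
The MRR arithmetic on this slide can be checked with a short sketch. It assumes three queries whose first relevant result appears at ranks 3, 2, and 1; the average-precision helper additionally assumes that every relevant item appears somewhere in the ranked list. Function names are illustrative.

```python
from fractions import Fraction


def mean_reciprocal_rank(first_relevant_ranks):
    """Mean of 1/rank of the first relevant result, one rank per query."""
    total = sum(Fraction(1, r) for r in first_relevant_ranks)
    return total / len(first_relevant_ranks)


def average_precision(relevance):
    """Average of precision@k over the ranks k that hold a relevant item.
    Assumes all relevant items are present in the ranked list."""
    hits, total = 0, Fraction(0)
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += Fraction(hits, rank)
    return total / hits if hits else Fraction(0)


mrr = mean_reciprocal_rank([3, 2, 1])
print(mrr)  # 11/18
ap = average_precision([True, False, True])
print(ap)   # 5/6
```

MAP is then the mean of `average_precision` over all queries in the test set.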

Experiments
• Model 1 vs. Model 2
• Window-based models

Experiments
• Association methods
• Parameter sensitivity

Conclusions
• Model 1: build a profile of each candidate expert, then rank the candidates against the query.
• Model 2: find the query-relevant documents, then associate them with experts.
• Model 2 is to be preferred over Model 1:
  • Effectiveness: in terms of average precision and reciprocal rank
  • Implementation: requires only a regular document index
• Window-based extensions improved:
  • Effectiveness: especially on top of Model 1
• Frequency-based (TF-IDF) document-candidate associations are helpful.

Comments
• Advantage
  • Integrates ideas
• Drawback
  • …
• Application
  • …