SEMEF is a tool for discovering experts and their expertise using a taxonomy-based approach. It leverages Semantic Web technologies to enhance information exchange and automatic discovery. The system collects expertise profiles, quantifies expertise, ranks experts, and expands collaboration networks.
SEMEF: A Taxonomy-Based Discovery of Experts, Expertise and Collaboration Networks
Delroy Cameron, Master's Thesis, Computer Science, University of Georgia
Advisor: I. Budak Arpinar
Committee: Prashant Doshi, Robert J. Woods
11/27/2007
OUTLINE
• Background
• Expertise Profiles
• Ranking Experts
• Collaboration Networks Expansion
• Results and Evaluation
• Conclusion
• Demo
BACKGROUND
• Semantic Web
  • What?
    • Extension of the current Web
    • Attach meaning to data
  • Why?
    • Underutilization of the current Web
    • HTML limitations
  • Goal
    • Enhance information exchange
    • Automatic information discovery
    • Interoperability of services
BACKGROUND
• Semantic Web
  • Technologies
    • XML
    • RDF/RDFS/OWL
    • URI
    • Ontology

"David Billington is a Professor of Mathematics"

  <course name="Mathematics">
    <lecturer>David Billington</lecturer>
  </course>

  <lecturer name="David Billington">
    <teaches>Mathematics</teaches>
  </lecturer>

  <teachingOffering>
    <lecturer>David Billington</lecturer>
    <course>Mathematics</course>
  </teachingOffering>

  <rdf:Description rdf:id="mynamespace:Professor_2">
    <rdf:has_name>David Billington</rdf:has_name>
    <rdf:teaches rdf:resource="#Mathematics"/>
  </rdf:Description>
BACKGROUND
• Semantic Web
  • Common Challenges
    • Entity Disambiguation
    • Ontology Mapping/Alignment
    • Trust/Provenance
    • Semantic Association Discovery
  • Applications
    • Social Networks
    • Bio-Informatics
    • National Security
    • GPS Data Mining
BACKGROUND
• Social Networks
  • What?
    • Entities connected through social relationships
  • Characteristics
    • Clustering Coefficient (how connected a node's neighbors are to one another)
    • Centrality (based on average shortest path length to other nodes)
    • Geodesic (shortest path length between two nodes)
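The three characteristics listed on this slide can be made concrete on a toy graph. The sketch below is illustrative only: the graph, node names, and function names are hypothetical, and the definitions used are the standard textbook ones (breadth-first search for the geodesic, fraction of connected neighbor pairs for the clustering coefficient), not code taken from SEMEF.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected coauthorship graph: node -> set of neighbors.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B", "E"},
    "E": {"D"},
}

def geodesic(graph, source, target):
    """Length of the shortest path from source to target, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph[node]:
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def clustering_coefficient(graph, node):
    """Fraction of pairs of the node's neighbors that are themselves connected."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return links / (k * (k - 1) / 2)

def average_path_length(graph, node):
    """Mean geodesic from node to every other reachable node."""
    dists = [geodesic(graph, node, other) for other in graph if other != node]
    dists = [d for d in dists if d is not None]
    return sum(dists) / len(dists) if dists else None
```

In this toy graph, A and E are three hops apart, while B sits closest to everyone else on average.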
BACKGROUND
• Peer-Review Process
  • What?
    • Review scholarly manuscripts
  • Challenges
    • Slow
    • Conflict of Interest
    • Finding Suitable Reviewers
    • Arbitrary Knowledge Approach
    • Research Diversification
    • Emerging Fields
CONTRIBUTIONS
• Applicability of Semantics
• Finding Expertise
  • Fine Levels of Granularity
• Finding Experts
  • Taxonomy
• Collaboration Networks
  • Discovery of Unknown Experts
SEMEF
• SEMantic Expert Finder
• Finding Expertise (Expertise Profiles)
  • Collecting Expertise
  • Quantifying Expertise
• Finding (Ranking) Experts
  • With and without taxonomy
• Collaboration Networks
  • Geodesic
  • C-Nets
EXPERTISE PROFILES
• Collecting Expertise
  • Collect all publications
  • Map papers to topics
  • Quantify all papers
  • Publications Dataset
    • DBLP: 473,296 papers (conference/session names, Nov. 2007)
    • ACM, IEEE, Science Direct: 29,454 papers (abstracts/index terms)
    • Combined: 476,299 papers
EXPERTISE PROFILES
• Collecting Expertise
  • Papers-to-Topics Dataset
    • Combined (476,299)
    • Topics (320)
    • Relationships (676,569)
    • Expertise Profiles (560,792)
EXPERTISE PROFILES
• Quantifying Expertise
  • Mapping each paper to a distinct value
  • Publication Impact
    • Hector Garcia-Molina (248 papers, 2003)
    • E. F. Codd (49 papers, 2003)
  • Citeseer Impact Statistics (1221 venues)
  • DBLP URIs
EXPERTISE PROFILES
[Figure 1: Expertise Profile. author_A has topic weights topic1 (4.50), topic2 (1.86), and topic3 (3.08); paper1 through paper6 carry the values 1.54, 1.10, 1.86, 1.86, 1.54, 1.54, in the order shown.]
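The numbers in Figure 1 are consistent with each topic weight being the sum of the impact values of the papers under it. The short check below reproduces the figure's three topic weights. The grouping of papers under topics is an assumption inferred from the arithmetic (the flattened figure does not show the edges), and the variable names are hypothetical.

```python
from collections import defaultdict

# Paper impact values as they appear in Figure 1.
paper_impact = {
    "paper1": 1.54, "paper2": 1.10, "paper3": 1.86,
    "paper4": 1.86, "paper5": 1.54, "paper6": 1.54,
}

# Assumed paper-to-topic grouping (inferred from the sums, not stated on the slide).
paper_topics = {
    "paper1": ["topic1"], "paper2": ["topic1"], "paper3": ["topic1"],
    "paper4": ["topic2"],
    "paper5": ["topic3"], "paper6": ["topic3"],
}

# Accumulate each paper's impact into every topic it is mapped to.
profile = defaultdict(float)
for paper, topics in paper_topics.items():
    for topic in topics:
        profile[topic] += paper_impact[paper]

# Round to two decimals to match the precision used in the figure.
profile = {topic: round(weight, 2) for topic, weight in profile.items()}
```

Under that grouping, topic1 is 1.54 + 1.10 + 1.86 = 4.50, topic2 is 1.86, and topic3 is 1.54 + 1.54 = 3.08, matching the figure.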
RANKING EXPERTS
• Taxonomy of Topics
  • Session names
  • Conference names
  • O’CoMMA
  • Paper abstracts
  • Index terms
[Figure 2: Taxonomy of Topics. Figure labels: 216, 50, 60, 192, 128, 320.]
RANKING EXPERTS
• Case 1
  • Single Topic without Taxonomy
  • Traverse all Expertise Profiles
  • Sum impact (papers in topic)
• Case 2
  • Single Topic with Taxonomy
  • Traverse all Expertise Profiles
  • Sum impact (papers in topic and subtopics)
Prevent expertise overestimation: map papers to leaf nodes only
RANKING EXPERTS
• Case 3
  • Array of Topics without Taxonomy
  • Same as Case 2
• Case 4
  • Array of Topics with Taxonomy
  • Filter input topics
  • Sum impact (papers in topics and subtopics)
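A minimal sketch of how the taxonomy cases might work is given below. Everything in it is hypothetical: the taxonomy, papers, impact values, and author names are made up, and two design choices are assumptions rather than details from the slides. First, "filter input topics" is read as dropping any input topic that is already a descendant of another input topic. Second, each paper is counted once per author, which is one way to avoid overestimating expertise.

```python
# Hypothetical taxonomy: parent topic -> direct subtopics.
taxonomy = {
    "Web": ["Search", "Semantic Web"],
    "Search": ["Ranking", "Crawling"],
}

# Hypothetical data: papers are attached to leaf topics only.
papers_by_topic = {
    "Ranking": {"p1", "p2"},
    "Crawling": {"p3"},
    "Semantic Web": {"p2", "p4"},
}
impact = {"p1": 1.5, "p2": 2.0, "p3": 1.0, "p4": 0.5}
authored = {"alice": {"p1", "p2", "p4"}, "bob": {"p3"}}

def expand(topic):
    """Return the topic together with all of its descendants in the taxonomy."""
    found = {topic}
    for child in taxonomy.get(topic, []):
        found |= expand(child)
    return found

def filter_topics(topics):
    """Drop input topics that are descendants of another input topic (assumed meaning)."""
    topics = set(topics)
    return {t for t in topics
            if not any(t != other and t in expand(other) for other in topics)}

def rank(author, topics, use_taxonomy):
    """Sum the impact of the author's papers on the selected topics, each paper once."""
    if use_taxonomy:
        selected = set()
        for topic in filter_topics(topics):
            selected |= expand(topic)
    else:
        selected = set(topics)
    papers = set()
    for topic in selected:
        papers |= papers_by_topic.get(topic, set())
    return sum(impact[p] for p in papers & authored[author])
```

With this data, asking about "Search" without the taxonomy finds nothing for alice, because her papers sit on the leaf topics; with the taxonomy, the subtopics "Ranking" and "Crawling" are included and her papers are counted.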
COLLABORATION NETWORKS EXPANSION
• Geodesic
[Figure 3: Geodesic Relationships. Panels labeled STRONG, MEDIUM, WEAK, and UNKNOWN show how author_A and author_B are, or are not, connected through opus:author and opus:isIncludedIn edges over opus:Article_in_Proceedings and opus:Proceedings nodes, in some panels via intermediate authors author_1 and author_2.]
COLLABORATION NETWORKS EXPANSION
• C-Net
  • Ordering Cluster of Experts
  • Collaboration Strength*
[Figure: C-Net around a Super Node {14.80} with coauthor_1 {0.73, 0.5}, coauthor_2 {1.81, 1.0}, coauthor_3 {0.73, 0.5}, coauthor_4 {0.73, 0.5}, coauthor_5 {1.54, 1.0}, ..., coauthor_n {1.1, 0.8}.]
* Newman, M. E. J.: Coauthorship Networks and Patterns of Scientific Collaboration. Proceedings of the National Academy of Sciences of the United States of America, 101(suppl. 1): 5200-5205 (2004).
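The collaboration strength cited from Newman is usually defined so that a paper with n authors contributes 1 / (n - 1) to the weight between each pair of its coauthors. The sketch below implements that definition on hypothetical author lists. It is an illustration of the cited measure and makes no claim about how SEMEF computes or combines the values shown in the figure.

```python
from collections import defaultdict
from itertools import combinations

def collaboration_strength(papers):
    """Each paper with n authors adds 1 / (n - 1) to every pair of its coauthors."""
    weights = defaultdict(float)
    for authors in papers:
        authors = sorted(set(authors))
        n = len(authors)
        if n < 2:
            continue  # single-author papers carry no collaboration
        for a, b in combinations(authors, 2):
            weights[(a, b)] += 1.0 / (n - 1)
    return dict(weights)

# Hypothetical author lists, one per paper.
papers = [
    ["author_A", "author_B"],
    ["author_A", "author_B", "author_C"],
    ["author_C"],
]
strength = collaboration_strength(papers)
```

Here author_A and author_B reach 1.5 (1.0 from the two-author paper plus 0.5 from the three-author paper), so frequent collaboration on small papers weighs more than a single appearance on a large author list.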
RESULTS AND EVALUATION
• Evaluation
  • WWW Search Track (2005/6/7)
  • Input topics from the Call for Papers
  • SWETO-DBLP Subset (67,366 authors)
  • DBLP (560,792)
• Validation
• Collaboration Networks Expansion
RESULTS AND EVALUATION
• Validation

Table 1: Past PC Lists comparison with SEMEF

                           Number of PC Members in SEMEF List                   Cumulative Percentage
Percentage in SEMEF List   Search 2005   Search 2006   Search 2007   Average    in PC List
(top) 0-10%                10            13            13            12         35%
10-20%                     5             8             6             6          52%
20-30%                     6             0             0             2          58%
30-40%                     4             1             1             2          65%
40-50%                     6             2             0             3          73%
50-60%                     3             1             1             2          79%
60-70%                     4             0             0             1          82%
70-80%                     1             1             0             1          85%
80-90%                     1             0             0             0          85%
90-100%                    0             0             0             0          85%
Total                      40/48         26/29         21/25         29/34
Total (%)                  83            89            84            85
RESULTS AND EVALUATION • Validation Figure 4: Average Number of PC in SEMEF List
RESULTS AND EVALUATION • Validation Figure 5: Average PC Distribution in SEMEF List
RESULTS AND EVALUATION
• Collaboration Networks Expansion

Table 3: PC Chair – PC Member Geodesic Relationships (PC List, Number of Expert Relationships)

                  Search 2005        Search 2006        Search 2007        Above Average
Relationships     Chair1   Chair2    Chair1   Chair2    Chair1   Chair2    Expertise (in PC)
STRONG            2        0         3        0         3        0         0
MEDIUM            10       7         6        2         7        8         4
WEAK              31       17        15       20        11       14        10
EXTREMELY WEAK    1        2         1        2         0        0         0

Table 4: PC Chair – SEMEF List Geodesic Relationships (SEMEF, Number of Expert Relationships)

                  Search 2005        Search 2006        Search 2007        Above Average
Relationships     Chair1   Chair2    Chair1   Chair2    Chair1   Chair2    Expertise (in PC)
STRONG            6        2         10       3         10       2         3
MEDIUM            106      53        88       55        88       76        16
WEAK              649      293       608      582       605      576       58
EXTREMELY WEAK    99       26        66       26        66       43        3
CONCLUSION
• Expertise Profiles
  • Publication Data
  • Publication Impact Statistics
  • Papers-to-Topics Relationships
• Ranking Experts
  • With and without Taxonomy
  • Single and Array of Topics
• Collaboration Networks Expansion
  • Semantic Association Discovery
  • Geodesic
  • C-Nets
DEMO
• Web Application
  • Apache Tomcat 6.0
  • JavaServer Pages
  • Ubuntu 7.10
RELATED WORK
• Particle Swarm Algorithm
• ExpertiseNets
• Expertise Browser
  • Experience Atoms
• Expertise Recommender
  • Change history
  • Tech Support Heuristics
  • Profiling, Identification, Supervisor
RELATED WORK
• Web-Based Communities
  • Expert Rank
• Formal Probabilistic Models
  • Candidate Models
  • Document Models
• RDF-Matcher
EXPERTISE PROFILE ALGORITHM

Algorithm findExpertiseProfile(researcherURI, list of publications)
  create empty 'expertise profile'
  foreach paper of researcher do
    get 'topics' list of paper (using papers-to-topics dataset)
    get 'publication impact'
    if 'publication impact' is null do
      'publication impact' = default weight
    end
    foreach 'topic' in 'topics' list do
      'weight' = 'publication impact' + existing 'weight' from 'expertise profile' (zero if absent)
      if 'expertise profile' contains 'topic' do
        update 'expertise profile' with <'topic', 'weight'>
      else
        add <'topic', 'weight'> pair to 'expertise profile'
      end
    end
  end
  return 'expertise profile'
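The pseudocode on this slide can be sketched in Python as follows. This is an illustrative reading, not the thesis implementation: the function signature, the dictionary-based inputs, and the value of `DEFAULT_WEIGHT` are assumptions (the slide does not state the default weight), and the sample data is hypothetical.

```python
DEFAULT_WEIGHT = 1.0  # assumed placeholder; the slides do not give the default value

def find_expertise_profile(publications, papers_to_topics, publication_impact):
    """Build a {topic: accumulated weight} profile from a researcher's papers."""
    profile = {}
    for paper in publications:
        # Fall back to the default weight when no impact statistic is known.
        impact = publication_impact.get(paper)
        if impact is None:
            impact = DEFAULT_WEIGHT
        # Add the paper's impact to every topic the paper is mapped to.
        for topic in papers_to_topics.get(paper, []):
            profile[topic] = profile.get(topic, 0.0) + impact
    return profile

# Hypothetical inputs: p3 has no known impact and receives the default weight.
papers_to_topics = {"p1": ["t1"], "p2": ["t1", "t2"], "p3": ["t2"]}
publication_impact = {"p1": 1.5, "p2": 2.0}
example_profile = find_expertise_profile(
    ["p1", "p2", "p3"], papers_to_topics, publication_impact
)
```

Using `dict.get` with a zero default folds the slide's "contains topic" update and add branches into a single statement.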
RANKING EXPERTS ALGORITHM

Algorithm rankValue(researcherURI, list of topics)
  set 'expertRank' to zero
  create temporary 'expertise profile'
  filter topics
  foreach topic in filtered topics list do
    get 'papers' for this topic (using papers-to-topics dataset)
    foreach paper in papers list do
      if researcher is author do
        get 'publication impact' as 'weight'
        'expertRank' = 'expertRank' + 'weight'
        add <'topic', 'weight'> pair to temporary 'expertise profile'
      endif
    end
  end
  return 'expertRank'
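A literal Python reading of the ranking pseudocode is sketched below, followed by a sort that turns rank values into an ordered expert list. The signature, the dictionary inputs, the `default_weight` parameter, and the sample data are all hypothetical, and the topic list is assumed to be already filtered. Like the pseudocode, this version counts a paper once for each input topic it appears under.

```python
def rank_value(researcher, topics, papers_by_topic, authors_of, publication_impact,
               default_weight=1.0):
    """Return (rank, temporary profile) for one researcher over already-filtered topics."""
    expert_rank = 0.0
    profile = {}
    for topic in topics:
        for paper in papers_by_topic.get(topic, []):
            if researcher in authors_of.get(paper, ()):
                # Papers without a known impact use the assumed default weight.
                weight = publication_impact.get(paper, default_weight)
                expert_rank += weight
                profile[topic] = profile.get(topic, 0.0) + weight
    return expert_rank, profile

# Hypothetical data for two researchers and two topics.
papers_by_topic = {"t1": ["p1", "p2"], "t2": ["p2", "p3"]}
authors_of = {"p1": ["r1"], "p2": ["r1", "r2"], "p3": ["r2"]}
publication_impact = {"p1": 1.5, "p2": 2.0}

def score(researcher):
    """Rank value of a researcher over both sample topics."""
    return rank_value(researcher, ["t1", "t2"], papers_by_topic,
                      authors_of, publication_impact)[0]

# Order researchers from highest to lowest rank value.
ranking = sorted(["r2", "r1"], key=score, reverse=True)
```

Returning the temporary profile alongside the rank keeps the per-topic breakdown available for display next to each expert.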