ArnetMiner: Extraction and Mining of Academic Social Networks Presenter: Cheng-Feng Weng Authors: Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, Zhong Su 2009/03/11 KDD.9 (2008)
Outline • Motivation • Objective • Methods • Experiments • Conclusion • Comments
Motivation (1/2) • People are interested not only in searching for different types of information (such as authors, conferences, and papers), but also in finding semantics-based information (such as structured researcher profiles).
Motivation (2/2) • Lack of semantics-based information. • Information is sometimes incomplete or inconsistent. • Lack of a unified approach to efficiently model the academic network. • Different types of information in the academic network were modeled individually, so the dependencies between them can't be captured.
Objective • The paper proposes the ArnetMiner system, which addresses four tasks: • Extracting researcher profiles automatically from the Web. • Integrating the publication data into the network from existing digital libraries. • Modeling the entire academic network. • Providing search services.
Researcher Profile Extraction • Finding relevant pages: researcher name, then Google API, then SVM classifier, then homepage/introducing page • Extracting the profile: a) separate the text into tokens b) assign possible tags to each token (CRF model)
Name Disambiguation • Hidden Markov Random Fields (HMRF) framework • Example of a 2-CoAuthor relationship: one paper is by ‘David Mitchell’ and ‘Andrew Mark’, a second paper is by ‘David Mitchell’ and ‘Fernando Mulford’, and the authors of another paper include …‘Andrew Mark’ and ‘Fernando Mulford’…
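The 2-CoAuthor example above can be checked mechanically. The following is a minimal sketch, not the system's actual implementation: the function name `has_two_coauthor_link`, the list-of-author-lists data layout, and the rule that the linking pair must be two distinct people are all assumptions made for illustration.

```python
def has_two_coauthor_link(paper_i, paper_j, other_papers, shared_name):
    """Return True if paper_i and paper_j are linked through a third paper.

    paper_i, paper_j: author-name lists that both contain the ambiguous
    name `shared_name`. other_papers: author-name lists of other papers.
    (Hypothetical data layout, chosen only for this sketch.)
    """
    # Co-authors of each paper, excluding the ambiguous name itself.
    rest_i = set(paper_i) - {shared_name}
    rest_j = set(paper_j) - {shared_name}
    for authors in other_papers:
        authors = set(authors)
        # Look for a co-author of paper_i and a different co-author of
        # paper_j who appear together on this other paper.
        for a in authors & rest_i:
            for b in authors & rest_j:
                if a != b:
                    return True
    return False


p1 = ["David Mitchell", "Andrew Mark"]
p2 = ["David Mitchell", "Fernando Mulford"]
p3 = ["Andrew Mark", "Fernando Mulford"]
print(has_two_coauthor_link(p1, p2, [p3], "David Mitchell"))
```

A link of this kind suggests that the two ‘David Mitchell’ papers may belong to the same person, which is the sort of evidence a disambiguation framework can weigh.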
Formalization using HMRF • The conditional distribution of the researcher labels y given the observations x (papers) [equation not recoverable] • diagonal matrix • [Figure: King's papers {Xi, Xi+1, …, Xn}; King's profile]
Modeling Academic Network • With a multinomial distribution • ACT Model 1: determine topics first • ACT Model 2: first choose a publication venue • ACT Model 3: first write a paper
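To make "determine topics first" concrete, here is a rough sketch of one possible generative reading of ACT Model 1. The slide gives only the ordering and the use of multinomial distributions. The uniform choice of a responsible author, the per-topic word and venue distributions, and all names (`theta`, `phi`, `psi`, `sample_act1_paper`) are assumptions of this sketch, not details taken from the slide.

```python
import random


def draw(dist, rng):
    """Sample one key from a {outcome: probability} multinomial."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights)[0]


def sample_act1_paper(authors, n_words, theta, phi, psi, rng):
    """Generate (author, topic, word, venue) tuples for one paper.

    theta[author] -> topic distribution (assumed)
    phi[topic]    -> word distribution  (assumed)
    psi[topic]    -> venue distribution (assumed)
    """
    tokens = []
    for _ in range(n_words):
        author = rng.choice(authors)       # assumed: uniform over the paper's authors
        topic = draw(theta[author], rng)   # the topic is determined first
        word = draw(phi[topic], rng)       # then a word is drawn from that topic
        venue = draw(psi[topic], rng)      # and a venue is drawn from the same topic
        tokens.append((author, topic, word, venue))
    return tokens


theta = {"A. Author": {"data mining": 0.7, "databases": 0.3}}
phi = {"data mining": {"cluster": 0.5, "pattern": 0.5},
       "databases": {"query": 0.6, "index": 0.4}}
psi = {"data mining": {"KDD": 1.0}, "databases": {"VLDB": 1.0}}
print(sample_act1_paper(["A. Author"], 5, theta, phi, psi, random.Random(0)))
```

Under the same reading, Models 2 and 3 would change only the order in which the venue, topic, and words are drawn.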
Academic Search Services • Association Search: Dijkstra algorithm
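As an illustration of how Dijkstra's algorithm could serve association search, the sketch below finds the cheapest chain of connections between two researchers. The graph layout (a dict of neighbor-to-weight dicts), the edge weights, and the function name `shortest_association` are hypothetical, since the slide does not say how ArnetMiner builds or weights its graph.

```python
import heapq


def shortest_association(graph, source, target):
    """Dijkstra's algorithm on a weighted graph with non-negative weights.

    graph: {node: {neighbor: weight}} (hypothetical layout).
    Returns (total_cost, path) or None if target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale queue entry
        visited.add(node)
        if node == target:
            break
        for neighbor, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                prev[neighbor] = node
                heapq.heappush(heap, (nd, neighbor))
    if target not in dist:
        return None
    # Walk the predecessor links back from target to source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path


coauthors = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 1},
    "C": {"A": 4, "B": 1},
}
print(shortest_association(coauthors, "A", "C"))
```

In this toy graph the route through B is cheaper than the direct A to C edge, so the returned path goes A, B, C.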
Conclusions • The paper proposes a unified topic model to simultaneously model the different types of information in the academic network. The modeling results have been applied to expertise search and association search. • To deal with name ambiguity, it uses the HMRF framework.
Comments • Advantages: considers many concepts; is a complete application system • Drawback: performance could be improved further • Application: library