This analysis explores the challenges faced by researchers in navigating the vast amount of literature in their field and introduces a probabilistic generative model for citations to uncover the evolution of research themes over time.
Topic Analysis By Yiyi Shen
Preview: Motivation, Demo, Model, Topic Strength
Motivation Bottleneck: As the research community grows rapidly, it is increasingly difficult for researchers to see the full picture of a field. Problems: Junior researchers can easily get lost in the overwhelming number of related papers. Researchers who want to shift to a new topic may spend a lot of time preparing a reading list on their own.
Motivation When did a topic become popular? Did it attract attention in the past, and does it still attract attention today?
Motivation Heat = Publication? Cold boot! (Figure: current graph on Acemap)
Model LDA (latent Dirichlet allocation) is a generative statistical model used in natural language processing. In conventional LDA, each word in a document is viewed as belonging to a topic with some probability, and each topic in turn chooses a word with some probability.
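The two-step word generation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the topic names, vocabularies, and probabilities are made up, and the distributions are taken as given, whereas full LDA also draws them from Dirichlet priors.

```python
import random

def generate_document(doc_topic_probs, topic_word_probs, length, rng):
    """Generate words for one document following the LDA word-generation story."""
    topics = list(doc_topic_probs)
    topic_weights = [doc_topic_probs[z] for z in topics]
    words = []
    for _ in range(length):
        # Step 1: the document picks a topic z with probability p(z | d).
        z = rng.choices(topics, weights=topic_weights, k=1)[0]
        # Step 2: the chosen topic picks a word w with probability p(w | z).
        vocab = list(topic_word_probs[z])
        word_weights = [topic_word_probs[z][w] for w in vocab]
        words.append(rng.choices(vocab, weights=word_weights, k=1)[0])
    return words

# Toy distributions (hypothetical values, for illustration only).
topic_word_probs = {
    "learning": {"model": 0.5, "training": 0.3, "data": 0.2},
    "networks": {"router": 0.6, "packet": 0.3, "data": 0.1},
}
doc_topic_probs = {"learning": 0.8, "networks": 0.2}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
doc = generate_document(doc_topic_probs, topic_word_probs, 10, rng)
print(doc)
```

Inference in LDA runs this story in reverse: given only the observed words, it estimates the two sets of probabilities.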
Citation-based Drawbacks of the word-based model: it is difficult to annotate each word with the topic it belongs to, even manually; computational complexity; papers and abstracts may be missing. Word --> Citation. Reference: Understanding Evolution of Research Themes: A Probabilistic Generative Model for Citations. Xiaolong Wang, Chengxiang Zhai and Dan Roth, Department of Computer Science, University of Illinois, Urbana-Champaign, Urbana
Citation-LDA Suppose d is a document that cites a bag of other documents {c_t}, where each c_t is a cited reference, and z is a topic. The model uses two distributions. p(c | z): a reverse conditional distribution of documents given a topic, which can be interpreted as how a topic is characterized by the set of documents that are cited. p(z | d): a probability distribution over topics conditioned on the document. (Wang, Zhai and Roth)
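One way to see how the two distributions fit together is to mix them into the probability that document d cites reference c, written here as p(c | d) = sum over z of p(c | z) p(z | d), which is the standard mixture form for topic models of this kind. The sketch below assumes that form; the function name, paper IDs, and numbers are hypothetical.

```python
def citation_probability(p_z_given_d, p_c_given_z):
    """Mix per-topic citation distributions by the document's topic weights.

    p_z_given_d: {topic: p(z | d)} for one citing document d.
    p_c_given_z: {topic: {cited_paper: p(c | z)}}.
    Returns {cited_paper: p(c | d)}.
    """
    cited = {c for dist in p_c_given_z.values() for c in dist}
    return {
        c: sum(p_z_given_d[z] * p_c_given_z[z].get(c, 0.0) for z in p_z_given_d)
        for c in sorted(cited)
    }

# Toy values (hypothetical): two topics, three cited papers.
p_z_given_d = {"z1": 0.7, "z2": 0.3}
p_c_given_z = {
    "z1": {"paperA": 0.6, "paperB": 0.4},
    "z2": {"paperB": 0.5, "paperC": 0.5},
}

p_c_given_d = citation_probability(p_z_given_d, p_c_given_z)
for paper, prob in p_c_given_d.items():
    print(paper, round(prob, 3))
```

Because each input distribution sums to one, the resulting citation probabilities for d also sum to one.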
Our model Suppose d is a paper that cites a bag of other documents {c_t}, where each c_t is a cited reference, and z is a topic. Given a topic, we can retrieve all papers that belong to it. We then use information about these papers, such as year, paper ID, and paper rank, to build further analyses. Data: Microsoft Database (classified, hierarchical, ranked, …)
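Retrieving the papers of a topic could look like the sketch below. It assumes that each paper is assigned to its single most probable topic, which the slides do not state; the function name, field names, and records are hypothetical.

```python
from collections import defaultdict

def papers_by_topic(p_z_given_d, metadata):
    """Group paper records under each paper's most probable topic.

    p_z_given_d: {paper_id: {topic: probability}}.
    metadata: {paper_id: record with year, rank, ...}.
    """
    groups = defaultdict(list)
    for paper_id, topic_probs in p_z_given_d.items():
        # Assumption: hard assignment to the topic with the highest probability.
        best_topic = max(topic_probs, key=topic_probs.get)
        groups[best_topic].append(metadata[paper_id])
    return dict(groups)

# Toy records (hypothetical values).
metadata = {
    "p1": {"paper_id": "p1", "year": 2014, "rank": 3},
    "p2": {"paper_id": "p2", "year": 2015, "rank": 1},
    "p3": {"paper_id": "p3", "year": 2015, "rank": 7},
}
p_z_given_d = {
    "p1": {"z1": 0.9, "z2": 0.1},
    "p2": {"z1": 0.2, "z2": 0.8},
    "p3": {"z1": 0.6, "z2": 0.4},
}

groups = papers_by_topic(p_z_given_d, metadata)
print(groups)
```

A soft assignment, where a paper counts fractionally toward several topics, would be an equally reasonable choice.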
Topic Strength Topic Temporal Strength: It reveals the relative popularity of topics at different times, which helps users identify current and past research topics as well as rough topic life spans.
Topic Strength Factors: publish number; paper rank r; citations C(d); publication; paper relations D; publish year t.
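The slides do not give the formula that combines these factors, so the following is only a hypothetical sketch of how they could be aggregated. It assumes that each paper contributes 1 for being published plus its citation count C(d), summed per topic and publish year t, then normalized within each year to give relative strength. Paper rank r and paper relations D are left out of this simplified version.

```python
from collections import defaultdict

def topic_strength_by_year(papers):
    """Return {(topic, year): share of that year's total score}.

    Hypothetical scoring: every paper adds 1 (publication) plus its
    citation count to the score of its topic in its publish year.
    """
    raw = defaultdict(float)
    for p in papers:
        raw[(p["topic"], p["year"])] += 1 + p["citations"]
    # Total score per year, used to turn raw scores into relative strength.
    totals = defaultdict(float)
    for (topic, year), score in raw.items():
        totals[year] += score
    return {key: score / totals[key[1]] for key, score in raw.items()}

# Toy papers (hypothetical values).
papers = [
    {"topic": "x", "year": 2015, "citations": 3},
    {"topic": "y", "year": 2015, "citations": 0},
    {"topic": "y", "year": 2016, "citations": 4},
]

strength = topic_strength_by_year(papers)
for (topic, year), share in sorted(strength.items()):
    print(topic, year, round(share, 2))
```

Counting the publication itself means a new topic with few citations still registers, while the citation term lets heavily cited papers carry more weight than publication count alone.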
Topic Strength Bridge Publication & Heat! Promote Cold Boot!
Demo http://acemap.sjtu.edu.cn/TopicStrength