Concept-Based Analysis of Scientific Literature
Chen-Tse Tsai, Gourab Kundu, Dan Roth
CS @ UIUC
Understanding Research Communities
• Consider the following questions:
  • What are the key applications studied by the community?
  • Which applications have matured enough to be used as techniques for other applications?
  • What methods were developed to solve a particular problem?
• In this paper:
  • Extract concepts from scientific papers
  • A concept is a cluster of possible mentions, e.g., {svm, support vector machines, maximal margin classifiers, …}
  • Analyze computational linguistics research by answering the questions above
Outline
• Computational Approach
  • Concept Mention Extraction
  • Citation-Context Based Concept Clustering
• Evaluation of Algorithms
• Understanding Computational Linguistics Research
Concept Mention Extraction
• Identify and categorize mentions of concepts (Gupta and Manning, 2011)
  • Two categories, TECHNIQUE and APPLICATION: "We apply support vector machines [TECHNIQUE] on text classification [APPLICATION]."
• Unsupervised bootstrapping algorithm (Yarowsky, 1995; Collins and Singer, 1999)
• The proposed algorithm:
  • Extract noun phrases (NPs) (Punyakanok and Roth, 2001)
  • For each category, initialize a decision list with seeds.
  • For several rounds:
    • Annotate NPs using the decision lists.
    • Extract the top features from the newly annotated phrases and add them to the decision lists.
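The bootstrapping loop above can be sketched in Python as follows. This is a minimal Yarowsky-style illustration, not the authors' implementation: the feature strings (e.g., `prev=apply`, `head=machines`), the majority-vote labeling rule, the `top_k` cutoff and the requirement that a new feature never occur in phrases of another category are all assumptions, and noun-phrase extraction is taken as already done. For simplicity, features are recounted over all annotated phrases in every round.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Each noun phrase (NP) is a dict {"text": str, "features": set of str}.
# The feature scheme (head word, neighbouring words) is hypothetical.


def annotate(nps: List[dict], decision_lists: Dict[str, Set[str]]) -> Dict[int, str]:
    """Give each NP the category whose decision list matches most of its
    features; NPs with no match or a tied best score stay unlabeled."""
    labels = {}
    for i, np_ in enumerate(nps):
        scores = {cat: len(np_["features"] & feats) for cat, feats in decision_lists.items()}
        best = max(scores.values())
        winners = [cat for cat, s in scores.items() if s == best]
        if best > 0 and len(winners) == 1:
            labels[i] = winners[0]
    return labels


def bootstrap(nps: List[dict], seeds: Dict[str, Set[str]], rounds: int = 3,
              top_k: int = 2) -> Tuple[Dict[int, str], Dict[str, Set[str]]]:
    """Alternate between labeling NPs and growing each decision list."""
    decision_lists = {cat: set(feats) for cat, feats in seeds.items()}
    labels: Dict[int, str] = {}
    for _ in range(rounds):
        labels = annotate(nps, decision_lists)
        for cat in decision_lists:
            # Feature counts inside this category and in all other categories.
            inside = Counter(f for i, c in labels.items() if c == cat for f in nps[i]["features"])
            outside = Counter(f for i, c in labels.items() if c != cat for f in nps[i]["features"])
            # Keep unseen features that never occur in another category's phrases,
            # ranked by frequency (ties broken alphabetically).
            new = sorted((f for f in inside if f not in decision_lists[cat] and outside[f] == 0),
                         key=lambda f: (-inside[f], f))
            decision_lists[cat].update(new[:top_k])
    return labels, decision_lists


# Toy data (hypothetical), loosely based on the example sentence above.
nps = [
    {"text": "support vector machines", "features": {"prev=apply", "head=machines"}},
    {"text": "text classification", "features": {"prev=on", "head=classification"}},
    {"text": "support vector machines", "features": {"prev=use", "head=machines"}},
    {"text": "text classification", "features": {"prev=for", "head=classification"}},
]
seeds = {"TECHNIQUE": {"prev=apply"}, "APPLICATION": {"prev=on"}}
labels, learned = bootstrap(nps, seeds)
print(labels)  # {0: 'TECHNIQUE', 1: 'APPLICATION', 2: 'TECHNIQUE', 3: 'APPLICATION'}
```

Phrases 2 and 3 match no seed; they are labeled only because the first round learns `head=machines` and `head=classification` from the seed-labeled phrases, which is the point of bootstrapping.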
Citation-Context Based Concept Clustering (CitClus)
• Cluster mentions into semantically coherent concepts:
  • Group concept mentions by citation context
  • Merge clusters based on lexical similarity between the mentions in the clusters
[Figure: mentions in four papers, such as "support vector machine", "svm-based classification", "svm", "maximal margin classifiers", "c4.5" and "decision trees", are grouped by the works cited in their contexts (Cortes, 1995; Vapnik, 1995; Quinlan, 1993). Resulting concepts: {c4.5, decision trees} and {support vector machine, svm-based classification, svm, maximal margin classifiers}.]
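The two CitClus steps can be sketched as below, under loudly stated assumptions: each mention is paired with a single citation key found in its context, and "lexical similarity" is taken to be token-level Jaccard overlap with a hypothetical threshold of 0.3. The slide does not specify the similarity measure, the threshold or the size of the citation context, so these are illustrative choices, and the toy input is only modeled on the figure's example.

```python
import re
from typing import Dict, List, Set, Tuple


def tokens(mention: str) -> Set[str]:
    """Lowercased alphanumeric tokens, e.g. 'svm-based' -> {'svm', 'based'}."""
    return set(re.findall(r"[a-z0-9]+", mention.lower()))


def jaccard(a: str, b: str) -> float:
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def cit_clus(pairs: List[Tuple[str, str]], threshold: float = 0.3) -> List[Set[str]]:
    """pairs: (mention, citation key) for each mention found in a citation context."""
    # Step 1: mentions that cite the same work start in the same cluster.
    by_citation: Dict[str, Set[str]] = {}
    for mention, citation in pairs:
        by_citation.setdefault(citation, set()).add(mention)
    clusters = list(by_citation.values())

    # Step 2: merge two clusters whenever some pair of their mentions is
    # lexically similar; repeat until no merge applies.
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(jaccard(a, b) >= threshold for a in clusters[i] for b in clusters[j]):
                    other = clusters.pop(j)  # j > i, so index i is unaffected
                    clusters[i] |= other
                    merged = True
                    break
            if merged:
                break
    return clusters


pairs = [
    ("support vector machine", "Cortes, 1995"),
    ("svm-based classification", "Cortes, 1995"),
    ("svm", "Vapnik, 1995"),
    ("maximal margin classifiers", "Vapnik, 1995"),
    ("c4.5", "Quinlan, 1993"),
    ("decision trees", "Quinlan, 1993"),
]
for cluster in cit_clus(pairs):
    print(sorted(cluster))
```

Here the Cortes and Vapnik clusters merge because "svm-based classification" and "svm" share the token "svm" (Jaccard 1/3), while "maximal margin classifiers" joins the SVM concept only through its citation, not through any shared word.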
Evaluation of Mention Extraction
• ACL Anthology Network Corpus (Radev et al., 2009)
• Training data: 11,005 abstracts
• Test data: 474 abstracts (Gupta and Manning, 2011)
Evaluation of Concept Clustering
• Manually cluster the mentions extracted from 1,000 full-text papers.
• CitClus: the proposed approach
• LexClus: groups the concept mentions by lexical similarity
• CitClus groups together, e.g.:
  • "maximal entropy classifier" and "logistic classifier"
  • "topic modeling" and "latent dirichlet allocation"
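For contrast, a LexClus-style baseline can be sketched as single-link clustering on token overlap alone. The Jaccard measure and the 0.5 threshold are assumptions rather than the paper's settings, and the mention list is toy data. Because grouping depends only on shared tokens, a pair such as "topic modeling" and "latent dirichlet allocation", which share no words, can never be joined this way.

```python
import re
from typing import Dict, List, Set


def tokens(mention: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a mention."""
    return set(re.findall(r"[a-z0-9]+", mention.lower()))


def lex_clus(mentions: List[str], threshold: float = 0.5) -> List[Set[str]]:
    """Single-link clustering: mentions are joined when a chain of pairs,
    each with token Jaccard similarity >= threshold, connects them."""
    parent = list(range(len(mentions)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(mentions)):
        for j in range(i + 1, len(mentions)):
            a, b = tokens(mentions[i]), tokens(mentions[j])
            union = a | b
            if union and len(a & b) / len(union) >= threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, Set[str]] = {}
    for i, mention in enumerate(mentions):
        groups.setdefault(find(i), set()).add(mention)
    return list(groups.values())


mentions = ["support vector machine", "support vector machines", "svm",
            "topic modeling", "latent dirichlet allocation"]
for group in lex_clus(mentions):
    print(sorted(group))
```

With these toy mentions only the two spelling variants of "support vector machine" are merged; "svm", "topic modeling" and "latent dirichlet allocation" each remain singletons.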
Trends Analysis
[Figure: the emergence of SVM and the emergence of topic modeling over time, as tracked by CitClus, LexClus and LDA.]
• With LDA, topic modeling already appears frequent in the 1990s, because LDA cannot generate a cluster tight enough for a specific concept.
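A trend curve like these can be computed by counting, for each year, the papers that contain at least one mention from a concept's cluster. This is a sketch under assumptions: the input format (a year plus a set of mention strings per paper) is hypothetical, the slide does not say whether counts are normalized by yearly paper volume, and the data below is toy data. It also illustrates why cluster quality matters: a mention missing from the cluster silently drops the papers that use only that mention.

```python
from typing import Dict, Iterable, Set, Tuple


def concept_trend(papers: Iterable[Tuple[int, Set[str]]], concept: Set[str]) -> Dict[int, int]:
    """Map each year to the number of papers mentioning the concept at least once.
    Years present in the corpus but without such papers map to 0."""
    counts: Dict[int, int] = {}
    for year, mentions in papers:
        counts.setdefault(year, 0)
        if mentions & concept:
            counts[year] += 1
    return dict(sorted(counts.items()))


papers = [
    (1995, {"svm", "c4.5"}),
    (1995, {"decision trees"}),
    (1996, {"support vector machine"}),
    (1996, {"maximal margin classifiers", "crf"}),
    (1997, {"crf"}),
]
svm_concept = {"svm", "support vector machine", "svm-based classification",
               "maximal margin classifiers"}
print(concept_trend(papers, svm_concept))  # {1995: 1, 1996: 2, 1997: 0}
# A cluster that misses "maximal margin classifiers" undercounts 1996:
print(concept_trend(papers, {"svm", "support vector machine"}))  # {1995: 1, 1996: 1, 1997: 0}
```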
Predictive Quality
• For a concept, predict the number of papers in a year, given the number of papers in the previous three years
• Linear regression over every three consecutive years
• The better the mentions are grouped into coherent concepts, the more stable the trend graph, and hence the more accurate the prediction.
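One possible reading of "linear regression over every three consecutive years" is to fit a straight line to the counts of three consecutive years and extrapolate it one year ahead. The sketch below follows that reading; the slide does not say whether the authors regressed on the year index (as here) or used the three previous counts as features, and the mean absolute error used as a stability score is likewise an assumption. A lower error means the concept's trend is easier to predict, so one could compare the same concept under CitClus and LexClus clusterings.

```python
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def predict_next(window: Sequence[float]) -> float:
    """Fit a line to consecutive yearly counts and extrapolate one year ahead."""
    xs = list(range(len(window)))  # 0, 1, 2 for a three-year window
    a, b = fit_line(xs, window)
    return a + b * len(window)


def mean_abs_error(counts: List[float], window: int = 3) -> float:
    """Average |predicted - actual| over every year that has `window` predecessors."""
    if len(counts) <= window:
        raise ValueError("need more yearly counts than the window size")
    errors = [abs(predict_next(counts[t - window:t]) - counts[t])
              for t in range(window, len(counts))]
    return sum(errors) / len(errors)


print(predict_next([2, 4, 6]))           # 8.0
print(mean_abs_error([2, 4, 6, 8, 10]))  # 0.0
print(mean_abs_error([1, 2, 4, 4]))
```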
Relations Between Concept Categories
• For a given concept, calculate the ratio between the number of its application mentions and the number of its technique mentions.
• Three concepts in the ACL community: support vector machines, machine translation, POS tagging
[Figure: ratio curves for POS tagging (#tech/#app), SVM (#app/#tech) and MT (#tech/#app).]
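The ratio can be computed per year from categorized, clustered mentions, as in the sketch below. It assumes each mention has already been resolved to a concept name (e.g., by CitClus) and labeled TECHNIQUE or APPLICATION by the extractor; the tuple format and the toy data are hypothetical. A rising #tech/#app curve would indicate an application that is increasingly used as a technique for other problems, one of the questions raised at the start.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def category_ratio(
    mentions: Iterable[Tuple[int, str, str]],
    concept: str,
    numerator: str = "TECHNIQUE",
    denominator: str = "APPLICATION",
) -> Dict[int, Optional[float]]:
    """Per year, #numerator mentions / #denominator mentions of one concept.
    Years with no denominator mentions map to None instead of dividing by zero."""
    per_year: Dict[int, Counter] = defaultdict(Counter)
    for year, mention_concept, category in mentions:
        if mention_concept == concept:
            per_year[year][category] += 1
    return {
        year: (cnt[numerator] / cnt[denominator] if cnt[denominator] else None)
        for year, cnt in sorted(per_year.items())
    }


toy = [
    (2000, "pos tagging", "APPLICATION"),
    (2000, "pos tagging", "APPLICATION"),
    (2000, "pos tagging", "TECHNIQUE"),
    (2005, "pos tagging", "TECHNIQUE"),
    (2005, "pos tagging", "TECHNIQUE"),
    (2005, "pos tagging", "APPLICATION"),
    (2005, "svm", "TECHNIQUE"),
    (2010, "pos tagging", "TECHNIQUE"),
]
print(category_ratio(toy, "pos tagging"))  # {2000: 0.5, 2005: 2.0, 2010: None}
```

For a concept such as SVM, which is mainly a technique, the arguments can be swapped to plot #app/#tech instead.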
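Relations Between Concept Categories
• For a given application, which techniques have been applied to it?
[Figure: techniques applied to machine translation (including phrase-based translation and MERT) and to named entity recognition (including decision trees and CRFs); for named entity recognition, decision trees disappear.]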
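A per-year table of techniques for one application can be sketched as follows. It assumes that a technique counts as "applied to" an application when a paper mentions the former as a TECHNIQUE and the latter as an APPLICATION; the slide does not state the exact criterion. The paper dictionaries and their values are hypothetical toy data, not the paper's findings.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def techniques_for(papers: List[dict], application: str) -> Dict[int, Dict[str, int]]:
    """Per year, count the papers that mention `application` as an APPLICATION
    together with each concept mentioned as a TECHNIQUE."""
    table: Dict[int, Counter] = defaultdict(Counter)
    for paper in papers:
        if application in paper["applications"]:
            # Each technique counts at most once per paper (sets, not lists).
            table[paper["year"]].update(paper["techniques"])
    return {year: dict(counter) for year, counter in sorted(table.items())}


papers = [
    {"year": 2000, "techniques": {"decision tree", "crf"}, "applications": {"named entity recognition"}},
    {"year": 2000, "techniques": {"decision tree"}, "applications": {"named entity recognition"}},
    {"year": 2008, "techniques": {"crf"}, "applications": {"named entity recognition"}},
    {"year": 2008, "techniques": {"phrase-based", "mert"}, "applications": {"machine translation"}},
]
print(techniques_for(papers, "named entity recognition"))
```

Plotting each technique's count over the years then shows when a technique enters or drops out of use for that application.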
Conclusion
• This work proposed algorithms for identifying, categorizing and clustering mentions of scientific concepts.
• These tools can provide a fairly deep understanding of, and useful insight into, research communities.