Eric Yulianto A0069442B 22 February 2013
UROP Research Update: Citation Function Classification
Motivation
• Assist researchers during the paper review process.
• Allow quick categorization with a minimal amount of reading.
• Help prioritize the more important papers.
Problem
• Given a citation in a paper, what is the purpose of the citation?
• Readers need to repeatedly read a section of the paper to find out.
• The intention may not be obvious from the citation sentence alone.
Related Work
• Teufel et al., 2006
  • Features used: cue phrases, verb clusters, verb tense, modality, self-citation indicator.
  • Classifier: IBk (k-nearest neighbour algorithm).
  • Accuracy: 77%.
Related Work
• Angrosh et al., 2010
  • Recasts citation classification as sentence classification.
  • Covers the Related Work section only.
  • Features used: word category, presence of a citation in the previous sentence.
  • Classifier: conditional random field.
  • Generally performs well: accuracy of 96.51%.
  • Did not perform well on citation sentences.
Related Work
• Dong and Schafer, 2011
  • Features used:
    • Cue words.
    • Physical: Location, Popularity, Density, AvgDens.
    • Sentence syntax.
  • Classifier: ensemble-style self-training algorithm.
Current Progress (Analysis)
• Citation scheme
  • Adopts and modifies the scheme of Teufel et al., 2006.
  • Reduces the 12 classes to 4 classes:
    • Weakness
    • CompareContrast
    • Positive
    • Neutral
Current Progress (Analysis)
• Dataset
  • ANLP Conference papers from the ACL Anthology.
  • Citation context extracted from ParsCit output.
  • Distribution (609 citations):
    • Weakness: 30
    • CompareContrast: 72
    • Positive: 236
    • Neutral: 271
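The class counts above are heavily skewed, so it helps to know what accuracy a trivial classifier would reach. The following is a minimal sketch that only uses the four counts from the slide; the variable names are illustrative and not part of the original work.

```python
# Class counts as reported on the dataset slide.
counts = {
    "Weakness": 30,
    "CompareContrast": 72,
    "Positive": 236,
    "Neutral": 271,
}

total = sum(counts.values())

# Share of each class in the dataset.
shares = {label: n / total for label, n in counts.items()}

# A classifier that always predicts the most frequent class
# reaches an accuracy equal to that class's share.
majority_label = max(counts, key=counts.get)
majority_baseline = shares[majority_label]

for label, n in counts.items():
    print(f"{label:16s}{n:4d}  {shares[label]:6.1%}")
print(f"Majority baseline: always predict {majority_label} "
      f"({majority_baseline:.1%})")
```

Any reported accuracy on this dataset is easier to interpret when read against this majority-class baseline, and the small Weakness class suggests that per-class measures may be worth tracking alongside accuracy.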
Current Progress (Analysis)
• Classification algorithm
  • Weka implementations of Naive Bayes and SVM.
  • Uses a chi-square attribute selection filter.
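To illustrate what a chi-square attribute filter measures, here is a minimal pure-Python sketch that scores one term against the class labels using a presence/absence contingency table. It is not the Weka implementation; the function name `chi_square` and the four toy documents are assumptions made up for the example.

```python
def chi_square(docs, labels, term):
    """Chi-square statistic of term presence versus class label.

    docs   -- list of token sets, one per citation context
    labels -- list of class labels, aligned with docs
    term   -- the candidate feature to score
    """
    classes = sorted(set(labels))
    n = len(docs)

    # observed[present][cls] = number of documents of class cls
    # in which the term is present (True) or absent (False).
    observed = {
        True: {c: 0 for c in classes},
        False: {c: 0 for c in classes},
    }
    for tokens, label in zip(docs, labels):
        observed[term in tokens][label] += 1

    score = 0.0
    for present in (True, False):
        row_total = sum(observed[present].values())
        for c in classes:
            col_total = observed[True][c] + observed[False][c]
            expected = row_total * col_total / n
            if expected > 0:  # skip empty rows (term never/always present)
                score += (observed[present][c] - expected) ** 2 / expected
    return score


# Toy data, invented for illustration only.
docs = [
    {"however", "fails"},
    {"however", "limited"},
    {"we", "use"},
    {"we", "follow"},
]
labels = ["Weakness", "Weakness", "Positive", "Positive"]

# Rank the vocabulary by how strongly each term depends on the class.
vocabulary = sorted(set().union(*docs))
ranked = sorted(vocabulary, key=lambda t: chi_square(docs, labels, t),
                reverse=True)
```

A filter of this kind keeps only the highest-scoring terms, which is why it can shrink a cue-word vocabulary to the words that actually separate the classes.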
Current Progress (Analysis)
• Features used and tested:
  • Cue words
  • Cue words + chi-square filter
  • Word categories (Angrosh et al., 2010)
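As a rough illustration of the cue-word feature, the sketch below turns a citation context into a binary presence vector. The cue list `CUE_WORDS` and the function `cue_word_features` are hypothetical; the slides do not give the actual cue-word list or tokenization.

```python
import re

# Hypothetical cue words, chosen only for illustration.
CUE_WORDS = ["however", "unlike", "following", "extend", "fails"]


def cue_word_features(context):
    """Return one 0/1 feature per cue word for a citation context."""
    # Lowercase and keep alphabetic tokens only (a simplifying assumption).
    tokens = set(re.findall(r"[a-z]+", context.lower()))
    return [1 if cue in tokens else 0 for cue in CUE_WORDS]


features = cue_word_features(
    "However, their parser fails on long sentences."
)
```

Exact token matching is used here, so inflected forms such as "failed" would need either their own entries in the cue list or a stemming step.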
Ongoing Process
• Features extracted but not yet tested:
  • Physical features (Dong and Schafer, 2011)
    • Location
    • Density
    • Popularity
  • Author and title information
  • Publication year
Follow Up
• Add more features that can help differentiate the citation functions.
• Use a larger dataset.
• Split the classification into two stages:
  • Stage 1: use the metadata (physical features, author information, title information, publication year).
  • Stage 2: use the cue words to refine the classification.
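One possible reading of the two-stage idea is that the metadata stage narrows the set of candidate classes and the cue-word stage picks among them. The sketch below shows only that control flow. Everything in it is an assumption: the slides do not say how the stages are combined, and `toy_stage1` and `toy_stage2` are placeholder rules, not proposed classifiers.

```python
from typing import Callable, Dict, List

ALL_CLASSES = ["Weakness", "CompareContrast", "Positive", "Neutral"]


def classify_two_stage(
    metadata: Dict[str, object],
    context: str,
    stage1: Callable[[Dict[str, object]], List[str]],
    stage2: Callable[[str, List[str]], str],
) -> str:
    """Stage 1 narrows the candidates from metadata; stage 2 refines."""
    candidates = stage1(metadata)
    if len(candidates) == 1:
        return candidates[0]  # metadata alone was decisive
    return stage2(context, candidates)


def toy_stage1(metadata: Dict[str, object]) -> List[str]:
    # Placeholder rule (assumption): treat self-citations as never critical.
    if metadata.get("self_citation"):
        return ["Positive", "Neutral"]
    return list(ALL_CLASSES)


def toy_stage2(context: str, candidates: List[str]) -> str:
    # Placeholder rule (assumption): one cue word, else fall back to Neutral.
    if "however" in context.lower() and "Weakness" in candidates:
        return "Weakness"
    return "Neutral" if "Neutral" in candidates else candidates[0]


label = classify_two_stage(
    {}, "However, their method fails.", toy_stage1, toy_stage2
)
```

In a real system each stage would be a trained classifier; keeping the stages as interchangeable callables makes it straightforward to evaluate the metadata features and the cue words separately.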