“The Rise and Rise of Citation Analysis” Tanmoy Chakraborty, CNeRG, IIT-Kgp, India
Twofold Research Interests • Analyzing communities/clusters in complex networks • Studying different aspects of citation networks
“The Rise and Rise of Citation Analysis” In collaboration with Suhansanu Kumar, Pawan Gowel, Animesh Mukherjee, Niloy Ganguly
Mixed Sentiment • Sense and non-sense about citation analysis (*6860) --- T. Opthof, Cardiovascular research, 97 • The rise and rise of citation analysis (*1399) --- Lokman I. Meho, Phy. Res., 07 • Does citation pay? (*887) --- Fowler & Aksnes, Scientometrics, 07 • Think beyond citation analysis (*1009) --- Sarli et al., NIPS, 10
Raw Citation Count • Used to assess: quality of a paper; prominence of a researcher; success of a collaboration/group; quality of a conference/journal; quality of an institute; impact of a research area • Sooner or later, you will definitely be subjected to such an analysis, based only on citation count
Bibliometrics: Raw Citation Count • Journal Impact Factor • Immediacy Index • Eigenfactor • Altmetric • 5-year Impact Factor • Common assumption: Citation
Publication Universe • Crawled entire Microsoft Academic Search • Papers only in Computer Science domain • Basic preprocessing
Citation Profile • An exhaustive analysis of the citation profiles • Papers with at least 10 years of citation history; at most 20 years of history are considered • The entries of each citation profile are scaled to the range 0-1 • Peak-detection heuristics: each peak must be at least 75% of the maximum peak; two consecutive peaks must be separated by at least 3 years
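The heuristics on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the division-by-maximum scaling, the local-maximum test, and the "keep the tallest peak first" rule for enforcing the 3-year gap are all assumptions made for the example.

```python
def scale_profile(citations):
    """Scale yearly citation counts into [0, 1] by dividing by the maximum.

    Assumption: the slide only says "between 0-1"; max-scaling is one option.
    """
    top = max(citations)
    if top == 0:
        return [0.0 for _ in citations]
    return [c / top for c in citations]


def find_peaks(citations, min_rel_height=0.75, min_gap=3,
               min_years=10, max_years=20):
    """Return the year indices (0 = publication year) of accepted peaks.

    citations: yearly citation counts, one entry per year after publication.
    """
    if len(citations) < min_years:
        raise ValueError("profile needs at least %d years of history" % min_years)

    # Use at most `max_years` of history, then scale to [0, 1].
    profile = scale_profile(citations[:max_years])
    n = len(profile)

    # Candidate peaks: local maxima that reach at least 75% of the tallest
    # point (the tallest point equals 1.0 after scaling).
    candidates = []
    for i, value in enumerate(profile):
        left = profile[i - 1] if i > 0 else float("-inf")
        right = profile[i + 1] if i < n - 1 else float("-inf")
        if value > left and value >= right and value >= min_rel_height:
            candidates.append(i)

    # Enforce the minimum separation, preferring taller peaks (assumed rule).
    kept = []
    for i in sorted(candidates, key=lambda idx: (-profile[idx], idx)):
        if all(abs(i - j) >= min_gap for j in kept):
            kept.append(i)
    return sorted(kept)
```

For instance, `find_peaks([2, 10, 4, 3, 2, 9, 3, 1, 1, 0, 0, 1])` keeps both local maxima because they are four years apart, whereas two tall maxima only two years apart collapse to the taller one.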
Five Universal Citation Profiles • [Figure: average behavior of each category; Q1 and Q3 represent the first and third quartiles of the data points, respectively] • Another category, ‘Oth’: papers with fewer than one citation per year on average
Immediate Questions • Is the Journal Impact Factor (JIF) formula correct? • JIF for the year 2000 (Eugene Garfield, 1975): the number of citations received in 2000 by papers published in the journal in 1998 and 1999, divided by the number of papers published in the journal in 1998 and 1999
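As a worked illustration of the definition on this slide, the sketch below computes the JIF from two per-year tallies. The function name, the dictionary layout, and the numbers in the usage note are hypothetical; only the ratio itself comes from the slide.

```python
def journal_impact_factor(citations_in_year, papers_published, year, window=2):
    """Journal Impact Factor for `year` over the preceding `window` years.

    citations_in_year: maps a publication year to the number of citations
        received during `year` by the journal's papers from that year.
    papers_published: maps a publication year to the number of papers the
        journal published in that year.
    """
    # For year=2000 and window=2 this covers 1998 and 1999.
    years = range(year - window, year)
    cites = sum(citations_in_year.get(y, 0) for y in years)
    papers = sum(papers_published.get(y, 0) for y in years)
    if papers == 0:
        raise ValueError("no papers published in the window")
    return cites / papers
```

With made-up counts, `journal_impact_factor({1998: 120, 1999: 80}, {1998: 50, 1999: 50}, 2000)` evaluates to 200 / 100 = 2.0. Leaving `window` as a parameter makes the "why last 2 years?" question easy to experiment with.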
Immediate Questions • Importance of the recent papers in the current time period? • Relevance of the journal itself in the current time period? • Why not all the citations at the current time? • Why the last 2 years?
Immediate Questions • JIF overlooks the importance of Peak_Late and MonIncr • [Figure labels: Over-consider, Over-consider, Under-consider]
More on the Categories 1. Are they biased by the year of publication (aging factor)? Ans: No [Figure label: Same age]
More on the Categories 2. Are they biased toward particular journals/conferences?
More on the Categories 3. Are they affected by self-citation? [Figure: transition matrix showing how papers move between categories after self-citations are removed; annotations mark the categories least affected and most affected by self-citation]
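A transition matrix of this kind could be built along the following lines. This is only a sketch under assumptions: the function name, the row-normalized layout, and the toy labels in the usage note are illustrative, and nothing here reproduces the values shown on the slide.

```python
from collections import Counter


def transition_matrix(before, after, categories):
    """Row-normalized category transitions.

    before[i] and after[i] are the category labels of paper i with and
    without self-citations. matrix[src][dst] is the fraction of papers that
    were in `src` before and ended up in `dst` after.
    """
    if len(before) != len(after):
        raise ValueError("label lists must have the same length")

    counts = Counter(zip(before, after))
    matrix = {}
    for src in categories:
        row_total = sum(counts[(src, dst)] for dst in categories)
        matrix[src] = {
            dst: (counts[(src, dst)] / row_total if row_total else 0.0)
            for dst in categories
        }
    return matrix
```

With toy labels such as `before = ["Peak_Int", "Peak_Int", "Peak_Mul"]` and `after = ["Peak_Int", "Peak_Mul", "Peak_Mul"]`, the `Peak_Int` row splits evenly between `Peak_Int` and `Peak_Mul`. A category whose row stays concentrated on the diagonal is one that self-citations barely affect.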
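More on the Categories 4. What about Peak_Mul? [Figure: citation profiles of Peak_Int, Peak_Mul and Peak_Late against years after publication, annotated with average peak heights and times. Annotations recoverable from the slide: Peak_Int 5.1; Time 4.2; Peak_Mul 3.1 + 2.5 = 5.6; Avg Peak Height (12.1 - 5.3) = 6.8 ~ (10.8 - 4.2); Peak_Late 5.3, 10.8] Peak_Mul might be an intermediary between Peak_Int and Peak_Late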
Where does this classification help? • To improve bibliometrics in scientific research • Various prediction systems: future citation prediction; predicting emerging fields/topics; predicting future star/seminal papers • Paper search and recommender systems
On predicting Future Citation Count at the Time of Publication
Traditional Framework (Yan et al., JCDL, 2012)
Problems in Traditional Framework • Consider initial few years’ statistics after publication • Proved to be very effective • Lack of time dimension in prediction • Suffers a lot from outlier points during regression
Problems in Traditional Framework: How to Tackle • Consider initial few years’ statistics after publication • Proved to be very effective • Lack of time dimension in prediction • Suffers a lot from outlier points during regression • Try to predict citations as early as possible (maybe at the time of publication) • Should consider the time dimension • Reduce outlier points as much as possible
Performance Evaluation • Coefficient of determination (R²): the higher, the better • Mean squared error (θ): the lower, the better • Pearson correlation coefficient (ρ): the higher, the better
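For reference, the three measures can be computed as in the sketch below, using their standard textbook definitions. The function names and the toy numbers in the usage note are illustrative and are not taken from the talk.

```python
import math


def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted counts."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values are constant; R^2 is undefined")
    return 1 - ss_res / ss_tot


def pearson(actual, predicted):
    """Pearson correlation coefficient between actual and predicted counts."""
    mean_a = sum(actual) / len(actual)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_a * var_p)
```

Predicting `[2, 3, 4, 5]` for actual counts `[1, 2, 3, 4]` gives a perfect correlation but a mean squared error of 1 and an R² of only about 0.2, which is one reason to report the three measures together.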
Performance of SVM [Figure: confusion matrix]
Conclusion • Five universal citation profiles • Different analyses of these categories • Can help to reframe existing bibliometrics • Can serve as a generic approach in machine learning • Can enhance the performance of existing systems