
“The Rise and Rise of Citation Analysis”

Tanmoy Chakraborty, CNeRG, IIT-Kgp, India. Twofold research interests: analyzing communities/clusters in complex networks, and studying different aspects of citation networks.


Presentation Transcript


  1. “The Rise and Rise of Citation Analysis” Tanmoy Chakraborty, CNeRG, IIT-Kgp, India

  2. Twofold Research Interests • Analyzing communities/clusters in complex networks • Studying different aspects of citation networks

  3. “The Rise and Rise of Citation Analysis” In collaboration with Suhansanu Kumar, Pawan Gowel, Animesh Mukherjee, Niloy Ganguly

  4. Mixed Sentiment • Sense and non-sense about citation analysis (*6860) --- T. Opthof, Cardiovascular research, 97 • The rise and rise of citation analysis (*1399) --- Lokman I. Meho, Phy. Res., 07 • Does citation pay? (*887) --- Fowler & Aksnes, Scientometrics, 07 • Think beyond citation analysis (*1009) --- Sarli et al., NIPS, 10

  5. Raw Citations Count • To assess • Quality of a paper • Prominence of a researcher • Success of a collaboration/group • Quality of a conference/journal • Quality of an Institute • Impact of a research area Sooner or later, you will definitely be subjected to such an analysis Only Citation Count

  6. Bibliometrics: Raw Citation Count • Journal Impact factor • Immediacy factor • Eigen factor • Altmetric • 5 years Impact factor Common assumption Citation

  7. Publication Universe • Crawled entire Microsoft Academic Search • Papers only in Computer Science domain • Basic preprocessing

  8. Publication Universe

  9. Citation Profile An exhaustive analysis of the citation profiles • Consider papers with at least 10 yrs of history, and use at most 20 yrs of history • Scale the entries of the citation profile between 0-1 • Use peak-detection heuristics • Each peak should be at least 75% of the max peak • Two consecutive peaks should be separated by at least 3 yrs
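
The heuristics on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the choice to scale by dividing by the maximum yearly count, the local-maximum test, and the "keep the taller peak" tie-breaking rule are all assumptions. Filtering out papers with fewer than 10 years of history is left to the caller.

```python
from typing import List


def scale_profile(citations: List[float]) -> List[float]:
    """Scale yearly citation counts into [0, 1] (assumption: divide by the max)."""
    top = max(citations, default=0)
    if top == 0:
        return [0.0 for _ in citations]
    return [c / top for c in citations]


def detect_peaks(
    citations: List[float],
    min_rel_height: float = 0.75,  # each peak >= 75% of the max peak
    min_gap: int = 3,              # consecutive peaks >= 3 years apart
    max_years: int = 20,           # use at most 20 years of history
) -> List[int]:
    """Return the indices (years after publication) of the detected peaks."""
    profile = scale_profile(citations[:max_years])
    n = len(profile)

    # Step 1: candidate peaks are local maxima that are tall enough.
    candidates = []
    for i, value in enumerate(profile):
        left = profile[i - 1] if i > 0 else float("-inf")
        right = profile[i + 1] if i < n - 1 else float("-inf")
        if value >= left and value > right and value >= min_rel_height:
            candidates.append(i)

    # Step 2: enforce the minimum gap, preferring taller peaks
    # (hypothetical tie-breaking rule; earlier year wins on equal height).
    kept: List[int] = []
    for i in sorted(candidates, key=lambda idx: (-profile[idx], idx)):
        if all(abs(i - j) >= min_gap for j in kept):
            kept.append(i)
    return sorted(kept)
```

Under these assumptions, a profile with one qualifying peak in the first few years would correspond to an early-peak category, and a profile with several qualifying peaks to Peak_Mul; the exact mapping from peaks to categories is not spelled out in the transcript.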

  10. Five Universal Citation Profiles Avg. behavior Q1 and Q3 represent the first and third quartiles of the data points respectively. Another category: ‘Oth’ => having fewer than one citation (on avg) per year

  11. Five Universal Citation Profiles: A deeper look

  12. Immediate Questions • Is the Journal Impact Factor (JIF) formula correct? • JIF at year 2000 (Eugene Garfield, 1975): number of citations received in 2000 by papers published in that journal in 1998 and 1999, divided by the number of papers published in the journal in 1998 and 1999
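
The JIF definition on this slide can be written compactly as below. The symbols are notation introduced here for illustration and do not appear in the slides: C_t(y) is the number of citations received in year t by papers the journal published in year y, and N_y is the number of papers the journal published in year y.

```latex
\mathrm{JIF}_{2000} = \frac{C_{2000}(1998) + C_{2000}(1999)}{N_{1998} + N_{1999}},
\qquad
\mathrm{JIF}_{t} = \frac{C_{t}(t-1) + C_{t}(t-2)}{N_{t-1} + N_{t-2}}
```

The second form simply generalizes the year-2000 case to an arbitrary year t, which makes the two-year window questioned on the following slides explicit.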

  13. Immediate Questions • Importance of the recent papers in current time period? • Relevance of the journal itself in current time period • Why not all the citations at current time • Why last 2 years?

  14. Immediate Questions • JIF overlooks the importance of Peak_Late and MonIncr Over-consider Over-consider Under-consider

  15. More on the Categories 1. Are they biased by the year of publication? (Aging factor) Ans: No (figure label: Same age)

  16. More on the Categories 2. Are they biased on Journals/conferences?

  17. More on the Categories 3. Are they affected by self-citation? Transition matrix showing the transition of categories after removing self citations Least affected by self citations Most affected by self citation
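
A transition matrix of this kind could be computed along the following lines. This is only a sketch under stated assumptions: the function name, the dictionary-based input format, and the row normalization are hypothetical, and the transcript does not say how the authors built their matrix.

```python
from collections import Counter, defaultdict
from typing import Dict


def transition_matrix(
    before: Dict[str, str],  # paper id -> category with self-citations included
    after: Dict[str, str],   # paper id -> category after removing self-citations
) -> Dict[str, Dict[str, float]]:
    """Row-normalized matrix: matrix[old][new] is the fraction of papers
    in category `old` that end up in category `new`."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for paper, old_category in before.items():
        if paper in after:  # ignore papers missing from the second labeling
            counts[old_category][after[paper]] += 1

    matrix: Dict[str, Dict[str, float]] = {}
    for old_category, row in counts.items():
        total = sum(row.values())
        matrix[old_category] = {new: n / total for new, n in row.items()}
    return matrix
```

With such a matrix, a category whose diagonal entry is close to 1 would be one that self-citations barely affect, while a small diagonal entry would indicate a category that papers frequently leave once self-citations are removed.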

  18. More on the Categories 4. What about Peak_Mul? [Figure: average peak height and peak time (years after publication) for Peak_Int, Peak_Mul, and Peak_Late. Figure annotations: 3.1 + 2.5 = 5.6; (12.1 - 5.3) = 6.8 ~ (10.8 - 4.2)] Peak_Mul might be an intermediary between Peak_Int and Peak_Late

  19. Where does this classification help? • To improve bibliometrics in scientific research • Various prediction systems • Future citation prediction system • Predicting emerging fields/topics • Predicting future star/seminal papers • Paper search and recommender systems

  20. On predicting Future Citation Count at the Time of Publication

  21. Problem Definition

  22. Traditional Framework Yan et al., JCDL, 2012

  23. Problems in Traditional Framework • Consider initial few years’ statistics after publication • Proved to be very effective • Lack of time dimension in prediction • Suffers a lot from outlier points during regression

  24. Problems in Traditional Framework: how to tackle • Consider initial few years’ statistics after publication • Proved to be very effective • Lack of time dimension in prediction • Suffers a lot from outlier points during regression • Try to predict citations as early as possible • (maybe at the time of publication) • Should consider the time dimension • Reduce outlier points as much as possible

  25. Our Framework: 2-stage Model

  26. Features

  27. Performance Evaluation • Coefficient of determination (R²): the higher, the better • Mean squared error (θ): the lower, the better • Pearson correlation coefficient (ρ): the higher, the better
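
For reference, the three measures on this slide can be computed as in the sketch below, using their standard textbook definitions. The function names are illustrative, and `actual` and `predicted` stand for true and predicted citation counts; nothing here is taken from the authors' code.

```python
from math import sqrt
from typing import Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot (higher is better)."""
    mean_actual = _mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error, written as theta on the slide (lower is better)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def pearson(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation coefficient rho (higher is better)."""
    mean_a, mean_p = _mean(actual), _mean(predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_a * var_p)
```

Note that a predictor can have a perfect ρ and still a poor R²: predictions that are consistently off by a constant remain perfectly correlated with the truth, which is one reason to report all three measures together.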

  28. Performance of SVM Confusion Matrix

  29. Performance of Regression Model

  30. Feature Analysis

  31. Conclusion • Five universal citation profiles • Different analysis on these categories • Can help to reframe existing bibliometrics • Can be a generic way in machine learning • Can enhance the performance of the existing systems

  32. Thank You
