This research proposes a modified version of the PageRank algorithm to rank research papers based on their relevance and quality, considering factors such as the time of publication and the conferences/journals they are published in. It also explores co-authorship networks to identify highly influential authors in specific research fields.
An Efficient Method to Recommend Research Papers and Highly Influential Authors. VIRAJITHA KARNATAPU
Research Idea Overview: • Research output is rising, and advances in technology such as digital libraries and online journals provide access to a great many of these research papers. • But how easy do they make it for a new researcher to find relevant papers on topics of interest? • This is one of the most interesting and challenging problems of recent times.
Generalized Solution: • How do you identify a high-quality paper on a particular topic of interest? • One well-known answer is citation networks, where the value of a paper is determined by the number of citations it receives. • If a paper is cited by many other research papers, it is likely to be considered highly important.
Then what’s new? • A lot of research has been done, and much is still going on, in this field: the analysis of citation networks. • This is because the citations a research paper receives depend on various factors, such as when the paper was published and the type of journal/conference it was published in.
Traditional Methods: • Traditional metrics such as the h-index, g-index and impact factor are used to measure impact from citation counts. • All of them are based on the quantity of citations. • Each metric is different and has its own advantages and disadvantages.
Traditional Methods (Contd.,) • For example, the h-index does not account for self-citations, which also influence a paper’s rank to some extent. • Moreover, none of these metrics takes into account the quality of the citing papers, when the paper was published, or the type of conference/journal it was published in. • These metrics take into account only the number of citations.
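To make the point about quantity-only metrics concrete, here is a minimal Python sketch of the h-index and g-index using their standard definitions. The function names and the sample citation counts are illustrative assumptions, not data from this study, and this variant of the g-index is capped at the number of papers.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers have at least g*g citations in total."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g


# Two hypothetical authors with different citation profiles.
profile_a = [10, 8, 5, 4, 3]
profile_b = [4, 4, 4, 4]
print(h_index(profile_a), g_index(profile_a))  # 4 5
print(h_index(profile_b), g_index(profile_b))  # 4 4
```

Both profiles get the same h-index, and neither function ever looks at who cited the papers, when they were published, or where.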
Solution? • Because of these disadvantages, the metrics cannot be relied on in every case; a more general algorithm is needed that gives a more accurate result. • Hence, I use a modified version of the PageRank algorithm to rank the research papers, taking into consideration when each paper was published and the type of conference it was published in.
Why consider time? • Time is one of the most important factors missing from previous research. • Recently published papers have had less time to be cited, so they have fewer citations, which lowers their rank. • So, I include time as a factor in my algorithm, which is intended to give a more accurate ranking of the research papers.
Why consider conference/journal? • The higher the requirements of a journal/conference, the higher the quality of the papers it publishes. • So, based on the ranks of the research papers, we rank the conferences/journals. • Taking the ranks of both papers and conferences, we calculate an authoritative score for each author, based on which we rank the authors.
Solution (Contd.,) • Alongside ranking authors from their research papers, we can construct a co-authorship network for the same dataset and use the same algorithm to rank the authors. • But this requires a lot of effort, as almost every research paper written today is multi-authored.
Solution (contd.,) • Co-authorship networks are similar to citation networks in all aspects, except that the nodes are authors and the edges represent scientific collaboration between two authors. • The co-authorship network also helps us to find the collaborations in a research community.
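As a rough illustration of how such a network could be built, the Python sketch below turns a list of author lists into weighted, undirected co-authorship edges. The function name `coauthorship_edges`, the choice of "number of joint papers" as the edge weight, and the author names are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def coauthorship_edges(papers):
    """Count joint papers for every pair of co-authors.

    papers: iterable of author-name lists, one list per paper.
    Returns a Counter mapping (author_a, author_b) to the number of joint papers,
    with each pair stored in alphabetical order so the edge is undirected.
    """
    edges = Counter()
    for authors in papers:
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical author lists for three papers.
papers = [["Ann", "Bob", "Cy"], ["Ann", "Bob"], ["Cy"]]
edges = coauthorship_edges(papers)
print(edges[("Ann", "Bob")])  # 2
```

A paper with k authors contributes k(k-1)/2 edges, which is one reason the construction becomes costly when most papers are multi-authored.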
Solution - overview Ranking authors individually, based on their research papers and the journals they are published in, and finding the highly influential authors from the co-authorship network together help us identify the most qualified authors in a particular research field.
Missing pieces of study in previous research: • All the metrics use citation count as the parameter that determines how important a paper is, but this gives only an approximation of the paper’s importance rather than the actual picture. • So, it’s not only the quantity of citations that matters, but also their quality.
Contd., • So, the rank of a research paper can be measured by taking into account the importance of the citing papers rather than just the citation count. • This provides the basis for using the PageRank algorithm to determine the ranks of research papers.
PageRank algorithm • This is one of the most widely known algorithms for ranking web pages in search engines. • Proposed by Larry Page and Sergey Brin, it was used in the Google search engine. • It not only counts the inlinks to a node, but also considers the number of outlinks of each linking node, to determine the quality and importance of the node.
General version of the PageRank algorithm In general, the PageRank algorithm is expressed as: PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)) • PR(A) is the PageRank of page A, • PR(Ti) is the PageRank of each page Ti that links to page A, • C(Ti) is the number of outbound links on page Ti and • d is a damping factor which can be set between 0 and 1.
Contd., • The PageRank algorithm does not rank a website as a whole; a rank is determined for each page individually. • Further, the rank of a page A is determined recursively by the ranks of the pages that link to A.
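The recursive definition can be evaluated by simple repeated substitution. The Python sketch below applies the general formula PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)) to a small link graph; the function name, the fixed iteration count and the three-page example are assumptions chosen for illustration.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over all pages T linking to A.

    links: dict mapping each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    inbound = {p: [] for p in pages}
    for source, targets in links.items():
        for target in set(targets):
            inbound[target].append(source)
    out_count = {p: len(set(links.get(p, []))) for p in pages}

    pr = {p: 1.0 for p in pages}  # arbitrary starting values
    for _ in range(iterations):
        # Every new value is computed from the previous round's values.
        pr = {
            p: (1 - d) + d * sum(pr[q] / out_count[q] for q in inbound[p])
            for p in pages
        }
    return pr


# A links to B and C, B links to C, C links to A.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]}, d=0.5)
print({page: round(score, 4) for page, score in sorted(ranks.items())})
# {'A': 1.0769, 'B': 0.7692, 'C': 1.1538}
```

Page C ends up highest because it receives B's entire vote plus half of A's, which shows how the outlink count C(Ti) dilutes each vote.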
Contd., • We modify the PageRank algorithm to take into account the importance of the journal or conference in which each paper is published. • We further modify it to take the publication date of each paper into account, and then rank the papers and, in turn, the authors.
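The slides do not give the exact form of these two modifications, so the Python sketch below is only one possible reading, not the method used in this work. It assumes that (a) each citing paper's vote is scaled by a venue weight in (0, 1], and (b) the final score is divided by an age factor 1 - exp(-decay * age) that is small for recent papers. The names `venue_weight`, `decay` and `adjusted_rank` are hypothetical.

```python
import math


def adjusted_rank(citations, venue_weight, year, current_year,
                  d=0.85, decay=0.1, iterations=50):
    """PageRank-style score with assumed venue weighting and age compensation.

    citations: dict mapping each citing paper to the list of papers it cites.
    venue_weight: dict paper -> weight in (0, 1] for its venue (assumed given).
    year: dict paper -> publication year.
    """
    papers = set(citations) | {c for cited in citations.values() for c in cited}
    cited_by = {p: [] for p in papers}
    for citing, cited in citations.items():
        for target in set(cited):
            cited_by[target].append(citing)
    out_count = {p: len(set(citations.get(p, []))) for p in papers}

    pr = {p: 1.0 for p in papers}
    for _ in range(iterations):
        # Assumption (a): a citation from a stronger venue carries more weight.
        pr = {
            p: (1 - d) + d * sum(venue_weight[q] * pr[q] / out_count[q]
                                 for q in cited_by[p])
            for p in papers
        }

    # Assumption (b): compensate recent papers for having had less time to be cited.
    scores = {}
    for p in papers:
        age = current_year - year[p] + 1  # at least 1, so the factor is never zero
        scores[p] = pr[p] / (1 - math.exp(-decay * age))
    return scores
```

With identical citation structure and venue weights, the more recent of two papers receives the higher score under this sketch; with identical years, the paper cited from the higher-weighted venue does.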
Co-authorship networks: • Now, why does co-authorship come into the picture? • Co-authorship helps us find not only the ranks of highly influential authors but also the scientific collaborations. • But do these scientific collaborations matter? Are they of any help? • Of course, yes!
Contd., • They help us determine the collaborations among different scientific communities in a particular field of research. • This in turn helps a new researcher see how far research has progressed in a particular field and the direction it is taking.
Contd., • Collaboration between different communities provides the opportunity to respond to increasing specialization by combining the knowledge and skills of various researchers. • It also helps different communities share complex infrastructure by bringing them together, thus reducing the cost of research.
Contd., • Even in co-authorship networks, it’s the quality that should matter and not the quantity. • One famous example is the Erdős Number Project, a co-authorship network analysis, which points out the importance of quality: what matters is not the number of authors you publish with but rather whom you publish with.
Research Idea - Conclusion There is a mutually reinforcing relationship between the ranks of papers and authors. But as of today, co-authorship networks have received relatively less attention than citation networks. In this project, I would like to determine the ranks of authors from citation networks and a list of highly influential authors from co-authorship networks, across various disciplines in Computer Science.
Approach: • DATASET: • Extract the data from the DBLP website. • DBLP is a well-known website that tracks journal articles, conference papers and other publications in Computer Science. • Algorithm implementation in MATLAB.
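As a sketch of the extraction step, the Python example below reads publication records from a small XML snippet shaped like DBLP's XML export, where (to my understanding) each record is an element such as `article` or `inproceedings` with `author`, `title`, `year` and `journal` or `booktitle` children. Python is used here only for illustration, since the planned implementation is in MATLAB. The sample records are invented, and the real dump relies on a DTD with character entities that this simplified parser does not handle.

```python
import xml.etree.ElementTree as ET

# Invented records in a DBLP-like layout (not real DBLP data).
SAMPLE = """<dblp>
  <article key="journals/example/Doe10">
    <author>Jane Doe</author>
    <author>John Roe</author>
    <title>An Example Article.</title>
    <year>2010</year>
    <journal>Example Journal</journal>
  </article>
  <inproceedings key="conf/example/Roe12">
    <author>John Roe</author>
    <title>An Example Conference Paper.</title>
    <year>2012</year>
    <booktitle>Example Conf</booktitle>
  </inproceedings>
</dblp>"""


def parse_records(xml_text):
    """Return one dict per publication with key, authors, title, year and venue."""
    records = []
    for elem in ET.fromstring(xml_text):
        # Journal articles carry <journal>; conference papers carry <booktitle>.
        venue = elem.findtext("journal") or elem.findtext("booktitle")
        records.append({
            "key": elem.get("key"),
            "authors": [a.text for a in elem.findall("author")],
            "title": elem.findtext("title"),
            "year": int(elem.findtext("year")),
            "venue": venue,
        })
    return records


records = parse_records(SAMPLE)
for record in records:
    print(record["year"], record["venue"], record["authors"])
```

The author lists feed the co-authorship network, while the year and venue fields are the inputs needed for the time and conference/journal factors.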