Towards “Unbiased” Ranking of Scientific Literature

Speaker: Hai Zhuge
Authors: Xiaorui Jiang, Xiaoping Sun and Hai Zhuge
Knowledge Grid Research Group, Institute of Computing Technology, Chinese Academy of Sciences, China
ACM CIKM 2012, Hawaii, USA

Outline
• Introduction
  • Definition and source of “ranking bias”
  • Analysis of “ranking bias”
• Method
  • Intra-network ranking
  • Inter-network ranking
• Results
  • Dataset and benchmarks
  • Recommendation intensity on papers and researchers
  • Recommendation sensitivity on papers
  • Venue ranking
• Conclusion
H. Zhuge, ICT, CAS

Ranking Biases
• Ranking can help find important papers and researchers.
• PageRank favors old papers.
• HITS favors new papers.
[Figure: example citation network over papers p0–p6]
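The slide states the biases without showing the mechanism. The minimal Python sketch below illustrates only the PageRank half, on a hypothetical seven-paper citation graph (`CITES`: p0 is the oldest, p6 the newest, and papers cite only older ones); it is not the graph in the slide's figure. Because citations point backward in time, score mass drains toward older papers. The damping factor 0.85 and the uniform redistribution of dangling nodes' scores are common PageRank conventions, assumed here.

```python
def pagerank(out_links, n, d=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank over nodes 0..n-1.

    out_links maps a node to the nodes it links to (here: the papers it cites).
    Nodes with no out-links (dangling) spread their score uniformly.
    """
    r = [1.0 / n] * n
    for _ in range(max_iter):
        dangling = sum(r[u] for u in range(n) if not out_links.get(u))
        new = [(1 - d) / n + d * dangling / n] * n
        for u, targets in out_links.items():
            if not targets:
                continue  # already counted as dangling
            share = d * r[u] / len(targets)
            for v in targets:
                new[v] += share
        if sum(abs(a - b) for a, b in zip(new, r)) < tol:
            return new
        r = new
    return r


# Hypothetical citation graph: paper i cites the listed (older) papers.
CITES = {1: [0], 2: [0, 1], 3: [0, 2], 4: [1, 3], 5: [3, 4], 6: [4, 5]}
scores = pagerank(CITES, n=7)
ranking = sorted(range(7), key=lambda p: scores[p], reverse=True)
print(ranking)  # → [0, 1, 3, 2, 4, 5, 6]: oldest first, newest last
```

On this toy graph p6 ranks last simply because no paper has had time to cite it.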

Example of Ranking Bias
• Experiments on the ACL Anthology
• Paper J99-2002, an article published in Computational Linguistics in 1999, is ranked 4th by PageRank.
• PageRank: only 3 papers out of 19 are from after 2000!
• HITS: only 7 papers out of 27 are from before 2000, and none are from the 1980s!

Time Distribution
[Figure: publication-period distribution of the top-ranked papers for each algorithm]
• Our method ensures a much “fairer” balance between different time periods.

Data Model
• Uses metadata information only
• 6 inter-network transition matrices:
  • Paper-Researcher Network: PR, RP (= PR^T)
  • Paper-Venue Network: PV, VP (= PV^T)
  • Researcher-Venue Network: RV, VR (= RV^T)
[Figure: example metadata (venues v1: p1; v2: p2, p3; v3: p4; researchers r1–r5) and the derived Paper Influence Network (PIN), Researcher Influence Network (RIN) and Venue Influence Network (VIN)]
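The six matrices can be derived from paper metadata alone. The Python sketch below assumes that PR and PV start as 0/1 incidence matrices, that RV counts how many of a researcher's papers appear in each venue (our assumption), that RP, VP and VR are the transposes of these raw matrices, and that every matrix is then row-normalized into a transition matrix. The venue assignment follows the slide's example; the author lists are hypothetical.

```python
from collections import defaultdict


def transpose(m):
    """Transpose a sparse dict-of-dicts matrix."""
    t = defaultdict(dict)
    for row, cols in m.items():
        for col, w in cols.items():
            t[col][row] = w
    return dict(t)


def row_normalize(m):
    """Scale every row to sum to 1, turning counts into transition probabilities."""
    out = {}
    for row, cols in m.items():
        total = sum(cols.values())
        out[row] = {c: w / total for c, w in cols.items()}
    return out


def build_transitions(paper_authors, paper_venue):
    """Return the six inter-network transition matrices PR, RP, PV, VP, RV, VR."""
    pr = {p: {r: 1 for r in authors} for p, authors in paper_authors.items()}
    pv = {p: {v: 1} for p, v in paper_venue.items()}
    rv = defaultdict(lambda: defaultdict(int))
    for p, authors in paper_authors.items():
        for r in authors:
            rv[r][paper_venue[p]] += 1  # assumption: count r's papers in each venue
    rv = {r: dict(venues) for r, venues in rv.items()}
    raw = {
        "PR": pr, "RP": transpose(pr),
        "PV": pv, "VP": transpose(pv),
        "RV": rv, "VR": transpose(rv),
    }
    return {name: row_normalize(m) for name, m in raw.items()}


VENUES = {"p1": "v1", "p2": "v2", "p3": "v2", "p4": "v3"}  # from the slide's example
AUTHORS = {"p1": ["r1", "r2"], "p2": ["r2", "r3"], "p3": ["r3"], "p4": ["r4", "r5"]}  # hypothetical
T = build_transitions(AUTHORS, VENUES)
print(T["VP"]["v2"])  # {'p2': 0.5, 'p3': 0.5}
print(T["VR"]["v2"])  # r2 has one paper in v2, r3 has two
```

Normalizing after transposing means RP is the row-normalized transpose of the raw paper-researcher incidence, which is one reading of the slide's "RP (= PR^T)".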

Intra-Network Ranking
• paut: authority vector of papers in the PIN
• psnd: hub vector of papers in the PIN
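The slide's formulas for paut and psnd were not recoverable. As an illustration only, the Python sketch below computes analogous authority and hub vectors on a hypothetical citation graph with a randomized-HITS-style iteration in the spirit of RHITS (Ng et al., 2001), the baseline listed in the experiment setup. The jump probability `eps = 0.15` and the per-step rescaling are assumptions, and MutualRank's own intra-network rule may differ.

```python
def randomized_hits(out_links, n, eps=0.15, tol=1e-12, max_iter=1000):
    """Randomized-HITS-style authority and hub scores over nodes 0..n-1.

    Authority flows from citing papers (hubs) to cited papers; hub score flows
    back from cited papers. A uniform eps/n jump keeps every score positive.
    Both vectors are rescaled to sum to 1 after each half-step.
    """
    indeg = [0] * n
    for targets in out_links.values():
        for v in targets:
            indeg[v] += 1
    auth = [1.0 / n] * n
    hub = [1.0 / n] * n
    for _ in range(max_iter):
        new_auth = [eps / n] * n
        for u, targets in out_links.items():
            for v in targets:
                new_auth[v] += (1 - eps) * hub[u] / len(targets)
        total = sum(new_auth)
        new_auth = [x / total for x in new_auth]
        new_hub = [eps / n] * n
        for u, targets in out_links.items():
            for v in targets:
                new_hub[u] += (1 - eps) * new_auth[v] / indeg[v]
        total = sum(new_hub)
        new_hub = [x / total for x in new_hub]
        delta = sum(abs(a - b) for a, b in zip(new_auth + new_hub, auth + hub))
        auth, hub = new_auth, new_hub
        if delta < tol:
            break
    return auth, hub


CITES = {1: [0], 2: [0, 1], 3: [0, 2], 4: [1, 3], 5: [3, 4], 6: [4, 5]}  # hypothetical
authority, hub = randomized_hits(CITES, n=7)
```

Papers that nobody cites keep only the jump share of authority, and papers that cite nothing keep only the jump share of hub score; on this toy graph those are p6 and p0.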

Iterative Ranking (1-)psndt) (1-)paut(t) rimp(t) (1-)(1-)rimp(t)/2 (1-)(1-)rimp(t)/2 paut(t) psnd(t) PIN: psnd PIN: paut RIN: rimp (1-)(1-)psnd(t)/2 (1-)(1-)paut(t)/2 (1-)vprs(t) (1-)rimp(t) (1-)(1-)psnd(t)/2 (1-)(1-)paut(t)/2 VIN: vprs (1-)(1-)vprs(t)/2 (1-)(1-)vprs(t)/2 vprs(t) H. Zhuge, ICT, CAS

Experiment Setup
• ACL Anthology Network (AAN), up to March 2011
  • 18,041 papers; 14,386 researchers; 273 venues
• Benchmarks:
  • BenchP: 227 papers collected from the reading lists of natural language processing or computational linguistics courses at 15 top universities
  • BenchR1: the corresponding researchers (authors)
  • BenchR2: the top-100 most-cited researchers in AAN
• Compared algorithms:
  • PageRank
  • RHITS (randomized version of HITS by Ng et al., 2001)
  • CoRank: a generalized algorithm using the PIN and the Researcher Collaboration Network (RCN)
  • FutureRank and P-Rank are not compared because they do not use the RCN/RIN
  • MutualRank: the method of this paper

Glance at MutualRank Results
• 19 of the 36 papers (16 from the 1990s and 3 from the 1980s) are from before 2000, which better reflects reality.
[Figure: MutualRank Top-100]

CoRank Is Similar to PageRank
[Figure: top-15 relevant papers returned by CoRank and by PageRank; the two lists are quite similar]

Recommendation Intensity on Papers: RI(P)@k
• P: the top-k papers returned by an algorithm
• For each paper p in P

Recommendation Intensity on Researchers: RI(R)@k
• R: the top-k researchers returned; researcher r ∈ R

Recommendation Intensity (cont.)
• Performance under different settings:
  (a) MutualRank (BiRank) is consistently better than CoRank.
  (b) Using RIN instead of RCN makes no big difference.
• BiRank: uses only the PIN and RIN
• TriRank: uses the PIN, RIN/RCN and VIN

Recommendation Sensitivity: RS(P,Y)@k
• RS(P,Y)@k is the recommendation sensitivity of the top-k papers with respect to the papers published during the year range Y.
• It is an entropy-like measure of how uniformly the results are distributed across time periods, and so also reflects the degree of bias.
• The flatter the recommendation sensitivity curve (RSC), the less sensitive the ranking algorithm.
• At any given point, the smaller RS is, the less sensitive the ranking algorithm.
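The slides do not give the RS formula. To illustrate an entropy-like quantity with the stated property (smaller means less period bias), the Python sketch below uses a swapped-in measure: the Kullback-Leibler divergence between the publication-period distribution of a top-k list and that of the whole corpus. Each period contributes p·log(p/q), where p and q are that period's shares in the top-k list and in the corpus. The period bins, the choice of the corpus as reference distribution and the years are hypothetical; this is not the paper's RS.

```python
import math
from collections import Counter


def period_of(year, bins):
    """Return the label of the (start, end, label) bin that contains year."""
    for start, end, label in bins:
        if start <= year <= end:
            return label
    raise ValueError(f"year {year} is outside every bin")


def period_divergence(top_k_years, corpus_years, bins):
    """KL divergence of the top-k period distribution from the corpus one.

    0 means the top-k list mirrors the corpus across periods; larger values
    mean stronger over-representation of some periods. Assumes the top-k
    papers are drawn from the corpus, so every top-k period also occurs there.
    """
    top = Counter(period_of(y, bins) for y in top_k_years)
    ref = Counter(period_of(y, bins) for y in corpus_years)
    total = 0.0
    for label, count in top.items():
        p = count / len(top_k_years)
        q = ref[label] / len(corpus_years)
        total += p * math.log(p / q)  # per-period term
    return total


BINS = [(1965, 1989, "pre-1990"), (1990, 1999, "1990s"), (2000, 2011, "2000s")]
corpus = [1985] * 4 + [1995] * 6 + [2005] * 10   # hypothetical years
balanced = [1985] * 2 + [1995] * 3 + [2005] * 5  # same mix as the corpus
recent_only = [2005] * 10                        # everything from the 2000s
print(period_divergence(balanced, corpus, BINS))     # 0.0
print(period_divergence(recent_only, corpus, BINS))  # log 2, about 0.693
```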

Venue Ranking
• Compared with online ranking systems and human judgments, MutualRank also returns meaningful and reasonable venue rankings.
[Figure: correlation analysis]
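The slide does not say which correlation coefficient was used. One common choice for comparing two rankings is Spearman's rank correlation, rho = 1 − 6Σd²/(n(n² − 1)) for tie-free rankings, where d is each item's rank difference. A minimal Python sketch, with hypothetical venue scores standing in for MutualRank's output and for an external ranking:

```python
def ranks(scores):
    """Map each item to its 1-based rank by descending score (assumes no ties)."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {item: i + 1 for i, item in enumerate(order)}


def spearman_rho(scores_a, scores_b):
    """Spearman's rank correlation for two tie-free score dicts over the same items."""
    ra, rb = ranks(scores_a), ranks(scores_b)
    n = len(ra)
    d2 = sum((ra[x] - rb[x]) ** 2 for x in ra)
    return 1 - 6 * d2 / (n * (n * n - 1))


ours = {"v1": 0.30, "v2": 0.25, "v3": 0.20, "v4": 0.15, "v5": 0.10}  # hypothetical
external = {"v1": 5, "v2": 4, "v3": 2, "v4": 1, "v5": 3}             # hypothetical
print(spearman_rho(ours, external))  # ≈ 0.7
```

Here v3, v4 and v5 swap positions between the two lists (squared rank differences 1 + 1 + 4 = 6), giving rho = 1 − 36/120 = 0.7; with ties, a tie-aware variant would be needed.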

Conclusions
• Proposed a balanced ranking on the complex network consisting of the paper network, the author network and the publishing-venue network.
• Its feasibility depends on the semantics of the links.

Thanks!

Towards “Unbiased” Ranking of Scientific Literature