
Informetric methods seminar

Tutorial 2: Using Matlab for network construction, ranking, clustering, topic modeling, and path finding. Erjia Yan.


Presentation Transcript


  1. Informetric methods seminar. Tutorial 2: Using Matlab for network construction, ranking, clustering, topic modeling, and path finding. Erjia Yan

  2. Contents • Network construction • Ranking • Clustering • Topic modeling • Path finding

  3. Contents • Network construction • Ranking • Clustering • Topic modeling • Path finding

  4. From data to networks • Bibliographical data

  5. Web of Science format • The paper-to-paper citation network is the base • Web of Science cited-reference format: First Author, Year of Publication, Abbreviated Journal Name, Volume Number, Beginning Page Number • Example: AANESTAD M, 2011, J STRATEGIC INF SYST, V20, P161 • All fields can be found in the “full record + cited references” download option • Some of the newer records may also have a DOI; for a better match, remove the DOI from the cited references
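
As a rough illustration of the format above, the following Matlab sketch builds a cited-reference string from the fields of one record and strips a trailing DOI from a cited reference. The struct field names and the DOI value are made up for the example; they are not Web of Science field tags.

```matlab
% Hypothetical record for one citing paper (field names are assumptions).
rec.author  = 'AANESTAD M';
rec.year    = 2011;
rec.journal = 'J STRATEGIC INF SYST';
rec.volume  = 20;
rec.page    = 161;

% Build the key: First Author, Year, Abbreviated Journal, Vnn, Pnn.
key = sprintf('%s, %d, %s, V%d, P%d', rec.author, rec.year, ...
              rec.journal, rec.volume, rec.page);

% A cited reference carrying a trailing DOI (this DOI is invented).
ref = 'AANESTAD M, 2011, J STRATEGIC INF SYST, V20, P161, DOI 10.0000/example';

% Drop everything from ", DOI" onward so the two strings are comparable.
refClean = regexprep(ref, ',\s*DOI\s.*$', '');

isMatch = strcmp(key, refClean);
```

Here `isMatch` is true only because the DOI was removed first, which is the reason for stripping it.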

  6. Citation matching • For citing papers, extract these fields and format them into the Web of Science cited-reference format • Now the citing papers and the cited references have the same format • Using these two fields, construct an internal citation network that contains only those cited references that are themselves papers in the data set
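
The matching step can also be sketched directly in Matlab, assuming the citing-paper keys and the citation pairs are already available as cell arrays of strings. The toy keys below are invented, and the convention C(i,j)=1 for "paper i cites paper j" is an assumption of this sketch.

```matlab
% Keys of the citing papers, already in cited-reference format (toy data).
papers = {'A, 2010, J X, V1, P1'; ...
          'B, 2011, J X, V2, P5'; ...
          'C, 2012, J Y, V3, P9'};

% Citation pairs: column 1 is the citing paper, column 2 a cited reference.
pairs = {'B, 2011, J X, V2, P5', 'A, 2010, J X, V1, P1'; ...
         'C, 2012, J Y, V3, P9', 'A, 2010, J X, V1, P1'; ...
         'C, 2012, J Y, V3, P9', 'B, 2011, J X, V2, P5'; ...
         'C, 2012, J Y, V3, P9', 'Z, 1999, J Q, V7, P3'};  % not in the set

% Map every string to its row index in the paper table (0 if absent).
[~, src]     = ismember(pairs(:,1), papers);
[inSet, dst] = ismember(pairs(:,2), papers);

% Keep only pairs whose cited reference is itself a paper in the data set.
n = numel(papers);
C = sparse(src(inSet), dst(inSet), 1, n, n);   % C(i,j)=1: paper i cites j
```

Note that `sparse` sums duplicate (i,j) entries, so a pair listed twice would yield a 2 rather than a 1.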

  7. Procedures • If you can write an app for this, it would be great! • Otherwise, you can follow these instructions: • Converting into • Use Access to construct the network • Have a table for citing papers • Import the converted citation pairs into Access • Use a query to extract those pairs whose papers are in the table • Now you have the node info and link info • Import both into Matlab

  8. Adjacency matrices • Now we have paper-to-paper citation networks, but in order to construct, for instance, author-to-author citation or author co-citation networks, we need to use adjacency matrices • [Figure: matrix with papers as rows and authors as columns; a cell value of 1 at (i,j) indicates that paper i is written by author j]
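
A minimal sketch of such a paper-by-author matrix, assuming authorship is available as a list of (paper index, author index) pairs. The toy list is invented.

```matlab
% Toy authorship list: each row is [paperIndex, authorIndex].
pa = [1 1; 1 2; 2 2; 3 1; 3 3];
nPapers  = 3;
nAuthors = 3;

% A(i,j) = 1 indicates that paper i is written by author j.
A = sparse(pa(:,1), pa(:,2), 1, nPapers, nAuthors);

% Row sums give authors per paper, column sums give papers per author.
authorsPerPaper = full(sum(A, 2))';
papersPerAuthor = full(sum(A, 1));
```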

  9. Procedures • Convert into • Add to the beginning of the file • Use Txt2Pajek on the linkage file • Import the edge section of the .net file into Matlab • Select M(1:n,n+1:m), where m is the column size; the selection is our author-paper adjacency matrix
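
The block selection in the last step might look as follows. This sketch assumes that the first n vertices in the .net file belong to one mode (papers here) and the remaining vertices to the other mode (authors); which mode comes first depends on the linkage file, so treat the labels as assumptions.

```matlab
% Toy edge list as it might appear in the edge section of a .net file.
% Assumption: vertices 1..n are papers, vertices n+1..m are authors.
edges = [1 4; 1 5; 2 5; 3 4; 3 6];
n = 3;                  % size of the first mode (toy value)
m = max(edges(:));      % total number of vertices, i.e. the column size

% Full m-by-m matrix over all vertices of both modes.
M = sparse(edges(:,1), edges(:,2), 1, m, m);

% Upper-right block: rows from the first mode, columns from the second.
A = M(1:n, n+1:m);
```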

  10. Citation and coauthorship
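
Given a paper-by-author matrix A and a paper-to-paper citation matrix C, both networks on this slide can be derived with matrix products. This is a common construction rather than necessarily the one used in the seminar; the toy matrices and the convention C(i,j)=1 for "paper i cites paper j" are assumptions.

```matlab
A = [1 1 0; 0 1 0; 1 0 1];   % papers x authors (toy data)
C = [0 0 0; 1 0 0; 1 1 0];   % C(i,j)=1: paper i cites paper j (toy data)

% Coauthorship: (p,q) counts the papers that authors p and q share.
coauth = A' * A;
coauth = coauth - diag(diag(coauth));   % drop each author's own paper count

% Author-to-author citation: (p,q) counts paper-level citations from
% papers by author p to papers by author q (self-citations on the diagonal).
authCite = A' * C * A;
```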

  11. Co-citation and bibliographic coupling
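
Both measures follow from the paper-to-paper citation matrix. The sketch below uses the standard definitions: two papers are co-cited when a third paper cites both, and two papers are coupled when both cite the same third paper. The toy matrix and the convention C(i,j)=1 for "paper i cites paper j" are assumptions.

```matlab
C = [0 0 0; 1 0 0; 1 1 0];   % C(i,j)=1: paper i cites paper j (toy data)

% Co-citation: (i,j) counts the papers that cite both i and j.
cocit = C' * C;
cocit = cocit - diag(diag(cocit));

% Bibliographic coupling: (i,j) counts the references shared by i and j.
coupling = C * C';
coupling = coupling - diag(diag(coupling));
```

In the toy data, papers 1 and 2 are co-cited once (by paper 3), and papers 2 and 3 are coupled once (both cite paper 1).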

  12. Co-word

  13. Contents • Network construction • Ranking • Clustering • Topic modeling • Path finding

  14. PageRank • By David Gleich of Purdue University • http://www.mathworks.com/matlabcentral/fileexchange/11613-pagerank • pagerank(M,options) • options.c: the teleportation coefficient [double | {0.85}] • options.v: the personalization vector [vector | {uniform: 1/n}]
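
To show what the two options control, here is a plain power-iteration sketch of PageRank. It is not the File Exchange implementation listed above; it only reuses the names c and v. The toy graph, the orientation M(i,j)=1 for "node i links to node j", and the handling of dangling nodes (their mass is redistributed according to v) are assumptions.

```matlab
M = [0 1 1; 0 0 1; 1 0 0];    % toy graph: M(i,j)=1 means i links to j
c = 0.85;                     % teleportation coefficient
n = size(M, 1);
v = ones(n, 1) / n;           % uniform personalization vector

outdeg   = sum(M, 2);
dangling = (outdeg == 0);     % nodes without outgoing links
outdeg(dangling) = 1;         % avoid division by zero
P = bsxfun(@rdivide, M, outdeg);   % row-normalized transition matrix

x = v;
for iter = 1:200
    xNew = c * (P' * x) + c * sum(x(dangling)) * v + (1 - c) * v;
    delta = norm(xNew - x, 1);
    x = xNew;
    if delta < 1e-12
        break;
    end
end
```

The scores in `x` sum to 1. A non-uniform v biases the ranking toward chosen nodes, and a smaller c gives teleportation more weight.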

  15. Contents • Network construction • Ranking • Clustering • Topic modeling • Path finding

  16. Built-in functions • K-means • IDX = kmeans(X,k) • http://www.mathworks.com/help/stats/kmeans.html • Hierarchical clustering • http://www.mathworks.com/help/stats/hierarchical-clustering.html
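
A short usage sketch for both built-in approaches, assuming the Statistics and Machine Learning Toolbox is installed. The data points are invented, and average linkage and the 'Replicates' value are arbitrary choices for the example.

```matlab
% Two well-separated groups of points in the plane (toy data).
X = [0 0; 0 1; 1 0; 10 10; 10 11; 11 10];
k = 2;

% K-means; several replicates reduce the effect of a poor random start.
IDX = kmeans(X, k, 'Replicates', 5);

% Hierarchical clustering: build the tree, then cut it into k clusters.
Z = linkage(X, 'average');
T = cluster(Z, 'maxclust', k);
```

Cluster labels are arbitrary, so compare partitions by which points share a label rather than by the label values.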

  17. Modularity-based clustering • By MIT Strategic Engineering • http://strategic.mit.edu/downloads.php?page=matlab_networks • [modules,module_hist,Q] = newmangirvan(adj,k) • [groups_hist,Q]=newman_comm_fast(adj)
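
The Q returned by these functions is a modularity score. As a rough guide to what it measures, the sketch below computes the standard Newman-Girvan modularity for a given partition of an undirected, unweighted graph. It does not call the toolbox listed above and may differ from it in detail; the toy graph and partition are invented.

```matlab
% Toy graph: two triangles joined by a single edge (3-4).
edges = [1 2; 1 3; 2 3; 4 5; 4 6; 5 6; 3 4];
adj = zeros(6);
for e = 1:size(edges, 1)
    adj(edges(e,1), edges(e,2)) = 1;
    adj(edges(e,2), edges(e,1)) = 1;
end

groups = [1 1 1 2 2 2];       % candidate partition: one group per triangle

deg  = sum(adj, 2);           % node degrees
twoM = sum(deg);              % twice the number of edges

% Q = (1/2m) * sum over same-group pairs of (A_ij - k_i*k_j/(2m)).
sameGroup = bsxfun(@eq, groups(:), groups(:)');
B = adj - (deg * deg') / twoM;
Q = sum(B(sameGroup)) / twoM;
```

For this partition Q equals 5/14, about 0.357. Putting all nodes in one group gives Q = 0.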

  18. VOSviewer clustering • By Nees van Eck and Ludo Waltman of Leiden University • http://www.vosviewer.com/relatedsoftware/ • A variant of the modularity-based clustering technique • [X, cluster_size, V] = VOS_clustering(A, P)

  19. Contents • Network construction • Ranking • Clustering • Topic modeling • Path finding

  20. Matlab Topic Modeling Toolbox • By Mark Steyvers of University of California Irvine • http://psiexp.ss.uci.edu/research/programs_data/toolbox.htm • Input: a bag-of-words representation containing the number of times each word occurs in a document
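
A bag-of-words representation can be built in base Matlab as a document-by-word count matrix, as sketched below with invented documents. The exact input variables the toolbox expects are described on its page and are not reproduced here; the variable names in this sketch are illustrative only.

```matlab
docs = {'network ranking network', ...
        'topic model topic topic', ...
        'network topic'};

% Flatten all documents into one token list, remembering each token's document.
tokens = {};
docId  = [];
for d = 1:numel(docs)
    w = strsplit(docs{d}, ' ');
    tokens = [tokens, w];                        %#ok<AGROW>
    docId  = [docId, d * ones(1, numel(w))];     %#ok<AGROW>
end

% Map each token to an index in the sorted vocabulary.
[vocab, ~, wordId] = unique(tokens);

% counts(d,w) = number of times vocabulary word w occurs in document d.
counts = sparse(docId(:), wordId(:), 1, numel(docs), numel(vocab));
```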

  21. Contents • Network construction • Ranking • Clustering • Topic modeling • Path finding

  22. Bioinformatics toolbox • http://www.mathworks.com/help/bioinfo/ref/graphshortestpath.html • [dist, path, pred]=graphshortestpath(G,S,T) • finds the shortest path from S to T in graph G • [dist] = graphallshortestpaths(G) • finds the shortest paths between all pairs of nodes in graph G; dist is a matrix of shortest-path distances for each pair of nodes
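
For readers without the Bioinformatics Toolbox, the idea behind a single-source shortest path on an unweighted graph can be sketched with a breadth-first search in base Matlab. This is a substitute technique, not the toolbox function; the toy graph is invented, and the variable is named route to avoid shadowing Matlab's built-in path.

```matlab
% Toy directed, unweighted graph as a sparse adjacency matrix.
G = sparse([1 1 2 3 4], [2 3 4 4 5], 1, 5, 5);
S = 1;
T = 5;

n    = size(G, 1);
dist = inf(1, n);      % hop distance from S to every node
pred = zeros(1, n);    % predecessor on a shortest path (0 = none)
dist(S) = 0;
queue = S;

while ~isempty(queue)
    u = queue(1);
    queue(1) = [];
    nbrs = find(G(u, :));
    for w = nbrs(:)'
        if isinf(dist(w))          % first visit gives the shortest distance
            dist(w) = dist(u) + 1;
            pred(w) = u;
            queue(end + 1) = w;    %#ok<AGROW>
        end
    end
end

% Walk the predecessors back from T to recover one shortest path.
route = T;
while route(1) ~= S && pred(route(1)) ~= 0
    route = [pred(route(1)), route];
end
```

If T cannot be reached from S, `dist(T)` stays Inf and `route` contains only T.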
