Network Pajek
Introduction • Pajek is a Windows program for the analysis and visualization of large networks with thousands or even millions of vertices. In Slovenian, the word pajek means spider.
Application • Pajek should provide tools for the analysis and visualization of networks such as: • collaboration networks, • organic molecules in chemistry, • protein-receptor interaction networks, • genealogies, • Internet networks, • citation networks, • diffusion (AIDS, news, innovations) networks, • data-mining (2-mode networks), etc. • See also the collection of large networks at: • http://vlado.fmf.uni-lj.si/pub/networks/data/
Main goals • to support abstraction by (recursive) decomposition of a large network into several smaller networks that can be treated further using more sophisticated methods; • to provide the user with some powerful visualization tools; • to implement a selection of efficient (subquadratic) algorithms for analysis of large networks.
Six data structures in Pajek • network – main object (vertices and lines: arcs, edges): • graph, valued network, 2-mode or temporal network • partition • nominal property of vertices. Default extension: .clu • vector • numerical property of vertices. Default extension: .vec • permutation • reordering of vertices. Default extension: .per • cluster • subset of vertices (e.g. a class from a partition). Default extension: .cls • hierarchy • hierarchically ordered clusters and vertices. Default extension: .hie
Network – .net • A network can be defined in an input file in different ways. Here are three of them: • 1. List of neighbours (Arcslist / Edgeslist) (see test1.net)
    *Vertices 5
    1 "a"
    2 "b"
    3 "c"
    4 "d"
    5 "e"
    *Arcslist
    1 2 4
    2 3
    3 1 4
    4 5
    *Edgeslist
    1 5
Explanation • Data must be prepared in an input (ASCII) file. NotePad can be used for editing; a better choice is the shareware editor TextPad. • Words starting with * must always be written in the first column of the line. They indicate the start of a definition of vertices or lines. • *Vertices 5 defines a network with 5 vertices. This must always be the first statement in the definition of a network. • The definition of vertices follows: each vertex is given a label, which is written between double quotes. • *Arcslist declares a list of directed lines from selected vertices (1 2 4 means that there are two lines from vertex 1, one to vertex 2 and another to vertex 4). • Similarly, *Edgeslist declares a list of undirected lines from selected vertices. • No empty lines are allowed in the file: an empty line means the end of the network.
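The rules above can be illustrated with a small reader written outside Pajek. This is a minimal sketch, not Pajek's own parser: the function name parse_pajek_lists is hypothetical, and the sketch ignores optional vertex attributes such as coordinates or shapes.

```python
import shlex

def parse_pajek_lists(text):
    """Read a network given in Arcslist/Edgeslist form (simplified sketch)."""
    labels = {}   # vertex number -> label
    arcs = []     # directed lines as (source, target)
    edges = []    # undirected lines as (u, v)
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            break  # an empty line ends the network definition
        if line.startswith("*"):
            section = line.split()[0].lower()
            continue
        if section == "*vertices":
            fields = shlex.split(line)  # handles labels in double quotes
            labels[int(fields[0])] = fields[1]
        elif section in ("*arcslist", "*edgeslist"):
            numbers = [int(f) for f in line.split()]
            pairs = [(numbers[0], target) for target in numbers[1:]]
            (arcs if section == "*arcslist" else edges).extend(pairs)
    return labels, arcs, edges

TEST1_NET = """\
*Vertices 5
1 "a"
2 "b"
3 "c"
4 "d"
5 "e"
*Arcslist
1 2 4
2 3
3 1 4
4 5
*Edgeslist
1 5
"""

labels, arcs, edges = parse_pajek_lists(TEST1_NET)
print(arcs)   # directed lines
print(edges)  # undirected lines
```

The line 1 2 4 expands into the two arcs (1, 2) and (1, 4), which is the reading described in the explanation.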
Network – .net • 2. Pairs of lines (Arcs / Edges) (see test2.net)
    *Vertices 5
    1 "a"
    2 "b"
    3 "c"
    4 "d"
    5 "e"
    *Arcs
    1 2 1
    1 4 1
    2 3 2
    3 1 1
    3 4 2
    4 5 1
    *Edges
    1 5 1
Explanation • Directed lines are defined using *Arcs, undirected lines are defined using *Edges. The third number in each row defining an arc/edge gives its value/weight. • In the previous format (Arcslist / Edgeslist) values of lines are not defined, so that format is suitable only if all values of lines are 1. • If values of lines are not important, the third number can be omitted (all lines get value 1). • No empty lines are allowed in the file: an empty line means the end of the network.
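As an illustration of the Arcs / Edges format, the following sketch reads line values and falls back to 1 when the third number is omitted. The function name parse_pajek_pairs is hypothetical and the vertex section is skipped for brevity.

```python
def parse_pajek_pairs(text):
    """Read *Arcs and *Edges sections; a missing value defaults to 1."""
    arcs, edges = {}, {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            break  # an empty line ends the network definition
        if line.startswith("*"):
            section = line.split()[0].lower()
            continue
        if section not in ("*arcs", "*edges"):
            continue  # vertex definitions are ignored in this sketch
        fields = line.split()
        u, v = int(fields[0]), int(fields[1])
        value = float(fields[2]) if len(fields) > 2 else 1.0
        (arcs if section == "*arcs" else edges)[(u, v)] = value
    return arcs, edges

TEST2_NET = """\
*Vertices 5
1 "a"
2 "b"
3 "c"
4 "d"
5 "e"
*Arcs
1 2 1
1 4 1
2 3 2
3 1 1
3 4 2
4 5 1
*Edges
1 5 1
"""

arcs, edges = parse_pajek_pairs(TEST2_NET)
print(arcs[(2, 3)], edges)
```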
Network – .net • 3. Matrix (see test3.net)
    *Vertices 5
    1 "a"
    2 "b"
    3 "c"
    4 "d"
    5 "e"
    *Matrix
    0 1 0 1 1
    0 0 2 0 0
    1 0 0 2 0
    0 0 0 0 1
    1 0 0 0 0
Explanation • In this format directed lines (arcs) are given in matrix form (*Matrix). To transform bidirected arcs into edges, use “Network>Create New Network>Transform>Arcs to Edges>Bidirected only”.
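The matrix form and the arcs-to-edges transformation can be sketched as follows. The function names are hypothetical, and taking the minimum of the two arc values as the value of the new edge is an assumption of this sketch, not a statement about what Pajek does by default.

```python
def matrix_to_arcs(matrix):
    """Entry in row i, column j is the value of the arc i -> j (0 means no arc)."""
    return {(i, j): value
            for i, row in enumerate(matrix, start=1)
            for j, value in enumerate(row, start=1)
            if value != 0}

def bidirected_to_edges(arcs):
    """Replace each pair of opposite arcs by one edge; other arcs are kept."""
    remaining = dict(arcs)
    edges = {}
    for (u, v), value in arcs.items():
        if u < v and (v, u) in arcs:
            # Assumption: the edge gets the smaller of the two arc values.
            edges[(u, v)] = min(value, arcs[(v, u)])
            del remaining[(u, v)]
            del remaining[(v, u)]
    return remaining, edges

MATRIX = [
    [0, 1, 0, 1, 1],
    [0, 0, 2, 0, 0],
    [1, 0, 0, 2, 0],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
]

arcs, edges = bidirected_to_edges(matrix_to_arcs(MATRIX))
print(arcs)
print(edges)
```

On the matrix above, only the pair 1 -> 5 and 5 -> 1 is bidirected, so the result matches the Arcs / Edges version of the same network.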
Additional definition of network • Additionally, Pajek enables precise definition of elements used for drawing networks (coordinates of vertices, shapes and colors of vertices and lines, ...). • Example: (see test4.net)
    *Vertices 5
    1 "a" box
    2 "b" ellipse
    3 "c" diamond
    4 "d" triangle
    5 "e" empty
    ...
Draw • Layout of networks • Energy: the network is treated as a physical system, and we search for the state with minimal energy • Kamada-Kawai: with separate components, connected components can be tiled in the plane • Fruchterman-Reingold: draws in the plane or in space, with a selectable repulsion factor • EigenValues: 2 or 3 selected eigenvectors become the coordinates of vertices, which can produce nice pictures
Partition – .clu • Partitions are used to describe nominal properties of vertices. • e.g., 1 – men, 2 – women • Definition in input file (see test.clu)
    *Vertices 5
    1
    2
    2
    2
    1
Vector – .vec • Vectors are used to describe numerical properties of vertices (e.g., centralities). • Definition in input file (see test.vec)
    *Vertices 5
    0.58
    0.25
    0.25
    0.08
    0.25
Pajek project files • It is time consuming to load objects one by one. Therefore it is convenient to store all data in one file, called a Pajek project file (.paj). (see test.paj) • Project files can be produced manually by using “File>Pajek Project File>Save” • To load objects stored in a Pajek project file, select “File>Pajek Project File>Read”
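Partition and vector files share the same layout: a *Vertices n header followed by one value per vertex, in vertex order. A minimal sketch of a reader, with the hypothetical name read_vertex_values, could look like this:

```python
def read_vertex_values(text, convert):
    """Read a .clu or .vec style file into {vertex number: value}."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    if header[0].lower() != "*vertices":
        raise ValueError("file must start with *Vertices n")
    n = int(header[1])
    values = [convert(ln.split()[0]) for ln in lines[1:1 + n]]
    if len(values) != n:
        raise ValueError("expected %d values, found %d" % (n, len(values)))
    return dict(enumerate(values, start=1))

TEST_CLU = "*Vertices 5\n1\n2\n2\n2\n1\n"
TEST_VEC = "*Vertices 5\n0.58\n0.25\n0.25\n0.08\n0.25\n"

partition = read_vertex_values(TEST_CLU, int)   # nominal classes
vector = read_vertex_values(TEST_VEC, float)    # numerical values

# Group vertices by class, e.g. to obtain a cluster from the partition.
clusters = {}
for vertex, cls in partition.items():
    clusters.setdefault(cls, []).append(vertex)
print(clusters)
```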
Menu structure • Commands are put to menu according to the following criterion: • commands that need only a network as input are available in menu Net, • commands that need as input two networks are available in menu Networks, • commands that need as input two objects (e. g., network and partition) are available in menu Operations, • commands that need only a partition as input are available in menu Partition . . .
Global and local views on network • Local view is obtained by extracting sub-network induced by selected cluster of vertices. • Global view is obtained by shrinking vertices in the same cluster to new (compound) vertex. In this way relations among clusters of vertices are shown. • Combination of local and global view is contextual view: Relations among clusters of vertices and selected vertices are shown.
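The two views can be sketched on the small five-vertex network and partition used earlier. The function names are hypothetical, and two choices are assumptions of this sketch: values of lines between the same pair of classes are summed, and lines inside a class are dropped when shrinking.

```python
from collections import defaultdict

def extract_subnetwork(lines, partition, classes):
    """Local view: keep lines whose endpoints both lie in the selected classes."""
    return {(u, v): value for (u, v), value in lines.items()
            if partition[u] in classes and partition[v] in classes}

def shrink_network(lines, partition):
    """Global view: one compound vertex per class."""
    shrunk = defaultdict(int)
    for (u, v), value in lines.items():
        cu, cv = partition[u], partition[v]
        if cu != cv:  # assumption: lines inside a class are dropped
            shrunk[(cu, cv)] += value  # assumption: values are summed
    return dict(shrunk)

# Arcs of the matrix example and the partition from test.clu.
lines = {(1, 2): 1, (1, 4): 1, (1, 5): 1, (2, 3): 2,
         (3, 1): 1, (3, 4): 2, (4, 5): 1, (5, 1): 1}
partition = {1: 1, 2: 2, 3: 2, 4: 2, 5: 1}

local_view = extract_subnetwork(lines, partition, {2})
global_view = shrink_network(lines, partition)
print(local_view)
print(global_view)
```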
Example • Imports and exports among 80 countries in 1994 are given, with values in units of 1000$. (See Country_Imports.net) • Partition according to continents (see Country_Continent.clu) • 1 – Africa, 2 – Asia, 3 – Europe, 4 – N. America, 5 – Oceania, 6 – S. America. • Operations>Extract from Network>Partition • Operations>Shrink Network>Partition
Extracting Subnetwork • Operations>Extract from Network>Partition
Shrinking Network • Operations>Shrink Network>Partition
Removing lines with low values • Network>Info>Line Values
Removing lines with low values • Network>Create New Network>Transform>Remove>Lines with value>lower than (340000)
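The effect of the two commands can be sketched as a summary of line values followed by a filter. The function names and the trade figures below are hypothetical illustration data; only the threshold 340000 comes from the slide.

```python
def line_value_summary(lines):
    """A rough counterpart of inspecting line values before choosing a threshold."""
    values = sorted(lines.values())
    return {"count": len(values), "min": values[0], "max": values[-1]}

def remove_lines_below(lines, threshold):
    """Keep only lines whose value is not lower than the threshold."""
    return {pair: value for pair, value in lines.items() if value >= threshold}

# Hypothetical trade values between three countries A, B and C.
trade = {("A", "B"): 1_200_000, ("B", "A"): 340_000,
         ("A", "C"): 15_000, ("C", "B"): 339_999}

summary = line_value_summary(trade)
strong_trade = remove_lines_below(trade, 340_000)
print(summary)
print(strong_trade)
```

A line whose value equals the threshold is kept, since only values lower than the threshold are removed.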
Resources • Download • The latest version of Pajek is freely available, for non-commercial use, at its home page: http://vlado.fmf.uni-lj.si/pub/networks/pajek/ • Text file into Pajek • http://vlado.fmf.uni-lj.si/pub/networks/pajek/howto/text2pajek.htm • WoS to Pajek • http://vlado.fmf.uni-lj.si/pub/networks/pajek/WoS2Pajek/default.htm • Tutorial • Exploratory Social Network Analysis with Pajek • visit Pajek wiki for more information • http://pajek.imfm.si/doku.php
WoS to Pajek • http://pajek.imfm.si/doku.php?id=wos2pajek
S519 Web of Science
S519 Output
wos2pajek • The download link: • http://pajek.imfm.si/doku.php?id=wos2pajek • The new tutorial slides: • http://pajek.imfm.si/lib/exe/fetch.php?media=faq:wos:wos2pajek07.pdf
MontyLingua • Download from: http://web.media.mit.edu/~hugo/montylingua/ • Unpack it and copy ‘montylingua-2.1’ to C:\Python26\Lib\site-packages • Set up a new environment variable named ‘MONTYLINGUA’ and set the variable value as c:\Python26\Lib\site-packages\MontyLingua-2.1\Python
wos2pajek • Download the latest version of WoS2Pajek. • http://pajek.imfm.si/doku.php?id=wos2pajek • Unpack it, and double click on WoS2Pajek.py to show the main interface of program:
WoS2Pajek Program • The current version of WoS2Pajek requires the following parameters from the user: • MontyLingua directory: path to the directory in which the MontyLingua package is installed; • project directory: where the output files are saved; • WoS file; • maxnum: estimate of the number of all vertices (number of records + number of cited works), roughly 30 * number of records; • step: prints information about every step-th record as a trace; step = 0 means no trace; • use ISI name / short name; • make a clean WoS file without duplicates; • boolean list [DE, ID, TI, AB] specifying which fields are sources of keywords.
Cite.net • Network/Info/General • Network/Create New Network/Transform/Remove/Loops • Network/Create New Network/Transform/Remove/Multiple lines/Single line
CiteNew.net • Paper citation network • Questions • What are highly cited articles? • The diameter of the network? • What are the major clusters? • More questions?
Strong component of cite network • Network/Create Partition/Components/Strong [2] • Operations/Network+Partition/Extract SubNetwork[1-*] • Operations/Network+Partition/Transform/Remove Lines/Between Cluster • Save citestrong.clu
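What the strong-components step computes can be sketched with a two-pass depth-first search (Kosaraju's algorithm). This is an illustration only, not a claim about the algorithm Pajek uses; the function name and the small arc list are hypothetical, and min_size=2 mirrors the [2] in the command above.

```python
from collections import defaultdict

def strong_components(vertices, arcs, min_size=2):
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in arcs:
        succ[u].append(v)
        pred[v].append(u)

    # Pass 1: record vertices in order of DFS completion.
    order, seen = [], set()
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(succ[start]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(succ[nxt])))
                    break
            else:
                order.append(node)  # all successors handled
                stack.pop()

    # Pass 2: follow reversed arcs in decreasing completion order.
    component, label = {}, 0
    for start in reversed(order):
        if start in component:
            continue
        label += 1
        component[start] = label
        stack = [start]
        while stack:
            node = stack.pop()
            for prev in pred[node]:
                if prev not in component:
                    component[prev] = label
                    stack.append(prev)

    groups = defaultdict(list)
    for vertex, comp in component.items():
        groups[comp].append(vertex)
    return [sorted(g) for g in groups.values() if len(g) >= min_size]

# Hypothetical arcs: 1 -> 2 -> 3 -> 1 form a cycle, 4 and 5 hang off it.
components = strong_components([1, 2, 3, 4, 5],
                               [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5)])
print(components)
```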
Co-author network • Read WA.net • Network/2-mode network/2-mode to 1-mode/Columns • Network/Create Partition/Components/Weak [2] • Operations/Network+Partition/Extract SubNetwork[1-*] • Network/Create New Network/Transform/Remove/Loops • WANew.net (which is a co-author network) • Questions: • Which author has the most co-authors?
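The question about the number of co-authors amounts to counting distinct neighbours in the co-author network. A minimal sketch, using hypothetical works and author names and assuming the works-by-authors data is available as a dictionary:

```python
from collections import defaultdict

def coauthor_sets(works):
    """Map each author to the set of distinct co-authors over all works."""
    coauthors = defaultdict(set)
    for authors in works.values():
        for author in authors:
            coauthors[author].update(a for a in authors if a != author)
    return coauthors

# Hypothetical data: work -> set of its authors.
works = {
    "W1": {"Ana", "Bor"},
    "W2": {"Ana", "Cene", "Dana"},
    "W3": {"Bor"},
}

coauthors = coauthor_sets(works)
top_author = max(coauthors, key=lambda a: len(coauthors[a]))
print(top_author, len(coauthors[top_author]))
```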
Bibliographic coupling network • [Read Cite.net] • Network/Create New Network/Transform/1-mode to 2-mode • Network/2-mode Network/2-mode to 1-mode/Rows • Network/Create Partition/Components/Weak [2] • Operations/Network + Partition/Extract SubNetwork[1-*]
Co-citation network • [Read Cite.net] • Network/Create Partitions/Degree/Output • Operations/Network+Partition/Extract subNetwork[1-*] • Network/Create New Network/Transform/1-mode to 2-mode • Network/2-mode network/2-mode to 1-mode/Columns • Network/Create Partition/Components/Weak [2] • Operations/Network+Partition/Extract SubNetwork[1-*]
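The idea behind the bibliographic coupling and co-citation networks can be sketched directly on a list of citation arcs. The sketch assumes that each arc points from the citing work to the cited work; the function names and the small arc list are hypothetical, and the component and degree filtering steps from the menus are left out.

```python
from collections import Counter, defaultdict
from itertools import combinations

def pair_counts(groups):
    """Count, for each pair of members, in how many groups they occur together."""
    weights = Counter()
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            weights[(a, b)] += 1
    return weights

def bibliographic_coupling(arcs):
    """Two citing works are coupled once for every work they both cite."""
    cited_by = defaultdict(set)
    for citing, cited in arcs:
        cited_by[cited].add(citing)
    return pair_counts(cited_by)

def co_citation(arcs):
    """Two cited works are co-cited once for every work that cites both."""
    cites = defaultdict(set)
    for citing, cited in arcs:
        cites[citing].add(cited)
    return pair_counts(cites)

# Hypothetical arcs (citing, cited).
arcs = [("A", "X"), ("A", "Y"), ("B", "X"), ("B", "Y"), ("C", "Y")]

coupling = bibliographic_coupling(arcs)
cocitation = co_citation(arcs)
print(dict(coupling))
print(dict(cocitation))
```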
Two-mode network • One-mode network • each vertex can be related to each other vertex. • Two-mode network • vertices are divided into two sets and vertices can only be related to vertices in the other set.
Example • Suppose we have the following data: • P1: Au1, Au2, Au5 • P2: Au2, Au4, Au5 • P3: Au4 • P4: Au1, Au5 • P5: Au2, Au3 • P6: Au3 • P7: Au1, Au5 • P8: Au1, Au2, Au4 • P9: Au1, Au2, Au3, Au4, Au5 • P10: Au1, Au2, Au5 • See two_mode.net
    *vertices 15 10
    1 "P1"
    2 "P2"
    3 "P3"
    4 "P4"
    5 "P5"
    6 "P6"
    7 "P7"
    8 "P8"
    9 "P9"
    10 "P10"
    11 "Au1"
    12 "Au2"
    13 "Au3"
    14 "Au4"
    15 "Au5"
    *edgeslist
    1 11 12 15
    2 12 14 15
    3 14
    4 11 15
    5 12 13
    6 13
    7 11 15
    8 11 12 14
    9 11 12 13 14 15
    10 11 12 15
Transforming to valued networks • The network is transformed into an ordinary network, where the vertices are elements from the first subset, using • “Network>2 mode network>2-Mode to 1-Mode>Rows”.
Transforming to valued networks • If we want to get a network with elements from the second subset we use • “Network>2 mode network>2-Mode to 1-Mode>Columns”.
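Both transformations can be sketched on the papers and authors example. The sketch assumes that the value of a line in the resulting network is the number of shared neighbours (shared authors for Rows, shared papers for Columns) and it leaves out loops; the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def project(groups):
    """Link every pair of members that share a group; value = number of shared groups."""
    weights = Counter()
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            weights[(a, b)] += 1
    return weights

def transpose(groups):
    """Turn {row: columns} into {column: rows}."""
    result = {}
    for row, columns in groups.items():
        for column in columns:
            result.setdefault(column, set()).add(row)
    return result

# First subset (rows): papers; second subset (columns): authors.
PAPERS = {
    "P1": {"Au1", "Au2", "Au5"}, "P2": {"Au2", "Au4", "Au5"},
    "P3": {"Au4"}, "P4": {"Au1", "Au5"}, "P5": {"Au2", "Au3"},
    "P6": {"Au3"}, "P7": {"Au1", "Au5"}, "P8": {"Au1", "Au2", "Au4"},
    "P9": {"Au1", "Au2", "Au3", "Au4", "Au5"}, "P10": {"Au1", "Au2", "Au5"},
}

author_net = project(PAPERS)            # counterpart of 2-Mode to 1-Mode>Columns
paper_net = project(transpose(PAPERS))  # counterpart of 2-Mode to 1-Mode>Rows
print(author_net[("Au1", "Au5")], paper_net[("P1", "P10")])
```

Here Au1 and Au5 wrote P1, P4, P7, P9 and P10 together, while P1 and P10 share all three of their authors.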