How to do… social network analysis. Mike Savage (with the help of Nick Crossley), University of Manchester
What is social network analysis? • A set of mathematical and statistical techniques for handling relational data. • For identifying the structural properties of sets of relations (i.e. of networks). • And for visualising and describing networks. • Social scientific origins in: sociology, anthropology, social psychology. • Mathematical bases in: graph theory, matrix algebra and (increasingly) statistics. • There is an increasing dialogue with physics, maths, computer science, informatics.
The challenge of social networks • Relational thinking challenges powerful approaches which focus on (a) individual attributes and (b) variable-centred perspectives. • Emphasises that it is connections (and disconnections) that matter • Although highly formalised and quantitative, it often depends on qualitative, rather than survey based, research • Survey measures can often provide data on ego-networks (an individual’s relations) but qualitative work is often necessary to gather information on whole networks
What is relational data? • Data concerning relations between things (e.g. agents) rather than the individual properties/attributes of those things. • We might be interested in relations between: people, organisations, nations, towns …anything. • We might be interested in relations based upon: knowledge, emotion, exchange, infection/contamination …anything.
What are structural properties of networks? • Defining characteristics/patterns of the relationships comprising a network taken as a whole. • SNA is not focused on individual properties of nodes or types of relations but on (i) the ‘whole’ these ‘parts’ comprise, (ii) the position of nodes within it (e.g. central or marginal) and (iii) the (relational) properties they acquire from it. • SNA is formal (i.e. not focused on content/ meaning of relationships). That is a strength and a weakness.
Why bother with SNA? • It is difficult to describe/analyse complex networks discursively. • SNA organises relational data and makes it (via the software) easily ‘searchable’. • SNA identifies genuine, meaningful, measurable and potentially sociologically relevant properties of networks. • It shows that networks have properties which make a difference.
A simple relational matrix in which presence/absence of a relation is indicated by a 1 or 0 respectively: who drinks with whom?
Note that in this case there is unnecessary repetition of information: if Tom drinks with Sally then Sally drinks with Tom • We need to be mindful of this in any calculations we may make. We have one relationship here, not two.
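The point about repetition can be sketched in a few lines of Python. The names and ties below are invented for illustration (they are not the matrix from the slide); the sketch only shows why a naive sum over a symmetric matrix double-counts each relationship.

```python
# Hypothetical data: who drinks with whom (1 = yes, 0 = no).
names = ["Tom", "Sally", "Alice", "Bob"]
drinks = [
    [0, 1, 1, 0],  # Tom
    [1, 0, 0, 1],  # Sally
    [1, 0, 0, 0],  # Alice
    [0, 1, 0, 0],  # Bob
]


def is_symmetric(matrix):
    """True if every tie is recorded in both directions."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


def count_undirected_ties(matrix):
    """Count each relationship once by reading only the cells above the diagonal."""
    n = len(matrix)
    return sum(matrix[i][j] for i in range(n) for j in range(i + 1, n))


naive_total = sum(sum(row) for row in drinks)  # counts every tie twice
ties = count_undirected_ties(drinks)           # counts every tie once

print(is_symmetric(drinks), naive_total, ties)
```

Here the naive total is twice the number of ties, which is the repetition the slide warns about.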
Some relations are ‘directed’ (e.g. liking) and thus not necessarily reciprocated. This is recorded in the matrix. Tom likes Sally but she doesn’t like him. She likes Alice. • In this case Tom’s relation to Sally and hers to him are distinct and should be treated independently
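A directed matrix can be read in the same way, as long as rows and columns are kept distinct. The sketch below assumes rows are senders and columns are receivers; Tom and Sally follow the slide's example, while Alice's row is an invented addition.

```python
# Rows = who does the liking, columns = who is liked (an assumed convention).
names = ["Tom", "Sally", "Alice"]
likes = [
    [0, 1, 0],  # Tom likes Sally
    [0, 0, 1],  # Sally likes Alice (but not Tom)
    [0, 1, 0],  # Alice likes Sally (hypothetical addition)
]


def arc_list(names, matrix):
    """List every directed tie as a (sender, receiver) pair."""
    n = len(names)
    return [
        (names[i], names[j])
        for i in range(n)
        for j in range(n)
        if i != j and matrix[i][j]
    ]


def is_reciprocated(names, matrix, sender, receiver):
    """True if the tie exists in both directions."""
    i, j = names.index(sender), names.index(receiver)
    return bool(matrix[i][j] and matrix[j][i])


arcs = arc_list(names, likes)
one_way = [(s, r) for s, r in arcs if not is_reciprocated(names, likes, s, r)]

print(arcs)
print(one_way)
```

Each arc is treated as its own observation, so Tom's unreturned liking of Sally shows up in `one_way` while the Sally and Alice pair does not.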
It is often not meaningful to ask whether a person is related to their self. • But it may be (e.g. organisations send memos internally and externally, individuals like/dislike themselves). • The matrix convention, when it is not meaningful, is to indicate that a person is not related to their self. • Most software packages, including Pajek, use forms of data entry (short of a full matrix) where relations to self are not indicated unless meaningful.
Relations may be weighted in ordinal/interval manner: e.g. 0 = ‘Don’t like’, 1=‘like’, 2=‘really like’; or telephones n times per week.
Graphs represent the same information as matrices in a different way. • Relationships are indicated by connecting lines. • If undirected these lines are referred to as ‘edges’. • If directed these lines are referred to as ‘arcs’ and direction is indicated by an arrow head (potentially at both ends) • The objects or nodes (e.g. people) connected by arcs/edges are referred to as vertices (sing. = vertex). • These terms derive from graph theory. • (Examples to follow from Savage, Tampubolon and Warde, 2008).
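To illustrate that a graph carries the same information as a matrix, here is a minimal Python sketch that converts a matrix into a list of vertices plus a list of edges (undirected) or arcs (directed). The function name `matrix_to_graph` and the data are hypothetical.

```python
def matrix_to_graph(names, matrix, directed=False):
    """Return (vertices, lines); lines are arcs if directed, edges otherwise."""
    lines = []
    n = len(names)
    for i in range(n):
        for j in range(n):
            if i == j or not matrix[i][j]:
                continue
            # For an undirected relation keep each pair once (upper triangle).
            if directed or i < j:
                lines.append((names[i], names[j]))
    return list(names), lines


# Hypothetical undirected relation.
drinkers = ["Tom", "Sally", "Alice", "Bob"]
drinks = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
]

# Hypothetical directed relation (rows = senders).
likers = ["Tom", "Sally", "Alice"]
likes = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]

vertices, edges = matrix_to_graph(drinkers, drinks)
_, arcs = matrix_to_graph(likers, likes, directed=True)

print(edges)
print(arcs)
```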
Figure 8.1: Communication Networks within the Labour Party Note Respondents were asked ‘with whom do you discuss things to do with the organisation (for example, activities, issues, strategy)’.
Figure 8.2: Communication Networks within the Conservation Group Note Respondents were asked ‘with whom do you discuss things to do with the organisation (for example, activities, issues, strategy)’.
Figure 8.3.3: Obtaining Information Network in the Conservation Group.
Nice pictures, but so what? • SNA is often applied to intrinsically interesting cases (elites, policy makers, diseases) but it is not always clear how it applies to ‘mundane’ groups. • There is value in linking sociograms to qualitative evidence: we used in-depth interviews to show how the structure of networks affected people’s narratives. • This shows that the hub and spoke structure (conservation group) is more clientelist and less energising than the inner wheel structure (Labour Party).
SNA involves measurement of relational properties: e.g. • Degree (in-degree, out-degree, average degree). • Density. • Components, cliques and cores. • Centrality. • Centralisation. • Erdos/Bacon number(s). • Diameter. • (see John Scott’s handbook for an excellent intro to these and more)
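As a small illustration of the first measure in this list, the sketch below computes in-degree, out-degree and average degree from a list of arcs. The data are invented, and the convention that each arc is a (sender, receiver) pair is an assumption.

```python
# Hypothetical directed ties, written as (sender, receiver).
vertices = ["Tom", "Sally", "Alice", "Bob"]
arcs = [("Tom", "Sally"), ("Sally", "Alice"), ("Alice", "Sally"), ("Bob", "Sally")]

out_degree = {v: 0 for v in vertices}  # ties sent
in_degree = {v: 0 for v in vertices}   # ties received

for sender, receiver in arcs:
    out_degree[sender] += 1
    in_degree[receiver] += 1

# Every arc adds one to some out-degree and one to some in-degree,
# so mean in-degree and mean out-degree are the same number.
average_degree = len(arcs) / len(vertices)

print(in_degree)
print(out_degree)
print(average_degree)
```

In this toy network Sally receives three ties but sends only one, which is the kind of asymmetry in-degree and out-degree are meant to separate.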
Density = • The actual number of connections in a network expressed as a proportion of the total possible number of connections. • A figure between 0 and 1. • Not easy to make meaningful comparisons of density across networks of different sizes (or involving different types of relations). • High density should generate greater: trust, cultural homogeneity and diffusion speed.
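The definition above can be written out as a short Python sketch. It assumes a network without self-ties, so that n vertices allow n(n-1)/2 possible edges or n(n-1) possible arcs; the function name `density` is hypothetical.

```python
def density(n_vertices, n_lines, directed=False):
    """Actual ties as a proportion of possible ties (no self-ties assumed)."""
    possible = n_vertices * (n_vertices - 1)
    if not directed:
        possible //= 2  # each undirected pair is counted once
    return n_lines / possible


small = density(4, 3)                    # 3 of 6 possible edges
directed = density(3, 3, directed=True)  # 3 of 6 possible arcs

# Why size matters: everyone has exactly two ties in both rings below,
# yet density falls sharply as the network grows.
ring_of_5 = density(5, 5)
ring_of_50 = density(50, 50)

print(small, directed, ring_of_5, round(ring_of_50, 3))
```

The two rings show why comparisons across networks of different sizes are awkward: the same average degree gives very different densities.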
Components = sub-graphs within a graph, any of whose vertices can be reached from any other (by whatever route). • Important because the holes that separate different components within a graph prevent the flow of resources and might indicate tribal or ideological divides.
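One common way to find components is a breadth-first search from each vertex not yet visited. The sketch below is a minimal version for undirected ties with invented data; for directed networks one would need to decide between weak and strong components, which is not handled here.

```python
from collections import deque


def components(vertices, edges):
    """Group vertices into components: sets in which everyone can reach everyone."""
    neighbours = {v: set() for v in vertices}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    result = []
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            v = queue.popleft()
            component.append(v)
            for w in neighbours[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        result.append(sorted(component))
    return result


# Hypothetical network: one trio, one pair, and one isolate.
vertices = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("B", "C"), ("D", "E")]
found = components(vertices, edges)

print(found)
```

Nothing can flow between the trio, the pair and the isolate, which is the sense in which the gaps between components block the flow of resources.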
Cliques = a subset of vertices within a component, all of whose members are related to one another (density = 1). • K-cores = a subset of vertices within a component, all of whose members enjoy a specified number of relations (=‘k’) with the others: e.g. a 6-core is a subset, all of whose members enjoy relations with at least 6 of the other members. • Cliques and cores are important in part because of their relations to those outside of their grouping and their relative power.
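Both definitions can be checked mechanically. The sketch below tests whether a subset is a clique and finds a k-core by repeatedly dropping vertices with fewer than k ties to the remaining members. It is a simple illustration on invented data, and it does not split a k-core that falls into several components.

```python
def build_neighbours(vertices, edges):
    """Map each vertex to the set of vertices it is tied to."""
    neighbours = {v: set() for v in vertices}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return neighbours


def is_clique(subset, neighbours):
    """True if every pair in the subset is directly tied (density = 1)."""
    members = list(subset)
    return all(
        b in neighbours[a]
        for i, a in enumerate(members)
        for b in members[i + 1:]
    )


def k_core(neighbours, k):
    """Vertices left after repeatedly removing anyone with fewer than k ties."""
    remaining = set(neighbours)
    changed = True
    while changed:
        changed = False
        for v in list(remaining):
            if len(neighbours[v] & remaining) < k:
                remaining.remove(v)
                changed = True
    return remaining


# Hypothetical network: a triangle A-B-C with a tail C-D-E.
vertices = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
nb = build_neighbours(vertices, edges)

print(is_clique({"A", "B", "C"}, nb), is_clique({"A", "B", "D"}, nb))
print(sorted(k_core(nb, 2)))
```

Removing E leaves D with a single tie, so D goes too, and only the triangle survives as the 2-core.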
Varieties of (vertex) centrality • Central vertices enjoy both advantages and disadvantages (potentially). [Diagram labels: Deg, Bet, Close]
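Assuming the varieties in question are degree, closeness and betweenness centrality, the sketch below computes all three for a small invented network: two triangles joined through a single bridging vertex D. Closeness is taken here as (n - 1) divided by the sum of distances, and betweenness is computed with Brandes' algorithm; both are common choices rather than anything prescribed by the slides, and the network is assumed to be connected and undirected.

```python
from collections import deque


def build_neighbours(vertices, edges):
    neighbours = {v: set() for v in vertices}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return neighbours


def closeness(neighbours):
    """(n - 1) / total distance to all others; assumes a connected network."""
    scores = {}
    for s in neighbours:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in neighbours[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        scores[s] = (len(neighbours) - 1) / sum(dist.values())
    return scores


def betweenness(neighbours):
    """Number of shortest paths passing through each vertex (Brandes' algorithm)."""
    score = {v: 0.0 for v in neighbours}
    for s in neighbours:
        stack, dist = [], {s: 0}
        preds = {v: [] for v in neighbours}
        sigma = {v: 0 for v in neighbours}  # number of shortest paths from s
        sigma[s] = 1
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in neighbours[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in neighbours}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Undirected network: each pair was counted from both ends.
    return {v: score[v] / 2 for v in score}


# Hypothetical network: triangles A-B-C and E-F-G, bridged by D.
vertices = ["A", "B", "C", "D", "E", "F", "G"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("E", "G"), ("F", "G")]
nb = build_neighbours(vertices, edges)

degree = {v: len(nb[v]) for v in vertices}
close = closeness(nb)
between = betweenness(nb)

print(degree)
print(max(close, key=close.get), max(between, key=between.get))
```

In this example C and E have the most ties, but D, with only two ties, scores highest on both closeness and betweenness because every path between the two triangles runs through it. Different centrality measures can therefore pick out different vertices.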
There are also ‘centralisation’ measures focused upon the network as a whole. • How centralised is it? • The most basic of these take variance measures for one of the centrality scores (or something very similar). • Centralised networks may be easier to organise. • They are often robust in relation to random attack/damage. • But vulnerable to targeted attack.
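Two simple network-level scores can be sketched from degree alone: the variance of the degree scores, and Freeman's degree centralisation (the sum of differences between the highest degree and every other degree, divided by the largest value that sum can take, (n-1)(n-2), in an undirected network). The choice of degree as the underlying centrality score and the example networks are assumptions made for illustration.

```python
def degrees(vertices, edges):
    deg = {v: 0 for v in vertices}
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def degree_variance(deg):
    """Population variance of the degree scores."""
    values = list(deg.values())
    mean = sum(values) / len(values)
    return sum((d - mean) ** 2 for d in values) / len(values)


def freeman_degree_centralisation(deg):
    """0 = perfectly even, 1 = a pure star; needs at least 3 vertices."""
    n = len(deg)
    top = max(deg.values())
    return sum(top - d for d in deg.values()) / ((n - 1) * (n - 2))


# Two hypothetical five-person networks.
vertices = ["A", "B", "C", "D", "E"]
star = [("A", v) for v in vertices[1:]]  # A is the hub
ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "A")]

star_deg = degrees(vertices, star)
ring_deg = degrees(vertices, ring)

print(freeman_degree_centralisation(star_deg), freeman_degree_centralisation(ring_deg))
print(round(degree_variance(star_deg), 2), degree_variance(ring_deg))
```

The star scores the maximum on both measures and the ring the minimum, which matches the intuition that a hub-and-spoke network is as centralised as a network can be.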
e.g. remove any ‘spoke’ from this ‘wheel’ and damage is minimal but remove the central hub and the whole thing collapses (atomises).
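The wheel example can be checked with a short Python sketch that counts components before and after deleting a vertex. The network is a hypothetical hub with five spokes that are not tied to one another.

```python
def count_components(vertices, edges, removed=None):
    """Count components after deleting one vertex (and all of its ties)."""
    kept = [v for v in vertices if v != removed]
    neighbours = {v: set() for v in kept}
    for a, b in edges:
        if a in neighbours and b in neighbours:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen = set()
    count = 0
    for start in kept:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            v = stack.pop()
            for w in neighbours[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return count


spokes = ["A", "B", "C", "D", "E"]
vertices = ["Hub"] + spokes
edges = [("Hub", s) for s in spokes]

intact = count_components(vertices, edges)
after_spoke_removed = count_components(vertices, edges, removed="A")
after_hub_removed = count_components(vertices, edges, removed="Hub")

print(intact, after_spoke_removed, after_hub_removed)
```

Losing a spoke leaves one connected network; losing the hub leaves five isolated individuals.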
The concept of ‘degree’ is also used in this context (and the concept of ‘walk’). An individual with a Bacon number of 4 is separated from Bacon by ‘four degrees of separation’. • The ‘small world’ phenomenon is sometimes referred to as ‘six degrees of separation’. [Diagram: vertices labelled Bacon, 1, 2, 3 and 4]
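A Bacon-style number is simply the length of the shortest path from a chosen vertex, which a breadth-first search finds directly. The names and ties below are invented; only the method is being illustrated.

```python
from collections import deque


def degrees_of_separation(source, vertices, edges):
    """Shortest-path distance from source to every vertex it can reach."""
    neighbours = {v: set() for v in vertices}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    distance = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in neighbours[v]:
            if w not in distance:
                distance[w] = distance[v] + 1
                queue.append(w)
    return distance


# Hypothetical co-appearance ties.
vertices = ["Bacon", "Ann", "Bob", "Cat", "Dan", "Eve"]
edges = [("Bacon", "Ann"), ("Ann", "Bob"), ("Bob", "Cat"),
         ("Ann", "Cat"), ("Cat", "Dan")]

bacon_number = degrees_of_separation("Bacon", vertices, edges)

print(bacon_number)
print("Eve" in bacon_number)  # Eve is unreachable, so she has no number
```

Cat gets a number of 2 rather than 3 because the search always takes the shortest route (via Ann), not the first one listed.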
Diameter of a component = • The length of the shortest path between the two most distant vertices (measured in degrees). • A related measure is the (mean) average path distance between all possible pairs of vertices. • Relevant because resources, viruses etc. will take longer to travel across large diameters. • Information/gossip will be more liable to distortion (Chinese whispers) the more mouths it passes through.
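Both the longest shortest path and the average path distance can be read off the list of shortest-path lengths between all pairs. The sketch below assumes a single connected, undirected component and uses an invented four-person chain.

```python
from collections import deque
from itertools import combinations


def distances_from(source, neighbours):
    """Breadth-first search distances from one vertex."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in neighbours[v]:
            if w not in distance:
                distance[w] = distance[v] + 1
                queue.append(w)
    return distance


def path_lengths(vertices, edges):
    """Shortest-path length for every pair; assumes one connected component."""
    neighbours = {v: set() for v in vertices}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return [distances_from(a, neighbours)[b] for a, b in combinations(vertices, 2)]


# Hypothetical chain: A - B - C - D.
vertices = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D")]

lengths = path_lengths(vertices, edges)
diameter = max(lengths)
average_path_length = sum(lengths) / len(lengths)

print(lengths, diameter, round(average_path_length, 2))
```

A message passed from A to D goes through three ties, the most any pair in this chain needs, while a typical pair is under two ties apart.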
For a very clear exposition of these and other ideas see: Scott, J. (2000) Social Network Analysis: A Handbook, London, Sage. For a longer, more detailed and more challenging (but very clear) intro, see: Wasserman, S. and Faust, K. (1994) Social Network Analysis, Cambridge, CUP.
Pajek software • Free to download • Easy to use • Good book available on it. • Good on Visualisation. • Compatible with other packages (e.g. netdraw, SPSS) • Able to handle very big data sets.
Getting Started • Preparing a data file for Pajek. • Visualising with Pajek.
Preparing a data file 1: • You cannot prepare data files within Pajek itself. You must prepare them separately. • Pajek doesn’t read Word files. • Best to use ‘Notepad’ or a similar program. • Notepad is a standard feature on W4W (look in ‘programs’). • Notepad saves files in ‘textfile’ format by default. Pajek can read textfiles. • There are many ways of constructing datafiles for Pajek, here are two:
Simply open Notepad and type:
*vertices n
1 Tom
2 Dick
3 Harry
*edges
1 2
1 3
2 3
Etc.
Or
*vertices n
1 Tom
2 Dick
3 Harry
*arcs
1 2
1 3
2 3
Etc.
You can also weight relationships:
*vertices n
1 name
2 name
3 name
*edges
1 2 2
1 3 1
2 3 4
Etc.
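For larger networks the same text can be generated rather than typed. The Python sketch below builds a string in the layout shown on these slides; it takes n to be the number of vertices and writes labels unquoted, both of which are assumptions to check against the Pajek documentation. The function name `pajek_text` is hypothetical.

```python
def pajek_text(names, lines, directed=False):
    """Build file text in the layout shown on the slides.

    names: vertex labels, numbered from 1 in the order given.
    lines: (from, to) or (from, to, weight) tuples using those labels.
    """
    number = {name: i for i, name in enumerate(names, start=1)}
    out = [f"*vertices {len(names)}"]  # assumes n = number of vertices
    for name in names:
        out.append(f"{number[name]} {name}")
    out.append("*arcs" if directed else "*edges")
    for line in lines:
        row = f"{number[line[0]]} {number[line[1]]}"
        if len(line) == 3:  # optional weight as a third column
            row += f" {line[2]}"
        out.append(row)
    return "\n".join(out) + "\n"


names = ["Tom", "Dick", "Harry"]
weighted_edges = [("Tom", "Dick", 2), ("Tom", "Harry", 1), ("Dick", "Harry", 4)]

text = pajek_text(names, weighted_edges)
print(text)
```

The resulting string can then be saved as a plain text file, for example with Python's built-in `open(..., "w")`, and opened in Pajek.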
The Pajek book • De Nooy, W, Mrvar, A. and Batagelj, V. (2005) Exploratory Social Network Analysis with Pajek. Cambridge: Cambridge University Press, £22.99 pbk, 0-521-60262-9.