Introduction to Social Network Analysis. Columbia University, April 2007. James Moody, Duke University.
Introduction
• Introduction
• Social Network data
  • Basic data elements
  • Network data sources
• Local (ego) Network Analysis
  • Introduction
  • Network Composition
  • Network Structure
  • Local Network Models
• Complete Network Analysis
  • Exploratory Analysis
  • Network Connections
  • Network Macro Structure
  • Stochastic Network Analyses
• Social Network Software Review
• Work through examples
Introduction
We live in a connected world:
“To speak of social life is to speak of the association between people – their associating in work and in play, in love and in war, to trade or to worship, to help or to hinder. It is in the social relations men establish that their interests find expression and their desires become realized.” Peter M. Blau, Exchange and Power in Social Life, 1964
"If we ever get to the point of charting a whole city or a whole nation, we would have … a picture of a vast solar system of intangible structures, powerfully influencing conduct, as gravitation does in space. Such an invisible structure underlies society and has its influence in determining the conduct of society as a whole." J.L. Moreno, New York Times, April 13, 1933
These patterns of connection form a social space that can be seen in multiple contexts:
Introduction Source: Linton Freeman “See you in the funny pages” Connections, 23, 2000, 32-42.
Introduction High Schools as Networks
Introduction • And yet, standard social science analysis methods do not take this space into account. • “For the last thirty years, empirical social research has been dominated by the sample survey. But as usually practiced, …, the survey is a sociological meat grinder, tearing the individual from his social context and guaranteeing that nobody in the study interacts with anyone else in it.” • Allen Barton, 1968 (Quoted in Freeman 2004) • Moreover, the complexity of the relational world makes it impossible to identify social connectivity using only our intuition. • Social Network Analysis (SNA) provides a set of tools to empirically extend our theoretical intuition of the patterns that compose social structure.
Introduction Why do Networks Matter? Local vision
Introduction • Social network analysis is: • a set of relational methods for systematically understanding and identifying connections among actors. SNA • is motivated by a structural intuition based on ties linking social actors • is grounded in systematic empirical data • draws heavily on graphic imagery • relies on the use of mathematical and/or computational models. • Social Network Analysis embodies a range of theories relating types of observable social spaces and their relation to individual and group behavior.
Introduction Key Questions • Social Network analysis lets us answer questions about social interdependence. These include: • “Networks as Variables” approaches • Are kids with smoking peers more likely to smoke themselves? • Do unpopular kids get in more trouble than popular kids? • Are people with many weak ties more likely to find a job? • Do central actors control resources? • “Networks as Structures” approaches • What generates hierarchy in social relations? • What network patterns spread diseases most quickly? • How do role sets evolve out of consistent relational activity? • We don’t want to draw this line too sharply: emergent role positions can affect individual outcomes in a ‘variable’ way, and variable approaches constrain relational activity.
1. Introduction and Background • Why networks matter: • Intuitive: information travels through contacts between actors, which can reflect a power distribution or influence attitudes and behaviors. Our understanding of social life improves if we account for this social space. • Less intuitive: patterns of inter-actor contact can have effects on the spread of “goods” or power dynamics that could not be seen focusing only on individual behavior.
Social Network Data
The units of interest in a network are the combined sets of actors and their relations. We represent actors with points and relations with lines.
Actors are referred to variously as: nodes, vertices, actors, or points.
Relations are referred to variously as: edges, arcs, lines, or ties.
Example: [Figure: a small network with nodes a, b, c, d, e]
Social Network Data Basic Data Elements
• Social network data consist of two linked classes of data:
• a) Nodes: information on the individuals (actors, nodes, points, vertices)
  • Network nodes are most often people, but can be any other unit capable of being linked to another (schools, countries, organizations, personalities, etc.)
  • The information about nodes is what we usually collect in standard social science research: demographics, attitudes, behaviors, etc.
  • Often includes dynamic information about when the node is active
• b) Edges: information on the relations among individuals (lines, edges, arcs)
  • Records a connection between the nodes in the network
  • Can be binary or valued, and directed (arcs) or undirected (edges)
  • One-mode (direct ties between actors) or two-mode (actors share membership in an organization)
  • Includes the times when the relation is active
• Graph theory notation: G(V,E)
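The two linked classes of data map directly onto G(V,E). What follows is a minimal sketch that assumes plain Python dictionaries are enough: node attributes (the usual survey variables) are stored on V, and each edge record in E can carry a value or an active time. The class name `Network` and its methods are hypothetical and do not come from any SNA package.

```python
from dataclasses import dataclass, field


@dataclass
class Network:
    """Minimal G(V, E): V maps node -> attributes, E maps (u, v) -> edge attributes."""
    directed: bool = False
    nodes: dict = field(default_factory=dict)  # V: node id -> attribute dict
    edges: dict = field(default_factory=dict)  # E: (u, v) -> attribute dict

    def _key(self, u, v):
        # Undirected edges are stored once, with endpoints in sorted order
        # (assumes node ids are mutually comparable, e.g. all strings).
        return (u, v) if self.directed else tuple(sorted((u, v)))

    def add_node(self, node, **attrs):
        self.nodes.setdefault(node, {}).update(attrs)

    def add_edge(self, u, v, **attrs):
        # Both endpoints must be in V before the edge is recorded in E.
        self.add_node(u)
        self.add_node(v)
        self.edges.setdefault(self._key(u, v), {}).update(attrs)

    def has_edge(self, u, v):
        return self._key(u, v) in self.edges


# Hypothetical example: one node attribute set, one valued, time-stamped edge.
g = Network(directed=False)
g.add_node("a", age=16, smokes=False)
g.add_edge("a", "b", weight=2, start="2006-09")

d = Network(directed=True)
d.add_edge("a", "b")
```

With `directed=True`, an edge is an arc, so `d.has_edge("b", "a")` is False even though `d.has_edge("a", "b")` is True.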
Social Network Data Basic Data Elements
[Figure: the example network (nodes a to e) drawn four ways: directed binary, undirected binary, directed valued, and undirected valued]
In general, a relation can be: (1) binary or valued; (2) directed or undirected.
The social process of interest will often determine what form your data take. Almost all of the techniques and measures we describe can be generalized across data formats.
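Moving between these formats is often a one-line operation on the adjacency matrix. Below is a sketch, assuming ties are stored as a list-of-lists matrix: valued ties are dichotomized at a threshold, and directed ties are symmetrized by taking either the maximum of the two reports (a tie exists if either actor reports it) or the minimum (both must report it). The function names and the threshold of 1 are illustrative choices, not a standard.

```python
def dichotomize(matrix, threshold=1):
    """Valued -> binary: 1 where the tie value is at least `threshold`."""
    return [[1 if x >= threshold else 0 for x in row] for row in matrix]


def symmetrize(matrix, rule="max"):
    """Directed -> undirected.

    'max' keeps a tie if either actor reports it; 'min' keeps it only
    if both actors report it.
    """
    combine = max if rule == "max" else min
    n = len(matrix)
    return [[combine(matrix[i][j], matrix[j][i]) for j in range(n)] for i in range(n)]


# Hypothetical valued, directed ties among three actors.
valued = [
    [0, 3, 0],
    [1, 0, 2],
    [0, 0, 0],
]
binary = dichotomize(valued)
either = symmetrize(binary, "max")
both = symmetrize(binary, "min")
```

Which rule is appropriate depends on the social process: "max" treats any report as evidence of a tie, while "min" requires mutual confirmation.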
Social Network Data Basic Data Elements
A relation can also be multiplex, with several categories of tie linking the same actors.
[Figure: the example network (nodes a to e) with directed, multiplex categorical edges]
Social Network Data Basic Data Elements: Levels of analysis
[Figure: nested levels of analysis: best friend dyad, ego-net, primary group, 2-step partial network, global-net]
Social Network Data Basic Data Elements: Levels of analysis
We can examine networks across multiple levels:
1) Ego-network
 - Data on a respondent (ego) and the people they are connected to (alters). Example: 1985 GSS module
 - May include estimates of connections among alters
2) Partial network
 - Ego networks plus some amount of tracing to reach contacts of contacts
 - Something less than a full account of connections among all pairs of actors in the relevant population
 - Example: CDC contact tracing data for STDs
Social Network Data Basic Data Elements: Levels of analysis We can examine networks across multiple levels: • 3) Complete or “Global” data • - Data on all actors within a particular (relevant) boundary • - Never exactly complete (due to missing data), but boundaries are set • Example: Coauthorship data among all writers in the social sciences, friendships among all students in a classroom
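One way to see how these levels differ is to cut them out of a known graph. The sketch below assumes we already hold the complete network as an adjacency list, which real ego-network studies do not, purely to show what each level captures. The ego-network is everyone within 1 step of ego, a 2-step partial network adds contacts of contacts, and the ties among those nodes are what ego might be asked to report. All node names are hypothetical.

```python
from collections import deque


def neighborhood(adj, ego, steps=1):
    """All nodes within `steps` links of ego (ego included), found by breadth-first search."""
    dist = {ego: 0}
    queue = deque([ego])
    while queue:
        node = queue.popleft()
        if dist[node] == steps:
            continue  # do not trace past the requested depth
        for alter in adj.get(node, ()):
            if alter not in dist:
                dist[alter] = dist[node] + 1
                queue.append(alter)
    return set(dist)


def induced_ties(adj, nodes):
    """Undirected ties among a node set, e.g. ties among ego and alters."""
    return {tuple(sorted((u, v))) for u in nodes for v in adj.get(u, ()) if v in nodes}


# Hypothetical undirected network stored as an adjacency list.
adj = {
    "ego": ["a", "b", "c"],
    "a": ["ego", "b", "x"],
    "b": ["ego", "a"],
    "c": ["ego", "y"],
    "x": ["a", "z"],
    "y": ["c"],
    "z": ["x"],
}
ego_net = neighborhood(adj, "ego", steps=1)  # ego and alters a, b, c
partial = neighborhood(adj, "ego", steps=2)  # adds contacts of contacts x and y
global_net = set(adj)                        # complete data: every node in the boundary
```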
Social Network Data Graph Layout A good network drawing allows viewers to come away from the image with an almost immediate intuition about the underlying structure of the network being displayed. However, because there are multiple ways to display the same information, and few standards for doing so, the information content of a network display can vary a great deal. Consider the four graphs drawn at right. Ask yourself what intuition you gain from each graph, then trace the actual pattern of ties: all four graphs are exactly the same.
Social Network Data Basic Data Structures In general, graphs are cumbersome to work with analytically, though there is a great deal of good work to be done on using visualization to build network intuition. I recommend using layouts that optimize on the feature you are most interested in. The two I use most are hierarchical layouts and force-directed layouts. We’ll see some examples of “best practice” after getting a little more familiar with data structures.
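To make "force-directed" concrete, here is a minimal sketch of a spring layout. It uses a simplified Fruchterman-Reingold scheme, swapped in as a familiar example since the slide names no specific algorithm: every pair of nodes repels, every edge pulls its endpoints together, and a shrinking "temperature" caps how far a node can move in each iteration. The parameter names, the 200 iterations, and the 0.98 cooling factor are arbitrary illustrative choices.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=200, size=1.0, seed=0):
    """Simplified Fruchterman-Reingold style layout; returns node -> (x, y)."""
    rng = random.Random(seed)
    pos = {v: [rng.uniform(0, size), rng.uniform(0, size)] for v in nodes}
    k = size * math.sqrt(1.0 / len(nodes))  # ideal distance between linked nodes
    temp = size / 10                        # maximum move per iteration

    def unit_and_dist(u, v):
        # Unit vector pointing from v toward u, plus their distance.
        dx = pos[u][0] - pos[v][0]
        dy = pos[u][1] - pos[v][1]
        d = max(math.hypot(dx, dy), 1e-9)
        return dx / d, dy / d, d

    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion between all pairs: force k^2 / d pushes u and v apart.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                ux, uy, d = unit_and_dist(u, v)
                f = k * k / d
                disp[u][0] += ux * f
                disp[u][1] += uy * f
                disp[v][0] -= ux * f
                disp[v][1] -= uy * f
        # Attraction along edges: force d^2 / k pulls the endpoints together.
        for u, v in edges:
            ux, uy, d = unit_and_dist(u, v)
            f = d * d / k
            disp[u][0] -= ux * f
            disp[u][1] -= uy * f
            disp[v][0] += ux * f
            disp[v][1] += uy * f
        # Move each node at most `temp` and keep it inside the drawing frame.
        for v in nodes:
            dx, dy = disp[v]
            d = max(math.hypot(dx, dy), 1e-9)
            step = min(d, temp)
            pos[v][0] = min(size, max(0.0, pos[v][0] + dx / d * step))
            pos[v][1] = min(size, max(0.0, pos[v][1] + dy / d * step))
        temp *= 0.98  # cool down so the layout settles

    return {v: tuple(p) for v, p in pos.items()}


nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e"), ("d", "e")]
layout = force_directed_layout(nodes, edges)
```

The pairwise repulsion loop is quadratic in the number of nodes, which is fine for small drawings; production tools use approximations for large networks.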
Social Network Data Basic Data Structures: From pictures to matrices
[Figure: the example network (nodes a to e) drawn as an undirected binary graph and as a directed binary graph, each shown next to its adjacency matrix]
Social Network Data Basic Data Structures: From matrices to lists
Adjacency matrix:
    a b c d e
a   0 1 0 0 0
b   1 0 1 0 0
c   0 1 0 1 1
d   0 0 1 0 1
e   0 0 1 1 0
Arc list: a b, b a, b c, c b, c d, c e, d c, d e, e c, e d
Adjacency list:
a: b
b: a c
c: b d e
d: c e
e: c d
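The conversion from matrix to lists is mechanical. Each nonzero cell (i, j) becomes one arc, and each row becomes one adjacency-list entry. Below is a sketch using the five-node example network (a to e). The function names are hypothetical.

```python
def to_arc_list(labels, matrix):
    """Every nonzero cell (i, j) becomes an arc i -> j, in row-major order."""
    return [(labels[i], labels[j])
            for i, row in enumerate(matrix)
            for j, x in enumerate(row) if x]


def to_adjacency_list(labels, matrix):
    """One entry per node: the nodes that its row points to."""
    return {labels[i]: [labels[j] for j, x in enumerate(row) if x]
            for i, row in enumerate(matrix)}


labels = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 1, 0, 0, 0],  # a
    [1, 0, 1, 0, 0],  # b
    [0, 1, 0, 1, 1],  # c
    [0, 0, 1, 0, 1],  # d
    [0, 0, 1, 1, 0],  # e
]
arcs = to_arc_list(labels, matrix)
adjacency = to_adjacency_list(labels, matrix)
```

Lists store only the ties that exist, so they are far more compact than the matrix for sparse networks, which is why large network files are usually exchanged as arc lists.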
Social Network Data Basic Data Elements: Modes • Social network data are substantively divided by the number of modes in the data. • 1-mode data represents edges based on direct contact between actors in the network. All the nodes are of the same type (people, organization, ideas, etc). Examples: • Communication, friendship, giving orders, sending email. • There are no constraints on connections between classes of nodes. • 1-mode data are usually singly reported (each person reports on their friends), but you can use multiple-informant data, which is more common in child development research (Cairns and Cairns).
Social Network Data Basic Data Elements: Modes Social network data are substantively divided by the number of modes in the data. 2-mode data represents nodes from two separate classes, where all relations cross classes. Examples: People as members of groups People as authors on papers Words used often by people Events in the life history of people The two modes of the data represent a duality: you can project the data as people connected to people through joint membership in a group, or groups to each other through common membership N-mode data generalizes the constraint on ties between classes to N groups
Social Network Data Basic Data Elements: Modes Breiger: 1974 - Duality of Persons and Groups Argument: Metaphor: people intersect through their associations, which defines (in part) their individuality. The Duality argument is that relations among groups imply relations among individuals
Social Network Data Basic Data Elements: Modes Bipartite networks imply a constraint on the mixing, such that ties only cross classes. Here we see a tie connecting each woman with the party she attended (Davis data)
Social Network Data Basic Data Elements: Modes By projecting the data, one can look at the group memberships shared between people, or the members shared between groups: this is the person-to-person projection of the 2-mode data.
Social Network Data Basic Data Elements: Modes By projecting the data, one can look at the group memberships shared between people, or the members shared between groups: this is the group-to-group projection of the 2-mode data.
Social Network Data Basic Data Elements: Modes Working with two-mode data
A person-to-group adjacency matrix is rectangular, with persons down rows and groups across columns. Each column is a group, each row a person, and the cell = 1 if the person in that row belongs to that group. You can tell how many groups two people both belong to by comparing their rows: identify every place where both rows = 1, sum them, and you have the overlap.
A =
   1 2 3 4 5
A  0 0 0 0 1
B  1 0 0 0 0
C  1 1 0 0 0
D  0 1 1 1 1
E  0 0 1 0 0
F  0 0 1 1 0
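The row comparison described above is a sum of elementwise products. Because cells are 0 or 1, a product is 1 only where both rows are 1. This sketch uses the membership matrix from the slide, and the helper name `shared_groups` is hypothetical.

```python
# Person-by-group membership rows from the slide (persons A-F, groups 1-5).
members = {
    "A": [0, 0, 0, 0, 1],
    "B": [1, 0, 0, 0, 0],
    "C": [1, 1, 0, 0, 0],
    "D": [0, 1, 1, 1, 1],
    "E": [0, 0, 1, 0, 0],
    "F": [0, 0, 1, 1, 0],
}


def shared_groups(m, p, q):
    """Number of groups both p and q belong to: count places where both rows are 1."""
    return sum(x * y for x, y in zip(m[p], m[q]))


overlap_df = shared_groups(members, "D", "F")  # both in groups 3 and 4
overlap_cd = shared_groups(members, "C", "D")  # both in group 2
```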
Social Network Data Basic Data Elements: Modes Working with two-mode data
One can get either projection easily with a little matrix multiplication. First define $A^T$ as the transpose of A (simply swap the rows and columns). If A is of size P × G, then $A^T$ will be of size G × P.
$A^T$ =
   A B C D E F
1  0 1 1 0 0 0
2  0 0 1 1 0 0
3  0 0 0 1 1 1
4  0 0 0 1 0 1
5  1 0 0 1 0 0
Social Network Data Basic Data Elements: Modes
$P = AA^T$, with dimensions (6×5)(5×6) = (6×6)
$G = A^TA$, with dimensions (5×6)(6×5) = (5×5)
where A is the 6×5 person-by-group matrix and $A^T$ is its 5×6 transpose.
P =
   A B C D E F
A  1 0 0 1 0 0
B  0 1 1 0 0 0
C  0 1 2 1 0 0
D  1 0 1 4 1 2
E  0 0 0 1 1 1
F  0 0 0 2 1 2
G =
   1 2 3 4 5
1  2 1 0 0 0
2  1 2 1 1 1
3  0 1 3 2 1
4  0 1 2 2 1
5  0 1 1 1 2
Social Network Data Basic Data Elements: Modes Theoretically, these two equations define what Breiger means by duality: “With respect to the membership network,…, persons who are actors in one picture (the P matrix) are with equal legitimacy viewed as connections in the dual picture (the G matrix), and conversely for groups.” (p.87) The resulting network: 1) is always symmetric; 2) has a diagonal that tells you how many groups a person belongs to (in P) or how many members a group has (in G). In practice, most network software (UCINET, PAJEK) will do all of these operations. It is also simple to do the matrix multiplication in programs like SAS, SPSS, or R.
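For readers without a matrix package at hand, the two projections take only a few lines of plain Python. This sketch uses the slide's person-by-group matrix and checks the two properties stated above: both projections are symmetric, and their diagonals count memberships. The helper names `transpose` and `matmul` are illustrative. In NumPy the same products would be `A @ A.T` and `A.T @ A`.

```python
def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Matrix product: cell (i, j) = sum over k of x[i][k] * y[k][j]."""
    y_cols = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols] for row in x]


# Person-by-group matrix from the slides: rows are persons A-F, columns groups 1-5.
A = [
    [0, 0, 0, 0, 1],  # A
    [1, 0, 0, 0, 0],  # B
    [1, 1, 0, 0, 0],  # C
    [0, 1, 1, 1, 1],  # D
    [0, 0, 1, 0, 0],  # E
    [0, 0, 1, 1, 0],  # F
]
AT = transpose(A)  # 5 x 6
P = matmul(A, AT)  # 6 x 6 person-by-person: number of shared groups
G = matmul(AT, A)  # 5 x 5 group-by-group: number of shared members
```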
Social Network Data Network Data Sources: Existing data sources • Existing Sources of Social Network Data: • There are lots of network data archived. Check INSNA for a listing. The PAJEK data page includes a number of exemplars for large-scale networks. • 2-Mode Data • One can construct networks from many different data sources if you want to work with 2-mode data. Any list can be so transformed. • Director interlocks • Protest event participation • Authors on papers • Words in documents… • 1-Mode Data • Local Network data: • Fairly common, because it is easy to collect from sample surveys. • GSS, NHSL, Urban Inequality Surveys, etc. • Pay attention to the question asked • Key features are (a) number of people named and (b) whether alters are able to nominate each other.
Social Network Data Network Data Sources: Existing data sources • Existing Sources of Social Network Data: • 1-Mode Data • Partial network data: • Much less common, because cost goes up significantly once you start tracing to contacts. • Snowball data: start with focal nodes and trace to contacts • CDC style data on sexual contact tracing • Limited snowball samples: • Colorado Springs drug users data • Genealogy data • Small-world network samples • Limited Boundary data: select data within a limited bound • Cross-national trade data • Friendships within a classroom • Family support ties
Social Network Data Network Data Sources: Existing data sources • Existing Sources of Social Network Data: • 1-Mode Data • Complete network data: • Significantly less common and never perfect. • Start by defining a theoretically relevant boundary • Then identify all relations among nodes within that boundary • Co-sponsorship patterns among legislators • Friendships within strongly bounded settings (sororities, schools) • Examples: • Add Health on adolescent friendships • Hallinan data on within-school friendships • McFarland’s data on verbal interaction • Electronic data on citations or coauthorship (see Pajek data page) • See INSNA home page for many small-scale networks
Social Network Data Network Data Sources: Collecting network data • Boundary Specification Problem • Network methods describe positions in relevant social fields, where flows of particular goods are of interest. As such, boundaries are a fundamentally theoretical question about what you think matters in the setting of interest. • See Marsden (19xx) for a good review of the boundary specification problem • In general, there are usually relevant social foci that bound the relevant social field. We expect that social relations will be very clumpy. Consider the example of friendship ties within and between a high-school and a Jr. high:
Social Network Data Network Data Sources: Collecting network data
• Network data collection can be time consuming. It is better (I think) to have breadth over depth: having detailed information on fewer than 50% of the sample will make it very difficult to draw conclusions about the general network structure.
• Question format:
a) If you ask people to recall names (an open list format), fatigue will result in under-reporting.
b) If you ask people to check off names from a full list, you can often get over-reporting.
c) It is common to limit people to a small number of nominations (~5). This will bias network measures, but is sometimes the best choice to avoid fatigue.
d) Concrete relational indicators (who did you talk to?) are better than attitudes that are harder to define (who do you like?).
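One source of the bias from a nomination limit is that it truncates degree only for the most active respondents. This sketch caps hypothetical open-list answers at five names and compares mean out-degree. It assumes respondents list their most important contacts first, which is an assumption, not something a fixed-choice format guarantees.

```python
def cap_nominations(nominations, k=5):
    """Keep only each respondent's first k names, as a fixed-choice format would."""
    return {ego: alters[:k] for ego, alters in nominations.items()}


def mean_outdegree(nominations):
    """Average number of names given per respondent."""
    return sum(len(alters) for alters in nominations.values()) / len(nominations)


# Hypothetical open-list answers from three respondents.
open_list = {
    "r1": ["a", "b", "c", "d", "e", "f", "g"],
    "r2": ["a", "b"],
    "r3": ["c", "d", "e", "f", "g", "h"],
}
capped = cap_nominations(open_list, k=5)
before = mean_outdegree(open_list)
after = mean_outdegree(capped)
```

Here the cap leaves r2 untouched but removes names from r1 and r3, so the loss falls on the best-connected respondents and compresses differences in degree.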
Social Network Data Network Data Sources: Collecting network data Boundary Specification Problem While students were given the option to name friends in the other school, they rarely did. As such, the school likely serves as a strong substantive boundary.
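A simple check on whether a setting acts as a boundary is the share of nominations that stay inside the nominator's own setting. A share close to 1 indicates a strong boundary. This is a sketch on made-up data: the student IDs, the "HS"/"JH" school labels, and the function name are hypothetical.

```python
def within_share(ties, group):
    """Fraction of (nominator, nominee) ties where both are in the same group."""
    within = sum(1 for u, v in ties if group[u] == group[v])
    return within / len(ties)


# Hypothetical school membership and friendship nominations.
school = {"s1": "HS", "s2": "HS", "s3": "HS", "j1": "JH", "j2": "JH"}
nominations = [
    ("s1", "s2"), ("s2", "s3"), ("s3", "s1"),
    ("j1", "j2"), ("j2", "j1"),
    ("s2", "j1"),  # the only cross-school nomination
]
share = within_share(nominations, school)  # 5 of 6 ties stay within a school
```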
Social Network Data Network Data Sources: Collecting network data • Local Network data: • When using a survey, common to use an “ego-network module.” • First part: “Name Generator” question to elicit a list of names • Second part: Working through the list of names to get information about each person named • Third part: asking about relations among each person named. • GSS Name Generator: • “From time to time, most people discuss important matters with other people. Looking back over the last six months -- who are the people with whom you discussed matters important to you? Just tell me their first names or initials.” • Why this question? • Only time for one question • Normative pressure and influence likely travels through strong ties • Similar to ‘best friend’ or other strong tie generators • Note there are significant substantive problems with this name generator
Social Network Data Network Data Sources: Collecting network data • Electronic Small World name generator:
Social Network Data Network Data Sources: Collecting network data • Local Network data: • The second part usually asks a series of questions about each person • GSS Example: • “Is (NAME) Asian, Black, Hispanic, White or something else?” • ESWP example (figure) • This part adds N × (number of attributes) questions to the survey, where N is the number of people named.
Social Network Data Network Data Sources: Collecting network data
• Local Network data:
• The third part usually asks about relations among the alters. Do this by looping over all possible combinations. If you are asking about a symmetric relation, then you can limit your questions to the n(n-1)/2 cells of one triangle of the adjacency matrix:
[Figure: alter-by-alter matrix for alters 1 to 5]
GSS: “Please think about the relations between the people you just mentioned. Some of them may be total strangers in the sense that they wouldn't recognize each other if they bumped into each other on the street. Others may be especially close, as close or closer to each other as they are to you. First, think about NAME 1 and NAME 2. A. Are NAME 1 and NAME 2 total strangers? B. Are they especially close? PROBE: As close or closer to each other as they are to you?”
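The loop over alter pairs, and the survey burden it creates, can be sketched directly. `itertools.combinations` yields each unordered pair once, which gives exactly the n(n-1)/2 cells of one triangle. The module-length formula counts one name generator, N × (number of attributes) alter questions, and one question per pair. Only the first GSS pair item is generated here. The function names and the choice of 4 attributes are hypothetical.

```python
from itertools import combinations


def alter_pair_questions(alters):
    """One question per unordered pair of alters: n(n-1)/2 items for a symmetric relation."""
    return [f"Are {a} and {b} total strangers?" for a, b in combinations(alters, 2)]


def module_length(n_alters, n_attributes):
    """Items in a hypothetical ego-network module: 1 name generator,
    n_alters * n_attributes alter questions, and n(n-1)/2 alter-pair questions."""
    return 1 + n_alters * n_attributes + n_alters * (n_alters - 1) // 2


names = ["NAME 1", "NAME 2", "NAME 3", "NAME 4", "NAME 5"]
questions = alter_pair_questions(names)
```

The pair count grows quadratically, so ten alters would already require 45 pair questions per relation. This is one reason modules limit how many names are followed up.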
Social Network Data Network Data Sources: Collecting network data • Snowball Samples: • Snowball samples work much the same as ego-network modules, and if time allows I recommend asking at least some of the basic ego-network questions, even if you plan to sample (some of) the people your respondent names. • Start with a name generator, then any demographic or relational questions. • Have a sample strategy • Random Walk designs (Klovdahl) • Strong tie designs • All names designs • Get contact information from the people named • Snowball samples are very effective at providing network context around focal nodes. New work on Respondent Driven Sampling (RDS) makes it possible to get good representation even with initially biased seed nodes. http://www.respondentdrivensampling.org/reports/RDSrefs.htm
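A random-walk design recruits one named contact at a time instead of everyone a respondent names. This is a simplified sketch in that spirit, not Klovdahl's exact procedure. From the current respondent, it picks one named contact at random, interviews that person, and repeats. The `contacts` data and the function name are hypothetical, and a seeded generator keeps the walk reproducible.

```python
import random


def random_walk_sample(contacts, seed_node, steps, rng=None):
    """Follow one randomly chosen named contact per step, starting at seed_node."""
    rng = rng or random.Random()
    path = [seed_node]
    current = seed_node
    for _ in range(steps):
        named = contacts.get(current, [])
        if not named:
            break  # respondent named no one, so the walk stops here
        current = rng.choice(named)
        path.append(current)
    return path


# Hypothetical name-generator answers: respondent -> people they named.
contacts = {
    "r1": ["r2", "r3"],
    "r2": ["r1", "r4"],
    "r3": ["r1", "r4", "r5"],
    "r4": ["r2", "r3"],
    "r5": ["r3"],
}
walk = random_walk_sample(contacts, "r1", steps=6, rng=random.Random(42))
```

Unlike an all-names design, the walk can revisit people, and the number of interviews stays fixed no matter how many names each respondent gives.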
Social Network Data Network Data Sources: Collecting network data • Snowball Samples:
Social Network Data Network Data Sources: Collecting network data • Complete Network data • Data collection is concerned with all relations within a specified boundary. • Requires sampling every actor in the population of interest (all kids in the class, all nations in the alliance system, etc.) • The network survey itself can be much shorter, because you are getting information from each person (so ego does not report on alters). • Two general formats: • Recall surveys (“Name all of your best friends”) • Check-list formats: Give people a list of names, have them check off those with whom they have relations.
Social Network Data Network Data Sources: Collecting network data • Complete network surveys require a process that lets you link answers to respondents. • You cannot have anonymous surveys. • Recall: • Need ID numbers and a roster to link, or hand-code names to find matches • Checklists • Need a roster for people to check through
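For recall formats, a first automated pass can match names exactly once case, accents, and spacing are normalized, leaving the rest for hand-coding. This is a minimal sketch with a hypothetical roster and hypothetical function names. Nicknames and misspellings (like "Sammy" below) are not caught, and duplicate names on the roster would collide, so real linking still needs human review.

```python
import unicodedata


def normalize(name):
    """Lowercase, strip accents, and collapse whitespace: 'José  Ruiz' -> 'jose ruiz'."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def link_to_roster(recalled, roster):
    """Map recalled names to roster IDs; return (matched IDs, names left to hand-code)."""
    index = {normalize(name): pid for pid, name in roster.items()}
    matches, unmatched = [], []
    for name in recalled:
        pid = index.get(normalize(name))
        if pid is None:
            unmatched.append(name)  # needs hand-coding
        else:
            matches.append(pid)
    return matches, unmatched


roster = {101: "José Ruiz", 102: "Ann Lee", 103: "Sam Park"}
matches, unmatched = link_to_roster(["jose  ruiz", "Ann Lee", "Sammy"], roster)
```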