Social Network Data
• Outline
  • Collecting Data
    • Relations
    • Level of analysis
    • Sources
  • Data Accuracy
    • How accurate is it?
    • What can we do about it?
  • Network Ethics
    • Data collection
      • Informed Consent
      • Deductive Disclosure
      • Building a frame (closed-answer sets)
      • Illicit Relations
      • Novel Compilations of “extant” data
    • Data Use
      • Risks of identifying R’s (and named non-R’s) positions
      • Action in response to nets (military, police, firm)
Social Network Data Collecting: theory concepts
• What information do you want to collect? This is ultimately a theory question about how you think the social network setting matters. Some dimensions to this question:
  • “Actually existing social relations” or “perceived relations”
    • “Who do you eat lunch with?” vs. “Who is your friend?”
    • “Who do you talk to?” vs. “Who is important in your life?”
  • Are you more interested in getting the right contacts or the right type of contacts?
  • Dynamism: “episodic” relations or “typical” / “long-term” ties?
    • Research shows that people have a bias toward naming the normal, so we include people who are “usually” there. Is that what you want?
    • Do you need to be able to distinguish naming flux from structural dynamics?
Social Network Data Collecting: theory concepts Grannis, AJS
Social Network Data Collecting: theory concepts Giant component transition as example of small changes making a big difference. Grannis, AJS
Social Network Data Level of Analysis
• What scope of information do you want?
• Boundary Specification: key is what constitutes the “edge” of the network

“Realist” (boundary from actors’ point of view)
  Local: Everyone connected to ego in the relevant manner (all friends, all (past?) sex partners)
  Global: All relations relevant to social action (“adolescent peers network” or “Ruling Elite”)
“Nominalist” (boundary from researchers’ point of view)
  Local: Relations defined by a name-generator, typically limited in number (“5 closest friends”)
  Global: Relations within a particular setting (“friends in school” or “votes on the supreme court”)
Social Network Data Level of Analysis Boundary Specification Problem
While students were given the option to name friends in the other school, they rarely did. As such, the school likely serves as a strong substantive boundary.
Social Network Data Level of Analysis Boundary Specification Problem Time Boundary effects on characteristics of the PhD exchange graph.
Social Network Data Level of Analysis Boundary Specification Problem
For component size, Grannis gives a formula based on the ratio of first to second neighbors to identify where the phase transition is. The conclusion one draws depends entirely on the definition of the relation.
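The first-to-second-neighbor idea can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `neighbor_ratio` and the toy network are hypothetical, and the threshold mentioned in the comments (a giant component is expected when the mean number of second neighbors exceeds the mean number of first neighbors) is the standard random-graph heuristic, not necessarily the exact formula Grannis uses.

```python
def neighbor_ratio(adjacency):
    """Return (z1, z2, z2/z1) for an undirected network.

    adjacency: dict mapping each node to an iterable of its neighbors.
    z1 = mean number of first neighbors, z2 = mean number of nodes
    at exactly two steps. Assumption: in random-graph models a giant
    component is expected when z2 / z1 > 1.
    """
    if not adjacency:
        raise ValueError("adjacency must contain at least one node")
    z1_total = 0
    z2_total = 0
    for node, nbrs in adjacency.items():
        first = set(nbrs) - {node}
        second = set()
        for n in first:
            second.update(adjacency.get(n, ()))
        # Keep only nodes that are exactly two steps away.
        second -= first | {node}
        z1_total += len(first)
        z2_total += len(second)
    z1 = z1_total / len(adjacency)
    z2 = z2_total / len(adjacency)
    ratio = z2 / z1 if z1 else float("nan")
    return z1, z2, ratio


# Hypothetical toy network: a simple path a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
z1, z2, ratio = neighbor_ratio(path)
print(z1, z2, round(ratio, 3))
```

Rerunning the same calculation under a stricter or looser definition of the tie (for example, “best friend” vs. “anyone you talk to”) changes the adjacency lists, and with them the ratio, which is the point of the slide.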
Social Network Data Level of Analysis Network Sampling
• The level of analysis implies a perspective on sampling:
  • Local random probability sampling
  • Complete Census
• These two are not as dissimilar as they may appear:
  • Local nets imply global connectivity:
    • Every ego-network is a sample from the population-level global network, and thus should be consistent with a constrained range of global networks. See Jeff Smith’s work on this.
    • If you have a clustered setting, many alters in a local network may overlap, making partial connectivity information possible.
    • For attribute mixing (proportion of whites with black friends, etc.), ego-network data are sufficient to draw population inferences.
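The attribute-mixing point can be illustrated with a small calculation on ego-network records. This is a minimal sketch: the record layout (`group`, `alter_groups`) and the function name are hypothetical, and it ignores survey weights, which a real population estimate would need.

```python
def share_with_cross_group_alter(ego_nets, ego_group, alter_group):
    """Proportion of egos in ego_group who name at least one alter in alter_group.

    ego_nets: list of dicts with keys "group" (ego's category) and
    "alter_groups" (list of categories, one per named alter).
    Returns None when no ego belongs to ego_group.
    """
    egos = [e for e in ego_nets if e["group"] == ego_group]
    if not egos:
        return None
    hits = sum(
        any(g == alter_group for g in e["alter_groups"]) for e in egos
    )
    return hits / len(egos)


# Hypothetical ego-network module responses.
sample = [
    {"group": "white", "alter_groups": ["white", "black"]},
    {"group": "white", "alter_groups": ["white"]},
    {"group": "black", "alter_groups": ["black"]},
]
print(share_with_cross_group_alter(sample, "white", "black"))
```

Nothing here requires knowing how the alters connect to each other, which is why a local design is enough for this kind of question.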
Social Network Data Level of Analysis Network Sampling
• The level of analysis implies a perspective on sampling:
  • Local random probability sampling
  • Complete Census
• These two are not as dissimilar as they may appear:
  • Complete networks never are complete:
    • Much of the efficiency in data collection comes from not having to ask ego about alter’s characteristics if you are going to interview alter anyway. But taking advantage of this efficiency assumes very high coverage rates.
Social Network Data Level of Analysis Network Sampling
• Snowball and “link trace” designs
[Figure labels: Link-Tracing Designs, Ego-networks, Complete Census]
The basic idea is to use “adaptive sampling”: start with one or more seed nodes, identify their network partners, and then interview them. The earliest “snowball” samples are of this type. The most recent work is “respondent-driven sampling” (RDS). If done systematically, some inference elements are knowable. Otherwise, you have to try to disentangle the sampling process from the real structure.
Social Network Data Level of Analysis
• Snowball Samples:
  • Start with a name generator, then any demographic or relational questions.
  • Have a sample strategy. Examples include:
    • Random Walk designs (Klovdahl)
    • Strong tie designs
    • All names designs
    • Cross Links (Project 90)
  • Get contact information from the people named
  • If time allows, ask at least some of the people a “network density” module, as you would for any ego-network design
• Snowball samples are very effective at providing network context around focal nodes. Detailed treatments of snowball sampling estimates are given in O. Frank’s work.
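The tracing logic behind a snowball design can be sketched as a wave-by-wave walk over nominations. This is an illustration only: `snowball_sample` and its parameters are hypothetical names, the toy network is made up, and following the first `max_names` nominations in sorted order is an arbitrary stand-in for a real selection rule (strongest ties, a random pick, and so on). Leaving `max_names` as `None` corresponds to an “all names” design.

```python
def snowball_sample(adjacency, seeds, waves, max_names=None):
    """Trace contacts outward from seed nodes for a fixed number of waves.

    adjacency: dict mapping each person to the people they would name.
    Returns a dict mapping each sampled person to the wave in which they
    were first reached (seeds are wave 0).
    """
    wave_of = {seed: 0 for seed in seeds}
    frontier = list(seeds)
    for wave in range(1, waves + 1):
        next_frontier = []
        for ego in frontier:
            named = sorted(adjacency.get(ego, []))
            if max_names is not None:
                # Assumption: follow only the first few names listed.
                named = named[:max_names]
            for alter in named:
                if alter not in wave_of:
                    wave_of[alter] = wave
                    next_frontier.append(alter)
        frontier = next_frontier
    return wave_of


# Hypothetical nominations.
net = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a"],
    "d": ["b", "e"],
    "e": ["d"],
}
print(snowball_sample(net, ["a"], waves=2))
print(snowball_sample(net, ["a"], waves=2, max_names=1))
```

Recording the wave at which each person enters the sample keeps the sampling process visible, which helps when you later try to separate it from the real structure.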
Social Network Data Sources • Existing Sources of Social Network Data: • There are lots of network data archived. Check INSNA for a listing. The PAJEK data page includes a number of exemplars for large-scale networks. • Local Network data: • Fairly common, because it is easy to collect from sample surveys. • GSS, NHSL, Urban Inequality Surveys, etc. • Pay attention to the question asked • Key features are (a) number of people named and (b) whether alters are able to nominate each other or not.
Social Network Data Sources
• Existing Sources of Social Network Data:
  • Partial network data:
    • Much less common, because cost goes up significantly once you start tracing to contacts.
    • Snowball data: start with focal nodes and trace to contacts
      • CDC style data on sexual contact tracing
    • Limited snowball samples:
      • Colorado Springs drug users data
      • Genealogy data
      • Small-world network samples
    • Limited Boundary data: select data within a limited bound
      • Cross-national trade data
      • Friendships within a classroom
      • Family support ties
Social Network Data Sources • Existing Sources of Social Network Data: • Complete network data: • Significantly less common and never perfect. • Start by defining a theoretically relevant boundary • Then identify all relations among nodes within that boundary • Co-sponsorship patterns among legislators • Friendships within strongly bounded settings (sororities, schools) • Examples: • Add Health on adolescent friendships • Hallinan data on within-school friendships • McFarland’s data on verbal interaction • Electronic data on citations or coauthorship (see Pajek data page) • See INSNA home page for many small-scale networks
Social Network Data Sources • Existing Sources of Social Network Data: • Complete network data: • Electronic Trace Data • Examples: • Sensor data (PNAS on high-school) • Cell Phone logs • Email logs • Bluetooth devices • Web traffic cookie data (which sites do you visit) • Often complete and non-intrusive; but meaning is still ambiguous and there are potential ethical issues that run deep.
Social Network Data Sources - Survey
• Network data collection can be time consuming. It is better (I think) to have breadth over depth. Having detailed information on <50% of the sample will make it very difficult to draw conclusions about the general network structure.
• Question format:
  a) If you ask people to recall names (an open list format), fatigue will result in under-reporting.
  b) If you ask people to check off names from a full list, you can often get over-reporting.
  c) It is common to limit people to a small number of nominations (~5). This will bias network measures, but is sometimes the best choice to avoid fatigue.
  d) People answer the question you ask, so be clear in what you ask.
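The bias from capping nominations is easy to see on a toy example. The sketch below is hypothetical (function names and data are made up for illustration) and only shows the effect on mean out-degree. Other measures would be affected in their own ways.

```python
def cap_nominations(nominations, limit=5):
    """Keep only the first `limit` names each respondent listed."""
    return {ego: names[:limit] for ego, names in nominations.items()}


def mean_out_degree(nominations):
    """Average number of names given per respondent."""
    if not nominations:
        return 0.0
    return sum(len(names) for names in nominations.values()) / len(nominations)


# Hypothetical responses: one respondent has 8 friends, another has 2.
full = {
    "r1": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "r2": ["a", "b"],
}
capped = cap_nominations(full, limit=5)
print(mean_out_degree(full), mean_out_degree(capped))
```

The cap only bites for respondents with many ties, so it compresses the top of the degree distribution rather than shifting everyone equally.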
Social Network Data Sources - Survey
• Local Network data:
  • When using a survey, it is common to use an “ego-network module.”
    • First part: “Name Generator” question to elicit a list of names
    • Second part: Working through the list of names to get information about each person named
    • Third part: Asking about relations among each person named
• GSS Name Generator:
  • “From time to time, most people discuss important matters with other people. Looking back over the last six months -- who are the people with whom you discussed matters important to you? Just tell me their first names or initials.”
• Why this question?
  • Only time for one question
  • Normative pressure and influence likely travel through strong ties
  • Similar to ‘best friend’ or other strong tie generators
• Note there are significant ambiguities with this name generator
Social Network Data Sources - Survey • Electronic Small World name generator:
Social Network Data Sources - Survey
• Local Network data:
  • The second part usually asks a series of questions about each person
  • GSS Example:
    • “Is (NAME) Asian, Black, Hispanic, White or something else?”
  • ESWP example: [figure]
  • This part adds N x (number of attributes) questions to the survey, where N is the number of names listed
Social Network Data Sources - Survey
• Local Network data:
  • The third part usually asks about relations among the alters. Do this by looping over all possible combinations. If you are asking about a symmetric relation, then you can limit your questions to the n(n-1)/2 cells of one triangle of the adjacency matrix.
  [Figure: 5 x 5 adjacency matrix with rows and columns labeled 1 to 5]
  • GSS: “Please think about the relations between the people you just mentioned. Some of them may be total strangers in the sense that they wouldn't recognize each other if they bumped into each other on the street. Others may be especially close, as close or closer to each other as they are to you. First, think about NAME 1 and NAME 2. A. Are NAME 1 and NAME 2 total strangers? B. Are they especially close? PROBE: As close or closer to each other as they are to you?”
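Looping over alter pairs can be sketched as follows. The function name and the `symmetric` flag are hypothetical. The question wording is borrowed from the GSS item only as a template. For a symmetric relation the loop covers one triangle of the adjacency matrix, n(n-1)/2 pairs. For an asymmetric one it covers all n(n-1) ordered pairs.

```python
from itertools import combinations, permutations


def alter_pair_questions(alters, symmetric=True):
    """Build one question per pair of alters named by ego.

    symmetric=True  -> unordered pairs, n(n-1)/2 questions
    symmetric=False -> ordered pairs,   n(n-1)   questions
    """
    pairs = combinations(alters, 2) if symmetric else permutations(alters, 2)
    return [f"Are {a} and {b} total strangers?" for a, b in pairs]


# Hypothetical list of five alters from a name generator.
names = ["NAME 1", "NAME 2", "NAME 3", "NAME 4", "NAME 5"]
questions = alter_pair_questions(names)
print(len(questions))
print(questions[0])
```

With five alters this is 10 extra questions per respondent, which is one reason name generators are often capped at about five names.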
Social Network Data Sources - Survey • Snowball Samples:
Social Network Data Sources - Survey • Complete Network data • Data collection is concerned with all relations within a specified boundary. • Requires sampling every actor in the population of interest (all kids in the class, all nations in the alliance system, etc.) • The network survey itself can be much shorter, because you are getting information from each person (so ego does not report on alters). • Two general formats: • Recall surveys (“Name all of your best friends”) • Check-list formats: Give people a list of names, have them check off those with whom they have relations.
Social Network Data Sources - Survey
• Complete network surveys require a process that lets you link answers to respondents.
  • You cannot have anonymous surveys.
  • Recall:
    • Need ID numbers and a roster to link, or hand-code names to find matches
  • Checklists:
    • Need a roster for people to check through
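Linking recalled names to a roster might look like the sketch below. Everything here is an assumption for illustration: the function names, the roster layout (ID mapped to name), and the matching rule (exact match after trimming whitespace and ignoring case). Names that match nobody, or more than one roster entry, are set aside for hand-coding rather than guessed.

```python
def normalize(name):
    """Collapse whitespace and ignore case so 'ann  LEE' matches 'Ann Lee'."""
    return " ".join(name.split()).casefold()


def link_nominations(roster, nominations):
    """Turn (ego_id, written name) pairs into an ID-based edge list.

    roster: dict mapping respondent ID -> name.
    nominations: list of (ego_id, name as written by the respondent).
    Returns (edges, unmatched): edges are (ego_id, alter_id) pairs;
    unmatched holds nominations with zero or multiple roster matches.
    """
    index = {}
    for person_id, name in roster.items():
        index.setdefault(normalize(name), []).append(person_id)

    edges, unmatched = [], []
    for ego_id, raw_name in nominations:
        matches = index.get(normalize(raw_name), [])
        if len(matches) == 1:
            edges.append((ego_id, matches[0]))
        else:
            unmatched.append((ego_id, raw_name))
    return edges, unmatched


# Hypothetical roster and recall answers.
roster = {101: "Ann Lee", 102: "Bo Chen", 103: "Cy Diaz", 104: "Bo Chen"}
answers = [(101, "cy diaz"), (103, " Ann  Lee "), (101, "Bo Chen"), (102, "Dee")]
edges, unmatched = link_nominations(roster, answers)
print(edges)
print(unmatched)
```

The unmatched list is a rough picture of the missing data this design produces: names off the roster, and names too ambiguous to link.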
Social Network Data Sources - Archive
• We often have information on links among people or organizations from archival records. Examples:
  • Citation or Acknowledgements in Science Networks
  • Co-membership in boards of directors
• See as examples: Olimoney.net or theyrule.org, both projects that use electronic tools to “scrape” the web for data on companies or campaign contributions.
  • http://dirtyenergymoney.com/view.php
  • http://www.theyrule.net/
Social Network Data Accuracy & Missing Data
Whatever method is used, data will always be incomplete. What are the implications for analysis?
Example 1. Ego is a matchable person in the school.
[Figure: True Network vs. Observed Network around Ego; nodes are labeled M, Un, and Out.]
Social Network Data Accuracy & Missing Data
Example 2. Ego is not on the school roster.
[Figure: True Network vs. Observed Network; nodes are labeled M and Un.]
Social Network Data Accuracy & Missing Data
Example 3. Node population: 2-step neighborhood of Actor X. Relational population: any connection among all nodes.
[Figure: block adjacency matrix over the focal actor (F) and the 1-step, 2-step, and 3-step contacts; each block is marked Full, Full (0), or Unknown (UK).]
Social Network Data Accuracy & Missing Data
Example 4. Node population: 2-step neighborhood of Actor X. Relational population: trace, plus all connections among 1-step contacts.
[Figure: block adjacency matrix over the focal actor (F) and the 1-step, 2-step, and 3-step contacts; each block is marked Full, Full (0), or Unknown (UK).]
Social Network Data Accuracy & Missing Data
Example 5. Node population: 2-step neighborhood of Actor X. Relational population: only tracing contacts.
[Figure: block adjacency matrix over the focal actor (F) and the 1-step, 2-step, and 3-step contacts; each block is marked Full, Full (0), or Unknown (UK).]
Social Network Data Accuracy & Missing Data
Example 6. Node population: 2-step neighborhood from 3 focal actors. Relational population: all relations among actors.
[Figure: block adjacency matrix over Focal, 1-Step, 2-Step, and 3-Step actors; each block is marked Full, Full (0), or Unknown (UK).]
Social Network Data Accuracy & Missing Data
Example 7. Node population: 1-step neighborhood from 3 focal actors. Relational population: only relations from focal nodes.
[Figure: block adjacency matrix over Focal, 1-Step, 2-Step, and 3-Step actors; each block is marked Full, Full (0), or Unknown (UK).]
Social Network Data Accuracy & Missing Data – Cloning Headless Frogs • Key Questions – an evaluation of the GSS name generator • What do people talk about? • Why do so many people not report talking about anything with anybody? • Given the heterogeneity of the topics discussed, is there a foundation from which one could use the GSS data to describe anything meaningful about core discussion networks? • Is there a pattern of topics to alters and how does this affect comparative analyses?
Social Network Data Accuracy & Missing Data – Cloning Headless Frogs • Key Questions: • What do people talk about? & Who did they talk to? Note that the topic was heavily dependent on the questionnaire order. In this survey, it was the first question.
Social Network Data Accuracy & Missing Data – Cloning Headless Frogs • Key Questions: • Why do so many people not report talking about anything with anybody?
Social Network Data Accuracy & Missing Data – Cloning Headless Frogs talks about what with who? Connections are significant cells from table 5.
Social Network Data Accuracy & Missing Data – Cloning Headless Frogs
Who talks about what?
[Figure panels: Females, Males]
Connections are large values from figure 1.
Social Network Data Accuracy & Missing Data – Cloning Headless Frogs • Why do so many people not report talking about anything with anybody? • 44% report nobody to talk to • More likely to be without spouses, unemployed and non-white • 56% report nothing important to talk about.
Social Network Data Accuracy & Missing Data – Cloning Headless Frogs
• How good is the name generator?
  • Bearman and Parigi ask what is being captured in the GSS name generator, which because of its placement in the GSS has become a standard question.
  • Others have done this, and found that the resulting list of names does not differ significantly (see Straits 2000).
  • Bearman & Parigi argue that to understand the network, you need to understand what it is people are really talking about.
  • The basic assumption of the GSS question is that people talk about important matters to people who are important to them.
Social Network Data Accuracy & Missing Data – Cloning Headless Frogs
• The end result suggests using questions that are linked directly to conversation domains of substantive interest.
• Or, more generally, defining relationships that are of importance for your topic of study.
Social Network Data Ethics – Data Collection
Goal: To gain key social insights in a manner that helps without hurting.
- Responsibility to respondents
  - Fair, honest, safe treatment in return for participation.
  - Safety has typically relied on some combination of:
    a) Informed consent (they know what they are in for)
    b) Anonymity / Confidentiality
- Responsibility to other network scientists
  - Our work should not jeopardize others’ ability to work
Key problems:
- Need to link respondents means anonymity cannot be complete
- Some people named may not be respondents, and thus have not given consent
- Position within the network may create social hardship
Social Network Data Ethics – Data Collection Informed Consent - Respondents have a right to refuse to participate if they feel the work is unethical, burdensome, dangerous, or just plain don’t want to play. What standing do “secondary” respondents have? What dangers? - Imagine link-tracing from a mistress to a spouse….
Social Network Data Ethics – Data Collection
Anonymity & Confidentiality
Can work without names (in an ego-network survey). But if the setting is highly clustered, you may still be able to identify respondents through “deductive disclosure.”
Social Network Data Ethics – Data Collection
Deductive Disclosure Risks:
Start with: 536 White, Male, 10th Graders in Two-parent Households
  Who are Jewish: 10
    And have no siblings: 1
Start with: 484 White, Male, 7th Graders in Two-parent Households
  Who have ever been held back a grade in school: 87
    And play basketball: 5
      And smoke: 1
Social Network Data Ethics – Data Collection
Deductive Disclosure Risks:
Start with: 87 Black, Female, 12th Graders in Two-parent Households
  Who have never been held back: 77
    And smoke regularly: 5
      And have 2 siblings: 1
        And are Catholic: 1
Social Network Data Ethics – Data Collection
Deductive Disclosure Risks:
Start with: 98 Black, Female, 7th Graders in One-parent Households
  Who are Baptist: 41
    And have no siblings: 9
      And play basketball: 1
    And have one sibling: 13
      And smoke: 1
    And have > one sibling: 19
      And are born in April: 1
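A simple way to screen a data file for this risk is to count how many respondents share each combination of attributes and flag the combinations held by very few people. The sketch below is illustrative only: `small_cells`, the threshold `k`, and the records are hypothetical, and none of the numbers come from the slides.

```python
from collections import Counter


def small_cells(records, attributes, k=2):
    """Return attribute combinations shared by fewer than k respondents.

    records: list of dicts, one per respondent.
    attributes: names of the fields an outsider might plausibly know.
    A combination with a count of 1 identifies a single person.
    """
    counts = Counter(tuple(r[a] for a in attributes) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical respondent records.
students = [
    {"grade": 7, "sex": "F", "smokes": False},
    {"grade": 7, "sex": "F", "smokes": False},
    {"grade": 7, "sex": "F", "smokes": True},
    {"grade": 10, "sex": "M", "smokes": False},
    {"grade": 10, "sex": "M", "smokes": False},
]
print(small_cells(students, ["grade", "sex"]))
print(small_cells(students, ["grade", "sex", "smokes"]))
```

As in the slides, each added attribute shrinks the cells: grade and sex alone isolate no one in this toy file, but adding smoking status singles out one student.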
Deductive Disclosure Risks: Social Network Data Ethics – Data Collection This same feature can allow for anonymous interviewing of sensitive partners: use simple information to find implicit matches
Social Network Data Ethics – Data Collection
Other data collection issues:
-- Building a roster? You may need people’s permission to be on the roster.
-- Binding limits of past research promises (if the interviewer knows, does that violate a “we will not tell anyone your answer” clause?)
-- Illicit or illegal relations. If you find evidence of a crime in your snowball, do you have to report it? (Think age of sex partners vs. selling/buying drugs.)
-- “Non-invasive” data collection: bluetooth device readers, email logs, web-click or purchase / marketing data?
Social Network Data Ethics – Data use
How the data are ultimately used is a key issue for the analyst. Consider this diagram. What role does the social network analyst have in deconstructing terrorist networks? Criminal cartels? Gangs? Covert connectivity is a hallmark of many illicit relations; how do we fit into that work?