Network Data Collection

Explore the process of collecting social network data, including research design, boundary specification, and the impact of data accuracy. Discover how various social mechanisms and relations play a role in health outcomes and other areas of interest.

alicel

Presentation Transcript


  1. Network Data Collection

  2. Social Network Data Outline
     • Collecting Data
       A. Research Design
          • Relational Content
          • Boundary Specification
          • Network Samples: Local, Global, Link-Tracing designs
       B. Sources
          • Archive, Observation, Survey
          • Survey: a. Name Generators, b. Delivery Mode
     • Data Accuracy
       • How accurate are network survey data?
       • Effect on measurement
       • What can we do about inaccurate or missing data?

  3. Social Network Data Research Design: new data collection
     What information do you want to collect? This is ultimately a theory question about how you think the social network matters and which social or biological mechanisms matter for the outcome of interest. This is driven by thinking through: Health Outcome → Mechanism → Relation(s)
     Examples. Sometimes the relations are clear:
     • STD/HIV → Contagion-carrying contact → Sex, drug sharing, etc.
     Sometimes not so much:
     • Health Behavior → Information flow → Discussion networks
     • Health Behavior → Social conformity pressure → Admiration nets
     • Health Behavior → Opportunities → Unsupervised interaction

  4. Social Network Data Research Design: new data collection
     What information do you want to collect? Sometimes the outcome is deliberately unspecified, as when you are collecting data for large common-use projects (GSS, Add Health, NHRS). Then the design is effectively reversed: which relations capture the most (general? comprehensive? efficacious? reliable?) social mechanisms that will be of broad interest?
     [Diagram: Relation(s) linked through mechanisms (Contact, Excitement, Respect, Pressure, Information) to outcomes (Disease, Suicidal Ideation, Substance Use, BMI, Treatment adherence).]
     Social mechanism ambiguity allows broad use, which favors relations that tend to be general. This, of course, makes crisp causal associations more difficult.

  5. Social Network Data Research Design: new data collection
     What information do you want to collect? Health Outcome → Mechanism → Relation(s)
     Relations themselves are often multi-dimensional…do these matter for your question?
     • Perception vs. interaction? “Who do you like?” vs. “Who do you talk with?”
     • Intensity? “How often…”, “how much…”, strong vs. weak
     • Dynamics? Starting & ending dates, everyday contact or sporadic?

  6. Social Network Data Research Design: Boundary Specification
     What is the theoretically relevant population?
     Network methods describe positions in relevant social fields, where flows of particular goods are of interest. As such, boundaries are a fundamentally theoretical question about what you think matters in the setting of interest. In general, there are usually relevant social foci that bound the relevant social field. We expect that social relations will be very clumpy. Consider the example of friendship ties within and between a high school and a junior high:

  7. Social Network Data Research Design: Boundary Specification
     What is the theoretically relevant population? Networks are (generally) treated as bounded systems. What constitutes your bound?
     “Realist” (boundary from the actors’ point of view):
     • Local: Everyone connected to ego in the relevant manner (all friends, all sex partners)
     • Global: All relations relevant to social action (“adolescent peer network” or “Community Health Leaders”)
     “Nominalist” (boundary from the researchers’ point of view):
     • Local: Relations defined by a name generator, typically limited in number (“5 closest friends”)
     • Global: Relations within a particular setting (“School friends” or “Physicians serving this hospital”)
     Most of the time…these boundaries are porous.

  8. Social Network Data Research Design: Boundary Specification
     What is the theoretically relevant population?
     Add Health: while students were given the option to name friends in the other school, they rarely did. As such, the school likely serves as a strong substantive boundary.

  9. Social Network Data Research Design: Boundary Specification
     Boundaries are often defined theoretically by the relation, not the setting.
     Physician patient-sharing networks:
     • Physicians who share (Medicare) patients (within one hospital)
     • For all patients selected in Ohio….

  10. Social Network Data Research Design: Boundary Specification
      In practice:
      • Set a pragmatic bound that captures the bulk of theoretically relevant data.
      • Collect data on boundary crossing.
        • You might ask “friends in this neighborhood” but also “Other close friends?”
        • Don’t limit nominations to the current setting, but only trace within the bounds.
      • Good prior research, ethnography, informants, etc. should be used to identify the bounds as well as possible, but these sorts of data allow one to at least control for out-of-sample effects in models.
      • For adaptive sampling, such as link-trace designs, you might use a capture/recapture rule to figure out whether you’ve saturated your population. Once you stop receiving new names…you’ve finished.
        • But if you jump to a new population…this can be hard to discern.

  11. Social Network Data Research Design: Network Sample
      The level of analysis implies a perspective on sampling:
      • Local → random probability sampling
      • Adaptive → link trace, RDS
      • Complete → census
      These are not as dissimilar as they may appear. Local nets imply global connectivity:
      • Every ego-network is a sample from the population-level global network, and thus should be consistent with a constrained range of global networks.
      • If you have a clustered setting, many alters in a local network may overlap, making partial connectivity information possible.
      • For attribute mixing (proportion of whites with black friends, low BMI with high, users with non-users, etc.), ego-network data are sufficient to draw global inference.
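
The attribute-mixing point can be illustrated with a minimal sketch. It assumes an equal-probability sample of egos and uses made-up toy records; the function name `mixing_proportions` and the data layout are hypothetical, not taken from the slides.

```python
from collections import Counter

def mixing_proportions(ego_records):
    """Estimate P(alter attribute | ego attribute) from ego-network reports.

    ego_records: list of (ego_attr, [alter_attr, ...]) tuples, one per respondent.
    Assumes an unweighted, equal-probability sample of egos.
    """
    pair_counts = Counter()
    ego_totals = Counter()
    for ego_attr, alter_attrs in ego_records:
        for alter_attr in alter_attrs:
            pair_counts[(ego_attr, alter_attr)] += 1
            ego_totals[ego_attr] += 1
    return {pair: n / ego_totals[pair[0]] for pair, n in pair_counts.items()}

# Hypothetical toy data: ego's status and the statuses ego reports for named friends.
records = [
    ("user", ["user", "user", "non-user"]),
    ("non-user", ["non-user", "non-user"]),
    ("non-user", ["user", "non-user", "non-user"]),
]
mix = mixing_proportions(records)
```

Here `mix[("non-user", "user")]` is the share of friendship nominations made by non-users that go to users. A real analysis would add survey weights and standard errors.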

  12. Social Network Data Research Design: Network Sample Data collection strategy (The column distinction is squishy…)

  13. Social Network Data Research Design: Network Sample
      1. Ego Network Sampling (analysis will be covered in a separate session)
      • Most similar to a standard social survey:
        • Easily sampled (as any other survey implementation)
        • All information comes from the respondent, so very subject to personal projection.
      • Ask ego to report on characteristics of alters.
        • For k alters and q attributes → adding kq questions
        • e.g., 5 friends with 10 behaviors adds 50 questions to the survey!
      • Ask ego to report on relations amongst alters.
        • For k alters and j relational features → j(k(k-1)/2) questions
        • e.g., 5 friends and 2 relational questions add 20 questions: 2*((5*4)/2)
      [Figure: ego network with the Respondent and Alters 1 to 4.]
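
The question-burden arithmetic on this slide can be checked with a short sketch. The function names are hypothetical; the formulas are the kq and j(k(k-1)/2) counts given above.

```python
def alter_attribute_questions(k, q):
    """Questions added when ego reports q attributes for each of k alters."""
    return k * q

def alter_tie_questions(k, j):
    """Questions added when ego reports j symmetric relations for every pair of k alters."""
    return j * (k * (k - 1) // 2)

attribute_burden = alter_attribute_questions(5, 10)  # 5 friends, 10 behaviors
tie_burden = alter_tie_questions(5, 2)               # 5 friends, 2 relational questions
```

Because the alter-alter count grows with k squared, raising the number of alters inflates survey length much faster than adding attributes does.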

  14. Social Network Data Research Design: Network Sample
      Snowball and “link trace” designs
      [Figure labels: Link-Tracing Designs, Ego-networks, Complete Census.]
      The basic idea is to use “adaptive sampling”: start with one or more seed nodes, identify their network partners, and then interview them. The earliest “snowball” samples are of this type. The most recent work is “respondent-driven sampling” (RDS).
      If done systematically, some inference elements are knowable. Otherwise, you have to try to disentangle the sampling process from the real structure.
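
A minimal sketch of the basic link-tracing idea, combined with the "stop when no new names arrive" saturation rule mentioned earlier in the deck. The function `link_trace`, the `get_partners` callback (standing in for an interview) and the toy network are all hypothetical. This is plain snowball tracing, not RDS, which adds recruitment quotas and estimation weights.

```python
def link_trace(seeds, get_partners, max_waves=None):
    """Wave-by-wave link tracing from one or more seed nodes.

    get_partners(node) stands in for interviewing a respondent and returns
    the names that respondent gives. Tracing stops when a wave yields no new
    names (saturation) or when max_waves is reached.
    Returns (nodes in discovery order, (respondent, partner) edges, waves run).
    """
    seen = list(dict.fromkeys(seeds))  # de-duplicate seeds, keep order
    seen_set = set(seen)
    edges = []
    current = list(seen)
    waves = 0
    while current and (max_waves is None or waves < max_waves):
        next_wave = []
        for respondent in current:
            for partner in get_partners(respondent):
                edges.append((respondent, partner))
                if partner not in seen_set:
                    seen_set.add(partner)
                    seen.append(partner)
                    next_wave.append(partner)
        current = next_wave
        waves += 1
    return seen, edges, waves

# Hypothetical toy population with two disconnected groups.
true_ties = {
    "a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"],
    "x": ["y"], "y": ["x"],
}
nodes, edges, waves = link_trace(["a"], lambda n: true_ties.get(n, []))
```

In the toy data the trace saturates without ever reaching `x` or `y`. No new names means the reachable component is exhausted, not that the whole population has been covered.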

  15. Social Network Data Research Design: Network Sample
      Global network samples: Population Census
      • The key issue is to enumerate the population & collect relational information on all.
      • If dynamic, this can make implementation difficult.
      • Tends to force case-study style designs (highly clustered settings)
        • Contrast N of networks with N of respondents
      • Because behavior is self-reported (rather than alter-reported), adding network questions to a census-based survey is low cost.
      • If you are doing a census anyway…then it is good to add network questions. Prosper Peers followed this strategy.

  16. Social Network Data Network Data Sources: Secondary & archival data
      • Extant direct network data
        • National Health and Social Life Survey
        • Americans’ Changing Lives Study
        • Add Health
        • Prosper Peers
      • Archival sources
        • Most common is two-mode data: records of people in groups or shared activity
        • Examples: Electronic Health Records, hospital transfer records, admission records, group membership, collaboration
      The key issue with any secondary or archival data is that you have to take what you can get…
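
Two-mode records are typically turned into person-to-person ties by projecting on shared membership. The sketch below is one simple way to do that, assuming a dictionary of group rosters; `project_two_mode` and the ward and physician labels are hypothetical illustrations.

```python
from collections import Counter
from itertools import combinations

def project_two_mode(memberships):
    """Project group-membership records onto person-person ties.

    memberships: dict mapping a group (or event, ward, patient) to the people
    recorded in it. Returns a Counter mapping each unordered pair of people
    to the number of groups they share.
    """
    shared = Counter()
    for members in memberships.values():
        for a, b in combinations(sorted(set(members)), 2):
            shared[(a, b)] += 1
    return shared

# Hypothetical archival records.
records = {
    "ward_1": ["dr_a", "dr_b", "dr_c"],
    "ward_2": ["dr_b", "dr_c"],
    "ward_3": ["dr_d"],
}
ties = project_two_mode(records)
```

People who appear alone in every record, such as `dr_d` here, get no ties and need to be carried along separately as isolates.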

  17. Social Network Data Network Data Sources: survey data
      Survey Elements
      • Informed consent
        • It is important to let people know that their identities matter: network data are confidential but (at least in the construction) not anonymous.
      • Name Generator Questions
        • General term for the question about the relation you are trying to tap.
        • Many extant name generators out there…most evidence suggests that people are very sensitive to the questions asked.
        • If you ask multiple relations, be clear whether it is OK to repeat names!
      • Response Format
        • Open list → number of lines suggests the “right” answer
        • Check off/select → very simple on/off, might result in over-estimates
        • Limit choice → limiting choice limits degree, which affects *every* network statistic.
        • Rank/Rate → asking people to rank each other is difficult (and can backfire!)
      • If multiple name generators: grid or separate questions?

  18. Social Network Data Network Data Sources: survey data
      If you use surveys to collect data, some general rules of thumb:
      • Network data collection can be time consuming. If interests are in network-level structure effects, it is better to have breadth over depth. Having detailed information on <50% of the sample will make it very difficult to draw conclusions about the general network structure. If interest is in detailed interpersonal information (social support, for example), detailed information on one or two key ties might be more important. Survey time is the crucial resource: there is never enough to ask everything you want.
      • Question format:
        a) If you ask people to recall names (an open list format), fatigue will result in under-reporting.
        b) If you ask people to check off names from a full list, you can often get over-reporting.
        c) It is common to limit people to ~5 nominations. This will bias network stats for stars, but is sometimes the best choice to avoid fatigue.
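
To see how a nomination cap truncates degree, here is a small sketch on made-up data. The helper names and the nomination lists are hypothetical, and the sketch only looks at out-degree.

```python
def cap_nominations(nominations, limit=5):
    """Keep only the first `limit` names each respondent listed."""
    return {ego: names[:limit] for ego, names in nominations.items()}

def mean_out_degree(nominations):
    """Average number of names given per respondent."""
    return sum(len(names) for names in nominations.values()) / len(nominations)

# Hypothetical uncapped nominations: one "star" names eight friends.
uncapped = {
    "star": ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"],
    "b": ["p1", "star"],
    "c": ["star", "b", "p2"],
}
capped = cap_nominations(uncapped, limit=5)
true_mean = mean_out_degree(uncapped)
observed_mean = mean_out_degree(capped)
```

Only the high-degree respondent loses ties under the cap, so the error is concentrated in exactly the nodes that structural measures tend to depend on most.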

  19. Social Network Data Network Data Sources: survey data
      Local Network data:
      • When using a survey, it is common to use an “ego-network module.”
        • First part: “Name Generator” question to elicit a list of names
        • Second part: working through the list of names to get information about each person named
        • Third part: asking about relations among the people named
      GSS Name Generator: “From time to time, most people discuss important matters with other people. Looking back over the last six months -- who are the people with whom you discussed matters important to you? Just tell me their first names or initials.”
      Why this question?
      • Only time for one question
      • Normative pressure and influence likely travel through strong ties
      • Similar to ‘best friend’ or other strong tie generators
      • Note there are significant substantive problems with this name generator

  20. Social Network Data Network Data Sources: survey data
      Local Network data: The third part usually asks about relations among the alters. Do this by looping over all possible combinations. If you are asking about a symmetric relation, then you can limit your questions to the n(n-1)/2 cells of one triangle of the adjacency matrix.
      [Figure: 5 × 5 adjacency matrix with rows and columns labeled 1 to 5.]
      GSS: Please think about the relations between the people you just mentioned. Some of them may be total strangers in the sense that they wouldn't recognize each other if they bumped into each other on the street. Others may be especially close, as close or closer to each other as they are to you. First, think about NAME 1 and NAME 2. A. Are NAME 1 and NAME 2 total strangers? B. Are they especially close? PROBE: As close or closer to each other as they are to you?
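
Looping over one triangle of the adjacency matrix amounts to enumerating unordered pairs of alters. A minimal sketch, assuming a symmetric relation; the function name and the prompt wording are hypothetical paraphrases, not the GSS instrument itself.

```python
from itertools import combinations

def alter_pair_prompts(alters):
    """Build one prompt per unordered pair of alters (n(n-1)/2 in total)."""
    return [
        (a, b, f"Think about {a} and {b}. Are they total strangers? Are they especially close?")
        for a, b in combinations(alters, 2)
    ]

# Hypothetical list of names produced by the name generator.
names = ["NAME 1", "NAME 2", "NAME 3", "NAME 4", "NAME 5"]
prompts = alter_pair_prompts(names)
```

For an asymmetric relation (for example "who goes to whom for advice") both orderings would be needed, which `itertools.permutations(alters, 2)` provides.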

  21. Social Network Data Network Data Sources: survey data Local Network data: The third part usually asks about relations among the alters. Do this by looping over all possible combinations. If you are asking about a symmetric relation, then you can limit your questions to the n(n-1)/2 cells of one triangle of the adjacency matrix:

  22. Social Network Data Network Data Sources: survey data
      Complete network surveys require a process that lets you link answers to respondents.
      • You cannot have anonymous surveys.
      • Recall format:
        • Need ID numbers & a roster to link, or hand-code names to find matches
      • Checklists:
        • Need a roster for people to check through (1994)

  23. Social Network Data Network Data Sources: survey data
      Complete network surveys require a process that lets you link answers to respondents.
      Typically you have a number of data tradeoffs:
      • Limited number of responses.
        • Eases survey construction & coding, but lowers density & degree, which affects nearly every other system-level measure.
        • Evidence that people try to fill all of the slots.
      • Name check-off roster (names down a row or on screen, relations as check-boxes).
        • Easy in small settings or CADI, but encourages over-response.
        • The “Amy Willis” Problem.
      • Open recall list.
        • Very difficult cognitively, requires an extra name-matching step in analysis.
        • Still have to give slots in pen & paper, can be dynamic on-line.
      Think carefully about what you want to learn from your survey items.

  24. Social Network Data Network Data Sources: survey data Check off or Open Ended? Open-ended formats require more of respondents…subject to fatigue & size suggestion.

  25. Social Network Data Network Data Sources: survey data Check off or Open Ended? Check off is simpler – particularly if yes/no – but also subject to over-response.

  26. Social Network Data Network Data Sources: survey data
      Ask respondent for yes/no decisions or quantitative assessment?
      • Yes/no are cognitively easier (therefore reliable, believable)
      • Yes/no are *much* faster to administer
      • But yes/no provides no discrimination among levels; ratings provide more nuance
      • A series of binaries can replace one quantitative rating. Instead of “How often do you see each person?” (1 = once a year; 2 = once a month; 3 = once a week; etc.), use three questions (in this order):
        • Who do you see at least once a year?
        • Who do you see at least once a month?
        • Who do you see at least once a week?
      Slide from Steve Borgatti: http://www.analytictech.com/mgt780/slides/survey.pdf
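
The three nested yes/no questions can be collapsed back into the ordinal scale at analysis time. A minimal sketch, assuming the answers should be nested (weekly implies monthly implies yearly); the function name and the choice to raise an error on inconsistent answers are assumptions.

```python
def frequency_rating(yearly, monthly, weekly):
    """Collapse three nested yes/no answers into one ordinal score.

    0 = less than once a year, 1 = at least once a year,
    2 = at least once a month, 3 = at least once a week.
    Raises ValueError when the answers are not nested.
    """
    if (weekly and not monthly) or (monthly and not yearly):
        raise ValueError("inconsistent answers: a higher frequency implies the lower ones")
    return int(yearly) + int(monthly) + int(weekly)

# Hypothetical respondent who sees a contact at least monthly but not weekly.
rating = frequency_rating(yearly=True, monthly=True, weekly=False)
```

In practice inconsistent answers might be flagged for review rather than rejected outright.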

  27. Social Network Data Network Data Sources: survey data
      • Absolute: “How often do you talk to _____, on average?”
        • Need to do pre-testing to determine appropriate time scale
        • Danger of getting no variance
        • Assumes a lot of respondents
      • Relative: “How often do you speak to each person on the list below?” Very infrequently, Somewhat infrequently, About average, Somewhat frequently, Very frequently
        • Assumes less of respondents; easier task
        • Is automatically normalized within respondent
        • Makes it harder to compare values across respondents
      Slide from Steve Borgatti: http://www.analytictech.com/mgt780/slides/survey.pdf

  28. Social Network Data Network Data Sources: survey data
      Survey Mode
      Lots of ongoing research on best practices. Focus on clear design and careful wording. Pretest as much as you can afford.
      A key advantage of electronic surveys is data processing on the back end: even with open-ended questions, there is no data entry.
      See: https://www.une.edu/sites/default/files/Microsoft-Word-Guiding-Principles-for-Mail-and-Internet-Surveys_8-3.pdf

  29. Social Network Data Data Accuracy: Survey induced error
      How reliable are network data? In a well-known series of studies, BKS compare recall of communication with records of communication, and recall doesn’t do well…
      • Killworth, P. D., Bernard, H. R. 1976. Informant accuracy in social network data. Hum. Organ. 35:269-86
      • Bernard, H. R., Killworth, P. D. 1977. Informant accuracy in social network data, II. Hum. Commun. Res. 4:3-18
      • Killworth, P. D., Bernard, H. R. 1979. Informant accuracy in social network data, III. A comparison of triadic structures in behavioral and cognitive data. Soc. Networks 2:19-46
      • Bernard, H. R., Killworth, P. D., Sailer, L. 1980. Informant accuracy in social network data, IV. A comparison of clique-level structure in behavioral and cognitive data. Soc. Networks 2:191-218
      • Bernard, H., Killworth, P., Sailer, L. 1982. Informant accuracy in social network data, V. Social Science Research 11:30-66
      • The Problem of Informant Accuracy: The Validity of Retrospective Data. Annual Review of Anthropology 13:495-517 (volume publication date October 1984). DOI: 10.1146/annurev.an.13.100184.002431

  30. Social Network Data Data Accuracy: Survey induced error
      How reliable are network data? The BKS studies sparked a large body of work on network survey reliability, and the results are mixed. Some general features:
      • Important relations are recalled
      • People bias toward “common” activities…that are relationally salient
      • Behavior reports are more consistent than attitude reports
      • Strong survey, interviewer or instrument effects

  31. Social Network Data Data Accuracy: Survey induced error How reliable are network data?

  32. Social Network Data Data Accuracy: Survey induced error How reliable are network data? Assessing accuracy is difficult, because respondents report on relations over the last 6 months (or year, depending on type), but may be interviewed at different times.

  33. Social Network Data Data Accuracy: Survey induced error How reliable are network data? Once we account for observation windows and question length, we find very high concordance on dates of relations.

  34. Social Network Data Data Accuracy: Survey induced error How reliable are network data? For ego-level ties that were not timed, we can ask if a t1 nomination is retained: If I “ever did drugs” with you at t1, then I should also have reported doing so at future data collections. Very few relations are “recanted” (4.7% sex, 13.6% drug, 3% social).
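
The retention check described on this slide reduces to a set comparison between waves. A minimal sketch on made-up ties; `recant_rate` and the tie labels are hypothetical, and the toy numbers are not the study's.

```python
def recant_rate(t1_ties, t2_ties):
    """Share of t1 'ever' nominations that are not reported again at t2.

    Each tie is an (ego, partner) pair. For an 'ever' question, every t1 tie
    should reappear at later waves, so a missing tie counts as recanted.
    """
    t1 = set(t1_ties)
    if not t1:
        raise ValueError("no t1 ties to evaluate")
    recanted = t1 - set(t2_ties)
    return len(recanted) / len(t1)

# Hypothetical reports from two waves.
wave1 = [("ego1", "p1"), ("ego1", "p2"), ("ego2", "p3"), ("ego2", "p4")]
wave2 = [("ego1", "p1"), ("ego1", "p2"), ("ego2", "p3"), ("ego2", "p9")]
rate = recant_rate(wave1, wave2)
```

New ties reported only at the second wave, such as `("ego2", "p9")`, do not count against retention.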

  35. Social Network Data Data Accuracy: Survey induced error
      How reliable are network data? Proportion of times a “matrix” tie is corroborated by a direct response?
      [Figure: diagrams labeled “Given:” and “How often:” showing Ego, A and B.]

  36. Social Network Data Data Accuracy: Survey induced error
      How reliable are network data? Proportion of times a “matrix” tie is corroborated by a direct response?
      [Figure: diagrams labeled “Given:” and “How often:” showing Ego, A and B.]

  37. Social Network Data Data Accuracy: Survey induced error
      How reliable are network data? Why are the Colorado Springs data so much more reliable than the BKS data?
      • Very dedicated data collectors
      • No nomination limits on self-reports
      • Highly salient relations in a small community

  38. Social Network Data Data Accuracy: Survey induced error
      • Interviewer effects
        • Systematic variation in responses by interviewer (Paik and Sanchagrin, 2013; Marsden, 2003)
      • Design of the survey instrument (Lozar, Vehovar and Hlebec, 2004)
      • Panel conditioning (Lazarsfeld, 1940; Warren and Halpern-Manners, 2012)
      • Rise of panels for basic social research (Keeter et al., 2015)
      • Survey memory is short (Groves, 1986)

  39. Social Network Data Data Accuracy: Survey induced error

  40. Social Network Data Data Accuracy: Survey induced error
      [Figure: probability that a respondent names 5 confidants. Source: Clergy Health Panel Survey 2008.]

  41. Social Network Data Data Accuracy: Survey induced error

  42. Social Network Data Effects of missing data
      Whatever method is used, data will always be incomplete. What are the implications for analysis?
      Example 1. Ego is a matchable person in the school.
      [Figure: True Network and Observed Network panels, with nodes labeled Ego, M, Un and Out.]

  43. Social Network Data Effects of missing data
      Example 2. Ego is not on the school roster.
      [Figure: True Network and Observed Network panels, with nodes labeled M and Un.]

  44. Social Network Data Effects of missing data
      Example 3. Node population: 2-step neighborhood of Actor X. Relational population: any connection among all nodes.
      [Figure: block matrix over focal (F), 1-step, 2-step and 3-step nodes, with blocks marked Full, Full (0), Unknown or UK.]

  45. Social Network Data Effects of missing data
      Example 4. Node population: 2-step neighborhood of Actor X. Relational population: trace, plus all connections among 1-step contacts.
      [Figure: block matrix over focal (F), 1-step, 2-step and 3-step nodes, with blocks marked Full, Full (0), Unknown or UK.]

  46. Social Network Data Effects of missing data
      Example 5. Node population: 2-step neighborhood of Actor X. Relational population: only tracing contacts.
      [Figure: block matrix over focal (F), 1-step, 2-step and 3-step nodes, with blocks marked Full, Full (0), Unknown or UK.]

  47. Social Network Data Effects of missing data
      Example 6. Node population: 2-step neighborhood from 3 focal actors. Relational population: all relations among actors.
      [Figure: block matrix over Focal, 1-Step, 2-Step and 3-Step nodes, with blocks marked Full, Full (0), Unknown or UK.]

  48. Social Network Data Effects of missing data
      Example 7. Node population: 1-step neighborhood from 3 focal actors. Relational population: only relations from focal nodes.
      [Figure: block matrix over Focal, 1-Step, 2-Step and 3-Step nodes, with blocks marked Full, Full (0), Unknown or UK.]

  49. Social Network Data Effects of missing data on measures
      Smith & Moody, 2014; Smith, Morgan & Moody, 2016
      Identify the practical effect of missing data as a measurement error problem: induce error and evaluate the effect. Randomly select nodes to delete, remove their edges & recalculate statistics of interest.
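
The delete-and-recalculate procedure can be sketched in a few lines. This is an illustration of the general idea under simple assumptions (an undirected edge list, nodes missing completely at random, mean degree as the statistic), not the cited authors' code; all function names and the ring network are hypothetical.

```python
import random

def mean_degree(nodes, edges):
    """Average degree of an undirected network given as node and edge sets."""
    return 2 * len(edges) / len(nodes) if nodes else 0.0

def delete_random_nodes(nodes, edges, share_missing, rng):
    """Drop a random share of nodes together with all edges touching them."""
    n_drop = round(share_missing * len(nodes))
    dropped = set(rng.sample(sorted(nodes), n_drop))
    kept_nodes = set(nodes) - dropped
    kept_edges = {(a, b) for a, b in edges if a in kept_nodes and b in kept_nodes}
    return kept_nodes, kept_edges

def missing_data_bias(nodes, edges, share_missing, statistic, n_reps=200, seed=0):
    """Mean of the statistic over repeated random deletions, minus its true value."""
    rng = random.Random(seed)
    true_value = statistic(nodes, edges)
    values = [
        statistic(*delete_random_nodes(nodes, edges, share_missing, rng))
        for _ in range(n_reps)
    ]
    return sum(values) / n_reps - true_value

# Hypothetical toy network: a ring of 10 nodes, so every node has degree 2.
ring_nodes = set(range(10))
ring_edges = {(i, (i + 1) % 10) for i in range(10)}
bias = missing_data_bias(ring_nodes, ring_edges, 0.3, mean_degree)
```

Swapping `mean_degree` for another function of `(nodes, edges)` reuses the same loop for other measures. Deleting nodes non-randomly (for example, by degree) would need a different `delete_random_nodes`.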

  50. Social Network Data Effects of missing data on measures Smith & Moody, 2014
