Measuring Things
Measurement In this Slide Show Measurement: • From Concept to Variable • Coding • Levels of Measurement • Validity and Reliability
Measurement A: FROM CONCEPT TO VARIABLE
Measurement To speak responsibly about our world, we need ways to document and measure what we believe is out there—concepts. “Variable” is the term that we use to denote a measured concept. A variable has more than one value or category. (A constant has only one value or category and is of no use.) In social sciences, measurement is sometimes complex. How should a researcher define poverty? Income level Nourishment Living conditions Property There is an official definition in the US. Should we use it?
Measurement To do research on a concept such as poverty, one needs a definition of poverty that one then uses consistently and communicates to consumers of the research. One must also devise strategies for measuring a concept like poverty. Using official guidelines, one would need to measure income and family size to classify people. “Operationalization” is the term used to denote the ways one measures concepts to form variables.
Measurement An example of operationalization for poverty: ask respondents to report, via survey questions, their family's: • Income: “Before taxes, what is the amount of money that you make from all sources?” • Size: “What is the number of persons in your household, including only yourself, spouse, and all dependents?” Then classify the answers according to the US Poverty Guidelines. Poverty: Yes (under threshold) or No (over threshold)
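The classification step can be sketched in code. This is only an illustration: the threshold values are made-up placeholders, not the actual US Poverty Guidelines, and the names `HYPOTHETICAL_THRESHOLDS` and `classify_poverty` are assumptions introduced here.

```python
# Placeholder thresholds by household size. These are NOT the real
# US Poverty Guidelines; substitute the official figures for the study year.
HYPOTHETICAL_THRESHOLDS = {1: 15000, 2: 20000, 3: 25000, 4: 30000}


def classify_poverty(income, household_size, thresholds=HYPOTHETICAL_THRESHOLDS):
    """Return "Yes" if income is under the threshold for that household size, else "No"."""
    if household_size not in thresholds:
        raise ValueError(f"No threshold defined for household size {household_size}")
    return "Yes" if income < thresholds[household_size] else "No"


# Two hypothetical survey respondents (pre-tax income, household size).
respondents = [
    {"income": 12000, "size": 1},
    {"income": 28000, "size": 3},
]
poverty = [classify_poverty(r["income"], r["size"]) for r in respondents]
print(poverty)  # ['Yes', 'No']
```

Two measured answers (income and household size) are combined into a single two-category variable, which is the operationalized form of the concept.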
Measurement “Operationalization” can be done in many ways. For example: • One can make observations. • One can use official information. • One can ask questions of respondents. Social scientists most commonly ask questions of people via survey.
Measurement If one is going to collect data using questions, one must learn to construct good questions. We typically err on the side of using others’ questions that have already been constructed well and used to measure concepts. (Constructing questions will be more fully addressed later.) At this point, we are going to discuss types of measurement and how to determine whether questions are measuring concepts well.
Measurement There are Two General Categories of Questions: (1) Open-ended questions allow respondents to write in their answers, without response options • Preferable if full range of responses cannot be anticipated • Useful for exploratory research • Better for smaller samples/qualitative research Example: Reflect on the type of television programming you prefer to watch. What do you enjoy most about that programming?
Measurement There are Two General Categories of Questions: (2) Closed-ended questions offer respondents a limited set of response options that should be mutually exclusive and exhaustive (unless “check all that apply” is useful) • Easy to process and quantify, efficient • Much thought must go into constructing each question • May obscure what people really think • Better for larger samples/quantitative research Bad Example? Which type of television program do you enjoy the most? a. Drama b. Comedy c. Romance d. Talk e. News Question Response Options
Measurement Closed-ended questions offer respondents a limited set of response options that should be mutually exclusive and exhaustive (unless “check all that apply” is useful) • Easy to process and quantify, efficient • Much thought must go into constructing each question • May obscure what people really think • Better for larger samples/quantitative research Better Example: Which type of television program do you enjoy the most? a. Drama b. Comedy c. Romance d. Talk e. News f. Sports g. Other, please specify: ___________________ Question Response Options
Measurement • Concepts become Numbers • In quantitative research, we turn responses to our “measuring devices” (questions) into numbers. This is called coding. • This is just like using instruments in other research (such as medicine) to quantify concepts.
Measurement Do you agree or disagree with the following statement? Research methods class is awesome! a. Strongly agree b. Agree c. Neither agree nor disagree d. Disagree e. Strongly disagree The researcher later codes the responses, assigning a number to each response option, such as: a = 1, b = 2, c = 3, d = 4, e = 5
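As a rough sketch, the coding scheme above (first option = 1 through fifth option = 5) can be written as a lookup table. The respondent answers are invented for illustration.

```python
# Coding scheme from the slide: first response option = 1, ..., fifth = 5.
CODES = {
    "Strongly agree": 1,
    "Agree": 2,
    "Neither agree nor disagree": 3,
    "Disagree": 4,
    "Strongly disagree": 5,
}

# Hypothetical answers from four respondents.
answers = ["Agree", "Strongly agree", "Disagree", "Agree"]

# Coding: replace each text answer with its number.
coded = [CODES[a] for a in answers]
print(coded)  # [2, 1, 4, 2]
```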
Measurement • Special Issue: Coding • Assigning numbers to response options so that statistics can be generated from answers (or so that computers can calculate our statistics) • Keep conventional logic and data analysis in mind when determining your coding schemes for responses • Code Dummy Variables with 0 = absence and 1 = presence • Increasing magnitude should be reflected by increasing codes • The GSS is horrifically coded!
Measurement • Special Issue: Coding • If your variable were Religiosity and you asked, “How religious do you consider yourself?”, then you should not use a coding scheme like: 1 = very religious, 2 = somewhat religious, 3 = barely religious, 4 = not at all religious. Because higher codes should mean more religiosity, the coding should be reversed (4 = very religious down to 1 = not at all religious). This is called reverse-coding.
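A minimal sketch of reverse-coding the four-point religiosity item. The helper name `reverse_code` is an assumption made for this example.

```python
def reverse_code(value, low=1, high=4):
    """Flip a code on a low..high scale so the highest code becomes the lowest."""
    if not low <= value <= high:
        raise ValueError(f"{value} is outside the {low}-{high} scale")
    return low + high - value


# Original scheme: 1 = very religious ... 4 = not at all religious.
original = [1, 2, 3, 4]
reversed_codes = [reverse_code(v) for v in original]
print(reversed_codes)  # [4, 3, 2, 1]
```

After reversing, 4 means very religious and 1 means not at all religious, so a larger number indicates more religiosity.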
Measurement • Special Issue: Coding From the General Social Survey codebook: Do you feel that you're constantly under stress?
CONTENT      CODE   SAMPLE
Yes          1      3681
No           2      6422
Don't know   8      51
Not stated   9      595
This should be No = 0 and Yes = 1 because the idea of the question is about stress. In binary logic, 1 equals the presence of something.
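The recode suggested above might look like the sketch below. Treating codes 8 and 9 as missing is an assumption of this example, not something the codebook excerpt specifies, and the raw responses are invented.

```python
# Codebook values from the slide: 1 = Yes, 2 = No, 8 = Don't know, 9 = Not stated.
RECODE = {1: 1, 2: 0}  # Yes -> 1 (stress present), No -> 0 (stress absent)


def to_dummy(code):
    """Return 1/0 for Yes/No; any other code is treated as missing (None)."""
    return RECODE.get(code)


raw = [1, 2, 2, 8, 9, 1]  # hypothetical raw responses
stress = [to_dummy(c) for c in raw]
print(stress)  # [1, 0, 0, None, None, 1]

# With 0/1 coding, the mean of the valid cases is the proportion answering Yes.
valid = [s for s in stress if s is not None]
print(sum(valid) / len(valid))  # 0.5
```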
Measurement • Special Issue: Coding From the General Social Survey codebook: How frequently do you participate in bowling?
CONTENT      CODE   SAMPLE
2-3/month    1      13
1-2/week     2      90
3+/week      3      5
This is actually appropriate, because the higher the code number, the more the person bowls.
Measurement Many concepts are measured with scales… Indexes and Scales A series of questions is used to measure a concept more comprehensively than would be possible with a single question. These are especially appropriate for measuring concepts that we know exist but cannot see. • We know the following exist, but we cannot directly view them: Self-esteem Well-being Gender Identity Depression Index: Each item is equally weighted to create a sum or average Scale: Some items add more value to the total measure than other items
Measurement For Example, Researchers typically operationalize self-esteem by using the Rosenberg Self-esteem Scale (which is technically an index).
Measurement Coding for the Rosenberg Self-Esteem Scale. Positive items are coded: Strongly Disagree = 1, Disagree = 2, Agree = 3, Strongly Agree = 4. The researcher enters the number that corresponds with each person’s answer, then adds up all responses to form the self-esteem score.
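The scoring procedure can be sketched as follows. This assumes, as the "positive items" label suggests, that negatively worded items are reverse-coded before summing. Which items are negative is passed in through a hypothetical `negative_items` argument, and the positions and answers used below are invented; consult the published scale for the real item wording.

```python
CODES = {"Strongly Disagree": 1, "Disagree": 2, "Agree": 3, "Strongly Agree": 4}


def score(responses, negative_items=frozenset()):
    """Sum the coded responses, reverse-coding items whose position is in negative_items."""
    total = 0
    for index, answer in enumerate(responses):
        code = CODES[answer]
        if index in negative_items:
            code = 5 - code  # reverse a 1-4 code: 1 <-> 4, 2 <-> 3
        total += code
    return total


# One hypothetical respondent answering ten items.
responses = [
    "Strongly Agree", "Disagree", "Agree", "Agree", "Disagree",
    "Strongly Disagree", "Agree", "Disagree", "Disagree", "Strongly Agree",
]
# Hypothetical positions (0-based) of the negatively worded items.
negative = {1, 4, 5, 7, 8}

print(score(responses, negative))  # 33
```

With ten items coded 1 to 4, scores range from 10 to 40.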
Measurement C: LEVELS OF MEASUREMENT
Measurement Levels of Measurement One must know the nature of one’s variables in order to understand what manipulations are appropriate (and later, which statistical tests to use because they must be mathematically manipulated for statistics). Nominal Level of Measurement Ordinal Level of Measurement Interval Level of Measurement Ratio Level of Measurement
Measurement Levels of Measurement Nominal Level of Measurement • Items or responses are categorical. When assigned numbers, the numbers have no mathematical interpretation. • A nominal variable classifies persons, places or things without implying any rank among them. • For example: Race: 1=black 2=white 3=Asian • Cars: 1=Chevy 2=Honda 3=Ford • Marriage: 1=single 2=married 3=formerly married • It makes no sense to add, subtract, multiply, or divide these.
Measurement Levels of Measurement Ordinal Level of Measurement • Items or responses are assigned to categories along a dimension of types with increasing value (or in order). The numbers only indicate order, not magnitude. • An ordinal variable ranks persons, places or things, but there is no accurate way to gauge the distance between them. • For example: • Professor Rank: 1=Assistant 2=Associate 3=Full • Sexy Cars: 1=Green Gremlin 2=Blue Impala 3=Red Audi • It typically makes no sense to add, subtract, multiply, or divide these. You may if using good judgment.
Measurement Levels of Measurement Interval-Ratio Level of Measurement • Interval: An interval variable assigns persons, places or things to a continuum that has specific intervals (of equal magnitude) between units of measure, but does not have an absolute zero point. E.g., Self-Esteem: Scale ranges from 10 to 40 • Ratio: A ratio variable notes the number of persons, places or things on a continuum that has a zero point and has specific intervals (of equal magnitude) between units of measure. Units of measure denote quantity. • E.g., Age: 0 = 0 years, 1 = 1 year, 2 = 2 years, 3 = 3 years, etc. • It typically makes sense to conduct mathematical operations on these.
Measurement While each variable has a number assigned to each response, we must decide whether the numbers are meaningful or not. Nominal variables: meaningless numbers Ordinal variables: If meaningless, treat as nominal If meaningful, treat as interval-ratio Interval-ratio variables: Numbers are numbers!
Measurement The special case of dichotomous variables: A dichotomous variable can take one of two values. For example: Female: 0=Male, 1=Female Hispanic: 0=Other, 1=Hispanic SUV: 0=Other, 1=SUV Are dichotomous variables nominal, ordinal, or interval-ratio?
Measurement • What level of measurement for this GSS variable?
Measurement D: VALIDITY and RELIABILITY
Measurement Validity and Reliability (The Quality of Our Variables) Researchers have an obligation to assess whether they operationalized concepts (measured variables) well. We do this by assessing the validity and reliability. Validation research is often an enterprise unto itself, and is used often in applied fields such as education and psychometrics. From Vogt’s Dictionary of Statistics… Validity: A term to describe a measurement instrument or test that accurately measures what it is supposed to measure; the extent to which a measure is free of systematic error. Validity requires reliability, but the reverse is not true. Reliability: Freedom from measurement error. In practice, this boils down to the consistency or stability of a measure or test from one use to the next.
Measurement Assessing our Measures Types of Validity • Face Validity • Content Validity • Criterion Validity Concurrent Validity Predictive Validity • Construct Validity Convergent Validity Discriminant Validity
Measurement 1. Face Validity The variable is a product of accurate measurement if the operationalization makes sense “on its face.” Logic tells you that you are measuring what you intend to measure. Example: Emotional bonds with father Valid Thinking of your father, how close in personal and emotional terms are you to him? Invalid How much time do you spend alone with your father?
Measurement 2. Content Validity The variable is a product of accurate measurement if the operationalization covers the full range of the concept’s meaning. Example: Self-esteem Valid Rosenberg’s ten questions covering aspects of feelings of worth and comparison with others. Invalid One question: Do you feel that you are as worthy as other people?
Measurement 3. Criterion Validity This validity is determined by the extent to which the variable is demonstrably related to concrete criteria in the "real" world. The variable was accurately measured if it: • predicts an outcome on another variable for which it should, or • is correlated with another established variable that measures the same or similar thing. Example: Racial Identification Valid Race Identity Scale (RIS) predicts involvement in race/ethnic student groups OR RIS has high correlation with the Racial Attitudes Scale for respondents’ own racial group. Invalid Race Identity Scale cannot predict names given to children OR is uncorrelated with Race Consciousness Scale.
Measurement 3a. Concurrent Criterion Validity The variable was accurately measured if it: • is correlated with another established variable measuring the same or similar concept at the same time. Example: Racial Identification Valid RIS has high correlation with the Racial Attitudes Scale for respondents’ own racial group. Invalid RIS is uncorrelated with Race Consciousness Scale the respondents were also given.
Measurement 3b. Predictive Criterion Validity The variable was accurately measured if it: • predicts a later outcome on another variable that it should predict. Example: Racial Identification Valid Race Identity Scale (RIS) for freshmen can predict involvement in race/ethnic student groups later on. Invalid Race Identity Scale of parents cannot predict names given to their children.
Measurement 4. Construct Validity The variable reflects accurate measurement of the concept if it is related to other variables as it theoretically should be. To assess, one determines whether the variable is associated with other variables as specified by theory (usually determined over multiple studies). Example: Locus of Control Valid People with high locus of control report greater compliance with treatment regimens. Also, high and low locus of control persons have similar self-esteem as each other (because these should be unrelated). Invalid People with low locus of control show similar health self-advocacy as those with high locus of control. Higher locus of control corresponds with higher sense of well-being.
Measurement 4. Construct Validity 4a. Convergent Validity (it behaves as it should) Variables that theoretically should be related to each other are, in fact, observed to be related to each other (there is convergence between similar or related concepts) Example: Locus of Control Valid People with high locus of control report greater compliance with treatment regimens. Invalid People with low locus of control show similar health self-advocacy as those with high locus of control.
Measurement 4. Construct Validity 4b. Discriminant Validity (it doesn’t behave as it should not) Variables that theoretically should not be related to each other are, in fact, observed to not be related to each other (you can discriminate between dissimilar concepts) Example: Locus of Control Valid People with high locus of control have similar self-esteem as those with low locus of control—because they should not be related. Invalid Higher locus of control is positively correlated with grade fatalism in college students (but it should not be).
Measurement • Other examples of criterion and construct validity? • Employment tests • Driving tests • Gender Identity and Depression • Gender Identity and reaction to depression
Measurement Assessing our Measures Types of Reliability Measuring Devices: • Test-retest Reliability • Inter-item Reliability • Alternate-forms Reliability • Split-halves Reliability People: • Inter-observer Reliability
Measurement 1. Test-retest Reliability The variable is a reliable measure of a concept if a subsequent administration of the questionnaire yields similar scores (assuming that the concept should not have changed much over time). There should be a high correlation between the sample’s two scores on the measure. Example: Self-esteem Reliable Those with high (low) scores with the first administration of my scale have high (low) scores the second time. Not Reliable The second scores on my scale are not correlated with first.
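One way to check test-retest reliability is to correlate the two sets of scores. The sketch below uses a plain Pearson correlation; the scores are invented and `pearson_r` is a helper written for this example.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a list has no variation")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical self-esteem scores for six people at two administrations.
time1 = [12, 25, 31, 18, 36, 22]
time2 = [14, 24, 33, 17, 35, 20]

r = pearson_r(time1, time2)
print(round(r, 2))  # close to 1: high scorers stay high, low scorers stay low
```

A correlation near 1 is what the "Reliable" case describes; a correlation near 0 corresponds to the "Not Reliable" case.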
Measurement 2. Inter-item Reliability (very commonly used) Responses are similar across items when several items are used to form a single variable that measures a concept (a scale). To avoid response bias, researchers often use both positively and negatively worded items. In these cases, each item’s responses should be positively correlated with the others of the same valence (+ or - phrasing). Example: Satisfaction with Life Scale (SWLS) Reliable The five items on the SWLS are highly correlated with each other. People who score low on one tend to score low on all. Not Reliable I add “became famous” and “used drugs,” but now the items are NOT provoking consistently similar results. SWLS items (each rated from 1 = Disagree to 7 = Agree): 1. Life is ideal 2. Conditions are excellent 3. Satisfied 4. Got what wanted 5. Would change nothing
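Inter-item reliability is often summarized with Cronbach's alpha. The slides do not name that statistic, so it is brought in here as one common option. The sketch below computes it for invented ratings on a 1 to 7 scale; the item columns and the `unrelated` item are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: list of item columns, each a list with one score per respondent."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's summed score
    item_variance_sum = sum(variance(column) for column in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical 1-7 ratings from six respondents on five consistent items.
consistent = [
    [6, 2, 5, 7, 3, 4],
    [6, 3, 5, 6, 2, 4],
    [7, 2, 4, 6, 3, 5],
    [5, 2, 5, 7, 3, 4],
    [6, 1, 4, 6, 2, 4],
]
# A hypothetical item that does not track the others.
unrelated = [1, 7, 2, 3, 6, 7]

alpha = cronbach_alpha(consistent)
alpha_with_unrelated = cronbach_alpha(consistent + [unrelated])
print(round(alpha, 2), round(alpha_with_unrelated, 2))
```

Adding the unrelated item lowers alpha, which mirrors the "Not Reliable" case where new items stop provoking consistently similar results.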