Chapter 6: Selecting Measurement Instruments • Objectives • State the relation between a variable and a construct, and distinguish among categories of variables (e.g., categorical and quantitative; dependent and independent) and the scales used to measure them (e.g., nominal, ordinal, interval, and ratio). • Define measurement, and describe ways to interpret measurement data.
Selecting Measurement Instruments Objectives • Describe the types of measuring instruments used to collect data in qualitative and quantitative studies (e.g., cognitive, affective, and projective tests). • Define validity, and differentiate among content, criterion-related, construct, and consequential validity.
Selecting Measurement Instruments Objectives • Explain how to measure reliability, and differentiate among stability, equivalence, equivalence and stability, internal consistency, and scorer/rater reliability. • Identify useful sources of information about specific tests, and provide strategies for test selection. • Provide guidelines for test construction and test administration.
Data & Constructs • Data are the pieces of information you collect and use to examine your topic. • You must determine what type of data to collect. • A construct is an abstraction that cannot be observed directly but is invented to explain behavior. • e.g., intelligence, motivation, ability
Constructs & Variables • Constructs must be operationally defined to be observable and measurable. • Variables are operationally defined constructs. • Variables are placeholders that can assume any one of a range of values. • Variables may be measured by instruments.
Measurement Scales • The measurement scale is a system for organizing data. • Knowing your measurement scale is necessary to determine the type of analysis you will conduct.
Measurement Scales • Nominal variables describe categorical data. • e.g., gender, political party affiliation, school attended, marital status • Nominal variables are qualitative. • Quantitative variables fall along a continuum and are measured on ordinal, interval, or ratio scales.
Measurement Scales • Ordinal variables describe rank order with unequal units. • e.g., order of finish, ranking of schools or groups as levels • Interval variables describe equal intervals between values. • e.g., achievement, attitude, test scores
Measurement Scales • Ratio variables describe all of the characteristics of the other levels but also include a true zero point. • e.g., total number of correct items on a test, time, distance, weight
Independent & Dependent Variables • Dependent variables are those believed to depend on or to be caused by another variable. • Dependent variables are also called criterion variables. • Independent variables are the hypothesized cause of the dependent variable. There must be at least two levels of an independent variable. • Independent variables are also called experimental, manipulated, or treatment variables.
Characteristics of Instruments • There are three major ways for researchers to collect data. • A researcher can administer a standardized test. • e.g., an achievement test • A researcher can administer a self-developed instrument. • e.g., a survey you might develop • A researcher can record naturally occurring events or use already available data. • e.g., recording off-task behavior of a student in a classroom
Instruments • Using standardized instruments takes less time than developing an instrument. • With standardized instruments, results from different studies that use the same instrument can be compared. • At times researchers may need to develop their own instruments. • To effectively design an instrument one needs expertise and time.
Instruments • A test is a formal, systematic procedure for gathering information about people. • Cognitive characteristics (e.g., thinking, ability) • Affective characteristics (e.g., feelings, attitude)
Instruments • A standardized test is administered, scored, and interpreted the same way across administrations. • e.g., the ACT, the SAT, or the Stanford Achievement Test
Instruments • Assessment refers to the process of collecting, synthesizing, and interpreting information, including data from tests as well as from observations. • Formal or informal • Numerical or textual • Measurement is the process of quantifying or scoring assessment information. • Occurs after data collection
Instruments • Qualitative researchers often use interviews and observations. • Quantitative researchers often use paper and pencil (or electronic) methods. • Selection methods: The respondent selects from possible answers (e.g., multiple choice test). • Supply methods: The respondent has to provide an answer (e.g., essay items).
Instruments • Performance assessments emphasize student process and require creation of a product (e.g., completing a project).
Interpreting Instrument Data • Raw Score • Number or point value of items correct (e.g., 18/20 items correct). • Norm-referenced scoring • Student’s performance is compared with performance of others (e.g., grading on a curve).
Interpreting Instrument Data • Criterion-referenced scoring • Student’s performance is compared to preset standard (e.g., class tests). • Self-referenced scoring • How individual student’s scores change over time is measured (e.g., speeded math facts tests).
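To make these interpretations concrete, here is a minimal sketch in Python; the class scores, the 20-item test, and the 80% mastery cutoff are all hypothetical values invented for illustration, and the percentile conversion assumes roughly normally distributed scores.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical class scores (not from the chapter) on a 20-item test.
class_scores = [12, 14, 15, 15, 16, 17, 17, 18, 18, 19]
raw_score = 18                            # raw score: items correct out of 20

# Norm-referenced interpretation: compare the student with the group.
z = (raw_score - mean(class_scores)) / stdev(class_scores)
percentile = NormalDist().cdf(z) * 100    # assumes roughly normal scores

# Criterion-referenced interpretation: compare with a preset standard.
cutoff = 0.80                             # e.g., an 80% mastery standard
meets_standard = (raw_score / 20) >= cutoff

print(f"z = {z:.2f}, percentile ≈ {percentile:.0f}, mastery: {meets_standard}")
```

A self-referenced interpretation would instead track the same student's scores across repeated administrations over time.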
Types of Instruments • Cognitive tests measure intellectual processes (e.g., thinking, memorizing, calculating, analyzing). • Standardized tests measure an individual's current proficiency in given areas of knowledge or skill. • Standardized tests are often given as a test battery (e.g., the Iowa Test of Basic Skills, CTBS).
Types of Instruments • Diagnostic tests provide scores to facilitate identification of strengths and weaknesses (e.g., tests given for diagnosing reading disabilities). • Aptitude tests measure prediction or potential versus what has been learned (e.g., Wechsler Scales).
Affective Instruments • Affective tests measure affective characteristics (e.g., attitude, emotion, interest, personality). • Attitude scales measure what a person believes or feels. • Likert scales measure agreement on a scale. • Strongly agree, Agree, Undecided, Disagree, Strongly disagree
Affective Instruments • Semantic differential scales require the individual to indicate attitude by position on a scale between two bipolar adjectives. • Fair 3 2 1 0 −1 −2 −3 Unfair • Rating scales may require a participant to check the most appropriate description. • 5 = always, 4 = almost always, 3 = sometimes… • The Thurstone and Guttman scales are also used to measure attitudes.
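As a small illustration of how responses to such scales are typically turned into scores, the sketch below sums a five-point Likert scale after reverse-scoring a negatively worded item; the items, the reverse-keyed item, and the responses are all hypothetical, and real instruments specify their own keying and scoring rules.

```python
# Hypothetical 4-item Likert attitude scale, scored 1-5
# (1 = strongly disagree ... 5 = strongly agree).
responses = {"item1": 4, "item2": 2, "item3": 5, "item4": 1}
reverse_keyed = {"item4"}          # negatively worded item (assumed for illustration)

def score_likert(resp, reverse, points=5):
    """Sum item scores after reverse-scoring negatively worded items."""
    total = 0
    for item, value in resp.items():
        total += (points + 1 - value) if item in reverse else value
    return total

print(score_likert(responses, reverse_keyed))   # higher total = more favorable attitude
```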
Additional Inventories • Interest inventories assess personal likes and dislikes (e.g., occupational interest inventories). • Values tests assess the relative strength of a person’s values (e.g., Study of Values instrument).
Additional Inventories • Personality inventories present participants with statements that describe behaviors characteristic of given personality traits, and the participant responds to each statement (e.g., MMPI). • Projective tests were developed to eliminate some of the concerns with self-report measures. These tests are ambiguous so that, presumably, the respondent will project true feelings (e.g., Rorschach).
Criteria for Good Instruments • Validity refers to the degree to which a test measures what it is supposed to measure. • Validity is the most important test characteristic.
Criteria for Good Instruments • There are numerous established validity standards. • Content validity • Criterion-related validity • Concurrent validity • Predictive validity • Construct validity • Consequential validity
Content Validity • Content validity addresses whether the test measures the intended content area. • Content validity is an initial screening type of validity. • Content validity is sometimes referred to as Face Validity. • Content validity is measured by expert judgment (content validation).
Content Validity • Content validity is concerned with both: • Item validity: Are the test items measuring the intended content? • Sampling validity: Do the items adequately sample the full content area being tested? • One example of a lack of content validity is a math test with heavy reading requirements. It may measure not only math but also reading ability and is therefore not a valid math test.
Criterion-Related Validity • Criterion-related validity is determined by relating performance on a test to performance on an alternative test or other measure (the criterion). • The correlation coefficient between the test and the criterion expresses the degree of criterion-related validity.
Criterion-Related Validity • Two types of criterion-related validity include: • Concurrent: The scores on a test are correlated with scores on an alternative test given at the same time (e.g., two measures of reading achievement). • Predictive: The degree to which a test predicts how well a person will do in a future situation (e.g., the GRE score as the predictor and success in graduate school as the criterion), as illustrated in the sketch below.
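A minimal sketch of estimating a predictive validity coefficient is shown below; the predictor and criterion values are invented for illustration, and scipy is assumed to be available.

```python
from scipy.stats import pearsonr

# Hypothetical data: predictor = admissions test scores,
# criterion = first-year graduate GPA for the same ten students.
test_scores = [150, 155, 158, 160, 162, 165, 167, 170, 172, 175]
grad_gpa    = [2.9, 3.1, 3.0, 3.3, 3.2, 3.5, 3.4, 3.6, 3.7, 3.8]

# The validity coefficient is the correlation between test and criterion.
r, p_value = pearsonr(test_scores, grad_gpa)
print(f"predictive validity coefficient r = {r:.2f} (p = {p_value:.3f})")
```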
Construct Validity • Construct validity is the most important form of validity. • Construct validity assesses what the test is actually measuring. • It is very challenging to establish construct validity.
Construct Validity • Construct validity requires confirmatory and disconfirmatory evidence. • Scores on a test should relate to scores on tests of similar constructs (confirmatory evidence) and should NOT relate to scores on tests of different constructs (disconfirmatory evidence). • For example, scores on a math test should be more highly correlated with scores on another math test than with scores on a reading test.
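A rough sketch of that comparison appears below; the scores are hypothetical values invented to show the pattern, and numpy is assumed to be available.

```python
import numpy as np

# Hypothetical scores for the same six students on three measures
# (values invented to illustrate confirmatory vs. disconfirmatory evidence).
math_a  = [70, 82, 65, 90, 78, 85]
math_b  = [72, 80, 68, 88, 75, 87]
reading = [60, 75, 80, 70, 65, 72]

corr = np.corrcoef([math_a, math_b, reading])
print(f"math A vs math B:  r = {corr[0, 1]:.2f}")   # expected to be high
print(f"math A vs reading: r = {corr[0, 2]:.2f}")   # expected to be much lower
```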
Consequential Validity • Consequential validity refers to the extent to which an instrument creates harmful effects for the user. • Some tests may harm the test taker. • For example, a measure of anxiety may make a person more anxious.
Validity • Some factors that threaten validity include: • Unclear directions • Confusing or unclear items • Vocabulary or required reading ability too difficult for test takers • Subjective scoring • Cheating • Errors in administration
Self-Report Instruments There are some concerns with data derived from self-report instruments. • One concern is response set, or the tendency for a participant to respond in a certain way (e.g., social desirability). • Bias may also play a role in self-report instruments (e.g., cultural norms).
Reliability • Reliability refers to the consistency of an instrument to measure a construct. • Reliability is expressed as a reliability coefficient based upon a correlation. • Reliability coefficients should be reported for all measures. • Reliability affects validity. • There are several forms of reliability.
Reliability • Test-retest (stability) reliability measures the stability of scores over time. • To assess test-retest reliability, a test is given to the same group twice and a correlation is computed between the two sets of scores. • The correlation is referred to as the coefficient of stability.
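A minimal sketch of computing a coefficient of stability is shown below; the two administrations and their scores are hypothetical, and numpy is assumed to be available.

```python
import numpy as np

# Hypothetical scores for the same eight students on the same test,
# administered two weeks apart (values invented for illustration).
time1 = np.array([78, 85, 62, 90, 74, 88, 69, 81])
time2 = np.array([75, 88, 65, 92, 70, 85, 72, 79])

# The coefficient of stability is the correlation between administrations.
stability = np.corrcoef(time1, time2)[0, 1]
print(f"coefficient of stability = {stability:.2f}")
```

Alternate-forms and equivalence-and-stability coefficients (next slides) are computed the same way, with the two score sets coming from two equivalent forms rather than two administrations of the same form.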
Reliability • Alternate forms (Equivalence) reliability measures the relationship between two versions of a test that are intended to be equivalent. • To assess alternate forms reliability, both tests are given to the same group and the scores on each test are correlated. • The correlation is referred to as the Coefficient of Equivalence.
Reliability • Equivalence and stability reliability is represented by the relationship between equivalent versions of a test given at two different times. • To assess equivalence and stability reliability, first one test is given, after a time a similar test is given, and the scores are correlated. • The correlation is referred to as the Coefficient of Stability and Equivalence.
Reliability • Internal consistency reliability represents the extent to which items in a test are similar to one another. • Split-half: The test is divided into halves and a correlation is computed between the scores on each half. • Coefficient alpha and the Kuder-Richardson formulas measure the relationships among all items and between each item and the total scale.
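Below is a minimal sketch of coefficient alpha computed from an item-score matrix; the respondents and item scores are hypothetical, and numpy is assumed to be available.

```python
import numpy as np

# Hypothetical item-score matrix: rows = respondents, columns = items
# (e.g., 1-5 Likert points or 0/1 right/wrong).
scores = np.array([
    [4, 5, 4, 5],
    [3, 4, 3, 4],
    [5, 5, 4, 5],
    [2, 3, 2, 3],
    [4, 4, 5, 4],
])

def cronbach_alpha(items):
    """Coefficient alpha: k/(k-1) * (1 - sum of item variances / total-score variance)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

print(f"coefficient alpha = {cronbach_alpha(scores):.2f}")
```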
Reliability • Scorer and rater reliabilities reflect the extent to which independent scorers or a single scorer over time agree on a score. • Interjudge (inter-rater) reliability: Consistency of two or more independent scorers. • Intrajudge (intra-rater) reliability: Consistency of one person over time.
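One common way to express inter-rater reliability is percent agreement, optionally corrected for chance with Cohen's kappa; the sketch below uses invented ratings from two hypothetical scorers.

```python
from collections import Counter

# Hypothetical ratings from two independent scorers of the same ten essays
# (categories: "low", "med", "high"); data invented for illustration.
rater_a = ["low", "med", "high", "med", "low", "high", "med", "low", "high", "med"]
rater_b = ["low", "med", "high", "low", "low", "high", "med", "med", "high", "med"]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n   # percent agreement

# Cohen's kappa corrects observed agreement for chance agreement.
pa, pb = Counter(rater_a), Counter(rater_b)
expected = sum(pa[c] * pb[c] for c in set(rater_a) | set(rater_b)) / n**2
kappa = (observed - expected) / (1 - expected)

print(f"agreement = {observed:.2f}, kappa = {kappa:.2f}")
```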
Reliability • Standard error of measurement is an estimate of how often one can expect errors of a given size in an individual's test score. • SEM = SD × √(1 − r), where SEM = standard error of measurement, SD = standard deviation of the test scores, and r = the reliability coefficient.
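A worked example of the formula, using hypothetical values (a test with SD = 10 and reliability r = .91), is sketched below.

```python
import math

sd, r = 10.0, 0.91
sem = sd * math.sqrt(1 - r)     # SEM = SD * sqrt(1 - r)
print(f"SEM = {sem:.1f}")       # = 3.0 score points here

# Interpretation: roughly 68% of the time, an obtained score is expected
# to fall within ±1 SEM of the person's true score (e.g., 75 ± 3).
```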
Selecting a Test Once you have defined the purpose of your study: • Determine the type of test that you need. • Identify and locate appropriate tests. • Determine which test to use after a comparative analysis.
Selecting a Test There are several sources of information and reviews about available tests. These are a good place to start when selecting a test. • MMY: The Mental Measurements Yearbook is the most comprehensive source of test information. • Pro-Ed publications • ETS Test Collection Database • Professional journals • Test publishers and distributors
Selecting a Test When comparing the tests you have located and deciding which to use, attend to each of the following: • First, examine validity. • Next, consider reliability. • Consider ease of test use. • Ensure participants have not been previously exposed to the test. • Ensure sensitive information is not unnecessarily included.