Topics: Quality of Measurements
• Reliability
• Validity
The Quality of Measuring Instruments: Definitions
• Reliability: consistency, the extent to which the data are consistent
• Validity: accuracy, the extent to which the instrument measures what it purports to measure
The Questions of Reliability
• To what degree does a subject's measured performance remain consistent across repeated testings? How consistently will results be reproduced if we measure the same individuals again?
• How equivalent are the results of two measurement occasions that use "parallel" tests?
• To what extent do the individual items that make up a test or inventory consistently measure the same underlying characteristic?
• How much consistency exists among the ratings provided by a group of raters?
• When we have obtained a score, how precise is it?
True and Error Score; Parallel Tests
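The deck gives no detail on this slide, so here is a minimal sketch of the classical test theory model this heading refers to (standard notation, not taken from the original):

```latex
% Classical test theory: observed score = true score + random error
X = T + E, \qquad \mathbb{E}[E] = 0, \qquad \operatorname{Cov}(T, E) = 0
```

Two forms are "parallel" when they measure the same true score T with equal error variances, so differences between them reflect only error.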
Sources of Error: Conditions of Test Administration and Construction
• Changes in time limits
• Changes in directions
• Different scoring procedures
• Interrupted testing session
• Qualities of the test administrator
• Time the test is taken
• Sampling of items
• Ambiguity in wording of items/questions
• Ambiguous directions
• Climate of the test situation (heating, light, ventilation, etc.)
• Differences in observers
Sources of Error: Conditions of the Person Taking the Test
• Reaction to specific items
• Health
• Motivation
• Mood
• Fatigue
• Luck
• Memory and/or attention fluctuations
• Attitudes
• Test-taking skills (test-wiseness)
• Ability to understand instructions
• Anxiety
Reliability
• Reliability: the ratio of true variance to observed variance
• Reliability coefficient: a numerical index which assumes a value between 0 and +1.00
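In symbols (a standard formulation added here for clarity, not verbatim from the slides):

```latex
\rho_{XX'} \;=\; \frac{\sigma_T^2}{\sigma_X^2}
           \;=\; \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
           \;=\; 1 - \frac{\sigma_E^2}{\sigma_X^2}
```

A coefficient of .80 therefore means that 80% of the observed-score variance is attributable to true-score differences and 20% to error.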
Relation between Reliability and Error
[Figure: two bars, each partitioned into true-score variability and error, contrasting a reliable measure (A) with an unreliable measure (B).]
Methods of Estimating Reliability
• Test-Retest: repeated measures with the same test (coefficient of stability)
• Parallel Forms: repeated measures with equivalent forms of a test (coefficient of equivalence)
• Internal Consistency: repeated measures using items on a single test
• Inter-Rater: judgments by more than one rater
Reliability Is The Consistency Of A Measurement

Reliable repeated measurements/observations:

Person    X1   X2   X3   ...   Xk→∞
Charlie   20   19   21   ...   20
Harry     15   17   16   ...   16

Unreliable repeated measurements/observations:

Person    X1   X2   X3   ...   Xk→∞
Charlie   20   10    8   ...   23
Harry      2   11    4   ...   15
Test-Retest Reliability
• Situation: same people taking two administrations of the same test
• Procedure: correlate scores on the two tests, which yields the coefficient of stability (see the sketch below)
• Meaning: the extent to which scores on a test can be generalized over different occasions (temporal stability)
• Appropriate use: information about the stability of the trait over time
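A minimal Python sketch of the test-retest procedure; the scores and variable names are hypothetical, used only to show the computation:

```python
import numpy as np

# Scores for the same five people on two administrations of the same test
# (hypothetical data for illustration only).
time1 = np.array([20, 15, 31, 24, 18])
time2 = np.array([19, 17, 30, 26, 16])

# The coefficient of stability is the Pearson correlation
# between the two sets of scores.
stability = np.corrcoef(time1, time2)[0, 1]
print(f"Coefficient of stability (test-retest r) = {stability:.2f}")
```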
Parallel (Alternate) Forms Reliability
• Situation: testing of the same people on different but comparable forms of the test
• Procedure: correlate the scores from the two tests, which yields a coefficient of equivalence
• Meaning: the consistency of response to different item samples (where testing is immediate) and across occasions (where testing is delayed)
• Appropriate use: to provide information about the equivalence of forms
Internal Consistency Reliability
• Situation: a single administration of one test form
• Procedure: divide the test into comparable halves (or treat each item as a repeated measure) and correlate the parts; common indices include (see the sketch below):
  • Split-half with Spearman-Brown adjustment
  • Kuder-Richardson #20 and #21
  • Cronbach's alpha
• Meaning: consistency across the parts of a measuring instrument ("parts" = individual items or subgroups of items)
• Appropriate use: where the focus is on the degree to which the same characteristic is being measured; a measure of test homogeneity
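A minimal Python sketch of two of these internal-consistency estimates, Cronbach's alpha and an odd-even split-half stepped up with the Spearman-Brown adjustment. The function names and item responses are hypothetical, for illustration only:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a persons-by-items score matrix."""
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def split_half_spearman_brown(items: np.ndarray) -> float:
    """Odd-even split-half reliability, adjusted with Spearman-Brown."""
    odd = items[:, 0::2].sum(axis=1)
    even = items[:, 1::2].sum(axis=1)
    r_half = np.corrcoef(odd, even)[0, 1]
    return 2 * r_half / (1 + r_half)

# Hypothetical right/wrong (1/0) responses: 6 persons x 4 items.
scores = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
])
print(f"alpha = {cronbach_alpha(scores):.2f}")
print(f"split-half (Spearman-Brown) = {split_half_spearman_brown(scores):.2f}")
```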
Inter-rater Reliability
• Situation: having a sample of test papers (essays) scored independently by two examiners
• Procedure: correlate, or compute the agreement between, the two sets of scores; common indices include (see the sketch below):
  • Kendall's coefficient of concordance
  • Cohen's kappa
  • Intraclass correlation
  • Pearson product-moment correlation
• Meaning: a measure of scorer (rater) reliability, i.e., consistency or agreement among raters
• Appropriate use: for ensuring consistency between raters
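A minimal sketch of one of these indices, Cohen's kappa, computed directly from two raters' categorical judgments; the examiner labels and ratings are hypothetical:

```python
import numpy as np

def cohen_kappa(rater1, rater2) -> float:
    """Cohen's kappa: chance-corrected agreement between two raters."""
    rater1, rater2 = np.asarray(rater1), np.asarray(rater2)
    categories = np.union1d(rater1, rater2)
    p_observed = np.mean(rater1 == rater2)
    # Chance agreement expected from each rater's marginal proportions.
    p_chance = sum(
        np.mean(rater1 == c) * np.mean(rater2 == c) for c in categories
    )
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical pass/fail judgments by two examiners on eight essays.
examiner_a = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail"]
examiner_b = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]
print(f"kappa = {cohen_kappa(examiner_a, examiner_b):.2f}")
```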
When is a reliability estimate satisfactory?
• Depends on the type of instrument
• Depends on the purpose of the study
• Depends on who is affected by the results
Factors Affecting Reliability Estimates
• Test length (see the Spearman-Brown prophecy formula below)
• Range of scores
• Item similarity
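The test-length effect is commonly quantified with the Spearman-Brown prophecy formula, a standard result added here for reference: lengthening a test of reliability r by a factor k with comparable items predicts a reliability of

```latex
r_k \;=\; \frac{k\,r}{1 + (k - 1)\,r}
```

For example, doubling (k = 2) a test with r = .60 predicts 2(.60)/(1 + .60) = .75.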
Standard Error of Measurement
• All test scores contain some error
• For any test, the higher the reliability estimate, the lower the error
• The standard error of measurement is the standard deviation of the error scores, i.e., how far observed scores are expected to scatter around the true score
• Can be used to estimate a range within which a true score would likely fall
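In formula form (standard notation, not taken verbatim from the slides), the standard error of measurement is estimated from the spread of the observed scores and the test's reliability:

```latex
SEM \;=\; SD_X \,\sqrt{1 - r_{XX'}}
```

A perfectly reliable test (r = 1) has SEM = 0; as reliability falls, the SEM approaches the full standard deviation of the scores.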
Use of the Standard Error of Measurement
• We never know the true score
• By knowing the s.e.m. and by understanding the normal curve, we can assess the likelihood of the true score falling within certain limits
• The higher the reliability, the lower the standard error of measurement, and hence the more confidence we can place in the accuracy of a person's test score (a worked example follows)
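A small worked example with assumed numbers (the values are hypothetical): if SD_X = 10 and r = .91, then SEM = 10 × √(1 − .91) = 3, and for an observed score of 50 the normal-curve areas on the next slide give

```latex
50 \pm 1\,SEM = [47,\,53] \;(\approx 68\%), \qquad
50 \pm 2\,SEM = [44,\,56] \;(\approx 95\%)
```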
The Normal Curve: Areas Under the Curve
[Figure: normal curve of test scores centered on the observed score X and marked at -3, -2, -1, +1, +2, and +3 standard errors; about 68% of the area lies within ±1 s.e., 95% within ±2 s.e., and 99% within ±3 s.e. (segment areas .3413, .1359, .0214, .0013 on each side).]
Warnings about Reliability
• There is no such thing as "the" reliability; different methods assess consistency from different perspectives
• Reliability coefficients apply to the data, NOT to the instrument
• Any reliability coefficient is only an estimate of consistency