Technical Adequacy of Tests Dr. Julie Esparza Brown SPED 512: Diagnostic Assessment
Two Essential Concepts • Reliability: Test consistency • Validity: Test measures what it says it does
Psychometric Properties • Creating an instrument with sound psychometric properties means, first of all, making it reliable.
Reliability • Reliability is the degree to which a test or measurement tool measures something consistently.
Reliable but not valid – shots hit the same part of the target each time (consistent), but not the center, which is the goal. • Valid but not reliable – shots are evenly distributed around the center on average, but scattered and inconsistent. • Neither reliable nor valid – shots are not tightly clustered, and the pattern is not centered on the target. • Reliable and valid – shots are close together and clustered around the center, where they were aimed.
Interpreting Reliability Coefficients • We want two things: • For reliability coefficients to be positive (or direct) and not to be negative (or indirect) • Reliability coefficients that are as large as possible (between .00 and +1.00). • Reliability is a function of how much error contributes to the observed score. The lower the error, the higher the reliability.
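In classical test theory terms, the reliability coefficient is the proportion of observed-score variance that is true-score variance rather than error. A minimal sketch of that relationship, using made-up variance values for illustration:

```python
# Classical test theory: observed variance = true variance + error variance.
# Reliability is the share of observed variance that is NOT error.
true_variance = 90.0   # hypothetical true-score variance
error_variance = 10.0  # hypothetical error variance

observed_variance = true_variance + error_variance
reliability = 1 - error_variance / observed_variance

print(reliability)  # 0.9 -- lower error means higher reliability
```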
To Increase Test Reliability… • Ensure instructions are standardized across all settings when the test is administered. • Increase the number of items: the larger the sample of items, the more likely it is to be representative and reliable; this is especially true for achievement tests (see the sketch below). • Delete unclear items. • Minimize the effects of external events.
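How much lengthening a test helps can be estimated with the Spearman-Brown prophecy formula, which predicts the reliability of a test whose length is changed by a factor k. A sketch, assuming an illustrative starting reliability of .70:

```python
def spearman_brown(reliability: float, k: float) -> float:
    """Predicted reliability of a test lengthened by factor k
    (Spearman-Brown prophecy formula)."""
    return k * reliability / (1 + (k - 1) * reliability)

# Doubling the length of a test whose reliability is .70 (illustrative):
print(round(spearman_brown(0.70, 2), 2))  # 0.82 -- more items, higher reliability
```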
Validity • The property of an assessment tool that indicates the tool does what it says it does. • A valid test measures what it is supposed to measure.
Quantifying Validity • The maximum level of validity is equal to the square root of the reliability coefficient. • For example, if the reliability coefficient of a test is .87, the validity coefficient can be no larger than .93 (the square root of .87).
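A quick check of that ceiling, using the .87 reliability from the example:

```python
import math

reliability = 0.87
max_validity = math.sqrt(reliability)  # validity can be no larger than this
print(round(max_validity, 2))  # 0.93
```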
Reliability and Validity • You can have a test that is reliable but not valid. • But, you can’t have a valid test without it first being reliable. • If a test does what it is supposed to, then it has to do it consistently to work!
Test Scores • A raw score or obtained score on a test is the number of points obtained by an examinee. • A true score is that part of an examinee’s observed score uninfluenced by random events. • The error of measurement or error score is the difference between an obtained score and its theoretical true-score counterpart.
Error Score • The error score is that part of the obtained score which is unsystematic, random, and due to chance.
Standard Error of Measurement • The standard deviation of the errors of measurement associated with the test scores for a specified group of test takers. • It is a measure of the variability of the errors of measurement. • It is used to help us predict true scores based upon knowledge of obtained scores.
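A test manual's SEM is typically derived from the test's standard deviation and reliability using the standard formula SEM = SD * sqrt(1 - reliability). A sketch with assumed values (SD of 15, reliability of .91):

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# A test scaled to SD 15 with reliability .91 (illustrative values):
print(round(sem(15, 0.91), 1))  # 4.5
```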
Score Bands • Score bands are sometimes called confidence intervals or confidence bands because they allow us to make probabilistic statements of confidence about an unknown value. • Score bands have lower and upper limits on the score scale and provide an estimate that is a range or band of possible test scores.
Score Bands • An example of a score band or confidence interval is “I am 95 percent confident that the examinee’s obtained score will be between 46 and 54” (given a true score of 50 and an SEM of two). • 68 percent confidence intervals are the most commonly used.
Confidence Intervals • In a normal distribution: • The area between one SD below and one SD above the mean is 68% of the total area under the curve • The area between two (actually 1.96) SDs below and two SDs above the mean is 95% • The area between 2.58 SDs below and 2.58 SDs above the mean is 99%
Confidence Intervals • If we add and subtract one SEM from a person’s test score, we will have a 68 percent confidence band within which the true score is likely to fall.
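Putting the last few slides together, a score band is just the obtained score plus or minus a z-multiplier times the SEM. A sketch using the score of 50 and SEM of 2 from the earlier example:

```python
def score_band(obtained: float, sem: float, z: float = 1.0):
    """Confidence band around an obtained score: obtained +/- z * SEM."""
    return obtained - z * sem, obtained + z * sem

obtained, sem = 50, 2  # values from the slide example
print(score_band(obtained, sem))        # (48.0, 52.0) -- the 68% band (z = 1)
print(score_band(obtained, sem, 1.96))  # (46.08, 53.92) -- roughly 46 to 54, the 95% band
```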
Standard Scores • Test developers calculate the statistical average based on the performance of students tested during the norming process of test development. • That average score is assigned a fixed value (for example, 100). • Different performance levels are calculated based on the differences among student scores from the statistical average and are expressed as standard deviations.
Standard Scores • These standard deviations are used to determine which scores fall within the above-average, average, and below-average ranges. • Standard scores and standard deviations differ from test to test. Many commonly used tests, such as the Wechsler Intelligence Scales, have an average score of 100 and a standard deviation of 15.
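On a scale like that, converting a raw score to a standard score means re-expressing its distance from the norm-group average in the new metric. A sketch assuming a mean-100, SD-15 scale and made-up norm-group figures:

```python
def standard_score(raw: float, norm_mean: float, norm_sd: float,
                   scale_mean: float = 100, scale_sd: float = 15) -> float:
    """Convert a raw score to a standard score via its z-score."""
    z = (raw - norm_mean) / norm_sd
    return scale_mean + scale_sd * z

# A raw score of 58 in a norm group with mean 50 and SD 8 (illustrative):
print(standard_score(58, norm_mean=50, norm_sd=8))  # 115.0 -- one SD above average
```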
Standard Scores • Standardized test scores enable us to compare a student's performance on different types of tests. • Although all test scores should be considered estimates, some are more precise than others. • Standard scores and percentiles, for example, define a student's performance with more precision than do T-scores, z-scores, or stanines.
Standard Deviation • Standard deviation measures how widely spread data points are. • If the data values are all equal to one another, then the standard deviation is zero. • Under a normal distribution, ± one standard deviation encompasses about 68% of the measurements and ± two standard deviations encompasses about 95% of the measurements.
Standard Deviation • If a high proportion of data points lie near the mean (average) value, then the standard deviation is small. • An experiment that yields data with a low standard deviation is said to have high precision. • If a high proportion of data points lie far from the mean value, then the standard deviation is large. • An experiment that yields data with a high standard deviation is said to have low precision.
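A quick illustration of both cases, using made-up data:

```python
import statistics

tight = [98, 100, 101, 99, 102]   # values near the mean -> small SD
spread = [70, 130, 85, 115, 100]  # values far from the mean -> large SD

print(round(statistics.stdev(tight), 1))   # 1.6  (high precision)
print(round(statistics.stdev(spread), 1))  # 23.7 (low precision)
```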
Test Scores • Observed score – what someone actually gets on a test. • True score – the true, 100% accurate reflection of what someone actually knows. • An observed score is usually close to the true score, but they are rarely the same. • The difference between the two is the amount of error that is introduced.
Observed Score • If someone scores 89 on a test but their true score is 80, the 9-point difference (the error score) is due to error, which is why individual test scores vary from being 100% true. • What may be the source of the error? • Room is too warm • Person didn’t have time to study • Person has a fever • ????
Observed Score • We need to reduce the errors as much as possible. • The less error, the more reliable the score.
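A tiny simulation of that idea, treating each observed score as the same true score plus random error (all numbers here are made up for illustration):

```python
import random

random.seed(1)
true_score = 80

# Each administration of the test adds some random error to the same true score.
small_error = [true_score + random.gauss(0, 2) for _ in range(5)]
large_error = [true_score + random.gauss(0, 9) for _ in range(5)]

print([round(s) for s in small_error])  # scores stay close to 80 -> more reliable
print([round(s) for s in large_error])  # scores swing widely -> less reliable
```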
Reporting Test Scores • Score or confidence bands are the best way to report test scores. • The score band provides reasonable limits for estimating the true score; it is an adequate approximation when test reliability is reasonably high and the obtained score does not deviate extremely from the mean of the reference group. • You can say, “It is fairly likely your daughter’s true ability lies between 110 and 120.”
Test Selection • In selecting a published test: • Read the test manual to determine whether it reports the reliability, SEM, and norms (including confidence bands) • Check that this information is reported for a reference group similar to your examinee • Be sure the manual explains clearly how the information was gathered and how the confidence bands were calculated
Reliable and Valid • If the tools you use to collect data are neither reliable nor valid, then the results will be inconclusive.