
Presentation Transcript


    1. Measurement: Reliability

    2. Classroom Assessment Reliability Reliability = Assessment Consistency. Consistency within tests across examinees. Consistency within tests over multiple administrations to the same examinees. Consistency across alternative forms of the same test for the same examinees.

    3. Three Types of Reliability Stability reliability. Alternate-form reliability. Internal consistency reliability. “Reliability refers to the degree to which test scores are free from errors of measurement” (Standards for Educational and Psychological Testing). Some synonyms include: dependability, stability, consistency (the main connotation). Test-Retest Reliability (Stability): consistency of results over different testings. Assumption: no significant (i.e., performance-enhancing) events occur between tests. Typically requires computing a correlation. Alternate-Form Reliability: consistency of results over different forms. Requires two or more equivalent forms of the test. Equivalent forms are most typically found in high-stakes testing situations (e.g., EOG and EOC tests). Also requires a correlation coefficient. Internal Consistency Reliability: consistency of results over items within the test. Concerned with inter-item consistency (homogeneity), or with unidimensionality. Requires only one administration of a test. For dichotomous items use the Kuder-Richardson formulas; for polytomous items use Cronbach’s coefficient alpha. The three types of reliability are truly different from each other.

    4. Stability Reliability Concerned with the question: Are assessment results consistent over time (over occasions)? Think of some examples where stability reliability might be important. Why might test results NOT be consistent over time? Reliability coefficients are given as correlations and range from 0 to 1. Reliability is defined as the correlation between two simultaneous administrations of a test to the same group of respondents (a theoretical notion). Reliability coefficients close to 0 imply weak reliability; coefficients close to 1 imply strong reliability. Classroom (teacher-made) tests commonly have reliabilities in the .50s or lower. Commercial ... achievement tests have reliabilities in the .90s (or at least the high .80s). Aptitude tests have reliabilities in the .70s and .80s. Incidentally, achievement tests and aptitude tests are called cognitive tests. In general, tests (especially cognitive tests) are less reliable for younger children.

    5. Evaluating Stability Reliability Test-Retest Reliability. Compute the correlation between a first and a later administration of the same test. Classification consistency. Compute the percentage of consistent student classifications over time. The main concern is with the stability of the assessment over time.
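A brief sketch (not part of the original slides) of both approaches from slide 5: a test-retest correlation and a classification-consistency percentage. The score vectors and the mastery cut score of 70 are invented for illustration.

```python
# Hypothetical example: test-retest (stability) reliability.
# Scores and the cut score are invented for illustration only.
import numpy as np

time1 = np.array([78, 85, 62, 90, 71, 88, 55, 94, 67, 80])  # first administration
time2 = np.array([75, 88, 60, 93, 70, 85, 58, 96, 65, 83])  # same students, later administration

# Test-retest reliability: correlation between the two administrations.
r_stability = np.corrcoef(time1, time2)[0, 1]

# Classification consistency: percentage of students classified the same way
# (e.g., master vs. non-master) on both occasions, using the invented cut score.
cut = 70
consistent = np.mean((time1 >= cut) == (time2 >= cut))

print(f"Test-retest correlation: {r_stability:.2f}")
print(f"Classification consistency: {consistent:.0%}")
```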

    6. Example of Classification Consistency

    7. Example of Classification Consistency (Good Reliability)

    8. Example of Classification Consistency (Poor Reliability)

    9. Alternate-form Reliability Are two, supposedly equivalent, forms of an assessment in fact equivalent? The two forms do not have to yield identical scores. The correlation between two or more forms of the assessment should be reasonably substantial.

    10. Evaluating Alternate-form Reliability Administer two forms of the assessment to the same individuals and correlate the results. Determine the extent to which the same students are classified the same way by the two forms. Alternate-form reliability is established by evidence, not by proclamation.
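As an illustration (again, not from the original slides), the sketch below builds a 2 x 2 classification table for two forms of an assessment and reports the percentage of students classified the same way by both forms. The pass/fail classifications are invented.

```python
# Hypothetical example: classification table for alternate-form reliability.
import numpy as np

# 1 = classified as proficient, 0 = not proficient; same students on Form A and Form B.
form_a = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 0])
form_b = np.array([1, 1, 0, 1, 0, 1, 0, 0, 1, 1])

# 2 x 2 classification table: rows = Form A, columns = Form B.
table = np.zeros((2, 2), dtype=int)
for a, b in zip(form_a, form_b):
    table[a, b] += 1
print(table)

# Consistent classifications fall on the diagonal of the table.
agreement = np.trace(table) / table.sum()
print(f"Consistent classifications: {agreement:.0%}")

# The correlational approach would instead correlate the raw scores from the two forms.
```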

    11. Example of Using a Classification Table to Assess Alternate-Form Reliability

    12. Example of Using a Classification Table to Assess Alternate-Form Reliability

    13. Internal Consistency Reliability Concerned with the extent to which the items (or components) of an assessment function consistently. To what extent do the items in an assessment measure a single attribute? For example, consider a math problem-solving test. To what extent does reading comprehension play a role? What is being measured?

    14. Evaluating Internal Consistency Reliability Split-Half Correlations. Kuder-Richardson Formula (KR20). Used with binary-scored (dichotomous) items. Average of all possible split-half correlations. Cronbach’s Coefficient Alpha. Similar to KR20, except used with non-binary-scored (polytomous) items (e.g., items that measure attitude).
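The two formulas named on this slide can be sketched in a few lines. This is an illustration rather than the presentation's own material: the item-response matrix is invented, and textbooks vary on whether item and total variances use n or n - 1 in the denominator (population variances are used here so KR-20 and alpha agree exactly on dichotomous data).

```python
# Illustrative implementations of coefficient alpha and KR-20 (invented data).
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an examinee-by-item score matrix (polytomous or dichotomous)."""
    k = items.shape[1]
    item_vars = items.var(axis=0)            # variance of each item
    total_var = items.sum(axis=1).var()      # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def kr20(items: np.ndarray) -> float:
    """Kuder-Richardson formula 20 for 0/1-scored items (a special case of alpha)."""
    k = items.shape[1]
    pq = items.mean(axis=0) * (1 - items.mean(axis=0))  # p*q for each item
    total_var = items.sum(axis=1).var()
    return (k / (k - 1)) * (1 - pq.sum() / total_var)

# Invented 6-examinee, 5-item dichotomous response matrix, for illustration only.
X = np.array([[1, 1, 1, 0, 1],
              [1, 0, 1, 0, 0],
              [0, 0, 1, 0, 0],
              [1, 1, 1, 1, 1],
              [1, 1, 0, 1, 1],
              [0, 0, 0, 0, 1]], dtype=float)
print(f"KR-20: {kr20(X):.2f}   alpha: {cronbach_alpha(X):.2f}")
```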

    15. Reliability Components of an Observation O = T + E Observation = True Status + Measurement Error. Reliability is a function of the E(rror) component. The T(rue) score is assumed fixed (stable, invariant). Hence the greater the variability in O(bservation), the greater the variability in E. The more variability in E, the more unreliable the test.
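A small simulation (not from the slides) can make this concrete. Under classical test theory, reliability is the ratio of true-score variance to observed-score variance, so holding the true scores fixed and inflating the error variance drives reliability down. The score distributions below are invented.

```python
# Simulate O = T + E and show that more error variance means lower reliability.
import numpy as np

rng = np.random.default_rng(0)
true_scores = rng.normal(75, 10, size=5000)   # fixed "true status" for each examinee

for error_sd in (2, 5, 10):
    errors = rng.normal(0, error_sd, size=true_scores.size)
    observed = true_scores + errors                      # O = T + E
    reliability = true_scores.var() / observed.var()     # var(T) / var(O)
    print(f"error SD = {error_sd:>2}:  reliability ~ {reliability:.2f}")
```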

    16. Standard Error of Measurement Provides an index of the reliability of an individual’s score. The standard deviation of the theoretical distribution of errors (i.e., the E’s). The more reliable a test, the smaller the SEM. The SEM is smallest near the average score on a test. The SEM gives the margin of error in a test score, similar to the margin of error in a survey response. Typically we multiply the SEM by 1.96 to get a 95% confidence band around the true score. If SEM = 2.5, then 1.96 x 2.5 = 4.9. Given some score, X, for an individual, we would state, with a 95% level of confidence, that the individual’s true score lies somewhere between X - 4.9 and X + 4.9. If X = 79, then 74.1 <= T <= 83.9. Generally, the larger the reliability of a test, the smaller the SEM.
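The slide's arithmetic can be reproduced directly, as in the sketch below. The band around X = 79 uses only the values given on the slide (SEM = 2.5, z = 1.96). The helper that derives the SEM from a test's standard deviation and reliability uses the standard classical-test-theory relation SEM = SD * sqrt(1 - reliability), which the slide itself does not state, and the SD and reliability values are invented.

```python
# Reproduce the slide's 95% confidence band, plus the usual SEM formula.
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement from the test SD and reliability coefficient."""
    return sd * math.sqrt(1 - reliability)

def confidence_band(score: float, sem_value: float, z: float = 1.96) -> tuple[float, float]:
    """Confidence band around an observed score (z = 1.96 gives ~95%)."""
    margin = z * sem_value
    return score - margin, score + margin

low, high = confidence_band(score=79, sem_value=2.5)
print(f"{low:.1f} <= T <= {high:.1f}")        # 74.1 <= T <= 83.9, matching the slide

# Invented values chosen so the SEM comes out to 2.5: SD = 10, reliability = .9375.
print(f"SEM: {sem(10, 0.9375):.2f}")
```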

    17. Things to Do to Improve Reliability Use more items or tasks. Use items or tasks that differentiate among students. Use items or tasks that measure within a single content domain. Keep scoring objective. Eliminate (or reduce) extraneous influences. Use shorter assessments more frequently.

    18. End
