Development of a Confidence Interval for Small Sample Expert Review of Item Content Validation

Jeffrey M. Miller & Randall D. Penfield, University of Florida (millerjm@ufl.edu & penfield@coe.ufl.edu). FERA, November 19, 2003.

Presentation Transcript


  1. Development of a Confidence Interval for Small Sample Expert Review of Item Content Validation Jeffrey M. Miller & Randall D. Penfield FERA November 19, 2003 University of Florida millerjm@ufl.edu & penfield@coe.ufl.edu

  2. INTRODUCING CONTENT VALIDITY • “Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests” (AERA/APA/NCME, 1999). • Content validity refers to the degree to which the content of the items reflects the content domain of interest (APA, 1954).

  3. THE NEED FOR IMPROVED REPORTING • Content is a precursor to drawing a score-based inference. It is evidence-in-waiting (Shepard, 1993; Yalow & Popham, 1983). • “Unfortunately, in many technical manuals, content representation is dealt with in a paragraph, indicating that selected panels of subject matter experts (SMEs) reviewed the test content, or mapped the items to the content standards – and all is well” (Crocker, 2003).

  4. QUANTIFYING CONTENT VALIDITY • Several indices for quantifying expert agreement have been proposed. • For many of these indices, experts rate the match of the item to an objective on a rating scale. • The mean rating across raters is often used in the calculations: • Klein & Kosecoff’s Correlation (1975) • Aiken’s V (1985) • The mean, by itself, does not account for sampling error and does not tell us how far it may lie from the population mean. WE NEED A CONFIDENCE INTERVAL!

  5. THE CONFIDENCE INTERVAL • The traditional confidence interval assumes a normal distribution for the sample mean of a rating scale. • However, the assumption of population normality cannot be justified when analyzing the mean of an individual scale item because • 1.) the outcomes of the items are discrete; • 2.) the items are bounded by the limits of the Likert scale; • 3.) the number of raters is too small, even if the above were not problematic.
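To illustrate point 2 above, here is a minimal Python sketch that builds a normal-theory interval for a small panel. The ratings are hypothetical (they do not appear in the presentation), and the 0 to 4 scale is assumed to match the one used in the later example. Under these assumptions the upper limit lands above the highest possible rating.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical ratings from 10 experts on a 0-4 scale (not from the presentation).
ratings = [4, 4, 4, 4, 4, 4, 4, 4, 4, 2]
scale_max = 4
z = 1.96  # standard normal value for 95% confidence

m = mean(ratings)
se = stdev(ratings) / sqrt(len(ratings))  # standard error of the mean
lower, upper = m - z * se, m + z * se

print(f"mean = {m:.2f}, normal-theory CI = ({lower:.2f}, {upper:.2f})")
print("upper limit exceeds the scale maximum:", upper > scale_max)
```

An interval that extends past the top category cannot describe a mean that is bounded by the scale, which is the motivation for a bounded, asymmetric interval.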

  6. SCORE CONFIDENCE INTERVAL FOR RATING SCALES • The Score confidence interval (Penfield, 2003) treats rating scale variables as outcomes of a binomial distribution. • This interval is asymmetric • Hence, it is based on the actual distribution for the item of concern. • Further, the limits cannot extend below or above the actual limits of the categories.

  7. 1. Obtain values for n, k, and z • n = the number of raters • k = the number of possible ratings • The highest rating if the scale starts with 0 • The highest rating minus 1 if the scale starts greater than 0 • z = the standard normal variate associated with the confidence level (e.g., +/- 1.96 at 95% confidence)
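As a small sketch of step 1, the helpers below look up z for a chosen confidence level and apply the slide's rule for k. The function names `z_for_confidence` and `k_for_scale` are made up for illustration; only Python's standard library is used.

```python
from statistics import NormalDist

def z_for_confidence(level: float) -> float:
    """Two-sided standard normal critical value, e.g. 0.95 gives about 1.96."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

def k_for_scale(highest_rating: int, starts_at_zero: bool) -> int:
    """k as described on the slide: the highest rating if the scale starts
    at 0, otherwise the highest rating minus 1."""
    return highest_rating if starts_at_zero else highest_rating - 1

print(round(z_for_confidence(0.95), 2))        # 95% confidence
print(k_for_scale(4, starts_at_zero=True))     # a 0-4 scale
print(k_for_scale(5, starts_at_zero=False))    # a 1-5 scale
```

Both a 0 to 4 scale and a 1 to 5 scale give k = 4 under this rule.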

  8. 2. Calculate the mean rating, $\bar{X} = \sum_{i=1}^{n} X_i / n$: the sum of the ratings for an item divided by the number of raters

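  9. 3. Calculate p, the sum of the ratings divided by n times k: $p = \sum_{i=1}^{n} X_i / (nk)$. Or, if the scale begins with 1, subtract 1 from each rating first, so that $p = (\sum_{i=1}^{n} X_i - n) / (nk)$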

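  10. 4. Use p to calculate the upper and lower limits for the population proportion (Wilson, 1927): $p_L, p_U = \dfrac{2nkp + z^2 \mp z\sqrt{4nkp(1-p) + z^2}}{2(nk + z^2)}$, with the minus sign giving the lower limit $p_L$ and the plus sign the upper limit $p_U$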
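A minimal sketch of step 4 in Python follows. It assumes the Wilson (1927) score limits are applied with n*k as the number of binomial trials, which is inferred from the figures in the worked example (65.842, 11.042, and 87.683 for n = 10, k = 4, p = 0.775). The function name `wilson_limits` is hypothetical.

```python
from math import sqrt

def wilson_limits(p: float, n: int, k: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score limits for a proportion, treating n*k as the number of trials."""
    trials = n * k
    centre = 2 * trials * p + z ** 2
    half_width = z * sqrt(4 * trials * p * (1 - p) + z ** 2)
    denom = 2 * (trials + z ** 2)
    return (centre - half_width) / denom, (centre + half_width) / denom

# Values from the worked example: n = 10, k = 4, p = 0.775.
p_lower, p_upper = wilson_limits(p=0.775, n=10, k=4)
print(f"{p_lower:.3f} {p_upper:.3f}")  # 0.625 0.877
```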

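  11. 5. Calculate the upper and lower limits of the Score confidence interval: Lower $= \bar{X} - z\sqrt{k\,p_L(1-p_L)/n}$ and Upper $= \bar{X} + z\sqrt{k\,p_U(1-p_U)/n}$, where $\bar{X}$ is the mean rating from step 2 and $p_L$, $p_U$ are the lower and upper proportion limits from step 4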
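The five steps can be sketched end to end as below. This is an illustration under stated assumptions, not the authors' code: the scale is assumed to start at 0, the Wilson limits use n*k trials, and the step 5 formulas are the ones implied by the worked example (mean minus or plus z times the square root of k*p*(1-p)/n, evaluated at the lower and upper proportion limits). The name `score_ci` is hypothetical.

```python
from math import sqrt

def score_ci(sum_of_ratings: float, n: int, k: int, z: float = 1.96):
    """Return (mean, lower, upper) for ratings on a 0..k scale from n raters."""
    mean = sum_of_ratings / n                 # step 2
    p = sum_of_ratings / (n * k)              # step 3
    trials = n * k
    centre = 2 * trials * p + z ** 2          # step 4: Wilson limits for p
    half_width = z * sqrt(4 * trials * p * (1 - p) + z ** 2)
    denom = 2 * (trials + z ** 2)
    p_lower = (centre - half_width) / denom
    p_upper = (centre + half_width) / denom
    # Step 5: limits for the mean rating.
    lower = mean - z * sqrt(k * p_lower * (1 - p_lower) / n)
    upper = mean + z * sqrt(k * p_upper * (1 - p_upper) / n)
    return mean, lower, upper

mean, lower, upper = score_ci(sum_of_ratings=31, n=10, k=4)
print(f"{mean:.3f} {lower:.3f} {upper:.3f}")  # 3.100 2.500 3.507
```

The call uses the inputs of the worked example (sum of ratings 31, n = 10, k = 4, z = 1.96).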

  12. Shorthand Example (cont.) Let n = 10, k = 4, z = 1.96, and let the sum of the ratings = 31. So the mean equals 31/10 = 3.100, and p = 31 / (10*4) = 0.775

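  13. Shorthand Example (cont.) Lower limit = 3.100 – 1.96*sqrt(0.938/10) = 2.500; Upper limit = 3.100 + 1.96*sqrt(0.432/10) = 3.507 (0.938 and 0.432 are k*p*(1 – p) evaluated at the lower and upper proportion limits, 0.625 and 0.877)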

  14. Lower proportion limit = (65.842 – 11.042) / 87.683 = 0.625; Upper proportion limit = (65.842 + 11.042) / 87.683 = 0.877

  15. Conclusion: We are 95% confident that the population mean rating falls somewhere between 2.500 and 3.507.

  16. EXAMPLE WITH 4 ITEMS

             Rating Frequency for 10 Raters            95% Score CI
      Item    0    1    2    3    4       Mean       Lower    Upper
      1       0    0    0    4    6       3.60       3.08     3.84
      2       0    0    2    5    3       3.10       2.50     3.51
      3       2    0    2    6    0       2.20       1.59     2.77
      4       1    2    3    3    1       2.10       1.50     2.68
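The four-item table can be checked with a short Python sketch that starts from the rating frequencies. It relies on an observation that is not stated in the slides: because the Wilson limits satisfy p - p_L = z*sqrt(p_L(1 - p_L)/(nk)), the limits for the mean work out to k times the Wilson limits for p, which also shows why they cannot fall outside 0 to k. The names `FREQUENCIES` and `score_ci_from_counts` are hypothetical, and the Wilson limits are again assumed to use n*k trials.

```python
from math import sqrt

Z = 1.96  # 95% confidence

# Frequency of each rating 0..4 per item, copied from the table above.
FREQUENCIES = {
    1: [0, 0, 0, 4, 6],
    2: [0, 0, 2, 5, 3],
    3: [2, 0, 2, 6, 0],
    4: [1, 2, 3, 3, 1],
}

def score_ci_from_counts(counts, z=Z):
    """Return (mean, lower, upper) from counts of ratings 0..k."""
    k = len(counts) - 1                       # highest rating on a 0-based scale
    n = sum(counts)                           # number of raters
    total = sum(rating * count for rating, count in enumerate(counts))
    trials = n * k
    p = total / trials
    centre = 2 * trials * p + z ** 2
    half_width = z * sqrt(4 * trials * p * (1 - p) + z ** 2)
    denom = 2 * (trials + z ** 2)
    # Limits for the mean expressed as k times the Wilson limits for p.
    return total / n, k * (centre - half_width) / denom, k * (centre + half_width) / denom

results = {item: score_ci_from_counts(counts) for item, counts in FREQUENCIES.items()}
for item, (mean, lower, upper) in results.items():
    print(f"Item {item}: mean {mean:.2f}, 95% Score CI ({lower:.2f}, {upper:.2f})")
```

Under these assumptions the printed means and limits agree with the table to two decimal places.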
