Archived File. The file below has been archived for historical reference purposes only. The content and links are no longer maintained and may be outdated. See the OER Public Archive Home Page for more details about archived files.
Providing Information on the Dispersion of Reviewer Ratings
A Consideration of Some Alternatives: Standard Deviations, Ranges, Graphical Representations
Presentation to the Peer Review Advisory Committee, 1/23/06
Summary of Working Group’s Activity
• The NIH Office of Extramural Research formed a working group to:
  • Consider whether there might be more information contained in reviewers’ scores, not captured by priority and percentile scores alone, that might be useful to either applicants or NIH staff
  • Consider how such information might be presented
• The working group examined:
  • Distributions of individual reviewer scores given to a sample of applications reviewed by CSR study sections
  • Alternative methods for characterizing those distributions (various measures of dispersion)
  • Alternative methods for presenting this information (numerically, graphically)
Conclusions
• When applied across a broad range of applications and study sections, no single numerical measure of dispersion adequately represents the score distributions.
• A straightforward way of communicating information about the distribution of scores would be to display the scores graphically.
• The working group uncovered additional issues that will need to be addressed.
Examples
Numerical measures of dispersion:
• Can fail to capture differences in the underlying distributions
• May not work well with small study sections
• May not be comparable across applications reviewed by different study sections (e.g., study sections differing in size)
• Can suggest differences that aren’t otherwise apparent
• May not be grasped intuitively
A. Fail to capture differences in the underlying distributions — Example 1
[Bar chart: % of Scores vs. Reviewer Score (130–210); two distributions with s.d. = 12 and s.d. = 14]
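The point above can be sketched with hypothetical score sets (illustrative only, not the working group's data): a panel that agrees on a middle score and a panel split into two camps can produce nearly identical standard deviations.

```python
from statistics import pstdev

# Hypothetical reviewer scores (illustrative, not actual study-section data):
# one unimodal panel and one split (bimodal) panel.
unimodal = [150, 160, 160, 170, 170, 170, 180, 180, 190]
bimodal = [158, 158, 158, 182, 182, 182]

sd_unimodal = pstdev(unimodal)  # ~11.5
sd_bimodal = pstdev(bimodal)    # 12.0

# The standard deviations are almost identical, yet the panels tell
# very different stories: consensus vs. a two-camp split.
print(round(sd_unimodal, 1), round(sd_bimodal, 1))
```

A graphical display would make the bimodal split obvious at a glance, while the s.d. alone hides it.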
B. May not work well with small study sections — Example with 5 reviewers
[Bar chart: % of Scores vs. Reviewer Score (140–200); two score sets, each with IQR = 0]
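A minimal sketch of the small-panel problem, using a hypothetical 5-reviewer score set: with so few raters, the interquartile range can collapse to zero even when reviewers plainly disagree.

```python
from statistics import quantiles

# Hypothetical 5-reviewer panel (illustrative, not the working group's data):
scores = [150, 170, 170, 170, 200]

# With the "inclusive" quartile method, Q1 and Q3 both land on 170,
# so the IQR is zero despite a 50-point spread in the scores.
q1, _, q3 = quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1                        # 0
spread = max(scores) - min(scores)   # 50
print(iqr, spread)
```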
C. May not be comparable across study sections — Example with small and large study sections
[Bar chart: % of Scores vs. Reviewer Score (150–240); s.d. = 26 with n = 5 vs. s.d. = 26 with n = 26]
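One way to see the comparability issue, again with hypothetical panels: two study sections can report nearly the same standard deviation, but the mean from the 5-reviewer panel is far less stable than the mean from the 26-reviewer panel, as the standard errors show.

```python
from math import sqrt
from statistics import pstdev

# Hypothetical panels (illustrative): a 5-reviewer and a 26-reviewer
# study section with nearly identical standard deviations.
small = [150, 170, 190, 210, 230]   # n = 5
large = small * 5 + [190]           # n = 26, same spread of scores

sd_small, sd_large = pstdev(small), pstdev(large)   # ~28.3 vs ~27.7

# The s.d.s are close, but the small panel's standard error is more
# than twice as large, so its mean score is much noisier.
se_small = sd_small / sqrt(len(small))   # ~12.6
se_large = sd_large / sqrt(len(large))   # ~5.4
print(round(se_small, 1), round(se_large, 1))
```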
D. Can suggest differences that aren’t otherwise apparent — Example of similar(?) distributions with different standard deviations
[Bar chart: % of Scores vs. Reviewer Score (130–210); s.d. = 14 vs. s.d. = 9]
E. May be hard to grasp intuitively — Standard deviations differ by a factor of 2
[Bar chart: % of Scores vs. Reviewer Score (100–240); s.d. = 11 vs. s.d. = 22]
Additional Issues Raised
• Are these measures of dispersion (statistical or graphical) informative in the absence of more information on the reasons for differences in viewpoints among reviewers?
• With graphical representations, would reviewers (particularly in small study sections) have concerns about maintaining anonymity/confidentiality?
• Will knowledge that the distribution of individual scores is to be shared, either with staff or the applicant, have any detrimental effects on reviewers’ scoring behavior?
Next Steps
• Assess benefits and costs of dispersion reporting
  • Clarify need, usage, and potential value to PIs and staff
  • Fully explore unintended consequences
  • Develop cost estimates
• Determine whether the benefit is sufficient relative to the cost to proceed
• Consider a pilot approach with a volunteer panel or IC
  • Requires changes to information systems
• Consider a “virtual” pilot
  • Present simulated/archival data to staff and PIs via survey or focus group
  • An easy next step, but it does not address feasibility