This research explores the proposed metrics-based allocation system for research assessment and its potential to reduce the burden of peer review. The current outline metrics model is found to be inadequate and unbalanced, but a basket of metrics might provide a workable solution. However, the complexity of research necessitates a comprehensive assessment system and caution in using income as a measure of quality. Assured data sourcing, discipline mapping, and agreed benchmarks are crucial for effective implementation. The goal is to lighten the burden of peer review while ensuring research quality.
Research Metrics: What was proposed … what might work. Jonathan Adams
Overview • RAE was seen as burdensome and distorting • Treasury proposed a metrics-based QR allocation system • The outline metric model is inadequate, unbalanced and provides no quality assurance • A basket of metrics might nonetheless provide a workable way of reducing the peer review load • Research is a complex process so no assessment system sufficient to purpose is going to be completely “light touch”
The background • RAE introduced in 1986 • ABRC and UGC consensus to increase selectivity • Format settled by 1992 • Progressive improvement in UK impact • Dynamic change and improvement at all levels
The RAE period is linked to an increase in UK share of world citations
UK performance gain is seen across all RAE grades (Data are core sciences, grade at RAE96)
Treasury proposals • RAE peer review produced a grade • Weighting factor in QR allocation model • Quality assurance • But there were doubters • Community said the RAE was onerous • Peer review was opaque • Funding appeared [too] widely distributed • Treasury wanted transparent simplification of the allocation side
The ‘next steps’ model • Noted correlation between QR and earned income (RC or total) • Evidence drew attention to statistical link in work on dual support for HEFCE and UUK in 2001 & 2002 • Treasury hard-wired the model as an allocation system • So RC income determines QR • But … • Statistical correlation is not a sufficient argument • Income is not a measure of quality and should not be used as a driver for evaluation and reward
QR and RC income scale together, but the residual variance would have an impact. HEPI produced additional analyses in its report.
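To make the residual-variance point concrete, here is a minimal sketch with entirely invented income and QR figures (not data from the HEPI analyses): it fits the statistical link between RC income and QR and then asks how far each allocation would move if the fitted line alone determined QR.

```python
import numpy as np

# Hypothetical illustration only: invented (RC income, QR) pairs in £m for ten institutions.
rc_income = np.array([12.0, 25.0, 8.0, 40.0, 5.0, 60.0, 18.0, 33.0, 3.0, 50.0])
qr_actual = np.array([10.0, 22.0, 9.5, 35.0, 6.0, 48.0, 20.0, 26.0, 2.0, 45.0])

# Fit QR = a * RC + b, the kind of statistical link noted in the dual-support work.
a, b = np.polyfit(rc_income, qr_actual, deg=1)
qr_formula = a * rc_income + b

# Residuals: how far each institution's actual QR sits from the income-driven line,
# i.e. the gain or loss it would see if the formula replaced the current allocation.
relative_shift = (qr_formula - qr_actual) / qr_actual

print(f"slope {a:.2f}, intercept {b:.2f}")
for rc, qr, shift in zip(rc_income, qr_actual, relative_shift):
    print(f"RC £{rc:5.1f}m  QR £{qr:5.1f}m  formula shift {shift:+.0%}")
```

The per-institution shifts in the last column are the residual variance at work: however strong the overall correlation, a formula driven by income alone moves each allocation by the full size of its residual.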
Unmodified outcomes of the outline metrics model perturb the current system unduly. A new model might produce reasonable change, but few would accept that current QR allocations are as erroneous as these outcomes suggest.
The problem • The Treasury model over-simplifies • Outcomes are unpredictable • There are confounding factors such as subject mix • Even within subjects there are complex cost patterns • The outcome does not inspire confidence and would affect morale • There are no checks and balances • Risk of perverse outcomes, drift from original model • Drivers might affect innovation, emerging fields, new staff • There is no quality assurance
What are we trying to achieve? We want to lighten the peer review burden, so we need ‘indicators’ to evaluate ‘research performance’, but not simplistic mono-metrics. [Diagram: research as a black box. What we want to know is research quality; what we have to use are inputs (funding, numbers over time) and outputs (publications over time).]
Informed assessment comes from an integrated picture of research, not single metrics
Data options for metrics and indicators • Primary data from a research phase • Input, activity, output, impact • Secondary data from combinations of these • e.g. money or papers per FTE • Three attributes for every datum • Time, place, discipline • This limits possible sources of valid data • Build up a picture • Weighted use of multiple indicators • Balance adjusted for subject • Balance adjusted for policy purpose
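A minimal sketch of the "build up a picture" idea, assuming illustrative indicator names, subject benchmarks and policy weights (none of them proposed values): each metric is read against its subject benchmark and then combined under weights chosen for the policy purpose.

```python
# Hypothetical sketch of a subject-adjusted indicator basket; all names,
# weights and benchmark values are illustrative assumptions, not proposals.

# Each raw datum carries the three attributes noted above: time, place, discipline.
unit_metrics = {"income_per_fte": 180.0, "papers_per_fte": 2.4, "rebased_impact": 1.10}

# Subject benchmarks: what a typical unit in this discipline looks like.
subject_benchmark = {"income_per_fte": 150.0, "papers_per_fte": 2.0, "rebased_impact": 1.00}

# Policy weights: the balance can be adjusted by subject or by policy purpose.
weights = {"income_per_fte": 0.3, "papers_per_fte": 0.3, "rebased_impact": 0.4}

def basket_score(metrics: dict, benchmark: dict, weights: dict) -> float:
    """Weighted sum of each indicator expressed relative to its subject benchmark."""
    return sum(weights[k] * (metrics[k] / benchmark[k]) for k in weights)

print(f"composite indicator: {basket_score(unit_metrics, subject_benchmark, weights):.2f}")
```

The structure, not the numbers, is the point: no single metric is read on its own, and the balance can be retuned by subject or by policy purpose without changing the underlying data.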
We need assured data sourcing • Where the data comes from • Indicator data must emerge naturally from the process being evaluated • Artificial PIs are just that, artificial • Who collects and collates the data • This affects accessibility, quality and timeliness • HESA • Data quality and validation • Discipline structure • Game playing
We have to agree how to account for the distribution of data values, e.g. income. [Chart: distribution of income values between minimum and maximum.]
Distribution of data values - impact. The variables for which we have metrics are skewed and therefore difficult to picture in a simple way.
Agree purpose for data usage • Data are only indicators • So we need some acceptable reference system • Skewed profiles are difficult to interpret • We need simple, transparent descriptions • Benchmarks • Make comparisons • Track changes • Use metrics to monitor performance • Set baseline against RAE2008 outcomes • Check thresholds to trigger fuller reassessment
Example - categorising impact data. This grouping is the equivalent of a log2 transformation. There is no place for zero values on a log scale.
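A rough sketch of that kind of grouping, with band labels of my own invention: rebased impact values fall into bins whose boundaries double each time (a log2-style categorisation), while uncited papers are held in a separate category because zero has no place on a log scale.

```python
import math

def impact_category(rbi: float) -> str:
    """Assign a rebased-impact value to a log2-style band; zeros get their own category."""
    if rbi == 0:
        return "uncited"               # no place for zero on a log scale
    band = math.floor(math.log2(rbi))  # doubling boundaries: ... 0.25-0.5, 0.5-1, 1-2, 2-4 ...
    lower, upper = 2.0 ** band, 2.0 ** (band + 1)
    return f"{lower:g} to {upper:g} x world average"

# Illustrative rebased-impact values (citations relative to the world average for the field).
for rbi in [0.0, 0.3, 0.9, 1.24, 2.5, 7.0]:
    print(f"RBI {rbi:>5}: {impact_category(rbi)}")
```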
UK ten-year profile – 680,000 papers. [Chart: citation impact profile marking the mode, the mode of cited papers, the median, the average (RBI = 1.24) and a possible threshold of excellence.]
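A small sketch of why the mode, median and average sit so far apart on a profile like this, using synthetic right-skewed data as a stand-in for the real citation counts (the lognormal shape and the share of uncited papers are assumptions, not the UK figures):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for rebased impact (RBI): a block of uncited papers plus a
# right-skewed lognormal tail; the real profile is built from ~680,000 UK papers.
rbi = np.concatenate([np.zeros(200), rng.lognormal(mean=-0.3, sigma=0.9, size=800)])

print(f"share uncited:  {np.mean(rbi == 0):.0%}")   # a discrete spike at zero
print(f"median impact:  {np.median(rbi):.2f}")
print(f"average impact: {rbi.mean():.2f}")          # pulled above the median by the long tail
```

The average sits well above the median because a small number of highly cited papers drag it up, which is why a single summary number gives a poor picture of a skewed profile.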
HEIs – 10 year totals – 4.1. Smoothing the lines would reveal the shape of the profile.
HEIs – 10 year totals – 4.2. Absolute volume would add a further element for comparisons.
Conclusions • We can reduce the peer review burden by increased use of metrics • But the transition won’t be simple • Research is a complex, expert system • Assessment needs to produce • Confidence among the assessed • Quality assurance among users • Transparent outcome for funding bodies • Light touch is possible, but not featherweight • Initiate a metrics basket linked to RAE2008 peer review • Set benchmarks & thresholds, then track the basket • Invoke panel reviews to evaluate change, but only where variance exceeds band markers across multiple metrics
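To illustrate the final bullet, here is a hedged sketch of the trigger rule; the metric names, baselines, band width and the "at least two metrics" threshold are all assumptions rather than proposed values.

```python
# Hypothetical trigger rule: all baselines, band widths and current values are invented.
baseline = {"income_per_fte": 150.0, "papers_per_fte": 2.0, "rebased_impact": 1.00}
current  = {"income_per_fte": 120.0, "papers_per_fte": 1.4, "rebased_impact": 0.70}
band = 0.25          # allow +/-25% drift around the RAE2008-linked baseline
min_breaches = 2     # invoke peer review only when more than one metric breaches its band

breached = [name for name, base in baseline.items()
            if abs(current[name] - base) / base > band]

if len(breached) >= min_breaches:
    print(f"Trigger panel review: {', '.join(breached)} outside the agreed band")
else:
    print("Within band markers: continue tracking the metrics basket")
```

Drift in a single metric keeps the unit in the tracked basket; only variance across several metrics invokes the fuller panel review, which is what keeps the touch light without being featherweight.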
Overview (reprise) • RAE was seen as burdensome and distorting • Treasury proposed a metrics-based QR allocation system • The outline model is inadequate, unbalanced and provides no quality assurance • A basket of metrics might nonetheless provide a workable way of reducing the peer review load • But research is a complex process so no assessment system sufficient to purpose is going to be completely “light touch”