Metrics, research award grades, and the REF
Harvey Goldstein, University of Bristol
With support from Mary Day, Ian Diamond and Phil Sooben
The context
• REF proposal to use metrics:
  • Journal impact factors and citations
  • Research income
  • Research students
  • Research council grant application grades
• Little discussion so far of the technical measurement issues associated with Research Council awards
The database
• All ESRC applications, 2001-2007
• Details of applicants, and reviewer, assessor and board grades
• Identification of departments and HEIs
• Award amounts (not considered)
• Final analysis of 2698 applications and 1698 departments
A naïve analysis
Consider the discipline of Education.
• We have not been able to assign departments to RAE disciplines, so the application's ‘principal discipline’ is used.
• Results are similar for other disciplines.
• The final award grade is converted to a numeric score.
• All award types are considered; results are similar if fellowships are excluded.
• The same award score is given to each applicant on an application, with the PI weighted more than co-applicants.
• Weighted analysis of these scores in a 3-level model: application within applicant within HEI.
Results of 3-level model • HEI/DEPT.; APPLICANT; APPLICATION
Problems
• The analysis is invalid because the scores are not independent.
• Imagine N applications, each with a different pair of applicants drawn from two particular HEIs, A and B, where each applicant is given the application’s awarded score. A simple analysis would compare the mean score for HEI A with the mean score for HEI B, but these means are equal by definition. The analysis therefore contains no information about HEI differences, unlike the case where a score is derived separately for each applicant in a pair.
• Applicants may also come from departments not associated with the principal discipline.
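The paired-applicant argument can be checked with a few lines of Python. This is only an illustrative sketch: the scores are random integers invented for the example, and `hei_means` is a hypothetical helper, not part of the ESRC analysis.

```python
import random

def hei_means(n_applications, seed=0):
    """Give each applicant the application's score, then average by HEI."""
    rng = random.Random(seed)
    scores = {"A": [], "B": []}
    for _ in range(n_applications):
        award_score = rng.randint(1, 10)  # invented numeric award score
        # One applicant from HEI A and one from HEI B share the same score.
        scores["A"].append(award_score)
        scores["B"].append(award_score)
    return {hei: sum(values) / len(values) for hei, values in scores.items()}

means = hei_means(50)
print(means)
print(means["A"] == means["B"])  # equal by construction, whatever the scores
```

Whatever scores are drawn, the two HEI means coincide, so the comparison carries no information about HEI differences.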
A more valid analysis
• We reconceptualise the data as follows:
• We assume each applicant contributes a level of ‘quality’ to the application.
• The application score is just the average of these (weighted according to whether PI or co-applicant).
• Some applicants are on more than one application, associated with different combinations of other applicants, and this allows us, in principle, to assign (estimate) a score for each applicant.
• This is known as a multiple membership (MM) model. Formally:
  $y_i = \sum_{j} w_{ij} u_j, \qquad \sum_{j} w_{ij} = 1$
  where i indexes application, j indexes applicant, $y_i$ is the application score, $u_j$ is the quality contributed by applicant j, and $w_{ij}$ is the weight of applicant j on application i (zero if j is not on application i).
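A minimal sketch of the weighted average behind the multiple membership idea follows. The names, the quality values and the `pi_weight` ratio of 2 are all assumptions made for illustration; the slides say only that the PI is weighted more than co-applicants.

```python
def membership_weights(applicants, pi, pi_weight=2.0):
    """Return weights summing to 1, with the PI weighted more than co-applicants.

    pi_weight is a hypothetical ratio of PI weight to co-applicant weight.
    """
    raw = {a: (pi_weight if a == pi else 1.0) for a in applicants}
    total = sum(raw.values())
    return {a: w / total for a, w in raw.items()}

def application_score(weights, quality):
    """Weighted average of the applicants' latent quality values."""
    return sum(w * quality[a] for a, w in weights.items())

# Invented latent qualities; "bob" is a member of both applications.
quality = {"ann": 8.0, "bob": 5.0, "cat": 6.5}
w1 = membership_weights(["ann", "bob"], pi="ann")
w2 = membership_weights(["bob", "cat"], pi="cat")
score1 = application_score(w1, quality)
score2 = application_score(w2, quality)
print(w1, score1)
print(w2, score2)
```

In the real problem only the application scores are observed and the qualities have to be estimated, which is where applicants who appear on several applications become informative.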
Another serious problem
• For Education there are 454 applications and 989 applicants, and in general there are more applicants than applications.
• This means that we cannot use the MM model to score applicants: the applicant scores are not identifiable.
• However, there are only 98 HEIs, so we can fit a model that identifies the HEI only. This aggregates all applicants from one HEI within an application, which will lead to some overestimation of the separation of HEIs.
• This provides HEI/department scores.
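The counting argument can be illustrated with a toy weight matrix. The sketch below treats the qualities as fixed unknowns, which is a simplification of the fitted model, and uses an invented set of three applications, five applicants and two HEIs. The helpers `rank` and `design_matrix` are hypothetical names written for this example.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix by Gaussian elimination in exact arithmetic."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    found = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found

def design_matrix(applications, unit_of, units):
    """One row per application; applicant weights are summed within each unit."""
    return [
        [sum((w for a, w in app.items() if unit_of[a] == u), Fraction(0)) for u in units]
        for app in applications
    ]

half = Fraction(1, 2)
# Invented data: applicant -> weight for each of three applications.
applications = [{"p": half, "q": half}, {"r": half, "s": half}, {"t": Fraction(1)}]
applicants = ["p", "q", "r", "s", "t"]
hei_of = {"p": "A", "q": "A", "r": "B", "s": "A", "t": "B"}

by_applicant = design_matrix(applications, {a: a for a in applicants}, applicants)
by_hei = design_matrix(applications, hei_of, ["A", "B"])

print(rank(by_applicant), "of", len(applicants), "applicant scores determined")
print(rank(by_hei), "of", 2, "HEI scores determined")
```

With five unknown applicant scores and only three applications the rank cannot reach five, whereas after aggregating to two HEIs the matrix has full column rank.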
Results
Note that the HEI variance is now about half what we saw before.
Caterpillar plot
Note how all confidence intervals overlap zero, so no separation from the overall mean is possible.
Also, of the four highest in the ‘naïve’ analysis, only one is among the four highest here.
Similar result if fellowships are excluded.
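The reading of a caterpillar plot reduces to a simple interval check, sketched below. The residuals and standard errors are invented, and the 1.96 multiplier assumes approximately normal 95% intervals; neither comes from the ESRC data.

```python
def interval(residual, se, z=1.96):
    """Approximate 95% confidence interval for an HEI residual."""
    return residual - z * se, residual + z * se

def overlaps_zero(residual, se, z=1.96):
    """True when the interval includes the overall mean (zero)."""
    low, high = interval(residual, se, z)
    return low <= 0.0 <= high

# Invented (name, residual, standard error) triples.
heis = [("HEI 1", 0.40, 0.35), ("HEI 2", -0.25, 0.30), ("HEI 3", 0.10, 0.28)]

ranked = sorted(heis, key=lambda item: item[1])  # caterpillar order
separable = [name for name, res, se in ranked if not overlaps_zero(res, se)]

for name, res, se in ranked:
    low, high = interval(res, se)
    print(f"{name}: {res:+.2f} [{low:+.2f}, {high:+.2f}]")
print("Separable from the overall mean:", separable)
```

When every interval includes zero, as here, the ranking of the residuals says nothing reliable about which units are above or below average.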
It’s even more complicated
• So far all applicants on an application have been assigned to the principal discipline.
• We need to assign them to their actual discipline/department, which implies a joint analysis of all applications.
• Again, there are 2698 applications but only 1698 departments.
• So we have an MM model and we estimate scores for each department.
Results
The between-department variance is now larger (19%).
Only 0.5% of departments have CIs overlapping the mean.
Including the principal discipline in the model indicates (moderate) discipline differences in award grading (see below).
One hundred lowest and highest ranked residuals for multiple membership model using all departments, with 95% confidence intervals.
MM model with selected principal disciplines (>100 applications)

Parameter                            Estimate   Standard error
Intercept (Econ)                       7.17        0.11
Management                            -0.69        0.17
Social Policy                         -0.54        0.20
Education                             -0.24        0.11
Sociology                              0.06        0.15
Human Geog                             0.10        0.18
Psychology                             0.16        0.13
Level 2 variance (HEI/department)      0.45        0.11
Level 1 variance (Application)         2.37        0.09
VPC                                   16.0%
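The variance partition coefficient (VPC) in the table can be reproduced from the two variance estimates. The sketch below also computes rough estimate-to-standard-error ratios for the discipline contrasts with Economics; the 1.96 cut-off is an assumption used for illustration only, not a test reported in the slides.

```python
level2_var = 0.45  # HEI/department variance from the table
level1_var = 2.37  # application variance from the table

# VPC: share of total variance lying between HEIs/departments.
vpc = level2_var / (level2_var + level1_var)
print(f"VPC = {vpc:.1%}")  # VPC = 16.0%

# Discipline contrasts with Economics: (estimate, standard error) from the table.
contrasts = {
    "Management": (-0.69, 0.17),
    "Social Policy": (-0.54, 0.20),
    "Education": (-0.24, 0.11),
    "Sociology": (0.06, 0.15),
    "Human Geog": (0.10, 0.18),
    "Psychology": (0.16, 0.13),
}
ratios = {name: est / se for name, (est, se) in contrasts.items()}
flagged = [name for name, ratio in ratios.items() if abs(ratio) > 1.96]

for name, ratio in ratios.items():
    print(f"{name}: estimate/SE = {ratio:.2f}")
print("Exceed 1.96 in absolute value:", flagged)
```

On this rough criterion, Management, Social Policy and Education are graded lower than Economics, while the other three disciplines are not clearly different from it.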
Using the results
• Given the uncertainty, how useful are they?
• Can they be combined (formally) with citations to provide greater precision?
• The technical limitations of the analyses are likely to apply to citation analyses also.
  • E.g. analysis of the NAS 2001 database shows 2,600 papers with 13,000 unique authors (Borner et al., 2004)
• What are the side effects? Perverse incentives.
Perverse incentives
• All high-stakes performance monitoring systems encourage ‘gaming’. Some possibilities:
• Large numbers of co-applicants squeezed into applications
• Discouragement of cross-disciplinary applications
• HEI behaviour would change over time, with a destabilising and distorting effect
• Encouragement of many small, short-term grants rather than fewer large, long-term ones
• Distortion of the behaviour of referees and board members (how?)
Comparisons with RAE 2008 scores
• Results for Economics and Education:
• Simple (4,3,2,1,0) RAE scoring system
• Results are insensitive to other scorings
• Dept. results (residuals) from the ESRC analysis are (weighted) averaged to RAE HEI categories.
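One plausible reading of the simple (4,3,2,1,0) scoring is a weighted average over an RAE 2008 quality profile, that is, the percentages of research rated 4*, 3*, 2*, 1* and unclassified. The sketch below assumes that reading; the profile shown is invented and `rae_score` is a hypothetical helper.

```python
def rae_score(profile, points=(4, 3, 2, 1, 0)):
    """Average grade for a quality profile.

    profile: percentages at 4*, 3*, 2*, 1* and unclassified, summing to 100.
    points: score attached to each grade; other scorings can be passed in.
    """
    if len(profile) != len(points):
        raise ValueError("profile and points must have the same length")
    if abs(sum(profile) - 100) > 1e-9:
        raise ValueError("profile percentages must sum to 100")
    return sum(p * share for p, share in zip(points, profile)) / 100

profile = (20, 40, 30, 10, 0)  # invented quality profile
score = rae_score(profile)
print(score)
```

Passing a different `points` tuple gives an alternative scoring, which is how sensitivity to the scoring system could be examined.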
Correlations between RAE and ESRC scores – selected disciplines
Economics
• 27 HEIs. Correlation = 0.50 (P<0.01)
• The highest 7 RAE scores (from the top) are: LSE, UCL, Warwick, Oxford, Essex, Nottingham, Bristol
Education
• 37 HEIs. Correlation = 0.30 (P=0.07)
• The top 7 are: IOE = Oxford, Cambridge = Kings, Bristol = Leeds, Exeter
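The quoted P-values can be checked against the correlations and numbers of HEIs, assuming they come from the usual two-sided t-test for a Pearson correlation (the slides do not say which test was used). The sketch uses only the standard library and integrates the t density numerically with Simpson's rule.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """P(|T| > |t|) using Simpson's rule on [0, |t|]; steps must be even."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 1.0 - 2.0 * (total * h / 3.0)

def correlation_test(r, n):
    """t statistic and two-sided P-value for a correlation r from n pairs."""
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r * r))
    return t, two_sided_p(t, df)

t_econ, p_econ = correlation_test(0.50, 27)  # Economics
t_educ, p_educ = correlation_test(0.30, 37)  # Education
print(f"Economics: t = {t_econ:.2f}, P = {p_econ:.4f}")
print(f"Education: t = {t_educ:.2f}, P = {p_educ:.4f}")
```

Under that assumption the Economics correlation falls below the 0.01 level and the Education correlation lies between 0.05 and 0.10, in line with the values quoted on the slides.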
What next?
• Incorporation of other research councils in a combined analysis
• Include citation data in a combined model:
  • In the REF it can be argued that an analysis at least as complex as the present one is unavoidable for validity
  • Using citations encounters the same issue of more authors than papers/books.