The REF (Research Excellence Framework) is the five-yearly exercise to assess the quality of UK university research; the results are crucial for both funding and prestige. In 2014, I served on the sub-panel that assessed computing submissions. Since the publication of the results, I have been using public-domain data from the REF process to validate the outcomes using citation data.
The results have been alarming, suggesting that, despite the panel’s best efforts to be fair, there was in fact significant bias, both in terms of areas of computer science and types of university. Furthermore, the first of these is also likely to have led to unintentional emergent gender bias.
I’ve presented results of this at a bibliometrics workshop at WebSci 2015 and at a panel at the British HCI conference a couple of weeks ago. However, I am aware that the full data and spreadsheets can be hard to read, so in a couple of posts I’ll try to bring out the main issues. A report and mini-site describe the methods used in detail, so in these posts I will concentrate on the results and their implications, starting in this post by setting the scene: how REF ranked sub-areas of computing, and how citations can be used to validate the process. The next post will look at how UK computing sits amongst world research, and whether this agrees with the REF assessment.
Few in UK computing departments will not have seen the ranking list produced as part of the final report of the computing REF sub-panel.
Here topic areas are ranked by the percentage of 4* outputs (the highest rating). Top of the list is Cryptography, with over 45% of its outputs rated 4*. The upper part of the list is dominated by theoretical areas of computing, with 30–40% of outputs at 4*, whilst the more applied and human-centred areas sit towards the bottom with less than 20%. Human-centred computing and collaborative computing, the areas where most HCI papers would be placed, are pretty much at the bottom of the list, with 10% and 8.8% of outputs rated 4* respectively.
Even before this list was formally published, I had a phone call from someone at an institution where knowledge of it had evidently leaked. Their department was interviewing for a lectureship, and the question being asked was whether they should be recruiting candidates from HCI, as, looking towards REF 2020, this would clearly not be good.
Since then I have heard of numerous institutions that are questioning the value of supporting these more applied areas because of their apparently poor showing under REF.
In fact, even taken at face value, the data say nothing at all about the value of these areas in particular departments, and the sub-panel report itself includes the warning “These data should be treated with circumspection”.
There are three possible explanations, any or all of which could give rise to the data:
- the best applied work is weak — including HCI :-/
- long tail — weak researchers choose applied areas
- latent bias — despite panel’s efforts to be fair
I realised that citation data could help disentangle these.
There has been understandable resistance to the use of metrics as part of research assessment. However, that concern is about their use to assess individuals or small groups. There is general agreement that citation-based metrics are a good measure of research quality en masse; indeed, I believe HEFCE are using citations to verify between-panel differences in 4* allocations, and in Morris Sloman’s post-REF analysis slides (where the table above first appeared), he also uses the overall correlation between citations and REF scores as a positive validation of the process.
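To give a flavour of this kind of check, here is a minimal sketch (not Morris’ actual analysis) of a rank correlation between REF outcomes and citations. The file name ref_citations.csv and the columns pct_4star and median_citations are made up for illustration.

```python
# Minimal sketch only, not the analysis referred to above.
# Assumes a hypothetical file "ref_citations.csv" with one row per
# institution and made-up columns:
#   pct_4star        - percentage of that institution's outputs rated 4*
#   median_citations - median Scopus citation count of its outputs
import pandas as pd
from scipy.stats import spearmanr

df = pd.read_csv("ref_citations.csv")

# Rank correlation is used because citation counts are heavily skewed,
# so only the ordering of institutions is really comparable.
rho, p_value = spearmanr(df["pct_4star"], df["median_citations"])
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3g}")
```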
The public-domain REF data does not include the actual scores given to each output, but it does include citation data provided by Scopus in 2013. In addition, for Morris’ analysis in late 2014, Richard Mortier (then at Nottingham, now at Cambridge) collected Google Scholar citations for all REF outputs.
Together, these allow detailed citation-based analysis to verify (or otherwise) the validity of the REF outputs for computer science.
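As an illustration of what such an analysis can look like, the sketch below groups outputs by topic area and compares their median Scopus and Google Scholar citation counts, which can then be set against each area’s 4* percentage. Again this is illustrative only: the file outputs.csv and the columns topic_area, scopus_citations and gs_citations are assumed names, not the actual dataset.

```python
# Illustrative sketch only, not the actual REF analysis.
# Assumes a hypothetical per-output file "outputs.csv" with made-up
# columns: topic_area, scopus_citations, gs_citations.
import pandas as pd

outputs = pd.read_csv("outputs.csv")

# Summarise each topic area by its median citation counts; medians
# stop a handful of very highly cited papers dominating the picture.
by_area = (
    outputs.groupby("topic_area")[["scopus_citations", "gs_citations"]]
    .median()
    .sort_values("gs_citations", ascending=False)
)

# An area that sits high on citations but low on its REF 4* percentage
# is a candidate for the latent bias discussed in these posts.
print(by_area)
```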
I’ll go into the details in the following posts, but suffice it to say the results were alarming: they show that, whatever other effects may have played a part, and despite the very best efforts of all involved, very large latent bias clearly emerged during the process.