REF Redux 3 – plain citations

This third post in my series on the results of REF 2014, the UK periodic research assessment exercise, continues to look at sub-area differences. Following posts will look at institutional (new vs old universities) and gender issues.

The last post looked at world rankings based on citations normalised using the REF ‘contextual data’. In this post we’ll look at plain unnormalised data. To some extent this should be ‘unfair’ to more applied areas, as citation counts tend to be lower; as one mechanical engineer put it, “applied work doesn’t gather citations, it builds bridges”. However, it is a very direct measure.

The shocking thing is that, while raw citation measures are likely to be biased against applied work, the REF results turn out to be even more so.

There were a number of factors that pushed me towards analysing REF results using bibliometrics. One was the fact that HEFCE were using bibliometrics for comparison between sub-panels; another was that Morris Sloman’s analysis of the computing sub-panel results used Scopus and Google Scholar citations.

We’ll first look at the two relevant tables in Morris’ slides, one based on Scopus citations:

[Table: Sloman, REF star ratings by Scopus citation quartile]

and one based on Google Scholar citations:

[Table: Sloman, REF star ratings by Google Scholar citation quartile]

Both tables rank all outputs based on citations, divide these into quartiles, and then look at the percentage of 1*/2*/3*/4* outputs in each quartile. For example, looking at the Scopus table, 53.3% of 4* outputs have citation counts in the top (4th) quartile.
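To make the construction of these tables concrete, here is a minimal Python sketch, assuming each output is reduced to a REF star rating and a raw citation count. The function names, the tie-breaking rule and the eight-output data set are hypothetical, chosen only for illustration; this is not Morris Sloman’s actual procedure or data.

```python
from collections import Counter, defaultdict


def citation_quartiles(citations):
    """Return each output's citation quartile, 1 (least cited) to 4 (most cited).

    Outputs are ranked by raw citation count; ties are broken by sort order,
    which is an assumption rather than necessarily how Sloman handled them.
    """
    n = len(citations)
    order = sorted(range(n), key=lambda i: citations[i])
    quartiles = [0] * n
    for rank, i in enumerate(order):
        quartiles[i] = 1 + (4 * rank) // n
    return quartiles


def stars_by_quartile(stars, citations):
    """For each REF star rating, the percentage of its outputs in quartiles 1-4."""
    counts = defaultdict(Counter)
    for star, q in zip(stars, citation_quartiles(citations)):
        counts[star][q] += 1
    table = {}
    for star in sorted(counts):
        total = sum(counts[star].values())
        table[star] = [100.0 * counts[star][q] / total for q in range(1, 5)]
    return table


# Hypothetical data for eight outputs (not real REF scores).
stars = [4, 4, 3, 3, 2, 3, 1, 2]
citations = [120, 40, 60, 15, 30, 5, 2, 80]
for star, row in stars_by_quartile(stars, citations).items():
    print(f"{star}*", [round(p, 1) for p in row])
```

Each row sums to 100%, matching the reading of the tables: a row gives the spread of one star class across the four citation quartiles.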

Both tables are roughly clustered towards the diagonal; that is, there is an overall correlation between citation and REF score, apparently validating the REF process.

There are, however, also off-diagonal counts. At the top left are outputs that score well in REF but have low citations. This is to be expected: non-article outputs such as books, software and patents may be important but typically attract fewer citations, and good papers may have been published in a poor choice of venue, leading to low citations.

More problematic is the lower right: outputs that have high citations but a low REF score. There are occasional reasons why this might be the case, for example papers that are widely cited for being wrong; however, these cases are rare (I do not recall any among those I assessed). In general, this corner represents outputs that the respective communities have judged strong, but the REF panel regarded as weak. The numbers need care in interpreting, as only around 30% of outputs were scored 1* or 2* combined; however, it still means that around 10% of outputs in the top quartile were scored in the lower two categories and thus would not attract funding.

We cannot produce a table like the above for each sub-area as the individual scores for each output are not available in the public domain, and have been destroyed by HEFCE (for privacy reasons).

However, we can create quartile profiles for each area based on citations, which can then be compared with the REF 1*/2*/3*/4* profiles.  These can be found on the results page of my REF analysis micro-site.  Like the world rank lists in the previous post, there is a marked difference between the citation quartile profiles for each area and the REF star profiles.
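A minimal Python sketch of such per-area profiles follows, assuming each output belongs to exactly one sub-area: quartile boundaries are fixed across all outputs together, then counted per area. The function name and data are hypothetical and not taken from the micro-site.

```python
from collections import Counter


def area_quartile_profiles(outputs):
    """Citation quartile profile for each area.

    outputs is a list of (area, citations) pairs, one per output, assuming each
    output belongs to exactly one area. Quartiles are computed over all outputs
    together, then each area's profile is the percentage of its outputs falling
    in each quartile, ordered [Q1, Q2, Q3, Q4] with Q4 the most cited.
    """
    n = len(outputs)
    order = sorted(range(n), key=lambda i: outputs[i][1])
    counts = {}
    for rank, i in enumerate(order):
        area = outputs[i][0]
        quartile_index = (4 * rank) // n  # 0 = Q1 ... 3 = Q4
        counts.setdefault(area, Counter())[quartile_index] += 1
    profiles = {}
    for area, c in counts.items():
        total = sum(c.values())
        profiles[area] = [100.0 * c[q] / total for q in range(4)]
    return profiles


# Hypothetical outputs: (area, citation count).
outputs = [("A", 1), ("A", 2), ("B", 3), ("B", 4), ("A", 5), ("B", 6), ("B", 7), ("B", 8)]
for area, profile in sorted(area_quartile_profiles(outputs).items()):
    print(area, [round(p, 1) for p in profile])
```

The last entry of each profile (Q4) is the area’s top-quartile share, the figure that can be set against the area’s REF 4* proportion.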

One way to get a handle on the scale of the differences is to divide the proportion of REF 4* by the proportion of top-quartile outputs for each area. Given that just over 22% of outputs were rated 4* overall, close to the 25% that by definition fall in the top citation quartile, the top-quartile proportion in an area should be a good predictor of the proportion of 4* results in that area.

The following shows an extract of the full results spreadsheet:

[Table: extract of results spreadsheet, % top citation quartile vs % REF 4* by area]

The left-hand column shows the percentage of outputs in the top quartile of citations; the column to the right of the area title is the proportion of REF 4*; and the right-hand column is the ratio. The green entries are those where the REF 4* results exceed those you would expect based on citations; the red are those that get fewer REF 4*s than would be expected.
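The ratio and colour coding can be sketched as follows. This is a minimal Python illustration, not the spreadsheet’s actual formulas; the function name, area names, percentages and the optional `tolerance` band for near-1 ratios are all hypothetical.

```python
def classify_areas(areas, tolerance=0.0):
    """Compare each area's REF 4* share with its top-citation-quartile share.

    areas maps a name to (pct_top_quartile, pct_4star). The ratio
    pct_4star / pct_top_quartile is above 1 when REF gave more 4*s than
    citations would predict ('green') and below 1 when it gave fewer ('red').
    Ratios within `tolerance` of 1 are reported as 'neutral'.
    """
    result = {}
    for name, (pct_top_quartile, pct_4star) in areas.items():
        ratio = pct_4star / pct_top_quartile
        if ratio > 1 + tolerance:
            colour = "green"
        elif ratio < 1 - tolerance:
            colour = "red"
        else:
            colour = "neutral"
        result[name] = (ratio, colour)
    return result


# Hypothetical areas and percentages, for illustration only.
example = {"Area A": (20.0, 40.0), "Area B": (30.0, 10.0), "Area C": (25.0, 24.0)}
for name, (ratio, colour) in classify_areas(example, tolerance=0.1).items():
    print(f"{name}: ratio {ratio:.2f} ({colour})")
```

With a tolerance of zero every area is either green or red (unless its ratio is exactly 1); a small band is one way to leave uncoloured those areas whose ratio is close to 1.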

While there are some areas (AI, Vision) for which the citations are an almost perfect predictor, there are others, ‘the winners’, which obtain two to three times more 4*s under REF than one would expect based on their citation scores, and some, ‘the losers’, where REF gives two to three times fewer 4*s than would be expected. As is evident, the winners are the more formal areas, the losers the more applied and human-centric areas. Remember again that, if anything, one would expect the citation measures to favour more theoretical areas, which makes this difference more shocking.

Andrew Howes replicated the citation analysis independently using R and produced the following graphic, which makes the differences very clear.

[Figure: Andrew Howes’ scatter plot of area rank by citations vs area rank by REF 4*]

The vertical axis has areas ranked by proportion of REF 4*; higher up means more highly rated by REF. The horizontal axis shows areas ranked by proportion of outputs in the top citation quartile. If REF scores were roughly in line with citation measures, one would expect the points to lie close to the line of equal ranks; instead the areas are scattered widely.
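One way to summarise how far the points in such a plot lie from the line of equal ranks is Spearman’s rank correlation: 1 when the two rankings agree exactly, -1 when they are reversed, and near 0 when they are unrelated. The Python sketch below is illustrative only; Andrew Howes’ analysis was done in R, and the post does not say he computed this statistic. The percentages are hypothetical, and the simple formula used assumes no tied values.

```python
def ranks(values):
    """Rank positions, 1 = highest value. Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result


def spearman_rho(xs, ys):
    """Spearman rank correlation, 1 - 6*sum(d^2) / (n(n^2 - 1)), valid without ties."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical per-area percentages, listed in the same area order.
pct_top_quartile = [30.0, 25.0, 20.0, 15.0, 10.0]
pct_4star = [20.0, 35.0, 10.0, 40.0, 25.0]
print("citation ranks:", ranks(pct_top_quartile))
print("REF ranks:     ", ranks(pct_4star))
print("Spearman rho:  ", round(spearman_rho(pct_top_quartile, pct_4star), 2))
```

Each (citation rank, REF rank) pair is one point in a plot like Howes’; points on the line of equal ranks give a rho of exactly 1.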

That is, there seems little if any relation between quality as measured externally by citations and the quality measures of REF.

The contrast with the tables at the top of this post is dramatic. If you look at outputs as a whole, there is a reasonable correspondence: outputs that rank higher in terms of citations rank higher in REF star score, apparently validating the REF results. However, when we compare areas, this correspondence disappears. This apparent contradiction is probably because the correlation is very strong within each area, while the areas themselves are scattered.

Looking at Andrew’s graph, it is clear that it is not a random scatter, but systematic; the winners are precisely the theoretical areas, and the losers the applied and human centred areas.

Not only is the bias against applied areas critical for the individuals and research groups affected, but it has the potential to skew the future of UK computing. Institutions with more applied work will be disadvantaged, and based on the REF results it is clear that institutions are already skewing their recruitment policies to match the areas which are likely to give them better scores in the next exercise.

The economic future of the country is likely to become increasingly interwoven with digital developments and related creative industries. Computing research is funded more generously than areas such as mathematics precisely because it is expected to contribute to this development, or, to use the buzzword, ‘impact’. However, the funding under REF within computing is weighted precisely against the very areas that are likely to contribute to digital and creative industries.

Unless there is rapid action, the impact of REF2014 may well be to destroy the UK’s research base in the areas essential for its digital future, and ultimately weaken the economic life of the country as a whole.

Or … is Amazon becoming the publishing Industry?

A recent Blog Kindle post asked “Is Amazon’s Kindle Destroying the Publishing Industry?”. The post defends Kindle, seeing the traditional publishers as reactionaries whose business model depended on paper publishing and, effectively, on keeping authors from their public.

However, as an author myself (albeit academic), I think this misses completely the reasons the publishing industry exists. The printing of physical volumes has long been a minimal part of the value; indeed, traditional publishers have made good use of changes in the physical print industry to outsource actual production. The core value for the author lies in the things around this: marketing, distribution and payment management.

Of these, distribution is of course much easier now with the web, whether delivering electronic copies or physical copies via print-on-demand services. However, the other core values persist – at their best, publishers do not ring-fence the public from the author but, on the contrary, connect the two.

I recall being in the Puffin Club as a child and receiving the monthly magazine. I could not afford many books at the time, but have since read many of the books described in its pages, and I recall the excitement of reading those reviews. A friend has a collection of the early Puffins (1-200) in their original covers; although some stories have aged better than others, just being a Puffin Book was still a pretty good indication that it was worth reading.

The myth we are being peddled is of a dis-intermediated networked world where customers connect directly to suppliers, authors to readers[1], musicians to fans. For me this has some truth: I am well enough known and well enough connected to distribute effectively. However, for most people that ‘direct’ connection is mediated by one of a small number of global sites … and an even smaller number of companies: YouTube, Twitter, Google, iTunes, eBay, not to forget Amazon.

For publishing as in other areas, what matters is not physical production, the paper, but the route, the connection, the channel.

And crucially Kindle is not just the device, but the channel.

The issue is not whether Kindle kills the publishing industry, but whether Amazon becomes the publishing industry.  Furthermore, if Amazon’s standard markdown and distribution deals for small publishers are anything to go by, Amazon is hardly going to be a cuddly home for future authors.

To some extent this continues an apparently inexorable path already seen in the traditional industries, with a few large publishing conglomerates buying up the smaller publishing houses and, on the high street, a few large bookstore chains such as Waterstones and Barnes & Noble squeezing out the small bookshops (remember “You’ve Got Mail”). Given this history, it is hard to have sympathy with Waterstones’ recent financial problems.

Philip Jones of the Bookseller recently blogged about these changes, noting that it is in fact bookselling, not publishing, that is struggling with profits … even Amazon; no wonder Amazon wants more of the publishing action. Jones notes that “digital will lead to smaller book chains, stocking fewer titles”, but also that “It wasn’t digital that drove this, but it is about to deliver the coup de grâce.”

Which does seem a depressing vision both as author and reader.

  1. Maybe unbound.co.uk is actually doing this – see Guardian article, although it sounds more useful to the already successful writer than the new author.

On the edge: universities bureaucratised to death?

Just took a quick peek at the new JISC report “Edgeless University: why higher education must embrace technology” prompted by a blog about it by Sarah Bartlett at Talis.

The report is set in the context of both an increasing number of overseas students, attracted by the UK’s educational reputation, and the desire for widening access to universities. I am not convinced that technology is necessarily the way to go for either of these goals, as it is so much harder and more expensive to produce good-quality learning materials without massive economies of scale (such as the OU has). The report also seems to conflate open access to research outputs with open access to learning.

However, it was not these issues that caught my eye, but a quote by Thomas Kealey, vice-chancellor of the University of Buckingham, the UK’s only private university. For three years Buckingham has come top of UK student satisfaction surveys, and Kealey says:

This is the third year that we’ve come top because we are the only university in Britain that focuses on the student rather than on government or regulatory targets. (Edgeless University, p. 21)

Of course, those in the relevant government departments would say that the regulations and targets are intended to deliver educational quality, but, as so often, this centralising of control (started, paradoxically, in the UK during the Thatcher years) instead constrains the real quality that comes from people, not rules.

In 1992 we saw the merging of the polytechnic and university sectors in the UK. As well as differences in level of education, the former were traditionally under the auspices of local government, whereas the latter were independent educational institutions. Those in the ex-polytechnic sector hoped to emulate the levels of attainment and ethos of the older universities. Instead, in recent years the whole sector seems to have been dragged down into a bureaucratic mire where paper trails take precedence over students and scholarship.

Obviously private institutions, as Kealey suggests, can escape this, but I hope that current and future governments will have the foresight and humility to let go of some of this centralised control, or they risk destroying the very system they wish to grow.

French subvert democratic process to pass draconian internet laws

Just saw on Rob @ dynamicorange that the French have passed a law forcing ISPs to withdraw access based on accusations of IP infringement. Whether one agrees or disagrees with, or even understands, the issues involved, it appears this was forced through by a vote of 16 (out of 577) members of the French parliament at a time when the vote was not expected. This reminds me of the notorious Shetland Times case back in the late 1990s, where the judgement implied that simply linking to another site infringed copyright, causing some sites to stop interlinking for fear of prosecution[1]; not to mention some early US patents that were granted because patent officers simply did not understand the technology and its implications[2].

It would be nice to think that the UK had learnt from the Shetland case, but sadly not. Earlier this year the Government released its interim Digital Britain report. This starts well, declaring “The success of our manufacturing and services industries will increasingly be defined by their ability to use and develop digital technologies”; however, the sum total of its action plan to promote ‘Digital Content’ is to strengthen IP protection. Whatever one’s views on copyright, file sharing etc., the fact that a digital economy is a global economy seems somehow to have been missed along the way; and this is the UK’s “action plan to secure the UK’s place at the forefront of innovation, investment and quality in the digital and communications industries”[3].

  1. See “Copyright battles: The Shetlands” @ Ariadne and “Scottish Court Orders Online Newspaper to Remove Links to Competitor’s Web Site” @ Harvard’s Berkman Center for Internet & Society.
  2. and for that matter, more recent cases like the ‘wish list’ patent
  3. UK Department for Culture, Media and Sport Press Release 106/08 “Digital Britain – the future of communications” 17th October 2008