Students love digital … don’t they?

In the ever accelerating rush to digital delivery, is this actually what students want or need?

Last week I was at the Talis Insight conference. As with previous years, this is a mix of sessions focused on those using or thinking of using Talis products, with lots of rich experience talks. However, about half of the time is also dedicated to plenaries about the current state and future prospects for technology in higher education; so well worth attending (it is free!) whether or not you are a Talis user.

Speakers this year included Bill Rammell, now Vice-Chancellor at the University of Bedfordshire, but who was also Minister of State for Higher Education during the second Blair government, and during that time responsible for introducing the National Student Survey.

Another high profile speaker was Rosie Jones, who is Director of Library Services at the Open University … which operates somewhat differently from the standard university library!

However, among the VCs, CEOs and directors of this and that, it was the two most junior speakers who stood out for me. Eva Brittin-Snell and Alex Davie are two SAGE student scholars from Sussex. As SAGE scholars they have engaged in research on student experience amongst their peers, speak at events like this and maintain a student blog, which includes, amongst other things, the story of how Eva came to buy her first textbook.

Eva and Alex’s talk was entitled “Digital through a student’s eyes” (video). Many of the talks had been about the rise of digital services and especially the eTextbook. Eva and Alex were the ‘digital natives’, so surely this was joy to their ears. Surprisingly not.

Alex, in her first year at university, started by alluding to the previous speakers, the push for book-less libraries, and general digital spiritus mundi, but offered an alternative view. Students were annoyed at being asked to buy books for a course where only a chapter or two would be relevant; they appreciated the convenience of an eBook when core textbooks were permanently out on loan, and instantly recalled once one got hold of them. However, she said they still preferred physical books, as they are far more usable (even if heavy!) than eBooks.

Eva, a fourth year student, offered a different view. “I started like Aly”, she said, and then went on to describe her change of heart. However, it was not a revelation of the pedagogical potential of digital, more that she had learnt to live through the pain. There were clear practical and logistic advantages to eBooks, there when and where you wanted, but she described a life of constant headaches from reading on-screen.

Possibly some of this is due to the current poor state of eBooks that are still mostly simply electronic versions of texts designed for paper. Also, one of their student surveys showed that very few students had eBook readers such as Kindle (evidently now definitely not cool), and used phones primarily for messaging and WhatsApp. The centre of the student’s academic life was definitely the laptop, so eBooks meant hours staring at a laptop screen.

However, it also reflects a growing body of work showing the pedagogic advantages of physical note taking, potential developmental damage of early tablet and smartphone use, and industry figures showing that across all areas eBook sales are dropping and physical book sales increasing. In addition there is evidence that children and teenagers prefer physical books, and public library use by young people is growing.

It was also interesting that both Alex and Eva complained that eTextbooks were not ‘snappy’ enough. In the age of Tweet-stream presidents and 5-minute attention spans, ‘snappy’ was clearly the students’ term of choice to describe their expectation of digital media. Yet this did not represent a loss of their attention per se, as this was clearly not perceived as a problem with physical books.

… and I am still trying to imagine what a critical study of Aristotle’s Poetics would look like in ‘snappy’ form.

There are two lessons from this for me. First, what would a ‘digital first’ textbook look like? Does it have to be ‘snappy’, or are there ways to maintain attention and depth of reading in digital texts?

The second picks up on issues in the co-authored paper I presented at NordiCHI last year, “From intertextuality to transphysicality: The changing nature of the book, reader and writer”, which, amongst other things, asked how we might use digital means to augment the physical reading process, offering some of the strengths of eBooks such as the ability to share annotations, but retaining a physical reading experience. Also maybe some of the physical limitations of availability could be relieved, for example, if university libraries work with bookshops to have student buy and return schemes alongside borrowing?

It would certainly be good if students did not have to learn to live with pain.

We have a challenge.

Sandwich proofs and odd orders

Revisiting an old piece of work I reflect on the processes that led to it: intuition and formalism, incubation and insight, publish or perish, and a malaise at the heart of current computer science.

A couple of weeks ago I received an email requesting an old technical report, “Finding fixed points in non-trivial domains: proofs of pending analysis and related algorithms” [Dx88].  This report was from nearly 30 years ago, when I was at York and before the time when everything was digital and online. This was one of my all time favourite pieces of work, and one of the few times I’ve done ‘real maths’ in computer science.

As well as tackling a real problem, it required new theoretical concepts and methods of proof that were generally applicable. In addition it arose through an interesting story that exposes many of the changes in academia.

[Aside, for those of more formal bent.] This involved proving the correctness of an algorithm, ‘Pending Analysis’, for efficiently finding fixed points over finite lattices, which had been developed for use when optimising functional programs. Doing this led me to perform proofs where some of the intermediate functions were not monotonic, and to develop forms of partial order that enabled reasoning over these. Of particular importance was the concept of a pseudo-monotonic functional, one that preserved an ordering between functions even if one of them is not itself monotonic. This then led to the ability to perform sandwich proofs, where a potentially non-monotonic function of interest is bracketed between two monotonic functions, which eventually converge to the same function, sandwiching the function of interest between them as they go.
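To make the setting of the aside a little more concrete, here is a minimal sketch of naive fixed-point iteration over a finite lattice. It is not Pending Analysis and it is not a sandwich proof; it only illustrates what finding a least fixed point means when the function involved is monotonic. The graph `EDGES`, the start node "a" and the names `least_fixed_point` and `step` are all invented for this illustration.

```python
from typing import Callable, Dict, FrozenSet, Set

NodeSet = FrozenSet[str]

def least_fixed_point(f: Callable[[NodeSet], NodeSet], bottom: NodeSet) -> NodeSet:
    """Iterate f from the bottom element until nothing changes.

    For a monotonic f on a finite lattice this terminates and yields
    the least fixed point; for a non-monotonic f there is no such guarantee.
    """
    current = bottom
    while True:
        nxt = f(current)
        if nxt == current:
            return current
        current = nxt

# Hypothetical example: a tiny directed graph, with the lattice being
# the subsets of its nodes ordered by inclusion.
EDGES: Dict[str, Set[str]] = {"a": {"b"}, "b": {"c"}, "c": set(), "d": {"a"}}

def step(reached: NodeSet) -> NodeSet:
    """One round of 'start node plus every successor of a reached node'."""
    out = {"a"}
    for node in reached:
        out |= EDGES[node]
    return frozenset(out)

# The empty set is the bottom of the subset lattice.
reachable = least_fixed_point(step, frozenset())
print(sorted(reachable))
```

Because `step` only ever adds nodes as its argument grows, it is monotonic, which is exactly the property that could not be assumed for the intermediate functions in the proofs described above.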

Oddly, while it was one of my favourite pieces of work, it was at the periphery of my main areas of work, so had never been published apart from as a York technical report. Also, this was in the days before research assessment, before publish-or-perish fever had ravaged academia, and when many of the most important pieces of work were ‘only’ in technical report series. Indeed, our Department library had complete sets of many of the major technical report series such as Xerox PARC, Bell Labs, and Digital Equipment Corporation Labs where so much work in programming languages was happening at the time.

My main area was, as it is now, human–computer interaction, and at the time principally the formal modelling of interaction. This was the topic of my PhD Thesis and of my first book “Formal Methods for Interactive Systems” [Dx91] (an edited version of the thesis). Although I do less of this more formal work nowadays, I’ve just been editing a book with Benjamin Weyers, Judy Bowen and Philippe Palanque, “The Handbook of Formal Methods in Human-Computer Interaction” [WB17], which captures the current state of the art in the topic.

Moving from mathematics into computer science, the majority of formal work was far more broad, but far less deep than I had been used to. The main issues were definitional: finding ways to describe complex phenomena that both gave insight and enabled a level of formal tractability. This is not to say that there were no deep results: I recall the excitement of reading Sannella’s PhD Thesis [Sa82] on the application of category theory to formal specifications, or Luca Cardelli’s work on complex type systems needed for more generic coding and understanding object oriented programming.

The reason for the difference in the kinds of mathematics was that computational formalism was addressing real problems, not simply puzzles interesting for themselves. Often these real world issues do not admit the kinds of neat solution that arise when you choose your own problem — the formal equivalent of Rittel’s wicked problems!

Crucially, where there were deep results and complex proofs these were also typically addressed at real issues. By this I do not mean the immediate industry needs of the day (although much of the most important theoretical work was at industrial labs); indeed functional programming, which has now found critical applications in big-data cloud computation and even JavaScript web programming, was at the time a fairly obscure field. However, there was a sense in which these things connected to a wider sphere of understanding in computing and that they could eventually have some connection to real coding and computer systems.

This was one of the things that I often found depressing during the REF2014 reading exercise in 2013. Over a thousand papers covering vast swathes of UK computer science, and so much that seemed to be in tiny sub-niches of sub-niches, obscure variants of inconsequential algebras, or reworking and tweaking of algorithms that appeared to be of no interest to anyone outside two or three other people in the field (I checked who was citing every output I read).

(Note the lists of outputs are all in the public domain, and links to where to find them can be found at my own REF micro-site.)

If this had been pure mathematics papers it is what I would have expected; after all mathematics is not funded in the way computer science is, so I would not expect to see the same kinds of connection to real world issues. Also I would have been disappointed if I had not seen some obscure work of this kind; you sometimes need to chase down rabbit holes to find Aladdin’s cave. It was the sheer volume of this kind of work that shocked me.

Maybe in those early days, I self-selected work that was both practically and theoretically interesting, so I have a golden view of the past; maybe it was simply easier to do both before the low-hanging fruit had been gathered; or maybe just there has been a change in the social nature of the discipline. After all, most early mathematicians happily mixed pure and applied mathematics, with the areas only diverging seriously in the 20th century. However, as noted, mathematics is not funded so heavily as computer science, so it does seem to suggest a malaise, or at least loss of direction for computing as a discipline.

Anyway, roll back to the mid 1980s. A colleague of mine, David Wakeling, had been on a visit to a workshop in the States and heard there about Pending Analysis and Young and Hudak’s proof of its correctness [YH96]. He wanted to use the algorithm in his own work, but there was something about the proof that he was unhappy about. It was not that he had spotted a flaw (indeed there was one, but obscure), but just that the presentation of it had left him uneasy. David was a practical computer scientist, not a mathematician, working on compilation and optimisation of lazy functional programming languages. However, he had some sixth sense that told him something was wrong.

Looking back, this intuition about formalism fascinates me. Again there may be self-selection going on; if David had had worries and they were unfounded, I would not be writing this. However, I think that there was something more than this. Hardy and Wright, the bible of number theory [HW59], listed a number of open problems in number theory (many now solved), but crucially for many gave an estimate on how likely it was that they were true or might eventually have a counter example. By definition, these were non-trivial hypotheses, and either true or not true, but Hardy and Wright felt able to offer an opinion.

For David I think it was more about the human interaction, the way the presenters did not convey confidence.  Maybe this was because they were aware there was a gap in the proof, but thought it did not matter, a minor irrelevant detail, or maybe the same slight lack of precision that let the flaw through was also evident in their demeanour.

In principle academia, certainly in mathematics and science, is about the work itself, but we can rarely check each statement, argument or line of proof so often it is the nature of the people that gives us confidence.

Quite quickly I found two flaws.

One was internal to the mathematics (math alert!): essentially forgetting that a ‘monotonic’ higher order function is usually only monotonic when the functions it is applied to are monotonic.

The other was external: the formulation of the theorem to be proved did not actually match the real-world computational problem. This is an issue that I used to refer to as the formality gap. Once you are in the formal world of mathematics you can analyse, prove, and even automatically check some things. However, there is first something more complex needed to adequately and faithfully reflect the real world phenomenon you are trying to model.

I’m doing a statistics course at the CHI conference in May, and one of the reasons statistics is hard is that it also needs one foot in the world of maths, and one foot on the solid ground of the real world.

Finding the problem was relatively easy … solving it altogether harder! There followed a period when it was my pet side project: reams of paper with scribbles, thinking I’d solved it then finding more problems, proving special cases, or variants of the algorithm, generalising beyond the simple binary domains of the original algorithm. In the end I put it all into a technical report, but never had the full proof of the most general case.

Then, literally a week after the report was published, I had a notion, and found an elegant and reasonably short proof of the most general case, and in so doing also created a new technique, the sandwich proof.

Reflecting back, was this merely one of those things, or a form of incubation? I used to work with psychologists Tom Ormerod and Linden Ball at Lancaster including as part of the Desire EU network on creativity. One of the topics they studied was incubation, which is one of the four standard ‘stages’ in the theory of creativity. Some put this down to sub-conscious psychological processes, but it may be as much to do with getting out of patterns of thought and hence seeing a problem in a new light.

In this case, was it the fact that the problem had been ‘put to bed’ that enabled fresh insight?

Anyway, now, 30 years on, I’ve made the report available electronically … after reanimating Troff on my Mac … but that is another story.


[Dx91] A. J. Dix (1991). Formal Methods for Interactive Systems. Academic Press. ISBN 0-12-218315-0

[Dx88] A. J. Dix (1988). Finding fixed points in non-trivial domains: proofs of pending analysis and related algorithms. YCS 107, Dept. of Computer Science, University of York.

[HW59] G.H. Hardy, E.M. Wright (1959). An Introduction to the Theory of Numbers – 4th Ed. Oxford University Press.

[Sa82] Don Sannella (1982). Semantics, Implementation and Pragmatics of Clear, a Program Specification Language. PhD, University of Edinburgh.

[WB17] Weyers, B., Bowen, J., Dix, A., Palanque, P. (Eds.) (2017) The Handbook of Formal Methods in Human-Computer Interaction. Springer. ISBN 978-3-319-51838-1

[YH96] J. Young and P. Hudak (1986). Finding fixpoints on function spaces. YALEU/DCS/RR-505, Yale University, Department of Computer Science

the educational divide – do numbers matter?

If a news article is all about numbers, why is the media shy about providing the actual data?

On the BBC News website this morning James McIvor’s article “Clash over ‘rich v poor’ university student numbers” describes differences between the Scottish Government (SNP) and Scottish Labour in the wake of Professor Peter Scott’s appointment as commissioner for fair access to higher education in Scotland.

Scottish Labour claim that while access to university by the most deprived has increased, the educational divide is growing, with the most deprived increasing by 0.8% since 2014, but those in the least deprived (most well off) growing at nearly three times that figure. In contrast, the Scottish Government claims that in 2006 those from the least deprived areas were 5.8 times more likely to enter university than those in the most deprived areas, whereas now the difference is only 3.9 times, a substantial decrease in educational inequality.

The article is all about numbers, but the two parties seem to be saying contradictory things, one saying inequality is increasing, one saying it is decreasing!

Surely enough to make the average reader give up on experts, just like Michael Gove!

Of course, if you can read through the confusing array of leasts and mosts, the difference seems to be that the two parties are taking different base years, 2014 vs 2006, and that both can be true: a long term improvement with decreasing inequality, but a short term increase in inequality since 2014. The former is good news, but the latter may be bad news, a change in direction that needs addressing, or simply ‘noise’ as we are talking about small changes on big numbers.

I looked in vain for a link to the data, web sites or reports on which this was based; after all, this is an article where the numbers are the story, but there are none.

After a bit of digging, I found that the data that both are using is from the UCAS Undergraduate 2016 End of Cycle Report (the numerical data for this figure and links to CSV files are below).

Figure from UCAS 2016 End of Cycle Report

Looking at these it is clear that the university participation rate for the least deprived quintile (Q5, blue line at top) has stayed around 40% with odd ups and downs over the last ten years, whereas the participation of the most deprived quintile (Q1) has been gradually increasing, again with year-by-year wiggles. That is, the ratio between least and most deprived used to be about 40:7 and is now about 40:10: less inequality, as the SNP say.

For some reason 2014 was a dip year for the Q5.  There is no real sign of a change in the long-term trend, but if you take 2014 to 2016, the increase in Q5 is larger than the increase in Q1, just as Scottish Labour say.  However, any other year would not give this picture.

In this case it looks like Scottish Labour either cherry picked a year that made the story they wanted, or simply accidentally chose it.

The issue for me though, is not so much who was right or wrong, but why the BBC didn’t present this data to make it possible to make this judgement?

I can understand the argument that people do not like, or understand numbers at all, but where, as in this case, the story is all about the numbers, why not at least present the raw data and ideally discuss why there is an apparent contradiction!


Numerical data from figure 57 of the UCAS 2016 End of Cycle Report

Quintile   2006   2007   2008   2009   2010   2011   2012   2013   2014   2015   2016
Q1         7.21   7.58   7.09   7.95   8.47   8.14   8.91   9.52  10.10   9.72  10.90
Q2        13.20  12.80  13.20  14.30  15.70  14.40  14.80  15.90  16.10  17.40  18.00
Q3        21.10  20.60  20.70  21.30  23.60  21.10  22.10  22.50  22.30  24.00  24.10
Q4        29.40  29.10  30.20  30.70  31.50  29.10  29.70  29.20  28.70  30.30  31.10
Q5        42.00  39.80  41.40  42.80  41.70  40.80  41.20  40.90  39.70  41.10  42.30

UCAS provide the data in CSV form.  I converted this to the above tabular form and this is available in CSV or XLSX.
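For anyone who wants to check the two parties' figures against this table, the arithmetic can be sketched in a few lines of Python. This is only an illustration using the Q1 and Q5 rows as transcribed above; the names `RATES`, `rate`, `ratio` and `change` are my own, and it assumes the figures are participation rates in percent, so differences are in percentage points.

```python
# Q1 (most deprived) and Q5 (least deprived) rows, copied from the table above.
YEARS = list(range(2006, 2017))
RATES = {
    "Q1": [7.21, 7.58, 7.09, 7.95, 8.47, 8.14, 8.91, 9.52, 10.10, 9.72, 10.90],
    "Q5": [42.00, 39.80, 41.40, 42.80, 41.70, 40.80, 41.20, 40.90, 39.70, 41.10, 42.30],
}

def rate(quintile: str, year: int) -> float:
    """Participation rate (%) for a quintile in a given year."""
    return RATES[quintile][YEARS.index(year)]

def ratio(year: int) -> float:
    """How many times higher the Q5 rate is than the Q1 rate."""
    return rate("Q5", year) / rate("Q1", year)

def change(quintile: str, start: int, end: int) -> float:
    """Change in rate between two years, in percentage points."""
    return rate(quintile, end) - rate(quintile, start)

# Long-term view (the Scottish Government's framing).
print(f"Q5:Q1 ratio 2006: {ratio(2006):.1f}")   # 5.8
print(f"Q5:Q1 ratio 2016: {ratio(2016):.1f}")   # 3.9

# Short-term view from 2014 (Scottish Labour's framing).
print(f"Q1 change 2014-2016: {change('Q1', 2014, 2016):.2f}")   # 0.80
print(f"Q5 change 2014-2016: {change('Q5', 2014, 2016):.2f}")   # 2.60
```

Both sets of numbers come out of the same table; the only difference is the choice of base year and whether you look at a ratio or an absolute change.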

the internet laws of the jungle

Where are the boundaries between freedom, license and exploitation, between fair use and theft?

I found myself getting increasingly angry today as Mozilla Foundation stepped firmly beyond those limits, and moreover with Trump-esque rhetoric attempts to dupe others into following them.

It all started with a small text ad below the Firefox default screen search box:


Partly because of my ignorance of the web-speak ‘TFW’ (I know, showing my age!), I clicked through to a petition page on the Mozilla Foundation site (PDF archive copy here).

It starts off fine, with stories of some of the silliness of current copyright law across Europe (can’t share photos of the Eiffel tower at night) and problems for use in education (which does in fact have quite a lot of copyright exemptions in many countries).  It offers a petition to sign.

This all sounds good; partly due to rapid change, partly due to knee-jerk reactions, internet law does seem to be a bit of a mess.

If you blink you might miss one or two odd parts:

“This means that if you live in or visit a country like Italy or France, you’re not permitted to take pictures of certain buildings, cityscapes, graffiti, and art, and share them online through Instagram, Twitter, or Facebook.”

Read this carefully: a tourist forbidden from photographing cityscapes – silly! But a few words on “… and art” … So if I visit an exhibition of an artist or maybe even photographer, and share a high definition photo (the Nokia Lumia 1020 has a 40 megapixel camera), is that OK? Perhaps a thumbnail in the background of a selfie, but does Mozilla object to any rules to prevent copying of artworks?


However, it is at the end, in a section labelled “don’t break the internet”, that the cyber fundamentalism really starts.

“A key part of what makes the internet awesome is the principle of innovation without permission — that anyone, anywhere, can create and reach an audience without anyone standing in the way.”

Again at first this sounds like a cry for self expression, except if you happen to be an artist or writer and would like to make a living from that self-expression?

Again, it is clear that current laws have not kept up with change and in areas are unreasonably restrictive. We need to be able to distinguish between a fair reference to something and seriously infringing its IP. Likewise, we could distinguish the aspects of social media that are more like looking at holiday snaps over a coffee, compared to pirate copies for commercial profit.

However, in so many areas it is the other way round, our laws are struggling to restrict the excesses of the internet.

Just a few weeks ago a 14 year old girl was given permission to sue Facebook.  Multiple times over a 2 year period nude pictures of her were posted and reposted.  Facebook hides behind the argument that it is user content, it takes down the images when they are pointed out, and yet a massive technology company, which is able to recognise faces is not able to identify the same photo being repeatedly posted. Back to Mozilla: “anyone, anywhere, can create and reach an audience without anyone standing in the way” – really?

Of course this vision of the internet without boundaries is not just about self expression, but freedom of speech:

“We need to defend the principle of innovation without permission in copyright law. Abandoning it by holding platforms liable for everything that happens online would have an immense chilling effect on speech, and would take away one of the best parts of the internet — the ability to innovate and breathe new meaning into old content.”

Of course, the petition is singling out EU law, which inconveniently includes various provisions to protect the privacy and rights of individuals, not dictatorships or centrally controlled countries.

So, who benefits from such an open and unlicensed world?  Clearly not the small artist or the victim of cyber-bullying.

Laissez-faire has always been an aim for big business, but without constraint it is the law of the jungle and always ends up benefiting the powerful.

In the 19th century it was child labour in the mills only curtailed after long battles.

In the age of the internet, it is the vast US social media giants who hold sway, and of course the search engines, who just happen to account for $300 million of revenue for Mozilla Foundation annually, 90% of its income.


A tale of two conferences and the future of learning technology in the UK

Over the past few weeks I’ve been to two conferences focused on different aspects of technology and learning, Talis Insight Europe and ACM Learning at Scale (L@S). This led me to reflect on the potential for and barriers to ground breaking research in these areas in the UK.

The first conference, Talis Insight Europe, grew out of the original Talis User Group, but as well as company updates on existing and new products, also has an extensive line-up of keynotes by major educational visionaries and decision makers (including pretty much the complete line-up of JISC senior staff) and end-user contributed presentations.

The second, Learning @ Scale, grew out of the MOOC explosion, and deals with the new technology challenges and opportunities when we are dealing with vast numbers of students. It also had an impressive array of keynote speakers, including Sugata Mitra, famous for the ‘Hole in the Wall’, which brought technology to street children in India.

Although there were some common elements (big data and dashboards got a mention in both!), the audiences were quite different. For Insight, the large majority were from HE (Higher Education) libraries, followed by learning technologists, industry representatives, and HE decision-makers. In contrast, L@S consisted largely of academics, many from computing or technical backgrounds, with some industry researchers, including, as I was attending largely with my Talis hat on, me.

In a joint keynote at Insight, Paul Feldman and Phil Richards, the CEO and CIO of JISC, described the project to provide a learning analytics service [FR16,JI16] (including student app and, of course, dashboard) for UK institutions. As well as the practical benefits, they outlined a vision where the UK leads the way in educational big data for personalised learning.

Given a long track record of education and educational technology research in the UK, the world-leading distance-learning university provision of the Open University, and recent initiatives both those outlined by JISC and FutureLearn (building on the OU’s vast experience), this vision seems not unreasonable.

However, on the ground at Learning @ Scale, there was a very different picture; the vast majority of papers and attendees were from the US, and this despite the conference being held in Edinburgh.

To some extent this is as one might expect. While traditional distance learning, including the OU, has class sizes that for those in face-to-face institutions feel massive, these are dwarfed by those for MOOCs, which started in the US, and it is in the US where the main MOOC players (Coursera, Udacity, edX) are based. edX alone had initial funding more than ten times that available to FutureLearn, so in sheer investment terms, the balance at L@S is representative.

However, Mike Sharples, long-term educational technology researcher and Academic Lead at FutureLearn, was one of the L@S keynotes [Sh16]. In his presentation it was clear that FutureLearn and UK MOOCs punch well above their weight, with retention statistics several times higher than US counterparts. While this may partly be due to topic areas, it is also a reflection of the development strategy. Mike outlined how empirically founded educational theory has driven the design of the FutureLearn platform, not least the importance of social learning. Perhaps then not surprisingly, one of the areas where FutureLearn substantially led over US counterparts was in social aspects of learning.

So there are positive signs for UK research in these areas. While JISC has had its own austerity-driven funding problems, its role as trusted intermediary and active platform creator offers a voice and forum that few, if any, other countries possess. Similarly, while FutureLearn needs to be sustainable, so has to have a certain inward focus, it does seem to offer a wonderful potential resource for collaborative research. Furthermore the open education resource (OER) community seems strong in the UK.

The Teaching Excellence Framework (TEF) [HC16,TH15] will bring its own problems, more about justifying student fee increases than education, potentially damaging education through yet more ill-informed political interference, and re-establishing class-based educational apartheid. However, it will certainly increase universities’ interest in education technology.

Set against this are challenges.

First was the topic of my own L@S work-in-progress paper – Challenge and Potential of Fine Grain, Cross-Institutional Learning Data [Dx16]. At Talis, we manage half a million reading lists, containing over 20 million resources, spread over more than 85 institutions including more than half of UK higher education. However, these institutions are all very different, and the half million courses may each have only tens or low hundreds of students. That is very large scale in total volume, but highly heterogeneous. The JISC learning analytics repository will have exactly the same issues, and such data are far more difficult to deal with by machine learning or statistical analysis than the relatively homogeneous data from a single huge MOOC.


These issues of heterogeneous scale are not unique to education. As a general information systems phenomenon, they are something I have been interested in for many years, and call the “long tail of small data” [Dx10,Dx15]. While this kind of data is more complex and difficult to deal with, this is of course a major research challenge, and potentially has greater long-term promise than the study of more homogeneous silos. I am finding this in my own work with musicologists [IC16,DC14], and it is emerging as an issue in the natural sciences [Bo13,PC07].


Another problem is REF, the UK ‘Research Excellence Framework’. My post-hoc analysis of the REF data revealed the enormous bias in the computing sub-panel against any form of applied and human-oriented work [Dx15b,Dx15c]. Of course, this is not a new issue, just that the available data has made this more obvious and undeniable. This affects my own core research area of human–computer interaction, but also, and probably much more substantially, learning technology research. Indeed, I think most learning technologists had already sussed this out well before REF2014 as there were very few papers submitted in this area to the computing panel. I assume most research on learning technology was submitted to the education panel.

To some extent it does not matter where research is submitted and assessed; however, while in theory the mapping between university departments and submitted units is fluid for REF, in practice submitting to ‘other’ panels is problematic, making it difficult to write coherent narratives about the research environment. If learning technology research is not seen as REF-able in computing, computing departments will not recruit in these areas and will discourage this kind of research. While my hope is that REF2020 will not re-iterate the mistakes of REF2014, there is no guarantee of this, and anyway the effects on institutional policy will already have been felt.

However, and happily, the kinds of research needed to make sense of this large-scale heterogeneous data may well prove more palatable to a computing REF panel than more traditional small-scale learning technology. It would be wonderful to see research collaborations between those with long-term experience and understanding of educational issues, with hard-core machine learning and statistical analysis – this is BIG DATA and challenging data. Indeed one of the few UK papers at L@S involved Pearson’s London-based data analysis department, and included automatic clustering, hidden Markov models, and regression analysis.

In short, while there are barriers in the UK, there is also great potential for exciting research that is both theoretically challenging and practically useful, bringing the insights available from large-scale educational data to help individual students and academics.


[Bo13] Christine L. Borgman. Big data and the long tail: Use and reuse of little data. Oxford eResearch Centre Seminar, 12th March 2013.

[Dx10] A. Dix (2010). In praise of inconsistency – the long tail of small data. Distinguished Alumnus Seminar, University of York, UK, 26th October 2011.

[Dx15] A. Dix (2014/2015). The big story of small data. Talk at Open University, 11th November 2014; Oxford e-Research Centre, 10th July 2015; Mixed Reality Laboratory, Nottingham, 15th December 2015.

[DC14] Dix, A., Cowgill, R., Bashford, C., McVeigh, S. and Ridgewell, R. (2014). Authority and Judgement in the Digital Archive. In The 1st International Digital Libraries for Musicology workshop (DLfM 2014), ACM/IEEE Digital Libraries conference 2014, London 12th Sept. 2014.

[Dx15b] Alan Dix (2015/2016).  REF2014 Citation Analysis. accessed 8/5/2016.

[Dx15c] A. Dix (2015). Citations and Sub-Area Bias in the UK Research Assessment Process. In Workshop on Quantifying and Analysing Scholarly Communication on the Web (ASCW’15) at WebSci 2015 on June 30th in Oxford.

[Dx16]  Alan Dix (2016). Challenge and Potential of Fine Grain, Cross-Institutional Learning Data. Learning at Scale 2016. ACM.

[FR16] Paul Feldman and Phil Richards (2016).  JISC – Helping the UK become the most advanced digital teaching and research nation in the world.  Talis Insight Europe 2016.

[HC16] The Teaching Excellence Framework: Assessing Quality in Higher Education. House of Commons, Business, Innovation and Skills Committee, Third Report of Session 2015–16. HC 572.  29 February 2016.

[IC16] In Concert (2014-2016).  accessed 8/5/2016

[JI16]  Effective learning analytics. JISC, accessed   8/5/2016.

[PC07] C. L. Palmer, M. H. Cragin, P. B. Heidorn and L.C. Smith. 2007. Data curation for the long tail of science: The Case of environmental sciences. 3rd International Digital Curation Conference, Washington, DC. download/attachments/32666/Palmer_DCC2007.pdf

[Sh16]  Mike Sharples (2016).  Effective Pedagogy at Scale, Social Learning and Citizen Inquiry (keynote). Learning at Scale 2016. ACM.

[TH15] Teaching excellence framework (TEF): everything you need to know.  Times Higher Education, August 4, 2015.


Of academic communication: overload, homeostasis and nostalgia

Revisiting an old paper on early email use and reflecting on scholarly communication now.

About 30 years ago, I was at a meeting in London and heard a presentation about a study of early email use in Xerox and the Open University. At Xerox the use of email was already part of their normal culture, but it was still new at OU. I’d thought they had done a before and after study of one of the departments, but remembered clearly their conclusions: email acted in addition to other forms of communication (face to face, phone, paper), but did not substitute.

It was one of those pieces of work that I could recall, but didn’t have a reference to. Facebook to the rescue! I posted about it and in no time had a series of helpful suggestions, including one from Gilbert Cockton, who nailed it, finding the meeting, the “IEE Colloquium on Human Factors in Electronic Mail and Conferencing Systems” (3 Feb 1989), and the precise paper:

P. Fung, T. O’Shea, S. Bly. Electronic mail viewed as a communications catalyst. IEE Colloquium on Human Factors in Electronic Mail and Conferencing Systems, 1989, pp.1/1–1/3. INSPEC: 3381096

In some extraordinary investigative journalism, Gilbert also noted that the first author, Pat Fung, went on to fresh territory after retirement, qualifying as a scuba-diving instructor at the age of 75.

The details of the paper were not exactly as I remembered. Rather than a before and after study, it was a comparison of computing departments at Xerox (mature use of email) and the OU (email less ingrained, but already well used). Maybe I had simply embroidered the memory over the years, or maybe they presented newer work at the colloquium than was in the 3 page extended abstract. In those days this was common, as researchers did not feel they needed to milk every last result in a formal ‘publication’. However, the conclusions were just as I remembered:

“An exciting finding is its indication that the use of sophisticated electronic communications media is not seen by users as replacing existing methods of communicating. On the contrary, the use of such media is seen as a way of establishing new interactions and collaboration whilst catalysing the role of more traditional methods of communication.”

As part of this process, following various leads from other Facebook friends, I spent some time looking at early CSCW conference proceedings, at Saul Greenberg’s early CSCW bibliography [1] and at Ducheneaut and Watts’ (15 years on) review of email research [2] in the 2005 HCI special issue on ‘reinventing email’ [3] (both notably missing the Fung et al. paper). I downloaded and skimmed several early papers, including Wendy Mackay’s lovely early (1988) study [4] that exposed the wide variety of ways in which people used email over and above simple ‘communication’. So much to learn from this work when the field was still fresh.

This all led me to reflect on the Fung et al. paper, the process of finding it, and the lessons for email and other ‘communication’ media today.

Communication for new purposes

A key finding was that “the use of such media is seen as a way of establishing new interactions and collaboration”. Of course, the authors and their subjects could not have envisaged current social media, but the finding of this paper was exactly an example of this. In 1989 if I had been trying to find a paper, I would have scoured my own filing cabinet and bookshelves, those of my colleagues, and perhaps asked people when I met them. Nowadays I pop the question into Facebook and within minutes the advice starts to appear, and not long after I have a scanned copy of the paper I was after.

Communication as a good thing

In the paper abstract, the authors say that an “exciting finding” of the paper is that “the use of sophisticated electronic communications media is not seen by users as replacing existing methods of communicating.” Within the paper, this is phrased even more strongly:

“The majority of subjects (nineteen) also saw no likelihood of a decrease in personal interactions due to an increase in sophisticated technological communications support and many felt that such a shift in communication patterns would be undesirable.”

Effectively, email was seen as potentially damaging if it replaced other more human means of communication, and the good outcome of this report was that this did not appear to be happening (or strictly subjects believed it was not happening).

However, by the mid-1990s, papers discussing ’email overload’ started to appear [5].

I recall a morning radio discussion of email overload about ten years ago. The presenter asked someone else in the studio if they thought this was a problem. Quite un-ironically, they answered, “no, I only spend a couple of hours a day”. I found my own pattern of email use changed when I switched from highly structured Eudora (with over 2000 email folders) to Gmail (mail is like a Facebook feed: if it isn’t on the first page it doesn’t exist). I was recently talking to another academic who explained that two years ago he had deliberately taken “email as stream” as a policy to control unmanageable volumes.

If only they had known …

Communication as substitute

While Fung et al.’s respondents reported that they did not foresee a reduction in other forms of non-electronic communication, in fact even in the paper the signs of this shift to digital are evident.

Here are the graphs of communication frequency for the Open University (30 people, more recent use of email) and Xerox (36 people, more established use) respectively.

(Graphs from Fung et al., 1989)

It is hard to draw exact comparisons as it appears there may have been a higher overall volume of communication at Xerox (because of email?). Certainly, at that point, face-to-face communication remained strong at Xerox, but it appears that not only the proportion, but also the total volume of non-digital non-face-to-face communications was lower than at the OU. That is, substitution had already happened.

Again, this is obvious nowadays: although the volume of electronic communications would have been untenable on paper (I’ve sometimes imagined printing out a day’s email and trying to cram it in a pigeon-hole), the volume of paper communications has diminished markedly. A report in 2013 for Royal Mail recorded a 3-6% pa reduction in letters over recent years and projected a further 4% pa for the foreseeable future [6].

Academic communication and national meetings

However, this also made me think about the IEE Colloquium itself. Back in the late 1980s and 1990s it was common to attend small national or local meetings to meet with others and present work, often early stage, for discussion. In other fields this still happens, but in HCI it has all but disappeared. Maybe this is a little nostalgia, but it does seem a real loss, as it was a great way for new PhD students to present their work and meet with the leaders in their field. Of course, this can happen if you get your CHI paper accepted, but the barriers are higher, particularly for those in smaller and less well-resourced departments.

Some of this is because international travel is cheaper and faster, and so national meetings have reduced in importance – everyone goes to the big global (largely US) conferences. Many years ago research on day-to-day time use suggested that we have a travel ‘time budget’ that is relatively constant across countries and across different kinds of areas within the same country [7]. The same is clearly true of academic travel time; we have a certain budget and if we travel more internationally then we do correspondingly less nationally.

(Figure from Zahavi, 1979)

However, I wonder if digital communication also had a part to play. I knew about the Fung et al. paper, even though it was not in the large reviews of CSCW and email, because I had been there. The reason that the Fung et al. paper was not cited in relevant reviews would have been because it was in a small venue and only available as a paper copy, and only if you knew it existed. Indeed, it was presumably also below the digital radar until it was, I assume, scanned by IEE archivists and deposited in the IEEE digital library.

However, despite the advantages of this easy access to one another and scholarly communication, I wonder if we have also lost something.

In the 1980s, physical presence and co-presence at an event was crucial for academic communication. Proceedings were paper and precious; I would at least skim read all of the proceedings of any event I had been to, even those of large conferences, because they were rare and because they were available. Reference lists at the end of my papers were shorter than now, but possibly more diverse and more in-depth, as compared to the more directed ‘search for the relevant terms’ literature reviews of the digital age.

And looking back at some of those early papers, from days when publish-or-perish was not so extreme, when cardiac failure was not an occupational hazard for academics (except maybe due to the Cambridge sherry allowance), and at the way this crucial piece of early research was not dressed up with an extra 6000 words of window dressing to make a ‘high impact’ publication, but simply shared: were things more fun?


[1] Saul Greenberg (1991) “An annotated bibliography of computer supported cooperative work.” ACM SIGCHI Bulletin, 23(3), pp. 29-62. July. Reprinted in Greenberg, S. ed. (1991) “Computer Supported Cooperative Work and Groupware”, pp. 359-413, Academic Press.

[2] Nicolas Ducheneaut and Leon A. Watts (2005). In search of coherence: a review of e-mail research. Hum.-Comput. Interact. 20, 1 (June 2005), 11-48. DOI= 10.1080/07370024.2005.9667360

[3] Steve Whittaker, Victoria Bellotti, and Paul Moody (2005). Introduction to this special issue on revisiting and reinventing e-mail. Hum.-Comput. Interact. 20, 1 (June 2005), 1-9.

[4] Wendy E. Mackay. 1988. More than just a communication system: diversity in the use of electronic mail. In Proceedings of the 1988 ACM conference on Computer-supported cooperative work (CSCW ’88). ACM, New York, NY, USA, 344-353.

[5] Steve Whittaker and Candace Sidner (1996). Email overload: exploring personal information management of email. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’96), Michael J. Tauber (Ed.). ACM, New York, NY, USA, 276-283.

[6] The outlook for UK mail volumes to 2023. PwC prepared for Royal Mail Group, 15 July 2013 The%20outlook%20for%20UK%20mail%20volumes%20to%202023.pdf

[7] Yacov Zahavi (1979). The ‘UMOT’ Project. Prepared For U.S. Department Of Transportation Ministry Of Transport and Fed. Rep. Of Germany.

principles vs guidelines

I was recently asked to clarify the difference between usability principles and guidelines.  Having written a page-full of answer, I thought it was worth popping on the blog.

As with many things the boundary between the two is not absolute … and also the term ‘guidelines’ tends to get used differently at different times!

However, as a general rule of thumb:

  • Principles tend to be very general and would apply pretty much across different technologies and systems.
  • Guidelines tend to be more specific to a device or system.

As an example of the latter, look at the iOS Human Interface Guidelines on “Adaptivity and Layout”. It starts with a general principle:

“People generally want to use their favorite apps on all their devices and in multiple contexts”,

but then rapidly turns that into more mobile specific, and then iOS specific guidelines, talking first about different screen orientations, and then about specific iOS screen size classes.

I note that the definition on page 259 of Chapter 7 of the HCI textbook is slightly ambiguous.  When it says that guidelines are less authoritative and more general in application, it means in comparison to standards … although I’d now add a few caveats for the latter too!

Basically in terms of ‘authority’, from low to high:

  • principles (lowest): agreed by community, but not mandated
  • guidelines: proposed by manufacturer, but rarely enforced
  • standards (highest): mandated by standards authority

In terms of general applicability, high to low:

  • principles (highest): very broad, e.g. ‘observability’
  • guidelines: more specific, but still allowing interpretation
  • standards (lowest): very tight

This ‘generality of application’ dimension is a little more complex, as guidelines are often manufacturer specific and so arguably less ‘generally applicable’ than standards, but the range of situations that standards apply to is usually much tighter.

On the whole the more specific the rules, the easier they are to apply.  For example, the general principle of observability requires that the designer think about how it applies in each new application and situation. In contrast, a more specific rule that says, “always show the current editing state in the top right of the screen” is easy to apply, but tells you nothing about other aspects of system state.

Human-Like Computing

Last week I attended an EPSRC workshop on “Human-Like Computing”.

The delegate pack offered a tentative definition:

“offering the prospect of computation which is akin to that of humans, where learning and making sense of information about the world around us can match our human performance.” [E16]

However, the purpose of this workshop was to clarify, and expand on this, exploring what it might mean for computers to become more like humans.

It was an interdisciplinary meeting with some participants coming from more technical disciplines such as cognitive science, artificial intelligence, machine learning and robotics; others from psychology or studying human and animal behaviour; and some, like myself, from HCI or human factors, bridging the two.


Perhaps the first question is why one might even want more human-like computing.

There are two obvious reasons:

(i) Because it is a good model to emulate — Humans are able to solve some problems, such as visual pattern finding, which computers find hard. If we can understand human perception and cognition, then we may be able to design more effective algorithms. For example, in my own work colleagues and I have used models based on spreading activation and layers of human memory when addressing ‘web scale reasoning’ [K10,D10].

(ii) For interacting with people — There is considerable work in HCI in making computers easier to use, but there are limitations. Often we are happy for computers to be simply ‘tools’, but at other times, such as when your computer notifies you of an update in the middle of a talk, you wish it had a little more human understanding. One example of this is recent work at Georgia Tech teaching human values to artificial agents by reading them stories! [F16]

To some extent (i) is simply the long-standing area of nature-inspired or biologically-inspired computing. However, the combination of computational power and psychological understanding means that perhaps we are at the point where new strides can be made. Certainly, the success of ‘deep learning’ and the recent computer mastery of Go suggest this. In addition, by my own calculations, for several years the internet as a whole has had more computational power than a single human brain, and we are very near the point when we could simulate a human brain in real time [D05b].

Both goals, but particularly (ii), suggest a further goal:

(iii) new interaction paradigms — We will need to develop new ways to design for interacting with human-like agents and robots, not least how to avoid the ‘uncanny valley’ and how to avoid the appearance of over-competence that has bedevilled much work in this broad area. (see more later)

Both goals also offer the potential for a fourth secondary goal:

(iv) learning about human cognition — In creating practical computational algorithms based on human qualities, we may come to better understand human behaviour, psychology and maybe even society. For example, in my own work on modelling regret (see later), it was aspects of the computational model that highlighted the important role of ‘positive regret’ (“the grass is greener on the other side”) to help us avoid ‘local minima’, where we stick to the things we know and do not explore new options.
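As a rough illustration of how ‘positive regret’ can pull a learner out of a local minimum, here is a deliberately tiny sketch. It is not the regret model from my own work: the function `run`, the option names, the payoffs and the learning rate are all hypothetical, and it makes the strong assumption that the agent can work out what the option it did not choose would have yielded.

```python
def run(payoffs, values, steps, learn_rate=0.5, imagine_alternatives=False):
    """Greedy chooser over named options (a toy, hypothetical model).

    payoffs: the true payoff of each option.
    values: the agent's initial estimate of each option.
    imagine_alternatives: if True, after each choice the agent also
        considers what a better unchosen option would have yielded and
        nudges its estimate upwards (a crude stand-in for 'positive regret').
    """
    values = dict(values)  # copy so the caller's estimates are untouched
    choices = []
    for _ in range(steps):
        # Always pick the option currently believed to be best.
        choice = max(values, key=values.get)
        choices.append(choice)
        # Ordinary learning: move the estimate towards the payoff received.
        values[choice] += learn_rate * (payoffs[choice] - values[choice])
        if imagine_alternatives:
            # "The grass is greener": learn from options that would have been better.
            for other in values:
                if other != choice and payoffs[other] > payoffs[choice]:
                    values[other] += learn_rate * (payoffs[other] - values[other])
    return choices


payoffs = {"familiar": 1.0, "novel": 2.0}
initial = {"familiar": 1.0, "novel": 0.0}

stuck = run(payoffs, initial, steps=5)
explores = run(payoffs, initial, steps=5, imagine_alternatives=True)
```

With these made-up numbers the plain greedy agent never leaves the familiar option, while the agent that imagines the alternative switches to the novel one after a couple of rounds.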

Human or superhuman?

Of course humans are not perfect; do we want to emulate limitations and failings?

For understanding humans (iv), the answer is probably “yes”, and maybe by understanding human fallibility we may be in a better position to predict and prevent failures.

Similarly, for interacting with people (ii), the agents should show at least some level of human limitations (even if ‘put on’); for example, a chess program that always wins would not be much fun!

However, for simply improving algorithms, goal (i), we may want to get the ‘best bits’ from human cognition and merge them with the best aspects of artificial computation. Of course it may be that the frailties are also the strengths; for example, the need to come to decisions and act in relatively short timescales (in terms of brain ‘ticks’) may be one way in which we avoid ‘over learning’, a common problem in machine learning.

In addition, the human mind has developed to work with the nature of neural material as a substrate, and the physical world, both of which have shaped the nature of human cognition.

Very simple animals learn purely by Skinner-like response training, effectively what AI would term sub-symbolic. However, this level of learning requires many exposures to similar stimuli. For rarer occurrences, which do not occur frequently within a lifetime, learning must be at the very slow pace of genetic development of instincts. In contrast, conscious reasoning (symbolic processing) allows us to learn through a single or very small number of exposures; ideal for infrequent events or novel environments.

Big Data means that computers effectively have access to vast amounts of ‘experience’, and researchers at Google have remarked on the ‘Unreasonable Effectiveness of Data’ [H09] that allows problems, such as translation, to be tackled in a statistical or sub-symbolic way which previously would have been regarded as essentially symbolic.

Google are now starting to recombine statistical techniques with more knowledge-rich techniques in order to achieve better results again. As humans we continually employ both types of thinking, so there are clear human-like lessons to be learnt, but the eventual system will not have the same ‘balance’ as a human.

If humans had developed with access to vast amounts of data and maybe other people’s experience directly (rather than through culture, books, etc.), would we have developed differently? Maybe we would do more things unconsciously that we now do consciously. Maybe with enough experience we would never need to be conscious at all!

More practically, we need to decide how to make use of this additional data. For example, learning analytics is becoming an important part of educational practice. If we have an automated tutor working with a child, how should we make use of the vast body of data about other tutors’ interactions with other children? Should we have a very human-like tutor that effectively ‘reads’ learning analytics just as a human tutor would look at a learning ‘dashboard’? Alternatively, we might have a more loosely human-inspired ‘hive-mind’ tutor that ‘instinctively’ makes pedagogic choices based on the overall experience of all tutors, but maybe in an unexplainable way?

What could go wrong …

There have been a number of high-profile statements in the last year about the potential coming ‘singularity’ (when computers are clever enough to design new computers leading to exponential development), and warnings that computers could become sentient, Terminator-style, and take over.

There was general agreement at the workshop that this kind of risk was overblown and that despite breakthroughs, such as the mastery of Go, these are still very domain limited. It will be many years before we have to worry about even general intelligence in robots, let alone sentience.

A far more pressing problem is that of incapable computers, which make silly mistakes, and the way in which people, maybe because of the media attention to the success stories, assume that computers are more capable than they are!

Indeed, overconfidence in algorithms is not just a problem for the general public, but also among computing academics, as I found in my personal experience on the REF panel.

There are of course many ethical and legal issues raised as we design computer systems that are more autonomous. This is already being played out with driverless cars, with issues of insurance and liability. Some legislators are suggesting allowing driverless cars, but only if there is a driver there to take control … but if the car relinquishes control, how do you safely manage the abrupt change?

Furthermore, while the vision of autonomous robots taking over the world is still far fetched, more surreptitious control is already with us. Whether it is Uber cabs called by algorithm, or simply Google’s ranking of search results prompting particular holiday choices, we are all, to varying extents, doing “what the computer tells us”. I recall in The Dalek Invasion of Earth, the very un-human-like Daleks could not move easily amongst the rubble of war-torn London. Instead they used ‘hypnotised men’ controlled by some form of neural headset. If the Daleks had landed today and simply taken over or digitally infected a few cloud computing services, would we know?


Sometimes it is sufficient to have a ‘black box’ that makes decisions and acts. So long as it works we are happy. However, a key issue for many ethical and legal issues, but also for practical interaction, is the ability to interrogate a system, to seek explanations of why a decision has been made.

Back in 1992 I wrote about these issues [D92], in the early days when neural networks and other forms of machine learning were being proposed for a variety of tasks from controlling nuclear fusion reactions to credit scoring. One particular scenario was if an algorithm were used to pre-sort large numbers of job applications. How could you know whether the algorithms were being discriminatory? How could a company using such algorithms defend themselves if such an accusation were brought?

One partial solution then, as now, was to accept that underlying learning mechanisms may involve emergent behaviour from statistical, neural network or other forms of opaque reasoning. However, this opaque initial learning process should give rise to an intelligible representation. This is rather akin to a judge who might have a gut feeling that a defendant is guilty or innocent, but needs to explicate that in a reasoned legal judgement.

This approach was exemplified by Query-by-Browsing, a system that creates queries from examples (using a variant of ID3), but then converts this into SQL queries. This was subsequently implemented [D94], and is still running as a web demonstration.
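To give a flavour of the idea, here is a minimal sketch of learning a decision tree from records marked as wanted or unwanted, and then rendering the tree as a readable SQL query. It is not the Query-by-Browsing code: it uses plain ID3 over categorical attributes rather than the variant in the real system, and the function names, the `staff` table and the example records are all made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(rows):
    """Shannon entropy of the wanted/unwanted labels in rows."""
    counts = Counter(label for _, label in rows)
    return -sum(c / len(rows) * log2(c / len(rows)) for c in counts.values())


def split(rows, attr):
    """Group (record, label) pairs by the record's value for attr."""
    groups = {}
    for record, label in rows:
        groups.setdefault(record[attr], []).append((record, label))
    return groups


def build_tree(rows, attributes):
    """Plain ID3: returns True/False at a leaf, or (attribute, branches)."""
    labels = {label for _, label in rows}
    if len(labels) == 1:
        return labels.pop()
    if not attributes:
        return Counter(label for _, label in rows).most_common(1)[0][0]

    def remaining_entropy(attr):
        # Lowest remaining entropy is the same as highest information gain.
        return sum(len(g) / len(rows) * entropy(g) for g in split(rows, attr).values())

    attr = min(attributes, key=remaining_entropy)
    rest = [a for a in attributes if a != attr]
    return (attr, {v: build_tree(g, rest) for v, g in split(rows, attr).items()})


def conditions(tree, path=()):
    """One AND-ed condition per path that ends in a 'wanted' leaf."""
    if tree is True:
        return [" AND ".join(path) or "1=1"]
    if tree is False:
        return []
    attr, branches = tree
    found = []
    for value, subtree in branches.items():
        found += conditions(subtree, path + (f"{attr} = '{value}'",))
    return found


def tree_to_sql(tree, table):
    """Render the tree as SQL. Values are not escaped: illustration only."""
    conds = conditions(tree)
    where = " OR ".join(f"({c})" for c in conds) if conds else "1=0"
    return f"SELECT * FROM {table} WHERE {where}"


# Hypothetical records the user has marked as wanted (True) or not (False).
examples = [
    ({"dept": "sales", "grade": "junior"}, True),
    ({"dept": "sales", "grade": "senior"}, True),
    ({"dept": "accounts", "grade": "junior"}, False),
    ({"dept": "accounts", "grade": "senior"}, False),
]
tree = build_tree(examples, ["dept", "grade"])
query = tree_to_sql(tree, "staff")
```

The point of the second step is that the user never has to trust the tree itself: they can read the generated query, see that it selects on `dept` alone, and agree or disagree with it.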

For many years I have argued that it is likely that our ‘logical’ reasoning arises precisely from this need to explain our own tacit judgement to others. While we simply act individually, or by observing the actions of others, this can be largely tacit, but as soon as we want others to act in planned collaborative ways, for example to kill a large animal, we need to convince them. Once we have the mental mechanisms to create these explanations, these become internalised so that we end up with internal means to question our own thoughts and judgement, and even use them constructively to tackle problems more abstract and complex than found in nature. That is, dialogue leads to logic!


We split into groups and discussed scenarios as a means to understand the potential challenges for human-like computing. Over multiple sessions the group I was in discussed one main scenario and then a variant.

Paramedic for remote medicine

The main scenario consisted of a patient far from a central medical centre, with an intelligent local agent communicating intermittently and remotely with a human doctor. Surprisingly the remote aspect of the scenario was not initially proposed by me thinking of Tiree, but by another member of the group thinking about some of the remote parts of the Scottish mainland.

The local agent would need to be able to communicate with the patient, be able to express a level of empathy, be able to physically examine (needing touch sensing, vision), and discuss symptoms. On some occasions, like a triage nurse, the agent might be sufficiently certain to be able to make a diagnosis and recommend treatment. However, at other times it may need to pass on to the remote doctor, being able to describe what had been done in terms of examination, symptoms observed, information gathered from the patient, in the same way that a paramedic does when handing over a patient to the hospital. However, even after the handover of responsibility, the local agent may still form part of the remote diagnosis, and may be able to take over again once the doctor has determined an overall course of action.

The scenario embodied many aspects of human-like computing:

  • The agent would require a level of emotional understanding to interact with the patient
  • It would require fine and situation contingent robotic features to allow physical examination
  • Diagnosis and decisions would need to be guided by rich human-inspired algorithms based on large corpora of medical data, case histories and knowledge of the particular patient.
  • The agent would need to be able to explain its actions both to the patient and to the doctor. That is it would not only need to transform its own internal representations into forms intelligible to a human, but do so in multiple ways depending on the inferred knowledge and nature of the person.
  • Ethical and legal responsibility are key issues in medical practice
  • The agent would need to be able to manage handovers of control.
  • The agent would need to understand its own competencies in order to know when to call in the remote doctor.

The scenario could be in physical or mental health. The latter is particularly important given recent statistics, which suggested only 10% of people in the UK suffering mental health problems receive suitable help.


As a more specific scenario still, one of the group related how he had been to an experienced physiotherapist after a failed diagnosis by a previous physician. Rather than jumping straight into a physical examination, or even apparently watching the patient’s movement, the physiotherapist proceeded to chat for 15 minutes about aspects of the patient’s life, work and exercise. At the end of this process, the physiotherapist said, “I think I know the problem”, and proceeded to administer a directed test, which correctly diagnosed the problem and led to successful treatment.

Clearly the conversation had given the physiotherapist a lot of information about potential causes of injury, aided by many years observing similar cases.

To do this using an artificial agent would suggest some level of:

  • theory/model of day-to-day life

Thinking about the more conversational aspects of this I was reminded of the PhD work of Ramanee Peiris [P97]. This concerned consultations on sensitive subjects such as sexual health. It was known that when people filled in (initially paper) forms prior to a consultation, they were more forthcoming and truthful than if they had to provide the information face-to-face. This was even if the patient knew that the person they were about to see would read the forms prior to the consultation.

Ramanee’s work extended this first to electronic forms and then to chat-bot style discussions which were semi-scripted, but used simple textual matching to determine which topics had been covered, including those spontaneously introduced by the patient. Interestingly, the more human-like the system became the more truthful and forthcoming the patients were, even though they were less so with a real human.

As well as revealing lessons for human interactions with human-like computers, this also showed that human-like computing may be possible with quite crude technologies. Indeed, even Eliza was treated (to Weizenbaum’s alarm) as if it really were a counsellor, even though people knew it was ‘just a computer’ [W66].

Cognition or Embodiment?

I think it fair to say that the overall balance, certainly in the group I was in, was towards the cognitivist: that is, a more Cartesian approach starting with understanding and models of internal cognition, and then seeing how these play out with external action. Indeed, the term ‘representation’ was used repeatedly as an assumed central aspect of any human-like computing, and there was even talk of resurrecting Newell’s project for a ‘unified theory of cognition’ [N90].

There did not appear to be any hard-core embodiment theorists at the workshop, although several people had sympathies. This was perhaps as well, as we could easily have degenerated into well rehearsed arguments for and against embodiment/cognition centred explanations … not least about the critical word ‘representation’.

However, I did wonder whether a path that deliberately took embodiment as central would be valuable. How many human-like behaviours could be modelled in this way, taking external perception-action as central and only taking on internal representations when they were absolutely necessary (Andy Clark’s 007 principle [C98])?

Such an approach would meet limits, not least the physiotherapist’s 25 minute chat, but I would guess it would be more successful over a wider range of behaviours and scenarios than we would at first think.

Human–Computer Interaction and Human-Like Computing

Both Russell and I were there partly representing our own research interests, but also more generally as part of the HCI community, looking at the way human-like computing would intersect existing HCI agendas, or maybe create new challenges and opportunities. It was certainly clear during the workshop that there is a substantial role for human factors, from fine motor interactions to conversational interfaces and socio-technical systems design.

Russell and I presented a poster, which largely focused on these interactions.


There are two sides to this:

  • understanding and modelling for human-like computing — HCI studies and models complex, real world, human activities and situations. Psychological experiments and models tend to be very deep and detailed, but narrowly focused and using controlled, artificial tasks. In contrast HCI’s broader, albeit more shallow, approach and focus on realistic or even ‘in the wild’ tasks and situations may mean that we are in an ideal position to inform human-like computing.

  • human interfaces for human-like computing — As noted in goal (iii) we will need paradigms for humans to interact with human-like computers.

As an illustration of the first of these, the poster used my work on making sense of the apparently ‘bad’ emotion of regret [D05].

An initial cognitive model of regret was formulated involving a rich mix of imagination (in order to pull past events and actions to mind), counter-factual modal reasoning (in order to work out what would have happened), emotion (which is modified to feel better or worse depending on the possible alternative outcomes), and Skinner-like low-level behavioural learning (the eventual purpose of regret).


This initial descriptive and qualitative cognitive model was then realised in a simplified computational model, which had a separate ‘regret’ module that could be plugged into a basic behavioural learning system. Both the basic system and the system with regret learnt, but the system with regret did so with between 5 and 10 times fewer exposures. That is, the regret made a major improvement to the machine learning.


Turning to the second: direct manipulation has been at the heart of interaction design since the PC revolution in the 1980s. Prior to that, command line interfaces (or worse, job control interfaces) suggested a mediated paradigm, where operators ‘asked’ the computer to do things for them. Direct manipulation changed that, turning the computer into a passive virtual world of computational objects on which you operated with the aid of tools.

To some extent we need to shift back to the 1970s mediated paradigm, but renewed, where the computer is no longer like a severe bureaucrat demanding the precise grammatical and procedural request, but instead a helpful and understanding aide. For this we can draw upon existing areas of HCI such as human-human communications, intelligent user interfaces, conversational agents and human–robot interaction.


[C98] Clark, A. 1998. Being There: Putting Brain, Body and the World Together Again. MIT Press.

[D92] A. Dix (1992). Human issues in the use of pattern recognition techniques. In Neural Networks and Pattern Recognition in Human Computer Interaction Eds. R. Beale and J. Finlay. Ellis Horwood. 429-451.

[D94] A. Dix and A. Patrick (1994). Query By Browsing. Proceedings of IDS’94: The 2nd International Workshop on User Interfaces to Databases, Ed. P. Sawyer. Lancaster, UK, Springer Verlag. 236-248.

[D05] A. Dix (2005). The adaptive significance of regret. (unpublished essay, 2005)

[D05b] A. Dix (2005). the brain and the web – a quick backup in case of accidents. Interfaces, 65, pp. 6-7. Winter 2005.

[D10] A. Dix, A. Katifori, G. Lepouras, C. Vassilakis and N. Shabir (2010). Spreading Activation Over Ontology-Based Resources: From Personal Context To Web Scale Reasoning. International Journal of Semantic Computing, Special Issue on Web Scale Reasoning: scalable, tolerant and dynamic. 4(1) pp.59-102.

[E16] EPSRC (2016). Human Like Computing Hand book. Engineering and Physical Sciences Research Council. 17 – 18 February 2016

[F16] Alison Flood (2016). Robots could learn human values by reading stories, research suggests. The Guardian, Thursday 18 February 2016

[H09] Alon Halevy, Peter Norvig, and Fernando Pereira. 2009. The Unreasonable Effectiveness of Data. IEEE Intelligent Systems 24, 2 (March 2009), 8-12. DOI=10.1109/MIS.2009.36

[K10] A. Katifori, C. Vassilakis and A. Dix (2010). Ontologies and the Brain: Using Spreading Activation through Ontologies to Support Personal Interaction. Cognitive Systems Research, 11 (2010) 25–41.

[N90] Allen Newell. 1990. Unified Theories of Cognition. Harvard University Press, Cambridge, MA, USA.

[P97] DR Peiris (1997). Computer interviews: enhancing their effectiveness by simulating interpersonal techniques. PhD Thesis, University of Dundee.

[W66] Joseph Weizenbaum. 1966. ELIZA—a computer program for the study of natural language communication between man and machine. Commun. ACM 9, 1 (January 1966), 36-45. DOI=

REF Redux 6 — Reasons and Remedies

This, the last of my series of posts on post-REF analysis, asks what went wrong and what could be done to improve things in future.

Spoiler: a classic socio-technical failure story: compromising the quality of human processes in order to feed an algorithm

As I’ve noted multiple times, the whole REF process and every panel member was focused around fairness and transparency, and yet still the evidence is that quite massive bias emerged. This is evident in my own analysis of sub-area and institutional differences, and also in HEFCE’s own report, which highlighted gender differences.

Summarising some of the effects we have seen in previous posts:

  1. sub-areas: When you rank outputs within their own areas worldwide, theoretical papers ranked in the top 5% (top 1 in 20) worldwide get a 4*, whereas more applied, human-centric papers need to be in the top 0.5% (top 1 in 200) – a ten-fold difference (REF Redux 2)
  2. institutions: Outputs that appear equivalent in terms of citation are ranked more highly in Russell Group universities compared with other old (pre-1992) universities, and both higher than new (post-1992) universities.  If two institutions have similar citation profiles, the Russell Group one, on average, would receive 2-3 times more money per member of staff than the equivalent new university (REF Redux 4)
  3. gender: A male academic in computing is 33% more likely to get a 4* than a female academic, and this effect persists even when other factors are considered (HEFCE report “The Metric Tide”). Rather than explicit bias, I believe this is likely to be an implicit bias due to the higher proportions of women in sub-areas disadvantaged by REF (REF Redux 5)

These are all quite shocking results, not so much that the differences are there, but because of the size.

Before being a computer scientist I was trained as a statistician.  In all my years both as a professional statistician, and subsequently as a HCI academic engaged in or reviewing empirical work, I have never seen effect sizes this vast.

What went wrong?

Note that this analysis is all for sub-panel 11 Computer Science and Informatics. Some of the effects (in particular institutional bias) are probably not confined to this panel; however, there are special factors in the processes we used in computing which are likely to have exacerbated latent bias in general and sub-area bias in particular.

As a computing panel, we of course used algorithms!

The original reason for asking submissions to include an ACM sub-area code was to automate reviewer allocation. This meant that while other panel chairs were still starting their allocation process, SP11 members already had their full allocations of a thousand or so outputs apiece. Something like 21,000 output allocations at the press of a button. Understandably this was the envy of other panels!

We also used algorithms for normalisation of panel members’ scores. Some people score high, some score low, some bunch towards the middle with few high and few low scores, and some score too much to the extremes.

This is also the envy of many other panel members. While we did discuss scores on outputs where we varied substantially, we did not spend the many hours debating whether a particular paper was 3* or 4*, or trying to calibrate ourselves precisely — the algorithm does the work. Furthermore the process is transparent (we could even open source the code) and defensible — it is all in the algorithm, no potentially partisan decisions.

Of course such an algorithm cannot simply compare each panel member with the average, as some panel members might have happened to have a better or worse set of outputs to review than others. In order to work there has to be sufficient overlap between panel members’ assessments so that they can be robustly compared. In order to achieve this overlap we needed to ‘spread our expertise’ for the assignment process, so that we reviewed more papers slightly further from our core area of competence.
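To make the role of overlap concrete, here is a deliberately simple Python sketch of severity correction. It is emphatically not the SP11 algorithm, whose details are not given here; the function names and the toy scores are my own assumptions. Each assessor’s severity is estimated only from outputs that others also scored, by comparing their score with the average score for the same output.

```python
from collections import defaultdict
from statistics import mean


def severity_offsets(scores):
    """scores: list of (assessor, output_id, score) triples.

    Returns {assessor: offset}, where the offset is how far, on average,
    that assessor sits above (+) or below (-) the mean score given to the
    same outputs by everyone who assessed them.
    """
    by_output = defaultdict(list)
    for _assessor, output, score in scores:
        by_output[output].append(score)
    consensus = {output: mean(vals) for output, vals in by_output.items()}

    diffs = defaultdict(list)
    for assessor, output, score in scores:
        diffs[assessor].append(score - consensus[output])
    return {assessor: mean(d) for assessor, d in diffs.items()}


def normalise(scores):
    """Subtract each assessor's estimated severity offset from their scores."""
    offsets = severity_offsets(scores)
    return [(a, o, s - offsets[a]) for a, o, s in scores]


# Toy data: assessor A is consistently one grade harsher than assessor B.
toy = [("A", "p1", 2), ("A", "p2", 3), ("B", "p1", 3), ("B", "p2", 4)]
print(severity_offsets(toy))
print(normalise(toy))
```

Note that an assessor who shares no outputs with anyone else gets an offset of zero whatever their severity, since their own score is the only ‘consensus’ available. This is why overlap is essential, and also why a correction like this only adjusts for severity and says nothing about how well an assessor understands a given output.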

Panels varied substantially in the way they allocated outputs to reviewers. In STEM areas the typical output was an article of, say, 8–10 pages, whereas in the humanities outputs were often books or portfolios, and in performing arts there might even be a recording of a performance taking hours. Clearly the style of reviewing varied. However, most panels tried to assign two expert panellists to each output. In computing we had three assessors per output, compared to two in many areas (and in one sub-panel a single assessor per output). However, because of the expertise spreading, this typically meant one expert and two broader assessors per output.

For example, my own areas of core competence (Human-centered computing / Visualization and Collaborative and social computing) had between them 700 outputs, and there were two other assessors with strong knowledge in these areas. However, of over 1000 outputs I assessed, barely one in six (170) were in these areas; that is only 2/3 more than if the allocation had been entirely random.
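The ‘2/3 more than random’ figure can be checked with a few lines of arithmetic. The sketch below uses the rounded figures quoted above (something like 21,000 allocations, three assessors per output, about 1000 outputs per assessor); the variable names are mine, and the result is only as exact as those rounded inputs.

```python
# Rounded figures taken from the text above.
total_allocations = 21_000   # assessor-output pairings across the panel
assessors_per_output = 3
area_outputs = 700           # outputs in my two core areas
my_outputs = 1_000           # outputs I assessed ("over 1000")
my_in_area = 170             # of which fell in my core areas

total_outputs = total_allocations / assessors_per_output
area_share = area_outputs / total_outputs          # fraction of all outputs in my areas
expected_if_random = my_outputs * area_share       # in-area outputs under random allocation
ratio = my_in_area / expected_if_random

print(f"outputs in total: {total_outputs:.0f}")
print(f"share in my areas: {area_share:.0%}")
print(f"expected in-area if random: {expected_if_random:.0f}")
print(f"actual versus random: {ratio:.2f} times")
```

With these inputs the core areas make up about a tenth of all outputs, so random allocation would have given me roughly 100 in-area outputs; 170 is about 1.7 times that, in other words roughly two thirds more.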

Assessing a broad range of computer science was certainly interesting, and I feel I came away with an understanding of the current state of UK computing that I certainly did not have before. Also having a perspective from outside a core area is very valuable especially in assessing the significance of work more broadly within the discipline.

This said, the downside is that the vast majority of assessments were outside our core areas, and it is thus not so surprising that default assessments (aka bias) became a larger aspect of the assessment. This is particularly problematic when there are differences in methodology: whereas it is easy to look at a paper with mathematical proofs in it and think “that looks rigorous”, it is hard for someone not used to interpretative methodologies to assess, for example, ethnography.

If the effects were not so important, it would be amusing to imagine the mathematics panel with statisticians, applied and pure mathematicians assessing each other’s work, or indeed, formal computer science being assessed by pure mathematicians.

Note that the intentions were for the best, trying to make the algorithm work as well as possible; but the side effect was to reduce the quality of the human process that fed the algorithm. I recall the first thing I ever learnt in computing was the mantra, “garbage in, garbage out”.

Furthermore, the assumption underlying the algorithm was that while assessors differed in their severity/generosity of marking and their ‘accuracy’ of marking, they were all equally good at all assessments. While this might be reasonable if we all were mainly marking within our own competence zone, this is clearly not valid given the breadth of assessment. That is, the fundamental assumptions of the algorithm were broken.

This is a classic socio-technical failure story: in an effort to ‘optimise’ the computational part of the system, the overall human–computer system was compromised. It is reasonable for those working in more purely computational areas to have missed this; however, in retrospect, those of us with a background in this sort of issue should have foreseen problems (John 9:41), mea culpa.  Indeed, I recall that I did have reservations, but had hoped that any bad effects would average out given so many points of assessment.  It was only seeing first Morris Sloman’s analysis and then the results of my own that I realised quite how bad the distortions had been.

I guess we fell prey to another classic systems failure: not trialling, testing or prototyping a critical system before using it live.

What could be done better?

Few academics are in favour of metrics-only systems for research assessment, and, rather like democracy, it may be that the human-focused processes of REF are the worst possible solution apart from all the alternatives.

I would certainly have been of that view until seeing in detail the results outlined in this series. However, knowing what I do now, if there were a simple choice for the next REF between what we did and a purely metrics-based approach, I would vote for the latter. In every way that a pure metrics-based approach would be bad for the discipline, our actual process was worse.

However, the choice is not simply metrics vs human assessment.

In computing we used a particular combination of algorithm and human processes that amplified rather than diminished the effects of latent bias. This will have been particularly bad for sub-areas where differences in methodology lead to asymmetric biases. However, it is also likely to have amplified institutional bias effects since, when assessing areas far from one’s own expertise, it is more likely that default cues, such as the ‘known’ quality of the institution, will weigh strongly.

Clearly we need to do this differently next time, and other panels definitely ought not to borrow SP11’s algorithms without substantial modification.

Maybe it is possible to use metrics-based approaches to feed into a human process in a way that is complementary. A few ideas could be:

  1. metrics for some outputs — for example we could assess older journal and conference outputs using metrics, combined with human assessment for newer or non-standard outputs
  2. metrics as under-girding – we could give outputs an initial grade based on metrics, which is then altered after reading, but where there is a differential burden of proof — easy to raise a grade (e.g. because of badly chosen venue for strong paper), but hard to bring it down (more exceptional reasons such as citations saying “this paper is wrong”)
  3. metrics for in-process feedback — a purely human process as we had, but part way through calculate the kinds of profiles for sub-areas and institutions that I calculated in REF Redux 2, 3 and 4. At this point the panel would be able to decide what to do about anomalous trends, for example, individually examine examples of outputs.
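As a rough indication of what the third idea might involve, the Python sketch below computes a very crude profile part way through assessment: for each sub-area (or institution type), the number of 4* grades awarded so far relative to the number of outputs that are highly cited within their own area. This is only a stand-in for the fuller analyses of REF Redux 2, 3 and 4; the function names, the threshold factor and the toy records are all invented for illustration and are not real REF data.

```python
from collections import defaultdict
from statistics import median


def four_star_ratio(records):
    """records: (group, top_cited, grade) triples.

    Returns {group: number of 4* grades / number of top-cited outputs},
    skipping groups that have no top-cited outputs at all.
    """
    top = defaultdict(int)
    stars = defaultdict(int)
    for group, top_cited, grade in records:
        top[group] += int(top_cited)
        stars[group] += int(grade == 4)
    return {g: stars[g] / top[g] for g in top if top[g] > 0}


def anomalous_groups(ratios, factor=1.5):
    """Groups whose ratio is more than `factor` above or below the median."""
    mid = median(ratios.values())
    return sorted(g for g, r in ratios.items() if r > mid * factor or r < mid / factor)


# Invented toy records: (group, is the output top-cited in its own area?, grade so far)
toy = [
    ("theory", True, 4), ("theory", True, 4), ("theory", False, 4), ("theory", False, 4),
    ("systems", True, 4), ("systems", True, 4), ("systems", False, 3),
    ("applied", True, 4), ("applied", True, 3), ("applied", False, 3),
]
ratios = four_star_ratio(toy)
print(ratios)
print(anomalous_groups(ratios))
```

The point is not the particular statistic, but that something this cheap could be run while there is still time for the panel to look again at the flagged groups.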

There are almost certainly other approaches; the critical thing is that we must do better than last time.

level of detail – scale matters

We get used to being able to zoom into every document, picture and map, but part of the cartographer’s skill is putting the right information at the right level of detail. If you took area maps and then scaled them down, they would not make a good road atlas: the main motorways would hardly be visible, and the rest would look like a spider had walked all over it. Similarly, if you zoom into a road atlas you would discover the narrow blue line of each motorway is in fact half a mile wide on the ground.

Nowadays we all use online maps that try to do this automatically.  Sometimes this works … and sometimes it doesn’t.

Here are three successive views of Google maps focused on Bournemouth on the south coast of England.

On the first view we see Bournemouth clearly marked, and on the next, zooming in a little, Poole, Christchurch and some smaller places also appear. So far, so good: as we zoom in, more local names are shown as well as the larger place.

[Images: the first two Google Maps views of Bournemouth]

However, zoom in one more level and something weird happens: Bournemouth disappears. Poole and Christchurch are there, but no Bournemouth.


However, looking at the same level scale on another browser, Bournemouth is there still:


The difference between the two is the Hotel Miramar. On the first browser I am logged into Google mail, and so Google ‘knows’ I am booked to stay in the Hotel Miramar (presumably by scanning my email), and decides to display this also. The labels for Bournemouth and the hotel overlap, so Google simply omitted the Bournemouth one as less important than the hotel I am due to stay in.

A human map maker would undoubtedly have simply shifted the name ‘Bournemouth’ up a bit, knowing that it refers to the whole town.  In principle, Google maps could do the same, but typically geocoding (e.g. Geonames) simply gives a point for each location rather than an area, so it is not easy for the software to make adjustments … except Google clearly knows it is ‘big’ as it is displayed on the first, zoomed out, view; so maybe it could have done better.
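The difference between the two strategies is easy to sketch in Python. The code below is an illustration only, not how Google Maps works: labels are treated as simple rectangles, the priorities and coordinates are invented, and ‘shifting’ just means trying one label height up or down before giving up.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Label:
    text: str
    x: float        # left edge
    y: float        # bottom edge
    w: float        # width
    h: float        # height
    priority: int   # higher means more important (an assumption of this sketch)


def overlaps(a, b):
    """True if the two label rectangles intersect."""
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)


def place_labels(labels, allow_shift=True):
    """Greedy placement: most important labels first.

    With allow_shift=False a clashing label is simply dropped.
    With allow_shift=True it is first tried one label height up, then down.
    """
    placed = []
    for label in sorted(labels, key=lambda lab: lab.priority, reverse=True):
        candidates = [label]
        if allow_shift:
            candidates.append(replace(label, y=label.y + label.h))
            candidates.append(replace(label, y=label.y - label.h))
        for candidate in candidates:
            if not any(overlaps(candidate, p) for p in placed):
                placed.append(candidate)
                break
    return placed


hotel = Label("Hotel Miramar", x=0, y=0, w=10, h=2, priority=2)
town = Label("Bournemouth", x=5, y=1, w=12, h=2, priority=1)

print([lab.text for lab in place_labels([town, hotel], allow_shift=False)])
print([(lab.text, lab.y) for lab in place_labels([town, hotel], allow_shift=True)])
```

With shifting turned off the town name vanishes, just as on the map; with it turned on the name moves up a little and both labels survive. Of course, shifting is only safe if the software knows the label refers to an area rather than a point.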

This problem of overlapping legends will be familiar to anyone involved in visualisation whether map based or more abstract.


The image above is the original Cone Tree hierarchy browser developed by Xerox PARC in the early 1990s [1]. This was the early days of interactive 3D visualisation, and the Cone Tree exploited many of its advantages, such as a larger effective ‘space’ in which to place objects, and shadows giving both depth perception and a level of overview. However, there was no room for text labels without them all running over each other.

Enter the Cam Tree:


The Cam Tree is identical to the Cone Tree, except that, because it is on its side, it is easier to place labels without them overlapping 🙂

Of course, with the Cam Tree the regularity of the layout makes it easy to have a single solution.  The problem with maps is that labels can appear anywhere.

This is an image of a particularly cluttered part of the Frasan mobile heritage app developed for the An Iodhlann archive on Tiree.  Multiple labels overlap making them unreadable.  I should note that the large number of names only appear when the map is zoomed in, but when they do appear, there are clearly too many.


It is far from clear how to deal with this best.  The Google solution was simply to not show some things, but as we’ve seen that can be confusing.

Another option would be to make the level of detail that appears depend not just on the zoom, but also the local density.  In the Frasan map the locations of artefacts are not shown when zoomed out and only appear when zoomed in; it would be possible for them to appear, at first, only in the less cluttered areas, and appear in more busy areas only when the map is zoomed in sufficiently for them to space out.   This would trade clutter for inconsistency, but might be worthwhile.  The bigger problem would be knowing whether there were more things to see.
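A minimal Python sketch of this density-dependent idea follows. It is not how the Frasan app works; the function name, the 20 pixel gap and the brute-force neighbour search are all assumptions made for illustration. A point is shown only if, at the current zoom, no other point would be drawn within the minimum gap of it.

```python
import math


def visible_points(points, pixels_per_unit, min_gap_px=20.0):
    """Return the points that have no neighbour closer than min_gap_px on screen.

    points: list of (x, y) in map units.
    pixels_per_unit: the current zoom, i.e. how many screen pixels one map unit covers.
    """
    shown = []
    for i, (x, y) in enumerate(points):
        crowded = any(
            j != i and math.hypot(x - u, y - v) * pixels_per_unit < min_gap_px
            for j, (u, v) in enumerate(points)
        )
        if not crowded:
            shown.append((x, y))
    return shown


points = [(0, 0), (1, 0), (100, 0)]

zoomed_out = visible_points(points, pixels_per_unit=1)
zoomed_in = visible_points(points, pixels_per_unit=50)
print(zoomed_out)
print(zoomed_in)
print("hidden when zoomed out:", len(points) - len(zoomed_out))
```

The isolated point appears straight away, while the two close neighbours only appear once the zoom has spread them out. The count of hidden points on the last line is the kind of thing that would need to be conveyed somehow, so that people know there is more to see.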

Another solution is to group things in busy areas. The two maps below are from house listing sites. The first is Rightmove, which uses a Google map in its map view. Note how the house icons all overlap one another. Of course, the nature of houses means that if you zoom in sufficiently they start to separate, but the initial view is very cluttered. The second is Daft.ie; note how some houses are shown individually, but when they get too close they are grouped together and just the number of houses in the group is shown.

[Images: Rightmove map view with overlapping house icons; Daft.ie map view with grouped house counts]
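One common way to get this kind of grouping is to bin the icons into a screen-space grid, sketched below in Python. I do not know how either site actually does it; the function name and the 40 pixel cell size are assumptions, and real implementations usually do something smarter at cell boundaries.

```python
from collections import defaultdict


def group_markers(points, pixels_per_unit, cell_px=40):
    """Group points that fall in the same screen-space grid cell.

    Returns a list of (x, y, count): the average position of the group in
    map units and how many points it contains. A count of 1 can be drawn
    as an ordinary icon, anything larger as a numbered group.
    """
    cells = defaultdict(list)
    for x, y in points:
        key = (int((x * pixels_per_unit) // cell_px),
               int((y * pixels_per_unit) // cell_px))
        cells[key].append((x, y))

    markers = []
    for members in cells.values():
        cx = sum(p[0] for p in members) / len(members)
        cy = sum(p[1] for p in members) / len(members)
        markers.append((cx, cy, len(members)))
    return markers


houses = [(0, 0), (1, 0), (100, 0)]

grouped = group_markers(houses, pixels_per_unit=1)
separate = group_markers(houses, pixels_per_unit=50)
print(grouped)
print(separate)
```

Because the grid is defined in pixels rather than map units, the groups break apart naturally as you zoom in, and the number on each group tells you there is more to see.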

A few years ago, Geoff Ellis and I reviewed a number of clutter reduction techniques [2]; each has advantages and disadvantages, and there is no single ‘best’ answer. The grouping solution is for icons, which are fixed size and small; the text label layout problem is far harder!

Maybe someday these automatic tools will be able to cope with the full variety of layout problems that arise, but for the time being this is one area where human cartographers still know best.

  1. Robertson, G. G.; Mackinlay, J. D.; Card, S. K. Cone Trees: animated 3D visualizations of hierarchical information. Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI ’91); 1991 April 27 – May 2; New Orleans, LA. NY: ACM; 1991; 189-194.
  2. Geoffrey Ellis and Alan Dix. 2007. A Taxonomy of Clutter Reduction for Information Visualisation. IEEE Transactions on Visualization and Computer Graphics 13, 6 (November 2007), 1216-1223. DOI=10.1109/TVCG.2007.70535